I've had an intermittent problem with some 2012 VMs I migrated: they lock up after a random amount of time, usually going to 100% CPU with all I/O stopped. The VM console is unresponsive, the network is unreachable, and a reset is the only way to bring them back online. After a reset a VM would run again for minutes, hours, or even a day or more, then do the same thing at a random time of day or night, so there was no logical pattern to it.
I ended up tracking it down to a timezone configuration issue. When the VMs in question rebooted, their clocks were set to the hypervisor host time, which was in a different timezone (the default TZ the nodes were shipped with) from the VMs' timezone. I had a similar problem in the past on our Citrix XenServer cluster, although without the lockups: a number of hosts would lose time sync and be out by the TZ difference between the hypervisor host and the VM. Because our VMs are on an AD domain, the time difference was too big for them to fall within the NTP adjustment range.
As part of my cluster setup I had changed the cluster timezone to match mine (Australia/Perth), but failed to do the same for the AHV hosts.
After changing the AHV hosts to the correct TZ and rebooting the VMs, all appears to be fine now: no more lockups, and the VMs come up on the correct time after every reboot.
Hmmm, spoke too soon... sigh.
The lockups have stopped since the TZ changes, but that may now be a coincidence.
The time is still out by -8 hours on VM reboot, which just happens to be the offset for my timezone (Australia/Perth, GMT+8).
Will investigate further.
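A quick way to sanity-check the -8 hour figure: if the virtual hardware clock holds UTC but the guest reads those digits as local time (an assumption, not something confirmed in this thread), a Perth guest would come up exactly 8 hours behind. The Python sketch below models that hypothesis only; `guest_clock_skew_hours` is a hypothetical helper, and the fixed UTC+8 offset relies on Perth not observing daylight saving.

```python
from datetime import datetime, timedelta, timezone

# Australia/Perth is UTC+8 all year (no daylight saving), so a fixed offset is enough here.
PERTH = timezone(timedelta(hours=8), "AWST")


def guest_clock_skew_hours(true_utc: datetime, local_tz: timezone) -> float:
    """Hypothetical model: the virtual hardware clock holds UTC, but the guest
    treats those wall-clock digits as local time. Returns (guest clock minus
    correct local time) in hours."""
    correct_local = true_utc.astimezone(local_tz)
    # Keep the UTC digits but label them as local time (the assumed misreading).
    guest_local = true_utc.replace(tzinfo=local_tz)
    return (guest_local - correct_local).total_seconds() / 3600


if __name__ == "__main__":
    example_utc = datetime(2024, 1, 1, 4, 0, tzinfo=timezone.utc)
    print(guest_clock_skew_hours(example_utc, PERTH))  # -8.0
```

Under this model the skew is always the negative of the local UTC offset, which is why a GMT+8 site would see the clock out by exactly -8 hours.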
Bring this up on your case with Support (looks like it's assigned to Chetan); I've got a sneaking suspicion this is VirtIO driver related.
@BB_Infra - is support engaged? If not, please file a support ticket right away, so that we can link up with you to debug.