We love our VM-level snapshots on VMware ESXi. They are a quick and easy way to roll back recent changes as if nothing had ever happened, which gives peace of mind and boosts confidence. They are not a cure-all, though: like anything else, they have limitations.
What are the limitations of VM-level snapshots in a Nutanix cluster?
First, let’s look at some of the interesting events that occur while a snapshot is being taken and while it is present:
If a virtual machine is running off a snapshot, it writes its changes to a child (sparse) disk, also called a delta disk.
The vSphere host keeps the delta disk metadata, including the delta disk header, in memory. Updates to the header happen in memory as required, and the changes are written to disk only upon certain events, such as snapshot consolidation or when the delta disk is closed.
Storage snapshot operations and storage replication are transparent to ESXi hosts. If the storage snapshot used to restore a VM was taken before the delta disk header changes were flushed to disk, then the delta disk metadata on the restored VM is not consistent. Similarly, a synchronous or asynchronous replica of a VMFS file system might not contain all header changes, as they might not have been flushed at the moment replication of the underlying LUN was stopped.
Storage-level snapshots happen at the hardware disk level. Since VMware does not provide a mechanism to know when the in-memory updates to the delta disk header are flushed to disk, a storage snapshot of a VM taken while the VM is running off a VMware snapshot may be inconsistent.
This behavior is common across all storage vendors that provide the ability to take storage snapshots in ESXi environments. It is described in detail in VMware KB-2147276.
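The timing problem described above can be illustrated with a toy model. This is only a sketch, not VMware's or Nutanix's implementation: the `DeltaDisk` class, the `header_generation` field, and every function name here are hypothetical and exist purely to show why a storage snapshot taken before a flush captures a stale header.

```python
from dataclasses import dataclass, field


@dataclass
class DeltaDisk:
    """Toy delta disk: one header copy on disk, one in host memory."""
    # 'header_generation' is a made-up field standing in for real header data.
    on_disk_header: dict = field(default_factory=lambda: {"header_generation": 0})
    in_memory_header: dict = field(default_factory=lambda: {"header_generation": 0})

    def guest_write(self) -> None:
        # A guest write changes the header, but only the in-memory copy.
        self.in_memory_header["header_generation"] += 1

    def flush(self) -> None:
        # Models events such as consolidation or closing the delta disk.
        self.on_disk_header = dict(self.in_memory_header)


def storage_snapshot(disk: DeltaDisk) -> dict:
    # The storage layer only sees what is on disk, never host memory.
    return dict(disk.on_disk_header)


def is_consistent(snapshot_header: dict, disk: DeltaDisk) -> bool:
    # Consistent only if the captured header matches the host's current view.
    return snapshot_header == disk.in_memory_header


disk = DeltaDisk()
disk.guest_write()
early = storage_snapshot(disk)          # taken before the flush
early_ok = is_consistent(early, disk)   # stale header captured

disk.flush()
late = storage_snapshot(disk)           # taken after the flush
late_ok = is_consistent(late, disk)

print(early_ok, late_ok)  # False True
```

The storage side has no way to call anything like `flush()` or to learn when it last ran, which is why the snapshot taken at an arbitrary moment cannot be guaranteed consistent.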
How does that affect Nutanix Protection Domain operations? A Protection Domain (PD) is based on snapshots that are taken at the vDisk level. When a virtual machine running on ESXi has a VMware snapshot present at the time a storage snapshot is taken, Nutanix raises the alert “Protected VM(s) Not Recoverable”.
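As a rough pre-check, one could look for delta disk files before relying on a PD snapshot. The sketch below is a heuristic and not a Nutanix or VMware API. It assumes the usual VMware naming convention, where snapshot disks look like `<disk>-000001.vmdk`, `<disk>-000001-delta.vmdk`, or `<disk>-000001-sesparse.vmdk`. The function name and the inventory format (a mapping from VM name to the file names in its folder) are hypothetical.

```python
import re
from typing import Dict, List

# Assumed naming convention for VMware snapshot (delta) disks:
# a six-digit sequence number, optionally followed by -delta or -sesparse.
DELTA_DISK_PATTERN = re.compile(r"-\d{6}(-delta|-sesparse)?\.vmdk$")


def vms_with_vmware_snapshots(vm_files: Dict[str, List[str]]) -> List[str]:
    """Return the names of VMs whose file list contains a delta disk."""
    flagged = []
    for vm_name, files in vm_files.items():
        if any(DELTA_DISK_PATTERN.search(name) for name in files):
            flagged.append(vm_name)
    return sorted(flagged)


# Hypothetical inventory, e.g. gathered from a datastore listing.
inventory = {
    "web01": ["web01.vmx", "web01.vmdk", "web01-flat.vmdk"],
    "db01": ["db01.vmx", "db01.vmdk", "db01-000001.vmdk", "db01-000001-delta.vmdk"],
}

print(vms_with_vmware_snapshots(inventory))  # ['db01']
```

A file-name match can produce false positives, for example a base disk whose own name ends in a dash and six digits, so treat the result as a hint on which VMs to inspect rather than a definitive answer.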
For the symptoms of a failed VM recovery and the solution, please refer to KB-7023 Alert - Protected VM(s) Not Recoverable.
For more guidance on Async DR please see Prism Web Console Guide v5.10: Data Protection Guidelines (Async DR).
For more information on Async DR including configuration and recovery operations please see Prism Web Console Guide v5.10: Data Protection.
For more explanation of VMware VM snapshots, see VMware KB-1015180.