Failures are a part of every system, and Nutanix clusters are not immune to them. How we plan for failures determines the resilience of a product, or of a person for that matter!
Nutanix groups failures into availability domains based on the type of failure. In addition to drive, node, block, and network link failures, Nutanix can tolerate a rack failure for extended data availability.
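To make the idea of an availability domain a little more concrete, here is a minimal Python sketch. It is a toy model, not Nutanix code: the `Replica` type, the `tolerates_single_failure` helper, and the node, block, and rack names are all hypothetical. It simply checks whether a set of data copies would survive the loss of any single domain at a given level.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Location of one copy of a piece of data (hypothetical model)."""
    node: str
    block: str
    rack: str


def tolerates_single_failure(replicas: list[Replica], level: str) -> bool:
    """Return True if at least one copy survives the loss of any single
    domain at the given level ("node", "block", or "rack")."""
    if not replicas:
        return False
    per_domain = Counter(getattr(r, level) for r in replicas)
    # Losing a domain removes every copy inside it, so the data survives
    # only if no single domain holds all of the copies.
    return max(per_domain.values()) < len(replicas)


# Two copies on different nodes and blocks, but in the same rack.
copies = [
    Replica(node="node-1", block="block-A", rack="rack-1"),
    Replica(node="node-5", block="block-B", rack="rack-1"),
]

for level in ("node", "block", "rack"):
    print(level, tolerates_single_failure(copies, level))
```

In this made-up layout the data would ride out a node or block failure, but not the loss of the rack, because both copies sit in `rack-1`.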
Node Failure
A Nutanix node comprises a physical host and a Controller VM (CVM). Either of these components can fail without any impact to the Nutanix cluster as a whole.
CVM Failure
When a CVM fails, an alert is generated in Prism and the storage path on the affected host is redirected to another CVM in the cluster. Reads and writes will travel over the 10GbE network until the local CVM comes back online.
It is business as usual for the end customer, with perhaps a slight performance decrease.
Figure: Controller VM failure
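The redirect described above can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the `Host` type, the `storage_path` function, and the rule of picking the first healthy CVM are hypothetical and do not reflect how Nutanix actually selects a target CVM.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """A node with its local Controller VM state (hypothetical model)."""
    name: str
    cvm_up: bool = True


def storage_path(host: Host, cluster: list[Host]) -> tuple[str, str]:
    """Return (name of host whose CVM serves the I/O, path type)."""
    if host.cvm_up:
        return host.name, "local"
    # Local CVM is down: redirect to a healthy CVM on another host.
    # Picking the first healthy one is an illustrative assumption.
    for other in cluster:
        if other is not host and other.cvm_up:
            return other.name, "remote"
    raise RuntimeError("no healthy CVM available in the cluster")


cluster = [Host("host-1"), Host("host-2"), Host("host-3")]

print(storage_path(cluster[0], cluster))  # I/O served by the local CVM

cluster[0].cvm_up = False                 # simulate a CVM failure on host-1
print(storage_path(cluster[0], cluster))  # I/O now crosses the network

cluster[0].cvm_up = True                  # CVM comes back online
print(storage_path(cluster[0], cluster))  # back to the local path
```

The VMs on `host-1` keep running the whole time; only the path their storage traffic takes changes, which is where the slight performance decrease comes from.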
Physical Host Failure
If a node fails, all HA-protected VMs can be automatically restarted on other nodes in the cluster. End users will find their applications unavailable while the VMs are being restarted on other hosts.
Figure: Node failure
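As a rough illustration of the restart behavior described above, here is a small Python sketch. It is a toy model under stated assumptions: the `restart_after_host_failure` function, the VM and host names, and the round-robin placement are all hypothetical. A real HA scheduler would also weigh things like free CPU and memory on each surviving host.

```python
from itertools import cycle


def restart_after_host_failure(hosts, placement, ha_protected, failed_host):
    """Return a new VM -> host mapping after `failed_host` goes down.

    HA-protected VMs from the failed host are restarted on surviving
    hosts (round-robin, an illustrative assumption). VMs that are not
    HA-protected stay down and map to None.
    """
    survivors = [h for h in hosts if h != failed_host]
    if not survivors:
        raise RuntimeError("no surviving hosts to restart VMs on")
    targets = cycle(survivors)
    new_placement = {}
    for vm, host in placement.items():
        if host != failed_host:
            new_placement[vm] = host           # unaffected VM keeps running
        elif vm in ha_protected:
            new_placement[vm] = next(targets)  # restarted elsewhere
        else:
            new_placement[vm] = None           # not protected, stays down
    return new_placement


hosts = ["host-1", "host-2", "host-3"]
placement = {"web": "host-1", "db": "host-1", "batch": "host-1", "cache": "host-2"}
ha_protected = {"web", "db"}

after = restart_after_host_failure(hosts, placement, ha_protected, "host-1")
for vm, host in after.items():
    print(vm, "->", host)
```

Here `web` and `db` come back on the surviving hosts, `cache` is untouched, and `batch` stays offline because it was never HA-protected in this made-up setup.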
For More Info:
- Availability Domains from Prism Web Console Guide
- Rack Awareness
- Block Awareness