Understanding Fault Domains and Rack Awareness

  • 21 August 2020

What is Fault Tolerance (FT)?

Fault tolerance (FT) describes how the system ensures that both user VM data and cluster infrastructure data are protected from failure.

What are Fault Domains?

Failure scenarios can be thought of in terms of fault domains. A Nutanix cluster has four fault domains:

  • Disk
  • Node
  • Block
  • Rack
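The four fault domains nest inside one another: a disk sits in a node, nodes sit in a block (chassis), and blocks sit in a rack, so a failure at a larger domain takes down everything beneath it. A minimal Python sketch of that hierarchy follows; the `Disk` record, the `disks_lost()` helper and the cluster layout are hypothetical illustrations, not a Nutanix API.

```python
from dataclasses import dataclass

# The four fault domains, from smallest to largest.
FAULT_DOMAINS = ("disk", "node", "block", "rack")


@dataclass(frozen=True)
class Disk:
    """Hypothetical record of where one disk sits in the physical hierarchy."""
    disk: str
    node: str
    block: str
    rack: str


def disks_lost(disks: list[Disk], domain: str, name: str) -> list[Disk]:
    """Return every disk made unavailable when unit `name` of `domain` fails."""
    if domain not in FAULT_DOMAINS:
        raise ValueError(f"unknown fault domain: {domain!r}")
    # A failed node/block/rack takes out every disk recorded under it.
    return [d for d in disks if getattr(d, domain) == name]


# Hypothetical cluster: 2 racks x 2 blocks x 2 nodes x 2 disks = 16 disks.
cluster = [
    Disk(f"{r}-b{b}-n{n}-d{d}", f"{r}-b{b}-n{n}", f"{r}-b{b}", r)
    for r in ("rack-A", "rack-B")
    for b in (1, 2)
    for n in (1, 2)
    for d in (1, 2)
]

for domain, name in [("disk", "rack-A-b1-n1-d1"), ("node", "rack-A-b1-n1"),
                     ("block", "rack-A-b1"), ("rack", "rack-A")]:
    lost = disks_lost(cluster, domain, name)
    print(f"{domain:>5} failure -> {len(lost)} disk(s) lost")
```

With two blocks of two nodes per rack, this reports 1, 2, 4 and 8 disks lost for a disk, node, block and rack failure respectively, which is why a rack is the largest of the four domains to protect against.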

This article focuses on rack awareness. A rack failure can occur in the following situations:

  • All power supplies fail within a rack
  • Top-of-rack (TOR) switch fails
  • Network partition, where one of the racks becomes inaccessible from the other racks

When rack fault tolerance is enabled, the cluster is rack aware: guest VMs can continue to run after the failure of one rack with replication factor 2 (RF2), or of two racks with RF3. This works because redundant copies of guest VM data and metadata are kept on other racks, so they remain available when a rack fails.
Note – Rack fault tolerance has to be configured manually.
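The idea behind rack awareness can be sketched as a placement rule: if each of the RF copies of a piece of data lands on a different rack, then any RF-1 rack failures still leave at least one copy. The Python below is a toy model under that assumption. `place_copies()` uses a hypothetical round-robin policy and is not Nutanix's actual placement algorithm, and neither function models the minimum rack counts given in the official requirements.

```python
from itertools import combinations


def place_copies(item_id: int, racks: list[str], rf: int) -> list[str]:
    """Choose `rf` distinct racks to hold the copies of one item.

    Hypothetical round-robin policy (not Nutanix's placement algorithm):
    start at a rack derived from the item id so load spreads, then take the
    next rf - 1 racks in order. Every copy therefore lands on a different rack.
    """
    if rf > len(racks):
        raise ValueError(f"need at least {rf} racks to keep {rf} copies apart")
    start = item_id % len(racks)
    return [racks[(start + i) % len(racks)] for i in range(rf)]


def survives_rack_failures(placements: dict[int, list[str]],
                           racks: list[str], failures: int) -> bool:
    """True if every item keeps at least one copy for EVERY possible
    choice of `failures` failed racks (brute force over combinations)."""
    for failed in combinations(racks, failures):
        failed_set = set(failed)
        for copies in placements.values():
            if all(rack in failed_set for rack in copies):
                return False  # every copy of this item was on a failed rack
    return True


racks = ["rack-A", "rack-B", "rack-C", "rack-D", "rack-E"]
for rf in (2, 3):
    placements = {i: place_copies(i, racks, rf) for i in range(10)}
    for failures in (rf - 1, rf):
        ok = survives_rack_failures(placements, racks, failures)
        print(f"RF{rf}: survives {failures} rack failure(s)? {ok}")
```

With five racks, the check reports that RF2 survives one rack failure but not two, and RF3 survives two but not three, matching the one-rack/two-rack behaviour described above. The brute-force search over rack combinations is only practical because a cluster has a handful of racks.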

The requirements and configuration steps for rack fault tolerance can be found here.
To learn more about rack fault tolerance on Nutanix Clusters, click here.
Rack awareness is now also available on clusters running on AWS. Click here to read more about it.