Hardware failures are a part of any datacenter lifecycle, and the Nutanix architecture was designed with this inevitability in mind. Depending on the replication factor of the cluster or container, a cluster can tolerate one or two failures of a variety of hardware components while still running guest VMs and responding to commands through the management console. Many of these failures also trigger an alert through the same management console, giving the administrator a chance to respond to the situation.
Nutanix can tolerate rack failures for extended data availability, in addition to drive, node, block, and network link failures.
Block fault tolerance lets a Nutanix cluster make redundant copies of data and metadata and place the copies on nodes in different blocks.
A block is a rack-mountable enclosure that contains one to four Nutanix nodes. All nodes in a block share power supplies, front control panels (ears), backplane, and fans.
Nutanix offers block fault tolerance either as an opt-in procedure (see Configuring Block Fault Tolerance) or as a best-effort procedure (see Block Fault Tolerance in Best Effort mode).
The opt-in block fault tolerance feature guarantees data resiliency when the required conditions are met. In best-effort mode, data copies remain on the same block when there is insufficient space across all blocks.
With block fault tolerance enabled, guest VMs can continue to run after a block failure because redundant copies of guest VM data and metadata exist on other blocks.
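The difference between the two modes can be illustrated with a toy placement routine. This is only a sketch for intuition, not the actual Nutanix placement algorithm: the function names (place_replicas, survives_block_failure), the data layout, and the "most free space first" heuristic are all assumptions made for the example.

```python
def place_replicas(nodes_by_block, free_space, replication_factor, size,
                   best_effort=False):
    """Toy replica placement that prefers one copy per block.

    nodes_by_block: dict mapping block name -> list of node names
    free_space:     dict mapping node name -> free capacity (arbitrary units)
    Returns a list of (block, node) tuples, one per replica.
    Hypothetical illustration only; not Nutanix code.
    """
    # For each block, pick the roomiest node that can hold the copy.
    candidates = []
    for block, nodes in nodes_by_block.items():
        fitting = [n for n in nodes if free_space[n] >= size]
        if fitting:
            candidates.append((block, max(fitting, key=lambda n: free_space[n])))

    # Prefer blocks whose chosen node has the most free space.
    candidates.sort(key=lambda bn: free_space[bn[1]], reverse=True)
    placement = candidates[:replication_factor]

    if len(placement) < replication_factor:
        if not best_effort:
            # Strict mode: refuse rather than put two copies in one block.
            raise RuntimeError("not enough blocks with free space")
        # Best effort: reuse a block, but never the same node twice.
        used = {node for _, node in placement}
        leftovers = [
            (block, node)
            for block, nodes in nodes_by_block.items()
            for node in nodes
            if node not in used and free_space[node] >= size
        ]
        leftovers.sort(key=lambda bn: free_space[bn[1]], reverse=True)
        placement += leftovers[: replication_factor - len(placement)]
        if len(placement) < replication_factor:
            raise RuntimeError("not enough nodes with free space")
    return placement


def survives_block_failure(placement, failed_block):
    """True if at least one copy lives outside the failed block."""
    return any(block != failed_block for block, _ in placement)


if __name__ == "__main__":
    blocks = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1"]}
    roomy = {"a1": 100, "a2": 80, "b1": 90, "b2": 10, "c1": 50}
    copies = place_replicas(blocks, roomy, replication_factor=2, size=20)
    print(copies, survives_block_failure(copies, "A"))
```

When every block has room, the two copies land in different blocks and a single block failure leaves one copy reachable. When only one block has room, the strict call raises an error, while the best_effort=True call places both copies in that block, so losing it would lose both copies.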
Run the following command to check whether your cluster is block aware:
ncli cluster get-domain-fault-tolerance-status type=rackable_unit
If the cluster is block aware, the Current Fault Tolerance value in the output is 1.
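The check above could be scripted along the following lines. This is a minimal sketch under stated assumptions: it assumes the command prints one "Current Fault Tolerance : <number>" line per component, the sample text is made up for illustration rather than captured from a real cluster, and treating values higher than 1 as passing is also an assumption. The helper names are hypothetical.

```python
import re
import subprocess

# Assumed output shape: lines such as "Current Fault Tolerance   : 1".
FT_LINE = re.compile(r"^\s*Current Fault Tolerance\s*:\s*(\d+)\s*$", re.MULTILINE)


def current_fault_tolerance_values(output):
    """Return every Current Fault Tolerance value found in the text."""
    return [int(value) for value in FT_LINE.findall(output)]


def is_block_aware(output, required=1):
    """True if at least one value was found and all meet the required level."""
    values = current_fault_tolerance_values(output)
    return bool(values) and all(value >= required for value in values)


def run_status_command():
    """Run the ncli command from this section and return its stdout.

    Only works where ncli is installed and on the PATH.
    """
    result = subprocess.run(
        ["ncli", "cluster", "get-domain-fault-tolerance-status",
         "type=rackable_unit"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Illustrative sample only (assumed format, not real cluster output).
SAMPLE = """\
    Domain Type               : RACKABLE_UNIT
    Component Type            : ZOOKEEPER
    Current Fault Tolerance   : 1

    Domain Type               : RACKABLE_UNIT
    Component Type            : EXTENT_GROUPS
    Current Fault Tolerance   : 0
"""

if __name__ == "__main__":
    print(current_fault_tolerance_values(SAMPLE))
    print(is_block_aware(SAMPLE))
```

On a machine with ncli available, is_block_aware(run_status_command()) would apply the same check to live output. If the real output format differs from the assumed one, the regular expression needs adjusting.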