Solved

How many nodes can fail in a nutanix cluster?

  • 25 August 2014

Hi,

The Platform Administration Guide mentions that a cluster can tolerate only 1 node failure at a time.
What happens if a complete block is lost?

thanks and best regards

Best answer by christie 29 August 2014, 06:37

If you have RF=2 containers, you can sustain only a single node outage within your cluster.
If you have 4 nodes per block, a single block failure means you have 4 nodes down, so you will likely experience storage unavailability.

If you have RF=3 containers, you can sustain a two-node outage.
However, similar math applies: losing a 4-node block is still more than two nodes down at once.

This is why we offer power supply redundancy per block, so you should experience a complete block failure only in rare circumstances.
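
The arithmetic in this answer can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above (RF=2 tolerates 1 node, RF=3 tolerates 2), assuming replicas may sit on nodes in the same block; the function names are made up for the example and are not part of any Nutanix tool.

```python
def tolerated_node_failures(rf: int) -> int:
    """Simultaneous node failures a container with replication factor rf can absorb."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    # rf copies of the data exist, so rf - 1 of them can be lost at once.
    return rf - 1


def block_failure_exceeds_tolerance(rf: int, nodes_per_block: int) -> bool:
    """True if losing one whole block takes down more nodes than rf can absorb.

    Assumption: replicas are NOT spread across blocks, so every node in the
    failed block counts as an independent node failure.
    """
    return nodes_per_block > tolerated_node_failures(rf)


if __name__ == "__main__":
    for rf in (2, 3):
        print(
            f"RF={rf}: tolerates {tolerated_node_failures(rf)} node(s); "
            f"4-node block loss exceeds that: {block_failure_exceeds_tolerance(rf, 4)}"
        )
```

With 4 nodes per block, both RF=2 and RF=3 come out as exceeded, which matches the "similar math applies" remark.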

5 replies

Userlevel 4
Badge +21
I guess that's an "it depends" question. If you have 5 nodes, you can configure the cluster to tolerate the loss of 2 nodes with RF3. If you have 3 uniform blocks and you lose a block, the cluster will keep running. That feature is called availability domains.
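
To see why spreading copies across blocks lets the cluster ride out a block loss, here is a toy model in Python. It is not Nutanix's placement algorithm: the round-robin rule, the block names, and the function names are all assumptions for illustration, and it only tracks data copies, ignoring everything else a real cluster needs.

```python
def place_replicas(blocks: list[str], rf: int, extent_id: int) -> list[str]:
    """Pick rf distinct blocks for one piece of data (hypothetical round-robin rule)."""
    if rf > len(blocks):
        raise ValueError("need at least rf blocks to keep copies in distinct blocks")
    start = extent_id % len(blocks)
    return [blocks[(start + i) % len(blocks)] for i in range(rf)]


def survives_block_loss(blocks: list[str], rf: int, extents: int, lost_block: str) -> bool:
    """True if every piece of data still has a copy outside the lost block."""
    return all(
        any(block != lost_block for block in place_replicas(blocks, rf, extent))
        for extent in range(extents)
    )


if __name__ == "__main__":
    blocks = ["A", "B", "C"]  # three uniform blocks, as in the reply above
    for lost in blocks:
        print(f"lose block {lost}: data still readable = "
              f"{survives_block_loss(blocks, rf=2, extents=12, lost_block=lost)}")
```

Because each piece of data has its copies in different blocks, no single block holds all copies of anything, so any one block can go away.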


If you only have 1 block and you lose power to the block, everything should come back fine once you restore power. All writes are synced and acknowledged to the guest VMs.

Does that help?
Badge +9
Thank you for the information.
Is it possible to get a two-rack "rack-aware" config (assuming a 50% compute reservation)?
Badge +9
thank you for the information!
best regards
Manfred
Badge +6
It is worth pointing out that although the cluster can handle only 1 node failure at a time (or 2 with higher RF settings), you can lose node #1 and the cluster will start to heal. After that process finishes, you can lose another node and still be up and running. You are only "down" if you lose more nodes than your RF setting can handle before the cluster has healed. That's not very helpful if you lose a whole block, but for something that affects only 1 node (a bad HDD or bad RAM, for example), you are good to go.
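
The "fail, heal, fail again" point can be sketched as a small Python check. This is a simplified model, not how the cluster tracks its state: the event names and the function are invented for the example, and it assumes enough nodes and free capacity remain for each heal to complete.

```python
def stays_available(events: list[str], rf: int) -> bool:
    """Replay a sequence of 'fail' / 'heal' events against a replication factor.

    'fail' = one more node goes down.
    'heal' = the cluster has finished re-protecting data after the failures so far.
    Returns False as soon as unhealed failures exceed what rf tolerates (rf - 1).
    """
    tolerated = rf - 1
    unhealed = 0
    for event in events:
        if event == "fail":
            unhealed += 1
            if unhealed > tolerated:
                return False
        elif event == "heal":
            unhealed = 0  # redundancy restored, the counter starts over
        else:
            raise ValueError(f"unknown event: {event!r}")
    return True


if __name__ == "__main__":
    print(stays_available(["fail", "heal", "fail"], rf=2))  # failures spaced out by a heal
    print(stays_available(["fail", "fail"], rf=2))          # second failure before healing
```

The first sequence stays up and the second does not, even though both contain two node failures; what matters is how many failures are outstanding at the same time.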
