Hi team, I want to ask something. As I understand it, Nutanix can tolerate 1 node failure if I use RF2, where the data is kept as 2 copies. What happens if I have 10 nodes: how many node failures can Nutanix tolerate at the same time? Does it still tolerate only 1 node failure, or is there some other mechanism? I am confused after reading this KB: https://portal.nutanix.com/page/documents/details/?targetId=Web-Console-Guide-Prism-v5_17%3Aarc-host-failure-c.html
Best answer by Sergei Ivanov
There are 2 fault tolerance options: RF2 and RF3.
RF2 means there are 2 copies of all data. With RF2 one node can go down at a given time.
RF3 means there are 3 copies of all data. With RF3 two nodes can go down at a given time.
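To make the rule above concrete, here is a minimal Python sketch. It is not a Nutanix API, and the function name is made up for illustration. It only encodes the arithmetic "number of copies minus one", which does not depend on how many nodes the cluster has.

```python
def tolerated_simultaneous_failures(replication_factor: int) -> int:
    """Hypothetical helper: node failures tolerated at the same time.

    With N copies of every piece of data, up to N - 1 copies can be lost
    at once and one copy is still left to serve the data.
    """
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor - 1


for rf in (2, 3):
    count = tolerated_simultaneous_failures(rf)
    print(f"RF{rf}: {rf} copies, tolerates {count} simultaneous node failure(s)")
```

Note that the node count is not an input at all, which is why a 10-node RF2 cluster tolerates the same single simultaneous failure as a smaller RF2 cluster.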
If you have 10 nodes in an RF2 configuration, one node can go down and the cluster will stay up. When a node goes down, a rebuild starts: the cluster recreates, on the remaining nodes, the copies of data that were lost with the failed node. If another node goes down before the rebuild has finished, the cluster will go down.
If the cluster has enough free space to complete the rebuild, then once the rebuild is finished it can tolerate one more node failure, and so on.
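The "one at a time, as long as there is free space" behaviour can be sketched with a very rough capacity model. This is only an illustration under stated assumptions, not how Nutanix actually calculates anything: all nodes have the same capacity, `stored_tib` is the total physical data including all copies, each rebuild finishes before the next failure, and the only checks are that the surviving nodes can hold all the data and that enough nodes remain to keep the copies on different nodes. The function and parameter names are hypothetical.

```python
def failures_fully_rebuilt(nodes: int, node_capacity_tib: float,
                           stored_tib: float, replication_factor: int = 2) -> int:
    """Hypothetical estimate of how many one-at-a-time node failures can be
    followed by a complete rebuild (redundancy fully restored each time).

    Simplified model: a rebuild after a failure succeeds only if the
    surviving nodes can hold all physical data and there are still at
    least `replication_factor` nodes to place the copies on.
    """
    rebuilt = 0
    remaining = nodes
    while True:
        survivors = remaining - 1  # lose one more node
        enough_nodes = survivors >= replication_factor
        enough_space = stored_tib <= survivors * node_capacity_tib
        if not (enough_nodes and enough_space):
            return rebuilt
        remaining = survivors
        rebuilt += 1


# Example: 10 nodes of 10 TiB each, 60 TiB of physical data, RF2.
print(failures_fully_rebuilt(nodes=10, node_capacity_tib=10, stored_tib=60))
```

In this toy example the rebuild can complete after each of the first four sequential failures; a fifth failure would leave 50 TiB of capacity for 60 TiB of data, so redundancy could no longer be restored. A real cluster checks more than this, so treat the number as a way to reason about free space, not as a sizing result.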