I’ve been trying to figure out why the configuration with RF2 & N+1 reserves more capacity than just RF2. I understand that with RF2 all the data is duplicated across the cluster and the failure of one node is tolerated. In my scenario I have a 3-node cluster with 34TiB of effective capacity, so with RF2 I would have only 17TiB; everything is clear up to that point.
But the extent store assuming RF2 & N+1 gives me 11.3TiB, which is 1/3 of my available capacity. So... the question is: if I reserve capacity by effectively tripling all the data, is that to tolerate one additional node failure besides the first one (tolerated by RF2)?
A silly conclusion would be that with RF2 & N+1 the cluster is able to tolerate 2 node failures and continue operating with just one node, but I know that is not possible.
So, why does assuming RF2 & N+1 reserve more TiB than just RF2? I would appreciate the help.
Thanks in advance!
Best answer by Alona
It seems to me you are looking at projections in Sizer. RF and FT can be confusing; let me try to explain.
RF = Replication Factor. The number of copies of data. RF2 = each piece of data exists in two locations.
FT = Fault Tolerance, also referred to by Nutanix as Redundancy Factor. Roughly (!) the number of nodes a cluster can lose without loss of data.
RF2 means you can lose a disk and recover it.
FT2, which is what ‘N+1’ in Sizer refers to, means you can lose a node and still recover. Hence RF2 with N+1 can tolerate the loss of a disk and a node at the same time. Let’s pause here and imagine how that is possible.
When a disk becomes faulty, the extra copy of the data is engaged, but there is now only one copy of the data that was on that faulty disk. The cluster setting of RF2 dictates that there should be 2 copies of data at all times, so the missing data is replicated, provided there is free space. Replication Factor is restored. The disk is replaced. Life goes on.
When a node goes offline, it takes some of the data with it. The cluster setting of RF2 dictates that there should be 2 copies of the data, so the missing replicas must be restored; otherwise, a subsequent disk failure hitting the only remaining copy of that data would be an unrecoverable loss. That extra space you mentioned is reserved for rebuilding the copies of data that the faulty node hosted.
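To make the numbers concrete, here is a small sketch of the capacity math. This is my assumption of how Sizer arrives at its figures (hold back one node’s share of raw capacity for the rebuild reserve, then divide by RF), not an official Nutanix formula:

```python
def usable_capacity(effective_tib, rf, nodes, reserve_nodes=0):
    """Usable capacity after replication and an N+x rebuild reserve.

    With N+1, one node's share of the capacity is held back so the
    replicas lost with a failed node can be rebuilt on the survivors.
    (Hypothetical helper for illustration only.)
    """
    surviving_fraction = (nodes - reserve_nodes) / nodes
    return effective_tib * surviving_fraction / rf

# 3-node cluster with 34 TiB effective capacity:
rf2_only = usable_capacity(34, rf=2, nodes=3)                   # 17.0 TiB
rf2_n1 = usable_capacity(34, rf=2, nodes=3, reserve_nodes=1)    # ~11.33 TiB
print(round(rf2_only, 2), round(rf2_n1, 2))
```

That matches the figures in the question: RF2 alone gives 17 TiB, and RF2 & N+1 gives roughly 11.3 TiB, because one node’s worth of space stays free for rebuilding.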
I hope this makes sense. There are explanations here on the forum, as well as a video on YouTube. I’ll leave the links below.
Let me know if there are further questions.