Question

Question regarding erasure coding

  • 12 June 2020
  • 3 replies
  • 2015 views

Badge +1

Question 1: There is a 5-node cluster with 200 TB of raw disk capacity, and the administrator wants to enable erasure coding to obtain more usable space. What will be the usable space after enabling erasure coding?

Options: 100, 125, 150, or 175 TB? Please share the calculation formula.

 

Question 2: The administrator wants to perform a DR test after 6 months. Which snapshot should they use?

Options: the latest snapshot or the oldest one

 


This topic has been closed for comments

3 replies

Userlevel 6
Badge +5

Hi SonuK,

Cluster size = 4 nodes (each node 20 TB) and this cluster is configured with RF2 (as mentioned above)

Raw storage = 80 TB

Usable = 40 TB

So I am still confused how they have shown 40 TB as usable capacity in a 4-node cluster after enabling erasure coding?

This part is calculated before erasure coding is enabled. The table compares usable space with erasure coding as opposed to without it.

RF2 means 2 copies of the data exist at any given point in time, so whatever amount of raw storage you have, half of it will be used for replica data. An 80 TiB cluster has 40 TiB of usable space.

When EC-X is enabled, it effectively removes the need for the replica data and replaces it with parity data. So instead of keeping an identical copy of the data, a logical calculation is performed and a set of data chunks is encoded with one parity chunk, similar to RAID logic. Similar but different, because each EC-X stripe (data + parity) consists of chunks that each sit on a different node of the cluster. This ensures that all the data is recoverable: one node failure means that the missing stripe part can be recovered using the parity, and a missing parity part that went down with the node can be recalculated from all of the remaining data parts.
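As a rough illustration of the parity idea, here is a minimal Python sketch of single-parity encoding using XOR, similar to RAID-5 logic. The chunk contents and the xor_chunks helper are made up for illustration and are not how EC-X is implemented internally:

    # Minimal single-parity sketch using XOR (illustrative only)
    data_chunks = [b"AAAA", b"BBBB", b"CCCC"]  # 3 data chunks, as in a 5-node stripe

    def xor_chunks(chunks):
        # XOR all chunks byte by byte to produce one parity chunk
        parity = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, byte in enumerate(chunk):
                parity[i] ^= byte
        return bytes(parity)

    parity = xor_chunks(data_chunks)

    # Simulate losing one data chunk (a node failure) and rebuilding it:
    survivors = [data_chunks[0], data_chunks[2], parity]
    recovered = xor_chunks(survivors)
    assert recovered == data_chunks[1]  # the lost chunk is rebuilt from parity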

 

Secondly, how are they getting the dividing values like 1.5, 1.33, 1.25, 12.5?

 

Let’s talk about that KB separately once you share it. For now, let’s look at the example.

For an EC-X stripe you would want:

  • data
  • parity
  • spare to recover from the loss of a data chunk

Hence:

4 nodes = 1 spare, 1 parity, 2 data

5 nodes = 1 spare, 1 parity, 3 data

6 nodes = 1 spare, 1 parity, 4 data

One parity chunk can code at most 4 data chunks, hence the stripe does not grow beyond that. That’s why a 7-node cluster would still have 1 parity per 4 data chunks, plus 1 spare.
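That sizing rule can be expressed as a small Python sketch (the ecx_stripe helper is hypothetical, just encoding the rule above):

    def ecx_stripe(nodes):
        # 1 spare + 1 parity; the remaining nodes hold data,
        # capped at 4 data chunks per parity chunk
        data = min(nodes - 2, 4)
        return {"data": data, "parity": 1, "spare": 1}

    for n in (4, 5, 6, 7):
        print(n, ecx_stripe(n))
    # 4 -> 2 data, 5 -> 3 data, 6 -> 4 data, 7 -> still 4 data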

 

Now that we’ve moved that out of the way, what about the actual calculation?

4-node cluster: 1 parity codes 2 parts of data => 2/3 of the data written is actual data.

80 TiB × 2/3 (or 80 TiB / 1.5) = 53.3 TiB

 

5 nodes: 1 parity codes 3 parts of data => 3/4 of the data written is actual data.

100 TiB × 3/4 (or 100 TiB / 1.33) = 75 TiB

 

6 nodes: 1 parity per 4 data => 4/5 of the data written is actual data. 120 TiB × 4/5 (or 120 TiB / 1.25) = 96 TiB
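Putting this together as a minimal Python sketch (the usable_with_ecx function name is my own; it just assumes the stripe sizing rule above). Note that applying the same formula to your first question, 5 nodes with 200 TB raw, gives 200 / 1.33 ≈ 150 TB:

    def usable_with_ecx(nodes, raw_tib):
        data = min(nodes - 2, 4)      # data chunks per stripe
        overhead = (data + 1) / data  # 1.5, 1.33, 1.25 for 4/5/6 nodes
        return raw_tib / overhead     # same as raw × data / (data + 1)

    print(usable_with_ecx(4, 80))   # ~53.3 TiB
    print(usable_with_ecx(5, 100))  # 75.0 TiB
    print(usable_with_ecx(6, 120))  # 96.0 TiB
    print(usable_with_ecx(5, 200))  # 150.0 TiB, the answer to question 1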

Also take a look at this: https://portal.nutanix.com/page/documents/solutions/details/?targetId=TN-2032-Data-Efficiency%3Atop_data_reduction.html

 

Let me know if you have further questions.

 

Badge +1

Hi, Thanks for your reply.

I went through the console guide; let’s pick up the first example.

Cluster size = 4 nodes (each node 20 TB) and this cluster is configured with RF2 (as mentioned above)

Raw storage = 80 TB

Usable = 40 TB

Usable after enabling erasure coding = 80 TB / 1.5 ≈ 53 TB

 

 

If we follow this example, the normal usable capacity is 40 TB, but shouldn’t the normal usable capacity be 62 TB [80 TB − (0.9 × 20)] as per the Nutanix KB "Calculate usable storage"? (Sorry, but I can’t find that KB.)

So I am still confused how they have shown 40 TB as usable capacity in a 4-node cluster after enabling erasure coding?

Secondly, how are they getting the dividing values like 1.5, 1.33, 1.25, 12.5?

 

Please clarify

Userlevel 6
Badge +5

Hi SonuK,

The questions look like they’re from some exam, is that right?

In relation to erasure coding usable space, take a look at this: Prism Web Console Guide: Example of Savings from Erasure Coding.

 

As for the DR test, it depends on the recovery point and the snapshots available. In a DR scenario, the goal is usually to recover data with as little loss as possible, i.e. as close to the time of the disaster as possible, hence the latest snapshot taken prior to the disastrous event would be the choice (see the sketch below). This is not a right-or-wrong answer, but rather the chain of thought I would use.
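As a minimal Python sketch of that chain of thought, assuming a made-up list of snapshot timestamps and a known disaster time (illustrative data, not a Nutanix API):

    from datetime import datetime

    # Hypothetical monthly snapshots and a disaster time (illustrative data)
    snapshots = [datetime(2020, m, 1) for m in range(1, 7)]
    disaster = datetime(2020, 6, 10)

    # Choose the latest snapshot taken before the disastrous event
    restore_point = max(s for s in snapshots if s < disaster)
    print(restore_point)  # 2020-06-01: closest to the disaster, least data loss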