Hello fellow Nutanix fans,
We have a three-node cluster in a DR location. This DR cluster has less capacity than the primary site, which has resulted in us exceeding the resilient capacity of the DR cluster.
To remedy this situation, we decided to replace four 8TB HDDs with four 12TB HDDs in each node.
Today we removed one drive from a node in Prism UI and waited approximately four hours before the system reported it was safe to remove the drive. During this time, Prism reported that the amount of data on the drive was decreasing. The cluster showed about 700MB/s of throughput during this rebuild, which seems really good for a cluster of this size.
We removed the 8TB HDD and inserted the 12TB HDD. The rebuild did not initiate until we ran NCC. By default, NCC runs every 24 hours, so the rebuild would eventually have started on its own, but we wanted to see it get started.
We watched the rebuild run for a while after it got underway. The system is writing around 80MB/s to the replacement drive. At this rate (roughly 290GB/hour), it should take about 17 hours to rebuild the roughly 4.5TiB of data, so we can only do one drive replacement per day.
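For anyone who wants to check the math, here is a rough sketch of the estimate in Python. It assumes decimal megabytes (1MB = 10^6 bytes), binary tebibytes (1TiB = 2^40 bytes), and a constant write rate for the whole rebuild, which a real rebuild probably won't hold. The function and variable names are just for illustration.

```python
MB = 10**6    # decimal megabyte, in bytes (assumption)
GB = 10**9    # decimal gigabyte, in bytes (assumption)
TIB = 2**40   # binary tebibyte, in bytes


def rebuild_hours(data_bytes, rate_bytes_per_s):
    """Hours needed to write data_bytes at a constant rate."""
    return data_bytes / rate_bytes_per_s / 3600


observed_rate = 80 * MB    # write rate we saw on the replacement drive
drive_spec_rate = 240 * MB # sustained rate from the drive's spec sheet
data_to_rebuild = 4.5 * TIB

gb_per_hour = observed_rate * 3600 / GB
hours_observed = rebuild_hours(data_to_rebuild, observed_rate)
hours_at_spec = rebuild_hours(data_to_rebuild, drive_spec_rate)

# Three nodes with four drives each, replaced one per day
total_drives = 3 * 4

print(f"{gb_per_hour:.0f} GB/hour at the observed rate")
print(f"{hours_observed:.1f} hours per drive at the observed rate")
print(f"{hours_at_spec:.1f} hours per drive at the spec-sheet rate")
print(f"{total_drives} drives, so about {total_drives} days at one drive per day")
```

Under those assumptions, 80MB/s comes to 288GB/hour and a bit over 17 hours per drive, versus a little under 6 hours if the drive could be driven at its full rated speed.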
We have a few questions we are hoping someone can answer.
- We haven’t been very successful in finding good information on capacity expansion by replacing existing drives with larger ones. Is there a good reference out there?
- Are we better off performing the graceful drive removal, which takes around four hours, or can we just pull a drive and replace it with the larger one? Is one method safer than the other?
- The specs of the 12TB drive state it can sustain a transfer rate of roughly 240MB/s. Why is the drive rebuild only pushing a third of this potential bandwidth?
- We believe that since we are on AOS 5.20, once all the drives in a node have been replaced, the node will provide the full capacity of the new drives to the cluster. Is that correct?
- Is there a better way for us to be upgrading the capacity of this cluster given our decision to simply replace drives?
Thanks for reading and replying.