Can Two Disks Fail in a Short Interval?

Yes, of course. The real question is how these failures are handled in HCI (hyperconverged infrastructure) architectures: what happens to data availability while they occur?

Two questions arise:

Does the HCI architecture start protecting the data as quickly as possible?
How intelligently does the architecture rebuild the lost data, and how long will the rebuild take?

If the rebuild takes a long time, that window is a risky period. The administrator must hope that no further failure occurs before the rebuild completes; otherwise the FTT (Failures To Tolerate) and RF (Replication Factor) requirements can no longer be met.
In the absence of a smart architecture, vendors may instead offer the brute-force option of increasing the RF/FTT, which increases cost.
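To make the "risky period" concrete, here is a minimal sketch of how exposure grows with rebuild time. It assumes disks fail independently at a constant rate (an exponential model) derived from an annualized failure rate (AFR); the function name and all numbers are hypothetical illustrations, not figures from the video or from any vendor. It estimates the chance that any other disk fails during the rebuild window, which is an upper-bound-style view of exposure rather than a data-loss probability.

```python
import math

HOURS_PER_YEAR = 8760.0


def second_failure_probability(remaining_disks: int, afr: float, rebuild_hours: float) -> float:
    """Probability that at least one remaining disk fails during the rebuild window.

    Hypothetical model: independent failures at a constant per-disk rate,
    with the hourly rate derived from an annualized failure rate (AFR).
    """
    if not 0.0 <= afr < 1.0:
        raise ValueError("afr must be in [0, 1)")
    if remaining_disks < 0 or rebuild_hours < 0:
        raise ValueError("remaining_disks and rebuild_hours must be non-negative")
    # Convert AFR to a constant hourly failure rate: AFR = 1 - exp(-rate * 8760).
    rate_per_hour = -math.log(1.0 - afr) / HOURS_PER_YEAR
    # P(at least one failure among n disks within t hours) = 1 - exp(-n * rate * t).
    return 1.0 - math.exp(-remaining_disks * rate_per_hour * rebuild_hours)


if __name__ == "__main__":
    # Hypothetical cluster: 47 surviving disks, 2% AFR each.
    for hours in (24.0, 1.0):
        p = second_failure_probability(47, 0.02, hours)
        print(f"rebuild window {hours:>4.0f} h -> P(another failure) = {p:.6f}")
```

Under this model the exposure is roughly proportional to the rebuild window, so shortening the rebuild reduces risk without paying for a higher RF/FTT.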

In this video, I simulate two disk failures with soft disk pulls on Nutanix AOS. The short video shows how quickly AOS acts to protect the data from subsequent failures, and how quickly it completes the rebuild: its truly distributed architecture employs all available resources in the cluster, with minimal impact on the performance of the existing workload.
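Why does using all resources in the cluster shorten the rebuild? A rough back-of-the-envelope sketch, assuming the rebuild is bounded by write throughput and the work divides evenly across participating disks, compares a many-to-one rebuild (one replacement disk) with a many-to-many rebuild (many disks sharing the work). The function name, data size, and per-disk rate are hypothetical; this is not a model of AOS internals, and real rebuilds are also shaped by network bandwidth and any throttling used to protect the running workload.

```python
def rebuild_hours(data_tb: float, participating_disks: int, per_disk_mb_s: float) -> float:
    """Estimate rebuild time when re-replication work is spread evenly across disks.

    Hypothetical model: throughput-bound rebuild, perfectly even work split.
    """
    if data_tb < 0 or participating_disks <= 0 or per_disk_mb_s <= 0:
        raise ValueError("invalid rebuild parameters")
    total_mb = data_tb * 1_000_000  # decimal TB -> MB
    seconds = total_mb / (participating_disks * per_disk_mb_s)
    return seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical: 2 TB of data to re-protect, 100 MB/s of rebuild throughput per disk.
    single = rebuild_hours(2.0, 1, 100.0)
    distributed = rebuild_hours(2.0, 40, 100.0)
    print(f"many-to-one  (1 disk):   {single:.2f} h")
    print(f"many-to-many (40 disks): {distributed:.2f} h")
```

In this idealized model, rebuild time falls in proportion to the number of disks sharing the work.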

You can read the rest of the article by GV Govindasamy, Nutanix Senior Solutions and Performance Engineer, on his blog.
