Hard disk failure


Hey guys,

We are running some tests in our lab. One of them is to see what happens to performance when we physically disconnect a hard disk/SSD.

Testing tools:
HammerDB (SQL benchmarking) and IOmeter.

These are the results:
Please note:
- Graph timespan = 10 minutes
- At no point during the graphed period was the HDD/SSD physically reconnected.
- There is only data on the SSD tier.



We have a few questions concerning the results:

1) When we disconnect an HDD, why does this also have an impact on the SSD tier? (There is only data on the SSDs.)
2) When we disconnect an SSD, the system recovers after about three minutes, but then performance decreases again by about 50% for roughly 2 minutes. What could be happening here?


Seba





Hi sverhoevne

HDD Failure:
I can't be 100% certain without more details about what the system was doing at the time, but I can make a good guess. Any drive failure is an extremely high-priority event: the Curator process is started to re-replicate the data, and that work takes CPU cycles away from serving I/O, which reduces the IOPS available to the workload. However, since there was no data on the HDD to replicate, the performance impact of this failure should be very short-lived.

SSD Failure:
The performance impact should be on writes only. The reason it goes down is related to burst vs. sustained write performance, which is governed by write cache thresholds.

This is not specific to Nutanix. Any storage system has limited write cache space, so it will have high burst write performance but much lower sustained write performance. In the real world, write workloads tend to come in short bursts. Benchmark tests are more aggressive than real-world workloads and tend to hit one of the cache thresholds.
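To illustrate the burst vs. sustained point, here is a toy model of a write cache (a minimal sketch, not how Nutanix actually manages its write path). The function name `simulate_writes` and all rates and sizes are hypothetical. Writes are accepted at up to the burst rate while the cache has room, the cache drains to slower media at the sustained rate, and once the cache is full the accepted write rate falls to the drain rate. A benchmark that writes hard for minutes will end up in that last phase; a short real-world burst usually won't.

```python
def simulate_writes(offered_mb_s, burst_mb_s, sustained_mb_s, cache_mb, seconds):
    """Toy write-cache model (hypothetical, for illustration only).

    Returns the accepted write throughput (MB/s) for each simulated second.
    """
    fill = 0  # MB currently sitting in the cache
    accepted = []
    for _ in range(seconds):
        free = cache_mb - fill
        # Accept what the workload offers, capped by the burst speed and by
        # the free cache space plus what drains to slower media this second.
        rate = min(offered_mb_s, burst_mb_s, free + sustained_mb_s)
        # The cache fills by what was accepted and drains at the sustained rate.
        fill = max(0, fill + rate - sustained_mb_s)
        accepted.append(rate)
    return accepted


# Hypothetical numbers: benchmark offers 800 MB/s, cache is 2000 MB,
# burst rate 1000 MB/s, sustained drain rate 200 MB/s.
print(simulate_writes(800, 1000, 200, 2000, 8))
# prints [800, 800, 800, 400, 200, 200, 200, 200]
```

The throughput looks great for the first few seconds, then drops to the sustained rate once the cache fills, which is the same shape as a benchmark "falling off a cliff" partway through a run.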

sverhoevne, have you tested with a longer timespan, e.g. 30 or 60 minutes?
Could you also share the CPU utilization during the test?
