Hard disk failure



Hey guys,


We are performing some tests in our lab. One of these tests checks what happens to performance when we physically disconnect a hard disk/SSD.


Testing tools:

HammerDB(SQL benchmarking) & IOmeter.


These are the results:

*Take note: 

-Graph timespan = 10 minutes

-At no point during the graphed period was the HDD/SSD physically reconnected.

-Data resides only on the SSD tier.




We have a few questions concerning the results:


1) When we disconnect an HDD, why does this also affect the SSD tier? (There is only data on the SSDs.)

2) For some reason, when the system recovers from the disconnect after about three minutes, performance decreases again by about 50% for roughly two minutes. What could be happening here?








Nutanix Employee

Re: Hard disk failure

Hi @sverhoevne


HDD Failure: 

I can't be 100% certain without more details about what the system was doing at the time, but I can make an educated guess. Any drive failure is treated as an extremely high-priority event, and the Curator process is started to replicate the data. This work consumes CPU cycles, which reduces the IOPS available to your workload. However, since there was no data on the HDD to replicate, the performance impact of this HDD failure should be very short-lived.
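To make that reasoning concrete, here is a back-of-the-envelope sketch. It is not how Curator actually schedules rebuild work: `scan_overhead_s`, `rebuild_bw_bytes_s` and `cpu_share_stolen` are hypothetical parameters, the numbers in the example calls are illustrative rather than measured, and the linear relationship between free CPU and IOPS is an assumption. It shows why a failed drive with nothing to replicate should only cause a brief dip: the rebuild window shrinks to the fixed overhead.

```python
def rebuild_window_s(data_bytes: int, rebuild_bw_bytes_s: float, scan_overhead_s: float) -> float:
    """Rough time (seconds) that background rebuild work competes with the workload.

    Hypothetical model: a fixed detection/metadata-scan overhead plus the time
    needed to re-replicate the affected data at a constant rebuild bandwidth.
    """
    return scan_overhead_s + data_bytes / rebuild_bw_bytes_s


def iops_during_rebuild(baseline_iops: float, cpu_share_stolen: float) -> float:
    """Foreground IOPS, assuming IOPS scale linearly with the CPU left over."""
    if not 0.0 <= cpu_share_stolen <= 1.0:
        raise ValueError("cpu_share_stolen must be between 0 and 1")
    return baseline_iops * (1.0 - cpu_share_stolen)


# HDD with no data on it: only the (hypothetical) fixed overhead remains.
print(rebuild_window_s(0, 200e6, 30.0))             # 30.0
# Same failure with 500 GB to re-replicate at 200 MB/s.
print(rebuild_window_s(500 * 10**9, 200e6, 30.0))   # 2530.0
# Rebuild taking 25% of the CPU away from the benchmark.
print(iops_during_rebuild(20_000, 0.25))            # 15000.0
```

The point of the comparison is the ratio, not the absolute values: the same CPU contention lasts seconds when the drive is empty and much longer when real data has to be re-replicated.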


SSD Failure:

The performance impact should be limited to writes. The drop you are seeing is most likely related to burst vs. sustained write performance, which is governed by write cache thresholds.


This is not specific to Nutanix. Any storage system has limited write cache space, so it will have much higher burst write performance than sustained write performance. In the real world, write workloads are bursty, and the bursts are short. Benchmark tests are more aggressive than real-world workloads and tend to hit one of the cache thresholds.
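The burst vs. sustained effect can be illustrated with a toy write-cache model. This is a generic sketch, not a model of the Nutanix write path: the cache size, offered load and drain (destage) rate below are made-up parameters. Each simulated second the cache first destages up to `drain_mb_s` to the slower tier, then accepts as much of the offered load as fits in the free space.

```python
def simulate_write_throughput(offered_mb_s: float, cache_mb: float,
                              drain_mb_s: float, seconds: int) -> list[float]:
    """Per-second write throughput (MB/s) accepted by a toy write cache."""
    used = 0.0
    accepted_per_s = []
    for _ in range(seconds):
        # Destage up to drain_mb_s from the cache to the slower tier.
        used = max(0.0, used - drain_mb_s)
        free = cache_mb - used
        # Accept the offered load until the cache is full (the "burst" phase).
        accepted = min(offered_mb_s, free)
        used += accepted
        accepted_per_s.append(accepted)
    return accepted_per_s


# Benchmark-style load: 1000 MB/s offered into a 2000 MB cache draining at 200 MB/s.
print(simulate_write_throughput(1000.0, 2000.0, 200.0, 6))
# [1000.0, 1000.0, 400.0, 200.0, 200.0, 200.0]
```

Once the cache fills, accepted throughput falls to the drain rate and stays there, which is the sustained figure a long benchmark ends up measuring. If `offered_mb_s` is at or below `drain_mb_s`, the cache never fills and throughput stays flat, which corresponds to the short, bursty real-world case.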



Re: Hard disk failure

@sverhoevne, have you tested with a longer timespan, e.g. 30 or 60 minutes?

Could you also share the CPU utilization during the test?
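When comparing longer runs, it can help to locate the dips and compare the early (burst) average with the late (sustained) average numerically instead of by eye. The sketch below assumes the results have already been exported as one throughput sample per second; parsing HammerDB or IOmeter output is not shown, and `find_dips` / `burst_vs_sustained` are hypothetical helper names.

```python
def find_dips(samples: list[float], baseline: float,
              threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return inclusive (start, end) sample-index ranges below threshold * baseline."""
    dips = []
    start = None
    for i, value in enumerate(samples):
        low = value < threshold * baseline
        if low and start is None:
            start = i                    # a dip begins at this sample
        elif not low and start is not None:
            dips.append((start, i - 1))  # the dip ended on the previous sample
            start = None
    if start is not None:                # dip still open at the end of the run
        dips.append((start, len(samples) - 1))
    return dips


def burst_vs_sustained(samples: list[float], window: int) -> tuple[float, float]:
    """Average of the first and last `window` samples (burst vs. sustained)."""
    head, tail = samples[:window], samples[-window:]
    return sum(head) / len(head), sum(tail) / len(tail)


print(find_dips([100.0, 100.0, 40.0, 40.0, 100.0, 30.0, 100.0], baseline=100.0))
# [(2, 3), (5, 5)]
print(burst_vs_sustained([1000.0, 1000.0, 400.0, 200.0, 200.0, 200.0], window=2))
# (1000.0, 200.0)
```

With one sample per second, the dip indices map directly to seconds into the run, which makes it easy to line them up against CPU utilization captured over the same period.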