Hard disk failure

Trailblazer

Hey guys,

 

We are running some tests in our lab. One of these tests is to see what happens to performance when we physically disconnect a hard disk or SSD.

 

Testing tools:

HammerDB (SQL benchmarking) and IOmeter.

 

These are the results:

Please note:

- Graph timespan: 10 minutes.

- At no point in the graphs was the HDD/SSD physically reconnected.

- There is only data on the SSD tier.

 

[Attached graphs: HDDfailure1.PNG, HDDfailure2.PNG, SSDfailure1.PNG, SSDfailure2.PNG]

 

We have a few questions concerning the results:

 

1) When we disconnect an HDD, why does this have an impact on the SSD tier as well? (There is only data on the SSDs.)

2) For some reason, once the system recovers from the disconnect after about three minutes, performance decreases again by 50% for about two minutes. What could be happening here?

 

 

Seba

 

 

 

 

2 REPLIES
Nutanix Employee

Re: Hard disk failure

Hi @sverhoevne

 

HDD Failure: 

I can't be 100% certain without more details about what the system was doing at the time, but I can take a good guess. Any drive failure is an extremely high-priority event, and the Curator process will be started to replicate the data. That work takes away CPU cycles, which impacts IOPS. However, with no data on the failed HDD to replicate, the performance delta should be very short-lived.

 

SSD Failure:

The performance impact should be on writes only. The reason it goes down comes down to burst vs. sustained write performance, which is governed by write cache thresholds.

 

This is not specific to Nutanix. Any storage system has limited write cache space, so it delivers higher burst write performance but much lower sustained write performance. In the real world, write workloads are bursty but of short duration. Benchmark tests are more aggressive than real-world workloads and tend to hit one of the cache thresholds.
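
To make the burst vs. sustained point concrete, here is a minimal toy model in Python. It is only a sketch: the function name, the cache size, and the ingest and drain rates are all made-up illustrative values, not Nutanix internals or measured numbers. Writes land in a fixed-size cache that drains to a slower tier at a constant rate; once the cache fills, accepted throughput falls to the drain rate.

```python
def simulate_write_cache(cache_capacity_mb, ingest_mbps, drain_mbps, seconds):
    """Toy model of a write cache in front of a slower backing tier.

    Returns the write throughput (MB/s) accepted in each one-second step.
    All parameters are hypothetical and chosen only for illustration.
    """
    used_mb = 0.0
    accepted = []
    for _ in range(seconds):
        # Room this second: free cache space plus what drains out meanwhile.
        room_mb = cache_capacity_mb - used_mb + drain_mbps
        written = min(ingest_mbps, room_mb)
        # Cache occupancy after accepting writes and draining to the slow tier.
        used_mb = max(0.0, used_mb + written - drain_mbps)
        accepted.append(written)
    return accepted


# A benchmark-style workload that pushes far more than the drain rate.
throughput = simulate_write_cache(
    cache_capacity_mb=1000, ingest_mbps=500, drain_mbps=100, seconds=10
)
print(throughput)
```

In this model the first seconds run at the full ingest rate (the burst), then throughput steps down to the drain rate once the cache threshold is hit (the sustained rate). A workload whose ingest rate stays at or below the drain rate never sees the drop.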

 

Guardian

Re: Hard disk failure

@sverhoevne have you tested with a longer timespan, e.g. 30 or 60 minutes?

Could you also share the CPU utilization during the test?
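
If it helps with collecting that, below is a rough Python sketch for sampling CPU utilization on a Linux machine by reading `/proc/stat`. It assumes the machine being sampled exposes `/proc/stat`; the function names are hypothetical, and counting iowait as idle time is a choice made here, not a rule.

```python
import time


def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    # Use the first 8 counters: user nice system idle iowait irq softirq steal.
    # Guest counters are skipped because they are already included in user/nice.
    values = [int(v) for v in line.split()[1:9]]
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    return idle, sum(values)


def cpu_utilization(before, after):
    """Percent of non-idle CPU time between two (idle, total) samples."""
    idle_delta = after[0] - before[0]
    total_delta = after[1] - before[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (1.0 - idle_delta / total_delta)


def sample_utilization(interval_s=1.0, samples=5, path="/proc/stat"):
    """Print one utilization figure per interval (Linux only)."""
    def read():
        with open(path) as f:
            return parse_cpu_line(f.readline())

    previous = read()
    for _ in range(samples):
        time.sleep(interval_s)
        current = read()
        print(f"{time.strftime('%H:%M:%S')} cpu={cpu_utilization(previous, current):.1f}%")
        previous = current
```

Calling `sample_utilization(interval_s=5, samples=120)` while the benchmark runs would cover a 10-minute window; the timestamps make it easy to line the output up with the graphs.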