Hello mates,
We have 2x NX3350 and an Arista switch, which handle roughly 100 VMs.
Workloads are:
20x VMs with a heavy disk workload, used mainly for reporting and analysis. The R/W ratio is roughly 50/50. They generated about 20K IOPS when they lived on SAN storage with RAID10 and a small flash tier, and average latency was always below 5 ms.
10x VMs used for application virtualization.
70x VMs used for desktop virtualization.
Now that we have migrated to Nutanix, we get only 4K cluster IOPS at 20 ms latency, which does not look right to us.
Trying to resolve the issue, we enabled inline compression and increased CVM memory to 20 GB. We also tried changing the tier sequential write priority. Unfortunately, none of this helped.
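For what it's worth, this is how we double-checked the container settings afterwards from a CVM (the exact fields in the output may differ between NOS versions, so treat the details as approximate):
ncli container list
The per-container output should show whether inline compression is enabled.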
ncc, cluster status, and Prism health all claim that everything is OK. Before we migrated our environment, we ran the diagnostics VM and the result was roughly 100K IOPS for reads.
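For reference, that test was run from a CVM with the bundled diagnostics script; as far as I remember, the invocation on NOS 4.x is:
~/diagnostics/diagnostics.py run
~/diagnostics/diagnostics.py cleanup
("run" deploys the test VMs and executes the fio tests, "cleanup" removes them again.)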
Here is the current configuration of the cluster:
NOS Version: 4.0.1.1
1 storage pool
7 containers with inline compression enabled
Also, please see the 2009/latency page output from one node as an example (a note on how we pulled these numbers follows the tables):
reads:
nfs_adapter:
Stage                    Avg latency (us)   Op count   Latency % (total / component)   Op count % (total / component)
nfs_adapter component    42348              273618     100 / 100                       100 / 100
RangeLocksAcquired       0                  273618     0 / 0                           100 / 100
InodeLockAcquired        5                  273830     0 / 0                           100 / 100
SentToAdmctl             4                  273772     0 / 0                           100 / 100
stargate_admctl          42303              273772     99 / 99                         100 / 100
AdmctlDone               2                  273772     0 / 0                           100 / 100
Finish                   8                  273828     0 / 0                           100 / 100
writes:
nfs_adapter:
Stage                    Avg latency (us)   Op count   Latency % (total / component)   Op count % (total / component)
nfs_adapter component    21420              150682     100 / 100                       100 / 100
RangeLocksAcquired       0                  150682     0 / 0                           100 / 100
InodeLockAcquired        16                 152838     0 / 0                           101 / 101
SentToAdmctl             16                 148526     0 / 0                           98 / 98
stargate_admctl          21506              148526     98 / 98                         98 / 98
AdmctlDone               2                  148526     0 / 0                           98 / 98
Finish                   182                152831     0 / 0                           101 / 101
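In case someone wants to compare with their own cluster: these stats come from the Stargate diagnostics page on the CVM. We looked at it roughly like this (assuming the links text browser is present on the CVM; otherwise point a normal browser at port 2009 of a CVM, if that port is reachable from your workstation):
links http://localhost:2009/latency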
Any ideas please?
Br,
Update
We ran the diagnostics VM today during the production workload. Here is the output:
Waiting for the hot cache to flush ........... done.
Running test 'Sequential write bandwidth' ...
Begin fio_seq_write: Wed Oct 1 11:52:11 2014
1475 MBps
End fio_seq_write: Wed Oct 1 11:53:07 2014
Duration fio_seq_write : 56 secs
*******************************************************************************
Waiting for the hot cache to flush ............. done.
Running test 'Sequential read bandwidth' ...
Begin fio_seq_read: Wed Oct 1 11:54:16 2014
5104 MBps
End fio_seq_read: Wed Oct 1 11:54:33 2014
Duration fio_seq_read : 17 secs
*******************************************************************************
Waiting for the hot cache to flush ......... done.
Running test 'Random read IOPS' ...
Begin fio_rand_read: Wed Oct 1 11:55:20 2014
123849 IOPS
End fio_rand_read: Wed Oct 1 11:57:02 2014
Duration fio_rand_read : 102 secs
*******************************************************************************
Waiting for the hot cache to flush ....... done.
Running test 'Random write IOPS' ...
Begin fio_rand_write: Wed Oct 1 11:57:38 2014
85467 IOPS
End fio_rand_write: Wed Oct 1 11:59:20 2014
Duration fio_rand_write : 102 secs
*******************************************************************************
Tests done.
Update
We have found that the problem is in the hypervisor datastore layout. If I mount the Nutanix container inside the test virtual machine, everything works as it should and we get nice results. As soon as the VM writes data through the hypervisor, the latency and IOPS are bad again. What could be the reason for this?
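To clarify what I mean by mounting the container inside the test VM, it was roughly the following from a Linux guest (the IP, container name and mount point are just examples, and the guest's subnet has to be on the cluster's filesystem whitelist first):
mount -t nfs 10.10.10.50:/ctr01 /mnt/nutanix-test     (example CVM/cluster IP and container name)
We then ran the same fio test against files under /mnt/nutanix-test instead of against the VM's own virtual disk, and the numbers were fine.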