First of all (for the Nutanix guys):
It's a great idea to ship something like diagnostics.py as part of the product. Thanks for that!
As advised, I gave it a run after a Nutanix/VMware deployment.
The numbers looked good compared to some I found on the net (basically identical).
Then I re-imaged the very same box (4 nodes of NX3450) with Nutanix/KVM.
Re-ran diagnostics.py, expecting similar numbers.
Result: some crazy performance improvements (judging by the numbers alone), but look for yourself:
|test/pattern||ESXi 5.5||KVM 0.12.1||unit||% more|
|Sequential write bandwidth||786||1503||MBps||91.2|
|Sequential read bandwidth||2902||3391||MBps||16.9|
|Random read IOPS||53123||68390||IOPS||28.7|
|Random write IOPS|| || || || |
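The "% more" column is just the relative delta between the two hypervisors. A quick sanity check of the arithmetic (the numbers are copied from the table above; the helper name is my own):

```python
def pct_more(esxi, kvm):
    """Relative improvement of the KVM number over the ESXi number, in percent."""
    return round((kvm - esxi) / esxi * 100, 1)

# Numbers from the table above
seq_write = pct_more(786, 1503)      # sequential write bandwidth, MBps
seq_read = pct_more(2902, 3391)      # sequential read bandwidth, MBps
rand_read = pct_more(53123, 68390)   # random read IOPS
```

So at least the percentages in the table are internally consistent; the question is whether the underlying measurements are.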
Sounds great, yeah... but there has to be an error somewhere; the ratios are telling me something is off.
The difference can't be that big "just" because of KVM vs. ESXi and iSCSI vs. NFS, right?
The question is where:
Is diagnostics.py not emptying some caches correctly?
A script logic error counting something twice?
The wrong method altogether?
Maybe someone (Nutanix?) has an idea?
Sequential write BW seems lower than expected; the expected figure is 1525.
I would check esxtop (the n option) to make sure that the traffic is going through 10G.
Hi, I work in the Nutanix Solutions and Performance Engineering Team. Your numbers for ESXi 5.5 seem a bit on the low side for 3.5.2. In the testing I've been doing on a 3450 with the IvyBridge processors and 3.5.2, I have been able to achieve around 1500MB/s seq write, 3200MB/s seq read, 100K 4K random read IOPS and 50K 4K random write IOPS. I'm using Jumbo Frames on my 10G links, and my CVMs are configured with 24GB RAM. Results will vary based on a number of factors, including whether you ran the diagnostics.py cleanup prior to the test, whether you're using Jumbo Frames, and what else is going on in the cluster.
The command I use to run diagnostics.py is as follows:
diagnostics/diagnostics.py --display_latency_stats --run_iperf run
The above command displays the latency stats and iperf network performance results in addition to the other default stats. This is useful to detect any networking problems. The above command assumes you have just logged into the node with ssh and are in the /home/nutanix directory (the default). After each run I run the diagnostics.py cleanup command. This gets everything back into a consistent state.
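As a sketch, that run/cleanup cycle could be wrapped in a small script so cleanup always happens, even when a run fails mid-way. Only the two diagnostics.py command lines come from the posts above; the wrapper function and its name are my own, and it assumes you are in /home/nutanix on the node:

```python
import subprocess

# Command lines from the posts above: a run with latency and iperf stats,
# then a cleanup to bring everything back into a consistent state.
RUN_CMD = ["diagnostics/diagnostics.py",
           "--display_latency_stats", "--run_iperf", "run"]
CLEANUP_CMD = ["diagnostics/diagnostics.py", "cleanup"]

def diagnostics_cycle(runner=subprocess.run):
    """Run diagnostics, then always clean up, even if the run fails."""
    try:
        runner(RUN_CMD, check=True)
    finally:
        runner(CLEANUP_CMD, check=True)
```

The `runner` parameter is only there so the sequencing can be exercised without a real cluster; in practice you would just call `diagnostics_cycle()`.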
Thanks for the numbers and the tool. I have a couple of questions:
+ How many VMs were used to drive these numbers? Please be specific with respect to the number of vdisks and virtual adapters, and whether PVSCSI was used or not.
+ What kind of workload was running inside the VMs?
+ Was the buffer cache enabled inside the VMs? If so, are the numbers in conjunction with the buffer cache, or only as seen by the controller VMs?
+ Were the numbers per cluster or per controller VM? The sequential numbers are crazy, which makes me think they are per cluster, but the random numbers seem to be per controller VM. Can you please clarify?
diagnostics.py is a specific script that creates a new container, deploys one VM per CVM, boots these VMs from an ISO image (think live CD), and mounts the new container directly into each VM, bypassing the hypervisor altogether.
The sole purpose of this script is to check whether or not the Nutanix cluster (as opposed to the hypervisor cluster) is working fine.
Nutanix runs this test on their side, and they can then check your results against expected results to tell if something is wrong performance-wise.
If the Nutanix cluster is fine but you see poor performance, then the problem is likely at the hypervisor layer or at the VM/workload layer.
If the diagnostics numbers are off, then you may have a problem in the Nutanix CVMs or at the hardware layer.
It's a great tool for performance problems and for initial setup testing, but not much else.
For YOUR performance testing, one thing to keep in mind is that Nutanix is a "VM-centric" solution, and NDFS (the Nutanix Distributed FileSystem) is specifically designed to handle multiple-VM workloads.
To benchmark a "real world" workload, you should use ... "real world" scenarios.
Something like VMmark (http://www.vmware.com/products/vmmark) should be a good example.
Blindly running fio/iozone/IOmeter will not give you the full picture, because none of these tools is really capable of reproducing the IO pattern of a real-life application.
Note: for standard arrays (not distributed like Nutanix), these tests would be fine, because all the controller would ever see in real life is random IO, due to the aggregation of the IO patterns of multiple VMs.
This is not the case with Nutanix, because the CVM knows about the VMs: which vdisk belongs to which VM, and so on.
I downloaded the latest version of Nutanix OS. What are the numbers I should expect from it, given that
http://longwhiteclouds.com/2014/07/06/a-brief-look-under-the-covers-of-nutanix-os-4-0-1-release/ claims there is a 50% increase? Does that mean the new numbers for seq r/w and rand r/w will increase to 2.25GB/s for seq write, 5GB/s for seq read, 165k for random read, and 78k for random write?
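For what it's worth, if you assume the claimed ~50% uplift applies uniformly, you can project from the NOS 3.5.2 / ESXi 5.5 figures quoted earlier in this thread (1500 MB/s seq write, 3200 MB/s seq read, 100K random read IOPS, 50K random write IOPS). This is just arithmetic on the quoted baselines, not a measured result:

```python
# Baseline figures quoted earlier in the thread (NX-3450, NOS 3.5.2, ESXi 5.5)
baseline = {
    "seq_write_MBps": 1500,
    "seq_read_MBps": 3200,
    "rand_read_IOPS": 100_000,
    "rand_write_IOPS": 50_000,
}

UPLIFT = 1.5  # the ~50% improvement claimed for NOS 4.0.1

# Naive projection: scale every metric by the same factor.
projected = {metric: value * UPLIFT for metric, value in baseline.items()}
```

That yields roughly 2250 MB/s seq write, 4800 MB/s seq read, 150K random read and 75K random write IOPS, in the same ballpark as the figures you mention; real results will of course depend on the workload and hardware.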
Sorry I didn't see this question before now. The performance results will vary based on the system and the type of workload. Hopefully you have upgraded to a much more recent release by now. Newer releases have significantly improved performance as well.