With the Nutanix Enterprise Cloud Platform, assessing performance for your enterprise applications is much easier than you might think. In this blog series we approach the topic one facet at a time, beginning with IOPS. Our tests demonstrate conclusively that the NX-3060-G4 delivers far more IOPS than even the most demanding enterprise applications require. Rather than simply offer a series of charts and numbers, however, we explain:
- The many advantages that hyperconverged infrastructures have over traditional architectures.
- The key metrics for evaluating performance for virtualized enterprise applications.
- How the Nutanix Enterprise Cloud Platform delivers increasing performance as the application demands it.
This blog series aims to close the gap between raw performance numbers and an understanding of what they mean. We explain how to assess performance on the Nutanix Enterprise Cloud Platform, with special attention to the most crucial metrics for mission-critical enterprise applications. Performance is a top priority at Nutanix; a large number of the new workloads our customers are deploying include performance-sensitive enterprise applications such as SQL Server, Oracle databases, and SAP Business Suite.
Performance is not just raw speed measured in IOPS, however. It’s speed, stability, and scalability. In this first installment we tackle speed, the most widely used metric, and discuss what’s relevant and what isn’t when thinking about IOPS for enterprise applications on HCI. We chose a transactional database (SQL Server), typically the most I/O-demanding enterprise application, to test our popular all-purpose model, the NX-3060-G4.
Our testing demonstrates conclusively that the NX-3060-G4 can handle far more IOPS than most enterprise workloads need. We’ll not only show you the numbers that matter for enterprise application performance, we’ll tell you what they mean.
The per-node configuration of the NX-3060-G4 includes:
- Nutanix AOS 4.7.1
- 2x E5-2680v3 2.5GHz Intel Xeon processors
- 512 GB RAM
- Hybrid—2 SSDs and 4 HDDs
Comparing HCI and Traditional Storage Architecture
Hyperconverged infrastructure locates a software-based storage controller close to the hypervisor, either as a virtual machine or as a kernel module. This design is a radical shift from what most enterprise architects are familiar with, and many of them still believe that you need dedicated storage hardware to get high-performance I/O.
Traditional storage architecture centralizes I/O from all the hypervisor hosts in one location, usually with a pair of storage controllers connected to a set of disk shelves, which contain either SSDs or HDDs. The storage controllers are configured so that if one storage controller fails, the hypervisor hosts can still access data. To maintain performance, each storage controller should run at no more than 50 percent utilization so that if one storage controller fails, the remaining controller can serve 100 percent of the I/O.
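The 50 percent guideline is a special case of a simple headroom rule: if N identical controllers share the load evenly, surviving the loss of one means each can run at no more than (N - 1) / N of its capacity. Here is a minimal sketch of that arithmetic. The function name and the 100,000 IOPS controller rating are hypothetical, chosen only for illustration.

```python
def max_safe_utilization(controllers: int, tolerated_failures: int = 1) -> float:
    """Highest per-controller utilization that still lets the survivors carry the full load.

    Assumes identical controllers that share the load evenly.
    """
    if controllers < 1 or not 0 <= tolerated_failures < controllers:
        raise ValueError("need at least one surviving controller")
    return (controllers - tolerated_failures) / controllers


# Hypothetical rating for a single controller; not a figure from this post.
CONTROLLER_IOPS = 100_000

pair_limit = max_safe_utilization(2)             # classic dual-controller array
usable_iops = 2 * CONTROLLER_IOPS * pair_limit   # the number you can safely plan against

print(f"Run each controller at or below {pair_limit:.0%}")
print(f"Usable IOPS from the pair: {usable_iops:,.0f}")
```

In other words, half of the I/O capacity you buy in a dual-controller array is held in reserve for failover.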
Due to traditional storage architecture design, the storage controller’s I/O capacity has been a preeminent concern. This stands to reason, as the controller’s I/O capacity determines on average how many hypervisor hosts the storage can support, which in turn determines not only the cost (how many storage controllers one needs to buy) but also the complexity of managing the storage. Moving hypervisor hosts between storage devices is time-consuming and risky.
In fact, the complexity of managing multiple storage islands increases geometrically. In other words, managing three storage controller pairs is far more complex than managing one. It comes as no surprise, then, that IOPS continue to be a big concern for organizations moving to hyperconverged architecture.
The Hyperconvergence Difference
In an HCI environment, there are subtle but crucial differences that reduce the dependence on maximum IOPS as the primary performance metric. In HCI, every node in the cluster serves I/O, which means that as the number of hypervisor hosts grows, so does the I/O capability.
This design eliminates reliance on a single pair of storage controllers. The figures below illustrate this difference, where you can see that, with traditional storage architecture, storage controller capacity determines your consolidation ratio (and also leads to overprovisioning).
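To make the contrast concrete, the sketch below compares the I/O available per host as hosts are added. Every figure in it is hypothetical, and it assumes ideal linear scaling on the HCI side, so treat it as an illustration of the shape of the two curves rather than a model of any real system.

```python
# All figures are hypothetical and only illustrate the shape of the two curves.
USABLE_CONTROLLER_PAIR_IOPS = 100_000   # fixed ceiling of one traditional array
PER_NODE_IOPS = 50_000                  # I/O capability each HCI node contributes


def traditional_iops_per_host(hosts: int) -> float:
    """A fixed controller pair is divided among however many hosts attach to it."""
    return USABLE_CONTROLLER_PAIR_IOPS / hosts


def hci_iops_per_host(hosts: int) -> float:
    """Every HCI node serves I/O, so cluster capacity rises with the host count."""
    cluster_iops = PER_NODE_IOPS * hosts
    return cluster_iops / hosts


for hosts in (2, 4, 8, 16):
    print(f"{hosts:>2} hosts: "
          f"traditional {traditional_iops_per_host(hosts):>7,.0f} IOPS/host, "
          f"HCI {hci_iops_per_host(hosts):>7,.0f} IOPS/host")
```

With a shared controller pair, each added host shrinks every other host's share. With per-node controllers, the share stays constant.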
Another issue that fuels our fixation with IOPS is the need to satisfy different levels of demand from different types of workloads. Some hosts run workloads with relatively little I/O and others require quite a bit. VDI requires little I/O, but a lot of CPU and memory, while databases require a lot of CPU, memory, and storage IOPS. It turns out, though, that even database workloads have fairly modest I/O requirements relative to the high I/O capacity of modern SSDs, which have driven the HCI revolution.
For example, a single Intel S3700 supplies up to 45,000 IOPS at 8 KB. Most hybrid HCI platforms contain two or more such SSDs. All-flash platforms provide up to 24 SSDs. It used to take racks of spinning hard disk drives to generate enough IOPS for enterprise apps, but the industry has moved well past these constraints. We no longer need or want massive deployments of monolithic and modular network storage.
Measuring I/O for Database Applications
How much I/O do today’s workloads need? To answer this, we measured the I/O characteristics of a commonly virtualized workload by modeling a Microsoft SQL Server 2014 workload with HammerDB OLTP preset as the driver. HammerDB is a free, well-documented, open-source tool that you can use to reproduce our results if you’d like.
We chose the TPC-C schema to build a roughly 550 GB database of 5,000 warehouses and used 750 concurrent users to drive the database transactions with no think-time, resulting in 1,200,000 SQL Server transactions per minute. We went with 750 users because if you don’t have enough users, the database tends to sit in cache; this number was large enough to drive a significant amount of I/O to storage. The transactions per minute are simply a measure of work being done.
The results of our test may surprise you. We found that running a transactional workload with a Microsoft SQL Server database, on eight cores, generated only around 20,000 IOPS before reaching 100 percent CPU on the database VM.
If we assume 24 cores on the host, the total I/O consumption on the entire host would max out around 60,000 IOPS before the host ran out of CPU for the database VMs. As these numbers indicate, a platform with two Intel S3700 SSDs, generating 90,000 IOPS, could support I/O-heavy database VMs with room to spare.
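The arithmetic behind these figures is short enough to write out. The sketch below uses only numbers quoted in this post; the variable and function names are ours, and it treats the SSD datasheet figure as the available I/O, as the paragraph above does.

```python
# Figures quoted in this post.
IOPS_PER_DB_VM = 20_000    # observed when the 8-core database VM reached 100% CPU
CORES_PER_DB_VM = 8
HOST_CORES = 24
SSD_IOPS_AT_8K = 45_000    # one Intel S3700, "up to" figure
SSDS_PER_NODE = 2


def host_iops_demand(host_cores: int, cores_per_vm: int, iops_per_vm: int) -> int:
    """I/O the host can generate if every core runs a CPU-bound database VM."""
    vms_per_host = host_cores // cores_per_vm
    return vms_per_host * iops_per_vm


demand = host_iops_demand(HOST_CORES, CORES_PER_DB_VM, IOPS_PER_DB_VM)
supply = SSDS_PER_NODE * SSD_IOPS_AT_8K
headroom = supply - demand

print(f"Peak demand per host: {demand:,} IOPS")
print(f"SSD supply per host:  {supply:,} IOPS")
print(f"Headroom:             {headroom:,} IOPS")
```

The host runs out of CPU at roughly 60,000 IOPS, which leaves about 30,000 IOPS of SSD capability unused.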
The table above plainly shows that the point where the database VM reaches 100% CPU is what determines the IOPS requirement per host. When the database CPU is at 100%, the database cannot go any faster, even with an infinite amount of I/O capacity from the storage tier. In short, for enterprise applications running on HCI, CPU capacity imposes the constraint, not IOPS capacity.
With the Enterprise Cloud Platform, storage architects working on a uniform cluster no longer need to plan for the aggregate I/O requirement for all the hosts. They only need to measure for the most demanding application host, knowing with certainty that the remaining hosts have more than enough capacity for the less demanding applications.
Provisioning enough nodes to meet the compute requirements is the more pressing question. The Enterprise Cloud Platform radically simplifies performance capacity planning, another major benefit of an HCI system that provides data locality.
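The planning approach described above can be sketched in a few lines: check the worst-case host for I/O, then size the cluster by compute. This is a simplified illustration, not a sizing tool. The `Workload` class, the `plan_cluster` function, and the application-server figures are hypothetical. The sketch also ignores memory, controller VM overhead, and spare capacity for node failures, and it uses the post's 90,000 IOPS SSD figure as a stand-in for per-node I/O capacity.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    cores: int
    peak_iops: int


def plan_cluster(workloads, cores_per_node, node_iops_capacity):
    """Return (nodes_needed, worst_case_host_iops, io_fits) for a uniform cluster."""
    # Worst case for a single host: it is filled with the most I/O-dense workload.
    densest_iops_per_core = max(w.peak_iops / w.cores for w in workloads)
    worst_host_iops = densest_iops_per_core * cores_per_node
    io_fits = worst_host_iops <= node_iops_capacity
    # Once the worst-case host fits, node count is purely a compute question.
    nodes = math.ceil(sum(w.cores for w in workloads) / cores_per_node)
    return nodes, worst_host_iops, io_fits


# Six database VMs shaped like the one tested in this post, plus hypothetical app servers.
workloads = [Workload("oltp-db", 8, 20_000) for _ in range(6)]
workloads += [Workload("app-server", 4, 500) for _ in range(40)]

nodes, worst_host_iops, io_fits = plan_cluster(
    workloads, cores_per_node=24, node_iops_capacity=90_000
)
print(f"Worst-case host demand: {worst_host_iops:,.0f} IOPS (fits: {io_fits})")
print(f"Nodes needed for compute: {nodes}")
```

Only the most demanding workload enters the I/O check. Every other workload affects only the node count.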
Understanding Microbenchmarks with HCI
The only truly reliable way to predict application performance is to run the application on your system. When that isn’t practical, however, the industry has come to rely on various microbenchmarks as proxies that allow us to compare systems. The following are some common benchmark patterns for assessing storage architectures. In each case, we illustrate that the available I/O performance on an NX-3060-G4 is greater than the required I/O performance for our example workload.
Random Read Performance
The random read performance metric is important in database environments where the total active dataset exceeds the database cache. Because read operations cannot be deferred, random read performance often determines the maximum performance of large database workloads.
Of course, the data that databases access is not "random," but is instead a function of the incoming requests. For cases like transactional databases (which may service medical records or credit card transactions) the workload is unpredictable and, as such, appears "random" to the storage system.
The chart below shows the performance of an NX-3060-G4 node running a random-read benchmark. The amount of read-concurrency that we observe from the database is between 40 and 80.
For this test we used 8 KB I/O because it is the minimum that most databases use. In reality, I/O sizes vary, but using a uniform 8 KB I/O size is commonly accepted practice when measuring “small block” I/O. We also included queue depth in our tests, which is a key variable often missing from discussions of storage performance.
Queue depth is a proxy for the amount of I/O the application demands, and it stands in for such things as multiple threads, database users, and concurrent queries. We measure the total queue depth across the entire host, which could be many virtual machines with many virtual disks attached to each one.
The chart demonstrates that, no matter the queue depth, the NX-3060-G4 provides more I/O than either an 8-vCPU or a 12-vCPU database demands for random read operations. Notice especially that the performance increases as the application demands it.
There is a great deal of latent capacity in the system; as you add more work, you get more IOPS. Finally, the database runs out of CPU resources long before it can consume all of the I/O capacity, particularly with greater queue depth.
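Queue depth, latency, and IOPS are tied together by Little's Law: in steady state, the average number of outstanding I/Os equals IOPS multiplied by average latency. The sketch below applies that relationship to the read concurrency range of 40 to 80 mentioned earlier. The 1 ms latency is a hypothetical value chosen for round numbers, not a measurement from these tests, and the function names are ours.

```python
def iops_at(queue_depth: float, latency_ms: float) -> float:
    """Little's Law rearranged: IOPS = outstanding I/Os / average latency."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return queue_depth * 1000.0 / latency_ms


def queue_depth_for(target_iops: float, latency_ms: float) -> float:
    """Outstanding I/Os an application must sustain to reach target_iops."""
    return target_iops * latency_ms / 1000.0


ASSUMED_LATENCY_MS = 1.0  # hypothetical average latency, for illustration only

for qd in (40, 80):
    print(f"Queue depth {qd}: {iops_at(qd, ASSUMED_LATENCY_MS):,.0f} IOPS "
          f"at {ASSUMED_LATENCY_MS} ms")
```

This is why an IOPS figure quoted without a queue depth says little. The same storage reports very different numbers depending on how much concurrent work the application offers.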
Burst Write Performance
Burst write performance is most applicable to database updates and inserts. Databases typically write to a sequential log and then periodically synchronize the data in the main filesystem, which generates write bursts. In the Prism screenshot below, we show the bursty I/O pattern from a SQL Server running the DB workload discussed above.
Figure 4: IOPS Required and Generated for Burst Random Write at 8 KB
The platform achieves high levels of burst write performance with the “Oplog,” which is specifically designed to handle incoming write bursts. SQL Server “file flush” operations are short bursts with high degrees of concurrency (queue depth). With the HammerDB OLTP workload, we observe around 256 outstanding I/Os when measuring the datastore’s “Active” I/O using esxtop.
The Oplog resides on SSD and is replicated to other Nutanix nodes to ensure no data loss in the event of a failure. In fact, the Oplog is itself somewhat similar to a database transaction log, in that it is a write-optimized on-disk structure that asynchronously drains into a read-optimized datastore.
Sustained Small Random Write IOPS
Database workloads also require sustained write performance. This can be thought of as the background write rate sustained between bursts. Although writes to the DB log file are sequential, they also use the Oplog because they are small and require low latency.
The “Sustained Random Write” requirements account for both DB log writes and continuous background writes to the main DB files. With HammerDB at 750 users, the concurrency factor for this background write demand was around 30, and the I/O demand was around 8,000 to 9,000 IOPS.
Simulating an “Average DB” workload
A reasonable simulation of the HammerDB SQL workload is to use a 70:30 read/write mix with an 8 KB block size. Although this scenario does not simulate bursty behavior, it does approximate the I/O size and I/O mix. Concurrency between 64 and 128 simulates both sustained and burst I/O in a single workload. The database was around 600 GB on-disk and the working-set size was around 400 GB.
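As a rough illustration of what such a synthetic pattern looks like, the sketch below generates a stream of 8 KB-aligned operations with a 70:30 read/write mix over a 400 GB working set. It only produces the sequence of operations and does not issue any I/O. The uniform random offset distribution, the reading of GB as GiB, and all names are our assumptions. Concurrency is not modeled, because in practice it would come from the benchmark tool's queue depth setting.

```python
import random

BLOCK_SIZE = 8 * 1024                  # 8 KB I/O size
WORKING_SET_BYTES = 400 * 1024**3      # ~400 GB working set, read here as GiB
READ_FRACTION = 0.70                   # 70:30 read/write mix


def generate_ops(count: int, seed: int = 0):
    """Yield (operation, byte_offset) pairs for a uniform random 70:30 workload."""
    rng = random.Random(seed)          # seeded so the pattern is reproducible
    blocks = WORKING_SET_BYTES // BLOCK_SIZE
    for _ in range(count):
        op = "read" if rng.random() < READ_FRACTION else "write"
        offset = rng.randrange(blocks) * BLOCK_SIZE   # always block-aligned
        yield op, offset


sample = list(generate_ops(10_000))
read_share = sum(1 for op, _ in sample if op == "read") / len(sample)
print(f"Generated {len(sample):,} ops, read share {read_share:.1%}")
```

A real database does not access its working set uniformly, so a pattern like this approximates the I/O size and mix but not the locality.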
What These Numbers Mean to You
For the first blog in this series, we focused on the most commonly cited performance metric—IOPS capacity. Here are a few quick takeaways:
- The Nutanix NX-3060-G4 platform easily exceeds the I/O requirements of even IOPS-hungry enterprise database workloads, with room to spare for workload consolidation.
- Nutanix radically simplifies performance capacity planning. Once you’ve accounted for your most I/O-demanding application, it’s then only a matter of determining how many nodes you need.
- CPU and I/O capacity grow linearly on demand to service your enterprise applications—by design, your applications get more as they need more.
As happy as we are with our test results, making infrastructure invisible for our thousands of customers, including more than 300 Global 2000 companies, is what really counts. Enterprises like Excelitas Technologies, Lion Group, Hallmark Business Connections, UCS Solutions, Empire Life, and Jabil all run enterprise applications such as SQL Server and Oracle on Nutanix; their real-world accounts of achieving great performance and availability via the Nutanix platform are why we do what we do.
Be sure to check out our best practices guides and reference architectures to see our validated designs for running your enterprise apps on Nutanix with confidence.
Stay tuned for our next installment, where we will explain the role of stability when assessing performance for HCI.