This post was authored by Gary Little with Priyadarshi Prasad
Disclaimer: The views expressed in this blog are that of the authors and not those of Nutanix, Inc. or any of its other employees or affiliates.
On Friday, November 1st, the Transaction Processing Performance Council announced the “TPCx-HCI” benchmark. That’s quite a mouthful! In 2017, does anyone care about benchmarks anymore? Full disclosure: I helped develop TPCx-HCI.
It turns out assessing infrastructure fitness is hard
Assessing infrastructure performance is hard, but truly assessing fitness is much more difficult. Fitness includes performance, resiliency, and scale. The ability to accurately assess fitness is especially critical when moving from one architectural style to another, e.g. “bare-metal” to virtualized, or mainframe to open systems. Today we want to measure whether it’s possible to move from a custom host:storage paradigm to a flat, cloud-like infrastructure.
By and large, most proofs of concept (POCs) carried out by end users are either very simplistic CPU/IO tests or huge matrix-style checklists. Most POCs are never completed in practice, and fewer still are completed without the “help” of the competing vendors. End users would like to rely on an independent third party whose entire expertise is evaluating infrastructure. What has tended to happen, though, is that the independent third parties are often commissioned by the vendors themselves.
The closest thing we have today to an independent review is audited benchmarking, where the rules are governed by a set of interested vendors. It works better than you might think, particularly when the participants come from product engineering rather than marketing. However, most of what passes for benchmarking is nothing more than vendors striving to create the highest performance metrics possible with the most ridiculous configurations that still adhere to the benchmark rules (if there are any).
For numerous examples, see storage vendors arguing over whether 4k, 8k, or 32k is the “most representative” block size. Also, no storage benchmark enforces any sort of logical data protection (e.g. snapshots), yet I don’t know of a single customer who does not use some form of data protection or snapshots.
It turns out that “raw performance” is the least interesting metric
This is particularly true for benchmarks which are largely, or solely, dependent on storage performance. Modern SSDs give us more IOPS than can easily be consumed, particularly in a hyperconverged infrastructure, where “storage controller” capacity scales linearly with the number of hosts and hence storage IO capacity scales with the incoming work.
A fairly standard Nutanix all-flash node can provide several hundred thousand IOPS. That’s hundreds of thousands of IOPS provided to a single pair of Xeon E5 processors. Unless my business is delivering “IO Benchmarking As A Service” (IOaaS), it is very unlikely that a pair of E5s running a typical application today will consume much more than 40% of the available IOPS at peak demand.
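As a rough sketch of that headroom argument, here is a hypothetical back-of-envelope calculation. The node capability and application demand figures below are illustrative assumptions, not measured Nutanix numbers:

```python
# Hypothetical back-of-envelope estimate of IOPS headroom on one HCI node.
# Both figures are illustrative assumptions, not measured values.

node_iops_capacity = 300_000   # assumed all-flash node capability (IOPS)
app_peak_iops = 120_000        # assumed peak demand of a typical application

utilization = app_peak_iops / node_iops_capacity
print(f"Peak utilization: {utilization:.0%}")  # -> Peak utilization: 40%
```

With assumptions like these, even peak demand leaves the majority of raw IOPS idle, which is why raw performance alone is the least interesting metric.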
Most benchmarks represent the first date experience not the marriage experience
We currently have no benchmarks which measure performance together with the effects of events that happen in real data centers, such as failures and capacity expansion.
We created the X-Ray tool to focus on platform stability in conjunction with various workflows. Its sole purpose is to make HCI evaluation easier in real-world environments. Customers wanted to get beyond the marketing with a tool that can quickly help them understand the strengths and weaknesses of competing HCI solutions.
There are two really important aspects of X-Ray:
► It is entirely agnostic to all HCI solutions. The tests included in the test suite are the ones that you would want to run before you put any solution into your production environment, and they can run on all HCI solutions.
► It is NOT a drag-racing tool. Being ready for production means more than just posting huge numbers on synthetic benchmarks. It is about failure tolerance, the ability to keep performing under failure, and performance during upgrades and snapshots. These are some of the tests you can run, visualize, and compare using X-Ray.
There are also two important points to consider around X-Ray:
► X-Ray is designed and implemented by Nutanix. There is naturally some skepticism, specifically that the tool favors Nutanix, even though it can run against many HCI environments and has nothing in its test runs or designs that favors Nutanix. See for yourself by downloading it here: https://www.nutanix.com/xray/
► Although X-Ray simulates the long-term experience, there is no auditing capability.
We need a benchmark which represents the long-term AND is auditable
TPCx-HCI is a benchmark which marries the goodness that existed in TPCx-V (the single-node virtualization benchmark) with the ideas honed while building real-world benchmarking tools such as X-Ray, in order to measure failures and capacity expansion within a cloud.
► To qualify for TPCx-HCI, the underlying platform must be both hyperconverged (meaning that storage, compute, and networking are provided in the same unit) and provide a cloud-like storage fabric that is uniformly accessible from all nodes.
► During benchmark execution, TPCx-HCI measures not only performance but also storage-level resilience, by ungracefully powering down one of the HCI nodes and observing the impact elsewhere in the cluster.
► From a performance standpoint, the benchmark measures cluster-wide, multi-database performance that is elastic, rather than that of a single node or single database. This is important because enterprise clouds typically consist of many databases operating simultaneously, in an uncoordinated fashion, which can lead to noisy-neighbor effects.
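The node power-off idea above can be sketched as a small simulation. This is not how TPCx-HCI or X-Ray is implemented; `measure_iops` and the failure step stand in for real cluster APIs, and all numbers are simulated assumptions:

```python
import random

def measure_iops(degraded):
    """Simulate one throughput sample; a degraded cluster serves fewer IOPS."""
    base = 100_000  # assumed healthy cluster-wide IOPS
    return base * (0.7 if degraded else 1.0) * random.uniform(0.95, 1.05)

def run_resilience_phase(samples=10, fail_at=5):
    """Sample throughput, ungracefully 'failing' a node partway through."""
    degraded = False
    history = []
    for i in range(samples):
        if i == fail_at:
            degraded = True  # stands in for powering down one HCI node
        history.append(measure_iops(degraded))
    return history

history = run_resilience_phase()
before = sum(history[:5]) / 5
after = sum(history[5:]) / 5
print(f"Observed throughput drop after failure: {1 - after / before:.0%}")
```

The point of a test like this is not the absolute numbers but the shape of the curve: how far throughput dips after the failure and whether it recovers while the cluster rebuilds.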
So do we care about benchmarking?
I would say that many end users and customers care about a benchmark with the following characteristics:
► A benchmark using real-world workloads (databases are a real-world workload), not a micro-benchmark.
► A benchmark which measures the marriage experience, including failures and expansion.
► A benchmark which is auditable.
We have tried to provide all of these in TPCx-HCI. The art of benchmark design is allowing vendors to compete in a fair environment while also leaving room for innovation, and keeping the benchmark simple enough that an auditor from outside the vendor which created it can successfully audit the results.
Time will tell if we have done a good job of that. The venerable TPC-C benchmark is 25 years old and still used daily by many vendors to measure “real-world” performance, despite its limitations. I hope we have done nearly as well.