This blog is authored by Paul Updike, Sr. Manager, Technical Marketing Engineering at Nutanix, and Chris Wilson, Manager, Performance Engineering at Nutanix.
At .NEXT 2017, we announced the availability of a new tool, X-Ray, that automates the analysis of test scenarios on hyperconverged infrastructures. X-Ray allows you to evaluate systems in a more realistic and systemic way. We think it’s time that testing be more reproducible and holistic, uncovering whether systems are consistent, reliable, and predictable before they go into production. That’s why a central focus of X-Ray is scenarios derived from real situations that a hyperconverged cluster will eventually encounter.
In this blog, we introduce some of the internals of X-Ray to give you a better understanding of how the scenarios are implemented behind the scenes. We want you to be confident in the results X-Ray provides.
Scenarios at the core
X-Ray scenarios are anchored to the core concepts of workloads and workflows. The combination of workloads and workflows produces a repeatable scenario that not only drives load to the system but also injects realistic events that would be encountered during the lifecycle of the system. These events include requesting snapshots of VMs, modeling a failure of a node, powering off a node, and adding new workloads and VMs.
To implement workloads, we settled on using the open source tool FIO off the shelf. It’s extremely flexible, with a large set of options. The attribute that matters most for X-Ray is that FIO can drive workloads at a fixed rate, rather than only at a fixed concurrency as other tools do. This lets us drive a more realistic load and focus on the impacts to that load, rather than just looking for top-end performance. X-Ray calls FIO directly and uses standard FIO configuration files, so nearly any FIO configuration option can be used.
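To make the fixed-rate idea concrete, here is a minimal Python sketch that renders an FIO job file capped with FIO's `rate_iops` option. It is an illustration only, not one of the job files shipped with X-Ray: the function name `render_fio_job`, the job name, the block size, the read/write mix, and the `/dev/sdb` device path are all assumptions chosen for the example.

```python
import configparser
import io


def render_fio_job(name, rate_iops, iodepth=8, runtime_s=300,
                   filename="/dev/sdb"):
    """Render a fixed-rate FIO job file as text.

    Hypothetical helper: every value here is a placeholder, not an
    X-Ray default. `filename` should point at a scratch test disk.
    """
    job = configparser.ConfigParser()
    # Options shared by every job in the file.
    job["global"] = {
        "ioengine": "libaio",
        "direct": "1",            # bypass the guest page cache
        "time_based": "1",        # run for `runtime`, not until a size is hit
        "runtime": str(runtime_s),
    }
    # The workload itself: iodepth bounds concurrency, while rate_iops
    # caps the request rate so the load stays constant during the test.
    job[name] = {
        "rw": "randrw",
        "rwmixread": "70",
        "bs": "8k",
        "iodepth": str(iodepth),
        "rate_iops": str(rate_iops),
        "filename": filename,
    }
    buf = io.StringIO()
    job.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_fio_job("oltp_like", rate_iops=500))
```

With a rate cap in place, an event such as a node failure shows up as a dip below the requested rate or a rise in latency, rather than being hidden in a shifting maximum-throughput number.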
To implement the workflows for a scenario, we’ve introduced a YAML description that includes all the information necessary to create a self-contained, repeatable test sequence.
[Figure: example of the YAML description of a node failure scenario]
The scenario description uses the internal concept of VM Groups, which describe VM counts and placement and point to the internal gold images that will be used for the test. Currently there are two gold images packaged in X-Ray.
Workloads detail the VMs assigned to a specific workload as described by an FIO configuration file. In this way we set the entire group of VMs to perform the same FIO workload. Because the FIO configurations expect certain virtual disk configurations, some workloads need a particular gold image used with the VM group that is associated with that workload.
The scenario steps implement the workflow: workloads are executed, snapshots taken, VMs migrated, nodes failed, and so on. The scenario steps are broken into two phases: setup and run. Setup is used for scenario steps that shouldn’t be considered part of the measurement of the test. Setup steps typically include things like performing the initial cloning of VMs, powering VMs on, and running workload warmups. The run steps typically include starting and measuring the actual workloads while also injecting failures or other events. Each step refers to a specific function in the code, which in turn determines which API or set of APIs to call to complete the step.
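The mapping from step names to functions, and the split between setup and run, can be sketched in a few lines of Python. This is a hedged illustration, not Charon's code: the step names (`vm_group.clone`, `workload.start`, and so on), the registry, and the dictionary layout of the scenario are hypothetical stand-ins for whatever the real YAML schema defines, and the step bodies only record what they would do instead of calling an infrastructure API.

```python
STEP_REGISTRY = {}


def step(name):
    """Register a function as the handler for a named scenario step."""
    def register(func):
        STEP_REGISTRY[name] = func
        return func
    return register


# Hypothetical steps: each one just describes the action it stands for.
@step("vm_group.clone")
def clone_vms(vm_group):
    return f"clone VMs in {vm_group}"


@step("vm_group.power_on")
def power_on(vm_group):
    return f"power on VMs in {vm_group}"


@step("workload.start")
def start_workload(workload):
    return f"start workload {workload}"


@step("node.power_off")
def power_off_node(node_index):
    return f"power off node {node_index}"


def run_scenario(scenario):
    """Execute setup steps, then run steps, in the order listed.

    Returns (phase, action) pairs so a caller can exclude anything
    tagged "setup" from the measured part of the test.
    """
    log = []
    for phase in ("setup", "run"):
        for entry in scenario.get(phase, []):
            handler = STEP_REGISTRY[entry["step"]]   # name -> function
            log.append((phase, handler(**entry.get("args", {}))))
    return log


# A scenario as it might look after parsing (layout is an assumption).
NODE_FAILURE = {
    "setup": [
        {"step": "vm_group.clone", "args": {"vm_group": "oltp"}},
        {"step": "vm_group.power_on", "args": {"vm_group": "oltp"}},
    ],
    "run": [
        {"step": "workload.start", "args": {"workload": "oltp_run"}},
        {"step": "node.power_off", "args": {"node_index": 0}},
    ],
}

if __name__ == "__main__":
    for phase, action in run_scenario(NODE_FAILURE):
        print(f"{phase:5s} {action}")
```

Keeping the phase alongside each executed step is one simple way to make sure warmups and cloning never leak into the measured results.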
Though users can’t currently create their own scenarios in YAML, we do share what the scenarios do by including the YAML used for an experiment alongside the data in a result.
X-Ray is packaged as a VM that contains the X-Ray software as well as the necessary gold images. The architecture of X-Ray is broken into three major components: 1) the UI, 2) the X-Ray server, and 3) Charon.
Working from the bottom up, Charon is the component that drives the execution of X-Ray’s tests. Charon interprets the YAML descriptions to build a class that represents the scenario. Test instances are the combination of a test target configuration and a scenario. When an instance is created, Charon drives the scenario end-to-end by handling all communication with the test target, the infrastructure management service (e.g. vCenter for vSphere), and the VMs deployed as part of the scenario. Public APIs are used for all communication with the infrastructure because the goal is to mimic user or environmental interactions with the system. Charon, written in Python, uses the Python implementation of the vSphere SDK, pyvmomi, to communicate with vSphere targets; it uses the Nutanix REST APIs to communicate with AHV targets; and it uses IPMI to perform node power management functions. Embedded in the gold images packaged with X-Ray is the Charon agent. This agent provides an infrastructure-agnostic interface to interact with the workload VMs.
The X-Ray server sits in the middle of the stack, providing a REST API for the user interface and other consumers and communicating with the instances of Charon it creates. It is responsible for maintaining the sets of test target configurations as well as test results and analyses. Sensitive information about the test targets, such as passwords, is protected by encryption when written to disk. As the manager of test targets, it ensures that only one test instance is executing for a target at a time. This enables the queueing capabilities also built into the server. The server also includes the ability to create reports using the same data displayed in the UI.
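The one-test-per-target rule and the queueing it enables can be illustrated with a small Python sketch. This is an assumption about how such a policy could be modeled, not the X-Ray server's implementation: the `TargetScheduler` class and its method names are hypothetical, and it tracks state in memory only.

```python
from collections import defaultdict, deque


class TargetScheduler:
    """Allow at most one running test per target; queue the rest in order.

    Hypothetical model of the policy described above.
    """

    def __init__(self):
        self._running = {}                   # target -> running test id
        self._queued = defaultdict(deque)    # target -> waiting test ids

    def submit(self, target, test_id):
        """Start the test if the target is idle, otherwise queue it."""
        if target in self._running:
            self._queued[target].append(test_id)
            return "queued"
        self._running[target] = test_id
        return "running"

    def finish(self, target):
        """Mark the running test done and promote the next queued one.

        Returns the promoted test id, or None if the queue was empty.
        """
        self._running.pop(target, None)
        if self._queued[target]:
            next_test = self._queued[target].popleft()
            self._running[target] = next_test
            return next_test
        return None

    def running(self, target):
        """Return the test currently running on the target, if any."""
        return self._running.get(target)


if __name__ == "__main__":
    sched = TargetScheduler()
    print(sched.submit("cluster-a", "node-failure"))   # starts at once
    print(sched.submit("cluster-a", "snapshot"))       # waits its turn
    print(sched.submit("cluster-b", "node-failure"))   # other target, starts
    print(sched.finish("cluster-a"))                   # promotes "snapshot"
```

Because the limit is per target, tests against different clusters can proceed independently while tests against the same cluster run strictly one after another.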
X-Ray can ultimately be summed up as a unique implementation of realistic systems performance evaluation using public APIs for automation and an open source workload generator. We think it’s important to be open about what’s included in X-Ray so you can be confident in the results it produces.
To get started with X-Ray, head over to https://www.nutanix.com/xray/ and register to download. To get help, provide suggestions, and interact with other users and developers, participate in the X-Ray NEXT forum.