Chaos Monkey for the Enterprise Cloud

  • 12 April 2018
This post was authored by Marc Trouard-Riolle, Senior Product Marketing Manager Nutanix

During the late ’70s in the UK, there was a Japanese TV show called “Monkey”. Based on the Chinese novel Journey to the West by Wu Cheng’en, the show won many fans with its choreographed martial arts action scenes and gained a cult following that continues today. If you’re not familiar with the story, the main character is an immortal monkey and former king of a monkey tribe who is skilled in combat yet has been imprisoned for 500 years to learn patience. Following this incarceration, he makes a pilgrimage to seek out holy scriptures, yet typically ends up causing chaos along the way.

You may now be wondering how a ’70s TV show about a monkey causing chaos relates to hyperconverged infrastructure testing.

When organizations test and benchmark datacenter infrastructures, they typically run 4-corner micro-benchmark tests: pure I/O performance tests that probe the extremes of inputs and outputs. The results are often used to highlight the maximum speed or throughput of systems and components.
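To make the four corners concrete, here is a minimal Python sketch that enumerates them. It assumes the corners are the combinations of read versus write and sequential versus random access, which is how the Four Corners Microbenchmark scenario in this post describes them. The names `OPERATIONS`, `PATTERNS`, and `four_corners` are illustrative only and are not part of X-Ray or any benchmarking tool.

```python
from itertools import product

# Hypothetical labels for the two axes of a four-corner I/O test.
OPERATIONS = ("read", "write")
PATTERNS = ("sequential", "random")

def four_corners():
    """Return the four (pattern, operation) combinations of a four-corner test."""
    return [(pattern, op) for pattern, op in product(PATTERNS, OPERATIONS)]

# List each corner; a real benchmark would run one I/O job per corner.
for pattern, op in four_corners():
    print(f"{pattern} {op}")
```

Each corner isolates one extreme of storage behavior, which is why the results say little about mixed, real-world workloads.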

The greater challenge for infrastructure architects, however, is to understand the business impact of infrastructure resilience, performance, security, data integrity, and even ease of use. What happens, for example, if the resilience of a cluster is compromised because a cluster node fails? The 4-corner test is still valid. However, the resulting “hero” benchmarks, whilst interesting, do not necessarily reflect real-world business requirements, and they certainly don’t account for common datacenter scenarios such as the node failure above or the impact of snapshots on running applications.

Enter Chaos Monkey

While the term Chaos Monkey may have initially sparked memories of my childhood, it is more commonly associated with Netflix’s testing tool, created in 2011 to intentionally disable computers in order to understand how the overall system responds to outages. These days Netflix has a suite of related tools for all types of reliability, security, and resiliency infrastructure tests. Nutanix X-Ray was created with a very similar purpose in mind: to give the industry the capability to assess hyperconverged infrastructures and enterprise clouds across key test scenario areas.
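The idea of deliberately failing a machine and then checking how the system copes can be illustrated with a toy simulation. The Python sketch below is neither Netflix’s tool nor X-Ray. It assumes a made-up four-node cluster in which every data block is stored on two different nodes, and all names (`kill_random_node`, `data_available`, `node_up`, `replica_map`) are hypothetical.

```python
import random

def kill_random_node(node_up, rng):
    """Mark one randomly chosen running node as failed and return its name."""
    running = sorted(name for name, up in node_up.items() if up)
    victim = rng.choice(running)
    node_up[victim] = False
    return victim

def data_available(replica_map, node_up):
    """True if every block still has at least one replica on a running node."""
    return all(any(node_up[n] for n in holders) for holders in replica_map.values())

# Toy cluster (an assumption for illustration): four nodes, two replicas per block.
node_up = {"node-a": True, "node-b": True, "node-c": True, "node-d": True}
replica_map = {
    "block-1": ("node-a", "node-b"),
    "block-2": ("node-b", "node-c"),
    "block-3": ("node-c", "node-d"),
    "block-4": ("node-d", "node-a"),
}

rng = random.Random(2018)  # fixed seed so the run is repeatable
victim = kill_random_node(node_up, rng)
print(f"failed {victim}; data still available: {data_available(replica_map, node_up)}")
```

With two replicas on different nodes, any single failure leaves every block readable. Calling `kill_random_node` a second time may or may not break availability, depending on which node is chosen, and that is the kind of behavior a failure-injection test is meant to surface.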

Architected with multi-hypervisor and multi-platform support in mind, X-Ray can test hyperconverged infrastructures based on the AHV or ESXi hypervisors, and covers the following key scenario areas:
  • Availability - X-Ray tests how a hyperconverged solution tolerates a node failure while running workloads.
  • Performance - X-Ray tests the system's ability to handle mixed workloads.
  • Feature Set - X-Ray tests the impact of features such as VM provisioning on the hyperconverged infrastructure.
  • Data Integrity - X-Ray tests data integrity during power outages or component failures.
Test scenarios can also be customized, shared, and imported into new X-Ray deployments. The default set of tests includes:
  • Database Colocation - demonstrate the effects of running two different database workloads (OLTP and DSS) at the same time within a three-node cluster.
  • Snapshot Impact - show the effects of an increasing number of snapshots on a critical OLTP database workload.
  • Rolling Upgrade - demonstrate that critical OLTP database VMs continue running without issues during a simulated cluster upgrade of the hypervisor.
  • HCI Workflow - demonstrate the capability to run heterogeneous workloads (OLTP & VDI) across a cluster without impacting each other.
  • VDI Simulator - show the sustained, fixed-rate performance of VDI VMs in one of three intensities running across all cluster nodes.
  • OLTP Simulator - demonstrate an OLTP workload of a particular desired working set and workload intensity running on a cluster.
  • Throughput Scalability - establish the maximum sustained throughput for sequential and random reads and writes.
  • Clone Impact - understand the performance impact as hundreds of clones are created across the cluster.
  • Sequential Node Failure - illustrate the effects of sequential node failures on data availability in a cluster.
  • Extended Node Failure - understand the impact to VM performance when a node fails without warning.
  • Four Corners Microbenchmark - test performance for reads versus writes and sequential versus random access.
The current version, X-Ray 2.3, introduced several new important capabilities:
  • Support for AHV on the IBM Power platform
  • Customized VM and vDisk configurations
  • Prism Central image service support
  • X-Ray tokens

IBM® Power® Platform Support

During 2017, IBM introduced support for hyperconverged systems powered by Nutanix software as part of the IBM OEM partnership. With the 2.3 release, X-Ray supports the Power® architecture for scenario-based testing of the commonly deployed IBM Power® ecosystem of products, such as open source database management systems (OSDBMS) like MongoDB and PostgreSQL.

What Are X-Ray Tokens?

X-Ray is now licensed with tokens, removing the need to log in to the X-Ray server. Each license provides 30 days of app use, and tokens are currently freely available from the portal. Generating tokens is simple, as the following process shows:

All X-Ray users must first register to use X-Ray. This only needs to be done once, after which you will receive a registration confirmation email containing a download URL with a choice of an OVA image (for VMware ESXi™ infrastructures) or a QCOW2 image (for AHV infrastructures). All image downloads, documentation, and related community forums can be found through this link.

Token generation is found at:

To generate X-Ray tokens, select the reason for deploying X-Ray:

The resulting token is presented and can also be emailed to you:

Beyond Simple Performance Metrics

Hyperconverged infrastructures, and subsequently enterprise clouds, continue to evolve at a significant pace in both scope and capabilities, resulting in new applications and use cases. As organizations increasingly place business-critical workloads on their private and hybrid enterprise clouds, the need to understand infrastructure characteristics goes beyond simple performance metrics.

Check the community forum for all new X-Ray app updates and to engage with other X-Ray users and Nutanix customers. If you have any X-Ray requests or suggestions for Nutanix, please don’t hesitate to let us know, either through a forum post or by sending us an email at

Please also remember that if you’re new to X-Ray, you will need to first register here. All X-Ray app download binaries and documentation can be found on this forum post.

Gain a new insight into your enterprise cloud by causing some chaos in your [dev/test] infrastructure!

Attending .NEXT 2018 in New Orleans?

The X-Ray product team will be on-hand to help answer any questions, with demos, in the solution expo hall.

Register here now for the event and attend the ‘Benchmarks in the Age of HCI’ breakout session on Wednesday May 9th at 12:40PM to hear from Product Management & Engineering. Application owners and architects don't need synthetic, unrealistic numbers; they need automated analysis of consistent performance and reliability during test scenarios. Space will be limited, so register your place now!

Disclaimer: This blog may contain links to external websites that are not part of Nutanix. Nutanix does not control these sites and disclaims all responsibility for the content or accuracy of any external site. Our decision to link to an external site should not be considered an endorsement of any content on such site.

© 2018 Nutanix, Inc. All rights reserved. Nutanix and the Nutanix logo are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. All other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s).

1 reply

The concept of 'chaos engineering' is something I've been exploring. Thanks for sharing!