This post was authored by Michael Webster of Long White Virtual Clouds.
I was one of several program managers on a project that needed to integrate 26 disparate back-end systems to create a national pension system, which would be responsible for collecting and disbursing several billion dollars per year. Many of the systems didn't yet exist, so we had to build them along with their development and multiple test environments, including test environments for the external parties that were part of the end-to-end solution. For every production instance there would be at least 9 non-production instances for development and the different phases of testing, training, and support. Each system was either a client-server or a multi-tier web application. The middleware that held everything together consisted of 5 layers. The whole solution needed to be available 24x7, as there were tight SLAs with external parties that carried financial penalties.
When we started planning the projects we quickly realized that creating all of the test environments the traditional (physical) way would take at least 3 months per environment, and we would run out of both time and datacenter space. Each fully configured middleware environment required 128 servers (64 at each of two datacenters), so we would need over 1,000 servers in total. For the systems that already had test environments, it was taking between 4 and 6 weeks just for QA to verify that all the components were working correctly before testing of a release could begin. Quite often we would find that some configuration had been changed and the test environment needed remediation, which took more time and impacted our ability to triage defects.
The traditional way of creating environments clearly wasn't going to work, so we turned to virtualization. We began by setting up a number of zones on each of the Solaris systems, as recommended by our software partners. This only partially alleviated the problem, as it was still time consuming to create the individual zones, to allocate and provision separate storage, and to configure the application instances (which was only partially scripted). Each Solaris system cost about 4 times as much as an x86 server but could support only 8 zones, so we'd still need at least 160 servers. If we couldn't stand up the necessary environments, we'd also get very low productivity from the hundreds of developers and testers on the projects.
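The server counts in this story are consistent with one another under one reading: if the production instance and each of its nine non-production counterparts needs a full 128-server middleware environment, that is 1,280 servers ("over 1,000"), and packing 8 zones onto each Solaris host gives exactly 160 hosts. The short Python sketch below reproduces that arithmetic. The assumption that all ten instances are full-size middleware environments is ours, not something the original states.

```python
# Back-of-the-envelope server counts. ASSUMPTION: production plus nine
# non-production instances each need a full 128-server middleware environment.
SERVERS_PER_ENV = 128        # 64 at each of two datacenters (from the text)
ENVIRONMENTS = 1 + 9         # assumed: 1 production + 9 non-production
ZONES_PER_SOLARIS_HOST = 8   # from the text


def physical_servers(envs: int, per_env: int) -> int:
    """Servers needed if every middleware node is its own physical box."""
    return envs * per_env


def solaris_hosts(servers: int, zones_per_host: int) -> int:
    """Hosts needed if each server becomes a zone, rounding partial hosts up."""
    return -(-servers // zones_per_host)  # ceiling division on integers


total = physical_servers(ENVIRONMENTS, SERVERS_PER_ENV)
hosts = solaris_hosts(total, ZONES_PER_SOLARIS_HOST)
print(total, hosts)  # 1280 160
```

Zones cut the box count by a factor of 8 but still leave well over a hundred expensive hosts, each of which had to be built and configured.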
This story so far isn't unusual, at least not circa 2007. We needed to create a number of test and development environments. We needed them to be set up consistently each time, with a known good state, so that we could reliably release code and hand it to QA as soon as possible. We needed to catch defects early, rather than after code had gone into production and caused downtime. We needed the test environments to represent production as faithfully as possible, to reduce the unknowns and the environmental defects caused by configuration differences. We needed to confirm that all of our non-functional requirements around performance and availability could be met or exceeded, and to know how much headroom and capacity we had across all the different components. We needed a fast, repeatable process that could be applied to hundreds of different systems. We already had virtualization using Solaris zones, but it still wasn't enough; we needed something more.
This brings us to the title of this article, The Role of Hypervisors in Dev and Test Environments, and in this case x86 hypervisors specifically (yes, there are hypervisors for Unix platforms, and they've been around on mainframes for 30 years). By using an x86 hypervisor we were able to create over 800 VMs in two weeks and have all of the test environments created within 6 weeks, something that would otherwise have taken a year even with Solaris zones. We used Puppet to automate configuration management of the operating systems and middleware. We deployed each OS from a template with a standard hardened configuration, which we had tested so we knew it would give us the same consistent result each time. We were able to quickly integrate all the different applications and create and destroy environments as needed. Importantly, we were able to create the virtual machines before we had all of the hardware needed to run them, while hardware procurement was still in progress. Because the hardware lead time was at least 4 weeks, this let us get started quickly and add hardware on demand.
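Puppet is declarative: you describe the state a node should be in, and the agent converges the node to that state, changing only what has drifted. That is what turns "deploy from a template, then manage configuration" into a known good state, and it also catches the kind of silent configuration change that used to delay QA. The toy Python sketch below illustrates only that desired-state idea. It is not Puppet and does not use its API, and the setting names and values in `DESIRED` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# HYPOTHETICAL desired state for a hardened OS template; the keys and values
# are illustrative, not taken from the original project.
DESIRED: Dict[str, str] = {
    "sshd.PermitRootLogin": "no",
    "ntp.server": "time.example.internal",
    "middleware.heap_mb": "4096",
}


def converge(actual: Dict[str, str],
             desired: Dict[str, str]) -> List[Tuple[str, Optional[str], str]]:
    """Bring `actual` in line with `desired` and return the changes made.

    Settings already at their desired value are left alone, so running this
    a second time on a converged node changes nothing (idempotence).
    """
    changes: List[Tuple[str, Optional[str], str]] = []
    for key, want in desired.items():
        have = actual.get(key)
        if have != want:
            changes.append((key, have, want))  # record the drift before fixing it
            actual[key] = want
    return changes


# A node built from the template, then hand-edited (drift) and missing a setting.
node = {"sshd.PermitRootLogin": "yes", "middleware.heap_mb": "4096"}
first = converge(node, DESIRED)   # fixes PermitRootLogin, adds ntp.server
second = converge(node, DESIRED)  # nothing left to fix
print(len(first), len(second))    # 2 0
```

The returned change list doubles as a drift report: an empty list on a fresh environment is a quick signal that it matches the known good state before QA starts.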
By using a hypervisor in the dev and test environments we greatly simplified the hardware configuration of each VM, which eliminated the need to support build processes for multiple hardware configurations. Decoupling systems from the underlying hardware makes them portable, allowing them to be moved between physical hosts and between datacenters. We were able to provision environments that were exact replicas of how production would look, and run them safely side by side with many other systems without impacting performance. We could easily triage and reproduce defects by standing up additional environments on demand.
By using a hypervisor for the dev and test environments on this project, we saved 10 months and $30M in CAPEX, saved many more millions in labor and productivity costs, and delivered significantly higher-quality releases in less time. That was back in 2007; with modern hyper-converged web-scale infrastructure we can do much more. What required 40TB of storage and 6 weeks back then can be done with 4TB in less than an hour today. This should be the standard today: friends don't let friends run physical.
Fast forward to today. This was highlighted recently when a new Fortune 500 customer of Nutanix migrated its previously physical Oracle E-Business Suite and databases (the heart of its business) to our platform. On the physical platform it took a minimum of 8 days to create the necessary test environments and refresh them from production. That was not too bad, and it was a well-scripted, efficient process. After migrating to Nutanix web-scale converged infrastructure, the development and test environment refresh process takes less than 1 hour to create 7 instances from production, and consumes the storage capacity of only a single environment. We don't call this data de-duplication; we call this data avoidance.
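One common way to get this kind of data avoidance is copy-on-write (or redirect-on-write) cloning: a clone starts as metadata pointing at its source's blocks, and only the blocks the clone later changes consume new space. The toy Python model below illustrates that general technique under that assumption. It is not Nutanix's implementation, and `Volume` and `physical_blocks` are hypothetical names.

```python
from typing import Dict, List, Optional


class Volume:
    """Toy block volume: a clone shares its parent's blocks until it writes."""

    def __init__(self, parent: Optional["Volume"] = None) -> None:
        self.parent = parent
        self.own_blocks: Dict[int, bytes] = {}  # blocks physically stored by this volume

    def read(self, block: int) -> bytes:
        # Walk up the clone chain until some volume holds the block.
        vol: Optional[Volume] = self
        while vol is not None:
            if block in vol.own_blocks:
                return vol.own_blocks[block]
            vol = vol.parent
        return b"\x00"  # a never-written block reads as zeros

    def write(self, block: int, data: bytes) -> None:
        # Copy-on-write: new data lands only in this volume; the parent is untouched.
        self.own_blocks[block] = data

    def clone(self) -> "Volume":
        # Creating a clone copies no blocks, only a reference to the source.
        return Volume(parent=self)


def physical_blocks(volumes: List[Volume]) -> int:
    """Blocks actually consumed by the given volumes (shared blocks not counted)."""
    return sum(len(v.own_blocks) for v in volumes)


prod = Volume()
for b in range(1000):
    prod.write(b, b"prod-data")

clones = [prod.clone() for _ in range(7)]  # 7 test instances refreshed from production
clones[0].write(42, b"test-change")        # one tester modifies one block

print(physical_blocks([prod]), physical_blocks(clones))  # 1000 1
```

Seven clones of a 1,000-block production volume hold a single physical block between them, the only block any of them has changed; every other read is served from the shared production data, and production itself never sees the test write.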
With Dev and Test environments making up 70% to 80% of infrastructure, and representing a significant investment, it makes sense to optimize that investment as much as possible. With Nutanix you can do this on any of the supported hypervisors, so you get to choose the plumbing that best fits your business and change it if or when it makes sense. This is what is .Next for IT.
How much would you save, and how much more productive would your organization be, with a solution like this? To find out how our customers leverage Nutanix solutions to drive real business benefits in development and test environments, come to our .Next conference. You will see this, and much more.