When Nutanix entered the market seven years ago, the dominant trends in the datacenter were virtualization, flash, and Big Data. We set off on a journey to marry all three, and the Nutanix Distributed Storage Fabric was born. Over the last seven years, the datacenter has evolved dramatically, particularly with respect to simplifying operations and increasing speed and agility. The container movement, led by Docker, has taken off because it addresses all three of those goals. While the IT operations team drove virtual machine adoption, developers are driving container adoption in the name of "agility." For many, virtualization is now seen as legacy, but container technologies are not directly comparable to virtual machines and are, in many respects, far less mature.
Container technology offers many benefits, such as improved application portability across development, test, and production environments, regardless of whether the host is your laptop, an on-premises datacenter, or the cloud. Containers consume a small fraction of the compute resources of typical virtual machines, allowing for near-instant start times, application scaling, and increased application density, thereby saving customers time and money. Containers have come a long way in a remarkably short time and are now considered the de facto method for facilitating deployment, one of the costliest areas of software development.
To say the industry is excited about Docker is a huge understatement. After a few years of experience with Docker here at Nutanix, we can understand why: it provides an effective way to build, package, and distribute software. As Docker and the container ecosystem develop, it is clear that, in many respects, this movement both complements and competes with virtual machines. However, industry transitions take time, and this applies even to an industry-leading company such as Docker. Customers and developers in the community have shared with us a range of issues, which we describe below, especially around problems with data persistence. In this blog post, we talk about how we address several of these issues via the Acropolis Docker Volume Plugin, available with AOS 4.7.
Many Obstacles on the Path of Containerization
The container market remains highly fragmented. Customers are at different stages of adoption: the vast majority are in Dev/Test, and very few organizations are running containers in production. The reason may be that Docker in development takes just two commands (usually a webapp linking to a database running in a VM), or one command if you’re using Docker Compose. Docker in production, however, is much more complex and not for the faint of heart.
Following are questions I regularly hear from customers and users in the field (events, meetups, etc.) that illuminate the challenges that Docker users wrestle with:
- How are you persisting data between containers?
- Where am I logging the output of all my containers?
- How do I deploy Docker images across N machines?
- How do I roll back quickly if a push was bad?
- How do I automatically build and test containers?
- There was a security patch for the base OS my container uses. Does my process depend on FROM a little too much?
- Wait, why am I using a public base container image?
- Why are my containers not communicating with each other? Did I configure my overlay correctly?
- How many servers do I need to test a build quickly enough so I'm not getting yelled at by developers?
- How am I collecting metrics from each container?
- How are those metrics registering their time series with my DB?
- What software am I actually using to extract metrics from those containers?
- How am I ensuring a specific version across all my running containers?
- Do I have a proper staging environment that's accepting production traffic?
- How am I load balancing between hosts and containers on hosts?
- How am I debugging connection issues inside a container? I don't even have netstat?
- Have I load tested that logging system to auto-scale more than 10x on a weekend?
- Oh crap, one of the five distributed systems my scheduler requires exploded, causing a cascade of failures. (Ok, that one’s not really a question).
- How do I recover my secrets when my container died? Why did I put secrets in a container?
- How do I rebuild everything?
Docker Containers Lack Data Persistence
Unlike VMs, Docker containers are transient in nature, as is the storage assigned to them. When a Docker container goes away, the storage goes away with it. Much of the power of containers comes from the fact that they encapsulate as much of the state of the environment’s filesystem as is useful. As a result, when you remove a Docker container and start a new one from the same image, the new container retains none of the changes made in the previous one. Those changes are lost. Yes, your data is lost!
Docker containers were initially developed for 12-Factor applications, an application development pattern pioneered by Heroku, an early PaaS provider. This 12-Factor application method says that the state of your application should be outside of the application and in a datastore (database, queuing, caching, etc). This pattern was great at the time (2011) if you were running only front-end applications, because it’s not a problem when the containers go away. But what about the web cache of your web server, or the application logs that you might want to keep for an audit trail? What about secrets (SSH keys, certificates, password files, database credentials, etc.)?
What about sharing data between containers or between the host and the container? Finally, what if you want to run your registry in a container, or databases in containers, such as MySQL, Postgres, MongoDB, and ElasticSearch, which are among the most pulled images on Docker Hub (see the DataDog survey and the top 10 repos on Docker Hub)? In my opinion, the 12-Factor paradigm shows its age when users in the community demand persistent storage support with containers and the ecosystem develops solutions to address this problem (see Kubernetes Persistent Volumes and Mesos External Volumes).
You could use Docker Volumes (a bind mount between the container's file system and the host's filesystem) to store data outside the container's own union file system to persist data on the local host's file system. However, this binds the volume to a host, which is a single point of failure. Of course you can restart your container on another host, but your data does not move with it, and if the host or VM goes away your data is lost.
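To make the difference concrete, here is a minimal sketch of where the data lives. It is a toy model written for this post, not Docker code: the `Host` and `Container` classes, the host names, and the paths are all hypothetical, and dictionaries stand in for file systems.

```python
"""Toy model of where container data lives. Not Docker code."""


class Host:
    """A Docker host whose local directories can back volumes."""

    def __init__(self, name):
        self.name = name
        self.local_dirs = {}  # host path -> {file name: contents}

    def volume(self, path):
        # Create the backing directory on first use, like a bind mount source.
        return self.local_dirs.setdefault(path, {})


class Container:
    """A container with a private writable layer and optional volume mounts."""

    def __init__(self, host, mounts=None):
        self.writable_layer = {}  # discarded together with the container
        # container path -> backing directory on this particular host
        self.mounts = {
            target: host.volume(source)
            for target, source in (mounts or {}).items()
        }

    def _backing(self, path):
        # Return (directory, key) for the store that holds this path.
        for target, backing in self.mounts.items():
            if path.startswith(target + "/"):
                return backing, path[len(target) + 1:]
        return self.writable_layer, path

    def write(self, path, data):
        store, key = self._backing(path)
        store[key] = data

    def read(self, path):
        store, key = self._backing(path)
        return store.get(key)


host_a = Host("host-a")
mounts = {"/var/lib/mysql": "/data/mysql"}  # hypothetical paths

first = Container(host_a, mounts)
first.write("/tmp/cache", "scratch")         # goes to the writable layer
first.write("/var/lib/mysql/table", "rows")  # goes to the host volume

# Replace the container on the same host.
second = Container(host_a, mounts)

# Reschedule the container onto a different host.
host_b = Host("host-b")
third = Container(host_b, mounts)
```

In this model, `second` can still read `/var/lib/mysql/table` but no longer sees `/tmp/cache`, while `third` finds an empty volume because the data stayed behind on `host-a`.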
Containers Have Sluggish I/O Performance
I/O isn’t great on containers and Docker recommends using volumes. By default, containers utilize a union file system mount that brings copy-on-write capabilities at a container level. This is Docker’s secret sauce; it is what provides its git-like capabilities. However, Docker volumes bypass the union filesystem and are initialized during container creation. If you want your data to persist outside the container lifecycle, you need to use a Docker volume.
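The layering idea can be sketched in a few lines of Python. This is only an analogy, assuming made-up layer contents: `collections.ChainMap` models the lookup and shadowing behavior of stacked layers, not the actual copy-up of files that a real union file system performs.

```python
from collections import ChainMap

# Read-only image layers, listed bottom to top (contents are made up).
base_os = {"/etc/os-release": "base", "/bin/sh": "shell"}
app_layer = {"/app/server.py": "v1", "/etc/os-release": "patched"}


def start_container(*image_layers):
    """Return a view with a fresh writable layer on top of the image layers."""
    # ChainMap searches its mappings left to right, so the newest layer
    # must come first; writes always land in the first mapping.
    return ChainMap({}, *reversed(image_layers))


c1 = start_container(base_os, app_layer)
c2 = start_container(base_os, app_layer)

# A write in one container lands in that container's writable layer only.
c1["/app/server.py"] = "hotfix"
```

After the write, `c1` sees the hotfix, while `c2` and the shared `app_layer` still hold `v1`. Every read in this model may walk several layers before it finds the file, which hints at why data written to a volume, which skips the stack entirely, behaves differently.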
The downside of Docker volumes is that your data is lost if the Docker host dies. Docker introduced the concept of “volume plugins” with Docker 1.8 as a way to facilitate communication between Docker and storage APIs. This allowed third parties to integrate with Docker, but the level of integration was very basic. They then added “named volumes” with 1.9, which allowed users to manage volumes as atomic units instead of managing them inside a container. We can now provide highly available storage services to containers that reside outside the container and the host.
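For readers who have not looked at a volume plugin before, the following sketch shows the general shape of the conversation between Docker and a plugin. As I understand the plugin API, Docker sends JSON requests to endpoints such as `/Plugin.Activate`, `/VolumeDriver.Create`, `/VolumeDriver.Mount`, `/VolumeDriver.Path`, `/VolumeDriver.Unmount`, and `/VolumeDriver.Remove`. Everything else here is an assumption for illustration: the `InMemoryVolumeDriver` class, the mount root, and the `size` option are hypothetical, there is no HTTP or socket layer, and this is not the Nutanix plugin.

```python
class InMemoryVolumeDriver:
    """Answers volume plugin style requests from an in-memory registry."""

    def __init__(self, mount_root="/mnt/example-volumes"):  # hypothetical path
        self.mount_root = mount_root
        self.volumes = {}  # volume name -> {"opts": dict, "mounted": bool}

    def _path(self, name):
        return f"{self.mount_root}/{name}"

    def handle(self, endpoint, body):
        """Map one request (endpoint plus decoded JSON body) to a response."""
        if endpoint == "/Plugin.Activate":
            # Handshake: tell Docker which plugin interface is implemented.
            return {"Implements": ["VolumeDriver"]}

        name = body.get("Name", "")
        if endpoint == "/VolumeDriver.Create":
            # A real driver would provision backing storage here.
            self.volumes.setdefault(
                name, {"opts": body.get("Opts") or {}, "mounted": False}
            )
            return {"Err": ""}

        if name not in self.volumes:
            return {"Err": f"volume {name!r} not found"}

        if endpoint == "/VolumeDriver.Remove":
            del self.volumes[name]
            return {"Err": ""}
        if endpoint == "/VolumeDriver.Mount":
            # A real driver would attach and mount the storage on the host.
            self.volumes[name]["mounted"] = True
            return {"Mountpoint": self._path(name), "Err": ""}
        if endpoint == "/VolumeDriver.Path":
            return {"Mountpoint": self._path(name), "Err": ""}
        if endpoint == "/VolumeDriver.Unmount":
            self.volumes[name]["mounted"] = False
            return {"Err": ""}
        return {"Err": f"unsupported endpoint {endpoint}"}


driver = InMemoryVolumeDriver()
activated = driver.handle("/Plugin.Activate", {})
created = driver.handle(
    "/VolumeDriver.Create", {"Name": "dbdata", "Opts": {"size": "10G"}}
)
mounted = driver.handle("/VolumeDriver.Mount", {"Name": "dbdata"})
missing = driver.handle("/VolumeDriver.Mount", {"Name": "nope"})
```

The point of the design is that Docker only ever sees a volume name and a mount point. Where the bytes actually live is entirely up to the driver, which is what lets a storage system keep the data outside both the container and the host.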
Introducing the Acropolis Docker Volume Plugin
With AOS 4.7, we have extended the Acropolis DSF (Distributed Storage Fabric) to provide persistent storage support for containers. The Nutanix Acropolis DSF Volume Driver is written in Go and runs as a Docker volume extension. It runs in privileged mode as an intermediate container, effectively a sidekick container. The Nutanix Acropolis DSF Volume Plugin surfaces a link to the Acropolis DSF via iSCSI volume groups, exposing Acropolis DSF storage directly to containers and bypassing the hypervisor.
Thus, when you lose a container or the host, you still have access to your data. Moreover, because we leverage data locality, you get the added benefit of data mobility: the data volumes always follow the container as it moves across the cluster to whichever host it is running on. These features are unique to Nutanix. This form of data locality is the secret sauce that enables us to guarantee consistent performance against the noisy neighbor problem, no matter the size of the Nutanix cluster. Another goal when developing support for containers was to bring the consumer-grade UX that customers demand and expect of us. In our experience, early adopters of new technology already face a steep learning curve, so we did not want to be a choke point. Our solution does not require third-party drivers or tools; it’s native in our implementation.
As you progress in your journey of moving containers from Dev/Test to production, you can implement more advanced data services like disaster recovery, scheduled snapshots, real-time tiering (HDD, SSD, NVMe, or RAM), compression (inline/post-process), deduplication (inline/post-process), and erasure coding.
The Nutanix Acropolis DSF Volume Plugin for Docker and the Docker Machine Driver provide the following benefits to help our customers increase their pace of innovation and time to market:
- Container-Native Integration. DSF and Docker Machine use the native Docker API and tooling.
- High Storage Performance and Throughput. The Docker Volume Plugin uses the best-of-breed Distributed Storage Fabric. Nutanix DSF performance scales linearly.
- Easy Install and Support. The DSF Docker Volume Plugin and the Docker Machine Driver work right out of the box and are fully supported by our award-winning support organization.
A Nutanix and Docker solution provides two great advantages: the web-scale IT of Nutanix alongside the speed and agility of Docker. Nutanix addresses the issues of data persistence and storage performance while complementing the speed and agility of Docker by making IT infrastructure invisible. For all the work that remains to be done with this relatively new platform, Docker has a lot to offer, which is why it is changing the way applications are being built. We have only just begun bringing HCI to containers, so stay tuned for more to come.
Special thanks: Sridhar Devarapalli, Ray Hassan and John Williamson for reviewing.