This is one that's near and dear to my heart ... or at least to my business, as we are container-centric here. To restate the core question:
"Should containers be treated differently, or do system administrator need to know when to deploy containers?"
The TL;DR version of this question is: in order to receive a benefit, containers need to be managed in a completely different way than traditional VMs. Schedulers, service discovery, and logging become mandatory baselines for a successful container strategy, and even things as simple as backups need to be rethought to accommodate them.

We personally have embraced the Hashicorp tools over some of the more popular choices (Kubernetes) to help deliver this vision:

Packer allows us to build container hosts. It's set on a cronjob to rebuild an image once a month so that patching is always up to date. This is probably the most "traditional" tool in our arsenal.

Nomad is a scheduler that we plug into our CI to actually deliver containers to VM hosts. Hosts are tagged with "datacenters" (read: dev, test, stage, prod) and Nomad simply deploys the containers to the appropriate host. We don't track servers; they are all just compute targets.

Consul handles service discovery and the k/v store. Services deployed register here so they can be queried as needed. We tag our containers with the label "proxy" so that our load balancers can pick them up and configure HA dynamically, or "exporters" so that Prometheus knows to scrape them for metric data.

We are trying to ease in Terraform next, to actually provision hosts from HA images. The idea here being we can add servers to the cluster, then drain and retire old servers automatically. Basically we won't patch boxes: we replace them.

Because everything is volatile, we use Elasticsearch to gather all logs from the containers, and store all data directly in Nutanix LUNs via the Nutanix docker driver.

Do we still have "legacy" servers made the old-fashioned way by logging in and installing software? Of course. Containers are still new enough that not everything fits this new model. But we try to keep it to a minimum.
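To make the Nomad + Consul piece concrete, here's a rough sketch of the kind of job spec our CI hands to Nomad. The names, image, and ports are made up for illustration; the parts that matter are the "datacenters" targeting and the "proxy" service tag:

```hcl
job "example-api" {
  # "datacenters" for us are really environments: dev, test, stage, prod
  datacenters = ["prod"]
  type        = "service"

  group "api" {
    count = 2

    task "api" {
      driver = "docker"

      config {
        image = "registry.example.com/example-api:1.0.0"
        port_map {
          http = 8080
        }
      }

      resources {
        cpu    = 500
        memory = 256

        network {
          mbits = 10
          port "http" {}
        }
      }

      # Registers in Consul. The "proxy" tag is what our load balancers watch for;
      # an "exporters" tag would tell Prometheus to scrape it instead.
      service {
        name = "example-api"
        port = "http"
        tags = ["proxy"]

        check {
          type     = "http"
          path     = "/health"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

Nothing gets configured by hand after that: the load balancers just watch Consul for anything tagged "proxy" and pick up the new allocation wherever Nomad put it.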
Thanks for sharing
Justin-DynamicD - sounds like you folks are pretty mature in your process at the moment. Have you heard of Blue-Green deployment models?
There is a good explanation from Digital Ocean here [go], it just came to mind as I read '... The idea here being we can add servers to the cluster, then drain and retire old servers automatically. Basically we won't patch boxes: we replace them.'
I think the days of maintaining and patching servers are over, as you mentioned: '... Basically we won't patch boxes: we replace them.' I'm seeing that more and more.
I recently read a good book on this - Infrastructure as Code: Managing Servers in the Cloud - the author shares a lot of the same thinking you have on the topic.
"mature" is a relative term 🙂 I think what were are doing is fairly mature compared to some, but as I'm still in the thick of things I feel like I have a long way to go still :)The blue green model you mention is a goal right now. We are hoping terraform will help us deliver that by allowing use to stand up the new farm (blue) then leverage nomad to drain/move containers and flip things over. But ... it's pure theory craft atm :)I'm upgrading to 5.5 now and have been working with Nutanix folks on getting the Nutanix terraform plugin so we can start really testing this, then we'll see how mature things are.The other big part of this is packer. RIght now we run packer in VM that has qemu installed (nested virutalization) then copy the resulting image to Nutanix using a script that logs into a CVM ... so I'm hoping to "borrow" some of the work you guys did getting a terraform module working and apply it to packer so that pain is alleviated as well 🙂. But that's also ... well ... phase 2.
Justin-DynamicD, is there a reason you are sticking with the VM constructs for doing your work and not leveraging an orchestration system (k8s, mesos, swarm, etc)?
Multi-tenancy, security, 12-factor limits? Feels like a ton of custom work to keep the fabric coherent, so I'm interested to understand the motivation.
To be clear, Nomad is our orchestration/scheduling system. We send it jobs, and it figures out the target host, ensures the container is healthy and stays running, and handles canary/blue-green updates, etc. We are absolutely NOT running without orchestration, and I'd like to make clear that I'd never recommend anyone do that beyond simple container testing.

But I think your question may be more about why the Hashi-tools vs a pre-bundled system like Kubernetes/Swarm/Mesos? Well, the answer is a good mental exercise for me, so buckle in :)

Why Not Kubernetes?

So this one has some very company-specific history and needs that color this decision, so I'll try to be transparent.
When I started at this current company, Kubernetes was the incumbent. We had already deployed it in numerous locations multiple times. Remember that Kubernetes was (is) young at that time; I think we were using 1.2. It wasn't until after nearly a full year of failures that we completely abandoned not just k8s, but _all_ orchestration systems, in order to meet a deadline because of constant issues. So when we knew we had to return to evaluating an orchestration system, k8s had a stigma on day one that didn't help its case. K8s has matured a lot since then, and now that AWS is finally offering a managed k8s service like Google/Azure we may revisit it, but let's break down its shortcomings.

First off: deploying k8s is hard. Really hard.
Yes, there are vagrant files and canned scripts, but we are a fintech company, so nearly all of those scripts use some kind of "root access this service" shortcut or generic image that simply doesn't pass the security standards we need to comply with. Only recently with 1.8 has CIS even released hardening recommendations for k8s, so push it back a year and deploying k8s meant taking someone else's work, going line-by-line through said code and making sure all steps complied with least privilege, then running said code and discovering that your /tmp mount lockdowns (a CIS OS recommendation) completely broke deployment. We ended up using Rancher to deploy Kubernetes, as it was the only reliable way we could get deployments to work. But that just meant trading the complexities of Kubernetes for the complexities of Rancher ... and basically, at the end of the day, when someone said "we need a new k8s stack" it was an exercise in pain (would Rancher work, or would it fall over? Once deployed, would k8s behave, or would something else want access to something our hardening wouldn't allow and cause more problems?).

You'll note security is a theme in the deployment complexities I listed.
This was the straw that eventually broke the camel's back, because even now, the week after KubeCon 17, k8s's security is abysmal. See, even once k8s is up, its security model is basically non-existent, and needs to be supplemented by things like Hashicorp Vault for secrets management and tools like NeuVector or Twistlock for SDN firewall/security. Without them you end up with an extremely complex orchestration tool that needs to be heavily segmented in order to meet security requirements. To quote Hightower during his KubeCon keynote: "Use separate clusters depending on your org chart ... do not go down the RBAC rabbit hole if you don't have to."

The reality of this quote is that k8s has no RBAC to speak of.
The biggest strength of an orchestration system is to pull systems into a compute pool, much like Nutanix does for VMs. But without any meaningful way to isolate container deployment/access you end up building a herd of k8s clusters ... so where's the simplicity? At one (low) point, we had a single production system in AWS running on 6 separate k8s clusters, and some "clusters" were little more than a single box. NeuVector solved this for us (awesome container stateful firewall, take a look), but still, we are now layering on top of k8s, which is already itself a compilation of 7-8 systems.

At the end of the day, k8s is a system of tightly coupled tools, a lot like OpenStack. You are just grabbing said bundle so you don't have to think about those tools independently.
But if those bundled tools don't fit your use-case, are you going to layer on Helm, add Calico, bring in NeuVector, add Vault ... or just go with something else? In our case, no matter how we approached k8s, we had to layer heavily, so it made sense to just start much simpler. Again: k8s is now going to be bundled into Docker, so this story is changing, and we may look again if it crosses the complexity (and security) hurdles it needs to. But until then it's a matter of mitigating normal accidents: https://en.wikipedia.org/wiki/Normal_Accidents

Why not Swarm?

We actually were very close to picking Swarm when we knew it was time to revisit orchestration, and for a short while we even had our CI generating compose files for services. The coolest thing about Swarm is that it is the polar opposite of k8s when it comes to standing up: it's dirt simple. docker swarm init. Done.
It was so easy to do that we were kind of euphoric over it for a while. So why didn't we settle on that? Slightly different reasons, but at the end of the day Swarm had two flaws that set it back.

The first was that as new versions of Swarm were released, the flexibility of options was reduced. That simplicity I really loved (one-line deployment) came at a cost of no longer being able to leverage custom SDN backends (it used to use Consul/etcd but now uses something completely native). We still have plenty of legacy systems, and we had already deployed Consul to be our SDN backend (it works wonderfully well with Prometheus), so abandoning Consul was a bit of an annoyance. Not a critical flaw, but it was an odd direction to see a service like Swarm _remove_ options as it developed instead of adding them.
The issue that made it ultimately "lose the vote" is that Swarm completely lacks any kind of remote management in the CE version. Again: security. If you want to stand up a new service, drain a node, anything, you need to log into the Swarm leader to send your docker commands. There's no remote API whatsoever. My local docker can't connect and issue commands; I can't even curl commands. That's a big deal for us, as we have a long-term goal of immutable boxes and plan on completely disabling SSH at some point. So Swarm ... is literally only capable of running in a personal-lab capacity (and some do it).

Why did we end up on Nomad?

So you read earlier that we already had Consul in use due to the way we deployed Prometheus. We stumbled on Nomad when exploring our options and it simply checked all the boxes.
To be honest, there was a short debate on whether Nomad could possibly do what Kubernetes did ... but when we really drilled into what we wanted, it was three things:

1. Service discovery. We technically already had that in Consul, and Nomad could leverage that (vs. the k8s etcd or Swarm's proprietary DNS).
2. Security/secrets management. Vault is simply fantastic here. If we were deploying k8s again we'd be using it anyway (they added full support, which is awesome, so try it if you're a k8s fan), and Nomad is from the same company, so that support is also already baked in.
3. Container lifecycle management. This is precisely what Nomad is meant to handle. We created a sublist of what that meant to us and it checked the boxes.

So if we already had a tool to satisfy #1, and #2 had to happen regardless, then it really came down to what was easier to deploy and manage when our goals are #3 and only #3.
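To put #1-#3 in concrete terms, here's roughly what those three things look like stitched together in a single job spec (the image, policy, and secret path are made up for illustration):

```hcl
job "payments" {
  datacenters = ["prod"]
  type        = "service"

  # 3. lifecycle management: Nomad rolls out one allocation at a time, ships a
  # canary first (promoted once it looks healthy), and reverts a failed deploy
  update {
    max_parallel = 1
    canary       = 1
    auto_revert  = true
  }

  group "app" {
    count = 2

    task "app" {
      driver = "docker"

      config {
        image = "registry.example.com/payments:2.3.1"
      }

      # 2. secrets: Nomad pulls a Vault token scoped to this (made-up) policy ...
      vault {
        policies = ["payments-read"]
      }

      # ... and templates the secret into the task instead of baking it into the image
      template {
        destination = "secrets/db.env"
        env         = true
        data        = <<EOT
DB_PASSWORD={{ with secret "secret/payments/db" }}{{ .Data.password }}{{ end }}
EOT
      }

      # 1. service discovery: registered in Consul like everything else we run
      service {
        name = "payments"
        tags = ["proxy"]
      }
    }
  }
}
```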
Remember my rant on how much trouble we had reliably deploying k8s in a secure environment? Nomad is a single binary. One file. You download it. You add a JSON/HCL file to set your server settings. That's it. It's very predictable in how it runs, which makes it very easy to lock down. If I want a new Nomad client to add to my cluster, I bring up a new box and run the nomad binary. Done. That's very easy to put behind Terraform, Puppet, Chef, whatever. All I have to do is make sure said config I copied has it defined as a client and is given the name of the server. I can add ACLs and encryption keys to complicate this, but at its core this is dirt simple.
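For reference, that one HCL file for a new client node is roughly this (names and addresses are examples):

```hcl
# client.hcl - dropped onto a new box before running the nomad binary
datacenter = "prod"
data_dir   = "/opt/nomad"

client {
  enabled = true
  # point it at the existing servers and it joins the compute pool
  servers = ["nomad-server-1.example.internal:4647"]
}
```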
Nomad also came with a fringe benefit: it's not just for containers. You can also use it to run distributed batch jobs and other tasks. We haven't explored this that much yet, but it's nice to know that as an orchestration tool Nomad extends beyond Docker containers.

So hopefully that explains how we ended up where we did. A bit of bad blood with k8s, followed by a realization that all we really needed could be satisfied by one little binary vs. an entire system.

EDIT: this is riddled with typos and run-ons. But them's the breaks when you type in between work duties :)