We have several AHV clusters across the WANscape all managed from Prism Central.
On one AHV cluster, I created a Windows Server 2016 VM and a Windows Server 2019 VM from CD ISOs, then patched and tweaked them; these are our “Golden Images”. I clone those and run sysprep on the clones, and the sysprepped clones are what I make new VMs from.
Now I have two problems: 1.) in each cluster we have an RF2 container and an RF3 container so these VMs are bound to one or the other, and 2.) how do I get these to other clusters?
To address the differing containers, I end up using the image service (create image of the disks and create new VM from those disks). But now I’m wasting storage because I’m keeping copies of the same VM on two different containers.
To address the differing clusters, I was thinking about setting up Protection Domains and letting Nutanix replicate the VMs to each cluster. They would only get updated once or twice per month for Windows patches. Doing it this way would allow me to control the replication schedule and bandwidth, but it would only replicate snapshots, so recreating the VMs on the other end would require manually restoring from a snapshot.
If I am not mistaken, there is an Image Service in Prism Central that would be globally available to all the clusters? If I put the disk images in that, is there a way of controlling the bandwidth used?
Does anyone have any best practices to accomplish this?
Thanks!
Best answer by TimGaray
So this is what I ended up doing. I will do it this way until something changes, such as the ability to deploy across different storage containers. So far only one AHV cluster has more than one storage container, so it isn’t too bad yet.
On the source AHV cluster, I have a Windows 2012R2, a Windows 2016 and a Windows 2019 “golden image” VM set up. I then clone those, run sysprep on the clones and shut them down.
I have a protection domain set up to replicate those sysprepped VMs to the other AHV clusters every night. I also have the original (not sysprepped) VMs set up in a local protection domain just for backup purposes.
When I update the golden images with Windows Patches, I wait until the new images are replicated to the remote AHV clusters. I wrote a PowerShell script that will go through each remote AHV cluster and delete existing disk images, restore the sysprepped VMs from snapshots, upload new disk images from the restored VMs then delete the restored VMs. It doesn’t take that long to run (maybe 20 minutes or so for all of them).
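The order of operations described above can be sketched as a small plan builder. This is only an illustrative Python sketch, not the PowerShell script from this post: the cluster names, container names and action labels are hypothetical, it makes no Nutanix API calls, and it assumes one image per storage container on each cluster.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str, str]  # (cluster, action, target)


def build_refresh_plan(clusters: Dict[str, List[str]], golden_vms: List[str]) -> List[Step]:
    """Return the ordered refresh steps for each cluster.

    clusters maps a cluster name to its storage containers (hypothetical names).
    The action strings are labels for illustration, not real cmdlets or API calls.
    """
    plan: List[Step] = []
    for cluster, containers in clusters.items():
        # 1. Delete the existing disk images on every container.
        for vm in golden_vms:
            for container in containers:
                plan.append((cluster, "delete_image", f"{vm}@{container}"))
        # 2. Restore the sysprepped VMs from the replicated snapshots.
        for vm in golden_vms:
            plan.append((cluster, "restore_vm_from_snapshot", vm))
        # 3. Create new disk images from the restored VMs, one per container.
        for vm in golden_vms:
            for container in containers:
                plan.append((cluster, "create_image_from_vm_disk", f"{vm}@{container}"))
        # 4. Delete the restored VMs once the images exist.
        for vm in golden_vms:
            plan.append((cluster, "delete_restored_vm", vm))
    return plan


if __name__ == "__main__":
    example = build_refresh_plan({"remote-a": ["rf2"]}, ["win2019-sysprep"])
    for cluster, action, target in example:
        print(cluster, action, target)
```

Building the plan as data first makes it easy to review or log the steps before anything is deleted on a remote cluster.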
I also have a PowerShell script to do the same thing on the source AHV cluster to create images for the RF2 storage container and the RF3 storage container.
It sounds like a lot, but with the PowerShell scripts it’s not that bad. So far bandwidth hasn’t been an issue, but as we bring more AHV clusters onboard at remote sites, there may be cases where we need to throttle the bandwidth of the protection domain to those remote locations.
Help me understand the setup a little better, please. There are 2 VMs on 2 different containers. The VMs serve as Gold images not only for the cluster they reside on but also for other clusters within the environment.
The goal is to be able to deploy VMs off these Gold images to both containers and to other clusters. Is that a correct summary of the task?
There is no best practices guide for images or templates at the moment.
I would love to give you a solution as this is a valid request, but I can’t and I apologise. Improvements to the process and the feature are in the pipeline.
At this point in time, you could use the Prism Central image management feature. Note that with this feature, images would be placed in the SelfService container, both at Prism Central and at the cluster that receives the image. From there you would have to copy the images to all of the containers of interest.
The Image Management section of the Prism Central Guide talks about how to upload images from a workstation or a central server, or import them from one of the clusters.
You could use Protection Domains, of course. As you can imagine, it would still require some manual effort to create images from the VMs and place them into all the containers that you wish to deploy from.
I sincerely hope that the above helps. Please let me know if you have further questions.
Thank you @Alona, that is a correct description of the environment.
Due to the multiple containers in each cluster (one RF2 and one RF3), I am leaning towards using the Prism Central image management. Thank you for the link to the guide.
If I get something up and running, I will be sure to post back my results.
After reviewing the Image Management material, although images seem to make the most sense there does not seem to be a way to throttle the bandwidth. Some of our sites have small WAN connections and if I upload an 80GB disk image then Prism Central could flood that connection when transferring the image to that remote cluster.
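To put a rough number on that concern, here is a back-of-the-envelope Python sketch of how long an unthrottled transfer would occupy a link. The link speeds are hypothetical examples, and the calculation ignores protocol overhead and any compression. For an 80 GB image it works out to roughly 1.8 hours at 100 Mbit/s and roughly 8.9 hours at 20 Mbit/s.

```python
def transfer_hours(size_gb: float, link_mbps: float, utilisation: float = 1.0) -> float:
    """Hours needed to move size_gb (decimal gigabytes) over a link.

    link_mbps is the link speed in megabits per second.
    utilisation is the fraction of the link the transfer is allowed to use (0 < u <= 1).
    """
    if link_mbps <= 0 or not 0 < utilisation <= 1:
        raise ValueError("link_mbps must be positive and utilisation in (0, 1]")
    bits = size_gb * 8e9                      # 1 GB = 8e9 bits (decimal units)
    seconds = bits / (link_mbps * 1e6 * utilisation)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical WAN link speeds, for illustration only.
    for mbps in (20, 100):
        print(f"80 GB over {mbps} Mbit/s: {transfer_hours(80, mbps):.1f} h")
```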
I understand your concern in relation to bandwidth. There is no obvious option to throttle bandwidth, unfortunately. I would suggest opening a case with Nutanix Support to ask about possible custom tuning.
Look forward to reading about the results of your efforts.
It would appear that images are also stored on specific storage containers and it won’t let me create a VM in one container with a disk image in another container.
So, whether I create a VM from disk images or clone from VMs, I have to keep a set on each storage container in a cluster (usually two: one RF2 and one RF3).
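As a rough illustration of the storage cost of keeping one copy per container, the Python sketch below adds up the raw footprint, assuming RF2 keeps two copies of the data and RF3 keeps three. It ignores compression, deduplication and erasure coding, so real usage could be lower; the 80 GB figure is just the example size mentioned earlier in the thread.

```python
from typing import Iterable


def raw_footprint_gb(image_gb: float, replication_factors: Iterable[int]) -> float:
    """Raw capacity consumed when one copy of an image is kept on each container.

    replication_factors holds one entry per container, e.g. [2, 3] for RF2 + RF3.
    Data-reduction features are deliberately ignored in this estimate.
    """
    return sum(image_gb * rf for rf in replication_factors)


if __name__ == "__main__":
    # One 80 GB image kept on an RF2 container and an RF3 container.
    print(raw_footprint_gb(80, [2, 3]), "GB raw")
```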
I’ll put in a support ticket and see what happens.