I wanted to engage the community to share our fault-tolerant, Nutanix-powered VMware Horizon VDI design and architecture in the hope that it opens some dialog and potentially assists other customers as they venture into this effort.
I would be remiss to dive into the design without sharing some background information. I work for a quasi-governmental airport authority that employs a relatively small team of six dedicated IT staff. Because we are designing a solution for an airport, fault tolerance and high availability are absolute musts. Some of our desktops are mission-critical, life-safety machines. The desktop systems in this cluster support our fire department, police department and airport operations. With our datacenters on a campus surrounded daily by hundreds of swarming planes flying at low altitudes, we spend a lot of time designing our infrastructure. The last time we want problems presenting a desktop is during an incident. We never close, and operations run 24/7, 365 days a year, so we try to design things so that we are not called in to work on Christmas Eve at midnight.
After a few iterations, we landed on standing up two separate clusters of three Nutanix servers each. Each of these two clusters has its own vCenter server (VCSA), two VMware Horizon View Connection Servers (internal and external), a file server and a VMware View Composer server. We are utilizing non-persistent Linked Clones. Users are entitled to one of several pools, depending on application needs and business unit. All pools are running Windows 10 LTSB. We are using Liquidware’s ProfileUnity to manage the profiles, in conjunction with the “normal” Group Policy settings, to improve the desktop experience. A DFS namespace and DFS replication are in place between the two file servers, and profile data is stored within this DFS space. We utilize the VMware Cloud Pod features for a Global Entitlement across the two clusters. A load balancer assists us in routing desktop requests between the two clusters.
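For anyone who finds it easier to read as a data structure, here is a rough sketch of the layout described above. The pod names (`cluster-a`, `cluster-b`), the `healthy` flag and the `pods_available` helper are made up for illustration; this is not how Horizon or the load balancer actually tracks pod state, only a way to show that each cluster carries a full copy of every component.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """One self-contained Horizon pod on its own three-node Nutanix cluster."""
    name: str                      # hypothetical label, not a real cluster name
    nutanix_nodes: int = 3
    components: tuple = (
        "vCenter (VCSA)",
        "Connection Server (internal)",
        "Connection Server (external)",
        "file server (DFS member)",
        "View Composer",
    )
    healthy: bool = True           # illustrative flag only


def pods_available(pods):
    """Return the names of the pods that could still serve desktops."""
    return [p.name for p in pods if p.healthy]


pods = [Pod("cluster-a"), Pod("cluster-b")]

# Simulate taking one side down for maintenance in the middle of the day.
pods[0].healthy = False
print(pods_available(pods))
```

Because nothing in one pod depends on a component in the other, taking one side down still leaves a complete stack for the Global Entitlement to fall back on.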
This has been a great architecture for us so far. We can run updates to vCenter and the Horizon View components in the middle of the day without the risk of affecting production availability in the other cluster. We have some challenges with replicating base images. Today we use VMware Content Libraries to replicate the base image across the two clusters, which does work, but because vCenter de-prioritizes this traffic, it takes about 20 minutes to replicate base image changes from one cluster to the other. Recomposes must be done twice (once per cluster), but this is not a big deal on Nutanix because recomposes run so quickly.
I welcome any suggestions on how to get the base images from one side to the other more efficiently. We had some challenges when we used Nutanix Metro to replicate the VM across the clusters, because we needed to create a snapshot of a single base image from within two separate vCenter servers and VMware did not like that. Anyway, I hope this helps others when they are pondering a fault-tolerant Horizon View deployment on Nutanix.
You mentioned Metro, but have you tried async replication, placing the template VM on the primary side in a protection domain that replicates to the secondary site? You could then force a replication, watch for it to finish, spin up a copy on the second cluster, snapshot it in vCenter and recompose.
Other options would be a manual copy process or vSphere host-based replication.
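The async replication idea above boils down to a simple ordered workflow, which could be scripted roughly like this. Everything here is a placeholder: `refresh_second_cluster` and the callables passed into it are hypothetical names, and the lambdas are stubs standing in for whatever Nutanix, vCenter and Horizon tooling you actually use. It only shows the sequencing (trigger, poll until done, then restore, snapshot, recompose), not any real API.

```python
import time


def refresh_second_cluster(trigger_replication, replication_done, steps,
                           poll_seconds=30, timeout_seconds=3600,
                           sleep=time.sleep):
    """Force a replication, wait for it to finish, then run follow-up steps."""
    trigger_replication()
    waited = 0
    while not replication_done():
        if waited >= timeout_seconds:
            raise TimeoutError("replication did not finish in time")
        sleep(poll_seconds)
        waited += poll_seconds
    completed = []
    for name, action in steps:
        action()                   # an exception here stops the later steps
        completed.append(name)
    return completed


# Stubs only: swap these for real calls in your environment.
log = []
checks = iter([False, False, True])   # pretend the third poll reports "done"
done = refresh_second_cluster(
    trigger_replication=lambda: log.append("replicate"),
    replication_done=lambda: next(checks),
    steps=[
        ("restore copy on second cluster", lambda: log.append("restore")),
        ("snapshot in vCenter", lambda: log.append("snapshot")),
        ("recompose pools", lambda: log.append("recompose")),
    ],
    sleep=lambda seconds: None,       # skip real waiting in this dry run
)
print(done)
```

Passing the steps in as callables keeps the ordering logic separate from the vendor-specific calls, so the same skeleton would work for a manual copy or host-based replication as well.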
We have the same design for our 2,000 CCU deployment. Works well. I don't know of a better way than content libraries. VMware should add a feature to have a central gold image that you can push to multiple pods.