Hi
I'm hoping someone has a similar setup.
We have 2 sites connected by fast links, with one Nutanix cluster at each site (16 nodes in each cluster). We use Metro Availability to replicate between sites. We have 1 stretched ESXi cluster hosting mainly Windows servers.
We have a number of VM guest failover SQL clusters (2 nodes in each cluster) which require shared storage and therefore require the use of Nutanix volume groups. Each VM is connected to the volume group via iSCSI.
We can't use Metro replication to replicate these VMs and their associated volume groups to the other site, as replicating volume groups with Metro Availability is not supported.
So, the only other option is to use Async replication. Here are my questions:
- In the manual the conditions for Async say "Do not have VMs with the same name on the primary and the secondary clusters. Otherwise, it may affect the recovery procedures"
Q: If we fail over the async protection domain containing the cluster node VMs and their volume groups to the secondary Nutanix cluster, will this cause a conflict when the VMs are registered back into the same ESXi cluster? It's the same ESXi cluster due to the Metro configuration, so the VM names will ultimately be the same.
- Again from the Async guide "Do not include the source and the destination cluster in the same datacenter because the VMs could get deleted from both the source and destination clusters post the migration process"
Q: When it refers to "datacenter", is it referring to an ESXi Datacenter? Our ESXi cluster, which spans both Nutanix clusters, is in the same ESXi Datacenter.
- Because the new failover clusters are running Windows Server 2016 and 2019, it turns out that the related entity selection process, which links the servers and their associated volume groups together when you set up the protection domain, won't work for these versions of Windows. This means that when we fail over the VMs to the other Nutanix site there's no workflow job to reconnect the guest iSCSI disks to the secondary site's data services IP. We will basically need to do this manually. It's not a massive deal but a bit of a pain :(
What I'm thinking of doing with all of the above in mind is to:
Run one node of the SQL cluster on our primary Nutanix site and the other cluster node on the secondary Nutanix site. Because this is an active/passive failover cluster, I will set the SQL role to run on the cluster node on the primary site.
When a failover occurs (because of Windows patching or for any other reason), the services can still fail over to the node running on the secondary Nutanix site, as this node will be pointing at the VG located on the primary site. Although I/O will be going across sites during this time, this will only be temporary while patching is taking place, and because of the fast links it won't really be a problem.
Once any patching is complete, the services will be set to fail back to the cluster node running on the primary Nutanix site.
I can then set up a protection domain to replicate only the volume group. This means that if I have a problem with the primary site and need to fail over, I will fail over the protection domain, which contains only the volume group. No VMs will need to be brought online in the secondary site, avoiding any conflicts. The only thing I'll need to do on the cluster node running on the secondary site is configure a script to reconnect the iSCSI disks to the secondary site's data services IP.
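For what it's worth, here is a rough sketch of the kind of reconnect script I have in mind, in case it helps frame the question. It is untested against our environment and makes several assumptions: the function names are made up, the IP addresses are placeholders rather than our real data services IPs, and it assumes the guest uses the built-in Windows iSCSI initiator cmdlets (New-IscsiTargetPortal, Connect-IscsiTarget and friends) with no other iSCSI targets attached to the VM. It only builds the PowerShell commands and prints them unless dry_run is turned off.

```python
import subprocess


def build_reconnect_commands(old_portal_ip, new_portal_ip, port=3260):
    """Return the PowerShell commands, in order, to repoint guest iSCSI at a new portal.

    Both IPs are placeholders supplied by the caller (hypothetical values).
    """
    return [
        # Drop any sessions still pointing at the failed site.
        # Assumption: this VM has no other iSCSI targets that must stay connected.
        "Get-IscsiTarget | Where-Object IsConnected | "
        "Disconnect-IscsiTarget -Confirm:$false -ErrorAction SilentlyContinue",
        # Remove the old site's data services IP as a discovery portal.
        f"Remove-IscsiTargetPortal -TargetPortalAddress {old_portal_ip} "
        "-Confirm:$false -ErrorAction SilentlyContinue",
        # Register the surviving site's data services IP as the new portal.
        f"New-IscsiTargetPortal -TargetPortalAddress {new_portal_ip} "
        f"-TargetPortalPortNumber {port}",
        # Log in to every discovered target and keep the login across reboots.
        "Get-IscsiTarget | Where-Object { -not $_.IsConnected } | "
        "Connect-IscsiTarget -IsPersistent $true",
    ]


def run_commands(commands, dry_run=True):
    """Print the commands, or run them through powershell.exe when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(cmd)
        else:
            subprocess.run(
                ["powershell.exe", "-NoProfile", "-Command", cmd], check=True
            )


if __name__ == "__main__":
    # Placeholder addresses only; substitute each site's data services IP.
    plan = build_reconnect_commands("10.0.1.50", "10.0.2.50")
    run_commands(plan, dry_run=True)
```

The same steps could obviously be written directly in PowerShell; the point is just the order: disconnect, remove the old portal, add the new portal, reconnect persistently. The cluster disks may still need to be brought online in Failover Cluster Manager afterwards.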
Q: Does anyone else have a similar setup? I'm interested to know how you've set it up and what problems you’ve run into if any.
Thanks in advance.