Windows guest failover clusters and Async replication

  • 8 April 2022

I'm hoping someone has a similar setup. I originally posted this question a month or so ago, but the answer I received isn't correct. That post has been marked as solved, so I can't reply to it, which is why I'm opening a new one.

I think there was a misunderstanding of the setup I described in the previous post, so I've tried to make it clearer here. This is tricky to explain without diagrams, so I appreciate it may be difficult to give a valid answer and I may not get a response.

Our Setup

We have two sites connected by fast links, with one Nutanix cluster at each site (16 nodes in each Nutanix cluster).

These two Nutanix clusters make up one stretched ESXi cluster hosting mainly Windows servers. We then use Metro Availability to replicate between these sites\Nutanix clusters.

So in short: one ESXi cluster that's stretched across two sites, and we use Nutanix Metro to replicate synchronously.

In this environment we host a number of guest failover SQL clusters (2 VM nodes in each cluster), which require shared storage and therefore require the use of Nutanix volume groups. Each VM is connected to the volume group via iSCSI.

These VMs can't reside in Metro-enabled containers\datastores, so we can't use Metro replication to replicate these VMs and their associated volume groups to the other site. Replicating volume groups with Metro Availability is not supported.

So the only other option for us to replicate these VMs to our other site is to use Async replication.

First Questions around Async replication

  1. In the Nutanix manual, the conditions for Async say "Do not have VMs with the same name on the primary and the secondary clusters. Otherwise, it may affect the recovery procedures"

Q: If we fail over the async protection domain, which contains the cluster node VMs and their associated volume groups, to the secondary Nutanix cluster, will this cause a conflict when the VMs are registered back into the same ESXi cluster?

It's the same ESXi cluster due to the way we have it configured for Metro, so the VM names will ultimately be the same when they are failed over.

  2. Again from the Async guide: "Do not include the source and the destination cluster in the same datacenter because the VMs could get deleted from both the source and destination clusters post the migration process"

Q: When it refers to a datacenter, does it mean an ESXi datacenter? Our ESXi cluster, which spans both Nutanix clusters, is in the same ESXi datacenter.

  3. Because the new failover clusters are running Windows Server 2016 and 2019, it turns out that the related entity selection process, which links the servers and their associated volume groups together when you set up the protection domain, won't work for these versions of Windows.

This means that when we fail over the VMs to the other Nutanix site, there's no workflow job to reconnect the guest iSCSI disks to the secondary site's data services IP. We will basically need to do this manually. It's not a massive deal but a bit of a pain :(

What I'm thinking of doing with all of the above in mind is to:

  • Have one node of the guest SQL cluster run on our primary Nutanix site (in a local container\datastore) and have the other guest SQL cluster node run on the secondary Nutanix site (again in a local container\datastore).  
  • Because this is an active\passive SQL failover cluster I will set the SQL role to run on the cluster node located on the primary Nutanix site. 
  • When a failover of the SQL cluster role needs to occur (because of Windows patching or a failure scenario), the role can still fail over to the cluster node running on the secondary Nutanix site, as this cluster node will be pointing at the volume group located on the primary site. Although I/O will be going across sites during this time, this will only be temporary until we fail the role back.
  • I could then set up a protection domain to replicate only the volume group and not the individual VM cluster nodes.

In the event that I need to fail over this protection domain, I'd just need to reconnect the iSCSI disks on any of the available SQL cluster nodes to the secondary site's data services IP.
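To make the manual reconnect step concrete, here is a minimal sketch in Python that only builds the PowerShell commands an admin might run inside each SQL cluster node after the failover. It does not touch Nutanix or Windows itself. The function name `reconnect_commands` and the two IP addresses are hypothetical placeholders, and the three-step sequence (drop the old portal, add the new one, connect any discovered targets persistently) is an assumption about the procedure rather than a documented Nutanix workflow.

```python
import ipaddress


def reconnect_commands(old_dsip: str, new_dsip: str) -> list[str]:
    """Build the PowerShell commands to repoint a guest's iSCSI initiator.

    old_dsip / new_dsip are the data services IPs of the site being left
    and the site being failed over to (placeholder values, not real ones).
    """
    # Validate both addresses up front so a typo fails here, not on the node.
    old_ip = ipaddress.ip_address(old_dsip)
    new_ip = ipaddress.ip_address(new_dsip)
    if old_ip == new_ip:
        raise ValueError("old and new data services IPs are identical")

    return [
        # Assumed step 1: forget the portal at the site that was failed away from.
        f"Remove-IscsiTargetPortal -TargetPortalAddress {old_ip} -Confirm:$false",
        # Assumed step 2: register the secondary site's data services IP as the portal.
        f"New-IscsiTargetPortal -TargetPortalAddress {new_ip}",
        # Assumed step 3: connect every discovered target that is not yet connected,
        # and make the connection persist across reboots.
        "Get-IscsiTarget | Where-Object { -not $_.IsConnected } "
        "| Connect-IscsiTarget -IsPersistent $true",
    ]


if __name__ == "__main__":
    # Documentation-range addresses used purely as examples.
    for command in reconnect_commands("192.0.2.10", "198.51.100.10"):
        print(command)
```

The output could be pasted into an elevated PowerShell session on each node. Any CHAP or MPIO settings in use would need extra parameters that this sketch leaves out.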

Q: Does anyone else have a similar setup?  I'm interested to know how you replicate your guest clusters in this kind of setup.


Thanks in advance.
