
Async replication and Windows failover guest clusters

  • 11 March 2022
  • 1 reply


Hi

I'm hoping someone has a similar setup.  

We have 2 sites connected by fast links, with one Nutanix cluster at each site (16 nodes in each cluster). We use Metro Availability to replicate between sites. We have 1 stretched ESXi cluster hosting mainly Windows servers.

We have a number of VM guest failover SQL clusters (2 nodes in each cluster), which require shared storage and therefore require the use of Nutanix volume groups. Each VM is connected to the volume group via iSCSI.

We can't use Metro Availability to replicate these VMs and their associated volume groups to the other site, because replicating volume groups with Metro Availability is not supported.

So, the only other option is to use Async replication.  Here are my questions:

  1. In the manual, the conditions for Async say: "Do not have VMs with the same name on the primary and the secondary clusters. Otherwise, it may affect the recovery procedures"


Q: If we fail over the async protection domain containing the cluster node VMs and their volume groups to the secondary Nutanix cluster, will this cause a conflict when the VMs are registered back into the same ESXi cluster? Because of the Metro configuration it's the same ESXi cluster, so the VM names will ultimately be the same.
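To make the naming concern concrete, here is a minimal sketch of a pre-failover check, assuming you can export two lists of VM names: the members of the async protection domain and the VMs already registered in the target ESXi cluster. The function name and the VM names are hypothetical placeholders, and matching is assumed to be exact and case-sensitive.

```python
# Hypothetical pre-failover check for the async "no duplicate VM names" rule.
# The VM names below are illustrative placeholders; real lists would come from
# Prism (protection domain members) and vCenter (target inventory).


def conflicting_vm_names(protected: list[str], target_inventory: list[str]) -> list[str]:
    """Return names that appear both in the protection domain and in the
    inventory the VMs would be registered into (exact, case-sensitive match)."""
    return sorted(set(protected) & set(target_inventory))


if __name__ == "__main__":
    # With one stretched ESXi cluster, the original VMs are still registered
    # in the same inventory the recovered copies would land in.
    protected_vms = ["sql01-node1", "sql01-node2"]
    stretched_cluster_vms = ["sql01-node1", "sql01-node2", "app01", "web01"]
    for name in conflicting_vm_names(protected_vms, stretched_cluster_vms):
        print(f"Possible name conflict on recovery: {name}")
```

With a single stretched cluster, every protected VM shows up in the target inventory, which is exactly the situation the manual's condition warns about.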

  2. Again from the Async guide: "Do not include the source and the destination cluster in the same datacenter because the VMs could get deleted from both the source and destination clusters post the migration process"


Q: When it refers to a datacenter, is it referring to an ESXi Datacenter? Our ESXi cluster, which spans both Nutanix clusters, is in the same ESXi Datacenter.

 

  3. Because the new failover clusters are running Windows Server 2016 and 2019, it turns out that the related entity selection process, which links the servers and their associated volume groups together when you set up the protection domain, won't work for these versions of Windows. This means that when we fail over the VMs to the other Nutanix site, there's no workflow job to reconnect the guest iSCSI disks to the secondary site's data services IP. We will basically need to do this manually. It's not a massive deal, but a bit of a pain :(


With all of the above in mind, what I'm thinking of doing is:

Run one node of the SQL cluster on our primary Nutanix site and the other cluster node on the secondary Nutanix site. Because this is an active/passive failover cluster, I will set the SQL role to run on the cluster node on the primary site.

When a failover occurs (because of Windows patching or for any other reason), the services can still fail over to the node running on the secondary Nutanix site, as this node will be pointing at the VG located on the primary site. Although I/O will cross sites during this time, this will only be temporary while patching takes place, and because of the fast links it won't really be a problem.

Once any patching is complete, the services will be set to fail back to the cluster node running on the primary Nutanix site.

I can then set up a protection domain that replicates only the volume group. This means that if I have a problem with the primary site and need to fail over, I will fail over the protection domain, which contains only the volume group. No VMs will need to be brought online at the secondary site, which avoids any name conflicts. The only thing I'll need to do on the cluster node running at the secondary site is configure a script to reconnect the iSCSI disks to the secondary site's data services IP.
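The reconnect step could look roughly like the sketch below. It only builds PowerShell command strings using cmdlets from the Windows iSCSI module (`Disconnect-IscsiTarget`, `Remove-IscsiTargetPortal`, `New-IscsiTargetPortal`, `Connect-IscsiTarget`) and does not run anything. The IP addresses, IQNs and function name are hypothetical placeholders; it assumes the restored volume group's target IQNs are known, and it's worth confirming them with `Get-IscsiTarget` after the new portal is added, since they may not match the originals. It also doesn't cover bringing the cluster disk resources back online.

```python
# Sketch of the "reconnect iSCSI to the secondary site" step. It only builds
# PowerShell command strings (Windows iSCSI module cmdlets) for review or for
# pasting into a script; it does not execute anything.
# The IP addresses and IQNs are hypothetical placeholders.


def reconnect_commands(old_dsip: str, new_dsip: str, target_iqns: list[str]) -> list[str]:
    cmds = []
    # Drop sessions to the old targets; tolerate errors because the primary
    # site may already be unreachable.
    for iqn in target_iqns:
        cmds.append(
            f"Disconnect-IscsiTarget -NodeAddress '{iqn}' "
            "-Confirm:$false -ErrorAction SilentlyContinue"
        )
    # Swap the discovery portal from the primary DSIP to the secondary DSIP.
    cmds.append(
        f"Remove-IscsiTargetPortal -TargetPortalAddress '{old_dsip}' "
        "-Confirm:$false -ErrorAction SilentlyContinue"
    )
    cmds.append(f"New-IscsiTargetPortal -TargetPortalAddress '{new_dsip}'")
    # Log in to each target through the new portal, persistently so the
    # session is restored after a guest reboot.
    for iqn in target_iqns:
        cmds.append(
            f"Connect-IscsiTarget -NodeAddress '{iqn}' "
            f"-TargetPortalAddress '{new_dsip}' -IsPersistent $true"
        )
    return cmds


if __name__ == "__main__":
    for line in reconnect_commands(
        old_dsip="10.0.0.50",                        # hypothetical primary DSIP
        new_dsip="10.1.0.50",                        # hypothetical secondary DSIP
        target_iqns=["iqn.example:sql01-shared-vg"],  # hypothetical IQN
    ):
        print(line)
```

Generating the commands rather than executing them keeps the sketch easy to review before it is wired into an actual failover runbook.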

Q: Does anyone else have a similar setup?  I'm interested to know how you've set it up and what problems you’ve run into if any.

Thanks in advance.


Best answer by DavidN 21 March 2022, 20:10





While I understand your question (a setup of one stretched ESXi cluster spread across two different 16-node Nutanix clusters), I don't know if this is actually supported with respect to Metro-type DR...

https://portal.nutanix.com/page/documents/details?targetId=Prism-Element-Data-Protection-Guide:sto-metro-availability-c.html

 

  • Nutanix cluster nodes to vSphere cluster mapping: The Nutanix data distribution architecture does not support the mapping of the nodes of a Nutanix cluster to multiple vSphere clusters.

     

  • Note: If the nodes of a Nutanix cluster are not in the same vSphere cluster and are split into multiple vSphere clusters, disaster recovery operations like migration and recovery fail.

    Ensure that all the nodes in the Nutanix cluster are in the same vSphere (ESXi) host cluster and that the network is available on all the nodes in the Nutanix cluster at the primary or remote site. Any configuration using Nutanix DR where there is more than one ESXi cluster on top of a single Nutanix cluster is not supported.
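The mapping rule in the quoted documentation can be checked mechanically from a host inventory. Below is a minimal sketch, assuming you can list each ESXi host with the Nutanix cluster and the vSphere cluster it belongs to. The function name and all host and cluster names are hypothetical placeholders. It flags any Nutanix cluster whose nodes are split across more than one vSphere cluster, which is the case the documentation says is not supported.

```python
from collections import defaultdict

# Hypothetical check of the quoted rule: all nodes of one Nutanix cluster must
# sit in a single vSphere cluster. Host and cluster names are placeholders.


def split_nutanix_clusters(host_map: dict[str, tuple[str, str]]) -> dict[str, set[str]]:
    """host_map maps ESXi host -> (Nutanix cluster, vSphere cluster).
    Returns each Nutanix cluster whose nodes span more than one vSphere cluster."""
    spans: dict[str, set[str]] = defaultdict(set)
    for nutanix_cluster, vsphere_cluster in host_map.values():
        spans[nutanix_cluster].add(vsphere_cluster)
    return {ntx: vs for ntx, vs in spans.items() if len(vs) > 1}


if __name__ == "__main__":
    hosts = {
        "esx-a1": ("ntnx-site-a", "vsphere-prod"),
        "esx-a2": ("ntnx-site-a", "vsphere-test"),  # splits ntnx-site-a: flagged
        "esx-b1": ("ntnx-site-b", "vsphere-prod"),
    }
    for cluster, vsphere in split_nutanix_clusters(hosts).items():
        print(f"{cluster} spans vSphere clusters: {sorted(vsphere)}")
```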

 

I really recommend contacting Nutanix regarding the options you have…

(My gut feeling is that they'll most likely recommend creating a new VMware cluster and moving ESXi hosts from the single existing VMware cluster to the new cluster one at a time, to better represent the Metro DR scenario...)