Step-by-Step Guide to Deploying Nutanix Metro Availability | Nutanix Community

With Nutanix Metro Availability you can achieve an RPO of zero (0). This blog post explains how to set it up.

First let me explain my lab:

I have two Nutanix clusters (Cluster-1 and Cluster-2), freshly installed with Foundation, running AOS 5.20.3.5 and ESXi 7.0 U2. DNS and NTP are configured, and I created storage containers with the following names (all of this is done on BOTH clusters, except the vCLS containers):

  • METRO_1-2
    • This is for guest VMs running on Cluster-1 which are synced to Cluster-2;
    • Set the advertised capacity to the real capacity;
  • METRO_2-1
    • This is for guest VMs running on Cluster-2 which are synced to Cluster-1;
    • Set the advertised capacity to the real capacity;
  • vCLS-1 (Only on Cluster-1)
    • This is for the vCLS virtual machines created by vCenter;
  • vCLS-2 (Only on Cluster-2)
    • This is for the vCLS virtual machines created by vCenter.
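It is easy to create a container on the wrong cluster, so here is a minimal sketch of a layout check. It assumes you have already collected the container names per cluster, either by hand or from Prism. The `EXPECTED` dict and the `check_layout()` helper are hypothetical and are not part of any Nutanix tool. The check confirms that both METRO containers exist on both clusters and that each vCLS container exists only on its own cluster.

```python
from __future__ import annotations

# Container names from this lab; the layout dict and check_layout() are
# hypothetical helpers, not part of any Nutanix tool.
EXPECTED = {
    "Cluster-1": {"METRO_1-2", "METRO_2-1", "vCLS-1"},
    "Cluster-2": {"METRO_1-2", "METRO_2-1", "vCLS-2"},
}
VCLS_CONTAINERS = {"vCLS-1", "vCLS-2"}


def check_layout(actual: dict[str, set[str]]) -> list[str]:
    """Compare the containers found on each cluster with the plan above."""
    problems = []
    for cluster, expected in EXPECTED.items():
        present = actual.get(cluster, set())
        # Every planned container must exist on this cluster.
        for name in sorted(expected - present):
            problems.append(f"{cluster}: missing container {name}")
        # The other cluster's vCLS container must not exist here.
        for name in sorted(present & (VCLS_CONTAINERS - expected)):
            problems.append(f"{cluster}: unexpected container {name}")
    return problems


# Example: METRO_2-1 was forgotten on Cluster-1 and vCLS-2 was created there by mistake.
print(check_layout({
    "Cluster-1": {"METRO_1-2", "vCLS-1", "vCLS-2"},
    "Cluster-2": {"METRO_1-2", "METRO_2-1", "vCLS-2"},
}))
```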

Screenshot (not included): the three containers on Cluster-2. On Cluster-1 the third container is vCLS-1 instead of vCLS-2.

For Metro Availability you need to have both Nutanix clusters in the same VMware cluster in vCenter. So, deploy a vCenter instance on one of the clusters (or somewhere else, or use an already deployed vCenter). In my case the vCenter server is deployed on Cluster-1. Note: this is not best practice, because vCenter is needed for Metro Availability to operate properly. In production deployments, place vCenter outside the two Metro clusters.

In vCenter create a new datacenter and cluster:

Leave everything default. We will change this to the correct settings later.

Add all nodes to the VMware cluster:

When the nodes are added to the cluster, vCenter deploys a couple of vCLS virtual machines. Migrate them to the vCLS storage containers (you can ignore the warnings vCenter shows in the migration wizard), and distribute them as evenly as possible. In my case vCLS-1 holds 2 virtual machines and vCLS-2 only 1.
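"As evenly as possible" means round-robin: with 3 vCLS VMs and 2 containers you end up with 2 and 1, as in my lab. The sketch below only computes the placement plan and does not call vCenter. The VM names are placeholders, because vCenter generates the real vCLS names itself.

```python
from __future__ import annotations

from itertools import cycle


def distribute(vms: list[str], containers: list[str]) -> dict[str, list[str]]:
    """Assign VMs to containers round-robin, so counts differ by at most one."""
    if not containers:
        raise ValueError("need at least one container")
    placement: dict[str, list[str]] = {c: [] for c in containers}
    # zip() stops when the VM list runs out; cycle() repeats the containers.
    for vm, container in zip(vms, cycle(containers)):
        placement[container].append(vm)
    return placement


# Placeholder VM names; vCenter generates the real vCLS VM names itself.
print(distribute(["vcls-vm-a", "vcls-vm-b", "vcls-vm-c"], ["vCLS-1", "vCLS-2"]))
# {'vCLS-1': ['vcls-vm-a', 'vcls-vm-c'], 'vCLS-2': ['vcls-vm-b']}
```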

Now we need to register vCenter in Nutanix. In Prism Element go to Settings --> vCenter Registration and click Register (do this on BOTH clusters).

Set the VMware cluster configuration according to the Nutanix best practices: Nutanix Portal VMware Best Practices.

Enable DRS:

  • DRS Automation
    • Automation Level: Fully automated
    • Migration threshold: 3
    • Predictive DRS: Disabled
    • Virtual Machine Automation: Enabled
  • Additional Options
    • All unchecked
  • Power Management
    • DPM: disabled
  • Advanced Options
    • All unchecked/empty
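To spot drift later, you can keep the DRS values above as data and compare them with what a cluster reports. This is a minimal sketch under one assumption: you read the current values yourself, for example from the vSphere client or PowerCLI. The dictionary keys are descriptive labels invented for this sketch, not vSphere API property names.

```python
from __future__ import annotations

from typing import Any

# DRS settings recommended in this guide. The keys are descriptive labels
# chosen for this sketch, not vSphere API property names.
DESIRED_DRS: dict[str, Any] = {
    "automation_level": "Fully automated",
    "migration_threshold": 3,
    "predictive_drs": False,
    "vm_automation": True,
    "dpm": False,
}


def drs_drift(current: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (current, desired)} for every setting that differs."""
    return {
        key: (current.get(key), desired)
        for key, desired in DESIRED_DRS.items()
        if current.get(key) != desired
    }


# Example: a more aggressive migration threshold and DPM switched on.
print(drs_drift({**DESIRED_DRS, "migration_threshold": 4, "dpm": True}))
```

A missing key counts as drift, so an incomplete export shows up instead of silently passing.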

APD (All Paths Down) handling needs to be enabled on each host. Run the following two commands on each host via SSH:

esxcli system settings advanced set -o "/Misc/APDHandlingEnable" --int-value "1"

esxcli system settings advanced set -o "/NFS/HeartbeatTimeout" --int-value "30"

More info here: Enable/Disable APD
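With several hosts per cluster it helps to script this step. Here is a minimal sketch that builds the same two esxcli commands and, in dry-run mode (the default), only prints them per host. It assumes that SSH is enabled on each ESXi host and that root login over SSH works. The host names are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess

# The two advanced settings from this guide: (option path, integer value).
APD_SETTINGS = [
    ("/Misc/APDHandlingEnable", 1),
    ("/NFS/HeartbeatTimeout", 30),
]


def apd_commands() -> list[str]:
    """Build the esxcli command lines shown above, one per setting."""
    return [
        shlex.join(["esxcli", "system", "settings", "advanced", "set",
                    "-o", option, "--int-value", str(value)])
        for option, value in APD_SETTINGS
    ]


def apply_to_hosts(hosts: list[str], dry_run: bool = True) -> None:
    """Print (dry run) or execute the commands on every host over SSH."""
    for host in hosts:
        for cmd in apd_commands():
            if dry_run:
                print(f"[{host}] {cmd}")
            else:
                # Assumes SSH is enabled on the host and root login works.
                subprocess.run(["ssh", f"root@{host}", cmd], check=True)


# Hypothetical host names; replace with the hosts of both clusters.
apply_to_hosts(["esx01.lab.local", "esx02.lab.local"])
```

Pass `dry_run=False` only after checking the printed commands.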

Enable HA:

  • Failures and responses
    • Enable Host Monitoring: Enabled
    • Host failure response: Restart VMs
    • Response for Host Isolation: Power off and restart VMs
    • Datastore with PDL: Disabled
    • Datastore with APD: Power off and restart VMs - Aggressive restart policy
      • Response Recovery: Disabled
    • VM Monitoring: VM Monitoring Only
  • Admission Control
  • Heartbeat Datastores
    • Select the two METRO datastores
  • Advanced Options
    • Leave empty/default
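The HA values above can be captured the same way, so a misconfigured cluster is easy to spot. In this sketch a frozen dataclass holds the recommended values as defaults, and `ha_deviations()` lists every field that differs. The field names are illustrative. It also assumes the METRO containers appear in vCenter as datastores with the same names.

```python
from __future__ import annotations

from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class HASettings:
    """vSphere HA settings from this guide; field names are illustrative."""
    host_monitoring: bool = True
    host_failure_response: str = "Restart VMs"
    host_isolation_response: str = "Power off and restart VMs"
    datastore_with_pdl: str = "Disabled"
    datastore_with_apd: str = "Power off and restart VMs - Aggressive restart policy"
    apd_response_recovery: str = "Disabled"
    vm_monitoring: str = "VM Monitoring Only"
    # Assumes the METRO containers show up in vCenter under the same names.
    heartbeat_datastores: frozenset[str] = frozenset({"METRO_1-2", "METRO_2-1"})


def ha_deviations(actual: HASettings) -> list[str]:
    """List every field that differs from the recommended defaults."""
    recommended = HASettings()
    return [
        f"{f.name}: {getattr(actual, f.name)!r}, expected {getattr(recommended, f.name)!r}"
        for f in fields(HASettings)
        if getattr(actual, f.name) != getattr(recommended, f.name)
    ]


# Example: only one METRO datastore was picked for heartbeating.
print(ha_deviations(replace(HASettings(), heartbeat_datastores=frozenset({"METRO_1-2"}))))
```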

Under VM Overrides, make sure DRS and HA are disabled for the CVMs.
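Every CVM needs an override, so it is convenient to select them by name. This sketch assumes the CVMs follow the common "NTNX-<block>-<node>-CVM" naming pattern. Check this against your own inventory before relying on it. The override labels mirror the vSphere client wording and are illustrative strings only.

```python
from __future__ import annotations

import re

# Assumption: CVMs follow the usual "NTNX-<block>-<node>-CVM" naming pattern.
CVM_NAME = re.compile(r"^NTNX-.+-CVM$")

# Override values as labelled in the vSphere client (illustrative strings).
CVM_OVERRIDE = {"DRS automation level": "Disabled", "VM restart priority": "Disabled"}


def cvm_overrides(vm_names: list[str]) -> dict[str, dict[str, str]]:
    """Return the VM Override to create for every VM that looks like a CVM."""
    return {name: dict(CVM_OVERRIDE) for name in vm_names if CVM_NAME.match(name)}


# Hypothetical inventory: two CVMs and the vCenter appliance.
print(sorted(cvm_overrides(["NTNX-A1B2C3-A-CVM", "vcenter01", "NTNX-A1B2C3-B-CVM"])))
```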

Enable VMware EVC and select the EVC mode for the oldest CPU generation in the cluster.
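Because the VMware cluster spans both Nutanix clusters, the EVC mode must suit the oldest host across both. Here is a minimal sketch of that choice. The generation labels are illustrative and are not the exact EVC mode keys vSphere uses. The list is an Intel-only subset, and the host data is hypothetical.

```python
from __future__ import annotations

# Intel CPU generations ordered oldest to newest (a subset; extend as needed).
# These labels are illustrative, not the exact EVC mode keys used by vSphere.
INTEL_GENERATIONS = [
    "Merom", "Penryn", "Nehalem", "Westmere", "Sandy Bridge", "Ivy Bridge",
    "Haswell", "Broadwell", "Skylake", "Cascade Lake", "Ice Lake",
]


def cluster_evc_generation(host_generations: dict[str, str]) -> str:
    """Return the oldest generation in the cluster: the highest EVC baseline
    every host can support. Raises ValueError for unknown labels or no hosts."""
    if not host_generations:
        raise ValueError("no hosts given")
    return min(host_generations.values(), key=INTEL_GENERATIONS.index)


# Hypothetical hosts from both Nutanix clusters.
print(cluster_evc_generation({
    "esx01": "Cascade Lake", "esx02": "Cascade Lake",
    "esx03": "Skylake", "esx04": "Skylake",
}))
# Skylake
```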

Create two Host Groups:

  1. HOSTS_CLUSTER-1
    • Add all hosts from Cluster-1.
  2. HOSTS_CLUSTER-2
    • Add all hosts from Cluster-2.

Create two VM Groups:

  1. VMs_CLUSTER-1
    • Add all virtual machines which are running on Cluster-1.
  2. VMs_CLUSTER-2
    • Add all virtual machines which are running on Cluster-2.
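The group membership follows directly from which Nutanix cluster each host belongs to and which cluster each VM runs on. This sketch derives the four groups, using the names from this guide, from a hypothetical inventory mapping. Creating the groups in vCenter is still a separate step.

```python
from __future__ import annotations


def build_groups(host_cluster: dict[str, str], vm_cluster: dict[str, str]) -> dict[str, list[str]]:
    """Group hosts and VMs per Nutanix cluster using this guide's group names."""
    groups: dict[str, list[str]] = {}
    for host, cluster in sorted(host_cluster.items()):
        groups.setdefault(f"HOSTS_{cluster.upper()}", []).append(host)
    for vm, cluster in sorted(vm_cluster.items()):
        groups.setdefault(f"VMs_{cluster.upper()}", []).append(vm)
    return groups


# Hypothetical inventory: which Nutanix cluster each host / VM belongs to.
hosts = {"esx01": "Cluster-1", "esx02": "Cluster-1", "esx03": "Cluster-2", "esx04": "Cluster-2"}
vms = {"app01": "Cluster-1", "db01": "Cluster-2", "web01": "Cluster-1"}
for name, members in build_groups(hosts, vms).items():
    print(name, members)
```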

With VM/Host Rules we can decide where the virtual machines should run, while still allowing them to start/run on the other cluster in case of a failure.

Create two rules:

  1. Name: CLUSTER-1
    • Enable Rule
    • Type: Virtual Machines to Hosts
    • VM Group: VMs_CLUSTER-1
    • Should run on hosts in group
    • Note: Make sure "Should run on hosts in group" is selected (not "Must run on hosts in group"), or else the virtual machines will not run on the other cluster
    • Host Group HOSTS_CLUSTER-1
  2. Name: CLUSTER-2
    • Enable Rule
    • Type: Virtual Machines to Hosts
    • VM Group: VMs_CLUSTER-2
    • Should run on hosts in group
    • Note: Make sure "Should run on hosts in group" is selected (not "Must run on hosts in group"), or else the virtual machines will not run on the other cluster
    • Host Group HOSTS_CLUSTER-2
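The difference between "should" and "must" is the whole point of these rules. The toy model below is a simplification, not vSphere's actual scheduler. It shows that when every host in a VM's preferred group is down, a "should" rule still leaves the other cluster's hosts available, while a "must" rule leaves nowhere to run. The host names are hypothetical.

```python
from __future__ import annotations


def allowed_hosts(rule: str, group_hosts: set[str], healthy_hosts: set[str]) -> set[str]:
    """Hosts a VM may run on under a VM-to-Hosts rule (simplified model).

    "should": stay inside the host group while possible, otherwise any healthy host.
    "must":   only hosts inside the host group, even if none of them is healthy.
    """
    preferred = group_hosts & healthy_hosts
    if rule == "must":
        return preferred
    if rule == "should":
        return preferred or set(healthy_hosts)
    raise ValueError(f"unknown rule type: {rule!r}")


# Cluster-1 is down: only the Cluster-2 hosts are healthy (hypothetical names).
hosts_cluster_1 = {"esx01", "esx02"}
healthy = {"esx03", "esx04"}
print(sorted(allowed_hosts("should", hosts_cluster_1, healthy)))  # ['esx03', 'esx04']
print(sorted(allowed_hosts("must", hosts_cluster_1, healthy)))    # []
```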

We need a Witness VM to monitor both clusters. Download the Witness VM from the Nutanix portal and deploy it somewhere in a third location. Deployment guide: Witness VM Deployment. Register BOTH clusters with the witness: Settings --> Configure Witness.

In each cluster, create a remote site under Data Protection. Make sure you:

  • Enter the virtual IP of the other Nutanix cluster;
  • Create NO vStore mappings.

In Data Protection, go to Metro Availability and create the Protection Domain (Metro Availability).

!!!! MAKE SURE YOU CREATE IT IN THE CORRECT SYNC DIRECTION !!!!

So, Protection Domain METRO_01-02 (from 01 to 02, protecting container METRO_1-2) should be created on Cluster-1, and PD METRO_02-01 (from 02 to 01, protecting container METRO_2-1) should be created on Cluster-2.
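The sync direction is encoded in the name, so the name alone tells you where each PD has to be created. This sketch assumes the METRO_<source>-<target> naming convention used in this guide. It parses the name and returns the cluster on which the Protection Domain belongs. The helper names are hypothetical.

```python
from __future__ import annotations

import re

# Assumes the naming convention from this guide: METRO_<source>-<target>.
PD_NAME = re.compile(r"^METRO_(\d+)-(\d+)$")


def active_cluster_for(pd_name: str) -> str:
    """Return the cluster on which the Protection Domain must be created."""
    match = PD_NAME.match(pd_name)
    if match is None:
        raise ValueError(f"not a METRO_<source>-<target> name: {pd_name!r}")
    source, target = (int(g) for g in match.groups())
    if source == target:
        raise ValueError(f"source and target are the same cluster: {pd_name!r}")
    return f"Cluster-{source}"


def created_on_correct_side(pd_name: str, created_on: str) -> bool:
    """True if the PD was created on the cluster its name says it syncs from."""
    return active_cluster_for(pd_name) == created_on


print(active_cluster_for("METRO_01-02"))                    # Cluster-1
print(created_on_correct_side("METRO_02-01", "Cluster-1"))  # False
```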

Now the two storage containers are in sync between the clusters. At this point you could already migrate virtual machines between the two clusters from vCenter by selecting the other cluster's hosts and storage, but that copies all VM data and takes a long time. This can come in handy, but it is not the goal of Metro Availability.

To do this instantly (manually, but without downtime), we migrate the virtual machines to the other cluster and then promote the protection domain. These are the steps:

  1. Change the VM/Host Rule in vCenter to point to the Host Group of the other cluster
    • Wait until DRS has migrated all machines to the other cluster
  2. Promote the Protection Domain so that the other cluster becomes active
  3. When all machines have moved, re-enable the Protection Domain to get both clusters back in sync
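The order of these steps matters: do not promote before DRS has finished moving the VMs. This is a minimal runbook sketch that enforces that order and polls until the VMs are on the target hosts. Each step is a hypothetical hook that you would implement yourself, for example with PowerCLI or the Prism UI. No real vCenter or Nutanix API is called here.

```python
from __future__ import annotations

import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout_s: float, interval_s: float) -> None:
    """Poll predicate() until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before timeout")
        time.sleep(interval_s)


def planned_failover(
    point_rule_to_target: Callable[[], None],
    all_vms_on_target: Callable[[], bool],
    promote_pd: Callable[[], None],
    reenable_pd: Callable[[], None],
    timeout_s: float = 3600,
    interval_s: float = 30,
) -> None:
    """Run the three manual steps in order; each hook is supplied by the caller."""
    point_rule_to_target()                                # step 1
    wait_until(all_vms_on_target, timeout_s, interval_s)  # step 1: wait for DRS
    promote_pd()                                          # step 2
    reenable_pd()                                         # step 3


# Dry run with print-only hooks; real hooks would call vCenter and Prism.
moved = iter([False, True])
planned_failover(
    lambda: print("VM/Host rule now points to HOSTS_CLUSTER-2"),
    lambda: next(moved),
    lambda: print("Protection Domain promoted"),
    lambda: print("Protection Domain re-enabled"),
    interval_s=0,
)
```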

To do this automatically (with downtime for each virtual machine): shut down a cluster 😉 Don’t forget to disable DRS before turning the cluster back on, or else VMs will migrate back before the storage containers are back in sync.

Links:

Implementing vSphere Metro Storage Cluster using Nutanix Metro Availability

Metro Availability Best Practices (Detail)

Metro Availability Best Practices Checklist

Metro Availability (ESXi and Hyper-V 2016)