Solved

Metro Availability (Standby Site Maintenance steps)

  • 26 August 2019
  • 1 reply
  • 2294 views

Hello,

I need to power off a Metro Availability standby site that currently has VMs running on it.
I would like to manually migrate those VMs to the primary (active) site and then shut down / power off the standby-site nodes for approximately 24-36 hours.
All VMs need to keep running on the active side, and no VM may be restarted (RTO = 0 and RPO = 0).

I have included as much environment information as possible below.

Any insight or corrections to this process are welcome.

Thanks in advance.

The steps I propose to achieve this goal are as follows:

Phase I (Configure Standby Site)

  1. Disable/suspend the (x1) async PD (from Primary)
  2. Update the VMware DRS rules / set DRS to Manual (from Primary)
  3. Disable vSphere HA
  4. Manually vMotion all VMs to the active site
  5. Configure VM affinity rules to the hosts of the primary site and set DRS back to Fully Automated
  6. Create a new VM Group and Host Group if needed
  7. (Note: I do not want to disable DRS and lose my DRS affinity rules.)
  8. Disable Metro Availability on each PD (x3) (from Primary)
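Before moving on to Phase II, it helps to confirm that step 4 really left nothing on the DEB hosts. The sketch below is a minimal illustration only: the inventory dictionary and the host names are hypothetical placeholders (in practice the VM-to-host mapping would be exported from vCenter), and it simply lists any VM whose current host belongs to the standby site.

```python
# Minimal sketch: find VMs still placed on standby-site (DEB) hosts.
# The placement mapping and host names are hypothetical placeholders;
# real data would come from vCenter.
def vms_left_on_standby(placement, standby_hosts):
    """Return the sorted names of VMs whose current host is a standby-site host."""
    standby = set(standby_hosts)
    return sorted(vm for vm, host in placement.items() if host in standby)


if __name__ == "__main__":
    standby_hosts = ["deb-esx-01", "deb-esx-02"]  # hypothetical DEB host names
    placement = {
        "app-01": "dea-esx-01",
        "db-01": "deb-esx-02",   # still on DEB: vMotion not done yet
        "web-01": "dea-esx-03",
    }
    print(vms_left_on_standby(placement, standby_hosts))  # prints ['db-01']
```

An empty result means every VM is already on DEA, which is the condition for continuing with the power-off phase.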

Phase II (Power Off)

  1. Shut down CVM(s) (Standby site only)
  2. Put VMware Host(s) in "Standby Mode" (Standby site only)
  3. Power Off Nutanix Node(s) (IPMI) (Standby site only)


Phase III (Power Up) (Standby site only)

  1. Power on Nutanix Node(s) (IPMI)
  2. Disable "Standby Mode"
  3. Ensure CVM(s) are powered up
     a. Status checks
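Phase III is essentially Phase II undone in reverse order: the last thing switched off is the first thing switched on. The sketch below only models that ordering rule, using step names taken from the plan; it does not talk to IPMI, vCenter, or the CVMs, and the pairing of each step with its undo action is an illustrative assumption.

```python
# Minimal sketch of the ordering rule only: each Phase II step is paired
# with the Phase III action that undoes it (pairing is illustrative).
POWER_OFF = [
    ("shut down CVMs", "ensure CVMs are powered up and run status checks"),
    ("put ESXi hosts in standby mode", "disable standby mode"),
    ("power off nodes via IPMI", "power on nodes via IPMI"),
]


def power_up_sequence(power_off_steps):
    """Return the undo actions in the reverse order of the power-off steps."""
    return [undo for _, undo in reversed(power_off_steps)]


if __name__ == "__main__":
    for number, step in enumerate(power_up_sequence(POWER_OFF), start=1):
        print(number, step)
```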


Phase IV (Re-enable PD)

  1. Migrate VMs back to the standby site (the previous VMs, via affinity rules?)
  2. From the primary Prism console, re-establish the Metro Availability configuration by clicking Re-enable on each PD.
  3. Re-establish the DRS affinity rules and return the VMs to their original groups.
  4. Re-enable the async PD.
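Step 3 is easier if the original DRS VM-group membership was written down before Phase I. The sketch below compares such a saved snapshot against the current membership and reports which VMs are not yet back in their original group. It assumes each VM belongs to a single VM group; the snapshot format and group names are hypothetical, and vCenter does not produce this structure directly.

```python
# Minimal sketch: compare a saved VM -> DRS VM group snapshot with the
# current membership. Assumes one group per VM; names are hypothetical.
def group_drift(saved, current):
    """Map each VM whose current group differs from the saved one to a
    (saved_group, current_group) pair; VMs missing from `current` get None."""
    drift = {}
    for vm, group in saved.items():
        now = current.get(vm)
        if now != group:
            drift[vm] = (group, now)
    return drift


if __name__ == "__main__":
    saved = {"app-01": "vmg-dea", "db-01": "vmg-deb"}    # recorded before Phase I
    current = {"app-01": "vmg-dea", "db-01": "vmg-dea"}  # during Phase IV
    print(group_drift(saved, current))  # prints {'db-01': ('vmg-deb', 'vmg-dea')}
```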

Overview:
Power off the DEB hosts for an extended period, not to exceed 36 hours.

DEA (Primary) (5 nodes)
DEB (Standby) (5 nodes)
DEB to be Powered Off

VMs are running on the Standby Hosts

Environment (identical on both sites):
Ntx-XX-XXX / Ntx-XX-XXX

Nutanix
AOS 5.10.5 (LTS)
NCC 3.7.1.2

ESXi 6.7.0
VMware 6.7.0.30000

3 blocks, 5 hosts:
Blocks 1-2: NX-8035-G4 (x2 blocks, x4 hosts)
Block 3: NX-8035-G5 (x1 block, x1 host)


Cluster Config:
DRS: Enabled (Fully Automated) (Default settings)
HA: Enabled

Remote Site:
x1 Remote site
vStore Mapping:
Hci-XXX-adv-01 : hci-XXX-adv-01-mirror

Metro Availability PDs: both sides have identical resources (disk, CPU, memory).

Active:
x3 active Containers

Standby:
x3 standby Containers

Async: (active on primary site)
x1 PD
x2 VMs

Admission Control:
Failover capacity is defined by reserving a percentage of the cluster resources.
Reserved failover CPU capacity: 25%
Reserved failover memory capacity: 20%
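Because all VMs will run on DEA while DEB is off, a rough headroom check is worth doing first. With percentage-based admission control, vSphere HA holds back the stated share of total cluster CPU and memory. The sketch below is a simplified planning aid, not how vSphere computes admission control exactly (that is based on VM reservations): it applies the 25% / 20% figures above to the hosts that stay powered on, using hypothetical per-host capacities and a hypothetical demand figure. Note also that admission control is not enforced while HA is disabled in Phase I.

```python
# Simplified planning sketch: capacity left after reserving a percentage of
# the cluster total. Per-host figures and demand below are hypothetical.
def usable_capacity(per_host, hosts, reserved_pct):
    """Capacity remaining after reserving reserved_pct percent of the total."""
    total = per_host * hosts
    return total * (100 - reserved_pct) / 100


def fits(demand, per_host, hosts, reserved_pct):
    """True if the combined demand fits inside the unreserved capacity."""
    return demand <= usable_capacity(per_host, hosts, reserved_pct)


if __name__ == "__main__":
    # Hypothetical figures for the five DEA hosts; replace with real values.
    cpu_ghz_per_host, mem_gb_per_host, hosts = 40, 512, 5
    print(usable_capacity(cpu_ghz_per_host, hosts, 25))  # prints 150.0 (GHz)
    print(usable_capacity(mem_gb_per_host, hosts, 20))   # prints 2048.0 (GB)
    print(fits(1900, mem_gb_per_host, hosts, 20))        # prints True
```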

  Name          HAEnabled  HAFailoverLevel  DrsEnabled  DrsAutomationLevel
  ----          ---------  ---------------  ----------  ------------------
  Cluster Name  True       1                True        FullyAutomated

Best answer by Mutahir 19 November 2019, 18:09



1 reply


Hi @Christanix,

Apologies for the late reply.

The steps look fine. Before shutting down a Nutanix cluster, or performing any other activity on it, it is advisable to upgrade NCC and run a full "ncc health_checks run_all". This checks that all services, configuration, and components are in an optimal state.

If you have already performed the activity, please share your experience. Here are some tips (which you have probably already seen on our portal):

For Metro Availability (Data Protection): disable Metro Availability and verify that no Async DR replications are in progress.

  • Log on to Prism and, in the Table view of the Data Protection dashboard, select the Active protection domain and click Disable. If the cluster you want to shut down has Standby protection domains, select the corresponding Active protection domain on the relevant cluster in the same Table view and click Disable as well. For more information, see: https://portal.nutanix.com/#/page/docs/details?targetId=Web-Console-Guide-Prism-v50:wc-dr-metro-enable-disable-r.html
  • For Async DR: Ensure no schedules or replications are running by executing the following command from any Controller VM.
    nutanix@cvm$ ncli pd ls-repl-status
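If you prefer to script the wait rather than re-run the command by hand, a generic poll-until-idle loop is one option. The sketch assumes a hypothetical `replication_in_progress` callable that you would implement yourself (for example by running `ncli pd ls-repl-status` on a CVM and inspecting its output); the command's output format is not parsed here, and the timeout and interval values are arbitrary placeholders.

```python
import time


def wait_until_idle(replication_in_progress, timeout_s=1800, interval_s=60,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until replication_in_progress() returns False.

    Returns True once no replication is reported, or False if timeout_s
    elapses first. replication_in_progress is a hypothetical, user-supplied
    check; sleep and clock are injectable so the loop can be tested.
    """
    deadline = clock() + timeout_s
    while True:
        if not replication_in_progress():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without waiting in real time.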

For shutting down ESXi clusters and Metro clusters, see:
https://portal.nutanix.com/kb/000001438

BR