The best way to know if your DR solution will work when you need it is to actually test the DR workflows, right? Of course, a DR test can be disruptive, so you’ll want to understand the procedures and best practices before your testing window starts.
The Async-DR solution built into your Nutanix cluster can handle both planned and unplanned failover scenarios. Testing these capabilities is not much different from simply using them when needed.
The two most relevant documents for these procedures are the Prism Web Console Guide and the Data Protection and Disaster Recovery best practice guide. The Prism Web Console Guide provides the execution steps for setup and failover. The best practice guide adds detail on the available solutions and their requirements, covers considerations around space, bandwidth, and seeding, and includes a best practices checklist.
After reviewing the requirements and scheduling a test window, you can follow the planned failover workflow in the Prism Web Console Guide to test a planned failover. In brief, a planned failover will shut down and unregister local VMs, update replication to the remote site, and restore and power on VMs at the remote site. In effect, this is a cold migration between clusters. At the conclusion of this workflow, your migrated VMs should be powered on at the remote site, and you should be able to snapshot and replicate them back to the primary site. You’ll also want to check inside the VMs to ensure that configurations like AD authentication and file shares work as expected after migration.
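To make the order of operations concrete, here is a minimal Python sketch that models the planned failover steps described above as plain state changes. It does not call any Nutanix API. The Site class, the planned_failover function, and the VM names are hypothetical and exist only for illustration, and the real workflow in the Prism Web Console Guide remains the authority on the exact steps.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical stand-in for a cluster; not a Nutanix API object."""
    name: str
    registered: dict = field(default_factory=dict)  # VM name -> "on" or "off"
    snapshots: list = field(default_factory=list)   # each entry is a list of VM names


def planned_failover(primary: Site, remote: Site) -> list:
    """Apply the planned failover steps in order and return a log of them."""
    log = []
    vm_names = sorted(primary.registered)

    # Step 1: shut down the VMs on the primary site.
    for vm in vm_names:
        primary.registered[vm] = "off"
    log.append(f"shut down {len(vm_names)} VMs on {primary.name}")

    # Step 2: unregister the VMs from the primary site.
    primary.registered.clear()
    log.append(f"unregistered VMs from {primary.name}")

    # Step 3: update replication so the remote site holds the final state.
    remote.snapshots.append(vm_names)
    log.append(f"replicated final snapshot to {remote.name}")

    # Step 4: restore from the latest snapshot and power on at the remote site.
    for vm in remote.snapshots[-1]:
        remote.registered[vm] = "on"
    log.append(f"restored and powered on VMs on {remote.name}")

    return log


if __name__ == "__main__":
    primary = Site("primary", {"web01": "on", "db01": "on"})
    remote = Site("remote")
    for entry in planned_failover(primary, remote):
        print(entry)
```

After the function returns, the primary site has no registered VMs and the remote site has every VM powered on, which is the end state you should confirm in Prism after a real test.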
In an unplanned failover, the admin activates the protection domain via Prism on the DR site. VMs are not powered on automatically and must be powered on manually. This gives the admin a chance to make adjustments to the VMs first, such as updating the networking configuration. It’s important to note that in a real unplanned failover we generally cannot reach the source cluster, so the configuration will not change on that end. This means some manual clean-up is needed before new data from the DR site can be sent back to the source site. The steps for unplanned failover are further down the same page as the planned failover steps. The clean-up steps are covered in the section Failing Back a Protection Domain (Unplanned).
If you want to validate the unplanned failover workflow without impacting the source site, you have two options. You can activate the protection domain at the remote site and deactivate it after checking the results, or you can restore from a snapshot to a new filesystem location and delete the clone(s) later. In either case, make sure to update networking before powering on the VMs during this kind of test. The original VMs still own their IP addresses, and we don’t want to power on clones with duplicate IPs unless networking for the clones is completely isolated.
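The duplicate IP concern can be turned into a simple pre-power-on check. The sketch below is plain Python and assumes you have already gathered each VM's IP addresses into dictionaries by some other means. The function names, the VM names, and the sample addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical and are not part of any Nutanix tooling.

```python
def ip_conflicts(original_vms: dict, clone_vms: dict) -> dict:
    """Return {clone name: [IPs]} for clones that reuse an original VM's IP."""
    # Every address still owned by a VM running at the source site.
    in_use = {ip for ips in original_vms.values() for ip in ips}

    conflicts = {}
    for clone, ips in clone_vms.items():
        clashing = sorted(ip for ip in ips if ip in in_use)
        if clashing:
            conflicts[clone] = clashing
    return conflicts


def safe_to_power_on(original_vms: dict, clone_vms: dict,
                     isolated_network: bool = False) -> bool:
    """Clones are safe if their network is isolated or no IPs overlap."""
    if isolated_network:
        return True
    return not ip_conflicts(original_vms, clone_vms)


if __name__ == "__main__":
    originals = {"web01": ["192.0.2.10"], "db01": ["192.0.2.11"]}
    clones = {"web01-clone": ["192.0.2.10"], "db01-clone": ["192.0.2.111"]}
    print(ip_conflicts(originals, clones))
    print(safe_to_power_on(originals, clones))
```

A check like this only compares the addresses you feed it, so it complements, rather than replaces, reviewing the network settings of each restored VM before power-on.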