Protection Domain Based DR


Badge +2

Hi, I have two physical sites, each with an on-prem Nutanix cluster. I am trying to configure async DR from the primary to the secondary site on a one-hour schedule, so a snapshot is taken every hour and replicated to the secondary site. I read in the portal that the first replica is a full copy and the subsequent ones are incremental. My question: once the first replica, which is a full copy of my data, has been replicated to the secondary site, it will eventually expire and be deleted according to my retention configuration. In that case, how can I restore my full data from the incremental snapshots at the remote site?



32 replies

Userlevel 3
Badge +6

Hi,

You will always have one or more fully restorable snapshots at your secondary site. The first sync requires a full copy; subsequent syncs copy over only the delta. The snapshots are not incremental in the way incremental backups are, which depend on keeping an initial full backup.

Badge +2

OK. So every snapshot holds the full data, but only the initial snapshot needs to send a full copy. Subsequent snapshots still contain all the data; only the delta is transferred during replication. How do the subsequent snapshots get synced with the initial snapshot? How does it work in the backend?

Userlevel 3
Badge +5

Hi Jaya,

Let me try to explain. When you first configure the PD and start replication, the first sync replicates all data to the secondary site. Until the first sync has completed, the incremental schedules are not initiated.

Once the first sync is done and the data is in sync, the PD schedule is triggered for each incremental/delta. It takes a snapshot at the primary, then sends that snapshot's metadata along with the delta to the secondary, so the secondary knows when the delta was taken and what it applies to.

This repeats according to your configured schedule. Now, your question about retention: when you say maintain 3 snapshots, then once the 4th snapshot has been replicated successfully, the oldest snapshot is deleted, FIFO style. When we say a snapshot is deleted, it means the data of that snapshot is merged with the actual data. A snapshot is basically a point in time in the chain of changes to the data.

The following links might help you understand more.
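
To make the retention behaviour above concrete, here is a toy Python model. It is only a sketch under my own assumptions, not how AOS implements snapshots internally: `RemoteSite`, `compute_delta`, and the block names are all made up for illustration. The idea it shows is that only changed blocks travel over the wire, yet every snapshot kept at the secondary is a complete point-in-time view, so expiring the oldest one does not break the newer ones.

```python
from collections import deque


class RemoteSite:
    """Toy secondary site that keeps the newest `retention` snapshots."""

    def __init__(self, retention):
        self.retention = retention
        self.snapshots = deque()  # entries are (name, complete block map)

    def receive(self, name, delta):
        # Start from the newest snapshot's view (empty before the first sync)
        # and apply only the blocks that were actually shipped.
        view = dict(self.snapshots[-1][1]) if self.snapshots else {}
        view.update(delta)
        self.snapshots.append((name, view))
        # FIFO retention: expire the oldest snapshot once over the limit.
        while len(self.snapshots) > self.retention:
            self.snapshots.popleft()

    def restore(self, name):
        for snap_name, view in self.snapshots:
            if snap_name == name:
                return dict(view)
        raise KeyError(f"snapshot {name!r} has expired or never existed")


def compute_delta(previous, current):
    """Blocks that were added or changed since the previous snapshot."""
    return {k: v for k, v in current.items() if previous.get(k) != v}


remote = RemoteSite(retention=2)

s1 = {"b0": "A", "b1": "B", "b2": "C"}
remote.receive("snap-1", compute_delta({}, s1))   # first sync: all 3 blocks

s2 = {**s1, "b1": "B2"}
remote.receive("snap-2", compute_delta(s1, s2))   # only 1 block is sent

s3 = {**s2, "b3": "D"}
remote.receive("snap-3", compute_delta(s2, s3))   # 1 block; snap-1 expires

print(remote.restore("snap-3"))
```

Even though snap-1 (the only one that was sent in full) has expired, restoring snap-3 still returns every block, because each retained snapshot references the complete set of data for its point in time.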

https://next.nutanix.com/how-it-works-22/snapshots-so-simple-and-yet-so-complicated-faq-38007

https://next.nutanix.com/how-it-works-22/snapshots-so-simple-and-yet-so-complicated-faq-part-2-38061

 

F>P

Badge +2

So the second and subsequent snapshots will have all the data even after the first snapshot is deleted, because they were synced with it. Am I correct? Also, I am preparing an HLD for a Nutanix DR solution. Which points should it cover, or is there a link available for guidance?

Userlevel 3
Badge +5

Hi,

Yes, as part of garbage collection, when the retention period is over the snapshots are deleted, meaning they are merged. I would recommend reading the following, depending on your requirements, to understand DR and what it needs based on RPO/RTO.

https://portal.nutanix.com/page/documents/details?targetId=Prism-Element-Data-Protection-Guide-v6_6:Prism-Element-Data-Protection-Guide-v6_6

https://portal.nutanix.com/page/documents/details?targetId=Disaster-Recovery-DRaaS-Guide-vpc_2023_1_0_1:Disaster-Recovery-DRaaS-Guide-vpc_2023_1_0_1

 

F>P 

Badge +2

Thanks. If I want to use the Disaster Recovery (Leap) solution in my environment between two different sites, how many Prism Central instances do I need to deploy? And what license do I need to use the DR feature?

-jai

Userlevel 1
Badge +1

Hi Jai, for Disaster Recovery you would ideally install a PC at each site for best availability. You will also need either a Pro license with the Advanced Replication add-on, or an Ultimate license.

Badge +2

Thanks for your reply. As far as I can see, disaster recovery works even with a Starter license. Only if we want Advanced Orchestration with Runbook Automation do we need a Pro license with the Advanced Replication add-on, or an Ultimate license. Can you validate my point, please?

 

Also, having a PC at each site is best practice, but it will also work with one PC to which both clusters are registered. Please validate this point as well.

Userlevel 1
Badge +1

Hi, technically yes, both points are valid. You can use protection policies for async replication with any license, but you don't get the advanced features such as orchestration, and you can operate Disaster Recovery from a single PC to which both clusters are registered. Obviously, if you go with a single PC, it would need to run on the target cluster, not the source one.

Badge +2

What kind of orchestration are you referring to? Actually, I want to create protection policies with async replication and recovery plans, and ensure my entities run automatically at the DR site if my primary goes down. Can I achieve this with my Starter license?

-JAI

Userlevel 5
Badge +6

A Starter license will work for Async replication; for NearSync and Sync replication you will need a Pro license plus the Advanced Replication license.

One PC will work too.

Badge +1

Actually, you don't NEED to deploy Prism Central to configure replication between two Nutanix clusters; you just need to configure the Remote Sites and Protection Domains in each cluster's Prism Element. But it's highly advisable to have at least a single PC instance as a single pane of glass for managing all Nutanix clusters and features.

Badge +2

@hscavetta - Yes! But that is only if we use Protection Domain based DR. If we are going to use Disaster Recovery (Leap), then we need to deploy PC.

Badge +2

@StuB - 

What kind of orchestration are you referring to? Actually, I want to create protection policies with async replication and recovery plans, and ensure my entities run automatically at the DR site if my primary goes down. Can I achieve this with my Starter license?

-JAI

Userlevel 3
Badge +5

Hi Jay.

There are a few things to understand.

Automatic failover is ONLY possible with Metro Availability, and ONLY with a witness site. That also requires the Advanced Replication add-on on top of an AOS Pro license, or AOS Ultimate.

Metro Availability for vSphere / Hyper-V is configured using Prism Element, and for AHV using Prism Central.

Prism Central allows you to create protection policies and recovery plans, which give you DR runbook capabilities: DR test drills, VM power-on/shutdown sequencing, re-IP, etc. This is referred to as DR orchestration or Runbook, and it requires the Advanced Replication add-on on top of an AOS Pro license, or AOS Ultimate.

Async DR using Prism Element gives basic DR capabilities through Protection Domains, with a 60-minute RPO, manual activation, and manual VM power actions. This does not require any additional licenses.

Hope that clarifies. If you need, you can DM me for a peer discussion.

F>P

 

Badge +2

Hi,

@sl.farhanparkar, thanks for your reply. I am seeing the topics below in the DR guide.

Protection and Manual DR (Disaster Recovery)
Protection and Automated DR (Disaster Recovery)

Do both options need the Advanced Replication add-on on top of an AOS Pro license, or AOS Ultimate?

-JAI

Userlevel 3
Badge +5

Hi Jay,

To be more specific,

Async with an RPO of 60 min or above and manual DR: AOS Starter.

Near-sync (RPO 1-15 min) and Sync replication, Metro Availability (RPO 0), and manual DR orchestration (Prism Central) need the Advanced Replication add-on on top of an AOS Pro license, or AOS Ultimate.
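
If it helps for the HLD, the two lines above can be written down as a small lookup. This is just a sketch of what is stated in this thread, not an official licensing matrix; the `Replication` enum and `required_license` function are names I made up, so please confirm against the software options page before relying on it.

```python
from enum import Enum


class Replication(Enum):
    ASYNC = "async"        # RPO of 60 minutes or more
    NEARSYNC = "nearsync"  # RPO of 1-15 minutes
    SYNC = "sync"          # includes Metro Availability (RPO 0)


STARTER = "AOS Starter"
ADVANCED = "AOS Pro + Advanced Replication add-on, or AOS Ultimate"


def required_license(replication, uses_recovery_plans):
    """License tier as summarized in this thread (an assumption, not official)."""
    # Anything beyond plain async, or any Prism Central DR orchestration,
    # falls into the Advanced Replication tier.
    if replication is not Replication.ASYNC or uses_recovery_plans:
        return ADVANCED
    return STARTER


print(required_license(Replication.ASYNC, uses_recovery_plans=False))
print(required_license(Replication.ASYNC, uses_recovery_plans=True))
print(required_license(Replication.NEARSYNC, uses_recovery_plans=False))
```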

F>P

Badge +2

Did you mean "Automated DR orchestration (Prism Central)" in the second line?

-JAI

Userlevel 3
Badge +5

Hi Jay,

DR orchestration is always manually triggered through a Recovery Plan. That means when the primary site fails for any reason, the administrator manually triggers the recovery plan on the secondary site, which then automatically powers on the replicated VMs in the defined sequence.

When you need fully automated failover, you have to consider a third site that hosts the witness to avoid split-brain scenarios. Then, once a site fails, the VMs are restarted automatically on the other cluster, much like HA within a cluster when a node fails. This is also referred to as MSC (Metro stretch cluster) or Metro Availability.
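
As a rough illustration of "powers on the VMs in the defined sequence", a recovery plan can be thought of as ordered stages. The sketch below is purely conceptual: the stage layout, the VM names, and `power_on_order` are hypothetical and are not the Prism Central data model or API.

```python
def power_on_order(stages):
    """Flatten recovery-plan stages into the order VMs would be powered on."""
    order = []
    # Stages run by ascending "order"; VMs inside a stage keep their listed order.
    for stage in sorted(stages, key=lambda s: s["order"]):
        order.extend(stage["vms"])
    return order


# Hypothetical plan: infrastructure first, then databases, then applications.
plan = [
    {"order": 2, "vms": ["db-01"]},
    {"order": 1, "vms": ["dns-01", "ad-01"]},
    {"order": 3, "vms": ["app-01", "app-02"]},
]

print(power_on_order(plan))
```

The administrator still has to start the failover; the plan only decides what happens, and in which order, once it has been started.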

F>P 

Badge +2

OK. I want to clear up the licensing part. Let me put it like this, and correct me if anything is wrong.

 

  1. Protection Domain based DR: works with a Starter license (async); implemented in Prism Element on both primary and secondary.
  2. Disaster Recovery (Leap): whether manual or automatic (async), we need Pro with the Advanced Replication license, or Ultimate. It needs to be enabled from Prism Central.

If it is NearSync or Sync, with either PD-based DR or Leap, we need the corresponding license for it to work.

Userlevel 3
Badge +5

Hi Jay,

Exactly, 

Ref: https://www.nutanix.com/products/cloud-platform/software-options

Data Protection and Disaster Recovery section

F>P

Badge +2

Hi,

     Thanks for your reply.

Protection and Manual DR (Disaster Recovery): on this page, https://portal.nutanix.com/page/documents/details?targetId=Disaster-Recovery-DRaaS-Guide-vpc_2023_1_0_1:ecd-ecdr-procedure-manualprotection-pc-c.html , there are two options, Clone and Revert. Is the Revert option only available at the primary site? Revert overwrites the existing VM, which runs only at the primary site. Can you clarify this, please?

 

Userlevel 3
Badge +5

Hi Jay,

Yes, with Nutanix snapshots, whether local or remote, you can revert (replace the original entity with that point in time) or clone from the snapshot (create a new entity from that point in time and leave the original entity intact).

If required, you can pull the remote snapshot and take the same actions on it as above.

 

F>P

Badge +2

But my question is how the Revert option can be available at the recovery site. Revert overwrites the original entities, which are at my primary.

-JAI

Userlevel 3
Badge +5

Hi

There is no revert option at the recovery site. At the recovery site you have the activate option: when there is a failure and the primary site is not available, you activate, and the snapshots and entities become available. With a PD you will need to power on the VMs yourself; with a recovery plan, the VMs are powered on automatically as defined.

 

F>P