Question

Deploying Metro Availability and Prism Central

  • November 30, 2025

Daniel Martinez

Hi everyone

I have to deploy a Metro Availability architecture for two AHV clusters (no ESXi involved), and I have several questions, so I’d appreciate some feedback from those who have deployed similar setups.
 

The main goal is to achieve the lowest possible RPO and RTO between two datacenters with a robust and simple architecture. The clusters will be in two separate buildings about 100 m apart, connected with FC cables (<5 ms latency).

As far as I know, only ESXi with vMSC (vSphere Metro Storage Cluster) supports true RTO = 0.

However, since this environment is AHV-only, my understanding is that in a failover event the VMs from Site A would need to be powered on at Site B rather than running continuously on both sides. Is that correct?

That said, these are the possible scenarios for deploying Metro Availability between both sites:

  1. Two AHV clusters in active–active mode, without an external Witness

  2. Two AHV clusters in active–active mode, with an external Witness (this one looks good to me)

  3. Two AHV clusters in active–passive mode, without an external Witness

  4. Two AHV clusters in active–passive mode, with an external Witness

On top of that, I’m also evaluating how to deploy Prism Central in this scenario. These are the options I’m considering:

  1. A single Prism Central managing both clusters (I assume this scenario only makes sense without an external Witness, but it is very fragile because it has a single point of failure)

  2. Each cluster has its own Prism Central (I assume this scenario only makes sense with an external Witness, and it seems to be the most robust: in the event of a cluster failure, the other site still has its own PC)

  3. A single PC with async DR to the second site (the idea is to simplify the architecture by managing a single PC for both clusters, while still being able to recover it on Site B if the cluster hosting the PC fails)

  4. One Prism Central per cluster + Nutanix Central On-Prem to unify the PCs and avoid a single point of failure (as far as I know, since PC 7.3 you can add an additional management layer with Nutanix Central On-Prem to manage several PCs, and I think it removes the need for an external Witness)
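To make the trade-off between these options a bit more concrete, here is a rough Python sketch that checks what is left of the management plane after losing one site. It only encodes the assumptions from the list above; `PcDesign` and `management_after_failure` are hypothetical names, not a Nutanix API. Option 4 is the same as option 2 at the PC level, and Nutanix Central is left out because where it would run is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcDesign:
    """Hypothetical model of a Prism Central layout (not a Nutanix API)."""
    name: str
    pc_sites: frozenset[str]  # sites running a live Prism Central instance
    dr_copy_sites: frozenset[str] = frozenset()  # sites holding an async DR copy of PC


def management_after_failure(design: PcDesign, failed_site: str) -> str:
    """Classify the management plane right after `failed_site` is lost."""
    if design.pc_sites - {failed_site}:
        return "available"  # a PC is still running on a surviving site
    if design.dr_copy_sites - {failed_site}:
        return "recover PC first"  # PC must be brought up from its DR copy
    return "unavailable"  # no running PC and no copy at a surviving site


DESIGNS = [
    PcDesign("1: single PC on site A", frozenset({"A"})),
    PcDesign("2/4: one PC per site", frozenset({"A", "B"})),
    PcDesign("3: single PC + async DR to B", frozenset({"A"}), frozenset({"B"})),
]

if __name__ == "__main__":
    for design in DESIGNS:
        print(f"{design.name}: {management_after_failure(design, 'A')}")
```

Losing Site B in option 1 leaves management untouched, so that design only hurts when the site hosting the PC is the one that fails.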

I’m particularly interested in what you consider the most stable and practical approach when running Metro Availability across two sites.

If you have experience running MA with AHV, especially around Prism Central design choices, I’d really appreciate your opinions, recommendations, or pitfalls to avoid.
 

Thanks in advance!

2 replies

JeroenTielen
  • Vanguard
  • December 1, 2025

You summarized that well. The Witness is needed for automatic failover, so if you want automatic failover, you need the Witness. Automatic failover will help reduce the RTO ;) But RTO = 0 is not possible on AHV clusters.
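For anyone wondering why the Witness is what enables automatic failover, here is a generic tie-breaker sketch in Python. It is a simplified illustration of the idea, not Nutanix's actual Witness protocol; `Witness`, `try_acquire`, and `on_peer_lost` are hypothetical names. A site that loses contact with its peer asks the third-site Witness for a lock, and only the site that gets it promotes. Without a Witness, neither side can safely decide on its own, so failover stays manual.

```python
from typing import Optional


class Witness:
    """Hypothetical third-site tie-breaker that grants a single lock."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def try_acquire(self, site: str) -> bool:
        # The first site to ask wins; later requests from the other site fail.
        if self.owner is None:
            self.owner = site
        return self.owner == site


def on_peer_lost(site: str, witness: Optional[Witness], witness_reachable: bool) -> str:
    """Decide what `site` does after it stops hearing from the other site."""
    if witness is None:
        # No tie-breaker: promoting on both sides could cause split-brain,
        # so an administrator has to decide which side continues.
        return "manual"
    if witness_reachable and witness.try_acquire(site):
        # This site holds the lock: restart the protected VMs here.
        return "promote"
    # Either the other site already holds the lock or this site is isolated.
    return "stand_down"


if __name__ == "__main__":
    # Site A is completely down, so only site B asks the Witness.
    print("site A lost:", on_peer_lost("B", Witness(), witness_reachable=True))

    # Inter-site link cut while both sites can still reach the Witness.
    w = Witness()
    print("partition:", on_peer_lost("A", w, True), on_peer_lost("B", w, True))
```

Even the "promote" path restarts the VMs on the surviving site, which is why the Witness lowers RTO on AHV without bringing it to zero.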

 

What I deploy most often is your scenario 4: each DC/MER has its own Prism Central, a Witness in a third location handles the automatic failover, and Nutanix Central provides the single point of management.

 

Don't go with 1 Prism Central. When the cluster hosting Prism Central goes down, you first need to recover Prism Central on the other cluster before you can trigger the failover. This will increase RTO by approximately an hour.
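A back-of-the-envelope way to see this effect is to add up the steps in the failover path. The sketch below is illustrative only: apart from the roughly one-hour Prism Central recovery mentioned above, every timing is a placeholder assumption to replace with numbers from your own failover tests, and `FailoverTimings` and `estimate_rto` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FailoverTimings:
    """All values in minutes; defaults are placeholder assumptions, not measurements."""
    auto_detect: float = 5.0       # Witness-driven detection and promotion (assumed)
    manual_decision: float = 30.0  # admin notices the outage and acts (assumed)
    pc_recovery: float = 60.0      # ~1 hour to recover PC, per the reply above
    vm_restart: float = 10.0       # power on protected VMs at the surviving site (assumed)


def estimate_rto(t: FailoverTimings, has_witness: bool, pc_survives: bool) -> float:
    """Sum the sequential steps before VMs are running again at the surviving site."""
    rto = t.auto_detect if has_witness else t.manual_decision
    if not pc_survives:
        # The management plane must come back before failover can be triggered.
        rto += t.pc_recovery
    return rto + t.vm_restart


if __name__ == "__main__":
    timings = FailoverTimings()
    for has_witness in (True, False):
        for pc_survives in (True, False):
            minutes = estimate_rto(timings, has_witness, pc_survives)
            print(f"witness={has_witness}, PC survives={pc_survives}: ~{minutes:.0f} min")
```

With these placeholders, losing the site that hosts the only PC turns a ~15-minute failover into ~75 minutes, in line with the roughly one-hour penalty described above.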


Daniel Martinez

Thanks @JeroenTielen for sharing your knowledge!

So if I understand correctly:

  • The Witness is only required for automatic failover and for avoiding split-brain scenarios, regardless of whether the architecture is active–active or active–passive.

  • RTO = 0 is not achievable on AHV, since VMs will always need to be powered on at the surviving site.

  • Your most common deployment pattern is what I listed as scenario 4:

    • each cluster/site has its own Prism Central,

    • Witness resides in a third location,

    • Nutanix Central acts as the global management layer (does it help with RTO automation?)


Also what I find very interesting is your last point regarding a single Prism Central:

“When the cluster goes down (where Prism Central is hosted) then you first need to recover Prism Central on the other cluster before you can trigger the failover. This will increase RTO by approx an hour.”

Because this is exactly the part I was concerned about :P

It basically means that in a real disaster scenario, losing the cluster that hosts PC directly delays the Metro failover, because the management plane must be recovered before you can promote the secondary side.

So I assume that if you have one PC on each cluster and cluster A fails, the PC on Site B will be able to automatically fail over all the VMs to cluster B… is that correct?
 

If previous statements are correct, then:

  • 1 x PC = simpler design, but higher RTO in a site failure

  • 2 x PC (one per Cluster) = more resilient management plane, lower RTO, especially when the “primary” cluster is the one that fails

  • Nutanix Central = unified view + avoids the downsides of isolated PCs

Thanks