Solved

Metro Availability sensitivity

  • 10 November 2016
  • 8 replies
  • 4511 views

Userlevel 1
Badge +12
Is there a specific sampling interval that Metro Availability uses to ping its Remote Site? I assume that after 3 of these ping failures, the PD goes into a Disabled state. Can anyone confirm?

I'm running AOS 4.7.1 at the moment. Thanks!

Best answer by Jon 11 November 2016, 18:54

It's all the time (continuous), and if you have your PD set to 10 seconds, it takes 10 failed pings before it starts to react.

8 replies

Userlevel 4
Badge +21
Are you talking about the manual method?
Userlevel 4
Badge +18
Hi mandg, the VM availability setting controls this behavior. If it is set to automatic, you specify a timeout of between 10 and 30 seconds. The disable will occur based on that timeout interval.
Userlevel 1
Badge +12
Well, I think I understand the failure handling - I'm currently set at 10 seconds for each of my PDs. But with regard to the description, "In the event of a network failure or standby site failure, VM writes will temporarily halt" ...I was curious to know what has to transpire for the cluster nodes to determine that a network failure exists. I presume that the less-than-5ms requirement comes into play and that a CVM (or CVMs) samples the link periodically. I'm trying to arm myself with information to work with our network team in case a Metro PD becomes disabled. Thanks!
Userlevel 7
Badge +30
The 5ms requirement is all about write latency. With any sync replication solution, WAN latency is "in path" for write latency, so if you had a 100ms WAN link, you'd have at least 100ms storage latency for your writes.

Most modern applications are pretty happy with a write latency of 5ms or less, and that's why we suggest 5ms or less on your WAN link.
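To put rough numbers on that, here is a minimal sketch. It assumes a simplified model (not Nutanix's actual write path) in which a write is acknowledged only once both the local copy and the remote copy are done, and the remote copy costs one WAN round trip plus the remote write. The function name and the 1ms disk figures are made up for illustration.

```python
def sync_write_latency_ms(local_write_ms: float,
                          remote_write_ms: float,
                          wan_rtt_ms: float) -> float:
    """Estimate write latency under synchronous replication.

    Simplified, hypothetical model: the local and remote writes proceed in
    parallel, and the write is acknowledged when the slower one finishes.
    The remote path always pays the WAN round trip.
    """
    remote_path_ms = wan_rtt_ms + remote_write_ms
    return max(local_write_ms, remote_path_ms)


# Assumed 1ms write time on each side; only the WAN round trip varies.
for wan_rtt in (1.0, 5.0, 100.0):
    latency = sync_write_latency_ms(1.0, 1.0, wan_rtt)
    print(f"WAN {wan_rtt:>5.1f}ms -> write latency about {latency:.1f}ms")
```

In this model the write latency can never drop below the WAN round trip, which is the point of the 5ms recommendation.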

Pings are done every second, for the period of 10-30 seconds, like Mike said.
Userlevel 1
Badge +12
Ah! ...I understand this better now. Thank you.

So, my PDs are defined with a 10 second interval, and I am thus sending a total of 10 pings (one per second).
Are you able to share how often this connectivity test is done (i.e. every 5 minutes, every hour, continuously, etc.)?

Also, how many of these pings do I need to lose during a "test period" for the network or remote site to be considered failed?

Thanks again!
Userlevel 7
Badge +30
It's all the time (continuous), and if you have your PD set to 10 seconds, it takes 10 failed pings before it starts to react.
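For anyone who wants to reason about this against a network trace, here is a rough sketch of the behavior as described in this thread. It is a model, not Nutanix code: it assumes one ping per second and that the failures have to be consecutive (any successful ping resets the count). The function name and the sample trace are made up.

```python
from typing import Iterable, Optional


def first_disable_second(ping_ok: Iterable[bool], timeout_s: int) -> Optional[int]:
    """Return the 1-based second at which the PD would be disabled, or None.

    Hypothetical model: one ping per second, and the PD is disabled once
    `timeout_s` pings in a row have failed.
    """
    consecutive_failures = 0
    for second, ok in enumerate(ping_ok, start=1):
        if ok:
            consecutive_failures = 0  # any successful ping resets the count
        else:
            consecutive_failures += 1
            if consecutive_failures >= timeout_s:
                return second
    return None


# 5 good seconds, a 15-second outage, then 5 good seconds.
trace = [True] * 5 + [False] * 15 + [True] * 5

print(first_disable_second(trace, timeout_s=10))  # 15
print(first_disable_second(trace, timeout_s=30))  # None
```

With the timeout at 10 seconds, the 15-second outage trips the disable at second 15 of the trace; with the timeout at 30 seconds, the same outage is ridden out.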
Userlevel 7
Badge +30
Also, as a side note, it's worth looking into the Metro Witness feature that is coming out shortly.

joshodgers has a great blog on this here: http://www.joshodgers.com/2016/06/15/whats-next-2016-metro-availability-witness/
Userlevel 1
Badge +12
Awesome, thanks, Jon. So in my case, after 10 consecutive ping failures the Remote Site will be marked as unreachable, which leads to the PD going disabled. That's perfect - I got it now.

So, in a suspect network, it sounds like it'd be more beneficial for me to push that value up to 30.

I'm certainly interested in the upcoming Metro Witness - I presume that'll be in Asterix. No need to respond to this - you guys got me cleared up on this now.

Thanks!!
