Acropolis Block Services - CVM / node failure scenario

  • 3 October 2017
  • 2 replies
  • 3030 views

Hi
I was wondering if anyone could assist us with some proof-of-concept work we are presently undertaking. We are looking at Acropolis Block Services (ABS) for some of our systems which will remain as physical devices. In our initial PoC work we have noticed a quirk in how the system reacts to node failure, and we are wondering how others get around it.
Our PoC is three OEM nodes with a volume group containing three vdisks. This works out nicely, as each CVM ends up assigned one vdisk. We then have external clients connecting to the ABS iSCSI data services IP address to get storage from the PoC cluster.
As part of our operational acceptance testing, we have been disconnecting a node from the network to replicate a failure and then observing the ABS and client OS behaviour.
Coming from a 3-tier solution background, I would normally expect to see a redundant path kick in pretty much instantly, with applications completely unaware of any issue. The OS would normally register a path-down event in Event Viewer, but the MPIO tools allow seamless recovery.
In our PoC the behaviour is different. The OS figures it no longer has connectivity to the vdisk, waits the default OS timeout of 60 seconds, then forces the iSCSI initiator to re-initialise, which takes an additional 15-20 seconds (19 seconds consistently in our testing).
It almost seems like there is no multipath resilience at the storage/ABS level in the same way; it is completely reliant on switching CVM targets, which takes time.
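For reference, this is roughly the probe we used to put numbers on the interruption; the drive letter and threshold are only our lab values, so treat it as a sketch rather than anything official:

```python
import os
import time

# Rough sketch of our probe: append a timestamp to a file on the ABS-backed volume
# once a second, force it to disk, and report any gap in successful writes.
# The drive letter and threshold are placeholders from our PoC setup.
TEST_FILE = r"E:\abs_failover_probe.txt"
GAP_THRESHOLD_S = 2.0

last_ok = time.time()
while True:
    try:
        with open(TEST_FILE, "a") as f:
            f.write(f"{time.time():.3f}\n")
            f.flush()
            os.fsync(f.fileno())    # make sure the write actually hits the iSCSI disk
        now = time.time()
        if now - last_ok > GAP_THRESHOLD_S:
            # in our tests this reported roughly the 60s timeout plus ~19s re-initialisation
            print(f"I/O stalled for {now - last_ok:.1f} seconds")
        last_ok = now
    except OSError as err:
        print(f"write failed: {err}")   # keep last_ok so the full stall is reported once writes resume
    time.sleep(1)
```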
This is a worry for us; certain applications we have cannot cope with this level of disk interruption, so I was wondering what other people do. Reduce the OS timeout to a few seconds to force the re-initialisation earlier? I suspect the 19 seconds would also give us problems on one application we have.
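If we do go down the timeout route, this is the sort of thing I mean, just a sketch of where the Windows disk timeout lives (we haven’t settled on a value, and I realise setting it too low brings its own problems):

```python
import winreg

# Windows-only sketch: the OS disk timeout discussed above lives at
# HKLM\SYSTEM\CurrentControlSet\Services\Disk\TimeOutValue (in seconds).
KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Disk"

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
    try:
        timeout, _ = winreg.QueryValueEx(key, "TimeOutValue")
        print(f"Current disk TimeOutValue: {timeout} seconds")
    except FileNotFoundError:
        print("TimeOutValue not set; the OS default applies")

# Changing it needs elevation and a careful choice of value, so it is left commented out:
# with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0, winreg.KEY_SET_VALUE) as key:
#     winreg.SetValueEx(key, "TimeOutValue", 0, winreg.REG_DWORD, 30)
```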
When we plug the node back in, the failback works seamlessly after a couple of minutes, again taking the 15-20 seconds for re-initialisation to the favoured node.
Does anyone use ABS extensively? We are presently quite concerned about recommending it internally.
As an aside, full multipath functionality seemed to be available previously in version 4.7, but this appears to have been deprecated now. Is there a specific reason for this? Was it too complicated to maintain a path map for the data segments in RF2 and RF3?
Thanks

2 replies

Hi HCCDST

You should not be seeing a 60-second disk timeout; I would open a case with Support if you're seeing this and losing access to the disk. You should only be seeing the path failover, which takes 15 to 20 seconds as you indicate. This is consistent with MPIO as I've tested it on Nutanix and on other solutions.

If there's an outstanding IO in the queue, even MPIO has to wait for timeouts before it can fail that IO over to another path. Failover timing for a SAN will depend on the array and the failure type (HBA, SAN port, controller port, controller head), and I don't mean to question your experience, but I've tested path failover on a VMAX (a truly active/active controller architecture) and have seen basic scenarios take 30 seconds for retries to go through. I've also heard of controller failures on other solutions taking even longer for IO retries to occur. Happy to discuss in more detail directly if you'd like to ping me.
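To illustrate the outstanding-IO point, here is a toy sketch (plain Python, nothing to do with the real MPIO driver code): whichever path the pending IO was queued on has to hit its own timeout before the multipath layer can retry it elsewhere, so even a seamless failover costs at least one IO timeout.

```python
import time

# Toy model only: a pending IO on the dead path must time out before it can be
# retried on the surviving path, so failover time >= one IO timeout.
PATHS = ["path-A (failed CVM)", "path-B (healthy CVM)"]
IO_TIMEOUT_S = 5   # stand-in value; the real number comes from initiator/MPIO settings

def send_io(path: str) -> str:
    """Pretend to issue an IO; the failed path never answers."""
    if "failed" in path:
        time.sleep(IO_TIMEOUT_S)       # the IO sits in the queue until it times out
        raise TimeoutError(path)
    return f"IO completed on {path}"

start = time.time()
for path in PATHS:                     # try the next path only after the previous one times out
    try:
        print(send_io(path))
        break
    except TimeoutError as failed:
        print(f"timed out on {failed} after {time.time() - start:.0f}s; retrying on next path")
```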

Thanks,
Mike
Hi Mike
Thanks for the response; sorry, I should clarify and update the thread as well. We’ve done some additional testing with SQL loads over the last couple of days.
We’re not losing connectivity to the disk at any point in the process, well, not to the point it’s detrimental to the health of the system! The OS deals with the lack of connectivity initially, pausing the IO and then suppressing any event log entries, as expected, during the timeout period. Then we see the iSCSI initiator kick in and do the standard re-initialisation steps for the volume, so from that point of view it is all going to plan!
We’ve tried this with a simulated SQL load as well, removing both data and log drives individually in the same manner. The solution has coped admirably: no disk connectivity issues and no data loss, so we’re really happy about that.
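For completeness, the simulated load was nothing clever, roughly this shape (server, database and table names below are placeholders rather than our real ones, and it assumes pyodbc and the Microsoft ODBC driver are available):

```python
import time
import pyodbc  # assumes the Microsoft ODBC Driver for SQL Server is installed

# Placeholder connection details; the probe table lives in a database whose data
# and log files sit on the ABS-backed drives we were pulling out from under it.
CONN_STR = ("DRIVER={ODBC Driver 17 for SQL Server};"
            "SERVER=poc-sql01;DATABASE=AbsFailoverTest;Trusted_Connection=yes;")

conn = pyodbc.connect(CONN_STR, autocommit=True)
cur = conn.cursor()
cur.execute("IF OBJECT_ID('dbo.probe') IS NULL "
            "CREATE TABLE dbo.probe (ts DATETIME2 DEFAULT SYSUTCDATETIME())")

last_ok = time.time()
while True:
    try:
        cur.execute("INSERT INTO dbo.probe DEFAULT VALUES")
        now = time.time()
        if now - last_ok > 2:
            print(f"SQL writes stalled for {now - last_ok:.1f} seconds")
        last_ok = now
    except pyodbc.Error as err:
        print(f"insert failed: {err}")
    time.sleep(0.5)
```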
Question away, I know that I don’t know enough 🙂
We’ve been through a couple of painful migration projects (new arrays, then a separate iSCSI migration) where we’ve had quirks in the configuration of both arrays and clients. So we’re trying to get ahead of the curve on this one and figure out the best practices for each aspect of the solution.
Over the last few implementations we’ve used differing OS disk timeouts (the default is now 30 seconds across our infrastructure), lots of different driver sets and modified MPIO settings (mainly around path verification and its timing).
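The MPIO settings I mean are the standard Microsoft ones; this is the quick dump we use to compare them across hosts (the value names are as I remember them from the documentation, so worth cross-checking against Get-MPIOSetting):

```python
import winreg

# Windows-only sketch: dump the MPIO timing / path-verification parameters we've
# been tuning so they can be compared across hosts.
MPIO_KEY = r"SYSTEM\CurrentControlSet\Services\mpio\Parameters"
NAMES = ["PathVerifyEnabled", "PathVerificationPeriod",
         "PDORemovePeriod", "RetryCount", "RetryInterval"]

with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, MPIO_KEY) as key:
    for name in NAMES:
        try:
            value, _ = winreg.QueryValueEx(key, name)
            print(f"{name}: {value}")
        except FileNotFoundError:
            print(f"{name}: not set (driver default)")
```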
We’re at a point now where our core technical teams know what to expect from the 3-tier solution and the behaviour of their systems in failover situations. They have read up on Nutanix as a proposed platform and have questioned the 15-20 second failover, so I just need to reassure them it is fine. From our PoC testing we cannot see anything wrong at all! That is a lab environment though, so not indicative of the real world. We have some rather sensitive applications that seem to have issues no matter what :-(
We’ve spoken to four reference sites, but none of them have used ABS in anger. Ideally I need to talk to someone who uses ABS extensively: a company with 40+ devices connected via ABS for differing application types (SQL, Oracle, SAP), so using it as an enterprise storage platform as well as a hypervisor platform.
From our PoC, this potentially isn’t a technical question anymore after reviewing the findings; it’s probably a hearts and minds campaign. :-)
I might well take you up on that offer; I’ve got your email from Hardev and from the conference call a couple of weeks back.
Regards
Rob
