Failover Testing Failed

Question

Hello we have a brand new cluster (4 nodes) and did some failover testing over the weekend prior to placing the new gear into production. We have 2 cisco 4500x 10GB switches and have the nodes split between the two switches for redundancy. We simulated a switch failover by pulling the plug on one of them and experiened some very bad results. Some CVMs became unresponsive and we lost connectivity to most of our VMs. Has anyone else experienced a similar issue? Or can anyone point me in the right direction for configurations to double check? Would appreciate any insight!

charlie_chuhak · Accepted Answer

Worked with support to get this sqaured away. We ended up upgrading the hypervisor which solved the issue.

Jon · Answer

First thing - have you put in a support ticket with Nutanix yet?

thats the best first step here, as we can help you validate your configuration on the Nutanix and hypervisor side. We can also help guide the conversation on the network side (bunch of our support staff are ex Cisco, and some are even CCIEs)

past that, it would be key to know how you have your vSwitches setup and how you have your Cisco switches setup.

when you make the support ticket, if you could attach "show run" from both switches, appropriately censored (don't need SSH keys or anything like that), that would really help.

feel free to CC me on the ticket, Jon at Nutanix dot com

Sign up

Login to the community

Scanning file for viruses.

This file cannot be downloaded