Solved

Networking issues - trying to add in a second switch


Userlevel 2
Badge +3

Hi All

Havoc struck this morning when I tried to move my AHV cluster from the old switch stack to the new switch stack. The hosts, I think, went into panic mode and started restarting various VMs.

The background to this is: I have 2 x Dell X4012 core switches and 2 x Dell N3024 ToR switches. Currently, because of the migration from VMware to Nutanix, only 1 x X4012 is connected, to NIC 1 on each of the 3 hosts. From the X4012 there is a cross-connect into the old VMware network core blade switches (2 x Dell 8024), which are connected to the old ToR N3048 switch stack; these are then connected to the router for VPLS and DMZ.

 

So to move the connectivity, I thought I could connect the second NICs of the hosts to the second X4012 and disconnect the cross-connect and the first NICs, thus using the newer switches, albeit on the opposite NICs. I planned to reconfigure and update the first X4012 and add it back into the stack with a LAG to the N3024 stack and no cross-connect. I moved the NIC connections and lost all pings, so I rolled back, probably exacerbating the problem as I still could not ping. It was then I realised I had a restart storm.

 

Could anyone tell me a way of making the changes without creating a storm? I have added a diagram: the parts in black are current, the parts in red are where I need to be.

 

Thanks in advance

Best answer by JeroenTielen 1 June 2023, 12:18

10 replies

Userlevel 6
Badge +8

You want the Nutanix cluster to switch from the old to the new switch stack?

 

  1. Put one node into maintenance mode and shut it down.
  2. Move the interface connections of that node (which is powered off) to the new switch stack.
  3. Power on the node and check network connectivity from it to the other nodes.
  4. If the network is correct, take the node out of maintenance mode and move on to the next node.
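
The connectivity check in step 3 could be scripted along these lines. This is only a rough sketch, not a Nutanix tool: the function names and the example addresses are made up for illustration, and it assumes a Linux host where `ping` accepts `-c` and `-W`.

```python
import subprocess
from typing import Callable, Iterable, List


def ping_once(ip: str, timeout_s: int = 1) -> bool:
    """Return True if a single ICMP echo to `ip` succeeds (Linux ping flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unreachable_peers(
    peers: Iterable[str],
    probe: Callable[[str], bool] = ping_once,
) -> List[str]:
    """Return the peers that did not answer; an empty list means the check passed."""
    return [ip for ip in peers if not probe(ip)]


# Example (hypothetical addresses of the other hosts and CVMs):
# failed = unreachable_peers(["10.0.0.11", "10.0.0.12", "10.0.0.21"])
# Only take the node out of maintenance mode if `failed` is empty.
```

The probe is passed in as a parameter so the logic can be tried out with a fake probe before running it against real hosts.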
Userlevel 5
Badge +6

Hello @Eric-The_Viking 

Before making any physical network changes, you need to put the node in maintenance mode, and only one node at a time.

Please follow @JeroenTielen's steps.

Userlevel 2
Badge +3


Good morning Jeroen and Mousafa, thanks for your replies. However, as the 2 switch stacks are isolated, there would be no point in maintenance mode: all VMs would end up on one node and then I would have the same problem.

I have been pondering this problem and got to thinking about the 1 Gb NICs in each host. Would it be possible to connect them to an unmanaged switch solely for the purpose of maintaining a heartbeat? And if so, would a new VLAN be needed?

 

Thanks

Eric

 

Userlevel 6
Badge +8

Ahh, if isolated, then shut down the cluster and repatch the nodes. Keep it simple. ;)

Userlevel 2
Badge +3

Hi Jeroen

That would be nice, but it is the full production environment and I think the downtime would be too long, as I found to my cost.

Userlevel 6
Badge +8

Ok got it. What you can do:

 

  1. Make an extra vswitch with interfaces which aren't in use.
  2. Patch those interfaces to the other network stack.
  3. Move the VLANs to the other vswitch or, if that is not possible, change the interface of the VMs to the other switch. (Make sure the VLANs/networks are correctly configured.)
Userlevel 2
Badge +3

Hi Jeroen

Thanks, I will take a look and see if I can work it out. Would you be able to add a rough sketch of this solution? (I am a visual learner.)

 

Thanks

Eric

Userlevel 2
Badge +3

Hi Jeroen

I had some luck and I am now able to access the live X4012, so I can add a trunk to the second X4012. As an interim measure, can you see any Nutanix network problems if I then add NIC0 from each node to the second switch?

 

Thanks

Eric

 

Userlevel 2
Badge +3

Hi All

Thank you all for the suggestions. I have managed to get all the NICs on both switches working in Active SLB mode. I had been a wombat and missed the backplane VLAN off the second switch; without the backplane the hosts lose their way and weird stuff happens.
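
For anyone wanting to catch this kind of omission before repatching, a quick comparison of the VLANs each switch carries against the VLANs the cluster needs would flag it. The snippet below is only an illustrative sketch: the switch names and VLAN IDs are invented, and the VLAN lists would have to be gathered by hand from each switch's configuration.

```python
from typing import Dict, Set


def audit_switches(required: Set[int], switches: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Map each switch name to the required VLANs it does not carry.

    Switches that carry every required VLAN are left out of the result.
    """
    gaps = {}
    for name, vlans in switches.items():
        missing = required - vlans
        if missing:
            gaps[name] = missing
    return gaps


# Hypothetical VLAN IDs: 10 = management, 20 = VM traffic, 30 = backplane.
required_vlans = {10, 20, 30}
configured = {
    "x4012-a": {10, 20, 30},
    "x4012-b": {10, 20},  # backplane VLAN left off
}
gaps = audit_switches(required_vlans, configured)
```

With these made-up values, `gaps` lists only the second switch, with VLAN 30 missing.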

 

Have a super day

Eric

 

Userlevel 6
Badge +8

Good to hear all is working again.