I have pulled some new nodes into an existing cluster that uses LACP/LAG on the current nodes. The default configuration on the new nodes is Active/Backup. We also have new ToR switches that are not yet configured for LAG. Are these the correct steps to configure Active/Active? The new nodes are only participating in the metadata ring and are not hosting any VMs:
To modify networking on the new nodes:
1. Change the node to LACP enabled (ACTIVE/ACTIVE) using vs0 in the GUI – select only that ONE node to update the config on!!
2. Put node into Maintenance Mode
   - Log in to CVM
   - acli host.list
   - acli host.enter_maintenance_mode host-IP-address
   - DO I NEED TO PUT THE CVM IN MAINT MODE AS WELL??
3. Make changes on ToR switch – enable the 2 ports for LAG for that node ONLY
   - Test networking
4. Remove from Maintenance Mode
   - acli host.exit_maintenance_mode host-IP-address
5. Verify resiliency returns to OK
6. Move on to the next node
I would love some feedback on whether this is the correct order.
Best answer by bcaballero
I assume you are using AHV as the hypervisor and that your AOS version is 5.20 or above, so on that premise:
If you take a look at this article on AHV Networking https://portal.nutanix.com/page/documents/solutions/details?targetId=BP-2071-AHV-Networking:BP-2071-AHV-Networking and scroll down to the LACP section, you can check the Nutanix recommendation there.
With that in mind, the recommendation is to enable LACP on the hosts first and then on the switches. As for the Virtual Switch configuration through Prism Element, it follows the steps of the “Standard method” described here: https://portal.nutanix.com/page/documents/details?targetId=AHV-Admin-Guide-v5_20:ahv-cluster-nw-vs-uplink-config-ahv-r.html
In the past I tried to change 4 nodes from Active/Backup to LACP. Something went wrong with the ToRs and the process timed out on the very first node, leaving the rest untouched. Still, for peace of mind I would do one node at a time. Your step 1 looks good so far.
Here’s the link for enabling LAG and LACP on the ToR switches: https://portal.nutanix.com/page/documents/details?targetId=AHV-Admin-Guide-v5_20:wc-enable-lag-and-lacp-on-tor-switch-t.html
If you check how to put a host into maintenance mode https://portal.nutanix.com/page/documents/details?targetId=AHV-Admin-Guide-v5_20:ahv-node-maintenance-mode-put-ahv-t.html you will see that it includes the CVM in step 6 as well, so you should also put the CVM into maintenance mode in your step 2.
Step 3 looks good. In step 4 you should also take the CVM out of maintenance mode, since you put it in during step 2. Steps 5 and 6 are OK too!
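To make the adjusted order easier to follow, here is a rough Python sketch that only prints a per-node checklist; it does not run anything against the cluster. The `acli` lines come from your post. The `ncli host edit ... enable-maintenance-mode` lines are my reading of the maintenance mode guide linked above, so please check them against your AOS version. The function name, the example IP and the host ID are made up for illustration.

```python
from typing import List


def lacp_node_plan(host_ip: str, cvm_host_id: str) -> List[str]:
    """Ordered checklist for moving ONE node from Active/Backup to LACP.

    Entries starting with "MANUAL:" are done in Prism or on the switch;
    the others are commands typed on a CVM. Nothing is executed here.
    """
    return [
        f"MANUAL: in Prism, set vs0 to Active-Active (LACP) for host {host_ip} only",
        "acli host.list",
        f"acli host.enter_maintenance_mode {host_ip}",
        # Assumption: CVM maintenance mode is toggled with ncli, as in the
        # linked AHV Admin Guide; cvm_host_id is a placeholder for the host ID.
        f"ncli host edit id={cvm_host_id} enable-maintenance-mode=true",
        f"MANUAL: on the ToR switch, put the two ports for {host_ip} into a LAG with LACP",
        f"MANUAL: test networking on {host_ip}",
        f"ncli host edit id={cvm_host_id} enable-maintenance-mode=false",
        f"acli host.exit_maintenance_mode {host_ip}",
        "MANUAL: verify resiliency returns to OK before the next node",
    ]


if __name__ == "__main__":
    # Hypothetical node; replace with your own host IP and host ID.
    for number, step in enumerate(lacp_node_plan("10.0.0.11", "42"), start=1):
        print(f"{number}. {step}")
```

Run it once per node and work through the list top to bottom, moving to the next node only after the last line checks out.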
Hope this helps a bit.
P.S. This is based on my own experience; you may prefer to wait for a Nutanix employee for a more accurate response.
Glad it helped, you’re welcome