Correct order to configure LACP/LAG

Question

I have pulled some new nodes into an existing cluster that uses LACP/LAG on the current nodes. The default configuration on the new nodes is Active/Backup. We also have new ToR switches that are not configured in a LAG. Are these the correct steps to configure Active/Active? The new nodes are only participating in the metadata ring - not hosting any VMs:

To modify networking on the new nodes:

1. Change the node to LACP enabled (ACTIVE/ACTIVE) using vs0 in the GUI – select only that ONE node to update the config on!!
2. Put node into Maintenance Mode
   1. Login to CVM
   2. acli host.list
   3. acli host.enter_maintenance_mode host-IP-address
   4. DO I NEED TO PUT THE CVM IN MAINT MODE AS WELL??
3. Make changes on ToR switch – enable the 2 ports for LAG for that node ONLY
   1. Test networking
4. Remove from Maintenance mode
   1. acli host.exit_maintenance_mode host-ip
5. Verify resiliency returns to OK
6. Move onto the next node

I would love some feedback on if this is the correct order.

Best answer by bcaballero 16 February 2022, 18:52

bcaballero · Accepted Answer

Hi @MelissaAdams

I assume you are using AHV as the hypervisor and that your AOS version is 5.20 or above, so working from that premise...

If you take a look at this article on AHV Networking
https://portal.nutanix.com/page/documents/solutions/details?targetId=BP-2071-AHV-Networking:BP-2071-AHV-Networking
and scroll down to the LACP section, you can check the following Nutanix recommendation:

Nutanix recommends that you enable LACP on the AHV host with fallback to active-backup, then configure the connected upstream switches. Different switch vendors may refer to link aggregation as port channel or LAG. Using multiple upstream switches may require additional configuration, such as a multichassis link aggregation group (MLAG) or virtual PortChannel (vPC). Configure switches to fall back to active-backup mode in case LACP negotiation fails (sometimes called fallback or no suspend-individual). This switch setting assists with node imaging and initial configuration where LACP may not yet be available on the host.

With that in mind, the recommendation is to enable LACP first on the hosts and then on the switches.

As for the Virtual Switch configuration through Prism Element, it performs the following steps when using the “Standard” method
https://portal.nutanix.com/page/documents/details?targetId=AHV-Admin-Guide-v5_20:ahv-cluster-nw-vs-uplink-config-ahv-r.html

If you change the uplink configuration of vs0, AOS applies the updated settings to all the nodes in the cluster one after the other (the rolling update process). To update the settings in a cluster, AOS performs the following tasks when configuration method applied is Standard:

1. Puts the node in maintenance mode (migrates VMs out of the node)
2. Applies the updated settings
3. Checks connectivity with the default gateway
4. Exits maintenance mode
5. Proceeds to apply the updated settings to the next node

In the past I tried to change 4 nodes from Active/Backup to LACP. Something went wrong with the ToRs and the process timed out on the very first node, leaving the rest untouched. Even so, for peace of mind I would do one node at a time. Your step 1 looks good so far.

Here’s the link for Enabling LAG and LACP on the ToR Switches
https://portal.nutanix.com/page/documents/details?targetId=AHV-Admin-Guide-v5_20:wc-enable-lag-and-lacp-on-tor-switch-t.html

Procedure

1. Put the node in maintenance mode. This is in addition to the previous maintenance mode that enabled Active-Active on the node.
2. Enable LAG and LACP on the ToR switch connected to that node.
3. Exit maintenance mode after LAG and LACP is successfully enabled.
4. Repeat steps 1 to 3 for every node in the cluster

If you check how to put a host into maintenance mode
https://portal.nutanix.com/page/documents/details?targetId=AHV-Admin-Guide-v5_20:ahv-node-maintenance-mode-put-ahv-t.html
it includes the CVM in step 6 as well, so you should also put the CVM into maintenance mode in your step 2:

Put the CVM into the maintenance mode.
nutanix@cvm$ ncli host edit id=host-ID enable-maintenance-mode=true
Replace host-ID with the ID of the host.
This step prevents the CVM services from being affected by any connectivity issues.

Step 3 looks good. In step 4 you should also take the CVM out of maintenance mode, because of your step 2. Steps 5 & 6 are OK too!

Hope this helps you a bit.

P.S. This is based on my own experience; you may prefer to wait for a Nutanix employee for a more accurate response.

Regards!
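The per-node order discussed above can be sketched as a small dry-run shell helper. It only prints the sequence and runs nothing against the cluster. The function name plan_node and the sample arguments are hypothetical, and the enable-maintenance-mode=false line is an assumption that mirrors the enable command quoted above, so check it against the AHV Admin Guide for your AOS version before relying on it.

```shell
#!/bin/sh
# Dry-run sketch: prints the per-node order and executes nothing.
# plan_node is a hypothetical helper; $1 = AHV host IP, $2 = host ID.
plan_node() {
  printf '%s\n' \
    "1. Prism Element: set vs0 to Active-Active (LACP) for this node only" \
    "2. acli host.enter_maintenance_mode $1" \
    "3. ncli host edit id=$2 enable-maintenance-mode=true" \
    "4. ToR switch: enable LAG/LACP on this node's two ports, then test networking" \
    "5. ncli host edit id=$2 enable-maintenance-mode=false  (assumed inverse of line 3)" \
    "6. acli host.exit_maintenance_mode $1" \
    "7. Wait for data resiliency to return to OK, then move to the next node"
}
```

For example, plan_node 10.0.0.11 host-ID (both values are placeholders) prints the seven lines for one node. The real commands would still be run by hand from a CVM, one node at a time.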

bcaballero · Answer

Glad it helped, you’re welcome @MelissaAdams

Regards!
