Solved

Correct order to configure LACP/LAG

  • 16 February 2022
  • 3 replies
  • 1501 views

I have pulled some new nodes into an existing cluster that uses LACP/LAG on the current nodes. The default configuration on the new nodes is Active/Backup. We also have new ToR switches that are not yet configured in a LAG. Are these the correct steps to configure Active/Active? The new nodes are only participating in the metadata ring and are not hosting any VMs:

To modify networking on the new nodes:

  1. Change the node to LACP enabled (ACTIVE/ACTIVE) on vs0 in the GUI – select only that ONE node to update the config on!!
  2. Put node into Maintenance Mode
    1. Login to CVM
    2. acli host.list
    3. acli host.enter_maintenance_mode host-IP-address
    4. DO I NEED TO PUT THE CVM IN MAINT MODE AS WELL??
  3. Make changes on ToR switch – enable the 2 ports for LAG for that node ONLY
    1. Test networking
  4. Remove from Maintenance mode
    1. acli host.exit_maintenance_mode host-ip
  5. Verify resiliency returns to OK
  6. Move onto the next node

I would love some feedback on if this is the correct order.

Best answer by bcaballero 16 February 2022, 18:52

3 replies

Hi @MelissaAdams 

 

I'm assuming that you are using AHV as the hypervisor and that your AOS version is 5.20 or later, so on that premise:

If you look at this article on AHV Networking (https://portal.nutanix.com/page/documents/solutions/details?targetId=BP-2071-AHV-Networking:BP-2071-AHV-Networking) and scroll down to the LACP section, you will find the following Nutanix recommendation:

Nutanix recommends that you enable LACP on the AHV host with fallback to active-backup, then configure the connected upstream switches. Different switch vendors may refer to link aggregation as port channel or LAG. Using multiple upstream switches may require additional configuration, such as a multichassis link aggregation group (MLAG) or virtual PortChannel (vPC). Configure switches to fall back to active-backup mode in case LACP negotiation fails (sometimes called fallback or no suspend-individual). This switch setting assists with node imaging and initial configuration where LACP may not yet be available on the host.

 

With that in mind, the recommendation is to enable LACP on the hosts first and then on the switches. If you change the virtual switch configuration through Prism Element with the “Standard” method (https://portal.nutanix.com/page/documents/details?targetId=AHV-Admin-Guide-v5_20:ahv-cluster-nw-vs-uplink-config-ahv-r.html), it performs the following steps:

If you change the uplink configuration of vs0, AOS applies the updated settings to all the nodes in the cluster one after the other (the rolling update process). To update the settings in a cluster, AOS performs the following tasks when configuration method applied is Standard:

  1. Puts the node in maintenance mode (migrates VMs out of the node) 
  2. Applies the updated settings
  3. Checks connectivity with the default gateway
  4. Exits maintenance mode
  5. Proceeds to apply the updated settings to the next node
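
To make the ordering of that rolling update concrete, here is a minimal Python sketch of the five steps. It is not Nutanix code: the four callables are hypothetical hooks you would supply yourself, and stopping at the first node that fails the gateway check (leaving it in maintenance mode) is my own assumption, not documented AOS behavior.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def rolling_update(
    nodes: Iterable[str],
    enter_maintenance: Callable[[str], None],   # hypothetical hook
    apply_settings: Callable[[str], None],      # hypothetical hook
    gateway_reachable: Callable[[str], bool],   # hypothetical hook
    exit_maintenance: Callable[[str], None],    # hypothetical hook
) -> Tuple[List[str], Optional[str]]:
    """Update nodes one at a time; return (updated nodes, first failed node or None)."""
    updated: List[str] = []
    for node in nodes:
        enter_maintenance(node)          # step 1: put the node in maintenance mode
        apply_settings(node)             # step 2: apply the updated settings
        if not gateway_reachable(node):  # step 3: check the default gateway
            # Assumption: stop here so the remaining nodes stay untouched.
            return updated, node
        exit_maintenance(node)           # step 4: exit maintenance mode
        updated.append(node)             # step 5: move on to the next node
    return updated, None
```

Calling it with a single-element node list gives the "one node at a time" behavior, with a chance to inspect the result before continuing.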

 

In the past I tried to change four nodes from Active/Backup to LACP. Something went wrong with the ToR switches and the process timed out on the very first node, leaving the rest untouched. So, for peace of mind, I would do one node at a time. Your step 1 looks good so far.

Here’s the link for Enabling LAG and LACP on the ToR Switches https://portal.nutanix.com/page/documents/details?targetId=AHV-Admin-Guide-v5_20:wc-enable-lag-and-lacp-on-tor-switch-t.html 

Procedure

  1. Put the node in maintenance mode. This is in addition to the previous maintenance mode that enabled Active-Active on the node.
  2. Enable LAG and LACP on the ToR switch connected to that node.
  3. Exit maintenance mode after LAG and LACP is successfully enabled.
  4. Repeat steps 1 to 3 for every node in the cluster

 

If you check how to put a host into maintenance mode (https://portal.nutanix.com/page/documents/details?targetId=AHV-Admin-Guide-v5_20:ahv-node-maintenance-mode-put-ahv-t.html), step 6 of that procedure covers the CVM as well, so you should also put the CVM into maintenance mode in your step 2:

  1. Put the CVM into the maintenance mode.
nutanix@cvm$ ncli host edit id=host-ID enable-maintenance-mode=true

Replace host-ID with the ID of the host.

This step prevents the CVM services from being affected by any connectivity issues.

 

Step 3 looks good. In step 4 you should also take the CVM out of maintenance mode, since you put it there in step 2. Steps 5 and 6 are OK too!
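
Putting the pieces together, the per-node sequence could be sketched as below. This is only a dry-run helper that prints the commands rather than running them. The `acli` and `ncli ... enable-maintenance-mode=true` commands are the ones quoted in this thread; the `enable-maintenance-mode=false` counterpart and the order in which the CVM and host leave maintenance mode are my assumptions, so check them against the AHV Admin Guide. The function name, IP address, and host ID are placeholders.

```python
from typing import List


def per_node_plan(host_ip: str, host_id: str) -> List[str]:
    """Return the ordered commands and manual steps for one node (dry run only)."""
    return [
        f"acli host.enter_maintenance_mode {host_ip}",
        f"ncli host edit id={host_id} enable-maintenance-mode=true",
        "# MANUAL: enable LAG and LACP on the ToR ports for this node only, then test networking",
        # Assumption: '=false' reverses the CVM maintenance mode set above.
        f"ncli host edit id={host_id} enable-maintenance-mode=false",
        f"acli host.exit_maintenance_mode {host_ip}",
        "# MANUAL: verify resiliency returns to OK before moving to the next node",
    ]


if __name__ == "__main__":
    # Placeholder values; substitute the real host IP and host ID.
    for line in per_node_plan("10.0.0.11", "host-ID"):
        print(line)
```

Printing the plan first and running each command by hand keeps a person in the loop at the switch-side step, which cannot be done from the CVM anyway.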

 

Hope that this can help you a bit

 

P.S. This is based on my own experience; you may prefer to wait for a Nutanix employee to give a more authoritative answer.

 

Regards!

@bcaballero THANK YOU!!! Because I do not want to disrupt the current nodes that are already configured properly, I’m doing this all from the command line and NOT using the Element GUI. I tested all the commands on my AHV test cluster and I have all the steps down. Thanks for your response!

Melissa

Glad it helped, you’re welcome @MelissaAdams 

Regards!