Hi @MelissaAdams
I'm assuming that you are using AHV as the hypervisor and that your AOS version is 5.20 or above, so working from that premise:
If you take a look at this AHV Networking article https://portal.nutanix.com/page/documents/solutions/details?targetId=BP-2071-AHV-Networking:BP-2071-AHV-Networking and scroll down to the LACP section, you will find the following Nutanix recommendation:
Nutanix recommends that you enable LACP on the AHV host with fallback to active-backup, then configure the connected upstream switches. Different switch vendors may refer to link aggregation as port channel or LAG. Using multiple upstream switches may require additional configuration, such as a multichassis link aggregation group (MLAG) or virtual PortChannel (vPC). Configure switches to fall back to active-backup mode in case LACP negotiation fails (sometimes called fallback or no suspend-individual). This switch setting assists with node imaging and initial configuration where LACP may not yet be available on the host.
With that in mind, the recommendation is to enable LACP on the hosts first and then on the switches. As for changing the virtual switch configuration through Prism Element, the “Standard” method performs the following steps: https://portal.nutanix.com/page/documents/details?targetId=AHV-Admin-Guide-v5_20:ahv-cluster-nw-vs-uplink-config-ahv-r.html
If you change the uplink configuration of vs0, AOS applies the updated settings to all the nodes in the cluster one after the other (the rolling update process). To update the settings in a cluster, AOS performs the following tasks when configuration method applied is Standard:
- Puts the node in maintenance mode (migrates VMs out of the node)
- Applies the updated settings
- Checks connectivity with the default gateway
- Exits maintenance mode
- Proceeds to apply the updated settings to the next node
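To make the rolling behaviour easier to picture, here is a minimal Python sketch of the loop described above. It is only a simulation: the four callbacks are hypothetical placeholders (not Nutanix APIs), and stopping at the first node that fails the gateway check is my assumption about the behaviour, not something taken from the documentation.

```python
from typing import Callable, Iterable, List


def rolling_update(
    nodes: Iterable[str],
    enter_maintenance: Callable[[str], None],
    apply_settings: Callable[[str], None],
    gateway_reachable: Callable[[str], bool],
    exit_maintenance: Callable[[str], None],
) -> List[str]:
    """Update nodes one after the other and return the nodes that completed.

    All callbacks are hypothetical stand-ins for what AOS does internally.
    """
    updated: List[str] = []
    for node in nodes:
        enter_maintenance(node)   # VMs are migrated off the node
        apply_settings(node)      # new uplink configuration is applied
        if not gateway_reachable(node):
            # Assumption: stop here so the remaining nodes stay untouched.
            break
        exit_maintenance(node)
        updated.append(node)
    return updated


if __name__ == "__main__":
    done = rolling_update(
        ["node-1", "node-2", "node-3"],
        enter_maintenance=lambda n: print(f"{n}: enter maintenance"),
        apply_settings=lambda n: print(f"{n}: apply settings"),
        gateway_reachable=lambda n: n != "node-2",  # simulate a failure
        exit_maintenance=lambda n: print(f"{n}: exit maintenance"),
    )
    print(done)  # ['node-1']
```

In this simulated run the second node fails its connectivity check, so only the first node is reported as updated and the third is never touched.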
In the past I tried to change 4 nodes from Active/Backup to LACP. Something went wrong with the ToR switches and the process timed out on the very first node, leaving the rest untouched. Even so, for peace of mind I would do one node at a time. Your step 1 looks good so far.
Here’s the link for Enabling LAG and LACP on the ToR Switches https://portal.nutanix.com/page/documents/details?targetId=AHV-Admin-Guide-v5_20:wc-enable-lag-and-lacp-on-tor-switch-t.html
Procedure
- Put the node in maintenance mode. This is in addition to the previous maintenance mode that enabled Active-Active on the node.
- Enable LAG and LACP on the ToR switch connected to that node.
- Exit maintenance mode after LAG and LACP is successfully enabled.
- Repeat steps 1 to 3 for every node in the cluster
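If you want a written checklist before you start, the procedure above can be expanded per node with a few lines of Python. This is just a sketch for planning purposes: the node names are made up, and the script does not talk to the cluster or the switches at all.

```python
from typing import Iterable, List

# The three per-node steps from the procedure, with a placeholder for the node.
STEPS = (
    "Put {node} in maintenance mode",
    "Enable LAG and LACP on the ToR switch connected to {node}",
    "Exit maintenance mode on {node} once LAG and LACP are enabled",
)


def tor_lacp_checklist(nodes: Iterable[str]) -> List[str]:
    """Return the ordered steps, finishing one node before starting the next."""
    checklist: List[str] = []
    for node in nodes:
        for step in STEPS:
            checklist.append(step.format(node=node))
    return checklist


if __name__ == "__main__":
    # Hypothetical node names, replace with your own.
    for number, line in enumerate(tor_lacp_checklist(["node-1", "node-2"]), 1):
        print(f"{number}. {line}")
```

Keeping the steps grouped by node makes it obvious that a node must be back out of maintenance mode before the next one is started.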
If you check how to put a host into maintenance mode https://portal.nutanix.com/page/documents/details?targetId=AHV-Admin-Guide-v5_20:ahv-node-maintenance-mode-put-ahv-t.html, step 6 covers the CVM as well, so you should also put the CVM into maintenance mode in your step 2:
- Put the CVM into the maintenance mode.
nutanix@cvm$ ncli host edit id=host-ID enable-maintenance-mode=true
Replace host-ID with the ID of the host.
This step prevents the CVM services from being affected by any connectivity issues.
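Since you will be repeating this for every host, a tiny helper that builds the command string can save typos. The sketch below only prints the command quoted above; it does not run anything. Using enable-maintenance-mode=false to leave maintenance mode is my understanding of the counterpart command, so please verify it against the guide, and the host ID "42" is just a made-up example.

```python
import shlex


def cvm_maintenance_command(host_id: str, enable: bool) -> str:
    """Build the ncli command that toggles CVM maintenance mode for one host.

    Assumption: 'false' is the value used to leave maintenance mode.
    """
    flag = "true" if enable else "false"
    # shlex.quote guards against shell-special characters in the ID.
    return f"ncli host edit id={shlex.quote(host_id)} enable-maintenance-mode={flag}"


if __name__ == "__main__":
    # "42" is a hypothetical host ID; use the real ID of your host.
    print(cvm_maintenance_command("42", enable=True))
    print(cvm_maintenance_command("42", enable=False))
```

You would run the first command from a CVM before touching the uplinks and the second one once the host is healthy again.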
Step 3 looks good. In step 4 you should also take the CVM out of maintenance mode, since you put it there in step 2. Steps 5 & 6 are OK too!
Hope this helps you a bit.
P.S. This is based on my own experience; you may prefer to wait for a Nutanix employee to give a more accurate response.
Regards!
@bcaballero THANK YOU!!! Because I do not want to disrupt the current nodes that are already configured properly, I’m doing this all from the command line and NOT using the Prism Element GUI. I tested all the commands on my AHV test cluster and I have all the steps down. Thanks for your response!
Melissa
Glad it helped, you’re welcome @MelissaAdams
Regards!