Network Load Balancing with Acropolis Hypervisor

  • 3 December 2015
In the first article of our four part Acropolis Networking series we tackled bridges and bonds, so we could split traffic among the multiple network interfaces on a physical Nutanix node.

Now that the CVM traffic is carried over the 10 Gb interfaces, and the User VM traffic can be carried over either the 10 Gb or 1 Gb adapters, we're ready to address load balancing within the OVS bonds. There are two primary concerns: fault tolerance and throughput.

To handle fault tolerance, we ensure that each bond is created with at least two adapters. Once the bond has two or more adapters, we can manage the aggregate throughput provided by the interfaces in that bond. All of the following bond modes provide fault tolerance.

For a video walkthrough of the different load balancing modes with the Acropolis Hypervisor and Open vSwitch check out the following recording. The video shows some extra shortcuts such as "allssh" to speed up deployment of this configuration.

Within a bond, traffic is distributed between multiple physical interfaces according to the bond mode. The default bond mode is active-backup, where one interface in the bond carries traffic and other interfaces in the bond are used only when the active link fails.

View the bond mode and active interface with the following AHV command:

nutanix@CVM$ ssh root@ "ovs-appctl bond/show"

In the default configuration of active-backup, output will be similar to the following, where eth2 is the active and eth3 is the backup interface:

---- bond0 ----
bond_mode: active-backup
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
lacp_status: off

slave eth2: enabled
active slave
may_enable: true

slave eth3: enabled
may_enable: true
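If you save that output to a file, the active interface can also be picked out programmatically. A minimal sketch using plain awk (this is not a Nutanix tool, and the sample lines are hard-coded from the output above for illustration):

```shell
#!/bin/sh
# Sketch: extract the active interface from saved "ovs-appctl bond/show"
# output. The "active slave" marker line follows its "slave ethX: enabled" line.
active_slave() {
  awk '/^slave/ {iface=$2; sub(":", "", iface)} /active slave/ {print iface}'
}
# sample input taken from the example output above
printf '%s\n' \
  'slave eth2: enabled' \
  'active slave' \
  'slave eth3: enabled' | active_slave   # -> eth2
```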

Active-backup bond mode is the simplest, easily allowing connections to multiple upstream switches without any additional switch configuration. The downside is that traffic from all VMs uses only the single active link within the bond; all backup links remain unused. In a system with dual 10 gigabit Ethernet adapters, the maximum throughput of all VMs running on a Nutanix node is limited to 10 Gbps.

Active-backup mode is enabled by default, but can be configured with the following AHV command:

nutanix@CVM$ ssh root@ "ovs-vsctl set port bond0 bond_mode=active-backup"

To take advantage of the bandwidth provided by multiple upstream switch links, we recommend configuring the bond mode as balance-slb. The balance-slb bond mode in OVS takes advantage of all links in a bond and uses measured traffic load to rebalance VM traffic from highly used to less used interfaces. When the configurable bond-rebalance-interval expires, OVS uses the measured load for each interface and the load for each source MAC hash to spread traffic evenly among links in the bond.

Traffic from source MAC hashes may be moved to a less active link to more evenly balance bond member utilization. Perfectly even balancing is not always possible. Each individual virtual machine NIC uses only a single bond member interface, but traffic from multiple virtual machine NICs (multiple source MAC addresses) is distributed across bond member interfaces according to the hashing algorithm. As a result, it is possible for a Nutanix AHV node with two 10 gigabit interfaces to use up to 20 gigabits of network throughput, while individual VMs have a maximum throughput of 10 gigabits per second.
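The hashing idea can be sketched in a few lines of shell. This is purely illustrative (OVS uses its own hash function, not cksum): each source MAC lands in one of 256 buckets, each bucket is served by exactly one bond member, so a single VM NIC sticks to one uplink while many VM NICs spread across both:

```shell
#!/bin/sh
# Illustrative only: OVS's real hash differs, but the shape is the same.
mac_bucket() {
  # reduce the MAC to one of 256 buckets, like the "hash N:" lines
  # shown in ovs-appctl bond/show output
  printf '%s' "$1" | cksum | awk '{print $1 % 256}'
}
for mac in 50:6b:8d:aa:01:01 50:6b:8d:aa:01:02 50:6b:8d:aa:01:03; do
  bucket=$(mac_bucket "$mac")
  # map buckets to the two bond members (again, purely illustrative)
  if [ $((bucket % 2)) -eq 0 ]; then member=eth2; else member=eth3; fi
  echo "$mac -> hash $bucket -> $member"
done
```

The point of the sketch: the bucket for a given MAC is stable, so one VM NIC never splits across links, but different MACs fall into different buckets and therefore different uplinks.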

The default rebalance interval is 10 seconds, but we recommend setting this to 60 seconds to avoid excessive movement of source MAC address hashes between upstream switches. We've tested this configuration using two separate upstream switches with the Acropolis hypervisor. No additional configuration (such as link aggregation) is required on the switch side, as long as the upstream switches are interconnected.

The balance-slb algorithm is configured for each bond on all AHV nodes in the Nutanix cluster with the following commands:

nutanix@CVM$ ssh root@ "ovs-vsctl set port bond0 bond_mode=balance-slb"

nutanix@CVM$ ssh root@ "ovs-vsctl set port bond0 other_config:bond-rebalance-interval=60000"
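To apply this on every node, the two commands above can be wrapped in a loop; a minimal sketch, where the hostnames are hypothetical and a leading echo stands in for the real ssh so you can preview the commands before running them:

```shell
#!/bin/sh
# Sketch: preview the per-host commands before actually running them.
# Swap in your AHV hosts and drop the leading "echo" once the output
# looks right.
HOSTS="ahv-host-1 ahv-host-2 ahv-host-3"
for h in $HOSTS; do
  echo ssh root@"$h" "ovs-vsctl set port bond0 bond_mode=balance-slb"
  echo ssh root@"$h" "ovs-vsctl set port bond0 other_config:bond-rebalance-interval=60000"
done
```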

Verify the proper bond mode with the following commands:

nutanix@CVM$ ssh root@ "ovs-appctl bond/show bond0"
---- bond0 ----
bond_mode: balance-slb
bond-hash-basis: 0
updelay: 0 ms
downdelay: 0 ms
next rebalance: 59108 ms
lacp_status: off

slave eth2: enabled
may_enable: true
hash 120: 138065 kB load
hash 182: 20 kB load

slave eth3: enabled
active slave
may_enable: true
hash 27: 0 kB load
hash 31: 20 kB load
hash 104: 1802 kB load
hash 206: 20 kB load

LACP and Link Aggregation
Because LACP and balance-tcp require upstream switch configuration, and because network connectivity may be disabled if cables from AHV nodes are moved to incorrectly configured switches, Nutanix does not recommend using link aggregation or LACP.

However, to take full advantage of the bandwidth provided by multiple links to upstream switches from a single VM, link aggregation in OVS using Link Aggregation Control Protocol (LACP) and balance-tcp is required. Note that appropriate configuration of the upstream switches is also required. With LACP, multiple links to separate physical switches appear as a single Layer-2 link. Traffic can be split between multiple links in an active-active fashion based on a traffic-hashing algorithm.

Traffic can be balanced among members in the link without any regard for switch MAC address tables, because the uplinks appear as a single L2 link. We recommend using balance-tcp when LACP is configured, since multiple Layer-4 streams from a single VM could potentially use all available uplink bandwidth in this configuration. With link aggregation, LACP, and balance-tcp, a single user VM with multiple TCP streams could potentially use up to 20 Gbps of bandwidth in an AHV node with two 10Gbps adapters.
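The difference from balance-slb can be sketched the same way: the hash input is now the whole flow rather than just the source MAC, so two TCP streams from one VM (same source MAC, different source ports) can land on different uplinks. Again purely illustrative; OVS does not use cksum:

```shell
#!/bin/sh
# Illustrative only: hash the flow tuple instead of the source MAC.
flow_member() {
  # src-ip dst-ip src-port dst-port -> one of the two bond members
  n=$(printf '%s' "$*" | cksum | awk '{print $1}')
  if [ $((n % 2)) -eq 0 ]; then echo eth2; else echo eth3; fi
}
# same VM (same source IP and MAC), two TCP streams on different source ports:
flow_member 10.0.0.5 10.0.1.9 49152 443
flow_member 10.0.0.5 10.0.1.9 49153 443
```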

Configure LACP and balance-tcp with the following commands. Upstream switch configuration of LACP is required.

nutanix@CVM$ ssh root@ "ovs-vsctl set port bond0 lacp=active"

nutanix@CVM$ ssh root@ "ovs-vsctl set port bond0 bond_mode=balance-tcp"

If upstream LACP negotiation fails, the default configuration is to disable the bond, which would block all traffic. The following command allows fallback to active-backup bond mode in the event of LACP negotiation failure.

nutanix@CVM$ ssh root@ "ovs-vsctl set port bond0 other_config:lacp-fallback-ab=true"

Finding the right balance
Use your virtualization requirements to choose the bond mode that's right for you! The following methods are arranged from least complex to most complex configuration. For simple and reliable failover with up to 10Gbps of host throughput with minimal switch configuration, choose active-backup. For instances where more than 10Gbps of throughput is required from the AHV host, use balance-slb. Where more than 10Gbps of throughput is required from a single VM, use LACP and balance-tcp.
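The guidance above condenses into a tiny decision helper. A sketch only, taking the two yes/no questions from this paragraph as inputs:

```shell
#!/bin/sh
# Sketch: pick a bond mode from the two throughput questions above.
choose_bond_mode() {
  host_needs_over_10g=$1  # yes/no: does the whole AHV host need >10 Gbps?
  vm_needs_over_10g=$2    # yes/no: does any single VM need >10 Gbps?
  if [ "$vm_needs_over_10g" = yes ]; then
    echo "balance-tcp (requires LACP on the upstream switches)"
  elif [ "$host_needs_over_10g" = yes ]; then
    echo "balance-slb"
  else
    echo "active-backup"
  fi
}
choose_bond_mode no no    # -> active-backup
choose_bond_mode yes no   # -> balance-slb
choose_bond_mode yes yes  # -> balance-tcp (requires LACP ...)
```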

This post was authored by Jason Burns, Senior Solutions & Performance Engineer at Nutanix


7 replies

Hi, thanks for that helpful information.

Quick question: in the second example, balance-slb, the switches were interconnected but not configured as MC-LAG peers (in contrast to the balance-tcp example, where MC-LAG was a requirement). What is the purpose of this cross-connect link, assuming that the two switches already have a path to each other through the core/uplink network?

kind regards,

William Scull

Lenovo EMEA
I recommend that the switches be interconnected with balance-slb, but this isn't a strict requirement. If the two top-of-rack switches are connected to the same core, that's good enough.

With balance-slb, a MAC address (a VM) will periodically move from one switch (old) to another (new). There is a period during which packets destined TO the VM are still sent via the old switch, while the VM's outbound traffic arrives at the new switch. The MAC address table entry points to the old switch until the VM sends traffic through the new switch.

Both switches must share a core, or be interconnected to reduce the impact of the moving MAC addresses. If the switches do not share a common upstream connection, traffic could be dropped.

For balance-slb, how should the switch interfaces be configured? Is it trunk mode for all interfaces (the interfaces to bond0, between the switches, and to the core switch)?

I'm assuming you're referring to Cisco's trunk mode as opposed to the access mode, for the purpose of allowing multiple VLANs on the port?

In my lab examples all switch ports were configured as "Trunk" ports, allowing all VLANs in my particular lab. You could limit this to a certain set of VLANs, as long as all ports allowed the same VLANs.

Let me know if this is what you're referring to and if it helps!
I want to use balance-slb and balance-tcp, but I want to know what the switch-side configuration should be for this requirement.

When I enable balance-slb on bond0 on a Nutanix node, it stops pinging the other nodes in the network, even though the network team configured channeling between the two ports where the bond0 member ports were connected on two different switches, and those switches are also interconnected with the core switch.
balance-slb does not require any port channeling or different configuration on the switch side. For balance-tcp you would need a port channel on the switch in active mode, and lacp=active configured on the AHV br0-up interface.

I have an 8155 with two 2x10 Gb NICs, so two separate cards. I see the following recommendations for this configuration on the support portal:

4x 10 Gb (2 + 2) and 2x 1 Gb separated --> Use to physically separate CVM traffic such as storage and Nutanix Volumes from user VM traffic while still providing 10 Gb connectivity for both traffic types. The four 10 Gb adapters are divided into two separate pairs. Compatible with any load balancing algorithm.

Does this mean I have to create two separate bonds, with two of my 10 Gb NICs in one and two in the other? Or can I create one bond with all four of my 10 Gb NICs? I am a bit confused as to what "divided into two separate pairs" means.

From reading your blog above, it looks like it might be best practice (in this scenario) to use LACP with the balance-tcp load balancing algorithm to get the most bandwidth out of my physical NICs? If I used the default active-backup, I feel like I would only be utilizing one of my NICs (active) while the other three sit in standby.

Any advice would be greatly appreciated.