Acropolis Dynamic Scheduling (ADS) is responsible for examining the load on various components for hotspots and calculating which migrations can balance the load without causing additional hotspots. Lazan is the dynamic scheduling service for AHV.
Lazan monitors the following resources:
- Total CPU usage of each of the guest VMs.
- Storage controller (Stargate) CPU usage per VM or iSCSI target.
Lazan does NOT monitor the following:
- Memory. Apart from ensuring that its migration tasks stay within memory limits (including HA), Lazan does not load balance based on memory usage.
- Networking.
Overview of how Lazan works:
- The Lazan manager gathers stats from the components it monitors.
- The Lazan solver (runner) checks the stats for potential anomalies and determines how to resolve them, if possible.
- The Lazan manager invokes the tasks (e.g. VM migrations) to resolve the situation.
When is a HOTSPOT detected by Lazan?
Lazan runs every 15 minutes and analyzes the resource usage for at least that period of time. If the resource utilization of an AHV host remains above 85% for the full 15 minutes, migration tasks are triggered to remove the hotspot.
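The detection rule above can be sketched as follows. This is a minimal illustration, not Lazan's actual implementation: the function name, the sample format, and the choice to require every sample in the window to exceed the threshold are assumptions. Only the 85% threshold and the 15-minute window come from the text.

```python
from typing import Sequence

HOTSPOT_THRESHOLD_PCT = 85.0  # threshold stated in this article
WINDOW_MINUTES = 15           # analysis window stated in this article


def is_hotspot(samples_pct: Sequence[float],
               threshold_pct: float = HOTSPOT_THRESHOLD_PCT) -> bool:
    """Return True if utilization stayed above the threshold for the whole window.

    samples_pct is a hypothetical list of host utilization samples (in percent)
    collected over the last WINDOW_MINUTES. An empty window is never a hotspot.
    """
    return len(samples_pct) > 0 and all(s > threshold_pct for s in samples_pct)


# One sample per 5 minutes over a 15-minute window (assumed sampling rate).
sustained = is_hotspot([91.0, 88.5, 97.2])   # above 85% throughout
transient = is_hotspot([91.0, 72.0, 97.2])   # dipped below 85% once
```

Under this interpretation, a short spike that drops below 85% at any point in the window does not count as a hotspot.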
In cases where there is an obvious hotspot but the VMs did not migrate, the cause may be either of the following two reasons:
Reason 1: If Lazan can't resolve a hotspot, it won't migrate the resource blindly.
For example:
- A large VM (16 vCPUs) is at 100% usage and accounts for 75% of the AHV host's usage (the host is also at 100% usage).
- The other hosts are loaded at ~40% usage.
In this situation, the other hosts can't accommodate the large VM without causing contention there as well. Lazan doesn't prioritize one host or VM over others for contention, so it leaves the VM where it's currently hosted.
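The arithmetic behind this example can be checked with a short sketch. It assumes that all hosts have the same CPU capacity, so the VM's share of its current host (75%) carries over unchanged to any destination host, and that the 85% hotspot threshold is the limit for a destination. The function name is hypothetical and does not correspond to any Lazan API.

```python
def can_place_without_hotspot(dest_usage_pct: float,
                              vm_load_pct: float,
                              threshold_pct: float = 85.0) -> bool:
    """Return True if moving the VM keeps the destination at or below the threshold.

    Assumes identical host capacity, so percentages can simply be added.
    """
    return dest_usage_pct + vm_load_pct <= threshold_pct


# Example from the text: destination at ~40%, VM contributes 75% of a host.
projected = 40.0 + 75.0                                # 115% of host capacity
large_vm_fits = can_place_without_hotspot(40.0, 75.0)  # False: 115% > 85%
small_vm_fits = can_place_without_hotspot(40.0, 30.0)  # True: 70% <= 85%
```

Because the projected load on any destination would be about 115%, the migration would only move the hotspot rather than remove it.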
Reason 2: If the number of all-flash nodes in the cluster is less than the replication factor, ADS fails for the all-flash nodes; that is, any big VMs running on an all-flash node are not migrated off that node. For example, an RF2 cluster needs a minimum of 2 all-flash nodes for VMs on the all-flash nodes to migrate successfully.
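The condition in Reason 2 reduces to a simple comparison, sketched below. The function and parameter names are hypothetical. In practice, the node count and replication factor would come from the cluster configuration.

```python
def all_flash_migration_possible(all_flash_node_count: int,
                                 replication_factor: int) -> bool:
    """Return True if the cluster has at least as many all-flash nodes as its RF."""
    return all_flash_node_count >= replication_factor


rf2_with_two_nodes = all_flash_migration_possible(2, 2)  # minimum for RF2
rf2_with_one_node = all_flash_migration_possible(1, 2)   # too few all-flash nodes
```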
Within a VMware ESXi environment, DRS manages the load balancing of compute, memory, and storage resources for virtual machines. In ESXi deployments, Lazan is still involved in balancing storage resources at the ADSF layer: it examines the storage load of iSCSI targets (via Stargate) and distributes the Volume Groups evenly across all Stargate instances. This provides consistent performance for all iSCSI initiator connections to the "Volumes (Block Services)" Data Services IP.
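To illustrate what distributing Volume Groups evenly can mean, here is a minimal sketch using a greedy "heaviest first, least-loaded target" heuristic. This article does not describe the algorithm Lazan uses, so the heuristic, the function name, and the load figures are all assumptions made for illustration.

```python
import heapq
from typing import Dict, List


def distribute_volume_groups(vg_load: Dict[str, float],
                             stargates: List[str]) -> Dict[str, List[str]]:
    """Assign each Volume Group to the currently least-loaded Stargate instance.

    vg_load maps a Volume Group name to a hypothetical load figure.
    Heaviest Volume Groups are placed first so the totals end up close together.
    """
    if not stargates:
        raise ValueError("at least one Stargate instance is required")

    # Min-heap of (accumulated load, instance name).
    heap = [(0.0, name) for name in stargates]
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {name: [] for name in stargates}

    for vg, load in sorted(vg_load.items(), key=lambda kv: kv[1], reverse=True):
        total, name = heapq.heappop(heap)      # least-loaded instance so far
        placement[name].append(vg)
        heapq.heappush(heap, (total + load, name))

    return placement


placement = distribute_volume_groups(
    {"vg1": 40.0, "vg2": 30.0, "vg3": 20.0, "vg4": 10.0},
    ["cvm-a", "cvm-b"],
)
```

With these sample figures, each instance ends up with a total load of 50, which is the kind of even spread the paragraph above describes.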
The detailed KB article can be found at https://portal.nutanix.com/kb/000004229