A Quick Explanation of What Makes AHV Virtual Machine Scheduling with ADS So Unique
This post was authored by Mike Wronski, Sr. Product Marketing Manager, Nutanix
Anyone familiar with enterprise virtualization has certain expectations. Among them is that the virtualization platform will provide functions to optimize resource usage and ensure good performance. For VMware, it’s a vCenter component called the Distributed Resource Scheduler (commonly DRS). For Microsoft, it’s Performance and Resource Optimization (PRO), which leverages System Center Operations Manager (SCOM) resource monitoring in combination with System Center Virtual Machine Manager (SCVMM).
It is a common misconception that resource optimization is a hypervisor feature; in actuality, it’s event-driven automation from the management plane. The only required hypervisor feature is the ability to migrate a VM to another server without disruption. An external management entity, not the hypervisor itself, is responsible for collecting data, noticing any “bad situation,” and then acting.
Nutanix Acropolis software combined with the Acropolis Hypervisor (AHV) is different from other virtualization solutions, particularly when it comes to our Acropolis Dynamic Scheduler (ADS). Here’s how:
Not a load balancer
One of the first things I want to emphasize is that ADS is a resource contention avoidance engine and not a load balancer. Load balancing alone doesn’t really provide much benefit. Moving VMs around a cluster of servers should only be undertaken when there is performance-impacting resource contention, because the move itself can have a high resource cost. ADS will only move VMs if the benefit of the move is significant enough to justify the migrations.
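To make the distinction concrete, here is a minimal Python sketch contrasting the two triggers. The function names, the 10% spread, and the 85% contention threshold are all hypothetical values chosen for illustration; they are not ADS internals or documented defaults.

```python
# Illustrative only: thresholds and names are assumptions, not ADS settings.

def load_balancer_wants_move(host_cpu, max_spread=0.10):
    """A pure load balancer reacts whenever hosts are unevenly loaded."""
    return max(host_cpu.values()) - min(host_cpu.values()) > max_spread


def contention_avoidance_wants_move(host_cpu, contention_threshold=0.85):
    """A contention-avoidance engine reacts only when some host is actually hot."""
    return any(util > contention_threshold for util in host_cpu.values())


# CPU utilization per host as a fraction of capacity (made-up numbers).
uneven_but_healthy = {"host-a": 0.60, "host-b": 0.30, "host-c": 0.35}
hot_spot = {"host-a": 0.95, "host-b": 0.30, "host-c": 0.35}

print(load_balancer_wants_move(uneven_but_healthy))         # uneven, so it would move VMs
print(contention_avoidance_wants_move(uneven_but_healthy))  # nothing is contended
print(contention_avoidance_wants_move(hot_spot))            # host-a is contended
```

In the first cluster nothing is suffering, so paying the cost of a migration would buy no improvement; only the second cluster justifies a move.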
No configuration required
A unique differentiator of ADS is that there is nothing for the administrator to configure. ADS is enabled by default and does not require any human tuning due to its machine intelligence-based implementation. Nutanix groups our machine intelligence functions under the name Xfit, and they are used across multiple functions in our Prism management software. ADS uses a class of artificial intelligence algorithms called Constraint Satisfaction Problem (CSP) solvers. CSP solvers build a model based on a set of variables and defined end-state criteria and then iterate on the variables looking for a solution. Depending on the sophistication of the implementation, the search can rely mostly on brute force or can include domain-specific heuristics and feedback loops that aid in finding a valid solution. A common example would be a CSP solver programmed to solve a Sudoku puzzle. In the AHV world, a solution would be a VM migration plan that leaves the cluster free of resource contention or hot spots.
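For readers who have not met CSP solvers before, the idea can be sketched in a few lines of Python. This is a toy backtracking solver, not the ADS implementation: the variables are VMs, the values are hosts, and the only constraint is that no host is asked for more than its capacity. The function name `solve_placement`, the single abstract resource unit, and the "largest VM first" heuristic are assumptions made for the example.

```python
# Toy CSP sketch; names, units, and heuristic are illustrative assumptions.

def solve_placement(vm_demand, host_capacity):
    """Assign every VM to a host without exceeding any host's capacity.

    Returns a dict of VM name to host name, or None if no valid assignment exists.
    """
    # Domain-specific heuristic: place the most demanding VMs first.
    vms = sorted(vm_demand, key=vm_demand.get, reverse=True)
    free = dict(host_capacity)
    assignment = {}

    def backtrack(index):
        if index == len(vms):
            return True  # every variable has a value, so the end state is reached
        vm = vms[index]
        for host in free:
            if vm_demand[vm] <= free[host]:
                assignment[vm] = host
                free[host] -= vm_demand[vm]
                if backtrack(index + 1):
                    return True
                # Dead end further down: undo this choice and try the next host.
                free[host] += vm_demand[vm]
                del assignment[vm]
        return False

    return dict(assignment) if backtrack(0) else None


demand = {"vm1": 6, "vm2": 5, "vm3": 4, "vm4": 3}
capacity = {"host-a": 10, "host-b": 8}
placement = solve_placement(demand, capacity)
print(placement)
```

A production solver would track several resources at once and weigh many more constraints, but the shape is the same: model the variables, state what a valid end state looks like, and search.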
Previously unachievable visibility
Our position of owning the full stack, from HCI infrastructure, to our own AHV, to management and automation, gives Nutanix the unique ability to gain insights into areas where traditional virtualization does not have visibility. For example, ADS can see storage resources from our CVM controllers (think storage controller in a traditional 3-tier architecture) in addition to getting VM resource utilization from AHV. In the future, other inputs could be included, from networking to application-level insights.
A VM “Advisor”
ADS is not only used to address active contention; it is also consulted for ideal placement of new VMs to ensure they are not provisioned where placement alone could cause resource contention. The ADS service periodically checks the historical metrics to identify resource contention. If contention is found, the CSP solver is asked for a VM migration solution that would eliminate the problem. In addition to being fed resource constraints as inputs, the solver is also fed any VM or host affinity rules that must be satisfied. The solver finds the optimal solution while keeping costs low. Because any migration of a VM has a resource cost, mostly from the movement of active RAM, the solver will favor moving VMs with smaller RAM allocations over larger ones.
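The check-then-solve loop described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not how ADS is actually built: it considers only CPU, only single-VM moves, uses a made-up 85% hot threshold, and treats RAM size as the entire migration cost. The `VM` class, `plan_migration`, and `allowed_hosts` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    host: str
    cpu_pct: int                            # CPU demand as a percent of one host
    ram_gb: int                             # stand-in for migration cost
    allowed_hosts: frozenset = frozenset()  # empty means no host affinity rule


def plan_migration(vms, hosts, hot_threshold=85):
    """Return [] if nothing is contended, [(vm, target)] for the cheapest fix,
    or None when no single move resolves the contention."""
    load = {h: 0 for h in hosts}
    for vm in vms:
        load[vm.host] += vm.cpu_pct
    hot = {h for h in hosts if load[h] > hot_threshold}
    if not hot:
        return []  # no contention, so no VM is moved

    candidates = []
    for vm in vms:
        if vm.host not in hot:
            continue
        for target in hosts:
            if target == vm.host:
                continue
            if vm.allowed_hosts and target not in vm.allowed_hosts:
                continue  # host affinity rule must be satisfied
            after = dict(load)
            after[vm.host] -= vm.cpu_pct
            after[target] += vm.cpu_pct
            if all(l <= hot_threshold for l in after.values()):
                candidates.append((vm.ram_gb, vm.name, target))

    if not candidates:
        return None  # caller would raise an alert instead of migrating
    _, name, target = min(candidates)  # smallest RAM footprint wins
    return [(name, target)]


hosts = ["host-a", "host-b", "host-c"]
vms = [
    VM("db", "host-a", 40, 64),
    VM("web", "host-a", 30, 8),
    VM("cache", "host-a", 25, 16, frozenset({"host-a"})),
    VM("batch", "host-b", 50, 32),
]
plan = plan_migration(vms, hosts)
print(plan)
```

In this made-up cluster, moving either "db" or "web" off host-a would clear the hot spot, but "web" carries far less RAM, so it is the move the sketch picks; "cache" is never considered because its affinity rule pins it to host-a.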
Once a solution is found, the VMs are migrated and the event is logged. When there is no valid solution, alerts and alarms are raised in Prism so that an administrator can take further action. Prism Pro provides additional reports and analysis for resource reclamation and node expansion that the administrator can use to address any contention that cannot be solved by VM migration alone, but that’s a topic for a different blog post.
Our unique ADS implementation is designed with a single purpose in mind: to make the lives of administrators and application owners easier, not simply by removing complexity, but also by leveraging automation and machine learning to ensure continued, optimal application performance. What other metrics could be fed into the ADS engine as Nutanix expands into software-defined networking (SDN) and public cloud management? We encourage you to continue that discussion on our forum.