In our previous blog, “The Cloud OS Awakens: A New Hope,” we covered why Machine Learning (ML) and Artificial Intelligence (AI) are powering the Nutanix Enterprise Cloud operating system. Let’s dive deeper. As Yoda would say, “Stay for some soup you must.”
Anomaly detection to maintain cluster health
Every scalable architecture should be designed for failure, with the goal of no application downtime and zero risk to the business and its revenue. This is necessary but not sufficient for a highly available system with five or more nines of availability. In addition to handling failed nodes, a truly scale-out application must also handle partial faults, as outlined in the Limplock and Fail-Stutter Fault Tolerance papers.
Partial faults can leave a node that is not dead but is so unhealthy that it slows down the entire cluster. Machine learning algorithms can significantly improve how these behaviors are learned, so that appropriate action is taken in time to maintain cluster health. Nutanix clusters already handle this and are able to maintain high availability. For this particular problem, Nutanix leverages a clustering algorithm called DBSCAN along with Nutanix's distributed set of degraded-node monitors.
In this process, every node calculates a score for each of its peers based on their performance. The DBSCAN algorithm then runs on this data and detects outliers whose scores indicate degradation. Once degraded nodes are flagged, an alert is generated, and leadership roles and critical services are no longer hosted on those nodes. When a node is operationally ready again, it can be added back to the cluster. As a result, this feature helps preserve cluster health and high availability.
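To make the idea concrete, here is a minimal plain-Python sketch of DBSCAN-based outlier detection over peer scores. It is not Nutanix's monitor logic. It assumes each node reports a 0-100 health score for every peer (higher is healthier), that a node's overall score is the median of what its peers report, and that points DBSCAN labels as noise are the degraded nodes. The score scale, the median aggregation, the helper names (`aggregate_scores`, `dbscan_1d`, `degraded_nodes`), and the `eps`/`min_samples` values are all illustrative assumptions.

```python
from statistics import median

def aggregate_scores(peer_scores):
    """Collapse peer_scores[observer][target] into one score per target node.

    Assumption: a node's score is the median of what its peers report,
    so a single misbehaving observer cannot skew the result much.
    """
    nodes = sorted({target for row in peer_scores.values() for target in row})
    return {
        node: median(row[node] for observer, row in peer_scores.items()
                     if observer != node and node in row)
        for node in nodes
    }

def dbscan_1d(values, eps, min_samples):
    """Classic DBSCAN on scalar values; returns one label per value, -1 = noise."""
    labels = [None] * len(values)  # None means "not visited yet"
    cluster = -1

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, v in enumerate(values) if abs(v - values[i]) <= eps]

    for i in range(len(values)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_samples:
                queue.extend(j_seeds)  # j is a core point, so keep growing the cluster
    return labels

def degraded_nodes(peer_scores, eps=5.0, min_samples=3):
    """Nodes whose aggregated score DBSCAN labels as noise (outliers)."""
    scores = aggregate_scores(peer_scores)
    nodes = list(scores)
    labels = dbscan_1d([scores[n] for n in nodes], eps, min_samples)
    return [n for n, label in zip(nodes, labels) if label == -1]

# Hypothetical health scores; node E is "limping" rather than dead.
health = {"A": 92, "B": 90, "C": 94, "D": 91, "E": 40}
peer_scores = {obs: {t: s for t, s in health.items() if t != obs} for obs in health}
print(degraded_nodes(peer_scores))  # ['E']
```

DBSCAN suits this problem because it needs no preset number of clusters: healthy nodes form one dense group and anything far from it falls out as noise. On its own it flags outliers in either direction, so a real monitor would presumably also check that the outlier scores *lower* than the healthy group before demoting a node.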
Optimization & proactive placement
To guarantee performance for all VMs, the cluster has to use adequate resources intelligently at all times. However, with constantly changing environments, numbers of VMs, and workload types, it is very difficult to maintain consistent performance within a cluster. As a result, VM placement problems are common in datacenters, and when VMs are placed badly, the applications running on the cluster experience unpredictable performance.
Resource contention can also occur. To achieve higher VM density, resources are often overcommitted, which can cause contention for resources during peak traffic. Lower and unpredictable performance directly affects the business. This is where the Nutanix AHV hypervisor uses the Acropolis Dynamic Scheduler (ADS). ADS leverages constraint satisfaction problem solvers (CSP solvers), a technique from artificial intelligence (AI), to improve VM placement and scheduling.
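ADS's internals are not described here, so the following is only a generic illustration of what a CSP formulation of VM placement can look like. Each VM is a variable, the hosts form its domain, and the constraints are per-host CPU and memory capacity plus a hypothetical anti-affinity rule (two VMs that must not share a host). The solver is plain backtracking search with simple ordering heuristics; the function name `place_vms`, the data layout, and all numbers are made up for the example.

```python
def place_vms(vms, hosts, anti_affinity=()):
    """Backtracking CSP: assign each VM to a host without exceeding CPU/memory
    capacity and without co-locating anti-affine VMs. Returns {vm: host} or None."""
    # Remaining capacity per host; mutated during search and restored on backtrack.
    free = {h: dict(cap) for h, cap in hosts.items()}
    apart = {frozenset(pair) for pair in anti_affinity}
    # Variable ordering: place the largest VMs first, since they are hardest to fit.
    order = sorted(vms, key=lambda v: (vms[v]["cpu"], vms[v]["mem"]), reverse=True)
    assignment = {}

    def consistent(vm, host):
        need = vms[vm]
        if need["cpu"] > free[host]["cpu"] or need["mem"] > free[host]["mem"]:
            return False
        # Anti-affinity: no already-placed VM on this host may conflict with vm.
        return all(frozenset((vm, other)) not in apart
                   for other, h in assignment.items() if h == host)

    def solve(i):
        if i == len(order):
            return True  # every VM has a host
        vm = order[i]
        # Value ordering: try the host with the most free CPU first to spread load.
        for host in sorted(free, key=lambda h: free[h]["cpu"], reverse=True):
            if consistent(vm, host):
                assignment[vm] = host
                free[host]["cpu"] -= vms[vm]["cpu"]
                free[host]["mem"] -= vms[vm]["mem"]
                if solve(i + 1):
                    return True
                # Dead end further down: undo this choice and try the next host.
                del assignment[vm]
                free[host]["cpu"] += vms[vm]["cpu"]
                free[host]["mem"] += vms[vm]["mem"]
        return False

    return dict(assignment) if solve(0) else None

hosts = {"node1": {"cpu": 16, "mem": 64}, "node2": {"cpu": 16, "mem": 64}}
vms = {
    "db1": {"cpu": 8, "mem": 32},
    "db2": {"cpu": 8, "mem": 32},
    "web": {"cpu": 4, "mem": 8},
    "cache": {"cpu": 4, "mem": 16},
}
print(place_vms(vms, hosts, anti_affinity=[("db1", "db2")]))
```

Production schedulers add many more constraints (affinity, licensing, storage locality) and smarter propagation, but the shape is the same: declare what a valid placement is and let the solver search for one, rather than hand-coding placement rules.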
VM behavior learning
In a cluster with multiple VMs, many different applications are running. They all consume resources and can display different consumption characteristics. Some VMs are highly active during the day but idle at other times. To use the available resources efficiently, you need to understand these behavioral patterns. Manually tracking and learning these behaviors in a large, constantly changing environment is a very cumbersome job. Who really wants to spend their time doing that?
There are bigger business problems that need to be addressed. In a big deployment, VMs get created and are sometimes forgotten. You can probably think of a scenario where a VM was created for a user and never used. This is where Nutanix’s X-FIT engine comes to the rescue. Within Prism Central, the X-FIT engine uses time series analysis algorithms to identify these patterns.
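The post does not say which time series methods X-FIT applies here, so this is a generic, hedged sketch of two patterns mentioned above. Given a week of hourly CPU utilization for a VM, it labels the VM "idle" (a candidate forgotten VM) if usage never exceeds a small threshold, and "daily-cycle" if the sample autocorrelation at a 24-hour lag is high. The function names, labels, and thresholds (`idle_pct`, `daily_threshold`) are hypothetical.

```python
from statistics import mean

def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at `lag` (standard biased estimator)."""
    mu = mean(series)
    denom = sum((x - mu) ** 2 for x in series)
    if denom == 0:
        return 0.0  # a constant series has no cycle to detect
    num = sum((series[t] - mu) * (series[t + lag] - mu)
              for t in range(len(series) - lag))
    return num / denom

def classify_vm(cpu_hourly, idle_pct=2.0, daily_threshold=0.5):
    """Label a VM from hourly CPU% samples; thresholds are hypothetical."""
    if max(cpu_hourly) < idle_pct:
        return "idle"          # never meaningfully busy: possibly a forgotten VM
    if autocorrelation(cpu_hourly, 24) >= daily_threshold:
        return "daily-cycle"   # usage repeats on a 24-hour rhythm
    return "irregular"

# One week (168 hours) of made-up utilization for three VMs.
office_app = [60.0 if 9 <= h % 24 < 17 else 5.0 for h in range(168)]
forgotten = [0.5] * 168
batch_job = [80.0 if h % 7 == 0 else 10.0 for h in range(168)]
for name, series in [("office_app", office_app), ("forgotten", forgotten), ("batch_job", batch_job)]:
    print(name, classify_vm(series))
# office_app daily-cycle
# forgotten idle
# batch_job irregular
```

A VM busy only during office hours repeats exactly every 24 hours, so its lag-24 autocorrelation is high (6/7 for a perfectly periodic week), which is the kind of signal that lets a scheduler plan around predictable peaks.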
Smart planning & what-if analysis
Guesswork and spreadsheets! How many different management consoles must be monitored before making a critical, expensive decision for your datacenter? How can you avoid overprovisioning and its costs? The X-FIT engine not only helps with VM behavioral analysis but also provides accurate forecasting. Many of our customers love the one-click upgrades and one-click operational insights in Prism Central, and because of X-FIT, they are also enjoying the one-click planning option. The X-FIT engine comprises a set of algorithms such as ARIMA, Theta, and neural network models. It runs a tournament to choose the algorithms that best describe the data, and once the tournament winners are chosen, their forecasts are combined.
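X-FIT's real candidates (ARIMA, Theta, neural models) are far richer than anything that fits here, so the sketch below only illustrates the tournament-and-combine pattern, using four simple stand-in forecasters: naive, historical mean, drift, and seasonal naive. Each candidate is scored by mean absolute error on a held-out tail of the series, the best `winners` are refit on the full history, and their forecasts are averaged with equal weights. The function names, holdout size, and equal-weight averaging are assumptions, not X-FIT's documented behavior.

```python
from statistics import mean

def naive(history, h):
    return [history[-1]] * h  # repeat the last observation

def historical_mean(history, h):
    return [mean(history)] * h

def drift(history, h):
    # Extend the straight line from the first to the last observation.
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (i + 1) for i in range(h)]

def seasonal_naive(history, h, period=7):
    # Repeat the last full season (here: the last week of daily values).
    return [history[len(history) - period + (i % period)] for i in range(h)]

CANDIDATES = {"naive": naive, "mean": historical_mean,
              "drift": drift, "seasonal_naive": seasonal_naive}

def tournament_forecast(series, horizon, holdout=7, winners=2):
    train, test = series[:-holdout], series[-holdout:]
    # Round 1: score every candidate on data it has not seen.
    errors = {name: mean(abs(p - a) for p, a in zip(f(train, holdout), test))
              for name, f in CANDIDATES.items()}
    best = sorted(errors, key=errors.get)[:winners]
    # Final: refit the winners on the full series and average point by point.
    forecasts = [CANDIDATES[name](series, horizon) for name in best]
    combined = [mean(values) for values in zip(*forecasts)]
    return best, combined

# Four weeks of made-up daily usage growing steadily by 2 units per day.
usage = [100 + 2 * t for t in range(28)]
print(tournament_forecast(usage, horizon=3))  # (['drift', 'naive'], [155.0, 156.0, 157.0])
```

Combining several winners rather than trusting a single model is a common way to hedge against any one method's blind spots; here the steady trend makes drift the clear winner, and averaging it with the runner-up pulls the forecast toward a more conservative estimate.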
These forecasts help customers estimate their true resource needs, empowering them to size hardware optimally for specific workloads. Customers using one-click planning can easily see when they will run out of capacity. This is where the power of planning comes into its own. Using what-if analysis in Prism Central, you can specify the workloads to be added, and the system will generate a resource recommendation. Leveraging technologies such as X-FIT helps eliminate inefficiencies in the datacenter and can significantly reduce costs. Say goodbye to stressful, costly, and hectic IT refresh cycles and say hello to Nutanix.
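Once a forecast exists, answering "when do we run out?" and "what if we add this workload?" is mostly arithmetic. Here is a minimal sketch, assuming a daily storage-usage forecast in TiB, a fixed cluster capacity, a new workload modeled as a constant extra demand, and uniform per-node capacity. These are simplifications chosen for illustration; `runway_days` and `what_if` are hypothetical helpers, not how Prism Central computes its recommendations.

```python
import math

def runway_days(forecast, capacity):
    """Days from now until the forecast first exceeds capacity, or None if it never does."""
    for day, usage in enumerate(forecast, start=1):
        if usage > capacity:
            return day
    return None

def what_if(forecast, capacity, added_demand, node_capacity):
    """Model a new workload as constant extra demand; return (runway, extra nodes needed)."""
    adjusted = [usage + added_demand for usage in forecast]
    shortfall = max(0.0, max(adjusted) - capacity)
    return runway_days(adjusted, capacity), math.ceil(shortfall / node_capacity)

# Made-up storage forecast for the next 180 days: 60 TiB today, growing 0.5 TiB/day.
forecast = [60 + 0.5 * day for day in range(1, 181)]
print(runway_days(forecast, capacity=100))  # 81
print(what_if(forecast, capacity=100, added_demand=15, node_capacity=20))  # (51, 4)
```

Adding a 15 TiB workload pulls the capacity cliff forward from day 81 to day 51, and covering the peak of the 180-day horizon needs four more 20 TiB nodes. The sketch ignores data-protection overhead such as replication, which a real sizing tool would have to include.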
Disclaimer: This blog may contain links to external websites that are not part of Nutanix.com. Nutanix does not control these sites and disclaims all responsibility for the content or accuracy of any external site. Our decision to link to an external site should not be considered an endorsement of any content on such site.
© 2017 Nutanix, Inc. All rights reserved. Nutanix, the Enterprise Cloud Platform, and the Nutanix logo are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. All other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s).