Nutanix Prism radically simplifies a broad range of IT operations using powerful machine learning and automated decision making. At the heart of this machine learning is a patent-pending technology developed at Nutanix called X-FIT.
Instead of committing to one particular model, X-FIT uses an ensemble of models and runs a contest to dynamically pick the best model for the application. X-FIT is the engine behind Nutanix’s machine learning technology. This blog discusses the Predictive Capacity Analysis capability introduced in Prism, which harnesses X-FIT-powered machine learning to drive automated decision making.
Predictive Capacity Analysis enables enterprises to understand how different applications are using the underlying infrastructure resources and provides more accurate predictions about when they will run out of resources. It also recommends ways to optimize capacity and performance.
This capability, along with the one-click expansion of resources, is crucial to delivering just-in-time provisioning of resources - a capability that is core to an enterprise cloud platform.
This ability enables IT teams to:
► View detailed capacity trends - including storage, compute and memory
► Predict the capacity runway - accurately forecast when applications will run out of capacity
► Get optimization recommendations - determine steps that can be taken to optimize existing capacity, such as isolating and deleting zombie VMs, reallocating unused resources from running VMs, and removing old snapshots, just to name a few
► Precisely understand what resources should be added to the cluster to meet the needs of the application (aka an accurate Bill of Materials (BOM)!)
The purpose of this post is not to talk about what we do, but how we do it. Capacity planning and “predictive” analysis is not new to the industry. Several vendors have done this in the past with varying levels of success. So what is different with Nutanix?
If AWS consumed storage, compute and virtualization from the different traditional vendors that most enterprises buy from today, would they be able to predict and scale their infrastructure as seamlessly as they are doing today?
When the same software manages the entire stack (compute, storage and virtualization), the visibility and the machine intelligence gained from each layer is much more powerful and complete than from building management packs for multiple disparate solutions.
Because Nutanix has converged the most important pieces of infrastructure in the datacenter, information - not just raw data - gets passed from one layer to another. As a result, the predictions have much higher fidelity.
While existing solutions keep track of historical usage and make predictions based on that information, they are not always application aware. Workload characterization and predictability are key to building a management plane for both enterprise applications and modern cloud-native applications running either in the cloud or in large-scale data centers. These workloads exhibit seasonal patterns and trends that can be used to provision capacity as well as place workloads optimally.
One Size Does Not Fit All
Existing capacity planning solutions use algorithms such as STL or ARIMA or Theta to come up with a prediction. However, these algorithms do not adequately address the requirements and challenges that arise in applications, namely:
► For dynamic applications, no one algorithm may be best suited for all environments; for example, the Theta method, which was the best method overall in the M3 competition, does not handle seasonal effects.
► Scale-out applications need algorithms that can scale to several thousand time series with millions of data points each.
The Solution: X-FIT
Nutanix has built a new technology called X-FIT, for forecasting time series data using an ensemble of models as opposed to sticking to a single model for understanding application data patterns.
X-FIT addresses the aforementioned challenges as follows:
First, instead of using a single algorithm per application, X-FIT uses an ensemble of models and runs a tournament to select the best model for each environment and historical time range.
Second, to scale to millions of data points, X-FIT efficiently runs a distributed tournament for finding the best set of models for a given time series and optimally combines the forecasts from the best models.
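To make the tournament idea concrete, here is a minimal sketch in plain Python. The candidate models (mean, drift, seasonal naive), the sMAPE error metric, and the single hold-out evaluation are simplifying assumptions for illustration - X-FIT's actual ensemble and distributed scoring are more sophisticated.

```python
# Miniature model "tournament": hold out the tail of the series,
# forecast it with each candidate model, and keep the most accurate one.

def mean_model(history, horizon):
    # Forecast the historical mean for every future step.
    m = sum(history) / len(history)
    return [m] * horizon

def drift_model(history, horizon):
    # Random walk with drift: extend the average per-step change.
    slope = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + slope * (i + 1) for i in range(horizon)]

def seasonal_naive_model(history, horizon, period=4):
    # Repeat the last observed season.
    return [history[-period + (i % period)] for i in range(horizon)]

def smape(actual, forecast):
    # Symmetric mean absolute percentage error (illustrative metric).
    return sum(2 * abs(f - a) / (abs(a) + abs(f) or 1)
               for a, f in zip(actual, forecast)) / len(actual)

def run_tournament(series, horizon, models):
    # Hold out the last `horizon` points as an out-of-sample test set.
    train, test = series[:-horizon], series[-horizon:]
    scores = {name: fn and smape(test, fn(train, horizon))
              for name, fn in models.items()}
    best = min(scores, key=scores.get)
    return best, scores

models = {
    "mean": mean_model,
    "drift": drift_model,
    "seasonal_naive": seasonal_naive_model,
}

# A toy series with a pure upward trend: the drift model should win.
series = [float(i) for i in range(1, 25)]
best, scores = run_tournament(series, horizon=4, models=models)
print(best)  # drift
```

The same contest re-run on a strongly seasonal series would crown the seasonal naive model instead, which is the point: the winner depends on the environment and the historical window, not on a fixed choice.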
► Pre-processing of data: X-FIT fills in any missing values in the input time series using interpolation. Further, if the variance of the data is not constant over time, i.e. it exhibits heteroscedasticity, X-FIT stabilizes it by applying the Box-Cox transform.
► Initializing the ensemble of models: The ensemble is then initialized either by using the default ensemble of models containing ARIMA, STL, Theta, ETS, TBATS, Neural Network, Random Walk, Seasonal Naive, Mean, and Linear Regression with Seasonal Components, or by finding the best models for similar time series using a pre-built index.
► Pruning models based on structural characteristics of the time series: Certain models, including ARIMA, have a “short memory” and will likely overfit any recent change in trend. X-FIT uses a change-point detection algorithm; if it detects a recent change in the trend, it drops these models from the ensemble.
► Evaluating the accuracy of the models: This is done in a distributed manner, as follows: First, each mapper evaluates the out-of-sample accuracy of a different model and emits the accuracy results to a single reducer. The reducer then chooses the top performing models during the tournament.
► Forecasting: The final forecast is produced by combining the individual forecasts from the winning models.
► Reconciliation: X-FIT reconciles forecasts while rolling up the forecasts; e.g. Node level CPU forecasts need to be aggregated at a cluster level and reconciled with the cluster level forecast. It uses weighted linear regression to achieve this reconciliation.
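The pre-processing step above can be sketched in a few lines. Linear interpolation for gap filling and a log transform (the Box-Cox transform with lambda = 0) are common, simple choices; the function names and the specific interpolation scheme here are illustrative assumptions, not Nutanix's actual implementation.

```python
import math

def interpolate_missing(series):
    """Fill None gaps by linear interpolation between the nearest
    known neighbours (assumes the endpoints are observed)."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for i, v in enumerate(out):
        if v is None:
            left = max(k for k in known if k < i)
            right = min(k for k in known if k > i)
            frac = (i - left) / (right - left)
            out[i] = out[left] + frac * (out[right] - out[left])
    return out

def boxcox(series, lam=0.0):
    """Box-Cox transform; lambda = 0 is the natural log, which turns
    multiplicative (variance-growing) behaviour into additive."""
    if lam == 0.0:
        return [math.log(v) for v in series]
    return [(v ** lam - 1) / lam for v in series]

raw = [1.0, None, 3.0, 4.0, None, None, 16.0]
filled = interpolate_missing(raw)
print([round(v, 6) for v in filled])  # [1.0, 2.0, 3.0, 4.0, 8.0, 12.0, 16.0]
stabilised = boxcox(filled)
```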
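The reconciliation step can also be illustrated with a small sketch. The blog states that X-FIT uses weighted linear regression; the closed-form weighted least-squares adjustment below is our assumption about one simple way such a reconciliation can work - each node forecast is shifted, in inverse proportion to its weight, until the node forecasts sum to the cluster forecast.

```python
def reconcile(node_forecasts, cluster_forecast, weights):
    """Minimise sum_i w_i * (x_i - f_i)^2 subject to sum_i x_i = F.
    Higher-weight (more trusted) forecasts absorb less of the gap."""
    gap = cluster_forecast - sum(node_forecasts)
    inv = [1.0 / w for w in weights]
    total_inv = sum(inv)
    return [f + gap * iv / total_inv for f, iv in zip(node_forecasts, inv)]

# Three node CPU forecasts summing to 90 vs a cluster forecast of 96:
# the 6-unit gap is split 1.2 / 2.4 / 2.4 according to the weights.
nodes = [30.0, 40.0, 20.0]
adjusted = reconcile(nodes, 96.0, weights=[2.0, 1.0, 1.0])
print([round(x, 2) for x in adjusted])  # [31.2, 42.4, 22.4]
```

After reconciliation the node-level and cluster-level views tell a consistent story, which is what lets Prism roll per-node runways up into a trustworthy cluster runway.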
Nutanix evaluated X-FIT using two applications. The first was forecasting workload patterns and resource usage within data centers, where X-FIT did better than Theta by more than 20%. The second was the M3 competition data set, where X-FIT outperformed Theta in forecasting all the data sets by more than 10%.
X-FIT, the engine behind Nutanix’s machine learning technology, is a highly scalable, dynamic ensemble of models for time series forecasting. In this blog, we described how X-FIT powers the capacity planning capability in Prism.
We have shown its applicability for forecasting metrics with different characteristics from different industries in the M3 competition data set, and for the capacity demand of web-scale applications running within mega data centers.
The results obtained from applying X-FIT to these data sets are promising, and show that X-FIT can predict workloads and infrastructure needs in real-time with high fidelity, much better than the existing models.
X-FIT can be an enabler in building a management fabric for invisible infrastructure by autonomously optimizing data center performance and by intelligently managing application resource demands using operational data. X-FIT can thereby enable optimal workload placement and consolidation at a fine-grained level.
Continue the conversation on our forums and follow Nutanix on Twitter for the latest news and announcements.
This post was authored by Abhinay Nagpal, Staff Engineer and Shubhika Taneja, Product Marketing Manager