No Alerts, Big Problems: A Story of Expectations with Prism Central and Backups

  • 1 February 2018
This post was authored by Dwayne Lessner, Sr Technical Marketing Engineer at Nutanix

Over time I have played with a lot of different backup software. I wouldn't put backup software in the sexy pile of career choices, but it can keep you out of jail when you're able to restore your VP's budget spreadsheet. To that end, is there any way to figure out that your backup job is not going to work before it's too late? If you're backing up hundreds of TBs daily, you probably have a tight timeline to get everything completed before that job wreaks havoc on the rest of your environment. Coming in the next morning to see a failed backup is pretty depressing, and it only gets worse if it happens two or three days in a row. Part of the problem is that some jobs get stuck or become very slow but never actually fail. Without the job failing, the backup software may never alert.

With the release of AOS 5.5, we also released Prism Central (PC) 5.5. PC 5.5 now includes machine-learning capabilities that analyze resource usage over time and provide tools to monitor resource consumption, identify abnormal behavior, and guide resource planning. The added anomaly detection records when performance or resource usage is outside an expected range based on learned VM baseline behavior. The anomaly detection module measures usage every five minutes and compares that usage with the predicted values. If the observed value is outside the predicted band, it flags that value as an anomaly. Each anomaly is recorded as an event inside PC.
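To make the band comparison concrete, here is a minimal sketch of the idea in Python. It is not PC's implementation: the `Band` type, the `find_anomalies` function, and the sample numbers are all hypothetical, and the only thing taken from the description above is the rule that an observed value outside its predicted band counts as an anomaly.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Band:
    """Hypothetical predicted range for one five-minute sample."""
    lower: float
    upper: float


def find_anomalies(samples: List[float], bands: List[Band]) -> List[Tuple[int, float]]:
    """Return (index, value) for every sample that falls outside its band."""
    return [
        (i, value)
        for i, (value, band) in enumerate(zip(samples, bands))
        if value < band.lower or value > band.upper
    ]


# Made-up CPU usage percentages, one per five-minute interval.
cpu_samples = [50.0, 52.0, 5.0, 95.0]
predicted = [Band(40.0, 60.0)] * len(cpu_samples)
anomalies = find_anomalies(cpu_samples, predicted)
```

Note that the check is two-sided: a value that is unusually low is flagged just like one that is unusually high, which is the property that matters for a stalled backup.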

This new machine-learning behavior can be used to find the high and low ranges. In the case of backup software like Commvault or NetBackup, we can trend the backup proxies to determine whether the backup job is running as expected. If a backup job gets stuck, resource usage should be low and we can alert on it. Likewise, if we run a full backup once a week, PC takes that trend into account and does not alert on the difference from the days when only incrementals run. It is also a great way to get insight if the job suddenly takes an extra hour. Maybe some additional storage is being backed up that you need to account for.
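As a rough illustration of the "stuck job" case, the sketch below looks for a sustained run of samples under the lower edge of the expected range on a backup proxy. The function name `stalled_since`, the `min_consecutive` parameter, and the numbers are assumptions made for this example; PC raises its own anomaly events and does not expose this function.

```python
from typing import List, Optional


def stalled_since(
    samples: List[float],
    lower_bounds: List[float],
    min_consecutive: int = 3,
) -> Optional[int]:
    """Return the index where the first run of at least `min_consecutive`
    below-band samples starts, or None if there is no such run."""
    run_start = None
    for i, (value, lower) in enumerate(zip(samples, lower_bounds)):
        if value < lower:
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_consecutive:
                return run_start
        else:
            run_start = None  # usage recovered, so reset the run
    return None


# Made-up proxy CPU usage during the backup window, sampled every five minutes.
proxy_cpu = [70.0, 65.0, 4.0, 3.0, 5.0, 60.0]
expected_lower = [40.0] * len(proxy_cpu)
stall_index = stalled_since(proxy_cpu, expected_lower)
```

With five-minute samples, `min_consecutive=3` means usage has been below the expected range for about 15 minutes, which filters out a single quiet sample while still catching a job that has stopped moving data.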

To account for the differences between weekdays and weekends, PC needs to build 3 separate seasonal data points. In total it takes 3 weeks to build the separate data points, but after that anomaly detection can be used for over 27 different metrics for VMs, hosts, and clusters. Data for each metric from the past 3 weeks is recorded and analyzed, a normal behavior band is established, and predictions for the next 7 days are formulated. In the images below, a narrower blue band indicates a very consistent workload. The Commvault server in the image is running a CPU load script to create the result. Workloads with a wider band have a higher variance. A minimum tolerance is statically set to prevent too many false positives, but if the variance in the data is higher, the tolerance becomes a function of that variance. In short, PC gives you an algorithm that adapts to your environment.
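The interplay between the static minimum tolerance and the variance-based tolerance can be sketched as follows. This is only an illustration of the idea, not the algorithm PC uses: the function `behavior_band`, the defaults `min_tolerance=5.0` and `k=2.0`, and the use of mean and standard deviation are all assumptions made for this example.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def behavior_band(
    history: List[float],
    min_tolerance: float = 5.0,  # assumed static floor
    k: float = 2.0,              # assumed multiplier on the spread
) -> Tuple[float, float]:
    """Return a (lower, upper) band around the historical mean.

    The tolerance never drops below `min_tolerance`; once the data is
    noisier than that floor, the tolerance scales with the spread.
    """
    center = mean(history)
    tolerance = max(min_tolerance, k * pstdev(history))
    return center - tolerance, center + tolerance


# A perfectly steady workload gets the minimum tolerance (narrow band).
steady_band = behavior_band([50.0, 50.0, 50.0, 50.0])

# A workload that swings between 40 and 60 gets a wider band.
bursty_band = behavior_band([40.0, 60.0, 40.0, 60.0])
```

The floor keeps a very consistent workload from alerting on tiny fluctuations, while the variance term keeps a naturally bursty workload from alerting on its normal swings.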

The above custom policy can be created from PC. By relying on anomaly detection, you don't have to set static thresholds, which wouldn't sense when the workload is unexpectedly idle.

You could also look at the working set size if you're running a media server on Nutanix. If the working set size changes drastically over a period of time, it would be good to review what is happening.

Datacenters aren't static, and PC is a tool that can help us fight the battle of data and application sprawl while taking into account what is happening in your environment. At the end of the day, you want to know about problems before your customers do.

If you have an interesting use case with PC please leave it in the comments.

© 2018 Nutanix, Inc. All rights reserved. Nutanix and the Nutanix logo are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. All other brand names and logos mentioned herein are for identification purposes only and are the property of their respective holder(s). Nutanix may not be associated with, or sponsored or endorsed by, such holder(s).
