Memory Utilization Widget - Bad Math

  • 16 September 2016
  • 4 replies

On the cluster home page, the memory usage box shows that we’re using 51.71% of the Cluster Memory (2.21TiB). The only things running on this cluster right now are the CVMs, which have just been bumped to 24GB each for dedup.

In actuality, when we drill down, we’re using 5.79% on each of the ESX hosts (512GB each) and around 77.95% on the storage-only nodes (32GB), which are good numbers.

Doing the math, it appears that someone got lazy with the calculation on the home page widget and averaged the percentages across all nodes instead of dividing actual memory used by actual memory capacity, probably anticipating that every node would have an identical build and the same memory capacity. That’s why it comes out to over 50% utilized for us. It’s really more like 6% utilized, since the AHV storage-only nodes shouldn’t count toward the total available for use by VMs other than CVMs.
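To make the difference concrete, here is a small sketch of the two calculations. The node counts below are made up (not our actual cluster layout), but the per-node percentages and sizes match the figures above:

```python
# Illustrates why averaging per-node percentages misstates cluster memory use.
# Node counts here are hypothetical; per-node figures are from the post.
nodes = [
    # (capacity_gib, used_pct)
    (512, 5.79),   # ESX host
    (512, 5.79),
    (512, 5.79),
    (512, 5.79),
    (32, 77.95),   # storage-only node (runs just a CVM)
    (32, 77.95),
    (32, 77.95),
]

# Naive: average the percentages, ignoring node capacity
# (what the widget appears to be doing).
naive_pct = sum(pct for _, pct in nodes) / len(nodes)

# Capacity-weighted: total used memory divided by total capacity.
used_gib = sum(cap * pct / 100 for cap, pct in nodes)
total_gib = sum(cap for cap, _ in nodes)
weighted_pct = 100 * used_gib / total_gib

print(f"naive average:     {naive_pct:.2f}%")
print(f"capacity-weighted: {weighted_pct:.2f}%")
```

With a mostly-idle set of big hosts and a few small, busy storage nodes, the naive average lands in the 30–50% range while the capacity-weighted figure stays in single digits.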

It would be helpful for the dashboard widget to show the true capacity utilized.



Good point. I pinged our internal PM team to see if we already have a fix/ticket in for this, and if not, we'll file one.

I'd also suggest you submit a low-priority support ticket so we can formally track this as an issue. It also helps us track customer demand for things like this.
I submitted an internal engineering ticket, ENG-64763, for this issue.

If you can, please submit a support ticket and have them associate your ticket to our internal ticket.

That helps us track customer demand against this defect so we can prioritize it appropriately. The more customer tickets get tagged against a defect, the faster it gets fixed.
Just doing a bit of capacity planning. What is the formula used for the memory utilization % for a cluster on that widget? Unless my calculations are wrong, it doesn't appear to be a straight percentage of allocated memory for powered-on VMs (including CVMs) against total physical memory. Does it account for other factors, such as Transparent Page Sharing (TPS) in ESX, for example?

This must have never been fixed.

We had a cluster that, according to the dashboard, was running at 47% capacity. We REMOVED x4 nodes from the cluster and our capacity actually INCREASED, with the dashboard saying we were now at only 37% capacity. We removed 758GB of RAM from the cluster and our capacity went up :joy:

The nodes we removed had less memory than the other nodes in the cluster, which would have caused a disparity and thrown off the lazy averaging calculation.
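The effect is easy to reproduce with a toy model. The node sizes and percentages below are invented, and it assumes the workload stays constant and simply migrates onto the remaining nodes, but it shows how removing small, heavily-used nodes can make a percentage-averaging widget report *lower* utilization even as real utilization rises:

```python
# Hypothetical reconstruction of the node-removal effect: dropping small,
# high-percentage nodes lowers a naive average of per-node percentages,
# while real (capacity-weighted) utilization of what's left goes up.
big_cap, small_cap = 768, 190     # GiB per node (made-up sizes)
big_pct, small_pct = 40.0, 90.0   # per-node used % (made-up figures)

before = [(big_cap, big_pct)] * 8 + [(small_cap, small_pct)] * 4

used_gib = sum(c * p / 100 for c, p in before)          # GiB actually in use
naive_before = sum(p for _, p in before) / len(before)  # widget-style average
weighted_before = 100 * used_gib / sum(c for c, _ in before)

# Remove the four small nodes; the same workload now runs on 8 big nodes,
# so every remaining node sits at the same (higher) percentage.
remaining_cap = 8 * big_cap
naive_after = 100 * used_gib / remaining_cap
weighted_after = naive_after  # identical nodes: the two formulas agree

print(f"naive:    {naive_before:.1f}% -> {naive_after:.1f}%")     # drops
print(f"weighted: {weighted_before:.1f}% -> {weighted_after:.1f}%")  # rises
```

The naive figure falls even though total capacity shrank, which matches the 47% → 37% behavior above.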

I know now not to trust that figure.