Question

100% CPU Usage on all my hosts

  • 30 April 2019
  • 4 replies
  • 10416 views

In the last few weeks, CPU usage has been hitting 100% on all of our hosts (6 of them). We have never been this high before. When the hosts are at this level, no one can do much on the system for a few minutes.
The biggest change we made was adding Bitdefender Gravity Zone to our system.
We have about 650 VMs on the system.
Memory usage hovers between 50-60% on all the hosts.
We are running the ESXi hypervisor with VMware Horizon for our VDI system.

Does anyone have any insight into this situation? Is there any tweaking I need to do? What could be causing this?

4 replies

I'm sorry, but I can't help. My recommendation, if you haven't already done so, is to open an SR with our support so they can dig in.
You can open an SR here:
https://portal.nutanix.com
or call one of the numbers listed at https://www.nutanix.com/support-services/product-support/support-phone-numbers/
Is it only on the hosts, or also on some or all of your VMs? I assume you're running a VDI workload for your VM count to be that high on a 6-node cluster.

https://www.bitdefender.com/support/what-is-the-vsserv-exe-process-1116.html

Hi Warana.

Please check my article regarding antivirus on AHV. It can provide you with some additional information.

https://next.nutanix.com/installation-configuration-23/antivirus-on-nutanix-ahv-hosts-and-on-cvms-37041

Regards,

Antonio


Hi @Warana 
I am not very familiar with BitDefender, but I think the above posts should be helpful. In case you are still stuck, I thought I would offer a bit more advice.
The analysis page in Prism can be quite helpful for reviewing performance concerns like this. You can chart multiple views on a synchronized timeline, and this can help to explain how a top-level issue, such as 100% host CPU, relates to traceable stats such as per-VM CPU utilization, per-VM or per-CVM disk IO, disk latency, and more.
The features available are explained in the Prism Web Console Guide.

Many of the performance cases raised at Nutanix can be resolved through the use of these analysis tools. Once you can see in detail which contributing factors correlate with your CPU usage spike, it generally becomes a lot easier to pin down the cause of the issue and start working on a solution.
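
As a rough illustration of the correlation idea above, here is a minimal Python sketch. It is not a Prism feature or API. It assumes you have already exported the host CPU series and the per-VM CPU series as plain lists sampled at the same timestamps. The VM names and all the numbers are made up for the example, and `pearson` and `rank_by_correlation` are hypothetical helper names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length, at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(host_cpu, per_vm_cpu):
    """Return (vm_name, score) pairs, most strongly correlated with host CPU first."""
    scores = {name: pearson(host_cpu, series) for name, series in per_vm_cpu.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample data: host CPU % and per-VM CPU % at the same six timestamps.
host_cpu = [40, 45, 100, 100, 50, 42]
per_vm_cpu = {
    "vdi-desktop-01": [10, 12, 95, 97, 15, 11],
    "vdi-desktop-02": [30, 31, 29, 30, 31, 30],
    "file-server": [50, 20, 10, 15, 45, 55],
}

ranked = rank_by_correlation(host_cpu, per_vm_cpu)
for name, score in ranked:
    print(f"{name}: {score:+.2f}")
```

VMs at the top of the ranking are the ones whose CPU usage moves together with the host spike, so they are the first candidates to inspect. Correlation alone does not prove a VM is the cause, so treat this only as a way to narrow down where to look.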
