Strange CVM / Hypervisor interaction

  • 27 May 2016
  • 3 replies

We are spinning up some new ESXi 6 clusters and have encountered some strange behavior on one of them.

The CVMs periodically contact the host machines to gather various stats. All logging indicates that this process is working properly, but at intervals ranging from a few hours to a few days, a host loses its connection to vCenter and stops responding to all remote management requests (SSH, etc.).
It appears that the VMware hostd service simply runs out of resources and stops responding. The only remedy we have found so far is rebooting the host completely. In some instances the host console was still responsive, but attempts to restart the management services from it were unsuccessful.

We have verified that all passwords match and have regenerated the certificates used to secure hypervisor-to-CVM communication.

Has anyone seen anything similar?


Qlashley, if you haven't, please open a support case and DM me the case number. Cheers, Art

Art, I sent you the relevant data a few days ago. Have you discovered anything interesting?
One thing we noticed that alleviates the symptoms is removing the hosts themselves from our Windows domain (the ESXi hosts, not vCenter).
Any further thoughts with this new info?

We've been working with Dell and VMware on a similar case, and have come to the same conclusion.

The ESXi AD (Likewise) services are triggering an ESXi vmkernel memory issue, which causes hostd to become unresponsive, SSH to fail, etc.

If these services are still running, they should be stopped until VMware patches the issue.

To check whether these services are running, run the following from a CVM:
ssh root@ '/etc/init.d/lwiod status; /etc/init.d/lsassd status; /etc/init.d/netlogond status'

To stop these services on all ESXi hosts in the Nutanix cluster, run the following command:
allssh "ssh root@ '/etc/init.d/lwiod stop; /etc/init.d/lsassd stop; /etc/init.d/netlogond stop'"
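
If it helps, the two commands above can be wrapped in a small shell helper so the same three services are always checked or stopped together. This is only a sketch under assumptions: the function names `likewise_cmd` and `run_on_host` are made up for illustration, and the ESXi host address has to be passed in by you, since it is not shown in the commands above.

```shell
# Hypothetical helper: build the init-script command line for the three
# Likewise services named in this thread. $1 is the action, e.g. "status" or "stop".
likewise_cmd() {
    action="$1"
    cmd=""
    for svc in lwiod lsassd netlogond; do
        # Join the per-service commands with "; " as in the commands above.
        cmd="${cmd:+$cmd; }/etc/init.d/$svc $action"
    done
    printf '%s\n' "$cmd"
}

# Hypothetical wrapper: run the action on one ESXi host over SSH from a CVM.
# $1 is the host address (an assumption, supply your own), $2 is the action.
run_on_host() {
    ssh "root@$1" "$(likewise_cmd "$2")"
}
```

For example, `likewise_cmd status` prints the command string only, so you can review it before running `run_on_host` with your own host address and `status` or `stop`.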