We are spinning up some new ESXi 6 clusters and have run into strange behavior on one of them.
The CVMs periodically contact the host machines to gather various stats. All logging indicates that this process is working properly, but at intervals ranging from a few hours to a few days, a host loses its connection to vCenter and stops responding to all remote management requests (SSH, etc.).
It appears that the VMware hostd service simply runs out of resources and stops responding. The only remedy we have found so far is rebooting the host completely. In some instances the host console was still responsive, but attempts to restart the management services from there were unsuccessful.
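To pin down exactly when a host stops answering, something like the following could be run from an external machine on a schedule. This is only a minimal sketch, not part of any Nutanix or VMware tooling: the function names are made up for illustration, and the port numbers (22 for SSH, 443 for the HTTPS management endpoint) are assumptions about a default setup. A successful TCP connect only shows that the port accepts connections, not that hostd itself is healthy.

```python
import socket
import time

# Assumed ports: 22 for SSH, 443 for the HTTPS management endpoint.
DEFAULT_PORTS = {"ssh": 22, "https": 443}


def probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_host(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Probe each named port once and return {name: reachable}."""
    return {name: probe(host, port, timeout) for name, port in ports.items()}


def format_result(host, result, now=None):
    """Build one timestamped log line from a check_host() result."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    states = " ".join(
        f"{name}={'up' if ok else 'DOWN'}" for name, ok in sorted(result.items())
    )
    return f"{stamp} {host} {states}"


# Example (hypothetical hostname):
#   print(format_result("esxi-host-01", check_host("esxi-host-01")))
```

Comparing the timestamp of the first DOWN line against the CVM and vCenter logs might help narrow down what was happening on the host just before it dropped off.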
We have verified that all passwords are consistent and have regenerated the certificates used to secure hypervisor-to-CVM communication.
Has anyone seen anything similar?