Nodes reset/reboot events

7 years ago
17 August 2016
2 replies

roberthwl
Adventurer
5 replies

We have had two instances where a node reported a fault event and reset, restarting its VMs on each occasion. There seems to be no reason for this to have happened.

Details

Host 192.168.xx.x4 appears to have failed. High Availability is restarting VMs on hosts throughout the cluster. 08-17-16, 02:01:41am

Host 192.168.xx.x4 appears to have failed. High Availability is restarting VMs on hosts throughout the cluster. 08-11-16, 07:19:48am

We updated AHV and NCC after the first instance last week, but the problem recurred last night.

Could the host have a hardware fault that has not yet been detected or checked?

2 replies

Jon
Nutanix Employee
569 replies
7 years ago
17 August 2016

Could be some sort of hardware NMI or other issue that's causing this. Support can dig in from a diagnostics and log perspective.

I know you had another thread where I recommended opening a case. Please either piggyback on that one or open a second one to cover this. If Dell sees some sort of hardware issue, they'll do what they do, and if not, they'll pass the case to us to dig into it.

swatkins
Nutanix Employee
5 replies
7 years ago
18 August 2016

We keep a log of pings between all nodes in a cluster in ~/data/logs/sysstats/ping_hosts.INFO, so that is a good place to check.

We throw the error you saw if the node is inaccessible over the network. This could be caused by a networking failure or a hardware failure.

If you check uptime on all nodes, what does it come back with? If the values are uniform, it was likely a networking interruption.
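
For what it's worth, the uptime comparison can be sketched in a few lines of Python. This is only a rough illustration, not a Nutanix tool: the function names, the host names and the 300-second tolerance are all made up for the example, and it assumes you have already collected the contents of /proc/uptime from each node yourself (for example over SSH). A node whose uptime sits well below the cluster median probably did reboot; if every node reports roughly the same uptime, a network interruption is the more likely explanation.

```python
from statistics import median


def parse_proc_uptime(text: str) -> float:
    """Return seconds since boot from the contents of /proc/uptime.

    On Linux the file holds two numbers; the first is the uptime in seconds.
    """
    return float(text.split()[0])


def find_rebooted_hosts(uptimes: dict[str, float], tolerance_s: float = 300.0) -> list[str]:
    """Return hosts whose uptime is more than tolerance_s below the median.

    Hypothetical helper: the tolerance is an arbitrary illustrative value.
    """
    if not uptimes:
        return []
    mid = median(uptimes.values())
    return sorted(host for host, up in uptimes.items() if mid - up > tolerance_s)


if __name__ == "__main__":
    # Made-up sample values; replace with the uptimes gathered from your nodes.
    sample = {
        "node-a": parse_proc_uptime("864000.00 3400000.00"),
        "node-b": parse_proc_uptime("864120.00 3400500.00"),
        "node-c": parse_proc_uptime("863950.00 3399000.00"),
        "node-d": parse_proc_uptime("3600.52 7100.10"),
    }
    rebooted = find_rebooted_hosts(sample)
    if rebooted:
        print("Uptime is not uniform; these hosts restarted recently:", rebooted)
    else:
        print("Uptime is uniform; a network interruption is more likely.")
```

Comparing against the median rather than the maximum keeps a single long-running node from making every other node look like an outlier.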
