Nodes reset/reboot events

August 17, 2016

roberthwl
Adventurer

We have had two instances where a node detected/reported a fault event and reset, causing its VMs to be restarted on each occasion. We can see no obvious reason for this to have happened.

Details

Host 192.168.xx.x4 appears to have failed. High Availability is restarting VMs on hosts throughout the cluster. 08-17-16, 02:01:41am

Host 192.168.xx.x4 appears to have failed. High Availability is restarting VMs on hosts throughout the cluster. 08-11-16, 07:19:48am

We updated AHV and NCC after the first instance last week, but the problem recurred last night.

Is there a potential hardware fault with the host that has not yet been detected or checked for?

Jon
Nutanix Employee
August 17, 2016

Could be some sort of hardware NMI or other issue that's causing this. Support can dig in from a diagnostics and log perspective.

I know you had another thread where I recommended opening a case. Please either piggyback on that one or open a second one to cover this. If Dell sees some sort of hardware issue, they'll do what they do, and if not, they'll pass the case to us to dig into it.

Jon Kohler | Technical Director, Engineering, Nutanix | Nutanix NPX #003, VCDX #116 | @JonKohler | Please Kudos if useful!

swatkins
Nutanix Employee
August 18, 2016

We keep a log of pings between all nodes in a cluster in ~/data/logs/sysstats/ping_hosts.INFO, so that is worth checking.
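
If it helps, here is a rough Python sketch for pulling out the lines of that log that mention one address. The layout of ping_hosts.INFO is not shown in this thread, so the sketch assumes nothing about it beyond it being plain text and just does a substring match. The function name and the sample lines are made up for illustration.

```python
def lines_mentioning(lines, needle):
    """Return (line_number, text) pairs for every line that contains needle."""
    matches = []
    for number, line in enumerate(lines, start=1):
        if needle in line:
            matches.append((number, line.rstrip("\n")))
    return matches


# Hypothetical usage against the log named above (address is a placeholder):
#
#   import os
#   path = os.path.expanduser("~/data/logs/sysstats/ping_hosts.INFO")
#   with open(path, errors="replace") as handle:
#       for number, text in lines_mentioning(handle, "<host address>"):
#           print(number, text)
```

Lines around the two alert times are the ones to read closely.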

We throw the error you saw if the node is inaccessible over the network; this could be a networking failure or a hardware failure.

If you check uptime on all nodes, what does it come back with? If the values are uniform, it was likely a networking interruption.
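
A small Python sketch of that uptime comparison, in case it is useful. It assumes you have already collected each node's uptime in seconds (for example the first field of /proc/uptime on a Linux host) and that the alert time and the time you ran the check are in the same timezone. The host names, the function names, and the 10 minute slack are all placeholders picked for illustration, not anything Nutanix ships.

```python
from datetime import datetime, timedelta

# Timestamp layout of the alerts quoted in this thread, e.g. "08-17-16, 02:01:41am".
ALERT_FORMAT = "%m-%d-%y, %I:%M:%S%p"


def parse_proc_uptime(text):
    """Return uptime in seconds from the contents of /proc/uptime."""
    # /proc/uptime holds two numbers: seconds since boot, then idle seconds.
    return float(text.split()[0])


def hosts_rebooted_since(alert_text, checked_at, uptimes, slack=timedelta(minutes=10)):
    """List hosts whose last boot falls at or after the alert time, minus slack."""
    alert_at = datetime.strptime(alert_text, ALERT_FORMAT)
    rebooted = []
    for host, uptime_seconds in sorted(uptimes.items()):
        # Work backwards from the moment the uptime was sampled.
        booted_at = checked_at - timedelta(seconds=uptime_seconds)
        if booted_at >= alert_at - slack:
            rebooted.append(host)
    return rebooted


# Hypothetical sample: uptimes gathered at 08:00 on the morning of the second alert.
sample_uptimes = {"host-a": 30 * 86400.0, "host-b": 21000.0}
print(hosts_rebooted_since("08-17-16, 02:01:41am", datetime(2016, 8, 17, 8, 0, 0), sample_uptimes))
```

An empty list means no node booted around the alert, which points toward the network. A list containing only the reported host points toward that host actually resetting.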
