Nodes reset/reboot events

  • 17 August 2016
  • 2 replies
  • 1708 views

We have had two instances where a node reported a fault event and reset, rebooting its VMs on each occasion. There seems to be no reason for this to have happened.

Details

Host 192.168.xx.x4 appears to have failed. High Availability is restarting VMs on hosts throughout the cluster. 08-17-16, 02:01:41am


Host 192.168.xx.x4 appears to have failed. High Availability is restarting VMs on hosts throughout the cluster. 08-11-16, 07:19:48am

We updated AHV and NCC after the first instance last week, and have since had a repeat last night.

Is there a potential hardware fault with the host that has not yet been detected or checked?

2 replies

Userlevel 6
Badge +29
Could be some sort of hardware NMI or other issue that's causing this. Support can dig in from a diagnostics and log perspective.

I know you had another thread where I recommended opening up a case. Please either piggyback on that one, or open a secondary one to cover this off. If Dell sees some sort of hardware issue, they'll do what they do, and if not, they'll pass the case to us to dig into it.
Badge +4
We keep a log of pings between all nodes in a cluster in ~/data/logs/sysstats/ping_hosts.INFO, so that's worth checking.

We throw the error you saw if the node is inaccessible over the network. This could be a networking or a hardware failure.

If you check uptime on all nodes, what does it come back with? If the values are uniform, the host never actually rebooted, so it was likely a networking interruption.
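
To make the uptime comparison concrete, here is a rough Python sketch. It assumes you have already collected each host's boot time yourself (for example from `uptime -s` where that option is available). The host names, the boot times, and the `hosts_rebooted_since` helper are all made up for illustration and are not part of any Nutanix tooling; only the alert timestamp is taken from the post above.

```python
from datetime import datetime, timedelta


def hosts_rebooted_since(boot_times, alert_time, slack=timedelta(minutes=10)):
    """Return the hosts whose last boot is at or after (alert_time - slack).

    boot_times: dict mapping a host label to its last boot time (datetime).
    slack: allowance for clock skew or failure-detection delay (an assumption).
    """
    cutoff = alert_time - slack
    return sorted(host for host, booted in boot_times.items() if booted >= cutoff)


# Alert time from the HA message in the original post (08-17-16, 02:01:41am).
alert = datetime(2016, 8, 17, 2, 1, 41)

# Hypothetical boot times; replace with the values gathered from your own hosts.
boot_times = {
    "node-a": datetime(2016, 7, 1, 9, 0, 0),
    "node-b": datetime(2016, 7, 1, 9, 5, 0),
    "node-c": datetime(2016, 8, 17, 2, 6, 30),
}

rebooted = hosts_rebooted_since(boot_times, alert)
if rebooted:
    print("Rebooted around the alert:", ", ".join(rebooted))
else:
    print("No host rebooted around the alert; a network interruption is more likely.")
```

If the list comes back empty, every host has been up since before the alert, which points towards networking. If only the flagged host shows up, that supports a real reset and is worth passing on to support along with the logs.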
