Nutanix Alerts


Badge +10
We had a worrying senario this am on our main production cluster where events reported that two of the four nodes were being put into maintenace mode - ie downed the cluster. This was not the case though it caused a major panic and the start of a full dr process.

What we saw was as follows

Curator job Full Scan with id 6220 has been running for a long time i.e. 21625 seconds
Controller VM xx.xx.xx.x2 is put in maintenance mode due to Cluster conversion.
Controller VM xx.xx.xx.x4 is put in maintenance mode due to Cluster conversion.

The nodes were checked and the only real errors of any note picked up using ncc seem to be related to Genesis

WARN: Error while executing cluster status: 2016-10-06 10:50:35 WARNING genesis_utils.py:842 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort)2016-10-06 10:50:36 WARNING genesis_utils.py:842 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort)2016-10-06 10:50:37 WARNING genesis_utils.py:842 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort)2016-10-06 10:50:38 WARNING genesis_utils.py:842 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort)2016-10-06 10:50:39 WARNING genesis_utils.py:842 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort)

I have done some searching and come up with no real answers atm

FYI there is already a support case logged with our hw vendor


4 replies

Badge +10
The issue with genesis has been resolved - had hung/stopped on one node the other reported issues are still outstanding with no cause/solution
Userlevel 7
Badge +30
Robert - I don't see any cases filed with Nutanix about this under your account. Having Nutanix support dig into this is the absolute best way to get resolution on this one.

Looks like you are a Dell customer and you mentioned you logged a ticket. Please have your Dell support rep escalate the ticket to Nutanix.
Userlevel 7
Badge +30
- Following up, were you able to get through these issues?
Badge +10
Hi Jon
This was investigated by Dell and Nutanix support.
I thought I had already responded to this post.

We had a VMWare cluster which was converted to AHV - pre production
The errors we saw were legacy from the conversion and then subsequent AHV update.
A problem with genesis on one of the nodes was thought to be the root cause in flagging the events as current

Regards
Robert

Reply