Solved

Nutanix cluster issue, replaced node, connection error, out-of-support, seeking guidance

  • 9 October 2023
  • 6 replies
  • 1144 views

Hello all. I have a three-node Nutanix cluster. Recently I noticed that the VMs were down. I tried to browse to PE but got a connection error. I used IPMI to see what was going on and noticed that the third node was down with a boot error. I replaced the SATA DOM and reimaged the node. I thought the cluster could lose one node and be OK with only two. I have access to the two remaining AHV hosts and their CVMs, but any time I try to check whether a CVM is in maintenance mode, or try to take it out of maintenance mode, I get Error:Cannot connect to Prism Gateway. This error also happens when I run cluster status. I'm not sure what to do from here. This cluster isn't under support anymore since we moved to the cloud. We just have some old VMs that someone has asked for data from.

 

Thanks


Best answer by sl.farhanparkar 14 October 2023, 12:57





Hey,

Yikes..!

What does “genesis status” output on each CVM?

Have you done the usual disk space check with “df -h”, just in case it has blown up from full disks?
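
The disk space check can be narrowed down to just the filesystems that are nearly full. A minimal sketch, assuming a POSIX shell with awk on the CVM or host; the function name full_filesystems and the 90% threshold are illustrative choices, not Nutanix tooling:

```shell
# List mount points at or above a usage threshold (default 90%).
# Reads `df -P` style output on stdin so it can be tried on sample text.
# Note: mount points containing spaces are not handled by this sketch.
full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the header row
            use = $5
            sub(/%/, "", use)         # "97%" -> "97"
            if (use + 0 >= limit + 0) print $6, $5
        }'
}

# Usage:
#   df -P | full_filesystems 90
```

An empty result would suggest that full disks are not the cause, so the next step would be the service checks.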


Hi,

Can you access a CVM and run “cluster status”?

F>P

@Kcmount, genesis status shows all services down except genesis: [3848, 28228, 28254, 28255]
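
For anyone comparing output across CVMs, the stopped services can be pulled out of the status text. This is only a sketch: it assumes each service is printed as “name: [pids]” like the genesis line above, with an empty list meaning not running, and down_services is a made-up helper name:

```shell
# Print the names of services whose PID list is empty (not running).
# Reads `genesis status` style text on stdin. The "name: [pids]" layout
# is an assumption based on the genesis line quoted in this thread.
down_services() {
    awk -F': *' '/^[ \t]*[A-Za-z0-9_]+: *\[[ \t]*\]/ {
        gsub(/^[ \t]+/, "", $1)   # trim leading indentation
        print $1
    }'
}

# Usage on a CVM:
#   genesis status | down_services
```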

@sl.farhanparkar, I have access to both CVMs. When I run cluster status I get

 

 


Hi

From any CVM, can you ping the other CVM IP addresses?

Connect to any CVM and run “genesis status” to check the status of local services.

Try “genesis restart” on all CVMs, followed by “cluster status”.
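
The connectivity check above can be sketched as a small loop. This assumes a Linux ping that accepts -c and -W, as on the CVMs. The function name check_peers and the PING override variable are made up for illustration, and the IP addresses in the usage line are placeholders for your own CVM addresses:

```shell
# Ping each peer CVM once (2 second timeout) and report which ones answer.
# PING is a hypothetical override hook so the loop can be tried with a stub.
check_peers() {
    for ip in "$@"; do
        if "${PING:-ping}" -c 1 -W 2 "$ip" >/dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip UNREACHABLE"
        fi
    done
}

# Usage (placeholder addresses):
#   check_peers 10.0.0.11 10.0.0.12 10.0.0.13
```

If every peer answers, carry on with “genesis status”, then “genesis restart” on each CVM, then “cluster status”.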

 

F>P

 

The same thing happens when I run genesis restart. Only the genesis service is running. None of the other services are running.

 


Hi 

This will need detailed analysis to diagnose the issue.

Run “cluster start” and check the status, then monitor genesis.out and proceed based on the messages you get.
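
Watching genesis.out can be made less noisy by filtering for lines that look like problems. A rough sketch: the keyword list is a guess, not an official set of genesis messages, log_problems is a made-up helper name, and the path in the usage line assumes the default CVM log location:

```shell
# Keep only log lines that look like problems; reads log text on stdin.
# The keyword list is an assumption, adjust it to what you actually see.
log_problems() {
    grep -iE 'error|critical|traceback|fail'
}

# Usage on a CVM (log path assumed to be the default location):
#   tail -n 200 -F /home/nutanix/data/logs/genesis.out | log_problems
```

Running this in one session while “cluster start” runs in another shows which service genesis is stuck on.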

This is advanced troubleshooting that only Nutanix support personnel are trained for. It is not easy for a regular administrator, because the messages can be overwhelming, and it may not be worth the effort if the cluster is not running production workloads.

 

F>P