Question

Manual cluster create fails after host removal.

  • 11 November 2020
  • 2 replies
  • 1882 views

Hello, All.

Good day!!

 

I have an issue and need your help and advice.

 

1. I am currently using 10 nodes, but I want to configure two clusters of 5 nodes each (each with redundancy factor 3, RF3).

 

2. From the existing 10-node cluster, 5 nodes were removed successfully.

 

3. I then tried to create a cluster manually from the 5 nodes that had been removed:

cluster -s 166.103.190.123,166.103.190.124,166.103.190.125,166.103.190.126,166.103.190.127 --skip_discovery create --redundancy_factor=3

 

4. During cluster creation, the following errors occur. The process keeps waiting on 166.103.190.125, which repeatedly flips between Up and Down, and never progresses past this point:

Waiting on 166.103.190.124 (Up, ZeusLeader) to start: Medusa DynamicRingChanger Pithos Mantle Stargate InsightsDB InsightsDataTransfer Ergon Cerebro Chronos Curator Athena Prism CIM AlertManager Arithmos Catalog Acropolis Uhura SysStatCollector NutanixGuestTools MinervaCollector NutanixGuestTools MinervaCollector NutanixGuestTools MinervaCollector NutanixGuestTools MinervaCollector AndPLOSConfigur
Waiting on 166.103.190.125 (Down) to start:
Waiting on 166.103.190.126 (Up) to start: Medusa DynamicRingChanger Pithos Mantle Stargate InsightsDB InsightsDataTransfer Ergon Cerebro Chronos Curator Athena Prism CIM AlertManager Arithmos Catalog Acropolis Uhura SysStatCollector NutanixGuestTools MinervaCVM ClusterConfig APLOS OrganConfig Mercury LaTrim And APLOSEngine
Waiting on 166.103.190.127 (Up) to start: Medusa DynamicRingChanger Pithos Mantle Stargate InsightsDB InsightsDataTransfer Ergon Cerebro Chronos Curator Athena Prism CIM AlertManager Arithmos Catalog Acropolis Uhura SysStatCollector NutanixGuestTools MinervaCVM ClusterConfig APLOS OrganConfig Mercury LaTrim And APLOSEngine

Waiting on 166.103.190.123 (Up) to start: Medusa DynamicRingChanger Pithos Mantle Stargate InsightsDB InsightsDataTransfer Ergon Cerebro Chronos Curator Athena Prism CIM AlertManager Arithmos Catalog Acropolis Uhura SysStatCollector NutanixGuestTools MinervaCVM ClusterConfig APLOS OrganConfig Mercury LaTrim And APLOSEngine
Waiting on 166.103.190.124 (Up, ZeusLeader) to start: Medusa DynamicRingChanger Pithos Mantle Stargate InsightsDB InsightsDataTransfer Ergon Cerebro Chronos Curator Athena Prism CIM AlertManager Arithmos Catalog Acropolis Uhura SysStatCollector NutanixGuestTools MinervaCollector NutanixGuestTools MinervaCollector NutanixGuestTools MinervaCollector NutanixGuestTools MinervaCollector AndPLOSConfigur
Waiting on 166.103.190.125 (Down) to start:
Waiting on 166.103.190.126 (Up) to start: Medusa DynamicRingChanger Pithos Mantle Stargate InsightsDB InsightsDataTransfer Ergon Cerebro Chronos Curator Athena Prism CIM AlertManager Arithmos Catalog Acropolis Uhura SysStatCollector NutanixGuestTools MinervaCVM ClusterConfig APLOS OrganConfig Mercury LaTrim And APLOSEngine
Waiting on 166.103.190.127 (Up) to start: Medusa DynamicRingChanger Pithos Mantle Stargate InsightsDB InsightsDataTransfer Ergon Cerebro Chronos Curator Athena Prism CIM AlertManager Arithmos Catalog Acropolis Uhura SysStatCollector NutanixGuestTools MinervaCVM ClusterConfig APLOS OrganConfig Mercury LaTrim And APLOSEngine

5. I have since run cluster destroy. The 4 nodes other than ‘166.103.190.125’ were destroyed successfully, but on ‘166.103.190.125’ the destroy does not proceed, as shown below:

nutanix@NTNX-J302AC65-A-CVM:166.103.190.125:~$ cluster destroy
2020-11-11 17:43:43 INFO zookeeper_session.py:143 cluster is attempting to connect to Zookeeper
2020-11-11 17:43:54 ERROR configuration.py:157 Could not get Zookeeper connection with host_port_list: zk1:9876,zk2:9876,zk3:9876
2020-11-11 17:43:54 WARNING cluster:2709 Could not read SVM backplane IPs from zk
2020-11-11 17:43:54 INFO cluster:2716 Executing action destroy on SVMs localhost
2020-11-11 17:43:54 WARNING genesis_utils.py:279 Deprecated: use util.cluster.info.get_node_uuid() instead
2020-11-11 17:44:08 WARNING genesis_utils.py:1218 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort)
2020-11-11 17:44:09 WARNING genesis_utils.py:1218 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort)
2020-11-11 17:44:10 WARNING genesis_utils.py:1218 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort)
2020-11-11 17:44:11 WARNING genesis_utils.py:1218 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort)

6. I rebooted the 166.103.190.125 node, but it still has the same symptoms.

 

I don't want to reinstall the CVM; I want to create the cluster manually. Is there any way to fix this?



2 replies

Hello, Hotae,

I have the same issue and cannot fix it.

I think it is a Zookeeper issue.

Please share what you find if you manage to resolve it.

I look forward to everyone's replies.

Thanks


The “Failed to reach a node where Genesis is up” message could mean that the node is not completely unconfigured yet. Typically, if a node is unconfigured and not part of a cluster, only Genesis and Zookeeper are running. These two services are stable, and if you run “cluster status”, it should show that the cluster is not configured.
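
For reference, on a CVM that has been cleanly unconfigured you would expect something roughly along these lines (illustrative only; the exact wording and the list of processes vary by AOS version):

nutanix@CVM:~$ genesis status
(only the genesis and zookeeper processes should be listed with PIDs; no Stargate, Medusa, Curator, etc.)

nutanix@CVM:~$ cluster status
(should report that the cluster is currently unconfigured)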

So it should be a matter of properly unconfiguring that node. There is an internal KB for this (KB 3436), so it would be easier to open a Support case if you want to go down that route.
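
Once 166.103.190.125 (and the other four CVMs) report as unconfigured, the manual create from step 3 of the original post should be retryable as-is, for example:

nutanix@CVM:~$ cluster -s 166.103.190.123,166.103.190.124,166.103.190.125,166.103.190.126,166.103.190.127 --skip_discovery create --redundancy_factor=3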