Solved

Zookeeper unstable after manually changing CVM IP

  • 15 March 2016
  • 4 replies

Hi All,

I recently changed my CVM IP addresses following the manual method in the 4.6 documentation. The cluster starts after the IP change, but Zookeeper does not seem stable.

Prism shows the error "Desired fault tolerance for Zookeeper is 1 but we can tolerate only 0 node failure(s)" on the Data Resiliency Status dashboard.

NOS 4.6, ESXi 6.0

When I grep for the zookeeper PIDs, each CVM shows only 5 PIDs instead of 6, and sometimes one of them drops to 4 PIDs.

Command:

nutanix@NTNX-14SM15430005-C-CVM:10.20.20.118:~$ allssh genesis status | grep zookeeper
zookeeper: [10354, 10367, 12412, 12438, 12439]
zookeeper: [12250, 12263, 15454, 15506, 15507]
zookeeper: [19055, 19069, 21163, 21189, 21190]

nutanix@NTNX-14SM15430005-C-CVM:10.20.20.118:~$ allssh genesis status | grep zookeeper
zookeeper: [12412, 12438, 12439, 26481, 26494]
zookeeper: [15454, 15506, 15507, 27119]
zookeeper: [21163, 21189, 21190, 32528, 32542]
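
For anyone who wants to watch the PID counts over several runs without counting by eye, here is a minimal Python sketch that tallies the PIDs on each "zookeeper: [...]" line of the output above. The function name zookeeper_pid_counts is made up for illustration and is not part of any Nutanix tooling. It only parses pasted text and makes no claim about how many PIDs a healthy CVM should show.

```python
import re
from typing import List


def zookeeper_pid_counts(output: str) -> List[int]:
    """Return the number of PIDs found on each 'zookeeper: [...]' line."""
    counts = []
    for line in output.splitlines():
        match = re.match(r"\s*zookeeper:\s*\[(.*)\]", line)
        if match:
            # Split the bracketed list on commas and ignore empty fragments.
            pids = [p for p in match.group(1).split(",") if p.strip()]
            counts.append(len(pids))
    return counts


# Second run pasted from the post above.
sample = """\
zookeeper: [12412, 12438, 12439, 26481, 26494]
zookeeper: [15454, 15506, 15507, 27119]
zookeeper: [21163, 21189, 21190, 32528, 32542]
"""

print(zookeeper_pid_counts(sample))  # [5, 4, 5]
```

A count that changes between runs, as in the two runs posted here, suggests the zookeeper processes are being restarted.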

Does anyone have any idea what is causing this?

Thanks.

Best answer by mgauch 15 March 2016, 13:51


Looks like you may have missed a step in the instructions. Please double-check your zk_server_config_file (I didn't see GENESIS in there; it still shows ZOOKEEPER_MONITOR):

==================================
Open the zookeeper configuration file.

nutanix@cvm$ vi /home/nutanix/data/zookeeper_monitor/zk_server_config_file

Press A to edit values in the file.

Update the fields in the file.

  • Change # LAST MODIFIER: field_entry to # LAST MODIFIER: GENESIS
  • Increment the existing number by one in the # ZOOKEEPER CONFIG VERSION field. For example, you would change # ZOOKEEPER CONFIG VERSION 3 to # ZOOKEEPER CONFIG VERSION 4
  • Update the entries for each Controller VM (zk1, zk2, zk3, and so on) to match the IP addresses (ip_address) of the Controller VMs.

# LAST MODIFIER: GENESIS
# ZOOKEEPER CONFIG VERSION existing_number+1
ip_address zk2 # DON'T TOUCH THIS LINE
ip_address zk1 # DON'T TOUCH THIS LINE
ip_address zk3 # DON'T TOUCH THIS LINE
ip_address zkN # DON'T TOUCH THIS LINE
==================================
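
To make the three field changes in the quoted steps concrete, here is a rough Python sketch that applies them to the text of a config file. It is only an illustration and does not replace the documented procedure. The function name update_zk_config and the 192.0.2.x "old" addresses are assumptions made up for the example (the original addresses are not given in this thread). The new addresses are the ones posted in this thread.

```python
import re
from typing import Dict


def update_zk_config(text: str, new_ips: Dict[str, str]) -> str:
    """Apply the three documented edits to zk_server_config_file text."""
    updated = []
    for line in text.splitlines():
        version = re.match(r"(#\s*ZOOKEEPER CONFIG VERSION\s+)(\d+)\s*$", line)
        entry = re.match(r"\S+(\s+)(zk\d+)\b(.*)$", line)
        if re.match(r"#\s*LAST MODIFIER\s*:", line):
            # Keep the existing spacing, replace only the value after the colon.
            line = re.sub(r"(:\s*).*$", r"\1GENESIS", line, count=1)
        elif version:
            # Increment the existing version number by one.
            line = f"{version.group(1)}{int(version.group(2)) + 1}"
        elif entry and entry.group(2) in new_ips:
            # Swap in the new IP, leaving the zk name and trailing comment alone.
            name = entry.group(2)
            line = f"{new_ips[name]}{entry.group(1)}{name}{entry.group(3)}"
        updated.append(line)
    return "\n".join(updated) + "\n"


# Hypothetical "before" file: the 192.0.2.x addresses are placeholders.
old = """\
# LAST MODIFIER : ZOOKEEPER_MONITOR
# ZOOKEEPER CONFIG VERSION 1
192.0.2.11 zk3 # DON'T TOUCH THIS LINE
192.0.2.12 zk2 # DON'T TOUCH THIS LINE
192.0.2.13 zk1 # DON'T TOUCH THIS LINE
"""
new_ips = {"zk3": "10.20.20.118", "zk2": "10.20.20.119", "zk1": "10.20.20.120"}

result = update_zk_config(old, new_ips)
print(result)
```

The sketch works on a string rather than on the file itself, so the result can be compared with what was typed in vi before anything on a CVM is touched.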

Link: https://portal.nutanix.com/#/page/docs/details?targetId=Advanced_Admin-Acr_v4_6:ip__cvm_ip_addr_change_t.html
Can you please check and make sure all CVMs have the updated zookeeper entries in the /etc/hosts file?

-NP
Hi Donnie,

I followed the proper procedure for changing the CVM IP manually, and I made sure both files are in sync.

Second, I stopped zookeeper and restarted genesis, following one of the Nutanix KB articles, but it still looks the same.

Below are the results:

nutanix@NTNX-14SM15430005-D-CVM:10.20.20.119:~$ allssh cat /home/nutanix/data/zookeeper_monitor/zk_server_config_file
Executing cat /home/nutanix/data/zookeeper_monitor/zk_server_config_file on the cluster
================== 10.20.20.118 =================
# LAST MODIFIER : ZOOKEEPER_MONITOR
# ZOOKEEPER CONFIG VERSION 1
10.20.20.118 zk3 # DON'T TOUCH THIS LINE
10.20.20.119 zk2 # DON'T TOUCH THIS LINE
10.20.20.120 zk1 # DON'T TOUCH THIS LINE
================== 10.20.20.119 =================
# LAST MODIFIER : ZOOKEEPER_MONITOR
# ZOOKEEPER CONFIG VERSION 1
10.20.20.118 zk3 # DON'T TOUCH THIS LINE
10.20.20.119 zk2 # DON'T TOUCH THIS LINE
10.20.20.120 zk1 # DON'T TOUCH THIS LINE
================== 10.20.20.120 =================
# LAST MODIFIER : ZOOKEEPER_MONITOR
# ZOOKEEPER CONFIG VERSION 1
10.20.20.118 zk3 # DON'T TOUCH THIS LINE
10.20.20.119 zk2 # DON'T TOUCH THIS LINE
10.20.20.120 zk1 # DON'T TOUCH THIS LINE

nutanix@NTNX-14SM15430005-D-CVM:10.20.20.119:~$ allssh cat /etc/hosts
Executing cat /etc/hosts on the cluster
================== 10.20.20.118 =================
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.1.1 Nutanix-Controller-VM
127.0.0.1 NTNX-14SM15430005-C-CVM
10.20.20.118 zk3 # DON'T TOUCH THIS LINE
10.20.20.119 zk2 # DON'T TOUCH THIS LINE
10.20.20.120 zk1 # DON'T TOUCH THIS LINE
================== 10.20.20.119 =================
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.1.1 Nutanix-Controller-VM
127.0.0.1 NTNX-14SM15430005-D-CVM
10.20.20.118 zk3 # DON'T TOUCH THIS LINE
10.20.20.119 zk2 # DON'T TOUCH THIS LINE
10.20.20.120 zk1 # DON'T TOUCH THIS LINE
================== 10.20.20.120 =================
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.1.1 Nutanix-Controller-VM
127.0.0.1 NTNX-14SM36060045-B-CVM
10.20.20.118 zk3 # DON'T TOUCH THIS LINE
10.20.20.119 zk2 # DON'T TOUCH THIS LINE
10.20.20.120 zk1 # DON'T TOUCH THIS LINE
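
The two checks being discussed (do the zk entries in /etc/hosts match the config file, and what does the LAST MODIFIER field say) can be sketched in Python as below. The helper names zk_entries and last_modifier are made up for illustration. The sample text is copied from the 10.20.20.118 section of the output above.

```python
import re
from typing import Dict, Optional


def zk_entries(text: str) -> Dict[str, str]:
    """Map zk name to IP for lines such as '10.20.20.118 zk3 # ...'."""
    entries = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\S+)\s+(zk\d+)\b", line)
        if match:
            entries[match.group(2)] = match.group(1)
    return entries


def last_modifier(config_text: str) -> Optional[str]:
    """Return the value of the '# LAST MODIFIER' header, if present."""
    match = re.search(r"^#\s*LAST MODIFIER\s*:\s*(\S+)", config_text, re.MULTILINE)
    return match.group(1) if match else None


config = """\
# LAST MODIFIER : ZOOKEEPER_MONITOR
# ZOOKEEPER CONFIG VERSION 1
10.20.20.118 zk3 # DON'T TOUCH THIS LINE
10.20.20.119 zk2 # DON'T TOUCH THIS LINE
10.20.20.120 zk1 # DON'T TOUCH THIS LINE
"""
hosts = """\
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.1.1 Nutanix-Controller-VM
127.0.0.1 NTNX-14SM15430005-C-CVM
10.20.20.118 zk3 # DON'T TOUCH THIS LINE
10.20.20.119 zk2 # DON'T TOUCH THIS LINE
10.20.20.120 zk1 # DON'T TOUCH THIS LINE
"""

print(zk_entries(config) == zk_entries(hosts))  # True
print(last_modifier(config))                    # ZOOKEEPER_MONITOR
```

On the posted output the zk entries agree in both files, so the IP lines are in sync. The header still reads ZOOKEEPER_MONITOR, which is the field the quoted documentation step says to change to GENESIS.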
Thanks mgauch, I have changed it and it is working fine now.