Preupgrade to AOS 5.5.8 fails (zookeeper not running on a CVM) but clean ncc | Nutanix Community
Hello,



We have an AOS 5.1 cluster (EOL, I know) and wish to upgrade to the latest possible LTS release, which seems to be 5.5.8. NCC is clean, but the pre-upgrade fails at 30%, claiming that ZooKeeper is not responding on one of the CVMs.



But

allssh genesis status | grep zookeeper

zookeeper: [3967, 3996, 3997, 4003]

Connection to @1 closed.

zookeeper: [3899, 3928, 3929, 24979, 25089, 25105]

Connection to @2 closed.

zookeeper: [3919, 3948, 3949, 7259, 7365, 7380]

Connection to @3 closed.

zookeeper: [3923, 3952, 3953, 7709, 8032, 8047]

Connection to @4 closed.



And

ncc health_checks system_checks zkinfo_check_plugin


runs normally.



What could be the issue? Thank you.
Upgrade NCC to the latest version and run the NCC checks again.
Hi @MYK



Can you provide me the output of the following commands from any CVM:



allssh "ps -elf | grep vip_service | wc -l"



allssh 'grep "Too many connections" ~/data/logs/zookeeper.out| tail -n 20'



I believe you are experiencing a known issue, and I need the output of these commands to confirm it.
Hello @RichardsonPorto

Here is the output (I only replaced the IPs).





nutanix@NTNX-4-CVM:host4:~$ allssh "ps -elf | grep vip_service | wc -l"

Executing ps -elf | grep vip_service | wc -l on the cluster

================== host1 =================

92

Connection to host1 closed.

================== host2 =================

132

Connection to host2 closed.

================== host3 =================

162

Connection to host3 closed.

================== host4 =================

353

Connection to host4 closed.
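For anyone comparing these numbers across CVMs, the per-host counts can be pulled out of the pasted allssh output and ranked. This is only a minimal sketch that works on saved text, not on a live cluster; the function name `parse_allssh_counts` and the variable `sample_output` are made up for illustration, and the parser assumes each host section contains a single line holding just the count, as in the output above.

```python
import re

# Per-host banner printed by allssh, e.g. "================== host1 ================="
BANNER = re.compile(r"^=+\s+(\S+)\s+=+$")


def parse_allssh_counts(text):
    """Return {host: count}, assuming one integer-only line per host section."""
    counts = {}
    host = None
    for raw in text.splitlines():
        line = raw.strip()
        match = BANNER.match(line)
        if match:
            host = match.group(1)
        elif host is not None and line.isdigit():
            counts[host] = int(line)
    return counts


# Output copied from the post above (blank lines removed).
sample_output = """\
================== host1 =================
92
Connection to host1 closed.
================== host2 =================
132
Connection to host2 closed.
================== host3 =================
162
Connection to host3 closed.
================== host4 =================
353
Connection to host4 closed.
"""

counts = parse_allssh_counts(sample_output)

# Rank hosts by number of matching processes, highest first.
ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
for host, count in ranked:
    print(f"{host}: {count}")
```

With the output above, host4 comes out on top with 353 matching processes, well ahead of the other three CVMs.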
I had to split and delete some of the output since it didn't get published



nutanix@NTNX-4-CVM:host4:~$ allssh 'grep "Too many connections" ~/data/logs/zookeeper.out| tail -n 20'

Executing grep "Too many connections" ~/data/logs/zookeeper.out| tail -n 20 on the cluster

================== host1 =================

Connection to host1 closed.

================== host2 =================

2019-02-12 08:37:33,256 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60

2019-02-12 08:37:33,263 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60

2019-02-12 08:37:33,292 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60

...

2019-02-12 08:45:18,164 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60

2019-02-12 08:47:28,783 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60

2019-02-12 08:47:29,285 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60

2019-02-12 08:48:18,173 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60

Connection to host2 closed.

================== host3 =================

Connection to host3 closed.

================== host4 =================

Connection to host4 closed.
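The warnings above all come from the ZooKeeper server on host2 rejecting connections from host4, and the limit of 60 matches ZooKeeper's default per-client-address cap (`maxClientCnxns`). A rough way to summarize such log lines per client is sketched below; it works on text copied from `zookeeper.out`, the name `summarize_warnings` is hypothetical, and the regular expression only assumes the message format shown in this thread.

```python
import re
from collections import Counter

# Message format as it appears in the zookeeper.out excerpt above.
WARNING = re.compile(r"Too many connections from /(\S+) - max is (\d+)")


def summarize_warnings(lines):
    """Count warnings per client address and collect the reported limits."""
    per_client = Counter()
    limits = set()
    for line in lines:
        match = WARNING.search(line)
        if match:
            per_client[match.group(1)] += 1
            limits.add(int(match.group(2)))
    return per_client, limits


# Three of the lines logged on host2, copied from the post above.
sample_lines = [
    "2019-02-12 08:37:33,256 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60",
    "2019-02-12 08:37:33,263 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60",
    "2019-02-12 08:48:18,173 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60",
]

per_client, limits = summarize_warnings(sample_lines)
for client, hits in per_client.most_common():
    print(f"{client}: {hits} rejected connection attempts (limits seen: {sorted(limits)})")
```

Here the only offending client is host4, which is also the CVM with the most vip_service processes, so the two outputs are at least consistent with each other.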
Hi @MYK,



Thank you for the output. Please open a ticket with Nutanix Support: you are experiencing a known issue, and we can apply a fix that allows the pre-upgrade check to pass.

Thank you @RichardsonPorto for the reply.

We also discovered a new issue when we upgraded NCC: some of the IPMI addresses are misconfigured (actually, they have no IP at all). Is this the cause of the issue, or an unrelated problem?


Hi @MYK, that IPMI issue is not related to the ZooKeeper one. Assuming the IPMI interfaces have the correct IPs assigned, a genesis restart on the CVM will rediscover them. Since you have to open a case for the ZooKeeper issue anyway, ask the case owner to check the IPMI issue as well; both should be easy to fix.
Thank you very much, @RichardsonPorto.