Solved

Preupgrade to AOS 5.5.8 fails (zookeeper not running on a CVM) but clean ncc

  • 11 February 2019

Hello,

We have an AOS 5.1 cluster (EOL, I know) and want to upgrade to the latest possible LTS release, which seems to be 5.5.8. NCC is clean, but the pre-upgrade check fails at 30%, claiming that zookeeper is not responding on one of the CVMs.

But
allssh genesis status | grep zookeeper
zookeeper: [3967, 3996, 3997, 4003]
Connection to @1 closed.
zookeeper: [3899, 3928, 3929, 24979, 25089, 25105]
Connection to @2 closed.
zookeeper: [3919, 3948, 3949, 7259, 7365, 7380]
Connection to @3 closed.
zookeeper: [3923, 3952, 3953, 7709, 8032, 8047]
Connection to @4 closed.

And
ncc health_checks system_checks zkinfo_check_plugin

runs normally.

What could be the issue? Thank you

Best answer by RichardsonPorto 13 February 2019, 11:54

Userlevel 3
Badge +7
Upgrade NCC to a newer version, then run the NCC checks.
Userlevel 3
Badge +5
Hi @MYK

Can you provide me the output of the following commands from any CVM:

allssh "ps -elf | grep vip_service | wc -l"

allssh 'grep "Too many connections" ~/data/logs/zookeeper.out| tail -n 20'

I believe you are hitting a known issue, and I need the output of these commands to confirm it.
Badge
Hello @RichardsonPorto
Here is the output (I only replaced the IPs).


nutanix@NTNX-4-CVM:host4:~$ allssh "ps -elf | grep vip_service | wc -l"
Executing ps -elf | grep vip_service | wc -l on the cluster
================== host1 =================
92
Connection to host1 closed.
================== host2 =================
132
Connection to host2 closed.
================== host3 =================
162
Connection to host3 closed.
================== host4 =================
353
Connection to host4 closed.
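
For anyone comparing counts like these across CVMs, here is a minimal sketch that turns the `allssh` output above into one `host count` line per CVM, highest first. The function name `per_host_counts` is hypothetical, and the parsing assumes the banner format shown in this post (`==== hostN ====` followed by a bare number). Note that a `ps | grep | wc -l` pipeline can count the `grep` process itself, so treat the numbers as approximate.

```shell
# Hypothetical helper: reads allssh output on stdin and prints "<host> <count>"
# for each CVM, sorted by count in descending order.
per_host_counts() {
  # Strip carriage returns in case the output came through a terminal session.
  tr -d '\r' \
    | awk '
        # Banner line such as "================== host1 ================="
        /^=+ .* =+$/ { host = $2; next }
        # First purely numeric line after a banner is that host count
        /^[0-9]+$/ && host != "" { print host, $1; host = "" }
      ' \
    | sort -k2,2nr
}

# Possible usage on a CVM (not executed here):
#   allssh "ps -elf | grep vip_service | wc -l" | per_host_counts
```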
Badge
I had to split the output and delete some of it, since it didn't get published otherwise.

nutanix@NTNX-4-CVM:host4:~$ allssh 'grep "Too many connections" ~/data/logs/zookeeper.out| tail -n 20'
Executing grep "Too many connections" ~/data/logs/zookeeper.out| tail -n 20 on the cluster
================== host1 =================
Connection to host1 closed.
================== host2 =================
2019-02-12 08:37:33,256 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60
2019-02-12 08:37:33,263 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60
2019-02-12 08:37:33,292 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60
...
2019-02-12 08:45:18,164 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60
2019-02-12 08:47:28,783 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60
2019-02-12 08:47:29,285 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60
2019-02-12 08:48:18,173 - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:9876:NIOServerCnxnFactory@224] - Too many connections from /host4 - max is 60
Connection to host2 closed.
================== host3 =================
Connection to host3 closed.
================== host4 =================
Connection to host4 closed.
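
To see at a glance which client is being rejected, the warnings above can be grouped by source address. This is only a sketch: the function name `summarize_too_many` is hypothetical, and the pattern assumes the exact message format shown in this output (`Too many connections from /<client> - max is <N>`).

```shell
# Hypothetical helper: reads zookeeper.out lines on stdin and prints
# "<count> <client>" for every client that hit the connection limit.
summarize_too_many() {
  grep 'Too many connections from' \
    | sed -E 's|.*Too many connections from /([^ ]+) - max is [0-9]+.*|\1|' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}

# Possible usage on a CVM (not executed here); counts are merged across CVMs:
#   allssh 'grep "Too many connections" ~/data/logs/zookeeper.out' | summarize_too_many
```

Run against the lines in this post, every warning is attributed to host4.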
Userlevel 3
Badge +5
Hi @MYK,

Thank you for the output. Please open a ticket with Nutanix: you are experiencing a known issue, and we can apply a fix that allows the pre-upgrade to pass.
Badge

Thank you @RichardsonPorto for the reply.

We also discovered a new issue when we upgraded NCC: some of the IPMI addresses are misconfigured (actually, they have no IP at all). Is this the cause, or an unrelated problem?

Userlevel 3
Badge +5
Hi @MYK, that IPMI issue is not related to the zookeeper issue. Assuming the IPMI interfaces have the correct IPs assigned, a genesis restart on the CVM will discover them. Since you have to open a case for the zookeeper issue anyway, ask the case owner to check the IPMI issue as well; both should be easy to fix.
Badge
Thank you very much @RichardsonPorto