NOS Upgrade Stuck


Badge +2
Hi,

Yesterday, we performed the NOS upgrade from 4.0.2.1 to 4.5.2.3, following the instruction guide as below:
1. Ran the NCC health check: everything was fine, all green.
2. Copied the Nutanix NOS file to /home/nutanix and extracted it:
nutanix@cvm$ cd /home/nutanix
nutanix@cvm$ tar -zxvf nutanix_installer*-4.5*
3. Started the upgrade with:
nutanix@cvm$ /home/nutanix/install/bin/cluster -i /home/nutanix/install -v /home/nutanix/version-upgrade-metadata.json upgrade
4. Everything looked fine until we checked the status with "upgrade_status":
nutanix@NTNX-13SX35460027-B-CVM:192.168.70.20:~$ upgrade_status
2016-03-23 21:06:54 INFO upgrade_status:38 Target release version: el6-release-danube-4.5.2.3-stable-e3ee39b1b3f20aa8744b545f597230fd700cd10c
2016-03-23 21:06:54 INFO upgrade_status:43 Cluster upgrade method is set to: automatic rolling upgrade
2016-03-23 21:06:54 INFO upgrade_status:96 SVM 192.168.70.19 still needs to be upgraded. Installed release version: 1.7-release-danube-4.0.2.1-stable-d56e7508a18c7ec80b7f96a6942a14d58a307f09
2016-03-23 21:06:54 INFO upgrade_status:96 SVM 192.168.70.20 still needs to be upgraded. Installed release version: 1.7-release-danube-4.0.2.1-stable-d56e7508a18c7ec80b7f96a6942a14d58a307f09, node is currently upgrading
2016-03-23 21:06:54 INFO upgrade_status:96 SVM 192.168.70.21 still needs to be upgraded. Installed release version: 1.7-release-danube-4.0.2.1-stable-d56e7508a18c7ec80b7f96a6942a14d58a307f09
2016-03-23 21:06:54 INFO upgrade_status:96 SVM 192.168.70.22 still needs to be upgraded. Installed release version: 1.7-release-danube-4.0.2.1-stable-d56e7508a18c7ec80b7f96a6942a14d58a307f09
It was stuck at this step for more than an hour. In the web console, the overall progress bar was stuck at 73%, and the progress bar for the CVM currently being upgraded kept switching between blue and red. It was stuck at the reboot step.
5. We decided to reboot the upgrading CVM manually. Before the reboot, we checked /home/nutanix/data/logs/install.out, and it said "Install has already finished". We rebooted it anyway, and it is still stuck.
6. We checked the cluster status and found that every CVM now has new services such as Acropolis and NutanixGuestTools (which I hadn't seen before).

9. Here is the Upgrade Software Progress; it turns red and then blue again.

Has anyone faced an issue like this one? Kindly advise.

Thanks & Regards,
PM
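The upgrade_status output in step 4 is hard to scan because every line carries a long build hash. A minimal sketch, assuming the log lines keep exactly the format shown above: `pending_svms` is a hypothetical helper (not a Nutanix tool) that reads upgrade_status text on stdin and prints one line per SVM that still needs upgrading, marking the node that reports "node is currently upgrading".

```shell
#!/usr/bin/env bash
# pending_svms: hypothetical helper, not part of NOS.
# Reads upgrade_status output on stdin and prints "<svm-ip> <state>" for each
# SVM that still needs to be upgraded. State is "upgrading" when the line says
# "node is currently upgrading", otherwise "pending".
# Assumption: log lines look like the ones in the post, e.g.
#   ... INFO upgrade_status:96 SVM 192.168.70.20 still needs to be upgraded. ...
pending_svms() {
  awk '/still needs to be upgraded/ {
    ip = "?"
    # The IP address is the field right after the literal word "SVM".
    for (i = 1; i < NF; i++) if ($i == "SVM") ip = $(i + 1)
    state = ($0 ~ /node is currently upgrading/) ? "upgrading" : "pending"
    print ip, state
  }'
}
```

On a CVM it could be fed directly, for example `upgrade_status 2>&1 | pending_svms`; the `2>&1` is only there because it is not certain whether upgrade_status writes its log lines to stdout or stderr.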


Userlevel 2
Badge +12
Hello Gatekeeper,

The best course of action is to log into the Nutanix Support Portal and open a case. (Support > Open Case from the top menu). I believe one of our SREs can help you very quickly.

blaise c
Badge +2
Hi Blaise,

Thanks for the information. I have already opened a case on the Nutanix Support Portal. Unfortunately, they said that this Nutanix cluster is out of warranty, so they can no longer support us. We now have two Nutanix clusters: the ESXi cluster that is facing the issue (out of warranty), and a new AHV cluster that we built today from two newly purchased blocks (4 nodes). We would like to manage both the ESX cluster and the AHV cluster from Prism Central, so we had to upgrade NOS on the ESX cluster first, and that is when we hit this problem.

Cluster 1: ESX cluster -- out of warranty; the NOS upgrade from 4.0.2.1 to 4.5.2.3 ran into the problem.
Cluster 2: AHV cluster -- built today with 2 blocks (4 nodes), currently running NOS 4.6; everything is working fine.

It's a little disappointing that the support team won't help us because the old cluster is out of warranty, even though we just bought and built the new cluster today.
Userlevel 7
Badge +30
Hey, I looked into this today, and it looks like David Sheller from Support has reached out three times in the last week. Have you been able to sync up with him?
Badge +1
I'm having a similar issue, except with NCC. Was there a resolution to "reset" the web GUI? I was able to upgrade NCC fine via the command line, but I cannot upload any software because of an "NCC upgrade in progress. Please wait until finished...." message.
Userlevel 7
Badge +30
Eric, can you open a ticket with Support?
Badge +1
Will do
Badge +1
And we're out of the support warranty.

Is there a KB I can read to fix this?

The manual upgrade succeeded, by the way. I am running 4.6.0.2.
Userlevel 7
Badge +30
Do you have a case open already? Can you email me the case number at jon@nutanix.com?
Badge +2
Hi Jon,

Sorry for my late reply. I saw that David Sheller tried to reach my colleague for a few days. It's great to see the response from Nutanix. We tried the commands below, and they stopped the upgrade process (but the upgrade still did not complete):
- cluster disable_auto_install
- cluster restart_genesis
These two commands stopped the stuck upgrade process, including the progress in the web console, but the upgrade still did not complete. The new services showed up in the cluster status, but the release version stayed the same and did not move to 4.5 as we wanted. Since it's a production cluster and we didn't have much time to wait, we were afraid problems might arise if we did nothing. So we decided to destroy our new AHV cluster and rebuild it as an ESX cluster, migrate 200 VMs to that temporary ESX cluster, destroy the cluster that was facing the issue, wipe everything, and install it fresh. Then we migrated all VMs back to their old blocks, destroyed the temporary ESX cluster, and converted it back into an AHV cluster. So now we can say it works, by destroying everything and re-creating it. It's not easy to destroy a cluster that has been running for almost 800 days (798 days, actually), but we had to. BTW, if you have any KB related to our case, kindly let me know. Thanks so much for your help 🙂
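For anyone hitting the same stuck state, the two commands from the reply above can be wrapped so the sequence is reviewed before it touches a production cluster. A minimal sketch: `stop_stuck_upgrade` and the `DRY_RUN` variable are hypothetical names introduced here, and the only Nutanix commands used are the two quoted in the post. As the post notes, this halted the stuck upgrade but did not complete it.

```shell
#!/usr/bin/env bash
# stop_stuck_upgrade: hypothetical wrapper, not part of NOS.
# Runs the two commands reported in the thread, in order, on a CVM as the
# nutanix user. With DRY_RUN=1 it only prints what it would run.
# Per the post, this stops a stuck upgrade; it does NOT finish the upgrade.
stop_stuck_upgrade() {
  local cmd
  for cmd in "cluster disable_auto_install" "cluster restart_genesis"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would run: $cmd"
    else
      echo "running: $cmd"
      # Word splitting of $cmd is intentional: each entry is "program args".
      $cmd || return 1
    fi
  done
}
```

Running `DRY_RUN=1 stop_stuck_upgrade` first prints the plan; afterwards, `upgrade_status` (as in the original post) shows whether the cluster is still waiting on the upgrade.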
Userlevel 7
Badge +30
I'm sorry you had to go through all of that work, but I'm glad you were able to get it all working again.
Badge
Ran into the same issue. Here is what was done to resolve it (version 5.0.2):
progress_monitor_cli --fetchall
