NOS Upgrade Stuck

Voyager

Hi, 

 

Yesterday we upgraded NOS from 4.0.2.1 to 4.5.2.3, following the upgrade guide as shown below:

1. Ran the NCC health check -- everything was fine, all green.
2. Copied the Nutanix NOS file to /home/nutanix and extracted it:

nutanix@cvm$ cd /home/nutanix
nutanix@cvm$ tar -zxvf nutanix_installer*-4.5*

3. Then we started the upgrade with:

nutanix@cvm$ /home/nutanix/install/bin/cluster -i /home/nutanix/install \
-v /home/nutanix/version-upgrade-metadata.json upgrade

4. Everything looked fine until we ran "upgrade_status" to check the status:

nutanix@NTNX-13SX35460027-B-CVM:192.168.70.20:~$ upgrade_status
2016-03-23 21:06:54 INFO upgrade_status:38 Target release version: el6-release-danube-4.5.2.3-stable-e3ee39b1b3f20aa8744b545f597230fd700cd10c
2016-03-23 21:06:54 INFO upgrade_status:43 Cluster upgrade method is set to: automatic rolling upgrade
2016-03-23 21:06:54 INFO upgrade_status:96 SVM 192.168.70.19 still needs to be upgraded. Installed release version: 1.7-release-danube-4.0.2.1-stable-d56e7508a18c7ec80b7f96a6942a14d58a307f09
2016-03-23 21:06:54 INFO upgrade_status:96 SVM 192.168.70.20 still needs to be upgraded. Installed release version: 1.7-release-danube-4.0.2.1-stable-d56e7508a18c7ec80b7f96a6942a14d58a307f09, node is currently upgrading
2016-03-23 21:06:54 INFO upgrade_status:96 SVM 192.168.70.21 still needs to be upgraded. Installed release version: 1.7-release-danube-4.0.2.1-stable-d56e7508a18c7ec80b7f96a6942a14d58a307f09
2016-03-23 21:06:54 INFO upgrade_status:96 SVM 192.168.70.22 still needs to be upgraded. Installed release version: 1.7-release-danube-4.0.2.1-stable-d56e7508a18c7ec80b7f96a6942a14d58a307f09
 
It stayed stuck at this step for more than an hour. In the web console, the upgrade progress bar was stuck at 73%, and the progress bar of the CVM that was currently upgrading kept alternating between blue and red. It was stuck at the reboot step.

5. We decided to manually reboot the CVM that was currently upgrading. Before the reboot, we checked
      /home/nutanix/data/logs/install.out
It said "Install has already finished".

Then we rebooted it, but the upgrade was still stuck.

6. We checked the cluster status and found that all CVMs had the new services, such as Acropolis and NutanixGuestTools (I had not seen these before).
Screen Shot 2559-03-24 at 10.45.42 AM.png

7. We checked the NOS release version with cat /etc/nutanix/release_version, and it showed 1.7-release-danube...4.0.2.1 (the old version).

8. Here is the genesis.out log:
Screen Shot 2559-03-24 at 10.53.17 AM.png

9. Here the Upgrade Software progress turns red and then blue:
Screen Shot 2559-03-24 at 11.10.03 AM.png

Then it turns blue again, stuck at waiting for reboot and upgrade completion.
Screen Shot 2559-03-24 at 11.10.49 AM.png
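
For anyone who wants to keep an eye on which CVM is pending and which one is mid-upgrade, the upgrade_status output from step 4 can be summarized with a short script. This is only a minimal Python sketch: the regular expression is derived from the log lines quoted in this post, the function name summarize_upgrade_status is made up for illustration, and other NOS releases may word these messages differently.

```python
import re

# Pattern based only on the log lines quoted in this post (an assumption);
# other NOS releases may format these messages differently.
PENDING_RE = re.compile(
    r"SVM (?P<ip>\d+\.\d+\.\d+\.\d+) still needs to be upgraded\. "
    r"Installed release version: (?P<release>[^\s,]+)"
    r"(?P<active>, node is currently upgrading)?"
)


def summarize_upgrade_status(text):
    """Return (ip, installed_release, is_upgrading) for each pending SVM."""
    results = []
    for line in text.splitlines():
        match = PENDING_RE.search(line)
        if match:
            results.append(
                (
                    match.group("ip"),
                    match.group("release"),
                    match.group("active") is not None,
                )
            )
    return results


# Two lines copied from the upgrade_status output in step 4.
sample = """\
2016-03-23 21:06:54 INFO upgrade_status:96 SVM 192.168.70.19 still needs to be upgraded. Installed release version: 1.7-release-danube-4.0.2.1-stable-d56e7508a18c7ec80b7f96a6942a14d58a307f09
2016-03-23 21:06:54 INFO upgrade_status:96 SVM 192.168.70.20 still needs to be upgraded. Installed release version: 1.7-release-danube-4.0.2.1-stable-d56e7508a18c7ec80b7f96a6942a14d58a307f09, node is currently upgrading
"""

for ip, release, upgrading in summarize_upgrade_status(sample):
    state = "upgrading" if upgrading else "waiting"
    # Drop the commit hash after "-stable-" to keep the line short.
    print(ip, release.split("-stable-")[0], state)
```

Saving the output of upgrade_status to a file and passing its contents to the function would give the same summary for all four CVMs.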

Has anyone faced an issue like this? Any suggestions would be appreciated.

Thanks & Regards,
PM

11 REPLIES
Nutanix Employee

Re: NOS Upgrade Stuck

Hello Gatekeeper,

 

The best course of action is to log into the Nutanix Support Portal and open a case. (Support > Open Case from the top menu). I believe one of our SREs can help you very quickly.

 

blaise c

Voyager

Re: NOS Upgrade Stuck

Hi Blaise,

Thanks for the information, but I have already opened a case on the Nutanix Support Portal.

Unfortunately, they said that this Nutanix cluster is out of warranty, so they cannot support us anymore.

We now have two Nutanix clusters: one is the ESXi cluster that is currently facing the issue (out of warranty), and the other is a new 2-block, 4-node Nutanix system that we just bought and built into an AHV cluster today.

We would like to use Prism Central to manage both the ESX cluster and the AHV cluster, so we had to upgrade NOS on the ESX cluster first, and that is where we ran into the problem.

Cluster 1: ESX cluster -- out of warranty; upgraded NOS from 4.0.2.1 to 4.5.2.3 and hit the problem.

Cluster 2: AHV cluster -- built today with 2 blocks and 4 nodes, currently running NOS 4.6; everything is working fine.

It's a little disappointing that the support team would not help us because the old cluster is out of warranty, even though we just bought and built the new cluster today.


Moderator

Re: NOS Upgrade Stuck

Hey @Gatekeeper - I looked into this today, and it looks like David Sheller from support has reached out three times in the last week. Have you been able to sync up with him?

Jon Kohler | Principal Architect, Nutanix | Nutanix NPX #003, VCDX #116 | @JonKohler
Please Kudos if useful!
Pathfinder

Re: NOS Upgrade Stuck

I am having a similar issue, except with NCC. Was there a resolution to "reset" the web GUI? I was able to upgrade NCC fine via the command line, but I cannot upload any software because of an "NCC upgrade in progress.  Please wait until finished...." message.

Moderator

Re: NOS Upgrade Stuck

@airborneric - Eric, can you open a ticket with support?

Jon Kohler | Principal Architect, Nutanix | Nutanix NPX #003, VCDX #116 | @JonKohler
Please Kudos if useful!
Pathfinder

Re: NOS Upgrade Stuck

Will do

Pathfinder

Re: NOS Upgrade Stuck

Also, we are out of support warranty.

 

Is there a KB I can read to fix this?

 

The manual upgrade succeeded, by the way. I am running 4.6.0.2.

Moderator

Re: NOS Upgrade Stuck

Do you have a case open already? Can you email me the case number at jon@nutanix.com?

Jon Kohler | Principal Architect, Nutanix | Nutanix NPX #003, VCDX #116 | @JonKohler
Please Kudos if useful!
Voyager

Re: NOS Upgrade Stuck

Hi Jon, 

Sorry for my late reply. I saw that David Sheller has been trying to reach my colleague for a few days. It's great to see the response from Nutanix.

We tried the commands below, and they stopped the upgrade process (although the upgrade still did not complete):
- cluster disable_auto_install;
- cluster restart_genesis;

These two commands stopped the stuck upgrade process, including in the web console, but the upgrade still did not complete. The new services show up in the cluster status, but the release_version is still the same and did not move to 4.5 as we wanted.
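
To check from a script whether a CVM has actually reached the target release, the installed release string can be compared with the target one. The following is a minimal Python sketch, assuming release strings always look like the two quoted in the original post (a "-release-<codename>-<version>-" segment); the helper names nos_version and is_upgraded are hypothetical and not part of any Nutanix tool.

```python
import re

# Assumption: the NOS version sits between "-release-<codename>-" and the
# next "-", as in "1.7-release-danube-4.0.2.1-stable-<hash>".
VERSION_RE = re.compile(r"-release-\w+-(\d+(?:\.\d+)+)-")


def nos_version(release):
    """Extract the NOS version as a tuple of ints, or None if not found."""
    match = VERSION_RE.search(release)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_upgraded(installed_release, target_release):
    """True when the installed version is at least the target version."""
    installed = nos_version(installed_release)
    target = nos_version(target_release)
    if installed is None or target is None:
        raise ValueError("unrecognized release string")
    return installed >= target


# Release strings copied from the upgrade_status output in the first post.
installed = "1.7-release-danube-4.0.2.1-stable-d56e7508a18c7ec80b7f96a6942a14d58a307f09"
target = "el6-release-danube-4.5.2.3-stable-e3ee39b1b3f20aa8744b545f597230fd700cd10c"

print(nos_version(installed), nos_version(target), is_upgraded(installed, target))
```

Comparing tuples of integers avoids the pitfalls of plain string comparison, where "4.10" would sort before "4.5".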

However, since this is a production cluster, we did not have much time to wait, and we were afraid that leaving it in that state might cause problems. So we decided to destroy our new AHV cluster and rebuild it as an ESX cluster, migrate 200 VMs to that temporary ESX cluster, destroy the cluster with the issue, wipe everything, and reinstall it. Then we migrated all VMs back to their original block, destroyed the temporary ESX cluster, and rebuilt it as an AHV cluster.

So now we can say it works, after destroying everything and re-creating it.

It was not easy to destroy a cluster that had been running for almost 800 days (798 days, to be exact, before we destroyed it), but we had to.

BTW, if you have any KB related to our case, kindly point me to it.

Thanks so much for your help :)