I can’t access LCM in my CE cluster. It just says “LCM Framework Update is in progress. Please check back when the update process is completed.” It’s been like this for days. Updating Foundation and NCC worked fine the old way.
I can’t see any active tasks in Prism or with ecli task.list.
Any ideas?
I’ve noticed that if I run an NCC check, I get a load of these:
INFO: Cluster/node reports it is currently undergoing maintenance/upgrade. This health check plugin is disabled during this workflow to avoid inaccurate results or alerts but will run again when the workflow completes. See KB4999 for more details.
But ncli host ls shows false for maintenance mode on all three nodes.
I found some other scripts, but again they say nothing is happening:
nutanix@NTNX-afcd9aa8-A-CVM:192.168.250.236:/usr/local/nutanix/cluster/bin/lcm$ ./lcm_upgrade_status
Ongoing upgrades: No upgrade is in progress
Finished upgrades: Up to 5 previously finished upgrade batches listed in descending order of upgrade start time:
nutanix@NTNX-afcd9aa8-A-CVM:192.168.250.236:/usr/local/nutanix/cluster/bin/lcm$ lcm_auto_upgrade_status
No autoupdate in progress
I just tried moving the lcm_leader to another node (no idea how to do this so I just rebooted the CVM the lcm_leader was on!) and it now just hangs on “Loading...” when going to the LCM page in Prism.
There’s nothing there that helps me in the state it’s in unfortunately. These don’t exist:
~/data/logs/lcm_ops.out or
~/data/logs/lcm_wget.log
No stuck tasks in ergon.out
Nothing interesting in genesis.out
….lcm/lcm path doesn’t exist so no issue there.
I think I need to clear whatever upgrade status is set somewhere, since it makes NCC report this against most of its checks:
Detailed information for ahv_version_check:
Node 192.168.250.237: INFO: Cluster/node reports it is currently undergoing maintenance/upgrade. This health check plugin is disabled during this workflow to avoid inaccurate results or alerts but will run again when the workflow completes. See KB4999 for more details.
Yeah, so both of these say no upgrade is taking place:
nutanix@NTNX-5de7c188-A-CVM:192.168.250.235:~/cluster/bin/lcm$ upgrade_status
2021-03-09 14:47:36,142Z INFO zookeeper_session.py:176 upgrade_status is attempting to connect to Zookeeper
2021-03-09 14:47:36,149Z INFO upgrade_status:38 Target release version: el7.3-release-ce-2020.09.16-stable-d4fc219b73b4181935a3a19465eb922313fc735f
2021-03-09 14:47:36,152Z INFO upgrade_status:103 SVM 192.168.250.235 is up to date
2021-03-09 14:47:36,152Z INFO upgrade_status:103 SVM 192.168.250.236 is up to date
2021-03-09 14:47:36,153Z INFO upgrade_status:103 SVM 192.168.250.237 is up to date
nutanix@NTNX-5de7c188-A-CVM:192.168.250.235:~/cluster/bin/lcm$ host_upgrade_status
2021-03-09 14:47:56,572Z INFO zookeeper_session.py:176 host_upgrade_status is attempting to connect to Zookeeper
Automatic Hypervisor upgrade: Enabled
Target host version: None
But NCC says:
Detailed information for cluster_active_upgrade_check:
Node 192.168.250.235: INFO: ['NOS', 'Hypervisor', 'Firmware'] being upgraded
Refer to KB 5277 (http://portal.nutanix.com/kb/5277) for details on cluster_active_upgrade_check or Recheck with: ncc health_checks system_checks cluster_active_upgrade_check
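For anyone wanting to compare this NCC result against the upgrade_status output in a script, here is a rough Python sketch that pulls out the list of components NCC claims are being upgraded. The pattern is based only on the single NCC line quoted above, and the function name is made up for illustration, so treat it as an assumption rather than a documented NCC output format.

```python
import ast
import re

# Pattern modelled only on the NCC line quoted in this thread (an assumption,
# not a documented output format).
BEING_UPGRADED = re.compile(r"INFO: (\[[^\]]*\]) being upgraded")


def components_reported_upgrading(ncc_output):
    """Return the components NCC says are being upgraded, or [] if none."""
    match = BEING_UPGRADED.search(ncc_output)
    if match is None:
        return []
    # The bracketed part is printed like a Python list of strings,
    # so literal_eval can read it without executing anything.
    return list(ast.literal_eval(match.group(1)))


sample = (
    "Detailed information for cluster_active_upgrade_check: "
    "Node 192.168.250.235: INFO: ['NOS', 'Hypervisor', 'Firmware'] being upgraded"
)
print(components_reported_upgrading(sample))
```

An empty list would mean the check no longer reports an upgrade, which is the state to look for once the stale status is cleared.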
Shot in the dark, can you upgrade NCC? It will effectively restart the process.
No, I’m already on the latest :(
Today I’ve got to a different stage after leaving it overnight. When I go to the LCM page it just says “Waiting for LCM Framework to start...”
Edit: oh, after leaving it for a few minutes it’s back to “LCM Framework Update in Progress, please check back when the update process is completed.”
I notice this repeats over and over in the genesis log:
2021-03-10 10:23:17,117Z INFO ergon_utils.py:825 No LCM operation running
2021-03-10 10:23:17,118Z INFO ergon_utils.py:1595 Cannot find root task uuid
2021-03-10 10:23:17,121Z INFO zeus_utils.py:413 Zk node /appliance/logical/lcm/mercury_config/f58690da-9d18-400c-8a6a-f7d3a4e09b28 doesn't exist
2021-03-10 10:23:17,121Z INFO framework.py:2196 Mercury config is in progress. Returning Autoupdate
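Since the same message repeats constantly, a small Python sketch like the one below can collect the distinct IDs that genesis reports as missing under mercury_config. The regular expression is modelled only on the log line quoted above, and the function name is hypothetical, so this is an assumption about the log format rather than anything official.

```python
import re

# Pattern modelled only on the genesis.out line quoted in this thread
# (an assumption, not a documented log format). Accepts a straight or
# curly apostrophe in "doesn't".
MISSING_ZK_NODE = re.compile(
    r"Zk node /appliance/logical/lcm/mercury_config/([0-9a-f-]{36}) doesn['’]t exist"
)


def missing_mercury_config_ids(log_text):
    """Return the distinct IDs reported missing, in first-seen order."""
    seen = []
    for node_id in MISSING_ZK_NODE.findall(log_text):
        if node_id not in seen:
            seen.append(node_id)
    return seen


sample = (
    "2021-03-10 10:23:17,121Z INFO zeus_utils.py:413 Zk node "
    "/appliance/logical/lcm/mercury_config/f58690da-9d18-400c-8a6a-f7d3a4e09b28 "
    "doesn't exist\n"
) * 3  # the message repeats in the real log
print(missing_mercury_config_ids(sample))
```

Each ID it returns can then be compared against the node IDs in the cluster to see which hosts are affected.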
Edit2 - I think I’m on to something. “f58690da-9d18-400c-8a6a-f7d3a4e09b28” is the ID of my 3rd node and it’s missing:
as that id now shows when I do a list. I did so many different commands I’m not 100% sure it is that one though :)
Cheers,
Steve
Oh, I had to do it again, this time adding the other hosts in as well, because I had a similar error in the genesis logs. Then I restarted genesis, the LCM leader moved to another host, and now the LCM prechecks have completed and inventory is running!
Woohoo all done and updates performed!
Also NCC is clear now - no longer says upgrade in progress