@Kike2020 It definitely looks like an LCM operation was in progress when the iDRAC update happened. When you say iDRAC update, was this a firmware update? If so, was it initiated through LCM, or was it a manual update?
You may be able to get more information by finding the LCM leader (lcm_leader) in the cluster and checking the LCM files in the Nutanix “data/logs” folder (the files include “lcm” in their names), since the content of each can help get to the bottom of this. Try to add them here, or at least the lines in these files that contain “Error”.
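As a rough illustration of the log check suggested above, here is a minimal Python sketch that lists every line containing "error" (case-insensitive) in files whose names include "lcm". The function name `find_error_lines` and the default folder `~/data/logs` are assumptions made for this example; adjust the path to wherever the logs actually live.

```python
from pathlib import Path


def find_error_lines(log_dir, name_part="lcm", keyword="error"):
    """Return (file name, line number, line) for each line containing keyword."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return []
    hits = []
    for path in sorted(log_dir.iterdir()):
        # Only regular files whose name mentions LCM are of interest.
        if not path.is_file() or name_part not in path.name:
            continue
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if keyword.lower() in line.lower():
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumed location: the data/logs folder under the home directory.
    for name, number, line in find_error_lines(Path.home() / "data" / "logs"):
        print(f"{name}:{number}: {line}")
```

Matching is a plain substring test, so it will also pick up lines that merely mention the word "error"; it is meant as a first pass before reading the surrounding entries.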
Regards,
-Said
Hi @sbarab
The process was as follows: through LCM I ran the inventory, and from there I selected the iDRAC firmware update. In the middle of the update it returned the error, and since then the services no longer come up.
Here is what the log collected:
^C
nutanix@NTNX-59TT382-A-CVM:IP ADDRESS:~/data/logs$ tail -F lcm_ops.out
2019-12-11 12:43:34 INFO command_execute.py:86 (IP ADDRESS NODE) Waiting 6 seconds before next attempt
2019-12-11 12:43:40 INFO command_execute.py:52 (IP ADDRESS NODE) Attempt 3 to execute if (Test-Path "$\Nutanix\Tmp\lcm_staging") \Nutanix\Tmp\lcm_staging"} on IP ADDRESS NODE
2019-12-11 12:43:40 WARNING command_execute.py:83 (IP ADDRESS NODE) Failed to execute command if (Test-Path "$\Nutanix\Tmp\lcm_staging") \Nutanix\Tmp\lcm_staging"} on IP ADDRESS NODE. ret: -1 out: err: Cannot remove item C:\Program Files\Nutanix\Tmp\lcm_staging\c879a59c-3710-404a-abec-66102899db01: The process cannot access the file 'c879a59c-3710-404a-abec-66102899db01' because it is being used by another process. Cannot remove item C:\Program Files\Nutanix\Tmp\lcm_staging: The directory is not empty.
2019-12-11 12:43:40 INFO command_execute.py:86 (IP ADDRESS NODE) Waiting 8 seconds before next attempt
2019-12-11 12:43:48 INFO command_execute.py:52 (IP ADDRESS NODE) Attempt 4 to execute if (Test-Path "$\Nutanix\Tmp\lcm_staging") \Nutanix\Tmp\lcm_staging"} on IP ADDRESS NODE
2019-12-11 12:43:49 ERROR catalog_staging_utils.py:820 (IP ADDRESS NODE) Failed to run if (Test-Path "$\Nutanix\Tmp\lcm_staging") \Nutanix\Tmp\lcm_staging"} on IP ADDRESS NODE with ret: -1, out: , err: Cannot remove item C:\Program Files\Nutanix\Tmp\lcm_staging\c879a59c-3710-404a-abec-66102899db01: The process cannot access the file 'c879a59c-3710-404a-abec-66102899db01' because it is being used by another process. Cannot remove item C:\Program Files\Nutanix\Tmp\lcm_staging: The directory is not empty.
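When log output like this gets pasted as one long run of text, it can help to split it back into entries and keep only the warnings and errors. The sketch below is one possible way to do that in Python, assuming each entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp followed by INFO, WARNING, or ERROR, as in the output above. The helper names and the abridged `SAMPLE` string are made up for this illustration.

```python
import re

# Zero-width match in front of "YYYY-MM-DD HH:MM:SS LEVEL ", so the split
# keeps the timestamp attached to the entry that follows it.
ENTRY_START = re.compile(
    r"(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (?:INFO|WARNING|ERROR) )"
)


def split_entries(blob):
    """Split a run-on log string into one string per entry."""
    return [part.strip() for part in ENTRY_START.split(blob) if part.strip()]


def entries_at_level(blob, levels=("WARNING", "ERROR")):
    """Keep only entries whose third field is one of the given levels."""
    selected = []
    for entry in split_entries(blob):
        fields = entry.split(" ", 3)
        if len(fields) >= 3 and fields[2] in levels:
            selected.append(entry)
    return selected


# Abridged from the lcm_ops.out output above, for illustration only.
SAMPLE = (
    "2019-12-11 12:43:34 INFO command_execute.py:86 (IP ADDRESS NODE) "
    "Waiting 6 seconds before next attempt "
    "2019-12-11 12:43:49 ERROR catalog_staging_utils.py:820 (IP ADDRESS NODE) "
    r"err: Cannot remove item C:\Program Files\Nutanix\Tmp\lcm_staging: "
    "The directory is not empty."
)

if __name__ == "__main__":
    for entry in entries_at_level(SAMPLE):
        print(entry)
```

Anything that precedes the first timestamp, such as a shell prompt, is treated as a separate chunk and dropped by the level filter.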
I ran a cluster status check and got this:
All the nodes show as "UP", but the node with the error shows:
CVM: IP ADDRESS NODE Maintenance
@Kike2020 I am investigating this. In the meantime, it would be great if you could let me know which AOS version you are running and which LCM release. You might also want to run these commands:
acli task.list include_completed=no
and
progress_monitor_cli --fetchall
Regards,
Said
@Kike2020 On top of the above, the error condition may be higher up in the LCM logs; you may want to add those logs here (if possible).
Hi @sbarab
I am running AOS 5.10.7, NCC 3.9.2.1, Foundation 4.5, and LCM version 2.2.11203.
The first command was reported as invalid; I don't know if I am doing something wrong. The second one showed me this:
2019-12-12 07:31:59,446:3209(0x7fe073913c80):ZOO_INFO@log_env@951: Client environment:zookeeper.version=zookeeper C client 3.4.3
2019-12-12 07:31:59,447:3209(0x7fe073913c80):ZOO_INFO@log_env@955: Client environment:host.name=ntnx-59tt382-a-cvm
2019-12-12 07:31:59,447:3209(0x7fe073913c80):ZOO_INFO@log_env@962: Client environment:os.name=Linux
2019-12-12 07:31:59,447:3209(0x7fe073913c80):ZOO_INFO@log_env@963: Client environment:os.arch=3.10.0-957.21.3.el7.nutanix.20190619.cvm.x86_64
2019-12-12 07:31:59,447:3209(0x7fe073913c80):ZOO_INFO@log_env@964: Client environment:os.version=#1 SMP Wed Jun 19 05:38:02 UTC 2019
2019-12-12 07:31:59,447:3209(0x7fe073913c80):ZOO_INFO@zookeeper_init@999: Initiating client connection, host=zk3:9876,zk2:9876,zk1:9876 sessionTimeout=20000 watcher=0x561d688e4580 sessionId=0 sessionPasswd=<null> context=0x561d69818040 flags=0
2019-12-12 07:31:59,451:3209(0x7fe06b788700):ZOO_INFO@zookeeper_interest@1942: Connecting to server IP ADDRESS:PORT
2019-12-12 07:31:59,451:3209(0x7fe06b788700):ZOO_INFO@zookeeper_interest@1979: Zookeeper handle state changed to ZOO_CONNECTING_STATE for socket [IP ADDRESS:PORT]
2019-12-12 07:31:59,452:3209(0x7fe06b788700):ZOO_INFO@check_events@2161: initiated connection to server [IP ADDRESS:PORT]
2019-12-12 07:31:59,454:3209(0x7fe06b788700):ZOO_INFO@check_events@2208: session establishment complete on server [IP ADDRESS:PORT], sessionId=0x36eadf2847ec73e, negotiated timeout=20000
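The lines pasted above all appear to be ZooKeeper client connection messages (the `ZOO_INFO@...` entries) and do not contain any task information. A small filter like the following Python sketch can hide them so that any remaining output stands out. The regular expression is an assumption based only on the line format visible above, and the placeholder line in `SAMPLE_LINES` is hypothetical, not real command output.

```python
import re

# Matches the ":ZOO_<LEVEL>@<function>@<line>:" marker seen in the
# ZooKeeper C client lines above.
ZK_CLIENT_LINE = re.compile(r":ZOO_[A-Z]+@\w+@\d+:")


def drop_zk_client_lines(lines):
    """Return the lines that do not look like ZooKeeper client log lines."""
    return [line for line in lines if not ZK_CLIENT_LINE.search(line)]


SAMPLE_LINES = [
    "2019-12-12 07:31:59,446:3209(0x7fe073913c80):ZOO_INFO@log_env@951: "
    "Client environment:zookeeper.version=zookeeper C client 3.4.3",
    # Hypothetical placeholder, standing in for whatever else is printed.
    "<any non-ZooKeeper output would appear here>",
]

if __name__ == "__main__":
    for line in drop_zk_client_lines(SAMPLE_LINES):
        print(line)
```

If nothing is left after filtering, the command most likely printed only the connection messages.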
@Kike2020 OK,
1- Run the command “lcm_leader” on any CVM.
2- SSH to that CVM with the “nutanix” credentials.
3- cd data/logs
4- ls -lart lcm*
5- Copy the results.
6- Check the logs “lcm_wget.log”, “lcm_op.trace”, and “lcm_ops.out” and examine their output. There should be lines in the logs with the timestamp of the day you ran LCM for the firmware upgrade.
7- Zip the logs above and upload them here.
8- I will review them and, based on my findings, I will either provide you with a response or ask you to open a case with our support line to dig deeper into this issue.
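The log-gathering part of the steps above could be sketched in Python roughly as follows. This assumes the three log names mentioned above sit in one folder and that their lines begin with a `YYYY-MM-DD` date, as the lcm_ops.out lines earlier in the thread do; the function names and the `lcm_logs.zip` archive name are invented for this example.

```python
import zipfile
from pathlib import Path

# Log names taken from the steps above.
LOG_NAMES = ("lcm_wget.log", "lcm_op.trace", "lcm_ops.out")


def lines_for_day(path, day):
    """Return the lines of a log file that start with the date string `day`."""
    with Path(path).open(errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if line.startswith(day)]


def zip_logs(log_dir, archive_path, names=LOG_NAMES):
    """Write the named logs found in log_dir into a zip; return the names added."""
    added = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in names:
            source = Path(log_dir) / name
            if source.is_file():
                archive.write(source, arcname=name)
                added.append(name)
    return added


if __name__ == "__main__":
    # Assumed location: the data/logs folder under the home directory.
    logs = Path.home() / "data" / "logs"
    if logs.is_dir():
        print(zip_logs(logs, Path.home() / "lcm_logs.zip"))
```

For example, `lines_for_day(logs / "lcm_ops.out", "2019-12-11")` would return only the entries from the day of the failed update. Logs that are missing are skipped rather than treated as an error.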
Regards,
-Said
Hello, sorry for the delay. The server is back UP; they removed it from the cluster and re-added it.