Hi Team
I am trying to set up Prism Central in Nutanix CE. I have configured the required details in the Prism console for deploying Prism Central, but the deployment fails repeatedly with an error saying it failed in the tarball extraction phase. I have tried multiple times, and every attempt shows the same error in the UI.
I went through /home/nutanix/data/logs/genesis.out to get more details and found something interesting. When the Prism Central deployment starts, it first downloads the Prism Central tarball and extracts it. Once the tarball is extracted, it starts converting the qcow2 files to raw images. The qemu-img convert command for each qcow2 → image conversion is run with a timeout of 1800 seconds. Please have a look at the log lines below.
2024-07-01 08:55:17,702Z INFO 25030000 uvm.py:1606 Running cmd ['/usr/bin/timeout', '1800', '/usr/local/nutanix/bin/qemu-img', 'convert', '-p', '-f', 'qcow2', '-O', 'raw', u'nfs://127.0.0.1/default-container-40804253832490/pc.2022.6.0.10-pc-boot.qcow2', u'nfs://127.0.0.1/default-container-40804253832490/pc.2022.6.0.10-pc-boot.img']
2024-07-01 08:55:17,791Z INFO 25030000 uvm.py:1606 Running cmd ['/usr/bin/timeout', '1800', '/usr/local/nutanix/bin/qemu-img', 'convert', '-p', '-f', 'qcow2', '-O', 'raw', u'nfs://127.0.0.1/default-container-40804253832490/pc.2022.6.0.10-pc-home.qcow2', u'nfs://127.0.0.1/default-container-40804253832490/pc.2022.6.0.10-pc-home.img']
Failure log lines:
2024-07-01 09:25:06,157Z INFO 25030000 deployment.py:963 Busy waiting for /default-container-40804253832490/pc.2022.6.0.10-pc-home.qcow2->img (63.26/100%)
2024-07-01 09:25:17,826Z INFO 25030000 deployment.py:963 Busy waiting for /default-container-40804253832490/pc.2022.6.0.10-pc-home.qcow2->img
2024-07-01 09:25:17,826Z INFO 25030000 deployment.py:965 Done Waiting for /default-container-40804253832490/pc.2022.6.0.10-pc-home.qcow2.qcow2 extract
2024-07-01 09:25:17,826Z INFO 25030000 deployment.py:989 Finished waiting for all procs
2024-07-01 09:25:17,827Z INFO 25030000 client.py:232 Creating stub for Ergon on 127.0.0.1: 2090
2024-07-01 09:25:18,071Z ERROR 25030000 deployment.py:1237 Unable to extract image
2024-07-01 09:25:18,071Z ERROR 25030000 deployment.py:1238 Error in deployment:Traceback (most recent call last):
File "build/bdist.linux-x86_64/egg/cluster/deployment/deployment.py", line 1108, in deploy_worke
Exception: Unable to extract image
If we compare the timestamp at which the conversion started with the timestamp of the error, the gap is exactly 30 minutes, which equals the 1800-second timeout. I am not sure why Nutanix configures this timeout, but it is putting our deployment into a failed state.
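To double-check the 30-minute gap, here is a small Python sketch that subtracts the two timestamps copied from the log lines above. The helper name `parse_ts` is my own and only handles the timestamp format seen in genesis.out.

```python
from datetime import datetime

# Timestamp layout used in genesis.out, e.g. "2024-07-01 08:55:17,791Z"
FMT = "%Y-%m-%d %H:%M:%S,%f"


def parse_ts(stamp: str) -> datetime:
    """Parse a genesis.out timestamp (hypothetical helper, not a Nutanix API)."""
    return datetime.strptime(stamp.rstrip("Z"), FMT)


# Start of the pc-home qemu-img convert command
start = parse_ts("2024-07-01 08:55:17,791Z")
# "Done Waiting" line logged just before the "Unable to extract image" error
end = parse_ts("2024-07-01 09:25:17,826Z")

elapsed = (end - start).total_seconds()
print(f"elapsed: {elapsed:.3f} s")  # roughly 1800 s, matching the timeout value
```

The difference comes out at about 1800 seconds, which is why I suspect the `timeout 1800` wrapper is ending the conversion rather than qemu-img failing on its own.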
Can you please let us know how to get rid of this timeout, either by raising it to a higher value or by removing the timeout check altogether? All the required screenshots are attached.
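As a rough idea of how much time the conversion might need, the sketch below extrapolates from the last progress line in the log (63.26% at 09:25:06). It assumes the conversion progresses at a constant rate, which may not hold for qemu-img (sparse regions can convert faster or slower), so treat the result as an estimate only.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S,%f"

# Values copied from the log lines in this post
convert_started = datetime.strptime("2024-07-01 08:55:17,791", FMT)
last_progress_at = datetime.strptime("2024-07-01 09:25:06,157", FMT)
progress_pct = 63.26   # last reported progress for pc-home.qcow2
timeout_s = 1800       # value passed to /usr/bin/timeout

# Seconds elapsed when the last progress figure was logged
elapsed_s = (last_progress_at - convert_started).total_seconds()

# Assumption: linear progress, so total time = elapsed / fraction done
estimated_total_s = elapsed_s / (progress_pct / 100)
shortfall_s = estimated_total_s - timeout_s

print(f"elapsed at last progress line: {elapsed_s:.0f} s")
print(f"estimated total conversion time: {estimated_total_s:.0f} s "
      f"({estimated_total_s / 60:.1f} min)")
print(f"estimated extra time needed beyond the timeout: {shortfall_s:.0f} s")
```

Under that assumption the conversion would need somewhere around 47 minutes on this hardware, so a timeout of about an hour would probably be enough if it can be changed.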
Note:
Hypervisor Disk - 200 GB
CVM Disk - 700 GB
Data Disk - 400 GB
Container files, to verify that the tar file download and extraction were successful.