Reimage Cluster with ESXi Failing | Nutanix Community

I am reimaging my lab cluster after we had some serious problems with a conversion to AHV and a rollback to ESXi. Two of my three nodes reimaged fine once I put a /firstboot directory into the existing ESXi hosts. My third node is losing track of where the firstboot directory should be.

 

From the foundation log:

20200221 11:40:17 ERROR Command 'scp -i ~/.ssh/id_rsa /tmp/tmp.gkI7z6j7XG/nutanix_provision_network_utils-1.0-py2.7.egg root@192.168.5.1:./vmfs/volumes/4d501d79-cff63899-7951-75210fae7516/Nutanix
./vmfs/volumes/5e4c3359-4b398980-cc34-ac1f6bb9dd8a/Nutanix/firstboot/nutanix_provision_network_utils-1.0-py2.7.egg' returned error code 127
stdout:

stderr:
FIPS mode initialized
bash: line 1: ./vmfs/volumes/5e4c3359-4b398980-cc34-ac1f6bb9dd8a/Nutanix/firstboot/nutanix_provision_network_utils-1.0-py2.7.egg: No such file or directory

20200221 11:40:17 ERROR Failed while copying file /tmp/tmp.gkI7z6j7XG/nutanix_provision_network_utils-1.0-py2.7.egg to host with error Command 'scp -i ~/.ssh/id_rsa /tmp/tmp.gkI7z6j7XG/nutanix_provision_network_utils-1.0-py2.7.egg root@192.168.5.1:./vmfs/volumes/4d501d79-cff63899-7951-75210fae7516/Nutanix
./vmfs/volumes/5e4c3359-4b398980-cc34-ac1f6bb9dd8a/Nutanix/firstboot/nutanix_provision_network_utils-1.0-py2.7.egg' returned error code 127
stdout:

stderr:
FIPS mode initialized
bash: line 1: ./vmfs/volumes/5e4c3359-4b398980-cc34-ac1f6bb9dd8a/Nutanix/firstboot/nutanix_provision_network_utils-1.0-py2.7.egg: No such file or directory

20200221 11:40:18 ERROR Exception in running <ImagingStepProvisionNetwork(<NodeConfig(10.252.200.33) @6bf0>) @68d0>
Traceback (most recent call last):
  File "foundation\imaging_step.py", line 161, in _run
  File "foundation\imaging_step_provision_network.py", line 209, in run
  File "foundation\imaging_step_provision_network.py", line 119, in provision_network
StandardError: ('Failed to execute threaded_provision_network on %s, error (%s)', '10.252.200.33', "Command 'scp -i ~/.ssh/id_rsa /tmp/tmp.gkI7z6j7XG/nutanix_provision_network_utils-1.0-py2.7.egg root@192.168.5.1:./vmfs/volumes/4d501d79-cff63899-7951-75210fae7516/Nutanix\n./vmfs/volumes/5e4c3359-4b398980-cc34-ac1f6bb9dd8a/Nutanix/firstboot/nutanix_provision_network_utils-1.0-py2.7.egg' returned error code 127\nstdout:\n\nstderr:\nFIPS mode initialized\r\nbash: line 1: ./vmfs/volumes/5e4c3359-4b398980-cc34-ac1f6bb9dd8a/Nutanix/firstboot/nutanix_provision_network_utils-1.0-py2.7.egg: No such file or directory\n")

Basically, it is going down the /bootbank/Nutanix/firstboot path instead of /firstboot. I see the .egg file in the directory it SCPs to, but for SOME REASON it is trying to execute it from a completely different directory. I would love to just wipe this node clean and have AOS and ESXi install from scratch. Are there any tricks I can use to make that happen?
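For reference, this is roughly how I am comparing the two candidate locations from the CVM (the 192.168.5.1 internal host address is taken from the log above; the exact pair of paths to check is my assumption):

# From the CVM, see which firstboot path actually exists on the ESXi host
ssh root@192.168.5.1 'ls -ld /firstboot /bootbank/Nutanix/firstboot'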

Hi there!

Bear with me as I try to help you, please.

Firstly, if I understand correctly, you have created a /firstboot directory within the ESXi file system. This would not matter when converting to AHV, as AHV is installed fresh, independent of the pre-existing hypervisor.

Secondly, I would like to point out that Community Edition is not exactly equal to the full enterprise version. I am sure you understand that anyway.

An unsuccessful cluster conversion should be accompanied by a message in Prism inviting you to roll back the conversion. You could execute convert_cluster_status from a CVM in an attempt to find out the actual state of the node, as well as any reasons for the imaging failure.
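As a minimal sketch, that check is just run from any CVM shell (the exact output varies by AOS version):

nutanix@cvm$ convert_cluster_status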

As for wiping the node clean: since it is a lab environment and a three-node cluster, meaning evicting a node is not an option, you could re-image the entire cluster (see the sketch below).
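If you do take the full re-image route, note that the existing cluster would need to be destroyed first; from any CVM that would be something along these lines (destructive, so lab use only):

# Stops services and wipes the cluster configuration on all nodes
nutanix@cvm$ cluster destroy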

In an attempt to provide some guidance, please see the Prism Web Console Guide: In-place hypervisor conversion, which contains a Requirements and Limitations section, among others.

I apologise for not being able to come up with a more helpful response. Please let us know how you go.

 


This is not a known issue with an already-recognized root cause. You should not have to make any manual changes during conversion.

I see a case is opened for this. I agree with that course of action. 

Much like the old method for replacing a failed hypervisor boot drive, it is possible to run a clean ESXi install, then download the Phoenix ISO, boot from it, and run the "Configure Hypervisor" step. This does not complete any of the configuration steps for you, so the remaining procedures would need to be completed manually.

If you have important VMs not yet registered on a successfully converted ESXi host, it may be better to troubleshoot the conversion process. 


Firstly, I am running the enterprise 5.10.9 version for the lab, not CE. Sorry for not being clear about that. Also, I see that when I first wrote the post, I messed up my original description. I was able to "successfully" convert the cluster from AHV back to ESXi with the help of Support, since Genesis was bombing out on network configurations it thought had changed but had not, to the best of my knowledge.

After getting it back to ESXi, LCM and other software upgrades stopped working due to network configuration issues. That was when I decided to just save our test VMs and re-Foundation the cluster. Two nodes made it through fine, needing only the /firstboot directory added so Foundation could do its thing (which may be an ENG that was fixed in 4.5.2). The third node is the one with the weird pathing issue.

Jeremy, I see you found the case I opened yesterday.  I am doing the boot-from-phoenix now and installing the CVM.  Once that is complete, I’ll manually run the cluster create.
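For anyone following along, the manual cluster create I have in mind looks roughly like this (the CVM IPs here are placeholders for my three nodes):

# Create the cluster from any CVM, listing all CVM IPs
nutanix@cvm$ cluster -s cvm_ip_1,cvm_ip_2,cvm_ip_3 create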

Thanks all for the feedback!