Question

Problems installing 4-node Nutanix Lenovo HX

  • 26 January 2019
  • 0 replies
  • 156 views

I started deploying four Nutanix nodes on one client. The nodes are Lenovo HX3320 model and their respective part numbers are: PE03Y2MK, PE03Y2MJ, PE03Y2ML, and PE03Y2MM.

They have been configured with the following IP information:
First node:

IMM: 172.16.24.101

Node: 172.16.26.1

CVM: 172.16.26.5



Second node:

IMM: 172.16.24.102

Node: 172.16.26.2

CVM: 172.16.26.6



Third node:

IMM: 172.16.24.103

Node: 172.16.26.3

CVM: 172.16.26.7



Fourth node:

IMM: 172.16.24.104

Node: 172.16.26.4

CVM: 172.16.26.8

The 172.16.24.x/24 network is the management network and holds the IMMs. The 172.16.26.x/24 network is the server network and received the IPs of the SFP ports of the HX3320 nodes.
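Before blaming Foundation, it can be worth sanity-checking the addressing plan itself for wrong-subnet or duplicate addresses. The sketch below is a paper check only, using the IPs from this post; it does not touch the hardware, and the network layout encoded here is simply what is described above.

```python
# Paper check of the addressing plan from the post: IMMs on the
# management /24, host (node) and CVM IPs on the server /24.
# Nothing here queries the cluster; it only validates the plan.
import ipaddress

MGMT_NET = ipaddress.ip_network("172.16.24.0/24")    # IMM / management
SERVER_NET = ipaddress.ip_network("172.16.26.0/24")  # host + CVM

nodes = {
    "node1": {"imm": "172.16.24.101", "host": "172.16.26.1", "cvm": "172.16.26.5"},
    "node2": {"imm": "172.16.24.102", "host": "172.16.26.2", "cvm": "172.16.26.6"},
    "node3": {"imm": "172.16.24.103", "host": "172.16.26.3", "cvm": "172.16.26.7"},
    "node4": {"imm": "172.16.24.104", "host": "172.16.26.4", "cvm": "172.16.26.8"},
}

def check_plan(nodes):
    """Return (node, role, ip, problem) tuples; an empty list means the plan is consistent."""
    problems = []
    seen = set()
    for name, ips in nodes.items():
        for role, ip in ips.items():
            addr = ipaddress.ip_address(ip)
            expected = MGMT_NET if role == "imm" else SERVER_NET
            if addr not in expected:
                problems.append((name, role, ip, f"not in {expected}"))
            if addr in seen:
                problems.append((name, role, ip, "duplicate IP"))
            seen.add(addr)
    return problems

print(check_plan(nodes))
```

For the plan as posted this reports no problems, which suggests the issue lies elsewhere (discovery or L2 connectivity rather than addressing).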
With the above information, we started the installation using Nutanix Foundation 4.0.5 with AOS 5.5.5. The Foundation Applet recognized only the last three nodes; in none of the attempts did it see the first node. I continued by adding the first node manually through its IMM IP. The installation proceeded without major problems, but the first node was not recognized as Lenovo and its process stalled at 2%. The other nodes continued, but without the first node keeping pace the installation stopped at roughly 55% (I cannot recall the exact percentage; it was "fifty-something" percent).

We interrupted the installation and started over. This time none of the nodes were automatically recognized by the Foundation Applet, so I added them manually through their IMM IPs. The installation ran smoothly up to about 35% on each node, where they rebooted, but the installer kept waiting for them to "reboot into Phoenix". Since there was no response (despite the nodes having rebooted successfully), the installation ended in error.

After another attempt ended the same way, I checked for updates and upgraded Foundation to 4.3.1. I also switched to AOS 5.6.2, because I saw that the AOS already present on the nodes was version 5.6. After the updates I repeated the installation, and again had to add the nodes manually because Foundation did not recognize them automatically. As before, the installation stopped with an error at 35%, for the same reasons described above.

We then decided to put all IPs on the same network, 172.16.24.x/24, and ran a new installation. Same result: it stopped with an error at 35% because the nodes, although they rebooted successfully, were not seen by the installer, which remained stuck on the message "Waiting for node to reboot into Phoenix".
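At the "Waiting for node to reboot into Phoenix" stage, the freshly booted Phoenix environment has to reach back to the Foundation workstation over the network, so a plain TCP reachability probe from the Foundation side toward the node IPs (or from a node shell back toward Foundation) can narrow down whether this is a connectivity problem. A minimal sketch, assuming nothing about the product itself; the port choices below (22 for SSH as a generic liveness check) are assumptions, not something stated in this post:

```python
# Generic TCP reachability probe; run from the Foundation workstation
# toward the node IPs while the installer is waiting on Phoenix.
import socket

def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: probe the node host IPs from the original plan in this post,
# using SSH (port 22) as a rough liveness check.
for ip in ["172.16.26.1", "172.16.26.2", "172.16.26.3", "172.16.26.4"]:
    print(ip, "reachable" if tcp_reachable(ip, 22, timeout=1.0) else "NOT reachable")
```

If the nodes reboot fine but never answer from the Foundation side, that points at an L2/VLAN or subnet mismatch between the workstation and the nodes rather than at the images being installed.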

Following guidance we received from Nutanix staff, we changed the AOS used to 5.10. We were told that Foundation 4.3 had problems with some Broadwell processors, so I checked which processor we have: it is Skylake, so that was not the issue. We started another attempt, and this time only the first node was detected automatically by Foundation. Although it progressed through the installation, the manually added nodes stalled at 35% for the same reasons as above, and the installation failed at 43%, since all nodes except the first were stuck at 35%.

After that attempt, I went to the console at the rack and saw that the first node had booted correctly and was sitting at the CentOS prompt. The others had not: they were stuck in a loop, trying to connect through the old IP range, 172.16.26.x, complaining that they could not reach the "Mothership". In other words, in the last installation attempts they were not even picking up IPs from the new range (172.16.24.x). Because they were looping, I had no access to a CentOS prompt to try to resolve things manually. I tried to manually apply a Phoenix image without a hypervisor, downloaded directly from Nutanix, through the IMM's "Remote Control", but the Phoenix image warned that Phoenix and AOS images were already present on the affected nodes and aborted the process.

And here I am, researching what may be happening and how I can resolve this situation.

Can anybody help us?
