Solved

Imaging Failure with layer 2/3 10G switch for NOS 4.7 and above

  • 2 November 2016
  • 11 replies
  • 5192 views

Badge +6
Hi,

Since Nutanix released NOS 4.7, my team and I have been struggling to image Nutanix nodes with that NOS version whenever we use our Brocade ICX6610 and ICX7250 switches (managed, layer 2/3, 10G). Every attempt to image the nodes has failed with the error message below:

"
ERROR Unexpected exception in initing ipmi
Traceback (most recent call last):
  File "/home/hudsonb/workspace2/workspace/foundation_installer-3.1.1/builds/build-installer-3.1.1-release/foundation-python-tree/bdist.linux-x86_64/egg/foundation/imaging_step_init_ipmi.py", line 129, in run
StandardError: Failed to connect to Phoenix at 192.168.0.223
20161102 04:05:17 INFO Exiting SMCIPMITool
20161102 04:05:18 ERROR Exception in running
Traceback (most recent call last):
  File "/home/hudsonb/workspace2/workspace/foundation_installer-3.1.1/builds/build-installer-3.1.1-release/foundation-python-tree/bdist.linux-x86_64/egg/foundation/imaging_step.py", line 120, in _run
  File "/home/hudsonb/workspace2/workspace/foundation_installer-3.1.1/builds/build-installer-3.1.1-release/foundation-python-tree/bdist.linux-x86_64/egg/foundation/imaging_step_init_ipmi.py", line 129, in run
StandardError: Failed to connect to Phoenix at 192.168.0.223

"

This error never occurs when we image the nodes with a 1G (unmanaged) switch. So we can still image any node that way, except for NX-3175 series nodes, which have no shared 1G IPMI port. Kindly assist.

P.S.: Foundation versions used: 3.1.1, 3.3, 3.4

Best answer by Jon 3 November 2016, 12:48

Log archives are good, but they need to be submitted through a support ticket. Given that every environment is different, and that Foundation simplifies what is arguably a very complex product, it's really tough to nail this down over a community forum.

Upload those to a case, priority 4 if you'd like, and we'll drill into it with you.

11 replies

Userlevel 7
Badge +30
Foundation support is best provided through our support organization. Please submit a support ticket on portal.nutanix.com for NX/SX gear, and the respective OEM for XC/HX gear, and we can drill into these imaging failures via WebEx.

Thanks,
Jon
Userlevel 7
Badge +30
Also, we'll want the Foundation logs, which can be downloaded from this URL:

http://:8000/foundation/log_archive_tar
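As a rough illustration, the log archive can be fetched with a plain HTTP GET. This is a minimal sketch, assuming you substitute the IP address of the machine running Foundation (the host is missing from the URL above) and assuming the endpoint returns a tar file; the function names and the output filename are hypothetical.

```python
import shutil
import urllib.request


def log_archive_url(foundation_host: str, port: int = 8000) -> str:
    """Build the log archive URL; path and port are taken from the post above."""
    return f"http://{foundation_host}:{port}/foundation/log_archive_tar"


def download_log_archive(foundation_host: str,
                         dest: str = "foundation_logs.tar",  # hypothetical filename
                         timeout: float = 60.0) -> str:
    """Fetch the archive and stream it to disk without loading it into memory."""
    url = log_archive_url(foundation_host)
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest
```

For example, `download_log_archive("<foundation-ip>")` writes `foundation_logs.tar` to the current directory, ready to attach to a support case.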
Userlevel 4
Badge +17
If you plan to upgrade to 4.7, simply perform a one-click NOS/AOS upgrade via Prism; there is no need to reimage.

If you're using Foundation, please check the VLAN config on your ICX6610 and ICX7250. Make sure only the default VLAN is in use, or, if possible, back up your switch config and reset the switch to factory defaults.

I'm using an ICX 6650 BASE-T with no issues during the Foundation process.
Badge +6
Hi Jon,

Our Nutanix boxes are NX gear. There is currently no urgent need for a WebEx session, as I posted this based on my experience since the launch of Foundation 3.4 and NOS 4.7.
Below are the log archives from 12 October and 2 November 2016:

12 Oct (Foundation 3.4, NOS 4.7.1):
log_archive 12Oct2016

2 Nov (Foundation 3.4, NOS 4.6.1.1):
log_archive 2Nov2016

I used the default VLAN for both imaging attempts on our Brocade switches (ICX6610 and ICX7250, default configuration). Note that after each imaging failure, I switched to a normal 1G switch and the imaging completed successfully without any problem.
Badge +6
Hi Bezeddin,

As these imaging runs are for POCs and our lab tests, we frequently need to reimage our NX boxes.

The VLAN config does not seem to be the problem, since we normally use the default VLAN during imaging.
Userlevel 4
Badge +17
It looks like the issue is connectivity between the nodes and your Foundation client when using the 10G switch.

Please refer to this link for switch requirements.
Userlevel 7
Badge +30
Log archives are good, but they need to be submitted through a support ticket. Given that every environment is different, and that Foundation simplifies what is arguably a very complex product, it's really tough to nail this down over a community forum.

Upload those to a case, priority 4 if you'd like, and we'll drill into it with you.
Badge +6
Hi Jon,

Thanks for the advice, I will log a case if necessary. At the moment my team and I are quite busy with implementations and POCs, so that can wait.
Userlevel 7
Badge +30
Sure, no problem, just know they are there to help when you run into issues like that. We can't fix issues we don't know about :)

Cheers,
Jon
Userlevel 1
Badge +6
bezeddin wrote: If you plan to upgrade to 4.7, simply perform a one-click NOS/AOS upgrade via Prism; there is no need to reimage.

If you're using Foundation, please check the VLAN config on your ICX6610 and ICX7250. Make sure only the default VLAN is in use, or, if possible, back up your switch config and reset the switch to factory defaults.

I'm using an ICX 6650 BASE-T with no issues during the Foundation process.



Hi,

In general, you should not use an enterprise switch in its default configuration for Foundation.

Many enterprise switches have STP enabled by default and therefore take some time to bring access ports, i.e. the ports the Nutanix nodes are connected to, into the forwarding state. During the Foundation process the nodes are rebooted several times, and each reboot flaps the link, so the port again has to wait for STP BPDUs (in practice a timeout, because a node on an access port never sends any) before it lets any traffic through. Foundation boots the nodes from remote media via IPMI, and that boot times out before the switch starts forwarding on the port, so Foundation fails.
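The timing problem can be sketched as a toy model. It assumes classic IEEE 802.1D behaviour, where a non-edge port spends one forward delay (15 s by default) in listening and another in learning before forwarding, while an edge port forwards immediately; an RSTP port that gets no agreement from an end host ends up waiting on the same timers. The remote-media timeout is not documented in this thread, so the 20 s used below is a purely hypothetical value, as are the class and function names.

```python
from dataclasses import dataclass

# IEEE 802.1D default forward delay, in seconds.
DEFAULT_FORWARD_DELAY_S = 15.0


@dataclass
class SwitchPort:
    edge: bool = False  # True = configured as STP edge port ("portfast")
    forward_delay_s: float = DEFAULT_FORWARD_DELAY_S

    def seconds_until_forwarding(self) -> float:
        """Delay from link-up until the port forwards frames.

        An edge port forwards immediately; a non-edge port spends one
        forward delay in listening and one in learning.
        """
        return 0.0 if self.edge else 2 * self.forward_delay_s


def total_stp_wait(port: SwitchPort, reboots: int) -> float:
    """Each node reboot flaps the link, so the STP wait is paid again every time."""
    return reboots * port.seconds_until_forwarding()


def remote_media_boot_fails(port: SwitchPort, media_timeout_s: float) -> bool:
    """True if the port is still blocking when the (hypothetical) media timeout expires."""
    return port.seconds_until_forwarding() >= media_timeout_s
```

With the hypothetical 20 s timeout, `remote_media_boot_fails(SwitchPort(), 20.0)` is `True` (30 s of blocking), while the same call with `SwitchPort(edge=True)` is `False`.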

You should configure the switch ports that are used to connect Nutanix nodes for Foundation as STP edge ports (known as "portfast" in e.g. Cisco IOS and NX-OS) to avoid the problems described above.

Switches from Cisco (both IOS and NX-OS) and Arista do not work for Foundation in their default configuration. They need edge port (portfast) configuration on the ports facing the nodes.
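As an illustration of the edge port configuration, the sketch below renders interface stanzas for a list of node-facing ports. It assumes `spanning-tree portfast` on Cisco IOS and Arista EOS and `spanning-tree port type edge` on Cisco NX-OS; the interface names are hypothetical, the generator itself is not a Nutanix or vendor tool, and Brocade ICX syntax is deliberately not covered. Apply edge configuration only to ports that connect to hosts, never to ports that connect to other switches.

```python
from typing import Iterable

# Interface-level command that marks an access port as an STP edge port.
EDGE_COMMANDS = {
    "ios": "spanning-tree portfast",
    "eos": "spanning-tree portfast",
    "nxos": "spanning-tree port type edge",
}


def edge_port_config(platform: str, interfaces: Iterable[str]) -> str:
    """Render access/edge-port stanzas for node-facing ports."""
    try:
        edge_cmd = EDGE_COMMANDS[platform]
    except KeyError:
        raise ValueError(f"unsupported platform: {platform!r}") from None
    lines = []
    for name in interfaces:
        lines += [
            f"interface {name}",
            " switchport",
            " switchport mode access",
            f" {edge_cmd}",  # skip listening/learning after every node reboot
            " no shutdown",
        ]
    return "\n".join(lines)
```

For example, `edge_port_config("nxos", ["Ethernet1/1", "Ethernet1/2"])` produces two stanzas to paste in configuration mode after backing up the running config.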

You may have more user-friendly enterprise switches, e.g. from Enterasys (now Extreme Networks), that use MSTP and auto-edge by default. These remember the automatically detected edge-port status across link flaps and therefore just work for Foundation in their default configuration.

The default configuration of Extreme Networks EXOS-based switches also works for Foundation.

Dell OS 9 (formerly known as FTOS or (Dell) Force10 switches) cannot be used for Foundation in the default configuration, since all ports are disabled (shutdown) and configured as routed ports (no switchport) by default. Of course they can be configured for Foundation use.

To be sure that Foundation will work, always manually configure STP edge ports where appropriate.

Nutanix does not document the required edge port (aka portfast) configuration, even though they should have the expertise to do so. This results in the strange recommendation to use dumb switches (which nowadays no longer implement STP) or managed switches in their default configuration (which works for some brands, but not others), and leads to painful experiences during cluster setup when Foundation fails for no obvious reason.

The fact that the IPMI boot from remote media will silently fall back to local boot further complicates troubleshooting this issue.

Thanks,
Erik
Userlevel 4
Badge +17
Another 10G switch that can be used for reimaging with its default config is the Dell N-Series, with the BASE-T module of course.
