Question

Phoenix fails to load squashfs.img from network and node's filesystem

  • 13 October 2023
  • 18 replies
  • 625 views

Badge +1

"Phoenix failed to load squashfs.img from both network and existing CVM filesystems of the node"

The error above was seen for each of the four nodes. The nodes are Nutanix NX-8155-G8.

Any idea what could be causing this? Thanks.



Badge +1

Hi Olugbenga,

I'm encountering the same issue on three NX-8035-G6 nodes while attempting to deploy with nutanix_installer_package-release-fraser-6.5.4-stable.

I hope somebody can help us.

Regards

Badge +1

I’ve reviewed the post "Phoenix Failed to load squashfs.img on HPe DL380 Gen10" on the Nutanix Community, where the solution was to disable STP (Spanning Tree Protocol) on the port. In my environment, STP is not enabled on the ports.

I’ve also tried without setting a VLAN ID, with no improvement.

Userlevel 2
Badge +3

The problem could be network related… Can you ping the node during deployment? It should answer on the CVM IP address that you configured in Foundation...

Badge +1

Hi Bart, thanks for your reply.

From the console after the failure, the node can ping itself:

But to be honest, I don’t think the network is already configured and usable at this step:
 

This command will take some time ...
Device 1: None
Device 2: ISO File [/home/nutanix/foundation/tmp/sessions/20231018-041540-2/phoenix_node_isos/foundation.node_xxx.xxx.xxx.113.iso]

10.254.255.203 X11DPT-B+ (S0/G0,0w) 04:22 , AST2500>, AST2500>
2023-10-18 11:22:15,704Z DEBUG Virtual media is mounted successfully: /home/nutanix/foundation/tmp/sessions/20231018-041540-2/phoenix_node_isos/foundation.node_xxx.xxx.xxx.113.iso
2023-10-18 11:22:15,718Z INFO Power status is on
2023-10-18 11:22:15,719Z INFO BMC should be booting into phoenix
2023-10-18 11:22:15,719Z INFO Waiting for remote node to boot into phoenix, this may take 20 minutes

Userlevel 2
Badge +3

Okay… but is it pingable from the outside?

Let me check my mail archive… I think I have seen this before...

Badge +1

From the Foundation VM and from the management server that I use to access Foundation:

  • Host IP addresses NOK
  • CVM IP addresses NOK
  • Gateway OK
  • BMC IP addresses OK

OK, thanks for checking.
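
For anyone repeating this check, the reachability test in the list above can be scripted. This is only a minimal sketch in Python: it assumes a Linux-style `ping` that accepts `-c` and `-W`, and the function names `ping`, `check_targets` and `format_report` are made up for illustration. The addresses in the usage note are placeholders from the documentation range, not the ones from this cluster.

```python
import subprocess


def ping(host, timeout_s=1):
    """Return True if a single ICMP echo to host is answered.

    Assumes a Linux-style ping binary (-c count, -W timeout in seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def check_targets(targets, probe=ping):
    """Map each label to 'OK' or 'NOK' depending on what probe(address) returns."""
    return {label: ("OK" if probe(addr) else "NOK") for label, addr in targets.items()}


def format_report(status):
    """Render the status dict as one 'label: state' line per target."""
    return "\n".join(f"{label}: {state}" for label, state in status.items())
```

Calling `print(format_report(check_targets({"Gateway": "192.0.2.1", "BMC": "192.0.2.10", "Host": "192.0.2.20", "CVM": "192.0.2.30"})))` from the Foundation VM would print one line per target in the same OK/NOK form as the list above. Bear in mind that a NOK for the host and CVM addresses is expected until the node has actually brought up its network.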

Userlevel 2
Badge +3

How is your switch config / network setup? From what I remember, the network is needed during that stage of Foundation… I am thinking in terms of LACP, VLAN tagging…

 

Badge +1

The Foundation VM is able to communicate with the CVM/host and BMC subnets from its base interface (no additional interfaces).

No SPC enabled on the switches, VLAN tagging.

The only notable thing is that these 3 nodes were previously imaged with another subnet for the CVMs/hosts, but it conflicted with another cluster, so we decided to move it.
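
Since the subnet was changed, one cheap sanity check is to confirm that every address entered in Foundation, and the gateway, really falls inside the new subnet. A minimal sketch using Python's standard `ipaddress` module follows; the helper name `outside_subnet` and the example addresses are hypothetical, not taken from this cluster.

```python
import ipaddress


def outside_subnet(subnet, addresses):
    """Return the labels whose address does not fall inside subnet.

    subnet is CIDR notation (e.g. '192.0.2.0/26'); addresses maps label -> IP string.
    """
    network = ipaddress.ip_network(subnet)
    return [
        label
        for label, addr in addresses.items()
        if ipaddress.ip_address(addr) not in network
    ]
```

For example, `outside_subnet("192.0.2.0/26", {"gateway": "192.0.2.1", "host": "192.0.2.20", "cvm": "192.0.2.70"})` flags only `cvm`, because a /26 starting at .0 ends at .63. This catches typing mistakes and wrong masks only; it says nothing about VLAN tagging on the switch.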

Userlevel 2
Badge +3

Are you starting a bare-metal Foundation?

Can you ping the gateway from the node when the process ends?

Badge +1

Yes, bare-metal Foundation.

No, I’m not able to ping the gateway from the node after the failure.

Evidently that is where my issue lies.

Userlevel 2
Badge +3

Yes, I would check the physical network first…

Badge +1

I just reviewed the network configuration with my network engineer and everything seems to be OK.

Maybe it is an issue with Foundation itself; I will try to redeploy it.

Thanks for your help.

Userlevel 2
Badge +3

Good luck… If the VLAN is tagged, don’t forget to enter the VLAN ID in the configuration...

I have seen exactly the same issue, and it had to do with VLAN tagging that wasn't correct.

Just wanted to share that I had the exact same error message when trying to Foundation a 3060 G8 node. I ended up spinning up the Foundation VM (a CentOS Linux box with Foundation) in the cluster that the node I am imaging will be joining, and it worked like a charm.


I too could not ping the gateway from the IPMI after it failed. I was using an unmanaged gigabit switch with my MacBook and so STP (Spanning Tree Protocol) was not a factor for me. I could ping the host IP however. 

When Foundation was running, I had the KVM console window open and saw that connectivity failed on all 4 ports (eth0, eth1, eth2 and eth3). Then I would get the Phoenix failed to load squashfs.img message.


I fought with the standalone Foundation on my MacBook numerous times and finally got the node imaged merely by using the Foundation VM instead of the standalone version on my Mac.


Safe to say we...”squashed”… this issue (sorry I had to...long day...) 😅

 

@BartDonders  @lleuba86 

Badge +1

Hi,

I entrusted the cluster to one of my colleagues who was obviously able to image the cluster without any problems.
I'm waiting for him to come back and tell me what I was doing wrong.


Safe to say we...”squashed”… this issue (sorry I had to...long day...) 😅

Haha ;)

Userlevel 6
Badge +8

Foundation mounts an ISO file on the nodes via IPMI. The nodes boot from this ISO and use their normal interfaces (IP address, VLAN and LACP settings are provided via the Foundation wizard) to talk back to Foundation. If this "talk back" isn't working, you get these errors.

And: don't forget to disable the firewall on the machine where Foundation is running, or open the inbound connection.
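
To see whether a firewall is blocking the inbound connection, you can try a plain TCP connect to the Foundation machine from another host on the same network as the nodes. This is only a sketch: `can_connect` is a made-up helper, and port 8000 is an assumption (it is where the Foundation web UI usually listens; verify it for your version).

```python
import socket

FOUNDATION_PORT = 8000  # assumption: the usual Foundation web UI port; verify for your version


def can_connect(host, port=FOUNDATION_PORT, timeout_s=3.0):
    """Return True if a TCP connection to host:port opens within timeout_s seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

If `can_connect("<foundation-ip>")` returns False from a machine in the CVM/host VLAN but True on the Foundation machine itself, a firewall or a VLAN problem between the two is a likely suspect. A True result only shows that the port is reachable from where you ran it, not that Phoenix on the node has a working network.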

Userlevel 2
Badge +3

In my experience it's almost always related to VLAN/bond settings. Either the AHV host bond wasn't created successfully, so the node can't reach the Foundation VM after the initial boot; or there wasn't enough connectivity to the Foundation VM to load the image because of a VLAN mismatch; or the Foundation VM has bugs (I have experienced this with the Windows version).