Question

AHV and CVM fail after restarting the AHV nodes

  • 21 August 2021
  • 1 reply
  • 70 views

Hello People,
 We are facing a strange issue when deploying a 3-node Nutanix cluster.
 The deployment completes without any issues. But after restarting the nodes, we are not able to reach the AHV node IPs. After logging in through iLO and manually restarting the network services, the AHV hosts and CVMs become reachable and the cluster is normal again. This happens only when I restart the nodes or shut down AHV.
Before restarting, the network interfaces are up.
  From the switch side we have configured access ports tagged with VLAN 1010.
  Here eth4 is connected to switch-1 port 1 and eth6 is connected to switch-2 port 2.
  Note : switch-1 and switch-2 are stacked
  Model : Cisco SG350XG
  AOS : 5.20.1.1 LTS
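For reference, the manual workaround described above, run from the iLO console of the affected AHV host, looks roughly like this (a sketch, not an official fix; `br0-up` is the bond name from the config below, and `101.101.0.1` is our gateway):

```shell
# Run from the iLO/IPMI console of the affected AHV host.
systemctl restart network       # restart AHV host networking
ovs-appctl bond/show br0-up     # confirm the bond has an enabled active slave
ping -c 3 101.101.0.1           # verify the gateway is reachable again
```

After this, the CVM on that host becomes reachable as well without any changes on the switch side.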

Here is my config:
 
nutanix@NTNX-CVM:101.101.0.13:~$ allssh manage_ovs show_uplinks
================== 101.101.0.14 =================
Bridge: br0
  Bond: br0-up
    bond_mode: active-backup
    interfaces: eth6 eth4
    lacp: off
    lacp-fallback: false
    lacp_speed: slow
================== 101.101.0.15 =================
Bridge: br0
  Bond: br0-up
    bond_mode: active-backup
    interfaces: eth6 eth4
    lacp: off
    lacp-fallback: false
    lacp_speed: slow
================== 101.101.0.13 =================
Bridge: br0
  Bond: br0-up
    bond_mode: active-backup
    interfaces: eth6 eth4
    lacp: off
    lacp-fallback: false
    lacp_speed: slow
nutanix@NTNX-CVM:101.101.0.13:~$ allssh manage_ovs show_interfaces
================== 101.101.0.14 =================
name  mode  link speed
eth0  1000 False  None
eth1  1000 False  None
eth2  1000 False  None
eth3  1000 False  None
eth4 10000  True 10000
eth5 10000  True 10000
eth6 10000  True 10000
eth7 10000  True 10000
================== 101.101.0.15 =================
name  mode  link speed
eth0  1000 False  None
eth1  1000 False  None
eth2  1000 False  None
eth3  1000 False  None
eth4 10000  True 10000
eth5 10000  True 10000
eth6 10000  True 10000
eth7 10000  True 10000
================== 101.101.0.13 =================
name  mode  link speed
eth0  1000 False  None
eth1  1000 False  None
eth2  1000 False  None
eth3  1000 False  None
eth4 10000  True 10000
eth5 10000  True 10000
eth6 10000  True 10000
eth7 10000  True 10000

[root@NUTANIX-AHV1 ~]# cat /etc/sysconfig/network-scripts/ifcfg-br0
# Auto generated by phoenix
DEVICE=br0
NM_CONTROLLED=no
ONBOOT=yes
TYPE=OVSIntPort
DEVICETYPE=ovs
BOOTPROTO=none
IPADDR=101.101.0.10
NETMASK=255.255.255.0
GATEWAY=101.101.0.1
OVSREQUIRES="eth6 eth4"

 

[root@NUTANIX-AHV1 ~]# ovs-appctl bond/show br0-up
---- br0-up ----
bond_mode: active-backup
bond may use recirculation: no, Recirc-ID : -1
bond-hash-basis: 0
lb_output action: disabled, bond-id: -1
updelay: 0 ms
downdelay: 0 ms
lacp_status: off
lacp_fallback_ab: false
active-backup primary: <none>
active slave mac: cs:ds:sd:e3:ds(eth4)

slave eth4: enabled
  active slave
  may_enable: true

slave eth6: enabled
  may_enable: true

 



Hello Senthil_P,

Thanks for reaching out to us. In the absence of cluster access or any logs, we would suggest the following configuration on the NICs:

    lacp: off
    lacp-fallback: true
    lacp_speed: off

Try changing the above-mentioned settings on all hosts and check whether it fixes the reported issue. If not, I would recommend opening a case with Nutanix Support for a more detailed investigation of the reported issue.
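If you want to push the bond settings from a CVM rather than editing each host, something along these lines may work. This is a hedged sketch: the exact `manage_ovs` flags can vary between AOS releases, so please confirm them with `manage_ovs --help` on your version before running anything.

```shell
# Sketch only -- verify flag names against your AOS release first.
# Keeps active-backup bonding but enables LACP fallback on br0-up.
allssh "manage_ovs --bridge_name br0 --bond_name br0-up \
        --bond_mode active-backup --lacp_fallback true update_uplinks"
```

Apply it one host at a time if the cluster is carrying workloads, and re-check with `allssh manage_ovs show_uplinks` afterwards.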

Happy Troubleshooting !!
