CVM unable to get IP address due to Duplicate IP/Proxy ARP

  • 24 July 2020

Have you ever run into a CVM that cannot acquire its IP address because of a duplicate IP or proxy ARP? We have seen it happen.

Suppose a CVM, 10.130.121.105 for our discussion, is unavailable both on the cluster and on the network.

The steps below walk through the diagnosis and then the resolution.

To begin with, you run into errors while configuring the IP address for the CVM and restarting the network, along the following lines:

nutanix@NTNX-<SerialNumber>-A-CVM::/etc/sysconfig/network-scripts$ sudo service network restart
Restarting network (via systemctl): Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.
[FAILED]

nutanix@NTNX-<SerialNumber>-A-CVM::/etc/sysconfig/network-scripts$ sudo /etc/init.d/network stop
Stopping network (via systemctl): [ OK ]

nutanix@NTNX-<SerialNumber>-A-CVM::/etc/sysconfig/network-scripts$ sudo /etc/init.d/network start
Starting network (via systemctl): Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.

You find that the contents of the network configuration file appear to be perfectly fine:

nutanix@<SerialNumber>-A-CVM::/etc/sysconfig/network-scripts$ cat ifcfg-eth0
# Auto generated by CentosNetworkInterfacesConfig on xxx xxx xxx xxx 2020

ONBOOT="yes"
MTU="1500"
NM_CONTROLLED="no"
NETMASK="255.255.255.128"
IPADDR="10.130.121.105"
DEVICE="eth0"
TYPE="Ethernet"
GATEWAY="10.130.121.1"
BOOTPROTO="none"

nutanix@<SerialNumber>-A-CVM::/etc/sysconfig/network-scripts$
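
One quick way to back up the "looks fine" impression is to confirm that IPADDR and GATEWAY fall in the same subnet under NETMASK. The following is only a sketch: ip_to_int and same_subnet are helper names made up for this illustration, not Nutanix tools, and the values are the ones from the file above.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The subshell body "( ... )" keeps the IFS change local to this function.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed (exit 0) when both addresses share a network under the given mask.
same_subnet() {
    addr=$(ip_to_int "$1")
    gw=$(ip_to_int "$2")
    mask=$(ip_to_int "$3")
    [ $(( addr & mask )) -eq $(( gw & mask )) ]
}

# Values taken from ifcfg-eth0: IPADDR, GATEWAY, NETMASK.
if same_subnet 10.130.121.105 10.130.121.1 255.255.255.128; then
    echo "gateway is on-link"
else
    echo "gateway is outside the subnet"
fi
```

With a /25 mask both addresses land in 10.130.121.0, so the addressing itself is not the reason the interface refuses to come up.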

Bringing the interface up by hand reveals a duplicate address:

nutanix@<SerialNumber>-A-CVM::/etc/sysconfig/network-scripts$ sudo ifup eth0
ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Error, some other host (AC:1F:6C:45:99:5E) already uses address 10.130.121.105.
nutanix@<serialNumber>::/etc/sysconfig/network-scripts$
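
As far as we understand CentOS-style initscripts, this message comes from a duplicate-address probe (arping in -D mode) that ifup-eth runs before assigning the address. The MAC in the message is the one to chase, so it helps to pull it out in a form you can compare against other tools. A minimal sketch follows; conflict_mac is a made-up helper name, and it lower-cases the address because esxcli prints MACs in lower case.

```shell
# Pull the conflicting MAC address out of an ifup duplicate-address error.
# Reads log text on stdin; prints one lower-cased MAC per matching line.
conflict_mac() {
    sed -n 's/.*some other host (\([0-9A-Fa-f:]\{17\}\)) already uses address.*/\1/p' |
        tr 'A-F' 'a-f'
}

# The error line exactly as shown above.
msg='ERROR : [/etc/sysconfig/network-scripts/ifup-eth] Error, some other host (AC:1F:6C:45:99:5E) already uses address 10.130.121.105.'
printf '%s\n' "$msg" | conflict_mac
# prints: ac:1f:6c:45:99:5e
```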

Upon further investigation, you find that, for some obscure reason, vmk0 has taken on the MAC address of vmnic2 instead of that of vmnic0:

[root@USCGNAMC01:~] esxcli network nic list

<<< output omitted >>>

[root@USCGNAMC01:~] esxcli network ip neighbor list
Neighbor        Mac Address        Vmknic    Expiry  State  Type
--------------  -----------------  ------  --------  -----  -------
10.130.121.1    6c:20:57:2f:d8:c4  vmk0     475 sec         Unknown
10.130.121.102  00:e0:ee:7d:7b:48  vmk0     717 sec         Unknown
10.130.121.105  ac:1f:6c:45:99:5e  vmk0    1161 sec         Unknown
10.130.121.106  00:0c:59:07:9e:54  vmk0    1151 sec         Unknown
192.168.5.254   00:50:56:ac:7d:04  vmk1     408 sec         Unknown
[root@USCGNAMC01:~]
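
If you save the neighbor list to a file, a one-line lookup shows which MAC the host has cached for the CVM's address. This is a sketch only: neighbor_mac is a made-up helper name, and it assumes the first two whitespace-separated columns are the IP and the MAC, as in the output above.

```shell
# Print the MAC that a saved "esxcli network ip neighbor list" output
# records for one IP address.
neighbor_mac() {
    # $1 = IP to look up; the table text arrives on stdin.
    awk -v ip="$1" '$1 == ip { print tolower($2) }'
}

neighbor_mac 10.130.121.105 <<'EOF'
Neighbor Mac Address Vmknic Expiry State Type
10.130.121.1 6c:20:57:2f:d8:c4 vmk0 475 sec Unknown
10.130.121.105 ac:1f:6c:45:99:5e vmk0 1161 sec Unknown
10.130.121.106 00:0c:59:07:9e:54 vmk0 1151 sec Unknown
EOF
# prints: ac:1f:6c:45:99:5e
```

The result matches the MAC reported by ifup, which is what ties the conflict to the host side rather than to another CVM.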

An ARP probe sent from the functioning CVM 10.130.121.106 for 10.130.121.105 also keeps getting its replies from this MAC address:

nutanix@<SerialNumber>:10.130.121.106:~$ sudo arping -I eth0 -s 10.130.121.106 10.130.121.105
ARPING 10.130.121.105 from 10.130.121.106 eth0
Unicast reply from 10.130.121.105 [AC:1F:6C:45:99:5E] 11.371ms
Unicast reply from 10.130.121.105 [AC:1F:6C:45:99:5E] 11.593ms
Unicast reply from 10.130.121.105 [AC:1F:6C:45:99:5E] 11.765ms
Unicast reply from 10.130.121.105 [AC:1F:6C:45:99:5E] 11.769ms
Unicast reply from 10.130.121.105 [AC:1F:6C:45:99:5E] 25.627ms
Unicast reply from 10.130.121.105 [AC:1F:6C:45:99:5E] 25.729ms
Unicast reply from 10.130.121.105 [AC:1F:6C:45:99:5E] 25.617ms
Unicast reply from 10.130.121.105 [AC:1F:6C:45:99:5E] 26.334ms
Unicast reply from 10.130.121.105 [AC:1F:6C:45:99:5E] 26.708ms
Unicast reply from 10.130.121.105 [AC:1F:6C:45:99:5E] 26.514ms
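
When the arping run is long, it can be easier to reduce the output to the set of distinct MAC addresses that answered. The sketch below does that for saved output; replying_macs is a made-up helper name, and it assumes the "reply from <IP> [<MAC>]" line format shown above.

```shell
# List the distinct MAC addresses that answered in saved arping output.
replying_macs() {
    sed -n 's/.*reply from [0-9.]* \[\([0-9A-Fa-f:]\{17\}\)\].*/\1/p' | sort -u
}

replying_macs <<'EOF'
ARPING 10.130.121.105 from 10.130.121.106 eth0
Unicast reply from 10.130.121.105 [AC:1F:6C:45:99:5E] 11.371ms
Unicast reply from 10.130.121.105 [AC:1F:6C:45:99:5E] 11.593ms
EOF
# prints: AC:1F:6C:45:99:5E
```

A single MAC in the result, while the CVM's own eth0 is down, means something other than the CVM is answering for its address.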

You are unable to find the MAC address AC:1F:6C:45:99:5E in the CAM table of the external switch (a Cisco switch in our example) connected to this block.

Try disabling the following parameters on the Cisco switch ports connected to this Host:

ip arp inspection list

ip arp inspection list limit rate 100

If that does not clear the conflict, the only workable option left is to edit the network configuration file on the CVM and disable the ARP check by adding the line ARPCHECK="NO":

nutanix@<SerialNumber>:10.130.121.105:~$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Auto generated by CentosNetworkInterfacesConfig on xxx xxx xxx xxx 2020

ARPCHECK="NO"
ONBOOT="yes"
MTU="1500"
NM_CONTROLLED="no"
NETMASK="255.255.255.128"
IPADDR="10.130.121.105"
DEVICE="eth0"
TYPE="Ethernet"
GATEWAY="10.130.121.1"
BOOTPROTO="none"
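
If you would rather script the change than hand-edit the file, something along these lines adds the setting once and keeps a backup. It is a sketch, not a Nutanix-provided procedure: set_arpcheck_no is a made-up helper name, and the demo runs against a temporary copy rather than the real ifcfg-eth0. On the CVM you would run it with sudo against the real path.

```shell
# Set ARPCHECK="NO" in an ifcfg file, adding the line if it is missing.
# The previous version of the file is kept next to it with a .bak suffix.
set_arpcheck_no() {
    cfg=$1
    if grep -q '^ARPCHECK=' "$cfg"; then
        # An ARPCHECK line already exists: rewrite it in place.
        sed -i.bak 's/^ARPCHECK=.*/ARPCHECK="NO"/' "$cfg"
    else
        cp "$cfg" "$cfg.bak"
        printf 'ARPCHECK="NO"\n' >> "$cfg"
    fi
}

# Demo on a scratch file that mimics part of ifcfg-eth0.
tmp=$(mktemp)
printf 'DEVICE="eth0"\nIPADDR="10.130.121.105"\nBOOTPROTO="none"\n' > "$tmp"
set_arpcheck_no "$tmp"
set_arpcheck_no "$tmp"    # a second call must not add a duplicate line
grep -c '^ARPCHECK="NO"$' "$tmp"
# prints: 1
rm -f "$tmp" "$tmp.bak"
```

The sketch appends the line at the end instead of the top; since the file is a list of variable assignments, the position should not matter.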

You should now be able to restart the network service on CVM 10.130.121.105.

Restart the genesis service and verify that the node gets added back to the Cassandra metadata ring.

Add the line ARPCHECK="NO" to the network configuration file on CVM 10.130.121.106 as well:

nutanix@<SerialNumber>:10.130.121.106:~$ cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Auto generated by CentosNetworkInterfacesConfig on xxx xxx xxx xxx 2020

ARPCHECK="NO"
GATEWAY="10.130.121.1"
NM_CONTROLLED="no"
NETMASK="255.255.255.128"
IPADDR="10.130.121.106"
DEVICE="eth0"
TYPE="Ethernet"
ONBOOT="yes"
BOOTPROTO="none"

Then restart the network service on this CVM as well.

Bottom line: the ARP check needs to be disabled in the above scenario. As with any other procedure, review your work before carrying out the steps.

Thank you. I hope this article helped you fix the CVM "unable to acquire an IP address" issue.


1 reply

Very nicely written!
