Question

Nutanix Files Deployment Error: DGW Ip Unreachable

  • 17 March 2020
  • 9 replies
  • 2158 views

I’m attempting to deploy Files on my cluster, and it fails stating that the Default Gateway (DGW) IP is not reachable. However, I can ping between systems on the desired networks. The only thing I can think of is that the DGW does not respond to ping. Is that required?



Userlevel 2
Badge +4

Check minerva_cvm.log (in the CVM’s /home/nutanix/data/logs folder) for any additional hints about the issue.
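A minimal sketch of how that log could be skimmed for the lines that matter, assuming the line layout seen in this thread (timestamp, level, thread id, source location). The function name and both patterns are illustrative, not part of any Nutanix tooling.

```python
import re

# Lines such as "2020-03-18 08:53:20 CRITICAL 54229584 decorators.py:47 ..."
LEVEL_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} (WARNING|ERROR|CRITICAL)\b"
)
# Final line of a Python traceback, e.g. "error: [Errno 98] Address already in use"
EXC_RE = re.compile(r"^\w*(?:[Ee]rror|Exception): ")


def interesting_lines(lines):
    """Return log lines that carry a warning-or-worse level or an exception summary."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if LEVEL_RE.match(line) or EXC_RE.match(line):
            hits.append(line)
    return hits
```

Passing an open file object for minerva_cvm.log to interesting_lines() would return only the CRITICAL line and the closing "error:" line from an excerpt like the one posted in this thread.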

Are the CVMs and the Files cluster’s storage-side IPs on the same network?

The only thing that I get is:


nutanix@NTNX-15SM60350069-B-CVM:xx.xx.xx.xx:~/data/logs$ cat minerva_cvm.log
2020-03-18 08:53:20 rolled over log file
2020-03-18 08:53:20 INFO 91380112 logpipe_utils.py:153 logpipe process started 8801
2020-03-18 08:53:20 INFO 91380112 minerva_cvm:41 MinervaCvm not monkey patched
2020-03-18 08:53:20 INFO 91380112 zookeeper_session.py:131 minerva_cvm is attempting to connect to Zookeeper
2020-03-18 08:53:20 INFO 91380112 server.py:185 Starting the leadership thread
2020-03-18 08:53:20 INFO 91380112 zookeeper_session.py:131 minerva_cvm is attempting to connect to Zookeeper
2020-03-18 08:53:20 INFO 91380112 namespace.py:190 Loading resource util.mincli.modules
2020-03-18 08:53:20 INFO 54229424 server.py:751 Leader is  xx.xx.xx.xx
2020-03-18 08:53:20 INFO 91380112 namespace.py:190 Loading resource minerva.tools.cvm.modules
2020-03-18 08:53:20 INFO 91380112 server.py:139 Starting the minerva afs cli service and minerva rpc services.
2020-03-18 08:53:20 CRITICAL 54229584 decorators.py:47 Traceback (most recent call last):
  File "/home/jenkins.svc/workspace/postcommit-jobs/nos/euphrates-5.10.9.1-stable/x86_64-aos-release-euphrates-5.10.9.1-stable/builds/build-euphrates-5.10.9.1-stable-release/minervacvm-python-tree/bdist.linux-x86_64/egg/util/misc/decorators.py", line 41, in wrapper
  File "/home/jenkins.svc/workspace/postcommit-jobs/nos/euphrates-5.10.9.1-stable/x86_64-aos-release-euphrates-5.10.9.1-stable/builds/build-euphrates-5.10.9.1-stable-release/minervacvm-python-tree/bdist.linux-x86_64/egg/minerva/cvm/server.py", line 154, in __serve_http
  File "/home/jenkins.svc/workspace/postcommit-jobs/nos/euphrates-5.10.9.1-stable/x86_64-aos-release-euphrates-5.10.9.1-stable/builds/build-euphrates-5.10.9.1-stable-release/minervacvm-python-tree/bdist.linux-x86_64/egg/util/net/wsgi_utils.py", line 114, in wsgi_gevent_server_create
  File "/home/jenkins.svc/workspace/postcommit-jobs/nos/euphrates-5.10.9.1-stable/x86_64-aos-release-euphrates-5.10.9.1-stable/builds/build-euphrates-5.10.9.1-stable-release/minervacvm-python-tree/bdist.linux-x86_64/egg/util/net/wsgi_utils.py", line 54, in create_socket
  File "<string>", line 1, in bind
error: [Errno 98] Address already in use

 

But I’ve changed the addresses I specified and still get exactly the same errors.

Userlevel 4
Badge +5

Hello @S.Riser
I tried the same in my lab environment: if the gateway IP is not reachable, it won’t let you create a file server (FS).

I got the following response 

 



You mentioned that you changed the IP to a different one.
Can you check whether the new IP is being used by any other VM or network configuration?
 

I’ve confirmed that the DGW is operational and that I can get to other subnets, but in our network the DGW is not pingable. Is that what’s actually failing?
The subnets I’m on are dedicated to the Nutanix infrastructure, so I know the IPs aren’t in use, and I’m not certain why I’m getting that error.

Userlevel 2
Badge +4

Let’s take this layer by layer…

On a CVM, can you run “arp -i eth0 -vn” (show the ARP table for interface eth0 in verbose, numeric format)?

Does the MAC address for the gateway match with what you know? (verify from another source?)
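As a rough sketch, the comparison could be scripted once the arp output has been captured as text. This assumes the usual net-tools column layout (address, hardware type, hardware address, flags, interface). The function names, the IP and the MAC in the usage note are made up for illustration.

```python
def parse_arp_table(text):
    """Map IP address -> MAC address from `arp -n` style output.

    Assumes the net-tools layout where resolved rows look like:
    <ip> ether <mac> <flags> ... <iface>
    Header, incomplete and summary rows are skipped.
    """
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "ether":
            table[fields[0]] = fields[2].lower()
    return table


def gateway_mac_matches(arp_text, gateway_ip, expected_mac):
    """True only if the gateway has an ARP entry and its MAC equals expected_mac."""
    return parse_arp_table(arp_text).get(gateway_ip) == expected_mac.lower()
```

For example, gateway_mac_matches(captured_output, "192.0.2.1", "00:00:5E:00:53:01") would return False both when the entry is missing and when a different device has answered for the gateway IP.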

Are you using anything like VRRP by any chance or link aggregation to your hosts?

Is IPv6 enabled on the gateway?

Can you confirm if the switch ports show any port level errors?

 

On one or more CVMs, check the /home/nutanix/data/logs/sysstats/ping_gateway.INFO 

tailf ping_hosts.INFO

All,

I appreciate the responses, but I’ve already established that the Default Gateway will not respond to pings. It’s configured not to respond to ICMP, and that is by design. Is an ICMP response from the DGW required to get Files deployed?

So ping is now enabled on the DGW, but I cannot get the CVMs to receive a ping response from the DGW, although I can from other systems. I’m at a complete loss.

  1. If the VLAN tags were wrong, I shouldn’t have any exterior network connectivity, but I do.
  2. If the DGW were wrong, I wouldn’t be able to route, but I can.
  3. No other errors exist on the cluster during normal operations, just that the DGW is unreachable.
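For what it’s worth, the routing argument in point 2 can be checked without relying on ICMP at all. Here is a minimal sketch, assuming there is some host on another subnet with a known open TCP port. The function name is illustrative, and the target host and port are whatever is available in your environment.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be established.

    A successful connect to a host on another subnet shows that traffic is
    being routed through the default gateway, even if the gateway itself
    drops ICMP echo requests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A False result is less conclusive than a True one, since it can also mean the port is simply closed or filtered on the target.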

I’m opening a support ticket but if anyone has any other ideas, please let me know.

Userlevel 2
Badge +4

Can you ping the DGW from any of the hosts on this cluster?

Are there any ACLs on the gateway with respect to the VLANs where CVMs/Hosts are?

Could you put a test/dummy VM (a Linux “Live” boot for example) on the same VLAN for testing - can it ping the gateway?

You didn’t specify whether you are on AHV or ESXi. They have different troubleshooting methods, but I would start by noting the configuration and test results of each component and then eliminating things layer by layer. I’ve seen how a simple misconfiguration on either side (gateway or cluster) could cause this.

 

Something else to consider, based on the Minerva logs:

  File "/home/jenkins.svc/workspace/postcommit-jobs/nos/euphrates-5.10.9.1-stable/x86_64-aos-release-euphrates-5.10.9.1-stable/builds/build-euphrates-5.10.9.1-stable-release/minervacvm-python-tree/bdist.linux-x86_64/egg/util/net/wsgi_utils.py", line 54, in create_socket
  File "<string>", line 1, in bind
error: [Errno 98] Address already in use

 

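The issue seems to be with binding to a port: the “Address already in use” message, together with “in create_socket”, tells us that something is already listening on a port that this service is trying to use. Errno 98 is raised when a local socket bind fails, so it concerns an address and port on the CVM itself rather than the IPs entered for the Files deployment, which may be why changing those addresses made no difference. I think this one may be something support will need to solve.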
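To illustrate what that error means in general (this is not the Minerva code, just a small sketch with made-up function names), the same condition can be reproduced by binding two sockets to the same local address and port:

```python
import errno
import socket


def try_bind(host, port):
    """Return (socket, None) on success, or (None, errno) if bind() fails."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock, None
    except OSError as exc:
        sock.close()
        return None, exc.errno


def demo_address_in_use():
    """Bind one listener, then try to bind a second one to the same port."""
    first, _ = try_bind("127.0.0.1", 0)       # port 0: let the OS pick a free port
    port = first.getsockname()[1]
    second, err = try_bind("127.0.0.1", port)  # same port while 'first' is listening
    first.close()
    if second is not None:
        second.close()
    return err == errno.EADDRINUSE
```

On Linux, errno.EADDRINUSE is 98, which matches the number in the traceback. The second bind fails only because the first socket still holds the port, which is the situation the log points to.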

Userlevel 3
Badge +4

All,

I appreciate the responses, but I’ve already established that the Default Gateway will not respond to pings. It’s configured not to respond to ICMP, and that is by design. Is an ICMP response from the DGW required to get Files deployed?

As of AOS 5.10 this should not block Files deployment. 
reference: https://portal.nutanix.com/page/documents/details?targetId=Release-Notes-Acr-v510:Release-Notes-Acr-v510 

 
