USEFUL COMMANDS TO TROUBLESHOOT PE-PC CONNECTIVITY ISSUES
There are many scenarios in which you might see alerts related to PE-PC connection failure. This generally happens when network connectivity between the PE and PC clusters is disrupted, for example because the PC VM was rebooted, the PC cluster is being upgraded, or there are network issues, HTTP proxy issues (incorrect proxy whitelists), or port issues.
First, when is the PE-PC connectivity alert raised on PE?
1) As of AOS 5.10, the alert is raised when PE-PC connectivity has been disrupted for at least 6 minutes. (Prior to AOS 5.10, the alert was generated at a single instance of 2 minutes.)
Sometimes the PE-PC connectivity check shows a red heart in Prism even if the connectivity is fine and none of the above reasons are present.
In this case, i.e. once you have verified that there is no underlying PE-PC connectivity issue, manually reset the check: turn the check OFF and then back ON from the Health page by clicking on the check.
To troubleshoot PE-PC connectivity issues, below are some common scenarios and useful commands, which should be run from a CVM on the affected PE cluster and from the PC VM of the PC cluster:
1) Port issues: To verify that the port connection is fine, check whether port 9440 is open on the PC:
nutanix@CVM$ nc <prism_central_ip_address> 9440 -v
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to x.x.x.x:9440.
The output should look something like the above. If port issues persist, open port 9440 on the PC.
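If you want to script the same reachability test, for example to run it against several addresses, the sketch below does roughly what `nc -v` does: it attempts a TCP connection and reports whether it succeeded. The function name `port_is_open` and the 5-second timeout are assumptions made for illustration; they are not part of any Nutanix tooling.

```python
import socket


def port_is_open(host: str, port: int = 9440, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection completes the TCP handshake, similar to `nc -v`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("<prism_central_ip_address>")` returning False would point to a blocked or closed port 9440 somewhere between the CVM and the PC.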
2) HTTP proxy issues: We often see this issue when an HTTP proxy is configured on the PE or PC cluster but the proxy whitelists have not been set up correctly on both sides, which can cause connection issues between PE and PC.
The following must be done when a proxy is configured on PE and PC:
- Add the Prism Central IP to the PE proxy whitelist.
- Add the Prism Element IP and all the CVM IPs to the PC proxy whitelist.
The proxy settings can be configured from the Prism dashboard or from nCLI, as below:
nutanix@cvm$ ncli http-proxy ls
nutanix@cvm$ ncli http-proxy get-whitelist
nutanix@cvm$ ncli http-proxy add-to-whitelist
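To double-check the two whitelist rules above, you can compare the addresses that should be whitelisted against the entries you actually see in the whitelist. The helper below is only a sketch: `missing_whitelist_entries` is a hypothetical name, the IPs are made-up examples, and it assumes you have already copied the whitelist entries into a list by hand. It matches exact IP strings only, so subnet or hostname style entries, if your whitelist uses them, would not be accounted for.

```python
def missing_whitelist_entries(required_ips, whitelist):
    """Return the required IPs that are absent from the whitelist, in order."""
    present = {entry.strip() for entry in whitelist}
    return [ip for ip in required_ips if ip.strip() not in present]


# Hypothetical example for the PC side: PE virtual IP plus two CVM IPs.
required_on_pc = ["10.0.0.20", "10.0.0.21", "10.0.0.22"]
pc_whitelist = ["10.0.0.21"]
print(missing_whitelist_entries(required_on_pc, pc_whitelist))
```

Any address the helper returns is one you would still need to add with `ncli http-proxy add-to-whitelist` or from the Prism dashboard.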
3) Remote connection issues: Whenever you register your PE cluster with PC, a remote connection is created between them. Sometimes we can observe remote connection issues due to the reasons mentioned above, which can lead to PC being marked “disconnected” on the PE dashboard, among other issues.
Useful commands to troubleshoot:
a) List all remote connections:
> PE CVM:
<nuclei> remote_connection.list
Name UUID
> PC CVM:
<nuclei> remote_connection.list_all
Name UUID
b) Get multicluster status: This command should be run from both the PE cluster and the PC cluster:
CVM$ ncli multicluster get-cluster-state
Cluster Id : 00056528-ea71-f155-60a7-6805ca7bf746
Cluster Name : IN-BLR-VDICLS1
Is Multicluster : false
Controller VM IP Addre... : [10.51.148.153, 10.51.148.154, 10.51.148.155, 10.51.148.156, 10.51.148.157, 10.51.148.158, 10.51.148.159, 10.51.148.160, 10.51.148.161, 10.51.148.162, 10.51.148.163, 10.51.148.164, 10.51.148.165, 10.51.148.166, 10.51.148.167, 10.51.148.168, 10.51.148.169, 10.51.148.170, 10.51.148.171, 10.51.148.172, 10.51.148.173]
External IP Address : 10.51.148.174
Marked for Removal : false
Remote Connection Exists : true <--------- Should be "true"
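If you save the command output to a file or variable, a few lines of Python can pull out the field that matters. This is a minimal sketch that assumes the output keeps the "Key : Value" layout shown above and describes a single cluster; the function names are hypothetical, and if several clusters are listed, later values overwrite earlier ones.

```python
def parse_cluster_state(output: str) -> dict:
    """Parse 'Key : Value' lines from the command output into a dict."""
    state = {}
    for line in output.splitlines():
        # Split on the first colon only; IPv4 values contain no colons.
        key, sep, value = line.partition(":")
        if sep:
            state[key.strip()] = value.strip()
    return state


def remote_connection_exists(output: str) -> bool:
    """True only if the 'Remote Connection Exists' field reads 'true'."""
    value = parse_cluster_state(output).get("Remote Connection Exists", "")
    return value.lower() == "true"


sample = """\
Cluster Name : IN-BLR-VDICLS1
Is Multicluster : false
External IP Address : 10.51.148.174
Marked for Removal : false
Remote Connection Exists : true
"""
print(remote_connection_exists(sample))  # True
```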
c) Remote connection health check: Check the Nuclei RC health status by running one of the following commands against the available RCs on PE and PC to rule out an API v3 connectivity issue:
<nuclei> remote_connection.health_check <rc_name>
or
<nuclei> remote_connection.health_check_all
The above should also be run from both the PE and PC clusters.
4) Check API response: If the above RC health check fails, the next step is to check the API response. After logging in to the PE, open the following URL in the same browser:
https://<pe_ip>:9440/PrismGateway/services/rest/v1/multicluster/cluster_external_state
If the API connectivity is successful, the response will contain "reachable":true
[{"clusterUuid":"7aca431c-9bc8-4bd0-803a-9b49e550e942","clusterDetails":{"clusterName":"Unnamed","ipAddresses":["10.5.222.90"],"multicluster":true,"username":"00055de7-3cc7-05fb-0000-000000004433","password":"41119365494582941674608439812739","prcCluster":false,"reachable":true},"configDetails":{"externalIp":""},"filters":[],"clusterTimestampUsecs":0,"nosVersion":null,"nosFullVersion":null,"markedForRemoval":false,"remoteConnectionExists":true}]
If the API response contains "reachable":false, there is an API connectivity issue between the PE and PC. The next step is to check whether a proxy is configured for the PC, as proxy whitelisting might be needed.
[{"clusterUuid":"4b7b1a77-c591-477a-9501-37a39a4f8dfc","clusterDetails":{"clusterName":"Unnamed","ipAddresses":["10.246.73.47"],"multicluster":true,"username":"00056528-ea71-f155-60a7-6805ca7bf746","password":"65157945596046308979116553149711","prcCluster":false,"reachable":false},"configDetails":{"externalIp":""},"filters":[],"clusterTimestampUsecs":0,"nosVersion":null,"nosFullVersion":null,"markedForRemoval":false,"remoteConnectionExists":true}]
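The response is long, so it can be easier to check the "reachable" flag programmatically than by eye. The sketch below assumes you have saved the response body as text and that it is a JSON list shaped like the samples above; `unreachable_clusters` is a hypothetical helper name, and the sample is trimmed to the fields the check needs.

```python
import json


def unreachable_clusters(response_text: str) -> list:
    """Return the clusterUuid of each entry whose clusterDetails.reachable is not true."""
    entries = json.loads(response_text)
    if isinstance(entries, dict):
        # Tolerate a single object as well as a list of objects.
        entries = [entries]
    return [
        entry.get("clusterUuid")
        for entry in entries
        if not entry.get("clusterDetails", {}).get("reachable", False)
    ]


sample = (
    '[{"clusterUuid": "4b7b1a77-c591-477a-9501-37a39a4f8dfc", '
    '"clusterDetails": {"multicluster": true, "reachable": false}, '
    '"remoteConnectionExists": true}]'
)
print(unreachable_clusters(sample))
```

An empty list means every entry in the response reported "reachable":true; a missing "reachable" field is treated as unreachable so that it gets a second look.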
To troubleshoot REST API issues further and to see which logs to check, here is a post on that: https://next.nutanix.com/api-31/logs-to-check-for-rest-apis-and-apache-http-issues-37761
Another great post on PE-PC connectivity alerts: https://next.nutanix.com/how-it-works-22/what-to-do-with-prism-element-prism-central-connectivity-alerts-37401
What's Next?
If none of the above scenarios helps you fix the issue, it may be necessary to reset the PE-PC remote connection, or to unregister and re-register the PE with PC, after some deeper troubleshooting and log checks. For such scenarios it is best to engage Nutanix support and let a technical expert take over from there.
Relevant KBs
1) 6970 - PE-PC Connection Failure alerts
2) 3379 - cluster_connectivity_status check
3) 5356 - PC is disconnected on PE dashboard because of incorrect proxy whitelist