Question

Nutanix Cluster API v2 reports normal status when a node is "down"

5 years ago
15 January 2019
8 replies
2521 views

Maxim Grishin
Trailblazer
23 replies

Yesterday our Nutanix CE cluster lost a node (the onboard NIC went offline somehow; still investigating). HA failover worked correctly, but my Zabbix monitoring of the cluster through API 2.0 did not report any problem. The only difference was that the host stopped showing up in discovery under its proper name and appeared only by IP. Every other reading was "normal" and "not degraded", and even the cluster state was reported as "normal" while the host was effectively down. WTF?!

Also, what is the deal with having to re-post a question in another forum when asked to? Are cross-links perhaps allowed, so the same question can be raised in a different subset of forums?

This topic has been closed for comments

sandeepmp
Nutanix Employee
64 replies
5 years ago
3 February 2019

@Maxim Grishin

How many nodes do you have in this cluster?

And can you share the API URL that you used to check the cluster status?

Maxim Grishin
Author
Trailblazer
23 replies
5 years ago
5 February 2019

@sandeepmp 4 nodes (CE).

The request was "GET /api/nutanix/2.0/cluster/". I was using Zabbix's JSON parser to extract the value of "operation_mode", which I took to be the current cluster status. Am I correct, or are there other parameters I should be watching?

Maxim Grishin
Author
Trailblazer
23 replies
5 years ago
5 February 2019

Yep, I also monitor the "is_degraded" value from "/api/nutanix/2.0/hosts/UUID", which actually returned zeroes for the hosts that were down. "Not degraded but failed" is a weird state, you know.

sandeepmp
Nutanix Employee
64 replies
5 years ago
5 February 2019

@Maxim Grishin

operation_mode is used to identify whether the cluster is "single node", "multi node", etc.

To check the data resiliency status, please use the APIs below.

Request URL:

V1
https://cluster_ip:9440/PrismGateway/services/rest/v1/cluster/domain_fault_tolerance_status/

V2
https://cluster_ip:9440/PrismGateway/services/rest/v2.0/cluster/domain_fault_tolerance_status/
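For anyone who wants to poll these URLs from a script rather than a browser, here is a minimal sketch using only the Python standard library. The endpoint path and port are taken from the URLs above. Everything else is an assumption for illustration: the helper names `build_status_request` and `fetch_status` are hypothetical, HTTP Basic authentication is assumed, and the `verify_tls` switch is there only because a CE cluster may present a self-signed certificate.

```python
import base64
import json
import ssl
import urllib.request

# Path taken from the URLs quoted above; only the version segment varies.
STATUS_PATH = "/PrismGateway/services/rest/{version}/cluster/domain_fault_tolerance_status/"


def build_status_request(cluster_ip, username, password, version="v1"):
    """Build a GET request for the fault tolerance status endpoint.

    Hypothetical helper: assumes the gateway accepts HTTP Basic auth.
    """
    url = "https://{}:9440{}".format(cluster_ip, STATUS_PATH.format(version=version))
    credentials = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Accept", "application/json")
    return request


def fetch_status(request, verify_tls=True, timeout=10):
    """Send the request and return the decoded JSON body."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Assumption: the cluster uses a self-signed certificate.
        # Disabling verification is acceptable for a lab only.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call would look like `fetch_status(build_status_request("cluster_ip", "user", "password"), verify_tls=False)`, with the placeholders replaced by real values.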

sandeepmp
Nutanix Employee
64 replies
5 years ago
5 February 2019

https://next.nutanix.com/api-31/powershell-cdmlets-or-rest-api-to-get-data-resiliency-status-30979

sandeepmp
Nutanix Employee
64 replies
5 years ago
5 February 2019

The "is_degraded" flag is used to identify the degraded node status.

https://portal.nutanix.com/#/page/docs/details?targetId=Web-Console-Guide-Prism-v510:man-node-degraded-wc-c.html

Maxim Grishin
Author
Trailblazer
23 replies
5 years ago
19 February 2019

@sandeepmp The v2 version of this API doesn't work; it returns an XML-formatted error, "java.lang.NullPointerException". The v1 version does display the data, although in a somewhat unfriendly way: the number of tolerable failures is listed per service, and no overall "minimum" value is readily available. There is also a nice value for under-replicated data, which can be used to signal that a disk has failed.
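Since the v1 output lists tolerable failures per service, a single "minimum" value has to be computed on the client side. The sketch below shows one way to do that on the already-parsed JSON. The field names `domainType`, `componentFaultToleranceStatus` and `numberOfFailuresTolerable` are assumptions about the v1 response layout and should be checked against what your cluster actually returns; the function name is hypothetical.

```python
def min_failures_tolerable(domains, domain_type="NODE"):
    """Return the smallest per-service tolerable-failure count for one domain.

    `domains` is the parsed JSON list from the v1 endpoint. The key names
    used here are assumed, not confirmed by this thread. Returns None when
    the domain or its values are missing.
    """
    for domain in domains:
        if domain.get("domainType") != domain_type:
            continue
        components = domain.get("componentFaultToleranceStatus") or {}
        values = [
            component.get("numberOfFailuresTolerable")
            for component in components.values()
            if isinstance(component, dict)
        ]
        # Ignore services that do not report a numeric value.
        values = [value for value in values if isinstance(value, int)]
        return min(values) if values else None
    return None
```

A monitoring item could then alert whenever the result for the "NODE" domain drops below 1, which is the condition that "operation_mode" and "is_degraded" did not reveal during the outage.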

Still, is there an API under /api/nutanix/ (which seems to be the more modern way of querying the cluster) that would deliver similar info?
