Data Resiliency Status shows error

  • 19 September 2014
  • 3 replies
  • 9055 views

Hello All,

For one of our customers, I did a fresh install, and when I check the Data Resiliency Status on that cluster, it shows 0 for Extent Group. The explanation reads: "Based on placement of extent group replicas the cluster can tolerate a maximum of 0 node failure." Do you know what might cause this? There are no errors or warnings on the cluster.

Cheers..

3 replies

I would start by running a `cluster status | grep -i down` on a CVM to determine if any of the Cluster services are not running. If it's the error I think it is, it's usually because one of the CVMs isn't entirely booted, or is shut down.
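
If you want to script that check, here is a minimal sketch. It only filters whatever `cluster status` prints for the word "down" and makes no assumption about the exact output format. The function name `report_down_services` is made up for illustration and is not a Nutanix tool.

```shell
# Hypothetical helper (not part of any Nutanix tooling).
# Reads `cluster status` output on stdin and prints every line that
# mentions "down" (case-insensitive).
# Exit status: 0 when nothing is reported down, 1 otherwise.
report_down_services() {
    # grep exits non-zero when nothing matches; "|| true" keeps that
    # from being treated as a failure of the helper itself.
    down_lines=$(grep -i 'down' || true)
    if [ -n "$down_lines" ]; then
        printf '%s\n' "$down_lines"
        return 1
    fi
    echo "No services reported down."
    return 0
}

# Usage on a CVM (not executed here):
#   cluster status | report_down_services
```

The non-zero exit status makes it easy to call from a larger health-check script.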
Is the cluster using a mix of nodes, specifically different SSD and/or HDD capacities per node? As far as I understand it, the cluster may not be able to tolerate losing a high-capacity storage node unless at least one other node with the same capacity is available.
Hi,

I powered off a node and then powered it back on, and I received the same Data Resiliency Status messages for Oplog and Extent Group.

After checking that the cluster was up and stable (`cluster status | grep -i down` on a CVM returned no down messages), I was still seeing these messages in the Prism console.

To resolve this, I navigated to the Curator master status page and started a full scan, as described below. The procedure comes from the following post about manually initiating Curator scans when patching requires rebooting the hypervisor or cluster.

http://next.nutanix.com/t5/Installation-Configuration/Curator-scan-initiate/m-p/4543/highlight/true#M629

MOD PaulR

Initiating Curator scans manually can be done by browsing to the Curator master's IP on port 2010, to a specific "control" page, as follows: http://{Curator-Master-CVM-IP}:2010/master/control
From there you can choose a "partial" or "full" scan. Monitor progress at the :2010/ top-level page. Note: you'll likely have to adjust iptables to open port 2010, or *temporarily* stop that service.
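
For anyone who wants to confirm the page is reachable before opening a browser, a rough sketch follows. The URL pattern is the one quoted above; the function names and the placeholder IP are my own, and the reachability check assumes `curl` is installed on the machine you run it from.

```shell
# Hypothetical helpers (names are illustrative only).

# Build the control-page URL quoted above for a given Curator master CVM IP.
curator_control_url() {
    printf 'http://%s:2010/master/control\n' "$1"
}

# Return 0 if something answers HTTP at that URL within 5 seconds.
# A non-zero status usually means the port is blocked (see the iptables
# note above) or the IP is not the Curator master.
curator_page_reachable() {
    curl --silent --output /dev/null --max-time 5 "$(curator_control_url "$1")"
}

# Usage (10.0.0.5 is a placeholder IP, not executed here):
#   curator_page_reachable 10.0.0.5 && echo "Open $(curator_control_url 10.0.0.5)"
```

This only checks connectivity. Choosing the partial or full scan is still done on the control page itself.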

After the full scan completed, the Oplog 0 and Extent Group 0 messages went away and the Data Resiliency Status went green and OK. Hope this helps someone else.

Norm
