Solved

Criteria for Periodic Inspections of Nutanix Clusters

  • 26 September 2022
  • 7 replies
  • 71 views

Badge +2

Hello,

I'm new to Nutanix.

I regularly inspect a client's cluster.

For each inspection I run the NCC health checks, review newly raised alerts, look up the relevant KB articles and documentation, and run commands such as the following from a CVM:
cs | grep -v UP
gs
df -h
nodetool -h 0 ring

===================================================================

I also check I/O bandwidth usage and disk IOPS.

Is there a standard I can use to judge whether these two metrics are at a safe level?

Each cluster has a different structure and environment, but I would like to set a reference point. How should I approach this?

====================================================================

Thank you for reading this long comment.

Best answer by junsu 7 October 2022, 09:23


Userlevel 4
Badge +4

Hello
The best advice I can give is to configure alert policies and receive notifications via email. It’s also possible to send a daily digest, which will list current cluster problems (if any exist).

 

Badge +2

At what level is I/O bandwidth usually considered a risk?

At what level is disk IOPS considered a risk?

ㅠㅠ

Badge +2

https://portal.nutanix.com/page/documents/kbs/details?targetId=kA03200000098bBCAQ

I think we can get a rough hint from here.

 

Userlevel 4
Badge +6

At what level is I/O bandwidth usually considered a risk?

At what level is disk IOPS considered a risk?

ㅠㅠ

This is a very subjective question, I'm afraid; the answer depends on the workload and hardware.

I have some clusters with 70k IOPS and 1 ms latency, and others with 5k IOPS and 3 ms latency. Both are happy and working.

I'd recommend establishing a baseline of what is 'normal' for each of your clusters and investigating deviations from it. That is how I found a noisy VM generating lots of writes and pushing up the average latency, well before it became a problem.
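
As a minimal sketch of the baseline idea, assuming you have already exported latency (or IOPS) samples from your own monitoring: the sample values, the function names, and the 3-standard-deviation cut-off below are illustrative assumptions, not Nutanix defaults or a Nutanix API.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarise known-good samples as (mean, standard deviation)."""
    return mean(samples), stdev(samples)


def find_deviations(baseline, samples, k=3.0):
    """Return (index, value) pairs more than k standard deviations from the baseline mean."""
    mu, sigma = baseline
    return [(i, v) for i, v in enumerate(samples) if abs(v - mu) > k * sigma]


# Hypothetical average latency readings in ms, taken while the cluster was healthy
good_latency_ms = [1.0, 1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1]
baseline = build_baseline(good_latency_ms)

# Hypothetical readings from a later inspection
todays_latency_ms = [1.1, 0.9, 1.2, 2.6, 1.0]
for index, value in find_deviations(baseline, todays_latency_ms):
    print(f"sample {index}: {value} ms is outside the baseline range")
```

The same two functions work for IOPS or bandwidth samples. The cut-off `k` is a judgment call: a smaller value warns earlier but produces more false alarms.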

Badge +2

Is it okay to define the baseline for 'normal' from measurements taken while the VMs in the cluster are operating without any problems?

Userlevel 4
Badge +6

Hello,

 

Yes, absolutely. Set your baseline on a ‘good experience’ so you know what has changed if you have a bad experience.

The built-in thresholds are good worst-case indicators that something is up, but you’d rather know earlier ;)

Badge +2

Thank you so much for your kind reply.

 

Have a nice day