Warning health status but all services OK


Hi everyone

I have a cluster health alert on one of my clusters, but when I select this alert, nothing is in warning; all services are green.
I have resolved all old alerts.



Thanks for your help,
Guillaume

Hi comgui,

Let me see what I can do. Thanks
Hi aluciani

We still have the issue.

As you can see, we have a warning on Cluster Health, but we cannot find the issue itself anywhere (we have resolved and deleted all alerts).

Regards
Hi comgui

Can you try the following?
1. Run the checks manually from the "Action" drop-down menu (if you are on 5.0 or later).
2. You can choose to run only the failed and warning checks.

Let me know whether that solves the problem.
Hi,

Thanks for your answer, but we are still on 4.7.4.

Is there another way to "refresh"? That seems to be the problem.

Thanks
Hi, the same thing is happening here (v4.7.5). Did you ever find the solution to this issue?
After an update, it's OK 🙂
I have the same issue on 4.77.1: all checks show as green, no checks show as warning under the yellow exclamation mark, yet the status of my cluster always stays yellow.

This is on two separate clusters, so I'm not sure what else to try, other than opening a ticket.

I have the exact same issue, no alerts, and my Cluster Services shows yellow. I think it's safe to say it's an annoying bug.
AOS: 5.1.3
ESXi: 6.0 U3
NCC: 3.1.2
Foundation: 3.11

Mike
Hello

I opened a ticket, and there were some counters left over from a previous issue that would never reset. For me, those values were:

stat_name: "check.score" stat_value: 74
stat_name: "check.111038.score" stat_value: 74
stat_name: "check.overall_score" stat_value: 74

After I reset those stat values by hand, the cluster services returned to green and stayed green.

Best of luck to all

Paul
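When many clusters are affected, a dump of stat values like the one Paul quoted can be scanned quickly. The following is a minimal shell sketch, assuming the dump has been saved as text in the `stat_name: "<name>" stat_value: <n>` form shown above (the thread does not say which tool printed it). `list_score_stats` is a hypothetical helper name. It only lists every stat whose name ends in `score` with its value; it does not decide which values are stale.

```shell
# Hypothetical helper (not a Nutanix tool): read stat_name/stat_value
# pairs on stdin and print "name value" for each stat ending in "score".
# Lines are joined first, so pairs split across lines or run together
# (as in a flattened paste) are handled the same way; grep then pulls
# out each pair, sed rewrites it as "name value", and awk keeps only
# names ending in "score", e.g. check.overall_score.
list_score_stats() {
  tr '\n' ' ' |
    grep -oE 'stat_name:[[:space:]]*"[^"]+"[[:space:]]*stat_value:[[:space:]]*[0-9]+' |
    sed -E 's/stat_name:[[:space:]]*"([^"]+)"[[:space:]]*stat_value:[[:space:]]*([0-9]+)/\1 \2/' |
    awk '$1 ~ /score$/'
}
```

For example, `list_score_stats < stats.txt` (where `stats.txt` is a hypothetical file holding the dump) prints one pair per line, such as `check.score 74` for the values above.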
pflynn321

Thanks for the insight! I'm digging into the stale counters now.
Mike
We had the same issue occur on our clusters (35). I wish the fix could be automated somehow.
Please explain how to reset these counters. I'm having the same trouble.
Hi guys,

The same issue is happening to me.

Hey guys,
You should check this KB: https://portal.nutanix.com/kb/1964
It worked fine for me.

Just like pflynn321 said, you need to reset your stats counters.

To do so, run these commands over SSH on one CVM of your cluster:

  • allssh ~/cluster/bin/genesis stop cluster_health
  • allssh ~/cluster/bin/genesis stop hyperint
  • allssh ~/cluster/bin/genesis stop prism
  • allssh ~/cluster/bin/genesis stop arithmos
  • allssh rm -Rf ~/data/arithmos/arithmos_per*
  • cluster start
Reconnect to Prism and check the health status.

Good luck!

Note: these commands keep your cluster running (there is no need to stop VMs or anything like that).
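For anyone repeating this on several clusters, the same six commands can be kept in one function and previewed before running them. This is a sketch, not an official Nutanix script. It assumes the function is sourced into an interactive shell on one CVM, where `allssh`, `genesis` and `cluster` already work as in the steps above. `reset_health_counters`, `run_step` and the `DRY_RUN` variable are hypothetical names; with `DRY_RUN=1` the commands are only printed, not executed.

```shell
# Hypothetical wrapper around the KB steps quoted above (not official).
# Assumption: sourced into an interactive shell on one CVM, where
# allssh, genesis and cluster are available as in the manual steps.

# Print the command when DRY_RUN=1, otherwise execute it.
run_step() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$1"
  else
    # eval so that ~ and the arithmos_per* glob expand as if typed by hand
    eval "$1"
  fi
}

reset_health_counters() {
  local svc
  # Stop the four services listed in the steps above, on every CVM.
  for svc in cluster_health hyperint prism arithmos; do
    run_step "allssh ~/cluster/bin/genesis stop $svc" || return 1
  done
  # Delete the arithmos_per* data on every CVM; this is how the steps
  # above reset the stats counters.
  run_step "allssh rm -Rf ~/data/arithmos/arithmos_per*" || return 1
  # Bring the stopped services back up, as in the last step above.
  run_step "cluster start"
}
```

Running `DRY_RUN=1 reset_health_counters` prints the six commands for review; running `reset_health_counters` executes them, after which you reconnect to Prism and check the health status.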