Do you often confuse a generated alert with a Nutanix Cluster Check (NCC) failure?
Here are some points to understand them both:
Alert: Mechanism to report underlying issues in the system.
NCC: Tool to check the cluster health and report an alert if required. (An NCC failure does not always generate an alert.)
Three important sections of the above diagram:
- Notifications received - Cluster Health service.
- Configuration stored - IDF database.
- Alert reported - Alert Manager.
Configuration and Alert Reporting:
The alert reporting is based on the above configuration.
The same configuration can also be fetched with "ncli alerts get-alert-config" and updated with "ncli alerts update-alert-config".
Tools:
For checking the alerts received by nos-alert: Zygrade
For checking an individual alert: Insights
Logs/Command:
Alert Manager leader: alert_tool
To check whether the alert notification was sent to the email recipients: "alert_manager.INFO" log file on the Alert Manager leader.
To check the notification/plugin that generated the alert: "health_server.log" log file on the node that generated the alert.
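As an illustration of the log checks above, here is a minimal sketch of a keyword search over a log file such as "alert_manager.INFO" or "health_server.log". The helper name find_log_lines, the log path and the keyword are all assumptions made for this example. The exact wording of the log messages is not described here, so pick a keyword (for example a recipient address or an alert ID) after looking at the actual log.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_log_lines(log_path: str, keyword: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line that contains keyword.

    The match is case-insensitive. Both arguments are supplied by the
    caller; nothing here assumes a particular Nutanix log format.
    """
    needle = keyword.lower()
    # errors="replace" keeps the scan going if the log has undecodable bytes.
    with Path(log_path).open("r", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line.lower():
                yield number, line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python find_log_lines.py <log_path> <keyword>
    for number, line in find_log_lines(sys.argv[1], sys.argv[2]):
        print(f"{number}: {line}")
```

Run it on the Alert Manager leader for "alert_manager.INFO", or on the node that generated the alert for "health_server.log", passing the real path of the log file as the first argument.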
KB:
KB 1959 Which Alerts Automatically Generate a Support Case with Nutanix Support?
KB 2595 Nutanix Support Services: Pulse and Alerts