NTP (Network Time Protocol) is a critical component of a Nutanix cluster: it keeps hardware, processes, services, and applications running with their clocks synchronized across one another. Inconsistent time synchronization can lead to undesirable consequences, not least operational failures with a potentially disastrous impact on databases and real-time applications. A poor NTP setup can also result in data loss, security breaches that are hard to detect, legal liabilities, and loss of credibility.
NTP alerts are raised when you run the NCC health checks, and typically look like this:
Detailed information for check_ntp:
Node 172.24.0.143:
FAIL: The hypervisor is not synchronizing with any NTP server. This might occur if none of the configured NTP servers are available or you are currently experiencing network instability determined by the high offset/high jitter.
Node 172.24.0.144:
FAIL: NTP leader is not synchronizing to an external NTP server
FAIL: The hypervisor is not synchronizing with any NTP server. This might occur if none of the configured NTP servers are available or you are currently experiencing network instability determined by the high offset/high jitter.
Node 172.24.0.145:
FAIL: The hypervisor is not synchronizing with any NTP server. This might occur if none of the configured NTP servers are available or you are currently experiencing network instability determined by the high offset/high jitter.
Refer to KB 4519 (http://portal.nutanix.com/kb/4519) for details on check_ntp or Recheck with: ncc health_checks network_checks check_ntp --cvm_list=172.24.0.143,172.24.0.144,172.24.0.145
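When a cluster has many nodes, grouping the FAIL lines by node makes the report easier to act on. The following Python sketch is a hypothetical helper, not part of NCC. It assumes the plain-text layout shown above ("Node <IP>:" headers followed by "FAIL:" lines) and rebuilds the recheck command, using the --cvm_list flag exactly as NCC printed it, for only the nodes that reported failures. The function names are illustrative.

```python
"""Hypothetical helper (not part of NCC): group check_ntp FAIL lines by node
and rebuild the recheck command for the nodes that failed."""
import re
from typing import Dict, List

# Matches report headers such as "Node 172.24.0.143:".
NODE_RE = re.compile(r"^Node\s+(\S+?):\s*$")


def failures_by_node(report: str) -> Dict[str, List[str]]:
    """Map each node IP to the list of FAIL messages reported under it."""
    nodes: Dict[str, List[str]] = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        header = NODE_RE.match(line)
        if header:
            current = header.group(1)
            nodes.setdefault(current, [])
        elif current is not None and line.startswith("FAIL:"):
            nodes[current].append(line[len("FAIL:"):].strip())
    return nodes


def recheck_command(nodes: Dict[str, List[str]]) -> str:
    """Build the NCC recheck command for nodes with at least one FAIL."""
    failed = [node for node, fails in nodes.items() if fails]
    return ("ncc health_checks network_checks check_ntp --cvm_list="
            + ",".join(failed))


HV_MSG = ("The hypervisor is not synchronizing with any NTP server. This might occur "
          "if none of the configured NTP servers are available or you are currently "
          "experiencing network instability determined by the high offset/high jitter.")

# The report shown above, reproduced as sample input.
SAMPLE_REPORT = f"""\
Detailed information for check_ntp:
Node 172.24.0.143:
FAIL: {HV_MSG}
Node 172.24.0.144:
FAIL: NTP leader is not synchronizing to an external NTP server
FAIL: {HV_MSG}
Node 172.24.0.145:
FAIL: {HV_MSG}
"""

if __name__ == "__main__":
    report = failures_by_node(SAMPLE_REPORT)
    for node, fails in report.items():
        print(node, len(fails), "failure(s)")
    print(recheck_command(report))
```

Run against the report above, it lists all three nodes (172.24.0.144 with two failures) and reproduces the same recheck command that NCC printed.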
There are a number of interrelated commands that you should run while troubleshooting an NTP-related issue. I often begin by checking NTP functionality and status on the ESXi or AHV hosts in the Nutanix cluster:
Check the NTP peer status:
# ntpq -pn
Check the NTP configuration file:
# cat /etc/ntp.conf
Resolve the NTP server addresses to DNS names, to see whether they are internal servers or external, public ones:
# dig -x <IP-address-of-the-NTP-server>
Check the system clock and whether NTP synchronization is active:
# timedatectl
Check the status of the ntpd service:
# systemctl status ntpd
Restart ntpd:
# systemctl restart ntpd.service
Some customers use chrony instead of ntpd (the service is named chronyd on RHEL/CentOS-based systems and chrony on Debian/Ubuntu):
# systemctl status chronyd
Check which clock source the kernel is using:
# cat /sys/devices/system/clocksource/clocksource0/current_clocksource
Query the variables of each NTP association interactively:
# ntpq
ntpq> assoc
ntpq> rv <assoc-id>
Look specifically at the “dispersion” value. It should generally be below 10.00; if it is more than 16, the NTP server could become unreliable. “jitter” should also be lower than 16.
Search the system logs for NTP messages:
# egrep ntp /var/log/messages*
If, for example, you are working on an Oracle UEK VM, look specifically for lines such as:
c016 06 restart
freq_set kernel 12.589 PPM
ntpd exiting on signal 15
On an ESXi host, which does not use systemd, restart the NTP daemon with its init script:
# /etc/init.d/ntpd restart
As a last resort, ntpdate can query an NTP server directly. -u sends from an unprivileged port (needed while ntpd holds port 123), -v gives verbose output, -b steps the clock instead of slewing it, and -d runs in debug mode without adjusting the clock. Note that the first two commands do change the local clock:
# ntpdate -uv <NTP-server>
# ntpdate -buv <NTP-server>
# ntpdate -dv <NTP-server>
Look for “dispersion” in the output; it should be lower than 16 for a reliable NTP source.
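The readings described above can also be checked in a script. The following is a minimal Python sketch, not a Nutanix or NTP project tool, assuming the standard ten-column "ntpq -pn" table (remote, refid, st, t, when, poll, reach, delay, offset, jitter) and key=value output from "rv". The thresholds are the rules of thumb quoted above, applied in whatever units ntpq prints; the function names, the "marginal" label, and the sample output are hypothetical.

```python
"""Sketch: check `ntpq -pn` and `ntpq` "rv" output against the article's
rule-of-thumb limits. Illustrative only; not a Nutanix or NTP project tool."""
import re
import subprocess
from dataclasses import dataclass
from typing import Dict, List, Optional

# Rule-of-thumb limits quoted in the article, in whatever units ntpq prints.
JITTER_LIMIT = 16.0           # jitter should stay below this
DISPERSION_TARGET = 10.0      # dispersion should generally stay below this
DISPERSION_UNRELIABLE = 16.0  # above this the source may be unreliable


@dataclass
class Peer:
    tally: str    # '*' selected system peer, '+' candidate, ' ' rejected, ...
    remote: str
    refid: str
    stratum: int
    reach: str    # octal reachability register; "0" means never reached
    offset: float
    jitter: float


def parse_ntpq_pn(text: str) -> List[Peer]:
    """Parse the ten-column peer rows that follow the '=====' separator."""
    peers: List[Peer] = []
    in_rows = False
    for line in text.splitlines():
        if line.startswith("="):
            in_rows = True
            continue
        if not in_rows or not line.strip():
            continue
        cols = line[1:].split()  # column 0 of each row is the tally code
        if len(cols) < 10:
            continue  # skip wrapped or malformed rows
        remote, refid, st, _t, _when, _poll, reach, _delay, offset, jitter = cols[:10]
        peers.append(Peer(line[0], remote, refid, int(st), reach,
                          float(offset), float(jitter)))
    return peers


def summarize(peers: List[Peer]) -> dict:
    """Report the selected peer plus unreachable and high-jitter peers."""
    selected: Optional[Peer] = next((p for p in peers if p.tally == "*"), None)
    return {
        "synchronized": selected is not None,
        "system_peer": selected.remote if selected else None,
        "unreachable": [p.remote for p in peers if p.reach == "0" or p.stratum == 16],
        "high_jitter": [p.remote for p in peers if p.jitter >= JITTER_LIMIT],
    }


def parse_rv(text: str) -> Dict[str, str]:
    """Collect key=value pairs from `ntpq -c "rv <assoc-id>"` output."""
    return dict(re.findall(r"(\w+)=([^,\s]+)", text))


def rv_verdict(fields: Dict[str, str]) -> str:
    """Classify one association by its dispersion and jitter values."""
    disp, jitter = float(fields["dispersion"]), float(fields["jitter"])
    if disp > DISPERSION_UNRELIABLE or jitter >= JITTER_LIMIT:
        return "unreliable"
    return "marginal" if disp >= DISPERSION_TARGET else "ok"


def live_peers() -> List[Peer]:
    """Run `ntpq -pn` on the local host (needs ntpd/ntpq installed)."""
    out = subprocess.run(["ntpq", "-pn"], capture_output=True,
                         text=True, check=True).stdout
    return parse_ntpq_pn(out)


# Made-up sample output for illustration (not from a real cluster).
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.1.1.10       192.0.2.1        2 u   33   64  377    0.412   -0.105   0.061
+10.1.1.11       192.0.2.1        2 u   40   64  377    0.398    0.220  21.337
 10.1.1.12       .INIT.          16 u    -   64    0    0.000    0.000   0.000
"""
RV_SAMPLE = "srcadr=10.1.1.10, stratum=2, offset=-0.105, dispersion=1.234, jitter=0.061"

if __name__ == "__main__":
    print(summarize(parse_ntpq_pn(SAMPLE)))
    print(rv_verdict(parse_rv(RV_SAMPLE)))
```

On a host, summarize(live_peers()) would apply the same checks to live output. A table with no '*' row means ntpd has not selected any source, which is the kind of situation the NCC FAIL messages describe; the sample flags 10.1.1.11 for high jitter and 10.1.1.12 as unreachable.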