NTP warnings on NCC

28 July 2020

Nashma
Nutanix Employee

Network Time Protocol (NTP) is a protocol for clock synchronisation between computers. The hosts and CVMs in a Nutanix cluster must be configured to synchronise their system clocks with a list of stable NTP servers. Configure at least one, and preferably three or more, reliable off-cluster NTP servers on the cluster. To avoid split-brain scenarios, an odd number of NTP servers is recommended.
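As a rough illustration of the guidance above, here is a minimal Python sketch that reviews a list of configured NTP servers against those three rules (at least one, preferably three or more, odd count). The function name and the wording of the notes are made up for this example; this is not an NCC check or a Nutanix tool.

```python
def review_ntp_server_count(servers):
    """Return advisory notes for a list of NTP server names or IPs.

    Hypothetical helper: it only encodes the guidance from this post.
    """
    unique = sorted(set(servers))  # ignore duplicate entries
    notes = []
    if len(unique) == 0:
        notes.append("no NTP servers configured")
    elif len(unique) < 3:
        notes.append("fewer than 3 servers; 3 or more are preferred")
    if len(unique) > 0 and len(unique) % 2 == 0:
        notes.append("even number of servers; an odd number is recommended")
    return notes
```

An empty list of notes means the server count matches the guidance; it says nothing about whether the servers are reachable or stable.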

The quick links below cover the main NTP configuration and troubleshooting tasks. Please note that most of them point to documentation for AOS 5.15, the latest LTS version as of 28 July 2020.

To configure NTP servers on the CVMs and AHV, see Configuring NTP Servers in the Prism Web Console Guide. (Configuring NTP servers via Prism updates both the CVMs and the AHV hosts.)
To configure NTP servers on ESXi hosts, see Configuring Network Time Protocol (NTP) on ESX/ESXi hosts using the vSphere Client (2012069).
To configure NTP servers on Hyper-V hosts, see the Configuring NTP on Hyper-V section of KB-4519.
For recommendations on which NTP servers to use, see Recommendations for Time Synchronization.
To troubleshoot NTP sync to Windows time servers, see this KB article.

NTP and DNS issues are among the most frequent NCC failures, and the same handful of NTP configuration problems come up again and again. Here is a quick guide to the most common issues and how to fix them.

The most common causes of NTP check warnings are listed below.

There are no NTP servers configured on the cluster.
There are no NTP servers configured on the hypervisor.
All or some NTP servers configured on the hypervisor are not the same as those configured on the (P)CVMs.
A configured NTP server is not reachable or not responding to NTP queries.
A configured NTP server is not reliable or stable.
The NTP server is configured with a hostname but cannot be resolved due to DNS/name resolution issues.
NTP port (UDP/123) is not open.
The NTP server is passing a parameter that the (P)CVM NTP client considers unsuitable for NTP synchronization, such as a high dispersion value, offset, jitter, reach or stratum.
A Windows-based NTP server (AD PDC) that uses its local clock as its time source will, by default, advertise itself as a less suitable NTP source by reporting a dispersion value of 10 seconds. W32time is not designed with the precision required for NTP and does not guarantee better than +/- 5-minute tolerance.
The genesis service has recently restarted and NTP synchronization is still pending, or the NTP configuration has recently been changed and the change has not yet taken effect (this might take 10 minutes). Waiting 10-15 minutes and rerunning the check may produce a different result once the change has had time to take effect and the clocks have synchronized.
The time on the cluster is out of sync and is ahead of the actual time on the NTP servers by at least 5 seconds.
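Several of the causes above (name resolution, UDP/123 reachability, a high dispersion or stratum, the cluster clock running ahead) can be probed from any machine with a plain SNTP query. The Python sketch below is an illustration only, not what NCC runs: the function names are invented for this example, and the clock comparison ignores network delay, so treat the result as a rough indicator.

```python
import socket
import struct
import time

NTP_PORT = 123
NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def build_request():
    """48-byte client request: LI=0, VN=3, Mode=3 (client), rest zero."""
    return b"\x1b" + 47 * b"\0"


def parse_response(packet, local_unix_time):
    """Extract a few header fields from a raw NTP reply."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    # Root dispersion is a 16.16 fixed-point number of seconds (bytes 8-11).
    root_dispersion_raw = struct.unpack("!I", packet[8:12])[0]
    # Transmit timestamp: seconds since 1900 plus a 32-bit fraction (bytes 40-47).
    tx_seconds, tx_fraction = struct.unpack("!II", packet[40:48])
    server_time = tx_seconds - NTP_EPOCH_OFFSET + tx_fraction / 2**32
    return {
        "leap": packet[0] >> 6,
        "stratum": packet[1],
        "root_dispersion_s": root_dispersion_raw / 2**16,
        # Positive value: the local clock is ahead of the server.
        "local_minus_server_s": local_unix_time - server_time,
    }


def query(server, timeout=2.0):
    """Send one request to `server` over UDP/123 and parse the reply.

    Raises socket.gaierror if the name does not resolve and a timeout
    error if nothing answers (for example, UDP/123 is blocked).
    """
    addr = socket.getaddrinfo(server, NTP_PORT, socket.AF_INET,
                              socket.SOCK_DGRAM)[0][4]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), addr)
        packet, _ = sock.recvfrom(512)
    return parse_response(packet, time.time())
```

Calling query() with one of your configured servers returns a dictionary; a root_dispersion_s around 10 matches the Windows local-clock case described above, and a local_minus_server_s above 5 matches the "in the future" case. For anything beyond a quick probe, the steps in the KB remain the reference.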

When NTP sync is off by a larger margin, it can cause problems in cluster operations, such as inaccurate logging and log collection, guest OS time skew, and users being unable to log on to the Prism web console using LDAP or other directory-integrated services.

The best document for any NTP-related troubleshooting is Knowledge Base article KB-4519. Always work through its General Troubleshooting steps first, and then proceed with the specific troubleshooting steps for the warning message reported by the NCC check.

Keep this checklist handy while troubleshooting any NTP-related issue.

Refer to KB-4519 and follow the general troubleshooting steps.
Make sure you are following the best practice guidelines for NTP configuration.
Ping the configured NTP servers from the CVMs and hosts. An easy way to collect the NTP server IPs or hostnames is from the NTP Servers tab in the Settings pane of the Prism UI.
Keep the 'ntpq' command handy; its usage is explained in more detail in the KB. It lets you check NTP parameters such as dispersion, offset, jitter, reach and stratum, and find the NTP leader for the cluster, which helps you navigate to the right set of logs and the problematic entity.
Always be careful while working with the config files.
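To make the 'ntpq' item more concrete, here is a small Python sketch that reads the peer table printed by `ntpq -pn` and flags peers that never respond (reach 0) or are unsynchronised (stratum 16). The sample table is invented for illustration, the helper names are hypothetical, and the parser assumes the usual ten-column layout; see the KB for how to interpret the values on a real cluster.

```python
# Invented sample in the usual `ntpq -pn` layout (not from a real cluster).
SAMPLE = "\n".join([
    "     remote           refid      st t when poll reach   delay   offset  jitter",
    "==============================================================================",
    "*10.0.0.10       192.168.1.1      3 u   45   64  377    0.512    1.234   0.301",
    "+10.0.0.11       192.168.1.1      3 u   12   64  377    0.498   -0.876   0.250",
    " 10.0.0.12       .INIT.          16 u    -   64    0    0.000    0.000   0.000",
])


def parse_ntpq_peers(text):
    """Parse `ntpq -pn` style output into a list of dicts, one per peer."""
    peers = []
    for line in text.splitlines():
        # Skip blank lines, the column header and the ===== separator.
        if not line.strip() or line.startswith("=") or "refid" in line:
            continue
        # The first character is the tally code; a space means no code.
        tally, fields = line[0], line[1:].split()
        if len(fields) != 10:
            continue  # not a peer row in the layout this sketch expects
        remote, refid, st, _type, _when, poll, reach, delay, offset, jitter = fields
        peers.append({
            "tally": tally,            # '*' marks the peer currently selected
            "remote": remote,
            "refid": refid,
            "stratum": int(st),
            "poll_s": int(poll),
            "reach": int(reach, 8),    # reach is printed in octal
            "delay_ms": float(delay),
            "offset_ms": float(offset),
            "jitter_ms": float(jitter),
        })
    return peers


def find_issues(peers):
    """Return human-readable notes for obviously unhealthy peers."""
    issues = []
    if not any(p["tally"] == "*" for p in peers):
        issues.append("no peer selected as sync source")
    for p in peers:
        if p["reach"] == 0:
            issues.append(f"{p['remote']}: no responses (reach 0)")
        if p["stratum"] >= 16:
            issues.append(f"{p['remote']}: unsynchronised (stratum 16)")
    return issues
```

Running find_issues(parse_ntpq_peers(SAMPLE)) flags only the third sample peer. The sketch deliberately applies no offset or jitter thresholds, since acceptable values depend on the environment and on the specific NCC check.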

And as a standard procedure, always run a complete NCC Health report to check for any other critical failures on the cluster.
