Time Synchronization on Nutanix Cluster

  • 24 October 2019

Reliable, accurate time synchronization is essential for distributed services to work reliably and efficiently.

Network Time Protocol (NTP) is used by devices and services across a network to keep their clocks synchronized, which maintains the reliability and integrity of services, data and other critical functions.

Nutanix AOS, built on web-scale engineering principles, distributes roles and responsibilities across all nodes in the system, forming a large cluster of services that work together. Accurate time sync is therefore a vital requirement for all of these components to work reliably and to preserve system integrity.

Accurate time sync does not just provide integrity and smooth operation; it is also valuable when things don't work as they should. When troubleshooting any service, timestamps from different components are used to correlate events and work out the root cause and impact of a problem.

For a distributed system such as Nutanix AOS to work smoothly, NTP is of critical importance.

The CVMs (Controller Virtual Machines) that make up a Nutanix cluster get their time by syncing to a single member known as the NTP Leader (the Genesis Master). This CVM is responsible for syncing with the NTP servers configured in Prism.

When NTP is properly configured, the Leader CVM will set its own clock to the time provided by the server and then all other CVMs will sync with the Leader's time.

If the Genesis service is restarted on the NTP Leader, the role of syncing with the external time servers passes to the CVM chosen as the next Genesis Master.

If no NTP server has been configured in Prism yet, or the configured server is unusable for any reason, the NTP Leader gets its time from its local clock and the other CVMs sync with that time.
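The leader/follower behaviour described above can be sketched as a small Python model. This is an illustration under stated assumptions, not Nutanix code: the Cluster class and its methods are hypothetical, the order of the CVM list is assumed to be the order in which the Genesis Master role is handed over, and the "reachable" set stands in for whatever makes an NTP server usable.

```python
from dataclasses import dataclass, field

# Illustrative model only: not a Nutanix API.
LOCAL_CLOCK = "127.127.1.0"  # ntpd's local-clock pseudo-address (.LOCL. in ntpq)


@dataclass
class Cluster:
    # CVM IPs. ASSUMPTION: cvms[0] is the current Genesis Master and the rest
    # are in the order the master role would be handed over.
    cvms: list[str]
    # NTP servers configured in Prism.
    ntp_servers: list[str] = field(default_factory=list)
    # Servers that are currently usable (stand-in for real reachability).
    reachable: set[str] = field(default_factory=set)

    def leader(self) -> str:
        """The NTP Leader is the Genesis Master."""
        return self.cvms[0]

    def restart_genesis_on_leader(self) -> None:
        """Hand the leader role to the next CVM (assumed round-robin order)."""
        self.cvms.append(self.cvms.pop(0))

    def time_sources(self, cvm: str) -> list[str]:
        """Return the time source(s) the given CVM syncs with."""
        if cvm != self.leader():
            # Followers sync only with the NTP Leader.
            return [self.leader()]
        usable = [s for s in self.ntp_servers if s in self.reachable]
        # No configured or usable server: the leader uses its local clock.
        return usable or [LOCAL_CLOCK]


cluster = Cluster(
    cvms=["192.168.1.1", "192.168.1.2", "192.168.1.3"],
    ntp_servers=["83.98.201.134", "149.210.142.45"],
    reachable={"83.98.201.134"},
)
print(cluster.time_sources("192.168.1.1"))  # ['83.98.201.134']
print(cluster.time_sources("192.168.1.2"))  # ['192.168.1.1']
```

Calling restart_genesis_on_leader() makes 192.168.1.2 the leader in this model, and emptying reachable makes the leader fall back to 127.127.1.0, mirroring the two cases described above.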

Nutanix AHV hosts use the same list of NTP servers that is defined in Prism and used by the NTP Leader; each host is configured with this list and syncs with those servers independently.

All other hypervisors (ESXi, Hyper-V, Xen) need NTP configured separately, using their own management tools.
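The same rule for hypervisor hosts, as a minimal Python sketch. host_ntp_servers is a hypothetical helper, not a Nutanix or hypervisor API; it only encodes the two cases stated above: AHV hosts take the Prism list, while for ESXi, Hyper-V and Xen only the servers set with their own management tools are modelled.

```python
# Hypothetical helper illustrating the rule above; not a Nutanix or hypervisor API.
def host_ntp_servers(hypervisor: str, prism_servers: list[str],
                     configured_on_host: list[str]) -> list[str]:
    """Return the NTP servers a hypervisor host syncs with."""
    kind = hypervisor.strip().lower()
    if kind == "ahv":
        # AHV hosts are configured with the Prism list and sync with it
        # independently, not through the CVM NTP Leader.
        return list(prism_servers)
    if kind in {"esxi", "hyper-v", "xen"}:
        # Per the text above, these need NTP configured separately, so only
        # servers set with their own tools are modelled (empty = not configured).
        return list(configured_on_host)
    raise ValueError(f"unsupported hypervisor: {hypervisor!r}")


prism = ["83.98.201.134", "149.210.142.45"]
print(host_ntp_servers("AHV", prism, []))   # ['83.98.201.134', '149.210.142.45']
print(host_ntp_servers("ESXi", prism, []))  # [] -> NTP still needs configuring on the host
```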

Ensuring CVMs are configured and syncing with a reliable time source:

The following NCC (Nutanix Cluster Check) command checks for problems with the NTP configuration on all CVMs in a cluster:

nutanix@CVM:~$ ncc health_checks network_checks check_ntp


To list the configured time sources from a CVM shell:

nutanix@CVM:~$ ncli cluster get-ntp-servers


To check the NTP status of all CVMs in the cluster:

nutanix@CVM:~$ allssh ntpq -pn


To show detailed statistics for the local CVM's connection to a single remote NTP server (query only; the local clock is not set):

nutanix@CVM:~$ ntpdate -qdv <remote ntp server ip>
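ntpdate -q reports an offset and a delay for the queried server. These come from the standard NTP on-wire calculation (RFC 5905) over four timestamps: t1 (client sends the request), t2 (server receives it), t3 (server sends the reply) and t4 (client receives the reply). The sketch below applies those formulas to made-up timestamps in milliseconds; the function name and values are illustrative, not taken from a real cluster.

```python
def ntp_offset_delay(t1: float, t2: float, t3: float, t4: float) -> tuple[float, float]:
    """Return (offset, delay) from the four NTP timestamps.

    offset: how far the server clock is ahead of the local clock
            (positive means the local clock is behind)
    delay:  round-trip network delay, excluding server processing time
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Hypothetical timestamps in milliseconds: the local clock is 100 ms behind
# the server, each network leg takes 5 ms and the server spends 1 ms replying.
offset, delay = ntp_offset_delay(t1=0, t2=105, t3=106, t4=11)
print(f"offset={offset} ms, delay={delay} ms")  # offset=100.0 ms, delay=10 ms
```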


Checking the NTP leader on a Nutanix Cluster:

Run the command allssh "ntpq -pn" on any CVM to see the time sources of all CVMs, including which CVM is the NTP Leader.


nutanix@CVM:192.168.1.1:~$ allssh "ntpq -pn"

================== 192.168.1.2 =================
remote refid st t when poll reach delay offset jitter
==============================================================================
*192.168.1.1 83.98.201.134 3 u 67 1024 377 1.297 0.223 0.350

================== 192.168.1.3 =================
remote refid st t when poll reach delay offset jitter
==============================================================================
*192.168.1.1 83.98.201.134 3 u 329 1024 377 0.785 -0.012 0.100

================== 192.168.1.4 =================
remote refid st t when poll reach delay offset jitter
==============================================================================
*192.168.1.1 83.98.201.134 3 u 652 1024 377 1.064 0.142 0.190


================== 192.168.1.1 =================
remote refid st t when poll reach delay offset jitter
==============================================================================
+83.162.149.224 193.67.79.202 2 u 701 1024 377 7.337 0.467 0.129
+80.101.10.35 211.207.249.90 2 u 984 1024 377 17.808 0.727 0.727
-149.210.142.45 131.211.8.244 2 u 270 1024 377 3.259 -0.076 0.175
*83.98.201.134 238.213.222.236 2 u 1 1024 357 2.806 0.193 0.136
127.127.1.0 .LOCL. 10 l 29h 64 0 0.000 0.000 0.000

================== 192.168.1.5 =================
remote refid st t when poll reach delay offset jitter
==============================================================================
*192.168.1.1 83.98.201.134 3 u 846 1024 377 0.793 -0.017 0.166

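The output above shows a five-node cluster (5 CVMs). CVM 192.168.1.1 is the NTP Leader and synchronises with the NTP servers defined in Prism: its remotes are external servers, the asterisk (*) marks 83.98.201.134 as its currently selected source, and 127.127.1.0 (.LOCL.) is its local clock.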

The other CVMs in the cluster (192.168.1.2 – 192.168.1.5) synchronise their time from the NTP Leader, 192.168.1.1, which each of them lists as its selected source at stratum 3.
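Reading the leader out of this output can also be scripted. A minimal Python sketch, assuming the allssh output is available as text with one "==== <CVM IP> ====" banner per CVM as in the example above: it records the remote each CVM marks with * (ntpq's currently selected source) and treats the single CVM whose selected source is not another CVM as the NTP Leader. The function names are hypothetical, and the heuristic relies only on the leader/follower layout described in this post.

```python
import re
from typing import Optional

# Banner allssh prints before each CVM's output, e.g. "===== 192.168.1.2 =====".
BANNER = re.compile(r"^=+\s+(\S+)\s+=+$")


def selected_peers(allssh_output: str) -> dict[str, Optional[str]]:
    """Map each CVM IP to the remote marked '*' in its `ntpq -pn` output (None if none)."""
    peers: dict[str, Optional[str]] = {}
    current: Optional[str] = None
    for raw in allssh_output.splitlines():
        line = raw.strip()
        banner = BANNER.match(line)
        if banner:
            current = banner.group(1)
            peers[current] = None  # not synced until a '*' line is seen
        elif current and line.startswith("*"):
            # '*' is ntpq's tally code for the currently selected source.
            peers[current] = line[1:].split()[0]
    return peers


def ntp_leader(peers: dict[str, Optional[str]]) -> Optional[str]:
    """Return the single CVM whose selected source is not another CVM.

    Followers select the leader; the leader selects an external server
    (or its local clock). Returns None if zero or several CVMs match.
    """
    candidates = [cvm for cvm, peer in peers.items()
                  if peer is not None and peer not in peers]
    return candidates[0] if len(candidates) == 1 else None
```

For the cluster shown above, ntp_leader(selected_peers(text)) should return "192.168.1.1", where text is the captured allssh "ntpq -pn" output (for example redirected to a file, assuming allssh writes to standard output).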


Keep your Nutanix clusters healthy by making sure their time is synced from a reliable, reachable time source.
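One way to judge "reachable" is the reach column of ntpq -pn. It is an 8-bit shift register printed in octal: on each poll it shifts left and the lowest bit is set if the server replied, so 377 means all of the last eight polls succeeded. In the example output, the leader's selected server 83.98.201.134 shows 357, which decodes to 7 of 8 replies (pattern 11101111). A small Python sketch to decode it; decode_reach is a hypothetical helper name.

```python
def decode_reach(reach: str) -> tuple[int, str]:
    """Decode ntpq's octal 'reach' field.

    Returns (successful polls out of the last 8, 8-bit pattern), with the
    most recent poll as the rightmost bit.
    """
    value = int(reach, 8)  # reach is printed in octal
    if not 0 <= value <= 0o377:
        raise ValueError(f"reach must fit in 8 bits: {reach!r}")
    bits = format(value, "08b")
    return bits.count("1"), bits


for reach in ("377", "357", "0"):
    ok, bits = decode_reach(reach)
    print(f"reach {reach:>3}: {ok}/8 replies, pattern {bits}")
```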

Read more about NTP:

  • NTP
  • Recommendations for Time Synchronization

