Nutanix Cluster Upgrade and What It Means to You

  • 16 July 2020

We often ask ourselves what the various upgrades entail and what actually happens during each one. Here is a high-level description of what happens during the various Nutanix core software upgrade processes. I have assumed a cluster of at least three nodes and that you have already determined the target upgrade version.


Note: for all the upgrades below, it is important to run health checks before upgrading any component (see KB 2852).

 

1. NCC:

When: If health checks show no issues, the NCC upgrade can proceed. Upgrading to the latest NCC version shown is recommended.

How: The upgrade process copies the NCC software to each Controller VM or Prism Central VM.

What Happens: The cluster_health service, which is responsible for health monitoring and the logic underlying cluster alerts, restarts on each node.

Caution Level: No services involved in the data path restart, so the upgrade is non-intrusive.

The NCC version does not depend on any other Nutanix component, so it can be upgraded regularly and often to get the most out of the NCC health checks.

This process typically takes about 5 minutes.

 

2. Foundation:

The Foundation service typically runs only during an LCM upgrade or an expand cluster (add node) operation.

When : If health checks show no issues.

How: The Foundation binaries are updated for all nodes.

What Happens: Controller VMs, hypervisor hosts, and all other services continue running and are NOT restarted or rebooted.

Caution Level: As with NCC, no services involved in the data path restart, so the upgrade is non-intrusive. Keeping Foundation at the latest available version is recommended.

This process typically takes about 5 minutes.

 

3. AOS

When: If health checks show no issues. Check the compatibility matrix to determine the upgrade version.

How: The process copies the AOS upgrade software to each Controller VM in the cluster. The Controller VM upgrade and restart tasks proceed in a round-robin fashion across all Controller VMs.

What Happens: The first Controller VM upgrades and restarts. Storage traffic from guest VMs is redirected to other Controller VMs in the cluster during the upgrade. Guest VMs might experience minor latency while traffic is redirected.

Caution Level: All cluster-level activity remains operational. Only one Controller VM (CVM) is allowed to go down at any given time.

This process typically takes about 15 minutes.
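
To make the round-robin behaviour a bit more concrete, here is a minimal Python sketch of a rolling upgrade that never takes more than one Controller VM down at a time. It is only a simulation for illustration: the `ControllerVM` class, the `rolling_upgrade` function, the CVM names, and the "old"/"new" version labels are all hypothetical and are not part of any Nutanix tool or API.

```python
from dataclasses import dataclass


@dataclass
class ControllerVM:
    # Hypothetical model of a CVM, for illustration only.
    name: str
    version: str
    up: bool = True


def rolling_upgrade(cvms, target_version):
    """Upgrade CVMs one at a time; return the peak number of CVMs down at once."""
    max_down = 0
    for cvm in cvms:
        # Refuse to take a CVM down unless every other CVM is up.
        if not all(other.up for other in cvms if other is not cvm):
            raise RuntimeError(f"cannot upgrade {cvm.name}: another CVM is down")
        cvm.up = False  # storage traffic is served by the remaining CVMs
        max_down = max(max_down, sum(not c.up for c in cvms))
        cvm.version = target_version
        cvm.up = True  # the CVM restarts and rejoins before the next one begins
    return max_down


cluster = [ControllerVM(f"cvm-{i}", "old") for i in range(1, 4)]
peak_down = rolling_upgrade(cluster, "new")
print(peak_down, [c.version for c in cluster])
```

The point of the check at the top of the loop is the same as the caution above: the next CVM is only touched once all the others are back up.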

 

4. LCM

When: Run an inventory to see the full list of available updates. Pre-upgrade checks then run to help ensure that the cluster can be upgraded with LCM. Except for certain modules on Dell platforms, all firmware updates performed through LCM require the hosts to boot into a CentOS-based staging area called Phoenix.
 

How: LCM determines the order in which to apply firmware updates, so you do not need to choose which updates to perform first. Select Update All and LCM automatically satisfies the dependencies between all installed firmware.
 

What Happens: LCM evacuates guest VMs from the hosts one at a time and boots each host into the Phoenix staging area to perform the updates. Guest VMs are not powered off, so your workload should continue to be served without disruption.

After the firmware updates are completed, the node's hypervisor host starts, followed by the node's Controller VM, and LCM ensures that all cluster services are up and running.

Finally, LCM ensures that the local hypervisor is schedulable and can host guest VMs before the upgrade continues to the next node.
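
The per-host sequence described above can be sketched as a simple ordered plan. This is a simplified illustration in Python, not how LCM is implemented: the step labels, the host names, and the `lcm_plan` function are my own, and the sketch ignores the case where a host restarts into Phoenix more than once.

```python
# Per-host steps in the order described above (labels are illustrative).
LCM_HOST_STEPS = (
    "evacuate guest VMs",
    "boot host into Phoenix",
    "apply firmware updates",
    "start hypervisor host",
    "start Controller VM",
    "verify cluster services are up",
    "verify hypervisor is schedulable",
)


def lcm_plan(hosts):
    """Return (host, step) pairs; all steps finish on one host before the next starts."""
    return [(host, step) for host in hosts for step in LCM_HOST_STEPS]


plan = lcm_plan(["host-1", "host-2", "host-3"])
for host, step in plan:
    print(f"{host}: {step}")
```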
 

Caution level: Depending on the firmware being upgraded, the hypervisor host might restart several times into Phoenix. This is expected behavior. In this case, do not intervene.

 

This process typically takes 45 minutes per node for SATA DOM firmware upgrades. Otherwise, the amount of time depends on the number of firmware updates occurring on a node and how long it takes to evacuate guest VMs from each host.
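
As a rough planning aid, the SATA DOM figure can be turned into a back-of-the-envelope estimate. The Python sketch below assumes hosts are updated strictly one after another and that the 45 minutes per node holds for every node; the function name is hypothetical, and real durations will vary with VM evacuation time.

```python
def sata_dom_upgrade_minutes(node_count, minutes_per_node=45):
    """Rough total duration when hosts are updated one at a time."""
    if node_count < 0:
        raise ValueError("node_count must be non-negative")
    return node_count * minutes_per_node


total = sata_dom_upgrade_minutes(4)
print(f"{total} minutes (~{total / 60:.1f} hours)")  # 180 minutes (~3.0 hours)
```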
 

5. Prism Central: Upgrade Now Process

When: If health checks show no issues.

How: First the software is staged, then the Prism Central VM upgrades and restarts.

What Happens:

  1. For single-VM Prism Central deployments :

The web console is momentarily unavailable, but this has no effect on the Prism Element clusters managed by Prism Central. Log on to the web console again and ensure that all upgrade tasks have completed to 100 percent.

This process takes approximately 25 minutes.

  2. For three-VM Prism Central scale-out deployments:

The first Prism Central VM upgrades and effectively restarts. The Prism Central VM upgrade and restart tasks process in a round-robin fashion for all Prism Central VMs. The web console is available during the upgrade. Ensure that all upgrade tasks have completed to 100 percent.

This process takes approximately 1 hour.

Additional Documentation:  

https://portal.nutanix.com/page/documents/details?targetId=Acropolis-Upgrade-Guide-v5_17%3AAcropolis-Upgrade-Guide-v5_17

