Most of our customers upgrade their infrastructure once every couple of months. This guide explains how to upgrade and what to expect from the process.
In older versions of AOS, upgrades were done using “one click” or “upgrade software”. With LCM available in newer versions, upgrades are much simpler and more straightforward. Here’s the documentation covering the prerequisites of LCM and how LCM works:
LCM works on one node at a time, and the upgrade proceeds to the next node only if the current node has upgraded successfully to the target version. LCM upgrade tasks can be monitored from “LCM” in the Prism UI.
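The one-node-at-a-time behavior described above can be pictured with a minimal Python sketch. This is not LCM's implementation: `rolling_upgrade` and the `upgrade_node` callback are hypothetical names used only for illustration, and the sketch assumes that a failed node simply halts the run.

```python
from typing import Callable, List, Optional, Tuple


def rolling_upgrade(
    nodes: List[str],
    upgrade_node: Callable[[str], bool],
) -> Tuple[List[str], Optional[str], List[str]]:
    """Upgrade nodes strictly one at a time and stop at the first failure.

    Returns (upgraded, failed_node, not_attempted).
    Hypothetical helper for illustration; not an LCM API.
    """
    upgraded: List[str] = []
    for index, node in enumerate(nodes):
        # Only one node is worked on at any moment.
        if not upgrade_node(node):
            # Nodes after the failed one are never touched.
            return upgraded, node, nodes[index + 1:]
        upgraded.append(node)
    return upgraded, None, []


# Simulated run in which the third node fails to reach the target version.
simulated = {"node-a": True, "node-b": True, "node-c": False, "node-d": True}
done, failed, pending = rolling_upgrade(list(simulated), lambda n: simulated[n])
print("upgraded:", done, "failed:", failed, "not attempted:", pending)
```

The point of the sketch is the early return: nodes that come after a failed node stay on the old version until the failure is resolved.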
User VMs require essentially no downtime during the upgrades. To address performance concerns, it’s always best to schedule a maintenance window for the upgrades, especially on clusters with 1G NICs. Please take a look at the following documentation to understand the sequence in which components are upgraded:
It is important to confirm that each component can be upgraded from its present version to the target version. Here’s the URL to check this:
Here’s the guide to check compatibility between AOS and the hypervisor:
If your hardware isn’t listed in the above matrix, please check the vendor’s website for compatibility with hyperconvergence and the hypervisor. For example, to check DELL hardware compatibility with AOS and hypervisors, navigate as follows:
dell.com --> support --> browse all products --> converged infrastructure --> hyperconverged --> XC series appliances --> choose XC server --> browse documentation --> type "compatibility matrix" in search bar.
For all other components, such as the hypervisor, firmware, and Foundation, the go-to way of checking which versions are available to upgrade to is to perform an LCM inventory. The inventory fetches all compatible versions for each component. It is a non-disruptive process and can be automated as well. LCM reaches download.nutanix.com to fetch the installers. Please note that hypervisors from other vendors have to be uploaded manually.
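Since LCM has to reach download.nutanix.com, a quick reachability probe from a machine on the same network path as the cluster can help rule out firewall problems before running an inventory. The sketch below is only an illustration under stated assumptions: `can_reach` is a hypothetical helper, and port 443 is an assumption that should be confirmed against the firewall requirements in the official documentation.

```python
import socket

LCM_DOWNLOAD_HOST = "download.nutanix.com"  # host named in this guide
ASSUMED_PORT = 443  # assumption: HTTPS; confirm in the firewall requirements


def can_reach(host=LCM_DOWNLOAD_HOST, port=ASSUMED_PORT, timeout=5.0,
              connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened.

    `connect` is injectable so the check can be exercised without a network.
    Hypothetical helper for illustration; it is not part of LCM.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
    conn.close()
    return True
```

Calling `can_reach()` with no arguments performs the real probe. A `False` result only shows that a TCP connection could not be opened from the machine where the script ran; it does not identify the cause.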
Here’s a helpful link to understand Acropolis upgrades:
The first component of the Nutanix infrastructure to upgrade is NCC. An up-to-date NCC avoids false positives during the upgrades and gives accurate cluster health check reports at any point in the upgrade. The NCC upgrade is completely non-disruptive and independent of all other components.
Please refer to the following documentation to understand the prerequisites for an AOS upgrade:
The average time taken to upgrade AOS on each node is 15 minutes. Clusters with two nodes take longer than clusters with three or more nodes. Pre-upgrade tests are run before the upgrade to check whether the cluster is ready for the AOS upgrade. The pre-checks, along with potential causes of pre-upgrade test failures and their solutions, are listed in the article here: portal.nutanix.com/kb/6524
After the pre-upgrade tests complete successfully, the fetched AOS software is uploaded locally to each CVM, one at a time, and each CVM reboots at the end of its upgrade. While a CVM is down, its traffic is routed to another CVM in the cluster.
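For planning a maintenance window, the per-node average can be turned into a rough estimate. The sketch below assumes nodes are upgraded one after another, so the total is simply nodes multiplied by the 15-minute average. The function name is hypothetical, and the extra time needed by two-node clusters is not modelled because this guide gives no figure for it.

```python
AOS_MINUTES_PER_NODE = 15  # average quoted in this guide


def estimate_aos_upgrade_minutes(node_count, minutes_per_node=AOS_MINUTES_PER_NODE):
    """Rough AOS upgrade duration, assuming nodes are upgraded sequentially.

    Hypothetical planning helper: excludes pre-upgrade tests and does not
    model the longer upgrade time of two-node clusters.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return node_count * minutes_per_node


print(estimate_aos_upgrade_minutes(4))  # 4 nodes x 15 minutes = 60
```

Treat the result as a lower bound for scheduling, since the quoted figure is an average.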
Upgrading Prism Central:
To understand how the PC upgrades work, please refer to:
The average time to upgrade a standalone PC is 25 minutes, and the average for a scale-out PC is 1 hour. Pre-upgrade tests are run before the PC upgrade starts, and all of them must pass for the upgrade to proceed. There is no downtime during the pre-upgrade tests. Please refer to the documentation below to understand each pre-upgrade test and the potential causes of failed tests: portal.nutanix.com/kb/6524
The pre-upgrade tests are run to verify whether the cluster is ready for hypervisor upgrade. Apart from certain DELL modules, LCM should be able to upgrade all firmware. The “update all” option lets LCM work out the order and perform the firmware upgrades in sequence. Customers need not worry about whether BMC or BIOS is upgraded first, as LCM takes care of it. The firewall requirements are listed in the documentation here:
LCM upgrades firmware one node at a time and makes sure the services are up and Data Resiliency is OK before proceeding to the next node. The node boots into Phoenix multiple times while the firmware upgrade is running; this is normal and no cause for concern.
Upgrading the Hypervisor:
Make sure to upgrade NCC and firmware before upgrading the hypervisor.
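The ordering rules stated in this guide can be checked against a planned upgrade list with a short sketch. It encodes only two rules taken from the text: NCC goes before everything else, and firmware goes before the hypervisor. The position of other components, such as AOS, is left unconstrained because the full sequence is in the linked documentation. `order_violations` and the component labels are hypothetical names for illustration.

```python
def order_violations(plan):
    """Return (must_come_first, must_come_later) pairs that the plan breaks.

    A rule is only checked when both components appear in the plan.
    Hypothetical helper for illustration; not an LCM API.
    """
    position = {name.lower(): index for index, name in enumerate(plan)}
    # Rule 1: NCC is upgraded before every other component.
    required = [("ncc", other) for other in position if other != "ncc"]
    # Rule 2: firmware is upgraded before the hypervisor.
    required.append(("firmware", "hypervisor"))
    return [
        (first, later)
        for first, later in required
        if first in position and later in position
        and position[first] > position[later]
    ]


print(order_violations(["NCC", "AOS", "Firmware", "Hypervisor"]))
print(order_violations(["Hypervisor", "NCC", "Firmware"]))
```

An empty list means the plan respects both rules. Each returned pair names a component that should have come earlier and the component it was wrongly placed after.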
For AHV: Please refer to the documentation here https://portal.nutanix.com/page/documents/details?targetId=Web-Console-Guide-Prism-v510%3Aupg-hypervisor-upgrade-ahv-c.html&a=6c512d227c25fcff1f8bac1b65e44554c4ca6cf8a558507fc607fc027662213f282f2f85696a4070
For ESXi: Please refer to the documentation here
For Hyper-V: Please refer to the documentation here
For the hypervisor support policy, please refer to portal.nutanix.com/kb/3123
Pre-upgrade tests are run before the upgrades. Please refer to the following KB for details: portal.nutanix.com/kb/6524
A further advantage of LCM hypervisor upgrades is that LCM automates migrating the VMs, putting the host and CVM into maintenance mode, performing the upgrade, and bringing the host back online. LCM makes sure Data Resiliency is OK before proceeding to the next host.
Here’s the URL to download the Nutanix software:
Issues with the upgrade? Check portal.nutanix.com for the common error signatures from failed LCM tasks.