How It Works
Have questions about how the Nutanix platform works? Looking to get started? Start here!
- 1,191 Topics
- 1,844 Replies
What is Nutanix Guest Tools (NGT)? Nutanix Guest Tools (NGT) is a software-based in-guest agent framework that enables advanced VM management functionality through the Nutanix platform. The solution is composed of the NGT installer, which is installed on the VMs, and the Guest Tools Framework, which coordinates between the agent and the Nutanix platform. The NGT installer contains the following components: Guest Agent Service; Self-Service Restore (SSR), also known as File-Level Restore (FLR), CLI; VM Mobility Drivers (VirtIO drivers for AHV); VSS Agent and Hardware Provider for Windows VMs; application-consistent snapshot support for Linux VMs (via scripts to quiesce). The framework is composed of a few high-level components: Guest Tools Service and Guest Agent. The figure shows the high-level mapping of the components: Important notes: NGT uses TCP/IP network connectivity secured with SSL. The installation includes identifiers unique to the VM and the cluster, but you can pre-install NGT on a clone ba
Recommended upgrade order for the components of a Nutanix cluster: 1. Start with Prism Central (PC), if deployed: upgrade NCC, upgrade PC, run NCC health checks, and check for impacted clusters. 2. Then upgrade Prism Element (PE): upgrade NCC on all your clusters, run NCC health checks, and correct any issues found before moving forward. 3. Upgrade Foundation to the latest version on each PE. 4. Update Lifecycle Manager (LCM): perform an LCM inventory, which updates the LCM framework to the latest software. During this time, do not upgrade anything else on that cluster. This step only updates the LCM framework itself and performs inventory; no reboots or upgrades to the server are performed. 5. Upgrade AOS on each PE cluster. 6. Update firmware versions using LCM. Perform this step only if all the previous steps are complete and health checked. 7. Upgrade the hypervisor. For AHV clusters: upgrade AHV to the bundled version from the PE
Let’s say you want to know the uses of, and differences between, Prism Element, Prism Central, and Prism Pro. Nutanix Prism is the centralized management solution for Nutanix environments; however, Prism comes in different flavors depending on the functionality needed. Prism Element: a service built into the platform for every Nutanix cluster deployed. It provides the ability to fully configure, manage, and monitor Nutanix clusters running any hypervisor; however, it only manages the cluster it is part of. Prism Central: an application that can be deployed in a VM or in a scale-out cluster of VMs (a Prism Central instance) that allows you to manage different clusters across separate physical locations on one screen and offers an organizational view into a distributed environment. Each Prism Central VM can manage 5,000 to 12,500 VMs, whereas a Prism Central instance (a cluster of three Prism Central VMs) can manage up to 25,000 VMs. Prism Pro: It is a feature set
Nutanix presents us with many management interfaces, such as HTML5 (Prism), the REST API, acli, and ncli, for managing, troubleshooting, and maintaining infrastructure. We will look at how to access the Nutanix command-line interface and at the capabilities and purpose of the acli and ncli commands. aCLI: Acropolis Command Line Interface. A utility to create, modify, and manage VMs in AHV. It provides extra commands to manage AHV host networking, manual snapshots, etc. It cannot manage the Nutanix cluster, which is why we have ncli. nCLI: Nutanix Command Line Interface. A utility to manage the operations of the entire Nutanix cluster. ncli is a more extensive and complex command set. To access the CLI: You can install it on your local machine. Check out how to install ncli on your local machine here. From any Controller VM: SSH to any CVM as the nutanix user, type ncli, and press Return to enter the ncli command shell; the process is the same for acli. Once inside the shell, you can view the list o
Nutanix, like any other system, is composed of several components, including hardware, software, and firmware. If one component has to communicate with another at any given moment, that connection has to be pre-integrated and supported. How do you keep up and understand what is compatible with what? Easy: with our compatibility matrix. For example, you have a Nutanix block model NX-8035-G5 and you want to know the supported AOS and hypervisor versions for that model. The matrix shows the AOS, AHV, and hypervisor compatibility for Nutanix NX and SX Series platforms and software-only models qualified by Nutanix (such as Cisco UCS, Dell PowerEdge, HPE ProLiant and others listed here). For other platforms not listed here, such as Dell XC, Lenovo HX, and others, please see your vendor documentation for compatibility. The matrix presents the fields in a cascading style to show clear component dependencies. Also in the compatibility matrix, you can see the Nutanix Ready Solution where we show the
Shutting down and restarting a Nutanix cluster requires care: the proper steps must be followed in order to bring your VMs and data back up in a healthy and consistent state. Nutanix is a hypervisor-agnostic platform; it supports AHV, Hyper-V, ESXi, and XEN. This makes it all the more important to read the following Nutanix KB, which details the steps required to gracefully shut down and restart a Nutanix cluster with any of these hypervisors. Nutanix KB: How to Shut Down a Cluster and Start it Again?
What is the difference between the Redundancy Factor and Replication Factor? Redundancy factor 3 is a configurable option that allows a Nutanix cluster to withstand the failure of two nodes or drives in different blocks. By default, Nutanix clusters have redundancy factor 2, which means they can tolerate the failure of a single node or drive. The larger the cluster, the more likely it is to experience multiple failures. Redundancy Factor 3 requirements: a minimum of 5 nodes in the cluster; CVMs configured with 32 GB RAM. For a guest VM to tolerate a simultaneous failure of 2 nodes or 2 disks in different blocks, the VM data must be stored on a container with replication factor 3. NOTE: A Nutanix cluster with FT2 enabled can host storage containers with RF=2 and RF=3. Redundancy Factor 2 requirements: a minimum of 3 nodes in the cluster; CVMs configured with 24 GB RAM. Some background to understand Redundancy Factor. Cassandra. Key role: distributed metadata store. Description: Cassandra stores and manag
Many users are not aware that a recent change has been made to the default password setting of new Nutanix nodes. Specifically, the default password for the IPMI interface is now the serial number of the node itself (using capital letters). Please note that the node serial number is different from the block serial number. You can find more information about this change in the Common BMC and IPMI Utilities and Examples Knowledge Base article. If you want to change the IPMI password, you can do so using the IPMI management utility located within the file system of the operating system running on the node. You can even change the password without an operating system installed or running, by using the utility from a bootable DOS environment. You can find more information in the Changing the IPMI Password section of the NX Series Hardware Administration Guide.
Receiving emails about EOL, but no clue what it means? Once a product is EOL (end of life), no further upgrades or updates are released for it and it is no longer supported by Nutanix. This may include the AOS version, Prism Central, Nutanix Files, or supported hardware platforms. To view the EOL information, navigate as follows: Log in to the Nutanix Support Portal. Open Menu > Documentation. Choose EOL Information. Select the entity you want to see the information for. Don’t want to follow the above steps and like shortcuts? Click here then. Have any questions? Drop a comment and let us start a discussion.
All, I have a 6 Node 1065 system in 2 blocks. Recently one of the CVMs (node 2, block A) crashed. When rebooted it was just going in loops. When diagnosed it seems the SSD (not the SATADOM) had failed and we replaced it. When we try to boot the CVM, it still just loops. We were told to boot that node with Phoenix which the cluster provided me for download. I do that and it doesn’t load Phoenix and gets errors instead. I’m looking for a suggestion of how to get the node back to 100%. At this point (and throughout) the ESXi on the SATADOM has booted fine and I guess if I didn’t care about the storage side I could just ignore this but I’d like the system to be fully healthy. Any suggestion about how to get the CVM working again would be appreciated. Thank you Johan
Hi all, Local Replication is a process in which multiple copies of data are stored within a storage container. These copies exist for fault tolerance. Snapshots are placed locally on the same cluster as the source VM. Thus, if a physical disk fails, the cluster can recover data from another copy. The cluster manages the replicated data, and the copies are not visible to the user. So, what is the difference from the Replication Factor option? RF is also used for fault tolerance in case of a physical disk failure (or node, ...). Thanks
Ever wondered what the main services and components that make up Nutanix are? The following is a simplified view of the main Nutanix cluster components. All components run on multiple nodes in the cluster and depend on connectivity with their peers that run the same component. Cluster components: Zeus. Key role: access interface for Zookeeper. A key element of a distributed system, Zeus is a method for all nodes to store and update the cluster's configuration. This Zeus configuration includes details about the physical components in the cluster, such as hosts and disks, and logical components, like storage containers. The state of these components, including their IP addresses, capacities, and data replication rules, is also stored in the cluster configuration. Zeus is the Nutanix library that all other components use to access the cluster configuration, which is currently implemented using Apache Zookeeper. Medusa. Key role: access interface for Cassandra. Distributed sy
A traditional Nutanix cluster requires a minimum of three nodes, but Nutanix also offers the option of a two-node cluster for ROBO implementations and other situations that require a lower-cost yet highly resilient option. Unlike a one-node cluster (see Single-Node Clusters), a two-node cluster can still provide many of the resiliency features of a three-node cluster. This is possible by adding an external Witness VM in a separate failure domain to the configuration (see Configuring a Witness (two-node cluster)). Nevertheless, there are some restrictions when employing a two-node cluster. The following links provide guidelines and information about configuring two-node clusters: Two-Node Cluster Guidelines; Two-Node Clusters. Ask any questions to clarify any concerns about two-node clusters.
Thinking about changing your replication factor from 2 to 3? What are the impacts and things to consider? First, let’s take a look at what replication factor is. Redundancy factor is a configurable option that allows a Nutanix cluster to withstand the failure of nodes or drives in different blocks. By default, Nutanix clusters have redundancy factor 2, which means they can tolerate the failure of a single node or drive. So RF3 means the cluster can tolerate the failure of 2 nodes or drives… Basic maths, isn’t it? Redundancy factor 3 has the following requirements: Redundancy factor 3 can be enabled at the time of cluster creation or afterwards. A cluster must have at least five nodes for redundancy factor 3 to be enabled. For guest VMs to tolerate the simultaneous failure of two nodes or drives in different blocks, the data must be stored on containers with replication factor 3. Controller VMs must be configured with a minimum of 28 GB (20 GB default + 8 GB for the featur
Hi, A customer moved his Nutanix cluster (with Hyper-V) from one DC to another. After powering the nodes up, the IPs of all Hyper-V hosts and CVMs were released. I logged in locally to the Hyper-V hosts and configured the internal IP (192.168.5.1/28) and the external IP, the same as before shutting the cluster down. I repeated the previous step with the CVMs: I went through cd/etc/sysconfigs/, edited the network-scripts file, and added the external IP in eth0 and the internal one (192.168.5.2/28) in eth1. Now Hyper-V FC is working fine but cannot start VMs due to the Nutanix cluster issue. Whenever I try to start the cluster from any CVM, I get this message: “WARNING genesis_utils.py:1211 Failed to reach a node where Genesis is up. Retrying” Is there any way to fix this issue or to repair the cluster configuration without disrupting existing data? Thanks in advance
Let’s say you have a Nutanix cluster with 24 CPU cores and 2 sockets, and now you’re confused by the terminology. Are vCPUs, cores per vCPU, and how to provision CPU to a virtual machine confusing you? Let’s break the terminology down in simple terms! In the world of hardware, we have sockets and cores. In a host, there would be 2 sockets (or CPUs) and 12 cores in each socket, resulting in 24 cores. So what is a vCPU? vCPU corresponds to the number of sockets for the VM. Cores per vCPU corresponds to the cores in a socket. So, if you have provisioned your VM with the following configuration: 2 vCPUs, 4 cores per vCPU, your VM would have 2 sockets and 8 cores in total. How can I provision my vCPU? Is there a guide or documentation for it? Absolutely yes, please give the following article a read: CPU Configuration. So is there a way to overprovision my CPU? Absolutely yes, please give the following article a read: CPU Oversubscription
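The arithmetic in this post can be written out as a minimal Python sketch. The function names below are hypothetical and chosen only for illustration; they are not part of any Nutanix tool or API. The sketch simply multiplies sockets by cores, for the host and for the VM, using the numbers from the post.

```python
def host_physical_cores(sockets: int, cores_per_socket: int) -> int:
    """Total physical cores in a host: sockets multiplied by cores per socket."""
    if sockets < 1 or cores_per_socket < 1:
        raise ValueError("sockets and cores_per_socket must be positive")
    return sockets * cores_per_socket


def vm_cpu_topology(vcpus: int, cores_per_vcpu: int) -> dict:
    """Topology the VM sees, following the post: each vCPU is presented
    as one socket, and cores per vCPU are the cores in that socket."""
    if vcpus < 1 or cores_per_vcpu < 1:
        raise ValueError("vcpus and cores_per_vcpu must be positive")
    return {"sockets": vcpus, "total_cores": vcpus * cores_per_vcpu}


# Numbers taken from the post: a host with 2 sockets of 12 cores each,
# and a VM provisioned with 2 vCPUs and 4 cores per vCPU.
print(host_physical_cores(2, 12))
print(vm_cpu_topology(2, 4))
```

With the values from the post, the host has 24 physical cores and the VM sees 2 sockets with 8 cores in total.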
On 31st March 2020 we made available our latest Long Term Support (LTS) release of AOS, version 5.15. This LTS release builds upon a mature and proven AOS codebase which customers have already been running successfully in their production environments. End of Support Life (EOSL) and Release Information: AOS 5.15 is a Long Term Support (LTS) release. For information on AOS Long Term Support (LTS) and Short Term Support (STS) releases, please see KB 5505 or the Support policies page. Please refer to the AOS EOL Schedule for release details. If you are on an EOSL release, please plan on moving to one of the following to avoid disruption in support: AOS 5.15 (LTS) or a supported LTS release; AOS 5.16 (STS) or a supported STS release for rapid adoption of new features mentioned in the release notes. Hardware Compatibility List (HCL) for Approved Platforms and EOL: Information on the Hardware Compatibility Guidelines and EOL can be found on the Support policies page. Please refer to
Want to know how to change the default credentials on the cluster? If a Nutanix cluster still uses the default credentials, the NCC health check reports an INFO message (default_password_check) to let you know. You can change your CVM, hypervisor, and IPMI passwords using the guides below; you can also use a script to change them on all nodes at once. Nutanix portal document
Cluster data usage grows, and occasionally it grows more rapidly than planned, before a new node can be added to handle the growth. Nutanix offers data compression and deduplication to hold more data in the container, by reducing the stored size and avoiding duplicate data, respectively. Usually compression should be used first, as deduplication is recommended only in some specific scenarios (please check the documentation links provided below). Compression: You can enable compression on a storage container. Compression can save physical storage space and improve I/O bandwidth and memory usage, which may have a positive impact on overall system performance. The following types of compression are available. - Post-process compression: Data is compressed after it is written. The delay time between write and compression is configurable, and Nutanix recommends a delay of 60 minutes. If compression is enabled in "Post-process", then existing data will also
Let’s say you want to add a new node with a newer processor class than the existing nodes in the cluster. In this case you must enable the EVC (Enhanced vMotion Compatibility) feature. What is EVC? EVC stands for Enhanced vMotion Compatibility, a vCenter Server cluster-centric feature that allows virtual machines to vMotion or migrate across ESXi hosts equipped with dissimilar processors in the same cluster. VMware EVC mode works by masking unsupported processor features, thus presenting a homogeneous processor front to all the virtual machines in a cluster. This means that a VM can vMotion to any ESXi host in a cluster irrespective of the host’s micro-architecture, examples of which include Intel’s Sandy Bridge and Haswell. One caveat to remember is that all the processors must be from a single vendor, i.e. either Intel or AMD. You simply cannot mix and match. What are the benefits? The main benefit is that you can add servers with the latest processors to your existing cluste
There are times when we need to move a VM disk to a different container on the same AHV cluster. For example, we may want to move the VM disk to a container with deduplication disabled. To relocate a virtual machine’s disk to a different container on the same AHV cluster, the following steps are required. Requirements for the move: Source container ID (where the vmdisk is located originally). Destination container ID (our target container on the same cluster). VM disk UUID(s) (the UUID of each disk we need to move, from “acli vm.get <vm-name>”). Power off the VM. Summary of steps: Determine the vmdisk_uuid of each virtual disk on the VM. Make sure the VM whose vmdisk we are migrating is powered off. Use the Acropolis Image Service to clone the source virtual disk(s) into image(s) on the target container. Attach the disk from the Acropolis image(s) created in Step 3 to the VM. Remove the VM disk that is hosted on the original container. Optional: Remove the cloned VMdisk from image services
In some cases, you might have to permanently remove a physical node (host) from a Nutanix cluster. There are two node removal scenarios: permanently removing an online node, and removing an offline or unresponsive node. In a 4-node cluster, at least 30% free space must be available to avoid filling any disk beyond 95%. You cannot remove nodes from a 3-node cluster because a minimum of three Zeus nodes is required. Some points to consider before initiating node removal: Sufficient disk space must be available on the other nodes in the cluster. User virtual machines may need to be relocated (if required). No software upgrade should be running. Checklist for verifying cluster health status: Data resiliency is “OK” (green) in Prism. Run a complete “ncc report” either from Prism or the CVM CLI: ncc health_checks run_all Depending on the size of the data, node removal can be a lengthy process, which involves relocating data from the node to other healthy nodes in the cluster. Node removal also remo
The Prism interface allows you to investigate disk I/O latency, which raises the following questions. What should be the average latency on a production cluster? What should be the maximum latency? At what point is the latency too high? How do you investigate high latency? Note: Nutanix recommends that maximum latency readings should not be used as a measure of cluster performance and health. Average latency is a useful measure of cluster performance and health. Consider the following for latency investigations. The end-user impact for any performance investigation: if the impact is not measurable by the end-user, then any investigation of performance statistics is going to reveal normal and healthy cluster operations. The factors on which investigations depend: VM combinations, traffic type at the time, write or read size, sequential versus non-sequential access, and read versus write. Latency Variables in a Nutanix Cluster: The following points provide you with the information