How It Works
Have questions about how the Nutanix platform works? Looking to get started? Start here!
Did you know that in Acropolis (AHV) you can enable high availability for the cluster to ensure that VMs are migrated and restarted on another node in case of a failure? Best-effort VM availability is enabled by default in Acropolis. Virtual Machine High Availability (VM HA) is a feature designed to ensure that critical VMs are restarted on another Acropolis Hypervisor (AHV) host within the cluster if a host fails. There are two VM high availability modes:

- Default - This mode requires no configuration and is included by default when an Acropolis Hypervisor-based Nutanix cluster is installed. When an AHV host becomes unavailable, the VMs that were running on the failed host are restarted on the remaining hosts, based on the available resources. Not all of the failed VMs will restart if the remaining hosts do not have sufficient resources.
- Guarantee - This non-default configuration reserves space to guarantee that all failed VMs will restart on other hosts of the cluster.

You can also check the current HA state from the CLI, as sketched below.
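As a rough sketch, you can inspect the HA state from any CVM with acli; the exact output fields vary by AOS version, so treat this as illustrative rather than definitive:

nutanix@cvm$ acli ha.get
# Shows whether HA failover is enabled and which reservation mode is in effect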
We all know that the virtual IP is used to access the Prism web console. It is also referred to as the external IP address of the cluster. The cluster virtual IP is mapped to the CVM that is the Prism service leader. Every time a new leader is elected, the virtual IP is transferred to the new leader CVM, ensuring Prism Element availability. The NCC health check virtual_ip_check verifies that the cluster virtual IP is configured and reachable. It is scheduled to run every hour by default and will generate an alert after one failure. To manually verify the virtual IP settings in the Prism web console:

- Click the cluster name in the main menu of the Prism web console dashboard.
- In the Cluster Details pop-up window, check that a Cluster Virtual IP is configured and is correct.
- Check that the configured virtual IP is in the same subnet as the CVM IPs.

The settings can also be accessed from the CLI (see the sketch below). Have a read about this check and the scenarios where it can throw alerts: https://portal.nutanix.c
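As a minimal sketch of the CLI route: the grep pattern below assumes ncli prints the virtual IP as an "External IP Address" field, and the NCC plugin path is an assumption that may differ between NCC versions:

nutanix@cvm$ ncli cluster info | grep -i "external ip"
# Run the individual check (verify the exact plugin path in the KB):
nutanix@cvm$ ncc health_checks system_checks virtual_ip_check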
Did you know that you can directly access the files in a Nutanix container from your local desktop? It is possible using WinSCP. This is handy for uploading files to, or downloading files from, a Nutanix container. For example, say you want to download a Phoenix ISO you generated on a CVM. Here are the steps to access a container:

- Open WinSCP.
- Connect to the CVM IP using the SFTP protocol and port 2222.
- Log in using the admin/Prism Element credentials.
- Enable the option to show hidden files by going to Options > Preferences > Panels and then selecting the "Show hidden files" option under the common settings.
- From here you can either upload or download files to the container.

Note: Do not delete any data from the container via WinSCP or a similar tool. Appropriate Prism or CVM command-line workflows should be leveraged to perform any cleanup if needed. To see the steps in detail, take a look at https://portal.nutanix.com/page/documents/kbs/details/?targetId=kA0
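If you prefer a plain command line over WinSCP, the same SFTP endpoint can be reached with any standard client. This is a sketch; <cvm_ip> is a placeholder and the container path shown is illustrative:

# Connect to the CVM's SFTP service on port 2222 using the Prism admin credentials
sftp -P 2222 admin@<cvm_ip>
# Inside the session, containers appear as top-level directories (illustrative path):
sftp> get /<container_name>/phoenix.iso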
To perform core VM management operations directly from Prism without switching to vCenter Server, you need to register your cluster with the vCenter Server. The Nutanix cluster communicates with vCenter Server to obtain virtual machine information necessary for certain Nutanix cluster operations, such as Data Protection and One-Click upgrades. If the vCenter Server is not registered or is not accessible, those operations may fail. The NCC health check check_vcenter_connection is also in place to verify that the vCenter Server is registered with Prism and that a connection can be established. Follow these steps to register your cluster with vCenter:

- Log into the Prism web console.
- Click the gear icon in the main menu and then select vCenter Registration in the Settings page.
- Click the Register link.
- Enter the administrator user name and password of the vCenter Server in the Admin Username and Admin Password fields.
- Click Register.

There are also some important points about registration worth noting.
The NCC health check check_vcenter_connection verifies that the vCenter Server is registered with Prism and that a connection can be established. The Nutanix cluster communicates with vCenter Server to obtain virtual machine information necessary for certain Nutanix cluster operations, such as Data Protection and One-Click upgrades. If the vCenter Server is not registered or is not accessible, those operations may fail.

- The check returns a PASS if vCenter Server is registered with Prism Element and a connection can be established.
- The check returns an INFO if vCenter is not registered with Prism.
- The check returns a FAIL if vCenter is registered with Prism and the connection to vCenter cannot be established.

This check is scheduled to run every 5 minutes by default and will generate an alert after 3 consecutive failures across scheduled intervals. To take a look at the NCC check and the solution section, see https://portal.nutanix.com/page/documents/kbs/details/?targetId=kA032000000TVQACA4 For instructions on registering vCenter with Prism, see the registration steps above.
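As a sketch, the check can also be run on its own from any CVM; the plugin path below is an assumption and may differ between NCC versions, so confirm it in the KB:

nutanix@cvm$ ncc health_checks hypervisor_checks check_vcenter_connection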
Below are new knowledge base articles published during the week of April 12-18, 2020.

- KB 9109 - Cannot Login to Prism with AD account
- KB 9168 - Custom Virtual Machine Report Creation in Prism Central
- KB 9223 - Nutanix Files: FileServer preupgrade check failed with cause(s) Sub task poll timed out
- KB 9224 - Restore the SSR snapshots using CLI
- KB 9232 - LCM Failure: Could not find pnics attached to the CVM interface
- KB 9239 - LCM upgrade failure on HPE - The node is not in production mode
- KB 9264 - AHV IDE bus performance implications

Note: You may need to log in to the Support Portal to view some of these articles.
Our vision at Nutanix has been, and will always be, "one platform, any app, any location". This has been our goal almost from the beginning. We have openly documented the Nutanix architecture in the freely available Nutanix Bible, and we are committed to open-source software, actively using and contributing code within a variety of communities. For example, the Nutanix Bible includes a simplified architecture diagram of a Nutanix environment. In the documentation, you will find all the information you need regarding the software, hardware, performance, recommended practices, restrictions, and more.
Flow is a software-defined networking product tightly integrated into Nutanix AHV and Prism. Flow provides rich visualization, automation, and security for VMs running on AHV. There are instances in which Flow is not configured in your environment, but you still see the following alert: "Flow Control Plane Failed". There could be multiple reasons for the alert to recur, but if you are sure that you have not enabled Flow in your infrastructure, you can use the following KB article to get a better idea about the alert, confirm whether Flow is enabled, and find possible workarounds: "Flow Control Plane Failed" alert appears on PE even if Flow is not enabled. Want to know more about Flow and its best practices? The following document will help you understand the architecture and common guidelines regarding Nutanix Flow: Nutanix Flow Guide
Are you looking for a way to support more than 12K VMs in your Prism Central installation? Then you should consider scaling out Prism Central. Nutanix introduced the Prism Central scale-out architecture in AOS 5.6. It allows customers to grow their Prism Central deployments incrementally, depending on their needs. A single Prism Central instance can support up to 12K VMs; with Prism Central scale-out, the number of supported VMs increases to 25K powered-on VMs. A scale-out Prism Central deployment consists of 3 VMs and has been architected to tolerate one node failure (n+1 fault tolerance). The following requirements must be met before you can expand Prism Central or deploy a Prism Central VM:

- The specified gateway must be reachable.
- No duplicate IP addresses can be used.
- The container used for deployment is mounted on the hypervisor hosts.
- When installing on an ESXi cluster: vCenter and the ESXi cluster must be configured properly. See the vSphere Administration Guide.
In a traditional Nutanix cluster, at least 3 nodes are expected to form a cluster. There is, however, an option to form a single-node or two-node cluster for ROBO (Remote Office/Branch Office) implementations or as a backup site. A two-node cluster works differently from the usual cluster of 3 or more nodes. Some examples:

- You cannot expand a two-node cluster.
- Node removal is not supported.
- There is no cluster stop for a two-node cluster.

So what is the graceful way to shut down and start a two-node cluster? How to shut down a two-node cluster:

- Ensure in Prism that the cluster data resiliency is OK and the cluster can tolerate one node down.
- Stop user VMs with a graceful shutdown. There is no cluster stop for two-node clusters.
- Log in to a CVM using the nutanix account and perform a graceful shutdown of the CVM (see the sketch below).
- Wait for 5-10 minutes, and then shut down the second CVM.
- Shut down the hosts.

NOTE: The commands used to shut down the two CVMs are different. Take a look at the related documentation for the exact commands and the startup procedure.
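As a minimal sketch of the CVM shutdown step: the commonly documented wrapper is cvm_shutdown, which coordinates the shutdown token. The exact per-CVM commands for a two-node cluster differ (per the note above), so follow the official procedure and treat this as illustrative:

nutanix@cvm$ cvm_shutdown -P now
# Gracefully shuts down this CVM after acquiring the shutdown token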
The NCC health check duplicate_hypervisor_ip_check detects IP addresses that conflict with any of the PE/cluster hypervisor host IPs on the same network. It does so by enumerating all available IPs for the external hypervisor interfaces in the cluster, checking responses to these IPs on the local network, and calling out any duplicate IPs that are detected. This check results in a PASS status if none of the hypervisor IPs have been duplicated in the network. If this check returns a FAIL status, it means that either the hypervisor external IP or the hypervisor backplane network has an IP address that conflicts with another device on the same network. What can be the impact? Hypervisor host connectivity can become unstable or unavailable, leading to performance impact, redundancy concerns, and potential downtime. Below is an overview of the steps involved in case the check reports a failure: run an arping command from any of the CVMs and ensure a reply is received from only one MAC address.
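A rough sketch of that arping step, assuming the CVM's external interface is eth0 and <host_ip> is the hypervisor IP under suspicion:

# Send a couple of ARP requests and note how many distinct MAC addresses reply
nutanix@cvm$ sudo arping -c 2 -I eth0 <host_ip>
# Replies from more than one MAC address indicate a duplicate IP on the network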
Pulse is an essential tool for maintaining uptime on a Nutanix cluster. While alert emails can directly open a case for an issue that has already happened, the data gathered and sent by Pulse enables identification of potential known issues that haven't impacted your cluster yet. Enabling this feature within a Nutanix cluster is fairly simple, but depending on your network setup and security there may be some additional steps to make sure it's working. When you first set up your cluster, right around the time you accept the EULA and change the password for 'admin', you are given the option to disable Pulse. So long as you don't choose to disable it, Pulse will attempt to work with the default settings. For some environments it's just that simple: the cluster will start sending data periodically to Nutanix, and our Support Portal will highlight any concerns identified based on that data. The configuration is once per cluster. If you want to check or update your Pulse configuration, you can do so from the Prism settings.
Below are new knowledge base articles published during the week of April 19-25, 2020.

- KB 9225 - Flow | Traffic between Target Group VMs may be blocked whenever a VM Migrate or Power On event is triggered
- KB 9272 - How to open Urls within the Frame Session
- KB 9281 - Xi Frame on AHV Manual CCA Configuration

Note: You may need to log in to the Support Portal to view some of these articles.
If you've set up a new Nutanix cluster, you've seen the popup during the initial login that asks if you want to disable Pulse (not recommended). Yes, that one. So you may be wondering: what will it send, and why would I want to enable this? The purpose and scope of Pulse data is cluster health monitoring. The data sent to Nutanix includes hardware, software, and firmware version information, storage usage and configuration details, resiliency status, CVM health status information, and some limited detail about VMs on the cluster. In the Pulse settings UI in Prism, you have the option to enable or disable additional support information such as entity names, so you can keep those user-friendly names private if needed. More complete detail on what gets included in Pulse data transmissions can be found in the document Nutanix Pulse and Remote Diagnostics, or in the article "What is in a Pulse submission to Nutanix?" So why send this data to Nutanix? The short answer is "so that Nutanix can identify potential issues before they impact your cluster."
Currently, we have two types of releases:

- Short Term Support (STS) releases, which bring new features but also imply regular upgrades.
- Long Term Support (LTS) releases, which are maintained for a longer duration and provide primarily bug fixes for an extended period of time on a particular release family.

Whichever type of release you have chosen (LTS or STS), you can identify which type is currently installed using one of the following methods:

- By looking for the "Is LTS" field in the nCLI version output.
- By looking for the "Is LTS" field in the cluster information (see the sketch below).
- From the release tag file contents on the CVM.
- From the Zeus configuration printer.
- From the upgrade history.

To take a look at the commands involved, how the output looks, and what to expect, please give KB-6235 a read. To know more about short-term and long-term releases, take a look at https://next.nutanix.com/prism-infrastructure-management-26/long-term-vs-short-term-support-releases-lts-vs-sts-37087
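As a quick sketch of the cluster-information method from any CVM; the grep pattern assumes the field is printed as "Is LTS" in the ncli output:

nutanix@cvm$ ncli cluster info | grep -i "is lts"
# "Is LTS : true" indicates an LTS release family; "false" indicates STS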
The NCC health check idf_db_to_db_sync_heartbeat_status_check verifies the consistency of data replicated from the source cluster to the replica cluster. IDF is the Insights Data Fabric, a database that contains data about all the entities in the cluster; it has to be synced between PE and PC. For the purposes of the check, the source cluster is where the check is run from, and the replica cluster is the DB instance that is compared against. This check returns a FAIL status when the heartbeat sync time between the database on the source cluster and the database on the replica cluster crosses a predefined threshold value (10 minutes, by default). This check is scheduled to run every 5 minutes by default and will generate an alert after 3 consecutive failures across scheduled intervals. What is the impact? The data replicated from the source to the replica cluster is not up to date. This could be because some services are not working as expected. If this check returns FAIL, verify that cluster services are running as expected and that connectivity between PE and PC is healthy.
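As a sketch, the check can be run individually from a CVM; the plugin path is an assumption and may vary by NCC version, so verify it against the KB:

nutanix@cvm$ ncc health_checks system_checks idf_db_to_db_sync_heartbeat_status_check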
What is the difference between the redundancy factor and the replication factor? Redundancy factor 3 is a configurable option that allows a Nutanix cluster to withstand the failure of two nodes or drives in different blocks. By default, Nutanix clusters have redundancy factor 2, which means they can tolerate the failure of a single node or drive. The larger the cluster, the more likely it is to experience multiple failures.

Redundancy factor 3 requirements:
- Minimum 5 nodes in the cluster.
- CVMs configured with 32 GB RAM.
- For a guest VM to tolerate a simultaneous failure of 2 nodes or 2 disks in different blocks, its data must be stored on a container with replication factor 3.

NOTE: A Nutanix cluster with FT2 enabled can host storage containers with both RF=2 and RF=3.

Redundancy factor 2 requirements:
- Minimum 3 nodes in the cluster.
- CVMs configured with 24 GB RAM.

Some background to understand the redundancy factor:

Cassandra
- Key Role: Distributed metadata store
- Description: Cassandra stores and manages all of the cluster metadata in a distributed ring-like manner.
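As a minimal sketch, you can confirm the redundancy state the cluster is currently operating with from any CVM; verify the exact command output against your AOS documentation:

nutanix@cvm$ ncli cluster get-redundancy-state
# Reports the current and desired redundancy factor for the cluster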
There are 3 types of Nutanix nodes:

- An HCI node is the most common kind of node. It is heavily equipped with all three components: processing capacity (CPU), memory (RAM), and data storage capacity. An HCI node can run any kind of supported hypervisor.
- A storage node only runs the Nutanix AHV hypervisor. These nodes have a bare minimum of processing and memory capacity; however, as the name suggests, they have plenty of storage onboard.
- A compute node only runs the Nutanix AHV hypervisor. Logically, these are nodes that have minimal storage onboard but are powerful compute and memory devices.

We are going to take a look at storage nodes in this post. Storage nodes (sometimes referred to as "light compute" nodes) are nodes that only use AHV as their hypervisor. Due to their nature, no user VMs should be run on these nodes. You will still see a CVM. CVMs are integral to Nutanix architecture as they process all storage input/output requests. Simply put, you must have a CVM on each node.
There are 3 types of Nutanix nodes:

- An HCI node is the most common kind of node. It is heavily equipped with all three components: processing capacity (CPU), memory (RAM), and data storage capacity. An HCI node can run any kind of supported hypervisor.
- A storage node only runs the Nutanix AHV hypervisor. These nodes have a bare minimum of processing and memory capacity; however, as the name suggests, they have plenty of storage onboard.
- A compute node only runs the Nutanix AHV hypervisor. Logically, these are nodes that have minimal storage onboard but are powerful compute and memory devices.

We are going to take a look at compute nodes in this post. A compute-only (CO) node allows you to seamlessly and efficiently expand the computing capacity (CPU and memory) of your AHV cluster. The Nutanix cluster uses the resources (CPUs and memory) of a CO node exclusively for computing purposes. CO nodes enable you to achieve more control and value from restrictive licenses such as Oracle.
Secure Boot is specifically designed to prevent malicious boot loader attacks and has been the most widely accepted approach for both Windows and Linux. Secure Boot is supported by major hardware and hypervisor vendors. The Unified Extensible Firmware Interface (UEFI) is a specification that connects a computer's firmware to its operating system (OS). UEFI is maintained by an open industry forum, the UEFI Forum, and is expected to eventually replace BIOS. In order for Nutanix to fully support Secure Boot, Nutanix binaries are now signed with keys that the hardware trusts. The Nutanix public keys will either be available by default in the hardware, or a customer will need to manually obtain the Nutanix public keys and import them into the hardware's UEFI administration interface. With the Nutanix public keys made available in the hardware, UEFI will allow Nutanix binaries to boot securely. The NCC check returns a PASS if the following is true: all hosts are running with Secure Boot enabled.
Here is a tip on how to reset the password for a Nutanix Prism local user. The guide applies to Nutanix Prism Central as well. There are two ways:

1. Using the Prism interface. Log in to Prism Element with a domain user and go to User Management. From the users window, choose the user and click Update. At the bottom of the applet there is a Reset Password button.
2. Using the command line. Log in to a CVM over SSH and get into the ncli command line. Follow the step-by-step instructions by reading Reset Web Console or nCLI Password (a sketch is shown below).

Also, if you want to know how to modify the Prism user password expiry days for security reasons, you can follow this article: Modify Prism admin and console user password expire days.
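As a minimal sketch of the ncli route; confirm the exact subcommand syntax in the KB linked above before using it:

nutanix@cvm$ ncli user reset-password user-name=admin password='<new_password>'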
There may be instances where you receive the following alert in your environment: "Cassandra on CVM x.x.x.31 is now detached from the ring due to Node was down for a long time. Node detach procedure done by x.x.x.41." Before digging into the alert, let's first understand what Cassandra and the ring structure are. Cassandra stores and manages all of the cluster metadata in a distributed ring-like manner, based upon a heavily modified Apache Cassandra. The Paxos algorithm is utilized to enforce strict consistency. This service runs on every node in the cluster. Cassandra is accessed via an interface called Medusa. To know more about Cassandra and the ring structure, try going through the Nutanix Bible, which explains the architecture. Cassandra has a feature called auto ring repair, which helps prevent taking the cluster down due to multiple node failures at different intervals. The following article lists more information about the alert, the auto-repair feature, and how to resolve the issue.
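As a sketch, you can view the current Cassandra ring membership from any CVM with nodetool; the flags below are the ones commonly shown in Nutanix KBs, but verify against your AOS version:

nutanix@cvm$ nodetool -h 0 ring
# Healthy CVMs show as Up/Normal; a detached node is absent from the ring output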
Hey all, at the suggestion of some people, I'm sharing this video I've created that goes through the VMware upgrade process using Nutanix 1-click functionality. Check it out if you'd like. Apologies in advance for the poor quality; it was the best way for me to hide any confidential info on screen. Cheers!
Below are the top knowledge base articles for the month of April 2020.

- KB 7604 - Disk space usage for root on Controller VM has exceeded 80%
- KB 4116 - Alert - A1187, A1188 - ECCErrorsLast1Day, ECCErrorsLast10Days
- KB 7503 - G6, G7 platforms - DIMM Error handling and replacement policy
- KB 1540 - What to do when /home partition or /home/nutanix directory is full
- KB 4141 - Alert - A1046 - PowerSupplyDown
- KB 4158 - Alert - A1104 - PhysicalDiskBad
- KB 4409 - LCM: (LifeCycle Manager) Troubleshooting Guide
- KB 1113 - HDD/SSD Troubleshooting
- KB 8792 - NCC checks: same_hypervisor_version_check, duplicate_cvm_ip_check, same_timezone_check, esx_sioc_status_check, power_supply_check, orphan_vm_snapshot_check giving ERR
- KB 2090 - AHV | Host and Guest Networking
- KB 1523 - NCC Health Check: disk_usage_check
- KB 2473 - NCC Health Check: cvm_memory_usage_check
- KB 2486 - NCC Health Check: cvm_mtu_check
- KB 4519 - NCC Health Check: check_ntp
- KB 4541 - Alert - A101055 - MetadataDiskMountedCheck
- KB 3357 - NCC H
This article describes how to rename the CVM entity in Prism within the AHV environment. This does not change the hostname of the CVM displayed in a console/SSH session, as the hostname is a value configured within the CVM OS and is independent of the Controller VM name entity recorded in the Prism entity DB (Insights Data Fabric). It is the same name as in the CVM's XML definition on the AHV host, which can be checked by listing the VMs on the host. The name displayed there is the name that is shown in the Prism web console. Starting from AOS 5.10.10, 5.15, and 5.16, a change_cvm_display_name script is available on the CVM that can be used to change the display name (see the sketch below). Following is an overview of the rename procedure. Complete the pre-checks:

- The name must start with NTNX- and end with -CVM.
- Only letters, numbers, and "-" are supported in the name.
- The CVM should be able to receive a shutdown token. Once the CVM has the shutdown token, it will be turned off.

The script does not put the CVM or the host into maintenance mode.
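As a sketch of invoking the script; the flag names below are an assumption based on the script's purpose (they are not confirmed by this post), so check the KB for the exact usage:

# Hypothetical invocation; verify the flag names against the official KB
nutanix@cvm$ change_cvm_display_name --cvm_ip=<cvm_ip> --cvm_name=NTNX-NewName-CVM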