How It Works
Have questions about how the Nutanix Platform works? Looking to get started? Start here!
The NCC health check check_vcenter_connection verifies whether the vCenter Server is registered with Prism and whether a connection can be established. The Nutanix cluster communicates with vCenter Server to obtain virtual machine information necessary for certain Nutanix cluster operations such as Data Protection and One-Click upgrades. If the vCenter Server is not registered or is not accessible, those operations may fail. The check returns a PASS if vCenter Server is registered with Prism Element and a connection can be established. The check returns an INFO if vCenter is not registered with Prism. The check returns a FAIL if vCenter is registered with Prism and the connection to vCenter cannot be established. This check is scheduled to run every 5 minutes by default and generates an alert after 3 consecutive failures across scheduled intervals. For details of the NCC check and its solution section, see https://portal.nutanix.com/page/documents/kbs/details/?targetId=kA032000000TVQACA4 For instructio
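The PASS/INFO/FAIL rules and the alert threshold described above can be sketched in a few lines of Python. This is only an illustration of the logic as stated in the post, not the NCC implementation; the function names and the list-of-results input are assumptions made for the example.

```python
# Illustrative sketch only: names and data shapes are hypothetical,
# not part of NCC itself.

ALERT_THRESHOLD = 3  # consecutive failures before an alert, as stated in the post


def check_result(registered: bool, reachable: bool) -> str:
    """Map the vCenter state to the result the post describes."""
    if not registered:
        return "INFO"  # vCenter is not registered with Prism
    return "PASS" if reachable else "FAIL"


def should_alert(results: list[str], threshold: int = ALERT_THRESHOLD) -> bool:
    """Return True when the most recent `threshold` results are all FAIL."""
    if len(results) < threshold:
        return False
    return all(result == "FAIL" for result in results[-threshold:])
```

With the default 5-minute schedule, three consecutive FAIL results correspond to roughly 15 minutes of lost connectivity before the alert is raised.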
To perform core VM management operations directly from Prism without switching to vCenter Server, you need to register your cluster with the vCenter Server. The Nutanix cluster communicates with vCenter Server to obtain virtual machine information necessary for certain Nutanix cluster operations such as Data Protection and One-Click upgrades. If the vCenter Server is not registered or is not accessible, those operations may fail. The NCC health check check_vcenter_connection is also in place to verify whether the vCenter Server is registered with Prism and whether a connection can be established. Follow these steps to register your cluster with vCenter: 1. Log in to the Prism web console. 2. Click the gear icon in the main menu, then select vCenter Registration on the Settings page. 3. Click the Register link. 4. Enter the administrator user name and password of the vCenter Server in the Admin Username and Admin Password fields. 5. Click Register. Following are some of the important points about regi
Did you know that you can directly access the files in a Nutanix container from your local desktop? It is possible using WinSCP. This process lets you upload files to and download files from a Nutanix container. For example, you may want to download a Phoenix ISO that you generated on a CVM. Here are the steps to access a container: 1. Open WinSCP. 2. Connect to the CVM IP using the SFTP protocol and port 2222. 3. Log in using the admin/Prism Element credentials. 4. Enable the option to show hidden files by going to Options > Preferences > Panels and then selecting the “Show hidden files” option under the common settings. From here you can either upload files to or download files from the container. Note: Do not delete any data from the container via WinSCP or a similar tool. Use the appropriate Prism or CVM command-line workflows to perform cleanup if needed. For the detailed steps, see https://portal.nutanix.com/page/documents/kbs/details/?targetId=kA0
We all know that the virtual IP is used to access the Prism web console. It is also referred to as the external IP address of the cluster. The cluster virtual IP is mapped to the CVM that is the Prism service leader. Every time a new leader is elected, the virtual IP is transferred to the new leader CVM, ensuring Prism Element availability. The NCC health check virtual_ip_check verifies whether the cluster virtual IP is configured and reachable. It is scheduled to run every hour by default and generates an alert after 1 failure. To manually verify the virtual IP settings in the Prism web console: 1. Click the cluster name in the main menu of the Prism web console dashboard. 2. In the Cluster Details pop-up window, check that a Cluster Virtual IP is configured and is correct. 3. Check that the configured virtual IP is in the same subnet as the CVM IP. The settings can also be accessed from the CLI. Have a read about this check and the scenarios where it can throw alerts https://portal.nutanix.c
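The same-subnet verification mentioned above can be done by hand, or sketched with Python's standard `ipaddress` module. The function name, the example addresses and the prefix length are assumptions for illustration; substitute the values from your own cluster.

```python
import ipaddress


def same_subnet(virtual_ip: str, cvm_ip: str, prefix_length: int) -> bool:
    """Return True if virtual_ip falls inside the CVM's subnet.

    prefix_length is the CVM network's prefix (for example 24 for a
    255.255.255.0 netmask); it is an input you must supply.
    """
    cvm_network = ipaddress.ip_interface(f"{cvm_ip}/{prefix_length}").network
    return ipaddress.ip_address(virtual_ip) in cvm_network
```

For example, `same_subnet("10.0.0.50", "10.0.0.11", 24)` is true, while a virtual IP of `10.0.1.50` with the same CVM address and prefix is not in the subnet.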
Did you know that in Acropolis (AHV) you can enable high availability for the cluster to ensure that VMs can be restarted on another node in case of a failure? Best effort VM availability is enabled by default in Acropolis. Virtual Machine High Availability (VM HA): VM HA is a feature designed to ensure that critical VMs are restarted on another Acropolis Hypervisor (AHV) host within the cluster if a host fails. There are two VM high availability modes: Default - This mode does not require any configuration and is included by default when an Acropolis Hypervisor-based Nutanix cluster is installed. When an AHV host becomes unavailable, VMs that were running on the failed AHV host are restarted on the remaining hosts, based on the available resources. Not all of the failed VMs will restart if the remaining hosts do not have sufficient resources. Guarantee - This non-default configuration reserves space to guarantee that all failed VMs will restart on other hosts of the clu
Let’s say you need to administer your user VMs from the command line interface. The Controller VM resources are shown on the VM page in Nutanix Prism, but you will not be able to change the resource configuration unless you connect to the Acropolis hypervisor (host) and modify the configuration using virsh. “virsh is a command line interface tool for managing guests and the hypervisor.” (Centos.org) First, you can review the settings of the CVM on the VM page in Prism. Connect to the Acropolis hypervisor (host) using the root account with password “nutanix/4u”. Lists all the VMs on a host > virsh list --all Displays information about a VM > virsh dominfo VM_Name Displays information about the vCPU > virsh vcpuinfo VM_Name Sets the number of virtual processors > virsh setvcpus VM_Name count Note: The count value cannot exceed the number of processors specified for the guest. You can increase the number of processors by editing the virs
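If you want to script the virsh commands listed above rather than type them, a thin Python wrapper is one option. This is a minimal sketch, assuming Python is available on the host and that you are running as a user allowed to call virsh; the helper names `virsh_command` and `run_virsh` are hypothetical, and only the virsh subcommands named in the post are used.

```python
import subprocess


def virsh_command(action: str, *args: str) -> list[str]:
    """Build the argument list for a virsh call, e.g. ['virsh', 'list', '--all']."""
    return ["virsh", action, *args]


def run_virsh(action: str, *args: str) -> str:
    """Run virsh and return its standard output; raises if virsh exits non-zero."""
    completed = subprocess.run(
        virsh_command(action, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

For instance, `run_virsh("list", "--all")` lists all VMs on the host and `run_virsh("dominfo", "VM_Name")` shows information about one VM, where `VM_Name` is a placeholder for a real VM name.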
Let's say that you ran the health checks on your cluster and received a data_replication_check failure. What does it mean and how do you fix it? The NCC health check data_replication_check helps to ensure that customers are not impacted by an extremely rare condition that can result in the inability to restore from snapshots. This issue is covered in Field Advisory 28. Scenario 1 - Cluster is running an old version of AOS. Scenario 2 - Cluster was recently upgraded, but has snapshots created before the AOS upgrade. For a full explanation and the solution for both scenarios, check out the health check documentation at: KB-2089. To make sure that the issue is resolved, you can run the dedicated health check for this component by connecting to a CVM and running: "ncc health_checks data_protection_checks protection_domain_checks data_replication_check" Note: This health check has been retired from NCC 3.9.3.
Scalability is the backbone of any industry solution; with ever-increasing workloads, it is natural to add new nodes to existing infrastructure to keep it scalable and efficient. Nutanix provides a feature that allows you to add new nodes to your existing cluster and increase the cluster's overall capacity. A question that may come to mind: are there guidelines for cluster expansion? Cluster expansion depends on the AOS version, hypervisor type (AHV, ESXi, Hyper-V, or Citrix Hypervisor), data-at-rest encryption status, and certain hardware configuration factors. The following documents provide some basic guidelines specific to cluster expansion: Cluster Expansion; .NEXT: Want to expand your cluster
It may be helpful to determine which commands were run previously when troubleshooting a current issue. Let’s first understand what the aCLI and nCLI command utilities are. aCLI: utility to create, modify and manage VMs in AHV. nCLI: utility to manage cluster operations. The history files are hidden and persist across reboots. The article below explains how to retrieve the command history: https://portal.nutanix.com/page/documents/kbs/details/?targetId=kA032000000TSsjCAG Also, take a look at how to check cluster upgrade history: https://next.nutanix.com/discussion-forum-14/checking-cluster-upgrade-history-37417
Did you know that each node of the cluster keeps records of the upgrade history of its components? This is convenient when determining the timeline of events during troubleshooting. Each of the components below has a dedicated history file: AOS Hypervisor NCC Files Hardware CVM memory In addition, there is a history of maintenance mode engagement on the cluster. The article below lists the commands necessary to retrieve a specific component’s upgrade history. https://portal.nutanix.com/page/documents/kbs/details/?targetId=kA0600000008VYECA2 Want to know how upgrades work in the Nutanix architecture? Take a look at the following Knowledge Base article. https://portal.nutanix.com/page/documents/kbs/details/?targetId=kA00e000000LMgICAW
Below are new knowledge base articles published during the week of April 5-11, 2020. KB 8842 - NCC Health Check: cassandra_ssd_size_check; KB 9198 - [Prism Central] Prism Central 22.214.171.124 - Docker logs for docker based services are missing; KB 9209 - Zoom Security Statement; KB 9211 - How to remove Ghost NICs on Windows. Note: You may need to log in to the Support Portal to view some of these articles.
I have read in the ECA course some recommendations from Nutanix regarding AHV networking, especially for LACP. If this is a best practice, please, Nutanix, do as usual and set it by default: "ovs-vsctl set port br0-up other_config:lacp-fallback-ab=true" "ovs-vsctl set port br0-up other_config:lacp-time=fast"
You may have encountered an alert about PE-PC Connection Failure, or regarding IDF db to db sync. It’s possible that you had a visible issue with Prism Central’s management and monitoring of clusters, but also possible that you didn’t. What are these alerts for? How do we know if there’s really a problem? What can we do to fix it? I would like to answer these questions for you. Prism Element and Prism Central need two-way communication for a number of reasons. Manageability, forecasting, alert visibility, and reporting in Prism Central all depend on periodic syncing of data from PE to PC. Prism Central’s VM and infrastructure management, image creation, DR orchestration, and other enhanced features like Calm require API communication from PC to PE. PE-PC Connection Failure alerts indicate communication failure. This could be a short-lived issue and even an expected one, like when PC is rebooted during an upgrade, or it could reflect a more prolonged connection loss. Seeing the
NCC (Nutanix Cluster Check) is a framework of scripts that can help diagnose cluster health. NCC can be run provided that the individual nodes are up, regardless of the cluster state. NCC checks can be run from the Prism web console for clusters where AOS 5.0 or above and NCC 3.0 or above are installed. Procedure: 1. Log on to the Prism web console. 2. Browse to the Health page, select Actions > Run Checks. 3. Select the checks that you want to run. 4. Enable/disable the sending of the report email, then click Run. 5. Browse to the Tasks page to review the summary by clicking on the task status ‘Succeeded’. 6. Download the summary by clicking on ‘Download output’. Note: You cannot run NCC checks from the Prism web console for clusters where AOS 4.7.x or earlier and NCC 3.0 or earlier are installed. Log Collector: Logs can be collected for Controller VMs, file server, hardware, alerts, hypervisor, and for the system. Procedure: 1. Log on to the Prism web console. 2. Browse to H
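The version requirement for running NCC checks from the Prism web console (AOS 5.0 or above and NCC 3.0 or above, as stated in the post) can be expressed as a small comparison. This is a sketch under the assumption that version strings are purely numeric and dot-separated; the function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '5.15' into (5, 15)."""
    return tuple(int(part) for part in version.split("."))


def can_run_ncc_from_prism(aos_version: str, ncc_version: str) -> bool:
    """Apply the minimums stated in the post: AOS >= 5.0 and NCC >= 3.0."""
    return parse_version(aos_version) >= (5, 0) and parse_version(ncc_version) >= (3, 0)
```

Comparing tuples of integers avoids the classic string-comparison trap where "5.10" would sort before "5.9".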
Logging in to the cluster 5 times a day (we can’t imagine why, but you’re the boss) and always re-attempting the login because you can never enter the password right? Concerned about the security of transmitting your password across the network? Looking for an extra security layer? All of these are valid reasons to enable key-based authentication for SSH sessions. Generally speaking, there are two options: go completely passwordless, or use a combination of a public key and a password. How does it work? You generate a key pair (a public and a private key) using the RSA algorithm. You never share the private key with any other system or server. The public key, on the other hand, you add to Prism. At the first login, the public RSA key of the server (Prism in this instance) is shared with the client (session initiator), which, together with the public key of the host recorded in Prism, constitutes a mutual public key exchange. Since the public key on its own, without its private part, is not enou
Let’s consider a scenario where you have two NX AHV clusters, A and B, and there is an urgent need to move a VM from cluster A to cluster B. What options are available? Do you need to create a protection domain? Do you need to import vdisks from cluster A to cluster B? These are some of the questions that might be on your mind. The following knowledge article lists the different scenarios available and the steps required in each scenario: AHV | How to move VM disks between two AHV Clusters. To understand the concepts of data protection and disaster recovery in the Nutanix environment, the following documents might be helpful: Data Protection and Disaster Recovery for AHV based VMs; Data Protection and Disaster Recovery
What is Prism Central? Prism Central is a multi-cluster manager responsible for managing multiple Acropolis Clusters to provide a single, centralized management interface. Prism Central is an optional software appliance (VM) which can be deployed in addition to the Acropolis Cluster (can run on it). How to access Prism Central? Prism Central can be accessed using the IP address specified during configuration or the corresponding DNS entry. The figure illustrates the conceptual relationship between Prism Central and Prism Element. Here is a brief explanation of each of the main pages in the Prism Central UI: Home Page: Environment-wide monitoring dashboard including detailed information on service status, capacity planning, performance, tasks, etc. To get further information on any of them, you can click on the item of interest. Virtual Infrastructure: Virtual entities (e.g. VMs, containers, Images, categories, etc.) Policies: Policy management and creation (e.g. securi
NTP, or Network Time Protocol, is an integral component of any infrastructure as it allows for accurate timestamps of events across the environment. Whether it is relied on for monitoring, troubleshooting, or services and applications operations, an NTP source is one of the key components of modern IT infrastructure. It helps different systems stay in sync with each other. Controller Virtual Machines (CVMs) are the backbone of Nutanix Hyper-converged Infrastructure. It is important for all the CVMs in the cluster to be in sync with each other. The CVMs use NTP to sync time between themselves within the cluster after electing one of the CVMs as an NTP leader, even when an external NTP server is not configured, not available or not being used by design. Is there a check which validates time sync between the CVMs? The NCC health check cvm_time_drift_check verifies the time differences between CVMs (Controller VMs). Why is it important for the CVMs to be in time sync with each oth
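To make the idea of "time differences between CVMs" concrete, here is a minimal Python sketch that computes the largest pairwise drift from a set of clock readings. It is not how cvm_time_drift_check is implemented; the function names, the input format (CVM name mapped to a Unix timestamp in seconds) and the `tolerance_seconds` parameter are all assumptions for illustration, and the post does not state what tolerance the real check uses.

```python
def max_time_drift(cvm_times: dict[str, float]) -> float:
    """Largest difference, in seconds, between any two CVM clock readings."""
    if len(cvm_times) < 2:
        return 0.0  # nothing to compare against
    readings = cvm_times.values()
    return max(readings) - min(readings)


def exceeds_tolerance(cvm_times: dict[str, float], tolerance_seconds: float) -> bool:
    """True if the clocks differ by more than the caller-supplied tolerance."""
    return max_time_drift(cvm_times) > tolerance_seconds
```

The largest pairwise difference is simply the gap between the fastest and slowest clock, so there is no need to compare every pair explicitly.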
The Nutanix CVM is what runs the Nutanix software and serves all of the I/O operations to the hypervisor and all VMs running on that host; hence it is crucial to configure CVMs with the right amount of resources. The RAM and CPU allocation of the CVM both depend on the model of the nodes, the storage capacity of the cluster and the features used in the cluster. Each environment is unique and dynamic by nature: features are turned on and off, nodes are upgraded, storage is added. The following guide helps us understand the memory requirements of the CVM for different models and features. The next time you are planning to add a feature or upgrade hardware in the cluster, look at the minimum requirements to make sure that cluster performance is not affected. After all, there are very few things more satisfying than a smooth maintenance window. Acropolis Advanced Administration Guide: CVM Memory Requirement Prism Web Console Guide v5.16: Increa
On 31st March 2020 we made available our latest Long Term Support (LTS) release of AOS, version 5.15. This LTS release builds upon a mature and proven AOS codebase which customers have already been running successfully in their production environments. End of Support Life (EOSL) and Release Information: AOS 5.15 is a Long Term Support (LTS) Release. For information on AOS Long Term Support (LTS) and Short Term Support (STS) releases, please see KB 5505 or the Support policies page. Please refer to the AOS EOL Schedule for release details. If you are on an EOSL release, please plan on moving to one of the following to avoid disruption in support: AOS 5.15 (LTS) or a supported LTS release; AOS 5.16 (STS) or a supported STS release, for rapid adoption of new features mentioned in the release notes. Hardware Compatibility List (HCL) for Approved Platforms and EOL: Information on the Hardware Compatibility Guidelines and EOL can be found on the Support policies page. Please refer to
Below are new knowledge base articles published during the week of March 29-April 4, 2020. KB 9119 - Alert A1159 "Flash Mode Usage Limit Exceeded" is not being raised; KB 9123 - Alert - A160050 - FileServerConfigureNameServicesFailed; KB 9130 - Clicking Source Entity = Container or Disk in an Alert/Event/Task will error out: "Storage Container or Disk with id '<UUID> ' was not found"; KB 9131 - Prism Central (PC) 1-click deployment fails on Prism Element (PE) cluster version prior to 5.10.10, 5.11.2, 5.16 and 5.17; KB 9135 - Nutanix Move: "DuplicateName" for ESXi to ESXi Moves on the same vCenter; KB 9147 - Migrating CentOS/RHEL 5.11 VM by Move; KB 9160 - Nutanix Support Organization response to COVID-19; KB 9163 - Kubernetes pod fails to mount a volume that is internally-attached to a VM; KB 9169 - Era Registration Fails due to multiple IP Addresses; KB 9173 - LCM Failure: The following entities cannot be updated as they are disabled; KB 9178 - ESXi 6.7U3 Upgrade on PRIMEFLEX Using 1-Click Wi
Hi! I’m trying to find a way to automatically create expiring snapshots via the API. I know I can use the v2 API to create normal snapshots, and with Playbooks I can create snapshots which have an expiry time. Is there a way to create recovery points (on-demand) or run playbooks via the API? Thanks!
When it comes to administering the Nutanix cluster, it's very important to control permissions and to restrict access to critical components such as CVMs, AHV hosts and the Prism UI. Any user that has write access to these components can make drastic changes to the Nutanix environment and thus to your organization’s production/testing environment. Here is how to reset the password for each of the components above: AHV | Root account password reset: check out KB-7068. Reset Web Console or nCLI Password: check out KB-1200. To recover CVM Password Through the Prism Web Console: check out KB-2233.