How It Works
Have questions about how the Nutanix Platform works? Looking to get started? Start here!
The Nutanix Controller VM (CVM) is the brain of a Nutanix cluster. It runs the Nutanix AOS software to form a highly resilient cluster of 3 or more nodes, offers rich data services, virtual machine management and hosting services, and replication services, and checksums all data and metadata to ensure data consistency and integrity. Each CVM communicates over an IP network with all other CVMs and hosts in the cluster. Changing a CVM IP address should therefore be planned carefully, covering all of the following points plus anything specific to your environment:
- The CVM and the hypervisor host are required to be in the same subnet (192.168.5.x).
- The hypervisor host can be multi-homed, but the previous point remains mandatory.
- The IPMI subnet should be reachable to and from the CVM.
- Cluster virtual IP address.
- iSCSI Data Services IP (used by Volumes, Files, Objects, Karbon, Leap).
- Network Segmentation check (backplane traffic).
- Remote sites and any ongoing replication.
- Guest VM downtime (this is required for the re-IP).
Shutting down and restarting a Nutanix cluster requires some consideration, and the proper steps must be followed in order to bring your VMs and data back up in a healthy, consistent state. Nutanix is a hypervisor-agnostic platform: it supports AHV, Hyper-V, ESXi and Xen. This makes it all the more important to read the following Nutanix KB, which details the steps required to gracefully shut down and restart a Nutanix cluster on any of these hypervisors. Nutanix KB: How to Shut Down a Cluster and Start it Again?
Hey Community, can someone clarify the compression and deduplication features that come with the Starter edition of AOS? On the Nutanix website (https://www.nutanix.com/products/software-options) I can see that Starter has a check box beside inline compression and inline deduplication, but not beside compression and deduplication. Does this mean that with a Starter license, environments can still take advantage of the storage efficiencies of deduplication and compression (on write)? Are there any solution briefs that go into detail about the compression and deduplication features of the Starter edition? Best Regards,
VDI (Virtual Desktop Infrastructure) is one of the earliest applications to run on hyperconverged systems: the closer the storage is to the CPU and memory, the better the performance. The Citrix Director plugin needs to connect to the Nutanix cluster. If the connection fails with the message "Unable to connect to the host", you can easily address it by:
1. Allowing ICMP traffic and making sure port 9440 is also open between Citrix Director and the Nutanix cluster.
2. Making sure there is no proxy server configured in the browser configuration.
The above is reported in the Nutanix public KB article: https://portal.nutanix.com/#/page/kbs/details?targetId=kA00e000000LKiBCAW
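As a quick way to verify the port requirement from the Citrix Director machine, here is a small Python sketch that tests TCP reachability of the Prism port (the IP address in the comment is a placeholder; substitute your own cluster address):

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: replace the address with your cluster virtual IP.
# print(can_reach("10.0.0.50", 9440))
```

If this returns False while ping works, a firewall between Director and the cluster is likely blocking 9440.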
The Move 3.0 services are now Dockerized, and all Move and Move Agent services run as Docker containers. If you are running Nutanix Move version 3 and are unable to connect it to hosts that reside on a certain internal subnet, it may be that the local Docker bridge network (docker0) has been assigned that same subnet. This can be addressed by following this KB article on the Nutanix Portal: https://portal.nutanix.com/#/page/kbs/details?targetId=kA00e000000PVwkCAG Ask questions about it if you are concerned.
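You can spot this clash up front with the standard `ipaddress` module. A minimal sketch (the 172.17.0.0/16 default for docker0 and the sample target subnets are assumptions; check the actual values in your environment):

```python
import ipaddress

def subnets_overlap(a: str, b: str) -> bool:
    """Return True if two CIDR networks share any addresses."""
    return ipaddress.ip_network(a).overlaps(ipaddress.ip_network(b))

# Docker's default bridge (docker0) commonly uses 172.17.0.0/16.
docker0 = "172.17.0.0/16"

print(subnets_overlap(docker0, "172.17.5.0/24"))  # True  -> Move can't reach these hosts
print(subnets_overlap(docker0, "10.20.0.0/24"))   # False -> no conflict
```

If the target host subnet overlaps the Docker bridge, the container's routing table sends that traffic to the bridge instead of the real network, which is exactly the symptom described above.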
I want to migrate a VM from one Nutanix (AHV) cluster to another Nutanix (AHV) cluster, so I tried to get the qcow2 file. Ref: https://virtualife.pro/export-an-nutanix-ahv-vm/ First I executed the command “acli vm.list” → succeeded :) Next I executed the command “acli vm.get <VM name>” → nothing happened :( Why? -- Versions: Nutanix 5.10.5 LTS, NCC 126.96.36.199, LCM 2.1.4139
Hello! I just want to inquire whether anybody knows if Nutanix has a plan to automate the processes of shutting down the cluster, entering a node into maintenance mode, and exporting a VM that resides on a Nutanix cluster, and to integrate these basic operations into the Prism Element or Prism Central GUI, just as other HCI vendors do. Currently I believe these things are done via CLI only. I know these are easy tasks for technical people and admins, but I think it defies the "one-click simplicity / one-click operation" that is shown on presentation decks. It would be great, and probably an additional edge, if processes like this were automated. Thanks :)
Hi all, our end users want to receive SMS messages when Nutanix raises an alert. May I ask the following? 1) Do you have any script or API to send SMS by using SNMP? 2) Is it possible to share the script? 3) If there is no script for SNMP, what is the best way to send SMS to end users? Thanks in advance for your support. Billy Seo
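I'm not aware of a ready-made script, but as a hedged sketch of the third option: have whatever catches the alert (an SNMP trap receiver, the REST alerts API, or an alert e-mail hook) format a short message and hand it to an HTTP SMS gateway. The gateway URL and its JSON fields below are hypothetical placeholders; most SMS providers expose something similar, so adapt to yours:

```python
import json
import urllib.request

def format_sms(alert: dict) -> str:
    """Condense a Nutanix-style alert dict into a short SMS body (160 chars max)."""
    text = f"[{alert.get('severity', '?').upper()}] {alert.get('cluster', '?')}: {alert.get('message', '')}"
    return text[:160]

def send_sms(number: str, body: str, gateway_url: str) -> None:
    """POST the message to an HTTP SMS gateway (endpoint and fields are hypothetical)."""
    payload = json.dumps({"to": number, "text": body}).encode()
    req = urllib.request.Request(gateway_url, data=payload,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

alert = {"severity": "critical", "cluster": "prod-cluster",
         "message": "CVM 10.0.0.11 unreachable"}
print(format_sms(alert))  # [CRITICAL] prod-cluster: CVM 10.0.0.11 unreachable
```

The formatting step is the portable part; the delivery step depends entirely on which SMS provider you use.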
If you are running Nutanix hardware, you may be familiar with accessing the IPMI page to load ISO files and look into the hypervisor remote console, among many other functions. You can reach this page from the Hardware link in Prism: click on the Table tab and select the host you are concerned about, and the lower left of the screen will provide an IP link to the IPMI page. Occasionally we need to look at the hardware-related messages the node provides (the Event Log), or at the component-level hardware health of the node. The “Server Health” tab on the IPMI page provides this valuable information.
Hi, I have updated my Nutanix (AHV) license, and when checking the license status I see the correct license expiry date applied to my clusters. However, I still get a “License Standby Mode” alert; details below. Possible Cause: The license file has not been applied after cluster summary file generation. Recommendation: Apply a new license. When I try to reapply the license by downloading and uploading the CSF file, I get the error: “You’ve uploaded an older/inactive cluster summary file (CSF). To continue, download the latest CSF from Prism Element or Prism Central.” Help please.
Below are new knowledge base articles published in the week of December 1-7, 2019.
- KB 8546 - Pre-check: test_nsx_configuration_in_esx_deployments
- KB 8624 - PulseHD shows RED if dmidecode.exe is missing
- KB 8631 - Accessing Prism Via Citrix NetScaler (ADC)
- KB 8669 - Nutanix Files - long filename isn't supported yet.
- KB 8671 - How to determine which M.2 device failed on the node
- KB 8680 - Metro - Recovery procedure after two-node down scenarios
Note: You may need to log in to the Support Portal to view some of these articles.
Hi all, local replication is a process in which multiple copies of data are stored within a storage container. These copies exist for fault tolerance. Snapshots are placed locally on the same cluster as the source VM; thus, if a physical disk fails, the cluster can recover data from another copy. The cluster manages the replicated data, and the copies are not visible to the user. So, what is the difference with the Replication Factor option? RF is also used for fault tolerance in case of a physical disk (or node, ...) failure. Thanks
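Not a full answer, but one concrete way to see the RF side of the comparison: RF is about how many synchronous copies of every extent the cluster keeps, so the raw space consumed is simply logical data × replication factor. A trivial sketch:

```python
def raw_usage_gib(logical_gib: float, rf: int) -> float:
    """Raw capacity consumed when every extent is stored rf times."""
    return logical_gib * rf

print(raw_usage_gib(1024, 2))  # 2048.0 -> RF2 keeps 2 copies of each extent
print(raw_usage_gib(1024, 3))  # 3072.0 -> RF3 keeps 3 copies of each extent
```

Snapshots, by contrast, are point-in-time copies for recovery; each snapshot's data is itself protected by the container's RF.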
Every once in a while, due to network infrastructure changes or because you have to physically move the cluster to another location, you may have to modify the cluster IP addresses. This includes the CVM, hypervisor and IPMI IP addresses, netmasks and default gateways. Unfortunately this operation requires some downtime, as you will need to stop the cluster for the duration of the change. Before you start, you need to:
1. Clear the external virtual IP address of the cluster and set its new IP address.
2. Ensure that the NTP and DNS servers of the cluster are reachable from the new CVM IP addresses; if they are going to be different, remove the old addresses and add the new ones.
3. Check that all hosts are part of the metadata store.
You need to consider 3 different scenarios:
1. Changing the IP addresses of the CVMs within the same subnet.
2. Changing the IP addresses of the CVMs to a new or different subnet.
3. Changing the IP addresses of the CVMs to a new or different subnet when you are moving the cluster to another location.
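Deciding between scenario 1 and scenarios 2/3 comes down to whether the new addresses fall in the old subnet. A quick sketch with the standard `ipaddress` module (the sample IPs and /24 prefix are placeholders):

```python
import ipaddress

def reip_scenario(old_cvm_ip: str, new_cvm_ip: str, prefix_len: int) -> str:
    """Classify a CVM re-IP as 'same subnet' (scenario 1) or 'different subnet' (2/3)."""
    old_net = ipaddress.ip_network(f"{old_cvm_ip}/{prefix_len}", strict=False)
    new_net = ipaddress.ip_network(f"{new_cvm_ip}/{prefix_len}", strict=False)
    return "same subnet" if old_net == new_net else "different subnet"

print(reip_scenario("10.1.1.31", "10.1.1.41", 24))  # same subnet
print(reip_scenario("10.1.1.31", "10.2.7.31", 24))  # different subnet
```

Run the check for every CVM, host, and IPMI address before scheduling the downtime window.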
Foundation is how we build and configure Nutanix clusters, and many customers prefer to make use of more advanced network technologies like LACP to improve cluster performance and provide redundancy. LACP increases bandwidth, provides graceful degradation as failures occur, and increases availability. It provides network redundancy by load-balancing traffic across all available links; if one of the links fails, the system automatically rebalances traffic across the remaining links. Foundation 4.2 introduces LACP support for standalone Foundation. For more information about the supported hypervisors and requirements, please see the KB article titled LACP Support in Foundation.
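For intuition on the load-balancing claim: LACP-style aggregation typically hashes per-flow fields to pick a member link, so each flow sticks to one link while different flows spread across the bundle. A toy illustration (this is not the exact hash any particular switch or OVS bond mode uses):

```python
import zlib

def pick_link(src_mac: str, dst_mac: str, num_links: int) -> int:
    """Toy flow-hash: deterministically map a MAC pair onto one member link."""
    key = (src_mac + dst_mac).encode()
    return zlib.crc32(key) % num_links

# The same flow always lands on the same link; distinct flows spread out.
print(pick_link("aa:bb:cc:00:00:01", "aa:bb:cc:00:00:02", 2))
print(pick_link("aa:bb:cc:00:00:03", "aa:bb:cc:00:00:04", 2))
```

When a link fails, real implementations simply rehash over the surviving links, which is the "graceful degradation" described above.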
The Data Replication page in Prism shows "Processing" for some or all snapshots even after they have completed.
If you are replicating data through DR (the Data Replication page in Prism), then you have set up snapshot schedules so snapshots are copied to the remote site at the scheduled times. When you select a snapshot in one of the protection domains you created in the Prism UI, one of the fields is “Reclaimable space”. You may observe a continuously spinning wheel and the word “Processing” in this field for some or all snapshots, even though the snapshot(s) have clearly already been taken. So why the spinning wheel? This field is lazily calculated by Curator during full scans and populated afterward, so it takes some time (maybe a few hours) to show up. Until Curator finishes calculating the value, the field shows “Processing” in Prism.
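The behaviour can be pictured as a lazily populated field: the UI renders a placeholder until a background scan has filled in the number. A sketch of the pattern (this is an illustration, not Curator's actual code):

```python
from typing import Optional

def display_reclaimable(space_bytes: Optional[int]) -> str:
    """Render 'Processing' until a background full scan has computed the value."""
    if space_bytes is None:  # the scan hasn't produced a number yet
        return "Processing"
    return f"{space_bytes / 2**30:.2f} GiB"

print(display_reclaimable(None))       # Processing
print(display_reclaimable(5 * 2**30))  # 5.00 GiB
```

So "Processing" is cosmetic here: the snapshot itself is complete; only this derived statistic is still pending.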
Hi, I have a concern about data resilience in a Nutanix cluster, specifically about rebuilding data in 2 scenarios. When a node breaks or fails, the data is rebuilt right away, the node is detached from the ring, and I can see some tasks about removing the node/disk from the cluster. The whole process takes several minutes to half an hour, so it does not take long to restore the data resilience of the cluster. When I deliberately remove a node from the cluster, the data is also rebuilt onto the other nodes, but this takes several hours to a day to restore data resilience. It seems that removing a node also rebuilds other data, such as Curator and Cassandra metadata. But why does it take so long, how much additional data is moved, and what is the difference in user data resilience for the cluster?
Nutanix AOS offers simplicity in managing traditionally complex infrastructure tasks: virtual machine management, storage operations, replication, and of course cluster software and hardware upgrades. As infrastructure admins, we are well aware of the operational pain points when it comes to upgrading: hypervisor upgrades, storage OS upgrades, firmware upgrades, management software upgrades - the list goes on. With Nutanix One-Click upgrades, customers can upgrade software and hardware components easily. Software and firmware need to be downloaded from Nutanix repositories, which is why it is important to understand which network ports are required to be open, or can be opened on demand, to check for upgrades. The following KB from the Nutanix Portal lists the required network ports for the different services and upgrade repository endpoints: Recommendation on Firewall Ports Config
I am looking for something that I can set up as an automated task on a server to poll for any active replications for protection domains and, if there are any, pull the information and e-mail it. The output I am looking for in the e-mail would be something like the below.
Protection Domain : ProtectionDomainName
Replication Operation : Sending
Start Time : 03/11/2019 12:00:02 EDT
Remote Site : RemoteSiteName
Snapshot Id : 2918635
Bytes Completed : 444.08 MiB (465,653,447 bytes)
Snapshot Size : 2.57 GiB (2,760,598,528 bytes)
Complete Percent : 95.38689
If anyone already has something like this set up, that would be awesome; my scripting skills are slim to none, so any help would be appreciated.
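I don't have a finished script, but as a starting point, here is a hedged Python sketch of the formatting half: it renders one replication record into a block like the one above. The input dict mirrors the kind of fields you could pull from the Prism REST API or `ncli pd ls-repl-status`, but the key names here are placeholders, not the exact API field names:

```python
def format_replication(r: dict) -> str:
    """Render one active replication as an e-mail-ready text block."""
    pct = 100.0 * r["bytes_completed"] / r["snapshot_size_bytes"]
    return "\n".join([
        f"Protection Domain : {r['pd_name']}",
        f"Replication Operation : {r['operation']}",
        f"Start Time : {r['start_time']}",
        f"Remote Site : {r['remote_site']}",
        f"Snapshot Id : {r['snapshot_id']}",
        f"Bytes Completed : {r['bytes_completed']:,} bytes",
        f"Snapshot Size : {r['snapshot_size_bytes']:,} bytes",
        f"Complete Percent : {pct:.5f}",
    ])

sample = {
    "pd_name": "ProtectionDomainName", "operation": "Sending",
    "start_time": "03/11/2019 12:00:02 EDT", "remote_site": "RemoteSiteName",
    "snapshot_id": 2918635, "bytes_completed": 465653447,
    "snapshot_size_bytes": 2760598528,
}
print(format_replication(sample))
```

The polling half would fetch active replications on a timer and pass each record through this, then hand the joined text to `smtplib` or your mail relay.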
Hi all, I’m very new to Nutanix and pretty new to Ansible. I’ve been tasked with updating / installing guest tools on any machines that need them, and we’d prefer to do it via Ansible. I’d like to have Ansible use the uri module to grab the UUID of a given VM, or grab a list of UUIDs and the associated VMs; however, I’m having a lot of trouble parsing this information out in a way that Ansible can actually use. Does anyone have experience with this? Or can at least tell me that there’s a better way to be doing this? Thanks!
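One approach that may help: call the v2 `vms` endpoint with the `uri` module, register the JSON, then reduce it to a name-to-UUID map. The payload shape below (an `entities` list whose items carry `name` and `uuid` keys) matches what I've seen the v2 API return, but verify it against your AOS version. The parsing step, shown here as plain Python (you could express the same thing as a Jinja2/json_query filter in the playbook):

```python
import json

def vm_uuid_map(api_response: str) -> dict:
    """Build {vm_name: uuid} from a Prism v2-style GET /vms JSON payload."""
    entities = json.loads(api_response).get("entities", [])
    return {vm["name"]: vm["uuid"] for vm in entities}

# Trimmed-down sample payload; real responses carry many more fields per VM.
payload = json.dumps({"entities": [
    {"name": "web01", "uuid": "11111111-2222-3333-4444-555555555555"},
    {"name": "db01",  "uuid": "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"},
]})

print(vm_uuid_map(payload)["web01"])  # 11111111-2222-3333-4444-555555555555
```

Once you have the map, a loop over it can target each VM's UUID for the guest-tools call.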
@Mutahir has already shared some insights on NCC checks in Keeping the Lights Green - NCC - Hardware Checks. Today I would like to bring up two important aspects of the tool. There may come a time when you receive an alert triggered by a regularly executed NCC check. Oftentimes the alert will reference a KB article; you read the KB and it does not make any sense. Naturally, you raise a case with the Nutanix support team or commence the journey across the vast space of the Internet in search of an answer. The very first thing a Nutanix support engineer will do is verify whether the environment is running the latest version of the NCC checks, and if it is not, they will proceed with the NCC upgrade. More often than not, the alert will clear after the NCC upgrade. Why is that? NCC is a powerful tool that is developed and maintained by a team of professionals. With their help the tool evolves and grows: more checks are introduced, issues are resolved and algorithms are improved. Thus it is worth keeping NCC up to date.
Below are the top knowledge base articles for the month of November 2019.
- KB 4141 - Alert - A1046 - PowerSupplyDown
- KB 4116 - Alert - A1187, A1188 - ECCErrorsLast1Day, ECCErrorsLast10Days
- KB 1540 - What to do when /home partition or /home/nutanix directory is full
- KB 7503 - G6, G7 platforms with BIOS 41.002 - DIMM Error handling and replacement policy
- KB 4409 - LCM (LifeCycle Manager): Troubleshooting Guide
- KB 1113 - HDD/SSD Troubleshooting
- KB 4541 - Alert - A101055 - MetadataDiskMountedCheck
- KB 4158 - Alert - A1104 - PhysicalDiskBad
- KB 2090 - AHV | Host and Guest Networking
- KB 4519 - NCC Health Check: check_ntp
- KB 1888 - NCC Health Check: storage_container_mount_check
- KB 4188 - Alert - A1050, A1008 - IPMIError
- KB 1507 - Alert "IPMI IP address on Controller VM was updated to ... without following the Nutanix IP Reconfiguration procedure" can be misleading
- KB 3523 - How to create a Phoenix ISO or AHV ISO from a CVM or Foundation…
Below are new knowledge base articles published in the week of November 24-30, 2019.
- KB 8302 - Pre-Upgrade Check: test_is_hyperv_nos_upgrade_supported
- KB 8303 - Pre-Upgrade Check: test_if_cau_update_is_running
- KB 8499 - Security - Nutanix definitions for most common STIGs
- KB 8555 - Launching a blueprint by using the simple_launch API fails after Prism Central is upgraded to 5.11
- KB 8616 - "Restore" screen under ASYNC DR is misaligned if an entity has a long name
- KB 8618 - PD: Trying to 'deactivate-and-destroy-vms' operation got error 'Error: Unexpected application error kInvalidAction raised'
- KB 8619 - Genesis may not start with error 'Received multiple ips for interface bound to ExternalSwitch'
- KB 8621 - Alert - A400101 - NucalmServiceDown
- KB 8622 - Alert - A400102 - EpsilonServiceDown
- KB 8629 - Calm - Jenkins deployment is stuck at "Installing: ssh-credentials" and fails without error messages
- KB 8639 - AHV | Never-schedulable node CVMs are not shown in the VM in Prism.
- KB 8641 - De…
Hi all, I have some questions that I’m trying to answer but … ;) If you can, please explain them to me or point me to the relevant resources.
# Questions
1. Is it recommended or mandatory to configure containers as ReplicationFactor-3 when the cluster is RedundancyFactor-3?
2. With ReplicationFactor-3, when reading, how many checks are done to validate data correctness?
3. With RedundancyFactor-2 only 1 failure is tolerated, and the cluster will still work with (e.g.) 2 Zookeepers. With RedundancyFactor-3 there are 5 Zookeepers, so why can’t we tolerate up to 3 failures?
4. What are the limitations that make it impossible to migrate VMs between containers without the export/import method?
5. How will the cluster behave in case of a network partition (e.g. 4 nodes can communicate with each other, and the 4 others too)?
6. If I have 2 guest VMs in the same VLAN, will they communicate through the OVS br0, or will the traffic go out to the external switch and come back to the cluster?
7. With the bond0 (br0.up) interface having 2 links…
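On the Zookeeper question specifically, standard majority-quorum arithmetic explains it: an ensemble of n members needs floor(n/2)+1 of them alive to keep a majority, so it tolerates floor((n-1)/2) failures. A quick check:

```python
def zk_failures_tolerated(ensemble_size: int) -> int:
    """A majority quorum needs floor(n/2)+1 members, so (n-1)//2 may fail."""
    return (ensemble_size - 1) // 2

print(zk_failures_tolerated(3))  # 1 -> with 3 Zookeepers, 2 must stay up
print(zk_failures_tolerated(5))  # 2 -> with 5 Zookeepers, 3 must stay up
```

With 3 of 5 members down, the remaining 2 cannot form a majority and cannot safely elect a leader, which is why 5 Zookeepers tolerate 2 failures rather than 3.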
We have had a couple of instances recently where making network changes has affected our clusters. This caused a restart on the lead host, because it detected a network loss, and then resulted in system outages. The cluster is configured with dual network ports in active/passive mode, and our understanding was that it would fail over if any change or failure was detected, without producing error events or taking systems down.