How It Works
Have questions about how the Nutanix platform works? Looking to get started? Start here!
Cluster IP address change
Every once in a while, due to network infrastructure changes or because you have to physically move the cluster to another location, you may have to modify the cluster IP addresses. This includes the CVM, hypervisor, and IPMI IP addresses, netmasks, and default gateways. Unfortunately, this operation requires taking some downtime, as you will need to stop the cluster for the duration of the change. Before you start, you need to:
1. Clear the external virtual IP address of the cluster and set a new IP address for it.
2. Ensure that the NTP and DNS servers of the cluster are reachable from the new CVM IP addresses; if they are going to be different, remove the old addresses and add the new ones.
3. Check that all hosts are part of the metadata store.
You need to consider three different scenarios:
1. Change the IP addresses of the CVMs within the same subnet.
2. Change the IP addresses of the CVMs to a new or different subnet.
3. Change the IP addresses of the CVMs to a new or different subnet if you are moving the cl…
Foundation and LACP support
Foundation is how we build and configure Nutanix clusters, and many customers prefer to use more advanced network technologies like LACP to improve cluster performance and provide redundancy. LACP increases bandwidth, degrades gracefully as failures occur, and increases availability. It provides network redundancy by load-balancing traffic across all available links; if one of the links fails, the system automatically rebalances traffic across the remaining links. Foundation 4.2 introduces LACP support in standalone Foundation. For more information about the supported hypervisors and requirements, please see the KB article titled LACP Support in Foundation.
Data Replication page in Prism shows "Processing" for some or all snapshots even after they are completed
If you are replicating data through DR (the Data Replication page in Prism), then you have set up snapshot schedules so snapshots are copied to the remote site at the scheduled time. When you select a snapshot in one of the protection domains you created in the Prism UI, one of the fields is "Reclaimable Space". You may observe a continuously spinning wheel and the word "Processing" in this field for some or all snapshots, even though you may also notice that the snapshot(s) have already been taken and completed. So why the spinning wheel for this field? The value is lazy-calculated by Curator during full scans and populated afterward, so it takes some time (possibly a few hours) to show up. Until Curator finishes calculating the value, the field shows "Processing" in Prism.
Data difference between a removed node and a node failure
Hi, I have a concern about data resilience in a Nutanix cluster, specifically about how data is rebuilt in two scenarios. When a node breaks or fails, the data is rebuilt immediately: the node is detached from the ring, and I can see some tasks about removing the node/disk from the cluster. The whole process takes several minutes to half an hour; it does not take long to restore the data resilience of the cluster. When I want to remove a node from the cluster, the data is also rebuilt onto the other nodes in the cluster, but this takes several hours to a day to restore data resilience. It seems a node removal also rebuilds other data, such as Curator and Cassandra metadata. Why does it take so long, how much additional data is moved, and what is the difference in user data resilience for the cluster?
Nutanix 1-Click Upgrades - Network Ports
Nutanix AOS offers simplicity in managing traditionally complex infrastructure tasks: virtual machine management, storage operations, replication, and of course cluster software and hardware upgrades. As infrastructure admins, we are well aware of the operational pain points when it comes to upgrading:
- Hypervisor upgrades
- Storage OS upgrades
- Firmware upgrades
- Management software upgrades
- the list goes on…
With Nutanix One-Click upgrades, customers can upgrade software and hardware components easily. Software and firmware need to be downloaded from Nutanix repositories, which is why it is important to understand which network ports are required to be open, or can be opened on demand, to check for upgrades. The following KB from the Nutanix Portal lists the required network ports for the different services and upgrade repository endpoints: Recommendation on Firewall Ports Config
Automate Replication Status check
I am looking for something that I can set up as an automated task on a server to poll for any active replications for protection domains and, if any are found, pull the information and email it. The output I am looking for in the email would be something like below:
Protection Domain : ProtectionDomainName
Replication Operation : Sending
Start Time : 03/11/2019 12:00:02 EDT
Remote Site : RemoteSiteName
Snapshot Id : 2918635
Bytes Completed : 444.08 MiB (465,653,447 bytes)
Snapshot Size : 2.57 GiB (2,760,598,528 bytes)
Complete Percent : 95.38689
If anyone already has something like this set up, that would be awesome; my scripting skills are slim to none, so any help would be much appreciated.
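A minimal Python sketch of one way to approach this, for anyone landing here. Everything below is an assumption to verify, not a confirmed recipe: the replications endpoint path, the response field names, and all hostnames and credentials are illustrative placeholders; check the REST API Explorer (https://<prism>:9440/api/nutanix/v2.0/) for the exact resources your AOS version exposes.

```python
#!/usr/bin/env python3
"""Poll Prism for active protection domain replications and email a summary.

Sketch only. The endpoint path and response field names are assumptions;
verify them in the Prism REST API Explorer for your AOS version.
"""
import smtplib
from email.message import EmailMessage

import requests

PRISM = "https://prism.example.com:9440"  # placeholder address
AUTH = ("admin", "password")              # use a read-only service account

def active_replications():
    # Assumed endpoint; confirm the exact path in the API Explorer.
    url = PRISM + "/api/nutanix/v2.0/protection_domains/replications"
    resp = requests.get(url, auth=AUTH, verify=False, timeout=30)
    resp.raise_for_status()
    return resp.json().get("entities", [])

def format_report(replications):
    lines = []
    for r in replications:
        # Field names are illustrative, not confirmed.
        lines.append("Protection Domain : %s" % r.get("protection_domain_name"))
        lines.append("Remote Site       : %s" % r.get("remote_site_name"))
        lines.append("Bytes Completed   : %s" % r.get("completed_bytes"))
        lines.append("")
    return "\n".join(lines)

def send_mail(body):
    msg = EmailMessage()
    msg["Subject"] = "Active PD replications"
    msg["From"] = "nutanix-reports@example.com"     # placeholder
    msg["To"] = "storage-team@example.com"          # placeholder
    msg.set_content(body)
    with smtplib.SMTP("smtp.example.com") as smtp:  # placeholder relay
        smtp.send_message(msg)

if __name__ == "__main__":
    found = active_replications()
    if found:  # only email when something is actively replicating
        send_mail(format_report(found))
```
Run it from cron or a scheduled task; it stays silent unless a replication is in flight.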
Getting UUIDs to pass to Ansible
Hi all, I'm very new to Nutanix, and pretty new to Ansible. I've been tasked with updating/installing guest tools on any machines that need them, and they'd prefer to do it via Ansible. I'd like to have Ansible use the uri module to grab the UUID of a given VM, or grab a list of UUIDs and the associated VMs; however, I'm having a lot of trouble parsing this information into a form that Ansible can actually use. Does anyone have experience with this? Or can anyone at least tell me there's a better way to be doing this? Thanks!
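One possible approach (a sketch under assumptions, not a confirmed recipe): fetch the VM list once from Prism with a small Python helper, dump a name-to-UUID map as JSON, and hand that to Ansible as a vars file. The v2.0 /vms endpoint shape and all addresses below are my assumptions to verify in the REST API Explorer.

```python
#!/usr/bin/env python3
"""Dump a VM-name -> UUID map from Prism as JSON for Ansible to consume.

Sketch only: assumes the v2.0 endpoint GET /api/nutanix/v2.0/vms returns
{"entities": [{"name": ..., "uuid": ...}, ...]}; confirm in the API Explorer.
"""
import json

import requests

PRISM = "https://prism.example.com:9440"  # placeholder address
AUTH = ("admin", "password")

def vm_uuid_map():
    resp = requests.get(PRISM + "/api/nutanix/v2.0/vms",
                        auth=AUTH, verify=False, timeout=30)
    resp.raise_for_status()
    # Build {vm_name: uuid}; duplicate names will keep the last one seen.
    return {vm["name"]: vm["uuid"] for vm in resp.json().get("entities", [])}

if __name__ == "__main__":
    # Redirect this to vm_uuids.json and load it in a playbook,
    # e.g. with the ansible.builtin.include_vars module.
    print(json.dumps(vm_uuid_map(), indent=2))
```
Inside Ansible itself, the rough equivalent would be a uri task followed by set_fact with a json_query filter over the returned entities list; the standalone script just makes the parsing easier to debug.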
NCC - why it is important to upgrade it
@Mutahir has already shared some insights on NCC checks in Keeping the Lights Green - NCC - Hardware Checks. Today I would like to bring up two important aspects of the tool. There may be a time when you receive an alert triggered by a regularly executed NCC check. Oftentimes the alert will reference a KB article. You read the KB and it does not make any sense. Naturally, you raise a case with the Nutanix support team or commence the journey across the vast space of the Internet in search of an answer. The very first thing a Nutanix support engineer will do is verify whether the environment is running the latest version of NCC, and if it is not, they will proceed with the NCC upgrade. More often than not, the alert will clear after the NCC upgrade. Why is that? NCC is a powerful tool that is developed and maintained by a team of professionals. With their help the tool evolves and grows: more checks are introduced, issues are resolved, and algorithms are improved. Thus it i…
Top KB Articles for November 2019
Below are the top knowledge base articles for the month of November 2019.
- KB 4141 - Alert - A1046 - PowerSupplyDown
- KB 4116 - Alert - A1187, A1188 - ECCErrorsLast1Day, ECCErrorsLast10Days
- KB 1540 - What to do when /home partition or /home/nutanix directory is full
- KB 7503 - G6, G7 platforms with BIOS 41.002 - DIMM error handling and replacement policy
- KB 4409 - LCM (LifeCycle Manager): Troubleshooting Guide
- KB 1113 - HDD/SSD Troubleshooting
- KB 4541 - Alert - A101055 - MetadataDiskMountedCheck
- KB 4158 - Alert - A1104 - PhysicalDiskBad
- KB 2090 - AHV | Host and Guest Networking
- KB 4519 - NCC Health Check: check_ntp
- KB 1888 - NCC Health Check: storage_container_mount_check
- KB 4188 - Alert - A1050, A1008 - IPMIError
- KB 1507 - Alert "IPMI IP address on Controller VM was updated to ... without following the Nutanix IP Reconfiguration procedure" can be misleading
- KB 4273 - NCC Health Check: aged_third_party_backup_snapshot_check
- KB 3523 - How to create a Phoenix ISO or AHV ISO from a CVM or Foun…
New KB Articles Published on Week Ending November 30, 2019
Below are new knowledge base articles published during the week of November 24-30, 2019.
- KB 8302 - Pre-Upgrade Check: test_is_hyperv_nos_upgrade_supported
- KB 8303 - Pre-Upgrade Check: test_if_cau_update_is_running
- KB 8499 - Security - Nutanix definitions for most common STIGs
- KB 8555 - Launching a blueprint by using the simple_launch API fails after Prism Central is upgraded to 5.11
- KB 8616 - "Restore" screen under ASYNC DR is misaligned if an entity has a long name
- KB 8618 - PD: Trying the 'deactivate-and-destroy-vms' operation got error 'Error: Unexpected application error kInvalidAction raised'
- KB 8619 - Genesis may not start with error 'Received multiple ips for interface bound to ExternalSwitch'
- KB 8621 - Alert - A400101 - NucalmServiceDown
- KB 8622 - Alert - A400102 - EpsilonServiceDown
- KB 8629 - Calm - Jenkins deployment is stuck at "Installing: ssh-credentials" and fails without error messages
- KB 8639 - AHV | Never-schedulable node CVMs are not shown in the VM list in Prism
- KB 8641 - De…
How Nutanix works!
Hi all, I have some questions that I'm trying to answer but… ;) If you can, please explain or point me to the relevant part of some resources.
Questions:
1. Is it recommended or mandatory to configure containers as ReplicationFactor-3 when the cluster is RedundancyFactor-3?
2. In the case of ReplicationFactor-3, when reading, how many checks are done to validate data correctness?
3. In RedundancyFactor-2 only 1 failure is tolerated, yet the cluster will still work with (e.g.) 2 Zookeepers. In RedundancyFactor-3 there are 5 Zookeepers, so why can't we tolerate up to 3 failures?
4. What are the limitations that make it impossible to migrate VMs between containers without the export/import method?
5. How will the cluster behave in case of a network partition (e.g. 4 nodes can communicate with each other, and the other 4 as well)?
6. If I have 2 guest VMs in the same VLAN, will they communicate through the OVS br0, or will the traffic go all the way to the external switch and come back to the cluster?
7. With the bond0 (br0.up) interface having 2 links…
Host services restart when the network changes, resulting in VMs down
We have had a couple of instances recently where making network changes affected our clusters. This caused a restart on the lead host, due to it detecting a network loss, and then resulted in system outages. The cluster is configured with dual network ports in active/passive mode, and our understanding was that it would fail over if any change or failure was detected, without producing error events or taking systems down.
Python - Inventory VM and Cluster Inventory Tools v1.3
Hi, in my previous post I published version v1.2 (https://next.nutanix.com/scripts-32/nutanix-tools-for-ahv-v1-2-32075). It has now been updated to v1.3 and already extracts new information: the status of the NGT tools, the description, etc., plus many new key validators to avoid problems. Additionally, I added a small script to obtain information about the cluster's connection to the TOR switch and the physical ports. Please check the GitHub repo for more information: https://github.com/dlira2/Nutanix-tools-for-AHV I use the script in large accounts with more than 1000 VMs and it works optimally.
VM capacity and Prism capacity are different
Hi, I have a question that came up during testing. The capacity reported inside Windows VMs (C: drive) and Linux VMs (df -h) differs from the capacity shown in Prism (VM table). Not all VMs are like this, only a few. These VMs deleted a lot of data and freed up a lot of capacity. Curator has since run, but the capacity has not decreased in the Prism VM table. Why can't Prism reflect the capacity a VM has freed by deleting data? How do I get Prism to read this capacity? (Screenshots attached.) Windows -> Server 2016; Linux -> CentOS 7.x. Thank you.
Converting AHV to ESXi
Hi all, did you know that AHV-to-ESXi conversion is only possible from Prism (the Convert Cluster option) if the cluster was previously converted from ESXi to AHV and is being converted back to ESXi? A fresh AHV-to-ESXi conversion is not supported from Prism. Some prerequisites for converting AHV to ESXi via Prism are listed here: https://portal.nutanix.com/#/page/docs/details?targetId=Web-Console-Guide-Prism-v55:man-cluster-conversion-requirements-limitations-r.html#nref_amd_hlp_k5 However, one can always re-image an AHV node to ESXi manually. Before converting the cluster, customers should always migrate the running VMs on the AHV cluster to their DR cluster. This can be done using protection domains. More information on protection domains can be found below: https://portal.nutanix.com/#/page/docs/details?targetId=Prism-Element-Data-Protection-Guide-v511:Prism-Element-Data-Protection-Guide-v511 or also refer to KB-3059: https://portal.nutanix.com/#/page/kbs/details?targetId=kA03200000098T7CAI Once the cluster is c…
Nutanix and Security
Nowadays everyone is concerned about the security of their infrastructure, as they should be. The Nutanix document referenced below contains an overview of the security development life cycle (SecDL) and the host of security features supported by Nutanix. It also demonstrates how Nutanix complies with security regulations to streamline infrastructure security management. In addition, the guide addresses technical requirements that are site-specific or driven by compliance standards (and should be adhered to) but are not enabled by default. https://portal.nutanix.com/#/page/docs/details?targetId=Nutanix-Security-Guide-v510:Nutanix-Security-Guide-v510
Cluster upgrades when metro availability is enabled
All clusters will need to be upgraded at some point. If you have metro availability enabled in your environment, you will need to follow the best practice guidelines for it: https://portal.nutanix.com/#/page/docs/details?targetId=Web-Console-Guide-Prism-v510:wc-metro-availability-upgrade-considerations-r.html Specifically, Nutanix supports the following replications:
- Between N and N-2 major versions, and vice versa, for STS-to-STS or LTS-to-STS versions.
- Between N and N-1 major versions, and vice versa, for LTS-to-LTS versions.
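As a worked example of the quoted rule, here is a toy Python check (my own illustration, not a Nutanix tool) that treats N as the AOS major version and the branch as LTS or STS:

```python
def replication_supported(major_a: int, major_b: int,
                          branch_a: str, branch_b: str) -> bool:
    """Toy encoding of the metro replication rule quoted above.

    branch_a / branch_b are "LTS" or "STS". LTS-to-LTS pairs may differ
    by at most one major version; STS-to-STS and LTS-to-STS pairs by at
    most two. Illustrative only; confirm against the portal document.
    """
    max_gap = 1 if branch_a == "LTS" and branch_b == "LTS" else 2
    return abs(major_a - major_b) <= max_gap

# An LTS cluster replicating to an STS cluster two majors ahead is allowed
# by the rule, but two LTS versions two majors apart are not.
# (Version numbers are made up for the example.)
assert replication_supported(10, 12, "LTS", "STS")
assert not replication_supported(10, 12, "LTS", "LTS")
```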
Pulse - what's in it for you? Much more than a simple call home.
Nutanix Pulse HD provides diagnostic system data to Nutanix support teams to deliver proactive, context-aware support for Nutanix solutions. The Nutanix cluster automatically and unobtrusively collects this information with no effect on system performance. Pulse HD shares only the basic system-level information necessary for monitoring the health and status of a Nutanix cluster. OK, this is all great, but let's get to the real benefits, shall we? Q1. Why would you enable it? A. Well, for several reasons: So that if you raise an issue with Nutanix support, we can start looking at the data about your cluster immediately, without making you answer many questions that are often crucially important for prompt issue resolution; in essence, to reduce the amount of time it takes to fix the problem. So that when a Nutanix engineer requires logs, they can collect them themselves while you focus on what you need to do. No frustration with log collection, …
Please help: as per this video, when new data is written to the cloned/snapshotted vDisk, what happens when we delete a snapshot? Say we created three snapshots and deleted the second one, which had new data written to it while the base vDisk was read-only; what happens to the newly written data? Will it be committed to the base vDisk, or will the current-state snapshot stay alive while the other is deleted? https://www.youtube.com/watch?v=uK5wWR44UYE
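For intuition, here is a toy copy-on-write model (my own sketch, not a description of NDFS internals): in such designs, deleting a middle snapshot typically folds the blocks it owns into its child, so reads through later snapshots still see the same data and nothing is silently committed to the read-only base.

```python
class Snapshot:
    """Toy copy-on-write snapshot: a dict of block -> data deltas over a parent."""
    def __init__(self, parent=None):
        self.parent = parent  # previous snapshot, or None for the base vDisk
        self.blocks = {}      # writes captured while this layer was live

    def read(self, block):
        # Walk up the chain until some layer owns the block.
        node = self
        while node is not None:
            if block in node.blocks:
                return node.blocks[block]
            node = node.parent
        return None

def delete_middle(child):
    """Delete child's parent: fold the parent's blocks into the child.

    Blocks the child already rewrote stay as-is (the child's copy is newer);
    everything else the parent owned is inherited, so no data is lost.
    """
    dead = child.parent
    for block, data in dead.blocks.items():
        child.blocks.setdefault(block, data)
    child.parent = dead.parent

# base -> s1 -> s2 -> s3; delete s2, and reads through s3 are unchanged.
base = Snapshot();   base.blocks.update({0: "base0", 1: "base1"})
s1 = Snapshot(base); s1.blocks[1] = "s1-new"
s2 = Snapshot(s1);   s2.blocks[2] = "s2-new"
s3 = Snapshot(s2);   s3.blocks[0] = "s3-new"
delete_middle(s3)    # removes s2 from the chain
assert s3.read(2) == "s2-new" and s3.read(1) == "s1-new"
```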
Discovery of Nutanix Appliance
Hello all, the infrastructure team within my organization has recently rolled out several Nutanix environments. Unfortunately, we are only able to discover the individual VMs and not the actual appliances. When we tried to address this with the infrastructure team, we were told that no SSH credentials can be provided, as the only credentials come from the vendor. Our discovery tool only supports Nutanix discovery via SSH, which leads me to my question: is anybody performing discovery of Nutanix appliances and VMs using discovery tools in their environment, and if so, how are you achieving this?
CVMs running NFS server risk
I just had a security scan run against my cluster running AOS 5.5.6, and the only 'high' risk that came back was CVE-1999-0548 (NFS Server Without Shares Detected). Description: A superfluous NFS server that is not sharing any file systems has been detected. How to fix: Disable the NFS server. Obviously, I don't think I want to disable the NFS server service on all of my CVMs; is there any official documentation that I can share with my peers to support this, so that I can get an exemption from this risk on these systems?
Differences between v1 and v2 - metrics missing
Hi team, please let us know how to get the values below using API version v2. Previously we were using API version v1 and getting metrics via /vms: https://hostname4:9440/PrismGateway/services/rest/v1/vms/ Now, with API version v2, we are not able to get the following metrics:
- hypervisor_cpu_usage_ppm: not found with either URL (vms, virtual disks)
- hypervisor_memory_usage_ppm: not found
- memoryCapacityInBytes: not found
- ipaddresses: not found, but requested_ip_address was found; are these the same?
- controllerVm: not found
We are able to get a few metrics using virtual_disks, but not the ones above. Are these not available in v2, or have they moved to another URL?
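One way to answer this empirically is to pull the VM list from both API versions and diff the top-level keys each returns. A sketch under assumptions: the v1 path is taken from the question, while the v2.0 path and the entities response shape are mine to verify in the REST API Explorer, and credentials are placeholders.

```python
#!/usr/bin/env python3
"""Diff the fields returned for VMs by Prism REST v1 vs v2.0.

Sketch only: assumes both endpoints return {"entities": [...]}; confirm
the exact paths and shapes in the REST API Explorer for your AOS version.
"""
import requests

PRISM = "https://hostname4:9440"  # host taken from the question
AUTH = ("admin", "password")      # placeholder credentials

def entity_keys(path):
    resp = requests.get(PRISM + path, auth=AUTH, verify=False, timeout=30)
    resp.raise_for_status()
    entities = resp.json().get("entities", [])
    # Compare top-level field names of the first VM entity.
    return set(entities[0]) if entities else set()

v1 = entity_keys("/PrismGateway/services/rest/v1/vms/")
v2 = entity_keys("/api/nutanix/v2.0/vms")
print("only in v1:", sorted(v1 - v2))
print("only in v2:", sorted(v2 - v1))
```
Running this against your own cluster shows exactly which v1 fields have no v2 counterpart and which were merely renamed.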