How It Works
Have questions about how the Nutanix platform works? Looking to get started? Start here!
Why are we seeing an alert for "external client authentication" and what should be done about it?
This alert is generated when an API request arrives that is authenticated as "admin". Nutanix recommends that any script or third-party application sending API requests to the cluster use a service account rather than 'admin'. You can read more about this alert in the article "Alert - ExternalClientAccessCheck". If you are seeing this alert, it is informing you that some system is authenticating as admin; to aid investigation, the source IP address is provided. The intent is that any third-party application or script should use a service account and not 'admin', as this makes command auditing much more meaningful and helps keep the admin password secure. When a third-party application such as Veeam is set up to authenticate to the cluster as 'admin', that should generate this alert. If you log in to Prism Element as admin, open the REST API explorer, and then test an API call, you should see this alert, because that is your desktop sending an API request as 'admin'. Likewise, if you set up a PowerShell script
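As a rough illustration only, here is how a script might call the v3 REST API with a dedicated service account instead of 'admin'. The account name svc_backup, the password, and the cluster address are made-up examples, not values from the alert article:

  curl -k -u svc_backup:'S3rvicePass!' -X POST \
       -H "Content-Type: application/json" \
       -d '{"kind": "vm"}' \
       https://<cluster-virtual-ip>:9440/api/nutanix/v3/vms/list

Any calls made this way show up in the audit trail under the service account name rather than admin, which is the whole point of the recommendation.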
All, I have a 6-node 1065 system in 2 blocks. Recently one of the CVMs (node 2, block A) crashed. When rebooted, it just went into a boot loop. When diagnosed, it turned out the SSD (not the SATADOM) had failed, and we replaced it. When we try to boot the CVM, it still just loops. We were told to boot that node with Phoenix, which the cluster provided me for download. I do that, and it doesn't load Phoenix and gets errors instead. I'm looking for a suggestion on how to get the node back to 100%. At this point (and throughout) the ESXi on the SATADOM has booted fine, and I guess if I didn't care about the storage side I could just ignore this, but I'd like the system to be fully healthy. Any suggestion about how to get the CVM working again would be appreciated. Thank you, Johan
New KB Articles Published on Week Ending March 14, 2020
Below are new knowledge base articles published on the week of March 8-14, 2020.
- KB 7424 - NCC Health Check: metro_invalid_break_replication_timeout_check
- KB 9000 - NCC reports LSI firmware is blacklisted for DELL XC nodes, when LCM inventory does not have any newer versions
- KB 9042 - Prism Central session times out unexpectedly when a user logged in with 'Admin' role
- KB 9063 - [Karbon] Kube DaemonSet Rollout Stuck Alert; Daemonset wrongly reports unavailable pods
- KB 9068 - AHV | nutanix-network-crashcart scripts fail with "No module named fc_progress" error on hosts imaged with Foundation 4.5.2
- KB 9074 - [Karbon] Kubernetes Upgrade Fails with Error: Upgrade failed in component Monitoring Stack: Could not upgrade k8s and/or addons
Note: You may need to log in to the Support Portal to view some of these articles.
New KB Articles Published on Week Ending March 7, 2020
Below are new knowledge base articles published on the week of March 1-7, 2020.
- KB 8869 - NGT installation via Prism Central on Windows Server 2016 or more recent Operating Systems fails with INTERNAL ERROR message
- KB 8905 - How to download images from Prism Element clusters via command line
- KB 8917 - NCC - ERR: The plugin timed out
- KB 8993 - Foundation: Upgrade Foundation using LCM Dark site bundle
- KB 8997 - Not able to delete a Role in Prism Central
- KB 8998 - Era Registration fails if container is not mounted on all hosts
- KB 9004 - Increased number of connections to File Server once migrated from Windows to Nutanix Files
- KB 9013 - How to Create a Shared Folder in Windows Server 2016/Windows 10
- KB 9016 - Unable to open Java console after BMC upgrade from version 7.00 to 7.05
- KB 9028 - ERA-DB provisioned from the OOB template failed to register with ERA server
- KB 9031 - Prism and Microsoft LDAP Channel Binding and Signing
- KB 9045 - How to find a VM creation date and time in Prism Central
3-Node Cluster - 1 CVM is buggy - How to fix that?
2 of 3 nodes are fine and working. The 3rd CVM is up and I can ping it. Restarting the cluster with "allssh genesis stop cluster_health; cluster start" does not bring this cluster member back. After logging in to Prism I saw a "Disk degraded" warning for this 3rd node, and now that disk is missing. How do I fix node no. 3 in a 3-node cluster?
Disk size increased, but still reflects the same size within the VM.
Suppose you need more disk capacity on a virtual machine in your environment. You choose the VM in Prism, click 'Update', select the appropriate disk to edit, and change the size of the disk from 200 to 300 GiB. You click update, see that the task completes successfully, then close the VM update UI. The VM details reflect the increase in disk space, but when you access the VM it appears the capacity of the drive is unchanged! This is actually expected; there is just a bit more work to be done. The partition needs to be extended following the steps for your VM's guest operating system. You can see the steps for Windows in the KB article "Expand volume group disk size on Windows OS", or if you are using Linux, check the KB article "Increase disk size on Linux UVM".
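For example, on a Linux guest with a simple ext4 partition, extending the last partition and its filesystem might look roughly like the sketch below. The device and partition names are assumptions; always confirm your own layout with lsblk first:

  lsblk                        # confirm the disk (e.g. /dev/sda) now shows the new 300 GiB size
  sudo growpart /dev/sda 1     # grow partition 1 to fill the new space (growpart comes from the cloud-utils packages on many distros)
  sudo resize2fs /dev/sda1     # grow the ext4 filesystem; for XFS, use xfs_growfs <mountpoint> instead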
NCC Series | same_timezone_check
Let's say you received an alert stating that all CVMs are not in the same timezone, or that all hosts are not in the same timezone. What does it mean? Exactly what it says: the CVMs/hosts are not in the same timezone. The same timezone should be configured across all CVMs/hosts, as this ensures that all guest VM log messages are timestamped consistently. How will you know about a timezone issue? There is an NCC health check, "same_timezone_check", in place to report any discrepancy in the timezones. To learn more about the alerts and errors that can be seen and how to change the timezone, take a look at https://support-portal.nutanix.com/#/page/kbs/details?targetId=kA0600000008hm9CAA Have any questions? Leave a comment and let's start a discussion.
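A quick, hedged way to compare timezones yourself from any one CVM (the exact module path of same_timezone_check can differ between NCC versions, so the full run is shown instead):

  nutanix@cvm$ allssh date                 # compare the timezone abbreviation reported by each CVM
  nutanix@cvm$ hostssh date                # the same comparison across the hypervisor hosts
  nutanix@cvm$ ncc health_checks run_all   # runs same_timezone_check along with the other NCC checks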
NTP | AHV vs ESXi
First, let us understand what NTP (Network Time Protocol) is. An NTP server is a time server used to keep the time in your cluster in sync. An NTP server can be public or private, depending on the strictness of your environment. To learn how to configure NTP on your Nutanix cluster, take a look at https://support-portal.nutanix.com/#/page/docs/details?targetId=Web-Console-Guide-Prism-v5_16:wc-system-ntp-servers-wc-t.html After the NTP server is configured, the Genesis leader becomes the NTP leader, which means the Genesis leader syncs time with the NTP server and the other CVMs sync time with the Genesis leader. How NTP works in AHV: it's as simple as it gets. The AHV hypervisor takes the same NTP servers configured on the cluster and syncs time with them individually; no extra steps are required to configure the NTP server on the AHV hosts. How NTP works in ESXi: the ESXi hosts do not take the NTP servers configured on the Nutanix cluster, and NTP needs to be configured separately on each ESXi host.
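As a rough sketch, NTP servers are typically added to the cluster from any CVM with ncli; the server name here is only an example:

  nutanix@cvm$ ncli cluster add-to-ntp-servers servers=pool.ntp.org
  nutanix@cvm$ ncli cluster get-ntp-servers    # confirm what the cluster is currently configured to use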
NTP Time In AHV And CVM is not Synchronizing
I have an external NTP server configured on my CVMs, but unfortunately my AHV host is not syncing with the NTP server, even though it is reachable from the hypervisor level. I have checked my hypervisor through SSH and run the command 'date'; it gives me a different time from my NTP server.
Setting or changing the IPMI IP address, what are the restrictions?
You're probably aware that all the CVMs and hypervisor hosts in a cluster need to have IPs in the same subnet, but what about the IPMI? What are the requirements, and what's involved in changing the IP? The requirements are actually quite flexible. The IPMI does not have to be in the same subnet as the hosts and CVMs. You'll see an alert and a cluster health warning if you don't restart the Genesis service on the CVMs after making changes, but otherwise the configuration can be whatever works best for your organization. To restart the Genesis service, log in to the CVM via SSH or console as the user 'nutanix' and run the command "genesis restart"; this restart is non-disruptive. You can plug the IPMI into an isolated network, configure it on a different VLAN, or even leave it unplugged when not in use. The IPMI is very useful for installs and upgrades, and for troubleshooting hardware issues or an unexpected reboot, but it is not required for day-to-day operation
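If you prefer to change the IPMI address from the hypervisor shell rather than the IPMI web UI, a hedged sketch with ipmitool looks like this. The LAN channel 1 and the addresses are assumptions; some platforms use a different channel:

  [root@host]# ipmitool lan print 1                      # show the current IPMI network settings
  [root@host]# ipmitool lan set 1 ipsrc static
  [root@host]# ipmitool lan set 1 ipaddr 10.0.50.21
  [root@host]# ipmitool lan set 1 netmask 255.255.255.0
  [root@host]# ipmitool lan set 1 defgw ipaddr 10.0.50.1
  nutanix@cvm$ genesis restart                           # then restart Genesis on the CVMs, as described above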
Recommended maximum storage utilization
What is the recommended maximum storage utilization in a cluster? Customers can observe cluster issues when they use more than 90 percent of the total available storage on the cluster. Here is a brief explanation of the recommended storage utilization with respect to replication factors 2 and 3. For a cluster to be considered healthy and functioning as expected, the cluster has to tolerate at least one node failure for data resiliency. Here is how much free space you need and how to calculate it. The formula for calculating the maximum recommended usage for a cluster is one of the following. Recommended maximum utilization of a cluster with containers using replication factor (RF) 2: M = 0.9 x (T - N1). Recommended maximum utilization of a cluster with containers using replication factor (RF) 3: M = 0.9 x (T - [N1 + N2]). Here M is the recommended maximum usage of the cluster, T is the total available physical storage capacity in the cluster, and N1 is the storage capacity of the node with the largest capacity (N2 being the node with the next largest capacity, for RF3).
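A quick worked example, using made-up numbers: suppose a 4-node RF2 cluster has T = 80 TiB of total physical capacity and its largest node contributes N1 = 20 TiB. Then M = 0.9 x (80 - 20) = 54 TiB, so you would plan to keep physical usage at or below roughly 54 TiB to remain resilient to the loss of that node.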
Why am I seeing this alert: Storage Containers are not mounted on all nodes?
On your Nutanix and vSphere cluster you may have recently seen the alert "Storage Containers are not mounted on all nodes". This may have started showing up after some cluster modifications or maintenance, or it could have appeared after a recent upgrade. You're probably wondering: what is this alert for? How big a problem is it, and what do we need to do to fix it? I'd like to answer those questions for you. This alert is generated by a recurring NCC check, "Storage Container Mount Configuration". First off, it's useful to know that the main purpose of this check is to validate the configuration of a Metro Availability pair of Nutanix/ESXi clusters; that's why this check is categorized under the data protection checks. The command to run this check from the CVM CLI is "ncc health_checks data_protection_checks protection_domain_checks storage_container_mount_check". As for the impact of the problem being reported, I'd have to say it's conditional. The briefest way
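If you want to see for yourself what is mounted where, a hedged starting point is the sketch below; the output formats vary a little between AOS and ESXi versions:

  nutanix@cvm$ ncc health_checks data_protection_checks protection_domain_checks storage_container_mount_check
  nutanix@cvm$ ncli datastore ls           # lists the NFS datastores created from Nutanix storage containers
  [root@esxi]# esxcli storage nfs list     # on each ESXi host, shows which NFS datastores that host has mounted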
Changing bonding mode in AHV from active-passive to balance-tcp
Let's say you want high availability at the NIC level and failover capacity for the CVM and user VMs, and you want to take advantage of all adapters instead of leaving some idle; in that case we recommend using balance-slb. Balance-tcp is a load-balancing method that increases host and VM bandwidth utilization beyond a single 10 Gb adapter by balancing each VM NIC TCP session onto a different adapter; it is also the mode used when the network switches require LACP negotiation. Advantages of the balance-slb bond mode: the balance-slb bond mode in OVS (Open vSwitch) takes advantage of all the links and rebalances VM traffic from heavily used interfaces to less used ones. When the configurable bond-rebalance interval expires, OVS uses the measured load for each interface and the load for each source MAC hash to spread traffic evenly among the links in the bond, so traffic from some source MAC hashes may move to a less active link to balance bond member utilization more evenly. Here is an article that describes the impact of the change and the s
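As a hedged sketch only (the bond name br0-up is an assumption, and on current AOS releases Nutanix steers you toward manage_ovs or Prism rather than raw ovs-vsctl), changing the bond mode on an AHV host looks conceptually like this:

  [root@ahv]# ovs-appctl bond/show br0-up                                   # check the current bond mode and member links
  [root@ahv]# ovs-vsctl set port br0-up bond_mode=balance-slb               # move from active-backup to balance-slb
  [root@ahv]# ovs-vsctl set port br0-up lacp=active bond_mode=balance-tcp   # balance-tcp additionally requires LACP configured on the switch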
NCC series | Is mixing all-flash and hybrid nodes within the same cluster supported?
First of all, let's take a look at what all-flash and hybrid nodes mean. All-flash: nodes containing only SSD drives. Hybrid: nodes containing both HDD and SSD drives. Is it possible to have both kinds of nodes within the same cluster? Provided that the two conditions below are met, it is: the AOS version must be 5.1 or later, and the minimum number of each node type (all-flash/hybrid) must be equal to the cluster redundancy factor. For example, clusters with a redundancy factor of 2 must have a minimum of 2 hybrid and 2 all-flash nodes. To check for any issues with mixing these nodes, we have the NCC check "all_flash_nodes_intermixed_check", which checks whether there are any all-flash (SSD-only) nodes in the same cluster as hybrid (HDD and SSD) nodes. This check is scheduled to run every day by default and will generate an alert after 1 failure. To know more about the check and its behavior, take a look at https://portal.nutanix.com/#/page/kbs/detai
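If you want a quick look at which tier each node's disks belong to before mixing node types, a hedged sketch from a CVM is shown below; the exact field labels in the ncli output can vary slightly between AOS versions:

  nutanix@cvm$ ncli disk ls | grep -Ei "address|tier"   # an all-flash node lists only SSD-tier disks; a hybrid node lists both SSD and HDD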
New users on the team, how do we give them portal access?
When you on-board new team members, they're going to need access to all their tools. If those new teammates will be managing Nutanix systems, they most likely need a login to the Support Portal for access to documentation, downloads, the knowledge base, and the ability to open a support case. So how do we set up access for new users? There are a few simple steps, covered in the article How to Gain User Access to Nutanix Support Portal. You'll need a block or software-only serial number and the user's email address. The email address should be in the authorized domain for your customer account. To find the block serial number, log in to Prism and navigate to Hardware > Diagram; the block serial number is displayed above each block in the diagram. If there are multiple blocks, pick one and note the serial number for later. For software-only licenses, the software asset serial number is mentioned in the acknowledgement email sent upon fulfilment of your order. Once you have the
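If the CLI is handier than the Prism diagram, the block serial can usually also be pulled from a CVM; a hedged sketch (the exact field label can differ between AOS versions):

  nutanix@cvm$ ncli host ls | grep -i "block serial"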
Understanding UEFI BIOS
The Unified Extensible Firmware Interface (UEFI) is a specification that defines a software interface between an operating system and platform firmware. UEFI replaces the legacy Basic Input/Output System (BIOS) firmware interface originally present in all IBM PC-compatible personal computers, with most UEFI firmware implementations providing support for legacy BIOS services. UEFI can support remote diagnostics and repair of computers, even with no operating system installed. Advantages: the interface defined by the EFI specification includes data tables that contain platform information, plus boot and runtime services that are available to the OS loader and OS. UEFI firmware provides several technical advantages over a traditional BIOS system: the ability to use large disk partitions (over 2 TB) with a GUID Partition Table (GPT), CPU-independent architecture, CPU-independent drivers, a flexible pre-OS environment including network capability, modular design, and backward and forward compatibility
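A quick, hedged way to tell whether a running Linux VM booted with UEFI or legacy BIOS (the /sys path is standard on modern kernels):

  $ [ -d /sys/firmware/efi ] && echo "Booted with UEFI" || echo "Booted with legacy BIOS"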
NCC Series | Low free block count error
You might receive a low free block count error in your NCC report due to the failed check sata_dom_fwv_check. This plugin checks the SATA DOM SMART data and reports if anything is off from the expected result. To free up blocks in the SATA DOM, there is a procedure called the "trim operation", which is effective and quick. The trim operation consists of running a script called trim_satadom.sh, which can be downloaded from the link provided in the doc mentioned below. To learn more about the NCC check and to see how the trim operation works, take a look at https://portal.nutanix.com/#/page/kbs/details?targetId=kA032000000PODRCA4
RF2 -> RF3 | Requirements
Giving thought to changing your replication factor from 2 to 3? What are the impacts and things to consider? First, let's take a look at what replication factor is. Redundancy factor is a configurable option that allows a Nutanix cluster to withstand the failure of nodes or drives in different blocks. By default, Nutanix clusters have redundancy factor 2, which means they can tolerate the failure of a single node or drive. RF3 therefore means the cluster can tolerate the failure of 2 nodes or drives; basic maths, isn't it? Redundancy factor 3 has the following requirements: it can be enabled at the time of cluster creation or afterwards; the cluster must have at least five nodes for redundancy factor 3 to be enabled; for guest VMs to tolerate the simultaneous failure of two nodes or drives in different blocks, the data must be stored on containers with replication factor 3; and Controller VMs must be configured with a minimum of 28 GB (20 GB default + 8 GB for the feature
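A hedged sketch of how the redundancy factor is typically inspected and raised from a CVM; the container name is a made-up example, and the exact syntax should be double-checked against the documentation for your AOS version:

  nutanix@cvm$ ncli cluster get-redundancy-state
  nutanix@cvm$ ncli cluster set-redundancy-state desired-redundancy-factor=3
  nutanix@cvm$ ncli ctr edit name=my_container rf=3    # containers then need their replication factor raised separately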
VM on SATA DOM?
Oops, did you by mistake create a VM on the SATA DOM? And what will happen if a VM is intentionally created on the SATA DOM? For starters, it is not recommended to run VMs on the SATA DOM, as this accelerates its degradation. If you did it by mistake, how will you know? The NCC health check sata_dom_uvm_check verifies whether any VMs are running on the SATA DOM and reports a FAIL status if it detects any. Since the check indicates that a VM is running on the SATA DOM, the only solution is to move it off. To know more about the check outputs and the solution, take a look at https://portal.nutanix.com/#/page/kbs/details?targetId=kA03200000098jUCAQ Have any questions? Drop a comment and let's start a discussion
Security issue on IPMI v 2.0
IPMI version 2.0 is susceptible to exploitation that allows an attacker to obtain password hash information. A vulnerability scan of the environment can give the sample output below. Synopsis: The remote host supports IPMI version 2.0. Description: The remote host supports IPMI v2.0. The Intelligent Platform Management Interface (IPMI) protocol is affected by an information disclosure vulnerability due to the support of RMCP+ Authenticated Key Exchange Protocol (RAKP) authentication. A remote attacker can obtain password hash information for valid user accounts via the HMAC from a RAKP message 2 response from a BMC. To find the IPMI version on the host, log in to the host and run the command for checking the BMC version; you will get output similar to:
Device ID : 32
Device Revision : 1
Firmware Revision : 3.63
IPMI Version : 2.0 <<== IPMI version
Manufacturer ID : xxxxx
Manufacturer Name : Supermicro
To know ho
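The BMC details shown above are typically produced with ipmitool from the hypervisor shell; as a hedged sketch:

  [root@host]# ipmitool mc info    # prints Device ID, Firmware Revision, IPMI Version, Manufacturer, etc.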
Shared Disk on AHV
Is it possible to share a disk between two or more virtual machines? And can two VMs use the disk at the same time for both reading and writing? The answers are yes and no, respectively. Yes, Nutanix supports shared disks via Volume Groups on AHV; this is supported for clustering solutions (clusters at the guest OS level) such as MySQL. You can use the disk on two VMs at the same time for reading; however, two VMs writing at the same time is not allowed. In a clustering solution, the master VM writes to the disk and all the other VMs in the cluster can read from it. Take a look at the guide for an overview of how storage management is done at Nutanix: https://portal.nutanix.com/#/page/docs/details?targetId=Web-Console-Guide-Prism-v5_16:wc-storage-management-wc-c.html
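As a hedged sketch only (the volume group, container, and VM names are made up; check the acli reference for your AOS version), creating a volume group and attaching it to two AHV VMs looks roughly like this:

  nutanix@cvm$ acli vg.create shared_vg
  nutanix@cvm$ acli vg.disk_create shared_vg container=default-container create_size=100G
  nutanix@cvm$ acli vg.attach_to_vm shared_vg app-vm-01
  nutanix@cvm$ acli vg.attach_to_vm shared_vg app-vm-02

The guest-level clustering software is still responsible for arbitrating writes, as described above.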
Understanding EVC Mode (vSphere) and when to use it
Let's say you want to add a new node with a newer processor class than the existing nodes in the cluster. In this case, you must enable the EVC (Enhanced vMotion Compatibility) feature. What is EVC? EVC stands for Enhanced vMotion Compatibility, a vCenter Server cluster-centric feature that allows virtual machines to vMotion, or migrate, across ESXi hosts equipped with dissimilar processors in the same cluster. VMware EVC mode works by masking unsupported processor features, thus presenting a homogeneous processor front to all the virtual machines in a cluster. This means a VM can vMotion to any ESXi host in the cluster irrespective of the host's micro-architecture, examples of which include Intel's Sandy Bridge and Haswell. One caveat to remember is that all the processors must be from a single vendor, i.e. either Intel or AMD; you simply cannot mix and match. What are the benefits? The main benefit is that you can add servers with the latest processors to your existing cluste