Performance metrics


Badge +5
  • Trailblazer
  • 17 replies
Hi there.

On my Analysis page I can see that today at 3:27 PM my cluster went from 500 IOPS to 20,500 IOPS.

How can I narrow down which virtual machine is causing this excessive use, without adding 70 disks to my metrics?
The disks are only identified by numbers, and I don't know which numbers belong to which VMs.


12 replies

Userlevel 2
Badge +12
This may be a long shot, but you can use the REST API to get the stats for all virtual machine disks at 3:27 PM. The requirements are:
1. A Linux VM or a Mac (OS X) that has connectivity to the cluster
2. jq installed (on CentOS, "yum install jq"; on OS X, "brew install jq")
First, you need the epoch time equivalent of 3:27 PM your local time. Convert 3:27 PM local time to UTC, then use https://www.unixtimestamp.com to get the epoch time for the date you need to check. Then append six zeros to the epoch time to convert seconds to microseconds.
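The time conversion described above can also be done locally. The following is a minimal Python sketch, not part of the original reply; the function name to_epoch_usecs and the UTC+2 offset in the usage lines are assumptions for illustration only, so substitute your own UTC offset.

```python
from datetime import datetime, timedelta, timezone

# Reference point: the Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_usecs(local_dt, utc_offset_hours):
    """Convert a naive local datetime to microseconds since the Unix epoch.

    utc_offset_hours is the local offset from UTC (for example 2 for UTC+2).
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    aware = local_dt.replace(tzinfo=tz)
    # Dividing by a one-microsecond timedelta yields an exact integer.
    return (aware - EPOCH) // timedelta(microseconds=1)


# Example: 3 Oct 2017, 3:27 PM local time, assuming a hypothetical UTC+2 offset.
start = to_epoch_usecs(datetime(2017, 10, 3, 15, 27), 2)
end = start + 60 * 1_000_000  # one minute later
print(start, end)
```

The two printed numbers can be used as the start and end times in microseconds for the stats queries.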

Step 1: Get a list of all virtual disk IDs in the system, along with the names of the VMs they are attached to, in a file called uuidlist. The following command will do:

curl -s -k -u admin:<password> -X GET --header 'Accept: application/json' 'https://<cluster_ip>:9440/PrismGateway/services/rest/v2.0/virtual_disks/' | jq '.entities[] | "\(.uuid) \(.attached_vmname)"' | sed 's/"//g' > uuidlist
Please replace <password> with your cluster Prism password and <cluster_ip> with your cluster's virtual IP address.

Step 2: Get the IOPS for all the disks in the uuidlist file at a given epoch time and write them to a file called statsout. Run the command below:

for i in `cat uuidlist | awk '{print $1}'`; do echo $i $(curl -s -k -u admin:<password> -X GET --header 'Accept: application/json' "https://<cluster_ip>:9440/PrismGateway/services/rest/v2.0/virtual_disks/$i/stats/?metrics=controller_num_iops&start_time_in_usecs=<epoch_usecs>&end_time_in_usecs=<epoch_usecs>&interval_in_secs=1" | jq '.stats_specific_responses[].values[]' 2>/dev/null); done > statsout
Please replace <password> with your cluster password, <cluster_ip> with your cluster's virtual IP address, and <epoch_usecs> with the time in microseconds obtained before.

Step 3: Find the virtual disk UUID that has the highest IOPS at 3:27 PM and the corresponding VM name.

grep $(cat statsout | sort -n -r -k 2 | head -n1 | cut -d " " -f1) uuidlist
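If more than the single top disk is of interest, the uuidlist and statsout files from the steps above can be ranked in a few lines of Python. This is a sketch under assumptions, not part of the original reply: the function name top_disks is hypothetical, each statsout line is assumed to be a UUID followed by zero or more numeric samples, and negative samples are skipped on the unconfirmed assumption that they mark missing data.

```python
def top_disks(stats_lines, uuid_lines, n=5):
    """Return up to n (peak_iops, vm_name, disk_uuid) tuples, highest peak first."""
    # Map each disk UUID to the VM name recorded in uuidlist.
    names = {}
    for line in uuid_lines:
        parts = line.split(maxsplit=1)
        if parts:
            names[parts[0]] = parts[1].strip() if len(parts) > 1 else "(unattached)"

    ranked = []
    for line in stats_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # no samples were returned for this disk
        samples = []
        for field in fields[1:]:
            try:
                value = float(field)
            except ValueError:
                continue  # ignore anything that is not a number
            if value >= 0:  # assumption: negative values mean "no data"
                samples.append(value)
        if samples:
            ranked.append((max(samples), names.get(fields[0], "(unknown)"), fields[0]))

    ranked.sort(key=lambda row: row[0], reverse=True)
    return ranked[:n]


# Usage with made-up sample data; in practice, read the two files instead:
#   top_disks(open("statsout"), open("uuidlist"))
example_uuids = ["aaa vm-web", "bbb vm-db", "ccc null"]
example_stats = ["aaa 120 95", "bbb 18000", "ccc -1"]
for peak, vm, uuid in top_disks(example_stats, example_uuids):
    print(vm, uuid, peak)
```

Listing the top few disks rather than only the maximum makes it easier to see whether the load comes from one disk or is spread across several.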
Userlevel 2
Badge +12
This is the Python equivalent, which can be run on Linux/Windows/OS X with Python 3.x.
Change "password", "ip", and "epochtime" to the correct values.

import requests
from requests.auth import HTTPBasicAuth
import json
import urllib3

# Suppress the warnings caused by skipping certificate verification
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

epochtime = "1506987645000000"  # Change me to correct epoch time
password = 'password'  # Change me to correct password
ip = "1.1.1.1"  # Change me to correct Cluster virtual IP address

url = "https://" + ip + ":9440/PrismGateway/services/rest/v2.0/virtual_disks/"
headers = {'accept': "application/json"}

# Fetch all virtual disks and record each UUID with its attached VM name
response = requests.request("GET", url, headers=headers, verify=False, auth=HTTPBasicAuth('admin', password))
output = json.loads(response.text)
uuid_col = []
for entity in output["entities"]:
    temp = {}
    temp["uuid"] = entity["uuid"]
    temp["vm_name"] = entity["attached_vmname"]
    uuid_col.append(temp)

# Fetch the IOPS of each disk at the given epoch time
for var in uuid_col:
    baseurl = url + var["uuid"] + "/stats/?metrics=controller_num_iops&start_time_in_usecs=" + epochtime + "&end_time_in_usecs=" + epochtime + "&interval_in_secs=1"
    response1 = requests.request("GET", baseurl, headers=headers, verify=False, auth=HTTPBasicAuth('admin', password))
    iopsout = json.loads(response1.text)
    temp = {}
    if not len(iopsout["stats_specific_responses"][0]["values"]):
        temp["iops"] = 0
    else:
        temp["iops"] = iopsout["stats_specific_responses"][0]["values"][0]
    var.update(temp)

# Report the disk with the highest IOPS
maxiops = max(uuid_col, key=lambda x: x['iops'])
print("Virtual Machine with max IOPS is")
print(maxiops)
Badge +5
Jeez! Thanks for this, but is this the only way to find what I am looking for? It happens once a day, hits 20,000 IOPS and goes down again.

Could it be some CVM cluster task that runs once a day?

If you say this is the only way, I will try it out...
Userlevel 2
Badge +12
m1kkel Does it happen at 3:27 PM every day or at random times?
Badge +5
Nope. Over the last two days, including today, it happened at:

Today at 01:02 AM
Yesterday at 10:06 PM
Userlevel 2
Badge +12
m1kkel You could also use the Nutanix PowerShell cmdlets from any Windows PC to get the stats for one minute starting at 10:06 PM on 3 Oct, your local time, by running the following commands. No time conversions are required.

Connect-NTNXCluster -Server <cluster_ip> -UserName admin -Password <password> -AcceptInvalidSSLCerts
$date = Get-Date -Day 3 -Month 10 -Year 2017 -Hour 22 -Minute 06 -Second 00
$starttime = ([DateTimeOffset]$date).ToUnixTimeSeconds() * 1000000
$endtime = (([DateTimeOffset]$date).ToUnixTimeSeconds() + 60) * 1000000
Get-NTNXVirtualDisk | %{
  $name = $_.attachedvmname
  Get-NTNXvirtualDiskStat -Id $_.uuid -StartTimeInUsecs $starttime -EndTimeInUsecs $endtime -Metrics controller_num_iops -IntervalInSecs 1 | select @{Name="VM Name";Expression={$name}},@{Name="IOPS";Expression={$_.values}}
} | Sort IOPS -descending
Please replace <cluster_ip> with your Prism virtual IP address and <password> with your Prism password.

For instructions on installing the Nutanix PowerShell cmdlets on a Windows PC, please see
https://portal.nutanix.com/#/page/docs/details?targetId=API-Ref-AOS-v51:ps-ps-cmdlets-install-r.html
Badge +5
Okay, thanks, I will try that first! 🙂
Badge +5
Hi again.

So the PowerShell cmdlets seem to work fine. However, the script is returning negative values for some of the VMs, and what's even better, the VMs listed do not account for the 20,000 IOPS.

So I'm not sure whether we don't see the 20,000 IOPS because of the script or because a system task is generating them.


Userlevel 2
Badge +12
m1kkel How long do the 20,000 IOPS last? Maybe we are not querying the correct time.
Badge +5
Around 30 minutes..
Userlevel 2
Badge +12
m1kkel It could be your replications. The script I provided doesn't show stats for inbound replications. Do you have inbound replications configured?
Badge +5
Well, I don't use replication, unless I misunderstand what you mean. I do not have a DR cluster, and I currently don't replicate snapshots to AWS or Azure...
