Question

Troubleshooting NGT SSR

  • 5 June 2020
  • 2 replies

Hello community,

I am facing a situation with some VMs where the SSR (Self-Service Restore) part of the Nutanix Guest Tools is causing some trouble.

I have two VMs and created a Protection Domain for them so that I can play around with snapshot restore. One VM is working perfectly; the other VM displays:

“Error executing command: Failed to query snapshot due to Nutanix Data Protection service error, error detail: Failed to fetch file information for VM with ID ****** due to hyperint error 1”

This message appears as soon as I start the SSR tool, and also when I use ngtcli. Both VMs are set up the same way, with IPs from the same subnet.

Can anyone point me in the right direction to find an answer to this?

Cheers

Dirk


2 replies


Hello @DirkRasche ,

Can you confirm the AOS version, and the NGT and VirtIO versions on both VMs?

Did you try restarting any of the services on the CVM?

Also, did you try restarting the VM that shows this error?

 

Some of the common NGT troubleshooting steps are listed in KB3741.


@DirkRasche 

 

Solution 1:

 

In early AOS releases, there is no retry logic in the function that fetches datastores, so if the RPC call is lost or gets no reply for any reason, the high-level operation that triggered it (such as a snapshot, migration, or takeover) fails too.

Instead of returning an error immediately on timeout, the call should be retried, which handles temporary failures.

If the cluster is running an older version of AOS, I suggest you upgrade AOS first.
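
To illustrate the kind of retry behaviour described above, here is a minimal shell sketch. It is not the AOS implementation; the `retry` helper, its arguments, and the fixed delay are all assumptions made up for this example. It simply re-runs a command a limited number of times instead of failing on the first error.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AOS or NGT): run a command up to
# max_attempts times, sleeping delay seconds between failed attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts=$1
  local delay=$2
  shift 2
  local attempt=1
  while true; do
    # Success on any attempt ends the loop immediately.
    if "$@"; then
      return 0
    fi
    # Stop once the attempt budget is used up.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
  done
}
```

For example, `retry 3 5 some_command` would run `some_command` (a placeholder) up to three times with five seconds between attempts, which covers a lost call or a temporary timeout but still reports a persistent failure.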

 

Solution 2:

 

You can try to re-install NGT on the affected VMs:

 

https://portal.nutanix.com/page/documents/kbs/details/?targetId=kA032000000TVEnCAO

 

Solution 3:

 

++ You can check whether the VSS snapshot works correctly when the VM uses the same network as the CVMs.

++ If that is the case, then move the VM to a network different from the CVMs' network and make any required firewall changes.

 

Solution 4:

 

If your environment runs on Hyper-V, please check the steps below.

 

a) Please check whether the command below runs successfully against the Hyper-V hosts:

allssh 'winsh "Get-LocalVMConfiguration | ConvertTo-JSON -Depth 5 -Compress"'

 

If the above command succeeds, proceed to the next step. If it fails, the problem needs to be investigated at the Hyper-V layer.

 

b) HyperInt needs to be restarted across the cluster nodes. At this point, however, I would like you to open a case with us, as we might have to verify a couple of things before taking any further action.

 
