Hello @DirkRasche ,
Can you confirm the AOS version, as well as the NGT and VirtIO versions on both VMs?
Did you try restarting any of the services on the CVM?
Also, did you try restarting the VM that shows this error?
Some of the common NGT troubleshooting steps are listed in KB3741.
@DirkRasche
Solution 1:
In early AOS releases, there is no retry logic in the function that fetches datastores, so if the RPC call is lost or there is no reply for any reason, the high-level operation that triggered it (such as a snapshot, migration, or takeover) fails too.
Instead of returning an error immediately on timeout, the call should be retried, which would get past temporary failures.
If the cluster is running an older version of AOS, I suggest you upgrade AOS first.
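To illustrate the difference a retry makes, here is a minimal Python sketch. It is not the AOS implementation; `RpcTimeout`, `call_with_retry`, and `fetch_datastores` are hypothetical names made up for this example, and the fake fetch simply times out twice before answering.

```python
import time


class RpcTimeout(Exception):
    """Hypothetical stand-in for a lost RPC call or a missing reply."""


def call_with_retry(rpc, attempts=3, delay_s=0.0):
    """Call rpc(); on RpcTimeout, try again, up to `attempts` calls in total."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return rpc()
        except RpcTimeout as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay_s)  # brief pause before the next attempt
    # Every attempt timed out, so the failure is reported to the caller.
    raise last_error


# Hypothetical datastore fetch that times out twice and then replies.
calls = {"count": 0}


def fetch_datastores():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RpcTimeout("no reply")
    return ["datastore-1"]


result = call_with_retry(fetch_datastores)
```

Without the retry, the first timeout would fail the whole operation. With it, a temporary failure is absorbed and only a persistent one reaches the caller.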
Solution 2:
You can try re-installing NGT on the affected VMs:
https://portal.nutanix.com/page/documents/kbs/details/?targetId=kA032000000TVEnCAO
Solution 3:
++ Check whether the VSS snapshot works correctly while the VM is on the same network as the CVMs.
++ If that is the case, move the VM to a network different from the CVMs' network and make any required firewall changes.
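If it is not obvious whether a VM shares a network with the CVMs, a quick check like the Python sketch below can help. The subnet and IP addresses are made-up placeholders, and `shares_network` is a hypothetical helper; substitute the values from your own environment.

```python
import ipaddress


def shares_network(vm_ip, cvm_subnet):
    """Return True if vm_ip falls inside the CVM subnet."""
    return ipaddress.ip_address(vm_ip) in ipaddress.ip_network(cvm_subnet)


CVM_SUBNET = "10.0.0.0/24"  # placeholder: the subnet your CVMs use

print(shares_network("10.0.0.57", CVM_SUBNET))  # VM on the CVM network
print(shares_network("10.0.1.57", CVM_SUBNET))  # VM on a different network
```

The first call prints True and the second prints False.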
Solution 4:
If your environment is on Hyper-V, please check the steps below.
a) Please check whether the command below runs successfully on the Hyper-V hosts:
allssh 'winsh "Get-LocalVMConfiguration | ConvertTo-JSON -Depth 5 -Compress"'
If the above command succeeds, proceed to the next step. If it fails, the problem needs to be checked at the Hyper-V layer.
b) HyperInt needs to be restarted across the cluster nodes. At this point, however, I would like you to open a case with us, as we might have to verify a couple of things before taking any further action.