Solved

Copy or Storage vMotion Very Slow


Badge +2

I have an 8-node Nutanix cluster with vSphere 7 deployed. I've created 2 new storage containers with compression enabled and mapped them to the nodes.

I tested copying a file that I had uploaded to the default container to one of the new storage containers, presented as an NFS datastore. It was very slow: 5 GB took 2 hours.

Then I tried a Storage vMotion of a VM residing on one new datastore to the other new datastore. A 10 GB VM took 20 minutes.

A Storage vMotion from the default container to a new datastore fails.
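To put the two timings above in perspective, here is a minimal sketch that converts them to average throughput. It assumes 1 GB = 1024 MB and that the reported sizes and durations are exact; the function name is only illustrative.

```python
def throughput_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average throughput in MB/s, assuming 1 GB = 1024 MB."""
    return (size_gb * 1024) / (minutes * 60)


# Figures taken from the post: 5 GB in 2 hours, 10 GB in 20 minutes.
file_copy = throughput_mb_per_s(5, 120)
svmotion = throughput_mb_per_s(10, 20)

print(f"File copy:       {file_copy:.2f} MB/s")  # 0.71 MB/s
print(f"Storage vMotion: {svmotion:.2f} MB/s")   # 8.53 MB/s
```

For comparison, even a 1 Gbit/s link can carry roughly 125 MB/s in theory, so both figures are far below what the network should deliver.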

Best answer by swee han 4 November 2022, 04:54

17 replies

Userlevel 4
Badge +5

How are the networks configured? 

Badge +2

No LACP; active-standby. It was working fine previously.

Userlevel 4
Badge +5

So if it was working fine, something has changed or broken. I would suggest you look into what changed and examine that component to find the bottleneck.

Userlevel 3
Badge +5

Hello @swee han 

Make sure that the Nutanix VAAI plug-in is installed and enabled.

Badge +2

Checked. The plug-in is installed and enabled.

Userlevel 1
Badge +1

Check the MTU on the switches, the NICs, and the vSwitches. To me this feels like an MTU problem. If it's not that, my next guess would be a routing problem. Nutanix uses 192.168.5.0/24 internally. If you are using that network outside of Nutanix, that would cause problems.
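A quick way to check the overlap concern is sketched below with Python's standard ipaddress module. The 192.168.5.0/24 range comes from the post above; the candidate subnets are hypothetical placeholders and should be replaced with the networks actually in use on your site.

```python
import ipaddress

# Internal range named in the post above.
NUTANIX_INTERNAL = ipaddress.ip_network("192.168.5.0/24")


def overlaps_internal(cidr: str) -> bool:
    """Return True if the given subnet shares any address with the internal range."""
    return ipaddress.ip_network(cidr, strict=False).overlaps(NUTANIX_INTERNAL)


# Hypothetical external subnets, for illustration only.
candidates = ["10.10.20.0/24", "192.168.0.0/16", "192.168.5.128/25"]

for cidr in candidates:
    status = "OVERLAPS" if overlaps_internal(cidr) else "ok"
    print(f"{cidr:20} {status}")
```

Note that a broad range such as 192.168.0.0/16 also counts as overlapping, since it contains the internal /24.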

Badge +2

Another odd thing I found in Prism: the data resiliency status randomly changes to "data resiliency not possible, system is self-healing", although all the CVM services are up. Clicking on the self-healing status shows that the extent group is the affected component.

Userlevel 3
Badge +5

Just for completeness, is the default container functioning properly?

Badge +2

The default container was functioning properly previously. Now, even deploying a VM in the default container and installing the OS from an ISO that resides in the same default container is very slow. It took 2 hours to deploy a Windows VM.

Userlevel 3
Badge +5

You will have to contact Nutanix Support.

Badge +2

Yes, they conducted a few remote sessions and still have no clue. They have now collected the logs and escalated the case to their senior, Level 2, or Level 3 engineers.

Userlevel 4
Badge +5

Looks like the network is not as it should be. Have the switches (distributed vSwitch or the standard vSwitches in VMware) been changed?

Userlevel 3
Badge +5

Can you tell us how the equipment is connected in terms of NICs, switches, etc.?

Are you seeing alerts on Prism Element?

Can you transfer between ESXi hosts directly (boot media to boot media) at a reasonable speed?

Support will be best placed to help here.

What hardware is it?

Badge +2

Prism Element only has warnings, mostly about CVM NTP. The server is an HPE Nutanix model DX2000 with DX190R Gen10. All the servers are connected to 2 x Cisco C9500.

Userlevel 3
Badge +5

Ok cool. 

 

If you SSH into ESXi on node1, can you run df -h?

We need to see whether there is some local (non-Nutanix) storage at the hardware level, so we can try to orchestrate testing without the CVM or virtual storage.

 

Userlevel 2
Badge +3

Some points to consider:

  • Check whether EVC is enabled. I have noticed that EVC set to a very old CPU generation may lead to slowness.
  • Check whether any firewall is involved between the hosts.
  • What is the network connectivity? Are you using 1G or 10G?
Badge +2

It seems the culprit is the network connection of one of the nodes at switch2. On the switch, that particular port has CRC errors. After disabling the port, I tested file copy and Storage vMotion, and the results were back to normal. The data resiliency status in Prism is also back to normal. Now checking which item is faulty (FC cable, network or server port, network or server transceiver).
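For anyone chasing a similar symptom, here is a rough sketch of how saved switch output could be scanned for ports with non-zero CRC counters. It assumes text shaped like the counters section of Cisco IOS "show interfaces" output; the interface names and counter values in the sample are made up for illustration, and the exact format on a given switch may differ.

```python
import re

# Illustrative sample only: interface names and counters are invented.
SAMPLE = """\
TenGigabitEthernet1/0/7 is up, line protocol is up (connected)
     1523 input errors, 1498 CRC, 0 frame, 0 overrun, 0 ignored
TenGigabitEthernet1/0/8 is up, line protocol is up (connected)
     0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
"""


def ports_with_crc_errors(text: str) -> dict:
    """Map interface name -> CRC count for every port whose CRC counter is non-zero."""
    result = {}
    current = None
    for line in text.splitlines():
        # An interface header starts at column 0, e.g. "<name> is up, ...".
        header = re.match(r"^(\S+) is ", line)
        if header:
            current = header.group(1)
            continue
        crc = re.search(r"(\d+) CRC", line)
        if crc and current is not None:
            count = int(crc.group(1))
            if count > 0:
                result[current] = count
    return result


print(ports_with_crc_errors(SAMPLE))
```

A CRC counter that keeps climbing usually points at the physical layer (cable, transceiver, or port), which matches the items being checked above.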