Write latency in a VDI POC

  • 16 September 2015
  • 6 replies
  • 1402 views

Badge +6
We are performing a POC at a customer site using a 3-node DELL XC720xd cluster. The statistics taken from vCenter show an average read latency of around 1 ms and a write latency of around 5-6 ms.

The write latency seems far too high, since each write should only result in 2 writes (base and replica) to SSD. I wonder whether the CVM waits on a timer before it flushes pending writes to the oplog.

Generally speaking, this could help by reducing the number of I/Os and increasing their size, which is good for SSDs.

Unfortunately, in our POC we are competing against the current configuration of servers with Fusion-IO PCI cards, and this approach could be a disadvantage.

Another important parameter is the logon time, which averages 35-40 sec on the current system and goes up to 75 sec with Nutanix.

Any suggestions for speeding up the Nutanix cluster, maybe by modifying some parameter?


Badge +1
Hi Steve,

Did you ever figure out the source of the write latency on the flash tier that you were experiencing during the POC? I would be curious to know what the resolution was. Thanks
Badge +6
Hi 

Since the POC is based on DELL appliances, I asked DELL yesterday to open a support ticket.

There may be a lot to improve in the customer environment, but that does not explain the write latencies, which should depend on the Nutanix implementation of the storage. Additionally, reads could go to the SSD or the HDD tier, but ALL the writes in this environment should go to the SSD tier, which should have latencies around 1 ms...
Userlevel 4
Badge +19
Yeah, something is not right and I would open a support ticket. I doubt the flash tier is full, so at this point it might as well be an all-flash array. I would work backwards from the problem: block inheritance in AD, create a new user, and do not use roaming profiles or folder redirection. Then slowly add things back.

What does a brand new desktop not attached to a domain look like? I know I have got logon times below 4s when minimal GPOs are used.

You can run NCC to verify everything as well. I would also confirm DNS is not screwing around with you.
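
One quick way to sanity-check the DNS suggestion from a desktop or management host is to time a few name lookups. The following is only a minimal sketch using Python's standard library; the hostname `localhost` is a placeholder and would need to be replaced with the domain controller or profile server name from the actual environment, which is not given in this thread. Repeated lookups may be answered from a local cache, so the first attempt is usually the most telling.

```python
import socket
import time


def time_dns_lookups(hostname, attempts=5):
    """Return the resolution time in ms for each attempt (None if a lookup fails)."""
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            socket.getaddrinfo(hostname, None)
            timings.append((time.perf_counter() - start) * 1000.0)
        except socket.gaierror:
            # Record the failure instead of aborting the whole run.
            timings.append(None)
    return timings


if __name__ == "__main__":
    # "localhost" is a placeholder hostname, not taken from the thread.
    for attempt, ms in enumerate(time_dns_lookups("localhost", attempts=3), start=1):
        if ms is None:
            print(f"attempt {attempt}: lookup failed")
        else:
            print(f"attempt {attempt}: {ms:.2f} ms")
```

Lookups that fail or take hundreds of milliseconds would point to name resolution as one contributor to slow logons.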
Userlevel 3
Badge +19
I can't comment on the write latency, but I faced the same problem in one of my XenDesktop POCs, and the problem was with GPO and folder redirection. I had to create a new GPO and apply it instead of using the existing GPO, and was able to get login time below 10 sec.

vF.P
Badge +6
Yes,

it is a single image. The desktops are non-persistent and are assigned to the users at logon. They are using roaming profiles and folder redirection on external storage. This download of the user profile triggers a high number of writes at logon time. Today, with an increased load of around 18-20 active desktops/node, the average write latency is up to 7.5 ms and the average read latency is up to 2.3 ms.

What is worse is the peak latency observed. We monitored a specific virtual desktop that was "very active". It had from 40 to 80 writes/sec, and the write latency went up to over 20 ms.

I think there is surely something wrong, because the total load is very low, an average of 240 IO/sec, yet every peak in requests produces a surge in latency.
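
To put these figures in perspective, Little's law (average outstanding I/Os = throughput x latency) gives a rough idea of how little concurrency the storage is actually seeing. The sketch below is only a back-of-the-envelope estimate: it uses the numbers quoted in this thread, and it applies the 7.5 ms write latency to the whole 240 IO/sec load, which is an assumption since that figure mixes reads and writes.

```python
def outstanding_ios(iops, latency_ms):
    """Little's law: average I/Os in flight = arrival rate (per sec) x latency (sec)."""
    return iops * (latency_ms / 1000.0)


if __name__ == "__main__":
    # Total load: 240 IO/sec, treated as if all of it saw the 7.5 ms write latency.
    total_load = outstanding_ios(240, 7.5)
    # The "very active" desktop at its peak: 80 writes/sec at 20 ms.
    busy_desktop = outstanding_ios(80, 20)
    print(f"total load: about {total_load:.1f} I/Os in flight")
    print(f"busy desktop: about {busy_desktop:.1f} I/Os in flight")
```

Both cases work out to fewer than two I/Os in flight on average, which supports the point that the latency is not caused by the devices being saturated.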

One last word: the minimum write latency never goes below 6 ms....

We do not have the data from the storage tab of Prism, because we can only get it by asking the customer to collect it for us, and he doesn't know how to use Prism. We can arrange to instruct him on how to get the data from Prism. We have seen the size of the deltas, and they are below 4 GB, so 4 GB x 45 users is 180 GB, which should comfortably stay on the SSD tier.
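
The sizing estimate above can be written out as a small calculation. This is only a sketch: the 4 GB delta size and 45 users come from this post, the factor of 2 reflects the "base and replica" writes mentioned earlier in the thread, and `ssd_tier_capacity_gb` is a hypothetical placeholder because the actual SSD tier capacity of the cluster is not stated here.

```python
def delta_footprint_gb(delta_gb_per_user, users, copies=1):
    """Total space taken by per-user deltas, optionally counting replica copies."""
    return delta_gb_per_user * users * copies


def fits_in_tier(footprint_gb, tier_capacity_gb):
    """True if the footprint is no larger than the tier capacity."""
    return footprint_gb <= tier_capacity_gb


if __name__ == "__main__":
    logical = delta_footprint_gb(4, 45)                  # 180 GB, as in the post
    with_replica = delta_footprint_gb(4, 45, copies=2)   # 360 GB with base + replica
    # Hypothetical capacity: replace with the real SSD tier size shown in Prism.
    ssd_tier_capacity_gb = 1000
    print(f"logical deltas: {logical} GB, with replica: {with_replica} GB")
    print("fits in SSD tier:", fits_in_tier(with_replica, ssd_tier_capacity_gb))
```

Counting the replica doubles the footprint, so the comparison against the real SSD tier size is best made with the larger figure.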

The main complaint from the customer is the login time, which is twice that of the current environment. Since each VM performs on average 6-8 IO/sec, the 6 ms average write latency should not be an issue after logon, but we have seen some peaks in requests, and these immediately increase the latency up to 20 ms.

Of course these results are creating a problem in the POC.
Userlevel 4
Badge +19
Hi 

Are you using the same image in the same AD OU? What latency is showing on the storage tab in Prism?