vSphere 5.5U1 and NFS disconnects

  • 18 April 2014

Hello,

It made the news yesterday, but in case you missed it, there seems to be a bug in vSphere 5.5U1 that affects all NFS users.

NetApp and VMware (at least, maybe Nutanix too?) are working on the issue.

See the post below for more information:
http://datacenterdude.com/vmware/nfs-disconnects-vmware-vsphere/

Might that also explain the delay in Nutanix's support for 5.5U1?
Did you guys catch this while working on 5.5U1 support?

Sylvain.

4 replies

Badge +5
The ESXi 5.5 U1 NFS datastore disconnect issue has occurred both internally and with our customers. Field advisory #17 was sent to our customers yesterday advising them to postpone any upgrades to ESXi 5.5 U1.
Badge +5
Looks like VMware just posted KB 2076392 on this issue.

Frequent NFS APDs after upgrading ESXi to 5.5 U1 (2076392)
Symptoms
When running ESXi 5.5 Update 1, the ESXi host frequently loses connectivity to NFS storage and APDs to NFS volumes are observed. You experience these symptoms:
  • Intermittent APDs for NFS datastores are reported, with consequent potential blue screen errors for Windows virtual machine guests and read-only filesystems in Linux virtual machines.
    Note: NFS volumes include VSA datastores.
  • For the duration of the APD condition and after, the array still responds to ping and netcat tests are also successful, and there is no evidence to indicate a physical network or a NFS storage array issue.
  • The NFS storage array logs and traces also do not indicate any evident issue, other hosts not running ESXi 5.5 U1 continue to work and can read and write to the NFS share without issue.
  • You see entries in the vobd logs similar to:
    Note: These log entries use the 12345678-abcdefg0 volume as an example:
    2014-04-01T14:35:08.074Z: [APDCorrelator] 9413898746us: [vob.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
    2014-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: [esx.problem.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state.
    2014-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect
    2014-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.1.1/NFS-DS1 12345678-abcdefg0-0000-000000000000 NFS-DS1
    2014-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: [vob.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
    2014-04-01T14:37:28.081Z: [APDCorrelator] 9554275221us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
Resolution
This is a known issue affecting ESXi 5.5 Update 1 hosts with connected NFS storage. VMware is working towards providing a resolution to customers.
To work around this issue, VMware recommends using ESXi 5.5 GA.
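
For anyone who wants to check a host for the symptom in the KB, here is a minimal Python sketch that scans vobd log lines for APD start and timeout events. The regular expression is modeled only on the sample entries quoted in the KB, so treat the line format as an assumption. The function name `find_apd_events` and the `SAMPLE` list are hypothetical, made up for illustration.

```python
import re

# Assumption: APD entries look like the sample lines quoted in the KB:
# "<timestamp>: [APDCorrelator] <n>us: [<event>] Device or filesystem
#  with identifier [<volume>] has entered ..."
APD_RE = re.compile(
    r"^(?P<ts>\S+): \[APDCorrelator\] \d+us: "
    r"\[(?P<event>[\w.]+)\] Device or filesystem with identifier "
    r"\[(?P<volume>[^\]]+)\]"
)


def find_apd_events(lines):
    """Return (timestamp, event, volume) tuples for APD entries in lines."""
    events = []
    for line in lines:
        match = APD_RE.match(line.strip())
        if match:
            events.append(
                (match.group("ts"), match.group("event"), match.group("volume"))
            )
    return events


# Sample lines copied from the KB excerpt (example volume 12345678-abcdefg0).
SAMPLE = [
    "2014-04-01T14:35:08.074Z: [APDCorrelator] 9413898746us: "
    "[vob.storage.apd.start] Device or filesystem with identifier "
    "[12345678-abcdefg0] has entered the All Paths Down state.",
    "2014-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect",
    "2014-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: "
    "[vob.storage.apd.timeout] Device or filesystem with identifier "
    "[12345678-abcdefg0] has entered the All Paths Down Timeout state after "
    "being in the All Paths Down state for 140 seconds. I/Os will now be "
    "fast failed.",
]

if __name__ == "__main__":
    for ts, event, volume in find_apd_events(SAMPLE):
        print(ts, event, volume)
```

To use it on a real host you would pass in the lines of the vobd log file instead of `SAMPLE`. The log's location is not given in this thread, so check it for your build.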
Userlevel 4
Badge +21
Thanks, very clear.

Too bad, we were going to test 5.5U1 next week and start a large-scale rollout next month. Guess we'll have to wait...

Sylvain.
Badge +5
I'm sure you know already, but in case anyone sees this thread and isn't aware, VMware released a patch for 5.5U1 this week. I believe Nutanix is testing it internally. We await the results eagerly 🙂
