The ESXi 5.5 U1 NFS datastore disconnect issue has occurred both internally and with our customers. Field advisory #17 was sent to our customers yesterday advising them to postpone any upgrades to ESXi 5.5. U1.
Looks like vmware just posted KB-2076392 on this issue.
Frequent NFS APDs after upgrading ESXi to 5.5 U1 (2076392)
When running ESXi 5.5 Update 1, the ESXi host frequently loses connectivity to NFS storage and APDs to NFS volumes are observed. You experience these symptoms:
Intermittent APDs for NFS datastores are reported, with consequent potential blue screen errors for Windows virtual machine guests and read-only filesystems in Linux virtual machines.
Note: NFS volumes include VSA datastores.
For the duration of the APD condition and after, the array still responds to ping and netcat tests are also successful, and there is no evidence to indicate a physical network or a NFS storage array issue.
The NFS storage array logs and traces also do not indicate any evident issue, other hosts not running ESXi 5.5 U1 continue to work and can read and write to the NFS share without issue.
You see entries in the vobd logs similar to:
Note: These log entries use the 12345678-abcdefg0 volume as an example:
2014-04-01T14:35:08.074Z: [APDCorrelator] 9413898746us: [vob.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state. 2014-04-01T14:35:08.075Z: [APDCorrelator] 9414268686us: [esx.problem.storage.apd.start] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down state. 2014-04-01T14:36:55.274Z: No correlator for vob.vmfs.nfs.server.disconnect 2014-04-01T14:36:55.274Z: [vmfsCorrelator] 9521467867us: [esx.problem.vmfs.nfs.server.disconnect] 192.168.1.1/NFS-DS1 12345678-abcdefg0-0000-000000000000 NFS-DS1 2014-04-01T14:37:28.081Z: [APDCorrelator] 9553899639us: [vob.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed. 2014-04-01T14:37:28.081Z: [APDCorrelator] 9554275221us: [esx.problem.storage.apd.timeout] Device or filesystem with identifier [12345678-abcdefg0] has entered the All Paths Down Timeout state after being in the All Paths Down state for 140 seconds. I/Os will now be fast failed.
This is a known issue affecting ESXi 5.5 Update 1 hosts with connected NFS storage. VMware is working towards providing a resolution to customers.
To work around this issue, VMware recommends using ESXi 5.5 GA.