We have an NX-3050, and we frequently have to rebuild Linux VMs because their ext4 filesystems corrupt and go read-only. Our research has pointed us to articles where the Linux kernel has issues with SSDs. Has anyone else experienced this, and if so, how did you solve it?
Edit: We created a container that bypassed the SSDs and we have not yet seen the issue there, but we would love to re-engage the SSDs on our servers. The Linux version/distro is Ubuntu 12.04.3 LTS.
One of the articles we found relating to this is: http://askubuntu.com/questions/262717/ubuntu-12-04-ssd-root-frequent-random-read-only-file-system
I hope this helps if any of you have experienced this issue.
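For anyone comparing notes, this is roughly how we confirmed it was the same failure mode. The device name below is an assumption; adjust it to your VM's root disk:

```shell
# Look for the ext4 journal aborts / I/O errors that precede the read-only flip:
dmesg | grep -iE 'ext4|i/o error|aborted journal'

# Check how ext4 is configured to react when it hits an error
# (the common "remount-ro" setting is what flips the filesystem read-only):
sudo tune2fs -l /dev/sda1 | grep -i 'errors behavior'
```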
It would be surprising if bypassing the SSDs in a container fixed this issue.
We only needed Ubuntu for a single application which now runs on CentOS 6. We have not seen this issue with CentOS under identical circumstances.
The issue confused us too, as I agree the guest OS should have no idea they are SSDs, but the errors we found in the logs were consistent with the bug referred to above. Windows has yet to have a problem, and we haven't tried CentOS yet, as one of the blogs we found described a similar issue (if not the same) on CentOS with SSDs. We only created the container that *should* ignore the SSDs a few weeks ago, but so far there haven't been any issues. If all else fails, we have a NAS presented to vSphere via NFS that we could put our Linux boxes on. With the new container and the updated kernel, things are looking good.
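Since the updated kernel was part of our fix: on Ubuntu 12.04 the easiest route is one of the hardware-enablement (HWE) kernel stacks. The package below is the raring stack as one example; pick whichever HWE stack matches your support window:

```shell
# Install a newer hardware-enablement kernel on Ubuntu 12.04
# (linux-generic-lts-raring is one of the available HWE stacks):
sudo apt-get update
sudo apt-get install -y linux-generic-lts-raring
sudo reboot

# After the reboot, confirm the running kernel version:
uname -r
```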
I am glad I'm not the only one, and I am very happy to hear about CentOS; we may have to switch. I am still fairly new to Linux (about 1 year of experience) and have been on Ubuntu and Debian from the start. Have you reported the issue to Nutanix at all? When I first talked to them about it, they hadn't heard of it.
Since the hypervisor presents a disk, it should be transparent to the VM, especially given how the distributed file system in a cluster works.
Is this KVM, vSphere, or Hyper-V?
Never seen this issue with CentOS and never with non-cloned drives.
If removing the SSD tier stops it (not that I've tried), then I'm suspecting some sort of latency issue. I wonder if copy-on-write on a cloned disk causes the SSD cache layer to trip up briefly, where running off spinning disk is slow enough for this not to happen. At that point it might be in the order of magnitude of kernel parameter tweaks, which might explain why CentOS doesn't exhibit the same behaviour.
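One concrete tweak along those lines is raising the guest's SCSI I/O timeout, so a brief stall in the caching tier isn't treated as a dead disk. The 180-second value mirrors what VMware's guest tools commonly set; the device name and rule path below are assumptions to adapt:

```shell
# Check the current per-device timeout (the default is usually 30 seconds).
# "sda" is an assumption -- substitute your root disk:
cat /sys/block/sda/device/timeout

# Raise it on the running system (root required):
echo 180 | sudo tee /sys/block/sda/device/timeout

# To make it persistent, a udev rule along these lines (path/match are assumptions):
# /etc/udev/rules.d/99-scsi-timeout.rules
#   ACTION=="add", SUBSYSTEMS=="scsi", ATTRS{model}=="Virtual disk*", \
#     RUN+="/bin/sh -c 'echo 180 > /sys$DEVPATH/timeout'"
```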
It would be worth contacting support while there is a paused VM, so that worst case support may be able to find the root cause.
This is on vSphere 5.1 and the Ubuntu was 12.04.3 LTS.
@kiboro Interesting you mention cloned drives, I haven't been cloning the drives of our CentOS boxes, but I did with Ubuntu, I wonder if that is a factor.
Removing the SSD layer did not work, we had some fail even in the "noSSD" container.
@swatkins We have never had I/O that Nutanix would consider high; I think the highest we have seen is about 2,000 IOPS (during a spike).
I did contact Nutanix and after reviewing everything they recommended opening a case with VMware, but we haven't gotten around to that since we switched to CentOS (it was time to update anyways).
I had first thought it was a side effect of converting the machines (pre-Nutanix, we were on Hyper-V 2).
Anyways, I will try to find the logs, but we consider this resolved since CentOS is working for us.