I was wondering what the best practice is for defragmenting a guest OS on a Nutanix node. I am running ESXi on the nodes. I have looked all over, but I cannot find anything that definitively states the best practice.
Best answer by Sneha Patekar
Technically you can defrag, and do so at the VM level. https://www.vmware.com/support/ws55/doc/ws_disk_defrag.html
You really shouldn't defragment your virtual disks on Nutanix. It's bad for the SSDs, and given the way storage extents work on Nutanix, it's unnecessary: Nutanix is already indexing and organizing data for you in 64k chunks. Your vdisk already has extents spread across multiple drives, so defragmenting will only generate unnecessary I/O.
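To make that extra I/O concrete, here is a minimal Python sketch. It assumes "64k chunks" means fixed 64 KiB extents and uses a simple offset-to-extent mapping; both are simplifications for illustration, not Nutanix's actual layout. It counts how many extents a guest defragmenter reads and rewrites when it moves one run of a file:

```python
from typing import Tuple

# Illustrative model only: "64k" from the post is read as 64 KiB, and this
# offset-to-extent mapping is a simplification, not Nutanix's real layout.
EXTENT_SIZE = 64 * 1024
MiB = 1024 * 1024


def extents_touched(offset: int, length: int, extent_size: int = EXTENT_SIZE) -> range:
    """Indexes of the fixed-size extents covering bytes [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // extent_size
    last = (offset + length - 1) // extent_size
    return range(first, last + 1)


def defrag_move_cost(src_offset: int, dst_offset: int, length: int) -> Tuple[int, int]:
    """(extents read, extents rewritten) when a guest defragmenter moves one run."""
    return (len(extents_touched(src_offset, length)),
            len(extents_touched(dst_offset, length)))


if __name__ == "__main__":
    # Move a 1 MiB fragment from offset 10 MiB to just past 2 MiB (4 KiB misaligned).
    read, rewritten = defrag_move_cost(10 * MiB, 2 * MiB + 4096, 1 * MiB)
    print(f"extents read: {read}, extents rewritten: {rewritten}")
```

Under these assumptions, moving 1 MiB reads 16 extents and rewrites 17 (the destination is not 64 KiB aligned), and the file's content is no different afterwards.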
Of course, I always recommend reaching out to Support. They can provide you with best practices.
I did find that link, but it was for VMware Workstation 5.5, so I wasn’t sure it was relevant. Thanks for the info.
The defragmentation process is complicated and comes with a large number of caveats, so we do not allow customers to perform it on their own; it should only be done under Support guidance.
Storage features applied on a container on a Nutanix cluster may cause the vdiskblockmap metadata for the VM disk(s) to become heavily fragmented. This usually happens over time for VMs with deduplication enabled on their disks and/or VMs whose workloads perform small I/O overwrites and that are covered by regular DR snapshots. High vdiskblockmap fragmentation can reduce performance across a number of operations.
If deduplication is in use, high fragmentation is expected. In that case, do not proceed: defragmenting deduplicated vdisks removes the deduplication savings.
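A toy Python model can show why deduplication and small overwrites appear as block map fragmentation. The `BlockMap` structure and the `count_runs` metric below are hypothetical stand-ins, not the real vdiskblockmap format: each logical block points at a slot in some extent, and fragmentation is counted as the number of contiguous runs.

```python
from typing import List, Tuple

# Hypothetical stand-in for a vdisk block map: entry i is the (extent_id, slot)
# that holds logical block i. This is NOT the real vdiskblockmap schema.
BlockMap = List[Tuple[int, int]]


def count_runs(block_map: BlockMap) -> int:
    """Number of contiguous runs. A run continues while logical block i+1 sits
    in the same extent, in the slot right after logical block i."""
    runs = 0
    prev = None
    for extent_id, slot in block_map:
        if prev != (extent_id, slot - 1):
            runs += 1  # this block does not continue the previous run
        prev = (extent_id, slot)
    return runs


if __name__ == "__main__":
    sequential = [(1, i) for i in range(8)]   # written once, in order
    overwritten = list(sequential)
    overwritten[3] = (7, 0)                    # a small overwrite landed in a new extent
    deduped = [(42, 5), (17, 0), (42, 6), (3, 9),
               (17, 1), (88, 2), (42, 7), (3, 10)]  # blocks point into shared extents
    print(count_runs(sequential), count_runs(overwritten), count_runs(deduped))
```

In this model the deduplicated map is one run per block, because every block points into a shared extent somewhere else. Making it contiguous would mean copying that shared data into new private extents, which is exactly how defragmentation gives back the deduplication savings.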
To perform defragmentation, ideally:
Ensure that the cluster is running AOS >= 5.10.6 or >= 5.11.
If fragmentation is due to deduplication and it has been decided that deduplication is not the right solution, disable deduplication.
Pause the protection domain (PD) that protects the VM to be defragmented. (There are detailed steps to this process, which should be performed under Support guidance.)
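The checklist above could be captured as a precheck along the lines of this Python sketch. The function and parameter names are hypothetical, the inputs are values you would gather yourself, and nothing here calls a Nutanix API or replaces the Support-guided procedure; it only encodes the AOS version rule and the deduplication/PD conditions from this thread.

```python
from typing import List, Tuple


def parse_aos(version: str) -> Tuple[int, ...]:
    """'5.10.6' -> (5, 10, 6). Assumes a purely numeric, dot-separated version."""
    return tuple(int(part) for part in version.strip().split("."))


def aos_supports_defrag(version: str) -> bool:
    # Rule from the thread: AOS >= 5.10.6 or >= 5.11. Tuples compare element
    # by element, so (5, 11) > (5, 10, 6) and (5, 10) < (5, 10, 6); the second
    # clause is already implied but is kept to mirror the stated rule.
    v = parse_aos(version)
    return v >= (5, 10, 6) or v >= (5, 11)


def defrag_precheck(aos_version: str, dedup_enabled: bool,
                    keep_dedup: bool, pd_paused: bool) -> List[str]:
    """Return blocking issues; an empty list means the checklist is satisfied.
    All inputs are hypothetical values collected by hand."""
    issues = []
    if not aos_supports_defrag(aos_version):
        issues.append(f"AOS {aos_version} is older than 5.10.6 / 5.11")
    if dedup_enabled and keep_dedup:
        issues.append("deduplication is wanted: do not defragment, it would remove the savings")
    elif dedup_enabled:
        issues.append("disable deduplication first")
    if not pd_paused:
        issues.append("pause the protection domain (PD) under Support guidance")
    return issues


if __name__ == "__main__":
    print(defrag_precheck("5.10.5", dedup_enabled=True, keep_dedup=False, pd_paused=False))
```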
Jon from Engineering here. At the request of our friends in Support, I’m piling on this thread to drop in some thoughts.
1 - I agree with Sneha: defrag is (in general) only useful on physical systems with rotational media that they directly control, not in a virtualized SAN/HCI environment. I’d also wager heavily that most of the virtualization and SAN community agrees with this; here’s a handful of links (with VMware’s being the most definitive on the topic).
To be clear, we align with just about everyone else in the industry on this. For traditional Windows defragmentation, we (in general) do not recommend it simply because it will largely be a waste of time and of overall system resources. You spend a lot of time and move a lot of storage around for little to no gain whatsoever (and, in one case, you may cause yourself headaches; see below).
However, if you do want to defragment your guest file system, it shouldn’t hurt anything unless you are using snapshots (at the hypervisor level and/or at the Nutanix level). Because the defragmenter moves data around, it changes blocks, so you should expect snapshot bloat as well as temporarily increased replication times if you are using cluster-to-cluster replication. The same applies to backup technologies that rely on Changed Block Tracking (CBT), since they only look at whether blocks have changed or not.
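The snapshot and CBT effect can be illustrated with a small Python model of a changed-block tracker. The 64 KiB tracking granularity and the offsets are assumptions chosen for illustration; real CBT and snapshot granularities vary by product. The point is that a tracker records which blocks were written, not whether the file content actually changed:

```python
from typing import Set

# Assumption for illustration: changes are tracked per 64 KiB block. Real CBT
# and snapshot granularities differ by product and are not given in the thread.
TRACK_SIZE = 64 * 1024
MiB = 1024 * 1024


class ChangedBlockTracker:
    """Records which fixed-size blocks of a virtual disk were written since the
    last snapshot/backup; only the write pattern matters, not the content."""

    def __init__(self) -> None:
        self.changed: Set[int] = set()

    def write(self, offset: int, length: int) -> None:
        if length <= 0:
            return
        first = offset // TRACK_SIZE
        last = (offset + length - 1) // TRACK_SIZE
        self.changed.update(range(first, last + 1))

    def delta_bytes(self) -> int:
        """Bytes an incremental backup or replication cycle would have to ship."""
        return len(self.changed) * TRACK_SIZE


if __name__ == "__main__":
    cbt = ChangedBlockTracker()
    # A defrag pass relocates three 1 MiB fragments of an unchanged file into one
    # contiguous region at 100 MiB, then updates some file system metadata
    # (its location at offset 0 is arbitrary here).
    for i in range(3):
        cbt.write(100 * MiB + i * MiB, MiB)
    cbt.write(0, 4096)
    print(cbt.delta_bytes() // 1024, "KiB changed")
```

Here roughly 3 MiB of the disk is flagged as changed even though no file content changed, and that delta is what snapshots, replication, and CBT-based backups have to carry.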
Note: I said “in general” twice here for two reasons. First, there are “defragmentation tools”, like Raxco and Condusiv (and, to a small extent, Windows’ Optimize-Volume cmdlet), that do much more than traditional defragmentation-for-the-sake-of-making-data-contiguous, which is what Windows Defrag does. Second, “defragmentation” is sometimes used as an overarching term for “keeping the file system healthy”, which dovetails into #2 (below).
2 - Sometimes defragmentation is conflated with free space management, which I think comes from the days when it was common to literally use the phrase “defragment my free space”. These days, however, people are more likely to want to know how to optimize thin provisioning or simply “reclaim” free space. This refers to the case where the guest has allocated some space and then deleted data. In an ideal world, you would want your backend storage to understand that deletion and “reclaim” those allocated blocks to keep you as thin as possible.
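The gap between a guest delete and backend space usage can be sketched with a toy thin-provisioned disk in Python. The class, its 64 KiB chunk size, and the `unmap` method are illustrative assumptions, not a Nutanix or hypervisor API; the model only shows that allocation grows on write and shrinks only when the guest sends an explicit reclaim hint.

```python
from typing import Set


class ThinVdisk:
    """Toy thin-provisioned disk: backend chunks are allocated on first write and
    released only by an explicit reclaim hint (TRIM/UNMAP), never by a guest
    delete alone. The chunk size is an illustrative assumption."""

    def __init__(self, chunk_size: int = 64 * 1024) -> None:
        self.chunk_size = chunk_size
        self.allocated: Set[int] = set()

    def write(self, offset: int, length: int) -> None:
        if length <= 0:
            return
        # Every chunk the write touches becomes allocated on the backend.
        first = offset // self.chunk_size
        last = (offset + length - 1) // self.chunk_size
        self.allocated.update(range(first, last + 1))

    def guest_delete(self, offset: int, length: int) -> None:
        # The guest only updates its own file system metadata; the backend never
        # hears about it, so nothing is freed here.
        pass

    def unmap(self, offset: int, length: int) -> None:
        # Only chunks lying entirely inside the hinted range can be released.
        first = -(-offset // self.chunk_size)  # ceiling division
        end = (offset + length) // self.chunk_size
        self.allocated.difference_update(range(first, end))

    def used_bytes(self) -> int:
        return len(self.allocated) * self.chunk_size


if __name__ == "__main__":
    disk = ThinVdisk()
    disk.write(0, 1024 * 1024)
    disk.guest_delete(0, 1024 * 1024)
    print("after delete:", disk.used_bytes())  # still fully allocated
    disk.unmap(0, 1024 * 1024)
    print("after unmap:", disk.used_bytes())
```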
With Nutanix, if you’re on Hyper-V or AHV, you can natively take advantage of the TRIM/SCSI UNMAP feature sets built into many modern operating systems, where the operating system sends down a hint telling the storage to garbage collect/reclaim a deleted set of blocks. There are a variety of blog posts on this topic, which is not Nutanix specific whatsoever. If you’re running Windows Server 2012 R2 or later, check out the Optimize-Volume -ReTrim command to explicitly do this, but know that Windows will just “figure it out” most of the time: it can see that the virtual disks from Hyper-V and AHV support TRIM properly and will just do it on the fly.
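If you prefer to kick off the re-trim from a script, a minimal Python wrapper could shell out to PowerShell as below. `Optimize-Volume -DriveLetter <X> -ReTrim` is the cmdlet mentioned above; the helper names are hypothetical, the run needs an elevated session, and per this thread it applies to Hyper-V and AHV guests rather than ESXi over NFS.

```python
import platform
import subprocess
from typing import List


def retrim_command(drive_letter: str) -> List[str]:
    """Build the PowerShell invocation of Optimize-Volume -ReTrim for one volume."""
    if len(drive_letter) != 1 or not drive_letter.isalpha():
        raise ValueError(f"expected a single drive letter, got {drive_letter!r}")
    return [
        "powershell.exe", "-NoProfile", "-Command",
        f"Optimize-Volume -DriveLetter {drive_letter.upper()} -ReTrim -Verbose",
    ]


def retrim(drive_letter: str) -> int:
    """Run the re-trim (requires an elevated Windows session); return the exit code."""
    if platform.system() != "Windows":
        raise RuntimeError("Optimize-Volume is only available on Windows")
    return subprocess.run(retrim_command(drive_letter), check=False).returncode


if __name__ == "__main__":
    print(retrim_command("C"))
```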
If you’re on ESXi, we use NFS as the means to present the Nutanix storage container to the ESXi host. NFS itself doesn’t have a clean way to do something like TRIM or UNMAP, so instead the trick is to use something that can write zeros into the free space. In the “freebie” software world for Windows, that tool is SDelete. SDelete comes with a plus and a minus. The plus is that it can zero out free space, and when Nutanix sees zeros being written, we deallocate that storage and reclaim it as free space.
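The zero-detection behavior described above can be modeled in a few lines of Python. This is a toy model with an assumed 64 KiB chunk size and a whole-chunk check, not a description of how Nutanix actually detects zeros:

```python
from typing import Dict


class ZeroReclaimingStore:
    """Toy model: a chunk overwritten with nothing but zero bytes is deallocated
    instead of stored. Chunk size and the per-chunk check are illustrative
    assumptions, not Nutanix internals."""

    def __init__(self, chunk_size: int = 64 * 1024) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[int, bytes] = {}  # chunk index -> stored payload

    def write_chunk(self, index: int, data: bytes) -> None:
        if len(data) != self.chunk_size:
            raise ValueError("writes are modelled as whole chunks")
        if not any(data):  # every byte is 0
            self.chunks.pop(index, None)  # reclaim instead of storing zeros
        else:
            self.chunks[index] = data

    def allocated_bytes(self) -> int:
        return len(self.chunks) * self.chunk_size


if __name__ == "__main__":
    store = ZeroReclaimingStore()
    payload = b"\x01" * store.chunk_size
    store.write_chunk(0, payload)
    store.write_chunk(1, payload)
    store.write_chunk(1, bytes(store.chunk_size))  # zero out free space in chunk 1
    print(store.allocated_bytes())
```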
The big, big minus to SDelete is that it has to use a balloon file to write those zeros, so if you have open snapshots or backups running, etc., all of those technologies will see data being changed within the operating system. This may bloat snapshots or backups, but, assuming you have enough free space on the backend to process the SDelete run, it does work well.
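For illustration, the balloon technique that SDelete relies on for zeroing free space (its `-z` option) can be sketched in Python: write a temporary file of zeros into free space, force it to disk, then delete it. The function name, the 1 GiB default reserve, and the optional `max_bytes` cap are assumptions of this sketch, not SDelete behavior, and it is not a replacement for SDelete. Running it has the same side effects described above: snapshots and backups will see every zeroed block as changed.

```python
import os
import shutil
import tempfile
from typing import Optional


def zero_free_space(directory: str, reserve_bytes: int = 1 << 30,
                    max_bytes: Optional[int] = None, block_size: int = 1 << 20) -> int:
    """Fill free space on the file system holding `directory` with a balloon file
    of zeros, force it to disk, then delete it. Stops before free space drops
    below `reserve_bytes`, or after `max_bytes` if given. Returns bytes written."""
    zeros = bytes(block_size)  # one reusable block of zero bytes
    written = 0
    fd, balloon = tempfile.mkstemp(prefix="zero-balloon-", dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            while max_bytes is None or written < max_bytes:
                # Keep a safety margin so the guest volume never fills completely.
                if shutil.disk_usage(directory).free - block_size < reserve_bytes:
                    break
                n = block_size if max_bytes is None else min(block_size, max_bytes - written)
                f.write(zeros if n == block_size else bytes(n))
                written += n
            f.flush()
            os.fsync(f.fileno())  # make sure the zeros actually reach the virtual disk
    finally:
        os.remove(balloon)  # deflate the balloon; the space is free again in the guest
    return written
```

Point `directory` at a folder on the guest volume you want to zero; the reserve keeps the guest from running completely out of space while the balloon is inflated.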