In part 1 of Deduplication & Compression Comparison – Nutanix ADSF vs VMware vSAN, we talked about data reduction/efficiency technologies which apply further storage capacity efficiencies to the usable capacity. In part 2, we’ll look deeper at how, when and where the data reduction takes place.
The following table shows the storage tiers on which each product supports its data reduction technologies:
vSAN data reduction technologies are NOT applied in the high performance “cache” tier whereas they are with Nutanix where in-line compression for writes is on by default and Deduplication is supported.
With vSAN, Deduplication and Compression are only applied once the data is cold and de-staged to the capacity tier. As I mentioned earlier, VMware has consistently said that they intentionally do not support data reduction on hybrid for performance reasons. That raises the question of why they do not apply these valuable technologies to their “all-flash” cache tier, especially considering the write cache is limited to just 800GB per disk group.
The Nutanix write buffer (oplog) has compression enabled by default under the covers and cannot be disabled via the PRISM GUI. In testing, it’s been shown to be so valuable with minimal overheads that it was somewhat “hardcoded” on.
Check out: What are the performance impacts & overheads of Inline Compression on Nutanix? for example.
Let’s say the Nutanix flash tier was the same size as vSAN’s maximum write cache, 800GB. With compression enabled, even assuming a conservative 1.5:1 efficiency ratio, the effective flash tier increases to 1.2TB, which means significantly more data is served from the fastest possible tier regardless of whether it’s SSD+HDD (Hybrid) or NVMe+SSD.
Now to address VMware’s claim around not enabling data efficiency on Hybrid platforms:
Let’s say Nutanix compression adds a massive 50% latency penalty (which it doesn’t as per the earlier reference, but hear me out), and the average combined read/write latency without compression is 2ms.
Add 50% to that and we get 3ms average latency but we now have 50% more data (assuming a conservative 1.5:1 compression ratio) in the faster tier (e.g.: NVMe or SSD) as opposed to the same 50% of data being serviced by SATA (in the case of Hybrid) where latency would be more like 10ms on average.
This simple example shows that data reduction on hybrid systems can be, and frequently is, a major performance ADVANTAGE!
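The reasoning above can be sketched as a small blended-latency calculation. This is only a rough model under stated assumptions: the 1200GB working set is a hypothetical value I picked for illustration, access is assumed to be spread evenly across the working set, and the function name is my own. The 800GB, 2ms, 3ms, 10ms and 1.5:1 figures come from the example in the text.

```python
def blended_latency_ms(flash_gb, working_set_gb, ratio, flash_ms, slow_ms):
    """Average latency when IO is spread evenly over the working set."""
    # Compression stretches the flash tier: 800GB at 1.5:1 holds ~1200GB of data.
    effective_flash_gb = flash_gb * ratio
    # Fraction of the working set that fits in flash (capped at 100%).
    flash_fraction = min(1.0, effective_flash_gb / working_set_gb)
    return flash_fraction * flash_ms + (1.0 - flash_fraction) * slow_ms


# Hypothetical 1200GB working set on a hybrid system.
# Without compression: 2ms flash latency, 10ms HDD latency.
without_compression = blended_latency_ms(800, 1200, 1.0, 2.0, 10.0)
# With compression: 1.5:1 ratio and the pessimistic 50% penalty (3ms flash latency).
with_compression = blended_latency_ms(800, 1200, 1.5, 3.0, 10.0)

print(f"{without_compression:.2f} ms vs {with_compression:.2f} ms")
```

In this sketch, even with the exaggerated 50% penalty, the compressed configuration comes out ahead because the whole working set now fits in flash. Real workloads are skewed rather than uniform, so treat the numbers as directional only.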
With that said, customers often choose hybrid for use cases which host lots of cold data, in which case it’s infrequently accessed and even if performance was impacted, a 1.5:1 or 2:1 efficiency is likely well worth a potential performance penalty.
At 2:1 that’s HALF the nodes required to store the same data! Nutanix Customers could enable compression and spend some of the saved money on additional nodes to increase performance and still end up with a NET saving and a great business outcome!
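As a back-of-the-envelope illustration of the node count claim, here is a minimal sketch. The 200TB dataset and 10TB usable per node are hypothetical values, and the helper ignores real-world sizing factors such as minimum cluster size and rebuild headroom.

```python
import math


def nodes_required(data_tb, usable_tb_per_node, ratio):
    """Nodes needed to hold data_tb at a given data reduction ratio."""
    return math.ceil(data_tb / (usable_tb_per_node * ratio))


# Hypothetical sizing inputs, for illustration only.
DATA_TB = 200
USABLE_TB_PER_NODE = 10

nodes_without_reduction = nodes_required(DATA_TB, USABLE_TB_PER_NODE, 1.0)
nodes_at_two_to_one = nodes_required(DATA_TB, USABLE_TB_PER_NODE, 2.0)

print(nodes_without_reduction, nodes_at_two_to_one)
```

Any nodes added back for performance come out of that difference, which is why the net result can still be a saving.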
For all-flash systems with mixed drive types (NVMe & SATA-SSD), even a 1.5:1 ratio means 50% more data is served by NVMe rather than SATA-SSD, which provides some latency and performance benefits, albeit less of an advantage than on Flash & HDD platforms.
That is a pretty solid outcome even if we assume some impact to latency.
Important note: The vSAN cache is typically the highest cost flash device (e.g.: NVMe or Enterprise Grade SSD) so the more data you can squeeze into that tier, the better your ROI (and your performance!)
But with vSAN, data reduction is only applied to COLD data, which again goes against VMware’s claim that they don’t support data reduction on hybrid for performance reasons as they do not use data reduction for the cache tier.
vSAN applies deduplication and then compression as it moves data from the cache tier to the capacity tier. Reference: https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.virtualsan.doc/GUID-3D2D80CC-444E-454E-9B8B-25C3F620EFED.html
Nutanix applies data reduction to all data, including in the oplog (persistent write buffer) and ensures the maximum amount of hot data is stored in flash regardless of the configuration (Hybrid or all flash).
What are the options of DISABLING data reduction features?
The below table shows the options customers have to reduce or remove data reduction configuration:
With vSAN, as mentioned earlier, Compression and Deduplication are an “all or nothing”. On one hand you might be getting good compression rates (say >1.5:1) but on the other hand you may experience a high impact in latency or low deduplication rates. With vSAN you need to compromise between performance and capacity efficiency due to lack of choices.
I’d say that makes VMware’s argument around overheads and efficiency of global metadata a moot point.
Side note: Before vSAN supported Compression and Deduplication, I was at a vChampions event in Sydney, Australia and I vividly recall the messaging was “It’s not required due to support for large capacity SATA drives which are low cost”. Of course we’re not stupid and we all know this was really just an excuse to buy VMware time before they could implement some form of data efficiency to compete with Nutanix and traditional storage arrays.
With Nutanix, these settings can be toggled on/off in real time and also in a granular manner to ensure the best balance between capacity efficiency and performance.
VMware also claim that global deduplication is costly on storage controller (CPU/RAM) resources which in fairness is not wrong per se, but it is misleading as all data efficiency always has a cost/benefit regardless of vendor. The key is what is our return from the “cost” and is that worth it.
Here is an article on the Cost vs Reward for the Nutanix Controller VM (CVM).
VMware rightly concede deduplication has overheads which need to be weighed up versus the benefits (which vary from customer to customer), yet their implementation is tied to being enabled with compression and, worse still, activated at the cluster level. I can’t imagine that this would have ever been a design goal; more likely it was the best implementation available given the constraints of vSAN’s underlying architecture.
If “global metadata” being “too costly” was a genuine reason for VMware not to do global dedupe, it begs the question why they went with per “Disk group” rather than a per node solution. No global metadata is required for per node dedupe and the concept of per node deduplication has some genuine advantages while suffering some, albeit not all of the downsides of the current vSAN implementation.
If limiting vSAN dedupe to a “failure domain” was the genuine reason, it might have some merit, but the vSAN architecture is already constrained by the disk group concept. It’s more realistic to say vSAN dedupe is simply limited to being per disk group, as opposed to that being a genuine architectural design goal, which in my opinion doesn’t make sense.
Combined with the fact that vSAN is not a truly distributed storage solution like Nutanix ADSF, the real reason is more likely that implementing a global metadata layer in vSAN just to support global deduplication would be way too resource intensive and would introduce significant layers of complexity into the product, which is not justified for just one feature.
As Nutanix was designed from the ground up with global metadata, implementing global deduplication was the obvious choice as it was a simple extension of the existing architecture.
With Nutanix if you feel the cost/benefit of deduplication is not worth it, you can still enjoy the benefits of Compression and Erasure Coding (EC-X) and deduplication can be disabled on the fly without any downsides as we’ve already learned.
Again, with vSAN you cannot turn off just deduplication, so the claim that deduplication (global or not) is too costly is a little silly when vSAN forces you to use two potentially costly technologies (compression and deduplication) together when in many cases you want one or the other, and not both! Note also that vSAN compression is only applied to data which reduces at a ratio of at least 2:1, so customers are paying the cost of compression and only getting a return on that resource investment if the dataset is compressible at 2:1 or better.
With that said, Deduplication is probably the most overrated feature in enterprise storage and rarely provides anywhere near the savings some vendors claim. Nutanix provides significant levels of space efficiency under the covers with technology like metadata clones (VAAI-NAS for ESXi and natively with AHV) and zero suppression in the write path and, as a result, the reported deduplication savings may appear lower than those of other vendors who misleadingly represent the numbers.
As such I typically recommend customers leave deduplication disabled as the savings from compression & Erasure Coding (EC-X) combined with metadata clones, the elimination of silos and zero suppression deliver excellent technical and business outcomes with the least possible resource usage.
When customers see large reported savings from deduplication, it is frequently due to misleading reporting such as reporting snapshots or metadata copies as deduplication as I discussed here: Deduplication ratios – What should be included in the reported ratio?
What is the impact of DISABLING data reduction features?
VMware often promote that their Storage Policy Based Management (SPBM) makes decisions around data efficiency and data protection (FTT vs RAID5/6) easy as you can just change the settings. While it’s true you can change the settings, doing so has significant impacts on cluster performance & resiliency which need to be carefully considered. This will lead many customers to only perform these changes during maintenance windows, and almost always out of business hours, due to the long duration and high impact.
The following table shows 4 major impacts when disabling Compression and Deduplication on vSAN, none of which are applicable to Nutanix:
The following quotes are from VMware’s documentation confirming the above.
vSAN Full Evacuation, Change of Disk format & Capacity unavailable (disk group removed):
While disabling deduplication and compression, vSAN changes the disk format on each disk group of the cluster. It evacuates data from the disk group, removes the disk group, and recreates it with a format that does not support deduplication and compression.
The time required for this operation depends on the number of hosts in the cluster and amount of data. You can monitor the progress on the Tasks and Events tab. Reference: https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-5A01D0C3-8E6B-44A7-9B0C-5539698774CC.html
vSAN Temporary Reduced Resiliency:
As a result, temporarily during the format change for deduplication and compression, your virtual machines might be at risk of experiencing data loss. vSAN restores full compliance and redundancy after the format conversion is completed. Reference: https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-125B2B04-FBB9-43AB-8AF9-E7179734BC7C.html
Nutanix, on the other hand, allows on the fly changes without all these downsides. The process is handled as a low priority task by Curator and processed over time, while the new setting is immediately applied to incoming IO.
Let’s discuss the resiliency considerations with data reduction technologies.
With vSAN, using data reduction technologies significantly increases the impact of a failure. The following resiliency scenarios highlight the advantage of Nutanix, which does not suffer from any of these constraints.
Key point: With Nutanix, Resiliency is NEVER compromised as a result of any data efficiency setting/change.
The below quote clarifies the failure scenarios for vSAN:
If a capacity disk fails, the entire disk group becomes unavailable. To resolve this issue, identify and replace the failing component immediately. When removing the failed disk group, use the No Data Migration option. Reference: https://docs.vmware.com/en/VMware-vSphere/6.7/com.vmware.vsphere.virtualsan.doc/GUID-AA72CA1D-803D-4D1D-87BB-E7D86EC947D2.html
Minimum compression ratios
vSAN only compresses data (following deduplication) if the 4KB block is compressed to <=2KB, meaning unless your data is compressible at a 2:1 ratio or higher, vSAN customers get NO compression savings.
Nutanix’s minimum compression ratio is currently 30%, meaning Nutanix customers will have a significant capacity advantage on datasets which achieve <2:1 compression but more than 1.3:1.
A fairly weak argument could be made that 30% is too low. My personal opinion is that 1.3-1.5:1 is probably the sweet spot, but a very strong argument can be made that vSAN’s 2:1 is robbing customers of significant data reduction savings.
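To make the effect of the two thresholds concrete, here is a minimal sketch. It is a simplification rather than either vendor’s actual write path: both thresholds are applied to the same 4KB block purely for comparison, the Nutanix threshold is modelled as the 1.3:1 figure used above, and the function and constant names are my own.

```python
def stored_bytes(block_bytes, compressed_bytes, min_ratio):
    """Bytes written: the compressed copy only if it meets the minimum ratio."""
    if block_bytes / compressed_bytes >= min_ratio:
        return compressed_bytes
    # Below the threshold the block is stored uncompressed.
    return block_bytes


BLOCK = 4096
VSAN_MIN_RATIO = 2.0      # 4KB must compress to <=2KB, per the text
NUTANIX_MIN_RATIO = 1.3   # assumption: the 1.3:1 figure quoted in the text

results = {}
for compressed in (3500, 2731, 2048):  # roughly 1.17:1, 1.5:1 and 2:1
    results[compressed] = (
        stored_bytes(BLOCK, compressed, VSAN_MIN_RATIO),
        stored_bytes(BLOCK, compressed, NUTANIX_MIN_RATIO),
    )
    print(compressed, results[compressed])
```

The middle case is the interesting one: a block that compresses at roughly 1.5:1 is stored in full under the 2:1 rule but compressed under the lower threshold.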
Data Efficiency Advantage example calculations.
Finally let’s do some quick math to see what the advantages of Nutanix’ combined higher usable capacity and more comprehensive data reduction technologies might look like for customers:
In my previous post Usable Capacity between Nutanix ADSF vs VMware vSAN I showed a 16 node cluster example where Nutanix had a 41.25% usable capacity advantage over vSAN with 71TB usable vs 41TB for an RF2/FTT1 configuration.
Let’s use those numbers for this example:
** Values updated due to previous miscalculation.
In the above table we see that, due to the large usable capacity advantage, even when data reduction is applied at the same ratio as vSAN, the effective capacity advantage grows significantly.
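The underlying arithmetic can be sketched as follows. The 71TB and 41TB usable figures are the ones quoted from my previous post; the 1.5:1 ratio is a hypothetical value chosen for illustration, so the output is not meant to reproduce the table’s numbers.

```python
def effective_tb(usable_tb, ratio):
    """Effective capacity after applying a data reduction ratio."""
    return usable_tb * ratio


NUTANIX_USABLE_TB = 71.0
VSAN_USABLE_TB = 41.0
RATIO = 1.5  # hypothetical, applied equally to both products

usable_gap_tb = NUTANIX_USABLE_TB - VSAN_USABLE_TB
effective_gap_tb = effective_tb(NUTANIX_USABLE_TB, RATIO) - effective_tb(VSAN_USABLE_TB, RATIO)

print(f"usable gap: {usable_gap_tb}TB, effective gap: {effective_gap_tb}TB")
```

Because the same ratio multiplies both sides, the absolute gap in TB scales by that ratio, which is why a usable capacity lead turns into a larger effective capacity lead.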
If we assume a conservative 20% advantage due to Nutanix’s superior data reduction architecture, which applies data reduction globally, to the persistent write buffer (equivalent to vSAN’s cache layer) AND to compression ratios <2:1, we see that the effective capacity advantage (not total) increases to 87.39TB, which is more than vSAN’s total usable capacity with data reduction.
In this example, Nutanix provides 171TB usable (RF2/FTT1) and vSAN only provides 83.83TB, around 50% less usable capacity than Nutanix.
Let’s now consider that if customers are educated on the resiliency issues with vSAN, they may choose to avoid using some/all of vSAN’s data reduction technologies to improve resiliency, in which case the advantage is even greater for Nutanix.
We’ve covered why marketing slides or sales pitches showing both products supporting the same features are very misleading.
In this post we’ve covered a wide range of factors regarding the implementation of Data Reduction technologies and how the underlying architectures have a major impact on the real world value of these features.
If we combine the 20-40%+ usable capacity advantage Nutanix has over vSAN WITHOUT data reduction applied, then any achieved ratio (even if it’s the same ratio as vSAN achieves) is going to increase the advantage further, especially as we’ve learned Nutanix applies data reduction to all tiers of storage.
Putting aside the obvious capacity advantages, Nutanix never compromises Resiliency when data reduction technologies are used while allowing these features to be enabled/disabled on a granular basis without reformatting or major back end overheads.
If the resiliency of the data is ever compromised as a direct result of using data reduction technologies, that’s not a minimally viable implementation.
When data reduction technologies cannot be applied to the fastest and most expensive tier of storage (e.g.: NVMe or Enterprise Grade SSD i.e.: vSAN’s cache), the customer is losing out on a typically very significant performance improvement and ROI (getting more out of that expensive storage).
When enabling data reduction means that ANY single drive failure in a vSAN disk group (1 cache and up to 7 capacity drives) causes the entire disk group to go offline and need to be rebuilt, the risk vs reward is just not worth it.
By Nutanix investing heavily from day 1 in creating a truly distributed storage fabric, it has allowed not only more usable capacity vs RAW compared to more rudimentary products like vSAN, but also allowed Nutanix to implement data efficiency technologies to further drive efficiencies and give customers flexibility.
In my next post, I cover Erasure Coding Comparison – Nutanix ADSF vs vSAN
This article was originally published at http://www.joshodgers.com/2020/02/03/deduplication-compression-comparison-nutanix-adsf-vs-vmware-vsan/
2020 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product, feature and service names mentioned herein are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. All other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s). This post may contain links to external websites that are not part of Nutanix.com. Nutanix does not control these sites and disclaims all responsibility for the content or accuracy of any external site. Our decision to link to an external site should not be considered an endorsement of any content on such a site.