Compression Deep Dive!

  • 18 September 2020


Compression is one of the key features the Nutanix Capacity Optimization Engine (COE) uses to perform data optimization. The Distributed Storage Fabric (DSF) provides both inline and offline flavors of compression to best suit the cluster’s needs and type of data. As of 5.1, offline compression is enabled by default.

Inline compression compresses sequential streams of data or large I/O sizes (>64K) when they are written to the Extent Store (SSD + HDD). This includes data draining from the OpLog as well as sequential data that bypasses it. Inline compression has no impact on random I/O, helps increase storage tier utilization, and benefits large or sequential I/O performance by reducing the amount of data to replicate and read from disk.

Offline compression initially writes the data as normal (in an uncompressed state) and then leverages the Curator framework to compress the data cluster-wide. When inline compression is enabled but the I/Os are random in nature, the data is written uncompressed to the OpLog, coalesced, and then compressed in memory before being written to the Extent Store. Because inline compression compresses only large or sequential writes inline and handles random or small I/Os post-process, it should generally be used instead of offline compression.
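The write-path choices described above can be summarized in a small sketch. This is only an illustrative model, not Nutanix code: the `Mode` enum, the `compression_stage` function and the returned labels are all hypothetical names, and the only number used is the 64K threshold quoted in the text.

```python
from enum import Enum

LARGE_IO_BYTES = 64 * 1024  # the ">64K" threshold quoted in the post


class Mode(Enum):
    INLINE = "inline"
    OFFLINE = "offline"


def compression_stage(mode: Mode, io_bytes: int, sequential: bool) -> str:
    """Return a (hypothetical) label for where a write gets compressed."""
    if mode is Mode.OFFLINE:
        # Written uncompressed first, compressed later by Curator.
        return "curator-post-process"
    if sequential or io_bytes > LARGE_IO_BYTES:
        # Large or sequential writes are compressed on their way to the Extent Store.
        return "inline-at-extent-store-write"
    # Random or small writes land in the OpLog, are coalesced, then compressed in memory.
    return "after-oplog-coalesce"


if __name__ == "__main__":
    print(compression_stage(Mode.INLINE, 128 * 1024, sequential=False))
    print(compression_stage(Mode.INLINE, 4 * 1024, sequential=False))
    print(compression_stage(Mode.OFFLINE, 128 * 1024, sequential=True))
```

The point of the model is that inline mode never penalizes small random writes: they take the same OpLog path either way and are only compressed after coalescing.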

OpLog Compression

As of 5.0, the OpLog compresses all incoming writes >4K that show good compression potential. This allows for more efficient utilization of the OpLog capacity and helps drive sustained performance. When data is drained from the OpLog to the Extent Store, it is decompressed, aligned, and then re-compressed at a 32K aligned unit size (as of 5.1).

This feature is on by default and no user configuration is necessary.
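To make the OpLog flow more concrete, here is a toy sketch of the two steps: compress on ingest when the write is >4K and compresses well, then decompress and re-compress in 32K units on drain. Several things here are assumptions for illustration only: `zlib` stands in for whatever codec the platform actually uses, the `GOOD_RATIO` cutoff for "good compression potential" is invented, alignment is simplified to cutting the joined data at 32K boundaries, and the function names are hypothetical.

```python
import zlib

OPLOG_MIN_BYTES = 4 * 1024    # ">4K" from the post
DRAIN_UNIT_BYTES = 32 * 1024  # 32K unit size from the post
GOOD_RATIO = 0.8              # hypothetical cutoff, not a documented value


def oplog_store(write: bytes) -> tuple[bool, bytes]:
    """Return (is_compressed, payload) as the write would sit in this toy OpLog."""
    if len(write) > OPLOG_MIN_BYTES:
        packed = zlib.compress(write)
        # Keep the compressed form only if it saves enough space.
        if len(packed) <= GOOD_RATIO * len(write):
            return True, packed
    return False, write


def drain(entries: list[tuple[bool, bytes]]) -> list[bytes]:
    """Decompress entries, join them, and re-compress in 32K units."""
    raw = b"".join(zlib.decompress(p) if c else p for c, p in entries)
    return [
        zlib.compress(raw[off:off + DRAIN_UNIT_BYTES])
        for off in range(0, len(raw), DRAIN_UNIT_BYTES)
    ]


if __name__ == "__main__":
    entries = [oplog_store(b"a" * 8192), oplog_store(b"b" * 1024)]
    units = drain(entries)
    print([flag for flag, _ in entries], len(units))
```

Small writes (4K or less) skip compression entirely in this model, which mirrors the statement that only writes >4K are candidates.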

Here’s a small tip:
Almost always use inline compression (compression delay = 0), as it compresses only larger or sequential writes and does not impact random write performance. This also increases the usable size of the SSD tier, which raises effective performance and allows more data to sit in the SSD tier. In addition, for larger or sequential data that is written and compressed inline, RF replication ships the compressed data, further improving performance because less data is sent across the wire.
For more information on compression, click here, and to learn how to enable it, check this out.
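As a rough back-of-the-envelope illustration of the replication point in the tip, the arithmetic can be sketched as follows. This assumes one copy is written locally and the remaining RF - 1 copies are shipped over the network; the function name and the 0.5 compressed fraction are made up for the example, and real savings depend entirely on the data.

```python
MIB = 1024 * 1024


def replica_bytes_on_wire(write_bytes: int, rf: int, compressed_fraction: float = 1.0) -> int:
    """Bytes shipped to remote replicas, assuming one local copy and rf - 1 remote ones."""
    remote_copies = rf - 1
    return int(remote_copies * write_bytes * compressed_fraction)


if __name__ == "__main__":
    # 1 MiB sequential write with RF2, uncompressed vs. a hypothetical 2:1 compression.
    print(replica_bytes_on_wire(MIB, rf=2))                           # 1048576
    print(replica_bytes_on_wire(MIB, rf=2, compressed_fraction=0.5))  # 524288
```

Under these assumptions, the compressed write puts half as many bytes on the wire, and the saving scales with the number of remote copies.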

