Alert: Fingerprinting is disabled by Stargate

  • 14 July 2020

You may get an NCC alert stating that deduplication fingerprinting has been disabled even though it is enabled on a container:

WARN: Fingerprinting is disabled by stargate, but configured on Storage Container(s)

Stargate is the service that manages storage on a Nutanix cluster, and it automatically disables deduplication when either of the following conditions is met:

  1. If the metadata usage is higher than 250 GiB (268435456000 bytes) on a node.

  2. If the metadata usage is higher than 50% of the total metadata space on a node.

The second condition is common on small clusters, for example the NX-1000 series, whose single-SSD nodes have small SSDs. Metadata is stored on the SSDs, and the total metadata space on a node equals the capacity of:

  • One SSD in single-SSD nodes

  • Two SSDs in double-SSD nodes

  • Four SSDs in all-flash nodes

For example, on a node with a single 480 GB SSD, 50% of the metadata capacity is 240 GB (in practice less, because of formatting overhead and the space the CVM itself uses). That is lower than the 250 GiB limit of the first condition, so fingerprinting is disabled even though the metadata usage is still below 250 GiB.
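
As a rough illustration of how the two conditions interact, the sketch below computes the limit that applies to a node. The two constants are the values printed in the Curator log line quoted later in this post. The function names are made up for this example, and the logic is only my reading of the two conditions above (the lower limit wins), not the actual Curator code.

```python
# Values as printed in the Curator log line in this post.
ABSOLUTE_LIMIT_BYTES = 268435456000  # curator_metadata_usage_to_disable_fingerprints_bytes (250 GiB)
PERCENT_LIMIT = 50                   # curator_metadata_usage_pct_to_disable_fingerprints


def effective_limit_bytes(metadata_disk_size_bytes: int) -> int:
    """Hypothetical helper: return the lower of the two limits for a node."""
    pct_limit = metadata_disk_size_bytes * PERCENT_LIMIT // 100
    return min(ABSOLUTE_LIMIT_BYTES, pct_limit)


def fingerprinting_disabled(usage_bytes: int, metadata_disk_size_bytes: int) -> bool:
    """Hypothetical helper: True if usage exceeds whichever limit is hit first."""
    return usage_bytes > effective_limit_bytes(metadata_disk_size_bytes)


# Single 480 GB SSD (overhead ignored): the 50% rule is the stricter one.
small_node_limit = effective_limit_bytes(480 * 10**9)

# Node from the log example: the 250 GiB rule is the stricter one.
large_node_limit = effective_limit_bytes(7479818305536)

print(small_node_limit, large_node_limit)
```

On the small node the limit comes out at 240 GB, so fingerprinting can be disabled well before the 250 GiB value is reached.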

 

If you receive this alert, run the following command on any CVM:

allssh 'grep "metadata usage" ~/data/logs/curator*'

It shows the current metadata usage and the thresholds, including which node is above the threshold and the maximum metadata size for that node. Example:

/home/nutanix/data/logs/curator.WARNING:W1225 14:32:44.416410 18286 curator_cluster_state_ops.cc:669] Disabling fingerprints. Calculated metadata usage of 268485485162 exceeds safety limit value of 268435456000 on node with id 554945641with Metadata disk size=7479818305536; curator_metadata_usage_pct_to_disable_fingerprints=50%; curator_metadata_usage_to_disable_fingerprints_bytes=268435456000

In this example, the node with ID 554945641 has a current metadata usage of 268485485162 bytes, slightly above the threshold of 268435456000 bytes (250 GiB). The total metadata space on this node is 7479818305536 bytes (about 6.8 TiB).
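
If you have many nodes to check, the numbers can be pulled out of the log line with a small script. This is only a sketch: the regular expression is written against the single example line above (including the missing space after the node ID), the function name is made up, and other AOS versions may word the message differently.

```python
import re

# The example warning from this post (file name prefix omitted).
LOG_LINE = (
    "W1225 14:32:44.416410 18286 curator_cluster_state_ops.cc:669] Disabling fingerprints. "
    "Calculated metadata usage of 268485485162 exceeds safety limit value of 268435456000 "
    "on node with id 554945641with Metadata disk size=7479818305536; "
    "curator_metadata_usage_pct_to_disable_fingerprints=50%; "
    "curator_metadata_usage_to_disable_fingerprints_bytes=268435456000"
)

# \s* tolerates the missing space between the node ID and "with".
PATTERN = re.compile(
    r"metadata usage of (?P<usage>\d+) exceeds safety limit value of (?P<limit>\d+) "
    r"on node with id (?P<node_id>\d+)\s*with Metadata disk size=(?P<disk_size>\d+)"
)


def parse_fingerprint_warning(line):
    """Hypothetical helper: return the numeric fields as ints, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


info = parse_fingerprint_warning(LOG_LINE)
over_by = info["usage"] - info["limit"]
print(f"node {info['node_id']}: over the limit by {over_by} bytes")
print(f"limit: {info['limit'] / 1024**3:.0f} GiB, "
      f"metadata disk: {info['disk_size'] / 1024**4:.1f} TiB")
```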

Why can that be a problem?

Deduplication creates fingerprints for all data written to the cluster, and those fingerprints are part of the metadata. Every read request has to read metadata first. The less metadata there is, the more of it fits in memory, and serving metadata from memory is much faster than making an extra disk access, even to an SSD. Writes are affected as well, because the cluster has to write more metadata overall. Deduplication also occupies CVM memory and CPU that Stargate and the other CVM services could otherwise use, which generally hurts storage performance.

The best solution is to disable deduplication. Go to Storage - Table - Storage Containers and disable it (both cache and capacity dedupe) on every container where it is enabled. Deduplication only works well if you run VDI with persistent desktops on your cluster. Otherwise it hurts performance without saving much storage space.

After you disable deduplication, the fingerprints of previously deduplicated data remain until that data is overwritten. The alert may therefore not clear immediately, but only after a few hours or days, depending on the workload.

You can check the current storage space savings from deduplication by running this command:

curator_cli display_data_reduction_report | grep Dedup

If the space savings ratio is below 1.15 or even 1.2, there is no point in keeping deduplication on.
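
To put those ratios in perspective, the sketch below converts a savings ratio into the share of space actually saved. It assumes the ratio is the pre-reduction size divided by the post-reduction size. That is my assumption about how the report expresses it, so check it against your own output. The function name is made up for this example.

```python
def percent_saved(savings_ratio: float) -> float:
    """Share of pre-reduction space saved, assuming ratio = pre size / post size."""
    if savings_ratio <= 0:
        raise ValueError("savings ratio must be positive")
    return (1 - 1 / savings_ratio) * 100


for ratio in (1.15, 1.2, 2.0):
    print(f"{ratio}:1 -> {percent_saved(ratio):.1f}% saved")
```

Under that assumption, a ratio of 1.15 means roughly 13% of the space is saved and 1.2 means roughly 17%.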

 

If you don’t consider disabling deduplication an option and feel it should stay enabled, you have the following options:

  • Ignore the alert and do nothing. Existing data stays deduplicated, but new data will not be deduplicated.

  • Remove some data from the cluster to reduce the total amount of data (the metadata will shrink as well).

  • Add more nodes to the cluster to spread the metadata to the new nodes.

  • If you are hitting condition 2 with small SSDs, replace the SSDs with bigger ones.

The best solution is always to disable deduplication, because the space savings are most likely low while the negative impact on storage performance is real. The only case where deduplication works really well is VDI with persistent desktops. For other workloads we don’t recommend using it.


 

