Question about Replication Factor

Question

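I have a question about a 4-node AHV cluster that uses SSDs (NVMe) and Redundancy Factor 2. The virtual machines run on a container with Replication Factor 2. Suppose two SSDs on different nodes fail "at the same time". If a virtual machine happens to have data placed on those failed SSDs, is that virtual machine's data lost? I apologize for asking about something as astronomically unlikely as two SSDs failing simultaneously, even as a hypothetical. If there is anything reassuring to say beyond "the data is protected by Replication Factor 2", please let me know. I could not answer a customer's question well, so I am asking here.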

takhagi · Accepted Answer

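@kei-54835, thanks for the question. As for whether data is actually lost: as you understand, the data blocks that were placed on the failed SSDs on the different nodes are most likely lost. (As you also note, the real question is how likely such an event is. This is strictly about what happens if it does occur, and my understanding is that it is not something that happens often.)

With Nutanix, data is distributed across the nodes in the cluster per data block rather than per virtual disk, so a given virtual machine does not lose all of its data at once. When I tested this behavior previously, causing simultaneous SSD failures on two different nodes left some VMs unable to continue I/O, but after those VMs were rebooted the OS came up as if nothing had happened. Presumably the OS data was not on the failed SSDs.

So the data lost through SSD failure is lost on a per-block basis, and I have confirmed that a virtual machine can keep running as long as it does not read those blocks. From the OS's point of view, it looks as if bad sectors have appeared and part of the disk has become unreadable. If the layers above the OS can cope with that, continued operation itself should be possible.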
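To make the "loss is per block, not per VM" point concrete, here is a minimal Python sketch of a toy placement model. It is not how Nutanix actually places data: the number of disks per node, the uniform random placement, and the names `place_blocks` and `lost_blocks` are all assumptions made for illustration. Each block gets two copies on disks in two different nodes, and a block counts as lost only when both copies sit on failed disks.

```python
import random

NUM_NODES = 4        # from the question: 4-node cluster
DISKS_PER_NODE = 4   # assumption for illustration only

def place_blocks(num_blocks, rng):
    """Place two copies of each block on disks in two different nodes.

    A disk is identified as (node, slot). Placement is uniformly random,
    which is a simplification of any real placement policy.
    """
    placements = []
    for _ in range(num_blocks):
        node_a, node_b = rng.sample(range(NUM_NODES), 2)
        placements.append(((node_a, rng.randrange(DISKS_PER_NODE)),
                           (node_b, rng.randrange(DISKS_PER_NODE))))
    return placements

def lost_blocks(placements, failed_disks):
    """Return indices of blocks whose copies are all on failed disks."""
    failed = set(failed_disks)
    return [i for i, copies in enumerate(placements)
            if all(disk in failed for disk in copies)]

rng = random.Random(0)
vm_blocks = place_blocks(10_000, rng)

# Two SSDs on different nodes fail at the same time.
lost = lost_blocks(vm_blocks, [(0, 1), (2, 3)])
print(f"{len(lost)} of {len(vm_blocks)} blocks lost "
      f"({len(lost) / len(vm_blocks):.2%})")
```

Under these assumptions a block is lost with probability 1/96, so only a small fraction of the VM's blocks is affected, and a single disk failure loses nothing because the two copies never share a node.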

JeroenTielen · Answer

Let me try to explain this a bit further. When using RF2 (Replication Factor 2) for data, all extents (pieces of data 1 MB in size) are written to the local drives of the host where the virtual machine is running (Data Locality), and a copy of each extent is written to one of the other disks in the cluster (which makes it RF2). When a disk goes down, the extents on that disk are recreated on the remaining disks from the other copy in the cluster (healing). When that is done, RF2 is restored. If a second disk goes down during this healing process, you will have data loss. Once the healing is done, another disk can go down again. (I wrote a post about this: https://next.nutanix.com/community-blog-154/honey-i-shrunk-my-cluster-multiple-nodes-down-in-rf2-41359)
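The healing sequence described above can be sketched with a toy model in Python. This is only an illustration under assumptions: the disk count per node, the random placement, and the helper names `place`, `heal`, and `count_lost` are hypothetical and do not correspond to any Nutanix API. It compares a second disk failure before healing has run with the same failure after healing has completed.

```python
import random

NUM_NODES, DISKS_PER_NODE = 4, 4   # disks per node is an assumption
ALL_DISKS = [(n, s) for n in range(NUM_NODES) for s in range(DISKS_PER_NODE)]

def place(num_extents, rng):
    """Two copies per extent, on disks in two different nodes."""
    out = []
    for _ in range(num_extents):
        node_a, node_b = rng.sample(range(NUM_NODES), 2)
        out.append(((node_a, rng.randrange(DISKS_PER_NODE)),
                    (node_b, rng.randrange(DISKS_PER_NODE))))
    return out

def heal(placements, failed_disk, rng):
    """Re-create every copy that lived on failed_disk on a surviving disk.

    The new copy goes to a node that does not already hold the surviving
    copy, so each extent again has two copies on two different nodes.
    """
    healed = []
    for copies in placements:
        survivors = [d for d in copies if d != failed_disk]
        if len(survivors) < len(copies):
            used_nodes = {d[0] for d in survivors}
            candidates = [d for d in ALL_DISKS
                          if d != failed_disk and d[0] not in used_nodes]
            survivors.append(rng.choice(candidates))
        healed.append(tuple(survivors))
    return healed

def count_lost(placements, failed_disks):
    """Number of extents whose copies are all on failed disks."""
    failed = set(failed_disks)
    return sum(all(d in failed for d in copies) for copies in placements)

rng = random.Random(1)
placements = place(10_000, rng)
first, second = (0, 0), (1, 0)

# Second disk fails before healing has re-created any copies.
lost_during_healing = count_lost(placements, [first, second])

# Healing completes first, then the second disk fails.
healed = heal(placements, first, rng)
lost_after_healing = count_lost(healed, [first, second])

print("lost if second disk fails before healing:", lost_during_healing)
print("lost if second disk fails after healing: ", lost_after_healing)
```

In this model the second failure only causes loss for extents that had not yet been re-replicated. Once healing has finished, every extent has two copies on healthy disks again, so a further single disk failure loses nothing.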

But if you want to survive a second drive failure during the healing process, you must create a Redundancy Factor 3 cluster with Replication Factor 3 storage containers. This requires a minimum of 5 nodes in the cluster.

In production environments I do not see many RF3 clusters. I only see this kind of setup with large clusters (let's say 16 nodes or more). To give you some more context: I have deployed hundreds of clusters, from small to large, and I have never had to deal with 2 disks down at the same time in an RF2 cluster.
