Solved

Difference between fault host and block

  • 10 September 2015
  • 7 replies
  • 6257 views

Userlevel 1
Badge +10
Hello,

We have 5 Nutanix nodes based on Dell XC630 (single-node blocks) with one RF3 container. I'm aware that we can handle a double-node failure (so 2 blocks in our configuration), but the Prism views seem to disagree with me :)



When I look at the details of the Data Resiliency panel, it doesn't seem to account for the number of nodes per block we have:

Fault Domain Type: Host

Component                  Failures Tolerable    Message
Static Configuration       2
ZooKeeper                  2
Oplog                      2
Extent Groups              2
Metadata                   2
Erasure Code Strip Size    2
Free Space                 2

Fault Domain Type: Block

Component                  Failures Tolerable    Message
Static Configuration       1
ZooKeeper                  1
Oplog                      1
Extent Groups              1
Metadata                   1
Erasure Code Strip Size    1
Free Space                 1
That's why I'm asking for some explanation here. I agree with the Host fault domain figures, but since our servers are one node per block, shouldn't the Block fault domain look exactly the same and show 2 failures tolerable instead of 1?
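
To show the arithmetic I had in mind, here is a rough sketch in Python (just counting fault domains; this is obviously not how Prism actually computes resiliency status):

```python
# Rough illustration of the failure-tolerance arithmetic I expected,
# not how Prism/Curator actually computes resiliency status.

RF = 3  # RF3 container: 3 copies of every piece of data

# Our topology: 5 Dell XC630 blocks, each holding exactly one node.
blocks = {f"block-{i}": [f"HPV{i:02d}"] for i in range(1, 6)}
nodes = [n for members in blocks.values() for n in members]

def tolerable_failures(rf, domain_count):
    """With rf copies spread over domain_count fault domains we can lose
    rf - 1 domains and still keep one copy, as long as there are at
    least rf domains to place the copies on."""
    return rf - 1 if domain_count >= rf else max(domain_count - 1, 0)

print("Host fault domain: ", tolerable_failures(RF, len(nodes)))   # -> 2
print("Block fault domain:", tolerable_failures(RF, len(blocks)))  # -> 2, not 1
```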

Best answer by cbrown 11 September 2015, 18:35

In this case you're just hitting a display issue - we shouldn't show block awareness at all for a cluster made up entirely of 1-node blocks.

Since 1 node = 1 block in your system, our normal node replication will protect you from outages, rather than block awareness (which is just smart placement of resources across blocks). So you're still protected even if block awareness is wrong :)

I filed a bug to get that fixed as it doesn't make sense to show block awareness in 1-node/block clusters.

7 replies

Userlevel 7
Badge +35
Hi 

Thanks for the question, let me see what info I can get for you :)

Regards,
Userlevel 2
Badge +14
Out of curiosity, do all of your nodes have an identical disk configuration? Matching SSD configurations play a big part in the logic behind block awareness.
Userlevel 1
Badge +10
Our 5 nodes come from the same delivery. Each node has 2 SSDs and 4 HDDs on the same firmware.
Disk balancing across the cluster is fairly even, as you can see from the table below:

Host     Tier    Mode      Disk Usage
HPV01    SSD     Online    71%
HPV01    SSD     Online    70.50%
HPV01    HDD     Online    11.56%
HPV01    HDD     Online    10.45%
HPV01    HDD     Online    10.41%
HPV01    HDD     Online    10.49%
HPV02    SSD     Online    76.86%
HPV02    SSD     Online    76.51%
HPV02    HDD     Online    9.10%
HPV02    HDD     Online    8.37%
HPV02    HDD     Online    8.34%
HPV02    HDD     Online    8.37%
HPV03    SSD     Online    70.47%
HPV03    SSD     Online    70.59%
HPV03    HDD     Online    12.04%
HPV03    HDD     Online    10.79%
HPV03    HDD     Online    10.93%
HPV03    HDD     Online    10.79%
HPV04    SSD     Online    70%
HPV04    SSD     Online    70.43%
HPV04    HDD     Online    10.74%
HPV04    HDD     Online    9.78%
HPV04    HDD     Online    9.70%
HPV04    HDD     Online    9.78%
HPV05    SSD     Online    71.25%
HPV05    SSD     Online    71.03%
HPV05    HDD     Online    11.29%
HPV05    HDD     Online    10.13%
HPV05    HDD     Online    10.32%
HPV05    HDD     Online    10.20%
Userlevel 2
Badge +14
It's possible that you are hitting a bug in the block awareness logic where it's getting confused because 100% of your blocks are single node blocks. You might want to open up a support case as there shouldn't be any user intervention required to enable block awareness for an availability domain.
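
Purely to illustrate what I mean (hypothetical names, the real Prism logic obviously differs), the display would only need a guard along these lines to decide whether the Block panel adds any information:

```python
# Hypothetical sketch of a display guard; not actual Nutanix code.
def block_awareness_is_meaningful(blocks):
    """The Block fault domain only says something beyond the Host one
    when at least one block contains more than one node."""
    return any(len(members) > 1 for members in blocks.values())

single_node_cluster = {f"block-{i}": [f"HPV{i:02d}"] for i in range(1, 6)}
print(block_awareness_is_meaningful(single_node_cluster))  # False -> hide the Block panel
```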
Userlevel 3
Badge +16
In this case you're just hitting a display issue - we shouldn't show block awareness at all for a cluster made up entirely of 1-node blocks.

Since 1 node = 1 block in your system, our normal node replication will protect you from outages, rather than block awareness (which is just smart placement of resources across blocks). So you're still protected even if block awareness is wrong :)

I filed a bug to get that fixed as it doesn't make sense to show block awareness in 1-node/block clusters.
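
To make that concrete, here is a toy placement example (node names from this thread, placement policy invented for illustration): spreading RF3 copies across distinct nodes automatically spreads them across distinct blocks when each block holds a single node.

```python
# Toy replica-placement illustration, not Nutanix code.
RF = 3
node_to_block = {f"HPV{i:02d}": f"block-{i}" for i in range(1, 6)}
nodes = list(node_to_block)

# Naive round-robin placement of RF copies for a handful of extent groups.
for eg in range(5):
    replicas = [nodes[(eg + k) % len(nodes)] for k in range(RF)]
    blocks_used = {node_to_block[n] for n in replicas}
    print(f"extent group {eg}: nodes={replicas} -> {len(blocks_used)} distinct blocks")

# Every extent group ends up on 3 distinct nodes and therefore 3 distinct
# blocks, so losing any 2 blocks (= 2 nodes here) still leaves one copy.
```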
Userlevel 7
Badge +35
Thanks for adding to the conversation 

Let us know if the explanation helps, and thanks for contributing to the NEXT community!
Userlevel 1
Badge +10
Thanks, good to know this was just a display issue, and I'm glad to hear that you'll take care of it in a future update.
