Solved

Storage Node Architecture

  • 27 January 2021
  • 8 replies
  • 316 views

Badge +2

Can a Nutanix storage node hold more secondary VM copies than would fit in the remaining capacity on the other cluster nodes' disks? If so, how does one bring the storage node down for maintenance?


Best answer by DavidN 28 January 2021, 20:57

I believe (and you may want to double-check this with Support or a Sales Engineer) that in the situation you describe the VMs would continue to run, with no interruption to executing, reading, or writing to disk. However, the cluster’s resiliency status would stay Critical, because the pool doesn’t have enough capacity to write second copies, until the storage-heavy node is back up. In that state the cluster couldn’t tolerate losing any additional disks, nodes, or blocks.

 

This is one of the reasons why storage-heavy nodes need to be added to a cluster in pairs, so that storage imbalances during HA events don’t end up causing “sufficient_disk_space_check” warnings.

Note that you may need more storage-heavy nodes if you’re using RF=3.

https://portal.nutanix.com/page/documents/kbs/details?targetId=kA0600000008gVyCAI
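To make the capacity reasoning above concrete, here is a minimal sketch of the arithmetic, not how Nutanix itself tracks the threshold. It assumes a deliberately simplified model: every TiB stored on the down node must be rewritten somewhere on the remaining nodes, and nothing is reserved for metadata or overhead. The `Node` class, the `can_reprotect` helper, and all capacity figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node description; values are in TiB."""
    name: str
    capacity_tib: float
    used_tib: float

    @property
    def free_tib(self) -> float:
        return self.capacity_tib - self.used_tib


def can_reprotect(nodes: list[Node], down_name: str) -> bool:
    """Return True if the remaining nodes have enough free space to
    rewrite the data held on the node that is down.

    Simplified assumption: all data on the down node must be
    re-replicated, and free space elsewhere is fully usable.
    """
    down = next(n for n in nodes if n.name == down_name)
    free_elsewhere = sum(n.free_tib for n in nodes if n.name != down_name)
    return free_elsewhere >= down.used_tib


# Illustrative cluster: three regular nodes plus one storage-heavy node.
cluster = [
    Node("compute-1", capacity_tib=10, used_tib=7),
    Node("compute-2", capacity_tib=10, used_tib=7),
    Node("compute-3", capacity_tib=10, used_tib=7),
    Node("storage-heavy-1", capacity_tib=40, used_tib=21),
]

# Only 9 TiB is free on the regular nodes, but 21 TiB sits on the big node.
print(can_reprotect(cluster, "storage-heavy-1"))
# Losing a regular node is fine: 25 TiB is free elsewhere for its 7 TiB.
print(can_reprotect(cluster, "compute-1"))
```

In this made-up cluster a single storage-heavy node cannot be re-protected while it is down, which matches the "VMs keep running, resiliency stays Critical" behaviour described above. Real sizing should come from Nutanix Sizer or Support rather than a model like this.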

 


Userlevel 6
Badge +5

Hi @theGman 

From the Prism Web Console Guide: Storage-only Node Configuration

Ensure that your cluster has sufficient capacity if a storage-only node fails.

Your cluster must have sufficient capacity to handle the rolling-upgrade operations during which nodes are restarted.

The minimum number of storage-only nodes in your cluster must be equal to the replication factor (RF) of your cluster. For example, if the RF of your cluster is two, add at least two storage-only nodes. If the RF of your cluster is three, add at least three storage-only nodes.

Assuming that you meant storage-only and not storage-heavy nodes, a storage-only node will not store multiple copies of the same data, as this does not help resilience. Hence the recommendation to have the number of storage-only nodes equal to the RF. That way, each storage-only node will have a copy of the data.

The maintenance process for a storage-only node does not differ from that of a regular node, except that there are no VMs to migrate off the node.
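The storage-only guideline quoted above (at least as many storage-only nodes as the cluster's RF) reduces to a one-line check. This is only a sketch of that rule of thumb; the function name is hypothetical and it says nothing about whether the nodes have enough capacity.

```python
def meets_storage_only_guideline(storage_only_nodes: int,
                                 replication_factor: int) -> bool:
    """Return True if the count of storage-only nodes is at least the RF,
    as the quoted Prism guidance recommends."""
    if replication_factor not in (2, 3):
        raise ValueError("this sketch only considers RF=2 and RF=3")
    return storage_only_nodes >= replication_factor


# One storage-only node in an RF=2 cluster falls short of the guideline.
print(meets_storage_only_guideline(1, 2))
# Three storage-only nodes satisfy the guideline for RF=3.
print(meets_storage_only_guideline(3, 3))
```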

Badge +2

Hi @Alona 

I do actually mean storage heavy, not storage only, but now that you bring this option up…. 

With Nutanix, VMs are written twice within the cluster for HA, so correct me if I’m wrong, but isn’t the purpose of a storage node to house more of the secondary copies? What I’m looking for is an understanding of how to bring down a storage-heavy node, or a storage-only node, for maintenance when there isn’t enough local disk capacity in the rest of the cluster to house the secondary copies stored on the storage node.

For storage-only nodes you state that you wouldn’t want just a single instance, which makes sense to a degree, especially if you have some VMs with very large disk requirements that would eat up all the storage on a single node; after all, the cluster requires two copies of these VMs. What I’m looking for is the “how does this work”.

I want to understand whether all the VMs continue to have HA capability within the cluster during a maintenance event. I would think that for this to happen, data would need to be moved around when bringing down the storage node. My question is: what happens if there isn’t enough capacity across the remaining local disks to write this data, and how is this threshold tracked? It seems to me that this scenario might require downtime for some VMs.

Badge +2

Thanks David - this makes sense, why bring it down completely when it can still run in critical mode… you just have to get the work done in a timely manner just in case.  :)  

 

So a good architecture that leverages storage-heavy nodes would include a JBOD that allows more than one server to connect and divvies up the disks appropriately between the two storage-heavy nodes.

Userlevel 2
Badge +4

I would highly recommend looking at/using the “Nutanix Sizer” to help you right-size workloads like this… It’ll help account for some nuances like this.


Given the amount of re-replication, you may want to separate CVM/storage traffic from guest VM traffic and, depending on the amount of data in question, use 10Gb+ NICs in the nodes to help Curator with resiliency scans.

 

For some background check out: 

https://nutanixbible.com/  section “Awareness Conditions and Tolerance”

 

https://portal.nutanix.com/page/documents/kbs/details?targetId=kA0600000008i1dCAA

 

https://next.nutanix.com/how-it-works-22/curator-scans-types-and-frequency-or-where-is-our-free-space-33516

Badge +2

Hi David - thanks for your replies. As for separating storage from VM traffic, that means two more 10Gb ports per node (now you need four upstream ports for each host), and if you do the math on the cost of switching and datacenter hardware required for scaling HCI, you soon see that this model falls apart from a cost perspective. You might avoid an array in this stack, but the lack of network consolidation, along with the need to license each node, catches up fast as you expand.

Userlevel 2
Badge +4

Oh, I’m well aware of various vendors’ “per 10Gb port licenses”. You could employ basic VLAN separation of CVM/ESXi traffic from guest VM traffic and use other methods to “tier” your guest VM traffic. I doubt many guest OSes can saturate a 10Gb link for long, so sharing storage and guest VM traffic on host uplinks is a small price to pay. (This of course depends entirely on the types of loads you have; you know those best.)
 

As far as scaling goes, I don’t know of any 3-tier system that scales as well as HCI (for each node you add, you can choose to scale the cluster’s IO/storage or its compute, depending on the type of node).

I’ve been there (3-tier systems) and done the math (fan-in/fan-out ratios, zoning oopsies, bully VMs and manually “managing SSD cache”), and I’m not really wanting to go back.
Don’t take my word for it. I recommend some reading by Josh Odgers (http://www.joshodgers.com/).

 

I would also recommend you get in touch with a Nutanix Sales Engineer and work out your needs and concerns. Details are what it’s all about, and given how many systems have reference documents and 100% supportability on Nutanix, I’m sure there is a solution that’s right for your budget!

Badge +2

Thanks David - I appreciate your responses. In closing, HCI is great when rapid expansion is called for: you get to expand compute and storage simultaneously without worrying about a storage front end. But in my opinion this is hardly the greatest benefit HCI brings to the table. I would argue it’s the pre-canned automation and the consolidation into a single stack, with the operational efficiencies that make it easier to manage.

I will point out a few things from our own infrastructure, which is a hybrid mix of HCI and traditional hosts with a centralized array:

1. Many organizations don’t need to expand and are in fact reducing their footprint. HCI doesn’t provide the flexibility to reduce a footprint without a bit of a challenge. If you have moved a large number of workloads to the cloud and no longer require the compute power, you can’t just pull a couple of nodes without addressing the reduction in storage that would accompany the compute reduction.

2. We run over 50 hypervisor nodes with about 35 VMs each against a single array and see an average latency of 0.5 ms with very little spiking above 10 ms. Our Nutanix deployments average 2 ms latency with a substantial amount of spiking that far exceeds what our centralized array exhibits, and these hosts don’t run half the number of VMs per host that our non-HCI clusters do.

I certainly understand the value of what HCI brings to the table, and it is a perfect fit for some organizations but not so much for others.