Solved

Network and storage segregation

  • 27 October 2015
  • 5 replies
  • 4589 views

Badge +10
Hi all,

My current client has me working on a design for an XaaS platform with a strong security focus (NSX is on the menu). Within our team, I have been designated the Nutanix person.

My mission is to determine the best way to isolate storage for different workloads:
  • we'll have an integration zone, a development zone and a production zone, delivered as PaaS,
  • some nodes will run heavy VDI (K2 graphics cards will handle this),
  • then we'll have an IaaS with vCloud SP for the sandbox zone.
So we can imagine a multi-cluster setup with 3 clusters dedicated to 3 zones: a 3D VDI zone, a sandbox IaaS zone and a PaaS zone. That implies at least 9 nodes, which is a bit too much for a starter platform.

My question is: from a Nutanix / storage point of view, can I build a single 6-node cluster with 3 storage pools across the 6 nodes?
Beyond that, what would it imply in terms of storage performance, data protection and data segregation (the security team will challenge me heavily on this part)?
Last but not least, can I achieve this at the container level?

best regards,

Thomas CHARLES

Best answer by cbrown 2 November 2015, 18:51



5 replies

Userlevel 7
Badge +35
Thanks for the question  - I will pull in a resource to help
Userlevel 3
Badge +16
Hey Thomas,

You can accomplish this using 6 nodes and storage pools. You'd want to do the following:

1) Create 3 storage pools, each with all the disks from 2 nodes. If possible, don't use 2 nodes on the same block
2) Create 3 containers based on these three storage pools
3) Create 3 vSphere clusters, each with the 2 nodes that share the storage pool
4) Use vSphere datastore permissions to prevent access to the other containers
5) Use container level whitelists if you need to access the datastore externally

The storage pool will provide physical separation for the data, so those 2 nodes are the only nodes that will hold the actual data. The metadata will be spread out across the cluster as normal. Because metadata (and the Zeus config) is spread out across all nodes, the failure domain will still be all 3 environments (which is the biggest downside of this vs. the 9-node config).
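As a rough illustration of steps 1 to 3, the intended layout can be sanity-checked with a small Python sketch. The node, block and pool names are made up for illustration, and nothing here calls a Nutanix or vSphere API; it only models the rules that each pool takes 2 nodes from different blocks and that no node belongs to two pools.

```python
# Hypothetical inventory: 3 blocks with 2 nodes each (names are illustrative only).
NODES = {
    "A1": "block-A", "A2": "block-A",
    "B1": "block-B", "B2": "block-B",
    "C1": "block-C", "C2": "block-C",
}

# Step 1: one storage pool per environment, built from 2 nodes on different blocks.
POOLS = {
    "sp-paas":    ["A1", "B1"],
    "sp-vdi":     ["B2", "C1"],
    "sp-sandbox": ["C2", "A2"],
}

def validate(pools, nodes):
    """Return a list of problems with the pool layout (empty list means OK)."""
    problems = []
    seen = set()
    for name, members in pools.items():
        if len(members) < 2:
            problems.append(f"{name}: needs at least 2 nodes to hold both RF2 copies")
        for node in members:
            if node in seen:
                problems.append(f"{name}: node {node} already belongs to another pool")
            seen.add(node)
        blocks = {nodes[node] for node in members}
        if len(blocks) < len(members):
            problems.append(f"{name}: nodes share a block")
    unused = sorted(set(nodes) - seen)
    if unused:
        problems.append(f"nodes not assigned to any pool: {unused}")
    return problems

print(validate(POOLS, NODES))  # prints [] for the layout above
```

The same node pairs would then back the 3 containers and the 3 vSphere clusters from steps 2 and 3.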
Badge +10
Thanks Mr Brown!

I like the concept. As far as I know, Nutanix does not recommend this kind of pooling, so I guess there is a downside...

Can you tell me more about this in terms of performance tax?

I understand that metadata will be spread across the cluster, meaning that I'll have a good RF on it, with the downside that if a component fails on one of the nodes, all environments will be impacted no matter what.

Since this will not be much more complicated for Zookeeper, I guess the added complexity will not generate a tax there.

How do Curator / Stargate react? Do you have any charts that I can use to help my management decide?

Thanks a lot for this,

Virttom
Userlevel 3
Badge +16
That's the slick part of this - there wouldn't be a performance impact if set up correctly. Stargate and Curator will just not use the other disks as a target for the data or its replicas. This does mean that one application will fill up before the others. When they start to run out of space, I would say they should spin out a new cluster rather than expanding this cluster (makes it easier to manage and just nicer overall).

If set up incorrectly (for example by not keeping the VMs on the hosts that have their SP's disks) we would lose data locality, but that's about it.

And yes, Zookeeper and Cassandra will continue on as normal. Assuming 3 2-node blocks, you should keep ZKs on separate blocks, but beyond that it doesn't matter too much. There are a couple of choices you have for the actual deployment:

1) Ability to shut down a block without affecting any layer:



In this deployment any block can fail without taking down any of the layers, but if you need to shut down one of the layers for maintenance you need to shut down the whole cluster.

2) Ability to shut down an application without affecting other applications:



In this case any application can go down without affecting the other applications, but if they want to shut down a block for some reason, an application layer needs to go down (EDIT: not the whole cluster).
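The diagrams for the two options are not reproduced here, so the following Python sketch is only one possible reading of them: option 1 is assumed to spread each application's pool over two blocks, and option 2 is assumed to keep each application inside a single block. Node and application names are hypothetical. It only shows which applications lose all of their nodes when a block goes down.

```python
# Hypothetical hardware: 3 blocks with 2 nodes each.
BLOCKS = {
    "block-1": ["n1", "n2"],
    "block-2": ["n3", "n4"],
    "block-3": ["n5", "n6"],
}

# Assumed option 1: each application's pool spans two blocks.
OPTION_1 = {"app-A": ["n1", "n3"], "app-B": ["n4", "n5"], "app-C": ["n6", "n2"]}

# Assumed option 2: each application's pool lives inside one block.
OPTION_2 = {"app-A": ["n1", "n2"], "app-B": ["n3", "n4"], "app-C": ["n5", "n6"]}

def apps_down_if_block_fails(layout, block):
    """Applications that lose every node of their pool when `block` is down."""
    lost = set(BLOCKS[block])
    return sorted(app for app, nodes in layout.items() if set(nodes) <= lost)

def blocks_used_by(layout, app):
    """Blocks that hold at least one node of the application's pool."""
    node_to_block = {n: b for b, members in BLOCKS.items() for n in members}
    return sorted({node_to_block[n] for n in layout[app]})

for name, layout in (("option 1", OPTION_1), ("option 2", OPTION_2)):
    for block in BLOCKS:
        print(name, block, "down ->", apps_down_if_block_fails(layout, block))
```

Under these assumptions, option 1 never loses a whole application to a single block failure, while option 2 loses exactly the application that lives on the failed block.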


Just to add another wrench to the deployment, you could do an RF3 cluster with RF2 containers. In that case you can lose 2 nodes without affecting the cluster as a whole, but if both node failures are from 1 application, that application will go down without affecting the other applications. This plus the setup from the first scenario up there gives you the ability to withstand both failure scenarios. There is a small performance hit here. They also wouldn't be able to spin out the nodes into a new cluster easily here (as RF3 requires 5 nodes), which makes this a less than ideal solution.
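The RF3-cluster / RF2-container combination can be reasoned about with a simple counting model. The Python sketch below is a deliberate simplification (it just counts failed nodes against the replication factor and ignores rebuild time, block awareness and everything else the real services do), and the pool layout and names are hypothetical.

```python
# Hypothetical layout: 3 applications, each with a 2-node storage pool.
POOLS = {
    "app-A": ["n1", "n3"],
    "app-B": ["n4", "n5"],
    "app-C": ["n6", "n2"],
}

def cluster_tolerates(failed_nodes, cluster_rf=3):
    """Simplified rule: an RF-n cluster rides through n - 1 simultaneous node failures."""
    return len(set(failed_nodes)) <= cluster_rf - 1

def apps_down(failed_nodes, pools, container_rf=2):
    """Applications whose pool lost more nodes than the container RF can absorb."""
    failed = set(failed_nodes)
    return sorted(
        app for app, nodes in pools.items()
        if len(failed & set(nodes)) > container_rf - 1
    )

# Two failures in different applications: the cluster and every application stay up.
print(cluster_tolerates({"n1", "n4"}), apps_down({"n1", "n4"}, POOLS))  # True []

# Two failures in the same application: the cluster stays up, that application goes down.
print(cluster_tolerates({"n1", "n3"}), apps_down({"n1", "n3"}, POOLS))  # True ['app-A']
```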
Badge +10
Thanks for sharing this! :)

I'll give you feedback on this.

To explain this design choice, you need to understand some constraints that I have for building this black box:

  • Implementation will be spread across multiple entities, ranging from massive (1000+ VMs plus on-demand IaaS) to small (no more than 50 VMs) deployments.
  • For the sake of simplification (CAPEX volume and OPEX installation team), the choice was made to use only one Dell model,
  • Since we need 3D support and an FC daughter card for some entities, the RC730 16G will be the only option (for Nutanix).
  • My client is weighing Nutanix against VSAN.
  • Model that will be tested: 2 x E5-2680 v3 CPU / 24 x 16 GB of RAM / 4 x 400 GB SSD / 12 x 1.2 TB SAS 10k HDD / 2 x K2 cards when 3D is needed...
=> This is a good fat box, so I'm trying to convince my client to choose the coolest product and this kind of compact package for small clients.
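For a feel of what this model gives per storage pool, here is a back-of-the-envelope calculation in Python. It uses only the disk counts listed above, decimal units, and ignores CVM, metadata and any other overhead, so the real usable figures will be lower; the function names are just for illustration.

```python
# Disk configuration of the tested model, in decimal GB.
SSD_COUNT, SSD_GB = 4, 400
HDD_COUNT, HDD_GB = 12, 1200

def raw_per_node_gb():
    """Raw disk capacity of one node, SSD plus HDD."""
    return SSD_COUNT * SSD_GB + HDD_COUNT * HDD_GB

def pool_capacity_gb(nodes, rf):
    """(raw, capacity after replication) for a pool of `nodes` nodes at the given RF."""
    raw = nodes * raw_per_node_gb()
    return raw, raw // rf

print(raw_per_node_gb())        # 16000 GB raw per node (1600 SSD + 14400 HDD)
print(pool_capacity_gb(2, 2))   # (32000, 16000): a 2-node pool at RF2
print(pool_capacity_gb(6, 2))   # (96000, 48000): all 6 nodes in a single pool at RF2
```

So each 2-node pool tops out at roughly 16 TB of replicated data before overheads, which is why one application can fill up well before the others.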

So my strategy will be :
__________
Small box:
6/8 nodes with 3/4 storage pools / 1 vCenter & 3/4 clusters
Medium box:
{9/12 nodes with 2 clusters, 1 storage pool for management / 1 for resources and 2/3 containers / 2 vCenters (management & resources w/ 2/3 Vclusters)} OR
{9/12 nodes with 3/4 clusters, 1 storage pool each and 1 container / 2 vCenters (management & resources w/ 2/3 clusters)}
Large box:
  • 6+ nodes management / 1 cluster / {1 SP (for RF3 + ECx support). Maybe 2 SPs if NSX edge segmentation is needed}. 1 vCenter, 1+ Vcluster.
  • 6+ nodes resources / 1 cluster / 2/3 SPs. 1 vCenter, 2/3 clusters.
_____________

With your explanation (and a caveat, since it will be designed on paper, neither tested nor validated) and a pros-and-cons table, I bet it will be enough for them to make their choice.

Cheers,
