Distributed object storage on Nutanix

Wayfarer

Hi, we're starting to look at using Spark on our Nutanix cluster, not in a huge way, just to run some ETL processes in parallel. I'm under pressure to install Hadoop, or at least HDFS, on the cluster, but the whole idea of layering a distributed, resilient "filesystem" (I think it's really more of an object store) on top of the one Nutanix already provides seems somewhat off.

Is there a recommended way of doing this? I know that containers are exported to ESXi via NFS. Would that be usable? Could it use Stargate so that it can be accessed from anywhere? All I really need is a globally available volume shared between all my nodes.

1 ACCEPTED SOLUTION

Moderator

Re: Distributed object storage on Nutanix

Hey Kevin,

I've moved your post from the CE forums to our production product forums. 

In general, for Hadoop on Nutanix, I'd recommend checking out these two assets, which you can cherry-pick from:

https://portal.nutanix.com/#/page/solutions/details?targetId=RA-2078-Cloudera-with-Nutanix:RA-2078-C...

https://portal.nutanix.com/#/page/solutions/details?targetId=RA-2030_Hadoop_with_AHV:RA-2030_Hadoop_...

We don't specifically have a Spark on Nutanix guide out yet; however, those two are rich with content for the type of solution that you might want to roll out.

That said, you are correct that HDFS (in general) is designed for non-redundant storage (like bare metal), so it has a lot of the same constructs that Nutanix already has. It is worth noting that you can (or should be able to) configure the number of replicas Hadoop itself keeps, so that you don't end up with many copies in Hadoop on top of many copies on Nutanix. That's generally where "the rub" comes from when we discuss this with customers.

 

Even so, we've got customers doing Hadoop RF2 + Nutanix RF2 (such as in the Cloudera case) and it works just fine; it just imposes a bit of overhead.
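
As a rough illustration of that overhead, here is a minimal Python sketch of how the two replication layers multiply. It assumes every HDFS replica lands on Nutanix storage that is itself replicated, and it ignores compression, deduplication, erasure coding and metadata, so treat the numbers as back-of-the-envelope only. The function names are made up for this example. On the Hadoop side the replica count is normally set with the dfs.replication property in hdfs-site.xml (HDFS defaults to 3).

```python
def physical_copies(hdfs_replication: int, nutanix_rf: int) -> int:
    """Number of physical copies of each block when both layers replicate.

    Assumption: each HDFS replica is written to Nutanix storage that keeps
    `nutanix_rf` copies of it, so the two factors simply multiply.
    """
    if hdfs_replication < 1 or nutanix_rf < 1:
        raise ValueError("replication factors must be >= 1")
    return hdfs_replication * nutanix_rf


def usable_fraction(hdfs_replication: int, nutanix_rf: int) -> float:
    """Fraction of raw capacity left for unique data (ignores all other overhead)."""
    return 1.0 / physical_copies(hdfs_replication, nutanix_rf)


if __name__ == "__main__":
    nutanix_rf = 2  # the RF2 case discussed above
    for hdfs_replication in (1, 2, 3):  # 3 is the HDFS default
        copies = physical_copies(hdfs_replication, nutanix_rf)
        fraction = usable_fraction(hdfs_replication, nutanix_rf)
        print(
            f"HDFS replication {hdfs_replication} on Nutanix RF{nutanix_rf}: "
            f"{copies} physical copies, {fraction:.0%} of raw capacity usable"
        )
```

With Hadoop RF2 on Nutanix RF2, each block exists four times on disk, which is the overhead being traded against keeping Hadoop's own redundancy.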

To be clear, though, you can't expose HDFS directly from Stargate, so you'd always have something like a Hadoop data node (or several data nodes) in between Nutanix and Spark.

 

Jon Kohler | Technical Director, Engineering, Nutanix | Nutanix NPX #003, VCDX #116 | @JonKohler
Please Kudos if useful!
2 REPLIES
Wayfarer

Re: Distributed object storage on Nutanix

Thanks for that. I was hoping not to have to install a full Hadoop cluster just yet. At the moment it's only for a few Spark jobs. It looks like I might be able to get away with running Spark on its own for now, but I will need a full Hadoop setup, probably HDP, in the near future. It's just the scaling that scares me. It's only a small part of what we do, I only have 7 NX3000 nodes to play with, and they're nearly full anyway.