Is there a recommended way of doing this? I know that containers are exported to ESXi via NFS. Would that be usable? Would that be able to leverage stargate to access from anywhere? All I really need is a globally available volume shared between all my nodes.
Best answer by Jon
I've moved your post from the CE forums to our production product forums.
In general, for Hadoop on Nutanix, I'd recommend checking out these three assets which you can cherry pick data from
We dont specifically have a Spark on Nutanix guide out yet; however, those two are rich with content for the type of solution that you might want to roll out.
That said, you are correct that HDFS (in general) is designed for non-redundant storage (like bare metal), so it has a lot of the same constructs that Nutanix does already. It is worth nothing that you can (or should be able to) configure the replication copies of Hadoop itself, such that you dont have many copies in Hadoop on top of many copies on Nutanix. Thats generally where "the rub" comes from when we discuss this with customers.
That said, we've got customers doing Hadoop RF2 + Nutanix RF2 (such as in the Cloudera case) and it works just fine, it just imposes a bit of an overhead.
To be clear though, you can't expose HDFS directly from stargate, so you'd always have something like a Hadoop data node (or data nodes plural) in between Nutanix and Spark