Disk Storage Format


Badge +2
What does Nutanix use for its native store, underneath all the layers? I presume it is an existing object store, KV store, or file system. I keep seeing references to NTFS, VMFS, and a variant of Cassandra. I am trying out some SSD optimizations with my own open source SSD controller and a custom FS (in development), so I need to adapt this to Nutanix.

Userlevel 6
Badge +29
Have you read through the Nutanix Bible yet?

www.nutanixbible.com

I'd suggest starting there, and let us know if you have any questions after that.

The short story:
Nutanix has its own file system, known as Nutanix Distributed File System (also known as Distributed Storage Fabric). We do NOT use NTFS or VMFS, ever.

We provide virtual disk storage to virtual machines, and on those virtual disks, you can run whatever you want.

You cannot, however, modify Nutanix's file system at all, and honestly, I can't imagine a situation in which you'd need to.
Badge +2
I did go through the Nutanix Bible and pretty much every other reference I could find. I could not find anything conclusive after a week of searching! While I did go through the references to NDFS, it appeared to be a custom variant of NFS. Nothing in the Nutanix Bible or any other reference explicitly mentioned the final disk storage format, hence the post. If I assume that Nutanix has actually created its own native disk storage format, are there specific optimizations for SSDs and HDDs? Since I am building my own SSDs, I do need to modify the FS to take advantage of their features, since a lot of features like atomic transactions and KV ops are provided by the controller. Think of it as an open source alternative to FusionIO.
Userlevel 6
Badge +29
The Nutanix Bible is pretty exhaustive; it has sections on how we store data (extents and extent groups) and such.

NDFS has nothing to do with NFS; it is completely independent.

We pick the best connection format for the hypervisor, so we happen to use NFS for ESXi, SMB3 for Hyper-V, and iSCSI for AHV (which is what CE uses).

The final format behind the storage connection protocol is invisible to everyone for a reason, as end users don't interact with it at all.

Yes, we've got specific optimizations for SSDs and HDDs, but again, that's not exposed to end users, as honestly, there's no reason to.


For your specific situation, it sounds like you're developing your own hardware? If so, I think we'd probably have lower-level issues to deal with, like drivers for AHV (which is Linux CentOS based), before we get to anything related to Nutanix storage.

Also, keep in mind that we used to use Fusion IO in our paid product and dropped it years ago for many reasons. One was serviceability, another was driver stability, and another was overall hardware reliability. We've standardized on SAS-based hot-swappable drives, and are moving to hot-swappable NVMe shortly, as they are more serviceable, have better and more stable drivers, and tend to be much more reliable hardware to deal with.

That's not to discourage you in your efforts at all, just giving you a little context since you mentioned Fusion IO.
Badge +2
Thanks for the detailed reply.

Yes, I saw the section on extents and extent groups. But I did feel that it was not clear what the extents use for storage: a native format, or files of an underlying file system. I asked from the angle of the async I/O used to access the disks, hence the request for clarification. I am aware of the various choices, having been developing file systems and database engines for about 30 years now! In terms of performance, I was hoping you would use a native format, since that gives the best performance, very similar to what RDBMSs use. From your reply, that seems to be the case.

Regards

Madhu
Userlevel 6
Badge +29
You're right, we don't go that deep in the Nutanix Bible, as honestly, I don't feel that we need to. From the Nutanix perspective, we care about the Extent and Extent Group level, and everything else is an implementation detail that could change from release to release. We abstract that complexity from end users for a reason.



That all said, if you're thinking about using Nutanix Community Edition (CE) to do any sort of performance testing, etc., know that the architecture of CE is a bit different from traditional "paid" Nutanix, specifically with regard to how we address physical disks.

This is the part you'd probably care about the most.

In CE, we use LUN passthrough in QEMU to emulate disks into our controller virtual machine.

In Paid Nutanix, we use PCI passthrough, so that the controller VM has direct access to the devices, just like a bare metal server. This performs infinitely better than LUN passthrough.

We chose LUN passthrough for CE because it's compatible with more situations, where users may not have a full enterprise-quality HBA capable of doing PCI passthrough.
Badge +5
Is it possible to enable passthrough in CE, assuming the hardware supports it?
Userlevel 7
Badge +25
You would probably have to hack on the domain XML for the CVM. It would likely revert at the next upgrade. I'm unsure whether anything else would break as well.
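
For readers wondering what such a domain XML edit might involve, here is a minimal sketch in Python. It only assumes that the CVM is an ordinary libvirt domain and uses the standard libvirt `<hostdev>` element for PCI passthrough. The domain name `cvm-example`, the stripped-down domain XML, and the PCI address `0000:03:00.0` are hypothetical placeholders, and nothing here is a documented or supported Nutanix procedure.

```python
import xml.etree.ElementTree as ET


def pci_hostdev_element(pci_address: str) -> ET.Element:
    """Build a libvirt <hostdev> element from an address like '0000:03:00.0'."""
    domain, bus, slot_func = pci_address.split(":")
    slot, function = slot_func.split(".")
    hostdev = ET.Element("hostdev", mode="subsystem", type="pci", managed="yes")
    source = ET.SubElement(hostdev, "source")
    ET.SubElement(
        source,
        "address",
        domain=f"0x{domain}",
        bus=f"0x{bus}",
        slot=f"0x{slot}",
        function=f"0x{function}",
    )
    return hostdev


def add_pci_passthrough(domain_xml: str, pci_address: str) -> str:
    """Return a copy of the domain XML with a PCI hostdev appended to <devices>."""
    root = ET.fromstring(domain_xml)
    devices = root.find("devices")
    if devices is None:
        devices = ET.SubElement(root, "devices")
    devices.append(pci_hostdev_element(pci_address))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical, stripped-down domain definition; a real CVM has far more in it.
    example = "<domain type='kvm'><name>cvm-example</name><devices/></domain>"
    # Hypothetical PCI address of an HBA on the host.
    print(add_pci_passthrough(example, "0000:03:00.0"))
```

The sketch only produces the modified XML as a string; applying it to a running host is left out on purpose. As noted above, an upgrade would likely revert a change like this.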
Badge +5
I really like the idea of the CE edition but I'd really like to be able to play with it in a strictly performance mindset. I kinda wish there was a paid for version that was HCL based instead of having to buy their hardware.
Userlevel 7
Badge +25
Yup, CE has fed the software-only camp (of which I am a part), but it's just not there today. Honestly, CE's HCL is enabled by using LUN instead of disk devices. Lots of cool stuff is emerging, though, with NVMe eliminating the middleman (HBA complexity) and containers providing a lighter-weight option for the VSA components. Instead of routing PCIe devices into a pinned VM, Hades could access them directly while retaining some abstraction for serviceability. I know that's oversimplified, but it's not pie in the sky.
Userlevel 6
Badge +29
Thanks for reaching out.

I'm curious, what's your end goal with respect to passthrough and the performance mindset? To stress test some of your apps on CE/AHV? Benchmark against an existing system? Kick the tires on NX?

No wrong answer here, I'm just curious so I can respond properly.

That said, from a strictly technical perspective, you could probably "hack it up" in the domain XML, as Justin alluded to, to do a PCI device passthrough. I don't think we "block it", so if you've got an adventurous spirit and some high-end HBAs (perhaps LSI 3008 based, in IT mode), go for it.

RE Paid Version with HCL
Are you talking about having a "low cost" Community Edition that's more like our production code? Or having Nutanix as a software-only offering for full-bore production (with enterprise support, etc.), where you procure the hardware and just license the software?
Userlevel 1
Badge +10
I, personally, would like to see a "low cost" CE that is more like the production one: it works the same way but is restricted to, say, 4 nodes.
This would be a great gift to developing countries, one of which I live in. A 4-node paid Nutanix (even the Starter edition) is "too much" for us.
Badge +5
We are a small company that operates like a big company on a small budget. I am personally a jack of all trades in the IT department, and I am responsible for our entire infrastructure end to end. We use last-gen, off-lease Dell hardware for our virtualization infrastructure. It's dirt cheap, and we get a hardware warranty through xByte. We will never have the budget to forklift over to Nutanix, nor could I justify it. However, if I could license it kind of like VMware, then it becomes a possibility. I also like the idea of a more production-code-based CE for purchase, limited to 4 nodes. I someday want to start my own cloud hosting company, and a low-cost CE for purchase would be a great way to get started without massive startup costs while still being able to offer great features and performance.

Thanks
Ben Bliss
Sr. Infrastructure Architect
Sentinel Security Life Insurance Co.
Userlevel 6
Badge +29
 - Thanks for your thoughts.

I understand exactly what you're talking about. If you feel comfortable sharing, I've got a few questions:

1 - what budget *ranges* are we talking here?
2 - what "support" experience (SLAs, etc) would you expect/need?
3 - what "serviceability" experience would you expect/need (think software upgrades, hardware maintenance, firmware patches, etc)?
4 - Would this more "production" version (in your mind) support VMware? Hyper-V? Both?
5 - What "other" features would you need/expect (other than "running VM's")?
Userlevel 1
Badge +10
Here are my somewhat quick answers:
1. I think, the range could be like 1K-2K USD, but this can be defined through some local research
2. I think SLA can be based on email support within one week timeframe.
3. Software upgrade, patches and/or security patches
4. No need for hypervisor support other than AHV
5. Since this would be only 4 node cluster, replication (backup/restore) to other 4 node cluster.

I see this as a first step toward converting to full-featured Nutanix in the future. And of course, there will be more Nutanix lovers. ;)

Regards,
Nemat
Badge +5

I agree with most of this:
Here are my somewhat quick answers:
1. I think, the range could be like 1K-2K USD, but this can be defined through some local research
2. I think SLA can be based on email support within one week timeframe.
3. Software upgrade, patches and/or security patches
4. No need for hypervisor support other than AHV
5. Since this would be only 4 node cluster, replication (backup/restore) to other 4 node cluster.

I would add that the 1-2k, or even a little more, maybe 2-4k per cluster, could be on an annual basis to cover R&D, patches, and upgrades. Offer DR replication to another CE cluster as an add-on. I would like AHV and ESX, but that's not a must. It would be nice to have, say, two of the CE clusters: one in PROD and the other at a DR site with replication. HA would be nice too. Firmware should be our/my responsibility, not Nutanix's. Hardware should also be the customer's/my responsibility. The terms should basically be: must use HCL server-grade hardware (Dell, HP, etc.) or get no/best-effort software support. The current feature set of CE seems sufficient other than the HA mentioned above and the ability to replicate to another 4-node cluster at a DR site.

Thanks
Ben Bliss
Userlevel 6
Badge +29
Guys - realized I never responded to this, cleaning up some backlog. Thanks for the information.