Do we need to back up anything like configs/settings to a USB drive in case of a serious outage?

Badge +4
I took the admin course two weeks ago and have been tasked with writing an operations manual. One of its sections covers backup and recovery, so my question is: "Do we need to back up anything like configs/settings to a USB drive in case of a serious outage?" If not, how does one recover from the most serious of outages? Being new to Nutanix, I'm not even sure what a catastrophic outage would entail, so if someone could describe one and provide a high-level recovery process, I would be most appreciative.


Userlevel 2
Badge +14
I don't believe there is any graphical backup function inside Prism itself. Depending on your hypervisor, the best way to capture your cluster configuration is probably snapshot-level backups of the CVMs themselves. However, I am not sure what that would do to your metadata databases if you restored multiple CVMs from inconsistent points in time.
Userlevel 4
Badge +18
There are a few things that can be backed up in an environment using Nutanix converged infrastructure.

1. User data: As you may know, user data is always replicated. The default replication factor (RF) is 2, but you can also configure it to be 3. This means every piece of data is stored on 2 (or 3) different nodes in NDFS, so a node failure can be tolerated (RF2 survives one node failure, RF3 survives two simultaneous failures). You can also use conventional backup solutions to back up your VMs.

2. Metadata database: This is also protected, with a replication factor of 3, which means that each piece of metadata is stored on 3 nodes.

3. As for the cluster configuration, there is no built-in way to back it up, and you probably won't need it unless your entire cluster goes down. For that case (a catastrophic event), you can configure the DR solution built into Nutanix systems.
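
To make the replication-factor arithmetic in item 1 concrete, here is a minimal Python sketch. The function names are hypothetical, and it assumes each copy lives on a different node while ignoring real-world overheads (CVM reservations, rebuild headroom, compression, deduplication), so treat the capacity figure as a rough estimate only.

```python
# Hypothetical helpers illustrating replication-factor (RF) arithmetic.
# Assumes every copy lives on a different node and ignores overheads such as
# CVM reservations, rebuild headroom, compression and deduplication.


def tolerated_node_failures(rf: int) -> int:
    """Number of simultaneous node failures survivable with `rf` copies."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    # At least one copy must survive, so up to rf - 1 nodes may fail.
    return rf - 1


def usable_capacity_tib(raw_tib: float, rf: int) -> float:
    """Approximate usable capacity when every block is stored `rf` times."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    return raw_tib / rf


if __name__ == "__main__":
    raw = 40.0  # made-up raw capacity in TiB, for illustration only
    for rf in (2, 3):
        print(
            f"RF{rf}: tolerates {tolerated_node_failures(rf)} node failure(s), "
            f"~{usable_capacity_tib(raw, rf):.1f} TiB usable of {raw:.0f} TiB raw"
        )
```

Run as a script, it prints the failure tolerance and approximate usable capacity for RF2 and RF3 on a made-up 40 TiB cluster, which shows the trade-off: RF3 survives one more failure at the cost of a third of the raw capacity instead of half.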
Userlevel 3
Badge +16
There's not really any benefit to having the configuration on a USB drive, because the configuration is useless without the backup data.

Here's some common failure scenarios and how the Nutanix cluster responds:

1) Node goes offline (hardware failure, for example). The other nodes have the data and will automatically re-replicate it to restore redundancy. When the node comes back online, the data will be rebalanced (and extra replicas will be deleted).

2) Cluster goes offline.
With Nutanix/3rd-party DR:
Activate the remote site, which will bring the VMs up at your DR site. When the primary cluster comes back online, you can replicate the data back to it and then power the VMs back up there.

Without DR:
Each CVM has a configuration cache it uses to contact the rest of the cluster when it comes up. After the cluster has initialized, it pulls down any configuration updates from Zeus and ensures that all the data is redundant (Curator fixes it if not).

3) Cluster gets hit with a meteor
With DR:
Bring up the DR site and work on getting new gear. Once that's in place, go through the initial setup and then replicate the data from the DR site back to the new cluster.

Without DR:
Find the nearest TARDIS and save the cluster.
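
For an operations manual, the scenarios above can be captured as a simple lookup table. This is a minimal Python sketch that only restates the steps from this reply; the scenario keys (`node_offline`, `cluster_offline`, `site_destroyed`) and the `recovery_steps` helper are hypothetical names, not a Nutanix API. The "site destroyed without DR" entry assumes the only fallback is a separate conventional backup, as mentioned earlier in the thread.

```python
from typing import Dict, List

# Hypothetical runbook table restating the failure scenarios in this reply.
# Keys and step wording are illustrative only; this is not a Nutanix API.

# A single node failure is handled the same way with or without DR.
_NODE_OFFLINE = [
    "Remaining nodes already hold replicas; missing copies are re-replicated automatically.",
    "When the node returns, data is rebalanced and surplus replicas are removed.",
]

RUNBOOK: Dict[str, Dict[bool, List[str]]] = {
    "node_offline": {True: _NODE_OFFLINE, False: _NODE_OFFLINE},
    "cluster_offline": {
        True: [
            "Activate the remote (DR) site to bring the VMs up there.",
            "When the primary cluster is back, replicate the data back to it.",
            "Power the VMs back up on the primary cluster.",
        ],
        False: [
            "Bring the cluster back up; each CVM uses its cached configuration to find its peers.",
            "After initialization, the cluster pulls configuration updates from Zeus.",
            "Curator restores any data that is not fully redundant.",
        ],
    },
    "site_destroyed": {
        True: [
            "Bring up the DR site and procure replacement hardware.",
            "Run the initial setup on the new hardware.",
            "Replicate the data from the DR site back to the new cluster.",
        ],
        False: [
            "No cluster-level recovery path; restore from a separate conventional backup if one exists.",
        ],
    },
}


def recovery_steps(scenario: str, has_dr: bool) -> List[str]:
    """Return the ordered recovery steps for a scenario, or raise KeyError."""
    try:
        return RUNBOOK[scenario][has_dr]
    except KeyError:
        raise KeyError(f"unknown scenario: {scenario!r}") from None


if __name__ == "__main__":
    for i, step in enumerate(recovery_steps("cluster_offline", has_dr=True), start=1):
        print(f"{i}. {step}")
```

Keying on whether DR exists makes the gap obvious when reviewing the manual: every "without DR" entry for a full-site loss ends in data loss unless something outside the cluster protects it.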

The biggest worry with any storage platform isn't the configuration but the actual data. We've got many tools available to protect it, and there's no need to back up the configuration files.

The only thing I would say to take note of is any gflags that Support or GSO applies to your cluster. These are advanced configurations applied to tune certain workloads, and if you ever did have to rebuild, they would be the only configuration that would be hard to replicate. They are rare, though, so you likely don't have any applied to your cluster.

You can see if any are applied using NCC:

ncc health_checks system_checks gflags_diff_check

These should only be modified by Nutanix, and if you've got any questions about them feel free to shoot me an email or open a case.
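
If you want the manual to keep a dated record of this check, a small wrapper can run the NCC command shown above and save its output. This sketch assumes it runs on a CVM where `ncc` is on the PATH; the `record_gflags_check` function and the log file naming are hypothetical. It records the exit status without interpreting it, because this thread does not say what NCC's exit codes mean.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional, Sequence

# Command taken verbatim from the post above.
GFLAGS_CHECK = ("ncc", "health_checks", "system_checks", "gflags_diff_check")


def record_gflags_check(log_dir: Path, cmd: Sequence[str] = GFLAGS_CHECK) -> Optional[Path]:
    """Run the check and save its output to a timestamped text file.

    Returns the path written, or None if the command is not available here.
    The exit status is saved but not interpreted.
    """
    try:
        result = subprocess.run(list(cmd), capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return None  # e.g. ncc is not on PATH because we are not on a CVM
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out = log_dir / f"gflags_diff_check_{stamp}.txt"
    out.write_text(
        f"$ {' '.join(cmd)}\n"
        f"exit status: {result.returncode}\n\n"
        f"{result.stdout}{result.stderr}"
    )
    return out


if __name__ == "__main__":
    path = record_gflags_check(Path("ncc_logs"))
    print(path if path else "ncc not found; run this on a CVM")
```

Returning None when the command is missing lets the same script run harmlessly from a workstation instead of crashing.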
Badge +4
Thanks very much, cbrown, a most insightful analysis. We have no 'gflags or GSOs', so we will get the DR solution up and tested, and then we'll be good to go. May I quote you in the manual I am writing? 🙂
Badge +4
*Sorry, we have no gflags or advanced configurations at present.
Userlevel 3
Badge +16

Let me know if there are other questions we can help with.