Backup and Recovery
Discussions about how to protect applications with different RPOs and RTOs, and with third-party tools
- 196 Topics
- 578 Replies
Whether your environment has grown large enough to justify a DR site, or your operations can no longer tolerate the loss of a site or even a single application, the first question is: “What are the options?” That is what we will discuss in this post. Starting with AOS 5.10, Metro Availability is supported on Hyper-V. Metro Availability is the solution if you are looking for minimal RPO and are preparing for a full-scope DR scenario (see Prism Web Console Guide v5.16: Data Protection). Next in line is Async DR, which is essentially scheduled snapshot replication. Both Metro Availability (MA) and Async DR require a physical cluster. Since single-node clusters are not currently supported with Hyper-V, that means at least a three-node block (see Prism Web Console Guide v5.16: Configuring a Remote Site (Physical Cluster)). What if you are not ready to invest in hardware and deal with its maintenance? There are more options, such as leveraging the mighty priva
If you need to free up space on your Nutanix cluster, you may be considering removing local snapshots, but what if the reclaimable space is displayed as “Processing”? What can we do about it, and how can we know whether a snapshot is worth removing? The reclaimable space is subject to change and has to be totaled up before it can be known. This work is done during the same periodic filesystem scans that drive disk cleanup, post-process deduplication and compression, and information lifecycle management. Generally speaking, we just need to wait for the next periodic scan to finish, after which Prism should show the reclaimable space figure. If you need to estimate reclaimable space in the meantime, I can offer a few hints to help narrow that estimate. It helps to have a little understanding of how snapshots work and how the reclaimable space number is identified. Think of it this way: each current filesystem block has one or more owners, and a block cannot be deleted until no mor
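To make the owner idea concrete, here is a minimal Python sketch. It is a simplified model, not how the Nutanix filesystem actually stores metadata: the `block_owners` map, the owner names, and the `reclaimable_bytes` helper are hypothetical. A block is freed only when none of its owners remain, so deleting one snapshot frees only the blocks that snapshot alone still references, while deleting two snapshots together can free more than their individual figures added up.

```python
from typing import Dict, Iterable, Set

# Hypothetical model: each block ID maps to the set of owners (the live
# vdisk and/or snapshots) that still reference that block.
BlockOwners = Dict[str, Set[str]]


def reclaimable_bytes(block_owners: BlockOwners, block_size: int,
                      snapshots: Iterable[str]) -> int:
    """Bytes freed if all of `snapshots` were deleted together.

    A block can be freed only when none of its owners remain, i.e. when
    every owner of the block is among the snapshots being deleted.
    """
    doomed = set(snapshots)
    return sum(block_size
               for owners in block_owners.values()
               if owners and owners <= doomed)


if __name__ == "__main__":
    MiB = 1024 * 1024
    blocks: BlockOwners = {
        "b1": {"live", "snap1"},   # unchanged since snap1: the live VM still uses it
        "b2": {"snap1"},           # overwritten after snap1: only snap1 holds it
        "b3": {"snap1", "snap2"},  # overwritten after snap2: shared by both snapshots
        "b4": {"snap2"},           # written after snap1, overwritten after snap2
    }
    print(reclaimable_bytes(blocks, MiB, ["snap1"]) // MiB)           # 1
    print(reclaimable_bytes(blocks, MiB, ["snap2"]) // MiB)           # 1
    print(reclaimable_bytes(blocks, MiB, ["snap1", "snap2"]) // MiB)  # 3
```

This is one reason the figure can shift between scans: it depends on which other snapshots and live data still share each block at the time it is totaled.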
Nutanix Mine is the product name for joint solutions between Nutanix and selected data protection software vendors. Nutanix Mine with Veeam is a fully integrated data protection appliance that combines the Nutanix AOS software with the Veeam Backup & Replication solution. Mine can provide data protection for any application running in a Nutanix cluster. Nutanix Mine with Veeam extends the core strengths of Nutanix software to backups. It integrates data protection software from Veeam with the software running in a hyperconverged manner on a Nutanix cluster. It is an entirely self-sufficient turnkey solution that provides compute, memory, storage, and now backups too. Nutanix Mine with Veeam is also tightly integrated with the Nutanix Prism management console, which means that IT administrators can use a single management interface for both primary storage (as part of the HCI infrastructure) and backups. Nutanix Mine with Veeam can back up VMs and application data from Nutanix and non-Nutanix environments, wh
If you’re managing backups with Veeam, HYCU, Commvault, Rubrik, or another third-party application, you may have encountered the alert message “Aged third-party backup snapshots present” even though nothing seems to be wrong. If so, you probably want to know why you’re seeing this alert and what to do about it. This alert doesn’t necessarily indicate a problem; it only reports snapshot age. In most cases you just need to change the alert threshold, which defaults to 7 days. See this article for more details about the alert. The phrase “third-party snapshot” used in this alert refers to a specific type of snapshot requested by third-party backup software using an API option intended for this purpose. The backup application essentially owns this snapshot. If your backup policy is set up to keep only five snapshots for this job, or to remove any snapshots over 14 days old, the decision of when to delete is made within the third-party backup software. On the Nutanix end
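To see why the alert fires even when nothing is wrong, here is a hypothetical Python sketch that compares snapshot ages against a threshold. The 7-day default comes from the post; the 14-day daily retention, the `aged_snapshots` helper, and the strict “older than” comparison are assumptions for illustration. With a 14-day retention policy in the backup software, half of the retained snapshots are always older than the default threshold, so the alert is expected until the threshold is raised above the retention period.

```python
from datetime import datetime, timedelta
from typing import List

# Default alert threshold described in the post (7 days).
DEFAULT_THRESHOLD = timedelta(days=7)


def aged_snapshots(create_times: List[datetime], now: datetime,
                   threshold: timedelta = DEFAULT_THRESHOLD) -> List[datetime]:
    """Return the creation times of snapshots strictly older than `threshold`."""
    return [t for t in create_times if now - t > threshold]


if __name__ == "__main__":
    now = datetime(2020, 6, 15)
    # Hypothetical backup job: one third-party snapshot per day, kept 14 days.
    daily = [now - timedelta(days=d) for d in range(1, 15)]
    print(len(aged_snapshots(daily, now)))                      # 7
    print(len(aged_snapshots(daily, now, timedelta(days=15))))  # 0
```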
Q1: Is it OK to create protection domains with the same name on two different clusters, with remote sites enabled on both clusters, if the PD is set up with local snapshots only? Answer: Yes. A protection domain can have the same name as a protection domain on the remote site, as long as the two are not configured to replicate to each other and it is only a local protection domain. You can also configure "active" schedules for that protection domain and retain the snapshots on the local cluster. Q2: Can a Protection Domain be renamed once it is configured and has entities and snapshots associated with it? Answer: There is a workaround, which is to create a new Protection Domain using this procedure: Create a new Protection Domain with the desired name and schedule. Remove all protected VMs from the old Protection Domain. Make a note of existing consistency groups. Add the VMs to the newly created Protection Domain, reconfigure consistency groups as they were in th
We love our VM-level snapshots on VMware ESXi. They are a quick and easy way to roll back recent changes as if nothing had ever happened. They give peace of mind and boost confidence. They are not the answer to every prayer, though; like anything, they have their limitations. What are the limitations of VM-level snapshots in a Nutanix cluster? First, let’s look at some of the interesting events that occur during the snapshot operation and while the snapshot exists: If a virtual machine is running off of a snapshot, it writes its changes to a child (sparse) disk, also called a delta disk. The delta disk metadata held in the vSphere host’s memory includes the delta disk header. Updates to the delta disk header happen in memory as required, and the changes are written to disk only upon certain events, such as snapshot consolidation or when the delta disk is closed. Storage snapshot operations and storage replications are transparent to ESXi hosts. If the storage snapshot used to restore a VM wa
We've just gotten started with Nutanix, and I have a question about snapshots. Do I need to create a Protection Domain before I can create a snapshot? I'm not looking to take snapshots of a bunch of machines or have scheduled snapshots. I just want to make one snapshot of a single Windows server before I upgrade some software on it. We are running AHV as our hypervisor. Thank you
The alert text “Connectivity to remote site is not normal” may show up for a number of different reasons, so it may not be immediately clear what to do about it. If you recently added a remote site configuration in Prism, it could be a configuration problem, but if Async DR replication had been working for several days before the error appeared, the issue is more likely a network problem or related to the status of the remote site. Since the cause might be only temporary, the first thing to do is to re-run the check. To see the full details, I recommend running it from the CLI:
nutanix@cvm$ ncc health_checks data_protection_checks remote_site_checks remote_site_connectivity_check
If you see a PASS result now and no corrective action was taken, the issue was temporary. In that case, you may want to check the alert timestamp and look into whether the network or remote site was undergoing maintenance or an upgrade, or exhibiting some issue, at the time of the alert. Since
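The triage reasoning in this post can be summarized as a small Python function. It is a hypothetical helper, not part of NCC: you still run `remote_site_connectivity_check` by hand and feed its result in, and the function name, parameter names, and return labels are invented for illustration. The "resolved" label for a PASS after a fix is an assumption added to complete the branches.

```python
def triage_remote_site_alert(rerun_passed: bool,
                             corrective_action_taken: bool,
                             remote_site_recently_added: bool) -> str:
    """Hypothetical triage of a "Connectivity to remote site is not normal" alert.

    rerun_passed: whether re-running remote_site_connectivity_check gave PASS.
    """
    if rerun_passed:
        if corrective_action_taken:
            # Assumed label: the check passes because something was fixed.
            return "resolved"
        # PASS with no change made: the condition was temporary, so compare
        # the alert timestamp with maintenance or upgrade windows.
        return "transient"
    if remote_site_recently_added:
        # A newly added remote site: suspect the configuration first.
        return "configuration"
    # Replication worked for days before the alert: suspect the network
    # or the status of the remote site.
    return "network_or_remote_site"


if __name__ == "__main__":
    print(triage_remote_site_alert(rerun_passed=True,
                                   corrective_action_taken=False,
                                   remote_site_recently_added=False))  # transient
```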
If anyone is using HYCU to back up their AHV environment, please share your feedback. 1- Is the product a good, acceptable solution for backing up VMs on AHV? 2- How was your restore experience when you had to recover a VM, a file, or application data? 3- How is the support? 4- Is their documentation (KB articles, DIY guides) clear and easy to follow? Thanks a bunch
The best way to know if your DR solution will work when you need it is to actually test the DR workflows, right? Of course, a DR test can be disruptive, so you’ll want to understand the procedures and best practices before your testing window starts. The Async DR solution built into your Nutanix cluster can handle both planned and unplanned failover scenarios. Testing these capabilities is not much different from simply using them when needed. The two most relevant documents for these procedures are the Prism Web Console Guide and the Data Protection and Disaster Recovery best practice guide. The Prism Web Console Guide provides the execution steps for setup and failover, while the best practice guide provides additional detail on available solutions, requirements, and related considerations around space, bandwidth, and seeding, along with a best practices checklist. After reviewing the requirements and scheduling a test window, you can follow the planned failover workflow in the Pri
Good day, I am very new to Nutanix and recently purchased a cluster. It has only been managing my network for about 30 days, and I am in the process of making some configuration changes to it. I need some assistance with an error message that I am receiving. I am using AHV as the hypervisor on my 3-node cluster. On this cluster I am running 5 Windows-based server VMs (not using Hyper-V or VMware). I followed the instructions from the Administration Guide: I enabled VSS Shadow Copies on the servers, installed guest tools on all the servers, and created an Async DR protection domain. My configuration is working, and snapshots are being created for my domain controllers and application servers. However, when a snapshot of my file server is attempted, I keep getting the following error. "Warning : VSS snapshot failed for the VM(s) FS-01 protected by the FileServer in the snapshot (169035, 1563300389081879, 960) because Quiescing guest VM(s) failed or timed out. Impact
There are many situations in which you might need to abort an ongoing disaster recovery replication job on your Nutanix cluster. You may be deleting the protection domain, or maybe you added a large VM to an existing protection domain and now you don’t want to include this new VM’s baseline replication in the existing job, at least not before the weekend. Maybe the destination is running short of space and we need to stop adding to the problem. You can see the replication in Prism, but there’s no button to pause or cancel it. To abort the replication, we need to go to the CLI. This process is covered in https://portal.nutanix.com/kb/3272. SSH to a CVM in the cluster as the user ‘nutanix’, then run ‘ncli’ to enter the interactive CLI. First, list current ongoing replications with this command. You’ll need the Protection Domain name and ID from this output for the next command.
<ncli> pd ls-repl-status
It’s five o’clock on Friday, and the database admin has just let you know they’re working on a problem and need to keep last Friday afternoon’s snapshot until further notice. If you do nothing, that backup will expire in about an hour due to the protection domain schedule. How can we make sure this snapshot sticks around? Don’t worry! There’s an NCLI command for this. You can change that snapshot’s expiration time to indefinite, and it will stay until you delete it manually later. See the solution in the KB article here: https://portal.nutanix.com/kb/8594. First, you need the snapshot ID and the protection domain name. You can get these from the Prism UI or from NCLI. To capture these details from Prism, go to the Data Protection dashboard, Table view, Async DR tab, and select the protection domain from the table. In the lower panel, click Local Snapshots and find the snapshot with the right Create Time. The number in the ID column is the snapshot ID you need, alo
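Conceptually, the fix replaces a scheduled expiry time with “never”. The Python sketch below models only that idea; it does not call NCLI, and the `Snapshot` class, the `retain_indefinitely` helper, the snapshot ID, the 7-day retention, and the timestamps are all assumptions chosen to match the Friday scenario. The actual change is made with the NCLI command described in KB 8594.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Snapshot:
    snap_id: int
    created: datetime
    expires: Optional[datetime]  # None models an "indefinite" expiration

    def is_expired(self, now: datetime) -> bool:
        # A snapshot with no expiry time is never removed by the schedule.
        return self.expires is not None and now >= self.expires


def retain_indefinitely(snap: Snapshot) -> None:
    """Model of setting the expiration to indefinite (manual delete only)."""
    snap.expires = None


if __name__ == "__main__":
    created = datetime(2020, 6, 5, 17, 55)       # late last Friday afternoon
    snap = Snapshot(snap_id=12345, created=created,
                    expires=created + timedelta(days=7))  # assumed 7-day retention
    later = datetime(2020, 6, 12, 19, 0)         # this Friday, 7 PM
    print(snap.is_expired(later))                # True: the schedule removes it
    retain_indefinitely(snap)
    print(snap.is_expired(later))                # False: kept until deleted manually
```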
While trying the Veeam AHV appliance, I had a crash while taking a backup, and now I am left with orphaned snapshots on a PAID ACCOUNT. Nutanix passes the ball to Veeam and Veeam passes it back to Nutanix, so here I am. Protection Domain DP-QC-3 has 3 aged third-party backup snapshot(s) and may unnecessarily consume storage space in the cluster. I have a multitude of these and want to remove those snapshots manually. Is there a process for this?
We are currently considering implementing NearSync for DR with an RPO of 15 minutes. Is there still a cluster limit of 40 TB in 5.16? I can't find anything more specific in the documentation. https://portal.nutanix.com/#/page/docs/details?targetId=Prism-Element-Data-Protection-Guide-v5_16:wc-dr-near-sync-requirements-limitations-r.html Also, wasn't the restriction that backup tools like Rubrik or NetBackup cannot take normal snapshots while NearSync is active supposed to be resolved in 5.9?
I looked around for new backup tools and found two interesting ones. The first is NAKIVO: http://www.nakivo.com/VMware-VM-backup-replication-recovery-software.htm The second is CloudBacko, which shows a screen where 1 TB is backed up in 2 minutes: http://www.cloudbacko.com/en/cloudbacko-advanced-cloud-local-server-workstation-amazon-S3-google-backup-software-benefits.jsp
Configuring a data protection and disaster recovery solution in your Nutanix environment is simple and straightforward. But what if you have configured a remote site and all of its settings, and you are still not sure the configuration is correct? Have you configured the vstore mapping correctly? Have you configured the network mapping correctly? Is there a check that can help you verify the entire remote site configuration? Nutanix Cluster Check (NCC) provides a check that helps you verify the mapping configuration of a remote site. Read the following Knowledge Base article to learn more about the check: KB-3335. Read the following documentation on best practices for data protection and disaster recovery to understand the configuration and limitations in depth: DATA PROTECTION AND DISASTER RECOVERY
Are you getting the error “VSS Scripts Not Installed” in your Nutanix environment, with the description “VSS software or pre_freeze/post_thaw Scripts Not Installed”? Confused about VSS and the scripts it mentions? Let us help you understand how VSS is used for a snapshot! When you enable Nutanix Guest Tools (NGT), the VSS and application-consistent snapshot features are enabled by default. The Nutanix native in-guest VmQuiesced Snapshot Service (VSS) agent is used to take application-consistent snapshots for all VMs that support VSS. This mechanism takes application-consistent snapshots without any VM stuns (periods when the VM is temporarily unresponsive) and also enables third-party backup providers like CommVault and Rubrik to take application-consistent snapshots on the Nutanix platform in a hypervisor-agnostic manner. What’s the use of pre_freeze/post_thaw, then? Within a Windows VM, NGT will use the Microsoft VSS writer built into the OS to quiesce the VM to take the app consistent snapsho
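Linux guests have no Microsoft VSS writer, so application consistency there relies on the pre_freeze/post_thaw scripts named in the alert. Below is a minimal Python sketch for checking them from inside the guest. The paths /sbin/pre_freeze and /sbin/post_thaw are an assumption based on common NGT setups; confirm the expected locations in the NGT documentation for your AOS version. The `missing_scripts` helper is hypothetical and only checks presence and the execute permission, not what the scripts do.

```python
import os
from typing import Iterable, List

# Assumed script locations for Linux guests; verify the paths your NGT/AOS
# version actually expects before relying on this check.
DEFAULT_SCRIPTS = ("/sbin/pre_freeze", "/sbin/post_thaw")


def missing_scripts(paths: Iterable[str] = DEFAULT_SCRIPTS) -> List[str]:
    """Return the scripts that are absent or not executable by the current user."""
    return [p for p in paths
            if not (os.path.isfile(p) and os.access(p, os.X_OK))]


if __name__ == "__main__":
    problems = missing_scripts()
    if problems:
        print("Missing or non-executable:", ", ".join(problems))
    else:
        print("pre_freeze and post_thaw scripts are in place")
```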
We’re in the process of testing Veeam for AHV on our NTNX cluster running AOS 5.10. So far, with Windows VMs we have had very few issues, and we worked them out quickly. However, with a Linux VM running Ubuntu, we’re getting errors on the Nutanix side where the snapshot checks for pre-freeze and post-thaw scripts. We’ve tried a variety of ways to get those scripts in place, but there doesn’t seem to be a clear path. Does anyone else have experience with Veeam for AHV and backing up Linux VMs? Does anyone have any info on setting up the pre-freeze and post-thaw scripts? While we have some experience with this on Linux, we’re certainly not experts.