Question

Horizon View Instant Clones & Nutanix firmware updates: sanity check

  • 24 May 2019
  • 4 replies
  • 391 views

We have been running our Horizon View environment on Nutanix for a little over 6 months, and I'm ready to do my first round of firmware updates (disk, BIOS, etc.). My main concern is that, since we use Horizon View instant clones, there is a powered-on "parent image" on each host. We are running View 7.5, and I know that in previous versions there was an attribute called "Instant Clone maintenance": you changed its value to 1, and it would shut down and delete that parent image and stop desktops from being provisioned on that host, allowing you to place the host in maintenance mode and do your business. I need some clarity (and a sanity check!) on how to perform these necessary firmware updates, with LCM or without it, without bringing my View environment down. This environment is used 24x7 and cannot withstand an outage. Please show me your wheel, so I have no need to re-invent it! Thank you, community!

4 replies

@VirtuallySteve,
Did you ever find an answer to this? We're standing up a new cluster using instant clones and running into the same thing. I'm wondering if I just have to do each node manually, one at a time, instead of using the rolling upgrade method that I'd far prefer. Hopefully you found a better solution - I'd love to hear it.
mjg, currently the best way to handle the nodes running your instant clones is one at a time. VMware has made it a bit easier: each host (in the vSphere Web Client) has a custom attribute, "InstantClone.Maintenance", whose value should currently be 0. My steps for upgrading firmware, or for anything else that requires putting the host into maintenance mode, are below:
  1. Edit the value of "InstantClone.Maintenance" to 1.
  2. After a few minutes, vCenter and Horizon View will see that value and shut down/delete the cp-parent VMs on that host, and the value will change to 3.
  3. Once you see the cp-parent VMs are gone, go into Prism and open LCM. Once you have performed an inventory, you should be able to put a check next to the host you want to upgrade (ONLY CHECK THE HOST YOU WISH TO UPGRADE!).
LCM will take care of the rest. It will invoke DRS (make sure HA admission control is disabled or Nutanix will fail the process), migrate all workloads off the host, shut down the CVM, place the host into maintenance mode and begin the firmware upgrades. When the host comes back online and all your health checks are good, just change the value back to 0 and move some workloads back. When new VMs are needed in your instant clone pool, the cp-parent images will be spawned again. I would generally wait until the cp-parent images are re-spawned before moving to the second host.

I only have 4 nodes in my instant clone cluster currently; depending on how many firmware updates are needed, bank on at least an hour per node. I am on AOS 5.10.5 LTS, as there were some much-needed improvements to LCM in the 5.10.x line, so be sure you are on that build or higher.
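For anyone who would rather script the attribute change than click through the Web Client, here is a minimal Python sketch of steps 1 and 2. It is not something from this thread's environment, just one way it might look. It assumes pyVmomi-style objects: `content` would come from `ServiceInstance.RetrieveContent()` and `host` would be a `vim.HostSystem`. The function names are made up for illustration, and the "finished" value is left to the caller (this thread reports 3; check VMware's documentation for your Horizon version).

```python
import time

# Attribute name as given in this thread.
FIELD_NAME = "InstantClone.Maintenance"


def field_key(content, name=FIELD_NAME):
    """Return the numeric key of the custom attribute called `name`."""
    for field_def in content.customFieldsManager.field:
        if field_def.name == name:
            return field_def.key
    raise LookupError(f"custom attribute {name!r} not found")


def read_value(content, host, name=FIELD_NAME):
    """Return the attribute's current value on `host`, or None if unset."""
    key = field_key(content, name)
    for custom_value in host.customValue:
        if custom_value.key == key:
            return custom_value.value
    return None


def set_value(content, host, value, name=FIELD_NAME):
    """Write `value` (stored as a string) to the attribute on `host`."""
    content.customFieldsManager.SetField(
        entity=host, key=field_key(content, name), value=str(value)
    )


def wait_for_value(content, host, expected, timeout_s=900, poll_s=30,
                   sleep=time.sleep):
    """Poll until the attribute equals `expected`; False on timeout.

    The timeout and poll interval are arbitrary defaults, not vendor guidance.
    """
    waited = 0
    while True:
        if read_value(content, host) == str(expected):
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

Per host, the idea would be: `set_value(content, host, 1)`, then `wait_for_value(content, host, done_value)`, confirm by eye that the cp-parent VMs are gone, run LCM against that one host, and finish with `set_value(content, host, 0)`. The LCM step itself is deliberately left manual here.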

NOTE: if you have ANY nodes that require SATADOM upgrades, you HAVE to apply them on every node before any other firmware shows as available.
Let me know, I'm happy to answer any other questions!
@VirtuallySteve
Thanks for the quick reply! I was able to get firmware on a couple of nodes using this method yesterday and it went smoothly.

Not to fork the topic too much, but do you mind if I ask how you're handling the AOS upgrades? I've only ever used the rolling upgrade method which this would seem to break. Is it possible to do AOS one node at a time/have the process wait for you to switch the next node InstantClone.Maintenance to 1?
mjg, no worries, I am really glad to help! Glad your firmware updates are going well! Definitely take it slow to ensure everything is healthy between node upgrades. AOS upgrades are completely non-disruptive to the hosts. There is no need to place the hosts in maintenance mode: the upgrade installs AOS on each CVM and reboots the CVMs in a staged/safe fashion. Let the Nutanix cluster do its thing for the CVMs. For your first couple of rounds of firmware/AOS upgrade work, I highly recommend opening a proactive ticket with Nutanix. Have them do a pre-work health check, and they'll keep the ticket open during your work (if you have an issue, you'll get next-in-line support with a ticket already open). I used to then call back in the next working day and have them do post-upgrade health checks. I'm more than happy to share the commands etc. that I now run on my own to ensure things are healthy.
