REQ: Ability to shut down AHV cluster and hypervisor hosts from Acropolis

I'm sure this has been asked before, but I couldn't find it in the search, so here it is (again?)

I'd like to see the ability to gracefully stop and power down each of the physical nodes in the cluster from within Acropolis.

Ideally, there would also be a setting in the Admin menu that controls whether this feature is available, off by default.

It should definitely have a safety mechanism so you don't click Shutdown inadvertently.

The use case would be something along these lines:
1) Click Shutdown in the Admin menu.
2) A warning is displayed, requiring confirmation.
3) You then have to type SHUTDOWN to complete the sequence.
4) Shut down the VMs on the hosts.
5) Stop the cluster services and send the shutdown command to the hypervisor hosts.

- Failure to shut down any VM would cancel the shutdown sequence.
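To make the requested behaviour concrete, here is a rough Python sketch of the proposed sequence. Nothing in this thread says Acropolis offers this; `run_shutdown_sequence` and the three callables passed into it are hypothetical placeholders for whatever VM and cluster operations a real implementation would use. It only illustrates the two safety rules asked for: the typed SHUTDOWN confirmation, and aborting if any VM fails to stop.

```python
from typing import Callable, Iterable


def run_shutdown_sequence(
    typed_confirmation: str,
    vms: Iterable[str],
    shutdown_vm: Callable[[str], bool],
    stop_cluster: Callable[[], None],
    shutdown_hosts: Callable[[], None],
) -> str:
    """Hypothetical sketch: confirm, stop VMs, then stop the cluster and hosts."""
    # Step 3: the operator must type SHUTDOWN exactly before anything happens.
    if typed_confirmation != "SHUTDOWN":
        return "aborted: confirmation text did not match"
    # Step 4: shut down each VM; a single failure cancels the rest of the sequence.
    for vm in vms:
        if not shutdown_vm(vm):
            return f"aborted: VM {vm} failed to shut down"
    # Step 5: only reached once every VM reported a clean shutdown.
    stop_cluster()
    shutdown_hosts()
    return "completed"
```

Passing the operations in as callables keeps the ordering and abort logic testable without touching a real cluster.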

What's the use case for this? Planned outages? Datacenter moves? Is this something you've had to do through another method before?

Any news on how to shut down an AHV-based cluster?

We will have a power cut and would like to shut down the whole cluster as a precaution.

Many thanks in advance.

Kind regards

I'm sure there is a script out there, but here are the manual steps:

Note down the IP addresses of the CVMs, hypervisor hosts, and IPMI interfaces.
Shut down the VMs, either from within the guest OS or via Prism.
After all VMs are down, SSH to one of the CVMs and run the "cluster stop" command.
Once the cluster is stopped, log on to each CVM and run the "cvm_shutdown -h now" command.

Log on to each hypervisor host and run "virsh list" to confirm nothing is running, then run the "shutdown -h now" command.
From there, log in to IPMI and power off the hardware.

Steps
Perform the following steps to shut down all hosts in a cluster for maintenance or other tasks such as VM relocation.
- SSH to a Controller VM and run "ncc health_checks run_all" prior to the scheduled shutdown. If there are any errors or failures, resolve them or contact Nutanix Support.
- Shut down all the VMs in the Nutanix cluster.
- Stop all AFS cluster VMs, if applicable.
- Stop the Nutanix cluster.
- Shut down each node in the cluster.
- After completing maintenance or other tasks, power on the nodes and start the cluster.
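The order of operations in this procedure can be written down as data before anyone touches a cluster. The sketch below is a hypothetical helper (not a Nutanix tool) that turns lists of CVM and host IP addresses into ordered (target, command) pairs using only the commands quoted in this post; it runs nothing. Placing the Async DR check after stopping AFS and before `cluster stop` follows the order of the sections in this post, and "operator" marks the manual VM shutdown step.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (where to run it, what to run)


def shutdown_runbook(cvm_ips: List[str], host_ips: List[str], has_afs: bool) -> List[Step]:
    """Hypothetical helper: order the shutdown steps into (target, command) pairs."""
    if not cvm_ips:
        raise ValueError("at least one CVM IP is required")
    first_cvm = cvm_ips[0]
    steps: List[Step] = [
        (first_cvm, "ncc health_checks run_all"),  # resolve any failures first
        ("operator", "shut down all guest VMs (guest OS or Prism)"),
    ]
    if has_afs:
        steps.append((first_cvm, "minerva -a stop"))  # stop AFS file server VMs
    steps.append((first_cvm, "ncli pd ls-repl-status"))  # confirm no Async DR replication
    steps.append((first_cvm, "cluster stop"))
    # Each CVM is shut down individually once the cluster is stopped.
    steps += [(ip, "cvm_shutdown -P now") for ip in cvm_ips]
    for ip in host_ips:
        steps.append((ip, "virsh list --all"))  # confirm the CVM is shut off
        steps.append((ip, "shutdown -h now"))
    return steps
```

For example, `shutdown_runbook(["192.0.2.11", "192.0.2.12"], ["192.0.2.21"], has_afs=True)` yields nine steps, starting with the NCC health check on the first CVM and ending with `shutdown -h now` on the host. The IP addresses are placeholders from the documentation range.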

Shutting down AFS (file server) VMs
Shut down all AFS cluster VMs by executing the following command from any Controller VM:
nutanix@cvm$ minerva -a stop

Verify that no Async DR replications are in progress
For Async DR, ensure no schedules or replications are running by executing the following command from any Controller VM:
nutanix@cvm$ ncli pd ls-repl-status

Stopping the Nutanix cluster
Log on to any Controller VM using SSH with the Nutanix credentials and run the following command to stop the Nutanix cluster:
nutanix@cvm$ cluster stop

Shutting down each CVM in the cluster
SSH into each CVM and run:
nutanix@cvm$ cvm_shutdown -P now
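With many CVMs, the per-CVM step can be fanned out over SSH. This is a minimal sketch assuming key-based SSH access as the `nutanix` user and that `cvm_shutdown -P now` runs without prompting; `ssh_argv` and `shutdown_cvms` are hypothetical names. It defaults to a dry run that only returns the commands, and it is meant to be used only after `cluster stop` has completed.

```python
import subprocess
from typing import List


def ssh_argv(ip: str, command: str, user: str = "nutanix") -> List[str]:
    # BatchMode=yes makes ssh fail instead of hanging on a password prompt
    # (assumes key-based authentication is already set up).
    return ["ssh", "-o", "BatchMode=yes", f"{user}@{ip}", command]


def shutdown_cvms(cvm_ips: List[str], dry_run: bool = True) -> List[List[str]]:
    """Hypothetical helper: run 'cvm_shutdown -P now' on each CVM in turn."""
    commands = [ssh_argv(ip, "cvm_shutdown -P now") for ip in cvm_ips]
    if not dry_run:
        for argv in commands:
            # check=False: the SSH session may drop while the CVM powers off,
            # so a non-zero exit status is not necessarily a failure here.
            subprocess.run(argv, check=False)
    return commands
```

Reviewing the dry-run output before calling it with `dry_run=False` is a cheap way to catch a wrong IP list.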

Shutting down each node in the cluster
Power off the node via IPMI, or log on to the node using SSH with the Nutanix credentials and run the following commands to check and stop it:
"virsh list" to see the running VMs, or "virsh list --all" to confirm the CVM is off.
"shutdown -h now" to shut down the node.

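Checking the `virsh list --all` output by eye works for one node; across many nodes a small parser helps. The sketch below assumes the usual three-column `Id  Name  State` layout that virsh prints and that the CVM's domain name ends in `-CVM`; both are assumptions about your environment, and `parse_virsh_list` and `safe_to_power_off` are hypothetical helpers. A node counts as safe to power off when nothing is `running` and every CVM domain reports `shut off`.

```python
import subprocess
from typing import Dict


def parse_virsh_list(output: str) -> Dict[str, str]:
    """Map domain name -> state from 'virsh list' or 'virsh list --all' output."""
    states: Dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the dashed separator and the "Id Name State" header.
        if len(fields) < 3 or fields[0] == "Id":
            continue
        # Columns are Id, Name, State; the state can be two words ("shut off").
        states[fields[1]] = " ".join(fields[2:])
    return states


def safe_to_power_off(states: Dict[str, str]) -> bool:
    """True when no domain is running and every '-CVM' domain is shut off.

    Note: if no domain name ends in '-CVM', only the 'nothing running' rule applies.
    """
    nothing_running = all(state != "running" for state in states.values())
    cvms_off = all(
        state == "shut off" for name, state in states.items() if name.endswith("-CVM")
    )
    return nothing_running and cvms_off


def read_local_virsh_list_all() -> str:
    # Intended to be run on the AHV host itself.
    return subprocess.run(
        ["virsh", "list", "--all"], capture_output=True, text=True, check=True
    ).stdout
```

On a host, `safe_to_power_off(parse_virsh_list(read_local_virsh_list_all()))` gives a yes/no answer before running `shutdown -h now`.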
Powering on each node in the cluster
Power on the node via IPMI.

Start the cluster
Log on to any one Controller VM in the cluster using SSH with the Nutanix credentials. Start the Nutanix cluster by issuing the following command:
nutanix@cvm$ cluster start
Confirm that all cluster services are running on the Controller VMs:
nutanix@cvm$ cluster status

All of the above. We have had to do this several times for various reasons.

We also need to shut down one node today because of a faulty memory module...

This is something we have already had to go through a couple of times in our DC environments.
When we begin a ROBO rollout, this would certainly be useful for letting local support teams quickly shut down the ROBO deployments at our remote sites.

We have 30+ clusters in ROBO environments and ended up having to write procedures around all of this, since scripting some of the logic (checking for hardware faults after power-on, etc.) is just a bit too complex.

Two notes:
  1. In our experience, even if async replications are in progress, the cluster seems to recover when it is powered off (gracefully or hard, which has unfortunately happened to us a couple of times). Kudos to the Nutanix team for writing code that recovers from that and picks up where it left off!
  2. I would also recommend putting the hosts into maintenance mode:
     Check whether an AHV host can enter maintenance mode: host.enter_maintenance_mode_check
     Enter maintenance mode: host.enter_maintenance_mode
     Afterwards, exit maintenance mode: host.exit_maintenance_mode

Perhaps I'm being extra cautious...
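The maintenance-mode commands in note 2 are, as far as I know, Acropolis CLI (`acli`) commands run from a CVM, which would mean they need the cluster to be up: enter maintenance mode before `cluster stop`, and exit only after `cluster start`. The sketch below just builds the command strings per host, assuming each command takes the host IP as its argument; check the exact argument syntax against the acli reference for your AOS version before relying on it. `maintenance_mode_plan` and `exit_maintenance_mode_plan` are hypothetical names.

```python
from typing import List


def maintenance_mode_plan(host_ips: List[str]) -> List[str]:
    """Hypothetical helper: per host, run the check first, then enter maintenance mode."""
    plan: List[str] = []
    for ip in host_ips:
        # Assumed syntax: the host IP is passed as the first argument.
        plan.append(f"acli host.enter_maintenance_mode_check {ip}")
        plan.append(f"acli host.enter_maintenance_mode {ip}")
    return plan


def exit_maintenance_mode_plan(host_ips: List[str]) -> List[str]:
    """Hypothetical helper: take each host back out of maintenance mode afterwards."""
    return [f"acli host.exit_maintenance_mode {ip}" for ip in host_ips]
```

Keeping the check immediately before each enter command means a host that cannot enter maintenance mode is noticed before moving on to the next one.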