To reboot an ESXi host (to trigger a PPR for a DIMM, for example) when DRS is on and fully automated, there are a few questions to answer before putting together a sequence of steps.
First off, should we disable DRS? When a host is taken out of maintenance mode, DRS rebalances the load, and as a result VMs will be migrated back onto the host that has just exited maintenance. What if the CVM is not ready? After all, the CVM is the entity that serves storage to the VMs. Will the VMs be affected if the CVM is not fully booted or has not exited maintenance mode by the time VMs are returned to the host? Luckily, the VMs will not be impacted in any way: when the CVM is placed into maintenance mode, storage traffic is re-routed and handled by other CVMs in the cluster. So DRS can stay enabled.
Should we evacuate powered-off VMs for a host reboot? The answer depends on your risk tolerance and your risk assessment of the issue. Is it likely that the host may experience issues after the reboot? If so, better safe than sorry: migrate the powered-off VMs away from the host. It will only take a moment, since the VMs are simply re-registered on a different host and no I/O cutover has to take place.
Now that the questions are answered, what does the sequence of steps to place an ESXi host into maintenance mode and reboot it look like?
- Verify that you have access to IPMI and the CVM.
- Place the ESXi host into maintenance mode. This will trigger VM evacuation. You can choose to migrate all VMs or only those that are powered on. The task will not complete at this point because the CVM does not meet the migration requirements.
- Log in to CVM and place it into maintenance mode.
- Shut down the CVM.
- The host's maintenance mode task will now complete.
- Reboot the host.
- Observe the host reboot from the IPMI.
- Once the host has booted and re-connected to vSphere, end maintenance mode on the host. At this point DRS will begin returning VMs to the host.
- Log in to any other CVM in the cluster and end maintenance mode for the CVM on the rebooted host.
- Verify cluster health.
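
The order of the steps above matters: the CVM has to go into maintenance mode and be shut down before the host task can complete, and the host has to leave maintenance mode before the CVM does. As a rough illustration only, here is a minimal Python sketch of a checklist that refuses out-of-order steps. It does not talk to vSphere or to the CVM; the step names and the `RebootRunbook` class are hypothetical and made up for this example.

```python
from dataclasses import dataclass, field

# Ordered steps, paraphrasing the list above.
# These names are illustrative labels, not real commands or API calls.
STEPS = [
    "verify_ipmi_and_cvm_access",
    "enter_host_maintenance",      # task stays pending until the CVM is shut down
    "enter_cvm_maintenance",
    "shut_down_cvm",
    "host_maintenance_completes",
    "reboot_host",
    "observe_reboot_via_ipmi",
    "exit_host_maintenance",       # DRS starts returning VMs here
    "exit_cvm_maintenance",        # done from another CVM in the cluster
    "verify_cluster_health",
]


@dataclass
class RebootRunbook:
    """Hypothetical checklist that tracks which steps have been completed."""
    done: list = field(default_factory=list)

    def next_step(self):
        # Return the next expected step, or None when everything is done.
        return STEPS[len(self.done)] if len(self.done) < len(STEPS) else None

    def complete(self, step):
        # Only accept the step that is next in the sequence.
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"out of order: expected {expected!r}, got {step!r}")
        self.done.append(step)

    @property
    def finished(self):
        return len(self.done) == len(STEPS)
```

An operator would call `complete(...)` after performing each step by hand; trying to record `reboot_host` before `shut_down_cvm`, for instance, raises a `ValueError`.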
Where to look for details?
KB-4639 How to place CVM and host in maintenance mode
vSphere Administration Guide for Acropolis (using vSphere HTML 5 Client) AOS 5.17: Restarting a Node
Hardware Replacement Documentation Acropolis 5.16: Node Shutdown