Solved

Installing Nvidia host drivers before cluster expansion

  • 31 March 2021
  • 1 reply

We are in the process of expanding our 4-node cluster to 6 nodes. Installing the hosts with the Foundation VM was successful. However, the servers have Nvidia cards installed, so we need to install the driver as well. My understanding is that once the cluster is expanded with the 2 extra nodes, the command “install_host_package” will install the driver on every host that has an Nvidia card, even on hosts that already have the driver. To do so, it migrates the VMs and puts each host into maintenance mode. That is not an option for us, because the whole reason we are expanding the cluster is that the current cluster is already overloaded. I am looking for a way to install the drivers without disrupting the running environment. For example, is there a way to install the drivers before expanding the cluster?


Best answer by Mark-AUSTRALIA 29 June 2021, 17:22

It’s just an RPM package; they even make us remove it manually before they permit an AHV upgrade.

You could therefore just as easily install it manually, following the same overall procedure as the removal but installing the package instead of uninstalling it. The manual uninstall is partially described at the end of KB7973.

They tell you to contact Support for the details, but the only missing piece is how to put the AHV node into maintenance mode, and there is an article for that: Putting a Node into Maintenance Mode.

Shut the CVM down as well before doing the install.
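The per-node order of operations described above can be sketched as follows. This is a hedged sketch, not an official procedure: the `acli host.enter_maintenance_mode_check`, `acli host.enter_maintenance_mode ... wait=true`, `cvm_shutdown -P now`, `virsh start` and `acli host.exit_maintenance_mode` commands are the ones I understand the Nutanix maintenance-mode article to use, while the `print_node_plan` function and the IP, CVM name and RPM path are hypothetical placeholders. Because the commands run on different machines (a CVM versus the AHV host itself), the script only prints the ordered plan rather than executing it. On a node that has not been added to the cluster yet, there are presumably no user VMs to migrate, so the `acli` steps may not apply there; that is an assumption, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the per-node order of operations for a manual
# driver install. It executes nothing, because the steps run on different machines.
set -euo pipefail

print_node_plan() {
  local host_ip="$1"   # AHV host to update (placeholder)
  local cvm_name="$2"  # CVM name on that host, as listed by `virsh list --all` (placeholder)
  local rpm_file="$3"  # Nvidia host driver package (placeholder path)

  # Check first, then enter maintenance mode; this migrates the VMs off the host.
  echo "[any CVM]  acli host.enter_maintenance_mode_check ${host_ip}"
  echo "[any CVM]  acli host.enter_maintenance_mode ${host_ip} wait=true"
  # Cleanly shut down the CVM that runs on this host.
  echo "[its CVM]  cvm_shutdown -P now"
  # Install the driver package on the AHV host itself.
  echo "[AHV host] rpm -ivh ${rpm_file}"
  # Bring the CVM back, then take the host out of maintenance mode.
  echo "[AHV host] virsh start ${cvm_name}"
  echo "[any CVM]  acli host.exit_maintenance_mode ${host_ip}"
}

# Example with made-up values.
print_node_plan 10.0.0.15 NTNX-EXAMPLE-CVM /tmp/nvidia-host-driver.rpm
```

Printing a plan keeps the sketch safe to run anywhere; each line is tagged with where it would be typed.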

They also don’t mention how to uninstall the RPM package; fortunately, the RPM command cheat sheet is easy to understand: https://www.cyberciti.biz/howto/question/linux/linux-rpm-cheat-sheet.php

Uninstall:
rpm -ev {package_name}

Install:
rpm -ivh {rpm-file}
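Since the question mentions hosts that already have the driver, the install can be guarded so it is skipped when a driver package is already present. This is a minimal sketch under stated assumptions: the `driver_missing` and `install_if_missing` functions are hypothetical, and the assumption that the driver package name starts with "NVIDIA" is mine; check the actual name with `rpm -qa` on your own hosts before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper for one AHV host: install the Nvidia host driver RPM only
# if no package whose name starts with "nvidia" (case-insensitive) is installed.
# ASSUMPTION: the driver package name starts with "NVIDIA"; verify with `rpm -qa`.
set -euo pipefail

# Reads a package list on stdin (one name per line, as `rpm -qa` prints it)
# and succeeds only if none of the names looks like an Nvidia driver package.
driver_missing() {
  ! grep -i '^nvidia' >/dev/null
}

install_if_missing() {
  local rpm_file="$1"  # path to the driver RPM (placeholder)
  if rpm -qa | driver_missing; then
    rpm -ivh "$rpm_file"
  else
    echo "Nvidia host driver already installed; skipping"
    rpm -qa | grep -i '^nvidia'
  fi
}

# Only act when executed directly with an RPM path argument.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
  install_if_missing "$1"
fi
```

Usage on a host, inside the maintenance window with its CVM shut down: `./install_if_missing.sh /tmp/nvidia-host-driver.rpm` (placeholder script name and path). Keeping the package-name check in its own function makes it easy to adjust once the real package name is known.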
