Alternatives to the USB key


Userlevel 7
Badge +25
Any chance we might be able to target the install to a physical device in a near-term release?

My logistics team is looking to relocate my lab to another region, which will make access to the USB keys a bit more of a challenge.

I also just had my 5th device fail (dd to the device fails with a write error), which I have to attribute to some wear issue, so I am sure Kingston is getting a bit tired of my RMAs. I have a couple of SAS spindles in the boxes I could target, which shouldn't have these types of issues.

I am thinking I might be able to come up with some hack to get it imaged to a spindle, but was unsure whether it might panic or lobotomize itself when it goes through the install script. Has anyone tried putting the image on an HDD on the SAS controller who can post some experience?


Userlevel 1
Badge +10
Hey Justin,

CE doesn't really care what device it boots from. SATADOMs work great, as will cheap consumer-grade SSDs. We haven't tried spinning disks, but I don't see why they shouldn't just work.

All it takes is to dd the image to the desired boot device and reconfigure the BIOS to boot from it.

Note that if the controller the boot disk is on is unknown to the drivers in the initial ramdisk, you have to rerun dracut to add the right drivers (thank you, Red Hat). The procedure is posted elsewhere in this forum.
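Roughly something like this from any Linux environment, assuming the CE image is the usual ce.img and the target device shows up as /dev/sdX (a placeholder; double-check with lsblk before writing):

    # write the CE image onto the intended boot device, then flush buffers
    dd if=ce.img of=/dev/sdX bs=1M
    sync

Then point the BIOS boot order at that device.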

Best wishes,
Jan
Userlevel 7
Badge +25
Cool. Will give it a shot with one of my nodes.

  1. Roughly it will be a new USB with a Linux live instance (wonder if UEFI would work)
  2. Boot from the live image
  3. Copy down (or stage) the CE img
  4. Determine the target device (combining the LSI data with data from lsblk; rough sketch below)
  5. dd to the correct sd device, maybe with a 1M block size
  6. Reboot to the HDD
  7. Should be able to run the install like normal
Hopefully Nutanix KVM will recognize that it is root and not present the working HDD for adding to the storage pool.
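For step 4, a rough sketch of matching things up (assuming lsscsi is available on the live image; otherwise lsblk plus the LSI tool output gets you there):

    # list block devices with size/model/transport to spot the target SAS spindle
    lsblk -o NAME,SIZE,MODEL,TRAN
    # cross-check against the SCSI view (host:channel:target:lun)
    lsscsi

Then dd the CE img to the chosen /dev/sdX with bs=1M per step 5.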

Stay tuned.

If it does, that might make for a nice RFE to allow specifying an alternative root fs in the install script.
Userlevel 7
Badge +25
So I grabbed the instructions from the 380 thread's dracut notes and gave it a go...

  • Did a CentOS 7 live boot off USB
  • Copied the IMG from the media to /tmp
  • sudo dd from /tmp to /dev/sdd (a larger bs likely would have gone quicker, but it went pretty fast by default)
  • Rebooted the server
  • Booted the Nutanix kernel and got the init error for missing modules
  • Rebooted to rescue and did the dracut fix (though in rescue uname wouldn't provide the right value; uname after the Nutanix kernel init failure likely would)
  • Rebooted to the Nutanix kernel
  • Filled out the install UI and hit proceed
  • :robotmad:
From what I see, one of the events I kind of expected occurred. The install looks to try and use all disks regardless of mount state, and if it can't wipe a disk's partition it interprets that as fatal and bails rather than just excluding it from use by the CVM.

How does Phoenix/mdadm normally exclude the SATADOM or USB from the device list? Is there some way I can piggyback off that to exclude in-use devices? Any other creative thoughts, or even just an explanation of what is going on, so I can figure out whether I can fool it?

Userlevel 7
Badge +25
So I spent some time spelunking in the code this morning, working back from the issue.

The core looks to be a difference between the CE path and the prod path in svm.py. svm.py collects all the devices inline and discards the boot device (since a SATADOM would otherwise show up), BUT in the CE path it leverages a var set in imagingUtil (CE_DEVS), which is populated from usable_disks. usable_disks is built basically by excluding USB devices and requiring a size > 99 GB across all devices, with no awareness of which one holds the boot partition.

sysUtil.collect_disk_info doesn't seem to have an attribute for boot, which would make the exclusion a bit cleaner, but that's not a big deal as find_boot_disk still exists. It would be a nice addition to the disk_info class, though, IMHO.

I tried the "easy" fix of copying the call from the prod path, but it looks like we don't pass boot_disk in on the param list. I think I can mod it to use find_boot_disk and still drop it from the list (or hardcode it for a quick check), but I'm sort of working in the dark. Does anyone with a better understanding of the Phoenix process have a better approach for excluding the boot drive in the CE path than me trial-and-erroring through installs?
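For anyone else poking at the same code, the relevant spots are easy to find on the installed filesystem (path per later in this thread; assuming the Phoenix scripts all live in that one directory):

    # locate the CE device-selection logic and the boot-disk helpers
    grep -nE "CE_DEVS|usable_disks" /home/install/phx_iso/phoenix/*.py
    grep -nE "find_boot_disk|collect_disk_info" /home/install/phx_iso/phoenix/*.py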
Userlevel 7
Badge +25
So just doing a data_disks.remove('sda') worked and the install seemed to complete, and I was able to create a one-node cluster.

Now, hardcoding = bad :p

So I will play with the result from find_boot_disk and see if I can clean that up a bit more.
Userlevel 7
Badge +25
So a couple of minor notes...

Instead of uname, which wouldn't work in rescue from what I saw, I just did this. It wouldn't work on a system with a mess of installed kernels, but it does for this case for the moment:

    KERNEL=`ls /lib/modules`
    dracut -f /boot/initramfs-${KERNEL}.img ${KERNEL}

Now, find_boot_disk I had less luck mocking up. I need to pass in the param_list, but for some reason it was coming up as null. Probably just me not reading the code well. The remove('sda') has been working thus far.

Now, one event that I am not 100% sure is correlated: in my testing I created a 1-node cluster, which worked fine with my tweaks above. As I have been rebuilding the 3 other nodes to the same HDD model, I did a cluster stop/destroy and it seemingly never completed. After cancelling out, though, it looked like the root file system on the host was toast. Why a CVM command would kill the host I have no clue, so I will do a bit more testing before confirming "success".
Badge +5
Hi,

Did you ever manage to get it to work? And if you did, can you please post the complete steps here?

Thanks,

Amit
Userlevel 7
Badge +25
Yup, as noted above it's working pretty well thus far. You will need a live ISO (I used CentOS) to image from the ce.img to a device. The first boot will fail, but you can use dracut to link in the modules. And in my case my spindle is on my LSI controller, so I needed to update one of the Phoenix scripts to exclude the disk.
Badge +4
Can you explain what you mean by:

"So just doing a data_disks.remove('sda') worked and the install seemed to complete"

And where did you set this?

Many thanks
Userlevel 7
Badge +25
So svm.py collects all the disks for adding to the available storage pool.

With CE there is an environment var that directs the code to collect up all the disks without excluding the disk holding the root partition. It excludes only by size and USB, which in an HDD install means that the install will try to use root. That kills the install, so the disks.remove call is what I insert into svm.py to manually exclude root so it is not swept up. It would be lovely if root were an attribute of the disk_info class, but it's not. There is also a function to detect root, but I didn't spend enough time to work out the params I needed to use it effectively. With a handful of like nodes root was always sda, so hardcoding the exclusion worked without an issue for me.

And since I also needed to do the manual regen of the initramfs using dracut, I was already in the file system; one more vi step wasn't a big deal.

EDIT... sorry, forgot. Like I note in the rest of the thread, the change is in /home/install/phx_iso/phoenix/svm.py and I add the line right around the second instance of CE_DEVS. You will see the "if CE else" logic where, in the else branch, self.p_list.boot_disk is removed from the data_disks list; the remove('sda') is just hardcoding the same result.
Badge +4
Many thanks!!

That worked.
Userlevel 7
Badge +24
FYI I'm adding code to avoid wiping the host boot disk in all cases for the next update. Here's a diff for anyone that wants the functionality sooner. The file is /home/install/phx_iso/phoenix/imagingUtil.py. NOTE - this is tested for syntax only so far. Try at your own risk and let me know how it goes.

@@ -480,8 +480,15 @@ def image_node(param_list, unattended=False):
     # CE: choose disks to use
     if os.environ.has_key('COMMUNITY_EDITION'):
+        # exclude the host boot drive
+        host_boot_part = sysUtil.get_partition_from_mount_path('/')
+        if host_boot_part:
+            host_boot_dev = host_boot_part.rstrip('1234567890')
+        else:
+            host_boot_dev = None
         usable_disks = [disk for disk in sysUtil.collect_disk_info().itervalues()
-                        if (not disk.isUSB and disk.size >= 99)]
+                        if (not disk.isUSB and disk.size >= 99 and
+                            disk.dev != host_boot_dev)]
         os.environ['CE_DEVS'] = ','.join([disk.dev for disk in usable_disks])
         os.environ['CE_HBA'] = "lun"
     # find boot disk(s)
Userlevel 7
Badge +25
Awesome. Now just because I am **bleep**... why are you duping the functionality in sysUtil.find_boot_disk? It seems like it was already implemented; it just wasn't used in imagingUtil or svm for the CE path?

Do we think that there will be a way to avoid the dracut step as well? I don't know the eccentricities of init, but is the reason that the ce.img contains a ramfs that only contains udev pointers to the USB devs? Any way to make that more abstract, or is it chicken/egg?
Userlevel 7
Badge +25
Hah... looks like Lithium is a bit liberal with their censor button.
Userlevel 7
Badge +24
It just seemed easier at the time - I may refactor a bit for the official checkin. =)

Why are you looking to avoid dracut? It mirrors our normal hypervisor install process for bare metal. The basic idea is that we distribute a slimmed-down initramfs with the minimum set of drivers to boot. Nothing is set up on first boot, so other drivers can be loaded after the fact. Once the system is installed, however, you need things to come up in a certain order, and some of that has dependencies on the pre-init stage. In order to bring everything up correctly, we have to run dracut to inject the necessary drivers into the initramfs.
Userlevel 7
Badge +25
Completely understood. I wasn't sure what I was missing in my p_list when I tried to use it, but honestly it's a bit of work to know how those CE flags change the environment without stepping through it all.

Sorry, I was not specific in the dracut comment. I'm not looking to eliminate dracut; it's more that when I dd to an HDD I need to call dracut manually after first boot because the initramfs doesn't have the modules the kernel needs. That's one of the two changes I needed to support local disk (the other being your boot dev exclusion above). What I was trying to figure out was whether there is some way to avoid the manual regen of the initramfs and have that work OOB. My guess is just what you note... I am missing the drivers for the HBA, which prevents the first boot from succeeding. Is that a tad bit clearer?
Userlevel 7
Badge +24
Yes, that makes sense. If you know what driver you need, there are command-line options to dracut to force their inclusion. The default behavior is just to include the drivers that are currently loaded when you run it.
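For example, something along these lines (the module name is just an illustration; check lspci/lsmod for what your HBA actually uses, and substitute the KERNEL variable trick from earlier in the thread if uname doesn't return the right version):

    # force a specific HBA driver into the regenerated initramfs
    dracut --force --add-drivers "mpt3sas" /boot/initramfs-$(uname -r).img $(uname -r)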
Userlevel 7
Badge +25
So a chicken/egg problem? The host is unknown to a degree, so you can't prestage the drivers in the init to allow the install to function OOB?

And since you guys likely prep off of a USB device, by default only USB support is included in the image that gets written.

Now there are only a couple of recommended options for HBAs for CE. Is it feasible, as part of source image creation, to include support for the majors? I don't have the background to rate the complexity of that one when it comes to the impact on initramfs size and revision control. Thinking LSI, Adaptec, Intel PCH, etc.
Userlevel 7
Badge +24
Given that USB is the default method and CE isn't intended for production or the mass-installation automation that's needed for production-sized environments, this probably ranks low on my list of improvements for us to work on. It should be possible for you to boot the image on USB on one of the machines you're using, run dracut manually, and dd the image back off the USB to use as your master for the other machines.
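Something like this should capture it (device name and filename are placeholders; verify the USB's device with lsblk first):

    # pull the fixed-up image back off the prepared USB as a reusable master
    dd if=/dev/sdX of=ce-master.img bs=1M
    # then write ce-master.img to the other nodes' boot devices instead of the stock ce.img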
Userlevel 7
Badge +25
Heh, my use of an HDD has little to do with production or mass deployment. It's all to do with avoiding more RMAs to Kingston for dead USB devices. :D

But good call on doing it once and reusing. My latest run had no USBs in the mix (all iDRAC remote media mounts), so I didn't even need physical access to the box, which is helpful to me since my primary lab is about 1500 miles away.

Rerunning dracut isn't a huge effort, though. Honestly the steps for vi'ing in the boot exclusion were more of a pain, making sure I didn't screw up the whitespace. And now that I have eliminated cluster death due to USB failure and you have upgrades rolling, I am really hoping I can avoid USB shenanigans for a while.
Userlevel 1
Badge +10
A few hours ago I replaced all three of my USB keys with cheap 32GB Chinese SSDs on my 3-node cluster. I'd say the effect was impressive, as everything starts simultaneously now. 😉
Userlevel 3
Badge +20
Nemat,
I don't have basic Linux skills. Would you please explain step by step how to make CE bootable on an SSD drive?
Did you rebuild CE with the bootable SSD, or just transfer the data from USB to SSD without rebuilding?
Thanks
mswasif
Userlevel 1
Badge +10
I did this on an already installed 3-node CE cluster without a reinstall.
- Shut down the cluster and install the SSD (mine is 32GB)
- Boot from a live CD (any Linux live CD will do)
- Find the source USB drive and destination SSD (fdisk -l can help)
- Copy the raw image from USB to SSD. Be careful with device paths; if you choose the wrong disk, you may lose data! In the command below, put the right letters in place of A and B:
dd if=/dev/sdA of=/dev/sdB bs=1M
- In the BIOS, choose the SSD you have prepared as the first bootable device

This will do it. When you get the system up and running, you may decide to correct the root partition/filesystem. I've got a 32GB partition and root filesystem afterwards.
Userlevel 3
Badge +20
Hi Nemat,

Thanks for the reply.
I bought a Kingston 120 GB SSD for $44, which is cheaper and faster than a SATADOM and better than a cheap non-branded SSD; I have been using this SSD for about 3 years with not a single problem. I prepared an 8 GB USB drive with Nutanix CE to differentiate it from the 32 GB USB I was using to boot CE. I booted the 8 GB drive into CE rescue mode (the second option), logged on as root, and ran "fdisk -l" to find the drive letters. Then "dd if=/dev/sdd of=/dev/sdc bs=1M", where "sdd" was the original 32 GB boot CE USB and "sdc" the new 120 GB SSD. I removed both USBs, made the 120 GB SSD the first boot drive in the BIOS, and it is working without any problem. I checked both the CE and CE rescue boot options and everything is working fine. Now I don't have to worry about a dead USB and its fallout.

I am only concerned about 2 things:
Would a CE software upgrade cause any problems trying to overwrite the boot SSD?
Only 7 GB of the 120 GB SSD is being used, which was the size of the CE partition on the 32 GB USB. Is there a need to reclaim the unused space, and what is the command?

Thanks

mswasif
Userlevel 1
Badge +10
mswasif wrote:

I am only concerned about 2 things:
Would a CE software upgrade cause any problems trying to overwrite the boot SSD?
Only 7 GB of the 120 GB SSD is being used, which was the size of the CE partition on the 32 GB USB. Is there a need to reclaim the unused space, and what is the command?


I don't think a CE upgrade will cause any problems, but I'm not sure. Hopefully we will find out soon.

I don't know if you need the extra unused space, but I did reclaim it. I will use it to store logs from a serial device in my app.

In order to reclaim the space, do the following on the Nutanix host, logged in as root:
1. Find out your SSD device: fdisk -l
mine is at /dev/sdb
2. Using fdisk, delete the old partition and recreate it with a new size (be careful to use your device from step 1):
fdisk /dev/sdb
p - print the existing partition info
Note the beginning sector number

d - delete the partition

n - create the new one
Create a primary partition and use the beginning sector number from the previous step for the new partition
For the end sector, use the biggest sector number available on your SSD

w - write the partition table permanently

3. Use partprobe to inform the OS about the partition change
4. reboot
5. resize2fs /dev/sdb1
Resize your file system to the entire partition (my partition is /dev/sdb1)
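Pulled together, the same sequence looks roughly like this, assuming the SSD is /dev/sdb with the CE root partition at /dev/sdb1 as above (adjust to your devices; resize2fs applies to ext filesystems):

    fdisk -l              # identify the SSD (here: /dev/sdb)
    fdisk /dev/sdb        # p (note start sector), d, n (same start, max end sector), w
    partprobe /dev/sdb    # tell the kernel about the new partition table
    reboot
    resize2fs /dev/sdb1   # grow the filesystem to fill the enlarged partition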