Adding a new disk to a CE node


Userlevel 3
Badge +19
Hi All,

I thought adding a new disk to a CE cluster might be a useful thing to cover, as things change over time. I'm running a nested CE cluster and a couple of physical CE hosts. For my nested CE cluster (my son's home lab - http://longwhiteclouds.com/2015/07/11/nutanix-community-edition-4-node-cluster-nested-on-esxi-6-0/) I just got around to putting some SSDs in there (FusionIO cards), and I wanted to expand the storage on each of the 4 nodes. This isn't quite as easy as just adding the disks to the hosts, or in my case to the VMs.

Here is the process you need to go through:

1. Add the disks to the host - virtual or physical disks, depending on what you're running.
2. Log into the CE host as root and run virsh list to get the CVM name.
3. Use virsh edit to edit the CVM configuration and add additional disk entries (example host-side commands follow the note below). The entries will look something like this - copy an existing <disk> entry from your CVM's XML and adjust the source device, target and address unit; in my nested setup the vendor and product show up as ATA / Virtual disk:

    <disk type='block' device='lun'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/sdc'/>
      <target dev='sdc' bus='scsi'/>
      <vendor>ATA</vendor>
      <product>Virtual disk</product>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <disk type='block' device='lun'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/sdd'/>
      <target dev='sdd' bus='scsi'/>
      <vendor>ATA</vendor>
      <product>Virtual disk</product>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>

In my case the disks popped up as /dev/sdc and /dev/sdd; it could well be different on your setup. Be careful here, as a mistake in the XML can stop your CVM from running.
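
For reference, the host-side commands for steps 2 and 3 look something like this (a sketch; NTNX-CVM-NAME is a placeholder - use whatever name virsh list shows on your host):

    # On the CE host, as root
    virsh list --all                                   # note the CVM's domain name
    virsh dumpxml NTNX-CVM-NAME | grep -A10 '<disk'    # review the existing disk entries before copying one
    virsh edit NTNX-CVM-NAME                           # add the new disk entries, then save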

4. Log into your CVM as nutanix and run sudo vi /etc/nutanix/disk_location.json

You will need to add new entries for the new disks based on the SCSI address you gave them in the virsh configuration. As an example:

{"drive-scsi0-0-0-3": 4, "drive-scsi0-0-0-2": 3, "drive-scsi0-0-0-1": 1, "drive-scsi0-0-0-0": 2}

The first two entries above are the new ones I added (although I added them at the end of the line and the CVM rearranged things).
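
A quick way to sanity-check the file after editing it (assuming Python is available on the CVM, which it normally is) is to run it through a JSON parser, which will complain about any stray comma or quote:

    sudo python -m json.tool /etc/nutanix/disk_location.json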

5. Shut down or power off your CVM and restart it. You can do this either by logging into the CVM as nutanix and running sudo shutdown -h now, or by running virsh shutdown or virsh destroy (virsh's name for power off) against the CVM. Use virsh start to start it again; if you've forgotten the name of the CVM, virsh list --all will show it.

6. Do list_disks and make sure the new disks show up there.
7. Do parted -l and see which disks show up without any partitions; these are the new disks you added.
8. Do a mount and ensure that the disks you think you added are not already mounted. You don't want to blow away an existing disk that might have changed device names in the process of adding the new disks.
9. Take care: the next steps are destructive if there is something on the disk or you chose the wrong disk.
10. Do sudo parted -s /dev/sd? mklabel gpt - where ? is the device for the new disk.
11. Do sudo parted -s /dev/sd? mkpart primary "1 -1" - where ? is the device for the new disk; this creates the partition.
12. Do sudo mkfs.ext4 /dev/sd?1 - where ? is the device for the new disk; this creates the new filesystem on the disk.
13. Do sudo bash.
14. Do mount_disks - this recognises all the partitions and mounts the disks in the correct locations.
15. Do restart hades - this restarts the disk manager so it recognises the new disks.
16. Reboot the CVM - this forces Hades to add the new disks to the Zeus configuration, and they will show up in Prism. (A consolidated sketch of steps 10-15 follows this list.)
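
For convenience, here is what steps 10 to 15 look like strung together on the CVM. This is a sketch assuming the new disk turned up as /dev/sdd - substitute your own device, and double-check it first with parted -l and mount as per steps 7 and 8:

    sudo parted -s /dev/sdd mklabel gpt            # new GPT label (destructive!)
    sudo parted -s /dev/sdd mkpart primary "1 -1"  # one partition spanning the disk
    sudo mkfs.ext4 /dev/sdd1                       # filesystem on the new partition
    sudo bash                                      # the next two commands run as root
    mount_disks                                    # mounts the new partition in the right place
    restart hades                                  # restarts the disk manager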

I haven't tried this with disks marked as SSD, so I'm not sure if it'll work. This is a bit of a hack, as adding disks after CE is deployed and a cluster is already built isn't meant to be supported. But it works, and that is what CE is for: figuring stuff out.

This process isn't required in a production Nutanix environment, as disks are automatically detected and added there. But because CE system configurations are so variable, it isn't yet possible to do it all automatically in CE. Maybe someone could script it up.

Also, if you want to change the size of an existing disk: first remove the disk from the cluster using the remove disk button on the hardware table, then change the size of the disk, re-partition it, make a new filesystem on it, mount it, restart Hades and restart the CVM. This will probably be a lot rarer than adding a disk. I would not recommend doing this with the SSDs.

Anyway, hope you enjoy playing with the above. It certainly helped me expand my CE cluster.


14 replies

Userlevel 2
Badge +13
I added a blank drive and it didn't show up with ncli disk list. I used the lsscsi command to find out which /dev/sdX it appears as.
Userlevel 1
Badge +10
Today I tried to add a new disk to one of the nodes of my CE cluster. I got stuck on the disk_location.json step, because there is no disk_location.json in the /etc/nutanix folder.

Combining this guide and the "failed disk" thread I was able to add a new disk.
And the new disk was automagically added to the default container / default storage pool.
Userlevel 3
Badge +17
Well done.
Yes, if there is only one Container ("default"), the system will add the new disk "automagically" to the SP.
Userlevel 1
Badge +10
PaulR wrote: Well done.
Yes, if there is only one Container ("default"), the system will add the new disk "automagically" to the SP.
I wonder if the topic starter has a different version of CE (internal to Nutanix). Why am I missing that file? Is it because the TS runs nested?

If someone is interested in writing a script, where should they look for the details?
Badge +4
Great article...

I have a 4-node Supermicro server; all nodes are identical.

1 x SATA DOM 64GB
1 x Intel 240GB SSD
4 x Seagate 1TB SATA
Booting from an 8GB USB Cruzer

When I first tried to install it I only had the SATA DOM, and I have since added a proper SSD.

I can see three are laid out the same and one is not, but that is not the issue I am having.

The physical hosts are mapping the drives differently from what the CVM is seeing.

From the CVM I see the following:

nutanix@NTNX-0ada8cab-A-CVM:10.37.2.68:~$ list_disks
Slot  Disk      Model             Serial            Size
0     /dev/sde  ST1000NX0303      S470172T          1.0 TB
1     /dev/sdd  ST1000NX0303      S470176Z          1.0 TB
2     /dev/sdc  ST1000NX0303      S47016TN          1.0 TB
3     /dev/sda  ST1000NX0303      S47016QZ          1.0 TB
4     /dev/sdb  INTEL SSDSC2BB24  BTWA51830051240A  240 GB

virsh edit (emulator is /usr/libexec/qemu-kvm; the disk entries show serial / wwn / vendor / product):
S47016QZ          5000c50080b9fa49  ATA  ST1000NX0303
S470172T          5000c50080b9b51e  ATA  ST1000NX0303
S470176Z          5000c50080b9b1b6  ATA  ST1000NX0303
S47016TN          5000c50080b9da44  ATA  ST1000NX0303
BTWA51830051240A  55cd2e404c00d3c6  ATA  INTEL SSDSC2BB24

Physical host:
lsscsi
[0:0:0:0]  disk  SanDisk  Cruzer Switch     1.27  /dev/sdb
[1:0:0:0]  disk  ATA      INTEL SSDSC2BB24  0130  /dev/sdc
[1:0:1:0]  disk  ATA      ST1000NX0303      NN02  /dev/sdd
[1:0:2:0]  disk  ATA      ST1000NX0303      NN02  /dev/sde
[1:0:3:0]  disk  ATA      ST1000NX0303      NN02  /dev/sdf
[1:0:4:0]  disk  ATA      ST1000NX0303      NN02  /dev/sdg
[5:0:0:0]  disk  ATA      SATA SSD          S9FM  /dev/sda


Via Prism I see that each node is using the Intel SSD, but not the full 240GB:
BTWA518209WS240A 0GiB of 42.49GiB (is that expected?)

First issue:
1. As you can also see, the virsh config does not list two of the four 1TB SATA drives.
=> I could add them via virsh, but that leads me to the problem that list_disks does not match the virsh config.
2. Reconfigure virsh to have the correct config matching the physical hardware. If I do this, do I have to do it whilst the CVM is down?
3. I am also worried that the CVM is trying to use the SATA DOM.

Any assistance would be welcome.
Userlevel 7
Badge +24
OK, there's a handful of things to address in this thread:

  • Editing disk_location.json should not be necessary. As a few of you have noted, it should not even exist on a CE cluster. I believe Mr. Webster has been the victim of very old NX-2000 hardware that does make use of that file.
  • Regarding lad-sysops' post:
  • (a) The reduced capacity in Prism of the 240GB SSD is expected. We reserve much of that space for internal metadata usage.
  • (b) From the virsh config shown, all 4 of the 1TB HDDs are listed, along with the 240GB SSD. I don't see any mention of the SATA DOM. In general, CE will not make use of any drive below 100GB, which SATA DOMs typically are.
  • (c) If you do reconfigure anything in virsh edit, you should definitely do it while the CVM is powered off.
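
To make (c) concrete, the safe order of operations looks something like this (a sketch; replace NTNX-CVM-NAME with whatever virsh list --all shows):

    virsh shutdown NTNX-CVM-NAME   # or virsh destroy NTNX-CVM-NAME for a hard power-off
    virsh list --all               # wait until the CVM shows as shut off
    virsh edit NTNX-CVM-NAME       # make the disk changes
    virsh start NTNX-CVM-NAME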
Let me know if I missed any concerns.
Badge +4
Hi Adam,
I did figure out that Nutanix had installed itself on the SATA DOM; I found this out after removing it.
I have done a cleanup.sh and reconfigured the nodes with a new install.

I am still seeing discrepancies, though, with regard to what the physical host sees its disks as and what the CVM recognises them as.

Physical host
[root@NTNX-877a8451-A ~]# lsscsi
[0:0:0:0]  disk  SanDisk  Cruzer Switch     1.27  /dev/sda
[1:0:0:0]  disk  ATA      INTEL SSDSC2BB24  0130  /dev/sdb
[1:0:1:0]  disk  ATA      ST1000NX0303      NN02  /dev/sdc
[1:0:2:0]  disk  ATA      ST1000NX0303      NN02  /dev/sdd
[1:0:3:0]  disk  ATA      ST1000NX0303      NN02  /dev/sde
[1:0:4:0]  disk  ATA      ST1000NX0303      NN02  /dev/sdf


CVM
nutanix@NTNX-877a8451-A-CVM:10.37.2.68:~$ lsscsi
[2:0:0:0]  disk  ATA  ST1000NX0303      2.1.  /dev/sda
[2:0:0:1]  disk  ATA  ST1000NX0303      2.1.  /dev/sde
[2:0:0:2]  disk  ATA  ST1000NX0303      2.1.  /dev/sdd
[2:0:0:3]  disk  ATA  INTEL SSDSC2BB24  2.1.  /dev/sdc
[2:0:0:4]  disk  ATA  ST1000NX0303      2.1.  /dev/sdb

So I suppose at this point, if I start making changes to the disks at the CVM level, it may clobber the wrong physical disk.

Is my next best option to edit the configuration via virsh and make it see the disks properly before initialising them to be used in the cluster?

Cheers
Userlevel 7
Badge +24
Hmm, are you just referring to the difference in order? If so, that's harmless. We use KVM disk passthrough, so while drives should be 1 for 1, there's not going to be a direct correspondence between them down to the device ordering. From the virsh output you posted earlier, you do have serial #s for each disk (which will be displayed in Prism) if you need to remove a specific disk at some point.
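
If you want to match drives between the host and the CVM yourself, comparing serial numbers rather than device letters is the reliable way. Something like this works on either side (a sketch; exact lsblk columns depend on the util-linux version installed):

    lsblk -d -o NAME,MODEL,SERIAL,SIZE   # one line per whole disk, with its serial
    ls -l /dev/disk/by-id/               # device symlinks named by model and serial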
Badge +4
I solved my disk issues by reading this post. -> I updated it with my solution.

http://next.nutanix.com/t5/Discussion-Forum/failed-disk/m-p/4397/highlight/false#M801

Thanks for your help, Adam.
Userlevel 7
Badge +24
Glad to hear you got it working!
Badge +3
I thought I would share this as it still hasn't cropped up in the forum and other ncli hidden commands have been exposed already.

Once you have added a new SSD to your Nutanix node, you can then assign it to an SSD storage tier. The ncli syntax is:

ncli> disk update id=123456 tier-name=SSD-SATA

No need to restart any services; in Prism you will then see your disk, which was previously on the HDD storage tier, updated to an SSD disk.
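
To find the id to pass in, list the disks first. A quick sketch (the id 123456 above is just an example - use whatever id ncli reports for your SSD):

    ncli> disk ls
    ncli> disk update id=123456 tier-name=SSD-SATA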

Secondly, with regard to re-numbering your newly added HDD, I have had some success with a combination of editing the files disk_config.json and disk_wal_locations.json at /home/nutanix/data/stargate-storage/disks/SERIALNR and restarting the CVM and/or node. Restarting other cluster services and/or rebooting the other nodes may also be necessary (though I don't quite remember the exact steps I took, so I would need to re-perform them to verify).

Once your new disk ID appears in Prism, you can remove the old high-numbered disk ID, wait for the operation to clear, then place your new lower-numbered disk ID online.

I should warn that whilst it seems possible to change the disk ID, I have no idea whether this affects any data stored on said disk or your NDFS Volumes.

- V.
Badge +6
Hi

When I try to add a second HDD (9XF3NEAH) on ce-2017.05.11-stable.img as below, I get the following error when I try to save the edit:

error: XML document failed to validate against schema: Unable to validate doc against /usr/share/libvirt/schemas/domain.rng
Extra element devices in interleave
Element domain failed to validate content

Has something changed or have I missed something?

The two disk entries (serial / wwn / vendor / product) are:
9XF3NEH1  5000c5007b46bf29  ATA  ST9500620NS
9XF3NEAH  5000c5007b46ecee  ATA  ST9500620NS

Badge +6
I tried running virt-xml-validate on the unchanged file created by virsh edit, and it has the same error:

virt-xml-validate /tmp/virshn3ZtLl.xml domain
Relax-NG validity error : Extra element devices in interleave
/tmp/virshn3ZtLl.xml:22: element devices: Relax-NG validity error : Element domain failed to validate content

Line 22 is "<devices>"
Badge +6
Hi
I used virsh edit to add the new disk, and when I saved the file I told it to ignore the validation error.
I then followed the procedure outlined in the original post and now I can see the new disk 😃