CVM fails to boot due to disk errors | Nutanix Community

I have a problem with a CVM that won’t boot. This is on a semi-retired production cluster (not CE) that has no workloads running on it.

I found the console output in /tmp/NTNX.serial.out.0, where I can see it trying to enable the RAID devices, scanning for a UUID marker (and finding two of them), then aborting and unloading the mpt3sas kernel module before trying again in 5 seconds. This repeats a few times before the hypervisor resets the VM and it starts booting again.
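As a side note, the retry loop can be followed live from the hypervisor with a plain tail of that file (the hostname in the prompt below is just a placeholder):

[root@hypervisor ~]# tail -f /tmp/NTNX.serial.out.0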

The most relevant sections of the log (copious kernel taint messages removed) are:

[    9.543553] sd 2:0:3:0: [sdd] Attached SCSI disk
svmboot: === SVMBOOT
mdadm main: failed to get exclusive lock on mapfile
[    9.790075] md: md127 stopped.
mdadm: ignoring /dev/sdb3 as it reports /dev/sda3 as failed
[    9.794087] md/raid1:md127: active with 1 out of 2 mirrors
[    9.796034] md127: detected capacity change from 0 to 42915069952
mdadm: /dev/md/phoenix:2 has been started with 1 drive (out of 2).
[    9.808602] md: md126 stopped.
[    9.813330] md/raid1:md126: active with 2 out of 2 mirrors
[    9.815279] md126: detected capacity change from 0 to 10727981056
mdadm: /dev/md/phoenix:1 has been started with 2 drives.
[    9.832111] md: md125 stopped.
mdadm: ignoring /dev/sdb1 as it reports /dev/sda1 as failed
[    9.840436] md/raid1:md125: active with 1 out of 2 mirrors
[    9.842341] md125: detected capacity change from 0 to 10727981056
mdadm: /dev/md/phoenix:0 has been started with 1 drive (out of 2).
mdadm: /dev/md/phoenix:2 exists - ignoring
[    9.887613] md: md124 stopped.
[    9.896418] md/raid1:md124: active with 1 out of 2 mirrors
[    9.898373] md124: detected capacity change from 0 to 42915069952
mdadm: /dev/md124 has been started with 1 drive (out of 2).
mdadm: /dev/md/phoenix:0 exists - ignoring
[    9.926863] md: md123 stopped.
[    9.937962] md/raid1:md123: active with 1 out of 2 mirrors
[    9.939950] md123: detected capacity change from 0 to 10727981056
mdadm: /dev/md123 has been started with 1 drive (out of 2).
svmboot: Checking /dev/md for /.nutanix_active_svm_partition
svmboot: Checking /dev/md123 for /.nutanix_active_svm_partition

[    9.994541] EXT4-fs (md123): mounted filesystem with ordered data mode. Opts: (null)
svmboot: Appropriate boot partition with /.cvm_uuid at /dev/md123

[   10.009251] EXT4-fs (md125): mounted filesystem with ordered data mode. Opts: (null)
svmboot: Appropriate boot partition with /.cvm_uuid at /dev/md125

svmboot: Checking /dev/nvme?*p?* for /.nutanix_active_svm_partition
svmboot: error: too many partitions with valid cvm_uuid: /dev/md123 /dev/md125
sh: missing ]
svmboot: Trying again in 5 seconds.

[   10.430316] md123: detected capacity change from 10727981056 to 0
[   10.432058] md: md123 stopped.
mdadm: stopped /dev/md123
[   10.467498] md124: detected capacity change from 42915069952 to 0
[   10.469245] md: md124 stopped.
mdadm: stopped /dev/md124
[   10.507492] md125: detected capacity change from 10727981056 to 0
[   10.509276] md: md125 stopped.
mdadm: stopped /dev/md125
[   10.547497] md126: detected capacity change from 10727981056 to 0
[   10.549243] md: md126 stopped.
mdadm: stopped /dev/md126
[   10.577498] md127: detected capacity change from 42915069952 to 0
[   10.579245] md: md127 stopped.
mdadm: stopped /dev/md127
[   10.586750] ata2.00: disabled
modprobe: remove 'virtio_pci': No such file or directory
[   10.673882] mpt3sas version 14.101.00.00 unloading

As this occurs before networking has started, and the hypervisor then resets the VM, I do not have any way of interacting with it.

How can this be resolved?

After much mucking around, I was finally able to boot a System Rescue CD which had access to the RAID disks so I could fix it.

FYI - the hypervisor boots from the SATADOM, but it does not have a device driver for the SAS HBA, so it cannot normally see the storage disks. The hypervisor boots the CVM, which does have a SAS driver (mpt3sas), so all disk access is done through the CVM. The CVM itself boots off software RAID1 devices built on the first 3 partitions of the SSDs.

In my case, 2 of the software RAID devices had lost sync.

[root@sysresccd ~]# lsscsi
[0:0:0:0] disk ATA INTEL SSDSC2BX80 0140 /dev/sdb
[0:0:1:0] disk ATA ST2000NX0253 SN05 /dev/sda
[0:0:2:0] disk ATA ST2000NX0253 SN05 /dev/sdc
[0:0:3:0] disk ATA ST2000NX0253 SN05 /dev/sde
[0:0:4:0] disk ATA ST2000NX0253 SN05 /dev/sdd
[0:0:5:0] disk ATA INTEL SSDSC2BX80 0140 /dev/sdg
[4:0:0:0] disk ATA SATADOM-SL 3ME 119 /dev/sdf
[11:0:0:0] cd/dvd ATEN Virtual CDROM YS0J /dev/sr0
[root@sysresccd ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 632.2M 1 loop /run/archiso/sfs/airootfs
sda 8:0 0 1.8T 0 disk
└─sda1 8:1 0 1.8T 0 part
sdb 8:16 0 745.2G 0 disk
├─sdb1 8:17 0 10G 0 part
│ └─md127 9:127 0 10G 0 raid1
├─sdb2 8:18 0 10G 0 part
│ └─md125 9:125 0 10G 0 raid1
├─sdb3 8:19 0 40G 0 part
│ └─md126 9:126 0 40G 0 raid1
└─sdb4 8:20 0 610.6G 0 part
sdc 8:32 0 1.8T 0 disk
└─sdc1 8:33 0 1.8T 0 part
sdd 8:48 0 1.8T 0 disk
└─sdd1 8:49 0 1.8T 0 part
sde 8:64 0 1.8T 0 disk
└─sde1 8:65 0 1.8T 0 part
sdf 8:80 0 59.6G 0 disk
└─sdf1 8:81 0 59.6G 0 part
sdg 8:96 0 745.2G 0 disk
├─sdg1 8:97 0 10G 0 part
├─sdg2 8:98 0 10G 0 part
│ └─md125 9:125 0 10G 0 raid1
├─sdg3 8:99 0 40G 0 part
└─sdg4 8:100 0 610.6G 0 part
sr0 11:0 1 693M 0 rom /run/archiso/bootmnt
[root@sysresccd ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
10476544 blocks super 1.1 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active (auto-read-only) raid1 sdb3[2]
41909248 blocks super 1.1 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active (auto-read-only) raid1 sdb1[2]
10476544 blocks super 1.1 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

I could see the two SSDs probed as sdb and sdg, with partitions 1, 2 and 3 configured as RAID members but only partition 2 correctly in sync. The 4th partition is used for NFS in the CVM (i.e. fast storage for the cluster).
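Before re-adding anything, it is worth confirming which member of each mirror is actually stale. A minimal sketch (assuming the same device names as above): compare the event counters and states that mdadm records in each member's superblock; the member with the lower Events count is the stale one.

[root@sysresccd ~]# mdadm --examine /dev/sdb3 /dev/sdg3 | grep -E '/dev|Events|State'
[root@sysresccd ~]# mdadm --examine /dev/sdb1 /dev/sdg1 | grep -E '/dev|Events|State'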

So my solution was:

  1. Set the devices I needed to modify back to writable mode:
    [root@sysresccd ~]# mdadm --readwrite md126
    [root@sysresccd ~]# mdadm --readwrite md127
    [root@sysresccd ~]# cat /proc/mdstat
    Personalities : [raid1]
    md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
    10476544 blocks super 1.1 [2/2] [UU]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md126 : active raid1 sdb3[2]
    41909248 blocks super 1.1 [2/1] [U_]
    bitmap: 1/1 pages [4KB], 65536KB chunk

    md127 : active raid1 sdb1[2]
    10476544 blocks super 1.1 [2/1] [U_]
    bitmap: 1/1 pages [4KB], 65536KB chunk

    unused devices: <none>

  2. Rejoin the devices back into the RAID1 mirrors and let them resync (a progress-monitoring sketch follows after these steps):
    [root@sysresccd ~]# mdadm /dev/md126 -a /dev/sdg3
    mdadm: re-added /dev/sdg3
    [root@sysresccd ~]# mdadm /dev/md127 -a /dev/sdg1
    mdadm: re-added /dev/sdg1
    [root@sysresccd ~]# cat /proc/mdstat
    Personalities : [raid1]
    md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
    10476544 blocks super 1.1 [2/2] [UU]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md126 : active raid1 sdg3[1] sdb3[2]
    41909248 blocks super 1.1 [2/1] [U_]
    [=========>...........] recovery = 48.5% (20361856/41909248) finish=1.7min speed=200123K/sec
    bitmap: 1/1 pages [4KB], 65536KB chunk

    md127 : active raid1 sdg1[1] sdb1[2]
    10476544 blocks super 1.1 [2/1] [U_]
    resync=DELAYED
    bitmap: 1/1 pages [4KB], 65536KB chunk

    unused devices: <none>
    [root@sysresccd ~]# cat /proc/mdstat
    Personalities : [raid1]
    md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
    10476544 blocks super 1.1 [2/2] [UU]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md126 : active raid1 sdg3[1] sdb3[2]
    41909248 blocks super 1.1 [2/2] [UU]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md127 : active raid1 sdg1[1] sdb1[2]
    10476544 blocks super 1.1 [2/2] [UU]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    unused devices: <none>
  3. As an added check, run fsck on the volumes:
    [root@sysresccd ~]# fsck /dev/md125
    fsck from util-linux 2.36
    e2fsck 1.45.6 (20-Mar-2020)
    /dev/md125 has gone 230 days without being checked, check forced.
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    /dev/md125: 62842/655360 files (0.2% non-contiguous), 1912185/2619136 blocks

    [root@sysresccd ~]# fsck /dev/md126
    fsck from util-linux 2.36
    e2fsck 1.45.6 (20-Mar-2020)
    /dev/md126: clean, 20006/2621440 files, 5177194/10477312 blocks

    [root@sysresccd ~]# fsck /dev/md127
    fsck from util-linux 2.36
    e2fsck 1.45.6 (20-Mar-2020)
    /dev/md127: clean, 66951/655360 files, 1866042/2619136 blocks
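
To keep an eye on the rebuild without re-running cat /proc/mdstat by hand, here is a minimal sketch using standard mdadm and watch commands (device names as above; adjust to your system):

[root@sysresccd ~]# watch -n 5 cat /proc/mdstat        # refresh recovery progress every 5 seconds
[root@sysresccd ~]# mdadm --wait /dev/md126 /dev/md127 # block until the resyncs complete
[root@sysresccd ~]# mdadm --detail /dev/md126 | grep -E 'State|Rebuild'   # expect "State : clean"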

After rebooting back into the hypervisor, the CVM came up normally.


Hello,

Thank you for this share, it was very helpful for me. Just for information: I used a Phoenix image downloaded from Prism (the boot repair section) and the commands below didn't work.

[root@sysresccd ~]# mdadm --readwrite md126
[root@sysresccd ~]# mdadm --readwrite md127

I went ahead and applied the rest of the changes and that worked (the arrays were already in read-write status).
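For reference, whether an array is still read-only or auto-read-only can be checked via sysfs before deciding if --readwrite is needed. A minimal sketch, using md126 as the example device:

[root@sysresccd ~]# cat /sys/block/md126/md/array_state   # e.g. read-auto vs clean/active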


Best regards & thank you.



If there is access to the CVM console:

  1. Reboot the CVM;
  2. At the GRUB menu select “Debug Shell”;
  3. At the shell prompt run “. modules.sh” (that loads the required drivers);
  4. Assemble the RAID arrays: “mdadm --assemble --scan --run”;
  5. Check that the RAID arrays have two drives each (“cat /proc/mdstat”);

If the RAID arrays are assembled incorrectly, reassemble them as described in the previous comments (e.g. mdadm /dev/md127 -a /dev/sdg1).
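Putting those steps together, the debug-shell session looks roughly like this (a sketch only; the md device and partition names come from the comments above and will differ per system):

. modules.sh                   # load the required drivers (including mpt3sas)
mdadm --assemble --scan --run  # assemble and start all arrays, even if degraded
cat /proc/mdstat               # each healthy mirror shows [UU]; [U_] means a member is missing
mdadm /dev/md127 -a /dev/sdg1  # re-add a missing member so it resyncs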

Then reboot the CVM (from the hypervisor).


There are some external links on rebuilding mdadm arrays, like https://www.thomas-krenn.com/en/wiki/Mdadm_recovery_and_resync