Solved

CVM fails to boot due to disk errors

  • 15 October 2020
  • 1 reply
  • 194 views

Badge +4

I have a problem with a CVM that won’t boot. This is on a semi-retired production cluster (not CE) that has no workloads running on it.

I found the console output in /tmp/NTNX.serial.out.0 and I can see it trying to enable RAID devices, scan for a uuid marker and find 2 of them, then abort and unload the mpt3sas kernel module before trying again in 5 seconds. This repeats a few times before the hypervisor resets it and it starts booting again.

The most relevant sections of the log (copious kernel taint messages removed) are 

[    9.543553] sd 2:0:3:0: [sdd] Attached SCSI disk
svmboot: === SVMBOOT
mdadm main: failed to get exclusive lock on mapfile
[ 9.790075] md: md127 stopped.
mdadm: ignoring /dev/sdb3 as it reports /dev/sda3 as failed
[ 9.794087] md/raid1:md127: active with 1 out of 2 mirrors
[ 9.796034] md127: detected capacity change from 0 to 42915069952
mdadm: /dev/md/phoenix:2 has been started with 1 drive (out of 2).
[ 9.808602] md: md126 stopped.
[ 9.813330] md/raid1:md126: active with 2 out of 2 mirrors
[ 9.815279] md126: detected capacity change from 0 to 10727981056
mdadm: /dev/md/phoenix:1 has been started with 2 drives.
[ 9.832111] md: md125 stopped.
mdadm: ignoring /dev/sdb1 as it reports /dev/sda1 as failed
[ 9.840436] md/raid1:md125: active with 1 out of 2 mirrors
[ 9.842341] md125: detected capacity change from 0 to 10727981056
mdadm: /dev/md/phoenix:0 has been started with 1 drive (out of 2).
mdadm: /dev/md/phoenix:2 exists - ignoring
[ 9.887613] md: md124 stopped.
[ 9.896418] md/raid1:md124: active with 1 out of 2 mirrors
[ 9.898373] md124: detected capacity change from 0 to 42915069952
mdadm: /dev/md124 has been started with 1 drive (out of 2).
mdadm: /dev/md/phoenix:0 exists - ignoring
[ 9.926863] md: md123 stopped.
[ 9.937962] md/raid1:md123: active with 1 out of 2 mirrors
[ 9.939950] md123: detected capacity change from 0 to 10727981056
mdadm: /dev/md123 has been started with 1 drive (out of 2).
svmboot: Checking /dev/md for /.nutanix_active_svm_partition
svmboot: Checking /dev/md123 for /.nutanix_active_svm_partition

[ 9.994541] EXT4-fs (md123): mounted filesystem with ordered data mode. Opts: (null)
svmboot: Appropriate boot partition with /.cvm_uuid at /dev/md123

[ 10.009251] EXT4-fs (md125): mounted filesystem with ordered data mode. Opts: (null)
svmboot: Appropriate boot partition with /.cvm_uuid at /dev/md125

svmboot: Checking /dev/nvme?*p?* for /.nutanix_active_svm_partition
svmboot: error: too many partitions with valid cvm_uuid: /dev/md123 /dev/md125
sh: missing ]
svmboot: Trying again in 5 seconds.

[ 10.430316] md123: detected capacity change from 10727981056 to 0
[ 10.432058] md: md123 stopped.
mdadm: stopped /dev/md123
[ 10.467498] md124: detected capacity change from 42915069952 to 0
[ 10.469245] md: md124 stopped.
mdadm: stopped /dev/md124
[ 10.507492] md125: detected capacity change from 10727981056 to 0
[ 10.509276] md: md125 stopped.
mdadm: stopped /dev/md125
[ 10.547497] md126: detected capacity change from 10727981056 to 0
[ 10.549243] md: md126 stopped.
mdadm: stopped /dev/md126
[ 10.577498] md127: detected capacity change from 42915069952 to 0
[ 10.579245] md: md127 stopped.
mdadm: stopped /dev/md127
[ 10.586750] ata2.00: disabled
modprobe: remove 'virtio_pci': No such file or directory
[ 10.673882] mpt3sas version 14.101.00.00 unloading

 

As it occurs before the networking has started and gets reset by the hypervisor, I do not have any way of interacting with the VM.

How can this be resolved?

icon

Best answer by waddles 16 October 2020, 05:03

After much mucking around, I was finally able to boot a System Rescue CD which had access to the RAID disks so I could fix it.

FYI - the hypervisor boots from the SATADOM but it does not have a device driver for the SAS HBA device so it cannot normally see the storage disks. The hypervisor boots the CVM which has a SAS device driver (mpt3sas), therefore all disk access is done through the CVM. The CVM boots off software RAID devices using the first 3 partitions of the SSDs.

In my case, 2 of the software RAID devices had lost sync.

[root@sysresccd ~]# lsscsi
[0:0:0:0] disk ATA INTEL SSDSC2BX80 0140 /dev/sdb
[0:0:1:0] disk ATA ST2000NX0253 SN05 /dev/sda
[0:0:2:0] disk ATA ST2000NX0253 SN05 /dev/sdc
[0:0:3:0] disk ATA ST2000NX0253 SN05 /dev/sde
[0:0:4:0] disk ATA ST2000NX0253 SN05 /dev/sdd
[0:0:5:0] disk ATA INTEL SSDSC2BX80 0140 /dev/sdg
[4:0:0:0] disk ATA SATADOM-SL 3ME 119 /dev/sdf
[11:0:0:0] cd/dvd ATEN Virtual CDROM YS0J /dev/sr0
[root@sysresccd ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 632.2M 1 loop /run/archiso/sfs/airootfs
sda 8:0 0 1.8T 0 disk
└─sda1 8:1 0 1.8T 0 part
sdb 8:16 0 745.2G 0 disk
├─sdb1 8:17 0 10G 0 part
│ └─md127 9:127 0 10G 0 raid1
├─sdb2 8:18 0 10G 0 part
│ └─md125 9:125 0 10G 0 raid1
├─sdb3 8:19 0 40G 0 part
│ └─md126 9:126 0 40G 0 raid1
└─sdb4 8:20 0 610.6G 0 part
sdc 8:32 0 1.8T 0 disk
└─sdc1 8:33 0 1.8T 0 part
sdd 8:48 0 1.8T 0 disk
└─sdd1 8:49 0 1.8T 0 part
sde 8:64 0 1.8T 0 disk
└─sde1 8:65 0 1.8T 0 part
sdf 8:80 0 59.6G 0 disk
└─sdf1 8:81 0 59.6G 0 part
sdg 8:96 0 745.2G 0 disk
├─sdg1 8:97 0 10G 0 part
├─sdg2 8:98 0 10G 0 part
│ └─md125 9:125 0 10G 0 raid1
├─sdg3 8:99 0 40G 0 part
└─sdg4 8:100 0 610.6G 0 part
sr0 11:0 1 693M 0 rom /run/archiso/bootmnt
[root@sysresccd ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
10476544 blocks super 1.1 [2/2] [uu]
bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active (auto-read-only) raid1 sdb3[2]
41909248 blocks super 1.1 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active (auto-read-only) raid1 sdb1[2]
10476544 blocks super 1.1 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

I could see the RAID devices probed as sdb and sdg, with partitions 1, 2, 3 configured but only partition 2 correctly in sync. The 4th partition is used for NFS in the CVM (ie. fast storage for the cluster).

So my solution was 

  1. Set the devices I needed to modify back to writable mode
    [root@sysresccd ~]# mdadm --readwrite md126
    [root@sysresccd ~]# mdadm --readwrite md127
    [root@sysresccd ~]# cat /proc/mdstat
    Personalities : [raid1]
    md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
    10476544 blocks super 1.1 [2/2] [uu]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md126 : active raid1 sdb3[2]
    41909248 blocks super 1.1 [2/1] [U_]
    bitmap: 1/1 pages [4KB], 65536KB chunk

    md127 : active raid1 sdb1[2]
    10476544 blocks super 1.1 [2/1] [U_]
    bitmap: 1/1 pages [4KB], 65536KB chunk

    unused devices: <none>

     

  1. Rejoin the devices back into the RAID1 mirror and let them resync 
    [root@sysresccd ~]# mdadm /dev/md126 -a /dev/sdg3
    mdadm: re-added /dev/sdg3
    [root@sysresccd ~]# mdadm /dev/md127 -a /dev/sdg1
    mdadm: re-added /dev/sdg1
    [root@sysresccd ~]# cat /proc/mdstat
    Personalities : [raid1]
    md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
    10476544 blocks super 1.1 [2/2] [uu]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md126 : active raid1 sdg3[1] sdb3[2]
    41909248 blocks super 1.1 [2/1] [U_]
    [=========>...........] recovery = 48.5% (20361856/41909248) finish=1.7min speed=200123K/sec
    bitmap: 1/1 pages [4KB], 65536KB chunk

    md127 : active raid1 sdg1[1] sdb1[2]
    10476544 blocks super 1.1 [2/1] [U_]
    resync=DELAYED
    bitmap: 1/1 pages [4KB], 65536KB chunk

    unused devices: <none>
    [root@sysresccd ~]# cat /proc/mdstat
    Personalities : [raid1]
    md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
    10476544 blocks super 1.1 [2/2] [uu]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md126 : active raid1 sdg3[1] sdb3[2]
    41909248 blocks super 1.1 [2/2] [uu]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md127 : active raid1 sdg1[1] sdb1[2]
    10476544 blocks super 1.1 [2/2] [uu]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    unused devices: <none>
  2. As an added check, run fsck on the volumes 
    [root@sysresccd ~]# fsck /dev/md125
    fsck from util-linux 2.36
    e2fsck 1.45.6 (20-Mar-2020)
    /dev/md125 has gone 230 days without being checked, check forced.
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    /dev/md125: 62842/655360 files (0.2% non-contiguous), 1912185/2619136 blocks

    [root@sysresccd ~]# fsck /dev/md126
    fsck from util-linux 2.36
    e2fsck 1.45.6 (20-Mar-2020)
    /dev/md126: clean, 20006/2621440 files, 5177194/10477312 blocks

    [root@sysresccd ~]# fsck /dev/md127
    fsck from util-linux 2.36
    e2fsck 1.45.6 (20-Mar-2020)
    /dev/md127: clean, 66951/655360 files, 1866042/2619136 blocks

After rebooting back into the hypervisor, the CVM came up normally.

View original

1 reply

Badge +4

After much mucking around, I was finally able to boot a System Rescue CD which had access to the RAID disks so I could fix it.

FYI - the hypervisor boots from the SATADOM but it does not have a device driver for the SAS HBA device so it cannot normally see the storage disks. The hypervisor boots the CVM which has a SAS device driver (mpt3sas), therefore all disk access is done through the CVM. The CVM boots off software RAID devices using the first 3 partitions of the SSDs.

In my case, 2 of the software RAID devices had lost sync.

[root@sysresccd ~]# lsscsi
[0:0:0:0] disk ATA INTEL SSDSC2BX80 0140 /dev/sdb
[0:0:1:0] disk ATA ST2000NX0253 SN05 /dev/sda
[0:0:2:0] disk ATA ST2000NX0253 SN05 /dev/sdc
[0:0:3:0] disk ATA ST2000NX0253 SN05 /dev/sde
[0:0:4:0] disk ATA ST2000NX0253 SN05 /dev/sdd
[0:0:5:0] disk ATA INTEL SSDSC2BX80 0140 /dev/sdg
[4:0:0:0] disk ATA SATADOM-SL 3ME 119 /dev/sdf
[11:0:0:0] cd/dvd ATEN Virtual CDROM YS0J /dev/sr0
[root@sysresccd ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 632.2M 1 loop /run/archiso/sfs/airootfs
sda 8:0 0 1.8T 0 disk
└─sda1 8:1 0 1.8T 0 part
sdb 8:16 0 745.2G 0 disk
├─sdb1 8:17 0 10G 0 part
│ └─md127 9:127 0 10G 0 raid1
├─sdb2 8:18 0 10G 0 part
│ └─md125 9:125 0 10G 0 raid1
├─sdb3 8:19 0 40G 0 part
│ └─md126 9:126 0 40G 0 raid1
└─sdb4 8:20 0 610.6G 0 part
sdc 8:32 0 1.8T 0 disk
└─sdc1 8:33 0 1.8T 0 part
sdd 8:48 0 1.8T 0 disk
└─sdd1 8:49 0 1.8T 0 part
sde 8:64 0 1.8T 0 disk
└─sde1 8:65 0 1.8T 0 part
sdf 8:80 0 59.6G 0 disk
└─sdf1 8:81 0 59.6G 0 part
sdg 8:96 0 745.2G 0 disk
├─sdg1 8:97 0 10G 0 part
├─sdg2 8:98 0 10G 0 part
│ └─md125 9:125 0 10G 0 raid1
├─sdg3 8:99 0 40G 0 part
└─sdg4 8:100 0 610.6G 0 part
sr0 11:0 1 693M 0 rom /run/archiso/bootmnt
[root@sysresccd ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
10476544 blocks super 1.1 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active (auto-read-only) raid1 sdb3[2]
41909248 blocks super 1.1 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active (auto-read-only) raid1 sdb1[2]
10476544 blocks super 1.1 [2/1] [U_]
bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

I could see the RAID devices probed as sdb and sdg, with partitions 1, 2, 3 configured but only partition 2 correctly in sync. The 4th partition is used for NFS in the CVM (ie. fast storage for the cluster).

So my solution was 

  1. Set the devices I needed to modify back to writable mode
    [root@sysresccd ~]# mdadm --readwrite md126
    [root@sysresccd ~]# mdadm --readwrite md127
    [root@sysresccd ~]# cat /proc/mdstat
    Personalities : [raid1]
    md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
    10476544 blocks super 1.1 [2/2] [UU]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md126 : active raid1 sdb3[2]
    41909248 blocks super 1.1 [2/1] [U_]
    bitmap: 1/1 pages [4KB], 65536KB chunk

    md127 : active raid1 sdb1[2]
    10476544 blocks super 1.1 [2/1] [U_]
    bitmap: 1/1 pages [4KB], 65536KB chunk

    unused devices: <none>

     

  1. Rejoin the devices back into the RAID1 mirror and let them resync 
    [root@sysresccd ~]# mdadm /dev/md126 -a /dev/sdg3
    mdadm: re-added /dev/sdg3
    [root@sysresccd ~]# mdadm /dev/md127 -a /dev/sdg1
    mdadm: re-added /dev/sdg1
    [root@sysresccd ~]# cat /proc/mdstat
    Personalities : [raid1]
    md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
    10476544 blocks super 1.1 [2/2] [UU]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md126 : active raid1 sdg3[1] sdb3[2]
    41909248 blocks super 1.1 [2/1] [U_]
    [=========>...........] recovery = 48.5% (20361856/41909248) finish=1.7min speed=200123K/sec
    bitmap: 1/1 pages [4KB], 65536KB chunk

    md127 : active raid1 sdg1[1] sdb1[2]
    10476544 blocks super 1.1 [2/1] [U_]
    resync=DELAYED
    bitmap: 1/1 pages [4KB], 65536KB chunk

    unused devices: <none>
    [root@sysresccd ~]# cat /proc/mdstat
    Personalities : [raid1]
    md125 : active (auto-read-only) raid1 sdg2[1] sdb2[2]
    10476544 blocks super 1.1 [2/2] [UU]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md126 : active raid1 sdg3[1] sdb3[2]
    41909248 blocks super 1.1 [2/2] [UU]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    md127 : active raid1 sdg1[1] sdb1[2]
    10476544 blocks super 1.1 [2/2] [UU]
    bitmap: 0/1 pages [0KB], 65536KB chunk

    unused devices: <none>
  2. As an added check, run fsck on the volumes 
    [root@sysresccd ~]# fsck /dev/md125
    fsck from util-linux 2.36
    e2fsck 1.45.6 (20-Mar-2020)
    /dev/md125 has gone 230 days without being checked, check forced.
    Pass 1: Checking inodes, blocks, and sizes
    Pass 2: Checking directory structure
    Pass 3: Checking directory connectivity
    Pass 4: Checking reference counts
    Pass 5: Checking group summary information
    /dev/md125: 62842/655360 files (0.2% non-contiguous), 1912185/2619136 blocks

    [root@sysresccd ~]# fsck /dev/md126
    fsck from util-linux 2.36
    e2fsck 1.45.6 (20-Mar-2020)
    /dev/md126: clean, 20006/2621440 files, 5177194/10477312 blocks

    [root@sysresccd ~]# fsck /dev/md127
    fsck from util-linux 2.36
    e2fsck 1.45.6 (20-Mar-2020)
    /dev/md127: clean, 66951/655360 files, 1866042/2619136 blocks

After rebooting back into the hypervisor, the CVM came up normally.

Reply