Solved

Multiple hard drives stuck

  • 24 June 2021
  • 3 replies
  • 52 views

I have a 20+ node cluster, and two of the nodes had multiple hard drives go out between them (1 SSD and 6 SATA). They are all showing as stuck. I do not see any replication happening when using this command:

curator_cli get_under_replication_info summary=true
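If the under-replication summary comes back empty, it may also be worth confirming that Curator scans are actually completing. A hedged sketch (the `get_last_successful_scans` subcommand appears in Nutanix KB articles; verify it exists on your AOS version):

```shell
# Check replication status, then when Curator last finished its scans;
# stalled or very old scans can explain why no rebuild traffic is visible.
curator_cli get_under_replication_info summary=true
curator_cli get_last_successful_scans
```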

All nodes show up good when using this command:

nodetool -h 0 ring | grep Normal 
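Note that `Normal` here only reflects Cassandra metadata ring membership, not physical disk health, so a node can report `Normal` while several of its data drives are offline. A quick sketch to count ring members (same placeholder host `0` as in the command above):

```shell
# Count nodes reporting Normal in the metadata ring; on a healthy
# 20+ node cluster this should match the node count. It says nothing
# about the state of individual disks.
nodetool -h 0 ring | grep -c Normal
```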

Would it be a good idea to kill the tasks with this command:

cvm$ progress_monitor_cli --entity_id="<Entity_ID>" --entity_type=<Package_Name> --operation=<Operation> --delete
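Deleting progress monitor entries only clears the task from the UI; it does not fix the underlying disk removal, so it is usually safer to list what is actually stuck first. A hedged sketch (the `--fetchall` flag is taken from Nutanix KB articles; verify it on your AOS version before relying on it):

```shell
# List all progress monitor entries so you can see which entity IDs and
# operations are stuck before deciding whether anything should be deleted.
progress_monitor_cli --fetchall
```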

 


Best answer by raaji 7 July 2021, 05:25


This topic has been closed for comments

3 replies


Hi @Alexandria_512 

Could you please clarify what you mean by 2 nodes having multiple hard drives go out? Do you see them marked red in Prism? Can you share a screenshot? Do you see any alerts in Prism regarding the failure of these drives?

Also, you can SSH into any of the CVMs in the cluster as the nutanix user and execute the following commands to see whether all the disks on the nodes are visible:

  1. allssh list_disks
  2. allssh df -h
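The two commands above can be combined into a rough per-node comparison, for example by counting how many Stargate data partitions are actually mounted on each CVM (this is a sketch only; the path is the standard Stargate mount point shown in the `df -h` output below):

```shell
# Count mounted Stargate data disks per node; a node whose count is lower
# than its physical disk count likely has drives that were unmounted.
allssh 'df -h | grep -c stargate-storage'
```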

Hope this helps,

Thanks

From what I see from the CLI, everything “looks” OK, but in Prism the hard drives are not replicating.

 

 allssh list_disks


================== XXX.XXX.XXX.XX1 =================
Slot  Disk      Model             Serial                Size
0     --------  ----------------  --------------------  ------
1     /dev/sda  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    480 GB
2     /dev/sdl  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    480 GB
3     /dev/sdb  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
4     /dev/sdc  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
5     /dev/sdd  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
6     /dev/sde  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
7     /dev/sdf  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
8     /dev/sdg  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
9     /dev/sdh  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
10    /dev/sdi  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
11    /dev/sdj  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
12    /dev/sdk  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
================== XXX.XXX.XXX.XX2 =================
Slot  Disk      Model             Serial                Size
0     --------  ----------------  --------------------  ------
1     /dev/sda  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    480 GB
2     /dev/sdb  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    480 GB
3     /dev/sdc  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
4     /dev/sdd  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
5     /dev/sde  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
6     /dev/sdf  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
7     /dev/sdg  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
8     /dev/sdh  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
9     /dev/sdi  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
10    /dev/sdj  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
11    /dev/sdk  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
12    /dev/sdl  XXXXXXXXXXXXXXXX  XXXXXXXXXXXXXXXXXX    4.0 TB
================== XXX.XXX.XXX.XXX =================

allssh df -h


================== XXX.XXX.XXX.XX1 =================
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         16G     0   16G   0% /dev
tmpfs           512M     0  512M   0% /dev/shm
tmpfs            16G  1.1M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/md0        9.8G  6.2G  3.5G  64% /
/dev/loop0      240M  2.4M  221M   2% /tmp
/dev/md2         40G   21G   19G  54% /home
tmpfs           3.2G     0  3.2G   0% /run/user/1000
/dev/sdf1       3.6T  1.7T  1.9T  48% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdh1       3.6T  1.7T  1.9T  48% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdl1       3.6T  1.7T  1.9T  48% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdi1       3.6T  1.7T  1.9T  48% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdk1       3.6T  1.7T  1.9T  48% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdc1       3.6T  1.7T  1.9T  48% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdd1       3.6T  1.7T  1.9T  48% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdb4       307G  247G   57G  82% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sda4       307G  248G   57G  82% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sde1       3.6T  1.7T  1.9T  48% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdg1       3.6T  1.7T  1.9T  48% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdj1       3.6T  1.7T  1.9T  48% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
================== XXX.XXX.XXX.XX2 =================
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         16G     0   16G   0% /dev
tmpfs           512M     0  512M   0% /dev/shm
tmpfs            16G  1.1M   16G   1% /run
tmpfs            16G     0   16G   0% /sys/fs/cgroup
/dev/md0        9.8G  6.2G  3.5G  64% /
/dev/loop0      240M  2.4M  221M   2% /tmp
/dev/md2         40G   21G   19G  52% /home
tmpfs           3.2G     0  3.2G   0% /run/user/1000
/dev/sdg1       3.6T  1.5T  2.1T  42% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdc1       3.6T  1.5T  2.1T  42% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdk1       3.6T  1.5T  2.1T  42% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdj1       3.6T  1.5T  2.1T  42% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdd1       3.6T  1.5T  2.1T  42% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdf1       3.6T  1.5T  2.1T  42% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdl1       3.6T  1.5T  2.1T  42% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdi1       3.6T  1.5T  2.1T  42% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdh1       3.6T  1.5T  2.1T  42% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sde1       3.6T  1.5T  2.1T  42% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sdb4       307G  247G   58G  82% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
/dev/sda4       307G  252G   53G  83% /home/nutanix/data/stargate-storage/disks/XXXXXXXXXXXXXXXXXX
 

 


Hi @Alexandria_512 

Thank you for the responses. It looks like one disk from each node is being removed.

We will need to check the Hades logs on both nodes to see whether the disks were marked bad and were being unmounted, or if it's a different issue altogether. Please open a case with Nutanix support for further analysis.