Question

Disks offline after replacing a failed disk


Badge +1
Hi! I have many disks marked as offline after replacing a failed disk. How can I bring these disks back online?

Thanks in advance.

This topic has been closed for comments

22 replies

Userlevel 7
Badge +25
Did you replace the failed disk to get the node functional again? Was this just a capacity disk or was it the SSD for metadata?
Badge +1
When I replaced the disk, Nutanix had already marked the other disks as offline. I replaced the failed disk to get the node functional again. The failed disk was a capacity disk.

Thanks.
Userlevel 7
Badge +25
So you ran the add_disk script to add the new one to the pool? Something seems odd, though: losing one capacity disk shouldn't impact all the disks.

What does ncli disk ls show?
Badge +1
Disk list shows only 3 disks (the online ones):
2 SSD
1 HDD

The other 6 HDDs are not shown.

ncli> disk list

Id : 00056c27-d5cc-ac17-15e1-d06726d06*****
Uuid : e4c58872-ba3b-4a8a-81f2-8c358817****
Storage Tier : SSD
Max Capacity : 161.46 GiB (173,365,601,977 bytes)
Used Capacity : 131.45 GiB (141,139,034,112 bytes)
Free Capacity : 30.01 GiB (32,226,567,865 bytes)
Host Name : 192.168.1.212
Controller VM Address : 192.168.1.211
Mount Path : /home/nutanix/data/stargate-storage/disks/S36KNX0K215101
Storage Pool Id : 00056c27-d5cc-ac17-15e1-d06726d0635e::3
Storage Pool Uuid : d5f916af-05b2-4cd4-939d-deb3c5a66c1f
Online : true
Status : Normal
Location : 5
Self Encrypting Drive : false

Id : 00056c27-d5cc-ac17-15e1-d06726d0*****
Uuid : 3198fb65-ba45-4533-975c-127dbe9*****
Storage Tier : SSD
Max Capacity : 82.92 GiB (89,038,499,308 bytes)
Used Capacity : 73.46 GiB (78,880,542,720 bytes)
Free Capacity : 9.46 GiB (10,157,956,588 bytes)
Host Name : 192.168.1.212
Controller VM Address : 192.168.1.211
Mount Path : /home/nutanix/data/stargate-storage/disks/S36KNX0K215308
Storage Pool Id : 00056c27-d5cc-ac17-15e1-d06726d0635e::3
Storage Pool Uuid : d5f916af-05b2-4cd4-939d-deb3c5a66c1f
Online : true
Status : Normal
Location : 6
Self Encrypting Drive : false

Id : 00056c27-d5cc-ac17-15e1-d06726d0****
Uuid : 17d3b06e-6750-46a4-8114-45f5ca****
Storage Tier : HDD
Max Capacity : 3.3 TiB (3,624,123,930,747 bytes)
Used Capacity : 1.66 TiB (1,824,317,001,728 bytes)
Free Capacity : 1.64 TiB (1,799,806,929,019 bytes)
Host Name : 192.168.1.212
Controller VM Address : 192.168.1.211
Mount Path : /home/nutanix/data/stargate-storage/disks/ZC163GA7
Storage Pool Id : 00056c27-d5cc-ac17-15e1-d06726d0635e::3
Storage Pool Uuid : d5f916af-05b2-4cd4-939d-deb3c5a66c1f
Online : true
Status : Normal
Location : 2
Self Encrypting Drive : false

In Prism (screenshots attached), all the disk IDs are listed, including the old failed disk's ID: ZC16047Y

The new disk's ID is not shown: ZC16XFSE

df -h on the CVM shows:

Filesystem Size Used Avail Use% Mounted on
/dev/sdh1 9.8G 4.9G 4.9G 51% /
devtmpfs 12G 0 12G 0% /dev
tmpfs 12G 4.0K 12G 1% /dev/shm
tmpfs 12G 1.2G 11G 10% /run
tmpfs 12G 0 12G 0% /sys/fs/cgroup
/dev/sdh3 40G 16G 23G 42% /home
/dev/sdf1 3.6T 1.3T 2.4T 35% /home/nutanix/data/stargate-storage/disks/ZC16047Y
/dev/sde1 3.6T 1.7T 2.0T 46% /home/nutanix/data/stargate-storage/disks/ZC163GA7
/dev/sdb1 3.6T 1.3T 2.4T 35% /home/nutanix/data/stargate-storage/disks/ZC163GA1
/dev/sda1 3.6T 1.3T 2.4T 35% /home/nutanix/data/stargate-storage/disks/ZC1615JX
/dev/sdi1 220G 134G 84G 62% /home/nutanix/data/stargate-storage/disks/S36KNX0K215101
/dev/sdh4 161G 101G 58G 64% /home/nutanix/data/stargate-storage/disks/S36KNX0K215308
/dev/sdg1 3.6T 1.3T 2.4T 35% /home/nutanix/data/stargate-storage/disks/ZC1633RY
/dev/sdc1 3.6T 1.3T 2.4T 36% /home/nutanix/data/stargate-storage/disks/ZC163GZT
/dev/sdd1 3.6T 1.3T 2.4T 35% /home/nutanix/data/stargate-storage/disks/ZC1634AV
tmpfs 2.4G 0 2.4G 0% /run/user/1000
tmpfs 2.4G 0 2.4G 0% /run/user/2000


How can I remove the old disk and add the new one?

Thanks in advance.
Userlevel 7
Badge +25
Those SSDs are a bit small, so I assume you updated minreq?
Is this a single-node deployment?

You ran ce_add_disk already to hook the devices into the pool?

You should get a "Remove" prompt in Prism that can eject any of the devices you think are suspect. The snaps you sent look more like warnings, though, and not complete failures.

What does ncli disk ls-tombstone-entries show?
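If that shorthand isn't accepted on your ncli version, the long-form sub-command is roughly the following (going from memory, so treat it as a sketch and check ncli's built-in help on your build):

ncli> disk list-tombstone-entries
(this should list the serials of disks the cluster considers removed and won't re-add automatically)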
Badge +1
Yes, it's a single-node deployment. It's used only for replication, as a DR target.

I have not run ce_add_disk; I am a newbie with Nutanix CE. I will send you the ls-tombstone-entries output tomorrow, as I don't have access to the cluster right now.

So, would these be the steps?

  • Remove the old disk from Prism
  • Run ce_add_disk to add the new disk to the pool.
  • Anything else?
Thanks.
Userlevel 7
Badge +25
So if it is a one-node cluster, then you have no replicated blocks (beyond your source), so you would have some data loss if the disk is already dead.

I would add the new disk before pulling the "bad" one, so that if the old disk is only marginal, Nutanix can move blocks to the new capacity as part of the removal process.

And I guess I am skeptical that three of your capacity disks are dead. It seems surprising they would all fail at the same time.
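Roughly the order I would try from the CVM, treating this as a sketch (the ce_add_disk script path and the exact Prism labels can vary between CE builds):

# 1. add the freshly inserted drive so CE partitions it and joins it to the pool
./ce_add_disk
# 2. confirm the new serial (ZC16XFSE) shows up and is online
ncli disk list
# 3. only then use the Remove action on the old serial (ZC16047Y) in Prism's
#    Hardware > Diagram/Table view, so data can be migrated off it first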
Badge +1
Yes, it's very strange. Only one disk failed, and Nutanix automatically put the other 5 in offline mode, leaving only one capacity disk online 😓
Userlevel 7
Badge +25
You may want to check your hades.out log. This feels suspect and doesn't pass the smell test. It could be the HBA itself or something more fundamental in the fabric.
Badge +1
I've attached the hades.out log:

hades.out
Userlevel 7
Badge +25
Sorry, I can't access that from my current location. I would poke around yourself using the serial numbers of the offline disks and see if anything stands out.
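If it helps, something like this from the CVM would map serials to devices and show their SMART health (a sketch; adjust the device names to whatever your node has):

# print the serial number reported by each device
for d in /dev/sd?; do
  echo -n "$d: "
  sudo smartctl -i "$d" | grep -i "serial number"
done
# then check the overall health of anything that looks suspect,
# e.g. the old ZC16047Y, which your df output shows on /dev/sdf
sudo smartctl -H /dev/sdf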
Badge +1
This is the hades.out log from the date when the failure occurred:


2018-08-02 11:38:46 rolled over log file
INFO 1558 ../../../../../infrastructure/cluster/service_monitor/service_monitor.c:175 StartServiceMonitor: Launched child with pid: 1560
INFO 1560 ../../../../../infrastructure/cluster/service_monitor/service_monitor.c:200 StartService: Starting service with cmd: /usr/local/nutanix/bootstrap/bin/hades
INFO 1560 ../../../../../infrastructure/cluster/service_monitor/service_monitor.c:129 RefreshZkHostPortList: Setting ZOOKEEPER_HOST_PORT_LIST=zk1:9876;
2018-08-02 11:38:49 INFO server.py:78 Starting Hades
2018-08-02 11:38:49 INFO disk_manager.py:3637 Setting /sys/block/sda/queue/scheduler to deadline
2018-08-02 11:38:49 INFO disk_manager.py:3637 Setting /sys/block/sdb/queue/scheduler to deadline
2018-08-02 11:38:49 INFO disk_manager.py:3637 Setting /sys/block/sdc/queue/scheduler to deadline
2018-08-02 11:38:49 INFO disk_manager.py:3637 Setting /sys/block/sdd/queue/scheduler to deadline
2018-08-02 11:38:49 INFO disk_manager.py:3637 Setting /sys/block/sde/queue/scheduler to deadline
2018-08-02 11:38:49 INFO disk_manager.py:3637 Setting /sys/block/sdf/queue/scheduler to deadline
2018-08-02 11:38:49 INFO disk_manager.py:3637 Setting /sys/block/sdg/queue/scheduler to deadline
2018-08-02 11:38:49 INFO disk_manager.py:3637 Setting /sys/block/sdh/queue/scheduler to deadline
2018-08-02 11:38:49 INFO disk_manager.py:3637 Setting /sys/block/sdi/queue/scheduler to deadline
2018-08-02 11:38:49 INFO disk_manager.py:3637 Setting /sys/block/sr0/queue/scheduler to deadline
2018-08-02 11:38:49 INFO disk_manager.py:3919 No disks to stripe, continuing...
2018-08-02 11:38:49 INFO fio_utils.py:72 Executing command lspci
2018-08-02 11:38:49 INFO fio_utils.py:113 No Fusion IO cards present on the local node
2018-08-02 11:38:49 INFO disk_manager.py:321 Disabling write cache on broken HBAs
2018-08-02 11:38:49 INFO disk_manager.py:3622 Executing cmd: /home/nutanix/cluster/lib/lsi-sas/lsiutil -i
2018-08-02 11:38:49 INFO disk_manager.py:325 Preparing block devices
2018-08-02 11:38:49 INFO disk_manager.py:3724 Disabling the write cache on disk /dev/sdf
2018-08-02 11:38:50 INFO disk_manager.py:3750 Checking for corrupt filesystem on partition /dev/sdf1
2018-08-02 11:38:50 INFO disk_manager.py:3754 Enabling journaling on ext4 partition /dev/sdf1
2018-08-02 11:38:50 INFO disk_manager.py:3724 Disabling the write cache on disk /dev/sde
2018-08-02 11:38:50 INFO disk_manager.py:3750 Checking for corrupt filesystem on partition /dev/sde1
2018-08-02 11:38:50 INFO disk_manager.py:3754 Enabling journaling on ext4 partition /dev/sde1
2018-08-02 11:38:50 INFO disk_manager.py:3724 Disabling the write cache on disk /dev/sdb
2018-08-02 11:38:50 INFO disk_manager.py:3750 Checking for corrupt filesystem on partition /dev/sdb1
2018-08-02 11:38:50 INFO disk_manager.py:3754 Enabling journaling on ext4 partition /dev/sdb1
2018-08-02 11:38:50 INFO disk_manager.py:3724 Disabling the write cache on disk /dev/sda
2018-08-02 11:38:50 INFO disk_manager.py:3750 Checking for corrupt filesystem on partition /dev/sda1
2018-08-02 11:38:50 INFO disk_manager.py:3754 Enabling journaling on ext4 partition /dev/sda1
2018-08-02 11:38:50 INFO disk_manager.py:3724 Disabling the write cache on disk /dev/sdi
2018-08-02 11:38:51 INFO disk_manager.py:3750 Checking for corrupt filesystem on partition /dev/sdi1
2018-08-02 11:38:51 INFO disk_manager.py:3754 Enabling journaling on ext4 partition /dev/sdi1
2018-08-02 11:38:51 INFO disk_manager.py:3765 Setting /sys/block/sdi/queue/rotational to 1 for disk /dev/sdi
2018-08-02 11:38:51 INFO disk_manager.py:3750 Checking for corrupt filesystem on partition /dev/sdh4
2018-08-02 11:38:51 INFO disk_manager.py:3754 Enabling journaling on ext4 partition /dev/sdh4
2018-08-02 11:38:51 INFO disk_manager.py:3765 Setting /sys/block/sdh/queue/rotational to 1 for disk /dev/sdh
2018-08-02 11:38:51 INFO disk_manager.py:3724 Disabling the write cache on disk /dev/sdg
2018-08-02 11:38:51 INFO disk_manager.py:3750 Checking for corrupt filesystem on partition /dev/sdg1
2018-08-02 11:38:51 INFO disk_manager.py:3754 Enabling journaling on ext4 partition /dev/sdg1
2018-08-02 11:38:51 INFO disk_manager.py:3724 Disabling the write cache on disk /dev/sdc
2018-08-02 11:38:51 INFO disk_manager.py:3750 Checking for corrupt filesystem on partition /dev/sdc1
2018-08-02 11:38:51 INFO disk_manager.py:3754 Enabling journaling on ext4 partition /dev/sdc1
2018-08-02 11:38:51 INFO disk_manager.py:3724 Disabling the write cache on disk /dev/sdd
2018-08-02 11:38:51 INFO disk_manager.py:3750 Checking for corrupt filesystem on partition /dev/sdd1
2018-08-02 11:38:51 INFO disk_manager.py:3754 Enabling journaling on ext4 partition /dev/sdd1
2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout /dev/sdf1: clean, 365999/244195328 files, 347464683/976754176 blocks
, stderr e2fsck 1.42.9 (28-Dec-2013)

2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout /dev/sde1: clean, 444492/244195328 files, 419678739/976754176 blocks
, stderr e2fsck 1.42.9 (28-Dec-2013)

2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout /dev/sdb1: clean, 365340/244195328 files, 347081555/976754176 blocks
, stderr e2fsck 1.42.9 (28-Dec-2013)

2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout /dev/sda1: clean, 366378/244195328 files, 348012167/976754176 blocks
, stderr e2fsck 1.42.9 (28-Dec-2013)

2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout /dev/sdi1: recovering journal
Clearing orphaned inode 131393 (uid=1000, gid=1000, mode=0100400, size=168)
Clearing orphaned inode 131403 (uid=1000, gid=1000, mode=0100400, size=572)
Clearing orphaned inode 131517 (uid=1000, gid=1000, mode=0100400, size=254)
Clearing orphaned inode 131518 (uid=1000, gid=1000, mode=0100400, size=858)
Clearing orphaned inode 131569 (uid=1000, gid=1000, mode=0100400, size=43)
Clearing orphaned inode 131570 (uid=1000, gid=1000, mode=0100400, size=144)
Clearing orphaned inode 131197 (uid=1000, gid=1000, mode=0100400, size=86)
Clearing orphaned inode 131198 (uid=1000, gid=1000, mode=0100400, size=286)
Clearing orphaned inode 131443 (uid=1000, gid=1000, mode=0100400, size=44)
Clearing orphaned inode 131444 (uid=1000, gid=1000, mode=0100400, size=134)
Clearing orphaned inode 131153 (uid=1000, gid=1000, mode=0100400, size=43)
Clearing orphaned inode 131162 (uid=1000, gid=1000, mode=0100400, size=144)
Clearing orphaned inode 131282 (uid=1000, gid=1000, mode=0100400, size=171)
Clearing orphaned inode 131284 (uid=1000, gid=1000, mode=0100400, size=575)
Clearing orphaned inode 131115 (uid=1000, gid=1000, mode=0100400, size=167)
Clearing orphaned inode 131121 (uid=1000, gid=1000, mode=0100400, size=571)
Clearing orphaned inode 131293 (uid=1000, gid=1000, mode=0100400, size=35121)
Clearing orphaned inode 131294 (uid=1000, gid=1000, mode=0100400, size=86357895)
Clearing orphaned inode 131417 (uid=1000, gid=1000, mode=0100400, size=16158)
Clearing orphaned inode 131418 (uid=1000, gid=1000, mode=0100400, size=538235)
Clearing orphaned inode 131189 (uid=1000, gid=1000, mode=0100400, size=34763)
Clearing orphaned inode 131190 (uid=1000, gid=1000, mode=0100400, size=617029)
Clearing orphaned inode 131336 (uid=1000, gid=1000, mode=0100400, size=15567)
Clearing orphaned inode 131337 (uid=1000, gid=1000, mode=0100400, size=352682)
Clearing orphaned inode 131492 (uid=1000, gid=1000, mode=0100400, size=6202)
Clearing orphaned inode 131495 (uid=1000, gid=1000, mode=0100400, size=48216)
Clearing orphaned inode 131256 (uid=1000, gid=1000, mode=0100400, size=8025)
Clearing orphaned inode 131257 (uid=1000, gid=1000, mode=0100400, size=53886)
Clearing orphaned inode 131363 (uid=1000, gid=1000, mode=0100400, size=10427)
Clearing orphaned inode 131364 (uid=1000, gid=1000, mode=0100400, size=37932)
Clearing orphaned inode 131475 (uid=1000, gid=1000, mode=0100400, size=5460)
Clearing orphaned inode 131479 (uid=1000, gid=1000, mode=0100400, size=18246)
Clearing orphaned inode 131228 (uid=1000, gid=1000, mode=0100400, size=5329)
Clearing orphaned inode 131231 (uid=1000, gid=1000, mode=0100400, size=17941)
Clearing orphaned inode 131445 (uid=1000, gid=1000, mode=0100400, size=21)
Clearing orphaned inode 131453 (uid=1000, gid=1000, mode=0100400, size=8095)
Setting free inodes count to 14608180 (was 14654975)
Setting free blocks count to 17246917 (was 57369296)
/dev/sdi1: clean, 47308/14655488 files, 41360443/58607360 blocks
, stderr e2fsck 1.42.9 (28-Dec-2013)

2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout /dev/sdh4: recovering journal
Clearing orphaned inode 3670111 (uid=1000, gid=1000, mode=0100400, size=16011)
Clearing orphaned inode 3670112 (uid=1000, gid=1000, mode=0100400, size=231784)
Clearing orphaned inode 3670489 (uid=1000, gid=1000, mode=0100400, size=5766)
Clearing orphaned inode 3670490 (uid=1000, gid=1000, mode=0100400, size=30313)
Clearing orphaned inode 3670170 (uid=1000, gid=1000, mode=0100400, size=8277)
Clearing orphaned inode 3670171 (uid=1000, gid=1000, mode=0100400, size=37467)
Clearing orphaned inode 3670433 (uid=1000, gid=1000, mode=0100400, size=11859)
Clearing orphaned inode 3670441 (uid=1000, gid=1000, mode=0100400, size=40966)
Clearing orphaned inode 3670327 (uid=1000, gid=1000, mode=0100400, size=6583)
Clearing orphaned inode 3670330 (uid=1000, gid=1000, mode=0100400, size=21257)
Clearing orphaned inode 3670157 (uid=1000, gid=1000, mode=0100400, size=7181)
Clearing orphaned inode 3670163 (uid=1000, gid=1000, mode=0100400, size=22797)
Clearing orphaned inode 3670417 (uid=1000, gid=1000, mode=0100400, size=1472)
Clearing orphaned inode 3670571 (uid=1000, gid=1000, mode=0100400, size=9549)
Clearing orphaned inode 3670480 (uid=1000, gid=1000, mode=0100400, size=1377)
Clearing orphaned inode 3670482 (uid=1000, gid=1000, mode=0100400, size=5527)
Clearing orphaned inode 3670145 (uid=1000, gid=1000, mode=0100400, size=877)
Clearing orphaned inode 3670146 (uid=1000, gid=1000, mode=0100400, size=3378)
Clearing orphaned inode 3670411 (uid=1000, gid=1000, mode=0100400, size=108)
Clearing orphaned inode 3670418 (uid=1000, gid=1000, mode=0100400, size=832)
Clearing orphaned inode 3670134 (uid=1000, gid=1000, mode=0100400, size=108)
Clearing orphaned inode 3670135 (uid=1000, gid=1000, mode=0100400, size=832)
Clearing orphaned inode 3670437 (uid=1000, gid=1000, mode=0100400, size=441)
Clearing orphaned inode 3670438 (uid=1000, gid=1000, mode=0100400, size=7250)
Clearing orphaned inode 3670110 (uid=1000, gid=1000, mode=0100400, size=1006)
Clearing orphaned inode 3670115 (uid=1000, gid=1000, mode=0100400, size=24900)
Clearing orphaned inode 3670430 (uid=1000, gid=1000, mode=0100400, size=29848)
Clearing orphaned inode 3670431 (uid=1000, gid=1000, mode=0100400, size=66879154)
Clearing orphaned inode 3670341 (uid=1000, gid=1000, mode=0100400, size=14237)
Clearing orphaned inode 3670342 (uid=1000, gid=1000, mode=0100400, size=432706)
Clearing orphaned inode 3670098 (uid=1000, gid=1000, mode=0100400, size=32155)
Clearing orphaned inode 3670099 (uid=1000, gid=1000, mode=0100400, size=504719)
Clearing orphaned inode 3670246 (uid=1000, gid=1000, mode=0100400, size=44)
Clearing orphaned inode 3670252 (uid=1000, gid=1000, mode=0100400, size=145)
Clearing orphaned inode 3670126 (uid=1000, gid=1000, mode=0100400, size=44)
Clearing orphaned inode 3670127 (uid=1000, gid=1000, mode=0100400, size=145)
Clearing orphaned inode 3670229 (uid=1000, gid=1000, mode=0100400, size=132)
Clearing orphaned inode 3670230 (uid=1000, gid=1000, mode=0100400, size=402)
Clearing orphaned inode 3670120 (uid=1000, gid=1000, mode=0100400, size=43)
Clearing orphaned inode 3670140 (uid=1000, gid=1000, mode=0100400, size=144)
Clearing orphaned inode 3670078 (uid=1000, gid=1000, mode=0100400, size=86)
Clearing orphaned inode 3670079 (uid=1000, gid=1000, mode=0100400, size=288)
Clearing orphaned inode 3670069 (uid=1000, gid=1000, mode=0100400, size=255)
Clearing orphaned inode 3670070 (uid=1000, gid=1000, mode=0100400, size=861)
Setting free inodes count to 10697849 (was 10722805)
Setting free blocks count to 14447176 (was 36639446)
/dev/sdh4: clean, 25479/10723328 files, 28431544/42878720 blocks
, stderr e2fsck 1.42.9 (28-Dec-2013)
Badge +1
Part 2:

2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout /dev/sdg1: clean, 366276/244195328 files, 347507096/976754176 blocks
, stderr e2fsck 1.42.9 (28-Dec-2013)

2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout /dev/sdc1: clean, 367235/244195328 files, 348861320/976754176 blocks
, stderr e2fsck 1.42.9 (28-Dec-2013)

2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout /dev/sdd1: clean, 365981/244195328 files, 347395530/976754176 blocks
, stderr e2fsck 1.42.9 (28-Dec-2013)

2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout tune2fs 1.42.9 (28-Dec-2013)
, stderr
2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout tune2fs 1.42.9 (28-Dec-2013)
, stderr
2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout tune2fs 1.42.9 (28-Dec-2013)
, stderr
2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout tune2fs 1.42.9 (28-Dec-2013)
, stderr
2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout tune2fs 1.42.9 (28-Dec-2013)
, stderr
2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout tune2fs 1.42.9 (28-Dec-2013)
, stderr
2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout tune2fs 1.42.9 (28-Dec-2013)
, stderr
2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout tune2fs 1.42.9 (28-Dec-2013)
, stderr
2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout tune2fs 1.42.9 (28-Dec-2013)
, stderr
2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout , stderr
2018-08-02 11:38:55 INFO disk_manager.py:3889 Command finished with ret 0, stdout , stderr
2018-08-02 11:38:56 INFO disk_manager.py:388 Mounting disks: set(['/dev/sdh', '/dev/sdi', '/dev/sdf', '/dev/sdg', '/dev/sdd', '/dev/sde', '/dev/sdb', '/dev/sdc', '/dev/sda'])
2018-08-02 11:38:56 INFO disk_manager.py:395 Mounting disk: /dev/sdf
2018-08-02 11:38:56 INFO disk_manager.py:935 Waiting for disk mount lock for mount disk /dev/sdf
2018-08-02 11:38:56 INFO disk_manager.py:973 Mounting partitions on disk /dev/sdf
2018-08-02 11:38:56 INFO disk_manager.py:988 Mounting partition /dev/sdf1 on path /home/nutanix/data/stargate-storage/disks/ZC16047Y
2018-08-02 11:38:56 INFO disk_manager.py:395 Mounting disk: /dev/sde
2018-08-02 11:38:56 INFO disk_manager.py:935 Waiting for disk mount lock for mount disk /dev/sde
2018-08-02 11:38:56 INFO disk_manager.py:973 Mounting partitions on disk /dev/sde
2018-08-02 11:38:56 INFO disk_manager.py:988 Mounting partition /dev/sde1 on path /home/nutanix/data/stargate-storage/disks/ZC163GA7
2018-08-02 11:38:56 INFO disk_manager.py:395 Mounting disk: /dev/sdb
2018-08-02 11:38:56 INFO disk_manager.py:935 Waiting for disk mount lock for mount disk /dev/sdb
2018-08-02 11:38:56 INFO disk_manager.py:973 Mounting partitions on disk /dev/sdb
2018-08-02 11:38:56 INFO disk_manager.py:988 Mounting partition /dev/sdb1 on path /home/nutanix/data/stargate-storage/disks/ZC163GA1
2018-08-02 11:38:57 INFO disk_manager.py:395 Mounting disk: /dev/sda
2018-08-02 11:38:57 INFO disk_manager.py:935 Waiting for disk mount lock for mount disk /dev/sda
2018-08-02 11:38:57 INFO disk_manager.py:973 Mounting partitions on disk /dev/sda
2018-08-02 11:38:57 INFO disk_manager.py:988 Mounting partition /dev/sda1 on path /home/nutanix/data/stargate-storage/disks/ZC1615JX
2018-08-02 11:38:57 INFO disk_manager.py:395 Mounting disk: /dev/sdi
2018-08-02 11:38:57 INFO disk_manager.py:935 Waiting for disk mount lock for mount disk /dev/sdi
2018-08-02 11:38:57 INFO disk_manager.py:973 Mounting partitions on disk /dev/sdi
2018-08-02 11:38:57 INFO disk_manager.py:988 Mounting partition /dev/sdi1 on path /home/nutanix/data/stargate-storage/disks/S36KNX0K215101
2018-08-02 11:38:57 INFO disk_manager.py:395 Mounting disk: /dev/sdh
2018-08-02 11:38:57 INFO disk_manager.py:935 Waiting for disk mount lock for mount disk /dev/sdh
2018-08-02 11:38:57 INFO disk_manager.py:973 Mounting partitions on disk /dev/sdh
2018-08-02 11:38:57 INFO disk_manager.py:988 Mounting partition /dev/sdh4 on path /home/nutanix/data/stargate-storage/disks/S36KNX0K215308
2018-08-02 11:38:57 INFO disk_manager.py:395 Mounting disk: /dev/sdg
2018-08-02 11:38:57 INFO disk_manager.py:935 Waiting for disk mount lock for mount disk /dev/sdg
2018-08-02 11:38:57 INFO disk_manager.py:973 Mounting partitions on disk /dev/sdg
2018-08-02 11:38:57 INFO disk_manager.py:988 Mounting partition /dev/sdg1 on path /home/nutanix/data/stargate-storage/disks/ZC1633RY
2018-08-02 11:38:58 INFO disk_manager.py:395 Mounting disk: /dev/sdc
2018-08-02 11:38:58 INFO disk_manager.py:935 Waiting for disk mount lock for mount disk /dev/sdc
2018-08-02 11:38:58 INFO disk_manager.py:973 Mounting partitions on disk /dev/sdc
2018-08-02 11:38:58 INFO disk_manager.py:988 Mounting partition /dev/sdc1 on path /home/nutanix/data/stargate-storage/disks/ZC163GZT
2018-08-02 11:38:58 INFO disk_manager.py:395 Mounting disk: /dev/sdd
2018-08-02 11:38:58 INFO disk_manager.py:935 Waiting for disk mount lock for mount disk /dev/sdd
2018-08-02 11:38:58 INFO disk_manager.py:973 Mounting partitions on disk /dev/sdd
2018-08-02 11:38:58 INFO disk_manager.py:988 Mounting partition /dev/sdd1 on path /home/nutanix/data/stargate-storage/disks/ZC1634AV
2018-08-02 11:38:58 INFO disk_manager.py:7865 Setting SSD read-ahead to 16 KB
2018-08-02 11:38:58 INFO disk_manager.py:7902 Read ahead value on /dev/sdi is currently set to 32 sectors
2018-08-02 11:38:58 INFO disk_manager.py:7913 Sector size of SSD /dev/sdi is 512
2018-08-02 11:38:58 INFO disk_manager.py:7902 Read ahead value on /dev/sdh is currently set to 32 sectors
2018-08-02 11:38:58 INFO disk_manager.py:7913 Sector size of SSD /dev/sdh is 512
2018-08-02 11:38:58 WARNING disk_manager.py:273 Failed to reach genesis. Skipping Hades configure
2018-08-02 11:38:58 INFO server.py:92 Starting the serve_http thread
2018-08-02 11:38:58 INFO server.py:96 Starting udev event handling
2018-08-02 11:38:58 INFO udev_handler.py:63 Starting udev worker and observer threads
2018-08-02 11:38:58 INFO server.py:99 Registering RPCs
2018-08-02 11:38:58 ERROR rpc.py:303 Json Rpc request for unknown Rpc object DiskManager
2018-08-02 11:39:30 INFO disk_manager.py:412 Configuring Hades
2018-08-02 11:39:30 INFO zookeeper_session.py:110 hades is attempting to connect to Zookeeper
2018-08-02 11:39:30 INFO zookeeper_session.py:110 hades is attempting to connect to Zookeeper
2018-08-02 11:39:30 WARNING disk_manager.py:5070 Unable to determine total number of slots. Empty disk slots will not have entries
2018-08-02 11:39:30 INFO disk.py:627 Failed to get the physical disk locations (CE: returning fake slot values)
2018-08-02 11:39:32 INFO disk_manager.py:3622 Executing cmd: /home/nutanix/cluster/lib/lsi-sas/lsiutil -i
2018-08-02 11:39:32 INFO kvm.py:1394 Executing cmd: cat /proc/mounts
2018-08-02 11:39:32 INFO kvm.py:1410 Active hostboot disk /dev/sdk
2018-08-02 11:39:32 INFO kvm.py:1424 Executing cmd: smartctl -a /dev/sdk
2018-08-02 11:39:32 ERROR kvm.py:1429 Unable to get bootdisk via smartctl (1, smartctl 6.2 2017-02-27 r4394 [x86_64-linux-4.4.77-1.el7.nutanix.20180123.170.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/sdk: Unknown USB bridge [0x0930:0x6545 (0x110)]
Please specify device type with the -d option.

Use smartctl -h to get a usage summary
Userlevel 7
Badge +25
Things seem good there. Definitely check the tombstone entries and remove them (rm-tombstone-entry or something like that), then see if the disks stay online.
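From memory, the long-form names are list-tombstone-entries and remove-tombstone-entry; roughly (a sketch, since the exact sub-commands differ a bit between AOS/CE versions):

ncli> disk remove-tombstone-entry serial-number=ZC16047Y
(ZC16047Y being the old failed disk's serial from your df output)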
Badge +1
Ok, I will check it tomorrow. Thank you so much.
Badge +1
I can't find the tombstone entries. The command ncli disk ls-tombstone-entries does not exist, only ncli disk list, and that only shows the online disks, not the offline ones.

I have executed ce_add_disk and the output looks correct; it shows the SCSI UUIDs of the disks.

I have seen that there is a clean_disk command for a partition. Is it useful? Can I run it on the partition mounted under the failed disk's UUID?

Thanks in advance.
Userlevel 7
Badge +25
Hmm, I don't have a lab to check, but maybe that is the old model for storing the offline devices.

https://portal.nutanix.com/#/page/docs/details?targetId=Command-Ref-AOS-v55:acl-ncli-disk-auto-r.html

I'm not sure scrubbing the device is your issue. The real question is why the disks were marked offline and how to get them back online. I feel like we may need a Nutant if tombstones are not the model any more.
Userlevel 7
Badge +34
Hi @juanluis.delgado-67016

How did things work out for you? Are you still having issues?
Badge +1
Badly. Yes, I still have issues with the offline disks.
Userlevel 7
Badge +34
Hi @juanluis.delgado-67016

Have you thought about starting over with your CE install? Maybe something was missed or configured incorrectly. Let us know, thanks!
Badge +1
I reinstalled CE and it is working again, but this is not a good solution.
Userlevel 7
Badge +34
Thanks for sharing, @juanluis.delgado-67016. Sounds like you reinstalled it and it's working at the moment.