Solved

IOError: [Errno 28] No space left on device

  • 13 July 2021

Hi Everyone,

One of the nodes of our cluster suddenly got disconnected. I was not able to ping it initially, but after restarting the node and checking whether maintenance mode was enabled, connectivity was regained and all the IPs can now ping each other.

 

Even so, the cluster still does not recognize the previously disconnected node, although all pings succeed. I tried restarting the cluster and the local CVM, but the error below appears:

nutanix@NTNX-A-CVM:192.168.50.182:~$ cluster status
2021-07-13 07:34:46,652Z WARNING genesis_utils.py:1304 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 875, in emit
    self.flush()
  File "/usr/lib64/python2.7/logging/__init__.py", line 835, in flush
    self.stream.flush()
IOError: [Errno 28] No space left on device
Logged from file log.py, line 191
2021-07-13 07:34:47,654Z WARNING genesis_utils.py:1304 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 875, in emit
    self.flush()
  File "/usr/lib64/python2.7/logging/__init__.py", line 835, in flush
    self.stream.flush()
IOError: [Errno 28] No space left on device
Logged from file log.py, line 191
2021-07-13 07:34:48,656Z WARNING genesis_utils.py:1304 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 875, in emit
    self.flush()
  File "/usr/lib64/python2.7/logging/__init__.py", line 835, in flush
    self.stream.flush()
IOError: [Errno 28] No space left on device
Logged from file log.py, line 191
2021-07-13 07:34:49,659Z WARNING genesis_utils.py:1304 Failed to reach a node where Genesis is up. Retrying... (Hit Ctrl-C to abort
Traceback (most recent call last):
  File "/usr/lib64/python2.7/logging/__init__.py", line 875, in emit
    self.flush()
  File "/usr/lib64/python2.7/logging/__init__.py", line 835, in flush
    self.stream.flush()
IOError: [Errno 28] No space left on device
Logged from file log.py, line 191

 

Has anyone encountered this? Unfortunately, the support term ended months ago.


Best answer by raaji 22 July 2021, 21:54


Hello @John Renz, please verify the following:

  1. Check from IPMI for any hardware issues.
  2. From PE, do you see any alerts regarding the CVM/host concerned? In the hardware diagram in PE, do you see any unmounted disks, bad disks, or errors on the host?
  3. Please check whether portal.nutanix.com/kb/8086 matches the issue.

Thanks and Best

Hi @Haritha Andal

There’s an alert in Prism:

Operation failed. Reason: Prechecks failed: Failed to reach lcm on 192.168.50.182. Please check KB 7781 Found remote version None while 2.4.2.25804 was expected on node 192.168.50.182. This may be a caching issue. Please ensure all local caches are cleared and wait a few minutes for any remote caches to get invalidated before retrying. Please check KB 7784

I’ve tried KB 7784 and KB 7781, but to no avail.

I also tried the KB you gave me and saw that /home is at 100%:

================== 192.168.50.182 =================
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        9.8G     0  9.8G   0% /dev
tmpfs           512M     0  512M   0% /dev/shm
tmpfs           9.9G  624K  9.9G   1% /run
tmpfs           9.9G     0  9.9G   0% /sys/fs/cgroup
/dev/sda2       9.8G  4.0G  5.8G  41% /
/dev/loop0      240M  2.1M  222M   1% /tmp
/dev/sda3        40G   39G     0 100% /home
tmpfs           2.0G     0  2.0G   0% /run/user/1000
/dev/sdb1       3.6T  1.3G  3.6T   1% /home/nutanix/data/stargate-storage/disks/ZC1B0WKY
/dev/sdc1       3.6T  1.4G  3.6T   1% /home/nutanix/data/stargate-storage/disks/ZC1ARFYN
/dev/sda4       1.7T  269G  1.4T  16% /home/nutanix/data/stargate-storage/disks/S47PNE0M400683
 

I tried deleting files to free up space, but I get this error:


nutanix@NTNX-19FM6J260111-A-CVM:192.168.50.182:~/foundation$ rm /isos/*
/usr/bin/rm: cannot remove ‘/isos/*’: No such file or directory

 

Is there something wrong with my syntax?
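
On the syntax question: the error is consistent with the leading slash. /isos/* is an absolute path, so the shell looks for an isos directory at the filesystem root rather than under the current ~/foundation directory. If the intended target was an isos directory inside ~/foundation (an assumption based on the prompt shown), the relative form isos/* would be the one to try. A minimal sketch of the difference, using a hypothetical scratch directory and ls instead of rm so that nothing is deleted:

```shell
# Hypothetical scratch layout standing in for ~/foundation/isos;
# nothing on the CVM itself is touched.
demo=$(mktemp -d)
mkdir -p "$demo/foundation/isos"
touch "$demo/foundation/isos/example.iso"
cd "$demo/foundation" || exit 1

# Relative path: the glob is resolved against the current directory and matches.
ls isos/*

# Absolute path: the glob is resolved against the filesystem root. When nothing
# matches, bash passes the pattern through literally, and the command reports
# "No such file or directory".
ls /isos/* 2>&1 || true
```

Which files are actually safe to remove should still come from the KB, not from this sketch.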
 


Hi John,

Let me respond to your query.

If /home is full, we will need to carefully check and clean files from approved directories on the CVM.

Could you please SSH into the CVM (192.168.50.182) and execute the following command?

du -h ~/data/logs

This shows the space used by the logs on the CVM.
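
If the plain du output is too long to read, it can be sorted so the largest entries come first. This is only a sketch using standard GNU du, sort, and head; largest_entries is a hypothetical helper name, not a Nutanix tool, and it only reads, it does not delete anything:

```shell
# Read-only: lists the ten largest entries under a directory, biggest first.
# largest_entries is a hypothetical helper name, not part of any Nutanix tooling.
largest_entries() {
    dir=${1:-$HOME/data/logs}
    # -a includes individual files, -h prints human-readable sizes;
    # sort -rh orders those human-readable sizes from largest to smallest.
    du -ah "$dir" 2>/dev/null | sort -rh | head -n 10
}

largest_entries ~/data/logs
```

The first line is normally the total for the directory itself; the lines after it show where the space is going.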

And then please refer to this KB to clean up space: [AOS Only] What to do when /home partition or /home/nutanix directory on a Controller VM (CVM) is full

  • Do NOT use rm -rf unless the KB explicitly says so. It can lead to data loss.

If you still need assistance with this, please open a support ticket with Nutanix to troubleshoot the issue.

Hi @raaji,

Thanks! 

I actually tried this yesterday. Method 2 seems to work best. 

I deleted some files under /home/nutanix/data/, as indicated in the KB, just enough to reduce the utilization from 100% to 97%. After starting the cluster, the utilization came down to around 30%.
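
For anyone repeating this, the /home utilization can be checked before and after the cleanup with df. A minimal sketch, assuming standard df and awk are available on the CVM; usage_percent is a hypothetical helper name:

```shell
# usage_percent is a hypothetical helper: prints the Use% of the filesystem
# holding the given path as a bare integer, parsed from POSIX-format df output.
usage_percent() {
    # -P keeps each filesystem on one line; column 5 is the capacity, e.g. "97%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

pct=$(usage_percent /home)
echo "/home is at ${pct}% used"
```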