Question

Disk space usage high on host


Badge +1
Hi!

I have an alert from Nutanix CE: disk space usage high on 3 hosts. But I have 4.61TB free of 5.54TB. Please help me fix it. Thanks very much!

[screenshots attached]
This topic has been closed for comments

27 replies

Userlevel 7
Badge +25
Did you check that host or the CVM and just run lsblk to see what the OS sees? It probably is not the DFS capacity; it may be log data or other cruft on your USB.

Does the Hardware Alerts tab in Prism show which drive has the bad ratio?
Same here.
Please, someone, give a solution. Thanks!
Badge +1
Did you check that host or the CVM and just run lsblk to see what the OS sees? It probably is not the DFS capacity; it may be log data or other cruft on your USB.

Yes. Result:

[screenshot of lsblk output]
Does the Hardware Alerts tab in Prism show which drive has the bad ratio?


No. Look at the screenshots in the first message.
Userlevel 7
Badge +25
Sorry, my bad... df -h would be better... too many device-related issues lately. 🤔
Badge +1
Result: I didn't see any problem :(

[screenshot of df -h output]
Userlevel 7
Badge +25
Hmmm.... the mystery deepens. I didn't see the Hardware Alerts tab in those screenshots, only the Perf Summary twice. It does look like you have a disjoint cluster, with one node (154) having a much different storage profile than the others. It looks like it has a bunch of what I'm guessing are 480GB PNY SSDs? Those would be below the 500GB Nutanix looks for in the capacity tier, so I'm wondering if that is making a calculation function abnormally.

An interesting test would be to remove that node and see if the error goes away.
Badge +1
I keep getting this issue too.

It seems that the files in /var/log on the host (not the CVM) keep piling up.
I resolve it by running:

code:
sudo find /var/log -name '*-*' -mtime +7 -exec rm -f {} \;

...which essentially clears rotated log files (names containing a "-") older than 7 days.
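
A safer pattern is to dry-run the match first and confirm the logs are actually the culprit (a sketch; same file pattern as above):

code:
# see what is eating /var/log, largest first
du -sh /var/log/* | sort -rh | head

# dry run: list rotated logs (names containing "-") older than 7 days without deleting
sudo find /var/log -name '*-*' -mtime +7 -print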

Questions are:
Why do these log files keep filling up with seemingly futile journal info?
How can I increase the partition size to what the disk actually is? It is a 32G USB key but the install process capped it at 8G.
Badge +4
Having the same issue, 32GB SATADOM.

code:
[root@NTNX-f7d1d483-A ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         32G     0   32G   0% /dev
tmpfs            32G  272K   32G   1% /dev/shm
tmpfs            32G  1.7G   30G   6% /run
tmpfs            32G     0   32G   0% /sys/fs/cgroup
/dev/sdd1       6.8G  6.0G  509M  93% /
tmpfs           6.3G     0  6.3G   0% /run/user/0
[root@NTNX-f7d1d483-A ~]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 953.9G  0 disk
├─sda1   8:1    0    10G  0 part
├─sda2   8:2    0    10G  0 part
├─sda3   8:3    0    40G  0 part
└─sda4   8:4    0 893.9G  0 part
sdb      8:16   0 953.9G  0 disk
└─sdb1   8:17   0 953.9G  0 part
sdc      8:32   0 931.5G  0 disk
├─sdc1   8:33   0   102M  0 part
└─sdc2   8:34   0     2G  0 part
sdd      8:48   0  29.5G  0 disk
└─sdd1   8:49   0     7G  0 part /


sdd1 (the SATADOM) only has a 7G partition.
After cleaning the logs older than 7 days I only recovered a bit of space.
code:
[root@NTNX-f7d1d483-A log]# df -h
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         32G     0   32G   0% /dev
tmpfs            32G  272K   32G   1% /dev/shm
tmpfs            32G  1.7G   30G   6% /run
tmpfs            32G     0   32G   0% /sys/fs/cgroup
/dev/sdd1       6.8G  5.6G  824M  88% /
tmpfs           6.3G     0  6.3G   0% /run/user/0


Have the same questions as aspoov.

Thanks
Userlevel 1
Badge +5
I had some issues with space on the CVM; this might help:
https://ronnie.ie/2016/11/21/nutanix-cvm-space-issues/
Badge +4
Hi ronnie,

Thanks for your reply.
Unfortunately our problem is not space issues in the CVM; we are limited on the host itself.
Userlevel 1
Badge +5
Ok no worries.
Userlevel 7
Badge +25
I have had similar issues in the past (https://next.nutanix.com/discussion-forum-14/log-cleanup-9217) for a CVM (understood this is AHV though). I don't know if Scavenger keeps AHV scrubbed.

You could probably just use the same query to find the offenders:
code:
sudo du -ch --exclude=/home/nutanix / | sort -rh | head -15 


My gut says that there are some upgrade images hanging out in /home/install or something similar that is eating all your space.
Badge +4
Hey,

Thanks for the reply; here's the result:

code:
[root@NTNX-f7d1d483-A ~]# sudo du -ch --exclude=/home/nutanix / | sort -rh | head -15 
du: cannot access ‘/proc/26219/task/26219/fd/4’: No such file or directory
du: cannot access ‘/proc/26219/task/26219/fdinfo/4’: No such file or directory
du: cannot access ‘/proc/26219/fd/3’: No such file or directory
du: cannot access ‘/proc/26219/fdinfo/3’: No such file or directory
7.9G total
7.9G /
3.7G /home/install/phx_iso
3.7G /home/install
3.7G /home
3.4G /home/install/phx_iso/mnt/local
3.4G /home/install/phx_iso/mnt
3.3G /home/install/phx_iso/mnt/local/images
2.9G /home/install/phx_iso/mnt/local/images/svm/2018.01.31-stable
2.9G /home/install/phx_iso/mnt/local/images/svm
2.3G /run/log/journal/7d3f49164dd94f739d2497de962800a5
2.3G /run/log/journal
2.3G /run/log
2.3G /run
1.2G /usr
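
The "du: cannot access /proc/..." lines are just du racing transient process entries and are harmless. Excluding the virtual filesystems quiets them (a sketch, assuming GNU du):

code:
sudo du -ch --exclude=/home/nutanix --exclude=/proc --exclude=/sys / 2>/dev/null | sort -rh | head -15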


You are right. Do these images need to be here?

When I log in to AHV I still get this message:
code:
Nutanix Community Edition AHV

Welcome to Nutanix Community Edition. Please login with
the 'install' user (no password) to begin the installation.
Userlevel 7
Badge +25
Yeah, that's the default login view for AHV at all times. You just log in as root using the 4u password, like you would with the nutanix user on the CVM.

Problem is my lab got reclaimed in a building move, so I am unsure if that mountpoint is the default file system for the CVM (I think it is), which is passed through via libvirt. So don't nuke it just yet.

So systemd seems to be eating a lot. "journalctl --disk-usage" may provide some insight?

I would poke around in the UUID directory of your /journal and maybe there are some .journal~ temp files hanging around?
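
Something like this, for example (a sketch; the machine-id path is taken from the du output above):

code:
# journal files under the machine-id directory, largest first
ls -lhS /run/log/journal/7d3f49164dd94f739d2497de962800a5/

# leftover temp files from interrupted rotations end in .journal~
find /run/log/journal -name '*.journal~'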
Badge +4
Regarding journalctl disk usage:

code:
journalctl --disk-usage
Archived and active journals take up 3.0G on disk.

Inside the journal directory there are no temp files.

The warning keeps coming up; open to more suggestions.
Userlevel 7
Badge +25
Maybe try and shrink the journal?

code:
journalctl --vacuum-size=500M
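
journalctl can also trim by age instead of size, if that maps better to what you want to keep:

code:
journalctl --vacuum-time=7d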
Userlevel 7
Badge +34
Hi @miranon @Event

Were you able to get this issue fixed? Anything you can share here with the community would be great as this would help others 👍
Badge +4
Shrinking the journal helps, but of course it grows again. I could limit the journal size, but it's built to rotate depending on available space.

The question here is: is it OK/normal for the host to become full, and if so, why the warnings? If not, what is the permanent solution: increasing the partition, if that's possible, or is limiting the journal size the way to go?

Thanks
Userlevel 2
Badge +11
I had the same issue today on one of my nodes when the host's disk usage exceeded 90%. The host still had the default partition size from the image installer. As I had dd'ed the image to a 128GB SSD, I expanded the partition into the unused space to resolve it.

It appears I'll have to do the same to the other 3 nodes shortly, as they are in the 89%-90% usage range.
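
In outline the expansion looks something like this (a sketch, not the exact commands used here; device names follow the earlier df/lsblk output and will differ per node, the root filesystem is assumed to be ext4, and growpart comes from the cloud-utils-growpart package; verify with lsblk -f and take a backup before touching partitions):

code:
# confirm the device, partition and filesystem type first
lsblk -f

# grow partition 1 of /dev/sdd into the unallocated space
growpart /dev/sdd 1

# grow the ext4 filesystem to fill the partition, online
resize2fs /dev/sdd1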
Userlevel 7
Badge +25
Yeah, you may want to do a "journalctl --since yesterday" to see if you can spot some noisy event and try to resolve that. If the journal isn't capped by Nutanix, then at some point it will catch up with a larger partition too.
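
Something like this can rank the noisiest processes (a sketch against journalctl's default short output, where the fifth field is the process name):

code:
# count messages per process since yesterday, chattiest first
journalctl --since yesterday | awk '{print $5}' | sort | uniq -c | sort -rn | head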
Userlevel 2
Badge +11
Yeah, you may want to do a "journalctl --since yesterday" to see if you can spot some noisy event and try to resolve that. If the journal isn't capped by Nutanix, then at some point it will catch up with a larger partition too.

All I'm seeing are frequent entries seemingly related to SSH sessions being opened and closed from each CVM, which, I guess, is normal. The daemons are systemd-logind, systemd, sshd and libvirtd.
Userlevel 7
Badge +25
I don't have an install handy, but is there a SystemMaxFileSize set in journald.conf? Wondering if the defaults need some tweaks in CE. https://www.freedesktop.org/software/systemd/man/journald.conf.html

Either preserve more free space as a ratio, or I would probably set a cap on the total so it doesn't soak up 4G of the rootfs's 8G.
Userlevel 2
Badge +11
All the entries in journald.conf are commented out, including SystemMaxFileSize, so it must be working with the defaults (per the journald.conf man page, those cap the journal at 10% of the filesystem, itself capped at 4G).

code:
#SystemMaxFileSize=
Userlevel 7
Badge +25
@aluciani this may be a good feature enhancement, as sshd is being logged to the journal. There's only /var/log to work with, and the RAM-backed /run/log is potentially resource-constrained depending on how much RAM was allocated to AHV. Not sure if @AdamFG is still the man of the hour on that.

Maybe try setting SystemMaxUse to something like 500M, and RuntimeMaxUse maybe to the same. See if that limits the use, but be aware that constant rolling I/O on the USB may cause issues.
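
Concretely, that would look something like this (a sketch; the 500M values are suggestions, not CE-tested settings):

code:
# /etc/systemd/journald.conf
[Journal]
SystemMaxUse=500M
RuntimeMaxUse=500M

Then restart journald to apply:

code:
systemctl restart systemd-journald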
Badge
I had the same issue today on one of my nodes when the host's disk usage exceeded 90%. The host still had the default partition size from the image installer. As I had dd'ed the image to a 128GB SSD, I expanded the partition into the unused space to resolve it.

It appears I'll have to do the same to the other 3 nodes shortly, as they are in the 89%-90% usage range.


@HS would you mind explaining how you did this? I've got 64GB USB drives installed and the root partition is only 7.3GB on my nodes for some reason (probably my doing).

Thanks in advance for any help you can provide.

Paul

code:
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs         17G     0   17G   0% /dev
tmpfs            17G  279K   17G   1% /dev/shm
tmpfs            17G  1.7G   16G  10% /run
tmpfs            17G     0   17G   0% /sys/fs/cgroup
/dev/sdc1       7.3G  6.3G  575M  92% /
tmpfs           3.4G     0  3.4G   0% /run/user/0
[root@NTNX-c9156e7c-A ~]#