869/9500 = 1?
November 12, 2010 9:43 PM

I'm seeing "No space left on device" errors on a Ubuntu (standard LAMP stack) Virtual Private Server, to which I have root access via SSH. df -i reports no shortage of iNodes, but df -h reports Size = 9.5G, Used = 869M, Avail = 0, Use% = 100%. I'm not so good at math, but those numbers don't seem to add up. I've found no reports of such errors with similar disk space reports on Google, and my usual avenues of online support have not been helpful.
posted by The Confessor to Computers & Internet (18 answers total)
 
Hm. This sounds weird enough that it might be an issue with your hosting provider. I assume this VPS is hosted somewhere, right? Maybe your 10G quota is not full, but the physical disk itself is full. Can you post your df -H output?
posted by lholladay at 9:49 PM on November 12, 2010


Response by poster: Size = 11G, Used = 911M, Avail = 0, Use% = 100%, Hosting Provider = VPSLink.
posted by The Confessor at 10:08 PM on November 12, 2010


Remember that when you first create the filesystem, you select how much space is reserved for the superuser. This is typically around 5%. Now, 869M is not 95% of 9.5G, but something like that might be going on. "No space left on device" does not literally mean that there is no space left on the device, just that you personally can't use any more. (Incidentally, if you are over quota, the error is EDQUOT, "Disk quota exceeded". So it's probably not that.)
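
If you want to check the reserved figure yourself, tune2fs will print it (assuming the filesystem is ext2/ext3; the device path below is a placeholder, so use whatever device df lists for /):

sudo tune2fs -l /dev/sda1 | grep -i 'block count'
# "Reserved block count" divided by "Block count" is the reserved fraction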

Anyway, this is absolutely a problem to bring up with your host: "It says there's no space left." It's probably not something you caused, but even if it is, there's no harm in asking support. That's what they're there for.
posted by jrockway at 10:10 PM on November 12, 2010


See if you happen to have a directory full of small files. I had an app that kept ~100K images in one folder, and the filesystem didn't like that they were all in the same directory. It might not be that you're actually out of space, but rather that something is wrong and your filesystem doesn't like it.
posted by thebigdeadwaltz at 10:28 PM on November 12, 2010


This probably isn't it, on preview, but you might have an ~8.7G log file that has been deleted but is still held open by a process. Either reboot the server (which will kill the offending process) or see if you have access to lsof; on Ubuntu, lsof should flag (deleted) files and show which process is hanging on to them.

Is it possible that you have snapshots turned on for that filesystem? I've run into weirdness when snapshot utilization impinges on the quota allocation: usage spikes even though you're only using a tiny fraction yourself, because the balance is held in .snapshot archives that the OS isn't quite aware of.
posted by Kyol at 10:49 PM on November 12, 2010 [1 favorite]


It's probably the case that your VPS disk is just a file on a real physical machine hosting many other VPS machines, and it's the physical disk that is actually full. That's the way it works in things like VirtualBox and probably any other virtualization solution that allows dynamic allocation.

For instance, my CentOS VM has an 8G filesystem with 3.2G used... but the actual physical file on my Ubuntu laptop backing that virtual disk is 4.9G. If I tried to fill up my VM filesystem, it would grow to the full 8G and take up ~9.7G on my laptop. I bet the physical machine your VM is hosted on is out of disk space.
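
You can see the same apparent-versus-actual behavior with a sparse file on any Linux box; the filename here is made up:

truncate -s 10G /tmp/sparse.img   # claims a 10G apparent size, allocates nothing
ls -lh /tmp/sparse.img            # reports the 10G apparent size
du -h /tmp/sparse.img             # reports ~0, the blocks actually on disk
rm /tmp/sparse.img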

It might still be the open-but-deleted file that Kyol mentions, since it's a VPS. But on a regular host, those open-but-deleted files do show up as 'Used' under df, as expected (just checked).
posted by zengargoyle at 11:05 PM on November 12, 2010


Response by poster: 1. I have a support ticket in to my hosting provider now; I was reluctant to do so earlier because I kinda pride myself on being the sort who tries to fix whatever goes wrong myself.

2. I can't think of any directory of very many small files that I've made myself. Is there a good way to search them out with a Bash command?

3. Rebooting the server was one of the first things I tried.

4. How would I determine whether snapshots are enabled?
posted by The Confessor at 11:08 PM on November 12, 2010



find . -type d | while IFS= read -r a; do echo "$(ls "$a" | wc -l) $a"; done | sort -n
I would run it from subdirectories that you think may be a problem; as written, it needs a lot more guarding to avoid descending into /proc or /dev, where it would take forever and throw a lot of errors for files you don't have permission to read. My home directory turns up:

434 ./NetSight/System/mibs
457 ./.wine/drive_c/Python26/Lib/test
530 ./.wine/drive_c/CLARIFY/12.5/ora92/ocommon/nls/ADMIN/DATA
1802 ./.thumbnails/normal
1811 ./tmp/downtown
1867 ./tmp/downtown/jun
2256 ./.cache/google-chrome/Cache
3051 ./.wine/drive_c/Program Files/Doctor Who - The Adventure Games/Data/ShaderCache

as the directories with the most entries... a shotgun blast hits a lot of things, but I doubt that's your problem.
posted by zengargoyle at 11:27 PM on November 12, 2010


What does this print:

perl -le '$buf = qq(\0)x64; $dir = "/"; $res = syscall(99, $dir, $buf); ($size, $tot, $free, $avail) = (unpack("L7", $buf))[1..4]; print join "\n", map { sprintf "%.2fM", $_ * $size /1024**2 }($tot, $free, $avail)'

The first number is the total size, the second the free space, the third the available space, as reported by the statfs() syscall. (If this isn't the root filesystem, change $dir = "/" to correspond to whatever filesystem you're talking about.) The avail number should match what df says, and the free number should match the total size minus the amount used. If free != avail, that means you've got some superuser-reserved blocks, as jrockway alluded to.
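
If you'd rather not run a one-liner, stat -f (part of coreutils, so it should be present on Ubuntu) reports the same statfs() fields:

stat -f /
# the "Blocks: Total: ... Free: ... Available: ..." line shows the same numbers;
# Free minus Available is the superuser-reserved space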
posted by Rhomboid at 11:50 PM on November 12, 2010


What filesystem is it? If ext2/ext3, what does dumpe2fs say?
posted by Pruitt-Igoe at 11:51 PM on November 12, 2010


Every time I've seen discrepancies between the output of du and df, it has come down to some file, somewhere, on that filesystem being deleted but still held open by a running process.

du and df do not determine disk usage in the same fashion: du does a getdents(2) for every directory, then an fstat(2) on every file returned; df reads /etc/mtab, then does a statfs(2) on each mounted filesystem to get its output. (See man 2 statfs and man 2 fstat for more info.)
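
You can reproduce the discrepancy harmlessly to watch the mechanism; the path here is arbitrary:

dd if=/dev/zero of=/tmp/bigfile bs=1M count=100   # make a 100M file
exec 3</tmp/bigfile          # hold a file descriptor open on it
rm /tmp/bigfile              # unlink removes the directory entry only
df -h /tmp                   # statfs still counts the 100M as used
du -sh /tmp                  # the fstat walk no longer sees the file
exec 3<&-                    # close the descriptor; the kernel frees the blocks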

Try running     lsof | grep deleted
and look for a large file (or files, even) in a deleted state on the affected filesystem.

Odds are pretty good that if you cycle whatever process is holding those files open, your space will magically "reappear".
posted by namewithoutwords at 3:41 AM on November 13, 2010


Response by poster: Apologies for not posting this sooner; sleep interrupted.

Support responded overnight indicating the issue had been resolved. df -h showed correct values immediately, but some server functionality required a full reboot to fix. Thank you all for your assistance, and I apologize for bothering you with an issue that probably could not have been resolved from my end.
posted by The Confessor at 7:15 AM on November 13, 2010


Response by poster: Just as a final note, customer support indicated to me that this is not something I could have fixed "at the client level."
posted by The Confessor at 10:15 AM on November 13, 2010


Given your provider's response, both Kyol and namewithoutwords got it right. A large file (probably a log file for, let's say, Apache) got rolled at just the wrong time, and instead of being deleted cleanly it got stuck in a limbo state. If you had the permissions to follow the lsof suggestion and the ability to kill the processes that were holding the deleted files open, then you could have resolved this yourself.

It's worth mentioning that this may happen again, in which case you either need to get more space or increase the frequency at which the logs rotate. I'd keep an eye on it over the course of the day/week/month/however often the logs roll, and decide how to proceed from there.
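
If rotation frequency turns out to be the fix, the change would be something like this hypothetical /etc/logrotate.d/apache2 entry (adjust the glob to wherever your logs actually live):

/var/log/apache2/*.log {
    # rotate daily instead of weekly, keeping two weeks of old logs
    daily
    rotate 14
    compress
    # leave the newest rotated log uncompressed; skip empty logs
    delaycompress
    notifempty
}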
posted by togdon at 1:44 PM on November 13, 2010


togdon/namewithoutwords: The Confessor said he already rebooted, as Kyol suggested, and that would have killed any process holding a deleted file.
posted by Pruitt-Igoe at 2:25 PM on November 13, 2010


I mean, he rebooted his virtual private server instance. The problem (still unknown) was finally solved when the machine hosting it was rebooted.
posted by Pruitt-Igoe at 2:29 PM on November 13, 2010


Best answer: Sorry togdon, you are just wrong. The OP is on a VPS and has root, so there's no lack of permissions. The OP had rebooted (before asking the question), so no processes were holding open any deleted files. Barring filesystem corruption so bad that fsck couldn't fix it on reboot, there was nothing going on inside the VPS that could be causing the error.

The actual cause is that dynamically-allocated virtual filesystems only take up as much physical space on the disk as they are using; free space on a VFS is not even allocated on the physical disk.
-rw------- 1 zen zen 2.1G 2010-08-14 02:35 b1.vdi
-rw------- 1 zen zen 2.0G 2010-08-14 02:35 b2.vdi
-rw------- 1 zen zen 5.0G 2010-11-13 00:04 centos.vdi
-rw------- 1 zen zen 3.1G 2010-08-20 19:40 l1.vdi
-rw------- 1 zen zen 1.9G 2010-08-20 19:40 l2.vdi
-rw------- 1 zen zen 1.9G 2010-08-03 02:00 psps.vdi
-rw------- 1 zen zen 1.9G 2010-10-19 07:14 rc4.vdi
-rw------- 1 zen zen 5.4G 2010-11-04 02:29 Test.vdi
-rw------- 1 zen zen 2.0G 2010-09-24 09:57 w1.vdi
-rw------- 1 zen zen 1.8G 2010-08-07 23:03 w2.vdi
-rw------- 1 zen zen 5.4G 2010-11-03 10:03 WinXP.vdi
Each of these VFSs thinks it is a full 10G filesystem, yet together they take up only about 33G of physical hard disk space at the moment. The provider, for some reason, simply let the physical partition that holds the VFSs run out of space, so they had no room to expand to their claimed 10G capacity. The reason you just get the 'no space left on device' error is that you're in a virtual machine, and there is no 'omg! the physical device that I can't know anything about because I'm virtual has run out of space' error. The VFS driver hijacks the statfs system call to report that the VFS size is 10G, you've used 900M, and there is 0 available.

There was nothing the OP could have done, and nothing he can do if it happens again, except file a trouble ticket and let the provider clean up and make more physical disk space available.

If you're really bored, you could create a 500M loopback filesystem, create a 10G VFS on it, and duplicate the error conditions. (I was bored and tried, but VirtualBox pauses the VM on this error condition, and I'm not bored enough to see if that can be worked around...)
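
For reference, that experiment would look roughly like this; the filenames and mount point are illustrative, and the VirtualBox command is createhd from the 3.x-era CLI:

dd if=/dev/zero of=/tmp/backing.img bs=1M count=500   # 500M backing file
mkfs.ext4 -F /tmp/backing.img                         # filesystem inside it
sudo mkdir -p /mnt/backing
sudo mount -o loop /tmp/backing.img /mnt/backing
# create a dynamically-allocated 10G virtual disk on the tiny filesystem
VBoxManage createhd --filename /mnt/backing/test.vdi --size 10240
# attach it to a VM and fill it until the 500M backing store runs out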

There is no mystery here: the VFS is dynamically allocated, and there was no space left on the physical hard disk.
posted by zengargoyle at 3:28 PM on November 13, 2010


Sorry, I missed the response mid-thread where he mentioned he'd rebooted; it wasn't in the original question, and I skipped his responses down to the one where he said the hosting provider fixed it. I'm well aware of how filesystems and disk over-subscription work, thanks.

As to whether the problem will happen again... clearly it will; the provider isn't paying attention to the over-subscribed filesystem, which is a bigger problem than a rolled log holding space open.
posted by togdon at 7:43 AM on November 15, 2010

