Deleting a very large directory
April 10, 2008 9:42 AM

How to delete a huge directory on Linux without overloading the system?

I've looked at this question, which is basically my exact situation, but I can't get any of the solutions described there to work without killing the system.

I've tried:
rm -rf mydir
nice rm -rf mydir
nice -n 19 find mydir -type f -exec rm -v {} \;
nice -n 19 rm -rf mydir&

And many other combinations of rm, find rm, nice find rm, etc., but all of them cause the server load to rise quickly to dangerous levels (I kill the process when the load average in `top` hits 20; I'm assuming the machine would hang if allowed to continue).

So is it possible to remove a directory with a lot of files without killing the server?
posted by justkevin to Technology (14 answers total) 4 users marked this as a favorite
Best answer: find mydir -type f -print0 | perl -0 -MTime::HiRes=sleep -ne 'chomp; unlink; sleep .1;'
posted by nicwolff at 9:56 AM on April 10, 2008

(Or sleep .05 or .01 or whatever doesn't crush your CPU.)
posted by nicwolff at 9:58 AM on April 10, 2008
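A variant of the same throttling idea, sketched under assumptions: it pauses once per batch of files rather than after every unlink, so there are far fewer sleeps. It assumes GNU findutils (`xargs -r`). The function name `batch_delete_files` and its default batch size and pause are illustrative, not tuned values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete regular files under DIR in batches,
# sleeping PAUSE seconds after each batch (defaults are illustrative).
batch_delete_files() {
  local dir=$1 batch=${2:-500} pause=${3:-1}
  # -print0 / -0 keep names with spaces, backslashes or newlines intact;
  # -r (GNU xargs) skips running sh at all when find prints nothing.
  # Each sh gets the pause as $1 and up to $batch filenames after it.
  find "$dir" -type f -print0 |
    xargs -0 -r -n "$batch" \
      sh -c 'p=$1; shift; rm -f -- "$@"; sleep "$p"' sh "$pause"
}
```

For example, `batch_delete_files mydir 500 1` removes 500 files, sleeps a second, and repeats. Directories are left in place for a separate pass.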

find mydir -type f | while read; do rm -v "$REPLY"; sleep 0.2; done
posted by flabdablet at 10:02 AM on April 10, 2008

Sorry, should be

find mydir -type f | while read -r; do rm -v "$REPLY"; sleep 0.2; done

just in case any of your filenames have backslashes in them. This won't remove files with newlines in their names, but those are pretty rare.
posted by flabdablet at 10:05 AM on April 10, 2008
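The newline caveat can be avoided by splitting on NUL bytes instead of newlines. A minimal sketch, assuming GNU find (`-print0`) and bash (`read -d ''`); `throttled_delete_files` and its 0.2-second default are illustrative names and values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete regular files under DIR one at a time,
# sleeping DELAY seconds after each one (NUL-delimited, newline-safe).
throttled_delete_files() {
  local dir=$1 delay=${2:-0.2} f
  # find -print0 ends each path with a NUL byte; read -d '' splits on NUL,
  # -r keeps backslashes, and IFS= keeps leading/trailing whitespace.
  while IFS= read -r -d '' f; do
    rm -f -- "$f"
    sleep "$delay"
  done < <(find "$dir" -type f -print0)
}
```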

Follow it up with

find mydir -depth -type d | while read -r; do rmdir -v "$REPLY"; sleep 0.2; done

to remove the directory tree, if you have tens of thousands of subdirectories and rm -rf is still too harsh.
posted by flabdablet at 10:09 AM on April 10, 2008
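The same NUL-safe pattern works for the directory pass. A sketch under the same assumptions (GNU find, bash); the function name is hypothetical, and it is meant to run after the files have been removed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove the (already emptied) directory tree under DIR,
# sleeping DELAY seconds after each rmdir.
throttled_remove_dirs() {
  local dir=$1 delay=${2:-0.2} d
  # -depth lists each directory's contents before the directory itself,
  # so children are removed first and DIR itself comes last.
  while IFS= read -r -d '' d; do
    rmdir -- "$d"
    sleep "$delay"
  done < <(find "$dir" -depth -type d -print0)
}
```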

It would be interesting to know both how many files are there (ls | wc -l) and what OS/filesystem is in use.
posted by TravellingDen at 10:10 AM on April 10, 2008

Best answer: If there are enough files that rm -rf hangs you up, you don't want to run anything that sorts their names (as plain ls does). TravellingDen's curiosity would be best served with

find mydir -type f | wc -l
posted by flabdablet at 10:12 AM on April 10, 2008
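A variant of the count that also copes with newlines in filenames, assuming GNU find (its `-printf` action is not POSIX); `count_files` is a hypothetical name. It prints one character per file and counts characters instead of lines, so nothing is sorted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count regular files under DIR without sorting.
# -printf '.' emits one dot per match; wc -c counts the dots.
count_files() {
  find "$1" -type f -printf '.' | wc -c
}
```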

I did some experiments on this a few years back. Between the 2.4 and 2.6 kernel series, directories got better at adding and opening files, but much slower at deleting them.

Once you've solved your immediate problem, the next step is to avoid building huge directories. One approach is to partition files into subdirectories based on something easily computable, such as the first few characters of the filename, or the MD5/SHA-1 hash of the filename if that gets you a better distribution. This is essentially how git (a distributed version control system) lays out its object store, and git copes with huge numbers of files (e.g., the Linux kernel source), so it's an approach worth considering.
posted by dws at 10:14 AM on April 10, 2008
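A sketch of the hashed-subdirectory layout described above, assuming GNU coreutils `md5sum`; `bucket_path` and `store_file` are hypothetical helpers. Two hex digits give 256 buckets, so 5 million files would still average roughly 20,000 per directory; more digits, or a second level, would keep directories smaller.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map NAME to ROOT/xx/NAME, where xx is the first two
# hex digits of the MD5 hash of the name (256 possible buckets).
bucket_path() {
  local root=$1 name=$2 hash
  hash=$(printf '%s' "$name" | md5sum)
  printf '%s/%s/%s\n' "$root" "${hash:0:2}" "$name"
}

# Hypothetical helper: move SRC into its bucket under ROOT.
store_file() {
  local src=$1 root=$2 dest
  dest=$(bucket_path "$root" "$(basename -- "$src")")
  mkdir -p -- "$(dirname -- "$dest")"
  mv -- "$src" "$dest"
}
```

For example, `bucket_path /srv/data hello` prints `/srv/data/5d/hello`, since the MD5 of "hello" starts with `5d`.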

Response by poster: Using flabdablet's method of counting files (and nicwolff's method of deleting them), there are currently a little over 5 million, dropping at a rate of a few hundred a second. The load average in top seems stable, hovering around 3.0, plus or minus.

It's Red Hat ES, kernel 2.6.18.
posted by justkevin at 10:35 AM on April 10, 2008
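Some back-of-the-envelope arithmetic on those numbers, as a shell sketch. The function names are hypothetical, and 300 files per second is an assumed stand-in for "a few hundred". A loop that sleeps after every file can never finish faster than files times sleep, so the pause length matters a lot at this scale.

```bash
#!/usr/bin/env bash
# Lower bound on run time (seconds) when the loop sleeps SLEEP_MS
# milliseconds after every file; ignores the time unlink itself takes.
min_runtime_seconds() {
  local files=$1 sleep_ms=$2
  echo $(( files * sleep_ms / 1000 ))
}

# Run time (seconds) at an observed deletion rate in files per second.
runtime_at_rate() {
  local files=$1 rate=$2
  echo $(( files / rate ))
}

min_runtime_seconds 5000000 100   # 500000 s, about 5.8 days (sleep .1)
min_runtime_seconds 5000000 10    # 50000 s, about 13.9 hours (sleep .01)
runtime_at_rate 5000000 300       # 16666 s, about 4.6 hours
```

Since a .1-second pause caps the loop at 10 files per second, a rate of a few hundred per second suggests a much shorter pause was in use.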

ionice will probably help:

$ ionice -c 3 rm -rf <dir>

This puts rm in the "idle" I/O scheduling class, which should mean that it only gets to do I/O when nobody else wants to.
posted by pharm at 10:40 AM on April 10, 2008 [4 favorites]
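The ionice command can be combined with nice so that rm yields on both disk and CPU. A minimal sketch, assuming util-linux `ionice` (whose `-t` flag runs the command even if the class can't be set) and falling back to plain nice when ionice is missing; `gentle_rm` is a hypothetical name. The idle class only has an effect under an I/O scheduler that implements priorities, such as CFQ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rm -rf TARGET at idle I/O priority and lowest CPU priority.
gentle_rm() {
  local target=$1
  if command -v ionice >/dev/null 2>&1; then
    # -c 3: idle I/O class; -t: still run the command if the class can't be set
    ionice -c 3 -t nice -n 19 rm -rf -- "$target"
  else
    # No ionice available: lower CPU priority only
    nice -n 19 rm -rf -- "$target"
  fi
}
```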

A rising load average under an ionice'd or nice'd command should NOT concern you.

Trust the scheduler - trust the system. The scheduler *knows* your command has no priority and it will move it aside for other applications when they request it. The load rises because the system starts doing what you requested.

The load of a system is just a measure. The scheduler will still do the correct thing when requested. You are not "killing" anything, I promise.
posted by unixrat at 12:08 PM on April 10, 2008 [2 favorites]
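If you would still rather keep the load under a fixed ceiling, as the original question wanted, one hypothetical approach is to check the 1-minute load average every so often and pause while it is too high. A sketch assuming Linux (`/proc/loadavg`), GNU find and bash; the function names, the ceiling of 4 and the check interval of 1000 files are illustrative, not recommendations.

```bash
#!/usr/bin/env bash
# Hypothetical helper: block while the 1-minute load average (first field
# of /proc/loadavg, Linux-specific) is at or above MAX.
wait_for_load_below() {
  local max=$1 load
  while :; do
    read -r load _ < /proc/loadavg
    # Bash arithmetic is integer-only, so let awk compare the decimals.
    awk -v l="$load" -v m="$max" 'BEGIN { exit !(l < m) }' && return 0
    sleep 5
  done
}

# Hypothetical helper: delete files under DIR, re-checking the load
# every BATCH files rather than after every unlink.
load_aware_delete() {
  local dir=$1 max_load=${2:-4} batch=${3:-1000} n=0 f
  while IFS= read -r -d '' f; do
    rm -f -- "$f"
    if (( ++n % batch == 0 )); then
      wait_for_load_below "$max_load"
    fi
  done < <(find "$dir" -type f -print0)
}
```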

The scheduler only knows that the command has the same priority as every other command unless you tell it otherwise.

That said, I agree that a high load value isn't in and of itself a bad thing.
posted by pharm at 12:23 PM on April 10, 2008

Agreed on using ionice and letting the thing chug away. It will let you use your computer for anything else, since everything else will get higher priority for the disk, and the delete will proceed whenever nothing else wants it.

No reason to keep the load down; use the computer to its fullest.
posted by cschneid at 2:49 PM on April 10, 2008

Also: if you're regularly creating directories with millions of files in them, you might want to consider putting those on a ReiserFS file system. Reiser is good at that. But read this first.
posted by flabdablet at 5:21 PM on April 10, 2008
