Too many files?
February 1, 2007 8:12 AM

Is it possible for Windows 2000/XP (or more properly, an NTFS disk volume) to "collapse" under the weight of too many files?

I have about a million specialized data files spread out on about a hundred CDs (no, it's not porn) and would like to load them onto one of those 500 GB Firewire drives formatted with NTFS. I will spread them out among multiple directories, at least so that directory operations in various applications won't grind the OS to a halt.

But I have some concerns about whether a huge number of files could corrupt the volume somehow or invite read/write problems. Doesn't it take a lot of storage merely to index a million files?

How many files is "too many"? Is there any advantage to putting massive collections like this into zip files (if so I'd be using command line info-zip)?
posted by chef_boyardee to Computers & Internet (11 answers total)
You'd need to be pushing serious amounts of data about to 'break' NTFS in the sense you're talking about. See here for the limitations of this file system.
posted by ReiToei at 8:22 AM on February 1, 2007

I currently support a production Windows 2003 server that is hosting close to 40 million small files, spread out over several thousand directories. I don't think you're going to see an issue with the filesystem.

One issue that I have seen, however, is when you have a great many files in a single directory. I had an issue on an Exchange server where the mailroot/vsi 1/badmail directory accumulated over 20 million small files over a period of about a year. Disk performance was absolutely terrible on that machine as a result. It also took almost a week just to delete the files in that directory.
posted by deadmessenger at 8:23 AM on February 1, 2007

Microsoft says you should be able to have 2^32 (4.3 billion) files per NTFS volume, so you should be fine.

Spread them out in different directories as much as you are able. Single directories with excessive numbers of files in them will have poor performance.
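One way to do the spreading automatically is to derive a bucket directory from a hash of each file name. A minimal Python sketch follows; the helper names `shard_dir` and `sharded_path` and the bucket width are made up for illustration and are not from any tool mentioned in this thread. With three hex characters there are 4,096 buckets, so a million files would average a few hundred per directory.

```python
import hashlib
from pathlib import Path


def shard_dir(filename: str, width: int = 3) -> str:
    """Return a short hex bucket name derived from the file name.

    `width` hex characters give 16**width possible buckets
    (3 -> 4,096). The width is an illustrative choice, not a
    recommendation from the thread.
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return digest[:width]


def sharded_path(root: Path, filename: str, width: int = 3) -> Path:
    """Build root/<bucket>/<filename> without touching the disk."""
    return root / shard_dir(filename, width) / filename


if __name__ == "__main__":
    # Hypothetical file names, just to show the resulting layout.
    for name in ("sample_000001.dat", "sample_000002.dat"):
        print(sharded_path(Path("archive"), name))
```

The same name always lands in the same bucket, so a file can be found again without a separate index.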
posted by jellicle at 8:25 AM on February 1, 2007

It's a Linux system with ext3 filesystem, but I count approximately 1 million files in 125000 directories on my fileserver (it's a 300GB disk) with no troubles. I assume other modern filesystems like NTFS are also up to this task.
posted by jepler at 8:51 AM on February 1, 2007

One easily encountered problem with NTFS is that the limit on total path length is fairly small. ReiToei's link says the max filename size is 255 characters but that is actually the maximum path length. You can have a filename with 255 characters; however, it can then only sit at the root. If you have long directory names and a deep tree you can run into problems.

255 characters may seem like a lot but take for example your "My Pictures" folder. On my machine the path (C:\Documents and Settings\Mitheral\My Documents\My Pictures) has already consumed 54 characters and I haven't even created a subdirectory yet. Office creates a folder in My Pictures called "Microsoft Clip Organizer", whoops there goes another 24 characters. Logitech creates the folders "My Logitech Pictures\Pictures and Videos" when you install one of their camera products. Store anything in there and your path is 39+54=93 characters long and you haven't even got started naming your files or organising them into directories.
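A rough way to see how much of the budget a folder has already used, as a Python sketch. The 255 figure is simply the one quoted in this thread; the code does not verify it against Windows, and `remaining_chars` is a made-up helper name. It uses `ntpath` so the Windows-style joining works on any platform.

```python
import ntpath

# Limit as quoted in this thread; an assumption, not checked against the OS.
ASSUMED_LIMIT = 255


def remaining_chars(base_dir: str, *subdirs: str, limit: int = ASSUMED_LIMIT) -> int:
    """Return how many characters are left for a file name.

    Joins base_dir and subdirs with backslashes, then subtracts the
    length of the result from the assumed limit. A negative result
    means the directory path alone is already over the limit.
    """
    full = ntpath.join(base_dir, *subdirs)
    return limit - len(full)


if __name__ == "__main__":
    base = r"C:\Documents and Settings\Mitheral\My Documents\My Pictures"
    print(remaining_chars(base))
    print(remaining_chars(base, r"My Logitech Pictures\Pictures and Videos"))
```

Running it over a directory tree before a big copy would flag the folders that leave too little room for file names.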

But the worst offender is IE. When you save a web page the default name given to the file and the associated directory is the title text. If the webmaster has entered a magnum opus as the title it takes the whole thing. The save dialog doesn't show the full name and users who use Explorer in icon mode won't ever see the full thing. Try to move it to a directory with a longer path though and Windows throws errors.
posted by Mitheral at 9:08 AM on February 1, 2007

The biggest issue you'll probably come across is file fragmentation and performance. I'd auto-schedule a defrag on a drive holding that many small files.
posted by damn dirty ape at 9:13 AM on February 1, 2007

Directory fragmentation is also a real issue with NTFS. If you're copying loads of files to an NTFS volume, and they're going to be essentially static once they get there, use Robocopy to do the copy, and do all the copies twice - once with the switch that makes it copy dummy zero-length files instead of the real ones, and a second time to copy the actual file contents. Details about switches to use are in the Robocopy documentation, which I don't have access to right now.
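For anyone without Robocopy to hand, the two-pass idea can be sketched in Python. This is only an illustration of the technique described above, not a reproduction of Robocopy's behaviour, and `two_pass_copy` is a made-up name. (I believe the Robocopy switch in question is /CREATE, but check that against its documentation.) Whether the placeholder pass really reduces directory fragmentation on a given volume is not something the sketch measures.

```python
import os
import shutil


def two_pass_copy(src_root: str, dst_root: str) -> int:
    """Copy a directory tree in two passes; return the file count.

    Pass 1 creates every directory and an empty placeholder for each
    file, so all directory entries are written before any file data.
    Pass 2 overwrites each placeholder with the real contents.
    Assumes dst_root is not located inside src_root.
    """
    pairs = []

    # Pass 1: directories and zero-length placeholder files.
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in sorted(filenames):
            dst = os.path.join(dst_dir, name)
            open(dst, "wb").close()
            pairs.append((os.path.join(dirpath, name), dst))

    # Pass 2: fill in the actual file contents.
    for src, dst in pairs:
        shutil.copyfile(src, dst)

    return len(pairs)
```

It would be called once per CD, for example `two_pass_copy("D:\\", "F:\\archive\\disc001")`, where both paths are placeholders.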

Or you could use a NAS box instead of a Firewire drive, which would allow you to use the ReiserFS filesystem on your 500GB disk. ReiserFS has no trouble at all with hundreds of thousands of files in a single directory.
posted by flabdablet at 12:30 PM on February 1, 2007

"ReiToei's link says the max filename size is 255 characters but that is actually the maximum path length."

That's inaccurate. Improperly written (or old) programs that don't use the newer APIs for handling filenames will have a 255 character limit on the entire path. Modern and properly-written programs will not have that limitation.

I ran into this problem with Windows Explorer and a USB backup drive. One issue to keep in mind is that data still often travels between file system formats - NTFS to FAT32 or the CD formats. Long directory structures can make this painful.
posted by srboisvert at 4:19 PM on February 1, 2007

The 255 character pathname limit doesn't apply to Robocopy, unless you turn on the switch to enable it (?!!) but does apply to most other command-line-based tools. And it will apply in the same way regardless of what kind of filesystem you end up choosing.
posted by flabdablet at 7:34 PM on February 1, 2007

Bottom Line: Yes, your 500GB external drive will handle your million files as long as you span your files across folders. To keep your processor from pegging out, I would put about 125,000-150,000 files per folder.
posted by nataaniinez at 7:05 AM on February 2, 2007

It may be a side effect of our local conditions. We have home drives H:\ on regular server (W2K3) shares whose paths are nice and short, but our corporate shared drive Z:\ is using DFS. A side effect of this is that the user doesn't know what the path length on the server is, and it is sometimes fairly long. So when they attempt to move some 130 character saved webpage from their H:\ to Z:\ they can get path length errors because Z:\ just happens to have a directory path that is 20 characters longer than H:\ and that bumps them over the 255 character limit. It could be IS is doing some sort of jiggery pokery with DFS that is causing this; we have quite a bit of legacy cruft and shims floating around.
posted by Mitheral at 7:14 AM on February 2, 2007
