"I got five hard drives with eighty-nine gigabytes, I eat databases, networks, and web sites."
September 4, 2010 4:15 PM   Subscribe

[OSX Organizational Filter] I do video and need to become better organized on my external drives.

My problem is that I have about ten small drives (500gb to 2tb) and there's a sort of disorganized redundancy throughout them. Is there a program I can install where I insert one drive at a time, have it catalog the names of files and folders and their respective sizes, and then, once all the drives are done, look at the results and say "File X is 30gb and on four drives" or "Folder Y is 130gb and on three drives"? Does this exist, or is there a smarter way to handle this?

Thanks.
posted by history is a weapon to Computers & Internet (4 answers total) 5 users marked this as a favorite
 
Whenever I've done this, I haven't used file sizes but rather md5 fingerprints, which are a better test for uniqueness (two different files can easily share a size, but a matching hash almost always means matching contents).

All I did was a quick script (quick to write, not to run) like:


find /Volumes/ -type f | while IFS= read -r i; do md5 "$i" >> ~/Desktop/filedetails.txt; done


Then chuck the resulting file into Excel (or similar) and sort on the md5 hash to group the duplicates.
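
If you'd rather skip the spreadsheet step entirely, a two-pass awk along these lines should print only the lines whose hash appears more than once (a sketch that assumes md5's usual "MD5 (path) = hash" output, where the hash is the last whitespace-separated field):


awk 'NR==FNR { seen[$NF]++; next } seen[$NF] > 1' ~/Desktop/filedetails.txt ~/Desktop/filedetails.txt  # reads the file twice: first pass counts hashes, second prints repeats


The first pass counts each hash, the second prints every line belonging to a duplicate group; pipe the output through sort if you want each group's members adjacent.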
posted by pompomtom at 6:55 PM on September 4, 2010


One option would be a setup like RAID 5. It would combine all your physical drives into a single virtual drive that the operating system sees. In its normal configuration it can withstand any single drive dying, but would lose data if two drives died.

So if you have reasonable backups, this may be a better setup: instead of hunting for files across ten disks, you just have one giant 10tb chunk of disk to use.

There are other RAID setups with different safety vs. size tradeoffs. Be careful, and ALWAYS have backups.
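
(For what it's worth, OS X's built-in software RAID, the AppleRAID layer that diskutil drives, only offers mirror, stripe, and concat sets, not RAID 5, so a true RAID 5 means a hardware enclosure or third-party software. As a rough sketch of the built-in route, a mirrored pair would look something like this, with a hypothetical set name and disk identifiers; check yours with 'diskutil list' first, and note that creating a set erases the member drives:)


diskutil list                                                   # find the identifiers of the member drives
diskutil appleRAID create mirror VideoRAID JHFS+ disk2 disk3    # DESTRUCTIVE: erases disk2 and disk3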
posted by cschneid at 9:41 PM on September 4, 2010


CDFinder is a great application for the first step of your process (cataloging the drives). I've never tried to use it for the second step, but it may have some abilities in that area. Its search is speedy and useful.
posted by muta at 10:27 PM on September 4, 2010


pompomtom: All I did was a quick script (quick to write, not to run) like:
find /Volumes/ -type f | while IFS= read -r i; do md5 "$i" >> ~/Desktop/filedetails.txt; done
Then chuck the resulting file into Excel (or similar) and sort on the md5 hash to group the duplicates.


A (perhaps slightly) simpler way to do this:
find /Volumes -type f -exec md5 {} \; >> ~/Desktop/files.txt
sort -t= -k2 ~/Desktop/files.txt -o ~/Desktop/files.txt
Find's '-exec' primary runs the named program directly, replacing the '{}' with the current filename, "not subject to the further expansion of shell patterns and constructs". That means you won't need to worry about it doing silly things with any bizarrely-named files you may have lying about, and depending on your drive(s) and processor(s) it may be marginally quicker than having the shell do it instead.
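
(A side note, hedged since it depends on your find version: BSD find also accepts '+' instead of ';' to terminate -exec, which packs many filenames into each md5 invocation rather than spawning one process per file, and that batching is usually the bigger speedup:)


find /Volumes -type f -exec md5 {} + >> ~/Desktop/files.txt   # one md5 process per batch of files, not per file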

Once that's over with, the second part is certain to be a hell of a lot faster than booting Excel (or maybe I just have a shitty computer...), but it assumes your md5's output looks like mine:
MD5 (/Volumes/ALEXANDRIA/Music/Pixies/Doolittle/07 Monkey Gone to Heaven.mp3) = 23327b24f79304bb845b33fcd32be993
The sort divides each line on the '=', sorts on the second field (the hash), and drops the sorted file back on top of the original.
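
(And once it's sorted, a quick way to list just the hashes that repeat, again assuming the hash is the last field on each line:)


awk '{ print $NF }' ~/Desktop/files.txt | sort | uniq -d   # uniq -d prints each duplicated hash once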
posted by FlyingMonkey at 5:52 AM on October 4, 2010 [1 favorite]

