
It's basically a haystack made of needles.
January 17, 2014 1:32 PM

I have a ton of old, ad hoc backups from a variety of old computers going back years. I've been very good about keeping the data around and alive, safe from bitrot, but not about organizing it or keeping it useful. How do I sort through this heap of disorganized data and make it useful?

I have a lot of old media and data sitting around from a decade-plus worth of backups, old computers, relatives' computers that I inherited, junk burned to CD, etc. etc. Some of it is 20+ years old, and I've been very good over the years about periodically migrating it forward to avoid losing it to dead / unreadable media. (E.g. floppies to Zip to CD to DVD to hard drive.) But that doesn't mean I've ever really organized it, or really even looked at some of it.

Now that hard drives are big and cheap, I want to take all this stuff — basically a pile of small hard drives and DVDs — and organize it, get it merged in with the rest of my online files, and let my normal Time Machine backup process deal with it, so I don't have to think about it anymore.

So what's the best way to sort through all this stuff? Assume that everything is now in local storage, but just in folders based on whatever media it was copied from. There's a ton of crap in there (many of the drives are just pulls from machines when they stopped being used, so there are entire copies of Windows, applications, etc. on there), plus many documents are duplicated in multiple places. Sorting it out by hand seems like it'd be months of work.
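
One way to shrink the pile before sorting is to find byte-identical copies first. The following is a minimal sketch, not a finished tool: it assumes everything sits under one root folder, treats two files as duplicates only when their SHA-256 digests match, and uses hypothetical function names (`file_digest`, `find_duplicates`) chosen for illustration. It only reports duplicates; it never deletes anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return {digest: sorted list of paths} for groups of 2+ identical files."""
    # First pass: group by size, which is cheap and rules out most files.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Second pass: hash only files that share a size with another file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_digest[file_digest(path)].append(path)

    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Calling `find_duplicates("/Volumes/Archive")` (a made-up path) would return the groups of identical files, which can then be reviewed by hand before anything is removed.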

I've been thinking about writing scripts to go through and copy files out based on their extension, so I'd have a folder full of Word/Excel/PPT documents, another of photos, another of video, etc. But many of the files are organized in folder trees where the position in the tree has value, so I don't want to completely destroy that. So maybe what I need instead is a level of organization on top of the actual filesystem; some sort of indexing tool that analyzes the documents and provides a good search interface...?
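
The extension-based idea does not have to flatten the folder trees. As a rough sketch under stated assumptions: the category map below is hypothetical and incomplete, the function names are made up for illustration, and the destination folder is assumed to live outside the source folder. Each recognised file is copied to `<destination>/<category>/<original relative path>`, so its position in the tree survives, while unrecognised files (Windows installs, applications) are simply skipped.

```python
import os
import shutil

# Hypothetical category map; extend it to suit the actual data.
CATEGORIES = {
    "documents": {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"},
    "photos": {".jpg", ".jpeg", ".png", ".tif", ".tiff"},
    "video": {".avi", ".mov", ".mp4", ".mpg"},
}


def category_for(filename):
    """Return the category name for a filename, or None if unrecognised."""
    ext = os.path.splitext(filename)[1].lower()
    for category, extensions in CATEGORIES.items():
        if ext in extensions:
            return category
    return None


def copy_by_category(source_root, dest_root):
    """Copy recognised files to dest_root/<category>/<relative path>.

    dest_root should be outside source_root, otherwise the walk would
    also visit the copies being created.
    """
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            category = category_for(name)
            if category is None:
                continue  # skip system files, applications, and so on
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, source_root)
            dst = os.path.join(dest_root, category, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 keeps modification times
            copied.append(dst)
    return copied
```

Running it once per source folder, each into its own destination subfolder, keeps files from different drives from overwriting one another when they share a relative path.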

If nothing else, I'll just shove all of them on a big drive and let Spotlight index them, but I feel like there has to be a better way.

This is on Mac OS, but I have access to Linux and Win7 as well for the purposes of sorting, although ultimately I'd like them to end up attached to the Mac.
posted by Kadin2048 to Computers & Internet (4 answers total)

"If nothing else, I'll just shove all of them on a big drive and let Spotlight index them, but I feel like there has to be a better way."

Spotlight is pretty decent. The mdfind command-line tool makes it easy to search for file types, metadata, attributes and content via scripts or remote access. Spotlight indexes a wide variety of metadata, and you can use mdfind to search on these keys, combining them into complex queries. The GUI is pretty good if you want a graphical tool. Can you define what "better" would mean for you?
posted by Blazecock Pileon at 1:45 PM on January 17
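
For scripting, mdfind can be driven from Python. This is a minimal sketch assuming macOS with Spotlight indexing enabled on the volume in question; the wrapper names (`build_mdfind_command`, `run_mdfind`) are hypothetical, while `-onlyin` is mdfind's own option for limiting a search to one directory.

```python
import shutil
import subprocess


def build_mdfind_command(query, only_in=None):
    """Build the argument list for an mdfind search."""
    cmd = ["mdfind"]
    if only_in is not None:
        cmd += ["-onlyin", only_in]  # restrict the search to one folder
    cmd.append(query)
    return cmd


def run_mdfind(query, only_in=None):
    """Run mdfind and return matching paths, one per list item.

    Splits on newlines, so a filename that itself contains a newline
    would be reported incorrectly.
    """
    if shutil.which("mdfind") is None:
        raise RuntimeError("mdfind not found; Spotlight is only available on macOS")
    result = subprocess.run(
        build_mdfind_command(query, only_in),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

For example, `run_mdfind('kMDItemContentTypeTree == "public.image"', "/Volumes/Archive")` would list every indexed image under that (made-up) path.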

I also took a look at SpotlightFS, which builds more powerful smart folders from a wide variety of attributes. It is also integrated into the file system and accessible from the command line.

So let's say you have 6 backups and you want all the images from them that were taken with an old digital camera. You would make a smart folder that searches for files with a combination of Spotlight attributes (image files of a certain date or older, stored in a certain location, containing tags that indicate they were made with a specific model of Nikon camera, etc.). This smart folder updates its contents automatically as you pull in and index each of the 6 backups.
posted by Blazecock Pileon at 1:52 PM on January 17
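
The camera example can be written as a Spotlight query string. The sketch below only builds the string; the function name is hypothetical, and the attribute names (`kMDItemContentTypeTree`, `kMDItemAcquisitionModel`, `kMDItemContentCreationDate`) and the `$time.iso(...)` form reflect Spotlight's query syntax as I understand it, so it is worth checking a sample photo with `mdls` to see which attributes the files actually carry.

```python
from datetime import datetime, timezone


def camera_image_query(model_substring, before):
    """Build a Spotlight query for images from one camera model.

    model_substring: text expected in the camera model tag; it must not
        contain double quotes, since no escaping is done here.
    before: datetime; only images created earlier than this match.
    """
    stamp = before.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    clauses = [
        # any file whose type conforms to "image"
        'kMDItemContentTypeTree == "public.image"',
        # case-insensitive substring match on the camera model tag
        f'kMDItemAcquisitionModel == "*{model_substring}*"c',
        # created before the cutoff date
        f"kMDItemContentCreationDate < $time.iso({stamp})",
    ]
    return " && ".join(clauses)
```

The resulting string can be passed to `mdfind -onlyin <folder>` to restrict the search to the location where the backups were copied.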

Hmm, maybe I haven't been giving Spotlight enough credit. I really only use it as a quicklauncher for applications, admittedly.

SpotlightFS also seems close to what I was envisioning, in that it provides a sort of organized filesystem alongside the existing one. I haven't played with it yet but it seems neat and I was unaware of it.

Keep the suggestions coming if anyone else has any...
posted by Kadin2048 at 6:18 PM on January 17

I ended up using Spotlight to good effect, although for future reference TMSU looks very interesting (Linux-only at the moment, but it seems like it would work on Mac OS).
posted by Kadin2048 at 9:17 AM on March 4
