Digital Libraries - Searching For Files
August 11, 2011 12:37 PM

Digital library management problems. I have a mess of pdfs, word docs, emails, images and other things I want to be available, but I'm at a loss for a relatively low-cost solution that will allow these documents to be easily searched through and found.

I have thousands of PDFs (ocr'd & non), outlook emails, images and word documents that I want to have available in an existing digital repository. For simplicity's sake, just imagine it is one big computer and a bunch of us are sitting in the room with it, trying to access the files.

Ideal Solution

What would be perfect is some kind of iTunes-like way to do a single search: put in keywords and pull up the images, pdfs and other filetypes.


I am positive there will be some kind of work to do to catalog some of this material. I have done a bit of that work using a blog-like installation, but the majority of the available files are still out there. From what I understand of the cost, a Google server is too expensive. Using Windows search is not an option.


Imagine it's MeFi, but in pdfs, word docs, emails and images. These files are all in folders - hundreds of folders. Penelope wants to find the folder with that .gif file of cortex eating the donut. Rachel is looking for the pdf of an article referenced in an FPP a few months ago, and all she remembers is that it involved the New York Times, and something about race. Stan is trying to find the word document where a few people in metatalk signed up to hold each other accountable.

The files could be in different folders, and folders within folders. The gif could be under cortex, or donuts, or funny pictures. Stan thinks he could find the word doc by searching for "accountable" or "accountability", or "metatalk", but Windows search is unavailable. What other system could be put in place to allow this search? It would be useful because in searching for "accountable" in these hundreds of folders, maybe Stan would pull up pdfs of askme posts where people talked about accountability solutions for those working away from their offices, in addition to the sign up sheet.
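
To make the scenario concrete, here is a minimal sketch in Python of that kind of search. It is not a recommendation of any product mentioned in this thread, and everything in it is an assumption made for illustration: the function names (`tokenize`, `build_index`, `search`) are hypothetical, only plain-text extensions have their contents read, and PDFs, Word documents, Outlook mail and images are matched on folder and file names alone, since reading them would need separate text-extraction or OCR tools. It walks nested folders, builds a word-to-files index, and matches by prefix, so "accountab" would find both "accountable" and "accountability".

```python
import os
import re
from collections import defaultdict

# Assumption: only these extensions are treated as readable plain text.
TEXT_EXTENSIONS = {".txt", ".md", ".csv", ".html"}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(root):
    """Map each word to the set of file paths under root that mention it."""
    index = defaultdict(set)
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            # Folder and file names are indexed for every file type,
            # so a gif filed under cortex/ is found by "cortex".
            words = set(tokenize(os.path.relpath(path, root)))
            if os.path.splitext(name)[1].lower() in TEXT_EXTENSIONS:
                try:
                    with open(path, encoding="utf-8", errors="ignore") as handle:
                        words.update(tokenize(handle.read()))
                except OSError:
                    pass  # unreadable file: keep the name-based entry only
            for word in words:
                index[word].add(path)
    return index


def search(index, query):
    """Return sorted paths matching every query term, each term by prefix."""
    terms = tokenize(query)
    if not terms:
        return []
    result = None
    for term in terms:
        matches = set()
        for word, paths in index.items():
            if word.startswith(term):
                matches |= paths
        result = matches if result is None else result & matches
    return sorted(result)
```

With an index built once over the shared folders, a call such as `search(index, "cortex donut")` would return every file whose path or readable text contains words starting with both terms, regardless of which subfolder it sits in. A real multi-user setup would also need to refresh the index as files change, which this sketch leaves out.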


What software or hardware exists to make this happen? Thousands of folders accessed by a lot of people. Windows search is not an option. In a perfect world, just as I can go into iTunes and type in "treach" and get everything from Treach from Naughty By Nature to Symbolic Three's "We're Treacherous", I would be able to type keywords and get all kinds of documents that are spread across hundreds of folders.
posted by cashman to Computers & Internet (21 answers total)
I am pretty pleased with Google Desktop for searching stuff on my computer - much better than Windows search.
posted by exogenous at 12:51 PM on August 11, 2011

Response by poster: I'd actually forgotten about Google desktop. And it does not make your files available online, right? Does it index the content and send that info back to Google, or act otherwise in a problematic way in that area?
posted by cashman at 12:58 PM on August 11, 2011

I use Punakea on the Mac. There are Windows equivalents for tagging files. You have to tag the files, but you can make your own taxonomy and get as much cross-referencing and depth of categorizing as you need.

I tried databases, folder sorting, and a bunch of other methods. I wanted something that worked with the vague folder structure I use (I dump files in project folders and then archive them every now and then) without damaging or risking the files themselves if the program got corrupted.

Punakea works for me because it's all done by keyboard shortcuts. Every now and then, I'll clear a folder by tagging it before I archive it, and over the past two months I've gotten about 50% of my files tagged with very little fuss. It's a couple of seconds to hit the shortcut, type the start of a tag, hit enter, tab, enter and bam, done.

Look for a tool that requires the absolute minimum of input because you will probably lose your enthusiasm for organising soon =)
posted by viggorlijah at 1:07 PM on August 11, 2011

It doesn't send the files themselves to the Google mothership. According to the privacy policy, it doesn't send content to Google without permission, full stop. Seems to be pretty well behaved in that regard.
posted by exogenous at 1:08 PM on August 11, 2011

Something that I would recommend that could do that for a lot of your files (although not all) is Mendeley -- you can test it out for free (it is free up to a certain limit). For academic-type PDFs, it actually pulls authors, keywords and abstracts automatically and voila, searchable files.

I've heard similar things about Zotero.
posted by Wolfster at 1:10 PM on August 11, 2011

There are numerous programs for the Mac that will do this for you, of which Devonthink Pro is probably the most powerful and sounds like exactly what you need. From your question, though, it sounds like you are on Windows, in which case your options are more limited. Although I have never liked it, Evernote might work? For more limited file types you could use PaperPort by Nuance.
posted by Another Fine Product From The Nonsense Factory at 1:26 PM on August 11, 2011

Seconding Evernote. It's not perfect but fits most of your parameters.
posted by orrnyereg at 2:08 PM on August 11, 2011

You're looking for a multi-user solution, right? And you've mentioned Windows, but is that the only operating system in this network?

A single-user (and portable) option could be Calibre, which supports a lot of different formats (and converting, should you want that sort of thing). You have various ways to catalog collections and items. It's rather robust, but might take some figuring out. It's free, and on Windows, Mac and Linux.
posted by filthy light thief at 2:10 PM on August 11, 2011

Seconding Calibre, especially for PDFs. It's easy to convert file formats, edit metadata, and search.
posted by mattbucher at 2:30 PM on August 11, 2011

Devonthink is far & above the most powerful & flexible digital library solution. It can handle just about any file format there is, has automated metadata generation, scripting, OCR, tags, RSS, the works. It will take some time & patience to chew through a large collection; break it into chunks instead of throwing it all in at once. Calibre is best suited to a uniform collection of files without a lot of directory structure getting in the way. It really shines with ebook collections where it can grab metadata from Amazon & other sites but it's less impressive with random PDF & HTML files where you'll be left entering a lot of it manually.
posted by scalefree at 2:46 PM on August 11, 2011

Sharepoint? It's not *cheap* but if you work for a university or something like that, your IT department may already have a server and you can set up a site and document libraries for free. That's what we used for our digital library.
posted by hotelechozulu at 3:54 PM on August 11, 2011

I'm on a Mac and I use Papers to keep track of my ever-growing stash of medical literature and other files. This AskMe may also be helpful for suggestions if you are not on a Mac.
posted by honeybee413 at 4:13 PM on August 11, 2011

Papers is good for what it's for, which is technical papers in PDF format that are already indexed online so you can just transfer the metadata to your local datastore. If what you've got is a jumble of formats & content types including Outlook PST files, it's best to get something that can extract & capture metadata from the documents themselves & maybe the directory structure as well. OP mentioned iTunes so I assume he's on a Mac, though that's not necessarily true. If it is, my vote's still for Devonthink.
posted by scalefree at 4:37 PM on August 11, 2011

Nothing the suggestion to consider Mendeley.
posted by bluedaisy at 6:04 PM on August 11, 2011

Nthing! Not nothing.
posted by bluedaisy at 6:05 PM on August 11, 2011

Maybe try Copernic Desktop Search -- it previews images of your files as you describe.

I prefer version 1.7, but I'm still on XP and not sure if it works for newer versions of Windows.
posted by rumbles at 9:03 PM on August 11, 2011

FYI, Google Desktop is incompatible with both OSX Snow Leopard & Lion.
posted by scalefree at 11:16 PM on August 11, 2011

Another vote for Evernote and their shared notebook functionality.
posted by ukdanae at 11:43 PM on August 11, 2011

You might find Qiqqa helpful for dealing with your PDFs; have a look at their website and perhaps try it out. It might not address all your specifications/scenarios, though.
posted by davemack at 3:57 AM on August 12, 2011

Google Desktop

Cloud stuff:

Evernote (maybe?)

Dropbox (maybe?)

mozy (?)
posted by BobbyDee at 10:20 AM on August 12, 2011

Give Zotero a try. It's designed for capturing and organizing scientific literature, but works very well at organizing anything, really (you can drag and drop stuff right into it). Totally revolutionised how I organise information at work.
posted by Mundungus at 2:21 PM on August 12, 2011

This thread is closed to new comments.