Help me find the best PDF management software for a Ph.D researcher
November 14, 2011 8:16 AM

Looking for the absolute best and (preferably) easiest way to manage thousands of PDFs as part of dissertation research.

My girlfriend is embarking on her Ph.D dissertation in history, and has spent the last few months digitizing about 60 reels of microfilm (more than 60,000 pages broken down into multi-page PDFs for each document). She's looking for the most robust, intuitive way to manage and use these files, and is hoping AskMe can provide personal recommendations for software packages she should explore.

She would like a program that has the following features:

1) Built-in file/PDF viewer (of course).
2) Ability to tag files and take notes on them, then search/sort across notes and tags. The file names contain essential information about each file, so those must be maintained and incorporated into the sort options.
3) Good OCR would be awesome, but not essential. A large number of the documents are handwritten or were poorly microfilmed, but it still would be nice to have at least some search capability.
4) Ability to ingest and manage a huge number of files, of all sizes. Again, we're talking more than 60,000 pages in 30,000-40,000 or so PDFs.
5) Intuitive to use. Ideally, the software would "get out of her way" and allow her to focus on the actual document contents and not spend her time futzing with the interface.

Are there other features she hasn't considered that would make this project more manageable?

She uses PCs, but if the absolute best software of this type is Mac-based please let us know. She may be getting a new laptop soon, so if there is a strong case for getting a Mac as part of this it will be taken into consideration. A web-based solution would be intriguing, but remember that she will be dealing with tens of thousands of documents that she wants to navigate through easily and quickly. I experimented with Google Docs and Zotero, but both seemed to get overwhelmed by the number and size of files she would be dealing with.

She's willing to spend money on this, but I suppose there is a limit. If there's a free option that can handle that many files easily, all the better. Right now she's just wanting to get a sense of her options.

Any and all suggestions would be appreciated!
posted by arco to Computers & Internet (18 answers total) 22 users marked this as a favorite
I used Evernote a while ago. I know it supports OCR, but I stopped using it because at the time the site was too slow for me in the UK. That was 3 or so years ago, so they may have improved.

This was taken from the Wikipedia page on Evernote:

The Evernote software can be downloaded and used as "stand-alone" software without using the online portion of your Evernote account (online registration is required for initial setup, however), but will not be able to upload files to the Evernote server, or use the server to synchronize or share files between different Evernote installations. Also, no image or Image-PDF (Premium only) recognition and indexing will take place if you use the software entirely offline.
posted by ben30 at 8:29 AM on November 14, 2011

I don't use it myself, but I've heard good things about it. Oh, and it's Mac-only.
posted by schwa at 8:37 AM on November 14, 2011

Papers for Mac is designed for this kind of thing. On Windows you could use EndNote for this, but it will be cumbersome.
posted by twblalock at 8:37 AM on November 14, 2011

This also sounds like the kind of thing DEVONthink Pro (Mac) was built for.

My guess is that you're going to want to OCR the documents using a software package outside your PDF library software. But DEVONthink Pro may have all the OCR support you need.
posted by jeffch at 8:47 AM on November 14, 2011

Mac in general has pretty good PDF support, I think. Adobe Acrobat will OCR files. But 60,000 pages is a lot. That must be at least 100 GB? It will take forever. Also, it probably won't handle handwritten documents or poor microfilm quality.

Given the number of documents and the size of the corpus, I would check with the developers of these apps whether they can handle this many files.

It may be worth contacting the archivist at your local library/historical society, to see if they can recommend any specialized archive applications for this. This seems like more of a professional/prosumer task than a consumer one.
posted by carter at 8:48 AM on November 14, 2011

If Zotero didn't cut it, you might look at Mendeley, which has a downloadable component.
posted by canine epigram at 9:04 AM on November 14, 2011

She might consider OneNote. It has a nice hierarchical notebook metaphor (pages within sections within notebooks), and you can import a PDF and then make notes along the side or add tags to the page. I should emphasize that any notes/tags/highlights are all in the OneNote document, not changes made to the PDF. Not sure if you can export your tags later. I think she might have to import each PDF by hand, though it seems like she will need to look at each one anyhow.

I would rate OneNote ++ on flexibility and usability but - on playing well with other programs
posted by shothotbot at 9:16 AM on November 14, 2011

The social science lab I work in uses Mendeley, which I happen to quite like. I've used it in the past for dealing with ~1,000 PDFs during a literature review, and it was easy to use and has a great tagging and search system. Not sure how it would work at the scale you are looking at, but it's free, so it might be worth a go. The other thing to note is that it doesn't offer OCR. You would have to do that separately in a standalone program.

Other programs might include DEVONthink, which a friend of mine uses and I keep wishing I had a Mac for. It looks absolutely awesome and might be worth exploring as an option. That friend wrote her PhD with it and had no complaints that I am aware of. I have also heard Sente is good (Mac-based as well).

Just a note: I don't think Evernote would be a good solution for this. I tried it for a project I was working on, and your girlfriend would be better off going with a bibliographic program. Evernote got overwhelmed by the 100 PDFs I was testing with, and you might run into storage issues if you have a free account.
posted by snowysoul at 9:45 AM on November 14, 2011 [1 favorite]

Nuance's PDF Converter Pro does batch PDF OCR and is reasonably affordable (you can try it out with a free trial, and educational discounts are available.)
posted by canine epigram at 10:10 AM on November 14, 2011 [1 favorite]

A happy user of Papers, mentioned earlier. Mac only; will never be ported to Windows.
Another upcoming program is Qiqqa.
posted by zaxour at 11:07 AM on November 14, 2011

+1 for Mendeley. Is cross-platform. Used Papers for a while but can't use Mac exclusively :(
posted by bullox at 3:08 PM on November 14, 2011

Can Papers search full-text? I downloaded a trial version and threw some PDFs in there. It did a good job, but when I tried some sample searches I was not searching the full-text of my PDFs, just the citation info and notes I'd attached to each PDF. I tried several tests with the same result, but never figured out if I had some setting off somewhere.

Zotero has a standalone version in beta; I'm hoping it can handle high numbers better than the browser-based version.
posted by lillygog at 3:51 PM on November 14, 2011

To clarify, my point with Papers is that if it doesn't search the full-text of her PDFs that might be a deal-breaker for your girlfriend.
posted by lillygog at 3:53 PM on November 14, 2011
posted by cupcake1337 at 9:04 PM on November 14, 2011

Omnipage Professional from Nuance will do bulk conversions of image PDFs into image-on-text PDFs, which will then be searchable using any PDF indexer. It also includes a workflow editor that you can use to tweak the initial settings for best results (manually defining headers, footers, columns, etc.).

Don't buy it; it's likely that the school has a volume license or has it installed somewhere accessible for her to use in a one-time weekend OCR binge.
posted by benzenedream at 2:34 AM on November 15, 2011

Mendeley has my vote. It has made the dissertation process a LOT easier, and I highly recommend it.
posted by richmondparker at 10:34 AM on November 18, 2011

In case anyone is interested in this, here's what we found out: If you're on a PC, Mendeley is probably your best bet. It handled 20,000+ documents easily, and was relatively easy to use. However, after seeing Papers in action, and hearing general testimonials about the MacBook Air, she decided to go the Mac route.

Thanks for all your suggestions!
posted by arco at 12:20 PM on December 14, 2011

I don't use it, but I don't think Papers is Mac-exclusive. I am seeing Papers for Windows (Windows requires XP Service Pack 3, Vista or Windows 7) here... that might put it back on the table.
From the website: You can use your Papers license on either your PC or your Mac. Use a PC at work and a Mac at home, or vice versa?

I have heard nice things about Sente (expensive, but has an 'undergrad' option for about $30). Also, you may like to check through this listing comparing some of the features/specs of various academic/reference management software.
posted by infinite intimation at 5:56 AM on September 11, 2012
