PDF metadata extraction for file management
May 30, 2013 10:15 AM

I know there are a lot of earlier posts about PDF management software (and others on Chronicle of Higher Education and elsewhere), but they seem out of date and don't work for my situation. Need: keep a PDF library, but must be able to import existing downloaded files and extract metadata. Zotero did this using Google Scholar and it worked well, but Google now severely limits how many metadata lookups can be done from the IP address it detects. I don't have the skills to work around this. So the problem remains: is there any way to import a large number of already-downloaded PDFs into PDF management software, recovering metadata? (Neither Papers nor Mendeley handles the metadata issue adequately, other virtues aside; EndNote is hopeless.) If there is no available solution, what would be the most efficient way to rebuild the library? Find all the papers online again and download the bibliographic information, which would be vastly tedious? I am on OSX but open-minded. Thanks.
posted by cogneuro to Computers & Internet (5 answers total) 6 users marked this as a favorite
 
Are you sure that the PDFs have the information you're hoping for in the metadata of the file itself?

I've generally found that to not be the case, and that to get the info I would like--complete citation info, doi, and abstract--I have to use the "Match" function in Papers. I can't tell from the way you phrased it if you have tried that, or if you are just unsatisfied with the metadata Papers finds in your files. You have to "Match" one file at a time, but it is very rare for it to not find complete metadata online.
posted by hydropsyche at 10:33 AM on May 30, 2013


I've never used it, but apparently WizFolio will query PubMed (thus Medline) for these data, which should yield much higher quality than Google Scholar. Even if you end up hating WizFolio I imagine that you could then export the records for use in another reference/PDF manager.
posted by pullayup at 11:23 AM on May 30, 2013


If you already have a lot of the PDF metadata in something (even if the metadata is crappy) you can try using "Find Reference Updates" in EndNote X5 (or higher) to grab additional metadata and missing fields: video tutorial

If your library consists mainly of PDFs with DOIs or PMIDs, but you don't even have crappy metadata for it, you might consider rebuilding it with Zotero's "Add Item(s) by Identifier" feature (it's the little magic wand icon). Copy/paste in the identifier, and it grabs the metadata from CrossRef or PubMed. You can do a PMID lookup in EndNote, but not DOI, last time I checked (there's a new version out very recently).
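
For a large library, the same identifier-based lookup could in principle be scripted. What follows is only a rough sketch, not how Zotero does it internally: it assumes you have already pulled the text of each PDF's first page out with some other tool, and it uses the public CrossRef REST endpoint at api.crossref.org/works/. The function names (find_doi, crossref_url, summarize, lookup) are made up for illustration, and the DOI pattern is a common heuristic that will miss unusual DOIs.

```python
import json
import re
import urllib.parse
import urllib.request

# Heuristic DOI shape (an assumption, not an official grammar):
# "10.", a 4-9 digit registrant prefix, "/", then a suffix without spaces.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(text):
    """Return the first DOI-like string found in text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;)")


def crossref_url(doi):
    """Build the CrossRef works URL for a DOI."""
    return "https://api.crossref.org/works/" + urllib.parse.quote(doi, safe="/")


def summarize(record):
    """Pick a few citation fields out of a decoded CrossRef JSON response."""
    msg = record.get("message", {})
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in msg.get("author", [])
    ]
    titles = msg.get("title") or [None]
    journals = msg.get("container-title") or [None]
    date_parts = msg.get("issued", {}).get("date-parts") or [[None]]
    year = date_parts[0][0] if date_parts[0] else None
    return {
        "title": titles[0],
        "authors": authors,
        "journal": journals[0],
        "year": year,
    }


def lookup(doi):
    """Fetch and summarize one record (requires network access)."""
    with urllib.request.urlopen(crossref_url(doi), timeout=30) as response:
        return summarize(json.load(response))
```

Running find_doi over the extracted text of each file and passing any hit to lookup would give a title, author list, journal, and year to import elsewhere. Files with no DOI in the text would still need to be matched by hand or by a PMID lookup.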
posted by unknowncommand at 11:44 AM on May 30, 2013 [2 favorites]


Response by poster: Papers2 is much improved since the last time I looked, and also got bought up by Springer, for better and for worse.
posted by cogneuro at 6:06 PM on May 30, 2013


Response by poster: Well, I should have taken a second look at Papers2 before posting. It's much improved, especially the match function. I see they've been bought out by Springer, the academic publishing behemoth.
posted by cogneuro at 6:08 PM on May 30, 2013

