Is there software that can organize and search PDFs based on their subjects?
May 18, 2009 9:56 AM
How do you make journal article PDFs searchable by keywords, controlled-vocabulary subject headings, authors, and titles?
I'm working for a team of researchers who would like to share their personal files of journal articles and reports with each other. The files are primarily stored as PDFs, but there are also some Word documents. They use Windows machines, and the core group shares a network drive. They would like a more organized system that allows more precise searching and useful sorting.

I've looked at various bibliographic management programs, but I'm hoping for something that can grab metadata (like pre-defined subject headings) from a massive quantity of PDFs, rather than from imported citations. I'm not sure whether those programs do that, or whether the PDFs even have that information embedded in them. I've also considered document management systems, but wonder if that might be overkill, and we have limited IT support. That said, if I could find one that generates RSS feeds for the different researchers, based on the subjects of new articles added, that would be amazing, and also something they would very much like. Automatically generated hierarchical folders would be nice, unless I could convince them that search tools make that unnecessary.

Will Owl, Alfresco, KnowledgeTree, OpenKM, or SharePoint work? How difficult is it to implement these systems? Is it better to just stick to bibliographic software like Zotero, Aigaion, or Connotea? Which one of these would be ideal?
Posting anonymously since I'd rather be discreet for my employer's sake.
posted by anonymous to computers & internet (4 comments total)
The code for adding in metadata looks something like this in C#:
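// Fill in the standard PDF document information fields.
PdfDocumentMetadata meta = new PdfDocumentMetadata();
meta.Title = "Ethel the Frog Goes Quantity Surveying";
meta.Author = "Eric Idle";
meta.Creator = "Plinth"; // etc.
// outputStream is assumed to be an already-open stream on the target PDF.
meta.Append(outputStream, false);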
From the code, it looks like the licensing will work with dotImage Photo, which is our budget product. You can also add custom PDF fields in addition to the standard ones.
It is not an out-of-the-box solution; you would need to write code for this. But since you listed SharePoint as a possible repository, it will integrate well with it, as SharePoint is .NET friendly.
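To give a rough idea of the kind of code involved, here is a minimal sketch of the indexing side. It is not tied to any particular PDF library. It walks a folder tree and groups PDF files by subject heading. The names `SubjectIndex`, `Build`, and `readSubjects` are made up for illustration, and `readSubjects` is a hypothetical callback standing in for whatever library call actually reads the subject fields out of a PDF.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SubjectIndex
{
    // Builds a map from subject heading to the PDF files carrying it.
    // readSubjects is a hypothetical callback: given a file path, it returns
    // the subject headings stored in that PDF (supply your own implementation).
    public static Dictionary<string, List<string>> Build(
        string root, Func<string, IEnumerable<string>> readSubjects)
    {
        // Treat "Ecology" and "ecology" as the same heading.
        var index = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);

        // Visit every PDF under root, including subfolders.
        foreach (string path in Directory.EnumerateFiles(root, "*.pdf", SearchOption.AllDirectories))
        {
            foreach (string subject in readSubjects(path))
            {
                if (!index.TryGetValue(subject, out List<string> files))
                {
                    files = new List<string>();
                    index[subject] = files;
                }
                files.Add(path);
            }
        }
        return index;
    }
}
```

You would call it with the root of the shared drive and your own reader, for example `SubjectIndex.Build(sharedRoot, ReadSubjectsFromPdf)`, where both names are placeholders. The resulting map is the sort of thing you could then use to drive subject-based searches or per-subject feeds.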
posted by plinth at 10:44 AM on May 18