How to automatically extract graphical content from PDFs?
May 9, 2007 1:33 PM
Are there any software packages or toolkits (preferably open source) available that allow me to automatically extract graphical content (such as pictures, diagrams, graphs, etc.) from batches of PDFs?
I'm working on a grad school project in which I want to automatically extract all graphical content from batches of PDFs.
By graphical content, I mean pictures, graphs, diagrams... anything that's visual and not part of the full text.
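One possible starting point, offered as a rough sketch only: a picture embedded as a JPEG is usually stored in the PDF as a stream with the `/DCTDecode` filter, and those stream bytes are the JPEG data itself. The standard-library Python below carves such streams out of the raw file. It is assumption-laden: it ignores Flate-compressed bitmaps, images behind filter chains, and encrypted files. It also cannot recover vector graphics such as most plots and diagrams, because those are drawn with page operators rather than stored as images and would need to be rasterized instead. For real batches, Poppler's `pdfimages` command-line tool covers far more cases.

```python
import re
from pathlib import Path
from typing import List

# Matches the "stream" keyword that opens a stream body, but not "endstream".
_STREAM_RE = re.compile(rb"(?<!end)stream\r?\n")


def carve_jpegs(pdf_bytes: bytes) -> List[bytes]:
    """Return raw JPEG payloads of /DCTDecode image streams in a PDF.

    Assumption: the image is stored as plain JPEG data (a single DCTDecode
    filter, no encryption); anything else is silently skipped.
    """
    jpegs = []
    for match in _STREAM_RE.finditer(pdf_bytes):
        start = match.end()
        end = pdf_bytes.find(b"endstream", start)
        if end == -1:
            continue
        # The object's dictionary sits between "N 0 obj" and "stream".
        dict_start = pdf_bytes.rfind(b" obj", 0, match.start())
        header = pdf_bytes[max(dict_start, 0):match.start()]
        if b"/DCTDecode" not in header:
            continue
        # Drop the end-of-line marker that precedes "endstream".
        payload = pdf_bytes[start:end].rstrip(b"\r\n")
        if payload.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
            jpegs.append(payload)
    return jpegs


if __name__ == "__main__":
    # Hypothetical input folder of PDFs; each JPEG is written next to the script.
    folder = Path("papers")
    if folder.is_dir():
        for pdf_path in sorted(folder.glob("*.pdf")):
            for i, jpeg in enumerate(carve_jpegs(pdf_path.read_bytes())):
                Path(f"{pdf_path.stem}-{i}.jpg").write_bytes(jpeg)
```

Carving by byte patterns keeps the sketch dependency-free; a proper PDF parser would instead resolve each page's XObjects and honor `/Length`.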
I would also like to automatically extract any caption a picture has, and perhaps the surrounding text, say half a page before and after the picture.
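If you can get page coordinates for each text line and each picture (layout-analysis libraries such as pdfminer.six report bounding boxes), matching captions becomes mostly geometry. The sketch below uses a hypothetical layout model, `TextLine` and `ImageBox`, with y measured in points and growing down the page. It treats the first line starting with "Figure N" or "Fig. N" just below a picture as its caption, and it gathers same-page text within a window above and below. The default 396 pt window is half a US Letter page (792 pt tall). Text on neighboring pages is ignored, which is a simplification.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical layout model: y is the top edge of a line, in points, growing downward.
@dataclass
class TextLine:
    page: int
    y: float
    text: str


@dataclass
class ImageBox:
    page: int
    top: float
    bottom: float


CAPTION_RE = re.compile(r"^(Figure|Fig\.)\s*\d+", re.IGNORECASE)


def find_caption(img: ImageBox, lines: List[TextLine], max_gap: float = 40.0) -> Optional[str]:
    """Return the first 'Figure N' line that starts within max_gap points below the image."""
    below = sorted(
        (ln for ln in lines if ln.page == img.page and 0 <= ln.y - img.bottom <= max_gap),
        key=lambda ln: ln.y,
    )
    for ln in below:
        if CAPTION_RE.match(ln.text):
            return ln.text
    return None


def surrounding_text(img: ImageBox, lines: List[TextLine], window: float = 396.0) -> str:
    """Join same-page lines lying within `window` points above or below the image."""
    near = [
        ln for ln in lines
        if ln.page == img.page
        and (img.top - window <= ln.y < img.top or img.bottom <= ln.y <= img.bottom + window)
    ]
    return " ".join(ln.text for ln in sorted(near, key=lambda ln: ln.y))
```

Note that the caption itself also lands in the surrounding text; filter it out if you want the two kept separate.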
I'm trying to build a set of pictures from a large batch of PDFs, and classify/tag them based on the content of the captions or nearby text.
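For the tagging step, a simple baseline is keyword matching over the caption and nearby text, with caption words counted more heavily. The `TAG_KEYWORDS` vocabulary and the caption weight of 2 below are made-up assumptions for illustration; a real project might replace this with a trained text classifier.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Hypothetical tag vocabulary; a real one would come from your own domain.
TAG_KEYWORDS: Dict[str, Set[str]] = {
    "chart": {"graph", "plot", "chart", "histogram", "axis"},
    "diagram": {"diagram", "schematic", "architecture", "flowchart"},
    "photo": {"photograph", "photo", "micrograph"},
}


def tag_figure(caption: str, context: str = "", caption_weight: int = 2) -> List[str]:
    """Return tags ordered by keyword score, best first; empty if nothing matches."""
    scores: Counter = Counter()
    for text, weight in ((caption, caption_weight), (context, 1)):
        for word in re.findall(r"[a-z]+", text.lower()):
            for tag, keywords in TAG_KEYWORDS.items():
                if word in keywords:
                    scores[tag] += weight
    return [tag for tag, _ in scores.most_common()]
```

For example, a caption mentioning a "histogram" outweighs a single mention of "architecture" in the nearby text, so `chart` ranks ahead of `diagram`.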
Thanks for any help!
posted by elbaso to computers & internet (4 answers total)
posted by roue at 1:46 PM on May 9, 2007