Sort, OCR, export text from PDFs
March 17, 2008 11:13 AM
I need a server-side, Linux- or Unix-based software solution that sorts uploaded PDF files into three kinds: PDF-native files (created so that the text in the PDF can be freely copied), PDFs with embedded text over images (usually the result of an earlier OCR job), and PDF-scanned files, which contain no text, only scanned images. For the PDF-native files and the PDFs with embedded text, it should extract the text; for the PDF-scanned files, it should run OCR and export the resulting text.
This means it should not be Windows-based, it should not run on the client or desktop side, and it should be scriptable.
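One way to script this on a Unix server is to glue together existing command-line tools. The sketch below assumes poppler-utils (`pdftotext`, `pdfinfo`, `pdfimages -list`, `pdftoppm`) and the `tesseract` CLI (3.03 or later, for the `stdout` output name) are installed. The classification rule is a hypothetical heuristic, not a standard: no extractable text means "scanned"; text plus at least one image per page suggests a prior OCR job; anything else counts as native. A native PDF full of figures could be misclassified, and `MIN_TEXT_CHARS` is a guessed threshold.

```python
#!/usr/bin/env python3
"""Sketch: sort PDFs into native / text-over-image / scanned, then get their text.

Assumes poppler-utils and tesseract are on PATH; thresholds are guesses.
"""
import glob
import os
import subprocess
import sys
import tempfile

MIN_TEXT_CHARS = 20  # hypothetical: fewer non-space chars means "no text layer"


def run(args):
    """Run a command-line tool and return its stdout decoded as UTF-8."""
    return subprocess.run(args, check=True, capture_output=True,
                          encoding="utf-8", errors="replace").stdout


def classify(text, image_count, page_count):
    """Heuristic classification from facts gathered about one PDF."""
    has_text = len("".join(text.split())) >= MIN_TEXT_CHARS
    if not has_text:
        return "scanned"
    # A text layer plus roughly one image per page suggests an earlier OCR job.
    if page_count > 0 and image_count >= page_count:
        return "text_over_image"
    return "native"


def count_pages(pdf):
    for line in run(["pdfinfo", pdf]).splitlines():
        if line.startswith("Pages:"):
            return int(line.split()[1])
    return 0


def count_images(pdf):
    # `pdfimages -list` prints two header lines, then one line per image.
    lines = run(["pdfimages", "-list", pdf]).splitlines()[2:]
    return sum(1 for line in lines if line.strip())


def ocr(pdf):
    """Rasterize each page at 300 dpi and OCR it with tesseract."""
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(["pdftoppm", "-r", "300", "-png", pdf,
                        os.path.join(tmp, "page")], check=True)
        pages = sorted(glob.glob(os.path.join(tmp, "page*.png")))
        return "\n".join(run(["tesseract", p, "stdout"]) for p in pages)


def extract(pdf):
    """Return (kind, text), falling back to OCR only for scanned PDFs."""
    text = run(["pdftotext", "-enc", "UTF-8", pdf, "-"])
    kind = classify(text, count_images(pdf), count_pages(pdf))
    if kind == "scanned":
        text = ocr(pdf)
    return kind, text


if __name__ == "__main__":
    for path in sys.argv[1:]:
        kind, text = extract(path)
        out = os.path.splitext(path)[0] + ".txt"
        with open(out, "w", encoding="utf-8") as fh:
            fh.write(text)
        print(f"{path}\t{kind}\t{out}")
```

Run as, for example, `python3 sort_pdfs.py uploads/*.pdf` (the script name is hypothetical) from cron or an upload hook; it writes a `.txt` beside each PDF and prints one tab-separated line per file.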
posted by Mo Nickels to computers & internet (4 comments total)
posted by onalark at 11:22 AM on March 17, 2008