Document scanning, archiving, and OCR?
February 16, 2005 7:22 AM

Document scanning and archiving? (MI)

I'm organizing my new employer's files. I don't have massive quantities to scan, but doing it by hand and creating PDFs with, say, Acrobat could take days. I also need to organize documents and images that are already in electronic formats, preferably in a seamless folder system that also holds the scanned images. I'm trying to decide whether OCR is necessary for searchability, but I'm having trouble weighing its value against the labor required (I'd have to scan and then convert to PDF to prevent alteration; these are long legal documents). If I sacrifice the search functionality, I can just scan them in directly as PDFs, assuming I've got the right software. An additional factor: the company needs to be able to create PDFs for other purposes as well, so bonus points for recommending software that includes this feature. Am I making any sense? (I'm only one cup of coffee into the morning.)

I have of course googled, but I'm looking for recommendations (ease of use, short learning curve, speed, etc.). Any experience with PaperPort or similar programs?
posted by mireille to Computers & Internet (7 answers total)
Don't know if you've used OmniPage Pro v14 before, but it's an extremely powerful OCR program. It can OCR documents from images. It also has a Word plug-in that lets you import PDF files directly (whether they contain images or text, it will convert them to Word format). It's the only program I know of that does this.
posted by Civil_Disobedient at 7:39 AM on February 16, 2005

Oh, and obviously once they're in Word format, it's a simple matter to go to HTML, or dump the text into a database, or whatever you wish to do with it.
posted by Civil_Disobedient at 7:40 AM on February 16, 2005

Response by poster: I haven't used this type of software before, so...

OmniPage looks like about half of what I need (and a good, easy half), but I'm thinking that the same technology is bundled with PaperPort because they're from the same company. The other half that I require is the ability to file and organize all docs (scanned hardcopy plus original electronic documents and images) so that my employers can find information easily, and we're not talking about people who are particularly tech-savvy. This can be accomplished either by having a search function built into the software, or by me completely overhauling the current folder system (and to make it worse, between the President, the VP and the remote server, there are many cases of three copies of the same doc; I'm already going to have to weed out all duplicates and find the most recent/accurate version). Whatever solution I decide on, it's going to have to be intuitive for a couple of people who aren't, necessarily.
posted by mireille at 8:06 AM on February 16, 2005
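
For the duplicate weeding described above, a small script can at least find the exact copies before any manual review. What follows is only a rough sketch in Python, not a feature of any product mentioned in this thread. The function names `file_digest` and `find_duplicates` are made up for illustration, and it assumes the three locations (President, VP, remote server) are reachable as ordinary folders. It only catches byte-identical files; copies that were edited after being duplicated will not match and still need a human to pick the accurate one.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Group files under the given folders by content hash.

    Returns a list of groups. Each group lists the paths whose bytes are
    identical, sorted newest first by modification time.
    """
    by_digest = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                by_digest[file_digest(path)].append(path)

    groups = []
    for paths in by_digest.values():
        if len(paths) > 1:  # keep only content that appears more than once
            groups.append(sorted(paths, key=os.path.getmtime, reverse=True))
    return groups
```

Calling `find_duplicates` with a list of the folder paths returns the groups for review; nothing is deleted. The modification time is only a hint about which copy is most recent, since copying files between machines can reset it.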

This looks interesting. If you try it out I'd like to know how it performs.
posted by omnidrew at 8:22 AM on February 16, 2005

I believe the latest version of Adobe Acrobat Standard lets you create PDFs directly from a scanner. There is also a product called Acrobat Capture which is specifically designed to do this.

(Full disclosure: I work as an engineer for the company, but my comments here do not represent those of Adobe Systems Incorporated in any way.)
posted by tsackett at 10:06 AM on February 16, 2005

We generally scan directly to Acrobat for plain image PDFs and to FineReader for OCR. After the OCR process is done, the file can then be saved to image+text PDF, to TXT, to DOC, to RTF, or to several other formats, and several saves can be done for multiple versions.
posted by yclipse at 1:15 PM on February 16, 2005

Response by poster: Thanks, all. This is more than I had before, which helps!
I'll mark a 'best answer' if one of these turns out to be...
posted by mireille at 11:37 PM on February 16, 2005
