OCR solutions?
November 15, 2011 2:25 PM

Looking for hardware/software that will scan a physical form with handwritten fields and generate accurate delimited text from it.

The form can be altered/designed to fit the constraints of the system, within reason. Looking for OCR rather than some sort of scantron/bubble system. What's out there right now?

-----form example-----

prompt1: [_____]
prompt2: [_____]
prompt3: [_____]
date: [__-__-__]
serial: [___-____]

-----end form example-----

-----desired output-----
xxxx,yyyy,zzzz,11-15-2011,A1-01234
-----end desired output-----

Ideally it would be robust enough to withstand smudges/faint ink on the standard, unchanging form portions. Most of the handwritten information will be numbers.

Also slightly interested in libraries (any programming language) that enable this sort of task.
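
For the last step (recognized fields to delimited text), here is a minimal Python sketch of what the output stage might look like. It assumes some OCR step has already produced one string per field; the names FIELD_ORDER and to_delimited are made up for illustration and do not belong to any OCR product.

```python
import csv
import io

# Assumed column order, taken from the form example above.
FIELD_ORDER = ["prompt1", "prompt2", "prompt3", "date", "serial"]

def to_delimited(fields, delimiter=","):
    """Join already-recognized field values into one delimited line.

    `fields` is a dict of field name -> recognized text. Using the csv
    module means a value that contains the delimiter gets quoted
    instead of silently breaking the column count.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="")
    writer.writerow(fields[name] for name in FIELD_ORDER)
    return buf.getvalue()

recognized = {
    "prompt1": "xxxx",
    "prompt2": "yyyy",
    "prompt3": "zzzz",
    "date": "11-15-2011",
    "serial": "A1-01234",
}
print(to_delimited(recognized))
```

The recognition itself is the hard part; this only covers turning its results into the desired line.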
posted by jsturgill to Technology (2 answers total)
 
I haven't used it in a while, but I know OmniPage lets you set up ignore fields very easily, so for the form in question it will only recognize the handwritten form fields. If all the printed paper forms are identical in format, which I am assuming they will be, you can use a page as a template with the OCR and ignore fields demarcated, so you don't have to define the ignore fields on every form. You might even be able to get one of those feeder scanners and have it do a batch job. I remember it worked pretty well.
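
To illustrate the template idea in general terms (this is not OmniPage's API, just a sketch of the concept): a template can be thought of as a set of named rectangles, and everything outside them is ignored. The names crop and extract_zones, and the coordinates, are hypothetical. Rows of text stand in for rows of pixels so the example stays self-contained.

```python
def crop(page, box):
    """Return the part of `page` inside `box`.

    `page` is a list of rows (each row a sequence of pixels, or here
    characters); `box` is (left, top, right, bottom), right and bottom
    exclusive.
    """
    left, top, right, bottom = box
    return [row[left:right] for row in page[top:bottom]]

def extract_zones(page, template):
    """Cut out every zone named in the template; the rest is ignored."""
    return {name: crop(page, box) for name, box in template.items()}

# Hypothetical template, defined once and reused for every identical form.
TEMPLATE = {
    "prompt1": (10, 0, 15, 1),
    "date": (7, 1, 15, 2),
}

# Stand-in "scan": characters instead of pixels.
page = [
    "prompt1: [12345]",
    "date: [11-15-11]",
]
zones = extract_zones(page, TEMPLATE)
```

Each cropped zone would then be handed to the recognizer on its own, which is why smudges on the printed labels outside the zones stop mattering.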

Caveats

1. It has been a while since I used the software, and while I am pretty sure it has those capabilities, IANAOCRS.
2. Even great OCR isn't 100% perfect. What is the outcome if on 1% of the forms xxxx,yyyy,zzzz,11-15-2011,A1-01234 comes out as kkkk,yyyy,zzzz,II-15-2011,A1-01234? Mere annoyance, or a poor kid not getting into Harvard? So you will probably have to go over and proof these forms anyway.
3. Depending on how bad the handwriting is, recognition accuracy could be a problem.
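
On caveat 2, some of the proofing can be narrowed down by checking each output line against the expected field formats and only hand-reviewing the ones that fail. A rough Python sketch follows; the date and serial patterns are guesses based on the sample output in the question, and needs_review is a made-up name.

```python
import re
from datetime import datetime

# Assumed formats, inferred from the sample "11-15-2011" and "A1-01234".
DATE_RE = re.compile(r"\d{2}-\d{2}-\d{4}")
SERIAL_RE = re.compile(r"[A-Z]\d-\d{5}")

def needs_review(row):
    """Return the names of fields in a delimited row that look wrong."""
    fields = row.split(",")
    if len(fields) != 5:
        return ["field count"]
    date, serial = fields[3], fields[4]
    problems = []
    if not DATE_RE.fullmatch(date):
        problems.append("date")
    else:
        try:
            # Catches well-formed but impossible dates such as 13-45-2011.
            datetime.strptime(date, "%m-%d-%Y")
        except ValueError:
            problems.append("date")
    if not SERIAL_RE.fullmatch(serial):
        problems.append("serial")
    return problems
```

This flags the II-15-2011 misread, but not kkkk in place of xxxx: a free-text field has no pattern to check against, so those still need a human eye or a known list of valid values.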
posted by xetere at 7:34 PM on November 15, 2011


The RecoStar OCR/ICR engine will do this.
The accuracy isn't too bad compared with other tools I've seen and used.

You can define a form layout from a scanned sample. The engine will then recognize the form in an image and pull the text out for you.
posted by plinth at 3:20 AM on November 16, 2011

