OCR solutions?
November 15, 2011 2:25 PM Subscribe
Looking for hardware/software that will scan a physical form with handwritten fields and generate accurate delimited text from it.
The form can be altered/designed to fit the constraints of the system, within reason. Looking for OCR rather than some sort of Scantron/bubble system. What's out there right now?
-----form example-----
prompt1: [_____]
prompt2: [_____]
prompt3: [_____]
date: [__-__-__]
serial: [___-____]
-----end form example-----
-----desired output-----
xxxx,yyyy,zzzz,11-15-2011,A1-01234
-----end desired output-----
Ideally it would be robust enough to withstand smudges/faint ink on the standard, unchanging form portions. Most of the handwritten information will be numbers.
Also slightly interested in libraries (any programming language) that enable this sort of task.
The RecoStar OCR/ICR engine will do this.
The accuracy isn't too bad compared with other tools I've seen and used.
You can set up form layouts from a scanned sample. The engine will then recognize the form in an image and pull the text out of each field for you.
posted by plinth at 3:20 AM on November 16, 2011
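For anyone who wants to see the general idea of template-based ("zonal") extraction in code, here is a minimal Python sketch. It is not RecoStar's API. The `Zone` class, the `crop`, `extract_record`, and `to_delimited` helpers, and the pluggable `recognize` callback are all hypothetical names made up for illustration, and the image is assumed to be a plain list of pixel rows that has already been aligned to the template.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumption: a scanned page is a list of rows of grayscale pixel values.
Image = Sequence[Sequence[int]]


@dataclass(frozen=True)
class Zone:
    """A named rectangle on the form template (hypothetical structure)."""
    name: str
    top: int
    left: int
    height: int
    width: int


def crop(image: Image, zone: Zone) -> List[List[int]]:
    """Cut the pixels of one field out of the full page."""
    rows = image[zone.top:zone.top + zone.height]
    return [list(row[zone.left:zone.left + zone.width]) for row in rows]


def extract_record(
    image: Image,
    zones: Sequence[Zone],
    recognize: Callable[[List[List[int]]], str],
) -> List[str]:
    """Run the supplied recognizer on each zone, in template order."""
    return [recognize(crop(image, zone)) for zone in zones]


def to_delimited(record: Sequence[str]) -> str:
    """Join the field values into one comma-delimited line."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(record)
    return buffer.getvalue().rstrip("\r\n")
```

The `recognize` callback is where a real OCR/ICR engine would plug in. Keeping it as a parameter means the zone layout can be tested with a stub before any engine is chosen.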
Caveats
1. It has been a while since I used the software, but I am pretty sure it has those capabilities. That said, IANAOCRS.
2. Even great OCR isn't 100% perfect. What is the outcome if, on 1% of the forms, xxxx,yyyy,zzzz,11-15-2011,A1-01234 comes out as kkkk,yyyy,zzzz,II-15-2011,A1-01234? Mere annoyance, or a poor kid not getting into Harvard? Either way, you will have to go over, manage, and probably proof these forms anyway.
3. Depending on how bad the handwriting is, recognition accuracy could be a problem.
posted by xetere at 7:34 PM on November 15, 2011
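To make caveat 2 concrete, here is a rough Python sketch of a post-OCR check that flags suspicious records for manual proofing. The field order is taken from the desired output in the question, but the exact formats are assumptions read off the single example line (MM-DD-YYYY for the date, one letter, one digit, a hyphen, and five digits for the serial). `FIELD_NAMES`, `CHECKS`, and `fields_needing_review` are hypothetical names.

```python
import re
from datetime import datetime
from typing import List

# Field order from the desired output in the question.
FIELD_NAMES = ["prompt1", "prompt2", "prompt3", "date", "serial"]

# Assumed formats, inferred from the one example line in the question.
DATE_RE = re.compile(r"\d{2}-\d{2}-\d{4}")
SERIAL_RE = re.compile(r"[A-Z]\d-\d{5}")


def _valid_date(value: str) -> bool:
    """True if the value looks like MM-DD-YYYY and is a real calendar date."""
    if not DATE_RE.fullmatch(value):
        return False
    try:
        datetime.strptime(value, "%m-%d-%Y")
    except ValueError:
        return False
    return True


def _valid_serial(value: str) -> bool:
    """True if the value matches the assumed serial format."""
    return SERIAL_RE.fullmatch(value) is not None


CHECKS = {"date": _valid_date, "serial": _valid_serial}


def fields_needing_review(line: str) -> List[str]:
    """Return the names of fields a human should proof for this record."""
    values = line.split(",")
    if len(values) != len(FIELD_NAMES):
        # Wrong field count: the whole record is suspect.
        return list(FIELD_NAMES)
    flagged = []
    for name, value in zip(FIELD_NAMES, values):
        check = CHECKS.get(name)
        if check is not None and not check(value):
            flagged.append(name)
    return flagged


print(fields_needing_review("kkkk,yyyy,zzzz,II-15-2011,A1-01234"))
```

A format check like this catches the II-15-2011 misread but not the kkkk one, since free-form fields have no pattern to violate. Those still need a human proofreader or some other cross-check.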