Image processing of scanned text
April 9, 2009 2:26 PM
I have 18 scanned copies of a 47-page document, each with handwriting on it. I want to extract the handwritten bits (i.e. compare the copies page by page and eliminate the "constant" part), despite skew, offset, and noise in some of the copies. I'd like to use Perl or Python with ImageMagick, gd, or something similar. Any pointers? I'm not talking about OCR -- just comparison, with one output being the graphical bits that don't match.
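Here is a minimal sketch of one way to do the comparison in Python with NumPy. It swaps in NumPy for the ImageMagick/gd tools mentioned above, and it assumes the following: every scan of a given page has already been loaded as a 2-D array of the same size, holding ink darkness (0.0 = white paper, 1.0 = black ink); skew has already been removed (for example with ImageMagick's `-deskew` option), so only a translation offset remains; and the margins are blank, so the wrap-around from `np.roll` is harmless. The function names (`estimate_shift`, `align`, `build_template`, `handwriting_mask`) are hypothetical. The offset is estimated by phase correlation. The "constant" part is approximated by taking the pixelwise median of all 18 aligned copies of a page, which works because handwriting falls in different places on different copies. New ink is whatever the page has that the median template lacks, with a small tolerance band around the printed strokes.

```python
import numpy as np


def estimate_shift(reference, page):
    """Estimate the integer (dy, dx) roll that aligns `page` with `reference`,
    using phase correlation (normalized FFT cross-power spectrum)."""
    cross = np.fft.fft2(reference) * np.conj(np.fft.fft2(page))
    cross /= np.abs(cross) + 1e-12  # keep only the phase difference
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # The FFT wraps around, so peaks past the midpoint are negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


def align(reference, page):
    """Shift `page` so it lines up with `reference` (translation only;
    assumes skew was corrected beforehand)."""
    return np.roll(page, estimate_shift(reference, page), axis=(0, 1))


def build_template(pages):
    """Approximate the blank form as the pixelwise median of all copies,
    each aligned to the first one. Handwriting lands in different places
    on different copies, so the median keeps mostly the printed part."""
    reference = pages[0]
    stack = np.stack([align(reference, p) for p in pages])
    return np.median(stack, axis=0)


def handwriting_mask(template, page, ink_threshold=0.5, tolerance=1):
    """Boolean mask (in template coordinates) of ink in `page` that is not
    ink in `template`. Arrays hold ink darkness: 0.0 = paper, 1.0 = ink."""
    aligned = align(template, page)
    template_ink = template > ink_threshold
    # Grow the printed ink by `tolerance` pixels so slightly misregistered
    # or blurred printed strokes are not reported as new ink.
    grown = template_ink.copy()
    for oy in range(-tolerance, tolerance + 1):
        for ox in range(-tolerance, tolerance + 1):
            grown |= np.roll(template_ink, (oy, ox), axis=(0, 1))
    return (aligned > ink_threshold) & ~grown
```

With Pillow, `1.0 - np.asarray(Image.open(path).convert("L"), dtype=float) / 255.0` produces the kind of array these functions expect. You would call `build_template` once per page number on the 18 scans of that page, then call `handwriting_mask` on each scan. Raising `tolerance` trades missed strokes that touch printed lines for fewer false hits from noise and residual misalignment. If the deskewing step leaves small rotations, one option is to try a handful of angles per scan and keep the one with the highest correlation peak.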
In case you're wondering, this is documentation for a clinical trial. The clinical history forms are printed, distributed to the physicians participating in the trial, filled out, stepped on, fed to the dog, and mailed back to the study center. Years later, lawyers hand reams of this paper to translators for discovery during a lawsuit in a different country. Yes, you are paying for that when you fill a prescription. Why do you ask? But I digress.
posted by Michael Roberts to computers & internet (12 comments total)
Hopefully someone smarter than me will answer your question.
posted by chairface at 3:21 PM on April 9