DIY document imaging
July 26, 2012 6:41 AM
How can I convert a color or greyscale scan from a flatbed scanner to a reasonably high quality black and white image, comparable to what a high-end copier would produce?
I store my personal records almost purely as scans, rather than paper. At two past jobs, I've had access to big multifunction copier/scanner/fax machines (most recently, a Xerox 7655), and they chew through 8.5x11 sheets like candy, outputting nice small 300dpi B&W CCITT Group 4 files while keeping the content extremely readable (even pictures look halfway decent, though I care far more about the text).
Unfortunately, sheet feeders don't work well on odd-sized paper, so I scan those sheets on a flatbed scanner. Flatbed scanners generally let the user pick settings like resolution and color depth, but choosing 300dpi B&W for anything other than nice dark text on bright white paper almost always produces something illegible, covered in mostly-black or mostly-white speckles. And scanning to greyscale generates files a good 20-30x bigger than B&W.
In GIMP, I can manually tweak such files down to something halfway decent (usually a few passes of contrast and edge enhancement), but the exact steps I take for one file might leave another completely useless, so I can't automate the process.
I'm not specifically after a program recommendation (though I'll welcome any feedback); what I want is a description of a series of image processing filters, in any mainstream photo manipulation program like Photoshop or GIMP, that will convert 8-bit greyscale to legible B&W.
The fact that a "real" copier/scanner can do a good job at this without any manual intervention gives me hope, but I just can't seem to come up with the be-all-end-all set of steps to get what I want.
Any suggestions?
posted by pla to technology (4 answers total)
I'd build up a script chaining Netpbm commands: pnmnorm to force the background to white, then pamditherbw to make a B&W image. (You might also want to take a look at this thread if you want to add OCR; Windows and OS X will index your files so you can quickly find them based on content rather than file location.)
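To make the two stages concrete, here is a rough pure-Python sketch of the same idea: stretch the contrast so the paper goes to white, then reduce to 1 bit. It is not Netpbm's code. The function names, the 2%/1% clipping defaults, the 128 cutoff, and the list-of-rows image format are all assumptions chosen for illustration, and reading or writing actual image files is left out.

```python
# Sketch only: an image here is a list of rows of 8-bit grey values (0-255).
# Output bitmaps use 1 = black, 0 = white.

def normalize(pixels, black_pct=2.0, white_pct=1.0):
    """Stretch contrast so roughly the darkest black_pct percent of pixels
    become 0 and the brightest white_pct percent become 255."""
    flat = sorted(v for row in pixels for v in row)
    n = len(flat)
    lo = flat[min(n - 1, int(n * black_pct / 100))]
    hi = flat[max(0, n - 1 - int(n * white_pct / 100))]
    if hi <= lo:
        # Flat image: nothing to stretch, return an unchanged copy.
        return [row[:] for row in pixels]
    scale = 255.0 / (hi - lo)
    return [[min(255, max(0, round((v - lo) * scale))) for v in row]
            for row in pixels]

def threshold(pixels, cutoff=128):
    """Plain cutoff: anything darker than cutoff becomes black."""
    return [[1 if v < cutoff else 0 for v in row] for row in pixels]

def floyd_steinberg(pixels):
    """Error-diffusion dithering: push each pixel's rounding error onto
    its right and lower neighbours so grey areas keep their average tone."""
    h, w = len(pixels), len(pixels[0])
    work = [[float(v) for v in row] for row in pixels]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = work[y][x]
            new = 255.0 if old >= 128 else 0.0
            out[y][x] = 0 if new else 1
            err = old - new
            if x + 1 < w:
                work[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1][x - 1] += err * 3 / 16
                work[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1][x + 1] += err * 1 / 16
    return out

if __name__ == "__main__":
    # Faint grey "text" (140) on grey paper (180).
    page = [[180, 180, 180, 180],
            [180, 140, 140, 180],
            [180, 180, 180, 180]]
    print(threshold(page))             # text is lost without normalizing
    print(threshold(normalize(page)))  # text survives after the stretch
```

For scans that are mostly text, the plain cutoff after normalizing tends to give cleaner, more compressible output than dithering; the error-diffusion version is worth trying when the page has photos or shaded areas. Which one suits a given batch of scans is something to test rather than assume.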
posted by scruss at 7:19 AM on July 26, 2012