DIY document imaging
July 26, 2012 6:41 AM

How can I convert a color or greyscale scan from a flatbed scanner to a reasonably high quality black and white image, comparable to what a high-end copier would produce?

I store my personal records almost purely as scans, rather than paper. At two past jobs, I've had access to big multifunction copier/scanner/fax machines (most recently, a Xerox 7655), and they chew through 8.5x11 sheets like candy, outputting nice small 300dpi B&W CCITT4 files while keeping the content extremely readable (even pictures look halfway decent, though I care far more about the text).

Unfortunately, sheet feeders don't work well on odd sized paper, so I scan those on a flatbed scanner. Those generally let the user pick settings like resolution and color depth, but choosing 300dpi B&W, for anything other than nice dark text on bright white paper, almost always ends up producing something illegible, covered in mostly-black or mostly-white speckles. And scanning to greyscale generates files a good 20-30x bigger than B&W.

In Gimp, I can manually tweak such files down to something halfway decent (usually a few passes of contrast and edge enhancement), but the exact steps I take for one file might leave another completely useless, so I can't automate the process.

I don't specifically want a recommendation of a program (though I'll welcome any feedback) to do this; I more want a description of a series of image processing filters in any mainstream photo manipulation program like Photoshop or Gimp that will convert 8 bit greyscale to legible B&W.

The fact that a "real" copier/scanner can do a good job at this without any manual intervention gives me hope, but I just can't seem to come up with the be-all-end-all set of steps to get what I want.

Any suggestions?
posted by pla to Technology (4 answers total)
 
Best answer: A "real" copier/scanner basically cranks up the exposure/contrast so that the background goes to pure white. This is harder to do when trying to back out of a greyscale file, so there may not be an automatic general solution.

I'd build up a script chaining Netpbm commands: pnmnorm to force the background to white, then pamditherbw to make a B&W image. (You might also want to take a look at this thread if you want to add OCR; Windows and OS X will index your files so you can quickly find them based on content rather than file location.)
posted by scruss at 7:19 AM on July 26, 2012 [1 favorite]
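
The two steps scruss describes (stretch the contrast so the paper goes to white, then reduce to 1 bit) can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Netpbm implementation: the function names, the `black_pct`/`white_pct` percentages and the fixed cutoff are assumptions, and a plain threshold stands in for whatever dithering method pamditherbw would apply.

```python
def normalize(pixels, black_pct=2.0, white_pct=1.0):
    """Linearly stretch 8-bit grey values so that the darkest black_pct
    percent of pixels map to 0 and the brightest white_pct percent map
    to 255. The percentages are illustrative assumptions."""
    flat = sorted(v for row in pixels for v in row)
    n = len(flat)
    lo = flat[min(n - 1, int(n * black_pct / 100))]
    hi = flat[max(0, n - 1 - int(n * white_pct / 100))]
    if hi <= lo:
        # Flat image: nothing to stretch, return a copy unchanged.
        return [row[:] for row in pixels]
    scale = 255.0 / (hi - lo)
    return [[max(0, min(255, round((v - lo) * scale))) for v in row]
            for row in pixels]


def threshold(pixels, cutoff=128):
    """Reduce to 1 bit per pixel: 1 = black ink, 0 = white paper."""
    return [[1 if v < cutoff else 0 for v in row] for row in pixels]


if __name__ == "__main__":
    # Hypothetical scan: dull grey paper (110) with one line of text (40).
    page = [[110] * 10 for _ in range(10)]
    for x in range(2, 8):
        page[4][x] = 40
    for row in threshold(normalize(page)):
        print("".join("#" if v else "." for v in row))
```

On a dull page like the one in the demo, thresholding the raw scan at 128 turns everything black, while normalizing first pushes the paper to white and leaves only the text. It does not help with uneven lighting across the page, since the stretch is global.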


Best answer: Like scruss, I use tools from the Netpbm library, but for the final downconversion to a 1-bit bitonal image I use mkbitmap (which ships with Potrace). I use this, literally, thousands of times a day in a batch file and it does an excellent job of making crisp black-and-white text images from grayscale.

What might be even easier: Does your scanner driver have the option to scan as a "fax"? Does it still let you change resolution? Scan something as a 300dpi or 600dpi fax, and you should get similar results to the software scruss and I recommend, but with fewer steps.
posted by AzraelBrown at 7:56 PM on July 26, 2012 [1 favorite]
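
mkbitmap's documentation describes its approach as a highpass filter followed by thresholding. A rough pure-Python sketch of that idea follows; it is not mkbitmap's actual code. The box blur used as the lowpass step, the function names, and the `radius` and `cutoff` values are all assumptions chosen for illustration.

```python
def box_blur(pixels, radius):
    """Local mean over a (2*radius+1) square window, clipped at the edges.
    A simple stand-in for a proper lowpass filter."""
    h, w = len(pixels), len(pixels[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        for x in range(w):
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            total = sum(pixels[j][i]
                        for j in range(y0, y1) for i in range(x0, x1))
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


def highpass(pixels, radius=4):
    """Subtract the local mean and re-centre on mid-grey (128), which
    removes slow brightness changes such as shadows or yellowed paper."""
    blurred = box_blur(pixels, radius)
    return [[max(0.0, min(255.0, v - b + 128.0))
             for v, b in zip(row, brow)]
            for row, brow in zip(pixels, blurred)]


def to_bitonal(pixels, radius=4, cutoff=0.45):
    """Highpass, then threshold on a 0..1 scale: 1 = black, 0 = white.
    radius and cutoff are illustrative assumptions."""
    return [[1 if v / 255.0 < cutoff else 0 for v in row]
            for row in highpass(pixels, radius)]
```

The point of the highpass step is that each pixel is compared with its own neighbourhood rather than with one page-wide cutoff, so text on a page that is darker on one side than the other can still come out cleanly. The naive blur above is slow on full-size scans; it is only meant to show the principle.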


Totally forgot about mkbitmap; it's very good for continuous lines to B&W. Can work well for text.

(Unfortunately, mkbitmap reminded me of the linked cartoon Loxie & Zoot, which I made the mistake of clicking through to, once ...)
posted by scruss at 6:38 PM on July 27, 2012


Response by poster: Thank you, both. Whether or not I use those specific tools, both have given me a few great ideas to try. In particular, I had honestly never thought about applying frequency filters to the image as a preprocessing step, but those samples look pretty much perfect.

Now to throw them at my backlog of 1500 or so mixed scans and see how they do! :)
posted by pla at 5:56 AM on July 28, 2012

