Scanning Shoddy Microfilm
May 13, 2007 11:27 AM

How do I improve the legibility of the horrible old Cyrillic newspaper microfilm negatives I am scanning? The main problem is that the background is often faded and hard to distinguish from the text. Is there some clever software trickery that will solve it?
posted by thirteenkiller to Computers & Internet (15 answers total)
 
Well, you can play with the scans in photoshop, increasing the contrast for example; but I doubt whether you will significantly improve legibility, because you can't add any more visual information over what you have. If bits of text are lost into the background, then I guess they are lost.
posted by londongeezer at 11:50 AM on May 13, 2007


Your scanning software may allow you to adjust the gamma settings. If not, save the scan as a greyscale image, then use photo editing software to adjust it.
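
For anyone who would rather script the gamma step than do it by hand, here is a minimal sketch. It assumes the scan has already been loaded as rows of 8-bit grey values (0 to 255) in plain Python lists; `apply_gamma` is a made-up name, not part of any scanner or editor software.

```python
def apply_gamma(pixels, gamma):
    """Return a gamma-corrected copy of an 8-bit greyscale image.

    pixels: list of rows, each a list of ints in 0..255 (assumed layout).
    gamma > 1 lightens the midtones, gamma < 1 darkens them.
    """
    # Build the 256-entry lookup table once: out = 255 * (in / 255) ** (1 / gamma)
    lut = [round(255 * (v / 255) ** (1.0 / gamma)) for v in range(256)]
    return [[lut[v] for v in row] for row in pixels]


faded = [[100, 120, 140], [110, 130, 150]]
print(apply_gamma(faded, 0.5))  # darker midtones, pure black and white unchanged
```

Trying a few gamma values on either side of 1.0 and comparing by eye is probably the quickest way to see whether it helps.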
posted by euphotic at 12:11 PM on May 13, 2007


Sometimes inverting the colors so you're reading it in negative (or, in this case, I guess, positive) helps. I'm not sure how that could be used to help the scanning process, but it's worth knowing when you're trying to read it.
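
Inversion is simple enough to sketch in a couple of lines, assuming the scan is held as rows of 8-bit grey values (`invert` is just an illustrative name):

```python
def invert(pixels):
    """Flip an 8-bit greyscale image: a negative becomes a positive and vice versa."""
    return [[255 - v for v in row] for row in pixels]


negative = [[0, 64], [192, 255]]
print(invert(negative))  # [[255, 191], [63, 0]]
```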
posted by katemonster at 12:26 PM on May 13, 2007


Can you post a sample page of one of your scans? Also, which scanner are you using? Have you already tried bitonal instead of grayscale? If not, the answer might be "make bitonal images". But if you already are and they still aren't readable, then more complicated image processing may be necessary.
posted by rajbot at 12:39 PM on May 13, 2007


Response by poster: Here's a sample of a troublesome bit. This article is mostly readable, but you can get an idea of the noise and lack of contrast issues. Other articles are way worse, so I haven't kept the scans of them.

I'm using "Canon Microfilm Scanner 500" to read the microfilm and scanning with eCopy Desktop. I've tried scanning in text mode and also in picture mode.
posted by thirteenkiller at 12:56 PM on May 13, 2007


Response by poster: Another example
posted by thirteenkiller at 1:03 PM on May 13, 2007


If possible, tweak the scanner settings (disable any scanner filters unless they're really good, and experiment with gamma) to get the best data possible before processing in Photoshop. Those examples only contain 30-odd levels of grey (have a look at a histogram and count the obvious peaks), far short of what you may be able to get out of the device.

With a bit more data you might be able to get a worthwhile improvement in legibility (I had a quick go at the images with various filters and contrast adjustments but the results were poor).
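
If you don't have a histogram view handy, the grey-level count can be checked with a short script. This is only a sketch: it assumes the scan is already loaded as rows of 8-bit grey values, and `grey_level_summary` is a hypothetical helper name.

```python
from collections import Counter


def grey_level_summary(pixels):
    """Summarise how many distinct grey levels an 8-bit image actually uses."""
    hist = Counter(v for row in pixels for v in row)
    return {
        "distinct_levels": len(hist),         # 256 would be the full 8-bit range
        "darkest": min(hist),
        "lightest": max(hist),
        "most_common": hist.most_common(3),   # the tallest histogram peaks
    }


sample = [[0, 0, 8], [8, 8, 255]]
print(grey_level_summary(sample))
```

A count far below 256 suggests the scanner or its software is throwing tonal data away before the file is saved.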
posted by malevolent at 1:29 PM on May 13, 2007


I think you would have some success with that in Photoshop using "Adjust Levels" and pulling back the whitepoint.
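
Roughly what a levels adjustment does, as a sketch rather than Photoshop's actual implementation: everything at or below a chosen black point becomes black, everything at or above the white point becomes white, and the values in between are stretched across the full range. The function name and the list-of-rows image layout are assumptions.

```python
def adjust_levels(pixels, black, white):
    """Linearly stretch the range [black, white] to [0, 255], clipping outside it."""
    span = white - black
    if span <= 0:
        raise ValueError("white point must be above black point")

    def remap(v):
        scaled = (v - black) * 255 / span
        return max(0, min(255, round(scaled)))

    return [[remap(v) for v in row] for row in pixels]


# Pulling the white point back from 255 to 150 pushes a faded light
# background to pure white while spreading out the darker tones.
print(adjust_levels([[50, 75, 150, 200]], black=50, white=150))  # [[0, 64, 255, 255]]
```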
posted by Wolfdog at 4:38 PM on May 13, 2007


I think your main problem is getting the text crisper. See if you can increase your scanner's DPI (or try any other setting that claims to increase quality). I got the background to be less blurry by doing the following in Photoshop.

1) Paste original image into a layer
2) Create a new layer, select it and paste the original image into it
3) Using the bottom-most layer in the layers panel, threshold the image. When you open the threshold window, you should see the histogram, with several peaks and then one major low region (a valley); use about the middle of the valley as your threshold point
4) Go to the top-most layer and set the blending mode to Pin Light.

I ended up with this image. It's slightly more readable, but the text is still fairly blurry. Perhaps some Photoshop/image processing guy will come along with something better.
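
The steps above can be sketched in code, with some loud caveats: the valley finder is a crude stand-in for picking the threshold by eye, and the blend uses a commonly cited approximation of Pin Light, not Photoshop's exact maths. The image is assumed to be rows of 8-bit grey values, and all function names are made up.

```python
def valley_threshold(pixels):
    """Pick a threshold near the middle of the emptiest stretch between
    the tallest dark peak and the tallest light peak of the histogram."""
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1
    dark_peak = max(range(0, 128), key=hist.__getitem__)
    light_peak = max(range(128, 256), key=hist.__getitem__)
    between = range(dark_peak + 1, light_peak)
    if not between:  # peaks are adjacent, so there is no valley to speak of
        return light_peak
    lowest = min(hist[i] for i in between)
    valley = [i for i in between if hist[i] == lowest]
    return valley[len(valley) // 2]


def threshold(pixels, t):
    """Step 3: pure black below t, pure white at or above it."""
    return [[255 if v >= t else 0 for v in row] for row in pixels]


def pin_light(base, blend):
    """Step 4: blend the original (top layer) over the thresholded base.
    Approximation: dark blend values darken, light blend values lighten."""
    def mix(a, b):
        return min(a, 2 * b) if b < 128 else max(a, 2 * (b - 128))
    return [[mix(a, b) for a, b in zip(base_row, blend_row)]
            for base_row, blend_row in zip(base, blend)]


original = [[20, 20, 30, 220], [230, 230, 20, 120]]
t = valley_threshold(original)
result = pin_light(threshold(original, t), original)
print(t, result)
```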
posted by !Jim at 5:04 PM on May 13, 2007


Levels and/or Curves dialog in Photoshop is the way forward. I'm sure between the two of you you can figure out how it works if you don't already have elite photoshop skillz.

These tools should give you something somewhat less fuzzy and mangled than what !Jim posted, but perhaps not that much better. Some of the text in those examples does appear to be pretty much beyond rescue.
posted by public at 5:29 PM on May 13, 2007


If bits of text are lost into the background, then I guess they are lost.

I actually did this as prep for some OCR work, and curves in PS could actually bring out text on the other side of the paper. Make the curve snap from bottom to top over the midtones and you'll get complete separation of white and nonwhite -- even what you couldn't see on a normal scan will come up. The scanner is going to white balance poorly and so 'lost' information is really just hidden in midtones with the color of the paper (or negative). The "multiply" blending mode for a copy will work too.

Still might not be OCR-ready, but definitely legible.
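
In code terms, the two tricks look roughly like this. It's a sketch only: the image is assumed to be rows of 8-bit grey values, the `center` and `width` parameters are illustrative and would need tuning per scan, and the function names are invented.

```python
def snap_curve(pixels, center, width):
    """A curve that jumps from black to white over a narrow band of midtones.

    Values below center - width/2 go to 0, values above center + width/2
    go to 255, and the band in between ramps steeply."""
    low = center - width / 2

    def remap(v):
        return max(0, min(255, round((v - low) * 255 / width)))

    return [[remap(v) for v in row] for row in pixels]


def multiply_self(pixels):
    """Multiply blend of a layer with a copy of itself: result = a * a / 255.
    White stays white, everything else gets darker."""
    return [[round(v * v / 255) for v in row] for row in pixels]


murky = [[100, 124, 160]]
print(snap_curve(murky, center=128, width=16))  # [[0, 64, 255]]
print(multiply_self([[0, 128, 255]]))           # [[0, 64, 255]]
```

Sliding `center` up and down is the scripted equivalent of dragging the steep part of the curve across the midtones until the text separates from the paper tone.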
posted by cowbellemoo at 6:22 PM on May 13, 2007


I'll look for a link, but when I was doing some scanning I read a page somewhere that suggested using a dark (blue/black) backing page for your scans. This lets you adjust the contrast more dramatically in software afterwards without getting the reverse of the pages.
posted by acro at 9:44 PM on May 13, 2007


OK, microfilm. Misread that, the backing thing probably won't work...
posted by acro at 9:46 PM on May 13, 2007


Thanks for posting the sample, thirteenkiller.

If you set your scanner to do max hardware resolution (400x400dpi), then there is a lot we can do.

With a dumb bitonalization and unpaper run at its default settings, we can clean up some of the noise from the dirty negative. But we really need the full resolution scan to do any significant post-processing. Still, here is your sample with some noise filters applied: 68mmm9i-out.png.

The text is mostly unreadable, but we can fix that with higher-rez input and creating better bitonal input. This is just to show some of the noise reduction.
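
To give a feel for what this kind of processing involves, here is a toy sketch of a fixed-threshold bitonalization followed by a speckle filter. It is not unpaper's algorithm, just a simple stand-in; it assumes dark text on a light background, an image held as rows of 8-bit grey values, and hypothetical function names.

```python
def bitonalize(pixels, threshold=128):
    """Dumb bitonalization: 1 = ink (darker than threshold), 0 = background."""
    return [[1 if v < threshold else 0 for v in row] for row in pixels]


def despeckle(bitmap, min_neighbours=1):
    """Clear ink pixels that have fewer than min_neighbours ink pixels
    among their (up to) eight surrounding pixels."""
    height, width = len(bitmap), len(bitmap[0])
    out = [row[:] for row in bitmap]
    for y in range(height):
        for x in range(width):
            if bitmap[y][x] != 1:
                continue
            # Count ink in the 3x3 window, minus the pixel itself.
            neighbours = sum(
                bitmap[j][i]
                for j in range(max(0, y - 1), min(height, y + 2))
                for i in range(max(0, x - 1), min(width, x + 2))
            ) - 1
            if neighbours < min_neighbours:
                out[y][x] = 0
    return out


scan = [[10, 200, 200, 200],
        [200, 200, 200, 200],
        [200, 200, 30, 40]]
print(despeckle(bitonalize(scan)))  # [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1]]
```

At higher resolution each letter stroke covers more pixels, so a filter like this can tell strokes from specks much more reliably, which is part of why the full-resolution scan matters.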
posted by rajbot at 11:00 AM on May 14, 2007


That's some neat software rajbot.
posted by acro at 1:30 PM on May 14, 2007

