Turn grayscale scanned documents into clear B&W
August 13, 2017 2:23 PM

How can I make scanned grayscale pdfs into crisp, clear white background black and white pdfs? (example inside).

I have some black and white teaching materials that I scanned to pdfs a while ago. I don't remember the settings I used, but some of them are crisp, clear, and use only black and white. Others are grayscale and don't print well. Here's an example: make the left look like the right

Is there a way to do this using Acrobat Reader DC or other free (Windows) software, or online? The files are about 65 MB each, if it matters.
posted by rossination to Computers & Internet (9 answers total) 1 user marked this as a favorite
This used to be part of my job. The way I did it was to save the pages as images and then manipulate them with image editing software (I like GIMP for these purposes). I used Acrobat to convert from .PDF to .jpg or .tif, but you can do it at pdftoimage.com.

If the graying is mild, you can sometimes fix it by adjusting the brightness and contrast. If not, you can select the gray area and then bucket fill it with white or paint it white with a huge brush. Then save the images back to PDF.
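
For the mild case, the brightness/contrast fix amounts to remapping gray levels so that the grayish paper lands on pure white and the ink on pure black. Here is a minimal Python sketch of that idea on a tiny grayscale image stored as nested lists (0 = black, 255 = white). The function name `stretch_levels` and the sample black and white points are assumptions for illustration, not settings taken from GIMP or any other tool.

```python
def stretch_levels(pixels, black_point, white_point):
    """Linearly remap gray values so black_point -> 0 and white_point -> 255.

    Values outside the [black_point, white_point] range are clamped.
    """
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    scale = 255.0 / (white_point - black_point)
    out = []
    for row in pixels:
        new_row = []
        for v in row:
            mapped = round((v - black_point) * scale)
            new_row.append(max(0, min(255, mapped)))
        out.append(new_row)
    return out


# Hypothetical 2x4 scan: grayish paper around 200, ink around 40-60.
page = [
    [200, 205, 60, 210],
    [198, 40, 45, 207],
]
cleaned = stretch_levels(page, black_point=60, white_point=190)
print(cleaned)
```

On a real scan you would pick the two points by looking at where the paper and the ink sit in the histogram; anything lighter than the white point becomes pure white.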

I haven't tried them myself, but I know people who use PDFcreator and PDFlite, and have been happy with them. They're both free and open source (as is GIMP).
posted by The Underpants Monster at 2:56 PM on August 13, 2017

Sorry to threadsit - I should mention that these are files with 100+ pages in them. I'm hoping for an automated solution if possible!
posted by rossination at 2:59 PM on August 13, 2017

I would do this with Photoshop. It will probably be as simple as messing with the contrast. I believe you can export all PDF pages as PNGs, then open them up in Photoshop. Record an action of yourself editing an image, saving it, and closing it. Then you can hit that button one time for each image OR you could make a droplet of it and drag the PNGs into it. Make sure your PNGs all save in the same place and then create a new PDF from them. TIFF might be a file type to try as well, but this is entirely doable.
posted by Foam Pants at 3:12 PM on August 13, 2017

IrfanView has a batch conversion mode. In the advanced options, you can select the color depth (i.e., make it black/white) and change sharpness, contrast, etc.
posted by cfraenkel at 3:50 PM on August 13, 2017 [3 favorites]

Photoscape also has a batch conversion tool.

I'd personally hit up GIMP, toy with the threshold, and then export as PDF. I'm not sure if it'd be possible to do as a batch operation, though. Mind you, I have been known to do crazy things like mess with 200+ photos over a weekend to get them ready for print.
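
What "toying with the threshold" does, roughly: every pixel at or above a cutoff becomes white and everything below it becomes black. This is a minimal Python sketch on a nested-list grayscale image, not GIMP's implementation; the name `threshold` and the cutoff values are assumptions chosen for illustration.

```python
def threshold(pixels, cutoff=128):
    """Return a 1-bit image: 1 (white) where value >= cutoff, else 0 (black)."""
    return [[1 if v >= cutoff else 0 for v in row] for row in pixels]


# Hypothetical 2x3 scan: light paper, dark ink, and one mid-gray smudge (140).
page = [
    [230, 90, 225],
    [140, 60, 235],
]
low = threshold(page, cutoff=128)   # the smudge stays white
high = threshold(page, cutoff=160)  # the smudge turns black
print(low)
print(high)
```

The two results differ only in the mid-gray pixel, which is why the cutoff usually needs some trial and error per document.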

Ask yourself if this is something you could Fiverr as well.
posted by kariebookish at 4:00 PM on August 13, 2017

MeFi's Own™ Matt Zucker wrote noteshrink: really meant for handwritten notes, but it segments scans nicely.
posted by scruss at 7:22 PM on August 13, 2017 [1 favorite]

I think people use ImageMagick scripts to automate this kind of stuff.
posted by The arrows are too fast at 7:25 PM on August 13, 2017 [1 favorite]

OK - for more than a decade I worked on document imaging software. What you want to do is a process called binarization. It's the process of reducing a multibit-per-pixel image to 1 bit per pixel. There's a fair amount of black magic in making that happen.
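
Part of that black magic is choosing the cutoff automatically. One well-known way to do it for a whole page is Otsu's method, which picks the gray level that best separates the dark and light pixel populations. The sketch below is an illustration of that general technique in plain Python, not the method any product mentioned in this thread uses; the function names are assumptions.

```python
def otsu_threshold(pixels):
    """Pick a global cutoff t by maximizing between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    """
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark = 0
    sum_dark = 0
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels):
    """Reduce an 8-bit grayscale image to 1 bit per pixel (0 = black, 1 = white)."""
    t = otsu_threshold(pixels)
    return [[1 if v > t else 0 for v in row] for row in pixels]


# Hypothetical 2x4 scan: paper around 200, ink around 40.
page = [
    [200, 210, 40, 205],
    [198, 35, 45, 207],
]
t = otsu_threshold(page)
bw = binarize(page)
print(t, bw)
```

A single global cutoff like this works when the page is evenly lit; it struggles when one part of the page is darker than another.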

You can use ImageMagick, and what you want is called "Adaptive Threshold". It's a process that looks at a moving window onto the image to determine whether a local area should be black or white. For the price (free), it will be about as good as you can get.
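
To make the moving-window idea concrete, here is a minimal Python sketch: each pixel is compared with the mean of the window around it, and it is marked black only if it is noticeably darker than its neighborhood. This is an illustration of the general technique, not ImageMagick's code (ImageMagick exposes its own version through the `-lat` option); the function name, `radius`, and `offset` are assumptions.

```python
def adaptive_threshold(pixels, radius=7, offset=10):
    """Black (0) where a pixel is more than `offset` darker than its local mean."""
    height = len(pixels)
    width = len(pixels[0]) if height else 0

    # integral[y][x] holds the sum of all pixels above and to the left of (x, y),
    # so any window sum can be read with four lookups.
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += pixels[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    out = []
    for y in range(height):
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        out_row = []
        for x in range(width):
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            area = (y1 - y0) * (x1 - x0)
            window_sum = (integral[y1][x1] - integral[y0][x1]
                          - integral[y1][x0] + integral[y0][x0])
            # Same test as v < mean - offset, kept in integers.
            is_ink = pixels[y][x] * area < window_sum - offset * area
            out_row.append(0 if is_ink else 1)
        out.append(out_row)
    return out


# Hypothetical unevenly lit page: the paper fades from 230 down to 120,
# with ink at columns 2 and 8. The left ink (130) is as light as the
# right-hand paper, so no single global cutoff separates them.
row = [230, 220, 130, 200, 190, 180, 170, 160, 60, 140, 130, 120]
page = [list(row) for _ in range(3)]
bw = adaptive_threshold(page, radius=1, offset=10)
print(bw[1])
```

Only the two ink columns come out black, even though the paper on the right is as dark as the ink on the left. On real scans the window needs to be much larger than a stroke of text.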

I don't know that IrfanView has an Adaptive Threshold implementation. There is an add-on to get PDF output.

Photoshop's threshold tools are not geared for documents. Don't waste your time.

At my old company (Atalasoft), we had a better variant of this called Dynamic Threshold, which gave much better results for documents of this class. The product is a software development kit and unless you code in C# daily, I wouldn't recommend it for you. If you do code in C#, I would say get yourself an evaluation license, write a quick tool to iterate over your files, read in the images, run the DynamicThresholdCommand, then use the PdfEncoder to turn them into a PDF. We routinely had customers who built tools on an evaluation license, solved their problems, then never purchased.
posted by plinth at 8:27 AM on August 14, 2017 [1 favorite]

Save the PDFs out as individual image files; use Photoshop or IrfanView (or GIMP?) to do the necessary adjustments, and use those settings for batch processing; re-PDF the final images.
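
Those three steps (split, adjust, re-PDF) can also be scripted. The Python sketch below is one possible arrangement, not something tested on these particular files. It assumes Poppler's `pdftoppm` and ImageMagick 7's `magick` are installed and on the PATH, and the 60% cutoff, 300 dpi, and file naming are placeholder choices to tune on a few pages first.

```python
import subprocess
from pathlib import Path


def split_command(pdf_path, prefix, dpi=300):
    """pdftoppm call that writes one grayscale PNG per page as <prefix>-NN.png."""
    return ["pdftoppm", "-r", str(dpi), "-gray", "-png", str(pdf_path), str(prefix)]


def binarize_command(src, dst, cutoff="60%"):
    """ImageMagick call that applies a global threshold to one page image."""
    return ["magick", str(src), "-threshold", cutoff, str(dst)]


def merge_command(pages, out_pdf):
    """ImageMagick call that writes the page images into a single PDF."""
    return ["magick", *[str(p) for p in pages], str(out_pdf)]


def convert_pdf(pdf_path, work_dir):
    """Run all three steps for one PDF and return the path of the new file."""
    pdf_path, work_dir = Path(pdf_path), Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(split_command(pdf_path, work_dir / "page"), check=True)

    bw_pages = []
    for src in sorted(work_dir.glob("page-*.png")):
        dst = src.with_name("bw-" + src.name)
        subprocess.run(binarize_command(src, dst), check=True)
        bw_pages.append(dst)

    out_pdf = pdf_path.with_name(pdf_path.stem + "-bw.pdf")
    subprocess.run(merge_command(bw_pages, out_pdf), check=True)
    return out_pdf


# Example (not run here): convert_pdf("lesson1.pdf", "work/lesson1")
print(binarize_command("page-01.png", "bw-page-01.png"))
```

Each PDF should get its own work directory so pages from different documents don't mix. If a fixed cutoff isn't good enough, the arguments in `binarize_command` are the place to experiment with other ImageMagick options.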

IrfanView is free. GIMP is free; I have no idea if it has batch processing. If the originals weren't scanned with exactly the same settings and/or aren't substantially identical, batch processing may not work well enough, although even if you have to adjust the settings for each document that's probably worth doing.

I don't recommend hiring it done through Fiverr; there's almost no way to establish if the one you hire actually knows what they're doing. (I can't find the link now, but I had read one where someone bought a $5 service every day for a few months and kept records, and came up with something like "10% outright scams; 20% lovely good service; 50% shoddy service; 20% well-intentioned but incompetent.")

FWIW, if you DM me after working hours (erm, in about 6 hours; I'm on the left coast), I'd be happy to take a look at a couple of files and try to figure out if they'd be quick & easy conversions with the right software, or if they're going to be troublesome no matter what you're dealing with.

And if they're quick & easy enough, I might volunteer to just do them - it's the kind of project I enjoy, and sometimes it really is just "plug these settings into the automator and POOF instant good files" where upload & download takes longer than the actual conversion process.
posted by ErisLordFreedom at 2:37 PM on August 14, 2017
