Open Source / Free pdf redacting tools?
February 21, 2012 7:58 AM

I need to redact some information in several PDF files. It doesn't need to be secure, just obscure. Is there a way to do this without buying software? Is there an open source option?

I don't have Acrobat Pro. The PDFs already exist (automated copier scans). They are just light reading, and the people mentioned in them don't want their names/addresses/phone numbers to be easily available, but nobody is going to be very motivated to 'uncover the data'.
posted by srboisvert to Computers & Internet (12 answers total)
Oh and I use PCs. No Macs.
posted by srboisvert at 8:02 AM on February 21, 2012

I've used PDF Escape online; it should be able to do everything you need.
posted by missmagenta at 8:05 AM on February 21, 2012 [1 favorite]

This OpenOffice extension claims to be able to import PDFs into Draw. So I can only assume you can make black shapes over the redacted info.

Otherwise, is there a reason (outside of the waste of paper/time) you can't print, sharpie and scan them back in?
posted by griphus at 8:06 AM on February 21, 2012 [1 favorite]

You can use Foxit Reader for this. Draw a rectangle around the info in question, then choose a fill color in the rectangle's properties. When done, print to pdf using a tool like PrimoPDF or PDFill. Not the quickest solution, but it's free.
posted by jon1270 at 8:14 AM on February 21, 2012

"Otherwise, is there a reason (outside of the waste of paper/time) you can't print, sharpie and scan them back in?"

I've tried this. My scanner can see right through the Sharpie.
posted by jon1270 at 8:15 AM on February 21, 2012

Nthing PDF Escape (it's a web-based program, so you don't have to download anything), and there are others; basically you can electronically "white" it out and then save the edited copy to your hard drive.
posted by sm1tten at 8:22 AM on February 21, 2012

Sharpie doesn't always work, depending on how the docs are printed. White-out or correction tape should, but that can be pretty time intensive.
posted by dpx.mfx at 8:30 AM on February 21, 2012

Obscure vs Secure - be careful what people claim won't bother them when it comes to personal information being distributed without consent. Err on the side of caution and just get rid of it or later on you may be left holding the bag - even though you followed the directions that appeared clear at the time. I don't use the online solutions, so I cannot speak to those. However, I deal with PDFs all the time and a big part of my job is converting data back and forth between formats.

First, you can download Acrobat Pro and get a free, fully functional 30-day trial with no need for a credit card or any info other than an email address.
Acrobat Pro X download (465 MB). It is big, but not all that bad for a free trial, and it is the entire app.

There are also other PDF editors that will do the same, with a similar trial option. One that I use all the time is NitroPDF, but there are several out there.
NitroPro 7 download (it is a lot smaller). Some of the third-party apps can be clunky, but in some cases they do certain operations better than Acrobat Pro. As far as redaction goes, it really depends on the source file, as that will dictate the particulars.

Even though these are "copier scans", you should be aware that the text may have been automatically recognized and be lying there, ready to be extracted with a simple copy and paste. Redaction can be defeated rather easily if certain precautions are not taken. You would be surprised how often data that appears secure has nothing more than a simple band-aid covering it, one that can be brushed away with a few keystrokes using off-the-shelf software. We are not talking hack stuff here - a precocious teen on a Cap'n Crunch sugar rush could do it.

There is a way to make it more secure, though. The key is that after the PDF files are first redacted, they need to be re-printed to a flat, image-only PDF. Most PDF print drivers offer this option, but the feature is not always in the same place in the print window, and some don't offer it at all. If done correctly, the redaction marks and the image of the text merge and cannot easily be separated. The flat PDF can still be reconstituted into editable text (for example, with OCR), but the redacted sections will convert to gibberish or just black lines.
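One way to sanity-check the flattened result is to look for leftover text-drawing operators in the saved file. The following is only a minimal sketch in Python, not a real PDF parser. The function name `has_text_operators` is made up for illustration, it assumes streams are either uncompressed or Flate (zlib) compressed, it ignores encrypted files and the less common `'` and `"` text operators, and it can give false positives or false negatives.

```python
import re
import zlib

# Body of a PDF stream object, between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# A string or array operand followed by a text-showing operator (Tj or TJ).
TEXT_OP_RE = re.compile(rb"[)\]>]\s*T[jJ]\b")


def has_text_operators(pdf_bytes: bytes) -> bool:
    """Heuristically report whether any stream in the file still draws text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # Many content streams are Flate (zlib) compressed.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not zlib data: uncompressed content, JPEG, CCITT, and so on.
            data = raw
        if TEXT_OP_RE.search(data):
            return True
    return False
```

Calling `has_text_operators(open("redacted.pdf", "rb").read())` (the file name is hypothetical) should return False for a truly image-only PDF, while True suggests a text layer survived. Treat the answer as a hint, and still try select-all and copy in a viewer.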
posted by lampshade at 9:15 AM on February 21, 2012 [2 favorites]

(Oh, and FWIW, I suggest those two programs as a long-time licensed user of both packages and of others. While I believe people should pay for software, those companies openly offer those trials as advertised.)
posted by lampshade at 9:23 AM on February 21, 2012 [1 favorite]

PDF X-Change Viewer allows you to draw black boxes on top of content in the free version.

I usually then print the resulting PDF to a PDF, using a PDF printer such as PDFCreator (warning, sneaky installer) or Bullzip, so that the content is actually redacted and not just hidden.
posted by motdiem2 at 1:57 PM on February 21, 2012

You can use ImageMagick and Ghostscript to convert your PDF to a JPG.

Then use Gimp to black out the sections that you want to hide.

Then convert back to PDF!
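For anyone who would rather script these three steps, here is a rough sketch in Python that only builds the ImageMagick command lines. It swaps the manual Gimp step for ImageMagick's `-draw "rectangle ..."` option, which is a substitution, not what was described above. The function names, file names, 300 dpi setting and pixel coordinates are all made up for illustration, and it assumes the ImageMagick 6 style `convert` command (ImageMagick 7 calls it `magick`) with Ghostscript installed for reading PDFs.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom, in pixels


def rasterize_cmd(pdf: str, pattern: str = "page-%03d.jpg", dpi: int = 300) -> List[str]:
    """Command that renders each PDF page to its own numbered JPG."""
    return ["convert", "-density", str(dpi), pdf, pattern]


def blackout_cmd(image: str, boxes: Sequence[Box], out: str) -> List[str]:
    """Command that paints an opaque black rectangle over each box."""
    cmd = ["convert", image, "-fill", "black"]
    for left, top, right, bottom in boxes:
        cmd += ["-draw", f"rectangle {left},{top} {right},{bottom}"]
    return cmd + [out]


def assemble_cmd(images: Sequence[str], pdf_out: str) -> List[str]:
    """Command that combines the edited page images, in name order, into one PDF."""
    return ["convert", *sorted(images), pdf_out]


if __name__ == "__main__":
    # Print the commands instead of running them. Once the coordinates look
    # right, each list can be passed to subprocess.run(cmd, check=True).
    print(rasterize_cmd("file-to-redact.pdf"))
    print(blackout_cmd("page-000.jpg", [(100, 200, 500, 240)], "page-000.jpg"))
    print(assemble_cmd(["page-001.jpg", "page-000.jpg"], "redacted.pdf"))
```

Keeping the commands as argument lists avoids shell quoting problems with the `-draw` string, and printing them first makes it easy to check the rectangles on one page before processing the whole batch.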
posted by gregr at 6:45 PM on February 21, 2012

All those authors of shitware-ridden "PDF printers" that wouldn't do a thing without the GPL'd Ghostscript software can die in a bike fire for all I care. I also suspect that this method of producing a PDF doesn't actually omit the overwritten letters from the output. (These printers are a rather literal translation chain: the Windows printing API, then the Windows generic PostScript printer driver, then gs with the "pdfwrite" backend. So in the resulting file you'd expect to see the PDF instruction to draw the letter shapes in the expected location, followed by the PDF instruction to draw an opaque black rectangle in the same spot.)
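To make that suspicion concrete, here is a toy illustration in Python. The content stream is hand-written for this example (the name and phone number are invented) and is uncompressed, which real files often are not. It shows the text-drawing instruction followed by the rectangle-drawing instruction, and how trivially the "covered" string can be read back.

```python
import re

# Hand-written, uncompressed PDF content stream: the text is drawn first,
# then an opaque black rectangle is filled over the same spot.
CONTENT_STREAM = b"""\
BT /F1 12 Tf 72 700 Td (Jane Doe, 555-0100) Tj ET
0 0 0 rg
70 695 160 16 re f
"""


def shown_strings(stream: bytes) -> list:
    """Return the literal strings passed to the Tj (show text) operator."""
    found = re.findall(rb"\(([^)]*)\)\s*Tj", stream)
    return [s.decode("latin-1") for s in found]


print(shown_strings(CONTENT_STREAM))  # prints ['Jane Doe, 555-0100']
```

The rectangle only changes what gets painted on screen; the string operand is still sitting in the file, which is why a viewer's copy and paste or a text extraction tool can recover it.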

I'd use conversion to image and back to pdf, so that I know that only what I can see is in the PDF.

On Unix (Linux), with the appropriate tools installed, here are some steps you can follow:

First, convert all the pages to lossless image files, one per page. If bilevel (B&W) is acceptable, then tifflzw is a good choice. Otherwise, consider tiffgray or tiff24nc. (With these formats you may need to specify a "-r" argument to set the resolution in dpi, e.g., -sDEVICE=tiffgray -r300.)

$ gs -dNOPAUSE -dBATCH -sDEVICE=tifflzw -sOutputFile=/tmp/page%04d.tif file-to-redact.pdf

Then edit the pages (e.g., in GIMP), adding redaction rectangles where you want (rectangle select tool, then bucket fill with "fill whole selection").

Convert the distinct images to a multipage tiff:
$ tiffcp /tmp/page*.tif /tmp/redacted.tif

and convert the tiff to pdf:
$ tiff2pdf -o redacted.pdf /tmp/redacted.tif

The redacted PDF will probably be larger than the original, because bitmaps are "heavier" than text, but you can be assured that the original text is no longer anywhere to be found.
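To put rough numbers on "heavier", here is a small back-of-the-envelope calculation in Python. It assumes a US Letter page (8.5 x 11 inches) at 300 dpi, neither of which comes from the steps above, and it counts raw pixel data only, ignoring row padding, file headers and whatever the TIFF or PDF compression saves.

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one page bitmap (no padding, no headers)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Bit depths of the three Ghostscript TIFF devices mentioned above.
for device, bits in (("tifflzw", 1), ("tiffgray", 8), ("tiff24nc", 24)):
    size = raw_page_bytes(8.5, 11, 300, bits)
    print(f"{device}: {size / 1e6:.1f} MB per page before compression")
```

Under those assumptions a bilevel page is about 1 MB raw, a grayscale page about 8.4 MB and a 24-bit color page about 25 MB, which is why bilevel output is worth choosing whenever black and white is acceptable.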
posted by jepler at 7:04 PM on February 21, 2012

