Help me wrangle PDFs
July 2, 2010 1:59 AM

I'm trying to go paperless, and have scanned and OCR'd huge stacks of paperwork into PDF documents. Can you recommend a tool to split, merge, delete pages etc from PDFs?

So my workflow so far has been: stick a huge pile of related docs into the sheet feeder of our networked scanner at work, and scan them, double sided to a PDF. Then feed that PDF through ABBYY FineReader.

So each PDF contains multiple documents. For example I have all the bank statements for a single account in one huge document. Double sided scanning means there are also a lot of blank pages in the docs.

So, now that I've OCR'd them, I'd like to separate them into separate docs, remove blank pages and probably merge a few docs where they've been split across batches.

I'm imagining something that works like a standard PDF viewer, but lets me drag pages around to re-order, delete pages, and Ctrl-click multiple pages and save them to a file.

Do you know of such a tool? Preferably Linux-based, but Windows will do too. Oh yeah - and free is better :)
posted by blacksky to Computers & Internet (9 answers total) 14 users marked this as a favorite
 
Win only: PDF Split and Merge seems to be most commonly recommended, but I prefer A-PDF Split, even though it is not freeware. Neither has the interface you want; nothing that I know of, short of the commercial products (FoxIt, etc.) does.
posted by megatherium at 2:32 AM on July 2, 2010


PDF Split and Merge is in the Ubuntu repositories, so it runs under Linux.
pdfshuffler might also be worth a try.
posted by Triton at 4:04 AM on July 2, 2010


On Windows ConcatPDF is my favorite free one. It's a GUI-based app although I don't remember if it has exactly the interface you're talking about. Before installing ConcatPDF 1.1 you have to install Microsoft .NET Framework Version 1 and Visual J# .NET Redistributable Package.
posted by XMLicious at 4:19 AM on July 2, 2010 [1 favorite]


Best answer: If you don't mind working at the command line, look at pdftk. It is not hard to learn and very powerful. I use it all the time to split and concatenate documents for work.
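For anyone who would rather script it, here is a minimal sketch of the two pdftk invocations that cover splitting and concatenating. The helper names (split_cmd, merge_cmd, run) and the file names are made up for illustration; only the pdftk `cat ... output` syntax itself comes from pdftk. It assumes pdftk is installed and on your PATH.

```python
import subprocess


def split_cmd(src, first, last, dest):
    """Build the pdftk argv that copies pages first..last of src into dest.

    Page numbers are 1-based and inclusive, as pdftk expects.
    """
    return ["pdftk", src, "cat", f"{first}-{last}", "output", dest]


def merge_cmd(sources, dest):
    """Build the pdftk argv that concatenates the source files, in order, into dest."""
    return ["pdftk", *sources, "cat", "output", dest]


def run(cmd):
    """Run a command built above; raises CalledProcessError if pdftk fails."""
    subprocess.run(cmd, check=True)
```

Something like run(split_cmd("statements.pdf", 1, 4, "2010-01.pdf")) would pull the first four pages into their own file, and run(merge_cmd(["batch1.pdf", "batch2.pdf"], "account.pdf")) would join two batches. Building the argument list separately from running it makes it easy to print the command and check it before touching any files.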
posted by metroidhunter at 4:47 AM on July 2, 2010


You might want to download dotImage from my company Atalasoft. It is an imaging SDK that includes PDF manipulation. The evaluation gives you 30 days to play with it and one of the included samples is a tool that lets you merge/split PDF files via drag-and-drop. IIRC, that tool will run on its own even after the eval expires.

FWIW, I worked on Acrobat version 1, 2, 3, and 4, and while I didn't write the sample tool, I wrote all the code underneath that manipulates the files. If you have problems with it, be sure to memail me.
posted by plinth at 6:12 AM on July 2, 2010


pdf helper
posted by mukade at 8:19 AM on July 2, 2010


Response by poster: Thanks everyone!

Just for anyone looking at this later: I found most of the apps had horrendous, unintuitive UIs, to the extent that writing my own Python glue to make pdftk do what I wanted was less painful. PDFShuffler was the closest to what I was looking for, but seemed to mess up the output PDFs: I have the OCRed text under the original scanned image, and PDFShuffler seemed to either move the text to the top or lose the image.
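The glue itself isn't reproduced here, but a rough sketch of the idea might look like the following. This is an illustration under assumptions, not the actual script: the function names (to_ranges, extract_cmd) and file names are hypothetical, and the blank page numbers are supplied by hand rather than detected. It pulls one document out of a big scan while skipping the blank backs of single-sided sheets.

```python
def to_ranges(pages):
    """Collapse page numbers into pdftk-style range strings.

    Duplicates are ignored and the result is in ascending page order.
    """
    ranges = []
    start = prev = None
    for p in sorted(set(pages)):
        if start is None:
            start = prev = p
        elif p == prev + 1:
            prev = p  # still inside a consecutive run
        else:
            ranges.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = p
    if start is not None:
        ranges.append(str(start) if start == prev else f"{start}-{prev}")
    return ranges


def extract_cmd(src, first, last, blank_pages, dest):
    """Build a pdftk argv copying pages first..last of src to dest, minus blanks.

    blank_pages is a hand-made list of 1-based page numbers to drop.
    """
    blanks = set(blank_pages)
    keep = [p for p in range(first, last + 1) if p not in blanks]
    if not keep:
        raise ValueError("no pages left after removing blanks")
    return ["pdftk", src, "cat", *to_ranges(keep), "output", dest]
```

For example, extract_cmd("scan.pdf", 1, 6, [2, 4], "statement.pdf") builds a command that keeps pages 1, 3, 5 and 6. The resulting list can be passed to subprocess.run(cmd, check=True) once pdftk is installed.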

Sorry Plinth, I didn't get round to trying yours - a combination of being jaded from all the other tools, and having to fill in a form to try it made me give up. No offence!
posted by blacksky at 8:54 AM on July 3, 2010


I use PaperPort -- very intuitive UI.
posted by blue_wardrobe at 10:38 PM on July 3, 2010


You can do all of these things in Mac's Preview, BTW
posted by rockindata at 10:00 AM on February 11, 2011

