How to rotate alternating pages by non-90-degree-multiple amounts?
March 26, 2010 7:05 AM

I have a PDF (hundreds of pages) that was scanned from a book. All the odd pages are tilted about 6 degrees left, all the even pages, 6 degrees right. How can I fix this, preferably with free software? A command line solution would be ideal.
posted by dmd to Computers & Internet (7 answers total) 4 users marked this as a favorite
 
Response by poster: Oh, and I am completely platform-agnostic. Windows, Mac, Linux, whatever.
posted by dmd at 7:06 AM on March 26, 2010


ImageMagick might do the job for you. It has lots of functionality, all of it accessible through the command line, though you might have to convert the pages to individual images, rotate them, and then somehow stick them back together into a PDF.
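
A minimal sketch of the rotation step, assuming the pages have already been dumped to individual image files and that "tilted left" means counter-clockwise (so the correction is a clockwise turn, which is a positive angle for ImageMagick's -rotate). The file names and helper names are made up for illustration, and the script only prints the commands rather than running them. Flip the signs if your scan leans the other way.

```python
import shlex
from typing import List

TILT_DEGREES = 6  # approximate tilt reported in the question


def correction_angle(page_number: int, tilt: int = TILT_DEGREES) -> int:
    """Angle to pass to ImageMagick's -rotate (positive = clockwise).

    Assumption: odd pages lean counter-clockwise, even pages lean clockwise.
    """
    return tilt if page_number % 2 == 1 else -tilt


def rotate_command(src: str, dst: str, page_number: int) -> List[str]:
    """Build (not run) a convert command that straightens one page image."""
    angle = correction_angle(page_number)
    # -background must come before -rotate so the exposed corners are filled white.
    return ["convert", src, "-background", "white", "-rotate", str(angle), dst]


if __name__ == "__main__":
    for n in range(1, 5):
        cmd = rotate_command(f"page-{n:03d}.png", f"fixed-{n:03d}.png", n)
        print(shlex.join(cmd))
```

Rotating by an arbitrary angle enlarges the canvas, so a crop back to the original page size may still be needed afterwards.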
posted by BigCalm at 7:11 AM on March 26, 2010


Best answer: unpaper is what you want.
posted by themel at 7:24 AM on March 26, 2010 [6 favorites]


You could download a free trial of photoshop.

Then just record an action of you rotating the image and then cropping it (cropping will most likely be needed after rotation).

You can then apply that action to all your images.
posted by travis08 at 7:25 AM on March 26, 2010


Best answer: Yeah, doing this is going to stink. I write PDF manipulation software for a living, just as an FYI.

You can do this with my company's products. Here's how I would do it:
For each page in the PDF, extract the image (assuming it's an image-only page).
Rotate the image into a new image and write that into the new PDF.

The trick in that is that for hundreds of pages, memory is an issue. We have a class that handles that called ImageSource and a specialized one that extracts images out of PDFs.

The code (C#) would be something like this:
int pageNo = 0;  // zero-based page counter, captured by the lambda below
PdfImageSource pdfSource = new PdfImageSource(pdfStream);
DelegatedImageSource source = new DelegatedImageSource(pdfSource, (image) => {
    // alternate the rotation by page parity
    RotateCommand rot = new RotateCommand((pageNo & 1) != 0 ? 6 : -6);
    pageNo++;
    return rot.Apply(image).Image;
});
PdfEncoder encoder = new PdfEncoder();
encoder.Save(outputStream, source, null);

What this does:
(1) creates an image source for the PDF that will extract images per page.
(2) wraps that in a meta image source that will apply a lambda expression to an image before returning it
(2a) the lambda expression rotates the image either 6 or -6 degrees depending on the page #
(3) Resaves the images
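
To show the shape of steps (1) through (3) without the commercial library: the point of wrapping one image source in another is that only one page needs to be in memory at a time. Below is a rough Python analogue using generators. The "images" are just placeholder strings and rotate is a hypothetical stand-in for a real rotation call, so this illustrates the pattern only, not the API of the product described above.

```python
from typing import Callable, Iterable, Iterator

Page = str  # placeholder for a decoded page image


def delegated_source(pages: Iterable[Page],
                     transform: Callable[[int, Page], Page]) -> Iterator[Page]:
    """Yield transform(index, page) for each page, one page at a time."""
    for index, page in enumerate(pages):
        yield transform(index, page)


def rotate(page: Page, degrees: int) -> Page:
    """Hypothetical stand-in for a real image rotation."""
    return f"{page} rotated {degrees:+d}"


def alternate(index: int, page: Page) -> Page:
    # Same parity rule as the C# snippet: odd zero-based index gets +6, even gets -6.
    return rotate(page, 6 if index & 1 else -6)


if __name__ == "__main__":
    source = (f"page{n}" for n in range(4))  # pages are produced lazily
    for fixed in delegated_source(source, alternate):
        print(fixed)
```

Because both the source and the wrapper are lazy, a consumer that writes each page out as it arrives never holds the whole book in memory.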

There are a few things here that will give you issues:
(1) our product is not free. Since this is a one-time shot, you might be able to do this under a 30 day eval
(2) there might be an issue with propagation of lossy codec errors from decoding and re-encoding - the actual solution that I believe is best is to rewrite the page content stream to adjust the transform matrix applied to the image. This will avoid the whole decoding/re-encoding issue. Our (exposed) code doesn't do this yet.
(3) you need Visual Studio and will need to implement the code in the blog article linked above, as well as the outlined code above, plus enough to turn it into a command line tool. This would be trivial for me; I don't know about you.
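
On point (2): rotating via the transform matrix means prepending a cm operator to the page content instead of touching the pixels. Here is a sketch of just the arithmetic, assuming a rotation about the page centre and the standard PDF matrix order a b c d e f. The page size (612 x 792 points) and the function names are illustrative assumptions, and actually splicing the result into a content stream is left to whatever PDF library you use.

```python
import math
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]


def rotation_cm(degrees: float, width: float, height: float) -> Matrix:
    """Return (a, b, c, d, e, f) for a counter-clockwise rotation about the page centre."""
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    cx, cy = width / 2, height / 2
    # Translation terms chosen so that the centre (cx, cy) maps to itself.
    e = cx - cx * cos_t + cy * sin_t
    f = cy - cx * sin_t - cy * cos_t
    return (cos_t, sin_t, -sin_t, cos_t, e, f)


def apply(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Transform a point the way PDF does: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


if __name__ == "__main__":
    m = rotation_cm(-6, 612, 792)  # negative angle = clockwise in PDF user space
    print(" ".join(f"{v:.4f}" for v in m), "cm")
```

Since no pixels are resampled, the image data stays byte-for-byte as scanned. The corners of the rotated image will fall outside the page box, much as they would need cropping in the raster approach.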
posted by plinth at 7:26 AM on March 26, 2010


Best answer: Oh, and in case my earlier comment was a bit short, here's my entire free software book digitalization workflow:

poppler comes with pdfimages, which will dump the images from your scanned PDF to PBMs, which unpaper will correct for you.

If you want a searchable PDF, this guide works great. Otherwise, just use ImageMagick's convert to turn the corrected PBM/PPM pages back into PDFs (doing any compression and depth adjustments along the way) and merge them with pdftk.
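
Roughly, the non-searchable route chains four tools. The sketch below only builds the command lines (nothing is executed) and assumes pdfimages, unpaper, convert and pdftk are on the PATH. The file names are placeholders: the names pdfimages really produces depend on the image type, so list the output directory rather than trusting the ones shown here. unpaper is left at its default settings, which may need tuning for a tilt this large.

```python
import os
import shlex
from typing import List


def extract_cmd(pdf: str, prefix: str) -> List[str]:
    # pdfimages writes one file per embedded image, named from the prefix.
    return ["pdfimages", pdf, prefix]


def deskew_cmd(src: str, dst: str) -> List[str]:
    # unpaper takes an input file and an output file; options left at defaults.
    return ["unpaper", src, dst]


def to_pdf_cmd(src: str, dst: str) -> List[str]:
    # ImageMagick picks the output format from the .pdf extension.
    return ["convert", src, dst]


def merge_cmd(pages: List[str], out: str) -> List[str]:
    return ["pdftk", *pages, "cat", "output", out]


def plan(pdf: str, images: List[str], out: str) -> List[List[str]]:
    """Return the full list of commands, in the order they would be run."""
    cmds = [extract_cmd(pdf, "page")]
    pdf_pages = []
    for img in images:
        stem, ext = os.path.splitext(img)
        fixed = f"{stem}-fixed{ext}"
        cmds.append(deskew_cmd(img, fixed))
        cmds.append(to_pdf_cmd(fixed, f"{stem}.pdf"))
        pdf_pages.append(f"{stem}.pdf")
    cmds.append(merge_cmd(pdf_pages, out))
    return cmds


if __name__ == "__main__":
    for cmd in plan("book.pdf", ["page-000.pbm", "page-001.pbm"], "book-fixed.pdf"):
        print(shlex.join(cmd))
```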

I found xjobs helpful in dealing with the resulting deluge of batch jobs, but that's really because I'm grasping at every opportunity to mention that my desktop PC is an i7 :)
posted by themel at 7:50 AM on March 26, 2010


Again, not free (an evaluation is probably available), but Acrobat's OCR tool deskews pages automatically. It usually works pretty well, but YMMV.

If the odd and even pages are on the same page, you will need to split them before running the tool; here's a JavaScript tool for that.
posted by stratastar at 7:46 PM on March 27, 2010

