Splitting a PDF by bookmark
May 28, 2009 9:17 PM

I'm looking for an inexpensive or free way to split a PDF by bookmark (first-level bookmarks only), using the bookmark names as the filenames of the resulting PDFs.

I've done some research already and discovered 2 products that are almost good enough: PDF Split-Merge and PDF Split and Merge.

The only drawback to PDF Split-Merge is that it costs USD$20-30 per user, and I have about 12 users to roll it out to. (That may sound like nothing, but my IT department runs on a literal shoestring. The other day we each got given a roll of duct-tape so each of us can repair the disintegrating carpet in our respective cubicles.)

The only drawback to PDF Split and Merge (which is free and open-source) is that when it splits the PDF, it doesn't use the bookmark names as filenames. Instead you get generic filenames with an appended counter based on page number.

I've done some scripting, so I'm open to hearing about, for example, Perl modules that help with this. My searches on CPAN have led me to PDF::Core and PDF::Parse, which don't really seem to be built for this.
posted by Ritchie to Computers & Internet (2 answers total)
 
Hi! We should start a support group or something. I'm trying to figure this out, too. I am utterly unable to script anything, so maybe you can help me, too.

My provisional solution is to use pdftk, which is a phenomenal little program, to extract metadata from the PDF (pdftk mydoc.pdf dump_data output report.txt). Then I use that metadata (which includes each bookmark's title, level and page number) to create a batch file of pdftk commands that split the PDF and name the files properly. This step involves me editing the metadata report in emacs with macros. You could probably do this in Perl fairly painlessly.
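The metadata-to-batch-file step could be scripted instead of done with editor macros. Here is a rough sketch in Python (rather than Perl), not a tested tool. It assumes the dump_data report lists each bookmark as BookmarkTitle, BookmarkLevel and BookmarkPageNumber lines in that order, and that first-level bookmarks appear in ascending page order. The function names, the sample report and the filename clean-up rule are all made up for illustration.

```python
import re
import shlex


def parse_bookmarks(dump_text):
    """Return (title, level, page) tuples from a pdftk dump_data report.

    Assumes each bookmark appears as BookmarkTitle, BookmarkLevel,
    BookmarkPageNumber lines, in that order.
    """
    bookmarks = []
    current = {}
    for line in dump_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # lines without "key: value" carry nothing we need
        key, value = key.strip(), value.strip()
        if key == "BookmarkTitle":
            current = {"title": value}
        elif key == "BookmarkLevel" and current:
            current["level"] = int(value)
        elif key == "BookmarkPageNumber" and current:
            current["page"] = int(value)
        if len(current) == 3:
            bookmarks.append((current["title"], current["level"], current["page"]))
            current = {}
    return bookmarks


def safe_filename(title):
    """Turn a bookmark title into a conservative filename (illustrative rule)."""
    cleaned = re.sub(r"[^\w\- ]+", "", title).strip()
    cleaned = re.sub(r"\s+", "_", cleaned)
    return (cleaned or "untitled") + ".pdf"


def build_commands(dump_text, source_pdf):
    """Build one 'pdftk ... cat START-END output NAME.pdf' command per level-1 bookmark."""
    top = [(title, page) for title, level, page in parse_bookmarks(dump_text)
           if level == 1 and page >= 1]
    commands = []
    for i, (title, start) in enumerate(top):
        if i + 1 < len(top):
            # A section ends on the page before the next first-level bookmark.
            end = str(max(start, top[i + 1][1] - 1))
        else:
            end = "end"  # pdftk keyword for the last page
        commands.append(["pdftk", source_pdf, "cat", f"{start}-{end}",
                         "output", safe_filename(title)])
    return commands


# Made-up report text for demonstration; a real one comes from
# "pdftk mydoc.pdf dump_data output report.txt".
SAMPLE = """\
NumberOfPages: 12
BookmarkTitle: Chapter 1: Intro
BookmarkLevel: 1
BookmarkPageNumber: 1
BookmarkTitle: Background
BookmarkLevel: 2
BookmarkPageNumber: 2
BookmarkTitle: Chapter 2
BookmarkLevel: 1
BookmarkPageNumber: 5
"""

for cmd in build_commands(SAMPLE, "mydoc.pdf"):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed lines can be pasted into a batch file or shell script. Two caveats with this sketch: pages before the first first-level bookmark are not written anywhere, and two bookmarks with the same title would produce the same filename, so the second would overwrite the first.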

Then it's just a matter of running the batch file (shell script, whatever). That was easy, huh? Of course you'd have to do that with every file, but there are worse things. Like having to do it by hand in Acrobat. Ugh.

PDF is great and all, but it's a seriously clunky format to try to do anything interesting with.
posted by Barry B. Palindromer at 1:48 PM on May 29, 2009


Response by poster: Hmm. It'll be a bit of a kludge, but it might work. PDF Split and Merge includes a command-line option, so it might be possible to script the whole thing. I'll have a go, and if I can get it to work I'll post the results back here.

The other option, I suppose, is to contact the developers directly and beg/bribe them to add the feature.
posted by Ritchie at 4:26 PM on June 1, 2009

