Automated PDF modification
December 17, 2012 1:55 PM
Is there an automated way of placing elements from one PDF file into another? Open to coding this via Python if a relevant module exists.
posted by mnemonic to Computers & Internet (4 answers total)
I have two PDFs, Doc 1 and Doc 2. All pages in both documents are US Letter size. Doc 1 contains large tables of values (generated in Excel with PDFCreator), one per page; this is the only content on the pages. Doc 2 contains pages with my company's border and header info (title, page number, etc.). The Doc 1 tables go onto the bordered Doc 2 pages. I would like an automated way of taking each table from Doc 1 and placing it on its own page in Doc 2 without overlapping the existing elements.
I'm decently proficient in Python and think I could code something if modules for handling PDFs or vector graphics exist. I've done nontrivial programming with the xlrd3 module for working with our Excel files. How I think the code for this might work:
Open Doc 1 and Doc 2 as vector images
For each pair of pages (one from Doc 1, one from Doc 2):
- Get the content bounds of the Doc 1 page
- Scale the Doc 1 content to fit inside the Doc 2 border
- Insert it into the Doc 2 page at [coordinates]
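The scaling and placement steps above come down to a little arithmetic. Here is a minimal sketch of that part only, assuming the table's content bounds and the usable frame inside the border are already known in PDF points (1/72 inch, origin at the bottom-left). The `Box` type, the `fit_transform` function, and all the numbers are hypothetical, made up for illustration. A library such as pypdf could then apply the result (its `Transformation` class and `PageObject.merge_transformed_page` method are one option), but that step is not shown here.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Rectangle in PDF points (1/72 inch), origin at the bottom-left."""
    left: float
    bottom: float
    right: float
    top: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.top - self.bottom


def fit_transform(content: Box, frame: Box) -> Tuple[float, float, float]:
    """Return (scale, tx, ty) that fits `content` inside `frame`.

    The aspect ratio is preserved and the result is centered.
    A point (x, y) on the source page maps to (scale*x + tx, scale*y + ty).
    """
    if content.width <= 0 or content.height <= 0:
        raise ValueError("content box must have positive width and height")
    # Use the tighter of the two ratios so the content fits in both directions.
    scale = min(frame.width / content.width, frame.height / content.height)
    # Center the scaled content inside the frame.
    tx = frame.left + (frame.width - scale * content.width) / 2 - scale * content.left
    ty = frame.bottom + (frame.height - scale * content.height) / 2 - scale * content.bottom
    return scale, tx, ty


# Hypothetical numbers: the table fills a Letter page (612 x 792 pt) minus
# 0.5 in margins; the bordered page leaves a frame with 1 in margins at the
# sides and bottom and 1.5 in at the top for the header.
content = Box(36, 36, 576, 756)
frame = Box(72, 72, 540, 684)
scale, tx, ty = fit_transform(content, frame)
print(f"scale={scale:.3f}, tx={tx:.1f}, ty={ty:.1f}")
```

If the tables and the border are laid out the same way on every page, the same three numbers can be reused for the whole document, so the content bounds would only need to be measured once.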
What Python modules would I need to do this? Any other approaches would also be welcome.
- This task recurs in my work every few weeks, and the pages can total over a hundred. I currently handle it by printing out Doc 2 (the page borders), then refeeding those pages into the printer and printing Doc 1 on them. This is inconvenient for three reasons: the printer often fails to grab the pages; I have to run back to my computer to issue the next job (it's not actually two files each time, more like 14 separate pairs of such files that must be printed separately); and occasionally I get tripped up by coworkers printing over my pages.
- I often paste Excel tables directly into Word docs. That fails here because the formatting gets mangled in Word: the tables have lots of merged cells, and they are landscape-oriented tables going onto portrait-oriented pages. I can sort of transpose the tables in Excel and set the text orientation in Word to vertical, but each table requires a ton of cleanup, and there are many of them.
- I can manually combine the docs by opening the PDFs in Inkscape (or another vector illustration program), but again, there are 100+ tables, and Inkscape opens only one page at a time.