Best way to sort PDFs
November 21, 2014 9:00 AM

I have about 14 PDF files containing about 400 pages in total. Each page is an invoice with a number from 1 to 400. Unfortunately, they aren't all in the right sequence. What is the best way to sort them? Is there a way to do this automatically? If not, what other options do I have? The thumbnail view in Mac OS is too small to see the invoice number.
posted by jfricke to Computers & Internet (4 answers total) 3 users marked this as a favorite
 
If the text in the documents is real text and not an image (or if the images are good enough to be converted to text very reliably with an OCR tool), you could:

1. Burst the pdf pages into separate documents (using a tool like pdftk)
2. Write a script to use a regular expression to programmatically extract the invoice number from each page
3. Use a tool like pdftk to recombine the invoices in the correct order using the extracted invoice numbers.

This would require writing a script - there are many languages that would be capable of this.
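As a rough illustration of the three steps above, here is a minimal Python sketch. It assumes pdftk and pdftotext are installed and on the PATH, that the pages contain real text, and that each invoice number follows the word "Invoice" on its page. The regular expression, the function names, and the src00/page_0001.pdf naming are all hypothetical and would need adjusting to the actual invoices.

```python
import re
import subprocess
from pathlib import Path

# Assumed pattern: the number follows the word "Invoice" somewhere on the page.
# Change this to match how the real invoices are laid out.
INVOICE_RE = re.compile(r"Invoice\s*(?:No\.?|Number|#)?\s*:?\s*(\d+)", re.IGNORECASE)


def extract_invoice_number(text):
    """Return the first invoice number found in the page text, or None."""
    match = INVOICE_RE.search(text)
    return int(match.group(1)) if match else None


def burst(pdf, outdir):
    """Step 1: split one PDF into single-page PDFs with pdftk."""
    subprocess.run(
        ["pdftk", str(pdf), "burst", "output", str(outdir / "page_%04d.pdf")],
        check=True,
    )
    return sorted(outdir.glob("page_*.pdf"))


def page_text(page_pdf):
    """Extract the text of a single-page PDF; "-" sends it to stdout."""
    result = subprocess.run(
        ["pdftotext", str(page_pdf), "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def sort_invoices(pdfs, workdir, output):
    """Burst every input, read each page's number, and recombine in order.

    Returns the pages where no number was found, so they can be checked by hand.
    """
    workdir = Path(workdir)
    numbered, unreadable = [], []
    for i, pdf in enumerate(pdfs):
        outdir = workdir / f"src{i:02d}"  # one folder per input file
        outdir.mkdir(parents=True, exist_ok=True)
        for page in burst(pdf, outdir):
            number = extract_invoice_number(page_text(page))  # step 2
            if number is None:
                unreadable.append(page)
            else:
                numbered.append((number, page))
    numbered.sort(key=lambda item: item[0])
    if numbered:
        ordered = [str(page) for _, page in numbered]
        # Step 3: concatenate the single pages in invoice order.
        subprocess.run(["pdftk", *ordered, "cat", "output", str(output)], check=True)
    return unreadable
```

Calling something like sort_invoices(sorted(Path("invoices").glob("*.pdf")), "work", "sorted.pdf") would then write sorted.pdf and return the pages that need a manual look. It is worth spot-checking the result, since a pattern that matches the wrong number on a page will silently misplace that page.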

Honestly, if this is the only time you'll have to do this task, it would probably be faster to just sort them manually.
posted by Salvor Hardin at 9:16 AM on November 21, 2014


Someone else might chime in with a way to automate, but for a manual process I'd highly recommend PDFToolkit.

It allows you to save each page individually (as a PDF) and then re-append them together into one PDF in whatever order you'd like. Good luck, as it sounds quite tedious!
posted by Twicketface at 9:18 AM on November 21, 2014


Came to say basically what Salvor Hardin said. If the invoice number is in a reliable place in all of the pages, pdftotext can extract text from just that specific area, making your job much easier.
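For the fixed-position case, a small Python sketch of how the call might look. pdftotext takes -f and -l for the first and last page and -x, -y, -W, -H for the crop rectangle (measured from the top-left corner at the default 72 DPI). The function names and the coordinates in the usage note are made-up placeholders, not values from the actual invoices.

```python
import subprocess


def region_command(pdf, page, x, y, width, height):
    """Build the pdftotext call that reads one rectangle on one page."""
    return [
        "pdftotext",
        "-f", str(page), "-l", str(page),  # restrict to a single page
        "-x", str(x), "-y", str(y),        # top-left corner of the crop area
        "-W", str(width), "-H", str(height),
        str(pdf), "-",                     # "-" writes the text to stdout
    ]


def region_text(pdf, page, x, y, width, height):
    """Run pdftotext and return the text found inside the rectangle."""
    result = subprocess.run(
        region_command(pdf, page, x, y, width, height),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

For example, region_text("invoices.pdf", 3, 400, 40, 150, 30) would return whatever text sits in a 150 by 30 point box near the top right of page 3. The right coordinates have to be found by trial and error on one sample page.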

PDFs are just virtual marks on virtual paper, though, so there's no general solution.
posted by scruss at 10:17 AM on November 21, 2014 [1 favorite]


I don't know if this sounds too wacky, and I don't know how the invoices are formatted, but if they are single page invoices, you could print them out, shuffle them into order by hand, and then scan them back into a single pdf file.

As you have 14 files, you could do both a digital and a paper sort, and see which one works the best for you.
posted by carter at 11:15 AM on November 21, 2014

