Need magic to convert PDF to Excel
October 23, 2007 11:46 AM

OCRfilter: any software that will take a PDF and convert it to Excel?

I'm looking for something that will take a PDF or a TIFF of an account statement or the like (columns of numbers) and convert it to Excel.

I know nothing's going to be even close to perfect, but even getting a CSV file will at least save the time of re-keying every single number into Excel. Cleanup and tweaking can happen after the numbers themselves are in there.

Anyone use anything that does this?
posted by bartleby to Technology (9 answers total)
 
I've been playing with all the options out there and FineReader is definitely the best.
posted by zeoslap at 12:00 PM on October 23, 2007


I've had a lot of luck just copying a table from a PDF and pasting it into Excel.
posted by hydropsyche at 12:02 PM on October 23, 2007


FineReader can generate HTML, and I think Excel can take an HTML table as input, so that should work.
posted by zeoslap at 12:02 PM on October 23, 2007


If you already have MS Office, the Office Document Image Writer virtual printer and Image viewer contain fairly decent (Omnipage engine) OCR. Image viewer can read a TIFF directly and apply OCR.
posted by scruss at 12:39 PM on October 23, 2007


Here's how I have done this in the past.

Use pdftotext (sometimes written pdf2text or pdfToText) to convert the page to raw text. Use the -layout flag to keep everything aligned. Then import into OpenOffice Calc, choosing space as your delimiter and the 'merge delimiters' option.

I've had fairly good results doing this, sometimes combined with a little bit of scripting in Perl or Ruby to tidy things up prior to the import.
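
For illustration only, the tidy-up step might look something like the sketch below. It is written in Python rather than Perl or Ruby, and it is not the exact script used here. It assumes the text came from something like `pdftotext -layout statement.pdf statement.txt` (the file names are made up), and that columns are separated by two or more spaces while single spaces belong inside a cell. The function names and the sample statement lines are invented for the example.

```python
import csv
import io
import re

# Columns in layout-preserved text are separated by runs of spaces.
# Splitting on two or more spaces (an assumption) keeps single-spaced
# descriptions such as "ACME CORP PAYMENT" together in one cell.
COLUMN_GAP = re.compile(r" {2,}")


def layout_text_to_rows(text):
    """Split layout-preserved text into rows of cell strings."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between table rows
        rows.append(COLUMN_GAP.split(line))
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; cells containing commas get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample standing in for two lines of an account statement.
sample = (
    "10/01/2007   ACME CORP PAYMENT        1,250.00\n"
    "\n"
    "10/03/2007   OFFICE SUPPLIES             89.95\n"
)
print(rows_to_csv(layout_text_to_rows(sample)), end="")
```

Going straight to CSV this way sidesteps the delimiter settings at import time, but the two-space rule will misfire on statements where columns sit only one space apart, so the result still needs a look before it is trusted.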
posted by chrisamiller at 12:42 PM on October 23, 2007


I use software called "ABBYY PDF Transformer"
posted by hubs at 1:03 PM on October 23, 2007


ABBYY is pretty good. We do this a lot at my job and we usually use OmniPage to OCR it. It helps if you define the columns and rows before you recognize the text.
posted by kenzi23 at 2:37 PM on October 23, 2007


If you have the budget, Cogniview (review) is good.
posted by labnol at 10:41 AM on October 24, 2007


I believe that Able2Extract is what you're looking for.
posted by MrHappyGoLucky at 12:46 PM on October 25, 2007

