

Help Me Convert Hard Copy Data to Excel
November 10, 2010 6:29 AM

Scanning, Excel, data input help needed. I need to convert info from a lot of hard copy bank statements into an Excel spreadsheet. I have tried scanning directly into XLS, and the result is a disorganized mess. I have tried to cut and paste from the scanned PDF, but it will not allow it. Please help me automate this process and make my life a much happier place.

I'm trying to avoid manually entering hard copy bank statements into an Excel spreadsheet. I have moderate Excel skills. Although ultimately all the info will end up on one spreadsheet, I do not need it imported into that single spreadsheet now; I can combine it later. I have received the statements from my client in 3 formats: 1. basic hard copy; 2. scanned into a PDF; 3. scanned into XLS. The scanned PDF does not allow cut and paste, and it will not allow conversion into a text file. It appears that it was scanned as an image. The XLS scan is a disorganized mess and would be more work to reformat than it's worth. Do you have any suggestions for automating this process? Is there a way to set up Excel to receive the scanned info more appropriately? Any ideas will help. I need to get these statements into a spreadsheet. We're talking about 600+ pages of info. Thank you.
posted by mbx to Computers & Internet (9 answers total) 2 users marked this as a favorite
 
Can't you scan it into a text file and then import that into Excel? The Text Import Wizard will let you choose how you want the columns recognized.


Alternatively, wouldn't the full version of Acrobat (rather than the free Reader) let you copy tables while keeping their formatting?
posted by JPD at 6:36 AM on November 10, 2010
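
If the OCR step produces plain text, one alternative to Excel's import wizard is to split each line into fields yourself and write a CSV file that Excel opens directly. Below is a minimal Python sketch. It assumes, hypothetically, that each transaction line looks like `11/02 DEPOSIT PAYROLL 1,250.00 3,402.17`: a date, a description, an amount and a running balance. The sample text and the pattern are made up for illustration, and real statements will almost certainly need a different pattern.

```python
import csv
import io
import re

# Hypothetical line layout: MM/DD, free-text description, amount, balance.
# Amounts may carry thousands separators and a leading minus sign.
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2})\s+(?P<desc>.+?)\s+"
    r"(?P<amount>-?[\d,]+\.\d{2})\s+(?P<balance>-?[\d,]+\.\d{2})$"
)

def parse_statement_text(text):
    """Return (rows, rejects): parsed transaction rows and lines that did not match."""
    rows, rejects = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m:
            # Strip thousands separators so Excel reads the amounts as numbers.
            rows.append([m["date"], m["desc"],
                         m["amount"].replace(",", ""),
                         m["balance"].replace(",", "")])
        else:
            # Headers, footers, or OCR-mangled lines to check by hand.
            rejects.append(line)
    return rows, rejects

def rows_to_csv(rows):
    """Serialize rows as CSV text with a header row, ready to open in Excel."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Date", "Description", "Amount", "Balance"])
    writer.writerows(rows)
    return buf.getvalue()

# Made-up sample of what OCR output for one page might look like.
sample = """STATEMENT PERIOD 11/01 - 11/30
11/02 DEPOSIT PAYROLL 1,250.00 3,402.17
11/03 CHECK 1041 -85.50 3,316.67
"""
rows, rejects = parse_statement_text(sample)
print(rows_to_csv(rows))
```

Keeping the rejected lines, rather than silently dropping them, gives a short list of lines that need a human look.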


You need some OCR software that will read the scanned PDF images and output the text into a text or Excel file. However, no OCR software is 100% perfect, so you will be proof-reading everything anyway and making corrections. This assumes the best-case scenario of clear, high-resolution scans of printed text; if it's handwriting, and/or a low-quality scan, then OCR is likely to be basically useless.
posted by EndsOfInvention at 7:06 AM on November 10, 2010
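
Because bank statements usually print a running balance, one way to focus the proof-reading is to check that each row's amount carries the previous balance to the printed new balance. A row that fails is probably where an OCR misread or a skipped line sits. Below is a sketch using Python's `decimal` module, which avoids floating-point rounding. The row layout (description, amount, balance) and all the values are hypothetical.

```python
from decimal import Decimal

def find_balance_breaks(opening_balance, rows):
    """Return indices of rows whose balance != previous balance + amount.

    rows: list of (description, amount, balance) tuples of strings.
    """
    breaks = []
    prev = Decimal(opening_balance)
    for i, (_desc, amount, balance) in enumerate(rows):
        expected = prev + Decimal(amount)
        if expected != Decimal(balance):
            breaks.append(i)
        # Continue from the printed balance, so a single misread flags at most
        # two rows instead of every row after it.
        prev = Decimal(balance)
    return breaks

rows = [
    ("DEPOSIT PAYROLL", "1250.00", "3402.17"),
    ("CHECK 1041", "-85.50", "3316.67"),
    ("ATM WITHDRAWAL", "-60.00", "3256.61"),  # hypothetical misread of 3256.67
]
print(find_balance_breaks("2152.17", rows))
```

A misread amount flags only its own row. A misread balance flags its own row and the next one, because the next row starts from the wrong figure.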


You're in optical character recognition territory (OCR is the search term you're looking for). Since this is a Big Business (what with medical and legal applications), there are a fair number of pricey solutions. ABBYY FineReader Pro kicks ass, but it's $400.

As a general tool to have around, I find Able2Extract super helpful. It does not handle scanned images, though. The more advanced Able2Extract Professional has the OCR capabilities you need, but I have not used it.

It says you can download a free trial - my suspicion is that it will be kneecapped in some way. I'd download the trial, see if it works ~at all~ and then look into their 30 day license option.

Maybe there are services that will do this for you? Also, Google apparently has OCR services.
posted by BleachBypass at 7:06 AM on November 10, 2010


The only OCR software I've really used was ABBYY FineReader which worked OK.
posted by EndsOfInvention at 7:07 AM on November 10, 2010


Two more links

http://www.planetpdf.com/enterprise/article.asp?ContentID=6860

http://www.labnol.org/software/convert-scanned-pdf-images-to-text-with-google-ocr/5158/
posted by BleachBypass at 7:08 AM on November 10, 2010


There's a program called PDF2XL that is helpful for copying PDF columns into tables.

OCR with Acrobat or ABBYY, and pull the tables out from there.
posted by stratastar at 8:16 AM on November 10, 2010


It's like the old joke: "If I were trying to get there, I wouldn't start from here."
I would ask your client whether they can get the data from their bank in a better format. Converting from printed sheets is going to be hard, especially with numbers and cross-checking. At least minimize the load by getting as much as possible from the source.
Even a saved web page to scrape, rather than full CSV files, will help.
If they create a mint.com (or similar) account, that may help extract the data online for at least a subset of it if the bank is not accommodating.
posted by stuartmm at 8:17 AM on November 10, 2010 [2 favorites]
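
If online banking only shows the transactions as a web page, you can often scrape a saved copy of that page. Below is a minimal sketch using Python's standard-library `html.parser`. It assumes, hypothetically, that the transactions sit in ordinary `<tr>`/`<td>` table rows with explicit closing tags. Real bank pages vary and may need a different approach. The sample page is made up.

```python
import csv
import io
from html.parser import HTMLParser

class TableRowParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

saved_page = """
<table>
  <tr><th>Date</th><th>Description</th><th>Amount</th></tr>
  <tr><td>11/02/2010</td><td>DEPOSIT  PAYROLL</td><td>1,250.00</td></tr>
  <tr><td>11/03/2010</td><td>CHECK 1041</td><td>-85.50</td></tr>
</table>
"""
parser = TableRowParser()
parser.feed(saved_page)

# Write the rows out as CSV; csv.writer quotes fields such as "1,250.00".
out = io.StringIO()
csv.writer(out).writerows(parser.rows)
print(out.getvalue())
```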


+1 on the suggestion of going back to the source. ALL banks are automated these days and I can't imagine they don't have ways to provide a dump of the transactions. Get in touch with them and find out.

Even with 99% accuracy, you're still looking at 1 wrong number out of 100. What's going to take longer: double-checking the scanned results, or just sitting down and retyping them? Quite often the latter is a lot less error-prone.
posted by wkearney99 at 11:15 AM on November 10, 2010
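
Here is one way to put the 99% figure in perspective. Suppose each number is read correctly with probability p, independently of the others. Then n numbers carry about n(1 - p) errors on average, and a page holding m numbers comes through entirely clean with probability p^m. The quick Python calculation below uses the 600 pages from the question. The 25 amounts per page is a made-up figure for illustration.

```python
import math

def expected_errors(numbers, accuracy):
    """Expected count of misread numbers, assuming independent errors."""
    return numbers * (1 - accuracy)

def clean_page_probability(numbers_per_page, accuracy):
    """Probability that every number on one page is read correctly."""
    return accuracy ** numbers_per_page

PAGES = 600             # from the question ("600+ pages")
NUMBERS_PER_PAGE = 25   # hypothetical; count a few real pages for a better figure
ACCURACY = 0.99         # the 99% figure from the answer

total = PAGES * NUMBERS_PER_PAGE
print(f"{total} numbers, about {expected_errors(total, ACCURACY):.0f} expected errors")
print(f"chance a page is error-free: {clean_page_probability(NUMBERS_PER_PAGE, ACCURACY):.2f}")
```

With those assumed numbers, that works out to roughly 150 misread amounts. A little over one page in five would contain at least one error, so every page still needs checking either way.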


Thanks for your help. Going to give your ideas a try. Unfortunately, I don't have access to bank assistance in this matter. The bank statements I need to convert are evidence in a court case and what I have is what I have. Copy and paste city.
posted by mbx at 6:05 AM on November 11, 2010

