Solution to OCR many bank statements into Excel
September 29, 2012 7:32 PM
Need to get data from hundreds of pages of bank records into a spreadsheet. We have a scanner with a document feeder, but would love some recommendations on software/workflow ideas.
I've searched previous questions and have found some info about ABBYY FineReader but I'm not sure if this is the current state of the art or what. I'm really new at this and would really appreciate suggestions for a good workflow for getting lots of bank statements into a spreadsheet to analyze. I know that the best option would be to download them from the bank in an electronic format but unfortunately that might not be possible.
I'm using Mac OS 10.7.4 but I also have a Windows 7 machine around that I can use if there are better solutions available for that platform.
The challenge of your task is going to be telling the software what to capture. (Unless it's your intention to capture everything: addresses, promotional messages, logos, etc.)
Are the statements in a similar format that you could set up a template for the OCR software to follow? If you can set up a template, it's going to make this a much easier task to automate. Otherwise, you'll be doing quite a bit of post-scan data remediation.
posted by 26.2 at 10:13 PM on September 29, 2012
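A template can be as simple as a table of named rectangles on the page. The sketch below assumes the free Tesseract OCR engine (not mentioned in the thread) run with its TSV output, for example "tesseract page.png page tsv" in version 3.05 or later, which lists every recognized word with its pixel bounding box. The TEMPLATE coordinates are hypothetical placeholders you would measure once on one of your own scans; everything outside the regions (addresses, promotions, logos) is simply ignored.

```python
import csv
import io

# Hypothetical template for ONE bank's layout: field -> (left, top, right, bottom)
# in pixels, measured once on a sample page scanned at a fixed resolution.
TEMPLATE = {
    "statement_date": (1200, 150, 1650, 210),
    "closing_balance": (1200, 260, 1650, 320),
}


def words_from_tsv(tsv_text):
    """Yield (left, top, width, height, text) for every word in Tesseract TSV output."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        text = (row.get("text") or "").strip()
        # Level 5 rows are single words; levels 1-4 are page/block/paragraph/line boxes.
        if row["level"] == "5" and text:
            yield (int(row["left"]), int(row["top"]),
                   int(row["width"]), int(row["height"]), text)


def extract_fields(tsv_text, template):
    """Join the words whose centre point falls inside each template region.

    Words are kept in the order Tesseract emits them (its reading order).
    Words outside every region are ignored.
    """
    fields = {name: [] for name in template}
    for left, top, width, height, text in words_from_tsv(tsv_text):
        cx, cy = left + width / 2, top + height / 2
        for name, (x0, y0, x1, y1) in template.items():
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                fields[name].append(text)
    return {name: " ".join(words) for name, words in fields.items()}
```

Testing on words' centre points rather than whole boxes gives a little tolerance when a feeder scan lands a few pixels off from the sample page.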
I've previously mentioned ABBYY FineReader here, but I agree with the able comments that OCRing accurately to a spreadsheet is a very difficult task. And, with financial records, you want accuracy.
I can't remember if ABBYY FineReader has a trial version, but if it does, I would download the trial version and test it out for accuracy.
posted by dfriedman at 2:01 AM on September 30, 2012
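One objective way to test a trial for accuracy: most statements print a running balance, so every captured row can be checked arithmetically. A minimal sketch, assuming rows have already been captured as (amount, balance) text pairs with withdrawals as negative amounts; parse_amount and find_balance_errors are hypothetical helpers, not part of any OCR product.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Convert OCR'd text such as '1,234.56' or '-45.67' to a Decimal, or None if unreadable."""
    cleaned = text.replace(",", "").replace("$", "").strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def find_balance_errors(opening_balance, rows):
    """Return indexes of rows failing: previous balance + amount == printed balance.

    rows is a list of (amount_text, balance_text); amounts are assumed to be signed
    (negative for withdrawals). Unreadable amounts or balances are flagged too.
    """
    errors = []
    expected = parse_amount(opening_balance)
    for i, (amount_text, balance_text) in enumerate(rows):
        amount = parse_amount(amount_text)
        printed = parse_amount(balance_text)
        if expected is None or amount is None or printed is None or expected + amount != printed:
            errors.append(i)
        # Resync on the printed balance so a single misread is flagged only once.
        if printed is not None:
            expected = printed
        elif expected is not None and amount is not None:
            expected += amount
        else:
            expected = None
    return errors
```

Decimal is used instead of float so that sums of cents compare exactly. Running a few dozen sample pages through a trial and counting flagged rows gives a rough error rate to compare between packages.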
Should read "above comments", not "able comments"...
posted by dfriedman at 2:01 AM on September 30, 2012
One approach you may wish to consider is outsourcing. You can scan the docs yourself, obscure personally identifiable information (e.g. by pasting a blackout template over every statement image, hiding the name, address, and account number headings), and then go to one of the many job-bidding websites to get the actual transcription done.
posted by Dimpy at 11:14 AM on September 30, 2012
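The blackout template can also be applied in software instead of on paper. A minimal sketch, assuming each scanned page has already been loaded as an 8-bit grayscale buffer (for example with Pillow's Image.open(path).convert("L") followed by tobytes(), and saved back with Image.frombytes("L", size, data)) and that the personal details sit at the same pixel positions on every page; PII_BOXES holds hypothetical placeholder coordinates.

```python
# Hypothetical boxes covering name/address and account number on one bank's layout:
# (left, top, right, bottom) in pixels; right and bottom are exclusive.
PII_BOXES = [
    (100, 120, 1100, 420),   # name and mailing address block
    (1200, 420, 2300, 480),  # account number heading
]


def redact(pixels, width, height, boxes, fill=0):
    """Overwrite each box in a row-major 8-bit grayscale bytearray with fill (0 = black).

    Boxes are clipped to the page, so a slightly oversized template is harmless.
    Modifies pixels in place and also returns it for convenience.
    """
    for left, top, right, bottom in boxes:
        left, right = max(0, left), min(width, right)
        top, bottom = max(0, top), min(height, bottom)
        if left >= right:
            continue
        run = bytes([fill]) * (right - left)
        for y in range(top, bottom):
            start = y * width
            pixels[start + left:start + right] = run
    return pixels
```

Because feeder scans can shift or skew a page slightly, it is worth padding the boxes generously and spot-checking the redacted images before sending them to anyone.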
I don't have the know-how, but maybe this could be done with something like ImageMagick?
I'm thinking of a workflow like this:
1. Scan the document to an image file.
2. Use ImageMagick to extract certain regions of each page, given that the statements will be printed on a uniform bank template (at least on a per-account basis).
3. Hand off those snippets to an OCR engine.
4. Insert the results into a database/spreadsheet.
posted by snuffleupagus at 11:28 AM on October 7, 2012
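The four-step workflow above could be glued together with a short script. A minimal sketch, assuming ImageMagick 6 (convert; ImageMagick 7 uses magick) and the free Tesseract OCR engine (version 4 or later for the --psm option) are installed, that each page is already a PNG, and that transaction lines read "MM/DD description amount balance". TRANSACTIONS_REGION and LINE_RE are hypothetical and would need adjusting to a real statement layout. The output is a CSV file, which Excel opens directly.

```python
import csv
import re
import subprocess

# Hypothetical ImageMagick geometry (WxH+X+Y) of the transaction table, measured on
# one sample page; it only holds if every page follows the same bank template.
TRANSACTIONS_REGION = "2300x2600+100+700"

# Assumed line layout: "09/14 DEBIT CARD PURCHASE 45.67 1,234.56"
LINE_RE = re.compile(r"^(\d{2}/\d{2})\s+(.*?)\s+(-?[\d,]+\.\d{2})\s+(-?[\d,]+\.\d{2})$")


def crop_command(page_path, region, out_path, convert="convert"):
    """Step 2: cut one region out of the page (pass convert="magick" for ImageMagick 7)."""
    return [convert, page_path, "-crop", region, "+repage", out_path]


def ocr_command(image_path):
    """Step 3: OCR the snippet; 'stdout' prints the text, --psm 6 treats it as one text block."""
    return ["tesseract", image_path, "stdout", "--psm", "6"]


def parse_transactions(ocr_text):
    """Split OCR text into [date, description, amount, balance] rows.

    Lines that don't match the pattern are returned separately for manual review
    instead of being silently dropped.
    """
    rows, unmatched = [], []
    for line in ocr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m:
            date, desc, amount, balance = m.groups()
            rows.append([date, desc, amount.replace(",", ""), balance.replace(",", "")])
        else:
            unmatched.append(line)
    return rows, unmatched


def process_pages(page_paths, csv_path):
    """Step 4: run crop + OCR on every page and write all rows to one CSV file."""
    review = []
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["page", "date", "description", "amount", "balance"])
        for page in page_paths:
            crop = page + ".crop.png"
            subprocess.run(crop_command(page, TRANSACTIONS_REGION, crop), check=True)
            result = subprocess.run(ocr_command(crop), check=True,
                                    capture_output=True, text=True)
            rows, unmatched = parse_transactions(result.stdout)
            writer.writerows([page] + row for row in rows)
            review.extend((page, line) for line in unmatched)
    return review
```

For example, process_pages(["page001.png", "page002.png"], "transactions.csv") would write the spreadsheet and return the lines that need a human look. The dates in this assumed layout carry no year, which would have to be taken from each statement's header.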
posted by megatherium at 8:16 PM on September 29, 2012