<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Solution to OCR many bank statements into excel</title>
	<link>http://ask.metafilter.com/225602/Solution-to-OCR-many-bank-statements-into-excel/</link>
	<description>Comments on Ask MetaFilter post Solution to OCR many bank statements into excel</description>
	<pubDate>Sat, 29 Sep 2012 19:32:59 -0800</pubDate>
	<lastBuildDate>Sat, 29 Sep 2012 20:16:17 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Solution to OCR many bank statements into excel</title>
		<link>http://ask.metafilter.com/225602/Solution-to-OCR-many-bank-statements-into-excel</link>	
		<description>Need to get data from hundreds of pages of bank records into a spreadsheet.  We have a scanner with a document feeder, but would love some recommendations on software/workflow ideas. &lt;br /&gt;&lt;br /&gt; I&apos;ve searched previous questions and have found some info about &lt;a href=&quot;http://finereader.abbyy.com/&quot;&gt;ABBYY FineReader&lt;/a&gt;, but I&apos;m not sure whether it is the current state of the art.  I&apos;m new at this and would really appreciate suggestions for a good workflow for getting lots of bank statements into a spreadsheet to analyze.  I know that the best option would be to download them from the bank in an electronic format, but unfortunately that might not be possible.&lt;br&gt;
&lt;br&gt;
I&apos;m using Mac OS X 10.7.4, but I also have a Windows 7 machine that I can use if there are better solutions available for that platform.</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2012:site.225602</guid>
		<pubDate>Sat, 29 Sep 2012 19:32:59 -0800</pubDate>
		<dc:creator>capnsue</dc:creator>
		
			<category>OCR</category>
		
			<category>bank</category>
		
			<category>statements</category>
		
			<category>spreadsheet</category>
		
			<category>scanner</category>
		
	</item>
	<item>
		<title>By: megatherium</title>
		<link>http://ask.metafilter.com/225602/Solution-to-OCR-many-bank-statements-into-excel#3264607</link>	
		<description>I don&apos;t have an answer, just an observation: scanning and OCR from paper to spreadsheets is a very difficult task. Even with the best OCR program (and FineReader is, in my experience, the best), it is still very challenging. Expect a lot of post-scanning correction by human eyes.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2012:site.225602-3264607</guid>
		<pubDate>Sat, 29 Sep 2012 20:16:17 -0800</pubDate>
		<dc:creator>megatherium</dc:creator>
	</item><item>
		<title>By: 26.2</title>
		<link>http://ask.metafilter.com/225602/Solution-to-OCR-many-bank-statements-into-excel#3264667</link>	
		<description>The challenge of your task is going to be telling the software what to capture. (Unless it&apos;s your intention to capture everything - addresses, promotional messages, logos, etc.)&lt;br&gt;
&lt;br&gt;
Are the statements in a similar enough format that you could set up a template for the OCR software to follow?  If you can set up a template, it&apos;s going to make this a much easier task to automate.  Otherwise, you&apos;ll be doing quite a bit of post-scan data remediation.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2012:site.225602-3264667</guid>
		<pubDate>Sat, 29 Sep 2012 22:13:04 -0800</pubDate>
		<dc:creator>26.2</dc:creator>
	</item><item>
		<title>By: dfriedman</title>
		<link>http://ask.metafilter.com/225602/Solution-to-OCR-many-bank-statements-into-excel#3264746</link>	
		<description>I&apos;ve previously mentioned ABBYY FineReader here, but I agree with the able comments that OCRing accurately to a spreadsheet is a very difficult task.  And, with financial records, you want accuracy.&lt;br&gt;
&lt;br&gt;
I can&apos;t remember whether ABBYY FineReader has a trial version, but if it does, I would download it and test it for accuracy.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2012:site.225602-3264746</guid>
		<pubDate>Sun, 30 Sep 2012 02:01:07 -0800</pubDate>
		<dc:creator>dfriedman</dc:creator>
	</item><item>
		<title>By: dfriedman</title>
		<link>http://ask.metafilter.com/225602/Solution-to-OCR-many-bank-statements-into-excel#3264747</link>	
		<description>Should read &quot;above comments&quot;, not &quot;able comments&quot;...</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2012:site.225602-3264747</guid>
		<pubDate>Sun, 30 Sep 2012 02:01:46 -0800</pubDate>
		<dc:creator>dfriedman</dc:creator>
	</item><item>
		<title>By: Dimpy</title>
		<link>http://ask.metafilter.com/225602/Solution-to-OCR-many-bank-statements-into-excel#3265053</link>	
		<description>One approach you may wish to consider is outsourcing.  You can scan the docs yourself, obscure personally identifiable information (e.g., by pasting a blackout template over every statement image, hiding the name, address, and account number headings), and then go to one of the many job-bidding websites to get the actual transcription done.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2012:site.225602-3265053</guid>
		<pubDate>Sun, 30 Sep 2012 11:14:32 -0800</pubDate>
		<dc:creator>Dimpy</dc:creator>
	</item><item>
		<title>By: snuffleupagus</title>
		<link>http://ask.metafilter.com/225602/Solution-to-OCR-many-bank-statements-into-excel#3272709</link>	
		<description>I don&apos;t have the know-how, but maybe this could be done with something like ImageMagick?&lt;br&gt;
&lt;br&gt;
I&apos;m thinking of a workflow like this:&lt;br&gt;
&lt;br&gt;
1. Scan each document to an image file.&lt;br&gt;
2. Use ImageMagick to extract certain regions of each page, given that the statements will be printed on a uniform bank template (at least on a per-account basis).&lt;br&gt;
3. Hand off those snippets to an OCR engine.&lt;br&gt;
4. Insert the results into a database or spreadsheet.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2012:site.225602-3272709</guid>
		<pubDate>Sun, 07 Oct 2012 11:28:11 -0800</pubDate>
		<dc:creator>snuffleupagus</dc:creator>
	</item>
	</channel>
</rss>
