And Jesus did turn image into text
December 13, 2008 9:41 AM

Is there a program I can download/buy/steal/borrow that can "read" scanned pictures of letters and transform them into digital text, like the letters I'm typing right now? Ideally this would include the unique letters and diacritical marks in many other languages like Portuguese, French, Russian, and Chinese. I ask because I would like to start reading the strange and fanciful books I've obtained overseas by scanning the pages and running them through the web 2.5 choiceness that is Google Fish.

I assume such a program is responsible for how I can search the text inside all those Google Books (Either that or Google hired one million ancient Sumerian scribes to hand type the text of every book ever written).

And if such a program exists you are all free to copy my brilliant idea.
posted by dgaicun to Computers & Internet (10 answers total)
 
Commercial, off-the-shelf OCR software has existed for, oh, 15 years. Readiris, OmniPage and TextBridge are popular OCR products. I believe Acrobat Pro does it too if you have a PDF that is all image rather than text.
posted by kindall at 9:47 AM on December 13, 2008


OCRopus. I can't get it to work, but I am pretty new to Linux.
posted by 517 at 9:54 AM on December 13, 2008


Response by poster: OK, "optical character recognition", thanks. Holy crap, I have Acrobat Pro!

I'll go check out what it can do; in the meantime, others with experience please give me recommendations and tips that you think will help with my specific OCR needs.
posted by dgaicun at 9:59 AM on December 13, 2008


Chinese is going to be the difficult part of this picture, I think. I highly recommend Readiris Pro + the Asian language pack. I've used it for Japanese and Korean and it works great, so I don't doubt its Chinese abilities.
posted by soma lkzx at 10:06 AM on December 13, 2008


Many scanners (and scanner/printers) come with OCR software, so you may want to look at the complete set of software/drivers that came with your scanner. If you are in the market for a new scanner, you can definitely find a fairly cheap one with OCR support.

Also, there is a slightly less busy version of Google Translate.
posted by McGuillicuddy at 12:57 PM on December 13, 2008


Evernote does OCR. Not sure how well it will work with non-Latin alphabets, though.
posted by low affect at 4:28 PM on December 13, 2008


If you have MS Office installed, Microsoft Office Document Imaging has OCR. You may have to add it to your installed programs if it wasn't installed initially. Don't know how its performance compares with other products.
posted by NailsTheCat at 5:21 PM on December 13, 2008


Sorry to point out the obvious, but either your objective is the sheer lulz that reading things that have gone through an automatic translator brings, or there is a rather large flaw in your plan.
posted by Andorinha at 6:42 PM on December 13, 2008


For a FOSS (free and open-source software) solution, try Tesseract. Here is a script to get you started with automating OCR in Linux with Tesseract.
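The linked script isn't reproduced here, but the same idea can be sketched as a short Python wrapper around the `tesseract` command-line program, which is invoked as `tesseract IMAGE OUTBASE -l LANG` and writes its text to `OUTBASE.txt`. This is a minimal sketch, assuming `tesseract` is installed and on your PATH and that the language data you ask for (for example `fra`, `por`, `rus` or `chi_sim`) is installed alongside it. The function names, folder layout and `*.tif` file pattern are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def tesseract_command(image: Path, out_base: Path, lang: str = "eng") -> List[str]:
    """Build the CLI call `tesseract IMAGE OUTBASE -l LANG`.

    Tesseract appends ".txt" to OUTBASE itself, so OUTBASE has no extension.
    """
    return ["tesseract", str(image), str(out_base), "-l", lang]


def ocr_directory(scan_dir: Path, out_dir: Path, lang: str = "eng",
                  pattern: str = "*.tif") -> List[Path]:
    """OCR every scanned page in scan_dir matching pattern (hypothetical helper).

    Returns the paths of the .txt files Tesseract writes into out_dir.
    """
    # Fail early with a clear message if the tesseract binary is missing.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort so pages come out in file-name order (page001.tif, page002.tif, ...).
    for image in sorted(scan_dir.glob(pattern)):
        out_base = out_dir / image.stem
        # check=True raises CalledProcessError if tesseract reports a failure.
        subprocess.run(tesseract_command(image, out_base, lang), check=True)
        written.append(out_dir / (image.stem + ".txt"))
    return written
```

For a folder of scanned Portuguese pages this would be called as `ocr_directory(Path("scans"), Path("text"), lang="por")`, after which the resulting text files can be pasted into a translator. Building the command as a list (rather than a shell string) avoids quoting problems with file names that contain spaces.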
posted by jduckles at 7:05 PM on December 15, 2008


I got much better OCR results from MS Office Document Imaging than from the tool that came with my Canon scanner. MS Office Document Imaging is included in the Office package.
posted by webwesen at 11:10 AM on December 17, 2008

