Advice on primitive text recognition to beat Yahoo! Text Twist?
February 3, 2011 2:52 PM
Dear Hive, please give me pointers on making a text-recognition algorithm in Java for a program that wins at Yahoo! TextTwist.
I have a dictionary text file and an algorithm for guessing all of the right words very rapidly. All I'm missing is a way to turn this:
http://imgur.com/98Y6m
Into this:
String[] anagram = {"E", "H", "H", "I", "R", "T"};
I'm thinking of using the PixelGrabber class to count the number of pixels within a range of blue hues and then sorting the letters according to their pixel count. Any thoughts? Advice? Alternatives?
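For reference, here's a minimal sketch of that pixel-counting idea using PixelGrabber; the hue cutoffs for "blue" are just guesses I'd have to tune against the real tiles:

import java.awt.Color;
import java.awt.Image;
import java.awt.image.PixelGrabber;

public class BlueCounter {
    // Count pixels whose hue falls in a rough "blue" band.
    static int countBluePixels(Image tile, int width, int height) throws InterruptedException {
        int[] pixels = new int[width * height];
        // PixelGrabber delivers pixels in the default ARGB color model.
        new PixelGrabber(tile, 0, 0, width, height, pixels, 0, width).grabPixels();
        int count = 0;
        float[] hsb = new float[3];
        for (int argb : pixels) {
            Color.RGBtoHSB((argb >> 16) & 0xFF, (argb >> 8) & 0xFF, argb & 0xFF, hsb);
            if (hsb[0] >= 0.55f && hsb[0] <= 0.72f) count++; // guessed "blue" hue range
        }
        return count;
    }
}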
This is an area where a bog-standard backpropagation neural network would perform really well - it's what was used for a lot of early OCR.
posted by logicpunk at 3:07 PM on February 3, 2011
They won't all be in the same order, but I should be able to chop the image I showed you into six pieces so that each time "A" appears it is roughly identical.
logicpunk, are there any opensource java classes I could download for that?
Thanks!
posted by JamesJD at 3:15 PM on February 3, 2011
Using a neural network is huge overkill for this problem. You only need to match 26 exact images. OCR techniques are good for when the image you're trying to decode won't be identical. In this case, an 'A' will be pixel-for-pixel identical with every other 'A'. As long as you figure out where to chop the images, even a diff against all 26 letter icons will not be very time consuming.
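For example, a pixel-exact diff is just an array-equality check (a sketch; it assumes you've already chopped the tiles to a common size):

import java.awt.image.BufferedImage;
import java.util.Arrays;

public class TileDiff {
    // Two tiles match iff every ARGB value matches; with only 26 icons,
    // a linear scan over comparisons like this is plenty fast.
    static boolean sameTile(BufferedImage a, BufferedImage b) {
        if (a.getWidth() != b.getWidth() || a.getHeight() != b.getHeight()) return false;
        int[] pa = a.getRGB(0, 0, a.getWidth(), a.getHeight(), null, 0, a.getWidth());
        int[] pb = b.getRGB(0, 0, b.getWidth(), b.getHeight(), null, 0, b.getWidth());
        return Arrays.equals(pa, pb);
    }
}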
posted by demiurge at 3:20 PM on February 3, 2011
If I had to do this while making only a few assumptions about the source image (thereby allowing Yahoo to change fonts, etc.), I'd use techniques from content-based image retrieval, probably something like:
- Convert the image to black & white using an appropriate filter.
- Run Canny edge detection on the image.
- Use shape contexts to match each letter in the image against a set of training letters and pick the most probable one.
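A rough sketch of the first step, thresholding on luminance (the 128 cutoff is a placeholder; a real pipeline might choose it with something like Otsu's method):

import java.awt.image.BufferedImage;

public class Threshold {
    // Threshold on luminance: everything brighter than the cutoff goes
    // white, everything else black.
    static BufferedImage toBlackAndWhite(BufferedImage src) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int lum = (int) (0.299 * ((rgb >> 16) & 0xFF)
                        + 0.587 * ((rgb >> 8) & 0xFF)
                        + 0.114 * (rgb & 0xFF));
                out.setRGB(x, y, lum > 128 ? 0xFFFFFF : 0x000000);
            }
        }
        return out;
    }
}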
posted by RichardP at 3:43 PM on February 3, 2011
demiurge is right; this isn't an OCR problem or any kind of machine learning problem. (The OP did not ask how to play TextTwist with arbitrary letter images!)
Since there are only 26 images, maybe you shouldn't worry about the speed of the search itself; the real bottleneck might be whatever processing you do to the query image. Depending on your application, maybe you shouldn't worry about speed at all, since it could end up making your program unnecessarily complicated.
The first method that occurs to me is just a direct comparison: take, say, the red channel of the query image and compare that directly to the red channel of each of the A-Z example images until there's only one possible match. If you really care about speed, you could sort the examples and use a binary search, which seems to be what you had in mind. But remember that the distribution of letters isn't random—depending on how letters are distributed in English, it might be faster to use a linear search with the letters sorted by their frequency.
You could also save a little time by using only part of the images (e.g. chop out the rectangle that contains the letter, since the rest is always the same).
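Something like this for the direct comparison (a sketch; the refs array, indexed A through Z, is an assumption about how you'd store the example images):

import java.awt.image.BufferedImage;

public class RedChannelMatch {
    // Compare the red channel of the query tile against each reference
    // tile, bailing out of a candidate at the first differing pixel.
    static char matchLetter(BufferedImage query, BufferedImage[] refs) {
        candidates:
        for (int i = 0; i < refs.length; i++) {
            for (int y = 0; y < query.getHeight(); y++) {
                for (int x = 0; x < query.getWidth(); x++) {
                    int qRed = (query.getRGB(x, y) >> 16) & 0xFF;
                    int rRed = (refs[i].getRGB(x, y) >> 16) & 0xFF;
                    if (qRed != rRed) continue candidates;
                }
            }
            return (char) ('A' + i);
        }
        throw new IllegalArgumentException("no reference tile matched");
    }
}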
posted by Chicken Boolean at 4:03 PM on February 3, 2011
Let me ask the dumb question here: do you have a good reason not to just ask the user to type in the letters?
Requiring image recognition makes something nontrivial out of a program that should take only two lines in any reasonable language and definitely not more than fifty in Java.
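The input step of the type-it-in version really is tiny (a sketch; the solver is whatever you already have):

import java.util.Scanner;

public class TypeItIn {
    public static void main(String[] args) {
        // Read the rack straight from the user -- no image work at all.
        char[] letters = new Scanner(System.in).nextLine().trim().toUpperCase().toCharArray();
        // ...hand `letters` to the existing dictionary solver.
    }
}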
posted by d. z. wang at 6:16 PM on February 3, 2011
Wait -- maybe I don't follow, but if the images are literally pixel-exact every time... how fast does this have to be? Like, the easiest thing ever would be to add all the letters to a hashtable and hash into the table with the image. That would take two seconds.
No need for all this crazy stuff.
If even that's too slow, hash differently. E.g., use the size of the compressed image in bytes if those sizes are all distinct, since you don't have to compute anything (the compression algorithm is your hash function). Even if they're not all distinct, you can use the ones that are and come up with a dead-simple disambiguation for the others, so your special hash function is:
hash(image) {
    if (image.number_of_bytes != AMBIGUOUS_VALUE) {
        // the compressed size itself is the hash
        return image.number_of_bytes;
    } else if (image.rgba(x, y) == BLUE) {
        // disambiguate the colliding pair by a single pixel that differs
        return THE_BLUE_HASH_VALUE;
    } else {
        return THE_OTHER_POSSIBLE_HASH_VALUE;
    }
}
Lookup is constant time; space is minuscule.
Again, maybe I'm not understanding and the images are hand-selected and thus not necessarily pixel aligned or something.
If they are, the strategy for the whole thing is:
- init the hashtable at program start (you can save it to a file or whatever and load it when the program starts):
- create hash table
- load image for 'a'
- hashtable.add(a.hash, 'a')
... for each of the letters in the alphabet
for each tileset:
- anagram = {};
- numletters = tileimage.width / letterwidth (integer division, not mod)
- for 0 to numletters
- offset into the image
- hash it
- hashtable.lookup resultant hashcode
- add resultant letter to anagram.
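In actual Java, hashing raw pixel data instead of compressed size, the whole thing is roughly this (BufferedImage tiles and a fixed letterWidth are assumptions about how you load the images):

import java.awt.image.BufferedImage;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class TileTable {
    // Build the table once: key each reference tile A..Z by its pixel data.
    static Map<Integer, Character> buildTable(BufferedImage[] refs) {
        Map<Integer, Character> table = new HashMap<Integer, Character>();
        for (int i = 0; i < refs.length; i++) {
            table.put(tileHash(refs[i]), Character.valueOf((char) ('A' + i)));
        }
        return table;
    }

    // Hash a tile by its raw ARGB pixels. This is safe only because
    // identical letters are pixel-for-pixel identical.
    static int tileHash(BufferedImage tile) {
        return Arrays.hashCode(tile.getRGB(0, 0, tile.getWidth(),
                tile.getHeight(), null, 0, tile.getWidth()));
    }

    // Chop the rack into width / letterWidth tiles and look each one up.
    static String[] decodeRack(BufferedImage rack, int letterWidth,
                               Map<Integer, Character> table) {
        int n = rack.getWidth() / letterWidth;
        String[] anagram = new String[n];
        for (int i = 0; i < n; i++) {
            BufferedImage tile = rack.getSubimage(i * letterWidth, 0,
                    letterWidth, rack.getHeight());
            anagram[i] = String.valueOf(table.get(tileHash(tile)));
        }
        return anagram;
    }
}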
posted by jeb at 7:45 PM on February 3, 2011
yeah, demiurge is right - since all the letter A's, for example, should be identical, neural nets are probably a bit much.
I just really like neural nets
posted by logicpunk at 8:41 PM on February 3, 2011
Thanks, guys! I looked around a little, and using the PixelGrabber class and comparing for identical images worked like a charm!
posted by JamesJD at 12:53 PM on February 22, 2011