Soundex for images of text
July 28, 2005 2:52 PM
Does anyone know of a fairly fast algorithm for generating a "hash" of images which behaves like soundex, in that similar images have similar values? I know that's not possible in general, but I'm particularly interested in text - so what I really want is something that says "boo" is more like "bon" than "box" (in the character set I am using to type this, at least). You can assume no background noise, good orientation etc.
The emphasis on images is important because the character set could be anything in Unicode. In other words, I want something that says whether "mathowie" is "mathowie", no matter how much embedded Unicode it contains (including non-Latin characters that look similar to Latin ones).
(That's what made me think of asking this - but I'm asking out of curiosity, not because I want to push Matt towards implementing anything. I suspect it's a hard problem, but it also seems like something that people might have attempted before - signature recognition, for example.)
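To make the question concrete, here is a rough sketch of one family of techniques that fits the description, usually called an "average hash" (a kind of perceptual hash). It is not something proposed in this thread, and every name in it (from_ascii, average_hash, hamming, the GLYPH_* bitmaps) is made up for illustration. It assumes the text has already been rendered to a clean grayscale bitmap; the tiny hand-drawn glyphs stand in for that rendering step.

```python
def from_ascii(rows):
    """Turn ASCII art into a grayscale image: '#' is ink (255), anything else is 0."""
    return [[255 if ch == "#" else 0 for ch in row] for row in rows]


def average_hash(img, size=8):
    """Shrink img to size x size by block averaging, then emit one bit per
    cell: 1 if the cell is brighter than the mean of all cells, else 0."""
    h, w = len(img), len(img[0])
    cells = []
    for by in range(size):
        y0 = by * h // size
        y1 = max((by + 1) * h // size, y0 + 1)  # each block covers at least one row
        for bx in range(size):
            x0 = bx * w // size
            x1 = max((bx + 1) * w // size, x0 + 1)  # and at least one column
            block = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for c in cells:
        bits = (bits << 1) | (1 if c > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits: a small distance means similar-looking images."""
    return bin(a ^ b).count("1")


# Hypothetical 5x5 glyphs standing in for rendered characters.
GLYPH_O = from_ascii([".###.", "#...#", "#...#", "#...#", ".###."])
GLYPH_N = from_ascii([".....", "####.", "#...#", "#...#", "#...#"])
GLYPH_X = from_ascii(["#...#", ".#.#.", "..#..", ".#.#.", "#...#"])

h_o = average_hash(GLYPH_O, size=5)
h_n = average_hash(GLYPH_N, size=5)
h_x = average_hash(GLYPH_X, size=5)

print(hamming(h_o, h_n), hamming(h_o, h_x))  # 'o' vs 'n', then 'o' vs 'x'
```

With these particular glyphs the "o" hash ends up closer to "n" than to "x", which is the kind of behaviour being asked for. Whether that holds up for real fonts and for look-alike characters from other scripts is exactly the open question.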
posted by andrew cooke to computers & internet (17 comments total)
posted by devilsbrigade at 3:07 PM on July 28, 2005