Soundex for images of text
July 28, 2005 2:52 PM
Does anyone know of a fairly fast algorithm for generating a "hash" of images which behaves like Soundex, in that similar images have similar values? I know that's not possible in general, but I'm particularly interested in text - so what I really want is something that says "boo" is more like "bon" than "box" (in the character set I am using to type this, at least). You can assume no background noise, good orientation, etc.
posted by andrew cooke to Computers & Internet (17 answers total)
The emphasis on images is important because the character set could be anything in Unicode. In other words, I want something that says whether "mathowie" is "mathowie", no matter how much embedded Unicode it contains (including non-Latin characters that look similar to Latin ones).
's what made me think of asking this - but I'm asking out of curiosity, not because I want to push Matt towards implementing anything. I suspect it's a hard problem, but it also seems like something that people might have attempted before - signature recognition, for example.
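To make the behaviour I'm after concrete, here is a toy sketch in Python. It is not a real algorithm, just an illustration: the 4x4 glyph bitmaps are made up and stand in for a proper font renderer, the names `GLYPHS`, `signature` and `distance` are hypothetical, and it assumes (as the question does) clean, aligned images. It packs the "rendered" bitmap into an integer and compares two signatures by counting differing bits (Hamming distance). A real perceptual hash would presumably render and downsample first.

```python
# Hypothetical 4x4 glyph bitmaps standing in for a real font renderer.
GLYPHS = {
    "b": ["#...", "###.", "#..#", "###."],
    "o": [".##.", "#..#", "#..#", ".##."],
    "n": ["###.", "#..#", "#..#", "#..#"],
    "x": ["#..#", ".##.", ".##.", "#..#"],
}
# Assumption: Cyrillic U+043E is drawn exactly like Latin "o" in this "font".
GLYPHS["\u043e"] = GLYPHS["o"]


def signature(text):
    """Pack the rendered bitmap of `text` into one integer, row by row."""
    bits = 0
    for row in range(4):
        for ch in text:
            for pixel in GLYPHS[ch][row]:
                # Shift in one bit per pixel: 1 for ink, 0 for background.
                bits = (bits << 1) | (pixel == "#")
    return bits


def distance(a, b):
    """Hamming distance between signatures (only meaningful for equal-length strings)."""
    return bin(signature(a) ^ signature(b)).count("1")


print(distance("boo", "bon"))       # final glyph differs a little
print(distance("boo", "box"))       # final glyph differs a lot
print(distance("boo", "b\u043eo"))  # different code points, same picture
```

With these made-up glyphs, "boo" comes out closer to "bon" than to "box", and the string with the Cyrillic look-alike is indistinguishable from plain "boo", which is the property I'm asking about. The open question is how to get that from real rendered text, quickly.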