Building a search engine - how to cope with mis-spellings?

I maintain quite a few full-text index searches, and I'd love to make them cope better with misspellings. For example, if someone is looking for 'refrigerator', they should still get results even if they search for 'fridge' or 'refridgerator'. I'd also like this to work with place names, so that someone searching for 'Aberystwyth' still gets results if they misspell it. What's the best way to go about this? I've thought about some sort of phonetic approach, but that seems like overkill for my needs. The lists of common misspellings that Google has turned up for me also seem inadequate for what I want. Any suggestions?
Take advantage of open source and use Aspell. It won't fix your refrigerator/fridge problem, but it does well with honest misspellings. If you're using PHP, there's a good Aspell API in the Pspell extension.

I also recommend doing what Google does and offering a "Did you mean: refrigerator?" suggestion when someone searches for "refridgerator", instead of silently correcting it.
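A rough sketch of a "Did you mean?" suggester. It swaps in Python's standard `difflib.get_close_matches` for Aspell, so it is a plain string-similarity approach rather than what the poster described. It assumes you can pull the list of distinct terms out of your own index; `INDEX_TERMS` and `did_you_mean` are hypothetical names, and the 0.8 cutoff is an arbitrary starting point to tune.

```python
import difflib

# Hypothetical vocabulary: in practice, the distinct terms stored in your full-text index.
INDEX_TERMS = ["refrigerator", "freezer", "aberystwyth", "aberdeen", "cardiff"]


def did_you_mean(query, vocabulary=INDEX_TERMS, cutoff=0.8):
    """Return the closest indexed term to `query`, or None if nothing is close enough."""
    q = query.lower()
    if q in vocabulary:
        return None  # already matches an indexed term, so no suggestion is needed
    # get_close_matches ranks candidates by SequenceMatcher.ratio() and drops those below cutoff
    matches = difflib.get_close_matches(q, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(did_you_mean("refridgerator"))
print(did_you_mean("fridge"))
```

With this vocabulary, "refridgerator" should get "refrigerator" as its suggestion, while "fridge" gets `None`: as with Aspell, similarity of spelling alone won't bridge fridge and refrigerator.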
I could have sworn I read a paper not that long ago with a detailed explanation of efficient searching for misspelt words, but I can't find the reference on my mailing list. However, looking back through the archives I did find Tim Bray's notes, which might be useful.
Ah, found the paper: nrgrep. However, it's more for searching text directly than for working with pre-built indices.
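To give a feel for approximate searching over raw text, here is a sketch of the classic Wu-Manber bit-parallel algorithm (the Shift-And-with-errors scheme behind agrep). It is not nrgrep's own algorithm, which is more elaborate, and `approx_find` is a hypothetical name. It reports the end positions where the pattern occurs with at most `k` insertions, deletions or substitutions.

```python
def approx_find(text, pattern, k):
    """Return end positions in `text` where `pattern` occurs with at most k edits."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    full = (1 << m) - 1       # keep only the m bits that stand for pattern prefixes
    accept = 1 << (m - 1)     # this bit set means the whole pattern has matched
    masks = {}                # masks[c] has bit i set wherever pattern[i] == c
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)
    # R[d], bit i: pattern[:i+1] matches text ending at the current position with <= d edits.
    # Before any text is read, the first d pattern characters can be matched by deleting them.
    R = [(1 << d) - 1 for d in range(k + 1)]
    hits = []
    for pos, c in enumerate(text):
        b = masks.get(c, 0)
        old = R[0]                                 # previous-position value of R[d-1]
        R[0] = ((R[0] << 1) | 1) & b               # exact Shift-And step
        for d in range(1, k + 1):
            prev_old, old = old, R[d]
            R[d] = ((((R[d] << 1) | 1) & b)        # pattern[i] matches c
                    | prev_old                     # c is an extra (inserted) text character
                    | (prev_old << 1)              # c substitutes for pattern[i]
                    | (R[d - 1] << 1)              # pattern[i] deleted (R[d-1] already updated)
                    | 1) & full
        if R[k] & accept:
            hits.append(pos)
    return hits


print(approx_find("the refridgerator is cold", "refrigerator", 1))
```

Each extra allowed error costs one more bit vector per text character, so this stays fast for small `k`, but it still scans all of the text rather than consulting an index.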
Look into phonetic algorithms like Soundex or Metaphone. They compute a hash for how a word "sounds" so that you can search for other words that have the same hash.

For example, you might have an SQL query like "SELECT * FROM table WHERE SOUNDEX(title) = SOUNDEX($search_term)". You may want to precompute and store the soundex value of each title, though, since I imagine computing it for every row on every search would be slow on a larger site.

MySQL and PHP both support soundex, and if you're using Perl you can grab Text::Soundex from CPAN.
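A minimal sketch of American Soundex, only to show how the code is built: keep the first letter, map the remaining consonants to digits, collapse adjacent repeats, and pad or truncate to four characters. Database implementations can differ in details (MySQL's SOUNDEX(), for instance, doesn't truncate to four characters). `build_index` is a hypothetical helper illustrating the precompute-then-look-up idea.

```python
from collections import defaultdict

# Soundex digit for each consonant; vowels and y get no code, h and w are skipped entirely.
_CODES = {ch: digit
          for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                 ("l", "4"), ("mn", "5"), ("r", "6"))
          for ch in letters}


def soundex(word):
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    key = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")  # a first letter's code is not repeated after it
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            key.append(code)
            if len(key) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after a vowel is kept
    return "".join(key).ljust(4, "0")


def build_index(words):
    """Precompute soundex code -> words, so each lookup is a single dict access."""
    index = defaultdict(list)
    for w in words:
        index[soundex(w)].append(w)
    return index


index = build_index(["Robert", "Rupert", "Rubin"])
print(index[soundex("Robbert")])
```

"Robert" and "Rupert" both come out as R163, which is the kind of collision behind the false-positive complaints later in the thread. Misspelt place names can fare better: "Aberystwyth" and "Aberistwith" both give A162. But "refrigerator" (R162) and "refridgerator" (R163) don't match, because the inserted d falls inside the four-character window.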
Soundex looks like the way to go, I think. Thanks, guys!
I wouldn't go for pure soundex; you'll get way too many false positives. Soundex has its uses, but this isn't one of them, I'm afraid.
I've heard soundex disparaged elsewhere as well, though this is the only place I can recall and reference for you right now.
You can easily check it out for yourself, just do "select soundex('wordofyourchoice');" in your favourite database. Or go to and try a few lookups using soundex.
I've done some experimenting: soundex is frankly rubbish, but metaphone is far more promising (it has notable failures too, though far fewer than soundex).

SELECT SOUNDEX(field) is not available in every database server (notably Informix, though it's very easy to add), and I doubt that SELECT METAPHONE(field) is available in any.
