<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Building a search engine</title>
	<link>http://ask.metafilter.com/8954/Building-a-search-engine/</link>
	<description>Comments on Ask MetaFilter post Building a search engine</description>
	<pubDate>Mon, 26 Jul 2004 05:46:14 -0800</pubDate>
	<lastBuildDate>Mon, 26 Jul 2004 05:46:14 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Building a search engine</title>
		<link>http://ask.metafilter.com/8954/Building-a-search-engine</link>	
		<description>Building a search engine - how to cope with mis-spellings? &lt;br /&gt;&lt;br /&gt; I have quite a few full-text index searches that I maintain, and I&apos;d love to make them cope with mis-spellings better.  For example, if someone is looking for say &apos;refrigerator&apos;, then even if they try searching for &apos;fridge&apos; or &apos;refridgerator&apos; then they&apos;ll still get results.  I&apos;d also like to get this to work with place-names, so someone searching for &apos;Aberystwyth&apos; will still get results if they spell it incorrectly.  What&apos;s the best way to go about this?  I&apos;ve thought about using some sort of phonetic approach but this seems to be overkill for my needs. Any lists of common mis-spellings that google has found for me seem to be a bit inadequate for what I want too.  Any suggestions?</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2004:site.8954</guid>
		<pubDate>Mon, 26 Jul 2004 04:58:12 -0800</pubDate>
		<dc:creator>BigCalm</dc:creator>
		
			<category>database</category>
		
			<category>programming</category>
		
			<category>data</category>
		
			<category>integrity</category>
		
			<category>misspelling</category>
		
	</item> <item>
		<title>By: zsazsa</title>
		<link>http://ask.metafilter.com/8954/Building-a-search-engine#169685</link>	
		<description>Take advantage of open source and use &lt;a href=&quot;http://aspell.sourceforge.net/&quot;&gt;Aspell&lt;/a&gt;.  While it won&apos;t fix your refrigerator/fridge problem, it&apos;ll do well for honest misspellings.  If you&apos;re using PHP, there&apos;s a good built-in &lt;a href=&quot;http://us2.php.net/manual/en/ref.pspell.php&quot;&gt;Aspell API called Pspell&lt;/a&gt;.&lt;br&gt;
&lt;br&gt;
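A rough sketch of the kind of suggestion a spell checker produces, using Python's standard difflib module as a stand-in for Aspell (it is not Aspell's algorithm). The KNOWN_WORDS list, the did_you_mean helper and the 0.8 cutoff are assumptions made up for illustration.

```python
import difflib

# Hypothetical vocabulary; in practice this would come from the search index.
KNOWN_WORDS = ["refrigerator", "freezer", "radiator", "Aberystwyth"]

def did_you_mean(term, vocabulary=KNOWN_WORDS):
    """Return the closest known word, or None if the term is already known
    or nothing in the vocabulary is similar enough."""
    if term in vocabulary:
        return None
    # cutoff is a similarity ratio between 0 and 1; 0.8 is an arbitrary choice.
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else None

suggestion = did_you_mean("refridgerator")
if suggestion:
    print("Did you mean " + suggestion + "?")
```

With this cutoff, honest misspellings such as refridgerator get a suggestion, while fridge gets none, so abbreviations and synonyms would still need their own lookup table.
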
I also recommend doing what Google does and showing &lt;em&gt;Did you mean &lt;strong&gt;refrigerator&lt;/strong&gt;?&lt;/em&gt; when someone searches for &quot;refridgerator&quot;, instead of silently correcting it.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.8954-169685</guid>
		<pubDate>Mon, 26 Jul 2004 05:46:14 -0800</pubDate>
		<dc:creator>zsazsa</dc:creator>
	</item><item>
		<title>By: andrew cooke</title>
		<link>http://ask.metafilter.com/8954/Building-a-search-engine#169687</link>	
		<description>i could have sworn i read a paper not that long ago that had a detailed explanation of efficient searching for mis-spelt words, but i can&apos;t find the reference on my mailing list.  however, looking back through the archives i did find &lt;a href=&quot;http://www.tbray.org/ongoing/When/200x/2003/07/30/OnSearchTOC&quot;&gt;tim bray&lt;/a&gt;&apos;s notes, which might be useful.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.8954-169687</guid>
		<pubDate>Mon, 26 Jul 2004 05:52:42 -0800</pubDate>
		<dc:creator>andrew cooke</dc:creator>
	</item><item>
		<title>By: andrew cooke</title>
		<link>http://ask.metafilter.com/8954/Building-a-search-engine#169689</link>	
		<description>ah, found the paper - &lt;a href=&quot;http://www.dcc.uchile.cl/~gnavarro/software/&quot;&gt;nrgrep&lt;/a&gt;.  however, it&apos;s more for searching than working with pre-built indices.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.8954-169689</guid>
		<pubDate>Mon, 26 Jul 2004 05:57:27 -0800</pubDate>
		<dc:creator>andrew cooke</dc:creator>
	</item><item>
		<title>By: revgeorge</title>
		<link>http://ask.metafilter.com/8954/Building-a-search-engine#169698</link>	
		<description>Look into &lt;a href=&quot;http://en.wikipedia.org/wiki/Soundex&quot;&gt;phonetic algorithms&lt;/a&gt; like Soundex or Metaphone.  They compute a hash for how a word &quot;sounds&quot; so that you can search for other words that have the same hash.  &lt;br&gt;
&lt;br&gt;
For example, you might have an SQL query &quot;SELECT * FROM table WHERE SOUNDEX(title) = SOUNDEX($search_term)&quot;.  You may want to precompute the soundex values, since I imagine a full-text search with soundex would be slow on a larger site.&lt;br&gt;
&lt;br&gt;
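As a rough illustration of the precomputing idea, here is a minimal Python sketch of standard American Soundex plus a lookup table keyed by code. The build_index helper and the sample titles are hypothetical, and a database's built-in SOUNDEX() may differ in edge cases.

```python
from collections import defaultdict

# Standard American Soundex digit groups; vowels, H, W and Y get no digit.
GROUPS = ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"]
CODES = {letter: str(digit)
         for digit, letters in enumerate(GROUPS, start=1)
         for letter in letters}

def soundex(word):
    """Return the 4-character Soundex code, or "" if the word has no letters."""
    letters = [c for c in word.upper() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]

def build_index(titles):
    """Precompute codes once so a search is a dictionary lookup."""
    index = defaultdict(list)
    for title in titles:
        index[soundex(title)].append(title)
    return index

index = build_index(["Robert", "Rupert", "refrigerator"])
print(index[soundex("Rupert")])
print(soundex("refrigerator"), soundex("refridgerator"))
```

Robert and Rupert share the code R163, while refrigerator gives R162 and refridgerator gives R163, so it is worth trying real queries before relying on it.
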
MySQL and PHP support soundex and if you&apos;re using Perl you can grab Text::Soundex from CPAN.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.8954-169698</guid>
		<pubDate>Mon, 26 Jul 2004 06:55:00 -0800</pubDate>
		<dc:creator>revgeorge</dc:creator>
	</item><item>
		<title>By: BigCalm</title>
		<link>http://ask.metafilter.com/8954/Building-a-search-engine#169721</link>	
		<description>Soundex looks like the way to go I think.  Thanks guys!</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.8954-169721</guid>
		<pubDate>Mon, 26 Jul 2004 08:21:41 -0800</pubDate>
		<dc:creator>BigCalm</dc:creator>
	</item><item>
		<title>By: fvw</title>
		<link>http://ask.metafilter.com/8954/Building-a-search-engine#169738</link>	
		<description>I wouldn&apos;t go for pure soundex; you&apos;ll get waaay too many false positives. Soundex has its uses, but this isn&apos;t one of them, I&apos;m afraid.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.8954-169738</guid>
		<pubDate>Mon, 26 Jul 2004 10:28:51 -0800</pubDate>
		<dc:creator>fvw</dc:creator>
	</item><item>
		<title>By: weston</title>
		<link>http://ask.metafilter.com/8954/Building-a-search-engine#169876</link>	
		<description>I&apos;ve heard soundex disparaged elsewhere as well, though &lt;a href=&quot;http://yro.slashdot.org/article.pl?sid=03/06/08/2042247&amp;tid=158&amp;tid=103&amp;tid=17&quot;&gt;this&lt;/a&gt; is the only place I can recall and reference for you right now.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.8954-169876</guid>
		<pubDate>Mon, 26 Jul 2004 15:34:36 -0800</pubDate>
		<dc:creator>weston</dc:creator>
	</item><item>
		<title>By: fvw</title>
		<link>http://ask.metafilter.com/8954/Building-a-search-engine#170025</link>	
		<description>You can easily check it out for yourself, just do &quot;select soundex(&apos;wordofyourchoice&apos;);&quot; in your favourite database. Or go to &lt;a href=&quot;http://www.dict.org/&quot;&gt;dict.org&lt;/a&gt; and try a few lookups using soundex.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.8954-170025</guid>
		<pubDate>Tue, 27 Jul 2004 02:27:56 -0800</pubDate>
		<dc:creator>fvw</dc:creator>
	</item><item>
		<title>By: BigCalm</title>
		<link>http://ask.metafilter.com/8954/Building-a-search-engine#170028</link>	
		<description>I&apos;ve done some experimenting, and soundex is frankly rubbish, but metaphone is far more promising (it still has notable failures, though far fewer than soundex).&lt;br&gt;
&lt;br&gt;
select soundex(field) is not available in all db servers (notably informix, though it&apos;s very easy to add), and I doubt that select metaphone(field) is available in any.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2004:site.8954-170028</guid>
		<pubDate>Tue, 27 Jul 2004 04:06:30 -0800</pubDate>
		<dc:creator>BigCalm</dc:creator>
	</item>
	</channel>
</rss>
