<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:admin="http://webns.net/mvcb/"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
	<channel> 

	<title>Comments on: Fuzzy logic search to find international characters.</title>
	<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters/</link>
	<description>Comments on Ask MetaFilter post Fuzzy logic search to find international characters.</description>
	<pubDate>Thu, 13 Apr 2006 03:07:54 -0800</pubDate>
	<lastBuildDate>Thu, 13 Apr 2006 03:07:54 -0800</lastBuildDate>
	<language>en-us</language>
	<docs>http://blogs.law.harvard.edu/tech/rss</docs>
	<ttl>60</ttl>

	<item>
		<title>Question: Fuzzy logic search to find international characters.</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters</link>	
		<description>I&apos;ve got a database of names, many of which have international characters (e-acute, c-cedilla, o-umlaut, etc). I want the search routine to be clever enough that if I search for &quot;Celik&quot; it&apos;ll find c-cedilla-elik, even though &quot;c&quot; and &quot;c-cedilla&quot; are entirely different.  &lt;br /&gt;&lt;br /&gt; Does a look-up table exist that matches the whole range of such non-English letters with their nearest-looking English equivalents? Or can anyone here help me construct one?&lt;br&gt;
&lt;br&gt;
I&apos;m thinking o and u umlaut, c and s cedilla, o circumflex, Turkish g and undotted-i, Scandinavian o with a line through it, Spanish n, e with a grave and acute, accented a, the diphthongs. &lt;br&gt;
&lt;br&gt;
Any more for any more?</description>
		<guid isPermaLink="false">post:ask.metafilter.com,2006:site.36258</guid>
		<pubDate>Thu, 13 Apr 2006 02:57:37 -0800</pubDate>
		<dc:creator>Pericles</dc:creator>
		
			<category>international</category>
		
			<category>extended</category>
		
			<category>Latin</category>
		
			<category>characters</category>
		
			<category>search</category>
		
	</item> <item>
		<title>By: grouse</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563095</link>	
		<description>Perhaps someone will have a better or pre-made solution. But one thing you could do is use the &lt;a href=&quot;http://www.unicode.org/ucd/&quot;&gt;Unicode Character Database&lt;/a&gt;, which will allow you to decompose, e.g. U+00C7 LATIN CAPITAL LETTER C WITH CEDILLA into U+0043 LATIN CAPITAL LETTER C, U+0327 COMBINING CEDILLA.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563095</guid>
		<pubDate>Thu, 13 Apr 2006 03:07:54 -0800</pubDate>
		<dc:creator>grouse</dc:creator>
	</item><item>
		<title>By: Goofyy</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563108</link>	
		<description>Don&apos;t forget a-umlaut.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563108</guid>
		<pubDate>Thu, 13 Apr 2006 04:11:50 -0800</pubDate>
		<dc:creator>Goofyy</dc:creator>
	</item><item>
		<title>By: Sharcho</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563111</link>	
		<description>At least for MySQL, you can simply use the LIKE operator,&lt;br&gt;
e.g. &lt;code&gt;SELECT &apos;&#225;&#231;&#241;&apos; LIKE &apos;acn&apos;&lt;/code&gt; returns 1, but &lt;code&gt;SELECT &apos;&#229;&apos; LIKE &apos;a&apos;&lt;/code&gt; returns 0, so it doesn&apos;t work in all cases, but it might be good enough for you.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563111</guid>
		<pubDate>Thu, 13 Apr 2006 04:17:02 -0800</pubDate>
		<dc:creator>Sharcho</dc:creator>
	</item><item>
		<title>By: public</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563117</link>	
		<description>Are you using MySQL?&lt;br&gt;
&lt;br&gt;
Switch to a sensible DB, e.g. PostgreSQL.&lt;br&gt;
&lt;br&gt;
or&lt;br&gt;
&lt;br&gt;
Convert your (VAR)CHAR/TEXT columns to BLOB. I expect it won&apos;t try and be clever about things then.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563117</guid>
		<pubDate>Thu, 13 Apr 2006 04:39:02 -0800</pubDate>
		<dc:creator>public</dc:creator>
	</item><item>
		<title>By: grouse</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563122</link>	
		<description>&lt;em&gt;Convert your (VAR)CHAR/TEXT columns to BLOB. I expect it won&apos;t try and be clever about things then.&lt;/em&gt;&lt;br&gt;
&lt;br&gt;
I don&apos;t understand this response. The OP wants more cleverness, not less.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563122</guid>
		<pubDate>Thu, 13 Apr 2006 05:01:00 -0800</pubDate>
		<dc:creator>grouse</dc:creator>
	</item><item>
		<title>By: Pericles</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563125</link>	
		<description>It&apos;s an Oracle database.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563125</guid>
		<pubDate>Thu, 13 Apr 2006 05:07:25 -0800</pubDate>
		<dc:creator>Pericles</dc:creator>
	</item><item>
		<title>By: Yeomans</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563131</link>	
		<description>Try &lt;a href=&quot;http://www.devon-technologies.com/products/devonthink/uniquefeatures.html&quot;&gt;DevonThink Pro&lt;/a&gt;. I don&apos;t know how extensive the &quot;character equivocation&quot; (my term) can get, but it has a stunningly intelligent fuzzy logic search and can, from my experience, ignore/&apos;equivocate&apos; umlauts.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563131</guid>
		<pubDate>Thu, 13 Apr 2006 05:19:02 -0800</pubDate>
		<dc:creator>Yeomans</dc:creator>
	</item><item>
		<title>By: tperrigo</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563160</link>	
		<description>You might want to check out the &lt;a href=&quot;http://www.merriampark.com/ld.htm&quot;&gt;Levenshtein distance&lt;/a&gt; algorithm. It calculates the &quot;distance&quot; between two strings (the number of single-character edits it would take to transform one string into the other). In your case, although &quot;c&quot; and &quot;c-cedilla&quot; are different characters, the Levenshtein distance between them would be 1 (so you would find it if you searched for all matches within a certain range). &lt;a href=&quot;http://www.merriampark.com/ldplsql.htm&quot;&gt;Here&lt;/a&gt; is an implementation in Oracle&apos;s PL/SQL.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563160</guid>
		<pubDate>Thu, 13 Apr 2006 06:10:17 -0800</pubDate>
		<dc:creator>tperrigo</dc:creator>
	</item><item>
		<title>By: nebulawindphone</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563168</link>	
		<description>(&apos;Course, the Levenshtein distance between &quot;C&quot; and &quot;Q&quot; or &quot;j&quot; or &quot;!&quot; is also 1.  Pericles wants &quot;Celik&quot; to match &quot;&#199;elik&quot; but not &quot;Qelik&quot; or &quot;jelik&quot; or &quot;!elik&quot;.)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563168</guid>
		<pubDate>Thu, 13 Apr 2006 06:15:04 -0800</pubDate>
		<dc:creator>nebulawindphone</dc:creator>
	</item><item>
		<title>By: Pericles</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563191</link>	
		<description>I&apos;m pretty sure a dumb look-up table would do it. User types in Celik, search routine goes to table and sees &amp;ccedil; next to &quot;c&quot; and so also searches for that.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563191</guid>
		<pubDate>Thu, 13 Apr 2006 06:34:45 -0800</pubDate>
		<dc:creator>Pericles</dc:creator>
	</item><item>
		<title>By: zpousman</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563198</link>	
		<description>To tag along, I have this problem in my iTunes library. Any ideas for solving that problem (other than changing the names of tons of bands and songs, of course)?</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563198</guid>
		<pubDate>Thu, 13 Apr 2006 06:42:46 -0800</pubDate>
		<dc:creator>zpousman</dc:creator>
	</item><item>
		<title>By: adamrice</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563225</link>	
		<description>Depending on how you&apos;re setting things up, one way to do this is to store a hidden field that has the ASCII-fied version of your accented strings, and whenever someone does a search, run their query string through the same ASCII-fier before doing the search.&lt;br&gt;
&lt;br&gt;
Obviously this would get clunky if you&apos;re dealing with long blocks of text, but I did use something like this technique when there were only a few searchable fields. The overhead was low and it worked well.&lt;br&gt;
&lt;br&gt;
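A minimal sketch of such an ASCII-fier, assuming Python and its standard unicodedata module (the thread names no language, and EXTRA_FOLDS, asciify and matches are hypothetical names). It decomposes each character, drops the combining accents, and falls back on a small hand-made table for letters such as the Turkish undotted i or the Scandinavian o with a stroke, which have no Unicode decomposition. The table is illustrative, not complete.&lt;br&gt;
&lt;pre&gt;
```python
import unicodedata

# Hypothetical supplement: letters with no Unicode decomposition,
# so stripping combining marks alone would leave them unchanged.
EXTRA_FOLDS = {
    "\u0131": "i",   # Turkish dotless i
    "\u00f8": "o",   # o with stroke
    "\u00d8": "O",   # capital O with stroke
    "\u00e6": "ae",  # ae ligature
    "\u00c6": "AE",  # capital AE ligature
    "\u0153": "oe",  # oe ligature
    "\u0152": "OE",  # capital OE ligature
    "\u00df": "ss",  # German sharp s
    "\u0111": "d",   # d with stroke
    "\u0142": "l",   # l with stroke
}


def asciify(text):
    """Fold accented Latin letters to an unaccented ASCII approximation."""
    # NFKD splits e.g. U+00E7 into "c" followed by U+0327 COMBINING CEDILLA.
    decomposed = unicodedata.normalize("NFKD", text)
    folded = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent mark, keep the base letter
        folded.append(EXTRA_FOLDS.get(ch, ch))
    return "".join(folded)


def matches(query, stored_name):
    """Compare a query and a stored name after folding both the same way."""
    return asciify(query).casefold() == asciify(stored_name).casefold()
```
&lt;/pre&gt;
Storing asciify(name) in the hidden field and running each query through the same function means a search for Celik and a search for the accented spelling both reach the same row.&lt;br&gt;
&lt;br&gt;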
I did a little googling and ran across the following links that may be of use to you:&lt;br&gt;
1. &lt;a href=&quot;http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10749/ch5lingsort.htm&quot;&gt;Linguistic sorting &amp;amp; string searching&lt;/a&gt;. This is specifically about Oracle.&lt;br&gt;
2. &lt;a href=&quot;http://www.4guysfromrolla.com/webtech/040599-2.shtml&quot;&gt;Multilingual databasing techniques&lt;/a&gt;. This discusses my ASCII-fied technique, along with a clever variation on it.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563225</guid>
		<pubDate>Thu, 13 Apr 2006 07:15:07 -0800</pubDate>
		<dc:creator>adamrice</dc:creator>
	</item><item>
		<title>By: Pericles</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563276</link>	
		<description>Thanks Adam Rice. I did do a little googling (honest!) but wasn&apos;t clever enough on my search terms, evidently ...</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563276</guid>
		<pubDate>Thu, 13 Apr 2006 08:12:16 -0800</pubDate>
		<dc:creator>Pericles</dc:creator>
	</item><item>
		<title>By: costas</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563310</link>	
		<description>This may be overkill for what you&apos;re looking for, but IBM&apos;s &lt;a href=&quot;http://www-306.ibm.com/software/globalization/icu/index.jsp&quot;&gt;ICU library&lt;/a&gt; (in Java and C++) has transliteration abilities of the type you&apos;re looking for. I haven&apos;t used it myself (yet anyway), but from the examples they have online for my own target language (Greek), the features are pretty impressive.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563310</guid>
		<pubDate>Thu, 13 Apr 2006 08:52:55 -0800</pubDate>
		<dc:creator>costas</dc:creator>
	</item><item>
		<title>By: odinsdream</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563381</link>	
		<description>If you do run a search against an ASCII-only version, be sure you aren&apos;t making it more difficult for users who already know how to type international characters to find exactly what they&apos;re looking for.&lt;br&gt;
&lt;br&gt;
That is, if you search for Jacob, and find Jac&#246;b, great, but if you search for Jac&#246;b and get no results, that&apos;s bad.</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563381</guid>
		<pubDate>Thu, 13 Apr 2006 10:15:38 -0800</pubDate>
		<dc:creator>odinsdream</dc:creator>
	</item><item>
		<title>By: mrg</title>
		<link>http://ask.metafilter.com/36258/Fuzzy-logic-search-to-find-international-characters#563440</link>	
		<description>Have you thought about using &lt;a href=&quot;http://en.wikipedia.org/wiki/Soundex&quot;&gt;soundex&lt;/a&gt; or &lt;a href=&quot;http://aspell.net/metaphone/&quot;&gt;metaphone&lt;/a&gt; algorithms for this? They generate a key for the string based on how the string sounds. (Both algorithms were originally intended for use with English, but I imagine someone has adapted them for other languages.)</description>
		<guid isPermaLink="false">comment:ask.metafilter.com,2006:site.36258-563440</guid>
		<pubDate>Thu, 13 Apr 2006 11:15:19 -0800</pubDate>
		<dc:creator>mrg</dc:creator>
	</item>
	</channel>
</rss>
