How Many Homophones?
Which language has the most homophonic words (one sound, multiple spellings; in English, e.g., BEAR and BARE)? It's hard to make precise comparisons across languages because they differ in what counts as a word, in how complicated their inflectional systems are, etc. But even approximate data would be useful. I saw one paper on automatic speech recognition which showed that the system made more errors on French than on Italian, German, etc., and that most of those errors were due to homophones. But where can I find some real facts about the degree of homophony across languages?
It seems like you're focusing more on European languages, but the language with the most homophones has got to be Chinese. Hands down, if you don't account for tones.
Seconding amillionbillion. If I recall correctly, each syllable may be said with 4 distinct tones, and each tone gives it a different meaning.
Of the languages written with the 26-letter Latin alphabet, English might rank higher, since many of the others have more consistent spelling, with fewer different letter combinations for the same sound, etc.
Chinese must be up there.

I have a theory that Chinese is a hard language not just because it has so many homophones, but because every word sounds almost exactly like some other perfectly legit word. In English it seems to me that if you mispronounce a word then there is a good chance that you haven't said another English word, you've said a "not English" word. That cues the speaker (or listener) that a mistake has happened. In Chinese, pretty much every pronounceable syllable means something - many things actually. If you say ba4 instead of da4 then you've said a different word, not a non-word, and it takes a little more mental processing to realize that a mistake has been made.

I'm sure languagehat or other people who have actually studied this stuff will be happy to explain why I'm full of it.
Words don't need to be spelled differently to be homophonic (e.g., Buffalo buffalo buffalo).

Japanese contains lots of homophones (possibly because of its small number of distinct syllables, and because it uses Chinese loanwords, but without the tones).
I would imagine tones would count, though. I'd vote French myself, but I don't know the majority of the world's 6000-plus languages!

As a general rule, I would say that the more recently a language began to be written, the fewer the homophones, since sounds are sometimes lost while spellings are maintained, which I suppose is why you have many homophonic words (or forms of words) in French: passé, passer, passez, etc. But sometimes older literary languages go through spelling reforms in which spellings are changed to reflect pronunciation. In Hungarian, which is pretty intensely phonetic, I can't think of a single pair of true homophones, and, strange pronunciations aside, I can only think of a couple of obscure ways it would even be possible.
Regarding English: see this article on the PIN-PEN merger. PIN and PEN are homophones in Southern American English but not in Canadian English or British English.

Given that Brits, Canadians and Americans all speak English, and that PIN-PEN is just one example (of many) of how different English accents contain different homophones, I'm not sure how you'd go about counting the homophones of a language.
According to this site discussing Chinese homophones, Japanese has even more homophones than Chinese.

"Japanese has fewer syllables than Chinese, with only 349 syllables, while Chinese has 410 (if tones are not counted), so Japanese has more homophones than Chinese. The Japanese linguist Mochizuki Yasokichi (Wàngyuè Bāshíjí 望月八十吉) published “Homophones in Japanese and Chinese (Rìběnyǔ hé Zhōngguóyǔ de Tóngyīncí),” in the second volume of A Contrastive Study of the Japanese and Chinese Languages (Rìběnyǔ hé Zhōngguóyǔ de Duìzhào Yánjiū, University of Foreign Languages, Osaka, Japan [Rìběn Dàbǎn Wàiguóyǔ Dàxué], 1977). He discovered that Japanese homophones are three times more numerous than those found in Chinese."

Chinese has a lot of homophones too, though. Here is an example of a Chinese story made entirely of the same syllable, changing only the tones.

I was actually going to suggest Korean. It's probably impossible to know for sure either way, but yeah, the East Asian languages obviously have many more homophones.
Yeah, I don't speak Chinese, but I'm going to cast a vote for Japanese here, because there is a relatively small number of basic sounds in the language, and I know there is a stupid number of kanji that can represent the same sounds.

In fact, I'm going to guess that that would be how you'd find this out: how many sounds does the language provide? That's probably the first indication that you're going to have a buttload of homophones.

However, I'm too lazy to do this work for you... although maybe there's a linguistics person on here who knows...?
I would agree with Japanese, for the reasons above and because I know it, but I think the real answer, if there is one, is probably a language like Hawaiian with even fewer sounds, or another Polynesian language. A Western European perspective is way too narrow for approaching this question.
posted by vincele at 4:46 PM on November 9, 2010

I don't have an exact answer for you, but what you're looking for will be found in consistency effects. Specifically, feedback inconsistencies, where one sound pattern maps onto multiple spelling patterns (the opposite, incidentally, is feedforward inconsistencies, where one spelling pattern maps onto multiple possible pronunciation patterns). I'll think about it some more and pop back in if I come up with something.
posted by iamkimiam at 4:49 PM on November 9, 2010
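To make the feedback/feedforward distinction concrete, here is a minimal Python sketch over a hypothetical toy lexicon of (spelling, pronunciation) pairs. The transcriptions are made-up ASCII stand-ins rather than a real pronouncing dictionary, and the function names are illustrative only. Run over a full lexicon for each language, the first function would give one rough way to count spelling-based homophones.

```python
from collections import defaultdict

# Hypothetical toy lexicon: (spelling, pronunciation) pairs.
# Pronunciations are made-up ASCII transcriptions, not a real dictionary.
LEXICON = [
    ("bear", "bEr"),
    ("bare", "bEr"),
    ("pear", "pEr"),
    ("pair", "pEr"),
    ("wind", "wInd"),   # moving air
    ("wind", "waInd"),  # to wind a clock
    ("cat", "k@t"),
]


def feedback_inconsistent(lexicon):
    """Sound -> spellings: pronunciations spelled more than one way (homophones)."""
    spellings = defaultdict(set)
    for spelling, sound in lexicon:
        spellings[sound].add(spelling)
    return {sound: sorted(s) for sound, s in spellings.items() if len(s) > 1}


def feedforward_inconsistent(lexicon):
    """Spelling -> sounds: spellings with more than one pronunciation (homographs)."""
    sounds = defaultdict(set)
    for spelling, sound in lexicon:
        sounds[spelling].add(sound)
    return {spelling: sorted(s) for spelling, s in sounds.items() if len(s) > 1}


print(feedback_inconsistent(LEXICON))
print(feedforward_inconsistent(LEXICON))
```

In this toy data, BEAR/BARE and PEAR/PAIR show up as feedback inconsistencies, while WIND is the only feedforward one.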

Hawaiian has a very small sound inventory and is not known for its wealth of homophony.
posted by Nomyte at 4:49 PM on November 9, 2010

I'm also assuming that you want to look only at languages with orthographic (spelling) systems. In that case, your answer would likely be a language that has a deep orthography, i.e., one where the correspondence between spelling and sound is irregular. But the language with the deepest orthographic system is not necessarily the one with the most homophony, so don't get tripped up there.
posted by iamkimiam at 4:54 PM on November 9, 2010

I'm looking for data, not guesses.

Chinese: tone counts.
Should tone count? I know of at least one poem in Chinese which explicitly plays with the idea of homophony, but uses different tones:

According to Wikipedia, in pinyin, it is written:
« Shī Shì shí shī shǐ »

Shíshì shīshì Shī Shì, shì shī, shì shí shí shī.
Shì shíshí shì shì shì shī.
Shí shí, shì shí shī shì shì.
Shì shí, shì Shī Shì shì shì.
Shì shì shì shí shī, shì shǐ shì, shǐ shì shí shī shìshì.
Shì shí shì shí shī shī, shì shíshì.
Shíshì shī, Shì shǐ shì shì shíshì.
Shíshì shì, Shì shǐ shì shí shì shí shī.
Shí shí, shǐ shí shì shí shī, shí shí shí shī shī.
Shì shì shì shì.
Since the whole point of the poem is that you're using homophonous words, but those words have different tones, then wouldn't it follow that Chinese words with the same sound but different tones are still homophonous?
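Whether tone counts changes the tally a lot. The minimal Python sketch below assumes numbered pinyin (a tone digit at the end of each syllable) and uses a small hand-picked sample: the classic four "ma" words plus a few "shi" morphemes of the kind used in the poem. It groups words by pronunciation with and without the tone digit. It is only an illustration; a real comparison would need a full lexicon and probably frequency weighting.

```python
import re
from collections import defaultdict

# Hand-picked sample in numbered pinyin: the trailing digit is the tone (1-4, 5 = neutral).
SAMPLE = [
    ("ma1", "mother"), ("ma2", "hemp"), ("ma3", "horse"), ("ma4", "scold"),
    ("shi1", "lion"), ("shi1", "poem"), ("shi2", "ten"), ("shi2", "stone"),
    ("shi3", "arrow"), ("shi3", "to cause"), ("shi4", "market"), ("shi4", "to be"),
]


def homophone_groups(words, count_tone):
    """Group glosses by pronunciation; keep only pronunciations shared by 2+ words."""
    groups = defaultdict(list)
    for syllable, gloss in words:
        # Drop the tone digit when tone is not treated as distinctive.
        key = syllable if count_tone else re.sub(r"[1-5]$", "", syllable)
        groups[key].append(gloss)
    return {key: glosses for key, glosses in groups.items() if len(glosses) > 1}


def words_sharing_a_sound(words, count_tone):
    """Number of words that have at least one homophone in the sample."""
    return sum(len(g) for g in homophone_groups(words, count_tone).values())


for count_tone in (True, False):
    print(count_tone, words_sharing_a_sound(SAMPLE, count_tone), homophone_groups(SAMPLE, count_tone))
```

With tones counted, only the "shi" words pair up (8 of the 12 sample words); with tones stripped, all 12 do.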

For those who are interested, here is the English translation of the poem:
« Lion-Eating Poet in the Stone Den »

In a stone den was a poet Shi, who was a lion addict, and had resolved to eat ten.
He often went to the market to look for lions.
At ten o'clock, ten lions had just arrived at the market.
At that time, Shi had just arrived at the market.
He saw those ten lions, and using his trusty arrows, caused the ten lions to die.
He brought the corpses of the ten lions to the stone den.
The stone den was damp. He asked his servants to wipe it.
After the stone den was wiped, he tried to eat those ten lions.
When he ate, he realized that these ten lions were in fact ten stone lion corpses.
Try to explain this matter.

Guys, I think you missed the point of the Lion poem. Zhao Yuanren did not write it to show that there are homophones in the Chinese language. He wrote it to show that it is impractical to use pinyin to transliterate Literary Chinese. The language he used is not what you would use in everyday conversation or normal writing. You can't use it as an example to show there are a lot of homophones in Chinese, even if that is true.
posted by Carius at 6:29 PM on November 9, 2010

I see. Carry on, then.
Tone counts because it acts as a distinctive (phonemic) feature. It's just Western unfamiliarity with it that would cause one to think that MA (first tone) and MA (second tone) are homophones in the same way as PEAR and PAIR in English.

The terminology is not consistent, but for many people homophones = 1 sound, 2 or more spellings; homonyms = 1 sound, 1 spelling, 2 or more unrelated meanings (e.g., WATCH); and homographs = 1 spelling, 2 sounds (e.g., WIND).

Anyway, I was looking for actual attempts to compare languages in degree of homophony, but languagehat doesn't seem to be listening, I guess...
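The three definitions above can be written as a small decision procedure. This Python sketch follows that particular usage (which, as noted, not everyone shares); the Word type and the ASCII sound transcriptions are hypothetical illustrations, not a standard notation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    spelling: str
    sound: str    # hypothetical ASCII transcription, not a standard phonetic alphabet
    meaning: str


def relation(a: Word, b: Word) -> str:
    """Classify a pair of words using the homophone/homonym/homograph definitions above."""
    same_sound = a.sound == b.sound
    same_spelling = a.spelling == b.spelling
    if same_sound and same_spelling:
        # 1 sound, 1 spelling: homonyms only if the meanings differ.
        return "homonyms" if a.meaning != b.meaning else "same word"
    if same_sound:
        return "homophones"   # 1 sound, 2 spellings
    if same_spelling:
        return "homographs"   # 1 spelling, 2 sounds
    return "unrelated"


bear = Word("bear", "bEr", "animal")
bare = Word("bare", "bEr", "naked")
watch_n = Word("watch", "wOtS", "timepiece")
watch_v = Word("watch", "wOtS", "look at")
wind_n = Word("wind", "wInd", "moving air")
wind_v = Word("wind", "waInd", "turn, coil")

print(relation(bear, bare), relation(watch_n, watch_v), relation(wind_n, wind_v))
```

The last line prints "homophones homonyms homographs" for BEAR/BARE, WATCH, and WIND respectively.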
I second Korean; the majority of its vocabulary is derived from Chinese, but without the benefit of any of its homophones being distinguished by tonal pronunciation. Like Chinese, Korean words are often compounds of several syllables based on Chinese characters ("Hanja"). But several different Hanja could share the same pronunciation. "가" (pronounced "ga") could be "house", "add", "go", or "price". "장" (pronounced "jang") could be "place", "long", "leader" or "organ". The whole language is like this, and when discussing the meaning of such Chinese-derived words or Hanja, they are always referred to by pairing them with a word of purely Korean origin that is unmistakable in its meaning. Imagine the following conversation between two Koreans:

"My name is Kim Eun Jung."
"How do you spell that in Hanja? Is that silver Eun or grace Eun?"
"It's grace Eun. And it's spirit Jung, not justice Jung."
passé, passer, passez

These aren't really homophones in the classic sense of the word; they're just spellings of different conjugations and tenses of the same verb: passed, to pass, you pass. It's difficult to understand when you're first learning French, but once you start speaking, it's all contextual, just like "I talk", "you talk", and "they talk" are. More of a spelling quirk than anything else, really.

I'm blanking on any real French homophones, but I only took 4 semesters in college and that was a long time ago.
My favourite French homophone pair is "le thé" and "la taie d'oreiller", because the change of gender always got me as a child.
Sara C.: eau, au.
