What percentage of English words have three syllables?
November 12, 2013 7:12 AM
What percentage of English words have three syllables?
Response by poster: Good question. I am asking about the former, but an answer to the latter would be edifying as well.
posted by reverend cuttle at 7:35 AM on November 12, 2013
Best answer: According to this paper, in a lexicon of 20,000 words, about 22% were three-syllable words.
(There's also some data on corpus frequency in there, so you might be able to approximate an answer to question #2 if you play around with it.)
posted by Now there are two. There are two _______. at 7:46 AM on November 12, 2013
Also keep in mind that there are many words that are three syllables in standard American pronunciation and two in standard British (e.g. 'battery').
posted by tractorfeed at 8:13 AM on November 12, 2013 [1 favorite]
As tractorfeed says, answers may vary.
Rev-rend, Rev-er-end
Ah, ha-ha.
posted by SLC Mom at 8:43 AM on November 12, 2013
There are scripts that give their best guess for number of syllables, like hyphenation algorithms. They're not perfect, but they're probably the closest you'll come without direct access to real linguistic databases. Apply that to something like /usr/share/dict/words on a Linux or Unix system.
posted by supercres at 8:51 AM on November 12, 2013
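A minimal sketch of the kind of script described above, in Python. It is not a real hyphenation algorithm: it only counts runs of vowel letters and drops most silent final e's, so expect it to be wrong on plenty of words. The function names are made up for illustration, and the path is the usual word-list location mentioned above, which may not exist on every system.

```python
import os
import re


def estimate_syllables(word: str) -> int:
    """Rough guess: one syllable per run of vowel letters (y counts as a vowel)."""
    w = word.lower()
    count = len(re.findall(r"[aeiouy]+", w))
    # A final 'e' after a consonant is usually silent ("made"), except in
    # "-le" endings ("table"). This is a heuristic, not a rule.
    if count > 1 and re.search(r"[^aeiouyl]e$", w):
        count -= 1
    return max(count, 1)


def share_with_n_syllables(words, n=3):
    """Fraction of purely alphabetic words estimated to have n syllables."""
    words = [w for w in words if w.isalpha()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if estimate_syllables(w) == n)
    return hits / len(words)


if __name__ == "__main__":
    path = "/usr/share/dict/words"  # assumed location; skipped if missing
    if os.path.exists(path):
        with open(path, encoding="utf-8", errors="ignore") as f:
            share = share_with_n_syllables(line.strip() for line in f)
        print(f"estimated three-syllable share: {share:.1%}")
```

Because the estimate works from spelling alone, it cannot see pronunciation differences like the two- versus three-syllable 'battery' mentioned earlier in the thread.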
Best answer: The CMU Pronouncing Dictionary marks the stress of each syllable of a word with a number, so I counted the number of syllables in every word with this super dumb method:
perl -n -e 's/\D//g; print length($_)."\n";' < dictionary.txt | sort | uniq -c
(Get rid of everything that isn't a number, then count how many digits are left on each line, and finally count how many of each result.)
The result was (slightly edited for clarity):
Words   Syllables
16258   1
57693   2
37983   3
15271   4
4709    5
1193    6
211     7
30      8
5       9
1       11
2       12
1       14
So that's 37,983 words of 3 syllables out of 133,357 words, or about 28%.
Bear in mind that the CMU dictionary has a lot of proper names and Spanish words in it (for some reason). Also I may have made stupid mistakes.
posted by moonmilk at 9:28 AM on November 12, 2013 [2 favorites]
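For anyone who wants to repeat the count, here is a rough Python sketch of the same idea, with one change: it counts stress digits only in the pronunciation field, so digits in the headword itself are not counted as syllables. It assumes the usual CMU layout (headword, whitespace, phonemes, with ";;;" comment lines and alternate pronunciations written like WORD(1)). Skipping alternates is my own choice, so totals will not match the figures above exactly. The function names and the sample entries are made up for illustration, not copied from the dictionary file.

```python
import re
from collections import Counter


def syllable_histogram(lines):
    """Map syllable count -> number of entries, using stress digits only."""
    hist = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):  # blank or comment line
            continue
        parts = line.split(None, 1)  # headword, then the phoneme string
        if len(parts) < 2:
            continue
        headword, pron = parts
        # Assumption: count each word once, so skip alternates like "WORD(1)".
        if re.search(r"\(\d+\)$", headword):
            continue
        # Each vowel phoneme carries one stress digit (0, 1 or 2).
        hist[sum(ch.isdigit() for ch in pron)] += 1
    return hist


def share(hist, n=3):
    """Fraction of counted entries that have n syllables."""
    total = sum(hist.values())
    return hist[n] / total if total else 0.0


# Hypothetical entries in the assumed format, for illustration only.
SAMPLE = [
    ";;; comment line",
    "BATTERY  B AE1 T ER0 IY0",
    "CAT  K AE1 T",
    "REVEREND  R EH1 V ER0 AH0 N D",
    "REVEREND(1)  R EH1 V R AH0 N D",
    "N92762  EH1 N N AY1 N T UW1 S EH1 V AH0 N S IH1 K S T UW1",
]

if __name__ == "__main__":
    hist = syllable_histogram(SAMPLE)
    print(sorted(hist.items()), f"{share(hist):.1%}")
    # Checking the arithmetic on the thread's own totals:
    print(f"{37983 / 133357:.1%}")
```

To run it on the real file, pass an open file handle instead of SAMPLE; older releases of the dictionary are Latin-1 encoded, so the encoding may need to be set when opening it.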
Best answer: Of course, now we're all wondering what the 14 syllable word is. It's SUPERCALIFRAGILISTICEXPEALIDOSHUS, and I'm not at all convinced they spelled it right!
The two 12 syllable words are ANTIDISESTABLISHMENTARIANISM and N92762, which is a total fakeout because (1) it has numbers in it, which messed up my count, and (2) WHY IS N92762 IN THE DICTIONARY??
posted by moonmilk at 9:31 AM on November 12, 2013 [1 favorite]
Best answer: Now there are two. There are two _______.: "According to this paper, in a lexicon of 20,000 words, about 22% were three-syllable words."
moonmilk: "So that's 37,983 words of 3 syllables out of 133,357 words, or about 28%."
It doesn't seem wildly unreasonable to assume that the percentage will continue to rise asymptotically as the lexicon size increases.
posted by turkeyphant at 10:48 AM on November 12, 2013
Best answer: I suspect that's right. There's a tendency for high-frequency words to be short. Larger lexica will have more low-frequency words. So larger lexica will probably have more long words — and in English, a three-syllable word counts as long.
posted by Now there are two. There are two _______. at 8:18 AM on November 13, 2013
posted by Now there are two. There are two _______. at 7:32 AM on November 12, 2013 [1 favorite]