What percentage of English words have three syllables?
November 12, 2013 7:12 AM

posted by reverend cuttle to Writing & Language
 
Do you mean "what percentage of words in the dictionary have three syllables?" or "in an actual English text, how often will you encounter three-syllable words?"
posted by Now there are two. There are two _______. at 7:32 AM on November 12, 2013 [1 favorite]


Good question. I am asking about the former, but an answer to the latter would be edifying as well.
posted by reverend cuttle at 7:35 AM on November 12, 2013


According to this paper, in a lexicon of 20,000 words, about 22% were three-syllable words.

(There's also some data on corpus frequency in there, so you might be able to approximate an answer to question #2 if you play around with it.)
posted by Now there are two. There are two _______. at 7:46 AM on November 12, 2013


Also keep in mind that there are many words that are three syllables in standard American pronunciation and two in standard British (e.g. 'battery').
posted by tractorfeed at 8:13 AM on November 12, 2013 [1 favorite]


As tractorfeed says, answers may vary.
Rev-rend, Rev-er-end
Ah, ha-ha.
posted by SLC Mom at 8:43 AM on November 12, 2013


There are scripts that give their best guess for number of syllables, like hyphenation algorithms. They're not perfect, but they're probably the closest you'll come without direct access to real linguistic databases. Apply that to something like /usr/share/dict/words on a Linux or Unix system.
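
To make the idea concrete, here is a rough sketch of that kind of best-guess counter. It is not a real hyphenation algorithm: it just counts runs of vowel letters and drops a likely silent final "e". The function names (`estimate_syllables`, `share_with_syllables`, `load_words`) are made up for illustration, and the heuristic will miscount plenty of words.

```python
import re

# A run of vowel letters is treated as one syllable nucleus (crude assumption).
VOWEL_GROUP = re.compile(r"[aeiouy]+")


def estimate_syllables(word: str) -> int:
    """Best-guess syllable count from spelling alone."""
    w = word.lower()
    count = len(VOWEL_GROUP.findall(w))
    # Treat a final "e" as silent unless the word ends in "le" (as in "table").
    if w.endswith("e") and not w.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def share_with_syllables(words, n=3):
    """Fraction of purely alphabetic words estimated to have n syllables."""
    words = [w for w in words if w.isalpha()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if estimate_syllables(w) == n)
    return hits / len(words)


def load_words(path="/usr/share/dict/words"):
    """Read one word per line; the path is the usual location, not guaranteed."""
    with open(path, encoding="utf-8", errors="ignore") as f:
        return [line.strip() for line in f if line.strip()]


if __name__ == "__main__":
    sample = ["battery", "cat", "make", "reverend"]
    for w in sample:
        print(w, estimate_syllables(w))
    print(share_with_syllables(sample))
```

Swapping `sample` for `load_words()` would run the same estimate over the system word list, if that file exists on your machine.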
posted by supercres at 8:51 AM on November 12, 2013


The CMU Pronouncing Dictionary marks the stress of each syllable of a word with a number, so I counted the number of syllables in every word with this super dumb method:

perl -n -e 's/\D//g; print length($_)."\n";' < dictionary.txt | sort -n | uniq -c

(Get rid of everything that isn't a number, then count how many digits are left on each line, and finally count how many of each result.)

The result was (slightly edited for clarity):
 count  syllables
 16258   1
 57693   2
 37983   3
 15271   4
  4709   5
  1193   6
   211   7
    30   8
     5   9
     1  11
     2  12
     1  14
So that's 37,983 words of 3 syllables out of 133,357 words, or about 28%.
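
For anyone who wants to check the arithmetic or see the other shares, a small sketch that recomputes the percentages from the tallies above (the variable and function names are just for illustration):

```python
# Tallies copied from the counts listed above: syllables -> number of entries.
counts = {
    1: 16258, 2: 57693, 3: 37983, 4: 15271, 5: 4709, 6: 1193,
    7: 211, 8: 30, 9: 5, 11: 1, 12: 2, 14: 1,
}

total = sum(counts.values())


def percentage(n: int) -> float:
    """Share of entries with n syllables, as a percentage of all entries."""
    return 100 * counts.get(n, 0) / total


print(f"total entries: {total}")
for n in sorted(counts):
    print(f"{n:>2} syllables: {counts[n]:>6}  {percentage(n):5.1f}%")
```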

Bear in mind that the CMU dictionary has a lot of proper names and Spanish words in it (for some reason). Also I may have made stupid mistakes.
posted by moonmilk at 9:28 AM on November 12, 2013 [2 favorites]


Of course, now we're all wondering what the 14 syllable word is. It's SUPERCALIFRAGILISTICEXPEALIDOSHUS, and I'm not at all convinced they spelled it right!

The two 12 syllable words are ANTIDISESTABLISHMENTARIANISM and N92762, which is a total fakeout because (1) it has numbers in it, which messed up my count, and (2) WHY IS N92762 IN THE DICTIONARY??
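
A sketch of one way to avoid that fakeout: count stress digits only in the pronunciation part of each line, not in the word itself. This assumes the file has one entry per line in the form `WORD  PHONEME PHONEME ...`, with comment lines starting with `;;;` and vowels carrying a trailing 0, 1, or 2. The sample lines are written by hand in that assumed format (the `X9` entry is invented), and the function names are hypothetical.

```python
from collections import Counter


def syllables_in_entry(line: str):
    """Return (headword, syllable_count), or None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    headword, *phonemes = line.split()
    if not phonemes:
        return None
    # Only phonemes ending in a stress digit (vowels) count as syllables;
    # digits inside the headword are never looked at.
    count = sum(1 for p in phonemes if p[-1] in "012")
    return headword, count


def tally(lines):
    """Count how many entries have each syllable count."""
    result = Counter()
    for line in lines:
        entry = syllables_in_entry(line)
        if entry is not None:
            result[entry[1]] += 1
    return result


if __name__ == "__main__":
    sample = [
        ";;; comment line",
        "BATTERY  B AE1 T ER0 IY0",
        "CAT  K AE1 T",
        "X9  EH1 K S N AY1 N",  # invented entry with a digit in the headword
    ]
    for syllables, n in sorted(tally(sample).items()):
        print(syllables, n)
```

Note that this still counts alternate pronunciations as separate entries if the file lists them, so the totals would not be a count of distinct words.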
posted by moonmilk at 9:31 AM on November 12, 2013 [1 favorite]


Now there are two. There are two _______.: "According to this paper, in a lexicon of 20,000 words, about 22% were three-syllable words."

moonmilk: "So that's 37,983 words of 3 syllables out of 133,357 words, or about 28%."

It doesn't seem wildly unreasonable to assume that the percentage will continue to rise asymptotically as the lexicon size increases.
posted by turkeyphant at 10:48 AM on November 12, 2013


I suspect that's right. There's a tendency for high-frequency words to be short. Larger lexica will have more low-frequency words. So larger lexica will probably have more long words — and in English, a three-syllable word counts as long.
posted by Now there are two. There are two _______. at 8:18 AM on November 13, 2013

