What are the most common 20% of words in any language?
June 14, 2009 4:11 PM

In the interests of accelerated language learning, it would be useful to know the 20% or so of the words in any language that are used 80% of the time (Pareto's principle).

I'm trying to find a list of the most commonly used words: either a general list for all languages or, even better, lists for specific languages.
posted by jinatrix to Writing & Language (12 answers total) 3 users marked this as a favorite
The theoretical background: Zipf's law
For English: Most common words; Basic English

Because words (especially the most common 'functional' words) often don't have a 1:1 correlation across languages, creating a general list would probably be futile.
posted by Paragon at 4:19 PM on June 14, 2009

If you're going for optimization, go for the verbs, since IME verbs are the keystones of every utterance.
posted by @troy at 4:23 PM on June 14, 2009

That's surely not how language learning works. Memorizing words isn't really the hard part about language learning.

Also, some languages don't quite have "words" the way Indo-European languages do.
posted by Casuistry at 4:24 PM on June 14, 2009

Depending upon how you frame the question, there might be a quarter million words in English... or three quarters of a million. Even if you take the bottom figure, applying your principle means you're talking about 50,000 words. That's a big list. Even highly-educated people are thought to top out at 20,000 words or so.

There may be value in memorizing a language's most common words, but I don't think it would be on the scale envisioned here.
posted by Conrad Cornelius o'Donald o'Dell at 4:27 PM on June 14, 2009

I think I've heard that 80% of English is a list of about 200 words. Even if that's true, it's largely irrelevant because it's the other 20% which carry most of the meaning. A sentence may be made up of 10 or 15 common words and one or two unusual ones, but it's the unusual ones which are the point, and if you don't understand them you won't understand the sentence. (E.g. "I want to buy xxxx." The first four words are common and easy (well, disregarding the complex verb tense), but if you don't know the last one, the sentence doesn't convey much to you.)

Taking your principle (20%) doesn't help you as much as you might think. English has something in excess of half a million words in its living vocabulary (and tens of thousands more which are archaic); 20% is more than a hundred thousand. Not very short, as "short lists" go.

Other languages don't tend to have as rich of vocabularies as English (which is notorious for that) but 20% will still be tens of thousands of words.
posted by Chocolate Pickle at 4:44 PM on June 14, 2009 [3 favorites]

I would go for the top 1,000 or so words, rather than the top 20%. The top 20% probably extend far beyond 80% and into something like 99.9% of words spoken.
posted by delmoi at 5:26 PM on June 14, 2009
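
A rough way to check the numbers being traded in this thread is to assume an idealized Zipf distribution, where the word at rank r has frequency proportional to 1/r. That is only an approximation, and real corpora deviate from it. The function name, the exponent of 1, and the 250,000-word vocabulary (the low-end English estimate quoted above) are all assumptions made for illustration.

```python
def zipf_coverage(top_k: int, vocab_size: int, exponent: float = 1.0) -> float:
    """Share of running text covered by the top_k most frequent words,
    assuming the rank-r word has frequency proportional to 1 / r**exponent."""
    if not 0 <= top_k <= vocab_size:
        raise ValueError("top_k must be between 0 and vocab_size")
    # Unnormalized Zipf weights for ranks 1..vocab_size.
    weights = [1.0 / rank ** exponent for rank in range(1, vocab_size + 1)]
    return sum(weights[:top_k]) / sum(weights)


if __name__ == "__main__":
    vocab_size = 250_000  # hypothetical vocabulary size, not a measured value
    for top_k in (200, 1_000, vocab_size // 5):
        share = zipf_coverage(top_k, vocab_size)
        print(f"top {top_k:>6} words: {share:.1%} of running text")
```

Under this model the coverage grows roughly with the logarithm of the list length, so the answer depends heavily on the vocabulary size you assume. Treat the output as a ballpark, not as a measurement of any real language.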

W/r/t what Chocolate Pickle is saying, check out Wikipedia in Simple English.
posted by Conrad Cornelius o'Donald o'Dell at 5:49 PM on June 14, 2009

Response by poster: Thanks for the responses thus far. To clarify - I'm looking for the most common words as a starting point; it doesn't have to be 20% (I do understand that this would be a huge number in many languages). The purpose is not to cut corners but to give an idea of the best words to tackle first.
posted by jinatrix at 6:04 PM on June 14, 2009
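
For a specific language, one way to get a starting list is to count word frequencies in a text sample of your own choosing. The following is a minimal sketch, assuming that splitting on runs of letters is an acceptable definition of a "word" (it is not for every language) and that the sample resembles the language you want to learn. The function names and the sample sentence are hypothetical.

```python
import re
from collections import Counter


def frequency_list(text: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs, most frequent first."""
    # Runs of Unicode letters, optionally joined by apostrophes ("don't").
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    return Counter(words).most_common()


def words_needed(freqs: list[tuple[str, int]], target: float) -> int:
    """Smallest number of top-ranked words whose counts add up to at
    least `target` (a fraction between 0 and 1) of all word tokens."""
    total = sum(count for _, count in freqs)
    running = 0
    for rank, (_, count) in enumerate(freqs, start=1):
        running += count
        if running >= target * total:
            return rank
    return len(freqs)


if __name__ == "__main__":
    sample = "The cat and the dog and the bird."  # stand-in for a real corpus
    freqs = frequency_list(sample)
    print(freqs[:3])
    print(words_needed(freqs, 0.8))
```

With a large enough sample, `words_needed(freqs, 0.8)` gives an empirical answer to the "how many words cover 80%" question for that text, and `freqs` is the ranked list itself.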

Ah, but the most common words are not necessarily the best words to tackle first!

There are people out there who are experts at teaching languages. Sounds like you're trying to reinvent the wheel. Why not let the experts teach you what they know? They've already figured out what the best words are to tackle first.
posted by Chocolate Pickle at 6:42 PM on June 14, 2009

jinatrix - Keeping track of the common words is one of the easiest parts of most languages, so I don't know how you'll really "accelerate" anything. I like to give examples from Hungarian, because it's the language I'm concentrating on now, plus it's not an Indo-European language, so the differences between it and English are often stark.

Some of the most common words in English would include the following: in, it, on, at, my, your, their, have, has, had, to, from. In Hungarian, none of these are really words at all. Similarly, a Hungarian would tell you that English has no words for common Hungarian words, so this must be expressed through longer phrases: Mikorra?, for instance, means "By what time?", but to a Hungarian it's an everyday word. And although they have words for "a/an" and "the," they're used in very different frequencies. You can't really say "My friend Bill" in Hungarian without the word "the," and the word "a/an" is frequently not used where we'd have to use it in English.

Learning verbal equivalents and nouns is a good bet, though, because these things are more likely to work the same across languages. But as people have mentioned, it's the most common words that you really need to learn with grammar - the pronouns, possessives, prepositions, modal verbs, and that sort of thing. These tend to not translate very "neatly," so it's somewhat pointless to learn them without grammar.

But while I think your approach is the wrong way to go about it, some generally useful things to learn are: colors, common adjectives and their opposites ("young/old," "pretty/ugly" and so on), family relationships and so on. Even these are tough - many languages differentiate between an "old" that's the opposite of young and the "old" that's the opposite of new. Hungarian doesn't really have a word for "brother" or "sister" (they have separate ones for older brother, younger brother, older sister and younger sister) and even colors don't always translate well.
posted by Dee Xtrovert at 6:44 PM on June 14, 2009 [2 favorites]

The Loom of Language attempts this with its Basic Vocabularies, although the book has been previously debunked. However, I think the Basic Vocabularies are a good starting point for nouns and verbs in groups, and can show you areas where your knowledge is patchy (eg, body parts, feelings, etc).
posted by thebazilist at 7:06 PM on June 14, 2009

2nding Dee Xtrovert, although I'll grant that in languages closely related to English your approach might work to some extent.

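Also? DX, sok sikert a magyartanuláshoz! Remélem nem őrjít meg az anyanyelvem. Kitartás :-) [Good luck with learning Hungarian! I hope my mother tongue doesn't drive you crazy. Hang in there.]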
posted by tigrrrlily at 7:14 AM on June 15, 2009
