How many words do we really use?
April 4, 2013 5:35 AM

What is the average working vocabulary (and the outliers) of various languages? Is the working vocabulary of English English different from that of American English or Australian English? And how does this compare with other languages?
posted by adamvasco to Education (11 answers total) 5 users marked this as a favorite
 
What is English English? Did you mean British English?
posted by humph at 5:53 AM on April 4, 2013 [1 favorite]


I would bet that the working vocabulary varies more between individuals (or levels of education, or whatever) than between nations that use the same language...
posted by acm at 6:03 AM on April 4, 2013


Response by poster: For the sake of argument let's just say English spoken in the archipelago off the north west of Europe.
posted by adamvasco at 6:04 AM on April 4, 2013


Steven Pinker cites a study in his book "The Language Instinct" in which news reports from NPR were tracked for a whole year and each new word used was noted. The resulting word list ran to millions of unique words. For fun they repeated the study for one more day and got three more words that hadn't been used at all the year before. These weren't esoteric words either: one of the new words, if I recall correctly, was "snowy".

It's really hard to quantify working vocabulary because even allowing for differences in education it is massive.
posted by chainsofreedom at 6:13 AM on April 4, 2013 [1 favorite]


You might be interested in both the quiz and the background information found on the Test Your Vocab site.
posted by jacquilynne at 6:15 AM on April 4, 2013 [3 favorites]


The short answer is that it depends, but most word frequency studies that I have read across various languages peg the most common 2,000 words at about 80% of all words used, give or take a few percentage points. There are also a few percentage points of variation between spoken and written language, and between fiction and non-fiction.
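
To check a figure like this against a text of your own, here is a minimal sketch of a coverage calculation. It assumes a crude lowercase tokenizer and counts surface forms rather than lexemes; the function name `coverage` and the sample sentence are made up for illustration and do not come from any of the studies mentioned.

```python
from collections import Counter
import re


def coverage(text, top_n):
    """Return the share of all word tokens covered by the top_n most frequent words."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Sum the occurrences of the top_n most frequent word forms.
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / len(tokens)


# Toy input, invented for illustration only.
sample = "the cat sat on the mat and the dog sat on the rug"
print(coverage(sample, 2))
```

Run on a large corpus with `top_n=2000`, a calculation like this is what produces the "80% or so" kind of number, though the exact result depends heavily on how words are tokenized and grouped.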

Vocabulary size is highly correlated with general intelligence, which is why the ten-question WORDSUM test, which simply asks for the definitions of ten words, is such a good proxy for IQ. A college-educated English speaker has a vocabulary size of 15,000 to 20,000 words, generally speaking. Upon entry to college, the average vocabulary size is about 12,000 words. The study is here. Please note that this is passive, not active, vocabulary. Also, the authors had concerns that the numbers are a bit high because of the multiple-choice format, which allows for guessing. However, knowing about 5,000 words will let you recognize 999 in 1,000 words used.

It is important to keep in mind what a "word" is for vocabulary purposes. Such studies generally measure vocabulary size in terms of lexemes rather than words. To use the example of "snowy" mentioned earlier, "snowy", "snows", and "snow" are three words, but one lexeme. When I have used "word" to talk about vocabulary size, I was speaking of lexemes or "word families". So no, we don't know millions of words.
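
The difference between counting words and counting lexemes can be sketched in a few lines. The lookup table `FAMILY` below is a hypothetical, hand-made mapping that follows the "snow" example above; real studies rely on much larger lemma or word-family lists.

```python
from collections import Counter

# Hypothetical hand-made table mapping surface forms to a word family.
FAMILY = {"snow": "snow", "snows": "snow", "snowy": "snow"}


def count_families(words):
    """Count word families, falling back to the word itself when it is not in the table."""
    return Counter(FAMILY.get(w, w) for w in words)


words = ["snow", "snows", "snowy", "rain"]
print(len(set(words)))              # distinct surface forms
print(len(count_families(words)))   # distinct word families
```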

Also, you have to be aware of the distinction between active and passive vocabulary. We all understand more than we can speak, although this is perhaps more apparent to us in foreign languages we study. If you have ever studied a foreign language, I am sure you have thought that you can understand/read better than you can speak original utterances.

To answer your question, if I were to make an estimate of the vocabulary size of the average high school graduate, I would say it is under 10,000 words.
posted by Tanizaki at 6:35 AM on April 4, 2013 [2 favorites]


Correction! It was written language (news stories from the Associated Press), and it was compiled in 1988. Out of a total of 44,000,000 words, there were 300,000 DIFFERENT words. And the next day there were THIRTY-FIVE new words (including not "snowy" but "fuzzier").
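
The "new words on the next day" check amounts to a set difference. This is only a sketch under a simple tokenization assumption; the helper names and the toy sentences are invented for illustration and are not the method or data of the study Pinker describes.

```python
import re


def tokenize(text):
    """Crude tokenizer (an assumption): lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def new_words(seen_texts, next_text):
    """Return the words in next_text that never appear in any of seen_texts."""
    seen = set()
    for text in seen_texts:
        seen.update(tokenize(text))
    return sorted(set(tokenize(next_text)) - seen)


# Toy data, made up for illustration.
year = ["Snow fell on the city.", "The forecast was snowy and cold."]
next_day = "The picture was fuzzier and the forecast colder."
print(new_words(year, next_day))
```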

Sorry for the misinformation. This is what you get when you post without getting up to check the actual book.
posted by chainsofreedom at 7:24 AM on April 4, 2013


I used to run a program that analyzed my stories: Unique words, Words used Once, Total words, Number of Times each unique word was Used. The program counted slow and slowly as two different words.
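
A rough sketch of that kind of counter, for anyone who wants to try it. This is not the original program; it assumes plain lowercase tokenization, and like the original it treats "slow" and "slowly" as different words. The name `story_stats` is made up.

```python
from collections import Counter
import re


def story_stats(text):
    """Report total words, unique words, words used once, and per-word counts."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {
        "total_words": len(tokens),
        "unique_words": len(counts),
        "used_once": sorted(w for w, n in counts.items() if n == 1),
        "counts": counts,
    }


stats = story_stats("Slow down. He walked slowly, very slowly, down the slow road.")
print(stats["total_words"], stats["unique_words"])
print(stats["used_once"])
print(stats["counts"].most_common(3))
```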

I would look the list over to get a flavor for the tone of the piece I'd written. Sometimes I was inspired to go back and murder a bunch of passive constructions, or kick the shit out of flabby adjectives. Not scientific, but it was a good stress-reliever, and sometimes it was helpful. For example, I got to where I hardly ever used the word very. (Well, not quite true. I used to just type "fucking" in when I was inclined to use "very", then when I was editing, I would go back and find all the "fuckings" to see if that was what I really wanted to say.) This sort of editing is so much more fun than killing widows and orphans, and running down typos.

Seems like most stuff ran to the area of 2,500-3,200 unique words for anecdotes and autobiographia, and considerably less for other essays; most of my essays run to a maximum of (about) 900 words total. Emails and the like ran orders of magnitude smaller, since most contained only a few dozen total words.

I believe my passive vocabulary to be around 35,000 words.
posted by mule98J at 10:48 AM on April 4, 2013 [1 favorite]


Comparing lexicon sizes between languages is also very difficult, because languages differ a lot in what kind of morphological tricks they can do with one lexeme. A language that is relatively poor in these tricks will need more root lexemes, but a language with rich morphology can stretch one lexeme quite far. In these languages, a "one lexeme = one word" rule doesn't seem like a very useful way to count lexicon size.

In a Finnish dictionary (and intuitively), these are considered different words, though the base lexeme is the same:
kirja = a book
kirjata = to book, to note down
kirjoittaa = to write
kirje = a letter (that you mail)
kirjain = a letter (alphabet)
kirjasto = a library
kirjanen = a booklet
kirjaaja = a clerk
kirjailija = a writer
etc.

If we go to the other end and count all the different morphological forms of a word as different words, then these languages could easily have dozens, and often thousands, of possible and usable forms for each lexeme (imagine if English, instead of preposition + word, had a new form of the word). Some combinations would be quite theoretical, but the 'preposition' forms, for example, are in daily use.
kirjalla = by book
kirjaan = into book
kirjasta = from book ... 10 more...

So you have to choose some compromise about which words count as separate words in a dictionary, and there is no way of doing that which works for every language. We also have no agreement on how the lexicon in the mind is organized and what counts there as different words.

I assume that each language carries about the same payload. Otherwise it is difficult to constrain the theory so that it won't claim that one language has 10,000 times more words than another, or that words that mean different things and look different are the same word, or something else silly.
posted by Free word order! at 11:17 AM on April 4, 2013


How many words? (Downside: no exact answers. Upside: no bullshit.)
posted by languagehat at 12:05 PM on April 4, 2013


Corpus linguistics. Frequency dictionaries. Both are useful sources of information on that question. The former will be online. If you study the latter in a library, for a language you speak extremely well, you will get a personal sense of where the cut-off is for 'common' words.
posted by maiamaia at 4:05 PM on April 4, 2013

