List of simple word roots
May 16, 2013 2:44 PM

I am looking for a text file containing a list of words (roughly the 5,000-10,000 most common English words) along with each word's root and the language of that root. My Google-fu only turns up single words, or pages where I can type in a word and be taken to another page with its etymology. Wikipedia has some material, but it is sorted by root language, which is not what I am looking for. I would like a long list of words in a text file so that I can manipulate it programmatically. Comma-separated or whatever; any format would be great. Here is one use case: Yoke - [list of words that have yoke in their etymological history] (many, many English words come from the root word for yoke). All answers appreciated!
posted by Monkey0nCrack to Education (6 answers total) 12 users marked this as a favorite
Just to clarify what you're looking for: for example, "yoke" and "zeugma" both come from the same Indo-European root (although one through Greek and one through Proto-Germanic). Would you want zeugma to be on the list of words attached to yoke?

If so, I think what you want is a list of cognates of common English words, although that may turn up a lot of words that are not themselves English words (unlike zeugma, which has been borrowed into English). That may help your search.

If you want both cognates like zeugma and more straightforward formations like "yokefellow", I don't know what term you might use.
posted by rustcellar at 3:31 PM on May 16, 2013
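One way to make the cognate-vs-formation distinction concrete is to treat etymology as a graph of parent links and compare ancestor sets. The sketch below uses a tiny, hand-written, simplified chain for yoke, yokefellow, and zeugma (illustrative only, not taken from any database); the names `PARENTS`, `ancestors`, and `relation` are hypothetical. If one word appears among the other's ancestors, it is a direct formation; if the two only meet at a shared ancestor (here the PIE root), they are cognates.

```python
from collections import deque

# Toy, simplified parent links (child -> list of etymological parents).
# Hand-written for illustration only; not pulled from any real database.
PARENTS = {
    "yokefellow": ["yoke"],
    "yoke": ["Old English geoc"],
    "Old English geoc": ["Proto-Germanic *juka-"],
    "Proto-Germanic *juka-": ["PIE *yeug-"],
    "zeugma": ["Ancient Greek zeugma"],
    "Ancient Greek zeugma": ["PIE *yeug-"],
}


def ancestors(word, parents):
    """Return every node reachable from `word` by following parent links (word itself excluded)."""
    seen = set()
    queue = deque(parents.get(word, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(parents.get(node, []))
    return seen


def relation(a, b, parents):
    """Classify two words as 'derived', 'cognate', or 'unrelated' in the toy graph."""
    anc_a, anc_b = ancestors(a, parents), ancestors(b, parents)
    if b in anc_a or a in anc_b:
        return "derived"  # one word descends directly from the other
    if anc_a & anc_b:
        return "cognate"  # separate lines of descent meeting at a common ancestor
    return "unrelated"
```

With this toy data, `relation("yoke", "zeugma", PARENTS)` comes out as a cognate pair via the shared PIE root, while `relation("yokefellow", "yoke", PARENTS)` is a direct formation.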

Best answer: Here is a pretty big, freely downloadable, machine-readable etymological database. It's huge but not complete (some of the data used in the example on that page is actually missing! I just emailed the maintainer about it), but it's certainly a start.

You could then take that and programmatically associate its entries with an English word frequency list from any number of sources (here's one example), prune out any entry that's not reachable from your 5,000 most common English words (or whatever filtering criterion you want), and re-save the result in whatever format you want.
posted by aubilenon at 4:33 PM on May 16, 2013 [3 favorites]
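The pipeline described above (load the database, restrict to the top N frequent words, keep only what is reachable from them, re-save) might look roughly like this. It is a sketch under several assumptions: that the database file has three tab-separated columns like `eng: yoke<TAB>rel:etymology<TAB>enm: yok`, that `rel:etymology` and `rel:is_derived_from` point from a word to its source (check the actual relation names in the file), and that the frequency list has one word per line, most frequent first. All function names are hypothetical. The output is root-first, matching the "Yoke - [words]" use case in the question.

```python
import csv
import io
from collections import defaultdict, deque

# ASSUMPTION: relation names that point from a word to its etymological source.
UPWARD_RELATIONS = {"rel:etymology", "rel:is_derived_from"}


def load_parents(tsv_lines):
    """Build child -> set(parents) from assumed lines like 'eng: yoke\\trel:etymology\\tenm: yok'."""
    parents = defaultdict(set)
    for line in tsv_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed or unexpected lines
        child, rel, parent = parts
        if rel in UPWARD_RELATIONS:
            parents[child].add(parent)
    return parents


def top_words(freq_lines, n):
    """Take the first n non-empty words, adding the assumed 'eng: ' language prefix."""
    words = []
    for line in freq_lines:
        word = line.strip()
        if word:
            words.append("eng: " + word)
        if len(words) == n:
            break
    return words


def root_index(parents, words):
    """Map each ancestor reachable from `words` to the sorted list of those words that reach it."""
    index = defaultdict(set)
    for word in words:
        seen = set()
        queue = deque(parents.get(word, ()))
        while queue:
            node = queue.popleft()
            if node in seen:
                continue  # guards against cycles in the data
            seen.add(node)
            index[node].add(word)
            queue.extend(parents.get(node, ()))
    return {root: sorted(ws) for root, ws in index.items()}


def to_csv(index):
    """Serialize the index as CSV text: root first, then every common word that descends from it."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for root in sorted(index):
        writer.writerow([root] + index[root])
    return buf.getvalue()
```

To run it on real files, you would pass open file handles (e.g. `open("etymwn.tsv", encoding="utf-8")`, assuming that file name) to `load_parents` and `top_words`, then write the string returned by `to_csv` to disk. Walking breadth-first with a `seen` set keeps any cycles in the data from causing an infinite loop.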

Ah, now I see your example is your desired output, not input.
posted by rustcellar at 4:43 PM on May 16, 2013

Response by poster: etymwn is everything I need and more! aubilenon, thanks so much, this is really great.
posted by Monkey0nCrack at 5:56 PM on May 16, 2013

Update: The maintainer of that etymological db wrote back, agreeing that there seems to be a problem with PIE entries, and he'll look into it and update it when he's sorted it out. So if you check it again in a week or two you may find it even everything-you-need-and-more-er than it is now.
posted by aubilenon at 6:07 PM on May 16, 2013

You can also screen-scrape the Online Etymology Dictionary.
posted by redlines at 6:34 AM on May 17, 2013
