help me find word lists to build better bots!
July 12, 2015 3:52 PM

I need an easy way to find lists of words related to specific topics, preferably ones of particular parts of speech, for my bots.

The bots that I make need big lists of words so that the content they spit out has a lot of variation. I like to hand-curate these lists, so I'm not interested in using something like the Wordnik API, but I'd really like it if there were Big Lists Of Words related to particular topics. It seems like people would have done this already, but I can't find anything like it. I can find individual lists of 20 or so words on certain themes on language-learning websites (lists of violent verbs, color nouns, stuff like that), and some of the Wordnik lists made by the community have been useful.

Something like this probably exists somewhere (some big English-as-a-second-language site, or keywords from word searches, or someplace that grabs data from Wiktionary, or SOMETHING), but I have no idea where to start.

Here are the kinds of lists I've looked for in the past. I already have these covered and do not need suggestions for them; I am just trying to give examples.
  • Positive adjectives ("great", "wonderful", etc.)
  • Violent (nouns/adjectives/verbs)
  • Color names that are just colors and not words for something else (as in, yellow, red and purple, but not lilac, eggplant or amethyst)
Bonus points if sorting tools are available.

Any suggestions will be appreciated. It's hard to tell exactly what will be useful in advance, since looking at these lists often ends up being the inspiration for a new bot idea.
posted by NoraReed to Writing & Language (11 answers total) 6 users marked this as a favorite
 
Darius Kazemi (aka tinysubversions) has a project called Corpora that's essentially a public repo of various word lists, which may be handy for some of these ideas.
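Corpora's files are JSON, so using one in a bot can take just a few lines. This is a minimal illustration, not code from the project: it assumes a file shaped like `{"description": ..., "colors": [...]}`, and the inline sample, the `colors` key, and the `load_word_list`/`pick` helpers are all hypothetical. Individual Corpora files are structured differently, so check the keys in whichever file you use.

```python
import json
import random

# Hypothetical sample in the general shape of a Corpora-style JSON file:
# a description plus one list of words under a topic key. Real files differ.
SAMPLE = """
{
  "description": "A few basic color names (illustrative sample only)",
  "colors": ["red", "yellow", "purple", "green", "blue"]
}
"""


def load_word_list(text: str, key: str) -> list[str]:
    """Parse JSON text and return the list stored under `key`."""
    data = json.loads(text)
    words = data[key]
    if not isinstance(words, list):
        raise ValueError(f"expected a list under {key!r}")
    return [str(w) for w in words]


def pick(words: list[str], rng: random.Random) -> str:
    """Choose one word; pass a seeded Random for reproducible output."""
    return rng.choice(words)


if __name__ == "__main__":
    colors = load_word_list(SAMPLE, "colors")
    rng = random.Random(42)
    print(f"The sky turned {pick(colors, rng)}.")
```

Passing a seeded `random.Random` makes test runs repeatable; drop the seed for live bot output.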
posted by cortex at 3:56 PM on July 12, 2015 [1 favorite]


There are a lot of word lists here (if I've understood what you're looking for).
posted by billiebee at 4:01 PM on July 12, 2015 [1 favorite]


Thesaurus.com has a search feature, and 42 words for "wonderful."

Wikipedia has Lists of English Words, including Newspeak words.
posted by Little Dawn at 4:01 PM on July 12, 2015


Also, this is pretty spotty and involves some manual collection, but for nouns in particular there are a whole lot of "types of x" lists and indexes on Wikipedia, e.g. (but certainly not limited to) stuff like Lists of English words for specific subsets of vocab. That's one of the main resources I use for collecting stuff like country names, animal species, international common given names, etc.
posted by cortex at 4:03 PM on July 12, 2015


Abulafia has a ton of word lists (and example generators).
posted by CrystalDave at 4:04 PM on July 12, 2015


Response by poster: Are there any easy ways to just grab the words from one column of a Wikipedia table and get them in text form? I'd love to, say, copy the list of animal names linked from the Lists of English Words page that cortex and billiebee link to, paste it into Notepad, and then go through and delete all the ones I don't want. Right now, when I grab stuff from lists like that, I just open the list in one window and type all the ones I want into another one, but there's got to be an easier way than that.
posted by NoraReed at 4:06 PM on July 12, 2015


There is a list of palindromes and other words from this Spelling List Generator that might be easier to work with; the website says it has 30 pages of filtered word lists.
posted by Little Dawn at 4:12 PM on July 12, 2015


For something like that I'd usually turn to web scraping with Python and BeautifulSoup or similar (I've used it for a couple of minor projects).

However, in this case it looks like you can get away with using ImportIO, which you can point at a webpage to see what it can extract (it then lets you download the results as a CSV file to open in Excel for further manipulation). It works quite well on the List of Animal Names, for example.
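Here is roughly what the scraping route looks like. To keep it dependency-free, this minimal sketch swaps in Python's standard-library `html.parser` rather than BeautifulSoup; the `ColumnExtractor` class, `extract_column` helper, and sample table are hypothetical. It only keeps `td` cells (so header cells are skipped) and assumes a simple table without nested tables or `rowspan`/`colspan`, which real Wikipedia tables sometimes use.

```python
from html.parser import HTMLParser

# Hypothetical sample table; real Wikipedia tables are larger and messier.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Animal</th><th>Young</th></tr>
  <tr><td><a href="/wiki/Cat">Cat</a></td><td>kitten</td></tr>
  <tr><td>Dog</td><td>puppy</td></tr>
  <tr><td>Goose</td><td>gosling</td></tr>
</table>
"""


class ColumnExtractor(HTMLParser):
    """Collect the text of one 0-based column from the td cells of each row."""

    def __init__(self, column: int):
        super().__init__()
        self.column = column
        self.values: list[str] = []
        self._cell_index = -1  # position of the current cell within its row
        self._in_cell = False
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cell_index = -1  # new row: restart the cell count
        elif tag in ("td", "th"):
            self._cell_index += 1  # header cells still take up a position
            self._in_cell = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self._in_cell = False
            # Keep only data cells (td) that sit in the requested column.
            if tag == "td" and self._cell_index == self.column:
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.values.append(text)

    def handle_data(self, data):
        # Text inside nested tags such as <a> also lands here while in a cell.
        if self._in_cell:
            self._buffer.append(data)


def extract_column(html: str, column: int) -> list[str]:
    parser = ColumnExtractor(column)
    parser.feed(html)
    parser.close()
    return parser.values


if __name__ == "__main__":
    # One word per line, ready to paste into a text editor and prune by hand.
    print("\n".join(extract_column(SAMPLE_HTML, 0)))
```

On a real page you'd feed in the saved HTML instead of `SAMPLE_HTML`, and pick the column number to match that table's layout.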
posted by CrystalDave at 4:13 PM on July 12, 2015 [1 favorite]


Are there any easy ways to just grab the words from one column of a Wikipedia table and get them in text form?

The approach that has worked best for me is to select and copy the whole table, paste it into a spreadsheet, and then copy just the column of stuff I want from there into my text editor. It's a little hacky, but it works and is reasonably fast.
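When you copy a block of cells out of a spreadsheet, the plain-text clipboard contents are typically tab-separated, so the column-picking step can also be scripted. A minimal sketch with Python's `csv` module, assuming the first row holds headers; the `PASTED` sample and `column_from_tsv` helper are hypothetical.

```python
import csv
import io

# Hypothetical clipboard contents after copying cells out of a spreadsheet:
# one row per line, cells separated by tabs, headers in the first row.
PASTED = "Animal\tYoung\nCat\tkitten\nDog\tpuppy\nGoose\tgosling\n"


def column_from_tsv(text: str, header: str) -> list[str]:
    """Return the non-empty cells under `header` from tab-separated text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames is None or header not in reader.fieldnames:
        raise KeyError(f"no column named {header!r}")
    values = []
    for row in reader:
        # Short rows yield None for missing cells, so fall back to "".
        cell = (row.get(header) or "").strip()
        if cell:
            values.append(cell)
    return values


if __name__ == "__main__":
    # One word per line, ready to paste into a text editor.
    print("\n".join(column_from_tsv(PASTED, "Young")))
```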
posted by cortex at 4:20 PM on July 12, 2015 [4 favorites]


How advanced do the words have to be? Enchanted Learning has lists of vocabulary words for the K-12 set organized by topic. But "topic" is a broad term on that page, so one topic might be "Ways to say Big," another might be "Positive words," another "Irregular verbs."
posted by mittens at 5:03 PM on July 12, 2015


I've had some luck copying tables from Wikipedia and pasting them into Google spreadsheets, like this. It takes some manual cleanup (which I never got around to doing much of on this example), but it's better than retyping the whole thing.
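Part of that cleanup can be automated. A minimal sketch, assuming the usual debris in a pasted Wikipedia column is bracketed footnote markers like `[1]` or `[citation needed]`, parenthetical notes, and stray whitespace; the `clean_words` helper and sample input are hypothetical, and you'd still want to read the result by hand. It also de-duplicates case-insensitively and sorts alphabetically, which covers the basic sorting the question mentions.

```python
import re

FOOTNOTE = re.compile(r"\[[^\]]*\]")      # e.g. [1], [a], [citation needed]
PARENTHETICAL = re.compile(r"\([^)]*\)")  # e.g. (domestic), (archaic)


def clean_words(lines: list[str], drop_parentheticals: bool = True) -> list[str]:
    """Strip footnote markers, optionally parentheticals, then dedupe and sort."""
    seen: dict[str, str] = {}
    for line in lines:
        text = FOOTNOTE.sub("", line)
        if drop_parentheticals:
            text = PARENTHETICAL.sub("", text)
        text = " ".join(text.split())  # collapse runs of whitespace
        if not text:
            continue
        key = text.casefold()      # "Cat" and "cat" count as duplicates
        seen.setdefault(key, text)  # keep the first spelling encountered
    return sorted(seen.values(), key=str.casefold)


if __name__ == "__main__":
    pasted = ["Cat[1]", "  dog (domestic) ", "cat", "Goose[citation needed]", ""]
    for word in clean_words(pasted):
        print(word)
```

Set `drop_parentheticals=False` for lists where the parenthetical is part of the entry.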
posted by moonmilk at 8:08 AM on July 13, 2015


This thread is closed to new comments.