Is there a way of categorizing a set of words by emotion or tone?
February 24, 2017 6:22 PM

I'm trying to brainstorm something by comparing a set of words together.

I would like to see if there is a "standard" way of categorizing adjectives or verbs into, say, emotional categories. I see things like this, which provides an "analysis" of tweets being "upbeat", "angry", and so on.

Some cursory googling around shows that this is in the purview of "natural language processing" (NLP), but the first things that pop up seem more concerned with tokenizing / diagramming sentences. I don't want to go too far down a rabbit hole with the contemporary buzzwords, so I'm seriously just trying to find the actual terms that I'm looking for.

Other googling tends to turn up word lists for English learners or just plain definitions.

Are there standard terms or listings that would distinguish "cheer", "cozy", "warm" from "disgust", "inhibit", "chill"?
What are the terms I'm even looking for here?

Ideally, it'd be some sort of library I could run some word lists through. Platform/tech doesn't matter, but I'd prefer open-source.

Again, I'm not trying to do full on NLP or guess someone's education or demeanor, I'm looking for just static analysis of individual words. Like so I could run an adjective through a thesaurus and separate similar words into positive/negative piles (or something equally simplistic).

Let me know if I can clarify; like I said, I know I'm not even using the right terms for this.
posted by lkc to Computers & Internet (9 answers total) 3 users marked this as a favorite
When I think of people who are doing this sort of work, I think of researchers looking at things like Facebook and doing semantic analysis, coding words. You might want to look into the concept of semantic differential. I get in the weeds after that point and don't find lists, but it seems like the right jumping-off point.
posted by jessamyn at 6:39 PM on February 24, 2017 [2 favorites]

Something like Contrasting and Categorizing Emotions, perhaps?
posted by Thella at 6:51 PM on February 24, 2017 [1 favorite]

Again, I'm not trying to do full on NLP or guess someone's education or demeanor, I'm looking for just static analysis of individual words. Like so I could run an adjective through a thesaurus and separate similar words into positive/negative piles (or something equally simplistic).

The problem is, what you've just described requires *somebody* to have done the full on NLP.

In order to do this, a human would need to train a program on a sample of words like the ones you listed, and teach the program which ones were positive, and which ones were negative. Then, the linguist lets the program loose on a wider pool of data, and hopes the program acts similarly to human raters in categorizing new words. (This is why you're getting google results talking about tokenizing and diagramming sentences.)
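
To make the "start from a few labelled words and spread out" idea concrete, here is a toy Python sketch. It is not a trained model and not what the example linked in this thread does: it just pushes labels from a handful of seed words across a made-up synonym table, which is a much cruder stand-in. The `TOY_SYNONYMS` table, the `SEEDS` labels, and the `spread_labels` helper are all invented for illustration.

```python
from collections import deque

# Hypothetical thesaurus fragment: word -> words listed as synonyms.
TOY_SYNONYMS = {
    "cheerful": ["happy", "sunny"],
    "happy": ["cheerful", "glad"],
    "sunny": ["cheerful", "bright"],
    "gloomy": ["sad", "dark"],
    "sad": ["gloomy", "unhappy"],
}

# Hand-labelled seed words, standing in for the human-rated sample.
SEEDS = {"cheerful": "positive", "gloomy": "negative"}


def spread_labels(seeds, synonyms):
    """Breadth-first walk from the seeds, giving unlabelled synonyms the label of the word that reached them."""
    labels = dict(seeds)
    queue = deque(seeds)
    while queue:
        word = queue.popleft()
        for neighbour in synonyms.get(word, []):
            if neighbour not in labels:  # first label to arrive wins
                labels[neighbour] = labels[word]
                queue.append(neighbour)
    return labels


if __name__ == "__main__":
    for word, label in sorted(spread_labels(SEEDS, TOY_SYNONYMS).items()):
        print(f"{word}: {label}")
```

Even on a toy table the weak spot shows: a word with several senses can be reached from both piles, and here it simply keeps whichever label got there first.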

Here's an example of somebody doing this, using Amazon book reviews.

The problem is you're basically asking for a sentiment analysis for... everything. Not just "Hey, can you figure out if 'chill' has a negative meaning in a book review?" or in a tweet, or in some other (relatively small, well-defined) corpus, but "Hey, can you figure out if 'chill' has a negative meaning every time it's used?", which is an incredibly complex problem. (At least, I think it is, based on my interactions with computational linguists. Any real computational linguists, feel free to correct me.)

The wiki page on sentiment analysis might be helpful as a jumping off point for more googling.
posted by damayanti at 7:04 PM on February 24, 2017 [2 favorites]

I think that there's no 'standard' way of doing this, as language is subjective. However you could look for people doing research into 'sentiment analysis.' This research uses 'dictionaries' that have lists of words marked up for particular attributes, such as sentiment. I did a quick search for 'sentiment analysis dictionaries' and came up with some promising looking stuff. I'm assuming that you'd probably need to run the dictionaries through NLP programs, or at least understand how they mark up each of their terms.
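
For what it's worth, the lookup side of this can be very small once you have such a dictionary. A minimal Python sketch, assuming a hand-made toy dictionary: the `TOY_LEXICON` entries and the `sort_into_piles` helper are made up for illustration and are not taken from any published sentiment dictionary.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> label. A real sentiment dictionary
# would be far larger and would come with its own markup scheme.
TOY_LEXICON = {
    "cheer": "positive",
    "cozy": "positive",
    "warm": "positive",
    "disgust": "negative",
    "inhibit": "negative",
    "chill": "negative",
}


def sort_into_piles(words, lexicon=TOY_LEXICON):
    """Group words by their lexicon label; unlisted words go to 'unknown'."""
    piles = defaultdict(list)
    for word in words:
        label = lexicon.get(word.lower(), "unknown")
        piles[label].append(word)
    return dict(piles)


if __name__ == "__main__":
    print(sort_into_piles(["Cozy", "chill", "lordship", "warm"]))
```

Swapping the toy dictionary for a real one should mostly be a matter of parsing whatever file format it ships in into the same word-to-label mapping.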
posted by carter at 7:07 PM on February 24, 2017 [1 favorite]

You could try looking at the concept of "semantic fields" or "lexical fields". I'm more familiar with categorising nouns or verbs this way but presumably people somewhere are using it for adjectives.

The other problem is that semantic fields are usually broader than I think you are looking for.
posted by lollusc at 7:55 PM on February 24, 2017 [1 favorite]

Marsha Linehan has a list of words in her workbook for people who live with borderline personality disorder. There are giant lists of words that describe an emotion, in varying intensity. I wonder if something like that would be helpful to you. What is nice about it is that it is built around the emotion, not the words or linguistics, synonyms, etc.

A taxonomy of words might also be useful to you. But it seems you may have to build something like that.

If you are interested in the pages from Linehan's workbook, email me on this site and I can send you copies of those pages. Or you can find the workbook at a library near you.
posted by Jewel98 at 8:00 PM on February 24, 2017 [1 favorite]

The problem is, what you've just described requires *somebody* to have done the full on NLP.
Well, I'm looking for something much simpler, but don't quite have the terms I'm looking for. I'm not really looking for meanings in context, but a more general category. Not as closely related as what a thesaurus would give you, but something more static, like a rhyming dictionary or a syllable count.

Like, I just want to look this up in a table. We have words that are "formal", "polite", or "vulgar", and we have collections of these words going back well before computers. Is there a more general term for that kind of categorization?

Having typed that out, I feel like I'm looking for a dataset of meta-data about words. Maybe somewhere between a thesaurus entry and an etymology.

Obviously language is subjective, and I'm not really trying to make a computer be able to understand the intent of a document, I just want to shuffle words into sets and make funny sounding pairs. I'm wondering if there are defined "tags" or "collections" or something.

To use the above example of sets: if I had "lordship" tagged as being a "formal" word and "shithead" as a "vulgar" word, I'd just grab a random one from each set and generate "lordship shithead", not trying to see what it means to use that in a sentence.
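
A minimal Python sketch of that pairing step, assuming the tagged sets already exist somewhere. The `TAGGED_WORDS` contents and the `random_pair` helper are placeholders made up for illustration, not part of any real dataset or library.

```python
import random

# Hypothetical tagged sets; the tags and the words are only for illustration.
TAGGED_WORDS = {
    "formal": ["lordship", "esteemed", "honorable"],
    "vulgar": ["shithead", "jackass"],
}


def random_pair(first_tag, second_tag, tagged=TAGGED_WORDS, rng=random):
    """Pick one random word from each tagged set and join them with a space."""
    return f"{rng.choice(tagged[first_tag])} {rng.choice(tagged[second_tag])}"


if __name__ == "__main__":
    print(random_pair("formal", "vulgar"))
```

Passing a seeded `random.Random` instance as `rng` makes the output repeatable, which helps when comparing word lists.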

So is there any kind of tooling or dataset that would have "cheerful" as generally positive, or "hurt" or "anxious" as generally relating to a feeling? It doesn't have to be an exhaustive collection of English words, but some kind of categorization that's a bit more machine friendly than scraping dictionary definitions off the web.

Hope that makes a bit more sense.
posted by lkc at 8:55 PM on February 24, 2017

These answers all look great! I'm going to go Wikipedia digging and report back, thanks!
posted by lkc at 9:01 PM on February 24, 2017

OK, Y'all have definitely pointed me in the right direction.
Sentiment analysis looks pretty close to what I was asking, and I'll keep poking around it.

damayanti: In that Amazon analysis, it's kinda bizarre: he points out that he's doing a search/replace on punctuation and then stripping it, and then doesn't strip exclamation or question marks, which then show up all over his results (the first word in the positive list is "it!!"). I'm an old perl-n-regex guy, so destructive replaces on data input set off a couple of alarms for me. The model is interesting, but pretty simple. Actually it looks more like he's halfway to a Markov chain discriminated by review level, since he doesn't quite get to a conclusion with the current setup. Fun though. I kinda want to play golf with this example. (While writing this, I see that he continues, so I'll keep reading.)
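
A rough Python sketch of a non-destructive alternative: instead of replacing punctuation in the input, pull out only the word-like runs, so tokens like "it!!" never form. This is not the linked author's code; the `WORD_RE` pattern and the `tokenize` helper are assumptions made up for illustration, and the pattern only handles plain ASCII letters.

```python
import re
from collections import Counter

# Matches runs of letters, optionally with an internal apostrophe ("don't").
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    """Lowercase the text and return only the word-like runs, dropping all punctuation."""
    return WORD_RE.findall(text.lower())


if __name__ == "__main__":
    review = "Loved it!! Couldn't put it down... would you?"
    tokens = tokenize(review)
    print(tokens)
    print(Counter(tokens).most_common(1))
```

The original string is left untouched, so the raw review is still available if the punctuation turns out to matter later.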

Thella: That "Contrasting and Categorizing Emotions" wiki page is SO cool. There's something epistemologically challenging about such a tidy set of hierarchies and operations to define the range of human emotion. That's actually very close to what I was hoping to find, but with a wider scope. Really enjoyed seeing that.

jessamyn: That is a good jumping-off point. I think the difference (due to my idea-belchery-rambling in the OP) is that it looks like they are trying to find the outlines of the categories as they manifest, and I'm looking for pre-set categories with stuff in them, so I can check to see if a word is in one.

Anyway, jumping between those and checking through some public datasets got me over to WordNet and its ilk, which looks like the most practical starting point to screw around. Though, now I'm a little annoyed that apparently none of this stuff has etymology, which seems like it would be a great thing to be able to quickly look up about a word.

Thanks for the responses, and I'd love to hear any more suggestions!
posted by lkc at 10:49 PM on February 24, 2017
