How to extract contextual information from speech?
September 15, 2020 12:49 PM

After watching too much sports talk on cable television, I became convinced the national media is obsessed with only a few storylines. I work in computer vision and AI, so I'm familiar with extracting information from photos. Is there a similar way to extract context with natural language processing, broadly the way I would label an image and feed it into a model? No sports knowledge required, but I'll explain what I mean inside.

I made a joke that if you aren't the Patriots/Tom Brady or Dak Prescott, the national media won't care about you. I manually recorded the segments and the broad topics they were talking about. I don't know much about NLP, and I'm purposely being a bit dumb in my understanding so as not to pigeonhole the answers, but here's what I, as a human, understood two segments to be:

0-18 minutes: The topic was about Bruce Arians. He's the coach of the Bucs. The topic was not really about Bruce Arians but about his ability to coach quarterbacks. The Bucs quarterback is Tom Brady. So this segment was a roundabout way of talking about Tom Brady and you'd be hard pressed to argue otherwise.

18-33 minutes: This was easier, the topic was solely on Dak Prescott and how much he can make. Was it really about the Cowboys? Would the topic exist without the Cowboys and just Dak Prescott? That's harder to answer.

If I were analyzing the closed captions with time stamps, has NLP come far enough to extract that the topic of 0-18 minutes was Tom Brady and the topic of 18-33 minutes was Dak Prescott? The more I think about this, the more difficult I expect it would be to extract this information, but to be honest some aspects of deep learning and computer vision seem sort of like magic even though I can tell you how a ConvNet works.

Knowing what I know about using news reports or Twitter to analyze when people are talking about stocks, I assume there's been a lot of formal research done on this. Is there something I should or can search on? Or better yet is there something well maintained like Facebook's Detectron2 where I can get a general idea of where the research is on this and how to approach and decompose problems like this?

posted by geoff. to Computers & Internet (8 answers total) 4 users marked this as a favorite

I’m not sure about the current relevance because I’m not in the field anymore, but 10-15 years ago this work was being published at ACM Multimedia, and would have been called Media Summarization or something similar. There were contests and canned datasets and people got excited over small improvements in results.

Today, as you hypothesize, machine learning may have eaten that world, but if you look at last year’s MM proceedings, there might be jumping-off points into the literature.
posted by Alterscape at 12:56 PM on September 15, 2020

Topic modelling is one term to search for.

https://towardsdatascience.com/topic-modeling-for-everybody-with-google-colab-2f5cdc99a647
posted by smidgen at 1:17 PM on September 15, 2020 [1 favorite]

You should also look at named entity recognition and entity linking. Named entity recognition is the task of deciding that the string "Bruce Arians" refers to a person. Entity linking is the task of figuring out which human it refers to — and, if you're really on top of your shit, figuring out which other human "his quarterback" refers to.

One way people sometimes do this is by linking strings to the Wikipedia articles for their referents. So your task would be to convert "Bruce Arians was talking to his quarterback" to something like this:

<a href='http://en.wikipedia.org/wiki/Bruce_Arians'>Bruce Arians</a> was talking to <a href='http://en.wikipedia.org/wiki/Tom_Brady'>his quarterback</a>.

That specific version of the task can be called "wikification."
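
To make the target output concrete, here is a minimal Python sketch that produces the linked sentence above. It is not a real entity linker: the `ALIASES` and `QUARTERBACK_OF` tables and the `wikify` function are hypothetical, hard-coded stand-ins for what a trained named entity recognition and entity linking system would have to work out on its own.

```python
import re

# Hypothetical alias table: surface string -> Wikipedia article title.
# A real system would learn or look these up; here they are hard-coded.
ALIASES = {
    "Bruce Arians": "Bruce_Arians",
    "Tom Brady": "Tom_Brady",
    "Dak Prescott": "Dak_Prescott",
}

# Hypothetical relation table, used to resolve "his quarterback"
# once a coach has been mentioned earlier in the sentence.
QUARTERBACK_OF = {"Bruce_Arians": "Tom_Brady"}

WIKI = "http://en.wikipedia.org/wiki/"


def wikify(sentence):
    """Wrap known names (and a resolvable "his quarterback") in Wikipedia links."""
    pattern = re.compile(
        "|".join(re.escape(alias) for alias in ALIASES) + "|his quarterback"
    )
    last_coach = None  # most recent coach seen, scanning left to right

    def link(match):
        nonlocal last_coach
        text = match.group(0)
        if text in ALIASES:
            title = ALIASES[text]
            if title in QUARTERBACK_OF:
                last_coach = title
        else:
            # "his quarterback": only linkable if we already saw a coach.
            title = QUARTERBACK_OF.get(last_coach)
            if title is None:
                return text
        return f"<a href='{WIKI}{title}'>{text}</a>"

    return pattern.sub(link, sentence)


print(wikify("Bruce Arians was talking to his quarterback."))
```

The hard part in practice is everything this sketch hard-codes: knowing that "Bruce Arians" is a person, which person, and who "his quarterback" is.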
posted by nebulawindphone at 1:27 PM on September 15, 2020

Is there something that would indicate that someone in the conversation should be "weighted" higher? Going back to the Bruce Arians conversation, as that's the perfect example: if I didn't know anything about Tom Brady outside of the conversation, or that he was Arians's quarterback, I would not have assumed the conversation was really about Tom Brady. Within the conversation itself, even knowing it was about historic quarterback performance under Bruce Arians, I think anyone would be hard-pressed to tell it was really a way of having a conversation about Tom Brady without making it directly about Tom Brady. Is there a parameter or weight that captures what's popular?

In a similar fashion, if I'm tracking someone in a video and they go behind a bookcase or are occluded in some way, I can estimate that they are behind the bookcase (hidden information), but the longer they stay out of view, the lower my confidence that they are still there. I assume the two ideas are conceptually the same. Is there a way to analytically measure what we call "media buzz"?
posted by geoff. at 1:41 PM on September 15, 2020

But also, look, part of what you're dealing with here is that the concept of "topic" is really, really slippery. When you ask people to annotate the topics of pieces of text, you get lousy interannotator agreement. When theoretical linguists talk about topichood (which is sorta kinda what my Ph.D. research is in, so I'm calling foul on my own side here), we have trouble even making our definitions precise enough to argue about, much less agreeing on which definition is right. It's just a big, big mess.

It sounds like you might specifically be interested in the people the news is talking about, rather than abstract topics like "skill" or "wealth" or "training," or concrete-but-not-human topics like "Tampa" or "the Cowboys." If that's your goal, I'd honestly leave aside the notion of topic entirely, and just work on detecting references to people with enough accuracy that you can recognize broad trends. Barring a weird situation like "Juan Pérez is mentioned in every single news story, but he isn't the main subject of any of them," you can probably get a decent handle on how tight the media's focus is just by looking at who's getting mentioned at all.
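
As a rough sketch of that "who's getting mentioned at all" approach, here is some Python that counts how many segments mention each person and how concentrated those mentions are. The `PEOPLE` watch list and the function names are hypothetical, and the plain substring match is a stand-in for a real named entity recognizer, which would also have to catch "Brady," "the quarterback," and so on.

```python
from collections import Counter

# Hypothetical watch list; a real pipeline would get names from an NER model.
PEOPLE = ["Tom Brady", "Bruce Arians", "Dak Prescott"]


def mention_counts(segments, people=PEOPLE):
    """Count, for each person, how many segments mention them at least once."""
    counts = Counter()
    for text in segments:
        lowered = text.lower()
        for person in people:
            if person.lower() in lowered:
                counts[person] += 1
    return counts


def top_share(counts, k=2):
    """Fraction of all segment-level mentions that go to the k most-mentioned people."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


segments = [
    "Bruce Arians knows how to coach Tom Brady.",
    "Can Tom Brady still throw deep?",
    "Dak Prescott wants a new contract.",
]
counts = mention_counts(segments)
print(counts, top_share(counts, k=1))
```

A `top_share` close to 1 for a small `k` would suggest the coverage really is concentrated on a handful of people.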
posted by nebulawindphone at 1:41 PM on September 15, 2020

"we have trouble even making our definitions precise enough to argue about, much less agreeing on which definition is right. It's just a big, big mess."

This is probably why my language is a mess when I try to describe what I want, as in my mind I'm more or less trying to define a formal input and output. Defining top topics ("Tom Brady, Bruce Arians, Buccaneers") will be a lot easier and in theory should still allow me to generate a report about the minutes people spent talking about a given topic. I haven't even gotten into topic segmentation. Ostensibly the first segment had a beginning and an end, but like all conversations it wove in and out. I don't even know how to define where the topic really begins or ends (right now they're arguing about whether the attractive reporter wears glasses or not, and it is ostensibly a broader segment about the NFC).
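
A minimal Python sketch of that minutes-per-topic report, assuming the captions arrive as `(start_seconds, end_seconds, text)` tuples. That input format, the `TOPICS` list, and the `minutes_per_topic` function are all assumptions for illustration, and the substring match sidesteps the segmentation problem entirely by crediting each caption only to the labels it literally contains.

```python
from collections import defaultdict

# Fixed label set taken from the example above; matching is plain substring search.
TOPICS = ["Tom Brady", "Bruce Arians", "Buccaneers"]


def minutes_per_topic(captions, topics=TOPICS):
    """Sum the duration of every caption that mentions each topic, in minutes.

    captions: iterable of (start_seconds, end_seconds, text) tuples.
    A caption mentioning two topics counts toward both.
    """
    seconds = defaultdict(float)
    for start, end, text in captions:
        lowered = text.lower()
        for topic in topics:
            if topic.lower() in lowered:
                seconds[topic] += end - start
    return {topic: seconds[topic] / 60 for topic in topics}


captions = [
    (0, 60, "Bruce Arians has coached some great quarterbacks."),
    (60, 180, "Tom Brady joins the Buccaneers this season."),
    (180, 240, "Does she wear glasses or not?"),
]
print(minutes_per_topic(captions))
```

Stretches like the glasses argument simply count toward nothing here, which undercounts a segment that is implicitly still "about" someone.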

It is sort of like asking whether to classify a person in a cucumber costume as a cucumber or a costume. It isn't a cucumber, but then what makes something a cucumber? That it is alive? That would mean classifying recently picked cucumbers as not cucumbers. At a certain point our classification language itself influences how we classify things, if that makes sense.
posted by geoff. at 2:10 PM on September 15, 2020

I agree with smidgen that it sounds like topic modeling may be what you want, or at least part of what you want. The basic idea is that you have a bunch of observed "documents" consisting of words, and each document is governed by one or more unobserved "topics". Each topic influences the frequency of certain words being found in a document, and by looking at the statistics of word frequency across your documents, you infer both the nature of the topics and the topics associated with each document. In your case if you're interested in seeing how the topic changes over the course of a discussion, you'd probably divide the discussion into subsections, which would become your "documents."
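
For a feel of how that inference works, here is a toy Python sketch of one common topic model, latent Dirichlet allocation (LDA), fitted with collapsed Gibbs sampling. LDA is my choice of example, not something named above, and the function name, hyperparameters (`alpha`, `beta`, `n_iter`), and sample documents are all illustrative assumptions. For real data, a maintained library would be a better bet than this.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * n_topics for _ in docs]         # tokens of doc d assigned to topic k
    topic_word = [Counter() for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                       # tokens assigned to topic k overall
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for word in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Resample: favor topics common in this document
                # that also tend to produce this word.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


# Each "document" stands in for one subsection of the discussion.
docs = [
    "brady arians quarterback bucs brady".split(),
    "prescott cowboys contract money prescott".split(),
    "arians brady bucs quarterback".split(),
    "prescott contract cowboys".split(),
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
for k, words in enumerate(topic_word):
    print(k, [w for w, n in words.most_common(3) if n > 0])
```

Note that the model only returns numbered topics with their characteristic words. Deciding that topic 0 "is" Tom Brady is still a human judgment.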
posted by biogeo at 5:33 PM on September 15, 2020

Here’s a paper about this - maybe you could reach out to the authors, or see what has cited it in the last decade.
posted by oceanjesse at 6:28 PM on September 15, 2020


Ask MetaFilter
