AlgorithmFilter: Why not use Bayes to deal with online harassment?
February 9, 2015 6:41 AM
Isn't Bayesian filtering the already-existing solution to online harassment on a medium like Twitter?
I was reading this piece in The Verge today and it struck me that, from a technical standpoint, online harassment could be dealt with the same as spam email: with Bayesian filtering techniques like those implemented in SpamAssassin. We solved spam! It hasn't been a problem for me for at least 10 years. Why can't we make online harassment disappear the same way? (I do realize that this would cure the symptom, not the cause, but sometimes that's the best we can hope for.)
Am I missing something about the nature of online harassment in non-email environments that makes the application of Bayesian techniques less feasible? My question in short: is this a technical issue or is it just a question of will?
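For reference, here is a minimal sketch of the kind of word-level Bayesian filtering (in the SpamAssassin tradition) the question has in mind. The training phrases, the 50/50 prior, and the Laplace smoothing are illustrative assumptions, not a real filter's rule set.

```python
# A toy naive Bayes "harassment vs. not" filter: estimate P(abusive | words)
# from a handful of labelled examples. Everything here is illustrative.
import math
from collections import Counter

def train(labelled):
    """labelled: iterable of (text, is_abusive) pairs."""
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, label in labelled:
        for word in text.lower().split():
            counts[label][word] += 1
            totals[label] += 1
    return counts, totals

def p_abusive(text, counts, totals, prior=0.5):
    """Naive Bayes with Laplace smoothing; returns P(abusive | text)."""
    vocab = len(set(counts[True]) | set(counts[False])) or 1
    log_odds = math.log(prior / (1 - prior))
    for word in text.lower().split():
        p_word_given_abusive = (counts[True][word] + 1) / (totals[True] + vocab)
        p_word_given_ok = (counts[False][word] + 1) / (totals[False] + vocab)
        log_odds += math.log(p_word_given_abusive / p_word_given_ok)
    return 1 / (1 + math.exp(-log_odds))

counts, totals = train([
    ("kill yourself you idiot", True),
    ("you are a worthless idiot", True),
    ("great article, thanks for sharing", False),
    ("i disagree with your argument", False),
])
print(p_abusive("you are an idiot", counts, totals))              # high
print(p_abusive("thanks for the great article", counts, totals))  # low
```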
Because no harassment that I've ever seen could be effectively filtered without a LOT of collateral damage. Spam, yes; individual actual people harassing others, well, that is much harder.
posted by barnone at 6:48 AM on February 9, 2015 [4 favorites]
Probably because harassment (in some forms anyway) objectively looks a lot like good-faith arguing. Spam mostly looks like spam.
More than that, though, it's a policy/philosophy thing: these platforms give people the benefit of the doubt about not being harassers and wait for other users to make that call. I can barely imagine the blowback against a company whose "abuse filter" was set to "mute first, ask questions later".
posted by supercres at 6:49 AM on February 9, 2015 [2 favorites]
It's a lot easier to spot bots/spam than bad human interactions. Gmail's spam filter is an enormous achievement, but they got to work on a huge body of known junk that had a lot of easily parsed properties, and they are ineffective against individually tailored phishing/scam emails, which are more similar to tweets than mass spam.
Spam:
- Has to put in a link
- Gets sent from weird places
- Gets sent to a ton of people, so there's a lot of feedback for each piece. Gmail can learn about edge case spam based on the behavior of the first few people to see it.
Tweets:
- Are written by people
- Don't necessarily use forbidden/warning words; they're often just disliked for what the words imply.
- Do something twitter users are supposed to do (@mentions, retweets)
- Are sent to very few people, so there isn't much feedback for any particular one.
posted by michaelh at 6:51 AM on February 9, 2015 [8 favorites]
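To make that contrast concrete, here is a hypothetical feature extractor; every field name below is invented for illustration, but it shows how much richer the per-message signal is for bulk email than for a single tweet.

```python
# Illustrative only: the kinds of signals listed above, expressed as
# feature dicts a classifier could consume. All field names are invented.
import re

KNOWN_BAD_IPS = {"203.0.113.7"}  # placeholder blocklist

def email_features(msg):
    return {
        "has_link": bool(re.search(r"https?://", msg["body"])),
        "sender_ip_on_blocklist": msg["sender_ip"] in KNOWN_BAD_IPS,
        "copies_sent": msg["recipient_count"],      # mass mailing: lots of feedback
        "flags_from_earlier_recipients": msg["prior_flags"],
    }

def tweet_features(tweet):
    return {
        "mentions_someone": "@" in tweet["text"],   # normal, expected behaviour
        "text": tweet["text"],                      # often the only real signal
        "copies_sent": 1,                           # one-off, so little feedback
    }

print(tweet_features({"text": "@someone you should quit"}))
```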
Forgot to mention: the URLs in spam provide a ton of information; Google is able to use what its crawler knows about website titles, URLs, domain registrars, etc. to determine whether the email is pointing to something worthwhile. In a sense, the content of the spam email is just the tip of the iceberg of data to analyze. Tweets have almost no content in them.
posted by michaelh at 6:53 AM on February 9, 2015 [1 favorite]
I think something like this could work. Let's take a look at some micro-cases. Look at the thing Anita Sarkeesian posted a little while ago about her week in harassment, or the roundup Deadspin did of racist tweets directed at Joel Ward after he knocked the Bruins out of the playoffs. Just looking at those tweets, you can see a lot of patterns: the particular slurs that are used, the hashtags and mentions they're used with. I'm willing to bet a lot of the users follow similar accounts. (In fact, I think there's an anti-GG blacklist you can sign up for that autoblocks people based on certain followees that are highly correlated with abuse.) Twitter has a database of hundreds of billions of tweets to build the algorithms from. I don't think the defenses of insufficient data or impossible-to-discern patterns hold much water.
posted by protocoach at 7:17 AM on February 9, 2015 [3 favorites]
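A toy version of that followee-correlation autoblock might look like the sketch below. The threshold and the "flagged accounts" set are assumptions for illustration, not a description of how the actual blocklist tool works.

```python
def should_preemptively_block(candidate_followees, flagged_accounts, threshold=2):
    """Block a candidate account if it follows several accounts that are
    strongly correlated with past abuse."""
    return len(candidate_followees & flagged_accounts) >= threshold

flagged = {"ringleader_a", "ringleader_b", "ringleader_c"}  # invented names
print(should_preemptively_block({"ringleader_a", "ringleader_b", "cats"}, flagged))  # True
print(should_preemptively_block({"cats", "nasa"}, flagged))                          # False
```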
I think michaelh has the right idea. You might be able to detect large, mob-style attacks, but I think your fundamental analogy is off. Even if you accept that spam filtering is a "solved" problem, Bayesian statistics work largely because spam is mostly generated en masse by programs that leave detectable patterns. Those filters haven't done anything to reduce harassment via email because it's real email coming from real people.
posted by mkultra at 7:39 AM on February 9, 2015 [2 favorites]
I still get spam in my personal email, even with all the magic that Gmail does behind the scenes. It's hardly a "solved" problem, just well controlled.
Also, I think "harassment" is a fairly subjective experience. I've gotten into arguments with people online (yes, even on MeFi) that involved ad-hominems and similar aggressive personal attacks which would, for some people, imply harassment, but I rarely am bothered by these experiences, so I usually don't even register them as a misuse of the platform. I think I've hit the "flag" button all of three times on MeFi. Others may have a lower/higher tolerance for such experiences. So I think it would be hard to quantify harassment compared to spam, which has more obvious objective measures – e.g., links/URLs in email, sent from an address not in the user's mailbox, mentions of Nigerian royalty, etc.
posted by deathpanels at 8:06 AM on February 9, 2015 [1 favorite]
Also, I think "harassment" is a fairly subjective experience. I've gotten into arguments with people online (yes, even on MeFi) that involved ad-hominems and similar aggressive personal attacks which would, for some people, imply harassment, but I rarely am bothered by these experiences, so I usually don't even register them as a misuse of the platform. I think I've hit the "flag" button all of three times on MeFi. Others may have a lower/higher tolerance for such experiences. So I think it would be hard to quantify harassment compared to spam, which has more obvious objective measures – e.g., links/URLs in email, sent from an address not in the user's mailbox, mentions of Nigerian royalty, etc.
posted by deathpanels at 8:06 AM on February 9, 2015 [1 favorite]
Have a look at the forum at this Kaggle machine-learning competition, "Detecting insults in social commentary," for a good intro to some of the technical challenges here. This blog entry by one of the top finishers in the contest also has a lot of good insight into what works and what doesn't. Bayesian inference isn't magic; it's a constellation of techniques and strategies for problems that are often really hard.
posted by escabeche at 8:19 AM on February 9, 2015 [6 favorites]
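For flavour, the workhorse in contests like that is usually a bag-of-words model over the comment text. The scikit-learn sketch below is a generic baseline of that kind, with a made-up training set; it is not a reproduction of any entrant's actual solution.

```python
# Generic insult-detection baseline: TF-IDF n-grams feeding logistic regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "you are a pathetic idiot",               # insulting
    "what a thoughtful comment, thanks",      # not insulting
    "shut up, nobody cares what you think",   # insulting
    "I see your point but disagree",          # not insulting
]
train_labels = [1, 0, 1, 0]

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(),
)
model.fit(train_texts, train_labels)
print(model.predict_proba(["you pathetic fool"])[0][1])  # estimated P(insult)
```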
Escabeche, if I'm reading those links correctly, it seems like given a dataset of 7,000 comments that were each around a paragraph in length, the victor of the competition was able to correctly flag 90% of the insulting comments. I'm not a statistician, but it seems like, given the volume of data Twitter has collected over the last several years and the ability to throw far more people, money, and time at the problem, they could probably come up with solutions that were at least as effective as what small, informal teams and individuals managed to put together over a weekend more than two years ago.
Even if Twitter's system was only half as effective (say it only autoflagged things where there's zero doubt that it is harassment, like tweets directed at someone else that say "You are/you're/your a [fill in one of a tightly controlled list of awful slurs]", or "I will rape you", or "Kill yourself"), wouldn't that be a pretty huge improvement over how it is now? I suspect someone like Sarkeesian would appreciate a 50% decrease in the flood of abuse she deals with every day. Combining that with Twitter's current (woefully inadequate) block/mute/report options and maybe investing some actual money in moderating would put a huge dent in Twitter's reputation as a churning abuse machine.
posted by protocoach at 8:56 AM on February 9, 2015 [1 favorite]
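That conservative, flag-only-the-unambiguous approach could be as simple as a pattern list over directed tweets. The patterns below are placeholders (actual slurs omitted), and a real deployment would still need humans in the loop for quoting, sarcasm, and reclaimed language.

```python
import re

# Placeholder patterns for "zero doubt" phrases; real slurs intentionally omitted.
UNAMBIGUOUS = [
    r"\bi will rape you\b",
    r"\bkill yourself\b",
    r"\byou(?:'re| are|r) a (?:SLUR_ONE|SLUR_TWO)\b",
]
PATTERNS = [re.compile(p, re.IGNORECASE) for p in UNAMBIGUOUS]

def autoflag(tweet_text, is_directed_at_someone):
    """Flag only tweets aimed at another user that match an unambiguous pattern."""
    if not is_directed_at_someone:
        return False
    return any(p.search(tweet_text) for p in PATTERNS)

print(autoflag("kill yourself", True))          # True
print(autoflag("great game last night", True))  # False
```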
I imagine Twitter does use Bayesian filters to detect and combat some forms of abuse (automatically created accounts, for example, or messages sent from accounts with compromised credentials). And I agree that they should be more aggressive in providing their users with tools to filter messages with particular content (like those mentioned by protocoach above), or messages from users with particular characteristics (e.g., Block Together or something like it should probably be a core Twitter feature).
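(A minimal sketch of that kind of filtering by account characteristics; the age and follower thresholds below are placeholders, not Block Together's actual rules.)

```python
# Hide mentions from accounts that look like fresh throwaways.
# Thresholds are illustrative assumptions only.
from datetime import datetime, timezone

def hide_mention(sender, now=None, min_account_age_days=7, min_followers=15):
    now = now or datetime.now(timezone.utc)
    age_days = (now - sender["created_at"]).days
    return age_days < min_account_age_days or sender["followers"] < min_followers

throwaway = {"created_at": datetime(2015, 2, 8, tzinfo=timezone.utc), "followers": 2}
print(hide_mention(throwaway, now=datetime(2015, 2, 9, tzinfo=timezone.utc)))  # True
```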
(Then again, Twitter's business model is, essentially, to spam you with unsolicited commercial messages—i.e., advertisements in the form of promoted tweets—so I have to believe that any tools they provide with this functionality won't ever be truly robust, since they always want certain messages to go through regardless of the desires of their users.)
I also agree with deathpanels' point above, that even conventionally automated spam actually does get through spam filters on occasion. Then there's the matter of spam that comes in different forms, like e-mail from businesses with sneaky opt-in rules, that aren't generally subject to spam filters in the same way, but are nonetheless sent to users without their consent. All of which is to say: Spam filters aren't perfect, and just as much engineering (social and technical) is going into breaking them as there is going into making them more effective.
The fundamental issue, however, is that harassment does not inhere in the text, but in the intention behind it. And you can't use a Bayesian filter (or any other tool that only looks at the surface of a text and not its meaning) to distinguish between messages with different intentions. As a consequence, it will always be possible for an individual to fashion some kind of message that "passes" an automated filter as "not harassment" but reaches (and harasses) its recipient nonetheless. (And I have to imagine that many serial harassers online are motivated and technically competent enough to get around any such filter, the same way that people develop exploits for operating systems and encryption software.)
Moreover, online harassment doesn't necessarily take the form of tweets or comments or blog posts or even anything textual at all. Any action with a visible consequence can be harassment if it's intended as such: a check-in on a location-based social network, a registration on a forum with a particular username, an anonymously uploaded image, a follow request on Instagram. Even just a ping from a particular IP address can be harassment under certain circumstances. Someone who is intent on harassment will use any modalities that present themselves, not just those that can be easily filtered with text classifiers.
posted by aparrish at 8:57 AM on February 9, 2015 [6 favorites]