A Gender-balanced Wikipedia
September 7, 2022 4:02 PM

How many Level 4 Vital Articles would a gender-balanced Wikipedia have?

Sometimes I play Redactle.

"Redactle is a daily browser game where the user tries to determine the subject of a random obfuscated Wikipedia article, chosen from Wikipedia's 10,000 Vital Articles (Level 4)."

When I play Redactle, he/him/his is just about always more likely to yield higher numbers than she/her/hers or they/them/theirs.

What I'd like to know is, using some variation on my Redactle metric (how many instances of he vs she, for example) and the existing 10,003 articles, how many Wikipedia articles might a collection of Level 4 articles that more accurately reflects gender balance contain?
posted by aniola to Grab Bag (7 answers total) 5 users marked this as a favorite

Good question! I think it could use a little clarification of what you want.

So if Wikipedia had 100 Level 4 Vital Articles and 70 were about men, we'd need at least 140 articles to reach gender parity, which would mean adding 40 articles about women. This assumes we don't delete or demote any of the articles about men (which may be warranted nonetheless!) and that we are talking about two genders only (wildly wrong assumption!).
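A minimal sketch of that arithmetic, under the same assumptions (every article is about either a man or a woman, and nothing gets deleted). The function name is made up for illustration.

```python
def articles_to_add_for_parity(total: int, about_men: int) -> tuple[int, int]:
    """Return (articles_to_add, new_total) needed for a 50/50 split.

    Assumes every article is about either a man or a woman, and that
    no existing article is removed (both assumptions from the comment).
    """
    if not 0 <= about_men <= total:
        raise ValueError("about_men must be between 0 and total")
    about_women = total - about_men
    # Only add articles about whichever group is currently smaller.
    to_add = abs(about_men - about_women)
    return to_add, total + to_add


added, new_total = articles_to_add_for_parity(total=100, about_men=70)
print(added, new_total)  # 40 140
```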

Is that the kind of answer you're looking for, which just depends on the current number of 'male' articles? Adding 'they' as a gender is easy enough to count in principle, but then you have to take a vague guess at what equity would look like for them, etc.

Are you active on Wikipedia? They have groups dedicated to, e.g., discussing gender representation there. For starters, they have a nice article titled Gender Bias on Wikipedia.
posted by SaltySalticid at 4:18 PM on September 7 [1 favorite]

Response by poster: What I came up with in my head was just "how many male articles would you have to subtract" but I left the metrics vague on purpose because I recognize there's lots of potential and probably better ways to do this.
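A rough sketch of the "subtract male articles" version, reusing the hypothetical 100-article example from the comment above. It assumes a two-gender split and that only articles about men get removed; the function name is invented for illustration.

```python
def male_articles_to_remove(total: int, about_men: int) -> tuple[int, int]:
    """Return (articles_to_remove, remaining_total) for a 50/50 split.

    Assumes every article is about either a man or a woman and that
    only articles about men are removed.
    """
    if not 0 <= about_men <= total:
        raise ValueError("about_men must be between 0 and total")
    about_women = total - about_men
    # If men are not the majority, there is nothing to remove.
    to_remove = max(about_men - about_women, 0)
    return to_remove, total - to_remove


removed, remaining = male_articles_to_remove(total=100, about_men=70)
print(removed, remaining)  # 40 60
```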

I am not active on Wikipedia. If they make it welcoming someday, maybe I'll consider it. Thanks for that article! It's interesting; I am reading it.
posted by aniola at 4:39 PM on September 7

According to the link, there are 10,000 Vital Articles, but the target for articles about People is 2,000, so that's a first step -- 80% of the articles aren't actually about people per se. Notwithstanding ~100 articles about fictional characters, mythological figures, and the like, who are often gendered. As well as articles about things that were created/discovered by people who were themselves gendered -- Hamlet or Jane Eyre, for example, although this becomes fuzzy -- is Carmen male (like the composer of the opera) or female (like the titular figure)? But setting that aside, how many of the people articles should be about men? (Acknowledging the problems with the gender binary, etc.)

On one hand, the people who have been selected for these have been selected to some degree because of their notability. The Politicians and Leaders category (and I note there are other categories with some potential overlap, like Activists and Philosophers) includes 21 articles about US figures in the Modern era. 15 of those are US Presidents. It's not possible to have gender balance within the category of US Presidents, since they have so far (*sigh*) all been men. And I think most people would find it reasonable that many of the most prominent figures in US politics have been Presidents, since that is the single office with the highest power. So some degree of the gender bias in selection is factual reporting about the prominence of people (in a history that has very rarely been unbiased by gender). Roughly 90% of the people who have travelled to space are men; 2/3 of those with Vital Articles are.

But then there's a second dimension; under sports, the articles about tennis are pretty evenly gender balanced, but only 1 of the 14 soccer players is a woman. Some of this is (as with above) potentially due to the history of the two sports; women's tennis competitions began around the same time as men's, and the professional circuits started around the same time, so there are prominent women throughout tennis history, while the men's soccer World Cup began in the 1930s and the women's in the 1990s, and the professional games started even earlier for men and later for women. So there are plausibly proportionally more notable women in tennis than in soccer. But the choice of 4 articles about tennis players and 14 about soccer players and 9 about astronauts is, to some degree, arbitrary. Who is to say which is more important, especially when comparing between wildly different areas of impact, places and times?

It's so arbitrary that I actually lied above about two of the categories; there are actually 8 tennis players (5 women) and three astronauts (one woman) along with 14 soccer players -- did it seem disproportionate under the previous, incorrect numbers? Does it seem disproportionate now? I don't know.

So in evaluating the gender bias, there are I think three dimensions to consider:
1a). Actual historical bias in spheres of activity.
1b). Bias in the historical record in spheres of activity -- how much of the knowledge of, say, female military leaders in the Middle Ages is due to the lack of them in the real world, and how much is due to a lack of information about the ones that did exist?
2). Selection bias within a sphere of activity, biasing the selection to favour one gender over another relative to their historical prominence.
3). Selection bias between different spheres of activity, which is to some degree totally arbitrary. Should there be more animators & puppeteers, Shia Islamic figures or boxers? There happen to be three of each, but *shrug emoji*.

1 (or at least 1a) is to some degree reflecting something underlying that is itself biased, so presumably that would remain in a 'gender balanced' version. 3 is so tricky and arbitrary, I don't even know how you would benchmark it -- 50 people could try to assign 2000 articles to these categories in good faith and I guarantee you would get 50 unique results. (I'd love to do a poll and see how people allocate them.) It's really only dimension 2 that has the potential to increase or reduce gender bias in itself.
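One hedged way to put a number on dimension 2 is to compare the share of women among the selected articles with their share in the underlying field. The sketch below uses the astronaut figures from this comment (1 woman among 3 Vital Articles, against roughly 10% of space travellers, taking "roughly 90% men" at face value and assuming a binary split). The function name and the choice of a simple ratio are assumptions for illustration, not an established metric.

```python
from fractions import Fraction


def representation_ratio(selected_women: int, selected_total: int,
                         baseline_share_women: Fraction) -> Fraction:
    """Share of women among selected articles divided by their baseline share.

    A value above 1 means women are selected more often than the baseline
    would suggest; below 1 means less often. The baseline itself may be
    biased (dimensions 1a and 1b), which this ratio does not address.
    """
    if selected_total <= 0 or baseline_share_women <= 0:
        raise ValueError("totals and baseline share must be positive")
    selected_share = Fraction(selected_women, selected_total)
    return selected_share / baseline_share_women


# Astronauts: 1 of 3 Vital Articles vs. a rough 10% baseline (assumed).
ratio = representation_ratio(1, 3, Fraction(1, 10))
print(f"{float(ratio):.2f}")  # 3.33
```

With only three articles in the category, a ratio like this is extremely noisy, so it is better read as an illustration than as evidence.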
posted by Superilla at 5:27 PM on September 7 [5 favorites]

Wikipedia has some standards of notability, which are largely based on the article topics having other media about them already. Like it or not, many more men have been written about. Many more men have had the opportunity to accomplish noteworthy things than women have had. Over a long time, that ratio will change, which should lead to changes in Wikipedia.
posted by NotLost at 5:31 PM on September 7

I've been wondering about this too (also a Redactle fan). I'm not quite sure how to interpret your specific question though, as the bias would not be changed by adding more articles to Level 4 unless the new articles had a greater proportion of women as subjects (I may be misreading your question). I guess there is the known issue about lack of representation of women in articles in Wikipedia - last time I saw the stats 16% of articles about people were about women - but also a likely bias in the % of Vital Articles that are about women. There is some info here about the process of categorising an article as Level 4. I can't immediately see a figure for the percentage of Level 4 articles about people that are about women, but that must be findable, through categories if nothing else. I'll look later when I have more time.

One of the things doing Redactle has made me notice is the less countable gender bias, as opposed to the straightforward metric of the % of articles about women. So even in articles about topics you'd think would be reasonably gender-neutral, there are often more 'he's than 'she's. And articles about men often say more about their fathers than their mothers. Of course this isn't surprising, but it does mean that the work of the Women in Red project to reach gender parity will at some point need to look beyond the most obvious metric. I think articles on women tend to be shorter too. In my own Wikipedia-ing, of the 20 or so articles I've created, I found it much easier to find source material for the one article that is about a man, so that article is longer than almost all the ones I've written about women. I also had difficulty finding the name of his wife, despite him saying he couldn't have achieved what he did without her.
posted by paduasoy at 12:51 AM on September 8 [3 favorites]

To add to the above - the article on Radio, which is a Level 4 article, has three mentions of He, of which one refers to an inventor, but two are fairly egregious sexist language where car drivers are assumed to be male: "the owner can open the door when he drives up in his car, and close it after he leaves". There is no citation for this text so it's not a case of Wikipedia quoting a text which is itself sexist. The article has no uses of She. Obviously any editor can change this.
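For anyone who wants to try the Redactle-style count on pasted article text, here is a minimal sketch. The pronoun groups follow the ones named in the question, plus "their", which is added here as an assumption so the possessive forms line up. Note that it cannot tell singular "they" from plural "they", or "her" as object from "her" as possessive.

```python
import re
from collections import Counter

# Pronoun groups from the question; "their" is an added assumption.
PRONOUN_GROUPS = {
    "he": {"he", "him", "his"},
    "she": {"she", "her", "hers"},
    "they": {"they", "them", "their", "theirs"},
}


def count_pronouns(text: str) -> Counter:
    """Count whole-word pronoun occurrences per group, case-insensitively."""
    counts = Counter({group: 0 for group in PRONOUN_GROUPS})
    # Split into lowercase alphabetic words so "the" never matches "he".
    for word in re.findall(r"[a-z]+", text.lower()):
        for group, forms in PRONOUN_GROUPS.items():
            if word in forms:
                counts[group] += 1
    return counts


# The sentence quoted above from the Radio article.
sample = ("the owner can open the door when he drives up in his car, "
          "and close it after he leaves")
counts = count_pronouns(sample)
print(counts["he"], counts["she"], counts["they"])  # 3 0 0
```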
posted by paduasoy at 12:58 AM on September 8 [2 favorites]

There are 1,991 or 1,990 Level 4 articles in the category People (source 1, source 2). I can't easily see a way to separate this group by gender, but looking at the lists is interesting. Picking things more or less at random, there are 8 Vital Articles at this level on photographers, and all 8 are male. Similarly sculptors, 8 of 8. There are 28 articles on businesspeople, of whom 1 is female, 1 is a family and the rest are male.
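A small tally of just the three categories mentioned above, as a sketch of how a hand count could be turned into shares. The counts are the ones from this comment (26 male businesspeople follows from 28 minus 1 woman and 1 family); the data layout is an arbitrary choice for illustration.

```python
# (male, female, other) counts as given in the comment above.
categories = {
    "photographers": (8, 0, 0),
    "sculptors": (8, 0, 0),
    "businesspeople": (26, 1, 1),  # "other" is the one family article
}


def female_share(male: int, female: int, other: int) -> float:
    """Fraction of a category's articles that are about women."""
    return female / (male + female + other)


for name, (male, female, other) in categories.items():
    print(f"{name}: {female_share(male, female, other):.1%}")

# Pool the three categories and compute the overall share.
totals = [sum(column) for column in zip(*categories.values())]
overall = female_share(*totals)
print(f"overall: {overall:.1%}")  # overall: 2.3%
```

These three categories were picked more or less at random, so the pooled figure says nothing reliable about the People list as a whole.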
posted by paduasoy at 4:29 AM on September 8
