Help the average Jane with a statistics question?
March 24, 2008 1:25 PM

An acquaintance who's a university professor on the hiring committee for his department cried 'sexism!' when 50% of the candidates who were asked back for interviews were women, when only 18% of the initial applicants for the position were women. I rebutted. Who's right?

There were 200 total applicants for this humanities department position. 18% were women. Twenty applicants were invited back for more interviews and half (10) were women. He said that since 50% of those who made it to the next level of the application were women, there must be something wrong with the criteria the department was using to narrow the field—statistically the number should reflect the number of total women applicants.

I made three points. One, the criteria for advancement in the hiring process are qualitative, and the only situation in which the percentages could be counted on to match would be if gender were a heavily weighted criterion. The goal of the hiring committee is to narrow down the best candidates for the position, not to maintain the gender ratio of applicants.

Two, the number of women advancing to the next level was 5% of the entire pool of applicants, a number far below the 18% he held out as a statistically acceptable benchmark.

Third, given a world population—or hell even a smaller field of, say, North American doctorates—of approximately 50% women, if he were going to cry sexism, it should have been when he saw that only 18% of the applicants were women.

The problem: I know nothing about statistics. Do my responses hold water and if not, can you give me some that would?
posted by cocoagirl to Grab Bag (36 answers total) 3 users marked this as a favorite

Two, the number of women advancing to the next level was 5% of the entire pool of applicants, a number far below the 18% he held out as a statistically acceptable benchmark.

This isn't right. If you're going to divide them up by gender, then there were 36 women and 164 men. 10/36 women advanced = roughly 28%. 10/164 men advanced = roughly 6%. However, to then extrapolate that "the bar was higher for men" isn't a fair conclusion and there is certainly absolutely no data to support that argument. It is just as reasonable to assume the findings are explained because a lot of female applicants decided not to apply for the job given your institution's known reputation for sexism - i.e., not reasonable at all, because there is no data to support that argument either.
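For anyone who wants to check the per-gender arithmetic, here is a minimal sketch. It assumes 18% of 200 means exactly 36 women; the variable names are just for illustration.

```python
# Callback rates by gender, using the numbers given in the question.
applicants = 200
women = round(0.18 * applicants)   # 36 women, assuming 18% is exact
men = applicants - women           # 164 men
callbacks_women = 10
callbacks_men = 10

# Share of each group that was invited back.
rate_women = callbacks_women / women
rate_men = callbacks_men / men
print(f"women: {rate_women:.1%}, men: {rate_men:.1%}")
```

The exact figures work out to about 27.8% for women and 6.1% for men, which says nothing by itself about why the rates differ.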

Your third point sums up the entire argument in a nutshell, of course. There are advantages to having women professors in the future; one of these advantages is that they may serve as more effective mentors and role models to women students.
posted by ikkyu2 at 1:31 PM on March 24, 2008

It sounds like you two have different goals. He wants to see a process where men and women have equal odds of success. You want to see one where men and women wind up equally represented. This is just one instance of a huge ethical debate (the catchphrases for the two sides are "equality of opportunity" and "equality of results" respectively) and frankly, I'm not sure it's going to be resolved any time soon.

And the thing is, once you agree on a goal, statistics can tell you how best to achieve it; but they can't tell you what your goal ought to be. So worrying about statistics here is missing the point. What you need to give your friend is an ethical argument that equal results are what's important.
posted by nebulawindphone at 1:38 PM on March 24, 2008 [4 favorites]

Best answer: Without being able to show some quantitative proof of the worthiness of all the applicants (and thus of the folks who got invited back), you're stuck with a sort of unsettlable question in either direction. What he's arguing implies that the field of candidates had qualifications that were distributed evenly across gender; your position to make him wrong would have to be that the distribution of qualification was heavily skewed toward the female applicants. Either of these could be true, and without a way to measure qualifications you're both stuck with your respective presumptions.

You point out that many of the criteria for advancement are qualitative; if some aspects of qualification are quantifiable, you might consider laying those out and seeing how they correspond to the invitations back. If there are any disqualifying notes on folks who weren't invited back, you might examine those. In the end, 20 out of 200 is not a huge sample, so it may be hard to make a compelling argument in one direction or the other even with a solid statistical analysis of whatever quantifiables you could turn up.

As for this:

Two, the number of women advancing to the next level was 5% of the entire pool of applicants, a number far below the 18% he held out as a statistically acceptable benchmark.

This argument will get you nowhere; those are two unrelated percentages, and you can't meaningfully argue that the 5% is "less" than the 18%. Drop it like a rock; it will only hurt your case.

Third, given a world population—or hell even a smaller field of, say, North American doctorates—of approximately 50% women, if he were going to cry sexism, it should have been when he saw that only 18% of the applicants were women.

The question isn't whether there was systemic sexism at work in preventing women from applying for the program (though that might be its own independent question to ask), but what happened with the applicant pool in question.

That fewer women applied and yet women pulled an equal share of the callbacks could suggest a number of things, one possibility being that women candidates in this case were more cautious about applying for a position they weren't likely to win, and so many women self-selected themselves out prior to application where men went ahead and applied anyway. It could be that, it could be any number of other things. What's the national rate for male vs. female doctorates in this particular field, etc?
posted by cortex at 1:41 PM on March 24, 2008 [1 favorite]

nebulawindphone is accurate and concise - to the extent that there's a good answer here, that's it.
posted by Wolfdog at 1:42 PM on March 24, 2008

Statistically he is correct. Assuming that qualitative attributes distribute themselves equally across the sexes then the numbers called back should reflect the initial applicant pool.

That being said you could have a severely skewed pool of poor male applicants.

Honestly, your numbers are out of whack enough that his argument should hold some weight, though it's difficult to tell without any other data.
posted by bitdamaged at 1:43 PM on March 24, 2008

Aside from the statistical missteps noted above, you just can't prove sexism (or gender discrimination, two distinct things) with statistics. It's impossible to do. (and proving facts is not really the aim of the field of statistics, anyway)

What if the women who applied were all, every single one, better than even the best man who applied? Likely? No, but possible... and if that's the case, there is some sex discrimination going on... namely that under-qualified men are being selected over better-qualified women.

There are too many variables. What if your posting was advertised to a significantly higher number of men... but the women to whom it was advertised were especially qualified? That would explain numbers like you have.

Or it could be just like it looks... that the committee is trying to use equality as a proxy for fairness and, in so doing, discriminating against men.
posted by toomuchpete at 1:44 PM on March 24, 2008 [2 favorites]

It sounds like you two have different goals. He wants to see a process where men and women have equal odds of success. You want to see one where men and women wind up equally represented. This is just one instance of a huge ethical debate (the catchphrases for the two sides are "equality of opportunity" and "equality of results" respectively) and frankly, I'm not sure it's going to be resolved any time soon.

This isn't demonstrated by anything in the original post, and is purely conjecture.

You are correct, with qualitative criteria for advancement to round two, one cannot draw the conclusions that your friend wants to. Statistics does not say that there should only be 18% female applicants for the callbacks. Were there a larger data set, one might be able to suggest that there was a bias in one direction or another, but not in this case.
posted by OmieWise at 1:45 PM on March 24, 2008

If gender wasn't a significant criterion, then he's essentially accusing his colleagues of favoritism. Not very professional of him.

He said that since 50% of those who made it to the next level of the application were women, there must be something wrong with the criteria the department was using to narrow the field—statistically the number should reflect the number of total women applicants

Now THIS is sexist. Something is wrong with the criteria because more of the outstanding candidates were female? Why on earth would the results of a qualitative review reflect the precise statistical makeup of the applicants?
posted by desuetude at 1:47 PM on March 24, 2008 [2 favorites]

Neither the professor's complaint nor your response holds water.

The professor's complaint doesn't because we don't know anything about the correlation between applicant quality for this particular job and sex. Where did the applicants earn their PhDs? Who was on their committees? How was the quality of the letters of recommendation? It might well be the case that, for this job, if you brought in the twenty people from the best departments with the best letters from the best recommenders, you'd bring in 50% women. Or it might not. The professor won't know unless he runs a controlled study of the probability of advancement, which would require quantification of horrible muddy concepts like "quality of PhD department" and "strength of letters."

Your first response doesn't make sense to me. You might be saying what I just said, or not. Your second response misunderstands statistics. Naive neutrality would mean we should expect 18% of those who advance to be women, just like the pool. Your third response is basically irrelevant, since the department can't force additional women to apply; the applicant pool is what it is.
posted by ROU_Xenophobe at 1:55 PM on March 24, 2008

Response by poster: Just to clarify, I'm not a colleague of this acquaintance or employed by the university in any capacity. It was a point of conversation brought up by a mutual friend as we all talked.
posted by cocoagirl at 1:58 PM on March 24, 2008

12 years ago when I was studying statistics for my MBA I could have told you what the odds are that this outcome could be a random occurrence. I've simply forgotten all this stuff and I'm not going to google it now. :-) Somebody out there must be knee-deep in stats and can tell us the answer. I suspect you could argue that while it's not likely, it's not like a million-to-one chance that half the selected applicants are female.
posted by thomas144 at 2:04 PM on March 24, 2008

Women tend to rate their abilities lower than men rate theirs. This could easily mean that they don't apply for a job at a program like yours -- or even enter the field -- unless they are extremely confident that they are good enough to be successful.
posted by transona5 at 2:05 PM on March 24, 2008 [3 favorites]

Here's an interesting page I found. I googled 'identifying bias in hiring'.
posted by PercussivePaul at 2:25 PM on March 24, 2008

Somebody out there must be knee-deep in stats and can tell us the answer.
I can; the answer is "vanishingly small" - if you assume that the male and female populations at large have the same fraction of "good" (i.e., worth a second call) candidates and that the group of applicants reflects the population at large, then, with the numbers that applied, the probability that the best 20 are a 10/10 split is so close to zero that it doesn't make a difference. The problem is, that doesn't let you conclude anything useful about why the apparently-improbable event actually happened. It could be because there was a systematic bias in favor of choosing women or constructing a balanced short-list (you can debate the rest of your life whether that's a good idea or not, but it certainly happens). It could be because the women are more selective about applying, so that the group the hiring committee sees doesn't reflect the population at large. It could be because the fraction of good candidates among the population at large is actually different between men and women. Here's the important bit again: you can't tell what the cause is. And while investigating it may be a worthwhile cause, doing so properly is more work than your friend is ever going to undertake.
posted by Wolfdog at 2:30 PM on March 24, 2008

Two assumptions one could challenge:

We're assuming that there is no explicit-within-the-committee policy of trying to get an equal mix of interviewees by sex? In some departments at some times there's a known intention to try to hire a woman or minority etc if possible. (This doesn't mean hiring someone unqualified, but in many academic hiring situations you have far more well-qualified applicants than you can reasonably interview.) And this can be due to perfectly legitimate goals of the department. Having women faculty helps the department to attract and retain top female grad students for example. (IMO)

Also -
Why assume random initial distribution of qualifications? Among people who get PhDs and apply for academic jobs (especially in certain fields), there may be good reason to suppose that the women will disproportionately be more qualified than the men. Women who stick with, for example, heavily male-dominated fields, may only be able to do that because they are especially talented in the field and especially dedicated. The women who stick it out might really be on average better, if it's harder for women to stick it out. (I don't have any data or way of assessing whether this is true, or if it's a line of reasoning that appeals to you.)
posted by LobsterMitten at 3:21 PM on March 24, 2008

Best answer: Let's take this out of the realm of gender politics for a minute. Say that you put 200 jellybeans in a jar, 18% of which are tangerine-flavored. So when randomly choosing a single jellybean, you have a 0.18 probability of getting a tangerine one. You close your eyes, reach in your hand and pull out 20 jellybeans. 10 of them are tangerine-flavored. This is a pretty unlikely result. If you put them back in the jar and grabbed another handful, then another, then another, you'd only get this result 0.000001132% of the time.

Therefore it's likely that something is going on, but the statistical probability alone doesn't mean that you're biased for or against tangerines.
posted by desjardins at 3:32 PM on March 24, 2008 [2 favorites]

These responses are all pretty good-- but many put the onus on you to come up with a reason that such an unlikely number of women were chosen. I mean, just positing a few scenarios about self-selection of the applicant group provides for the possibility the committee wasn't biased, but it doesn't prove the stronger argument I think you wish to make: that there was not any bias.

The best test is to ask your colleague to choose a single woman that made the cut and a single man who did not where the male candidate is clearly the better choice to a majority of evaluators.

If he can do this, there was probably bias (or, a reasonably heavy weighting of the soft advantages of a female hire in terms of presumed ability to mentor female students and so forth.)

If he can't do this, then there isn't any bias.

By the way, you mentioned statistics-- the chances of choosing 5 or more women in a group of 10 if the applicant pools of men and women were equally qualified is less than 1 in 5,000. Most likely, there is either a systematic reason why the assumption of equal qualification is wrong, or there was bias at work-- and we cannot distinguish between these two without more data, or making additional assumptions.
posted by Maxwell_Smart at 3:49 PM on March 24, 2008

Oops, bad math on my part. I expect desjardins's answer is right.
posted by Maxwell_Smart at 3:53 PM on March 24, 2008

Who's right?

I'd give the edge to the person who's actually seen the applications.
posted by danOstuporStar at 4:26 PM on March 24, 2008 [1 favorite]

Best answer: desjardins, I didn't believe you so I tried it myself. I got 0.000462... still very small.
I did 36 choose 10 * 164 choose 10 / 200 choose 20. (is it right? my probability is rusty). Here are the probabilities if anyone cares:

 #     prob.    % of group
 0     1.51%       0%
 1     7.49%       5%
 2    17.06%      10%
 3    23.68%      15%
 4    22.44%      20%
 5    15.42%      25%
 6     7.97%      30%
 7     3.17%      35%
 8     0.98%      40%
 9     0.24%      45%
10     0.05%      50%
11     0.01%      55%
12     0.00%      60%
13     0.00%      65%
14     0.00%      70%
15     0.00%      75%
16     0.00%      80%
17     0.00%      85%
18     0.00%      90%
19     0.00%      95%
20     0.00%     100%
sum  100.00%

To clarify, for a number n, it shows the probability that a group of 20 contains n women and 20-n men, given a pool of 36 women and 164 men. The third number shows the percentage of the group composed of women. So a group with 20% women, or 4 women, occurs with 22.4% probability.
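In case anyone wants to reproduce the table, here is a minimal Python sketch of the same "36 choose n * 164 choose (20-n) / 200 choose 20" calculation. It assumes the shortlist is a purely random draw from the pool, which is exactly the assumption being questioned in this thread; the function name is made up for illustration.

```python
from math import comb

def prob_women_in_shortlist(k, women=36, men=164, shortlist=20):
    """Chance that a purely random shortlist contains exactly k women."""
    ways_with_k_women = comb(women, k) * comb(men, shortlist - k)
    all_shortlists = comb(women + men, shortlist)
    return ways_with_k_women / all_shortlists

# One row per possible number of women on a shortlist of 20.
table = [prob_women_in_shortlist(k) for k in range(21)]
for k, p in enumerate(table):
    print(f"{k:2d}  {p:7.2%}  {k / 20:4.0%}")
```

The row for 10 women comes out at roughly 0.000462, i.e. about 0.05%.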

Also let me state that measuring a hiring process against probabilities like this is kind of silly. But it is sometimes nice to look at the numbers anyway.
posted by PercussivePaul at 4:35 PM on March 24, 2008

Oh and an interesting result is that having a group of ten women (or more) is thirty times less probable than a result of all men and no women.

In other words the assumption that hiring can be precisely modeled by random probabilities -- which is the source of the idea that your group should have been 18% women, notice how 15% and 20% have the highest probabilities -- tends not to work out in your favour. Since I think that's a bogus assumption in the first place (and common sense bears it out), I would recommend attacking that assumption as the core of your argument.
posted by PercussivePaul at 4:43 PM on March 24, 2008

Well, my math was performed as follows:

The first jellybean you pick out has a 36/200 chance of being tangerine = 18% or .18
However, the SECOND jellybean has a 35/199 chance of being tangerine if the first one was tangerine. The THIRD one you pick out has even less of a chance, and so on. I went down the line until you get 10 tangerines, and then multiplied the probabilities together, arriving at 0.000001132%.
posted by desjardins at 6:02 PM on March 24, 2008

But there are 20 selections in total, desjardins. Your description of your analysis says you are calculating the odds of picking 10 women from the population in only 10 draws.
posted by cardboard at 7:02 PM on March 24, 2008

"Well, my math was performed as follows:"

I don't think that works, since the order in which the flavors are selected messes with the denominator/numerator. You're finding the probability that the exact sequence happens... add up all of the permutations of that 10-10 sequence (ten men, then ten women; ten women, then ten men; 5-5-5-5, etc...) and you'll have the right number.

...a number which is completely irrelevant, because it doesn't really answer the question of "Is there discrimination going on here?"
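For the curious, the two points above (only 10 draws were counted, and the order of the draws matters) can be checked with a small sketch. It uses exact fractions and assumes the jellybean setup from earlier in the thread: 36 tangerine beans out of 200, and a handful of 20.

```python
from fractions import Fraction
from math import comb

# Chance that the first ten beans drawn are all tangerine.
first_ten = Fraction(1)
for i in range(10):
    first_ten *= Fraction(36 - i, 200 - i)

# Chance of one specific ordering of 20 draws:
# ten tangerine first, then ten of the other flavors.
one_order = first_ten
for j in range(10):
    one_order *= Fraction(164 - j, 190 - j)

# Every ordering of 10 tangerine and 10 other beans is equally likely,
# and there are "20 choose 10" such orderings.
any_order = one_order * comb(20, 10)
print(float(first_ten), float(any_order))
```

The first number is about 1.13e-8, which matches the 0.000001132% figure, and the second is about 0.000462, which matches the "choose" formula.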
posted by toomuchpete at 7:04 PM on March 24, 2008

I'd give the edge to the person who's actually seen the applications.

But not to the other people who saw the applications and decided that most of the best candidates were women.
posted by transona5 at 7:27 PM on March 24, 2008

The statistics show that there is some kind of aberration going on, but numbers alone can't prove what it is. Any of the theories could be right; we would need more information to determine what the real reason is.
posted by gjc at 8:24 PM on March 24, 2008

Best answer: I think this has been said before, but it bears repeating. The statistics do not show any aberration, they merely hint at the possibility. The percentages I gave tell you roughly what you could expect if you hired a bunch of people randomly over and over and over again. That you got any specific result does not necessarily mean there is a bias. You could flip a coin and get ten heads in a row; it doesn't mean the coin is not fair, it just means it's an unlikely result. (It's still a valid result.) You need many trials and a large sample size before you can make any claims about bias.
posted by PercussivePaul at 11:46 PM on March 24, 2008

Best answer: I can't believe people are actually trying to apply statistics to this question. This is not a stochastic process. The draws from the candidate pools are not occurring randomly, nor should they be. If the prof thinks they should pick the final candidate randomly, he should so state.

Also, the finally-selected candidate is going to be a man or a woman. That means there is a 100% chance that one gender is going to be excluded in the final selection process. How's that for a statistic, statistics boy?
posted by ikkyu2 at 11:59 PM on March 24, 2008

Just to be a dork:

Assume that there is some linear value V with which you could rank all of the applicants. Also assume that within each population, there is a gaussian distribution along this value V. In order for a 28% rate of acceptance for women and a 6% rate of acceptance for men to result in the same cutoff value for V, then the difference in means between the populations must be 0.96 times the standard deviation of the main group.

So, if we assume that V was the standard, and that the men applying had mean Vs of 100, and SD of 15: The women that applied would have had average Vs of 114. Not an impossible mean difference, but there would have been some HEAVY self selection for that to happen.
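A rough sketch of that calculation, for anyone who wants to check it. It assumes both groups are normally distributed with the same SD and that a single cutoff on V was applied to everyone; those are modelling assumptions, not facts about the real search. It uses the exact rates 10/36 and 10/164.

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, SD 1
rate_women = 10 / 36        # about 28% of women advanced
rate_men = 10 / 164         # about 6% of men advanced

# How many SDs above its own group's mean the cutoff must sit
# in order to admit only the top fraction of that group.
z_women = std_normal.inv_cdf(1 - rate_women)
z_men = std_normal.inv_cdf(1 - rate_men)

# Same cutoff for both groups, so the means must differ by this many SDs.
gap_in_sd = z_men - z_women
women_mean = 100 + 15 * gap_in_sd   # men: mean 100, SD 15
print(f"gap: {gap_in_sd:.2f} SD, implied women's mean: {women_mean:.0f}")
```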
posted by FuManchu at 12:27 AM on March 25, 2008 [1 favorite]

It's not clear from the question whether your acquaintance was arguing that the hiring process was definitely sexist, or making the more limited claim that it was probably sexist. The first can be ruled out fairly easily; the second is much harder to argue with given how strongly the raw numbers are skewed.
posted by tomcooke at 5:12 AM on March 25, 2008

But not to the other people who saw the applications and decided that most of the best candidates were women.

Untrue. I would give the same weight as I gave to the professor to the opinions of the other hiring committee members, if we had any indication of how they reached their conclusions regarding who to call back. Of course, in that case, this question would likely answer itself. As it is, all we know of even the professor's position comes from the loaded phrase "cried sexism" which I don't believe the professor would agree to be an accurate assessment of his reaction. Given the info we have, yeah, I'll take the opinion of the person with first-hand experience over the opinion of someone who doesn't every time.

(Note that this does not mean I believe he is "right" that this is a case of sexism, but merely that gender likely played a role in the result. There are many valid reasons that may occur.)
posted by danOstuporStar at 6:04 AM on March 25, 2008

Both arguments are inane regardless of statistics. What about the reality of the department? Is it weighted towards men or women? If it is weighted towards men why not hire a woman to even out the imbalance? And the most important question is about qualifications. The most qualified person should be hired regardless of gender.
posted by JJ86 at 6:14 AM on March 25, 2008

Also, from what I understand from friends who regularly interview for humanities tenure positions at major universities, the final candidate is generally known well in advance of the shortlist. The shortlist is generally a formality to choose a secondary candidate should the primary candidate refuse the offer. Any questions of sexism in the shortlist are a moot point. Shouldn't your friend already be in on who the final candidate is and how the process works?
posted by JJ86 at 6:27 AM on March 25, 2008

Worth pointing out here that in employment law "disparate treatment" and "disparate impact" are two separate concepts. I'm not a lawyer or particularly well versed in this, but my understanding is that the former is about treating members of protected groups differently because of their group membership, and the latter is about policies that on their face are neutral but have a different impact on different groups. Disparate impact is much harder to demonstrate or force a remedy for. Just some terms to google if you want to keep looking into this.
posted by yarrow at 7:24 AM on March 25, 2008

Response by poster: Wow, desjardins and PercussivePaul get points just for doing the math. desjardins, your analogy with the jellybeans is close but on thinking about it overnight, I wonder if it’s more like: out of 200 jellybeans where 18% are tangerine-flavored, if you actively search for another characteristic—perhaps the smoothest 20 jellybeans—what are the chances that half of those will be tangerine? I have virtually no math foundation so I don’t know if that makes sense or is solvable with what is known.

toomuchpete: Or it could be just like it looks... that the committee is trying to use equality as a proxy for fairness and, in so doing, discriminating against men.
Well, the comments from both these acquaintances—a casual conversation over beers—led me to believe that everyone on the committee was surprised at the mix of genders for the second round. Given that, I don’t think there was an “explicit-within-the-committee policy of trying to get an equal mix,” as LobsterMitten puts it. (In fact, given their surprise my radar might go up again and point out that if anything, the committee was predisposed toward advancing men if only out of habit.) Nor did this acquaintance cry sexism in any official manner, but he did use that term more or less as a verbal shrugging of shoulders to wonder among colleagues what was contributing to this (mathematically) surprising outcome. Thanks everyone!
posted by cocoagirl at 12:03 PM on March 25, 2008

out of 200 jellybeans where 18% are tangerine-flavored, if you actively search for another characteristic—perhaps the smoothest 20 jellybeans—what are the chances that half of those will be tangerine?

This is a good point, particularly if "smoothness" is a continuum rather than a binary quality (smooth vs rough). I'm personally not experienced enough to do the math on this, but I'm confident that it would drastically reduce your chances even further, unless it was known that tangerine jellybeans are ALWAYS the smoothest or somesuch.
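One way to test that intuition is a quick simulation. This is only a sketch, and it assumes smoothness is a continuous score that has nothing to do with flavor (my assumption, not something stated in the question). Under that assumption, picking the smoothest 20 amounts to grabbing a random handful, so the tangerine count should average about 3.6 (18% of 20) and the odds of a 10/10 split should stay near the 0.05% computed earlier instead of shrinking further. The odds would only move if smoothness and flavor were related.

```python
import random

def tangerines_among_smoothest(rng, total=200, tangerine=36, picked=20):
    """Count tangerine beans among the `picked` smoothest beans.

    Beans 0..tangerine-1 are tangerine. Smoothness is drawn at random,
    independently of flavor (an assumption made for this sketch).
    """
    smoothness = [rng.random() for _ in range(total)]
    ranked = sorted(range(total), key=smoothness.__getitem__, reverse=True)
    return sum(1 for bean in ranked[:picked] if bean < tangerine)

rng = random.Random(2008)   # fixed seed so reruns give the same numbers
trials = 5000
counts = [tangerines_among_smoothest(rng) for _ in range(trials)]
average = sum(counts) / trials
print(f"average tangerines among the smoothest 20: {average:.2f}")
```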
posted by desjardins at 12:15 PM on March 25, 2008
