Statistical framing of the engineering and extremism article
March 25, 2016 7:01 AM Subscribe
Reading this article on the blue got me thinking about conditional probabilities, prediction and causality. I came up with an analytical framing of what I think the article is saying and would be grateful if stats/social science Mefites could tell me if it seems accurate or else set me right.
Reading this article on the blue got me thinking about conditional probability in a simple discrete 2x2 case.
Suppose there are two discrete random variables in a population of individuals, A and B. According to conditional probability, P(A,B)=P(A|B).P(B)=P(B|A).P(A).
If A and B are statistically independent then P(A|B)=P(A) and P(B|A)=P(B).
Suppose A and B are not independent. Suppose P(A|B)=k.P(A), k>1. But since P(A,B)=P(A|B).P(B)=P(B|A).P(A), this implies that k.P(A).P(B)=P(B|A).P(A). Cancelling the P(A)s (which requires P(A)>0) gives P(B|A)=k.P(B), with the same k>1.
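As a sanity check on the algebra, here is a minimal numeric sketch. The four joint probabilities are made up purely for illustration (they are not from the article); any valid 2x2 joint distribution with non-zero marginals should give the same k in both directions.

```python
# Hypothetical joint distribution over (A, B). The four numbers are
# illustrative assumptions and must sum to 1.
joint = {
    (True, True): 0.02,    # A and B
    (True, False): 0.08,   # A and not B
    (False, True): 0.03,   # not A and B
    (False, False): 0.87,  # neither
}

# Marginals P(A) and P(B), obtained by summing over the other variable.
p_a = sum(p for (a, _), p in joint.items() if a)
p_b = sum(p for (_, b), p in joint.items() if b)
p_ab = joint[(True, True)]

# Conditionals from the definition P(A|B) = P(A,B) / P(B).
p_a_given_b = p_ab / p_b
p_b_given_a = p_ab / p_a

# The ratio k computed in each direction.
k_from_a = p_a_given_b / p_a   # P(A|B) / P(A)
k_from_b = p_b_given_a / p_b   # P(B|A) / P(B)

print(k_from_a, k_from_b)
```

Both ratios reduce to P(A,B)/(P(A).P(B)), which is why they have to agree.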
So I was thinking about this article about engineers and extremism.
I tried to put the article in the framing above. The way I see it, for statistical purposes the world’s population can be partitioned by two binary random variables: engineer or not engineer, and extremist or not extremist.
The article notes evidence that suggests that P(engineer|extremist)>P(engineer). I.e. engineers are more prevalent among extremists than they are among the general population. The article then considers explanations of the fact.
However, as far as I can see the algebra above suggests that if the above is true, it IMPLIES that P(extremist|engineer)>P(extremist), just by the way the 2x2 discrete partitioning and conditional probability work.
I find this a little shocking. As I was reading the article I was sort of turning my nose up at some of the explanations, and the title, which to me sounded a bit like evidence for P(engineer|extremist)>P(engineer), rather than P(extremist|engineer)>P(extremist). Before I went through the algebra, I assumed it would be possible that P(engineer|extremist)>P(engineer) could be consistent with P(extremist|engineer)=P(extremist), but from the algebra above it appears that this is not the case. Possibly in my original thoughts I muddled prediction and causality.
What I would like to know is what implications the 2x2 discrete partition case has for the example, if it did turn out that P(engineer|extremist)>P(engineer). Does it mean, for example, that as an estimator, rather than looking at a sample of extremists and counting the proportion of engineers among them, we could in principle look at a sample of engineers and count the proportion of extremists among them? [Aside from the practical problem that we would need to sample a huge number of engineers to find any extremists at all.]
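To make the estimator question concrete, here is a rough sketch using a full 2x2 table of counts. All four counts are hypothetical, invented for illustration only. It computes the ratio k both ways, and also how many engineers you would expect to sample per extremist found.

```python
from fractions import Fraction

# Hypothetical population counts (illustrative assumptions, not real data).
n_eng_ext = 40            # engineers who are extremists
n_eng_not = 1_999_960     # engineers who are not extremists
n_non_ext = 160           # non-engineers who are extremists
n_non_not = 97_999_840    # non-engineers who are not extremists

total = n_eng_ext + n_eng_not + n_non_ext + n_non_not
n_eng = n_eng_ext + n_eng_not   # all engineers
n_ext = n_eng_ext + n_non_ext   # all extremists

# Route 1: start from extremists and count engineers among them.
lift_via_extremists = Fraction(n_eng_ext, n_ext) / Fraction(n_eng, total)

# Route 2: start from engineers and count extremists among them.
lift_via_engineers = Fraction(n_eng_ext, n_eng) / Fraction(n_ext, total)

# Expected number of engineers sampled per extremist found: 1 / P(extremist|engineer).
engineers_per_extremist = Fraction(n_eng, n_eng_ext)

print(lift_via_extremists, lift_via_engineers, engineers_per_extremist)
```

With the whole table in hand the two routes give the same ratio. Note, though, that each route also needs a population base rate: a sample of extremists alone gives only P(engineer|extremist), and a sample of engineers alone gives only P(extremist|engineer).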
Please note: I am aware that statistical dependence is not the same as causality, and that there is a separate “causal” calculus in the statistics/probability literature by Judea Pearl and others, which among other things respects the fact that cause can be unidirectional whereas statistical information flows both ways (i.e. we can predict and retrodict things which may or may not be causally related). I am aware that there is a special P(A|do(B=b)) notation to denote causality. (I have been reading Part IV of Cosma Shalizi’s book.) As I understand it, the special causal “do” refers to a manipulated distribution. Crucially, the manipulated distribution may not be identifiable from observations which you cannot control by experiment. Also note I am interested in the statistical/social science framing of the debate here, not trying to make some kind of oblique point about extremism.
Secondly, I would like to know whether this is a reasonable statistical/social science framing of the article:
There is evidence that P(engineer|extremist)>P(engineer), i.e. there is a selection effect that produces a higher proportion of engineers among extremists than among the population as a whole. Could the reason for this be that engineers are more likely to be extremists, i.e. there is some statistical causality running from engineering to extremism, because of certain psychological traits of engineers?
This is where the psychological explanations are brought in. (In causality notation I think exploring this psychological argument would look like an enquiry as to whether P(extremist|do(Engineer))>P(extremist|do(Not engineer)).) Others in comments note that when it comes to specifically terrorist extremists, the selection effect possibly explains itself, since presumably engineers are a large part of the few with the skills to carry out attacks. So inference to the simplest explanation would suggest that the causal explanation, which flips the conditioning for the original observed selection effect and imposes a causal “do”, is not required.
Do you think I am on the right lines with this analytical framing? I’m sure you will let me know if I’m far off base here. Many thanks.
Regarding your second question, it might be helpful to consult the source materials rather than just a popular news summary -- either the original 2009 paper or the 2016 book (just the first chapter available online). One thing to keep in mind is that they're talking very specifically about Islamist extremism. They observe some similarities in engineer proportions among other right-wing extremist groups but observe the exact opposite phenomenon (i.e.: very few engineers) among left-wing extremist groups. Also, there's a weird exception among Saudi extremists who have few engineers among them.
The authors go through some of the obvious alternate explanations and present evidence for why they don't really hold up. In particular:
1) Engineers are recruited for their technical experience. This is contradicted by the fact that there's no evidence that extremist groups are recruiting engineers for their technical ability (more on this later) and that most engineers aren't actually doing the bomb-making. Also they point out that getting bomb-making materials is substantially more difficult than putting them together. They point out that "electricians, mechanics, and ex-army officers have shown themselves to be as good at making bombs (if not better) than engineers."
2) Network effects: like recruits like. The authors' counter-evidence here doesn't look quite as strong to me but they point out that their study observes the engineering bias in extremists across four geographic groups (N. Africans, Southeast Asians, Palestinians, and "core" Arabs) that weren't that well connected. Also, most of the people in their data set were radicalized pre-internet so you can rule that out as a factor.
3) Sample bias: engineering-educated extremists are more likely to be successful and/or involved in high-profile plots and hence show up more in the sample of known extremists. The authors actually acknowledge that this is a legitimate concern (at least in the book chapter). I don't have the whole book to see whether they're able to mitigate this later on in the book.
Regarding the psychological mindset explanation, they draw on some circumstantial (but not conclusive) data regarding the somewhat higher rates of conservatism and religiosity among engineers relative to other academic disciplines. However, one interesting data point they describe in the book chapter is in one extremist group's own recruiting literature which recommends recruiting among people with technical or professional qualifications -- but not because of their technical skills. Rather, it's to find people who are "very inquisitive but less challenging and more susceptible to extremist reasoning/arguments" (yikes!).
Also, keep in mind that the authors are pointing to two main hypotheses: 1) some overlap between the engineering mindset and Islamist extremism, and 2) lack of social mobility for engineers in the Muslim world. For the first one, I don't think they're pointing to a causative explanation so much as an "affinity" type of argument. For the second one, however, I think they're indicating more on the causation side but I haven't read enough about what they're saying about it.
posted by mhum at 11:09 AM on March 25, 2016 [3 favorites]
I think you probably also want to think about those probabilities as quantitative, concrete things as well. If the prior probability of extremism P(extremist) is very low, P(extremist|engineer) and P(extremist) might end up not being very different in practical terms, even if engineers really are substantially over-represented among extremists (i.e., P(engineer|extremist) >> P(engineer)).
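To put rough numbers on this, here is a small sketch using Bayes' rule. The three input probabilities are hypothetical, picked only to show the scale of the effect; they are not estimates from the paper or the book.

```python
def posterior_extremist(prior, p_eng_given_ext, p_eng):
    """Bayes' rule: P(extremist|engineer) = P(engineer|extremist) * P(extremist) / P(engineer)."""
    return p_eng_given_ext * prior / p_eng

# All three inputs are illustrative assumptions.
prior = 1e-5             # P(extremist)
p_eng_given_ext = 0.40   # P(engineer|extremist)
p_eng = 0.04             # P(engineer)

posterior = posterior_extremist(prior, p_eng_given_ext, p_eng)

relative_lift = posterior / prior   # how many times more likely, in relative terms
absolute_gap = posterior - prior    # difference in absolute probability

print(f"P(extremist|engineer) = {posterior:.6f}")
print(f"relative lift = {relative_lift:.1f}x, absolute gap = {absolute_gap:.6f}")
```

Under these made-up inputs the relative lift is about tenfold, yet the absolute probability moves by less than one in ten thousand.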
posted by en forme de poire at 2:24 PM on March 25, 2016 [1 favorite]
posted by andrewcooke at 9:05 AM on March 25, 2016 [1 favorite]