Distributions/Plots for Social Sciences
January 2, 2017 2:43 PM

I'm writing a story and I need a marginally believable conversation between a professor and grad student in the social sciences discussing some data. The student is showing the prof some data analysis and ...

... I need the scene to get to a point where the prof points out to the student that data viewed one way (say, as a Weibull distribution or a scatter plot) has a certain aesthetic appeal but is specious, whereas the same data looked at another way looks less attractive but represents things more accurately. What are common methods of data analysis in the social sciences? I don't need absolute precision, just something that's good enough.
posted by falsedmitri to Science & Nature (7 answers total) 2 users marked this as a favorite
Pie charts are commonly used by my social science students, even though they should be banned from science entirely.

"Social sciences" is a really broad term. My data can look like anything, and I use visualization techniques ranging from actual photographs from participants to theoretical models drawn by hand or in a computer program to radar charts to good old histograms. A little more specificity in your question would help: what kind of social science are we talking about here?

You might look at Edward Tufte's work for examples of gorgeous and honest visualization techniques.
posted by sockermom at 2:52 PM on January 2, 2017

Pie charts are actually fine for visualizing categorical data, provided that the categories are mutually exclusive, no data point is counted in more than one category, and the percentages are big enough to make wedges you can easily see.

But that's a tangent and, yes, Edward Tufte.

However, a fine place to start would be Khan Academy. Their videos start at a 6th-grade level but also go up to more advanced levels, and a lot of the visualization techniques that you learn in 6th grade continue to be useful as you advance.
posted by tel3path at 3:18 PM on January 2, 2017

Well, if you need "good enough" and something that is totally believable, then go with a bunch of pie charts, because, although they are a complete plague, they are all over the place. They are so innocuous that no one ever notices how badly they can misrepresent distributions in data, which it sounds like your character is trying to do. They ain't fancy, but they fit your bill pretty well for comparing with other methods of visualizing the same distributions.
posted by paco758 at 3:32 PM on January 2, 2017

I could also suggest having the student show the professor a cross-tab between two categorical variables and claim that one causes the other because the chi-square test is statistically significant. The professor would then point out that a) correlation is not causation, and b) if you run a logistic regression that controls for other variables, like gender, age, and education, the result the student found disappears. The ice cream/homicide example is along these lines.

However, the data visualization examples are a lot sexier and more likely to be used by students. I find it unlikely that a grad student would misuse a pie chart, though. They'd be more likely to misuse one of the super-fancy data visualizations available in R, like a box-and-whisker plot (although that's not very fancy), in which case the professor would likely ask the questions I suggest above about controlling for other factors.
posted by OrangeDisk at 3:54 PM on January 2, 2017 [2 favorites]
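
For readers who want to see the ice cream/homicide pattern concretely, here is a rough Python sketch with made-up numbers. A hypothetical "temperature" variable drives both ice cream sales and homicides, so the two correlate strongly until temperature is controlled for. To keep it short, it uses a partial correlation on simulated continuous data rather than the chi-square test and logistic regression described in the answer above; the variable names, effect sizes, sample size, and seed are all assumptions for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, zs):
    """Residuals of a simple least-squares regression of ys on zs."""
    mz, my = mean(zs), mean(ys)
    slope = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / sum(
        (z - mz) ** 2 for z in zs
    )
    intercept = my - slope * mz
    return [y - (intercept + slope * z) for y, z in zip(ys, zs)]


def simulate(n=5000, seed=1):
    """Hypothetical data: temperature drives both ice cream sales and
    homicides; ice cream itself has no effect on homicides at all."""
    rng = random.Random(seed)
    temp = [rng.gauss(0, 1) for _ in range(n)]
    ice_cream = [t + rng.gauss(0, 0.5) for t in temp]
    homicides = [t + rng.gauss(0, 0.5) for t in temp]
    return temp, ice_cream, homicides


temp, ice_cream, homicides = simulate()
raw = pearson(ice_cream, homicides)
# Partial correlation: correlate what is left of each variable after
# removing the part explained by temperature.
partial = pearson(residuals(ice_cream, temp), residuals(homicides, temp))
print(f"raw r = {raw:.2f}, controlling for temperature r = {partial:.2f}")
```

With these parameters the raw correlation should come out around 0.8 and the controlled one close to zero, which is the professor's point in a nutshell.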

The "Aggregation by quantiles erroneously amplifies trend" section of Common errors in statistical analyses comes to mind. It doesn't exactly fit what you were asking for but might work for your story.
posted by grouse at 4:41 PM on January 2, 2017 [2 favorites]
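
A rough sketch of the general effect named in that section title, as I understand it (this is my reading of the title, not code or data taken from the linked page): bin a weakly related, noisy variable into quintiles, plot only the group means, and the trend looks far cleaner than the individual data justify. Pure Python; the 0.2 slope, the noise level, the seed, and the choice of five bins are arbitrary assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def quantile_bin_means(xs, ys, k=5):
    """Sort by x, cut into k equal-size groups, and return the lists of
    per-group mean x and mean y. Leftover points (if len(xs) is not a
    multiple of k) are dropped."""
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // k
    bins = [pairs[i * size:(i + 1) * size] for i in range(k)]
    mean_x = [sum(x for x, _ in b) / len(b) for b in bins]
    mean_y = [sum(y for _, y in b) / len(b) for b in bins]
    return mean_x, mean_y


rng = random.Random(42)
n = 10_000
x = [rng.gauss(0, 1) for _ in range(n)]
# Weak hypothetical relationship: y depends only slightly on x, plus lots of noise.
y = [0.2 * xi + rng.gauss(0, 1) for xi in x]

individual_r = pearson(x, y)
bx, by = quantile_bin_means(x, y, k=5)
binned_r = pearson(bx, by)
print(f"individual r = {individual_r:.2f}, quintile-means r = {binned_r:.2f}")
```

The individual-level correlation should be around 0.2, while the five quintile means line up almost perfectly: a pretty-but-specious picture of the kind the question describes.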

Unless your target audience is itself made up of social scientists or other people well versed in quantitative methods, you're going to want something very simple to grasp.

So: just use the ecological fallacy. They have an idea about people: that more X goes with more Y. One of them looks at, say, counties and finds that counties with more X have more Y. But the other shows, with individual-level data, that there's actually no relationship at the individual level, or that it goes the other way. In most years, this is actually the case with income and presidential vote: poor states vote Republican and rich states vote Democratic, but poor individuals vote Democratic and rich ones vote Republican. Mostly.
posted by ROU_Xenophobe at 5:15 PM on January 2, 2017 [1 favorite]
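
A toy version of the ecological fallacy in Python, with entirely invented numbers (five hypothetical states, five voters each, income in units of $10k, and a made-up "probability of voting Democratic" per voter). Richer states lean Democratic on average, but within every state richer voters lean Republican, so the state-level and individual-level correlations point in opposite directions.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical toy electorate: (state, income, probability of voting Democratic).
voters = []
for s in range(5):
    state_mean_income = 4 + s  # richer states have higher s
    for d in (-2, -1, 0, 1, 2):  # each voter's income relative to the state mean
        income = state_mean_income + d
        # Richer states lean Democratic (+0.08 per step), but within any state
        # richer voters lean Republican (-0.15 per $10k above the state mean).
        p_dem = 0.35 + 0.08 * s - 0.15 * d
        voters.append((s, income, p_dem))

# State-level (aggregate) view: one point per state.
state_income = [sum(v[1] for v in voters if v[0] == s) / 5 for s in range(5)]
state_dem = [sum(v[2] for v in voters if v[0] == s) / 5 for s in range(5)]
state_r = pearson(state_income, state_dem)

# Individual-level view: one point per voter.
individual_r = pearson([v[1] for v in voters], [v[2] for v in voters])
print(f"state-level r = {state_r:.2f}, individual-level r = {individual_r:.2f}")
```

Here the state-level correlation is essentially 1, while the individual-level one is negative (about -0.29 with these numbers).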

Well, after the scatter plot, you would check the assumptions of normality, linearity, homoscedasticity, and independence of errors, and look at potential covariates, to see whether the data support parametric testing (e.g., relying on the central limit theorem) or call for non-parametric testing.

In plain English, this means you would be looking for correlations between two things (coffee and depression, say), checking whether a confounding variable could be at play (number of hours of sleep), and deciding what methods could be used to draw conclusions.

Source: I'm working on my MS in Clinical Psychology.

I also encourage you to google Andy Field (his website is called "Statistics Hell"). He is a lecturer, but he teaches in a way that is completely relatable and hilarious. My stats class used his textbook.
posted by floweredfish at 5:59 PM on January 2, 2017 [1 favorite]
