Visually exploring and representing survey data.
May 7, 2008 9:45 AM

Best ways to visually explore a large survey data set?

My advisor has suggested that I explore my data set visually before diving into the statistics. It is a large (N = 180,000+) survey data set made up of individuals in over 80 countries. Most of the responses are categorical or dichotomous, taking the form "agree/disagree" or "yes/no/maybe." Some of them are Likert-style scales (1-5, Disagree-Agree). Many of the demographic variables are also categorical (for example, the survey asks for income level rather than income), but I do have a few continuous variables such as age. My dependent variable of interest is a scale composed of four survey items indexed to 100, although the scale actually takes on rather few discrete values because of the nature of the questions that make it up.
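You can check by brute force why such an index takes so few values. Here is a minimal sketch. It assumes, hypothetically, that each of the four items is a 1-5 Likert item and that the index is their sum rescaled linearly to 0-100. The post does not say how the scale is actually built.

```python
from itertools import product


def index_score(items):
    """Hypothetical scoring rule: sum four 1-5 items (range 4-20) and rescale to 0-100."""
    total = sum(items)
    return (total - 4) * 100 / 16


# Enumerate all 5**4 = 625 response patterns and keep the distinct index values.
possible = sorted({index_score(p) for p in product(range(1, 6), repeat=4)})
print(len(possible))  # 17
print(possible[:3])   # [0.0, 6.25, 12.5]
```

Under that assumption, every respondent falls on one of only 17 possible scores. A bar chart of the index shows this discreteness directly, while a smoothed density would hide it.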

What would be some interesting ways to visually explore this data? Scatterplots, even with jittering, are clearly not the way to go, because the data are highly redundant and categorical. I have generated a few boxplots (usually separated by gender or region). I am open to abstract suggestions or to concrete ones using R or Stata.
posted by proj to Education (9 answers total) 10 users marked this as a favorite
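For two categorical variables, a stacked bar chart or mosaic plot (in R, base graphics' `mosaicplot()`) is built from the within-group proportions of a cross-tabulation. Here is a minimal sketch in plain Python. The records are made up, and the field names `gender` and `q1` are hypothetical.

```python
from collections import Counter, defaultdict


def row_proportions(rows, group_key, response_key):
    """Cross-tabulate two categorical fields and convert counts to within-group proportions."""
    counts = defaultdict(Counter)
    for r in rows:
        counts[r[group_key]][r[response_key]] += 1
    table = {}
    for group, c in counts.items():
        n = sum(c.values())
        table[group] = {resp: k / n for resp, k in c.items()}
    return table


# Tiny made-up records, not real survey data.
records = [
    {"gender": "F", "q1": "agree"}, {"gender": "F", "q1": "agree"},
    {"gender": "F", "q1": "disagree"}, {"gender": "M", "q1": "agree"},
    {"gender": "M", "q1": "disagree"}, {"gender": "M", "q1": "disagree"},
    {"gender": "M", "q1": "disagree"},
]
print(row_proportions(records, "gender", "q1"))
```

Comparing proportions rather than raw counts keeps groups of very different sizes, such as countries, on the same footing.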
I do infovis research, and there's a new tool called Tableau that might interest you. There's a free trial. It's a very exploratory tool: you can slice and dice the data many different ways. You can also represent different attributes with a fairly wide variety of visual encodings (such as color, size, and shape: star, triangle, square). It handles ordinal as well as nominal (categorical) data, so you should be able to load your data easily. There's a bit of a learning curve, but it makes easy things easy and hard things possible, so I think it's worth a shot.
posted by zpousman at 9:52 AM on May 7, 2008

GGobi is good for visually exploring multivariate data with R.
posted by Blazecock Pileon at 10:01 AM on May 7, 2008

I'd also be interested in similar software or plug-ins for SPSS, if anyone has suggestions.
posted by k8t at 10:06 AM on May 7, 2008

Are there likely to be geographic patterns? Do you have access to a GIS program? Perhaps a cloropleth map would be appropriate. An example.
posted by desjardins at 11:25 AM on May 7, 2008

I meant choropleth.
posted by desjardins at 11:26 AM on May 7, 2008
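A choropleth needs one value per region. The individual responses first have to be aggregated to the country level and then binned into shading classes. Here is a minimal sketch. The country codes, the `index` field, and the break points at 40 and 60 are all hypothetical and chosen arbitrarily.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean


def country_means(rows, country_key="country", outcome_key="index"):
    """Average the outcome within each country."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[country_key]].append(r[outcome_key])
    return {c: mean(v) for c, v in groups.items()}


def shading_classes(means, breaks):
    """Class 0 below breaks[0]; class len(breaks) at or above breaks[-1]."""
    return {c: bisect_right(breaks, m) for c, m in means.items()}


# Made-up records for illustration only.
rows = [
    {"country": "BR", "index": 40}, {"country": "BR", "index": 60},
    {"country": "DE", "index": 70},
    {"country": "JP", "index": 20}, {"country": "JP", "index": 40},
]
means = country_means(rows)
print(shading_classes(means, [40, 60]))  # {'BR': 1, 'DE': 2, 'JP': 0}
```

A GIS program would then join these classes to country boundaries by country code.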

Have you tried lattice plots of contours or histograms of your outcome across groups? Be sure to include the reference histogram or CDF. With data that have many interdependent variables, I find it helpful to construct added-variable and component-plus-residual plots.
posted by a robot made out of meat at 12:47 PM on May 7, 2008
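One way to prepare per-panel curves with a shared reference is to compute empirical CDFs: one per group, plus a pooled curve to overlay in every panel. Here is a minimal sketch in plain Python. The values are made up, and the `region` and `index` fields are hypothetical. In R, the lattice package would draw the panels themselves (for example, with a conditioning formula such as `histogram(~ index | region)`).

```python
from collections import defaultdict


def ecdf(values):
    """Empirical CDF as (value, proportion <= value) pairs, one pair per distinct value."""
    xs = sorted(values)
    n = len(xs)
    out = []
    for i, x in enumerate(xs, start=1):
        if out and out[-1][0] == x:
            out[-1] = (x, i / n)  # tie: update the cumulative proportion for this value
        else:
            out.append((x, i / n))
    return out


def ecdf_panels(rows, group_key, outcome_key):
    """One ECDF per group, plus a pooled '(all)' curve to use as the reference in each panel."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_key]].append(r[outcome_key])
    panels = {g: ecdf(v) for g, v in groups.items()}
    panels["(all)"] = ecdf([r[outcome_key] for r in rows])
    return panels


rows = [
    {"region": "A", "index": 25}, {"region": "A", "index": 50},
    {"region": "A", "index": 50}, {"region": "B", "index": 75},
    {"region": "B", "index": 100},
]
for name, curve in ecdf_panels(rows, "region", "index").items():
    print(name, curve)
```

Step-function CDFs suit an outcome with few discrete values. A group that sits to the right of the pooled curve has systematically higher scores.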

Wow, Tableau does look nice.
posted by a robot made out of meat at 12:58 PM on May 7, 2008

In Excel, you can use conditional formatting (especially color shading) to help with the categorical data, although there's a limit to the number of conditions you can use.
posted by dondiego87 at 1:03 PM on May 7, 2008
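A two-color scale linearly interpolates each cell's value between a light color and a dark color. This is similar in spirit to color-scale conditional formatting, though not Excel's exact algorithm. Here is a minimal sketch. The white-to-blue endpoints are an arbitrary choice.

```python
def shade(value, lo, hi, start=(255, 255, 255), end=(33, 102, 172)):
    """Interpolate linearly from `start` (at lo) to `end` (at hi) and return a hex color."""
    t = (value - lo) / (hi - lo) if hi > lo else 0.0
    t = min(max(t, 0.0), 1.0)  # clamp values outside [lo, hi]
    r, g, b = (round(s + (e - s) * t) for s, e in zip(start, end))
    return "#{:02X}{:02X}{:02X}".format(r, g, b)


# Shade a few made-up "percent agree" cells on a 0-100 scale.
for pct in (0, 25, 100):
    print(pct, shade(pct, 0, 100))
```

A continuous scale like this avoids the fixed number of discrete conditions that rule-based formatting requires.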

Thanks for the suggestions. Tableau does look cool, but alas, I'm on a Mac. The terminal server I use for data analysis is a PC, but I don't have permission to install software on it. I installed GGobi, but it doesn't seem to recognize my data set. At this point I think I'll just stick with boxplots over several categorical variables.
posted by proj at 7:36 AM on May 8, 2008
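Boxplots over several categorical variables amount to one five-number summary per combination of categories. Here is a minimal sketch in plain Python. The values are made up, and the `gender`, `region`, and `index` fields are hypothetical. Quartiles use `statistics.quantiles(..., method="inclusive")`, which is only one of several common quartile conventions, so the hinges may differ slightly from what R or Stata draws.

```python
from collections import defaultdict
from statistics import quantiles


def five_number_summaries(rows, keys, outcome_key):
    """Min, Q1, median, Q3, max of the outcome for each combination of the categorical `keys`."""
    cells = defaultdict(list)
    for r in rows:
        cells[tuple(r[k] for k in keys)].append(r[outcome_key])
    summaries = {}
    for cell, values in cells.items():
        if len(values) < 2:
            continue  # too few responses for quartiles; skip this cell
        q1, med, q3 = quantiles(values, n=4, method="inclusive")
        summaries[cell] = (min(values), q1, med, q3, max(values))
    return summaries


rows = [
    {"gender": "F", "region": "Europe", "index": v} for v in (25, 50, 50, 75, 100)
] + [
    {"gender": "M", "region": "Europe", "index": v} for v in (0, 25, 25, 50)
]
for cell, summary in five_number_summaries(rows, ["gender", "region"], "index").items():
    print(cell, summary)
```

When the outcome takes few distinct values, expect many boxes whose quartiles coincide with the median, as in the first cell here.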
