Visually exploring and representing survey data.
May 7, 2008 9:45 AM
Best ways to visually explore a large survey data set?
My advisor has advised me to explore my data set visually before diving in statistically. It is a large (N = 180,000+) survey data set comprised of individuals in over 80 countries. Most of the responses are categorical or dichotomous in nature, taking the form "agree/disagree" or "yes/no/maybe." Some of them are Likert-style scales (1-5, Disagree-Agree). Many of the demographic variables are also categorical (for example, rather than asking income, "income level" is asked) but I do have a few continuous variables such as age. My dependent variable of interest is a scale composed of four survey items indexed to 100 (although the actual number of discrete values taken on the scale is rather low owing to the nature of the questions comprising the scale).
What would be some interesting ways to visually explore this data? Obviously, scatterplots (even with jittering) are not the way to go because of the highly redundant and categorical nature of the data. I have a few boxplots that I've generated (usually separating by gender or region). I am open to abstract suggestions or concrete suggestions using R or Stata.
GGobi is good for visually exploring multivariate data with R.
posted by Blazecock Pileon at 10:01 AM on May 7, 2008
I'd like to throw in that I'd be interested in similar software/plug-ins for SPSS if anyone has suggestions.
posted by k8t at 10:06 AM on May 7, 2008
Are there likely to be geographic patterns? Do you have access to a GIS program? Perhaps a cloropleth map would be appropriate. An example.
posted by desjardins at 11:25 AM on May 7, 2008
I meant choropleth.
posted by desjardins at 11:26 AM on May 7, 2008
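A choropleth shades each country by a single number, so the respondent-level data has to be collapsed to one value per country first. A minimal sketch of that step, in Python rather than R or Stata, on made-up records; the country codes, scores, and the three equal-width shading bins are all assumptions for illustration, and the actual map drawing is left to whatever GIS or plotting tool is available:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical respondent records: (country code, outcome score on a 0-100 scale).
records = [
    ("DE", 75.0), ("DE", 50.0), ("DE", 100.0),
    ("BR", 25.0), ("BR", 50.0),
    ("JP", 75.0),
]

def country_means(rows):
    """Group scores by country and return {country: (mean score, n respondents)}."""
    by_country = defaultdict(list)
    for country, score in rows:
        by_country[country].append(score)
    return {c: (mean(v), len(v)) for c, v in by_country.items()}

def shade_bins(summary, n_bins=3):
    """Assign each country an equal-width colour bin 0..n_bins-1 based on its mean."""
    means = [m for m, _ in summary.values()]
    lo, hi = min(means), max(means)
    width = (hi - lo) / n_bins or 1.0  # avoid dividing by zero if all means are equal
    return {c: min(int((m - lo) / width), n_bins - 1) for c, (m, _) in summary.items()}

summary = country_means(records)
bins = shade_bins(summary)
for country, (avg, n) in sorted(summary.items()):
    print(country, avg, n, "bin", bins[country])
```

Keeping the respondent count next to each mean is deliberate: with 80+ countries some will have small samples, and it may be worth flagging those on the map instead of shading them like the rest.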
Lattice plots of contours/histograms of your outcome over groups? Be sure to include the reference histogram/cdf. With data that has many inter-dependent variables, I find it helpful to construct added-variable and component-plus-residual plots.
posted by a robot made out of meat at 12:47 PM on May 7, 2008
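One way to read the "reference histogram/cdf" suggestion is to compute the outcome's empirical CDF within each group and compare it with the pooled CDF over everyone. A minimal sketch in Python (not R or Stata) on made-up data; the group labels and scores are hypothetical, and drawing one panel per group is left to a plotting library:

```python
from collections import defaultdict

# Hypothetical data: (group label, outcome score); the scale takes few distinct values.
rows = [
    ("female", 0), ("female", 25), ("female", 50), ("female", 50),
    ("male", 25), ("male", 75), ("male", 100), ("male", 100),
]

def ecdf(values, grid):
    """Fraction of values less than or equal to each grid point."""
    n = len(values)
    return [sum(v <= g for v in values) / n for g in grid]

grid = sorted({score for _, score in rows})   # distinct scale values
pooled = ecdf([s for _, s in rows], grid)     # reference curve shared by every panel

groups = defaultdict(list)
for label, score in rows:
    groups[label].append(score)

# One "panel" per group: (scale value, group ECDF, gap from the pooled reference).
panels = {
    label: [(g, f, round(f - p, 3)) for g, f, p in zip(grid, ecdf(vals, grid), pooled)]
    for label, vals in groups.items()
}

for label, panel in panels.items():
    print(label, panel)
```

Because the scale takes only a handful of distinct values, evaluating the CDF on those values loses nothing, and the gap column shows directly where a group departs from the overall distribution.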
Wow, tableau does look nice.
posted by a robot made out of meat at 12:58 PM on May 7, 2008
In Excel, you can do some conditional formatting (especially color shading) that could help with the categorical data, although there's a limit to the number of conditions you can use.
posted by dondiego87 at 1:03 PM on May 7, 2008
Response by poster: Thanks for the suggestions. Tableau looks cool indeed but alas I'm on a Mac. The terminal server that I use for data analysis is a PC but I don't have software install authority on it. I installed GGobi but it doesn't seem to want to recognize my dataset. I think at this point I'll just stick with boxplots over several categorical variables.
posted by proj at 7:36 AM on May 8, 2008
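For boxplots over several categorical variables, the numbers behind each box can be tabulated first, which makes it easier to scan many groupings before drawing any of them. A rough sketch in Python (not R or Stata) on made-up data; the variable names, categories, and the "inclusive" quartile method are assumptions, and plotting packages may compute quartiles slightly differently:

```python
from collections import defaultdict
from statistics import quantiles

# Hypothetical respondents: categorical attributes plus the 0-100 outcome scale.
respondents = [
    {"gender": "f", "region": "north", "score": 0},
    {"gender": "f", "region": "north", "score": 25},
    {"gender": "f", "region": "south", "score": 50},
    {"gender": "f", "region": "south", "score": 75},
    {"gender": "f", "region": "south", "score": 100},
    {"gender": "m", "region": "north", "score": 50},
    {"gender": "m", "region": "south", "score": 50},
    {"gender": "m", "region": "south", "score": 100},
]

def box_stats(data, by, value="score"):
    """Return {category: (min, Q1, median, Q3, max)} of `value` for each level of `by`."""
    groups = defaultdict(list)
    for row in data:
        groups[row[by]].append(row[value])
    stats = {}
    for level, vals in groups.items():
        q1, med, q3 = quantiles(vals, n=4, method="inclusive")
        stats[level] = (min(vals), q1, med, q3, max(vals))
    return stats

# Loop over every categorical variable of interest.
for variable in ("gender", "region"):
    print(variable, box_stats(respondents, by=variable))
```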
posted by zpousman at 9:52 AM on May 7, 2008