Survey research: How can I create a meaningful index from a diverse set of items?
July 15, 2008 8:18 AM
SurveyResearchFilter: Creating an index when response items are on different (and somewhat nonsensical) scales.
I am working with a large survey data set. There are several questions that are of interest to me, and these questions hang together quite well in a principal-components factor analysis; therefore, I'd love to make an index of the responses to these questions to use as a single dependent variable. Unfortunately, the questions are on different scales and some of the scales don't make sense. Two of the questions are dichotomous (agree/disagree). Two of the questions are on a semi-Likert-style scale (1: Strongly agree to 4: Strongly disagree; I know, it is weird not to have a neutral middle point). One question has the scale 1: Agree, 2: Disagree, 3: Depends.
Obviously, I can't just throw all of the items into an index, because the Likert-style questions would be weighted more heavily than the dichotomous or trichotomous questions. Further, I can't really tell what I should do with the trichotomous scale to make it more sensible. Interestingly, there is a good deal of research using these exact questions and, puzzlingly, this methodological problem does not seem to have occurred to previous authors.
How can I create an index from these items while maintaining their respective proportional impact and, further, how can I make that trichotomous scale make sense?
posted by k8t at 8:29 AM on July 15, 2008
Recode the oddly ordered one from (agree, disagree, depends) to (agree, depends, disagree), run the principal components analysis (which you already have working), and use the first component score as your index.
posted by a robot made out of meat at 8:39 AM on July 15, 2008
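For what it's worth, here is a minimal sketch in Python of that recode-then-standardize-then-PCA step. The column names (q1 through q5), the toy data, and the 0/1 coding of the dichotomous items are my own assumptions for illustration, not anything from the original survey:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Hypothetical data and column names (q1..q5), purely for illustration.
# q1, q2: dichotomous items coded 0 = disagree, 1 = agree
# q3, q4: four-point items, 1 = strongly agree ... 4 = strongly disagree
# q5: 1 = agree, 2 = disagree, 3 = depends
df = pd.DataFrame({
    "q1": [1, 0, 1, 1, 0, 1],
    "q2": [1, 1, 0, 1, 0, 0],
    "q3": [1, 2, 4, 1, 3, 2],
    "q4": [2, 2, 3, 1, 4, 2],
    "q5": [1, 3, 2, 1, 2, 3],
})

# Reorder the trichotomous item so it is monotone:
# agree (1) stays 1, depends (3) becomes 2, disagree (2) becomes 3.
df["q5"] = df["q5"].map({1: 1, 3: 2, 2: 3})

# Standardize so every item has mean 0 and variance 1; this is what keeps
# the four-point items from outweighing the two- and three-point items.
z = StandardScaler().fit_transform(df)

# Keep the first principal component score as the index.
pca = PCA(n_components=1)
df["pc1"] = pca.fit_transform(z).ravel()

# The sign of a component is arbitrary; flip it if high scores should
# mean agreement rather than disagreement.
print(pca.components_)               # item weights (loadings)
print(pca.explained_variance_ratio_)
```

Whether "depends" really belongs between agree and disagree is a substantive judgment rather than a statistical one, so it is worth flagging that recode explicitly in the write-up.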
Oh, PCA is just a weighted average (it's a linear transformation) with statistical respectability. The Wikipedia article is fine for this topic, and it's correct that factor analysis could do a similar job for you.
posted by a robot made out of meat at 8:44 AM on July 15, 2008
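To see the "weighted average" point directly: the first component score is a linear combination of the standardized items, with weights given by the leading eigenvector of their correlation matrix. A small sketch, continuing the hypothetical z and df from the example above:

```python
import numpy as np

R = np.corrcoef(z, rowvar=False)       # correlation matrix of the items
eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns eigenvalues in ascending order
weights = eigvecs[:, -1]               # leading eigenvector = PC1 weights

scores = z @ weights                   # the index really is just a weighted sum

# Matches the PCA output up to an arbitrary sign flip.
print(np.allclose(np.abs(scores), np.abs(df["pc1"])))
```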
Yeah, I'm not sure why the first principal component wouldn't work for you. Most PCA routines will normalize the variances to 1 (that is, work from the correlation matrix rather than the covariance matrix); check that yours does, or standardize the items first, so that each variable is weighted equally. PCA is just a variance-maximizing average, so it's not fundamentally different from any other index you would build in a more ad hoc manner, and it will have the same measurement properties.
Factor scores, 99 times out of 100, will not be substantially different, but bear in mind that those are estimates (with error) of a score on an underlying latent variable, not a direct transformation of the data.
posted by shadow vector at 9:06 AM on July 15, 2008
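If you want to check how much that distinction matters on your own data, scikit-learn's FactorAnalysis gives one-factor scores that you can correlate with the PCA index. This is only an illustrative comparison, continuing the hypothetical z and df from the sketches above:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Factor scores are model-based estimates of the latent variable,
# as opposed to the PCA score, which is a direct transform of the data.
fa = FactorAnalysis(n_components=1, random_state=0)
fa_scores = fa.fit_transform(z).ravel()

# On well-behaved data the two orderings agree almost perfectly
# (correlation near +1 or -1).
print(np.corrcoef(fa_scores, df["pc1"])[0, 1])
```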
Also, let me add that it's unlikely that you need to worry too much about finding the absolute best index. If you're confident that all the individual items measure some common underlying variable t, then any reasonable index will be a more reliable measure of t than any one item would be; averaging across items cancels out much of the item-specific error.
posted by shadow vector at 9:14 AM on July 15, 2008
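That reliability point is easy to convince yourself of with a toy simulation. Everything below is made up (an assumed one-factor model with arbitrary signal and noise levels); the point is just that even a crude unweighted average of the items tracks the latent t better than any single item does:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5000, 5                       # hypothetical sample size and item count

t = rng.normal(size=n)               # the common underlying variable
# Each item = a share of the latent signal plus independent item-specific noise.
items = 0.6 * t[:, None] + rng.normal(scale=0.8, size=(n, k))

r_single = np.corrcoef(items[:, 0], t)[0, 1]
r_index = np.corrcoef(items.mean(axis=1), t)[0, 1]

print(f"one item vs t:   r = {r_single:.2f}")
print(f"mean index vs t: r = {r_index:.2f}")
```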
This thread is closed to new comments.