Clustering Discrete Data
September 8, 2010 5:54 PM

Help me figure out how to do cluster analysis on discrete data.

I have a bunch of multiple choice data (surveys, basically) that I would like to analyze by clustering the respondents. In the past, I've used k-means and Ward's method to do this, but my recollection is that both techniques require continuous data.

My data is more like "How do you perform X today?"
A, B, C, D, or E

So are there any techniques I can use to accomplish the same goal with discrete data like multiple choice answers? Even better, are there any good (preferably free or easy-to-try) tools that I can use to do this kind of analysis? In the past I've used XLStat, but looking over it last night I didn't see an obvious method for doing cluster analysis on discrete data.
posted by fremen to Science & Nature (10 answers total) 5 users marked this as a favorite
 
Best answer: The R package cluster has a daisy() function that can generate a dissimilarity matrix for ordinal or nominal scale data using Gower's method. This can then be fed into a clustering function like hclust().
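
For anyone who wants to see the idea without R, here is a rough pure-Python sketch, not a port of daisy() or hclust(). It assumes every question is nominal with no missing answers, in which case Gower's dissimilarity reduces to the fraction of questions on which two respondents disagree. The survey rows, the function names, and the choice of average linkage are all assumptions made for illustration.

```python
from itertools import combinations


def matching_dissimilarity(a, b):
    """Fraction of questions on which two respondents gave different answers."""
    if len(a) != len(b):
        raise ValueError("respondents must answer the same number of questions")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dissimilarity_matrix(rows):
    """Symmetric matrix of pairwise dissimilarities between respondents."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = matching_dissimilarity(rows[i], rows[j])
    return d


def agglomerate(d, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(d))]

    def linkage(c1, c2):
        # Mean dissimilarity over all cross-cluster pairs.
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the closest pair of clusters (a < b always holds here).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical survey: each row is one respondent's answers to four questions.
responses = [
    ["A", "B", "A", "C"],
    ["A", "B", "A", "D"],
    ["E", "D", "C", "C"],
    ["E", "D", "C", "B"],
    ["A", "B", "B", "C"],
]

d = dissimilarity_matrix(responses)
clusters = agglomerate(d, k=2)
print(clusters)
```

The clusters come back as lists of row indices. With real survey data you would still have to pick the number of clusters and the linkage rule yourself, and this brute-force merge loop is only sensible for small samples.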
posted by grouse at 6:11 PM on September 8, 2010 [1 favorite]


What programs do you have access to other than XLStat?
posted by emilyd22222 at 7:06 PM on September 8, 2010


Best answer: Get a book on multivariate stats. Check out correspondence analysis. Also, grouse is spot on, but I'd add multidimensional scaling to the list of tools (in R, cmdscale -- you'll need to tweak it a bit).
posted by devilsbrigade at 8:33 PM on September 8, 2010


You'll also want to start calling it categorical data if A, B, C, D, and E aren't ordinal (ranked) numbers like 1,2,3,4,5 and are more like "method A", "method B", etc.
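
To make the distinction concrete, here is a small Python sketch of how the distance between two answers to a single question changes depending on whether the options are treated as nominal or ordinal. The range-scaled rank difference is one common convention for ordinal data, not the only one, and the function names are made up for this example.

```python
LEVELS = ["A", "B", "C", "D", "E"]


def nominal_distance(x, y):
    """Categorical answers: they either match or they don't."""
    return 0.0 if x == y else 1.0


def ordinal_distance(x, y, levels=LEVELS):
    """Ranked answers: distance grows with the gap between ranks, scaled to [0, 1]."""
    return abs(levels.index(x) - levels.index(y)) / (len(levels) - 1)


for pair in [("A", "B"), ("A", "E")]:
    print(pair, nominal_distance(*pair), ordinal_distance(*pair))
```

Under the nominal reading, A is exactly as far from B as it is from E. Under the ordinal reading, A and B are close while A and E are as far apart as possible.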
posted by devilsbrigade at 8:43 PM on September 8, 2010


Response by poster: grouse - Thanks. It sounds like I should give R a look.

emilyd22222 - At the moment I just have XLStat on hand. I'm open to other tools though.

devilsbrigade - Can you recommend any books? Also, I agree with the categorical data term. I knew there was something that worked better than discrete, but it just didn't come to mind.
posted by fremen at 9:10 PM on September 8, 2010


Best answer: This is what I used, but it's heavy on the math. This looks OK and covers correspondence analysis, but it's expensive. I haven't used it or read any of it, though.
posted by devilsbrigade at 11:20 PM on September 8, 2010


Another option (depending on what kind of structure you're thinking about) is latent class analysis, which is historically more of a psychometrics thing. R has several packages which (claim to) do LCA, but I have never used them. SAS has PROC LCA and PROC LTA, which have their own book. Mplus also does LCA. You can set up a similar analysis in WinBUGS.
posted by a robot made out of meat at 8:08 AM on September 9, 2010


Response by poster: a robot made out of meat - can you give me a quick explanation about why I would use LCA instead of cluster analysis? Wikipedia simply says that they're related and solve similar problems, but I don't see a good explanation about why.
posted by fremen at 11:14 AM on September 9, 2010


Best answer: I'm not terribly familiar with many of the methods for categorical / ordinal clustering, so I can't say too much. I can tell you that LCA comes with a fully specified statistical model, whereas many clustering techniques are more heuristic. That means you get statements like "the posterior probability that person #312 belongs in cluster #3 is 0.2". That's handy for some purposes.

The flip side is that underneath there really is a model with assumptions, specification, and choices about identifiability. You may not like those assumptions and choices, but at least you can figure out what they are.
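
As a rough illustration of where a posterior probability like that comes from, here is a Python sketch that applies Bayes' rule to a latent class model whose parameters are already known. Everything numeric below is made up. A real LCA fit has to estimate the class weights and answer probabilities from the data (typically by EM), and the sketch relies on the usual LCA assumption that answers are independent of each other given the class.

```python
def class_posteriors(answers, class_weights, answer_probs):
    """Posterior probability of each latent class given one respondent's answers.

    class_weights[c]      : prior probability of class c
    answer_probs[c][q][a] : probability that a member of class c picks answer a
                            on question q
    """
    joint = []
    for weight, per_question in zip(class_weights, answer_probs):
        likelihood = weight
        # Conditional independence: multiply per-question probabilities.
        for answer, probs in zip(answers, per_question):
            likelihood *= probs[answer]
        joint.append(likelihood)
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-class model with two questions, each with options A and B.
class_weights = [0.6, 0.4]
answer_probs = [
    [{"A": 0.9, "B": 0.1}, {"A": 0.8, "B": 0.2}],  # class 0
    [{"A": 0.2, "B": 0.8}, {"A": 0.3, "B": 0.7}],  # class 1
]

posteriors = class_posteriors(["A", "B"], class_weights, answer_probs)
print(posteriors)
```

The output is a list of probabilities that sums to 1, one per class, which is the kind of soft assignment a heuristic clustering method doesn't give you.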
posted by a robot made out of meat at 12:33 PM on September 9, 2010


Response by poster: Got it. That's useful, but probably not what I need. But thanks for the info! You never know when you'll need something like this in the future.
posted by fremen at 3:43 PM on September 9, 2010

