How to cluster when dealing with more than one factor?
September 30, 2008 2:07 AM
Let's say I have 100 observations of X, each of length n, with two factors:
1 aX1 bX1
2 aX2 bX2
...
n aXn bXn
1 aX1 bX1
2 aX2 bX2
...
n aXn bXn
(...100 of these guys)
Let's say I have 200 observations of Y, each of length n, with two factors:
1 aY1 bY1
2 aY2 bY2
...
n aYn bYn
1 aY1 bY1
2 aY2 bY2
...
n aYn bYn
(...200 of these guys)
I'd like to calculate the correlation (distance) between X and Y, so that I can cluster them. The two factors may have differing levels of dependence from 1...n.
Is there a general approach for reducing X and Y in such a way that I can cluster them?
posted by Blazecock Pileon to science & nature (8 comments total)
With only 2 dimensions, the latter could be a simpler method. What does a plot look like? Are they linearly separable? If so, you could look into linear discriminant analysis (LDA).
If you're after clusters, and then want to say that X is some distance from Y, k-means (with K=2) is probably the simplest approach.
posted by handee at 3:09 AM on September 30, 2008
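To make the k-means suggestion concrete, here is a rough sketch in plain Python. It rests on assumptions that are not in the question: each observation is reduced to a single 2-D point (the mean of factor a and the mean of factor b over 1...n), the data is synthetic, and `summarize`, `kmeans` and `make_obs` are hypothetical helper names. Averaging throws away any structure along 1...n, so treat this as a starting point rather than the answer.

```python
import math
import random


def summarize(obs):
    """Reduce one observation (a list of n (a, b) pairs) to a 2-D point.

    Assumption: the per-factor mean is a good enough summary.
    """
    n = len(obs)
    return (sum(a for a, _ in obs) / n, sum(b for _, b in obs) / n)


def kmeans(points, k=2, iters=100, seed=0):
    """Plain Lloyd's algorithm on tuples of floats, Euclidean distance."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def make_obs(rng, mean_a, mean_b, n=20):
    """Synthetic stand-in for one observation of length n with two factors."""
    return [(rng.gauss(mean_a, 1.0), rng.gauss(mean_b, 1.0)) for _ in range(n)]


rng = random.Random(42)
X = [make_obs(rng, 0.0, 0.0) for _ in range(100)]  # 100 observations of X
Y = [make_obs(rng, 5.0, 5.0) for _ in range(200)]  # 200 observations of Y

points = [summarize(o) for o in X + Y]
labels, centroids = kmeans(points, k=2)

# Distance between the two cluster centres, as a crude "X is this far from Y".
separation = math.dist(centroids[0], centroids[1])
print(separation)
```

If the clusters that come out line up with the X/Y labels, the centroid distance is a usable one-number summary. If they don't, the per-factor mean is probably too lossy and a different reduction is needed.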