How to correlate 3-variable categorical data?
November 11, 2008 1:03 PM
I have a dataset of tuples, each consisting of 3 nominal variables (categories): X (10 different values), Y (500,000 different values), and Z (4 different values). What statistical method can I use to find correlations among all 3 variables? I feel like this calls for a contingency table, but I'm not sure what to do from there.
Here is an example of the kind of result I'm after. Let x_1, x_2, etc. be members of X, and y_1, y_2, etc. be members of Y, and so forth. The statistical method would find that
1) records that contain x_2 usually contain z_3 or z_4
2) records that contain any of an 80% subset of the Y values don't contain any z_1 and contain a disproportionate number of x_4
If 3-variable correlation is not possible, what method can I use to find correlation between 2 of the variables (X and Z)?
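For the two-variable case (X and Z), one common approach is a chi-squared test of independence on the contingency table, with Cramér's V as a rough measure of strength. The following is only a minimal sketch on simulated data: the level names, the sample size, and the built-in dependence of z on x_2 are all assumptions made for illustration, not the poster's data.

```r
set.seed(1)
n <- 10000

# Hypothetical nominal variables: X with 10 levels, Z with 4 levels
x <- factor(sample(paste0("x_", 1:10), n, replace = TRUE))

# Simulated dependence: records with x_2 only get z_3 or z_4,
# all other records get any of the four z values at random
z <- ifelse(x == "x_2",
            sample(c("z_3", "z_4"), n, replace = TRUE),
            sample(paste0("z_", 1:4), n, replace = TRUE))
z <- factor(z, levels = paste0("z_", 1:4))

tab  <- table(x, z)        # 10 x 4 contingency table
test <- chisq.test(tab)    # Pearson chi-squared test of independence

# Cramer's V: sqrt(chi^2 / (n * (min(rows, cols) - 1))), between 0 and 1
cramers_v <- sqrt(unname(test$statistic) / (n * (min(dim(tab)) - 1)))

# Standardized residuals show which cells drive the association;
# large positive values mean "more records than independence predicts"
print(round(test$stdres, 1))
print(cramers_v)
```

In this sketch the x_2 row of the residual table should stand out for z_3 and z_4, which is the kind of statement in example 1) above. The test only says whether X and Z are associated at all; the residuals are what point to specific pairs.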
Oops, left out a command:
da<-as.data.frame(cbind(x,z,y))
posted by a robot made out of meat at 6:39 PM on November 11, 2008
Response by poster: Thanks for the response, robot.
I was under the impression that ANOVA is for ordinal (numerical) data and doesn't work for nominal data (categories/names). Since X, Y, and Z are all names, will I still be able to use it?
posted by tasty at 7:12 PM on November 11, 2008
Oh, I was under the impression that something with 500k values was actually numbers. I'll think a minute on that.
posted by a robot made out of meat at 5:29 AM on November 12, 2008
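When all three variables are nominal, one option that is sometimes used instead of ANOVA is a log-linear (Poisson) model on the cell counts of the contingency table. The sketch below is illustrative only and rests on a big assumption: Y has been collapsed into a handful of groups (the hypothetical `yg`), since a table with 500,000 Y levels would be far too sparse to fit this way. The data are simulated with no real association.

```r
set.seed(2)
n <- 10000

# Hypothetical nominal variables; yg stands in for a coarse grouping of Y
x  <- factor(sample(paste0("x_", 1:10), n, replace = TRUE))
yg <- factor(sample(paste0("g_", 1:5),  n, replace = TRUE))
z  <- factor(sample(paste0("z_", 1:4),  n, replace = TRUE))

# One row per cell of the 10 x 5 x 4 table, with the count in column Freq
counts <- as.data.frame(table(x, yg, z))

# Model 1: complete independence of x, yg and z
indep <- glm(Freq ~ x + yg + z, family = poisson, data = counts)

# Model 2: all pairwise associations allowed
pairwise <- glm(Freq ~ (x + yg + z)^2, family = poisson, data = counts)

# Likelihood-ratio test: does allowing pairwise associations fit better?
print(anova(indep, pairwise, test = "Chisq"))
```

A small p-value in the comparison would suggest that at least one pair of variables is associated; dropping individual interaction terms from the second model is one way to narrow down which pair.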
In R:
> x<-round(runif(n=10000)*4)
> z<-round(runif(n=10000)*10)
> # noise term: rnorm() is assumed here, since runif() takes no sd argument
> y<-rnorm(n=10000)+x/2+z*rnorm(n=10000,sd=.2)
> table(da$x,da$z)
> summary(lm(da$y~as.factor(da$z)*as.factor(da$x)))
> anova(lm(da$y~as.factor(da$z)*as.factor(da$x)))
posted by a robot made out of meat at 6:38 PM on November 11, 2008