April 10, 2012 12:16 PM Subscribe

Could someone help me to analyze statistical relationship between two dimensions of a frequency matrix - preferably in Excel?

I have a matrix of data frequencies, collected from a qualitative data analysis. I need to analyze whether there is a relationship between the two dimensions of this matrix, but don't know which test to use, to determine this.

Project frequencies in each dimension, A or B, are summarized in a 3 x 3 matrix, e.g.

A-Low A-Med A-High

B-Low 20 35 13

B-Med 13 80 28

B-High 2 13 81

From these frequencies, it is clear that projects in A-Low are most likely to also be in category B-Low. Projects in category A-Med are most likely to also be in category B-Med. Projects in category C-High are most likely to also be in category B-High. I would like to measure the strength of the relationship between these variables, but I don't know what type of statistical test would be valid for these data. Any suggestions?

If can determine the strength of the relationship, how would I interpret the values of this measure of co-relationship?
posted by Susurration to Science & Nature (10 answers total) 1 user marked this as a favorite

I have a matrix of data frequencies, collected from a qualitative data analysis. I need to analyze whether there is a relationship between the two dimensions of this matrix, but don't know which test to use, to determine this.

Project frequencies in each dimension, A or B, are summarized in a 3 x 3 matrix, e.g.

A-Low A-Med A-High

B-Low 20 35 13

B-Med 13 80 28

B-High 2 13 81

From these frequencies, it is clear that projects in A-Low are most likely to also be in category B-Low. Projects in category A-Med are most likely to also be in category B-Med. Projects in category C-High are most likely to also be in category B-High. I would like to measure the strength of the relationship between these variables, but I don't know what type of statistical test would be valid for these data. Any suggestions?

If can determine the strength of the relationship, how would I interpret the values of this measure of co-relationship?

Whoops - I meant A-High is related to B-High (not C-High as typed).

posted by Susurration at 12:21 PM on April 10, 2012

posted by Susurration at 12:21 PM on April 10, 2012

Yes, there are three levels of each variable (project dimensions).

posted by Susurration at 12:21 PM on April 10, 2012

posted by Susurration at 12:21 PM on April 10, 2012

You could measure this relationship with a simple Pearson correlation coefficient. I think in Excel, the function is CORR() or something like that -- just search help for "correlation" or "pearson".

You'll probably have to reformat the data however. Make two columns (A and B), and assign the numbers 1, 2, and 3 to each level. Make 20 rows of 1,1, then 35 rows of 2,1, and so forth. Then just plug the two columns into the correlation function.

posted by mikeand1 at 1:19 PM on April 10, 2012

You'll probably have to reformat the data however. Make two columns (A and B), and assign the numbers 1, 2, and 3 to each level. Make 20 rows of 1,1, then 35 rows of 2,1, and so forth. Then just plug the two columns into the correlation function.

posted by mikeand1 at 1:19 PM on April 10, 2012

Unless I'm misunderstanding your experiment, you have three categories and two dimensions. Take a look at contingency tables. You could do a Fisher's exact test or a chi-squared test to determine relationships. There may be plug-ins for this calculation for Excel, but you could do this with R, as well.

posted by Blazecock Pileon at 1:22 PM on April 10, 2012

posted by Blazecock Pileon at 1:22 PM on April 10, 2012

Oh, and as two your last question: The Pearson correlation coefficient ranges between -1 and 1, where 1 indicates a perfect positive linear correlation, and -1 indicates a perfect negative correlation.

In other words, if all of your observations had fallen on the diagonal of that matrix (A-Low and B-Low, A-Med and B-Med, and A-High and B-High), the coefficient would be 1.

Conversely, if all the observations had fallen on A-Low and B-High, A-Med and B-Med, and A-High and B-Low, the coefficient would be -1.

Just eyeballing it, I'd guess that you'll end up with a coefficient around 0.6 or so.

posted by mikeand1 at 1:24 PM on April 10, 2012

In other words, if all of your observations had fallen on the diagonal of that matrix (A-Low and B-Low, A-Med and B-Med, and A-High and B-High), the coefficient would be 1.

Conversely, if all the observations had fallen on A-Low and B-High, A-Med and B-Med, and A-High and B-Low, the coefficient would be -1.

Just eyeballing it, I'd guess that you'll end up with a coefficient around 0.6 or so.

posted by mikeand1 at 1:24 PM on April 10, 2012

Also, here's how to calculate the correlation coefficient in Excel. The function is CORREL.

posted by mikeand1 at 1:26 PM on April 10, 2012

posted by mikeand1 at 1:26 PM on April 10, 2012

The thing to keep in mind with the chi-square test (or the Fisher exact test) is that it's testing for *any* kind of relationship at all. The Excel function for a chi-square test appears to be CHISQ.TEST. The Pearson correlation coefficient tests for something more specific: namely that increasing levels of one variable correspond to (linearly) increasing levels of the other variable. (As an aside, if your "high", "medium" and "low" categories are discretised versions of something that you think is continuous in the real world, what you probably want is a polychoric correlation coefficient not the Pearson correlation coefficient, but I doubt Excel actually does this)

Overall, if you don't want to do anything fancy, I'd suggest:

(1) Run the chi-square test to check that there is some kind of relationship in your contingency table (typically, if you get p<.05 then you can safely assume that some relationship exists)

(2) Calculate the correlation coefficient as a measure of how strong the relationship is (e.g. r=.6 is moderately strong)

(3) When interpreting the correlation, make a note of the fact that there appear to be two things going on: yes, higher levels of A tend to correspond to higher levels of B. But also note that variable A tends to have higher levels in general than variable B, since that can be important in some contexts.

posted by mixing at 3:20 PM on April 10, 2012 [1 favorite]

Overall, if you don't want to do anything fancy, I'd suggest:

(1) Run the chi-square test to check that there is some kind of relationship in your contingency table (typically, if you get p<.05 then you can safely assume that some relationship exists)

(2) Calculate the correlation coefficient as a measure of how strong the relationship is (e.g. r=.6 is moderately strong)

(3) When interpreting the correlation, make a note of the fact that there appear to be two things going on: yes, higher levels of A tend to correspond to higher levels of B. But also note that variable A tends to have higher levels in general than variable B, since that can be important in some contexts.

posted by mixing at 3:20 PM on April 10, 2012 [1 favorite]

Do you have access to the raw data for this?

Rather than chunk it together as is done here (low, med and high) I'd set up a dot plot that graphed each raw A, B pair and then draw a line through the whole mess using the slope, intercept and correl functions.

This works better the more evenly distributed your data is over the ranges.

posted by Kid Charlemagne at 4:20 PM on April 10, 2012

Rather than chunk it together as is done here (low, med and high) I'd set up a dot plot that graphed each raw A, B pair and then draw a line through the whole mess using the slope, intercept and correl functions.

This works better the more evenly distributed your data is over the ranges.

posted by Kid Charlemagne at 4:20 PM on April 10, 2012

Thanks so much for all the really helpful contributions. Yes, Kid Charlemagne, I do have the raw data - and I understand now, from mikeand1's explanation, that I really should have thought about running a correlation analysis on this data set, rather than on my frequency matrix. (Duh!).

Thanks for all the excellent suggestions for tests - this is really helpful. I may even try playing with R (thanks, Blazecock Pileon). I do remember some of this stuff, but as with everything else, if you don't use it, you lose it ... :-)

posted by Susurration at 5:01 PM on April 10, 2012

Thanks for all the excellent suggestions for tests - this is really helpful. I may even try playing with R (thanks, Blazecock Pileon). I do remember some of this stuff, but as with everything else, if you don't use it, you lose it ... :-)

posted by Susurration at 5:01 PM on April 10, 2012

This thread is closed to new comments.

posted by k8t at 12:20 PM on April 10, 2012