Z transformations of correlations
December 9, 2009 7:17 PM   Subscribe

When is it appropriate to z-transform correlations? What's the rationale?

Funny story: I did z-transform some correlations as part of a study I'm working on, but I'm kind of confused about why it was necessary. I'd hate to have somebody ask me when I'm presenting and have to say that I'm not sure.

In one analysis, I was looking at a correlation between two variables in multiple samples. I wanted to find an average for that correlation across all the samples, so I z-transformed each individual correlation, averaged those, and back-transformed the result into Pearson's r.

What I'm reading online makes me think that the z-transform allows for comparisons between correlations when the distribution is not normal. Is this correct? Does it have something to do with adjusting for different sample sizes? I swear I understood this when I started, but I've totally forgotten.

Thanks!
posted by gilsonal to Education (5 answers total)
 
I assume you want to test the hypothesis that the correlation of your two variables X and Y is nonzero? The purpose of the Fisher transformation is to give you a tractable distribution for a function of the sample correlation. Under your null hypothesis, you assume that rho (the population correlation) = 0. Then you compute z = (1/2) log((1+r)/(1-r)) from your sample correlation r. Under the null, z is approximately normally distributed with mean mu = (1/2) log((1+rho)/(1-rho)) = 0 and standard deviation 1/sqrt(N-3) (i.e., variance 1/(N-3)). With this distribution, you can test the hypothesis with a single sample correlation.

Now, to combine your multiple sample correlations, consider this: you assume that all of these correlations are drawn from the null distribution, so their Fisher transformations are also drawn from the normal distribution above (though each one has a different variance, depending on its sample size). You then have multiple draws from your null z distribution, and given the null, you can calculate a joint p-value for the hypothesis that these sample zs were drawn from it.
posted by phunniemee at 8:39 PM on December 9, 2009


Response by poster: I'm not doing any significance tests. I'm averaging several correlations from different samples to arrive at a grand mean across eight samples. It's all to allow comparisons of that average to other averages for correlations of other variables in those same samples.
posted by gilsonal at 8:48 PM on December 9, 2009


I am not familiar with using the Fisher transformation as a means to average correlations, though some cursory Googling does reveal that at least one paper has been written about this. I cannot access it, but perhaps you can.

"Averaging correlations: expected values and bias in combined Pearson rs and Fisher's z transformations. (biometric researchers K. Pearson and R.A. Fisher)"
The Journal of General Psychology, 1 July 1998, Corey, David M.; Dunlap, William P.; Burke, Michael J.
posted by phunniemee at 9:01 PM on December 9, 2009


Best answer: The following is from the abstract of "Averaging correlation coefficients: Should Fisher's z transformation be used?" (1987).

"Averaging correlations leads to underestimation because the sampling distribution of the correlation coefficient is skewed. It is also known that if correlations are transformed by Fisher's z prior to averaging, the resulting average overestimates the population value of z.
...
Regardless of sample size, backtransformed average z was always less biased; therefore, the use of the z transformation is recommended when averaging coefficients, particularly when sample size is small."

Here's the abstract link (scroll down midway). You can probably access the whole thing through your institution.

Here's another summary of why you want to z-transform prior to averaging (via google books).

But, overall, like you said, the distribution of r isn't always normal, and the z-transform normalizes the r values and expresses them in terms of a known distribution so you can meaningfully combine them.
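The transform-average-backtransform procedure is short enough to sketch in Python (standard library only; the function name is mine):

```python
import math

def average_correlations(rs):
    """Average Pearson r values via Fisher's z: transform each r with
    arctanh, take the arithmetic mean, then back-transform with tanh."""
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))
```

Because arctanh stretches out values near ±1, the back-transformed average of, say, [0.2, 0.4, 0.6] comes out slightly above the plain arithmetic mean of 0.4.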
posted by sentient at 10:14 PM on December 9, 2009


It's a usual thing to do in meta-analysis. The alternative is a hierarchical model, which some people (these days, I'd say many) prefer.

When you do your average, make sure you weight each study by the inverse of its variance estimate, i.e., by n_i - 3.
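A sketch of that inverse-variance weighting (my own function name; since the variance of each z_i is 1/(n_i - 3), the weight is n_i - 3):

```python
import math

def weighted_average_correlation(rs, ns):
    """Inverse-variance weighted average of correlations via Fisher's z.
    Var(z_i) = 1/(n_i - 3), so each study i gets weight n_i - 3."""
    weights = [n - 3 for n in ns]
    z_bar = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return math.tanh(z_bar)
```

With equal sample sizes this reduces to the unweighted average; otherwise the pooled estimate is pulled toward the correlations from the larger samples.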
posted by a robot made out of meat at 5:45 AM on December 10, 2009

