February 1, 2012 8:55 AM

Statistics filter: If I assess the same paired variables for the same population at multiple points in time, can I combine the yearly results into one overall correlation?

Let's say I measure height and shoe size for the same 100 people every year for five years. For each year, I can calculate a regression / correlation between those variables, giving me r(2012), r(2013), r(2014), and so on.

Because the sample size is 100 every year, each yearly estimate will be about equally reliable. But shouldn't I be able to calculate an overall effect with a sample size of 500, i.e. at a higher level of confidence?

Obviously, I can't just pool the 5 x 100 samples, because the 5 measurements for each person are not independent. But what else could I do to state confidently: "5 years of study have shown that height and shoe size are strongly correlated"?
posted by lord_yo to Science & Nature (3 answers total)

No. You'll want to look up "pseudoreplication", but put simply, your replicates aren't statistically independent. You don't have 500 replicates; you have [group of 5 measurements from 1 person] replicated 100 times.
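
To put a rough number on that, here is a minimal sketch using the Kish design-effect approximation. It is strictly a formula for the precision of a mean under clustering, so treat it only as an illustration of why 500 rows are not 500 replicates. The function name and the intraclass correlation (ICC) values are assumptions for the example, not something measured in this thread.

```python
def effective_sample_size(n_people, n_repeats, icc):
    """Kish-style approximation: total rows divided by the design effect.

    icc is the intraclass correlation, i.e. how similar repeated
    measurements of the same person are (0 = unrelated, 1 = identical).
    """
    design_effect = 1 + (n_repeats - 1) * icc
    return n_people * n_repeats / design_effect


# Hypothetical ICC values, chosen only to show the range of outcomes.
for icc in (0.0, 0.5, 0.9, 1.0):
    print(icc, round(effective_sample_size(100, 5, icc), 1))
```

For adults, height and shoe size barely change from year to year, so the ICC would be close to 1 and the effective sample size close to 100.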

posted by Pinback at 12:27 PM on February 1, 2012

I'll check the search terms you've given me - thank you!

@Pinback: Yes, as I said, there is no independence, so I can't treat all samples as one big list of measurements. But measuring something five times as often will still reduce statistical error, which somehow has to show up as increased confidence.
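
One simple way to let the repeated measurements help without pretending there are 500 independent points is to average each person's five values and correlate the 100 person-level means. The sketch below assumes that approach and uses simulated data; the noise levels, the 0.25 slope, and all function names are made up for illustration. Averaging shrinks per-person measurement noise, but the confidence interval (here via the Fisher z-transform) is still computed with n = 100.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% interval for r via the Fisher z-transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def person_means(records):
    """Collapse each person's repeated (height, shoe) pairs to one mean pair."""
    heights, shoes = [], []
    for pairs in records.values():
        heights.append(sum(h for h, _ in pairs) / len(pairs))
        shoes.append(sum(s for _, s in pairs) / len(pairs))
    return heights, shoes


# Simulated data (assumption): 100 people, 5 yearly measurements each,
# a stable true value per person plus independent measurement noise.
rng = random.Random(0)
records = {}
for person in range(100):
    true_height = rng.gauss(170, 10)
    true_shoe = 0.25 * true_height + rng.gauss(0, 1.5)
    records[person] = [
        (true_height + rng.gauss(0, 2), true_shoe + rng.gauss(0, 1))
        for _ in range(5)
    ]

heights, shoes = person_means(records)
r = pearson(heights, shoes)
low, high = fisher_ci(r, n=len(heights))  # n is 100 people, not 500 rows
print(f"r = {r:.3f}, 95% CI = ({low:.3f}, {high:.3f}), n = {len(heights)}")
```

The gain from the extra years shows up as cleaner per-person values, not as a larger n in the interval formula.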

posted by lord_yo at 3:16 AM on February 3, 2012

Your toy example would actually be even more complex, because it could be treated as a time-series multilevel model, bringing in all the gory details from both worlds, or as a cross-classified model in which height/shoe size are clustered both in individuals and in years (maybe Genghis Khan shows up in year 3 and chops the toes off a bunch of people).
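
As a rough sketch of what the cross-classified version might look like, one could write shoe size for person i in year t with a random intercept for each person and each year. The symbols below are assumptions introduced for illustration, and this is only one of several reasonable specifications:

```latex
\text{shoe}_{it} = \beta_0 + \beta_1\,\text{height}_{it} + u_i + v_t + \varepsilon_{it},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),\quad
v_t \sim \mathcal{N}(0, \sigma_v^2),\quad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma^2)
```

Here u_i absorbs whatever is stable about a person across years, v_t absorbs anything that hits everyone in a given year (the Genghis Khan effect), and the uncertainty on beta_1 reflects the clustering instead of treating the 500 rows as independent.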

posted by ROU_Xenophobe at 9:54 AM on February 1, 2012