Correlation measurements of unequal vectors?
May 1, 2008 8:07 PM
Can one perform Pearson's correlation or a variant with unequal numbers of rows?
Is there a variation of Pearson's correlation (or another correlation measurement) that I can use for two vectors X and Y, which have unequal numbers of rows?
Likewise, if I have two sets of data (genomic sequences) that can be "centered", is it reasonable to throw away data at the "edges" which are in X but not in Y, if the edge data do not contribute greatly to the mean and variance? I see this option in R, for example, but am curious about the real-world side effects.
No, because by definition there can be no "correlation" if data for one of the variables is missing for certain cases. There are a number of ways to estimate the values for missing data, and these are routinely employed in situations like yours.
posted by Crotalus at 10:12 PM on May 1, 2008
Oh, and to piggyback on bsdfish's response: if there is a non-random reason why certain people gave their weight but withheld their height, then the "real world side effects" would be a spurious relationship between the variables.
posted by Crotalus at 10:14 PM on May 1, 2008
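The spurious-relationship risk can be illustrated with a small, fully made-up example. The sketch below assumes a hypothetical 5x5 grid of height and weight scores that are uncorrelated by construction, plus a hypothetical rule under which a pair is only reported when the two scores are within one unit of each other. Neither the data nor the rule comes from this thread; they only show how non-random missingness can manufacture a correlation.

```python
import math

def pearson(pairs):
    """Pearson's r for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical population: every combination of scores 0..4, so r is 0.
population = [(x, y) for x in range(5) for y in range(5)]

# Hypothetical non-random missingness: a pair is only fully reported
# when the two scores differ by at most 1.
observed = [(x, y) for x, y in population if abs(x - y) <= 1]

r_full = pearson(population)
r_observed = pearson(observed)
print(f"all {len(population)} pairs: r = {r_full:.3f}")
print(f"{len(observed)} reported pairs: r = {r_observed:.3f}")
```

With all 25 pairs the correlation is zero, while the 13 pairs that survive the reporting rule give r of roughly 0.82, even though nothing about the underlying population changed.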
Response by poster: "There are a number of ways to estimate the values for missing data, and these are routinely employed in situations like yours."
What guides the decision to estimate or to truncate where null values exist in pairs? R's default is to omit ("truncate").
posted by Blazecock Pileon at 10:19 PM on May 1, 2008
Best answer: "What guides the decision to estimate or to truncate where null values exist in pairs? R's default is to omit ('truncate')."
Your judgement. Face validity. Do you have any reason to believe that there is a systematic reason why certain cases have missing data? My decision-making process in your case would probably be along these lines: if I think an external reviewer is more likely to tank my article because of too many dropped cases than because I estimated missing data, then I'll estimate. Otherwise I won't. How's that for "real world"?
posted by Crotalus at 10:31 PM on May 1, 2008 [1 favorite]
Response by poster: That's about as real world as it gets. Thanks for the advice.
posted by Blazecock Pileon at 10:42 PM on May 1, 2008
If you have just two variables, I'd be hard-pressed to find a good reason to use imputation to estimate values for one of them.
posted by a robot made out of meat at 4:22 AM on May 2, 2008
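To make the estimate-versus-drop choice concrete, here is a minimal sketch comparing complete-case deletion with naive mean imputation for two variables. The height and weight numbers are invented, `None` is an assumed marker for a missing value, and mean imputation is only one simple stand-in for the estimation methods mentioned in this thread.

```python
import math

def pearson(pairs):
    """Pearson's r for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def mean_of_present(values):
    """Mean of the values that are not missing (not None)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present)

# Hypothetical data; None marks a missing measurement.
heights = [150.0, 160.0, None, 170.0, 180.0, 190.0]
weights = [50.0, 58.0, 61.0, None, 80.0, 90.0]

# Option 1: drop every case with a missing value (complete cases only).
complete = [(h, w) for h, w in zip(heights, weights)
            if h is not None and w is not None]

# Option 2: naive mean imputation, filling each gap with the observed mean.
h_mean, w_mean = mean_of_present(heights), mean_of_present(weights)
imputed = [(h_mean if h is None else h, w_mean if w is None else w)
           for h, w in zip(heights, weights)]

r_complete = pearson(complete)
r_imputed = pearson(imputed)
print(f"complete cases (n={len(complete)}): r = {r_complete:.3f}")
print(f"mean-imputed   (n={len(imputed)}): r = {r_imputed:.3f}")
```

In this toy data the imputed cases add rows but no information about how the two variables move together, and the mean-imputed estimate comes out lower than the complete-case one. That is a property of this example, not a general guarantee.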
I.e., if you're looking for the correlation between height and weight, you'll measure a bunch of individuals' heights and weights and look at the relationship between the two. If you have a weight measurement without the corresponding height measurement, or vice versa, that's useless for determining correlations.
posted by bsdfish at 9:45 PM on May 1, 2008
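The pairing requirement can be sketched in a few lines. This is a minimal illustration, assuming the two vectors are aligned case by case and use `None` as a placeholder for a missing measurement; the function name `pearson_complete_cases` and the sample numbers are hypothetical, not something from this thread or from R.

```python
import math

def pearson_complete_cases(xs, ys):
    """Pearson's r over cases where both values are present.

    Returns (r, number_of_pairs_used).
    """
    if len(xs) != len(ys):
        # Without a case-by-case alignment there are no pairs to correlate.
        raise ValueError("vectors must be aligned; mark gaps with None")
    pairs = [(x, y) for x, y in zip(xs, ys)
             if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical measurements; None marks a value that was never recorded.
heights = [150.0, 160.0, None, 170.0, 180.0, 190.0]
weights = [50.0, 58.0, 61.0, None, 80.0, 90.0]

r, n_used = pearson_complete_cases(heights, weights)
print(f"r = {r:.3f} from {n_used} of {len(heights)} cases")
```

Only the four cases with both a height and a weight contribute; the lone weight and the lone height are dropped, which is the sense in which an unpaired measurement is useless for the correlation.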