Statistics terminology
November 30, 2011 5:51 PM

What does "posthoc" mean?

Settle a semantics argument I am having with a colleague. Suppose I run an ANOVA on three populations: A, B, and C. I consider the only informative posthoc comparisons of the initial F test to be the pairwise ones: A vs B, A vs C, and B vs C.

My colleague contends that another informative “posthoc” comparison would be to redefine the populations by comparing a single population to a “new” population formed by pooling together the measurements of the other two levels: A vs BC, B vs AC, C vs AB.

My contention is that once you go back and redefine what a population is, then that is a brand new question and no longer a “posthoc” analysis of the initial F test. The results may be interesting, but ought not to be used to interpret the F test. My colleague insists otherwise and claims that both pairwise comparisons and comparisons made by pooling populations ought to be termed "posthoc". Who is correct?
posted by cnanderson to Science & Nature (7 answers total) 1 user marked this as a favorite
 
"Post hoc" (it's two words) is a Latin phrase that just means "after this." Specifically, in statistics, it just means going back to the data to look for patterns you weren't specifically looking for when you started the experiment. So assuming that you have data on each individual subject, and your colleague just wants to go back and look at the subjects grouped in different ways, I think your colleague is correct.
posted by decathecting at 5:58 PM on November 30, 2011 [2 favorites]


This Wikipedia article explains it well.
posted by decathecting at 5:58 PM on November 30, 2011


I agree with your colleague - the comparisons using pooled populations are a subset of all possible comparisons among populations A, B, and C, implicit in running the initial ANOVA.
posted by pemberkins at 5:59 PM on November 30, 2011


My contention is that once you go back and redefine what a population is, then that is a brand new question

Yeah, that's exactly why it's post hoc — because you'd be raising a different question than anyone contemplated when you were collecting the data (assuming you're right about the underlying issue, which I can't figure out).

"Post hoc" doesn't imply "informative" or "a good idea." In fact, "post hoc" has a negative connotation. You're criticizing the analysis he wants to do as being too post hoc.
posted by John Cohen at 6:05 PM on November 30, 2011 [1 favorite]


Any linear combination of the level means whose coefficients sum to zero is a post hoc contrast, whether it's 1·A – 1·B + 0·C or 1·A – ½(B+C). It's an unplanned contrast that does not use any new data, and there's only so much you can squeeze from a fixed set of data before you begin to dangerously inflate familywise error.
posted by Nomyte at 6:09 PM on November 30, 2011
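
A minimal Python sketch of the two contrasts named in the answer above, added for illustration. The group measurements are made-up numbers, and `contrast_estimate` and `familywise_error` are hypothetical helper names, not part of any library. The familywise figure assumes independent tests, which contrasts computed on the same data are not, so treat it as a rough guide to how quickly the error rate grows rather than an exact value.

```python
from statistics import mean

# Hypothetical measurements for three groups (made-up numbers, illustration only).
groups = {
    "A": [5.0, 6.0, 7.0],
    "B": [3.0, 4.0, 5.0],
    "C": [4.0, 5.0, 6.0],
}


def contrast_estimate(weights, data):
    """Return sum(weight * group mean), after checking the weights sum to zero."""
    if abs(sum(weights.values())) > 1e-12:
        raise ValueError("contrast weights must sum to zero")
    return sum(w * mean(data[name]) for name, w in weights.items())


def familywise_error(alpha, m):
    """Chance of at least one false positive in m tests at level alpha.

    Assumes the m tests are independent (a simplification).
    """
    return 1 - (1 - alpha) ** m


# 1*A - 1*B + 0*C: the pairwise comparison A vs B.
pairwise_a_vs_b = {"A": 1.0, "B": -1.0, "C": 0.0}
# 1*A - 1/2*(B + C): A against the average of the B and C means.
pooled_a_vs_bc = {"A": 1.0, "B": -0.5, "C": -0.5}

print(contrast_estimate(pairwise_a_vs_b, groups))
print(contrast_estimate(pooled_a_vs_bc, groups))
# Three pairwise plus three pooled comparisons, each tested at alpha = 0.05.
print(familywise_error(0.05, 6))
```

Both comparisons are weighted sums of the same three group means; only the weights differ. With unequal group sizes, literally pooling the B and C measurements would weight their means by sample size rather than by ½ each.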


Best answer: Post hoc (after the fact) vs. a priori (from before) refers to when you decided you were going to run the test. If, during the design phase of your experiment, you specify that you're going to run a certain comparison with the data because it will be meaningful, it is an a priori comparison when you run it, because it came from before you had the data. If instead you go running all sorts of tests afterwards that you didn't have an a priori reason for doing, those are post hoc tests, because they came after you had all your data. It's not something inherent to the comparison itself, but rather to whether you designed it to be the comparison of interest beforehand.
posted by brainmouse at 7:17 PM on November 30, 2011 [3 favorites]


Response by poster: Ok, so all the comparisons are "post hoc", but I did specify in our disagreement that the pairwise approach comprised an appropriate “posthoc” analysis of the initial F test.

We're scientists and (hopefully) not data miners, so, from a hypothesis-testing framework, the F test investigates this null hypothesis:

H0a: A=B=C

Paired posthoc tests investigate these null hypotheses:
H0x: A=B
H0y: A=C
H0z: B=C

Pooled posthoc tests investigate these null hypotheses:
H0p: A=BC
H0q: B=AC
H0r: C=AB

It is clear to see that conducting all of the paired tests is both efficient and sufficient for selecting among the alternatives to H0a. The pooled tests are not efficient for selecting among the alternatives to H0a.

My colleague wanted to skip the paired tests and go straight to pooling our populations together to interpret an initially significant F test (despite the fact that our populations are well defined a priori). This "feels" wrong to me. It "feels" like testing an altogether new question.

But I can't rightly say that a pooling approach is not "posthoc". Brainmouse states it well that it's "not something inherent to the comparison itself, rather to whether you designed it to be the comparison of interest beforehand." Thanks all!
posted by cnanderson at 7:09 AM on December 1, 2011
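
The hypotheses in the follow-up above can be written in terms of group means, which makes the link between the paired and pooled comparisons explicit. This is a sketch under assumptions that the thread does not state: $\mu_A$, $\mu_B$, $\mu_C$ denote the population means, and "BC" is read as the simple average of the B and C means (the equal-group-size case).

```latex
\begin{align*}
H_{0a} &: \mu_A = \mu_B = \mu_C \\
H_{0x} &: \mu_A - \mu_B = 0 \\
H_{0y} &: \mu_A - \mu_C = 0 \\
H_{0p} &: \mu_A - \tfrac{1}{2}(\mu_B + \mu_C) = 0 \\[4pt]
\mu_A - \tfrac{1}{2}(\mu_B + \mu_C)
  &= \tfrac{1}{2}(\mu_A - \mu_B) + \tfrac{1}{2}(\mu_A - \mu_C)
\end{align*}
```

Under that reading, the pooled contrast is the average of two pairwise differences, so it is built from the same group means rather than from a new population. It can also equal zero while $\mu_B \neq \mu_C$ (for example $\mu_B = \mu_A - d$ and $\mu_C = \mu_A + d$), which is one way to see why a single pooled test does not by itself locate the difference that the F test detected.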

