Statistics question when value of test statistic t...
July 1, 2013 9:20 AM
I'm working on a problem for "Inferences about the difference between two population means for independent samples: sigma 1 & 2 unknown and unequal."
The final value of "test statistic t" falls in the rejection region for 95% confidence interval, but falls in the nonrejection region for 99% confidence interval.
Should I perform additional calculations before rejecting my null hypothesis?
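[For concreteness, here is a minimal sketch of the setup the question describes, using SciPy's Welch (unequal-variance) t-test on made-up sample data; none of the numbers come from the actual assignment.]

    from scipy import stats

    # Made-up samples, purely for illustration.
    group1 = [12.1, 13.4, 11.8, 14.2, 12.9, 13.7, 12.5, 13.1]
    group2 = [11.0, 12.2, 11.5, 12.8, 11.9, 12.4, 11.7, 12.0]

    # equal_var=False requests the Welch test (sigma 1 & 2 unknown and unequal).
    result = stats.ttest_ind(group1, group2, equal_var=False)

    # Check the decision at the 95% and 99% confidence levels. With these
    # particular numbers the p-value should land between 0.01 and 0.05, i.e.,
    # reject at alpha = 0.05 but not at alpha = 0.01 -- the borderline case
    # described in the question.
    for alpha in (0.05, 0.01):
        decision = "reject H0" if result.pvalue < alpha else "fail to reject H0"
        print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}, "
              f"alpha = {alpha}: {decision}")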
It depends on the terms of your assignment and what you're supposed to be learning from it -- a lesson or problem set that looks like simple calculation of t values might really be a lesson about how confidence levels in hypothesis testing work.
I would ask the person running the course.
posted by ROU_Xenophobe at 9:58 AM on July 1, 2013
err... the professor said that when alpha is not given, he wants us to use 0.05 for 95% confidence intervals and 0.01 for 99% confidence intervals. I'm done with the assignment. I was just wondering: in real-world situations, what do people do with data like that? It seems like there should be more to it.
posted by iamcharity at 10:11 AM on July 1, 2013
he wants us to use 0.05 for 95% confidence intervals and 0.01 for 99% confidence intervals
Those are... the definition of 95% and 99% confidence intervals.
There is no "real world" application of this as such -- these are thresholds for publishability. Different fields, subfields, and types of experiments have different thresholds, depending on a wide variety of things. The rough default is that 95% is generally publishable (though then you get into one-tailed vs. two-tailed arguments). If you're trying to say something opposite to what a body of literature already says, you generally need to do better than that. Some fields now also take power calculations into account. I'm not sure what you mean by "more to it", though: for a lot of research, after all the data collection and manipulation and math, at the end of the day you're running a super simple t-test and seeing if it clears that 95% bar.
posted by brainmouse at 10:27 AM on July 1, 2013
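[On the one-tailed vs. two-tailed point above, a minimal sketch of what that choice changes, reusing the made-up samples from the earlier sketch; the `alternative` keyword assumes a reasonably recent SciPy.]

    from scipy import stats

    group1 = [12.1, 13.4, 11.8, 14.2, 12.9, 13.7, 12.5, 13.1]
    group2 = [11.0, 12.2, 11.5, 12.8, 11.9, 12.4, 11.7, 12.0]

    # Same data, same Welch t statistic; the p-value depends on the alternative.
    two_sided = stats.ttest_ind(group1, group2, equal_var=False,
                                alternative="two-sided")
    one_sided = stats.ttest_ind(group1, group2, equal_var=False,
                                alternative="greater")

    # When the observed difference is in the hypothesized direction, the
    # one-tailed p-value is half the two-tailed one, so a result can clear an
    # alpha = 0.05 bar one-tailed while missing it two-tailed.
    print(f"two-tailed p = {two_sided.pvalue:.4f}")
    print(f"one-tailed p = {one_sided.pvalue:.4f}")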
OK. Well then I guess it depends on what sort of "real world situations" one is talking about -- i.e., how much does it matter whether you're right (vs. wrong) about whether the two groups have different mean values with respect to some issue? Oh, and consider the sample size too: is it large enough to even gauge such a question in the first place?
Really, it is rather difficult to relate theoretical statistical notions such as "confidence interval" to the "real world".
posted by Halo in reverse at 10:27 AM on July 1, 2013
We expect that if the experiment were repeated many times and a 95% confidence interval calculated each time, about 95 out of every 100 of those intervals would contain the true difference in the means. If the interval you calculated doesn't include zero, that's good enough evidence for most people that the means are detectably different. But a higher-percentage confidence interval is always wider, because it has to cover more possibilities. In this case, the 99% interval was wide enough to include zero (or whatever your null-hypothesis difference was). This is absolutely typical, and represents the tradeoff between how confident you want to be and how sharply you can detect a difference in the existing data.
Really, it is rather difficult to relate theoretical statistical notions such as "confidence interval" to the "real world".
A confidence interval can be used to infer the likelihood of future events (such as the frequency of obtaining out-of-tolerance parts, for example), and is easily and immediately applicable to the real world.
posted by Mapes at 10:59 AM on July 1, 2013
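[A minimal sketch of that point, computing Welch-style 95% and 99% confidence intervals for the difference in means by hand, with the Welch-Satterthwaite degrees of freedom; same made-up samples as above.]

    import numpy as np
    from scipy import stats

    group1 = np.array([12.1, 13.4, 11.8, 14.2, 12.9, 13.7, 12.5, 13.1])
    group2 = np.array([11.0, 12.2, 11.5, 12.8, 11.9, 12.4, 11.7, 12.0])

    diff = group1.mean() - group2.mean()
    v1 = group1.var(ddof=1) / len(group1)
    v2 = group2.var(ddof=1) / len(group2)
    se = np.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(group1) - 1) + v2 ** 2 / (len(group2) - 1))

    # The 99% interval is wider than the 95% one, so it can include zero even
    # when the 95% interval does not (which is what should happen here).
    for conf in (0.95, 0.99):
        t_crit = stats.t.ppf(1 - (1 - conf) / 2, df)  # two-sided critical value
        lo, hi = diff - t_crit * se, diff + t_crit * se
        print(f"{conf:.0%} CI for the difference: ({lo:.3f}, {hi:.3f}); "
              f"contains zero: {lo <= 0 <= hi}")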
No additional calculations needed! To make sure you're properly controlling your Type I error rate, you need to pick your alpha BEFORE the experiment and then reject strictly based on the observed p-value.
posted by zscore at 11:24 AM on July 1, 2013
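[A minimal sketch of that discipline: the alpha is fixed up front and a single decision is made from the observed p-value; same made-up data as above.]

    from scipy import stats

    ALPHA = 0.05  # chosen when the study is designed, before any data are seen

    def decide(p_value, alpha=ALPHA):
        """Single pre-registered decision rule: reject H0 iff p <= alpha."""
        return "reject H0" if p_value <= alpha else "fail to reject H0"

    group1 = [12.1, 13.4, 11.8, 14.2, 12.9, 13.7, 12.5, 13.1]
    group2 = [11.0, 12.2, 11.5, 12.8, 11.9, 12.4, 11.7, 12.0]
    result = stats.ttest_ind(group1, group2, equal_var=False)
    print(decide(result.pvalue))  # one decision, at the alpha chosen in advance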
zscore just said exactly what I came in to say: your hypothesis and alpha level would be defined a priori in a real-world setting.
posted by lulu68 at 11:29 AM on July 1, 2013
In most science papers they would report the 95% CI and the p-value. Whether the effect size is interesting, or whether a p-value between 0.05 and 0.01 is enough certainty for your application, is, well, application dependent. If rejecting your null hypothesis means spending millions of dollars or changing how you treat patients, that may not be enough. If it means trying some new experiments, it might be. Very few problems actually come down to picking your alpha and sticking to it for Type I error rate control (a Neyman approach); most people use p-values numerically (like Fisher) or are really interested in the false discovery rate instead of the Type I error rate. If the stakes are high, an actual decision analysis can be done, but that is complicated. Bayesians, of course, do it all differently.
Something marginally significant like that will often inspire someone to search for changes in the specification or analysis to see whether the result goes away, but really one should do that anyway.
posted by a robot made out of meat at 11:35 AM on July 1, 2013
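[The false discovery rate mentioned here is commonly controlled with something like the Benjamini-Hochberg step-up procedure; a minimal hand-rolled sketch, on made-up p-values rather than anything from the thread.]

    import numpy as np

    def benjamini_hochberg(p_values, q=0.05):
        """Return a boolean array, True where H0 is rejected at FDR level q."""
        p = np.asarray(p_values, dtype=float)
        m = len(p)
        order = np.argsort(p)
        ranked = p[order]
        # Find the largest k with p_(k) <= (k / m) * q and reject p_(1) .. p_(k).
        thresholds = (np.arange(1, m + 1) / m) * q
        below = np.nonzero(ranked <= thresholds)[0]
        reject = np.zeros(m, dtype=bool)
        if below.size:
            reject[order[: below.max() + 1]] = True
        return reject

    # Made-up p-values standing in for many tests like the one in the question.
    print(benjamini_hochberg([0.001, 0.012, 0.03, 0.04, 0.20, 0.55], q=0.05))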
The final value of "test statistic t" falls in the rejection region for 95% confidence interval, but falls in the nonrejection region for 99% confidence interval. Should I perform additional calculations before rejecting my null hypothesis?
The simple answer is no. You set a criterion before you begin (95% for the sake of example), then you exceed the criterion. By definition, your result is significant under that criterion. It makes very little sense to then choose a stricter criterion and say, "hey, under this stricter criterion, my result is not significant." No result passes every conceivable test of significance.
In the real world, people may quibble with you if your test statistic is very close to the threshold. Some will call that "marginal." Others will take a test statistic that just barely fails and call that marginal and publish it along with other results.
Ultimately, the analysis portion of your manuscript is for pointing to differences you find interesting and then saying, "This isn't very likely to arise by chance. How unlikely? Let me quantify it for you!"
posted by Nomyte at 12:30 PM on July 1, 2013
Psst. Don't assume equal variances in the real world.
Edit: whoops! Misread this one- ignore my comment.
posted by wittgenstein at 7:23 PM on July 1, 2013
Anyway, this is all to say: what is your alpha criterion (cut-off)? It seems that you have a p-value less than .05 but greater than .01.
[Edited to add: Re-reading, I wonder if you may need to use a different table or formula to determine the rejection regions, since you stated that "sigma 1 & 2 unknown and unequal". In other words, your groups may have unequal variances. Is this ringing a bell?]
posted by Halo in reverse at 9:55 AM on July 1, 2013
This thread is closed to new comments.