# The Statistics of Test Data

February 5, 2010 3:30 PM

Statistics Filter: Given a small sample of test data, how can I derive a conservative estimate of the 95% confidence interval around my sample mean as it would apply to the whole population?

I have about 10 tests that were conducted on a sample material. The test was to apply a flame promoter to a sample of a particular configuration and then measure the burn length of the sample. The test was conducted in the same way for 10 samples of a material that were in the same configuration. The burn lengths varied from test to test.

From that data, what can I now say about any other sample of the material in the same configuration and subjected to the same test?

Or rather, how do I calculate the 95% confidence interval around the mean of my test results (burn lengths) so I can call it an estimate of my population mean?

Note: I took 1 class in Engineering Statistics in College, and I got a C. :(

I found this website: http://www.stat.yale.edu/Courses/1997-98/101/confint.htm

And I think that the example under "Confidence Intervals for Unknown Mean and Unknown Standard Deviation" is applicable. But I'm a little confused. In the example, the t-distribution is used to estimate a 95% confidence interval around the mean. But the interval derived is much less than the standard deviation derived from the samples. In this case, the standard deviation derived from the samples (I think that is called the Sample Error) is 0.733 but the method produced an interval of ±0.126 (for their example). This is really confusing to me, because I remember that 2*standard deviation contains 95% of your set. This is an estimate of the entire population. Wouldn't your 95% confidence interval for your population be larger than the 2*standard deviation of your sample?

It seems like this question has been asked and answered a lot, but I'm having a hard time finding examples that are easy for me to follow and to apply to my results. And for the examples I can find and follow (like the one I linked to), I end up questioning the results and the accuracy of the method.

What I would like to (ultimately) say (I think) is this: given the same test configuration, but with different material samples pulled from the same population of material samples, another person conducting the test will find that their results are within my range. That is, if they conducted 100 tests, 95 of those tests would be within my confidence interval around my mean.

I would take the standard error of the mean and multiply it by 1.96 (the 95% critical value); your interval is then the mean of your samples plus and minus that margin. Quick formula: 1.96 * (standard deviation of samples) / sqrt(number of samples)
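
As a rough sketch, scodger's quick formula in Python (the burn lengths below are made-up illustrative numbers, not the asker's actual test data):

```python
import math

# Hypothetical burn lengths (inches) for 10 test samples.
burns = [4.2, 3.8, 5.1, 4.6, 4.9, 4.4, 3.9, 5.3, 4.7, 4.1]

n = len(burns)
mean = sum(burns) / n
# Sample standard deviation (divide by n - 1, Bessel's correction).
sd = math.sqrt(sum((x - mean) ** 2 for x in burns) / (n - 1))
sem = sd / math.sqrt(n)          # standard error of the mean
margin = 1.96 * sem              # scodger's quick formula
ci = (mean - margin, mean + margin)
print(f"mean = {mean:.3f}, 95% CI ~ ({ci[0]:.3f}, {ci[1]:.3f})")
```

Note that 1.96 is the large-sample (normal) multiplier; with only 10 samples, a t-based multiplier (about 2.26) is slightly more conservative.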

posted by scodger at 4:40 PM on February 5, 2010

What you want is called a prediction interval.

posted by a robot made out of meat at 5:07 PM on February 5, 2010

You are indeed confusing a few things. Let me see if I can back things up a bit.

You have a sample of ten burn lengths on pieces of material.

Let's say you wonder what the average burn length is for all pieces of that material. This is what making an inference about the true population mean would be. NOTE: THIS IS VERY MUCH NOT THE SAME AS WONDERING ABOUT HOW ANOTHER SAMPLE OF TEN PIECES OF MATERIAL WOULD BURN.

To figure this out, you need to know the mean burn length of your sample, the standard deviation of burn lengths in your sample, and the number of observations in your sample. You need to know these things because what you're really interested in, in the most abstract sense, is not anything about burn lengths. What you're really interested in is how *sample* means are distributed. And as it turns out, if the size of the sample is Not Too Small, then sample means are distributed:

(1) Approximately normally

(2) With mean mu (the population mean) -- sample means are, on average, on target.

(3) And a standard deviation equal to the population standard deviation divided by the square root of the sample size, or sigma/sqrt(N). This means that the bigger the sample, the closer sample means tend to be to their population mean. This value, sigma/sqrt(N), is the standard error of the mean.

This is the central limit theorem. It is proof that God loves us and wants us to be happy. Even better proof of that than beer is.

What you're doing with confidence intervals is running the logic of a sample backwards. Say you have a sample average of ten units and a standard deviation of one unit. What population did this come from? Did it come from a population with mean one? That would be very unlikely -- you would have to have had really bad luck when you drew your sample. Maybe from a population mean of four... but you'd still have to be unlucky to get that. Imagine going through each point on the number line and asking, "How likely would it be to get this sample that I got from that population?" And the confidence interval is, roughly, the set of population means that are reasonably likely to have produced that sample mean.

Now, you know the sample mean. You know the size of the sample, because you did it. But you don't know sigma, the true population standard deviation. But you *do* know the sample standard deviation, and you can use that as a guess. As a way to account for being a little bit unsure about what the true, exact population sigma is, you use the T distribution instead of the normal.

It's not weird for the confidence interval range to be smaller than your sample standard deviation. This happens because your confidence interval is about two* times your standard *error*, not your standard *deviation*. In their case, we'd expect the confidence interval range to be +/- 1.96 times the standard deviation divided by the square root of 120. This works out to mean that the final +/- should be about 18% of the sample standard deviation, and it is.

*Strictly it is t.025 times the SE, where the value of t.025 depends on your sample size. For a sample of ten, it's about 2.3.
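
Here's a minimal sketch of that t-based confidence interval in Python, using scipy for the t critical value (the burn lengths are hypothetical illustrative numbers, not real data):

```python
import math
from scipy import stats

burns = [4.2, 3.8, 5.1, 4.6, 4.9, 4.4, 3.9, 5.3, 4.7, 4.1]  # hypothetical data
n = len(burns)
mean = sum(burns) / n
# Sample standard deviation (n - 1 in the denominator).
sd = math.sqrt(sum((x - mean) ** 2 for x in burns) / (n - 1))

# t critical value for 95% confidence with n - 1 = 9 degrees of freedom;
# this is the "about 2.3" value mentioned above (2.26...).
t_crit = stats.t.ppf(0.975, df=n - 1)
margin = t_crit * sd / math.sqrt(n)
print(f"95% CI for the population mean: {mean:.2f} +/- {margin:.2f}")
```

Note how the half-width of the interval (`margin`) comes out well below the sample standard deviation, exactly as described: it scales with the standard *error*, which shrinks as the sample grows.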

posted by ROU_Xenophobe at 8:58 PM on February 5, 2010

OTOH, you also ask: *what can I now say about any other sample of the material in the same configuration and subjected to the same test?*

For this you would indeed want a prediction interval.

The way to think of the difference goes like this:

Say I have sample data about people's heights.

If I want to guess at the average height of people in general, that's a confidence interval.

If I want to guess at how high the average of the next five people who walk by will be -- those specific five people -- then that's a prediction interval. Prediction intervals are wider than confidence intervals.
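
To make the contrast concrete, here is a sketch comparing the two intervals for a hypothetical sample of ten burn lengths (made-up numbers, normality assumed). The prediction interval for a single new observation scales by sqrt(1 + 1/n), while the confidence interval scales by sqrt(1/n):

```python
import math
from scipy import stats

burns = [4.2, 3.8, 5.1, 4.6, 4.9, 4.4, 3.9, 5.3, 4.7, 4.1]  # hypothetical data
n = len(burns)
mean = sum(burns) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in burns) / (n - 1))
t_crit = stats.t.ppf(0.975, df=n - 1)

# Confidence interval: where the *population mean* plausibly lies.
ci_half = t_crit * sd * math.sqrt(1 / n)
# Prediction interval: where *one new observation* plausibly lies
# (standard formula, assuming the burn lengths are roughly normal).
pi_half = t_crit * sd * math.sqrt(1 + 1 / n)

print(f"95% confidence interval: {mean:.2f} +/- {ci_half:.2f}")
print(f"95% prediction interval: {mean:.2f} +/- {pi_half:.2f}")
```

With n = 10 the prediction interval is sqrt(11), or about 3.3 times, wider than the confidence interval.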

posted by ROU_Xenophobe at 9:03 PM on February 5, 2010

Thanks for all the help. You guys rock and have cleared a lot of things up for me. Greatly appreciated!

posted by nickerbocker at 10:48 AM on February 7, 2010

Just a note, the correct interpretation of an x% confidence interval is "if we repeat the same experiment on multiple samples, and calculate the interval the same way each time, it will contain the true population parameter x% of the time." It is commonly misinterpreted as "if we repeat the same experiment on multiple samples, x% of the sample means will lie in this interval."

Also, note that 95% confidence still means being wrong about 1 time out of 20. It's not as conclusive as it sounds. When you're making a decision with serious consequences, you need much better than 95% confidence. Some users of statistics prefer to give p-values and avoid the loaded word "confidence."
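
That interpretation can be checked with a quick simulation: draw many samples of 10 from a known population, build the t-based interval each time, and count how often it captures the true mean (the population parameters and seed below are arbitrary choices for illustration):

```python
import random
import statistics

random.seed(42)
TRUE_MEAN, TRUE_SD = 10.0, 2.0   # known population, for the simulation only
N, TRIALS = 10, 2000
T_CRIT = 2.262                   # t critical value, 95%, 9 degrees of freedom

hits = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / N ** 0.5
    # Does this experiment's interval contain the true population mean?
    if m - T_CRIT * se <= TRUE_MEAN <= m + T_CRIT * se:
        hits += 1

coverage = hits / TRIALS
print(f"coverage over {TRIALS} repeated experiments: {coverage:.3f}")
```

The observed coverage should come out close to 0.95, matching the "x% of repeated intervals contain the true parameter" reading rather than the "x% of sample means" one.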

posted by scose at 9:00 PM on February 7, 2010

This thread is closed to new comments.

The question you're asking (if you conduct 100 tests, what is the interval within which 95 of them will fall?) calls for roughly 2 * your standard deviation around the mean, as you suspected.

Again, I'm not a statistician so if someone with better advice comes along, listen to them!

posted by pombe at 4:18 PM on February 5, 2010