What should my sample size be?
November 12, 2012 8:36 AM

Say that I have a bag which contains 100 balls and every ball in the bag should be red, but it's possible that one or more of these balls is the wrong colour. How many balls should I look at to be 90% sure that all the balls are red? Or 95%? Or 99.9%? Talk me through how to work this out, please?
posted by xchmp to Science & Nature (14 answers total) 7 users marked this as a favorite
 
Can you take the balls out and leave them out, or do you only get to look at one ball at a time? In other words, do you have to replace every ball you examine?
posted by cr_joe at 8:44 AM on November 12, 2012 [1 favorite]


I usually think of "percent certainty" in terms of confidence intervals for experimental results. However, if you reach into the bag and pull out a green ball on your first effort, you can know with 100% certainty that not all of the balls are red.

Assuming that you want to know how many red balls you would have to pull out to have 90% certainty that all balls are red, pull out 90 balls. However, this is just another way of saying that you know with certainty that 90% of the balls are red.

99.5% does not make sense for such a small sample size.
posted by Tanizaki at 8:46 AM on November 12, 2012


Response by poster: Once I've taken a ball out the bag it stays out. So I guess the question is how many I need to take out.
posted by xchmp at 8:46 AM on November 12, 2012


What would it mean to be 90% sure that all the balls are red? Does that mean that there's a 9/10 chance that any given ball is red? Because those seem like slightly different things, but it's hard for me to tell what you're looking for.
posted by clockzero at 9:12 AM on November 12, 2012


You need to specify the problem more completely. Before you draw any balls out of the box, what do you believe is the probability that there are black balls? If there are black balls, how many are there? For example, you could say that the probability of no black balls is 80%, the probability of one black ball is 15%, and the probability of two black balls is 5%.

Then you use Bayes' theorem to figure out how you should update that as you draw balls and see their color. Doubtless someone reading this can help out. (I could, but I probably won't because I'm on my way to work.)
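
As a rough sketch of the Bayesian update described above, the following Python uses the example prior from this comment (80% no black balls, 15% one, 5% two). The function name `posterior_all_red` and the choice of 100 balls drawn without replacement are assumptions taken from the original question, not something this comment specifies.

```python
from math import comb

def posterior_all_red(prior, total, drawn):
    """Posterior probability of zero non-red balls after `drawn` red draws.

    `prior` maps k (the number of non-red balls) to its prior probability.
    Draws are without replacement and every drawn ball turned out red.
    """
    # Likelihood of `drawn` reds in a row when k balls are non-red:
    # all drawn balls must come from the (total - k) red ones.
    weights = {
        k: p * comb(total - k, drawn) / comb(total, drawn)
        for k, p in prior.items()
    }
    # Bayes' theorem: normalise the weighted prior.
    return weights.get(0, 0.0) / sum(weights.values())

# Hypothetical prior from the comment above.
prior = {0: 0.80, 1: 0.15, 2: 0.05}
for drawn in (0, 50, 90, 100):
    print(drawn, round(posterior_all_red(prior, 100, drawn), 4))
```

With this prior, the posterior starts at the prior value of 0.8 and only reaches 1 once every ball has been drawn; how quickly it climbs depends entirely on the prior you choose.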
posted by madcaptenor at 9:17 AM on November 12, 2012 [1 favorite]


I was reminded of Erlang Tables, which we used at the phone company to determine how many phone lines a business would need to handle X% of traffic at the busiest time.

This also works for figuring out how many restrooms you need in a stadium to achieve a certain rate of service (how many angry women are in line waiting for a stall, and for how long).

But I'm reminiscing.

I found a Sample Size Calculator, and you'd need to know your Confidence Level (95% or 90% confident) and the interval, the number you're comfortable being off by, plus or minus. So for example, if you're okay with being off by 5 balls, and you want 95% confidence, then the answer is: 80 Balls.

Fool around with the calculator, and there's some interesting stuff about statistical samples on the site.
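
The calculator's method isn't stated here, but survey calculators of this kind commonly use Cochran's formula with a finite population correction. The sketch below assumes that formula and a worst-case proportion of 0.5; `survey_sample_size` is a hypothetical name. Note that it estimates a proportion to within a margin, which is not quite the same question as "are all the balls red?".

```python
from math import ceil
from statistics import NormalDist

def survey_sample_size(population, margin, confidence, proportion=0.5):
    """Sample size for estimating a proportion within +/- margin.

    Assumes Cochran's formula with a finite population correction;
    proportion=0.5 is the most conservative (largest sample) choice.
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = z * z * proportion * (1 - proportion) / margin ** 2
    # Shrink it to account for the finite population.
    return ceil(n0 / (1 + (n0 - 1) / population))

print(survey_sample_size(100, 0.05, 0.95))  # 80, the figure quoted above
```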
posted by Ruthless Bunny at 9:21 AM on November 12, 2012


To answer this question, we need some sort of baseline idea for the chance of each individual ball being red. If the color of each ball is understood to be random, then the chance of a ball being red would seem to be rather small; if the balls were, as you implied, intended to be red all along, and would only be a different color due to some mistake, then the chance of any individual ball being red would seem to be very high; and everything in between.

So the bag contains N balls. Suppose the chance of any individual ball being red is P, independently of the other balls. The chance of all N balls being red is P multiplied by itself N times, or P^N. The chance of all N balls being red, having already removed X balls and confirmed them to be red, is P^(N - X) (assuming that N > X).

So: if the chance of any individual ball being red is 50%, then without removing any balls, the chance of all 100 being red is (1/2)^100, which is an absurdly minute fraction. If we remove 90 balls, all of which turn out to be red, then the chance of every single ball being red is (1/2)^10, the same as if the bag only contained 10 balls to begin with, and we removed none of them. If we remove 99 balls, all of them red, then the chance that every single ball is red is the same as the chance that the very last ball is red: 50%.

The number of balls you remove from the bag and confirm to be red is, in a sense, irrelevant - what matters is how many balls remain in the bag.

If you wish to be 99% certain that every ball is red, then the chance that each individual ball is red must be at least 99%. If, for instance, you want to be this certain when there are two balls remaining, then the probability of each individual ball being red must be the square root of 99%, or rather, the square root of .99, which my calculator informs me is .994987.
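
A small sketch of the arithmetic in this comment, assuming each ball is red independently with the same probability, so that only the balls still in the bag matter. The function names are made up for illustration.

```python
def chance_all_red(p_red, remaining):
    """Chance that every remaining ball is red, given independent balls."""
    return p_red ** remaining

def required_p_red(target, remaining):
    """Per-ball probability needed to reach `target` certainty overall."""
    return target ** (1 / remaining)

print(chance_all_red(0.5, 100))   # nothing removed: an absurdly small number
print(chance_all_red(0.5, 10))    # 90 red balls removed, 10 left
print(chance_all_red(0.5, 1))     # 99 red balls removed, 1 left
print(round(required_p_red(0.99, 2), 6))  # two balls left, 99% target
```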
posted by CustooFintel at 9:23 AM on November 12, 2012 [1 favorite]


The "or more" part is what's throwing me off. Though there are much more sophisticated answers here, I'm thinking about it in terms of percentages of probability of "fewer than X number of black balls."

For instance, if I remove 30 balls (and they're all red) I know that there's:
100% chance that there are fewer than 70 black balls (70/(100-30))
86% chance of being fewer than 60 (60/(100-30))
71% chance of being fewer than 50
57% chance of being fewer than 40
43% chance of being fewer than 30
29% chance of being fewer than 20
14% chance of being fewer than 10
1.4% chance of being fewer than 1
posted by cr_joe at 9:39 AM on November 12, 2012


Assuming that you want to know how many red balls you would have to pull out to have 90% certainty that all balls are red, pull out 90 balls.

Exactly. On a related note, you may be interested in the odds that there are fewer than k non-red balls in the bag. So, we can compute the probability of drawing n red balls in a row if there are k non-red balls:

p = (100-k)/100 * (99-k)/99 * ... * (101-n-k)/(101-n)

since at step j, either you draw one of the 101-j-k red balls remaining out of the 101-j balls in the bag, or you stop the process. Once p drops below 0.05, you're 95% confident that there are fewer than k non-red balls in the bag. If k=1, this calculation turns into the telescoping product

p = 99/100 * 98/99 * ... * (100-n)/(101-n) = (100-n)/100,

which means that to be 90% confident there are no (i.e. fewer than 1) non-red balls in the bag, you should pull out 90 balls.

This also means that once you've pulled out the 90 balls, it's 99% safe to say there are fewer than 2 non-red, 99.9% safe to say there are fewer than 3, and mathematically certain that there are fewer than 11.
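
The product above can be checked numerically. This sketch writes it in the equivalent closed form C(100-k, n) / C(100, n) and uses exact fractions; the function names are hypothetical, and "confident" is read as p being at or below one minus the confidence level, which matches the 90-ball figure given here.

```python
from fractions import Fraction
from math import comb

def prob_all_red_draws(n, k, total=100):
    """Chance that n draws without replacement are all red
    when k of the `total` balls are non-red."""
    return Fraction(comb(total - k, n), comb(total, n))

def draws_needed(k, confidence, total=100):
    """Smallest n for which n red draws in a row would happen with
    probability <= 1 - confidence if k (k >= 1) balls were non-red."""
    alpha = 1 - Fraction(str(confidence))
    for n in range(total + 1):
        if prob_all_red_draws(n, k, total) <= alpha:
            return n

print(draws_needed(1, 0.90))   # 90
print(draws_needed(1, 0.95))   # 95
for k in (2, 3, 11):
    # Chance of 90 straight reds if there were k non-red balls.
    print(k, float(prob_all_red_draws(90, k)))
```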
posted by zeptoweasel at 9:43 AM on November 12, 2012


Best answer: Will we be getting co-credit on your homework?

Madcaptenor is right, this is insufficient information to determine what you need. When you're all done sampling do the balls go back in the bag? Or is the test destructive in this regard? If so, do you have a target number you need to have when your tests are finished or will any size remaining sample do? Which can't be exactly true or else you'd simply test every one.

You should focus your research on manufacturing or purchase-acceptance testing; the red/black thing is a macguffin: you're looking to determine, absent any knowledge of a failure rate, what your bare minimum number of necessary tests are to determine if something is good.

I'm assuming you're still in that IT degree, in which case the principle here is likely operating characteristics.

Here's a P-H text which addressed this directly and has the appropriate formulas.
To help you understand the theory underlying the use of sampling plans, we will illustrate how an OC curve is constructed statistically.

In attribute sampling, where products are determined to be either good or bad, a binomial distribution is usually employed to build the OC curve. The binomial equation is

P(x) = [n! / (x! (n - x)!)] * p^x * (1 - p)^(n - x)   (T2-1)

where n = number of items sampled (called trials)
p = probability that an x (defect) will occur on any one trial
P(x) = probability of exactly x results in n trials

When the sample size (n) is large and the percent defective (p) is small, however, the Poisson distribution can be used as an approximation of the binomial formula. This is convenient because binomial calculations can become quite complex, and because cumulative Poisson tables are readily available. Our Poisson table appears in Appendix II of the text.

In a Poisson approximation of the binomial distribution, the mean of the binomial, which is np, is used as the mean of the Poisson, which is λ; that is, λ = np
By the way, XKCD was somewhat on-point for you the other day.
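
To make the quoted passage concrete, here is a minimal sketch of one point on an OC curve computed both ways: with the binomial formula and with the Poisson approximation using λ = np. The acceptance number `c` (the most defects a sample may contain and still pass) and the example values are assumptions for illustration, not taken from the text.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Probability of exactly x defects in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability of exactly x defects with mean lam."""
    return exp(-lam) * lam ** x / factorial(x)

def accept_probability(n, c, p, use_poisson=False):
    """Chance that a sample of n contains at most c defects."""
    if use_poisson:
        return sum(poisson_pmf(x, n * p) for x in range(c + 1))
    return sum(binomial_pmf(x, n, p) for x in range(c + 1))

# Hypothetical plan: sample 50 items, accept only if none are defective.
for p in (0.01, 0.02, 0.05):
    exact = accept_probability(50, 0, p)
    approx = accept_probability(50, 0, p, use_poisson=True)
    print(p, round(exact, 4), round(approx, 4))
```

Plotting the acceptance probability against p for a fixed n and c gives the OC curve for that sampling plan.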
posted by phearlez at 9:57 AM on November 12, 2012


Response by poster: Madcaptenor is right, this is insufficient information to determine what you need. When you're all done sampling do the balls go back in the bag? Or is the test destructive in this regards? If so, do you have a target number you need to have when your tests are finished or will any size remaining sample do? Which can't be exactly true or else you'd simply test every one.

I think my hypothetical is just getting in the way here.

So the real situation involves software rather than balls. The software will select a certain number of things according to specific criteria. I'm trying to work out how many of these I should manually check to give some level of confidence that no 'wrong' things have been selected.

The number of items in the 'bag' will vary from the hundreds to the tens of thousands, so checking every one of them isn't an option. I want to be able to say that checking x number of the y items will give us z level of confidence in the software process.
posted by xchmp at 10:09 AM on November 12, 2012


Response by poster: That Acceptance Sampling paper looks great, by the way. I'll read that now.
posted by xchmp at 10:12 AM on November 12, 2012


Best answer: I found an excellent discussion of a very similar question via the Xkcd comic linked above - I have burned 200 disks and I want to make sure that they are all in perfect working order. What is the smallest size sample I could test in order to be relatively confident that 98% of all the disks are fine/burned correctly? (it's on Quora - you don't need to log in to see it).
posted by siskin at 10:24 AM on November 12, 2012 [4 favorites]


Response by poster: Thanks everyone. It turns out that acceptance sampling is exactly what I was looking for. I don't know the defect rate, but I can use acceptance sampling to work out the probability that the number of defects is below various thresholds and make decisions about sample size on that basis.
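
One way this kind of decision can be sketched, assuming a zero-defect plan (every checked item must pass) and a batch much larger than the sample so that the binomial approximation applies: choose the largest defect rate you are willing to miss and solve (1 - rate)^n <= 1 - confidence for n. The function name is hypothetical and this is not necessarily the plan used in the linked material.

```python
from math import ceil, log

def zero_defect_sample_size(max_defect_rate, confidence):
    """Items to check, all passing, to be `confidence` sure the true
    defect rate is below `max_defect_rate` (large-batch approximation)."""
    return ceil(log(1 - confidence) / log(1 - max_defect_rate))

print(zero_defect_sample_size(0.01, 0.95))  # 299
print(zero_defect_sample_size(0.02, 0.95))  # 149
```

For small batches this overestimates the sample needed, since drawing without replacement is more informative than the binomial model assumes.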

Will we be getting co-credit on your homework?

This was actual work rather than homework. I'll consider linking to your answer in my test plan, though.
posted by xchmp at 9:22 AM on November 13, 2012

