Comments on: Binomial distribution comparison
http://ask.metafilter.com/38909/Binomial-distribution-comparison/
Comments on Ask MetaFilter post Binomial distribution comparisonThu, 25 May 2006 16:48:26 -0800Thu, 25 May 2006 16:48:26 -0800en-ushttp://blogs.law.harvard.edu/tech/rss60Question: Binomial distribution comparison
http://ask.metafilter.com/38909/Binomial-distribution-comparison
Probability of one binomially-distributed variable being greater than another one - is there a closed form for this? <br /><br /> Specifically, the problem is this. (No, this isn't for homework)<br>
<br>
I've got two random variables, X and Y, both drawn from binomial distributions. p is the same for each, but n differs; essentially, both X and Y are the number of heads that come up in a single instance of n coin flips if the coins are weighted such that there's a p probability of heads. X and Y are independent, and n can differ from one to the other (so call the two Ns n_x and n_y).<br>
<br>
Is there a way to get a closed form in terms of p, n_x, and n_y for Pr[X≥Y]? I've been banging my head against this for hours and I thought I'd gotten somewhere but belatedly realized I'd made a mistake. I'm hoping some MeFi combanitorics whiz can help me out here.post:ask.metafilter.com,2006:site.38909Thu, 25 May 2006 15:54:16 -0800wanderingmindbinomialprobabilitycombanitoricsmathBy: Galvatron
http://ask.metafilter.com/38909/Binomial-distribution-comparison#601195
I don't have time to look hard at this right now, but it seems like the most straightforward approach is to condition on particular values of Y and use the law of total probability:<br>
<br>
\sum_{y=0}^{n_y} P(X \ge Y | Y = y) P(Y = y)<br>
<br>
Both of the probability terms in this expression can be computed in closed form. It looks like you'd get a double summation for a result, not sure if that can be simplified...<br>
<br>
I'll be back later if no one has addressed this in more detail.comment:ask.metafilter.com,2006:site.38909-601195Thu, 25 May 2006 16:48:26 -0800GalvatronBy: Civil_Disobedient
http://ask.metafilter.com/38909/Binomial-distribution-comparison#601200
<i>I don't have time to look hard at this right now...</i><br>
<br>
Lemme guess, you found a truly marvelous answer for this, but don't have enough margin space to write it all down?comment:ask.metafilter.com,2006:site.38909-601200Thu, 25 May 2006 16:56:15 -0800Civil_DisobedientBy: Wolfdog
http://ask.metafilter.com/38909/Binomial-distribution-comparison#601225
I will welcome information to the contrary but I don't think you're going to get an answer as simple as you're hoping. The probability is a polynomial in p, of degree nx+ny. I mean, you've no doubt discovered that for yourself already, and you know it's not hard to express as a double sum, but all those terms in the polynomial really matter - in fact, most of the action is in the middle coefficients. So the bigger nx and ny are, the more complicated the expression will be.<br>
<br>
If nx and ny are large you may be better off replacing it with a normal approximation. Then look at the difference (Y-X) - that will be normal with -- well, you probably know what to do, but I'll elaborate if you want.comment:ask.metafilter.com,2006:site.38909-601225Thu, 25 May 2006 17:14:02 -0800WolfdogBy: Mr. Six
http://ask.metafilter.com/38909/Binomial-distribution-comparison#601257
When you're comparing two distributions, you might think about looking at the probability distribution of their difference P<sub>X-Y</sub>.<br>
<br>
The difference of two random variables is itself a random variable and has its own probability distribution.<br>
<br>
If you have a large Nx and Ny, you can try to use a normal approximation to the binomial distribution. <a href="http://mathworld.wolfram.com/NormalDifferenceDistribution.html">Mathworld</a> has listed probability distribution of the difference of two normal random variables.<br>
<br>
Using the binomial means (μ = Np) and variances (σ = Np(1-p)) of the distributions of X and Y, you can estimate P<sub>X-Y</sub>.comment:ask.metafilter.com,2006:site.38909-601257Thu, 25 May 2006 17:30:51 -0800Mr. SixBy: hooves
http://ask.metafilter.com/38909/Binomial-distribution-comparison#601551
I'm celebrating my last chem lab of the year, so I don't (can't?) want to work out the details right now, but here is something to think about. BTW, since you are hoping for a "combinatorics whiz," I'll assume that a normal approximation of the binomial is not acceptable. You probably want exact values; the fine line between discrete and continuous gets blurred in this post, however.<br>
<br>
So, we (wikipedia does too) know the cumulative distribution function (cdf) for a binomial random variable. <br>
<br>
For some constant k, we want the probability of Binomial Dist 1 < k and if binomial dist 2> k. This translates to:<br>
<br>
Z = cdf(B_1 @ k) * (1 - cdf2@k)<br>
<br>
We can do this without any covariance factors messing up the equation because X and Y are independent. <br>
<br>
Then, we want to know the cdf of that value over all possible values of k, 0 to infinity. So, we would integrate the above pdf (Z) from 0 to infinity. Unfortunately, I'm not sure that k is rendered insignificant in this process. If k does disappear, great. If it doesn't, and you are still dying to know the answer to your question, I'd be glad to run this one by my PSTAT prof, simply remind me to ask after the 3 day weekend.</>comment:ask.metafilter.com,2006:site.38909-601551Fri, 26 May 2006 00:35:32 -0800hoovesBy: Galvatron
http://ask.metafilter.com/38909/Binomial-distribution-comparison#601555
So, how are you defining "closed form?"comment:ask.metafilter.com,2006:site.38909-601555Fri, 26 May 2006 00:44:55 -0800GalvatronBy: louigi
http://ask.metafilter.com/38909/Binomial-distribution-comparison#601808
It's not easy to get a nice expression, much easier to approximate. <br>
<br>
Let's suppose X=Bin(n,p) and Y=Bin(cn,p) for an integer c >= 1, and study P(X >= Y). We can express Y as the sum of c independent Bin(n,p) random variables. By symmetry, X is bigger than each of those with probability exactly 1/2, and if X is smaller than any of them then it is smaller than Y. <br>
<br>
Thus P(X >= Y) < 2^{-c}.br>
<br>
If p=a/n for some constant a, then you can get a lower bound of order e^{-ac} by bounding P(X >= Y) by the probability that <br>
X>= 1 and Y=0, which is easy to explicitly express, and then using the approximation (1-t/n)^{un} ~ e^{-tu}.<br>
<br>
If p is larger, say p >= a*log n / n for some constant a, then you're best off using <a href="http://en.wikipedia.org/wiki/Chernoff%27s_inequality">Chernoff's Inequality</a>.</>comment:ask.metafilter.com,2006:site.38909-601808Fri, 26 May 2006 07:32:12 -0800louigiBy: Wolfdog
http://ask.metafilter.com/38909/Binomial-distribution-comparison#601843
<i>We can express Y as the sum of c independent Bin(n,p) random variables. By symmetry, X is bigger than each of those with probability exactly 1/2</i><br>
<br>
Your probability exactly 1/2 assertion is not correct. If X1 and X2 are identically distributed, independent random variables then P(X1>=X2) is not 1/2. P(X1 > X2) isn't 1/2 either. That's the thing with discreteness - those pesky little points having mass.comment:ask.metafilter.com,2006:site.38909-601843Fri, 26 May 2006 08:09:20 -0800WolfdogBy: Wolfdog
http://ask.metafilter.com/38909/Binomial-distribution-comparison#601862
I mean to say, "If X1 and X2 are identically distributed, independent <em>binomial</em> random variables" there, of course.comment:ask.metafilter.com,2006:site.38909-601862Fri, 26 May 2006 08:24:38 -0800WolfdogBy: Wolfdog
http://ask.metafilter.com/38909/Binomial-distribution-comparison#601883
If you do the normal approximation, you get a closed-form answer in terms of Erf:<br>
<tt><br>
P(X>=Y) = (1/2)(1 + Erf((nx*p - ny*p)/(Sqrt(2*(nx + ny)*(1 - p)*p]))<br>
</tt><br>
Of course, the normal approximation believes that the answer is exactly 1/2, independent of p, if nx=ny. That is terribly wrong for small nx & ny's but nx and ny don't have to get too big before the approximation is excellent. As usual, "big" is bigger is p is very close to 0 or 1, and not too big at all if p is closer to 1/2.comment:ask.metafilter.com,2006:site.38909-601883Fri, 26 May 2006 08:38:03 -0800WolfdogBy: louigi
http://ask.metafilter.com/38909/Binomial-distribution-comparison#602653
Sorry: I should have said P(X > Y) = P (Y > X).<br>
<br>
You can also get fairly easy upper bounds on P(X=Y) which will be pretty small (i.e. not that close to 1) unless p is much smaller than 1/n.comment:ask.metafilter.com,2006:site.38909-602653Sat, 27 May 2006 06:47:00 -0800louigi