# Difference between two correlated normally distributed random variables

September 14, 2008 9:35 PM Subscribe

A statistics question about dependent normal random variables.

My stats is really rusty so forgive me if I say something that doesn't make sense.

I have two random variables, X1 and X2. Both are normally distributed; let's say mean1=7000, mean2=7500, and the standard deviations are both 15. I am interested in knowing the probability that X1 < X2.

The trick is that these variables are not independent. They should be strongly correlated. That is, if X1 is really big, then X2 should also be pretty big. I don't know the exact correlation; maybe it's between 0.7 and 1.0. I plan to make a bunch of graphs leaving that as a variable. So let's assume for now that it's 0.8.

If they were independant I would know what to do, but I can't figure out to handle the correlation. I've been mucking around in Matlab but all I can figure out how to do is make a multivariate normal distribution with all the parameters above; I don't know how to go from there to figuring out if X1 < X2. Help?

My stats is really rusty so forgive me if I say something that doesn't make sense.

I have two random variables, X1 and X2. Both are normally distributed; let's say mean1=7000, mean2=7500, and the standard deviations are both 15. I am interested in knowing the probability that X1 < X2.

The trick is that these variables are not independent. They should be strongly correlated. That is, if X1 is really big, then X2 should also be pretty big. I don't know the exact correlation; maybe it's between 0.7 and 1.0. I plan to make a bunch of graphs leaving that as a variable. So let's assume for now that it's 0.8.

If they were independant I would know what to do, but I can't figure out to handle the correlation. I've been mucking around in Matlab but all I can figure out how to do is make a multivariate normal distribution with all the parameters above; I don't know how to go from there to figuring out if X1 < X2. Help?

Er ... I meant to post, "The probability that X1 < X2 is the same as the probability that (-1*X1 + X2) > 0.".

posted by hattifattener at 10:06 PM on September 14, 2008

posted by hattifattener at 10:06 PM on September 14, 2008

Easier to prove X2>=X1. Include the covariance between the two terms and that is a start.

Ho: X2-X1 = 0

Ha: X2-X1 not = 0

Wald test: (X2-X1)/sqrt((varX1+varX2-2*CovX1X2))

p-value from Z distribution

Like everything, it has its limitations (and so do I) so proceed with caution. The Wald test is googleable.

posted by Eringatang at 10:21 PM on September 14, 2008

Ho: X2-X1 = 0

Ha: X2-X1 not = 0

Wald test: (X2-X1)/sqrt((varX1+varX2-2*CovX1X2))

p-value from Z distribution

Like everything, it has its limitations (and so do I) so proceed with caution. The Wald test is googleable.

posted by Eringatang at 10:21 PM on September 14, 2008

Do you believe that the two variables are multivariate normal? (Your question seems to imply so.) If so, then any linear transformation of a normal variable is also normal. In your case, X = (x1, x2) is a two dimensional random vector. If it's Normal(mu, Sigma), and B is a kx2 matrix, then BX is normal with mean B*mu and variance B^T Sigma B.

Pick B = [ 1 -1 ], and you get the distribution of X1 - X2, which is what you want. Then the new variance is something like v_11 - 2*v_12 v_22, (where the v's refer to entries of the covariance matrix), assuming that I did the algebra right at this time of night. You can get the probability you want by using the Matlab function for the normal cdf (which I don't recall offhand).

posted by sesquipedalian at 11:43 PM on September 14, 2008

Pick B = [ 1 -1 ], and you get the distribution of X1 - X2, which is what you want. Then the new variance is something like v_11 - 2*v_12 v_22, (where the v's refer to entries of the covariance matrix), assuming that I did the algebra right at this time of night. You can get the probability you want by using the Matlab function for the normal cdf (which I don't recall offhand).

posted by sesquipedalian at 11:43 PM on September 14, 2008

Response by poster: I don't know if they're multivariate or not and I don't have the mental strength to wade through the equations on Wikipedia to figure out what that means. It was just the closest thing I could find in Matlab that seemed like it might work. As you might guess I am a little adrift when it comes to stats.

Let me explain more about the variables. It's a communications problem. X1 and X2 represent the arrival times of two events. Nominally X1 is supposed to arrive 500 units before X2. If X1 happens to be late due to random noise or whatever, X2 should also be late, because the two processes that generate the events are subject to (roughly) the same noise.

sesquipedalian, your answer feels right-ish to me, though of course I would like to check the algebra. That is, if they're multivariate. (are they?)

posted by PercussivePaul at 12:09 AM on September 15, 2008

Let me explain more about the variables. It's a communications problem. X1 and X2 represent the arrival times of two events. Nominally X1 is supposed to arrive 500 units before X2. If X1 happens to be late due to random noise or whatever, X2 should also be late, because the two processes that generate the events are subject to (roughly) the same noise.

sesquipedalian, your answer feels right-ish to me, though of course I would like to check the algebra. That is, if they're multivariate. (are they?)

posted by PercussivePaul at 12:09 AM on September 15, 2008

Response by poster: By the way, I should point out that this is a theoretical exercise, not one based on measurements. The mean, stdev, and distributions are my best guess as to what might occur in a real system and I want to know what the implications are.

posted by PercussivePaul at 12:13 AM on September 15, 2008

posted by PercussivePaul at 12:13 AM on September 15, 2008

This question is easy to approach directly. The joint distribution (X1, X2) is a two-variable normal distribution. Draw a large sample of pairs from (X1, X2), and fit a normal distribution to the sample. (MATLAB must have a routine for this.) Then integrate the p.d.f. of that distribution over the set {X1 < X2}. (Again, MATLAB must have a routine to do this numerical integration. I don't use MATLAB, so I can't point you at the relevant routines.) This should give the same answer as sesquipedalian's so it's a good cross-check.

posted by Coventry at 6:33 AM on September 15, 2008

posted by Coventry at 6:33 AM on September 15, 2008

You write it like this: X2=rho*X1+N(m,s) where N(m,s) is a normal RV uncorrelated to X1 [the sum of two normals is a normal]. That is, as X1 increases the central tendency of X2 increases. Then

P(X2>X1) = P( (rho-1)*X1 + N(m,s) >0)

And (rho-1)*N(7000,15) is distributed as N((rho-1)*7000,(rho-1)*15)

To get the X2 variance to equal 15, you have to require s^2 = 225*(1-rho^2)

To get the second mean to equal 7500 you have to require m= 7000 (1-rho) +500

What's rho? If you work through the covariance rho^2=corr(X1,X2)

And (rho-1)*X1 + N(m,s) is distributed as N( 500, 15*sqrt(2-2*rho) )

Now (PX2>X1) = P(N(500, 15*sqrt(2-2*rho) )>0)

= P(N(0,15*sqrt(2-2*rho) )>-500)

=P(N(0,1)>-500/15/sqrt(2-2*rho))

Which you look up in a Z-table. For example, with rho = sqrt(0.8) the probability is so close to one as to be negligibly different (about 70 standard deviations).

A difference in means of 33 standard deviations (500/15) is a ton. When the first one being big tends to make the bigger one even bigger the probability of the small one being larger is really really small.

posted by a robot made out of meat at 1:15 PM on September 15, 2008

P(X2>X1) = P( (rho-1)*X1 + N(m,s) >0)

And (rho-1)*N(7000,15) is distributed as N((rho-1)*7000,(rho-1)*15)

To get the X2 variance to equal 15, you have to require s^2 = 225*(1-rho^2)

To get the second mean to equal 7500 you have to require m= 7000 (1-rho) +500

What's rho? If you work through the covariance rho^2=corr(X1,X2)

And (rho-1)*X1 + N(m,s) is distributed as N( 500, 15*sqrt(2-2*rho) )

Now (PX2>X1) = P(N(500, 15*sqrt(2-2*rho) )>0)

= P(N(0,15*sqrt(2-2*rho) )>-500)

=P(N(0,1)>-500/15/sqrt(2-2*rho))

Which you look up in a Z-table. For example, with rho = sqrt(0.8) the probability is so close to one as to be negligibly different (about 70 standard deviations).

A difference in means of 33 standard deviations (500/15) is a ton. When the first one being big tends to make the bigger one even bigger the probability of the small one being larger is really really small.

posted by a robot made out of meat at 1:15 PM on September 15, 2008

Oh, when you're playing with other parameters MATLAB probably has the normal (Gaussian) cumulative distribution (aka Phi) built in. Your answer will be Phi(delta_mean/sd/sqrt(2-2sqrt(corr)))

posted by a robot made out of meat at 1:21 PM on September 15, 2008

posted by a robot made out of meat at 1:21 PM on September 15, 2008

This thread is closed to new comments.

posted by hattifattener at 10:06 PM on September 14, 2008