# Conditional Probability

December 16, 2009 5:25 PM Subscribe

Stats-filter: Given a binary matrix, if I know the total number of ones in a given row and a given column, can I calculate the probability that a given position contains a one?

I have a binary matrix, like so, where every value is either 1 or 0. So, if the first column contains 2 ones, and the first row contains 1 one, what's the probability that position A contains a one?

Example:

(not homework-filter)

I have a binary matrix, like so, where every value is either 1 or 0. So, if the first column contains 2 ones, and the first row contains 1 one, what's the probability that position A contains a one?

Example:

2

________

1 | A| | |

|__|__|__|

| | | |

|__|__|__|

| | | |

|__|__|__|

| | | |

|__|__|__|

(not homework-filter)

Best answer: The simplest additional conditions I can think of are that you have only those two sums for information, and that every binary matrix is equally probable under the prior.

In that case, let the matrix be nxm, the sum across a row be x and the sum down a column be y. If the given position contains a one, there are (m-1) choose (x-1) configurations of the other elements in the same row, and (n-1) choose (y-1) configurations of the other elements in the same column. If the given position contains a zero, there are (m-1) choose x and (n-1) choose y configurations, mutatis mutandis.

So P(1)/P(0) = [(m-1)!(n-1)!/((x-1)!(y-1)!(m-x)!(n-y)!)]/[(m-1)!(n-1)!/(x!y!(m-x-1)!(n-y-1)!)] = xy/(m-x)(n-y). Therefore P(1) = xy/((m-x)(n-y)+xy).

In the case you give, x=1,y=2,n=4,m=3, so P(1) = 2/((3-1)(4-2)+2)=1/3.

posted by topynate at 6:00 PM on December 16, 2009 [1 favorite]

In that case, let the matrix be nxm, the sum across a row be x and the sum down a column be y. If the given position contains a one, there are (m-1) choose (x-1) configurations of the other elements in the same row, and (n-1) choose (y-1) configurations of the other elements in the same column. If the given position contains a zero, there are (m-1) choose x and (n-1) choose y configurations, mutatis mutandis.

So P(1)/P(0) = [(m-1)!(n-1)!/((x-1)!(y-1)!(m-x)!(n-y)!)]/[(m-1)!(n-1)!/(x!y!(m-x-1)!(n-y-1)!)] = xy/(m-x)(n-y). Therefore P(1) = xy/((m-x)(n-y)+xy).

In the case you give, x=1,y=2,n=4,m=3, so P(1) = 2/((3-1)(4-2)+2)=1/3.

posted by topynate at 6:00 PM on December 16, 2009 [1 favorite]

mhum and topynate are right, if you want a more simplistic explanation:

Create a possibility space of all the potential ways your condition: "the first column contains 2 ones, and the first row contains 1 one" could be satisfied.

Divide the number of ways that end up with position A being 1 by the total number of states in your possibility space. That's your probability.

posted by pseudonick at 6:03 PM on December 16, 2009

Create a possibility space of all the potential ways your condition: "the first column contains 2 ones, and the first row contains 1 one" could be satisfied.

Divide the number of ways that end up with position A being 1 by the total number of states in your possibility space. That's your probability.

posted by pseudonick at 6:03 PM on December 16, 2009

Everyone else is being an asshole. I'll make it simple for you.

Take the average of the probabilities for the row and for the column.

i.e. the probability of a 1 for the column is 2 / 4 = 50%

the probability of a 1 for the row is 1 / 3 = 33.3%

The average is ( 1 / 2 ) + ( 1 / 3 ) = ( 5 / 12 ) = 41.667%

You're welcome. Now what are you really trying to learn?

posted by MaxK at 6:11 PM on December 16, 2009

Take the average of the probabilities for the row and for the column.

i.e. the probability of a 1 for the column is 2 / 4 = 50%

the probability of a 1 for the row is 1 / 3 = 33.3%

The average is ( 1 / 2 ) + ( 1 / 3 ) = ( 5 / 12 ) = 41.667%

You're welcome. Now what are you really trying to learn?

posted by MaxK at 6:11 PM on December 16, 2009

MaxK, I don't think the other commenters are being assholes, and I also believe that your analysis is incorrect. Consider what would happen to your example if column 1 had zero probability of having a 1. Then clearly the probability for every cell in that column would be zero, not zero averaged with the corresponding row probability.

posted by a snickering nuthatch at 6:14 PM on December 16, 2009

posted by a snickering nuthatch at 6:14 PM on December 16, 2009

*Take the average of the probabilities for the row and for the column.*

Will this work if you have zero 1s in the the column and one 1 in the row? Or say, one 1 in the column and all 1s in the row?

posted by mhum at 6:15 PM on December 16, 2009

Do you have information on just one row-column pair, or do you have information on multiple rows/columns? If you have information on multiple rows/columns, then whether a particular cell has a 1 or 0 in it is no longer independent of its neighbors and you've got a more complicated calculation on your hands.

posted by a snickering nuthatch at 6:22 PM on December 16, 2009

posted by a snickering nuthatch at 6:22 PM on December 16, 2009

I would try min(prob_row, prob_col), but I don't have any probability theory to justify that. It does solve the problem Jpfed and mhum mentioned.

posted by demiurge at 6:49 PM on December 16, 2009

posted by demiurge at 6:49 PM on December 16, 2009

If you only have one row-column pair of numbers, topynate has the answer here (and you'll note that his solution handles zeroes correctly).

posted by a snickering nuthatch at 6:54 PM on December 16, 2009

posted by a snickering nuthatch at 6:54 PM on December 16, 2009

*I would try min(prob_row, prob_col)*

This does not solve the problem of having a row or column filled with all ones.

posted by mhum at 7:29 PM on December 16, 2009

mhum and topynats are totally correct as far as you requiring more assumptions on the distribution and is otherwise ill defined. However, if you're willing to assume that every binary matrix is equally probably under the prior, there exists a reasonably simple calculation you can do.

Lets say there the matrix is n x m in size and the row/column you are considering have a and b 1s.

If you only knew that a out of n elements of the row were 1, the probability of the element you're considering would be a/n. I'm pretty sure the quantity you're looking for is (a/n * b/m) / (a/n *b/m + (n-a)/n * (m-b)/m).

Think of generating your matrix in the following way: You first take a 1s and randomly distribute them in the n spots in the row. Then, you take b 1st and randomly distribute them in the m spots in the column. Now, in order to be possible, there is consistency required: the cell at the intersection of the row and column must either have a 1 or 0, so if the row process generates a 1 but the column process generates a 0 there, you must start over. Let x be the the value generated by the row and y the value generated by the column. Then,

P(1) = P(x=1, y=1 | x = y) = P(x = 1, y = 1) / P(x = y) = (a/n * b/m) / (a/n *b/m + (n-a)/n * (m-b)/m).

posted by bsdfish at 7:46 PM on December 16, 2009 [1 favorite]

Lets say there the matrix is n x m in size and the row/column you are considering have a and b 1s.

If you only knew that a out of n elements of the row were 1, the probability of the element you're considering would be a/n. I'm pretty sure the quantity you're looking for is (a/n * b/m) / (a/n *b/m + (n-a)/n * (m-b)/m).

Think of generating your matrix in the following way: You first take a 1s and randomly distribute them in the n spots in the row. Then, you take b 1st and randomly distribute them in the m spots in the column. Now, in order to be possible, there is consistency required: the cell at the intersection of the row and column must either have a 1 or 0, so if the row process generates a 1 but the column process generates a 0 there, you must start over. Let x be the the value generated by the row and y the value generated by the column. Then,

P(1) = P(x=1, y=1 | x = y) = P(x = 1, y = 1) / P(x = y) = (a/n * b/m) / (a/n *b/m + (n-a)/n * (m-b)/m).

posted by bsdfish at 7:46 PM on December 16, 2009 [1 favorite]

Best answer: I'm going to turn this into a combinatorics question. This is what it secretly is if you make the assumption that every binary matrix is equally likely.

The problem will be to find the probability that the upper left corner of an m-by-n binary matrix contains a 1, given that the top row contains r 1s and the left column contains s 1s.

This is P/(P+Q), where:

P is the number of m-by-n binary matrices with r 1s in the top row, s 1s in the left column, and a 1 in the upper left corner, and

Q is the number of m-by-n binary matrices with r 1s in the top row, s 1s in the left column. and a 0 in the upper left corner.

(This might seem like a silly way to write it! But you'll see why.)

First, compute P. If there is a 1 in the upper left corner, then there must be:

- (r-1) 1s among the (m-1) other entries in the top row;

- (s-1) 1s among the (n-1) other entries in the left column;

and we can choose the remaining (m-1)(n-1) entries however we want. Thus we have

P = C(m-1, r-1) * C(n-1, s-1) * 2^((m-1)(n-1))

where C(k,l) = k!/(l! * (k-l)!) is a binomial coefficient.

You can compute Q in the same way. If there is a 0 in the upper left corner, then there must be:

- r 1s among the (m-1) other entries in the top row;

- s 1s among the (n-1) other entries in the left column;

and we can choose the remaining (m-1)(n-1) entries however we want.

So Q = C(m-1, r) * C(n-1, s) * 2^((m-1)(n-1))

In particular, when we compute P/(P+Q), lots of things cancel, and we get

P/(P+Q) = rs/(2rs + mn - ms - rn).

In the particular case you asked about, m = 3, n = 4, r = 1, s = 2, and the probability is 1/3, which agrees with the answer other people have given.

Also, this satisfies some sanity checks: if the first row consists entirely of 1s (r = m) or the first column consists entirely of 1s (s = n) then the probability that the upper left corner has a 1 is in fact 1, as it should be! And if r = 0 or s = 0 (no ones in the first row/column) the probability is zero.

posted by madcaptenor at 8:28 PM on December 16, 2009 [1 favorite]

The problem will be to find the probability that the upper left corner of an m-by-n binary matrix contains a 1, given that the top row contains r 1s and the left column contains s 1s.

This is P/(P+Q), where:

P is the number of m-by-n binary matrices with r 1s in the top row, s 1s in the left column, and a 1 in the upper left corner, and

Q is the number of m-by-n binary matrices with r 1s in the top row, s 1s in the left column. and a 0 in the upper left corner.

(This might seem like a silly way to write it! But you'll see why.)

First, compute P. If there is a 1 in the upper left corner, then there must be:

- (r-1) 1s among the (m-1) other entries in the top row;

- (s-1) 1s among the (n-1) other entries in the left column;

and we can choose the remaining (m-1)(n-1) entries however we want. Thus we have

P = C(m-1, r-1) * C(n-1, s-1) * 2^((m-1)(n-1))

where C(k,l) = k!/(l! * (k-l)!) is a binomial coefficient.

You can compute Q in the same way. If there is a 0 in the upper left corner, then there must be:

- r 1s among the (m-1) other entries in the top row;

- s 1s among the (n-1) other entries in the left column;

and we can choose the remaining (m-1)(n-1) entries however we want.

So Q = C(m-1, r) * C(n-1, s) * 2^((m-1)(n-1))

In particular, when we compute P/(P+Q), lots of things cancel, and we get

P/(P+Q) = rs/(2rs + mn - ms - rn).

In the particular case you asked about, m = 3, n = 4, r = 1, s = 2, and the probability is 1/3, which agrees with the answer other people have given.

Also, this satisfies some sanity checks: if the first row consists entirely of 1s (r = m) or the first column consists entirely of 1s (s = n) then the probability that the upper left corner has a 1 is in fact 1, as it should be! And if r = 0 or s = 0 (no ones in the first row/column) the probability is zero.

posted by madcaptenor at 8:28 PM on December 16, 2009 [1 favorite]

Best answer: I definitely wouldn't use min(P(r),P(c)) or avg(P(r),P(c)) for the reasons mhum and jpfed have given.

Here's a somewhat different way of getting an answer to your question. If you assume the rows and columns are independent, and you know the total proportion of 1s and 0s (or if you don't but you can guess), then you can just use a simple "naive Bayes" approach:

P(A=1|r,c)

= P(A=1,r,c) / P(r,c)

= P(r,c|A=1)P(A=1) / P(r,c)

= P(r|A=1)P(c|A=1)P(A=1) / P(r,c)

= P(r|A=1)P(c|A=1)P(A=1) / sum(P(r,c|A=a)P(A))

= P(r|A=1)P(c|A=1)P(A=1) / sum(P(r|A=a)P(c|A=a)P(A=a))

= P(A=1|r)P(A=1|c)P(r)P(c)P(A=1) / [P(A=1)P(A=1) * sum(P(A=a|r)P(A=a|c)P(A=a)P(r)P(c) / P(A=a)P(A=a)]

=

Or, more explicitly:

=

(Here P(A=1|r) is the probability of finding a 1 in a given row, and so forth.)

All you have to know for this to work is the proportions of 1s and 0s in the column/row of interest, as well as the total proportion of 1s and 0s in the matrix as a whole (that's the P(A) stuff). Incidentally, if you don't know the total proportion and set P(A=1) = P(A=0) = 0.5, which matches the assumption others made that all matrices are equally likely, then you also get a probability of 1/3 for your example using this approach.

posted by en forme de poire at 8:58 PM on December 16, 2009

Here's a somewhat different way of getting an answer to your question. If you assume the rows and columns are independent, and you know the total proportion of 1s and 0s (or if you don't but you can guess), then you can just use a simple "naive Bayes" approach:

P(A=1|r,c)

*where A is your cell of interest, r is a row, and c is a column*= P(A=1,r,c) / P(r,c)

*Bayes rule, pt 1*= P(r,c|A=1)P(A=1) / P(r,c)

*Bayes rule, pt 2*= P(r|A=1)P(c|A=1)P(A=1) / P(r,c)

*assuming row and column are independent*= P(r|A=1)P(c|A=1)P(A=1) / sum(P(r,c|A=a)P(A))

*marginalize denominator: values of a are 0 or 1*= P(r|A=1)P(c|A=1)P(A=1) / sum(P(r|A=a)P(c|A=a)P(A=a))

*independence again*= P(A=1|r)P(A=1|c)P(r)P(c)P(A=1) / [P(A=1)P(A=1) * sum(P(A=a|r)P(A=a|c)P(A=a)P(r)P(c) / P(A=a)P(A=a)]

*applying Bayes rule again*=

**P(A=1|r)P(A=1|c) / [P(A=1) * sum(P(A=a|r)P(A=a|c)/P(A=a))]**Or, more explicitly:

=

**P(A=1|r)P(A=1|c) / [P(A=1) * [P(A=0|r)P(A=0|c)/P(A=0)) + (P(A=1|r)P(A=1|c)/P(A=1))]]**(Here P(A=1|r) is the probability of finding a 1 in a given row, and so forth.)

All you have to know for this to work is the proportions of 1s and 0s in the column/row of interest, as well as the total proportion of 1s and 0s in the matrix as a whole (that's the P(A) stuff). Incidentally, if you don't know the total proportion and set P(A=1) = P(A=0) = 0.5, which matches the assumption others made that all matrices are equally likely, then you also get a probability of 1/3 for your example using this approach.

posted by en forme de poire at 8:58 PM on December 16, 2009

Response by poster: I'm a little bit glad that this wasn't a dead simple answer, as I've been overthinking this for much of the afternoon and it would have been a bit embarassing.

To clear up a few things:

- I'm using an algorithm that goes through a matrix and places bets on whether a given bit will be one or zero. As input, I have only the sums of each row and column. The question is, how can I use that information to make the best possible informed guess at each position?

- I do have information on every row and column, and I do know the total number of ones in the matrix so I do know something about the underlying distribution.

Given the scant information I provided, the answers using combinatorics were right on, and I was thinking of doing something along those lines initially. Using Bayes' rule (as suggested by en forme de poire) appears to be the answer I'm really looking for, since I do have info about the distribution. I'm playing around with now to make sure it works as I expect.

posted by chrisamiller at 9:35 PM on December 16, 2009

To clear up a few things:

- I'm using an algorithm that goes through a matrix and places bets on whether a given bit will be one or zero. As input, I have only the sums of each row and column. The question is, how can I use that information to make the best possible informed guess at each position?

- I do have information on every row and column, and I do know the total number of ones in the matrix so I do know something about the underlying distribution.

Given the scant information I provided, the answers using combinatorics were right on, and I was thinking of doing something along those lines initially. Using Bayes' rule (as suggested by en forme de poire) appears to be the answer I'm really looking for, since I do have info about the distribution. I'm playing around with now to make sure it works as I expect.

posted by chrisamiller at 9:35 PM on December 16, 2009

Response by poster:

posted by chrisamiller at 9:39 PM on December 16, 2009

*en forme de poire*: In step 3, you assume that the row and column are independent. Isn't that assumption violated by the fact that they share a position? That is, if the first row has 3 positions and a sum of three, I know that the value of the column has to be at least one (and that the position in question has to be a one as well). Am I over thinking that assumption?posted by chrisamiller at 9:39 PM on December 16, 2009

Response by poster: never mind - as I start to play with the numbers, I see how it makes sense. Thanks for the input all.

posted by chrisamiller at 10:05 PM on December 16, 2009

posted by chrisamiller at 10:05 PM on December 16, 2009

Chris: actually, if the first row is all ones, then P(A=0|r) is going to be zero and you're going to get:

P(A=1|c,r) = [1 * P(A=1|c) / P(A=1)] * [P(A=1) / (1 * P(A=1|c)) ] = 1

So this will correctly handle the case you're talking about. By non-independence I really meant something more like if you only ever had entries along the diagonal of the matrix, or only in the bottom right-hand corner, since then P(c,r|A) would be really different from P(c|A)P(r|A).

posted by en forme de poire at 10:23 PM on December 16, 2009

P(A=1|c,r) = [1 * P(A=1|c) / P(A=1)] * [P(A=1) / (1 * P(A=1|c)) ] = 1

So this will correctly handle the case you're talking about. By non-independence I really meant something more like if you only ever had entries along the diagonal of the matrix, or only in the bottom right-hand corner, since then P(c,r|A) would be really different from P(c|A)P(r|A).

posted by en forme de poire at 10:23 PM on December 16, 2009

(oops, should've previewed, but perhaps someone else will find it useful)

posted by en forme de poire at 10:23 PM on December 16, 2009

posted by en forme de poire at 10:23 PM on December 16, 2009

If you have information on every row and column, then doing your calculations independently for each cell (using only the local row/column information) will give you a sort of heuristic estimate of the probability but it will not be exact. Consider the following:

I don't mean to be a broken record here but if you have more information than just one row and column, then the true calculation gets hairier and may not have a nice simple form.

posted by a snickering nuthatch at 7:08 AM on December 17, 2009 [1 favorite]

0 1 3 +---+---+---+ 1 | | | | +---+---+---+ 2 | | | | +---+---+---+ 1 | | | | +---+---+---+Let us imagine that we were interested in finding the central square's probability of being a 1. Simply using the row and column information for the cell we are interested in, we would use the formulas given above and get some number between 0 and 1. But that misses the fact that we are indeed certain that the central square is a 1. This matrix has only one solution, with 1s running down the right side and a 1 in the center.

I don't mean to be a broken record here but if you have more information than just one row and column, then the true calculation gets hairier and may not have a nice simple form.

posted by a snickering nuthatch at 7:08 AM on December 17, 2009 [1 favorite]

Response by poster: What I'm looking for is a heuristic, not a solver. As I said, I'm essentially placing bets on each bit as I move through a matrix, and in accordance with information theory, I want to increase my returns by betting an amount as close to the actual probability as I can.

That said, I don't need perfect information, just enough information that I can do quite a bit better than I would by chance. The computational overhead of "solving" the matrix like you describe isn't worth it to me at this juncture. I appreciate the feedback though.

posted by chrisamiller at 9:11 AM on December 17, 2009

That said, I don't need perfect information, just enough information that I can do quite a bit better than I would by chance. The computational overhead of "solving" the matrix like you describe isn't worth it to me at this juncture. I appreciate the feedback though.

posted by chrisamiller at 9:11 AM on December 17, 2009

Yeah, knowing that the values all have to be binary definitely does give you extra information. I think what approach you end up taking depends on a) whether or not there is any uncertainty associated with the row and column sums, b) how big the matrix is, and c) how sparse the matrix is. If there is any uncertainty, an exact approach might be prone to overfitting, and for big matrices that aren't very sparse you might have a computationally intractable problem (although I don't know that for sure, there might be an efficient way to do it that I'm not thinking of). But if you have a small matrix (or one that's mostly 0s or mostly 1s) with no errors in the column or row sums, then an exact approach like what jpfed was talking about would probably perform better.

posted by en forme de poire at 9:55 AM on December 17, 2009

posted by en forme de poire at 9:55 AM on December 17, 2009

en forme de poire: in fact, Bayes' theorem gives the same result that I (and other people before me?) got combinatorially. I noticed this when I was playing around with my formula. This is reassuring.

and a more general comment: I'm pretty sure that the general problem of finding a matrix given the sums of its columns and rows is referred to as finding a "contingency table"; this is something that is quite difficult, and also something that has been seriously studied.

posted by madcaptenor at 12:23 PM on December 17, 2009

and a more general comment: I'm pretty sure that the general problem of finding a matrix given the sums of its columns and rows is referred to as finding a "contingency table"; this is something that is quite difficult, and also something that has been seriously studied.

posted by madcaptenor at 12:23 PM on December 17, 2009

I'm essentially placing bets on each bit as I move through a matrix, and in accordance with information theory, I want to increase my returns by betting an amount as close to the actual probability as I can.Something to bear in mind: your probability estimate and your bets aren't necessarily going to be proportional. It depends on what scoring rule you use.

So, for instance, say you have to bet a fixed amount of one unit each turn. Then if you get paid double on a correct choice, nothing on an incorrect choice, you should always bet one unit on the outcome you assign higher probability to.

However, suppose you get paid 1- (x-q)^2 units, where q is the amount you bet on the bit being 1, x is the actual value of the bit. This is called the Brier score, and it has the property that you get the most return if you bet an amount proportional to your probability estimate for that result. There are many scoring rules like this, called proper scores. So under this rule, if you think the probability of the bit being 1 is p, bet p on outcome 1, 1-p on outcome 0.

posted by topynate at 3:32 AM on December 19, 2009

This question is almost a year old, but I just discovered how to answer it fully, so for the benefit of any interested parties here's the complete (numerical) solution.

The way around having to calculate the distribution in advance is to use MCMC - Markov Chain Monte Carlo. This introduction to MCMC solves nearly the same problem as an example (Section 2.2). The problem there lets the matrix values be any non-negative integer; here we just have 0 or 1. There are two parts that aren't handed to us by the linked PDF (which you should look at first):

1. How to construct some matrix that has the right row and column sums, in order to get started.

One way is to fill each row with 1s from the left, skipping any columns which already have enough 1s, and stopping when the row has enough.

2. As we move through the matrix, placing bets, we learn the values of each element that we bet on. How can we modify our algorithm to use this new information?

We need to make a new matrix that fits all the information we're given, and modify our Markov chain so that all the matrices visited fit that same information. We generate a matrix by filling in the known values, then filling around them. We can modify the Markov chain as follows: after we generate the successor matrix, we now check if it alters a known value. If it does, we stay where we are for that iteration of the algorithm. This ensures that the transition probability to each neighbouring matrix is the same, and so preserves the uniform distribution.

posted by topynate at 8:13 PM on November 18, 2010 [3 favorites]

The way around having to calculate the distribution in advance is to use MCMC - Markov Chain Monte Carlo. This introduction to MCMC solves nearly the same problem as an example (Section 2.2). The problem there lets the matrix values be any non-negative integer; here we just have 0 or 1. There are two parts that aren't handed to us by the linked PDF (which you should look at first):

1. How to construct some matrix that has the right row and column sums, in order to get started.

One way is to fill each row with 1s from the left, skipping any columns which already have enough 1s, and stopping when the row has enough.

2. As we move through the matrix, placing bets, we learn the values of each element that we bet on. How can we modify our algorithm to use this new information?

We need to make a new matrix that fits all the information we're given, and modify our Markov chain so that all the matrices visited fit that same information. We generate a matrix by filling in the known values, then filling around them. We can modify the Markov chain as follows: after we generate the successor matrix, we now check if it alters a known value. If it does, we stay where we are for that iteration of the algorithm. This ensures that the transition probability to each neighbouring matrix is the same, and so preserves the uniform distribution.

posted by topynate at 8:13 PM on November 18, 2010 [3 favorites]

« Older Water shut off due to non-payment by landlord.... | I'm having real strange sleeping issues. I can... Newer »

This thread is closed to new comments.

What is the universe of possibilities here? A likely scenario is that you're generating binary matrices such that the probability of 1 in any given entry is, say, p (maybe p=1/2?). Then, given that you've generated such a matrix, you observe that you have, say, two 1s in the first column and one 1 in the first row. Is this what you're imagining?

posted by mhum at 5:58 PM on December 16, 2009