March 9, 2010 12:21 PM Subscribe

Excel filter: Is there a simple way to iterate through all pairs in a list, perform a calculation on each pair, and sum the result?

For example, I often need to do things like compute the sum all the possible pairwise products in a list. So if my data is X_1, X_2, X_3, I want X_1*X_2 + X_1*X_3 + X_2*X_3. Forgetting algebra tricks (which would depend on which function I was using), is there a simple way with Excel functions to address each pair, compute whatever I need, and add them up? I know how to do it in VBA, but I have the feeling there's a clever trick I'm missing.
posted by albrecht to Computers & Internet (11 answers total)

For example, I often need to do things like compute the sum all the possible pairwise products in a list. So if my data is X_1, X_2, X_3, I want X_1*X_2 + X_1*X_3 + X_2*X_3. Forgetting algebra tricks (which would depend on which function I was using), is there a simple way with Excel functions to address each pair, compute whatever I need, and add them up? I know how to do it in VBA, but I have the feeling there's a clever trick I'm missing.

The way I do it. And I'd be interested to see if anyone has a better way, is to take the column, and paste it as a row. Forming a grid. Then putting the formula in each cell eg. R1C*RC1, and summing up the whole table.

posted by 92_elements at 12:45 PM on March 9, 2010

posted by 92_elements at 12:45 PM on March 9, 2010

Stick one list in a column. Copy the other list, then Paste Special->Transpose into the row above the start of the first list. Now you have a matrix. Use the '$' notation to set up absolute references in your formula.

So, presuming list one is in column one, and list two in row A, you could have cell B2 be:

=B$1*$A2

and then fill across and down. The non-$'d parts of the reference will be relative, the $'d part will be absolute.

The sum would be =SUM(reference_to_your_matrix)/2

posted by pompomtom at 12:46 PM on March 9, 2010

So, presuming list one is in column one, and list two in row A, you could have cell B2 be:

=B$1*$A2

and then fill across and down. The non-$'d parts of the reference will be relative, the $'d part will be absolute.

The sum would be =SUM(reference_to_your_matrix)/2

posted by pompomtom at 12:46 PM on March 9, 2010

Also, I'm pretty sure the smart answer is the SUMPRODUCT function, but it's early and I can't seem to wrap my hear around its use just now...

posted by pompomtom at 12:58 PM on March 9, 2010

posted by pompomtom at 12:58 PM on March 9, 2010

SUMPRODUCT doesn't do permutations the way a cartesian product requires. SUMPRODUCT's use can be demonstrated by say, a spreadsheet of inventory prices and their sales. If you wanted to know how much revenue was generated, you'd multiply each inventory item's price by it's number sold, and add them all up. SUMPRODUCT takes two ranges and does that calculation for you.

posted by pwnguin at 1:07 PM on March 9, 2010

posted by pwnguin at 1:07 PM on March 9, 2010

That would definitely work, but I'm usually dealing with long columns of data (10k entries or more). So if at all possible, I'd prefer to have something that I could implement in a single column or even a single cell. I just have the vague feeling that there's some way to do it with a combination of offsets and sums.

posted by albrecht at 1:20 PM on March 9, 2010

Ahh, OK.

pwnguin: Thanks for that... now I understand the weird results I got.

Knowing all this, I'd use a UDF:

Function CARTES(rInput As Range)

For Each c1 In rInput.Cells

For Each c2 In rInput.Cells

CARTES = CARTES + (c1 * c2)

Next c2

Next c1

End Function

posted by pompomtom at 2:36 PM on March 9, 2010

pwnguin: Thanks for that... now I understand the weird results I got.

Knowing all this, I'd use a UDF:

Function CARTES(rInput As Range)

For Each c1 In rInput.Cells

For Each c2 In rInput.Cells

CARTES = CARTES + (c1 * c2)

Next c2

Next c1

End Function

posted by pompomtom at 2:36 PM on March 9, 2010

You could generalize that UDF a bit by taking two parameters, and giving the same parameter twice if you're only operating on one list.

posted by pwnguin at 3:19 PM on March 9, 2010

posted by pwnguin at 3:19 PM on March 9, 2010

Ok, I've thought about this a bit more, and a bit of math is in order:

Assume a list of items from a-z, on which you wish to compute the sum you've defined. Going by the above discussion, I assume order matters, ie a*z and z*a are both in the sum, and that it's okay that you only count a*a once.

If you sort the algebraic representation of the cartesian sum by first item, you get`a*a + a*b + a*c + ... + b*a + b*b + b*d`.

So then we factor this representation and get`a*(a +b +c...) + b*(a+b+c...)`.

Replace these with a summation and get`a*SUM(a:z) + b*SUM(a:z)`.

Now the clever bit: I claim the above can be rewritten mathematically the sum of i from a-z of i*SUM(a:z). Since the SUM(a:z) is constant, it can be moved outside the sigma, leaving you with`SUM(a:z)*SUM(a:z)`.

Which can be further optimized to`(SUM(a:z))^2`.

posted by pwnguin at 3:44 PM on March 9, 2010 [1 favorite]

Assume a list of items from a-z, on which you wish to compute the sum you've defined. Going by the above discussion, I assume order matters, ie a*z and z*a are both in the sum, and that it's okay that you only count a*a once.

If you sort the algebraic representation of the cartesian sum by first item, you get

So then we factor this representation and get

Replace these with a summation and get

Now the clever bit: I claim the above can be rewritten mathematically the sum of i from a-z of i*SUM(a:z). Since the SUM(a:z) is constant, it can be moved outside the sigma, leaving you with

Which can be further optimized to

posted by pwnguin at 3:44 PM on March 9, 2010 [1 favorite]

Right, that works great for the particular example I gave, but I was really looking for something more general that I could apply any time I have to compute the sum of some function over all pairs in my data set. For example, today I was trying to compute the Kendall tau correlation coefficient, which requires you to sum things like sgn((X_i-X_j)*(Y_i-Y_j)) over all pairs (i, j). I'll probably just continue doing these as UDFs, but I figured someone out there might know a shortcut just using the built in Excel functions.

posted by albrecht at 8:10 PM on March 9, 2010

This thread is closed to new comments.

posted by pwnguin at 12:42 PM on March 9, 2010