Join 3,363 readers in helping fund MetaFilter (Hide)


Calling all pairs
March 9, 2010 12:21 PM   Subscribe

Excel filter: Is there a simple way to iterate through all pairs in a list, perform a calculation on each pair, and sum the result?

For example, I often need to do things like compute the sum all the possible pairwise products in a list. So if my data is X_1, X_2, X_3, I want X_1*X_2 + X_1*X_3 + X_2*X_3. Forgetting algebra tricks (which would depend on which function I was using), is there a simple way with Excel functions to address each pair, compute whatever I need, and add them up? I know how to do it in VBA, but I have the feeling there's a clever trick I'm missing.
posted by albrecht to Computers & Internet (11 answers total)
 
It sounds like you want a cartesian product. Unfortunately, a quick google for that term + Excel only turns up VBA and SQL answers to your problem, which I kinda expected.
posted by pwnguin at 12:42 PM on March 9, 2010


The way I do it. And I'd be interested to see if anyone has a better way, is to take the column, and paste it as a row. Forming a grid. Then putting the formula in each cell eg. R1C*RC1, and summing up the whole table.
posted by 92_elements at 12:45 PM on March 9, 2010


Stick one list in a column. Copy the other list, then Paste Special->Transpose into the row above the start of the first list. Now you have a matrix. Use the '$' notation to set up absolute references in your formula.

So, presuming list one is in column one, and list two in row A, you could have cell B2 be:

=B$1*$A2

and then fill across and down. The non-$'d parts of the reference will be relative, the $'d part will be absolute.

The sum would be =SUM(reference_to_your_matrix)/2
posted by pompomtom at 12:46 PM on March 9, 2010


(gah, rows, columns, you know what I mean...)
posted by pompomtom at 12:47 PM on March 9, 2010


Also, I'm pretty sure the smart answer is the SUMPRODUCT function, but it's early and I can't seem to wrap my hear around its use just now...
posted by pompomtom at 12:58 PM on March 9, 2010


SUMPRODUCT doesn't do permutations the way a cartesian product requires. SUMPRODUCT's use can be demonstrated by say, a spreadsheet of inventory prices and their sales. If you wanted to know how much revenue was generated, you'd multiply each inventory item's price by it's number sold, and add them all up. SUMPRODUCT takes two ranges and does that calculation for you.
posted by pwnguin at 1:07 PM on March 9, 2010


Stick one list in a column. Copy the other list, then Paste Special->Transpose into the row above the start of the first list. Now you have a matrix. Use the '$' notation to set up absolute references in your formula.

That would definitely work, but I'm usually dealing with long columns of data (10k entries or more). So if at all possible, I'd prefer to have something that I could implement in a single column or even a single cell. I just have the vague feeling that there's some way to do it with a combination of offsets and sums.
posted by albrecht at 1:20 PM on March 9, 2010


Ahh, OK.

pwnguin: Thanks for that... now I understand the weird results I got.

Knowing all this, I'd use a UDF:

Function CARTES(rInput As Range)
For Each c1 In rInput.Cells
For Each c2 In rInput.Cells
CARTES = CARTES + (c1 * c2)
Next c2
Next c1
End Function
posted by pompomtom at 2:36 PM on March 9, 2010


You could generalize that UDF a bit by taking two parameters, and giving the same parameter twice if you're only operating on one list.
posted by pwnguin at 3:19 PM on March 9, 2010


Ok, I've thought about this a bit more, and a bit of math is in order:

Assume a list of items from a-z, on which you wish to compute the sum you've defined. Going by the above discussion, I assume order matters, ie a*z and z*a are both in the sum, and that it's okay that you only count a*a once.

If you sort the algebraic representation of the cartesian sum by first item, you get a*a + a*b + a*c + ... + b*a + b*b + b*d.

So then we factor this representation and get a*(a +b +c...) + b*(a+b+c...).

Replace these with a summation and get a*SUM(a:z) + b*SUM(a:z).

Now the clever bit: I claim the above can be rewritten mathematically the sum of i from a-z of i*SUM(a:z). Since the SUM(a:z) is constant, it can be moved outside the sigma, leaving you with SUM(a:z)*SUM(a:z).

Which can be further optimized to (SUM(a:z))^2.
posted by pwnguin at 3:44 PM on March 9, 2010 [1 favorite]


Now the clever bit: I claim the above can be rewritten mathematically the sum of i from a-z of i*SUM(a:z). Since the SUM(a:z) is constant, it can be moved outside the sigma, leaving you with SUM(a:z)*SUM(a:z)

Right, that works great for the particular example I gave, but I was really looking for something more general that I could apply any time I have to compute the sum of some function over all pairs in my data set. For example, today I was trying to compute the Kendall tau correlation coefficient, which requires you to sum things like sgn((X_i-X_j)*(Y_i-Y_j)) over all pairs (i, j). I'll probably just continue doing these as UDFs, but I figured someone out there might know a shortcut just using the built in Excel functions.
posted by albrecht at 8:10 PM on March 9, 2010


« Older Trying new things! Two week, c...   |  Help my friend figure out what... Newer »
This thread is closed to new comments.