Comments on: Find standard deviation in Excel, but first develop a labor-saving trick
http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick/
Question: Find standard deviation in Excel, but first develop a labor-saving trick
http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick
Help me with statistics and Excel. Especially help me if you know any labor saving methods. I want the median, mean and standard deviation for the average price of all items sold, but my spreadsheet-full-of-data doesn't tell me the price of each sale -- just the average price per store, and the number sold at that store. Something like this: <br /><br /> <pre>Store   # sold   Avg price per item<br>
A       72       $16,824<br>
B       208      $13,133<br>
C       269      $15,190<br>
...</pre><br>
<br>
So I think I need to count store A's price 72 times and store B's price 208 times, etc. It's pretty easy to figure out the algebra that will allow me to calculate mean treatment price given this information: ("# of items sold at A" times "Avg. price at A" + "# of items sold at B" times "Avg. price at B" ... etc.) divided by (sum of # of items sold at all stores). <br>
<br>
Is there a formula or other approach I can use in Excel that will also allow me to calculate the median and standard deviation relatively easily? Or do I need to re-create my spreadsheet so it reads <br>
<br>
<pre>Store   Avg price per item<br>
A       $16,824<br>
A       $16,824<br>
A       $16,824<br>
... (etc, a total of 72 times)<br>
B       $13,133<br>
B       $13,133<br>
... (etc., etc.)</pre><br>
<br>
There are 30-some stores, 99 distinct items and roughly 39,500 unique sales of those items across all the stores, so I'd *really* prefer not to have to copy and paste onto a new line to represent each individual item sold.
posted by croutonsupafreak, Fri, 21 Jun 2013 12:54:25 -0800
Tags: statistics, math, mathematics, standarddeviation, mean, median, Excel, average, help
By: Blazecock Pileon
http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick#3533420
The Wikipedia page on <a href="http://en.wikipedia.org/wiki/Weighted_arithmetic_mean">weighted arithmetic means</a> offers some examples of and formulas for how to calculate these properties.
posted by Blazecock Pileon, Fri, 21 Jun 2013 13:14:59 -0800
By: croutonsupafreak
http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick#3533440
Thanks, Blazecock Pileon. That's helpful, although some of that math is over my head. I'd love any help possible with translating to Excel-ese. In the meantime, I guess I'll dig up my old stats textbook (knew there was a reason I kept it around) to see if it will help me interpret the Wikipedia entry.
posted by croutonsupafreak, Fri, 21 Jun 2013 13:24:19 -0800
By: pombe
http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick#3533481
Depending on what you want to know you may or may not be able to calculate it from the above data. More precisely, the median value (or standard deviation) of all products sold is not the same as the median average sale price from each store. This example should make that clear:<br>
<br>
Store A sells two of the item, one at $1, and one at $7. The average price for store A is (1+7)/2 = $4<br>
<br>
Store B sells one of the item, at $2, so its average price is $2.<br>
<br>
Store C sells two of the item, one at $2, and one at $4. Its average price is $3.<br>
<br>
The median price for all five items = median(1,2,2,4,7) = 2.<br>
<br>
The median of the average prices is median(2,3,4) = 3, which is different. Even if you count each average the number of times that store sold something you would have median(2,3,3,4,4) =3, which is still different than the median price of all items sold.<br>
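The three medians in this example can be checked with a few lines of code. This is only a sketch in Python (the thread itself works in Excel), and the variable names are made up for illustration:

```python
from statistics import median

# Individual sale prices from the example above, keyed by store.
sales = {"A": [1, 7], "B": [2], "C": [2, 4]}

# Every individual sale price, pooled across stores.
all_prices = [p for prices in sales.values() for p in prices]

# Each store's average price.
store_means = {s: sum(p) / len(p) for s, p in sales.items()}

# Each store's average repeated once per sale at that store.
weighted = [store_means[s] for s, p in sales.items() for _ in p]

median_all = median(all_prices)                 # median(1, 2, 2, 4, 7)
median_of_means = median(store_means.values())  # median(4, 2, 3)
median_weighted = median(weighted)              # median(4, 4, 2, 3, 3)

print(median_all, median_of_means, median_weighted)
```

Only the first value uses the real sale prices; the other two are computed from store averages and agree with each other, but not with the first.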
<br>
So you really want to think about what you want to measure. You can calculate weighted medians and weighted standard deviations along the lines of the weighted mean, but that won't give you the same answers as if you had all the original sale prices.<br>
<br>
Hopefully this isn't too pedantic. If you already know all this I apologize!
posted by pombe, Fri, 21 Jun 2013 13:40:56 -0800
By: St. Peepsburg
http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick#3533485
Your total average for all stores is correct - you're taking the "weighted average."<br>
<br>
I'm tempted to ask why you care about the median, unless this is a school exercise. For a normal distribution mean = median = mode.<br>
<br>
For stdev, you could take the stdev of all the store means, which in Excel is STDEV(A,B,C...), but all that tells you is that if you were to draw a number out of the "hat of average store sales", you can predict where that value might fall.<br>
<br>
I feel like I need more info about what you want in order to make the statistics give you a meaningful answer. I find the trick with statistics is to always ask yourself: yes but what does this math represent in real life?<br>
<br>
<em>(omg I am sooo jonesing to look at your spreadsheet now, does that make me sick?)</em>
posted by St. Peepsburg, Fri, 21 Jun 2013 13:45:40 -0800
By: kalessin
http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick#3533489
I've been hacking around in Excel trying to create an array constant dynamically within the spreadsheet itself. I did manage to create a text representation of an array constant, but true to my fears, you can't use =VALUE() or apparently other functions to take the array constant-like text and use it as an array constant.<br>
<br>
My hope there was that if you could dynamically create an array constant and get Excel to use it as one, you could load that into the standard =average() and other formulas already provided in standard Excel spreadsheets. Unfortunately it doesn't seem easy or possible to do that in the spreadsheet formulas themselves.<br>
<br>
In regard to getting that far, the salient formula is this:<br>
=CONCATENATE("{",CONCATENATE(REPT(CONCATENATE(C2,", "), B2)),"}")<br>
where B2 is the quantity sold and C2 is the price per item sold.<br>
<br>
But after creating the array constant-like text, there doesn't seem to be anywhere further to go.<br>
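Outside of spreadsheet formulas, the same expansion idea (repeat each price once per item sold, then use the ordinary statistics functions) is easy to carry out. Here is a minimal sketch in Python, which is an assumption since the thread only discusses Excel and VBA. It uses the three sample rows from the question and assumes every sale at a store was made at that store's average price:

```python
from itertools import chain, repeat
from statistics import mean, median, pstdev

# (quantity sold, average price) for the three sample stores in the question.
rows = [(72, 16824), (208, 13133), (269, 15190)]

# Repeat each price once per item sold, like REPT but producing real numbers.
expanded = list(chain.from_iterable(repeat(price, qty) for qty, price in rows))

n = len(expanded)            # total number of sales
exp_mean = mean(expanded)
exp_median = median(expanded)
exp_sd = pstdev(expanded)    # population standard deviation (divides by n)

print(n, exp_mean, exp_median, exp_sd)
```

With roughly 39,500 sales the expanded list is still small for a script, so brute-force expansion is workable here even though it is awkward inside a worksheet.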
<br>
There may be some leeway if you went into VBA and did some macro scripting, but I'm afraid I don't have time to do that research for you.
posted by kalessin, Fri, 21 Jun 2013 13:47:46 -0800
By: DevilsAdvocate
http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick#3533490
Let's say your #sold data is in cells B2 - B40, and the average price per item is in cells C2 - C40. (row 1 as your column headers). The total number of sales is going to be useful for all three calculations, so let's say you put that in cell B42:<br>
<br>
<pre>=SUM(B2:B40)</pre><br>
<br>
The mean is:<br>
<br>
<pre>=SUMPRODUCT(B2:B40,C2:C40)/B42</pre><br>
<br>
Let's say you put that in cell C42, because we'll use that number again.<br>
<br>
This works regardless of the distribution of individual sale prices at each store.<br>
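For readers who want to check the SUMPRODUCT formula outside Excel, the same weighted mean can be sketched in Python. The lists below hold only the three sample rows from the question, and the names `qty` and `price` are illustrative stand-ins for columns B and C:

```python
# Column B (# sold) and column C (avg price per item) for the sample stores.
qty = [72, 208, 269]
price = [16824, 13133, 15190]

# Equivalent of =SUM(B2:B40)
total_sold = sum(qty)

# Equivalent of =SUMPRODUCT(B2:B40,C2:C40)/B42
weighted_mean = sum(q * p for q, p in zip(qty, price)) / total_sold

print(total_sold, weighted_mean)
```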
<br>
I mention that because it's time for a <b>huge</b> caveat regarding the standard deviation and the median: you don't have enough information to calculate the true values of these. To calculate those, you'd need to know more about the distribution of sales at each store. Those 72 sales at store A might be 72 sales at $16,824 each, or they might be 54 sales at $18,395 and 18 sales at $12,111 each, or any number of other distributions. And what that distribution is makes a difference for the median and standard deviation. (On preview, repeating pombe's warning.)<br>
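The two hypothetical distributions for store A can be compared directly. This Python sketch (not part of the original answer) uses exactly the numbers given above and shows that they share a mean but not a median or standard deviation:

```python
from statistics import mean, median, pstdev

# Possibility 1: all 72 sales at the store average.
uniform = [16824] * 72

# Possibility 2: 54 sales at $18,395 and 18 sales at $12,111.
split = [18395] * 54 + [12111] * 18

# Both distributions have the same mean...
mean_uniform, mean_split = mean(uniform), mean(split)

# ...but different medians and standard deviations.
median_uniform, median_split = median(uniform), median(split)
sd_uniform, sd_split = pstdev(uniform), pstdev(split)

print(mean_uniform, mean_split)
print(median_uniform, median_split)
print(sd_uniform, sd_split)
```

The store average alone cannot distinguish between these cases, which is why the calculations that follow depend on an assumption.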
<br>
That said, if you want to <i>assume</i> that the 72 sales at store A were in fact 72 sales all at $16,824 each, and so forth, you can calculate the standard deviation and median given that assumption.<br>
<br>
For the standard deviation, put the following in cell D2:<br>
<pre>=B2*(C2-C$42)^2</pre><br>
<br>
Now pull that down to fill cells D3-D40. And in cell D42, put:<br>
<br>
<pre>=SQRT(SUM(D2:D40)/B42)</pre><br>
<br>
That's your standard deviation.<br>
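The column D steps above can be sketched in Python as a cross-check. As before, this uses only the three sample rows from the question and relies on the same assumption that every sale at a store was at that store's average price. Note that dividing by the total number of sales gives the population standard deviation; a sample standard deviation would divide by that total minus one.

```python
from math import sqrt

qty = [72, 208, 269]             # column B: # sold
price = [16824, 13133, 15190]    # column C: avg price per item

total_sold = sum(qty)            # cell B42
weighted_mean = sum(q * p for q, p in zip(qty, price)) / total_sold  # cell C42

# Column D: =B2*(C2-C$42)^2, filled down.
squared_dev = [q * (p - weighted_mean) ** 2 for q, p in zip(qty, price)]

# Cell D42: =SQRT(SUM(D2:D40)/B42)
weighted_sd = sqrt(sum(squared_dev) / total_sold)

print(weighted_sd)
```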
<br>
For the median, you'll need to sort your data on average price per item (ascending, or descending, it doesn't matter). Then put a running total of the sales in column E, by putting the following in cell E2:<br>
<br>
<pre>=SUM(E$2:E2)</pre><br>
<br>
and pull that down to fill cells E3-E40.<br>
<br>
Now take half the number of your total sales (half the number in B42), and find the smallest number in column E which is greater than that. The corresponding price per item is the median. <b>Special case</b>: if a number in column E is <i>exactly</i> half the total number of sales, the median is the average of the price per item corresponding to that entry and the next price per item in the list.
posted by DevilsAdvocate, Fri, 21 Jun 2013 13:48:14 -0800
By: DevilsAdvocate
http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick#3533497
Sorry, that last formula should be<br>
<br>
<pre>=SUM(B$2:B2)</pre><br>
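With that corrected running total, the median procedure can be sketched in Python as a cross-check. The function name `weighted_median` is made up for this illustration, and it relies on the same assumption that every sale at a store was at that store's average price:

```python
def weighted_median(rows):
    """Median of prices where each (quantity, price) row counts quantity times."""
    # Sort by price; rows with no sales contribute nothing and are skipped.
    ordered = sorted((r for r in rows if r[0] > 0), key=lambda r: r[1])
    if not ordered:
        raise ValueError("no sales to take a median of")

    half = sum(q for q, _ in ordered) / 2
    running = 0
    for i, (q, p) in enumerate(ordered):
        running += q  # running total, like =SUM(B$2:B2) filled down
        if running == half:
            # Special case: exactly half the sales are at or below this price,
            # so average this price with the next one in the list.
            return (p + ordered[i + 1][1]) / 2
        if running > half:
            return p


# The three sample stores from the question: (# sold, avg price per item).
sample = [(72, 16824), (208, 13133), (269, 15190)]
print(weighted_median(sample))
```

For the special case, `weighted_median([(2, 10), (2, 20)])` averages the two prices, just as the ordinary median of 10, 10, 20, 20 would.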
<br>
Just missed the edit window.
posted by DevilsAdvocate, Fri, 21 Jun 2013 13:54:13 -0800
By: croutonsupafreak
http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick#3533512
<a href="http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick#3533485">St. Peepsburg</a>: You're right, I don't actually need the median.<br>
<br>
My goal here is to identify striking patterns related to both stores and items for sale, along the lines of:<br>
* Which stores are charging considerably more or less than the norm for multiple items? (At which point I can start asking if they are taking advantage of people and/or undercutting the competition as a result of their charges.)<br>
* Which items tend to sell for about the same price consistently, and which have a very wide variation? (Which would lead me to ask why the variation exists.)<br>
<br>
I can actually find so-so answers to both of these questions by just sorting and eyeballing what I see, but I'd prefer to be more methodical and rigorous than that with my work.<br>
<br>
<small><br>
(I'm not actually researching stores and items sold, but the analogy is close and requires less explanation than going into full details, so I'll stick with it.) </small>
posted by croutonsupafreak, Fri, 21 Jun 2013 14:06:14 -0800
By: croutonsupafreak
http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick#3533516
kalessin and DevilsAdvocate, thanks for the potentially useful/helpful answers. It's gonna take me time to try to fully grok your posts, so follow-ups and/or best answer status may be forthcoming. We'll see.
posted by croutonsupafreak, Fri, 21 Jun 2013 14:07:55 -0800
By: kalessin
http://ask.metafilter.com/243333/Find-standard-deviation-in-Excel-but-first-develop-a-laborsaving-trick#3535598
No worries. I don't give answers for kudos. I do it to try to help.<br>
<br>
Also, like I said, I think my research went to a dead end and that what you're really looking for is likely to take either some other approach and/or some VBA scripting in Excel to expand those qty/price pairs into actual ranges in a hidden sheet somewhere.
posted by kalessin, Mon, 24 Jun 2013 10:47:25 -0800