I got 99 problems and Excel isn't one of them.
May 21, 2011 12:38 PM   Subscribe

How do I, using Excel, determine the frequency of orders, the minimum/maximum amount per customer and the average order size?

My boss is breathing down my neck to finish a gigantic, hairy beast of a report. And me stumped.

What I have is a spreadsheet with 8 columns across and about 1500 rows of data. The columns contain information about products I sell as follows: SKU number, size of product, name of product, date ordered, buying customer name, order size (number of units ordered), order number and invoice number. I currently have the rows of information sorted by SKU number (each has a unique SKU). I then sorted each SKU by customer name. I then placed between each product/customer 3 blank rows or so to clearly demarcate the different customers. I did this to make handling 2000 different sales instances with 20 to 50 different customers a little easier to handle, rather than put them all on separate tabs. There are a total of about 15 to 20 different SKU numbers in this lot of products.
(I hope this is making sense so far!)
The only thing that concerns us in this exercise is the order size (number of units ordered) and the date ordered column. The units ordered range from 1 to 500 units (possibly more) per order. The dates range from June 2010 to March 2011.

So to find the average order size, I very innocently went to the bottom of one customer's group of orders of a product and told Excel to average the orders. But I realized that I wasn't finding the average order size. I was finding the average of the units ordered as a whole. If they order 100 units four times and 50 units two times, they most commonly order 100, not 83. This seemed off to me so I asked someone in accounting. After many fits and starts and being barely able to explain why this seems off to me, he tells me that what it sounds to him I'm asking for is the median. Which seems to work because "median(A1:A6)" below the aforementioned 4 100 unit orders and 2 50 unit orders gives me "100". But at this point, I'm not really sure how this whole thing works. As Walter would say to Donny, I am out of my element!

Then I started trying to figure out how often the customer placed the orders using the dates column and I broke out in a cold sweat. I tried the frequency function in Excel and tried to set up a separate column with all of the dates between June 1, 2010 and March 31, 2011 in it for the bin array and used the series of dates in the "date ordered" column for the data array. And it seems totally silly now because of course it didn't work.

So now I'm spending a lovely Saturday afternoon at my computer dreading dealing with the hairy eyeball from my boss on Monday. Please help me get out of the weeds, mathematicians/statisticians of MeFi! Bonus points if you can explain it as if speaking to a six year old. Should you be able to help me I will build a shrine and christen my first born child in your honor.
posted by hecho de la basura to Computers & Internet (11 answers total) 1 user marked this as a favorite
 
(1) Is your situation comparable to a shoe store? That is, shoe sizes come in certain numbers only, so an average shoe size is meaningless because it could be 9.435345, and that's not a shoe size? If so, I think you want the mode. If you sold shoes in sizes 5, 6, 7, 7, 8, 9, and 10, 7 is the mode, or the most frequent size sold.

(2) If you have your data organized by customer, couldn't you just make a new column that numbers each sale by an increment of 1, and then find the max number per a customer? Not sure I'm following you here.

(3) You should start looking into relational databases. They will make your life much easier if you work with this kind of data often.
posted by ifandonlyif at 1:03 PM on May 21, 2011


You don't want the median either from what you wrote. The median of 1, 1, 2, 2, 2, 3, 3, 3, 3 would be 2 which occurs three times. You would want 3 which occurs four times.

I think what you want is to count the number of orders matching a given criteria which you can do with the countif function. What I've done before in similar situations is make a new column and for every row do a countif. So if your order size column were AA it might look something like this:

=countif($AA:$AA, $AAx);

where x is just whatever row you're looking in. If you put that formula into every row then any row that matches the order numbers of your current row will be counted (including the current row). If you sort by this row the largest count will be your most common order size.

I'm not entirely sure how you want to determine frequency. Did you want something like: number of orders / number of days between first and last order?
posted by Green With You at 1:06 PM on May 21, 2011


(Mode! Didn't know that was the name of this thing. Excel seems to have a 'mode' function for it that might be simpler than using countif but I have never used it)
posted by Green With You at 1:08 PM on May 21, 2011


Response by poster: Thank you so far! I will try mode when I get back on Monday. I think that's closer than averaging the total or the median for the units ordered. I have zero experience working with mode so it should be fun!

The order frequency is more on the order of "How often did this customer order this product per month over the period of June 2010 to March 2011?" The products contain a controlled substance that must be monitored. We are trying to determine where the "flag" for orders should lay based on past behavior of current customers. If a customer, over our analysis period, orders 15 times a month normally and then in May 2011 places 60 different orders during that month, we need to be able to see that. Does that make it a little clearer? I apologize for not being clearer. I hate to be a stereotype but I'm an art major!
posted by hecho de la basura at 1:20 PM on May 21, 2011


Best answer: For the order frequency problem, Excel provides a solution called 'pivot tables', which are hard to master but which are aimed at the specific problem of summarizing data by day/week/month/period/etc. There are a lot of pivot table tutorials on the web; google around.
posted by felix at 2:18 PM on May 21, 2011


Best answer: Was just about to say pivottables as well. Put dates in rows (there is an option to Group dates by months, to save you a bit of work), and customers going across on each column.

Drop Order Size into the values area. Change it to be the average. You can also drop it again and that set it to give you the number of orders/customer/month.

Note Pivottables can't give you the mode as you specify above, but from what you say, you just want to see outliers. If so, comparing the average to the max might help spot deviations from norms. You could also look at the standard deviation to help identify them.
posted by Boobus Tuber at 2:31 PM on May 21, 2011


For the "most frequently ordered size' you need the MODE function.

For the number of orders between June 1, 2010 and March 31, 2011, try this:
=COUNTIF(A1:A6,"<"&DATE(2011,3,31))-COUNTIF(A1:A6,"<"&DATE(2010,6,1))
posted by prenominal at 4:54 PM on May 21, 2011


Stop tryin gto use a screwdriver as a hammer.

Put the data into a database and run some queries wit group bys.
posted by orthogonality at 5:26 PM on May 21, 2011


You could also pull your gigantor spreadsheet into Access and (as long as it's well organized) write queries using the query designer or wizard. They do not work the same way, and without really knowing your data I can't really recommend how to formulate your query but if each line is a separate order, you could write a query that would show you all orders by that customer between certain dates.
posted by fiercekitten at 5:28 PM on May 21, 2011


Best answer: The only problem with a database is that finding the mode is going to be... challenging... depending on what database software you're using. Also, if you're not comfortable with this stuff then you'll probably have trouble getting up to speed with databases by Monday. In the long term, though, this kind of data really does belong in a database, especially since it sounds like you're going to be compiling reports on a regular basis.

Also, to summarize the math:

Mean (arithmetic mean): Add up all the numbers, divide by how many items there were. If my numbers are 1, 2, 10, 11 then the mean is (1+2+10+11)/4 = 6. Excel incorrectly calls this the average.

Median: The middle number when all the numbers are sorted. If my numbers are 1, 1, 10, 19, 20 then the median is the third number, which is 10. If you have two middle numbers, then you take the mean of them, for example 1, 1, 10, 12, 19, 20 the median is the mean of 10 and 12, so 11.

Mode: The number which happens most frequently. If your numbers are 1, 2, 3, 3 then the mode is 3. If you have several numbers which repeat equally, for example 1, 1, 2, 2, 3, 3 then the mode is not well defined: Excel will use the number which is smallest.

Average: The single number which best represents the data. Outside of Excel, this is often, but not always, the mean, median, or mode...depending on what kind of data you have. Inside of Excel this is always the mean.
posted by anaelith at 6:16 PM on May 21, 2011


Response by poster: I have not quite massaged the gigantor spreadsheet into the perfect pivot table, but it's close! Pivot tables are actually a fantastic tool that I've never really had to use so I'm happy to be learning it now. Thanks for the suggestions! I never would have thought of pivot tables!
posted by hecho de la basura at 5:06 PM on May 23, 2011


« Older Men and Dust in the Wind   |   Do you this Poem? Newer »
This thread is closed to new comments.