How to present and evaluate data relative to weight
July 19, 2016 8:01 AM
I could use some help walking through this: I have Excel data that means more or less depending on where it comes from. When I'm working with it, presenting it, or thinking about it, I want to treat it differently depending on origin. Example within.
Here's my example. I have stores in three towns that sell lemons.
One town has a population of 10, and the store there sells 43 lemons.
One town has a population of 100 and the store there sells 47 lemons.
One town has a population of 50, and the store there sells 42 lemons.
The town with only 10 people is exceeding expectations even though they've sold less than the store in the town with 100 people, because they're moving a ridiculous number of lemons for the population.
I'm not terribly math-y, though I do have the basic business calculations; I've just hit a mental block with this, and Googling led me into a level of mathiness where people start writing in Sanskrit and using squiggly little variables and I guess I need this dumbed down a bit, English-major style. I don't know if I'm standardizing or normalizing or what, and I don't even have the math skills needed to read blog posts about it. It's all assuming the reader took a statistics class and while I wish I did, I didn't.
I'm working with the data in Excel, so formulas are helpful.
Here's my example. I have stores in three towns that sell lemons.
One town has a population of 10, and the store there sells 43 lemons.
One town has a population of 100 and the store there sells 47 lemons.
One town has a population of 50, and the store there sells 42 lemons.
The town with only 10 people is exceeding expectations even though they've sold less than the store in the town with 100 people, because they're moving a ridiculous number of lemons for the population.
I'm not terribly math-y, though I do have the basic business calculations; I've just hit a mental block with this, and Googling led me into a level of mathiness where people start writing in Sanskrit and using squiggly little variables and I guess I need this dumbed down a bit, English-major style. I don't know if I'm standardizing or normalizing or what, and I don't even have the math skills needed to read blog posts about it. It's all assuming the reader took a statistics class and while I wish I did, I didn't.
I'm working with the data in Excel, so formulas are helpful.
In this case, I'd divide the # of lemons by the population of the town to show lemons per person.
posted by cabingirl at 8:11 AM on July 19, 2016
posted by cabingirl at 8:11 AM on July 19, 2016
As far as this data goes, you are just interested in lemons sold per capita, right?
The store in town A sells 43/10 = 4.30 lemons to each resident.
The store in town B sells 47/100 = 0.47 lemons to each resident.
The store in town C sells 42/50 = 0.84 lemons to each resident.
But I have a feeling that the question you are actually asking is more complicated than that.
posted by dfan at 8:11 AM on July 19, 2016
The store in town A sells 43/10 = 4.30 lemons to each resident.
The store in town B sells 47/100 = 0.47 lemons to each resident.
The store in town C sells 42/50 = 0.84 lemons to each resident.
But I have a feeling that the question you are actually asking is more complicated than that.
posted by dfan at 8:11 AM on July 19, 2016
You could also make a chart where lemons per capita is one axis, and population is the other axis, if you're looking to show a connection between those.
posted by cabingirl at 8:12 AM on July 19, 2016
posted by cabingirl at 8:12 AM on July 19, 2016
I don't know if I'm standardizing or normalizing or what
Based on your (single) example, I think you're normalizing, and the way to normalize a dataset like that is to have a single (shared) scale. In this particular case, it'd be lemons per resident. The way to calculate lemons per resident is to divide the number of lemons by the number of residents in the town.
posted by saeculorum at 8:47 AM on July 19, 2016
Based on your (single) example, I think you're normalizing, and the way to normalize a dataset like that is to have a single (shared) scale. In this particular case, it'd be lemons per resident. The way to calculate lemons per resident is to divide the number of lemons by the number of residents in the town.
posted by saeculorum at 8:47 AM on July 19, 2016
Agree with finding one metric that will give you a single scale. In your case lemons per capita, although you may also need to figure out if you need to control for the fact that the store in the town of 100 people is also competing with a second store and has a customer base of 50 and therefore not actually underperforming.
posted by The Elusive Architeuthis at 11:33 AM on July 19, 2016
posted by The Elusive Architeuthis at 11:33 AM on July 19, 2016
Thank you all. I think I was seriously overthinking it. Dfan is right that the actual scenario is a lot more complicated, but I think just because the scenario is more complicated maybe the solution doesn't have to be.
Thanks everyone for your help.
posted by A Terrible Llama at 3:06 PM on July 19, 2016
Thanks everyone for your help.
posted by A Terrible Llama at 3:06 PM on July 19, 2016
Beware the Small Sample Size Paradox: https://clinicaltrialist.wordpress.com/paradoxes/small-sample-size-paradox/
In a nutshell, if you ask "what towns have the highest lemon sales per capita" you will find it's a small town.
If you ask "what towns have the lowest lemon sales per capita" - also a small town.
posted by soylent00FF00 at 5:22 PM on July 19, 2016
In a nutshell, if you ask "what towns have the highest lemon sales per capita" you will find it's a small town.
If you ask "what towns have the lowest lemon sales per capita" - also a small town.
posted by soylent00FF00 at 5:22 PM on July 19, 2016
Goddamn it. Now I'm confused again.
What I'm looking for is a mathematical way to normalize against market size, over and over again. Not just lemon sales. And the market might have one million people and sell 172 lemons. Once I look at per capita, we start looking at miniscule per capita stats.
If I have ten markets of varying sizes. I want to weight everything they do relative to each other by multiplying somehow. If market A is 30% bigger than market B, when I'm reporting on this I want to take into consideration that market B is smaller than A.
(It's a true market to market comparison in the real world--the starting points are going to range from ten million to 30,000, for example. )
I want to be able to do this over and over.
I'll try using the per capita model, but I'm not sure it's really what I want to do and I'm having trouble articulating why....
posted by A Terrible Llama at 8:31 AM on July 20, 2016
What I'm looking for is a mathematical way to normalize against market size, over and over again. Not just lemon sales. And the market might have one million people and sell 172 lemons. Once I look at per capita, we start looking at miniscule per capita stats.
If I have ten markets of varying sizes. I want to weight everything they do relative to each other by multiplying somehow. If market A is 30% bigger than market B, when I'm reporting on this I want to take into consideration that market B is smaller than A.
(It's a true market to market comparison in the real world--the starting points are going to range from ten million to 30,000, for example. )
I want to be able to do this over and over.
I'll try using the per capita model, but I'm not sure it's really what I want to do and I'm having trouble articulating why....
posted by A Terrible Llama at 8:31 AM on July 20, 2016
Could you just multiply the per capita number by 100 or 1000 or whatever to get lemons per 1000?
Alternatively, maybe equalize the market sizes by multiplying the per capita numbers by the population of your average market? So, for example:
posted by kevinbelt at 11:43 AM on July 20, 2016
Alternatively, maybe equalize the market sizes by multiplying the per capita numbers by the population of your average market? So, for example:
TOWN | POPULATION | LEMONS SOLD | PER CAPITA LEMONS | AVERAGE POPULATION | LEMONS PER AVG POP | A | 10 | 43 | =c1/b1 [4.3] | =(sum(b1:b3))/3 [53.33] | =d1*e1 [229.3] | B | 100 | 47 | =c2/b2 [0.47] | =(sum(b1:b3))/3 [53.33] | =d2*e2 [25.11] | C | 50 | 42 | =c3/b3 [0.84] | =(sum(b1:b3))/3 [53.33] | =d3*e3 [44.8] |
posted by kevinbelt at 11:43 AM on July 20, 2016
A Terrible Llama, with your updated description it still sounds to me like the per capita model is the natural one. If market A is 30% bigger than market B, then when you do the division you will be dividing market A's results by 30% more than you divide market B's results by, which will normalize the numbers as you desire. If you don't like how small the numbers are, just multiply them by 1000 or something as kevinbelt suggests.
posted by dfan at 6:56 AM on July 25, 2016
posted by dfan at 6:56 AM on July 25, 2016
« Older Domestic cleaning agencies in central London... | Vacation recommendations for Corsica Newer »
This thread is closed to new comments.
posted by kevinbelt at 8:10 AM on July 19, 2016