August 26, 2009 5:31 PM

How to calculate the stats for the "Average" US citizen? And does it mean anything?

Recently there has been news of a US study stating that the "average gamer" is 35, overweight and depressed. My first joking reaction was to say "hey, this is like the average US citizen!". Please don't hit me. But I would like to know how far this "average gamer" is from the "average USian", and how you would go about calculating it.

Also, as a side question, I don't think that you can average people in any meaningful way. If the distribution is normal, the average fits the median, which does mean something. If the median gamer is 35, that means 50% of all gamers are over 35, and 50% are under 35. Wow, games are really not just for kids anymore, etc.

But that's the median. Are the details for the "average anything" useful to anyone, or is the "average person" just lazy rhetorical shorthand for "there's many of them"?

I couldn't find the original article but msnbc says the study was conducted "in the Seattle-Tacoma area." Isn't everyone depressed in Seattle?

posted by caelumluna at 5:47 PM on August 26, 2009

Here is the survey: Health-Risk Correlates of Video-Game Playing Among Adults

Note in particular that "a total of 45.1% of respondents reported playing video games". So this is not a small subgroup of the population. Also that results about higher BMI and being more depressed were sex-specific: the male players were fatter, while the female players were more depressed. And that it only surveyed 18+, but looking at adults was the point so that's not surprising — it just was misreported.

posted by smackfu at 5:57 PM on August 26, 2009

This may be a bit simple, but Wikibooks has a pretty good overview of the different types of statistical analysis commonly used to analyze data. Here's the link.

posted by Aanidaani at 6:20 PM on August 26, 2009

According to the latest census estimates, the average (mean) age is about 37 years old. The median is about 36 years old. The mode (most common) age is 49 -- those gosh-darned baby-boomers.
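
To make the three measures concrete, here is a minimal Python sketch using the standard `statistics` module. The `ages` list is made up for illustration and is not census data; it is only shaped so that the mode sits well away from the mean and median.

```python
import statistics

# Hypothetical ages, invented for illustration (not census data).
ages = [2, 15, 23, 31, 36, 36, 41, 49, 49, 49, 58]

mean_age = statistics.mean(ages)      # sum of the ages divided by the count
median_age = statistics.median(ages)  # middle value once the ages are sorted
mode_age = statistics.mode(ages)      # most frequent value

print("mean:", round(mean_age, 1))
print("median:", median_age)
print("mode:", mode_age)
```

All three are legitimate "averages"; which one a news story means is often left unsaid.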

*But that's the median. Are the details for the "average anything" useful to anyone, or is the "average person" just lazy rhetorical shorthand for "there's many of them"?*

Could you be more specific? You can certainly calculate means, medians, and modes whenever you see fit.

posted by mhum at 6:34 PM on August 26, 2009 [1 favorite]

Erm, yeah, you can. It's called demographics and it's a whole field of social science.

You can use the census to calculate the average age of the population. The census doesn't ask about weight, but I am sure there are plenty of medical studies available online.

There's no such thing as "average race" or "average gender" because race and gender are nominal variables. All you can do is say that most Americans are white, and a slight majority are female. Depressed/not depressed is likely to be measured in a nominal fashion rather than an ordinal, so it also can't be averaged, although if a study asked you to rate your mood on a scale of 1-5, you could produce an average result - for people in that study.
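
A small Python sketch of that distinction, assuming made-up survey responses (the `genders` and `mood_ratings` lists are hypothetical): for a nominal variable you can only count categories, while a 1-5 rating can be averaged for the people who answered.

```python
from collections import Counter
import statistics

# Hypothetical survey responses, invented for illustration.
genders = ["female", "male", "female", "female", "male"]  # nominal variable
mood_ratings = [3, 4, 2, 5, 3]                            # ratings on a 1-5 scale

# A nominal variable has no mean; the most you can report is the counts
# and the most common category.
gender_counts = Counter(genders)
most_common_gender, _ = gender_counts.most_common(1)[0]

# A numeric rating can be averaged, but only for the people in this sample.
mean_mood = statistics.mean(mood_ratings)

print("counts:", dict(gender_counts))
print("most common category:", most_common_gender)
print("mean mood rating:", mean_mood)
```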

posted by desjardins at 6:39 PM on August 26, 2009

Well, technically, you're averaging the statistics related to the people. So, when someone says "the average American is around 36 years old", that really means "the average age of Americans is around 36 years". I'm not sure what the objection is here. Granted, some statistics are categorical (e.g., gender) and don't lend themselves to averaging. It doesn't make sense to think of an actual person as the "average American" because then you'd end up with some kind of 49% male, 51% female hermaphrodite (or otherwise intersexed) individual.

On preview: What desjardins says.

posted by mhum at 6:46 PM on August 26, 2009

Average is tricky. It is somewhat useful for normally distributed data, and less so for skewed data.

An example of normally distributed data would be the heights of American males. Adding Shaquille O'Neal's height to a randomly selected sample of 30 American males' heights won't cause the average to change much. These data are not skewed.

The distribution of Americans' net worth, however, is very skewed, so adding Bill Gates' net worth to a randomly drawn sample of 30 Americans' net worth will change the average a great deal.

In short, the average of anything is limited in its utility.
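
Here is a rough Python sketch of the two cases. Every number in it is an assumption: the 30 heights and 30 net worths are invented, 85 inches is roughly a 7'1" player, and 50 billion dollars is a round stand-in rather than a sourced figure. The helper `mean_shift` is a name made up for this example.

```python
import statistics

def mean_shift(sample, extra):
    """Return how far the mean moves when `extra` joins `sample`."""
    return statistics.mean(sample + [extra]) - statistics.mean(sample)

# Hypothetical samples of 30 people each, invented for illustration.
heights_in = [67 + i % 7 for i in range(30)]           # 67 to 73 inches
net_worths = [50_000 + 5_000 * i for i in range(30)]   # 50k to 195k dollars

height_shift = mean_shift(heights_in, 85)              # add one very tall man
worth_shift = mean_shift(net_worths, 50_000_000_000)   # add one billionaire

# The median barely notices the billionaire.
median_shift = (statistics.median(net_worths + [50_000_000_000])
                - statistics.median(net_worths))

print("mean height moves by (inches):", round(height_shift, 2))
print("mean net worth moves by (dollars):", round(worth_shift))
print("median net worth moves by (dollars):", median_shift)
```

The mean height moves by well under an inch, while the mean net worth jumps by more than a billion dollars and the median net worth hardly changes.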

posted by dfriedman at 6:47 PM on August 26, 2009 [2 favorites]

If you want descriptive statistics on the population, in addition to what the census collects there are many large surveys which take elaborate precautions to be as representative as possible. Example: NORC's page on the General Social Survey. They have many other survey projects. Another example would be the Behavioral Risk Factor Surveillance System from the CDC.

If you want to learn how surveys are implemented and analyzed to get valid representations of the population, I recommend Sampling Techniques by Cochran and Survey Sampling by Kish.

posted by a robot made out of meat at 7:00 PM on August 26, 2009 [1 favorite]

There are three things you need to consider on this sort of thing: Normal Distribution, the Mean or Average and Standard Deviation.

If your data set is normally distributed, the standard deviation tells you how widely it varies. Let's say we want to determine the health of a pond and we're going to weigh a bunch of frogs from each of two ponds. If you take a bunch of measurements at your pond and get 8, 8, 9, 9, 9, 10, 10, 10 oz., and I take a bunch of measurements in a different pond and get 8, 8, 8, 9, 9, 10, 10, 10 oz., then on average my results are less than yours.

So my pond is less healthy than yours, right? Not so much, since my average plus or minus my standard deviation has huge overlap with your mean plus or minus standard deviation. If we went out and caught more frogs the next day the odds are probably close to even that my day two average would be higher.
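
The frog numbers can be checked with a few lines of Python. This is only a sketch of the rough "mean plus or minus one standard deviation" comparison described above, not a formal significance test; it assumes the sample standard deviation (`statistics.stdev`), and `mean_and_band` is a helper name invented for the example.

```python
import statistics

# Frog weights in ounces, taken from the example.
your_pond = [8, 8, 9, 9, 9, 10, 10, 10]
my_pond = [8, 8, 8, 9, 9, 10, 10, 10]

def mean_and_band(weights):
    """Return the mean and the (mean - sd, mean + sd) band for a sample."""
    m = statistics.mean(weights)
    s = statistics.stdev(weights)  # sample standard deviation
    return m, (m - s, m + s)

your_mean, your_band = mean_and_band(your_pond)
my_mean, my_band = mean_and_band(my_pond)

# Two bands overlap if the larger lower edge is below the smaller upper edge.
overlap = max(your_band[0], my_band[0]) < min(your_band[1], my_band[1])

print("your pond:", your_mean, [round(x, 2) for x in your_band])
print("my pond:", my_mean, [round(x, 2) for x in my_band])
print("bands overlap:", overlap)
```

The means differ by an eighth of an ounce, while each band is more than an ounce and a half wide.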

posted by Kid Charlemagne at 7:24 PM on August 26, 2009 [2 favorites]

dfriedman: *In short, the average of anything is limited in its utility.*

You do bring up a good point about the effect of outliers and skewed distributions on the average (I'm assuming you're talking about the mean). Of course, *everything* is limited in its utility. The key is figuring out which things are applicable when.

Consider the following contrived example: Suppose you had a wager which had a 99% chance of a $1 gain and a 1% chance of a $100 loss. Both the median and mode outcomes are a $1 gain, but the mean outcome is a $0.01 loss. Which of these is most important to your evaluation is ultimately a judgment call. If this were just a one-time deal, maybe you'd be most interested in the most likely outcome. If, on the other hand, you were making a series of such wagers, you might be more interested in the mean outcome.
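
For anyone who wants to check the arithmetic, here is a minimal Python sketch of that wager. The payoffs and probabilities come from the example; the simulation of 100,000 plays and the seed are arbitrary choices added for illustration.

```python
import random
import statistics

# The wager from the example: (payoff in dollars, probability).
outcomes = [(1.00, 0.99), (-100.00, 0.01)]

# Mean outcome: each payoff weighted by its probability.
expected_value = sum(payoff * prob for payoff, prob in outcomes)

# Mode: the single most likely payoff.
most_likely = max(outcomes, key=lambda o: o[1])[0]

# Simulate many independent plays; the seed only makes reruns repeatable.
rng = random.Random(0)
payoffs = [payoff for payoff, _ in outcomes]
probs = [prob for _, prob in outcomes]
plays = rng.choices(payoffs, weights=probs, k=100_000)

print("expected value per play:", round(expected_value, 4))
print("most likely payoff:", most_likely)
print("median of simulated plays:", statistics.median(plays))
print("mean of simulated plays:", round(statistics.mean(plays), 4))
```

The expected value works out to 0.99 minus 1.00, a one-cent loss per play, even though the typical single play wins a dollar.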

Returning to demographics, a quick glance at the data shows that it's not really outliers or skew per se which are affecting the different measures of "average" age. It's the baby boomers (and their children and grandchildren) making a kinda lumpy age distribution. So, while the mean and median are quite close (indicating a relatively small degree of skew), the mode is quite a bit removed from those other two measures. This leads to an interesting tidbit: while the average (both mean and median) age is in the mid-thirties, if you were forced to guess the age of a randomly-selected American (including babies), you should pick 49. You'd be right only 1.5% of the time, but if you said 36 or 37, you'd only be right 1.3% of the time. Even if you allow for a +/- 5-year window, you'd still guess 49, but now you'd have a 16.1% chance of being right vs. a 14.6% chance of being right if you picked 36.

posted by mhum at 7:29 PM on August 26, 2009

This is just wrong. For any variable, you can add up the variable over all people and divide by the number of people. This is an average.

It is quite possible that no single observation in the sample or population will take on the average value, or even that it would be impossible for any single observation to do so (i.e., almost anything that is a count). That has no impact on the ability to calculate the value, or the meaningfulness of the value.

An average is useful to anyone who wants a measure of central tendency that is related to the total sum of the thing in question, or to anyone who wants to deal in expected values of statistics from that population.

If you want to know how big a school you should plan to build for a growing community, knowing that the families already there have 2.38 children on average is amazingly useful because you can then compute a confidence interval around how many children a new subdivision of 1248 homes will introduce into the school system. Even though no single family will ever have 2.38 children.
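
A back-of-the-envelope version of that calculation in Python. The 2.38 children and 1248 homes are from the example; everything else is an assumption made for the sketch: one family per home, families independent of each other, a per-family standard deviation of 1.5 children (a made-up figure), and the 2.38 treated as exactly known. A proper interval would also account for the uncertainty in the 2.38 itself.

```python
import math

homes = 1248           # from the example
mean_children = 2.38   # from the example
assumed_sd = 1.5       # hypothetical per-family standard deviation

# Expected number of children the subdivision adds.
expected_total = homes * mean_children

# For n independent families, the standard deviation of the total
# is the per-family standard deviation times sqrt(n).
total_sd = assumed_sd * math.sqrt(homes)

# Rough 95% interval using the normal approximation.
z = 1.96
low = expected_total - z * total_sd
high = expected_total + z * total_sd

print("expected children:", round(expected_total))
print("rough 95% interval:", round(low), "to", round(high))
```

Under these assumptions the planner is looking at roughly three thousand children, give or take about a hundred.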

posted by ROU_Xenophobe at 7:52 PM on August 26, 2009 [1 favorite]

Well, firstly...here's the problem with studies concerning video games: Only a handful are carried out rationally by scientists looking to study something, rather than think tanks looking to prove something (of which plenty of doctors, scientists and lawyers can be involved, but with an agenda). There's an important difference there. That's not the only problem, however. Probably the biggest problem is that very few who study video games have played them. In the multimedia leg of my double major degree, everyone had played video games. In the media studies leg of it, probably half had and certainly fewer girls had; however, it was this side that wrote, theorized and studied video games. See a disconnect? It's all too common. So, first of all, this is maybe a bad study to try to base your understanding of research in the United States on.

I think what you want to know is how well the relatively small sample of 562 people in *one* area can be transposed onto the general U.S. population from all 50 states. Personally, I think it'd be difficult *with this study*, because it's *not based on objective fact*. It's based on self-reporting (even the BMI! WTF?), which already has many problems, but particularly when you start getting into hazy areas like depression, where probably a lot of people are inappropriately self-diagnosing, to begin with.

I think transposing small sample studies (and yes 562 people in a single area is a small sample for a country the size of the U.S.) onto the entire U.S. population is one of the biggest issues with national and international media concerning the United States. This fuels so much misunderstanding that I don't even know where to begin, so I'm not even going to try. While you can find averages, the reasons behind them are often lost when study samples are small or even consist of only one region, particularly survey studies where the results aren't objective, unlike what ROU_Xenophobe is talking about, which *is* objective. What they're referring to is purely statistical and mathematical; it's much less erroneous.

In some cases, there is common ground between regions, so as to warrant a generalized (average) statement about the country, as in the case of birth or death rates. However, in the foggier, self-reporting studies, I find generalized statements aren't very helpful at all. Very few of these studies look at other important, objective facts, like economic and/or job status, race, real (non-self-reported) medical conditions. In other words, it's pretty questionable *what* these sorts of studies prove; unfortunately, shoddy studies are much more abundant than quality studies, even from "reliable" institutions.

In the case of this particular research, I imagine job status would be the most important factor to the results, as you are probably more likely to play games, watch TV, browse the Internet and, I don't know, *feel depressed* if you've been fired and haven't been able to find a job. Race is important, too. So is the fact that this was an anonymous Internet survey, a fact which has many positives and negatives going for it.

Yes, *some* averages are helpful and truthful. Others actually don't tell us much at all. **Each study you encounter should be evaluated for its individual worth. Use a healthy dose of skepticism with a side of Google fu, and you'll be right as rain.**

posted by metalheart at 9:12 PM on August 26, 2009

Thanks for the answers, everyone. However, one clarification.

When I said "you can't average *people* in any meaningful way", I didn't imply statistics about humans are bunk. Quite the opposite. You can average (in the sense of "calculating the arithmetic mean") magnitudes like people's height, salary, etc. and draw useful conclusions from them.

However, the "average person" whose attributes are the average of the attributes of a given population still seems to me some kind of journalistic strawman. That's what I meant.

posted by kandinski at 2:47 AM on August 27, 2009

"The average person" is just nontechnical shorthand (or "lazy rhetorical shorthand") for "the expected value," even if the journalists reporting the result themselves don't understand that.

posted by ROU_Xenophobe at 8:52 AM on August 27, 2009

This thread is closed to new comments.

The idea that you can't average people in any meaningful way would come as a surprise to a lot of scientists and mathematicians.

posted by box at 5:38 PM on August 26, 2009