# I took statistics, and I learned I'd better go to law school. February 7, 2008 5:14 PM

What does it mean when they "adjust" statistics?

I promise I've tried to find the answer, but I don't understand what it means to "adjust" in statistics. Particularly since it seems to happen in connection with studies where the answer seems obvious.... Like, I'm always reading articles that say something like, "People Who Consume Champagne and Caviar Are Less Likely To Face Foreclosure!" That seems like a dumb conclusion, but then it'll always say.... "even adjusted for socioeconomic status..." Does "adjusted" mean to throw certain data out? Is it like handicapping the data? Why don't they just do the study on a specific group that *doesn't* include the adjusted people and title it accordingly? Etc.
posted by moxiedoll to science & nature (24 answers total) 1 user marked this as a favorite

I'm assuming it means they "normalize" all their data, compensating for any factors they're not studying. They're bringing everyone to the same level, comparing apples to apples, if you will. How this is done depends on the study, of course.

In your example, this would mean that Champagne and Caviar consumers are less likely to face foreclosure than non-CnC consumers, even if all else (socioeconomic status, in this case) is equal. The conclusion would be that banks are biased, or they manage money better, or something.

Hope that helped; I get the feeling that I completely missed the mark on what you were asking, though.
posted by wsp at 5:24 PM on February 7, 2008

They're not throwing the data out, they're just comparing apples to apples. On its face, "people who consume champagne and caviar are less likely to face foreclosure" is obvious, because people who consume champagne and caviar are more likely to be wealthy, and wealthy people are less likely to face foreclosure.

If this is true even when "adjusted for income level," that means they've compared people at the same income level who consume champagne and caviar to those who don't, in which case the result might be surprising after all - why would someone who makes $50K/yr and consumes champagne and caviar be less likely to face foreclosure than someone who makes $50K/yr and doesn't? And they're not just picking a single income level, they're doing the comparison at all income levels. If you're interested in the specific techniques, Google a bit on things like "multivariate statistics" or "multivariate analysis."
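To make that concrete, here's a toy sketch of the within-bracket comparison in Python - every number and group label below is invented for illustration:

```python
# Compare champagne-and-caviar (C&C) consumers only to non-consumers in the
# SAME income bracket, rather than to the whole population.
from collections import defaultdict

# (income_bracket, consumes_cc, faced_foreclosure) -- all made up
people = [
    ("50K", True, False), ("50K", True, False), ("50K", True, True),
    ("50K", False, True), ("50K", False, True), ("50K", False, False),
    ("100K", True, False), ("100K", False, False), ("100K", False, True),
]

rates = defaultdict(lambda: [0, 0])  # [foreclosure_count, group_size]
for bracket, cc, foreclosed in people:
    rates[(bracket, cc)][0] += foreclosed
    rates[(bracket, cc)][1] += 1

# Within the 50K bracket alone: consumers foreclosed 1/3, non-consumers 2/3.
# Any difference left at a fixed income level can't be blamed on income.
```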
posted by DevilsAdvocate at 5:25 PM on February 7, 2008 [1 favorite]

"Adjusting" is controlling for some other correlation in order to better show the correlation that a study is pursuing. For instance, let's say you had a study that showed X correlated to Z but Y influenced the relationship in some way. Adjusting factors out the influence that Y may have on the relationship in order to show the correlation between X and Z.
posted by Inspector.Gadget at 5:26 PM on February 7, 2008

Imagine that you have a study. There are two independent variables (let's assume they are causes for now) and one dependent variable (let's call it the effect). For example:

A study looking at income based on height and weight. It is found through this study that being taller ends up making you more money (it comes out to some number of dollars per additional inch from average). Let's say that it turns out that weight has some effect too. If someone wants to quote the effect of weight, they might want to statistically remove the effect of height from the number. To do that, they would "adjust" the resulting incomes such that all people were statistically the same height (they make tall people have lower incomes, and short people have higher incomes, proportional to their real height). Now you have effectively "solved for" height, and weight is the only remaining independent variable.

Does that make sense?

They just try to remove the effect of one thing, in order to get what might be a better picture of another.
posted by milqman at 5:30 PM on February 7, 2008 [1 favorite]

If someone wants to quote the effect of weight, they might want to statistically remove the effect of height from the number. To do that, they would "adjust" the resulting incomes such that all people were statistically the same height (they make tall people have lower incomes, and short people have higher incomes, proportional to their real height). Now you have effectively "solved for" height, and weight is the only remaining independent variable.

How do they do this? And doesn't doing this (changing the incomes to "adjust" for one variable) kind of assume what they're trying to prove and mess up the results?
posted by moxiedoll at 5:41 PM on February 7, 2008

AFAIK "adjusted" isn't a reserved word in statistics, it has its ordinary English meaning, and so may mean a variety of things, depending on the circumstances and motivations. At the simplest level, I would take it as meaning "throwing out obvious sampling errors". For example, if we have a machine attached to the automatic door of a bus that photographs each person who steps in and measures their height, we'll throw out 8" (daschund on a lead) and we'll throw out 13'7" (reflective metal decals on a surfboard).

It would commonly also mean disposing of outliers whose presence is irrelevant to the sample, for instance, if our height device is intended to measure primary school children bus travellers, we'll take teachers and the driver and other adults out of the samples. We might also take out Charlie, who has a pituitary problem and was held back four grades, thus being the only child in grade 4 to be over seven feet tall.

As for adjusting for socioeconomic status etc, what that's about is dependent on the thing being measured. For instance, if we're interested in measuring dietary effects on children's weight, it's extremely important that we adjust for the age of the child, and moderately important that we adjust for the height of the child. Thus for each child we start with a dietary profile and three numbers: age and weight and height. We have a set of tables of optimal (or at least average, over millions of children) weight for each age and height. So we find each child's age and height on the chart, and record the individual's difference from the optimal weight. Probably as a multiplier, rather than an addition or subtraction. Now we can sensibly compare the dietary profiles to the weight of the children, despite their different ages and heights.

Where this is important socioeconomically may be in terms of, say, dental health and diet. People with higher disposable income have better dental health, obviously, regardless of their diets. So what we need to measure is, in the absence of economic mitigation, how does the person's diet affect the health of their teeth? So we take the average "teeth health profile" for a person of their age and socioeconomic status and whatever else is relevant, and compare theirs to that.

The basic idea of statistical analysis is, we want to be varying only one factor. So we take that factor, adjust all of the other factors back to an average effect on that factor, and then measure the difference. Because socioeconomic status affects so many things, adjusting for it is something that has to be done a lot, in statistical analysis.
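If it helps, here's that age-and-height adjustment as a little Python sketch - the reference weights are made up for illustration, not taken from any real growth chart:

```python
# Express each child's weight as a multiple of the "expected" weight for
# their age and height, so children of different sizes become comparable.

# Hypothetical reference table: expected weight in kg by (age, height in cm)
expected_weight = {
    (8, 130): 28.0,
    (9, 135): 31.0,
    (10, 140): 34.0,
}

children = [
    {"name": "A", "age": 8, "height": 130, "weight": 31.0},   # above expected
    {"name": "B", "age": 10, "height": 140, "weight": 30.6},  # below expected
]

for child in children:
    ref = expected_weight[(child["age"], child["height"])]
    # The multiplier suggested above: 1.0 means exactly the reference weight
    child["adjusted"] = child["weight"] / ref
```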
posted by aeschenkarnos at 5:41 PM on February 7, 2008

Another issue, when conducting opinion polls for example, is the different chance of catching people in different demographic groups when you ring them up to poll them.

For example, you might be conducting a poll to work out whether the public will vote for the Pinkos or the Brownshirts in an upcoming election, so you phone a few thousand people in the evening to ask them their opinion.

However, you know that more young people tend to vote for the Pinkos, on average, than the general population. And young people tend to be more likely to have jobs in the evening, or to be away from home in the evening when you call, than the typical Brownshirt voter who is more likely to work a 9 to 5 job then come home to dinner.

Therefore, the results of your phone poll may indicate enhanced support for the Brownshirts. But you can then adjust these numbers, based on what you know about the demographic differences you're dealing with, to take into account the fact that a lot of the Pinko voters may have missed your phonecalls.
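Roughly, that reweighting looks like this in Python - all of the shares and support numbers below are invented:

```python
# Post-stratification sketch: the population is (say) 30% young / 70% old,
# but the evening phone sample came back only 15% young because young voters
# were out.  Weight each group by its population share instead of its share
# of the sample.

population_share = {"young": 0.30, "old": 0.70}   # from census data, say
sample = {"young": 150, "old": 850}               # respondents reached
pinko_support = {"young": 0.60, "old": 0.40}      # support within each group

total = sum(sample.values())
# Naive estimate: weight each group by its share of the SAMPLE
raw = sum(sample[g] / total * pinko_support[g] for g in sample)

# Adjusted estimate: weight each group by its share of the POPULATION
adjusted = sum(population_share[g] * pinko_support[g] for g in sample)

# raw comes out around 0.43, adjusted around 0.46 -- the missed Pinko
# voters are put back in.
```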
posted by Jimbob at 5:44 PM on February 7, 2008 [1 favorite]

Another example - I often hear unemployment figures quoted as being "seasonally adjusted". What this means is, they know there are certain times of the year when there is more employment available - for example, retail jobs over the Christmas season, or fruit-picking jobs at harvest time. The actual unemployment data is then adjusted based on these known seasonal patterns to give you an idea of unemployment on the whole.
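A crude sketch of that in Python - the seasonal factors here are invented, whereas real statistical agencies estimate them from many years of data:

```python
# Divide each month's raw unemployment rate by the typical seasonal factor
# for that month, so the seasonal swing is removed.

seasonal_factor = {"Nov": 1.00, "Dec": 0.95, "Jan": 1.05}  # hypothetical
raw_rate = {"Nov": 5.00, "Dec": 4.75, "Jan": 5.25}         # hypothetical %

adjusted = {m: raw_rate[m] / seasonal_factor[m] for m in raw_rate}
# All three months come out at 5.0 once the Christmas-retail dip and the
# January spike are divided out.
```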
posted by Jimbob at 5:46 PM on February 7, 2008

Therefore, the results of your phone poll may indicate enhanced support for the Brownshirts. But you can then adjust these numbers, based on what you know about the demographic differences you're dealing with, to take into account the fact that a lot of the Pinko voters may have missed your phonecalls.

OK BUT HOW do they do that without making something up?
posted by moxiedoll at 5:53 PM on February 7, 2008

In that context, I would understand the study to be saying that they know there is a correlation between socioeconomic status (SES) and C&C consumption. However, suppose you statistically adjust for that effect (in other words, knowing someone's SES already predicts a certain amount of foreclosure), and you find that people who consume C&C have less foreclosure than their SES would predict, while people who don't consume C&C have more foreclosures than their SES would predict. That means you have a relationship between consumption and foreclosure over and above the obvious relationship between SES and foreclosure.

However, I'm not sure that "adjusted" is the right word for that process. I would probably use the phrase "controlling for" SES. Usually I think of adjusted statistics as things like unemployment data where they do a very rough estimate and then adjust it when they have more (better) data. Hopefully, someone with a better statistical background can help with the definitions.
posted by metahawk at 5:54 PM on February 7, 2008

Let's say that the average height of a worker is found to be 5'6". Every inch above that, statistically speaking, is found to result in approximately $500 extra per year. So, someone who is 5'10" might statistically make $2000 more per year than someone who is 5'6".

When they go to "adjust" things for height, they are going to subtract \$500 from a person's income for every inch they are above 5'6" (and add that much per inch for the shorties). Then, they will examine the effect of the other variable on the new adjusted income.

This can be useful in a number of ways. If one effect is way bigger than the other, adjusting it out is often a good way to "focus in" on the smaller effect (because it just looks like noise compared to the magnitude of the larger effect). Also, if the two independent variables are linked (height and weight, for example) then researchers often want to determine the effects of just one by itself. In order to do that, you need to "correct" for it and do the analysis with that variable solved/constant.
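In Python, using the numbers above (the three sample people are invented):

```python
# Subtract the statistical height effect from each income, so what's left
# reflects everything *except* height.

AVG_HEIGHT = 66   # 5'6" in inches
PER_INCH = 500    # dollars of income per inch above average (from the example)

people = [
    {"height": 70, "income": 52_000},  # 5'10"
    {"height": 66, "income": 50_000},  # 5'6"
    {"height": 62, "income": 48_500},  # 5'2"
]

for p in people:
    # Tall people get income subtracted, short people get income added
    p["adjusted_income"] = p["income"] - (p["height"] - AVG_HEIGHT) * PER_INCH

# Here all three adjusted incomes land on 50,000: once height is removed,
# these three people look identical, and any leftover spread would be due
# to the other variables.
```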

Is that any better?
posted by milqman at 6:02 PM on February 7, 2008

OK BUT HOW do they do that without making something up?

Because of previous sorts of polling they would have done.

More serious, longer, pre-election polling (aggregated over many years) may show that young people have 60% support for the Pinkos, while old people only have 40% support for the Pinkos.

Data they have on the working habits, social habits, phone ownership of young people might let them know that they were 30% less likely to phone a young person during the survey than an old person.

After that they can do the math; more or less, add another 30% on top of the poll numbers they have, and have that extra 30% biased 20% towards the Pinkos.

Of course, the actual demographic calculations and adjustments they do are a lot more complex than that. You may notice, looking at political opinion polls, that there is quite a wide spread in the numbers they come up with, and some polls consistently swing one way or the other. And you might exclaim "They're biased!". You would be correct - every polling company has their own demographic data, their own secret formulas and adjustments they perform, their own surveying technique. But, hopefully, if one pollster consistently predicts incorrect results, they will take a look at their data and adjust their polls differently in the future.
posted by Jimbob at 6:08 PM on February 7, 2008

To answer the "how" bit. They use something called econometrics, which allows you to try to assess the impact of variables on another.

This gets complicated fast, but roughly: say you want to work out the determinants of income. You then regress income on a bunch of other variables that you think (for prior reasons) determine it. This regression is usually done using specialised computer software such as EViews:

Income = constant + a*(sex) + b*(age) + c*(education) + .... + (error term)

You then get values for a, b, c, ... and standard errors for each of them. You can then interpret (say) c as being the impact of education on income "adjusting" for the other terms.

Health warnings: This is simplified. There are a bunch of things that can go wrong with this such as autocorrelation, heteroskedasticity and omitted variables, but this is more or less what they do.
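If you'd rather see it in code than in EViews, here's roughly the same regression sketched with NumPy's least-squares solver. The data is entirely made up and far too small for a real study; it's only meant to show the mechanics:

```python
import numpy as np

# Columns: sex (0/1), age in years, years of education -- all invented
X = np.array([
    [0, 25, 12],
    [1, 40, 16],
    [0, 35, 14],
    [1, 50, 12],
    [0, 45, 18],
    [1, 30, 16],
], dtype=float)
income = np.array([30_000, 60_000, 45_000, 50_000, 70_000, 48_000],
                  dtype=float)

# Add a constant column, then solve
#   income = constant + a*sex + b*age + c*education + error
design = np.column_stack([np.ones(len(X)), X])
coefs, *_ = np.linalg.lstsq(design, income, rcond=None)
const, a, b, c = coefs

# c is the estimated effect of one extra year of education on income,
# "adjusting" for sex and age -- the other columns soak up their share.
```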
posted by TrashyRambo at 6:12 PM on February 7, 2008

Let's say that they determine SES to be on a scale from 1 to 100, 100 being Bill Gates and 1 being homeless. The researchers also find that for each additional SES "point" the chance of foreclosure goes down by .25%.

So, when they do the adjusting: for every additional SES point that someone has over some blanket average, their adjusted chance of foreclosure is .25% more (than it is in reality).

Now you can isolate the effect of Caviar on foreclosure chance. The data you talked about indicates that, even with that "adjustment" handicap, people who eat caviar are _still_ at lower risk for foreclosure.

Make sense?
posted by milqman at 6:18 PM on February 7, 2008

One factor that data are commonly adjusted for is inflation.

If we look at, say, average income in the US now vs. in 1950, we would find that people earn far, far more now than they did in 1950. To compare average income in 2008 vs. average income in 1950 and have it mean anything as far as what people are able to do with that income, we need to adjust for inflation. To do this, we need some standard (here, the consumer price index might be appropriate) that we can apply to all instances of the data. This is what's going on when you see graphs of something "adjusted for 1998 dollars".
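For instance (the index values below are approximate illustrations, not official CPI figures):

```python
# Convert a 1950 nominal income into "2008 dollars" by scaling with the
# ratio of the price index in the two years.

cpi = {1950: 24.1, 2008: 215.3}  # rough index values, for illustration only

income_1950 = 3_300  # nominal dollars earned in 1950
in_2008_dollars = income_1950 * cpi[2008] / cpi[1950]
# Comes out to roughly $29,500 -- comparable to a 2008 income, unlike the
# raw $3,300 figure.
```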
posted by yohko at 6:43 PM on February 7, 2008

Ok. I think I get it. Ok. So.... let's pretend, for purposes of my example, that for the most part, only rich ladies with high income partners are stay at home moms. There are some rare non-rich ladies who stay at home and there are plenty of rich ladies who don't. So if I read an abstract of a study which concluded that "teens whose moms stayed at home have higher SAT scores than kids with working moms, even adjusted for family income...."
It COULD mean that they only looked at rich families, and compared SAHM families and non-SAHM families *at the same income levels* and found a difference....
OR
It COULD mean that they compared all kinds of families and did the above explained math hoodoo to find that the result carries across the whole population...
but we wouldn't know, just from the word "adjusted", whether they only looked at people who matched the obvious high income group, or not.
Is that accurate?
posted by moxiedoll at 7:05 PM on February 7, 2008

Yes. Adjusted can mean many, many things, but in the actual paper published from the research they would indicate the methods they used.
posted by Jimbob at 7:16 PM on February 7, 2008

In research papers, it would almost certainly be your second alternative. They have a broad sample, and found that after statistically controlling for family income, there is still a relationship between staying at home and SAT scores.

Using the phrase "even adjusted for income" implies that this was statistically controlled. Of course, the full paper would indicate this using words like 'covariate' to indicate the variables that were statistically controlled for.
posted by i love cheese at 9:05 PM on February 7, 2008

Seconding Jimbob's point above, for the actual paper to be peer reviewed and scientifically valid, the raw data and information relating to collection of it etc would need to be available for other researchers to go through.
posted by aeschenkarnos at 10:19 PM on February 7, 2008

I would say that there are 2 meanings, though there's a spectrum between them.

1) Corrected for a well known effect. This could be inflation (a dollar now is worth less than a dollar 50 years ago), season (housing sales data is typically seasonal; they know that on average there are fewer homes sold in the winter than in the summer, so a fudge factor is often included), etc. In this context, 'adjusted' means that the study authors have a belief about how the factor impacts what you are interested in, and include a fudge factor to make this impact go away.

2) It can also mean that the factor that is being adjusted for is taken into account in the model. The difference is that unlike 1), the authors don't impose a pre-defined fudge factor, but rather let their model take care of things.

From the reader's perspective, 'adjusted' typically means that the authors remembered about the factor and ask the reader to trust their handling of it. More details tend to be specified in scientific papers, etc.

In your case above about the rich ladies, I would suspect that 'adjusted' means that they applied the correction. If they only looked at rich families, they could use a word like 'controlled' (though in some cases it can mean the same as adjusted ... sorry), or would explicitly say "at a fixed income level, stay at home moms tend to ..."

As a note, missing variables that aren't corrected for can lead to terrible conclusions, and are at the root of many bad statistics. E.g., Volvos are very safe -- but when you control for the fact that Volvo drivers tend to be safety-conscious (that's why they bought the car!), the safety benefit becomes less significant. Whenever you hear of the latest study that promotes some absurd claim (X causes Y), always look for a third factor Z that could be correlated with both X and Y and would make the conclusion invalid.
posted by bsdfish at 2:22 AM on February 8, 2008

Multiple Linear Regression
posted by tiburon at 6:46 AM on February 8, 2008 [1 favorite]

"Seconding Jimbob's point above, for the actual paper to be peer reviewed and scientifically valid, the raw data and information relating to collection of it etc would need to be available for other researchers to go through."

This is not necessarily correct. In many fields peer review proceeds only on the basis of the presented manuscript with no additional access to raw data.
posted by roofus at 1:06 PM on February 8, 2008

This is not necessarily correct. In many fields peer review proceeds only on the basis of the presented manuscript with no additional access to raw data.

This is true in practice, but if a reviewer for Nature, say, felt that it was necessary to view the raw data and asked for it giving good reasons, and was told "it's not available" without very good reasons, I wouldn't hold out much hope for the paper.
posted by aeschenkarnos at 4:15 PM on February 11, 2008

Thanks so much! This was incredibly helpful.
posted by moxiedoll at 7:27 PM on February 11, 2008
