# Can you please help me understand this stuff?

July 12, 2007 6:20 AM

[IHaveAnExamTodayFilter] I am a graduate student in social work. Today I have a midterm that will include questions on a few things I just cannot understand despite lectures, notes and a glossary. They include **confidence level, confidence interval, standard deviation and margin of error**. Can you explain this stuff to me, preferably in relation to one another?

My brother tried to explain the above concepts to me in the context of his background which is biology and statistics. Unfortunately that doesn't help me a bit with what I need to understand-- he had way too many formulas, numbers and technical terms I didn't know. If anyone out there has a social science or social science research background (or is really, really good at making this stuff understandable to someone with NO math, NO stats, NO hard science background) and can explain this stuff, I would be darn grateful.

I don't need an incredibly nuanced understanding of the math or equations involved, I just need to have a general conceptual understanding of what each of the above things are. Definitions plus example, perhaps?

I realize this is a tall order. Thank you in advance.

This site gives an introduction to statistical terms for journalists with a minimum of math involved.

posted by TedW at 7:03 AM on July 12, 2007 [1 favorite]

Master's in Public Policy here. I had to take several stats classes, which I just barely passed but finally grasped, so I may be of help.

Confidence level and interval:

This is best explained with an example. Let's say I did a study and reported that "in this sample, 90% of adults who were read to once a day as children received at least a bachelor's degree." If the confidence level were 95%, this would mean that if I did this study on 1000 samples, I would get results within the confidence interval 95% of the time.

Margin of Error:

This is the same as the confidence interval. It basically means that results within this margin or interval are statistically identical to each other.

Standard deviation:

Can't help you there...

posted by lunasol at 7:23 AM on July 12, 2007

To expand on lunasol's answer, the standard deviation is a mathematical measure of how spread out the data is. This number is then used to calculate the confidence interval. A 95% confidence interval is roughly the result plus or minus 2 standard errors (the standard deviation divided by the square root of the sample size); a 99% confidence interval is roughly 2.6 standard errors on either side of the result.
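For illustration, here is a rough sketch of how a standard deviation turns into a 95% confidence interval for an average. It assumes the usual normal approximation with a multiplier of 1.96 (the "about 2" above) applied to the standard error, which is the standard deviation divided by the square root of the sample size; for a sample as small as this one, a t-distribution multiplier would be a bit larger. The function name and the scores are made up.

```python
import math
import statistics

def mean_ci_95(values):
    """Approximate 95% CI for the mean (normal approximation, multiplier 1.96)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (divides by n - 1)
    se = sd / math.sqrt(n)          # standard error of the mean
    margin = 1.96 * se              # roughly "plus or minus 2 standard errors"
    return mean - margin, mean + margin

# Hypothetical sample of ten test scores
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 75]
low, high = mean_ci_95(scores)
print(f"95% CI for the mean: {low:.1f} to {high:.1f}")
```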

posted by TedW at 7:31 AM on July 12, 2007

Standard Deviation:

You take a bunch of measurements of people's weights, add them all up and divide by the number of measurements and you get the average weight. But there are many ways of getting the same average. You could (1) take one really fat person, like me, and a bunch of skinny people, or (2) several medium-weight people and get the same average.

Take another example. You and I get on an elevator going up, and on the next floor Dick Cheney gets on. Now the average wealth of the people in the elevator has increased enormously, but that doesn't do either you or me any good.

Standard deviation measures how different the measurements are from the average. In the Cheney-on-the-elevator case, the standard deviation would be large. In the medium-weight case, the standard deviation would be small. It turns out that you can't just calculate the average difference between a person's weight and the overall average, because that's always zero. Some weights are higher, and some are lower, so the differences cancel out. But if you square the differences between the measurements and the average, those squares are never negative, so they add up to something positive (or to zero, if every weight equals the average). The standard deviation is the square root of the average of the squares of these differences (when estimating from a sample, the sum of the squares is usually divided by one less than the number of measurements instead). If all the weights are close to the average, the squares of the differences between the weights and the average will be small, and their average will be small. If some weights are large and some are small, the differences between the weights and the average will be large, their squares will be large, and the average of their squares will be large.
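A small sketch of the recipe above, using made-up weights: the raw differences cancel to zero, while squaring, averaging and taking the square root gives the standard deviation. This is the population version (dividing by the number of measurements); Python's `statistics.pstdev` computes the same thing.

```python
import math
import statistics

# Hypothetical weights (in pounds) for five people
weights = [150, 160, 170, 180, 190]
average = sum(weights) / len(weights)

# Raw differences from the average cancel out, so their sum is zero
differences = [w - average for w in weights]
print(sum(differences))  # prints 0.0

# Squaring makes every difference non-negative
squares = [d ** 2 for d in differences]

# Average the squares (this is the variance), then take the square root
sd = math.sqrt(sum(squares) / len(squares))
print(sd)

# The standard library's population standard deviation agrees
print(statistics.pstdev(weights))
```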

Standard deviation is never negative: it is always greater than or equal to zero. If it's zero, then all the weights are equal to the average.

You can also measure how much of an outlier a measurement is by counting how many standard deviations it is from the average. Dick Cheney's wealth is many thousands of standard deviations above the average, so that makes him very much an outlier.
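A sketch of the "how many standard deviations away" idea (often called a z-score), with hypothetical net worths. The comparison group here leaves the outlier out; if the outlier is included in the group being averaged, the population z-score of any one member can never exceed the square root of (n - 1), which the last lines show.

```python
import math
import statistics

def z_score(value, values):
    """How many (population) standard deviations `value` lies from the mean of `values`."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return (value - mean) / sd

# Hypothetical net worths (in dollars) of ordinary elevator riders
riders = [40_000, 55_000, 60_000, 45_000, 50_000]

# A hypothetical very wealthy newcomer, compared against the ordinary riders
print(round(z_score(100_000_000, riders)))

# If the outlier is part of the group being averaged, the z-score is capped at sqrt(n - 1)
everyone = riders + [100_000_000]
print(round(z_score(100_000_000, everyone), 3), round(math.sqrt(len(everyone) - 1), 3))
```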

It probably doesn't help, but if you think of taking k measurements as a point, P, in k-dimensional space, then the standard deviation is the distance between P and the point whose coordinates are all the average, divided by the square root of k. There, I didn't think that would help.
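A quick numerical check of the geometric picture, treating a made-up set of k = 4 measurements as a point and using Python's `math.dist` (Python 3.8+) for the straight-line distance.

```python
import math
import statistics

# Hypothetical k = 4 measurements, treated as a point P in 4-dimensional space
p = [2.0, 4.0, 16.0, 18.0]
k = len(p)
avg = sum(p) / k
center = [avg] * k  # the point whose coordinates are all the average

distance = math.dist(p, center)    # ordinary straight-line (Euclidean) distance
print(distance / math.sqrt(k))     # distance scaled by sqrt(k)
print(statistics.pstdev(p))        # population standard deviation, for comparison
```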

I hope at least that some of this helps. Good luck on your test.

posted by vilcxjo_BLANKA at 7:44 AM on July 12, 2007 [1 favorite]

**Standard Deviation.**

Consider the following three sets:

A: [10,10,10,10]

B: [8,9,11,12]

C: [2,4,16,18]

The sets all have the same **mean**: 10. But clearly the sets are very different. A is more "bunched up" around the mean than B is, and B is more "bunched up" than C. Is there some way to express this bunched-upedness in a single number? Yes: the **standard deviation** (SD).

The SD measures statistical dispersion, that is, how "spread out" the members of a set of values are. More precisely, it measures how close the members of this set are to the mean value of the set. A lower SD means that the values in a set are generally closer to the mean; a higher SD means that they are farther away from the mean.

In this case, the SDs of each set are:

A = 0

B = 1.58

C = 7.07

So we can see that C is more dispersed than B, which is more dispersed than A - which is, in fact, not dispersed at all, since all the numbers in set A are the same.
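The three SDs above can be reproduced with Python's standard `statistics` module. They match the population standard deviation (`pstdev`, dividing by n); the sample version (`stdev`, dividing by n - 1) would give slightly larger numbers, for example about 1.83 for set B.

```python
import statistics

sets = {
    "A": [10, 10, 10, 10],
    "B": [8, 9, 11, 12],
    "C": [2, 4, 16, 18],
}

results = {}
for name, values in sets.items():
    mean = statistics.mean(values)
    # pstdev divides by n (population SD), which reproduces the figures above
    sd = statistics.pstdev(values)
    results[name] = (mean, round(sd, 2))
    print(f"{name}: mean = {mean}, SD = {sd:.2f}")
```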

posted by googly at 7:47 AM on July 12, 2007 [1 favorite]

An easy way to understand the confidence interval is to think about what statistics does.

Briefly, say you are concerned with the percentage of all people on earth that have brown hair (hypothetically, let's ignore the fact that we know certain areas might have more/less people with brown hair than others). Obviously you cannot check all people on earth to see how many of them do have brown hair, so the idea behind statistics is that you check on a number of people that is both small enough to be checkable, and large enough (according to the math) to be a reasonably accurate assessment of the population as a whole.

Thus, you come out of this with one number: the estimated percentage of people with brown hair, arrived at by the math you've used and your data. Because you haven't actually checked all people on earth, the estimated percentage of people with brown hair and the ACTUAL percentage of people with brown hair are two different numbers, with the second being essentially unknowable. Given this, a confidence interval with a confidence level of 95% (say in this case it works out to be +/- 5 units of your estimated percentage of people with brown hair) is a range of values centered on your estimate within which it is 95% probable the actual percentage of people within brown hair lies.
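A rough sketch of how such an interval might be computed for a proportion like the brown-hair percentage. It uses the simple normal (Wald) approximation, which is one of several methods, and the survey counts are invented.

```python
import math

def proportion_ci_95(successes, n):
    """Approximate 95% CI for a population proportion (normal/Wald approximation)."""
    p_hat = successes / n                     # estimated proportion from the sample
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of the estimate
    margin = 1.96 * se                        # the margin of error
    return p_hat, margin, (p_hat - margin, p_hat + margin)

# Hypothetical survey: 400 of the 1,000 people checked have brown hair
estimate, margin, (low, high) = proportion_ci_95(400, 1000)
print(f"estimate {estimate:.1%}, margin of error +/- {margin:.1%}, CI {low:.1%} to {high:.1%}")
```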

The standard deviation is really only relevant conceptually in terms of how it relates to the above definition, and TedW states that relation pretty well.

posted by invitapriore at 7:48 AM on July 12, 2007

at the end of the third paragraph, "within brown hair" = "with brown hair". Oops.

posted by invitapriore at 7:52 AM on July 12, 2007

In retrospect, I take standard deviation for granted. Great answers from googly and vilcxjo_BLANKA!

posted by invitapriore at 7:54 AM on July 12, 2007

A confidence level is a percentage. It tells you how likely it is that the results you got in your experiment came from random chance rather than something meaningful.

So, if I survey two people I know and find out that the average person's age is 21, my confidence level is not very high - my experiment doesn't tell me much about the average age of all people. If I survey ten thousand people, and the average age of those people is 21, my confidence level is much higher that the average age of all people is really 21.

I could also choose a wider confidence interval: maybe I say that the average age is between 15 and 25. I can have a higher confidence level about this with a smaller group of people.
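To see why a bigger group lets you be more confident about a narrower range, here is a sketch of how the margin of error for an average shrinks as the sample grows. The population standard deviation of 15 years is an assumption, and with only two people the normal approximation used here is very rough.

```python
import math

def margin_of_error(sd, n, z=1.96):
    """Approximate margin of error for a mean: z standard errors (sd / sqrt(n))."""
    return z * sd / math.sqrt(n)

# Hypothetical: ages in the population have a standard deviation of about 15 years
for n in (2, 100, 10_000):
    print(n, round(margin_of_error(15, n), 2))
```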

As for standard deviation - this tells you how similar a group of things are. If the standard deviation of the ages of the people is very low, this means that all the ages are very close to 21 (the average). If the standard deviation is very high, there is a really wide spread of ages. Maybe there are lots of babies and geriatrics.

posted by emilyw at 8:09 AM on July 12, 2007

This decision by Richard Posner in Kadas v. MCI Systemhouse Corp. provides a nice description (apologies for the length; citations removed):

"Some cases suggest that statistical evidence is not admissible ... unless it is significant at the conventional 5 percent significance level (that is, the coefficient of the relevant correlation is at least two standard deviations away from zero), ... --in other words, unless there is no more than a 5 percent probability that we would observe a statistical correlation between the dependent variable (such as whether terminated) and the independent variable having legal significance (such as age) even if the variables were uncorrelated in the population from which the sample was drawn. ... The 5 percent test is arbitrary; it is influenced by the fact that scholarly publishers have limited space and don't want to clog up their journals and books with statistical findings that have a substantial probability of being a product of chance rather than of some interesting underlying relation between the variables of concern. ..."

posted by GarageWine at 8:27 AM on July 12, 2007
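The "two standard deviations" and "5 percent" in Posner's description are two ways of stating roughly the same cutoff. A sketch under a normal approximation (the "standard deviations" there being the standard error of the estimated coefficient): the chance of landing at least z standard errors from zero in either direction, when the true value is zero, is `erfc(z / sqrt(2))`.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a statistic z standard errors from zero (normal approximation)."""
    return math.erfc(abs(z) / math.sqrt(2))

print(round(two_sided_p_value(1.96), 3))  # the conventional 5 percent cutoff
print(round(two_sided_p_value(2.0), 3))   # "two standard deviations", slightly under 5 percent
```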

Confidence level: an arbitrary value chosen by the author for his confidence intervals. 95% and 99% are common (sometimes they're expressed by their complements, 5% and 1% or .05 and .01).

Confidence interval: You've taken a sample and found an average value of something. Your confidence interval is the range within which you're pretty sure -- 95% or 99% or whatever your confidence level is -- that the true population value lies*, given the sample value that you found. 99% CIs are always wider than 95% CIs computed from the same sample.

Standard deviation: a measure of how variable or volatile a sample or population is. Say you have two samples, both with an average value of 0. If sample A has values of 0.05, 0.10, -.05, -.10, and sample B has values of 10, -10, 100, -100, then sample B has a much bigger standard deviation.
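Checking the two samples above with Python's `statistics.pstdev` (population standard deviation; the sample version would differ slightly but tell the same story).

```python
import math
import statistics

sample_a = [0.05, 0.10, -0.05, -0.10]
sample_b = [10, -10, 100, -100]

sd_a = statistics.pstdev(sample_a)
sd_b = statistics.pstdev(sample_b)
print(f"SD of A: {sd_a:.3f}, SD of B: {sd_b:.1f}")
```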

Margin of error: another way of saying confidence interval. You might say that you have a 95% CI of 15-21%, or you might say you have 18% with a 3% margin of error. Same thing.

*This is sloppy.
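A sketch of the relationship between the two ways of reporting, plus the normal-approximation multipliers that make a 99% interval wider than a 95% one from the same data. `interval_from_margin` is a hypothetical helper; `NormalDist` is in Python's `statistics` module from 3.8 on.

```python
from statistics import NormalDist

def interval_from_margin(estimate, margin):
    """Turn 'estimate with a margin of error' into a (low, high) interval."""
    return estimate - margin, estimate + margin

# The example above: 18% with a 3-point margin of error
print(interval_from_margin(18, 3))  # (15, 21)

# Multipliers for 95% and 99% confidence under a normal approximation:
# a higher confidence level needs a bigger multiplier, hence a wider interval.
z95 = NormalDist().inv_cdf(0.975)
z99 = NormalDist().inv_cdf(0.995)
print(round(z95, 2), round(z99, 2))
```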

posted by ROU_Xenophobe at 8:38 AM on July 12, 2007 [1 favorite]

*within which it is 95% probable the actual percentage of people within brown hair lies*

This is incorrect in a minor but important way. 95% confidence is not 95% probability. Before you take your sample, there is a 95% probability that your sample mean will be within 2 standard errors of the population mean. Once you take it, a sample is realized and there's no probability over realized events (except for zero or one).

Think of it this way: there's a 50/50 chance of getting heads when you flip a coin. But after you flip it and you're staring down at Washington's head, there's no probability about it. And after you flip it, and before you look at it, there's *still* no probability about it, only uncertainty.
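One way to see that the 95% describes the procedure rather than any single realized interval is to simulate it. This sketch assumes normally distributed data with a known population standard deviation, so each interval is the sample mean plus or minus 1.96 standard errors; any one interval either contains the true mean or it doesn't, but about 95% of them do. All the parameter values are made up.

```python
import math
import random

def coverage_rate(trials=2000, n=50, true_mean=10.0, sigma=2.0, seed=1):
    """Fraction of 95% intervals (mean +/- 1.96 * sigma / sqrt(n)) that contain the true mean.

    Assumes the population standard deviation sigma is known, so each
    interval is a 95% z-interval for normally distributed data.
    """
    rng = random.Random(seed)
    margin = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # Each realized interval either contains the true mean or it does not
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / trials

print(coverage_rate())
```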

sneakin, you almost certainly don't need to worry about this distinction.

posted by ROU_Xenophobe at 8:44 AM on July 12, 2007

ROU: Thanks for pointing that out! My knowledge is limited to the extent of a few intro stats classes, so that's not a point that I ever came across, despite its nigh axiomatic truthfulness...*smacks head*

posted by invitapriore at 8:55 AM on July 12, 2007

Great explanation of standard deviation from vilcxjo_BLANKA. A simple way to remember how to calculate it, which may also make it easier to understand, can be found here.

posted by triggerfinger at 10:43 AM on July 12, 2007

if you're still unclear on any of this after reading all this, feel free to e-mail me. I have a Master's in a social science field, and I was a stats TA.

posted by desjardins at 10:48 AM on July 12, 2007

oh wow. that brings back memories.

take a look at this site, includes those topics and others

posted by radsqd at 11:58 AM on July 12, 2007

Thanks everyone who helped. I officially understand this stuff now!

posted by sneakin at 1:14 PM on July 12, 2007

As an update, should anyone check this thread, I did really well on the exam overall and got all questions about the above material correct. Thanks, everyone. You rule.

posted by sneakin at 5:49 PM on July 19, 2007

posted by Gyan at 6:50 AM on July 12, 2007