Calculus? Is that it?
June 21, 2011 10:11 PM

Why can't I read this? And what sort of classes would I need to make sense of this?

I've been reading a lot of wikipedia lately. I can do basic algebra. Nevertheless, I keep running into articles under statistics that are confusing and disorienting. This section describing regression toward the mean is an excellent example. I understand what's going on, verbally, but the math is foreign. My high school didn't offer calculus, only a variety of algebras and "integrated math." What do I need to learn to make sense of these equations?
I have access to any class at Vanderbilt University and the internet.
Thanks in advance.
posted by Tennyson D'San to Education (12 answers total) 9 users marked this as a favorite
 
The notation says that the regression line is one that minimizes the sum of squared errors. The following equation derives the regression coefficients analytically.

The road to this kind of math would include some sort of calculus sequence, an understanding of mathematical proofs (either introduced as part of another course, or as a course by itself), and a first course on mathematical statistics.

One could get by without a full calculus sequence and pick up proofs elsewhere, but then the going would get progressively more difficult.
posted by Nomyte at 10:20 PM on June 21, 2011 [1 favorite]


Well, statistics is confusing and disorienting even for many people who do have the math background for it.

I'd guess you need two things to understand that. One, you need to be familiar with the notation they're using, some of which is specific to statistics. Two, you need enough calculus to understand how they're finding the minimum of that expression by setting its derivative to zero; that's typically covered in first-year calculus.
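
To make the "set the derivative to zero" step concrete, here is a sketch of the standard least-squares derivation. It assumes the line is written y = a + bx with n data points (x_i, y_i); the linked page may use other letters (alpha and beta, say) for the same quantities, and Q is just a name picked here for the sum of squared errors.

```latex
% Sum of squared errors as a function of the intercept a and slope b
Q(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2

% Set the derivative with respect to a to zero
\frac{\partial Q}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0
\quad\Longrightarrow\quad a = \bar{y} - b \bar{x}

% Set the derivative with respect to b to zero, then substitute a
\frac{\partial Q}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0
\quad\Longrightarrow\quad
b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

Here \bar{x} and \bar{y} are the averages of the x_i and y_i. The only calculus involved is differentiating a sum of squares; everything after that is algebra.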

(You might also benefit from some linear algebra. I think it's a shame that linear algebra is often taught pretty late, after differential equations. It can logically be taught after you know basic algebra, and it's probably far more applicable to most people's lives than diffeq. Maybe there's a Khan Academy video that would teach you this?)

Ahh, Nomyte makes a good point about understanding how formal proofs work, as well. I got that in high-school algebra and geometry.
posted by hattifattener at 10:25 PM on June 21, 2011


Yes, calculus; but note that that page isn't really giving you the math-- it's summarizing it.

A little Googling finds this page which explains regression a lot more thoroughly.

If it wasn't clear, the big sigma means "sum"... think of it as a little program that sums up an expression for each value of i, in this case.
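
Taking the "little program" idea literally, a sigma from i = 1 to 10 could be sketched like this in Python. The function f is a made-up stand-in for whatever expression sits to the right of the sigma.

```python
def f(i):
    # Hypothetical expression being summed; any formula in i would do.
    return i * i

total = 0
for i in range(1, 11):  # i takes the values 1, 2, ..., 10
    total += f(i)       # add the expression's value for this i

print(total)  # 385, i.e. 1 + 4 + 9 + ... + 100
```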

Ah, and if you stopped at algebra, make sure you get some geometry too, at least so you can understand things like why y = a + bx defines a straight line.
posted by zompist at 10:27 PM on June 21, 2011




Possibly helpful:

http://www.khanacademy.org/video/squared-error-of-regression-line?playlist=Statistics
posted by nzero at 10:41 PM on June 21, 2011


And the actual link.
posted by nzero at 10:42 PM on June 21, 2011


Math notation is something I've picked up along the way, and not always in class. Taking courses through Calculus I will get you most of the way there, though. The

min
a,b

notation is one I hadn't seen in school, though (it means find the values of a and b that minimize the expression on the right).
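
One rough way to read that notation is as a search: try many (a, b) pairs and keep the pair that makes the expression smallest. The sketch below does that by brute force over a coarse grid. The data points and the grid are invented for illustration, and real regression uses a formula rather than a search.

```python
# Made-up data lying exactly on the line y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

def sum_sq(a, b):
    # The expression being minimized: sum of squared errors for the line y = a + b*x
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Candidate values -5.0, -4.5, ..., 5.0 for both a and b (an arbitrary grid)
candidates = [k / 2 for k in range(-10, 11)]

# "min over a,b": pick the pair with the smallest sum of squared errors
best_a, best_b = min(
    ((a, b) for a in candidates for b in candidates),
    key=lambda ab: sum_sq(*ab),
)

print(best_a, best_b)  # 1.0 2.0
```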

The big E is a sigma and it is a shorthand for addition. For instance,
  10
-------
\
 >        f(i)
/
--------
 i = 1
means f(1) + f(2) + ... + f(10). The variable i (named at the bottom) takes each integer value from 1 ("assigned" to it below the sigma) to 10 (indicated above the sigma). The expression to the right of the sigma is evaluated for each of those values, and then you add up all the results.
Take a stats class to get a handle on Cov() and Var() [covariance and variance].

I would highly recommend watching the video lectures in the MIT OpenCourseware Linear Algebra class taught by Gil Strang (here). His explanation of linear regression is absolutely delightful and mind-blowing.

Wikipedia is generally a terrible way to learn math. It's much too terse.
posted by jewzilla at 10:42 PM on June 21, 2011 [4 favorites]


Thank you all for your input - it looks like some calculus is in order. I'll respond in full tomorrow.
posted by Tennyson D'San at 11:34 PM on June 21, 2011


Statistics has its own, often inconsistently used, notation. It's a bit unfortunate that they didn't define any of it on that page. The undefined bits are: overbar(x) = average of the sampled x values, hat(y) = predicted y, and epsilon, which in regression is always observed y - predicted y. Cov[] and Var[] are covariance and variance, and E[] is expected value (each of which has its own page). A vertical pipe means "conditional on", e.g., E[X|Y] means the expected value of X given that we know Y.
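
If it helps to see those symbols as plain arithmetic, here is a small Python sketch on made-up data. The variable names are just spelled-out versions of the notation, and the slope and intercept use the usual least-squares formulas (slope = Cov/Var), which I'm assuming is what the page means.

```python
# Invented sample data, purely for illustration
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

x_bar = sum(xs) / n  # overbar(x): average of the sampled x values
y_bar = sum(ys) / n  # overbar(y)

# Cov[x, y] and Var[x], dividing by n (dividing by n - 1 gives the same slope)
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n

beta_hat = cov_xy / var_x             # estimated slope
alpha_hat = y_bar - beta_hat * x_bar  # estimated intercept

y_hat = [alpha_hat + beta_hat * x for x in xs]   # hat(y): predicted y
epsilon = [y - yh for y, yh in zip(ys, y_hat)]   # observed y - predicted y

print(round(alpha_hat, 3), round(beta_hat, 3))  # 2.2 0.6
```

One thing to notice: the entries of epsilon add up to zero (up to rounding), which always happens when the fitted line includes an intercept.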
posted by a robot made out of meat at 7:39 AM on June 22, 2011


Oh, when they have hat(alpha) and hat(beta) they mean the values estimated by minimizing the residual sum of squares.
posted by a robot made out of meat at 7:49 AM on June 22, 2011


This is great stuff and you've given me a good place to start. Thanks so much - I'm tired of not understanding these equations and I'm going to start to drill down into the links here. I'll check out that Gil Strang link as well - thanks jewzilla.
posted by Tennyson D'San at 9:43 AM on June 22, 2011


Tennyson -

Don't feel bad if you find it confusing, because it is confusing. I come from a Math background and just taught some Statistics classes recently. Two things come to mind:

— Most math (and most mathematicians) value definiteness and consistency. We introduce a term or notation, define it, and pretty much use it exactly that way. Stats seems to have accumulated a lot more historical cruft. They say "We will use ^ to indicate the value we observed in our sample," but when they start on regression, ^ is used for "predicted value".

— Learning Stats is, to a degree, like learning a foreign language. There are a lot of new terms that aren't used in other areas of math. The Stats god in our department said it explicitly, "they [the students] need to be able to carry on a conversation with me, properly using statistical language".

As an analogy Math : Stats :: Horse riding : Polo. You will need and use your math skills to do stats, but there's a whole bunch more stats-specific skills you need.
posted by benito.strauss at 3:52 PM on June 22, 2011

