June 8, 2012 6:18 AM

I'd like to learn about data science. Things like predictive modelling, regression and classification and so on. What would be good books or online courses to start with?

posted by gwynp to Education (9 answers total) 63 users marked this as a favorite

I came here to mention the Stanford ML course at Coursera, so consider this a second.

posted by russm at 6:59 AM on June 8, 2012

Here's a free textbook that I have not yet read.

posted by bessel functions seem unnecessarily complicated at 7:54 AM on June 8, 2012 [1 favorite]

There are some references in a recent closely related question of mine:

http://ask.metafilter.com/214823/Im-a-data-scientist-What-does-that-mean

posted by Tooty McTootsalot at 8:03 AM on June 8, 2012

Some of the books I've seen most recommended for data scientists are:

Data Mining: Practical Machine Learning Tools and Techniques

Data Analysis with Open Source Tools

Machine Learning in Action

The Visual Display of Quantitative Information

Data Analysis: A Bayesian Tutorial

Pattern Recognition and Machine Learning

I recently went through the exercises at http://www.rdatamining.com/home, which were useful, but the site (and text) is currently incomplete, so I can't fully recommend it yet. The author is working on a book about data mining with R.

Also, these aren't very technical, but if you're interested in data science and haven't seen them, they're worth reading:

Planning for Big Data

The Fourth Paradigm: Data-Intensive Scientific Discovery

posted by Tooty McTootsalot at 9:13 AM on June 8, 2012

To learn the basics of regression and ANOVA, "Applied Linear Statistical Models" by Neter & Kutner is a good, easy-to-read book. It explains how to do linear regression and how linear regression works.
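As a rough illustration of what simple linear regression computes, here is a minimal sketch of ordinary least squares for one predictor. It is not taken from Neter & Kutner; the function names and the data are made up for illustration.

```python
# Minimal sketch: ordinary least squares for one predictor, y ~ b0 + b1*x.
# Function names and data are hypothetical, for illustration only.

def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation; slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Fraction of the variance in y explained by the fitted line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_ols(xs, ys)
print(f"intercept={b0:.3f} slope={b1:.3f} R^2={r_squared(xs, ys, b0, b1):.3f}")
```

The closed form only works for a single predictor; with several predictors you would solve the normal equations in matrix form, which is what a textbook like the one above builds up to.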

For generalized linear models, "Categorical Data Analysis" by Agresti is my favorite book. It covers data such as binary data or count data. You need to know something about linear regression before reading it.
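For a taste of a generalized linear model on binary data, here is a sketch of logistic regression with one predictor, fitted by Newton-Raphson (for this model it is equivalent to iteratively reweighted least squares). This is an assumed illustration, not code from Agresti; the function name and the pass/fail data are hypothetical.

```python
import math

def fit_logistic(xs, ys, iters=25, tol=1e-10):
    """Fit P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    ys must be 0/1. Assumes the two classes are not perfectly separable;
    otherwise the maximum-likelihood estimates do not exist.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        # Score (gradient of the log-likelihood) and Fisher information X'WX.
        g0 = g1 = 0.0
        h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve [[h00, h01], [h01, h11]] * step = [g0, g1].
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical data: hours studied (x) and pass = 1 / fail = 0 (y).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(f"b0={b0:.3f} b1={b1:.3f} odds ratio per unit x={math.exp(b1):.3f}")
```

The slope is on the log-odds scale, so `exp(b1)` is the factor by which the odds of y = 1 change per one-unit increase in x.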

For data-driven modeling, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Hastie, Tibshirani & Friedman is nice, but it is not as easy as the other two books. It covers regression, judging how well a model fits, and data-driven methods. One good thing about it is that it is FREE.
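One common way to judge fit for predictive purposes is k-fold cross-validation: fit on part of the data and measure error on the part held out. Below is a minimal sketch comparing a least-squares line with a predict-the-mean baseline. It is an assumed illustration (not code from the book); the function names and data are hypothetical, and the folds are interleaved rather than shuffled only to keep the example deterministic.

```python
# Minimal sketch of k-fold cross-validation. Each "fit" function takes
# training data and returns a prediction function. Names are hypothetical.

def fit_mean(xs, ys):
    """Baseline: always predict the training mean of y."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Least-squares line y = b0 + b1*x fitted to the training data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    return lambda x: b0 + b1 * x

def k_fold_mse(fit, xs, ys, k=5):
    """Mean squared prediction error over points, each held out exactly once."""
    n = len(xs)
    # Interleaved folds: (0, k, 2k, ...), (1, k+1, ...), ...
    # In practice you would usually shuffle the data first.
    folds = [range(i, n, k) for i in range(k)]
    total = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        predict = fit(train_x, train_y)
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in test_idx)
    return total / n

xs = [float(i) for i in range(10)]
ys = [1.2, 2.8, 5.1, 7.0, 8.9, 11.2, 12.8, 15.1, 17.0, 18.9]
cv_line = k_fold_mse(fit_line, xs, ys)
cv_mean = k_fold_mse(fit_mean, xs, ys)
print(f"CV MSE, line: {cv_line:.3f}  mean-only baseline: {cv_mean:.3f}")
```

The point of holding data out is that error measured on the training data alone tends to flatter more flexible models.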

Regression and data-driven modeling are useful for different things. If you are interested in answering scientific questions, learn the basics of regression and the theory behind it. If you are interested in predictive modeling, data-driven models are a good fit.
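On the "scientific questions" side, the interest is usually in a coefficient and its uncertainty rather than in prediction error. As a hedged sketch (standard OLS theory, assuming independent errors with constant variance; the function name and data are hypothetical), this computes the slope, its standard error, and the t statistic:

```python
import math

def slope_inference(xs, ys):
    """Return (slope, standard error, t statistic) for y ~ b0 + b1*x.

    Assumes independent errors with constant variance. Under normally
    distributed errors, t follows a t distribution with n - 2 df.
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points to estimate the error variance")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x has no variation; slope is undefined")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sigma2 = ss_res / (n - 2)
    if sigma2 == 0:
        raise ValueError("perfect fit: standard error is zero")
    se = math.sqrt(sigma2 / sxx)
    return slope, se, slope / se

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, se, t = slope_inference(xs, ys)
print(f"slope={slope:.3f} SE={se:.4f} t={t:.1f}")
```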

posted by benthegirl at 4:24 PM on June 8, 2012

Another free textbook: Mining of Massive Datasets. This text is used for Stanford's "big data" course, CS 246.

posted by town of cats at 10:07 PM on June 8, 2012

posted by kandinski at 6:36 AM on June 8, 2012 [3 favorites]