learn data science
June 8, 2012 6:18 AM

I'd like to learn about data science. Things like predictive modelling, regression and classification and so on. What would be good books or online courses to start with?
posted by gwynp to Education (9 answers total) 63 users marked this as a favorite
I am enjoying Coursera's Machine Learning course a lot.
posted by kandinski at 6:36 AM on June 8, 2012 [3 favorites]

I came here to mention the Stanford ML course at coursera, so consider this a second.
posted by russm at 6:59 AM on June 8, 2012

Here's a free textbook that I have not yet read.
posted by bessel functions seem unnecessarily complicated at 7:54 AM on June 8, 2012 [1 favorite]

There are some references in a recent closely related question of mine:
posted by Tooty McTootsalot at 8:03 AM on June 8, 2012

Some of the books I've seen most recommended for data scientists are:

Data Mining: Practical Machine Learning Tools and Techniques
Data Analysis with Open Source Tools
Machine Learning in Action
The Visual Display of Quantitative Information
Data Analysis: A Bayesian Tutorial
Pattern Recognition and Machine Learning

I recently went through the exercises here:, which were useful, but the site (and text) is currently incomplete, so I can't recommend it too much yet. The author is working on a book describing Data Mining with R.

Also, these aren't so much technical, but if you're interested in data science and haven't seen them, you should read them over:

Planning for Big Data
The Fourth Paradigm: Data-Intensive Scientific Discovery
posted by Tooty McTootsalot at 9:13 AM on June 8, 2012

Machine Learning for Hackers?

posted by mgogol at 1:35 PM on June 8, 2012

To learn the basics of regression and ANOVA, "Applied Linear Statistical Models" by Neter & Kutner is a good book that is easy to read. It explains how to do linear regression and how linear regression works.

For generalized linear models, "Categorical Data Analysis" by Agresti is my favorite book. It covers data such as binary data or count data. You need to know something about linear regression before reading it.

For data-driven modeling, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Hastie, Tibshirani & Friedman is nice, though not as easy as the other two books. It covers regression, judging the fit of a model, and other data-driven methods. One good thing about it is that it is FREE.

Regression and data-driven modeling are useful for different things. If you are interested in answering scientific questions, try to learn the basics of regression and the theory behind it. If you are interested in predictive modeling, data-driven models are good.
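
To make the two uses concrete, here is a minimal sketch in plain Python, not taken from any of the books above. It fits a straight line by ordinary least squares and then looks at the result both ways: the fitted coefficients (the "scientific question" view) and the error on held-out data (the "predictive" view). The data is synthetic and made up for illustration, and the helper names `fit_line` and `mean_squared_error` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(xs, ys, intercept, slope):
    """Average squared difference between observed and predicted y."""
    residuals = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return sum(residuals) / len(residuals)


# Synthetic data (an assumption for illustration): y = 2 + 3x + Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

# Fit on the first 150 points, hold out the last 50 for evaluation.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

intercept, slope = fit_line(train_x, train_y)
test_mse = mean_squared_error(test_x, test_y, intercept, slope)

# Inference view: what do the coefficients say about the relationship?
print(f"slope={slope:.2f} intercept={intercept:.2f}")
# Prediction view: how well does the model do on data it has not seen?
print(f"held-out MSE={test_mse:.2f}")
```

The same fitted line serves both purposes here; the difference is in which number you care about and how you check it.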
posted by benthegirl at 4:24 PM on June 8, 2012

What is your background? Math? CS? Stats? Physics? English?
posted by pmb at 8:53 PM on June 8, 2012

Another free textbook: Mining of Massive Datasets. This text is used for Stanford's "big data" course, CS 246.
posted by town of cats at 10:07 PM on June 8, 2012
