learn data science
June 8, 2012 6:18 AM

I'd like to learn about data science. Things like predictive modelling, regression and classification and so on. What would be good books or online courses to start with?
posted by gwynp to Education (9 answers total) 65 users marked this as a favorite
I'm really enjoying Coursera's Machine Learning course.
posted by kandinski at 6:36 AM on June 8, 2012 [3 favorites]

I came here to mention the Stanford ML course on Coursera, so consider this a second.
posted by russm at 6:59 AM on June 8, 2012

Here's a free textbook that I have not yet read.
posted by bessel functions seem unnecessarily complicated at 7:54 AM on June 8, 2012 [1 favorite]

There are some references in a recent closely related question of mine:

posted by Tooty McTootsalot at 8:03 AM on June 8, 2012

Some of the books I've seen most recommended for data scientists are:

Data Mining: Practical Machine Learning Tools and Techniques
Data Analysis with Open Source Tools
Machine Learning in Action
The Visual Display of Quantitative Information
Data Analysis: A Bayesian Tutorial
Pattern Recognition and Machine Learning

I recently went through the exercises at http://www.rdatamining.com/home, which were useful, but the site (and text) is currently incomplete, so I can't fully recommend it yet. The author is working on a book about data mining with R.

Also, these aren't very technical, but if you're interested in data science and haven't seen them, they're worth reading:

Planning for Big Data
The Fourth Paradigm: Data-Intensive Scientific Discovery
posted by Tooty McTootsalot at 9:13 AM on June 8, 2012 [2 favorites]

Machine Learning for Hackers?

posted by mgogol at 1:35 PM on June 8, 2012

To learn the basics of regression and ANOVA, "Applied Linear Statistical Models" by Neter & Kutner is a good, easy-to-read book. It covers both how to do linear regression and how linear regression works.
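Here's a minimal sketch of what "how linear regression works" boils down to in the one-predictor case, in plain Python. The function name `fit_line` and the toy numbers are made up for illustration (they're not from the book). The formulas are the standard least-squares ones: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x).

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single predictor.

    Returns (intercept, slope), the pair minimizing the sum of squared residuals.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: sum of cross-deviations; Sxx: sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical toy data, roughly y = 1 + 2x plus some noise
    xs = [0, 1, 2, 3, 4]
    ys = [1.1, 2.9, 5.2, 6.8, 9.0]
    a, b = fit_line(xs, ys)
    print(f"intercept={a:.3f} slope={b:.3f}")
```

A stats package will also give you standard errors and tests for these coefficients; that inference side is what a book like this one explains.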

For generalized linear models, "Categorical Data Analysis" by Agresti is my favorite book. It covers data such as binary outcomes and counts. You need to know something about linear regression before reading it.
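For binary data, the GLM you'd typically reach for is logistic regression (a logit link); Poisson regression plays the same role for counts. Below is a minimal sketch of fitting a one-predictor logistic regression by Newton-Raphson in plain Python. The function names, the tolerance and the data are hypothetical. It assumes the two classes overlap, since with perfectly separable data the maximum-likelihood estimate doesn't exist and the coefficients run off to infinity.

```python
import math


def predict_prob(a, b, x):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(a + b*x)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit of (a, b) by Newton-Raphson; ys must be 0/1.

    Assumes the classes are not perfectly separable (see lead-in).
    """
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # gradient (score) of the log-likelihood
        h00 = h01 = h11 = 0.0  # information matrix = negated Hessian
        for x, y in zip(xs, ys):
            p = predict_prob(a, b, x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (a, b) += I^{-1} g, using the explicit 2x2 inverse
        step_a = (h11 * g0 - h01 * g1) / det
        step_b = (h00 * g1 - h01 * g0) / det
        a += step_a
        b += step_b
        if abs(step_a) < tol and abs(step_b) < tol:
            break
    return a, b


if __name__ == "__main__":
    # Hypothetical binary outcomes that overlap across x
    xs = [1, 2, 3, 4, 5, 6, 7, 8]
    ys = [0, 0, 1, 0, 1, 0, 1, 1]
    a, b = fit_logistic(xs, ys)
    print(f"a={a:.3f} b={b:.3f} P(y=1 | x=4.5)={predict_prob(a, b, 4.5):.3f}")
```

For the logit link, Newton-Raphson coincides with iteratively reweighted least squares, which is the usual way GLMs are fitted.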

For data-driven modeling, "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Hastie, Tibshirani & Friedman is nice, but not as easy as the other two books. It covers regression, judging the fit of a model, and other data-driven methods. One good thing about it is that it's FREE.
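For the "judging the fit" side of data-driven modeling, one common approach is cross-validation: hold out part of the data, fit on the rest, and measure prediction error on what was held out. Here's a minimal sketch in plain Python that uses 5-fold cross-validation to pick the number of neighbors for k-nearest-neighbor regression. The function names, the fold scheme and the simulated data are my own assumptions, not taken from the book.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Average the y-values of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cv_mse(xs, ys, k_neighbors, n_folds=5):
    """Mean squared prediction error estimated by n-fold cross-validation.

    Fold f holds out every n_folds-th point starting at index f, so each
    point is predicted exactly once by a model that never saw it.
    """
    n = len(xs)
    total = 0.0
    for fold in range(n_folds):
        test_idx = set(range(fold, n, n_folds))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            pred = knn_predict(train_x, train_y, xs[i], k_neighbors)
            total += (ys[i] - pred) ** 2
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    # Simulated data: a sine curve plus Gaussian noise
    xs = [i / 4 for i in range(40)]
    ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
    for k in (1, 3, 5, 10, 20):
        print(f"k={k:2d}  CV mean squared error={cv_mse(xs, ys, k):.3f}")
```

I used interleaved folds to keep the example deterministic; in practice you'd usually shuffle before splitting. Small k tends to chase the noise and very large k smooths away the curve, so the cross-validated error is typically lowest somewhere in between.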

Regression and data-driven modeling are useful for different things. If you're interested in answering scientific questions, learn the basics of regression and the theory behind it. If you're interested in predictive modeling, data-driven models are a good fit.
posted by benthegirl at 4:24 PM on June 8, 2012

What is your background? Math? CS? Stats? Physics? English?
posted by pmb at 8:53 PM on June 8, 2012
