Predictive analytics basics for programmers?
February 8, 2015 4:57 PM

I'm looking for resources that discuss the basics of data mining and building predictive models.

I'm already a competent (and professional) programmer, but new to this particular area. I'm looking for resources that describe the basic concepts most commonly used today and how they're typically implemented in software.

Are there books out there that cover what I'm looking for? (I'm not finding anything on Amazon so far, but I may be searching with the wrong keywords.) Which areas of probability and statistical analysis would be most helpful?
posted by wansac to Computers & Internet (9 answers total) 40 users marked this as a favorite
 
I'm not completely sure this is what you have in mind, but with respect to data mining, this textbook, which the authors have made freely available, will take you very, very far.
posted by grisha at 5:05 PM on February 8, 2015 [8 favorites]


For the "predictive model" piece, you'll probably want to use "machine learning" as your keyword, which should cover the types of prediction that you have in mind: supervised and unsupervised learning, decision trees, logistic regression, support vector machines, Bayesian methods, neural networks, and deep learning. The Stanford ML course on Coursera taught by Andrew Ng is really well regarded; the current session started on January 19, so you may be able to join now and catch up.
posted by The Michael The at 5:40 PM on February 8, 2015 [3 favorites]


Response by poster: Grisha and The Michael The: THANK YOU! Those both look like outstanding resources.
posted by wansac at 5:51 PM on February 8, 2015


Elements of Statistical Learning is good but dense. For a gentler book, and one targeted at people with a programming background, check out this one, although some of the chapters aren't finished. It also has a chapter devoted to how to actually compute these models efficiently, which you'll probably find interesting.
posted by vogon_poet at 6:01 PM on February 8, 2015


What do you want to do? Learn the concepts and algorithms, or actually implement them? If the latter, and you know some Python, scikit-learn is fantastic, along with its documentation.
posted by un petit cadeau at 6:26 PM on February 8, 2015 [1 favorite]


Yeah, scikit-learn is terrific. And don't be fooled by the name: while there is a ton of pedagogic value in its documentation, the "learn" in scikit-learn refers to the "learn" in machine learning. It's not a teaching tool; it's a full-fledged ML suite, with some terrific documentation.
posted by The Michael The at 6:57 PM on February 8, 2015
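
For a sense of what "actually implementing" looks like with scikit-learn, here is a minimal sketch of a supervised learning run. It is not from anyone in the thread; the choice of the bundled iris dataset, the 25% hold-out split, and logistic regression as the model are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small labelled dataset bundled with scikit-learn (150 flowers, 3 classes).
# The dataset choice is an assumption for illustration only.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows so the model is scored on data it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a logistic regression classifier; max_iter is raised to give the
# solver enough iterations to converge on this data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict labels for the held-out rows and measure the fraction that are correct.
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Most scikit-learn estimators follow the same fit / predict / score pattern, so swapping in something like a decision tree or a support vector machine is usually a change to the import and the one line that builds the model.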


If you are willing to learn R, I would suggest Machine Learning with R by Brett Lantz. I found the examples very well explained but simple enough that I can go back to them when I need to scale up to a more challenging problem.
posted by z11s at 8:22 PM on February 8, 2015


That Elements of Statistical Learning book has a course currently running on Stanford Online for free, with great supplementary videos.
posted by deludingmyself at 7:54 AM on February 9, 2015


Response by poster: Thanks to all the respondents here! Now I have a wealth of resources to work with.
posted by wansac at 2:59 PM on February 9, 2015

