I'm looking for resources that discuss the basics of data mining and building predictive models. <br /><br /> I'm already a competent (and professional) programmer, but new to this particular area. I'm looking for resources that describe the basic concepts most commonly used today and how they're typically implemented in software. <br>
Are there books out there that cover what I'm looking for? (I haven't found anything on Amazon so far, but I may be searching with the wrong keywords.) What areas of probability and statistical analysis would be most helpful?
I'm not completely sure this is what you have in mind, but with respect to data mining, <a href="http://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf">this textbook</a>, which the authors have made freely available, will take you very, <i>very</i> far.<br />posted by grisha at 5:05 PM on February 8, 2015
For the "predictive model" piece, you'll probably want to use "machine learning" as your keyword, which should cover the types of prediction that you have in mind: supervised and unsupervised learning, decision trees, logistic regression, support vector machines, Bayesian methods, neural networks, and deep learning. The <a href="https://www.coursera.org/course/ml">Stanford ML course on Coursera</a> taught by Andrew Ng is really well regarded; the current session started on January 19, so you may be able to join now and catch up.<br />posted by The Michael The at 5:40 PM on February 8, 2015
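To give a flavor of what one of the techniques in that list looks like once it is implemented, here is a minimal, dependency-free Python sketch of logistic regression trained by batch gradient descent on the average log-loss. It is an illustrative toy under simple assumptions (small lists of floats as features, binary 0/1 labels, a learning rate and epoch count picked by hand), and the function names are hypothetical; in practice you would more likely reach for a library implementation.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function: avoids overflow in math.exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss.

    Hypothetical helper for illustration; X is a list of feature lists, y a list of 0/1 labels.
    """
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss with respect to the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


def predict_proba(w, b, x):
    # Estimated probability that x belongs to class 1.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def predict(w, b, x, threshold=0.5):
    # Hard 0/1 decision from the estimated probability.
    return 1 if predict_proba(w, b, x) >= threshold else 0


if __name__ == "__main__":
    # Toy one-feature data set: small values are class 0, large values class 1.
    X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [0, 0, 0, 1, 1, 1]
    w, b = train_logistic_regression(X, y)
    print(w, b, [predict(w, b, x) for x in X])
```

The per-sample loop keeps the math visible (the gradient term is just `p - y`); real libraries vectorize this and usually add regularization and a smarter optimizer.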
Grisha and The Michael The: THANK YOU! Those both look like outstanding resources.<br />posted by wansac at 5:51 PM on February 8, 2015
<i>Elements of Statistical Learning</i> is good but dense. For a gentler book, and one targeted at people with a programming background, check out <a href="http://ciml.info">this</a>, although some of the chapters aren't finished. It also has one chapter devoted to how to actually compute these models efficiently, which you'll probably find interesting.<br />posted by vogon_poet at 6:01 PM on February 8, 2015
What do you want to do? Learn the concepts and algorithms, or actually implement them? If the latter, and you know some Python, <a href="http://scikit-learn.org/stable/">scikit-learn</a> is fantastic, along with its documentation.<br />posted by un petit cadeau at 6:26 PM on February 8, 2015
Yeah, scikit-learn is terrific. And don't be fooled by the name: while there is a ton of pedagogic value in its documentation, the "learn" in scikit-learn refers to the "learn" in machine learning. It's not a teaching tool; it's a full-fledged ML suite that happens to have excellent documentation.<br />posted by The Michael The at 6:57 PM on February 8, 2015
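Part of what makes scikit-learn pleasant to work with is that its estimators share a common convention: you call `fit(X, y)` to learn from data, `fit` returns the estimator itself, learned attributes carry a trailing underscore, and `predict(X)` produces outputs. The sketch below is a hand-rolled, dependency-free toy that imitates that convention with a nearest-centroid classifier. It is not scikit-learn code (scikit-learn ships its own `sklearn.neighbors.NearestCentroid`); the class here is a hypothetical illustration of the pattern only.

```python
class NearestCentroid:
    """Toy classifier imitating scikit-learn's fit/predict convention (hypothetical, not the real class)."""

    def fit(self, X, y):
        # Accumulate per-class feature sums and counts.
        sums, counts = {}, {}
        for xi, yi in zip(X, y):
            if yi not in sums:
                sums[yi] = [0.0] * len(xi)
                counts[yi] = 0
            sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
            counts[yi] += 1
        # Learned state uses a trailing underscore, following scikit-learn's naming convention.
        self.centroids_ = {
            label: [s / counts[label] for s in total] for label, total in sums.items()
        }
        return self  # returning self allows chaining: clf.fit(X, y).predict(X_new)

    def predict(self, X):
        def sq_dist(a, b):
            # Squared Euclidean distance between two feature lists.
            return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

        # Assign each sample to the label whose centroid is closest.
        return [
            min(self.centroids_, key=lambda label: sq_dist(x, self.centroids_[label]))
            for x in X
        ]


if __name__ == "__main__":
    X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]]
    y = ["a", "a", "b", "b"]
    clf = NearestCentroid().fit(X, y)
    print(clf.predict([[0.9, 1.1], [4.8, 5.1]]))  # prints ['a', 'b']
```

Because every scikit-learn estimator follows the same shape, swapping one model for another is usually a one-line change, which is a big part of why it works well for experimenting with the techniques mentioned earlier in the thread.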
If you are willing to learn R, I would suggest <a href="http://www.amazon.com/exec/obidos/ASIN/1782162143/metafilter-20/ref=nosim/">Machine Learning with R by Brett Lantz</a>. I found the examples very well explained but simple enough that I can go back to them when I need to scale up to a more challenging problem.<br />posted by z11s at 8:22 PM on February 8, 2015
That <i>Elements of Statistical Learning</i> book has a course currently running on Stanford Online for free, with great supplementary videos.<br />posted by deludingmyself at 7:54 AM on February 9, 2015
Thanks to all the respondents here! Now I have a wealth of resources to work with.<br />posted by wansac at 2:59 PM on February 9, 2015