Textbooks on data mining techniques / statistical analysis on large data sets?
October 22, 2010 12:35 PM

I come from a computer science background and want to run statistical analyses on very large data sets, looking for interesting trends and the like. I am looking for resources/textbooks on:

-Finding said interesting trends
-Computational techniques to work on said data sets efficiently
-Statistical tests to help find structure in the data (for example: autocorrelation, testing whether the data do or do not come from a given statistical distribution, etc.)
-Anything you think might be good to know for someone who wants to extract meaning from, and work with, very large data sets

I am fine with math and CS; I just need to up my exposure to the stats side of it. (I have taken stats in the past, just not with this in mind.)
posted by wooh to Education (5 answers total) 26 users marked this as a favorite
 
Empirical Methods for Artificial Intelligence by Paul Cohen. It is much more about statistics than AI; don't let the title fool you.
posted by Tristram Shandy, Gentleman at 12:50 PM on October 22, 2010


I would like to learn some of this stuff myself. When I get around to it, I think I'd like to read The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, & Friedman, which is available for free online. I've heard good things about it from other people, but I have not read any of it myself.
posted by Chicken Boolean at 12:52 PM on October 22, 2010 [1 favorite]


A friend suggests Information Theory, Inference, and Learning Algorithms. It is also free online.
posted by Freen at 9:36 PM on October 22, 2010


Computational Linguists tend to do a lot of interesting and large-scale statistical analyses. One good book in this field is Manning and Schütze's "Foundations of Statistical Natural Language Processing."
posted by zippy at 10:45 PM on October 22, 2010


"Data Mining, practical machine learning tools and techniques with Java Implementations" by Witten and Frank.

I don't believe the book is freely available, but the accompanying software (Weka) is open source, which you might appreciate.
posted by mostly-sp3 at 10:47 AM on November 26, 2010

