Textbooks on data mining techniques / statistical analysis on large data sets?
October 22, 2010 12:35 PM
I come from a computer science background and want to run statistical analyses on very large data sets, looking for interesting trends and the like. I am looking for resources/textbooks on:
-Finding said interesting trends
-Computational techniques to work on said data sets efficiently
-Statistical tests to help find structure in the data (for example: autocorrelation, testing whether the data come from a given statistical distribution, etc.)
-Anything you think might be good to know for someone who wants to extract meaning and work with super large data sets
I am fine with math and CS; I just need to up my exposure to the stats side of it. (I have taken stats in the past, just not with this in mind.)
I would like to learn some of this stuff myself. When I get around to it, I think I'd like to read The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, & Friedman, which is available for free online. I've heard good things about it from other people, but I have not read any of it myself.
posted by Chicken Boolean at 12:52 PM on October 22, 2010 [1 favorite]
A friend suggests Information Theory, Inference, and Learning Algorithms. Also free online.
posted by Freen at 9:36 PM on October 22, 2010
Computational linguists tend to do a lot of interesting, large-scale statistical analyses. One good book in this field is Manning and Schütze's "Foundations of Statistical Natural Language Processing."
posted by zippy at 10:45 PM on October 22, 2010
"Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations" by Witten and Frank.
I don't believe the book is open source, but the accompanying program (Weka) is, which you might appreciate.
posted by mostly-sp3 at 10:47 AM on November 26, 2010
This thread is closed to new comments.
posted by Tristram Shandy, Gentleman at 12:50 PM on October 22, 2010