# Textbooks on data mining techniques / statistical analysis on large data sets?

October 22, 2010 12:35 PM

I come from a computer science background and basically want to run statistical analysis on very large data sets, looking for interesting trends and the like. I am looking for resources/textbooks on:

- Finding said interesting trends

- Computational techniques to work on said data sets efficiently

- Statistical tests to help find structure in the data (for example: autocorrelation, testing whether the data does or does not come from a given statistical distribution, etc.)

- Anything you think might be good to know for someone who wants to extract meaning from, and work with, super large data sets

I am fine with math and CS; I just need to up my exposure to the stats side of it. (I have taken stats in the past, I just haven't taken it with this in mind.)
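To make the two tests named in the list above concrete, here is a minimal sketch in plain Python of a lag-k sample autocorrelation and a one-sample Kolmogorov-Smirnov statistic against a normal distribution. The function names, the seed, and the synthetic data are all assumptions made up for illustration; they do not come from any of the books in this thread, and a real analysis would more likely use a statistics library.

```python
import math
import random


def autocorrelation(xs, lag):
    """Sample autocorrelation of xs at the given positive lag."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    cov = sum((xs[i] - mean) * (xs[i + lag] - mean) for i in range(n - lag))
    return cov / var


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(xs, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDF of xs and the hypothesised CDF."""
    ordered = sorted(xs)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Synthetic data (hypothetical): independent noise, a random walk built
# from that noise, and a uniform sample that is clearly not normal.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]
walk = []
total = 0.0
for e in noise:
    total += e
    walk.append(total)
uniform = [rng.random() for _ in range(5000)]

print("lag-1 autocorrelation, noise:", autocorrelation(noise, 1))
print("lag-1 autocorrelation, walk: ", autocorrelation(walk, 1))
print("KS statistic, noise vs N(0,1):  ", ks_statistic(noise, normal_cdf))
print("KS statistic, uniform vs N(0,1):", ks_statistic(uniform, normal_cdf))
```

The independent noise should show a lag-1 autocorrelation near zero and the random walk one near 1. For the KS statistic, a common large-sample rule of thumb is to compare it with roughly 1.36 / sqrt(n) at the 5% level. A small value only means the data are consistent with the hypothesised distribution, not that they are proven to come from it.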

I would like to learn some of this stuff myself. When I get around to it, I think I'd like to read The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Hastie, Tibshirani, & Friedman, which is available for free online. I've heard good things about it from other people, but I have not read any of it myself.

posted by Chicken Boolean at 12:52 PM on October 22, 2010 [1 favorite]

A friend suggests Information Theory, Inference, and Learning Algorithms, which is also free online.

posted by Freen at 9:36 PM on October 22, 2010

Computational linguists tend to do a lot of interesting, large-scale statistical analyses. One good book in this field is Manning and Schütze's "Foundations of Statistical Natural Language Processing."

posted by zippy at 10:45 PM on October 22, 2010

"Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations" by Witten and Frank.

I don't believe the book is open source, but the accompanying program (Weka) is, which you might appreciate.

posted by mostly-sp3 at 10:47 AM on November 26, 2010

This thread is closed to new comments.

posted by Tristram Shandy, Gentleman at 12:50 PM on October 22, 2010