Comments on: Textbooks on data mining techniques / statistical analysis on large data sets?
http://ask.metafilter.com/168473/Textbooks-on-data-mining-techniques-statistical-analysis-on-large-data-sets/
I come from a computer science background and want to run statistical analyses on very large data sets, looking for interesting trends and the like. I am looking for resources/textbooks on:<br>
<br>
-Finding said interesting trends<br>
-Computational techniques to work on said data sets efficiently<br>
-Statistical tests to help find structure in the data (for example: autocorrelation, testing whether the data do or do not come from a given statistical distribution, etc.)<br>
-Anything you think might be good to know for someone who wants to extract meaning and work with super large data sets<br>
<br>
I am fine with math and CS; I just need to up my exposure to the stats side of it (I have taken stats in the past, just not with this in mind).
Posted: Fri, 22 Oct 2010 12:35:30 -0800 by wooh
Tags: math, statistics, data, programming
By: Tristram Shandy, Gentleman
http://ask.metafilter.com/168473/Textbooks-on-data-mining-techniques-statistical-analysis-on-large-data-sets#2422116
<a href="http://www.amazon.com/exec/obidos/ASIN/0262032252/metafilter-20/ref=nosim/">Empirical Methods for Artificial Intelligence</a> by Paul Cohen. Much more about statistics than AI; don't let the title fool you.
Posted: Fri, 22 Oct 2010 12:50:16 -0800
By: Chicken Boolean
http://ask.metafilter.com/168473/Textbooks-on-data-mining-techniques-statistical-analysis-on-large-data-sets#2422118
I would like to learn some of this stuff myself. When I get around to it, I think I'd like to read <a href="http://www-stat.stanford.edu/~tibs/ElemStatLearn/">The Elements of Statistical Learning: Data Mining, Inference, and Prediction</a> by Hastie, Tibshirani, & Friedman, which is available for free online. I've heard good things about it from other people, but I have not read any of it myself.
Posted: Fri, 22 Oct 2010 12:52:17 -0800
By: Freen
http://ask.metafilter.com/168473/Textbooks-on-data-mining-techniques-statistical-analysis-on-large-data-sets#2422636
A friend suggests <a href="http://www.inference.phy.cam.ac.uk/mackay/itila/book.html">Information Theory, Inference, and Learning Algorithms</a>. Also free online.
Posted: Fri, 22 Oct 2010 21:36:35 -0800
By: zippy
http://ask.metafilter.com/168473/Textbooks-on-data-mining-techniques-statistical-analysis-on-large-data-sets#2422663
Computational linguists tend to do a lot of interesting, large-scale statistical analyses. One good book in this field is Manning and Schütze's "Foundations of Statistical Natural Language Processing."
Posted: Fri, 22 Oct 2010 22:45:37 -0800
By: mostly-sp3
http://ask.metafilter.com/168473/Textbooks-on-data-mining-techniques-statistical-analysis-on-large-data-sets#2467311
"Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations" by Witten and Frank.<br>
<br>
I don't believe the <a href="http://www.cs.waikato.ac.nz/~ml/weka/book.html">book</a> is open source, but the <a href="http://www.cs.waikato.ac.nz/ml/weka/">program</a> is, which you might appreciate.
Posted: Fri, 26 Nov 2010 10:47:10 -0800