Comments on: Statistical Analysis Help
http://ask.metafilter.com/62724/Statistical-Analysis-Help/
Comments on Ask MetaFilter post Statistical Analysis Help
Question: Statistical Analysis Help
http://ask.metafilter.com/62724/Statistical-Analysis-Help
Simple Stat Filter: I'm trying to do a statistical analysis of the correlation between two value fields, but all the simple tests that I know are inapplicable. Help me MeFis! <br /><br /> Here's the basic setup: I have survey data from individuals, and I am trying to draw a connection between two sets of values that they input (in one field they enter a number from 0-10, in the other a number from 0-25). I hypothesize that the higher the number in field 1 (e.g. 10), the higher the number in field 2 (e.g. 25), and conversely, the lower the one, the lower the other (perhaps not linear, but directional?). So I have 35 pairs of values, and I've tried but failed with t-tests and chi-squared.<br>
<br>
How should I do this? Should I post the data here? Is it possible to do this test without an "expected" value structure, which I think would be artificial? <br>
<br>
Thanks a bunch
Posted by stratastar, Tue, 15 May 2007 18:33:55 -0800. Tags: Statistics, ChiSquared, Surveys
By: grobstein
http://ask.metafilter.com/62724/Statistical-Analysis-Help#943762
It sounds like you should be able to learn something just by estimating the correlation coefficient, which is defined as the covariance over the product of the standard deviations. This has the effect of imposing a linear model on your data, but you have to make assumptions to reduce a cloud of points to a single number. <br>
<br>
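A minimal Python sketch of that definition: divide the sample covariance by the product of the sample standard deviations. The helper name `pearson_r` and the example numbers are hypothetical, not from the thread. On the same two columns it should agree with Excel's CORREL.

```python
import statistics

def pearson_r(x, y):
    """Correlation coefficient: sample covariance over the product of the sample SDs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance, with n - 1 in the denominator
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # statistics.stdev also divides by n - 1, so the two conventions cancel in r
    return cov / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical example pairs (field 1 on 0-10, field 2 on 0-25), not real survey data
field1 = [2, 4, 5, 7, 9]
field2 = [5, 9, 14, 16, 24]
print(round(pearson_r(field1, field2), 3))
```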
What software are you using? In Excel, the CORREL function will calculate the correlation coefficient for columns you give it.
Posted by grobstein, Tue, 15 May 2007 18:49:59 -0800
By: stratastar
http://ask.metafilter.com/62724/Statistical-Analysis-Help#943787
I could use Excel, but I'm more familiar with my TI-83. So by estimating the correlation coefficient, I would be imposing a linear function (y = 2.5x) and then testing how far the data deviate from that line?
Posted by stratastar, Tue, 15 May 2007 19:07:27 -0800
By: msittig
http://ask.metafilter.com/62724/Statistical-Analysis-Help#943788
Like grobstein said, it's probably a good idea to do a linear regression and find the correlation of the two data sets. Then you can do an inference test to decide whether the slope of the regression line is 1 or not (free response question 6c).<br>
<br>
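The inference test on the slope can be sketched in Python like this (a hypothetical helper with made-up example numbers). It fits the least-squares line of field 2 on field 1 and returns t = (b - b0) / SE(b), which has n - 2 degrees of freedom. The null slope b0 defaults to 0, the usual "no linear relationship" hypothesis, but can be set to any value you want to test against. With b0 = 0, the t it reports should match the t from the TI-83's LinRegTTest.

```python
def slope_t_test(x, y, null_slope=0.0):
    """Fit y = intercept + slope * x by least squares and test H0: slope == null_slope.

    Returns (slope, intercept, t, df). Under H0, t follows a t-distribution
    with df = n - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length lists with at least 3 pairs")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual standard error s = sqrt(SSE / (n - 2))
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    s = (sse / (n - 2)) ** 0.5
    # Standard error of the slope; zero only if every point lies exactly on the line
    se_slope = s / sxx ** 0.5
    t = (slope - null_slope) / se_slope
    return slope, intercept, t, n - 2

# Hypothetical example: regress field 2 on field 1 and test against a slope of 0
slope, intercept, t, df = slope_t_test([2, 4, 5, 7, 9], [5, 9, 14, 16, 24])
print(f"slope={slope:.3f}, intercept={intercept:.3f}, t={t:.2f}, df={df}")
```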
Year-end AP Stats project? My students are starting theirs on Monday :)
Posted by msittig, Tue, 15 May 2007 19:07:31 -0800
By: msittig
http://ask.metafilter.com/62724/Statistical-Analysis-Help#943791
Sorry, I meant: test whether the slope of the LSRL (least-squares regression line) is different from, or greater than, 0.
Posted by msittig, Tue, 15 May 2007 19:08:50 -0800
By: shadow vector
http://ask.metafilter.com/62724/Statistical-Analysis-Help#943828
I'd recommend linear regression. Say you regress field 2 on field 1. The coefficient on field 1 tells you the effect of a one-unit change in field 1 on field 2. The p-value associated with that coefficient tells you whether it is statistically different from zero.<br>
<br>
Practically speaking, I would also banish from your mind any worries about non-linearity. Regression is fantastic at giving you the right answer about the direction of a relationship even in the presence of minor hiccups such as that.
Posted by shadow vector, Tue, 15 May 2007 19:32:12 -0800
By: grobstein
http://ask.metafilter.com/62724/Statistical-Analysis-Help#943840
Finding the correlation coefficient has the effect of telling you the fit of a simple linear model; you don't have to run any tests on it (it's just a scalar value, what could you do?). <br>
<br>
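To make "the fit of a simple linear model" concrete: for a least-squares line with an intercept, r squared equals the fraction of the variance in field 2 that the line explains, 1 - SSE/SST. This short Python sketch (hypothetical helper and data) computes both quantities so you can check that they agree.

```python
def r_squared_two_ways(x, y):
    """Return (r ** 2, 1 - SSE / SST) for the least-squares line of y on x.

    For a simple linear regression with an intercept the two numbers agree:
    the squared correlation is the share of the variance in y explained by the line.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)  # SST, the total sum of squares
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # SSE, the variation the line leaves unexplained
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return r ** 2, 1 - sse / syy

# Hypothetical example pairs, not the poster's data
from_r, from_fit = r_squared_two_ways([2, 4, 5, 7, 9], [5, 9, 14, 16, 24])
print(round(from_r, 4), round(from_fit, 4))
```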
But msittig (who teaches this stuff! yay!) is totally right that you can estimate a linear regression model, and test the fit (<i>t</i>-stat of the single coefficient seems like a good way to do it, I think). <br>
<br>
On preview: yeah, and sv (shadow vector) is right about the non-linearity.
Posted by grobstein, Tue, 15 May 2007 19:39:24 -0800
By: singingfish
http://ask.metafilter.com/62724/Statistical-Analysis-Help#943856
The significance of a correlation coefficient depends on sample size (with a large enough sample, even a weak correlation comes out significant), so treat it with caution. Is the data approximately normally distributed? If so, and the data is a continuous variable with equal intervals (i.e. 2-1 = 3-2 = 4-3, etc.), then you want the Pearson product-moment correlation; otherwise, use the Spearman rank correlation. If the latter, you can't (easily) show the regression line.
Posted by singingfish, Tue, 15 May 2007 19:53:58 -0800
By: jtfowl0
http://ask.metafilter.com/62724/Statistical-Analysis-Help#943859
nth linear regression. It will do what you want. Test if the coefficient associated with the independent variable is statistically different from zero. Also, the sign on the coefficient will tell you if the two variables are positively or negatively related. <br>
<br>
Why are you worried about non-linearity? I don't think you need to sweat it, but if you are worried (and want to have an extra-good stats project), just take your independent variable, square it, and include that in your regression as a second predictor. Magically, simple linear regression has become multiple regression! If the t-stat associated with the squared term is statistically significant, that is evidence your data are not linear. You can put in as many higher order terms as you like (each higher order term lets the fitted curve change direction one extra time), but if the squared term isn't significant, higher order terms usually won't be either. Good luck!
Posted by jtfowl0, Tue, 15 May 2007 19:57:06 -0800
By: exphysicist345
http://ask.metafilter.com/62724/Statistical-Analysis-Help#943886
Oh, for goodness' sake, just make a scattergram of one question against the other and look at the results. (Fortunately, your TI-83 has graphing capability.) If the points are randomly scattered over the page, why bother with statistical tests? If there's a visible trend, linear or nonlinear, then it's worth pursuing.
Posted by exphysicist345, Tue, 15 May 2007 20:22:21 -0800
By: goingonit
http://ask.metafilter.com/62724/Statistical-Analysis-Help#943982
As you pointed out, you shouldn't necessarily hypothesize, just because one scale goes from 0-10 and the other from 0-25, that the second will on average be 2.5 times the first. There might be a different linear relationship, one with a nonzero intercept and a different slope. (The relationship could also be nonlinear, but with this little data, as shadow vector said, modeling that isn't practical.) <br>
<br>
So you want to do a linear regression, then find r, the correlation coefficient. To get the p-value for a given correlation coefficient, you need to convert the coefficient to a t-statistic:<br>
<br>
t = r / sqrt((1 - r^2)/(N - 2)), which, if there is no true correlation, follows a t-distribution with N - 2 degrees of freedom.<br>
<br>
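Python's standard library has no t-distribution functions (with SciPy installed, `2 * scipy.stats.t.sf(abs(t), df)` gives the two-sided p-value directly, and the TI-83's `tcdf` gives the same tail areas), so this minimal sketch converts r with the formula above and then integrates the t density numerically with Simpson's rule. The helper names are hypothetical, and the numerical integration is this sketch's own choice rather than anything from the thread; it should be accurate to many decimal places at ordinary significance levels.

```python
import math

def t_from_r(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), compared against n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r * r) / (n - 2))

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), integrating the density from 0 to |t| with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)

# Hypothetical usage with 35 pairs (df = 33); r = 0.5 is an illustrative value only
t = t_from_r(0.5, 35)
print(round(t, 3), two_sided_p(t, 35 - 2))
```

With the poster's 35 pairs you would pass the observed r and n = 35. The result is a two-sided p-value; for a one-sided test in the predicted (positive) direction, halve it when r > 0.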
Now you can do whatever significance testing you want.
Posted by goingonit, Tue, 15 May 2007 21:27:23 -0800
By: mikeand1
http://ask.metafilter.com/62724/Statistical-Analysis-Help#944058
Hmmm.....<br>
<br>
Before you start employing inferential statistics (e.g. t-tests, chi-squared, significance levels, etc.), you need to explain how you gathered your data, and ask yourself why such statistics would be appropriate.<br>
<br>
Inferential statistics assume a model that somehow generates randomness. The usual example is a random sample taken from a population, from which you are trying to draw inferences about the population.<br>
<br>
But if the individuals in your study were not selected using a random sample, then exactly what is the model you're using? And where does the randomness come in?<br>
<br>
If you can't answer these questions, then all this talk of T-tests, significance, and so on, is utterly meaningless.
Posted by mikeand1, Tue, 15 May 2007 23:08:03 -0800