Statistics help?
May 16, 2010 10:08 AM
Is there an internet forum that is especially good at discussing strange problems in statistics?
I have a weird scientific measurement, the significance of which I am trying to determine. Are there any good forums that offer sound advice on such things?
For anyone who's interested, my problem is as follows:
I have a normal distribution of measurements (i.e., there are X data points, each with its own magnitude, and the magnitudes taken together are normally distributed). I want to take a sample of Y of them and sum the magnitudes in this sample. I am choosing the data for my sample based on an independent trait, and the hypothesis I am testing concerns how closely the sum of the magnitudes in the sample agrees with an already determined number. So what I want to test is whether this sum would be expected from a chance sampling of Y terms.
I think the two things that are really confounding me are that my sample of Y data points is not random, and that the metric I'm interested in (the summed magnitudes) is not actually *in* the set from which I'm sampling.
What I'm afraid of is that I must first construct a new data set that represents every possible combination of Y of the original data points. This scares me, since my current data set has 12000 data points and my Y = 70, so the number of possible combinations is roughly 10^185 (!!!)
Any help or direction to better resources would be greatly appreciated!
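The size of that search space can be checked exactly with Python's standard-library `math.comb`. This is a small illustrative sketch; the helper name `subset_count` is hypothetical. It shows that choosing 70 of 12000 points gives a count with 186 digits (about 2.4 × 10^185), which is why full enumeration is out of the question and random sampling of subsets is the practical route.

```python
from math import comb, log10


def subset_count(population_size, sample_size):
    """Number of distinct ways to choose sample_size points from population_size."""
    return comb(population_size, sample_size)


if __name__ == "__main__":
    count = subset_count(12000, 70)
    # Print the order of magnitude and the exact number of digits.
    print(f"about 10^{log10(count):.1f} subsets ({len(str(count))} digits)")
```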
Best answer: Physics Forums has a great statistics section.
It seems like a Monte Carlo approach could be useful here. What if you just sampled Y points randomly and repeated that N times? If the sum of the data selected by the independent trait is closer to the already determined value than the random sums are in more than 95% of the runs, then you've shown statistical significance at the 95% level (i.e., a Monte Carlo-acquired p-value below 0.05).
posted by Mapes at 10:44 AM on May 16, 2010
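The Monte Carlo test suggested above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a definitive implementation: the function name and parameters are hypothetical, random subsets are drawn without replacement with `random.Random.sample`, "closer" means a smaller absolute distance to the target, and the common (hits + 1) / (runs + 1) correction keeps the estimate from being exactly zero. The data in the usage block are synthetic placeholders.

```python
import random


def monte_carlo_p_value(population, sample_size, observed_sum, target,
                        n_runs=10_000, seed=0):
    """Estimate how often a random subset of sample_size values has a sum
    at least as close to target as observed_sum is."""
    rng = random.Random(seed)
    observed_distance = abs(observed_sum - target)
    hits = 0
    for _ in range(n_runs):
        # Draw sample_size distinct points (without replacement).
        random_sum = sum(rng.sample(population, sample_size))
        if abs(random_sum - target) <= observed_distance:
            hits += 1
    # Add-one correction: the observed selection counts as one possible outcome.
    return (hits + 1) / (n_runs + 1)


if __name__ == "__main__":
    # Hypothetical stand-in data: 12000 normally distributed magnitudes.
    data_rng = random.Random(42)
    population = [data_rng.gauss(10.0, 2.0) for _ in range(12000)]
    # Hypothetical values for the trait-selected sum and the predetermined target.
    p = monte_carlo_p_value(population, 70, observed_sum=712.0, target=700.0)
    print(f"Monte Carlo p-value: {p:.4f}")
```

A small p-value means random subsets rarely get as close to the target as the trait-selected subset did.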
Monte Carlo simulation is the first thing that came to mind as well. However, randomly drawing N samples from a Gaussian distribution will just give you back its mean, so you can skip that step.
The output of the simulation should give you Y(y), where Y is a subset of a Gaussian random variable (GRV) conditioned on y.
posted by I_am_jesus at 11:37 AM on May 16, 2010
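One way to skip the simulation step entirely is a closed-form normal approximation; this technique is swapped in here for illustration. For a sum S of k values drawn without replacement from N values with mean μ and population variance σ², E[S] = kμ and Var[S] = kσ²(N − k)/(N − 1), where the last factor is the finite-population correction. The sketch below, with a hypothetical function name and built on the standard-library `statistics.NormalDist`, uses these moments to approximate the probability that a random sum lands at least as close to the target as the observed sum. It assumes the population is not constant (non-zero variance).

```python
from statistics import NormalDist, fmean, pvariance


def normal_approx_p_value(population, sample_size, observed_sum, target):
    """Normal approximation to P(|S - target| <= |observed_sum - target|),
    where S is the sum of sample_size values drawn without replacement."""
    n = len(population)
    mu = fmean(population)
    var = pvariance(population, mu)  # population variance (divisor n)
    mean_s = sample_size * mu
    # Finite-population correction for sampling without replacement.
    var_s = sample_size * var * (n - sample_size) / (n - 1)
    dist = NormalDist(mean_s, var_s ** 0.5)
    d = abs(observed_sum - target)
    return dist.cdf(target + d) - dist.cdf(target - d)
```

With 12000 points and Y = 70, the sum of 70 values is close to normal, so this should agree well with a long Monte Carlo run.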
I agree with I_am_jesus's recommendation (first time I've said that). Instead of mapping your entire population of interest, you can use simulations to sample from the larger population. You can think of each simulation run as something like a survey of the overall population. By conducting thousands of those surveys, you will get a pretty good estimate of the mean and distribution of the population's parameters.
posted by eisenkr at 12:23 PM on May 16, 2010
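How good the estimate gets after "thousands of those surveys" can be quantified: a Monte Carlo proportion estimated from n independent runs has binomial standard error √(p(1 − p)/n), so precision improves as 1/√n. The sketch below is illustrative; the function name is hypothetical and p = 0.05 is just an example value.

```python
from math import sqrt


def monte_carlo_standard_error(p_hat, n_runs):
    """Binomial standard error of a proportion estimated from n_runs runs."""
    return sqrt(p_hat * (1.0 - p_hat) / n_runs)


if __name__ == "__main__":
    # Example: how tightly a p-value near 0.05 is pinned down.
    for n_runs in (100, 1_000, 10_000, 100_000):
        se = monte_carlo_standard_error(0.05, n_runs)
        print(f"{n_runs:>7} runs: p = 0.05 +/- {se:.4f}")
```

Quadrupling the number of runs halves the standard error, so 10,000 runs pin a p-value near 0.05 down to roughly ±0.002.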
Response by poster: Great!
Thanks to *everyone* for the ideas. I will venture to the recommended forums and also look into MC simulations.
posted by DavidandConquer at 2:23 PM on May 16, 2010
This thread is closed to new comments.
posted by jeffamaphone at 10:11 AM on May 16, 2010