Ze ball, if you please
August 7, 2008 11:55 AM
Statistics Filter: How to pick a red ball from n buckets, when there could be multiple red balls in one or more of the buckets, and multiple bucket sets?
I am looking to calculate the significance of finding a red ball in a generic bucket, when I have n buckets.
Currently, I employ a sampling method to generate a z-score:
1. I go through my n buckets methodically and look for the observed frequency of a red ball in all buckets.
2. I calculate an expected frequency by shuffling or shaking a bucket thousands of times and trying to find a red ball in the bucket. (In reality, these are not red balls but a particular substring of letters. One metaphor is that shaking the bucket might cause a red ball to turn green, or vice versa.) This is sampling without replacement -- i.e., a permutation test.
My z-score = (red ball_observed - red ball_expected) / s.d.(red ball_expected).
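For readers following along, here is a minimal sketch of the procedure described above, not the poster's actual code. It assumes each bucket is a string of letters, a "red ball" is a substring (called `motif` here), "shaking" means shuffling the letters within each bucket, and the observed frequency is the number of buckets containing the motif at least once. The names `count_hits` and `permutation_z` are made up for illustration.

```python
import random
import statistics


def count_hits(buckets, motif):
    """Number of buckets that contain the motif at least once."""
    return sum(1 for bucket in buckets if motif in bucket)


def permutation_z(buckets, motif, n_shuffles=1000, seed=0):
    """Return (observed, expected, sd, z) for one set of buckets.

    Assumption: the null distribution comes from shuffling the letters
    inside each bucket, which keeps each bucket's letter composition fixed.
    """
    rng = random.Random(seed)
    observed = count_hits(buckets, motif)

    null_counts = []
    for _ in range(n_shuffles):
        shuffled = []
        for bucket in buckets:
            letters = list(bucket)
            rng.shuffle(letters)  # "shake" the bucket
            shuffled.append("".join(letters))
        null_counts.append(count_hits(shuffled, motif))

    expected = statistics.mean(null_counts)
    sd = statistics.pstdev(null_counts)
    if sd == 0:
        # Every shuffle gave the same count, so the z-score is undefined.
        return observed, expected, sd, float("nan")
    return observed, expected, sd, (observed - expected) / sd


if __name__ == "__main__":
    bucket_set_a = ["ACGTACGT", "TTACGTTT", "GGGGACGT", "ACGTCCCC"]
    print(permutation_z(bucket_set_a, "ACGT"))
```

Note the parentheses around the numerator: without them, the formula would divide only the expected value by the standard deviation.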
I can use this approach to generate z-scores for finding red balls from different sets of buckets (say, bucket set A and bucket set B), with different numbers of buckets.
However, I would like to compare (rank) z-scores for red balls between bucket sets A and B.
I find that the observed frequency is complicated by situations where more than one red ball is found in a bucket, e.g. one bucket in set A may have three red balls, and another bucket may have none. Additionally, I will likely be dealing with different numbers of buckets between two or more sets of buckets.
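To make the complication concrete, here is a small sketch (again assuming buckets are strings and a red ball is a substring) that computes the observed frequency three different ways. It does not settle which definition is appropriate; it only shows that they diverge when a bucket holds several red balls. The function names are hypothetical.

```python
def count_occurrences(bucket, motif):
    """Count occurrences of motif in one bucket, allowing overlaps."""
    count = 0
    start = bucket.find(motif)
    while start != -1:
        count += 1
        start = bucket.find(motif, start + 1)
    return count


def observed_counts(buckets, motif):
    """Three candidate definitions of 'observed frequency' for a bucket set."""
    per_bucket = [count_occurrences(bucket, motif) for bucket in buckets]
    total = sum(per_bucket)
    return {
        # buckets holding at least one red ball
        "buckets_with_hit": sum(1 for c in per_bucket if c > 0),
        # every red ball counted, including repeats within a bucket
        "total_hits": total,
        # total scaled by the number of buckets in the set
        "hits_per_bucket": total / len(buckets) if buckets else 0.0,
    }


if __name__ == "__main__":
    # One bucket with three red balls ("R"), one with none.
    print(observed_counts(["RGRGR", "GGGGG"], "R"))
```

Whichever definition is chosen, the shuffled (expected) counts would need to use the same one for the z-score to be meaningful.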
Are there strategies for correcting the observed and expected frequencies with these complications in mind, so that I can generate comparable significance scores?
posted by Blazecock Pileon to science & nature (11 comments total)
Because the way you go about doing similar things might vary for different kinds of variables, or what you're describing with red balls and buckets might actually be something simple and canned in Stata or R.
posted by ROU_Xenophobe at 12:26 PM on August 7, 2008