Statistics on the entire population
May 22, 2009 9:23 AM
How does statistical analysis differ when analyzing the entire population rather than a sample?
I need to do some statistical analysis on legal cases. I happen to have the entire population rather than a sample. I'm basically interested in the relationship between case outcomes and certain features (e.g., time, the appearance of certain words or phrases in the opinion, the presence or absence of certain issues).
Should I do anything different than I would if I were using a sample? For example, is a p-value meaningful in this kind of case?
If it matters, the population is large (many thousands of cases) and spans several years.
posted by jedicus to science & nature (9 answers total) 5 users marked this as a favorite
The lesson we were taught is that you almost never have the true population; you always have a sample (even if it is a VERY large sample).
I would run my analysis as if I were still using a sample, and be sure to run a power analysis afterwards. It's entirely possible that, even given your very large "sample", you won't be able to get results with power (1 − β) greater than 0.90 (i.e., with your risk of committing a Type II error [failing to detect a change even when a change is present] below 10%).
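Here is a rough sketch of what that kind of power check could look like, assuming the comparison is between outcome rates in two groups of cases (say, opinions with and without a given phrase). The function name, the rates, and the group sizes are all made up for illustration, and the normal approximation used here is only one common way to estimate power, not the only one:

```python
from math import sqrt
from statistics import NormalDist  # standard library, Python 3.8+


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions.

    p1, p2: assumed true outcome rates in each group (hypothetical inputs)
    n1, n2: number of cases in each group
    alpha:  significance level (Type I error rate)
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Standard error of the difference in rates (unpooled approximation)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    shift = abs(p1 - p2) / se
    # Probability that the test statistic falls in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Illustrative numbers only: 5,000 cases per group.
for p2 in (0.55, 0.51):
    power = power_two_proportions(0.50, p2, 5000, 5000)
    print(f"rates 0.50 vs {p2:.2f}: power = {power:.3f}, beta = {1 - power:.3f}")
```

With these made-up inputs, the larger gap in rates clears the 0.90 power mark and the smaller one does not, so a big "sample" on its own does not guarantee adequate power for small effects.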
posted by jpolchlopek at 10:00 AM on May 22, 2009 [1 favorite]