The fallacy is assuming that statistical information about a thing is more relevant in dealing with a particular instance of that thing than available first-hand data. [more inside]
I'm trying to analyse a set of biological data for a research project and I'm having trouble finding the appropriate statistical tests to use. [more inside]
StatFilter: Would anybody be able to recommend a good introduction to the statistical computing language "R" that a reasonably quantitatively adept psychologist could work through on his own? Something like a step-by-step book or textbook with exercises would be great to help me become more fluent in R. (My colleagues at work who use R are primarily computer scientists who either first learned MATLAB or are brilliant autodidacts when it comes to learning different scripting languages, and thus don't have any suggestions; Googling has mostly proffered a somewhat obscurely structured guide from the R authors and lots of exhortations to just learn on my own, somehow...). I've become familiar with how to do many individually useful tasks in data structuring and analysis, but I feel a bit like a very high-functioning tourist who has learned a lot of phrases to get around but who would be lost and mugged in an alleyway if I strayed off the beaten path.
I have a Soundcloud account. I think it's among the better sound file sharing sites, but it can be a little pricey. I am regularly adding new content, so I am always butting up against the file size limitations of my account. So I want a way to look at the files in my account and decide which ones need to go and which of the most productive ones to keep. I want to be able to compare how long a file has been posted with the number of times it has been viewed. I think deleting a file based on just how long it has been posted, or just its views, could be misleading: I could delete an old file that still delivers consistent viewers, while keeping a new file that starts out fast but drags as time goes on. I can't tell that just by looking, so I need some help. Can I create something in Excel? I also have SPSS. Would creating crosstabs help?
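One way to put old and new files on the same footing is to divide each file's total plays by the number of days it has been online. This is a minimal sketch assuming you can export each file's title, upload date and total play count; the track names and numbers below are made up. A single average rate cannot show whether a new file is fading, so for that you would need to record the counts on a few dates and compare the rates between those snapshots.

```python
from datetime import date

# Hypothetical export: (title, upload date, total plays).
tracks = [
    ("old_steady", date(2011, 1, 10), 1200),
    ("new_hit", date(2012, 5, 1), 400),
    ("dud", date(2011, 6, 1), 30),
]

def plays_per_day(uploaded, plays, today):
    """Average plays per day since upload (at least one day, to avoid dividing by zero)."""
    days = max((today - uploaded).days, 1)
    return plays / days

def rank_tracks(tracks, today):
    """Return (title, rate) pairs sorted from least to most plays per day."""
    rates = [(title, plays_per_day(up, plays, today)) for title, up, plays in tracks]
    return sorted(rates, key=lambda pair: pair[1])

today = date(2012, 6, 1)
for title, rate in rank_tracks(tracks, today):
    print(f"{title}: {rate:.2f} plays/day")
```

The lowest-ranked files are the deletion candidates. The same ratio can be computed in a spreadsheet column as plays divided by days since upload.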
Best statistical resources for comparing crime by neighborhoods? [more inside]
I have recently been introduced to the concept of pseudoreplication as a mistake that people often make when using inferential statistics to evaluate treatment outcomes. My field (evolutionary and conservation biology) makes heavy use of inferential statistics, including techniques that are vulnerable to pseudoreplication, yet nowhere in my formal education have I been taught about how poor experimental design and lack of statistical rigor can lead to fallacies like this. My personal statistical proficiency is poor, but I am working to remedy that. To that end, could folks help me by identifying, and ideally explaining, any other potential pitfalls you can think of, and how they can be avoided through careful experimental design and data analysis?
I'd like to estimate the number of days a year when the high temperature is likely to be below a particular threshold, e.g. below freezing. This turns out to be harder than expected. [more inside]
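If you have several years of daily high temperatures for the place in question, the most direct estimate is empirical: count the qualifying days in each year and average across years. This is a minimal sketch assuming the data come as (date, daily high) pairs, that each year in the record is complete, and that there is no long-term trend; the sample values are invented. Keeping the per-year counts also shows how much the number varies from year to year.

```python
from datetime import date
from statistics import mean

def days_below_per_year(records, threshold):
    """records: iterable of (date, daily_high). Returns {year: days with high < threshold}.
    Years with no qualifying days still appear, with a count of 0."""
    counts = {}
    for day, high in records:
        counts.setdefault(day.year, 0)
        if high < threshold:
            counts[day.year] += 1
    return counts

def mean_days_below(records, threshold):
    """Average yearly count; assumes every year in `records` is complete."""
    return mean(days_below_per_year(records, threshold).values())

# Hypothetical, heavily truncated sample (a real record has ~365 rows per year).
sample = [
    (date(2000, 1, 1), -2.0), (date(2000, 1, 2), 3.0),
    (date(2001, 1, 1), -1.0), (date(2001, 1, 2), -5.0),
    (date(2002, 1, 1), 4.0),
]
print(days_below_per_year(sample, 0.0), mean_days_below(sample, 0.0))
```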
The Wikipedia page on statistics about rape shows very high reported rape rates for countries like the UK, US and Australia, in stark contrast to, say, India. The gap can't be explained simply by under-reporting, as that exists in all these countries (even assuming different rates of under-reporting). Is it because these countries have different definitions of rape? Or something else? [more inside]
Is scientific research ever organized to search for evidence of absence by reversing the null hypothesis? If not, why not? [more inside]
Can you think of a method that allows an individual to create a pseudorandom sequence of numbers (at the very least, one whose pattern is opaque to other people) using only his mind and body (no physical tools allowed)? [more inside]
What statistical test should I use to determine if there is a significant difference in the percent change in the presence of bacterial species observed among five groups before and after treatment? [more inside]
[StatisticsForTheFeebleMinded]My medical office sees patients who must have monthly blood draws for a condition they have. The samples have the same two tests performed on them at an outside laboratory and by our in-house laboratory. Within the last year, these values have begun to differ wildly. Need recommendations for software/programs/equations/thoughts for analyzing any trend within the numbers that might offer an explanation as to why this is now happening. [more inside]
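A standard way to compare two laboratories measuring the same samples is a Bland-Altman analysis: take the difference between the two results for each sample, then report the mean difference (the bias) and the 95% limits of agreement (bias plus or minus 1.96 standard deviations of the differences). To see when the disagreement began, the same differences can be averaged per month. This is a minimal sketch assuming you have paired in-house and outside results for each draw; the month labels and values in the usage lines are hypothetical. Plotting each difference against the date of the draw is also worthwhile.

```python
import statistics

def bland_altman(in_house, outside):
    """Bias (mean difference) and 95% limits of agreement for paired measurements."""
    if len(in_house) != len(outside) or len(in_house) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(in_house, outside)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

def bias_by_period(rows):
    """rows: iterable of (period_label, in_house, outside). Mean difference per period,
    useful for spotting when the two labs started to diverge."""
    groups = {}
    for period, a, b in rows:
        groups.setdefault(period, []).append(a - b)
    return {period: statistics.mean(d) for period, d in groups.items()}

print(bland_altman([10, 12, 14], [9, 12, 12]))
print(bias_by_period([("2011-01", 10, 9), ("2011-01", 12, 11), ("2012-01", 10, 5)]))
```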
StatsFilter: How do I compare the means of an individual at three different points? [more inside]
Is there any kind of listing online that gives average or median apartment sizes by country or, ideally, by city? [more inside]
(Good) jobs involving probability and statistics other than math teacher or actuary? [more inside]
I have a bunch of scores for sites, where each site's score is the sum of the individual scores of the samples it contains. The number of samples per site varies from 1 to several hundred. I would like to adjust the overall site scores for the number of samples, so that a site with 200 samples doesn't overwhelm a site with 10 that may be just as significant. However, I'm at a complete loss as to how to accomplish this. Any thoughts?
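The simplest adjustment is to divide each site's total by its number of samples, giving a mean score per sample. Means from sites with very few samples are noisy, though, so a common refinement is to shrink each site's mean toward the pooled mean, with small sites moving further. This is a minimal sketch of that idea; `prior_weight` is a hypothetical tuning constant that behaves like that many extra "pseudo-samples" at the pooled mean (0 gives the raw means), and the site totals are invented.

```python
def site_means(site_totals, site_counts):
    """Mean score per sample for each site."""
    return {s: site_totals[s] / site_counts[s] for s in site_totals}

def shrunk_means(site_totals, site_counts, prior_weight):
    """Per-sample means pulled toward the pooled mean; sites with few samples move more.
    prior_weight is an assumed constant, not something estimated here."""
    pooled = sum(site_totals.values()) / sum(site_counts.values())
    return {
        s: (site_totals[s] + prior_weight * pooled) / (site_counts[s] + prior_weight)
        for s in site_totals
    }

totals = {"A": 2000, "B": 150}  # hypothetical summed scores
counts = {"A": 200, "B": 10}    # hypothetical sample counts
print(site_means(totals, counts))
print(shrunk_means(totals, counts, prior_weight=10))
```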
Statistics question about a rare event, and the expected distribution of sightings among witnesses. [more inside]
In this game, you roll a number of six-sided dice to get a total. The total is either the highest single die result, or the sum of any multiples rolled, whichever is higher. For example: If I roll three dice and get a 3, 4, and 6, my total is 6. But if I roll a 4, 4, and 6, my total is 8, the sum of the two 4s. What I want to find out is the mean, median, mode, and standard deviation of the possible totals given N dice. How might I create a simple script to compute this? [more inside]
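Because every roll of N dice is equally likely, for small N you can enumerate all 6^N rolls and score each one. The rule reduces to "face value times how many times that face appears, maximized over the faces rolled", since a lone die counts as a set of one. This sketch assumes that when several different sets appear (e.g. 2, 2, 5, 5) only the best single set counts; change `score()` if your rules add them together. The work grows as 6^N, so beyond roughly 8 dice random sampling would be more practical.

```python
import itertools
import statistics
from collections import Counter

def score(roll):
    """Highest single die or sum of a set of matching dice, whichever is higher.
    Assumes only the best single set of matches counts."""
    counts = Counter(roll)
    return max(face * n for face, n in counts.items())

def summarize(n_dice, sides=6):
    """Exact mean, median, mode(s) and population SD over all equally likely rolls."""
    totals = [score(roll) for roll in itertools.product(range(1, sides + 1), repeat=n_dice)]
    return {
        "mean": statistics.mean(totals),
        "median": statistics.median(totals),
        "mode": statistics.multimode(totals),
        "stdev": statistics.pstdev(totals),
    }

for n in range(1, 5):
    print(n, summarize(n))
```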
My stats notes are getting too long to distribute using the university printers. A publisher wants to turn them into a printed book, but I want to keep control of the electronic distribution of my work. How should I approach this situation? [more inside]
Economics Mathematics: I have a Maths degree, but lately I've become interested in Economics (Microeconomics and Macroeconomics) and have been reading some textbooks and classic texts and taking some online lecture courses on Economics. But I find that many of the "handwaving" graphical "proofs" of economic theories lack mathematical rigour. Do more thorough mathematical treatments of these ideas exist? Where can I find them? [more inside]
What kind of statistical analysis would I use to compare the outcomes of the two arms of a prospective cohort study, one receiving an intervention and one serving as the control? [more inside]
Say that I have a bag which contains 100 balls and every ball in the bag should be red, but it's possible that one or more of these balls is the wrong colour. How many balls should I look at to be 90% sure that all the balls are red? Or 95%? Or 99.9%? Talk me through how to work this out, please?
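A strictly Bayesian "90% sure" needs a prior on how many balls are wrong, so this sketch takes the usual quality-control reading instead: pick the sample size n so that, if at least k of the 100 balls were the wrong colour, a sample drawn without replacement would contain at least one of them with the stated probability. The chance of missing them all is C(100 - k, n) / C(100, n). With a single wrong ball that simplifies to (100 - n) / 100, so you would need 90 balls for 90%, 95 for 95% and all 100 for 99.9%. Exact fractions are used to avoid rounding trouble at the boundaries.

```python
from fractions import Fraction
from math import comb

def prob_detect(population, defective, sample):
    """Exact P(at least one wrong-coloured ball is in a sample drawn without replacement)."""
    return 1 - Fraction(comb(population - defective, sample), comb(population, sample))

def sample_size(population, defective, confidence):
    """Smallest sample size whose detection probability reaches `confidence`."""
    target = Fraction(str(confidence))  # str() so 0.95 becomes exactly 19/20
    for n in range(population + 1):
        if prob_detect(population, defective, n) >= target:
            return n
    return None  # only reached if defective == 0

for conf in (0.9, 0.95, 0.999):
    print(conf, sample_size(100, 1, conf), sample_size(100, 5, conf))
```

If you are willing to assume more than one ball would be wrong whenever any are, the required sample drops sharply, which the second column shows for k = 5.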
What great books or resources are there for practicing probability word problems such as for standardized tests like the GRE? [more inside]
Statisticsfilter: Given available information about the distribution of self-selected 4-digit passwords (specifically banking PINs), is it possible to calculate the probability of two randomly selected individuals having the same PIN? If so, what're the odds? [more inside]
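If two people pick PINs independently from the same distribution, the probability that they match is the sum, over all PINs, of each PIN's probability squared. With all 10,000 PINs equally likely that is 1/10,000; any skew toward popular PINs raises it. This is a minimal sketch; the frequency table below is invented for illustration. If a published table lists only the most common PINs, summing the squares of the listed ones gives a lower bound on the true value.

```python
def collision_probability(freqs):
    """P(two independently chosen people share a PIN) = sum of squared PIN probabilities.
    freqs maps PIN -> count (or probability); values are normalized here."""
    total = sum(freqs.values())
    return sum((c / total) ** 2 for c in freqs.values())

uniform = {f"{i:04d}": 1 for i in range(10000)}
skewed = {"1234": 2, "0000": 1, "1111": 1}  # hypothetical counts
print(collision_probability(uniform), collision_probability(skewed))
```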
I'm looking for the percentage of births that were out of wedlock in the United States in 1880 (or, failing that, 1890). So far I'm coming up empty-handed. Specific stats for the District of Columbia would be even better, but I'm fairly certain that they're not available online.
Statistics filter: How can I categorize time series curves into pattern categories? [more inside]
Please help me figure out potential careers based on my interests and the best paths to obtain them. Psychology, economics, statistics? Market research? Psychometrics? [more inside]
I'm looking to learn how to calculate probabilities for a multi-round dice game. I've researched this question some, and it looks like I might need to know how to use the multinomial distribution, but I can't find any good introductions. Please point me to the most layman-accessible educational material on this subject, and help me to help myself. [more inside]
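The multinomial distribution answers "in n independent trials, each landing in category i with probability p_i, what is the chance of getting exactly c_1 of category 1, c_2 of category 2, and so on?" The formula is n! / (c_1! c_2! ... c_k!) times p_1^c_1 p_2^c_2 ... p_k^c_k. This is a minimal sketch of that formula; the dice examples in the usage lines are illustrative. Passing `Fraction` probabilities instead of floats gives exact answers. For a multi-round game, it is often easier to check your reasoning by enumerating all outcomes of a round directly.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(exactly counts[i] outcomes of category i in sum(counts) independent trials)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    coeff = factorial(n)
    for c in counts:
        coeff //= factorial(c)  # stays an exact integer at every step
    return coeff * prod(p ** c for p, c in zip(probs, counts))

# Two dice: exactly one six and one non-six.
print(multinomial_pmf([1, 1], [1 / 6, 5 / 6]))
# Three dice showing a 1, a 2 and a 3 (six equally likely faces).
print(multinomial_pmf([1, 1, 1, 0, 0, 0], [1 / 6] * 6))
```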
Grad programs--I've just heard (for the first time) that conditional admissions are A Thing. Would I have a snowball's chance with a good GRE score alone or will I have to take pre-req undergrad classes first? [more inside]
Which biostatistics software experience is most attractive to future employers? [more inside]
Putting on the math signal: calling the statistics-literate. Trying to change my weight/body composition and track progress in a useful way, but I'm having trouble separating normal daily variation from actual real change. [more inside]
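One common way to separate day-to-day noise from real change is an exponentially weighted moving average: each new reading moves the trend line only a fraction `alpha` of the way toward itself, so water and food swings are damped while a sustained change shows through. This is a minimal sketch; `alpha=0.1` is an assumed starting point (smaller is smoother but slower to react). Comparing today's trend value with the value a week or two earlier is then more informative than comparing raw readings.

```python
def ewma(values, alpha=0.1):
    """Exponentially weighted moving average; alpha is the weight given to each new reading."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    current = None
    for v in values:
        # The first reading seeds the trend; later ones nudge it by alpha * (reading - trend).
        current = v if current is None else current + alpha * (v - current)
        smoothed.append(current)
    return smoothed

print(ewma([80.2, 79.6, 80.9, 80.1, 79.4, 79.8]))  # hypothetical daily weights in kg
```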
How do I elegantly present tabular, statistical data online and automatically? I'd love some examples of beautifully presented tabular data online - something that works natively in a browser, ideally also on a tablet and mobile as well. Some interactivity (sorting, filtering) also OK but priority is usability and elegance like you'd find in printed statistical abstracts. Bonus points for open source web tools / frameworks that could help automate this from a database! [more inside]
Recommendations for great books about probability and risk. [more inside]
Considering the huge amount of oil/energy expended driving trucks full of food all over the country, would it make more sense to increase our train infrastructure (is that even possible?) to move more food? That's the question, but not why I've come to Metafilter. [more inside]
StatisticsFilter, non-parametric edition: I'd like to test if my non-normally distributed outcome is significantly different between two groups adjusting for a third variable. [more inside]
Does Facebook publish statistics? I'm specifically interested in statistics about photo uploads and camera metadata / popularity. [more inside]
Are there any reliable statistics for the number of American high school seniors who apply to colleges each year? [more inside]
What are the *approximate* download numbers for a Top 20 podcast episode? [more inside]
Say I'm an industrial designer (I'm not), and I have to design a car seat, desk chair, climbing harness, body armor, or some other piece of equipment that has to be able to interact with a whole bunch of differently-shaped humans on a regular basis. What resource do I use to find out things like average foot width, or average knee circumference, average jaw length, or any other specific anatomical measurements? [more inside]
What is the best way for me to learn R? In particular, what are the best websites or online tutorials for learning to deal with large datasets? [more inside]
Is there a way to find out approximately when I left this wad of money in my (very old) jacket pocket, using statistics and date/number info from the bills? What would be the minimum number of bills needed for such a thing?
I'd like to learn about data science. Things like predictive modelling, regression and classification and so on. What would be good books or online courses to start with?
What is some recent important or interesting research done in your field? [more inside]
I'm looking for an infographic (?) about how lucky you are to have been born in the developed world. I remember a format similar to "If you are also literate, then you are already in the top x% of the world," with various characteristics substituted in for "literate." [more inside]
Morbid question about Louis CK's statistical analysis of the crowd at last year's Beacon Theater show. [more inside]
I am interested in seeing some comparative data on the civilian casualties of warfare, particularly in the 20th and 21st centuries. Are there any great analyses of this topic, perhaps on DVD, YouTube, or illustrated in books with graphs? (I have a hard time envisioning the magnitude of these things when they are presented only as bald numbers.) I started thinking about this topic after watching The Fog of War, which is a documentary about Robert McNamara. He presented some amazing comparative information about World War II and the Cold War. I would love to acquire a similar critical context for other conflicts, including the recent wars in Iraq and Afghanistan.
Besides Central America, what other countries have major first-generation diaspora communities in the United States? India? Armenia? Cambodia? Other Asian countries? Just wild guesses here. [more inside]
Statistics, machine learning, and image analysis/processing. Two months. Self-study. No other obligations. Recommendations? [more inside]
I'd like to read about unexpected statistical correlations that begin to emerge when companies or academics use data mining to analyze behaviour in large groups of people. I'm looking for articles/sites that are a little more serious than this (found while searching Metafilter for this topic). More like this (NYT: Target is able to predict that some customers are pregnant before their children are born). Thanks for your recommendations! [more inside]