So I'm trying to find how two variables are related, and I have a mountain of data. [more inside]
Looking for statistical information on how many students study history at the B.A., M.A. and Ph.D. level in various non-US countries. [more inside]
So I'm going to Kenya in 5 weeks' time for some work, and I'm meant to be briefing some colleagues (emphasis on brief) about some aspects of our work tomorrow. Something has leapt out at me, and I don't have the time to research it myself before presenting. [more inside]
In English, scientists customarily use the word "significant" or "statistically significant" to refer to an effect that is distinguished from zero at a p < .05 significance level. On the other hand, the word "significant" in non-technical English carries a connotation of being meaningful, important, or substantial; this creates confusion when researchers write about "a significant effect," since the effect might be significant in the statistical sense while being so small as to be insignificant in the common-English sense. In your native language, what word is used for "significance" in the statistical context? Is the same word used outside the technical context, and if so, is it a word whose common meaning is something more like "detectable," more like "important," or something else entirely? In particular, does the confusion that arises in English also take place in your language?
I have five structural equation models that are identical except for the final outcome variable. Should I expect the model fit statistics to vary more than negligibly? [more inside]
I'm curious what the most frequently purchased colors are for regular, non-jean pants for men. I was discussing this with a colleague today, and guessed black and navy. I'm looking specifically for industry numbers or anecdotal evidence from those working in clothing manufacturing or sales. (I already guessed!) Thanks in advance.
Feeling a little stuck in my current job; unsure if I should go for the PhD or cross it off my list and change jobs. [more inside]
I've been thinking about product ratings online. Product A and B both have an average rating of 4 stars. Product A is universally liked: every reviewer gave it 4 stars. Product B is the Twilight Series: lots of people love it (5 stars), but many 1 star reviews drag down the average to 4 stars. Other than displaying the rating distribution (# of 1 through 5 star reviews), are there well-known formulas that would give Product A a higher rating? I think what I'm asking about are weighted means, or some sort of formula that takes into account variance or skew. But rather than re-invent the statistical wheel, I was hoping some of you may be able to point out well-known examples of good weighted formulas, or research related to this question. Hope this is clear! Thank you!
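To make the Product A vs. Product B comparison concrete, here is a minimal sketch of one possible variance-aware score: the mean rating minus a penalty times the standard deviation. This is only an illustration, not an established industry formula; the function name `variance_penalized_rating` and the `penalty` weight are assumptions made up for this example, and the review counts (100 reviews, 75% five-star for B) are chosen only so that both means come out to 4.

```python
import statistics


def variance_penalized_rating(ratings, penalty=1.0):
    """Mean rating minus `penalty` times the population standard deviation.

    `penalty` is a hypothetical tuning knob: 0 gives the plain mean,
    larger values punish polarized products more heavily.
    """
    mean = statistics.fmean(ratings)
    spread = statistics.pstdev(ratings)
    return mean - penalty * spread


# Product A: everyone gives 4 stars. Product B: 75% five-star, 25% one-star.
# Both have a plain mean of exactly 4 stars.
product_a = [4] * 100
product_b = [5] * 75 + [1] * 25

if __name__ == "__main__":
    for name, ratings in (("A", product_a), ("B", product_b)):
        print(name, statistics.fmean(ratings), variance_penalized_rating(ratings))
```

With any positive penalty, Product A scores higher than Product B, because A's ratings have zero spread. Whether polarization should count against a product is a judgment call that the formula does not settle.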
I never use SPSS (I hate it like I hate nothing else, except perhaps Excel) but I must use SPSS for this problem. Normally I'd just use R and be done with it, but SPSS is necessary for this problem. How can I get a simple error bar plot from two proportions? [more inside]
The fallacy is assuming that statistical information about a thing is more relevant in dealing with a particular instance of that thing than available first-hand data. [more inside]
I'm trying to analyse a set of biological data for a research project and I'm having trouble finding the appropriate statistical tests to use. [more inside]
StatFilter: Would anybody be able to recommend a good introduction to the statistical computing language "R" that a reasonably quantitatively-adept psychologist might be able to work through on his own? Something like a step-by-step book or textbook with exercises would be great to help me become more fluent in R. (My colleagues at work who use R are primarily computer scientists who either first learned MatLab or are brilliant autodidacts when it comes to learning different scripting languages, and thus don't have any suggestions; Googling has mostly proffered a somewhat obscurely structured guide from the R authors and lots of invocations to just learn on my own, somehow...). I've become familiar with how to do many individually useful tasks in data structuring and analysis, but I feel a bit like a very high-functioning tourist who has learned a lot of phrases to get around but who would be lost and mugged in an alleyway if I strayed off the beaten path.
I have a Soundcloud account. I think it's among the better sound file sharing sites, but it can be a little pricey. I am regularly adding new content so I am always butting up against the file size limitations of my account. So, I want to have a way to look at the files in my account and decide which of them need to go and keep the most productive ones. I want to be able to compare how long a file has been posted to the number of times it has been viewed. I think that deleting a file based just on duration or views could be deceiving because I could delete an old file that is delivering consistent viewers while a new file that starts out fast could drag as time goes on. I can't tell that by just looking, so I need some help. Can I create something in Excel? I also have SPSS. Would creating crosstabs help?
Best statistical resources for comparing crime by neighborhoods? [more inside]
I have recently been introduced to the concept of pseudoreplication as a mistake that people often make when using inferential statistics to evaluate treatment outcomes. My field (evolutionary and conservation biology) makes heavy use of inferential statistics, including techniques that are vulnerable to pseudoreplication, yet nowhere in my formal education have I been taught about how poor experimental design and lack of statistical rigor can lead to fallacies like this. My personal statistical proficiency is poor, but I am working to remedy that. To that end, could folks help me by identifying and ideally explaining whatever other potential pitfalls you can think of, and explaining how they can be avoided through careful experimental design and data-analysis?
I'd like to estimate the number of days a year when the high temperature is likely to be below a particular threshold, e.g. below freezing. This turns out to be harder than expected. [more inside]
The Wikipedia page on statistics about rape shows a very high crime rate for countries like UK, US and Australia in stark contrast to, say, India. The gap can't be explained simply by under-reporting, as that exists in all these countries (even assuming different rates of under-reporting). Is it because these countries have different definitions of rape? Or something else? [more inside]
Is scientific research ever organized to search for evidence of absence by reversing the null hypothesis? If not, why not? [more inside]
Can you think of a method that allows an individual to pseudo randomly create a sequence of numbers (at the very least the randomness is opaque to the minds of other people) assuming said individual may only use his mind and body (no physical tools are allowed)? [more inside]
What statistical test should I use to determine if there is a significant difference in the percent change in the presence of bacterial species observed among five groups before and after treatment? [more inside]
[StatisticsForTheFeebleMinded] My medical office sees patients who must have monthly blood draws for a condition they have. The samples have the same two tests performed on them at an outside laboratory and by our in-house laboratory. Within the last year, these values have begun to differ wildly. Need recommendations for software/programs/equations/thoughts for analyzing any trend within the numbers that might offer an explanation as to why this is now happening. [more inside]
StatsFilter: How do I compare the means of an individual at three different points? [more inside]
Is there any kind of listing online that gives average or median apartment sizes by country or, ideally, by city? [more inside]
(Good) jobs involving probability and statistics other than math teacher or actuary? [more inside]
I have a bunch of scores for sites that are the sum of the individual scores of the samples that they contain. The number of samples in each site varies from 1 to several hundred. I would like to adjust the overall site scores for the variation in sample counts, so that a site with 200 samples doesn't overwhelm a site that has 10 where the site may be just as significant. However, I'm at a complete loss as to how to accomplish this. Any thoughts?
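One way to picture the adjustment being asked about is to move from summed scores to per-sample averages, then pull sites with few samples toward the overall average so that a single lucky sample does not dominate either. The sketch below is only an illustration under assumptions: the function name `adjusted_site_scores`, the input layout (site name mapped to a `(score_sum, n_samples)` pair), and the `prior_weight` value are all invented for this example.

```python
def adjusted_site_scores(sites, prior_weight=10):
    """Shrunken per-sample mean for each site.

    `sites` maps a site name to (sum_of_sample_scores, number_of_samples).
    `prior_weight` is a hypothetical setting: it acts like that many
    imaginary samples scored at the global per-sample mean.
    """
    total_score = sum(score for score, _ in sites.values())
    total_samples = sum(n for _, n in sites.values())
    global_mean = total_score / total_samples

    return {
        name: (score + prior_weight * global_mean) / (n + prior_weight)
        for name, (score, n) in sites.items()
    }


if __name__ == "__main__":
    example = {"big_site": (400.0, 200), "small_site": (5.0, 1)}
    for name, value in adjusted_site_scores(example).items():
        print(name, round(value, 3))
```

Sites with hundreds of samples end up close to their own per-sample mean, while a site with one sample lands near the global mean. Setting `prior_weight=0` reduces this to the plain per-sample average.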
Statistics question about a rare event, and the expected distribution of sightings among witnesses. [more inside]
In this game, you roll a number of six-sided dice to get a total. The total is either the highest single die result, or the sum of any multiples rolled, whichever is higher. For example: If I roll three dice and get a 3, 4, and 6, my total is 6. But if I roll a 4, 4, and 6, my total is 8, the sum of the two 4s. What I want to find out is the mean, median, mode, and standard deviation of the possible totals given N dice. How might I create a simple script to compute this? [more inside]
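For small N, the rules as stated above can be brute-forced by enumerating every possible roll. The sketch below assumes one reading of "the sum of any multiples": for each face that appears at least twice, add up all dice showing that face, and take the best such sum. The function names `score` and `summarize` are invented for this example.

```python
from collections import Counter
from itertools import product
import statistics


def score(roll):
    """Total for one roll: highest die, or best sum of matching dice."""
    counts = Counter(roll)
    # Sum of each set of matching dice (faces that appear two or more times).
    best_multiple = max(
        (face * n for face, n in counts.items() if n >= 2),
        default=0,
    )
    return max(max(roll), best_multiple)


def summarize(n_dice):
    """Exact statistics over all 6**n_dice equally likely rolls."""
    totals = [score(roll) for roll in product(range(1, 7), repeat=n_dice)]
    return {
        "mean": statistics.mean(totals),
        "median": statistics.median(totals),
        "modes": statistics.multimode(totals),  # list, in case of ties
        "stdev": statistics.pstdev(totals),     # population std deviation
    }


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, summarize(n))
```

Enumeration grows as 6^N, so this is comfortable for a handful of dice but slows down quickly beyond roughly eight or nine; past that point, random simulation would be the usual fallback.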
My stats notes are getting too long to distribute using the university printers. A publisher wants to turn them into a printed book, but I want to keep control of the electronic distribution of my work. How should I approach this situation? [more inside]
Economics Mathematics: I have a Maths degree but lately I've become interested in Economics (Microeconomics and Macroeconomics) and have been reading some textbooks and classic texts and doing some online lecture courses on Economics. But I find that many of the "handwaving" graphical "proofs" of economic theories lack a sense of mathematical robustness. Do more thorough mathematics for these ideas exist? Where can I find them? [more inside]
What kind of statistical analysis would I use to compare the outcomes of a prospective cohort study, one with an intervention and one as the control? [more inside]
Say that I have a bag which contains 100 balls and every ball in the bag should be red, but it's possible that one or more of these balls is the wrong colour. How many balls should I look at to be 90% sure that all the balls are red? Or 95%? Or 99.9%? Talk me through how to work this out, please?
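One common way to frame the bag question is to ask a slightly different one: if exactly `bad` balls are the wrong colour, how many must be drawn (without replacement) to have a given chance of seeing at least one of them? That is a hypergeometric calculation, sketched below. The function names and the choice to fix `bad` in advance are assumptions of this example; being "90% sure all the balls are red" in a stricter sense would also need a prior belief about how likely wrong balls are.

```python
from fractions import Fraction
from math import comb


def miss_probability(total, bad, sample):
    """Chance that a sample of `sample` balls contains none of the `bad` ones."""
    # Ways to pick the sample from good balls only, over all possible samples.
    return Fraction(comb(total - bad, sample), comb(total, sample))


def min_sample(total, bad, confidence):
    """Smallest sample size that catches at least one bad ball
    with probability >= confidence, assuming exactly `bad` exist."""
    target = Fraction(str(confidence))  # exact arithmetic, avoids float edge cases
    for sample in range(total + 1):
        if 1 - miss_probability(total, bad, sample) >= target:
            return sample
    return None  # unreachable when bad >= 1 and confidence <= 1


if __name__ == "__main__":
    for confidence in (0.9, 0.95, 0.999):
        print(confidence, min_sample(100, 1, confidence))
```

With a single wrong ball among 100, the miss probability is simply (100 - n)/100, so the required sample is 90 for 90%, 95 for 95%, and all 100 for 99.9%. If several balls may be wrong, far fewer draws are needed, which is why the assumed number of bad balls matters so much.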
What great books or resources are there for practicing probability word problems such as for standardized tests like the GRE? [more inside]
Statisticsfilter: Given available information about the distribution of self-selected 4-digit passwords (specifically banking PINs), is it possible to calculate the probability of two randomly selected individuals having the same PIN? If so, what're the odds? [more inside]
I'm looking for the percentage of out-of-wedlock births per capita in the United States in 1880 (or, failing that, 1890). So far I'm coming up empty-handed. Specific stats for the District of Columbia will be even better, but I'm fairly certain that they're not available online.
Statistics filter: How can I categorize time series curves into pattern categories? [more inside]
Please help me figure out potential careers based on my interests and the best paths to obtain them. Psychology, economics, statistics? Market research? Psychometrics? [more inside]
I'm looking to learn how to calculate probabilities for a multi-round dice game. I've researched this question some, and it looks like I might need to know how to use the multinomial distribution, but I can't find any good introductions. Please point me to the most layman-accessible educational material on this subject, and help me to help myself. [more inside]
Grad programs--I've just heard (for the first time) that conditional admissions are A Thing. Would I have a snowball's chance with a good GRE score alone or will I have to take pre-req undergrad classes first? [more inside]
What bio-statistics software experience is most attractive to future employers? [more inside]
Putting on the math signal: calling the statistics-literate. Trying to change my weight/body composition and track progress in a useful way, but I'm having trouble separating normal daily variation from actual real change. [more inside]
How do I elegantly present tabular, statistical data online and automatically? I'd love some examples of beautifully presented tabular data online - something that works natively in a browser, ideally also on a tablet and mobile as well. Some interactivity (sorting, filtering) also OK but priority is usability and elegance like you'd find in printed statistical abstracts. Bonus points for open source web tools / frameworks that could help automate this from a database! [more inside]
Recommendations for great books about probability and risk. [more inside]
Considering the huge amount of oil/energy expended driving trucks full of food all over the country, would it make more sense to increase our train infrastructure (is that even possible?) to move more food? That's the question, but not why I've come to Metafilter. [more inside]
StatisticsFilter, non-parametric edition: I'd like to test if my non-normally distributed outcome is significantly different between two groups adjusting for a third variable. [more inside]
Does Facebook publish statistics? I'm specifically interested in statistics about photo uploads and camera metadata / popularity. [more inside]
Are there any reliable statistics for the number of American high school seniors who apply to colleges each year? [more inside]
What are the *approximate* download numbers for a Top 20 podcast episode? [more inside]
Say I'm an industrial designer (I'm not), and I have to design a car seat, desk chair, climbing harness, body armor, or some other piece of equipment that has to be able to interact with a whole bunch of differently-shaped humans on a regular basis. What resource do I use to find out things like average foot width, or average knee circumference, average jaw length, or any other specific anatomical measurements? [more inside]
What is the best way for me to learn R? In particular, what are the best websites or online tutorials for learning to deal with large datasets? [more inside]