How flexible is a master's degree in biostatistics compared to one in applied statistics? Is this even what I want to do? [more inside]
I have a statistics question about ranked lists. This is not a homework question. [more inside]
I have a list of 15 people. Each person has between 1 and 3 entries in a lottery, for a total of 35 entries. I need to select 9 people from the list of 15--nobody can win more than once. What is the most transparent, most random, most low-tech way I can do this? [more inside]
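One low-tech way to picture this is a hat draw: put all 35 slips in, pull them out one at a time, and skip any name that has already won. The sketch below only simulates that procedure. The roster and entry counts are made up for illustration, and `draw_winners` is a hypothetical helper, not an established tool.

```python
import random

def draw_winners(entries, n_winners, seed=None):
    """Simulate a hat draw: one slip per entry, repeat names are skipped."""
    rng = random.Random(seed)
    # One slip per entry, so a person with 3 entries has 3 slips in the hat.
    slips = [name for name, count in entries.items() for _ in range(count)]
    rng.shuffle(slips)
    winners = []
    for name in slips:
        if name not in winners:
            winners.append(name)
        if len(winners) == n_winners:
            break
    return winners

# Hypothetical roster: 15 people, 1 to 3 entries each, 35 entries in total.
counts = [1, 1] + [2] * 6 + [3] * 7
entries = {f"person_{i}": c for i, c in enumerate(counts, start=1)}
winners = draw_winners(entries, n_winners=9, seed=2024)
print(winners)
```

Passing a fixed `seed` makes the draw reproducible for anyone who wants to check it; a physical hat with 35 slips follows the same logic without a computer.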
I post a lot of URLs to social media sites (and since one of those is Twitter I often use url shorteners) that point to my own publishing company's website, and also directly to where I sell my books on Amazon, Barnes & Noble, etc. I know services like ow.ly will track how many clickthroughs a url will get, and I think they can give me multiple shortened urls for the same target url. I'm wondering if any url shortening sites will also let me keep track of all of my shortened urls and give them nicknames or make notes (so I can note where I've used them) and give me a chart or spreadsheet or something that shows me which urls are getting the most traffic. Or if there's an app or separate website where I can enter the info that will then collect the tracking data. I'm trying to avoid having to manually check every url's visitors data.
I currently work for a growing company doing various social media marketing for small businesses. I have been finding that I receive a lot of satisfaction doing activities related to what I learned in library school. I enjoy collecting, organizing, and providing data and information for our internal staff and making things approachable. One weakness I see is that we are especially data rich and insight poor with social media. I would like to know if there are any recommended programs for data mining or statistical analysis? [more inside]
What is the longest streak of at-bats a fielder has played through without being part of a play?
I'm looking for a word-count tool that will allow me to: set a goal for words written by a specific date, enter in the words I have written each day, see how many words I have remaining toward my goal, and how many words I will need to average each day to reach my goal. [more inside]
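The arithmetic behind such a tool is small enough to sketch. The function name, the sample numbers, and the choice to count both today and the deadline as writing days are all assumptions for illustration.

```python
from datetime import date

def writing_progress(goal, deadline, daily_counts, today):
    """Return words written, words remaining, and the daily average needed."""
    written = sum(daily_counts)
    remaining = max(goal - written, 0)
    # Assumption: today and the deadline day both count as writing days.
    days_left = max((deadline - today).days + 1, 1)
    return {
        "written": written,
        "remaining": remaining,
        "days_left": days_left,
        "per_day": remaining / days_left,
    }

# Hypothetical example: a 50,000-word goal due on 30 November.
status = writing_progress(
    goal=50_000,
    deadline=date(2024, 11, 30),
    daily_counts=[4000, 3500, 2500],
    today=date(2024, 11, 11),
)
print(status)
```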
My office has recently had some funds open up and we are looking into investing in some statistical software to make our lives easier. We do a lot of work with distribution fitting, Monte Carlo analysis, and regression analysis with data sets that may contain left or right censored data. Unfortunately, we only have a few days to identify the best software package for our buck. Alternatively, the idea has been floated to download the free R software and spend the money on some training to get over the steep learning curve. What program or approach would be the best use of our money?
I'm struggling to understand likelihood ratios (LR) in the context of diagnostic tests, and why a positive LR is influenced by the sensitivity of the test. [more inside]
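As a rough illustration of why sensitivity enters: the positive LR is the probability of a positive result in people with the condition (the sensitivity) divided by the probability of a positive result in people without it (1 minus the specificity). The sketch below uses made-up test characteristics, and the function names are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a diagnostic test."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Hypothetical test: 90% sensitivity, 80% specificity, 20% pre-test probability.
lr_pos, lr_neg = likelihood_ratios(0.90, 0.80)
after_positive = post_test_probability(0.20, lr_pos)
print(lr_pos, lr_neg, after_positive)
```

Holding specificity fixed, a lower sensitivity shrinks the numerator of LR+, so a positive result shifts the odds less.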
Is there actual scientific research behind this oft-cited UI response time => human perception table, or is it just "conventional wisdom"? If there is, I'd love to see the original research and know how it was conducted.
We have a group of six people with 55 different options. Each member of the party has to vote for each option under 8 different analyses i.e. appearance, distinctness, etc. The options are quality weighted. [more inside]
So I take it that the OkTrends blog was killed off after Match bought OkCupid. Where can I now get my regular fix of really interesting statistics presented at a level that the lay person can understand? (I already know about Nate Silver and xkcd's What If.)
I'm working on a problem for "Inferences about the difference between two population means for independent samples: sigma 1 & 2 unknown and unequal." The final value of "test statistic t" falls in the rejection region at the 95% confidence level, but falls in the nonrejection region at the 99% confidence level. Should I perform additional calculations before rejecting my null hypothesis?
I have over 10 years of sent email sitting in a folder. Are there any tools (preferably for OS X or *nix, but anything interesting is welcome) that I can use to generate interesting statistics, or draw pretty graphs, word clouds... basically anything interesting that works on a huge number of emails.
Help me with statistics and Excel. Especially help me if you know any labor saving methods. I want the median, mean and standard deviation for the average price of all items sold, but my spreadsheet-full-of-data doesn't tell me the price of each sale -- just the average price per store, and the number sold at that store. Something like this: [more inside]
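A minimal sketch of what can be recovered from per-store averages and counts: the overall mean is the units-weighted mean of the store averages. The median and the full standard deviation of individual sale prices cannot be recovered from averages alone; the spread computed below only captures variation between stores, so it understates the true figure. The data and the function name are hypothetical.

```python
import math

def pooled_price_stats(stores):
    """stores: list of (average_price, units_sold) pairs."""
    total_units = sum(units for _, units in stores)
    # Overall mean price per item: weight each store's average by units sold.
    mean = sum(avg * units for avg, units in stores) / total_units
    # Between-store spread only; within-store variation is not in the data.
    between_var = sum(units * (avg - mean) ** 2 for avg, units in stores) / total_units
    return mean, math.sqrt(between_var)

# Hypothetical data: one store sold 2 items at an average of 10, another 3 at 20.
mean_price, between_sd = pooled_price_stats([(10.0, 2), (20.0, 3)])
print(mean_price, between_sd)
```

In a spreadsheet the mean is the same calculation: the sum of average price times units, divided by total units.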
So I'm beginning a statistics PhD program this fall and I'm concerned that my math skills have gotten rusty since I haven't done anything related for the past two years. I've been working as an actuary since I graduated college but I don't do that much math--mostly a lot of programming. Has anybody been in a similar situation to me? How was the adjustment for you? I'm considering retaking advanced calculus and linear algebra during my first year (probably next summer before I take 2nd year advanced courses) just to refresh myself again. I'm aware some people may think this is kind of pathetic but I'd rather be safe than sorry. Besides, it's only my first year. Is this frowned upon? [more inside]
I'm running into trouble with my statistics course. I'm just getting up to t statistics for independent measures research design. My problems are: 1. I'm going through a lot of paper; 2. I need to keep all my calculations better organized as I do them; 3. I'm flipping back and forth between my book, an online version of the book, and another screen so that I can reference as much material as possible at once. I'm thinking some kind of basic statistics calculator spreadsheet (or any other format) would be in order. Can anybody direct me towards one? [more inside]
What are the limits to bedbugs? Why isn't every hotel room infested given how tough they are claimed to be? Is there any evidence on the chances of taking bed bugs home from a hotel with you? Will the bedbug infestation rates go ever upwards? Why or why not? Interested in aggregated, rather than anecdotal evidence here. [more inside]
I come from an engineering background rather than a research background, and I find myself lacking in vocabulary when it comes to understanding research papers, particularly when they start talking about ANOVA analyses, F(x) effect sizes and p values. I can skim through the results of a study and see that certain numbers are bigger than other numbers, but I don't really know how to tell whether what I'm seeing is significant. I'm guessing that I'm missing basic education in statistics. Can I fix this in a simple way?
I was recently sitting down to tea with a friend of mine (we're both in our early 20s), and the topic wandered over to hard drug use (e.g. stuff like cocaine and crystal meth, not marijuana or alcohol). When comparing our perceptions of how common hard drug use was, we were completely surprised when our answers were polar opposites: I saw it as an extremely rare thing, but she said it was something virtually everyone did but no one talked about. What's the truth here? How prevalent is hard drug use anyway? And why do our experiences differ so much? [more inside]
I have a list of paired numbers that span multiple orders of magnitude, and I need to find a method to a) compare within each pair in a way that does not disproportionately bias the comparison at the high or low end of the list, and b) define which pairs are dissimilar enough to be excluded from further analysis. The dataset itself follows a rough sigmoid curve, with a few pairs in the 1000s, more in the 100s, a lot in the upper 10's, some in the low 10's, and a few in the single digits. I have tried a few different comparison methods so far, including percent difference and relative percent difference of both the raw and log-transformed data. [more inside]
I'm about to begin a new project that looks at the outcomes of specific events, and would like to query the hivemind to see what kinds of approaches I can take to it. I'm always impressed with the wide variety of approaches to statistical problems I see on here. [more inside]
So I'm trying to find how two variables are related, and I have a mountain of data. [more inside]
Looking for statistical information on how many students study history at the B.A., M.A. and Ph.D. level in various non-US countries. [more inside]
So I'm going to Kenya in 5 weeks time for some work, and I'm meant to be briefing some colleagues (emphasis on brief) about some aspects of our work tomorrow. Something has leapt out at me, and I don't have the time to research it myself before presenting. [more inside]
In English, scientists customarily use the word "significant" or "statistically significant" to refer to an effect that is distinguished from zero at the p < .05 significance level. On the other hand, the word "significant" in non-technical English carries a connotation of being meaningful, important, or substantial; this creates confusion when researchers write about "a significant effect," since the effect might be significant in the statistical sense while being so small as to be insignificant in the common-English sense. In your native language, what word is used for "significance" in the statistical context? Is the same word used outside the technical context, and if so, is it a word whose common meaning is something more like "detectable," more like "important," or something else entirely? In particular, does the confusion that arises in English also take place in your language?
I have five structural equation models that are identical except for the final outcome variable. Should I expect the model fit statistics to vary more than negligibly? [more inside]
I'm curious what the most frequently purchased colors are for regular, non-jean pants for men. I was discussing this with a colleague today, and guessed black and navy. I'm looking specifically for industry numbers or anecdotal evidence from those working in clothing manufacturing or sales. (I already guessed!) Thanks in advance.
Feeling a little stuck in my current job; unsure if I should go for the PhD or cross it off my list and change jobs. [more inside]
I've been thinking about product ratings online. Product A and B both have an average rating of 4 stars. Product A is universally liked: every reviewer gave it 4 stars. Product B is the Twilight Series: lots of people love it (5 stars), but many 1-star reviews drag the average down to 4 stars. Other than displaying the rating distribution (# of 1 through 5 star reviews), are there well-known formulas that would give Product A a higher rating? I think what I'm asking about are weighted means, or some sort of formula that takes into account variance or skew. But rather than re-invent the statistical wheel, I was hoping some of you may be able to point out well-known examples of good weighted formulas, or research related to this question. Hope this is clear! Thank you!
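One commonly suggested idea, offered here as a sketch and not as the formula any particular site uses, is to rank by a lower confidence bound on the mean: subtract a multiple of the standard error, so that a product with polarized ratings scores below one with the same mean and no spread. The review counts, the `z` value, and the function name are assumptions for illustration.

```python
import math
import statistics

def lower_bound_score(ratings, z=1.96):
    """Mean rating minus z standard errors (a rough lower confidence bound)."""
    n = len(ratings)
    mean = statistics.mean(ratings)
    spread = statistics.pstdev(ratings)  # population standard deviation
    return mean - z * spread / math.sqrt(n)

# Hypothetical reviews: both products average exactly 4 stars.
product_a = [4] * 100             # everyone gave 4 stars
product_b = [5] * 75 + [1] * 25   # loved by most, hated by some

score_a = lower_bound_score(product_a)
score_b = lower_bound_score(product_b)
print(score_a, score_b)
```

With these made-up numbers Product A keeps its 4.0 while Product B drops to roughly 3.66; a larger `z` penalizes disagreement more heavily.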
I never use SPSS (I hate it like I hate nothing else, except perhaps Excel) but I must use SPSS for this problem. Normally I'd just use R and be done with it, but SPSS is necessary for this problem. How can I get a simple error bar plot from two proportions? [more inside]
The fallacy is assuming that statistical information about a thing is more relevant in dealing with a particular instance of that thing than available first-hand data. [more inside]
I'm trying to analyse a set of biological data for a research project and I'm having trouble finding the appropriate statistical tests to use. [more inside]
StatFilter: Would anybody be able to recommend a good introduction to the statistical computing language "R" that a reasonably quantitatively-adept psychologist might be able to work through on his own? Something like a step-by-step book or textbook with exercises would be great to help me become more fluent in R. (My colleagues at work who use R are primarily computer scientists who either first learned MATLAB or are brilliant autodidacts when it comes to learning different scripting languages, and thus don't have any suggestions; Googling has mostly proffered a somewhat obscurely structured guide from the R authors and lots of invocations to just learn on my own, somehow...). I've become familiar with how to do many individually useful tasks in data structuring and analysis, but I feel a bit like a very high-functioning tourist who has learned a lot of phrases to get around but who would be lost and mugged in an alleyway if I strayed off the beaten path.
I have a Soundcloud account. I think it's among the better sound file sharing sites, but it can be a little pricey. I am regularly adding new content so I am always butting up against the file size limitations of my account. So, I want to have a way to look at the files in my account and decide which of them need to go, keeping the most productive ones. I want to be able to compare how long a file has been posted to the number of times it has been viewed. I think that deleting a file based just on duration or views could be deceiving because I could delete an old file that is delivering consistent viewers while a new file that starts out fast could drag as time goes on. I can't tell that by just looking, so I need some help. Can I create something in Excel? I also have SPSS. Would creating crosstabs help?
Best statistical resources for comparing crime by neighborhoods? [more inside]
I have recently been introduced to the concept of pseudoreplication as a mistake that people often make when using inferential statistics to evaluate treatment outcomes. My field (evolutionary and conservation biology) makes heavy use of inferential statistics, including techniques that are vulnerable to pseudoreplication, yet nowhere in my formal education have I been taught about how poor experimental design and lack of statistical rigor can lead to fallacies like this. My personal statistical proficiency is poor, but I am working to remedy that. To that end, could folks help me by identifying and ideally explaining whatever other potential pitfalls you can think of, and explaining how they can be avoided through careful experimental design and data-analysis?
I'd like to estimate the number of days a year when the high temperature is likely to be below a particular threshold, e.g. below freezing. This turns out to be harder than expected. [more inside]
The Wikipedia page on statistics about rape shows a very high crime rate for countries like UK, US and Australia in stark contrast to, say, India. The gap can't be explained simply by under-reporting, as that exists in all these countries (even assuming different rates of under-reporting). Is it because these countries have different definitions of rape? Or something else? [more inside]
Is scientific research ever organized to search for evidence of absence by reversing the null hypothesis? If not, why not? [more inside]
Can you think of a method that allows an individual to pseudo randomly create a sequence of numbers (at the very least the randomness is opaque to the minds of other people) assuming said individual may only use his mind and body (no physical tools are allowed)? [more inside]
What statistical test should I use to determine if there is a significant difference in the percent change in the presence of bacterial species observed among five groups before and after treatment? [more inside]
[StatisticsForTheFeebleMinded] My medical office sees patients who must have monthly blood draws for a condition they have. The samples have the same two tests performed on them at an outside laboratory and by our in-house laboratory. Within the last year, these values have begun to differ wildly. Need recommendations for software/programs/equations/thoughts for analyzing any trend within the numbers that might offer an explanation as to why this is now happening. [more inside]
StatsFilter: How do I compare the means of an individual at three different points? [more inside]
Is there any kind of listing online that gives average or median apartment sizes by country or, ideally, by city? [more inside]
(Good) jobs involving probability and statistics other than math teacher or actuary? [more inside]
I have a bunch of scores for sites that are the sum of the individual scores of the samples that they contain. The number of samples in each site varies from 1 to several hundred. I would like to adjust the overall site scores to account for the variation in sample counts, so that a site with 200 samples doesn't overwhelm a site that has 10 where the site may be just as significant. However, I'm at a complete loss as to how to accomplish this. Any thoughts?
Statistics question about a rare event, and the expected distribution of sightings among witnesses. [more inside]
In this game, you roll a number of six-sided dice to get a total. The total is either the highest single die result, or the sum of any multiples rolled, whichever is higher. For example: If I roll three dice and get a 3, 4, and 6, my total is 6. But if I roll a 4, 4, and 6, my total is 8, the sum of the two 4s. What I want to find out is the mean, median, mode, and standard deviation of the possible totals given N dice. How might I create a simple script to compute this? [more inside]
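A minimal sketch of one way to script this: enumerate every possible roll of N dice and summarize the totals. It assumes that "the sum of any multiples" means the best single set of matching dice (for example, two 4s give 8), which is a reading of the rules and not something stated outright. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
import statistics

def roll_total(dice):
    """Highest single die, or the best sum of matching dice, whichever is higher."""
    counts = Counter(dice)
    # Assumption: only one set of matching faces counts toward the total.
    best_multiple = max(
        (face * n for face, n in counts.items() if n >= 2), default=0
    )
    return max(max(dice), best_multiple)

def total_stats(n_dice):
    """Summarize totals over all 6**n_dice equally likely rolls."""
    totals = [roll_total(roll) for roll in product(range(1, 7), repeat=n_dice)]
    return {
        "mean": statistics.mean(totals),
        "median": statistics.median(totals),
        "mode": statistics.mode(totals),  # first-seen value if there is a tie
        "stdev": statistics.pstdev(totals),
    }

print(roll_total((3, 4, 6)), roll_total((4, 4, 6)))
print(total_stats(3))
```

Full enumeration grows as 6 to the power N, so it is practical for a handful of dice; beyond that, random simulation would be the usual fallback.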
Economics Mathematics: I have a Maths degree but lately I've become interested in Economics (Microeconomics and Macroeconomics) and have been reading some textbooks and classic texts and doing some online lecture courses on Economics. But I find that many of the "handwaving" graphical "proofs" of economic theories lack a sense of mathematical robustness. Do more thorough mathematics for these ideas exist? Where can I find them? [more inside]