I have to validate some input for a database, and I want to present the user with a mathematically accurate estimate of the percentage of data that they entered which may be invalid. The problem is that valid data looks like invalid data 10% of the time. [more inside]
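One possible way to frame this, as a rough sketch only: if every truly invalid row gets flagged (an assumption, since the details are behind the cut) and valid rows get flagged 10% of the time, the observed flag rate can be backed out into an estimate of the true invalid share. The function name and the perfect-detection assumption are mine, not the poster's.

```python
def estimate_invalid_fraction(flagged_fraction, false_alarm_rate=0.10):
    """Estimate the share of truly invalid rows from the share that got flagged.

    Assumes (hypothetically) that every invalid row is flagged and that a valid
    row is flagged with probability `false_alarm_rate`, so that
        flagged = invalid + false_alarm_rate * (1 - invalid).
    """
    if not 0 <= false_alarm_rate < 1:
        raise ValueError("false_alarm_rate must be in [0, 1)")
    estimate = (flagged_fraction - false_alarm_rate) / (1 - false_alarm_rate)
    # Sampling noise can push the raw estimate outside [0, 1]; clamp it.
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # 19% of rows flagged, 10% false-alarm rate on valid rows
    print(estimate_invalid_fraction(0.19))
```

If the validator also misses some invalid rows, the formula needs a detection-rate term as well, so treat this as a starting point.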
I'm curious about how many Americans have been exposed to meditation and mindfulness, and to what degree. I've had a difficult time finding research about these and related statistics. Can anyone point me in the right direction?
I'm looking for something like the most popular/visited pages for each month (or week, or day) since Wikipedia's debut. I found the Top 25 list which would be perfect if it wasn't for the fact that it starts only in 2013. I'd like to go back as many years as possible. I combed through the statistics department but I can't find anything usable except for a couple of "Top N" pages for a single year. Most pages or sections refer to just little samples and examples. Thanks a lot for any hints. Chris
I'm working with some multi-variable data (up to 12 factors per event, from a possible 28 factors). So far I've only seen it discussed as a combination of two factors. Is there a way to elegantly (or at least clearly) display inter-relations of more features? I realize 12 factors is a lot, so anything more than 2 factors would be an improvement. [more inside]
My question is about how to find someone who can appropriately choose a representative set of zip codes from a larger set. How could I get someone to choose, in an intellectually defensible way, a list of 30 to 50 zip codes that essentially represent my entire state? In other words, how could I come up with a list of 30 to 50 zip codes that economically/demographically are a representative subset of the entire state? [more inside]
Our company has decided to measure the jobs we do by assigning points to tasks. The problem here is that departments don’t do the same thing, or the same amount of things. What concerns me is that the company wants to publish a company-wide average, against which members in each department will be compared. [more inside]
Anyone familiar with proc phreg (Cox regression/survival analysis) in SAS who could help me figure out if my code is right? I'm new to survival analysis and my data are set up a little differently than the examples I'm seeing online so I'm not sure I'm doing it right. [more inside]
I am undertaking a fun statistical project, and I need help... [more inside]
I am a TA in an introductory college biology lab. Soon, I have to teach a class of mostly freshmen about t-tests and p-values in conjunction with a field experiment that they are conducting and on which they will have to write a report. I want to make sure I do this right—help me! [more inside]
So I was recently faced with a serious health care decision, and the people advising me strongly disagreed with each other, which made me realize something: I don't know enough science to make well-reasoned decisions for myself and my family. Help me figure out a DIY way to make up the gaps. [more inside]
My state has 709 zip codes. I want to draw reasonably well-grounded conclusions about the entire population of my state (having to do with insurance coverage and ACA subsidies, if you're interested). How many zip codes should I sample? [more inside]
I am looking for a resource that lists probability distributions and their common real-world applications. For example, I'd expect to see: Lognormal - daily returns in the stock market. Poisson - failure rates for mechanical equipment, ... [more inside]
If autism is diagnosed in 1 of 68 children, and there are 318 million people in the country, can I use these two facts to find out how many people in the US are autistic?
I have a statistics and/or probability question and the last time I took a statistics class Vanilla Ice and Andrew "Dice" Clay were multi-millionaires. I am not looking for a problem to be solved, I am asking what statistical technique should I use to determine if a time series of data is due to randomness or not. [more inside]
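The naive arithmetic is just population times rate. A minimal sketch, assuming (and this is the big assumption) that the 1-in-68 childhood diagnosis rate applies unchanged to every age group; the function name and the population figure in the example are placeholders:

```python
def estimated_count(population, rate_numerator=1, rate_denominator=68):
    """Scale a '1 in N' rate up to a whole population.

    Only valid if the rate really applies to everyone in `population`;
    a rate measured in children may not carry over to adults.
    """
    return population * rate_numerator / rate_denominator


if __name__ == "__main__":
    population = 318_000_000  # assumed figure; substitute your own
    print(round(estimated_count(population)))
```

The result is an upper-level ballpark at best, because diagnosis rates differ across age cohorts.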
I recently came up with a statistical thought experiment from watching too many reality game shows, but I'm having trouble remembering how to solve a problem like this. Help me figure out this problem that's been bugging me (and by extension relearn some statistics I've forgotten). [more inside]
How do you calculate the probability of something when it's not as simple as "do it a bunch of times"? Specifics inside. [more inside]
Years ago I came across an introductory book on statistics that introduced the topic by means of a metaphor of a pile of sand (possibly dirt) dropped off in a front yard. As I recall, the author drew the parallel that statistics was the effort to describe the shape and size of the pile of sand. This is all that I can remember of the book (other than my wish that I'd bought it). Does this description ring any bells?
I have 754 tickets in our development ticket system. All tickets are considered "minor" so they are not very disparate (you would not get one ticket with 4 hours, and another ticket of 4 days). Given that I have ~200 of these items estimated by hand, how can I generate estimates for the rest of the items? [more inside]
What formula do I need to determine the probability that a set of size N contains two elements, each appearing with a specific frequency? [more inside]
Please help with this probability related math problem. [more inside]
I am part of a group working on a policy document for the mitigation of traffic issues (e.g. speeding). Stakeholders are having a hard time with a particular criterion which reads as follows: "85th percentile speed is in excess of the signed speed limit by 5 mph or more." [more inside]
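For stakeholders who want to see the criterion worked through, here is a small sketch. It assumes a list of individually measured vehicle speeds and uses the nearest-rank percentile definition; traffic studies may use an interpolated percentile instead, and the function names are hypothetical.

```python
import math


def percentile_nearest_rank(values, pct):
    """Return the smallest value with at least pct% of observations at or below it."""
    if not values:
        raise ValueError("need at least one observation")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_criterion(speeds_mph, limit_mph, margin_mph=5):
    """True if the 85th percentile speed exceeds the limit by `margin_mph` or more."""
    return percentile_nearest_rank(speeds_mph, 85) >= limit_mph + margin_mph


if __name__ == "__main__":
    observed = list(range(21, 41))  # made-up sample: 20 vehicles at 21..40 mph
    print(percentile_nearest_rank(observed, 85), meets_criterion(observed, 30))
```

In plain terms: 85% of drivers are at or below that speed, so the criterion is met when even that "reasonable majority" speed sits 5 mph or more over the posted limit.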
How do I explain the principles of minimum detectable effect, statistical power, and statistical significance to a client? [more inside]
Every outcome in a fair lottery is equally probable, yet some results display obvious patterns and feel less likely to the statistically uninformed. Nobody would blink if a six-number lotto draw came up with (3,12,27,31,40,44), but a result of (1,2,3,4,5,6) would probably make the news. Has this ever happened in a major lottery? If yes, what was the public response?
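To put numbers on the "equally probable" part, here is a quick sketch assuming a 6-from-49 format (the question does not say which lottery, so the pool size is a placeholder). It computes the chance of any one specific combination and the chance that a draw is some run of six consecutive numbers.

```python
from math import comb


def odds_of_specific_draw(pool_size=49, picks=6):
    """Probability of one particular unordered combination in a fair draw."""
    return 1 / comb(pool_size, picks)


def odds_of_any_consecutive_run(pool_size=49, picks=6):
    """Probability that the drawn numbers form a run like (1,2,3,4,5,6)."""
    runs = pool_size - picks + 1  # possible starting points for a run
    return runs / comb(pool_size, picks)


if __name__ == "__main__":
    print(comb(49, 6))  # number of distinct 6-number combinations
    print(odds_of_specific_draw())
    print(odds_of_any_consecutive_run())
```

Under this assumed format, (1,2,3,4,5,6) and (3,12,27,31,40,44) each have exactly the same chance; what is rare is the whole class of "patterned" results, not any single one of them.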
I am searching for a live traffic flow counter which is on the net and reachable by http, or some webpage that displays hourly or cumulative vehicle statistics. [more inside]
I have a set of data: D(t). 5000 samples. Scatter-graphing makes some patterns clear (D-mean increases with t, for instance). D and t are always positive. I want to characterize these, statistically. [more inside]
Research help: What percentage of youth in foster care are born to parents who were at one point in foster care themselves? I'm looking for information on the prevalence of intergenerational child abuse and experiences of foster care. In addition to the question above, I am also interested in what percentage of perpetrators of child abuse were victims of child abuse themselves. I've done some digging myself and haven't been able to find any data on this.
I just started grad school, and I'm feeling overwhelmed with my Biostatistics class. Can you recommend some resources to keep me from falling behind? [more inside]
Our Google Analytics numbers are hugely inflated but we don't know why. [more inside]
I want to generate synthetic user-session data to predict how big a peak in application usage might be shortly before a weekly deadline (for timesheet submission - it's a time and labour tracking application). I've come up with a method that looks like it works - it involves a gamma distribution for the login time. But I don't have enough (in fact, any) statistical training to know whether I'm using that distribution meaningfully. Statisticians, please reassure me. Thanks! Excel functions inside... [more inside]
My teenage niece has suddenly found a strong interest in statistics. What book would you recommend for a 14 year old who has good, but not advanced, math skills?
With deadline looming, stats consultant has bailed. Simple queries need resolution. Help? I am working on a data graphic that involves statistical calculations about survival rates for startup businesses, correlated with certain tangible and intangible factors. The raw data (about survival/closure/merger outcomes) has already been investigated, and the original researchers (who are awesome) have generated some interesting correlations using univariate regressions and Cox regressions. For my output I am relying on their statistically significant findings, wanting to create comparisons among the univariate coefficients. Not sure my methods are kosher and would appreciate consultation. Avalanche inside. [more inside]
Does this exist? The popular myth is that 50% of marriages end in divorce, but those numbers are slippery. Is there a place with a good breakdown? [more inside]
I need help finding some statistics regarding workplace violence, particularly separating out violence perpetrated by internal employees vs external 3rd parties. [more inside]
I have just completed my MSc in mathematics in Europe. I do enjoy math, but I spent my uni years feeling like an autodidact hippie marooned on an island full of Mr and Mrs I-Want-A-Good-Job. My main interests revolve around humanities (literature/history/anthropology) and economics (but not finance), and instead of starting a "stable" well-paying career I dream about something inter-disciplinary. I am very open to earning little money and relocating just to do the kind of work that engages those skills. What are some random uses of my degree? [more inside]
What is the relationship between statistical power and the validity of an experiment in general? If I have low power, but a very low p value, am I still OK? [more inside]
My itty-bitty portfolio website traffic is almost all new visitors from Brazil. I don't have anything on my portfolio that would appeal to Brazilians. What is going on? Should I be worried? [more inside]
How many unique ways are there to put X rocks into Y boxes? (Given two different sets of attributes for both the rocks and the boxes.) [more inside]
Does anyone know if there is a trial version of SPSS to download for a Mac? I tried downloading a free trial, but I didn't trust the website that I was on. Are there add-ons for Excel that would allow me to do this? Any suggestions are welcome.
I play fantasy baseball in a league with friends, some of whom pay no attention and others who have baseball stats in their very DNA. I myself am moderately ok most of the time, but generally have a low-level understanding of what I'm doing, so when things go wrong my solution is usually to flail around with free agent players while watching my ship sink, week by week. I'd like to get off this plateau and actually learn more about the game, sabermetrics, etc so I can be a legit contender, but I'm lost in the morass: every resource I can find is super basic or way over my head. Help! [more inside]
Would a master's in statistics make me competitive for research analyst jobs in state/county government agencies? [more inside]
This study and this article mention that there are more than 3,000 pharmaceuticals currently approved for prescription in the U.S. and many compounds approved for OTC use. Where did they come up with that number or how might one come up with that type of estimate? [more inside]
I have a spreadsheet with 120,000 or so rows & need to pull out some data. [more inside]
Looking for a community college that has an online statistics and/or women's lit course. [more inside]
I'm looking for a social science type book, probably an edited volume, that tackles an issue using lots of data & statistical tests / arguments to debate an issue. I am less concerned with the issue being debated, and more concerned with learning 1) what constitutes proper manipulation of data 2) what tests are appropriate and when 3) what proper inference is. I'm not afraid of math.
There's a well-known joke about statisticians. "Did you hear the one about the statistician who drowned trying to wade across the river? He knew it was three feet deep...on average." I've searched for a source for this joke on the internet, but haven't found one. Does anyone have any idea where this joke originated? Call me an academic, but I feel the need to attribute it if I can.
I'm a scientist who deals with statistics all day, and my partner is a die-hard basketball fan. He follows some of the basketball stats nerds, and sometimes he wants to talk about basketball stats with me, but I can never find any decent statistical summaries - just a bunch of averages with no context provided. Are there any free resources that provide sports statistics with more than simple averages? [more inside]
I'm looking to take an intro to statistics class this fall or summer. I'd be up for taking it online or in person, if in person it would have to be in or near Washington DC. It could either be an undergraduate or graduate class, but not one where you have to know calculus. Has anyone taken a good introduction to statistics class in the DC area or online that they could recommend?
I want to learn SPSS. What are my options? [more inside]
Can/how can one improve the estimate for a chance of an event with a small historical sample size by utilizing the chance of a related event with a large historical sample size? Example and half-assed guess inside. [more inside]
I have 600 images tagged with keywords. In addition to nice pictures of tag clouds and frequency calculations, what sort of smart, insightful analysis can I do with these data that could reveal relationships between the tags and other (more formal) data attached to the images? Any advice on software tools (Windows, preferably FOSS) would be appreciated. I have some training in statistics and I have actually done textual statistics before but only briefly so I'm not familiar with the current tools and methods.