Can/how can one improve the estimate for a chance of an event with a small historical sample size by utilizing the chance of a related event with a large historical sample size? Example and half-assed guess inside. A baseball-related example (for the purposes of this question, please forget about complicating factors like lefty/righty splits, home/away splits, the fact that a particular player might be better or worse now than he was in the past, etc.):<br>
<br>
Joe has had 1000 at bats. He has gotten a hit in 270 of those 1000 at bats.<br>
<br>
Of those 1000 at bats, 10 were against the pitcher Fred. In those 10 at bats against Fred, Joe got six hits.<br>
<br>
Clearly we can say "Joe is a .600 hitter against Fred". But also clearly, that doesn't really have any meaningful predictive power for Joe's future at bats against Fred. If we want to guess what the chance of Joe getting a hit off of Fred is, 27% is almost certainly a much better guess than 60%.<br>
<br>
But can we use <i>both</i> pieces of information to get a guess that's better than "27%"?<br>
<br>
I have a half-assed guess, which I'll describe momentarily, but it occurs to me that this is probably a problem which has been thought about rigorously by mathematicians. So does anyone know if there's a "real" answer to this problem?<br>
<br>
My half-assed guess is something along these lines:<br>
<br>
Joe has 10 at bats against Fred, and 6 hits in them. But Joe has 1000 at bats total (with 270 hits). Let's assume that if Joe had had 1000 at bats against Fred, 10 of them would have gone as they did, and the other 990 would have been as if against an average pitcher. So Joe would have gotten:<br>
<br>
6 + 990 * 270 / 1000<br>
<br>
= 6 + 267.3<br>
<br>
= 273.3<br>
<br>
So we guess that in his upcoming at bat against Fred, Joe has a 27.33% chance of getting a hit.
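The arithmetic above is a weighted average: the 10 real at bats against Fred, plus 990 imagined at bats at Joe's overall .270 rate. A minimal Python sketch that makes that weight an explicit parameter (the name `prior_weight` and the function name are hypothetical, chosen for illustration). With `prior_weight=990` it reproduces the 27.33% figure, and smaller values lean more heavily on the 6-for-10. In Bayesian terms this is the posterior mean under a Beta prior centered on .270 whose strength is `prior_weight` pseudo at bats, a standard way of shrinking a small sample toward a large one. The hard part is choosing that weight; the 990 here is just the guess from the text.

```python
def blended_hit_rate(hits_vs, at_bats_vs, hits_all, at_bats_all, prior_weight):
    """Blend a small matchup sample with a large overall sample.

    prior_weight is the number of hypothetical "pseudo at bats" assumed to
    follow the overall rate. This equals the posterior mean of a binomial
    rate under a Beta(prior_weight * overall, prior_weight * (1 - overall)) prior.
    """
    overall_rate = hits_all / at_bats_all
    pseudo_hits = prior_weight * overall_rate
    return (hits_vs + pseudo_hits) / (at_bats_vs + prior_weight)


# The guess from the text: 6-for-10 against Fred plus 990 "average" at bats.
print(blended_hit_rate(6, 10, 270, 1000, prior_weight=990))  # about 0.2733

# A hypothetical lighter prior gives the 10 at bats against Fred more say.
print(blended_hit_rate(6, 10, 270, 1000, prior_weight=100))
```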
I'm struggling to understand likelihood ratios (LR) in the context of diagnostic tests, and why a positive LR is influenced by the sensitivity of the test. I know that:<br>
<ol><br>
<li>It's a tool to get from a pre-test probability to a post-test probability.</li><br>
<li>It is defined as (the percentage of people with the disease who test positive) divided by (the percentage of people without the disease who test positive).</li><br>
<li>Or, alternatively, a positive LR is equal to sensitivity / (1 - specificity).</li><br>
</ol><br>
What I don't understand is <strong>why is a positive LR dependent on the sensitivity of the test?</strong><br>
<br>
For example, let's say I want to diagnose someone with Hairy Face Syndrome, and my diagnostic test is that the clouds open up, and God comes down from the heavens and tells me "this man has hairy face syndrome!"<br>
<br>
This is, understandably, a very <i>specific</i> test, but a very insensitive one.<br>
<br>
The way I've always understood likelihood ratios is that the positive likelihood ratio helps you interpret what to do with a positive result, not how useful searching for a positive result is likely to be.<br>
<br>
Therefore, why does the low sensitivity of waiting for an act of God lower the likelihood ratio and, consequently, affect how you interpret the result when it does miraculously happen?
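A small Python sketch of the two formulas from the list above (function names are hypothetical): LR+ = sensitivity / (1 - specificity), and post-test odds = pre-test odds × LR. It illustrates that when specificity is exactly 1, as with the divine-revelation test, LR+ is infinite whatever the sensitivity, so a positive is conclusive. Sensitivity only starts to matter once specificity is below 1: a positive could then be a false alarm, and LR+ compares how often people with the disease test positive with how often people without it do.

```python
import math


def positive_lr(sensitivity, specificity):
    """LR+ = P(test+ | disease) / P(test+ | no disease)."""
    false_positive_rate = 1 - specificity
    if false_positive_rate == 0:
        return math.inf  # a positive result never happens without disease
    return sensitivity / false_positive_rate


def post_test_probability(pre_test_probability, lr):
    """Convert to odds, multiply by the likelihood ratio, convert back."""
    if math.isinf(lr):
        return 1.0
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Perfect specificity: LR+ is infinite no matter how rarely the test fires.
print(positive_lr(sensitivity=0.001, specificity=1.0))  # inf

# Imperfect specificity: now sensitivity changes what a positive means.
for sens in (0.01, 0.5, 0.99):
    lr = positive_lr(sens, specificity=0.99)
    print(sens, lr, post_test_probability(0.10, lr))
```

With 99% specificity, a test that catches only 1% of cases has an LR+ close to 1, so a positive barely moves the probability; the same specificity with 99% sensitivity gives an LR+ close to 99.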
The fallacy is assuming that statistical information about a kind of thing is more relevant, when dealing with a particular instance of that thing, than the available first-hand data. The basic form of the fallacy is this:<br>
<br>
<strong>Premise:</strong> I am in situation X.<br>
<strong>Premise:</strong> Statistics show that X most commonly follows the pattern of situation Y.<br>
<strong>Premise:</strong> I should base my actions on the available statistics.<br>
<strong>Conclusion:</strong> I will react to situation X according to my rules for reacting to situation Y.<br>
<br>
(Forgive me if I put that poorly; my formal logic is rusty.)<br>
<br>
<strong>An example:</strong><br>
Jack is walking in the park when he sees two people walking dogs. One is walking a golden retriever, and the other is walking a pit bull. The golden retriever is straining at its owner's leash, barking at everyone who passes, growling, and baring its teeth. The pit bull is walking calmly by its owner's side and letting people pet it without complaint. Jack has to pass by one of the dogs to continue. He recalls reading a very reliable and well-sourced study which said that pit bulls are 35% more likely than golden retrievers to be aggressive and dangerous towards strangers; therefore, in order to stay safe, Jack chooses to walk past the golden retriever, and avoid the pit bull.<br>
<br>
Obviously, statistics have their use, and if Jack did not have first-hand data, his choice would have been logical. If, for instance, he didn't see the dogs, but was simply told by a friend that one path would lead past a golden retriever, and the other path would lead past a pit bull, it would be logical to conclude that, given the limited information, there was a higher probability of meeting an unfriendly dog if he chose to walk past the pit bull. But Jack should have realized that he had access to first-hand information (his witness of the dogs' behavior) that was more likely to represent the temperaments of those particular two dogs than the statistical average was.<br>
<br>
Is there a name for this fallacy? Or a way of expressing it mathematically? The closest I can find is the ludic fallacy, but I suspect that there are better ways to express it.
Can you think of a method that allows an individual to pseudorandomly create a sequence of numbers (at the very least, in a way that is opaque to other people's minds), assuming said individual may only use his mind and body (no physical tools are allowed)? Some use cases to test this method:<br>
* tell someone a number between 1-100<br>
* select 10 out of 20 doors and select 5 out of those 10<br>
* create a string consisting of 10 ASCII characters<br>
* select a date and time (YYYY-MM-DD, HH:MM)<br>
(Posted by Foci for Analysis, Fri, 21 Dec 2012 16:47:33 -0800)
What does a professional statistician do?
(Good) jobs involving probability and statistics other than math teacher or actuary? I am trying to determine what career I should pursue. I like math, specifically probability and statistics. Teacher and actuary are both on the table, but I want to know what other options there are.
(Posted by CustooFintel, Mon, 03 Dec 2012 14:10:17 -0800)
How to solve a complex statistics problem with a script?
In this game, you roll a number of six-sided dice to get a <strong>total</strong>. The total is either the highest single die result, or the sum of any multiples rolled, whichever is higher.
For example: If I roll three dice and get a 3, 4, and 6, my total is 6. But if I roll a 4, 4, and 6, my total is 8, the sum of the two 4s.
What I want to find out is the mean, median, mode, and standard deviation of the possible totals given N dice. How might I create a simple script to compute this? With two or three dice, I can easily figure this out by listing all the possible results, basically by brute force. But that's a pretty labor-intensive method when it comes to four or more dice.<br>
<br>
I have a Macintosh, and I'm comfortable using the Unix command line and several programming languages for simple problems, but I'm not even sure where to start automating something like this. I'd be grateful for any guidance.
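Brute-force enumeration is easy to script, because there are only 6^N equally likely ordered rolls. A minimal Python sketch, assuming one reading of the rule: each group of matching dice scores its face value times its size (a lone die is a group of one), and the total is the best single group. That gives 6 for 3, 4, 6 and 8 for 4, 4, 6, as in the examples, but 8 rather than 14 for 3, 3, 4, 4; if multiples should be added together, only `total` needs to change.

```python
from collections import Counter
from itertools import product
import statistics


def total(roll):
    # Assumed rule: best single group, where a group of k dice showing
    # face f is worth f * k (k = 1 is just the highest single die).
    counts = Counter(roll)
    return max(face * k for face, k in counts.items())


def totals_for(n_dice):
    # Every ordered roll of n six-sided dice is equally likely.
    return [total(roll) for roll in product(range(1, 7), repeat=n_dice)]


def summarize(n_dice):
    values = totals_for(n_dice)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": statistics.multimode(values),
        "stdev": statistics.pstdev(values),  # population SD over all outcomes
    }


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, summarize(n))
```

Saved as, say, `dice.py`, it runs with `python3 dice.py` from the command line. The work grows as 6^N, so this approach is comfortable up to roughly 8 dice; beyond that, random sampling or counting combinations of faces instead of ordered rolls would be needed.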
What great books or resources are there for practicing probability word problems such as those on standardized tests like the GRE? I have a difficult time deciphering the wording in probability problems, and need basic/intermediate practice problems. It should cover concepts such as when to multiply or add probabilities, conditional probability, independent and mutually exclusive events, etc. It can't be overly basic either (i.e., probably not Khan Academy).<br>
<br>
Schaum's outline seems like a great choice, but the <a href="http://www.amazon.com/exec/obidos/ASIN/0071350047/metafilter-20/ref=nosim/">reviews on Amazon</a> say it's full of mistakes, which I cannot have when learning.
(Posted by Mr. Papagiorgio, Thu, 08 Nov 2012 11:55:46 -0800)
What are the odds that two randomly selected people share the same bank PIN?
Statisticsfilter: Given available information about the distribution of self-selected 4-digit passwords (specifically banking PINs), is it possible to calculate the probability of two randomly selected individuals having the same PIN? If so, what're the odds? What I'd like to know is, if you had the sample set used <a href="http://www.datagenetics.com/blog/september32012/index.html">here</a> (a compelling analysis of leaked 4-digit passwords), could you calculate the probability of two random users in that dataset sharing a PIN? Given the information visualized there, could you? (Turn the infographic back into numbers, do some math magic?)<br>
<br>
If not, what data <i>would</i> you need in order to approximate a probability? Or if so, what are the odds?<br>
<br>
Another perhaps-relevant thing I found while googling around: "Probability of the same PIN digits", whose author assumes a uniform distribution (I know that's an actual probability/statistics term, but that's all I know).
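If per-PIN counts are available (or can be read off the chart), the calculation itself is short: for two people drawn independently from the population, the chance they share a PIN is the sum, over all PINs, of each PIN's share squared. With a uniform distribution that is 10,000 × (1/10,000)² = 1/10,000. A minimal Python sketch; the counts in the example are made-up toy numbers, not the datagenetics data. Because each share is squared, a handful of very popular PINs can dominate the result, which is why the real frequency table matters.

```python
def shared_pin_probability(counts, distinct_people=False):
    """Probability that two randomly chosen people have the same PIN.

    counts maps PIN -> number of people using it.
    distinct_people=False: two independent draws from the population shares.
    distinct_people=True: two different people drawn from this exact dataset.
    """
    total = sum(counts.values())
    if distinct_people:
        same = sum(c * (c - 1) for c in counts.values())
        return same / (total * (total - 1))
    return sum((c / total) ** 2 for c in counts.values())


# Uniform case: every 4-digit PIN used by exactly one person.
uniform = {f"{i:04d}": 1 for i in range(10_000)}
print(shared_pin_probability(uniform))  # approximately 0.0001

# Toy, made-up counts with one very popular PIN.
toy = {"1234": 50, "1111": 20, "0000": 10, "4821": 1, "9307": 1}
print(shared_pin_probability(toy), shared_pin_probability(toy, True))
```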
I'm looking to learn how to calculate probabilities for a multi-round dice game. I've researched this question some, and it looks like I might need to know how to use the multinomial distribution, but I can't find any good introductions. Please point me to the most layman-accessible educational material on this subject, and help me to help myself. The game I'm playing can be abbreviated like this:<br>
<br>
I roll two six sided dice and add them.<br>
I then subtract a (fixed) penalty value, and score as many points as remain.<br>
I cannot score less than 0. If I would score less than 0 points, I score 0 instead.<br>
I repeat this process several times, without altering the penalty between rolls.<br>
<br>
How do I predict the probability of any 1 particular cumulative score?<br>
I am not interested in simulating this event and taking a random sample; I want to calculate all possible outcomes and their precise probabilities. Examples follow:<br>
<br>
<br>
Example #1:<br>
My penalty value is 0, and I repeat the die roll 36 times.<br>
Each roll will produce a number between 2 and 12 inclusive, and my final score will be between 72 and 432. What is the chance that I will score 7 points exactly 6 times?<br>
<br>
Example #2:<br>
My penalty value is 11, and I repeat the procedure 40 times.<br>
Each roll will produce a value between 0 and 1 inclusive, and my final total score will be between 0 and 40, and likely closer to 0 than 40.<br>
I wish to predict the chance that I will score a cumulative total of exactly 2 points across the sum of all 40 trials.<br>
<br>
Example #3:<br>
My penalty value is -5, and I repeat the procedure 5 times.<br>
Each roll will produce a value between 7 and 17 inclusive (negative penalty). I wish to plot the probability of arriving at each possible total: 35 through 85.<br>
<br>
Again, I think maybe this is a case of the multinomial distribution, but everything I can find about it (Wikipedia) is confusing and written as reference material rather than as educational material.<br>
<br>
I'm familiar with MS Excel and have a basic Liberal Arts understanding of calculus and probability/statistics, though I'm quite rusty. I'm willing to re-learn what I have forgotten, and also to learn new skills - within reason. If this procedure requires me to learn lots of new math, I would appreciate links or book titles to pursue as well as a general outline of what the fields of study are called, and what I'll need to know to pursue them intelligently (i.e. help me ask questions without sounding ignorant/clueless). <br>
<br>
I gather that the program R might be helpful in this, so please feel free to recommend a good introduction to both R and Statistics.<br>
<br>
Thanks for any help you're able to give!
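Exact probabilities can be had without simulation by building the distribution of a single roll (36 equally likely 2d6 outcomes, minus the penalty, floored at 0) and then combining it with itself once per repetition (a convolution): the chance of each running total is the sum, over every way to split it into "previous total + this roll", of the product of the two probabilities. A minimal Python sketch using exact fractions; the function names are hypothetical. Example #1 only asks how often one particular score occurs, which is a binomial count, so it gets its own helper; the multinomial distribution is the generalization that tracks the counts of every score at once.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def roll_distribution(penalty):
    """Exact distribution of one 2d6 roll minus the penalty, floored at 0."""
    dist = defaultdict(Fraction)
    for a in range(1, 7):
        for b in range(1, 7):
            dist[max(a + b - penalty, 0)] += Fraction(1, 36)
    return dict(dist)


def total_distribution(penalty, rolls):
    """Exact distribution of the cumulative score after `rolls` rolls."""
    single = roll_distribution(penalty)
    totals = {0: Fraction(1)}
    for _ in range(rolls):
        nxt = defaultdict(Fraction)
        for so_far, p in totals.items():
            for score, q in single.items():
                nxt[so_far + score] += p * q
        totals = dict(nxt)
    return totals


def prob_score_k_times(penalty, rolls, score, k):
    """Binomial: P(exactly k of the rolls produce `score`)."""
    p = roll_distribution(penalty).get(score, Fraction(0))
    return comb(rolls, k) * p**k * (1 - p) ** (rolls - k)


# Example #1: penalty 0, 36 rolls, a 7 exactly 6 times.
print(float(prob_score_k_times(0, 36, 7, 6)))
# Example #2: penalty 11, 40 rolls, cumulative total exactly 2.
print(float(total_distribution(11, 40)[2]))
# Example #3: penalty -5, 5 rolls, every total from 35 to 85.
for total, p in sorted(total_distribution(-5, 5).items()):
    print(total, float(p))
```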
Recommendations for great books about probability and risk. I'm looking for great pop-science books (well-written, well-referenced) about risk and probability as well as books relating to perception of risk and the associated cognitive biases. <br>
<br>
I'm also interested (it may seem tangential but it isn't to me) in books that explain mathematical modelling of real-world processes (e.g. Bayesian statistics, Game theory) in non-mathematical terms so that an intelligent non-mathematician could grasp them. <br>
<br>
If it helps, my motivation: I am a doctor, I spend much of my life talking to people about probability and risk. I wish to deepen my understanding and develop a more intuitive appreciation of it partially to satiate my own curiosity and partially so that I am better able to navigate patients' pre-existing cognitive biases.
What are the most mathematically 'advanced' RPG systems? Pen & paper and otherwise? There's been a lot of work done in the past 30 years on statistical modeling, as well as a lot of computing power available in mobile devices, but it seems to me, as someone who has been only casually following RPG development, that RPGs are still mostly relying on dice and cards and basically haven't gotten much more advanced than craps or poker.<br>
<br>
I'm curious whether I've missed some development that produces more realistic character generation and world modelling. I know that some gamer somewhere has to have done some work along these lines, even if the games failed.<br>
<br>
So what's out there? I'm mostly interested in developments in tabletop gaming, but even really modern RPGs seem like they're just using dice rolls in the background. Is that the case?
Probability Question
A quick probability question that I can't figure out. For traits A, B, X, Y: the probability of A and X is 0.9, the probability of A and Y is 0.1, the probability of B and X is 0.75, and the probability of B and Y is 0.25.<br>
<br>
If Y, what is the probability of B as opposed to A?
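Since the four numbers add up to 2, they can't all be joint probabilities; one consistent reading is P(X | A) = 0.9, P(Y | A) = 0.1, P(X | B) = 0.75 and P(Y | B) = 0.25, since each pair sums to 1. Under that reading, "given Y, B or A?" is Bayes' rule, but it needs one input the question doesn't state: how common B is relative to A before Y is observed. A minimal Python sketch that treats that prior as a parameter, assuming A and B are the only two possibilities; the 50/50 default is an assumption, not something from the question.

```python
def prob_b_given_y(p_y_given_a=0.1, p_y_given_b=0.25, prior_b=0.5):
    """Bayes' rule for two mutually exclusive, exhaustive traits A and B.

    prior_b is P(B) before seeing Y; P(A) = 1 - prior_b. The default of 0.5
    is an assumption, since the question does not give it.
    """
    prior_a = 1 - prior_b
    joint_b = p_y_given_b * prior_b  # P(B and Y)
    joint_a = p_y_given_a * prior_a  # P(A and Y)
    return joint_b / (joint_a + joint_b)


print(prob_b_given_y())  # equal priors: 0.125 / 0.175, about 0.714
print(prob_b_given_y(prior_b=0.2))  # if B were rarer beforehand
```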
How do I figure out what the size of this password's keyspace is? I currently work for a large company with an archaic IT infrastructure and am forced to change my password every 90 days. While I don't specifically think that such a policy is unwarranted, I am constantly annoyed by the arbitrary restrictions that are placed on the passwords that they will allow me to use. I'm currently estimating the total number of distinct passwords that are possible in this system to be in the realm of 160 trillion, an astonishingly small keyspace for a modern password.<br>
<br>
Help me figure out what the exact size of the keyspace is given the following requirements:<br>
<ul><br>
<li>Must be exactly 8 characters.</li><br>
<li>Must contain at least 1 uppercase character</li><br>
<li>Must contain at least 1 lowercase character</li><br>
<li>Must contain at least 1 number</li><br>
<li>Must contain a leading letter (upper or lower)</li><br>
<li>May contain up to 2 special characters ($ or # only)</li><br>
<li>May not have repeating characters</li><br>
</ul>
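One way to get an exact count is to build passwords one character at a time and add up the possibilities, tracking how many characters of each class have been used so far. A minimal Python sketch, assuming (1) the alphabet is 26 uppercase letters, 26 lowercase letters, 10 digits and the two specials $ and #, (2) "may not have repeating characters" means no character appears twice anywhere in the password, and (3) the list above is the complete set of rules. If the rule only forbids the same character twice in a row, the counting state would have to track the previous character's class instead.

```python
from functools import lru_cache

# Assumed character classes: (name, number of distinct characters).
CLASSES = (("upper", 26), ("lower", 26), ("digit", 10), ("special", 2))


def keyspace(length=8, max_special=2):
    """Count passwords meeting the assumed rules, with all characters distinct."""

    @lru_cache(maxsize=None)
    def count(pos, used):
        # used[i] = how many characters of class i have been placed so far.
        if pos == length:
            upper, lower, digit, _ = used
            return 1 if upper >= 1 and lower >= 1 and digit >= 1 else 0
        ways = 0
        for i, (name, size) in enumerate(CLASSES):
            if pos == 0 and name not in ("upper", "lower"):
                continue  # must start with a letter
            if name == "special" and used[i] >= max_special:
                continue  # at most max_special of $ and #
            remaining = size - used[i]  # unused characters in this class
            if remaining <= 0:
                continue
            nxt = list(used)
            nxt[i] += 1
            ways += remaining * count(pos + 1, tuple(nxt))
        return ways

    return count(0, (0, 0, 0, 0))


print(keyspace())
```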
Need help with a probability question
Can someone help with a probability question? At least I think that's what it is. I'm not a math person, so I may not even be asking the right question:<br>
<br>
16 contestants<br>
500,000 votes cast<br>
Two contestants tied for the last of the 8 semi-finalist spots, receiving exactly the same number of votes (so they went with 9 semi-finalists instead of the usual 8).<br>
<br>
What is the probability that there would be a tie?
Which probability distribution should I use to model examination results? At the moment I'm using the Beta distribution, but that's mainly because it looks right and is relatively easy to implement in Excel, which is what I use as a markbook. I don't think that the normal distribution is correct because that'd create a symmetric graph and the results are usually skewed toward one end of the scale, but I wonder about other distributions.<br>
<br>
I'm a scientist, so I can understand the maths, but I haven't done a lot of stats work and so I don't know which distribution is appropriate for which situation.
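If you stay with the Beta distribution, a simple way to choose its two parameters from a set of marks is the method of moments: scale the marks to the range 0 to 1, then pick α and β so that the Beta's mean and variance match the data's. A minimal Python sketch, assuming marks are out of a known maximum; the marks in the example are invented. A maximum-likelihood fit is generally more efficient, but this version is easy to reproduce in a spreadsheet.

```python
import statistics


def beta_method_of_moments(marks, max_mark):
    """Return (alpha, beta) whose mean and variance match the scaled marks."""
    x = [m / max_mark for m in marks]  # scale to 0..1
    mean = statistics.fmean(x)
    var = statistics.pvariance(x, mu=mean)
    if not 0 < var < mean * (1 - mean):
        raise ValueError("variance too large (or zero) for a Beta fit")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


# Invented example marks out of 50.
marks = [31, 38, 42, 45, 27, 40, 44, 36, 47, 39]
print(beta_method_of_moments(marks, max_mark=50))
```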
Has there ever been any research done on whether there is any correlation between spurts of adding contacts to LinkedIn or LinkedIn activity and someone changing jobs? I've noticed among people I know that there seems to be a link between an increase in LinkedIn activity and changing jobs. I was idly wondering if anybody had ever done research that showed any statistical relationship between the two.
Probability filter: after eating the TD turkey we play the Turkey game, which consists of tossing six dice. The six faces of each die are carved with the letters that spell TURKEY. Each combination of letters earns a different score (for example, T U is 5 points, 3 Ts wipe out all the points earned, etc.), with the first TURKEY being the winner. How many tosses would you need to spell TURKEY? You would not believe how many different answers a bunch of supposedly intelligent people gave, from 6! to 6 to the 6th power. Which answer is right? Explain to me how and why, please.
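Assuming all six dice are re-tossed every time and TURKEY counts only when all six letters show in a single toss (in any order), both numbers that came up appear, but as a ratio: 6! = 720 of the 6^6 = 46,656 equally likely outcomes show each letter exactly once. So each toss succeeds with probability 720/46,656 = 5/324 (about 1.5%), and the number of tosses needed follows a geometric distribution with mean 324/5 = 64.8; no finite number of tosses guarantees a TURKEY. A minimal Python sketch (the 50% threshold is just an example):

```python
from fractions import Fraction
from math import factorial

OUTCOMES = 6**6  # 46,656 equally likely results for six dice
SPELL_TURKEY = factorial(6)  # 720 ways to show T, U, R, K, E, Y once each

p = Fraction(SPELL_TURKEY, OUTCOMES)  # chance a single toss spells TURKEY
expected_tosses = 1 / p  # mean of a geometric distribution


def tosses_for_chance(target):
    """Smallest number of tosses giving at least `target` chance of a TURKEY."""
    n = 1
    while 1 - (1 - p) ** n < target:
        n += 1
    return n


print(p, float(p))  # 5/324, about 0.0154
print(float(expected_tosses))  # 64.8
print(tosses_for_chance(Fraction(1, 2)))  # 45
```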
How does one guess sports betting odds, or determine at what point to place a bet on a sporting event? I'm a huge fan of Bill Simmons on ESPN. Every NFL season, he and a friend named Cousin Sal will do a weekly podcast where they guess the lines on NFL games. I'm mystified as to how they make educated guesses on what the lines will be. <br>
<br>
What sorts of factors would go into consideration in guessing the odds for sporting events, and is there a model that can predict when a person should or should not bet on a sporting event?<br>
<br>
BONUS QUESTION: Can someone explain in a simple fashion how to bet using a money line?<br>
<br>
Thanks!
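For the bonus question, the usual American money-line convention is that a positive line such as +150 is the profit on a 100 stake, and a negative line such as -150 is the stake needed to win 100. Dividing the stake by the total return (stake plus profit) gives the break-even win probability implied by the line, which is one way to compare a line with your own estimate of a team's chances. A minimal Python sketch (function names are hypothetical); real sportsbooks build a margin into their lines, so the implied probabilities for both sides of a game add up to more than 1.

```python
def profit(stake, money_line):
    """Profit on a winning bet at an American money line (stake is returned too)."""
    if money_line > 0:
        return stake * money_line / 100  # +150: win 150 per 100 staked
    return stake * 100 / -money_line  # -150: win 100 per 150 staked


def implied_probability(money_line):
    """Break-even win probability: stake / (stake + profit)."""
    if money_line > 0:
        return 100 / (money_line + 100)
    return -money_line / (-money_line + 100)


print(profit(100, 150), implied_probability(150))  # 150.0 0.4
print(profit(150, -150), implied_probability(-150))  # 100.0 0.6
```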
Stats-filter: Given a binary matrix, if I know the total number of ones in a given row and a given column, can I calculate the probability that a given position contains a one? I have a binary matrix, like so, where every value is either 1 or 0. So, if the first column contains 2 ones, and the first row contains 1 one, what's the probability that position A contains a one?<br>
<br>
Example:<br>
<code><br>
2<br>
________<br>
1 | A| | |<br>
|__|__|__|<br>
| | | |<br>
|__|__|__|<br>
| | | |<br>
|__|__|__|<br>
| | | |<br>
|__|__|__|<br>
</code><br>
<br>
(not homework-filter)
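The answer depends on what else is assumed about the matrix. Under one simple assumption, that every 0/1 matrix with that row total and that column total is equally likely and nothing else is known, it becomes a counting problem: if A is 1, the rest of the row holds r - 1 ones and the rest of the column c - 1; if A is 0, they hold r and c. All other cells are unconstrained and cancel out. A minimal Python sketch (the function name is hypothetical) for the example as drawn, 4 rows by 3 columns, with a row total of 1 and a column total of 2:

```python
from fractions import Fraction
from math import comb


def prob_cell_is_one(n_rows, n_cols, row_sum, col_sum):
    """P(shared cell = 1), assuming all matrices with these two sums are equally likely."""
    other_row = n_cols - 1  # cells in the row, excluding A
    other_col = n_rows - 1  # cells in the column, excluding A
    if row_sum == 0 or col_sum == 0:
        return Fraction(0)
    ways_one = comb(other_row, row_sum - 1) * comb(other_col, col_sum - 1)
    ways_zero = comb(other_row, row_sum) * comb(other_col, col_sum)
    if ways_one + ways_zero == 0:
        raise ValueError("these sums are impossible for this matrix size")
    return Fraction(ways_one, ways_one + ways_zero)


print(prob_cell_is_one(n_rows=4, n_cols=3, row_sum=1, col_sum=2))  # 1/3
```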
Actuarial / statistics geeks: a puzzle for you. You are on a two month training program with a group of Korean teachers of English.<br>
<br>
There are 20 teachers, ranging in age from their late 20s to early 50s.<br>
<br>
How many of their grandmothers are likely to die during the course?<br>
<br>
Consider the current months (October-November) as the timeframe. Assume the grandmothers are all residents of Daegu, Korea.<br>
<br>
I probably cannot provide any more specifics than those, but if you want clarification I will try.
How do I get a controlled distribution of random numbers to fairly determine a start position? In a sporting event, start position is decided based on the last digit of your registration number. Each week, random numbers are drawn to decide the start order. For example, the random draw order for a single week is 4, 0, 3, 5, 1, 8, 7, 9, 6, 2. So everyone with a number ending in 4 starts first. Everyone with a number ending in 0 starts second. And so on. The next week the draw order is again random.<br>
<br>
While this works, the distribution of numbers can end up being unfair (one particular number can be "lucky" or "unlucky" for many weeks). Statistically, how would one generate a set of "random" start orders so that the value of each registration number was roughly equal over the course of a season (for ease of calculation, let's say 10 weeks)?<br>
<br>
I don't know anything about math or statistics, so my description of this situation probably uses lots of words incorrectly. I'd google this, but I don't even know how to start.<br>
<br>
Basically, is it possible for the value of all of the numbers to even out, but in a random order? For example, so that 0s aren't always going after 4s, and one group doesn't always start in the middle.
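What this describes is close to what experimental design calls a balanced Latin square (a Williams design): a 10 × 10 schedule in which each digit starts in each position exactly once over 10 weeks, and each digit immediately follows every other digit exactly once, so 0s are not stuck behind 4s. A minimal Python sketch: it builds the standard cyclic Williams square for an even number of groups, then randomizes it by relabeling the digits and shuffling the order of the weeks, which keeps both balance properties. The `seed` parameter is only there to make a schedule reproducible.

```python
import random


def williams_square(n):
    """Cyclic Williams square: each row is a start order for one week.

    Needs even n. Row 0 is 0, 1, n-1, 2, n-2, ...; later rows add the week
    number mod n, so every symbol hits every position once and every ordered
    pair of symbols is adjacent exactly once across all rows.
    """
    if n % 2:
        raise ValueError("a single Williams square needs an even n")
    first = [(j + 1) // 2 if j % 2 else (n - j // 2) % n for j in range(n)]
    return [[(x + week) % n for x in first] for week in range(n)]


def season_schedule(n_digits=10, seed=None):
    rng = random.Random(seed)
    square = williams_square(n_digits)
    labels = list(range(n_digits))
    rng.shuffle(labels)  # random relabeling of which digit is which symbol
    rng.shuffle(square)  # random order of the weeks
    return [[labels[x] for x in week] for week in square]


for week, order in enumerate(season_schedule(), start=1):
    print(f"Week {week}: {order}")
```

The balance holds over the full 10-week season; any shorter stretch of weeks is only partially balanced.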
I have a bucket containing <i>N</i> marbles: <i>M</i> white marbles and <i>N-M</i> black marbles. I need to grab a handful of marbles (<i>n</i>) and figure out the probability of having picked up <i>m</i> white marbles. At first, I thought I could use the <a href="http://en.wikipedia.org/wiki/Hypergeometric_distribution">hypergeometric distribution</a>. But there's a complication, namely that the white marbles are not equally distributed in my bucket.<br>
<br>
In other words, if my handful of marbles contains one white marble, I'm more likely to have picked up one or more additional white marbles in my hand, and this probability is different depending on how many white marbles I may have picked up.<br>
<br>
Is there a good approach to modeling or simulating this kind of situation?
It's late, I'm tired, and I have a probability and coding question that's fairly simple. Say I have 5 buckets (A,B,C,D,E), with different colored balls in them. The probability of removing a red ball is different for each bucket:<br>
<br>
A: 1/3<br>
B: 1/4<br>
C: 1/16<br>
D: 1/6<br>
E: 1/9<br>
<br>
If I draw 1 ball from each bucket, what is the probability (P) of drawing at least N red balls? <br>
<br>
Doing it for small numbers of buckets is easy enough, but I have thousands of buckets here. Given that I know the probabilities for each bucket, what's the easiest way to calculate P? I suspect that there's an R function that makes this a breeze, but I'm having trouble tracking it down.<br>
<br>
I can do basic operations in R, and I'm also open to functions or pseudo code from your favorite language. Ruby and Perl preferred, but I can use others if they get the job done.<br>
<br>
For good measure, here's a preemptive "This isn't homework-filter"
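When each bucket has its own red-ball probability, the number of reds follows what is called the Poisson binomial distribution. A simple dynamic program handles it: keep the probabilities of having exactly 0, 1, 2, ... reds so far, and fold in one bucket at a time. A minimal Python sketch (function names are hypothetical); it takes on the order of (number of buckets)² steps, which is still manageable in plain Python for a few thousand buckets.

```python
def red_count_distribution(probs):
    """pmf[k] = P(exactly k red balls) when bucket i gives red with probs[i]."""
    pmf = [1.0]  # before any draws: 0 reds with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            nxt[k] += q * (1 - p)  # this bucket gives a non-red ball
            nxt[k + 1] += q * p  # this bucket gives a red ball
        pmf = nxt
    return pmf


def prob_at_least(probs, n):
    """P(at least n red balls)."""
    return sum(red_count_distribution(probs)[n:])


buckets = [1 / 3, 1 / 4, 1 / 16, 1 / 6, 1 / 9]  # A..E from the question
for n in range(len(buckets) + 1):
    print(n, prob_at_least(buckets, n))
```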
How many times would I have to roll a standard 6-sided die to get a statistically representative view of whether it was truly random or not? I have a bunch of dice that I haven't used in years. The other day, I was playing with one, and I noticed that the 5 came up fairly often. I started rolling the die and writing down the results, to see if it was just a short term statistical fluke, or observer bias, or if the die really favored the 5.<br>
<br>
I think I ended up rolling it around 200 times, and 5 definitely had a significant edge.<br>
<br>
Now, I'd like to check my other dice. However, I'm not a statistician. I understand that, obviously, the larger your data set (the more times you roll each die and write down the results), the better your analysis of non-randomness will be. However, I know that it isn't necessary to roll the die 1 billion times to check for randomness, that there is some generally accepted statistical minimum, below which the margin of error is too large, and above which the margin of error is generally considered acceptable.<br>
<br>
How many die rolls is that point?
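How many rolls you need depends on how large a bias you want to be able to detect. A common tool is Pearson's chi-squared goodness-of-fit test, which compares the count for each face with the n/6 expected from a fair die; with 6 faces (5 degrees of freedom), a statistic above about 11.07 is the conventional 5%-level red flag. For planning, a standard normal-approximation formula gives roughly how many rolls are needed to detect one face coming up with probability p1 instead of 1/6. A minimal Python sketch; the 5% significance and 80% power defaults are common conventions rather than requirements, and the tallies in the example are invented.

```python
import math
from statistics import NormalDist


def chi_squared(counts):
    """Pearson statistic for a die: sum of (observed - expected)^2 / expected."""
    n = sum(counts)
    expected = n / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def rolls_needed(p1, p0=1 / 6, alpha=0.05, power=0.80):
    """Approximate rolls to detect one face at p1 instead of p0 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil((numerator / (p1 - p0)) ** 2)


CRITICAL_5_PERCENT_DF5 = 11.07  # chi-squared, 5 degrees of freedom, 95th percentile

counts = [30, 33, 31, 29, 47, 30]  # invented tallies for faces 1-6 (200 rolls)
stat = chi_squared(counts)
print(stat, "suspicious" if stat > CRITICAL_5_PERCENT_DF5 else "consistent with fair")

for p1 in (0.25, 0.20, 0.18):
    print(p1, rolls_needed(p1))
```

The smaller the bias, the faster the required number of rolls grows: detecting a face at 20% instead of 16.7% already takes on the order of a thousand rolls under these defaults.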
Suppose you take a test for a rare type of cancer that affects 0.01 percent of the population. The test is 98 percent reliable. You get a positive reading. What are the chances you have the cancer? I read this probability puzzle today and the writer said the statistical chances of you having the cancer in this scenario are less than half a percent. I don't get it. Isn't the rarity factor irrelevant compared with the test reliability? Please explain.
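The rarity matters because positive results come from two groups: the very few people who have the cancer, and the false positives among the huge number who don't. A minimal Python sketch of Bayes' rule, assuming "98 percent reliable" means the test is right 98% of the time for both groups (sensitivity and specificity both 0.98), which the puzzle doesn't actually specify.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """Bayes' rule: P(disease | positive test)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# 0.01 percent prevalence; "98 percent reliable" read as 98% sensitivity
# and 98% specificity (an assumption).
p = prob_disease_given_positive(prevalence=0.0001, sensitivity=0.98, specificity=0.98)
print(f"{p:.4%}")  # about 0.49%, i.e. under half a percent
```

In counts: out of 1,000,000 people, about 100 have the cancer and 98 of them test positive, while 2% of the 999,900 healthy people (19,998) also test positive, so only about 98 of roughly 20,096 positives are real.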