Ask MetaFilter questions tagged with statistics and probability
http://ask.metafilter.com/tags/statistics+probability
Questions tagged with 'statistics' and 'probability' at Ask MetaFilter. Wed, 21 Sep 2016 08:49:13 -0800
Too Many Strata To Sample
http://ask.metafilter.com/300748/Too%2DMany%2DStrata%2DTo%2DSample
I have multiple independent variables that I want to stratify my sampling across, but the result would be way more strata than I can sample. Is there a way to stratify across different character states for each independent variable without creating strata for the interactions of each? Example inside. In my example I have independent variables Color (red, orange, yellow, green, blue, purple), Shape (square, round, sinusoidal), Size (small, medium, large), and Flavor (apple, banana, grape). If I tried to stratify my sampling across each of these I would get 6 x 3 x 3 x 3 = 162 strata. My population is only about 1,000 individuals though! I can’t pull 30 samples each from 162 strata. In fact, there may not even <em>be</em> any large square blue grapes (for example).<br>
<br>
I don’t really care about the effect of the possible interaction between the terms though. Can I design a sampling regime that would make sure I get a randomly selected 30 of each color, 30 of each shape, 30 of each size, and 30 of each flavor, but not try to get 30 of each combination of color, shape, size, and flavor?<br>
<br>
Why is this a bad idea? Or if it’s not, is there an established way of designing such a sampling regime?tag:ask.metafilter.com,2016:site.300748Wed, 21 Sep 2016 08:49:13 -0800agentofselectionStatistical framing of the engineering and extremism article
http://ask.metafilter.com/293713/Statistical%2Dframing%2Dof%2Dthe%2Dengineering%2Dand%2Dextremism%2Darticle
Reading <a href="http://www.metafilter.com/158088/Does-Engineering-Education-Breed-Terrorists">this </a> article on the blue got me thinking about conditional probabilities, prediction and causality. I came up with an analytical framing of what I think the article is saying and would be grateful if stats/social science Mefites could tell me if it seems accurate or else set me right. Reading this article on the blue got me thinking about conditional probability in a simple discrete 2x2 case. <br>
<br>
Suppose there are two discrete random variables in a population of individuals, A and B. According to conditional probability, P(A,B)=P(A|B).P(B)=P(B|A).P(A). <br>
<br>
If A and B are statistically independent then P(A|B)=P(A) and P(B|A)=P(B). <br>
<br>
Suppose A and B are not independent. Suppose P(A|B)=k.P(A), k>1. But since<br>
P(A,B)=P(A|B).P(B)=P(B|A).P(A) then this implies that k.P(A).P(B) =P(B|A).P(A). Cancelling the P(A)s gives P(B|A)=k.P(B) with k>1. <br>
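The symmetry derived above can be checked numerically on any 2x2 table of counts. The following is a minimal sketch in Python; the counts and the function name `lift_both_ways` are invented for illustration and do not come from the article.

```python
from fractions import Fraction

def lift_both_ways(n_ab, n_a_only, n_b_only, n_neither):
    """Return (P(A|B)/P(A), P(B|A)/P(B)) for a 2x2 table of counts."""
    total = n_ab + n_a_only + n_b_only + n_neither
    p_a = Fraction(n_ab + n_a_only, total)
    p_b = Fraction(n_ab + n_b_only, total)
    p_a_given_b = Fraction(n_ab, n_ab + n_b_only)
    p_b_given_a = Fraction(n_ab, n_ab + n_a_only)
    return p_a_given_b / p_a, p_b_given_a / p_b

# Made-up counts: 30 in both A and B, 70 in A only, 20 in B only, 880 in neither.
k_from_b, k_from_a = lift_both_ways(30, 70, 20, 880)
print(k_from_b, k_from_a)  # both ratios are the same factor k
```

Whatever counts are used (as long as no cell total is zero), the two ratios agree, because both equal P(A,B) / (P(A).P(B)).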
<br>
So I was thinking about this <a href="http://www.metafilter.com/158088/Does-Engineering-Education-Breed-Terrorists">article</a> about engineers and extremists. <br>
I tried to put the article in the framing above. The way I see it, it could be framed for statistical purposes that the world’s population can be partitioned separately by two random variables. Engineer or not-engineer, and extremist or not extremist.<br>
<br>
The article notes evidence that suggests that P(engineer|extremist)>P(engineer). I.e. engineers are more prevalent among extremists than they are among the general population. The article then considers explanations of the fact.<br>
<br>
However, as far as I can see the algebra above suggests that if the above is true, it IMPLIES that<br>
P(extremist|engineer)>P(extremist), just by the way the 2x2 discrete partitioning and conditional probability works. <br>
<br>
I find this a little shocking. As I was reading the article I was sort of turning my nose up at some of the explanations, and the title, which to me sounded a bit like evidence for P(engineer|extremist)>P(engineer), rather than P(extremist|engineer)>P(extremist). Before I went through the algebra, I assumed it would be possible that P(engineer|extremist)>P(engineer) could be consistent with P(extremist|engineer)=P(extremist), but from the algebra above it appears that this is not the case. Possibly in my original thoughts I muddled prediction and causality.<br>
<br>
What I would like to know is what implications does the 2x2 discrete partition case have for the example, if it turned out to indeed be the case that P(engineer|extremist)>P(engineer). Does it mean, for example, that as an estimator, in this case rather than looking at a sample of extremists and counting the proportion of engineers among them, we could in principle look at a sample of engineers and count the number of extremists? [Aside from the practical problem that we would need to sample a huge number of engineers to sample any extremists at all.] <br>
<br>
Please note: I am aware that statistical dependence is not the same as causality, and that there is a separate “causal” calculus in the statistics/probability literature by Judea Pearl and others, which among other things respects the fact that cause can be unidirectional whereas statistical information flows both ways (i.e. we can predict and retrodict things which both may or may not be causally related). I am aware that there is a special P(A|do(B=b)) notation to denote causality. (I have been reading part IV of Cosma Shalizi’s <a href="http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/ADAfaEPoV.pdf">book</a> here.) As I understand it, the special causal “do” refers to a manipulated distribution. Crucially, the manipulated distribution may not be identifiable from observations which you cannot control by experiment. Also note I am interested in the statistical/social science framing of the debate here, not trying to make some kind of oblique point about extremism.<br>
<br>
Secondly, I would like to know whether this is a reasonable statistical/social science framing of the article:<br>
<br>
<em>There is evidence P(engineer|extremist)>P(engineer). I.e. there is a selection effect that causes there to be a higher proportion of engineers among extremists than there are engineers among the population as a whole. Could the reason for this be that engineers are more likely to be extremists, i.e. there is some statistical causality running from engineers to extremism, because of certain psychological traits of engineers?</em> <br>
<br>
This is where the psychological explanations are brought in. (In causality notation I think exploring this psychological argument would look like an enquiry as to whether P(extremist|do(Engineer))>P(extremist|do(Not engineer)).) Others in comments note that when it comes to specifically terrorist extremists, the selection effect possibly explains itself, since presumably engineers are a large part of the few with the skills to carry out attacks. So inference to the simplest explanation would suggest that the causal explanation that flips the conditioning for the original observed selection effect and imposes a causal “do” is not required. <br>
<br>
Do you think I am on the right lines with this analytical framing? I’m sure you will let me know if I’m far off base here. Many thanks.tag:ask.metafilter.com,2016:site.293713Fri, 25 Mar 2016 07:01:10 -0800mister_kaupungisterCalculating the probability of a scenario that might happen by chance
http://ask.metafilter.com/279849/Calculating%2Dthe%2Dprobability%2Dof%2Da%2Dscenario%2Dthat%2Dmight%2Dhappen%2Dby%2Dchance
I have to validate some input for a database, and I want to present the user with a mathematically accurate estimate of the percentage of data that they entered which may be invalid. The problem is that valid data looks like invalid data 10% of the time. I'm trying to validate UPCs. The 12-digit UPCs you see on a normal barcode always end with a check digit, which is obtained using a simple formula with the other 11 digits as input. These UPCs also always start with one or two zeroes, which will most likely be stripped if they've ever seen Excel at any point (so I can't rely on length to determine if a check digit is present). I will have to prepend zeroes to the input so that each UPC is 11 digits. I need to store the UPCs without their check digits.<br>
<br>
The question is, what is the probability that the user has given me UPCs with check digits? I can calculate what the check digits <em>would have been</em> and compare to what the inputted UPCs actually end with, but that has a 10% chance of matching just by coincidence, since there are 10 possible check digits which are distributed evenly.<br>
<br>
Let's say I receive this as a batch of input data from a single user.<br>
<pre><br>
100 total UPCs.<br>
I calculate that 30 have valid check digits.<br>
</pre><br>
<br>
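For the batch above (100 UPCs, 30 matches), one rough way to back out the number of entries that really carry a check digit is a method-of-moments estimate. The sketch below assumes that every entry with a genuine check digit matches and that every entry without one matches by coincidence 10% of the time; the function names are made up for illustration. It also includes the standard UPC-A check digit rule (three times the digits in odd positions plus the digits in even positions).

```python
def upc_check_digit(first11: str) -> int:
    """Standard UPC-A check digit for an 11-digit string."""
    digits = [int(ch) for ch in first11]
    odd_sum = sum(digits[0::2])   # 1st, 3rd, ..., 11th digits
    even_sum = sum(digits[1::2])  # 2nd, 4th, ..., 10th digits
    return (10 - (3 * odd_sum + even_sum) % 10) % 10

def estimate_with_check_digits(total: int, matches: int, chance: float = 0.1) -> float:
    """Method-of-moments estimate of how many entries carry a check digit.

    Assumes every entry that really has a check digit matches, and every
    entry without one matches by coincidence with probability `chance`.
    Solves matches = n + chance * (total - n) for n, clamped to [0, total].
    """
    estimate = (matches - chance * total) / (1 - chance)
    return min(max(estimate, 0.0), float(total))

print(upc_check_digit("03600029145"))
print(round(estimate_with_check_digits(100, 30), 1))
```

Under those assumptions, 30 matches out of 100 points to roughly 22 entries with check digits, not 30. This is a point estimate only; a small batch can deviate from it noticeably.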
Statistically, what is the most likely number of the 100 UPCs actually having check digits? I know that some of them must, since if all 100 UPCs were valid, I should only see around 10 with matching check digits. I don't want to confuse people by showing them which UPCs' check digits are matching (since it's hard to explain the concept of 10% might match by coincidence without the multiple paragraph explanation that I just gave) - I just want to say, "Please check your input. It looks like x out of 100 UPCs have check digits."tag:ask.metafilter.com,2015:site.279849Fri, 08 May 2015 10:37:57 -0800zixyerProbability distributions and their practical applications
http://ask.metafilter.com/276170/Probability%2Ddistributions%2Dand%2Dtheir%2Dpractical%2Dapplications
I am looking for a resource that lists probability distributions and their common real-world applications. For example, I'd expect to see: Lognormal - daily returns in the stock market. Poisson - failure rates for mechanical equipment, ... The closest I can find is this wiki entry. http://en.wikipedia.org/wiki/List_of_probability_distributions<br>
<br>
Ideally this would be a handbook which includes applications in as many fields as possible (biology, actuarial, physics, etc.) Does this resource exist?tag:ask.metafilter.com,2015:site.276170Thu, 19 Feb 2015 13:19:17 -0800wivyBest or most appropriate Statistic method to Use.
http://ask.metafilter.com/275045/Best%2Dor%2Dmost%2Dappropriate%2DStatistic%2Dmethod%2Dto%2DUse
I have a statistics and/or probability question and the last time I took a statistics class Vanilla Ice and Andrew "Dice" Clay were multi-millionaires.
I am not looking for a problem to be solved, I am asking what statistical technique should I use to determine if a time series of data is due to randomness or not. For example: <br>
<br>
Let’s say I have a list of 10 College Football Teams (names, mostly made up.) For each team, and for each year over the past five years, I have the total number of points scored in the season. So the raw data looks something like this. Data is arranged alphabetically by team name.<br>
<br>
Bears – Year 1 points scored = 677; year 2 = 654; year 3 = 691; year 4 = 688, year 5 = 692.<br>
Cats – Year 1 points scored = 692; year 2 = 643; year 3 = 656; year 4 = 650; year 5 = 661.<br>
Ducks – Year 1 points scored = 688; year 2 = 692; year 3 = 678; year 4 = 684; year 5 = 696.<br>
<br>
The remaining 7 example teams would have data arranged in a similar manner. <br>
<br>
Falcons, Gorillas, Killers, Mother Boys, Orange Crush, Peacocks, Vandals.<br>
------------------<br>
To repeat and rephrase my question, what statistics or probability technique should be used so that I can say Team X’s performance is likely <strong>not</strong> due to randomness, and therefore likely due to something else?<br>
<br>
If you really enjoy this type of stuff and would like to go through the steps, that would be great, but there is no obligation.<br>
<br>
Thank youtag:ask.metafilter.com,2015:site.275045Tue, 27 Jan 2015 06:35:55 -0800otto42Statistics are hard.
http://ask.metafilter.com/274607/Statistics%2Dare%2Dhard
How do you calculate the probability of something when it's not as simple as "do it a bunch of times"? Specifics inside. I have a system. This system has one button, which always performs the same action. Occasionally, when you press the button, the system crashes and has to be reset.<br>
<br>
I've set up an automated button-pusher, but the reset is a manual process. So I end up with several pieces of data that look like this:<br>
- 31 button presses until it crashed<br>
- 42 button presses until it crashed<br>
- 27 button presses until it crashed<br>
<br>
It seems to me that the probability of the crash happening isn't as simple as 3% (3/(31+42+27)), because my sampling stops abruptly when the crash occurs.<br>
<br>
An analogy: You roll a die, and count the number of times it takes you to roll a 1. The odds of rolling a 1 are one in six, but I don't think you can expect an average of six attempts to roll a 1... Or can you?<br>
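The search term for this setup is the geometric distribution. A minimal sketch follows, assuming each press crashes independently with the same probability and that each recorded count includes the press that caused the crash; both are assumptions, and the function names are illustrative.

```python
from fractions import Fraction

def crash_probability_mle(presses_until_crash):
    """Maximum-likelihood estimate of the per-press crash probability
    under a geometric model: number of crashes / total presses."""
    return Fraction(len(presses_until_crash), sum(presses_until_crash))

def expected_presses(p):
    """Mean number of presses up to and including the first crash."""
    return 1 / p

p_hat = crash_probability_mle([31, 42, 27])
print(p_hat, float(p_hat))
print(expected_presses(Fraction(1, 6)))
```

Under this model the estimate is still 3 crashes in 100 presses, and a fair die does take six rolls on average to show a 1. If the counts exclude the crashing press, the denominator would instead be the total presses plus the number of crashes.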
<br>
Am I just overcomplicating this? I have no idea what sort of statistical language to even search for. All help appreciated, thanks!tag:ask.metafilter.com,2015:site.274607Sat, 17 Jan 2015 20:40:18 -0800DilligasFruit salad for statisticians
http://ask.metafilter.com/273946/Fruit%2Dsalad%2Dfor%2Dstatisticians
What formula do I need to determine the probability that a set of size N contains two elements, each appearing with a specific frequency? Fruit boxes, let me show you them. Let's assume I have 5 apples, 10 bananas, 8 pears, which I will randomly throw into 4 boxes of various sizes to make Christmas presents.<br>
<br>
Box 1 = can fit 3 pieces of fruit<br>
Box 2 = can fit 5 pieces of fruit<br>
Box 3 = can fit 10 pieces of fruit<br>
Box 4 = can fit 5 pieces of fruit<br>
<br>
So 23 pieces of fruit in total and 23 slots in boxes. Now, for each Box, what is the probability that it will contain at least one apple and one pear, given their overall frequency?<br>
<br>
So far I can tell this is going to involve combinations, which I know how to calculate, but it's the added frequency distribution of the fruit types that I am struggling with, and the fact that I don't need to know the number of possible combinations of size r (=box size), but how many of these combinations contain A(pple) and P(ear).<br>
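One way to count the combinations that contain both an apple and a pear is inclusion-exclusion: start from all ways to fill the box, subtract the ways with no apple and the ways with no pear, then add back the ways with neither. The sketch below assumes each box is a uniformly random subset of the 23 fruits; the function name is made up.

```python
from fractions import Fraction
from math import comb

def p_apple_and_pear(box_size, apples=5, pears=8, total=23):
    """P(a random box of `box_size` fruits has >=1 apple and >=1 pear)."""
    ways = comb(total, box_size)
    no_apple = comb(total - apples, box_size)
    no_pear = comb(total - pears, box_size)
    neither = comb(total - apples - pears, box_size)  # bananas only
    return Fraction(ways - no_apple - no_pear + neither, ways)

for size in (3, 5, 10, 5):
    p = p_apple_and_pear(size)
    print(size, p, round(float(p), 4))
```

These are per-box probabilities considered one box at a time. The boxes are not independent of each other, so the values cannot simply be multiplied to get the chance that all four boxes qualify.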
<br>
Y'all can have the biggest box if you help me solve this, or at least point me in the right direction!tag:ask.metafilter.com,2015:site.273946Sun, 04 Jan 2015 19:45:43 -0800Ender's FriendMath Problem
http://ask.metafilter.com/272475/Math%2DProblem
Please help with this probability related math problem. This should be simple for many people. It seems I've been out of a math class for too long. <br>
<br>
"Two teams of rock climbers are each setting up a system of anchors on a cliff. Team A's system consists of three anchor points, where the failure of any one of the three points results in the catastrophic failure of the entire system. Team B's system consists of five anchor points, where failure of any two of the five points results in the catastrophic failure of the entire system. Assuming all anchor points have an equal chance of failing during the duration of the exercise, which team's system is safer, and by how much?"<br>
<br>
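The anchor problem can be sketched as two binomial calculations. This assumes each anchor fails independently with the same probability p; the problem states equal chances, but independence is an added assumption, and the function names are illustrative.

```python
def team_a_failure(p):
    """Three anchors; any single failure is catastrophic."""
    return 1 - (1 - p) ** 3

def team_b_failure(p):
    """Five anchors; two or more failures are catastrophic."""
    survive = (1 - p) ** 5 + 5 * p * (1 - p) ** 4  # zero or exactly one failure
    return 1 - survive

for p in (0.01, 0.1, 0.5):
    print(p, round(team_a_failure(p), 5), round(team_b_failure(p), 5))
```

Under these assumptions, Team B's system is safer whenever p is below 0.75, since Team A survives with probability (1-p)^3 and Team B with probability (1-p)^4.(1+4p). The size of the advantage depends on p, so "by how much" has no single answer without a value for p.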
In case you are curious, the question is related to this: <a href="http://i.dailymail.co.uk/i/pix/2013/07/05/article-0-1AABFAE5000005DC-315_634x445.jpg">link</a>tag:ask.metafilter.com,2014:site.272475Thu, 04 Dec 2014 04:26:32 -0800BeaverTerrorHas a major lottery ever produced a result that doesn't look random?
http://ask.metafilter.com/271134/Has%2Da%2Dmajor%2Dlottery%2Dever%2Dproduced%2Da%2Dresult%2Dthat%2Ddoesnt%2Dlook%2Drandom
Every outcome in a fair lottery is equally probable, yet some results display obvious patterns and feel less likely to the statistically uninformed. Nobody would blink if a six-number lotto draw came up with (3,12,27,31,40,44), but a result of (1,2,3,4,5,6) would probably make the news. Has this ever happened in a major lottery? If yes, what was the public response?tag:ask.metafilter.com,2014:site.271134Thu, 06 Nov 2014 13:03:26 -0800Dr Dracator2 problems of combinations and permutations.
http://ask.metafilter.com/263055/2%2Dproblems%2Dof%2Dcombinations%2Dand%2Dpermutations
How many unique ways are there to put X rocks into Y boxes?
(Given two different sets of attributes for both the rocks and the boxes.) Apologies in advance for what is probably some pretty non-standard notation; I am a very amateur mathematician, and I've not studied set theory.<br>
<br>
I'm working on a table of formulas for figuring out various combinations/permutations. The table is missing two formulas, which I've not yet been able to figure out.<br>
<br>
There are a few ways to describe these sorts of problems, but I'm going to use the following metaphor for each of them: How many unique ways are there to put X rocks into Y boxes?<br>
<br>
(In the examples, the rocks are labelled X1, X2, etc., if they are distinct from each other, and simply X if they are not. Likewise for the boxes.)<br>
<br>
.....<br>
<br>
<strong>PROBLEM 1:</strong><br>
How many unique ways are there to put X rocks into Y boxes?<br>
·Every rock is unique. Every rock can be used up to one time.<br>
·The boxes are not unique. Every box can hold an unlimited number of rocks. The minimum number of rocks per box is zero.<br>
<br>
<strong>EXAMPLE 1-1:</strong><br>
If X=2 and Y=2, there are 5 unique combinations:<br>
01. Y() Y()<br>
02. Y(X1) Y() <br>
03. Y(X2) Y()<br>
04. Y(X1) Y(X2)<br>
05. Y(X1,X2) Y()<br>
<br>
<strong>EXAMPLE 1-2:</strong><br>
If X=3 and Y=3, there are 15 unique combinations:<br>
01. Y() Y() Y()<br>
02. Y(X1) Y() Y()<br>
03. Y(X2) Y() Y()<br>
04. Y(X3) Y() Y()<br>
05. Y(X1) Y(X2) Y()<br>
06. Y(X1) Y(X3) Y()<br>
07. Y(X2) Y(X3) Y()<br>
08. Y(X1) Y(X2) Y(X3)<br>
09. Y(X1,X2) Y() Y()<br>
10. Y(X1,X3) Y() Y()<br>
11. Y(X2,X3) Y() Y()<br>
12. Y(X1,X2) Y(X3) Y()<br>
13. Y(X1,X3) Y(X2) Y()<br>
14. Y(X2,X3) Y(X1) Y()<br>
15. Y(X1,X2,X3) Y() Y()<br>
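One possible formula for Problem 1 chooses which rocks are used, then splits the used rocks into at most Y non-empty groups, which is a sum of Stirling numbers of the second kind. This is a sketch of that reading, not necessarily the closed form the table is after; the function names are made up. It reproduces the two examples above.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def stirling2(n, k):
    """Ways to split n distinct rocks into exactly k non-empty unlabeled boxes."""
    if n == k:
        return 1
    if n == 0 or k == 0:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def problem1(rocks, boxes):
    """Distinct rocks (each optional), identical boxes, empty boxes allowed."""
    total = 0
    for used in range(rocks + 1):
        # Split the `used` rocks into 0..boxes non-empty groups.
        arrangements = sum(stirling2(used, k) for k in range(boxes + 1))
        total += comb(rocks, used) * arrangements
    return total

print(problem1(2, 2), problem1(3, 3))
```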
<br>
.....<br>
<br>
<strong>PROBLEM 2:</strong><br>
How many unique ways are there to put X rocks into Y boxes?<br>
·The rocks are not unique. Every rock must be used exactly once.<br>
·The boxes are not unique. Every box can hold an unlimited number of rocks. The minimum number of rocks per box is one.<br>
<br>
<strong>EXAMPLE 2-1:</strong><br>
If X=3 and Y=3, there is 1 possible combination:<br>
01. Y(X) Y(X) Y(X)<br>
<br>
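Problem 2 as stated matches what is usually called the number of partitions of an integer X into exactly Y positive parts. A minimal recursive sketch, with an illustrative function name, is below; it returns 1 for the X=3, Y=3 case above and can be run for other sizes such as 8 rocks in 3 boxes.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def partitions(rocks, boxes):
    """Ways to write `rocks` as a sum of exactly `boxes` positive parts."""
    if rocks == 0 and boxes == 0:
        return 1
    if rocks <= 0 or boxes <= 0:
        return 0
    # Either some box holds exactly one rock, or every box holds at least two.
    return partitions(rocks - 1, boxes - 1) + partitions(rocks - boxes, boxes)

print(partitions(3, 3), partitions(8, 3))
```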
<strong>EXAMPLE 2-2:</strong><br>
If X=8 and Y=3, there are 5 possible combinations:<br>
01. Y(XXX) Y(XXX) Y(XX)<br>
02. Y(XXXX) Y(XX) Y(XX)<br>
03. Y(XXXX) Y(XXX) Y(X)<br>
04. Y(XXXXX) Y(XX) Y(X)<br>
05. Y(XXXXXX) Y(X) Y(X)tag:ask.metafilter.com,2014:site.263055Wed, 04 Jun 2014 18:49:18 -0800CustooFintelChance of event with small sample size, based on larger related sample?
http://ask.metafilter.com/258957/Chance%2Dof%2Devent%2Dwith%2Dsmall%2Dsample%2Dsize%2Dbased%2Don%2Dlarger%2Drelated%2Dsample
Can/how can one improve the estimate for a chance of an event with a small historical sample size by utilizing the chance of a related event with a large historical sample size? Example and half-assed guess inside. A baseball-related example (for the purposes of this question, please forget about complicating factors like lefty/righty splits, home/away splits, the fact that a particular player might be better or worse now than he was in the past, etc.):<br>
<br>
Joe has had 1000 at bats. He has gotten a hit in 270 of those 1000 at bats.<br>
<br>
Of those 1000 at bats, 10 were against the pitcher Fred. In those 10 at bats against Fred, Joe got six hits.<br>
<br>
Clearly we can say "Joe is a .600 hitter against Fred". But also clearly, that doesn't really have any meaningful predictive power for Joe's future at bats against Fred. If we want to guess what the chance of Joe getting a hit off of Fred is, 27% is almost certainly a much better guess than 60%.<br>
<br>
But can we use <i>both</i> pieces of information to get a guess that's better than "27%"?<br>
<br>
I have a half-assed guess, which I'll describe momentarily, but it occurs to me that this is probably a problem which has been thought about rigorously by mathematicians. So does anyone know if there's a "real" answer to this problem?<br>
<br>
My half-assed guess is something along these lines:<br>
<br>
Joe has 10 at bats against Fred, and 6 hits in them. But Joe has 1000 at bats total (with 270 hits). Let's assume that if Joe had had 1000 at bats against Fred, 10 of them would have gone as they did, and the other 990 would have been as if against an average pitcher. So Joe would have gotten:<br>
<br>
6 + 990 * 270 / 1000<br>
<br>
= 6 + 267.3<br>
<br>
= 273.3<br>
<br>
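The arithmetic above is a special case of a standard idea, often called shrinkage or a beta-binomial estimate: treat the overall average as a prior worth some number of pseudo at bats, then add the observed matchup data. The sketch below is illustrative only. The prior weight is a modeling choice that the data above does not supply, and the value 100 is arbitrary.

```python
def shrunk_estimate(hits, at_bats, prior_mean, prior_weight):
    """Posterior mean of a beta-binomial model.

    `prior_weight` is a pseudo-count of at bats at `prior_mean`;
    it is a modeling choice, not something the data above supplies.
    """
    return (prior_mean * prior_weight + hits) / (prior_weight + at_bats)

# Weight 990 reproduces the arithmetic above; 100 is an arbitrary alternative.
print(round(shrunk_estimate(6, 10, 0.27, 990), 4))
print(round(shrunk_estimate(6, 10, 0.27, 100), 4))
```

A smaller prior weight lets the ten at bats against Fred move the estimate further from .270. Choosing that weight sensibly usually means estimating how much hitters really vary from pitcher to pitcher.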
So we guess that in his upcoming at bat against Fred, Joe has a 27.33% chance of getting a hit.tag:ask.metafilter.com,2014:site.258957Thu, 20 Mar 2014 07:20:36 -0800FlunkieI'm having trouble understanding likelihood ratios and diagnostic tests.
http://ask.metafilter.com/245816/Im%2Dhaving%2Dtrouble%2Dunderstanding%2Dlikelihood%2Dratios%2Dand%2Ddiagnostic%2Dtests
I'm struggling to understand likelihood ratios (LR) in the context of diagnostic tests, and why a positive LR is influenced by the sensitivity of the test. I know that:<br>
<ol>
<li>It's a tool to get from a pre-test probability to a post-test probability.</li>
<li>It is defined as (the percentage of people with the disease who test positive) divided by (the percentage of people without the disease who test positive).</li>
<li>Or, alternately, a positive LR is equal to sensitivity / (1 - specificity).</li>
</ol><br>
What I don't understand is <strong>why is a positive LR dependent on the sensitivity of the test?</strong><br>
<br>
For example, let's say I want to diagnose someone with Hairy Face Syndrome, and my diagnostic test is that the clouds open up, and God comes down from the heavens and tells me "this man has hairy face syndrome!"<br>
<br>
This is, understandably, a very *specific* test, but very poorly sensitive. <br>
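To put numbers on this, the sketch below computes a positive LR and the resulting post-test probability. The sensitivity, specificity, and 5% pre-test probability are invented purely for illustration, and the function names are made up.

```python
def positive_lr(sensitivity, specificity):
    """LR+ = P(test positive | disease) / P(test positive | no disease)."""
    return sensitivity / (1 - specificity)

def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert to odds, multiply by the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Invented numbers: the sign appears for 1 in 1,000 true cases
# and for 1 in 1,000,000 people without the syndrome.
lr = positive_lr(sensitivity=0.001, specificity=0.999999)
print(round(lr), round(post_test_probability(0.05, lr), 3))
```

What matters for LR+ is the ratio of the two positive rates. A very insensitive test can still have a large LR+ as long as false positives are rarer still, so low sensitivity only hurts the interpretation of a positive result if it is not matched by even lower false-positive rates.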
<br>
The way I've always understood likelihood ratios is that the positive likelihood ratio helps you interpret what to do with a positive result, not how useful searching for a positive result is likely to be.<br>
<br>
Therefore, why does the low sensitivity of waiting for an act-of-god negatively impact the likelihood ratio and, consequently, your interpretation of it when it does, miraculously, happen?tag:ask.metafilter.com,2013:site.245816Thu, 01 Aug 2013 08:48:46 -0800cacofonieIs there a name for this logical fallacy? It has to do with statistics.
http://ask.metafilter.com/237042/Is%2Dthere%2Da%2Dname%2Dfor%2Dthis%2Dlogical%2Dfallacy%2DIt%2Dhas%2Dto%2Ddo%2Dwith%2Dstatistics
The fallacy is assuming that statistical information about a thing is more relevant in dealing with a particular instance of that thing than available first-hand data. The basic form of the fallacy is this:<br>
<br>
<strong>Premise:</strong> I am in situation X.<br>
<strong>Premise:</strong> Statistics show that X most commonly follows the pattern of situation Y.<br>
<strong>Premise:</strong> I should base my actions on the available statistics.<br>
<strong>Conclusion:</strong> I will react to situation X according to my rules for reacting to situation Y.<br>
<br>
(Forgive me if I put that poorly; my formal logic is rusty.)<br>
<br>
<strong>An example:</strong><br>
Jack is walking in the park when he sees two people walking dogs. One is walking a golden retriever, and the other is walking a pit bull. The golden retriever is straining at its owner's leash, barking at everyone who passes, growling, and baring its teeth. The pit bull is walking calmly by its owner's side and letting people pet it without complaint. Jack has to pass by one of the dogs to continue. He recalls reading a very reliable and well-sourced study which said that pit bulls are 35% more likely than golden retrievers to be aggressive and dangerous towards strangers; therefore, in order to stay safe, Jack chooses to walk past the golden retriever, and avoid the pit bull.<br>
<br>
Obviously, statistics have their use, and if Jack did not have first-hand data, his choice would have been logical. If, for instance, he didn't see the dogs, but was simply told by a friend that one path would lead past a golden retriever, and other path would lead past a pit bull, it would be logical to conclude that, given the limited information, there was a higher probability of meeting with an unfriendly dog if he chose to walk past the pit bull. But Jack should have realized that he had access to first-hand information (his witness of the dogs' behavior) that was more likely to represent the temperaments of those particular two dogs than the statistical average was.<br>
<br>
Is there a name for this fallacy? Or a way of expressing it mathematically? The closest I can find is the <a href="http://en.wikipedia.org/wiki/Ludic_fallacy">ludic fallacy</a>, but I suspect that there are better ways to express it.tag:ask.metafilter.com,2013:site.237042Tue, 12 Mar 2013 08:44:45 -0800CustooFintelHuman Random Generator
http://ask.metafilter.com/231631/Human%2DRandom%2DGenerator
Can you think of a method that allows an individual to pseudo randomly create a sequence of numbers (at the very least the randomness is opaque to the minds of other people) assuming said individual may only use his mind and body (no physical tools are allowed)? Some use cases to test this method:<br>
* tell someone a number between 1-100<br>
* select 10 out of 20 doors and select 5 out of those 10<br>
* create a string consisting of 10 ascii characters<br>
* select a date and time (YYYY-MM-DD, HH:MM)tag:ask.metafilter.com,2012:site.231631Fri, 21 Dec 2012 16:47:33 -0800Foci for AnalysisWhat does a professional statistician do?
http://ask.metafilter.com/230283/What%2Ddoes%2Da%2Dprofessional%2Dstatistician%2Ddo
(Good) jobs involving probability and statistics other than math teacher or actuary? I am trying to determine what career I should pursue. I like math, specifically probability and statistics. Teacher and actuary are both on the table, but I want to know what other options there are.tag:ask.metafilter.com,2012:site.230283Mon, 03 Dec 2012 14:10:17 -0800CustooFintelHow to solve a complex statistics problem with a script?
http://ask.metafilter.com/230075/How%2Dto%2Dsolve%2Da%2Dcomplex%2Dstatistics%2Dproblem%2Dwith%2Da%2Dscript
In this game, you roll a number of six-sided dice to get a <strong>total</strong>. The total is either the highest single die result, or the sum of any multiples rolled, whichever is higher.
For example: If I roll three dice and get a 3, 4, and 6, my total is 6. But if I roll a 4, 4, and 6, my total is 8, the sum of the two 4s.
What I want to find out is the mean, median, mode, and standard deviation of the possible totals given N dice. How might I create a simple script to compute this? With two or three dice, I can easily figure this out by listing all the possible results, basically by brute force. But that's a pretty labor-intensive method when it comes to four or more dice.<br>
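The brute-force listing can be automated directly. Below is a minimal Python sketch that enumerates every roll of N dice. It assumes "the sum of any multiples" means the largest sum of a single repeated face (so 4, 4, 6, 6 would total 12), which is one reading of the rule; the function names are made up.

```python
from collections import Counter
from itertools import product
from statistics import mean, median, multimode, pstdev

def roll_total(dice):
    """Highest single die, or the largest sum of a repeated face, whichever is higher."""
    counts = Counter(dice)
    multiples = [face * n for face, n in counts.items() if n >= 2]
    return max([max(dice)] + multiples)

def total_stats(num_dice):
    """Enumerate all 6**num_dice equally likely rolls and summarize the totals."""
    totals = [roll_total(r) for r in product(range(1, 7), repeat=num_dice)]
    return {
        "mean": mean(totals),
        "median": median(totals),
        "mode": multimode(totals),   # list, in case of ties
        "stdev": pstdev(totals),     # population standard deviation
    }

print(roll_total((3, 4, 6)), roll_total((4, 4, 6)))
print(total_stats(4))
```

Enumeration grows as 6 to the power N, so it stays quick for a handful of dice and becomes slow somewhere around ten.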
<br>
I have a Macintosh, and I'm comfortable using the Unix command line and several programming languages for simple problems, but I'm not even sure where to start automating something like this. I'd be grateful for any guidance.tag:ask.metafilter.com,2012:site.230075Fri, 30 Nov 2012 18:26:42 -0800j0hnpaulPractice Probability Word Problems
http://ask.metafilter.com/228436/Practice%2DProbability%2DWord%2DProblems
What great books or resources are there for practicing probability word problems such as for standardized tests like the GRE? I have a difficult time deciphering the wording in probability problems, and need basic/intermediate practice problems. It should cover concepts such as when to multiply or add probabilities, conditional probability, independent and mutually exclusive events, etc. It can't be overly basic either (i.e. probably not Khan Academy).<br>
<br>
Schaum's outline seems like a great choice, but the <a href="http://www.amazon.com/exec/obidos/ASIN/0071350047/metafilter-20/ref=nosim/">reviews on Amazon</a> say it's full of mistakes, which I cannot have when learning.tag:ask.metafilter.com,2012:site.228436Thu, 08 Nov 2012 11:55:46 -0800Mr. PapagiorgioWhat are the odds that two randomly selected people share the same bank PIN?
http://ask.metafilter.com/227580/What%2Dare%2Dthe%2Dodds%2Dthat%2Dtwo%2Drandomly%2Dselected%2Dpeople%2Dshare%2Dthe%2Dsame%2Dbank%2DPIN
Statisticsfilter: Given available information about the distribution of self-selected 4-digit passwords (specifically banking PINs), is it possible to calculate the probability of two randomly selected individuals having the same PIN? If so, what're the odds? What I'd like to know is, if you had the sample set used <a href="http://www.datagenetics.com/blog/september32012/index.html">here</a> (a compelling analysis of leaked 4-digit passwords), could you calculate the probability of two random users in that dataset sharing a PIN? Given the information visualized there, could you? (Turn the infographic back into numbers, do some math magic?)<br>
<br>
If not, what data <i>would</i> you need in order to approximate a probability? Or if so, what are the odds?<br>
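If per-PIN counts were available, the probability that two distinct randomly chosen users share a PIN is the number of same-PIN pairs divided by the number of all pairs. The sketch below uses toy counts only; the real dataset's counts are not reproduced here, and the function name is illustrative.

```python
def collision_probability(pin_counts):
    """P(two distinct users drawn at random from the data share a PIN)."""
    total = sum(pin_counts.values())
    same_pin_pairs = sum(c * (c - 1) for c in pin_counts.values())
    return same_pin_pairs / (total * (total - 1))

# Toy counts only; the real dataset's per-PIN counts are not reproduced here.
toy = {"1234": 50, "1111": 30, "0000": 10, "8068": 10}
print(collision_probability(toy))
```

With a uniform distribution over all 10,000 PINs this comes out close to 1 in 10,000. Any skew toward popular PINs pushes the probability higher, so the data needed is the frequency of each PIN, not just a ranking or a heat map.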
<br>
<small>Other perhaps-relevant thing I found while googling around: <a href="http://dimle.wordpress.com/2012/03/14/probability-of-the-same-pin-digits/">Probability of the same PIN digits</a>, author assumes uniform distribution (I know that's an actual probability/statistics term, but that's all I know).</small>tag:ask.metafilter.com,2012:site.227580Sat, 27 Oct 2012 13:17:54 -0800myrrhHow do I calculate the probability of a specific sum of repeated die rolls?
http://ask.metafilter.com/225560/How%2Ddo%2DI%2Dcalculate%2Dthe%2Dprobability%2Dof%2Da%2Dspecific%2Dsum%2Dof%2Drepeated%2Ddie%2Drolls
I'm looking to learn how to calculate probabilities for a multi-round dice game. I've researched this question some, and it looks like I might need to know how to use the multinomial distribution, but I can't find any good introductions. Please point me to the most layman-accessible educational material on this subject, and help me to help myself. The game I'm playing can be abbreviated like this:<br>
<br>
I roll two six sided dice and add them.<br>
I then subtract a (fixed) penalty value, and score as many points as remain.<br>
I cannot score less than 0. If I would score less than 0 points, I score 0 instead.<br>
I repeat this process several times, without altering the penalty between rolls.<br>
<br>
How do I predict the probability of any 1 particular cumulative score?<br>
I am not interested in simulating this event and taking a random sample; I want to calculate all possible outcomes and their precise probabilities. Examples follow:<br>
<br>
<br>
Example #1:<br>
My penalty value is 0, and I repeat the die roll 36 times.<br>
Each roll will produce a number between 2 and 12 inclusive, and my final score will be between 72 and 432. What is the chance that I will score 7 points exactly 6 times?<br>
<br>
Example #2:<br>
My penalty value is 11, and I repeat the procedure 40 times.<br>
Each roll will produce a value between 0 and 1 inclusive, and my final total score will be between 0 and 40, and likely closer to 0 than 40.<br>
I wish to predict the chance that I will score a cumulative total of exactly 2 points across the sum of all 40 trials.<br>
<br>
Example #3:<br>
My penalty value is -5, and I repeat the procedure 5 times.<br>
Each roll will produce a value between 7 and 17 inclusive (negative penalty). I wish to plot the probability of arriving at each possible total: 35 through 85. <br>
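Exact answers for examples like these can be computed without simulation by building the one-roll score distribution and convolving it with itself once per roll. The following is a sketch in Python rather than Excel or R, using exact fractions; the function names are made up for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb

def single_roll(penalty):
    """Exact score distribution for one 2d6 roll minus `penalty`, floored at 0."""
    dist = defaultdict(Fraction)
    for a in range(1, 7):
        for b in range(1, 7):
            dist[max(a + b - penalty, 0)] += Fraction(1, 36)
    return dict(dist)

def total_score(penalty, rolls):
    """Distribution of the cumulative score, built by repeated convolution."""
    one = single_roll(penalty)
    total = {0: Fraction(1)}
    for _ in range(rolls):
        nxt = defaultdict(Fraction)
        for score, p in total.items():
            for gain, q in one.items():
                nxt[score + gain] += p * q
        total = dict(nxt)
    return total

# Example 2: penalty 11, 40 rolls, cumulative total of exactly 2.
print(float(total_score(11, 40)[2]))
# Example 1 counts sevens rather than summing, so it is a plain binomial.
print(float(comb(36, 6) * Fraction(1, 6) ** 6 * Fraction(5, 6) ** 30))
```

For Example 3, `total_score(-5, 5)` returns a dictionary whose keys run from 35 to 85, and its values can be plotted directly.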
<br>
Again, I think maybe this is a case of the multinomial distribution, but everything I can find about it (Wikipedia) is confusing and not written as educational but as reference material. <br>
<br>
I'm familiar with MS Excel and have a basic Liberal Arts understanding of calculus and probability/statistics, though I'm quite rusty. I'm willing to re-learn what I have forgotten, and also to learn new skills - within reason. If this procedure requires me to learn lots of new math, I would appreciate links or book titles to pursue as well as a general outline of what the fields of study are called, and what I'll need to know to pursue them intelligently (i.e. help me ask questions without sounding ignorant/clueless). <br>
<br>
I gather that the program R might be helpful in this, so please feel free to recommend a good introduction to both R and Statistics.<br>
<br>
Thanks for any help you're able to give!tag:ask.metafilter.com,2012:site.225560Fri, 28 Sep 2012 22:16:58 -0800Richard DalyRecommendations for great books about probability and risk.
http://ask.metafilter.com/222451/Recommendations%2Dfor%2Dgreat%2Dbooks%2Dabout%2Dprobability%2Dand%2Drisk
Recommendations for great books about probability and risk. I'm looking for great pop-science books (well-written, well-referenced) about risk and probability as well as books relating to perception of risk and the associated cognitive biases. <br>
<br>
I'm also interested (it may seem tangential but it isn't to me) in books that explain mathematical modelling of real-world processes (e.g. Bayesian statistics, Game theory) in non-mathematical terms so that an intelligent non-mathematician could grasp them. <br>
<br>
If it helps, my motivation: I am a doctor, I spend much of my life talking to people about probability and risk. I wish to deepen my understanding and develop a more intuitive appreciation of it partially to satiate my own curiosity and partially so that I am better able to navigate patients' pre-existing cognitive biases.tag:ask.metafilter.com,2012:site.222451Fri, 17 Aug 2012 05:29:28 -0800inbetweenerMost 'realistic' rpg rules?
http://ask.metafilter.com/210490/Most%2Drealistic%2Drpg%2Drules
What are the most mathematically 'advanced' RPG systems? Pen & paper and otherwise? There's been a lot of work done in the past 30 years on statistical modeling, as well as a lot of computing power available in mobile devices, but it seems to me as someone who has been only casually following RPG development that RPG's are still mostly relying on dice and cards and basically haven't gotten much more advanced than craps or poker. <br>
<br>
I'm curious if maybe I've missed some development to produce more realistic character generation and world modelling? I know that some gamer somewhere has to have done some work along these lines, even if the games failed.<br>
<br>
So what's out there? I'm mostly interested in developments in tabletop gaming, but even really modern RPGs seem like they're just using dice rolls in the background. Is that the case?tag:ask.metafilter.com,2012:site.210490Wed, 14 Mar 2012 07:24:58 -0800empathProbability Question
http://ask.metafilter.com/195168/Probability%2DQuestion
Probability Question A quick question of probability that I can't figure out. Consider traits A, B, X, and Y. The probability of A and X is 0.9, the probability of A and Y is 0.1, the probability of B and X is 0.75, and the probability of B and Y is 0.25.<br>
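As stated, the four numbers cannot be joint probabilities, since they add up to 2. One possible reading, and it is only an assumption, is that they are conditional: P(X given A) = 0.9, P(Y given A) = 0.1, P(X given B) = 0.75, P(Y given B) = 0.25. With the further assumption that A and B are equally likely beforehand, Bayes' rule can be sketched in Python as follows (the helper name `posterior` is made up for illustration):

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Bayes' rule: P(H | evidence) for each hypothesis H."""
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())
    return {h: p / evidence for h, p in joint.items()}


# Assumption: A and B are equally likely before Y is observed.
prior = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
# Assumption: the quoted figures are P(Y given A) and P(Y given B).
p_y_given = {"A": Fraction(1, 10), "B": Fraction(1, 4)}

post = posterior(prior, p_y_given)
print(post["B"])  # 5/7
```

A different prior for A and B would change the result, so the 50/50 split matters as much as the four quoted figures.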
<br>
If Y, what is the probability of B as opposed to A?tag:ask.metafilter.com,2011:site.195168Sat, 03 Sep 2011 17:56:41 -0800nickhbStatistics filter: How do I calculate the keyspace size of a password?
http://ask.metafilter.com/194528/Statistics%2Dfilter%2DHow%2Ddo%2DI%2Dcalculate%2Dthe%2Dkeyspace%2Dsize%2Dof%2Da%2Dpassword
How do I figure out what the size of this password's keyspace is? I currently work for a large company with an archaic IT infrastructure and am forced to change my password every 90 days. While I don't specifically think that such a policy is unwarranted, I am constantly annoyed by the arbitrary restrictions that are placed on the passwords that they will allow me to use. I'm currently estimating the total number of distinct passwords that are possible in this system to be in the realm of 160 trillion, an astonishingly small keyspace for a modern password.<br>
<br>
Help me figure out what the exact size of the keyspace is given the following requirements:<br>
<ul><br>
<li>Must be exactly 8 characters.</li><br>
<li>Must contain at least 1 uppercase character</li><br>
<li>Must contain at least 1 lowercase character</li><br>
<li>Must contain at least 1 number</li><br>
<li>Must contain a leading letter (upper or lower)</li><br>
<li>May contain up to 2 special characters ($ or # only)</li><br>
<li>May not have repeating characters</li><br>
</ul>tag:ask.metafilter.com,2011:site.194528Sat, 27 Aug 2011 01:35:00 -0800vmrobNeed help with a probability question
http://ask.metafilter.com/189228/Need%2Dhelp%2Dwith%2Da%2Dprobability%2Dquestion
Can someone help with a probability question? At least I think that's what it is. I'm not a math person, so I may not even be asking the right question:<br>
<br>
16 contestants<br>
500,000 votes cast<br>
Out of the 8 semi-finalists, two contestants received the exact same number of votes (so they went with 9 instead of the usual 8).<br>
<br>
What is the probability that there would be a tie?tag:ask.metafilter.com,2011:site.189228Sat, 25 Jun 2011 20:18:22 -0800caroljean63Picking the correct probability distribution
http://ask.metafilter.com/175604/Picking%2Dthe%2Dcorrect%2Dprobability%2Ddistribution
Which probability distribution should I use to model examination results? At the moment I'm using the Beta distribution, but that's mainly because it looks right and is relatively easy to implement in Excel, which is what I use as a markbook. I don't think that the normal distribution is correct because that'd create a symmetric graph and the results are usually skewed toward one end of the scale, but I wonder about other distributions.<br>
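For reference, one common way to fit a Beta distribution to marks is the method of moments: rescale the marks to the interval 0 to 1, then match the sample mean and variance. The sketch below is in Python rather than Excel, the marks are invented purely for illustration, and `fit_beta_moments` is a hypothetical helper name.

```python
from statistics import mean, variance


def fit_beta_moments(scores, max_score=100):
    """Method-of-moments estimates (alpha, beta) for marks out of max_score."""
    xs = [s / max_score for s in scores]  # rescale to the unit interval
    m = mean(xs)
    v = variance(xs)  # sample variance
    if not 0 < v < m * (1 - m):
        raise ValueError("moments are not compatible with a Beta distribution")
    k = m * (1 - m) / v - 1
    return m * k, (1 - m) * k


# Hypothetical marks, bunched toward the top of the scale.
marks = [55, 62, 70, 71, 78, 80, 84, 88, 91, 95]
alpha, beta = fit_beta_moments(marks)
print(alpha, beta)
```

When alpha comes out larger than beta, the fitted curve leans toward the high end of the scale, which matches the asymmetry described above.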
<br>
I'm a scientist, so I can understand the maths, but I haven't done a lot of stats work and so I don't know which distribution is appropriate for which situation.tag:ask.metafilter.com,2011:site.175604Thu, 13 Jan 2011 01:28:10 -0800alby