Comments on: About Benford's Law
http://ask.metafilter.com/257086/About-Benfords-Law/
Question: About Benford's Law
http://ask.metafilter.com/257086/About-Benfords-Law
So in many data sets, the leading digit has a 30.1% chance of being 1, and decreasing on down the line.
Fine. But what about NON-leading digits? Are those also irregularly distributed in naturally-occurring data sets, or are they just 11.11% chance, as a layman would expect?
Posted by Opengreen, Thu, 13 Feb 2014 09:17:56 -0800. Tags: Benford'slaw
By: grudgebgon
http://ask.metafilter.com/257086/About-Benfords-Law#3735866
The effect continues but is substantially less pronounced with each digit. And since zero is possible for the non-leading digits, each digit's probability converges to 10.0% rather than 11.1%.<br>
<br>
Here's a <a href="http://www.usfca.edu/fac-staff/huxleys/Benford.html">Source</a>.
Posted by grudgebgon, Thu, 13 Feb 2014 09:28:53 -0800
By: JumpW
http://ask.metafilter.com/257086/About-Benfords-Law#3735869
The naive guess would be a 10% probability for each value of a non-leading digit (because there are ten possible values: 0, 1, ..., 9).<br>
<br>
<a href="http://en.wikipedia.org/wiki/Benford%27s_law#Generalization_to_digits_beyond_the_first">Wikipedia</a> says that for second digits it is not quite an even 10% probability for each number, but that once you get to the fourth digit it approaches a uniform distribution where each number has a 10% chance of occurring.
Posted by JumpW, Thu, 13 Feb 2014 09:30:46 -0800
By: DevilsAdvocate
http://ask.metafilter.com/257086/About-Benfords-Law#3735884
The principle behind Benford's Law is that, for data sets distributed according to it, if you express the values in scientific notation (n*10<sup>m</sup>, where 1≤n<10 and m is an integer), the probability density of n is proportional to 1/n.<br>
<br>
So the probability of the first digit being 1, i.e., 1≤n<2, is (∫<sub>1</sub><sup>2</sup> 1/n dn) / (∫<sub>1</sub><sup>10</sup> 1/n dn)<br>
<br>
Since 1/n is larger for smaller values of n, the probability of n being between 1 and 2 is much larger than the 1/9 of the range it makes up.<br>
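For reference, both integrals above evaluate in closed form using the antiderivative ln(n), which recovers the familiar 30.1% figure. This is just a worked evaluation of the ratio already given, not an additional assumption:

```latex
P(1 \le n < 2)
  = \frac{\int_1^2 \frac{1}{n}\,dn}{\int_1^{10} \frac{1}{n}\,dn}
  = \frac{\ln 2 - \ln 1}{\ln 10 - \ln 1}
  = \log_{10} 2 \approx 0.30103,
\qquad
P(\text{first digit} = d) = \log_{10}\!\left(1 + \frac{1}{d}\right).
```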
<br>
You can use the same principle to derive the probability of the second digit: the probability that the second digit is one is the probability that n is between 1.1 and 1.2, or between 2.1 and 2.2, or between 3.1 and 3.2 ... or between 9.1 and 9.2. This will still show some preference for smaller digits, since 1/n is larger between 1.1 and 1.2 than it is between 1.2 and 2; and larger between 2.1 and 2.2 than between 2.2 and 3, and so forth. But the effect is much less pronounced. (Also note that the most likely second digit is zero, which is not possible for the first digit.) So the theoretical probability that the second digit is one would be:<br>
<br>
(∫<sub>1.1</sub><sup>1.2</sup> 1/n dn + ∫<sub>2.1</sub><sup>2.2</sup> 1/n dn + ∫<sub>3.1</sub><sup>3.2</sup> 1/n dn + ... + ∫<sub>9.1</sub><sup>9.2</sup> 1/n dn) / (∫<sub>1</sub><sup>10</sup> 1/n dn)<br>
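As a rough check on the formula above, here is a minimal Python sketch that evaluates the nine slice integrals numerically and compares the result with the closed form obtained from the antiderivative ln(n). The function names (`integral_of_reciprocal`, `second_digit_probability`, `second_digit_closed_form`) and the midpoint-rule step count are illustrative choices, not anything from the thread.

```python
import math


def integral_of_reciprocal(a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of 1/n from a to b."""
    width = (b - a) / steps
    return sum(width / (a + (i + 0.5) * width) for i in range(steps))


def second_digit_probability(d):
    """Probability that the second digit equals d, via the slice integrals."""
    # One slice per possible first digit: [1.d, 1.d + 0.1], [2.d, 2.d + 0.1], ...
    numerator = sum(
        integral_of_reciprocal(first + d / 10, first + (d + 1) / 10)
        for first in range(1, 10)
    )
    denominator = integral_of_reciprocal(1, 10)
    return numerator / denominator


def second_digit_closed_form(d):
    """Same quantity, using ln(b) - ln(a) for each slice instead of integrating."""
    return sum(math.log10(1 + 1 / (10 * first + d)) for first in range(1, 10))


print(f"numerical:   {second_digit_probability(1):.5f}")
print(f"closed form: {second_digit_closed_form(1):.5f}")
```

Both lines should agree to several decimal places; the closed form is cheaper and exact up to floating-point error, so it is the one worth using beyond a sanity check.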
<br>
The effect becomes less pronounced with each additional digit: for the first digit, you are taking the first 1/9 of the range; for the second, you are taking nine slices, each 1/90 of the entire range, spaced 1/9 of the range apart; for the third digit, you are taking 90 slices, each 1/900 of the entire range, spaced 1/90 of the range apart, and so forth. In each case, the slices containing the desired digit take up a total of 1/9 (for the first digit) or 1/10 (for any digit after the first) of the total range, but as you go further to the right, the slices become more numerous and more evenly spaced throughout the entire range.
Posted by DevilsAdvocate, Thu, 13 Feb 2014 09:53:04 -0800
By: beagle
http://ask.metafilter.com/257086/About-Benfords-Law#3735886
Fun fact illustrating the 10% convergence: Until at least the early 1970s, New York newspapers used to publish, every day, the "U.S. Daily Treasury Balance" which was the cash on hand at the United States Treasury. This was usually at least an 11-digit number. The reason was that the last three digits, excluding the cents (because not every paper published the cents and at some point the Treasury started rounding to the nearest dollar) would be the "daily number" in the local mob's numbers racket, which is what folks used to gamble on before state lotteries came along. You picked a three-digit number, gave it to the numbers runner with your bet, and if your number "came in" — matched the last three of the Treasury balance — he'd bring you back the payoff at 600:1. So, if mob statisticians had determined that those last three digits were sufficiently random, they must have been pretty close to a 10.0% probability.<br>
<br>
(Newspapers were clearly colluding with the mob here, since the daily Treasury balance had no particular usefulness otherwise, but probably mainly on the theory that publishing the number would sell more papers. <a href="http://scholarlycommons.law.northwestern.edu/cgi/viewcontent.cgi?article=4192&context=jclc">Here's a good rundown</a> on how the game worked in the 50s.)
Posted by beagle, Thu, 13 Feb 2014 09:54:26 -0800
By: DevilsAdvocate
http://ask.metafilter.com/257086/About-Benfords-Law#3735921
Based on the formulas I gave above, here are the (rounded) theoretical probabilities for the first four digits:<br>
<br>
First digit:<br>
1: 30.103%<br>
2: 17.609%<br>
3: 12.494%<br>
4: 9.691%<br>
5: 7.918%<br>
6: 6.695%<br>
7: 5.799%<br>
8: 5.115%<br>
9: 4.576%<br>
<br>
Second digit:<br>
0: 11.968%<br>
1: 11.389%<br>
2: 10.882%<br>
3: 10.433%<br>
4: 10.031%<br>
5: 9.668%<br>
6: 9.337%<br>
7: 9.035%<br>
8: 8.757%<br>
9: 8.500%<br>
<br>
Third digit:<br>
0: 10.178%<br>
1: 10.138%<br>
2: 10.097%<br>
3: 10.057%<br>
4: 10.018%<br>
5: 9.979%<br>
6: 9.940%<br>
7: 9.902%<br>
8: 9.864%<br>
9: 9.827%<br>
<br>
Fourth digit:<br>
0: 10.018%<br>
1: 10.014%<br>
2: 10.010%<br>
3: 10.006%<br>
4: 10.002%<br>
5: 9.998%<br>
6: 9.994%<br>
7: 9.990%<br>
8: 9.986%<br>
9: 9.982%<br>
<br>
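The four tables above can be reproduced with a short Python sketch. It assumes the closed form of each slice integral, log10(1 + 1/(10*prefix + d)), summed over every possible run of digits preceding the position of interest. The function name `digit_distribution` is a hypothetical helper introduced here for illustration.

```python
import math


def digit_distribution(position):
    """Benford probabilities for the digit at `position` (1 = leading digit)."""
    if position == 1:
        # The leading digit cannot be zero.
        return {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    # Each possible prefix of (position - 1) digits contributes one slice:
    # 9 slices for the second digit, 90 for the third, 900 for the fourth.
    lo = 10 ** (position - 2)
    hi = 10 ** (position - 1)
    return {
        d: sum(math.log10(1 + 1 / (10 * prefix + d)) for prefix in range(lo, hi))
        for d in range(10)
    }


for position in range(1, 5):
    print(f"Digit {position}:")
    for d, p in digit_distribution(position).items():
        print(f"  {d}: {100 * p:.3f}%")
```

The number of slices grows tenfold with each position while each slice shrinks tenfold, which is why the distribution flattens toward 10% so quickly.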
Unless you have a <i>humongous</i> data set (probably on the order of millions of values), you won't be able to see a statistically significant difference in the fourth digit.
Posted by DevilsAdvocate, Thu, 13 Feb 2014 10:15:26 -0800
By: IAmBroom
http://ask.metafilter.com/257086/About-Benfords-Law#3736909
<a href="http://ask.metafilter.com/257086/About-Benfords-Law#3735886">beagle</a>: "<i> So, if mob statisticians had determined that those last three digits were sufficiently random, they must have been pretty close to a 10.0% probability.</i>"<br>
<br>
That's... an incredibly naive view of mob statisticians. More likely, if mob statisticians had determined those digits were <em>not</em> completely random, they found a way to hedge their bets.<br>
<br>
Put another way - why would the mob want the game to be fair?
Posted by IAmBroom, Fri, 14 Feb 2014 13:15:36 -0800