Comments on: Details of the German Tank Counting Method.
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method/
Comments on Ask MetaFilter post Details of the German Tank Counting Method. Wed, 19 Sep 2007 22:04:40 -0800
Question: Details of the German Tank Counting Method.
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method
Details of the German tank counting method (as seen on reddit)? <br /><br /> I'm interested in knowing more about the statistics (or at least the history) behind the story of the <a href="http://www.daltonlp.com/view/507">german tank counting method</a>. More information (and a citation that is useless to me, as I don't have access to the journal) <a href="http://www.scc.ms.unimelb.edu.au/whatisstatistics/gtanks.html">here</a>.post:ask.metafilter.com,2007:site.71956Wed, 19 Sep 2007 21:38:26 -0800tehgeekmeistergermantankcountingmethodwwiiworldwariitankpanzerstatisticsmathBy: chrisamiller
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1071875
I believe it had to do with the serial numbers that the manufacturers stamped on the tanks. By seeing only a few tanks, one could extrapolate all the numbers in between.comment:ask.metafilter.com,2007:site.71956-1071875Wed, 19 Sep 2007 22:04:40 -0800chrisamillerBy: chrisamiller
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1071877
Here's a post talking about <a href="http://www.guardian.co.uk/g2/story/0,,1824525,00.html">some of the details</a>. Collecting a few stories about this would make a good post in the blue, I imagine. It's fascinating stuff.comment:ask.metafilter.com,2007:site.71956-1071877Wed, 19 Sep 2007 22:06:14 -0800chrisamillerBy: geoff.
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1071890
I rewrote this a couple times, but here is the best way to explain it without getting too mathematical.<br>
<br>
We are making the following assumptions:<br>
- The serial numbers are meaningful and accurate. That is, the Germans (or Google) don't assign serial numbers willy-nilly. Also, serial numbers are sampled without replacement.<br>
- The serial numbers are unbiased. That is, certain serial numbers aren't more likely to be included than others.<br>
- The found serial numbers follow a Gaussian distribution and all its implications, including finite variance.<br>
- The highest serial number is equal to or less than the total number produced. That is, we won't see serial number 1500 when only 1498 tanks exist.<br>
<br>
Then, knowing all this, you take the lowest number and the greatest number and compute the variance using statistical analysis and calculus (which we won't get into here; search for "minimum variance unbiased estimator" for the calculus, I am sure there is a lot on it). This gives what the population estimate should be based on the variance of the serial numbers, given that the observed sample is random (that is, in the Gaussian sense).<br>
<br>
I don't know if this would work with Google servers. It certainly might, and is interesting, but there is nothing to say that the observed sample is both unbiased and of finite variance without replacement. Simply put, there are a lot of assumptions made that aren't verified.comment:ask.metafilter.com,2007:site.71956-1071890Wed, 19 Sep 2007 22:26:22 -0800geoff.By: geoff.
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1071892
I should add there are a lot of ways to estimate N from the sample population, including t-distributions. Because we have no idea how or why Google numbers its servers, there is no reason to assume that the numbering is normally distributed. Given that Google uses Markov-state switching for its Page Rank, I would not be surprised if they used some weird, non-normal distribution for this, for reasons we might not realize but that make sense when designing the system.<br>
<br>
I could just be overthinking this though.comment:ask.metafilter.com,2007:site.71956-1071892Wed, 19 Sep 2007 22:30:26 -0800geoff.By: bsdfish
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1071961
Suppose the tanks (or servers) are numbered 1, 2 ... n where n is the total number of tanks (or servers). When you observe the serial numbers of a few (k) tanks, you look to see what the highest one is. Let's suppose that number is m. That obviously implies that the number of tanks is at least m, but in fact, in all likelihood, there are more than m tanks.<br>
<br>
Think of what happens when you randomly pick one number from 1 to n. The average number you'll pick will be (n+1)/2. So, if the one number you saw was m, your best estimate of n is the solution to m = (n+1)/2, which gives n = 2m-1 as the estimated number of tanks.<br>
<br>
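A minimal sketch of the single-observation case, in Python. The function names are made up for illustration; the only thing taken from the argument above is the rule n = 2m-1. The second function checks, with exact arithmetic, that the rule lands on the true n on average when every serial number from 1 to n is equally likely to be the one you see.<br>

```python
from fractions import Fraction


def estimate_from_single(m: int) -> int:
    """Estimate n from one observed serial number m by solving m = (n + 1) / 2."""
    return 2 * m - 1


def expected_estimate(n: int) -> Fraction:
    """Exact average of 2m - 1 when m is equally likely to be any of 1..n."""
    total = sum(estimate_from_single(m) for m in range(1, n + 1))
    return Fraction(total, n)


print(estimate_from_single(60))  # one tank numbered 60 gives an estimate of 119
print(expected_estimate(250))    # the average over all possible m is the true n
```

Any single estimate can be far off (it is always odd, and it swings between 1 and 2n-1), which is why seeing more than one tank helps so much.<br>
<br>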
Now, imagine that you have captured several (k) tanks, each one chosen uniformly (no tank has a higher probability of being captured than others, and capturing one tank doesn't make capturing others any less likely). If the true number of tanks is n, the highest serial number observed (m) will, on average, be (n+1)*k/(k+1), and so the most reasonable estimate for n will be m*(k+1)/k - 1. I may be messing up some of the constants here (i.e., it may not be -1 but +1 somewhere), but this is the idea.<br>
<br>
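Here is a rough Python sketch of the k-tank estimate, again with made-up function names. It assumes the k captured tanks are distinct (sampling without replacement), which is the case where the average of the highest serial number is exactly (n+1)*k/(k+1). Rather than simulate, it enumerates every possible set of k serial numbers for a small n and averages exactly.<br>

```python
from fractions import Fraction
from itertools import combinations


def estimate_total(serials):
    """Estimate n as m * (k + 1) / k - 1, where m is the largest of k serials."""
    k = len(serials)
    m = max(serials)
    return Fraction(m * (k + 1), k) - 1


def exact_averages(n, k):
    """Average the raw maximum and the estimate over every k-subset of 1..n."""
    subsets = list(combinations(range(1, n + 1), k))
    avg_max = Fraction(sum(max(s) for s in subsets), len(subsets))
    avg_est = sum(estimate_total(s) for s in subsets) / len(subsets)
    return avg_max, avg_est


# Four captured tanks with these serial numbers: 60 * 5 / 4 - 1
print(estimate_total([19, 40, 42, 60]))  # 74

avg_max, avg_est = exact_averages(n=20, k=4)
print(avg_max)  # 84/5, which is (n + 1) * k / (k + 1) = 16.8
print(avg_est)  # 20, the true n
```

So the raw maximum falls short of n on average, and the correction factor (k+1)/k with the -1 makes up the gap under these assumptions.<br>
<br>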
We make the following assumptions: a) the tanks are numbered 1 ... n and no serial number is skipped. b) each tank has the same probability of being captured. c) Tanks are captured independently of each other. Some of these assumptions are somewhat easy to relax; if we think the tanks are numbered starting with some number that's not 1, we can estimate the starting point as well. However, the uniform sampling and independence assumptions are practically impossible to get rid of, because that's what gives you the power to say anything about the total number of tanks.comment:ask.metafilter.com,2007:site.71956-1071961Thu, 20 Sep 2007 01:37:32 -0800bsdfishBy: smackfu
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1072007
Incidentally, this is why your e-commerce store should randomize order numbers. Otherwise competitors can do the same analysis on your # of sales.comment:ask.metafilter.com,2007:site.71956-1072007Thu, 20 Sep 2007 05:29:05 -0800smackfuBy: a robot made out of meat
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1072027
We did this with continuous distributions (to death) in math stat.<br>
<br>
The likelihood function for drawing k x's from n without replacement is fact(n-k)/fact(n) if max(x) ≤ n and n ≥ k, so the maximum likelihood estimate n_hat is just max(x). I'd have to think a bit about a least MSE or UMVUE, but I recall the other options not being so different.comment:ask.metafilter.com,2007:site.71956-1072027Thu, 20 Sep 2007 06:00:50 -0800a robot made out of meatBy: a robot made out of meat
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1072034
To restate the key bit (HTML tends to eat math with less-thans in it): the likelihood function is fact(n-k)/fact(n) if max(x) ≤ n, and zero otherwise. So L depends on the data ONLY through max(x) and is strictly decreasing in n_hat (the guess at n) for all n_hat ≥ max(x), so the maximum likelihood estimate n_hat is just max(x).comment:ask.metafilter.com,2007:site.71956-1072034Thu, 20 Sep 2007 06:06:14 -0800a robot made out of meatBy: lioness
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1072193
tehgeekmeister, if you would like to read the article of the citation in full, my e-mail is in my profile.comment:ask.metafilter.com,2007:site.71956-1072193Thu, 20 Sep 2007 08:39:21 -0800lionessBy: cowbellemoo
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1072195
I have nothing to add, but I'll 2nd the call to have this stuff FPPed. Fascinating!comment:ask.metafilter.com,2007:site.71956-1072195Thu, 20 Sep 2007 08:41:10 -0800cowbellemooBy: lioness
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1073392
Very short summary of the article:<br>
<br>
"Part I of this article describes the historical development and problems of a technique of economic intelligence which sought to overcome the basic inadequacies of other types of intelligence. This technique involved analyzing the markings found on enemy equipment in order to obtain useful information about German armaments production.<br>
<br>
In Part II, the reliability of the estimates achieved by this analysis have been assessed on the basis of official German production records which have since become available.<br>
<br>
The first product to be analyzed by this technique was tires, then tanks, trucks, guns, flying bombs and rockets. Aircraft markings were not studied by the Economic Warfare Division, since, by previous agreement, the British Air Ministry bore the responsibility for all estimates on aircraft production."<br>
<br>
The article is very detailed about the technique used and gives clear examples, but is unfortunately too long to summarize.comment:ask.metafilter.com,2007:site.71956-1073392Fri, 21 Sep 2007 03:33:55 -0800lionessBy: lioness
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1074021
For people interested in the statistics:<br>
Goodman (1952). "Serial Number Analysis," Journal of the American Statistical Association, 47:622-634, which is a follow-up study on Ruggles & Brodie (1947).comment:ask.metafilter.com,2007:site.71956-1074021Fri, 21 Sep 2007 16:23:47 -0800lionessBy: vegetableagony
http://ask.metafilter.com/71956/Details-of-the-German-Tank-Counting-Method#1093351
This topic was discussed in my statistics class just recently. Here is the handout: http://pages.pomona.edu/~jsh04747/courses/math152/tanks_all.pdf<br>
Unfortunately it does require a good amount of knowledge of beginning statistics. Particularly interesting are the graphs in the back that show the accuracy of a variety of different methods that could be used to predict the total number of tanks produced, given some random subset of the serial numbers that you find.comment:ask.metafilter.com,2007:site.71956-1093351Tue, 09 Oct 2007 23:46:04 -0800vegetableagony