January 24, 2005
8:15 PM

Probability question (more...)
posted by trharlan to (17 comments total)
Say, for example, that I am blindly drawing a number. Someone has told me that the number I am drawing is part of a set. I wasn’t told exactly how large the set is, only that it contains every integer from 1-1000, or 1-999, or 1-998, and so on, down to 1-2. In other words, I know that the set contains every consecutive integer up to the largest integer in the set, starting with 1, but it contains somewhere between 2 and 1000 elements.

If I randomly drew a 1, and I was asked to guess how large the set was, wouldn’t I be best off guessing that the set had two elements? Would it be good practice to always guess the smallest possible number of elements for a given draw? Is there a name for this kind of problem? What does the distribution look like for a given number? Can it be solved with/without a Monte Carlo simulation?
posted by trharlan at 8:16 PM on January 24, 2005


Of course, I also realize that when I draw 1,000, I can guess the size of the set correctly every time, and when I draw 999, I can guess right about 50% of the time, and so forth, but this thought experiment soon breaks down.
posted by trharlan at 8:19 PM on January 24, 2005
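
For the "what does the distribution look like for a given number" part of the question, here is a minimal sketch. It rests on an assumption the question does not state: that every set size from 2 to 1000 was equally likely before the draw. The function name `posterior` and its parameters are made up for illustration.

```python
from fractions import Fraction


def posterior(draw, n_min=2, n_max=1000):
    """Return {set_size: probability} given a single draw.

    Assumption (not stated in the question): every set size from n_min to
    n_max is equally likely beforehand, and the draw is uniform on 1..size.
    """
    # A set of size n produces any particular draw with probability 1/n,
    # and sizes smaller than the draw are impossible.
    weights = {n: Fraction(1, n) for n in range(max(draw, n_min), n_max + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


if __name__ == "__main__":
    for draw in (1, 500, 999, 1000):
        post = posterior(draw)
        most_likely = max(post, key=post.get)
        print(draw, most_likely, float(post[most_likely]))
```

Under that assumption the weights fall off as 1/n, so the single most probable size is always the smallest one the draw allows. For a draw of 999 the two remaining sizes get 1000/1999 and 999/1999, which is the "about 50%" mentioned above.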


This is sometimes called the Taxicab counting problem. It is similar to the problem of estimating the number of tanks an opponent has by observing as many tanks' serial numbers as possible and assuming some rational numbering scheme.

Mathematicians, help me out, but I'm not sure there is a 'closed form' solution to this problem. I solved the problem (with no constraint on cab number--could be infinite) using a maximum likelihood technique for a class. Email me if you'd like the solution and MATLAB code.
posted by fatllama at 8:25 PM on January 24, 2005
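
For reference, a small sketch of the two textbook estimators usually quoted for the tank (serial number) problem. This is not the writeup or MATLAB code mentioned above. It assumes serials are sampled without replacement from 1..N with no upper limit on N, and the function names are made up for illustration.

```python
def mle_estimate(serials):
    """Maximum-likelihood estimate: the largest serial number seen."""
    return max(serials)


def unbiased_estimate(serials):
    """Textbook minimum-variance unbiased estimate m + m/k - 1.

    m is the largest serial seen and k is the number of distinct serials,
    assumed to be sampled without replacement from 1..N.
    """
    m, k = max(serials), len(serials)
    return m + m / k - 1


if __name__ == "__main__":
    observed = [19, 40, 42, 60]  # made-up serial numbers
    print(mle_estimate(observed), unbiased_estimate(observed))
```

With a single observation the second estimator reduces to 2m - 1, which is essentially "double the number you saw".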


On second thought, for a limited time I'll just put the material online. Writeup (pdf, see problem three, absolutely no warranty given, heh heh). And MATLAB code parts 1, 2, 3, and Stirling's approximation.

This is a great problem to think about and to attempt and test solutions. I'll sort of be sad if there turns out to be a closed answer in the general case.
posted by fatllama at 8:35 PM on January 24, 2005


My gratitude! Time to dust off my, err, borrowed copy of MATLAB's student version.
posted by trharlan at 8:41 PM on January 24, 2005


The (N+1)/N estimate given in that PDF is also the result of taking a Bayesian approach (with a simple posterior probability). If N=1 then this collapses to become the same as the Mean approach.

In other words, the answer to trharlan's original question is: 1) Your guess should be double the number you draw.

2) Unless you draw a number > 500, in which case your guess should just be 1000 (the highest probability among the permitted set).
posted by vacapinta at 10:13 PM on January 24, 2005
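
Since the original question asked about Monte Carlo, here is a rough simulation sketch that scores two guessing rules: "guess the smallest possible size" and "double the draw, capped at 1000". It assumes the set size is picked uniformly from 2..1000, which the question never states, and all function names are made up for illustration. Which rule looks "best" depends on whether you score exact hits or average miss distance.

```python
import random


def simulate(rule, trials=100_000, n_min=2, n_max=1000, seed=0):
    """Return (exact-hit rate, mean absolute error) for a guessing rule.

    Assumption: the set size is uniform on n_min..n_max, followed by one
    uniform draw from 1..size.
    """
    rng = random.Random(seed)
    hits = 0
    abs_error = 0
    for _ in range(trials):
        size = rng.randint(n_min, n_max)  # hidden set size
        draw = rng.randint(1, size)       # the number we actually see
        guess = rule(draw)
        hits += guess == size
        abs_error += abs(guess - size)
    return hits / trials, abs_error / trials


def smallest_possible(draw):
    """Guess the smallest set size consistent with the draw."""
    return max(draw, 2)


def doubled_with_cap(draw):
    """Guess twice the draw, but never more than 1000."""
    return min(2 * draw, 1000)


if __name__ == "__main__":
    for rule in (smallest_possible, doubled_with_cap):
        print(rule.__name__, simulate(rule))
```

Under this assumed setup, the smallest-possible rule should land on the exact size more often, while the doubling rule should miss by less on average. The two rules are answers to different scoring questions rather than rivals for one "right" answer.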


That was the same solution I was thinking when reading the question, but of course I don't have any real mathematical basis for that.

It seems pretty rational (to me) that if there is no upper bound on the size of the set, your best guess would be 2 * n (n being the number you drew). Throwing the limit in there leads me to believe that the real solution is probably not intuitive (as I know that many probability problems of this type also aren't).
posted by neckro23 at 10:20 PM on January 24, 2005
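
One way to put a basis under the 2 * n intuition, as a sketch: write M for the unknown set size (a symbol introduced here, not in the thread) and n for the number drawn, assumed uniform on 1..M with no upper limit on M. Averaging over the possible draws gives

```latex
\mathbb{E}[\,n \mid M\,] \;=\; \sum_{n=1}^{M} n \cdot \frac{1}{M} \;=\; \frac{M+1}{2}
\quad\Longrightarrow\quad
\mathbb{E}[\,2n - 1 \mid M\,] \;=\; M .
```

So 2n - 1 is right on average, whatever M is, and for large n that is nearly 2n. This is "best" only in the sense of being unbiased; it says nothing about which single size is most probable.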


This same question has given rise to the Doomsday Theory. If the human race is going to survive for a given number of years (say 1-2100 years or 1-200,000 years), and we were born at (drew the number of) year 1970 (or whatever), shouldn't we assume that the set isn't a particularly large one?

There's a very good book about all this that describes things much better than I have, called The End of the World - The Science and Ethics of Human Extinction.
posted by Jairus at 10:31 PM on January 24, 2005


It's kind of a silly logic though, in that context.

You could do the same thing for an infant. It has some lifespan, but we don't know what it is. We only know that the baby is three days old. But if we were to encounter this individual, we'd expect to encounter him sometime near the middle of his life, in expectation, so clearly the baby must be destined to live for a week or so.

Or for humans at any point in prehistory; say one of the Lascaux artists. The human race has some lifespan, but we don't know what it is. It's more likely that Og is at the burgeoning end of the distribution instead of the very early tail of it, so we should assume that the human race didn't have long to live after ~50,000BC.

It seems to me that it's just expressing ignorance. We know how human lifespans are actually distributed, by observing billions of lives, so we know that the baby example is silly. We don't know anything about the "lifespan" of human civilization, or technological civilizations, except to know that we've made it this far, and anything more is a pure wild-assed guess. The idea of trying to extrapolate a distribution, with a central tendency, and spread, and a shape, from a single observation seems inherently silly to me.
posted by ROU_Xenophobe at 10:58 PM on January 24, 2005


It seems to me that it's just expressing ignorance.

You've completed the first step in rediscovering Bayesian inference.

The idea of trying to extrapolate a distribution, with a central tendency, and spread, and a shape, from a single observation seems inherently silly to me.

How about two? Or three? Or a billion? At what point should a mathematical theory declare "no longer inherently silly"? Better, it should give a best guess (min(2*N,1000) in this case, apparently) along with a measure of how much confidence one might have in that guess.

I read a great article many years ago (the first one on this page) about Bayesian inference in cosmology. We detected some neutrinos from the supernova in 1987, so let's use them to get some info about neutrino masses. How many did we detect? Seventeen.
posted by Aknaton at 11:17 PM on January 24, 2005
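
As a sketch of what "a best guess along with a measure of confidence" could look like for the original draw-a-number question: the function below returns an upper bound that the set size stays under with a chosen probability. It assumes every size from 2 to 1000 was equally likely beforehand (not stated in the question), and the name `credible_upper_bound` is made up for illustration.

```python
def credible_upper_bound(draw, level=0.95, n_min=2, n_max=1000):
    """Smallest U such that P(size <= U | draw) >= level.

    Assumption: sizes n_min..n_max are equally likely beforehand and the
    draw is uniform on 1..size, so the posterior weight of size n is
    proportional to 1/n for every n >= draw.
    """
    sizes = range(max(draw, n_min), n_max + 1)
    weights = [1.0 / n for n in sizes]
    total = sum(weights)
    running = 0.0
    for n, w in zip(sizes, weights):
        running += w / total
        if running >= level:
            return n
    return n_max  # guard against floating-point round-off


if __name__ == "__main__":
    for draw in (1, 10, 100, 999):
        print(draw, credible_upper_bound(draw))
```

With one small draw the bound stays very wide, which is one way of quantifying how little a single observation pins down.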


ROU_Xenophobe, you should read the book. The authors address your concern/rebuttal. :)
posted by Jairus at 11:23 PM on January 24, 2005


This sounds to me a lot like the Doomsday Argument. (now that I read the posted comments, I see that commenters have already mentioned it).

Nick Bostrom writes about the Doomsday Argument quite a bit. The Bayesian answer only works if you use what he calls the self-sampling assumption: Observers should reason as if they were a random sample from the set of all observers in their reference class ... and that can get pretty indefensible pretty quickly. Here's a good paper on this.
posted by painquale at 1:51 AM on January 25, 2005


You've completed the first step in rediscovering Bayesian inference

You start with assumptions that lead you to a prior that the world is about to end, add an uninformative datum, and you're left with your prior.

How about two? Or three? Or a billion?

With one data point, you can't even calculate a variance. How are you going to extrapolate the shape of a whole distribution from a single data point?

ROU_Xenophobe, you should read the book. The authors address your concern/rebuttal. :)

Thanks, but, well... I don't mean to be rude, but it seems like crank-stuff, and if I'm going to read the work of cranks, it's going to be James Hogan's Velikovskian claptrap or L. Neil Smith's anarcho-libertarian claptrap because that stuff is hilarious.
posted by ROU_Xenophobe at 8:17 AM on January 25, 2005


In this day and age you call Bayesian analysis crank stuff? I guess there's even still dinosaurs around that think all infinities are the same size, but for aleph's sake read the book, it's truth.
posted by 31d1 at 8:44 AM on January 25, 2005


I don't believe ROU is saying Bayesian analysis is crank stuff but rather that the Doomsday theory is crank stuff. In this, I agree. It makes no sense to me to regard ourselves as "average" when already we are spectacularly unaverage in terms of our place in the Universe. The problem with Bayesian analysis is that it can be twisted to all sorts of interpretations.

And, yes, I have read the book.
posted by vacapinta at 8:52 AM on January 25, 2005


My bad, I thought they was knocking Bayes.
Carry on :)
posted by 31d1 at 8:59 AM on January 25, 2005


you call Bayesian analysis crank stuff?

Indeed, not only is it not crank stuff, I have to teach it to bright-eyed bushy-tailed young grad students later this week. The hard part is getting them to understand it's a process, not a formula to memorize.

which reminds me... *reapplies nose to grindstone*
posted by ROU_Xenophobe at 9:58 AM on January 25, 2005

