Statistics help
December 1, 2012 8:05 AM

Statistics question about a rare event, and the expected distribution of sightings among witnesses.

Okay, fine. It's about ghosts. I don't believe in ghosts, but someone I was talking to recently knew someone who had seen two ghosts, on two separate occasions, in two separate places.

This got me thinking about what the probability was that a person would have a chance to see two extremely rare, but completely unrelated events. Unfortunately, my stat knowledge isn't up to the task, so I am asking for some help.

So let's assume for a second that the anomalous events that people consider to be "ghosts" are real. Let's also assume that the events only occur in houses. Most houses never have the event occur in them, while a small percent of all houses have the event occur in them frequently.

Some numbers:

75.56 million housing units in the US. (real number)
4.09 persons per housing unit in the US. (real number)
12 hrs/day, average time spent at home per person. (guess)
1%, percentage of houses in which the event occurs. (guess)
1 per 10,000 hours, rate at which the event occurs. (guess)

What would be the expected distribution of sighting events among witnesses? Would most people have seen 0, while a very small percent would see 1, and an even smaller percent 2, or 3, or 4? Or would it be more like most people would see 0, but a small percentage of people would see 8 or 9?

What about the witnessing of 2 unrelated events in 2 different locations? Would the distribution be different? (this is the question I am really interested in, the previous question just helps me understand this one.)

I know it's a stupid question, but I am more interested in the stat than the other stuff, and a little frustrated that it is out of my depth, so if you could include a short explanation of how you got the answer, I would appreciate it.

(No, this is not homework. It's curiosity.)
posted by 517 to Science & Nature (9 answers total) 1 user marked this as a favorite
 
Best answer: What you need to compute the answer to your first question is
- the probability of each event
- the Poisson distribution, which will give you the probability of a certain number of discrete events occurring (k=0,1,...,n events), if you give it the average rate at which individual events occur.

What you would do here is compute the probability of k=0,1,...,n ghost sightings per hour. Each of those hours is independent of all the others, so you'd have a probability distribution over the number of ghosts per hour.

Then you can multiply the expected number of ghosts per hour by (the human lifespan in hours)*(12/24) to get the expected number of ghosts in a lifetime.
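
A minimal sketch of that calculation in Python, assuming the asker's guessed rate of 1 event per 10,000 hours, 12 hours a day at home, and a round 500,000-hour lifespan (an assumption, not a measured figure). The name `poisson_pmf` is illustrative and not taken from any library.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

RATE_PER_HOUR = 1 / 10_000      # asker's guess
LIFESPAN_HOURS = 500_000        # assumed round figure
FRACTION_AT_HOME = 12 / 24      # asker's guess

# Chance of 0, 1, 2 events in a single hour spent in an affected house
per_hour = [poisson_pmf(k, RATE_PER_HOUR) for k in range(3)]

# Expected lifetime count: hourly rate times hours spent at home
expected_lifetime = RATE_PER_HOUR * LIFESPAN_HOURS * FRACTION_AT_HOME

print(per_hour)
print(expected_lifetime)
```

With these inputs the lifetime expectation for someone in an affected house works out to about 25 events.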

The 1% of houses is confusing. If you mean that in 1% of houses, the probability is 1/10000 per hour to see a ghost, and in 99% of houses, the probability is zero, then you can multiply the expected count computed above by the appropriate ratio to get the expected number of sightings in the USA. This is assuming that each person has one house which is either haunted or not.

If the 1% means that the probability is actually 1%*(1/10000) per hour in every house, then use this as the rate in the initial Poisson calculation, instead of applying the 1% afterwards.
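
The two readings give the same average but very different distributions. A rough sketch, assuming a lifetime expectation of 25 sightings for someone in an affected house (1/10,000 per hour, a 500,000-hour lifespan, half of it at home); all function and constant names here are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

P_HAUNTED = 0.01        # asker's guess: share of houses with events
LAM_HAUNTED = 25.0      # assumed lifetime expectation in such a house

def mixture_pmf(k):
    """Reading 1: 1% of people face mean LAM_HAUNTED, 99% face mean zero."""
    zero_part = (1 - P_HAUNTED) if k == 0 else 0.0
    return zero_part + P_HAUNTED * poisson_pmf(k, LAM_HAUNTED)

def diluted_pmf(k):
    """Reading 2: everyone faces the same diluted mean, 1% of LAM_HAUNTED."""
    return poisson_pmf(k, P_HAUNTED * LAM_HAUNTED)

# Compare the two readings at a few sighting counts
for k in (0, 1, 2, 25):
    print(k, mixture_pmf(k), diluted_pmf(k))
```

Under the first reading almost nobody sees exactly one event, and the few who see any tend to see many. Under the second, most witnesses see exactly one.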

As for witnessing two unrelated events? If people can go to multiple houses, it's a bit more complicated. If you assume that only some houses have ghosts, then you need the probability that a person will be in two ghost-having houses in a lifetime, AND that there will be a ghost in each one. Off the top of my head I don't know if there is a distribution for this, or if you have to derive an analytic closed form by multiplying two Poisson distributions or taking an integral or something.
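
One simple way to approximate this, under strong assumptions, is to treat the two houses as independent and multiply. The sketch below assumes each house is affected with probability 1%, events follow a Poisson process at 1 per 10,000 hours, and the hours spent in each house are known. The function names and the ten-year example are hypothetical.

```python
import math

RATE_PER_HOUR = 1 / 10_000   # asker's guess
P_HAUNTED = 0.01             # asker's guess

def p_sighting_in_house(hours):
    """Chance of at least one event in a randomly chosen house."""
    p_at_least_one = 1 - math.exp(-RATE_PER_HOUR * hours)
    return P_HAUNTED * p_at_least_one

def p_sighting_in_both(hours_a, hours_b):
    """Chance of at least one event in each of two independent houses."""
    return p_sighting_in_house(hours_a) * p_sighting_in_house(hours_b)

# Hypothetical example: ten years in each house at 12 hours a day
hours = 10 * 365 * 12
print(p_sighting_in_both(hours, hours))
```

For the ten-year example this comes to roughly 1 in 10,000 for a person who lives in exactly two houses. The result can never exceed 0.01 * 0.01 no matter how long the stays are, because both houses have to be affected in the first place.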
posted by kellybird at 8:41 AM on December 1, 2012 [1 favorite]


The difficulty here is that - supposing we believe in ghosts - the two events may not be causally unconnected. The common factor between these sightings is your friend. There might be certain characteristics of a person (psychic receptivity? a friendly aura?) that make them more likely to see ghosts than the average person. So it might not be a pure coincidence that your friend saw ghosts twice.

And of course, if we don't believe ghosts are real supernatural beings, but rather that ghost sightings are caused by some yet-unidentified physical phenomenon like low-frequency noises etc., it's likewise possible that your friend has some feature that makes him/her more likely than average to see ghosts (e.g., higher sensitivity to low-frequency noise).
posted by LobsterMitten at 9:20 AM on December 1, 2012 [3 favorites]


Best answer: Yes, you need to make a few more assumptions about the geographic distribution of (a) people's time and (b) these events. Suppose we assume that people spend 12 hours/day in their own house and 1 hour/day in someone else's house, that the other houses people spend time in are randomly distributed, and that these events only occur in 1% of houses. Then there'll effectively be two distributions: one for people who live in "event houses" and one for people who live in "non-event houses". Each one would be governed by a separate Poisson distribution (as described above), but the average rate of sightings would differ for each of these populations.
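
A sketch of those two populations in Python, using the asker's guesses plus an assumed 500,000-hour lifespan. It also assumes visiting time is spread over many random houses, so that visits can be treated as one Poisson process with a rate diluted by 1%. All names are illustrative.

```python
import math

RATE_PER_HOUR = 1 / 10_000    # asker's guess
P_EVENT_HOUSE = 0.01          # asker's guess
LIFESPAN_HOURS = 500_000      # assumed round figure
HOME_FRACTION = 12 / 24       # 12 hours a day in one's own house
VISIT_FRACTION = 1 / 24       # 1 hour a day in other, random houses

# Expected lifetime sightings while visiting: only 1% of visited hours count
lam_visits = RATE_PER_HOUR * P_EVENT_HOUSE * LIFESPAN_HOURS * VISIT_FRACTION
# Expected lifetime sightings at home, for residents of an event house
lam_home = RATE_PER_HOUR * LIFESPAN_HOURS * HOME_FRACTION

lam_event_resident = lam_home + lam_visits   # lives in an event house
lam_other_resident = lam_visits              # lives in a non-event house

def p_at_least_one(lam):
    """Poisson probability of one or more events given expected count lam."""
    return 1 - math.exp(-lam)

print(lam_event_resident, p_at_least_one(lam_event_resident))
print(lam_other_resident, p_at_least_one(lam_other_resident))
```

With these inputs a resident of a non-event house expects about 0.02 sightings in a lifetime, so roughly 2% of them would see at least one, always away from home.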
posted by Johnny Assay at 9:20 AM on December 1, 2012


If we're going with "ghosts are real" then you need to factor in other (equally made-up) variables, like:

1) Are particular people more capable / specially gifted / in-tune / likely to see ghosts?

2) Does seeing a ghost once make a subject more likely to see them again in future?

I'd go with "yes" to both and that is hard to quantify.
posted by DarlingBri at 11:30 AM on December 1, 2012


"This got me thinking about what the probability was that a person would have a chance to see two extremely rare, but completely unrelated events. Unfortunately, my stat knowledge isn't up to the task, so I am asking for some help."

I don't think you can really give an answer without considering the possibility that there is a hidden/latent variable playing a role in the distribution. This is basically what LobsterMitten said: probabilities that are vanishingly low in the population at large are not necessarily all that small in certain subgroups. Latent variables that I would suggest may play a role here: prior belief of individuals that ghosts can be observed, prior beliefs that there are ghosts in a particular place, high degree of imagination, good hearing in lower frequency ranges (as suggested already by LM), susceptibility to sleep paralysis, and so on. But then I am fairly skeptical.
posted by advil at 11:38 AM on December 1, 2012


Response by poster: Really? You can't make a few assumptions and reduce a variable to approximate an answer like my question already did by assuming events only happen in houses? The point of the question is the statistics of rare events, the ghosts are just a vehicle to ask the question.
posted by 517 at 12:23 PM on December 1, 2012


It's fine if you want to make the assumption that ghost sightings are causally independent of each other and are randomly distributed, and find out what the odds are with those assumptions in place.

But I was just thinking that if you use the results to say something to your friend like "the odds of you seeing two ghosts are vanishingly small, thus you probably didn't see ghosts", the person will have an obvious counterargument (namely, those assumptions might be false).

If you really are not interested in the ghost thing, that's hunky dory. I was just mentioning this as something to be aware of if you're estimating odds of random events happening to a person twice - need to be sure the events are really random (or causally independent).
posted by LobsterMitten at 12:34 PM on December 1, 2012 [1 favorite]


Best answer: 517, I'm with you that it's pretty straightforward. All you really need here is a Poisson distribution. However, to get the statistics for an individual seeing a ghost (e.g., probability that I will see k=0,1,...n ghosts over my lifetime), you need to do a repeated measures analysis of repeated Poisson events, where the repeated event is seeing a ghost in a given hour. There may be a closed form of that repeated measures analysis but I'm not sure.

Anyway, given your numbers above, assuming 1/100 houses is haunted, the expected number of ghosts any individual would see in a lifetime is about 0.25 (1% of 25). HOWEVER, the way you've set up the problem, it's not evenly distributed. The expected number of sightings for someone living in a non-haunted house is zero. The expected number of sightings for someone living in a haunted house is 25 sightings over their lifetime!

As for the expectation that someone will have seen just one ghost, it's pretty vanishingly small. The odds are that if you live in a haunted house, you will have a bunch of ghost sightings averaging about 25. It would be very unlikely for you to have seen just 1 ghost if you are in a haunted house. (I did the math and it came out to less than 1 person in the USA, but I'm not 100% solid on my repeated measures analysis so I don't want to give that answer directly. Either way, even if my math is a little bit off, it's still a very small probability to have seen just 1 ghost.)
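
A quick check of that figure, as a sketch rather than a reconstruction of the math in this post. It relies on the standard result that a sum of independent Poisson counts is itself Poisson, so a lifetime of independent hourly counts in a haunted house is Poisson with mean 25. The head count uses the asker's housing numbers; the variable names are illustrative.

```python
import math

LAM = 25.0                          # expected lifetime sightings in a haunted house
RESIDENTS = 75.56e6 * 4.09 * 0.01   # people in the 1% of houses, asker's figures

# Poisson probability of exactly one sighting: lam * exp(-lam)
p_exactly_one = LAM * math.exp(-LAM)

# Expected number of haunted-house residents with exactly one lifetime sighting
expected_people = RESIDENTS * p_exactly_one

print(p_exactly_one, expected_people)
```

Under these assumptions the expected number of people with exactly one lifetime sighting is far below one, which agrees with the estimate in this post.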

Statistics ftw.
posted by kellybird at 11:34 PM on December 1, 2012 [1 favorite]


Best answer: And "living in a haunted house" can be a metaphor for having a tendency toward paranoid delusions or schizophrenia or whatever. So, you will have some segment of the population having seen lots of ghosts (about 25 ghosts) and some segment having seen no ghosts at all.

Incidentally, by your assumptions, there will be about 3 million people in the USA (1/100) who have seen an average of 25 ghosts each. If you think about it, a person tends to live 500,000 hours, and if the probability of a sighting per hour is 1/10000, you get an expected value of 50 sightings, or 25 if you're home half the time.

The true probability of having seen a ghost is probably less than 1/10000 per hour. The true number of haunted houses is probably less than 1/100. Though, your numbers are not too far off from what would be reasonable (assuming "haunted house" is a metaphor for mental illness).
posted by kellybird at 11:39 PM on December 1, 2012 [1 favorite]

