Can you figure out the probability of a specific outcome?
September 11, 2016 11:30 AM

I work in a field where we think about possibly dangerous people. If Mr. X is a poor driver and has a 5% chance of running a red light in the next month, can you know the probability of him running a red light at a specific intersection? Can you know a range of probabilities for that specific intersection? What if it's the intersection that he drives through most often? I think this likely has an easy answer but none of us can clearly explain it so we keep talking about it.
posted by kerf to Science & Nature (6 answers total) 2 users marked this as a favorite
If you know Mr. X has a 5% chance of running a red light in a month, but do not know how many traffic lights Mr. X sees in a month, you cannot estimate the chance that he will run any given light. If you knew that Mr. X encounters 100 traffic lights in a month, and runs at least one of them in 5% of months, then his chance of running each traffic light he encounters would be 1 - (1 - 0.05)^(1/100), or about 0.051%.

That assumes that Mr. X's bad driving is completely random, with the same independent chance at every light, rather than influenced by various factors (such as being late or drunk or in a bad mood or something).
posted by aubilenon at 11:35 AM on September 11, 2016 [1 favorite]
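
A minimal sketch of the calculation in the answer above, assuming every light is an independent trial with the same probability of being run. The function names and the figure of 20 passes through one intersection are made up for illustration; only the 5% and the 100 lights come from the thread.

```python
def per_light_probability(monthly_probability: float, lights_per_month: int) -> float:
    """Chance of running any single light.

    Assumes every light is an independent trial with the same probability,
    so that 1 - (1 - p) ** lights_per_month == monthly_probability.
    """
    if not 0.0 <= monthly_probability < 1.0:
        raise ValueError("monthly_probability must be in [0, 1)")
    if lights_per_month < 1:
        raise ValueError("lights_per_month must be at least 1")
    return 1.0 - (1.0 - monthly_probability) ** (1.0 / lights_per_month)


def probability_at_intersection(per_light: float, passes: int) -> float:
    """Chance of running the light at least once in `passes` encounters."""
    return 1.0 - (1.0 - per_light) ** passes


if __name__ == "__main__":
    p = per_light_probability(0.05, 100)  # numbers from the answer above
    print(f"per-light probability: {p:.6%}")
    # Hypothetical: 20 of the 100 monthly lights are at one intersection.
    print(f"at that intersection: {probability_at_intersection(p, 20):.4%}")
```

If the independence assumption fails (lateness, mood, a blind approach), the per-light figure is only an average and says little about any one intersection.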

There is no easy answer here, actually, because spatial data (and spatio-temporal data) are not random. Even if you're looking at one specific intersection, you need data on how much traffic there is through that intersection throughout the day, you need to know what intervals you're dividing into (are you looking at hours, half hours, fifteen-minute intervals), you need to look at the direction he's coming from (that intersection may be blind from one direction but have a great view from another), you need to know if he's driving the same vehicle every time, you need to look at whether special events that screw up the traffic patterns are common in that area... you get the point, and we haven't even gotten into Mr. X's behavioral patterns, as aubilenon pointed out. You can model such things - after all, insurance companies look at our driving records and make predictions about our driving, and thus our insurance rates, all the time - but it's not simple or easy, because the data aren't random but dependent upon a host of interconnected factors.

(There are actually technical terms about these issues and how you deal with them, and I can pontificate further if you want [though I'll be better equipped in a few months when I get further into spatial data analysis]. Or you can find your nearest geographer. :) )
posted by joycehealy at 11:45 AM on September 11, 2016 [2 favorites]

I work in risk management, and if this kind of question came up I would probably call our actuary and say "Hey Mark, here's the situation, what do you think?" and let him do his thing.

If you are actuaries, maybe tap into whatever professional information networks you have access to.
posted by Lexica at 11:50 AM on September 11, 2016

joycehealy is right about the details and difficulty of getting a precise answer. But there are ways to arrive at a back-of-the-envelope answer that may produce a useful result. To answer the question of how likely Mr. X is to run a particular red light, you need to know how many traffic lights in total he encounters every month and how many times he encounters this particular traffic light (call it Z). Suppose he sees 1000 traffic lights every month and 80 of them are at intersection Z. Then for months in which he runs a red light (5% to begin with), 8% of all the possibilities are at intersection Z. Thus the probability of him running a light at Z in a given month is 5% * 8%, or 0.05 * 0.08 = 0.004 = 0.4%. To arrive at this simple result you still have to determine the area Mr. X covers, how many traffic lights there are in that area, and what his typical routes are so that you can count how many times he hits each traffic light in his area.

This answer may be wildly inaccurate for all the reasons joycehealy mentions, but it's a start. To refine the answer, you can break the problem down where you have more information. For example, your "5% chance of running a red light" might really be "3% on weekdays but 7% on weekends". There are likely different routes on weekends vs weekdays too, so the light at Z might be seen a different number of times. There are daily and monthly cycles (rush hour, school in session, holiday season) that change traffic patterns. Fundamentally though you are counting all of the traffic lights Mr. X sees over some time interval, spreading the probability that he will run a red light over all the lights in that interval, and calculating how much of that probability occurs at a single traffic light.
posted by ldenneau at 12:43 PM on September 11, 2016
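
The back-of-the-envelope estimate above can be sketched as follows. This is an illustration under stated assumptions, not a validated model: the function names are hypothetical, the weekday/weekend counts in the example are invented, and the stratified version treats each stratum's percentage as the chance of running at least one light within that stratum during the month and treats the strata as independent. Only the 5%, 80, 1000, 3% and 7% figures come from the answer.

```python
import math


def share_estimate(monthly_probability: float, passes_at_z: int, total_lights: int) -> float:
    """Spread the monthly probability evenly over every light encountered
    and take the share that falls at intersection Z."""
    return monthly_probability * passes_at_z / total_lights


def independent_trials_estimate(monthly_probability: float, passes_at_z: int,
                                total_lights: int) -> float:
    """Same inputs, but treating each light as an independent, identical trial."""
    return 1.0 - (1.0 - monthly_probability) ** (passes_at_z / total_lights)


def stratified_estimate(strata) -> float:
    """Combine per-stratum share estimates, assuming strata are independent.

    `strata` is an iterable of (probability, passes_at_z, total_lights) tuples.
    """
    none_at_z = 1.0
    for probability, passes_at_z, total_lights in strata:
        none_at_z *= 1.0 - share_estimate(probability, passes_at_z, total_lights)
    return 1.0 - none_at_z


if __name__ == "__main__":
    simple = share_estimate(0.05, 80, 1000)
    assert math.isclose(simple, 0.004)  # the 0.4% from the answer above
    print(f"share estimate:        {simple:.4%}")
    print(f"independent trials:    {independent_trials_estimate(0.05, 80, 1000):.4%}")
    # Hypothetical split: 70 of 800 weekday lights and 10 of 200 weekend lights at Z.
    print(f"weekday/weekend split: {stratified_estimate([(0.03, 70, 800), (0.07, 10, 200)]):.4%}")
```

With the thread's numbers the share estimate and the independent-trials version agree closely (0.40% versus about 0.41%), because the probabilities involved are small. Both inherit the assumption that risk is spread evenly over lights.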

Best answer: There is no principled way to move from just the information "Mr. X has a 5% chance of running a red light this month" to anything more precise than "Mr. X's probability of running a red light at this specific intersection is somewhere between zero and 5%."
posted by ROU_Xenophobe at 12:45 PM on September 11, 2016 [1 favorite]

I mean, not without actually observing what his probability of running a red light there is, or without actually observing some other set of probabilities in the real world. But then you would have much more information than "Mr. X has a 5% chance of running a red light this month."
posted by ROU_Xenophobe at 12:47 PM on September 11, 2016
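
One way to see why the 5% figure alone only bounds the answer: the two hypothetical drivers sketched below both have exactly a 5% monthly chance of running some light, yet their chances at intersection Z sit at opposite ends of the zero-to-5% range. The light counts (20 at Z, 80 elsewhere), the driver descriptions, and the function names are invented for illustration, and lights are again assumed to be independent trials.

```python
def monthly_probability(p_z: float, n_z: int, p_other: float, n_other: int) -> float:
    """Chance of running at least one light anywhere in the month."""
    return 1.0 - (1.0 - p_z) ** n_z * (1.0 - p_other) ** n_other


def probability_at_z(p_z: float, n_z: int) -> float:
    """Chance of running the light at Z at least once in the month."""
    return 1.0 - (1.0 - p_z) ** n_z


def hazard_for(target: float, n: int) -> float:
    """Per-light probability that yields `target` over n independent lights."""
    return 1.0 - (1.0 - target) ** (1.0 / n)


N_Z, N_OTHER = 20, 80  # hypothetical monthly counts

# Driver A only ever runs the light at Z; driver B never runs it there.
driver_a = (hazard_for(0.05, N_Z), 0.0)
driver_b = (0.0, hazard_for(0.05, N_OTHER))

if __name__ == "__main__":
    for name, (p_z, p_other) in (("A", driver_a), ("B", driver_b)):
        total = monthly_probability(p_z, N_Z, p_other, N_OTHER)
        at_z = probability_at_z(p_z, N_Z)
        print(f"driver {name}: any light {total:.2%}, at Z {at_z:.2%}")
```

Telling the two drivers apart takes exactly the extra observation the answer describes: where the red lights are actually being run.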
