How to model the accident rate on a certain stretch of road
May 7, 2012 2:19 AM   Subscribe

How could I approach modeling the number of accidents on a certain stretch of road? I've got some semi-ecological approaches in mind but no idea if they're workable. Are there standard approaches to this sort of problem?

Recently I've been traveling a certain 250 mile stretch of road quite a bit. It's reasonably dangerous--one wet, drunken three-day weekend several years ago 27 people died on it--and I've found myself wondering what the serious accident rate is. (No lives will depend on my answers; nor am I doing this for a school project. This is all about curiosity.)

It occurred to me that I've traveled this road--let's say 30 times in 6 months. That's effectively 30 "samples" (or "transects," maybe?), and I've seen the remains of one serious accident. Could I use these samples to make an estimate of the serious accident rate?

I gather ecologists and zoologists do something similar when doing animal censuses: they do a transect, count up the animals they see, then make some assumptions about how many they're missing and derive an estimate from that. Others look for animal leavings (dung, animal carcasses, etc) and extrapolate from there. Could I do something similar in this case with the wreck I saw (and those I didn't)?

(Of course my "transect" is different in that cars don't travel outside the transect; I'm only interested in accidents along the transect itself.)

My thinking is that the remains of a serious accident will have a "half-life" based on how long it takes to get a tow-truck service out to remove the wreckage. I'd guess that 50% of remains are gone in a day or so: support can be called out relatively quickly as there are five towns along this 250-mile route, spaced fairly evenly (two at the termini, and then every 50 miles in-between), but it might take a while for them to muster up removal services, especially for big wrecks. The one I spotted was a burnt-out tanker and a delivery van that appeared to have had a head-on collision. I assume that large wrecks like this would have been difficult to move.

So could I use my "samples," my one observation, and some assumptions like the "half-life" of a wreck to produce a sensible estimate of the accident rate?
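The idea above can be turned into a back-of-envelope estimator. This sketch assumes accidents arrive as a Poisson process, that wreck remains decay exponentially with the guessed one-day half-life, and that any wreck still present is always spotted -- every number here is an assumption from the question, not real data:

```python
import math

# Back-of-envelope estimate of a serious-accident rate from repeated
# passes over the same stretch of road.  Assumptions: accidents are a
# Poisson process, wreck remains decay exponentially with a one-day
# half-life, and any wreck still present is always seen.
trips = 30            # passes over the 250-mile stretch in 6 months
sightings = 1         # serious wrecks actually seen
half_life_days = 1.0  # guessed half-life of wreck remains

# Mean time remains stay visible, from the half-life: h / ln 2.
mean_visible_days = half_life_days / math.log(2)

# Expected sightings = rate * (trips * visible window), so invert:
rate_per_day = sightings / (trips * mean_visible_days)

print(f"Estimated rate: {rate_per_day:.3f} serious accidents/day")
print(f"About {rate_per_day * 365:.1f} per year on this stretch")
```

With one sighting in 30 trips and a one-day half-life, this works out to roughly eight serious accidents a year -- extremely sensitive to the half-life guess, which is the weakest input.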

I don't need this data: I'm mostly just curious about whether and how my line of thinking could be used to make a reasonable estimate of the accident rate. Would it work? Are there other, better ways of approaching this kind of modeling?

Full workings of the problem, tips, examples, links to articles and books on modeling technique would all be very welcome.
posted by col_pogo to Science & Nature (9 answers total) 2 users marked this as a favorite
AusRAP (the Australian Road Assessment Program) gives this explanation of how they figure out their statistics:
Risk Maps.
Collective Risk Maps show the density, or total number of crashes on a road over a given length. Collective risk is calculated by dividing the number of casualty crashes on a link by the length of the link.
Individual Risk Maps show casualty crash rates per vehicle kilometre travelled. This is the risk rate for individual drivers. Individual risk is calculated by dividing the frequency of crashes by the distance travelled by vehicles on each road link.
Risk Maps have been produced for the National Highway Network, on which road crash fatalities typically account for around 15 per cent of Australia's annual road fatalities.
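The two metrics quoted above are simple ratios, and can be illustrated with invented figures (all numbers below are made up for the example):

```python
# Hypothetical figures for one road link, to illustrate the two AusRAP
# risk metrics quoted above.  All numbers are invented.
casualty_crashes = 40      # casualty crashes on the link over the period
link_length_km = 250.0     # length of the link in km
vehicle_km = 180e6         # total vehicle-kilometres travelled on the link

collective_risk = casualty_crashes / link_length_km  # crashes per km
individual_risk = casualty_crashes / vehicle_km      # crashes per vehicle-km

print(f"Collective risk: {collective_risk:.3f} casualty crashes/km")
print(f"Individual risk: {individual_risk:.2e} crashes per vehicle-km")
```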
I assume they have access to data that a layperson wouldn't necessarily have about serious crashes: police statistics with location information for instance, and insurance claims.
posted by Fiasco da Gama at 3:41 AM on May 7, 2012

One thing to consider is that your trips (probably) aren't random, therefore your data will be biased due to time of day/day of week. So you will probably mis-estimate the true accident rate.
posted by shrabster at 5:45 AM on May 7, 2012 [1 favorite]

Umm... most municipalities keep records of these kinds of things. Insurance companies certainly do. So you don't actually need to model it at all: the actual data is out there.

Whether or not you can get it is a different question. Even if you can, it's not likely to be online.
posted by valkyryn at 7:10 AM on May 7, 2012

I have been known to read the odd transportation accident analysis site – but for different reasons than you are wanting. There is a lot of detail that goes into examining traffic accidents – from the age and experience level of the drivers; to the quality and condition of the vehicles; to the design and visibility of the road; to injuries versus casualties and the weather.

Rather than using your experience on the roadway – I would suggest looking into the accident rates on that particular roadway. As someone suggested, the City or Police would have that data, and I would bet there is already a pattern (time of day; weather).

In my experience, reconstruction is a matter of what the data is to be used for. There are computer simulation reconstruction programs that are being used more and more frequently.

Some of my favorite sites are: ; and .
posted by what's her name at 7:42 AM on May 7, 2012

The basic tests you're talking about, the ones ecologists do with transect data, are called occupancy analysis. The corrections you can make to ensure you're not missing false negatives or including false positives are called detection analysis. That might help you start googling.

The classic book is Occupancy Estimation and Modeling and the classic program to analyse the data is Program MARK, which is available for free (but isn't intuitive to use).

As for the particulars of your problem, I feel like occupancy analysis wouldn't work the way you want it to, but I have to think about it more. The big thing is that your trips won't be random, but you should be able to correct for that using detection.
posted by hydrobatidae at 7:44 AM on May 7, 2012 [2 favorites]

If you just want to know the number of accidents, I'd say if you're in the UK, consult Every death on every road in Great Britain 1999-2010, if in the US you want US road accident casualties: every one mapped across America.
posted by Mike1024 at 7:46 AM on May 7, 2012

I appreciate the advice on looking up actual stats--those are some fascinating studies from Mike1024 and Fiasco. I am, however, more interested in the modeling as an intellectual exercise (and besides, where I live those stats are actually neither reliable, nor necessarily available).

Hydrobatidae's tips look like just the stuff. Any other advice along those lines?
posted by col_pogo at 10:24 AM on May 7, 2012

The main problem that you are going to encounter, which hydrobatidae already alluded to, is that your sampling is dependent upon how representative your observations are of the whole. The best way to ensure this is to make sure they are random, and that there are enough of them to cover significant variations in the underlying data. You can determine this statistically, but you can get part of the way there just thinking about it. Do your observations all fall within roughly the same time of the week? If so, then you are systematically biasing your sample. For example, if more accidents generally occur Friday-Sunday, and you always travel on Tuesday-Thursday, your observations will underestimate the total number of accidents. Same goes for time of year, time of day, sunny/rainy weather, etc. This problem is exacerbated by your low sample size.
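The midweek-versus-weekend point above can be made concrete with a toy example. The daily rates here are invented purely to show the direction and size of the bias:

```python
# Toy illustration of sampling bias: accidents are more common
# Fri-Sun, but the observer only ever travels Tue-Thu.
# All rates are invented for the example.
daily_rate = {"Mon": 0.02, "Tue": 0.02, "Wed": 0.02, "Thu": 0.02,
              "Fri": 0.06, "Sat": 0.08, "Sun": 0.06}

true_mean = sum(daily_rate.values()) / 7
sampled_mean = sum(daily_rate[d] for d in ("Tue", "Wed", "Thu")) / 3

print(f"True average rate:   {true_mean:.3f} accidents/day")
print(f"Tue-Thu sample rate: {sampled_mean:.3f} accidents/day")
# The midweek-only sample underestimates the true rate by half.
```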

Also, your 'half-life' estimate makes certain assumptions about kinds of accidents. It may be the case that, on the scale from minor fender-bender to three-car pileup, it takes, on average, a day to clear the wreckage. But that might be due to the fact that all severe accidents take a long time to clear up, and all minor ones are cleared immediately. If there are many more minor than major accidents, then far more than 50% of all accidents will be cleared within a day.
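A numeric version of that point, with invented clearance times: suppose 90% of accidents are minor and cleared in two hours, while 10% are major and take ten days. The average clearance time then looks like about a day, yet 90% of wrecks are gone within hours:

```python
# Mixture of clearance times (all numbers invented): the mean can be
# around a day even though the vast majority of wrecks clear far faster.
minor_share, minor_days = 0.90, 2 / 24   # minor wrecks, cleared in 2 hours
major_share, major_days = 0.10, 10.0     # major wrecks, cleared in 10 days

mean_clearance = minor_share * minor_days + major_share * major_days
cleared_within_a_day = minor_share  # only minor wrecks clear in under a day

print(f"Mean clearance time: {mean_clearance:.2f} days")
print(f"Fraction cleared within a day: {cleared_within_a_day:.0%}")
```

So a single 'half-life' figure hides which population of wrecks your one sighting came from.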

One final idea: if you can characterize that strip of road in useful ways, you might be able to find other strips of road with similar characteristics, then make inferences to your own from available data.
posted by googly at 10:45 AM on May 7, 2012 [2 favorites]

The nice thing about detectability modelling is that it takes into account all the factors that googly mentioned. For example, you have a column for each trip (each trip! not just the trips on which you notice an accident!) and then you put in all the variables that describe the situation - weather, visibility, time, etc. - then the model calculates how likely you are to see an accident if it's present. The occupancy models then correct the accident rate depending on how likely you were to see them.
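The shape of that correction can be sketched very simply. This is not Program MARK's actual model -- just a minimal illustration, with invented per-trip detection probabilities, of how scaling by detectability raises the naive rate:

```python
# Minimal sketch of a detection correction (not Program MARK's model).
# Each trip gets an invented probability of spotting a wreck that is
# actually present, based on the conditions on that trip.
trips = [
    {"weather": "clear", "daylight": True,  "p_detect": 0.9},
    {"weather": "rain",  "daylight": True,  "p_detect": 0.6},
    {"weather": "clear", "daylight": False, "p_detect": 0.4},
    {"weather": "rain",  "daylight": False, "p_detect": 0.2},
]
sightings = 1  # wrecks actually seen across these trips

# Effective number of "perfect-detection" trips:
effective_trips = sum(t["p_detect"] for t in trips)
naive_rate = sightings / len(trips)        # per actual trip
corrected_rate = sightings / effective_trips  # per perfect-detection trip

print(f"Naive rate:     {naive_rate:.2f} wrecks seen per trip")
print(f"Corrected rate: {corrected_rate:.2f} wrecks present per trip")
```

The corrected figure is always at least the naive one, since imperfect detection can only hide wrecks, never invent them.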
posted by hydrobatidae at 3:43 PM on May 7, 2012 [1 favorite]
