Can't Get There From Here? Data on distances between places, etc.
January 20, 2020 7:45 AM

(1) The most basic thing I'm looking for is table data for mileage between points. Something like this, but the raw data. I'll take it by ZIP code, but I'd love to have it by census tract or TAZ (traffic analysis zone).

(2) I want to combine the information in #1 with other information I have to assess why people choose certain modes of travel for certain types of trips. I.e., it's basically an ingredient for a transportation modeling exercise. So! If people have other tips or links to public resources on how people travel, those are also very welcome!

E.g., I'll be looking at work trips vs. non-work trips across metropolitan areas to see why, let's say, Chicago's 'commute alone' rate is twice that of NYC. I'll be trying to take into account things like access to transit modes, pedestrian amenities, land use organization, cost to park, etc.

I'm also open to suggestions on false correlations to watch out for.
posted by Reasonably Everything Happens to Computers & Internet (3 answers total) 4 users marked this as a favorite
 
I only have basic knowledge of the program, but I think ArcGIS or a similar open-source alternative would be perfect for this kind of project.
posted by raccoon409 at 7:49 AM on January 20, 2020 [1 favorite]


Best answer: The Google Distance Matrix API should give you this information.
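A minimal sketch of what a Distance Matrix request looks like, assuming the web-service endpoint and its origins, destinations, mode, units, departure_time and key parameters as I understand them (check Google's current documentation; you also need your own billing-enabled API key). The helper distance_matrix_url is hypothetical and only builds the URL; it doesn't send anything. The coordinates are rough, illustrative points.

```python
from urllib.parse import urlencode

BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"


def distance_matrix_url(origins, destinations, mode="driving",
                        api_key="YOUR_API_KEY", departure_time=None):
    """Build (but do not send) a Distance Matrix request URL.

    origins/destinations: lists of (lat, lng) tuples, joined with '|'.
    departure_time: optional Unix timestamp (int) or "now"; this matters
    for traffic-aware driving times and for transit.
    """
    params = {
        "origins": "|".join(f"{lat},{lng}" for lat, lng in origins),
        "destinations": "|".join(f"{lat},{lng}" for lat, lng in destinations),
        "mode": mode,
        "units": "imperial",
        "key": api_key,
    }
    if departure_time is not None:
        params["departure_time"] = departure_time
    # urlencode percent-encodes ',' and '|' so the URL is safe to send as-is.
    return f"{BASE_URL}?{urlencode(params)}"


# Rough, illustrative coordinates only.
url = distance_matrix_url(
    origins=[(41.97, -87.91)],
    destinations=[(41.88, -87.62), (41.81, -87.87)],
    mode="transit",
    departure_time="now",
)
print(url)
```

One request returns a small origins-by-destinations grid, so you'd loop over batches of points to fill a full table.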
posted by rdr at 8:09 AM on January 20, 2020


Best answer: Okay, so there's distance and then there's distance. This can get complicated quickly, so consider carefully what you actually need. It can start out quick and dirty and turn into a multiyear project if you do it right; I do the multiyear-project approach for governments myself. Travel demand modelling is an entire (small) field; I've been doing it for 20 years. (Feel free to memail me if you have more specific questions, btw.)

The easiest approach is to use the haversine formula to calculate the straight-line (great-circle) distance from latitudes and longitudes; this may be enough for your purposes in a lot of situations. (A related measure is Manhattan distance, which is the north-south distance plus the east-west distance, as if all trips were made on a grid.) It's a wild simplification, but it can be done in Excel if needed, and it's where I'd start.
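Both measures are short enough to sketch. This assumes a mean Earth radius of 3,958.8 miles and approximates the Manhattan version on a locally flat Earth, scaling the east-west leg by the cosine of the mid-latitude. The zone names and centroids are hypothetical stand-ins for whatever ZIP or tract centroids you compute yourself; the output is a long-format origin-destination mileage table like the one asked for above.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle (crow-flies) distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.0.
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(min(1.0, a)))


def manhattan_miles(lat1, lon1, lat2, lon2):
    """North-south plus east-west distance (flat-Earth approximation)."""
    mid_lat = math.radians((lat1 + lat2) / 2)
    ns = EARTH_RADIUS_MI * abs(math.radians(lat2 - lat1))
    ew = EARTH_RADIUS_MI * math.cos(mid_lat) * abs(math.radians(lon2 - lon1))
    return ns + ew


# Hypothetical zone centroids (lat, lon).
CENTROIDS = {
    "zone_1": (41.88, -87.63),
    "zone_2": (41.97, -87.91),
    "zone_3": (41.82, -87.87),
}


def od_table(centroids):
    """Rows of (origin, destination, straight-line mi, Manhattan mi)."""
    rows = []
    for o, (lat1, lon1) in centroids.items():
        for d, (lat2, lon2) in centroids.items():
            if o != d:
                rows.append((o, d,
                             haversine_miles(lat1, lon1, lat2, lon2),
                             manhattan_miles(lat1, lon1, lat2, lon2)))
    return rows


for row in od_table(CENTROIDS):
    print("%s -> %s: %.1f mi straight, %.1f mi Manhattan" % row)
```

The same two formulas translate directly into Excel cell formulas if you'd rather stay in a spreadsheet.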

The second easiest is network distance, i.e., the distance traveled along a road network. This accounts for the fact that the road network is not perfectly connected, which can be important at the microscale for pedestrian and transit access, where a cul-de-sac or a freeway can really hamper access. It's also important where the straight line is not a possible path: in metros like San Francisco or New York, you often can't drive anything close to a straight line between two points, because there are bays in the way and only a limited number of bridges (unlike Chicago or Phoenix).
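To make the straight-line vs. network distinction concrete, here's a toy sketch with a hypothetical three-node network (flat x/y coordinates in miles, not real geography): two points sit on opposite shores of a "bay," and the only connection runs through a bridge node. It runs Dijkstra's shortest-path algorithm over link lengths and reports the ratio of network to straight-line distance, often called circuity.

```python
import heapq
import math

# Hypothetical toy network: node -> (x, y) in miles on a flat plane.
# A and B face each other across a bay; the only crossing goes via C.
NODES = {"A": (0.0, 0.0), "B": (2.0, 0.0), "C": (1.0, 3.0)}
EDGES = [("A", "C"), ("C", "B")]  # undirected road links


def straight_line(u, v):
    """Euclidean (crow-flies) distance between two nodes."""
    (x1, y1), (x2, y2) = NODES[u], NODES[v]
    return math.hypot(x2 - x1, y2 - y1)


def build_graph(edges):
    """Adjacency list; each link's weight is its straight-line length."""
    graph = {n: [] for n in NODES}
    for u, v in edges:
        w = straight_line(u, v)
        graph[u].append((v, w))
        graph[v].append((u, w))
    return graph


def network_distance(graph, source, target):
    """Dijkstra shortest-path length; math.inf if target is unreachable."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, math.inf):
            continue  # stale heap entry
        for nbr, w in graph[node]:
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return math.inf


graph = build_graph(EDGES)
net = network_distance(graph, "A", "B")
crow = straight_line("A", "B")
print(f"network {net:.2f} mi, straight line {crow:.2f} mi, circuity {net / crow:.2f}")
```

For this layout the network distance is about 6.32 miles against 2 miles straight-line, a circuity of about 3.16. Real street networks have thousands of nodes, which is where GIS tools or a library come in.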

Both ArcGIS and QGIS have network plugins, and you can download OpenStreetMap networks, but I've always had a hard time with these. One key problem with them (more for cars than pedestrians) is that the shortest route isn't necessarily the most commonly taken route, because of higher-speed roads like freeways. Geoff Boeing has a really good Python library called OSMnx that does a ton of great network analysis aimed at the pedestrian scale; it's also available in a Docker container, which I found easier to get working.
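A toy illustration of why the shortest route isn't necessarily the one people take: two hypothetical routes between the same pair of points, each written as (miles, mph) segments. The speeds are made-up free-flow assumptions with no congestion.

```python
# Hypothetical routes between the same origin and destination.
# Each route is a list of (segment_miles, assumed_free_flow_mph) tuples.
ROUTES = {
    "surface streets": [(6.0, 25.0)],
    # Local streets to and from the freeway at each end, freeway in between.
    "via freeway": [(1.0, 25.0), (8.0, 60.0), (1.0, 25.0)],
}


def route_miles(segments):
    """Total route length in miles."""
    return sum(miles for miles, _ in segments)


def route_minutes(segments):
    """Total free-flow travel time in minutes."""
    return sum(miles / mph * 60 for miles, mph in segments)


shortest = min(ROUTES, key=lambda name: route_miles(ROUTES[name]))
fastest = min(ROUTES, key=lambda name: route_minutes(ROUTES[name]))

for name, segs in ROUTES.items():
    print(f"{name}: {route_miles(segs):.1f} mi, {route_minutes(segs):.1f} min")
print(f"shortest: {shortest}; fastest: {fastest}")
```

Here the freeway route is 10 miles against 6 but takes 12.8 minutes against 14.4, so a shortest-distance measure would send drivers somewhere most of them wouldn't go.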

The best measure is actually travel time. For one, it's what people consider the most (how often have you heard someone say they live 10 minutes from somewhere, by implication a 10-minute drive?), and time puts different modes on an equal footing. Time is by far the most meaningful way to look at transit, since physical distance isn't always well correlated with travel time. For instance, from the California Ave Green Line station on Chicago's Near West Side to the Morgan station on the same line, it's 2.3 miles and 8 minutes by train. From the same starting point to the corner of Roosevelt and Damen, also 2.3 miles away, it takes 25 minutes and you have to switch between two different buses. Time of day matters too, since transit service can change substantially over the day. That's less of an issue in NYC and Chicago (until you get to commuter rail), but in a lot of suburban contexts in the US there are hourly buses, and it's important to be able to account for that; the simplest way is to consider a number of potential departure times (which can affect connections as well). I've had success in the past using OpenTripPlanner to combine the OpenStreetMap network with agency-specific GTFS transit timetables.
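A rough sketch of the "several departure times" idea, assuming a single direct bus with no transfers or walking time and two hypothetical schedules (hourly vs. every 10 minutes): it samples departure times every 5 minutes across the 7 AM hour and averages waiting plus in-vehicle time. OpenTripPlanner does this properly against real GTFS timetables; this only shows why the headway matters so much.

```python
import bisect
from statistics import mean


def next_departure(departures, t):
    """Earliest scheduled departure at or after t (minutes after midnight), or None."""
    i = bisect.bisect_left(departures, t)
    return departures[i] if i < len(departures) else None


def trip_minutes(departures, t, ride):
    """Wait for the next bus plus in-vehicle time; None if no bus remains."""
    dep = next_departure(departures, t)
    if dep is None:
        return None
    return (dep - t) + ride


def average_trip(departures, ride, start, end, step):
    """Average trip time over departure times start, start+step, ... < end."""
    times = [trip_minutes(departures, t, ride) for t in range(start, end, step)]
    return mean(x for x in times if x is not None)


# Hypothetical schedules from 6:00 to 22:00, in minutes after midnight.
hourly = list(range(6 * 60, 22 * 60 + 1, 60))
frequent = list(range(6 * 60, 22 * 60 + 1, 10))
RIDE = 20  # assumed in-vehicle minutes

avg_hourly = average_trip(hourly, RIDE, 7 * 60, 8 * 60, 5)
avg_frequent = average_trip(frequent, RIDE, 7 * 60, 8 * 60, 5)
print(f"hourly: {avg_hourly:.1f} min, every 10 min: {avg_frequent:.1f} min")
```

The same 20-minute ride averages 47.5 minutes on the hourly bus and 22.5 on the frequent one. That gap disappears if you only query a single, well-timed departure.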

Travel time also matters for car trips, where there are faster and slower routes because of freeways or other faster roads, and where traffic becomes an issue. For instance, querying now, it takes 35 minutes to drive from O'Hare to the Art Institute of Chicago downtown, but only 30 minutes in the reverse direction, over essentially the same distance. La Grange is about the same distance from O'Hare, but it's only 23 minutes by car. (And that's on a holiday around noon; congestion would be higher on a weekday: Google suggests it would take between one and two hours to get downtown on a Monday at 7:30.)

There are specialized software packages, like EMME, Cube, TransCAD and MATSim, that can simulate roadway congestion, but then you need all the trips on the roads to create the congestion, and this becomes a multi-year project.
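For a sense of how congestion enters these models: packages like these typically turn traffic volume on a link into travel time with a volume-delay function, and one widely used form is the Bureau of Public Roads (BPR) curve, t = t0 * (1 + alpha * (v / c) ** beta). The alpha = 0.15 and beta = 4 below are the classic textbook defaults and the link is hypothetical; real models calibrate these per facility type.

```python
def bpr_travel_time(free_flow_min, volume, capacity, alpha=0.15, beta=4.0):
    """BPR volume-delay function: congested link travel time in minutes.

    free_flow_min: travel time on the link with no traffic
    volume, capacity: vehicles per hour on the link / that it can carry
    alpha, beta: classic BPR defaults, not calibrated values
    """
    return free_flow_min * (1 + alpha * (volume / capacity) ** beta)


# Hypothetical link: 10 minutes at free flow, capacity 2,000 vehicles/hour.
for vol in (1_000, 2_000, 2_500):
    print(f"{vol:>5} veh/h -> {bpr_travel_time(10, vol, 2_000):.1f} min")
```

This prints roughly 10.1, 11.5 and 13.7 minutes. The formula is the easy part; the volume on every link only comes from assigning all the region's trips to the network, which is what makes it a multi-year effort.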

If you're only interested in the present day, there are APIs you can use to get this information, Google's for one, though there are others (I haven't used them personally). They're useful if you have on the order of hundreds of points, but they can quickly get expensive as you pass thousands of points.
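Why the cost climbs so fast: a full origin-destination matrix for n points has n * (n - 1) pairs, so the number of elements grows with the square of the number of points. The sketch below counts pairs and splits the matrix into square request tiles. The 10 x 10 block size is an assumption standing in for whatever per-request limit your provider imposes, and no pricing is assumed.

```python
import math

BLOCK = 10  # assumed max origins (and destinations) per request; check your provider


def od_pairs(n_points, include_self=False):
    """Number of origin-destination elements in a full n x n matrix."""
    return n_points * n_points if include_self else n_points * (n_points - 1)


def tiles(n_points, block=BLOCK):
    """Yield (origin_indices, destination_indices) blocks covering the matrix.

    Tiles include the diagonal (self-pairs), since a grid request computes them anyway.
    """
    for o in range(0, n_points, block):
        for d in range(0, n_points, block):
            yield (range(o, min(o + block, n_points)),
                   range(d, min(d + block, n_points)))


for n in (100, 1_000, 5_000):
    n_requests = math.ceil(n / BLOCK) ** 2
    print(f"{n:>5} points: {od_pairs(n):>11,} pairs, {n_requests:>7,} requests")
```

Going from 100 to 1,000 points multiplies the pairs by roughly 100, not 10, which is the jump from "hundreds" to "expensive."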

I'm not sure what data you're thinking of using as input in terms of trips, but the NHTS (National Household Travel Survey) data already has great-circle distance calculated as a field, so you might want to start with that.
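On the false-correlation question: a metro whose trips are longer on average can look more car-dependent even if people choose similarly at any given distance, so it can help to compare mode shares within distance bands. A minimal sketch with made-up trip records; the (miles, mode) tuples and band edges are assumptions rather than NHTS field names, and a real analysis would also apply the survey's trip weights.

```python
from collections import Counter, defaultdict

# Hypothetical trip records: (trip distance in miles, mode).
TRIPS = [
    (0.4, "walk"), (0.8, "walk"),
    (1.5, "drive"), (2.0, "transit"), (3.5, "drive"),
    (6.0, "drive"), (8.0, "transit"), (12.0, "drive"),
]

# Assumed distance bands: (lower bound inclusive, upper bound exclusive, label).
BANDS = [(0.0, 1.0, "<1 mi"), (1.0, 5.0, "1-5 mi"), (5.0, float("inf"), "5+ mi")]


def band_of(miles):
    """Label of the distance band a trip falls into."""
    for lo, hi, label in BANDS:
        if lo <= miles < hi:
            return label
    raise ValueError(f"distance out of range: {miles}")


def mode_share_by_band(trips):
    """Unweighted share of trips by mode within each distance band."""
    counts = defaultdict(Counter)
    for miles, mode in trips:
        counts[band_of(miles)][mode] += 1
    return {
        band: {mode: n / sum(c.values()) for mode, n in c.items()}
        for band, c in counts.items()
    }


for band, shares in mode_share_by_band(TRIPS).items():
    print(band, {mode: round(s, 2) for mode, s in shares.items()})
```

Running the same banding for each metro lets you see whether a difference in drive-alone share holds up at comparable trip lengths.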
posted by Homeboy Trouble at 10:31 AM on January 20, 2020 [13 favorites]

