How do I optimize the shortest route between all 48 continental U.S. states?
July 5, 2012 6:16 AM   Subscribe

How do I optimize the shortest route between all 48 continental U.S. states?

These guys did it in 6850 miles. This guy did it in 6500 miles, but his route is apparently a trade secret. This guy continued to Hyder, Alaska to pick up a 49th state, but went for speed, not for shortest distance.

How would I go about finding the absolute shortest distance between all 48 states? Do I need to use trial and error or is there an algorithm that will help?
posted by moammargaret to Travel & Transportation (15 answers total) 8 users marked this as a favorite
This is called The Traveling Salesman Problem. It's a biggie in computer science, as it's very complex and difficult to program. The algorithms in the article really only apply to straight line distances, though. I don't know how you'd incorporate highways.
posted by nickhb at 6:25 AM on July 5, 2012

Yep, I meant shortest road routes, not as the crow flies.
posted by moammargaret at 6:26 AM on July 5, 2012

As nickhb said this is a very, very hard problem to solve. You can check out the Heuristics section of the Wikipedia article to figure out the ways to go about solving it.

To add another wrinkle, the Traveling Salesman assumes you're visiting point cities, but in order to figure out the best route between the states you need to pick a point in each state to visit. For some, like Washington, the answer is almost certainly the southeastern-most corner, but for others it's not as clear.

Either way it's a very interesting problem so please post back if you figure any of this stuff out.
posted by Aizkolari at 6:44 AM on July 5, 2012

You would want a "weighted" version of the problem, where each "edge" of the graph has a weight based upon travel time. But the versions of the TSP I'm familiar with are based upon traveling between fixed points in space whereas you're looking at traveling between regions, e.g. you can just sweep through the Four Corners rather than having to travel to the center of each state.

Alternatively, you could take a non-optimized route through the states but collect a piece of each one as you pass through, a stone or a few pounds of soil, and place them near each other at the end to make for arbitrarily short travel times.
posted by XMLicious at 6:52 AM on July 5, 2012

Theory aside, since TFA talks about going 3 feet past the boarder and then turning around as a legit tactic, makes me think there's more to being canny about your starting point and end point could be one great way to shave miles off your trip.

So you'd want to aim to hit corner points/junctions where multiple states come together, or are all in close proximity (start in four corners, say).
posted by k5.user at 6:54 AM on July 5, 2012

Short answer, which you can get from reading the long Wikipedia entry above, is that it's computationally infeasible to compute the exact shortest route with a home computer when you're looking at thousands of cities. Technically what you have is the "generalized" version, but they're equivalently difficult to compute.

However, there are algorithms that will get pretty good answers in much, much less time. But what you need and won't get without paying some money or effort is the matrix of routes between cities. Routes are basically represented as nodes on a graph (the matrix is equivalent) and there are a lot of them. It's actually quite difficult to take the huge set (millions) of entries like: "Heading north on Pine and Main St. you can get to Pine and 1st St. in a .05 mile segment," and produce coherent routes. Trying to run even an approximation of TSP on a set with all the millions of individual road segments would not work without serious computational horsepower, and maybe not even then.

What you need to compute this is a route calculator between a list of identified locations (e.g. cities, but not necessarily). Pick 10 location in each state (not necessarily the biggest, probably mostly ones on the border). Then use the Google Maps API (which you can query 25,000 times a day without having to pay, so it would take 10 days worth of queries) to calculate the distance between those. Probably you could restrict it to states that are adjacent, so you wouldn't need all 250,000 distances.

Then feed that into a TSP approximator. I don't know if there is some off-the-shelf software that you could use, though I note one of the Wikipedia references is a JavaScript site that will optimize a route though any list of locations.
posted by wnissen at 7:01 AM on July 5, 2012 [1 favorite]

All of the examples you listed are also trying to minimize time. I suspect shortest-by-distance will take a much longer time than shortest-by-time, since you're not tied to major roads and highways don't necessarily provide lots of optimization compared to smaller roads that might cut miles off your path.
posted by rmd1023 at 7:17 AM on July 5, 2012

Are there any constraints on your problem? For example, are you able to backtrack along a road you've already traveled and can you visit a state more than once?
posted by TheCavorter at 7:45 AM on July 5, 2012

Trial and error. There's no algorithm, because you're not dealing with idealized topography but actual freaking roads. For example, there are precious few ways of getting around in much of the Rockies, and the fact that a given destination is ten miles as the crow flies may not change the fact that the shortest road distance is several times that. Which might encourage you to take a different route entirely.

Likewise, there are interstates galore, but while using them may minimize your time, they very frequently don't minimize your distance. I routinely drive from Fort Wayne to South Bend, Indiana. As the crow flies, it's a little more than 80, and there are roads that will take me pretty much straight there. But that takes upwards two hours. But if I take Interstates 69 and 80, I can get there in 90 minutes, even though it's more like 120 miles, because I can do 75mph the whole way. Which of those routes you take will depend on whether you want to do a straight time trial or are really keen on minimizing your distance.

But I'm thinking that the result really has to be some kind of spiral. Otherwise you need to go north and south too many times to make it feasible. Start somewhere in the middle and just spiral out from there.
posted by valkyryn at 7:48 AM on July 5, 2012

All I know is that you either start in Maine or you finish there. The rest is extremely complicated, but I must say that the 6,500 figure is rather impressive since it is generally about 3-3,500 miles between the coasts. I'm at a loss to trace even a direct line (to say nothing of winding roads) across the states that could be that short.
posted by dgran at 8:01 AM on July 5, 2012

A route that prioritizes time rather than distance seems like it would be way too difficult to optimize, since you'd have to factor in data on road type and speed limits, etc. (It's also extremely unsafe; I would not want to be the guy in the left lane approached by someone who hasn't slept for 96 hours.)

Not repeating any states would be an interesting constraint; you would, of course, have to start in Maine which would limit your options considerably. The guy who did it in 6500 miles used White River Junction, Vt. as a terminus which was also my pick. Then it seems self-evident that you move down the coast, head west in Maryland and hit the tips of states in Appalachia before nicking the corner of Florida. Then it's up for grabs; the midwestern states only really work as two east-west segments, and the Pacific states are a real problem.
posted by moammargaret at 8:11 AM on July 5, 2012

As others have mentioned above, this is a complex graph problem with nontrivial obstacles to solving it accurately. And I agree with wnissen that to solve it you'll need to get your hands on some kind of GIS API for the road connectivity information to base your graph on.

To be pedantic, though, your problem is *not* quite the TSP problem, and you should be aware of that when trying to use TSP solvers to get your solution.

Let me use an example to make the rest of this comment clearer: suppose you wanted to visit 26 cities in the United States. First you would take the GIS/road data and use it to construct a graph of your problem with nodes labelled A-Z representing your cities, and edges connecting each pair of nodes, representing the "best" route to take if traveling directly between those two particular nodes/cities. Now for a brief tour:

The Travelling Salesman Problem (TSP) would look at your graph and attempt to find the shortest route that starts in one city/node, visits all of the other cities exactly once, and then ends up at the city where you started. A variant called the Open TSP allows for having different start and finish cities.

The Generalized Travelling Salesman Problem (aka Covering Salesman Problem) would attempt to find the same thing, except that you don't have to visit a particular city if it's within X distance of one that you did visit. The ones that are visited are still only visited once at most. Solvers for gTSP greatly concern themselves with figuring out which cities would reduce the overall route the most if left out.

Don't be confused by something called the Chinese Postman Problem - that one tries to find the shortest route that travels down every road (at most once).

What you want is a rarely dicussed variant called the Travelling Salesman Walk Problem (TSW), where the goal is to visit each city at least once, and possibly more if it leads to a lower overall travel cost. A good paper on this variant is "Travelling salesman path problems", by Fumei Lam and Alantha Newman, January 2008. It's not terribly accessible for building algorithms, though.

The reason why computer scientists study TSP is that of all graph routing problems, it's the one with the most restrictions and is the most difficult (time consuming) one to compute. The good news is, relaxing any of the constraints of TSP results in a problem that's easier to solve than TSP itself.

The important detail here is to carefully construct your problem so that it matches the algorithm that you use to solve it. Don't blindly throw your walk graph at a TSP solver unless you really want a TSP solution (to use a TSP solver, first augment your graph to represent a TSP problem, which is beyond the scope of this comment). Your best bet is to think of practical ways to reduce the problem, and then use a custom search algorithm to get (successively better approximations for) your TSW solution.

Hope this helps.
posted by ceribus peribus at 8:28 AM on July 5, 2012 [6 favorites]

Stephen Fry did this (sorta?) for a series of documentaries.

His route (for the documentaries, I'm sure they tweaked this for production) went down the east coast starting in Maine, across the Deep South -- with a detour all the way down to Miami which is not necessary for your purposes -- to New Orleans, then up the Mississippi River to Minnesota, down to the Southwest through the Dakotas, Montana, Nebraska, etc, then back up the west coast to Seattle.

The doc series is worth watching even if you don't use the same route.
posted by Sara C. at 8:52 AM on July 5, 2012

Simulated Annealing is probably the methodology you want to use, but it won't give you the absolute shortest possible solution.
posted by rocket88 at 11:59 AM on July 5, 2012

Ooh, bad news on the Google front. Not only are you limited to 2,500 requests for directions per day, but "using Directions data without displaying a map for which directions data was requested is prohibited." It doesn't matter if you are willing to pay. Obviously Google has signed strict contracts with its GIS data providers not to allow people to perform optimal route calculations of the type you're trying to do. That's especially unfortunate as Google has a really nice API that you can call from Javascript without having to know actual addresses, just cities, or even GPS points.

Ceribus Peribus is right that TSP is not exactly what you want, oops. Having taken a graduate Algorithms class, it seems likely that you could algorithmically convert one to the other, though possibly at the cost of increased graph complexity (and therefore runtime). However, it also, again just off the top of my head, removing the constraint that you can't re-visit a node actually makes the problem easier. This is a really interesting problem, with a real practical application, thanks for posting it.
posted by wnissen at 7:22 AM on July 6, 2012

« Older 頭が真っ白!   |   Insane Kettle Grilling? Newer »
This thread is closed to new comments.