How do I turn a large list of locations into small, close groups?
December 3, 2009 2:13 PM   Subscribe

Given a raw list of several hundred addresses, is there an way to divide them into smaller zones based on proximity? No, nothing to do with the DARPA challenge. It's a Santa thing.

My problem is this. I have around 800 packages to deliver around the city in a day, and about 50 people to deliver them. I'm going to assign each person to a zone containing 10-20 addresses. Traditionally, this would mean plotting each address on a map, and just eyeballing it to find clusters. It's a lot of manual work, and I'm wondering if there's any way to automate it.

After I have the zones defined, I can just dump the group of addresses into a Travelling Salesman solver to produce maps and routes, but I'm wondering if there's also a tool somewhere that would find the clustered zones for me. It doesn't need to be optimal at all. If there's a way to import all the addresses into a map, then selectively export a group, that would be an improvement. It's really just all the manual typing and cutting and pasting of each location that is so time consuming.

I guess that if I at least had something that could automatically produce the distance between each node for me, I could figure out how to calculate the groupings. It just seems like the kind of problem that someone else has already solved, and I just need to find it. Thanks!
posted by solo to Technology (12 answers total)
 
MS MapPoint might be able to help you with this.

Alternatively, Gooogle Docs spreadsheet plus a map plugin should do the trick.
posted by Wild_Eep at 2:16 PM on December 3, 2009


Can you get decimal latitude/longitude coordinates for each of the addresses?

If you can, then feeding this data into a K-means clusterer might be a good, relatively simple start. A search for "k-means excel" turned up a few things that looked promising if you use Excel.
posted by tss at 2:30 PM on December 3, 2009


I'm assuming you can geocode these.

geoda, a free GIS tool could help. It can do some clustering analysis.

There is also QGIS, which enables more advanced GIS analysis (free). Or, if you have access to ArcGIS you could do a few more things (not free).

In QGIS or ArcGIS, you could buffer all of the points with a certain radius, and see what blobs form. In ArcGIS you could make a density raster based on the points, and use that. With all of them you can drag in street maps underneath.
posted by a womble is an active kind of sloth at 2:58 PM on December 3, 2009


And if you need it, a very useful web geocoder
posted by a womble is an active kind of sloth at 3:04 PM on December 3, 2009


What you're looking at is the "multiple traveling salesman problem". You can approach it by clustering and then solving multiple single-traveling-salesman instances, but I think you get better results from an actual mTSP solver — it can assign points to one group or another based on the TSP routes it's considering.
posted by hattifattener at 4:01 PM on December 3, 2009


Response by poster: Do you know where I can find an mTSP solver?
posted by solo at 4:15 PM on December 3, 2009


you want carrier route sortation. IDK if you can get a trial copy free?
posted by toodleydoodley at 5:33 PM on December 3, 2009


If you make the clusters, you could use NetworkX to analyze it. Similar question here.
posted by a womble is an active kind of sloth at 6:03 PM on December 3, 2009


If you have access to Mathematica the relevant functions are FindClusters and FindShortestTour, and probably Import.

It doesn't solve the fully optimal multiple traveling salesman problem mentioned by hattifattener. But the results seem good and it sure is fun! I tested it out with 5 clustered tours of some cities in Australia.
posted by hAndrew at 11:45 PM on December 3, 2009


Oh this doesn't solve the problem of converting a list of addresses into a cost matrix... Still fun.
posted by hAndrew at 12:15 AM on December 4, 2009


Response by poster: Just to follow-up, I ended up doing it by hand, with the help of this Excel workbook.

I was doing this on a volunteer basis for a charity, so I was hoping to find a solution I could hand off to them for future use. Meaning I tried to avoid software you have to pay for (they already have Excel). I tried to use QGIS, but the learning curve was too steep for the time frame. I found there are some MTSP solvers for Matlab, but again I balked at the learning curve, and they take the cost matrix as input, so I had to still solve that problem.

I only had two days lead, so I couldn't spend a lot of time studying possible solutions. Just had to hunker down and get it done. But thanks for all the tips. Maybe next year I'll be able to sort through what I've learned here and have a better method.
posted by solo at 11:14 AM on December 10, 2009


I assume you could generate the cost matrix by querying the google maps api or the like. You'd definitely need some algorithm that lets you provide cost estimates or bounds or something, though, so that you don't have to query Google for all 8002=640000 entries of that matrix… that is, start with an as-the-crow-flies estimate, refine as needed.
posted by hattifattener at 6:16 PM on December 10, 2009


« Older Is my apartment company liable for my shorted-out...   |   Meaning of a sign in Tillamook OR? Newer »
This thread is closed to new comments.