Finding closest address pairs from a list
April 17, 2010 1:24 PM Subscribe
I have a list of several hundred complete U.S. street addresses. For each record, I need to identify another record with the nearest address (in road distance), and its distance from the subject record’s address. Is there a free, easy way to accomplish this?
For example, if the list is of the form: Record ID, Street, City, ST, Zip:
1001,123 Main St, Anytown, NY, 12345
1002, 44 E Broadway, Random City, NJ, 07654
1003, 404 S Washington, Smallburg, VA 23456
I need to produce a results list with (at least): ID, nearest ID, road miles:
1001, 1766, 44.9 miles
1002, 3105, 8.6 miles
1003, 2094, 70.1 miles
But I have no GIS tools or skills. In a magic world where all wishes come true, there is a web service where I could upload the file and get back the results list, for free! Or a Google Maps add-on. Or maybe there is some user-friendly trialware I can download and use (without programming skills). Any ideas?
For example, if the list is of the form: Record ID, Street, City, ST, Zip:
1001,123 Main St, Anytown, NY, 12345
1002, 44 E Broadway, Random City, NJ, 07654
1003, 404 S Washington, Smallburg, VA 23456
I need to produce a results list with (at least): ID, nearest ID, road miles:
1001, 1766, 44.9 miles
1002, 3105, 8.6 miles
1003, 2094, 70.1 miles
But I have no GIS tools or skills. In a magic world where all wishes come true, there is a web service where I could upload the file and get back the results list, for free! Or a Google Maps add-on. Or maybe there is some user-friendly trialware I can download and use (without programming skills). Any ideas?
You should be able to retrieve the lat/lon from Google maps for each address. Add columns for these values, store them, and then you can use pretty simple math to determine which record's lat/lon is closest to the current.
(There's a little distortion thrown in by the fact that the distance equivalent of a degree of longitude is variable and dependent upon lattitude, but the distance equivalent of a degree of lattitude is fixed. You can decide whether your addresses are far enough apart to worry about this.)
The retrieval and storage of lat/lon pairs, correction for lattitude, and comparison of lat/lon pairs to determine closest address should all be fairly simple to script, if you have some programming skill. If you don't have programming skill, it could still be done with cheap labor, but the comparison step really ought to be computerized.
posted by richyoung at 1:37 PM on April 17, 2010
(There's a little distortion thrown in by the fact that the distance equivalent of a degree of longitude is variable and dependent upon lattitude, but the distance equivalent of a degree of lattitude is fixed. You can decide whether your addresses are far enough apart to worry about this.)
The retrieval and storage of lat/lon pairs, correction for lattitude, and comparison of lat/lon pairs to determine closest address should all be fairly simple to script, if you have some programming skill. If you don't have programming skill, it could still be done with cheap labor, but the comparison step really ought to be computerized.
posted by richyoung at 1:37 PM on April 17, 2010
Best answer: There is a free geocoding service BatchGeoCode. A free GIS software is QGIS that is cross platform and easy to use.
I think the best approach would be to use BatchGeoCode, and then calculate a straight line distance using algebra for the lat/long (you could do this in excel). To do it using the actual road, you would need a road network (which you can freely get) but then you need to make a network and calculate an Origin-Destination matrix which is not an easy task.
posted by a womble is an active kind of sloth at 1:39 PM on April 17, 2010 [1 favorite]
I think the best approach would be to use BatchGeoCode, and then calculate a straight line distance using algebra for the lat/long (you could do this in excel). To do it using the actual road, you would need a road network (which you can freely get) but then you need to make a network and calculate an Origin-Destination matrix which is not an easy task.
posted by a womble is an active kind of sloth at 1:39 PM on April 17, 2010 [1 favorite]
Ah, I believe Snerd wants _road distance_, which is rather different from distance in the plane, which is in turn different from distance on a globe.
It sounds like a job for Google Earth or Google Maps. Road distance is a Hard problem, and google seems to have solved it as well as anyone could be expected to. If you can get together some sort of script that gets the distance between two entries automagically, then you would just have to iterate the script over your dataset, and you'de be done.
posted by kaibutsu at 1:41 PM on April 17, 2010 [1 favorite]
It sounds like a job for Google Earth or Google Maps. Road distance is a Hard problem, and google seems to have solved it as well as anyone could be expected to. If you can get together some sort of script that gets the distance between two entries automagically, then you would just have to iterate the script over your dataset, and you'de be done.
posted by kaibutsu at 1:41 PM on April 17, 2010 [1 favorite]
It's doable. You will need coding skills to go through a list, pick pairs of points, query google maps, and update a second table (your distance matrix).
But I have no GIS tools or skills.
Then no, you cannot do this yourself. Post this as a job on a site scriptlance.com and pay someone to do it.
posted by special-k at 1:51 PM on April 17, 2010 [1 favorite]
But I have no GIS tools or skills.
Then no, you cannot do this yourself. Post this as a job on a site scriptlance.com and pay someone to do it.
posted by special-k at 1:51 PM on April 17, 2010 [1 favorite]
It's not difficult to do for a programmer. The GMaps link is easily generated for each item in your dataset, downloaded and parsed to extract the road distance, hence generating a table of every address distance to every other address.
How are you looking to do this? Online, or desktop?
As a crutch, you can use a great circle distance calculator or algorithm to determine 'as the crow flies' distance using latitude/longitude.
Actually, this sort of interests me for my own purposes where I'm using the 'as the crow flies'. It would be more useful to make more use of Google for the distance by road. If no-one pops-in with an existing solution, get in touch.
posted by hungrysquirrels at 1:55 PM on April 17, 2010
How are you looking to do this? Online, or desktop?
As a crutch, you can use a great circle distance calculator or algorithm to determine 'as the crow flies' distance using latitude/longitude.
Actually, this sort of interests me for my own purposes where I'm using the 'as the crow flies'. It would be more useful to make more use of Google for the distance by road. If no-one pops-in with an existing solution, get in touch.
posted by hungrysquirrels at 1:55 PM on April 17, 2010
Best answer: I think you've just described the Nearest Neighbor Problem, which is astonishingly computationally expensive to work out. Unless you use some clever spatial index, you'll need to calculate N2 cartesian distances for N points - but since I now see you want road distances, you've just added a whole level of complexity.
You could script Google Maps and scrape the result, but I'm sure they'd be none too pleased with tens of thousands of queries coming from your machine in quick succession.
posted by scruss at 2:03 PM on April 17, 2010
You could script Google Maps and scrape the result, but I'm sure they'd be none too pleased with tens of thousands of queries coming from your machine in quick succession.
posted by scruss at 2:03 PM on April 17, 2010
It's not free, but you can do all of this in ArcGIS. I didn't mention it in my earlier answer as it is very expensive to purchase, but there are several wizards that help with this process. No programming skill is required, but ArcGIS is a behemoth of a program so you probably need someone with experience to do this for you.
posted by a womble is an active kind of sloth at 2:16 PM on April 17, 2010
posted by a womble is an active kind of sloth at 2:16 PM on April 17, 2010
google "routing software", there are a number of solutions for this.
posted by HuronBob at 2:21 PM on April 17, 2010
posted by HuronBob at 2:21 PM on April 17, 2010
Response by poster: Thanks, everyone! Each answer so far is helpful. BatchGeo will take me a long way toward a solution. This is a one-time project, so investing in a GIS or routing software isn't practical.
I neglected to include in my question that if no neighbor is within X miles "as the crow flies", that's all I need to know for that specific address. I can reduce the number of comparisons needed by splitting the list up into groups of adjacent states, and calculate distances of each combination within each group. Looks like road miles won't be practical for me to determine.
posted by Snerd at 2:45 PM on April 17, 2010
I neglected to include in my question that if no neighbor is within X miles "as the crow flies", that's all I need to know for that specific address. I can reduce the number of comparisons needed by splitting the list up into groups of adjacent states, and calculate distances of each combination within each group. Looks like road miles won't be practical for me to determine.
posted by Snerd at 2:45 PM on April 17, 2010
If it's several hundred, you could do road-distance semi-manually in Google Maps.
posted by zippy at 3:18 PM on April 17, 2010
posted by zippy at 3:18 PM on April 17, 2010
Without programming skills, you're not likely to find a free way to do what you want. I have puny programming skills, but I was able to bang something together in Processing that did something similar.
If your timeframe permits it, I would submit that this would be the perfect problem for you to learn some programming skills with. These days, the barrier to entry is very very low, and if you're the type of person who regularly has problems like this in his/her life, you will benefit hugely from some simple programming skills. A bit of Processing and the Google Maps API will get you exactly what you want. The documentation is extensive and accesible to anyone with some basic intelligence and patience.
I'd say it would be a month or so before you could get all the pieces in place, with a little help from the internet. But it would be a good investment.
posted by Casimir at 4:07 PM on April 17, 2010
If your timeframe permits it, I would submit that this would be the perfect problem for you to learn some programming skills with. These days, the barrier to entry is very very low, and if you're the type of person who regularly has problems like this in his/her life, you will benefit hugely from some simple programming skills. A bit of Processing and the Google Maps API will get you exactly what you want. The documentation is extensive and accesible to anyone with some basic intelligence and patience.
I'd say it would be a month or so before you could get all the pieces in place, with a little help from the internet. But it would be a good investment.
posted by Casimir at 4:07 PM on April 17, 2010
Congratulations -- you've just requested a solution to a variation on one of the most famous unsolved problems in computing.
It's not quite as bad as the TSP, but it's still going to be a bear to compute all of the possibilities if you've got a long list.
posted by schmod at 7:34 PM on April 17, 2010
It's not quite as bad as the TSP, but it's still going to be a bear to compute all of the possibilities if you've got a long list.
posted by schmod at 7:34 PM on April 17, 2010
Best answer: You could use OpenTripPlanner once you've geocoded the addresses. Nothing to buy, just some setup and then some simple scripts to do the pairwise comparison. If you assume that road distances are at most some multiple of cartesian or modified-cartesian (to account for curvature) distance, you can cut out some of the trip planning work by only comparing among things that might be nearby.
If your distances are large, it might take a while to download and compile all of the map data. You might try seeing if you can reduce this by cleverly choosing to only download regions where there's more than one point. Where there's only one, you could simply choose the nearest point by great circle distance.
posted by novalis_dt at 8:34 PM on April 17, 2010
If your distances are large, it might take a while to download and compile all of the map data. You might try seeing if you can reduce this by cleverly choosing to only download regions where there's more than one point. Where there's only one, you could simply choose the nearest point by great circle distance.
posted by novalis_dt at 8:34 PM on April 17, 2010
Interesting, hadn't heard of batchgeo.com before. Sounds better than Gmaps for getting lat/lon info.
If you are allowed to drop addresses further than X miles as the crow flies, and you're allowed to generalize the conversion from degrees of longitude to miles, and you don't need road distances... then brute force and cameradv's formula will get you there. It's not that bad to ask a modern computer to perform 250,000 pythagorean theorem calculations, even if it doesn't win you any awards for elegant programming. What else were you going to use all those idle megaflops for?
posted by richyoung at 9:28 PM on April 17, 2010
If you are allowed to drop addresses further than X miles as the crow flies, and you're allowed to generalize the conversion from degrees of longitude to miles, and you don't need road distances... then brute force and cameradv's formula will get you there. It's not that bad to ask a modern computer to perform 250,000 pythagorean theorem calculations, even if it doesn't win you any awards for elegant programming. What else were you going to use all those idle megaflops for?
posted by richyoung at 9:28 PM on April 17, 2010
« Older I love you to dear, but my high score is still... | What is a definitive source for the exact wording... Newer »
This thread is closed to new comments.
Then use Excel or something similiar to convert the differences of the coordinates into miles.
remember that the distance is the sqrt(square(x distance)+square(y distance))
posted by cameradv at 1:35 PM on April 17, 2010