Parse freetext postal addresses to structured form for geocoding to KML?
November 1, 2007 7:53 PM
I have a bunch of contact information (harvested from a very user-unfriendly Aetna DocFind website) that I want to plot on a Google map. Naturally, the info is unstructured. To use batchgeocode.com, I need to get it into a structured CSV format.
Back when I used Windows, I used a product called ListGrabber to perform this operation. Now I'm on the Mac, and I don't pirate software anymore, so are there any good options?
I came across Geo::StreetAddress::US, but I'm not so Perl-savvy anymore (I lean towards Python and Ruby), and it won't separate out the names nicely (though I can probably find a way around that).
How would you do this?
Response by poster: >sed
I have no desire to write *that* many awful regexps. I've done it once before with this data, and it was very painful.
Looking for something more general purpose that I can use for other similar kinds of projects without writing a different set of regexps for every data source.
Also, forgot to mention Leopard's Data Detectors... sadly I will not be upgrading to Leopard anytime soon.
posted by joshwa at 8:12 PM on November 1, 2007
Well, once you've got the names off, the Perl would just be:
perl -MGeo::StreetAddress::US -lne 'my $addr = Geo::StreetAddress::US->parse_location($_) or next; print join ", ", map qq("$_"), @{$addr}{qw(number street type state city zip)};' datafile.txt
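For the "getting the names off" step, here is a minimal Python sketch. It rests on an assumption the thread does not confirm: that each record is separated by a blank line, with the name on the first line and the address on the lines after it. The function names `split_records` and `to_csv` and the sample record are made up for illustration. The resulting address column is still free text, so it would be what gets fed to the address parser above.

```python
import csv
import io
from itertools import groupby


def split_records(text):
    """Split blank-line-separated records into (name, address) pairs.

    ASSUMPTION: the first line of each record is the name and the
    remaining lines are the address. Real data may need a different rule.
    """
    pairs = []
    lines = (line.strip() for line in text.splitlines())
    # groupby(key=bool) alternates between runs of non-empty lines
    # (a record) and runs of empty lines (a separator).
    for has_text, group in groupby(lines, key=bool):
        if not has_text:
            continue
        name, *address_lines = group
        pairs.append((name, ", ".join(address_lines)))
    return pairs


def to_csv(pairs):
    """Render (name, address) pairs as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["name", "address"])
    writer.writerows(pairs)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical sample record, not taken from the real data.
    sample = "Jane Doe\n123 Main St\nSpringfield, IL 62701\n"
    print(to_csv(split_records(sample)), end="")
```

The csv module handles the quoting, so an address containing commas stays in a single column.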
posted by nicwolff at 10:18 PM on November 1, 2007
posted by pompomtom at 7:56 PM on November 1, 2007