Join some of the dots, throw the rest away
June 26, 2007 8:46 AM Subscribe
Map geeks: can I generate a polygon shapefile from the outline of a large collection of points? There's more detail, of course.
I'm all Googled out, so this is my next stop. Here's a slightly-genericised version of my need:
I am able to generate, from a dataset which is quite old, several thousand points. All of them are in the same area (say, a ZIP code).
In order to test whether new data points - which are newer than that ageing dataset - are inside the same area, I'd like to take what I have and create a shapefile.
This shapefile would need to be a polygon which tracks the outermost points from the cluster I have very tightly, in order to avoid false positives when testing new points. So a bounding box/circle/parallelogram won't do.
Can this be done? And more importantly, can it be done with an existing software package or component which has an open license?
I'm all Googled out, so this is my next stop. Here's a slightly-genericised version of my need:
I am able to generate, from a dataset which is quite old, several thousand points. All of them are in the same area (say, a ZIP code).
In order to test whether new data points - which are newer than that ageing dataset - are inside the same area, I'd like to take what I have and create a shapefile.
This shapefile would need to be a polygon which tracks the outermost points from the cluster I have very tightly, in order to avoid false positives when testing new points. So a bounding box/circle/parallelogram won't do.
Can this be done? And more importantly, can it be done with an existing software package or component which has an open license?
Yeah, you need a convex hull. It's simple to do - qhull has code but it's certainly simple enough to code yourself. See various algorithms here.
After that, it's just hit testing in the polygon, which is also pretty simple. Coding it is simple and qhull can probably do it too.
posted by GuyZero at 9:07 AM on June 26, 2007
After that, it's just hit testing in the polygon, which is also pretty simple. Coding it is simple and qhull can probably do it too.
posted by GuyZero at 9:07 AM on June 26, 2007
nthing convex hull. There's a great book called "Computational Geometry in C" that I used quite a bit. It has a good description of convex hull problem, along with some implementations of solvers in C, which seems like what you want.
There's also this for ArcGIS, which I hope you're using or have access to (otherwise, GIS can be quite a PITA). Its under GPL (why not BSD, I have no clue).
posted by devilsbrigade at 10:09 AM on June 26, 2007
There's also this for ArcGIS, which I hope you're using or have access to (otherwise, GIS can be quite a PITA). Its under GPL (why not BSD, I have no clue).
posted by devilsbrigade at 10:09 AM on June 26, 2007
Best answer: Wait, zip code areas aren't necessarily convex, are they? You probably want a concave hull. (linky)
posted by anaelith at 10:25 AM on June 26, 2007
posted by anaelith at 10:25 AM on June 26, 2007
Even better, ArcCatalog should have zip areas already in polygon format (and if it doesn't, states should have the data published). Why not just match it up? (I hope your point data is georeferenced).
posted by devilsbrigade at 10:29 AM on June 26, 2007
posted by devilsbrigade at 10:29 AM on June 26, 2007
I do exactly this in Java, with JTS and geotools with the shapefile reader. It's really straightforward, I'll toss some example code that won't compile but has all the relevant bits here. And some .jars you might need, though I recommend downloading the official distributions. Oh, and to test whether a point is inside the hull, you'd just do
posted by Xelf at 10:45 AM on June 26, 2007 [1 favorite]
if(bounds.contains(testpoint))
That should get you started, good luck!posted by Xelf at 10:45 AM on June 26, 2007 [1 favorite]
PS. Feel free to email me with questions, address is in profile.
posted by Xelf at 10:49 AM on June 26, 2007 [1 favorite]
posted by Xelf at 10:49 AM on June 26, 2007 [1 favorite]
Response by poster: Sorry, for clarification: when I said "a slightly-genericised version of my need", what that means in practice is that I'm not dealing with ZIP codes at all. I'm trying to recreate electoral area boundaries in a different country.
Unfortunately, therefore, these ZIP-specific comments aren't directly useful.
Clearly convex hull is the thing. The qhull pages, however, seem to deal exclusively in language which relates to 3D applications and/or degree-level maths. A link to example usage with map points (mine are in WGS84 X&Y pairs like Google Maps) would be handy if someone can point me towards a link.
Xelf: thanks, but this is a one-off task. Batch-processing with (eg) PHP, mysql and a shell script will be ugly, but it should get the job done without me having to learn enough Java to follow you :)
posted by genghis at 11:19 AM on June 26, 2007
Unfortunately, therefore, these ZIP-specific comments aren't directly useful.
Clearly convex hull is the thing. The qhull pages, however, seem to deal exclusively in language which relates to 3D applications and/or degree-level maths. A link to example usage with map points (mine are in WGS84 X&Y pairs like Google Maps) would be handy if someone can point me towards a link.
Xelf: thanks, but this is a one-off task. Batch-processing with (eg) PHP, mysql and a shell script will be ugly, but it should get the job done without me having to learn enough Java to follow you :)
posted by genghis at 11:19 AM on June 26, 2007
You could do this fairly easily with the (free) statistical computing program R. There's a bit of a learning curve but it might well be worth it as it is easy to automate data acquisition and output.
There is a function called "chull" which will return the outer points in a collection of x,y coordinates. And another function called "areapl" which will return an area for this minimum area convex polygon (MCP).
Given this information it would be a simple matter to check whether the addition of a new point falls outside the initial area (by checking whether the area of the polygon was changed).
There are also some useful functions available to calculate x% MCP, (i.e. polygon's containing x$ of the points, in order to deal with outliers), or functions to calculate kernel area methods (used in animal home range analysis for example).
posted by jonesor at 11:20 AM on June 26, 2007
There is a function called "chull" which will return the outer points in a collection of x,y coordinates. And another function called "areapl" which will return an area for this minimum area convex polygon (MCP).
Given this information it would be a simple matter to check whether the addition of a new point falls outside the initial area (by checking whether the area of the polygon was changed).
There are also some useful functions available to calculate x% MCP, (i.e. polygon's containing x$ of the points, in order to deal with outliers), or functions to calculate kernel area methods (used in animal home range analysis for example).
posted by jonesor at 11:20 AM on June 26, 2007
The book I referenced covers it for a set of 2D points. I suggest you check it out.
posted by devilsbrigade at 11:46 AM on June 26, 2007
posted by devilsbrigade at 11:46 AM on June 26, 2007
Response by poster: Wait a second...
Guyzero's link contains a number of examples and they all draw a larger bounding box than is strictly necessary. For a mapping application, accuracy is far more important than lowest-number-of-sides.
A point set, for example, containing three of four corners on one square block is drawn as a triangle. For my purposes, that's half a block which should not be "inside the rubber band", not least because it might legitimately belong in an adjacent set instead and thus give a completely incorrect result for any address in that small area.
If that's something you get with all convex hull calculations then we're on the wrong track. Sticking with the analogies, it seems I don't want an elastic band, but shrinkwrapping.
Any clues?
posted by genghis at 8:18 PM on June 26, 2007
Guyzero's link contains a number of examples and they all draw a larger bounding box than is strictly necessary. For a mapping application, accuracy is far more important than lowest-number-of-sides.
A point set, for example, containing three of four corners on one square block is drawn as a triangle. For my purposes, that's half a block which should not be "inside the rubber band", not least because it might legitimately belong in an adjacent set instead and thus give a completely incorrect result for any address in that small area.
If that's something you get with all convex hull calculations then we're on the wrong track. Sticking with the analogies, it seems I don't want an elastic band, but shrinkwrapping.
Any clues?
posted by genghis at 8:18 PM on June 26, 2007
Response by poster: Point taken, but you've posted a link which describes what it is, not a link to something which can do it.
posted by genghis at 7:15 AM on June 27, 2007
posted by genghis at 7:15 AM on June 27, 2007
Concave hull isn't described in any books I've ever read on geometric algorithms, though my knowledge is far from exhaustive. That one site does show some results, but there's no algorithm.
I guess you could take all the points, generate a delaunay triangulation of them and then test each point to discard "inner" points. No, wait, that's the convex hull.
Anyway, I guess it's time to hit the library and do some research. You that is, not me.
posted by GuyZero at 7:27 AM on June 27, 2007
I guess you could take all the points, generate a delaunay triangulation of them and then test each point to discard "inner" points. No, wait, that's the convex hull.
Anyway, I guess it's time to hit the library and do some research. You that is, not me.
posted by GuyZero at 7:27 AM on June 27, 2007
You may want to email Joseph O'Rourke, who has written some books on computational geometry. http://maven.smith.edu/~orourke/
There are so few google and ACM library hits for "concave hull" or "non-convex hull" that I wonder if researchers call this something else.
posted by GuyZero at 7:36 AM on June 27, 2007
There are so few google and ACM library hits for "concave hull" or "non-convex hull" that I wonder if researchers call this something else.
posted by GuyZero at 7:36 AM on June 27, 2007
From the bottom of that page:
An implementation of this algorithm is available online (see the Downloads section).
And here's the downloads section: linky. You should also be able to get the source, if you mail them, as is written on that page.
posted by anaelith at 3:03 PM on June 27, 2007
An implementation of this algorithm is available online (see the Downloads section).
And here's the downloads section: linky. You should also be able to get the source, if you mail them, as is written on that page.
posted by anaelith at 3:03 PM on June 27, 2007
I think the reason you don't see 'concave' hulls around very much is because there can be many concave hulls for a given set of points (to give a very simple example, think about 4 points, 3 in a triangle and one in the center - how would you draw a reliably reproducable concave hull of this?). That's not to say it doesn't exist, just that it may not be of as much interest because its so fuzzy.
posted by devilsbrigade at 4:25 AM on June 28, 2007
posted by devilsbrigade at 4:25 AM on June 28, 2007
This thread is closed to new comments.
This looks like a good place to start for code.
posted by Leon at 8:58 AM on June 26, 2007