How do I combine polygons?
April 11, 2007 7:48 AM   Subscribe

Given multiple contiguous polygons, the coordinates of which are precisely known, how can I combine them into a coordinate set for a single polygon?

I'm creating some maps that show the zip codes within a US metro area, shaded by the frequency of a particular event within each zip code. I've got the cartographic boundary files from the Census Bureau, and have created some pretty nice-looking maps using Perl and GD. However, I'd like to draw custom borders around certain contiguous "clusterings" of zip codes. Can anyone describe (not necessarily in code -- even just at an abstract, theoretical level) how I might combine multiple polygons (for which I have exact coordinates) into a single polygon, the outline of which I can overlay on the map as the "cluster border?"

The only thing I've thought of would be to actually draw in a temp image each zip code as a filled polygon, and then do a pixel-by-pixel analysis of the resultant image to figure out where the border lies, but surely there's a simpler way.... isn't there?
posted by Doofus Magoo to Technology (18 answers total) 5 users marked this as a favorite
You have a bunch of points, and want to find the convex hull of them all. This is a well studied problem. Search on Google you can find lots of algorithms.
posted by chunking express at 7:55 AM on April 11, 2007

At least, that's what it sounds like you want to do.
posted by chunking express at 7:56 AM on April 11, 2007

Best answer: A question:

For any given actually-adjascent pair of zones, are the vertices that should be identical actually identical?

If not, can we presume you can do some post processing to identify logical pairs/sets of "identical" vertices and fix 'em so they are, in fact, identical? So two points below a certain threshold of difference, say, would be updated to all be the same.

Assuming yes to either the former or the latter, you've essentially got zones defined as polygons built out of a limited, common set of vertices: so, if you have two adjascent zones, they've got one-or-more edges exactly in common.

One method of creating a larger polygon would be to combine the lists of edges/vertices of all the zones in your set, and then remove any duplicate edges. If all your polygons are drawn clockwise (or counterclockwise), you can search each pair of vertices in your combined list for both V1-V2 and V2-V1: if you find a match, that's a shared internal edge, and can be nixed.

After that, you've got all the edges of your combined polygon. I think. It's been a while.
posted by cortex at 7:57 AM on April 11, 2007

Do you just want the convex hull of the set of points that forms the boundary of each of your zip code regions? That should enclose all the zip code regions (and possibly some extra depending on how the points are distributed).

I presume you want to take as input any old sets of regions and spit out the contiguous region, so doing it by hand for the particular set wouldn't be useful.

You might be able to try ordering the coordinates, say lexicographically, and then traversing them in some good way, but I'm not sure that would get you waht you want.
posted by leahwrenn at 7:58 AM on April 11, 2007

I don't think it's a convex hull problem—it sounds like Doofus Magoo could easily end up with a non-convex final perimeter.

(For that matter, my basic method may leave a big gotcha in the case of desired donut-holes within a containing outer perimeter.)
posted by cortex at 7:59 AM on April 11, 2007

I agree with cortex. If your polygons are expressed as sets of pairs of points, put all the pairs of points into a list, throw the duplicated pairs away, and draw a polygon with the remaining pairs.

The convex hull of your points isn't quite what you're looking for, because zip codes sometimes have concave edges.
posted by Mr. President Dr. Steve Elvis America at 8:04 AM on April 11, 2007

cortex, even if there are donut holes, I think your method will draw the right lines. The question just becomes how to fill it correctly. The module he's using might already be smart enough.
posted by Mr. President Dr. Steve Elvis America at 8:07 AM on April 11, 2007

Response by poster: The convex hull is a good lead; I'll look into it. I should point out that the borders have to be identical to the zip codes which comprise the cluster, so if there's any rounding that takes place as part of calculating the convex hull, that won't work.

One not-insignificant complication is that the polygon boundaries are not in fact perfect -- consider two zip codes that share a common border which is a river. In many cases, each zip code goes to the river's edge, while the river itself is "unallocated." In addition, one zip code's river border might be made up of 3 vertices, while the adjacent zip code's river border might be made up of 30 vertices.

There is only one "doughnut-shaped" zip code, but it's being clustered with the zip code that makes up the "hole," so I can adjust that one manually if necessary.

Traversing the combined sets of coordinates for each polygon in the cluster to find out which are the "outermost" vertices, as leahwrenn suggested, sounds like what I thought I was going to have to do, but I'm not really sure how to go about that (distance from the centroid, maybe, but in order to calculate the centroid I imagine I need the combined coordinate set... but maybe not).
posted by Doofus Magoo at 8:07 AM on April 11, 2007

You can't just eliminate duplicates because the sides might not match up. You could have a poky-out part of one jabbed into the side of another.

However, you do a little pre-processing and then use cortex's suggestion. first check if any vertices from polygon A are inside of polygon B. If so, truncate polygon A so it abuts B the way cortex assumes.
posted by DU at 8:09 AM on April 11, 2007

Sorry, to keep posting, but it strikes me that cortex's method fails if the "corner" of one zip code rests on the "edge" of another.

The way to fix this would be to run through all your sets of points, determine if any point lay on another line, then split that line into two lines, each terminated by that point and one of the points of the original line. Then I think cortex's method applied to that works.
posted by Mr. President Dr. Steve Elvis America at 8:09 AM on April 11, 2007

Sorry, got caught up in the math. Zipcode regions aren't going to poke into each other. NM.
posted by DU at 8:10 AM on April 11, 2007

The only thing I've thought of would be to actually draw in a temp image each zip code as a filled polygon, and then do a pixel-by-pixel analysis of the resultant image

If this is not real-time and you don't want to worry about the math...

1. Draw a temp image with each zip as a GREEN filled polygon on a WHITE background.
2. Create thousands (experiment with how many you need) of RED pixels randomly on the white background. If they "fall" on a GREEN pixel, try again.
3. Loop over each red pixel, randomly moving it in one of the 8 pixel directions.
4. If it "lands" on a white pixel, keep going. If it lands on a green pixel, you'v found a border pixel.

In essence, you're "feeling" out your border rather than calculating it. I'm lazy. Thats how I'd do it.
posted by vacapinta at 8:42 AM on April 11, 2007

Response by poster: Following up on cortex's solution to traverse the combined coordinate set and look for identical sets (or at least those in very close proximity to each other), the (possibly intractable) problem I've now encountered is that the particular Perl module I'm using (GD) draws polygons in the order in which the points are added, so while I'm getting very close to having a nice border around the cluster, there are crazy lines going across the cluster connecting the "last point' of one polygon to the "first point" of the next polygon. I think I'd have to add the vertices in their "natural order" traversing the cluster's polygon clockwise (or counterclockwise).

Put another way, if I create a polygon with points added in the following order:

[0,0] , [1,1], [1,0], [0,1], [0,0]

... it will create an X with two closed sides, not a square.

And vacapinta, what's the advantage of that method versus looping through all possible pixels using a pair of FOR loops?
posted by Doofus Magoo at 9:26 AM on April 11, 2007

Zipcode regions aren't going to poke into each other.

As mentioned earlier you do need to watch out for numerical precision issues clouding the topology. For one polygon to have a corner actually on another's edge could require infinite precision in some cases.
posted by StickyCarpet at 9:29 AM on April 11, 2007

Best answer: Righto, StickyCarpet. I fudged it by calculating the distance between the two points, and if < 2 pixels, then delete the pair.
posted by Doofus Magoo at 9:34 AM on April 11, 2007

Is this something you're going to be doing a lot of? The PostgreSQL database has a highly-regarded extension called PostGIS (free, open source, blah blah blah), which supports geographic calculations exactly like yours.

It's a very efficient 2D geometry engine that's generally used for points, lines, and polygons in lat/long coordinates. A friend showed me how he could use a single query to retrieve "boundaries of all U.S. census tracts with at least 50% of their area contained within 1/2 mile of a given railroad right-of-way". A few lines of Perl got the results into KML and onto Google Earth. Scary.

If not, the best "high level" way of doing this I've seen is to start by chopping your areas up into lots and lots of triangles. It's easy to figure out overlaps between triangles.
posted by migurski at 10:23 AM on April 11, 2007 [2 favorites]

Best answer: The advantage of vacapincta's method is that it's a clever heuristic that doesn't break on the same things (or require the same set of hoopjumps) as the vertex method. It's rather elegant, but it has its own strengths and weaknesses and so may not work that great for your specific application.

I think I'd have to add the vertices in their "natural order" traversing the cluster's polygon clockwise (or counterclockwise).

Yeah. One possible method:
- Combine your edge data into the Big List;
- Start traversing clockwise along the edges of some known-outer-edge polygon;
- Each edge you traverse to, check for duplication. If you don't find it, add it to the New List. If you do find a dupe, delete the pair and traverse next to the next clockwise edge of the polygon that owns the other half of the duplicate.

When you're done, New List should have a happy in-order outline. Again, this may fail or need extra passes at least for donuts, since I think it'd only traverse a continuous perimeter.
posted by cortex at 10:28 AM on April 11, 2007

Response by poster: Thanks a bunch, cortex. I haven't had a chance to test it out yet, but that sounds like a pretty rock-solid method of re-creating the clustered polygons in "natural" order.
posted by Doofus Magoo at 5:19 AM on April 15, 2007

« Older What to do when someone wants you to do something...   |   Ideas for innovative uses of technology in higher... Newer »
This thread is closed to new comments.