# orthogonality isn't a SQL know-it-all after all, let's laugh at him

October 21, 2006 10:14 AM Subscribe

Help! Stumped by an SQL problem.

(posted on behalf of orthogonality, via MetaChat)

I think this is NP-complete, and I don't have the time to think it through.

I have two tables, call them places and people. Both people and places have a location ll. I have a function d( ll a, ll b ) that calculates the distance between locations.

I can create a cartesian join of all possible combination of people and places, thus:

select * from people a, places b

I can refine this into a view that calculates the distance between people and places:

create view distances as

select a.id as aid, a.name as aname, b.id as bid, b.name as bname, d( a.ll, b.ll) as distance

However, I can only assign a person to a place. I know how to find the place closest to a person:

select * from distances

group by bid

having min( distance )

But how do I find the total set of people-> place assignment such that the sum of all assignment distances are minimized, again given that any one person can only be assigned to one place?

I mean I want to select a "slice" of the cartesianed result set, such that people.id is unique in that slice and sum( distance ) for that slice is minimized, but what do I where or group by on?

Thanks.

(posted on behalf of orthogonality, via MetaChat)

I think this is NP-complete, and I don't have the time to think it through.

I have two tables, call them places and people. Both people and places have a location ll. I have a function d( ll a, ll b ) that calculates the distance between locations.

I can create a cartesian join of all possible combination of people and places, thus:

select * from people a, places b

I can refine this into a view that calculates the distance between people and places:

create view distances as

select a.id as aid, a.name as aname, b.id as bid, b.name as bname, d( a.ll, b.ll) as distance

However, I can only assign a person to a place. I know how to find the place closest to a person:

select * from distances

group by bid

having min( distance )

But how do I find the total set of people-> place assignment such that the sum of all assignment distances are minimized, again given that any one person can only be assigned to one place?

I mean I want to select a "slice" of the cartesianed result set, such that people.id is unique in that slice and sum( distance ) for that slice is minimized, but what do I where or group by on?

Thanks.

Can multiple people be assigned to the same place? If so, I can't see why this would be NP-complete. How is that different from simply assigning each person to the closest place? I assume I'm misreading this.

posted by gsteff at 10:36 AM on October 21, 2006

posted by gsteff at 10:36 AM on October 21, 2006

I think I have the same confusion as gsteff, but I'm assuming that for the problem to be "hard" you can't assign multiple people to the same place.

This is what I'm imagining your problem to be:

You have a hundred offices and ten sales managers. Each sales manager must be assigned to a different office, but will be required to visit all of them. What assignments will result in the least amount of total distance traveled?

Is that right or am I way off?

posted by justkevin at 10:51 AM on October 21, 2006

This is what I'm imagining your problem to be:

You have a hundred offices and ten sales managers. Each sales manager must be assigned to a different office, but will be required to visit all of them. What assignments will result in the least amount of total distance traveled?

Is that right or am I way off?

posted by justkevin at 10:51 AM on October 21, 2006

(For clarification I realize my description sounds like the traveling salesman problem, but assume that the sales manager comes back to his home office between trips.)

posted by justkevin at 10:57 AM on October 21, 2006

posted by justkevin at 10:57 AM on October 21, 2006

gsteff, no: "...given that any one person can only be assigned to one place..."

posted by Khalad at 11:02 AM on October 21, 2006

posted by Khalad at 11:02 AM on October 21, 2006

Oops, nevermind, I read that wrong. That doesn't answer your question.

posted by Khalad at 11:03 AM on October 21, 2006

posted by Khalad at 11:03 AM on October 21, 2006

Do all of the locations have to have someone assigned to them?

posted by jacobm at 11:04 AM on October 21, 2006

posted by jacobm at 11:04 AM on October 21, 2006

*I realize my description sounds like the traveling salesman problem, but assume that the sales manager comes back to his home office between trips.*

If so, that's a meaningless distinction in the context of NP-completeness.

I'm now reading it as assigning each person to one place, with the restriction that each place can be assigned at most one person. If so, I believe that's equivalent to deciding what order in which to assign people, which is

*O(n! * m)*(where n is #people, m is #places), which makes it NP-complete.

posted by gsteff at 11:09 AM on October 21, 2006

(This sounds like some combination of election observers and polling places.)

There's not an easy solution. Imagine ten places in a straight line.

1 2 3 4 5 6 7 8 9 10

Ten people are present: two live near the first place, one near the second place, one near the third place, and so on until the last place, which has no one living near it. Is it "better" to take the first two people and assign them to place 1 and 2, then assign everyone to the place one higher (which makes nine people each drive one place over), or is it better to assign nine people to the closest place to them and assign person 2 to place 10 (making one person drive nine places over)? The answer is, who the hell knows?

I think I would approach it like this: assign everyone to the closest place, even if that results in several people assigned to the same place. Now, check if there are any places with no people assigned. For all the places with no people assigned, check and see if there is a place nearby with multiple people assigned, and if so, calculate the distance from the empty place to all of the people assigned to the doubled-up place, find the closest person, and reassign them to the empty place.

Put some kind of max distance on it, so you're not assigning anyone to go 100 miles out of their way.

posted by jellicle at 11:17 AM on October 21, 2006

There's not an easy solution. Imagine ten places in a straight line.

1 2 3 4 5 6 7 8 9 10

Ten people are present: two live near the first place, one near the second place, one near the third place, and so on until the last place, which has no one living near it. Is it "better" to take the first two people and assign them to place 1 and 2, then assign everyone to the place one higher (which makes nine people each drive one place over), or is it better to assign nine people to the closest place to them and assign person 2 to place 10 (making one person drive nine places over)? The answer is, who the hell knows?

I think I would approach it like this: assign everyone to the closest place, even if that results in several people assigned to the same place. Now, check if there are any places with no people assigned. For all the places with no people assigned, check and see if there is a place nearby with multiple people assigned, and if so, calculate the distance from the empty place to all of the people assigned to the doubled-up place, find the closest person, and reassign them to the empty place.

Put some kind of max distance on it, so you're not assigning anyone to go 100 miles out of their way.

posted by jellicle at 11:17 AM on October 21, 2006

*(This sounds like some combination of election observers and polling places.)*

Nope. It's a gerrymandering problem.

posted by gsteff at 11:18 AM on October 21, 2006

Well, actually, election observation is more likely. And ortho shouldn't say either way. Sorry I can't help with the SQL. I'm shutting up now.

posted by gsteff at 11:19 AM on October 21, 2006

posted by gsteff at 11:19 AM on October 21, 2006

Ok, I was too loose in my specification: basically, each place must get assigned at least one person.

Ok, assume we have three people. Two of them are both: 1 mile from place 1, 2 miles from place 2, and three miles from place 3. The third person is one mile from place 1 and two miles from place 3.

In this case, I want to assign person three to place three (he's closest to it) and I want to assign either one person one or person to two to place one and place two.

If I assign everyone to the place he's closest to, I get everyone at place one, and nobody at places 2 and 3. My sum distance is 3 miles, but I've failed the problem.

What I want is person three at place 3, and persons one and two, in any order, at places 1 and 2. Then my sum distance is 4, but I've solved the problem.

Hope that's clear.

posted by orthogonality at 11:22 AM on October 21, 2006

**gsteff**writes*"How is that different from simply assigning each person to the closest place? I"*Ok, assume we have three people. Two of them are both: 1 mile from place 1, 2 miles from place 2, and three miles from place 3. The third person is one mile from place 1 and two miles from place 3.

In this case, I want to assign person three to place three (he's closest to it) and I want to assign either one person one or person to two to place one and place two.

If I assign everyone to the place he's closest to, I get everyone at place one, and nobody at places 2 and 3. My sum distance is 3 miles, but I've failed the problem.

What I want is person three at place 3, and persons one and two, in any order, at places 1 and 2. Then my sum distance is 4, but I've solved the problem.

Hope that's clear.

posted by orthogonality at 11:22 AM on October 21, 2006

If I'm understanding the problem correctly, it may be a minimal weighted bipartite-graph matching problem. If that is the case, then the problem isn't NP-complete. Anyway, the good news is there are algorithms for finding minimal bipartite matchings; the bad news is that you're not likely to be able to implement them as a SQL query.

posted by jacobm at 11:34 AM on October 21, 2006

posted by jacobm at 11:34 AM on October 21, 2006

I can't begin to think how you'd do this in SQL alone, but one approach would be to work out which person is furthest from their nearest place, cross the person and place off the list, and repeat.

posted by cillit bang at 11:38 AM on October 21, 2006

posted by cillit bang at 11:38 AM on October 21, 2006

This is not a single query sql solution, but how about this:

Find the sum (S) of total distances each person might have to travel. Each person bids "X" on a particular location where X is the difference of S and their distance to the location. So if I'm 50 miles from A, 20 miles from B and 10 miles from C (for a total 80 possible miles), I'll bid 30 on A, 60 on B and 70 on C.

The first assignment goes to the highest bid. Remove all remaining bids for that person and that location. Repeat.

posted by justkevin at 12:20 PM on October 21, 2006

Find the sum (S) of total distances each person might have to travel. Each person bids "X" on a particular location where X is the difference of S and their distance to the location. So if I'm 50 miles from A, 20 miles from B and 10 miles from C (for a total 80 possible miles), I'll bid 30 on A, 60 on B and 70 on C.

The first assignment goes to the highest bid. Remove all remaining bids for that person and that location. Repeat.

posted by justkevin at 12:20 PM on October 21, 2006

I'm probably missing something here, but it seems like you just assign each place the person closest to it, and then you assign any remaining people to the place closest to them.

I'm not sure what the most convenient way to calculate this is -- perhaps one call to stick the minimum distance to each place in a table, one call to find all the people with minimum distance to each place, and one call to find all the places with minimum distance to each person.

posted by inkyz at 1:18 PM on October 21, 2006

I'm not sure what the most convenient way to calculate this is -- perhaps one call to stick the minimum distance to each place in a table, one call to find all the people with minimum distance to each place, and one call to find all the places with minimum distance to each person.

posted by inkyz at 1:18 PM on October 21, 2006

I too doubt you can do this with a single SQL query. The way I would implement it would be iterative.

Start with the solution that puts each person at the location closest to them, even if that leaves some locations empty. Now find the set of locations that have been assigned more than one person, and collectively call the set of people assigned to these the set of spares.

For each location that has no one yet assigned, choose the one from the spare set that is closest and switch his assignment. Remove that person from the spare set, and if that leaves his old location now manned by only one person then also remove that person from the spare set, since he's no longer a spare.

Repeat for the remaining empty locations. It's not pretty, and I'm pretty sure it wouldn't find the optimal answer in every case but it should get you close, and in the worst case you'd only have to iterate through each location once so it shouldn't blow up algorithmically.

Alternatively if your data sets are small you could calculate all permutations of people and locations and then filter out solutions that have empty locations, and then sort by sum(distance). But any solution that starts with "calculate all permutations" really will not hold up to large amounts of data.

posted by Rhomboid at 8:47 PM on October 21, 2006

Start with the solution that puts each person at the location closest to them, even if that leaves some locations empty. Now find the set of locations that have been assigned more than one person, and collectively call the set of people assigned to these the set of spares.

For each location that has no one yet assigned, choose the one from the spare set that is closest and switch his assignment. Remove that person from the spare set, and if that leaves his old location now manned by only one person then also remove that person from the spare set, since he's no longer a spare.

Repeat for the remaining empty locations. It's not pretty, and I'm pretty sure it wouldn't find the optimal answer in every case but it should get you close, and in the worst case you'd only have to iterate through each location once so it shouldn't blow up algorithmically.

Alternatively if your data sets are small you could calculate all permutations of people and locations and then filter out solutions that have empty locations, and then sort by sum(distance). But any solution that starts with "calculate all permutations" really will not hold up to large amounts of data.

posted by Rhomboid at 8:47 PM on October 21, 2006

*but it seems like you just assign each place the person closest to it, and then you assign any remaining people to the place closest to them*

No, this won't work.

From Ortho's example (Δx = distance of Person to point x):

Person ΔA ΔB ΔC

Joe 1 2 3

Mary 1 2 3

Bill 1 1 2

If we assign the people based on the order they come (basically random depending on how they're entered in the database), station

**A**is manned by Joe, station

**B**is manned by Bill, and station

**C**is manned by Mary.

That means Joe has to drive 1 mile, Bill has to drive 1 mile, but Mary has to drive 3 miles. The minimum driving distribution would have either Joe or Mary at stations

**A**or

**B**(doesn't matter which) and Bill at station

**C**. Joe and Mary each drive one mile, Bill drives two.

posted by Civil_Disobedient at 8:50 PM on October 21, 2006

Whoops, posted before re-reading.

Joe drives 1 mile, Mary drives 2 miles, Bill drives 2 miles. This would make more sense if I put ΔC for Mary as something very large, like 10 miles. Then it's more obvious that Bill should man station

posted by Civil_Disobedient at 8:54 PM on October 21, 2006

Joe drives 1 mile, Mary drives 2 miles, Bill drives 2 miles. This would make more sense if I put ΔC for Mary as something very large, like 10 miles. Then it's more obvious that Bill should man station

**C**.posted by Civil_Disobedient at 8:54 PM on October 21, 2006

SQL might not be the right hammer for this nail, although I'm not sure I fully understand what is being asked [and if not, please blow this comment off as well intentioned, but not responsive]. What orthogonality seems to describe is a problem more suited to linear programming approaches, and I suggest it might well be better handled with something like the free O-Matrix and lp-solve framework, particularly if the data sets get to more than a few hundred points each.

But the optimization being sought determines much about the approach to the problem. If the goal is to minimize the total distance traveled by all persons to all points ("the sum of all assignment distances are minimized"), an LP approach might be good, but it doesn't guarantee that any or all volunteers will travel the shortest feasible distances, individually (in other words, this is

If you come at the problem as an excercise in formulating an optimal route for each volunteer, you have as many trivial (because the routes are 1 stop in length) Traveling Salesman Problems as you have volunteers, but you want to consider them as a set, I guess, recursively taking out early feasibles on logically consistent bounding rules, for both volunteers and locations. [The classic TSP solution research and solution packages I linked in my first link won't be of great help in finding these solutions, because the individual routes are trivial.] The effectiveness of doing selections by recursion is going to be largely that of the bounding rules applied in subsequent iterations. You'd want to do common sense things, like stopping generation of distances in your initial "create view distances" step for distances greater than 50 miles, if that is the maximum practical travel distance, to greatly limit your data set for subsequent iterations. And you might want to apply some hueristics next, if you have any locations where more than 1 person is required, for volume reasons, and/or block locations with lesser impact from early selections, if you think you won't have enough volunteers. After that, you set some distance limits based on location, not people, and recursively run for locations, marking out people and locations assigned on a "first found feasible" basis for increasing distance bounds, at each iteration. That's actually what some commercial route planning packages do for middling sized problems. You can certian obvious biases built-in to such selections, as when volunteers with names beginning in later alphanumeric sequence find themselves with the longer drives, but on average, where possible, you do generate shortest driving distances. And although this is theoritically an NP-complete problem, in practice, the distribution of people and places in the real world tends to "lump" feasible solutions pretty quickly, if your bounding rules are sane.

posted by paulsc at 4:09 AM on October 22, 2006

But the optimization being sought determines much about the approach to the problem. If the goal is to minimize the total distance traveled by all persons to all points ("the sum of all assignment distances are minimized"), an LP approach might be good, but it doesn't guarantee that any or all volunteers will travel the shortest feasible distances, individually (in other words, this is

*not*the Traveling Salesman Problem). That would be of interest to a campaign re-imbursing volunteers for mileage they incur in driving to polling places they might observe, as it will generate the lowest mileage cost to the campaign. But that globally best solution will probably not be the shortest distance for every individual driving to a polling place.If you come at the problem as an excercise in formulating an optimal route for each volunteer, you have as many trivial (because the routes are 1 stop in length) Traveling Salesman Problems as you have volunteers, but you want to consider them as a set, I guess, recursively taking out early feasibles on logically consistent bounding rules, for both volunteers and locations. [The classic TSP solution research and solution packages I linked in my first link won't be of great help in finding these solutions, because the individual routes are trivial.] The effectiveness of doing selections by recursion is going to be largely that of the bounding rules applied in subsequent iterations. You'd want to do common sense things, like stopping generation of distances in your initial "create view distances" step for distances greater than 50 miles, if that is the maximum practical travel distance, to greatly limit your data set for subsequent iterations. And you might want to apply some hueristics next, if you have any locations where more than 1 person is required, for volume reasons, and/or block locations with lesser impact from early selections, if you think you won't have enough volunteers. After that, you set some distance limits based on location, not people, and recursively run for locations, marking out people and locations assigned on a "first found feasible" basis for increasing distance bounds, at each iteration. That's actually what some commercial route planning packages do for middling sized problems. You can certian obvious biases built-in to such selections, as when volunteers with names beginning in later alphanumeric sequence find themselves with the longer drives, but on average, where possible, you do generate shortest driving distances. And although this is theoritically an NP-complete problem, in practice, the distribution of people and places in the real world tends to "lump" feasible solutions pretty quickly, if your bounding rules are sane.

posted by paulsc at 4:09 AM on October 22, 2006

This thread is closed to new comments.

"However, I can only assign a person to a place."should be "However, I can only assign a person toone singleplace.Also,

to give fair warning to potential answerers,the answer has a practical application which is partisan in nature. If you'd prefer not to aid such endeavors, you probably will not wish to help in answering this. I just don't want to take anyone by surprise.posted by orthogonality at 10:25 AM on October 21, 2006