# Algorithm Challenge!

December 16, 2007 6:35 PM Subscribe

Algorithmfilter: I have a set of tasks and a set of workers. Tasks vary in size. Each task can be done by only some of the workers. How can I find a mapping from tasks to workers that minimizes the load on the the heaviest-loaded worker, and assigns all the work?

Here's an example in case it's not totally clear:

Tasks:

Task A: 2 hours

Task B: 5 hours

Task C: 3 hours

Task D: 2 hours

Task E: 7 hours

Task F: 7 hours

Task G: 4 hours

Worker 1 is able to do A, B, C

Worker 2 is able to do B, C, D

Worker 3 is able to do C, D, E

Worker 4 is able to do D, E, F

Worker 5 is able to do E, F, G

Worker 6 is able to do F, G, A

Worker 7 is able to do G, A, B

Here's one mapping that this not optimal (because there are other obvious mappings that have the heaviest-loaded worker doing less than worker 2's 10 hours of work):

1 -> (A)

2 -> (B, C, D)

3 -> (E)

4 -> (F)

5 -> (G)

6 -> ()

7 -> ()

Can the optimal mapping be computed in polynomial time? How about an approximation?

I feel like this must be analagous to a combinatorial optimization problem; it's similar to multi-commodity max flow, or the assignment problem, but neither is a good fit as far as I can tell. I'm pretty sure the objective function is non-linear, which is probably what's killing me. I'm thinking that simulated annealing may be the way to go (randomly swapping assignments on each iteration), but I'd rather not resort to that.

Here's an example in case it's not totally clear:

Tasks:

Task A: 2 hours

Task B: 5 hours

Task C: 3 hours

Task D: 2 hours

Task E: 7 hours

Task F: 7 hours

Task G: 4 hours

Worker 1 is able to do A, B, C

Worker 2 is able to do B, C, D

Worker 3 is able to do C, D, E

Worker 4 is able to do D, E, F

Worker 5 is able to do E, F, G

Worker 6 is able to do F, G, A

Worker 7 is able to do G, A, B

Here's one mapping that this not optimal (because there are other obvious mappings that have the heaviest-loaded worker doing less than worker 2's 10 hours of work):

1 -> (A)

2 -> (B, C, D)

3 -> (E)

4 -> (F)

5 -> (G)

6 -> ()

7 -> ()

Can the optimal mapping be computed in polynomial time? How about an approximation?

I feel like this must be analagous to a combinatorial optimization problem; it's similar to multi-commodity max flow, or the assignment problem, but neither is a good fit as far as I can tell. I'm pretty sure the objective function is non-linear, which is probably what's killing me. I'm thinking that simulated annealing may be the way to go (randomly swapping assignments on each iteration), but I'd rather not resort to that.

NP-complete.

Here's a reduction from PARTITION. Given a partition problem, create an instance of your problem with two workers and one task of length n for each integer n in the multiset to be partitioned. Suppose your algorithm can find the optimal assignment in polynomial time. Then we can easily check whether that assignment divides the work exactly evenly. If so, then the multiset can be partitioned; if not, it can't. Thus, whatever poly-time algorithm can solve your problem can also provide a a poly-time solution to PARTITION.

So the bad news is that you shouldn't go looking for a polynomial-time exact solution. The good news is that you might still be able to find a reasonable approximation algorithm.

posted by grimmelm at 7:08 PM on December 16, 2007 [1 favorite]

Here's a reduction from PARTITION. Given a partition problem, create an instance of your problem with two workers and one task of length n for each integer n in the multiset to be partitioned. Suppose your algorithm can find the optimal assignment in polynomial time. Then we can easily check whether that assignment divides the work exactly evenly. If so, then the multiset can be partitioned; if not, it can't. Thus, whatever poly-time algorithm can solve your problem can also provide a a poly-time solution to PARTITION.

So the bad news is that you shouldn't go looking for a polynomial-time exact solution. The good news is that you might still be able to find a reasonable approximation algorithm.

posted by grimmelm at 7:08 PM on December 16, 2007 [1 favorite]

Best answer: This is a very classic problem in operations research called the job-shop schedule problem. It is NP-hard, and is a discrete, combinatorial optimization problem. Depending on how large a problem you want to throw it at, you can probably use a standard search via brute-force, a heuristic method, or a metaheuristic method such as hill-climbing, simulated annealing, or evolutionary algorithms.

http://en.wikipedia.org/wiki/Job-shop_problem

In this situation, your Workers on the Machines, Your Tasks are the Jobs

posted by miasma at 8:09 PM on December 16, 2007

http://en.wikipedia.org/wiki/Job-shop_problem

In this situation, your Workers on the Machines, Your Tasks are the Jobs

posted by miasma at 8:09 PM on December 16, 2007

Response by poster: Looks like mine's actually the

posted by jewzilla at 9:29 PM on December 16, 2007

*flexible*job shop problem (because each job can be done only by a subset of machines). It's a little easier because there's only one operation per job, but my intuition is that it doesn't get me off the hook. Sounds like there's a lot of research on this to go through, but you pointed me in the right direction. Thanks!posted by jewzilla at 9:29 PM on December 16, 2007

I've had decent success using the auction algorithm on similar problems. It has the benefit of being nearly as easy to code as an annealing algorithm but without the fudge factor of "did I run it for long enough and/or add enough randomness?"

posted by breath at 9:37 PM on December 16, 2007

posted by breath at 9:37 PM on December 16, 2007

Can you use the simplex algorithm? I took a class in Linear Programming, and we had to solve problems like this and we used the simplex algorithm. Or am I way off?

posted by philomathoholic at 10:33 PM on December 16, 2007

posted by philomathoholic at 10:33 PM on December 16, 2007

I've had this problem for years. I've tried and failed to find an app to help me with it, and I've thought about trying to build one. For me, the domain is scheduling actors for rehearsals. I have X number of actors. I'm not paying them, so I don't want to waste their time. On a given night, there are Y scenes I could rehearse. By the end of the rehearsal period, I need to have rehearsed them all, but I don't have to rehearse them in order.

Not all actors are in all scenes.

Many actors have conflicts on specific days. Actor 1 can never work Tuesday nights. Actor 2 is generally free except for two specific dates.

I can't figure out a reasonable way to make sure all the scenes get rehearsed in the most efficient way possible. Sometimes I have an actor that I'm thinking about casting. He has a bunch of conflicts. I'm not sure whether or not I can fit him into the production. Will his conflicts, when added to someone else's, mean that there are certain scenes that I won't be able to rehearse at all?

I'll be following this thread with interest.

posted by grumblebee at 7:47 AM on December 17, 2007

Not all actors are in all scenes.

Many actors have conflicts on specific days. Actor 1 can never work Tuesday nights. Actor 2 is generally free except for two specific dates.

I can't figure out a reasonable way to make sure all the scenes get rehearsed in the most efficient way possible. Sometimes I have an actor that I'm thinking about casting. He has a bunch of conflicts. I'm not sure whether or not I can fit him into the production. Will his conflicts, when added to someone else's, mean that there are certain scenes that I won't be able to rehearse at all?

I'll be following this thread with interest.

posted by grumblebee at 7:47 AM on December 17, 2007

Response by poster: Breath: thanks, I'll check that out.

philomathoholic: I don't think the objective function is linear, so simplex won't cut it

posted by jewzilla at 7:49 AM on December 17, 2007

philomathoholic: I don't think the objective function is linear, so simplex won't cut it

posted by jewzilla at 7:49 AM on December 17, 2007

The specific version of job-shop scheduling you're looking at is known as "minimum makespan" scheduling. Classically, this problem comes in two flavours: 1) "identical machines", where each job takes the same amount of time on each machine and 2) "idependent machines", where they don't. Your restriction that only some jobs can be performed by some machines seems too place your version somewhere between these two versions but probably closer to the identical machines version.

A brief Google search seems to suggest that the simple greedy heuristic (due to Graham) of assigning the longest remaining job to the least-loaded worker (aka "least processing time (LPT) scheduling") should provide a good starting point. In the independent machines case, this scheme provides a (4/3)-approximation, (i.e.: the solution it provides is guaranteed to be no worse than 4/3 times the optimal solution). See this (ps file), for a proof. I haven't checked to see what guarantees can be proved in your case where some jobs can't run on some machines.

Also, note that while the problem is NP-complete, there exists a polynomial-time approximation scheme for it, at least in the identical machines case. You can see a sketch of it here. You'll have to go through it in more detail to see if it can be adapted to the restrictions in your case.

posted by mhum at 6:26 PM on December 17, 2007

A brief Google search seems to suggest that the simple greedy heuristic (due to Graham) of assigning the longest remaining job to the least-loaded worker (aka "least processing time (LPT) scheduling") should provide a good starting point. In the independent machines case, this scheme provides a (4/3)-approximation, (i.e.: the solution it provides is guaranteed to be no worse than 4/3 times the optimal solution). See this (ps file), for a proof. I haven't checked to see what guarantees can be proved in your case where some jobs can't run on some machines.

Also, note that while the problem is NP-complete, there exists a polynomial-time approximation scheme for it, at least in the identical machines case. You can see a sketch of it here. You'll have to go through it in more detail to see if it can be adapted to the restrictions in your case.

posted by mhum at 6:26 PM on December 17, 2007

This thread is closed to new comments.

It does seem like the kind of optimization / resource allocation / operations-research problem that was studied extensively in the 50s and 60s but doesn't show up in textbooks much any more. Have you checked Knuth?

posted by hattifattener at 7:03 PM on December 16, 2007