Algorithm for accommodating multiple preferences about work groups?
November 25, 2015 3:58 PM   Subscribe

A person is asked to select five other people from among a larger group of, say, 90 people with whom they would particularly like to work. Each of the other 89 people does the same thing. An overseer is charged with forming three groups of 30 people each (give or take one or two people, if that makes it easier) that best accommodates the most preferences. What algorithm would the overseer use?

This is frankly mind buggering, but it also seems like the sort of problem about which smart people would say something like 'oh, you just pipe the output of a linked list into a Bayesian sieve, and here are five lines of python that do exactly that'.
posted by obiwanwasabi to Science & Nature (19 answers total) 1 user marked this as a favorite
 
didn't someone ask this like a month ago? it's a variant of the stable marriage problem. in this case, the stable room-mates problem.

i think.
posted by andrewcooke at 4:03 PM on November 25, 2015


Stable marriage/roommates is about finding a single solution that satisfies a set of constraints; this problem is an optimization problem (satisfy the most preferences). You could certainly express it as an integer programming problem, but I wouldn't be surprised if the problem is big enough that finding the absolute optimal solution is unreasonable.
posted by dfan at 4:47 PM on November 25, 2015


My wild-ass guess is cluster analysis, used in ecology for grouping sample units with similar communities / environment. I'd make a big matrix with rows and columns for each person (analagous to a presence/absence matrix of species at sites).

There are some other multivariate statistical methods that might work, too, but none of these inherently make even-sized groups, you'd get a graph of points mapped to the best solution (can use more than 2D if you need to) and work from there.
posted by momus_window at 5:04 PM on November 25, 2015


I think cluster analysis is going to be tough. For clustering, you'd need some n-dimensional space where each point represents a person, and then you'd find the three good clusters using n-dimensional Euclidean distance as your metric. But here there's no good n-dimensional representation of your sample data (that I can think of), since the data is all just of the form "person A likes person B".
posted by dfan at 5:17 PM on November 25, 2015 [1 favorite]


This sounds like stable roommates with incomplete preference lists (and possibly with ties, it's not clear to me). Stable marriage and stable roommates both have the potential for multiple stable solutions for a given problem instance and the set of all solutions has a lattice structure on which you can place a measure of your choice. You could then find the optimal solution for your measure and come up with a result that is both optimal and stable. Unfortunately, stable roommates problems are not guaranteed to have a stable solution (unlike stable marriage problems), but it's even less likely given the incomplete preference lists of the stated problem. So yeah, I wouldn't go that route.

I actually don't think you need to find the optimal solution, you just need to find a solution that shows you considered their preferences for partners. The following is by no means the most elegant solution or will find an optimal (or possibly near optimal) solution, but it would be quick and easy and is easily explainable. Randomly assign each person to one of three groups and score the solution by the sum of the number of preferred partners each person has in their group (you could additionally weight these scores if the preference lists are ordered). Run this as many times as you are willing to wait for. After each run, compare the current solution to the current best (based on solution score), and keep whichever is better. In the end you'll be able to say you looked at thousands of possible groupings and are using the best of those groupings.
posted by noneuclidean at 5:19 PM on November 25, 2015 [3 favorites]


If you want a fast solution that is pretty good but not optimal, here's something you can code up quickly:
1. Divide your population into three groups of 30 randomly.
2a. Choose two out of your three groups at random.
2b. Choose a random person from each of those two groups.
2c. See if swapping them would increase the global happiness, and swap them if so.
3. Repeat step 2 a million times.

It sounds stupid but it'll get you something reasonable. I am pretty sure that finding the actual optimal solution is NP-hard.
posted by dfan at 5:22 PM on November 25, 2015 [3 favorites]


I'm guessing if you mapped these out in a directed graph on paper, you would see some large clusters and chains (A likes B and C, both B and C like A, D likes A and B) and that some the clusters would be mostly disjoint (E doesn't like A, B, C, or D). You could probably get pretty close just by eyeball, dumping whole "friend groups" into one of the three sets roughly evenly.

Whether trusting people's own preference is a great idea is another question. Sometimes you want to break up the group to get any work done, and sometimes you have to intentionally defeat a clique-ish workplace. Might be a good start all other things being equal though.
posted by ctmf at 5:42 PM on November 25, 2015 [1 favorite]


Clustering.

Or you could brute force it.

You're looking for three self-selecting groups, maximizing the affinities in each group. At 90 people and five choices each, you're looking for the solution that maximizes the number of satisfied choices.

At this size, I think you could iterate through every possible group of 30, and total up the number of satisfied preferences in each, and find the globally optimum solution.

For each of 90 people, you have one of three possible groups, with an upper bound of 90 ^ 3 = 729000 possible assignments (actually less than this, since each group 30±2)
posted by zippy at 5:43 PM on November 25, 2015


For each of 90 people, you have one of three possible groups, with an upper bound of 90 ^ 3 = 729000 possible assignments
You have a choice between 3 alternatives being made 90 times, so it's 390 = 8.73 x 1042, not 903 = 729,000. You do get to divide by 6 because the ID of each group doesn't matter, but that doesn't come near to helping.
posted by dfan at 5:57 PM on November 25, 2015


Actually, because you need each group to be of size 30, you can throw out lots of those 390 alternatives. The total number of possible groupings is only (90 choose 30) * (60 choose 30) / 6 = 1.33 x 1040. Still way way too much to brute force, of course.
posted by dfan at 6:00 PM on November 25, 2015 [2 favorites]


While the suggestions of stable marriage/roommates problem are evocative, I'd like to advocate for a min-cut formulation.

Construct an undirected, edge-weighted graph G as follows. The set of vertices of G represents the set of people. Construct an edge between vertices X and Y if either person X wants to work with person Y or vice versa (or both); the weight of that edge is 2 if both people want to work with each other and 1 otherwise.

Then, I claim that the stated problem is equivalent to finding a partition of the vertices G into three parts such that: a) each partition is 1/3 of the total vertices and b) the sum of the weights of the edges that cross partitions is minimized. The basic observation is that maximizing the number of satisfied requests is the same as minimizing the number of unsatisfied requests.

Note that this problem differs from the standard, classical min-cut problem in two specific ways: 1) the size of each partition is prescribed (albeit loosely?), and 2) there are three partitions, not two. For the latter condition, we have the relatively standard min k-cut problem. In particular, here is a reference to a paper by Burlet and Goldschmidt (googlecache version of the paper) that provides an exact algorithm for the minimum 3-cut problem. However, I don't think it specifically addresses the equal size constraint. But, maybe this is enough to get you started for further research?
posted by mhum at 7:02 PM on November 25, 2015


This is ourside the parameters of your stated problem, but just an idea to consider... if you give everyone a set of votes but allow them to distribute as they wish, and allow them to also give negative votes (thus increasing their positive votes), you ll be more likely to get groups that meet people's desires. A side issue (benefit?) with this revised system is you can easily discover who the org would be better off without alltogether.
posted by Meatbomb at 7:22 PM on November 25, 2015


Ah yes, I got the math wrong.
posted by zippy at 8:07 PM on November 25, 2015


A relatively simple clustering approach seems like it could be a reasonable heuristic:

Pick three people to assign to the initial groups (ideally with minimal preference overlaps.) Then go to group 1 and find the person who increases the the number of connections in that group by the highest amount possible and add them. Repeat with group 2, then 3, and then 1 again until all people have been assigned.

This is obviously not going to be optimal. But it's the equivalent of each group taking turns to pick the next best person to work with from the remaining pool.
posted by mark k at 10:50 PM on November 25, 2015


Wow, lots of really great answers here. I came back to post some more potential solutions that I had thought of in the past several hours and see that they've all already been posted. Honestly, I think you can probably pick whichever one of these you like the best and can implement easily enough and It will be good enough for what you need. For the graph theoretic ones, I personally like the NetworkX module for Python. It has a lot of great built in graph functions and algorithms and has made implementing some graph-based optimization models I've done very easy.
posted by noneuclidean at 6:11 AM on November 26, 2015


I just wrote some Python code to run this algorithm on random data (yours is probably more structured if this is an actual problem and not theoretical).

Initial "happiness" (number of preferences met) was generally in the 140-160 range (you'd expect about 90 * 5 / 3 = 150, so that makes sense). Final happiness was usually in the 260s, and stopped increasing by the time it had run 100,000 iterations (total time about 3 seconds).

MeMail me if you'd like the code.
posted by dfan at 7:21 AM on November 26, 2015


I'm not nearly as smart as the other folks already here, but I haven't seen any mention of using a Borda vote. Simply have people rank their top X candidates, then MATH and boom you got your 30 people. Logistically, this might work better than some of the solutions above. (Although I now hope that someone will point out why it's no good here, or in fact identical to something already proposed.)
posted by intermod at 7:44 AM on November 26, 2015


I'm not nearly as smart as the other folks already here, but I haven't seen any mention of using a Borda vote.
It's not clear to me how to characterize this particular problem in the form of a vote; we don't have a selection of candidates from which we need to choose a winner.
posted by dfan at 8:03 AM on November 26, 2015


Thanks all.

For background, schools in my area do a thing at the end of the year where they ask parents to nominate five classmates for their child. Teachers then apparently use MATH to turn this information into EVIDENCE-BASED DECISIONS that take account of COMMUNITY INPUT THANK YOU FOR PARTICIPATING IN THIS IMPORTANT SURVEY.

As a typical clueless dad I had no idea this was going on until my wife, who also works at a school, asked whether it would be possible to make an app that would make this easy. My first thoughts were 'are you crazy? That's impossible. Any teacher who says they can do this is full of it.' My second thoughts were 'you are not so smart'.

Looks like I might have been right anyway, if only by accident, unless they've got dfan's code.
posted by obiwanwasabi at 2:57 AM on December 11, 2015


« Older DIY Gift ideas for engineer employees?   |   The holiday question you've been waiting for: Cat... Newer »
This thread is closed to new comments.