Maximum sum of numbers, one from each set, with constraints
June 23, 2014 10:16 PM   Subscribe

Working on a personal project, I am running into a number of math problems of the kind described within. I am not a math expert, so I don't know what to call these kinds of problems, so I don't know how to search for information about them.

Here is an example of the kind of problem I am trying to find the answer to.

There are ten sets of numbers with three numbers each (X, Y, Z).
I want to find the greatest possible sum of (for example) 4 Xs, 2 Ys, and 1 Z
BUT: (for example) If an X is chosen from a set, the Y and Z value from that set cannot be chosen.

My questions are:
Is there a name for these kinds of problems?
Is there a general method to make sure I always find the greatest possible sum, that will work even when the details change? Sometimes I have more sets, sometimes there are more numbers within each set, sometimes more than one number can be taken from inside a set, each problem is different, but I am always trying to calculate the greatest possible sum.

If I haven't been clear, please ask questions, and I will do my best to explain.
posted by moonroof to Grab Bag (11 answers total) 2 users marked this as a favorite
The most brain-dead, but guaranteed accurate way possible would be to sum every possible permutation and sort.
posted by empath at 10:30 PM on June 23, 2014

I think these are generally called optimization problems, but that's a name that covers a lot of ground. Still, it might be a useful search term for you.

I would see if dynamic programming is applicable to your problems. It's a technique that's fallen into half-obscurity, but it's really good at certain classes of search or optimization problems.
posted by hattifattener at 10:31 PM on June 23, 2014

The most brain-dead, but guaranteed accurate way possible would be to sum every possible permutation and sort.

That approach works if the number of sets is small, but it doesn't scale well as the number of sets grows. (It scales at least as the square, and it may scale as the factorial.)

As a more general solution, this would be amenable to a genetic algorithm. It can't guarantee to find God's Answer, but it would find one of the best within a manageable amount of time.
posted by Chocolate Pickle at 10:47 PM on June 23, 2014

That approach works if the number of sets is small

Well his example problem was small.
posted by empath at 10:54 PM on June 23, 2014

Sorry, but why wouldn't you just choose the largest number from each set and sum those numbers?
posted by ethidda at 12:50 AM on June 24, 2014

I think it is because you only have a certain budget of X's Y's and Z's so you can't just choose the biggest numbers.
(49995, 50003, 50004)
(15, 35, 45)
(22, 26,18)

So if you could have only 1 X, 1 Y and 1 Z
you'd end up picking the smallest number in Set 1 for the optimal result?
posted by Just this guy, y'know at 4:42 AM on June 24, 2014

What do you need this for? By far the simplest way of doing this sounds like writing a bit of code that finds the brute force solution a la empath. Is there a reason why you can't do that?
posted by Ned G at 5:00 AM on June 24, 2014

By far the simplest way of doing this sounds like writing a bit of code that finds the brute force solution a la empath. Is there a reason why you can't do that?

I don't know how to program. But this is not the first time the solution to a question of mine has been "learn programming," so maybe I should.
posted by moonroof at 8:48 AM on June 24, 2014

Dealing with any type of data these days requires at best moderate ability in statistical programming languages. Well, if you want to be efficient enough to produce final products that any academy or firm would care about. I ran into that problem myself having finished undergrad as an econ major who couldn't program. Since then I've taught myself a ton of programming in R/Stata, and it's really incredible how 10 lines of code can turn 10 hours of work into 20 minutes (but the first time it will turn 10 hours of work into 20 hours of coding :P)
posted by jjmoney at 9:18 AM on June 24, 2014

There are many freely-available solvers for combinatorial optimization problems, so you won't necessarily need to write your own. Each is typically written to solve a general class of problems specified in a particular input format. They will solve your problems exactly, and they are fast -- much faster than anything you would write yourself. If you can model your problem in a particular format, you can use an existing solver to solve it.

For your problems, Pseudo-Boolean Optimization (PBO) fits rather well (though you could model it as any NP-Hard optimization problem, with enough work). A recent competition evaluated more than 30 different solvers to determine which were most efficient, but it is likely that your problems are simple enough to not require the fastest solver out there.

The input format used by those solvers is formally specified as a BNF grammar, but there are many simple examples in that document as well. The "trick" for you is just to figure out how to model your problem as a set of Boolean variables and linear inequalities.

Let's take your example of 10 sets of (X, Y, Z) values. Give each value its own Boolean variable, so X1, Y1, Z1, X2, Y2, Z2, etc... Each variable is a Boolean (0/1, True/False) indicating whether the corresponding value is chosen or not. So if X1 is set to 1, that means you're using X from the first set in your sum. Your goal is to find an assignments for all of the variables that satisfies certain constraints and maximizes the corresponding sum.

Note that X1=1 implies Y1=0 and Z1=0, given that you can't choose more than one value from each set of three. This gives us a constraint for every set, in the form required for PBO solvers:

1 x1 +1 y1 +1 z1 = 1;

I.e., x1+y1+z1=1, or exactly one of X1, Y1, and Z1 is chosen. (Note that the format actually requires all variables to be x followed by an integer, but you can easily rename your variables to that form and keep a mapping back to the original meanings).

Then you have the constraint that you can take 4 X values, 2 Y values, etc. Each of those is a constraint itself:

1 x1 +1 x2 +1 x3 [...] +1 x10 = 4;
1 y1 +1 y2 +1 y3 [...] +1 y10 = 2;

And finally, you need to provide an objective function to maximize, which is where the actual numeric values come in. Here, I'll use the values (1,2,3) and (5,6,7) as the (X,Y,Z) values for the first two sets. Note that the PBO format only allows for minimizing an objective function, but we can just flip the signs:

min: -1 x1 -2 y1 -3 z1 -5 x2 -6 y2 -7 z2 [...] ;

I've left out a few other details of the format, such as the fact that the objective function has to come first, but the format document and its examples should cover it. And caveat emptor, I may have gotten something wrong in the model I've suggested.

So once you have a text file containing your problem instance encoded in that way, you just have to give it to a solver. Most solvers will output the optimal assigment to your variables, and you can then quickly translate that back to the set of values that give you your maximum sum.

The solvers are typically written as unix command-line programs and provided as source code, but SCIP, for example, provides binaries for Windows, Mac, and Linux. You'll probably want to get the statically-linked version, as it should not rely on you having [as much] other software installed on your system to run. I'm afraid it will still have a bit of a learning curve, but it will be less than learning how to program. And fwiw, SCIP can do a lot more than just solve PBO problems; it's a general constraint optimization framework.

Feel free to memail me if you have questions.

And about learning to program... I strongly recommend learning how to program in general, as it is a useful skill across many different domains. And to write a brute-force solver for your problem here would be accessible to a beginner after getting the basics of programming figured out (I'd suggest Python, fwiw). Just beware that it won't scale to larger problem sizes. If you have just ten sets of values, then a simple approach should work fine. If you have fifty or one hundred sets of values, your brute force solver will likely never finish. Solvers like SCIP, on the other hand, are highly optimized by a bunch of researchers who have spent years of their lives developing efficient algorithms for solving these problems, and those solvers probably can handle much larger problems without much trouble.
posted by whatnotever at 11:18 AM on June 24, 2014 [2 favorites]

« Older London comedy club suggestion for fan of Mighty...   |   What do I do in Bangkok when I can't really DO... Newer »
This thread is closed to new comments.