Comments on: How can I best visualize the results of my combinatorial optimization problem?
http://ask.metafilter.com/104382/How-can-I-best-visualize-the-results-of-my-combinatorial-optimization-problem/
Comments on Ask MetaFilter post How can I best visualize the results of my combinatorial optimization problem?Thu, 16 Oct 2008 07:14:01 -0800Thu, 16 Oct 2008 07:14:01 -0800en-ushttp://blogs.law.harvard.edu/tech/rss60Question: How can I best visualize the results of my combinatorial optimization problem?
http://ask.metafilter.com/104382/How-can-I-best-visualize-the-results-of-my-combinatorial-optimization-problem
DataVisualizationFilter: How can I best visualize the results of my combinatorial optimization problem? I'm trying to solve an n-dimensional discrete optimization problem using some metaheuristic algorithms. More details behind the cut. <br /><br /> Imagine you have M resources (R1, R2, ... , Rm) and N tasks (T1, T2, ..., Tn). There are currently no constraints on which resources you can use for what task, or how many tasks a single resource can satisfy. Any given configuration will be evaluated on two objectives, cost and value. You're trying to minimize cost and maximize the value of your solution.<br>
<br>
Sample solution:<br>
<br>
R1: T1, T2, T3<br>
R2: T4<br>
R3: T5, T6<br>
R4: T7, T8, T9<br>
R5: T10<br>
<br>
This solution yields a Cost of .2 (on a 0-1 scale) and a Value of .8 (on a 0-1 scale).<br>
<br>
I can currently plot my solutions in objective space as a scatter plot where each axis is each objective and then solution is an (x,y) point in that space. I'm currently comparing a random sampling of solutions to my metaheuristic.<br>
<br>
I'm interested in other solutions to plot in the decision space. For example, if I do a random sampling of my possible solutions, do good solutions match some criteria? If R4 performs T7, does it matter who satisfies T10 or will I get around the same region in objective space? Are there building blocks of solutions (R4 performs T7, R5 performs T1, T2, T3) that yield the best results? Is one solution brittle because all of its neighbors quickly drop off in satisfying objectives while another solution is robust because its neighbors are more stable?<br>
<br>
Ideally, I'd setup axes for each dimension of the solution (each resource or each task) and then plot the solution against that axes, but since the values are discrete, there's no relationship to the order. Furthermore, n-axes is going to be mighty difficult to visualize.<br>
<br>
I feel like what I'm trying to do must be solved in either the data visualization field or statistics. Is there some clustering algorithm I could use? Is there a way to reduce the dimensionality? Something like k-means. nearest neighbor? The best euclidian distance I can have between two solutions is something like Hamming distance (how many matches there are).<br>
<br>
Guidance would be much appreciated!post:ask.metafilter.com,2008:site.104382Thu, 16 Oct 2008 05:07:24 -0800miasmadatavisualizationoptimizationalgorithmsBy: FastGorilla
http://ask.metafilter.com/104382/How-can-I-best-visualize-the-results-of-my-combinatorial-optimization-problem#1509925
Not a solution in the least but more of suggestion so you change your approach:<br>
<br>
I find it strange that you are using a scatter plot as a way to solve this. Normally for optimization problems, you use linear programming or non-linear programming optimization. http://en.wikipedia.org/wiki/Nonlinear_programming Random sampling will never prove an optimal solution after all. There are some free programs to download to do LP and NLP. <br>
<br>
If you are convinced you want to do this random sampling, you could always look for solutions that _dominate_ other solutions. By this I mean, solutions that are better in all evaluation criteria than others, the others are discarded and you continue until you are left with. If there is a boundary of solutions that are dominated/not dominated, I guess a really, really extensive random sampling might reveal this. On the other hand, this is just a real shot in the dark. <br>
<br>
I think that Nonlinear programming is your best bet.comment:ask.metafilter.com,2008:site.104382-1509925Thu, 16 Oct 2008 07:14:01 -0800FastGorillaBy: FastGorilla
http://ask.metafilter.com/104382/How-can-I-best-visualize-the-results-of-my-combinatorial-optimization-problem#1509930
Not a solution in the least but more of suggestion so you change your approach:<br>
<br>
I find it strange that you are using a scatter plot as a way to solve this. Normally for optimization problems, you use linear programming or non-linear programming optimization. http://en.wikipedia.org/wiki/Nonlinear_programming Random sampling will never prove an optimal solution after all. There are some free programs to download to do LP and NLP. <br>
<br>
If you are convinced you want to do this random sampling, you could always look for solutions that _dominate_ other solutions. By this I mean, solutions that are better in all evaluation criteria than others, the others are discarded and you continue until you are left with only the non-dominated. If there is a boundary of solutions that are dominated/not dominated, I guess a really, really extensive random sampling might reveal this. On the other hand, this is just a real shot in the dark. <br>
<br>
I think that Nonlinear programming is your best bet.comment:ask.metafilter.com,2008:site.104382-1509930Thu, 16 Oct 2008 07:16:48 -0800FastGorillaBy: sapere aude
http://ask.metafilter.com/104382/How-can-I-best-visualize-the-results-of-my-combinatorial-optimization-problem#1510031
Seconding FastGorilla; the first thing I thought of was linear programming as well. I can't speak for the nonlinear approach. The result of LP should be a shape or surface representing all possible solutions, and then it's just a matter of picking the vertex that represents the optimal combination of cost, value, whatever your objective functions are. <br>
<br>
Your comment about finding good classes of solutions also suggests <a href="http://en.wikipedia.org/wiki/Genetic_algorithm">genetic algorithms</a>, or letting good solutions "mate" and exchange elements of structure to see if it creates a great solution. The GA idea of schemas, or a family of related solutions, corresponds with your desire to see if a general class of solutions (such as R2 perfoming T3) regularly outperforms other solutions. If this is a LP type of problem, though, then I'd imagine GA is going to be a lot of work to set up for little payoff. In any case, good luck!comment:ask.metafilter.com,2008:site.104382-1510031Thu, 16 Oct 2008 08:47:57 -0800sapere audeBy: losvedir
http://ask.metafilter.com/104382/How-can-I-best-visualize-the-results-of-my-combinatorial-optimization-problem#1510060
Hmmm... doesn't it matter <i>how</i> the costs and values are arrived at? I would think the answers to your questions depend on whether it's just some black box you put a solution in and out pops a (Cost, Value), versus something where you see how each particular Resource-Task match-up contributes to the overall cost and value. For example:<br>
<br>
<i>If R4 performs T7, does it matter who satisfies T10 or will I get around the same region in objective space?</i><br>
<br>
I don't see how this could be known without more details about how the costs and values are calculated.comment:ask.metafilter.com,2008:site.104382-1510060Thu, 16 Oct 2008 09:09:30 -0800losvedirBy: miasma
http://ask.metafilter.com/104382/How-can-I-best-visualize-the-results-of-my-combinatorial-optimization-problem#1510272
Oh, I think I've confused some people.<br>
<br>
I agree with you all about how to solve the optimization problem. LP and NLP are definitely tasks. And it's actually a genetic algorithm-based approach that I'm using as a metaheuristic.<br>
<br>
The random sampling is just a way to compare this approach to other approaches.<br>
<br>
What I'm really interested in is <strong>visualizing the results</strong> of these generate and test algorithms. I'd like to see what sort of results have been generated in the decision space. The scatterplot gives me an idea of what it looks like in objective space.comment:ask.metafilter.com,2008:site.104382-1510272Thu, 16 Oct 2008 11:42:54 -0800miasmaBy: mhum
http://ask.metafilter.com/104382/How-can-I-best-visualize-the-results-of-my-combinatorial-optimization-problem#1510720
You might trying applying <a href="http://en.wikipedia.org/wiki/Factor_analysis">Factor Analysis</a> or the closely related <a href="http://en.wikipedia.org/wiki/Principal_components_analysis">Principal Component Analysis</a> to your decision space. You can specify each solution as a point in {1, 2, ..., m}<sup>n</sup>. The idea of these two techniques is to take your set of solutions as a cloud of points in n-dimensional space, rotate it around, and project it onto a lower-dimensional space (say, 2-dimensional) so that the lower-dimensional projection (or shadow) is somehow meaningful. Each axis in the lower-dimensional space would be a linear combination of the original dimensions (in your case, tasks). This may or may not have meaning for you, but these kinds of analyses are pretty standard.comment:ask.metafilter.com,2008:site.104382-1510720Thu, 16 Oct 2008 17:49:31 -0800mhumBy: miasma
http://ask.metafilter.com/104382/How-can-I-best-visualize-the-results-of-my-combinatorial-optimization-problem#1511179
@mhum<br>
<br>
I think this is more on the right track, thanks. Could you provide an example, perhaps? I'm still unclear as to how I can put discrete points on an axis. R1 cannot really be related to R2 in terms of T1. I think I'm just still confused about how to apply Factor Analysis or PCA to this sort of problemcomment:ask.metafilter.com,2008:site.104382-1511179Fri, 17 Oct 2008 08:09:38 -0800miasmaBy: mhum
http://ask.metafilter.com/104382/How-can-I-best-visualize-the-results-of-my-combinatorial-optimization-problem#1511445
On second thought, since your data is categorical, I'm not so sure that a straight-up PCA (in the formulation I proposed) would be the way to go here. Any assignment of these discrete variables (i.e.: resources) to an axis would be largely arbitrary. There appears to be some kind of version of PCA for categorical variables, but I don't know anything at all about it. There's also something called "joint correspondence analysis", but I know even less about that.<br>
<br>
However, we might be able to do something slightly different. Instead of trying to directly visualize the solution space, perhaps we can try visualizing <i>properties</i> of the solution space. For example, for each solution, we can form a vector (w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>m</sub>) where w<sub>k</sub> is the number of tasks assigned to resource k (or maybe w<sub>k</sub> is the total costs or total value associated with tasks assigned to resource k). By formulating the solution space this way, we get vectors in R<sup>m</sup> instead of some discrete space and thus you can apply the usual PCA-type analyses directly. Granted, you lose the task-specific information (e.g.: does task 1 always get assigned to resource 3?), but you might be able to tease out some better information this way. <br>
<br>
In any case, if your problem is more about visualizing your decision space rather than drawing analytic conclusions, there are maybe other less rigorous options. For example, let's say you wanted to compare some set of solutions to a single reference solution. Assign a color to each task according to which resource it was assigned to in the reference solution (maybe you can assign different gradations of the same color for different tasks assigned to the same resource). Then, draw each solution as a sequence of color strips, each strip corresponding to a resource, each entry on the strip corresponding to a task. This will only work if you don't have unmanageably many resources. If this is unclear (which I'm pretty sure it is) I might be able to whip up an example. <br>
<br>
Another possibility for visualization is <a href="http://en.wikipedia.org/wiki/Multidimensional_scaling">multidimensional scaling</a>. The idea of this technique is this: given a set of things and a similarity function between any pair of things, we want to plot the set in two (or maybe three) dimensions such that the Euclidean two-dimensional distance between any two plotted points more or less corresponds to the similarity function distance between the underlying things. In your case, the things are your solutions and the similarity function would be some kind of distance metric between partitions (a description of three such measures can be found <a href="http://cedric.cnam.fr/PUBLIS/RC405.pdf">here</a>; it looks like the <a href="http://en.wikipedia.org/wiki/Rand_index">Rand index</a> is the easiest). This kind of analysis can sometimes show clusters of things that wouldn't have been apparent otherwise. I know there exist (or existed) free software to do this because I've downloaded it before, but I'll be damned if I can find it again.comment:ask.metafilter.com,2008:site.104382-1511445Fri, 17 Oct 2008 11:25:23 -0800mhumBy: mhum
http://ask.metafilter.com/104382/How-can-I-best-visualize-the-results-of-my-combinatorial-optimization-problem#1511574
You might want to check out <a href="http://www.ggobi.org/">GGobi</a>. It looks like it has a bunch of tools for exploring high-dimension data sets.comment:ask.metafilter.com,2008:site.104382-1511574Fri, 17 Oct 2008 12:49:40 -0800mhum