October 26, 2008 8:26 AM Subscribe

Maximizing a function over orthogonal matrices, or: solving polynomial order 2 systems. Help me out or point me to a better forum / listhost.

I have a function f(T) of a matrix that I want to maximize, which is order two in the elements of T. Easy, partial derivatives gives you a linear system. Here's the catch; I also want T to be orthonormal (the dot product of the rows to be zero or one -- the extension of orthogonal to non-square matrices).

The obvious way to maximize a function with constraints is to tack on Lagrange multipliers g(T) = f(T) + sum{lambda[ij](Ti . Tj - delta(i,j) ) }, but now g(T) is order three in parameters, and partial derivatives are order two and I don't know how to solve that system generally.

The dimension of T can be huge, tens of thousands. Let's not suggest numerics unless there's an amazing trick to deal with the dimensionality.

Any ideas? Maximization subject to orthogonality seems like something that should have come up a lot and someone have figured out.
posted by a robot made out of meat to Science & Nature (15 answers total) 2 users marked this as a favorite

I have a function f(T) of a matrix that I want to maximize, which is order two in the elements of T. Easy, partial derivatives gives you a linear system. Here's the catch; I also want T to be orthonormal (the dot product of the rows to be zero or one -- the extension of orthogonal to non-square matrices).

The obvious way to maximize a function with constraints is to tack on Lagrange multipliers g(T) = f(T) + sum{lambda[ij](Ti . Tj - delta(i,j) ) }, but now g(T) is order three in parameters, and partial derivatives are order two and I don't know how to solve that system generally.

The dimension of T can be huge, tens of thousands. Let's not suggest numerics unless there's an amazing trick to deal with the dimensionality.

Any ideas? Maximization subject to orthogonality seems like something that should have come up a lot and someone have figured out.

Yes, it contains terms like T[1,2]*T[9,3] and T[4,7]^2.

posted by a robot made out of meat at 1:27 PM on October 26, 2008

posted by a robot made out of meat at 1:27 PM on October 26, 2008

1. Are you looking for solutions over the complex numbers or real numbers?

2. O(n, R) is compact, so if your function is real-valued, and continuous, (which it is since it is polynomial), then it attains a maximum. So that's good to know.

3. Singular may be able to solve systems of polynomial equations over C. I don't remember exactly what it can do, but it's worth checking out.

posted by metastability at 1:53 PM on October 26, 2008

2. O(n, R) is compact, so if your function is real-valued, and continuous, (which it is since it is polynomial), then it attains a maximum. So that's good to know.

3. Singular may be able to solve systems of polynomial equations over C. I don't remember exactly what it can do, but it's worth checking out.

posted by metastability at 1:53 PM on October 26, 2008

Real values only. Yeah, I know that it has a solution. Finding it is the trick.

Note: I'm willing to accept an approximate answer. I'd have to think about exactly how much tolerance I'm willing to accept from the peak or from my condition.

posted by a robot made out of meat at 3:51 PM on October 26, 2008

Note: I'm willing to accept an approximate answer. I'd have to think about exactly how much tolerance I'm willing to accept from the peak or from my condition.

posted by a robot made out of meat at 3:51 PM on October 26, 2008

Ok, I'm seeing the "Grobner basis method" come up in the documentation for Singular and Maple, so maybe this is The Way. It's activating long dormant parts of my brain responsible for abstract algebra and algebraic geometry, which hurts.

posted by a robot made out of meat at 4:12 PM on October 26, 2008

posted by a robot made out of meat at 4:12 PM on October 26, 2008

Real numbers are harder. You should probably abandon all hope of getting an exact solution. Numerical methods are a better bet.

posted by metastability at 5:15 PM on October 26, 2008

posted by metastability at 5:15 PM on October 26, 2008

You could directly parameterise the X set of all orthonormal matrices of size m x n. Then write f as a function of these parameters and minimise from there.

To parameterise the set X, you could think by analogy with parameterising the set of all pairs (m=2) of orthogonal unit vectors in R^3 (n=3). I think you'll end up with a bunch of products of cos(parameter) and sin(parameter).

I suppose the set X will have dimension (n-1) + (n - 2) + ... + (n - m + 1) = 1/2 (-2 - 3 m - m^2 + 2 n + 2 m n). You directly minimise over those parameters with no need for any constraints.

posted by hAndrew at 10:25 PM on October 26, 2008

To parameterise the set X, you could think by analogy with parameterising the set of all pairs (m=2) of orthogonal unit vectors in R^3 (n=3). I think you'll end up with a bunch of products of cos(parameter) and sin(parameter).

I suppose the set X will have dimension (n-1) + (n - 2) + ... + (n - m + 1) = 1/2 (-2 - 3 m - m^2 + 2 n + 2 m n). You directly minimise over those parameters with no need for any constraints.

posted by hAndrew at 10:25 PM on October 26, 2008

"You could directly parameterise the X set of ..." should be "You could directly parameterise **the set X of** ..."

posted by hAndrew at 10:26 PM on October 26, 2008

posted by hAndrew at 10:26 PM on October 26, 2008

Thanks everyone, I'm pursuing these strategies in parallel after midterms.

The Grobner or Janet basis method is appealing, but I have some apprehension since members on the forums claim trouble with MUCH smaller ideals than the ones I'd be working with.

It's true that I have an analytic gradient and hessian, so it's not inconceivable that straight numeric optimization over g(T) could work even in high dimension. I'm skeptical, but it's worth a shot. Since I have definable tolerance for suboptimal solutions, I might have a hope.

It's true that the rows of an orthonormal matrix are orthogonal elements of the hypersphere, which is defined by the right choice of orienting angles. I'll look into rewriting my matrix in angular form. This would give me advantages in numeric and statistical methods of optimization since I've plugged into the reduced space rather than the larger space implied by the Lagrange method. I'll have to see if the many frequency domain methods will give me anything analytic here. The disadvantage is the loss of computational simplicity of the function, which will now be a sum of products of trig functions rather than monomials.

posted by a robot made out of meat at 5:51 AM on October 28, 2008

The Grobner or Janet basis method is appealing, but I have some apprehension since members on the forums claim trouble with MUCH smaller ideals than the ones I'd be working with.

It's true that I have an analytic gradient and hessian, so it's not inconceivable that straight numeric optimization over g(T) could work even in high dimension. I'm skeptical, but it's worth a shot. Since I have definable tolerance for suboptimal solutions, I might have a hope.

It's true that the rows of an orthonormal matrix are orthogonal elements of the hypersphere, which is defined by the right choice of orienting angles. I'll look into rewriting my matrix in angular form. This would give me advantages in numeric and statistical methods of optimization since I've plugged into the reduced space rather than the larger space implied by the Lagrange method. I'll have to see if the many frequency domain methods will give me anything analytic here. The disadvantage is the loss of computational simplicity of the function, which will now be a sum of products of trig functions rather than monomials.

posted by a robot made out of meat at 5:51 AM on October 28, 2008

Is your function concave? Maximizing a non-concave function usually scales exponentially.

It depends on your function but it's possible that with a few assumptions/restrictions (symmetry, positive-definite-ness, concavity) you could solve this using a convex programming approach, which allows for the restriction that you are optimizing over the space of symmetric orthogonal matrices.

posted by onalark at 7:54 AM on October 31, 2008

It depends on your function but it's possible that with a few assumptions/restrictions (symmetry, positive-definite-ness, concavity) you could solve this using a convex programming approach, which allows for the restriction that you are optimizing over the space of symmetric orthogonal matrices.

posted by onalark at 7:54 AM on October 31, 2008

So, it seems that defining orthogonal vectors on the hypersphere is less easy than I thought. However, you can optimize over skew-symmetric matrices and use the Cayley transform to project into SO(n). Tomorrow I'll see if there's an easy extension for non-square matrices.

posted by a robot made out of meat at 5:31 PM on November 2, 2008

posted by a robot made out of meat at 5:31 PM on November 2, 2008

Also, I am working to see if I can write my orthonormal matrix as the product of householder or givens matrices in an easy way.

posted by a robot made out of meat at 5:40 PM on November 2, 2008

posted by a robot made out of meat at 5:40 PM on November 2, 2008

Do you mean you couldn't find forms for them as a function of the 1/2 (-2 - 3 m - m^2 + 2 n + 2 m n) parameters, or that you could but something else proved difficult after that?

posted by hAndrew at 10:26 PM on November 2, 2008

Yes, writing them as a clean function of the reduced parameter space wasn't clear. That's what I'm hoping to do with the Cayley transformation.

posted by a robot made out of meat at 7:16 AM on November 3, 2008

posted by a robot made out of meat at 7:16 AM on November 3, 2008

For anyone following,

I still don't have a great proof that k-row skew-symmetric matrices inject into the space of k-row special orthonormal matrices, but I have some appealing statistical results. Namely, that out of many random 7x7 orthogonal matrices, if I trim the first k rows I seem to be able to (numerically) find k-row skew-symmetric matrices which Caley transform arbitrarily close to the target.

I may do this for the Givens rotator method of generating orthogonal matrices also; ie can I just use fewer angles and inject into the desired space.

posted by a robot made out of meat at 5:16 AM on November 5, 2008

I still don't have a great proof that k-row skew-symmetric matrices inject into the space of k-row special orthonormal matrices, but I have some appealing statistical results. Namely, that out of many random 7x7 orthogonal matrices, if I trim the first k rows I seem to be able to (numerically) find k-row skew-symmetric matrices which Caley transform arbitrarily close to the target.

I may do this for the Givens rotator method of generating orthogonal matrices also; ie can I just use fewer angles and inject into the desired space.

posted by a robot made out of meat at 5:16 AM on November 5, 2008

This thread is closed to new comments.

_{ij}, you also add an additional equation. So the system should still be, in principle, solvable.When you say that the function is order two in the elements of T, do you mean that it is essentially a polynomial of degree 2 in n

^{2}variables?Honestly, I'm not sure that there is a better way to go about this. Setting each of your partials to zero results in the intersection of n

^{2}hyperplanes inR^{n2}. In general position, these should all intersect in a single point.Of course, this doesn't answer your question. I'm largely brainstorming, but coming up with nothing helpful.

posted by vernondalhart at 10:59 AM on October 26, 2008