Help me understand homogenous coordinates
October 22, 2009 7:37 PM

Help me understand homogenous coordinates!

I encountered homogenous coordinates in computer graphics, where I understand that they let us translate vertices. But this is a rather shallow level of understanding, and I don't really get why, for instance, the position of the light source in OpenGL is necessarily specified with (x, y, z, w). Is it kind of like how there is an infinite number of points that can correspond to a single point on a 2D image at (x, y), but by specifying a third dimension, we can disambiguate between the different points?

I've tried Googling and I've tried reading Wikipedia -- but to no avail. I'm really hoping that somebody in the hive mind can explain it at a level that I can understand, and also why homogenous coordinates are important on a broader level. Thanks!
posted by tickingclock to Science & Nature (14 answers total) 4 users marked this as a favorite
Kind of pat answer: because they allow transformations to be expressed as matrices. (quick before geocities is deleted in 4 days!)
posted by Monday, stony Monday at 8:00 PM on October 22, 2009

Best answer: In a transformation matrix, the 1 [explicit or implicit] in the W position of a vertex is what gets multiplied by the translation factors, to contribute the translation. If W is 0, then the translation portion of the matrix will not be applied. So, when the GL pipeline transforms the light position by the modelview matrix (which puts it into eye space), if there is a 0 in the W component of the light's position, the translation of that matrix (i.e., the model's position) will have no effect on the light's position in space, so the light is effectively infinitely far away. This can be thought of as pretty much independent of the way W is used later in the pipe to control perspective division.
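To make the role of W concrete, here is a minimal sketch in plain Python. The helper names `mat_vec` and `translation` are made up for illustration, and the matrix uses the column-vector convention (translation in the last column); this is not OpenGL's actual code.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    """4x4 translation matrix, column-vector convention (hypothetical helper)."""
    return [
        [1, 0, 0, tx],
        [0, 1, 0, ty],
        [0, 0, 1, tz],
        [0, 0, 0, 1],
    ]

T = translation(10, 20, 30)

point = [1, 2, 3, 1]      # w = 1: a position, picks up the translation
direction = [1, 2, 3, 0]  # w = 0: a direction, ignores the translation

moved_point = mat_vec(T, point)          # [11, 22, 33, 1]
moved_direction = mat_vec(T, direction)  # [1, 2, 3, 0]
```

The last column of `T` is multiplied by w, so with w = 0 the translation terms drop out entirely.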

Homogeneous coordinates seem to mathematically work out surprisingly well for computer graphics. Jim Blinn, in his excellent book "A Trip Down the Graphics Pipeline", devotes a chapter to talking about how neat they are.
posted by blenderfish at 8:02 PM on October 22, 2009

Homogeneous coordinates allow translations to be expressed in a 4x4 matrix, along with rotations and scaling (which on their own can be expressed in a 3x3 matrix).

Once you do this you can stack up translations and use them all together and construct things like scene graphs. You can also implement 4x4 vector processors all over your graphics hardware.
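As a rough sketch of what "stacking up" looks like, the plain-Python example below multiplies a parent transform and a child transform into one matrix, the way a scene graph might. The names `local_to_parent` and `parent_to_world` and the helper functions are hypothetical, and the column-vector convention is assumed.

```python
def mat_mul(a, b):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """Multiply a 4x4 matrix by a 4-component column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def scaling(sx, sy, sz):
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]

# Child node: offset by 2 along x relative to its parent.
local_to_parent = translation(2, 0, 0)
# Parent node: scale by 2 first, then move to (100, 0, 50) in the world.
parent_to_world = mat_mul(translation(100, 0, 50), scaling(2, 2, 2))

# One combined matrix replaces the whole chain.
local_to_world = mat_mul(parent_to_world, local_to_parent)

vertex = [1, 1, 1, 1]
step_by_step = mat_vec(parent_to_world, mat_vec(local_to_parent, vertex))
all_at_once = mat_vec(local_to_world, vertex)  # [106, 2, 52, 1]
```

Both routes give the same result, which is why a renderer can precompute one matrix per object instead of applying each transform separately.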
posted by sien at 8:09 PM on October 22, 2009

This is the Wikipedia article that explains the need for homogeneous coordinates. You can't represent 3D translation with a 3x3 matrix, but you can with a 4x4 matrix. In fact, you can represent all affine transformations in 3D with a 4x4 matrix. That's what makes homogeneous coordinates so useful.

For most uses of homogeneous coordinates, you just put a 1 in the w coordinate so everything works out OK. However, with specifying OpenGL lights, the w coordinate has a special meaning. w=0 means a directional light, while w=1 means a point light. This special usage is explained in the spec, but you can probably understand it better from the Red Book.
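A hedged sketch of how a shader or software renderer might interpret the two cases: the function `direction_to_light` below is hypothetical, written in plain Python, and only illustrates the w=0 versus w=1 distinction described above rather than reproducing OpenGL's internals.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return [c / length for c in v]

def direction_to_light(light, surface_point):
    """Unit vector from a surface point toward a light given as (x, y, z, w)."""
    x, y, z, w = light
    if w == 0:
        # Directional light: same direction everywhere, the surface point is irrelevant.
        return normalize([x, y, z])
    # Point light: an actual position, so the direction depends on the surface point.
    light_pos = [x / w, y / w, z / w]
    return normalize([l - p for l, p in zip(light_pos, surface_point)])

sun = [0, 0, 1, 0]    # directional light along +z
bulb = [0, 4, 0, 1]   # point light at (0, 4, 0)

from_sun_a = direction_to_light(sun, [5, 5, 0])
from_sun_b = direction_to_light(sun, [-9, 2, 7])
from_bulb = direction_to_light(bulb, [3, 0, 0])
```

Here `from_sun_a` and `from_sun_b` are identical even though the surface points differ, while `from_bulb` changes with the surface point.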
posted by demiurge at 8:32 PM on October 22, 2009 [1 favorite]

It all comes from Quaternions!
posted by Monday, stony Monday at 8:46 PM on October 22, 2009 [1 favorite]

The math for manipulating quaternions is quite different from that used for homogeneous coordinate (x, y, z, w) vectors. Although learning about quaternions can certainly be helpful for computer graphics, it will confuse anyone trying to get a basic grasp of homogeneous coordinates.
posted by demiurge at 9:21 PM on October 22, 2009

It's also not unheard-of to leave off the W (which is always 1) and the bottom row of the transformation matrix (which is always 0 0 0 1 for an affine transformation, if you write points as column vectors) and simply use a 3x4 affine transformation matrix, or for 2d graphics, a 2x3 matrix. The concept of homogeneous coordinates never appears in this case, but the actual math that you do is identical.
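A small sketch of that equivalence for the 2D case, in plain Python. The function names are hypothetical; the 2x3 matrix is assumed to hold the linear part in its first two columns and the translation in its third.

```python
def apply_affine_2x3(m, p):
    """Apply a 2x3 affine matrix to a 2D point, with no W anywhere."""
    (a, b, tx), (c, d, ty) = m
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)

def apply_homogeneous_3x3(m, p):
    """Apply a 3x3 matrix to the homogeneous point (x, y, 1), then divide by w."""
    v = (p[0], p[1], 1)
    out = [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]
    return (out[0] / out[2], out[1] / out[2])

# Rotate 90 degrees counterclockwise, then translate by (5, 7).
affine = [[0, -1, 5],
          [1,  0, 7]]
# Same transformation with the implicit bottom row written out.
homogeneous = affine + [[0, 0, 1]]

short_form = apply_affine_2x3(affine, (2, 3))           # (2, 9)
long_form = apply_homogeneous_3x3(homogeneous, (2, 3))  # (2.0, 9.0)
```

The arithmetic is the same in both functions; the 2x3 form just skips the multiplications whose result is known in advance.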

Either way, the benefit is that you can treat all parts of the transformation uniformly, which makes everything easier.
posted by hattifattener at 10:15 PM on October 22, 2009

Yeah, I think demiurge is right, and w=0 is special for lights in OpenGL, even after the initial transform. But, at least the transform that OpenGL does by modelview when you call glLight doesn't have to have any special case code to care about whether W=0. (which is an example of what hattifattener just said.)
posted by blenderfish at 11:05 PM on October 22, 2009

Best answer: Drop down a dimension, and think about R^2. You can think about taking your copy of R^2 and embedding it at z=1 in R^3. If you do this, then every point (x,y) in the plane corresponds to the point (x,y,1) in 3-space. Ok, so why would you do this? Well, affine transformations (e.g. transformations) are a pain to implement with matrices, because you can't just do matrix multiplication. But: if you think of your points as being in 3-space, then you can implement the 2D translation using 3x3 matrices, as a linear transformation (i.e. matrix multiplication) on those points in the z=1 copy of the plane you've embedded in 3-space. Or, another nice trick: if you view the 2D points as living in the z=1 plane in 3-space, then to determine if three points are collinear all you have to do is compute a determinant, which is easy.
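The collinearity trick can be sketched in a few lines of plain Python. The function names `det3` and `collinear` are made up for this example, and the exact comparison with zero assumes integer (or otherwise exact) coordinates; with floating-point input a small tolerance would be needed instead.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def collinear(p, q, r):
    """True if 2D points p, q, r lie on one line.

    Each point (x, y) is lifted to (x, y, 1). The three lifted vectors are
    linearly dependent exactly when the original points are collinear, and
    linear dependence shows up as a zero determinant.
    """
    lifted = [[p[0], p[1], 1], [q[0], q[1], 1], [r[0], r[1], 1]]
    return det3(lifted) == 0

on_a_line = collinear((0, 0), (1, 1), (2, 2))      # True
not_on_a_line = collinear((0, 0), (1, 1), (2, 3))  # False
```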

But, the way you get the linear algebra to work is that those (x,y,1) points are simply nice representatives of a line through the origin. But any other scalar multiple of the vector would work just as well, so instead of (x,y,1) you could use (ax,ay,a) for any nonzero a. And there's the homogeneous coordinates.

(ok, really what you're doing is working in the projective plane, so that your Points are lines through the origin and your Lines are planes through the origin.)
posted by leahwrenn at 12:29 AM on October 23, 2009 [1 favorite]

Gah. Affine transformations, e. g. translations...
posted by leahwrenn at 12:30 AM on October 23, 2009

Best answer: I don't know if this will also be helpful (last night I was on an iPhone so I couldn't type so much), but here goes. I'm going to be talking some more about the plane R^2 embedded in three-space R^3, just because it's easier to visualize things, but nothing changes if you kick things up a dimension and think about 3-space embedded at w=1 in R^4 (except that all visualization goes away). This would be easier with pictures.

We can model the extended Euclidean plane (that is, the ordinary Euclidean plane plus "points at infinity" and "lines at infinity") by considering the plane z=1, that is, all points of the form (x, y,1). Every point in the ordinary Euclidean plane uniquely corresponds to a point (x,y,1) in the z=1 plane. In order to do linear algebra, however, it is easier to consider those points as being the places where lines that pass through the origin poke through that z=1 plane.

But that leaves some lines we haven't discussed, namely, the ones which lie in the x-y plane in 3-space. We would like to be working with all lines through the origin, so that nothing is privileged. Ok, so we will include those, also. It turns out that those lines, the ones with z=0, correspond in the extended Euclidean plane model to "points at infinity".

So, how does this relate to homogeneous coordinates and directional vs. point light sources? Well, we are going to consider two Points (as given by homogeneous coordinates) to be the same if they differ by a non-zero scalar. That is, (x,y, 1) and (ax, ay, a) are the same point, and similarly, (x,y,0) and (bx, by, 0) are the same point. This is because we really are thinking of them as generating the same line through the origin. The points (x,y,0) are "points at infinity" and the points (x,y,1) are not.
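Here is a minimal plain-Python sketch of that equivalence. The function names are hypothetical. Two nonzero triples are scalar multiples of each other exactly when their cross product is the zero vector, which gives a simple test; the exact comparison assumes integer coordinates.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def same_point(u, v):
    """True if nonzero triples u and v span the same line through the origin."""
    return cross(u, v) == (0, 0, 0)

def is_at_infinity(u):
    """Points with last coordinate 0 are the 'points at infinity'."""
    return u[2] == 0

def to_euclidean(u):
    """Project an ordinary homogeneous point back to (x, y) by dividing by w."""
    x, y, w = u
    if w == 0:
        raise ValueError("a point at infinity has no Euclidean coordinates")
    return (x / w, y / w)

finite_match = same_point((2, 3, 1), (4, 6, 2))      # True: (ax, ay, a) with a = 2
infinite_match = same_point((1, 1, 0), (-3, -3, 0))  # True: (bx, by, 0) with b = -3
mismatch = same_point((2, 3, 1), (2, 3, 0))          # False: finite vs. at infinity
recovered = to_euclidean((4, 6, 2))                  # (2.0, 3.0)
```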

Why is this good? Well, thinking about vectors in R^3 allows us to leverage some considerable linear-algebraic muscle. For example, it's easy to determine if a vector is in the span of some other vectors (corresponding to points in R^2 being collinear). Rotations and reflections and translations in R^2 are easy to implement as matrix multiplications in R^3. So the idea is that you do all your vector manipulation up a dimension, where things are easy, and then project back down to your original dimension when you're done. (and then for computer graphics, do a further projection onto the screen.)

As for the directional lights: if you want a light coming uniformly from some direction, you can think of it as a point source coming from infinitely far away. (Think about the sun: it appears to us as though the sun's rays are hitting the earth all parallel to each other, rather than radiating out from a single point.) The nice thing about the projective plane model is that we know how to deal with this. Say you want a light source that is coming from infinitely far away in the direction of the line y=x. (I'm still in R^2/R^3 here). OK, this is easy: we just put our light at the point at infinity that is in that direction, namely the point (1,1,0). The analogues of the points at infinity in 3-space are going to be the points with w=0, only now instead of having an entire "line at infinity" of "infinitely far" points, you will have a plane/sphere at infinity.
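One way to see why (1,1,0) is the right choice is to slide a point light out along y=x and watch what happens. The light at (t, t) has homogeneous coordinates (t, t, 1), which is the same Point as (1, 1, 1/t), and as t grows the last coordinate heads to 0. The plain-Python sketch below (function names and the sample surface point are assumptions made for illustration) checks numerically that the direction toward the light settles down to the direction of (1,1,0).

```python
import math

def unit(v):
    """Scale a 2D vector to unit length."""
    length = math.hypot(v[0], v[1])
    return (v[0] / length, v[1] / length)

def direction_to_light(light, p):
    """Unit vector from 2D point p toward a light given as (x, y, w)."""
    x, y, w = light
    if w == 0:
        return unit((x, y))  # point at infinity: a pure direction
    return unit((x / w - p[0], y / w - p[1]))

surface_point = (3.0, -2.0)
limit = direction_to_light((1.0, 1.0, 0.0), surface_point)

# Move the light farther out along y = x and measure the gap to the limit.
gaps = []
for t in (10.0, 1e3, 1e6):
    light = (1.0, 1.0, 1.0 / t)  # same Point as (t, t, 1)
    d = direction_to_light(light, surface_point)
    gaps.append(math.hypot(d[0] - limit[0], d[1] - limit[1]))
```

The entries of `gaps` shrink toward zero, so far-away point lights become indistinguishable from the directional light at (1,1,0).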

(Typically, people draw the extended Euclidean plane as a circle, with the boundary being the line at infinity, and one point in each direction, corresponding to all the lines you can draw that pass through the center of the circle. This means that on that boundary circle, you have to identify opposite ("antipodal") points, because you think about "going out to infinity" and then "coming back around"... this is hard without being able to draw a picture. I can say more if anyone cares.)

(Full disclosure: I'm a geometer, not a computer graphics person, so I don't know much about the actual implementation. But I can talk about the projective plane more if that would be helpful. A resource I like for projective geometry is Brannan, Esplen & Gray's Geometry.)
posted by leahwrenn at 10:37 AM on October 23, 2009 [1 favorite]

Response by poster: Thanks for all of your answers! I've read over the thread a few times now, and I think I'm starting to understand it a little. I will definitely be looking into quaternions and the like in the future, although I have to admit that my understanding of high-dimensional geometry is unfortunately poor and it will take me a while to really grok the material. On the other hand, I'm looking forward to learning this stuff -- math is cool!

I will also be looking into the book recommendations -- thanks, blenderfish and leahwrenn! Brannan, Esplen & Gray seems to be more geared towards beginners, so I will start with that, then move on to the Blinn book.

Just for anybody else looking to learn more about homogeneous coordinates and the like in the future, I also found this explanation of translation in 2D as a type of shear transformation in 3D -- and by extension, any translation in nD as a shear in (n+1)D -- to be extremely helpful. The page also has Java applets to visually demonstrate what is actually happening, which is always nice.
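For anyone who wants to poke at the shear idea without the applets, here is a small plain-Python sketch. The helper names `shear` and `apply` are made up, and this is only my reading of the idea, not code from the linked page. The 3x3 "translation" matrix is really a shear of 3-space: it slides each horizontal plane sideways by an amount proportional to its height z.

```python
def shear(tx, ty):
    """3x3 matrix that shears 3-space parallel to the x-y plane."""
    return [[1, 0, tx],
            [0, 1, ty],
            [0, 0, 1]]

def apply(m, v):
    """Multiply a 3x3 matrix by a 3-component column vector."""
    return tuple(sum(m[r][k] * v[k] for k in range(3)) for r in range(3))

S = shear(5, -2)

on_z0 = apply(S, (1, 1, 0))  # (1, 1, 0): the z=0 plane does not move
on_z1 = apply(S, (1, 1, 1))  # (6, -1, 1): the z=1 plane shifts by (5, -2)
on_z2 = apply(S, (1, 1, 2))  # (11, -3, 2): the z=2 plane shifts twice as far
```

Restricted to the z=1 plane, the shear looks exactly like a 2D translation by (5, -2), and the z=0 row shows why directions (and points at infinity) are left alone.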

(Also, I just noticed that I've been misspelling homogeneous coordinates. I thought that "homogenous" looked a little off, but that was how this other resource was spelling it... Sigh.)
posted by tickingclock at 8:47 PM on October 23, 2009

Thanks for the explanation, leahwrenn; it really helped me.
posted by Monday, stony Monday at 10:03 PM on October 23, 2009

Yeah, the Blinn book definitely isn't a "nuts and bolts" howto book. But it will help you be a more "rounded" graphics programmer -- giving you an idea of some of the history, the mindset, the fundamental math, and "why things are the way they are." Blinn was definitely a pioneer -- remember all the computer graphics of NASA satellites whizzing by planets you saw as a kid? That was him.
posted by blenderfish at 12:28 AM on October 24, 2009
