Genetic distance - Why angle rather than magnitude of difference?
March 8, 2008 9:18 PM
Why is it that the genetic "distance" between two populations would be determined by the angle between their position vectors rather than the magnitude of their respective displacements?
That must have been a mighty confusing question. Essentially, I've got a problem very similar to the one represented here and I'm wondering why is it that the angle has been chosen rather than the magnitude of the difference between the vectors?
This has been bugging me for ages and settling it in my mind will help me continue with this assignment. Thanks, mefi :-)
Thinking about it a little further-- if those vectors are, indeed, always normalized, then the "genetic space" is the surface of a 4-dimensional hypersphere. In that case, it might make sense to measure genetic distance as the distance from A to B across the surface of the sphere (for the same reason we don't usually consider direct-through-the-earth distance between two cities). [disclaimer: I don't know anything about this problem domain]
posted by qxntpqbbbqxl at 9:46 PM on March 8, 2008
"In that case, it might make sense to measure genetic distance as the distance from A to B across the surface of the sphere (for the same reason we don't usually consider direct-through-the-earth distance between two cities)"
That would be proportional to the angle.
posted by vacapinta at 9:51 PM on March 8, 2008
Response by poster: Excuse me for being an idiot, but I thought the vectors would only be considered normalised if the squares of their individual constituents add up to 1, rather than just summing all the components as-is?
posted by PuGZ at 11:15 PM on March 8, 2008
Yeah, the vectors aren't normalized to length 1, they're "normalized" such that their components sum to 1, which is different. You should be able to drop any single row of the table without losing any information, and the remaining vectors will fill a unit (n-1)-dimensional cube. I guess that puts the full vectors onto the surface of a rotated n-dimensional cube, rather than the surface of an n-sphere.
It's still the case that two vectors that are parallel have to be the same vector, so you don't lose anything by representing distance as an angle. Is it possible that common operations (summing two populations, say) are more easily described in terms of angles than distance?
posted by hattifattener at 12:10 AM on March 9, 2008
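A minimal Python sketch of the two kinds of normalization discussed above. The allele counts are made up for illustration, and the helper names are hypothetical. It shows that a frequency vector (components summing to 1) generally does not have length 1, that any two parallel vectors collapse to the same frequency vector, and that taking square roots of the frequencies does give a unit-length vector. Whether the linked problem uses that square-root step is an assumption; the thread does not say.

```python
import math

def normalize_sum(v):
    """Scale v so that its components sum to 1 (a frequency vector)."""
    total = sum(v)
    return [x / total for x in v]

def norm(v):
    """Euclidean length of v."""
    return math.sqrt(sum(x * x for x in v))

def angle(u, v):
    """Angle in radians between u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    cos_theta = dot / (norm(u) * norm(v))
    # Clamp to guard against rounding just outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))

counts = [30, 10, 40, 20]                        # hypothetical allele counts
freqs = normalize_sum(counts)                    # components sum to 1
scaled = normalize_sum([3 * c for c in counts])  # a parallel vector, renormalized

# Assumption, not stated in the thread: square roots of the frequencies,
# so that the squares of the components sum to 1.
roots = [math.sqrt(x) for x in freqs]

print(sum(freqs), norm(freqs))   # sums to 1, but length is below 1
print(norm(roots))               # length 1 up to rounding
print(angle(counts, freqs))      # essentially 0: same direction
```

Because parallel vectors renormalize to the same frequencies, the angle between two frequency vectors is zero only when the populations have identical frequencies.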
While I just wrote a PhD thesis on evolutionary distance measurement, I am unfamiliar with this kind of population genetics distance, and my population genetics book is at the lab. So if you can provide a reference on this in the biological literature, I'd appreciate it.
Let's take a much simpler non–population-genetics example. Here's a phylogenetic tree for humans, mice, and dogs. The asterisk and plus sign indicate where the last shared ancestry of each branch was, and the branch lengths indicate how much mutation occurred on each branch. We can also consider each species to be a vector, with the plus sign as the origin of this space.
       +
      / \
     /   \
    *     \
   / \     \
  /   \     \
 /     \     \
H       \     D
         \
          \
           \
            \
             \
              M
If you want to order pairs of species in terms of how many nucleotides are different, you would sum the branch lengths. But if you want to do it in terms of most recent common ancestry, then the angle between their two vectors would be fine. This may be the motivation for using angles rather than magnitudes in your example.
posted by grouse at 3:30 AM on March 9, 2008
In your particular example, the components are gene frequencies at the same locus so they would have always been summing up to 1 at any previous point in evolution. Consider a locus with two alleles, A and B, represented by the gene frequencies p and q. Since p + q = 1, 0 ≤ p ≤ 1, and 0 ≤ q ≤ 1, there is only a 45° line segment that contains the possible values. Doesn't it seem obvious there that angle would be a good way of measuring the difference between two populations?
I should also let you know that there are so many different ways to estimate genetic "distance" between two populations or species that it's not even funny. Most of these are not true mathematical distances—they do not satisfy the triangle inequality and some are not even symmetric.
posted by grouse at 3:44 AM on March 9, 2008
Response by poster: I should probably say now that this information was just given in the context of an undergraduate mathematics assignment. For all I know, the choice of using this particular measurement for genetic distance might be completely arbitrary!
That said, the report has to be "completely self-contained", which I take to mean it has to explain why this measurement has been chosen. How very confusing and vague. :-(
grouse: Thank you so much for your replies, but I must confess I don't quite understand the following: "...there is only a 45° line segment that contains the possible values. Doesn't it seem obvious there that angle would be a good way of measuring the difference between two populations?" If you could explain this to me, it might open my eyes to a solution. :-)
posted by PuGZ at 3:14 AM on March 9, 2008
Plot all the points that fit the three conditions I laid out above, with p on the x-axis and q on the y-axis.
posted by grouse at 5:49 AM on March 9, 2008
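The suggested plot can also be tabulated. The sketch below is only an illustration with hypothetical function names: it walks along the segment p + q = 1 and computes the angle each point (p, q) makes with the p-axis. Every point on the segment gets a distinct angle, so the difference of two such angles separates any two populations with different frequencies.

```python
import math

def polar_angle(p):
    """Angle in radians between the p-axis and the point (p, q), q = 1 - p."""
    q = 1.0 - p
    return math.atan2(q, p)

def angle_between(p1, p2):
    """Angle between two populations given their frequencies of allele A."""
    return abs(polar_angle(p1) - polar_angle(p2))

ps = [i / 10 for i in range(11)]        # p = 0.0, 0.1, ..., 1.0
angles = [polar_angle(p) for p in ps]

for p, a in zip(ps, angles):
    print(f"p={p:.1f}  q={1.0 - p:.1f}  angle={math.degrees(a):6.2f} deg")
```

The angle falls steadily from 90 degrees at p = 0 to 0 degrees at p = 1, so in the two-allele case the angle is simply a relabelling of position along the segment.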
The reason to use angle rather than magnitude is that angle is independent of normalization, which in principle is arbitrary. You can scale these vectors however you want, and the angle between them won't change.
Furthermore, on a unit n-sphere, the distance between two points is just proportional to the angle between them (the line element of a circle, for example, is r d(theta), or just d(theta) for a unit circle). So, if we're working with properly normalized vectors, angle really is the way to go for true geometrical distance.
posted by dsword at 9:08 AM on March 9, 2008
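Both points can be checked numerically. This is a rough sketch with made-up frequency vectors and hypothetical helper names: rescaling either vector leaves the angle unchanged while the straight-line distance changes, and once the vectors are scaled to unit length the straight-line (chord) distance is 2 sin(theta / 2), which is never longer than the arc length theta along the sphere.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def angle(u, v):
    """Angle in radians between u and v."""
    cos_theta = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cos_theta)))

def euclidean(u, v):
    """Straight-line distance between the tips of u and v."""
    return norm([a - b for a, b in zip(u, v)])

def unit(v):
    """Scale v to length 1."""
    n = norm(v)
    return [x / n for x in v]

a = [0.2, 0.3, 0.1, 0.4]   # hypothetical frequency vectors
b = [0.4, 0.1, 0.3, 0.2]

a_big = [5.0 * x for x in a]     # arbitrary rescalings
b_small = [0.5 * x for x in b]

theta = angle(a, b)
theta_scaled = angle(a_big, b_small)
chord = euclidean(unit(a), unit(b))

print(theta, theta_scaled)                        # equal up to rounding
print(euclidean(a, b), euclidean(a_big, b_small)) # these differ
print(chord, 2 * math.sin(theta / 2))             # equal up to rounding
```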
This thread is closed to new comments.
posted by qxntpqbbbqxl at 9:39 PM on March 8, 2008