Shared human/lettuce DNA
December 31, 2008 7:36 AM

"The DNA of humans and chimps is 98.4% identical." I've read that several places. I've also read "The DNA of all living things is 90% identical" and "The DNA of humans and lettuce is 16% identical." How could I find out which of those last two statements is correct? Or is the problem that I don't understand which part of the DNA is being referred to? (Frankly, I'm not that clear on DNA in the first place - I'd just like the right number.)
posted by kestralwing to Science & Nature (14 answers total) 5 users marked this as a favorite
I can't answer the whole of your question, but some of it can be answered below.

1. The whole of the DNA content of a living thing is called its genome.

2. Most of the information related to structure and function is written down in the sequence of the DNA (the rest is epigenetic information, but it is best to ignore that for simplicity).

3. Most genetic information is coded in genes (which are read by molecular machinery, like a computer tape); the rest is called 'junk DNA', which represents more than 90% of total DNA (junk DNA is no longer considered all junk).

4. Genes carry blueprints of proteins which form enzymes, hormones, cell membranes and everything our bodies have and use to carry out complex activities.

5. A lot of the basic information required to 'live' is common to all living organisms, such as extracting energy from sugar and converting destructive metabolites to simpler compounds.

6. Higher functions such as language, which distinguish us from chimpanzees, do not arise purely from having more genes or very different ones.

I am sure other wiser members of the community can help better this answer.
posted by london302 at 8:05 AM on December 31, 2008

maybe by 'all living things' they really meant 'living animals'? depending on the source of the quote, it could be a mistake on the author's part.
posted by ArgentCorvid at 8:09 AM on December 31, 2008

This article tries to explain it in an accessible way. The 99% number comes from looking at known genes. You can also look at things like non-coding regions and the number of copies of various genes to get a lower number.

"Don't believe anything without a citation to real research" is a good rule. The "all life" quote could be looking at something like this where they looked at the yeast sequence and tried to find "close" homologs in what was known of the human sequence at the time. These numbers will depend on what you mean by "the same" and what part of the genome should count. For example, the human genome is about 3 billion base pairs and the yeast genome about 12 million, so there can't be that much direct sequence identity.
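
To put a rough number on the size argument, here is a minimal sketch using only the two approximate genome sizes quoted above. The function name is made up for illustration, and the result is just an upper bound under the generous assumption that every yeast base matches a distinct human base.

```python
# Approximate genome sizes quoted above (base pairs).
HUMAN_GENOME_BP = 3_000_000_000
YEAST_GENOME_BP = 12_000_000


def max_covered_fraction(larger_bp: int, smaller_bp: int) -> float:
    """Upper bound on the fraction of the larger genome that could match the
    smaller one base-for-base, assuming every base of the smaller genome
    lines up with its own distinct position in the larger genome."""
    return min(smaller_bp, larger_bp) / larger_bp


bound = max_covered_fraction(HUMAN_GENOME_BP, YEAST_GENOME_BP)
print(f"At most {bound:.1%} of the human genome could be matched by yeast sequence")
```

So even in the best case the whole-genome figure could not get anywhere near 90%; any larger number has to be counting something else, such as shared genes.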
posted by a robot made out of meat at 8:16 AM on December 31, 2008 [1 favorite]

It's hard to define an exact answer to these questions. There are several wrinkles that make it a bit more complicated than one might assume at first glance, including these facts:
  • We don't know the entire human or chimpanzee genome sequence. In particular, there are repetitive regions of the genome that are difficult to sequence, and there are regions that we have sequenced and localized to a chromosome but have not yet figured out where they fit in with the other known sequence of that chromosome. These lead to holes in our sequence known as assembly gaps, and less coverage of the heterochromatic parts of the genome.
  • We don't have a perfect understanding of which parts of the human and chimpanzee genome correspond to each other (are aligned) and we probably never will.
  • Each human has a slightly different genome sequence from the others, and the same is true of chimpanzees, so one ought to estimate how much of the divergence is due to within-species polymorphism, and how much is due to divergence from most recent common ancestors of the two species to the MRCAs of each species individually.
  • The different genomes have insertions and deletions in different places and defining divergence when including these areas can be somewhat ambiguous.
  • There is a question of whether one should compare the whole genome or only those regions that you understand the function of better, which suffer less from the previous problems, but will also be more conserved in general.
It's possible to deal with all of the above factors, but different people might do it in different ways, so you might get slightly different answers depending on who you ask. The chimpanzee genome paper found 1.23% divergence in the form of single-nucleotide substitutions in easily compared regions between the chimpanzee and human reference genomes. But if you correct for variations within the species, there's only a 1.06% divergence. Of course there are insertions and deletions in each lineage that make up another 3% of species-specific sequence. If you limit yourself to genes that you can find in both species, 29% of protein-coding genes are identical, and the median number of substitutions that change the protein sequence is 2. But as a robot made out of meat points out, there are many genes that cannot be compared in this way, as they are new to one lineage or the other.
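
As a rough illustration of why the handling of insertions and deletions changes the answer, here is a minimal sketch that computes percent identity over two sequences that have already been aligned. The sequences are made-up toy data, not real human or chimpanzee sequence, and `percent_identity` is a hypothetical helper, not the method used in the chimpanzee genome paper.

```python
def percent_identity(aligned_a: str, aligned_b: str, count_gaps: bool = False) -> float:
    """Percent of compared columns where two pre-aligned sequences agree.

    '-' marks an insertion or deletion. With count_gaps=False, columns with a
    gap in either sequence are skipped; with count_gaps=True they are counted
    as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        has_gap = a == "-" or b == "-"
        if has_gap and not count_gaps:
            continue  # ignore indel columns entirely
        compared += 1
        if not has_gap and a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Toy alignment (made up for illustration only).
seq_one = "ACGTACGTAC--GTACGTAC"
seq_two = "ACGTACGAACTTGTAC-TAC"

print(f"ignoring indels: {percent_identity(seq_one, seq_two):.1f}%")
print(f"counting indels: {percent_identity(seq_one, seq_two, count_gaps=True):.1f}%")
```

The same pair of sequences gives two different numbers depending on one choice, which is the kind of ambiguity described above.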

All these problems get much worse if you want to compare much more distantly-related species such as humans and lettuce. Doing a whole-genome comparison is impossible because the genome sequences are so different. We always focus on regions we can identify as similar, but there are far fewer for a human–plant comparison, so it is misleading to mention human–lettuce results in the same breath as human–chimpanzee whole-genome results. Comparisons limited to universal regions of sequence (such as the 18S ribosomal RNA) are still useful because they allow us to qualitatively assign relationships between different species, but the raw identity fraction is less useful.

As someone who has published peer-reviewed papers in vertebrate comparative genomics, I can tell you that working researchers are not usually bothered by the exact numbers because they are hard to know or even define well. It's much more important to have a general idea of how closely related the species are. So I know that chimps are about 99 percent identical to humans in aligned sequence and that lettuce is not very similar at all.

I should note that the definition of the term "gene" is in a state of flux right now, and that genome biologists have not seriously used the term "junk DNA" for many years.
posted by grouse at 8:44 AM on December 31, 2008 [15 favorites]

That might be much more than you wanted. Here's the short version:

human–chimp: Somewhere between 95% and 99% identical depending on how you look at it.

human–lettuce: The number will be low but not zero. You can find similarities between the species but not enough to make a meaningful numerical comparison.

all living things: It's impossible to make a claim about the DNA of all living things in the same way as the above. It's nowhere near 90%, though.
posted by grouse at 8:50 AM on December 31, 2008

Kestralwing, as you can see, this subject can get awfully complicated and technical, very fast.

I think the shortest, sweetest answer is: there are an awful lot of ways you can look at the issue to come up with A number, which means that there really is no single answer to THE number.

Another perspective on the issue, which may or may not suit your purpose. All living things are built from essentially the same materials (proteins, carbohydrates, nucleic acids, fats, water, certain small molecules) and share similar, if not identical, functions at the cellular level (generating energy, breaking down complicated molecules, building up complicated molecules). That means that ALL living things, from single-celled creatures like yeast all the way up to complex plants and animals, require similar "cellular machinery". Much of that cellular machinery is what is "described" by the genome--a la the genome-as-blueprint analogy. So all living things have some similarity to all others in their genomes.

The more similar the creatures, the more similar the genomes. People and chimps are a whole lot more similar than people and lettuce, so the genomes of people and chimps are going to be a lot more similar.

If it seems the lettuce (and yeast, and viruses, and stuff like that) are far too dissimilar to people for this to make sense, consider that they have to be similar enough to us on a cellular level for us to use them as food, or for them to successfully infect us, or whatever sort of biological interaction.
posted by Sublimity at 10:02 AM on December 31, 2008

I wonder if this whole DNA equivalence thing (humans and bananas share 40-50% of their DNA!) is just one of those science-type urban legends, like "we only use 10% of our brains."
posted by jasper411 at 10:50 AM on December 31, 2008

A lot of pathways are relatively conserved across species. You can use this tool to see pathway conservation statistically. You can look at the genome maps there too, though the usefulness of looking at them starts to drop when you start increasing genome complexity. This link shows graphically which metabolic pathways in E. coli are conserved in humans (anything green is conserved). You can click around on the map to figure out what stuff does, though if you haven't had a molecular bio class it may not make a whole lot of sense.

Basically, the number isn't bull, it's actually there. One thing to keep in mind, though, is that DNA itself isn't the only factor in complexity. How the DNA is structured and what gets attached to it makes a big difference in regulation, and regulation makes a massive difference in what actually happens, so complexity-wise, humans/chimps/etc. may be massively more complicated than E. coli but still share a significant amount of basic genes with it.

Hope that sort of helps..
posted by devilsbrigade at 11:18 AM on December 31, 2008

I wonder if this whole DNA equivalence thing (humans and bananas share 40-50% of their DNA!) is just one of those science-type urban legends, like "we only use 10% of our brains."


The value of the number may certainly be called into question depending on how you're defining the similarity (see above), but there's little doubt that we share a number of identical sequences, and also share a number of similar sequences.

It doesn't mean, however, that if you put all our genes end to end, and do the same for the chimps, that they match for the first X% of the sequence. It's far more broken up than that.
posted by Netzapper at 12:51 PM on December 31, 2008

A lot of the similarity figures arise because many distantly related species share the same genes; it is the regulation that is different. That is, the way those genes are used can be vastly different. Many of the genes that create the eye in a fly are the same genes that create the eye in humans, despite the fact that our last common ancestor was hundreds of millions of years ago.

Sorry this isn't a more concrete answer, but when you see the incredibly high levels of similarity in species they are often close. I suggest checking out some popular science literature like The Ancestor's Tale or Endless Forms Most Beautiful.
posted by Midnight Rambler at 3:24 PM on December 31, 2008

Here's an analogy -- cake and bread are 90% identical, you might say, and you could say that all food is (pick a number) 40% identical.

Cake and bread have many of the same ingredients in similar proportions, and are made in similar ways. All food though, from steamed broccoli to bread and cake to a hamburger, is composed from a relatively small subset of molecules that we can use as fuel or that at least won't poison us, combined in a certain limited number of ways that we can digest.

Wood is mostly cellulose, and potatoes are mostly starch. Cellulose and starch are 100% identical -- except their molecules are arranged differently, so we can digest starch but not cellulose.

A better analogy? This post and the Gettysburg Address are 97% identical in the letters used, maybe 20% identical in the words used, but in sense and importance -- not at all. So though there is a number a person could pick for DNA identity between chimps and people or people and lettuce, it isn't very meaningful. It's the combinations of genes that work together in particular ways -- the molecular pathways -- that make living things the way they are. As Eric Clapton said, it's in the way that you use it.
posted by Methylviolet at 4:36 PM on December 31, 2008 [2 favorites]

Sorry Methylviolet, I have to disagree with much of your comment. Your post and the Gettysburg Address are not 97% identical in the letters used, at least in the way that identity is considered in genomics. Identity is not just the fraction of identical ingredients in an unordered soup of simple elements; the order and structure of these elements is important. For these reasons, claiming that cellulose and starch are 100% identical or that cake and bread are 90% identical is just wrong.
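
A small sketch of the distinction, using a made-up toy string rather than real sequence: one function compares the unordered "soup" of letters, the other compares position by position. Both function names are hypothetical, and the positional measure here is a simplification that skips the alignment step real comparisons need.

```python
from collections import Counter


def composition_overlap(a: str, b: str) -> float:
    """Fraction of characters shared when order is ignored (multiset overlap)."""
    if not a or not b:
        raise ValueError("strings must be non-empty")
    shared = sum((Counter(a) & Counter(b)).values())
    return shared / max(len(a), len(b))


def positional_identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length strings agree."""
    if len(a) != len(b) or not a:
        raise ValueError("strings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


original = "ACGTACGTACGT"
shuffled = "".join(sorted(original))  # same letters, reordered: AAACCCGGGTTT

print(f"same ingredients: {composition_overlap(original, shuffled):.0%}")
print(f"same order:       {positional_identity(original, shuffled):.0%}")
```

The two strings contain exactly the same letters, yet agree at only a third of their positions.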

Knowing that the human and chimpanzee genomes have approximately 99% identity in aligned regions is actually quite meaningful and useful. For starters, it tells you that the combinations of gene products acting together are likely to be more similar to each other than those from species sharing only 90% identity or 80% identity, and the gene products themselves will also be more similar. It is also an essential element in understanding the pattern and process of evolution in the primate lineage, and where we will have to look to find the essential bits of sequence that make humans human.
posted by grouse at 2:52 AM on January 1, 2009

Everything grouse has said is correct, but here's a slightly different way of looking at it, related more directly to the question:

"The DNA of humans and chimps is 98.4% identical."
This is about right, give or take a couple of percent depending on which particular bits of DNA you're talking about. Human and chimp DNA is so similar that you can line up the sequences next to each other and count the differences. This comparison is a bit like comparing the King James Bible with the Revised King James, or something. The similarities are so great that it's easy to spot the changes.

"The DNA of all living things is 90% identical"
Did you get this from Jurassic Park? I think this is stated there, but it's bullshit. The human genome is about 3 billion base pairs; bacterial genomes are of the order of one million. So the statement makes no sense. There are some parts of all living genomes that are similar - ribosomal genes, and some core enzymes, for instance - but that's a tiny percentage of an E. coli genome, and 1000 times smaller in humans. Think of this not like comparing this post to the Gettysburg Address, but like comparing every written work in the English language to every other, from pamphlets to encyclopedias. There is no meaningful way to apply a single percentage number to the result.

"The DNA of humans and lettuce is 16% identical."
This statement again doesn't make a lot of sense in isolation. The lettuce genome isn't fully sequenced for a start. But we have to assume that it's an oversimplified form of a more meaningful statement. Comparing very different genomes like these is like comparing two different books on the same subject - there may be several sentences or paragraphs that are almost identical, but other parts will be very different. And maybe one book has ten times as many pages as the other. How can you say they are a certain percent identical without clarifying further what you mean by that, i.e. how did you do the comparison?
I can think of several possibilities:
a) 16% of the lettuce DNA sequence has identical chunks in the human genome.
b) 16% of the human DNA sequence has identical chunks in the lettuce genome.
- these two would be mutually exclusive if it were not for the fact that the human and lettuce genomes are roughly similar in size. As it is, they could both be true, but it seems very unlikely to me that the similarity would be that much.
c) 16% of the genes in lettuce have equivalents in humans
- this is roughly the correct ball-park figure, but note that this is not a statement about DNA sequence identity. It would likely be corrupted to such by a lazy journalist.
d) The genes which are equivalent in humans and lettuce have, on average, 16% identical DNA sequences.
- this is plausible.
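
To show how possibilities (a) and (b) can give different numbers for the same pair of sequences, here is a minimal sketch on made-up toy strings. The helper names and the chunk length `k` are assumptions for illustration only; real genome comparisons use alignment tools and tolerate inexact matches.

```python
def kmers(seq: str, k: int) -> set:
    """All distinct length-k chunks occurring in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fraction_covered(query: str, reference: str, k: int) -> float:
    """Fraction of positions in `query` that lie inside some length-k chunk
    occurring verbatim somewhere in `reference`."""
    if not query:
        return 0.0
    ref_kmers = kmers(reference, k)
    covered = [False] * len(query)
    for i in range(len(query) - k + 1):
        if query[i:i + k] in ref_kmers:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(query)


# Toy sequences (made up): `small` appears verbatim inside `large`.
small = "ACGTACGT"
large = "GGGGCCCC" + "ACGTACGT" + "GGGGCCCCTTTTAAAA"

print(f"share of small found in large: {fraction_covered(small, large, 4):.0%}")
print(f"share of large found in small: {fraction_covered(large, small, 4):.0%}")
```

The answer depends on which sequence you treat as the query, so a bare "X% identical" leaves out the part that matters.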

I can think of dozens of ever more baroque comparisons that might be corrupted to the phrase "The DNA of humans and lettuce is 16% identical," but I haven't been able to Google up any original research which might have been the basis of such.
posted by nowonmai at 3:39 PM on January 3, 2009

I'm pretty sure the human/chimp similarity is based on DNA hybridization. You can also measure similarity allele by allele, gene by gene, codon by codon, base by base, etc., so you can see why there's so much noise. So it's not that you don't understand what part of the DNA is being referred to, it's that that information has been lost to the telephone game by the time the statistic reaches you.
posted by Eothele at 7:49 PM on January 27, 2009
