Could your body be a data storage device?
November 28, 2007 8:39 AM

Could you hide a number in someone?

Assuming the trajectory of research into genetic manipulation, will it eventually (within the next few centuries)* be possible to "encode" an arbitrarily long string of data into someone's genetic code, effectively turning them into a genetic storage device? I'm not talking about implanting memory, but a permanent, from-birth alteration that would have no physical effects. How much information could you store this way? Bonus: Could this information be "hidden" so that someone couldn't find it unless they were specifically looking for it?
posted by mkultra to Science & Nature (27 answers total) 8 users marked this as a favorite
Sure - anything that can be encoded in base-4 can be stored in DNA with fairly high fidelity.
posted by chrisamiller at 8:55 AM on November 28, 2007
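
To make the base-4 point concrete, here is a minimal sketch that packs bytes into bases at 2 bits per base and unpacks them again. The A/C/G/T-to-number mapping and the function names are arbitrary assumptions for illustration, not an established convention, and nothing here models the biology of actually inserting or reading the sequence.

```python
# Hypothetical mapping: which base stands for which 2-bit value is an arbitrary choice.
BASES = "ACGT"  # A=0, C=1, G=2, T=3


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant 2 bits first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Inverse of bytes_to_dna; expects a length that is a multiple of 4."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            # str.index raises ValueError on anything other than A, C, G, T
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    encoded = bytes_to_dna(b"42")
    print(encoded)
    print(dna_to_bytes(encoded))
```

Each byte costs four bases, so a 1,000-byte message would need a 4,000-base insert before any error correction is added.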

You mean like this patent for DNA-based steganography?
posted by GuyZero at 9:00 AM on November 28, 2007

You could encrypt it, then it wouldn't matter if anyone found it or not.
posted by unixrat at 9:00 AM on November 28, 2007

Best answer: Information theoretic perspective: There's a lot of non-functional DNA which you could stick a message in. The message would require a bit of error-correction if it was long enough, in order to deal with errors in the initial insertion and (infrequent) copy errors. Could it be hidden... I dunno. If there are very highly non-conserved regions of non-functional DNA, then there would be sufficient background randomness to hide a message, yes.

Interesting diversion: what if you wanted the message to survive through the generations? The best target for this would seem to be mitochondrial DNA: it is passed only from the mother's side, so avoids being mixed up with the other parent's DNA. To guarantee probable transmission to generation N would probably require some superpolynomial function of N initial females.

An upper bound on the amount of data that can be transmitted is 20% of the genome, or about 600 million of its roughly 3 billion base pairs (1.2 gigabits at 2 bits per base, about 150 megabytes). That's from here, where it says that up to 80% of DNA base pairs may be expressed. For mitochondria I am unaware of any figures, but I think that there is very little room for extra stuff.

I know nothing about the difficulties of genetically modifying humans, so I'll leave that to someone else.
posted by topynate at 9:03 AM on November 28, 2007
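
The capacity estimate can be worked through as plain arithmetic. This sketch assumes a genome of about 3 billion base pairs (the figure quoted elsewhere in this thread), 2 bits per base, and that 20% of the genome is usable; all three numbers are inputs you can change, and the function names are made up for illustration.

```python
GENOME_BASE_PAIRS = 3_000_000_000  # assumed approximate human genome size
BITS_PER_BASE = 2                  # four possible bases = 2 bits each


def usable_base_pairs(usable_percent: int, genome_bp: int = GENOME_BASE_PAIRS) -> int:
    """Number of base pairs available if usable_percent of the genome is free."""
    return genome_bp * usable_percent // 100


def capacity_bits(usable_percent: int, genome_bp: int = GENOME_BASE_PAIRS) -> int:
    """Raw capacity in bits, before any error-correction overhead."""
    return usable_base_pairs(usable_percent, genome_bp) * BITS_PER_BASE


if __name__ == "__main__":
    bp = usable_base_pairs(20)
    bits = capacity_bits(20)
    print(f"{bp:,} base pairs, {bits:,} bits, {bits // 8:,} bytes")
```

Any error-correcting redundancy comes out of this budget, so the usable message size would be smaller than the raw figure.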

Best answer: You could probably do it right now. The human primate genome is full of junk already, so there's plenty of room to hide stuff. Viruses do it all the time, where they stick their own genetic information into living cells and 'hide' it from the host's proofreading and immune systems. Given that the human genome is 3 billion base pairs, your problem might be that you've hidden the data so well that you can't find it later.

If you want it in every cell of the human, then you'd have to insert it early, just after the egg is fertilized. That's probably a bit trickier, and a bit outside my expertise. The advantage you have is that you don't need the inserted data to do anything, which is one of the problems with genetic engineering of embryos. The disadvantage is that you need to insert it somewhere where it won't kill the person. Unless you're willing to accept a certain number of mistakes (aborted/messed up people), that can be difficult.

One question is how long you want this data to last. Are you talking about inserting it in one generation and retrieving it a few dozen generations down the line? You'd face new problems then owing to natural mutation ('junk' doesn't get well-preserved, evolutionarily speaking) and genetic shuffling, which can all scramble your data. Your best bet in that case could be to stick your data into maternal mitochondrial DNA, which you inherit only from your mom and not your dad.
posted by Mercaptan at 9:03 AM on November 28, 2007

Best answer: You can think of each link of DNA as a quaternary number, and there are a *lot* of them that (as far as we can tell) are just noise. How much is useless is still undecided, but it's probably enough to fill an encyclopedia, give or take a few orders of magnitude.

As far as inserting DNA, we can do that right now. You needn't wait a few centuries. And, you needn't do it at birth. Researchers change retroviruses and adenoviruses to alter living cells in gene therapy to make some functional change in the cell.

Doing that for the noise areas of DNA is a tiny bit harder, because the noise may not be one of the standard well-known segments. To be certain of targeting something that exists but is meaningless, it would probably take sequencing an individual's DNA to get it right. But, that is really cheap these days. (Impossible 10 years ago. A decent birthday present today.)

Could it be hidden? Of course. It would be hard to find it anyway. Making it obvious is a much harder problem. A few thousand contiguous (e.g.) Guanines is a decent marker that says "start here", I guess.
posted by cmiller at 9:03 AM on November 28, 2007
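
The "start here" marker idea can be sketched as a search for a long run of G. This is only an illustration on a plain string: the function name and the run length of 20 are assumptions chosen to keep the example short (the post suggests a few thousand), and a payload that itself begins with G would be swallowed into the marker, so a real scheme would need to avoid that.

```python
import re
from typing import Optional


def find_payload_start(seq: str, min_run: int = 20) -> Optional[int]:
    """Return the index just past the first run of at least min_run G's, or None."""
    match = re.search("G{%d,}" % min_run, seq)
    return match.end() if match else None


if __name__ == "__main__":
    genome = "ACGT" * 3 + "G" * 25 + "ATCA"
    start = find_payload_start(genome)
    print(start, genome[start:] if start is not None else None)
```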

Response by poster: anything that can be encoded in base-4 can be stored in DNA with fairly high fidelity.

How would this work? In "unused" sequences? Would someone looking at this person's DNA immediately realize that there was "something" there, even if they didn't quite know what?

(yes, this is all a potential plot point)
posted by mkultra at 9:04 AM on November 28, 2007

mkultra, how much data do you want to hide? A short message would be no problem to completely hide, if done correctly. Potential wrinkle is if the 'attacker' gets her hands on the parents' DNA: this would allow a base pair for base pair comparison to check if any material is there that shouldn't be.

The workaround is to fake the genetic shuffling. You can make it so that which parent a certain base pair 'comes from' determines that bit of the message. This reduces the amount of transmissible data but would make it really, really hard to detect. Decryption would require access to both parents' DNA and the deciphering algorithm.
posted by topynate at 9:13 AM on November 28, 2007
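
A toy version of the "which parent did this base come from" scheme, to show why decoding needs both parents. It treats each genome as a single aligned string and picks bases freely at every site where the parents differ; real inheritance is diploid and happens in large blocks, so this is a sketch of the information-hiding idea only. All names are hypothetical.

```python
from typing import List


def informative_sites(mother: str, father: str) -> List[int]:
    """Positions where the parents differ; only these can carry a bit."""
    return [i for i, (m, f) in enumerate(zip(mother, father)) if m != f]


def encode(mother: str, father: str, bits: List[int]) -> str:
    """Build a child sequence: mother's base means 0, father's base means 1."""
    sites = informative_sites(mother, father)
    if len(bits) > len(sites):
        raise ValueError("not enough differing sites to hold the message")
    child = list(mother)
    for site, bit in zip(sites, bits):
        child[site] = father[site] if bit else mother[site]
    return "".join(child)


def decode(mother: str, father: str, child: str, n_bits: int) -> List[int]:
    """Recover the bits; impossible without both parental sequences."""
    sites = informative_sites(mother, father)[:n_bits]
    return [1 if child[i] == father[i] else 0 for i in sites]


if __name__ == "__main__":
    mom, dad = "ACGTACGT", "ACCTAGGA"
    kid = encode(mom, dad, [1, 0, 1])
    print(kid, decode(mom, dad, kid, 3))
```

Every base in the child matches one parent or the other, which is the property that makes a base-by-base comparison against the parents come up clean.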

I don't think we have the technology right now to absolutely reliably insert the DNA fragment into someone's genome somewhere it won't cause side effects. But there's no reason to think we won't have that technology in the nearish future.

If the number is longer than a few tens of bits, I think it would be apparent to someone who sequenced the person's genome and compared it to a baseline human genome that something artificial had been spliced in, even if they didn't know what.

(On preview: what y'all said)
posted by hattifattener at 9:14 AM on November 28, 2007

mkultra: That information needn't hide inside the person's DNA. We have more non-human "germs" in our body than we have human cells, by about 10 times. In the near future, a smart team could engineer a particular virus that grows only in a person with specific markers, and doesn't have (to the host, noticeable) side effects. Your message could hide in the noise of the bazillions of benign germs inside of us, if someone is inclined to look in the host's DNA.
posted by cmiller at 10:16 AM on November 28, 2007

Wait, so if one could store information in DNA...and transfer it across generations...would this finally allow...IP Over Penis? IPOP?
posted by bkudria at 10:31 AM on November 28, 2007

There's a trap here. If your data lucks into any kind of activation code already used in the genome, duplicating an existing header sequence, then at least some part of your message won't be "dead". It would probably encode for a nonsense protein, but it could have health effects.

In fact, if you're considering this for use in a novel, that would be one way that such a thing might be detected. Someone comes in to a doctor and complains about how their fingernails have always had a weird color. Investigation reveals it's genetic, and the gene is unlike any anyone has ever seen before, and suddenly someone realizes it was inserted...
posted by Steven C. Den Beste at 10:51 AM on November 28, 2007

In Chris Lawson's SF short story "Written in Blood" an Islamic geneticist develops a method of encoding the Koran into DNA.
posted by thatwhichfalls at 11:06 AM on November 28, 2007

Response by poster: Thanks for the info, everyone!
posted by mkultra at 11:09 AM on November 28, 2007

There was also a novel written by Janet Kagan named Mirabile, in which she ran with the idea of encoding other species' DNA in the "unused" portion. The setup was that humanity was colonizing another world and that as a redundant system, a geneticist could extract the genes for a cow (and incubate a cow) from a chicken.

Because of some Disaster Long Ago, the protagonist's colony doesn't have all of the technology they should, and there are all kinds of interesting complications from strange traits getting expressed along with interesting interaction with the "native" ecosystem.

I recall that the book is made up mostly of a collection of short stories / novellas originally published in Asimov's. It's actually a very light (and humorous) read. I recall Kagan did a good job of telling an interesting story around the concept of storing information "inside" DNA.

Here's the Amazon link.
posted by QuantumMeruit at 12:02 PM on November 28, 2007

I'm not finding the exact links I was looking for, but there are two scientist/artists who proposed a similar trick for archiving a New York Times magazine. In the cockroach genome.
[summary from one of the artists] [comment in Nature] [related NYT article]
posted by twoporedomain at 12:22 PM on November 28, 2007

A neat article in next week's New Yorker:
posted by cmiller at 2:01 PM on November 28, 2007

cmiller: perhaps a retrovirus that integrates into the genome of a germ-line (egg or sperm) cell? The only problem there is that it works at very low fidelity. As far as passing on information via the gut etc flora, this is not passed down directly. A baby born by C-section is born without any colonizing species - these will come later.
posted by fermezporte at 2:48 PM on November 28, 2007

A baby born by C-section is born without any colonizing species - these will come later.

IANABiologist, but that sounds wonky to my ears. Why would the mother have so many subcellular and intracellular critters but the child have none? I'm not talking exclusively about the flora in the intestines.
posted by cmiller at 3:48 PM on November 28, 2007

IANABiologist, but that sounds wonky to my ears. Why would the mother have so many subcellular and intracellular critters but the child have none? I'm not talking exclusively about the flora in the intestines.

They shouldn't be able to get through the placenta, just like they can't get through the blood/brain barrier. Although in theory some can (like HIV), most wouldn't.
posted by delmoi at 4:40 PM on November 28, 2007

If the number is longer than a few tens of bits, I think it would be apparent to someone who sequenced the person's genome and compared it to a baseline human genome that something artificial had been spliced in, even if they didn't know what.

You have a point, but there are plenty of ways around this. Break the message up into chunks, or base the code on small modifications of Alu sequences. They comprise about 10% of the genome, and a few extra ones, especially in regions with high levels of genomic duplication, wouldn't indicate anything obviously askew.
posted by chrisamiller at 7:38 PM on November 28, 2007

Going back to what someone said earlier, if you tried to use a virus to implant into a person, then yeah, you could have all sorts of trouble as a result. But, if you put your "gene" into a mess of cells in a culture and then carefully chose one that doesn't do anything weird for reimplantation, well, the sky's the limit.

Here's another possibility. Take your host. Modify some bone marrow cells so that they use a different codon that reads for the same amino acid when expressed (this is known as a silent mutation). Reimplant those. Now, when you want to decode, you look for where the discrepancies are. Matching codons are a zero, flipped codons are a one.

Neither of these would be a from birth option (and wouldn't breed true), but if you can manage something like human cloning.....
posted by Kid Charlemagne at 10:49 PM on November 28, 2007
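
The silent-mutation scheme ("matching codons are a zero, flipped codons are a one") can be sketched on strings. The swap table below uses a handful of synonymous pairs from the standard genetic code (Ala, Gly, Lys, Glu); which pairs to use, and the function names, are assumptions for illustration. Codons without an entry, such as ATG, carry no bit.

```python
from typing import List

# Synonymous codon pairs from the standard genetic code (small illustrative subset).
SYNONYM = {
    "GCT": "GCC", "GCC": "GCT",  # alanine
    "GGT": "GGC", "GGC": "GGT",  # glycine
    "AAA": "AAG", "AAG": "AAA",  # lysine
    "GAA": "GAG", "GAG": "GAA",  # glutamate
}


def codons(seq: str) -> List[str]:
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def encode(reference: str, bits: List[int]) -> str:
    """Swap a codon for its synonym to write a 1; leave it alone to write a 0."""
    remaining = list(bits)
    out = []
    for codon in codons(reference):
        if codon in SYNONYM and remaining:
            out.append(SYNONYM[codon] if remaining.pop(0) else codon)
        else:
            out.append(codon)
    if remaining:
        raise ValueError("not enough swappable codons for the message")
    return "".join(out)


def decode(reference: str, modified: str, n_bits: int) -> List[int]:
    """Compare against the unmodified sequence: match = 0, discrepancy = 1."""
    bits = []
    for ref, mod in zip(codons(reference), codons(modified)):
        if ref in SYNONYM and len(bits) < n_bits:
            bits.append(0 if ref == mod else 1)
    return bits


if __name__ == "__main__":
    ref = "ATGGCTAAAGGTGAA"
    mod = encode(ref, [1, 0, 1])
    print(mod, decode(ref, mod, 3))
```

As in the post, decoding depends on having the host's unmodified sequence to compare against.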

A baby born by C-section is born without any colonizing species - these will come later.

This is wrong. No baby makes it out of the mother without getting covered in maternal blood, bacteria from her skin, bacteria from the OR air etc etc. In any case the placenta is far from a perfect barrier against viruses and bacteria.
posted by roofus at 5:04 AM on November 29, 2007

Simplifying greatly, the problem here is that the regions of DNA that are "not used" tend to be those with the greatest mutation rate (hence most rapid degradation of the message). The regions which are "used" most are the most stable through time, but far more likely to kill the person if you start messing with them.

I like the idea of taking a very well-conserved protein (a histone, say) and modifying it with synonymous mutations (that don't change the protein sequence) to encode the message. Such a scheme would probably still have an effect on the person's cells (a change in the levels of expression of the protein, if nothing else). Because the rate of change in these proteins is very low, the message would probably be readable for many generations.

Alternatively, you could store the message in an "unused" bit of "junk" DNA. The message would degrade much more rapidly, but you could allow for this by storing very redundant information. For example, make up a string of 100 bases to represent a binary 1 and a different string of 100 to represent 0. When you come to read the message back, even if 30 of the 100 bases of a given bit have changed, there should still be enough information left to decide whether it was a 1 or a 0.
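
A minimal sketch of this redundancy scheme, assuming two made-up 100-base codewords and decoding each block by whichever codeword it is closer to. The codewords here are arbitrary placeholders (and far too regular to be inconspicuous); the sketch only handles substitutions, not inserted or deleted bases.

```python
from typing import List

BLOCK = 100
ONE = "ACGT" * 25   # hypothetical codeword for binary 1
ZERO = "TGCA" * 25  # hypothetical codeword for binary 0 (differs from ONE at every position)


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def encode(bits: List[int]) -> str:
    return "".join(ONE if bit else ZERO for bit in bits)


def decode(seq: str) -> List[int]:
    """Read each 100-base block as the nearer of the two codewords."""
    bits = []
    for i in range(0, len(seq), BLOCK):
        block = seq[i:i + BLOCK]
        bits.append(1 if hamming(block, ONE) <= hamming(block, ZERO) else 0)
    return bits


if __name__ == "__main__":
    message = encode([1, 0, 1])
    # Overwrite the first 30 bases of the first block to simulate accumulated mutations.
    damaged = "T" * 30 + message[30:]
    print(decode(damaged))
```

Because the two codewords differ at all 100 positions, a block stays readable until roughly half of its bases have changed.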

How about encoding the message by altering the order of genes on a chromosome? Actually, here's my favourite idea: store the message by swapping the DNA strand on which genes reside. In a chromosome, which consists of two complementary DNA strands, the genes are split up between the two strands. So if we take a chunk of chromosome with 8 genes on it, counting from one end, and arbitrarily label one strand "forward" (F) and one "reverse" (R), genes 1, 3, 4 and 5 might be on the F strand and genes 2, 6, 7 and 8 on the R strand.

So to encode an ASCII A (65 in decimal, 1000001 in 7-bit binary) we use genes 1 to 7, one bit per gene, and transfer genes 1 and 7 to the opposite strand (R and F respectively). A swap (relative to the normal human genome) means binary 1, a gene left where it is means binary 0. The sequences of the genes themselves are left unchanged (so this potentially works better for hiding the message in a person). The only way to "read" the message would be with some fairly expensive (currently) chromosome walking. I'm not sure what the rate of spontaneous strand-swapping of genes is in humans, but I bet it's a lot lower than that of most point mutations (where one base is swapped for another) so your message will be more permanent. Even a rare random strand-swap would only flip one of your bits, so you could use a checksum to detect it (or an error-correcting code to correct it).
posted by primer_dimer at 8:01 AM on November 29, 2007
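
The strand-swapping scheme can be sketched with the 8-gene example above, under two assumptions that are not in the post: the 7 ASCII bits go on genes 1 to 7, most significant bit first, and gene 8 carries an even-parity bit as the simplest possible checksum. A single parity bit can only detect one flipped gene, not say which one it was.

```python
from typing import Tuple

# Strand of genes 1..8 in the unmodified chromosome, from the example above.
REFERENCE = "FRFFFRRR"


def flip(strand: str) -> str:
    return "R" if strand == "F" else "F"


def encode_char(ch: str, reference: str = REFERENCE) -> str:
    """Flip a gene's strand to write 1; genes 1-7 hold ASCII bits, gene 8 parity."""
    code = ord(ch)
    if code > 127:
        raise ValueError("only 7-bit ASCII fits in this layout")
    bits = [(code >> shift) & 1 for shift in range(6, -1, -1)]
    bits.append(sum(bits) % 2)  # even parity over the 7 data bits (assumed checksum)
    return "".join(flip(s) if b else s for s, b in zip(reference, bits))


def decode_char(observed: str, reference: str = REFERENCE) -> Tuple[str, bool]:
    """Return the decoded character and whether the parity check passed."""
    bits = [int(o != r) for o, r in zip(observed, reference)]
    data, parity = bits[:7], bits[7]
    code = 0
    for bit in data:
        code = (code << 1) | bit
    return chr(code), sum(data) % 2 == parity


if __name__ == "__main__":
    layout = encode_char("A")
    print(layout, decode_char(layout))
```

Reading the message back requires knowing the reference orientation of every gene, which matches the post's point that the gene sequences themselves give nothing away.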

Just to add that the strand-swapping scheme laid out above only works within the lifetime of an individual. There's no guarantee that your carefully altered chunk of chromosome will be the one that gets into the next generation. Ahh, unless you put it on the Y chromosome, and have your message pass down the male line!

Or you could just clone the guy when he comes near the end of his life.
posted by primer_dimer at 8:05 AM on November 29, 2007

How about encoding the message by altering the order of genes on a chromosome?

Probably a bad idea. The architecture of the genome matters for gene expression. If your gene stays partially wrapped around histones or is otherwise less accessible to the transcription machinery, then you'll get lower levels of expression.

Swapping strands potentially has similar problems, since promoter elements can lie hundreds of thousands of bases upstream. Simply put, you wouldn't know how much DNA you'd have to move.

Synonymous mutations in genes would work, but high rates of it would be pretty easily detectable with current sequencing methods.

It's worth noting that even highly repetitive and non-coding regions of DNA are fairly stable within a person's lifetime. From a long-term, evolutionary perspective it becomes problematic though. You might not expect to get high-fidelity copies back from someone's 10th or 100th generation offspring, but putting it into one person, then extracting it decades later probably wouldn't be a problem.
posted by chrisamiller at 12:14 PM on November 29, 2007

Another vote for silent/synonymous mutations. Non-coding DNA != non-functional DNA, and truly non-functional DNA isn't going to be subject to especially rigorous damage repair. Cmiller's germ idea is pretty cool too.
posted by eritain at 10:34 PM on January 15, 2008
