Using DNA to archive the past for the future

I’ve spent a lot of time over the last few months trying to pare down a quarter century’s worth of files, photos and ephemera. Among the many boxes of paper, I’ve found photos from my high school years, notes from forgotten projects, and correspondence with friends and family and colleagues, some of whom I haven’t seen in years. It’s been a lovely trip down memory lane.

But I also discovered ancient floppy disks with college term papers, Zip disks full of archived emails, and CD photo albums. At least that’s how they were labeled. I no longer have a computer that can read floppies, and even some of the CDs have become unreadable. Fortunately, over the years I’ve transferred many of my files to new media, but some of the files may be forever unrecoverable. While DVDs and hard drives can hold a lot more information – and weigh a lot less – than a box of paper printouts, the fact is that my old paper files are more likely to be readable 30 years from now than the digital files currently residing on my laptop.

This, of course, presents a difficult problem not only for people like me who want to be able to access their photos and letters and personal documents in the future, but also for archivists, historians, governments and other organizations who want and need to ensure that today’s digital data will remain accessible decades or centuries from now.

It turns out that a biochemical approach to information storage may be part of the solution. Naturally occurring DNA molecules encode information that directs the synthesis of the tens of thousands of proteins and other molecules that make up a living cell, along with the processes that allow the development of complex multicellular organisms (such as us humans). The development of ever faster and more accurate methods of both synthesizing and determining the sequence of DNA molecules has not only improved our understanding of normal DNA function, but also spurred the creation of new nucleic acid-based technologies.

One such biotechnological innovation is the development of methods that use the information storage properties of DNA to encode digital data. Just a few months ago EMBL molecular biologists Nick Goldman and Ewan Birney published a paper demonstrating that one gram of DNA can hold more than 2 million gigabytes of information, or “468,000 DVDs”. They were even able to build in error correction to be sure the encoded information would be stored and read accurately. And if kept in a cool, dry environment, DNA can potentially remain stable for tens of thousands of years, making long-term archival storage possible.
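To make the idea of writing digital data as DNA a little more concrete, here is a minimal Python sketch. It is only loosely in the spirit of the published approach, not the authors' actual method: each byte is turned into base-3 digits, and each digit picks one of the three bases that differ from the previous base, so the same letter never appears twice in a row. The fixed six digits per byte, the function names and the starting base are my own assumptions for illustration, and the sketch leaves out error correction entirely.

```python
# Illustrative sketch only: a simplified bytes -> DNA mapping.
# Assumptions (not from the paper): fixed-width base-3 digits per byte,
# the helper names below, and "A" as the notional base before the sequence.

BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729, enough to cover the 256 possible byte values


def byte_to_trits(value):
    """Return the base-3 digits of a byte, most significant digit first."""
    trits = []
    for _ in range(TRITS_PER_BYTE):
        trits.append(value % 3)
        value //= 3
    return trits[::-1]


def encode(data, start_prev="A"):
    """Encode bytes as a DNA string in which no base repeats back to back."""
    prev = start_prev
    out = []
    for byte in data:
        for trit in byte_to_trits(byte):
            # The three bases that differ from the previous one
            choices = [b for b in BASES if b != prev]
            prev = choices[trit]
            out.append(prev)
    return "".join(out)


def decode(dna, start_prev="A"):
    """Invert encode(): recover the original bytes from the DNA string."""
    prev = start_prev
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


message = b"Shall I compare thee to a summer's day?"
dna = encode(message)
print(dna[:30])
print(decode(dna) == message)
```

Avoiding runs of the same letter matters because repeated bases are harder to synthesize and sequence accurately. A real system would also need redundancy, such as overlapping fragments and indexing information, so that errors and lost pieces can be detected and repaired.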

Naturally there are drawbacks to using DNA as a storage medium. Once the data is written by synthesizing the DNA sequence, it cannot be changed. And there is no easy way to retrieve just a small portion of data without sequencing a big chunk of DNA. There is no equivalent to the list of files on your hard drive to find the data you are looking for. It’s not at all practical for information you would want to frequently retrieve or modify, and synthesizing DNA takes longer (and is more expensive) than saving a file on a thumb drive. But unlike DVDs, which will eventually seem as archaic as papyrus as a storage medium, DNA should remain readable: we humans should still be able to sequence DNA molecules and decode the information stored therein – assuming, of course, that human society retains at least the level of technology that exists today.

And to get a bit more speculative, the biochemical properties of DNA leave open the intriguing possibility that the encoding methodology could be used to insert important files directly into human genomes, creating living data repositories. The method could also be used to mark one’s self as part of a group or organization through genetic engineering.

Of course that idea isn’t new to science fiction. In Chris Lawson’s 1999 short story “Written in Blood”, a Muslim man has part of the Koran encoded as DNA and inserted into his genome. That decision turns out to be fatal when the inserted DNA creates a mutation that causes leukemia. His biochemist daughter eventually develops a better method of encoding, and she uses it to write what’s important to her into her own blood: photos of her wedding and her family, Martin Luther King’s “I Have a Dream” speech, Watson and Crick’s double-helix paper, Shakespeare’s Julius Caesar and a Muslim parable. And a paraphrase of Einstein’s words after the atomic bombing of Japan, expressing hope in humanity:

“The release of atom power has changed everything but our way of thinking,” then added, “The solution of this problem lies in the heart of humankind.”

Birney and Goldman selected similar data to test their real-world system: a digital version of Shakespeare’s sonnets, a photo of their offices, a PDF of Watson and Crick’s paper on the structure of DNA, and an audio clip from Martin Luther King Jr.’s “I Have a Dream” speech. If we were to use the technology to create a time capsule for our descendants to open millennia from now, Shakespeare and MLK are obvious selections of cultural significance.

What we choose to archive from our past to share with future generations says a lot about what we value today. I’d hope that ultimately we would include data representing a diverse range of cultures and voices. It’s up to science fiction to ask the question of what future humans might make of the information.


Goldman et al. “Towards practical, high-capacity, low-maintenance information storage in synthetic DNA.” Nature (2013). doi:10.1038/nature11875

