Paleontologists routinely resurrect and sequence DNA from woolly mammoths and other long-extinct species. Future paleontologists, or librarians, may do much the same to pull up Shakespeare’s sonnets, listen to Martin Luther King Jr.’s “I have a dream” speech, or view photos. Researchers in the United Kingdom report today that they’ve encoded these works and others in DNA and later sequenced the genetic material to reconstruct the written, audio, and visual information.
The new work isn’t the first example of large-scale storage of digital information in DNA. Last year, researchers led by bioengineers Sriram Kosuri and George Church of Harvard Medical School reported that they had stored a copy of one of Church’s books in DNA, among other things, at a density of about 700 terabits per gram, more than six orders of magnitude denser than conventional data storage on a computer hard disk. Now, researchers led by molecular biologists Nick Goldman and Ewan Birney of the European Bioinformatics Institute (EBI) in Hinxton, U.K., report online today in Nature that they’ve improved the DNA encoding scheme to raise that storage density to a staggering 2.2 petabytes per gram, three times that of the previous effort.
To do so, the team first translated written words or other data into a standard binary code of 0s and 1s, and then converted this to a trinary code of 0s, 1s, and 2s, a step needed to help prevent the introduction of errors. The researchers then rewrote that data as strings of DNA’s chemical bases: As, Gs, Cs, and Ts. At the storage density achieved, a single gram of DNA would hold 2.2 million gigabytes of information, or about what you can store on 468,000 DVDs. What’s more, the researchers added an error correction scheme, encoding the information multiple times, among other tricks, to ensure that it could be read back with 100% accuracy.
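The binary-to-trinary-to-DNA pipeline described above can be made concrete with a small sketch. This is an illustration under stated assumptions, not the EBI team's actual scheme: it assumes each byte is written as six trinary digits, and that each digit picks one of the three bases that differ from the previous base, so the same letter never appears twice in a row. That no-repeat rule is one plausible way a trinary step could help prevent errors; the article does not spell out the mechanism. The function names and the starting base `"A"` are hypothetical, and the redundancy the researchers added for error correction is left out.

```python
from typing import List

BASES = "ACGT"
TRITS_PER_BYTE = 6  # assumption: 3**6 = 729 values, enough to cover one byte (256)


def bytes_to_trits(data: bytes) -> List[int]:
    """Write each byte as six base-3 digits, most significant digit first."""
    trits: List[int] = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: List[int]) -> bytes:
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def trits_to_dna(trits: List[int], start: str = "A") -> str:
    """Map each trit to one of the three bases that differ from the previous base."""
    prev = start  # hypothetical reference base; it is not part of the output
    strand = []
    for trit in trits:
        choices = [b for b in BASES if b != prev]
        prev = choices[trit]
        strand.append(prev)
    return "".join(strand)


def dna_to_trits(dna: str, start: str = "A") -> List[int]:
    """Inverse of trits_to_dna; raises ValueError if a base repeats."""
    prev = start
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits


def encode(data: bytes) -> str:
    return trits_to_dna(bytes_to_trits(data))


def decode(dna: str) -> bytes:
    return trits_to_bytes(dna_to_trits(dna))


if __name__ == "__main__":
    message = b"Shall I compare thee to a summer's day?"
    strand = encode(message)
    print(strand)
    print(decode(strand) == message)
```

Because every base must differ from its predecessor, only three of the four letters are available at each position, which is why a trinary rather than a binary or quaternary alphabet fits this kind of rule. A decoder built this way also gets a free sanity check: any repeated letter in a read signals a sequencing or synthesis error.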