Can DNA Solve the Data Storage Problem?

By Rajeswari Jayaraman

Mankind is producing more data than ever. Globally, digitally stored data is projected to reach 44 trillion gigabytes by 2020, a threefold increase in just three years. Conventional media like hard drives and magnetic tapes are hitting the physical limits of their storage capacity. One promising answer to this escalating data storage problem comes from our own ancient form of biological information storage: DNA.

Researchers have recently shown that one gram of DNA can store 215 petabytes of digital data. In other words, all of the information humans have ever recorded could fit in a single room if stored as DNA. Besides being the densest known storage medium, DNA can preserve information for an extremely long time when kept in a cool, dry, and dark place, as shown by the reconstruction of a human genome from a bone more than 400,000 years old.
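As a rough sanity check on these figures, the short calculation below estimates how much DNA would be needed to hold the 44 trillion gigabytes projected above at 215 petabytes per gram. It is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes) and ignores any redundant copies or packaging, none of which are specified in the article.

```python
# Figures quoted in the article.
PROJECTED_DATA_GB = 44e12      # 44 trillion gigabytes projected by 2020
DENSITY_PB_PER_GRAM = 215      # petabytes storable in one gram of DNA

# Assumption: decimal (SI) units, not binary ones.
BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15


def dna_mass_grams(data_bytes: float, density_bytes_per_gram: float) -> float:
    """Mass of DNA needed to hold data_bytes at the given storage density."""
    return data_bytes / density_bytes_per_gram


total_bytes = PROJECTED_DATA_GB * BYTES_PER_GB
grams = dna_mass_grams(total_bytes, DENSITY_PB_PER_GRAM * BYTES_PER_PB)
print(f"About {grams / 1000:.0f} kg of DNA")
```

Under these assumptions the answer comes out at roughly 205 kilograms of DNA, which fits comfortably with the "single room" picture.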

How Does DNA Data Storage Work?

DNA is made up of chains of four nucleotide bases: adenine, guanine, cytosine, and thymine (labeled A, G, C, and T, respectively). For data storage purposes, special algorithms convert the binary 1s and 0s of digital files into the four bases; say, 00 for A, 01 for G, 10 for C, and 11 for T. To encode the data, DNA strands are synthesized with the resulting base patterns. The files can then be decoded by reading the strands back with modern DNA sequencing technology. The theoretical maximum capacity of information storage per nucleotide is two bits (although in practice this drops to about 1.8 bits owing to unavoidable noise).
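The two-bits-per-base mapping described above can be sketched in a few lines of Python. This is a minimal illustration using the article's example mapping only; the function names are made up for this sketch, and it leaves out what practical schemes add on top, such as error correction and splitting the data across many short strands.

```python
# Example mapping from the article: 00 -> A, 01 -> G, 10 -> C, 11 -> T.
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # GACAGCCG
print(decode(strand))  # b'Hi'
```

Each byte becomes exactly four bases, which is where the theoretical ceiling of two bits per nucleotide comes from.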

In 2016, Microsoft announced a record of storing 200 megabytes (MB) of data using about 1.5 billion unique pieces of DNA. This year, researchers from Columbia University and the New York Genome Center reported the development of the DNA Fountain algorithm, which reaches 85% of the theoretical storage limit per nucleotide, 60% better than previous studies. In addition, both storage and retrieval of the information were 100% reliable and error-free.

Current Limits of DNA Data Storage

The main challenges with DNA data storage are cost and efficiency. Encoding data is incredibly slow, at rates of about 400 bytes per second. This is millions of times slower than the microsecond timescales of a silicon memory chip. In fact, for a data storage technology to become feasible in practice, Microsoft estimates that the encoding rate must be at least 100 MB per second. Synthesizing DNA molecules is also very expensive. Experts put the cost at $800,000 for Microsoft’s 200 MB project and $7,000 for synthesizing the 2 MB of data in the DNA Fountain project.
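To put these numbers side by side, the following sketch works out the gap between the current and target encoding rates, the synthesis cost per megabyte for each project, and how long 200 MB would take to encode at 400 bytes per second. It assumes 1 MB = 10^6 bytes and uses only the figures quoted above; the variable names are illustrative.

```python
BYTES_PER_MB = 10**6  # assumption: decimal megabytes

# Encoding rates quoted in the article.
current_rate = 400                  # bytes per second
target_rate = 100 * BYTES_PER_MB    # Microsoft's estimated minimum, bytes per second
speedup_needed = target_rate / current_rate

# Synthesis costs quoted in the article: (dollars, megabytes stored).
projects = {
    "Microsoft": (800_000, 200),
    "DNA Fountain": (7_000, 2),
}
cost_per_mb = {name: dollars / mb for name, (dollars, mb) in projects.items()}

# Time to encode 200 MB at the current rate.
seconds = 200 * BYTES_PER_MB / current_rate
days = seconds / 86_400

print(f"Speedup needed: {speedup_needed:,.0f}x")
for name, cost in cost_per_mb.items():
    print(f"{name}: ${cost:,.0f} per MB")
print(f"Encoding 200 MB at 400 B/s: {days:.1f} days")
```

On these figures, encoding would have to get 250,000 times faster to hit the target, synthesis costs $4,000 and $3,500 per megabyte respectively, and writing 200 MB would take close to six days.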

However, companies and researchers believe the cost will plunge over time as DNA synthesis methods improve and consume less machine time. A number of bioscience startups are already working on synthesizing DNA using enzymes rather than the conventional, decades-old chemical process, which could both cut costs and increase speed. Several big players are jumping into the field, and computer architects at Microsoft Research have the optimistic goal of building an operational DNA-based storage system by 2020.

The Future of DNA Data Storage

Given the challenges of cost and efficiency, we can safely conclude that early uses of this technology will be restricted to long-term archival storage, such as medical or legal records, rather than wide-scale consumer storage. Besides being an incredibly dense storage medium, DNA is biological in nature, which makes this novel storage method ecologically friendly and more promising in the long term as a sustainable storage technique.
