Think the memory card in your camera
is high-capacity? It's got nothing on DNA. With data accumulating at a faster
rate now than at any other point in human history, scientists and engineers are
looking to genetic code as a form of next-generation digital information
storage.
Now, a team of Harvard and Johns
Hopkins geneticists has developed a new method of DNA encoding that makes it
possible to store more digital information than ever before. We spoke with lead
researcher Sriram Kosuri to learn why the future of archival data storage is in
genetic code, and why his team's novel encoding scheme represents such an
important step toward harnessing DNA's vast storage potential.
The Problem
Humanity has a storage problem.
Recent surveys conducted by IDC Digital Universe
suggest that the spread of technology throughout society has triggered an
explosion in the volume of information that we as a species produce on a daily
basis. Between photos, video, texts, tweets, Facebook updates, unsolicited
FarmVille requests, Instagram posts and various other forms of digital data
production, the world's information is doubling every two years, and that
raises some important questions, chief among them being: where the hell
do we put it all?
"In 2011 we had 1.8 × 10^21
bytes of information stored and replicated," explains Sriram Kosuri, a
Harvard geneticist and member of the Wyss Institute's synthetic biology platform, in
an email to io9. "By 2020 it will be 50 times that. That's an astounding
number; and doesn't include a much larger set of data that's thrown away (e.g.,
video feeds)."
As Kosuri points out, not all of
this information needs to be stored, but — being the diligent little hoarders
that we are — a good deal of it will be cached away somewhere for posterity;
and at the rate we're generating information, we'll need to find new storage
solutions if we want to have any hope of keeping up with our demand for space.
"Our ability to store, manage, and archive such information is being
constantly strained already," notes Kosuri. "Archival storage is also
a large problem."
The (Theoretical) Solution: The Advantages of DNA Storage
Archival storage is where DNA comes
in. As storage media go, it's hard to compete with the universal building
blocks of life. In an article published in today's issue of Science,
Kosuri — in co-authorship with geneticist Yuan Gao and synthetic biology pioneer George
Church — describes a new technique for using DNA to encode digital information
in unprecedented quantities. We'll get to their novel storage method in the
next section, but for now let's look at some numbers that help contextualize
what Kosuri identifies as the two major advantages of DNA storage: information
density and stability.
At its theoretical maximum, one gram of
single-stranded genetic code can encode 455 exabytes of information. That's
almost half a billion terabytes, or 4.9 × 10^11 GB. (As a
point of reference, the latest iPad tops out at 64 GB of storage space.) DNA
strands also like to fold over on top of themselves, meaning that, unlike most
other digital storage media, data needn't be restricted to two dimensions; and
being able to store data in three dimensions translates to more free space.
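For readers who want to check the conversions, here is a minimal sketch. It assumes binary prefixes (1 exabyte = 2^60 bytes), which is an assumption on our part but is the reading that reproduces the figures quoted above.

```python
# Unit conversions for the "455 exabytes per gram" figure.
# Assumption: binary prefixes (1 exabyte = 2**60 bytes, and so on).
EXABYTE = 2**60
TERABYTE = 2**40
GIGABYTE = 2**30

bytes_per_gram = 455 * EXABYTE
terabytes_per_gram = bytes_per_gram / TERABYTE
gigabytes_per_gram = bytes_per_gram / GIGABYTE
ipads_per_gram = gigabytes_per_gram / 64  # 64 GB iPads needed to match one gram

print(f"{terabytes_per_gram:,.0f} TB per gram")
print(f"{gigabytes_per_gram:.2g} GB per gram")
print(f"{ipads_per_gram:.2g} iPads (64 GB each) per gram")
```

The terabyte figure comes out at roughly 477 million, which is where "almost half a billion" comes from.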
DNA is also incredibly robust, and
is often readable even after being exposed to unfavorable conditions for
thousands of years. Every time researchers recover genetic information from a
woolly mammoth specimen, or sequence the genome of a 5,300-year-old
human mummy, it's a testament to DNA's durability and data life.
Just try recovering files from a 5,000-year-old CD or DVD. Hell, try it with a
20-year-old disc; odds are it just isn't going to
happen.
That being said, DNA has its
shortcomings. "It's not re-writable, it's not random access, and it is
very high latency," explains Kosuri, "so really the applications are
for archival storage (not to downplay the importance of archives)."
The (Practical) Solution
To demonstrate the vast potential of
DNA storage, Kosuri and his team used just shy of 55,000 159-nucleotide chunks
of single-stranded genetic code to encode a 5.27-megabit book containing
53,426 words, 11 JPG images and one JavaScript program. They then proceeded to
use next-generation DNA sequencing techniques to read it back. (For those who
need refreshing, nucleotides are the individual building blocks that, when
joined together, form strands of DNA.)
5.27 megabits probably doesn't
strike you as a lot (that comes out to roughly 660 kilobytes of information,
about what you'd find on a 3.5" floppy from the 80s), but it's impressive
for at least three reasons:
One: It positively crushes the previous DNA-storage record of
7,920 bits.
Two: The novel encoding method employed by Kosuri and his
colleagues allowed them to address issues of cost and accuracy, two
long-standing technical hurdles facing DNA storage:
"The major reason why this would have
been difficult in the past is that it is really difficult to construct a large
stretch of DNA with exact sequence, and make it cheaply. We took an
approach that allows us to use short stretches of DNA (basically by having an
address (19 bits) and data block (96 bits)), so each short stretch can be
stitched together later after sequencing. Using short stretches allowed us to
leverage both next-generation synthesis [for writing data]… and next-generation
sequencing [for reading data] technologies to really lower cost and ease."
Three: It offers a compelling proof of concept that DNA can be
used to store digital information at remarkable densities. "What we
published in terms of scale is… obviously small compared to commercial
technologies now," explains Kosuri, but "using our method, a petabyte
of data [one petabyte = 1,024 terabytes] would require about 1.5 mg of
DNA." Since that genetic information can be packaged in three dimensions, that
translates to a storage volume of about one cubic millimeter.
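The address-plus-data-block idea Kosuri describes can be sketched in a few lines. This is an illustration under stated assumptions, not the team's actual pipeline: the 19-bit address and 96-bit data block come from the quote above, but the one-bit-per-base mapping (0 to "A", 1 to "G") and the function names `encode` and `decode` are made up here for demonstration.

```python
import random

ADDRESS_BITS = 19  # from the article
DATA_BITS = 96     # from the article

# Assumption (not from the article): one bit per base, 0 -> "A", 1 -> "G".
BIT_TO_BASE = {"0": "A", "1": "G"}
BASE_TO_BIT = {base: bit for bit, base in BIT_TO_BASE.items()}


def encode(payload: bytes) -> list:
    """Split a payload into addressed fragments, each written as a short DNA string."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    fragments = []
    for address, start in enumerate(range(0, len(bits), DATA_BITS)):
        # Zero-pad the final block so every fragment has the same length.
        block = bits[start:start + DATA_BITS].ljust(DATA_BITS, "0")
        word = f"{address:0{ADDRESS_BITS}b}" + block
        fragments.append("".join(BIT_TO_BASE[bit] for bit in word))
    return fragments


def decode(fragments: list, n_bytes: int) -> bytes:
    """Reassemble the payload from fragments read back in any order."""
    blocks = {}
    for fragment in fragments:
        word = "".join(BASE_TO_BIT[base] for base in fragment)
        address = int(word[:ADDRESS_BITS], 2)
        blocks[address] = word[ADDRESS_BITS:]
    bits = "".join(blocks[address] for address in sorted(blocks))
    return bytes(int(bits[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))


message = b"Are you ready for your DNA drive?"
fragments = encode(message)
random.shuffle(fragments)  # reads come back in no particular order
restored = decode(fragments, len(message))
print(restored == message)
```

Because every fragment carries its own address, the order in which fragments are read back does not matter; sorting by address restores the original. The numbers in the article are consistent with this layout: 5.27 million bits divided into 96-bit blocks gives about 54,900 fragments ("just shy of 55,000"), comfortably within the 524,288 addresses that 19 bits allow. Each fragment in this sketch is 115 bases long; the article does not say what the rest of each 159-nucleotide chunk holds, so the sketch leaves it out.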
The logarithmic plot featured here
illustrates how the storage density demonstrated by Kosuri and his team
(labeled "This Work") compares to technologies of today and
tomorrow. You should really just reference the graph, but to summarize: DNA
wins out by a landslide.
"For example," explains
Kosuri, "we are ~10 orders of magnitude (100 billion fold) more dense than
a CD, a million-fold more dense than the best commercial storage technologies,
and about ~1000 fold more dense than [other] proof-of-concept work (e.g.,
position atoms on a surface)." He says the secret to DNA's superiority
harkens back to the fact that it can be stored dry in three dimensions;
"thus there is no surface that requires a thickness, which really kills 3D
data density."
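As a rough cross-check, the petabyte figure can be set against the theoretical maximum quoted earlier. The sketch below uses only numbers from this article (455 exabytes per gram, and one petabyte per 1.5 mg) and assumes binary prefixes; treat the result as an order-of-magnitude estimate, nothing more.

```python
# Figures quoted in the article; binary prefixes are an assumption.
PETABYTES_PER_EXABYTE = 1024

theoretical_pb_per_gram = 455 * PETABYTES_PER_EXABYTE  # 455 exabytes per gram
quoted_pb_per_gram = 1 / 1.5e-3                        # 1 petabyte per 1.5 mg

gap = theoretical_pb_per_gram / quoted_pb_per_gram
print(f"theoretical maximum: {theoretical_pb_per_gram:,} PB per gram")
print(f"quoted for this method: {quoted_pb_per_gram:,.0f} PB per gram")
print(f"gap: roughly {gap:,.0f}-fold")
```

On these numbers the method as described sits a few hundred-fold below the theoretical ceiling, while still far ahead of the other storage media in the plot.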
The Future
DNA storage has its limitations. As
I mentioned earlier, it's not re-writable, and it's not random access. Its
latency is also too high for it to be practical for anything other than
archival storage, but we've already established that we're in dire need of
space for archiving, anyway. The only other big limiting factors, at present,
are synthesis and sequencing technologies — and those won't be an issue for
much longer.
According to Kosuri, the costs of
DNA synthesis and sequencing have been dropping much faster than Moore's law.
In the supplementary information section of their paper, Kosuri and his
colleagues imagine what a petabyte of storage would require, from the
standpoint of synthesis and sequencing costs, and conclude that they would need
a drop of roughly 6 orders of magnitude in sequencing costs, and 7 to 8 in synthesis costs, for
storage media of that capacity to become feasible.
"To give perspective,"
explains Kosuri, "costs have been dropping for the past 5-10 years at 10x
and 5x per year for sequencing and synthesis respectively." In other
words: this tech is right around the corner. Are you ready for your DNA drive?
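How far away is "around the corner"? A back-of-the-envelope sketch, assuming the quoted rates (10x per year for sequencing, 5x per year for synthesis) simply hold steady. That is a big assumption, and the helper `years_to_drop` is ours, not something from the paper.

```python
import math


def years_to_drop(orders_of_magnitude: float, fold_per_year: float) -> float:
    """Years for a cost to fall by 10**orders_of_magnitude at a constant yearly rate."""
    return orders_of_magnitude / math.log10(fold_per_year)


sequencing_years = years_to_drop(6, 10)     # 6 orders of magnitude at 10x per year
synthesis_years_low = years_to_drop(7, 5)   # 7 orders of magnitude at 5x per year
synthesis_years_high = years_to_drop(8, 5)  # 8 orders of magnitude at 5x per year

print(f"sequencing: about {sequencing_years:.0f} years")
print(f"synthesis: about {synthesis_years_low:.0f} to {synthesis_years_high:.0f} years")
```

Under those assumptions, sequencing would close its gap in about six years and synthesis in roughly a decade.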
The researchers' results are
published in the latest issue of Science.
Images via Shutterstock