The practice of encoding data in DNA molecules could be inching closer to moving out of research labs and into practical commercial use.
In the coming years, the explosion of data being generated by computing devices could outstrip the supply of hard drives needed to store it, some industry experts say. Some academic researchers and business leaders think that the solution could be to house information in lab-made DNA molecules instead of on conventional storage hardware. DNA’s advantages include a much longer shelf life and the ability to pack far more information into a given space.
But there are a number of hurdles that must be overcome before DNA becomes a feasible method for mass data storage, including how slow and expensive it is to encode data in manufactured DNA. Catalog Technologies, a young startup based at the Harvard Life Lab, says it has developed a process for encoding data in DNA that addresses both of those issues. The company announced today that it has raised $9 million so far from investors, including New Enterprise Associates (NEA), to help commercialize its DNA-based data-storage method.
“We’re really excited about making this a viable solution in the near future, starting with next year some pilot projects,” says CEO and co-founder Hyunjun Park in a phone interview. He declined to share names of pilot customers, but he says Catalog is receiving interest from government entities, nonprofits, and data storage companies.
The idea of storing data in DNA isn’t new, but the field seems to be gaining momentum as researchers find faster ways of encoding and reading ever-increasing amounts of data in DNA. In experiments completed over the past few years, researchers have demonstrated the ability to encode digital data (the sequences of zeroes and ones that make up computer text, images, and audio files) into strands of synthetic DNA, which are made up of sequences of nucleotide bases represented by the letters A, C, G, and T. Storing data in DNA involves converting digital data into DNA code, and then synthesizing strings of DNA molecules with that code. For the past several years, researchers have been working to speed up the DNA synthesis step and lower its cost.
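To make the conversion step concrete, here is a minimal sketch in Python of one simple way digital data could be translated into DNA code and back. The two-bits-per-base mapping and the function names are assumptions chosen for illustration; they are not drawn from any of the research described here, and practical schemes typically add error correction and avoid sequences that are hard to synthesize or read.

```python
# Hypothetical mapping (an assumption for illustration): two bits per base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_to_dna(data: bytes) -> str:
    """Turn raw bytes into a string of A, C, G, and T (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_from_dna(strand: str) -> bytes:
    """Reverse the mapping: read bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode_to_dna(b"Hi")
    print(strand)
    print(decode_from_dna(strand))
```

Under this toy mapping, the two-character text "Hi" becomes the eight-base sequence CAGACGGC, and decoding that sequence returns the original text. The expensive part in practice is not this translation but physically synthesizing the resulting strands.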
In 2012, a team that included renowned Harvard professor George Church reported that it had encoded a 5.27-megabit book in DNA and read it back using a DNA sequencing machine. Since then, demonstrated storage capacity has grown. In a paper published in February, Microsoft and University of Washington researchers reported that they stored 35 distinct digital files in DNA (more than 200 megabytes of data), including the United Nations’ “Universal Declaration of Human Rights” in more than 100 languages and the music video for OK Go’s song “This Too Shall Pass.” The Microsoft and UW team also said they improved methods of retrieving that data.
This month, Church and several other researchers published results of a new method that uses a DNA-building enzyme instead of traditional chemical approaches to rapidly synthesize DNA. UCLA assistant professor Sri Kosuri, a synthetic biology researcher who didn’t work on the project, tweeted that the approach might help improve the speed and cost of DNA synthesis, as well as the speed of reading the encoded information. (The new paper has not been peer-reviewed.)
Catalog says it has developed faster and cheaper methods of building custom DNA for data storage purposes. The startup says the key to its approach is separating the process of synthesizing DNA molecules from the process of encoding the digital data. Park says Catalog’s method involves purchasing large quantities of small DNA fragments—about 20 to 30 base pairs long—from synthetic DNA suppliers. Catalog designed a machine that can dispense and stitch the DNA fragments together in programmable ways. The idea is that Catalog’s process uses a relatively small number of DNA molecules—fewer than 200—which can be combined in an exponential number of ways, Park says. The process requires less DNA synthesis, which is the “expensive and slow part of the work,” he says.
Victor Zhirnov, chief scientist of the nonprofit Semiconductor Research Corporation in Durham, NC, says it sounds like Catalog is using a so-called “library approach,” which involves “encoding information by taking a combination of DNA molecules from a defined lexicon of molecules.”
“By doing this, they don’t need to synthesize new DNA for every new piece of information to store. Instead they just have to remix their pre-fabricated DNA,” Zhirnov says in an e-mail to Xconomy. (His research interests include DNA data storage, and he says he has no ties to Catalog.)
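The arithmetic behind a library approach can be sketched in a few lines of Python. Everything specific below is a hypothetical illustration rather than Catalog's disclosed design: the sketch assumes 10 positions ("slots") in an assembled molecule, with 16 interchangeable pre-made fragments available for each slot, and it uses placeholder names in place of real DNA sequences. That adds up to 160 fragments, yet the number of distinct assemblies is 16 to the 10th power, or about 1.1 trillion.

```python
# Hypothetical layout (an assumption, not Catalog's published design):
# 10 slots, each filled by one of 16 pre-made fragments.
SLOTS = 10
CHOICES_PER_SLOT = 16


def build_library() -> list[list[str]]:
    """Return placeholder fragment names, one pool of choices per slot.

    Real fragments would be short DNA sequences; names like "S3F07"
    stand in for them here.
    """
    return [
        [f"S{slot}F{frag:02d}" for frag in range(CHOICES_PER_SLOT)]
        for slot in range(SLOTS)
    ]


def encode_value(value: int, library: list[list[str]]) -> list[str]:
    """Pick one fragment per slot so the combination represents `value`."""
    if not 0 <= value < CHOICES_PER_SLOT ** SLOTS:
        raise ValueError("value does not fit in one assembly")
    picks = []
    for slot in range(SLOTS):
        # Each slot stores one base-16 digit, least significant first.
        value, digit = divmod(value, CHOICES_PER_SLOT)
        picks.append(library[slot][digit])
    return picks


def decode_value(picks: list[str], library: list[list[str]]) -> int:
    """Recover the number from the fragments found in each slot."""
    value = 0
    for slot in reversed(range(SLOTS)):
        value = value * CHOICES_PER_SLOT + library[slot].index(picks[slot])
    return value


if __name__ == "__main__":
    library = build_library()
    print(sum(len(pool) for pool in library), "fragments in the library")
    print(CHOICES_PER_SLOT ** SLOTS, "distinct assemblies")
    picks = encode_value(123456789, library)
    print(picks, decode_value(picks, library))
```

In this toy setup each assembly carries 40 bits, and no new fragment ever has to be synthesized to store a new value; only the selection changes. How the fragments are physically joined and later read back is outside what this sketch models.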
Park claims that by next year, Catalog’s machine will be able to encode 1 terabyte of information per day in DNA, at a cost of several thousand dollars. Current standard methods of encoding data in DNA would cost billions of dollars and take several weeks to accomplish the same task, Park says.
Catalog’s goals for the performance of its system are ambitious, but “not unreasonable,” Zhirnov says. Whether Catalog’s approach “can be done in an economically viable way—it remains to be seen,” he says. “I find their approach interesting and look forward to seeing the results,” he adds. Catalog hasn’t published any peer-reviewed studies of its methods, Park says.
By comparison, a conventional portable hard drive with 1 terabyte of storage capacity typically costs less than $100, and saving 1 terabyte of data to it would take only a few hours. The bottom line is that even if Catalog’s system performs as well as advertised, the company and its rivals are still a long way from being able to compete with the lower costs and faster data transfer speeds of hard drives. Still, DNA’s longevity and compactness might make that tradeoff worth it for some users, particularly those who want to store data for long periods of time without accessing it.
Park says he and his co-founder, Nathaniel Roquet, met at the Synthetic Biology Center at MIT, where they were both working in associate professor Timothy Lu’s group. Park was doing postdoctoral research at MIT, and Roquet was a graduate student researcher at Harvard. They formed Catalog in fall 2016 and relocated to San Francisco to participate in the IndieBio startup accelerator, Park says. They later moved back to Boston and set up shop in the Harvard Life Lab, a co-working and lab space that’s part of the Harvard Innovation Labs.
Catalog has six employees, including the recently hired chief science officer Devin Leake, who was previously the head of DNA synthesis at Ginkgo Bioworks, a Boston-based synthetic biology company. Most of Catalog’s team is trained in biology and chemistry, so some of the venture capital will be spent on hiring more computer scientists, Park says.
In addition to NEA, Catalog says its investors include OS Fund, Day One Ventures, Data Collective, Green Bay Ventures, AME Cloud Ventures, Industry Ventures, and the messaging app company Line. They’re betting that Catalog will beat competitors, including Iridia and Helixworks Technologies, in the race to deliver practical DNA-based data storage systems.
Ultimately, Park envisions DNA being used not only for storing data, but also for transporting it. He says NASA, for example, might want to use DNA to more easily and reliably transport information through space, which would become more important if humans one day colonize other planets. Back on Earth, intelligence agencies could transport data more securely with DNA, Park says.
“If you wanted to carry around a few petabytes [of data] with you in just a few grams [of material], in an untraceable way, DNA would be a good way to do it,” Park says.
[Top photo courtesy of Catalog. From L to R: Leake, Park, and Catalog scientist Milena Lazova.]