to send this much data to a remote server run by Amazon. This isn’t an issue for the customers using DNAnexus now, Sundquist says, because it usually takes a week to do a sequencing run and an hour or less to transmit the data across the Internet. But as I noted in a feature story yesterday, Amazon has set up a conventional FedEx system for researchers who prefer to save their data on a disk and ship it back and forth to Amazon, rather than transmit over the web.
As more and more researchers start producing more and more gene sequencing runs, bandwidth could be an issue, Sundquist says. “It’s not a problem now,” he says. “Five or 10 years from now, it could be a real problem.”
Another real problem, which Microsoft has been grappling with lately, is how to create a standardized program that’s useful for researchers who might ask completely different questions. This is one of the reasons there are so many open-source and custom-made programs, and no single dominant for-profit vendor, Sundquist says.
Some of the more specialized questions will always be part of bioinformatics, and there will probably always be a place for custom-made bioinformatics programs, Sundquist says. The DNAnexus program is designed to be good at answering some very common questions that researchers ask, like which single nucleotide polymorphisms (SNPs) occur in a genome and might be associated with a disease.
Researchers have traditionally leaned on home-brewed software in an era when a tiny number of complete human genomes are thought to have been sequenced worldwide. But over the next few years, that number is expected to skyrocket to 1 million genomes. If that happens, the data deluge will be hard to fathom. The entrepreneurs of a decade ago who said there was gold in bioinformatics may just have been a little too far ahead of their time, Sundquist says.
“This is so much larger a scale of anything from the past, it’s forcing a shift,” Sundquist says. “This is going to be the dominant problem.”