Geospiza Runs in the Black, as Scientists Turn to Software to Help Crunch Genomes

Xconomy Seattle — 

The faster and cheaper that gene sequencing gets, the better things start to look for Seattle-based Geospiza. This small angel-backed company has stuck to its guns for 13 years, many of them lean, arguing that biologists need better software to make sense of the digital mountains of DNA being created every day.

Geospiza—knock wood—has now won over enough customers that it is operating on a consistent cash-flow positive basis, says president Rob Arnold. It’s a modest milestone, but an important lesson in perseverance for a little operation with about 20 employees. Arnold says Geospiza has built a roster with “hundreds” of paying customers for its lab software and analysis products, including scientists at the Institute for Systems Biology and University of Washington, Harvard Medical School, Yale University, Children’s Hospital Boston, and the University of Florida.

“We’ve made enormous progress,” Arnold says. “We are able to financially power ourselves now.”

Gene sequencing has been on a torrid pace of innovation over the past few years, as established toolmakers like Illumina, Life Technologies, and Roche race to lower the cost of sequencing an entire human genome to as little as $10,000. Others, like Mountain View, CA-based Complete Genomics, say they can do it for as little as $5,000. This is creating terabyte-size piles of digital data in the form of the letters A, C, G, and T. Once those letters have been recorded from a biological sample, scientists need to be able to store, analyze, compare, and visualize the patterns on their computers before they can have a “Eureka” moment that might lead to a top-notch scientific paper or a medical insight.

The sequencing instruments themselves can cost as much as $500,000. So quite a few researchers over the years have figured they could get by on the cheap, either by dumping their data into Microsoft Excel spreadsheets that were never designed for this kind of thing or by whipping up their own “home-brew” software for custom experiments.

Geospiza has long argued that it can do better. It now offers a Web-based product for which it charges $30,000 a year to provide its genomic data service to a research lab, plus another $2,500 a year for each researcher who wants to analyze the lab’s data. The system runs either on Geospiza’s own cloud computing infrastructure or on a cloud run by Amazon Web Services. That means the research lab doesn’t need to host the data on its own servers.
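
To make that pricing concrete, here is a minimal sketch of the arithmetic. The two dollar figures come from the paragraph above; the function name is hypothetical, and the sketch assumes the fees simply add up with no discounts or usage charges, which the company has not confirmed.

```python
# Figures quoted in the article; everything else here is illustrative.
BASE_FEE_USD = 30_000            # annual fee for the genomic data service
PER_RESEARCHER_FEE_USD = 2_500   # annual fee per researcher analyzing data


def annual_cost_usd(num_researchers: int) -> int:
    """Estimate a lab's yearly bill, assuming the two fees simply add up.

    Hypothetical helper: real contracts may include discounts or other
    charges that the article does not describe.
    """
    if num_researchers < 0:
        raise ValueError("num_researchers must be non-negative")
    return BASE_FEE_USD + PER_RESEARCHER_FEE_USD * num_researchers


if __name__ == "__main__":
    # Print the estimated annual bill for a few lab sizes.
    for n in (1, 5, 10):
        print(f"{n} researcher(s): ${annual_cost_usd(n):,} per year")
```

Under these assumptions, a lab with four researchers analyzing data would pay $40,000 a year.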

The big players in sequencing have traditionally concentrated on selling their sequencing tools, not software. But Geospiza has shown the toolmakers that computing matters to their customers. The company has persuaded companies like Life Technologies and Illumina, as well as emerging players like Pacific Biosciences and Ion Torrent Systems, to form partnerships.

When I spoke to Arnold to get an update on the company last week, he made it sound like Geospiza is trying to transform this product from a “nice-to-have” into a “must-have.”

“We can generate an analyzed data set for about $2,500 that would otherwise cost $25,000 if you do it yourself,” Arnold says. “What we can do in a matter of hours, would otherwise take months.”

How does Geospiza achieve those savings? Scientists have to start by taking a deep breath and letting somebody else host their precious data. But when they do, it frees them from the expensive and daunting task of hiring their own in-house bioinformatics guru to take care of all the data on the lab’s own servers, Arnold says.

Cost is a key part of the Geospiza pitch, but the company is also benefiting from the trend toward better, faster, cheaper gene sequencing. Once the price for DNA sequencing drops to a certain point, it may be common for researchers to want the whole 3-billion-letter string of DNA from, say, each individual who enrolls in a clinical trial. It is now estimated that existing sequencing equipment around the world has enough capacity to sequence 500,000 entire individual genomes in the next three to five years. Right now, it is thought that fewer than 100 entire human genomes have been sequenced.

“It’s pretty mind-boggling when you think about it,” Arnold says. All that rapid sequencing is going to create enormous haystacks of data that will be increasingly hard to pull the needle out of, he says.

Software, of course, isn’t some kind of magic bullet for this data problem. Human beings still need time to sort through, analyze, and study the data to make use of it, Arnold says. And while data is being produced at this exponentially growing rate, we humans are falling farther behind. U.K.-based biophysicist Cameron Neylon made an important point about this a couple of weeks ago during a talk about open-source science at Microsoft. He showed a slide pointing out that the capacity of the average human mind isn’t keeping up, and that we as individual humans don’t “scale up” to process all of this data.

“Researchers are overwhelmed,” Arnold says.

There’s no one company that dominates this world of software for biological data, either. Microsoft has taken a crack at this with its Amalga Life Sciences program, which now incorporates assets it acquired from Merck’s Rosetta Biosoftware operation in Seattle. Victoria, BC-based Genologics, a company backed by Kirkland, WA-based OVP Venture Partners, overlaps some with Geospiza, although it has a broader strategy of stitching together basic genomic data with other health records. Bridgewater, NJ-based LabVantage makes some competitive laboratory software, as does St. Louis-based Partek. Geospiza has tried to build its competitive edge around being the only one to capture the genomic data and combine it with analytical capabilities, Arnold says.

It’s still too early to see where this is all going. Over the coming years, researchers are going to have to learn to work in bigger collaborative teams to crunch all the genomic data, Arnold says. The ones who thrive will have high “IQ and EQ,” he says, referring not just to brainpower but to people skills. Companies are going to have to work out standards across the many existing proprietary “stovepipes” that make it hard to get data in consistent formats for things like whole genome sequences, transcriptomes, and other biological data points, he says. It’s going to take a lot of collaboration among scientists and companies to tease out the most meaningful data and get close to the ultimate goal of personalized medicine.

“No one company is going to dominate this field,” Arnold says.
