NextBio Finds Profit at Intersection Between Public and Private Genomic Data

Bioinformatics was a buzzword at the beginning of the genomic era about a decade ago, but it has become a dirty word today. It’s sort of like shorthand for a highly fragmented cottage industry that seeks to analyze biological data, in which no one seems to make money. But whatever you want to call it, Saeid Akhtari still sees big opportunity in using information technology to help biologists make sense of all the genomic data piling up in servers around the world.

And he’s actually starting to make money at it.

Akhtari’s company, Cupertino, CA-based NextBio, doesn’t claim to have singlehandedly solved all the data overload issues that are emerging as genomic instrument makers race ahead with machines that seek to sequence complete human genomes, with 6 billion chemical letters, for $5,000 or even less in coming years. But NextBio has grown to 60 employees, built up an impressive roster of blue-chip pharmaceutical and academic customers, and has started to operate on a cash-flow positive basis in its sixth year, Akhtari says.

The big idea at NextBio is to take the vast amounts of genomic data piling up in free public databases like those run by the National Institutes of Health and pool them with proprietary internal data from for-profit customers, in a way that can be mined by average lab bench researchers in real time. NextBio’s vision is to make this web-based service so easy to use that a biologist doesn’t need help from a trained bioinformatics expert, and can get clear answers that shed light on how, say, certain genes are up-regulated or down-regulated in a diseased tissue.

“We’re finding that by correlating different pieces of data, new things pop out at you,” Akhtari says.

Saeid Akhtari

NextBio got started back in 2004. That was when Akhtari sold his last bioinformatics company, Silicon Genetics, to Agilent Technologies for an undisclosed sum. Akhtari co-founded the new venture with Ilya Kupershmidt, one of his key lieutenants at Silicon Genetics, and Mostafa Ronaghi, the chief technology officer of the market leader in gene sequencing instruments, San Diego-based Illumina (NASDAQ: ILMN).

Back then, NextBio saw the growth in genome-wide association studies, the search for subtle variations in genomic code called single nucleotide polymorphisms (SNPs), and the growing use of next-generation sequencing tools that were making it possible for biologists to run all kinds of new experiments that might shed light on what’s going wrong and causing disease. One of the big problems then, Akhtari says, was that biologists couldn’t just run queries against these datasets on their own—they needed to ask for help from a bioinformatics expert. It was sort of like the days before Google or Bing, when information professionals needed to ask research librarians trained in Boolean logic to run online queries that got good results. The simple numbers game says that a Big Pharma company might have 200 people highly skilled at bioinformatics, and several thousand biologists, so the average queries would pile up and get stale. Since researchers had to send an e-mail with a query and then wait for weeks to get an answer, they often didn’t bother to ask questions in the first place, Akhtari says.

“We noticed this created quite a bottleneck,” Akhtari says. “We wanted to change the dynamic, so any researcher or doctor could go to NextBio, form a query, and get their results right away.”

Of course, this is easier said than done. Real work had to be done to bring together and process the raw data from a number of public repositories. This involved a lot of indexing and semantic tagging to help people tap into correlations that would otherwise be missed.

Making the public data really useful and “normalized” is part of the challenge, since different sequencing machines produce different sets of data and call for different statistical analyses. But the real edge is in taking that cleaned-up public data and marrying it with the unpublished, private data that Big Pharma companies have, Akhtari says.

“The core of the platform is the content, based on public data that’s normalized and in a useful format. That is the yin, but the yang comes from clients’ data,” Akhtari says. “The major pharma companies have a ton of internal data that they never publish. They have large volumes of genomics data, and they can pump all of it into NextBio and correlate it with public content.”

NextBio had to spend a lot of time, and $20 million of investors’ money, building up the capacity to merge this data. The model is essentially software-as-a-service. NextBio has its own secure servers that support the data, and the company provides customers with a secure login and password they use to access their data over the web. The customer agrees to send its data through an FTP connection, where it gets combined with the publicly available data, so the NextBio “correlation engine” can do its job. NextBio doesn’t say how much it charges for access to this data pool, but it has “several multi-million dollar deals” with Big Pharma customers and offers academic researchers a discount, Akhtari says.

So far, NextBio has built up an impressive list of customers. The group includes Merck, Johnson & Johnson, and Pfizer, as well as academic leaders like The Scripps Research Institute, Stanford University, and the Sanford-Burnham Medical Research Institute.

There are still some small bioinformatics companies out there, and lots of academic research groups that write their own specialized “home brew” bioinformatics software. Much of that work goes on “upstream,” doing the primary, secondary, and tertiary analysis of raw data that needs to happen before NextBio does its thing, Akhtari says. Even at the point where NextBio’s service enters the equation, it’s competing against what customers try to develop internally. Apparently, though, that doesn’t concern Akhtari very much.

“A lot of firms have developed internal tools with chicken wire and duct tape primarily for the bioinformatics experts,” Akhtari says. “We have no head-to-head competitors.”

The market for computational analysis of genomic data is still small, of course, but the potential over time could run “into the billions,” Akhtari says. Already, sophisticated cancer research centers like Stanford are seeking to stratify patients into certain groups, with treatment that’s tailored to their individual tumor types, based on looking at the activity of thousands of genes, not just one. Over time, more drugs like Roche’s trastuzumab (Herceptin) will be developed with a companion diagnostic that determines which patients are likely to respond, and which won’t. Someone will need to provide IT to help crunch data for all these experiments that are bound to be run, he says.

When that day comes, Akhtari wants NextBio to be one of the key behind-the-scenes players making it happen. If that comes to pass, there will probably be some new buzzword people use to describe what NextBio does.

“Our goal is to expand into clinical applications, it’s really exciting,” Akhtari says. “That’s my dream, to see genomics really bring about preventive, personalized medicine.”
