George Church created the Personal Genome Project, a big plan to sequence more than 100,000 human genomes in the U.S. Now the database he’s been using to store all that information has become the basis for a new startup.
Boston-based Curoverse is announcing today that it’s raised $1.5 million in seed funding to continue developing Arvados, an open source computational platform that houses massive amounts of genomic data. Hatteras Venture Partners, Point Judith Ventures, MassVentures, Boston Global Ventures, and Common Angels have provided the funding for Curoverse, which plans to release its first commercial products next year.
Curoverse is a product of the Personal Genome Project, according to CEO Adam Berrey. That project was spearheaded by Church to sequence a massive number of genomes and link them to each individual’s health information. When he took this on, Church realized that to do so successfully, and to start similar studies around the world, he needed a massive database. So he looked to one of his computer scientists, Alexander Wait Zaranek, to create one that met a few key requirements. It had to be able to hold close to an exabyte worth of data. Researchers had to be able to use it to efficiently analyze data and make sense of what they see. It had to be shareable from one research center to another. And most importantly, it had to make complex analyses easily reproducible.
Zaranek’s team came up with Arvados, which Berrey says is based on a lot of modern day cloud-computing and big-data technologies, only tailored to handle a giant amount of genomic information. Berrey says Arvados helps make computations reproducible—say, if a scientist wants to repeat an experiment from a few months ago and see if the results have changed. It’s also shareable, meaning a bioinformaticist could write an algorithm, and run it across data that’s stored at several different locations. Curoverse can run on both public and private cloud services, so it’ll be available both on Amazon and other cloud platforms, according to Berrey.
Though the first iteration of Arvados was initially developed in 2006 and deployed two years later to power Church’s study under the name FreeFactories, the cost of sequencing was too high at the time to use the software to start a company. So even though the company was officially incorporated in 2010, executives weren’t hired to steer the ship until 2012. That’s when Berrey, a veteran of software startups like Allaire and Brightcove, was brought in as CEO, and Jonathan Sheffi was hired to handle business development. (Zaranek is the company’s scientific director, while Church is on the scientific advisory board, according to Berrey.)
The group has since been bootstrapping the company, putting together a business plan, and figuring out how to get it financed. With the seed money in place, the company (initially called Clinical Future, but now known as Curoverse) is expanding its team of engineers and getting ready to bring its first product to market.
Even so, Curoverse is joining an increasingly competitive sector. Startups such as Mountain View, CA-based DNAnexus and Redwood City, CA-based Bina Technologies have popped up with similar ideas for ways to efficiently store genomic data and interpret it. What makes Curoverse’s system different, Berrey says, is that it runs on open-source software, rather than via a proprietary system where “you’re totally dependent on a single vendor.” Curoverse will manage a website, arvados.org, that researchers can tap into from anywhere and use to share information. This means that a geneticist could ask a question about data that sits in his or her own data center, a neighboring lab, and elsewhere simultaneously, without having to physically move any of it around, Berrey says.
“The different model of data sharing—you don’t move the data, you move the computations around—and the open-source strategy are the two things that are very different,” he says.
To be clear, Curoverse doesn’t specifically “own” Arvados in the proprietary sense—since it’s an open source platform, anyone can use and download the source code, or computer instructions behind it. But Curoverse will set up, operate, manage and maintain it, and charge users for the amount of computational resources and data storage they use (Berrey wouldn’t say how much). Berrey likens the approach to how Acquia has commercialized Drupal’s open source content management system, or similarly what Rackspace is doing with the OpenStack open source cloud operating system.
“[It’s] complex, time consuming, and requires specialized system administration and operations skills,” Berrey says. “Curoverse will provide products that make it turnkey to use Arvados without having to deal with any of the challenges associated with configuring and managing your own systems.”
Today, Curoverse is only running a private beta version of that service—a private cloud used for genomic analysis at Harvard with 300 terabytes of storage on two clusters, or data centers. Its first product, expected to be available next year, will be a platform-as-a-service, or a hosted and managed version of Arvados. Curoverse aims to then sell a set of products that enable companies and organizations to deploy clouds using Arvados.
Curoverse ultimately hopes to tap three markets. This coming year, it’ll target clinical researchers at major medical centers. It’ll then move on to pathology and independent genetic testing labs using next-generation sequencing for diagnostic tests. The big dream is for doctors someday to use Arvados as a precision medicine tool to treat their patients better—say, by using an application that picks out a specific drug based on the patient’s genetic profile.
Of course, that’s a ways away. Competition aside, Berrey acknowledges that the company is going to have to work hard to get its foot in the door and convince big medical organizations to change, and “understand the value” of using big-data computing over traditional methods. But he’s hoping that that tipping point is coming.
“The good news is there’s so much new data being generated,” he says. “The systems that are installed now aren’t ready for those new data.”