Here and there, the genetic information of a cancer patient has helped a doctor find that person the right drug or steer her away from the wrong one. But the use of genetic knowledge to improve people’s health is in its infancy.
That’s why seven major cancer treatment centers in North America and Europe are pooling their patients’ data in a central repository, dubbed Project GENIE (Genomics, Evidence, Neoplasia, Information, Exchange), for their own researchers and doctors to use and, eventually, to open completely to all comers. “Right now genetic testing is getting done for individual patients and being returned to those patients’ physicians,” says Victor Velculescu, co-director of cancer biology and a professor at Johns Hopkins University who has helped push forward the field of cancer genomic analysis. “Those data are not being used, except to help that patient. This effort brings all data from all reports together and allows us to keep learning.”
The bet with this project (and, writ larger, with the promise of so-called big data in all manner of medical fields, not just cancer) is that the more people’s information is pooled for comparison and analysis, the deeper the insights into health.
“It’s reasonable to think that it could have the records of 50,000 to 100,000 patients in several years,” says Charles Sawyers, one of the nation’s top cancer researchers and the chair of the human oncology and pathogenesis program at New York’s Memorial Sloan Kettering Cancer Center. “That’s a lot.”
Sawyers hinted to Xconomy earlier this year that the massive data-sharing project was underway and could launch in the fall.
Sloan Kettering, Johns Hopkins’s Kimmel Cancer Center in Baltimore, Dana-Farber Cancer Institute in Boston, Princess Margaret Cancer Centre in Toronto, Vanderbilt-Ingram Cancer Center in Nashville, TN, Institut Gustave Roussy in France, and the Center for Personalized Cancer Treatment in the Netherlands are the seven participating institutions. Velculescu says the consortium could welcome more in due time.
One of the first projects GENIE could tackle is what Dana-Farber chief scientific officer Barrett Rollins calls a “meta” problem: does decoding the genes of patients’ cancers actually lead to better health outcomes? Dana-Farber and other institutions have their anecdotal success stories. Dana-Farber has a handful of what it calls “Lazarus cases,” such as a young man with a seemingly intractable form of leukemia who was at death’s door when a sequence of his cancer pointed toward treatment with the drug imatinib (Gleevec), according to Rollins. “Now he and his wife are having a baby,” says Rollins.
But no one has examined whether more sequencing, across large populations, is worthwhile. “This is one of our best hopes to demonstrate what we all intuitively feel is true,” says Rollins, noting that insurance companies aren’t keen on paying for tests that don’t have a body of positive health outcomes behind them.
The participants are all well equipped with the labs and machinery, not to mention the flow of patients, to contribute to the pool of data, but some are contributing more than others. Backed by philanthropic money that subsidizes its data efforts, Sloan Kettering sequences 410 genes from the tumors of some of its patients, mainly those with metastatic cancer. Dana-Farber looks at 405 genes, but it does so for every cancer patient. Hopkins, which relies on insurance reimbursement to pay for its data work, is more financially constrained: it sequences about 50 genes and hopes to double that in the future, says Velculescu.
These disparities are a reminder that, while the cost of sequencing has plummeted in the genomic age, there are still financial limits to how much an institution can do. The differences also underscore a practical problem: How to combine all these data into one searchable pool? Not only are the data sets different sizes, but each institute might sequence a particular gene in different ways—focusing on one mutation over another, say, or sequencing the same site with varying levels of redundancy for quality control.
To help create a framework, GENIE has turned to Sage Bionetworks in Seattle, a nonprofit that helps researchers share scientific data. Part of the job is not just melding existing data but also accounting for future changes in each institution’s panels, or the sets of genes under examination. “These panels have to be living entities, driven by what the scientific literature tells us” is relevant to patients, says Rollins. Every six months, Rollins and others review which genes to swap in or out.
To add to the complexity, Project GENIE will have a second layer of information, drawn from the non-genomic health records of the patients whose tumor sequences are in the pool. That information would include the type of cancer, plus the patient’s age, gender, location, other diseases, and general health history—what researchers often lump together into the catch-all term “phenotype.”
Those health records will not be stored centrally, so researchers will have to request data sets, which staff at each of the seven institutions will then pull together. That part of GENIE will be slower to build, says Sawyers. “It’s not easy to retrieve that data from medical records. It’s a manual process that costs money, so at first it has to be a focused extraction of the most important elements,” he says. “We envision over time there will be software to extract data at scale, but right now that does not exist.”
The consortium is also going to be more protective of that extra layer, which gives rich context to the genomic data, and will give its members priority to pursue questions that Sawyers calls “super compelling.” Sawyers gives the example of the BRAF gene. Mutations in the gene drive certain cancers, and Roche’s drug vemurafenib (Zelboraf) does quite well treating people with a dire form of skin cancer driven by the most common BRAF mutation, called V600E. But a small portion of BRAF mutations are not V600E, which “we all know about but we don’t know what to do about,” Sawyers says. “We know the answer to one or two variants, but only based on a small sample of patients. The field would benefit from a larger analysis across seven centers.”
When asked what project he would initiate on GENIE if he could start tomorrow, Velculescu had a practical answer. He would expand a recent Johns Hopkins study that explored why the cancer drug cetuximab (Erbitux) fails to help some people with late-stage colon cancer. Cetuximab blocks EGFR, a protein that, when mutated, puts cancer cells into overdrive.
In that study, researchers sequenced the exome (the small portion of the genome that codes for proteins) of tumors from more than 100 people, then transplanted those people’s tumors into mice and grew them there. Combining genomic and phenotypic information, the study found new mutations in six genes that could be driving drug resistance. But studying the question in thousands of cases, says Velculescu, could lead to firmer clues for new treatments.
GENIE came together under the roof of the American Association for Cancer Research (AACR), which is funding it with $2 million for two years, but who pays for it after that is up in the air, says Sawyers. (Sawyers was recently AACR president, and Velculescu is on the board of directors.) In addition to philanthropic funding, one potential source of future support is drug companies that want custom data sets for their own research, the project’s leaders say. (For-profit users of the system would still have to publish their results.)
But the data will eventually be free. To appease institutions and researchers who demanded some reward for their own work, originators will have six months of exclusive use of their own data before the rest of the consortium gains access. After six more months, the data will be open to all comers, even for-profit groups. That means companies like 23andMe, Helix, Craig Venter’s Human Longevity, and Invitae, all looking to build treasure chests of health data, could conceivably tap in as well.
One limitation of the data set at first is that it will include information about mutations that occur in cancer cells, but not inherited genetic data. While plenty of insight about cancer can be gleaned strictly from the changes that take place in cancer cells, genes inherited from previous generations can play a critical role in a person’s cancer risk. Sometimes those correlations are straightforward, which is why women with certain mutations of the BRCA genes might opt for preventive measures, such as the double mastectomy that actress Angelina Jolie underwent in 2013, even if they have not yet shown signs of cancer. But researchers say our understanding of those correlations is growing more complex, making the hereditary layer of information increasingly relevant in diagnosis and treatment.
Adding that layer of information to GENIE will have to come later, says Velculescu. “We didn’t want the perfect to be the enemy of the good. We wanted to get this off the ground sooner rather than later.”