Untying the Knots in Big Data and Big Biology: Q&A with Andrew Su


data from more than 300 researchers around the world in an effort to develop computational methods to identify factors that promote or resist neurological disease.

In its announcement, the NIH said it also provided related grants that are focused on data discovery, career development in biomedical data, and the development of big data courses and open educational resources. The NIH says its “Big Data to Knowledge Initiative” (BD2K) is projected to have a total investment of nearly $656 million through 2020, pending available funds.

Su’s research focuses on using quantitative methods in biomedical discovery. He answered some questions about the BD2K initiative by e-mail. A lightly edited transcript of our exchange follows:

Xconomy: The NIH says these grants are intended to make it easier for biomedical scientists to analyze and use genomic, proteomic, and other complex biomedical data sets. How could such data be used to treat patients in a clinical setting?

Andrew Su: There are definitely some of these BD2K proposals that seem to have direct clinical benefit. I don’t want to speak for them, but I think the overall program goals will become clearer when we have a meeting of all consortium principal investigators in Washington, D.C., next month.

X: Do you see opportunities for commercialization of new technologies arising from the work done under these grants?

AS: There is commercialization potential, both for our grant and the BD2K program as a whole. We will be developing a variety of technologies for proteomics—better identification of post-translational modifications, modeling spatiotemporal dynamics, correlating to genetic variants, and relating to cardiovascular disease.

On the other hand, a significant portion of our proposal focuses on using crowdsourcing and citizen science to organize biomedical knowledge. In those cases, the knowledge bases that result will be entirely free and open to all. While those won’t be directly commercializable, we hope they will be a foundation on which other efforts (both commercial and non-commercial) can build.

X: Can you give me some examples of bottlenecks that make it hard to apply genetic data in the diagnosis and treatment of patients?

AS: Annotating the functions of genetic variants is a big one. We are very good at identifying the presence or absence of variants in a given patient, but picking out the variant or variants that are driving disease is still very difficult. This is due to a combination of things: the absence of functional data, and poor organization of the functional data that has been generated.

X: It doesn’t seem like $32 million is going to go very far if you’re trying to solve the problems that make it hard to use big data in a clinical setting. Is parceling out these small grants to dozens of research centers the most efficient way to address these bioinformatics bottlenecks?

AS: You touch on the classic debate of top-down NIH-driven programs versus bottom-up investigator-driven proposals. I don’t think there’s a provable right or wrong answer here, but there are certainly passionate voices on both sides.

If you’re asking whether BD2K will yield tangible benefits in four years, just based on the people involved, I’d be shocked if it didn’t. There are some top-notch people participating (both awardees and NIH).

X: How does the Scripps Wellderly Genome Resource fit into the BD2K program? Who owns that data? Is the owner willing to share/collaborate by allowing other researchers to access that database?

AS: With any Big Data initiative, the quality of the output depends a lot on the data used as input. In particular, having large, high-quality reference data sets is incredibly valuable. The Wellderly study, for example, tells us a lot about what genetic variants are and are not likely to be functional or deleterious.
