Bad Statistics, and Bad Training, Are Sabotaging Drug Discovery



to pay for additional analyses is uncertain. The future of the initiative likely depends on how informative this pilot study turns out to be.

Other laboratory practices have recently been put under the microscope. A report from the Global Biological Standards Institute identified problems with standardization that were pervasive across both academic and industrial research settings. The authors say that irreproducibility “stems from undefined variance in reagents, practices, and assays between laboratories.” They advocate for the expansion of standard practices and reagents via educational programs and policy initiatives. Their report identified statistical analysis of data as one of many problematic areas, warning of “differences in statistical methods, including use of different mathematical approaches to analyze data or use of statistical approaches that might not be optimal for the particular data type.”

I’m not an expert on statistical methodology in the biological sciences, and statistics are not always required to make novel scientific discoveries. Certain problems in biology are focused on outcomes where statistics aren’t needed. Much of the work I did in my career was in cloning genes encoding previously undiscovered growth factors. The only possible outcomes were that I had cloned a new gene, or I hadn’t. While many of us trained in the biological sciences took a biostatistics class somewhere in our schooling, it is often a distant memory by the time we’ve landed a lead investigator position. One of my former colleagues once submitted a paper that came back with a reviewer’s comment that the statistical analysis should have included the Bonferroni correction. Neither he nor I had any idea what this was, but he was able to get assistance from one of our company’s statisticians to fine-tune the analysis. Knowing how to perform a particular statistical test isn’t sufficient; it’s equally important to know which statistical test is the best one to apply.
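For readers who, like us, had not met the Bonferroni correction before: it applies when several statistical tests are run on the same data set, and it tightens the significance threshold so that the chance of at least one false positive stays near the intended level. The sketch below is only an illustration. The p-values are made up, and the function names `bonferroni` and `familywise_error` are hypothetical helpers, not part of any statistics package.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni-adjust raw p-values from a family of m tests.

    Returns a list of (adjusted_p, significant) pairs, where
    adjusted_p = min(1, m * p) and significant means adjusted_p <= alpha.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(adj, adj <= alpha) for adj in adjusted]


def familywise_error(m, alpha=0.05):
    """Chance of at least one false positive across m independent tests
    when each is run at level alpha and no real effect exists."""
    return 1.0 - (1.0 - alpha) ** m


# Hypothetical raw p-values from four comparisons in one experiment.
raw = [0.004, 0.020, 0.030, 0.400]
print(f"Uncorrected family-wise error for 4 tests: {familywise_error(4):.3f}")
for p, (adj, sig) in zip(raw, bonferroni(raw)):
    print(f"raw p = {p:.3f}  adjusted p = {adj:.3f}  significant: {sig}")
```

With these invented numbers, three of the four comparisons fall below 0.05 before correction, but only the first survives it. Bonferroni is deliberately conservative; whether it or a less strict method suits a given study is exactly the kind of question a statistician should weigh in on.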

If reproducibility problems in the biological sciences are due to a mixture of poor statistical analysis and improperly done experiments, how do we fix this? And for all of the articles that we’ve read over the past few years pleading for new programs to get students amped up and interested in STEM careers, maybe a bigger problem is the insufficient training of a significant percentage of current scientists. These problems can’t be fixed quickly, but I’ll suggest two possible solutions that are not mutually exclusive:

Require Manuscripts to Have Solid Experimental Statistics

Journals, if they don’t currently engage such individuals, should add peer reviewers who are charged with the task of making sure the statistics in submitted studies are up to snuff. At least one reviewer on every paper should be someone who has been vetted by the editors as having a strong background in statistical analysis. Given that the federal government is funding a large percentage of research studies via the NIH, it should consider assembling an extramural statistical group to help grantees with both the design of their experiments and the analysis of their data.

Exactly where the funding for such a group would come from is unclear, and we know that academia is already suffering greatly because of the across-the-board federal budget cuts known as the “sequester.” Big Pharma should be willing to throw some money in this direction because its ability to develop new drugs is increasingly tied to the quality of the data coming out of academia. The industry currently provides at least 40 percent of the funding for the Tufts Center for the Study of Drug Development, a non-profit research group that analyzes various issues in the pharmaceuticals business. Companies could easily endow an independent organization to take on the task of helping with the statistical analysis of data and developing research standards. This is especially true given the enormous increase in future industry revenues predicted by the IMS Institute for Healthcare Informatics assuming the successful adoption of the Affordable Care Act (aka Obamacare). Big Pharma companies have already come together to form TransCelerate Biopharma, whose focus is on solving industry-wide problems. The issues outlined above could certainly be part of this group’s mandate.

Alternatively, researchers could turn to private contract research organizations that have the statistical talent on staff to evaluate the data. How this additional statistical analysis would be paid for still needs to be determined, but again it’s a logical place for Big Pharma to invest some money from their multi-billion dollar war chests. Industry contributions to such an effort (in the form of unrestricted grants) could be proportional to the sales income or profitability of individual members.

Provide Better Training in Experimental Design and Implementation

Most graduate programs in the biological sciences feature a wide variety of coursework; the classes you take depend on your specific area of study. Courses in biostatistics are commonly available, but are probably not widely required. Getting universities to offer a broader spectrum of statistics classes might be helpful in the long run, but this would do little to solve the problem in the short term. I wonder how many departments or graduate programs actually offer courses on how to properly design and execute experiments; perhaps they need to add them. I think it’s generally assumed that this subject is a mentor’s responsibility, but suppose they either don’t have the proper training themselves, or won’t take the time to teach the people in their labs? Having said that, I think that some procedures don’t readily lend themselves to fixing in the real world. The suggestion that researchers blind themselves so that they don’t know which animals get which treatments is one such example. This could readily be done in an industrial lab, but I’m having a hard time seeing how graduate students and post-docs, who frequently work alone while doing their experiments, could accomplish this task.

One other problem that may be affecting experimental design across numerous disciplines is the reduced availability of grant money. Increasing the number of animals per group, for example, may be statistically advisable but will also raise the cost of an experiment. Researchers may be cutting back on the size of their experimental groups as a way to save money without realizing that doing so puts the outcome at risk. While understandable, this practice winds up being penny wise but pound foolish if it results in conclusions that are simply wrong and can’t be reproduced by others.
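To put rough numbers on that trade-off, here is a minimal sketch of how the chance of detecting a real effect (the statistical power) changes with group size. It assumes a simple two-group comparison and uses a normal approximation, which somewhat overstates power for very small groups. The effect size of 1.0 standard deviation and the group sizes are illustrative assumptions, and `approx_power` is a hypothetical helper written for this example.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_per_group, effect_size, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    effect_size is the true difference between group means divided by
    the common standard deviation. Uses a normal approximation and
    ignores the negligible chance of rejecting in the wrong direction.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n_per_group / 2)
    return NormalDist().cdf(shift - z_crit)


# Assumed true effect: treated animals differ by one standard deviation.
for n in (5, 10, 20):
    print(f"{n:2d} animals per group -> power ~ {approx_power(n, 1.0):.2f}")
```

Under these assumptions, an experiment with five animals per group misses a genuine one-standard-deviation effect more often than it finds it, while twenty per group detects it most of the time. The savings from the smaller study are real, but so is the risk of an uninformative or misleading result.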

Several years ago I was asked to advise an academic group that wanted to start a new biotech company. They thought they had come up with a novel Alzheimer’s treatment using bioactive peptides. While reviewing their limited data I saw that they hadn’t included any negative control peptides in their experiments. Even a beginning graduate student should have recognized this was highly problematic. I inquired about this and was informed that they simply didn’t have the funds to purchase and include them in their study. I told them that investors were unlikely to back a company without this critical piece of data. Unfortunately, they simply couldn’t find the money to do these additional experiments, and the entire project was eventually abandoned.

Statistics can be used well or they can be abused. This observation was nicely captured in the old saw “There are three kinds of lies: lies, damned lies, and statistics.” The expression was popularized by Mark Twain (among others) to explain how politicians bolster weak arguments using dubious data. Valen Johnson’s analysis indicates that many scientists (who don’t know any better) also use statistics poorly, resulting in weak data that doesn’t hold up to careful scrutiny. There’s an old joke in the biological sciences: a researcher sharing his data with colleagues reported that “33 percent of the animals responded positively to the treatment, 33 percent of the animals showed no response, and the third mouse ran away.”

Given the current state of affairs, this joke isn’t sounding too funny anymore.


Stewart Lyman is Owner and Manager of Lyman BioPharma Consulting LLC in Seattle. He provides strategic advice to clients on their research programs and collaboration management issues, as well as preclinical data reviews.



6 responses to “Bad Statistics, and Bad Training, Are Sabotaging Drug Discovery”

  1. David Miller says:

    A good start would be a specific certification/recertification requirement for CMEs specific to biostatistics. The number of practicing clinicians who cannot answer the following question correctly is staggering:

    Two identical trials and patient populations. Which drug is the superior drug for patients?

    Drug A: p-value = 0.001, Hazard Ratio=0.89
    Drug B: p-value = 0.01, Hazard Ratio=0.69

    That’s the easy question. Here’s the one that most everyone screws up, clinicians, researchers, and patients:

    Two identical trials and patient populations. Which drug is the superior drug for patients?

    Drug C: Median survival 2.0 months, Hazard Ratio = 0.49
    Drug D: Median survival 3.5 months, Hazard Ratio = 0.78

    • Anonymous says:

      I’m gonna say B & C. BTW I have absolutely no experience or background education in this field, but I do find it interesting and I’d like to know the right answer. Thanks, David.

      • david says:

        You’re right.

        In the first example, people often mistakenly think the p-value has something to do with efficacy. It does not. All the p-value tells you is how likely it would be to see an outcome at least as extreme as the one in the trial if the drug had no real effect.

        The second example highlights median versus the entire population. The median tells you how one patient performed from each arm: the middle patient. That’s useful information, but the Hazard Ratio is a far superior measure as it describes how the entire population performed, including those who didn’t respond as well and those who responded really well. Especially when looking at interim (immature) data, the median can be pretty deceptive.
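The point in the thread above, that a smaller p-value does not mean a bigger benefit, can be sketched numerically. The snippet below relies on a common approximation for trials with 1:1 randomization, in which the standard error of the log hazard ratio is roughly sqrt(4 / number of events). The event counts are invented for illustration, and `p_value_from_hazard_ratio` is a hypothetical helper, not a function from any survival-analysis library.

```python
from math import log, sqrt
from statistics import NormalDist


def p_value_from_hazard_ratio(hazard_ratio, events):
    """Approximate two-sided p-value for a hazard ratio.

    Assumes 1:1 randomization, so the standard error of log(HR)
    is taken to be about sqrt(4 / events).
    """
    se = sqrt(4.0 / events)
    z = log(hazard_ratio) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Same hazard ratio, different amounts of data (hypothetical event counts).
for events in (100, 400):
    p = p_value_from_hazard_ratio(0.69, events)
    print(f"HR 0.69 with {events} events -> p ~ {p:.4f}")

# A much weaker effect can still earn a tiny p-value given enough events.
p = p_value_from_hazard_ratio(0.89, 3200)
print(f"HR 0.89 with 3200 events -> p ~ {p:.4f}")
```

Under this approximation the p-value shrinks as events accumulate even though the hazard ratio, and therefore the estimated benefit to patients, stays the same.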

  2. When looking at statistical significance, one can influence the p-value a lot by choosing how to define the parameters. For example, with readings of 70 versus 65 and 60 in the control group, one can either compare 70 to 65 or subtract the control from both values and compare 10 with 5 to improve statistical significance. Statisticians can choose either to cooperate with such approaches or to remain objective. However, if the result is not significant, it won’t get published.

  3. jl says:

    Interesting topic. One reason results don’t show up is also the warped financial models used to pick cherries, and the incredible amount of costs loaded to product development.