Statistics in Preclinical Pharmaceutical Research and Development

Article excerpt

Although most statisticians and the public at large are familiar with the role of statistics in human clinical drug trials, advances in the basic science and technology of drug research and development (R&D) have created equally challenging and important opportunities in the preclinical arena. Preclinical pharmaceutical research encompasses all aspects of drug discovery and development, from basic research into how cells and organs work and how disease processes disrupt that work to the development, formulation, and manufacture of drugs. The activities that fall under this rubric include biological and biochemical research using in vitro ("test tube"-based) and in vivo (whole animal) experiments; genomics, the study of gene expression in cells, organisms, and populations to determine the molecular biology of disease; proteomics, the study of protein expression patterns to understand how normal and disease processes differ; design, synthesis, and selection of diverse chemical and natural product "libraries" of compounds to screen for desirable biological activity, often via high throughput, "industrialized" drug screening assays; analytical development for drug research and manufacturing; animal testing of drug candidates for efficacy and metabolism and to determine drug toxicity, teratogenicity (fetal and growth effects), and carcinogenicity; development and scale-up of chemical and fermentation drug manufacturing processes; and drug formulation and stability testing. This list is far from complete.

To put these activities into perspective, it can easily cost more than $1 billion and require 10 to 15 years of R&D to bring out a single new drug, of which only the last 2 to 3 years involve the FDA-reviewed human trials with which statisticians and the public are most familiar. Preclinical activities therefore occupy the bulk of the time and scientific effort. The statistics that support this work cover a broad range of methods. Sample sizes can range from longitudinal case-control studies of 10 or fewer animals (although these may produce thousands of data points from continuous monitoring using sophisticated instruments and telemetry) to hundreds of thousands or millions of multivariate records in drug screening and structure searches. All areas of statistics find useful application, but recent opportunities for nonparametric experimental design, linear and nonlinear longitudinal data modeling, high-dimensional exploration and visualization, inference using exact permutation methods and bootstrapping, and pattern recognition, classification, and clustering of large databases are perhaps especially noteworthy.
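To make "inference using exact permutation methods" concrete for the small animal studies mentioned above, here is a minimal sketch of a generic two-sample exact permutation test on a difference in group means. It is an illustration under our own assumptions, not a method taken from this article: the function name `exact_permutation_pvalue` and the example readings are hypothetical. Because every relabelling of the pooled animals is enumerated, the p-value is exact, which is practical only when groups are small.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(treated, control):
    """Two-sided exact permutation test for a difference in group means.

    Hypothetical helper for illustration. Enumerates every way of splitting
    the pooled observations into groups of the original sizes, so the
    p-value is exact rather than a Monte Carlo approximation.
    """
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    observed = abs(mean(treated) - mean(control))

    extreme = 0
    total = 0
    # Each combination of indices is one possible relabelled "treated" group.
    for idx in combinations(range(len(pooled)), n_treated):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        stat = abs(mean(group_a) - mean(group_b))
        # Small tolerance guards against floating-point ties with the observed value.
        if stat >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


if __name__ == "__main__":
    # Hypothetical readings from three treated and three control animals.
    print(exact_permutation_pvalue([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 0.1
```

The example also shows a design consequence of very small studies: with three animals per group there are only 20 possible splits, and even perfectly separated groups cannot give a two-sided p-value below 2/20 = 0.1.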

Clearly, in a brief survey like this we can highlight only a couple of examples. We have chosen chemometrics and genomics because they provide good examples of the kind of interdisciplinary, data-rich, and nonstandard issues that are increasingly at the forefront of modern pharmaceutical research. But these examples are just the tip of a vast and fascinating iceberg.


Roughly speaking, chemometrics is the statistics of (analytical) chemistry data, especially spectroscopy data. Physics and chemistry have developed an arsenal of ingenious tools to probe chemical composition and structure. (A nice internet resource for spectroscopy is ….) These techniques can produce one- and two-dimensional spectra of exquisite resolution, often with hundreds or thousands of individual peaks. Digitizing these spectra turns each one into a multivariate vector of that dimensionality. Chemometrics arose because classical multivariate normal statistical methods were inadequate for such data and for the related problems of calibration and quality control.
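One standard reason classical multivariate normal methods struggle here (our illustration, not an argument made explicitly in the text): when a spectrum is digitized at more points than there are samples, the sample covariance matrix is singular, so procedures that invert it, such as Hotelling's T² or linear discriminant analysis, cannot be applied directly. The minimal sketch below assumes random integers as stand-ins for digitized spectra; the sizes and helper names are hypothetical. It computes the covariance exactly with rational arithmetic and shows its rank never exceeds n − 1.

```python
import random
from fractions import Fraction


def sample_covariance(rows):
    """Sample covariance matrix (p x p) of n observations, computed exactly."""
    n = len(rows)
    p = len(rows[0])
    means = [Fraction(sum(r[j] for r in rows), n) for j in range(p)]
    centered = [[Fraction(r[j]) - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def matrix_rank(matrix):
    """Rank by Gaussian elimination (exact when entries are Fractions)."""
    m = [row[:] for row in matrix]
    rank = 0
    n_rows, n_cols = len(m), len(m[0])
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


if __name__ == "__main__":
    rng = random.Random(0)
    n_spectra, n_channels = 5, 20  # hypothetical: 5 spectra, 20 digitized points each
    spectra = [[rng.randint(0, 1000) for _ in range(n_channels)] for _ in range(n_spectra)]
    cov = sample_covariance(spectra)
    # Rank is at most n_spectra - 1 = 4, so the 20 x 20 matrix is singular.
    print(matrix_rank(cov), "of", n_channels)
```

Exact fractions are used only so the rank is not blurred by floating-point round-off; with real spectra one would typically work in floating point and reduce dimension first.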

One typical application will give the flavor of the issues. Suppose that one has, say, 200 unknown natural chemical extracts from various biological sources that are tested for antibiotic activity against 30 different pathogens. …