Multiple regression has two distinct uses with observational data: prediction of outcomes and interpretation of process. For prediction, multiple regression is efficient and effective, making optimal use of multiple predictor variables while avoiding certain biases that afflict human judges. In most prediction tasks, accordingly, multiple regression outdoes the experts.
Conceptually, multiple regression is extremely simple: a weighted sum of predictor variables. Confidence intervals and significance tests can be obtained with Anova, much as with one-variable regression in Chapter 9.
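In symbols, with generic notation (the particular symbols are illustrative, not fixed by the text), the prediction equation for k predictors is

\[
\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k ,
\]

where \(\hat{Y}\) is the predicted criterion score, the \(X_j\) are the predictor variables, and the weights \(b_j\) are estimated from the data, ordinarily by least squares.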
The two big problems in prediction are both extrastatistical: to find a good criterion and to find good predictors. Both problems may be illustrated with selection for graduate school. What is the criterion of good performance in graduate school? Good grades? Good thesis results? Productivity after the Ph.D.? Self-esteem and self-fulfillment in personal and professional life? Given the criterion, how can good predictors be found? Although both problems are empirical, regression can help.
For interpretation, in contrast to prediction, multiple regression suffers from exceptionally serious confounding. Missing variable confounding refers to variables that have a causal effect but are not included in the regression equation. A missing variable can reverse the apparent causal influence of some other variable that is included. Causal effects of intercorrelated variables can readily be misunderstood if some have not been measured. Since nearly all variables in common use are partial measures, missing variable confounds are endemic.
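The sign reversal can be shown with a small simulation; the sketch below uses hypothetical variable names and invented numbers purely for illustration. X has a genuine positive effect on Y, but a missing variable Z, correlated with X, makes the weight of X come out negative when Z is omitted from the equation.

```python
# Hypothetical simulation of missing variable confounding (illustrative only).
# X has a true positive effect on Y; Z affects both X and Y but is "missing"
# from the second regression.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

z = rng.normal(size=n)                        # the missing variable
x = 2.0 * z + rng.normal(size=n)              # X is correlated with Z
y = 1.0 * x - 3.0 * z + rng.normal(size=n)    # true causal weight of X is +1.0

def weights(y, *predictors):
    """Least-squares weights (intercept first) for a weighted sum of predictors."""
    design = np.column_stack([np.ones_like(y), *predictors])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b

print(weights(y, x, z))   # Z included: weight of X is close to the true +1.0
print(weights(y, x))      # Z missing: weight of X is about -0.2, sign reversed
```

Nothing in the second regression signals that anything is wrong; the misleading weight comes with an ordinary-looking standard error.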
Person–variable confounding is no less serious. Regression analyses usually rest on an implicit assumption that persons are completely interchangeable, that natural individual differences on some variable can be treated as though they were controlled experimentally. This assumption seems quite unrealistic.
The belief that multiple regression can “statistically control,” “hold constant,” or “partial out” uncontrolled variables underlies many, perhaps most, applications outside of prediction. “Statistical control” would be wonderful if it were true, but it is false. Person–variable confounding and missing variable confounding each falsify “statistical control.”
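The same simulated setup (again hypothetical, with invented numbers) shows why “partialling out” falls short when, as noted above, the available variable is only a partial measure of the real confound: adjusting for an error-contaminated measure of Z removes only part of the bias.

```python
# Hypothetical continuation: "statistically controlling" a partial measure of Z.
# The setup matches the sketch above; z_measured is Z plus measurement error.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

z = rng.normal(size=n)                             # the real confounding variable
z_measured = z + rng.normal(scale=1.5, size=n)     # a partial, unreliable measure
x = 2.0 * z + rng.normal(size=n)
y = 1.0 * x - 3.0 * z + rng.normal(size=n)         # true causal weight of X is +1.0

design = np.column_stack([np.ones(n), x, z_measured])
b, *_ = np.linalg.lstsq(design, y, rcond=None)
print(b[1])   # weight of X with z_measured "partialled out": still negative (~ -0.1)
```

With these particular numbers the weight of X remains sign-reversed even after the adjustment; how much bias remains depends on how unreliable the measure is, but holding a partial measure constant does not hold the real variable constant.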
Observational data can provide useful clues. Causation does imply correlation, so correlation is a clue to causation. These clues, however, are untrustworthy and treacherous. Valid inference from observational data requires empirical knowledge about causation that is not often available. Conjoint use of observation and controlled experiment is needed as a base for determining when observational data allow trustworthy inference.