Prediction is the strong point of regression analysis. In admitting graduate students, some departments rely on the predictive power of the GRE. Similar predictive formulas are often used to sort out those more likely to succeed in particular jobs or to benefit from certain medical treatments. These statistical prediction formulas are typically superior to expert judgment, and less costly to boot.
The term regression merely means curve fitting, usually with a straight line. This regression line exhibits the response measure, Y, as a function of the predictor variable, X. The slope of the line reflects how strongly Y depends on X, so this slope is the main concern. In particular, a slope of 0 means that X has no predictive power for Y.
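As a minimal sketch, the least squares slope and intercept of the straight-line fit can be computed directly from the data; the data values below are hypothetical, chosen only for illustration.

```python
# Least squares fit of the regression line Y = b0 + b1*X.
# The slope b1 measures how strongly Y depends on X; b1 = 0 would mean
# X has no (linear) predictive power for Y.
def fit_line(x, y):
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx          # slope
    b0 = my - b1 * mx       # intercept
    return b0, b1

# Perfectly linear illustration data: Y = 1 + 2*X.
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

With real data the points scatter about the line, and b1 estimates the average change in Y per unit change in X.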
Statistically, regression analysis can be seen as an extension of Anova that uses a metric predictor variable, X, in place of the experimental variable, A, of previous chapters. The slope of the regression line can be estimated, together with its confidence interval.
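A confidence interval for the slope can be sketched as follows. The data and the critical value are illustrative assumptions: the residual mean square estimates error variance on n − 2 degrees of freedom, and the t critical value would ordinarily be looked up in a t table.

```python
import math

def slope_ci(x, y, t_crit):
    """Confidence interval for the regression slope.  t_crit is the
    two-tailed t critical value on n - 2 degrees of freedom."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    # residual mean square: estimates error variance on n - 2 df
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se = math.sqrt(mse / sxx)       # standard error of the slope
    return b1 - t_crit * se, b1 + t_crit * se

# Hypothetical data; 3.182 is the two-tailed 95% t value for 3 df.
lo, hi = slope_ci([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1], 3.182)
```

An interval that excludes 0 corresponds to rejecting the null hypothesis of no linear relation between X and Y.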
The two big problems with prediction are extrastatistical: finding good predictor variables, and finding a valid measure of the criterion to be predicted. The GRE is not actually a strong predictor of success in graduate school. Moreover, success in graduate school is not easily measured, and, however measured, may have little relation to the final criterion—life after graduation.
Regression can also be used in experimental studies, especially when the stimulus variables are quantitative or metric. With metric stimulus variables, regression analysis offers marked simplification of factorial design. Also, linear trend analysis is markedly simpler with regression formulas than with the linear contrasts of Anova that are generally recommended.
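The equivalence between the regression slope and the linear contrast of Anova can be sketched with hypothetical cell means at equally spaced stimulus levels. The specific numbers below are illustrative assumptions; the point is that the classical linear-contrast coefficients reproduce the regression slope up to a known scale factor.

```python
# Linear trend two ways: regression slope on the metric stimulus values,
# and the classical Anova linear contrast.  Data are hypothetical.
levels = [1, 2, 3, 4]            # equally spaced metric stimulus values
means  = [2.0, 3.1, 3.9, 5.2]    # hypothetical cell means

# Regression slope of cell means on stimulus levels.
mx = sum(levels) / len(levels)
my = sum(means) / len(means)
slope = (sum((x - mx) * (y - my) for x, y in zip(levels, means))
         / sum((x - mx) ** 2 for x in levels))

# Standard linear-contrast coefficients for four equally spaced levels.
coeffs = [-3, -1, 1, 3]
contrast = sum(c * y for c, y in zip(coeffs, means))

# The contrast is a rescaled slope: contrast = slope * sum(c * x),
# since the coefficients are proportional to the level deviations.
scale = sum(c * x for c, x in zip(coeffs, levels))
```

Both routes test the same linear trend; the regression form merely avoids looking up contrast coefficients.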
The correlation coefficient is a byproduct of regression analysis. It can be useful in specialized domains, as in test theory. The correlation coefficient suffers from surprisingly many flaws, however, that severely limit its usefulness.
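The byproduct relation can be made concrete: the correlation coefficient is just the regression slope rescaled by the spreads of X and Y. The data below are hypothetical illustration values.

```python
import math

# Correlation as a byproduct of regression: r equals the slope
# multiplied by sd(X)/sd(Y), i.e. r = sxy / sqrt(sxx * syy).
x = [1, 2, 3, 4, 5]
y = [1.0, 2.5, 2.0, 4.0, 4.5]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
slope = sxy / sxx
r = slope * math.sqrt(sxx / syy)   # same as sxy / sqrt(sxx * syy)
```

One consequence, visible in the formula, is that r depends on the spread of X in the sample, which is one source of the flaws noted above.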
Causal analysis is the weak point of regression analysis. Regression–correlation is often used to attempt causal interpretation from uncontrolled, observational data. A good deal of what is reported in the media and even in professional journals is unwarranted inference based on regression analysis of uncontrolled data. The many well-known pitfalls are glossed over with lip-service cautions or simply ignored. Causal analysis with observational data has high importance, but effective work requires unusually high expertise, both statistical and extrastatistical.