# Empirical Direction in Design and Analysis

By Norman H. Anderson

NOTES
9.1.2a
For simplicity of exposition, this paragraph assumes that Y and X have symmetric distributions.
9.1.2b
The slope coefficient of ½ in Galton's equation comes from a more extensive follow-up by Pearson and Lee (1903), who obtained a stronger relation than Galton.
9.1.2c
Regression equations are misleading because they ignore the error of prediction. In Galton's father-son equation, sons of the same father usually differ considerably in height. The correlation is only .5; the scatterplot has a lot of scatter that almost obscures the relation. Freedman et al. (1998, p. 172) comment, “It was a stroke of genius on Galton's part to see a straight line in the chaos.” The predictive power of GRE for success in graduate school is even smaller, although not less than that of an admissions committee.
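As a rough worked illustration of how much error of prediction remains at a correlation of .5, the standard textbook relation can be written as below. This is a sketch at the population level that ignores degrees-of-freedom corrections; the symbols r, σ_Y, and σ_{Y·X} (the standard error of prediction) are notation assumed here, not taken from the text.

```latex
r^2 = (.5)^2 = .25, \qquad
\sigma_{Y \cdot X} = \sigma_Y \sqrt{1 - r^2} = \sigma_Y \sqrt{.75} \approx .87\,\sigma_Y
```

On this reckoning the regression accounts for about a quarter of the variance in sons' heights, and the standard error of prediction is only about 13% smaller than the overall standard deviation of sons' heights.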
9.1.3a
Formulas for the confidence band for a regression line, and for the confidence interval for any predicted value, are given in many texts (e.g., Myers & Well, 1991; Snedecor & Cochran, 1980). I have adopted the symbol, Xnew, from Myers and Well.
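For orientation, the forms usually given for simple linear regression can be sketched as follows. The notation is an assumption made here, and individual texts differ in their symbols: s_e is the square root of the regression mean square error, SS_X is the sum of squared deviations of X about its mean, and t* is the critical value of t on n − 2 df. The first interval is pointwise at a chosen Xnew, not a simultaneous band over the whole line.

```latex
\hat{Y}_{\text{new}} = b_0 + b_1 X_{\text{new}}

\text{Mean of } Y \text{ at } X_{\text{new}}:\quad
\hat{Y}_{\text{new}} \pm t^{*}_{n-2}\, s_e
\sqrt{\frac{1}{n} + \frac{(X_{\text{new}} - \bar{X})^2}{SS_X}}

\text{Single new case at } X_{\text{new}}:\quad
\hat{Y}_{\text{new}} \pm t^{*}_{n-2}\, s_e
\sqrt{1 + \frac{1}{n} + \frac{(X_{\text{new}} - \bar{X})^2}{SS_X}}
```

The extra 1 under the second square root reflects the variability of an individual case about the line, which is why the interval for a predicted value is always wider than the band for the line itself.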
9.1.3b
Pure Error Through Within-Cell Replication. The regression error term obtained from Equation 8b differs from the Anova error term of previous chapters because it includes systematic deviations from linearity as well as pure error. A pure error term can sometimes be obtained that requires no assumption about the form of the regression equation. Suppose X is controlled experimentally and that two or more independent cases are run at each value of X. Since these cases are treated alike, their variability is pure error. This variability may be pooled across all values of X_i to obtain a single pooled error term that is independent of the form of the regression model. In fact, the values of X may be considered the levels of a one-way Anova. If this error term is used, the linear regression is identical to the linear trend of Chapter 4.
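The pooling described in this note can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure: the function names and the small data set are hypothetical, and the split of the regression residual into pure error plus deviations from linearity is the standard lack-of-fit decomposition.

```python
from collections import defaultdict


def pure_error(xs, ys):
    """Pool within-cell variability across replicated values of X.

    Returns (SS_pure, df_pure, MS_pure). No regression model is assumed.
    """
    cells = defaultdict(list)
    for x, y in zip(xs, ys):
        cells[x].append(y)

    ss_pure = 0.0
    df_pure = 0
    for values in cells.values():
        cell_mean = sum(values) / len(values)
        ss_pure += sum((v - cell_mean) ** 2 for v in values)
        df_pure += len(values) - 1

    if df_pure == 0:
        raise ValueError("no replication: each X value has a single case")
    return ss_pure, df_pure, ss_pure / df_pure


def linear_residual_ss(xs, ys):
    """Least-squares line; returns (b0, b1, residual SS on n - 2 df)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sp_xy / ss_x
    b0 = mean_y - b1 * mean_x
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return b0, b1, ss_res


# Hypothetical data: two independent cases at each of three X values.
xs = [1, 1, 2, 2, 3, 3]
ys = [1.0, 3.0, 3.0, 5.0, 11.0, 13.0]

ss_pure, df_pure, ms_pure = pure_error(xs, ys)
b0, b1, ss_res = linear_residual_ss(xs, ys)

# The regression residual contains pure error plus deviations from linearity.
ss_lack_of_fit = ss_res - ss_pure
df_lack_of_fit = (len(xs) - 2) - df_pure

print("pure error:", ss_pure, "on", df_pure, "df")
print("regression residual:", ss_res, "on", len(xs) - 2, "df")
print("deviations from linearity:", ss_lack_of_fit, "on", df_lack_of_fit, "df")
```

For these made-up numbers the pure error SS is 6 on 3 df, the regression residual SS is 18 on 4 df, and the remaining 12 on 1 df reflects deviations from linearity. The regression error term therefore overstates pure error whenever the true relation is not linear.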
9.1.4a
Assumptions for regression differ somewhat, depending on whether X is fixed, as when experimentally manipulated, or random, as with father's height in Galton's equation and with most observational data. In practice, these different assumptions lead to the same results. In either case, linearity and independence imply that b0 and b1 are unbiased estimates of the population parameters, β0 and β1. Confidence intervals and significance tests are also the same.
9.1.4b
Extreme cases can be extremely serious in regression analysis. Kahn and Udry (1986) reanalyzed data considered in a previous report that had claimed surprising conclusions about age changes in marital coital frequency, showing that these conclusions were largely due to eight extreme cases out of 2063 couples, four of which were thought to be keypunch errors in the archival data. When a few extreme cases make a big difference, including them will misrepresent the main population.
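How strongly a single extreme case can pull a regression line may be seen in a small sketch. The data below are entirely hypothetical and are not the Kahn and Udry data; the function name is likewise chosen only for illustration.

```python
def ls_slope(xs, ys):
    """Least-squares slope b1 for the regression of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sp_xy / ss_x


# Ten hypothetical cases constructed to show no linear trend at all.
main_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
main_y = [4, 6, 4, 6, 5, 5, 6, 4, 6, 4]

# One hypothetical extreme case, far from the rest on both variables.
extreme_x, extreme_y = 30, 30

slope_main = ls_slope(main_x, main_y)
slope_with_extreme = ls_slope(main_x + [extreme_x], main_y + [extreme_y])

print("slope, main cases only:", round(slope_main, 3))
print("slope, with one extreme case:", round(slope_with_extreme, 3))
```

The ten main cases have a slope of zero, yet adding the one extreme case produces a clearly positive slope. The fitted line then describes the extreme case rather than the main body of data.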
9.1.4c
The main generalization that can be hoped for from a predictive regression is that the predictor variables will retain some usefulness in other situations. The values of b0 and b1 would usually have to be calibrated anew for each new situation.


