A Simple Example Illustrating a Well-Known Property of the Correlation Coefficient
Turner, Danny W., The American Statistician
While discussing the correlation coefficient in an introductory statistics class, I provided a number of graphical examples. I mentioned that even though the correlation coefficient might be "close to one," we should not automatically conclude that the underlying (x, y) data are "close to" a straight line, and I then showed a few "standard" point clouds to "prove" this claim. One astute student wanted something more precise, and the example below was prompted by this student's questions. There is obviously an unlimited number of variations on the construction, but this one is appealingly simple, and it makes a very nice example in an introductory class dealing with discrete distributions, moments, regression lines, and the correlation coefficient.
2. THE EXAMPLE
Let the discrete random vector $(X, Y)$ have a joint distribution that is uniform on the four points $(-k, -a - \varepsilon)$, $(-k, -a + \varepsilon)$, $(k, a - \varepsilon)$, and $(k, a + \varepsilon)$. Figure 1 illustrates a configuration of these points when $k > 0$, $a > 0$, $\varepsilon > 0$, and $a - \varepsilon > 0$. It is simple to verify that $E(X) = E(Y) = 0$, $\operatorname{var}(X) = k^{2}$, $\operatorname{var}(Y) = a^{2} + \varepsilon^{2}$, and $\operatorname{cov}(X, Y) = ak$. Thus the correlation coefficient for $(X, Y)$ is
$$\operatorname{corr}(X, Y) = \frac{ak}{|k|\,(a^{2} + \varepsilon^{2})^{1/2}},$$
where it is assumed that $k \neq 0$ and $a^{2} + \varepsilon^{2} > 0$. Note that the magnitude of $\operatorname{corr}(X, Y)$ does not depend on $k$, a fact that follows from the scale invariance of the correlation coefficient: $k$ only rescales $X$, and only its sign survives in the formula. Thus stretching or compressing the configuration horizontally via the parameter $k$ has no effect on $|\operatorname{corr}(X, Y)|$. Without loss of generality take $k > 0$, so that
$$\operatorname{corr}(X, Y) = \frac{a}{(a^{2} + \varepsilon^{2})^{1/2}}.$$
For any $c \in (-1, 1)$, the parameters $a$ and $\varepsilon$ can be chosen to obtain $\operatorname{corr}(X, Y) = c$. For example, for a given $c$ and $\varepsilon$, solving $\operatorname{corr}(X, Y) = c$ for the magnitude of $a$ yields $|a| = |c\varepsilon| / [(1 - c$ …