8 Statistics for Questions about Two Variables

The most general statistical question that can be asked about two variables is whether or not they are related in some way. In other words, can the values of one be used to predict the values of the other for a certain set of cases? If this is possible, the two variables are said to be related or correlated, even though the predictions may not be infallible. Perfect correlation implies that every value of one variable may be predicted exactly from the values of the other. Imperfect correlation between variables suggests that knowing a particular value of one variable provides information about the most likely value of the other variable. For example, because it is possible to predict the average weight of a group of persons for each of a range of values of height, body height and weight are said to be highly but not perfectly correlated. In such relationships a few predictions may be exact, but most will be off the mark to some small extent. The strength of a relationship between two variables is based upon the amount of error in the predictions. The less such forecasts turn out to be wrong, the stronger the correlation.

One important limitation on statistical analysis, often misunderstood by laymen, is the difference between correlation and causation. Simply put, showing that two variables covary is not sufficient to prove that one variable actually causes another to behave as it does. It is both tempting and wrong to infer cause from covariation. Correlation may be accidental, such as the coincidence between fluctuations of the New York stock market and the monsoon in India, or dependent upon a third variable, such as the drop in the Swedish birthrate and the disappearance of storks, both of which may be attributed to industrialization. However, in many practical