Two variables are said to correlate if a change in one of them is accompanied by a predictable change in the other. The concept of correlation is commonly encountered in a range of techniques used in business forecasting and modelling.
If both of the variables in question are numerical, a technique known as the Pearson method can be used to calculate the degree to which they correlate. The result is expressed as a correlation coefficient, otherwise known as the Pearson coefficient or r score. If one or both of the variables are not given in a suitable quantitative form, an alternative approach can be used to measure the degree of correlation, which is expressed in such cases as Spearman's rank correlation coefficient.
The basic mathematics behind the Pearson method can be illustrated using the simple case of a class of students. There are six people in the class, each of whom sits a maths exam and then an English exam the following week. Suppose that each student achieves exactly half of the mark in their English exam that they scored in their maths paper. In this case the correlation between their maths and English scores is perfect and the Pearson coefficient derived from comparing the two sets of results is 1:
Case 1: perfect positive (linear) correlation

Student   Maths mark   English mark
1         80%          40%
2         60%          30%
3         44%          22%
4         26%          13%
5         70%          35%
6         64%          32%

Pearson coefficient: 1.000
This is a plausible finding, given that performance in exams is an expression of academic ability. An able student should score relatively highly in both exams, while a weak student should score lower marks in both.
The r score is calculated using a formula that compares how far the two variables vary together around their mean values with how far each varies on its own: it is the covariance of the two variables divided by the product of their standard deviations. Microsoft's Excel spreadsheet software has a Pearson function: if you arrange the two sets of figures in columns A and B, and then type "=pearson(a1:a6, b1:b6)" into a cell, the r score will appear there. An r score of 1 indicates a perfect positive correlation, which can be represented graphically in the diagram below. The six points have been plotted on the graph (known as a scatter diagram) and then joined by a straight line. It is good practice to draw a scatter diagram to ascertain whether or not there is a linear relationship between the two variables.
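For readers without a spreadsheet to hand, the same calculation can be sketched in a few lines of Python. This is only an illustration, assuming the standard definition of the Pearson coefficient (the sum of products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r is a name chosen here, not something taken from Excel or from the text above.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (proportional to the covariance)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / sqrt(sxx * syy)

# Marks from Case 1 (percentages)
maths = [80, 60, 44, 26, 70, 64]
english = [40, 30, 22, 13, 35, 32]

r = pearson_r(maths, english)
print(round(r, 3))
```

With the Case 1 marks the result rounds to 1, matching the figure given in the table.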
The correlation between the maths and English exam marks is perfect and this appears as a straight line on the graph. All of the six points observed in the data lie exactly on that line.
The fact that the two sets of figures correlate suggests a relationship between them, although it does not by itself establish a causal link, and it says nothing about the amount of change in the first that corresponds to a given change in the second. To determine that, an exercise in regression analysis is required. The relationship between the English and maths marks can be represented by the regression equation E = βM, where β is known as the regression coefficient. In this case β is evidently 0.5. Compare this equation with the equation for the linear regression of y on x, which is given in the list of formulas required for CO3 as y = a + bx. Here b corresponds to β, and a is zero because the line crosses the y axis at zero.
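The values of a and b can also be estimated directly from the marks. The sketch below assumes the usual least-squares estimates for the regression of y on x, written in deviation form (b is the sum of products of deviations divided by the sum of squared deviations of x, and a is the mean of y minus b times the mean of x). The function name fit_line is hypothetical and chosen only for this illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: the regression coefficient
    a = mean_y - b * mean_x  # intercept: where the line crosses the y axis
    return a, b

# Marks from Case 1 (percentages): x is the maths mark, y the English mark
maths = [80, 60, 44, 26, 70, 64]
english = [40, 30, 22, 13, 35, 32]

a, b = fit_line(maths, english)
print(round(a, 6), round(b, 6))
```

For the Case 1 data the slope comes out at 0.5 and the intercept at zero, up to floating-point rounding, which agrees with E = 0.5M.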
One obvious use of regression analysis is that it enables us to forecast what result a student should obtain in the English exam as soon as their result in the maths paper is known. If they score 68 per cent in maths, for example, we can forecast that they will achieve 34 per cent in English.
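As a minimal sketch of that forecast, the regression equation can be wrapped in a small function. The name forecast_english and its default arguments are assumptions made for this example. The defaults simply encode the Case 1 values, a = 0 and b = 0.5.

```python
def forecast_english(maths_mark, a=0.0, b=0.5):
    """Forecast an English mark from a maths mark using E = a + b*M."""
    return a + b * maths_mark

print(forecast_english(68))  # prints 34.0
```

A forecast of this kind is only as good as the fitted line: it is exact here because the Case 1 correlation is perfect.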
A correlation can also be negative. For example, if we alter the English exam results of our class of six, the following outcome might occur:
Case 2: perfect negative (linear) correlation Student …