# Linear Regression

Linear regression in its simplest form, known as simple linear regression or simple regression, is a statistical technique often applied to economic and psychological data. Regression analysis in general seeks to describe the relationship between two or more variables; in simple linear regression there are only two: the explained variable, represented by y, and the explanatory variable, represented by x.

If one seeks to apply regression analysis to find the relationship between poverty and high school dropouts, poverty would be the explained variable (y) and high school dropouts would be the explanatory variable (x). The terminology for the variables sometimes changes depending on the purpose of the analysis. If the purpose is prediction, the y variable is called the predictand and the x variable is called the predictor. If it is a test of causation, the y variable is called the effect variable and the x variable is called the causal variable. The terms regressand, explained and dependent for the y variable, and regressor, explanatory and independent for the x variable, are used interchangeably and do not represent different experimental goals.

Simple regression is written y = f(x), meaning that y is a function of x. In the example y = 100 + 6x + e, 100 is the intercept and 6 is the slope coefficient; together they are called the regression coefficients, and e is an unpredictable random quantity called the error term. The error term accounts for the part of y that x does not explain: every empirical study has some element of randomness, since there are presumably other variables that the model leaves out and since there tends to be measurement error in y.
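To make the role of the coefficients and the error term concrete, the sketch below simulates data from y = 100 + 6x + e and then recovers the intercept and slope by ordinary least squares. The function name `fit_line`, the choice of 200 points, and the assumption that e is normally distributed with mean 0 and standard deviation 10 are illustrative choices, not part of the original example.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [float(i) for i in range(1, 201)]
# Assumed error term: normal noise with mean 0 and standard deviation 10
ys = [100 + 6 * x + rng.gauss(0, 10) for x in xs]

intercept, slope = fit_line(xs, ys)
print(f"estimated intercept: {intercept:.2f}, estimated slope: {slope:.2f}")
```

Because of the error term, the estimates should land near 100 and 6 without matching them exactly; with less noise or more points they would typically move closer.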

Once the data points are plotted on a graph, a line is fitted through them. If y increases as x increases, i.e., the line is higher on the right side of the graph than on the left, then there is a positive correlation. If y decreases as x increases, there is a negative correlation; and if the line is parallel to the x axis, there is no correlation. However, even when a relationship seems discernible by sight, one must calculate it in order to say for certain whether it exists, as some relationships are obvious while others are imperceptible.
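One common way to calculate the direction of a relationship rather than judging it by eye is Pearson's correlation coefficient, which is positive, negative, or zero in the three cases described above. The sketch below assumes that this is the measure intended; the helper name `pearson_r` and the three small data sets are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
rising = [1, 3, 2, 5, 4]     # y tends to increase with x
falling = [4, 5, 2, 3, 1]    # y tends to decrease with x
unrelated = [3, 1, 2, 1, 3]  # fitted line is parallel to the x axis

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_unrelated = pearson_r(xs, unrelated)
print(f"rising: {r_rising:.2f}, falling: {r_falling:.2f}, unrelated: {r_unrelated:.2f}")
```

The sign of the coefficient always matches the sign of the fitted slope, so either quantity can be used to read off the direction.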

In order to determine how strong or weak a relationship is, one calculates the coefficient of determination, usually written R². (Whether a relationship is "significant" in the statistical sense is a separate question, answered by a hypothesis test.) The coefficient of determination is the explained sum of squares divided by the total sum of squares, or equivalently the total sum of squares minus the residual sum of squares, all divided by the total sum of squares. The result lies between 0 and 1; a value close to 1 indicates a strong linear relationship, whether positive or negative, and a value close to 0 indicates a weak or nonexistent one.
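The two equivalent ways of computing the coefficient of determination can be checked on a small example. The sketch below fits a least squares line to five made-up points and computes the total, explained, and residual sums of squares; the variable names are illustrative.

```python
xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 4]  # made-up data for illustration

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least squares slope and intercept
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * x for x in xs]

# Total, explained, and residual sums of squares
tss = sum((y - mean_y) ** 2 for y in ys)
ess = sum((f - mean_y) ** 2 for f in fitted)
rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))

r2_from_explained = ess / tss
r2_from_residual = (tss - rss) / tss
print(f"ESS/TSS = {r2_from_explained:.4f}, (TSS - RSS)/TSS = {r2_from_residual:.4f}")
```

The two values agree because, for a least squares line with an intercept, the total sum of squares splits exactly into the explained and residual parts.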

Though linear regression can reveal correlations, it may also be misleading since it considers only one explanatory variable. For example, if one wants to determine the correlation between headaches and the hours one spends on a computer, one might find a positive correlation but fail to consider other variables such as allergies or hours of sleep. Therefore, statisticians often use multiple regression, which accounts for more than one explanatory variable, to determine the relationship.
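The headache example can be sketched with invented numbers. In the data below, which are entirely made up for illustration, headaches are constructed to depend only on hours of sleep (headaches = 10 - sleep), while people who spend more time on the computer also tend to sleep less. The helper names `solve_linear_system` and `least_squares` are hypothetical; the fit solves the normal equations directly, which is one of several ways to compute a least squares solution.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def least_squares(columns, ys):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(ys))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Invented data: headaches are built as 10 - sleep, with no direct computer effect
computer = [2.0, 4.0, 6.0, 8.0, 10.0, 3.0, 5.0, 7.0]
sleep = [8.5, 6.5, 6.5, 4.5, 4.5, 7.0, 7.0, 5.0]
headaches = [1.5, 3.5, 3.5, 5.5, 5.5, 3.0, 3.0, 5.0]

simple = least_squares([computer], headaches)           # [intercept, computer]
multiple = least_squares([computer, sleep], headaches)  # [intercept, computer, sleep]
print(f"simple regression, computer coefficient:   {simple[1]:.3f}")
print(f"multiple regression, computer coefficient: {multiple[1]:.3f}")
print(f"multiple regression, sleep coefficient:    {multiple[2]:.3f}")
```

With only computer hours in the model, the fitted slope is clearly positive; once sleep is included, the computer coefficient falls to essentially zero, because in these invented data sleep accounts for all of the variation in headaches.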

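Furthermore, in doing linear regression it is important to note whether the analysis is conducted on an entire population or merely a sample. With a sample, the fitted coefficients are only estimates of the population values, and the residuals around the fitted line stand in for the unobserved error term, which changes how its variance is calculated.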
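One place where the sample-versus-population distinction shows up in the arithmetic is the variance of the error term. As a rough sketch under the usual simple regression assumptions: if the data are the whole population, the residual sum of squares is divided by n, whereas for a sample it is conventionally divided by n - 2, because two coefficients were estimated from the same data. The five data points and the variable names below are made up for illustration.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [1, 3, 2, 5, 4]  # made-up data for illustration

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual sum of squares around the fitted line
rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

variance_if_population = rss / n    # data treated as the whole population
variance_if_sample = rss / (n - 2)  # data treated as a sample: n - 2 degrees of freedom

# Standard error of the slope, meaningful only in the sample case
slope_standard_error = math.sqrt(variance_if_sample / sxx)

print(f"error variance (population): {variance_if_population:.3f}")
print(f"error variance (sample):     {variance_if_sample:.3f}")
print(f"standard error of slope:     {slope_standard_error:.3f}")
```

The gap between the two variance figures is large here only because the example has five points; it shrinks as n grows.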

Though the term regression may connote some deficiency, its statistical use carries no such meaning. Linear regression is a powerful statistical tool, and it underlies much of the scientific and economic analysis of the past two centuries.