Linear Regression

Linear regression, in its simplest form also known as simple regression, is a statistical technique often applied to economic and psychological data. While regression analysis in general seeks to describe the relationship between two or more variables, simple linear regression -- one type of regression analysis -- involves only two: the explained variable, represented by y, and the explanatory variable, represented by x.

If one seeks to apply regression analysis to find the relationship between poverty and high school dropouts, poverty would be the explained variable (y) and the high school dropout rate would be the explanatory variable (x). The terminology sometimes changes depending on the purpose of the analysis. If the purpose is prediction, the y variable is called the predictand and the x variable the predictor. If the purpose is a test of causation, the y variable is called the effect variable and the x variable the causal variable. The terms regressand, explained variable and dependent variable for y, and regressor, explanatory variable and independent variable for x, are used interchangeably and do not represent different experimental goals.

Simple regression is written y = f(x), i.e., y is a function of x. In the example y = 100 + 6x + e, 100 is the intercept and 6 is the slope; together they are called the regression coefficients. The term e is the error term, an unpredictable random component included because the relationship cannot be exact: presumably there are other variables affecting y that are not in the model, and there tends to be measurement error in y.
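
As a rough illustration -- a minimal sketch using simulated data, not an example from the source -- the coefficients of a model like y = 100 + 6x + e can be recovered from data by ordinary least squares. The sample size, the range of x and the spread of the error term below are all assumptions made for the sketch.

```python
# Minimal sketch (assumed data): simulate observations consistent with
# y = 100 + 6x + e and recover the regression coefficients by least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 200                                  # assumed sample size
x = rng.uniform(0, 10, size=n)           # assumed range for the explanatory variable
e = rng.normal(0, 5, size=n)             # the random error term
y = 100 + 6 * x + e                      # the "true" relationship from the example

# Design matrix with a column of ones for the intercept, then least-squares fit.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = coef
print(f"estimated intercept = {intercept:.2f}, estimated slope = {slope:.2f}")
```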

Once the observed data points are plotted on a graph, a line is fitted through them. If y increases as x increases, i.e., the line is higher on the right side of the graph than on the left, there is a positive correlation. If y decreases as x increases, there is a negative correlation; and if the line is parallel to the x axis, there is no correlation. However, even when the pattern seems discernible by sight, one must calculate it in order to say for certain whether a relationship exists, since some relationships are obvious while others are imperceptible.
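
Continuing the hypothetical data from the sketch above, the direction of the relationship can be checked by calculation rather than by eye, for instance via the Pearson correlation coefficient, whose sign matches the sign of the fitted slope.

```python
import numpy as np  # x and y here are the simulated arrays from the sketch above

r = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
if r > 0:
    direction = "positive correlation"
elif r < 0:
    direction = "negative correlation"
else:
    direction = "no correlation"
print(f"r = {r:.3f} -> {direction}")
```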

To determine whether a relationship exists and whether it is strong or weak, one must calculate the coefficient of determination. The coefficient of determination is the explained sum of squares divided by the total sum of squares, which is equivalent to the total sum of squares minus the residual sum of squares, divided by the total sum of squares. The result lies between 0 and 1; a value closer to 1 indicates a strong relationship, while a value closer to 0 indicates a weak or nonexistent one. (The direction of the relationship, positive or negative, is given by the sign of the slope, and statistical significance is judged by a separate test rather than by the coefficient of determination itself.)
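
As a sketch of that calculation, continuing the hypothetical fit above, the coefficient of determination can be computed both as the explained sum of squares over the total sum of squares and as one minus the residual sum of squares over the total sum of squares; the two forms agree for a least-squares fit that includes an intercept.

```python
# Continuing the hypothetical fit above (x, y, intercept, slope).
y_hat = intercept + slope * x                # fitted values on the regression line
tss = np.sum((y - y.mean()) ** 2)            # total sum of squares
ess = np.sum((y_hat - y.mean()) ** 2)        # explained sum of squares
rss = np.sum((y - y_hat) ** 2)               # residual sum of squares

r_squared = ess / tss                        # equals (tss - rss) / tss = 1 - rss / tss
print(f"R^2 = {r_squared:.3f}, check: {1 - rss / tss:.3f}")
```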

Though linear regression can reveal correlations, it may also be misleading, since it considers only one explanatory variable. For example, if one wants to determine the correlation between headaches and the hours one spends at a computer, one might find a positive correlation but fail to consider other variables such as allergies, hours of sleep and more. Therefore, statisticians often use multiple regression, which accounts for more than one explanatory variable, to investigate such relationships.
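
A minimal sketch of the idea follows; the variables and the data-generating rule are hypothetical assumptions for illustration, not results from any study. Adding an explanatory variable simply adds a column to the design matrix.

```python
# Hypothetical multiple regression: two explanatory variables (computer hours
# and sleep hours) predicting a headache score. All numbers are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 200
hours_computer = rng.uniform(0, 12, size=n)      # hypothetical explanatory variable 1
hours_sleep = rng.uniform(4, 9, size=n)          # hypothetical explanatory variable 2
headache = 2 + 0.5 * hours_computer - 0.8 * hours_sleep + rng.normal(0, 1, size=n)

# Design matrix: intercept column plus one column per explanatory variable.
X_multi = np.column_stack([np.ones(n), hours_computer, hours_sleep])
coefs, *_ = np.linalg.lstsq(X_multi, headache, rcond=None)
print(f"intercept = {coefs[0]:.2f}, computer hours = {coefs[1]:.2f}, sleep hours = {coefs[2]:.2f}")
```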

Furthermore, in doing linear regression it is important to note whether the analysis is conducted on an entire population or merely a sample; if it is a sample, sampling variability must be accounted for, which changes how the error term enters the calculations -- in particular the estimate of the error variance and the standard errors of the coefficients.
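
One way this shows up in practice, sketched here under the assumption that we continue the simple-regression example above: with sample data the error variance is estimated from the residuals with n - 2 in the denominator (one degree of freedom lost for each estimated coefficient), and this estimate in turn drives the standard error of the slope.

```python
# Continuing the simple-regression sketch (x, y, intercept, slope): estimate
# the error variance from the residuals using n - 2 degrees of freedom, then
# the standard error of the slope.
residuals = y - (intercept + slope * x)
n_obs = len(y)
sigma2_hat = np.sum(residuals ** 2) / (n_obs - 2)             # estimated error variance
se_slope = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))  # standard error of the slope
print(f"estimated error variance = {sigma2_hat:.2f}, SE(slope) = {se_slope:.3f}")
```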

Though the term regression may connote some deficiency, the modern technique has no connection to that original sense of the word. Linear regression is a powerful statistical tool; it has underpinned much of the quantitative scientific and economic work of the past two centuries.
