Academic journal article The Journal of Business Forecasting Methods & Systems

Heteroscedasticity: How to Handle in Regression Modeling

Article excerpt

A regression model estimated by using the Ordinary Least Squares (OLS) method often takes the following form:

$$Y_t = a_0 + a_1 X_t + a_2 Z_t + u_t \qquad (1)$$

Here t denotes time for time series data. For cross-sectional data, t is replaced with i to denote units across the cross-section. Time series data are data gathered over a number of consecutive time periods, for example, sales data for the last ten years. Cross-sectional data are data collected across states, income levels, etc. for the same time period. For example, gasoline consumption in each state depends on the number of motor vehicles and the gasoline tax rate in that state. If we wish to forecast gasoline consumption on the basis of 1993 data for each of the 50 states, the model is cross-sectional. The regression procedure can be used to model both time series and cross-sectional data.

The residual $u_t$ is the random part of $Y_t$ that is not accounted for by the independent variables. In this article, Equation (1) will be used as a basis of discussion for both time series and cross-sectional regression models.
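To make the setup concrete, the following is a minimal sketch of fitting a model of the form of Equation (1) by OLS, assuming the equation includes an intercept $a_0$ and slopes $a_1$ and $a_2$. The data are synthetic, and the helper names `solve_linear_system` and `fit_ols` are hypothetical; they do not come from the article. The sketch solves the normal equations directly with the Python standard library.

```python
import random


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_ols(y, x, z):
    """Fit y = a0 + a1*x + a2*z + u; return ([a0, a1, a2], residuals)."""
    rows = [(1.0, xi, zi) for xi, zi in zip(x, z)]  # design matrix D
    # Normal equations: (D'D) a = D'y
    DtD = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Dty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    coeffs = solve_linear_system(DtD, Dty)
    fitted = [sum(c * v for c, v in zip(coeffs, r)) for r in rows]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return coeffs, residuals


# Synthetic data (an assumption for illustration only).
rng = random.Random(0)
x = [rng.uniform(0, 10) for _ in range(50)]
z = [rng.uniform(0, 5) for _ in range(50)]
y = [2.0 + 1.5 * xi - 0.8 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

coeffs, residuals = fit_ols(y, x, z)
mean_residual = sum(residuals) / len(residuals)
print("estimated a0, a1, a2:", coeffs)
print("mean of residuals:", mean_residual)
```

Because the model contains an intercept, the fitted residuals average to zero up to rounding error. Nothing in the fitting step constrains how their spread behaves across observations.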

For an OLS regression model to be efficient for forecasting, the following three basic assumptions must hold:

1. The residuals, $u_t$, are normally distributed with zero mean and constant variance. In practice, constant variance means that if the residuals are divided into several groups, the variance calculated for each group should have a similar value. This is the assumption of homoscedasticity. The OLS procedure will produce a zero mean, but it may not produce a constant residual variance.

2. Consecutive residuals $u_t$ are not correlated at any lag. This means that there is no autocorrelation among the residuals. In other words, the residual of one period is not correlated with the residual of any previous period.

3. The independent variables, $X_t$ and $Z_t$, are not highly correlated with each other. This means that there is no multicollinearity between the variables.
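The grouping check described under the first assumption can be sketched as follows. This is an illustration under stated assumptions, not a formal test: the residuals are drawn directly from random number generators rather than taken from a fitted model, the choice of three groups is arbitrary, and the function name `grouped_variances` is hypothetical.

```python
import random
import statistics


def grouped_variances(residuals, order_by, n_groups=3):
    """Sort residuals by `order_by`, split them into contiguous groups,
    and return the sample variance of each group."""
    ordered = [r for _, r in sorted(zip(order_by, residuals))]
    size = len(ordered) // n_groups
    groups = [ordered[i * size:(i + 1) * size] for i in range(n_groups - 1)]
    groups.append(ordered[(n_groups - 1) * size:])  # last group takes the remainder
    return [statistics.variance(g) for g in groups]


rng = random.Random(1)
x = [rng.uniform(1, 10) for _ in range(300)]  # hypothetical explanatory variable

# Case A: spread does not depend on x (homoscedastic).
constant_spread = [rng.gauss(0, 2.0) for _ in x]
# Case B: standard deviation grows in proportion to x (heteroscedastic).
growing_spread = [rng.gauss(0, 0.5 * xi) for xi in x]

var_constant = grouped_variances(constant_spread, x)
var_growing = grouped_variances(growing_spread, x)

# A ratio near 1 suggests similar group variances; a large ratio does not.
ratio_constant = max(var_constant) / min(var_constant)
ratio_growing = max(var_growing) / min(var_growing)
print("group variances, constant spread:", var_constant)
print("group variances, growing spread: ", var_growing)
print("max/min ratio:", ratio_constant, "vs", ratio_growing)
```

Sorting by the explanatory variable before grouping matters: if the residuals were grouped in arbitrary order, a variance that changes with that variable could be averaged away within each group.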

Violation of any of these assumptions may increase the model's forecasting error or cause confusion in interpreting the model. In this article, we address only the violation of the homoscedasticity assumption. When the residual variance varies with the values of an explanatory variable ($X_t$ or $Z_t$ in Equation 1), it is no longer constant. This is the problem of heteroscedasticity. This article explains the sources and effects of heteroscedasticity as well as how to deal with it. Problems associated with the other two assumptions will be discussed in future articles.

CAUSES OF HETEROSCEDASTICITY

There are many reasons for heteroscedasticity to appear. The following three causes are most common:

1. Where the data for one or more variables span a wide range, that is, the gap between the smallest and the largest values is large.

2. Where the parity between the growth rates of the dependent and independent variables varies significantly during the modeling period. This is applicable only to time series data.

3. Where there is heterogeneity in the data. This is more common with cross-sectional data than with time series data. For example, data on income levels in different regions can hardly be uniform. The responses of high income buyers to a certain product will differ from those of low income buyers because high income consumers usually have more discretionary power than low income consumers. When such data are pooled together in regression modeling using the OLS method, the problem of heteroscedasticity will arise. In estimating the coefficients of the model, the OLS method gives equal weight to each data point. As a result, residuals will vary with different income levels.

When these conditions exist, the residual variance of the model may correlate with an independent variable. …
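The income example can be illustrated with a small simulation. This is a sketch under stated assumptions: the income and spending figures are invented, the spread of spending is made to grow with income by construction, and correlating squared residuals with the explanatory variable is used here only as an informal indicator, not as the article's own procedure. The helper names `pearson` and `fit_line` are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


def fit_line(x, y):
    """Simple OLS fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


rng = random.Random(2)
# Hypothetical cross-section: income in thousands, pooled across income levels.
income = [rng.uniform(20, 200) for _ in range(400)]
# Assumption: the scatter of spending widens as income rises.
spending = [5 + 0.1 * inc + rng.gauss(0, 0.03 * inc) for inc in income]

intercept, slope = fit_line(income, spending)
residuals = [s - (intercept + slope * inc) for inc, s in zip(income, spending)]
squared_residuals = [r * r for r in residuals]

# A clearly positive value indicates that residual spread rises with income.
corr = pearson(squared_residuals, income)
print("correlation of squared residuals with income:", corr)
```

In this setup OLS weights every observation equally, so the fitted line is the same kind of line it would be for uniform data; the heterogeneity shows up only in the residuals, whose size tracks income.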
