Kenneth G. Willis
Frequency and cross-tabulation tables are limited in their ability to analyse housing problems. Suppose a sample of houses consisted of 180 observations, and the purpose is to investigate the effect of income, household size and tenure type on housing consumption. By cross-tabulation, the sample could be divided into, say, five income groups, four household size groups and three tenure groups. Means could be computed for each cell to estimate the effect of these variables on demand. However, with 5×4×3=60 cells, there would be an average of only three observations per cell. Because observations are rarely spread evenly across cells, some cells would have five or six observations, many would be empty and most would have only one or two, making cell means meaningless or extremely unreliable (Malpezzi, 1984b). Regression techniques get round this problem. They also avoid the errors in results and interpretation that can arise in cross-tabulations when survey data are grouped inappropriately (Upton, 1989).
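The sparse-cell problem can be illustrated with a small simulation. The sketch below assigns 180 hypothetical households to the 5×4×3 cross-tabulation using invented, unequal shares for each grouping variable (the shares are assumptions chosen only for illustration, not survey data), then counts empty and near-empty cells. It also counts the coefficients that an additive dummy-variable regression would need for the same three factors. This is one common way regression pools information across cells, assumed here for illustration rather than taken from the text.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical marginal shares for each grouping variable (illustrative only).
INCOME_SHARES = [0.35, 0.30, 0.20, 0.10, 0.05]   # five income groups
HOUSEHOLD_SHARES = [0.30, 0.35, 0.25, 0.10]      # four household-size groups
TENURE_SHARES = [0.60, 0.30, 0.10]               # three tenure groups


def simulate_cells(n_obs, seed=1):
    """Assign n_obs households at random to income x size x tenure cells and count them."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_obs):
        cell = (
            rng.choices(range(5), weights=INCOME_SHARES)[0],
            rng.choices(range(4), weights=HOUSEHOLD_SHARES)[0],
            rng.choices(range(3), weights=TENURE_SHARES)[0],
        )
        counts[cell] += 1
    # Report every one of the 5 * 4 * 3 cells, including empty ones.
    return {cell: counts.get(cell, 0) for cell in product(range(5), range(4), range(3))}


def summarise(cells):
    """Number of cells, average count per cell, and how many cells are empty or near-empty."""
    sizes = list(cells.values())
    return {
        "cells": len(sizes),
        "mean_per_cell": sum(sizes) / len(sizes),
        "empty": sum(1 for s in sizes if s == 0),
        "one_or_two": sum(1 for s in sizes if s in (1, 2)),
    }


def additive_dummy_parameters(levels):
    """Coefficients in an additive dummy-variable regression: intercept plus (levels - 1) per factor."""
    return 1 + sum(k - 1 for k in levels)


cells = simulate_cells(180)
summary = summarise(cells)
print(summary)
print(additive_dummy_parameters([5, 4, 3]))  # 10 coefficients, compared with 60 cell means
```

The average is necessarily 180/60 = 3 observations per cell, but with unequal shares the counts pile up in the common combinations and leave the rare ones empty. An additive model with dummy variables estimates 10 coefficients from the same 180 observations, at the cost of assuming that the effects of income, household size and tenure do not interact.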
Regression analysis can be broadly defined as the analysis of the statistical relationship among variables. There are many forms of regression analysis, but a basic distinction can be made between forms which use continuous data and those which use discrete data. The classical regression model is based upon a continuous dependent (response or endogenous) variable (Y), which depends on a number of independent (predictor or exogenous) variables (X1, X2, ..., Xk). Such a model implies cause and effect, in that a given change in X1 will cause Y to change by a specific amount, but it does not prove cause and effect. What it actually measures is the probabilistic association between Y and the independent variables. Indeed, instrumentalism argues that variables and models should be chosen because they provide the best predictive results (since prediction is the primary purpose of a theory) rather than on grounds of deductive logic or causal theory (Boland, 1979).
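The classical model can be written as Y = b0 + b1X1 + ... + bkXk + e, where e is a random error term. As a minimal sketch, assuming ordinary least squares estimation and invented data, the code below generates hypothetical housing consumption from two hypothetical predictors (income and household size) with known coefficients, then recovers them with NumPy's least-squares solver. Each estimated coefficient is the change in Y associated with a one-unit change in that X, holding the other X constant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical explanatory variables: household income (thousands) and household size.
income = rng.uniform(5, 50, n)
size = rng.integers(1, 7, n)  # household sizes 1 to 6

# Hypothetical "true" coefficients, used only to generate illustrative data.
beta_true = np.array([20.0, 1.5, 4.0])  # intercept, income, household size
X = np.column_stack([np.ones(n), income, size])
y = X @ beta_true + rng.normal(0, 5, n)  # housing consumption plus random error

# Ordinary least squares: choose b to minimise the sum of squared residuals.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b)  # estimates should lie close to [20, 1.5, 4.0]
```

The estimates recover the generating coefficients only because the data were built with exactly this linear structure. With real survey data, the same arithmetic yields a measure of association, and interpreting it causally depends on assumptions that the regression itself cannot test.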