Causality, Regression, Discriminant Analysis, and Research on Failure
Scherr, Frederick, Akron Business and Economic Review
Lenders, investors, and regulators naturally want empirical insight into the default risk of the firms and financial institutions with which they are involved. This interest has led to substantial empirical research on the failure of such firms and institutions.(1) While other variables have been employed in this research, the most widely used variable set has been the financial ratios of failed and nonfailed firms and institutions.(2) Statistical methodologies used to relate these ratios to failure have included various types of regression and maximum-likelihood estimation, but by far the most common analysis technique has been discriminant analysis.(3) The prevalence of this technique has prompted a series of papers citing serious methodological, empirical, and interpretational problems in published failure studies that used discriminant analysis [4, 13, 20, 21]. Among the problems cited are misinterpretation of the classification outputs, violation of the statistical assumptions underlying the discriminant model, misinterpretation of what discriminant statistics imply about the importance of individual variables, unnecessary reduction in dimensionality and variable elimination, and problems in application over time.
In investigating the association between failure and financial ratios (or other measures), the researcher must choose an analysis methodology: regression (or another related maximum-likelihood method), discriminant analysis, or another technique. Several considerations are relevant to this choice. One is the different statistical requirements of discriminant analysis and regression [10, 23, 26]. However, these techniques also reflect different assumptions regarding the underlying causal structure. Thus, causality should also be a consideration in the choice and application of analysis methods. Few default studies have explicitly considered the causal issue. This article discusses causality relative to the empirical investigation of failure and the choice of analysis technique.
In the next section of this paper, the implied causality in regression and in discriminant analysis is discussed. Potential assumptions regarding causality in the failure process are then reviewed, with particular reference to financial ratios and their relation to failure. These assumptions are related to the implied causality in regression and in discriminant analysis and thus to methodology choice. The problems of interpretation of individual variables and of dimensionality reduction in discriminant analysis are discussed relative to these assumptions about failure. Examples of prior research are presented where the causality implied in the researchers' hypotheses was not matched to the statistical techniques used and where some of these problems consequently occurred. Two example analyses are performed on a small data set to illustrate the proper use of the two methodologies in failure research. The final section presents conclusions and implications for future research on failure.
CAUSALITY IN REGRESSION AND IN DISCRIMINANT ANALYSIS
In the regression model, the implied causality runs from the independent variables (the Xs), which are exogenously determined, to the dependent variable (Y), which is hypothesized to be determined by the independent variables plus a random disturbance term. In such a model, it is appropriate to form hypotheses regarding the signs and magnitudes of the estimated coefficients of the independent variables, since the postulated causal situation leads to expectations regarding these signs and magnitudes that the model is (in part) testing. The postulated causal situation will also suggest a limited set of potential independent variables, and it is appropriate in the regression model to include only those variables that the researcher believes determine the dependent variable; extraneous variables increase estimation time requirements. …
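The causal structure assumed by the regression model can be sketched numerically. The following is a minimal illustration on simulated data, not an analysis from this article: the function name `simulate_and_fit`, the sample size, and the coefficient values are all hypothetical assumptions. The Xs are drawn first (exogenous), Y is then generated from them plus a random disturbance, and ordinary least squares is used to check whether the estimated coefficients carry the hypothesized signs.

```python
import numpy as np


def simulate_and_fit(n=500, beta=(1.0, 2.0, -0.5), noise_sd=1.0, seed=0):
    """Generate Y from exogenous Xs plus a disturbance, then fit OLS.

    All parameter values are illustrative assumptions, not data
    or estimates from the article.
    """
    rng = np.random.default_rng(seed)

    # Step 1: the independent variables are determined outside the model.
    X = rng.normal(size=(n, 2))

    # Step 2: Y is caused by the Xs plus a random disturbance term.
    disturbance = rng.normal(scale=noise_sd, size=n)
    y = beta[0] + X @ np.array(beta[1:]) + disturbance

    # Step 3: estimate the coefficients by ordinary least squares.
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


true_beta = (1.0, 2.0, -0.5)
coef = simulate_and_fit(beta=true_beta)

# Hypotheses formed before estimation: X1 raises Y, X2 lowers Y.
print("intercept, b1, b2 =", np.round(coef, 3))
print("b1 has hypothesized positive sign:", bool(coef[1] > 0))
print("b2 has hypothesized negative sign:", bool(coef[2] < 0))
```

Because the direction of causation is built into how the data are generated, the signs and rough magnitudes of the estimates are meaningful objects to test, which is the sense in which hypotheses about individual coefficients are appropriate in the regression setting.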