A Comparison of Data Mining Techniques for Credit Scoring in Banking: A Managerial Perspective

1. Introduction

One of the main tasks of a bank is to lend money. As a financial intermediary, one of its roles is to reduce lending risks. Bank lending is an art as well as a science. Success depends on the techniques used, on knowledge, and on the aptitude to assess both the creditworthiness of a potential borrower and the merits of the proposition to be financed. In recent years, banks have increasingly used credit-scoring techniques to evaluate the loan applications they receive from consumers (Blochlinger and Leippold 2006; Vojtek and Kocenda 2006; Macerinskiene and Ivaskeviciute 2008; Karan and Arslan 2008). Owing to severe competition and rapid growth in the consumer credit market, credit scoring models have been used extensively for credit admission evaluation. Credit scoring is a method of modeling the potential risk of credit applications (Vojtek and Kocenda 2006; Zhao 2007; Avery et al. 2004; Bodur and Teker 2005; Crook and Banasik 2004; Jacobson and Roszbach 2003). Credit scoring models have been developed by financial institutions and researchers in order to solve the problems that arise during the evaluation process.

Initially, financial institutions relied on rules or principles built by analysts to decide whom to give credit. As the number of applicants has increased tremendously, it has become impossible, in both economic and manpower terms, to evaluate every credit application in this way. Several quantitative methods have therefore been developed for the credit admission decision. Credit scoring models are developed to categorize applicants as either accepted or rejected on the basis of the applicants' characteristics. The objective of credit scoring models is to assign credit applicants to either a 'good credit' group that is likely to repay its financial obligation or a 'bad credit' group whose application will be denied because of its high probability of defaulting on the financial obligation (Lee et al. 2006). Statistical methods, nonparametric statistical methods, and artificial intelligence approaches have been proposed to support the credit decision (Thomas 2000). Credit scoring problems belong to the domain of the more general and widely discussed classification problems (Lee et al. 2002).

Classification problems have long played an important role in business-related decision making owing to their wide application in decision support, financial forecasting, fraud detection, marketing strategy, process control, and other related fields (Chen et al. 1996; Fayyad et al. 1996; Lee et al. 2006). A classification problem can be solved using different techniques, ranging from statistical methods to artificial intelligence algorithms.

Statistical methods, including regression, linear and nonlinear discriminant analysis, and logit and probit models, have been most commonly applied to construct credit scoring models (Vojtek and Kocenda 2006; Lee et al. 2002; Lee et al. 2006). The most popular methods applied to credit scoring models are linear discriminant analysis, logistic regression, and their variations. They are relatively easy to implement and generate straightforward results that can be readily interpreted. However, there are some limitations associated with their application in credit scoring. First, these methods are not effective for problems with high-dimensional inputs and small sample sizes. Most importantly, these techniques rely on linear separability and normality assumptions. Furthermore, it is difficult to automate the modeling process and to design a continuous update flow. According to Yang (2007), static models usually fail to adapt when the environment or population changes over time. Therefore, these models may need to be rebuilt from scratch.
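
To make the logistic regression approach mentioned above concrete, the following is a minimal sketch in plain Python, not the procedure of any cited study. It assumes two hypothetical applicant features (a debt-to-income ratio and a scaled count of late payments) and a tiny synthetic training set invented purely for illustration; the function names, learning rate, and 0.5 cut-off are likewise assumptions.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=3000):
    # Batch gradient descent on the average log-loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss w.r.t. the linear score
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def default_probability(w, b, x):
    # Estimated probability that the applicant defaults (label 1).
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def classify(w, b, x, threshold=0.5):
    # Assign the applicant to one of the two groups used in credit scoring.
    return "bad credit" if default_probability(w, b, x) >= threshold else "good credit"

# Synthetic, illustrative data: [debt_to_income, scaled_late_payments]
X = [[0.10, 0.0], [0.15, 0.0], [0.20, 0.1], [0.25, 0.0],
     [0.55, 0.4], [0.60, 0.5], [0.70, 0.3], [0.80, 0.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = defaulted, 0 = repaid

w, b = fit_logistic(X, y)
low_risk, high_risk = [0.12, 0.0], [0.75, 0.5]
print(classify(w, b, low_risk), classify(w, b, high_risk))
```

The fitted weights can be read directly as the direction in which each feature moves the estimated default risk, which is the interpretability advantage noted above. The sketch also shows the limitation: the decision boundary is linear in the inputs, and the model must be refitted whenever the applicant population shifts.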

In addition to these classical methodologies, artificial intelligence techniques have been applied to credit scoring. Practitioners and researchers have developed a variety of techniques for credit scoring, including k-nearest neighbor (Henley and Hand 1996), decision trees (Lee et al. …