Selection bias results from a discrepancy between the range of estimation of a statistical model and its range of application. This is the case for fraud risk models, which are estimated on audited claims but applied to incoming claims when auditing strategies are designed. Yet audited claims are a minority within the parent sample, since they are chosen after a severe selection performed by claims adjusters. This article presents a statistical approach that counteracts selection bias without using a random auditing strategy. A two-equation model of audit and fraud (a bivariate probit model with censoring) is estimated on a sample of claims in which the experts are left free to make the audit decision. The expected overestimation of fraud risk derived from a single-equation model is corrected. Results are close to those obtained with a random auditing strategy, at the expense of some instability with respect to the set of regressors. We then compare the auditing policies derived from the different approaches.
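To make the two-equation model concrete, the following Python sketch evaluates the log-likelihood of a bivariate probit with censoring under the textbook latent-variable specification, which we assume here because this excerpt does not spell it out: an audit equation A* = z'γ + u, a fraud equation F* = x'β + v whose outcome is observed only when A* > 0, and standard bivariate normal errors (u, v) with correlation ρ. All function names are hypothetical, the indices z'γ and x'β are passed in precomputed, and the bivariate normal CDF is approximated with Simpson's rule rather than a library routine.

```python
import math


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def biv_norm_cdf(a, b, rho, n=400):
    """P(X <= a, Y <= b) for standard normals X, Y with correlation rho (|rho| < 1).

    Integrates phi(x) * Phi((b - rho * x) / sqrt(1 - rho^2)) over x in [-8, a]
    with composite Simpson's rule; the probability mass below -8 is negligible.
    """
    lo = -8.0
    if a <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lo) / n  # n must be even for Simpson's rule
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        weight = 1 if i in (0, n) else (4 if i % 2 == 1 else 2)
        total += weight * norm_pdf(x) * norm_cdf((b - rho * x) / s)
    return total * h / 3.0


def censored_biprobit_loglik(audit, fraud, z_gamma, x_beta, rho):
    """Log-likelihood of the (assumed) censored bivariate probit.

    audit[i]   : 1 if claim i was audited, else 0
    fraud[i]   : 1 if audited claim i was found fraudulent (ignored if not audited)
    z_gamma[i] : audit-equation index z_i'gamma
    x_beta[i]  : fraud-equation index x_i'beta
    rho        : correlation between the audit and fraud error terms
    """
    ll = 0.0
    for a, f, zg, xb in zip(audit, fraud, z_gamma, x_beta):
        if a == 0:
            p = norm_cdf(-zg)                # not audited: fraud status unobserved
        elif f == 1:
            p = biv_norm_cdf(zg, xb, rho)    # audited and found fraudulent
        else:
            p = biv_norm_cdf(zg, -xb, -rho)  # audited and found honest
        ll += math.log(max(p, 1e-300))       # guard against log(0)
    return ll


ll = censored_biprobit_loglik(
    audit=[0, 1, 1],
    fraud=[0, 1, 0],
    z_gamma=[0.2, 0.5, -0.1],
    x_beta=[0.0, 0.3, -0.4],
    rho=0.3,
)
print(ll)
```

Estimation would then maximize this function over γ, β and ρ, for instance with a general-purpose optimizer. In this specification, a positive ρ is what would capture the experts' information about fraud that the recorded characteristics miss; with ρ = 0 the audit and fraud contributions factorize and the model collapses to two independent probits.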
Auditing policies derived from statistical analysis and applied to insurance claims (see the section "Fraud Detection ... Bias Issues" for an overview) face a major selection bias problem. A score that assesses fraud risk is derived from a regression analysis on audited claims, since fraud is observed on audited claims only. The score is then applied to incoming claims in order to select those recommended for audit. However, audited claims are a minority within the parent sample, as they are chosen after a severe selection performed by claims adjusters. This discrepancy between the range over which the risk model is derived (the audited claims) and the range over which it is applied (the incoming claims) creates a selection bias.
Random auditing of claims is the basic strategy for counteracting selection bias. A pure random auditing strategy consists of picking claims at random and then auditing them. This controlled experiment eliminates the selection induced by the audit decision. Estimating a single fraud equation on this sample yields fraud probabilities for incoming claims that are not subject to selection bias.
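The effect of random auditing can be illustrated with a small simulation. The data-generating process below is entirely hypothetical: each claim has a latent fraud propensity v, roughly 10% of claims are fraudulent, and an expert's suspicion score u shares part of v with correlation rho. Auditing the same number of claims either at random or by picking those the expert finds most suspicious shows that only the random sample reproduces the population fraud rate.

```python
import math
import random


def simulate_fraud_rates(n=100_000, audit_share=0.2, rho=0.6, seed=1):
    """Return (population fraud rate, fraud rate in a random audit sample,
    fraud rate in an expert-selected audit sample of the same size)."""
    rng = random.Random(seed)
    cutoff = 1.2816  # P(v > cutoff) is about 10% for a standard normal v
    claims = []
    for _ in range(n):
        v = rng.gauss(0.0, 1.0)  # latent fraud propensity, unobserved by the analyst
        # expert suspicion score, correlated with v: corr(u, v) = rho
        u = rho * v + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        claims.append((v > cutoff, u))

    k = int(audit_share * n)
    true_rate = sum(fraud for fraud, _ in claims) / n

    # random auditing: every claim has the same chance of being checked
    random_audit = rng.sample(claims, k)
    random_rate = sum(fraud for fraud, _ in random_audit) / k

    # expert auditing: the k claims with the highest suspicion score
    expert_audit = sorted(claims, key=lambda c: c[1], reverse=True)[:k]
    expert_rate = sum(fraud for fraud, _ in expert_audit) / k
    return true_rate, random_rate, expert_rate


true_rate, random_rate, expert_rate = simulate_fraud_rates()
print(f"population: {true_rate:.3f}  random audit: {random_rate:.3f}  expert audit: {expert_rate:.3f}")
```

The random audit sample matches the population rate up to sampling noise, whereas the expert-selected sample is enriched in fraud because the selection rule draws on the same unobserved component as fraud itself.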
Random auditing was partly carried out on the database we investigated. Twenty percent of the claims were selected at random and recommended for audit. However, only one claim out of five was eventually audited in this population, because not all incoming claims were suspicious with respect to fraud (see the section "Fraud Detection ... Bias Issues" for more details about auditing processes). Optimal auditing strategies can be designed by estimating a fraud equation on the audited claims (see Ayuso et al., 2004, for derivations with the database used in this article).
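Read literally, the two figures above imply that the claims both drawn at random and actually audited make up about 4% of the database:

```latex
\underbrace{0.20}_{\text{recommended at random}} \times \underbrace{\tfrac{1}{5}}_{\text{eventually audited}} = 0.04
```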
Most insurance companies are reluctant to carry out a random auditing policy, because an audit decision has a negative long-term influence on the policyholder's value for the company. Indeed, an honest policyholder may resent the audit process, and his loyalty to the company, and hence his value for the insurer, is likely to decrease as a consequence. Companies are therefore deterred from systematically auditing part of their claims database. Without a random auditing policy, the effect of selection bias on fraud risk assessment is easy to anticipate. The experts make the audit decision based on the claim characteristics recorded in the company files. However, they are also able to capture idiosyncrasies in fraud distributions that are not summarized in the observable information. Given the observable characteristics, fraud risk is therefore expected to be lower for a claim the expert exempts from audit than for one that is checked for fraud. A fraud risk model derived without taking this selection problem into account would then overestimate fraud probabilities for incoming claims. …
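The overestimation argument can be checked in a simulation in which the expert's audit decision and fraud share an unobserved component with correlation rho. In this hypothetical setup, with a single binary characteristic x and made-up coefficients for the audit and fraud equations, a single-equation model saturated in x reduces to the fraud frequency among audited claims in each x cell. The sketch therefore compares those frequencies with the true conditional fraud probabilities Φ(β0 + β1 x).

```python
import math
import random


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def audited_vs_true(rho, n=100_000, seed=7):
    """For each value of a binary observable x, return
    (true P(fraud | x), fraud frequency among audited claims with that x)."""
    rng = random.Random(seed)
    beta0, beta1 = -1.5, 0.8    # hypothetical fraud-equation coefficients
    gamma0, gamma1 = -1.0, 0.6  # hypothetical audit-equation coefficients
    s = math.sqrt(1.0 - rho * rho)
    counts = {0: [0, 0], 1: [0, 0]}  # x -> [number audited, number audited and fraudulent]
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        v = rng.gauss(0.0, 1.0)                # fraud-equation error
        u = rho * v + s * rng.gauss(0.0, 1.0)  # audit-equation error, corr(u, v) = rho
        if gamma0 + gamma1 * x + u > 0:        # the expert decides to audit
            counts[x][0] += 1
            if beta0 + beta1 * x + v > 0:      # fraud is revealed by the audit
                counts[x][1] += 1
    return {
        x: (norm_cdf(beta0 + beta1 * x), frauds / audited)
        for x, (audited, frauds) in counts.items()
    }


for rho in (0.0, 0.6):
    for x, (true_p, naive_p) in audited_vs_true(rho).items():
        print(f"rho={rho}  x={x}  true P(fraud|x)={true_p:.3f}  audited-sample rate={naive_p:.3f}")
```

With rho = 0 the audited-sample frequencies agree with the true probabilities up to sampling noise, even though the audit decision depends on x; with rho > 0 they overstate fraud risk in every cell, which is the overestimation described above.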