Measuring the Accuracy of Diagnostic Systems
Diagnostic systems are all around us. They are used to reveal diseases in people, malfunctions in nuclear power plants, flaws in manufactured products, threatening activities of foreign enemies, collision courses of aircraft, and entries of burglars. Such undesirable conditions and events usually call for corrective action. Other diagnostic systems are used to make a judicious selection from among many objects. Included are job or school applicants who are likely to succeed, income tax returns that are fraudulent, oil deposits in the ground, criminal suspects who lie, and relevant documents in a library. Still other diagnostic systems are used to predict future events. Examples are forecasts of the weather and of economic change.
It is immediately evident that diagnostic systems of this sort are not perfectly accurate. It is also clear that good, quantitative assessments of their degree of accuracy would be very useful. Valid and precise assessments of intrinsic accuracy could help users to know how or when to use the systems and how much faith to put in them. Such assessments could also help system managers to determine when to attempt improvements and how to evaluate the results. A full evaluation of a system's performance would go beyond its general, inherent accuracy in order to establish quantitatively its utility or efficacy in any specific setting, but good, general measures of accuracy must precede specific considerations of efficacy (1).
I suggest that although an accuracy measure is often calculated in one or another inadequate or misleading way, a good way is available for general use. The preferred way quantifies accuracy independently of the relative frequencies of the events (conditions, objects) to be diagnosed ("disease" and "no disease" or "rain" and "no rain," for instance) and also independently of the diagnostic