Indices of Discrimination or Diagnostic Accuracy
Subjects in experiments on perception, learning, memory, and cognition are often required to make a series of fine discriminations. In a common method, a single stimulus is presented on each trial and the subject indicates which of two similar stimuli it is, or from which of two similar categories of stimuli it was drawn. In addition, in several practical settings, professional diagnosticians and prognosticators must say time and again which of two conditions, confusable at the moment of decision, exists or will exist. Among them are physicians, nondestructive testers, product inspectors, process-plant supervisors, weather forecasters, mineralogists, stockbrokers, librarians, survey researchers, and admissions officers. There is interest in knowing both how accurately the experimental subjects and professionals perform and how accurately their various tools perform, and a dozen or more indices of discrimination accuracy are in common use. In this chapter I propose a way of discriminating among those indices that permits sifting the ones that are valid and reliable from the ones that are not. The proposed touchstone for indices is the relative (or receiver) operating characteristic (ROC).
In this chapter I argue that there is no model-free approach to confusion data, and specify the models implied by several common indices. Many of the points I make may be familiar to experimental psychologists from previous discussions of signal detection theory, but they are generalized now to provide a theoretical overview of questions usually addressed heuristically, and with uneven success. The package is presented as a useful contribution to other fields and to those who have avoided the indices of detection theory in favor of indices presumed to make fewer or weaker assumptions.
The path of this chapter is not simple and quick, but the outcome is quite