Academic journal article Perception and Psychophysics

Type I Error Rates and Power Analyses for Single-Point Sensitivity Measures

Experiments often produce a hit rate and a false alarm rate in each of two conditions. These response rates are summarized into a single-point sensitivity measure such as d', and t tests are conducted to test for experimental effects. Using large-scale Monte Carlo simulations, we evaluate the Type I error rates and power that result from four commonly used single-point measures: d', A', percent correct, and γ. We also test a newly proposed measure called γ_C. For all measures, we consider several ways of handling cases in which false alarm rate = 0 or hit rate = 1. The results of our simulations indicate that power is similar for these measures but that the Type I error rates are often unacceptably high. Type I errors are minimized when the selected sensitivity measure is theoretically appropriate for the data.

A common experimental design asks subjects to classify a test stimulus into two categories (target or lure, Category A or B, or signal present or absent), using a binary-valued response. The resulting data can be summarized with two numbers: the hit rate (H), which is the probability of saying "yes" to a target, and the false alarm rate (F), which is the probability of saying "yes" to a lure. From these response proportions, one can estimate the subjects' ability to discriminate the two classes of stimuli, as well as their general bias to prefer one response over the other. A variety of indexes meant to quantify discrimination sensitivity have been proposed, including d' (Tanner & Swets, 1954), A' (Pollack & Norman, 1964), H - F, percent correct, and γ (Nelson, 1984). In this article, we examine two statistical properties of these measures that are of particular interest in an experimental setting: Type I error rate and power.
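As a concrete illustration of these definitions (the counts below are hypothetical, not data from the article), H and F are simply the proportions of "yes" responses to targets and to lures:

```python
# Hypothetical counts: 100 target trials and 100 lure trials.
hits, misses = 80, 20                 # "yes"/"no" responses to targets
false_alarms, correct_rejs = 30, 70   # "yes"/"no" responses to lures

H = hits / (hits + misses)                        # hit rate
F = false_alarms / (false_alarms + correct_rejs)  # false alarm rate
print(H, F)  # 0.8 0.3
```

Every single-point measure discussed below (d', A', p(c), γ) is a function of this (F, H) pair.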

In evaluating these statistical properties, obvious considerations include the size of the sample and the number of trials per condition. Less obvious but equally important is the structure of evidence in the environment. We can get a sense of this structure from the receiver operating characteristic (ROC), which plots all possible (F, H) pairs as response bias varies but sensitivity remains constant. Each of the sensitivity indexes produces ROCs of a particular shape, and this constrains the form of the evidence distributions that underlie the ROC (see Swets, 1986b). In other words, each sensitivity measure makes an assumption about how evidence is distributed, and the degree to which the assumption matches reality will affect its statistical performance. As will be seen later, this factor interacts with the true level of sensitivity and response bias. In our evaluation, we test two types of evidence distributions most prominent in the theoretical literature: Gaussian (both equal and unequal variance) and rectangular.
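The idea that an ROC traces out all (F, H) pairs at constant sensitivity can be sketched numerically. Assuming equal-variance Gaussian evidence distributions with a fixed separation d (an assumed value, chosen for illustration), sweeping the response criterion c moves the point along the curve while z(H) - z(F) stays fixed:

```python
from statistics import NormalDist

Phi = NormalDist().cdf      # standard-normal CDF
z = NormalDist().inv_cdf    # inverse CDF (the z transformation)

d = 1.5  # fixed sensitivity (assumed for this sketch)
# Sweep the response criterion c: each (F, H) pair is one ROC point.
roc = [(Phi(-c), Phi(d - c)) for c in (-1.0, 0.0, 1.0, 2.0)]
# Bias varies along the curve, but z(H) - z(F) equals d at every point.
for F, H in roc:
    print(round(F, 3), round(H, 3), round(z(H) - z(F), 3))
```

A measure whose theoretical ROC has a different shape (e.g., the linear ROC implied by percent correct) would not remain constant along this Gaussian curve, which is the mismatch the article's simulations probe.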

Calculation of d' as a summary of discrimination performance entails the assumption that the underlying distributions are equal-variance Gaussian. The distance between the means of the two distributions is measured by d' in units of their common standard deviation. It is easy to calculate:

d' = z(H) - z(F), (1)

where the z transformation takes a response proportion and yields a z score. One advantage of d' over other measures in this equal-variance Gaussian scenario is that it is independent of response bias. That is, the same value of d' is observed regardless of whether subjects are relatively conservative or liberal in their "yes" responses. If the underlying distributions are Gaussian but do not have a common variance, as is often the case in recognition memory tasks (Ratcliff, Sheu, & Gronlund, 1992), d' is not independent of bias.
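Equation 1 can be computed directly with the inverse standard-normal CDF; a minimal sketch (the response rates are hypothetical, not from the article):

```python
from statistics import NormalDist

def d_prime(H, F):
    """d' = z(H) - z(F), where z is the inverse standard-normal CDF."""
    z = NormalDist().inv_cdf
    return z(H) - z(F)

print(round(d_prime(0.8, 0.3), 3))  # ≈ 1.366
```

Note that the z transformation is undefined at proportions of exactly 0 or 1, which is why the article must consider how to handle cases in which F = 0 or H = 1.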

Calculating percent correct, p(c), as a summary statistic is appropriate when the underlying strength distributions are rectangular in form. Given equal numbers of target and lure trials at test,

p(c) = [H + (1 - F)]/2. (2)
As Equation 2 shows, p(c) is linearly related to the popular corrected recognition score, H-F, that subtracts the false alarm rate from the hit rate in an attempt to correct for guessing or for response bias. …
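A quick numerical check of this linear relationship (the rates are hypothetical): with equal numbers of target and lure trials, p(c) is the average of the hit rate and the correct rejection rate, which rearranges to 0.5 + (H - F)/2.

```python
def percent_correct(H, F):
    """p(c) with equal numbers of target and lure trials:
    the average of the hit rate and the correct rejection rate."""
    return (H + (1 - F)) / 2

H, F = 0.8, 0.3
# Linear in H - F: p(c) = 0.5 + (H - F)/2 (up to floating-point error).
assert abs(percent_correct(H, F) - (0.5 + (H - F) / 2)) < 1e-12
print(round(percent_correct(H, F), 4))  # 0.75
```

Because p(c) is a linear function of H - F, the two measures lead to identical statistical decisions in the equal-trial-numbers case.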
