Two characteristics of any mode of data entry are crucial to determining its efficacy: speed of entry and error rate. At first glance the speed of human speech would suggest that automatic speech recognition (ASR) technology must revolutionize data entry. However, experience with ASR quickly reveals that error rates can be high and that error correction procedures are time-consuming. More experience shows that error rates and error correction delays can drop dramatically as the user and the recognizer adapt to each other. The current study aims to establish heuristics, at least for routine data entry tasks, that specify the parameters that determine when and how ASR may offer an advantage.
Two questions must be answered to give a picture of the usefulness of ASR: How quickly do error rates decline to an acceptable level, and what rates of data entry can be obtained when error rates are acceptably low? If the answer to the second question reveals that the best rates of data entry by ASR are slower than those of other means, then ASR will be limited to applications for which rapid data entry is not a primary consideration, and the first question is worth considering only in those contexts. Consequently, we designed our experiments specifically to answer the second question -- that is, the dependent variable in each experiment is the time to enter an item correctly on the first attempt. The use of such data to represent the best times obtainable with ASR depends on an assumption that the effects of the mutual adaptation of user and recognizer are limited to error rates and error correction. Pilot work supported this assumption, and further strong support is provided by data from the phrases experiment.
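As an illustration of the dependent variable described above, the following minimal sketch computes a mean entry time over only those items entered correctly on the first attempt. The record layout (`Trial`, `attempts`, `first_attempt_s`) and the sample values are hypothetical and are not taken from the study's materials or data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Trial:
    """One data entry item (hypothetical record layout, for illustration only)."""
    item: str
    attempts: int            # attempts needed before the entry was correct
    first_attempt_s: float   # duration of the first attempt, in seconds


def first_attempt_entry_time(trials: list[Trial]) -> Optional[float]:
    """Mean time for items entered correctly on the first attempt.

    Items that needed correction are excluded, so the result approximates
    the entry time obtainable once error rates are acceptably low.
    """
    clean = [t.first_attempt_s for t in trials if t.attempts == 1]
    return mean(clean) if clean else None


# Made-up sample values: the second item needed correction and is excluded.
trials = [
    Trial("name", 1, 2.0),
    Trial("date", 3, 5.0),
    Trial("code", 1, 4.0),
]
best_time = first_attempt_entry_time(trials)
```

Excluding corrected items in this way only represents the best obtainable times if, as assumed in the text, user and recognizer adaptation affects error rates and error correction rather than the speed of error-free entries.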
Choice of Speech Recognition System
The choice of an ASR system configuration was made by comparing the characteristics of the system with the task requirements of data entry by form completion. We chose a speaker-dependent system in order to allow the use of a large vocabulary. In addition, we chose an isolated-word system, which compares strings of phonemes with representations of individual words, because form entries are often cryptic and do not consist of the properly formed sentences for which continuous speech systems are optimized. In command mode the complete set of words to be recognized is known in advance, and the recognizer can be trained on just that vocabulary. In dictate mode the recognizer has available a large vocabulary of word templates. A more detailed discussion of these issues can be found in Simpson, McCauley, Roland, Ruth, and Williges (1985).
Previous Evaluation Studies
A substantial literature on ASR applications has grown up over the last 30 years. However, little of it bears directly on the issue of data entry speeds, and (in the light of 20/20 hindsight) some of the experimental comparisons are flawed by the use of atypical participants or by differences in task requirements when ASR is compared with an alternative. Some studies and critiques that have influenced the design of our experiments can be mentioned here; a more detailed review can be found in Damper and Wood (1995).
Welch (1977) concluded that for simple data entry tasks, the keyboard provides a faster input mode than does ASR. The experiment was designed to compare data entry aspects such as speed, accuracy, and correction times. The input modes were keyboard, ASR, and graf pen, a pointing device. The data entry task was classed as simple (copying) or complex (requiring parsing by the participant). The skill distribution of participants in the simple data entry experiment--mostly highly experienced or expert typists--gave the keyboard "a distinct advantage" (Welch, 1977, p. 17). Another advantage conferred on the keyboard condition arose from a reduction in the recommended ASR training.
Similar results were reported by McSorley (1981, cited in Simpson et al. …