In 1983, Card, Moran, and Newell developed the goals, operators, methods, and selection rules (GOMS) methodology for modeling human-computer tasks. A GOMS model has four components. The goal is the state to be achieved, or end state; goals may include subgoals. Operators are low-level actions that make up tasks. Methods are sets or series of operators used to accomplish a specific goal. Selection rules are sets of discriminating conditions used to choose between different methods of achieving a particular goal. The GOMS methodology is essentially a process of hierarchical task decomposition in which all the operators needed to accomplish a goal are specified. These operators form a method. If more than one method of achieving the goal exists, then a selection rule is specified to determine the appropriate method, depending on the conditions of the environment at the time of task execution. High-level operators are further decomposed as subgoals, which are in turn broken down into lower-level operators. When all operators have been broken down to their lowest level (i.e., they cannot be decomposed any further), the process stops.
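To make this hierarchical structure concrete, the following Python sketch (ours, not part of the original GOMS formulation; all class names and the example task are illustrative) represents a goal with alternative methods, a selection rule, and recursive decomposition into primitive operators:

    from dataclasses import dataclass, field
    from typing import Callable, Optional

    @dataclass
    class Operator:
        # Lowest-level action (e.g., a keystroke or a single visual fixation).
        name: str

    @dataclass
    class Method:
        # An ordered series of steps (operators or subgoals) for one goal.
        name: str
        steps: list = field(default_factory=list)

    @dataclass
    class Goal:
        name: str
        methods: list = field(default_factory=list)
        # Selection rule: given the task conditions, choose among the methods.
        selection_rule: Optional[Callable] = None

    def expand(goal, conditions):
        """Hierarchically decompose a goal into its primitive operators."""
        method = (goal.selection_rule(conditions, goal.methods)
                  if goal.selection_rule else goal.methods[0])
        ops = []
        for step in method.steps:
            # Subgoals are expanded recursively; operators are terminal.
            ops.extend(expand(step, conditions) if isinstance(step, Goal) else [step])
        return ops

    # A hypothetical goal with two methods and a selection rule between them.
    delete = Goal(
        "delete file",
        methods=[Method("menu", [Operator("point"), Operator("click")]),
                 Method("keyboard", [Operator("press Delete")])],
        selection_rule=lambda cond, ms: ms[0] if cond.get("mouse") else ms[1],
    )
    print([op.name for op in expand(delete, {"mouse": True})])  # ['point', 'click']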
These lowest-level operators can subsequently be classified under three subsystems: the motor subsystem, the perceptual subsystem, and the cognitive subsystem. Although it is theoretically possible to model all human tasks in terms of these three subsystems' operations, execution times for many motor tasks are not available; moreover, there is an effectively infinite number of knowledge-level tasks and other creative cognitive operations for which no sequence of operators is known. In the case of human-computer tasks, however, empirical evidence exists regarding the time required to execute each of these primitive subsystem operators, making it possible to accurately predict the time required to execute any human-computer task once the sequence of operators has been defined.
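For example, under the commonly cited keystroke-level operator time estimates from Card et al. (1983), a task's predicted execution time is simply the sum of its primitive operator times. The following sketch (the command sequence modeled is hypothetical) illustrates the calculation:

    # Predicted execution time as the sum of primitive operator times.
    # Per-operator values are the commonly cited keystroke-level model
    # (KLM) estimates from Card, Moran, and Newell; the modeled task
    # (think, then type a 4-character command plus Enter) is hypothetical.
    KLM_TIMES = {
        "K": 0.28,  # keystroke, average skilled typist (s)
        "P": 1.10,  # point with mouse to a target (s)
        "H": 0.40,  # home hands between keyboard and mouse (s)
        "M": 1.35,  # mental preparation (s)
    }

    sequence = ["M", "K", "K", "K", "K", "K"]
    predicted = sum(KLM_TIMES[op] for op in sequence)
    print(f"Predicted execution time: {predicted:.2f} s")  # 2.75 s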
Although the GOMS modeling technique has proven extremely successful in producing accurate cognitive task models, applying it requires instruction and training, typically provided in college-level courses on human-computer interaction or in CHI tutorials and workshops. Most expert operators of computer system interfaces, however, lack the skill necessary to develop GOMS and natural language GOMS (NGOMSL) models as prescribed by Kieras (1997). Consequently, a research effort was conducted to build an automated tool to assist domain experts (unfamiliar with GOMS and NGOMSL analysis techniques) in developing cognitive models of human-computer interactions. Williams (2000) developed such an automated tool, the cognitive analysis tool for human-computer interaction (CAT-HCI). A number of tools have been developed for eliciting and structuring knowledge within the framework of a GOMS analysis, such as GLEAN3 (Kieras, Wood, Abotel, & Hornof, 1995) and quick and dirty GOMS (QGOMS; Beard, Smith, & Denelsbeck, 1996). These tools, however, have not been evaluated with respect to the accuracy and consistency with which different individuals, skilled in a task domain but unskilled in GOMS analysis, can create such models.
One study (Baumeister, John, & Byrne, 2000) compared QGOMS, CAT-HCI, and GLEAN3 as tools for building GOMS models. These researchers, however, were themselves skilled in this form of analysis, and their evaluation focused primarily on the usability of each tool. A comparison of accuracy was made among the three tools, an automatically generated model, and a hand-done (manual) version of a GOMS keystroke-level model (KLM) on the single task employed for the comparison. The execution times predicted by the models generated with the different tools were not validated by empirical observation; the same holds for the present research. The present work, however, builds on this earlier evaluation by providing a more detailed investigation of one of the tools (CAT-HCI) evaluated by Baumeister et al. Since that comparison, continued debugging of CAT-HCI has reduced the number of crashes experienced, yielding a more stable version of the tool.
Unlike GLEAN3 and QGOMS, CAT-HCI provides a structured interview process that guides the user through the construction of a GOMS model to the point at which low-level operators can be inferred and automatically inserted. Using GLEAN3 is akin to programming in a conventional programming language, with an editor that assists in developing NGOMSL code structures. QGOMS does not provide any direction in building GOMS models but does provide an interface for creating treelike structures of nodes and links; for each node, execution time data are entered via the keyboard.
Our concern in this evaluation is not with usability but with the reliability and accuracy of models generated using CAT-HCI by a number of different task domain experts on several different tasks. To promote and facilitate the use of this form of analysis to improve the usability of computer interfaces, both usability specialists and nonspecialists must have access to practical tools for evaluating interfaces, and these tools must generate valid, reliable models across different users and varied task domains. The current evaluation is a step toward promoting the use of GOMS tools by non-usability specialists, or by individuals skilled in the use of a specific interface application but having no knowledge of interface design or interface assessment techniques.
The evaluation of CAT-HCI consisted of having 18 domain experts, who had no prior knowledge of interface design, use the tool to develop cognitive models of four different Bradley A3 digital computer tasks. Minimal variance among the models created by different domain experts would demonstrate the consistency with which individuals who are not skilled in this cognitive task analysis approach can generate cognitive models of human-system interaction. Additionally, the models developed by each expert were compared with models of those tasks developed by the authors (the first author being the designer and developer of CAT-HCI), both of whom are skilled in the development of cognitive models of human-computer interaction. The task models generated by experimental participants were compared with those generated by the authors (referred to as the baseline models) to determine measures of model accuracy; the models, however, were not validated against actual performance. If the tool can generate accurate and consistent models, the inference can be made that it can be used to generate cognitive models for other, similar human-computer interaction tasks, in keeping with the original intention of Card et al. (1983).
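For concreteness, the two kinds of comparison just described might be summarized as in the following sketch; the numbers are invented, and the particular statistics shown (coefficient of variation across experts for consistency, relative error against the baseline for accuracy) are one plausible choice rather than the measures reported here:

    # Hypothetical illustration of consistency across experts and accuracy
    # against a baseline model; all values are invented.
    from statistics import mean, stdev

    # Predicted execution times (s) for one task, from each expert's model.
    expert_predictions = [12.4, 12.9, 12.1, 13.0, 12.6]
    baseline_prediction = 12.5   # authors' (baseline) model for the same task

    cv = stdev(expert_predictions) / mean(expert_predictions)
    errors = [abs(p - baseline_prediction) / baseline_prediction
              for p in expert_predictions]

    print(f"Consistency (coefficient of variation): {cv:.1%}")
    print(f"Mean accuracy error vs. baseline:       {mean(errors):.1%}")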
The purpose of the evaluation methodology was to investigate the performance of CAT-HCI relative to meeting this research objective: Can the automated tool guide domain experts (or what Nielsen, 1992, referred to as single experts) who are not skilled in task analysis through the GOMS process to formulate consistent and accurate cognitive models for assessing the complexity of HCI applications and for predicting user performance? Consistency and accuracy across task domain experts should determine the effectiveness of the tool.
The participants in this experiment were 18 active-duty or recently retired infantry and armor noncommissioned officers in the U.S. Army. All the participants were men with at least 6 years of military experience. All had completed a high school education, but none had advanced degrees. Because the Bradley A3 commander's tactical display (CTD) is still in the prototype stage, these participants essentially constitute the entire pool of subject matter experts on the tasks to be modeled. The tasks were part of a system under development that is scheduled for fielding in the near future. The participants were members of the New Equipment Training Team for the system; these individuals are responsible for learning the operation of the system so that they may later train the trainers of newly equipped units. All participants were tested on the tasks to be modeled to verify their expertise on the system.
CAT-HCI Version 95 was used to develop cognitive models of four human-computer interface tasks associated with the U.S. Army's Bradley A3 Fighting Vehicle. The tasks were selected such that a broad range of cognitive, perceptual, and motor operations had to be described by the test participants in order to accurately model each task. Task selection was deliberately contrived to evaluate the breadth of operations that needed to be identified in order to generate rather complex models.
CAT-HCI is based on its predecessor, the cognitive analysis tool (CAT; Williams, Hultman, & Graesser, 1998). The CAT-HCI interactive computer program was designed to guide single experts, or task domain experts, through the cognitive task analysis process in a manner similar to Kieras's (1997) guide to GOMS using NGOMSL. CAT-HCI was specifically designed to develop models of human-computer interactions at the level of an NGOMSL analysis. This is accomplished by prompting the user to select …