Traditionally, testing for evaluating knowledge, skills, abilities, and other characteristics (KSAOs) has been done in a paper-and-pencil setting. However, the development of information technology (IT) over the last two decades has made computer-based testing (CBT) feasible in both educational research and practice (Bunderson et al., 1989). Furthermore, today's e-learning technology enables organizations to adopt online instruction as well as online testing. These evolving technologies have thus moved testing from the traditional paper-and-pencil format toward computer-based, or even computer adaptive testing (CAT), scenarios.
In theory, CAT can dramatically reduce testing time while maintaining the quality of measurement as compared to fixed-item tests in either paper-and-pencil or CBT format (Wise & Kingsbury, 2000). Thus, it has been researched and applied extensively in large educational institutions and certification or licensure centers (Olson, 1990; ETS, 2001; Taiwan Education Testing Center, 2007). However, CAT is used neither by classroom teachers who construct and administer their own tests (Frick, 1992) nor by business organizations in their routine KSAO assessments. One major cause of this situation is that the most widely adopted CAT model--Item Response Theory (IRT)--is too rigorous to implement and maintain. Wise and Kingsbury (2000) listed item pools, test administration, test security, and examinee issues as the four general areas of practical concern in developing and maintaining IRT-based CAT programs. In particular, the adoption issues mostly fall into the item-pool area, which includes pool size and control, dimensionality of an item pool, response models, item removal and revision, addition of items to the item pool, maintenance of scale consistency, and the use of multiple item pools. Since rigorous IRT requires a large number of examinees, ranging from 200 to 1000, for estimating item parameters, as well as special expertise in item-pool maintenance, IRT-based CAT is feasible only in educational institutions or professional testing centers (Frick, 1992).
The Sequential Probability Ratio Test (SPRT) model is another CAT model, but it is less widely adopted because it provides only a mastery/nonmastery result and lacks the assessment flexibility of an IRT score. Nevertheless, the original SPRT waives the requirement for a pre-test involving a large number of examinees and its associated maintenance burden (Frick, 1990). This characteristic, however, also exposes SPRT to issues of variability in item difficulty, discrimination, and chances of guessing. An empirical study by Frick (1990) indicated that SPRT is a fairly robust model for mastery decisions, especially under small Type I and Type II decision error rates such as 0.025. Moreover, although a parameter-estimation pre-test and item-pool calibration may be preferable, IRT still suffers from accuracy and validity issues of its own (Frick, 1990; Huff & Sireci, 2001; Welch & Frick, 1993). From these perspectives, SPRT appears to be a practical alternative for CAT applications by school teachers and business organizations.
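To make the mastery-decision mechanism concrete, the following is a minimal sketch of Wald's SPRT as applied to binary test responses. The parameter values (mastery and nonmastery success probabilities, and the 0.025 error rates mentioned above) are illustrative assumptions, not values prescribed by the cited studies:

```python
import math

def sprt_mastery(responses, p_master=0.85, p_nonmaster=0.60,
                 alpha=0.025, beta=0.025):
    """Wald's SPRT for a mastery decision over binary item responses.

    p_master / p_nonmaster are the assumed probabilities that a master /
    nonmaster answers an item correctly (hypothetical values here).
    alpha / beta are the Type I / Type II decision error rates.
    Returns (decision, number_of_items_administered).
    """
    upper = math.log((1 - beta) / alpha)   # cross above -> declare mastery
    lower = math.log(beta / (1 - alpha))   # cross below -> declare nonmastery
    llr = 0.0                              # running log-likelihood ratio
    for i, correct in enumerate(responses, start=1):
        if correct:
            llr += math.log(p_master / p_nonmaster)
        else:
            llr += math.log((1 - p_master) / (1 - p_nonmaster))
        if llr >= upper:
            return ("master", i)
        if llr <= lower:
            return ("nonmaster", i)
    return ("undecided", len(responses))
```

Because the test stops as soon as either boundary is crossed, a consistently strong or weak examinee is classified after only a handful of items, which is the source of SPRT's efficiency relative to fixed-length tests.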
We propose an SPRT-based CAT approach that inherits SPRT's strength of a maintenance-free item pool while approximating the grade classification offered in the spirit of IRT. In addition, to show the validity of the proposed approach, the criterion validity method (Zikmund, 1997) is adopted by comparing an English CAT prototype system based on the proposed approach against the "Test of English as a Foreign Language (TOEFL)" standard. Criterion validity was chosen because the potential source of construct-irrelevant variance originating from examinees' unfamiliarity with computers had been studied and found to be negligible (Taylor et al., 1999). Technically speaking, criterion validity answers questions like "Does my measure correlate with another measure of the same construct?" (Zikmund, 1997). …
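In practice, the criterion-validity question above reduces to computing a correlation between paired scores from the two measures. A minimal sketch, with entirely hypothetical paired scores standing in for prototype-CAT and TOEFL results:

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation between two paired lists of scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired scores for five examinees (illustration only).
cat_scores = [52, 61, 70, 74, 88]
toefl_scores = [480, 520, 560, 590, 640]
r = pearson_r(cat_scores, toefl_scores)
```

A correlation near 1.0 between the two sets of scores would support the claim that the prototype measures the same construct as the criterion.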