A recent graduate class in tests and measurements was introduced to the topics of authentic assessment, alternative assessment, and assessment portfolios. Most of the 26 K-12 teachers in the class were relying on tests provided by textbook publishers to assess student achievement. They were surprised to learn that such tests often have low validity, poor reliability, and low correlations with other measures of student achievement (Witt, 1994; Stiggins, 1997). In class discussion, the K-12 teachers asked whether changes in their testing methods might improve students' performance on standardized measures of achievement such as college entrance exams. Their interest in experimenting with alternative testing techniques was stimulated by literature suggesting that effective questioning leads to improved critical thinking, as reflected in students' grades and other performance measures (Hunkins, 1995).
The K-12 teachers in this project studied teacher-made tests, Bloom's taxonomy of educational objectives (cognitive domain), and a variety of objective techniques designed to elicit diagnostic information about student performance at the knowledge, comprehension, application, analysis, synthesis, and evaluation levels (Bloom, 1956; Metfessel, Michael & Kirsner, 1969). The teachers benefited from Metfessel et al.'s (1969) examples showing how to measure a student's ability to perform specific desired behaviors, and they began to see the connections between the critical elements of content mastery and behavioral competence.
The K-12 teachers studied the "10 designs for assessment and instruction" developed by Carlson (1985), under the auspices of the Educational Testing Service. These designs include:
* matching items
* master list (keylist) items
* tabular matrix items
* best answer items
* greater-less-same items
* rank order items
* question and short answer items
* statement and comment items
* experiment/results items
* experiment/results/interpretation items
Carlson's (1985) test models incorporate strategies designed to assess the levels in Bloom's taxonomy from knowledge through synthesis. Moreover, the models use item formats typically encountered in the standardized tests on which most school districts rely, in college entrance examinations, and in career "gateway" tests.
The K-12 teachers were required to choose among several options for a course project. One option was to convert one or more of their current tests, using one of Carlson's models. Twelve teachers chose this option and four of them collected data on student performance. The data these four teachers collected were analyzed using both quantitative and qualitative techniques.
The quantitative results were obtained from 24 second grade students. The data are student test scores on three versions of a test: a publisher's test, a matching test using pictures, and a matching test that did not use pictures. The test content in the three versions was identical; only the testing format changed. The teachers observed that the publisher's test focused on recall, while the matching tests required students to use critical reasoning skills.
Other teachers (seventh-grade math teachers and eighth-grade social studies teachers) collected qualitative data through systematic observation of their students. At the outset, the math teachers believed that objective testing would be less rigorous. Some thought that the benefit of alternative testing would not be worth the time and effort required. Most of the math teachers wanted to "see" the work that went into the problem solving and were less interested in seeing only an answer.
Quantitative analysis - Subjects included 24 second grade students who took three versions of a test during a period of 10 school days. …