Magazine article Training & Development Journal

IBM Takes the Guessing out of Testing

Article excerpt

An innovative new way to assess training courses can measure not only trainees' knowledge but also their confidence in their answers.

Measuring how effectively training satisfies corporate goals is a major challenge for training and development specialists. Many training programs have no assessment mechanisms to determine their quantitative effectiveness. Instructors typically solicit the opinions of the students. They ask, "Did this course meet its stated objectives?" or "Did you enjoy this course?" But "happiness-sheet" feedback hardly answers the crucial question, "Are the students learning anything?" To answer that question, trainers must measure knowledge gain and retention.

In 1985, IBM's internal education organization began a programmatic research effort to find out whether new, internally developed training technologies increased knowledge transfer and retention. In an experimental course, the organization successfully developed and implemented a unique method of knowledge testing, employing pre- and post-tests. The testing goes beyond reaction measurement (Kirkpatrick's Step One) and also provides more information than that gained from knowledge testing (Step Two). (Donald L. Kirkpatrick discusses his four steps for training evaluation--measuring reaction, learning, behavior, and results--in "Evaluation of Training," Training and Development Handbook, edited by Robert L. Craig.)

Instrument construction

A six-hour experimental course, designed to make maximum use of new training technologies, was the initial target for the new program of knowledge measurement. The course designers separated the course objectives into 10 main segments; each segment encompassed, on average, 10 key learning points. From that material, 100 content-valid, true-or-false questions were written. The course designers took great care to write questions that were challenging but fair, and relevant to the learning points and course objectives. In fact, the designers' goal was to create, when possible, questions that were "mini-case-studies"--that is, questions that required some situational analysis as well as knowledge of a principle or idea. In addition, the designers compared the learning points with the questions to ensure that the questions thoroughly covered the course's 10 knowledge areas and the learning points within each area.

After extensive experiments with the questions to obtain estimates of difficulty, the designers created two non-overlapping, parallel families of tests (four tests per family). Each test contained 25 questions, balanced according to the knowledge areas. Because of the particular circumstances, the designers considered that multiple forms of the test were important, not only to ensure differences between pre- and post-test questions, but also to minimize the ability of instructors to "teach to the test" instead of teaching the course as designed. In the same way, particular circumstances (such as time constraints) dictated the choice of the true/false format and the number of questions.
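The article does not say how the forms were actually assembled, so the following is only a rough sketch of one way the arithmetic could work out. It assumes exactly 10 questions in each of the 10 knowledge areas, splits each area's questions in half to form two disjoint families of 50, and draws each 25-question form with two or three questions per area. The names build_families and draw_form are hypothetical, question IDs are placeholder (area, index) pairs, and the difficulty matching the designers relied on is left out entirely. Under these assumptions, forms within the same family may share questions.

```python
import random

AREAS = 10        # knowledge areas in the course
PER_AREA = 10     # assumed: exactly 10 questions per area (100 in total)
FORM_LENGTH = 25  # questions on each test form


def build_families(seed=0):
    """Split the pool into two disjoint families of 50, five questions per area each."""
    rng = random.Random(seed)
    family_a, family_b = [], []
    for area in range(AREAS):
        items = [(area, i) for i in range(PER_AREA)]  # placeholder question IDs
        rng.shuffle(items)
        family_a.extend(items[:PER_AREA // 2])
        family_b.extend(items[PER_AREA // 2:])
    return family_a, family_b


def draw_form(family, rng):
    """Draw one 25-question form: two questions from every area, a third from five areas."""
    by_area = {}
    for area, index in family:
        by_area.setdefault(area, []).append((area, index))
    # 25 questions over 10 areas leaves 5 areas that contribute an extra question.
    extra_areas = set(rng.sample(sorted(by_area), FORM_LENGTH - 2 * AREAS))
    form = []
    for area in sorted(by_area):
        count = 3 if area in extra_areas else 2
        form.extend(rng.sample(by_area[area], count))
    return form


family_a, family_b = build_families(seed=1)
rng = random.Random(2)
forms_a = [draw_form(family_a, rng) for _ in range(4)]  # four forms per family
forms_b = [draw_form(family_b, rng) for _ in range(4)]
```

Because the two families share no questions, a pre-test drawn from one family and a post-test drawn from the other cannot repeat an item, which is the property the designers wanted.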

Trainees could answer each question, of course, either correctly or incorrectly. The designers, however, developed a true/false format that they hoped would be more interesting to those taking the test and at the same time yield more information than a standard true/false test. For each question, students could answer that they were "reasonably sure" of a true or false answer, or that they were "not sure but probably" the answer was true or false, or that they didn't know the answer. That way, the tests measured not only what students knew but also their confidence in that knowledge.
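The five response options described above can be represented as a small data structure. The sketch below is illustrative only: the Response enum, the classify function, and the labels it returns are hypothetical names, and no point values are assigned because this excerpt does not give the scoring rules.

```python
from enum import Enum


class Response(Enum):
    """The five answers a trainee could give to one true/false item."""
    SURE_TRUE = "reasonably sure: true"
    PROBABLY_TRUE = "not sure, but probably true"
    DONT_KNOW = "don't know"
    PROBABLY_FALSE = "not sure, but probably false"
    SURE_FALSE = "reasonably sure: false"


def classify(response, answer_key):
    """Return (outcome, confidence) for one item.

    outcome is "correct", "incorrect", or "no answer";
    confidence is "sure", "tentative", or None.
    answer_key is True if the statement is true, False otherwise.
    """
    if response is Response.DONT_KNOW:
        return ("no answer", None)
    chose_true = response in (Response.SURE_TRUE, Response.PROBABLY_TRUE)
    is_sure = response in (Response.SURE_TRUE, Response.SURE_FALSE)
    outcome = "correct" if chose_true == answer_key else "incorrect"
    confidence = "sure" if is_sure else "tentative"
    return (outcome, confidence)


# A trainee who is confidently wrong differs from one who is tentatively wrong.
confident_miss = classify(Response.SURE_FALSE, True)
tentative_miss = classify(Response.PROBABLY_FALSE, True)
```

Keeping correctness and confidence as separate fields makes it possible to tell a confident wrong answer from a tentative one, which a standard true/false test cannot do.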

Administration and scoring

Before presenting the tests, instructors gathered data on student demographics, such as intracompany organization, occupation, time in profession, and time in company. That data would be used, for example, to analyze which internal organizations did particularly good jobs of preparing their students. …
