and finance. After 622 A.D., competitive tests were administered on three levels. Candidates first had to pass a local district assessment before they could compete at the provincial capital. Those who passed at the provincial level were honored with a twenty-four-hour exam in Peking, after which they went home to await the official report on their place in the "scale of merit." Assessment changed little in the following centuries until the rise of a technocratic civilization.

In America, until about 1850, most examinations were limited to recitation and long, difficult essays (Davis, 1971; Lien, 1976). The lack of reliability implicit in scores from these assessments was apparent as early as 1845, when Horace Mann argued in favor of the "new type examinations" because of their objectivity (Ruch, 1929). However, it remained for researchers in the early 20th century to support Mann's point empirically (Buch, 1916). Accountability and standardized forms of evaluation quickly became issues in measurement and evaluation. Mann, who was to become a major influence in the development of assessment (Davis, 1971; Ruch, 1929), wanted to develop an instrument that would measure the progress of pupils in the Boston school system. Principals were to be fired if their students did not show satisfactory results on the tests.

The wave of immigration into the United States during the mid-19th century exerted a subtle pressure on the development of assessment. There was a desire to sort out the "rough," uneducated new immigrants by using "scientific" measurement instruments. Many educators argued for the development and measurement of educational standards (Boten, 1932; Buch, 1916; Monroe, De Voss & Kelley, 1917). Tests were developed for spelling, language, and arithmetic skills (Boyington, 1932; Buckingham, 1916; Courtis, 1913, 1916a, 1916b; Gray, 1923; Hall, 1911; Monroe, 1923; Pryor, 1923).
The "new type examinations" estimated a student's knowledge by the number of words spelled correctly or the number of problems completed successfully in a given period (Ebel, 1970; Thorndike, 1911, 1912, 1918; Washburne, 1922). Psychology at this time was developing into a "rational science," and mathematics as well as statistics began to invade what had been thought of as a purely subjective realm. Inevitably, calls for quantitative and content standards were voiced by renowned educators (Thorndike, 1911, 1912, 1918; Washburne, 1922).

Around the turn of the century, two milestones in modern assessment occurred. In the late 1800s, Francis Galton established his laboratory in South Kensington, England. Galton measured and recorded individual differences in human physical characteristics and in sensory and psychomotor responses. His results, published in 1869 and 1883, seemed surprising: individual differences were distributed according to a mathematical model called the "normal curve of error" (Harris, 1966; Hashway, 1977c; Freund, 1962). This discovery led many to believe that mental and physical attributes must also follow the "normal curve." If the results of an assessment were not normally distributed, they inferred that there was something wrong with the test, rather than that whatever the assessment was measuring was not normally distributed. Ruch (1929) attributes the normal