Models for Assessing Art Performance (MAAP): A K-12 Project
Dorn, Charles M., Studies in Art Education
This study reports on the Models for Assessing Art Performance (MAAP) project, an NEA-funded study to assess K-12 student learning through art teacher assessments of student portfolios. The study participants included 70 K-12 art teachers and 1,000 students in 11 school districts in Florida, Indiana, and Illinois. The study investigators included art education faculty from Florida State University, Northern Illinois University, and Purdue University. Included are a rationale for the study, the development of the test instruments, and the training of teachers in portfolio assessment and curriculum development. The results of the study confirm that teachers can, with appropriate training, conduct the assessment of K-12 student artwork and create their own standards for adjudicating artworks, subject to the abilities of the teachers, the students, and the schools.
There is increased pressure from school administrators and state Departments of Education to regulate how art teachers assess K-12 student art performances. Due to the lack of art assessment tests, of opportunities for training in art assessment, and of information on authentic means of assessment, it was proposed that a cooperative effort by three university art education faculties at Florida State, Purdue, and Northern Illinois Universities and four U.S. school districts in Florida, Indiana, and Illinois undertake the research and development of pre-K-12 art assessment models that could be replicated in the nation's schools. This effort was implemented through three major activities: 1) teacher training and assessment development institutes, 2) applied assessment and technological research in school art classrooms, and 3) dissemination of the results of research to the art teaching profession. Charles Dorn from Florida State University, Bob Sabol from Purdue University, and Stan Madeja from Northern Illinois University conducted the training and supervised the research in the 11 school districts in Florida, Indiana, and Illinois. The Models for Assessing Art Performance (MAAP) project, which emphasized teaching, research, and service, relates directly to the mission of all three teacher education institutions and to the needs of the school districts in meeting the demands set by national and state Goals 2000 achievement standards.
The Need for Reform
While school reformers see testing as a significant part of the reform effort, more effort needs to be expended on answering the question of how such testing efforts relate to what teachers teach. Cusic (1994) observed that teachers in school reform should accept personal interpretation and choice as central to their professionalism. As quasi-autonomous individuals, they teach in their classrooms in an independent and self-reliant manner and most often behave as individuals rather than as a collective force. Usually, teachers do not feel free to join or not join the reform effort unless they are directed to do so by state-mandated compliance. If, as Cusic noted, teachers are the deciding element in school reform, he felt they should feel free to join or not join the reform effort and, further, to be able to regulate themselves and set their own policies and standards. The fear of regulators and reformers is, however, that granting such freedom will cause teachers to question the reforms and even argue for increased individual rights and privileges rather than reform their teaching and thereby improve learning. Given a choice, reformers would rather mandate teacher compliance and, eventually, also mandate the means for assessment.
Testing as Reform
Although the federal and most state governments are currently committed to testing all elementary and secondary students, few standardized visual arts tests beyond the NAEP test are available for teachers or school districts to use. Moreover, paper-and-pencil, true-false, and multiple-choice tests, and even essay-type responses, rarely provide adequate estimates of what students learn in most K-12 school art programs, where studio-based activity is the primary means of instruction. What the educational reformers would like to see is a single art test that can measure what students know and are able to do in all of the nation's art programs. No such tests are available, due both to the lack of adequate means to quantify expressive activity and to the unwillingness of all the nation's art teachers to teach art in the same way.
The Art Teacher's Role in Reform
For effective art learning we have to answer the question of why we believe we can embarrass teachers and school administrators into higher levels of professional performance through the imposition of a single set of predetermined educational standards (Eisner, 1992). Is there any reason, Eisner asked, why we should expect future educational policy reforms to have any greater influence than those of the past? Most school people doubt that such federal initiatives will make a difference, believing that if a change is to occur, it will have to come from within, rather than outside the schools. Arts educators can contribute to this change process, but only if they are willing to accept the empowerment to become the principal change agents in educational decision making. Can we, as Eisner notes, continue to think that better teaching and more caring schools will be created by a national report card that forces every student to arrive at the same destination at the same time, with a single set of aims, curriculum, and standards for all? Such reforms tend to make teachers more cynical and more passive.
The school art assessment context raises a number of questions regarding the most effective role of the teacher in the assessment process, including what kinds of assessments art teachers prefer to use, the teachers' lack of assessment training, and the appropriateness of paper-and-pencil, true-false, or multiple-choice tests in assessing student progress in art. There are, in fact, several alternatives to paper-and-pencil tests in art, including different approaches to portfolio assessment that take into consideration the connections between school assessment and the school art curriculum. What is needed is an authentic school assessment model that involves art teachers as stakeholders in the assessment process.
The Models for Assessing Art Performance (MAAP) Project
The Models for Assessing Art Performance (MAAP) project came about as a result of the Goals 2000: Educate America Act (U.S. Department of Education, 1994) reform and other state and federal mandates such as NAEA (1988) and NAEP (Persky, 1999) which have heightened interest in assessing American schools. While previous reform efforts also encouraged assessments, none quite equaled current efforts to set both national and state achievement standards and require the testing of students in schools as evidence of the school and the teachers' accountability. Indeed, many states have already made student testing a matter of law, holding teachers to be in compliance with the state and national standards and accountable in terms of assessing student educational progress.
The program plan developed by the research team focused on meeting four important needs: 1) helping teachers to understand and learn how to administer an authentic assessment model for evaluating student work in their own classes; 2) helping teachers develop an assessment plan they could adopt for use in their classrooms and schools; 3) devising a data collection system that meets the needs of the art student; and 4) meeting the needs of the school and state and national art assessment standards.
The project was a cooperative effort by three universities and the Pinellas and Dade County school districts in Florida, Washington and Wayne Township districts in Indiana, and 11 independent school districts in Illinois. The Florida project focused mainly on the assessment of student art portfolios and on the in-service training of teachers in curriculum development and in art studio practice. The Indiana project conducted both the studio and curriculum training of teachers, and in addition, the assessment of teacher, student and artist attitudes toward assessment in art education. The Illinois group focused mainly on alternative ways to develop and assess student electronic portfolios.
Project activities included: 1) training in the use of art rubrics in assessing pre-K-12 student art performance; 2) experiences in using blind scoring methods by peer teachers to validate teacher-scored student work; 3) training in the use of authentically scored student art as a curriculum tool for the improvement of art instruction; 4) developing assessment portfolios and analytical rubrics for special needs; and 5) developing assessment instruments and methods of reporting consistent with student needs and with Goals 2000 state and school district standards. The institute instructors included artists, curriculum and assessment specialists, and art teacher educators. The artists contributed the aesthetic and technical knowledge necessary for the teachers to increase their expressive abilities. This knowledge was used to ensure the philosophical validity of the teachers' curriculum, maintaining consistency with the means and ends of art and providing for accurate and significant representation of the products of artistic inquiry. The curriculum and assessment specialists assisted the teachers in writing lesson plans, developing rubrics and portfolios, and creating methods for reporting the results of assessments.
The research component of the MAAP project was conducted by the three university researchers, who applied qualitative and quantitative research methods to produce descriptive statistics and conduct data analysis about the assessment procedures used by teachers. These procedures included the independent peer ratings of student art products; the methods teachers used to assess production; how assessment information was used; and how students and teachers were impacted by the classroom assessments developed. In the evaluation of new assessment models, the researchers observed teachers and students in the schools, developed and used interview instruments, analyzed measures of performance, and participated in the dissemination effort.
Aims of the MAAP Project
The primary consideration in the design of the student portfolio assessment part of the study was to decide whether the teacher training in three studio and curriculum development workshops would affect the art performances of the teachers' students. The portfolio assessment study sought more specifically to test the reliability of the instruments used, the procedures used to train the teachers in the assessment process, and the utility of the instruments in estimating student progress over time. The four research questions considered were whether
* the portfolio assessment process could systematically quantify student art performances;
* there was inter-rater reliability among the teachers scoring the pre- and post-test portfolios as a combined group;
* the raters' scores within each class were normally distributed and provided sufficient score spread; and
* the gains or losses in student portfolio scores were evenly distributed among students in the lower and higher performance categories.
The design of the portfolio assessment study involved repeated measures, that is, multiple observations of the same K-12 students. The design was a one-group, pre-test/post-test design (O X O) with the students serving as their own control group. The measure taken before the training (the pre-test) represented a baseline, and the measure taken after the training (the post-test) represented the influence of the treatment. The participants consisted of students from grades pre-K to 12. Two-stage cluster sampling was used. Teachers in 51 schools in three states volunteered to participate in the portfolio assessment part of the study. The teacher in each school selected one class, and performance assessment measures were applied to two portfolios from each student in each of the selected classes. The measures included three teacher ratings on each student art portfolio containing four works gathered before and after the teacher training.
Each teacher collected four student artworks from the same class to form portfolio A-1 (the pre-test), which was scored using rubrics on a scale of 1 to 4 (4 being high and 1 being low) by the teacher, with two additional teachers blind scoring the same portfolio. These works were scored again, along with four new works gathered at the completion of the training, by three teachers in the study group (B-1 and B-2).
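As a concrete illustration of this rating scheme, the sketch below uses hypothetical scores, not the project's data, to show how each portfolio receives three 1-to-4 rubric ratings and how a crude exact-agreement rate can give a first sense of inter-rater consistency. The function names and the agreement measure are assumptions for demonstration; they are not the project's actual analysis.

```python
from itertools import combinations
from statistics import mean

def portfolio_score(ratings):
    """Combine three judges' 1-4 rubric ratings on one portfolio."""
    assert all(1 <= r <= 4 for r in ratings)
    return mean(ratings)

def exact_agreement(score_table):
    """Fraction of judge pairs giving identical scores across portfolios.
    A crude stand-in for the inter-rater reliability the study examined."""
    agree = total = 0
    for ratings in score_table:
        for a, b in combinations(ratings, 2):
            total += 1
            agree += (a == b)
    return agree / total

# Hypothetical A-1 (pre-test) ratings: one row per student portfolio,
# three judges per row (the class teacher plus two blind scorers).
pretest = [(3, 3, 2), (2, 2, 2), (4, 3, 4), (1, 2, 1)]
scores = [portfolio_score(r) for r in pretest]
agreement = exact_agreement(pretest)
```

On the hypothetical rows above, half of all judge pairs agree exactly; a fuller reliability analysis might use a chance-corrected statistic such as Cohen's kappa.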
Eighteen 1- and 2-day, pre-K-12 assessment research and development institutes were presented in Clearwater, FL; Indianapolis, IN; and Mundelein, IL. The project's institutes involved 71 pre-K-12 public school art teachers. The aims were to train them to 1) administer a field-tested, authentic pre-K-12 assessment model on student artwork; 2) develop and test teacher-designed assessment models for use in the cooperating school districts; 3) organize a data collection system for pre-K-12 student assessment; 4) report the assessment data collected in formats that met individual school, school district, and state assessment standards; and 5) participate in selected studio experiences and curriculum development.
The portfolio assessment part of the institutes had two major goals. The first was to develop a process whereby teachers could learn to accurately assess student art performances in the context of what different school art programs with different curricula and different students actually do. The second goal was to develop a teacher in-service education program that would focus on enhancing the teachers' own creative work and on using this enhancement to improve the quality of their teaching and subsequently the quality of their own students' work. This was accomplished in two ways: 1) hands-on workshops in creative forming and 2) developing new studio curricula for the art classroom. Three half-day studio workshops were offered at three project sites in Florida and Indiana.
The teacher studio workshops, taught by secondary art teachers, studio artists, and college-level art instructors, offered three intensive studio workshop sessions on drawing from the figure, drawing and painting from still-life subject matter, and creating an imaginative abstract work. The three sites differed slightly in what was presented and in what goals were to be accomplished, but all focused on engaging in basic studio practices in observational drawing.
Although each site approached the studio problems in different ways, all provided some skill training, an introduction to the use of art media, drawing from observation and from imagination, problem assignments, and follow-up critiques of the teachers' work. The teachers also received lesson plan handouts, a vocabulary of terms, slides showing examples of both students' and professional artists' approaches to the problem, suggested resources and, in some cases, duplicate slides of student works that could be used in lesson plan development.
At the conclusion of the studio workshops, teachers in the Indiana and Florida school sites began the process of constructing lesson plans based on what they learned in the studio session as it might apply to their classes. To assist the teachers in lesson planning, they were presented with guidelines that summarized the focal and procedural knowledges contained in the curriculum on which the rubrics were based, with suggestions about goals, motivation, materials, equipment, and procedures. Teachers could choose to follow that outline or adapt it to their districts or their individual lesson plans.
Critiques of Teachers' Work and Students' Work
Two kinds of critiques were organized following the studio workshops. One was conducted by the workshop instructors on the teachers' own creative work. The other was based on examples of student work collected by the teachers from their study class. The critique of the student work was led by the project director, the art supervisor, or by the teachers themselves.
Critiques of the teachers' work were conducted by the workshop instructor at the close of the studio sessions, where the work was either spread across the studio floor or displayed in an upright position in a hallway or gallery. The critiques generally included an analysis of the work by the workshop instructor and a dialog with individual teachers and with the group as a whole.
Each teacher brought to the workshop four to six sample student works reflecting the lesson plan they developed at a previous studio session. These were displayed on the wall, with an open discussion following the teacher's review of the work. Generally, the discussion included an exchange of ideas on the quality of the product, the teacher's lesson plan and its implementation, and what ideas teachers could offer each other on how the lesson could have been improved or reorganized to better meet the needs of students at different grade levels. These discussions among the teachers about what they were attempting to teach and how they carried it out seemed particularly rewarding to the teachers, who rarely have opportunities to share their ideas and concerns about teaching with colleagues.
Because the research needed to reflect current national educational goals, it was necessary that the 71 teachers become familiar with National Standards for Art Education (MENC, 1994) and the various art teaching standards advocated by arts professional associations, the state, and by the schools charged with the responsibility of assessing the quality of instruction in American schools. It should be noted, however, that the national instructional standards in art as published by a consortium of national arts education associations were not necessarily the same as the standards adopted by some state departments of public instruction or those developed by the governing board of the National Assessment of Educational Progress (NAEP).
The Construction of the Assessment Instruments
To build an assessment instrument, the researchers first had to decide what it was students needed to know. The researchers were committed to making the artistic process the primary goal and using national standards mainly as a guide. This required that the process begin with a typology of practice rather than the analysis of a set of behaviors connected to selected art world figures and their power struggles. Next, it was necessary to decide on the achievement standards, the specified student behaviors, and the levels of achievement to be assessed. It was recognized that the national standards were more or less ideal achievements to be met at specified 4-year intervals: grades 4, 8, and 12. They were based on the assumption that conceptual thinking is sequentially ordered in accordance with the hierarchy set by Bloom's taxonomy, in which students move from descriptive to analytical behaviors. Therefore, at grades K-4, they know, describe, and use; at grades 5-8, they generalize, employ, select, analyze, and compare; and at grades 9-12, they conceive, evaluate, demonstrate, reflect, apply, and correlate (Bloom, 1956). While these descriptors can be useful in setting sequential performance standards, they also assume the student will achieve those higher order thinking skills most closely associated with inductive and deductive modes of thinking. They do not, as a consequence, mention such behaviors as seeing, noticing, and performing, where, at various levels, students are expected to note such things as shape differences, positions, distance, and direction; to control arm movement; and to be able to paint, draw, cut, tear, measure, unfold, recombine, think metaphorically, represent, exaggerate, think symbolically, and reason metasystematically.
The researchers used Vernon Howard's (1997) Typology of Practice in the construction of the test instruments in order to jointly reflect the students' knowing that, which is what students needed to know cognitively, and their knowing how, which concerns students creating expressive objects of meaning. Neither the that nor the how was considered more important than the other; rather, both were to become apparent in the unification of form and matter in the expressive object.
Because the instruments designed for the project were needed primarily for performance assessment, it was decided that an authentic assessment approach would be more consistent with the goals of the project. Authentic assessment requires the construction of alternative assessment items (Armstrong, 1994) that are considered an alternative to traditional objective tests and essays. They are focused on student performance, which is observable evidence of what students know and can do. Authentic assessment calls for authentic performances, which include the behaviors of aestheticians, architects, art historians and critics, artists such as folk artists, people working in all forms who confront art in their daily lives and people whose avocational activities relate to art. The authentic performance tasks used in the project's assessment process were real life decisions that grew out of the curriculum, were feasible in terms of available time and resources, and could be scored and reported in ways that satisfied teachers, parents, and administrators. The performance assessments, furthermore, were designed in such a way that they included:
* both the procedural and focal knowledge that students needed in order for them to know how and be able to do various learning activities in the arts;
* the core performance roles or situations that all pre-K-12 students should encounter and be expected to master;
* the most salient and insightful discriminators that could be used in judging artistic performance;
* sufficient depth and breadth to allow valid generalizations about student competence;
* the training necessary for those evaluating artistic performances to arrive at a valid and reliable assessment; and
* a description of audiences that should be targeted for assessment information and how that assessment should be designed, conducted and reported to those audiences.
The most important concern in the physical design of the performance assessment was that it reflect the nature of the exercises already embedded in the art curriculum and that it encourage students to study their own train of thinking as revealed, perhaps, in notes, sketches, or practice efforts. Not every behavior that might be assessed is evident in a single work, which requires that the performance description specify the steps to be followed prior to and during the execution of a work, or that the behavior be made evident in a succession of works. Procedural skills, such as practicing toward improvement, doing something smoothly and quickly, understanding the direction a practice session should take, controlling improvement, or getting the "feel" of something, are equally difficult to discover in a single product.
The Design of the Scoring Rubrics
A holistic rubric has two particular virtues: it communicates how the work appears in the context of other works, and it provides a scoring system that is easy to learn and use. Holistic scoring requires a general assessment of a group of works looked at as a whole, producing a single score based on a 4-point scale. The scoring rubric uses four sets of established criteria for scoring student portfolios or performances. It describes the four levels of performance a student might be expected to attain relative to a desired standard of achievement. It also provides performance benchmarks, which tell the evaluator what characteristics or signs to look for in a student's work and how to place that work on a 4-point scale.
The rubrics used to assess performance in grades K-12 also used maturation benchmarks that reflected increasingly higher levels of performance based on both the maturity level of the student and the expectation that as students progress, they will receive the benefits of more advanced instruction in art. Higher level (secondary) rubrics contain descriptors that reflect increasingly higher levels of thinking and visual abstraction.
Four rubrics were designed, one each for grades pre-K-2, 3-5, 6-8, and 9-12, each specifying four performance levels: excellent, very good, satisfactory, and inadequate. The rubric descriptors at each level reflected sequentially organized, age-appropriate cognitive, aesthetic, and technical skills. They were designed to measure performance content specified in Florida's Sunshine State Standards A and B, which, like the national standards, specified content in 1) understanding and applying media, techniques, and processes and 2) using knowledge of structures and functions.
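The rubric layout just described, four grade bands crossed with four performance levels, can be represented as a simple lookup structure. The sketch below is only an illustrative encoding of the labels named in the text; the numeric grade cutoffs and function names are assumptions for demonstration.

```python
# Grade bands and performance labels as named in the text; the numeric
# cutoffs in band_for_grade are illustrative assumptions (0 = kindergarten).
GRADE_BANDS = ("pre-K-2", "3-5", "6-8", "9-12")
LEVELS = {4: "excellent", 3: "very good", 2: "satisfactory", 1: "inadequate"}

def band_for_grade(grade):
    """Map a numeric grade to the rubric band that would score it."""
    if grade <= 2:
        return "pre-K-2"
    if grade <= 5:
        return "3-5"
    if grade <= 8:
        return "6-8"
    return "9-12"

def label(score):
    """Translate a 1-4 rubric score into its performance label."""
    return LEVELS[score]
```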
The performances specified in the rubrics came from three sources: Piaget's (1952) pre-operational, early concrete operational and formal operational stages; Lowenfeld's (1947) scribbling, pre-schematic, schematic, gang stage, reasoning stage, period of decision stage; and McFee's (1961) skill improvement stages, which include searching for pattern, using verbal descriptions of space, exploring consistencies in shape, form and size, manipulating things as a unit, taking an average of things, completing visual wholes and recognizing patterns in figure and ground.
The rubrics used to evaluate all the portfolios used a 4-point scale, with a score of 4 (excellent) being high and a score of 1 (inadequate) being low. The instruments and the adjudication process itself were modeled after the "A" (quality) section portfolio review and adjudication process used by the College Board's Advanced Placement (AP) Program in Studio Art, administered by the Educational Testing Service (Askin, 1985).
Adjudication of Portfolios
Teacher Training in the Use of Rubrics
Project teachers were first introduced to using the rubrics through the activity of scoring sample portfolios made up of sample student artworks. They were given the opportunity to study and question the rubric descriptors and were advised that the instruments listed some, but not all, of the possible descriptors that could be used. In judging the work, they also were advised that a student being scored at a given level might achieve most, but not necessarily all, of the descriptors listed for each qualitative level of performance. In the first formal adjudication, sample portfolios representing score levels 1 through 4 were selected in advance by the researchers and were scored by three project teachers, with the other project teachers in the group looking on. Afterward, the teachers discussed where they agreed and disagreed. The samples were discussed at the pre-K-2, 3-5, 6-8, and 9-12 levels. This benchmarking process preceded each adjudication conducted in the project.
Deciding What to Judge
Project teachers were expected to judge student portfolios containing four different two-dimensional works, that is, drawings or paintings selected by the teacher and/or the student to represent a "body" of work. The portfolios included four works using a variety of media and subject matter. The choice to use varied works rather than works reflecting common assignments was intended to reflect what actually occurs in American schools, where art teachers make different assignments and art students solve visual problems in different ways.
The Gestalt Method
When scoring the portfolios, the teachers were instructed that the rubric, which was reviewed at the start, was only a guide to be used as needed in the adjudication process. The teachers were told to apply the rubrics holistically, judging the four works as a whole and giving a single score guided by the benchmark training session and their own intuitive understanding of expressive forming as artists and as art teachers familiar with the art performances of K-12 students at a given level. The teachers were cautioned that this judgment process was to be used in order to assess the expressive quality of the four works as a whole, rather than to apply a reductionist scoring method that evaluated elements, principles, and techniques. They also were advised that the checklist method, although it can produce objective and valid scores, all too frequently overlooks the qualitative Gestalt, or "hair on the back of the neck" sense, of the power of the expressive object.
The teachers also were advised that they should plan to use all four scoring levels in their assessment, including at least a few "ones" (low) and "fours" (high). The benchmark sampling activity preceding the scoring process was used as a guideline to help the teachers mentally envision how student works could be evaluated on a 4-point scale where a portfolio of works of outstanding quality would receive a score of four and a portfolio of works of low quality would receive a one. A score of three would then be given to works which would be on the high side but not as strong as a four, and a score of two would be given to works on the low side but not as weak as a one.
The project teachers were advised that, in scoring the portfolios, they also needed to think about achieving sufficient score spread to ensure that the judges were discriminating by giving scores at the one, two, three, and four levels. They also were told to expect that among the portfolios they assessed there would be at least a few one and four scores, although the majority of scores given would probably be at the two and three levels. A four-point, rather than a five-point, rubric was chosen to keep teachers from avoiding critical decisions by scoring most of the work in the middle of the scale. Earlier pilot studies of art teachers using these instruments indicated that they more often than not scored the portfolios in the upper two quartiles of the scoring range, thus producing a skewed distribution rather than a normal, bell-shaped one.
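The score-spread advice above can be made concrete with a small tally routine. This is a hypothetical sketch, not a project instrument: it counts how often each rubric level was used and flags the high-side skew the pilot studies observed, using an assumed cutoff of more than half the scores falling at levels 3 and 4.

```python
from collections import Counter

def score_spread_report(scores, levels=(1, 2, 3, 4)):
    """Tally 1-4 rubric scores, check that all four levels were used, and
    flag the high-side skew the pilot studies observed."""
    counts = Counter(scores)
    upper_share = (counts[3] + counts[4]) / len(scores)
    return {
        "counts": {level: counts[level] for level in levels},
        "uses_all_levels": all(counts[level] > 0 for level in levels),
        "skewed_high": upper_share > 0.5,  # assumed cutoff, not the study's
    }

# Hypothetical scores from one judge across ten portfolios.
report = score_spread_report([2, 3, 3, 4, 2, 3, 1, 2, 3, 4])
```

A judge whose report showed `uses_all_levels` as false, or `skewed_high` as true, would be a candidate for the additional benchmark training described below.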
Selection of Artwork
The MAAP project required teachers to select one of their classes to be included in the study over a 4-to-8-month period. They were asked to organize, with the students' help, two portfolios of four works for each student, one collected at the beginning of the school year and a second collected at the end of the course, in either January or April. The first student portfolios of four works (A-1) were initially adjudicated in the fall by the students' teacher and by two other project teachers who volunteered to help judge the work. The choice to ask the students' teacher to score the portfolio in the A-1 rating was made both to ascertain the teacher's ability to independently score the portfolio and to provide a dialog between the teacher and the other independent raters about the need for objectivity in the rating process. The A-1 scoring results indicated that, although teachers who scored their own students' work had a high level of agreement with the independent judges, more often than not they scored their own students' work either somewhat higher or lower.
In the A-1 adjudication process, some teachers had difficulty objectively scoring their own students' work according to its overall expressive quality, preferring in some cases to score the work on how well the student followed the teacher's instructions or how much improvement the student had made over previous assignments. The ensuing dialog between the teachers and the other raters helped make it clear that works that do not follow a teacher's lesson plan may still be powerfully expressive. Most teachers did, however, have difficulty reconciling the work of children with disabilities with that of the other children in the class. It was decided that, in this project, the portfolios of these children would be viewed on the same basis as those of the other children in the class.
In scoring the portfolios, teachers were further advised during the benchmark-sample training sessions that a one-point difference in scores between two different judges was acceptable, but that a score difference of two or more points would suggest that the judges might not be looking at the same features in the works. Such disagreements could indicate what AP portfolio testers call "reader fatigue" (Askin, 1985), where judges become unfocused because they are tired or need a break. When this happened with some frequency, the adjudication was stopped in order to give the group a break or to give individuals or the whole group additional benchmark training. The one-point difference standard has been used by ETS in the AP Studio Art program, and similar measures are employed in the International Baccalaureate. Where raters continue to show a high level of discrepancy with other raters, different portfolio assessment programs solve the problem in different ways. Some programs ask the raters to reach agreement and change their scores. Some call for a chief scorer to change the scores arbitrarily. Other programs require judges to undergo pre-training activities; those who disagree too frequently with other raters are screened out in advance or, if used, not invited to return. This project followed the AP approach, which is to ask the raters whether they would like to change their scores or leave them as originally given.
When judges disagreed by more than one point, the adjudication leader asked the two judges to take a second look at the portfolio and reconsider the scores they had given it. Most often, one or both judges decided to change their scores so that they differed by no more than one point. If they still continued to disagree, the scores were left as they were.
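The project's discrepancy rule can be sketched in a few lines of code. This is an illustrative sketch with hypothetical function and variable names, not part of the project's materials: on the 4-point rubric, a one-point gap between two judges is acceptable, while a gap of two or more flags the portfolio for a second look.

```python
# Sketch (hypothetical helper names) of the MAAP adjudication rule:
# a one-point difference between two judges is acceptable on the 4-point
# rubric; a gap of two or more flags the portfolio for re-review.

def flag_discrepancies(scores, max_gap=1):
    """Return indices of portfolios whose two judges' scores
    differ by more than max_gap points."""
    return [i for i, (a, b) in enumerate(scores) if abs(a - b) > max_gap]

# Each tuple holds (judge_1_score, judge_2_score) on the 1-4 rubric.
portfolio_scores = [(3, 3), (2, 4), (1, 2), (4, 1)]
print(flag_discrepancies(portfolio_scores))  # portfolios needing a second look
```

Flagged portfolios would then go back to the two judges for reconsideration, as described above, rather than being rescored automatically.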
Summary of Findings
Before reporting the analysis of the data on the student portfolio assessments, it should be kept in mind that this was not an experimental study in which experimental and control groups were compared. It is not, therefore, possible to report empirically verified evidence that all the research goals were confirmed. Nor does the report present mean scores for the students, school districts, and project sites; this protects the privacy of the data and avoids misrepresenting the goals of authentic assessment, which are not about reporting failures or successes but about how children, teachers, schools, and school districts can do a better job of educating students. The findings do, however, tend to support all four research claims: that the process supports the quantification of expressive behaviors, that there was a high level of inter-rater reliability among teachers scoring the pre-test and post-test portfolios, that the scores were normally distributed, and that gains in mean scores were unevenly distributed among students scoring in both the higher and lower performance categories.
Although data from two four-work portfolios from nearly 1,000 students in 51 classrooms in 15 school districts were analyzed, in some cases the sample was too small to be certain about some of the conclusions or suggestions reported. The most important question to be answered was whether there was a high level of inter-rater reliability among the three different raters scoring each portfolio. These comparisons were reported in two ways: the A-1 initial scoring of the pre-test portfolios, and the B-1-B-2 assessment comparing the pre-test and post-test portfolios as one group. Although the same group conducted the A-1, B-1, and B-2 adjudications, the mean score gains in the B-1-B-2 adjudications were greater than those in the A-1 and B-2 comparisons. Scores on the pre-test portfolios scored separately tended to be higher than when they were later mixed and blind scored in the B-1-B-2 comparisons.
Figure 1 shows the inter-rater reliability as measured by the Spearman's rho coefficient for the correlations between the mean scores provided by the raters in the A-1 and B-1-B-2 assessments for all schools in Florida and Indiana. The table reveals that all the correlation coefficients were medium to low, but all were significant. The correlations between A-1 and B-2 were the lowest, and those between B-1 and B-2 the highest. The B-1-B-2 comparison suggests that in only 1% of cases could the difference between the pre- and post-test mean scores have arisen by chance, which suggests an extremely high level of agreement among the raters in the B-1-B-2 pre- and post-test comparisons.
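The rank-correlation measure behind Figure 1 can be computed with standard statistical software. The sketch below is illustrative only: the class means are invented for demonstration and are not the project's data, and the variable names are assumptions.

```python
# Hedged sketch of the Figure 1 computation: inter-rater agreement
# measured with Spearman's rho on mean portfolio scores from two
# adjudications. The score lists are illustrative, not MAAP data.
from scipy.stats import spearmanr

a1_means = [2.25, 3.00, 1.75, 2.50, 3.25, 2.00]  # pre-test (A-1) class means
b2_means = [2.50, 3.25, 2.00, 3.00, 3.00, 2.25]  # post-test (B-2) class means

# Spearman's rho correlates the *ranks* of the two score lists, so it does
# not assume the rubric scores form an interval scale.
rho, p_value = spearmanr(a1_means, b2_means)
print(f"Spearman's rho = {rho:.2f}, p = {p_value:.3f}")
```

Because rho works on ranks rather than raw values, it is a common choice for rubric scores, where the distance between a 2 and a 3 need not equal the distance between a 3 and a 4.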
Figure 2 answers the question of whether the rater scores for a given classroom were normally distributed and whether there was sufficient score spread to determine whether the test discriminated among the portfolio scores. The box plots depicting the distributions of scores in the pre-test (A-1) and the B-1-B-2 pre- and post-test comparisons reveal a smaller score spread for A-1 than for the B-1-B-2 comparisons, with the B-1-B-2 scores showing greater range and a more normal distribution than A-1 and with a greater number of high scores in B-2. This suggests that the final adjudication is more discriminating when comparing the B-1 and B-2 portfolios.
The question of whether there was an improvement in the group mean for each school following the training of the teachers cannot be statistically confirmed, but gains in class mean scores occurred in 33 of the 52 schools. Gains in mean scores comparing B-2 and A-1 occurred in four other schools, which suggests at least some mean score gains in 71% of the schools. On a 4-point scale, gains ranged from 1.0 (a 25% gain) to as little as .10, which suggests little if any gain. In 16 schools the mean class score increased by more than .50 points. In 11 schools, or about 30% of the schools, mean scores declined from pre-test to post-test. These declines ranged from .03 to .50 points, with nearly half being less than one-tenth of a point. It should be noted that a decline in the mean score for a given class does not mean that student performance decreased from where it began, but rather that some students' lower scores stand out when compared against a significant number of students showing significant gains.
To determine whether the losses or gains in class mean performances were significant, the Wilcoxon signed-ranks test, a nonparametric test for comparing two related samples, was used. The results were significant at the .05 level of confidence, with 19 of the 51 schools showing statistically significant gains or losses. Figure 3 reveals the gain or loss for the 51 schools, with .00 as the baseline. The figure also reveals that 36 of the 51 classrooms improved over time.
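The Wilcoxon signed-ranks test used here pairs each class's pre-test mean with its post-test mean and ranks the absolute differences. A minimal sketch follows; the paired means are invented for illustration and are not the project's data.

```python
# Hedged sketch of the significance test applied to class mean gains:
# the Wilcoxon signed-ranks test compares paired pre- and post-test
# class means. The score pairs below are illustrative, not MAAP data.
from scipy.stats import wilcoxon

pre_means  = [2.10, 2.45, 1.90, 2.60, 2.30, 2.75, 2.05, 2.50]
post_means = [2.40, 2.70, 2.35, 2.55, 2.65, 2.95, 2.20, 2.90]

# The test ranks the absolute pre/post differences and checks whether
# positive and negative shifts are balanced; it makes no normality
# assumption, which suits small samples of rubric-based means.
stat, p_value = wilcoxon(pre_means, post_means)
print(f"W = {stat}, p = {p_value:.3f}")
```

A p-value below .05, as in the project's reported result, indicates that the observed pattern of gains is unlikely to be a chance fluctuation in the paired scores.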
Figure 4 shows whether the gain in mean scores occurred evenly at all grade levels. The results suggest that the greatest gain occurred at grade levels 6-8, somewhat less at grade levels 1-3, and much less at grade levels 9-12 and 4-5, with the smallest gains occurring in grades 4-5.
It appeared that some of the 51 schools in Florida, Indiana, and Illinois revealed greater student gains than others. In Illinois, where no studio interventions were offered, there was little or no gain. Interestingly, in one of the Indiana districts where teachers did participate in the same studio workshops, there was actually a loss in the mean scores. However, it must be noted that substantial numbers of students with disabilities were included in these classes. Among the districts that showed gains, the mean score gain ranged from .15 to .35 on a four-point scale.
The final question was whether lower- and higher-performing students showed the same gains in performance. Figure 5 reveals the gains and losses in mean scores for the 51 schools according to the scores of higher- and lower-performing students, with the mean score for each classroom treated as the baseline "0." Increases appear above the "0" level and decreases below it. Figure 5 suggests that while fewer than half of the higher-achieving students improved their scores, 85% of the lower-performing students improved theirs.
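The Figure 5 framing, in which each classroom's mean serves as the "0" baseline, amounts to centering each student's gain on the class mean. The sketch below is an assumption about that arithmetic, with invented data and hypothetical names, not the project's procedure or figures.

```python
# Sketch (assumed data layout, hypothetical names) of the Figure 5
# baseline: subtracting the class mean gain puts the baseline at 0,
# so individual gains or losses appear above or below it.

def center_on_class_mean(gains):
    """Subtract the class mean so the classroom baseline becomes 0."""
    mean = sum(gains) / len(gains)
    return [round(g - mean, 2) for g in gains]

class_gains = [0.50, 0.25, -0.10, 0.35, 0.00]  # illustrative per-student gains
print(center_on_class_mean(class_gains))
```

Centering in this way makes classrooms with different overall means comparable on one chart, which is what allows higher- and lower-performing students across 51 schools to be plotted against a common "0" line.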
Figure 6 illustrates the gains among high- and low-performing students and suggests these gains were not evenly distributed. The figure suggests that lower-performing students, as measured on B-1, improved at all levels, with higher-performing students on B-1 showing at least some improvement at all levels except grades 4-6.
Results of the Student Portfolio Assessment
The principal study question, whether the assessment process was itself reliable, was confirmed on the basis of a study sample that included 51 classrooms and nearly 1,000 students. Some of the other conclusions, including the influence of the in-service studio and curriculum training on student performances at different levels, at different sites, and at different grade levels, require further study. What also should be kept in mind is that the study participants did not constitute a random sample, but rather were volunteers from 51 different schools and 15 different school districts, each operating in a different context with different school populations and with differing resources and levels of school support. Although comparisons between schools and between students were necessary in order to confirm the effectiveness of the assessment process, such comparisons do not support the goals of authentic assessment, which is designed not to compare teachers and schools with one another, but rather to assess student progress within a given classroom as a guide to improving the quality of instruction.
The analysis of the data derived from the adjudication of nearly 2,000 portfolios and 16,000 student artworks confirms that an authentic assessment process in which art teachers are trained in how to conduct it will produce quantifiable and reliable estimates of student performance in the making of expressive objects. Additionally, these results suggest that qualitative instructional outcomes can be assessed quantitatively, yielding score values that can be manipulated statistically and that produce measures that are both valid and reliable estimates of student art performance.
This adjudication process involving art teachers and their students clearly demonstrates that art teachers with appropriate training can evaluate student performances, govern themselves, and set their own intuitive standards for providing valid and reliable estimates of their own students' performances. It further demonstrates that these performances, which come from the school curriculum, reflect the values of the teacher, the student, the school program, and the goals of learning in art.
The study data further support the notion that teachers familiar with the nature of creative forming in art, using the project rubrics in authentic assessment settings, can conduct an assessment process that effectively measures student expressive outcomes, guided only by developmentally ordered rubrics and the teachers' own intuitive knowledge of artistic thinking and making. More importantly, these results suggest that there are viable alternatives to paper-and-pencil tests in art assessment, that teacher bias informed by teaching experience and intuitive understandings of art can even be a positive force in assessing art products, and that not all art teachers need teach, nor all students perform, in the same way.
The analysis of the data suggests that differences in student art performance and their progress will vary among different classrooms at different grade levels and in different school districts. These findings suggest that student and teacher abilities and school environments are unequally distributed in their effect on performance outcomes, and that comparisons made between the performances of teachers, students, schools, and school districts are neither useful nor compatible with the goals of improving instruction. A competitively ordered assessment of the performances of teachers and students in different schools and in different school districts is both inappropriate and counterproductive to achieving the aims of authentic school assessment.
The analysis of the data on the question of whether student expressive performances improved over time suggests, but does not empirically confirm, that gains in student performance may be positively related to the teacher workshop interventions, the grade level of the students, and the students' expressive abilities. Overall, student performance gains were unevenly distributed among different grade levels, among teachers receiving the same or different studio training, and among students of unequal expressive ability. These data, which support the idea that the students of teachers who received no training (Illinois) made less progress than the students of teachers who received the training, suggest that in-service training has a positive effect on student performance. Results from Indiana also suggest that some students of teachers receiving the same training may not benefit equally from it. This raises questions about both the quality and amount of training and how useful it was to the teachers who received it.
Recommendations for Further Study
Because the participants in the study came from different schools, it is suggested that the study be replicated with art teachers in a single school or in schools with similar populations. At the secondary level, this could be accomplished by studying several classes in a large high school; at the elementary level, by using classrooms from schools with similar populations. The expected outcome would be to see how student performances vary within the same or similar school settings. An effort also should be made to replicate the study in other areas of the United States using randomly selected school populations, in order to confirm the generalizability of the project's findings in different settings. Such a study would investigate the effects of involving other school populations, different individuals conducting the assessments, and different training activities as a means to increase the generalizability of the findings.
Another possible study would be to use other approaches to authentic assessment, including the use of different rubrics, teacher logs, teacher-constructed tests measuring content knowledge and approaches to student self-evaluation. Such measures could further confirm or strengthen the predictability of the project's performance rubrics and also provide measures to better assess what additional forms of art classroom performance need to be considered. An attempt should be made to develop a system-wide assessment plan using some or all of the procedures used in this study in order to test the feasibility of a system-wide art performance assessment. This should include the development of sampling techniques that could be used to accurately assess the school district's art program without assessing every child at every level over 12 years of schooling. This effort should include the use of electronic portfolios designed to provide longitudinal studies of student progress over 12 years of schooling in art.
There also is a need to study what kind of teacher support and training is most useful and has the greatest impact on teachers and on students' art performances. This study suggests, but does not confirm, that studio and curriculum training has a positive impact on teacher and student behaviors. It also suggests that studio practice without appropriate instruction in how to apply these strategies in the classroom may not be as effective in improving student performance as instruction that combines studio work and teaching methods. The effects of the training on teachers and students also could be investigated experimentally, comparing the effectiveness of different approaches to studio training on teacher and student performance.
This report was about an effort by 70 pre-K-12 art teachers and 1,000 students in three states to participate in an authentic art assessment study and a call for school administrators and legislators to reconsider a national testing policy that supports a single set of predetermined educational standards and assessments. The report attempts to chronicle the activities of these students and teachers in a yearlong effort to address the problem of art assessment in pre-K-12 schooling.
This project represented a unique collaborative effort among three major higher education research institutions in partnership with K-12 public school art teachers in the field. In addition, this effort produced substantially increased interaction among art education faculty at these institutions and led to the sharing of multiple viewpoints and philosophies while stimulating the development of the project. The collaboration among higher education faculty also fostered a mixture of knowledge and expertise, producing an even broader perspective with which to identify and resolve research questions and approach issues related to art assessment. This higher education collaboration also contributed to translating input from art teachers into the ways the preservice assessment training of art teachers is conducted.
The collaboration of researchers from higher education with art teachers from the field also provided a unique opportunity for matching accepted research methodologies and practices with the identified needs of art teachers. The blending of common interests and needs provided benefits to both groups. Findings from the studies, and the scarcity of school/college collaborations, suggest that there is a marked need for members of the higher education community to conduct school-based research with practical applications. The research and involvement of the higher education community with art teachers and schools offers the potential to improve art education in ways that other kinds of research may not. This research also contributes to the researchers' understanding of issues and concerns that are of importance to practitioners. By shaping research goals based on this kind of activity, researchers can better provide information and assistance in resolving the problems faced by practitioners. The practical value of this research will further benefit the research community by identifying additional issues and research questions of a philosophical and theoretical nature.
Outcomes from this project support the need for more school-based research involving collaborations between the higher education community and art teachers in the field. Additional studies of assessment and its implications for the field of art education are necessary. These studies, whether quantitative or qualitative, also should investigate the effects assessment has had on the field. Future studies also should include the impact of assessment on curriculum, instruction, and the quality of student work. Empirical studies of student achievement in art education, like those conducted in this project, can help provide a foundation for understanding the impact of assessment on learning in the field.
Copyright 2003 by the National Art Education Association
Studies in Art Education
A Journal of Issues and Research
2003, 44(4), 350-371
Askin, W. (1985). Evaluating the advanced placement portfolio in studio art. New York: The College Board.
Armstrong, C. L. (1994). Designing assessment in art. Reston, VA: National Art Education Association.
Bloom, B. S., Englehart, M., Furst, E., Hill, W., & Krathwohl, D. (Eds.). (1956). Taxonomy of educational objectives: Handbook I: Cognitive domain. New York: David McKay.
Cusic, P. A. (1994). Teachers regulating themselves in owning their own standards. In N. Cobb (Ed.), The future of education: Perspectives on national standards in America (pp. 205-257). New York: College Examination Board.
Eisner, E. W. (1992). The federal reform of schools: Looking for the silver bullet. NAEA Advisory. Reston, VA: National Art Education Association.
Florida State Department of Education. (1996). Sunshine State Standards. Tallahassee: Florida State Department of Education.
Howard, V. A. (1977). Artistic practice and skills. In D. Perkins & B. Leondar (Eds.), The arts and cognition (pp. 208-240). Baltimore: Johns Hopkins University Press.
Improving America's Schools Act of 1994, H.R. 6, 103d Congress, 2nd Session.
Lowenfeld, V. (1947). Creative and mental growth. New York: Macmillan.
McFee, J. (1961). Preparation for art. San Francisco: Wadsworth Publishing Inc.
Music Educators National Conference (MENC). (1994). National standards for arts education: What every young American should know and be able to do in the arts. Reston, VA: Author.
National Art Education Association. (1998). About the NAEP assessments. NAEA News. Reston, VA: Author.
National Art Education Association. (1999, February). Student achievement in the arts falls short. NAEA News, pp. 1-3. Reston, VA: Author.
Persky, H. R., Sandene, B. A., & Askew, J. N. (1999). The NAEP 1997 arts report card: Eighth-grade findings from the National Assessment of Educational Progress. Washington, DC: U.S. Department of Education, Office of Educational Research and Improvement.
Piaget, J. (1952). The origins of intelligence in children. New York: International Universities Press.
U.S. Department of Education. (1994). Goals 2000: Educate America Act. Washington, DC: Government Printing Office.
Charles M. Dorn
The Florida State University
Correspondence regarding this article should be addressed to the author at the Arts Administration Program, The Florida State University, School of Visual Arts & Dance, 128 Carothers Hall, Tallahassee, FL 32306-4480. E-mail: firstname.lastname@example.org.…