Effects of Behavioral Anchors on Peer Evaluation Reliability


This paper compares three peer evaluation instruments tested among students in undergraduate engineering classes: a single-item instrument without behavioral anchors, a ten-item instrument, and a single-item behaviorally anchored instrument. Studies using the instruments in undergraduate engineering classes over four years show that the use of behavioral anchors significantly improves the inter-rater reliability of the single-item instrument. The inter-rater reliability (based on four raters) of the behaviorally anchored instrument was 0.78, which was not significantly higher than that of the ten-item instrument (0.74), but the anchored instrument was substantially more parsimonious. The results of this study add to the body of knowledge on evaluating students' performance in teams. This is critical because the ability to function in multidisciplinary teams is a required student learning outcome of engineering programs.

Keywords: peer evaluation, assessment, behaviorally anchored rating scale


In this paper, we compare the inter-rater reliability of three peer-evaluation instruments when the instruments are used to adjust team members' grades based on ratings of their contributions to the team. The research setting involves project teams composed of junior-level engineering students. Our results show that adding behavioral anchors and descriptive instructions to a one-item instrument significantly increases instrument reliability and that a one-item behaviorally anchored instrument has inter-rater reliability as high as that of a ten-item unanchored instrument.
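To make the notion of inter-rater reliability concrete, the following minimal sketch estimates it from a table in which each row holds the ratings one student received from k teammates. It assumes a one-way random-effects intraclass correlation, which is one common choice when each student is rated by a different set of raters; the excerpt does not state which estimator the study used. The function name and the example ratings are hypothetical.

```python
from statistics import mean


def one_way_icc(ratings):
    """Estimate one-way random-effects intraclass correlations.

    ratings: list of rows, one row per rated student, each row holding
    the k ratings that student received (same k for every row).
    Returns (single_rater_icc, average_of_k_raters_icc).
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 students, each with the same k >= 2 ratings")

    grand_mean = mean([value for row in ratings for value in row])
    row_means = [mean(row) for row in ratings]

    # Mean square between rated students.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Mean square within rated students (disagreement among raters).
    ms_within = sum(
        (value - m) ** 2 for row, m in zip(ratings, row_means) for value in row
    ) / (n * (k - 1))

    if ms_between == 0:
        raise ValueError("all students have the same mean rating; ICC is undefined")

    icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc_average = (ms_between - ms_within) / ms_between
    return icc_single, icc_average


# Hypothetical data: four students, each rated by four teammates on a 1-5 scale.
example_ratings = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 4, 3],
]
single, average = one_way_icc(example_ratings)
print(f"single-rater ICC: {single:.2f}, average of 4 raters: {average:.2f}")
```

The second value is the reliability of the mean of k ratings, which is the kind of figure a phrase such as "based on four raters" usually refers to. It equals the Spearman-Brown projection of the single-rater value to k raters.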


1) Teamwork in engineering courses: In recent years, there has been a great deal of engineering education research aimed at evaluating teamwork. This research is driven by both engineering's industrial stakeholders and accreditation standards. ABET's EC2000 Criterion 3, outcome (d) is "an ability to function on multi-disciplinary teams" [1]. Although there has been debate about how to apply the term "multi-disciplinary," the ability to function on a team is central to this outcome.

Many engineering professors incorporate teamwork into their courses not only because employers and accrediting bodies look for these skills, but also because they value team-based educational methods. Advocates of cooperative learning methods believe that the best way for students to achieve the learning objectives in their courses is to work in learning teams. Many studies have shown that when correctly implemented, cooperative learning improves information acquisition and retention, and enhances higher-level thinking skills, interpersonal and communication skills, and self-confidence [2]. Cooperative learning is an instructional paradigm wherein teams of students work on structured tasks (e.g., homework assignments, laboratory experiments, or design projects) under conditions that meet five criteria: positive interdependence, individual accountability, face-to-face interaction, appropriate use of collaborative skills, and regular self-assessment of team functioning [3]. Peer ratings create individual accountability, and the average of team members' peer ratings can also be used as a self-assessment of team functioning. The instruments presented here do not measure the other criteria for successful cooperative learning.
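The two uses of peer ratings mentioned above can be sketched as follows. The example computes each member's mean received rating and the team-wide average that could serve as a rough indicator of team functioning. The grade adjustment shown, which scales the team grade by the ratio of a member's mean rating to the team mean, is only one possible rule and is an assumption of this sketch; the excerpt does not specify the adjustment the authors used. All names and numbers are hypothetical.

```python
from statistics import mean


def summarize_team(ratings_received):
    """Return (per-member mean rating, team-wide mean of those means).

    ratings_received: dict mapping each member's name to the list of
    ratings that member received from teammates.
    """
    member_avg = {name: mean(r) for name, r in ratings_received.items()}
    team_avg = mean(list(member_avg.values()))
    return member_avg, team_avg


def adjusted_grade(team_grade, member_avg, team_avg):
    """Assumed rule: scale the team grade by member_avg / team_avg."""
    return team_grade * member_avg / team_avg


# Hypothetical three-person team rated on a 1-5 scale.
team = {"A": [4, 4, 5], "B": [2, 3, 1], "C": [5, 5, 5]}
member_avg, team_avg = summarize_team(team)
for name, avg in member_avg.items():
    grade = adjusted_grade(85, avg, team_avg)
    print(f"{name}: mean rating {avg:.2f}, adjusted grade {grade:.1f}")
print(f"team average rating: {team_avg:.2f}")
```

A rule like this can push grades above the maximum possible score for highly rated members, so an instructor adopting it would likely want to cap the result.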

2) Formative vs. summative assessment: Peer evaluations have been administered to engineering student teams in one of two ways: formative assessment [4] or summative assessment [5]. Formative assessments (such as Team Developer) [4] are used to give students feedback that helps them improve their teamwork skills. Therefore, they should provide specific information about which student behaviors are effective and which are ineffective. Longer, more detailed evaluation instruments can be appropriate for formative assessment because the targeted feedback helps students understand what they are doing well and where they need to improve. …