In some universities, students are not used as raters to evaluate or judge faculty members' academic performance because of a lack of trust in their ratings. This study examined the extent to which students can give consistent and reliable ratings. Nineteen graduate students were asked to rate the academic performance of a faculty member on two occasions, two weeks apart. The results showed evidence of interrater agreement and rater consistency. With appropriate training, students can be a reliable source of information about faculty members' academic performance.

In a number of domains, the use of raters to assess performance on specific tasks is common. Judgments given on air crew resource management skills determine whether or not an individual is expected to be a successful pilot. Judgments given by a panel of raters determine the chance of getting a particular job. Ratings of the judges in gymnastics determine which participant will win the gold. These examples, and many others, show how important the decision could be and how precise, reliable, and unbiased ratings and judgments should be.

Evaluation of faculty involves gathering information for understanding and improving performance as well as for judging its quality. The purposes of faculty evaluation can be reduced to two. On the one hand, it has a formative function: the results are used to support faculty development, growth, and self-improvement. On the other hand, it has a summative function: the results are used to make personnel decisions on tenure, promotion, reappointment, and salary. However, the majority of community college faculty members and administrators surveyed identified faculty development as the primary purpose, with the provision of information on promotion, retention, dismissal, and normal salary increments as a secondary purpose (Rifkin, 2002).

Quantitative ratings given by students are one of the most common methods for evaluating faculty and teaching effectiveness. Seldin (1984) found that administrators utilized student-rating data in two-thirds of the 616 institutions surveyed. However, it is also the method that raises the most concerns (Rifkin, 2002). The grades given in the course, the rigor of the course, teaching style, and the teacher's personal characteristics are all possible factors affecting the accuracy of ratings (Jean, Ashok, & Dawn, 2002). Obtaining accurate and reliable performance ratings is a challenge faced in most educational and employment settings. Reliability coefficients of ratings given in practical settings are typically quite low, ranging from 0.30 to 0.75. Yet ratings of performance are frequently used as the sole basis for making extremely important personnel decisions. It may be argued that the reliability of performance ratings is generally inadequate for making decisions as important as these (Raymond & Houston, 1990). Therefore, the need to acknowledge the nature of student evaluations is paramount.

This study aimed at investigating interrater agreement and rater consistency. The specific substantive issues considered are:

1. Whether or not students, as raters, demonstrate consistency over time in their ratings of a faculty member's performance in an academic setting.

2. Whether or not students demonstrate an acceptable level of interrater agreement.
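As a rough illustration of how the two questions above could be quantified, the sketch below computes a test-retest correlation for rater consistency and a within-group agreement index for interrater agreement. The text does not state which statistics the study used, so both choices are assumptions: the Pearson correlation between the two rating occasions, and the single-item r_wg index, which compares the observed variance of the ratings with the variance expected if raters responded uniformly at random. The function names and the ratings in the usage lines are hypothetical and are not data from the study.

```python
from statistics import mean, variance


def pearson_r(first, second):
    """Pearson correlation between ratings from two occasions.

    Assumption: each position holds the same rater's score on
    occasion 1 and occasion 2.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least 2 raters")
    m1, m2 = mean(first), mean(second)
    sxy = sum((a - m1) * (b - m2) for a, b in zip(first, second))
    sxx = sum((a - m1) ** 2 for a in first)
    syy = sum((b - m2) ** 2 for b in second)
    return sxy / (sxx * syy) ** 0.5


def rwg(ratings, scale_points):
    """Single-item within-group agreement index (r_wg).

    Assumption: ratings come from a discrete scale with `scale_points`
    options, so the variance of a uniform (no-agreement) response
    pattern is (A^2 - 1) / 12.
    """
    expected_var = (scale_points ** 2 - 1) / 12
    return 1 - variance(ratings) / expected_var


# Hypothetical ratings on a 5-point scale (illustrative only).
occasion_1 = [4, 5, 3, 4, 4, 5]
occasion_2 = [4, 4, 3, 4, 5, 5]

consistency = pearson_r(occasion_1, occasion_2)
agreement = rwg(occasion_1, scale_points=5)
```

Values of either index closer to 1 would indicate more stable ratings over time or closer agreement among raters. Other indices, such as an intraclass correlation, could be substituted depending on the rating design.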



The subjects were 56 graduate students (40 males and 16 females) registered at Mu'tah University, Jordan, during the second semester of 2003-2004. They were asked to rate the teaching performance of a faculty member of their choice who was not teaching them during that semester but who had taught them in the previous semester. This condition was set so that students would not feel threatened. As it happened, the largest number of students (19) rated the same teacher. Ninety per cent of the subjects were school teachers. …


