Academic journal article Journal of Teacher Education

Rating Teachers Cheaper, Faster, and Better: Not So Fast


Gargani and Strong (GS; 2014) claim that their findings from using a six-item observation rubric--the Rapid Assessment of Teacher Effectiveness (RATE)--are evidence that they can identify effective teachers better, faster, and cheaper than other observation instruments. At first glance, this goal may seem appropriate (to those not familiar with the complexity of observational research), in part because Race to the Top has placed greater demands on administrators' time. Many states have revamped their teacher evaluation models to meet more "rigorous" standards. Most models place the greatest weight on student growth and observational data. And, given that achievement data can generally be used only in mathematics and reading, most teachers will primarily be evaluated with an observation system.

In short, principals must now spend more time in classrooms observing teachers and gathering related artifacts. This is in addition to their 58-hr workweeks (U.S. Department of Education, 2013). If unable to accommodate longer hours, principals must delegate more tasks to others and/or reduce the time they spend on other administrative duties (Lavigne & Chamberlain, under review). And as we and others have noted (Canadian Association of Principals & The Alberta Teachers' Association, 2014; Lavigne & Good, in press), principals are also coping with many new issues, including low teacher morale and inadequate resources.

Hence, we agree with GS that, to the extent possible, it is important to reduce the burden on already taxed principals who are responsible for evaluating teachers. But any observational measure used to evaluate teaching must have ecological validity and a demonstrated capacity for describing and improving teaching in actual classrooms. We now discuss aspects of the instrument as described by GS. Subsequently, we analyze its potential for improving practice and suggest that although RATE may be fast and cheap, its value for improving practice is limited. This is an important limitation because the ultimate goal is to improve teaching and learning; we want more than only a rating of teaching quality.

Internal Validity

GS (2014) devote considerable space to describing the internal validity of the RATE instrument and tout its many strengths in comparison with other instruments used in the Measures of Effective Teaching (MET) project (for a review of this research, see Kane, Kerr, & Pianta, 2014). The extensive and careful methodological work demonstrating the internal validity of RATE appears impressive. Still, readers could benefit from much more information about how inter-rater reliabilities were established. GS do provide some information about the reliability of independent raters and how the ratings are improved by a discussion of the obtained ratings:

   After discussing similarities and differences in their independent
   scores, raters were now asked--but not required--to revise their
   own scores should they be persuaded by the discussion. The two
   revised scores for a pair were later averaged by the researchers
   to produce the RATE scores. (p. 394)

Later GS indicate that raters were intentionally rotated so that new pairs were constantly formed, allowing raters to "check their understanding of the rubric with each new partner" (p. 395). Given that raters may have discussed discrepancies and may have compromised in assigning scores, it would seem that the raters are no longer independent and that the agreement indexes are likely inflated. Furthermore, the standard index for comparing two raters is the kappa coefficient or the percent of agreement between them. It is not clear why these data were not collected or at least reported. In addition, we know that poor judgments are sometimes made because of a perceived need to obtain consensus (e.g., Janis, 1972). It would have been useful to know more about how coder negotiations were conducted, and if space was insufficient to include this information in the article, GS could have provided supplementary information in a separate paper. …
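To make concrete the two agreement indexes named above, the following sketch computes percent agreement and Cohen's kappa for a pair of raters. The ratings shown are hypothetical, not data from the RATE study; the point is that kappa corrects raw agreement for the agreement expected by chance, which is why it is the conventional index for independent raters.

```python
# Illustrative only: percent agreement and Cohen's kappa for two raters.
# The rating vectors below are invented for demonstration, not RATE data.
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of items on which the two raters assign the same score."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(r1)
    p_o = percent_agreement(r1, r2)  # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # Expected agreement if raters scored independently at their own base rates
    p_e = sum((c1[k] / n) * (c2[k] / n) for k in set(r1) | set(r2))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical scores from two raters on a 1-5 rubric across ten lessons
rater_a = [3, 4, 4, 2, 5, 3, 3, 4, 2, 5]
rater_b = [3, 4, 3, 2, 5, 3, 4, 4, 2, 4]

print(percent_agreement(rater_a, rater_b))          # 0.7
print(round(cohens_kappa(rater_a, rater_b), 3))     # 0.589
```

Note that the two raters agree on 70% of lessons, yet kappa is noticeably lower because some of that agreement would occur by chance alone; raw percent agreement overstates reliability, which is precisely why reporting kappa matters.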
