Academic journal article The Journal of Faculty Development

Should Student Outcomes Be Used to Evaluate Teaching?

A Context for Learning Outcomes

So what's the problem? Ever since the first brontosaurus stomped onto the screen in Jurassic Park, every source of evidence you could possibly use in the evaluation of teaching was FALLIBLE - from student ratings to peer observations to learning outcomes to ratings by close relatives. That applies to all formative (teaching and course improvement), summative (annual review, contract renewal, promotion & tenure), and program (accreditation and accountability) decisions. All of the sources are evil, chock full of psychometric sin. However, the different sources vary considerably in the type and degree of sin. They are like all the bad food we eat, except kale, which tastes like insulation unless you blend it into a smoothie with fruit, yogurt, flaxseed, and Doritos® to mask the flavor.

The most defensible strategy is to pick the best sources for a specific decision according to technical and legal standards. After all, high-stakes, career employment decisions are being made about faculty. The context for that strategy and use of outcome measures is briefly described in this section: (1) state of current practice, (2) 15 sources of evidence, and (3) triangulation of multiple sources.

State of Current Practice

Since the 1990s, give or take a decade, the practice of augmenting student ratings with other data sources of teaching effectiveness has been gaining traction in liberal arts colleges, universities, medical schools/colleges, and other institutions of higher education worldwide and on a few distant planets. Such sources can serve to broaden and deepen the evidence base used to evaluate courses and the quality of teaching (Arreola, 2007; Benton & Cashin, 2012; Berk, 2005, 2006, 2013a, 2013b; Cashin, 2003; Gravestock & Gregor-Greenleaf, 2008; Hoyt & Pallett, 1999; Knapper & Cranton, 2001; Seldin, 2006; Theall & Feldman, 2007). In fact, several comprehensive models of "faculty evaluation" that include multiple sources of evidence have been proposed (Arreola, 2007; Berk, 2006, 2009a, 2009b; Braskamp & Ory, 1994; Centra, 1993; Gravestock & Gregor-Greenleaf, 2008).

15 Sources of Evidence

Guess what? There are 15 potential sources of evidence of teaching effectiveness reported in the literature: (1) student ratings, (2) peer observations, (3) peer review of course materials, (4) external expert ratings, (5) self-ratings, (6) videos, (7) student interviews, (8) exit and alumni ratings, (9) employer ratings, (10) mentor's advice, (11) administrator ratings, (12) teaching scholarship, (13) teaching awards, (14) learning outcome measures, and (15) teaching (course) portfolio.

A critique and the major characteristics of each source, including the type of measure needed to gather the evidence, the person(s) responsible for providing the evidence, the person or committee who uses the evidence, and the decision(s) typically rendered based on that evidence, were presented previously (Berk, 2006, 2013b). In fact, our hero's review should have been delivered to your doorstep by an Amazon drone. If you didn't get it, contact Amazon.

Triangulation of Multiple Sources

There are stacks of articles that weigh the merits and shortcomings of these various sources of evidence (Berk, 2005, 2006). Put simply: There is no perfect source or combination of sources, and there is a scarcity of evidence on different combinations, such as student ratings and self-ratings (Barnett, Matthews, & Jackson, 2003; Stalmeijer et al., 2010). Each source can supply unique information, but, as noted previously, each is also fallible, usually in ways different from the other sources. For example, peer ratings tend to be less reliable than student ratings and carry different biases (Thomas, Chie, Abraham, Raj, & Beh, 2014); student ratings have other psychometric weaknesses (Benton & Cashin, 2012; Nilson, 2012; Spooren, Brockx, & Mortelmans, 2013). …

