Sources of Bias in Teacher Ratings of Adolescents with ADHD

Best-practice assessment of childhood ADHD includes behavior ratings from multiple sources across multiple environments. However, adolescents in secondary schools interact with several teachers each day, and research has shown that teacher perceptions of the same child can be highly inconsistent. As a result, rating scale data can be equivocal, depending on which teachers are selected. The intent of the present study was twofold: (1) to assess the consistency among teachers' behavior ratings of adolescents with ADHD, and (2) to explore predictors of rater leniency or severity (i.e., sources of bias). Results suggest that interrater reliability within our sample was moderate, consistent with previous research. Further, teacher characteristics, including sex and age, were related to biases on ratings of student hyperactivity-impulsivity. Specifically, women and younger teachers provided significantly more severe ratings on average than did men and older teachers. Implications for the interpretation and statistical norming of ADHD rating scales are discussed.

Keywords: ADHD, Rating scales, Teacher ratings, Rater bias

1. Background

The cardinal symptoms of ADHD are significant and persistent impairment in attention or activity regulation relative to same-age peers. To be diagnosed with ADHD, individuals must exhibit six or more behavioral symptoms of inattention or hyperactivity-impulsivity for longer than six months, in two or more settings (e.g., home and school), with impairment in social, familial, or academic functioning beginning prior to age seven (American Psychiatric Association [APA], 2000). Thus, the diagnosis of ADHD is based entirely on observable behaviors and impairments, as research has failed to identify medical tests that reliably aid diagnosis (Pelham, Fabiano, & Massetti, 2005). Psychological tests also appear insufficient to diagnose ADHD because outcomes for individuals with and without the disorder largely overlap (i.e., poor instrument sensitivity and specificity), even though significant group differences are apparent between large samples (Frazier, Demaree, & Youngstrom, 2004). Similarly, group differences between ADHD and undiagnosed groups have been found on some neuropsychological measures, but these instruments are not sensitive or specific enough to diagnose individual cases (e.g., Homack & Riccio, 2004; Preston, Fennell, & Bussing, 2005). As a result, clinicians must rely primarily on behavioral observations, whether direct or indirect, when assessing individual cases.

1.1 Interpreting Rating Scale Discrepancies

Behavior rating scale data from multiple informants have generally been shown to be both valid and useful and, as a result, behavior ratings are a vital component of best-practice assessment of ADHD (American Academy of Child and Adolescent Psychiatry, 1997; American Academy of Pediatrics, 2001). However, behavior ratings have significant limitations, including high rates of disagreement between raters. When behavior ratings are collected from multiple sources rating the same target child, some may appear relatively lenient and others relatively severe. Studies examining interrater reliability on behavior rating scales have traditionally found only moderate correlations (e.g., Achenbach, McConaughy, & Howell, 1987). In secondary schools, where students interact with several teachers in separate classrooms during the school day, interrater reliability among teachers (hereafter inter-teacher reliability) can be especially low. For example, Molina, Pelham, Blumenthal, and Galiszewski (1998) examined inter-teacher reliability among secondary teachers' ratings of adolescents with ADHD and found low to moderate reliability coefficients (intraclass correlations [ICCs] ranged from .21 to .52). Other studies suggest that inter-teacher reliability improves when teachers work within the same classroom, but even in overlapping environments there are considerable inter-teacher inconsistencies on rating scales commonly used to assess ADHD (Danforth & DuPaul, 1996). …
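
For readers less familiar with the intraclass correlation, the following minimal sketch illustrates how one common form, the one-way random-effects ICC for single ratings, could be computed from a students-by-teachers table of scores. The choice of this particular ICC form is an assumption made for illustration; the studies cited above may have used a different form. The function name `icc_oneway` and the example scores are hypothetical and are not data from any study.

```python
def icc_oneway(ratings):
    """One-way random-effects ICC for single ratings.

    `ratings` is a list of rows, one row per student, each row holding
    the scores given to that student by k teachers (same k for all rows).
    """
    n = len(ratings)        # number of students (targets)
    k = len(ratings[0])     # number of teachers (raters) per student

    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-student and within-student sums of squares.
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum(
        (score - m) ** 2
        for row, m in zip(ratings, row_means)
        for score in row
    )

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores: four students, each rated by three teachers.
hypothetical_ratings = [
    [12, 15, 9],
    [20, 14, 18],
    [7, 11, 6],
    [16, 10, 17],
]
print(round(icc_oneway(hypothetical_ratings), 2))
```

Values near 1 indicate that teachers rank and score students almost identically, whereas values near 0 (or below) indicate that differences among teachers rating the same student are as large as differences among students.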

