Academic journal article: School Psychology Review

Generalizability and Dependability of Single-Item and Multiple-Item Direct Behavior Rating Scales for Engagement and Disruptive Behavior

Article excerpt

A refer-test-place model of academic and social-emotional assessment has predominated in school-based practice for decades, wherein emphasis has been placed on quantifying child deficits for the purpose of classification (Reschly, 2008). Recently, however, focus has shifted toward tiered models of prevention in which every student is exposed to primary prevention efforts and those who are not responsive to these efforts receive higher-intensity supports aligned with their needs (Tilly, 2008). The success of a preventative model depends heavily on the availability of appropriate measurement tools for assessing student response to intervention.

Direct behavior rating (DBR) is one method for measuring student response to intervention that has received substantial attention in recent years. There are two central features of DBR assessments: (a) the behavior is operationally defined and (b) a brief and low-inference rating of that behavior is conducted over a specified period (Christ, Riley-Tillman, & Chafouleas, 2009). As such, DBR has been considered a hybrid between systematic direct observation (informants complete the form in close proximity to the actual behavioral occurrence) and behavior rating scales (impressions from the observation are rated on a scale; Chafouleas, Riley-Tillman, & Christ, 2009). Although the DBR literature has focused largely on academic engagement, disruptive behavior, and respectful behavior (Chafouleas, 2011), one of the often-emphasized strengths of DBR is its inherent flexibility. It has been noted that "DBR is not defined by a single scale, form, or number of rating items; rather, it is likely that lines of research will (and should) investigate multiple versions and applications of DBR as a method of assessment" (Chafouleas, Riley-Tillman, & Christ, 2009, p. 196). Although flexibility can certainly be viewed as an advantage of this method, it also means that much work must be carried out to validate different assessment approaches.

To date, however, the psychometric research involving DBR has focused largely on the use of single-item scales (DBR-SIS), for which data summarization and interpretation take place at the individual item level. One preliminary study found that raters were generally more accurate when rating a single, globally worded item (e.g., academic engagement) than a single, more discrete behavior (e.g., raising a hand; Riley-Tillman, Chafouleas, Christ, Briesch, & LeBel, 2009), suggesting that a DBR-SIS measuring a global behavior may be the most efficient way to assess student behavior in an ongoing fashion (Christ et al., 2009). However, this research was limited in that the comparison was made at the level of an individual item. It is therefore not known how a single global item would compare to a composite score derived from multiple indicators of the construct of interest (i.e., a multiple-item scale; DBR-MIS).
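To make the distinction between the two scoring approaches concrete, the sketch below contrasts them in a few lines of Python. It is illustrative only: the 0-10 gradations, the item wordings, and the use of an unweighted mean for the composite are assumptions made for the example, not features prescribed by the DBR literature.

```python
# Illustrative contrast between DBR-SIS and DBR-MIS scoring.
# The 0-10 scale, item wordings, and unweighted mean are assumptions
# made for this sketch, not a prescribed DBR format.

# DBR-SIS: a single, globally worded item, summarized at the item level.
sis_score = 7  # e.g., "academic engagement" rated once for the period

# DBR-MIS: several narrower indicators combined into one composite score.
mis_ratings = {
    "works on assigned task": 8,
    "raises hand before speaking": 6,
    "follows teacher directions": 7,
    "participates in class discussion": 5,
}
mis_composite = sum(mis_ratings.values()) / len(mis_ratings)

print(f"SIS score: {sis_score}")          # 7
print(f"MIS composite: {mis_composite}")  # 6.5
```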

An important reason to investigate a DBR-MIS approach is that using multiple items to assess a construct may accelerate decision making. Researchers have previously suggested that between 7 and 10 ratings of a single item are needed to obtain a reliable estimate of behavior (Chafouleas, Riley-Tillman, & Sugai, 2007; Chafouleas, Christ, & Riley-Tillman, 2009; Chafouleas, Kilgus, & Hernandez, 2009). Although it has been argued that the brevity of a DBR assessment makes this data collection schedule reasonable, the need for 7-10 ratings indicates that classroom teachers must wait approximately 2 weeks before informed decisions can be made about student performance. Alternatively, recent research has suggested that fewer occasions may be required when a multiple-item scale is used. Volpe, Briesch, and Gadow (2011) found that 8-10 rating occasions were needed to achieve adequate levels of reliability using a SIS derived from the IOWA Conners Teacher Rating Scale (Loney & Milich, 1982). However, the number of necessary rating occasions diminished quickly as additional items were added, such that only three rating occasions were needed for a 4-item scale. …
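The items-by-occasions trade-off described above can be made explicit with a decision study from generalizability theory. For a fully crossed persons × items × occasions design, the relative generalizability coefficient is Eρ² = σ²_p / (σ²_p + σ²_pi/n_i + σ²_po/n_o + σ²_pio,e/(n_i × n_o)), so adding items shrinks error terms that rating occasions alone would otherwise have to average away. The sketch below illustrates the pattern with a conventional .80 criterion; the variance components are hypothetical placeholders invented for the example, not estimates from Volpe, Briesch, and Gadow (2011) or any other study.

```python
# Decision-study sketch for a crossed persons x items x occasions design.
# All variance components below are hypothetical placeholders, not values
# estimated from any published DBR study.

def g_coefficient(var_p, var_pi, var_po, var_pio_e, n_items, n_occasions):
    """Relative generalizability coefficient for a p x i x o random design."""
    error = (var_pi / n_items
             + var_po / n_occasions
             + var_pio_e / (n_items * n_occasions))
    return var_p / (var_p + error)

def occasions_needed(var_p, var_pi, var_po, var_pio_e, n_items,
                     criterion=0.80, max_occasions=50):
    """Smallest number of rating occasions at which the criterion is met."""
    for n_o in range(1, max_occasions + 1):
        if g_coefficient(var_p, var_pi, var_po, var_pio_e,
                         n_items, n_o) >= criterion:
            return n_o
    return None  # criterion unreachable within max_occasions

# Hypothetical variance components: person, person x item,
# person x occasion, and residual (person x item x occasion, error).
components = dict(var_p=0.50, var_pi=0.05, var_po=0.12, var_pio_e=0.40)

for n_items in (1, 2, 4, 8):
    n_o = occasions_needed(n_items=n_items, **components)
    print(f"{n_items} item(s): {n_o} occasion(s) to reach .80")
```

Under these made-up components, a single item requires seven occasions while a four-item composite reaches the same criterion in two, reproducing the qualitative pattern reported above; with real data, the variance components would of course be estimated from a generalizability study.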
