What Do Alternate Assessments of Alternate Academic Achievement Standards Measure? A Multitrait-Multimethod Analysis
Kettler, Ryan J., Elliott, Stephen N., Beddow, Peter A., Compton, Elizabeth, McGrath, Dawn, Kaase, Kristopher J., Bruen, Charles, Ford, Lisa, Hinton, Kent, Exceptional Children
For more than a decade, educators have administered alternate assessments of alternate academic achievement standards (AA-AASs) to students with significant disabilities who cannot meaningfully participate in their states' general achievement tests. As a result of federal legislation, starting with the Individuals With Disabilities Education Act of 1997 (IDEA) and reiterated in the No Child Left Behind Act of 2001 (NCLB) and the Individuals With Disabilities Education Improvement Act of 2004 (IDEA 2004), every state has an AA-AAS and must ensure its technical soundness.
The technical soundness of AA-AASs, however, remains an area of concern. Basic questions about the constructs these assessments measure and about their relationship to other measures of achievement remain largely unanswered by rigorous research and validation studies. The paucity of published studies and of documentary evidence for validity in states' AA-AAS technical manuals supports this assertion. The National Study of Alternate Assessments (NSAA) report (SRI International, 2009) provides a comprehensive descriptive summary of key attributes of AA-AASs and resulting accountability data for each state. The NSAA indicates that directors of AA-AASs in 41% of the states and one territory reported conducting a formal study to document that test and item scores are related to internal or external variables as intended. The NSAA also reported that in 59% of the states, a formal study had documented the construct relevance of the state's test. This information, however, is not widely available.
According to the U.S. Department of Education's nonregulatory document Alternate Academic Achievement Standards for Students With the Most Significant Cognitive Disabilities (U.S. Department of Education, 2005), "An alternate assessment must be aligned with the state's content standards, must yield results separately in both reading/language arts and mathematics, and must be designed and implemented in a manner that supports use of the results as an indicator of AYP" (adequate yearly progress; p. 15).
The AA-AASs are an important component of each state's assessment system and must meet the federal regulations outlined in Title I of the Elementary and Secondary Education Act (1965). The AA-AASs must also meet standards of high technical quality--reliability, validity, accessibility, objectivity, and consistency--expected of other educational tests (i.e., Standards for Educational and Psychological Testing, American Educational Research Association [AERA], American Psychological Association [APA], and National Council on Measurement in Education [NCME], 1999).
In addition, AA-AASs must have the following:
* An explicit structure.
* Guidelines for determining which students may participate.
* Clearly defined scoring criteria and procedures.
* A report format that communicates student performances in terms of academic achievement standards.
If the AA-AASs meet the required standards for technical quality and use, then educators can report the results of AA-AASs for up to 1% of the total student population for AYP purposes. In January 2009, the U.S. Department of Education published the Standards and Assessment Peer Review Guidance: Information and Examples for Meeting Requirements of the No Child Left Behind Act of 2001. This document extends the Standards for Educational and Psychological Testing (AERA et al., 1999) and provides even more specific guidance concerning validity evidence for AA-AASs. For example, the Technical Quality subsection [4.1] of the Peer Review Guidance document specifically asks the following:
(b) Has the State ascertained that the assessments, including alternate assessments, are measuring the knowledge and skills described in its academic content standards and not knowledge, skills, or other characteristics that are not specified in the academic content standards or grade level expectations?
(e) Has the State ascertained that test and item scores are related to outside variables as intended (e.g., scores are correlated strongly with relevant measures of academic achievement and are weakly correlated, if at all, with irrelevant characteristics, such as demographics)? (p. 35)
Towles-Reeves, Kleinert, and Muhomba (2009) have further documented this lack of validity evidence. In a recent review of research on AA-AASs, these authors identified 23 empirical studies completed since 2003. Specifically, Towles-Reeves et al. lament that "there is considerably less research that has examined the extent to which actual student scores were associated with empirically verified instructional or other outcome variables" (p. 245). These authors called for "future research to investigate the relationship between AA-AASs (regardless of approach: portfolios, performance assessments, and checklists) and another accepted measure of student learning" (p. 246). They concluded, "there is no evidence to support the correlation of alternate assessments with other accepted measures of student learning" (p. 246). This claim is a serious one and should cause all users of these assessments to be cautious when interpreting AA-AAS scores.
PREVIOUS RESEARCH ON AA-AASs
Some published evidence for the validity of the constructs that AA-AASs have measured does exist, but Towles-Reeves et al. (2009) did not review it. A validation study of the Idaho Alternate Assessment (IAA; Idaho Department of Education, 1999) scores focused on evidence about the underlying construct being measured (Elliott, Compton, & Roach, 2007). That study examined the relationships between ratings on the IAA for students with significant disabilities, corresponding scores on the general assessment, and ratings on two norm-referenced teacher rating scales: the Academic Competence Evaluation Scales (ACES; DiPerna & Elliott, 2000) and the Vineland Adaptive Behavior Scales (VABS; Sparrow, Balla, & Cicchetti, 1985). The study investigated IAA performance for a representative group of students with disabilities (N = 116) who, according to their individualized education program (IEP) teams, were eligible (SWD-Es) and participated in the state's alternate assessment, as well as for another group of students who had disabilities (N = 54) but were not officially eligible (SWD-NEs) for the alternate assessment. The study assessed both groups of students with the IAA and compared the students' results with other indirect assessments of performance, all of which were teacher-completed measures. The researchers included SWD-NEs as a control group for two reasons: (a) so that they could explore the relationship between the state's regular assessment (Idaho Standards Achievement Tests, ISAT; Idaho Department of Education, 2008) with accommodations and the IAA with a sample that could complete the ISAT; and (b) because the performance of SWD-NEs better matched that of SWD-Es than the performance of a group of students without disabilities would have. This analysis is critical because AA-AASs, like general assessments, are expected to focus on academic content.
We examined this seminal alternate assessment study in detail because it provided the basis for the design of the present study.
The evidence of interest in Elliott et al. (2007) concerned relationships between the constructs measured by the IAA and two other types of variables: (a) the ISAT, and (b) established rating scale measures of academic competence (ACES) and adaptive behavior (VABS). Correlations calculated for SWD-NEs between the IAA and the ISAT were in the medium (reading, language arts) or large (mathematics) ranges within content areas, but correlations also tended to be in these ranges when calculated across content areas (e.g., r between AA-AAS mathematics and ISAT reading = .67). When calculated for the entire sample, IAA reading, language arts, and mathematics scales all shared more variance with measures of adaptive behavior and academic enablers than with measures of academic skills. The correlations for SWD-NEs tended to be about twice as large as the corresponding IAA to ACES Academic Skills correlations for the SWD-Es.
Elliott et al. (2007) concluded that the evidence to support the validity of the IAA was mixed, yet on balance promising. The relationship between the reading, language arts, and mathematics achievement level ratings on the IAA and the concurrent scores on the ACES Academic Skills scales for the eligible students varied across grade clusters but in general were medium at best. When the researchers examined correlations for the same score relationships for the not-eligible students, the magnitude of the correlations increased noticeably. Collectively, these findings furnished evidence that the IAA scales measure skills indicative of the academic content characterized in the state's content standards. The medium to large correlations between the IAA and ISAT for the not-eligible students further reinforced that point. The evidence for both groups of students supports the validity of the IAA scores. Although the correlations among academic skills on the IAA and other measures indicated a meaningful amount of shared variance (i.e., 20% to 40%), in some cases, particularly at the elementary grade levels, there was more shared variance with the academic enabling and adaptive behavior constructs.
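The "shared variance" figures cited above follow directly from the reported correlations: shared variance is the square of the Pearson correlation coefficient. A minimal sketch of this arithmetic (the specific r values below are illustrative round numbers, not coefficients taken from the study) shows how medium-to-large correlations map onto the 20% to 40% range:

```python
# Shared variance between two measures is the squared Pearson
# correlation (r^2). Correlations of roughly .45 to .63 therefore
# correspond to roughly 20% to 40% shared variance.
# The r values below are illustrative, not the study's coefficients.
for r in (0.45, 0.55, 0.63):
    shared = r ** 2
    print(f"r = {r:.2f} -> shared variance = {shared:.0%}")
```

This is why a correlation must be fairly large before two measures can be said to overlap substantially: even r = .45 implies that the measures share only about a fifth of their variance.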
PURPOSE OF THE PRESENT RESEARCH: A MULTISTATE REPLICATION STUDY
The IAA validity study that Elliott et al. (2007) conducted served as the model for the present multistate investigation. The participating states in the current study all used a comprehensive rating-scale approach to AA-AASs. Each of the alternate assessments had been aligned with the state's grade-level extended standards and thus designed to focus on academic skills. To understand each of these states' alternate assessments, the current study incorporated a multitrait-multimethod (MTMM) design to determine the relationship among the AA-AAS, the state's general achievement test, and two established teacher-based rating scales.
Relationships with other variables constitute one of five main types of validity evidence that the Standards for Educational and Psychological Testing (AERA et al., 1999) addresses. Such evidence includes the degree to which scores from an instrument converge with indicators of similar constructs (convergent validity) and diverge from indicators of dissimilar constructs (divergent validity), as well as the degree to which the scores share no relationship with indicators of unrelated constructs (discriminant validity).
Campbell and Fiske (1959) suggested an approach by which researchers could use scores from multiple methods that were indicative of multiple traits as evidence for …
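The core of Campbell and Fiske's logic can be sketched with a small hypothetical MTMM correlation matrix for two traits (reading, mathematics) each measured by two methods (e.g., an AA-AAS teacher rating and a general achievement test). All values below are illustrative assumptions, not correlations from this study; the check at the end expresses the convergent/discriminant comparison Campbell and Fiske proposed:

```python
import numpy as np

# Hypothetical MTMM correlation matrix. Variable order:
#   0: reading / method 1,  1: math / method 1,
#   2: reading / method 2,  3: math / method 2
# Values are illustrative only, not drawn from the study.
corr = np.array([
    [1.00, 0.55, 0.60, 0.40],
    [0.55, 1.00, 0.45, 0.65],
    [0.60, 0.45, 1.00, 0.50],
    [0.40, 0.65, 0.50, 1.00],
])

# Convergent evidence: same trait measured by different methods
# (the "validity diagonal").
monotrait_heteromethod = [corr[0, 2], corr[1, 3]]

# Discriminant evidence: different traits, different methods.
heterotrait_heteromethod = [corr[0, 3], corr[1, 2]]

# Campbell & Fiske (1959): validity-diagonal correlations should
# exceed the heterotrait-heteromethod correlations.
if min(monotrait_heteromethod) > max(heterotrait_heteromethod):
    print("Convergent correlations exceed discriminant correlations.")
else:
    print("Discriminant evidence is weak for this matrix.")
```

In an actual MTMM analysis such as the present study's, each AA-AAS content-area score would be compared against the same trait measured by the general test and the rating scales (convergence) and against different traits and irrelevant characteristics (discrimination).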
Publication information: Article title: What Do Alternate Assessments of Alternate Academic Achievement Standards Measure? A Multitrait-Multimethod Analysis. Contributors: Kettler, Ryan J. - Author, Elliott, Stephen N. - Author, Beddow, Peter A. - Author, Compton, Elizabeth - Author, McGrath, Dawn - Author, Kaase, Kristopher J. - Author, Bruen, Charles - Author, Ford, Lisa - Author, Hinton, Kent - Author. Journal title: Exceptional Children. Volume: 76. Issue: 4 Publication date: Summer 2010. Page number: 457+. © 1999 Council for Exceptional Children. COPYRIGHT 2010 Gale Group.