Choosing Assessment Instruments for Depression Outcome Research with School-Age Youth

Depression is one of the most studied categories of mental disorders, and numerous scales have been developed to assess depression severity in youth and adults (Moran & Mohr, 2005; A. B. Shafer, 2006). Depression assessment is usually conducted either by using an interview format or via inventories using a self-report, clinician-report, parent-report, or teacher-report format. Some batteries of instruments may provide a combination of these formats to measure the severity of depression. For example, both the Achenbach System of Empirically Based Assessment (ASEBA; Achenbach & Rescorla, 2001) and the Children's Depression Inventory (CDI; Kovacs, 2003) provide self-, parent-, and teacher-report versions. As examples of the plethora of instruments available for use with clients of all ages during the past 20 years in counseling outcome research, Table 1 presents 25 diverse depression inventories and associated characteristics.

Counseling outcome research includes articles published in counseling, psychology, and medical journals that explore the effectiveness and staying power of counseling and psychotherapy interventions. Counseling outcome research does not include medication-only trials. Barkham et al. (1998), Moran and Mohr (2005), and A. B. Shafer (2006) identified the most widely used depression scales in counseling outcome research. For participants of all ages, these included the Hamilton Rating Scale for Depression (HAM-D; Hamilton, 1960), which is an interview protocol completed by the clinician, and the Beck Depression Inventory-II (BDI-II; Beck, Steer, & Brown, 1996) and the Center for Epidemiologic Studies Depression Scale (CES-D; Radloff, 1977), both of which are self-report instruments. Erford et al. (2011) confirmed the popularity of these outcome instruments, specifically among clinical trials of school-age youth, indicating that the BDI-II was used in 13 of the 42 (31%) clinical trials, the HAM-D in 12 of the 42 (29%) clinical trials, and the CES-D in eight of the 42 (19%) clinical trials selected into the meta-analysis (see Table 2). Erford et al. reported three additional instruments commonly found among the 42 selected studies of treatment of depression in school-age youth, including 17 clinical trials (40%) that used the self-report CDI, 10 clinical trials (24%) that used the mother-report Child Behavior Checklist Internalizing scale (CBCL-M-I; Achenbach & Rescorla, 2001), and five clinical trials (12%) that used the self-report Reynolds Adolescent Depression Scale-Second Edition (RADS-2; W. M. Reynolds, 2002).

The Erford et al. (2011) meta-analysis was conducted to determine the effectiveness of counseling interventions at posttest and upon follow-up with school-age youth, both inside and outside of the school setting, from 42 articles published from 1990 to 2009. These articles did not include medication trials. A side benefit of that meta-analysis was the identification of the published outcome measures (used as dependent variables) used most frequently, allowing comparisons between effect sizes derived from various clinical trials on various outcome measures. The purpose of this article is to review the practical and technical characteristics of these six most commonly used depression scales and, using effect size estimates from Erford et al., compare each scale's ability to measure treatment outcomes among school-age youth. These six depression instruments were selected for review because of their prominence in the outcome research literature for school-age youth over the past 20 years and because each met high standards of technical adequacy. Readers are referred to Table 1 for additional measures of depression used in outcome studies with participants of all ages.

* CDI

The CDI was originally published in 1992, based on items from the Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961), and is designed to assess depressive symptoms in children 7 to 17 years old. There are four versions of the CDI: (a) the original version (CDI), which includes 27 items; (b) the CDI-Short Form (CDI:S) with 10 items; (c) the CDI-Parent version (CDI:P) composed of 17 items; and (d) the CDI-Teacher version (CDI:T) composed of 12 items. The latter two versions ask parents and teachers, respectively, to rate a child's depressive symptoms as they have occurred in the past 2 weeks using a 4-point Likert-type response scale. The self-report CDI and CDI:S versions ask children to rate their own depressive symptoms as they have occurred over the past 2 weeks using a 3-point Likert-type response scale (Freeman, 2007). The cost of a single administration of the CDI is about $2.20, whereas the examiner kit costs $135.00.

The CDI is a Level B instrument designed for use as a screening measure and should not be used to make diagnostic decisions. The leveling system (A, B, C) specifies publisher requirements to purchase and use an instrument (Erford, 2006). Level A instruments are open access with no educational or licensure requirements for the examiner. Level B instruments require that examiners have a master's degree in counseling, psychology, or a related field; a graduate course in assessment; and appropriate supervised experience. Level C instruments require a doctoral degree or state licensure allowing administration of the Level C test, along with appropriate supervised experience.

The CDI is available for use in 23 languages (Kovacs, 2003). The CDI takes about 15 minutes to complete, and the CDI:S takes about 5 minutes to complete. Items are written at a 7-year-old reading level. The CDI:P and CDI:T can be completed in about 5 minutes. Scoring takes about the same amount of time as administration and can be completed by hand with the QuikScore form or by computer (Freeman, 2007). Scores for the CDI are reported on five subscales (i.e., Anhedonia, Ineffectiveness, Interpersonal Problems, Negative Mood, and Negative Self-Esteem) and are summed to interpret a Total Depression score. This interpretive strategy was supported by exploratory factor analysis (EFA; Kovacs, 2003). Scores on the CDI:P and CDI:T are interpreted as a total score or by using two subscales: Emotional Problems and Functional Problems (Carlson, 2007).

The CDI is a norm-referenced instrument, but it was not standardized on a nationally representative sample, nor were racial or socioeconomic demographic data collected. Internal consistency of CDI scores was reported to fall between α = .71 and α = .89, with the Total Depression scores falling at the top of that range of alphas. Test-retest reliability for time periods of 4 weeks and 1 week revealed correlations of r_tt = .38 and r_tt = .87, respectively (Freeman, 2007); thus, near-term (i.e., 1 week) coefficients indicate adequate stability, whereas longer term stability estimates are highly questionable on a clinical population. Erford (2006) suggested that reliability coefficients of .90 or higher are appropriate for diagnostic decision making and that reliability coefficients of .80 or higher are appropriate for screening level and research purposes. Thus, the CDI Total Depression score is adequate for screening and research purposes. More data supporting interscorer reliability are necessary between the CDI:P and CDI:T. Little concrete data were provided as to the validity of CDI scores. It was reported that the CDI had a moderate correlation with the Revised Children's Manifest Anxiety Scale (C. R. Reynolds & Richmond, 1985) and a low correlation with the Coopersmith Self-Esteem Inventory (Coopersmith, 2002; Erford, 2007).
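The Erford (2006) reliability guidelines described above can be expressed as a simple decision rule. The following is a minimal sketch; the function name and the label for coefficients below .80 are illustrative assumptions and do not come from the source.

```python
def reliability_use(coefficient):
    """Classify a reliability coefficient using the Erford (2006) guidelines.

    .90 or higher: appropriate for diagnostic decision making.
    .80 or higher: appropriate for screening level and research purposes.
    The label for lower values is an illustrative assumption.
    """
    if coefficient > 1.0:
        raise ValueError("a reliability coefficient cannot exceed 1.0")
    if coefficient >= 0.90:
        return "diagnostic decision making"
    if coefficient >= 0.80:
        return "screening and research"
    return "below screening threshold"


# Upper bound of the reported CDI internal consistency range
cdi_total_alpha = 0.89
cdi_use = reliability_use(cdi_total_alpha)
```

Applied to the upper bound of the reported CDI alphas (.89), the rule returns the screening and research category, consistent with the conclusion drawn in the text.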

Erford et al. (2011) indicated the CDI was used in 17 clinical trials reported in 15 articles authored by Ackerson, Scogin, McKendree-Smith, and Lyman (1998); Asarnow, Scott, and Mintz (2002); Barrera, Chung, Greenberg, and Fleming (2002); De Cuyper, Timbremont, Braet, De Backer, and Wullaert (2004); Fine, Forth, Gilbert, and Haley (1991); Horowitz, Garber, Ciesla, Young, and Mufson (2007); Kahn, Kehle, Jenson, and Clark (1990); Liddle and Spence (1990); Mendlowitz et al. (1999); Nolan et al. (2002); Roberts, Kane, Thomson, Bishop, and Hart (2003); Rossello and Bernal (1999); Rossello, Bernal, and Rivera-Medina (2008); Sheffield et al. (2006); and Weisz, Thurber, Sweeney, Proffitt, and LeGagnoux (1997). Mean effect sizes can be computed by averaging the weighted effect sizes derived from individual clinical trials into a grand average across studies. Averaging effect size comparisons from numerous studies can be challenging. However, as long as the effect sizes derived from multiple dependent variables within a single study are computed using the same formula and from the same comparison condition (i.e., wait list; placebo; treatment as usual [TAU]; or no-comparison, single-group study), the averaging of effect sizes yields robust and meaningful comparisons.
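The grand-average computation described above can be sketched as a weighted mean. The excerpt does not state which weights Erford et al. (2011) applied, so weighting by sample size is an assumption made here for illustration, and the trial values are hypothetical rather than drawn from the meta-analysis.

```python
def weighted_mean_effect_size(effects, weights):
    """Return the weighted mean of per-trial effect sizes (e.g., Cohen's d)."""
    if not effects or len(effects) != len(weights):
        raise ValueError("effects and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(d * w for d, w in zip(effects, weights)) / total_weight


# Hypothetical trials: effect sizes and sample sizes (not from the source)
trial_d = [0.20, 0.50, 0.80]
trial_n = [100, 50, 50]

# Assumption: each trial is weighted by its sample size
grand_mean_d = weighted_mean_effect_size(trial_d, trial_n)
```

Because the largest hypothetical trial carries the smallest effect, the weighted mean (0.425) falls below the unweighted mean of 0.50, which shows why the weighting choice matters when aggregating across trials.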

Mean effect size comparisons of the CDI (Erford et al., 2011) found no significant differences in effect sizes when the CDI was compared with the RADS-2 (d_CDI = 1.64 vs. d_RADS-2 = 1.85, k = 1, n = 34), where k equals the number of studies compared; the CBCL-M-I (d_CDI = 0.23 vs. d_CBCL-M-I = 0.43, k = 3, n = 311); and the CES-D (d_CDI = 0.29 vs. d_CES-D = 0.24, k = 2, n = 429). The number of studies comparing effect sizes for these measures was small (ks of 1-3 studies), so the findings should be interpreted with caution. It should also be noted that aggregated effect sizes for one instrument may not be directly comparable with an aggregated effect size for a second instrument, because different clinical trials may have been used to construct the aggregated effect sizes. The CDI was found to have a significantly lower effect size when compared with the HAM-D (d_CDI = 0.57 vs. d_HAM-D = 1.39, k = 2, n = 37), which could be due to the small number of studies or the comparison between child self-report and clinician report, when the clinicians were not blind to participant treatment conditions.

The CDI is one of the most widely used assessments for determining depressive symptoms in children (Carlson, 2007; Erford et al., 2011; Freeman, 2007), and it is a good choice for the assessment of preadolescents. Depressive symptoms are rated by the client, his or her parents, and teachers. The CDI series is easy to score and interpret and available in many languages, although few studies have been published validating the translated versions. In addition, more information must be collected in support of the score validity of the English version measure. Finally, the CDI should be normed on a national, diverse sample to have greater confidence in norm-referenced interpretations of scores for screening and research decisions.

* BDI-II

The BDI-II is used to assess depression in individuals 13 years and older. The BDI was originally created by Aaron T. Beck in 1961 and was revised in 1978 and again in 1996 (i.e., the BDI-II). The BDI-II can be used in clinical and subclinical populations as a screening level measure for assessing the severity of depression symptoms. However, it should not be used to determine a clinical diagnosis for depression. The criteria in the BDI-II align well with the criteria for depression specified by the Diagnostic and Statistical Manual of Mental Disorders (4th ed., text rev.; DSM-IV-TR [hereinafter referred to as the DSM when used in the general sense]; American Psychiatric Association [APA], 2000; Erford, 2006; Farmer, 2001).

BDI-II items are written at a fifth-grade reading level (Beck et al., 1996). It is important to be aware that the BDI-II does not contain a validity scale and may allow individuals to provide false responses undetected; therefore, caution in interpretation is warranted. Classified as a Level C measure by the Psychological Corporation, the BDI-II can be purchased only by an individual with a doctoral degree in psychology or a related field, by a licensed counseling or psychological professional, or for research purposes (Farmer, 2001). The cost of one administration is approximately $2.20; the cost of the examiner kit, including the technical manual, is $115.00. A Spanish version of the BDI-II is available (Beck et al., 1996).

The BDI-II consists of 21 self-report items, each scored on a 4-point Likert-type response scale ranging from 0 (absence of symptom) to 3 (severe manifestation of symptom; Beck et al., 1996). The items may be read aloud by the examiner if necessary. The BDI-II protocol can be hand or computer scored. The total raw score consists of the simple sum of the raw scores of the 21 items (Farmer, 2001). Raw scores indicate the general severity of depression and can range from 0 to 63. Interpretation of the raw score total is accomplished through criterion-referenced strategies; the BDI-II is not norm referenced. Beck et al. (1996) suggested the following criterion interpretive ranges: 0-13 = minimal depression, 14-19 = mild depression, 20-28 = moderate depression, and 29-63 = severe depression. No diagnostic validity studies using these suggested cutoff scores were provided.
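The scoring rule and the Beck et al. (1996) interpretive ranges described above can be illustrated with a short sketch. The function and constant names are assumptions introduced for illustration; only the item count, the 0 to 3 response scale, and the cutoff ranges come from the text.

```python
# Criterion interpretive ranges suggested by Beck et al. (1996)
BDI_II_RANGES = [
    (0, 13, "minimal depression"),
    (14, 19, "mild depression"),
    (20, 28, "moderate depression"),
    (29, 63, "severe depression"),
]


def score_bdi_ii(item_scores):
    """Sum 21 item scores (each 0-3) and return (total, severity label)."""
    if len(item_scores) != 21:
        raise ValueError("the BDI-II has exactly 21 items")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    total = sum(item_scores)
    for low, high, label in BDI_II_RANGES:
        if low <= total <= high:
            return total, label
    raise AssertionError("unreachable: totals always fall between 0 and 63")


# Hypothetical respondent who endorses a 1 on every item
example_total, example_label = score_bdi_ii([1] * 21)
```

As the text notes, no diagnostic validity studies were provided for these cutoffs, so the returned label describes screening-level severity only.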

The BDI-II was standardized on a clinical sample of 500 individuals receiving outpatient therapy and on a convenience sample of 120 Canadian college students. Overall, the psychometric properties of the BDI-II are very good. Internal consistency of scores on the BDI-II was demonstrated through studies showing coefficient alphas ranging from .90 to .93 for the total score (Coles, Gibb, & Heimberg, 2001; Dozois, Dobson, & Ahnberg, 1998; Grothe et al., 2005). Test-retest reliability was demonstrated on the original normative sample (r = .93; Beck et al., 1996), and Coles et al. (2001) reported a test-retest reliability coefficient of r = .91. Convergent validity was demonstrated between the BDI-II and other measures, such as the HAM-D (r = .71; Beck et al., 1996) and the Symptom Checklist-90-R (Derogatis, 1994) Depression subscale (r = .82; Coles et al., 2001). Scores on the BDI-II have also demonstrated adequate factorial validity. Dozois et al. (1998) found a two-factor solution for the BDI-II (i.e., Cognitive-Affective and Somatic-Vegetative). The BDI-II is one of the most psychometrically sound screening instruments available for assessing depressive symptoms in adolescents and adults (Erford, 2006).

In the Erford et al. (2011) meta-analysis, the BDI or BDI-II was used in 13 studies across 12 articles authored by Brent et al. (1997); Clarke, Rohde, Lewinsohn, Hops, and Seeley (1999); Diamond, Siqueland, and Diamond (2003); Hyun, Chung, and Lee (2005); Kaufman, Rohde, Seeley, Clarke, and Stice (2005); Lewinsohn, Clarke, Hops, and Andrews (1990); Miller, Gur, Shanok, and Weissman (2008); Mufson et al. (2004); Mufson, Weissman, Moreau, and Garfinkel (1999); Rohde, Lewinsohn, and Seeley (1994); Stice, Burton, Bearman, and Rohde (2006); and Stice, Rohde, Seeley, and Gau (2008). The BDI-II had an equivalent effect size estimate when compared with the HAM-D (d = 0.49 each, k = 6, n = 347) and the CES-D (d_BDI-II = 0.81 vs. d_CES-D = 0.76, k = 2, n = 124). When contrasted with the CBCL-M-I, the BDI-II had a significantly greater effect size (d_BDI-II = 0.75 vs. d_CBCL-M-I = 0.01, k = 2, n = 104), but this may be due to the small number of studies and the comparison of self-report to mother-report estimates. The BDI-II is an excellent choice for depression outcome research and clinical practice with children 13 years and older as a self-report measure.

* HAM-D

The HAM-D is a 21-item, Level B instrument designed to measure the severity of symptoms of individuals who have already been diagnosed with depression. The scale is conducted in a semistructured interview style by a medical or mental health professional. The HAM-D was created by Max Hamilton in 1960 and relies primarily on the skill of the interviewer in terms of finding reliable and necessary information. Since the original publication of the HAM-D, multiple variations have been created, often making the search for scale and psychometric information confusing. However, the original version of the HAM-D has been credited as the most frequently used rating scale for depression (Williams, 2001). The HAM-D is available free of charge at http://healthnet.umassmed.edu/mhealth/HAMD.pdf.

The original HAM-D contained 21 questions written at a high school reading level, 17 of which have responses scaled into categories of increasing intensity as well as equally weighted terms (Hamilton, 1960). The final four items (i.e., diurnal variation, derealization, paranoid symptoms, and obsessional symptoms) were not included on the actual rating scale because of infrequency and lack of relevance to the disorder. Therefore, the total score is based on the first 17 items. Each of these 17 items is measured either on a 5-point or 3-point scale. The 5-point scale (0-4) is used for seven of the 17 items, and the 3-point scale (0-2) is used for the remaining 10 items, for a total possible raw score of 48 points. For each item, the frequency and the intensity of a certain symptom are equally weighted; because of this, it is the rater's job to decide if any emphasis should be placed on either. Three specific factors have been derived from the HAM-D items: General Depression or Symptom Severity, Agitation Versus Retardation, and Insomnia (Hamilton, 1960). Proper administration of the original HAM-D called for two raters to score the interview independently. Interpretive ranges for the total raw score are as follows: 0-7 = normal, 8-13 = mild depression, 14-18 = moderate depression, 19-22 = severe depression, and 23+ = very severe depression. The HAM-D has been translated into Mandarin and Turkish (Whisman et al., 1989).
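The arithmetic behind the maximum raw score and the interpretive ranges given above can be checked with a brief sketch. The names are illustrative assumptions, and the code takes an already-summed total because the excerpt does not specify which of the 17 items use which response scale.

```python
# Item counts and scale maxima as described in the text
FIVE_POINT_ITEMS = 7    # items scored 0-4
THREE_POINT_ITEMS = 10  # items scored 0-2
MAX_TOTAL = FIVE_POINT_ITEMS * 4 + THREE_POINT_ITEMS * 2  # 28 + 20 = 48


def interpret_ham_d(total):
    """Map a 17-item HAM-D total raw score to its interpretive range."""
    if not 0 <= total <= MAX_TOTAL:
        raise ValueError(f"total must fall between 0 and {MAX_TOTAL}")
    if total <= 7:
        return "normal"
    if total <= 13:
        return "mild depression"
    if total <= 18:
        return "moderate depression"
    if total <= 22:
        return "severe depression"
    return "very severe depression"


# Hypothetical total raw score
example_range = interpret_ham_d(16)
```

Note that more than half of the possible score range (23 to 48) falls in the highest category, so the interpretive bands are not evenly spaced across the scale.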

The psychometric properties of the HAM-D have been reviewed several times, and procedural tactics when using the scale have caused irregularity in reliability, validity, and item-response characteristics (Hedlund & Vieweg, 1979). High interrater reliability was found when using the HAM-D, but because of the lack of structure in the interview process, as well as interviews conducted by professionals from the same facility, this high reliability could be related to the shared background or perspectives of raters (Whisman et al., 1989).

In a meta-analysis on the validity and reliability of the original HAM-D, 70 studies were selected to review the psychometric properties of the scale (Bagby, Ryder, Schuller, & Marshall, 2004). Internal reliability for the HAM-D was stated to be adequate, ranging from .46 to .97 (the α = .46 study was a very low outlier), where 10 studies estimated alpha to be greater than .70. Bagby et al. (2004) noted that the only item on the HAM-D that did not meet adequate internal reliability criteria was "loss of insight." Interrater reliability for the total score ranged from r = .82 to r = .98, and test-retest reliability ranged from r = .81 to r = .98.

Although most of the HAM-D items align with DSM-IV-TR (APA, 2000) diagnostic criteria, several items do not, including psychic anxiety, loss of insight, and hypochondriasis. The DSM-IV-TR also has items that the HAM-D does not explicitly account for; instead, these items are integrated into various alternative items found on the scale. Convergent validity was adequate; however, the HAM-D and the Structured Clinical Interview for DSM (SCID; First, Spitzer, Gibbon, & Williams, 1997) major depression scale did not correlate significantly (Bagby et al., 2004). Predictive validity for the HAM-D is difficult to determine because of the multidimensionality of the scale. Thus, treatment may affect only a single dimension of the symptomatic presentation of clients with depression in a concurrent time period. The total score is most often interpreted because, although EFA has been conducted on HAM-D scores, no consensus item-factor structure has emerged.

Although the original HAM-D is one of the most prominent rating scales for depression, several structured interview permutations have been established based on the original scale, including the structured interview version created in cooperation with the National Institute of Mental Health (NIMH; i.e., the Diagnostic Interview Schedule-Hamilton Rating Scale for Depression [DIS-HRSD; Whisman et al., 1989]) to counteract the reliability and validity limitations found in the original HAM-D (Whisman et al., 1989). (Note. In the literature, the abbreviations HAM-D and HRSD are used to refer to the Hamilton Rating Scale for Depression, also known as the Hamilton Depression Rating Scale [HDRS or HAM-D].) Both the HAM-D and DIS-HRSD had similar correlations with three other depression scales (i.e., the BDI, the Depression scale of the Minnesota Multiphasic Personality Inventory-2 [Butcher et al., 2001], and the Carroll Rating Scale for Depression [Carroll, Feinberg, Smouse, Rawson, & Greden, 1981]), showing a comparable degree of convergent validity with the other scales.

Erford et al. (2011) indicated the HAM-D was used in 12 clinical trials authored by Ackerson et al. (1998); Clarke, Hawkins, Murphy, and Sheeber (1995); Clarke et al. (2001); Clarke et al. (2002); Clarke et al. (1999); Diamond et al. (2003); Kaufman et al. (2005); Miller et al. (2008); Mufson et al. (2004); Mufson et al. (1999); Rohde et al. (1994); and Young, Mufson, and Davies (2006b). According to the Erford et al. (2011) meta-analysis results, when the HAM-D was compared with the BDI-II (k = 6, n = 347), there was no difference in effect size (both ds = 0.49). Also, there was no significant difference when the HAM-D was compared with the CES-D (both ds = 0.31, k = 3, n = 292). The HAM-D was found to have a significantly larger effect size when compared with the CDI (d_HAM-D = 1.39 vs. d_CDI = 0.57, k = 2, n = 37) and the CBCL-M-I (d_HAM-D = 0.30 vs. d_CBCL-M-I = 0.01, k = 2, n = 152), although the small number of studies warrants caution in concluding the extent of such differences. The original HAM-D has been modified and restructured over the years, but it continues to be a valuable depression outcome measure and is the best choice for a clinician-administered interview of depression symptoms in school-age children.

* ASEBA

The ASEBA originated in 1980 with the work of Thomas M. Achenbach and Craig Edelbrock. Achenbach revised the scales in 1991 into three separate instruments: the CBCL, the Teacher's Report Form (TRF), and the Youth Self-Report (YSR). These three forms were then revised and renormed in 2001 as the ASEBA and designed to assess and identify behavioral and emotional disorders in children 1 1/2 to 18 years old. Included in the assessments are questions that ask for descriptive information about the child's competencies, as well as questions asking for ratings of behavioral, emotional, and social problems. Each item is rated on a 3-point Likert-type response scale. The response choices are 0 = not true, 1 = somewhat/sometimes true, and 2 = very true/often true. The items are written overtly, and the respondent can easily provide a false answer, so caution is warranted when interpreting results (Erford, 2006).

The ASEBA is a Level B instrument with items written at a middle school reading level. The ASEBA can be scored by hand or by using a computer-assisted scoring and interpretive program. Computer scoring is recommended because hand scoring the assessment is quite tedious, time consuming, and prone to errors (Achenbach & Rescorla, 2001). The scores can be converted to norm-referenced T scores and percentile ranks. The ASEBA scales have been translated into 61 different languages, and it ordinarily takes about 15 to 20 minutes to complete each of the three respondent versions (Achenbach & Rescorla, 2001) because each version is composed of approximately 113 items. The ASEBA costs about $1.00 per administration, and the starter kit with computer scoring software, a manual, and 50 copies of each of the report protocols costs about $400.00.

Factor analysis of the CBCL revealed eight empirical factors: Withdrawn/Depressed, Somatic Complaints, Anxious/Depressed, Social Problems, Thought Problems, Attention Problems, Rule-Breaking Behavior, and Aggressive Behavior (Achenbach & Rescorla, 2001). Two broad groupings of syndromes also appear on the ASEBA: the Internalizing and Externalizing scales. The Internalizing scale is more pertinent for the assessment of depression than is the Externalizing scale given the former scale's focus on inner problems and perspectives. DSM-oriented subscales are also included on the ASEBA for interpreting results and include Attention Deficit/Hyperactivity Problems, Anxiety Problems, Oppositional Defiant Problems, Affective Problems, Conduct Problems, and Somatic Problems (Erford, 2006).

The most frequently used scale for measurement of depression in school-age youth was the Internalizing scale (Erford et al., 2011). Unfortunately, the Internalizing scale measures an amalgamation of internalizing symptoms (e.g., depression, anxiety, withdrawn, somatization), so it is an imprecise measure of depression outcomes. As an alternative to the Internalizing scale, researchers could use the empirically derived Withdrawn/Depressed or Anxious/Depressed subscales; although with fewer items, these subscales yield lower levels of score reliability. It is particularly unfortunate that a DSM-oriented subscale of depression was not created.

Overall, the psychometric properties of the ASEBA are good. Test-retest reliability coefficients ranged from .60 to .95 for the empirically based scales and from .62 to .95 for the DSM-oriented scales. Coefficient alphas are higher. Discriminant analyses revealed significant discrimination of scores between nonreferred and referred populations. Finally, criterion-related validity was demonstrated by substantial correlations between the ASEBA and other measures, including the Conners 3 scales (Conners, 2008), where coefficients ranged from .71 to .89, as well as DSM criteria (Achenbach & Rescorla, 2001).

The CBCL portion of the ASEBA is a checklist that parents complete regarding their children. There are different forms of the CBCL, one for children 1 1/2 to 5 years old and another for children 6 to 18 years old. A teacher completes the TRF portion of the ASEBA regarding the student under evaluation. The YSR is completed by youth between the ages of 11 and 18 years. In addition to these three forms, the ASEBA has a few additional forms, including the Young Adult Self-Report, which is similar to the YSR, but for individuals from 18 to 30 years old. The Direct Observations Form allows structured comments by an observer about children 5 to 14 years old. The Young Adult Behavior Checklist is similar to the CBCL except that it is meant for parents to report about young adults' behavioral, emotional, or social problems. Finally, the Semistructured Clinical Interview for Children & Adolescents is an interview that can be used with children 6 to 18 years old (Erford, 2007).

Erford et al. (2011) indicated the CBCL was used in 10 studies across eight articles authored by Ackerson et al. (1998); Clarke et al. (2002); De Cuyper et al. (2004); Kovacs et al. (2006); Lewinsohn et al. (1990); Melvin et al. (2006); Roberts et al. (2003); and Rossello et al. (2008). According to mean effect size comparisons from the Erford et al. meta-analysis, the CBCL-M-I was equivalent to the CDI (d_CBCL-M-I = 0.43 vs. d_CDI = 0.23, k = 3, n = 311). However, the CBCL-M-I had significantly lower effect sizes when compared with the HAM-D (d_CBCL-M-I = 0.01 vs. d_HAM-D = 0.30, k = 2, n = 152) and the RADS-2 (d_CBCL-M-I = -0.16 vs. d_RADS-2 = 0.41, k = 1, n = 42). Again, the small number of studies and sample sizes require interpretive caution.

* CES-D

The CES-D was developed by Radloff (1977, 1991) while she worked for NIMH. The CES-D is a 20-item, Level B, self-report scale, and there is no cost to use this instrument (see http://www.chcr.brown.edu/pcoc/cesdscale.pdf). The CES-D is meant to be used with individuals 14 years and older as a screening level tool for depressive symptoms. Items are written at a middle school reading level. The CES-D measures symptoms of depression as they have occurred over the previous week using a 4-point Likert-type response scale. Administration takes about 10 minutes to complete, and the instrument can be scored by hand. Four items are worded positively and have to be reverse scored. The item raw scores are summed and yield a Depression Total score. The CES-D has been translated into Greek, Korean, and Japanese (Radloff, 1991).
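The reverse-scoring and summing procedure described above can be sketched as follows. The excerpt does not state the response values or which four items are positively worded; the 0 to 3 coding and item positions 4, 8, 12, and 16 follow the commonly distributed CES-D form and should be treated as assumptions to verify against the form actually administered.

```python
# Assumption: positively worded items sit at positions 4, 8, 12, and 16
REVERSED_ITEMS = {4, 8, 12, 16}
# Assumption: the 4-point response scale is coded 0-3
MAX_RESPONSE = 3


def score_ces_d(responses):
    """Return the Depression Total score for 20 responses coded 0-3."""
    if len(responses) != 20:
        raise ValueError("the CES-D has exactly 20 items")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if response not in (0, 1, 2, 3):
            raise ValueError("each response must be coded 0, 1, 2, or 3")
        if item_number in REVERSED_ITEMS:
            # Reverse score positively worded items: 0 -> 3, 1 -> 2, and so on
            total += MAX_RESPONSE - response
        else:
            total += response
    return total


# Hypothetical respondent who answers 0 to every item
all_zero_total = score_ces_d([0] * 20)
```

A respondent who marks the lowest option on every item still receives 12 points, because the lowest response to a positively worded item signals its absence. This is why skipping the reverse-scoring step distorts the total.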

Psychometric properties of the CES-D when used with adult populations were adequate. Little information is available specifically regarding use of the CES-D with adolescent samples, but given that many in the adult samples were college-age participants, it is probable that the psychometric results will generalize to older adolescents. Still, empirical studies of the CES-D with adolescent samples are needed. Reliability was demonstrated with internal consistency coefficients ranging from r = .84 to r = .90, and test-retest reliability coefficients ranged from r = .51 to r = .67 over an 8-week period and from r = .41 to r = .54 over periods of 3 to 12 months (Radloff, 1977). Studies have also demonstrated the validity of the CES-D scores. The CES-D correlated r = .50-.80 with the HAM-D, r = .30-.80 with the Raskin rating scale (Raskin, Schulterbrandt, Reatig, & McKeon, 1969), and r = .40-.50 with the Lubin Depression Adjective Checklist (Lubin, 1981). Factor analysis (A. B. Shafer, 2006) revealed a four-factor structure: Interpersonal Problems, Somatic Symptoms, Positive Affect, and Depressed/Negative Affect. However, it should be noted that no formal subscales of the CES-D are acknowledged and only the Depression Total score is interpreted.

Erford et al. (2011) indicated that the CES-D was used as an outcome measure in eight clinical trials authored by Clarke et al. (1995); Clarke et al. (2001); Clarke et al. (2002); Horowitz et al. (2007); Lewinsohn et al. (1990); Rohde et al. (1994); Sheffield et al. (2006); and Young, Mufson, and Davies (2006a). Mean effect size comparisons from the Erford et al. meta-analysis found no significant differences between the CES-D and other depression scales. When compared with the BDI-II (k = 2, n = 124), the CES-D had an effect size of 0.76, whereas the BDI-II had an effect size of 0.81. A similar effect size was found when the CES-D was compared with the CDI (d_CES-D = 0.24 vs. d_CDI = 0.29, k = 2, n = 429). Overall, the CES-D is a very cost- and time-effective instrument used to assess symptoms of depression in adults. However, caution must be taken when using it with adolescents because research on its use with that population has been sparse. Greater research in this area would be beneficial.

* RADS-2

The RADS-2 is a self-report screening measure designed to assess depressive symptomatology in adolescents 11 to 20 years old. A revision of the original Reynolds Adolescent Depression Scale (W. M. Reynolds, 1987), the RADS-2 takes about 10 minutes to administer, score, and interpret. RADS-2 items are written on a third-grade reading level, and items are worded overtly, so they are potentially influenced by biased reporting (e.g., social desirability). The RADS-2 can be administered in groups or individually (Blair, 2005). The RADS-2 is a Level B instrument. If necessary, the instrument may be read aloud to individuals unable to read the test items. The RADS-2 costs $4.00 per administration and $160.00 for the examiner's kit (Erford, 2007). To date, the RADS-2 is available only in English.

Individuals rate RADS-2 items on a 4-point Likert-type response scale, where 1 = almost never, 2 = hardly ever, 3 = sometimes, and 4 = most of the time. The item raw scores are totaled and yield scores on four subscales (i.e., Anhedonia/Negative Affect, Dysphoric Mood, Negative Self-Evaluation, and Somatic Complaints). Scores from each subscale are summed to derive a Depression Total score (Blair, 2005). The RADS-2 must be hand scored using a pressure-sensitive form. Raw scores can be converted into T scores and percentile ranks and compared by gender and age groups. T scores less than 61 are considered normal, T scores from 61 to 64 indicate mild depression, T scores from 65 to 69 indicate moderate depression, and T scores of 70 and higher indicate severe depression (Erford, 2006, 2007).
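The T-score bands described above can be written as a small lookup. The following Python sketch is illustrative only: the function name rads2_severity is hypothetical, and the sketch assumes the Depression Total T score has already been obtained from the norm tables in the professional manual (the raw-to-T conversion is not reproduced here). It restates the cutoffs given in the text and is not a substitute for clinical interpretation.

```python
def rads2_severity(t_score: float) -> str:
    """Return the descriptive band for a RADS-2 Depression Total T score.

    Cutoffs follow the ranges stated in the text:
    below 61 normal, 61-64 mild, 65-69 moderate, 70 and higher severe.
    The function name and return labels are illustrative assumptions.
    """
    if t_score < 61:
        return "normal"
    if t_score < 65:
        return "mild"
    if t_score < 70:
        return "moderate"
    return "severe"


if __name__ == "__main__":
    # Boundary values for each band
    for t in (60, 61, 64, 65, 69, 70):
        print(t, rads2_severity(t))
```

Keeping the cutoffs in one function makes it easier to apply the same interpretation rule consistently across pretest and posttest scores in an outcome study.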

The psychometric properties of the RADS-2 are adequate for its screening level purposes. It was standardized on a sample of 3,300 individuals (1,650 males and 1,650 females; Blair, 2005). The sample was representative of the racial composition of the U.S. population; however, no information was included about the residential makeup of the sample. Internal consistency was α = .92, and test-retest reliability was r = .86 for the Depression Total score. EFA (W. M. Reynolds, 2002) indicated that four factors underlie the RADS-2 items (i.e., Dysphoric Mood, Anhedonia/Negative Affect, Negative Self-Evaluation, and Somatic Complaints), but the total score is the most reliable and frequently interpreted score. Convergent validity was demonstrated through correlation (r = .84) with the HAM-D and other measures of depression (W. M. Reynolds, 2002).

Erford et al. (2011) indicated that the RADS-2 was used as an outcome measure in five clinical trials authored by Kahn et al. (1990); Kovacs et al. (2006); March et al. (2004); Melvin et al. (2006); and Puskar, Sereika, and Tusaie-Mumford (2003). Mean effect size comparisons using Erford et al.'s results found the RADS-2 to be a good choice for outcome research. When compared with the CBCL-M-I, the RADS-2 had a considerably higher effect size (d = 0.41 for the RADS-2 vs. d = 0.16 for the CBCL-M-I, k = 1, n = 42). The RADS-2 had an equivalent effect size when compared with the CDI (d = 1.85 for the RADS-2 vs. d = 1.64 for the CDI, k = 1, n = 34). Overall, the RADS-2 is a cost- and time-effective measure, yielding reliable and valid scores, and is an efficient norm-referenced, self-report instrument for outcome research and screening adolescents for depressive symptoms.

* Limitations and Implications for Counseling Research and Practice

Consistent with Erford et al.'s (2011) meta-analytic report, the primary limitation of this area of study is the paucity of quality clinical trials studying the effectiveness of treatments for school-age youth with depression. Erford et al. located only 42 articles published between 1990 and 2009 using wait-list, placebo, or TAU comparison group designs, or single-group designs, each using single or varied combinations of depression outcome measures (see Table 2). Indeed, only 15 of those 42 clinical trials used more than one depression outcome measure. As a result, effect size comparisons between various combinations of depression outcome measures were composed of small numbers of studies (ks of 1-6) and small combined sample sizes (ns of 34-429) in the present study. Although the cumulative data were comprehensive and reflect accurately on the current state of available knowledge on depression outcome measures used in studies of school-age youth, the small number of studies and sample sizes probably also reflect low degrees of power and must be interpreted cautiously. As future clinical trials with youth are conducted and research findings accumulate, the power of analyses such as this one will increase and allow greater confidence in interpretations.

An additional limitation was the lack of substantial psychometric information on some of the instruments. Even though these six instruments (or earlier versions) have each existed for at least 20 years, essential psychometric evidence related to reliability and validity of scores for some of the instruments and some age ranges has not yet been produced. Future confident use of these instruments for outcome research will hinge on the publication of this vital information.

Two important implications for counseling research and practice involve the degree to which future outcome studies of the treatment of depression in school-age children consistently use multiple depression measures and use instruments of demonstrated high quality. As mentioned earlier, only 15 of the 42 clinical trials included in the Erford et al. (2011) meta-analysis used more than one depression outcome measure as the dependent variable. Best practice would be to administer several high-quality instruments to measure the depression dependent variable in order to provide ample evidence of the effect of treatment over time and across measures. Thus, if for some reason one instrument is less sensitive to therapeutic change, a second or even third instrument may help to document the treatment effect, because different depression instruments have different sensitivities. Such a practice would also allow triangulation of results through multiple measures across multiple respondents, leading to a more thorough, robust documentation of treatment effects. This practice would also provide effect sizes for each instrument, thus allowing comparisons of the effect of treatment on various instruments in a greater number of studies. If this were the practice used in the 42 studies selected into the Erford et al. meta-analysis, the k comparisons offered in this review would have been in the range of about 20 studies, rather than the resulting k of one to six studies per instrument pairing. A higher number of study comparisons leads to increased power in the analysis, and Cornwell (1993) and Cornwell and Ladd (1993) indicated that comparisons with k = 20 ordinarily have sufficient statistical power.

Second, the use of instruments with demonstrated high levels of score reliability and validity offers more precise measurement of the treatment effects under study. Better instrumentation leads to more accurate measurement. The availability of so many depression inventories can be problematic for counselors trying to determine the most appropriate instrument to use for clinical or research purposes. Fortunately, clinicians and researchers working with school-age youth have a more simplified task given that the number of instruments available for use with this younger population is but a fraction of the number available for use with adults. The problem is simplified a bit further when one realizes that only a few scales for assessing depression in school-age children have versions that can be completed by parents and teachers knowledgeable of the child's depressive symptoms; few adult depression scales have other-report options.

For certain, different depression instruments have been developed for different purposes and populations (see Table 1), but when it comes to measuring depression in school-age children, the results of the analyses conducted in this review indicate that some instruments are better choices than others. Certainly, cost is a consideration that will guide use for some researchers and practitioners, and the CES-D and HAM-D are free-access instruments. Practical efficiency, such as ease of administration, scoring, and interpretation, is also a consideration, and each of the six instruments reviewed in this article has advantages in this regard. The availability of self- and other-report versions makes some instruments more attractive. Then, there are variations in the quality of instrument psychometrics that must be considered.

Considering all of these factors in aggregate, each instrument has advantages and disadvantages. The CDI offers self-, parent-, and teacher-report versions for children 7 to 17 years old and generates norm-referenced scores, but it costs about $2.20 per administration and needs additional evidence of validity. The BDI-II offers only a self-report version for individuals 13 years and older and provides criterion-referenced interpretations and good psychometric evidence, but it costs about $2.20 per administration and has a Level C restriction. The CES-D is a self-report, criterion-referenced instrument that can be administered at no cost to individuals 14 years or older, but it lacks sufficient psychometric evidence for confident use with adolescents. The ASEBA provides norm-referenced parent and teacher versions for children 6 to 18 years old and a self-report version for children 11 years and older. Each version costs about $1.00 to administer. However, the ASEBA produces imprecise estimates of depression, given that most studies used the global Internalizing scale and a few used the Withdrawn/Depressed or Anxious/Depressed subscales. The HAM-D is a no-cost, psychometrically adequate instrument completed by the clinician and applicable to individuals of all ages, but it has a marked tendency to overestimate the effect of treatment, perhaps because the clinicians completing the instrument were not blinded to the treatment and therefore provided biased after-treatment assessments. The RADS-2 is a norm-referenced, self-report instrument with good psychometrics for individuals 11 to 20 years old but costs $4.00 per administration.

Finally, one must consider the effect sizes ordinarily generated by these instruments so that accurate results can be judiciously reported and reasonable expectations for outcomes communicated to and by practitioners who will use the instruments with individuals or groups of clients to demonstrate treatment efficacy in schools, clinics, and inpatient and outpatient facilities. An amalgamation of all the effect size comparisons reported in this article indicates that the CBCL-M-I produced the lowest overall effect size, significantly lower than those of the HAM-D, RADS-2, and BDI-II, probably because of the imprecision of this scale in measuring depression and because parent estimates of change may be more conservative than either self- or clinician reports. On the other hand, the HAM-D produced the highest overall effect size, significantly higher than those of the CBCL-M-I and CDI, probably because clinician reports are less conservative than parent reports or because of clinician bias during the posttreatment assessment phase. During all of these comparisons, the BDI-II, CES-D, RADS-2, and CDI yielded equivalent effect sizes, an interesting result given that they are all self-report instruments with varying degrees of psychometric rigor and are used with youth of various ages.

Each of the six instruments reviewed has advantages and disadvantages and is used with different respondents across different age ranges. Still, some have advantages that make them more desirable for use in depression outcome research. Table 3 presents what we believe are the most efficient instruments according to different age levels and different respondent categories, when all considerations are aggregated and evaluated. The HAM-D is the most efficient instrument among clinician-report depression instruments, although caution is warranted because of its tendency to inflate effect sizes. The CDI:T and CDI:P are the most efficient instruments for teacher and parent reports, respectively, for children between the ages of 7 and 17 years. The most efficient instrument for self-report of depression among school-age children is determined, in part, by the age of the child and the psychometric adequacy of the measure. Thus, the CDI is the most efficient instrument for children 7 to 10 years old, primarily because few other self-report instruments cover this age range. The RADS-2 and CDI are good choices of depression instruments for 11- and 12-year-olds; the RADS-2 has the psychometric edge and yields an equivalent effect size but costs nearly double what the CDI costs per administration. Finally, both the BDI-II and RADS-2 are very efficient instruments for use with adolescents 13 years and older; they yield equivalent effect sizes and have comparable psychometrics, although the RADS-2 is more expensive to administer.
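The recommendations in the preceding paragraph amount to a simple decision rule keyed on respondent type and the child's age. The Python sketch below is one hypothetical way to encode that rule. The function name suggest_instruments, the respondent labels, and the choice to return an empty list for ages outside the stated ranges are all assumptions made for illustration, and the upper bound of 20 years for adolescent self-report is borrowed from the RADS-2 age range given earlier rather than from the recommendation itself.

```python
from typing import List


def suggest_instruments(age: int, respondent: str) -> List[str]:
    """Return the instruments the text describes as most efficient.

    respondent is one of "clinician", "teacher", "parent", or "self"
    (labels are illustrative). An empty list means the text gives no
    recommendation for that age and respondent combination.
    """
    respondent = respondent.lower()
    if respondent == "clinician":
        # HAM-D: clinician-completed; the text urges caution because it
        # tends to inflate effect sizes.
        return ["HAM-D"]
    if respondent == "teacher":
        return ["CDI:T"] if 7 <= age <= 17 else []
    if respondent == "parent":
        return ["CDI:P"] if 7 <= age <= 17 else []
    if respondent == "self":
        if 7 <= age <= 10:
            return ["CDI"]
        if 11 <= age <= 12:
            return ["RADS-2", "CDI"]
        if 13 <= age <= 20:  # upper bound assumed from the RADS-2 age range
            return ["BDI-II", "RADS-2"]
        return []
    raise ValueError(f"unknown respondent type: {respondent!r}")


if __name__ == "__main__":
    print(suggest_instruments(9, "self"))
    print(suggest_instruments(12, "self"))
    print(suggest_instruments(15, "parent"))
```

Because the article recommends collecting more than one measure from more than one respondent, a study protocol would typically call such a rule once per respondent category rather than selecting a single instrument.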

* Conclusion

Numerous depression instruments exist for use with school-age children for both clinical purposes and outcome research and vary in terms of expense, practical features, psychometric integrity, and effect size generation. Counselors and researchers can accurately measure the outcomes of their treatments by administering more than one high-quality instrument and collecting multiple respondent perspectives (e.g., self, parent, teacher, clinician). If this practice is adopted by researchers conducting clinical trials of depression treatment in school-age youth, the field will advance rapidly toward the identification of the most efficient instruments that clinicians and researchers can use with greater confidence in both the field and laboratory.

Finally, it must be noted that of the 42 articles from 1990 to 2009 selected into the Erford et al. (2011) depression meta-analysis, none appeared in journals published by the American Counseling Association (ACA). If professional counselors are to attain parity with other mental health professions in the treatment of mental and emotional disorders, professional counselor scholars must contribute high-quality clinical trials to the extant counseling outcome literature. The recent emphasis on counselor identity through the Council for Accreditation of Counseling and Related Educational Programs (2009) Standards and the 20/20: A Vision for the Future of Counseling (ACA, 2009) initiative hinges on the scientific foundation of the counseling profession. Providing high-quality outcome research in counseling journals is a critical element in the evolution of the counseling profession and in attaining parity with other mental health professions.

Received 01/26/11

Revised 04/17/11

Accepted 06/07/11

* References

Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA school-age forms & profiles. Burlington: University of Vermont, Research Center for Children, Youth, & Families.

Ackerson, J., Scogin, F., McKendree-Smith, N., & Lyman, R. D. (1998). Cognitive bibliotherapy for mild and moderate adolescent depressive symptomatology. Journal of Consulting and Clinical Psychology, 66, 685-690. doi:10.1037/0022-006X.66.4.685

American Counseling Association. (2009). 20/20 statement of principles advances the profession. Retrieved from http://www.counseling.org/PressRoom/NewsReleases.aspx?AGuid=4d87a0ce65c0-4074-89dc-2761cfbbe2ec

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author.

Asarnow, J. R., Scott, C. V., & Mintz, J. (2002). A combined cognitive-behavioral family education intervention for depression in children: A treatment development study. Cognitive Therapy and Research, 26, 221-229.

Bagby, R. M., Ryder, A. G., Schuller, D. R., & Marshall, M. B. (2004). The Hamilton Depression Rating Scale: Has the gold standard become a lead weight? American Journal of Psychiatry, 161, 2163-2177.

Barkham, M., Evans, C., Margison, F., McGrath, G., Mellor-Clark, J., Milne, D., & Connell, J. (1998). The rationale for developing and implementing core batteries in service settings and psychotherapy outcomes research. Journal of Mental Health, 7, 35-47.

Barrera, M., Chung, J. Y. Y., Greenberg, M., & Fleming, C. (2002). Preliminary investigation of a group intervention for siblings of pediatric cancer patients. Children's Health Care, 31, 131-142.

Beck, A. T., Steer, R. A., & Brown, G. K. (1996). Manual for the Beck Depression Inventory (2nd ed.). San Antonio, TX: Psychological Corporation.

Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.

Blair, J. (2005). Test review of the Reynolds Adolescent Depression Scale--Second Edition. In B. S. Plake, J. C. Impara, & R. A. Spies (Eds.), The sixteenth mental measurements yearbook. Available from http://www.unl.edu/buros

Brent, D. A., Holder, D., Kolko, D., Birmaher, B., Baugher, M., Roth, C., ... Johnson, B. A. (1997). A clinical psychotherapy trial for adolescent depression comparing cognitive, family, and supportive therapy. Archives of General Psychiatry, 54, 877-885.

Butcher, J. N., Graham, J. R., Ben-Porath, Y. S., Tellegen, A., Dahlstrom, W. G., & Kaemmer, B. (2001). Manual for administration, scoring, and interpretation of the Minnesota Multiphasic Personality Inventory--Second Edition. Minneapolis: University of Minnesota Press.

Butcher, J. N., Williams, C. L., Graham, J. R., Archer, R. P., Tellegen, A., Ben-Porath, Y. S., & Kaemmer, B. (1992). Minnesota Multiphasic Personality Inventory--Adolescent (MMPI-A): Manual for administration, scoring, and interpretation. Minneapolis: University of Minnesota Press.

Carlson, J. F. (2007). Test review of the Children's Depression Inventory. In K. F. Geisinger, R. A. Spies, J. F. Carlson, & B. S. Plake (Eds.), The seventeenth mental measurements yearbook. Available from http://www.unl.edu/buros

Carroll, B. J., Feinberg, M., Smouse, P. E., Rawson, S. G., & Greden, J. F. (1981). The Carroll Rating Scale for Depression: I. Development, reliability and validation. The British Journal of Psychiatry, 138, 194-200. doi:10.1192/bjp.138.3.194

Clarke, G. N., Hawkins, W., Murphy, M., & Sheeber, L. B. (1995). Targeted prevention of unipolar depressive disorder in an at-risk sample of high school adolescents: A randomized trial of group cognitive intervention. Journal of the American Academy of Child and Adolescent Psychiatry, 34, 312-321.

Clarke, G. N., Hornbrook, M., Lynch, F., Polen, M., Gale, J., Beardslee, W., ... Seeley, J. (2001). A randomized trial of a group cognitive intervention for preventing depression in adolescent offspring of depressed parents. Archives of General Psychiatry, 58, 1127-1134.

Clarke, G. N., Hornbrook, M., Lynch, F., Polen, M., Gale, J., O'Connor, E., ... Debar, L. (2002). Group cognitive-behavioral treatment for depressed adolescent offspring of depressed parents in a health maintenance organization. Journal of the American Academy of Child and Adolescent Psychiatry, 41, 305-313.

Clarke, G. N., Rohde, P., Lewinsohn, P. M., Hops, H., & Seeley, J. R. (1999). Cognitive-behavioral treatment of adolescent depression: Efficacy of acute group treatment and booster sessions. Journal of the American Academy of Child and Adolescent Psychiatry, 38, 272-279.

Coles, M. E., Gibb, B. E., & Heimberg, R. G. (2001). Psychometric properties of the Beck Depression Inventory in adults with social anxiety disorder. Depression and Anxiety, 14, 145-148.

Conners, C. K. (2008). Manual for the Conners 3. North Tonawanda, NY: Multi-Health Systems.

Coopersmith, S. (2002). Revised Coopersmith Self Esteem Inventory manual. Redwood, CA: Mind Garden.

Cornwell, J. M. (1993). Monte Carlo comparisons of three tests for homogeneity of independent correlations. Educational & Psychological Measurement, 53, 605-618. doi:10.1177/0013164493053003003

Cornwell, J. M., & Ladd, R. T. (1993). Power and accuracy of the Schmidt and Hunter meta-analytic procedures. Educational & Psychological Measurement, 53, 877-895.

Council for Accreditation of Counseling and Related Educational Programs. (2009). 2009 standards. Retrieved from http://www.cacrep.org/doc/2009%20Standards.pdf

Cox, J. L., Holden, J. M., & Sagovsky, R. (1987). Detection of postnatal depression: Development of the 10-item Edinburgh Postnatal Depression Scale. British Journal of Psychiatry, 150, 782-786.

De Cuyper, S., Timbremont, B., Braet, C., De Backer, V., & Wullaert, T. (2004). Treating depressive symptoms in schoolchildren: A pilot study. Journal of European Child and Adolescent Psychiatry, 13, 105-114. doi:10.1007/s00787-004-0366-2

Derogatis, L. R. (1994). Manual for the Symptom Check List-90 (SCL-90-R). Minneapolis, MN: Pearson.

Diamond, G., Siqueland, L., & Diamond, G. M. (2003). Attachment-based family therapy for depressed adolescents: Programmatic treatment development. Clinical Child and Family Psychology Review, 6, 107-127.

Dozois, D. J. A., Dobson, K. S., & Ahnberg, J. L. (1998). A psychometric evaluation of the Beck Depression Inventory-II. Psychological Assessment, 10, 83-89.

Erford, B. T. (Ed.). (2006). Counselor's guide to clinical, personality, and behavioral assessment. Boston, MA: Houghton Mifflin.

Erford, B. T. (Ed.). (2007). Assessment for counselors. Boston, MA: Houghton Mifflin.

Erford, B. T., Erford, B. M., Lattanzi, G., Weller, J., Schein, H., Wolf, E., ... Peacock, E. (2011). Counseling outcomes for school-age youth with depression from 1990 to 2008: A meta-analysis. Journal of Counseling & Development, 89, 439-457.

Esbensen, A. J., Rojahn, J., Aman, M. G., & Ruedrich, S. (2003). Reliability and validity of an assessment instrument for anxiety, depression, and mood among individuals with mental retardation. Journal of Autism and Developmental Disorders, 33, 617-629.

Farmer, R. F. (2001). Review of the Beck Depression Inventory--2nd Edition. In B. S. Plake & J. C. Impara (Eds.), The fourteenth mental measurements yearbook (pp. 123-126). Lincoln, NE: Buros Institute of Mental Measurements.

Fine, S., Forth, A., Gilbert, M., & Haley, G. (1991). Group therapy for adolescent depressive disorder: A comparison of social skills and therapeutic support. Journal of the American Academy of Child and Adolescent Psychiatry, 30, 79-85.

First, M. B., Spitzer, R. L., Gibbon, M., & Williams, J. B. (1997). The Structured Clinical Interview for DSM-IV Axis I Disorders--Clinical Version (SCID-CV). Washington, DC: American Psychiatric Press.

Freeman, S. J. (2007). Test review of the Children's Depression Inventory. In K. F. Geisinger, R. A. Spies, J. F. Carlson, & B. S. Plake (Eds.), The seventeenth mental measurements yearbook. Available from http://www.unl.edu/buros

Grothe, K. B., Dutton, G. R., Jones, G. N., Bodenlos, J., Ancona, M., & Brantley, P. J. (2005). Validation of the Beck Depression Inventory--II in a low-income African American sample of medical outpatients. Psychological Assessment, 17, 110-114. doi:10.1037/1040-3590.17.1.110

Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23, 56-62.

Hedlund, J. L., & Vieweg, B. W. (1979). The Hamilton Rating Scale for Depression: A comprehensive review. Journal of Operational Psychiatry, 10, 149-165.

Horowitz, J. L., Garber, J., Ciesla, J. A., Young, J. F., & Mufson, L. (2007). Prevention of depressive symptoms in adolescents: A randomized trial of cognitive-behavioral and interpersonal prevention programs. Journal of Consulting and Clinical Psychology, 75, 693-706. doi:10.1037/0022-006x.75.5.693

Hyun, M. S., Chung, H. I., & Lee, Y. J. (2005). The effect of cognitive-behavioral group therapy on the self-esteem, depression, and self-efficacy of runaway adolescents in a shelter in South Korea. Applied Nursing Research, 18, 160-166. doi:10.1016/j.apnr.2004.07.006

Kahn, J. S., Kehle, T. J., Jenson, W. R., & Clark, E. (1990). Comparison of cognitive-behavioral, relaxation, and self-modeling interventions for depression among middle-school students. School Psychology Review, 19, 196-211.

Kandel, D. B., & Davies, M. (1982). Epidemiology of depressive mood in adolescents. Archives of General Psychiatry, 39, 1205-1212.

Kaufman, N. K., Rohde, P., Seeley, J. R., Clarke, G. N., & Stice, E. (2005). Potential mediators of cognitive-behavioral therapy for adolescents with comorbid major depression and conduct disorder. Journal of Consulting and Clinical Psychology, 73, 38-46. doi:10.1037/0022-006x.73.1.38

Koenig, H. G., Meador, K. G., Cohen, H. J., & Blazer, D. G. (1988). Self-rated depression scales and screening for major depression in the older hospitalized patient with medical illness. Journal of the American Geriatrics Society, 36, 699-706.

Kovacs, M. (2003). Children's Depression Inventory: Technical manual update. North Tonawanda, NY: Multi-Health Systems.

Kovacs, M., Sherrill, J., George, C. J., Pollock, M., Tumuluru, R. V., & Ho, V. (2006). Contextual emotion-regulation therapy for childhood depression: Description and pilot testing of a new intervention. Journal of the American Academy of Child and Adolescent Psychiatry, 45, 892-903. doi:10.1097/01.chi.0000222878.74162.5a

Lewinsohn, P. M., Clarke, G. N., Hops, H., & Andrews, J. (1990). Cognitive-behavior treatment for depressed adolescents. Behavior Therapy, 21, 385-401.

Liddle, B., & Spence, S. H. (1990). Cognitive-behaviour therapy with depressed primary school children: A cautionary note. Behavioural Psychotherapy, 18, 85-102.

Lloyd-Williams, M., Friedman, T., & Rudd, N. (2000). Criterion validation of the Edinburgh Postnatal Depression Scale as a screening tool for depression in patients with advanced metastatic cancer. Journal of Pain and Symptom Management, 20, 259-265.

Lovibond, S. H., & Lovibond, P. F. (1995). Manual for the Depression Anxiety Stress Scales (2nd ed.). Sydney, New South Wales, Australia: Psychology Foundation.

Lubin, B. (1981). Depression Adjective Check Lists-Revised: Manual. Palo Alto, CA: Consulting Psychologists Press.

March, J., Silva, S., Petrycki, S., Curry, J., Wells, K., Fairbank, J., ... Severe, J. (2004). Fluoxetine, cognitive-behavioral therapy, and their combination for adolescents with depression. Journal of the American Medical Association, 292, 807-820.

Mason, B. J., Kocsis, J. H., Leon, A. C., Thompson, S., Frances, A. J., Morgan, R. O., & Parides, M. K. (1993). Measurement of severity and treatment response in dysthymia may have been limited by the structure and format of existing rating instruments. Psychiatric Annals, 23, 625-631.

Melvin, G. A., Tonge, B. J., King, N. J., Heyne, D., Gordon, M. S., & Klimkeit, E. (2006). A comparison of cognitive-behavioral therapy, sertraline, and their combination for adolescent depression. Journal of the American Academy of Child and Adolescent Psychiatry, 45, 1151-1161. doi:10.1097/01.chi.0000233157.21925.71

Mendlowitz, S. L., Manassis, K., Bradley, S., Scapillato, D., Miezitis, S., & Shaw, B. F. (1999). Cognitive-behavioral group treatments in childhood anxiety disorders: The role of parental involvement. Journal of the American Academy of Child and Adolescent Psychiatry, 38, 1223-1229.

Miller, L., Gur, M., Shanok, A., & Weissman, M. (2008). Interpersonal psychotherapy with pregnant adolescents: Two pilot studies. Journal of Child Psychology and Psychiatry, 49, 733-742. doi:10.1111/j.1469-7610.2008.01890.x

Mogge, N. L., & LePage, J. P. (2004). The Assessment of Depression Inventory (ADI): A new instrument used to measure depression and to detect honesty of response. Depression and Anxiety, 20, 107-113. doi:10.1002/da.20033

Montgomery, S. A., & Asberg, M. (1979). A new depression scale designed to be sensitive to change. British Journal of Psychiatry, 134, 382-389.

Moran, P. J., & Mohr, D. C. (2005). The validity of Beck Depression Inventory and Hamilton Rating Scale for Depression items in the assessment of depression among patients with multiple sclerosis. Journal of Behavioral Medicine, 28, 35-41. doi:10.1007/s10865-005-2561-0

Mufson, L., Dorta, K. P., Wickramaratne, P., Nomura, Y., Olfson, M., & Weissman, M. M. (2004). A randomized effectiveness trial of interpersonal psychotherapy for depressed adolescents. Archives of General Psychiatry, 61, 577-584.

Mufson, L., Weissman, M. M., Moreau, D., & Garfinkel, R. (1999). Efficacy of interpersonal psychotherapy for depressed adolescents. Archives of General Psychiatry, 56, 573-579.

Nolan, M., Carr, A., Fitzpatrick, C., O'Flaherty, A., Keary, K., Turner, R., ... Tobin, G. (2002). A comparison of two programmes for victims of child sexual abuse: A treatment outcome study. Child Abuse Review, 11, 103-123. doi:10.1002/car.727

Puskar, K., Sereika, S., & Tusaie-Mumford, K. (2003). Effect of the Teaching Kids to Cope (TKC) program on outcomes of depression and coping among rural adolescents. Journal of Child and Adolescent Psychiatric Nursing, 16, 71-80.

Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385-401. doi:10.1177/014662167700100306

Radloff, L. S. (1991). The use of the Center for Epidemiologic Studies Depression Scale in adolescents and young adults. Journal of Youth and Adolescence, 20, 149-166.

Raskin, A., Schulterbrandt, J., Reatig, N., & McKeon, J. J. (1969). Replication of factors of psychopathology in interview, ward behavior and self-report ratings of hospitalized depressives. Journal of Nervous and Mental Disease, 148, 87-98. doi:10.1097/00005053-196901000-00010

Reynolds, C. R., & Kamphaus, R. W. (2005). Manual for the Behavior Assessment System for Children (2nd ed.). Circle Pines, MN: AGS Publishing.

Reynolds, C. R., & Richmond, B. O. (1985). Manual for the Revised Children's Manifest Anxiety Scale. Los Angeles, CA: Western Psychological Services.

Reynolds, W. M. (1987). Reynolds Adolescent Depression Scale: Professional manual. Odessa, FL: Psychological Assessment Resources.

Reynolds, W. M. (2002). Reynolds Adolescent Depression Scale-Second Edition: Professional manual. Odessa, FL: Psychological Assessment Resources.

Roberts, C., Kane, R., Thomson, H., Bishop, B., & Hart, B. (2003). The prevention of depressive symptoms in rural school children: A randomized controlled trial. Journal of Consulting and Clinical Psychology, 71, 622-628. doi:10.1037/0022-006x.71.3.622

Rohde, P., Lewinsohn, P. M., & Seeley, J. R. (1994). Response of depressed adolescents to cognitive-behavioral treatment: Do differences in initial severity clarify the comparison of treatments? Journal of Consulting and Clinical Psychology, 62, 851-854.

Rossello, J., & Bernal, G. (1999). The efficacy of cognitive-behavioral and interpersonal treatments for depression in Puerto Rican adolescents. Journal of Consulting and Clinical Psychology, 67, 734-745.

Rossello, J., Bernal, G., & Rivera-Medina, C. (2008). Individual and group CBT and IPT for Puerto Rican adolescents with depressive symptoms. Cultural Diversity and Ethnic Minority Psychology, 14, 234-245. doi:10.1037/1099-9809.14.3.234

Rush, A. J., Giles, D. E., Schlesser, M. A., Fulton, C. L., Weissenburger, J. E., & Burns, C. T. (1986). The Inventory of Depressive Symptomatology (IDS): Preliminary findings. Psychiatry Research, 18, 65-87.

Rush, A. J., Gullion, C. M., Basco, M. R., Jarrett, R. B., & Trivedi, M. H. (1996). The Inventory of Depressive Symptomatology (IDS): Psychometric properties. Psychological Medicine, 26, 477-486.

Shafer, A. B. (2006). Meta-analysis of the factor structures of four depression questionnaires: Beck, CES-D, Hamilton, and Zung. Journal of Clinical Psychology, 62, 123-146. doi:10.1002/jclp.20212

Shafer, D., Fisher, P., Lucas, C. P., Dulcan, M. K., & Schwab-Stone, M. E. (2000). NIMH Diagnostic Interview Schedule for Children Version IV (NIMH DISC-IV): Description, differences from previous versions, and reliability of some common diagnoses. Journal of the American Academy of Child and Adolescent Psychiatry, 39, 28-38.

Sheffield, J. K., Spence, S. H., Rapee, R. M., Kowalenko, N., Wignall, A., Davis, A., & McLoone, J. (2006). Evaluation of universal, indicated, and combined cognitive-behavioral approaches to the prevention of depression among adolescents. Journal of Consulting and Clinical Psychology, 74, 66-79. doi:10.1037/0022-006x.74.1.66

Stice, E., Burton, E., Bearman, S. K., & Rohde, P. (2006). Randomized trial of a brief depression prevention program: An elusive search for a psychosocial placebo control condition. Behaviour Research and Therapy, 45, 863-876. doi:10.1016/j.brat.2006.08.008

Stice, E., Rohde, P., Seeley, J. R., & Gau, J. M. (2008). Brief cognitive-behavioral depression prevention program for high-risk adolescents outperforms two alternative interventions: A randomized efficacy trial. Journal of Consulting and Clinical Psychology, 76, 595-606. doi:10.1037/a0012645

Tisher, M., & Lang, M. (1983). The manual for the Children's Depression Scale (CDS). Melbourne, Victoria, Australia: Australian Council for Educational Research.

Walter, L. J., Meresman, J. F., Kramer, Y. L., & Evans, R. B. (2003). The Depression--Arkansas Scale: A validation study of a new brief depression scale in an HMO. Journal of Clinical Psychology, 59, 465-481. doi:10.1002/jclp.10137

Weisz, J. R., Thurber, C. A., Sweeney, L., Proffitt, V. D., & LeGagnoux, G. L. (1997). Brief treatment of mild-to-moderate child depression using primary and secondary control enhancement training. Journal of Consulting and Clinical Psychology, 65, 703-707.

Whisman, M. A., Strosahl, K., Fruzzetti, A. E., Schmaling, K. B., Jacobson, N. S., & Miller, D. M. (1989). A structured interview version of the Hamilton Rating Scale for Depression: Reliability and validity. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 1, 238-241. doi:10.1037/1040-3590.1.3.238

Williams, J. B. (2001). Standardizing the Hamilton Depression Rating Scale: Past, present, and future. European Archives of Psychiatry and Clinical Neuroscience, 251(Suppl. 2), 6-12.

Young, J. F., Mufson, L., & Davies, M. (2006a). Efficacy of interpersonal psychotherapy--adolescent skills training: An indicated preventive intervention for depression. Journal of Child Psychology and Psychiatry, 47, 1254-1262. doi:10.1111/j.1469-7610.2006.01667.x

Young, J. F., Mufson, L., & Davies, M. (2006b). Impact of comorbid anxiety in an effectiveness study of interpersonal psychotherapy for depressed adolescents. Journal of the American Academy of Child and Adolescent Psychiatry, 45, 904-912. doi:10.1097/01.chi.0000222791.23927.5f

Zigmond, A. S., & Snaith, R. P. (1983). The Hospital Anxiety and Depression Scale. Acta Psychiatrica Scandinavica, 67, 361-370.

Zimmerman, M., & Coryell, W. (1987). The Inventory to Diagnose Depression, Lifetime Version. Acta Psychiatrica Scandinavica, 75, 494-499.

Zimmerman, M., Coryell, W., Corenthal, C., & Wilson, S. (1986). A self-report scale to diagnose major depressive disorder. Archives of General Psychiatry, 43, 1076-1081.

Zung, W. W., Richards, M. S., Gables, C., & Short, M. (1965). A self-rating depression scale. Archives of General Psychiatry, 12, 63-70.

Brooke E. Muller and Bradley T. Erford, Department of Education Specialties, Loyola University Maryland. Correspondence concerning this article should be addressed to Bradley T. Erford, Department of Education Specialties, Loyola University Maryland, 2034 Greenspring Drive, Timonium, MD 21093 (e-mail: berford@loyola.edu).

TABLE 1
Summary of Depression Inventories Used in Counseling Outcome Research

Instrument, Acronym, and Author    Age (in Years)      Report Type

Anxiety, Depression, and Mood           10+               Other
Scale (ADAMS; Esbensen et al.,
2003)

(a) Achenbach System of                 6-18           Self (YSR),
Empirically Based Assessment                         parent (CBCL),
(ASEBA; Achenbach & Rescorla,                          and teacher
2001) Internalizing scale and                             (TRF)
Anxious/Depressed, Withdrawn/
Depressed, and Affective
Problems subscales

Assessment of Depression               Adults             Self
Inventory (ADI; Mogge & LePage,
2004)

(a) Beck Depression                     13+               Self
Inventory-II (BDI-II; Beck et
al., 1996)

Behavior Assessment System for          2-25        Self, parent, and
Children (BASC) Depression                               teacher
subscale (C. R. Reynolds &
Kamphaus, 2005)

(a) Center for Epidemiologic            14+               Self
Studies Depression Scale
(CES-D; Radloff, 1977)

Children's Depression Scale             9-16         Self and parent
(CDS; Tisher & Lang, 1983)

(a) Children's Depression               7-17        Self, parent, and
Inventory (CDI; Kovacs, 2003)                            teacher

Clinical Outcomes in Routine           Adults             Self
Evaluation-Outcome Measure
(CORE-OM; Barkham et al., 1998)

Cornell Dysthymia Rating Scale         Adults           Clinician
(CDRS; Mason et al., 1993)

Depression Anxiety Stress Scales        14+               Self
(DASS; Lovibond & Lovibond,
1995)

Depression--Arkansas Scale             Adults             Self
(D-ARK; Walter et al., 2003)

Diagnostic Interview Schedule           6-18           Clinician
for Children (DISC; D. Shafer et                      interview of
al., 2000)                                          child and parents

Edinburgh Postnatal Depression     Recent mothers         Self
Scale (EPDS; Cox et al., 1987;
Lloyd-Williams et al., 2000)

Geriatric Depression Screening      Older adults          Self
Scale (GDS; Koenig et al., 1988)

(a) Hamilton Rating Scale for           All             Interview
Depression (HAM-D; Hamilton,
1960)

Hospital Anxiety and Depression         All               Self
Scale (HADS; Zigmond & Snaith,
1983)

Inventory of Depressive                Adults             Self
Symptomatology (IDS; Rush et
al., 1986; Rush et al., 1996)

Inventory to Diagnose Depression                          Self
(IDD; Zimmerman & Coryell, 1987;
Zimmerman et al., 1986)

Kandel Depression Scale (KDS;       Adolescents           Self
Kandel & Davies, 1982)

Minnesota Multiphasic                   14+               Self
Personality Inventory-2 (MMPI-
2) Depression scale (Butcher et
al., 2001). Also MMPI-A
(Adolescent version; Butcher et
al., 1992)

Montgomery--Asberg Depression                           Interview
Rating Scale (MADRS; Montgomery
& Asberg, 1979)

(a) Reynolds Adolescent                11-20              Self
Depression Scale-Second Edition
(RADS-2; W. M. Reynolds, 2002)

Symptom Checklist-90-R (SCL-90-         13+               Self
R)-Depression factor (Derogatis,
1994)

Zung Self-Rating Depression             All               Self
Scale (SDS; Zung et al., 1965)

Instrument, Acronym, and Author      No. of Items      Psychometrics

Anxiety, Depression, and Mood         55 total (7        Adequate
Scale (ADAMS; Esbensen et al.,         Depressed
2003)                                 Mood items)

(a) Achenbach System of              113 (~10 for        Adequate
Empirically Based Assessment        each subscale)
(ASEBA; Achenbach & Rescorla,
2001) Internalizing scale and
Anxious/Depressed, Withdrawn/
Depressed, and Affective
Problems subscales

Assessment of Depression             39 total (19          Good
Inventory (ADI; Mogge & LePage,       Depression
2004)                                scale items)

(a) Beck Depression                       21               Good
Inventory-II (BDI-II; Beck et
al., 1996)

Behavior Assessment System for        Total scale        Adequate
Children (BASC) Depression           varies (9-14
subscale (C. R. Reynolds &            Depression
Kamphaus, 2005)                      scale items)

(a) Center for Epidemiologic              20             Adequate
Studies Depression Scale
(CES-D; Radloff, 1977)

Children's Depression Scale               66             Adequate
(CDS; Tisher & Lang, 1983)

(a) Children's Depression                 27             Adequate
Inventory (CDI; Kovacs, 2003)

Clinical Outcomes in Routine              34               Good
Evaluation-Outcome Measure
(CORE-OM; Barkham et al., 1998)

Cornell Dysthymia Rating Scale            20
(CDRS; Mason et al., 1993)

Depression Anxiety Stress Scales     42 total (14        Adequate
(DASS; Lovibond & Lovibond,           Depression
1995)                                scale items)

Depression--Arkansas Scale                11             Adequate
(D-ARK; Walter et al., 2003)

Diagnostic Interview Schedule      2,930 total (>100     Adequate
for Children (DISC; D. Shafer et      depression
al., 2000)                              items)

Edinburgh Postnatal Depression            10             Adequate
Scale (EPDS; Cox et al., 1987;
Lloyd-Williams et al., 2000)

Geriatric Depression Screening            30
Scale (GDS; Koenig et al., 1988)

(a) Hamilton Rating Scale for             21             Adequate
Depression (HAM-D; Hamilton,
1960)

Hospital Anxiety and Depression           14
Scale (HADS; Zigmond & Snaith,
1983)

Inventory of Depressive                                  Adequate
Symptomatology (IDS; Rush et
al., 1986; Rush et al., 1996)

Inventory to Diagnose Depression          38
(IDD; Zimmerman & Coryell, 1987;
Zimmerman et al., 1986)

Kandel Depression Scale (KDS;                            Adequate
Kandel & Davies, 1982)

Minnesota Multiphasic               567 total items        Poor
Personality Inventory-2 (MMPI-      (57 Depression
2) Depression scale (Butcher et      scale items)
al., 2001). Also MMPI-A
(Adolescent version; Butcher et
al., 1992)

Montgomery--Asberg Depression             10
Rating Scale (MADRS; Montgomery
& Asberg, 1979)

(a) Reynolds Adolescent                30 total            Good
Depression Scale-Second Edition
(RADS-2; W. M. Reynolds, 2002)

Symptom Checklist-90-R (SCL-90-     90 total items       Adequate
R)-Depression factor (Derogatis,    (13 Depression
1994)                                scale items)

Zung Self-Rating Depression               20             Adequate
Scale (SDS; Zung et al., 1965)

Instrument, Acronym, and Author      Translation

Anxiety, Depression, and Mood
Scale (ADAMS; Esbensen et al.,
2003)

(a) Achenbach System of
Empirically Based Assessment
(ASEBA; Achenbach & Rescorla,
2001) Internalizing scale and
Anxious/Depressed, Withdrawn/
Depressed, and Affective
Problems subscales

Assessment of Depression
Inventory (ADI; Mogge & LePage,
2004)

(a) Beck Depression                    Spanish
Inventory-II (BDI-II; Beck et
al., 1996)

Behavior Assessment System for         Spanish
Children (BASC) Depression
subscale (C. R. Reynolds &
Kamphaus, 2005)

(a) Center for Epidemiologic        Greek, Korean,
Studies Depression Scale               Japanese
(CES-D; Radloff, 1977)

Children's Depression Scale
(CDS; Tisher & Lang, 1983)

(a) Children's Depression            23 languages
Inventory (CDI; Kovacs, 2003)

Clinical Outcomes in Routine
Evaluation-Outcome Measure
(CORE-OM; Barkham et al., 1998)

Cornell Dysthymia Rating Scale
(CDRS; Mason et al., 1993)

Depression Anxiety Stress Scales     24 languages
(DASS; Lovibond & Lovibond,
1995)

Depression--Arkansas Scale
(D-ARK; Walter et al., 2003)

Diagnostic Interview Schedule          Spanish
for Children (DISC; D. Shafer et
al., 2000)

Edinburgh Postnatal Depression
Scale (EPDS; Cox et al., 1987;
Lloyd-Williams et al., 2000)

Geriatric Depression Screening      >20 languages
Scale (GDS; Koenig et al., 1988)

(a) Hamilton Rating Scale for      Chinese, Turkish
Depression (HAM-D; Hamilton,
1960)

Hospital Anxiety and Depression        Chinese
Scale (HADS; Zigmond & Snaith,
1983)

Inventory of Depressive             >30 languages
Symptomatology (IDS; Rush et
al., 1986; Rush et al., 1996)

Inventory to Diagnose Depression
(IDD; Zimmerman & Coryell, 1987;
Zimmerman et al., 1986)

Kandel Depression Scale (KDS;
Kandel & Davies, 1982)

Minnesota Multiphasic
Personality Inventory-2 (MMPI-
2) Depression scale (Butcher et
al., 2001). Also MMPI-A
(Adolescent version; Butcher et
al., 1992)

Montgomery--Asberg Depression
Rating Scale (MADRS; Montgomery
& Asberg, 1979)

(a) Reynolds Adolescent
Depression Scale-Second Edition
(RADS-2; W. M. Reynolds, 2002)

Symptom Checklist-90-R (SCL-90-
R)-Depression factor (Derogatis,
1994)

Zung Self-Rating Depression          7 languages
Scale (SDS; Zung et al., 1965)

Note. All instruments listed in the table are available in
English. YSR = Youth Self-Report; CBCL = Child Behavior
Checklist; TRF = Teacher's Report Form.

(a) Selected for expanded review in this article.

TABLE 2
Widely Used Outcome Measures Reported in the
Erford et al. (2011) Meta-Analysis of School-Age
Youth With Depression

                                                    Articles
                                                      Used
                                                    (N = 42)

Instrument and Author                               n    %

Children's Depression Inventory (Kovacs, 2003)      17   40
Beck Depression Inventory-II (Beck et al., 1996)    13   31
Hamilton Rating Scale for Depression (Hamilton,
  1960)                                             12   29
Child Behavior Checklist Internalizing Scale and
  Anxious/Depressed subscale (Achenbach &
  Rescorla, 2001)                                   10   24
Center for Epidemiologic Studies Depression
  Scale (Radloff, 1977)                              8   19
Reynolds Adolescent Depression Scale-Second
  Edition (W. M. Reynolds, 2002)                     5   12

Note. Percentage is out of the 42 studies selected into the Erford
et al. (2011) meta-analysis. Some studies used more than one
instrument as an outcome measure.
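
As a check on the arithmetic in Table 2, the sketch below recomputes each percentage from its count, assuming the published values were rounded to whole numbers. The short instrument labels and the function name are illustrative only; the counts are copied from the table.

```python
# Counts (n) copied from Table 2; N_STUDIES is the number of studies in the
# Erford et al. (2011) meta-analysis. The dictionary keys are shorthand labels.
N_STUDIES = 42
counts = {
    "CDI": 17,
    "BDI-II": 13,
    "HAM-D": 12,
    "CBCL Internalizing": 10,
    "CES-D": 8,
    "RADS-2": 5,
}


def percent_of_studies(n, total=N_STUDIES):
    """Return n as a whole-number percentage of total (assumed rounding rule)."""
    return round(100 * n / total)


percentages = {name: percent_of_studies(n) for name, n in counts.items()}
for name, pct in percentages.items():
    print(f"{name}: {counts[name]} of {N_STUDIES} ({pct}%)")

# Some studies used more than one instrument, so the counts sum to more
# than the number of studies.
total_uses = sum(counts.values())
```

The recomputed values match the % column, and the counts sum to 65 uses across 42 studies, which is consistent with the note that some studies used more than one instrument.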

TABLE 3
Best Choices for Depression Outcome Research
With School-Age Children

                                  Age Range

Report Type        7-10 Years    11-12 Years    13-17 Years

Self-report            CDI       CDI, RADS-2   BDI-II, RADS-2
Parent report      CDI-Parent    CDI-Parent      CDI-Parent
Teacher report     CDI-Teacher   CDI-Teacher    CDI-Teacher
Clinician report      HAM-D         HAM-D          HAM-D

Note. CDI = Children's Depression Inventory; RADS-2 = Reynolds
Adolescent Depression Scale-Second Edition; BDI-II = Beck Depression
Inventory-II; HAM-D = Hamilton Rating Scale for Depression.
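
For readers who want to apply Table 3 programmatically, for example when assigning measures across a multi-age sample, the following is a minimal sketch of the table as a lookup. The names AGE_BANDS, TABLE_3, and recommend are hypothetical; only the cell contents come from the table, and ages outside 7-17 years are treated as out of scope.

```python
# Age bands (in years, inclusive) in the column order used by Table 3.
AGE_BANDS = [(7, 10), (11, 12), (13, 17)]

# One list of instruments per age band, keyed by report type.
TABLE_3 = {
    "self": [["CDI"], ["CDI", "RADS-2"], ["BDI-II", "RADS-2"]],
    "parent": [["CDI-Parent"], ["CDI-Parent"], ["CDI-Parent"]],
    "teacher": [["CDI-Teacher"], ["CDI-Teacher"], ["CDI-Teacher"]],
    "clinician": [["HAM-D"], ["HAM-D"], ["HAM-D"]],
}


def recommend(report_type, age):
    """Return the Table 3 instruments for a report type and an age in years."""
    row = TABLE_3[report_type]  # raises KeyError for an unknown report type
    for (low, high), instruments in zip(AGE_BANDS, row):
        if low <= age <= high:
            return list(instruments)
    raise ValueError(f"Table 3 covers ages 7-17 years; got {age}")


print(recommend("self", 12))
print(recommend("clinician", 8))
```

A lookup of this kind only restates the table; instrument choice for a given study still depends on the psychometric and practical considerations discussed in the article.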
