Application Assignment: 4-5 Page Paper Due by Sunday, August 17, 2014

Anonymous
Asked: Aug 15th, 2014

Question Description

Symptom Severity and Functional Impairment

How severe are a client's symptoms, and how do they affect the client's functioning? Consider the case of Sami in this week's Introduction. Sami's experiences while driving had escalated to the point that she feared having a panic attack behind the wheel and was apprehensive about driving at all. How might a clinician evaluate these symptoms and incorporate the results into the development of a treatment plan?

For this Assignment, you evaluate and select appropriate tests to measure symptom severity and functional impairment, and apply them to your virtual client.

The Assignment (3–5 pages):

· Select two tests of symptom severity (one must be a mental status examination) and two tests of functional impairment from the Mental Measurements Yearbook. (Select from the list below; see the attachments to find the tests.)

· Complete a comparative analysis of the tests and select one test of each type (symptom severity and functional impairment) that is most appropriate for your virtual client (my virtual client is Sean Brody), and explain why. Link to virtual client: http://mym.cdn.laureate-media.com/2dett4d/Walden/CPSY/6341/11/mm/psych_assessment/index.html

· Justify your selections.

· Explain one limitation of each of the two tests you selected and why it is a limitation.

Support your Application Assignment with specific references to all resources used in its preparation, and provide a reference list for all resources.

Mental Measurements Yearbook

List of tests attached below (please select from the following):

Test: Beck, A. T., Steer, R. A., & Brown, G. (1996). Beck depression inventory-II.


Test: Reynolds, W. M., & Kobak, K. (1995). Hamilton depression inventory.

Hamilton Depression Inventory.docx

Test: Beck, A. T., & Steer, R. (1993). Beck anxiety inventory [1993 Edition].   

Beck Anxiety Inventory.docx 

Test: Reynolds, W. (2008). Reynolds adolescent depression scale—2nd edition: Short form.

Reynolds Adolescent Depression Scale.docx  

Test: Reynolds, W. M., & Kobak, K. (1998). Reynolds depression screening inventory.

Reynolds Depression Screening Inventory.docx

Test: Novaco, R. (2003). Novaco anger scale and provocation inventory.

Novaco Anger Scale and Provocation Inventory.docx 

Test: Derogatis, L. (2001). Brief symptom inventory 18.

Brief Symptom Inventory 18.docx 

Test: Brown, T. (2001). Brown attention-deficit disorder scales for children and adolescents.

Brown Attention Deficit Disorder Scales for Children and Adolescents.docx  

Test: Brown, T. (1996). Brown attention-deficit disorder scales. 

Brown Attention Deficit Disorder Scales.docx

Test: Cullum, C., Weiner, M. F., & Saine, K. (2009). Texas functional living scale.

Texas Functional Living Scale.docx 

Test: Sparrow, S. S., Cicchetti, D. V., & Balla, D. (2008). Vineland adaptive behavior scales, second edition.

Vineland Adaptive Behavior Scales.docx 

Test: Conners, C. (2008). Conners comprehensive behavior rating scales.

Conners Comprehensive Behavior Rating Scales.docx 

Test: Piers, E. V., Herzberg, D. S., & Harris, D. (2002). Piers-Harris children's self-concept scale, second edition (The way I feel about myself).

Piers Harris Children's Self Concept Scale.docx 

Beck Anxiety Inventory [1993 Edition]. Beck, Aaron T., & Steer, Robert A. (1993). Mental Measurements Yearbook, Vol. 13.

Review of the Beck Anxiety Inventory by E. THOMAS DOWD, Professor of Psychology, Kent State University, Kent, OH:

Aaron T. Beck, M.D., and his associates have designed well-constructed and widely used tests for years, with the twin virtues of simplicity and brevity. The Beck Anxiety Inventory (BAI) is no exception. At 21 items, it is certainly short and easy to understand as well. Each of the 21 items represents an anxiety symptom that is rated for severity on a 4-point Likert scale (0-3), ranging from "Not at all" to "Severely; I could barely stand it." The scoring is easy, as the points for each item are simply added to form the total score.

The test kit I received contained a brief manual, several answer sheets for the BAI as well as for the BDI and the BHS, and a complete computer-scoring package. Although not stated as such, I suspect that the instrument can be ordered without the computer-scoring package. The manual, which is quite short, appears to have been written for a clinician rather than a researcher. The guidelines for administration and scoring are quite adequate, as are the data on reliability and validity. But the description of the scale development is inadequate, as it refers the reader to the original Beck, Epstein, Brown, and Steer (1988) article. In addition, the descriptions of other studies on the instrument are unacceptably brief.

The BAI was originally developed from a sample of 810 outpatients of mixed diagnostic categories (predominantly mood and anxiety disorders). Two successive factor analyses on different samples then reduced the number of items to 21, with a minimum item-total correlation of .30. The original development appears to have been very well done and is described in detail in Beck et al. (1988).

The reliability and validity data are thorough and informative but are based only on three studies: Beck et al. (1988), Fydrich, Dowdall, and Chambless (1990), and Dent and Salkovskis (1986). Of these, the first used a mixed diagnostic group, the second used patients diagnosed with DSM-III-R anxiety disorders, and the third used a nonclinical sample. The manual authors quite appropriately caution the reader that the instrument was developed on a psychiatric population and should be interpreted cautiously with nonclinical individuals.

The normative data tables are very thorough and informative, including means, standard deviations, coefficient alpha reliabilities, and corrected item-total correlations for five anxiety diagnostic groups with the highest representation in the sample. This apparently unpublished clinical sample consists of 393 outpatients who were seen at the Center for Cognitive Therapy in Philadelphia between January 1985 and August 1989. Internal consistency reliability coefficients are uniformly excellent, ranging between .85 and .94. Test-retest reliability data from Beck et al. (1988) showed a coefficient of .75 over one week.

The validity data are quite comprehensive, including content, concurrent, construct, discriminant, and factorial validity. In general, the data show excellent validity, even regarding the difficult problem of untangling anxiety and depression. Especially interesting to this reviewer were the factorial validity data. One factor analysis (Beck et al., 1988) found two correlated factors (r = .56, p < .001) that seemed to reflect somatic and cognitive/affective aspects of anxiety, respectively.
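The additive scoring Dowd describes (21 items, each rated 0-3, summed to a 0-63 total) can be made concrete with a minimal Python sketch. The sample ratings below are invented for illustration and are not actual BAI items or responses.

```python
def bai_total(ratings):
    """Sum 21 BAI item ratings into a total score.

    Each item is rated 0-3, from "Not at all" to
    "Severely; I could barely stand it", so totals range 0-63.
    """
    if len(ratings) != 21:
        raise ValueError("The BAI has exactly 21 items.")
    if any(r not in (0, 1, 2, 3) for r in ratings):
        raise ValueError("Each item is rated on a 0-3 scale.")
    return sum(ratings)

# Invented ratings for illustration only (not actual BAI responses):
print(bai_total([1, 0, 2, 1, 0, 1, 1, 0, 0, 1, 2, 0, 1, 0, 0, 1, 1, 0, 1, 0, 2]))  # 15
```

Interpretive ranges for totals are given in the manual; the sketch deliberately stops at the raw sum rather than guessing at those cut points.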
A cluster analysis on the clinical sample showed four clusters, labeled Neurophysiological, Subjective, Panic, and Autonomic. The separate clusters showed acceptable reliability, considering the small number of items in each, and discriminant function analyses found some significant differences among the clusters. Two demographic variables, gender and age, appear to be significantly related to anxiety: women were found to be more anxious than men, and younger people more anxious than older people. Because of this, the authors caution users to adjust scores somewhat on interpretation, though how much is a little vague.

As I mentioned earlier, a computer-scoring package came with the BAI, including large and small disks and a very detailed instruction manual. According to the manual, this package provides three modes of test administration and interpretive profiles for the Beck Depression Inventory (BDI; 13:31), the Beck Hopelessness Scale (BHS; 13:32), the Beck Anxiety Inventory (BAI; 13:30), and the Beck Scale for Suicide Ideation (BSSI; 13:33), separately and together. The profile includes clinical group references and a history of the patient's scores on previous administrations, in addition to data on that test. However, the package I received (as indicated in the manual and the README file) included only the separate profile reports and the BDI, BAI, and BHS. A note at the end of the manual suggested that some of this material would not be available until December 1992, so apparently I received an older version of the computer-scoring package. [Editor's note: A 1994 version of the computer-scoring package is now available and produces integrative narrative reports.] Scoring of each test is not free once the package has been purchased: credits must be purchased for the Use Counter (included) that is installed between the computer and the printer.

In summary, the Beck Anxiety Inventory is another of the useful instruments designed by Beck and his colleagues. There are only a few deficits. First, the manual is too brief to give as much information as many users might like (though the computer-scoring manual is very comprehensive). Second, there have been too few studies conducted on the BAI, with the result that it rests on an uncomfortably small database. This is particularly apparent for the gender and age differences. The clusters of anxiety disorders identified thus far appear especially promising, and further research should be conducted here. Clinicians, however, will find this a very useful test, especially when combined with the other Beck instruments into a comprehensive computer-scored interpretive profile.

REVIEWER'S REFERENCES

Dent, H. R., & Salkovskis, P. M. (1986). Clinical measures of depression, anxiety and obsessionality in non-clinical populations. Behaviour Research and Therapy, 24, 689-691.

Beck, A. T., Epstein, N., Brown, G., & Steer, R. A. (1988). An inventory for measuring clinical anxiety: Psychometric properties. Journal of Consulting and Clinical Psychology, 56, 893-897.

Fydrich, T., Dowdall, D., & Chambless, D. L. (1990, March). Aspects of reliability and validity for the Beck Anxiety Inventory. Paper presented at the National Conference on Phobias and Related Anxiety Disorders, Bethesda, MD.

Review of the Beck Anxiety Inventory by NIELS G. WALLER, Associate Professor of Psychology, University of California, Davis, CA:

While describing the development of the Beck Anxiety Inventory, Beck et al. (Beck, Epstein, Brown, & Steer, 1988) note that a number of studies have reported correlations greater than .50 between widely used anxiety and depression scales.
Similar findings have been reported by others (Lovibond & Lovibond, 1995), and formal reviews of this topic (Clark & Watson, 1991; Dobson, 1985) conclude that anxiety and depression scales frequently correlate between .40 and .70. No one expects these scales to be uncorrelated, because of the high comorbidity rates of anxiety and mood disorders (Maser & Cloninger, 1990). Yet many researchers (Riskind, Beck, Brown, & Steer, 1987) feel uncomfortable when measures of conceptually distinct constructs correlate as highly as .50. The Beck Anxiety Inventory (BAI; Beck & Steer, 1990; Beck, Epstein, Brown, & Steer, 1988) is a brief self-report scale that was designed "to measure symptoms of anxiety which are minimally shared with those of depression" (Beck & Steer, 1990, p. 1).

The 21 symptoms on the BAI were selected from three existing measures: (a) the Anxiety Check List (Beck, Steer, & Brown, 1985), (b) the PDR Check List (Beck, 1978), and (c) the Situational Anxiety Check List (Beck, 1982). The item pools of these scales were combined and winnowed using Jackson's (1970) method of scale construction. After eliminating identical or highly similar items, Beck and his colleagues used factor analysis to cull items for the final scale. Scales that are developed by this method often have high reliabilities, and coefficient alpha (Cortina, 1993) for the BAI is typically in the mid-.90s (Beck, Epstein, Brown, & Steer, 1988; Jolly, Aruffo, Wherry, & Livingston, 1993; Kumar, Steer, & Beck, 1993).

The BAI items are measured on a 4-point Likert scale that ranges from Not at all (0 points) to Severely; I could barely stand it (3). The instructions for the test ask subjects to "indicate how much you have been bothered by each symptom during the PAST WEEK, INCLUDING TODAY, by placing an X in the corresponding space in the column next to each symptom" (manual, p. 4). Notice that these instructions focus on a 1-week time frame; consequently, the BAI should measure state anxiety better than trait anxiety. Beck et al. (1988) report a 1-week test-retest correlation of .75 for the BAI, whereas Creamer, Foran, and Bell (1995) report a 7-week correlation of .62.

The factor structure of the BAI has been investigated in clinical (Beck et al., 1988; Hewitt & Norton, 1993; Kumar et al., 1993) and nonclinical (Creamer et al., 1995) samples. Many studies have found two correlated dimensions (r = approximately .55) that have been interpreted as measuring somatic (example markers: Feelings of choking, Shaky) and subjective (example markers: Fear of the worst happening, Fear of losing control) symptoms of anxiety. A similar structure emerged in a recent factor analysis of data from the computer-administered BAI (Steer, Rissmiller, Ranieri, & Beck, 1993).

The underlying structure of the BAI has also been investigated with cluster analysis. The BAI manual authors report that a centroid cluster analysis of clinical data uncovered four symptom clusters representing (a) neurophysiological, (b) subjective, (c) panic, and (d) autonomic symptoms of anxiety. Interestingly, a similar structure was uncovered in a recent factor analysis of the scale (Osman, Barrios, Aukes, Osman, & Markway, 1993). Regarding the cluster solution, Beck and Steer (1990) have suggested the cluster subscales "may assist the examiner in making a differential diagnosis" (p. 6) and that "profile analyses of BAI subscales appear promising" (p. 18).
In my opinion, neither of these statements is supported by the data. The BAI subscales are highly correlated; consequently, subscale profiles will almost certainly be unreliable for most test takers. For example, from data reported in Table 5 of the BAI manual, it is easy to calculate the reliabilities for the cluster-subscale difference scores. For some of these scores the reliabilities are as low as .50. In other words, it is difficult to obtain reliable profiles when scales are composed of only four or five items.

Because of the goals of the BAI authors, it is appropriate to ask how strongly the BAI correlates with popular depression scales, such as the Beck Depression Inventory (BDI; Beck & Steer, 1993; 13:31). In clinical samples, correlations between the BAI and BDI have ranged from .48 to .71 (Beck et al., 1988; Fydrich, Dowdall, & Chambless, 1992; Hewitt & Norton, 1993; Steer, Ranieri, Beck, & Clark, 1993). A Bayesian (Iversen, 1984, pp. 41-44) posterior estimate for these data suggests that r = .587, with a 95% probability that the population correlation lies between .545 and .626. In nonclinical samples (Creamer et al., 1995; Dent & Salkovskis, 1986), correlations between the BAI and BDI have ranged from .50 to .63. The Bayesian estimate of r for these data is .591, with a 95% probability that the population correlation lies between .548 and .631.

In conclusion, it appears that Beck was not successful in developing an anxiety scale with high discriminant validity. Nevertheless, he did develop a highly reliable scale that can be administered in 5 to 10 minutes. Thus, the BAI appears to be a useful addition to the growing number of clinical anxiety measures.

REVIEWER'S REFERENCES

Jackson, D. N. (1970). A sequential system for personality scale development. In C. D. Spielberger (Ed.), Current topics in clinical and community psychology (Vol. 2, pp. 61-96). New York: Academic Press.

Beck, A. T. (1978). PDR Check List. Philadelphia: University of Pennsylvania, Center for Cognitive Therapy.

Beck, A. T. (1982). Situational Anxiety Check List (SAC). Philadelphia: University of Pennsylvania, Center for Cognitive Therapy.

Iversen, G. R. (1984). Bayesian statistical inference. Beverly Hills, CA: Sage.

Beck, A. T., Steer, R. A., & Brown, G. (1985). Beck Anxiety Check List. Unpublished manuscript, University of Pennsylvania.

Dobson, K. S. (1985). The relationship between anxiety and depression. Clinical Psychology Review, 5, 307-324.

Dent, H. R., & Salkovskis, P. M. (1986). Clinical measures of depression, anxiety, and obsessionality in non-clinical populations. Behaviour Research and Therapy, 24, 689-691.

Riskind, J. H., Beck, A. T., Brown, G., & Steer, R. A. (1987). Taking the measure of anxiety and depression: Validity of reconstructed Hamilton scales. Journal of Nervous and Mental Disease, 175, 475-479.

Beck, A. T., Epstein, N., Brown, G., & Steer, R. A. (1988). An inventory for measuring clinical anxiety: Psychometric properties. Journal of Consulting and Clinical Psychology, 56, 893-897.

Beck, A. T., & Steer, R. A. (1990). Manual for the Beck Anxiety Inventory. San Antonio, TX: The Psychological Corporation.

Maser, J. D., & Cloninger, C. R. (Eds.). (1990). Comorbidity of mood and anxiety disorders. Washington, DC: American Psychiatric Press.

Clark, L. A., & Watson, D. (1991). Theoretical and empirical issues in differentiating depression from anxiety. In J. Becker & A. Kleinman (Eds.), Psychosocial aspects of depression. Hillsdale, NJ: Erlbaum.
Fydrich, T., Dowdall, D., & Chambless, D. L. (1992). Reliability and validity of the Beck Anxiety Inventory. Journal of Anxiety Disorders, 6, 55-61.

Beck, A. T., & Steer, R. A. (1993). Beck Depression Inventory manual. San Antonio, TX: The Psychological Corporation.

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98-104.

Hewitt, P. L., & Norton, G. R. (1993). The Beck Anxiety Inventory: A psychometric analysis. Psychological Assessment, 5, 408-412.

Jolly, J. B., Aruffo, J. F., Wherry, J. N., & Livingston, R. (1993). The utility of the Beck Anxiety Inventory with inpatient adolescents. Journal of Anxiety Disorders, 7, 95-106.

Kumar, G., Steer, R. A., & Beck, A. T. (1993). Factor structure of the Beck Anxiety Inventory with adolescent psychiatric inpatients. Anxiety, Stress, and Coping, 6, 125-131.

Osman, A., Barrios, F. X., Aukes, D., Osman, J. R., & Markway, K. (1993). The Beck Anxiety Inventory: Psychometric properties in a community population. Journal of Psychopathology and Behavioral Assessment, 15, 287-297.

Steer, R. A., Rissmiller, D. J., Ranieri, W. F., & Beck, A. T. (1993). Structure of the computer-assisted Beck Anxiety Inventory with psychiatric inpatients. Journal of Personality Assessment, 60, 532-542.

Steer, R. A., Ranieri, W. F., Beck, A. T., & Clark, D. A. (1993). Further evidence for the validity of the Beck Anxiety Inventory with psychiatric outpatients. Journal of Anxiety Disorders, 7, 195-205.

Creamer, M., Foran, J., & Bell, R. (1995). The Beck Anxiety Inventory in a non-clinical sample. Behaviour Research and Therapy, 33, 477-485.

Gillis, M. M., Haaga, D. A. F., & Ford, G. T. (1995). Normative values for the Beck Anxiety Inventory, Fear Questionnaire, Penn State Worry Questionnaire, and Social Phobia and Anxiety Inventory. Psychological Assessment, 7, 450-455.

Lovibond, P. F., & Lovibond, S. H. (1995). The structure of negative emotional states: Comparison of the Depression Anxiety Stress Scales (DASS) with the Beck Depression and Anxiety Inventories. Behaviour Research and Therapy, 33, 335-343.
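Waller's point above about unreliable subscale profiles follows from the classical formula for the reliability of a difference score. The sketch below uses hypothetical reliabilities and an intercorrelation, not the BAI manual's Table 5 values, chosen only to reproduce the ballpark figure of .50 he reports.

```python
def difference_score_reliability(r_xx, r_yy, r_xy):
    """Classical-test-theory reliability of a difference score X - Y
    for equally scaled (e.g., standardized) scores:
        r_dd = ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)
    """
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)

# Hypothetical values (not from the BAI manual): two subscales with
# reliabilities of .80 that correlate .60 yield a difference score
# whose reliability is only .50, matching Waller's ballpark.
print(round(difference_score_reliability(0.80, 0.80, 0.60), 2))  # 0.5
```

The formula makes the mechanism plain: the more two subscales correlate, the less reliable their difference, which is why highly correlated four- or five-item clusters cannot support profile interpretation.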
Beck Depression Inventory-II. Beck, Aaron T., Steer, Robert A., & Brown, Gregory K. (1996). Mental Measurements Yearbook, Vol. 14.

Review of the Beck Depression Inventory-II by PAUL A. ARBISI, Minneapolis VA Medical Center, Assistant Professor, Department of Psychiatry, and Assistant Clinical Professor, Department of Psychology, University of Minnesota, Minneapolis, MN:

After over 35 years of nearly universal use, the Beck Depression Inventory (BDI) has undergone a major revision. The revised version, the BDI-II, represents a significant improvement over the original instrument across all aspects, including content, psychometric validity, and external validity. The BDI was an effective measure of depressed mood that repeatedly demonstrated utility, as evidenced by its widespread use in the clinic as well as by the frequent use of the BDI as a dependent measure in outcome studies of psychotherapy and antidepressant treatment (Piotrowski & Keller, 1989; Piotrowski & Lubin, 1990). The BDI-II should supplant the BDI and readily gain acceptance by surpassing its predecessor in use.

Despite the demonstrated utility of the Beck, times had changed, and the diagnostic context within which the instrument was developed had altered considerably over the years (Beck, Ward, Mendelson, Mock, & Erbaugh, 1961). Further, psychometrically, the BDI had some problems, with certain items failing to discriminate adequately across the range of depression and other items showing gender bias (Santor, Ramsay, & Zuroff, 1994). Hence the time had come for a conceptual reassessment and psychometrically informed revision of the instrument. Indeed, a mid-course correction had occurred in 1987, as evidenced by the BDI-IA, a version that included rewording of 15 of the 21 items (Beck & Steer, 1987). This version did not address the limited scope of depressive symptoms of the BDI, nor the failure of the BDI to adhere to contemporary diagnostic criteria for depression as codified in the DSM-III. Further, consumers appeared to vote with their feet: since the publication of the BDI-IA, the original Beck had been cited far more frequently in the literature than the BDI-IA. Therefore, the time had arrived for a major overhaul of the classic BDI and a retooling of the content to reflect the diagnostic sensibilities of the 1990s. In the main, the BDI-II accomplishes these goals and represents a highly successful revamping of a reliable standard.

The BDI-II retains the 21-item format with four options under each item, ranging from not present (0) to severe (3). Relative to the BDI-IA, all but three items were altered in some way on the BDI-II. Items dropped from the BDI include body image change, work difficulty, weight loss, and somatic preoccupation. To replace the four lost items, the BDI-II includes the following new items: agitation, worthlessness, loss of energy, and concentration difficulty. The current item content includes: (a) sadness, (b) pessimism, (c) past failure, (d) loss of pleasure, (e) guilty feelings, (f) punishment feelings, (g) self-dislike, (h) self-criticalness, (i) suicidal thoughts or wishes, (j) crying, (k) agitation, (l) loss of interest, (m) indecisiveness, (n) worthlessness, (o) loss of energy, (p) changes in sleeping pattern, (q) irritability, (r) changes in appetite, (s) concentration difficulty, (t) tiredness or fatigue, and (u) loss of interest in sex.
To further reflect DSM-IV diagnostic criteria for depression, both increases and decreases in appetite are assessed in the same item, and both hypersomnia and hyposomnia are assessed in another item. And rather than the 1-week time period rated on the BDI, the BDI-II, consistent with DSM-IV, asks for ratings over the past 2 weeks. The BDI-II retains the advantage of the BDI in its ease of administration (5-10 minutes) and the rather straightforward interpretive guidelines presented in the manual.

At the same time, the advantage of a self-report instrument such as the BDI-II may also be a disadvantage. That is, there are no validity indicators contained on the BDI or the BDI-II, and the ease of administration of a self-report lends itself to the deliberate tailoring of self-report and distortion of the results. Those of us engaged in clinical practice are often faced with clients who alter their presentation to forward a personal agenda that may not be shared with the clinician. The manual obliquely mentions this problem in an ambivalent and somewhat avoidant fashion. Under the heading "Memory and Response Sets," the manual blithely discounts the potential problem of a distorted response set by attributing extreme elevation on the BDI-II to "extreme negative thinking" which "may be a central cognitive symptom of severe depression rather than a response set per se because patients with milder depression should show variation in their response ratings" (manual, p. 9). On the other hand, later in the manual, we are told that, "In evaluating BDI-II scores, practitioners should keep in mind that all self-report inventories are subject to response bias" (p. 12). The latter is sound advice and should be highlighted under the heading of response bias.

The manual is well written and provides the reader with significant information regarding norms, factor structure, and, notably, nonparametric item-option characteristic curves for each item. Indeed, the latter inclusion incorporates the latest in item response theory, which appears to have guided the retention and deletion of items from the BDI (Santor et al., 1994). Generally, the psychometric properties of the BDI-II are quite sound. Coefficient alpha estimates of reliability for the BDI-II were .92 for outpatients and .93 for the nonclinical sample. Corrected item-total correlations for the outpatient sample ranged from .39 (loss of interest in sex) to .70 (loss of pleasure); for the nonclinical college sample, the lowest item-total correlation was .27 (loss of interest in sex) and the highest was .74 (self-dislike). The test-retest reliability coefficient across the period of a week was quite high at .93.

The inclusion in the manual of item-option characteristic curves for each BDI-II item is of particular significance. Examination of these curves reveals that, for the most part, the ordinal position of the item options is appropriately assigned for 17 of the 21 items. However, the items addressing punishment feelings, suicidal thoughts or wishes, agitation, and loss of interest in sex did not display the anticipated rank order indicating ordinal increase in severity of depression across item options. Additionally, although improved over the BDI, Item 10 (crying) Option 3 does not clearly express a more severe level of depression than Option 2 (see Santor et al., 1994). Overall, however, the option choices within each item appear to function as intended across the severity dimension of depression.
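Coefficient alpha, the internal-consistency statistic reported above, can be computed directly from an item-response matrix. A minimal sketch follows; the response matrix is invented for illustration and is not BDI-II data.

```python
import numpy as np

def cronbach_alpha(responses):
    """Coefficient alpha for a respondents-by-items matrix:
        alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    """
    x = np.asarray(responses, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Invented 5 x 4 response matrix (0-3 ratings), not BDI-II data:
responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 2, 3, 3],
    [0, 0, 1, 0],
]
print(round(cronbach_alpha(responses), 2))  # 0.94
```

With 21 items rather than 4, the same computation applied to the standardization data yields the .92-.93 values the manual reports.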
The suggested guidelines and cut scores for the interpretation of the BDI-II and placement of individual scores into a range of depression severity are purported to have good sensitivity and moderate specificity, but test parameters such as positive and negative predictive power are not reported (i.e., given score X on the BDI-II, what is the probability that the individual meets criteria for a Major Depressive Disorder of moderate severity?). According to the manual, the BDI-II was developed as a screening instrument for major depression and, accordingly, cut scores were derived through the use of receiver operating characteristic curves to maximize sensitivity. Of the 127 outpatients used to derive the cut scores, 57 met criteria for either single-episode or recurrent major depression. The relatively high base rate (45%) for major depression is a bit unrealistic for nonpsychiatric settings and will likely serve to inflate the test parameters. Cross-validation of the cut scores on different samples with lower base rates of major depression is warranted, because a different base rate of major depression may result in a significant change in the proportion of correct decisions based on the suggested cut score (Meehl & Rosen, 1955). Consequently, until the suggested cut scores are cross-validated in those populations, caution should be exercised when using the BDI-II as a screen in nonpsychiatric populations where the base rate for major depression may be substantially lower.

Concurrent validity evidence appears solid, with the BDI-II demonstrating a moderately high correlation with the Hamilton Psychiatric Rating Scale for Depression-Revised (r = .71) in psychiatric outpatients. Of importance to the discriminative validity of the instrument was the relatively moderate correlation between the BDI-II and the Hamilton Rating Scale for Anxiety-Revised (r = .47). The manual reports mean BDI-II scores for various groups of psychiatric outpatients by diagnosis. As expected, outpatients had higher scores than college students. Further, individuals with mood disorders had higher scores than those individuals diagnosed with anxiety and adjustment disorders.

The BDI-II is a stronger instrument than the BDI with respect to its factor structure. A two-factor (Somatic-Affective and Cognitive) solution accounted for the majority of the common variance in both an outpatient psychiatric sample and a much smaller nonclinical college sample. Factor analysis of the BDI-II in a larger nonclinical sample of college students resulted in Cognitive-Affective and Somatic-Vegetative main factors, essentially replicating the findings presented in the manual and providing strong evidence for the overall stability of the factor structure across samples (Dozois, Dobson, & Ahnberg, 1998). Unfortunately, several of the items, such as sadness and crying, shifted factor loadings depending upon the type of sample (clinical vs. nonclinical).

SUMMARY. The BDI-II represents a highly successful revision of an acknowledged standard in the measurement of depressed mood. The revision has improved upon the original by updating the items to reflect contemporary diagnostic criteria for depression and utilizing state-of-the-art psychometric techniques to improve the discriminative properties of the instrument. This degree of improvement is no small feat, and the BDI-II deserves to replace the BDI as the single most widely used clinically administered instrument for the assessment of depression.

REVIEWER'S REFERENCES

Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194-216.
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.

Beck, A. T., & Steer, R. A. (1987). Beck Depression Inventory manual. San Antonio, TX: The Psychological Corporation.

Piotrowski, C., & Keller, J. W. (1989). Psychological testing in outpatient mental health facilities: A national study. Professional Psychology: Research and Practice, 20, 423-425.

Piotrowski, C., & Lubin, B. (1990). Assessment practices of health psychologists: Survey of APA Division 38 clinicians. Professional Psychology: Research and Practice, 21, 99-106.

Santor, D. A., Ramsay, J. O., & Zuroff, D. C. (1994). Nonparametric item analyses of the Beck Depression Inventory: Evaluating gender item bias and response option weights. Psychological Assessment, 6, 255-270.

Dozois, D. J. A., Dobson, K. S., & Ahnberg, J. L. (1998). A psychometric evaluation of the Beck Depression Inventory-II. Psychological Assessment, 10, 83-89.

Review of the Beck Depression Inventory-II by RICHARD F. FARMER, Associate Professor of Psychology, Idaho State University, Pocatello, ID:

The Beck Depression Inventory-II (BDI-II) is the most recent version of a widely used self-report measure of depression severity. Designed for persons 13 years of age and older, the BDI-II represents a significant revision of the original instrument published almost 40 years ago (BDI-I; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) as well as the subsequent amended version copyrighted in 1978 (BDI-IA; Beck, Rush, Shaw, & Emery, 1979; Beck & Steer, 1987, 1993). Previous editions of the BDI have considerable support for their effectiveness as measures of depression (for reviews, see Beck & Beamesderfer, 1974; Beck, Steer, & Garbin, 1988; and Steer, Beck, & Garrison, 1986). Items found in these earlier versions, many of which were retained in modified form for the BDI-II, were clinically derived and neutral with respect to a particular theory of depression.

Like previous versions, the BDI-II contains 21 items, each of which assesses a different symptom or attitude by asking the examinee to consider a group of graded statements that are weighted from 0 to 3 based on intuitively derived levels of severity. If the examinee feels that more than one statement within a group applies, he or she is instructed to circle the highest weighting among the applicable statements. A total score is derived by summing weights corresponding to the statements endorsed over the 21 items. The test authors provide empirically informed cut scores (derived from receiver operating characteristic [ROC] curve methodology) for indexing the severity of depression based on responses from outpatients with a diagnosed episode of major depression (cutoff scores to index the severity of dysphoria for college samples are suggested by Dozois, Dobson, & Ahnberg, 1998). The BDI-II can usually be completed within 5 to 10 minutes. In addition to providing guidelines for the oral administration of the test, the manual cautions the user against using the BDI-II as a diagnostic instrument and appropriately recommends that interpretations of test scores should only be undertaken by qualified professionals.
Although the manual does not report the reading level associated with the test items, previous research on the BDI-IA suggested that items were written at about the sixth-grade level (Berndt, Schwartz, & Kaiser, 1983).

A number of changes appear in the BDI-II, perhaps the most significant of which is the modification of test directions and item content to be more consistent with the major depressive episode concept as defined in the Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition (DSM-IV; American Psychiatric Association, 1994). Whereas the BDI-I and BDI-IA assessed symptoms experienced at the present time and during the past week, respectively, the BDI-II instructs the examinee to respond in terms of how he or she has "been feeling during the past two weeks, including today" (manual, p. 8, emphasis in original) so as to be consistent with the DSM-IV time period for the assessment of major depression. Similarly, new items included in the BDI-II address psychomotor agitation, concentration difficulties, sense of worthlessness, and loss of energy so as to make the BDI-II item set more consistent with DSM-IV criteria. Items that appeared in the BDI-I and BDI-IA that were dropped in the second edition were those that assessed weight loss, body image change, somatic preoccupation, and work difficulty. All but three of the items from the BDI-IA retained for inclusion in the BDI-II were reworded in some way. Items that assess changes in sleep patterns and appetite now address both increases and decreases in these areas.

Two samples were used to evaluate the psychometric characteristics of the BDI-II: (a) a clinical sample (n = 500; 63% female; 91% White) who sought outpatient therapy at one of four outpatient clinics on the U.S. east coast (two of which were located in urban areas, two in suburban areas), and (b) a convenience sample of Canadian college students (n = 120; 56% women; described as "predominantly White"). The average ages of the clinical and student samples were, respectively, 37.2 (SD = 15.91; range = 13-86) and 19.58 (SD = 1.84).

Reliability of the BDI-II was evaluated with multiple methods. Internal consistency was assessed using corrected item-total correlations (ranges: .39 to .70 for outpatients; .27 to .74 for students) and coefficient alpha (.92 for outpatients; .93 for students). Test-retest reliability was assessed over a 1-week interval among a small subsample of 26 outpatients from one clinic site (r = .93). There was no significant change in scores noted among this outpatient sample between the two testing occasions, a finding that is different from those often obtained with college students who, when tested repeatedly with earlier versions of the BDI, were often observed to have lower scores on subsequent testing occasions (e.g., Hatzenbuehler, Parpal, & Matthews, 1983).

Following the method of Santor, Ramsay, and Zuroff (1994), the test authors also examined the item-option characteristic curves for each of the 21 BDI-II items as endorsed by the 500 outpatients. As noted in a previous review of the BDI (1993 Revised) by Waller (1998), the use of this method to evaluate item performance represents a new standard in test revision. Consistent with findings for depressed outpatients obtained by Santor et al. (1994) on the BDI-IA, most of the BDI-II items performed well, as evidenced by the individual item-option curves. All items were reported to display monotonic relationships with the underlying dimension of depression severity.
A minority of items were somewhat problematic, however, when the degree of correspondence between estimated and a priori weights associated with item response options was evaluated. For example, on Item 11 (agitation), the response option weighted a value of 1 was more likely to be endorsed than the option weighted 3 across all levels of depression, including depression in the moderate and severe ranges. In general, though, response option weights of the BDI-II items did a good job of discriminating across estimated levels of depression severity. Unfortunately, the manual does not provide detailed discussion of item-option characteristic curves and their interpretation.

The validity of the BDI-II was evaluated with outpatient subsamples of various sizes. When administered on the same occasion, the correlation between the BDI-II and BDI-IA was quite high (n = 101, r = .93), suggesting that these measures yield similar patterns of scores, even though the BDI-II, on average, produced equated scores that were about 3 points higher. In support of its convergent validity, the BDI-II displayed moderately high correlations with the Beck Hopelessness Scale (n = 158, r = .68) and the Revised Hamilton Psychiatric Rating Scale for Depression (HRSD-R; n = 87, r = .71). The correlation between the BDI-II and the Revised Hamilton Anxiety Rating Scale (n = 87, r = .47) was significantly less than that for the BDI-II and HRSD-R, which was cited as evidence of the BDI-II's discriminant validity. The BDI-II, however, did share a moderately high correlation with the Beck Anxiety Inventory (n = 297; r = .60), a finding consistent with past research on the strong association between self-reported anxiety and depression (e.g., Kendall & Watson, 1989). Additional research published since the manual's release (Steer, Ball, Ranieri, & Beck, 1997) also indicates that the BDI-II shares higher correlations with the SCL-90-R Depression subscale (r = .89) than with the SCL-90-R Anxiety subscale (r = .71), although the latter correlation is still substantial.

Other data presented in the test manual indicated that of the 500 outpatients, those diagnosed with mood disorders (n = 264) had higher BDI-II scores than those diagnosed with anxiety (n = 88), adjustment (n = 80), or other (n = 68) disorders. The test authors also cite evidence of validity by separate factor analyses performed on the BDI-II item set for outpatients and students. However, findings from these analyses, which were different in some significant respects, are questionable evidence of the measure's validity, as the test was apparently not developed to assess specific dimensions of depression. Factor analytic studies of the BDI have historically produced inconsistent findings (Beck et al., 1988), and preliminary research on the BDI-II suggests some variations in factor structure within both clinical and student samples (Dozois et al., 1998; Steer & Clark, 1997; Steer, Kumar, Ranieri, & Beck, 1998). Furthermore, one of the authors of the BDI-II (Steer & Clark, 1997) has recently advised that the measure not be scored as separate subscales.

SUMMARY. The BDI-II is presented as a user-friendly self-report measure of depression severity. Strengths of the BDI-II include the very strong empirical foundation on which it was built, namely almost 40 years of research that demonstrates the effectiveness of earlier versions.
In the development of the BDI-II, innovative methods were employed to determine optimum cut scores (ROC curves) and to evaluate item performance and weighting (item-option curves). The present edition demonstrates very good reliability and impressive test item characteristics. Preliminary evidence of the BDI-II's validity in clinical samples is also encouraging.

Despite the many impressive features of this measure, one may wonder why the test developers were not even more thorough in their presentation of the development of the BDI-II and more rigorous in the evaluation of its effectiveness. The test manual is too concise and often omits important details involving the test development process. The clinical sample used to generate cut scores and evaluate the psychometric properties of the measure seems unrepresentative in many respects (e.g., racial make-up, patient setting, geographic distribution), and other aspects of this sample (e.g., education level, family income) go unmentioned. The student sample is relatively small and, unfortunately, drawn from a single university. Opportunities to address important questions regarding the measure were also missed, such as whether the BDI-II effectively assesses or screens the DSM-IV concept of major depression, and the extent to which it may accomplish this better than earlier versions. This seems to be a particularly important question given that the BDI was originally developed as a measure of the depressive syndrome, not as a screening measure for a nosologic category (Kendall, Hollon, Beck, Hammen, & Ingram, 1987), a distinction that appears to have become somewhat blurred in this most recent edition.

Also not reported in the manual are analyses examining possible sex biases among the BDI-II item set. Santor et al. (1994) reported that the BDI-IA items were relatively free of sex bias, and given the omission of the most sex-biased item in the BDI-IA (body image change) from the BDI-II, it is possible that this most recent edition may contain even less bias. Similarly absent from the manual is any report on the item-option characteristic curves for nonclinical samples. Santor et al. (1994) reported that for most of the BDI-IA items, response option weights were less discriminating across the range of depression severity among their college sample relative to their clinical sample, an anticipated finding given that students would be less likely to endorse response options hypothesized to be consistent with more severe forms of depression. Also, given that previous editions of the BDI have shown inconsistent associations with social desirability (e.g., Tanaka-Matsumi & Kameoka, 1986), an opportunity was missed to evaluate the extent to which the BDI-II measures something different than this response set.

Despite these relative weaknesses in the development and presentation of the BDI-II, the existing evidence suggests that the BDI-II is just as sound, if not more so, than its earlier versions.

REVIEWER'S REFERENCES

Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.

Beck, A. T., & Beamesderfer, A. (1974). Assessment of depression: The Depression Inventory. In P. Pichot & R. Oliver-Martin (Eds.), Psychological measurements in psychopharmacology: Modern problems in pharmacopsychiatry (Vol. 7, pp. 151-169). Basel: Karger.

Beck, A. T., Rush, A. J., Shaw, B. F., & Emery, G. (1979). Cognitive therapy of depression. New York: Guilford.

Berndt, D. J., Schwartz, S., & Kaiser, C. F. (1983). Readability of self-report depression inventories. Journal of Consulting and Clinical Psychology, 51, 627-628.
Hatzenbuehler, L. C., Parpal, M., & Matthews, L. (1983). Classifying college students as depressed or nondepressed using the Beck Depression Inventory: An empirical analysis. Journal of Consulting and Clinical Psychology, 51, 360-366.

Steer, R. A., Beck, A. T., & Garrison, B. (1986). Applications of the Beck Depression Inventory. In N. Sartorius & T. A. Ban (Eds.), Assessment of depression (pp. 123-142). New York: Springer-Verlag.

Tanaka-Matsumi, J., & Kameoka, V. A. (1986). Reliabilities and concurrent validities of popular self-report measures of depression, anxiety, and social desirability. Journal of Consulting and Clinical Psychology, 54, 328-333.

Beck, A. T., & Steer, R. A. (1987). Beck Depression Inventory manual. San Antonio, TX: The Psychological Corporation.

Kendall, P. C., Hollon, S. D., Beck, A. T., Hammen, C. L., & Ingram, R. E. (1987). Issues and recommendations regarding the use of the Beck Depression Inventory. Cognitive Therapy and Research, 11, 289-299.

Beck, A. T., Steer, R. A., & Garbin, M. G. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8, 77-100.

Kendall, P. C., & Watson, D. (Eds.). (1989). Anxiety and depression: Distinctive and overlapping features. San Diego, CA: Academic Press.

Beck, A. T., & Steer, R. A. (1993). Beck Depression Inventory manual. San Antonio, TX: The Psychological Corporation.

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

Santor, D. A., Ramsay, J. O., & Zuroff, D. C. (1994). Nonparametric item analyses of the Beck Depression Inventory: Evaluating gender item bias and response option weights. Psychological Assessment, 6, 255-270.

Steer, R. A., Ball, R., Ranieri, W. F., & Beck, A. T. (1997). Further evidence for the construct validity of the Beck Depression Inventory-II with psychiatric outpatients. Psychological Reports, 80, 443-446.

Steer, R. A., & Clark, D. A. (1997). Psychometric characteristics of the Beck Depression Inventory-II with college students. Measurement and Evaluation in Counseling and Development, 30, 128-136.

Dozois, D. J. A., Dobson, K. S., & Ahnberg, J. L. (1998). A psychometric evaluation of the Beck Depression Inventory-II. Psychological Assessment, 10, 83-89.

Steer, R. A., Kumar, G., Ranieri, W. F., & Beck, A. T. (1998). Use of the Beck Depression Inventory-II with adolescent psychiatric outpatients. Journal of Psychopathology and Behavioral Assessment, 20, 127-137.

Waller, N. G. (1998). [Review of the Beck Depression Inventory-1993 Revised]. In J. C. Impara & B. S. Plake (Eds.), The thirteenth mental measurements yearbook (pp. 120-121). Lincoln, NE: The Buros Institute of Mental Measurements.
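Arbisi's base-rate caution above (citing Meehl & Rosen, 1955) can be worked through with Bayes' theorem: the positive predictive value of a fixed cut score falls sharply as prevalence drops. The sensitivity and specificity figures below are illustrative placeholders, not parameters from the BDI-II manual, which, as the review notes, does not report predictive power.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """P(disorder | positive screen) via Bayes' theorem:
        PPV = BR * sens / (BR * sens + (1 - BR) * (1 - spec))
    """
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * (1 - specificity)
    return true_positives / (true_positives + false_positives)

# Hypothetical screen (90% sensitivity, 80% specificity). At the
# derivation sample's 45% base rate most positives are true; at a 10%
# base rate more typical of nonpsychiatric settings, most are false.
print(round(positive_predictive_value(0.90, 0.80, 0.45), 2))  # 0.79
print(round(positive_predictive_value(0.90, 0.80, 0.10), 2))  # 0.33
```

This is exactly why both reviewers call for cross-validating the cut scores in populations with lower base rates before using the BDI-II as a screen there.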
Brief Symptom Inventory 18. Derogatis, Leonard R. (2001). Mental Measurements Yearbook, Vol. 15.

Review of the Brief Symptom Inventory 18 by ROGER A. BOOTHROYD, Associate Professor, Department of Mental Health Law and Policy, Louis de la Parte Florida Mental Health Institute, University of South Florida, Tampa, FL:

DESCRIPTION. The Brief Symptom Inventory 18 (BSI 18), as its name implies, is an 18-item "self-report symptom inventory designed to serve as a screen for psychological distress and psychiatric disorders" (manual, p. 1). According to the administration manual, the measure was designed for use "with a broad spectrum of adult medical patients 18 or older and adult individuals in the community who are not currently assigned patient status" (p. 3). Patients rate their level of distress during the past week on each of the 18 symptoms using a 5-point Likert-type scale ranging from 0 (not at all) to 4 (extremely). The author indicates that the items assess three symptom dimensions: Somatization (6 items), Depression (6 items), and Anxiety (6 items), as well as a Global Severity Index (GSI) based on all 18 items. The BSI 18 is recommended for use by health and behavioral health professionals as a psychological screen, to support clinical decisions, for monitoring treatment progress, and to assess treatment outcomes. The BSI 18 can be completed by most respondents in 4 minutes and is purportedly written at a sixth-grade reading level.

DEVELOPMENT. The BSI 18 has an extensive developmental history. It is a reduced version of the 53-item Brief Symptom Inventory (BSI; Derogatis & Melisaratos, 1983), which was developed from the Symptom Checklist-90 (SCL-90; Derogatis, Rickels, & Rock, 1976), which originally evolved from the Hopkins Symptom Checklist (HSCL; Derogatis, Lipman, Rickels, Uhlenhuth, & Covi, 1974). The BSI 18 focuses on three symptom dimensions, in contrast to the nine dimensions assessed by its predecessors. The author indicates that the three symptom dimensions of the BSI 18 were selected because they represent about 80% of the psychiatric disorders that occur in primary care practice. The author also states that although multiple criteria were used to determine which items to retain, the high prevalence of the specific symptoms in clinical disorders was the most significant selection factor.

TECHNICAL. Scoring and standardization. In the absence of missing responses, raw scores are simply the sum of the item responses within a symptom domain. The manual includes scoring instructions for dealing with missing responses. Raw scores are converted to standardized T scores using the norm tables provided. Gender-specific normative data are provided for two samples: a community sample of 1,134 adults and an oncology sample of 1,543 adults being treated for cancer. Although the age distributions of these samples are provided, no information is presented regarding the racial/ethnic composition of either sample, raising some question about the ethnic/racial diversity of the norming samples. Scores can then be plotted on the appropriate profile sheet. Computerized scoring is available through the purchase of the MICROTEST Q 5.04 Assessment System from NCS Assessments. The software supports scoring of the BSI 18 and other NCS assessments, reporting results, and storing and exporting data. Optical scanning is also available; however, the time and effort required for hand scoring and interpretation is minimal.
Reliability. Internal consistency reliability estimates were derived from the community sample. Alpha coefficients for the three symptom dimensions and the GSI are .74 (Somatization), .84 (Depression), .79 (Anxiety), and .89 (GSI), and are certainly very acceptable. Additionally, these reliability estimates compare favorably with those derived from the longer BSI on a sample of 719 psychiatric outpatients. Although no test-retest reliability studies are reported on the BSI 18, the author provides test-retest estimates ranging from .68 to .84 on the symptom dimensions, over an unspecified time interval, based on a sample of 60 nonpatients who completed the BSI. The GSI test-retest estimate was .90.

Validity. As with many newly developed measures, evidence of validity is limited. The construct validity of the BSI 18 was assessed by correlating the three symptom dimension scores and the GSI with the corresponding scores on the SCL-90-R. All correlations were high, ranging from .91 on the Somatization dimension to .96 on Anxiety (Depression and the GSI were both .93), suggesting little information was lost with the reduced number of items. The factor structure of the BSI 18 was examined using data from the community sample as a means of validating the hypothesized symptom dimensions. Although the results of a principal component analysis support a four-factor solution, the author argues that the findings are not "fundamentally inconsistent with the hypothesized structure of the BSI 18 test" (manual, p. 14). His rationale is that the items loading on the fourth factor, representing panic, are subsumed under anxiety disorders in the DSM-IV (American Psychiatric Association, 1994). Although no specific studies using the BSI 18 are reported, evidence of the measure's convergent-discriminant validity and criterion-related validity is inferred on the basis of studies conducted with its predecessors, the BSI and SCL-90.

COMMENTARY. The BSI 18 is the newest incarnation of the Hopkins Symptom Checklist, which has been evolving over a span of nearly 30 years. The measurement foundation of the BSI 18 is quite strong, even though the psychometric properties of this specific rendition are not well understood. The measure is brief and easy to score. The three symptom dimensions of the BSI 18 should identify individuals with the most common mental health problems. The dimensions are highly correlated with those in the more extensive SCL-90-R, supporting the validity of scores from the BSI 18. However, it would be a worthwhile effort to compare scores from the BSI 18 to other independently developed brief patient self-report symptomatology measures, such as the Colorado Symptom Index (Shern et al., 1994), as was done with the BSI by Conrad et al. (2001). This would provide additional validity evidence for the BSI 18.

The administration manual is well written and contains information frequently omitted, such as how to treat missing data in scoring. Over a quarter of the manual is devoted to "Specific Application of the BSI 18 Test" (p. 15); however, all of the studies summarized in this section were conducted using the longer 53-item BSI. In Peterson's (1989) review of the BSI, he questioned whether the reduction in administration time from the 15-20 minutes of the SCL-90 to the 7-10 minutes of the BSI was meaningful in light of the potential loss of clinical sensitivity. His concern certainly remains apropos with the BSI 18, given that the number of symptom domains has been reduced from nine to three and that administration time is now a mere 4 minutes.
Perhaps, as Peterson suggested, the time is approaching when we will just ask people, "Do you feel depressed?"

SUMMARY. The BSI 18 appears to be a useful measure for assessing anxiety, depression, and somatization, as well as for obtaining an overall level of psychological distress. Although few studies have been conducted assessing the psychometric properties of the BSI 18, it is an abbreviated version of a frequently used and psychometrically tested measure of mental health symptomatology.

REVIEWER'S REFERENCES

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

Conrad, K. J., Yagelka, J. R., Matters, M. D., Rich, A. R., Williams, V., & Buchanan, M. (2001). Reliability and validity of a modified Colorado Symptom Index in a national homeless sample. Mental Health Services Research, 3, 141-153.

Derogatis, L. R., & Melisaratos, N. (1983). The Brief Symptom Inventory: An introductory report. Psychological Medicine, 13, 595-605.

Derogatis, L. R., Rickels, K., & Rock, A. (1976). The SCL-90 and the MMPI: A step in the validation of a new self-report scale. British Journal of Psychiatry, 128, 280-289.

Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974). The Hopkins Symptom Checklist (HSCL): A measure of primary symptom dimensions. In P. Pichot (Ed.), Psychological measurement in psychopharmacology (pp. 79-111). Basel, Switzerland: Karger.

Peterson, C. A. (1989). [Review of the Brief Symptom Inventory.] In J. C. Conoley & J. J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 112-113). Lincoln, NE: Buros Institute of Mental Measurements.

Shern, D. L., Wilson, N. Z., Coen, A. S., Patrick, D. C., et al. (1994). Client outcomes II: Longitudinal client data from the Colorado Treatment Outcome Study. Milbank Quarterly, 72, 123-148.

Review of the Brief Symptom Inventory 18 by WILLIAM E. HANSON, Assistant Professor, Department of Educational Psychology, University of Nebraska-Lincoln, Lincoln, NE:

DESCRIPTION. The Brief Symptom Inventory 18 (BSI 18) is a norm-referenced, self-report instrument composed of, as its namesake suggests, 18 items. According to the manual, it is first and foremost a screening instrument, "developed primarily as a highly sensitive screen for psychiatric disorders and psychological disintegration and secondarily as an instrument to measure treatment outcomes" (administration, scoring, and procedures manual, p. 1). The manual indicates that it may be useful in most clinical and research settings and may be used with a wide range of medical and community populations, including, to name a few, people (18 years old or older) who have been diagnosed with cancer, who have a compromised immune system (e.g., HIV/AIDS), and/or who are experiencing chronic pain or sexual difficulties. The manual does not, however, indicate the minimum reading level required to complete the instrument.

DEVELOPMENT. The BSI 18 is the fourth iteration in a family of well-known and widely used symptom-based instruments. Its parent instrument, the Brief Symptom Inventory (BSI; Derogatis, 1993; Derogatis & Spencer, 1982), is a derivative of the Symptom Checklist-90-Revised (SCL-90-R; Derogatis, 1977, 1994), which, in turn, is a derivative of the Hopkins Symptom Checklist (HSCL; Derogatis, Lipman, Rickels, Uhlenhuth, & Covi, 1974).
The BSI 18 was developed with the following considerations/assumptions in mind: mood and anxiety disorders are common, though difficult to detect/diagnose appropriately, especially in medical and community settings; somatic symptoms co-occur frequently with mood and anxiety disorders, further complicating the diagnostic picture; and administration and scoring of the instrument should be brief (<4-5 minutes), straightforward, and cost-effective. The instrument has 18 nonoverlapping items, all of which were taken directly from the BSI (Derogatis, 1993; Derogatis & Spencer, 1982). They were selected based on "multiple considerations, including prevalence of the symptom, item analysis characteristics, and loading saturations in factor analyses of the BSI and SCL-90-R" (manual, p. 2). No other details related to the development of the instrument were reported (e.g., specific results of item analyses, factor loadings, or pilot testing). It is difficult, therefore, to evaluate the appropriateness of the selection criteria and the decision rules that were used to choose the 18 items. Nevertheless, the BSI 18 has three six-item subscales: Somatization (SOM), Depression (DEP), and Anxiety (ANX); and a Total, or Global Severity Index (GSI), score. The subscale and total scores measure constructs identical to the like-named BSI scores (Derogatis, 1993; Derogatis & Spencer, 1982). Specifically, the SOM subscale score measures "distress caused by the perception of bodily dysfunction, focusing on symptoms arising from cardiovascular, gastrointestinal, and other physiologic systems" (manual, p. 5). The DEP subscale score measures "core symptoms of various syndromes of clinical depression" (e.g., disaffection, dysphoric mood, suicidal ideation; manual, p. 5). The ANX subscale score measures "symptoms that are prevalent in most major anxiety disorders" (e.g., nervousness, tension, apprehension; manual, p. 5). The GSI score is a composite measure of psychological distress and is "the single best indicator of the respondent's overall emotional adjustment and psychopathologic status" (manual, p. 6). Each item is scored on a 5-point Likert-type scale, ranging from 0 (not at all) to 4 (extremely). Subscale scores can range from 0-24 and can, if desired, be summed to obtain a total score, which can range from 0-72.

TECHNICAL INFORMATION. Standardization procedures: Norming. Normative information is available for two separate samples: an adult nonclient, or "community," sample, and an adult nonclient "oncology" sample. The community sample consisted of 1,134 adult employees (605 men and 517 women; 12 did not report their sex) of an unspecified U.S. corporation. The employees were of diverse age (reported range: 18-69), with the majority being between the ages of 40 and 59. No other characteristics of this sample were reported (e.g., race/ethnicity, education level, or SES), making it difficult to determine its representativeness. Details related to how these individuals were identified and/or recruited to participate in the original norming study were also not reported. The oncology sample consisted of 1,543 adults (802 men and 741 women) who had been diagnosed with cancer and who were patients at an unspecified U.S. east coast cancer center. The adult cancer patients were of diverse age (reported range: <30-80+), with the majority being between the ages of 50 and 69. At least 20 different types/manifestations of cancer were represented. Similar to the community sample, no other characteristics of this sample were reported.
Of note, both the community and the oncology norms are gender-keyed. Separate norms are available for men and women. Combined norms are also available. However, test users are "strongly recommended" to use the separate, gender-keyed norms (manual, p. 37).

Administration and scoring. Administration and scoring procedures are straightforward and easy to understand. The BSI 18 may be administered by hand or by computer. It may also be scored by hand, using a preprinted scoring sheet that includes detailed scoring directions, or by computer, using scoring software that may be purchased from the publisher. Computer-based progress reports are also available for purchase. The availability of progress reports is appealing and, if used, may prove to be a useful feature of the scoring software. If administered by hand, the instructions, a sample test item, and the 18 test items are printed on one side of a single sheet of paper. Test takers are instructed to read a list of "problems," or symptoms, and to indicate how much each symptom has distressed or bothered them over the past week (i.e., "the past 7 days including today"). If scored by hand, the scoring directions and either a community- or oncology-based blank profile graph (one for men and one for women) are printed on one side of a single sheet of paper. Nine specific, step-by-step scoring directions are provided, including directions for determining a profile's validity, for calculating estimated values of omitted items, and for plotting raw score totals on the blank profile graph. As a general rule of thumb, a test taker may omit two items per subscale without jeopardizing the validity of the BSI 18's scores. If, however, three or more items are omitted from any of the three subscales, the scores should be considered invalid. Also, test users are reminded to always calculate estimated values of omitted items, as these estimates are included in raw score totals.

Interpretation. To facilitate interpretation, raw score totals are converted to area, or uniform, T-scores, with a mean of 50 and a standard deviation of 10. Conversion tables for the two normative samples (community and oncology) are in the manual. Percentile rank equivalents of the raw scores are also in the manual. The manual recommends that interpretation of the BSI 18 occur at three interrelated levels: the global level; the dimensional level; and the symptom, or item, level. Basically, it involves three steps. The first step occurs at the global level. It involves determining "caseness" (manual, p. 23), that is, whether or not the test taker's scores meet predetermined, empirically based criteria for identification/positive risk of psychological distress (stated differently, whether they fall within the clinical range). If so, then the test user is encouraged to evaluate the test taker further. The second step occurs at the dimensional level. It involves considering each BSI 18 subscale score independently, in the following recommended order: DEP, ANX, and SOM. If any of these subscale scores fall within the clinical range (T-score > 63), then the test user is also encouraged to evaluate the test taker further. The third and final step occurs at the symptom, or item, level. It involves considering individual BSI 18 items. For example, Item 17 (an item related to suicidal ideation) and Items 9, 12, and 18 (items related to panic attacks) should be examined closely to determine if further evaluation is necessary in these clinically important areas.
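The review does not reproduce the manual's exact formula for the "estimated values" of omitted items, so the sketch below assumes the common proration approach (an omitted item is replaced by the mean of the answered items on the same subscale); the two-item omission rule is as described above, and the ratings are purely illustrative.

```python
# Minimal sketch of the BSI 18 hand-scoring rules described in the review.
# Assumption: "estimated values" for omitted items are within-subscale
# prorated means; the manual's actual formula is not given in the review.

def score_subscale(responses):
    """responses: six ratings (0-4), with None marking an omitted item."""
    answered = [r for r in responses if r is not None]
    omitted = 6 - len(answered)
    if omitted >= 3:
        return None  # three or more omissions invalidate the scores
    estimate = sum(answered) / len(answered)  # prorated estimate (assumed)
    return sum(answered) + omitted * estimate

som = score_subscale([1, 2, 0, None, 3, 1])        # one omission: still valid
dep = score_subscale([0, 1, None, None, None, 2])  # returns None: invalid
```

Raw subscale totals computed this way are then converted to T-scores via the manual's gender-keyed conversion tables, which cannot be reproduced here.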
Test users are referred to Derogatis and Savitz (1999) for additional information related to interpretation.

RELIABILITY AND VALIDITY. Reliability estimates of the BSI 18 subscale and total scores are, based on the adult nonclient "community" sample mentioned earlier, satisfactory and meet traditional professional standards of acceptability. Estimates of internal consistency range, in this sample, from "fair" (Somatization [.74], Anxiety [.79]) to "good" (Depression [.84], Total [.89]; cf. Cicchetti, 1994). Test-retest reliability estimates are not reported. However, test-retest reliability estimates of the BSI (Derogatis, 1993; Derogatis & Spencer, 1982) are reported. These estimates range, in a different sample of 60 nonpatients, from .68 (Somatization) to .90 (Total). By reporting these estimates, it appears that the author is relying on reliability induction, whereby test-retest reliability of the subscale and total scores is generalized from one sample (and, in this case, a different test, the BSI) and assumed to be an appropriate estimate for another sample (and the BSI 18). BSI 18 test users are, therefore, encouraged to compute reliability estimates, including estimates of internal consistency and test-retest reliability, for their "data in hand." The standard error of measurement (SEM), a common method of estimating the precision of a test taker's score, is also not reported for the BSI 18 subscale or total scores. Subscale and total score intercorrelations are not reported either, which limits appropriate and responsible interpretation of the subscale and total scores and precludes profile interpretation altogether (Anastasi, 1985). Preliminary evidence of equivalence, or correspondence, between BSI 18 scores and SCL-90-R scores is provided in the manual. These correlations, which are based on the community sample, ranged from .91 (SOM) to .96 (ANX). The manual states that "basic considerations concerning such issues as face and content validity have been addressed previously in the context of the development of the parent instrument" (pp. 13-14). Preliminary evidence of criterion-related validity of BSI 18 scores is also provided in the manual. This evidence is based on a selective review of published studies that used the BSI, not the BSI 18. The studies were related to eight different clinical areas: screening studies, cancer populations, pain assessment/management, military populations, HIV/AIDS research, immune system functioning, human sexuality, and medical and law students. Finally, preliminary evidence of construct validity, in particular convergent validity, is provided in the manual. This evidence is based on correlations between BSI and SCL-90-R scores and MMPI clinical, content, and Tryon cluster scores that measure similar constructs. Reported correlation coefficients ranged from .40 to .72 and were generally in the expected direction. Specific evidence of discriminant validity was not reported. Of relevance here, factor analyses of the BSI 18 resulted in a four-factor solution that accounted for a respectable 57.2% of the total variance. The four identified factors and corresponding item loadings are more or less consistent with the author's hypothesized, a priori dimensional structure of the instrument and its scores.

CONCLUSIONS AND RECOMMENDATIONS. The BSI 18 is an intriguing new, commercially available screening instrument. Given the popularity and track records of its parent instruments (e.g., the BSI and SCL-90-R), it likely has a promising future.
Because of how it was developed, its overall strengths and weaknesses parallel those of its predecessors (for MMY reviews of the BSI, see Cundick, 1989, and Peterson, 1989; for MMY reviews of the SCL-90-R, see Pauker, 1985, and Payne, 1985). Its most obvious strengths include professionally developed, user-friendly testing materials and scoring software; brevity; straightforward administration, scoring, and interpretation procedures; availability of computer-based progress reports; and acceptable estimates of internal consistency. Its most obvious weaknesses include limited normative, reliability, and validity data, including data related to score sensitivity and specificity; lack of profile interpretation capabilities; and, similar to other brief, self-report instruments, susceptibility to distortion and "faking" of responses. That said, the manual, in its current form, makes it virtually impossible to determine the BSI 18's true merits. Though well organized and well written, it is largely uninformative. Too few details are included. Test users who refer to it for information regarding the reading level required to complete the instrument; specific details related to how/why the test items were chosen; characteristics of the two normative samples; test-retest reliability estimates; SEM estimates; subscale and total score intercorrelations; and, perhaps most importantly, adequate evidence of construct and predictive validity will be disappointed. The omission of these types of details is significant and, in this reviewer's opinion, should be addressed in future editions of the manual. All things considered, use of the BSI 18 may, quite frankly, be premature. Clearly, additional normative, reliability, and validity data are needed on the BSI 18 to justify its use, especially for clinical purposes. Until that occurs, it is recommended that the BSI 18 be used only for research purposes and, in clinical applications and settings, as an adjunct, or supplement, to traditional, more well-established screening instruments and interview methods. It is also recommended that, at this time, only the Total, or GSI, score be used. Prospective test users looking for a suitable, though slightly longer, alternative to the BSI 18 may find the OQ-45.2 (Lambert et al., 1996) to be a potentially satisfactory, psychometrically sound option.

REVIEWER'S REFERENCES
Anastasi, A. (1985). Interpreting results from multiscore batteries. Journal of Counseling and Development, 64, 84-86.
Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6, 284-290.
Cundick, B. P. (1989). [Review of the Brief Symptom Inventory]. In J. C. Conoley & J. J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 111-112). Lincoln, NE: Buros Institute of Mental Measurements.
Derogatis, L. R. (1977). Symptom Checklist-90-R (SCL-90-R) administration, scoring, and procedures manual I. Baltimore, MD: Clinical Psychometric Research.
Derogatis, L. R. (1993). Brief Symptom Inventory (BSI) administration, scoring, and procedures manual (3rd ed.). Minneapolis: NCS Pearson, Inc.
Derogatis, L. R. (1994). Symptom Checklist-90-R (SCL-90-R) administration, scoring, and procedures manual (3rd ed.). Minneapolis: NCS Pearson, Inc.
Derogatis, L. R., Lipman, R. S., Rickels, K., Uhlenhuth, E. H., & Covi, L. (1974). The Hopkins Symptom Checklist (HSCL): A measure of primary symptom dimensions. In P. Pichot (Ed.), Psychological measurements in psychopharmacology (pp. 79-111). Basel, Switzerland: Karger.
Derogatis, L. R., & Savitz, K. L. (1999). The SCL-90-R, Brief Symptom Inventory, and matching clinical rating scales. In M. E. Maruish (Ed.), The use of psychological testing for treatment planning and outcomes assessment (2nd ed., pp. 679-724). Mahwah, NJ: Lawrence Erlbaum Associates.
Derogatis, L. R., & Spencer, P. (1982). Brief Symptom Inventory (BSI) administration, scoring, and procedures manual I. Baltimore, MD: Clinical Psychometric Research.
Lambert, M. J., Hansen, N. B., Umphress, V., Lunnen, K., Okiishi, J., Burlingame, G. M., & Reisinger, C. W. (1996). Administration and scoring manual for the OQ-45.2. Stevenson, MD: American Professional Credentialing Services LLC.
Pauker, J. D. (1985). [Review of the SCL-90-R]. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (pp. 1325-1326). Lincoln, NE: Buros Institute of Mental Measurements.
Payne, R. W. (1985). [Review of the SCL-90-R]. In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (pp. 1326-1329). Lincoln, NE: Buros Institute of Mental Measurements.
Peterson, C. D. (1989). [Review of the Brief Symptom Inventory]. In J. C. Conoley & J. J. Kramer (Eds.), The tenth mental measurements yearbook (pp. 112-113). Lincoln, NE: Buros Institute of Mental Measurements.
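Hanson's recommendation that test users compute reliability estimates for their own "data in hand" is straightforward to act on. The sketch below uses the standard formulas for coefficient alpha and the standard error of measurement (SEM = SD * sqrt(1 - reliability)); the item-response matrix is purely illustrative.

```python
import numpy as np

# Illustrative data: rows = respondents, columns = the six items of one
# BSI 18 subscale, each rated 0-4. These numbers are made up.
X = np.array([
    [0, 1, 2, 1, 0, 1],
    [3, 2, 4, 3, 3, 2],
    [1, 0, 1, 2, 1, 0],
    [2, 2, 3, 1, 2, 3],
    [0, 0, 1, 0, 1, 0],
])

k = X.shape[1]
item_vars = X.var(axis=0, ddof=1)  # per-item variances
totals = X.sum(axis=1)             # subscale raw totals
alpha = (k / (k - 1)) * (1 - item_vars.sum() / totals.var(ddof=1))

sem = totals.std(ddof=1) * np.sqrt(1 - alpha)  # standard error of measurement

print(f"alpha = {alpha:.2f}, SEM = {sem:.2f}")
```

With a real sample, the same two lines of arithmetic give exactly the internal consistency and SEM figures both reviewers fault the manual for omitting.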
Brown Attention-Deficit Disorder Scales for Children and Adolescents By: Brown, Thomas E, 20010101, Vol. 15 Mental Measurements Yearbook

Review of the Brown Attention-Deficit Disorder Scales for Children and Adolescents by KAREN E. JENNINGS, Clinical Psychologist, LaMora Psychological Associates, Nashua, NH:

DESCRIPTION. The Brown Attention-Deficit Disorder Scales for Children and Adolescents is a multicomponent assessment device designed to serve as an initial screening measure and/or as one component of a comprehensive, multidimensional battery. In addition to the parent, teacher, and child questionnaires, there is a diagnostic formulation booklet. The diagnostic booklets contain semi-structured clinical interview questions, developmental history queries, diagnostic rule-out information, and quantitative summary charts (Total Raw Score grid, Cluster Score Graph, Multi-evaluator rating summary sheet, and a worksheet for inter- and intra-IQ discrepancy analysis). The scales yield a profile of a patient's developmental abilities/impairments in attention, working memory, executive functions, cognition, affect, and behavior.

DEVELOPMENT. The Brown ADD Scales were developed for adolescents and adults in 1996. Brown extended the age range down to 3 years of age and added items for the 2001 edition. The author designed these scales in response to his observations of children who demonstrated learning problems in the absence of a frank learning disability. The author conceptualized ADHD as a multidimensional disorder that requires a multidimensional diagnostic process. Brown operationalized the multidimensional nature of ADD via the six nonorthogonal clusters of the scales. Items were designed to assess one of the six clusters. Brown devised a model for his scale on the basis of his synthesis of literature on the varied manifestations of attention deficit disorders. His model is similar to other researchers' perspectives on the central role of executive deficit in the combined type of ADHD (Barkley, 1997; Nigg, 2001). The author suggested that executive function impairments are the sine qua non of attention deficit disorders. He incorporated an assessment of executive deficits within the clusters of the scale. Pilot studies were conducted to assess the readability and comparability of versions of the scales. Tryout studies were performed to assess item selection and to discern the relative efficacy of individual versus group administration. The author decided to administer the scale individually based upon the results of these preliminary studies. He did not discuss the findings of these tryout studies or the decision rules he incorporated in making these choices.

TECHNICAL. Standardization. The Brown ADD Scales child normative sample was standardized on a stratified sample of 800 children aged 3-12 years. Two hundred participants were selected from each of four geographic regions of the United States. The author selected a sample stratified by race and parental education. The sample replicated the distribution reported in the 1999 U.S. Census. The manual did not discuss whether participants were randomly selected. The initial selection of participants for the child normative sample excluded children diagnosed with a psychiatric disorder or learning disability. Thirty-three children previously diagnosed with ADHD were added to this normative sample after it was collected.
The author incorporated this procedure to ensure the sample's representativeness of the population, because this number of cases mirrors the prevalence of ADHD in the population. Unfortunately, this decision may have added error variance to these norms. The author did not discuss the implications of this sampling procedure or the potential bias it created. The child clinical sample consisted of 240 children who were clients in the practices of multidisciplinary clinicians. Two hundred eight of the original 240 were diagnosed with an attention deficit disorder based on DSM-IV criteria, clinical interviews with the child and his or her parents, teacher reports, and standardized psychological testing. These children were selected as the clinical normative group. The author did not discuss the sampling techniques utilized in collecting this sample, nor the characteristics of the children who did not complete the study. Limited data are provided on the standardization and clinical samples for adolescents. More details about these samples may be found in the previous (1996) manual.

Reliability. The author collected internal consistency, test-retest reliability, and interrater reliability information for the child scales. Internal consistency (coefficient alpha) data were strong for clusters, total inattention scores, and total combined scores (e.g., cluster coefficients ranged from .73 to .91). Test-retest reliability data (time interval of 1-4 weeks) were generally strong. The test-retest coefficients (corrected coefficients) for teacher ratings of children aged 3-7 ranged from .78 to .89. For children aged 8-12, parent and teacher corrected test-retest reliability coefficients ranged from .84 to .92 and .84 to .93, respectively. The test-retest reliabilities for the child self-report ranged from .45 to .69. Interrater reliability coefficients ranged from .39 to .58 for ages 3-7, and from .46 to .57 for ages 8-12. The author attributed the lower rater correspondence to differences in rater perspective.

Validity. The author presented three types of information in support of the validity of scores from this scale: internal structure, criterion-related evidence, and convergent evidence of validity. The intercorrelation matrices of cluster and total scores presented strong evidence of validity. Cluster score coefficients ranged from .62 to .84. Cluster-total score coefficients ranged from .81 to .96. Brown presented criterion-related evidence for validity by comparing the performance of individuals previously diagnosed with ADHD and matched nonclinical individuals on the Brown scales. The author indicated that the Brown scales discriminated the control groups from the ADHD samples at statistically significant levels across age groups. Estimates of the practical significance of these statistically significant findings would further elucidate the robustness/power of the group differences. Brown presented intermeasure correlations as evidence for convergent and divergent validity. The comparability of the Brown scales to Achenbach's Child Behavior Checklist (CBCL), the Behavior Assessment System for Children (BASC), and the Conners' Rating Scales was assessed. These measures have been widely used and are highly respected in the assessment of attention deficit disorders. The CBCL assesses a wide variety of possible pediatric behavioral problems. The BASC and Conners' Scales measure manifestations of attention deficit disorders.
As expected, the Brown Scales demonstrated stronger correlations with attention-specific scales than with other behavioral categories on the CBCL. The coefficients for the parent version of the Brown Scales and the CBCL ranged from -.25 (for CBCL Somatic Complaints and Brown's Cluster 5 score) to .69 (for CBCL Attention Problems and Brown's Cluster 6, Memory) for ages 4-7. Correlation coefficients for the clinical sample aged 8-12 ranged from -.05 (for CBCL Somatic Complaints and Cluster 6) to .70 (for CBCL Attention Problems and Cluster 2, Attention). These data support the premise that the Brown scales measure behavioral dimensions both similar to and distinct from aspects of child behavior measured by the CBCL. The correlation coefficients among the BASC, the Conners' Rating Scales, and the Brown clusters provided evidence for the convergent validity of this scale. BASC teacher ratings of attention problems demonstrated a strong relationship to the Brown Scales cluster scores (coefficients ranging from .50 to .91). The Brown Scales demonstrated a strong correlation to the Conners' Parent and Teacher Rating Scales. The Conners' AD/HD index demonstrated a moderately high correlation with both the Inattention and Combined Totals scores for the clinical samples within the 3-7 and 8-12 age groups (coefficients ranged from .68 to .82). The Conners' Hyperactivity Scale demonstrated a moderately high correlation to the Brown Scales Monitoring and Self-regulation cluster (coefficients were .55 for 3-7-year-olds and .79 for 8-12-year-olds). These findings illustrate the similarity between the Conners' Rating Scales and the Brown Scales in measuring aspects of attention deficit disorders.

COMMENTARY. The Brown Scales for Children and Adolescents are an important and valuable contribution to the ongoing theoretical debate about the nature of ADHD. The breadth of coverage of the theoretically based elements of executive functions dovetails with contemporary discussions about the nature of primary, secondary, and tertiary deficits of ADHD (Barkley, 1997; Nigg, 2001). The incorporation of subtests that operationalize and measure elements of these neurocognitive functions is a strong asset to the empirical investigation of these functions. These scales would be useful in the comprehensive evaluation of a child with suspected attention, behavioral disinhibition, and/or learning problems. The Brown Scales contribute a wealth of important information for the evaluator of a child with suspected learning difficulties. The potential vulnerabilities of the Brown Scales for Children lie in some of the sampling procedures. The standardization sample was selected to parallel the U.S. Census in its representativeness. Unfortunately, the absence of a discussion of the incorporation of the principle of randomization may result in sampling error, increased risk for bias, and decreased generalizability of findings. Most likely the author instituted safeguards for minimizing these possibilities. A discussion of these safeguards would be helpful for the test user.

SUMMARY. The Brown Scales for Children and Adolescents are important tools in the multidimensional assessment of children struggling to learn. The handouts and organizational materials are very useful for performing a complete profile analysis of the multiple aspects assessed by this measure.
Ongoing assessment of the psychometric implications of possible sampling error, limits in generalizability, and continued elucidation of the underlying dimensions of the constructs of this measure will likely be an ongoing and important process. However, these scales are very useful, convenient, informative, and user-friendly measures well suited to the comprehensive evaluation needed to rule out attention deficit disorders.

REVIEWER'S REFERENCES
Barkley, R. (1997). Behavioral inhibition, sustained attention, and executive functions: Constructing a unifying theory of ADHD. Psychological Bulletin, 121, 65-94.
Nigg, J. T. (2001). Is ADHD a disinhibitory disorder? Psychological Bulletin, 127, 571-598.

Review of the Brown Attention-Deficit Disorder Scales for Children and Adolescents by WILLIAM K. WILKINSON, Consulting Educational Psychologist, Boleybeg, Barna, County Galway, Republic of Ireland:

DESCRIPTION. The Brown Attention-Deficit Disorder Scales for Children and Adolescents (Brown ADD Scales) is a downward extension of a pre-existing measure known as the Brown Attention-Deficit Disorder Scales for Adolescents and Adults. The author notes that the purpose of the test is to "elicit parent and teacher observations of symptoms in 3- to 12-year-olds and to elicit self-report from children ages 8 years and older that may indicate impairment in executive functions related to Attention-Deficit/Hyperactivity Disorders" (manual, p. 5). Toward this end, the test includes five rating forms: Parent Form (ages 3-7), Parent Form (ages 8-12), Teacher Form (ages 3-7), Teacher Form (ages 8-12), and Self-Report Form (children ages 8 to 12). Test materials also include a Diagnostic Form that can be used for gathering developmental history, reviewing DSM-IV symptoms of AD/HD, and summarizing rating results as well as cognitive test scores (e.g., ability and achievement). Further, the test package includes a revised Diagnostic Form for Adolescents and a rating scale for this same age group. The manual includes a conversion table for the 12- to 18-year age group, so the Brown ADD Scales effectively cover an age range from 3 to 18 years. The rating forms are clearly presented and easy to use. The Parent and Teacher Forms (ages 3 to 12) can be completed independently by these raters. The Child Self-Report Form is completed in an interview format (to ensure the child understands the items and the assessor understands the child's responses). There are 40 to 50 items, depending on the form. The author recommends that the Adolescent Form be completed in a joint interview situation. This seems wise, because the Adolescent rating form has two parallel 0-3 rating rows for each item (0 = Never; 1 = Once a Week or Less; 2 = Twice a Week; 3 = Almost Daily). If the adolescent were to complete the form independently and it were then given to a parent, the parent could see the adolescent's self-report, and this could influence the parent's ratings. It should be noted that the Adolescent scale does not have a corresponding teacher version. Although the author discusses why this is so, it is a disadvantage, because it reduces the numerical/objective quantification of AD/HD-type behaviors beyond the home environment. Generalizability of AD/HD-type problems across settings is alluded to in the DSM-IV, although how this is assessed remains "nonstandardized" and variable across clinicians. This is why a Teacher Form for the Adolescent age group would be useful. The forms are easy to score.
One simply tears off the perforated edge of the form and transfers item ratings to cluster and total scores (converted to T-scores). The following scores are obtained for the Child Version: Activation, Focus, Effort, Emotion, and Memory, with the total of these five clusters yielding a Total Inattention score. This total is then combined with the final cluster, Action, to obtain an ADD Combined Total score. An optional CD-ROM Scoring Assistant is available.

DEVELOPMENT. One of the most positive aspects of the Brown Scales is the theoretical base for item development. Rather than limit itself to the DSM-IV description of Inattention and Impulsivity/Hyperactivity, the author has expanded the item pool based on more recent theoretical developments, namely "executive functions" (manual, p. 10). In this regard, the six cluster scores relate to theory, yet realistically cover the "true" problems related to AD/HD. Take, for example, the Emotion cluster score. Experts widely agree that negative emotions abound in children and adults with AD/HD. Yet it is never clear whether these emotions play a role secondary to AD/HD (as most AD/HD experts suggest), or whether these emotional states play a fundamental role in creating/maintaining AD/HD-type behavior. By allowing for a separate cluster score in this area, and others, the user of the Brown ADD Scales will have a more refined and accurate clinical picture. From a technical standpoint, I would like to know the empirical base for the clusters (e.g., how did these clusters emerge from the initial item sets?). There is significant overlap among the clusters, but it is the author's contention that the clusters are related, not synonymous.

TECHNICAL. The author presents a well-organized and reasonably thorough discussion of the standardization and psychometric properties of the scale. The standardization sample of 800 children follows U.S. Census data. Raw score transformation tables are provided for different raters and the different forms. The author notes the raw scores were "skewed" and that the conversion to "standardized scores" was used because of the nonnormal distribution. I wish the author had speculated about the reasons for the skewness. The implications of the nonnormal distribution are readily apparent when the author discusses T-score threshold interpretation. For example, in the Diagnostic Form, the T-scale is given, and a T-score of 45-54 is considered "Average range; possibly significant concern"; a T-score of 55 to 59 is "Somewhat Atypical; probably a significant concern." Yet, in a "normal" distribution, we are still less than one standard deviation above the mean when we get to a T-score of 59. Obviously, the raw score distribution piled up in the lower ranges, and the conversion to standardized T-scores will not fix this. I would warn potential users to be aware of this issue and ask, "why the skew?" I would also suggest that test users adopt a T-score threshold of 60 or above before considering the relevance of cluster scores or the total AD/HD score. Reliability evidence is thorough and meets acceptable standards. Of particular interest are the test-retest reliability data (retest interval 1-4 weeks) for the child group (ages 8 to 12). Here the stability coefficients drop well below .80. It should be noted that the retest sample consisted of 43 children from the standardization sample and that, of this sample, perhaps 1 to 3 children were "clinical" (diagnosed with AD/HD).
Therefore, the reliability data reported reflect a predominantly "nonclinical" population. However, if one were to assess a purely "clinical" group, consisting of children who meet guidelines for AD/HD, my guess is the reliability coefficients would be considerably lower (knowing that children with AD/HD are notoriously poor in self-awareness). Validity evidence is very thorough. The author covers factorial validity, differential population validity (raters score the AD/HD sample and the non-AD/HD sample significantly differently on clusters and total scores), and convergent validity (the Brown Scales correlate with other scales measuring similar constructs).

COMMENTARY. The Brown ADD Scales represent a significant advance over previous instrumentation for adult ratings of AD/HD. Existing measures are confined to DSM-IV symptoms and minimal elaboration of these symptoms in behavioral terms. By contrast, Brown follows more recent theoretical advances related to AD/HD, much of this theorizing centered on "executive functions." Items were developed with these functions in mind, and the reliability and validity evidence suggests that the scales measure these functions. There are additional strengths of the Brown ADD Scales. First, the forms are well presented and easy to administer and score. The addition of a self-report form for children is an important extension, although the data derived from a young child's self-report should be viewed cautiously. The Diagnostic Form is also a convenient summarizing form, and clinicians should find it very useful in the overall evaluation of AD/HD. The significant caution is the "Threshold Interpretation" the author recommends, specifically the T-score cutoff points. In this regard, we have to consider the positive skew. However, even if no skew were reported, I would question how one could say that a T-score of between 45 and 54 can be interpreted as "Average range, possibly significant concern" (manual, p. 41). We should not forget that the mean of the T distribution is 50, so the author risks identifying at least 50% of the population as having some type of ADD-related difficulty (or concern). The potential implication is a significant increase in "false positives" (i.e., considering that children have an ADD-related problem when in fact they do not). I strongly suggest that test users review this information and make appropriate adjustments. A related point is that when scores surpass certain threshold levels, the author says a "diagnosis" of ADD is "strongly suggest[ed]" (manual, p. 40). Again, given the concern about the threshold cutoffs, this interpretation should be carefully reviewed. And, as the author notes, the most prudent way to view the overall diagnosis of ADD is to place rating data in the context of all other data obtained, especially the clinical interview, behavior observations (e.g., at home, at school, and during standardized testing), and other test outcomes.

SUMMARY. The Brown ADD Scales represent significant advancements in the measurement of adult perceptions of a child's AD/HD-related difficulties. The scales follow theoretical advancements related to AD/HD as opposed to a limited set of "atheoretical" psychiatric symptoms. The forms are relatively easy to administer and score. In addition, there is adequate technical support related to the normative sample, reliability, and validity.
The one important caveat is that potential users should reconsider the author's interpretation of the scale scores (e.g., the Threshold Interpretation) and adjust the cutoffs to more prudent, realistic, and appropriate levels.
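To put numbers on Wilkinson's threshold concern: if the T-scores behaved like properly normalized T-scores (mean 50, SD 10, normal shape, which the reported skew suggests they may not), the flagged bands would still cover a large share of the population. A quick check, treating each band as a half-open interval:

```python
from scipy.stats import norm

# Share of a Normal(50, 10) population falling in each "concern" band.
# Illustrative only: the actual Brown T-score distribution is skewed,
# so real-world coverage may differ from these figures.
bands = [("T 45-54 ('possibly significant concern')", 45, 55),
         ("T 55-59 ('probably a significant concern')", 55, 60)]
for label, lo, hi in bands:
    share = norm.cdf(hi, loc=50, scale=10) - norm.cdf(lo, loc=50, scale=10)
    print(f"{label}: ~{share:.0%}")

print(f"T >= 45 overall: ~{1 - norm.cdf(45, loc=50, scale=10):.0%}")
```

Under these assumptions, roughly 38% of the population lands in the 45-54 band and about 15% more in the 55-59 band (with about 69% at T >= 45 overall), which is exactly why the reviewer recommends raising the working threshold to T >= 60.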
Brown Attention-Deficit Disorder Scales By: Brown, Thomas E, 19960101, Vol. 14 Mental Measurements Yearbook

Review of the Brown Attention-Deficit Disorder Scales by NADEEN L. KAUFMAN, Lecturer, Clinical Faculty, Yale University School of Medicine, and ALAN S. KAUFMAN, Clinical Professor of Psychology, Yale University School of Medicine, New Haven, CT:

The Brown Attention-Deficit Disorder Scales (Brown ADD Scales) is composed of two "Ready-Score," 40-item, self-report scales (one for adolescents, ages 12-18 years, and one for adults, ages 18 and older) that are similar to each other, differing primarily in the wording, but not the intent, of a number of items. The adolescent form is geared for students and is oriented toward school ["Has difficulty memorizing (e.g., vocabulary, math facts, names, dates)"], whereas the adult form is more generic or job-oriented ["Has difficulty memorizing (e.g., names, dates, information at work)"]. Occasionally, the wording is more difficult in the adult form (e.g., the words "setting priorities" are included in an "adult" item about bogging down when presented with many things to do, but are excluded from the adolescent version of the item). Nonetheless, a majority of items (22 out of 40) are identical in both forms, and many others differ by a word or two. The Brown ADD Scales are intended for use with adolescents and adults known or suspected of having symptoms of Attention-Deficit Disorders (ADDs), with or without impulsivity and hyperactivity. Although the DSM-IV (American Psychiatric Association, 1994) lists criteria for assessing Inattention, Hyperactivity, and Impulsivity to diagnose Attention-Deficit Hyperactivity Disorder (ADHD), the Brown ADD Scales focus exclusively on the Inattention criteria, even tapping a range of symptoms beyond these criteria, "to assess for additional cognitive and affective impairments often associated with ADDs" (manual, p. 1). The Brown ADD Scales are clearly created in the image of the author's own clinical (and not necessarily mainstream) model of ADD and ADHD, which relates to his focus on Inattention; he virtually excludes the Hyperactivity and Impulsivity symptoms that are intuitively associated with impulse inhibition. Indeed, one of the seven conceptual assumptions on which the Brown ADD Scales rest is, "Hyperactivity/impulsivity is not an essential element in ADDs" (manual, p. 8). Brown lists three uses for his scales, which may be administered "by a wide range of professionals with graduate training in psychological assessment" (manual, p. 12): (a) the preliminary screening of individuals suspected of having an attention-deficit disorder; (b) as one part of a more comprehensive battery for assessing attention-deficit disorders; and (c) as an instrument for monitoring treatment effectiveness for people with ADDs who are receiving medications or other interventions. However, his emphasis on the screening role of his scales and their usefulness as only a piece of a larger picture is occasionally compromised by statements that imply that scores on his scales can be nearly definitive for a diagnosis: "scores of 50 or more strongly suggest an ADD diagnosis for adolescents and adults" (p. 1). In addition, the graph for interpreting scores on the total scale converts "high" scores to the category "ADD highly probable."

ADMINISTRATION AND SCORING. The two 40-item Ready-Score forms, one for adolescents and one for adults, may be administered orally or in written (self-administered) format.
For adolescents, oral inquiry is preferred, ideally with one or both parents present, such that student and parent are both queried (though scores are supposed to be kept for each respondent, it is the student's responses that yield the scores). Parents are included, according to the author, to allow each person to offer a different perspective plus more data, to curb exaggerations, and to allow each participant to gain appreciation of others' perspectives. Not stated by the author are the potential problems that might occur for adolescents in the presence of their parents, namely the loss of confidentiality, feeling inhibited about saying unpopular things or being entirely truthful, and so forth; similarly, parents may be more truthful and less inhibited if they are questioned privately, especially because parent-teen relationships may be tenuous. At the least, examiners should have some type of clinical or counseling experience to buttress their requisite assessment experience, so as to deal with subtle confrontations, hostility, or lack of appropriate participation by one or more family members. Apart from possible problems in the interactions between parent and teen, the adolescent Ready-Score form is easy to administer. The adult form, likewise, is easy to administer in either an oral or written format. Adults have the option of bringing a close friend, spouse, or parent to the oral evaluation (in which case the same joint procedures described for adolescents are used), but the potential conflicts are minimized because the referred adult has control over the situation. The Brown ADD Scales yield a total score plus subscores in the following clusters: (a) organizing and activating for work (resulting from chronic problems with a high threshold for arousal or high anxiety); (b) sustaining attention and concentration (either receptively, when listening, or actively, when engaged in an activity such as reading); (c) sustaining energy and effort (inconsistent energy or sustained effort due to laziness, sluggishness, or lack of vigilance); (d) managing affective interference (moods that affect social interactions, related to irritability, frustration, and anger); and (e) utilizing "working memory" and accessing recall (forgetting to bring a needed item or to do a necessary task, misplacing things). Each of the 40 items on the adolescent or adult scale is scored on a 4-point scale. Students or adults are asked to listen to each item (or read each item) and indicate whether that item has been a problem for them over the past 6 months. If it is "never" a problem, the score is 0. Problems occurring once a week or less are scored 1; twice a week = 2; almost every day = 3. Two lines of 0-3 scores are provided for each of the 40 items to enable the examiner to record the referred person's responses (top row) and the collateral informant's responses (bottom row). To obtain scores, examiners must tear off the perforated edge of the answer sheet, transfer all item scores to the appropriate column for each cluster, add two subtotals per cluster to get the cluster raw scores (which requires careful attention to a confusing array of "connecting" lines), and transfer cluster and total scores to a graph that permits conversion of raw scores to T scores (an optional step). On the graph, total scores are also converted to a category (e.g., "ADD probable but not certain"). The cluster-summing arithmetic itself is simple, as the sketch following this paragraph shows.
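A minimal sketch of the Ready-Score arithmetic just described. The assignment of the 40 items to the five clusters below is a hypothetical placeholder (the real mapping is defined by the test form itself and is not given in the review); the >= 50 flag is the author's published cutoff.

```python
# Hypothetical item-to-cluster mapping, for illustration only; the actual
# assignment of the 40 items to the five clusters is set by the test form.
CLUSTERS = {
    "organizing/activating": [1, 6, 11, 16, 21, 26, 31, 36],
    "sustaining attention": [2, 7, 12, 17, 22, 27, 32, 37],
    "sustaining energy/effort": [3, 8, 13, 18, 23, 28, 33, 38],
    "managing affective interference": [4, 9, 14, 19, 24, 29, 34, 39],
    "working memory/recall": [5, 10, 15, 20, 25, 30, 35, 40],
}

def score(ratings):
    """ratings: dict mapping item number (1-40) to a 0-3 frequency rating."""
    cluster_scores = {name: sum(ratings[i] for i in items)
                      for name, items in CLUSTERS.items()}
    total = sum(cluster_scores.values())  # 0-120 raw; author flags >= 50
    return cluster_scores, total
```

Conversion of these raw scores to the (nonstandard) T-scores discussed below requires the form's own conversion graph and cannot be reproduced here.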
Although it is logical (because of the range of scores possible for each) that it is the total raw score, and not the total T score, that is categorized, the instructions do not tell examiners that important fact; misinterpretations are quite feasible. In addition, the extensive transferring of item and cluster scores, and the possible confusion when summing subtotals, encourage clerical errors. A better-designed answer sheet would have allowed the responses to be recorded directly in their pertinent columns, thereby avoiding the transferring. The use of T scores is both confusing and puzzling. The manual includes extensive interpretive information, but it is the raw scores, and not the T scores, that are interpreted. Although examiners are instructed to convert raw scores to T scores as an optional step, the metrics of this presumably standardized score are not provided. One thing is evident from the conversion table, however: The mean of 50 and standard deviation of 10 that are routinely applied to T scores were not used for the Brown ADD Scales; T scores for Brown's scales range from <= 50 to 100+ with a midpoint of about 75. On the positive side, the items written for both adolescents and adults are stated in an empathic tone; no test items are phrased to sound negative or judgmental. Therefore, the items are likely to elicit pertinent commentary or complaints from the client and will help form a "connection" with the examiner. In addition to the 40-item scales, the Brown ADD Scales include booklets known as Diagnostic Forms to aid in a multifaceted assessment of Attention-Deficit Disorders. The 15-page forms, one targeted for adolescents and the other for adults, include five pages with written questions followed by blank lines for writing responses as a means of summarizing the client's clinical history; one page for copying over the scores on the 40-item scale; one page for determining multi-rater agreement on the DSM-IV criteria for ADHD, including Hyperactivity and Impulsivity; as well as additional pages to allow screening for comorbid disorders, for recording an array of scores derived from Wechsler's (1981, 1991) IQ scales (the manual discusses interpretation of the Brown ADD Scales in the context of scores obtained on the Wechsler Intelligence Scale for Children-Third Edition [WISC-III] and the Wechsler Adult Intelligence Scale-Revised [WAIS-R]), and for integrating all data and information into a diagnostic format. These forms are likely to be useful for novice examiners who have limited experience and will be able to benefit from the extensive structure built into the forms. Experienced examiners, who have already developed their own interviewing, assessment, and diagnostic strategies, may find the forms confining and of little practical value, especially if their orientation is even moderately different from Brown's approach.

PSYCHOMETRIC PROPERTIES. The Brown ADD Scales report total scores, cluster scores, and categories for those assessed by this instrument, but do not present data for an actual standardization sample of adolescents or adults. Instead, several research samples, clinical and nonclinical within both the adolescent and adult age ranges, are described. When subsamples were combined, the nonclinical samples numbered 190 adolescents and 143 adults; the clinical ADD samples (composed of individuals both with and without Hyperactivity in approximately equal numbers) totaled 191 adolescents and 142 adults.
Within the adolescent and adult samples, the clinical and nonclinical samples were reasonably matched on gender, age, socioeconomic status (SES), and ethnic background, permitting meaningful comparisons for determining the diagnostic validity of the scales. However, the nonclinical samples, which provided the basis for interpreting scores yielded by the Brown ADD Scales, do not match U.S. Census data on these key background variables. The percentages for Whites, Hispanics, and African Americans seem to be similar to U.S. Census proportions (no Census data are provided), but the sample is decidedly of higher SES than the nation as a whole, and no data are reported for geographic region or community size. The gender distribution reveals a proportion of 87 males to 13 females within the adolescent nonclinical population, compared to a very different ratio of 44 males to 56 females among the nonclinical adult sample. The former proportion may reflect the fact that many more males than females are referred for possible ADD, but that does not alter the fact that such a disproportion of males to females makes the sample highly undesirable as a reference group. Furthermore, the nonclinical samples are too small to be appropriate standardization samples. This concern ranks as the largest drawback of the scales. Comparison of the clinical and nonclinical samples reveals exceptional discriminant validity for the five clusters and the total score for both adolescents and adults. Total raw scores averaged about 72 for the adolescent clinical sample compared to 39 for the nonclinical sample. For adults, the respective values were about 78 and 31 (SD = 16), for a discrepancy of nearly 3 SDs (a quick arithmetic check follows below). These are impressive results, and they held for all five scales. Whereas discrimination between the clinical and nonclinical samples offers some evidence of the construct validity of the Brown ADD Scales, in general, construct validity evidence is meager. Item-total correlations are presented as a kind of evidence of construct validity, and these coefficients are generally good, but no evidence is offered (other than the face validity of the items) to support the separate identity of the five clusters or to validate the construct that each purports to measure. Evidence of concurrent validity is offered by examining relationships of the Brown ADD Scales to the Wechsler scales (WISC-III or WAIS-R, depending on the person's age). Brown demonstrated that adolescents and adults with ADD scored substantially lower on the triad of subtests associated with attention-concentration than on the Verbal or Spatial triads. If, indeed, the three component tasks measured attention-concentration for the ADD samples (or working memory, or freedom from distractibility, other interpretations commonly assigned to the trio of subtests; see Kaufman, 1994), then the Brown/Wechsler data do offer good evidence of the concurrent validity of the Brown ADD Scales. Coefficient alpha reliability coefficients for the total scale were .92-.93 for the two nonclinical samples, .90 for the adolescent clinical sample, and .86 for the adult clinical sample. These values are acceptable. However, the values of .95-.96 reported in the manual by Brown for the combined clinical and nonclinical samples are bogus and should not be interpreted. The distributions of scores described in the discriminant validity studies are clearly bimodal, preventing meaningful interpretation of either reliability or validity data for combined samples.
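The arithmetic check promised above, expressing the clinical-nonclinical mean differences in standard deviation units (assuming the reported SD of 16 applies to both age groups, which the review states explicitly only for adults):

$$ d_{\text{adolescent}} = \frac{72 - 39}{16} \approx 2.1, \qquad d_{\text{adult}} = \frac{78 - 31}{16} \approx 2.9 $$

That is, roughly a 2-SD separation for adolescents and nearly 3 SDs for adults, which is exceptionally large by conventional effect-size standards.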
Unfortunately, the only values presented for the five clusters are for the combined sample. Therefore, the values of .70 to .89 for adolescents and .79 to .92 for adults should not be interpreted; they are spuriously high to an unknown degree. The test-retest stability, fortunately, was obtained on 75 nonclinical adolescents over a 2-week interval, and the resultant coefficient of .87 indicates excellent stability. Unfortunately, coefficients are not reported for the five clusters; mean scores on the two testings are not reported, preventing any understanding of practice effects; and no stability data were obtained for the adult form.

OVERALL EVALUATION. One of the key strengths of the Brown ADD Scales is the author himself, an astute, experienced clinician whose in-depth knowledge of ADD, empathy for families affected by it, and enthusiasm for the topic are all quite evident from the manual. Brown's approach, however, is more clinical than research based. He has written extensively on ADD, but apparently has not conducted an abundance of published research on ADD to support his conception of the disorder. The research that he cites in support of his theoretical approach (manual, pp. 9-11) seems tangential and limited in scope. Yet he has developed an instrument for adolescents and adults, with the latter group being underserved and often ignored. His development of a truly adult-oriented scale is a very nice contribution to the ADD field. So, too, is the manual for the scales. Though it is occasionally disorganized (it is not always easy to know where to find pertinent information), it is written in a reader-friendly style that explains important concepts in a straightforward manner. Brown avoids technical language and jargon and presents material in a consistently commonsense manner, from the assumptions he makes as a foundation for his approach and his scales to the impressive amount of interpretive material (including useful case studies) that fills the manual. He offers concrete suggestions, such as when giving feedback to clients, and gives many clear examples. In addition, he offers the reader numerous cautions about ADD diagnosis and makes suggestions of supplementary measures to use to assess areas not covered by the Brown ADD Scales. The examiner who is inexperienced with ADD will learn much from giving the scales to a client and from reading the manual, and will conceivably use the results of the screening scales to refer the client to an expert, if need be. One caution concerns the considerable claims made for the use of the scales for monitoring treatment: Research documentation is sorely needed to determine their utility for this important purpose.

SUMMARY. The Brown ADD Scales provide screening scales for measuring inattention and symptoms associated with inattention, and are targeted for adolescents and adults; they make an especially important contribution to adult assessment because that group is often ignored in the diagnosis and treatment of ADD. The test items are worded empathetically and should elicit meaningful responses from referred clients. The scales are administered to the client, even though a parent or collateral informant is also advised to provide responses to the questions (especially for adolescents). It is good to have the client's perceptions as part of the diagnostic process, but it is a negative of the scales that only the client's responses are assigned scores.
Administration is easy, but scoring is more complex than necessary and may lead to clerical errors. Evidence of discriminant validity is excellent, and internal consistency reliability is good for the total score, but the statistics provided do not permit evaluation of the reliability of the five clusters. In addition, there is meager construct validity evidence for the five clusters. The biggest shortcomings of the Brown ADD Scales are the lack of a large, representative normative sample for adolescents or adults, and the limited research base in support of the author's model. The biggest positives are the author's clinical expertise and the informative, easy-to-read test manual. On balance, the instrument should be quite useful for ADD diagnosis if used appropriately as a screening tool or as part of a comprehensive battery; its role in monitoring treatment (including the effects of administering the same scale several times) needs to be demonstrated with empirical research.

REVIEWER'S REFERENCES
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Kaufman, A. S. (1994). Intelligent testing with the WISC-III. New York: Wiley.
Wechsler, D. (1981). Manual for the Wechsler Adult Intelligence Scale-Revised (WAIS-R). San Antonio, TX: The Psychological Corporation.
Wechsler, D. (1991). Manual for the Wechsler Intelligence Scale for Children: Third edition. San Antonio, TX: The Psychological Corporation.

Review of the Brown Attention-Deficit Disorder Scales by E. JEAN NEWMAN, Assistant Professor of Educational Psychology, University of South Alabama, Mobile, AL:

The Brown Attention-Deficit Disorder Scales consist of a self-report assessment for adolescents (ages 12-18) and adults. The 40-item instruments are designed to be used for initial screening, as part of the comprehensive diagnosis process, and as an assessment (in repeated administrations) of ongoing treatment. The author suggests that a professional read the questions to the subject, asking for self-ratings and examples. In the case of adolescents, the author suggests that one or both parents be present and offer their assessment of items after the subject offers his or her self-score. The instrument is designed to easily record the self-report scores, as well as the "collateral" scores, with instructions for separate scoring and interpretation. Administrator skill-level requirements are defined as "a wide range of professionals with graduate training in psychological assessment" (manual, p. 12). Scoring is made simple by the design of the instrument, and clear instructions are offered in the training manual. A preliminary test was developed within the clinical practice of the author. A pilot test consisted of a 40-item instrument administered to 40 students aged 12 years to college age, and an additional unspecified number of adults, all of whom had been referred for attention and/or achievement problems, and all of whom met criteria for Attention Deficit Without Hyperactivity in the Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III). The scores were compared with those of 40 "nonclinical adolescents and adults" (manual, p. 31). No description was offered related to sampling methodology. Likewise, no description of any modification of the original instrument was found. Because the pilot instrument contained 40 items, as did the published instrument, it can only be surmised that no changes in item content were made.
Subsequently, there were two phases of data collection in each age group. The adolescent version was administered to 76 subjects (ages 12-18) in Phase I and 134 in Phase II, who had been referred to a clinical psychologist for academic underachievement. The scores were compared with a matched sample of 75 (Phase I) and 115 (Phase II) nonclinical students, randomly selected from one junior and one senior high school. The adult scale was formed, during this second phase, by rewording items on the adolescent scale. This rephrased scale was administered to 50 adults in Phase I and 123 in Phase II, who had sought treatment for ADD-like symptoms, and who met DSM-III criteria for Attention Deficit Disorder With or Without Hyperactivity. A comparison group of nonclinical adults, recruited from "two work settings and one civic organization" (manual, p. 32), was tested for comparison. Although there was a "Phase 1" and a "Phase 2," no evidence of revision or editing between phases was reported. Demographic data indicate equivalent numbers for each two-year age span in the adolescent group in both the clinical and nonclinical groups. Eighty-seven percent of the nonclinical sample were male, and 80% of the clinical sample; 13% of the nonclinical and 21% of the clinical sample were female. Seventy-one percent of the nonclinical were white, and 74% of the clinical; 17% of the nonclinical were African American, and 15% of the clinical; 13% of the nonclinical were Hispanic, and 11% of the clinical. Among the clinical, 40% fell within the normal IQ range, and 60% in the above average range. Therefore, this instrument should only be used with subjects who have a measured IQ score in the average or above average range. Among the adults tested, 44% of the nonclinical group and 61% of the clinical group (total n = 245) were male; 56% of the nonclinical and 39% of the clinical group were female. Other demographic data were relatively similar to data from the adolescent sample. Scores on the adolescent scale tended to increase significantly with age. However, when age was regressed against scores from the instrument, the ADD diagnosis was reported to account for more of the score differences than did age. Although the 40 items on the scale include symptoms from the diagnostic checklist of the DSM-IV, the author clearly states that the instrument is based on a broader, dimensional model. Therefore, an unspecified number of the 40 items were constructed based on the author's own theoretical model. The author further divided the items into five clusters used for diagnosis, although these clusters are not analogous to categories contained in the DSM-IV. Within-cluster consistency values ranged from .57 to .80. Correlations between clusters ranged from .68 to .86. Five professionals in the area of Attention Deficit Disorders diagnosis also reviewed the instruments. Using Kappa scores, interrater reliability was .85. The author assessed concurrent validity using subtest scores from the appropriate Wechsler Intelligence Scales (WISC-R, WISC-III, and WAIS); measures of attention, short-term memory, concentration, and processing speed were assessed using Arithmetic, Digit Span, Symbol Search, and Coding/Digit Symbol. The reader is referred to the manual for specific descriptions of analysis, where similarities are shown between self-reported ADD-related deficits and deficits on IQ subtest comparisons.
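The interrater reliability figure reported above (kappa = .85 among expert reviewers) is a chance-corrected agreement statistic. As a minimal sketch of how Cohen's kappa is computed for two raters (the ratings and category labels below are invented for illustration and are not drawn from the manual):

# Minimal sketch of Cohen's kappa for two raters (illustrative data only).
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the raters assigned categories independently.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ADD / non-ADD judgments from two reviewers:
rater_1 = ["ADD", "ADD", "non", "ADD", "non", "non", "ADD", "non"]
rater_2 = ["ADD", "ADD", "non", "non", "non", "non", "ADD", "ADD"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.5 for these toy data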
Significant differences were found, among all ages, between clinical and nonclinical subjects, supporting the ability of the Brown Scales to discriminate between ADD and non-ADD subjects. In addition to the assessment instruments and manual, the complete kit contains a diagnostic form for both adolescents and adults. Sections include a Clinical History Protocol, Scoring Summary of the instrument, a Multirater Evaluation Form to parallel the DSM-IV, a Screener for Comorbid Disorders, an Examiner's Worksheet for IQ Test Data, a Summary of Wechsler Scores relevant to ADD Diagnosis, and an Overall Summary of Diagnostic Data form. Although these forms represent the basic standard now acceptable for diagnosis, the format and explanations provide an excellent summative tool for thorough assessment and evaluation. SUMMARY. The Brown ADD Scales provide a unique assessment in targeting the adolescent and adult population. Their emphasis on those with above-average intelligence provides detailed theoretical and psychometric means for assessing this subgroup. Statistical analyses provide adequate support for the psychometric properties assessed thus far. Therefore, the instrument may be appropriately recommended for initial screening and possible assessment of ongoing treatment for the two target populations. However, the instrument is based on the author's theoretical model, which is not fully congruent with currently acceptable diagnostic standards as represented in the DSM-IV. Both the symptom characteristics and the diagnosis (ADD, rather than AD/HD) raise concerns for practitioners, therapists, school systems, and insurance companies, all of whom perceive the model presented in the DSM as the current standard. Therefore, for diagnostic purposes, the instrument cannot currently be recommended. Review of the Brown Attention-Deficit Disorder Scales by JUDY OEHLER-STINNETT, Associate Professor of Applied Behavioral Studies, Oklahoma State University, Stillwater, OK: NEED FOR INSTRUMENT. The Brown Attention-Deficit Disorder Scales (Brown ADD Scales) are designed to measure "symptoms of Attention-Deficit Disorders" in adolescents ages 12-18 and in adults 18 years and older. Its intended purposes are screening for ADD in areas such as cognitive and affective impairment as well as attention, use as part of a comprehensive assessment, and treatment monitoring. In contrast to the numerous behavior rating scales available for children, few such measures exist for adolescents and adults, so there is a clear need for a scale such as the Brown ADD Scales for these age groups. Additionally, the Brown ADD Scales is unique in that it is a self-report measure asking clients directly about the difficulties they encounter in daily living. Another strength of the Brown ADD Scales is the inclusion of a diagnostic form that guides practitioners in utilizing the scale as part of a comprehensive assessment, including test results, DSM-IV criteria, a screener for comorbid disorders, and a summary of Wechsler information in data interpretation. Thus, rather than simply saying, "don't use this test alone," specific written guidance on utilizing the Brown ADD Scales within the context of a multifactored assessment is provided with the test itself. THEORETICAL/RATIONAL TEST DEVELOPMENT. The Brown ADD Scales is based on the author's theoretical model, derived from an extensive review of the theoretical literature on Attention Deficit Disorder (ADD) and from symptoms described by clinical clients.
Five clusters capture the dimensions of this theory: Organizing and Activating to Work, Sustaining Attention and Concentration, Sustaining Energy and Effort, Managing Affective Interference, and Utilizing "Working Memory" and Accessing Recall. As such, it is one of the few measures that capture the cognitive and motivational components of functioning that might contribute to problems in everyday living experienced by adolescents and adults with ADD. These important dimensions have been further described by Russell Barkley as critical to the understanding of ADD/ADHD. STATISTICAL TEST DEVELOPMENT/PSYCHOMETRIC PROPERTIES. Initial development of the Brown ADD Scales included administering the initial item pool (the number of initial items was not reported) to clinical clients (number also not reported). Refinement of the scale to 40 items was completed; item analysis and selection processes were not described. Alphas for the clusters (subscales) and the total score are reported. Cluster alphas ranged from .70 to .89 for the adolescent scale and .79 to .92 for the adult scale. Items with very low item-cluster correlations remain in the scale (rs range from .18 to .76 for the adolescent scale and from .26 to .84 for the adult scale). The total alphas lend stronger support for the scales; the total alpha for the adolescent version was .95 and for the adult version, .96. Cluster intercorrelations range from .57 for Managing Affective Interference with Memory/Recall to .80 for Organization/Activation with Sustaining Energy and Effort. Although internal consistency data are useful aspects of test development for construct validity as well as for reliability, it is unfortunate that further statistical analyses substantiating the theoretical model of the scale were not conducted. Because of this, the suggestion that a profile analysis should be conducted with clusters is not warranted by the data presented. The 40 items were administered to samples of adolescents and adults in order to demonstrate group differences. In two studies, the adolescent and adult versions discriminated clinical (ADHD) from nonclinical groups; however, studies utilizing the scale with other diagnostic groups or those with comorbid disorders are not reported. Sensitivity and specificity data for development of diagnostic cutoff scores are reported, presumably based on the clinical diagnosis from the practitioner from whom the clinical sample was obtained. Cutoff score recommendations are reported as yielding a false negative rate for the adolescent scale of 10% and a false positive rate of 22%. For the adult scale, the false negative rate is reported as 4% and the false positive rate as 6%. Appropriately, in this section users are admonished to use these for screening purposes only. The adolescent version was evaluated for monitoring of medication treatment effects. Brown ADD Scales scores improved, as did GPA, for adolescents on stimulant medication. This is an important step in the use of a clinical scale in working with persons with ADHD. However, as with several other scales developed in clinic settings, the Brown ADD Scales assumes that demonstrating group differences and treatment effects is sufficient for test development and neglects critical aspects of construct validity and test norming. Frequency of agreement of the Brown ADD Scales with a Bannatyne interpretation of the Wechsler scales is reported as evidence of concurrent validity. Correlations with other rating scales or self-report measures are not reported.
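The false negative and false positive rates quoted above follow directly from the four cells of a screening classification table. A minimal sketch of the arithmetic, using hypothetical counts chosen to reproduce the adolescent-scale rates (the manual reports only the resulting percentages, not the underlying counts):

# Deriving screening statistics from a 2x2 classification table.
# The counts below are hypothetical; only the resulting rates are reported
# in the manual.

def screening_stats(tp, fn, fp, tn):
    """tp/fn: clinical cases above/below the cutoff; fp/tn: nonclinical
    cases above/below the cutoff."""
    sensitivity = tp / (tp + fn)   # proportion of true cases detected
    specificity = tn / (tn + fp)   # proportion of non-cases screened out
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_negative_rate": 1 - sensitivity,
        "false_positive_rate": 1 - specificity,
    }

print(screening_stats(tp=90, fn=10, fp=22, tn=78))
# -> false negative rate .10 and false positive rate .22, matching the
#    adolescent-scale figures quoted above.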
The manual encourages practitioners to have the client as well as a collateral (e.g., a parent for the adolescent scale) complete the rating. In the validity section, a correlation of .84 is reported for a self-report versus parent rating. NORMATIVE INFORMATION. Adolescent norms are based on the initial samples gathered from one private-practice psychologist for the clinical sample (n = 191) and from two schools for the nonclinical sample (n = 190). The adult norms are based on two samples from a private-practice psychologist for the clinical sample (n = 142) and from work/civic settings for the nonclinical sample (n = 143). There are far more males than females in the clinical samples. Demographic information such as age, socioeconomic status, race/ethnicity, and IQ level is reported as being basically representative of the U.S. Census; however, with such a small and limited sample, the norms are inadequate for national application. There is no description of how standard scores were derived; thus, it is unclear whether these are based on the clinical, nonclinical, or total sample at each level of the test. Norms utilizing the parent rating for the adolescent scale would be useful. The adolescent norms are based solely on adolescent self-report rather than a combination of self-report with parent input. Test-retest reliability for a 2-week interval is reported as .87 for a sample of 75 adolescents, which is acceptable. No test-retest reliability is reported for the adult version. Confidence intervals based on normative data, adjusted for regression to the mean and test reliability, are reported for use in treatment monitoring. ADMINISTRATION/SCORING/INTERPRETATION. Instructions for administration include having the adolescent and at least one parent respond to oral inquiry. For the adult form, a collateral person is optional and the test can be completed orally or in writing. Scoring is based on self-report regardless of the method of administration. Raw scores are converted to T-scores on the ready-score protocol. Cutoff scores are coded on a threshold continuum. There are also directions for completing the treatment monitoring form, which includes sections for pre- and posttreatment ratings. Detailed descriptions of utilizing the diagnostic form information are also provided; this includes history taking, comorbid disorder and DSM-IV screening, and review of IQ data. A separate interpretation section is provided. A detailed description of the behaviors associated with each cluster is provided for interpretation purposes. However, until further psychometric work is complete, use of these interpretations should be limited to an informal symptom checklist. SUMMARY AND RECOMMENDATIONS. Upon first reading, the Brown ADD Scales is an exciting scale because it is based on state-of-the-art models of ADHD that go beyond DSM diagnoses. However, this reviewer is looking forward to the Psychological Corporation's continuing work in examining and documenting the technical quality of this instrument. As the new child version is developed, it is hoped that basic psychometric steps will be implemented as well.
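The confidence intervals mentioned above, adjusted for regression to the mean and test reliability, follow standard true-score logic: the observed score is pulled toward the normative mean in proportion to the scale's reliability, and the interval width is derived from the measurement error. A minimal sketch in the T-score metric, using the reported 2-week test-retest coefficient of .87 (the exact adjustment used in the manual may differ):

import math

def adjusted_confidence_interval(observed, mean=50.0, sd=10.0,
                                 reliability=0.87, z=1.96):
    """Estimated true score and confidence interval, adjusting for
    regression to the mean. Defaults assume the T-score metric
    (mean 50, SD 10) and the reported test-retest reliability of .87."""
    # The estimated true score regresses toward the mean as reliability drops.
    true_estimate = mean + reliability * (observed - mean)
    # Standard error of estimation for the true score.
    se = sd * math.sqrt(reliability * (1 - reliability))
    return true_estimate, (true_estimate - z * se, true_estimate + z * se)

estimate, (low, high) = adjusted_confidence_interval(observed=70)
print(f"estimated true T-score {estimate:.1f}, 95% CI [{low:.1f}, {high:.1f}]")
# -> estimated true T-score 67.4, 95% CI [60.8, 74.0]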
Conners Comprehensive Behavior Rating Scales By: Conners, C. Keith, 20080101, Vol. 18 Mental Measurements Yearbook Review of the Conners Comprehensive Behavior Rating Scales by JEREMY R. SULLIVAN, Assistant Professor of Educational Psychology, University of Texas at San Antonio, San Antonio, TX: DESCRIPTION. The Conners Comprehensive Behavior Rating Scales (Conners CBRS) was designed as an omnibus measure of emotional, behavioral, academic, and social functioning among children (please note: the term "children" will be used in this review to include both children and adolescents). The Conners CBRS is purported to facilitate decision making with regard to diagnosis, special education classification, intervention planning, progress monitoring, and research. The Conners CBRS includes a parent rating scale (203 items), teacher rating scale (204 items), and self-report scale (179 items), which may be used independently or in conjunction with one another to facilitate multi-informant assessment. The parent and teacher scales can be used with children from 6 to 18 years of age; the self-report scale can be used with children from 8 to 18 years of age. Given the test author's intent to assess a range of behavioral and psychological problems, the Conners CBRS includes over 40 scales. The Content Scales include broad areas of dysfunction, such as Emotional Distress, Defiant/Aggressive Behaviors, and Academic Difficulties. The Symptom Scales are tied to specific DSM-IV-TR diagnostic criteria, including ADHD (differentiated by subtype), Conduct Disorder, Oppositional Defiant Disorder, Autistic Disorders, Mood Disorders, and Anxiety Disorders. These scales are meant to help the examiner narrow down which diagnoses are likely and unlikely to be appropriate for the child. The Validity scales are designed to detect three possible patterns of responding: Positive Impression (i.e., faking good), Negative Impression (i.e., faking bad), and Inconsistency (i.e., careless or random responding). The Conners Clinical Index attempts to differentiate children with a discernible diagnosis from children without a discernible diagnosis. The Other Clinical Indicators scales assess other potential problems such as Bullying Perpetration and Victimization, Panic Attack, Posttraumatic Stress Disorder, Substance Use, Tics, and Trichotillomania. Two sets of Critical items (Self-Harm and Severe Conduct) may identify immediate safety concerns. Finally, all three forms include Impairment items, which ask the rater whether the problems result in significant impairment in functioning across settings such as home, school, and social settings. The Conners CBRS system also includes a short form of sorts called the Conners Clinical Index (Conners CI). The Conners CI form is meant to be used as a screener, or rough indicator that suggests the child is similar to other children with a diagnosis. The Conners CBRS and Conners CI can be administered by raters writing directly on the response booklets, or they can be administered online. Items may be read aloud if reading comprehension is a concern, and specific procedures are provided for the examiner to follow in this situation. Across all forms of the Conners CBRS, response options for each item are as follows: 0 = Not true at all (Never, Seldom), 1 = Just a little true (Occasionally), 2 = Pretty much true (Often, Quite a bit), and 3 = Very much true (Very often, Very frequently). Respondents are asked to consider behaviors observed over the past month. 
Thus, teachers completing the form should have at least 1 month in which to become familiar with the child and his or her behaviors. Both the Conners CBRS and Conners CI can be scored either with computer software or online. The Conners CI also can be scored by hand using the QuikScore form; the Conners CBRS cannot be scored by hand. If the hard copy version of the Conners CBRS is used, scoring is completed by the clinician typing the rater's responses into the computer scoring program. Raw scores for the Conners CBRS and Conners CI scales are converted to T-scores for normative interpretation. Both software and online scoring result in a narrative report inclusive of interpretive guidelines. Overall, the manual provides adequate instruction in administration and scoring procedures. DEVELOPMENT. Conners CBRS development is described as taking place over three phases: initial planning, the pilot study, and the normative study. The initial planning phase involved developing items based on reviews of the assessment and psychopathology literature. Given the purpose of the Conners CBRS, the DSM classification system served as an important basis of item development, and the Symptom scales were rationally derived from the DSM-IV-TR. The initial Conners CBRS item pool was reviewed and reduced by the development team before the pilot study was conducted; items were revised or eliminated based on their ability to be translated into Spanish, appropriateness across cultural groups, clarity, and clinical importance. The pilot study phase involved administering the initial item pool to different samples of adults and children, including samples from the general population and various clinical samples. The sample sizes for the pilot study were 232 for the parent form, 271 for the teacher form, and 249 for the self-report. Exploratory factor analysis and other statistical procedures (e.g., coefficient alpha, item-total correlations, item discrimination indices) were used to gain an understanding of item groupings, and to determine which items to retain and which to eliminate. Expert review also was used to evaluate the items for clarity, clinical importance, and cultural sensitivity. The normative study phase involved administering the revised set of Conners CBRS items to the standardization sample, and then using these responses to develop final versions of scales and subscales, confirm results from the exploratory factor analyses, establish scoring criteria and cutoffs, and evaluate psychometric properties. Several items were removed from each of the Conners CBRS forms before publication of the final version of the Conners CBRS. Reading levels of the Conners CBRS items were found to range from Grade 3.5 (self-report) to 5.9 (teacher report). TECHNICAL. Standardization. The norms are based on ratings from 3,400 people, including 1,200 parents, 1,200 teachers, and 1,000 children. The normative sample was taken from a larger sample (4,626) of gathered data, so that at each year of age there would be 50 males and 50 females included who were representative of the United States population in terms of ethnicity. The clinical sample included 704 parents, 672 teachers, and 700 children. The normative and clinical samples are described in great detail in the test manual. The ethnic distributions of participants in the normative sample are generally similar to the United States population for Asian, African American, Hispanic, and Caucasian children. 
With regard to geographic distribution, the Western states are less represented than the Northeast, Midwest, and Southern states. For the purpose of score conversions, the norms are divided by age and gender due to numerous statistically significant age and gender effects found in the normative data (although most effect sizes were relatively small). Statistically significant effects for ethnicity and parent education level were found for many scales, but again, most effect sizes were small. Reliability. Score reliability was assessed with internal consistency, test-retest reliability, and interrater reliability analyses. With regard to internal consistency, most alpha coefficients for the Content and Symptom scales were above .70, and many were above .90. An exception is the Asperger's Disorder scale, which fell below .70 for some age and gender subgroups. Alpha coefficients for the Positive Impression and Negative Impression scales were below .70 for many subgroups (even as low as .28), but these lower coefficients are likely due to a lack of variability on the items on these scales. Internal consistency analyses are broken down by gender and age, and coefficients are generally similar across these subgroups. However, this reviewer would like to see evidence in the manual that alpha coefficients are similar across ethnic groups. Test-retest analyses were conducted with a sample of 84 parents, 136 teachers, and 75 children. The test-retest interval was 2 to 4 weeks. All coefficients were statistically significant at p < .001, and adjusted coefficients indicate acceptable stability across administrations with ranges as follows: Parent Content .70 to .96, Parent Symptom .66 to .95, Teacher Content .80 to .96, Teacher Symptom .76 to .94, Self-Report Content .58 to .82, and Self-Report Symptom .56 to .76. Interrater reliability was assessed with 199 pairs of parents and 130 pairs of teachers. Within these pairs, the two parents or two teachers provided ratings of the same child, in order to determine level of similarity across two independent raters. Corrected correlation coefficients indicate moderate to high levels of agreement: Parent Content .62 to .89, Parent Symptom .53 to .84, Teacher Content .50 to .89, and Teacher Symptom .53 to .80. Validity. The test manual provides an abundance of information about validity evidence; only the highlights will be presented here. The development team evaluated validity from several angles, including factorial validity, convergent and divergent validity, and discriminative (or criterion-related) validity. Factorial analyses indicated that the Conners CBRS items grouped into theoretically supported factors. The test manual describes the process of using exploratory factor analyses to establish the factor structure of the parent, teacher, and self-report forms, and then using confirmatory factor analysis to test and confirm the structure. Confirmatory fit indices for the parent form generally suggest good model fit; indices for the teacher and self-report forms were somewhat lower than desired. Convergent and divergent analyses indicated that scores on the Conners CBRS generally correlated with scores on other measures of psychopathology in theoretically expected ways (i.e., stronger correlations with measures of similar constructs, weaker correlations with measures of dissimilar constructs). 
The other measures used in these analyses included the Behavior Assessment System for Children-Second Edition (BASC-2; Reynolds & Kamphaus, 2004), Achenbach System of Empirically Based Assessment (ASEBA; Achenbach & Rescorla, 2001), Children's Depression Inventory (CDI; Kovacs, 2003), and several additional measures. As an example, scores on the Major Depressive Episode scale of the Conners CBRS were correlated with scores on other scales as follows: BASC-2 Depression scale .38 to .71, ASEBA Anxious/Depressed scale .43 to .83, and CDI self-report Total Score .55 (p < .01 for all). Similarly, Conners CBRS scores were generally correlated with the Adaptive scales of the BASC-2 (e.g., Adaptability, Leadership, Social Skills) in a negative direction, which makes sense given the nature of these scales. The overall pattern of correlations with scores on other measures provides adequate evidence of construct validity. Finally, discriminative analyses indicated that Conners CBRS scores were able to differentiate children in various clinical groups from those in the general normative sample. For example, children in the Disruptive Behavior Disorders clinical group scored significantly higher on the Defiant/Aggressive Behaviors, Violence Potential Indicator, Conduct Disorder, and Oppositional Defiant Disorder scales than children from both the general population and other clinical groups. Similarly, children in the Pervasive Developmental Disorders clinical group scored significantly higher on the Social Problems, Perfectionistic and Compulsive Behaviors, Autistic Disorder, and Asperger's Disorder scales as compared to children in other groups. Conversely, the ADHD Inattentive scale was not successful at differentiating ADHD subtypes; this was true for the parent, teacher, and self-report forms. Overall, mean correct classification rates based on Conners CBRS scores were as follows: parent = 78.40%, teacher = 81.22%, self-report = 75.25%. The test manual also includes classification statistics for sensitivity, specificity, positive predictive power, negative predictive power, false positive rate, false negative rate, and kappa. COMMENTARY. It is clear that much care and thought were invested in the development of the Conners CBRS. The test manual is among the best this reviewer has seen in terms of level of detail regarding psychometric issues such as norms, reliability, and validity. The test manual also describes how different scales are tied to DSM and IDEA criteria, thereby facilitating interpretation for clinicians. The chapters on interpretation and intervention discuss these issues in more detail than is often found in test manuals. Tables also are provided for users to determine whether pre-post changes in Conners CBRS scores are statistically significant. This reviewer took the Conners CBRS self-report as a 16-year-old male, responding with a "2" to all items, and then entered these responses into the computer scoring program. As expected, this response pattern resulted in "Very Elevated" T-scores for all Content and Symptom scales (T-scores ranging from 79 to 90), yet the Negative Impression Validity scale was in the normal range. Thus, it is possible for raters to give 2-point responses to all items (thereby endorsing multiple symptoms to some degree) without triggering the Validity scales because extreme responses were not given to any of the Negative Impression items. 
This observation makes sense given the procedures and cutoffs used in developing the Validity scales, but clinicians should be aware of the potential for these scales to "miss" some response patterns. The tables and handouts provided in the software-based interpretive report make a large amount of data more manageable. Standard errors of measurement (SEMs) and percentiles are optional outputs, providing additional interpretive information. The computer program makes scoring fast and easy, and the program includes a double-entry option to catch data entry errors. An additional strength is the similarity of items and scales across parent, teacher, and self-report forms, allowing the clinician to look at consistency across informants and determine possible setting effects. Although not described in detail in this review, the psychometric properties of the Conners CI and Spanish forms were similar to those of the full Conners CBRS, but validity analyses were not reported for the Spanish forms. Users should also consider some of the weaknesses of the Conners CBRS. For example, on the teacher form, it seems that the Upsetting Thoughts/Physical Symptoms subscale would be more useful if it were split into two scales to facilitate interpretation; in its current form, an elevated score will require consideration of individual responses to determine whether the elevation is due to endorsement of items assessing upsetting thoughts or physical symptoms, or both. Similarly, the Emotional Distress scale combines symptoms of anxiety and depression. With regard to the validity studies, this reviewer would like to see more explanation for why some of the confirmatory fit indices were lower than expected. Further, not many ethnic minorities were included in the construct validation samples, as compared to Whites. Finally, the Conners CBRS does not include scales specifically designed to detect possible psychosis or thought disturbances; clinicians seeking to assess these issues should consider additional measures. SUMMARY. Alternative omnibus instruments that include self-, teacher-, and parent-report components include the BASC-2 and ASEBA systems. Although similar, the Conners CBRS is unique in terms of some of the constructs included and in the comprehensiveness of the DSM Symptom scales. One of the goals behind the development of the Conners CBRS was to provide clinicians with a measure that is more diagnostically useful than similar rating scales. More research and clinical use will be necessary to determine whether this goal has been realized, and to determine whether the Conners CBRS contributes to developing appropriate interventions. The Conners CBRS appears to be a high-quality option when comprehensive information is needed about behavioral and psychological functioning from multiple informants. The low to moderate correlations across different informants suggest the importance of gathering ratings from multiple sources, as each will likely provide unique pieces of information about the child. REVIEWER'S REFERENCES Achenbach, T. M., & Rescorla, L. A. (2001). Manual for ASEBA School-Age Forms & Profiles. Burlington, VT: University of Vermont, Research Center for Children, Youth, & Families. Kovacs, M. (2003). Children's Depression Inventory technical manual update. Toronto, Ontario, Canada: Multi-Health Systems. Reynolds, C. R., & Kamphaus, R. W. (2004). Behavior Assessment System for Children-Second Edition manual. Circle Pines, MN: American Guidance Service. Review of the Conners Comprehensive Behavior Rating Scales by JOHN J.
VACCA, Assistant Professor of Early Childhood Education, St. Joseph's University, Philadelphia, PA: DESCRIPTION. The early childhood field is confronted now more than ever with a surge in populations of younger children with behavioral and emotional difficulties. Many of these children often go undiagnosed or are misdiagnosed, especially because the presenting behaviors are not completely understood within the context of the environments in which these children are raised. Furthermore, the cultural diversity of families and the ways rituals, routines, and practices are carried out in homes across the United States are factors that researchers continue to see as primary contributors to how well children cope and manage stress. Therefore, these factors need to be included in any measure of social-emotional functioning. "The Conners Comprehensive Behavior Rating Scales (Conners CBRS) is a comprehensive assessment tool, which assesses a wide range of behavioral, emotional, social, and academic concerns and disorders in children and adolescents" (manual, p. 1). The format of the tool provides a forum for input about a child's behavior from multiple people across multiple settings. Rating forms for teachers and parents are provided for children and adolescents (ages 6 to 18 years). Additionally, self-report forms are available for individuals from 8 to 18 years. A comprehensive manual, score sheets, record booklets, and a quick reference guide for interpretations and interventions are all provided to users. Detailed instructions for administration, scoring, and interpretation (including supporting information for making diagnoses/dual diagnoses and designing interventions) are provided to the user. The test author stipulates that the sole use of the scales to make a diagnosis or determine eligibility for specialized support is not only inappropriate but also unethical. His stipulation is supported not only by federal law but also by research focusing on best practices in the assessment and evaluation of children, adolescents, and adults across developmental levels and concerns. User qualifications allow for administration and scoring by individuals without specialized training; however, only those professionals with advanced levels of training in the mental health/psychiatric fields are considered qualified to interpret and report findings. DEVELOPMENT. The process for the development of the Conners CBRS was complex and involved three phases: initial planning, a pilot study, and a normative study. During the initial stage, an in-depth review of research, theories, legislative initiatives, and public policies was done. Focus groups were then formed, and information was gathered reflecting opinion from pediatric, education, and related professionals about issues involving youth and social-emotional functioning. Overall, the development of the Conners CBRS represented a multifaceted approach that occurred over the course of 4 years and involved more than 7,000 field-testing activities across regions and stratified demographics in the United States. Ratings from teachers, parents, and students were gathered over the course of the development of the instrument in multiple settings and contexts. Extensive factor analyses were implemented to render the current scale, and behavioral dimensions that reflected judgments about an individual's behavior were examined.
The section on development is exceptional and provides the user with an in-depth background not only on how the Conners CBRS was established but also on the critical integrity with which the instrument was developed. The development section for any assessment instrument should be addressed comprehensively so that effective appraisals can be completed by experts in the field, such as the appraisals published in the Mental Measurements Yearbook. This information is also important to address because it demonstrates the integrity of the instrument, the manner in which the factors it is purported to measure are addressed, and finally, the extent to which the targeted population was included in the inception and field-testing process. Many test authors fall short in discussing with users the process of development for the given instrument and instead stress its purpose and usefulness. The Conners CBRS represents a model for how assessment instruments of any kind should be organized, developed, and implemented. TECHNICAL. The technical aspects of the Conners CBRS are remarkable and a model for an instrument of this nature. Conners reports that over 6,000 assessments were collected and analyzed. Information from a majority of these assessments came from multiple informants. Specifically, for any given child for whom assessment data were collected, at least two different individuals provided ratings (e.g., teacher and parent). The normative sample was extensive (n = 3,400) and stratified to reflect the heterogeneity of the U.S. population. Within the norming sample, the test author indicates that 50 boys and 50 girls at each year of age from 6 to 18 years participated. A smaller subset of the overall normative population (n = 1,616) included individuals who received ratings from other similar measures. This process was used to evaluate the concurrent validity of the Conners CBRS and the ability of the data to support those collected in a typical psychiatric assessment battery. Extensive measures were undertaken to establish both the reliability and validity of the Conners CBRS. In terms of reliability, multiple measures were completed. Mean reliability coefficients for internal consistency, test-retest, and interrater analyses ranged from .65 (test-retest) to .90 (internal consistency). Generally, coefficients of .70 and above are acceptable. The user should recognize that a low number of items on any scale can contribute to lower coefficients. The advice in these cases is to weigh such results against the rest of the test's results. All Conners CBRS alpha coefficients that fall below .70 reflect this phenomenon and are therefore not clinically significant. The measure of internal consistency is an important variable of reliability to examine because it reflects how well all of the items on a given scale not only relate to one another but also uphold the construct being measured. Other variables that are equally important to evaluate are test-retest and interrater reliability, and the values reported for the Conners CBRS are well within the acceptable range and in some cases above the median level for test construction. This means that any user can expect consistent results across evaluations of individuals. Finally, because Conners recruited multiple persons for each evaluation of an individual, the reported values mirror what can reasonably be anticipated when groups evaluate students with the Conners CBRS.
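The point that scales with few items tend to show lower coefficients can be made concrete with the Spearman-Brown prophecy formula, which projects how reliability changes as a scale is lengthened or shortened. This is a standard psychometric result, not a formula given in the Conners manual; a minimal sketch:

def spearman_brown(reliability, length_factor):
    """Projected reliability when a scale is lengthened by length_factor
    (e.g., 2.0 doubles the item count, 0.5 halves it)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# A short scale with a coefficient of .65 projects to about .79 if its
# item count were doubled with comparable items:
print(round(spearman_brown(0.65, 2.0), 2))  # 0.79
# Conversely, halving a .90 scale projects to about .82:
print(round(spearman_brown(0.90, 0.5), 2))  # 0.82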
An important issue that needs to be mentioned is the remarkable consistency with which the Conners CBRS aligns with social-emotional disabilities as well as school-related difficulties. Coefficients are well above the acceptable levels. For example, alpha coefficients for the parent form were determined to be .95 for general Emotional Distress, .94 for Academic Difficulties, .93 for ADHD Inattentive, .90 for Oppositional Defiant Disorder, and .78 for Autistic Disorder. With measures of validity, the following domains were examined: Factorial, Across-Informant, Convergent/Divergent, and Discriminative. A total of six strongly loaded factors within the instrument were identified and then assessed: Emotional Distress, Aggressive Behavior, Academic Difficulties, Hyperactivity, Perfectionistic and Compulsive, and Social Problems. Factor analyses indicated a stronger model and fit for the parent scale of the Conners CBRS and a slightly lower level of fit for the student and teacher forms. These differences were not clinically significant, however, thereby supporting full use of the three scales. Values for across-informant correlations were low to moderate (.29 to .67). Although some coefficients for the Conners CBRS-P and Conners CBRS-T are quite low, the user must remember that raters' judgments about a given student's behavior can reasonably vary from one rater to another. This supports the critical nature of gathering assessment information from multiple sources. Coefficients for convergent/divergent validity were moderately strong when compared against the DSM-IV-TR. Finally, scores for discriminative validity (the ability to isolate those behaviors/items that are truly manifest in individuals with known diagnoses) indicated that the Conners CBRS reflected close to 80% of the diagnostic classifications published in the DSM-IV-TR. COMMENTARY. The work of Professor Conners on his latest scale represents a significant contribution to the field of education and developmental pediatrics. Given the limited resources available to educators and related professionals, the Conners CBRS provides a forum for multiple persons to collaborate on concerns about a given student. The strongly established psychometric properties of the tool clearly demonstrate the capability to assist professionals in determining diagnoses and etiologies of emotional difficulties, developing appropriate program goals, and identifying teaching and therapeutic interventions to support students in their efforts to cope with the environments in which they participate (including home, school, and beyond). SUMMARY. The Conners CBRS "is a comprehensive assessment tool, which assesses a wide range of behavioral, emotional, social, and academic concerns and disorders in children and adolescents" (manual, p. 1). The format of the measure provides a forum for input about a child's behavior from multiple people across multiple settings. Given the rise in populations of children and adolescents experiencing emotional distress, teachers, parents, professionals, and students themselves are in need of innovative ways to help support learning and development. The Conners CBRS represents a valuable contribution to the field and provides a critical piece to any team assessment of a student.
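For reference, coefficient alpha figures like those quoted in these reviews are computed from the item variances and the total-score variance of a respondents-by-items matrix. A minimal sketch with toy data (the ratings below are invented, not Conners items):

def cronbach_alpha(scores):
    """Coefficient alpha; scores is a list of respondents, each a list of
    item scores."""
    n_items = len(scores[0])
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

ratings = [  # five respondents x four 0-3 Likert items (toy data)
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 3, 2, 2],
    [3, 3, 3, 2],
    [1, 2, 1, 1],
]
print(round(cronbach_alpha(ratings), 2))  # 0.93 for these toy data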
Hamilton Depression Inventory By: Reynolds, William M., Kobak, Kenneth A, 19950101, Vol. 13 Mental Measurements Yearbook Review of the Hamilton Depression Inventory by EPHREM FERNANDEZ, Associate Professor of Clinical Psychology, Southern Methodist University, Dallas, TX: The Hamilton Depression Inventory (HDI) is a recent successor to the venerable Hamilton Depression Rating Scale (HDRS; Hamilton, 1960, 1967), a semistructured clinical interview. The adoption of a paper-and-pencil forced-choice format in the HDI has resulted in greater ease of administration and better standardization of scoring than in the HDRS. More substantively, the HDI has kept abreast of the latest revisions in DSM criteria for depression. Although the original 17 items of the HDRS are still embedded in the HDI, new items have also been included to expand the breadth of assessment of depression. A short form of the instrument (the HDI-SF) is also available but is recommended only for preliminary screening purposes. Rather than taking a taxonomic approach to assessing subtypes of depression, the HDI, as is repeatedly emphasized, aims to quantify the severity of depression. However, it must be borne in mind that the qualitative features of depression are not always separable from the magnitude of depression. The greater the intensity of subjective dysphoria, the more likely that somatic and behavioral symptoms will arise. To its merit, the HDI assesses more symptoms relevant to depression than does the Beck Depression Inventory (BDI; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961). One of these is hypersomnia, which is a conspicuous omission from the BDI. However, some of the phenomena assessed (e.g., indecisiveness, indigestion, and the host of sympathetic nervous system reactions) are hardly specific to depression. On the other hand, there is one item that, although not scored, explores the subject's level of insight into his/her condition; this is Item 16, which enquires about the subject's causal attribution of depression. What particularly sets the HDI apart from other inventories of the same genre is the use of multiple questions (probes) to assess individual features of depression. Weight loss, for instance, is assessed using three separate questions. Thus, there are 38 questions tapping into 23 different symptoms. Wherever applicable, symptoms are further assessed using multiple parameters of intensity, frequency, and duration/latency. For example, dysphoria is probed by five separate questions, two of which pertain to subjective intensity and estimated frequency, respectively; insomnia is probed by questions pertaining to the frequency and latency of sleep onset, frequency of sleep disruption, lag time before sleep resumption, and frequency and latency of premature sleep termination. All of this boosts confidence in symptom identification. There are no negatively keyed items. However, there are pairs of "opposite" items (e.g., hypersomnia versus insomnia) which, if endorsed as part of a response set, would raise questions about the validity of the data. Furthermore, some items, if answered in the negative, require the test-taker to skip the next question or next few questions; random or uncritical response patterns could thus be detected. Ascertaining the more pernicious problem of motivated responding, however, will require additional tests like the Marlowe-Crowne Scale of Social Desirability (Crowne & Marlowe, 1960).
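The validity logic described above, in which joint endorsement of opposite item pairs flags a suspect response set, is straightforward to operationalize. A minimal sketch; the item numbering follows the insomnia/hypersomnia pairing cited later in this entry (Items 4 and 18), while the rating values and threshold are assumptions for illustration:

# Flag protocols that strongly endorse both members of an "opposite" item
# pair (e.g., insomnia vs. hypersomnia). The threshold is illustrative.

OPPOSITE_PAIRS = [(4, 18)]  # Item 4 (insomnia) vs. Item 18 (hypersomnia)

def inconsistent_pairs(responses, pairs=OPPOSITE_PAIRS, threshold=2):
    """responses: dict mapping item number -> rating. Returns the pairs in
    which both opposite items were endorsed at or above the threshold."""
    return [(i, j) for i, j in pairs
            if responses.get(i, 0) >= threshold and responses.get(j, 0) >= threshold]

protocol = {4: 3, 18: 2}  # endorses both insomnia and hypersomnia
print(inconsistent_pairs(protocol))  # [(4, 18)] -> question protocol validity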
Factor analysis of the HDI has revealed a four-factor solution (five factors for the psychiatric sample) accounting for 55% of the total score variance, the bulk of this variance being explained by one factor representing depressed mood and demoralization. This is consistent with earlier results on the HDRS using a multidimensional normal item response theory model (Gibbons, Clark, & Kupfer, 1993). The lack of variance accounted for by vegetative and other symptoms of depression has raised some concern about the construct validity of the HDI. As with the HDRS, the inclusion of several somatic symptoms that are not definitive of depression may well leave the HDI vulnerable to the confounding effects of multimorbidity (Fleck, Poirier-Littre, Guelfi, Bourdel, & Loo, 1995; Linden, Borchelt, Barnow, & Geiselmann, 1995). Other psychometric properties of the HDI are good. Criterion-related validity has been demonstrated using the HDRS because a comprehensive clinical interview of depression is regarded as the gold standard in depression assessment. Given a score range of 0 to 73, a raw score of 19 maximizes the hit rate (98.2%), sensitivity (99.3%), and specificity (95.9%). The 1-week retest reliability is .95, and internal consistency is .89. Although these reliability coefficients are often regarded as impressive, Boyle (1985) points out that tests of depression should have situational sensitivity (and, therefore, moderate retest reliability) that can be ascertained by administering the test under known levels of depressive induction; he further reminds us of the Cattellian view that item homogeneity should not be excessive, preferably no more than .7, because high internal consistency may be an indication that the test is too narrow in scope. It should also be noted that because the HDI has multiple questions per symptom, it would be interesting to know the internal consistency of these "subtests." However, the authors of the HDI have indicated that their objective was not to develop subscales at the time of test development. In summary, the HDI is a multifaceted test that represents an advance upon its precursor, the HDRS, in terms of conceptualization and procedure. As far as psychometric properties are concerned, there is some concern about its construct validity due to the inclusion of several symptoms that are associated, but not defining, characteristics of depression. However, this is a problem that applies to several other tests of depression, too. Among the triumphs of the HDI over other competing measures are its use of multiple probes and parameters that allow greater confidence in the detection of symptoms and measurement of depressive severity. REVIEWER'S REFERENCES Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 24, 349-354. Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23, 56-62. Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571. Hamilton, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology, 6, 278-296. Boyle, G. J. (1985). Self-report measures of depression: Some psychometric considerations. British Journal of Clinical Psychology, 24, 45-59. Gibbons, R. D., Clark, D. C., & Kupfer, D. (1993). Exactly what does the Hamilton Depression Rating Scale measure?
Journal of Psychiatric Research, 27, 259-273. Fleck, M. P., Poirier-Littre, M. F., Guelfi, J. D., Bourdel, M. C., & Loo, H. (1995). Factorial structure of the 17-item Hamilton Depression Rating Scale. Acta Psychiatrica Scandinavica, 92, 168-172. Linden, M., Borchelt, M., Barnow, S., & Geiselmann, B. (1995). The impact of somatic morbidity on the Hamilton Depression Rating Scale in the very old. Acta Psychiatrica Scandinavica, 92, 150-154. Review of the Hamilton Depression Inventory by CARL ISENHART, Administrative Coordinator, Addictive Disorders Section, Department of Veterans Affairs Medical Center, Minneapolis, MN: The Hamilton Depression Inventory (HDI) is a self-report adaptation of the Hamilton Depression Rating Scale (HDRS; a 17-item structured interview). The HDI assesses the severity and range of depressive symptoms but is not designed to make DSM-IV depressive disorder diagnoses. The HDI self-report format was developed to reduce rater variance and eliminate the time and costs associated with training clinicians to complete the HDRS structured interview format. Also, contemporary depression concepts were incorporated and "fidelity" increased by assessing select depressive domains with multiple questions. Consequently, the HDI consists of 38 questions that assess 23 areas of functioning. Additional versions include the HDI-17, a 17-item scale that parallels the 17-item HDRS; the HDI-SF, a 9-item short form; and the HDI-Mel, a 9-item subscale that assesses melancholia. The HDI-17 and HDI-Mel are calculated from the HDI protocol. The HDI instruments were designed for clinical (inpatients and outpatients) and nonclinical (college students and job applicants) adults between the ages of 18 and 89. They require a fifth-grade reading level and should not be used with individuals with developmental disabilities. The HDI takes 10 to 15 minutes to complete and the HDI-SF takes 5 minutes or less. Available HDI and HDI-SF software administers and scores protocols and produces interpretive reports. The software and narrative reports are not addressed here. ADMINISTRATION AND SCORING. Patients are provided with the HDI test booklet and a carbonless two-page answer sheet. Their responses are transferred to the bottom sheet, which contains recommended cutoff scores and specific scoring instructions for the HDI, the HDI-17, and the HDI-Mel. Space is provided to record the patient's T score and percentile. This sheet also contains a section to record the patient's scores on DSM-IV depression items. In addition, seven "critical" items (e.g., suicidal ideation) are listed for which space is provided to assess each symptom's nature and intensity during a clinical interview. The scoring process is clearly described and the score sheet is well designed and easy to follow. The manual provides examples of scored protocols as guides. The HDI-SF was developed from items that best discriminated among depressed patients, patients with other psychiatric disorders, and a community sample, and that yielded high item-total-scale correlations. The HDI-SF assesses six of nine DSM-IV depression criteria. Interestingly, the three items not assessed were the "vegetative" symptoms: insomnia/hypersomnia, psychomotor agitation/retardation, and weight loss/appetite. Patients provide answers on a two-page carbonless sheet on which the nine items are printed. The answers are transferred to the bottom sheet, which contains scoring instructions, the recommended cutoff score, and space for the raw score, T score, and percentile.
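Cutoff recommendations such as the raw score of 19 cited in the preceding review are typically found by sweeping candidate cutoffs against a criterion diagnosis and retaining the cutoff that maximizes the overall hit rate. A minimal sketch with invented score distributions (the manual's actual samples and procedure are not reproduced here):

# Sweep candidate cutoffs to find the one maximizing the overall hit rate.
# The score lists below are invented for illustration.

def best_cutoff(depressed_scores, nondepressed_scores):
    n_total = len(depressed_scores) + len(nondepressed_scores)
    best = None
    for cut in sorted(set(depressed_scores + nondepressed_scores)):
        true_pos = sum(s >= cut for s in depressed_scores)
        true_neg = sum(s < cut for s in nondepressed_scores)
        hit_rate = (true_pos + true_neg) / n_total
        if best is None or hit_rate > best[1]:
            best = (cut, hit_rate)
    return best

depressed = [22, 31, 19, 27, 40, 25, 33]
nondepressed = [5, 12, 9, 16, 3, 18, 8]
print(best_cutoff(depressed, nondepressed))  # (19, 1.0) for these toy samples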
The authors stress reviewing the protocol for irregular scoring patterns (e.g., blank items and alternating responses) and examining item pairs with opposite content (e.g., Item 4 [insomnia] with Item 18 [hypersomnia]) for inconsistencies. One enhancement would be to add these contrasting item pairs to the score sheet to make this comparison a routine part of the scoring process (much like the "VRIN" scale on the MMPI-2). They recommend that protocols with five or more missing items be invalidated. Protocols with four or fewer blank items can be prorated to obtain an estimated full score. However, there were no reports of the accuracy of these estimates. The manual provides interpretative information and case examples. The authors recommend that inconsistencies be assessed in a clinical interview and stress that clinical reports and collateral information be included when making interpretations. NORMS. The authors provide demographic information (age, sex, and race) and HDI scores on 510 community adults, 98 college undergraduates, and 313 psychiatric outpatients evaluated before treatment. The authors reported some statistically significant HDI score differences across age and sex; however, these differences were not clinically significant. The norms for the HDI, HDI-17, and HDI-SF were generated from the community sample and reported for the total sample and by sex. Even though the sex differences were not clinically significant, the norms are reported by sex because of author-cited research documenting sex differences on self-report depression instruments. The community sample was also used to calculate T scores and percentiles for each of these scales, including the HDI-Mel, which are also reported by sex. RELIABILITY. The reliability of the HDI, HDI-17, HDI-SF, and HDI-Mel scales was assessed with internal consistency and test-retest correlations, item analysis, and calculation of standard errors of measurement. The internal consistency measures ranged from .805 to .936 in the community sample and from .724 to .905 for the psychiatric sample. One goal in the development of the HDI was to increase its fidelity over other instruments. These internal consistency measures suggest that this goal was met, but perhaps too well. They suggest a high level of redundancy, even for the short form (alpha = .919 to .930), and that fewer items likely could be used to assess the same domains. The test-retest reliability measures were assessed over a 1-week interval (mean = 6.2 days, range = 2 to 9 days) using 189 subjects from the community and psychiatric samples and ranged from .918 to .958. Corrected item-total scale correlations ranged from .23 to .86. The calculated standard errors of measurement were within 2 to 3 raw score points; these data allow the test user to develop confidence intervals for obtained scores. VALIDITY. The authors reported content, concurrent criterion-related, construct (convergent, divergent, and factorial), and contrasted-groups validity estimates, along with cutoff scores and measures of specificity, sensitivity, and diagnostic accuracy. Item review demonstrated that HDI content is inclusive of DSM-IV concepts of depression and supported the content validity. The criterion-related validity study compared the HDI with the HDRS in 403 subjects, community and psychiatric outpatients. The correlations between the HDRS and the four HDI scales, for the total sample and by sex, ranged from .905 to .951.
The authors also reported correlations ranging from .26 to .89 (with most correlations in the .70s) between HDI-17 and HDRS items. However, the reported high correlations would be expected given that the HDI was developed from the HDRS. Therefore, these concurrent validity results could be strengthened by using additional depression measures. HDI scores were correlated with the Beck Depression Inventory (BDI), the Beck Hopelessness Scale, the Beck Anxiety Inventory, the Adult Suicidal Ideation Questionnaire, the Rosenberg Self-Esteem Scale, and the Marlowe-Crowne Social Desirability Scale--Short Form to demonstrate convergent construct validity. The HDI scales demonstrated highly statistically significant correlations for the total sample and by sex with these other measures, especially the BDI (all rs >= .89). All correlations were positive except for the correlations with the self-esteem and social desirability measures. In addition, multiple regression analysis demonstrated that the HDI scales assess depression in particular and not emotional distress in general: the HDI, HDI-17, and HDI-Mel shared the most variance with the BDI and less (though still significant) variance with the other measures. Additional multiple regression analyses used the same independent and dependent variables except that the HDRS replaced the BDI, and similar results were found. Findings of statistically significant negative correlations between the HDI scales and social desirability were used to demonstrate discriminant validity. However, it is reasonable to assume that depressed individuals may be less inclined to present themselves in a socially desirable fashion and, because of associated low self-esteem, may be self-deprecatory. As a result, they may produce low social desirability scores. Consequently, social desirability may be a concept related to depression, and it would be expected that high depression would be associated with low social desirability. It would have been interesting to see the correlations between self-esteem and social desirability and to have included social desirability in the multiple regression. Clearly, other unrelated concepts need to be used to assess discriminant validity. Factor analytic study of the community sample demonstrated that the HDI's factors are consistent with the authors' model of depression: depressed mood, sleep difficulties, somatic-vegetative problems (e.g., weight loss), and agitation-disorientation. Similar findings were obtained using the psychiatric sample. Measures of sensitivity, specificity, positive predictive power, negative predictive power, and hit rate, and calculations of chi-square, phi, and kappa coefficients support the HDI's accuracy in identifying which of 140 patients met DSM-III-R diagnostic criteria for major depression. The authors provide a range of potential cutoff scores for each HDI scale. Most of the hit rates for the HDI scales were in the middle to upper .90s, all chi-squares were significant, and most of the phi and kappa coefficients were in the .80s and .90s. A contrasted-groups procedure found statistically significant differences in mean HDI scale scores among three groups: The depressed group consistently scored higher than the "other psychiatric disorder" group, which consistently scored higher than the community sample. This contrasted-groups procedure was repeated for individual HDI and HDI-SF items with almost identical results. SUMMARY. The HDI scales quickly and efficiently assess depressive symptomatology.
SUMMARY. The HDI scales quickly and efficiently assess depressive symptomatology. The administration and scoring procedures are straightforward and easy to follow. However, this reviewer is not convinced that increased fidelity (i.e., the use of multiple questions) justifies using this instrument over others. Redundancy was demonstrated by the high internal consistency measures and by the HDI-SF's psychometrics being comparable to those of the full scale. The scales have other strengths, however, that may justify their use over other instruments. The authors conscientiously used multiple procedures to assess and report reliability and validity data, although other measures of depression, and more divergent concepts than the ones reported in the manual, would add to the instrument's strength. Other recommendations for improvement include better justification of why some areas are assessed by multiple items and others are not, incorporation of the "validity check" of contrasting items, replication by researchers other than the authors, and clarification of the process by which the community sample was selected. Finally, the manual is readable and adequate for using the tests. However, in places it goes into more detail and justification than is needed, and it is repetitive. Even the authors seemed to notice the redundancy, as the phrases "as noted throughout this manual," "as noted earlier," and "as discussed earlier" occur frequently throughout the manual.
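The "validity check" of contrasting items recommended in this review is straightforward to build into a scoring routine. A minimal sketch, assuming hypothetical item pairings (only the insomnia/hypersomnia pair is named in the review) and a 0-3 severity scale:

```python
# Hypothetical contrasting item pairs (0-indexed positions); the HDI's
# actual pairings beyond the insomnia/hypersomnia example are not
# listed in the review, and the 0-3 scale is assumed for illustration.
CONTRAST_PAIRS = [(3, 17), (5, 11)]

def flag_inconsistencies(responses, pairs=CONTRAST_PAIRS, threshold=2):
    """Return pairs in which both opposite-content items were rated at
    or above `threshold`, a pattern worth probing in a clinical interview."""
    flags = []
    for i, j in pairs:
        if responses[i] is not None and responses[j] is not None:
            if responses[i] >= threshold and responses[j] >= threshold:
                flags.append((i + 1, j + 1))  # report 1-based item numbers
    return flags

responses = [1, 0, 2, 3, 1, 2, 0, 1, 2, 0, 1, 3, 0, 1, 2, 0, 1, 3, 0, 1, 2, 0, 1]
print(flag_inconsistencies(responses))  # [(4, 18), (6, 12)]
```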
Novaco Anger Scale and Provocation Inventory By: Novaco, Raymond W, 20030101, Vol. 17 Mental Measurements Yearbook

Review of the Novaco Anger Scale and Provocation Inventory by ALBERT BUGAJ, Professor of Psychology, University of Wisconsin-Marinette, Marinette, WI:

DESCRIPTION. The Novaco Anger Scale and Provocation Inventory (NAS-PI), designed for individual assessment, outcome evaluation, and research purposes, consists of the 60-item Novaco Anger Scale (NAS) and the 25-item Provocation Inventory (PI). The NAS, which focuses on an individual's experience of anger, yields an overall scale score and scores on Cognitive, Arousal, Behavioral, and Anger Regulation subscales. High NAS scores may indicate a need for clinical intervention, although they may also reflect an effort by the test-taker to "look tough." The Provocation Inventory, intended to assess the types of situations that lead to anger in five content areas such as "disrespectful treatment," yields an overall scale score. Results of the PI can elicit discussion of settings that provoke strong anger in a client; discussion of items with low scores can lead to an understanding of how the client uses effective coping skills. A trained technician can administer the NAS-PI in individual or group settings, although only individuals with clinical training in psychological testing should interpret the results. The NAS-PI is a paper-and-pencil test whose responses are hand-scored, although the test manual indicates a computerized version is available. A formula developed by Barrett (2001), provided in the test manual, allows the values of missing responses to be estimated when three or fewer items are left incomplete. The test manual also includes a method for checking for inconsistent response patterns.

DEVELOPMENT. A theory of anger devised by Novaco (1977) formed the basis of the preliminary set of 101 items for the NAS-PI. A sample of 171 undergraduate students responded to these items. Results of this test administration and interviews with 45 hospitalized patients led to the creation of a revised instrument containing 88 items. Two additional waves of testing produced the final instrument.

TECHNICAL. The standardization sample of the NAS-PI consisted of 1,546 individuals (ages 9 to 84) from nonclinical settings. The manual indicates slight underrepresentation in the sample of males, individuals of minority ethnic backgrounds, and those with lower levels of education. Statistical examination of the scores (Cohen's d) indicated that although scores of men and women were comparable, scores of younger test-takers (ages 9 to 18) and adults (19 and older) differed significantly; the manual thus provides separate norms for the two age groups. The manual notes that African Americans' scores were higher than the average of the standardization sample on some scales, a result also found in other research (cited in the test manual) with African American and Hispanic samples. Individuals with lower educational levels also obtained scores departing from the average. The test manual suggests further research is necessary to uncover why these groups depart from the norm. The NAS and PI exhibit high levels of internal consistency. The alpha coefficient for the NAS total score in the standardization sample was .94; subscale alphas ranged from .76 for the Anger Regulation subscale to a high of .89 for the Behavior subscale. Coefficient alpha for the PI total score was .95, and subscale values ranged from .73 to .84.
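Coefficient alpha values like these follow from the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores). A minimal sketch on made-up item responses (not NAS-PI data):

```python
import statistics

def cronbach_alpha(item_scores):
    """Coefficient alpha. `item_scores` holds one inner list per item,
    each containing every respondent's score on that item."""
    k = len(item_scores)
    n = len(item_scores[0])
    totals = [sum(item[p] for item in item_scores) for p in range(n)]
    item_vars = sum(statistics.pvariance(item) for item in item_scores)
    total_var = statistics.pvariance(totals)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Five hypothetical respondents answering four 3-point NAS-style items.
items = [
    [1, 2, 3, 2, 1],
    [1, 2, 3, 3, 1],
    [2, 2, 3, 2, 1],
    [1, 3, 3, 2, 1],
]
print(round(cronbach_alpha(items), 2))  # 0.93
```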
Median test-retest reliability over a 2-week period for a group of 27 individuals from the standardization sample was .78, ranging from .47 for the Cognitive subscale to .82 for the PI total score. The test author acknowledges the small size of the sample. Higher test-retest reliabilities resulted from studies (utilizing larger sample sizes) of hospitalized inpatients in California and Scotland, and of Canadian prison inmates. The test manual cites several studies that examined the concurrent validity of the NAS-PI using samples from clinical and correctional populations. For example, in a study involving 141 male and female psychiatric patients in California, the NAS total score was strongly correlated with the total score on the Buss-Durkee Hostility Inventory (r = .82), the Caprara Scales of Irritability (r = .78) and Rumination (r = .69), the Cook-Medley Hostility Scale (r = .68), and the STAXI Trait Anger Scale (r = .84). With regard to the PI, the test author concludes that although the PI total score is closely related to other measures of anger, specific content areas of the PI "seem to have some selective relationships with other measures" that are difficult to interpret (manual, p. 38), and calls for further research in this area. More problematically, therapist ratings of the severity of past and current offenses committed by a sample (n = 59) of juvenile delinquents in a residential treatment facility were for the most part unrelated to NAS-PI scores. However, the "Rated Anger Level" of 39 paroled sex offenders was related to NAS total scores (r = .42) and the Behavior subscale (r = .51). The Behavior subscale was also related to therapists' ratings of the parolees' offense history (r = .45), appropriateness for participation in anger management groups (r = .33), and current offense severity (r = .29). A number of intercorrelations within the NAS and PI subscales proved to be moderate to high in the standardization sample, with those between subscales of the NAS and PI generally lower than those within the two scales. Factor analysis of the NAS resulted in three factors, each consisting of items from at least two subscales. Factor analysis of the PI resulted in five factors. Another factor analysis (n = 1,101 civil commitment patients) utilized a different set of Cognitive items and did not include Regulation items. In one study of predictive validity, NAS (r = .46) and PI (r = .43) total scores and several subscales were found to be predictive of STAXI State Anger. In a retrospective analysis, the NAS total was predictive of hospitalized patients' number of convictions for violent crimes. In a study of 1,100 discharged patients, the NAS was predictive of violence during the first 20 weeks after discharge and at 1 year. Although several other studies are supportive of the NAS, homicidal patients obtained higher Irritability scores than the "normal" standardization subsample but reported higher Anger Regulation. Juvenile delinquents in the group reported poorer Anger Regulation but, problematically, less Anger Provocation. A number of groups (Homicide Perpetrators in Psychiatric Treatment, Incarcerated Sex Offenders, and Juvenile Delinquents in Residential Care) scored higher on the Marlowe-Crowne Social Desirability Scale than did other groups. Three studies reported in the test manual examined the NAS-PI as a measure of anger treatment outcomes, with positive results.
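Before the reviewer's commentary, one mechanical detail from the description above is worth making concrete: the handling of missing responses. Barrett's (2001) exact formula is not reproduced in the review, so the sketch below uses simple mean substitution, one common prorating approach, as a stand-in:

```python
def prorate_nas(responses, max_missing=3):
    """Estimate a NAS total when three or fewer of the 60 items are
    blank (None), following the rule described in the manual. Mean
    substitution is an assumption for illustration; the published
    Barrett (2001) formula may differ."""
    answered = [r for r in responses if r is not None]
    n_missing = len(responses) - len(answered)
    if n_missing > max_missing:
        return None  # too many blanks to estimate
    mean_item = sum(answered) / len(answered)
    return sum(answered) + mean_item * n_missing

protocol = [2] * 57 + [None] * 3  # 57 answered items, 3 blanks
print(prorate_nas(protocol))  # 114 + 3 * 2.0 = 120.0
```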
COMMENTARY. Numerous studies (only some of which are referred to in this review) attest to the concurrent validity, as well as the test-retest reliability, of the NAS-PI. However, as the test author intimates, the test is not without problems. One must question the suitability of the NAS-PI for use with minority populations, as data indicate African Americans and Hispanics score differently from the normative population. The same may be said for individuals with lower levels of education. Although the test author states that most people with a fourth-grade reading ability can read the test, there is no indication of an empirical examination of the readability of the items. One may thus question the use of the NAS-PI with younger or less educated populations. One must also wonder whether lower levels of education or reading ability contribute to the divergent scores of younger populations. On the basis of the factor analysis reported in the test manual, the question arises as to whether the subscale structure of the test is correct. One would expect items belonging to a particular subscale to load on the same factor. This was not the case in the reported factor analysis, where items from two and sometimes three subscales loaded on the same factor. Although this outcome does not detract from the potential worth of the overall scale scores, further examination of the subscale structure of the test is suggested. The most problematic issue concerning the NAS-PI is that of social desirability. The test manual concludes with a suggestion that the test administrator obtain an estimate of response bias (for example, through use of the Marlowe-Crowne Social Desirability Scale) when using the NAS-PI in forensic settings. The manual also states that very high scores might indicate an effort to "look bad" (p. 13) or "look tough" (p. 15). If this is the case, one may question why the test author did not devise scales measuring social desirability and the need to "look tough" during the inception of the instrument. None of this is to say the NAS-PI is not a useful test. A good deal of research has gone into determining the psychometric properties of the test with regard to its validity and reliability. A sound and elaborate theory, which should be used to further refine the test's factor structure, forms the basis of the NAS-PI.

SUMMARY. Used with proper caution, the NAS-PI should prove useful in assessing anger in clinical and forensic populations. The test possesses adequate reliability and concurrent and predictive validity. Caution must be taken, however, when the test is used with minority, younger, or less educated populations. Test users must also be wary of social desirability effects and of efforts by the test-taker to "look tough."

REVIEWER'S REFERENCES
Barrett, P. (2001). Prorating error in test scores. Retrieved August 11, 2006, from http://www.liv.ac.uk~pbarrett/statistics_corner.htm
Novaco, R. W. (1977). Stress inoculation: A cognitive therapy for anger and its application to a case of depression. Journal of Consulting and Clinical Psychology, 45, 600-608.

Review of the Novaco Anger Scale and Provocation Inventory by GEOFFREY L. THORPE, Professor of Psychology, University of Maine, Orono, ME:

DESCRIPTION. The Novaco Anger Scale and Provocation Inventory (NAS-PI) is a two-part self-report test consisting of 85 items. The NAS component assesses how an individual experiences anger, with questions like: "If someone bothers me, I react first and think later."
Respondents rate each of the 60 items of the NAS on a 3-point scale (1 = Never true, 2 = Sometimes true, 3 = Always true), producing four subscale scores (Cognitive, Arousal, Behavior, and Anger Regulation) and a total score. The PI component describes situations that may lead to anger, such as: "Being accused of something that you didn't do." Respondents rate each of the 25 items of the PI on a 4-point scale to indicate how angry the situation described would make them feel (1 = Not at all angry, 2 = A little angry, 3 = Fairly angry, 4 = Very angry), producing a single total score. The NAS-PI was designed "to assess anger as a problem of psychological functioning and physical health and to assess therapeutic change" (manual, p. 1). The test author cites a broad range of stress-related health problems and mental disorders in which anger is prominent, and argues that the assessment of anger disposition and its modification is an important task for many healthcare professionals. The materials received from the test publisher consist of a 64-page manual and a package of NAS-PI profile sheets and AutoScore forms. The profile sheets, printed separately for Adolescent (ages 9-18) and Adult (ages 19 and over) respondents, present T-scores and percentiles corresponding with the various subscale and total score ranges.

DEVELOPMENT. The NAS-PI components were developed separately. NAS items are clinically oriented and reflect the guiding theoretical orientation that anger comprises elements of cognition, arousal, and behavior, "linked by feedback and regulation mechanisms and embedded in an environmental context" (manual, p. 21). Items in the Cognitive domain represent the dimensions of justification, suspiciousness, rumination, and hostile attitude; in the Arousal domain, intensity, duration, somatic tension, and irritability; and in the Behavioral domain, impulsive reaction, verbal aggression, physical confrontation, and indirect expression. PI items were selected to assess the intensity of anger elicited by a variety of provocative situations in five content areas: disrespectful treatment, unfairness, frustration, annoying traits of others, and irritations. An initial set of 101 test items for the NAS-PI was pilot-tested with 171 undergraduate students, who also completed a battery of tests that included other anger inventories. The set of items was reduced to the 88 with the best psychometric properties that also represented the most realistic match with the experiences of state hospital patients with severe anger problems. The revised instrument was administered to 142 psychiatric inpatients, a subset of whom provided test-retest reliability data. Further refinements followed, and the final form of the NAS-PI was re-assessed with similar samples of inpatients. A set of 16 item pairs was identified to serve as a rough index of consistent responding, the criterion being that each pair selected showed an intercorrelation of at least .40 (these correlations range from .42 to .66).

TECHNICAL. The standardization sample consisted of 1,546 respondents, ranging in age from 9 to 84, from public schools, college classrooms, senior centers, religious organizations, and other community settings.
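As a concrete illustration of the response formats just described, a scoring routine might look like the sketch below. The item-to-subscale key is hypothetical (the published key ships with the test materials), so treat this as the shape of the computation rather than the actual scoring rules:

```python
# Hypothetical item-to-subscale assignment for illustration only; the
# actual NAS-PI scoring key is published in the test materials.
NAS_KEY = {
    "Cognitive":        range(0, 15),
    "Arousal":          range(15, 30),
    "Behavior":         range(30, 45),
    "Anger Regulation": range(45, 60),
}

def score_nas_pi(nas_responses, pi_responses):
    """Sum 60 NAS items (rated 1-3) into subscale raw scores and 25 PI
    items (rated 1-4) into a single total, per the formats above."""
    assert len(nas_responses) == 60 and len(pi_responses) == 25
    subscales = {name: sum(nas_responses[i] for i in idx)
                 for name, idx in NAS_KEY.items()}
    # For simplicity the NAS total here is the sum of all 60 items;
    # the published instrument's total may be composed differently.
    return subscales, sum(nas_responses), sum(pi_responses)

subscales, nas_total, pi_total = score_nas_pi([2] * 60, [3] * 25)
print(subscales, nas_total, pi_total)  # each subscale 30; totals 120 and 75
```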
In addition to a table of raw score means and standard deviations for the subscales and total scores in the entire standardization sample, the test author (Novaco) provides tabulations of the sample's demographic characteristics (gender, age, ethnic background, socioeconomic status, and geographic region), indicating a fairly close match with data from the U.S. Census of 2000. Novaco presents T-scores for males and females, for racial/ethnic subgroups, for nine age ranges, and for five educational levels in the normative group. He argues that the use of T-scores makes effect sizes immediately apparent, so that T = 56 in one subgroup in comparison with T = 50 in another reveals an effect size of .6 in standard deviation units. On that criterion, the observed statistically significant difference between males and females on the NAS Behavior subscale (the only scale to produce a significant sex difference) is not clinically meaningful, as the corresponding effect size is only .31. The internal consistency (alpha) estimates for the standardization sample were very high for the total scores: .94 for the NAS and .95 for the PI. Alpha coefficients for the NAS subscales ranged from .76 to .89. Similar results were obtained for juveniles and adults, and for psychiatric inpatients in California and Scotland. Test-retest reliability estimates in a subsample of the standardization sample were .76 for the NAS and .82 for the PI. These estimates were drawn from a very small subset of 27 respondents who were tested 2 weeks apart. More compelling are the test-retest correlations of .84 (NAS) and .86 (PI) in 126 California state hospital inpatients with a 2-week intertest interval. The construct validity of the NAS was assessed by obtaining its correlations with the Buss-Durkee Hostility Inventory, the STAXI Trait Anger Scale, and two other anger scales, producing coefficients ranging from .69 to .84 in a sample of 141 inpatients. Data from inpatients in a high-security forensic unit in Scotland showed a similar range of intercorrelations of the NAS-PI with other anger measures, but much lower correlations with the Beck Depression Inventory, attesting to the NAS-PI's discriminant validity. A study with 110 male inpatients in a forensic facility for the developmentally disabled compared NAS-PI scores with, among other measures, a ward behavior rating scale, yielding low correlations (e.g., .28 for the NAS total score and .34 for the NAS Cognitive subscale). However, it seems likely that these correlation coefficients were limited by a truncated range of scores produced by the respondents in that setting. The NAS-PI correlates modestly with STAXI State Anger scores obtained 2 months later, indicating a level of predictive validity for Novaco's instrument. The manual provides detailed information on factor analytic studies of the NAS-PI and on its sensitivity to anger treatment outcome. The test-retest reliability, parallel-form reliability, concurrent validity, and discriminant validity of the NAS were found to be satisfactory in 204 male offenders in Canada (Mills, Kroner, & Forth, 1998). This study and many like it exemplify the substantial professional literature that has developed in recent years from Novaco's research on the assessment of anger.

COMMENTARY. The NAS-PI is a two-part self-report inventory of anger and its components that was designed for practicality and ease of use by respondents and examiners.
The items were written at a fourth-grade reading level, and the test takes about 25 minutes to complete. Hand-scoring is straightforward and convenient. The test was developed and standardized with community, clinical, and forensic samples, and is offered as an assessment instrument for research, individual assessment, and outcome measurement. The NAS-PI was originally standardized with about 1,500 respondents who were broadly representative of the U.S. population. Many further studies have provided normative data from clinical and forensic populations in diverse geographic regions. The test manual's extensive tabulation of norms and detailed appraisal of the NAS-PI's psychometric properties confirm that the instrument is psychometrically sound and that its author has reached his intended goals. Readers familiar with Novaco's early work will recall the original Novaco Anger Scale from the 1970s, which consisted of 80 anger provocation items similar to those in the current PI. The early scale was used in research by Novaco and others and was not commercially available. The development of the new NAS to reflect cognitive, emotional, behavioral, and self-regulatory dimensions of anger drawn from theory was a constructive move that has helped to advance the field. Retaining a list of potentially anger-arousing situations in the PI allows clinicians and researchers to continue to use assessment methodology similar to that of Novaco's original research on anger control.

SUMMARY. The NAS-PI can be recommended as a convenient self-report assessment of anger and its principal dimensions to be used in community, clinical, and forensic settings, both as a snapshot index of current anger levels and as a barometer of progress and change. The instrument is extensively norm-referenced and has satisfactory psychometric properties of internal consistency, test-retest reliability, and concurrent and predictive validity. Despite the strong internal consistency of the scales, the interitem correlations and factor-analytic work described in the manual appear to indicate that the NAS-PI scale items are not interchangeable and make different contributions to assessing the measured constructs. It would be interesting to see future researchers use the methodology of modern test theory to ascertain the specific test items that are most informative in distinguishing respondents with varying levels of anger, and to aid in scaling items for the level of anger they typically represent.

REVIEWER'S REFERENCE
Mills, J. F., Kroner, D. G., & Forth, A. E. (1998). Novaco Anger Scale: Reliability and validity within an adult criminal sample. Assessment, 5, 237-248.
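Novaco's argument about T-scores (noted in the first review above) is simple arithmetic: because the T metric fixes the standard deviation at 10, dividing a T-score gap by 10 gives Cohen's d directly. A one-liner makes the two examples from the reviews concrete (the 3.1-point gap is inferred from the reported d of .31, not stated in the manual):

```python
def d_from_t(t_group1, t_group2):
    """Cohen's d implied by two mean T-scores (T metric: mean 50, SD 10)."""
    return (t_group1 - t_group2) / 10

print(round(d_from_t(56.0, 50.0), 2))  # 0.6, the manual's illustration
print(round(d_from_t(53.1, 50.0), 2))  # 0.31, the male-female Behavior
                                       # subscale gap, T difference assumed
```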
Piers-Harris Children's Self-Concept Scale, Second Edition (The Way I Feel About Myself) By: Piers, Ellen V., Herzberg, David S., Harris, Dale B, 20020101, Vol. 16 Mental Measurements Yearbook

Review of the Piers-Harris Children's Self-Concept Scale, Second Edition (The Way I Feel About Myself) by MARY LOU KELLEY, Professor of Psychology, Louisiana State University, Baton Rouge, LA:

DESCRIPTION. The Piers-Harris Children's Self-Concept Scale, Second Edition (Piers-Harris 2) is a 60-item questionnaire, subtitled The Way I Feel About Myself. The Piers-Harris 2 is a self-report measure for children and adolescents ages 7-18. Respondents indicate whether each item is true or not true of them. The items are very easy to read and interpret. English and Spanish versions are available. The Piers-Harris 2 is a revision of the previous, identically titled questionnaire and contains the same Self-Concept and Validity scales. Specifically, the scales consist of a total (TOT) score, six domain scales, and two validity scales. The domain scales assess self-concept across a variety of areas, including perceptions about school and intellectual functioning, appearance, and social acceptance. The self-concept scales are scored so that higher scores indicate more positive self-evaluation. The validity scales assess response bias, inconsistent responding, and exaggerated responding. Several methods of scoring are available in this revised edition: the AutoScore™ form, which is scored manually with the use of carbons; mail/fax-in forms; and a PC program that generates a report based on online or offline administration. Raw scores are converted to standard scores (normalized T-scores and percentile ranks). Scoring using the AutoScore™ form is very straightforward and easily accomplished by someone with minimal training. The authors warn, however, that interpretation and use of the scores should only be conducted by someone trained in psychological assessment. Norms are based on the total standardization sample and are not broken down according to respondent age, gender, or ethnicity. The authors state that ethnicity did not appear to be a moderating variable in any of their analyses. The technical manual is well written and clearly describes the measure and its development, scoring procedures, and psychometric studies. Instructions for calculating scores for the validity scales and high/low domain scores were quite clear, and examples were provided. The authors presented several case examples illustrating how the scores were useful for screening and as a part of a psychological evaluation. The authors describe precisely how the current instrument differs from the previous version, which is important for those who use the instrument in research contexts. The various chapters are short yet adequately detailed. The technical manual describes various approaches to reducing respondent bias and provides suggested oral instructions as well as cautions against use of the word "test" when describing the scale to children.

DEVELOPMENT. The Piers-Harris 2 is a revision of a measure originally developed in the early 1960s and revised in 1984. The current version is an improvement over earlier versions in three important ways. First, the items were updated and trimmed from 80 on the earlier version to 60. An item was eliminated if it contributed only to the total and not to a domain score, contained outdated or gender-specific content, or required additional information to answer.
Thus, the current version is more contemporary, less ambiguous, and less time-consuming to complete. Second, and perhaps of most importance, the scale has been standardized on a national sample, in contrast to the homogeneous, rural sample collected in the 1960s. The current sample consists of 1,387 students aged 7 to 18 recruited from school systems across the nation. The use of a heterogeneous sample addresses one of the most significant limitations cited in a previous MMY review (Epstein, 1985). The third change in the Piers-Harris 2 cited by the authors is the availability of computer scoring.

TECHNICAL. Descriptions of the restandardization were clear and adequately detailed. The authors assessed whether age, gender, socioeconomic status, and ethnicity produced different norms. In almost all cases, significant differences in scores were not obtained for any of these variables, although the authors indicated that further research on potential moderating effects of demographics should be pursued. With regard to reliability, internal consistency estimates for the total and domain scores were adequate, with almost all Cronbach alphas above .70. Test-retest reliability studies were not conducted with the new standardization data, but results of studies using the earlier scale were acceptable. The validity of scores from the Piers-Harris 2 was evaluated in a number of ways. The authors examined content validity using judges' ratings as to whether the deleted items continue to be represented by the remaining items of each domain. The rating process was not described, and thus exactly what was judged is uncertain. Construct validity was assessed via factor analysis and generally supported the rationally generated domains. However, findings across studies are somewhat inconsistent according to the authors. The convergent validity of the Piers-Harris was assessed by comparing self-concept scores with various other psychological measures. Generally, studies show that positive self-concept is inversely related to measures of psychological problems. Overall, studies support the validity of the instrument, although the majority were conducted with the original version of the scale. The authors aptly note that additional studies on the reliability and validity of the Piers-Harris 2 are needed.

COMMENTARY. The Piers-Harris is one of the best, if not the best, questionnaires of its type, given the long history of research findings supporting the reliability and validity of the scale. It is very easy for children to use and probably is best used as a screening instrument, as illustrated in the case examples described in the technical manual.

SUMMARY. The Piers-Harris 2 is a revision of a previous scale that is well known and well researched. The revised scale is an improvement over the previous scale in several ways. The scale was renormed using a relatively large, demographically and geographically diverse sample. The scale was shortened, eliminating ambiguous, outdated, and psychometrically limited items. Finally, the scale can be computer scored. At the same time, the scale retains many of the positive features of the original. The items are straightforward and require a yes or no endorsement. The questionnaire is easy to score, and the manual is clear and straightforward. Numerous psychometric studies support the reliability and validity of the earlier version, but additional studies are needed using the revised scale.
REVIEWER'S REFERENCE
Epstein, J. H. (1985). [Review of the Piers-Harris Children's Self-Concept Scale (The Way I Feel About Myself).] In J. V. Mitchell, Jr. (Ed.), The ninth mental measurements yearbook (pp. 1167-1169). Lincoln, NE: Buros Institute of Mental Measurements.

Review of the Piers-Harris Children's Self-Concept Scale, Second Edition (The Way I Feel About Myself) by DONALD P. OSWALD, Associate Professor of Psychiatry and Psychology, Virginia Commonwealth University, Richmond, VA:

DESCRIPTION. The Piers-Harris Children's Self-Concept Scale, Second Edition (The Way I Feel About Myself) (Piers-Harris 2) is a 60-item self-report scale designed for the assessment of self-concept in children and adolescents. The scale is designed for children ages 7 to 18 who are able to read and comprehend at a second-grade level. The scale can be completed by most respondents in 10-15 minutes. The instrument may be administered in an individual or group format and can be administered and scored by teachers or paraprofessionals; however, interpretation of results should be provided by "a professional with appropriate training in psychological assessment" (manual, p. 4). The Piers-Harris 2 is a revision of The Piers-Harris Children's Self-Concept Scale, a self-concept scale originally constructed in the 1960s. The instrument has been shortened from the original 80 items to reduce administration time and to eliminate items that included outdated language or were of less value psychometrically. In most respects, however, the Piers-Harris 2 preserves the character and content of the original version. Like the original, the instrument yields a total score and six domain scale scores, originally constructed on the basis of a factor analytic study (Piers, 1963). The items included in the domain scales are virtually identical; two domain scales lost two items from the original instrument and one domain scale lost one item. One original domain scale, "Anxiety," has been renamed "Freedom from Anxiety" to more accurately reflect the direction of scores, and the scale originally called "Behavior" has been given the more descriptive name "Behavioral Adjustment." The remaining domain scale names are unchanged (i.e., Intellectual and School Status, Physical Appearance and Attributes, Popularity, and Happiness and Satisfaction). The second edition of the instrument adds two validity scales (Inconsistent Responding and Response Bias) that are designed to detect random responding and excessive yea-saying or nay-saying. The instrument may be administered using several alternative formats depending on the preferred scoring method. Scoring alternatives include an AutoScore form that can be scored manually by the examiner, mail-in response forms that are computer scored by the publisher, and a personal computer program that generates a report based on responses keyed in by the respondent or by the examiner from a paper record form. The Piers-Harris 2 includes a disk with a computerized scoring program that allows the user to complete and score two administrations of the instrument. The program requires activation via phone or the internet. The program is easily installed and produces a score report, including a table and graph of total and domain scores, and a narrative describing the scores. The Total score and all domain scale scores are calculated as T-scores (mean = 50; SD = 10), and the "normal" range is considered to be between 40 and 60.
The profile sheet for the AutoScore form also provides percentile equivalents for all T-scores. The manual provides a thorough discussion of the interpretation of validity and self-concept scores as well as a summary of some of the relevant literature on the psychometrics of the original Piers-Harris. An appendix provides an extensive bibliography of studies that involved the original instrument.

TECHNICAL. The Piers-Harris 2 is essentially a restandardization of the instrument based on a national sample of 1,387 students recruited from school districts across the U.S. The sample was intended to match U.S. Census data, but Hispanic/Latino students are substantially underrepresented, as are students from the West region of the country. The sample is relatively evenly distributed across the age span of 7-18 years. Some new psychometric studies accompany this revision. Internal consistency data for the total scale and the domain scales are provided at each age level; alpha coefficients range from .60 to .93. Interscale correlations among the domain scales are modest, ranging from .30 to .69; some of the higher correlations between domains may be attributed in part to shared items. As indications of convergent validity, the authors investigated relationships between Piers-Harris 2 scores and scores on measures of anger, aggressive attitudes, symptoms of posttraumatic stress disorder, and thoughts and attitudes related to obesity. However, no new test-retest reliability studies or concurrent validity studies involving other self-concept instruments are presented in the manual. The manual includes a thoughtful discussion of potential moderator variables that might affect scores on a test independently of the target construct, including age, sex, ethnicity, SES, geographic region, and intelligence/academic achievement. For each of these potential moderator variables (except intelligence/academic achievement, for which no data were available), the authors provide evidence of minimal moderating effects. As a consequence, all children's scores are derived from the same conversion table, regardless of age, sex, or ethnicity.

COMMENTARY. One criterion for judging a revision of an existing instrument is the extent to which the revision addresses the weaknesses that have been identified in the literature. Criticism of the original Piers-Harris over the years has included considerable emphasis on the limited nature of the standardization sample. This weakness has been substantially addressed in the Piers-Harris 2. The present standardization sample is drawn from a far more representative pool, although the underrepresentation of Hispanic/Latino students is unfortunate in view of the recent growth of the Latino population in the U.S. and the anticipated continued expansion of this minority group. The original Piers-Harris was often criticized for the fact that the domain scales are not unique (i.e., many items appear in more than one domain scale). The authors indicate that removing the overlapping items substantially lowered the internal consistency of the domain scales and, as a result, they chose to retain them. The original instrument has also been criticized for inadequate test-retest reliability (Keith & Bracken, 1996); the failure to undertake new studies of test-retest reliability constitutes a weakness of the current revision. Another recurring criticism of the original Piers-Harris has been that the instrument is fundamentally unidimensional.
The current domain scales were derived from a factor analysis of the original Piers-Harris items; however, these domain scales have not been consistently replicated in succeeding factor analytic studies. Indeed, the literature is inconsistent regarding the factor structure of the scale, to the point that the validity of the domain scales is called into question. The authors of the Piers-Harris 2 undertook a new factor analysis of the present scale and reported finding six factors that support the six domain scales. However, the correspondence between factors and domains is far from perfect, and for two of the domain scales (Physical Appearance and Attributes; and Happiness and Satisfaction), fewer than half of the items load on a single factor.

SUMMARY. The Piers-Harris 2 represents an incremental improvement on the original instrument by virtue of a more representative standardization sample and reduced administration time. However, the authors of the Piers-Harris 2 appear to have placed a high value on preserving continuity with the original instrument (e.g., keeping the domain scales virtually intact), and such continuity comes at a substantial price. The theoretical and empirical basis of the instrument has been subjected to considerable criticism over the past three decades, and these critiques are not effectively addressed in the Piers-Harris 2. Much scholarly work has been done in the area of children's self-concept, and there is now considerably more competition in the field of self-concept assessment. Given the alternatives available, the instrument may well be supplanted by another tool with a stronger theoretical and empirical basis.

REVIEWER'S REFERENCES
Keith, L. K., & Bracken, B. A. (1996). Self-concept instrumentation: A historical and evaluative review. In B. A. Bracken (Ed.), Handbook of self-concept (pp. 91-170). New York: John Wiley & Sons, Inc.
Piers, E. V. (1963). [Factor analysis for the Piers-Harris Children's Self-Concept Scale]. Unpublished raw data.
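Both reviews lean on the T-score conventions (mean 50, SD 10, a "normal" range of 40-60) and the percentile equivalents printed on the profile sheets. Assuming normalized T-scores, the mapping to percentiles is just the normal cumulative distribution, as this sketch shows:

```python
from statistics import NormalDist

def t_to_percentile(t):
    """Percentile rank implied by a normalized T-score (mean 50, SD 10)."""
    return 100 * NormalDist(mu=50, sigma=10).cdf(t)

for t in (40, 50, 60):
    print(t, round(t_to_percentile(t)))  # 40 -> 16, 50 -> 50, 60 -> 84
```

Note how the "normal" range of 40-60 corresponds to roughly the middle two-thirds of the distribution (the 16th through 84th percentiles).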
Reynolds Adolescent Depression Scale-2nd Edition: Short Form By: Reynolds, William M, 20080101, Vol. 18 Mental Measurements Yearbook

Review of the Reynolds Adolescent Depression Scale-2nd Edition: Short Form by MICHAEL G. KAVAN, Professor of Family Medicine and Professor of Psychiatry, Associate Dean for Student Affairs, Creighton University School of Medicine, Omaha, NE:

DESCRIPTION. The Reynolds Adolescent Depression Scale-2nd Edition: Short Form (RADS-2:SF) is a 10-item self-report instrument "designed as a brief screening measure for the assessment of depression in adolescents" (professional manual, p. 1) between the ages of 11 and 20 years. The RADS-2:SF is meant to be used by clinical psychologists, school psychologists, counselors, social workers, psychiatrists, other mental health professionals, and researchers who have limited time available for the assessment of depression, and its results provide mental health professionals with information for making decisions about a patient's affective status. The test author is careful to point out that the RADS-2:SF "does not provide a formal diagnosis of depression" (professional manual, p. 3), but instead provides an indication of the clinical severity of depression in adolescents. In fact, the RADS-2:SF provides only a Total score, which acts as a global assessment of the severity of depressive symptomatology in adolescents. Additional interviewing with the adolescent, and possibly parents and others, is necessary in order to make a formal diagnosis. The RADS-2:SF may "be administered individually or in a group setting" (professional manual, p. 3). It is typically self-administered but may be read aloud to adolescents with reading problems or developmental delays. The RADS-2:SF takes approximately 2 to 3 minutes to complete. The test items have a Flesch-Kincaid grade level of 1.2 and a Flesch Reading Ease score of 95.6. Based on the Simple Measure of Gobbledygook (SMOG) Index, items were written at the 3.0 school grade level; the Coleman-Liau Index placed the items at the 2.29 grade level and the Gunning Fog Index at the 2.75 grade level. The test author suggests that any mental health professional, researcher, or other professional such as a physician or nurse who has been trained to administer self-report measures to individuals and/or groups may administer the RADS-2:SF after becoming familiar with the manual. Test materials include the RADS-2:SF manual and a two-part carbonless test booklet. Examinees are asked to circle the number under Almost never, Hardly ever, Sometimes, or Most of the time that "best describes how you really feel" (professional manual, p. 23) for each of the 10 items. No alternative language versions of the RADS-2:SF are available, and investigations of performance in adolescents whose first or native language is not English have not been conducted by the test author or others. Scoring of the RADS-2:SF is easily accomplished by separating the answer sheet from the scoring sheet. Scores are tabulated and placed into a Score Summary Table that includes Total raw score, T score, and percentile rank information. Scoring takes approximately 1 minute. Interpretation is based on cutoff scores and percentile ranks provided within the manual. These are meant to "identify individuals who demonstrate a clinical level of depression symptom severity" (professional manual, p. 33) and who warrant further evaluation.
The test author suggests that positive scores also occur in adolescents who manifest "other forms of psychopathology or a generalized level of psychological distress" (professional manual, p. 34).

DEVELOPMENT. The RADS-2:SF is an abbreviated version of the Reynolds Adolescent Depression Scale-2nd Edition (RADS-2; 16:211), which was published in 2002. The original Reynolds Adolescent Depression Scale (RADS) was published in 1987. The 10 items of the RADS-2:SF were drawn from the RADS-2. Items for the RADS were selected based on the DSM-III diagnostic criteria for major depressive disorder and dysthymic disorder as well as symptoms specified by the Research Diagnostic Criteria as assessed by the Schedule for Affective Disorders and Schizophrenia. In developing the RADS-2:SF, the six critical items from the RADS-2 were selected for inclusion "based upon their excellent ability to discriminate between clinically depressed and nondepressed adolescents" (professional manual, p. 5). The remaining items were included because they were specific to dysphoric mood, loss of interest, and irritability/anger. Other criteria for the selection of these items included their homogeneity with the RADS-2 Total scale, based on item-total correlations and the coefficient alpha reliability of the Total scale. A factor analysis resulted in a single factor. Nine of the 10 RADS-2:SF items showed "strong" item-total scale correlations; the remaining item was included out of a desire to retain a reverse-scored item on the instrument.

TECHNICAL. The school standardization sample included 9,052 adolescents from seven states and one Canadian province. Ages ranged from 11 to 20 years (mean = 14.99), and adolescents were grouped into early adolescence (ages 11-13), middle adolescence (ages 14-16), and late adolescence (ages 17-20) groups. There were more females than males, and 23% of the sample reported membership in a non-Caucasian ethnic group. A sample of 3,300 adolescents was drawn from this group and served as a normative comparison sample reflecting the 2000 U.S. Census proportions on ethnicity. This sample had equal numbers of males and females and equal numbers of adolescents in each previously mentioned age grouping. Two clinical samples included 101 adolescents with one or more DSM-III-R or DSM-IV Axis I diagnoses, 27% of which were for major depressive disorder, and 70 adolescents identified by a school-based screening for depression. Both clinical samples were predominantly Caucasian. Raw score means and standard deviations by gender are provided for the total standardization and school samples, and T scores are provided for the total standardization sample by age group. T-score means and standard deviations are also provided for the total standardization sample as well as by gender and by ethnic group. The test author recommends that the total standardization sample be used as the primary normative group for identifying adolescents with depression. Internal consistencies (coefficient alpha) and standard errors of measurement were determined for various adolescent samples. Internal consistency was .86 for the total school sample (n = 9,052) and .84 for the total standardization sample (n = 3,300). For the clinical sample of 101 adolescents with DSM-III-R or DSM-IV diagnoses, internal consistency was .90, with item-total scale correlations ranging from .54 to .70.
Internal consistency for "a sample of 50 adolescents who met criteria for special class placement" (professional manual, p. 43) due to intellectual disability was .85. Item-total scale correlation coefficients for the total standardization sample ranged from .28 (Reduced Affect) to .67 (Helplessness). Item-total scale correlations for males and females within the total standardization sample ranged from .26 (Reduced Affect for the male sample) to .69 (Helplessness for the female sample). RADS-2:SF test-retest reliabilities over an "approximately" 2-week time period were computed for several samples. A test-retest reliability coefficient of .81 was found on a sample of 1,765 adolescents recruited from the total school sample, .82 for a subsample of 676 adolescents from the total standardization sample, and .82 for a clinical sample of 70 adolescents. The test author notes that very little difference in mean T scores (range from 1.36 to 1.57) was noted over the two testing periods, providing further evidence for high test-retest reliability. In terms of content validity, the test author stresses the importance of symptom sampling and the degree to which each item relates to the overall test. In regards to symptom sampling, the test author indicates that symptom content is consistent with several mental health taxonomies including the DSMIV. However, a review of the RADS-2:SF items and DSM-IV-TR (APA, 2000) diagnostic criteria shows that the RADS-2:SF items cover only four of the nine diagnostic criteria for major depressive disorder. The median item-total correlation for the RADS-2:SF is .55 with a range from .28 (Reduced Affect) to .67 (Helplessness) suggesting that, for the most part, individual items do contribute to the RADS-2:SF total score. The test author also reports strong score equivalence between the RADS-2:SF and the RADS-2 based on average item means and mean score differences. In support of criterion-related validity, the RADS-2:SF was shown to correlate .80 with the Hamilton Depression Rating Scale (HDRS) in a sample of 485 junior high and senior high school students. It should be noted that the HDRS was once thought to be the "gold standard" for the assessment of depression, but is now considered to be significantly flawed (Bagby, Ryder, Schuller, & Marshall, 2004). The RADS2:SF also has been shown to correlate with various other mental health instruments including the Beck Depression Inventory (.80 in a clinical sample of 70 adolescents) and with the Major Depression scale and the Dysthymic scale from the Adolescent Psychopathology Scale (.73 for both scales in the school validity sample of 485 students with a mean age of 14.56 years). The RADS-2:SF also correlated .94 (corrected correlation coefficient of .89) with the RADS-2 in the total school sample of 9,052 adolescents, .96 (corrected correlation coefficient of .91) in the school validity sample of 236 students in Grades 6, 7, and 8 (mean age = 12.18 years) and .96 (corrected correlation coefficient of .92) in the clinical sample of 70 adolescents with mean age of 17.48 years. An examination of other studies regarding the convergent validity of the RADS-2:SF demonstrated that it correlated .57 with the Hamilton Anxiety Scale, .73 with the Revised Children's Manifest Anxiety Scale, .67 with the Suicidal Ideation Questionnaire, and .64 with the Suicidal Behaviors Interview in a school sample of 485 students between the ages of 12 and 19. 
In a study with 236 middle-school students, the RADS-2:SF correlated .58 with the Beck Hopelessness Scale and closely mirrored the correlations that the RADS-2 had with several other mental health measures. With regard to discriminant validity, the RADS-2:SF correlated -.69 with the Rosenberg Self-Esteem Scale, -.62 with the Academic Self-Concept Scale-High School Version, and -.38 with the Marlowe-Crowne Social Desirability Scale-Short Form in the school-based validity sample (n = 485). Contrasted groups validity was demonstrated by significant and predicted score differences between "a sample of 27 adolescents drawn from the total school sample who were matched by age and gender with a clinical sample of 27 adolescents" (professional manual, p. 55) with major depressive disorder. Additional evidence for the validity of the RADS-2:SF is based on a principal axis factor analysis that resulted in a one-factor solution, consistent with the test author's intent for the RADS-2:SF to be a one-dimensional screening measure for depression in adolescents. Factor loadings ranged from .38 to .75 in the total school sample (n = 9,052). A RADS-2:SF cutoff score of 61 (T score), which corresponds to a raw score of 26, is meant to "identify individuals who demonstrate a clinical level of depression symptom severity" (professional manual, p. 33) and who warrant further evaluation. The cutoff score was empirically determined by examining the ability of various RADS-2:SF scores to differentiate between a sample of 27 adolescents with major depressive disorder and gender- and age-matched adolescents drawn from the total standardization sample. A T score of 61 corresponds to a percentile rank of 86, which is slightly more than one standard deviation above the mean for the study sample. The cutoff score has a sensitivity of 96.3, a specificity of 74.1, and a hit rate of 85.2. Cutoff scores are also provided to assist in determining the severity of depressive symptomatology; however, these were based only on percentile rank ranges.

COMMENTARY. The RADS-2:SF was developed as a brief screening measure to assist in identifying adolescents who are depressed and may need treatment. The test author designed the measure for use by both researchers and clinicians in school, clinical, and other settings who have limited time available for such screening. It is based on its predecessors (the RADS and the RADS-2) and, thus, has a strong psychometric lineage. The test author states that the "RADS-2:SF demonstrates high levels of reliability, validity, and clinical utility" (professional manual, p. ix). In general, studies do support the contention that the RADS-2:SF has strong reliability. Although item-total correlations vary from the low to moderately high range, internal consistencies with various school and clinical samples are high. Test-retest reliability is also appropriate for an instrument measuring depression, and validity appears to be generally strong as well. Factor analyses support a one-dimensional instrument for measuring depression. Convergent and discriminant validity also appear appropriate, with the RADS-2:SF demonstrating positive correlations with measures of depression, anxiety, suicidal ideation/behavior, and hopelessness, and negative correlations with measures of self-esteem and self-concept.
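The sensitivity, specificity, and hit-rate values quoted for the cutoff follow directly from the two-by-two decision table of the 27-and-27 matched-sample study. A sketch of the computation, assuming each case is represented by a T score and a known diagnostic status:

```python
def cutoff_classification_stats(t_scores, has_mdd, cutoff=61):
    """Classification accuracy of a screening cutoff: sensitivity is
    the percentage of depressed cases flagged, specificity the
    percentage of nondepressed cases passed, hit rate the overall
    percentage classified correctly."""
    tp = sum(t >= cutoff and d for t, d in zip(t_scores, has_mdd))
    fn = sum(t < cutoff and d for t, d in zip(t_scores, has_mdd))
    tn = sum(t < cutoff and not d for t, d in zip(t_scores, has_mdd))
    fp = sum(t >= cutoff and not d for t, d in zip(t_scores, has_mdd))
    return {"sensitivity": 100 * tp / (tp + fn),
            "specificity": 100 * tn / (tn + fp),
            "hit_rate": 100 * (tp + tn) / len(t_scores)}
```

The reported values are mutually consistent with these definitions: flagging 26 of 27 depressed adolescents gives a sensitivity of 96.3, passing 20 of 27 controls gives a specificity of 74.1, and 46 of 54 correct overall gives a hit rate of 85.2.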
Although the test author reports that item symptom content is consistent with several mental health taxonomies, including the DSM-IV, an examination of RADS-2:SF items and DSM-IV-TR diagnostic criteria shows that the RADS-2:SF covers only four of the nine diagnostic criteria (i.e., depressed mood, diminished interest or pleasure in activities, feelings of worthlessness, and recurrent suicidal ideation) for major depressive disorder. In regard to clinical utility, the RADS-2:SF does provide both clinicians and researchers with an easy-to-use and very quick way to screen for depression in adolescents. With its generally strong reliability and validity, the RADS-2:SF can be used confidently to screen for this disorder. However, further research is necessary to assess the clinical utility of the cutoff scores provided within the manual to determine whether they provide truly meaningful gradations of adolescent depression or are just statistical conveniences. In addition, as the test author admirably points out throughout the manual, the RADS-2:SF should be used only as a screening measure and not as a diagnostic indicator for major depressive disorder. A positive score on the RADS-2:SF must be followed by a clinical interview, conducted by a trained professional, that covers all DSM-IV-TR diagnostic criteria for depression, along with the consideration of data from a variety of other sources. It should be noted that although the U.S. Preventive Services Task Force (2009) reported that there was inadequate evidence that screening tests accurately identify major depressive disorder in children, it did note that adequate evidence exists for screening adolescents. In fact, the Task Force "recommend[s] screening of adolescents (12-18 years of age) for major depressive disorder (MDD) when systems are in place to ensure accurate diagnosis, psychotherapy (cognitive-behavioral or interpersonal), and follow-up" (U.S. Preventive Services Task Force, 2009, p. 1223). Therefore, to ensure proper diagnosis and mitigate any harm associated with screening and certain treatments, positive screening results must be followed by a more thorough diagnostic evaluation (AACAP, 2007). Anything less would be a disservice to the adolescents who are intended to benefit from screening instruments.

SUMMARY. The RADS-2:SF is a 10-item self-report instrument designed to screen for depression in adolescents. As a brief version of the RADS-2, the RADS-2:SF is a technically strong instrument that is easily administered to adolescents individually or in group settings. The cutoff score for depression is empirically based; however, additional research is necessary on the clinical utility of using scores to differentiate depression severity in adolescents. Overall, this is an excellent screening instrument that ranks with other strong screening measures for adolescent depression, including the Beck Depression Inventory (T7:275), the Beck Depression Inventory for Primary Care (Beck, Guth, Steer, & Ball, 1997), and the Reynolds Adolescent Depression Scale-2nd Edition (16:211).

REVIEWER'S REFERENCES
American Academy of Child and Adolescent Psychiatry. (2007). Practice parameters for the assessment and treatment of children and adolescents with depressive disorders. Journal of the American Academy of Child and Adolescent Psychiatry, 46, 1503-1526.
American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders (4th ed., text rev.). Washington, DC: Author.
Bagby, R. M., Ryder, A. G., Schuller, D. R., & Marshall, M. B. (2004). The Hamilton Depression Rating Scale: Has the gold standard become a lead weight? American Journal of Psychiatry, 161, 2163-2177.
Beck, A. T., Guth, D., Steer, R. A., & Ball, R. (1997). Screening for major depression disorders in medical inpatients with the Beck Depression Inventory for Primary Care. Behaviour Research and Therapy, 35, 785-791.
U.S. Preventive Services Task Force. (2009). Screening and treatment for major depressive disorder in children and adolescents: U.S. Preventive Services Task Force recommendation statement. Pediatrics, 123, 1223-1228.

Review of the Reynolds Adolescent Depression Scale-2nd Edition: Short Form by RAMA K. MISHRA, Neuropsychologist, Department of Psychiatry, Medicine Hat Regional Hospital, Medicine Hat, Alberta, Canada:

DESCRIPTION. The Reynolds Adolescent Depression Scale-2nd Edition: Short Form (RADS-2:SF) is a screening measure of the severity of depression in adolescents 11 to 20 years of age. This screening measure has only 10 items and takes about 2 to 3 minutes to complete, either individually or in a group. It is written in simple language at a third-grade reading level. The items on this test are drawn from the 30-item Reynolds Adolescent Depression Scale-Second Edition (RADS-2; Reynolds, 2002; 16:211). The degree of depression is determined from the total score. Further assessment can be done using the RADS-2 for in-depth analysis. The RADS-2:SF uses a 4-point Likert-type response format (i.e., almost never, hardly ever, sometimes, or most of the time). The respondents are asked to indicate how they usually feel. Items are worded in the present tense to capture current symptoms.

DEVELOPMENT. The test author identifies the main goal for this short form as creating a version of the previously published Reynolds Adolescent Depression Scale-Second Edition (Reynolds, 2002) with one-third of its items, or 10 items. The goal was to maintain adequate reliability and validity but create a short version for quick screening purposes. All 6 critical items and 4 additional items from the RADS-2 were selected based on their level of discrimination between depressed and nondepressed adolescents. Additional criteria for item selection included the magnitude of the correlation with the RADS-2 total score as well as the type of symptoms assessed (e.g., dysphoria, loss of interest, and irritability/anger). The original RADS was field tested with more than 8,000 adolescents, and the RADS-2 has been used in hundreds of empirical studies. The manual does not adequately explain why 10 items were chosen, or why only 1 item was worded in the positive direction and reverse scored. The test author found strong correlations with the total score for all but 1 item, but decided to keep that item because it was the only reverse-scored item on the test and because of its content with respect to depression.

TECHNICAL. The standardization of the RADS-2:SF involved more than 9,000 adolescents from eight U.S. states and one Canadian province (i.e., British Columbia). The sample for this short version included the original sample used for developing the RADS-2 and additional data collected since the RADS-2 was published in 2002. The total sample was divided into three age groups (11 to 13 years, 14 to 16 years, and 17 to 20 years) to reflect early, middle, and late adolescence. Approximately 44% of the total sample were male and 56% were female. Approximately 23% of the sample were non-Caucasian.
The average socioeconomic status of the sample was 8.53 (SD = 3.32, range 1-18) based on the Hollingshead Occupational Index applied to both parents. The standardization sample consisted of 3,300 adolescents from the total sample, which reflected the 2000 U.S. Census proportions with respect to gender, ethnic background, and age group distribution. Additionally, two clinical samples were used. The first clinical sample consisted of 101 adolescents (59 females and 42 males) with one or more DSM-III or DSM-IV diagnoses based on a standardized clinical interview. The second consisted of 70 adolescents (45 females and 25 males) with Axis I diagnoses and was used in the test-retest reliability and validity studies. Separate groups of adolescents from the school and standardization samples were used for test-retest reliability, criterion-related validity, convergent and discriminant validity, contrasted groups validity, and clinical validity. Norms for the RADS-2:SF were based on the standardization sample of 3,300 adolescents. Raw scores were converted to T-scores for each sample.

The reliability estimates of the RADS-2:SF are fairly high. Internal consistency reliabilities were reported in the range of .84 to .90 for the school, standardization, and clinical samples. Test-retest reliabilities were slightly lower, but quite good, with a range of .81 to .82 for these groups, based on a retesting interval of about 2 weeks. Validity estimates have been reported for criterion-related validity and construct validity. As expected, the RADS-2:SF demonstrated a high correlation with the RADS-2 (.96); its correlation with the Hamilton Depression Rating Scale was .80. The correlations of the RADS-2:SF with the Major Depression scale and the Dysthymic Disorder scale of the Adolescent Psychopathology Scale (Reynolds, 1998) were .73 each. Both convergent and discriminant validity estimates have been reported to establish construct validity. Correlations of the RADS-2:SF with related measures, including the Beck Hopelessness Scale (Beck, Weissman, Lester, & Trexler, 1974), the Hopelessness Scale for Children (Kazdin, French, Unis, Esveldt-Dawson, & Sherick, 1983), and the Bully-Victimization Scales (Reynolds, 2003), were reported to be in the range of .29 to .82. Several scales from the Adolescent Psychopathology Scale that are unrelated to depression showed weak relationships with the RADS-2:SF; for example, correlation coefficients ranged from .22 to .37 with the conduct disorder, substance abuse, and mania subscales. In one large independent study (Milfont et al., 2008), the RADS-2:SF was reported to have a strong correlation with other measures of depression. The researchers found that, with a cutoff score of 26, the RADS-2:SF classified a higher percentage of students as having depressive symptoms than did the RADS in a large New Zealand sample. The manual does not adequately explain the usefulness of a school-based sample that was almost three times the size of the standardization sample, how the standardization sample was selected from the larger school-based sample, or whether the total sample was a combination of several studies conducted using the RADS-2 and RADS-2:SF.

COMMENTARY. The RADS-2:SF is a 10-item depression scale suitable for screening purposes. This test has been normed on a large, diverse, U.S.-Census-matched sample and provides reliability and validity estimates similar to those of the 30-item RADS-2.
However, the manual does not explain why the test author chose to use a sample from one of the western provinces of Canada, nor is it clear from the information provided whether the short form was derived from the 30-item RADS-2 standardization data or developed independently. In any case, the technical foundation for this test is quite strong, and the test is therefore recommended for clinical use, particularly when the clinician has other tests to consider for overall assessment. However, three issues need to be considered by prospective users, including clinicians and researchers. In terms of time, the 30-item RADS-2 does not take much longer to administer than the 10-item RADS-2:SF. The RADS-2 provides four subscales and a total score, which help in interpreting subclinical levels of depression from elevations on the subscales. Finally, the "multiple gate screening procedure" flow chart provided by the author of the RADS-2:SF recommends that, to reduce false positives, the RADS-2 be used when an individual's score is above the cutoff. Therefore, the use of the RADS-2:SF is limited to situations in which only a very brief screening is required or feasible.

SUMMARY. The RADS-2:SF is a brief measure of depression in adolescents between the ages of 11 and 20 years. It is best suited to clinical screening and referral. However, for diagnostic assessment, monitoring treatment effectiveness, and research, the 30-item RADS-2 is a better choice. Increased false positives and the absence of subscales make the RADS-2:SF less attractive for the frontline clinician and researcher.

REVIEWER'S REFERENCES
Beck, A. T., Weissman, A., Lester, D., & Trexler, L. (1974). The measurement of pessimism: The Hopelessness Scale. Journal of Consulting and Clinical Psychology, 42, 861-865.
Kazdin, A. E., French, N. H., Unis, A. S., Esveldt-Dawson, K., & Sherick, R. B. (1983). Hopelessness, depression, and suicidal intent among psychiatrically disturbed inpatient children. Journal of Consulting and Clinical Psychology, 51, 504-510.
Milfont, T. L., Merry, S., Robinson, E., Denny, S., Crengle, S., & Ameratunga, S. (2008). Evaluating the short form of the Reynolds Adolescent Depression Scale in New Zealand adolescents. Australian and New Zealand Journal of Psychiatry, 42, 950-954.
Reynolds, W. M. (1998). Adolescent Psychopathology Scale psychometric and technical manual. Lutz, FL: Psychological Assessment Resources.
Reynolds, W. M. (2002). Reynolds Adolescent Depression Scale-2nd Edition. Lutz, FL: Psychological Assessment Resources.
Reynolds, W. M. (2003). Reynolds Bully Victimization Scales for Schools: Manual. San Antonio, TX: Psychological Corporation.
Reynolds Depression Screening Inventory By: Reynolds, William M., Kobak, Kenneth A, 19980101, Vol. 14 Mental Measurements Yearbook

Review of the Reynolds Depression Screening Inventory by MICHAEL H. CAMPBELL, Director of Residential Life, New College of University of South Florida at Sarasota, Sarasota, FL:

TEST COVERAGE AND USE. The Reynolds Depression Screening Inventory (RDSI) is a paper-and-pencil self-report measure based on the well-known Hamilton Depression Inventory (HDI), which in turn was adapted from the classic Hamilton Depression Rating Scale. The test was designed to provide a brief, convenient, and cost-effective screening for the severity of depressive symptoms. Items for the RDSI were drawn from the 32-item HDI and selected to provide broad coverage of the DSM-IV criteria for Major Depressive Disorder, as well as to maximize scale homogeneity. The authors make clear that the RDSI is not intended to function as a diagnostic or predictive instrument; rather, the test provides quantitative and qualitative information on current levels of depressive symptomatology. The test is appropriate for use with adult outpatients, whether or not they meet DSM-IV criteria for diagnosis of a depressive disorder.

NORMS AND TEST BIAS. The standardization sample for the RDSI consisted of 450 nonclient adults (ages 18-89) selected from a larger sample (n = 531) to provide balanced representation of gender and age groups. The authors also report norms from a psychiatric outpatient sample (n = 324), in which patients with Major Depressive Disorder (n = 150) were represented. Many of the analyses reported in the manual are based on the total development sample (n = 855). The manual provides comprehensive descriptions and analyses of sample demographics. There was a significant effect of gender on RDSI scores, consistent with previous research demonstrating a slight trend for women to report greater depressive symptomatology. There were no significant main effects for age or ethnicity, and no significant age x gender or ethnicity x gender interaction effects. However, as the authors prudently note, ethnic minorities, especially Asians and Hispanics, had relatively small sample sizes; therefore, statistical power may have been insufficient to detect ethnicity-related differences in scores.

ADMINISTRATION AND SCORING. The RDSI is clearly and elegantly designed. The test is easily administered in both individual and group formats and is sufficiently straightforward to be used by a wide variety of mental health professionals with appropriate training. The manual provides clear instructions for administration and scoring, which is readily accomplished by hand. Additionally, the manual includes procedures for prorating incomplete protocols and describes some simple validity checks based on unusual or inconsistent response patterns. The RDSI also contains six critical items that merit follow-up when scored in the keyed direction. The RDSI produces raw scores ranging from 0 to 63, although raw scores above 35 are rare. The manual provides tables for conversion of raw scores into T-scores and percentile ranks. Raw scores of 10 or below are not suggestive of clinical severity; scores from 11 to 15 suggest mild severity. A cutoff score of 16 identifies "a clinically relevant level of depressive symptoms" that warrants referral "for further evaluation and consideration of treatment" (professional manual, p. 15).
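The severity bands and cutoff lend themselves to a small illustration. In the sketch below, the 19-item length comes from the second review of this instrument, and the bands and the cutoff of 16 come from this one; the mean-based proration rule and the two-item omission limit are assumptions for illustration, since the manual specifies its own proration procedure.

```python
# Sketch of RDSI screening logic. The proration rule and the omission
# limit below are illustrative assumptions; the manual prescribes its
# own procedure for incomplete protocols.

N_ITEMS = 19          # RDSI length, per the second review below
CUTOFF = 16           # "clinically relevant level of depressive symptoms"

def prorate_total(item_scores, max_missing=2):
    """Prorate an incomplete protocol by scaling the sum of answered
    items up to the full test length (assumed method). Omitted items
    are passed as None."""
    answered = [s for s in item_scores if s is not None]
    if len(item_scores) - len(answered) > max_missing:
        raise ValueError("too many omitted items to prorate")
    return round(sum(answered) * N_ITEMS / len(answered))

def severity_band(total):
    """Map a total raw score onto the interpretive bands the manual
    describes."""
    if total <= 10:
        return "not suggestive of clinical severity"
    if total <= 15:
        return "mild severity"
    return "clinically relevant; refer for further evaluation"
```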
The manual provides a detailed description of the cutoff score selection criteria: to maximize both hit rate and clinical sensitivity. In a study of the RDSI's ability to differentiate between participants with an existing diagnosis of Major Depressive Disorder and nonpatient controls, a score of 16 correctly classified 94.9% of persons overall and 95.3% of those with an existing diagnosis of Major Depression.

RELIABILITY. The reliability estimates of the RDSI appear excellent across a series of measures. Cronbach's alpha estimates of internal consistency were .93 for the total sample and .89 for the psychiatric outpatient sample, with minimal differences between genders. Test-retest reliability computed at approximately one-week intervals (using a sample of 190 adults retested after the initial data collection) yielded an overall correlation of .94. The authors also report correlations between individual items and the total scale score for the total development sample. Correlations ranged from .44 to .83 (all but two were above .50), suggesting substantial homogeneity of item content, even though the RDSI taps a diverse group of depressive symptoms. Finally, the standard error of measurement is less than 3 points for both men and women, indicating a stability of measurement that supports clinical use.

VALIDITY. The manual provides clear and comprehensive summaries of validational data. The descriptions of statistical and conceptual strategies for validation are very well written and well organized; the material should be broadly accessible, even to readers with relatively little training in quantitative methods. More importantly, the substance of these analyses is clear and convincing evidence of content, criterion-related, construct, and clinical validity. The item selection procedures for the RDSI provide important evidence of content validity. The selection process ensured that items reflected mood, cognitive, somatic, neurovegetative, psychomotor, and interpersonal areas of symptomatology. The RDSI item content is tied to the diagnostic criteria of the DSM-IV; the instrument is therefore atheoretical in that content parallels the DSM's focus on symptom presentation rather than etiological explanation. Additionally, item validity can be inferred from the item homogeneity demonstrated by item-with-total-scale correlations, as noted above. The authors' evaluation of criterion validity focuses on concurrent rather than predictive criteria, a choice defended on the grounds that the RDSI is designed to assess current levels of severity but not to predict the future course of depression. The manual presents strong evidence of concurrent validity based on correlations with a variety of criterion measures, including the Hamilton Depression Rating Scale (.93), the Beck Depression Inventory (.94), the Beck Hopelessness Scale (.80), the Adult Suicidal Ideation Questionnaire (.67), the Beck Anxiety Inventory (.71), the Rosenberg Self-Esteem Scale (-.71), and the Marlowe-Crowne Social Desirability Scale-Short Form (-.37). This is an impressive array of correlations with well-validated criterion instruments. Moreover, the choice of criterion instruments provides strong evidence of convergent and discriminant construct validity. Further evidence of construct validity comes from factor analytic evaluation of the RDSI items.
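Before turning to those factor analyses, it is worth unpacking the standard error of measurement figure, since it determines the confidence band a clinician can place around an observed score. A short sketch; the raw-score standard deviation of 10 in the example is illustrative, not a value from the manual:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

def score_band(observed, sd, reliability, z=1.96):
    """Approximate 95% confidence band around an observed score."""
    half_width = z * sem(sd, reliability)
    return observed - half_width, observed + half_width

# With an illustrative raw-score SD of 10 and the reported alpha of
# .93, sem(10, 0.93) is about 2.6 points, consistent with the
# "less than 3 points" figure quoted above.
```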
An initial principal components analysis using both orthogonal and oblique rotations yielded a consistent three-factor structure for the RDSI; the dimensions were depressed mood-demoralization, somatic complaints, and vegetative symptoms-fatigue. A second principal components analysis, restricted to data from psychiatric outpatients, yielded essentially the same factor structure. The manual includes an interesting discussion of clinical efficacy or clinical validity. In addition to a detailed discussion of the issues of hit rate and sensitivity noted earlier, the authors demonstrate statistically significant differences in RDSI scores among nonreferred adults, persons with Major Depressive Disorder, and persons with other psychiatric diagnoses; the authors term this type of analysis "contrasted groups validity."

SUMMARY. The RDSI provides a reliable, valid, and convenient short screening for severity of depressive symptoms in psychiatric outpatients. The supporting materials are outstanding for their thorough documentation and clarity of expression, and the evidence of reliability and validity is compelling. Although the test probably does not provide much additional or qualitatively different clinical information relative to other instruments (e.g., the Revised Hamilton Rating Scale for Depression [RHRSD], Beck Depression Inventory-II [BDI-II], or Minnesota Multiphasic Personality Inventory-2 [MMPI-2]), the RDSI is an excellent choice for clinicians who desire an efficient screening focused on depressive symptoms.

Review of the Reynolds Depression Screening Inventory by ROSEMARY FLANAGAN, Adjunct Associate Professor of Psychology, St. John's University, Jamaica, NY:

The Reynolds Depression Screening Inventory (RDSI) is an instrument in a series (e.g., Reynolds, 1986) of depression inventories. The manual is well written and appears useful for both researchers and practitioners. Standardization procedures and psychometric properties are carefully explained; illustrative case examples are provided. To the credit of the authors, sufficient data are reported in the manual, permitting test users to arrive at their own judgments about the RDSI. A literature search did not yield further information; therefore, this review is based on material in the manual and a recent conference presentation (Reynolds, Flament, Masango, & Steele, 1999). The authors appear to have realized their stated goal of developing a measure of depression consistent with the diagnostic criteria for Major Depressive Disorder, according to the Diagnostic and Statistical Manual of Mental Disorders (DSM-IV; American Psychiatric Association, 1994). Depression is a significant mental health problem: surveys (e.g., Kessler et al., 1994) indicate prevalence rates as high as 10.3% for the general population. The RDSI is not intended for the diagnosis of depression but, rather, is to be used to provide an indication of the severity of the problem over the past 2 weeks. Items reflect the same domains that are covered on the Hamilton Depression Rating Scale (HDRS; Hamilton, 1960), with weighted response options for the items. The RDSI is similar in format to the Beck Depression Inventory (BDI; Beck, Steer, & Brown, 1987) in that each item is rated along a continuum, with higher scores indicative of greater depressive symptomatology. There are three to five response options for each item, stated in specific behavioral terms, similar to a structured interview.
Administration and hand scoring can be accomplished in 10-15 minutes; there is no computer-scoring format. Scoring involves summing the numerical values assigned to the response options, with data reported as linear T-scores (Mean = 50; SD = 10) and percentiles. Responses to six critical items are also reviewed. These items address whether, and to what extent, the respondent is feeling depressed, as well as the respondent's outlook, suicidal ideation, changes in interest and work performance, and general feelings about oneself. The RDSI is written at approximately a fifth-grade reading level, somewhat below the reading level of the BDI (eighth grade). Similar to the BDI, this format is advantageous to practicing clinicians, as the instrument can be administered and scored during an office visit, if necessary.

Norms were derived from a sample of 450 individuals who were matched for gender and age. A concern is that the sample is geographically limited, having been drawn from the Midwestern and Western United States. The racial-ethnic composition of the sample is 89.1% Caucasian, 4.5% African-American, 2.0% Asian, 3.3% Hispanic, and 1.1% Other. Approximately 72% of the individuals were between 25 and 64 years of age, with 14% in each of the 18-24 and 65-89 year cohorts; the mean age of the participants was 43. Socioeconomic status varied from professionals to the unemployed; dwelling areas were urban, suburban, and rural. Data were collected on several additional samples that were used in subsequent analyses. The psychiatric sample was composed of 324 individuals, 150 of whom were diagnosed with major depression, 123 with anxiety disorders, and 51 with other psychopathology. The demographic characteristics of this group were generally similar to those of the standardization sample. The mean scores for each group were such that the groups were collapsed into two: those with major depression and those with other psychopathology. An additional sample, referred to as the total development sample, was used for some analyses; its composition is demographically similar to the other samples used. It was composed of 855 individuals, approximately 62% of whom had no DSM-IV (American Psychiatric Association, 1994) diagnosis. The remaining 38% comprised a group with major depression and a group with other psychiatric problems.

Coefficient alpha for the total sample and the psychiatric sample was .933 and .898, respectively. Test-retest reliability at a 1-week interval was .944. These values are adequate for clinical decision making and research (Kaplan & Saccuzzo, 1997; Nunnally & Bernstein, 1994). The instrument appears to assess a sole construct, with scores demonstrating adequate stability. Validity was examined in several ways: content, criterion-related, construct, and clinical (contrasted groups) validity, as well as the efficiency, sensitivity, and diagnostic specificity of the RDSI cutoff score. Item-total correlations, reflecting content validity, are described as moderate to high, with approximately 25%-69% of the variance being explained for 16 of 19 items. Criterion-related validity was assessed by examining the sample correlation (r = .93) between an adapted version of the Hamilton Depression Rating Scale (HDRS; Reynolds & Kobak, 1995) and RDSI scores. The adapted form of the HDRS requires considerably less time to administer and is much less labor-intensive than the original HDRS (Hamilton, 1960), and it is similar in format to the RDSI.
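The linear T-scores mentioned above are a simple affine rescaling of raw scores against the normative mean and standard deviation, with no normalizing transformation applied; a sketch, with illustrative normative values:

```python
def linear_t(raw, norm_mean, norm_sd):
    """Linear T-score: rescale a raw score so the normative sample has
    mean 50 and SD 10; percentile ranks are read from separate tables."""
    return 50 + 10 * (raw - norm_mean) / norm_sd

# Illustrative values only; the RDSI manual's conversion tables govern
# actual scoring. With a hypothetical normative mean of 8.8 and SD of
# 3.3, the cutoff raw score of 16 maps to linear_t(16, 8.8, 3.3) ~ 72,
# the T value quoted for that cutoff later in this review.
```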
Construct validity was evaluated by examining the relationship between the RDSI and several measures. The correlation with the Beck Depression Inventory (BDI; Beck, Steer, & Brown, 1987) was .94. The correlation with a related construct, suicidal ideation, as assessed by the Adult Suicidal Ideation Questionnaire (Reynolds, 1991), was .67. The correlation with the Rosenberg Self-Esteem Scale (Rosenberg, 1965) indicated an inverse relationship, as might be expected (-.71). Additional evidence of construct validity was provided as part of the validation of the Physical Self-Concept Scale (Reynolds, Flament, Masango, & Steele, 1999). That measure evaluates physical aspects of appearance, ability/skills, intelligence, and health, as well as self-efficacy related to these same domains. A moderate relationship with the RDSI was demonstrated, accounting for 21% of the variance for a sample of community-based college students and adults. Multiple regression analysis indicates that the RDSI measures depression as opposed to generalized psychological distress. This was substantiated in two analyses in which the beta weights for depression, as assessed by the BDI and the HDRS, were .66 and .72, respectively. In contrast, beta weights ranged from .18 to .22 for measures of hopelessness, suicide ideation, self-esteem, and anxiety. Factor analytic studies indicate that 58% of the variance in the total development sample is explained by a three-factor solution, corresponding to depressed mood, somatic complaints, and vegetative symptoms. The factors were extracted to provide evidence of validity rather than to provide information about aspects of depression. It is made clear that the RDSI should not be the sole criterion used to diagnose depression and that the factors should not be interpreted individually. This bears some similarity to the BDI.

The most critical validity evidence concerns the efficacy of the RDSI cutoff scores. Analyses were conducted to determine the level at which the combination of sensitivity (correct identification of those with major depression), specificity (correct identification of those who do not have major depression), positive and negative predictive values (the proportions of positive and negative results that are correct), and hit rate (the overall proportion of correct identifications) was optimized. Data are also presented on the strength of association (chi square, kappa coefficient) and the quantified clinical validity of the cutoff scores (phi coefficient). The cutoff score that is expected to result in optimal decision making is 16, substantiated by tabled data indicating that four indices (sensitivity, hit rate, chi square, phi coefficient) are at their peak at that score; the remaining indices are acceptably high. This score corresponds to the 96th percentile, or T = 72. Should the score not be in the clinically significant range, the RDSI can be interpreted normatively. The item numbers of the six critical items are printed near the bottom of the front page of the protocol. Responses of "2" or higher on these items are clinically significant. Should an individual obtain scores of "3" or more on three critical items, further evaluation is indicated, irrespective of the total score.

SUMMARY. The data in the manual suggest that the RDSI should live up to the authors' claims. Psychometric properties are sound, despite a smaller norming sample than that used for the BDI. The level of detail in the manual, particularly in the validity sections, exceeds that available in the BDI manual, and is an improvement.
The RDSI is atheoretical; the BDI reflects Beck's theory (e.g., Beck, 1973). The strength of the RDSI may be that it is a technical advance. Nevertheless, the uses and properties of the RDSI are similar to those of the BDI. The need for a new instrument to assess depression in a brief, time-sensitive format is debatable. Researchers and practitioners may be less likely to utilize a new measure, given the existing data and large literature supporting the BDI. It is reasonable to expect that additional research will be needed for the RDSI to become a commonly accepted alternative to the BDI.

REVIEWER'S REFERENCES
Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23, 56-62.
Rosenberg, M. (1965). Society and the adolescent self-image. Princeton, NJ: Princeton University Press.
Beck, A. T. (1973). Depression: Causes and treatment. Philadelphia: University of Pennsylvania Press.
Reynolds, W. M. (1986). Reynolds Adolescent Depression Scale. Odessa, FL: Psychological Assessment Resources.
Beck, A. T., Steer, R. A., & Brown, G. K. (1987). Beck Depression Inventory manual. San Antonio, TX: Psychological Corporation.
Reynolds, W. M. (1991). Adult Suicidal Ideation Questionnaire: Professional manual. Odessa, FL: Psychological Assessment Resources.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.
Kessler, R. C., McGonagle, K. A., Zhao, S., Nelson, C. B., Hughes, M., Eshleman, S., Wittchen, H., & Kendler, K. S. (1994). Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: Results from the National Comorbidity Survey. Archives of General Psychiatry, 51, 8-19.
Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory (3rd ed.). New York: McGraw-Hill.
Reynolds, W. M., & Kobak, K. A. (1995). Reliability and validity of the Hamilton Depression Rating Inventory: A paper and pencil version of the Hamilton Depression Rating Scale clinical interview. Psychological Assessment, 7, 472-483.
Kaplan, R. M., & Saccuzzo, D. (1997). Psychological testing (4th ed.). Pacific Grove, CA: Brooks-Cole.
Reynolds, W. M., Flament, J., Masango, S., & Steele, B. (1999, April). Reliability and validity of the Physical Self-Concept Scale. Paper presented at the annual convention of the American Educational Research Association, Montreal, Canada.
Texas Functional Living Scale By: Cullum, C. Munro, Weiner, Myron F., Saine, Kathleen C, 20090101, Vol. 18 Mental Measurements Yearbook

Review of the Texas Functional Living Scale by PAM LINDSEY-GLENN, Professor, College of Education, Tarleton State University, Stephenville, TX:

DESCRIPTION. The Texas Functional Living Scale (TFLS) is an individually administered, performance-based scale developed for use with individuals displaying characteristics of a variety of neurodevelopmental, neurodegenerative, and intellectual disabilities. The instrument evaluates the functional abilities (e.g., daily or independent living skills) of such persons as part of an overall psychological or medical evaluation. The TFLS assesses instrumental activities of daily living (IADLs) for individuals ages 16-90. IADLs are described as more complex behaviors related to living independently, such as managing money and paying bills. The instrument is composed of 24 items measuring an individual's skills in four areas of functioning. These areas are described in subscale ratings related to Time, Money and Calculation, Communication, and Memory. Results may be used as part of a comprehensive medical or psychological evaluation focused on planning effective programs or evaluating the impact of medical or other interventions. The test kit contains stimulus cards for some items, such as those involving money and time. The examiner's manual provides a list of other items to be provided by the examiner, such as a calendar, a stopwatch, and various coin and bill denominations. The examinee is required to give either an oral or a written response to items containing oral and visual cues, depending on the nature of the particular item. The administration instructions are clear and straightforward. The instrument assesses a variety of tasks related to daily functional living, such as writing checks, looking up information in a phone book, and so forth. Raw scores and cumulative percentages are recorded for each of the four subscales, and an overall T-score is calculated for the entire test. The T-score is determined from the sum of the four subscale scores. Interpretation guidelines are given for each subscale and for the overall T-score.

DEVELOPMENT. The TFLS was developed in response to an increasing need for reliable instruments that describe an individual's independent functioning ability. The skills measured by the TFLS are critical to the development of appropriate programs and interventions that improve an individual's quality of independent living. Work on the instrument began in the mid-1990s, with the market version published in 2009 (examiner's manual, p. ix). The instrument's original purpose was to assess the functional abilities of persons with neurodevelopmental or neurodegenerative disorders, particularly Alzheimer's disease. However, the current version extends its use to individuals who have intellectual disabilities, schizophrenia, and traumatic brain injuries. It was designed to provide relevant information about the examinee's ability to function independently in home and community settings (examiner's manual, p. 3). It was not developed as a "stand alone" measure, but to be used in conjunction with other evaluation data such as tests of memory or cognitive ability.

TECHNICAL. The standardization sample consisted of 800 individuals ages 16-90, with 100 individuals in each age band (i.e., 16-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80-90 years).
The standardization data were collected as part of the Wechsler Memory Scale-Fourth Edition (WMS-IV; Wechsler, 2009) standardization. The normative data represent the population of the United States ages 16-90. The participants were selected on specific demographic variables (e.g., geographic location, educational level, race/ethnicity) based on the 2005 U.S. Census. The T-score has a single norm for all ages, with a mean of 50 and a standard deviation of 10, and is derived from the sum of the raw scores from each subscale. Cumulative subscale percentages are used in place of scaled scores because typical participants in the norming sample tended to earn perfect scores on each subscale, skewing the score distributions. The subscale scores, therefore, are reported in cumulative percentage bands (≤ 2 to > 75) to accommodate the skewed normative scores (examiner's manual, p. 45). The reliability coefficients were calculated using a split-half method, with a Fisher's z transformation used to calculate the average coefficients. A decision-consistency methodology was used to demonstrate reliability between subscales due to the highly skewed nature of the subscale scores. Reliability coefficients were adequate to demonstrate test reliability for each age and disability group, with an overall average of .75 for typically developing individuals and .92 for the special groups sample. Interrater reliability coefficients were reported as .97 to .99. The manual provides an excellent description of validity evidence in all three areas of test validity, namely construct, content, and concurrent. Content validity was evaluated through literature and expert reviews; adjustments were made based on suggestions from both sources. Construct validity was evaluated throughout the standardization process, and modifications were made. The test authors describe in detail special group studies conducted with targeted populations using the TFLS and other measures of adaptive and/or cognitive abilities. The special studies included populations of participants with Alzheimer's disease, mild-moderate mental retardation, major depressive disorders, schizophrenia, autistic disorder, and traumatic brain injuries. The manual describes in detail how participants for each study were chosen and reports the validity correlations and mean scores for each. In addition, an extensive description of each concurrent validity study is provided. Data suggested that the TFLS had adequate concurrent validity with instruments having a similar purpose (e.g., the ILS), but poor to moderate correlations with instruments assessing cognition or adaptive behavior (e.g., the ABAS-II or WAIS-IV). This result should be expected, however, because the TFLS is a screening instrument designed for a specific purpose and for the assessment of a limited range of very specialized skills. Instruments such as the ABAS-II or WAIS-IV are not designed to measure the same types of skills the TFLS purports to measure and are meant to provide an overall measure of an individual's functioning. The TFLS is nonetheless appropriate to use in conjunction with other measures such as the WAIS-IV and the ABAS-II.

COMMENTARY. The TFLS is a brief, easy-to-use screening instrument that evaluates an individual's ability to function in four critical life-skills areas. It has the potential to be a helpful tool if used in conjunction with other assessment data to help plan interventions or evaluate the effectiveness of interventions with a target group of adults.
It has clear, straightforward directions and is easy to score. The manual provides interpretive examples; however, the examples are very limited in scope. Alternative prompts are provided for most items, which is very helpful when assessing individuals with cognitive or developmental challenges. Suggestions are also given for examinees with physical challenges. A drawback of the TFLS is the cue cards used for some items. The black-and-white line drawings would be difficult for some people with developmental disabilities to "pretend" to use or to interpret. For example, one item asks the examinee to "act out" the steps to cook a recipe in the microwave; the cue card of the microwave oven is a poor representation. Using real objects or, at least, models of real objects would likely be more effective. A second example is the cue card provided for writing a check and addressing an envelope: the line drawings and cluttered appearance would make it difficult for persons with developmental disabilities to draw the needed inferences. The cue card used to simulate a water bill is likewise visually cluttered and would be difficult for a person with developmental disabilities to analyze.

SUMMARY. Overall, the TFLS appears to be a useful screening tool to support comprehensive medical or psychological evaluations. The skills evaluation items are quite limited (24 items), but they could provide an overall impression of an individual's abilities in the four areas assessed. As with all screening instruments, it should be used with caution and in conjunction with other assessment data. Scores from the TFLS should not be used as the sole criterion for making diagnoses, planning programs, or evaluating the success of interventions.

REVIEWER'S REFERENCE
Wechsler, D. (2009). Wechsler Memory Scale-Fourth Edition. San Antonio, TX: Pearson.

Review of the Texas Functional Living Scale by JENNIFER M. STRANG, Clinical Neuropsychologist, Department of Behavioral Health, DeWitt Healthcare Network, Fort Belvoir, VA:

DESCRIPTION. The Texas Functional Living Scale (TFLS) is a performance-based measure designed to assess instrumental activities of daily living (IADLs). The TFLS consists of 24 items assessing the "ability to use analog clocks and calendars, ability to perform calculations involving time and money, ability to utilize basic communication skills in everyday activities, and memory" (examiner's manual, p. 2). The item requirements vary, ranging from reading a clock display to writing a check to paying a bill. Each item is individually scored from 0 to 1, 0 to 2, or 0 to 3 points; some items have ranges of 0 to 5 or 0 to 6. A total raw score for each of the four subscales (Time, Money and Calculation, Communication, Memory) is calculated by adding the individual item scores within each subscale. A TFLS total raw score is obtained by adding the subscale raw scores; a maximum of 50 points is possible. The subscale raw scores are converted to cumulative percentages, and the TFLS total raw score is converted to a T-score. The manual provides detailed guidelines for interpreting the summary scores and applying qualitative descriptors. The test is administered individually to examinees ages 16 to 90; the average administration time is less than 15 minutes. A record form is used to administer and score the TFLS; it contains all of the administration, recording, and scoring directions and provides space for recording behavioral observations during test administration.
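The score aggregation just described (item points summed within each of the four subscales, subscale sums added to a 50-point total, and the total converted to a T score) can be mirrored in a few lines; the sketch below leaves the T-score conversion as a placeholder for the manual's norm tables:

```python
# Sketch of TFLS score aggregation as described above. The norm_table
# argument is a placeholder for the manual's published conversion
# tables; nothing here reproduces actual norms.

SUBSCALES = ("Time", "Money and Calculation", "Communication", "Memory")

def aggregate_tfls(item_points):
    """item_points: dict mapping each subscale name to its list of item
    scores. Returns subscale raw sums and the total raw score (max 50)."""
    subscale_raw = {name: sum(item_points[name]) for name in SUBSCALES}
    total_raw = sum(subscale_raw.values())
    assert total_raw <= 50, "TFLS total raw score cannot exceed 50"
    return subscale_raw, total_raw

def total_t_score(total_raw, norm_table):
    """Look up the all-ages T score (mean 50, SD 10) for a total raw
    score; subscale scores map instead to cumulative-percentage bands."""
    return norm_table[total_raw]
```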
An accompanying response form is needed for Items 4, 5, 6, 13, and 14. The test kit also includes the following materials: a clear plastic bottle, five simulated phone books, and two stimulus cards (Card 1: water bill/microwave; Card 2: food label). Additional required materials to be provided by the examiner include a stopwatch, a timer, a 12-page wall calendar for the current year, a pencil, a zip-top bag for money, a telephone, small edible objects/candies to mimic pills, and money (1 $5 bill, 2 $1 bills, 7 quarters, 5 dimes, 5 nickels, and 5 pennies). To minimize distractions, the test authors recommend arranging the test materials on a chair or shelf that is within easy reach but out of the examinee's view. In general, the administration instructions are clear and easy to follow, and helpful icons appear on the record form to indicate the materials required in each subtest. Additionally, verbal instructions are highlighted in purple to improve ease of administration.

DEVELOPMENT. According to the test authors, the TFLS "was originally developed in response to the limitations of current functional ability assessment approaches used in patients with Alzheimer disease" (examiner's manual, p. 1). In the past, many assessments of functional capacity relied on reports from family members or other caregivers. Because caregiver reports may be biased by multiple factors, several performance-based functional measures were developed. However, many of these measures have problems of their own that may reduce their validity, reliability, and practicality. Thus, the TFLS was created to provide a brief, easily administered instrument to assess the IADLs that are thought to be most susceptible to cognitive decline. Additionally, the test authors hoped to create an instrument that would provide clinically valuable information to aid in differential diagnosis and treatment planning, particularly in relation to an individual's capacity to function independently.

TECHNICAL. The standardization data were collected in conjunction with the Wechsler Memory Scale-Fourth Edition (WMS-IV; Wechsler, 2009) standardization; data from the 2005 U.S. Census guided the stratified sampling of participants. The examiner's manual provides a complete description of the inclusion and exclusion criteria and the demographic characteristics of the nationally representative sample of 800 examinees. The sample was divided into eight age bands with 100 individuals in each band. Equal numbers of male and female examinees were included in each age group from ages 16 through 59; the older age groups (≥ 60 years) included more females than males. The normative sample was further stratified according to five educational levels, five racial/ethnic categories, and four geographic regions. Several special group studies were conducted concurrently with standardization to examine the specificity and clinical utility of the TFLS. The special group sample included 212 examinees diagnosed with one of seven conditions: Probable Alzheimer's Disease-Mild Severity, Intellectual Disability-Mild Severity, Intellectual Disability-Moderate Severity, Major Depressive Disorder, Traumatic Brain Injury, Schizophrenia, or Autistic Disorder. A sample of participants residing in assisted living facilities or group homes, or requiring caregiver support at home, was also collected. There is some evidence of internal consistency for the TFLS T-score, with split-half reliability coefficients ranging from .65 at ages 16-19 to .81 at ages 60-69.
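The first review noted that the average of these split-half coefficients is computed via a Fisher's z transformation; a minimal sketch of that averaging procedure (the two-value example merely illustrates the mechanics, since the full set of per-age-band coefficients is not reproduced in the reviews):

```python
import math

def fisher_z_average(coefficients):
    """Average correlation-type coefficients by transforming each to
    Fisher's z (atanh), averaging, and back-transforming (tanh)."""
    zs = [math.atanh(r) for r in coefficients]
    return math.tanh(sum(zs) / len(zs))

# Mechanics only: averaging just the two reported endpoints would give
# fisher_z_average([0.65, 0.81]) ~= 0.74; the manual's figure of .75
# averages all eight age bands.
```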
The average across age groups was .75, which is considered a moderate level of reliability. Split-half reliability for the clinical groups generally was quite high, ranging from .63 for Major Depressive Disorder to .97 for the assisted living facility residents, with an average reliability of .92. Two independent raters scored all test protocols; interrater reliability was high, ranging from 97% to 99% agreement. Finally, the TFLS appears to demonstrate good stability over time: although one would expect some increase in performance due to practice effects, the increase in the TFLS T-score was small, suggesting good test-retest reliability. In all, the reliability estimates appear acceptable and are comparable to those of similar performance-based functional ability measures, such as the Independent Living Scales (ILS; Loeb, 1996).

The test authors also provide validity information based on test content, response processes, internal structure, relationships with other variables, and relationships to other measures. Test content was examined and modified through the use of literature and expert reviews. Earlier versions of the scale were used to develop item content for standardization, and items were reevaluated after the standardization phase to establish the final set of items. Examinee response processes provided additional evidence of validity through the evaluation of frequently occurring incorrect responses; that is, whenever analysis revealed the existence of incorrect but plausible responses, scoring guidelines were changed as appropriate. External evidence of validity includes correlations between the TFLS and other measures of adaptive functioning, memory, and cognition. The validity of TFLS scores in the aforementioned clinical groups also was examined. In the adaptive functioning domain, the TFLS was compared to performance on the Independent Living Scales (ILS), a similar measure that directly assesses skills required to live independently, and the Adaptive Behavior Assessment System-Second Edition (ABAS-II; Harrison & Oakland, 2003), a self- or other-report measure of adaptive functioning. In a sample of 27 examinees diagnosed with mild to moderate dementia, the research edition of the TFLS apparently showed strong correlations with the ILS, though these data are not printed in the examiner's manual. Nonetheless, the findings suggest measurement of similar constructs. The final edition of the TFLS and the ILS subsequently were administered to a sample of 77 nonclinical examinees. As predicted, correlations in this sample were low to moderate (.14 to .47), given the restricted score range one would expect in a sample of normally developing and aging adults. Likewise, correlations between the TFLS and the ABAS-II were small (.03 to .26), which the test authors suggest is likely due to restricted range and the different formats of the instruments. Correlations between the TFLS and ABAS-II in the clinical group sample were significantly higher, ranging from .41 to .80. Correlations of the TFLS with measures of memory and other cognitive abilities yielded similar findings. In essence, in the normative sample there were low correlations between the TFLS T-score and measures of episodic memory, verbal comprehension, perceptual reasoning, working memory, processing speed, and general cognitive ability.
Correlations were higher in the special group sample, ranging from .67 (WMS-IV Auditory Memory) to .80 (WMS-IV Visual Working Memory) on the memory measures and from .71 (WAIS-IV Perceptual Reasoning Index and Processing Speed Index) to .79 (WAIS-IV Full Scale IQ) on the intelligence measures. Performance on the TFLS also was compared to general neuropsychological functioning as measured by the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS). Correlations in the normative sample ranged from .28 (Attention and Delayed Memory) to .44 (Total Scale), suggesting a moderate relationship between neurocognitive functioning and IADLs. Finally, discriminant validity of the TFLS was demonstrated through the special group studies comparing the performance of different clinical groups. As would be predicted, individuals in the Probable Alzheimer's Disease, Intellectual Disability-Mild and Moderate Severity, Schizophrenia, and Autistic Disorder groups all exhibited significantly greater difficulties with IADLs than their matched control groups, whereas diagnoses of Major Depressive Disorder and Traumatic Brain Injury did not appear to affect performance on the TFLS.

COMMENTARY. The developers of the TFLS appear to have achieved the goals they intended to reach. The TFLS is, indeed, brief, portable, and easy to administer, score, and interpret. Of slight inconvenience are the additional materials the examiner must provide; however, this problem is easily alleviated by a few extra minutes of planning and organization. The TFLS has a large and nationally representative normative base and good psychometric properties. Although not discussed in the examiner's manual, the TFLS also appears to have good ecological validity, which is an important factor to consider when evaluating individuals who may resist measures that seem unrelated to the difficulties they are experiencing.

Limitations of the TFLS include the requirement of intact fine motor control for successful performance of many of the items, which could result in scores that underestimate the functioning of examinees with motor impairments. In such situations, use of the ILS, for example, may be more appropriate, given the large number of items on that instrument that require only a verbal response. Additionally, the Memory subscale includes only three items, which cannot capture many important aspects of memory, such as the ability to learn and retain new information. Thus, it is strongly recommended that examiners using the TFLS include additional instruments to assess different aspects of memory functioning. A complete picture of memory functioning is especially important for treatment planning in order to build on an individual's strengths and to devise compensatory strategies for an individual's limitations. Finally, continued research with special clinical groups would bolster the clinical applicability of the TFLS. Compared to the normative sample, the special group sample sizes are small and, in some cases, do not cover the full age range, though certain diagnoses, such as Alzheimer's Disease, would naturally restrict the age range of the participants.

SUMMARY. The TFLS is a brief, portable, and easily administered performance-based measure of instrumental activities of daily living related to the capacity to function independently. It is an informative adjunct to traditional neurocognitive assessment and to measures of functional capacity that rely on reports from family members or other caregivers.
The test authors present detailed information on the instrument's reliability and validity, and they convey a strong rationale for the use of the TFLS to aid in differential diagnosis and treatment planning. In sum, the TFLS is a welcome addition to the rather limited arsenal of psychometric instruments that directly assess independent living skills.

REVIEWER'S REFERENCES
Harrison, P. L., & Oakland, T. (2003). Adaptive Behavior Assessment System-Second Edition. San Antonio, TX: The Psychological Corporation.
Loeb, P. A. (1996). Independent Living Scales. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (2009). Wechsler Memory Scale-Fourth Edition. San Antonio, TX: Pearson.
Vineland Adaptive Behavior Scales, Second Edition By: Sparrow, Sara S., Cicchetti, Domenic V., Balla, David A, 20080101, Vol. 18 Mental Measurements Yearbook

Review of the Vineland Adaptive Behavior Scales, Second Edition by STEPHANIE STEIN, Professor and Chair, Department of Psychology, Central Washington University, Ellensburg, WA:

DESCRIPTION. The Vineland Adaptive Behavior Scales, Second Edition (Vineland-II) represents a significant revision and update of the well-established and popular Vineland Adaptive Behavior Scales (Vineland ABS; Sparrow, Balla, & Cicchetti, 1984), which itself was a revision of Edgar Doll's original Vineland Social Maturity Scale (1935, 1965). The Vineland-II is an individually administered measure of adaptive behavior, which the test authors define as "the performance of daily activities required for personal and social sufficiency" (Survey Forms Manual, p. 6). The age range for the Vineland-II (birth to 90) represents a significant expansion from the Vineland ABS, which was intended for individuals from birth to age 18. There continue to be three versions of the Vineland in the latest edition. The two survey forms (Survey Interview Form and Parent/Caregiver Rating Form) differ only with respect to method of administration (interview vs. rating scale). The Expanded Interview Form is intended to provide a more comprehensive measure of adaptive behavior as well as a basis for treatment planning. Finally, the Teacher Rating Form (TRF) is a revision of the Vineland ABS Classroom Edition Form with an expanded age range of 3 to 21.

Each version of the Vineland-II measures the same four broad domains of functioning assessed in the Vineland ABS: Communication, Daily Living Skills, Socialization, and Motor Skills. The Communication domain assesses the subdomains of Receptive, Expressive, and Written Communication. The Daily Living Skills domain assesses the subdomains of Personal Skills (eating, dressing, hygiene), Domestic (performance of household tasks), and Community (the use of money, computer, time, telephone, and job skills) on the Survey and Expanded Interview Forms. The subdomains for the TRF have been changed to include behaviors that are more relevant to the school environment: Personal (same as the Survey Form), Academic (understanding of money, math, and time concepts), and School Community (following rules in the classroom and school, attention, and learning approaches). The Socialization domain assesses the person's ability to get along with others and to engage in leisure activities; it is further broken down into the subdomains of Interpersonal Relationships, Play and Leisure, and Coping Skills (degree of responsibility and sensitivity demonstrated to others). The Motor Skills domain (intended for children ages 6 and under) includes the subdomains of Gross and Fine motor skills; this domain can be administered to individuals aged 7 and older if a motor deficit is suspected. In addition, there is an optional Maladaptive Behavior domain that can be administered with all forms except the TRF when problem behaviors are suspected to interfere with the adaptive functioning of individuals aged 3 years and older.

The Survey Interview Form and the Expanded Interview Form are both administered to parents and/or caregivers through a semistructured interview, with administration times ranging from 20 to 60 minutes for the Survey Interview Form and from 25 to 90 minutes for the more in-depth Expanded Interview Form.
The Parent/Caregiver Rating Form and the TRF are both intended to be filled out independently, as rating scales, by a respondent (parent, caregiver, or teacher) who is very familiar with the individual. The manuals (a separate one for each version) guide administration by providing suggestions for interview format (where applicable) and for determining a starting point for assessment. The detailed scoring guidelines indicate how to score responses obtained through the interview process and how to establish basals and ceilings on each subdomain. The items on the TRF and Survey Forms use a 3-point scale (Never, Sometimes or Partially, Usually), whereas the Expanded Form has a 5-point scale (Almost Always, Often, Sometimes, Rarely, Never).

A variety of raw and derived scores are provided in the Vineland-II. First, there are clear guidelines for computing subdomain raw scores. Following this step, tables provide corresponding v-scale scores (mean of 15 and standard deviation of 3) for each subdomain and maladaptive behavior raw score. Standard scores (mean of 100 and standard deviation of 15) are available for all of the primary domains and for the overall Adaptive Behavior Composite (full scale score). Confidence intervals (based on the standard error of measurement; SEM) are provided at the 85%, 90%, and 95% confidence levels for both v-scale scores and standard scores. Percentile ranks are also provided for domain scores and the Adaptive Behavior Composite. As with its predecessor, the Vineland-II provides five global adaptive levels as interpretive guides for each subdomain, domain, and the Adaptive Behavior Composite: Low, Moderately Low, Adequate, Moderately High, and High. The exception to this pattern occurs in the Maladaptive Behavior subscales and Index, which use the descriptive categories of Average, Elevated, and Clinically Significant. Though the manual also provides age-equivalent scores for the subdomain raw scores, the test authors are careful to caution against reliance on these scores because of their inherent limitations and likelihood of being misinterpreted. Finally, stanines are provided for the subdomain and domain scores.

The Vineland-II manual provides detailed guidelines for recording scores on the Score Report form and Score Profile and for interpreting each type of score. Interpretive steps include describing the individual's general adaptive functioning (using the Adaptive Behavior Composite score and confidence interval), describing performance in adaptive behavior domains and subdomains, identifying strengths and weaknesses, generating hypotheses about fluctuations in profiles, and describing maladaptive behavior (where applicable).

DEVELOPMENT. The goal of the test authors was to revise the Vineland in response to feedback from clinicians and teachers, to incorporate changing cultural expectations, and to reflect recent research on developmental disabilities. In particular, the revisions in the Vineland-II reflect the greater cultural expectations for adaptive behavior in individuals with developmental disabilities that have accompanied placement in least restrictive living environments. In addition, the increased reliance on technology and the need for technological competence are recognized. Furthermore, the need to assess impairments in adaptive functioning in older individuals was addressed.
New items were developed for the Vineland-II to better assess individuals with disabilities who function independently and to improve the diagnostic utility and sensitivity of the measure. An initial pool of over 3,800 items was reviewed by an expert panel, resulting in the elimination of some items and the revision of others. The items were then reviewed for relevance and bias by a set of experienced clinicians, and the pool was further reduced. From a pool of over 5,800 potential participants, the remaining items were then tried out on a random sample of 1,843 individuals from the general population and on a clinical sample of 392 individuals. The sampling plan controlled for ethnic diversity, sex, SES, geographic region, and community size. The outcome data were analyzed at the item level for developmental sequence, item validity, item placement, clinical sensitivity, bias, and redundancy, and at the subdomain level for internal consistency reliability, intercorrelations, and factor structure.

TECHNICAL. A nationally representative sample of individuals from birth through age 90 (divided into 20 age groups) constituted the standardization sample for the Survey Interview Form and Parent/Caregiver Rating Form (n = 3,695) and the Expanded Interview Form (n = 2,151). This sample was drawn from a much larger pool of over 25,000 individuals. Because of rapid developmental changes in the infant years and the need for early identification, a relatively high proportion of the norm sample was clustered at birth through age 5 (about 30% for the Survey Interview Form; over 40% for the Expanded Interview Form). The TRF was administered to 2,570 teachers and daycare providers (from a larger pool of over 19,000) for 15 age groups of children ranging from age 3 to 17/18. Even though the TRF is designed for students through age 21, students older than 18 were excluded from the sample because those who remain in secondary school through age 21 are not representative of their age-peers, most of whom are out of school and therefore do not have a teacher to complete the TRF. In all versions of the Vineland-II, the samples were designed to be evenly split between males and females and to match the 2001 U.S. Census data in the areas of race/ethnicity, SES, geographic region, community size, and special education placement. The clinical samples for each instrument included individuals diagnosed with one or more of the following conditions: attention deficit hyperactivity disorder (ADHD), autism (nonverbal or verbal), emotional/behavioral disturbance, hearing impairment, learning disability, mental retardation (mild, moderate, or severe/profound), and visual impairment.

Reliability data for the Vineland-II address internal consistency, test-retest reliability, and interrater/interinterviewer reliability. Coefficient alphas for the TRF and adjusted split-half Pearson correlations for the two other forms are consistently quite strong (mostly mid- to high .90s) for the Adaptive Behavior Composite. The one exception was the Survey Forms (Survey Interview Form and Parent/Caregiver Rating Form) for ages 32-71, where scores tend to be the highest and therefore are less variable and less reliable. The internal consistency reliability for the domain scores is also very good to excellent (mostly high .80s to mid-.90s), with the exception of the slightly lower reliability in the Motor Skills domain. As expected, the internal consistency for the subdomains is lower than for the domains, especially on the Survey Forms.
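[Editor's note: this pattern is what classical test theory predicts for shorter scales; the following arithmetic is an illustration, assuming roughly parallel subdomains, and does not appear in the manuals. By the Spearman-Brown prophecy formula, lengthening a test by a factor of k changes its reliability to

r_k = k r / (1 + (k - 1) r).

Combining three subdomains with reliabilities near .80 into a domain (k = 3) would thus yield 3(.80)/(1 + 2(.80)) ≈ .92, consistent with the domain values in the .90s reported above.]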
On all forms, the Socialization subdomains appear to be the most reliable, whereas the Receptive subdomain and the Motor subdomains generally have fair reliability. Finally, the internal consistency reliabilities of the optional Maladaptive Behavior subscales and Index are mostly in the range of good to excellent (.70s to low .90s) on the Survey Forms and the Expanded Form; the smaller number of items on these scales probably contributes to the slightly lower reliability coefficients.

The adjusted test-retest reliability coefficients for the TRF (n = 135 students), Survey Forms (n = 414 respondents), and Expanded Form (n = 220 respondents) are generally good to excellent for the Adaptive Behavior Composite (low .80s to mid-.90s). Similarly, most of the average domain test-retest reliability coefficients were in the good to excellent range. The two exceptions demonstrated only fair reliability (.70s): the Expanded Form for ages 0-2, where rapid developmental changes are most likely to occur over short periods of time, and the Survey Form for the teenage group. Average subdomain test-retest reliabilities ranged from fair to excellent, and test-retest reliabilities for the Maladaptive Behavior subscales and Index are mostly in the good to excellent range (.70s to .90s).

Adjusted interrater reliability coefficients for the Adaptive Behavior Composite were mostly moderately strong (.70s to .80s) on the forms completed by parents/caregivers (Survey Forms and Expanded Form) and modest (.40s to .60s) for the TRF. Average interrater reliabilities for the domains and subdomains were similar or slightly lower for each form. The relatively low interrater reliability scores on the TRF are explained by the fact that different teachers have varying perceptions and interpretations of students' adaptive behavior. Mostly fair to good interrater reliability scores (.60s to .80s) were obtained on the Maladaptive Behavior subscales and Index, except for somewhat lower coefficients on the Survey Forms with the adult population (.39 to .69 for the Parent/Caregiver Rating Form; .59 to .77 for the Survey Interview Form).

Evidence for the validity of the Vineland-II focuses on a variety of areas, including test content, test structure, diagnostic accuracy, and concurrent validity with related measures. Content validity evidence demonstrates a link between test content and the theoretical structure of adaptive behavior, as defined by groups such as the American Association on Mental Retardation, the American Psychological Association, and the National Academy of Sciences. In addition, the content of the Vineland-II appears to be representative of the domain of adaptive behavior, demonstrates developmental acquisition of behaviors and skills with age, and has items that are consistent with their assigned subdomains and domains. In terms of test structure, comparisons among the subdomain, domain, and Adaptive Behavior Composite scores on all forms indicate moderately high correlations, which support the strong influence of overall adaptive behavior on the individual domains and subdomains. Correlations between subdomains within a domain tend to be slightly stronger than correlations with subdomains in other domains; however, the overall modest subdomain clustering illustrates the interrelatedness of adaptive behaviors across domains. Confirmatory factor analysis results on the TRF and Survey Forms indicate that a three- to four-factor model fits the data reasonably well, with a few exceptions.
Diagnostic accuracy was determined by considering whether scores on the Vineland-II systematically corresponded to a variety of clinical groups, including those with diagnoses of mental retardation, autism, ADHD, emotional/behavioral disturbance, learning disability, and visual/hearing impairments. All three forms accurately differentiated between clinical and nonclinical populations and, in some cases, reliably distinguished between clinical groups and among levels of severity within a diagnosis (e.g., mental retardation). The Vineland-II also identified patterns of behavior typical of individuals with milder diagnoses.

Evidence for the concurrent validity of the Vineland-II was presented by comparing the instrument with related measures. As expected, the adjusted correlations between each of the three Vineland-II forms and the corresponding Vineland Adaptive Behavior Scales form are moderately high, most in the .80s and .90s. Correlations among the different forms of the Vineland-II were also calculated. The relationship between the two forms completed by parents/guardians (Survey Form and Expanded Form) was moderate (.68 to .80 for the Adaptive Behavior Composite). Weaker correlations were found between the TRF and the Survey Form (.32 to .48 for the Adaptive Behavior Composite). However, this pattern of relationship between the teacher and parent forms of the Vineland-II is very similar to that of the earlier Vineland ABS and likely reflects the fact that the respondents observe the students' behavior in significantly different environments.

Scores from the Vineland-II were also correlated with scores from the Adaptive Behavior Assessment System, Second Edition (ABAS-II; Harrison & Oakland, 2003). The relationship between the two measures varied, depending on the Vineland-II form, the age group, and the type of score (subdomain, domain, or overall composite). The TRF had moderate correlations (.52 to .70) with the ABAS-II at the level of overall composite scores; the strongest areas of similarity were Communication and Socialization. Moderate to moderately strong correlations were also found between the Survey Forms and the ABAS-II composite scores (.69 to .78), though correlations at the subdomain and related skill areas were quite variable, ranging from .27 (Health/Safety and Personal for ages 17-74) to .95 (Communication and Expressive Communication for ages 17-74). The most variable correspondence between the Vineland-II and the ABAS-II composite scores was found on the Expanded Form, where correlations ranged from a modest .39 for ages 0-5 to a moderately strong .73 for ages 17-82. Subdomain correlations for the youngest age group were generally weak (.19 to .41) but increased with each age group.

Though adaptive behavior and intelligence are different constructs, the Vineland-II was correlated with the WISC-III (all forms), the WISC-IV (TRF), and the WAIS-III (Survey and Expanded Forms). The resulting low correlations between the Adaptive Behavior Composite and the Verbal, Performance, and Full Scale IQ scores from these measures confirm that the two types of instruments contribute different kinds of information to the assessment process. Finally, the Vineland-II Survey Forms and TRF were compared with the Behavior Assessment System for Children, Second Edition (BASC-2; Reynolds & Kamphaus, 2004).
Even though the BASC-2 is mainly a measure of maladaptive behaviors and the Vineland-II focuses on adaptive behaviors, there are some areas where scores on the two instruments are closely related. The Maladaptive Behavior Index of the Survey Forms demonstrated a moderately strong correlation with the Behavioral Symptoms Index of the BASC-2 Parent Rating Scales. Furthermore, the TRF Adaptive Behavior Composite demonstrated a moderate to moderately strong negative correlation (-.60 to -.78) with the Behavioral Symptoms Index of the BASC-2 Teacher Rating Scales for individuals ages 6-18. Other patterns of negative correlations between the measures support the construct of adaptive behavior as conceptualized by the Vineland-II.

COMMENTARY. The Vineland-II is a comprehensive and carefully designed set of rating/interview forms for assessing the adaptive functioning of individuals ages 0 to 90. The theoretical model represented in the Vineland-II is thoroughly described and well supported by previous and current research. The three manuals include very thorough guidelines for administration, scoring, and interpretation, as well as detailed descriptions of test development, standardization, and technical characteristics. The earlier version, the Vineland Adaptive Behavior Scales, was a well-respected instrument, and this revision will likely maintain the same strong reputation. The strengths of this instrument include a robust standardization sample, excellent internal consistency and test-retest reliability, and solid evidence for content, concurrent, and construct validity. In addition, the expanded age range of the Vineland-II allows for assessment of age-related changes in adaptive functioning in elderly individuals. Furthermore, the greater item density at the youngest ages increases the possibility that developmental delays can be identified early and appropriate interventions implemented when they are most likely to lead to improved functioning.

One weakness of the Vineland-II is its relatively weak interrater reliability, especially on the TRF and the Maladaptive Behavior Index of the Survey Forms. However, this is a potential weakness of any rating scale that requires respondents to quantify their observations of an individual based on personal experience and varying expectations. Though the interrater reliability of the Vineland-II is lower than preferred, it is comparable to the reported interrater reliabilities of other adaptive behavior measures (Sattler & Hoge, 2006). Another potential weakness of the Vineland-II is inconsistency in the range of scores available by age: the highest scaled scores available vary by age, making it difficult to compare adaptive behavior skills over time for individuals with above-average skills (Sattler & Hoge, 2006). Realistically, this is not likely to be a major problem, in that the Vineland-II is not typically administered to individuals who exhibit higher than average functioning.

SUMMARY. The Vineland-II is an individually administered measure of adaptive behavior with several different rating and interview forms for respondents (parents, caregivers, and teachers) who are very familiar with the individual. The Vineland has a long history of effective use in identifying individuals with adaptive behavior deficits and in intervention planning, and the recently revised Vineland-II shows promise in continuing this tradition.
Even though it is probably one of the better adaptive behavior measures available, users should be cognizant of the inherent limitations of any instrument that relies solely on indirect measures of behavior, such as ratings or interviews of third-party respondents.

REVIEWER'S REFERENCES
Doll, E. A. (1935). A genetic scale of social maturity. The American Journal of Orthopsychiatry, 5, 180-188.
Doll, E. A. (1965). Vineland Social Maturity Scale. Circle Pines, MN: American Guidance Service, Inc.
Harrison, P. L., & Oakland, T. (2003). Adaptive Behavior Assessment System (2nd ed.). San Antonio, TX: The Psychological Corporation.
Reynolds, C. R., & Kamphaus, R. W. (2004). Behavior Assessment System for Children (2nd ed.). Circle Pines, MN: AGS Publishing.
Sattler, J. M., & Hoge, R. D. (2006). Assessment of children: Behavioral, social, and clinical foundations (5th ed.). La Mesa, CA: Jerome Sattler Publisher, Inc.
Sparrow, S. S., Balla, D. A., & Cicchetti, D. V. (1984). Vineland Adaptive Behavior Scales. Circle Pines, MN: American Guidance Service, Inc.

Review of the Vineland Adaptive Behavior Scales, Second Edition by KEITH F. WIDAMAN, Professor and Chair, Department of Psychology, University of California, Davis, CA:

DESCRIPTION. The Vineland Adaptive Behavior Scales, Second Edition (Vineland-II) is an individually administered instrument for assessing the adaptive behaviors of persons between the ages of 0 and 90. The adaptive behavior domain is conceptualized as encompassing the four broad dimensions of Communication, Daily Living Skills, Socialization, and Motor Skills, and the Vineland-II also includes an assessment of maladaptive, or problem, behaviors. The Vineland-II is available in several forms. The Survey Interview Form and the Parent/Caregiver Rating Form contain the same set of items; the former is completed by a trained rater based on an interview of an informant (e.g., a parent) who knows well the person whose adaptive behaviors are being assessed, whereas the latter is completed by the parent or caregiver of the person being assessed. The Expanded Interview Form contains a more comprehensive set of items and ratings and provides additional information for educational or treatment programming. The Teacher Rating Form yields scores on the same dimensions of adaptive behavior but focuses on behaviors that occur in classrooms and includes additional items related to basic academic functioning.

Administration instructions are very clearly stated and easy to follow. The Vineland-II takes approximately 20 to 60 minutes to administer, depending on the adaptive levels exhibited by the person assessed, and an additional 15 to 30 minutes are needed to hand score the instrument. The instructions for scoring are nicely formatted and easy to follow, and the manual has several examples of completed protocols with annotations showing how discontinue rules were invoked, how to calculate raw scores on each scale, and how standardized scores are obtained from scale raw scores. The Vineland-II also has computerized scoring programs that make the computation and interpretation of scores much easier and avoid the many problems that arise in hand scoring. Standard scores for the four major dimensions are normed to have a mean of 100 and a standard deviation of 15, the same metric used for intelligence test scores.

DEVELOPMENT. Adaptive behavior is one of two major domains to be assessed when determining whether an individual has mental retardation.
That is, professional organizations agree that mental retardation is characterized by subnormal general intellectual functioning accompanied by deficits in adaptive behavior. The original version of the Vineland Adaptive Behavior Scales (Vineland ABS), published in 1984, was the first adaptive behavior instrument standardized on a representative sample of the U.S. population, so that a deviation score cutoff (e.g., a score two standard deviations below the mean or lower) could be employed in parallel to the cutoff that had long been used for intelligence tests. At the time of its initial publication, the Vineland ABS was the premier instrument for assessing adaptive behaviors. The current version of the scale, the Vineland-II, carries a 2005 copyright date.

The revision resulting in the Vineland-II was driven by various factors. New versions of tests typically involve improving item content, culling less relevant items, and creating more discriminating items, and these were salient goals in the development of this revision. But the test developers also wanted to extend the age range for which the instrument is valid, from 0 through 18 years to 0 through 90 years. They were also interested in increasing the number of items, and therefore the test precision, at very young ages and at the lower levels of each of the scales, to increase the diagnostic accuracy of scores. In pursuing these goals, the test authors increased the number of adaptive behavior items across the four dimensions on the Survey Interview Form by almost 40%, from 261 to 359 items, and substantially increased the number of items on the remaining forms as well. In addition to its use in the diagnosis of mental retardation, the Vineland-II is very useful in differential diagnosis, in distinguishing among developmental disabilities such as autism, and in providing information relevant to programming for individuals with varying levels of adaptive behavior.

TECHNICAL. The Survey Interview and Parent/Caregiver Rating Forms (hereinafter, Survey/Parent forms) of the Vineland-II, for assessing individuals between the ages of 0 and 90 years, were standardized on a nationally representative sample of 3,695 persons. The Teacher Rating Form (hereinafter, Teacher form), used for assessing students between the ages of 3 and 18 years, was standardized on a sample of 2,570 individuals. Both norming samples were quite comparable to the U.S. population on key dimensions, such as geographic region, sex, race or ethnicity, and mother's education.

Several different types of reliability coefficients were reported that reflect different psychometric properties of the scale scores. Internal consistency reliabilities tended to be in the high .80s for the three primary domains of Communication, Daily Living Skills, and Socialization on the Survey/Parent forms (range: .84-.93); comparable reliabilities were notably higher on the Teacher form (median reliability of .95, range: .93-.97). The resulting standard errors of measurement (SEMs) were around 4.5 to 5 points on the Survey/Parent forms and rather smaller (between 2.6 and 4.0 points) on the Teacher form. Test-retest correlations across an approximately 3-week interval averaged in the middle to upper .80s for both the Survey/Parent and Teacher forms. Interinterviewer (or interrater) reliabilities averaged around .75 for the Survey/Parent forms and noticeably lower, around .55, for the Teacher form.
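[Editor's note: the following arithmetic is an illustration and does not appear in the manual. SEMs of this size follow directly from the classical formula

SEM = SD \sqrt{1 - r_{xx}}.

With SD = 15 and a reliability of .90, SEM = 15\sqrt{.10} ≈ 4.7, matching the 4.5- to 5-point figure for the Survey/Parent forms; with the Teacher form's median reliability of .95, SEM = 15\sqrt{.05} ≈ 3.4. A 95% confidence interval of the kind reported for Vineland-II scores is then roughly the observed score ± 1.96 × SEM, or about ±9 points on the Survey/Parent forms.]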
Not unexpectedly, given the positive correlations among scores on the three primary domains, all reliability statistics for the Adaptive Behavior Composite (an overall score based on the domain scores) were even more favorable, with higher reliabilities and lower SEMs.

Information relevant to the validity of the Vineland-II scale scores was documented in numerous ways, and only a sampling of validity information can be provided in the context of this short review. One form of validity is the factorial validity of a test. Here, the test developers fit confirmatory factor models to determine whether multiple factors representing each of the postulated domains could be confirmed and whether a single higher-order factor successfully explained the correlations among the first-order factors. Without fail, models with multiple first-order factors and a single higher-order factor fit the data well. However, the high correlations among the first-order factors (generally ranging above .80 at the latent variable level) may lead some to question their discriminant validity. Several group difference comparisons were also made, such as comparing scores for persons with mild, moderate, or severe mental retardation with scores obtained for individuals from a nonclinical sample. These comparisons showed that the differences in performance between the nonclinical sample and the samples with mental retardation were as expected, with larger differences exhibited by persons with more severe levels of mental retardation. Another type of validity information is the pattern of convergent and discriminant correlations with other measures. The Vineland-II domain scores tended to show moderately strong convergent correlations with comparable scales from the Adaptive Behavior Assessment System, Second Edition, with correlations averaging around .70 for similar scales. With regard to discriminant validity, Vineland-II domain scores tended to correlate at rather low levels with intelligence test scores from the Wechsler tests, with correlations generally falling in the range of .10 to .35 for scores from the Survey/Parent forms and in the range of .05 to .50 for the Teacher form.

COMMENTARY. The Vineland-II is a new and improved version of the original Vineland ABS, which was the leader among instruments for assessing adaptive behavior upon its publication. Administration procedures are well described, and the methods of obtaining raw scores and scaled scores are clearly described and easy to perform. In addition to hand scoring, the Vineland-II comes with computer programs to ensure accurate computation of all scores. The norming samples for all forms are impressive in size and in representativeness of the U.S. population. Moreover, the reliability and validity information provided for the Vineland-II is fairly comprehensive. One issue that is evident in the test materials is the difficulty of accurately assessing high levels of adaptive behavior or skill. This implies that the Vineland-II is more accurate at distinguishing among persons scoring at rather low levels than among persons at high levels of adaptive skill. Given the importance of measures of adaptive behavior in the assessment of clinical syndromes such as mental retardation and autism, this is not a serious problem, but users should expect that scores at high levels of adaptive functioning will have poorer precision (i.e., higher SEMs) than scores at low levels.
SUMMARY. The Vineland-II was designed to be an easily used, standardized measure of key domains of adaptive behavior (Communication, Daily Living Skills, and Socialization) that play a prominent role in the diagnosis of mental retardation and other developmental disabilities. The instrument clearly meets its goals of ease of use, clear procedures for the calculation of both raw and scaled scores, and clear and comprehensive information regarding its reliability and validity. The Vineland-II deserves to be considered among the best measures of adaptive behavior currently available, and the use of this instrument for making high-stakes decisions regarding individuals is recommended.
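[Editor's note: a closing arithmetic illustration, not a statement from the test manual. On the standard-score metric described above (mean 100, standard deviation 15), the deviation cutoff mentioned in the Development section, a score two standard deviations below the mean or lower, corresponds to

100 - 2(15) = 70,

so an Adaptive Behavior Composite of 70 or below would meet that conventional criterion, paralleling the long-standing cutoff used with intelligence tests.]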
Beck Anxiety Inventory [1993 Edition] By: Beck, Aaron T., Steer, Robert A, 19930101, Vol. 13 Mental Measurements Yearbook Review of the Beck Anxiety Inventory by E. THOMAS DOWD, Professor of Psychology, Kent State University, Kent, OH: Aaron T. Beck, M.D., and his associates have designed well-constructed and widely-used tests for years, with the twin virtues of simplicity and brevity. The Beck Anxiety Inventory (BAI) is no exception. At 21 items, it is certainly short and easy to understand as well. Each of the 21 items represents an anxiety symptom that is rated for severity on a 4-point Likert scale (0-3), ranging from Not at all to Severely; I could barely stand it. The scoring is easy, as the points for each item are simply added to form the total score. The test kit I received contained a brief manual, several answer sheets for the BAI as well as for the BDI and the BHS, and a complete computer-scoring package. Although not stated as such, I suspect that the instrument can be ordered without the computer scoring package. The manual, which is quite short, appears to have been written for a clinician rather than a researcher. The guidelines for administration and scoring are quite adequate, as are the data on reliability and validity. But the description of the scale development is inadequate, as it refers the reader to the original Beck, Epstein, Brown, and Steer (1988) article. In addition, the descriptions of other studies on the instrument are unacceptably brief. The BAI was originally developed from a sample of 810 outpatients of mixed diagnostic categories (predominantly mood and anxiety disorders). Two successive factor analyses on different samples then reduced the number of items to 21, with a minimum item-total correlation of .30. The original development appears to have been very well done and is described in detail in Beck et al. (1988). The reliability and validity data are thorough and informative but are based only on three studies: Beck et al. (1988), Fydrich, Dowdall, and Chambless (1990), and Dent and Salkovskis (1986). Of these, the first used a mixed diagnostic group, the second used patients diagnosed with DSM-III-R anxiety disorders, and the third used a nonclinical sample. The manual authors quite appropriately caution the reader the instrument was developed on a psychiatric population and should be interpreted cautiously with nonclinical individuals. The normative data tables are very thorough and informative, including means, standard deviations, coefficient alpha reliabilities, and corrected item-total correlations for five anxiety diagnostic groups with the highest representation in the sample. This apparently unpublished clinical sample consists of 393 outpatients who were seen at the Center for Cognitive Therapy in Philadelphia between January 1985 and August 1989. Internal consistency reliability coefficients are uniformly excellent, ranging between .85 and .94. Testretest reliability data from Beck et al. (1988) showed a coefficient of .75 over one week. The validity data are quite comprehensive, including content, concurrent, construct, discriminant, and factorial validity. In general, the data show excellent validity, even regarding the difficult problem of untangling anxiety and depression. Especially interesting to this reviewer were the factorial validity data. One factor analysis (Beck et al., 1988) found two factors (r = .56, p < .001) that seemed to reflect somatic and cognitive/affective aspects of anxiety, respectively. 
A cluster analysis on the clinical sample showed four clusters that are labeled Neurophysiological, Subjective, Panic, and Autonomic. The separate clusters showed acceptable reliability, considering the small number of items in each, and discriminant function analyses found some significant differences among the clusters. Two demographic variables, gender and age, appear to be significantly related to anxiety. Women were found to be more anxious than men and younger people were found to be more anxious than older people. Because of this, the authors caution users to adjust scores somewhat on interpretation, though how much is a little vague. As I mentioned earlier, a computer-scoring package came with the BAI, including large and small disks and a very detailed instruction manual. According to the manual, this package provides three modes of test administration and interpretive profiles for the Beck Depression Inventory (BDI; 13:31), The Beck Hopelessness Scale (BHS; 13:32), the Beck Anxiety Inventory (BAI; 13:30), and the Beck Scale for Suicide Ideation (BSSI; 13:33) separately and together. The profile includes clinical group references and a history of the patient's scores on previous administrations, in addition to data on that test. However, the package I received (as indicated in the manual and the README file) included only the separate profile reports and the BDI, BAI, and BHS. A note at the end of the manual suggested that some of this material would not be available until December 1992, so apparently I received an older version of the computerscoring package. [Editor's note: A 1994 version of the computer-scoring package is now available and produces integrative narrative reports.] Scoring of each test is not free once the package has been purchased. Credits must be purchased for the Use Counter (included) that is installed between the computer and the printer. In summary, the Beck Anxiety Inventory is another of the useful instruments designed by Beck and his colleagues. There are only a few deficits. First, the manual is too brief to give as much information as many users might like (though the computer scoring manual is very comprehensive). Second, there have been too few studies conducted on the BAI, with the result that it rests on an uncomfortably small data base. This is particularly apparent for the gender and age differences. The clusters of anxiety disorders identified thus far appear to be especially promising and further research should be conducted here. Clinicians, however, will find this a very useful test, especially when combined with the other Beck instruments into a comprehensive computer-scored interpretive profile. REVIEWER'S REFERENCES Dent, H. R., & Salkovskis, P. M. (1986). Clinical measures of depression, anxiety and obsessionality in nonclinical populations. Behavioral Research and Therapy, 24, 689-691. Beck, A. T., Epstein, N., Brown, G., & Steer, R. A. (1988). An inventory for measuring clinical anxiety: Psychometric properties. Journal of Consulting and Clinical Psychology, 56, 893-897. Fydrich, T., Dowdall, D., & Chambless, D. L. (1990, March). Aspects of reliability and validity for the Beck Anxiety Inventory. Paper presented at the National Conference on Phobias and Related Anxiety Disorders, Bethesda, MD. Review of the Beck Anxiety Inventory by NIELS G. WALLER, Associate Professor of Psychology, University of California, Davis, CA: While describing the development of the Beck Anxiety Inventory, Beck et al. 
(Beck, Epstein, Brown, & Steer, 1988) note that a number of studies have reported correlations greater than .50 between widely used anxiety and depression scales. Similar findings have been reported by others (Lovibond & Lovibond, 1995), and formal reviews of this topic (Clark & Watson, 1991; Dobson, 1985) conclude that anxiety and depression scales frequently correlate between .40 and .70. No one expects these scales to be uncorrelated, because of the high comorbidity rates of anxiety and mood disorders (Maser & Cloninger, 1990). Yet many researchers (Riskind, Beck, Brown, & Steer, 1987) feel uncomfortable when measures of conceptually distinct constructs correlate as highly as .50.

The Beck Anxiety Inventory (BAI; Beck & Steer, 1990; Beck, Epstein, Brown, & Steer, 1988) is a brief self-report scale that was designed "to measure symptoms of anxiety which are minimally shared with those of depression" (Beck & Steer, 1990, p. 1). The 21 symptoms on the BAI were selected from three existing measures: (a) the Anxiety Check List (Beck, Steer, & Brown, 1985), (b) the PDR Check List (Beck, 1978), and (c) the Situational Anxiety Check List (Beck, 1982). The item pools of these scales were combined and winnowed using Jackson's (1970) method of scale construction. After eliminating identical or highly similar items, Beck and his colleagues used factor analysis to cull items for the final scale. Scales developed by this method often have high reliabilities, and coefficient alpha (Cortina, 1993) for the BAI is typically in the mid-.90s (Beck, Epstein, Brown, & Steer, 1988; Jolly, Aruffo, Wherry, & Livingston, 1993; Kumar, Steer, & Beck, 1993).

The BAI items are measured on a 4-point Likert scale that ranges from Not at all (0 points) to Severely; I could barely stand it (3). The instructions for the test ask subjects to "indicate how much you have been bothered by each symptom during the PAST WEEK, INCLUDING TODAY, by placing an X in the corresponding space in the column next to each symptom" (manual, p. 4). Notice that these instructions focus on a 1-week time frame; consequently, the BAI should measure state anxiety better than trait anxiety. Beck et al. (1988) report a 1-week test-retest correlation of .75 for the BAI, whereas Creamer, Foran, and Bell (1995) report a 7-week correlation of .62.

The factor structure of the BAI has been investigated in clinical (Beck et al., 1988; Hewitt & Norton, 1993; Kumar et al., 1993) and nonclinical (Creamer et al., 1995) samples. Many studies have found two correlated dimensions (r = approximately .55) that have been interpreted as measuring somatic (example markers: Feelings of choking, Shaky) and subjective (example markers: Fear of the worst happening, Fear of losing control) symptoms of anxiety. A similar structure emerged in a recent factor analysis of data from the computer-administered BAI (Steer, Rissmiller, Ranieri, & Beck, 1993). The underlying structure of the BAI has also been investigated with cluster analysis. The BAI manual authors report that a centroid cluster analysis of clinical data uncovered four symptom clusters representing (a) neurophysiological, (b) subjective, (c) panic, and (d) autonomic symptoms of anxiety. Interestingly, a similar structure was uncovered in a recent factor analysis of the scale (Osman, Barrios, Aukes, Osman, & Markway, 1993).

Regarding the cluster solution, Beck and Steer (1990) have suggested the cluster subscales "may assist the examiner in making a differential diagnosis" (p.
6) and that "profile analyses of BAI subscales appear promising" (p. 18). In my opinion, neither of these statements is supported by the data. The BAI subscales are highly correlated; consequently, subscale profiles will almost certainly be unreliable for most test takers. For example, from data reported in Table 5 of the BAI manual, it is easy to calculate the reliabilities for the cluster-subscale difference scores (a worked sketch of this calculation appears after the reference list below). For some of these scores the reliabilities are as low as .50. In other words, it is difficult to obtain reliable profiles when scales are composed of only four or five items.

Because of the goals of the BAI authors, it is appropriate to ask how strongly the BAI correlates with popular depression scales, such as the Beck Depression Inventory (BDI; Beck & Steer, 1993; 13:31). In clinical samples, correlations between the BAI and BDI have ranged from .48 to .71 (Beck et al., 1988; Fydrich, Dowdall, & Chambless, 1992; Hewitt & Norton, 1993; Steer, Ranieri, Beck, & Clark, 1993). A Bayesian (Iversen, 1984, pp. 41-44) posterior estimate for these data suggests that r = .587, with a 95% probability that the population correlation lies between .545 and .626. In nonclinical samples (Creamer et al., 1995; Dent & Salkovskis, 1986), correlations between the BAI and BDI have ranged from .50 to .63. The Bayesian estimate of r for these data is .591, with a 95% probability that the population correlation lies between .548 and .631.

In conclusion, it appears that Beck was not successful in developing an anxiety scale with high discriminant validity. Nevertheless, he did develop a highly reliable scale that can be administered in 5 to 10 minutes. Thus, the BAI appears to be a useful addition to the growing number of clinical anxiety measures.

REVIEWER'S REFERENCES
Jackson, D. N. (1970). A sequential system for personality scale development. In C. D. Spielberger (Ed.), Current topics in clinical and community psychology (Vol. 2, pp. 61-96). New York: Academic Press.
Beck, A. T. (1978). PDR Check List. Philadelphia: University of Pennsylvania, Center for Cognitive Therapy.
Beck, A. T. (1982). Situational Anxiety Check List (SAC). Philadelphia: University of Pennsylvania, Center for Cognitive Therapy.
Iversen, G. R. (1984). Bayesian statistical inference. Beverly Hills, CA: Sage.
Beck, A. T., Steer, R. A., & Brown, G. (1985). Beck Anxiety Check List. Unpublished manuscript, University of Pennsylvania.
Dobson, K. S. (1985). The relationship between anxiety and depression. Clinical Psychology Review, 5, 307-324.
Dent, H. R., & Salkovskis, P. M. (1986). Clinical measures of depression, anxiety, and obsessionality in non-clinical populations. Behaviour Research and Therapy, 24, 689-691.
Riskind, J. H., Beck, A. T., Brown, G., & Steer, R. A. (1987). Taking the measure of anxiety and depression: Validity of reconstructed Hamilton scales. Journal of Nervous and Mental Disease, 175, 475-479.
Beck, A. T., Epstein, N., Brown, G., & Steer, R. A. (1988). An inventory for measuring clinical anxiety: Psychometric properties. Journal of Consulting and Clinical Psychology, 56, 893-897.
Beck, A. T., & Steer, R. A. (1990). Manual for the Beck Anxiety Inventory. San Antonio, TX: The Psychological Corporation.
Maser, J. D., & Cloninger, C. R. (Eds.). (1990). Comorbidity of mood and anxiety disorders. Washington, DC: American Psychiatric Press.
Clark, L. A., & Watson, D. (1991). Theoretical and empirical issues in differentiating depression from anxiety. In J. Becker & A.
Kleinman (Eds.), Psychosocial aspects of depression. Hillsdale, NJ: Erlbaum.
Fydrich, T., Dowdall, D., & Chambless, D. L. (1992). Reliability and validity of the Beck Anxiety Inventory. Journal of Anxiety Disorders, 6, 55-61.
Beck, A. T., & Steer, R. A. (1993). Beck Depression Inventory manual. San Antonio, TX: The Psychological Corporation.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98-104.
Hewitt, P. L., & Norton, G. R. (1993). The Beck Anxiety Inventory: A psychometric analysis. Psychological Assessment, 5, 408-412.
Jolly, J. B., Aruffo, J. F., Wherry, J. N., & Livingston, R. (1993). The utility of the Beck Anxiety Inventory with inpatient adolescents. Journal of Anxiety Disorders, 7, 95-106.
Kumar, G., Steer, R. A., & Beck, A. T. (1993). Factor structure of the Beck Anxiety Inventory with adolescent psychiatric inpatients. Anxiety, Stress, and Coping, 6, 125-131.
Osman, A., Barrios, F. X., Aukes, D., Osman, J. R., & Markway, K. (1993). The Beck Anxiety Inventory: Psychometric properties in a community population. Journal of Psychopathology and Behavioral Assessment, 15, 287-297.
Steer, R. A., Rissmiller, D. J., Ranieri, W. F., & Beck, A. T. (1993). Structure of the computer-assisted Beck Anxiety Inventory with psychiatric inpatients. Journal of Personality Assessment, 60, 532-542.
Steer, R. A., Ranieri, W. F., Beck, A. T., & Clark, D. A. (1993). Further evidence for the validity of the Beck Anxiety Inventory with psychiatric outpatients. Journal of Anxiety Disorders, 7, 195-205.
Creamer, M., Foran, J., & Bell, R. (1995). The Beck Anxiety Inventory in a non-clinical sample. Behaviour Research and Therapy, 33, 477-485.
Gillis, M. M., Haaga, D. A. F., & Ford, G. T. (1995). Normative values for the Beck Anxiety Inventory, Fear Questionnaire, Penn State Worry Questionnaire, and Social Phobia and Anxiety Inventory. Psychological Assessment, 7, 450-455.
Lovibond, P. F., & Lovibond, S. H. (1995). The structure of negative emotional states: Comparison of the Depression Anxiety Stress Scales (DASS) with the Beck Depression and Anxiety Inventories. Behaviour Research and Therapy, 33, 335-343.
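Worked sketch (referenced above): Waller's warning about unreliable subscale profiles follows from the standard classical-test-theory formula for the reliability of a difference score, r_DD = ((r_xx + r_yy)/2 - r_xy) / (1 - r_xy). The manual's Table 5 values are not reproduced in the review, so the numbers below are purely illustrative.

    # Reliability of a difference score D = X - Y, assuming X and Y have
    # equal variances (standard classical-test-theory result).
    def difference_score_reliability(r_xx, r_yy, r_xy):
        return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)

    # Illustrative values only (not the BAI manual's actual Table 5 data):
    # two short subscales with reliabilities of .80 that correlate .60.
    print(round(difference_score_reliability(0.80, 0.80, 0.60), 2))  # -> 0.5

With subscale reliabilities of .80 and an intercorrelation of .60, the difference score is only half reliable, which is the order of magnitude Waller reports for the BAI cluster subscales.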
Novaco Anger Scale and Provocation Inventory By: Novaco, Raymond W, 20030101, Vol. 17 Mental Measurements Yearbook

Review of the Novaco Anger Scale and Provocation Inventory by ALBERT BUGAJ, Professor of Psychology, University of Wisconsin-Marinette, Marinette, WI:

DESCRIPTION. The Novaco Anger Scale and Provocation Inventory (NAS-PI), designed for individual assessment, outcome evaluation, and research purposes, consists of the 60-item Novaco Anger Scale (NAS) and the 25-item Provocation Inventory (PI). The NAS, which focuses on an individual's experience of anger, yields an overall scale score and scores on Cognitive, Arousal, Behavioral, and Anger Regulation subscales. High NAS scores may indicate a need for clinical intervention, although they may also reflect an effort by the test-taker to "look tough." The PI, intended to assess the types of situations that lead to anger in five content areas such as "disrespectful treatment," yields an overall scale score. Results of the PI can elicit discussions of settings that provoke strong anger in a client; discussion of items leading to low scores can reveal how the client uses effective coping skills. A trained technician can administer the NAS-PI in individual or group settings, although only individuals with clinical training in psychological testing should interpret the results. The NAS-PI is a paper-and-pencil test whose responses are hand-scored, although the test manual indicates a computerized version is available. A formula developed by Barrett (2001), provided in the test manual, allows the values of missing responses to be estimated when three or fewer items are left incomplete (a sketch of one plausible proration appears below). The test manual also includes a method for checking for inconsistent response patterns.

DEVELOPMENT. A theory of anger devised by Novaco (1977) formed the basis of the preliminary set of 101 items for the NAS-PI. A sample of 171 undergraduate students responded to these items. Results of this test administration, together with interviews of 45 hospitalized patients, led to a revised instrument containing 88 items. Two additional waves of testing produced the final instrument.

TECHNICAL. The standardization sample of the NAS-PI consisted of 1,546 individuals (ages 9 to 84) from nonclinical settings. The manual indicates slight underrepresentation in the sample of males, individuals of minority ethnic backgrounds, and those with lower levels of education. Statistical examination of the scores (Cohen's d) indicated that although scores of men and women were comparable, scores of younger test-takers (ages 9 to 18) and adults (19 and older) differed significantly; the manual thus provides separate norms for the two age groups. The manual notes that African Americans' scores were higher than the average of the standardization sample on some scales, a result also found in other research (cited in the test manual) with African American and Hispanic samples. Individuals with lower educational levels also obtained scores departing from the average. The test manual suggests further research is necessary to uncover why these groups depart from the norm.

The NAS and PI exhibit high levels of internal consistency. The alpha coefficient for the NAS total score in the standardization sample was .94, with subscale values ranging from .76 for the Anger Regulation subscale to .89 for the Behavior subscale. Coefficient alpha for the PI total score was .95, and ranged from .73 to .84 for its subscales.
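As promised above, here is a sketch of one plausible proration for missing responses. Barrett's (2001) formula is not reproduced in the review, so the function below is a generic proration (the observed item mean is effectively imputed for missing items), offered as an assumption rather than as the manual's actual method; the function name and inputs are hypothetical.

    def prorate_total(responses, max_missing=3):
        # `responses` is a list of item scores, with None marking an
        # unanswered item. Generic proration: scale the observed sum by
        # the ratio of total items to answered items. This is an assumed
        # stand-in for Barrett's (2001) formula, not the NAS-PI manual's
        # published procedure.
        answered = [r for r in responses if r is not None]
        n_missing = len(responses) - len(answered)
        if n_missing > max_missing:
            raise ValueError("too many missing items to prorate")
        return sum(answered) * len(responses) / len(answered)

    pi_responses = [2, 3, 1, 4] * 6 + [None]  # 25 PI items, one unanswered
    print(prorate_total(pi_responses))        # -> 62.5

The three-item ceiling mirrors the manual's rule that values may be estimated only when three or fewer items are left incomplete.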
Median test-retest reliability over a 2-week period for a group of 27 individuals from the standardization sample was .78, ranging from .47 on the Cognitive subscale to .82 on the PI total score. The test author acknowledges the small size of this sample. Higher test-retest reliabilities resulted from studies (using larger samples) of hospitalized inpatients in California and Scotland and of Canadian prison inmates.

The test manual cites several studies that examined the concurrent validity of the NAS-PI using samples from clinical and correctional populations. For example, in a study involving 141 male and female psychiatric patients in California, the NAS total score was strongly correlated with the total score on the Buss-Durkee Hostility Inventory (r = .82), the Caprara Scales of Irritability (r = .78) and Rumination (r = .69), the Cook-Medley Hostility Scale (r = .68), and the STAXI Trait Anger Scale (r = .84). With regard to the PI, the test author concludes that although the PI total score is closely related to other measures of anger, specific content areas of the PI "seem to have some selective relationships with other measures" that are difficult to interpret (manual, p. 38), and calls for further research in this area. More problematic, therapist ratings of the severity of past and current offenses committed by a sample (n = 59) of juvenile delinquents in a residential treatment facility were for the most part unrelated to NAS-PI scores. However, the "Rated Anger Level" of 39 paroled sex offenders was related to NAS total scores (r = .42) and the Behavior subscale (r = .51). The Behavior subscale was also related to therapists' ratings of the parolees' offense history (r = .45), appropriateness for participation in anger management groups (r = .33), and current offense severity (r = .29).

A number of intercorrelations within the NAS and PI subscales proved to be moderate to high in the standardization sample, with those between subscales of the NAS and PI generally lower than those within the two scales. Factor analysis of the NAS resulted in three factors, each consisting of items from at least two subscales. Factor analysis of the PI resulted in five factors. Another factor analysis (n = 1,101 civil commitment patients) used a different set of Cognitive items and did not include Regulation items.

In one study of predictive validity, NAS (r = .46) and PI (r = .43) total scores and several subscales were found to be predictive of STAXI State Anger. In a retrospective analysis, the NAS total was predictive of hospitalized patients' number of convictions for violent crimes. In a study of 1,100 discharged patients, the NAS was predictive of violence during the first 20 weeks after discharge and at 1 year. Although several other studies are supportive of the NAS, homicidal patients in the standardization sample obtained higher Irritability scores than the "normal" standardization subsample yet reported higher Anger Regulation. Juvenile delinquents in the group reported poorer Anger Regulation but, problematically, less Anger Provocation. A number of groups (Homicide Perpetrators in Psychiatric Treatment, Incarcerated Sex Offenders, and Juvenile Delinquents in Residential Care) scored higher on the Crowne-Marlowe Social Desirability Scale than did other groups. Three studies reported in the test manual examined the NAS-PI as a measure of anger treatment outcomes, with positive results.
COMMENTARY. Numerous studies (only some of which are referred to in this review) attest to the concurrent validity, as well as the test-retest reliability, of the NAS-PI. However, as the test author intimates, the test is not without problems. One must question the suitability of the NAS-PI for use with minority populations, as data indicate that African Americans and Hispanics score differently than the normative population. The same may be said for individuals with lower levels of education. Although the test author states that most people with a fourth-grade reading ability can read the test, there is no indication of an empirical examination of the readability of the items. One may thus question the use of the NAS-PI with younger or less educated populations. One must also wonder whether lower levels of education or reading ability contribute to the lower scores of younger populations.

On the basis of the factor analysis reported in the test manual, the question arises as to whether the subscale structure of the test is correct. One would expect items belonging to a particular subscale to load on the same factor. This was not the case in the reported factor analysis, where items from two and sometimes three subscales loaded on the same factor. Although this outcome does not detract from the potential worth of the overall scale scores, further examination of the subscale structure of the test is warranted.

The most problematic issue concerning the NAS-PI is social desirability. The test manual concludes with a suggestion that the test administrator obtain an estimate of response bias (for example, through use of the Crowne-Marlowe Social Desirability Scale) when using the NAS-PI in forensic settings. The manual also states that very high scores might indicate an effort to "look bad" (p. 13) or "look tough" (p. 15). If this is the case, one may ask why the test author did not devise scales measuring social desirability and the need to "look tough" during the inception of the scales. None of this is to say the NAS-PI is not a useful test. A good deal of research has gone into determining the psychometric properties of the test with regard to its validity and reliability. A sound and elaborate theory, which should be used to further refine the test's factor structure, forms the basis of the NAS-PI.

SUMMARY. Used with proper caution, the NAS-PI should prove useful in assessing anger in clinical and forensic populations. The test possesses adequate reliability and concurrent and predictive validity. Caution must be taken, however, when the test is used with minority, younger, or less educated populations. Test users must also be wary of social desirability effects and efforts by the test-taker to "look tough."

REVIEWER'S REFERENCES
Barrett, P. (2001). Prorating error in test scores. Retrieved August 11, 2006, from http://www.liv.ac.uk~pbarrett/statistics_corner.htm
Novaco, R. W. (1977). Stress inoculation: A cognitive therapy for anger and its application to a case of depression. Journal of Consulting and Clinical Psychology, 45, 600-608.

Review of the Novaco Anger Scale and Provocation Inventory by GEOFFREY L. THORPE, Professor of Psychology, University of Maine, Orono, ME:

DESCRIPTION. The Novaco Anger Scale and Provocation Inventory (NAS-PI) is a two-part self-report test consisting of 85 items. The NAS component assesses how an individual experiences anger, with items such as "If someone bothers me, I react first and think later."
Respondents rate each of the 60 items of the NAS on a 3-point scale (1 = Never true, 2 = Sometimes true, 3 = Always true), producing four subscale scores (Cognitive, Arousal, Behavior, and Anger Regulation) and a total score. The PI component describes situations that may lead to anger, such as "Being accused of something that you didn't do." Respondents rate each of the 25 items of the PI on a 4-point scale to indicate how angry the situation described would make them feel (1 = Not at all angry, 2 = A little angry, 3 = Fairly angry, 4 = Very angry), producing a single total score. The NAS-PI was designed "to assess anger as a problem of psychological functioning and physical health and to assess therapeutic change" (manual, p. 1). The test author cites a broad range of stress-related health problems and mental disorders in which anger is prominent, and argues that the assessment of anger disposition and its modification is an important task for many healthcare professionals. The materials received from the test publisher consist of a 64-page manual and a package of NAS-PI profile sheets and AutoScore forms. The profile sheets, printed separately for Adolescent (ages 9-18) and Adult (ages 19 and over) respondents, present T-scores and percentiles corresponding to the various subscale and total score ranges.

DEVELOPMENT. The NAS and PI components were developed separately. NAS items are clinically oriented and reflect the guiding theoretical orientation that anger comprises elements of cognition, arousal, and behavior, "linked by feedback and regulation mechanisms and embedded in an environmental context" (manual, p. 21). Items in the Cognitive domain represent the dimensions of justification, suspiciousness, rumination, and hostile attitude; in the Arousal domain, intensity, duration, somatic tension, and irritability; and in the Behavioral domain, impulsive reaction, verbal aggression, physical confrontation, and indirect expression. PI items were selected to assess the intensity of anger elicited by a variety of provocative situations in five content areas: disrespectful treatment, unfairness, frustration, annoying traits of others, and irritations.

An initial set of 101 test items for the NAS-PI was pilot-tested with 171 undergraduate students, who also completed a battery of tests that included other anger inventories. The set was reduced to the 88 items with the best psychometric properties that also most realistically matched the experiences of state hospital patients with severe anger problems. The revised instrument was administered to 142 psychiatric inpatients, a subset of whom provided test-retest reliability data. Further refinements followed, and the final form of the NAS-PI was re-assessed with similar samples of inpatients. A set of 16 item pairs was identified to serve as a rough index of consistent responding, the criterion being that each pair selected showed an intercorrelation of at least .40 (these correlations range from .42 to .66); a hypothetical sketch of such an index follows.
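The review names the criterion for selecting the 16 pairs but not how the pairs are turned into an index, so the sketch below is hypothetical: it assumes the index simply counts pairs whose two answers diverge sharply. The pairings, the divergence threshold, and any flagging cutoff here are illustrative, not the manual's procedure.

    def inconsistency_index(responses, pairs, threshold=1):
        # Count item pairs whose answers differ by more than `threshold`
        # scale points. `pairs` holds 0-based indices of the paired items.
        # Hypothetical reconstruction; the NAS-PI manual's actual scoring
        # of its 16 consistency pairs is not reproduced in the review.
        return sum(1 for i, j in pairs
                   if abs(responses[i] - responses[j]) > threshold)

    nas = [1, 3, 2, 1, 3, 2]                # made-up 3-point NAS answers
    pairs = [(0, 3), (1, 4), (2, 5)]        # made-up item pairings
    print(inconsistency_index(nas, pairs))  # -> 0, i.e., no flagged pairs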
TECHNICAL. The standardization sample consisted of 1,546 respondents, ranging in age from 9 to 84, drawn from public schools, college classrooms, senior centers, religious organizations, and other community settings. In addition to a table of raw score means and standard deviations for the subscales and total scores in the entire standardization sample, the test author (Novaco) provides tabulations of the sample's demographic characteristics (gender, age, ethnic background, socioeconomic status, and geographic region), indicating a fairly close match with data from the U.S. Census of 2000. Novaco presents T-scores for males and females, for racial/ethnic subgroups, for nine age ranges, and for five educational levels in the normative group. He argues that the use of T-scores makes effect sizes immediately apparent, so that T = 56 in one subgroup compared with T = 50 in another reveals a large effect size of .6 in standard deviation units (a worked version of this arithmetic appears at the end of this section). On that criterion, the observed statistically significant difference between males and females on the NAS Behavior subscale (the only scale to produce a significant sex difference) is not clinically meaningful, as the corresponding effect size is only .31.

The internal consistency (alpha) estimates for the standardization sample were very high for the total scores: .94 for the NAS and .95 for the PI. Alpha coefficients for the NAS subscales ranged from .76 to .89. Similar results were obtained for juveniles and adults, and for psychiatric inpatients in California and Scotland. Test-retest reliability estimates in a subsample of the standardization sample were .76 for the NAS and .82 for the PI; these estimates were drawn from a very small subset of 27 respondents tested 2 weeks apart. More compelling are the test-retest correlations of .84 (NAS) and .86 (PI) in 126 California state hospital inpatients over the same 2-week interval.

The construct validity of the NAS was assessed by obtaining its correlations with the Buss-Durkee Hostility Inventory, the STAXI Trait Anger Scale, and two other anger scales, producing coefficients ranging from .69 to .84 in a sample of 141 inpatients. Data from inpatients in a high-security forensic unit in Scotland showed a similar range of intercorrelations of the NAS-PI with other anger measures, but much lower correlations with the Beck Depression Inventory, attesting to the NAS-PI's discriminant validity. A study of 110 male inpatients in a forensic facility for the developmentally disabled compared NAS-PI scores with, among other measures, a ward behavior rating scale, yielding low correlations (e.g., .28 for the NAS total score and .34 for the NAS Cognitive subscale). However, it seems likely that these coefficients were limited by a truncated range of scores in that setting. The NAS-PI correlates modestly with STAXI State Anger scores obtained 2 months later, indicating a level of predictive validity for Novaco's instrument. The manual provides detailed information on factor-analytic studies of the NAS-PI and on its sensitivity to anger treatment outcome. The test-retest reliability, parallel-form reliability, concurrent validity, and discriminant validity of the NAS were found to be satisfactory in 204 male offenders in Canada (Mills, Kroner, & Forth, 1998). This study and many like it exemplify the substantial professional literature that has developed in recent years from Novaco's research on the assessment of anger.
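As noted above, the T-score shortcut is simple arithmetic: T-scores are standardized to a mean of 50 and a standard deviation of 10, so a subgroup difference in T units divided by 10 is an effect size in standard-deviation units. A minimal worked sketch follows; the 53.1 below is a hypothetical T-score chosen only to reproduce the .31 effect size mentioned above.

    def t_to_d(t_a, t_b):
        # T-scores have mean 50 and SD 10, so dividing a T-score
        # difference by 10 yields Cohen's d in SD units.
        return (t_a - t_b) / 10

    print(round(t_to_d(56, 50), 2))    # -> 0.6, the manual's example
    print(round(t_to_d(53.1, 50), 2))  # -> 0.31, matching the Behavior
                                       #    subscale sex difference above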
COMMENTARY. The NAS-PI is a two-part self-report inventory of anger and its components, designed for practicality and ease of use by respondents and examiners. The items were written at a fourth-grade reading level, and the test takes about 25 minutes to complete. Hand-scoring is straightforward and convenient. The test was developed and standardized with community, clinical, and forensic samples, and is offered as an assessment instrument for research, individual assessment, and outcome measurement. The NAS-PI was originally standardized with about 1,500 respondents who were broadly representative of the U.S. population, and many further studies have provided normative data from clinical and forensic populations in diverse geographic regions. The test manual's extensive tabulation of norms and detailed appraisal of the NAS-PI's psychometric properties confirm that the instrument is psychometrically sound and that its author has reached his intended goals.

Readers familiar with Novaco's early work will recall the original Novaco Anger Scale from the 1970s, which consisted of 80 anger-provocation items similar to those in the current PI. That early scale was used in research by Novaco and others and was not commercially available. The development of the new NAS to reflect cognitive, emotional, behavioral, and self-regulatory dimensions of anger drawn from theory was a constructive move that has helped to advance the field. Retaining a list of potentially anger-arousing situations in the PI allows clinicians and researchers to continue to use assessment methodology similar to that of Novaco's original research on anger control.

SUMMARY. The NAS-PI can be recommended as a convenient self-report assessment of anger and its principal dimensions for use in community, clinical, and forensic settings, both as a snapshot index of current anger levels and as a barometer of progress and change. The instrument is extensively norm-referenced and has satisfactory psychometric properties of internal consistency, test-retest reliability, and concurrent and predictive validity. Despite the strong internal consistency of the scales, the interitem correlations and factor-analytic work described in the manual indicate that the NAS-PI scale items are not interchangeable and make different contributions to assessing the measured constructs. It would be interesting to see future researchers use the methods of modern test theory to identify the items that are most informative in distinguishing respondents with varying levels of anger, and to scale items for the level of anger they typically represent.

REVIEWER'S REFERENCE
Mills, J. F., Kroner, D. G., & Forth, A. E. (1998). Novaco Anger Scale: Reliability and validity within an adult criminal sample. Assessment, 5, 237-248.
