The International Handbook of Psychology
Psychological Assessment and Testing
Contributors: KURT PAWLIK, HOUCAN ZHANG, PIERRE VRIGNAUD, VLADIMIR ROUSSALOV &
ROCIO FERNANDEZ-BALLESTEROS
Edited by: Kurt Pawlik & Mark R. Rosenzweig
Book Title: The International Handbook of Psychology
Chapter Title: "Psychological Assessment and Testing"
Pub. Date: 2000
Access Date: April 9, 2019
Publishing Company: SAGE Publications Ltd
City: London
Print ISBN: 9780761953296
Online ISBN: 9781848608399
DOI: http://dx.doi.org/10.4135/9781848608399.n20
Print pages: 365-406
© 2000 SAGE Publications Ltd All Rights Reserved.
This PDF has been generated from SAGE Knowledge. Please note that the pagination of the online
version will vary from the pagination of the print book.
SAGE
© International Union of Psychological Science
SAGE Reference
Psychological Assessment and Testing
As a technical term, ‘psychological assessment’ refers to methods developed to describe, record, and interpret a person's behavior, be it with respect to underlying basic dispositions (traits), to characteristics of state or
change, or to such external criteria as expected success in a given training curriculum or in psychotherapeutic
treatment. Methods of psychological assessment and testing constitute a major technology that grew out of
psychological research, with widespread impact in educational, clinical, and industrial/organizational psychology, in counseling and, last but not least, in research itself.
In the most general sense, all assessment methods share one common feature: they are designed so as to
capture the enormous variability (between persons, or within a single person) in kind and properties of behavior and to relate these observed variations to explanatory dimensions or to external criteria of psychological
intervention and prediction. As a distinct field of psychology, psychological assessment comprises (1) a wide
range of instruments for observing, recording, and analyzing behavioral variations; (2) formalized theories of
psychological measurement underlying the design of these methods; and, finally, (3) systematic methods of
psychodiagnostic inference in interpreting assessment results. In this chapter all three branches of psychological assessment will be covered and major methods of assessment will be reviewed.
Assessment methods differ in the approach taken to study behavioral variations: through direct observation,
by employing self-ratings or ratings supplied from contact persons, by applying systematic behavior sampling
techniques (so-called ‘tests’) or through studying psycho-physiological correlates of behavior. In this chapter
these alternative approaches are dealt with in Section 20.6 as different data sources for assessment. An alternative classification of assessment tools follows a typology of assessment tasks: developmental assessment
in early or late childhood, vocational guidance testing, assessment in job selection or placement, intelligence
testing, or psychological assessment in clinical contexts such as diagnostics of anxiety states. Some of these
will be dealt with, albeit in an exemplary rather than exhaustive fashion, in Section 20.7.
Before reviewing different data sources and practical applications of psychological assessment, the history,
heuristics, and goals of assessment will be briefly looked at (Sections 20.1 and 20.2), to be followed by the
explanation of a so-called process chart of psychological assessment (Section 20.3). This will enable the
reader to appreciate different functions of psychological assessment in studying and interpreting variations in
human behavior. Following these three introductory sections, basic psychometric and ethical/legal standards
of assessment and psychodiagnostic inference are dealt with in Section 20.4. By present understanding and
professional standards, psychological assessments and tests cannot be applied responsibly without proper
psychometric and ethical/legal grounding. Psychological assessment procedures in general, and psychological tests in particular, must not be mistaken for stand-alone procedures; they cannot be applied responsibly
in the absence of profound psychometric qualification and sufficient familiarity with the conceptual basis of
an assessment procedure, within which it has been developed and beyond which its results should not be
interpreted. For example, tests of intelligence originate in specific operationalizations of what is to be understood by intelligence. Individual scores on a test of intelligence must not be interpreted beyond the limits set
by the theoretical-conceptual basis of that test. From this also follow stringent rules of professional procedure as regards the minimum qualifications required of persons who may apply methods of assessment outside contexts of supervision (Bartram, 1998).
Not surprisingly for a field that is broad in scope and practical applications, there is a rich introductory textbook
literature available (see the Resource References for a sampler). While some topics, like psychometric measurement theory or culture-fair testing of basic information-processing capacities, will hold without much variation across cultures, many assessment methods, especially in personality and clinical testing, must be viewed
as ethnically embedded and culture-related. In such cases special standards have to be observed in cross-cultural
testing (see also Chapter 18) and when adapting psychological tests, for example, of functions of intelligence,
from one language area or culture to another (cf. Section 20.4 below). Of course, this also poses problems of
presentation in this Handbook, as we look upon psychological science from an international perspective. In
this chapter, the following compromise has been adopted: in the main part of the chapter (Sections 20.1–20.7)
psychological assessment and testing are dealt with (1) in a generalistic manner and (2) with examples mainly
from the English-language and German-language literature, simply for reasons of greater familiarity on the
part of the present author. To counterbalance this unavoidable cultural bias, four further sections 20.8–20.11
provide comparative overviews of assessment methodologies in other languages, viz. Chinese (Mandarin),
French, Russian, and Spanish, each one written by a distinguished author from that language region. This
selection of additional language areas still cannot achieve the desirable full breadth of internationality, yet it is the authors' (and Editors'!) intention and hope that in this way at least some widening of international perspective is achieved.
Throughout this chapter the term ‘behavior’ is used in a generic sense, including also verbal and other expressions of internal experience, of feelings, emotions, perceptions or attitudes. Similarly, the term ‘psychological assessment’ is used to cover all kinds of assessment technology, including, for example, projective
techniques and objective behavior tests. ‘Psychodiagnostics’, as preferred in some languages, is understood
as synonymous with ‘assessment’. Finally, unless stated otherwise, the word ‘person’ is used to refer to the
individual whose behavior is being assessed (thus avoiding such expressions as ‘testee’, ‘interviewee’, ‘assessee’, or ‘subject’).
20.1 History of Psychological Assessment and Testing
Individual differences in human behavior have been an object of human inquiry ever since the earliest times
of human history. In classical antiquity, eminent philosophers like Plato and Aristotle were intrigued by the diversity of human nature. The first examples of systematic proficiency and achievement ‘testing’
are reported from as far back as the ancient Chinese Mandarin civil servant selection procedures (Dubois,
1966).
The historical roots of present-day psychological testing and assessment go back to 1882 and the work of Sir
Francis Galton in Great Britain and to pioneer studies in individual differences, by James McKeen Cattell in
1890, in the United States. During the last decade of the nineteenth century many prototypes of what later
were to become mental tests were published: for the study of individual differences in memory performance,
in reasoning or speed of perception, for example. In 1897 Hermann Ebbinghaus, already famous for his monumental experimental pioneer work on human memory, devised new reasoning tests (e.g., following a sentence-completion design) to be used in school settings. And in 1905 the French psychologist and lawyer Alfred Binet published, together with Théodore Simon, the first edition of his ‘échelle métrique de l'intelligence’, a scaled series of short tests designed to measure level of intellectual development in children, so as to guide educational placement and counselling. At the same time we also find the first attempts towards the development of
assessment procedures in clinical contexts, e.g. by the German psychiatrist Emil Kraepelin.
In the following years the number of published studies on individual psychological differences expanded rapidly (cf. Pawlik, 1968), giving rise to a new branch of psychology: the study of individual differences. As early as
1900, the German psychologist William Stern published his founding text Über Psychologie der individuellen
Differenzen (‘On the psychology of individual differences’; Stern, 1900). In this book he laid a conceptual and
methodological foundation also for the development of psychological assessment. The second edition of this
book (Stern, 1911; see also Pawlik, 1994) still stands as a significant landmark in the early history of assessment
and individual difference research.
While much early test development work was geared towards solving practical assessment problems (in the
educational system, in measuring job performance and developmental potential, or in clinical contexts), another seminal publication shortly after the turn of the century by the British psychologist Charles Spearman
(1904) laid the foundation for what should later become the first-choice assessment paradigm: psychological
tests for measuring basic personal dispositions (today called traits, see Chapter 16). In his 1904 paper Spearman also developed a mathematical-statistical theory for analyzing individual differences in mental tests into
two independent components: a universal component (factor) of ‘general intelligence’, which would be common, yet in different degree, to each and every mental test, plus a second, test-specific component (depending on test make-up, item content, mode of presentation, etc.). Spearman's paper upgraded psychological assessment from a descriptive sampling level to the level of measurement and structural analysis of individual
differences. It inspired an enormous research literature on the dimensional (factorial) analysis of assessment
instruments and individual difference indicators. The salient work by Sir Cyril Burt and Philip Vernon in the
United Kingdom, by Louis L. and Thelma G. Thurstone and Joy P. Guilford in the US, to be followed in the 1940s
and 1950s by Hans J. Eysenck in the United Kingdom and Raymond B. Cattell in the US, laid the foundation
for what is now confirmed empirical evidence on the multi-factor structure of human intelligence, personality/
temperament, aptitudes, and motivations (Pawlik, 1968; see also Chapter 16 in this Handbook). The design of
numerous methods of psychological assessment still widely in use is rooted in this research, which has given
rise to such standard assessments of intelligence as the Wechsler tests of intelligence (Wechsler, 1958), tests
of psycho-motor proficiency or of personality/temperament dimensions like extraversion-introversion, neuroticism, or anxiety. Early precursors in this development include, among others, the development of the first
personality questionnaire (Personal Data Sheet) by Robert S. Woodworth in 1917, the first paper-and-pencil group test of intelligence (the Army Alpha) in the same year, the first multi-dimensional clinical personality questionnaire by Hathaway and McKinley (1943) in Minnesota (Minnesota Multiphasic Personality Inventory: MMPI), or the Differential Aptitude Test Battery by Bennett and co-workers (Bennett, Seashore,
& Wesman, 1981).
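Spearman's two-component decomposition described above can be summarized compactly; the notation below is the conventional modern one, not Spearman's original symbols:

```latex
% Standardized score z_j on mental test j, decomposed into a general and
% a test-specific part:
%   g   : the universal factor of 'general intelligence'
%   a_j : the loading of test j on g (common to all tests, in differing degree)
%   s_j : the test-specific component, uncorrelated with g and with every s_k
z_j = a_j\, g + s_j, \qquad \operatorname{Var}(z_j) = a_j^{2} + \operatorname{Var}(s_j) = 1 .
```

Under this model the correlation between any two tests j and k reduces to the product of their loadings on the general factor, which is the correlational pattern Spearman's analyses examined.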
One common element in these assessment developments was their primary, if not exclusive, reliance on a
static cross-sectional diagnosis (so-called status assessment, studying behavior variations between persons).
This perspective came under challenge when, in the 1950s/1960s, professional and research emphases in
assessment moved away from description towards intervention, foremost in clinical contexts for evaluating
new methods of counselling and psychological therapy. This called for a process-orientation in testing, that is
for assessment instruments that allow one to monitor change (within-person variation) rather than traits (stable
dispositions underlying between-person variations). This new test design also raised questions of psychometric measurement theory; even now these issues have not been brought to a fully satisfactory solution.
Other lines of research progress in psychological assessment since the 1960s involve systematic construct
analysis of assessment variables under study. A prime example in this respect is the assessment of anxiety,
differentiating conceptually between trait (stable over time and situation) and state (varying over time and situation) anxiety, with both in turn to be contrasted from test anxiety (Spielberger, 1983). In yet another line of
research, assessment techniques were developed to study behavioral variations in situ in a person's everyday life course or, as it has been called, ‘in the field’. One motive behind this development was a growing concern for ecological validity (Barker, 1960) of assessment results, which called for sampling behavior not in an artificial laboratory situation, but in a person's natural life space. This also inspired research towards assessing individual differences in the unrestrained ‘natural’ stream of behavior in a person's natural environment (Pawlik, 1998).
In recent years new developments in the assessment field also became possible through the use of advanced
computer technologies, mostly at the level of personal computers (PC), leading to a new assessment technology called computer-aided testing (CAT). In its simplest form, an existing paper-and-pencil test such as a
personality questionnaire is loaded into a computer program that will present the test items and record the
person's item responses. In its most advanced form, which employs a special adaptive psychometric test theory, test software (also called testware) is devised that will administer to a person only test items at a level (of
item difficulty in an aptitude test or, for example, of degree of anxiousness in a personality test of anxiety) that
will prove critical for measuring that trait in this specific person. Advances in testing theory and PC technology
have made it possible to develop such computer-aided testing methods also for in-field applications (Pawlik,
1998).
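The adaptive principle behind such testware can be sketched in a few lines of code. The following is a deliberately minimal illustration under a one-parameter (Rasch) item-response model; the item bank, the step-size update rule, and the starting values are simplifying assumptions chosen for clarity, not features of any particular testware product.

```python
import math
import random

def p_correct(ability: float, difficulty: float) -> float:
    """Rasch (1PL) model: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def adaptive_test(answer, item_difficulties, n_items=10):
    """Administer n_items adaptively: always present the unused item whose
    difficulty lies closest to the current ability estimate, then nudge the
    estimate up after a correct answer and down after an incorrect one,
    with shrinking step size as information accumulates."""
    theta = 0.0                       # start at the population mean
    unused = list(item_difficulties)
    step = 1.0
    for _ in range(min(n_items, len(unused))):
        item = min(unused, key=lambda d: abs(d - theta))
        unused.remove(item)
        correct = answer(item)        # True/False response from the person
        theta += step if correct else -step
        step = max(step * 0.7, 0.1)
    return theta

# Simulate a person whose true ability is 1.5 on a 13-item bank.
random.seed(0)
true_theta = 1.5
bank = [d / 2.0 for d in range(-6, 7)]   # difficulties -3.0 ... +3.0
estimate = adaptive_test(lambda d: random.random() < p_correct(true_theta, d), bank)
```

Operational CAT systems replace this crude step rule with maximum-likelihood or Bayesian ability estimation and select items by information value, but the underlying logic of matching item difficulty to the individual person is the same.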
As is true of many fields of psychology, the history of assessment and testing has also seen its share of ad
hoc initiatives and even nonproductive sidelines. Two examples may suffice. In the 1920s, the Swiss psychiatrist Hermann Rorschach sought to develop an objective test of psychopathology. Following extensive
clinical experience with hospitalized psychotics, he settled on a series of ten plates with symmetric, meaning-free graphic displays, such as one would obtain by folding and subsequently unfolding a page with random ink splashes. In Rorschach's Formdeuteversuch (form interpretation study) patients were presented one plate after the other with the simple instruction to tell the experimenter ‘what they think they could see on this plate’. In an often-quoted publication Rorschach presented evidence that a person's responses, evaluated on the basis of a detailed scoring system, would differentiate between, for example, schizophrenics and depressives.
What seemed an interesting, suggestive new approach to clinical-psychological assessment later became mystified, however, when authors (mostly from depth-psychological schools of thinking; cf. Chapter 16) claimed
that tests of such a design would give rise to a new ‘projective’ personality assessment. According to their
reasoning a person would perceive (interpret) a Rorschach plate according to her/his personal style of experiencing, including her/his ‘unconscious’ (perhaps even repressed) motives, feelings, and anxieties—as if the
person would ‘project’ her/his own personality into her/his perception of this unstructured stimulus material. In
the decades to follow, a multitude of similarly conceived ‘projective tests’ was developed, with most of them,
as a rule, falling short in psychometric quality and not even supporting the implied projection hypothesis. Still,
and despite negative psychometric quality assays, projective techniques continue to maintain a role in practical assessment work to this day, even a leading role in some regions of the world.
Another example of an assessment medium of supposedly high validity and still in use in some quarters despite its undoubtedly low to zero psychometric quality is handwriting analysis (graphology). Here again the
underlying rationale seemed straightforward at first glance: obviously, the individual style of handwriting identifies a person with next-to-perfect precision, so that state authorities and banks have come to use a person's signature as proof of his/her identity. Should not personal style of handwriting then also be an indicator of a person's unique personality? Despite its intuitive plausibility, this expectation has not stood up to empirical psychometric tests (as will be referenced briefly in Section 20.6). Still, this does not seem to prevent some psychologists and, still more so, laymen and even major business firms from relying on this unreliable assessment methodology
for job placement and career decisions. In addition to handwriting a wide range of other so-called expressive
motions (or products thereof), such as facial expression, style of gross body motion, drawings, story completion, picture interpretation, art appreciation, etc. have been proposed, largely without great psychometric
success, as alternative means for dispositional trait assessment. However, recent research has shown that
some of these methods, for example facial expression analysis, do contain valid variance for emotional state
assessment, if properly recorded and scored (cf. Section 20.6).
20.2 Heuristics and Goals of Psychological Assessment
As will be obvious from the preceding section, methods of psychological assessment may be employed for
different purposes and to answer widely different types of questions. In essence, one can distinguish among
the following three prototypical heuristics in psychological assessment.
(1) Descriptive assessment: Let us take as an example an adolescent in the final high-school year seeking
vocational guidance as to which academic or professional training to take up after graduation. In a typical
vocational guidance center this person will be invited to take a number of psychological tests, including a
multi-dimensional interest questionnaire. In this the person will be asked to respond to a range of questions
selected so as to sample salient interests and motives (for example: dealing with people vs. dealing with technical questions, working alone vs. working in groups, being interested in rural vs. urban jobs, in solving verbal-numerical vs. manual-practical problems, etc.). Often, test results will be expressed in a personal ‘interest
profile’ which may serve, within the limits of test validity, as a description of that person's interest structure. Here
the purpose and goal of the psychodiagnostic assessment is the description of a given behavioral reality. As
a matter of fact, the term diagnostics (from the Greek ‘diágnosis’: differentiation, ability to differentiate) refers
to this descriptive heuristic, as does the term ‘assessment’ (from ‘assessing’ or ‘taking note of a factual state
of affairs’).
Obviously, mere description will only rarely suffice as a goal of assessment. In our example, the person seeking vocational guidance is not interested in her/his interest profile per se, but seeks to utilize this information
for purposes of personal prediction (in which field of study will I be most successful and/or most satisfied?)
or decision (which field of study should I choose so that it will match my personal interest profile?). Similarly,
most educational, clinical and occupational/industrial assessments serve predictive or decisional purposes.
By rule of thumb, purely descriptive assessment tends to be limited to research applications, where assessment results may serve as independent or dependent variables in an experimental design or as hypothetical
covariates. For example, a researcher may wish to investigate differences between high-anxiety and low-anxiety subjects in an experiment on muscular relaxation (anxiety measure as independent variable) or study the
effect of a new, potentially anxiolytic drug on overt anxiety level (as dependent variable). Or a study may look
into the correlation between spontaneous degree of heart beat irregularity and individual level of trait anxiety
(as a covariate). In all three cases, a test of trait anxiety will be chosen under this purely descriptive heuristic.
(2) Decision heuristic: As explained earlier, in many practical assessment situations the psychologist seeks
assessment data as information basis for optimizing decisions. The vocational guidance example speaks for
itself. In a clinical setting, psychological tests may be applied to guide patient and psychologist in choosing the most appropriate psychological therapy (for example, in the case of an anxiety syndrome) or in a treatment-related decision whether to continue or discontinue a certain psychotherapeutic intervention. Assessment-based rules of decision can be developed in different ways. In one approach, one simply tabulates different assessment results (diagnostic states) against outcome categories. For example, we may relate patients'
success rates in a certain method of psychotherapy against their kind or level of pre-therapy anxiety state. In
more advanced decision-related assessment paradigms, decision rules will involve explanatory or predictive
modeling.
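Such a tabulation of diagnostic states against outcome categories can be illustrated with a toy cross-table; the patient records and category labels below are invented purely for illustration:

```python
from collections import Counter

# Hypothetical records: (pre-therapy anxiety state, therapy outcome)
records = [
    ("high", "improved"), ("high", "unchanged"), ("high", "improved"),
    ("low", "improved"), ("low", "improved"), ("low", "unchanged"),
    ("high", "unchanged"), ("low", "improved"),
]

table = Counter(records)  # count per (state, outcome) cell

def success_rate(state: str) -> float:
    """Proportion of 'improved' outcomes among patients in a given state."""
    total = sum(n for (s, _), n in table.items() if s == state)
    return table[(state, "improved")] / total

# success_rate("low") -> 0.75, success_rate("high") -> 0.5
```

A decision rule could then, for example, recommend the therapy only for those diagnostic states whose tabulated success rate exceeds a chosen threshold.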
(3) Assessment for explanatory or predictive modeling: In this case assessment results are employed to explain how a concurrent psychological state (example: a patient's anxiety disorder) may have developed or how
a person may behave at a later point in time or in a different setting. Predicting the level of professional satisfaction or success of our high-school student on the basis of her/his interest profile presupposes a model
(theory) that will relate such on-the-job criterion data to current interest test results. Provided such a model
exists and has been confirmed with sufficiently strong correlations between test data and criterion data, one
can extrapolate statistically (predict) that student's later job success or job satisfaction on the basis of the test
results s/he obtained when still in high school. More advanced predictive modeling will allow the psychologist not only (1) to predict for that student the likely job success or satisfaction across a spectrum of vocational positions,
but also (2) to assign a probability estimate (level of confidence) expressing the likelihood that the predicted
criterion values will in fact hold true for a student with an interest test profile as obtained.
This is the methodological paradigm followed in present-day test interpretation for purposes of criterion prediction. By contrast, solely intuitive, subjective test interpretation should be considered a practice of the past,
no longer fulfilling professional standards (although, regrettably, there may still be psychologists out in the profession adhering to such a sub-standard procedure). Today validating a test against the criterion data needed
in predictive modeling or prognosis is considered part of test development, which thus extends way beyond
the mere selection and adaptation of test items or of questions in a questionnaire. Predictive modeling of
test data for psychodiagnostic inference can amount to a very laborious undertaking, also requiring advanced
theoretical sophistication on the part of the researcher as regards psychological processes of possible contribution to the criterion data in question.
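A minimal numerical sketch of such criterion prediction: given a validation sample pairing test scores with later criterion values, one fits a least-squares regression line and uses the standard error of estimate to qualify each prediction with a level of confidence. All data values below are invented for illustration.

```python
import statistics

# Hypothetical validation sample:
# (interest test score, later job-satisfaction rating)
test = [10, 12, 14, 16, 18, 20]
crit = [4.0, 4.5, 5.5, 6.0, 7.0, 7.5]

n = len(test)
mx, my = statistics.mean(test), statistics.mean(crit)
sxx = sum((x - mx) ** 2 for x in test)
sxy = sum((x - mx) * (y - my) for x, y in zip(test, crit))
b = sxy / sxx          # regression slope
a = my - b * mx        # intercept

def predict(x: float) -> float:
    """Predicted criterion value for a given test score."""
    return a + b * x

# Standard error of estimate: scatter of actual criterion values around
# the regression line, used to attach a confidence band to predictions.
residuals = [y - predict(x) for x, y in zip(test, crit)]
see = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
```

The tighter the validation correlation, the smaller the standard error of estimate, and the narrower the confidence band one can attach to a predicted criterion value.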
A second type of modeling involves a ‘postdiction’ or backward modeling of earlier (antecedent) conditions to
account for (or explain) assessment data at hand. For example, we may wonder about conditions earlier in the
high-school student's life that contributed to her/his specific interest test profile at the time prior to graduation.
In this second type of modeling, assessment test data are related backward in time to antecedent psychological or other conditions prior to the assessment. In our given example we may look into parental modeling,
selective past learning opportunities, or the student's ability and aptitude profile (say, in the field of music
or in artistic expression). In this way, explanation can be understood as ‘backward prediction’ (or ‘postdiction’)
of most likely antecedents for a given behavioral state, assessment result, or test profile.
Prediction and explanation constitute the most important and most frequently employed heuristics in interpreting psychological assessment data. This interpretation is also called psychodiagnostic inference.
We shall now turn to some further distinctions with respect to different goals and contextual settings of psychological assessment. For the sake of simplicity they can be set out by way of three dichotomous alternatives.
(1) Assessment of status vs. assessment of process: As explained earlier above, psychological assessments
can be designed to describe a current state of behavior (status assessment; for example: intelligence profile,
interest structure, or level of anxiety) or the nature and extent of behavioral change (process assessment; for
example, change in intelligence profile as a function of developmental maturation, in interest structure as a
function of professional training, or in anxiety level as a function of exposure to psychotherapy). In one important variant of process assessment one studies differences in behavioral indicators across different settings
or situations. For example, in a clinical treatment program one may wish to assess how a patient's anxiety profile varies across situations differing in anxiety arousal (e.g., when speaking to a friend or in front of a large
auditorium).
Classical test theory (CTT; see Section 20.4 below), the measurement rationale still most commonly employed in test development, is more apt to support status assessment than process assessment, which can
be accommodated more readily within the measurement format of item-response theory (IRT; see Section
20.4 below). Thus most assessment instruments still have their primary applicability in status assessment
only. To a large extent, the development of process assessment techniques with satisfactory situational or
developmental sensitivity is still a task for future assessment research.
(2) Norm-referenced vs. criterion-referenced assessment: If our high-school student answered 16 out of 20 urban-vs.-rural activity questions in the direction ‘urban’, does this already indicate a disproportionately high interest in urban activities? Obviously we have to compare this result (16 out of 20) with the range of variation found in a suitable reference group (in this case: same-aged male high-school students). In norm-referenced tests an assessment result is transformed into a standardized score expressing the individual's result in relation to statistical distribution characteristics (cumulative percentage points; mean and standard deviation) in
an appropriate reference population. These distribution characteristics are then the statistical norm employed
for interpreting assessment data. Establishing adequate population norms constitutes an indispensable part of test development. Whenever test results vary systematically with age, gender, ethnicity, educational background, or other characteristics in the general population, special norms (for specific age groups, the two
sexes, etc.) will have to be supplied.
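The norm-referencing transformation itself is simple; the following sketch uses invented norm values (a reference-group mean of 11 and standard deviation of 2.5 for the 20 urban-vs.-rural questions) and assumes approximately normal norms for the percentile conversion:

```python
from statistics import NormalDist

def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Standardize a raw score against reference-group norms."""
    return (raw - norm_mean) / norm_sd

def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Conventional T-score: mean 50, SD 10 in the reference population."""
    return 50 + 10 * z_score(raw, norm_mean, norm_sd)

def percentile(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Cumulative percentage point, assuming roughly normal norms."""
    return 100 * NormalDist(norm_mean, norm_sd).cdf(raw)

# Our hypothetical student: 16 'urban' answers against the invented norms.
z = z_score(16, 11, 2.5)   # 2.0: two standard deviations above the mean
t = t_score(16, 11, 2.5)   # 70.0
```

In practice the percentile conversion is read from empirically tabulated norms rather than an assumed normal curve, which is why separate norm tables are supplied for each relevant subpopulation.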
In criterion-referenced assessment, test results are not expressed with reference to distribution characteristics
in the population, but with reference to a behavioral criterion itself. For example, in primary-school reading
instruction the educational aim (or criterion of instruction) may be mastery of words up to a certain level of
reading difficulty. In a criterion-referenced reading ability test a student's test result is expressed with reference to this criterion (for example, as percentage mastery). Criterion-referenced assessment may also be the
method of choice in psychotherapy outcome evaluation. An important special case of criterion-referenced assessment is program evaluation, e.g., of remedial reading ability training programs in educational research
or of psychotherapeutic intervention programs in mental health research. In evaluation, assessment methods are employed to measure degrees of program goal attainment in a properly balanced field-experimental
design. There is a rich reference literature available introducing the use of assessment methods in program
evaluation (see Rossi & Freeman, 1993).
(3) Sampling vs. inventory-taking in assessments: Many assessment procedures are built on the assumption
of an underlying homogeneous universe of assessment items, settings, or situations. In actual test development one follows rules of sampling from this universe. For example, in devising a vocabulary test one selects
a sample of words of different difficulty levels (estimated, for example, from relative usage frequency). Provided
one can set up a rational theory of item difficulty (as, for example, in some visuo-spatial test designs), this
sampling can even be computerized and applied individually in adaptive CAT.
In some assessment problems the homogeneity assumption, let alone a rational item difficulty theory, cannot
be meaningfully maintained. In anxiety testing, for example, we may not only want to know the level of anxiousness of a person, but also her/his individually specific profile of anxiety-eliciting settings and stimuli. In other
words, we do not want to rely on a representative sampling of anxiety-provoking stimuli but need to compile,
as completely as possible, an inventory of all stimuli that may elicit anxiety reactions in that patient. Only in
this way will we be able to devise a person-adapted psychotherapeutic intervention. Up to now, this second
assessment rationale has been implemented successfully only for assessments in clinical behavior therapy.
It remains to be seen whether this paradigm could also be used fruitfully in other assessment contexts.
20.3 A Process Chart of Psychological Assessment
The practice of psychological assessment involves considerably more, qualitatively, than merely administering tests, questionnaires, or behavior ratings in a uniform way. Failure to adequately conceptualize the psychodiagnostic process, from the statement of a problem to the final interpretation of results, has created considerable confusion and, in years past, contributed to psychometric inadequacies in professional practice.
Figure 20.1 shows a condensed summary process chart of psychological assessment according to present-day conceptualization. In this diagram five successive stages of an assessment procedure are distinguished (in rectangular frames), with connecting psychological operations shown in elliptical frames. Straight-line top-down arrows connect typical steps in solving an assessment problem, whereas bottom-up arrows indicate possible or necessary feedback loops for successive iterative optimization of the assessment.
Different from assessment in basic research, the design of an assessment in professional practice will start
with a more or less coherent statement of a problem, labeled ‘problem at start’ in Figure 20.1. For example, parents may see a psychologist to get advice with developmental problems of their eight-year-old son.
Emotional instability, phases of restlessness and lack of concentration, fits of nervousness and occasional
severe tantrums are among the problem behaviors they report to the psychologist. Naturally, parents will use
everyday language in describing these behavior problems and in expressing their fears and concerns. From
the parents' report the psychologist will, as a first process step in assessment, deduce hypotheses about the
likely nature of the boy's behavior problems, at the same time translating the problem description into scientific conceptual language with reference to behavioral science knowledge about this developmental stage. For
example, the psychologist may deduce the hypothesis that the boy suffers from a symptomatology known as
hyperactive attention deficit disorder. On this basis the psychologist will now translate the problem at start
into specific assessment questions (in the example: testing for symptoms in sustained attention, emotional
responsiveness, etc.).
Figure 20.1 A process chart of psychological assessment
The next step, ‘operationalization’, then calls for selecting, from among available assessment methods, a suitable set so as to access relevant behavioral indicators under the hypotheses deduced earlier. The following
step, conducting the assessment, is the only routine component in this process, which may even be delegated
to assistants not holding full psychological training. This then leads to norm-referenced or criterion-referenced
assessment results.
Next to deducing diagnostic hypotheses, the final step, psychodiagnostic inference, is the most demanding
one in this process model. It presupposes detailed knowledge of how the results of the assessment relate
to criterion data, to psychodiagnostic categories, or to explanatory concepts. At the same time the results of
this inferential step open up into an over-all evaluation of assessment. For example, the hypothesis deduced
initially may become confirmed or may need to be refined or even rejected. As indicated by bottom-up arrows
in Figure 20.1, depending on results each subsequent step may call for iterative feedback correction of one
or several earlier steps in the assessment process. For example, rejection of the hyperactivity attention deficit
hypothesis may require the psychologist to restate the problem and develop alternative diagnostic hypotheses or, for example, choose a better operationalization or more advanced psychodiagnostic inference models.
Space precludes more detailed consideration of these steps and iterative feedback loops. Suffice it to say that the last step, psychodiagnostic inference, has received special research attention in recent years.
For clinical psychological assessments standardized diagnostic inference systems (DSM IV, American Psychiatric Association, 1994; ICD-10, International Classification of Diseases, World Health Organization, 1990)
have been developed. Specialized interpretation and prediction systems have been developed, for example,
for assessment-based vocational guidance. There is reason to conclude that future development of psychological assessment methodology will depend to a growing extent on the further elaboration and creative design of systems and rules of psychodiagnostic inference. This development will widen the basis for systematic
validation of the assessment process at large.
This leads us into questions of how to evaluate the quality, especially the veridicality, of psychological assessment and testing.
20.4 Psychometric and Ethical/Legal Standards of Assessment and Psychodiagnostic Inference
Different methods of psychological assessment follow different approaches in recording and analyzing human
behavior. Yet all methods touch upon a person's behavioral and personal sphere of privacy. Furthermore, personal information obtained in an assessment may become the basis for decisions of great importance for
that person (cf. Section 20.2). It is for these reasons that psychological assessments must meet high standards of quality control (psychometric standards) and of ethical responsibility (legal/ethical standards). This
was recognized in the 1920s/1930s. The ‘Standards for Educational and Psychological Tests’ developed by
the American Psychological Association, currently in their 5th edition (American Psychological Association,
1985), are considered a model statement of such standards and have become a master schedule of assessment standards internationally (see also Fernandez-Ballesteros, 1997). It is current professional understanding that explicit empirical proof has to be provided for an assessment method to meet these psychometric
standards to satisfactory degrees, as each and every single application of an assessment method has to follow these standards and ethical/legal provisions.
These standards and regulations are presented briefly below.
Psychometric Standards of Psychological Assessment
(1) Objectivity of administration: Human behavior can be open to countless influences and causes. In psychological assessment one studies human behavior to gain insight into a person's enduring dispositions (traits)
or concurrent state. So special care needs to be taken to ensure that different assessment results can only
be due to different trait or state make-up of the person assessed—and not due to physical or social particulars of the assessment situation, the behavior of the psychologist conducting the assessment, or any other
circumstantial factor. Objectivity of administration is defined as the degree to which assessment results are
independent of such extraneous factors. In developing an assessment method, special care must be taken to
standardize the physical and social characteristics of the assessment situation, the way in which instructions
are to be given to the person assessed, the behavior of the psychologist conducting the assessment, and the
like.
Assessment methods may differ in the degree of administration objectivity. As a rule, group tests and methods
employing a CAT format will show higher levels of objectivity of administration than individual performance tests (as, for example, in the Wechsler intelligence test system) or methods of behavior observation and rating.
(2) Scoring objectivity: Scoring refers to translating observed variations of behavior into a descriptive recording system. In general, one distinguishes between qualitative and quantitative scoring. In the first, differences
between scoring units are qualitative in nature (for example: technical vs. social interests). By far the majority
of psychological assessment methods follow a quantitative scoring rationale, according to which scoring units
differ in aspects of magnitude or intensity. In this case, assessment results are expressed in numerical form.
Depending on the scoring rationale, scoring systems differ in scaling property of assessment scores. In the
most simple case (ordinal scale), the scoring rationale can only preserve order of magnitude (or intensity).
For example, think of a test of twenty arithmetic problems increasing in difficulty level. Three persons solving
five, ten, and fifteen of these twenty problems, respectively, most likely will differ in this order in their individual
level of numerical proficiency. Yet this will not ensure that the third person surpasses the second one in trait
level by the same amount as this person surpasses the first one! Obviously this would presuppose equality of
distances in item difficulty level between successive items.
A scoring rationale establishes an interval scale if and only if equal score differences relate to equal differences in the psychological quantity to be measured. Today in many psychological tests care is taken to ascertain interval scale quality. One way to achieve this in our numerical proficiency test would be to select twenty
items so that, for any item number i, the difference in item difficulty between item i and item (i + 1) will be the
same throughout. Constructing tests according to IRT standards can guarantee interval scale quality of test
scores.
If the scoring rationale, in addition to interval scale quality, also ensures an absolute zero point of measurement, the resulting scale is called a ratio scale. This presupposes prior knowledge about the lowest score level
conceivable and ever to be found for that scoring system in human behavior. Obviously psychological measurement scales can hardly ever meet this high scale requirement. Yet, unless ratio scale quality has been
established, scores must not be analyzed in a multiplicative fashion. For example, an intelligence quotient
(IQ) of 140 must not be misinterpreted as indicating twice the intelligence level of an IQ of 70, as we do not
know at which IQ score to locate the absolute zero level of human intelligence endowment. For the same reason, computation of score ratios such as ‘following psychotherapy the anxiety level in patient X was reduced
to 40% of that person's pre-therapy anxiety level’ is strictly not permissible and can be highly misleading.
Scoring objectivity refers to the degree to which a scoring system provides scoring rules according to which
any one observable specimen of behavior will be scored in one and only one scoring category. A frequently
employed method to test for scoring objectivity is to have the same behavior record scored by several independent scorers. Then the degree of inter-scorer correspondence (correlation) can serve as a measure of
scoring objectivity. In developing an assessment method the author has to demonstrate empirical proof of
scoring objectivity.
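The inter-scorer check described above can be sketched as follows. The two scorers' ratings are invented for illustration, and the Pearson correlation stands in for whatever correspondence index a given scoring system prescribes.

```python
# Sketch: estimating scoring objectivity from two independent scorers'
# ratings of the same eight behavior records (ratings are illustrative).

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

scorer_a = [3, 5, 2, 4, 4, 1, 5, 3]
scorer_b = [3, 4, 2, 4, 5, 1, 5, 2]

# High inter-scorer correspondence indicates high scoring objectivity
agreement = pearson(scorer_a, scorer_b)
```

For qualitative (categorical) scoring systems a chance-corrected agreement index would be used instead of a correlation.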
(3) Statistical norms: With units of measurement often being arbitrary as explained above, the results of many
psychodiagnostic assessment methods remain ambiguous unless norm-referenced. This involves expressing an individual score in relation to statistical distribution parameters of that score in a suitable reference
or norm population. The test construction literature (see, for example, Lord & Novick, 1968) explains such
different norming systems as standard scores (individual raw score minus population mean, divided by population standard deviation), normalized standard scores (standard score transformed so as to yield a Gaussian
normal distribution in the reference population) or percentile norms (percentage of persons in the norm population yielding the same or a lower test score). Modern IQ-scores, for example, are interval scores at the level
of a normalized standard score (with a mean of 100 and standard deviation of 15).
The manual of an assessment procedure has to provide detailed information on the norm population employed in standardizing the scoring system. As explained above, this may call for different sets of norms for
subgroups of the population differing significantly in score distribution parameters. Before applying an assessment procedure to a new population, as a rule re-standardization should be considered obligatory. To guard
against systematic differences between different age cohorts (for example, due to changes in educational
systems), tests should be re-standardized at suitable intervals.
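The norming systems just described can be sketched numerically. The norm-population mean and standard deviation below are illustrative assumptions; note also that the simple linear conversion to the IQ metric coincides with a normalized standard score only if the raw scores are normally distributed in the norm population.

```python
# Sketch of the three norming systems: standard score, IQ-type normalized
# standard score, and percentile rank (all numbers illustrative).
from statistics import NormalDist

pop_mean, pop_sd = 30.0, 6.0  # raw-score parameters in the norm population
raw = 39.0                    # an individual's raw score

z = (raw - pop_mean) / pop_sd           # standard score
iq = 100 + 15 * z                       # IQ metric: mean 100, SD 15
percentile = NormalDist().cdf(z) * 100  # percent of norm population at or below
```

Here a raw score of 39 corresponds to a standard score of 1.5 and an IQ of 122.5, placing the person above roughly 93% of the norm population.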
(4) Discriminative power: This refers to the degree to which an assessment procedure will yield different results for persons differing in the trait under study or, in the case of intra-individual assessment, yield different
results for the same person in different situational states.
(5) Internal consistency: Of course, the different elements (components, items) of an assessment procedure
should all measure the same quality or aspect of behavior. Otherwise the interpretive meaning of a test score
would become ambiguous and the score itself useless. Internal consistency refers to the degree to which
elements or items of an assessment procedure all measure the same aspect or quality of behavior. Typically
the internal consistency of a test is measured by computing the intercorrelations between test scores at item
level. For a test to be consistent, each item has to correlate highly with the total score computed from all remaining test items.
(6) Reliability: This core psychometric criterion is defined as the degree to which assessment results are unaffected by unsystematic errors of observation, of assessment circumstances, and measurement errors. Reliability is the nucleus concept of the so-called classical test theory (CTT; see Lord & Novick, 1968). According
to this theory any observed score x is the sum of two underlying components: a true score t (of that person
in the underlying behavior variable) plus an error component e (due to unintended, unsystematic causes of
variation additionally affecting that person's behavior at that given assessment occasion). Then reliability is
defined as the ratio of the variance of the true score component to the variance of observed scores x. In this
sense, the reliability coefficient R denotes the percentage of variance in observed scores reflecting true score
differences in the variable under study. Interestingly enough, this psychometric concept of error in test reliability theory is fully equivalent to the concept of error of measurement as used in ISO norms for physical and
technical measurement as established by the International Standards Organisation (1981). See also Pawlik
(1992). The complement (1 - R) gives the percentage of error variance in raw test score variance. The positive square root of the error variance, that is, the standard deviation of the errors e, is called the standard error of measurement (SEM) of an assessment method. SEM is the average amount, in raw score units, by which observed scores x deviate from the respective true score t. Knowledge of SEM can be used to compute a confidence interval within which a person's true score will lie (with chosen level of probability p).
A necessary condition for SEM not to exceed half of the raw score standard deviation is that the reliability R
equals 0.75 or above. Consequently, a psychometric rule of thumb requires the reliability of a psychodiagnostic assessment method to reach or exceed 0.80. Today properly designed assessment methods, especially
objective behavior tests, yield psychometric reliabilities of 0.90 and above, particularly for test measures of
highly stable traits like general intelligence, visuo-spatial, or psychomotor aptitudes.
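The SEM computation and the rule of thumb above can be made concrete; the score values below are illustrative.

```python
# Sketch: standard error of measurement (SEM) and a true-score confidence
# interval under classical test theory (all score values illustrative).
from statistics import NormalDist

sd_x = 15.0       # raw-score standard deviation in the norm population
R = 0.91          # reliability coefficient of the test
observed = 108.0  # an individual's observed score x

sem = sd_x * (1.0 - R) ** 0.5     # SEM = SD * sqrt(1 - R) = 4.5 here
z = NormalDist().inv_cdf(0.975)   # two-sided 95% confidence level
ci = (observed - z * sem, observed + z * sem)

# Rule-of-thumb check from the text: with R >= 0.75, SEM <= SD / 2
assert sem <= sd_x / 2.0
```

The interval here is centered on the observed score for simplicity; stricter CTT treatments center it on an estimated true score regressed toward the population mean.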
Different methods have been developed to estimate R in test development, most prominent among them the
re-test method (yielding a stability estimate of R), the parallel-form method (yielding an equivalence estimate
of R), and various internal consistency estimates of R (odd-even method, Kuder-Richardson coefficients).
Common to all methods is their reliance on interindividual correlations as estimates of R. Consequently, these
estimates are relative in the sense that they also depend on the degree of homogeneity/heterogeneity in
the person population sampled. While originally conceptualized for trait measurement, CTT can also be expanded to provide reliability estimates for state measurement, even for within-person, within-occasion measurement: the reliability of an individual assessment result in a specific situation context (Buse & Pawlik, 1994).
Most psychodiagnostic assessment procedures, especially almost all psychological tests, are developed according to CTT reliability theory. While setting stringent standards for high-reliability test development, CTT also carries shortcomings, however. By necessity of mathematical deduction, for two CTT-designed test variables 1 and 2 the score difference (1 - 2) will be less reliable than the original scores, and the drop in reliability will increase with increasing correlation between variables 1 and 2. As a consequence, CTT-designed tests yield rather unreliable difference scores in the measurement of change or process. Another
disadvantage in CTT-based test development is its inability to measure person scores independent of item
difficulty levels, and vice versa, at ratio scale level. These shortcomings of CTT are avoided elegantly in modern probabilistic or item-response theory (IRT) of psychological measurement, which builds on the work of
Rasch, Birnbaum, Fischer, and others (see Lord & Novick, 1968; Wainer, 1990). Unlike in CTT, score
reliability is estimated in IRT by a maximum-likelihood error-of-estimation function. The advanced mathematical apparatus employed in IRT may be responsible for the fact that, for decades, most assessment research
and applications stayed away from it. This should no longer be the case as IRT applications are now readily
available in PC software programs (Wainer, 1990).
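As a minimal illustration of the IRT rationale, the one-parameter (Rasch) model mentioned above expresses the probability of solving an item purely as a function of the difference between person ability and item difficulty, both located on a common scale. The parameter values below are illustrative.

```python
# Minimal sketch of the one-parameter (Rasch) item response model.
import math

def rasch_p(ability, difficulty):
    """Probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# A person whose ability equals the item difficulty solves it with p = 0.5;
# equal ability-difficulty differences yield the same probability anywhere
# on the scale, which is what underwrites IRT's scale-quality claims.
p_equal = rasch_p(0.0, 0.0)
p_easy = rasch_p(1.0, -1.0)   # person well above item difficulty
p_hard = rasch_p(-1.0, 1.0)   # person well below item difficulty
```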
(7) Validity: This second most important CTT standard refers to the degree to which a psychological assessment measures that and only that psychological variable or attribute it is designed to measure. It can be
shown formally that reliability is a necessary but not sufficient condition for validity (the validity of a measure
cannot exceed the square root of its reliability). From a practice-oriented point of view, validity is the ultimate
quality standard of assessment, assuring, for example, that a test of anxiety does indeed measure anxiety
and, ideally, nothing but anxiety.
Again there are also several methods to estimate validity. In external or criterion validation the interindividual
correlation between assessment results and the targeted criterion (for example: actual success in on-the-job
training, or actual improvement in anxiety level following psychotherapy) is determined empirically. An important distinction in criterion validation refers to the temporal distance between time of assessment and time
of criterion data acquisition. One speaks of concurrent (diagnostic, strictly speaking) validity when this temporal distance is negligible. (Example: validating a psychomotor aptitude test against the criterion of actual
in-flight simulator performance of air pilot trainees, both types of measures taken within the same training
week.) Alternatively one speaks of predictive (prognostic) validity, when time of assessment and time of criterion performance are weeks, months, or possibly years apart. In many educational, industrial, and clinical
assessments this latter type of validity is of primary concern.
As expected, predictive validities will fall short of concurrent validities, with the drop in validity also being a
function of temporal distance between time of assessment and time of criterion data collection. For example,
predictive validities for success in professional training programs seldom exceed criterion correlations of
0.50–0.60 (and are often even lower). Provided sufficient reliability of the assessment method in question, these lower-than-expected predictive validities simply remind us of the necessary limits of longer-term behavioral prediction in general. Human behavior is an open system in several respects. In the course of a
training program, for example, different persons may show different amounts of change in relevant basic trait
scores—be it as a consequence of the training in question or for other, more individualistic reasons. Furthermore, different persons may differ in the nature and degree of change they experience (in their mental life, in psychologically relevant aspects of their social or physical environment) over the time period in
question, which again will attenuate predictive criterion correlations. Given high-reliability assessment procedures, less than perfect predictive validities must not be blamed on the quality of the assessment process but simply highlight necessary limits, in principle, to long-range prediction of human behavior within contexts of
free individuality in a free society. In this sense, predictive validation studies also tell us which diagnostic criterion can be properly predicted across which temporal or situational predictive distance. In addition, both
concurrent and predictive criterion validities may be attenuated further due to imperfect criterion data reliability. When validating a test of intelligence against the criterion of intelligence ratings teachers give for their
students, the reliability of criterion measures will be significantly lower than that of test measures. Within CTT
it can be shown algebraically that the correlation of two variables 1 and 2 cannot exceed the square root of
the product of their reliabilities R1 and R2. Thus insufficient criterion reliability will further attenuate external
test validity.
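The attenuation ceiling just stated can be illustrated numerically; the reliability values below are illustrative assumptions.

```python
# Sketch: the CTT attenuation ceiling -- the correlation of two variables
# cannot exceed the square root of the product of their reliabilities.

R_test = 0.90       # reliability of the intelligence test
R_criterion = 0.49  # reliability of, e.g., teachers' intelligence ratings

max_validity = (R_test * R_criterion) ** 0.5  # ceiling on the validity r
# Here even a perfectly valid test could not correlate above about 0.66
# with the criterion, purely because of unreliable criterion data.
```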
Up to this point we have treated questions of external validity from a strict measurement point of view. In
practical psychodiagnostic assessment often a less stringent mode, namely classificatory assessment, is fully sufficient or even more appropriate. Many clinical-psychological assessments are of such a classificatory
type, for example, anxiety state in need vs. not in need of psychotherapy; patient shows vs. does not show
symptoms of major depression. Also assessments in educational and industrial/organizational contexts often
follow classificatory formats. As long as base rates of the diagnostic classes do not differ markedly in the population of persons assessed, the percentage of correct assessment-based diagnostic classifications can still justify the utility of the assessment procedure even with only moderate test-criterion validity correlations.
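The base-rate proviso can be made concrete with a small calculation; all probabilities below are illustrative assumptions.

```python
# Sketch: percentage of correct classificatory decisions from a 2x2
# decision table, given a base rate and the test's hit rates.

base_rate = 0.5     # proportion of persons truly in need of therapy
sensitivity = 0.75  # P(test positive | in need)
specificity = 0.70  # P(test negative | not in need)

correct = base_rate * sensitivity + (1 - base_rate) * specificity
# With a balanced base rate, 72.5% of decisions are correct here -- a clear
# gain over the 50% achieved by assigning everyone to the larger class.

skewed = 0.05 * sensitivity + 0.95 * specificity
# With a 5% base rate the same test yields about 70% correct decisions,
# worse than the 95% achieved by classifying everyone as 'not in need' --
# which is why markedly skewed base rates undermine classificatory utility.
```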
In internal validation, the validity of a new assessment method is estimated by correlating its results with other
assessment methods whose validity has already been established. In construct validation the validity of an
assessment method is estimated by the degree to which this method will yield empirical results in accord with
hypotheses derived from the theory in which the construct is embedded. For example: If test x is indeed a
valid measure of state anxiety, a psychopharmacological agent known to be anxiolytic (e.g., application of a
benzodiazepine substance) should result in significant test score reduction (in a suitably balanced planned
experiment). Campbell and Fiske (1959) developed a suggestive correlational model for construct validation (called a multitrait-multimethod validation matrix) which allows one to distinguish between convergent (construct-conform) and discriminant construct validity, the latter referring to empirical proof that the measure in question is indeed unrelated to other concepts not part of the construct to be assessed.
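The convergent/discriminant logic of such a matrix can be sketched with invented correlations; the numbers below are illustrative, not data from Campbell and Fiske.

```python
# Sketch of reading a multitrait-multimethod matrix: convergent validities
# (same trait, different methods) should clearly exceed discriminant entries
# (different traits). Traits: anxiety (A), extraversion (E);
# methods: questionnaire (q), observer rating (r). Values are illustrative.

corr = {
    ("A_q", "A_r"): 0.62,  # convergent: same trait, different methods
    ("E_q", "E_r"): 0.58,  # convergent
    ("A_q", "E_q"): 0.21,  # discriminant: different traits, same method
    ("A_r", "E_r"): 0.18,  # discriminant
    ("A_q", "E_r"): 0.10,  # discriminant: different traits and methods
    ("A_r", "E_q"): 0.12,
}

# A variable label's first character encodes the trait (A or E)
convergent = [v for (a, b), v in corr.items() if a[0] == b[0]]
discriminant = [v for (a, b), v in corr.items() if a[0] != b[0]]
supports_construct_validity = min(convergent) > max(discriminant)
```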
Construct validation is the royal road to theory-guided assessment development. At the same time, systematic
construct validation studies lead to substantial advances in differential psychological theory of human personality traits, of state variations, and of trait—state interactions. In this way, the last fifty years of assessment
research have given rise to an ever more refined understanding of central trait domains like intelligence, neuroticism (emotional stability/lability), anxiety, or psychomotor aptitudes (see also Chapter 16).
(8) Test fairness: One and the same psychological test may measure different attributes in different populations. For example, performance on tests of psychomotor coordination is known to depend on different (perceptual and motor) factors in inexperienced (experimentally ‘naive’) subjects as compared with experienced (substantially pre-trained) subjects (Fleishman & Hempel, 1954). Such differential validity diminishes test fairness, for example if one and the same test measures different attributes (or attributes at different levels) in different ethnic groups
(Reynolds & Brown, 1984). Test fairness has also been recognized as an important limiting condition to transferring a psychodiagnostic assessment procedure (like a standard intelligence test) from one culture to an
ecologically different culture. Test fairness also has implications for item translation in cross-cultural testing
programs (see Chapter 18).
During the last twenty to thirty years substantial literature has accumulated on issues related to test fairness.
In the most simple case, significant population differences in test validity may require different test interpretation rules or different test selection procedures to measure the same attributes in the same (fair) way in two
contrasting populations. At a more complex conceptual level, problems of test fairness and ecological validity
may lead one to question the usefulness and theoretical meaningfulness of comparing two different populations in tests not meeting the criterion of symmetric ecological validity. With continuing economic and social
globalization, aspects of test fairness, culture fairness, and symmetric linguistic-ecological representativeness have already become important issues at the psychological practitioner's level within the European and North American regions. Within the European Union assessment development has begun to concentrate on new
test designs that will meet standards of cultural fairness right from the start.
(9) Response objectivity: Some assessment methods are easier to fake than others. An objective intelligence test, for example, can at most be faked bad (viz., by giving incorrect or no answers to problems one would be able to master), while personality questionnaires can be faked in either direction. Response objectivity refers to the degree to which the results of a psychological assessment will be unaffected by a person's
(voluntary or involuntary) response sets or faking tendencies. Since the 1950s, an enormous amount of empirical literature has accumulated on test-taking attitudes; especially in test validation special attempts have
to be made to guard against response sets.
Ethical/Legal Standards
Psychodiagnostic assessment and psychological therapy are among the fields of professional psychological
activity that deserve special ethical and legal consideration. Consequently, both fields of psychological practice receive attention also in national codes of professional-psychological ethics (Leach & Harbin, 1997). In
some countries (for example, in Germany) also provisions in the penal code, in the code of criminal procedure, in the civil code, or special laws pertaining to the use of electronically stored personal data are relevant.
At least the following three ethical/legal standards are considered essential universally.
(1) Protection of personality: As a rule, national constitutions declare an individual's right to personal integrity,
with the consequence of individual rights to the protection of privacy and of personal interests. As in medicine,
also in psychology diagnostic assessments must not violate these rights to personal integrity. In the past this
has raised questions, for example, as to the admissibility of personality questionnaire items raising issues of
sexual behavior. In case of doubt, a regional or national psychological ethics committee should weigh the necessity (or acceptability) of an assessment method vis-à-vis constitutional rights to integrity on the one hand
and given psychodiagnostic assessment goals on the other hand.
(2) Principle of informed consent: Administering a psychodiagnostic assessment must be contingent upon the person's prior, informed, and explicit consent. (In some countries, however, the penal code or the code of criminal procedure may permit exceptions.) Analyzing or even simply observing the behavior of an identified or potentially identifiable person in a non-public situation without that person's explicit and informed consent is
generally considered a violation of professional ethical standards. The relevance of this standard for hidden
audio or video taping or disguised one-way mirror observation is obvious.
(3) Principle of confidentiality: Many national codes of professional psychological ethics highlight a person's fundamental right to have her/his data handled with absolute confidentiality. In Germany the psychologist's commitment to this confidentiality principle is even spelled out in a paragraph of the penal law, for that matter treating the psychologist like a medical doctor, a clergyman, or a barrister (Article 52, German Penal Code). Together with the foregoing two standards, the principle of confidentiality also sets rules as to how a psychologist is allowed or requested to deal with personal assessment data obtained under a third party's commission
(for example, when testing a person applying for a job in an office other than that employing the psychologist
conducting the assessment). Here again the principle of informed consent becomes absolutely critical. Many
national professional codes of ethics also contain explicit statements on how psychological assessment data
are to be filed (stored) in order to uphold principles of confidentiality and of protection of personality.
20.5 Variable-Domains of Psychological Assessment
Psychodiagnostic assessment methods have been developed for a wide spectrum of trait and state variables
affecting human behavior. Following a proposal by Cronbach (1949), one distinguishes between performance
and personality measures, the former referring to measures of maximum behavior a person can maintain, the
latter to measures of typical style of behavior. Intelligence tests are examples of performance measures, a test
of extraversion—introversion or of trait anxiety examples of personality measures. While handy for descriptive
purposes, this distinction must not be mistaken for a theoretical one, as trait measures of performance may
in fact correlate with trait measures of personality (for example, speed of learning with level of trait anxiety).
Within the limits of this distinction, the following summary list may serve to illustrate the scope of behavioral
variables for which assessment procedures have been developed.
(1) Performance variables: These include measures of sensory processes (for example: tactile sensitivity,
visual acuity, color vision proficiency, auditory intensity threshold); perceptual aptitudes (tactile texture differentiation, visual closure, visual or auditory pattern recognition, memory for faces, visuo-spatial tasks, etc.);
measures of attention and concentration (tonic and phasic alertness; span of attention; distractibility; double-performance tasks; vigilance performance over time); psychomotor aptitudes (including a wide variety of
speed-of-reaction task designs); measures of learning and memory (short-term vs. long-term memory; memory span; intentional vs. incidental memory; visual/auditory/kinesthetic memory); assessment of cognitive performance and intelligence (next to general intelligence a wide range of primary mental abilities like verbal
comprehension, word fluency, numerical ability, reasoning abilities, measures of different aspects of creativity, of social or emotional intelligence; see Chapter 16); assessment of language proficiency (developmental
linguistic performance, aphasia test systems, etc.); measures of social competence.
(2) Personality variables: These include the assessment of primary factors of personality (especially of the
so-called Big Five, cf. Chapter 16, and numerous more specific personality measurement scales); special
clinical schedules and symptom checklists (to assess anxiety, symptoms of depression, schizotypic tendency,
personality disorders, etc.); motivation structures and interests; styles of daily living; pastime and life goals;
assessment of incisive life events; assessment of stress tolerance and stress coping (including coping with
serious illnesses and ailments); plus a wide range of still more specific assessment variables, like measures
for the assessment of specific motives or specific styles of coping with illness or stressful life events.
The number of psychodiagnostic assessment methods meeting high psychometric standards must by now run into many tens of thousands, making it impossible to give more than an informative
overview within the limitations of this chapter. Rather than enumerating hundreds of assessment procedures
we shall here take a systematic look at major data sources for psychological assessment (in Section 20.6)
and then briefly examine a few selected psychodiagnostic assessment problems and how they would be typically approached (in Section 20.7). For a more detailed coverage of assessment methods the reader is referred to three kinds of sources: (i) introductory texts as documented in Resource References; (ii) periodical
encyclopedic resource publications such as the Mental Measurements Yearbook (Impara & Plake, 1998; now also accessible via the internet at http://www.unl.edu/buros/catalog.html)
and corresponding resource publications in languages other than English; and, most recent and most useful,
(iii) electronic on-line accessible assessment method archives (as part of PsycInfo, provided by the American Psychological Association through its internet site: http://www.psycinfo.com or, for example, the German
test data archive PSYTKOM: http://www.zpid.psychologie.de). To illustrate the international breadth and diversity in the field of psychological assessment, Professors Houcan Zhang, Pierre Vrignaud, Vladimir Roussalov, and Rocio Fernandez-Ballesteros accepted invitations to contribute Sections 20.8–20.11 to this chapter with overviews of Chinese-language, French-language, Russian-language, and Spanish-language assessment methods, respectively.
20.6 Ten Data Sources for Psychological Assessment
By a rough estimate, more than 80% of all published assessment methods will be questionnaires or objective
tests. As we shall see in this section, the range of possible assessment data sources extends considerably
farther, though. In practical assessment work, too, psychologists tend to complement (cross-check or simply expand) their assessment with one or several non-questionnaire and non-test methods. For example, in clinical assessments behavior observation and interview data, and often also psychophysiological data, are considered essential additional information, as are interview and actuarial/biographical data in industrial/organizational assessments.
Table 20.1 Ten data sources in psychological assessment (adapted from Pawlik, 1998)

                                   | Data modality                          | Variance accessed  | Response
Data source                        | Mental repr.  Behavior  Psychophysiol. | Laboratory  Field  | objectivity
1  Actuarial and biographical data |               x                        |             x      | +
2  Behavior trace                  |               x                        |             x      | +
3  Behavior observation            |               x                        | x           x      | +/−
4  Behavior rating                 | x                                      | x           x      | +/−
5  Expressive behavior             |               x                        | x           x      | +/−
6  Projective technique            | x                                      | x                  | −/+
7  Interview                       | x             (x)                      | x                  | −
8  Questionnaire                   | x             (x)                      | x                  | −
9  Objective test                  |               x                        | x           x      | +
10 Psychophysiological data        |               (x)       x              | x           x      | +

Note: x = data modality relied on / variance accessed; (x) = in part. Response objectivity: + = can be perfectly response-objective; +/− = possibly satisfactory; −/+ = possibly not satisfactory; − = as a rule deficient.
Table 20.1 gives a summary of ten data sources of psychological assessment which will be briefly explained
below. For each data source three types of entries are given:
• data modality: whether a method relies on mental representations (perceptions, memory, cognitive appraisal) of variations in behavior, on direct concurrent recording of behavior, or on psychophysiological measures;
• variance accessed: whether a method will study behavioral variations under (artificially) standardized and thus restricted ‘laboratory’ conditions (as in a typical clinical or industrial/organizational test situation) or rely on field data, i.e., variations of behavior as they occur in a person's natural life space, outside the laboratory, in the person's home, at the work place, in her/his normal daily activity; and
• response objectivity: whether data can be perfectly response-objective (+), possibly of satisfactory (+/−) or possibly not of satisfactory (−/+) response objectivity or, as a rule, deficient in response objectivity (−).
The reader is referred to Pawlik (1996, 1998) for details of this classification of data sources and to the literature referenced in Section 20.5 for details on specific assessment methods.
(1) Actuarial and biographical data: This category refers to descriptive data about a person's life history; educational, professional, and medical record; possibly also criminal record. Age, type and years of schooling, nature of completed professional education/vocational training, marital status, current employment and positions held in the past, leisure activities, and past illnesses and hospitalizations are examples of actuarial and biographical data. As a rule, such data is available with optimum reliability and often represents indispensable information, for example, in clinical and industrial/organizational assessments. Special biographical checklist-item assessment instruments may be available in a given language and culture for special applications.
(2) Behavior trace: This refers to physical traces of human behavior like handwriting specimens, products of art and expression (drawings, compositions, poems or other kinds of literary products), left-overs after play in a children's playground, style (tidy or untidy, organized or ‘chaotic’) of a self-devised living environment at home, but also attributes of a person's appearance (e.g., bitten finger nails!) and attire.
While at times perhaps intriguing, also within a wider humanistic perspective, the validity of personality assessments based on behavior traces can be rather limited. For example, graphology (handwriting analysis)
has been known for a long time to fall short of acceptable validity criteria in carefully conducted validation
studies (see Guilford, 1959; Rohracher, 1969). On the other hand, behavior trace variables may provide valuable information in clinical contexts and at the process stage of developing assessment hypotheses (cf. Figure 20.1).
(3) Behavior observation: In some sense, behavior observation will form part of each and every assessment.
In the present context the word observation is used in a more restricted sense, though, referring to direct
recording/ monitoring, describing, and operational classification of human behavior, over and above what may
be already incorporated in the scoring rationale of a questionnaire, an interview schedule, or an objective test.
Examples of behavior observation could be: studying the behavior of an autistic child in a playground setting;
monitoring the behavior of a catatonic patient on a 24-hour basis; observing a trainee's performance in a newly
designed work place; or self-monitoring of mood swings by a psychotherapy patient in between therapy sessions.
An enormous amount of research literature is available on the design of behavior observation schedules,
on questions of time vs. event sampling in ambulatory behavior monitoring (see, for example, Fahrenberg &
Myrtek, 1996; Pawlik & Buse, 1996), on alternative rationales for defining units of observation in the continuous spontaneous stream of behavior, on observer training, adequate periods of continuous other-monitored
behavior observation, or on reactivity changes in behavior as a result of the observation procedure, to quote
only a few.
In a way, it is regrettable that the development of self-administering questionnaires and objective tests, starting in the 1920s and 1930s, has pushed careful, systematic behavior observation to the side of the assessment process. Only in recent years, especially within clinical assessment and treatment contexts following
behavior-therapeutic approaches (cf. Chapter 22), is the potential value of systematic behavior observation for the assessment process being re-discovered.
(4) Behavior ratings: In behavior rating assessments a person is asked to evaluate her/his own behavior or
the behavior of another person with respect to given characteristics, judgmental scales, or checklist items.
The method can be applied to concurrent behavior under direct observation (as in modern assessment center
applications) or, and more typically, to the rater's explicit or anecdotal memory of the ratee's behavior at previous
occasions, in (past or imagined) concrete situations, or in a general sense. Behavior rating methods may tell
more about the mental representations that raters hold (developed, believe in) regarding the assessed person's behavior than about that behavior itself. A vast amount of research literature has accumulated on such
research issues as raters' response sets and judgmental errors, on inter-rater reliability as a function of rating format and rating scale design, and on the standardized definition of rating scale units by giving sample video or audio behavior records.
Behavior ratings constitute an essential methodology in clinical and industrial/organizational psychology, in
psychotherapy research and, last but not least, in basic personality research. Modern textbooks of personality
research (see Chapter 16) usually give detailed accounts of how to devise behavior rating scales and how
to compensate for common sources of error variance in ratings (severity vs. mildness error; central tendency
error; positive or negative halo effect; semantic error; rater—attribute interaction error; and so-called logical
errors, resulting from a rater's implicit theory about overlap and correlations between attributes).
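As an illustration of the agreement statistics this literature builds on, the following sketch computes Cohen's kappa, a standard chance-corrected index of agreement between two raters. The function and the sample ratings are purely illustrative and not taken from the chapter.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same cases."""
    n = len(rater_a)
    # Proportion of cases on which the raters actually agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance if both raters assigned categories
    # independently, each with her/his own marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (observed - expected) / (1 - expected)

# Two raters judging the same ten behavior samples on a 3-point scale.
a = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
b = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]
print(round(cohens_kappa(a, b), 2))  # 0.7: substantial, but not perfect, agreement
```

A kappa of 1.0 indicates perfect agreement and 0 agreement no better than chance; rating-scale refinements such as anchored scale units aim, among other things, at raising this index.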
(5) Expressive behavior: As a technical term, expressive behavior refers to variations in the way in which a person may look, move, talk, or express her/his current state of emotion, feelings or motives. Making a grim-looking face, trembling, getting a red face, sweating on the forehead, walking in a hesitant way, or speaking loudly or with an anxiously soft voice would be examples of variations in expressive behavior. Thereby expression refers to stylistic attributes in a person's behavior which will induce an observer to draw, explicitly or implicitly, inferences about that person's state of mind, emotional tension, feeling state, or the like.
Assessing another person from her/his expressive behavior has a long tradition which goes back to pre-scientific days. Chapter 16 gives examples of such early attempts to study human personality through individual differences in physique, habitual facial expression, and other bodily characteristics. Despite some intuitive plausibility (let alone culture-bound interpretative traditions!), correlations between objectively measured
personality attributes and variations in physique and habitual expression do not warrant use of these variables in psychological assessment of stable personality traits (Guilford, 1959). The older German Ausdruckspsychologie (psychology of expression; for a summary cf. Rohracher, 1969), which hypothesized substantial
physique—personality correlations, has been disproven. However, there is significant validity in expressive
behavior variables for assessing state variations. Thus Ekman (1982), using modern time-fractioned video-analysis methods, was able to show that variations in facial expression co-vary substantially and significantly
with changes in concurrent state of feeling and emotion, giving rise to objectively scorable, reliable assessments of emotional state on the basis of video-taped facial expression. More recently this approach has been
extended to the study of gross bodily movement expression (Feldman & Rimé, 1991). This research is relevant also for developing teaching aids in psychological assessment and observer training.
(6) Projective technique: In Section 20.1 the design of the Rorschach Test (Rorschach, 1921) was introduced
to illustrate a projective assessment procedure. In another procedure, the Thematic Apperception Test (TAT;
Murray, 1943), the person is presented pictures (some photos, some drawings), many of them showing one
or several persons in an ambiguous situation. The task of the person is to tell a story matching the picture,
describing her/his perception of the situation shown, of events that would have led to this situation, and how
s/he thinks the story will end.
In the 1930s and 1940s many clinical psychologists, often influenced by psychoanalysis and other forms of
depth psychology, placed high expectations in such projective techniques, believing that they would induce
a person to express her/his perception of the ambiguous stimulus material, thus willingly or even unwillingly
‘uncovering’ her/his personal individuality, including motives and emotions that the person may not even be
aware of. Later, in the 1950s and 1960s, research clearly showed that such assessment methods not only tend to lack scoring objectivity and psychometric reliability, but—and still more important—also turn out to be of very limited validity, if any. As early as Murstein's (1963) review the underlying projection hypothesis
could not be verified. Nevertheless projective tests still keep some of their appeal today, and research in the
1960s and thereafter succeeded in improving techniques like the Rorschach test at least as far as scoring
objectivity and reliability are concerned (for example, Holtzman Inkblot Test: Holtzman, Thorpe, Swartz, &
Herron, 1961). Furthermore thematic association techniques like the TAT maintain their status as assessment
methods potentially useful for deducing assessment hypotheses. In addition, special TAT forms have been
devised for assessing specific motivation variables such as achievement motivation (McClelland, 1971). In
the clinical context, once their prime field of application, projective techniques are no longer considered a tenable basis for hypothesis testing and theory development, let alone therapy planning and evaluation.
(7) Interview: Most psychodiagnostic assessments will include an interview at least as an ancillary component—if only for establishing personal contact and an atmosphere of trust. Extensive research on
interview structure, interviewer influences, and interviewee response biases has given rise to a spectrum of
interview techniques for different purposes and assessment contexts. As a rule, clinical assessments will start
out (cf. Figure 20.1) with an exploratory interview in which the psychologist will seek to focus the problem at
hand and collect information for deriving assessment hypotheses. An interview is called unstructured if the questions asked by the psychologist do not follow a predetermined course and depend, largely if not exclusively, on the person's responses and own interjections. Today most assessment interviews are semi-structured or fully structured. In the first case, the interviewer is guided by a schedule of questions or topics, with varying degrees of freedom as to how the psychologist may choose to follow up on the person's responses. Fully structured interviews follow an interview schedule containing all questions to be asked, often with detailed rules about which question(s) to ask next depending on a person's response to previous questions. An example of such a structured clinical interview schedule is the Structured Clinical Interview (SCID; Spitzer, Williams, & Gibbon, 1987) for clinical assessments according to the Diagnostic and Statistical Manual (DSM; cf. Section 20.1).
The less structured an interview, the richer it may prove in breadth of information touched upon, but the less its results will conform, as a rule, to standard psychometric criteria of assessment reliability. The enormous
amount of literature on psychometric pitfalls in interview data and on how to improve interview schedules so
as to yield more reliable assessment information is well documented (cf. Guilford, 1959). In general, structured interviews like SCID will exceed semi-structured and unstructured interviews in psychometric quality.
Interviews have also been devised as a means to introduce an assessment situation which then allows for
direct behavior observation—over and above recording the person's answers. Such clinical interview and behavior observation schedules have been developed, for example, by Lorr, Klett, and McNair (1965) (see also Pawlik, 1982, pp. 302–343), by Baumann and Stieglitz (1983) or in the Present State Examination (PSE;
Wing, Cooper, & Sartorius, 1974). With proper interviewer training, these combined interview—behavior observation schedules have been shown to yield high scale reliabilities (of 0.85 and above!), at the same time
extracting highly valid clinical-psychological variance.
(8) Questionnaires: Originally, personality inventories, interest surveys, and attitude or opinion schedules were devised as structured interviews in written form, following a multiple-choice response format (rather than
presenting questions open-ended as in an interview proper). In a typical questionnaire each item (question
or statement) will be followed by two or three response alternatives such as ‘yes—do not know—no’ or
‘true—cannot say—untrue’. Early clinical personality questionnaires like the Minnesota Multiphasic Personality Inventory (MMPI; Hathaway & McKinley, 1943; recent revised edition by Butcher, Dahlstrom, Graham, Tellegen, & Kaemmer, 1989) drew much of their item content from confirmed clinical symptoms and syndromes.
By contrast, personality questionnaires designed to measure extraversion—introversion, neuroticism, and
other personality factors in healthy normals rely on item contents from empirical (mostly factor-analytic) studies of these primary factors of personality.
As in behavior ratings, research identified a number of typical response sets also in questionnaire data, including acquiescence (readiness to choose the affirmative response alternative, regardless of content) and social desirability (preference for the socially more acceptable response alternative). One way to cope with these sources of deficient response objectivity was to introduce special validity scales (as early as in the MMPI) to control for response sets in a person's protocol. Yet individual differences in response sets may—and in fact
do—relate also to valid personality variance themselves. There is common agreement today that a person's responses to a questionnaire must not be interpreted as behaviorally veridical, but only within empirically established scale validities. For example, a person's response to the questionnaire item ‘I frequently feel fatigued without being able to give a reason’ must not be interpreted as being behaviorally indicative of the so-called fatigue syndrome. Rather, subjects may differ in what they mean by ‘frequently’, by ‘fatigued’, by ‘without reason’, and in how broad a time and situation sample they base their response on. After all, questionnaire data is assessment data about mental representations (perception, memory, evaluation) of behavior variations in a person's self-perception and self-cognition. It tells us a lot about the awareness persons develop of their own behavior, which may, but need not, turn out veridical in objective behavioral terms. So the aforementioned item will carry its diagnostic value only as contributing to the validity of a psychometrically reliable questionnaire scale, in this case the scale ‘neuroticism’, with proven high clinical validity.
(9) Objective tests: Tests constitute the core of psychological assessment instruments; it is through them that
psychological assessment has reached its level of scientific credibility and wide range of applications. A test
is a sample of items, questions, problems etc. chosen so as to sample, in a representative manner, the universe of items, questions or problems indicative of the trait or state to be assessed, for example, an aptitude
or personality trait or a mood state like alertness. The adjective ‘objective’ refers to administration, scoring, and response objectivity in test development (with the exception of possible ‘faking bad’; see Section 20.4).
Objective tests have been developed for the full spectrum of behavior variables referenced in Section 20.5;
their number goes into tens of thousands.
A test is called an individual test if it requires an examiner to administer it individually to the person assessed.
Psychomotor and other performance tests are typical examples of tests still given individually. Still the most
widely used intelligence test system, the Wechsler Adult Intelligence Scale (WAIS; Wechsler, 1958; and later
editions) and its derivatives are administered individually throughout. The other test design, group tests, are
devised so that one examiner can administer them to a number of persons (typically 20 to 30) at the same
time in the same setting. Traditionally group tests were developed in so-called paper-and-pencil form, with the
test items printed in a booklet and the person answering on a special answer sheet. Today the advantages
of individual testing (for example, pacing and selection of items according to the person's own choice; individual timing of item responses) and of group testing (for example, higher objectivity of administration; higher assessment economy) can be combined in CAT assessment. With the exception of purely manipulative-practical tasks (as in testing psychomotor manipulative skills), almost any type of test item can be adapted to CAT, with the additional advantage of multimodal (for example, visual plus auditory) information display and efficient tailored or adaptive testing (see Section 20.4). Some important tests widely in use today will be listed in
Sections 20.7–20.11 under the respective problem heading.
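The tailored-testing idea mentioned above can be sketched in a few lines: present next whichever unused item best matches the current ability estimate, and adjust the estimate after each response. This is a deliberately simplified heuristic, not a real CAT engine (operational systems use item-response-theory models); all names and numbers are illustrative.

```python
def adaptive_test(item_bank, answer, n_items=5):
    """Minimal tailored-testing loop.

    item_bank: dict mapping item name -> difficulty (same scale as ability).
    answer:    callable taking an item name, returning True if answered correctly.
    """
    ability = 0.0  # start the estimate at the population mean
    step = 1.0     # adjustment step, halved after every item
    used = set()
    for _ in range(n_items):
        # Present the unused item whose difficulty is closest to the estimate.
        item = min((i for i in item_bank if i not in used),
                   key=lambda i: abs(item_bank[i] - ability))
        used.add(item)
        # Nudge the estimate up after a correct answer, down after an error.
        ability += step if answer(item) else -step
        step /= 2
    return ability

# Simulated examinee with true ability 1.0: solves every item at or below that level.
bank = {"i1": -2.0, "i2": -1.0, "i3": 0.0, "i4": 1.0, "i5": 2.0}
print(adaptive_test(bank, lambda item: bank[item] <= 1.0))  # 1.4375
```

The estimate homes in on the examinee's level after only a handful of items, which is the source of the assessment economy claimed for CAT above.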
While the development of objective behavior tests of performance has been brought to a high level of proficiency and psychometric quality, objective behavior tests of personality still linger in a far-from-final phase of
development—despite massive, continuing efforts by Eysenck, Cattell, and many others (cf. Cattell & Warburton, 1965; Hundleby, Pawlik, & Cattell, 1963). There is confirmed empirical evidence that personality variables, i.e., measures of mode and style of typical behavior (rather than of optimum performance), are
more difficult to assess through objective tests than through conventional questionnaire scales, behavior observations, or behavior ratings. As a consequence, recent research in objective personality test design began
to concentrate on miniature-type laboratory tasks of potential validity, for example, as behavioral markers of
psychopathology (Widiger & Trull, 1991).
(10) Psychophysiological data: All variations in behavior and conscious experience are nervous-system
based, with ancillary input from the hormone and immune systems and from peripheral organic processes (see Chapter 4 for a detailed account). This should lead us to expect that individual differences as revealed in psychological assessment should be accessible also, and perhaps even more directly so, through monitoring psychophysiological system parameters that relate to the kind of behavior variations that an assessment is targeted at. These psychophysiological variables include measures of brain activity and brain function plasticity (electroencephalogram, EEG; functional magnetic resonance imaging, fMRI;
magnetoencephalogram, MEG), of hormone and immune system parameters and response pattern, and of
peripheral psychophysiological responses mediated through the autonomic nervous system (cardiovascular
system response patterns: electrocardiogram, ECG; breathing parameters: pneumogram; variations in sweat
gland activity: electrodermal activity, EDA; in muscle tonus: electromyogram, EMG; or in eye movements and in pupil diameter: pupillometry). Standard psychophysiology textbooks (see for example Cacioppo & Tassinary, 1990) introduce basic concepts and measurement operations. Modern computer-assisted recording and analysis of psychophysiological data facilitate on-line monitoring, often concurrent with presentation of objective tests, in an interview situation or even, by means of portable recording equipment, in a person's habitual
daily life course (ambulatory psychophysiology).
In one kind of psychophysiological assessment one or several of the aforementioned psychophysiological parameters are recorded while the person is shown different stimuli. For example, one measures the orienting response in electric skin conductance (a parameter solely depending on sympathetic autonomic nervous system activity) to simple tones of medium intensity. It was shown early on that schizophrenic patients will follow, more frequently than normals, a non-responder pattern, showing less clear orienting reactions than normals to these stimuli. While there are fewer than 10% non-responders among normals, their frequency in schizophrenics approaches 50% (Bernstein, 1987). A rich research literature has accumulated from this approach in
recent years; there is reason to expect that psychophysiological assessments may one day become methods
of first choice for assessing state variations, especially in clinical contexts.
Still another, more recent innovation in psychophysiological assessment refers to stable, genetically linked
biological covariants of personality and aptitude development. Recent research from behavior genetics has
succeeded in identifying, for the first time, circumscribed genetic markers for aspects of intellectual development or for a personality trait like extraversion—introversion (see Pawlik, 1998, for details). Surely individual
differences in intellective functioning and personality formation are determined only in part genetically (see
also Chapter 16). Yet assessing the contributing genetic matrix may one day help to improve our understanding of possible or even necessary supportive behavioral intervention and should prove useful in predictive
assessment.
Before closing this section, two general comments seem in order. First, the ten data sources of psychological
assessment listed in Table 20.1 must not be considered mutually exchangeable. Quite to the contrary, different data sources differ substantially in their specific validity and sensitivity for some and only some assessment variables. We have seen earlier that objective tests are more suitable for assessing performance and
aptitude traits, while questionnaires are more sensitive to detecting differences in personality variables. Furthermore, each data source carries with it source-specific variance, called method variance. Consequently,
measures of the same trait assessed from different data sources will show lower interindividual correlations
as compared with trait measures assessed through the same data source—up to the point that different traits
assessed from the same source may even correlate higher than the same trait assessed through different
sources! It was this problem of method variance that originally led Campbell and Fiske (1959) to devise their
multitrait-multimethod matrix methodology of construct validation (cf. Section 20.4). In practical assessment
work one seeks to counterbalance method-specific sources of variance by combining assessment methods
from different sources, bringing together objective test and behavior observation information plus actuarial
and biographical data, rather than relying solely on test data, for example.
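The effect of method variance can be made concrete with a toy simulation (not from the chapter; all names and variance figures are arbitrary): each observed score is built as trait variance plus method-specific variance plus noise, and when the method component is large, the correlation between different traits measured by the same source exceeds that between the same trait measured by different sources.

```python
import random
import statistics as st

random.seed(1)

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = st.mean(x), st.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

n = 500
# Latent person parameters: two traits, plus a person-specific bias per data source.
trait = {t: [random.gauss(0, 1.0) for _ in range(n)] for t in ("anxiety", "extraversion")}
bias = {m: [random.gauss(0, 1.2) for _ in range(n)] for m in ("questionnaire", "rating")}

def measure(t, m):
    # Observed score = trait + method-specific variance + measurement noise.
    return [trait[t][i] + bias[m][i] + random.gauss(0, 0.5) for i in range(n)]

q_anx, r_anx = measure("anxiety", "questionnaire"), measure("anxiety", "rating")
q_ext = measure("extraversion", "questionnaire")

monotrait_heteromethod = pearson(q_anx, r_anx)  # same trait, different sources
heterotrait_monomethod = pearson(q_anx, q_ext)  # different traits, same source
print(heterotrait_monomethod > monotrait_heteromethod)
```

Here method variance is deliberately made large, reproducing the pathological pattern described above; Campbell and Fiske's multitrait-multimethod matrix makes exactly this comparison across all trait-method combinations in order to detect it.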
Another comment seems in order on the column labeled ‘variance accessed’ in Table 20.1. Today we begin
to understand that some classical validity problems in psychological assessment do not relate primarily to
psychometric imperfections of assessment instruments employed, but rather to some artificiality imported into
the assessment process by relying too much on laboratory-type data. It has been argued repeatedly in recent
years (also by the present author; see Pawlik, 1998) that psychological assessment must open up to ambulatory or in-field data in order to directly capture sources and degrees of behavioral variation in their naturally
occurring patterns of settings and co-variations. While some assessment sources (3, 4, and 5 in Table 20.1)
are principally open to in-field applications, others (especially 6, 7, and 8 in Table 20.1) seem to be limited to
stationary application, devoid of in-field input. Here the assessment methodology AMBU (Ambulatory Monitoring and Behavior-Test Unit) developed by Pawlik and Buse (1996) allows one to administer, through the use
of a special portable PC test technology, ultra-short chronometric performance tests together with scales for self-monitoring (of behavior and mood states, for example) and peripheral psychophysiological recording under unrestrained in-field conditions, with promising within-subject/within-occasion reliability of measurement.
Fields of application range from ergonomic testing to clinical outpatient monitoring.
20.7 Practical Applications
In this section, the reader will be introduced to some widely used methods of psychological assessment for three frequently encountered assessment problems: testing of intellective and other aptitude functions;
psychological assessment in clinical contexts; and vocational guidance testing.
(1) Assessment of intelligence and other aptitude functions: Clearly this is the primary domain of objective
behavior tests. It was mentioned earlier (Section 20.1) that tests of cognitive and other aptitudes were among
the first methods of assessment ever to be developed. Following up on the scaling proposal of mental age
(age-equivalence, in months, of the number of test items solved correctly) as suggested by Binet and Henri
(1896) in their prototype scale of intellectual development in early childhood, the German psychologist William
Stern suggested an intelligence quotient (IQ), defined as the ratio of mental age over biological age, as a
measurement concept for assessing a gross function like intelligence in a score that would be independent
of the age of the person tested. When subsequent research revealed psychometric inadequacies with this
formula, the US psychologist David Wechsler proposed in his test (Wechsler, 1958) an IQ computed as an age-standardized, normalized standard score (with a mean of 100 and a standard deviation of 15). Now available in
re-designed and re-standardized form as the Wechsler Adult Intelligence Scale (WAIS), the Wechsler Intelligence Scale for Children (WISC), and the Wechsler Preschool and Primary Scale of Intelligence (WPPSI), this test package has become the trend-setting intelligence test system of widest application, internationally as well through numerous foreign-language adaptations. So a closer look at its assessment structure seems in order.
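The two IQ definitions just described, Stern's ratio IQ and Wechsler's deviation IQ, can be made concrete in a short numerical sketch. All numbers below are hypothetical illustrations, not values from any published norm table:

```python
# Illustrative sketch of the two IQ definitions discussed above.
# All numbers are hypothetical examples, not published test norms.

def ratio_iq(mental_age_months: float, chronological_age_months: float) -> float:
    """Stern's ratio IQ: mental age over chronological age, scaled by 100."""
    return 100.0 * mental_age_months / chronological_age_months

def deviation_iq(raw_score: float, age_group_mean: float, age_group_sd: float) -> float:
    """Wechsler-type deviation IQ: an age-standardized score
    rescaled to a mean of 100 and a standard deviation of 15."""
    z = (raw_score - age_group_mean) / age_group_sd
    return 100.0 + 15.0 * z

# A 10-year-old (120 months) performing at the level of a typical
# 12-year-old (144 months) obtains a ratio IQ of 120:
print(ratio_iq(144, 120))        # 120.0

# A raw score one standard deviation above the age-group mean
# corresponds to a deviation IQ of 115:
print(deviation_iq(60, 50, 10))  # 115.0
```

The sketch also makes the psychometric motivation for the change visible: the deviation IQ is anchored to the score distribution of each age group, so a given IQ value retains the same relative meaning at every age, which the simple ratio cannot guarantee.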
The WAIS, for example, contains ten individually administered tests of two kinds: five verbal tests (general information, general comprehension, digit memory span, arithmetic reasoning, finding similarities of concepts) and five performance tests (digit-symbol substitution, arranging pictures according to the sequence of a story, completing pictures, block-design mosaic test, object assembly of two-dimensional puzzle pictures). A person's test performance is assessed in three IQ scores: verbal IQ, performance IQ, and total IQ. Surprisingly enough, this kind of over-all test of cognitive functioning is still maintained in practical assessment work, despite indisputable and overwhelming empirical evidence that general intelligence as a trait accounts for only part, at most perhaps about 30%, of individual-difference variation in cognitive tests (Carroll, 1993). More
recent examples of general-intelligence-type tests are the Kaufman Assessment Battery (Kaufman & Kaufman, 1983, 1993) and the German-language Begabungstestsystem (BTS; ability test system; Horn, 1972).
An alternative, theoretically more developed approach is called differential aptitude assessment. Tests in this
tradition are usually based on the results of factor-analytic multi-trait studies of intelligence, originating in the
work of Thurstone, Guilford and their students. Thurstone's Primary Mental Abilities Test (PMA; Thurstone &
Thurstone, 1943), the Differential Aptitude Tests Battery (DAT; Bennett et al., 1981), the Kit of Reference Tests
for Cognitive Factors (French, Ekstrom, & Price, 1963) or the German Intelligenz-Struktur-Test 70 (IST 70;
Amthauer, 1973) and, more recently, the Berliner Intelligenzstruktur-Test (BIS-Test; Jäger, Süss, & Beauducel, 1996) are typical examples of this assessment approach that provides separate standardized scales for
each selected primary intelligence factor.
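The contrast with a single global IQ can be illustrated with a minimal scoring sketch: instead of summing every subtest into one score, each primary factor receives its own standardized scale, yielding an aptitude profile. The factor names and norm parameters below are hypothetical illustrations, not taken from any of the batteries cited above:

```python
# Minimal sketch of differential aptitude scoring: each primary factor
# gets its own standardized scale instead of a single global IQ.
# Factor names and norm parameters are hypothetical illustrations.

NORMS = {
    # factor: (age-group mean, age-group standard deviation) of raw scores
    "verbal comprehension": (40.0, 8.0),
    "number facility":      (25.0, 5.0),
    "spatial ability":      (30.0, 6.0),
}

def profile(raw_scores: dict) -> dict:
    """Return one standard score (mean 100, SD 15) per primary factor."""
    return {
        factor: 100.0 + 15.0 * (raw_scores[factor] - mean) / sd
        for factor, (mean, sd) in NORMS.items()
    }

scores = profile({
    "verbal comprehension": 48.0,   # one SD above the mean  -> 115.0
    "number facility":      25.0,   # exactly at the mean    -> 100.0
    "spatial ability":      21.0,   # 1.5 SD below the mean  ->  77.5
})
for factor, score in scores.items():
    print(f"{factor}: {score:.1f}")
```

The point of the design is diagnostic: two persons with identical total scores may show very different profiles across the separate factor scales, information a single IQ necessarily discards.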
In addition to these tests of intellective functions, numerous more specialized aptitude tests have been developed, such as the Wechsler Memory Scale (Wechsler & Stone, 1974), speci...