The Validity and Structure of Culture-Level
Personality Scores: Data From Ratings
of Young Adolescents
Robert R. McCrae,1 Antonio Terracciano,1 Filip De Fruyt,2
Marleen De Bolle,2 Michele J. Gelfand,3 Paul T. Costa, Jr.,1
and 42 Collaborators of the Adolescent Personality Profiles
of Cultures Project
1 National Institute on Aging
2 Ghent University
3 University of Maryland
ABSTRACT We examined properties of culture-level personality traits in ratings of targets (N = 5,109) ages 12 to 17 in 24 cultures. Aggregate scores were generalizable across gender, age, and relationship groups and showed convergence with culture-level scores from previous studies of self-reports and observer ratings of adults, but they were unrelated to national character stereotypes. Trait profiles also showed cross-study agreement within most cultures, 8 of which had not previously been studied. Multidimensional scaling showed that Western and non-Western cultures clustered along a dimension related to Extraversion. A culture-level factor analysis replicated earlier findings of a broad Extraversion factor but generally resembled the factor structure found in individuals. Continued analysis of aggregate personality scores is warranted.

The Adolescent Personality Profiles of Cultures Project collaborators include Maria E. Aguilar-Vafaie, Tarbiat Modarres University, Islamic Republic of Iran; Chang-kyu Ahn, Pusan National University, South Korea; Hyun-nie Ahn, Ewha Womans University, South Korea; Lidia Alcalay, Pontificia Universidad Catolica De Chile, Chile; Jüri Allik, University of Tartu, Estonia; Tatyana V. Avdeyeva, University of St. Thomas, USA; Marek Blatný, Academy of Science of the Czech Republic, Czech Republic; Denis Bratko, University of Zagreb, Croatia; Marina Brunner-Sciarra, Universidad Peruana Cayetano Heredia, Peru; Thomas R. Cain, Rutgers University, USA; Niyada Chittcharat, Srinakharinwirot University, Thailand; Jarret T. Crawford, The College of New Jersey, USA; Margarida P. de Lima, University of Coimbra, Portugal; Ryan Fehr, University of Maryland, USA; Emília Ficková, Slovak Academy of Sciences, Slovak Republic; Sami Gülgöz, Koç University, Turkey; Martina Hřebíčková, Academy of Science of the Czech Republic, Czech Republic; Lee Jussim, Rutgers University, USA; Waldemar Klinkosz, The John Paul II Catholic University of Lublin, Poland; Goran Knežević, Belgrade University, Serbia; Nora Leibovich de Figueroa, University of Buenos Aires, Argentina; Corinna E. Löckenhoff, Cornell University, USA; Thomas A. Martin, Susquehanna University, USA; Iris Marušić, Institute for Social Research, Zagreb, Croatia; Khairul Anwar Mastor, Universiti Kebangsaan Malaysia, Malaysia; Katsuharu Nakazato, Iwate Prefectural University, Japan; Florence Nansubuga, Makerere University, Uganda; Jose Porrata, San Juan, Puerto Rico; Danka Purić, Belgrade University, Serbia; Anu Realo, University of Tartu, Estonia; Norma Reátegui, Universidad Peruana Cayetano Heredia, Peru; Jean-Pierre Rolland, Université Paris Ouest Nanterre La Défense, France; Vanina Schmidt, University of Buenos Aires, Argentina; Andrzej Sekowski, The John Paul II Catholic University of Lublin, Poland; Jane Shakespeare-Finch, Queensland University of Technology, Australia; Yoshiko Shimonaka, Bunkyo Gakuin University, Japan; Franco Simonetti, Pontificia Universidad Catolica De Chile, Chile; Jerzy Siuta, Jagiellonian University, Poland; Barbara Szmigielska, Jagiellonian University, Poland; Vitanya Vanno, Srinakharinwirot University, Thailand; Lei Wang, Peking University, People’s Republic of China; Michelle Yik, The Hong Kong University of Science and Technology, Hong Kong.

Robert R. McCrae and Paul T. Costa, Jr., receive royalties from the Revised NEO Personality Inventory. This research was supported in part by the Intramural Research Program of the NIH, National Institute on Aging. The Czech contribution was supported by grant 406/07/1561 from the Grant Agency of the Czech Republic and is related to research plan AV0Z0250504 of the Institute of Psychology, Academy of Sciences of the Czech Republic. The authors are indebted to the following persons for their help with the data collection: Ana Butković, Sylvie Kouřilová, Valery E. Oryol, Ivan G. Senin, Vera V. Onufrieva, A. Maglio, I. Injoque Ricle, G. Blum, A. Calero, L. Cuenya, V. Pedrón, M. J. Torres Costa, D. Vion, Hamira Alavi, Kristina Burgetova, Shuo Chen, Irene Lee, Cindy Lo, and Javier Paredes.

Correspondence concerning this article should be addressed to Robert R. McCrae, 809 Evesham Avenue, Baltimore, MD 21212. E-mail: RRMcCrae@gmail.com.

Journal of Personality 78:3, June 2010
This article is a US Government work and is in the public domain in the USA.
DOI: 10.1111/j.1467-6494.2010.00634.x

The idea that the citizens of different nations have distinctive personalities can be traced to antiquity, and it was a central tenet of early 20th century culture and personality studies (LeVine, 2001). For a number of reasons, including the declining influence of psychoanalysis and ethical concerns about ethnocentrism (see Church, 2001), the topic fell out of favor, and interest has only recently been revived, this time from the perspective of trait psychology (Lynn & Martin, 1995; McCrae, Terracciano, & 79 Members of the Personality Profiles of Cultures Project, 2005; Schmitt et al., 2007). In this new approach, personality profiles of cultures can be obtained by averaging traits assessed in a sample of culture members, yielding a set of aggregate personality traits. This is an etic approach, in which the same set of traits (usually identified in one culture) are studied across a range of cultures.
The validity of these culture-level scores must be established, and
there are at least two reasons to be skeptical about their accuracy. The
first is that the personality trait scales that are aggregated may not
themselves be commensurable across cultures: They may assess different constructs in different cultural contexts, or they may lack scalar
equivalence (Nye, Roberts, Saucier, & Zhou, 2008; van de Vijver &
Leung, 1997) due to problems in translation, in the relevance of particular items, or to cultural differences in response styles. These are
theoretical threats to the validity of all cross-cultural measures.
The second reason to doubt the validity of aggregate personality
scores is that research to date suggests that they do not correspond
to national character stereotypes (Perugini & Richetin, 2007). It is
widely believed, for example, that the English are reserved—yet their
aggregate personality scores suggest that they are in fact quite extraverted (McCrae, Terracciano, & 79 Members, 2005). This finding is
not a fluke; analyses of data from 49 cultures suggested that national
stereotypes are almost completely unrelated to aggregate personality
traits (Terracciano et al., 2005). Many stereotypes have at least a
kernel of truth (Madon et al., 1998), so the failure to find any association of national character stereotypes with aggregate personality scores is a legitimate source of concern.
National character data from the Personality Profiles of Cultures
(PPOC) project used by Terracciano and colleagues (2005)—and
reanalyzed in the present article—were obtained by asking raters in
each culture to describe the typical member of their own culture.
Such judgments are sometimes called autostereotypes, in contrast to
the heterostereotypes held by members of one culture about members
of another. Several studies, however, have shown general agreement
between these two kinds of stereotypes (Boster & Maltseva, 2006;
Peabody, 1985). People around the world think that Americans are
assertive and arrogant, and so do Americans (Terracciano & McCrae, 2007). Thus, the apparent inaccuracy of national character stereotypes is unlikely to be the result of ethnocentric or ethnophobic
biases or of the way national character stereotypes were assessed.
It is logically possible that both stereotypes and aggregated scores
are invalid, but if forced to choose between them, researchers must
rely on patterns of supporting evidence. Heine, Buchtel, and
Norenzayan (2008), for example, showed that per capita Gross Domestic Product (GDP) is better predicted by stereotypes of Conscientiousness than by aggregate Conscientiousness scores. But this
evidence is ambiguous, because in stereotypic thinking, industriousness is generally (mis)attributed to the wealthy (Fiske, Cuddy, Glick,
& Xu, 2002), by a kind of variant of the fundamental attribution
error. The weight of evidence to date favors the view that aggregate
scores are accurate and national stereotypes are not (McCrae, Terracciano, Realo, & Allik, 2007b), largely because national stereotypes do not make psychological sense as indicators of national trait
levels. For example, climate is one of the strongest correlates of national stereotypes of interpersonal warmth (McCrae, Terracciano,
Realo, & Allik, 2007a), though few personality psychologists today
believe that ambient temperature is a powerful influence on personality development. Stereotypes also fail to obey simple mathematical
laws: The stereotype of Italians is not the mean of the stereotype of
Northern and Southern Italians, but is almost identical with the latter (McCrae et al., 2007a).
A number of cross-cultural methodologists (see Nye et al., 2008)
have argued that the scalar equivalence of test items across cultures
must be established before mean level comparisons are made—a
strategy McCrae, Terracciano, and 79 Members (2005) labeled bottom-up. In contrast, McCrae and colleagues advocated a top-down
strategy in which the construct validity of aggregate scores is examined directly. There is some support for the convergent validity of
aggregate personality scores (e.g., Oishi & Roth, 2009), but it is still
limited. Rentfrow, Gosling, and Potter (2008) provided validity data
on aggregate personality scores for U.S. states, although those data
do not address the difficulties posed by translation and cultural
variations in response styles. McCrae, Terracciano, and 79 Members
(2005) correlated culture-level scores from studies of self-reported
personality traits with scores from observer-rated traits across 28
cultures. They found significant agreement for three (Neuroticism,
Extraversion, and Openness) of the five factors and 26 of 30 facets of
the Revised NEO Personality Inventory (NEO-PI-R; Costa & McCrae, 1992). Analyzed as profile agreement across the 30 facets
within each culture, significant agreement was found for 22 of the
28 cultures. Aggregate personality scores also showed evidence of
construct validity in their prediction of Hofstede’s (2001) dimensions
of culture (Hofstede & McCrae, 2004) and in their geographical
patterns (Allik & McCrae, 2004; McCrae, Terracciano, & 79 Members, 2005), in which Western cultures tended to cluster together in
contrast to non-Western cultures. Using a different measure of personality, Schmitt and colleagues (2007) reported significant convergent validity between NEO-PI-R factor scores and Big Five
Inventory (BFI) scales (John, Donahue, & Kentle, 1991) for three
of the factors (Neuroticism, Extraversion, and Conscientiousness)
across 27 cultures. (Discriminant validity was more problematic.)
Persuasive evidence of the validity of culture-level aggregate personality scores would have important consequences for cross-cultural psychology. First, it would provide researchers with relatively
accurate accounts of the prevailing personality traits in a variety of
cultures, scores that might be used to predict a variety of nation-level
outcomes of interest (McCrae & Terracciano, 2008). Second, it
would reinforce the conclusion that national character stereotypes
are almost completely unfounded—an observation with consequences both for the psychology of stereotypes and for the practice
of international relations. Third, it would imply that the many
theoretical concerns—potential threats to scalar equivalence—that
have been raised about cross-cultural comparisons may have limited
applicability in real-world data, and thus these concerns may have
had an unwarranted chilling effect on mean comparisons in cross-cultural research. Certainly, every cross-cultural researcher must
continue to be vigilant against artifactual explanations of apparent
cultural differences, but the validity of aggregate personality traits
would serve as an encouragement to study such differences.
With so much at stake, further evidence on the validity of aggregate personality traits is surely needed. The present article reports
new data from the Adolescent Personality Profiles of Cultures
(APPOC) Project, in which aggregate personality traits are scored
from observer ratings of adolescents aged 12 to 17 in a sample of
24 cultures. Although this is a relatively small number, it includes
8 cultures (Argentina, Australia, Chile, Islamic Republic of Iran,
Puerto Rico, Slovakia, Thailand, and Uganda) not previously included in culture-level studies of the validity of personality profiles.
In studies of personality at the individual level, factor replication
is an aspect of construct validity: If scales retain their validity in
translation (and if the structure of personality is universal), then
the same factor structure should emerge within each culture—as,
for the most part, it does in analyses of the NEO-PI-R (McCrae,
Terracciano, & 78 Members, 2005) and in world regional analyses of
the BFI (Schmitt et al., 2007). However, replication of the individual-level factor structure at the culture level is not necessarily required, because the structure of personality may vary across levels of
analysis. Previous research on the culture-level structure of the
NEO-PI-R (McCrae, 2002; McCrae, Terracciano, & 79 Members,
2005) has suggested that the individual-level Five-Factor Model
(FFM) is approximately replicated, but that the Extraversion factor
is expanded to include aspects of other factors, including Impulsiveness, Openness to Fantasy and Values, and Competence—characteristics that appear to be higher in wealthier and more extraverted
cultures. The present study provides an opportunity to replicate this
culture-level finding.
As a general rule, the analysis of aggregate scores ought to reproduce the individual level structure, unless there are specific effects
on structure due to culture (J. Allik, personal communication, August 10, 2004; McCrae & Terracciano, 2008). The present study uses
data on college students’ perceptions of adolescents ages 12 to 17,
and previous analyses of these data at the individual level (De Fruyt
et al., 2009) suggest one deviation from the universal adult factor
structure: Openness to Ideas shows a substantial loading on Conscientiousness, perhaps because both diligence and an interest in
ideas are attributed to adolescents who are known to be good students. It might therefore be hypothesized that a culture-level factor
analysis of these adolescent data will show that aggregate Openness
to Ideas loads on the Conscientiousness factor as well as the Openness factor.
METHOD
Procedure
As detailed elsewhere (De Fruyt et al., 2009), collaborators from 27 sites
representing 18 different languages from 24 cultures provided data. Ratings from multiple sites were available for the United States (three collaborating sites) and Poland (two collaborating sites). Collaborators were
asked to collect anonymous observer ratings from college students who
were randomly assigned one of four targets: a boy or girl ages 12 to 14 or
15 to 17 years. College student ratings were used instead of self-reports
from adolescents for several reasons (convenience, data quality, comparability to PPOC data), but American studies (Costa, McCrae, & Martin,
2008; McCrae, Costa, & Martin, 2005) suggest that self-reports from adolescents would likely yield similar data. Collaborators were asked to
provide data on 50 targets in each category.
Participants received the following general instructions (cf. McCrae,
Terracciano, & 78 Members, 2005): ‘‘This is a study of personality across
cultures. We are interested in how people view others and rate their personality traits, and we will be comparing your responses to those of college students in other countries. Please think of a boy [girl] aged 12–14
[15–17] whom you know well. He [She] should be someone who is a native-born citizen of your country. He [She] can be a relative or a friend or
neighbor—someone you like or someone you don’t like.’’ Valid ratings
were obtained for 5,109 targets.
Measures
The NEO-PI-R (Costa & McCrae, 1992) is among the most frequently used
inventories to assess the FFM and its dimensions of Neuroticism, Extraversion, Openness to Experience, Agreeableness, and Conscientiousness. The
inventory has 30 facets, organized under the five domains, and includes 240
items (8 items per facet), presented with a 5-point Likert response scale. (For
a discussion of the adequacy of this selection of facets to represent the five
factors, see McCrae & Costa, 2008.) For the present study, participants were
administered a questionnaire consisting of the 240 items of the NEO-PI-R
and 37 additional items developed for the NEO-PI-3, a more readable version
of the instrument (McCrae, Costa, et al., 2005). Previous analyses (De Fruyt
et al., 2009) demonstrated that the psychometric properties of the NEO-PI-3
are maintained in the translations used in this study, and that the instrument
is essentially equivalent to the NEO-PI-R in both structure and mean levels.
It is therefore appropriate to compare NEO-PI-3 scores in the present sample
with NEO-PI-R scores obtained in previous studies. NEO-PI-3 facet scales
were standardized as T scores within the full sample (i.e., using individual
level data, N = 5,109, as international adolescent Form R NEO-PI-3 norms);
factor scores were computed using the factor scoring weights for observer
ratings presented in the manual (Costa & McCrae, 1992, Table 2, bottom
panel). Aggregate scores were the mean T scores in each sample or subgroup.
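
The standardization and aggregation steps described above can be illustrated with a minimal sketch. It assumes raw facet scores are held in a plain list with a parallel list of culture labels; the function names (`t_scores`, `aggregate_by_culture`) and the toy data are hypothetical and are not taken from the NEO-PI-3 materials. The text does not state whether the sample or the population standard deviation was used for norming, so the sample formula below is an assumption.

```python
from statistics import mean, stdev


def t_scores(raw):
    """Convert raw scores to T scores (mean 50, SD 10) within the pooled sample."""
    m = mean(raw)
    s = stdev(raw)  # assumption: sample SD; the source does not specify
    return [50 + 10 * (x - m) / s for x in raw]


def aggregate_by_culture(scores, cultures):
    """Average individual-level T scores within each culture."""
    groups = {}
    for score, culture in zip(scores, cultures):
        groups.setdefault(culture, []).append(score)
    return {culture: mean(values) for culture, values in groups.items()}


# Toy data: six targets from two hypothetical cultures.
raw = [10, 12, 14, 16, 18, 20]
cultures = ["A", "A", "A", "B", "B", "B"]
t = t_scores(raw)
culture_means = aggregate_by_culture(t, cultures)
```

Because the T scores are normed on the pooled sample, culture means in this sketch are expressed as deviations from the international mean of 50.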
An index of data quality was also computed for each sample, based on
four indicators: Number of protocols with more than 40 missing items,
percentage of missing responses in valid protocols, number of protocols
with evidence of acquiescence or naysaying, and responses in the unscreened sample to a single-item validity check asking respondents if they
had answered honestly and accurately. Internal consistency of this quality
index was .67.
Criteria
Validity of aggregate APPOC scores was examined by comparing scores
to those previously reported in other samples. These include aggregate
self-report NEO-PI-R data from a collection of available data sets (McCrae, 2002; McCrae & Terracciano, 2008), observer rating NEO-PI-R
data from the adult PPOC (McCrae, Terracciano, & 79 Members, 2005),
and self-report BFI data (Schmitt et al., 2007). In addition, APPOC
scores are compared to national character stereotype (NCS) data
(McCrae et al., 2007a), in which the ‘‘typical’’ member of a culture was
rated by culture members on 30 scales corresponding to the facets of the
NEO-PI-R. For example, the N1: Anxiety facet was assessed by asking if
the typical culture member was ‘‘anxious, nervous, worrying vs. at ease,
calm, relaxed.’’ When factored across nations, the structure of these
stereotype ratings roughly replicated the structure of the NEO-PI-R
(Terracciano et al., 2005). If stereotypes are, in fact, groundless, then
NCS data provide information on the discriminant validity of aggregate
trait scores.
RESULTS AND DISCUSSION
Preliminary Analyses
We compared personality profiles in the three sites in the United States
and the two sites in Poland. Using the SPSS Reliability program,
treating sites as items and NEO-PI-3 facets as cases, we calculated average measure intraclass correlations under the absolute agreement
definition. These values were .77 for the United States and .82 for
Poland (ps < .001). Data from these cultures were therefore collapsed
(as the unweighted means of the different sites) for further analyses.
In previous research (McCrae, Terracciano, & 79 Members,
2005), the variance of facet scores was related to geography, with
larger standard deviations across the full range of facet scores for
modern, Western cultures. The same pattern was found in the present study, with the lowest mean SDs in Malaysia, Peru, and Uganda,
and the highest mean SDs in France, Australia, and Estonia. The
correlation of mean SD in the present study with mean SD in the
PPOC sample was r = .73, N = 24, p < .001. These geographical variations might be due to real differences in the homogeneity of traits in
different cultures, to different response styles (e.g., acquiescence), or
to differences in data quality, which also tends to be lower in non-Western countries (see McCrae, Terracciano, & 79 Members, 2005).
Also in previous research (Costa, Terracciano, & McCrae, 2001;
Schmitt, Realo, Voracek, & Allik, 2008), the magnitude of gender
differences was geographically ordered, with the most marked differences found in modern cultures. As in PPOC (McCrae, Terracciano,
& 78 Members, 2005), we calculated gender difference indexes for
each of the five factors, based on the facets on which adult women
scored higher than men in self-reports (Costa, Terracciano, & McCrae, 2001). For example, because women scored higher than men
on Openness to Aesthetics, Feelings, and Actions and lower on Openness to Ideas, a Female Openness/Closedness index was defined
as (O2: Aesthetics + O3: Feelings + O4: Actions − O5: Ideas)/4. Girls
were rated significantly higher than boys in 74 of the 120 comparisons on the five indexes in 24 cultures. As in previous studies, the
five indexes were positively intercorrelated and were summed to represent a general gender differentiation score (α = .78). As expected,
the smallest differentiation was seen in Puerto Rico, Peru, and
Uganda and the largest in Hong Kong, Slovakia, and Estonia. However, there were also some anomalous findings: Gender differentiation was low in Australia but relatively high in Malaysia. The
correlation of gender differentiation in the present study with gender differentiation in the PPOC sample was only marginally significant (r = .37, N = 23, p < .05, one-tailed). In adult samples, lack of
gender differentiation in traditional cultures has been attributed to
the tendency of traditional men and women to compare themselves
only to others of their own sex, in effect norming away gender
differences in observed scores (Guimond et al., 2007). If so, then true
gender differences are likely to be similar in all cultures.
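
As a worked illustration of the index defined above, the following sketch computes the Female Openness/Closedness index for two mean profiles and takes their difference. The function names, the dictionary layout, and the T-score values are hypothetical; only the formula (O2 + O3 + O4 − O5)/4 comes from the text, and the choice to express differentiation as girls minus boys is an assumption.

```python
def female_openness_index(facets):
    """(O2 + O3 + O4 - O5) / 4, following the definition in the text."""
    return (facets["O2"] + facets["O3"] + facets["O4"] - facets["O5"]) / 4


def gender_difference(girls, boys):
    """Index for girls minus index for boys (assumed direction of scoring)."""
    return female_openness_index(girls) - female_openness_index(boys)


# Hypothetical mean T scores for girls and boys in one culture.
girls = {"O2": 52.0, "O3": 53.0, "O4": 51.0, "O5": 48.0}
boys = {"O2": 48.0, "O3": 47.0, "O4": 49.0, "O5": 52.0}
diff = gender_difference(girls, boys)  # 27.0 - 23.0 = 4.0
```

A positive difference indicates that girls were rated in the direction found for adult women in self-reports.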
In any culture-level analysis it is necessary to recall that variation
within cultures is usually far larger than variation across cultures. A
components-of-variance analysis conducted on PPOC data (McCrae
& Terracciano, 2008) showed that culture accounted for about 4%
of the total variance, age (college vs. adult) for 3%, and sex for about
1%. Table 1 provides parallel information for APPOC. Here the
effect of age is far smaller because the age groups differ very little.
The effects of culture and sex, however, are similar to those seen in
adult targets, although in adolescent targets, the effects of culture are
most pronounced for Extraversion and least for Agreeableness.
Table 1
Percentage of Variance in Observer-Rated NEO-PI-3 Factor Scores Attributable to Culture, Sex, and Age

                              Factor
Source            N       E       O       A       C       Mean
Culture           3.6*    5.0*    2.9*    1.5*    4.3*    3.46
Sex               2.8*    0.1*    1.2*    0.8*    2.2*    1.42
Age               0.2*    0.0     0.0     0.2*    0.2*    0.12
Culture × Sex     0.8*    0.6     0.9*    0.5     0.5     0.66
Culture × Age     0.8*    0.7     1.0*    0.5     0.7     0.74
Sex × Age         0.1*    0.0     0.0     0.0     0.0     0.02

Note. N = 5,109. Age groups: 12 to 14 versus 15 to 17 years. Values are partial η² from a multivariate ANOVA. Three-way interactions were not significant.
*p < .05.

The top panel of Table 2 presents evidence on the generalizability
of aggregate personality scores across gender and age groups. For
these analyses, culture means for factor scores were derived for boys
and girls (or younger and older targets) separately and correlated
across the 24 cultures. All correlations are significant, suggesting that
similar estimates of culture-level means would be obtained regardless
of the age or gender of the targets.
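
The generalizability analysis just described amounts to computing culture means separately within each subgroup and correlating the two sets of means across cultures. The sketch below shows that logic on toy data; the record layout, the function names, and the scores are hypothetical, and the Pearson correlation is written out by hand only to keep the example self-contained.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def subgroup_means(records, subgroup):
    """Mean factor score per culture for one subgroup (e.g., 'girl')."""
    by_culture = {}
    for culture, group, score in records:
        if group == subgroup:
            by_culture.setdefault(culture, []).append(score)
    return {c: mean(v) for c, v in by_culture.items()}


# Toy records: (culture, target gender, factor T score).
records = [
    ("A", "girl", 48.0), ("A", "girl", 50.0), ("A", "boy", 47.0), ("A", "boy", 49.0),
    ("B", "girl", 51.0), ("B", "girl", 53.0), ("B", "boy", 50.0), ("B", "boy", 54.0),
    ("C", "girl", 55.0), ("C", "girl", 57.0), ("C", "boy", 53.0), ("C", "boy", 55.0),
]
girls = subgroup_means(records, "girl")
boys = subgroup_means(records, "boy")
shared = sorted(set(girls) & set(boys))
r = pearson([girls[c] for c in shared], [boys[c] for c in shared])
```

A high correlation across cultures indicates that the rank ordering of cultures is similar whichever subgroup of targets is used to estimate the culture mean.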
We asked about the relationship of raters to targets and found
that it varied somewhat across cultures. For example, 30% of the
targets in Thailand were relatives of the raters, whereas 87% were
relatives in Iran. De Fruyt and colleagues (2009) created a familiarity
index based on questions about how well the raters knew the target,
how often they saw them, and in how many different contexts. On a
0 to 4 scale, familiarity values ranged from 1.88 in Japan to 3.35 in
Australia. Raters reported that they had known targets for 0 to
17 years, with a mean of 9.2 years, but none of the raters had known
their targets for over 10 years in Croatia or Portugal. Because of
these differences across samples, we conducted analyses of variance
on the five factors with culture and each of the dichotomized relationship categories as classifying variables. Most of the effects, even
when significant in this large sample, were trivial in magnitude, and
none of the main effects for relationship category or interaction
effects accounted for more than 1% of the variance. The largest
main effect showed that, unsurprisingly, well-known targets were
rated higher in Extraversion (M = 50.7) than less well-known targets
(M = 48.5). We also examined the generalizability of aggregate
scores across relationship categories. The top panel of Table 2 shows
that, in general, there is strong replicability. Within this pool of
generally well-acquainted raters, the details of the relationship do
not seem to have major effects, so sample differences in these details
are unlikely to affect results.

Table 2
Generalizability and Convergent Correlations of Culture-Level Factor Scores

                                      APPOC Factor
                            N        E        O        A        C
Generalizability
  Across gender             .68***   .82***   .56**    .54**    .83***
  Across age                .61***   .79***   .50**    .49**    .72***
  Across relationships
    Type                    .84***   .80***   .59**    .49*     .73***
    Length(a)               .79***   .78***   .56**    .33      .63**
    Familiarity             .82***   .65***   .65***   .43*     .76***
Convergent correlation
  Form R                    .50**    .55**    .37*     .02      .09
  Form S                    .44*     .74***   .14      .35      .36
  BFI                       .44*     .45*     .27      .05      .17

Note. Type = friend or acquaintance (N = 2,456) versus relative (N = 2,588). Length = known for less than (N = 2,528) versus more than (N = 2,300) 10 years. Familiarity = lower (N = 2,327) versus higher (N = 2,629). Form R = observer rating NEO-PI-R data, N = 24, from McCrae, Terracciano, & 79 Members of the Personality Profiles of Cultures Project (2005); Form S = self-report NEO-PI-R data, N = 16, from McCrae (2002) and McCrae and Terracciano (2008); BFI = self-report Big Five Inventory data, N = 18, from Schmitt et al. (2007).
(a) Across 22 cultures.
*p < .05, **p < .01, ***p < .001, one-tailed.
Convergent and Discriminant Validity of Aggregate Scores
Validity of Scales Across Cultures
The bottom panel of Table 2 shows correlations with aggregate observer ratings (Form R) and self-reports (Form S) on the NEO-PI-R
from previous studies. It also presents correlations with aggregated
BFI self-reports. There is strong evidence of convergent validity for
the Neuroticism and Extraversion factors, only weak evidence for
Openness, and no evidence in these data for the validity of aggregate
Agreeableness and Conscientiousness scores. Nonsignificant correlations for the Agreeableness factor across studies were also reported
by McCrae, Terracciano, and 79 Members (2005) and Schmitt and
colleagues (2007).
Table 3 provides convergent validity information at the level of
the facet scales. The intraclass correlation (first data column;
ICC(1, k) = [BMS − WMS]/BMS) reflects agreement among raters
on targets from each of the 24 cultures and estimates the reliability
of the aggregate scores. These values are very slightly smaller than
those found in analyses of adult targets (Mdn ICC = .91; McCrae,
Terracciano, & 79 Members, 2005).
The second and third data columns in Table 3 show convergent correlations with observer rating and self-report data on the NEO-PI-R.
For Form R, 23 (76.7%) of the facets show significant cross-study
agreement; for Form S, 20 (66.7%) are significant. E2: Gregariousness,
O4: Actions, O5: Ideas, C3: Dutifulness, and C5: Self-Discipline failed to
reach significance in either comparison; Dutifulness and Self-discipline
also failed to show cross-study agreement in the PPOC study (McCrae,
Terracciano, & 79 Members, 2005). However, the present data relate
aggregate traits in ratings of adolescents using the NEO-PI-3 to aggregate traits in ratings and self-reports of adults using the original NEO-PI-R; from this perspective the overall degree of convergence is striking.
A comparison of Tables 3 and 2 highlights a puzzling finding:
Why are the traits that define the Agreeableness and Conscientiousness factors generally related across studies, whereas the factors
themselves are not? In both PPOC (McCrae, Terracciano, & 79
Members, 2005) and APPOC (reported below in Table 5), culture-level analyses clearly show Agreeableness and Conscientiousness
factors because the facets covary as expected. But the cross-facet,
cross-study correlations are not consistently positive. For example,
the correlation between aggregate A4: Compliance in adolescents
and aggregate A5: Modesty in adults is −.53, p < .01. Such anomalies may be due to the small sample size (N = 24), but they may also
imply that there is more agreement on facet-specific variance than on
common variance at the culture level.
The last column of Table 3 reports correlations between APPOC
aggregate traits and NCS scores across 22 cultures. Five correlations
are significant, but three of them are negative. The positive associations
of assessed Vulnerability and Compliance with corresponding
national stereotypes and the negative correlation of Warmth with its
stereotype replicate findings in observer rating data on adults but not
in self-report data (Terracciano et al., 2005). Otherwise, these data
are consistent with the findings of Terracciano and colleagues, who
reported no association of assessed personality with national stereotypes.

Table 3
Intraclass Reliability and Cross-Instrument Correlations for NEO-PI-3 Facet Scales

                                                    r(a)
NEO-PI-3 Facet Scale        ICC(1,k)    Form R     Form S     NCS
N1: Anxiety                 .90         .65***     .79***     .05
N2: Angry Hostility         .79         .52**      .03        .18
N3: Depression              .86         .55**      .46*       .17
N4: Self-Consciousness      .77         .40*       .43*       .10
N5: Impulsiveness           .87         .51**      .60**      .05
N6: Vulnerability           .90         .61***     .72***     .54**

E1: Warmth                  .90         .60***     .33        −.40(*)
E2: Gregariousness          .84         .18        .27        .27
E3: Assertiveness           .76         .37*       .67**      .00
E4: Activity                .89         .39*       .51*       .26
E5: Excitement Seeking      .91         .49**      .82***     .35
E6: Positive Emotions       .81         .43*       .35        −.41(*)

O1: Fantasy                 .91         .54**      .40        .10
O2: Aesthetics              .90         .58**      .12        .21
O3: Feelings                .90         .78***     .56*       .14
O4: Actions                 .88         .34        .04        .29
O5: Ideas                   .84         .28        .08        .07
O6: Values                  .92         .61***     .75***     .04

A1: Trust                   .90         .48**      .48*       .20
A2: Straightforwardness     .82         .24        .65**      .26
A3: Altruism                .90         .74***     .72***     .04
A4: Compliance              .91         .60***     .44*       .36*
A5: Modesty                 .80         .63***     .70**      .08
A6: Tender-Mindedness       .89         .32        .47*       .02

C1: Competence              .81         .52**      .63**      −.37(*)
C2: Order                   .88         .47*       .48*       .12
C3: Dutifulness             .86         .10        .42        .10
C4: Achievement Striving    .90         .44*       .52*       .33
C5: Self-Discipline         .84         .24        .18        .31
C6: Deliberation            .92         .58**      .68**      .16

Mdn                         .89         .50        .48        .01

(a) Correlations with aggregate NEO-PI-R facet scores and NCS scales: Form R (observer rating data, N = 24) from McCrae, Terracciano, and 79 Members (2005); Form S (self-report data, N = 16) from McCrae (2002) and McCrae and Terracciano (2008); NCS data (N = 22) from McCrae et al. (2007a).
*p < .05, **p < .01, ***p < .001, one-tailed. (*)Significant as one-tailed test in the wrong direction.
Validity of Profiles Within Cultures
Table 4 provides data on comparisons of the 30-facet profiles within
each culture. As in previous research, means for each facet were first
standardized across the set of cultures used in each analysis; intraclass correlations were then calculated across the 30 facets by the
double-entry method (see Griffin & Gonzalez, 1995). Comparing
APPOC data to adult Form R data (first data column), significant
profile agreement was found for 18 cultures (75.0%), including 6 of 8
cultures not included in the earlier PPOC comparison (McCrae,
Terracciano, & 79 Members, 2005). Comparing APPOC data to
adult Form S data (third data column), agreement was found for 9 of
16 cultures (56.3%). The magnitude of cross-study agreement was
not related to data quality or n of targets in APPOC.
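The double-entry computation can be illustrated with a minimal Python sketch. It assumes that each profile has already been standardized across cultures; the function names (pearson, double_entry_icc) and the three-facet toy profiles are hypothetical and serve only as illustration. Each profile is entered twice, once in each position, and an ordinary Pearson correlation is taken over the doubled vectors, so that differences in elevation as well as in shape lower the coefficient.

```python
from math import sqrt
from statistics import mean


def pearson(u, v):
    """Ordinary Pearson correlation between two equal-length sequences."""
    mean_u, mean_v = mean(u), mean(v)
    cross = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cross / sqrt(ss_u * ss_v)


def double_entry_icc(profile_a, profile_b):
    """Double-entry ICC: correlate (a followed by b) with (b followed by a)."""
    first = list(profile_a) + list(profile_b)
    second = list(profile_b) + list(profile_a)
    return pearson(first, second)


# Hypothetical profiles with the same shape but different elevation.
toy_a = [1.0, 2.0, 3.0]
toy_b = [2.0, 3.0, 4.0]

shape_only = pearson(toy_a, toy_b)                 # ignores elevation
shape_and_level = double_entry_icc(toy_a, toy_b)   # penalizes elevation
```

For these toy profiles the Pearson correlation is perfect, whereas the double-entry coefficient is 2.5/5.5 (about .45), because the two profiles differ in elevation by a constant.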
The fifth data column of Table 4 reports ICC values for profile
agreement with national character stereotypes for 22 cultures. Significant positive correlations were found for Argentina and Turkey,
whereas significant negative correlations—contradicting the hypothesis of veridical stereotypes—were found for Australia, the Czech
Republic, France, Hong Kong, and Peru. None of these correlations
replicated findings reported by Terracciano and colleagues (2005),
and the median intraclass correlation was .01. These analyses
confirm that national character stereotypes in general do not reflect
mean personality trait levels.
Table 4
Agreement of Adolescents’ NEO Personality Inventory-3 Profiles With
Adults’ Revised NEO Personality Inventory Profiles and National
Character Survey Scales

                                        Adult NEO-PI-R
                                  Form R              Form S               NCS
Culture                        ICC       rc        ICC       rc        ICC       rc
Argentina^a                   .43**     .43**      --        --       .39*      .40*
Australia^a                   .45**     .47**      --        --       .34(*)    .32(*)
Chile^a                       .24       .51**      --        --       .10       .11
Croatia                       .59***    .63***    .25       .26       .04       .06
Czech Republic                .13       .13       .53**     .57***    .33(*)    .30
Estonia                       .58***    .59***    .84***    .85***    .18       .18
France                        .65***    .65***    .54**     .56***    .37(*)    .39(*)
Hong Kong                     .47**     .58***    .65***    .70***    .40(*)    .34(*)
Islamic Republic of Iran^a    .04       .05        --        --        --        --
Japan                         .77***    .78***    .47**     .48**     .24       .25
Malaysia                      .72***    .72***    .65***    .66***    .18       .19
People’s Republic of China    .48**     .58***    .03       .06       .23       .30
Peru                          .15       .15       .23       .24       .54(**)   .52(**)
Poland                        .35*      .38*      .33*      .37*      .27       .29
Portugal                      .20       .56***    .28       .42**     .14       .07
Puerto Rico^a                 .41*      .43**      --        --       .26       .21
Russia                        .34(*)    .27       .12       .20       .06       .03
South Korea                   .52**     .52**     .51**     .52**     .18       .24
Serbia                        .51**     .51**     .27       .30       .12       .12
Slovakia^a                    .56**     .64***     --        --       .17       .13
Thailand^a                    .42*      .46**      --        --        --        --
Turkey                        .58***    .64***    .16       .21       .42*      .43**
Uganda^a                      .58***    .61***     --        --       .07       .11
United States                 .67***    .69***    .51**     .56***    .17       .18
Mdn                           .48       .54       .40       .45       .01       .07

Note. N = 30 facets. ICC = intraclass correlations (double-entry method). rc =
Cohen’s r. Form R (observer rating) data from McCrae, Terracciano, & 79 Members
(2005). Form S (self-report) data from McCrae (2002) and McCrae and Terracciano
(2008). NCS = National Character Survey; NCS data from McCrae et al. (2007a).
^a Not included in previous studies of culture-level convergent validity.
*p < .05, **p < .01, ***p < .001, one-tailed. (*), (**)Significant as one-tailed test in the
wrong direction.

The second, fourth, and sixth data columns of Table 4 report
a second measure of profile agreement, rc (Cohen, 1969). Intraclass
correlations are sensitive to the shape and relative elevation of profiles, but they do not take into account the direction of scoring. A
profile that included measures of Introversion would look quite
different from one that included measures of its polar opposite,
Extraversion, and would generally yield different ICC values, but it
would contain the same information. Cohen’s rc is invariant over the
direction of scale scoring because each scale’s reflection around the
mean (in this case, T = 50) is also included in the profile. It is sensitive
to both the shape and the absolute elevation of the two profiles.
Reanalysis of data on profile agreement across observers (McCrae,
2008) showed that rc is as effective as ICC in identifying matched
versus mismatched data. Table 4 reports rc values and provides further support for the view that aggregate adult personality scores, but
not national character stereotypes, are related to aggregate adolescent
scores. Adolescent profiles for Chile and Portugal are significantly
related to adult profiles when rc is used as the measure of profile
agreement.
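A minimal Python sketch of rc follows, assuming T scores with a neutral point of 50; the function names and the toy profiles are hypothetical. Appending each scale's reflection to a profile makes the profile mean equal to the neutral point, so the Pearson correlation over the extended profiles reduces to a normalized sum of cross-products of deviations from 50. The sketch also checks the invariance property: rescoring a scale in the opposite direction in both profiles leaves rc unchanged.

```python
from math import sqrt


def cohen_rc(profile_a, profile_b, midpoint=50.0):
    """Cohen's rc computed from deviations around the scale midpoint."""
    dev_a = [a - midpoint for a in profile_a]
    dev_b = [b - midpoint for b in profile_b]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denominator == 0:
        raise ValueError("rc is undefined for a profile that is flat at the midpoint")
    return numerator / denominator


def reflect(profile, index, midpoint=50.0):
    """Rescore one scale in the opposite direction (e.g., Extraversion as Introversion)."""
    flipped = list(profile)
    flipped[index] = 2 * midpoint - flipped[index]
    return flipped


# Hypothetical three-scale T-score profiles from two sources.
source_1 = [60.0, 45.0, 55.0]
source_2 = [58.0, 40.0, 52.0]

rc_original = cohen_rc(source_1, source_2)
# Reflect the first scale in both profiles; rc should not change.
rc_reflected = cohen_rc(reflect(source_1, 0), reflect(source_2, 0))
```

Because rc is anchored at the fixed midpoint rather than at each profile's own mean, two profiles that are both uniformly elevated above 50 agree under rc even when their shapes differ little from flat.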
Geographical Patterns
Associations among aggregate personality profiles were examined using nonmetric Multidimensional Scaling (MDS) to see if profile similarity was associated with geographical patterns. Analysis followed the
methods used in previous research (Allik & McCrae, 2004; McCrae,
Terracciano, & 79 Members, 2005): Aggregate scores for the 24
cultures were standardized across cultures, a distance matrix was calculated based on (1–Pearson r) across the 30 NEO-PI-3 facets,
coordinates for two MDS dimensions were derived (StatSoft, 1995),
and these coordinates were correlated with factor scores and rotated
to maximize the correlations of the vertical axis with Neuroticism
(r = .75) and the horizontal axis with Extraversion (r = .83). The standardized stress value for the two-dimensional solution was .21, which
suggests the need for additional dimensions (five dimensions showed a
stress value of .06), but because our intent was to compare these results
to previous MDS results, we report the two-dimensional solution.
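The first two steps of this procedure (standardizing across cultures and forming the 1 − Pearson r distance matrix) can be sketched in Python as follows. The scores are invented for four hypothetical cultures and three facets, the function names are hypothetical, and the use of the population standard deviation is an assumption, since the divisor is not specified in the text (the profile correlations are unaffected by that choice). The resulting matrix would then be passed to a nonmetric MDS routine, which is not shown.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize_across_cultures(scores):
    """z-score each facet (column) across cultures (rows)."""
    columns = list(zip(*scores))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [[(value - m) / s for value, m, s in zip(row, means, sds)]
            for row in scores]


def pearson(u, v):
    """Ordinary Pearson correlation between two equal-length sequences."""
    mean_u, mean_v = mean(u), mean(v)
    cross = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    ss_u = sum((a - mean_u) ** 2 for a in u)
    ss_v = sum((b - mean_v) ** 2 for b in v)
    return cross / sqrt(ss_u * ss_v)


def profile_distance_matrix(scores):
    """Distance between two cultures = 1 - r between their standardized profiles."""
    z = standardize_across_cultures(scores)
    return [[1.0 - pearson(row_i, row_j) for row_j in z] for row_i in z]


# Invented aggregate scores: 4 hypothetical cultures x 3 facets.
toy_scores = [
    [52.0, 48.0, 50.0],
    [47.0, 51.0, 53.0],
    [50.0, 55.0, 46.0],
    [55.0, 49.0, 49.0],
]
distances = profile_distance_matrix(toy_scores)
```

Distances range from 0 (identical profile shape) to 2 (perfectly opposite profiles), so cultures with similar standardized profiles are placed close together in the MDS solution.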
Figure 1 displays results. As in previous studies, Western cultures
are found on the right (extraverted) side of the plot, non-Western
cultures on the left. French, Czechs, Argentines, and Hong Kong
Chinese are again found at the top of the figure and Estonians and
Mainland Chinese at the bottom. There is one notable difference:
Russian adolescents are located in the bottom right of the figure and
thus appear to be more adjusted and extraverted than older Russians
(McCrae, Terracciano, & 79 Members, 2005). Resemblance to the
MDS analysis of PPOC data can be quantified by correlating
the coordinates across the two studies. Agreement was strong for
the horizontal axis, r = .71, N = 24, p < .001; for the vertical axis,
[Figure 1: scatter plot of the 24 cultures on two MDS dimensions; both axes run from –1.5 to 1.5.]
Figure 1
Multidimensional scaling plot of 24 cultures based on a distance
matrix of (1–Pearson r) for the 30 NEO Personality Inventory-3 facet
scores, standardized across cultures. The vertical axis is maximally
aligned with Neuroticism and the horizontal axis with Extraversion.
HK Chinese = Hong Kong Chinese. S. Koreans = South Koreans.
however, it was r = .34, ns. Omitting the Russians, the correlation for
the vertical axis increased to r = .51, N = 23, p < .05.
Culture-Level Factor Structure
As in previous studies, principal component analyses at the culture
level were undertaken using mean values from subsamples in order
to obtain a reasonably large number of cases. For the present study,
108 subsamples were used, representing older and younger adolescent boys and girls from each of the 27 sites. Results after Procrustes
rotation are reported in Table 5. Even in this small sample, the normative, adult, individual-level structure is reasonably replicated for
Neuroticism, Extraversion, Agreeableness, and Conscientiousness
factors (congruence > .85; Lorenzo-Seva & ten Berge, 2006), and 26
of the 30 facets show loadings above .40 on the intended factor.
Comparisons to randomly permuted data from an earlier study of
the NEO-PI-R (McCrae, Zonderman, Costa, Bond, & Paunonen,
1996) suggested that 4 factor congruences and 19 of the 30 variable
congruence coefficients exceeded chance values.

Table 5
Culture-Level Factor Structure of NEO-PI-3 Facet Scales

                               Procrustes-Rotated Principal Component
NEO-PI-3 Facet Scale           N        E        O        A        C        VC^a
N1: Anxiety                   .83      .11      .03      .22      .19      .90^b
N2: Angry Hostility           .80      .01      .03      .19      .17      .91^c
N3: Depression                .81      .14      .04      .19      .01      .91^c
N4: Self-Consciousness        .77      .20      .10      .06      .07      .95^b
N5: Impulsiveness             .48      .29      .23      .33      .49      .94^b
N6: Vulnerability             .77      .13      .29      .10      .28      .89^c
E1: Warmth                    .12      .64      .27      .41      .20      .91^c
E2: Gregariousness            .03      .70      .43      .20      .00      .84
E3: Assertiveness             .24      .53      .39      .21      .39      .96^b
E4: Activity                  .18      .44      .60      .35      .16      .76
E5: Excitement Seeking        .18      .51      .16      .51      .17      .88^c
E6: Positive Emotions         .06      .53      .45      .26      .03      .87^c
O1: Fantasy                   .38      .65      .01      .09      .34      .49
O2: Aesthetics                .31      .03      .46      .38      .58      .76
O3: Feelings                  .27      .59      .44      .27      .17      .91^c
O4: Actions                   .24      .03      .73      .11      .17      .87^c
O5: Ideas                     .14      .09      .02      .18      .50      .24
O6: Values                    .19      .51      .13      .22      .18      .04
A1: Trust                     .05      .46      .15      .40      .03      .82
A2: Straightforwardness       .07      .17      .26      .59      .01      .80
A3: Altruism                  .00      .71      .18      .29      .16      .90^c
A4: Compliance                .05      .47      .08      .48      .23      .71
A5: Modesty                   .03      .10      .14      .47      .18      .89^c
A6: Tender-Mindedness         .19      .41      .18      .46      .52      .65
C1: Competence                .26      .39      .11      .01      .75      .91^c
C2: Order                     .02      .13      .22      .28      .78      .79
C3: Dutifulness               .02      .02      .15      .40      .83      .94^b
C4: Achievement Striving      .10      .15      .24      .11      .88      .99^b
C5: Self-Discipline           .13      .04      .12      .22      .87      .92^c
C6: Deliberation              .09      .38      .02      .41      .72      .96^b
Factor congruence^d           .93^b    .88^b    .47      .86^b    .88^b    .83^b

Note. These are principal components from 108 subsamples targeted to the American normative factor structure. Loadings greater than .40 in absolute magnitude are
given in boldface. ^a Variable congruence coefficient; total congruence coefficient in
the last row. ^b Congruence higher than that of 99% of rotations from random data.
^c Congruence higher than that of 95% of rotations from random data. ^d Congruence
with American normative factor structure.
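The congruence values discussed above are Tucker's congruence coefficient (Lorenzo-Seva & ten Berge, 2006). A minimal Python sketch follows; the loadings are made up for illustration and the function name is hypothetical. Applied to a column of loadings (one factor across all facets) the coefficient gives a factor congruence, and applied to a row (one facet across all factors) it gives a variable congruence. The Procrustes rotation itself and the comparison with rotations of random data are not shown.

```python
from math import sqrt


def tucker_congruence(loadings_x, loadings_y):
    """Tucker's phi: sum of cross-products over the root product of sums of squares."""
    numerator = sum(x * y for x, y in zip(loadings_x, loadings_y))
    denominator = sqrt(sum(x * x for x in loadings_x) *
                       sum(y * y for y in loadings_y))
    return numerator / denominator


# Made-up loadings of four facets on one factor in two solutions.
observed = [0.80, 0.75, 0.10, -0.05]   # e.g., a rotated culture-level factor
target = [0.85, 0.70, 0.05, 0.00]      # e.g., the normative target factor

phi = tucker_congruence(observed, target)
```

Unlike a Pearson correlation, the coefficient does not center the loadings, so it reflects agreement in the sign and relative size of the loadings themselves; it is unaffected by multiplying either set of loadings by a positive constant.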
However, the Openness factor is clearly not replicated. Three of its
intended facets are unrelated to the factor, and three of the definers of
the observed factor are facets of Extraversion. There appear to be two
reasons for these deviations from the usual structure. First, Openness
to Ideas loads on the Conscientiousness factor. This finding at the
culture level is expected, given that, in these data, Openness to Ideas
loads strongly (.48 to .51) on the Conscientiousness factor at the
individual level (De Fruyt et al., 2009). Although sometimes seen
in self-reports (Hřebíčková, 2008), this phenomenon appears chiefly in
observer ratings of adolescents. Costa et al. (2008) reported a loading
of .39 for Openness to Ideas on the Conscientiousness factor when
middle-school-aged respondents rated another child of the same age,
but only .24 when they rated themselves. In observer ratings of college
students and adults (McCrae, Terracciano, & 78 Members, 2005),
the loading of O5: Ideas on Conscientiousness is .31; in self-reports
from adults (Costa & McCrae, 1992), it is .16. It thus appears that high
loadings of O5: Ideas on Conscientiousness are a joint function of
method and target age: When outside observers assess intellectual
curiosity in school children, they are apt to confuse it with academic
success, which is also associated with Conscientiousness. Teachers, for
example, attribute academic self-esteem to students they rate as high in
both Conscientiousness and Openness (Graziano & Ward, 1992). By
contrast, when American adolescents rate themselves, they can distinguish between intrinsic intellectual interest and academic achievement
orientation (Costa et al., 2008).
The Openness factor is also poorly defined because O1: Fantasy and
O6: Values have their major loadings on the Extraversion factor. This
is not unique to analyses of adolescents or of observer ratings; instead,
it appears to be a culture-level phenomenon. Modern Western nations
tend to be high on Extraversion, and they also tend to embrace such
self-expressive values as imagination and tolerance (Inglehart, 1997).
Raters from such cultures are thus more likely to describe their compatriot targets as high both in Extraversion and in traits like Fantasy
and Values. As data simulations show (McCrae & Terracciano, 2008),
the effect is to broaden the culture-level Extraversion factor to represent something more like individualism.
This is, however, only part of the story. In adult data from PPOC,
Openness to Fantasy and Values had joint loadings on the culture-level Extraversion and Openness factors (McCrae, Terracciano, & 79
Members, 2005), whereas Table 5 shows no loadings at all for these
facets on the Openness factor. At least with regard to Openness to
Values, this may be because young adolescents do not yet have a
clearly defined ideology, leading to very low internal consistency for
this facet (Costa et al., 2008; De Fruyt et al., 2009).
Conclusion
The present study, using college students’ ratings of adolescents aged
12 to 17 on a modified version of the NEO-PI-R in 24 cultures,
provides further evidence for three conclusions. First, there is general
agreement about characterizations of cultures based on personality
assessments of individuals: Adult self-reports, observer ratings of
adults, and now observer ratings of adolescents all show similar
patterns, whether one considers each trait across all cultures or the
profile of all traits within each culture or the clustering of culture
profiles in multidimensional space. Second, there is no consistent
agreement between these aggregate characterizations of cultures
and the corresponding collective beliefs about traits of the ‘‘typical’’
culture member: National character stereotypes again appear to be
largely unfounded. Finally, there is further evidence that the culture-level factor structure differs from the individual-level structure with
regard to the Extraversion factor. In ratings of young adolescents, as
in observer ratings and self-reports of college students and adults,
Openness to Fantasy and Values, Competence, and low Compliance
are associated with the Extraversion factor, but only at the culture
level. This robust finding requires a culture-level explanation.
The repeated finding that national character stereotypes are unrelated to assessed aggregate personality has seemed counterintuitive to
some psychologists (e.g., Perugini & Richetin, 2007), but it makes sense
if national stereotypes are, in fact, determined chiefly by such nonpsychological features as a nation’s wealth or mean temperature (McCrae
et al., 2007a). This finding is not of merely academic interest: Beliefs
about national character can have an important influence on political
and social views and affect both ethnic and international relations.
Psychologists should educate the public on the dangers of stereotypic
thinking, especially with regard to national stereotypes. At the same
time, they need to conduct more research on the origins of these beliefs
and how they might be changed (Terracciano & McCrae, 2007).
Other findings from the present study pose more purely intellectual challenges. At the individual level, aggregating facets to define
broad domains generally leads to more reliable and valid scores. For
example, among adolescents ages 14–20, the median cross-observer
correlation for the five NEO-PI-3 domains is .53, whereas the median for the 30 facets is only .43 (McCrae, Costa, et al., 2005). That
pattern is reversed at the culture level: In the present study, the median Form R cross-study correlation is .37 for the five domains but
.50 for the 30 facets. It is possible that this finding is a fluke, attributable to the small number of cultures examined. Until that can be
established, however, it would appear wise to conduct cross-cultural
comparisons of aggregate traits chiefly at the facet level: We can have
more confidence in the claim that a given culture is high in Altruism
or Deliberation than that it is high in Agreeableness or Conscientiousness. Studies on the cultural origins or effects of personality
traits should target specific facets.
The basic claim of the field of culture-level personality studies—
that averaging the trait scores of a sample of culture members can
yield meaningful information about the personality profile of the
culture group itself—is far from indisputable, but it has shown itself
to be a valuable working hypothesis. How far this hypothesis can be
generalized to other individual difference variables (e.g., attitudes,
interests, values) remains to be seen.
REFERENCES
Allik, J., & McCrae, R. R. (2004). Toward a geography of personality traits: Patterns
of profiles across 36 cultures. Journal of Cross-Cultural Psychology, 35, 13–28.
Boster, J. S., & Maltseva, K. (2006). A crystal seen from each of its vertices: European
views of European national characters. Cross-Cultural Research, 40, 47–64.
Church, A. T. (2001). Introduction. Journal of Personality, 69, 787–801.
Cohen, J. (1969). rc: A profile similarity coefficient invariant over variable reflection. Psychological Bulletin, 71, 281–284.
Costa, P. T., Jr., & McCrae, R. R. (1992). Revised NEO Personality Inventory
(NEO-PI-R) and NEO Five-Factor Inventory (NEO-FFI) professional manual.
Odessa, FL: Psychological Assessment Resources.
Costa, P. T., Jr., McCrae, R. R., & Martin, T. A. (2008). Incipient adult personality: The NEO-PI-3 in middle school-aged children. British Journal of Developmental Psychology, 26, 71–89.
Costa, P. T., Jr., Terracciano, A., & McCrae, R. R. (2001). Gender differences in
personality traits across cultures: Robust and surprising findings. Journal of
Personality and Social Psychology, 81, 322–331.
De Fruyt, F., De Bolle, M., McCrae, R. R., Terracciano, A., Costa, P. T., Jr.,
& 43 Collaborators of the Adolescent Personality Profiles of Cultures
Project (2009). Assessing the universal structure of personality in early adolescence: The NEO-PI-R and the NEO-PI-3 in 24 cultures. Assessment, 16,
301–311.
Fiske, S. T., Cuddy, A. J. C., Glick, P., & Xu, J. (2002). A model of (often mixed)
stereotype content: Competence and warmth respectively follow from perceived status and competition. Journal of Personality and Social Psychology,
82, 878–902.
Graziano, W. G., & Ward, D. (1992). Probing the Big Five in adolescence:
Personality and adjustment during a developmental transition. Journal of Personality, 60, 425–439.
Griffin, D., & Gonzalez, R. (1995). Correlational analysis of dyad-level data in the
exchangeable case. Psychological Bulletin, 118, 430–439.
Guimond, S., Brunot, S., Chatard, A., Garcia, D. M., Martinot, D., Branscombe,
N. R., et al. (2007). Culture, gender, and the self: Variations and impact of
social comparison processes. Journal of Personality and Social Psychology, 92,
1118–1134.
Heine, S. J., Buchtel, E. E., & Norenzayan, A. (2008). What do cross-national
comparisons of personality traits tell us? The case of conscientiousness. Psychological Science, 19, 309–313.
Hofstede, G. (2001). Culture’s consequences: Comparing values, behaviors, institutions, and organizations across nations (2nd ed.). Thousand Oaks, CA: Sage.
Hofstede, G., & McCrae, R. R. (2004). Personality and culture revisited: Linking
traits and dimensions of culture. Cross-Cultural Research, 38, 52–88.
Hřebíčková, M. (2008). NEO osobnostni inventar (NEO-PI-3) v diagnostice
osobnosti deti a adolescentu [NEO Personality Inventory (NEO-PI-3) in
diagnostics of children and adolescents]. Ceskoslovenska Psychologie, 52,
425–442.
Inglehart, R. (1997). Modernization and postmodernization: Cultural, economic,
and political change in 43 societies. Princeton, NJ: Princeton University Press.
John, O. P., Donahue, E. M., & Kentle, R. L. (1991). The ‘‘Big Five’’ Inventory—
Versions 4a and 54. Berkeley: University of California, Berkeley, Institute of
Personality and Social Research.
LeVine, R. A. (2001). Culture and personality studies, 1918–1960: Myth and history. Journal of Personality, 69, 803–818.
Lorenzo-Seva, U., & ten Berge, J. M. F. (2006). Tucker’s congruence coefficient as
a meaningful index of factor similarity. Methodology, 2, 57–64.
Lynn, R., & Martin, T. (1995). National differences for thirty-seven nations
in Extraversion, Neuroticism, Psychoticism and economic, demographic and
other correlates. Personality and Individual Differences, 19, 403–406.
Madon, S., Jussim, L., Keiper, S., Eccles, J., Smith, A., & Palumbo, P. (1998). The
accuracy and power of sex, social class, and ethnic stereotypes: A naturalistic
study in person perception. Personality and Social Psychology Bulletin, 24,
1304–1318.
McCrae, R. R. (2002). NEO-PI-R data from 36 cultures: Further intercultural
comparisons. In R. R. McCrae & J. Allik (Eds.), The Five-Factor Model of
personality across cultures (pp. 105–125). New York: Kluwer Academic/
Plenum Publishers.
McCrae, R. R. (2008). A note on some measures of profile agreement. Journal of
Personality Assessment, 90, 105–109.
McCrae, R. R., & Costa, P. T., Jr. (2008). Empirical and theoretical status of
the Five-Factor Model of personality traits. In G. Boyle, G. Matthews, &
D. Saklofske (Eds.), Sage handbook of personality theory and assessment (Vol.
1, pp. 273–294). Thousand Oaks, CA: Sage.
McCrae, R. R., Costa, P. T., Jr., & Martin, T. A. (2005). The NEO-PI-3: A more
readable Revised NEO Personality Inventory. Journal of Personality Assessment, 84, 261–270.
McCrae, R. R., & Terracciano, A. (2008). The Five-Factor Model and its correlates in individuals and cultures. In F. J. R. Van de Vijver, D. A. van Hemert,
& Y. H. Poortinga (Eds.), Multilevel analyses of individuals and cultures
(pp. 249–283). Mahwah, NJ: Erlbaum.
McCrae, R. R., Terracciano, A., & 78 Members of the Personality Profiles of
Cultures Project. (2005). Universal features of personality traits from the
observer’s perspective: Data from 50 cultures. Journal of Personality and Social
Psychology, 88, 547–561.
McCrae, R. R., Terracciano, A., & 79 Members of the Personality Profiles of
Cultures Project. (2005). Personality profiles of cultures: Aggregate personality
traits. Journal of Personality and Social Psychology, 89, 407–425.
McCrae, R. R., Terracciano, A., Realo, A., & Allik, J. (2007a). Climatic warmth
and national wealth: Some culture-level determinants of national character
stereotypes. European Journal of Personality, 21, 953–976.
McCrae, R. R., Terracciano, A., Realo, A., & Allik, J. (2007b). On the validity of
culture-level personality and stereotype scores. European Journal of Personality, 21, 987–991.
McCrae, R. R., Zonderman, A. B., Costa, P. T., Jr., Bond, M. H., & Paunonen,
S. V. (1996). Evaluating replicability of factors in the Revised NEO Personality
Inventory: Confirmatory factor analysis versus Procrustes rotation. Journal of
Personality and Social Psychology, 70, 552–566.
Nye, C. D., Roberts, B. W., Saucier, G., & Zhou, X. (2008). Testing the measurement equivalence of personality adjective items across cultures. Journal of
Research in Personality, 42, 1524–1536.
Oishi, S., & Roth, D. P. (2009). The role of self-reports in culture and personality
research: It is too early to give up on self-reports. Journal of Research in
Personality, 43, 107–109.
Peabody, D. (1985). National characteristics. New York: Cambridge University Press.
Perugini, M., & Richetin, J. (2007). In the land of the blind, the one-eyed man is
king. European Journal of Personality, 21, 977–981.
Rentfrow, P. J., Gosling, S. D., & Potter, J. (2008). A theory of the emergence,
persistence, and expression of geographic variation in psychological characteristics. Perspectives on Psychological Science, 3, 339–369.
Schmitt, D. P., Allik, J., McCrae, R. R., Benet-Martínez, V., Alcalay, L., Ault, L.,
et al. (2007). The geographic distribution of Big Five personality traits: Patterns and profiles of human self-description across 56 nations. Journal of
Cross-Cultural Psychology, 38, 173–212.
Schmitt, D. P., Realo, A., Voracek, M., & Allik, J. (2008). Why can’t a man be
more like a woman? Sex differences in Big Five personality traits across 55
cultures. Journal of Personality and Social Psychology, 94, 168–182.
StatSoft. (1995). Statistica (Vol. 3): Statistics II [Computer software and manual]. Tulsa, OK: Author.
Terracciano, A., Abdel-Khalak, A. M., Ádám, N., Adamovová, L., Ahn, C.-k.,
Ahn, H.-n., et al. (2005). National character does not reflect mean personality
trait levels in 49 cultures. Science, 310, 96–100.
Terracciano, A., & McCrae, R. R. (2007). Perceptions of Americans and the Iraq
invasion: Implications for understanding national character stereotypes. Journal of Cross-Cultural Psychology, 38, 695–710.
van de Vijver, F. J. R., & Leung, K. (1997). Methods and data analysis of comparative research. In J. W. Berry, Y. H. Poortinga, & J. Pandey (Eds.), Handbook of cross-cultural psychology: Vol 1: Theory and method (pp. 257–300).
Boston: Allyn and Bacon.
Journal of Personality and Social Psychology
2010, Vol. 99, No. 5, 870 – 882
© 2010 American Psychological Association
0022-3514/10/$12.00 DOI: 10.1037/a0020963
How People See Others Is Different From How People See Themselves:
A Replicable Pattern Across Cultures
Jüri Allik, Anu Realo, and René Mõttus
University of Tartu
Peter Borkenau
Martin-Luther-Universität Halle–Wittenberg
Peter Kuppens
University of Melbourne and Katholieke Universiteit Leuven
Martina Hřebíčková
Academy of Sciences of the Czech Republic
Consensus studies from 4 cultures—in Belgium, the Czech Republic, Estonia, and Germany—as well as
secondary analyses of self- and observer-reported Revised NEO Personality Inventory (NEO PI-R) data
from 29 cultures suggest that there is a cross-culturally replicable pattern of difference between internal
and external perspectives for the Big Five personality traits. People see themselves as more neurotic and
open to experience compared to how they are seen by other people. External observers generally hold a
higher opinion of an individual’s conscientiousness than he or she does about him- or herself. As a rule,
people think that they have more positive emotions and excitement seeking but much less assertiveness
than it seems from the vantage point of an external observer. This cross-culturally replicable disparity
between internal and external perspectives was not consistent with predictions based on the
actor–observer hypothesis because the size of the disparity was unrelated to the visibility of personality traits.
A relatively strong negative correlation (r = −.53) between the average self-minus-observer profile and
social desirability ratings suggests that people in most studied cultures view themselves less favorably
than they are perceived by others.
Keywords: personality ratings, internal and external perspective, cross-cultural comparison,
self-enhancement, the actor–observer hypothesis
Jüri Allik, Anu Realo, and René Mõttus, Department of Psychology,
University of Tartu, Tartu, Estonia; Peter Borkenau, Institut für Psychologie, Martin-Luther-Universität Halle–Wittenberg, Halle, Germany; Peter
Kuppens, School of Behavioral Science, University of Melbourne, Melbourne, Australia, and Department of Psychology, Katholieke Universiteit
Leuven, Leuven, Belgium; Martina Hřebíčková, Institute of Psychology,
Academy of Sciences of the Czech Republic, Prague, the Czech Republic.
This project was supported by Estonian Ministry of Science and Education Grant SF0180029s08 and Estonian Science Foundation Grant
ESF7020 to Jüri Allik. The Czech participation was supported by Grant
Agency of the Czech Republic Grant P407/10/2394. We are very thankful
to Robert R. McCrae for his helpful comments and suggestions.
Correspondence concerning this article should be addressed to Jüri
Allik, Department of Psychology, University of Tartu, Tiigi 78, Tartu
50410, Estonia. E-mail: juri.allik@ut.ee

More than a decade ago, McCrae and Costa (1997) proposed the
hypothesis that the pattern of covariation among basic personality
traits is a universal feature of the human species. Several recent
large-scale cross-cultural studies have supported this idea, showing
that a common five-factor structure of personality traits can be
found in all languages and cultures examined so far (McCrae,
Terracciano, & 78 Members of the Personality Profiles of Cultures
Project, 2005; Schmitt, Allik, McCrae, & Benet-Martinez, 2007).
However, the five-factor model is not the only replicable personality structure because Eysenck’s three-factor (van Hemert, van de
Vijver, Poortinga, & Georgas, 2002) and psycholexical six-factor
(Lee & Ashton, 2008) structures have also been replicated in many
cultures.
In addition to the consistent pattern of covariation among personality traits, several other surprisingly universal features of
personality have been discovered. For example, it was found that
in almost every culture, women reported themselves to be higher
in Neuroticism, Agreeableness, Warmth, and Openness to Feelings, whereas men were higher in Assertiveness and Openness to
Ideas (Costa, Terracciano, & McCrae, 2001; Feingold, 1994;
Schmitt, Realo, Voracek, & Allik, 2008). Quite unexpectedly,
these sex differences in personality increased with higher levels of
human development, including a long and healthy life, equal
access to knowledge and education, and economic prosperity
(Costa et al., 2001; Schmitt et al., 2008). What is particularly
remarkable is that the cross-cultural convergence between different studies on sex differences in personality was demonstrably
stronger and more replicable than the convergence between the
mean levels of the traits themselves (Schmitt et al., 2007, 2008).
Another feature that seems to easily transcend cultures is age
difference. In every human society explored so far, individuals
become less extraverted and open to new experiences and more
agreeable and conscientious with age (Costa et al., 2000; McCrae,
Costa, Hřebíčková et al., 2004; McCrae et al., 1999; Srivastava,
John, Gosling, & Potter, 2003). In spite of some specific features
characterizing particular cultures, the general pattern of difference
between younger and older individuals is stable and highly replicable across the 30 personality traits measured by the NEO PI-R
questionnaire, even across dissimilar cultures (Allik et al., 2009).
In light of the discovery of these universal features, it is surprising that there is no consensus about the systematic differences
between how people see others and how they see themselves,
despite some theories that have been proposed. Social psychologists, for example, have invested a considerable amount of energy
in the promotion of the idea that there is a fundamental disparity
between the way people perceive themselves and the way they are
perceived by others (Jones & Nisbett, 1971; Nisbett, Caputo,
Legant, & Marecek, 1973; Watson, 1982). This disparity is believed to originate from an inevitable asymmetry between internal
and external viewpoints: People are immersed in their own sensations, emotions, and cognitions at the same time that their experience of others is dominated by what can be observed externally
(Pronin, 2008). Our aim in the current article is to systematically
analyze the differences between how people judge their personality and how their personality is judged by others across different
cultures. Analyzing the existing literature, it is possible to distinguish at least two principal mechanisms by which personality
descriptions made from the perspective of the first person might be
systematically different from those made from the vantage point of
the third person.
Different Information
As an external observer cannot see the target person in all
situations, it is almost inevitable that the information possessed by
the target person must be different from the information that is
available to an external observer. Even when it comes to information that is equally available to the target person and the external
observer, it is possible that the target person could ignore some
information that was attended to by the external observer in
making his or her judgment. One consequence of this, as argued by
Jones and Nisbett (1971), is the actor–observer asymmetry in
attribution, that is, the pervasive tendency for actors to attribute
their behavior to external (situational) causes and for observers to
attribute the same behavior to internal causes (dispositional qualities) of the actor. As further evidence of the divergent perspectives of the actor and observer, observers were, among other things,
found to ascribe more personality traits to other people than to
themselves (Nisbett et al., 1973). However, a recent meta-analysis
involving more than 170 studies established that the actor–observer
hypothesis, counter to what textbook descriptions and
commonly held beliefs suggest, is neither firmly established nor
robust and that evidence for it is surprisingly limited (Malle,
2006). For example, the established beliefs that people ascribe
more personality traits to others than to themselves (Sande, Goethals, & Radloff, 1988) and that they perceive more complexity in
their behavior than that of others (Locke, 2002) were not confirmed in these later studies.
It is not self-evident how the principle of “more personality
traits” can be operationalized for responses to a fixed list of items.
Personality psychology questionnaires are constructed on the assumption that all personality traits are applicable to all individuals;
respondents thus need to assess themselves or the others on each of
these traits. Despite the fact that considerable self–other agreement
can be obtained for all personality traits, the convergence of
judgments either between self and observer or between different
observers is considerably stronger on some traits than on others
(Colvin, 1993; Funder, 1999; John & Robins, 1993). The
most likely cause for these differences is the type and amount of
information available to the self and the external observer. In
particular, results suggest that traits pertaining to extraversion are
revealed relatively directly in social behavior and, therefore, are
easy to judge, whereas traits pertaining to neuroticism are less
visible and, so, are judged less accurately (Funder & Colvin, 1988;
Funder & Dobroth, 1987). It has also been noticed that traits
reflecting affective states are more difficult to judge than are traits
which manifest themselves in overt behaviors (Spain, Eaton, &
Funder, 2000; Watson, Hubbard, & Wiese, 2000). Lack of judgeability or visibility of traits can lead, in principle at least, to a
disparity between self- and observer-ratings (John & Robins,
1993). It is not difficult to imagine, for example, why people see
themselves as more neurotic compared to how they are seen by
other people. Neuroticism reflects, to a considerable extent, inner
states of an individual that are not necessarily accessible to an
external observer. Provided that these externally unobservable
instances of neurotic tendencies influence people’s self-evaluation
of neuroticism, the expected outcome is a disparity between self- and observer-ratings. Thus, one possible scenario is that the self–other
disparity is less pronounced on more observable traits (observability being
operationalized as the rank-order correlation between self- and
observer-ratings), whereas the disparity between raters' perspectives is more manifest in less externally observable traits.
Self-Enhancement
One of the most pervasive explanations for the disparity between self- and observer-perceptions is that people are systematically engaged in self-enhancement: They view themselves more
favorably than they view others (Kenny, 1994; Kwan, John,
Kenny, Bond, & Robins, 2004). Although some cross-cultural
differences seem to exist (Heine & Hamamura, 2007; Heine &
Renshaw, 2002), a recent cross-cultural study demonstrated that in
all 56 cultures studied, people’s mean value of self-esteem was
above the scale neutral point (Schmitt & Allik, 2005), suggesting
that most people are motivated to maintain a positive view of
themselves. Many studies have demonstrated that college students
rate their own personality traits in more socially desirable terms
than they do when rating the “average college student” (Alicke,
1985; Alicke, Klotz, Breitenbecher, Yurak, & Vredenburg, 1995;
J. Krueger, 1998). However, the effect of this unrealistically positive view of themselves disappears, or is considerably reduced,
when a specific person, not an average college student, is assessed
(Alicke et al., 1995). Nevertheless, it is possible that the mean
difference between self- and observer-ratings reflects, to a certain
degree, the social desirability of personality traits. Thus, one
expected outcome of self-enhancement is that mean self-rating
scores are higher than are the observer-ratings on those personality
traits believed to have higher social desirability.
However, all these predictions about systematic differences
between how people see others’ and their own personality traits
still need to take into account the fact that normative self-rated
personality mean scores converge almost perfectly with normative
observer-rated mean scores. For example, the correlation between
the mean profiles of the adult S-Form (self-ratings) and R-Form
(observer-ratings) presented in the NEO PI-R Professional Manual
is .94 (Costa & McCrae, 1992, Table B1 and Table B2). The
largest mean difference is 3.3 raw-score points on the C6: Deliberation subscale, which in T-scores is 7.6 units higher for observer- than self-ratings. The average absolute self–other rating difference
ALLIK ET AL.
across all 30 NEO PI-R subscales is 2.9 T-score units. Thus, given
the high correlation and small difference in mean levels, there is
little room for disparity between the perspectives from which
personality is described. Looking at how similar self-rating and
observer-rating normative profiles are, one has to conclude that the
effect caused by self–other asymmetry must be small and, if present at all,
reliably detectable only when sufficiently large samples are used.
Examination of the global distribution of aggregated self-rated
personality profiles across cultures has revealed a regular pattern,
with a clear contrast between European and American cultures on
the one hand and Asian and African cultures on the other (Allik &
McCrae, 2004). The global distribution of observer-rated personality traits generally followed the same pattern (McCrae, Terracciano, & 79 Members of the Personality Profiles of Cultures
Project, 2005). However, it is possible that when there is an
asymmetry between self- and observer-perspectives, it also displays itself at the aggregate level. If a systematic disparity
between the perspectives from which personality traits are described appears in
culture-level personality scores as well, it will constitute another
cross-cultural personality universal in addition to the relatively
invariant pattern of sex and age differences.
Aims of the Study
Since many theoretical constructions assume a fundamental
asymmetry between self- and observer-perspectives, the main goal
of this study is to examine whether there is a replicable pattern of
differences between self-rated and observer-rated personality traits
which transcends different languages and cultures. There are two
principal study designs available: an individual design, in which
the targets of self-reports and observer reports are the same and in
which the question is whether the mean difference between these
two is positive or negative; and a culture-level design, in which
different samples from the same culture are compared. The former
is more powerful because targets serve as their own controls with
regard to trait level, but the latter is also informative since it helps
to test the robustness of the phenomenon. We used both designs.
In Study 1 we compared data from four European languages and
cultures—in Belgium, Czech Republic, Estonia, and Germany—
where personality traits of participants were judged by themselves
and by one or more observers. To many readers it may come as a
surprise that the number of studies with consensual validation
between self- and observer-ratings is rather limited (McCrae,
Costa, Martin et al., 2004). Even so, consensus studies are the most
powerful source to test whether how people see others is different
from how people see themselves. Following the idea of the actor–observer
hypothesis, we tested whether self–observer discrepancy
is larger in less visible traits. Additionally, separate groups of
judges from all four of the countries also rated the social desirability of each of the NEO PI-R items. On the basis of this, it is
possible to compare the difference between self- and observer-ratings with the social desirability of personality traits. Under the self-enhancement account, self-rated
scores are expected to be higher than observer-rated scores on
those personality traits considered higher on social desirability.
Study 2 is devoted to analyses of published cross-cultural data
sets about self- and observer-ratings of personality with the NEO
PI-R. The main goal in Study 2 is to establish how robust the
asymmetries between self- and observer-ratings found in
the first study are, that is, how well they generalize across languages,
cultures, and different levels of analysis.
Study 1
Samples
Belgian sample. Flemish data were collected from 345 target
participants (270 women and 75 men) who were psychology
students at the Katholieke Universiteit Leuven and who, as a
course requirement, rated their own personality with the Dutch
version of the NEO PI-R (Hoekstra, Ormel, & De Fruyt, 1996).
They also recruited a well-acquainted person (n = 345; 190
women, 112 men, and 43 did not specify sex), either a relative or
a friend, who rated their personality using the observer-report form
of the same instrument. The mean age of targets was 18.4 (SD =
3.0) years. The mean age of external raters was 29.5 (SD = 13.7)
years.
Czech sample. The Czech sample included 811 targets (330
men, 481 women) who were recruited in a series of studies
(McCrae, Costa, Martin et al., 2004). They ranged in age from 14
years to 83 years, with a mean age of 35.7 years (SD = 14.2 years).
Peer-ratings were provided by 909 raters (377 men, 532 women)
aged 14–83 years (M = 35.8 years; SD = 14.3 years) who
participated in one of two research designs. In the self–other
agreement studies (N = 615), each target provided a self-report
and was rated by one informant. In the consensus study, 195
targets (85 men and 110 women aged 17–77 years; mean age 36.4,
SD = 15.2) provided a self-report and were each rated by three
informants. All participants used the Czech version of the NEO
PI-R questionnaire (Hřebı́čková, 2002).
Estonian sample. Estonian data came from two already published studies. The first subsample consisted of 218 Estonian-speaking participants (180 women and 38 men; mean age 22.3
years, SD = 5.2) who answered the NEO PI-R questionnaire,
which was accompanied by a standard instruction to describe
themselves honestly and accurately (Konstabel, Aavik, & Allik,
2006). They were also asked to provide two peer-reports (n = 436)
from their acquaintances, relatives, or close friends. The Estonian
version of the NEO PI-R (Kallasmaa, Allik, Realo, & McCrae,
2000) was completed voluntarily; some students studying psychology received extra credit towards the fulfillment of their course
requirements. The second Estonian subsample consisted of 154
participants (53 men and 101 women; mean age 43.9 years,
SD = 17.6) who were described by one or two judges (Mõttus,
Allik, & Pullmann, 2007). The sample of judges (n = 308)
included 203 women, 67 men, and 38 participants who did not
report their gender. The mean age of the judges was 38.2 (SD =
15.9) years. Both targets and judges used the Estonian version
of the EE.PIP-NEO (Mõttus, Pullmann, & Allik, 2006), which
has a facet-structure identical to the NEO PI-R but was designed to be linguistically simpler, containing shorter and grammatically less-complex items.
German sample. Participants were 304 students (169 women,
134 men, 1 not reporting sex) at a German university, of whom
only 3 studied psychology (Borkenau & Zaltauskas, 2009). Their
mean age was 23.38 (SD = 2.68) years, ranging from 18 years to
35 years. They received 45 Euro for their participation and were
recruited in 76 groups, each comprising 4 persons who all knew
each other well. First, the participants described the 3 other group
members on 30 bipolar adjective scales; these, however, are not
relevant to the present study. Next, each 4-person group was split
into two dyads, and all participants described themselves and the
other dyad member on several personality inventories including
the German version of the NEO PI-R (Ostendorf & Angleitner,
2004). It is important to notice that in all four samples observers
knew their targets well, being either their close relatives or friends.
Measure of Social Desirability
In order to develop a social desirability index for the NEO PI-R,
the questionnaire items were assessed by 100 Czech judges (43
men and 57 women, mean age 40.5 years, SD = 15.1), 88 Estonian
judges (24 men and 64 women, mean age 37.6 years, SD = 12.7),
30 Flemish judges (12 men, 16 women, 2 did not report their sex,
mean age 20.3 years, SD = 2.0), and 20 German judges (9 men
and 11 women, mean age 23.8 years, SD = 3.0), who independently rated the social desirability of each of the 240 NEO PI-R
items. The instruction stated,
Descriptors of people often contain evaluative information. Some
personality characteristics are considered more desirable, receiving
approval from other people, whereas others are undesirable. If someone agrees strongly with this item, does this present that person in a
favorable or unfavorable light, or is agreeing with this item neutral as
regards to others’ approval?
Ratings were made on a 7-point Likert scale, ranging from extremely undesirable (−3) to extremely desirable (+3), with zero as
a neutral point. Estonian desirability ratings were reported in a
previous study (Konstabel et al., 2006). As the pairwise correlations between social desirability profiles of cultures were sufficiently high (from r = .80 to r = .93), we used an unweighted
average of these four groups of judges.
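As an illustration of the unweighted averaging described above, the following Python sketch averages item-level desirability profiles from several groups of judges so that each group counts equally, regardless of how many judges it contains. The function name and the rating values are hypothetical and are not taken from the study.

```python
def unweighted_profile_mean(group_profiles):
    """Average several group-level profiles item by item.

    Each group contributes equally, whatever its number of judges.
    """
    n_items = len(group_profiles[0])
    if any(len(profile) != n_items for profile in group_profiles):
        raise ValueError("all profiles must cover the same items")
    n_groups = len(group_profiles)
    return [
        sum(profile[i] for profile in group_profiles) / n_groups
        for i in range(n_items)
    ]


# Hypothetical mean desirability ratings (-3 to +3) for three items,
# one list per group of judges.
czech = [1.0, -2.0, 0.5]
estonian = [1.5, -1.5, 0.0]
flemish = [0.5, -2.5, 1.0]
german = [1.0, -2.0, 0.5]

combined = unweighted_profile_mean([czech, estonian, flemish, german])
print(combined)
```

An average weighted by group size would instead let the largest group of judges dominate the combined profile; the unweighted mean avoids this.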
Results and Discussion
The Flemish, German, Estonian, and Czech mean difference
profiles (self-minus-observer) converted to z-scores (the mean
difference divided by the average standard deviation) are shown in
Figure 1. Even a brief inspection reveals that all four difference
profiles are very similar. Indeed, the pairwise correlations between
these four profiles range from .77 (Belgium and Germany) to .88
(Estonia and the Czech Republic) with the mean r = .83 (all highly
significant). All four profiles are also strongly correlated with the
difference profile of the U.S. adult normative data (S-Form minus
R-Form) given in the NEO PI-R Professional Manual (Costa &
McCrae, 1992, Table B1 and Table B2). The correlations are .85,
.79, .75, and .63 for Belgium, the Czech Republic, Estonia, and
Germany, respectively.
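The standardized difference profile and the profile correlations reported above can be sketched as follows. This is a minimal illustration under stated assumptions: the "average standard deviation" is taken to be the simple mean of the self-rating and observer-rating standard deviations (the text does not specify the computation further), the profile correlation is an ordinary Pearson correlation across facets, and all numbers are invented for the example.

```python
from math import sqrt


def standardized_difference(mean_self, mean_obs, sd_self, sd_obs):
    """Self-minus-observer mean difference in z-score units.

    Assumption: the "average standard deviation" is the simple mean of
    the self-rating and observer-rating standard deviations.
    """
    return (mean_self - mean_obs) / ((sd_self + sd_obs) / 2.0)


def pearson(x, y):
    """Pearson correlation between two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical facet statistics for two cultures, one tuple per facet:
# (mean_self, mean_obs, sd_self, sd_obs).
culture_a = [(20.0, 18.0, 5.0, 3.0), (17.0, 19.0, 4.0, 4.0), (21.0, 21.5, 5.0, 5.0)]
culture_b = [(19.0, 18.0, 4.0, 4.0), (16.0, 18.5, 5.0, 5.0), (20.0, 20.0, 4.0, 4.0)]

profile_a = [standardized_difference(*facet) for facet in culture_a]
profile_b = [standardized_difference(*facet) for facet in culture_b]
similarity = pearson(profile_a, profile_b)
print(profile_a, profile_b, similarity)
```

In the study each profile would contain 30 values, one per NEO PI-R facet, and the correlation would be computed for every pair of cultures.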
In all four cultures (in addition to the United States), people
perceive themselves as more neurotic and more open than they are
seen by others. They also perceive their level of conscientiousness
as lower than how they are seen from the vantage point of their
observers. Particularly, people see themselves as less competent
(C1), self-disciplined (C5), and altruistic (A3) than they are perceived by others. At the same time, they see themselves as
more open to fantasy (O1) and more ready to reexamine social,
political, and religious values (O6) than others see them. Even this short list of disparities suggests that, in general, people do not view themselves more
favorably than they are viewed by others.
For a more formal test we analyzed the combined sample of
1,768 targets, pooled from the four separate samples. Before
pooling, all scores were normalized within the country dataset.
Figure 1. The mean difference profiles (self-minus-observer) for Belgium, Germany, Estonia, and the Czech
Republic, converted to z-scores (the mean difference divided by the average standard deviation). The vertical axis
shows self-minus-observer ratings (z-scores); the horizontal axis lists the 30 NEO PI-R facet scales from N1: Anxiety
to C6: Deliberation. Letter–number combinations are the NEO PI-R facet scale numbers.
Contrary to the concept of self-enhancement, the mean self-minus-observer profile was negatively correlated (r = −.53, p = .003)
with the profile of social desirability ratings (individual correlations ranged from r = −.40, p = .026, to r = −.62, p = .001, for
Estonia and Belgium, respectively), suggesting that observer-ratings rather than self-ratings might be biased towards social
desirability. Similarly, the normative U.S. adult self-minus-observer profile was negatively correlated with the profile of social
desirability ratings (r = −.45, p = .012). Figure 2 presents the
average self-minus-observer profile for the four cultures studied in
comparison with their average social desirability ratings on the
items of the 30 NEO PI-R facet scales. Although these two profiles
are not exact mirror images, their dissimilarity is obvious. In
personality psychology, a correlation of −.53 between two profiles
is high enough to speak of a substantial reverse link
between social desirability and the disparity between perspectives.
To test the prediction following from the actor–observer asymmetry that disparity is more pronounced on less visible traits, we
computed the rank-order correlation between self-ratings and
observer-ratings. The highest self–other agreement was found on
the subscales E3: Assertiveness (.56), O2: Aesthetics (.56), and
E5: Excitement Seeking (.55). Relatively low self–other agreement was found in the ratings of C1: Competence (.32), A1: Trust
(.36), and N5: Impulsiveness (.38). The average self–other agreement was reasonably high (median = .43, p = .018). These values
are comparable to typical self–observer agreement values obtained
in previous studies using the NEO family questionnaires (Connolly, Kavanagh, & Viswesvaran, 2007; McCrae, Costa, Martin et
al., 2004). Having obtained these self–other convergence values, we can ask whether the self–observer asymmetry is more
pronounced on those personality traits on which people and
their judges agree less. Figure 3 shows the correlation plot
between self–other agreement and self-minus-observer difference scores. There seems to be no systematic relationship
between self–other agreement and the asymmetry in perception
(r = .03, p = .88). Thus, the disparity between self- and observer-ratings is not smaller on personality traits that are more
visible or judgeable.
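The visibility index used above, a rank-order correlation between self-ratings and observer-ratings of the same targets on one facet, can be sketched in Python as follows. The text does not state which rank-order coefficient was used; Spearman's rho with average ranks for ties is assumed here, and the scores are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / sqrt(sxx * syy)


# Hypothetical facet scores for five targets: self-ratings and the
# ratings given to the same targets by their observers.
self_scores = [3, 1, 4, 2, 5]
observer_scores = [2, 1, 5, 3, 4]
agreement = spearman(self_scores, observer_scores)
print(round(agreement, 3))
```

Repeating this for each of the 30 facets yields one agreement value per facet, which can then be related to the facet's self-minus-observer difference.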
In addition to social desirability and visibility, there are other
potential explanations for the asymmetry between self- and
observer-ratings. One of them could be a systematic age difference
between targets and judges. College-age targets may have recruited adults to judge their personality, and adults may tend to
endorse certain items differently compared with college-aged people. Indeed, such an age difference was present in the Flemish and Estonian
samples, but not in the German sample and part of the Czech sample, in
which the mean ages of targets and judges were identical. Thus, if the
rater-age explanation is true, we should expect a systematic
difference in self–observer asymmetry between the German and
Czech samples on one side and the Estonian and Flemish samples on
the other side. There appears to be no such difference (see Figure
1), which speaks against the rater-age explanation for the
asymmetry. In addition, to see whether the observer-rated personality profile tends to be more adultlike, we compared the average
values of the four self-minus-observer profiles displayed in Figure 1 with the mean age differences between adults and college-age targets for 30 NEO PI-R facets. We took the latter values from
the best observer-ratings database to date, results obtained from the
international sample consisting of 50 cultures participating in the
Personality Profiles of Cultures Project (McCrae, Terracciano, &
78 Members of the Personality Profiles of Cultures Project, 2005;
McCrae, Terracciano, & 79 Members of the Personality Profiles of
Cultures Project, 2005). Contrary to our prediction that adults may
follow their age-specific response pattern even when rating
college-aged people, the correlation between the two difference
profiles (self–observer asymmetry and age-related differences,
adults minus college age) was positive (r = .64, p < .001),
suggesting that the observer's perspective was more characteristic
of a younger person, not an older person. Thus, external
observers may tend to emphasize the younglike personality trait
levels of their targets, whereas generally younger targets may tend
to show themselves as more maturelike.
Figure 2. The average self-minus-observer profile for the four cultures studied in comparison with their
average social desirability ratings on the Revised NEO Personality Inventory items. The vertical axis shows z-scores;
the horizontal axis lists the 30 NEO PI-R facet scales from N1: Anxiety to C6: Deliberation, with one line for
self-minus-observer ratings and one for social desirability ratings. Letter–number combinations are the NEO PI-R facet scale numbers.
Figure 3. Correlation plot between self–other agreement and self-minus-observer difference scores (r = .03,
p = .88). The horizontal axis shows self-minus-observer ratings (z-scores); the vertical axis shows the correlation
between self and observer ratings; each point is one facet. Letter–number combinations are the NEO PI-R facet scale numbers.
Study 2
In spite of obvious language and cultural differences, all four
studied countries are members of the European Union and have
relatively high levels of human development and economic prosperity. Therefore, to make any claims concerning the universality
of our findings in Study 1, we would need to have data from other
regions of the world, including Asian and African countries.
Disappointingly, consensual studies of self- and observer-ratings done outside Europe and North America are in
short supply. Fortunately, the NEO PI-R has been translated into more
than 40 languages, and many researchers around the world have
collected self-report data. McCrae (2002) assembled
self-report data that had been collected by other researchers
using a variety of designs in 36 different cultures. Some cultures (e.g., Hong Kong) had only college-age respondents; some
(e.g., Spain) had only adult data, and many had both. The ratio
of men to women varied widely across cultures. In order to
create overall culture scores that would be comparable across
these diverse studies, McCrae (2002) first standardized each
subsample using age- and gender-specific U.S. norms and then
defined the overall culture T-score as the unweighted mean of
all available subsamples. This strategy assumes that trait levels for
a culture are generalizable across age and gender groups and that age
and gender differences around the world are similar to those found in
the U.S. norms. Both these assumptions were generally supported by
the data (McCrae, 2002), and the validity of the resulting overall
culture trait means was supported by their correlates (Allik & McCrae, 2004; McCrae & Terracciano, 2008).
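The aggregation strategy described above can be sketched in Python as follows: each subsample mean is first expressed as a T-score relative to the norm group matching its age and gender, and the culture score is the unweighted mean of those T-scores. The sketch assumes the usual T-score scaling (mean 50, standard deviation 10); the function names and all numbers are hypothetical, and the actual U.S. norm values are not reproduced here.

```python
def t_score(sample_mean, norm_mean, norm_sd):
    """Express a subsample mean as a T-score (mean 50, SD 10) relative to a norm group."""
    return 50.0 + 10.0 * (sample_mean - norm_mean) / norm_sd


def culture_t_score(subsamples):
    """Unweighted mean of subsample T-scores.

    `subsamples` is a list of (sample_mean, norm_mean, norm_sd) tuples,
    where the norm values belong to the matching age and gender group.
    """
    scores = [t_score(mean, norm_mean, norm_sd) for mean, norm_mean, norm_sd in subsamples]
    return sum(scores) / len(scores)


# Hypothetical facet means for two subsamples of one culture, each paired
# with invented norm values for its own age and gender group.
subsamples = [
    (22.0, 20.0, 5.0),  # e.g., college-age women
    (18.0, 19.0, 4.0),  # e.g., adult men
]
overall = culture_t_score(subsamples)
print(overall)
```

Because every subsample is scored against its own norm group before averaging, a culture with only college-age respondents and a culture with only adult respondents end up on a comparable scale, provided the two assumptions named in the text hold.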
Unlike self-report data, the collection of the observer’s ratings
has been much more systematic. During the Personality Profiles of
Cultures Project, college students from 51 cultures identified an
adult or college-aged man or woman whom they knew well and
rated more than 12,000 targets using the R-form of the NEO PI-R
(McCrae, Terracciano, & 78 Members of the Personality Profiles
of Cultures Project, 2005; McCrae, Terracciano, & 79 Members of
the Personality Profiles of Cultures Project, 2005). As there is
considerable overlap between the samples of nations for which
aggregate self- or observer-rated personality scores are reported, it
is possible to compare the self- and observer-rated personality traits at the aggregate national level. Provided that the pattern
of difference between internal and external perspectives for the
Big Five personality traits is pervasive, we could expect to observe
it even if the targets of the self and observer’s ratings are not
identical, to say nothing about other differences (such as age, sex
and occupation) between the study designs.
Method
The self-reported mean T-scores of the NEO PI-R subscales for
36 countries were published by McCrae (2002). An additional set
of self-report data for Burkina Faso, Switzerland (French-speaking), and Poland was published by McCrae and Terracciano
(2008, Appendix C). In another large-scale project, observers’
ratings were collected from more than 50 different cultures using
the R-Form of the NEO PI-R (McCrae, Terracciano, & 78 Members of the Personality Profiles of Cultures Project, 2005; McCrae,
Terracciano, & 79 Members of the Personality Profiles of Cultures
Project, 2005), meaning that standardized mean observer ratings
for 51 cultures are now publicly available (McCrae & Terracciano,
2008). For 29 cultures, both self- and observer-reports are available. Although this overlapping set also contained data from Belgium, the Czech Republic, Estonia, and Germany, it is important to
note that the data were different from what we used in Study 1.
Because the published means in both cases are reported in
T-scores, and the original data are not available, we converted
them back to raw scores using the formula ([T-score − 50] ·
SD)/10 + M, where M and SD are the mean and standard deviation
of the U.S. adult or college age normative data (Costa & McCrae,
1992), dependent on the respective sample, and the average international sample data (McCrae & Terracciano, 2008) for self- and
observer-ratings, respectively. It is important to note, however,
that this back transformation from T-scores to raw scores is approximate, given that on the basis of T-scores alone it is impossible
to reconstruct the exact scores for different sex and age groups.
The reconstructed mean profile represents a hypothetical average
person, without sex and age specification. To compute the self–
other asymmetry index, we subtracted the mean score of the
observer-ratings from the mean score of the self-ratings.
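The back transformation and the asymmetry index described above can be sketched in Python as follows. The formula is the one given in the text; the function names, the norm values, and the T-scores are hypothetical and serve only to show the arithmetic.

```python
def t_to_raw(t_score, norm_mean, norm_sd):
    """Approximate raw score from a T-score: ((T - 50) * SD) / 10 + M."""
    return (t_score - 50.0) * norm_sd / 10.0 + norm_mean


def asymmetry_index(self_t, self_norm, observer_t, observer_norm):
    """Self-minus-observer difference in reconstructed raw scores.

    `self_norm` and `observer_norm` are (M, SD) pairs of the reference
    data against which the respective T-scores were originally computed.
    """
    self_raw = t_to_raw(self_t, *self_norm)
    observer_raw = t_to_raw(observer_t, *observer_norm)
    return self_raw - observer_raw


# Hypothetical values for one facet in one culture.
self_norm = (20.0, 5.0)      # invented (M, SD) of the self-report reference data
observer_norm = (19.0, 4.0)  # invented (M, SD) of the observer-rating reference data
index = asymmetry_index(54.0, self_norm, 48.0, observer_norm)
print(round(index, 2))
```

The two T-scores cannot be subtracted directly because they are scaled against different reference data; converting both to raw scores first puts them on a common metric.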
In order to study correlation with societal-level indicators, we
found the mean absolute difference between observer-ratings and
self-ratings for each culture. This score showing the magnitude of
the self-minus-observer differences was correlated with several
indicators characterizing economic and social conditions.
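The magnitude score described above can be sketched in Python as the mean of the absolute facet-level differences, so that positive and negative differences do not cancel. The function name and the example profiles are hypothetical.

```python
def mean_absolute_difference(self_profile, observer_profile):
    """Mean absolute self-minus-observer difference across facets for one culture."""
    if len(self_profile) != len(observer_profile):
        raise ValueError("profiles must cover the same facets")
    total = sum(abs(s - o) for s, o in zip(self_profile, observer_profile))
    return total / len(self_profile)


# Hypothetical raw-score profiles for three facets of one culture.
self_profile = [22.0, 18.0, 20.0]
observer_profile = [20.0, 19.5, 20.0]
magnitude = mean_absolute_difference(self_profile, observer_profile)
print(magnitude)
```

One such value per culture can then be correlated with each country-level indicator.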
Gross domestic product (GDP). GDP at purchasing power
parity in U.S. dollars, divided by the midyear population in 2006,
was obtained from the Human Development Indices (2008).
Life expectancy. Life expectancy at birth indicates the number of years a newborn infant would live if prevailing patterns of
age-specific mortality rates at the time of birth were to stay the
same throughout the child’s life (Human Development Indices,
2008).
Human Development Index (HDI). The Human Development Index measures the level of human development by combining normalized measures of life expectancy, literacy, educational
attainment, and GDP per capita for countries worldwide; the
reported indices are for the year 2006 (Human Development
Indices, 2008).
Index of Shipping Difficulties. The Index of Shipping Difficulties is an indicator of the effort and complications (border
delays, fees, red tape, etc.) encountered when shipping goods (World
Development Report, 2009, Table A4).
Days to Start Business. The goal of the Doing Business
project was to provide an objective basis for understanding and
improving the regulatory environment for business. We used
the days required for starting a business as an index of the
bureaucratic and legal hurdles an entrepreneur must overcome
to incorporate and register a new firm. Data were retrieved from
http://doingbusiness.org/ExploreTopics/StartingBusiness/ on
June 29, 2009.
Corruption Perception Index (CPI). The Corruption Perception Index ranks 180 countries by their perceived levels of
freedom from corruption, as determined by expert assessments and
opinion surveys. The CPI is compiled annually, and it was retrieved from the Transparency International homepage
http://www.transparency.org/policy_research/surveys_indices/cpi on
June 29, 2009.
Results and Discussion
The mean differential profiles (self-minus-observer) of the 30
NEO PI-R subscales for the 29 countries or cultural groups are
shown in Table 1. The average between-country profile correlation
was .43, suggesting that the profiles of the self-minus-observer
mean differences are rather similar. The last column in Table 1
shows how much the mean self-minus-observer profile of each
country is similar to the average self-minus-observer profile of all
29 countries. As nearly all correlations are positive and significant
(median = .67), a strong first principal component is suggested on
which all individual profiles load. There is only one
country-level self-minus-observer profile of the 29 that clearly
deviates from the common shape: the Danish profile. We have no
good explanation for why the Danish data are conspicuously
different from those of other countries. The Danish self- and observer-rating profiles alone do not stand out from the other profiles. The
deviance of the self-minus-observer profile might reflect a real
difference in perspective or might be a consequence of some
measurement error, to say nothing about artifacts that could have
been created by the back transformation from T-scores. Despite the
idiosyncrasy of the Danish self-minus-observer profile, we can
still conclude that there is a remarkable cross-cultural similarity in
the asymmetry of aggregate self- and observer-ratings.
The critical issue, however, is how well the culture-level findings of Study 2 agree with the individual-level findings of Study 1
that were obtained with a more controlled design. The cross-validity of the findings of the two studies is remarkable: Averaged
self–observer difference profiles found in the four cultures investigated in Study 1 and the averaged self–observer difference
profile of the 29 cultures investigated in Study 2 were correlated as
highly as r = .80, p < .0001. Figure 4 presents the average
self-minus-observer profile across 29 cultures, together with the
average of four consensus studies (Study 1) and the differential
profiles of the U.S. normative data (Costa & McCrae, 1992) for
adults. All three profiles are strongly correlated (from .70 to
.82, p < .0001), suggesting that the average shape of the self-minus-observer profile remains essentially the same.
Further comparison of data from Study 1 and Study 2 shows
that the average self–observer difference profile of the 29 countries is not correlated with the visibility of traits, operationalized as
the rank-order correlation between self- and observer-ratings in the
pooled data of four European countries (r = −.04, p = .83). The
lack of correlation is consistent with the findings in Study 1,
further demonstrating that the self–observer asymmetry is probably not caused by the different amount of information available to
self-raters and external observers. We also calculated the correlation between the average self–observer difference profile of the
29 cultures and the mean social desirability ratings reported in
Study 1. As in Study 1, the correlation was negative, but
this time it was nonsignificant (r = −.17, p = .37). Thus, assuming that social desirability is relatively universal, the lack of a
significant positive correlation shows that in culture-level analyses, self–observer asymmetry cannot be explained by self-enhancement.
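The rank-order correlation used here as the visibility index can be sketched as a Pearson correlation computed on ranks. This is a generic illustration that assumes tied scores receive their average rank; the helper names are hypothetical, and the sketch does not reproduce the significance tests reported in the text.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Rank-order (Spearman) correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Under these assumptions, a trait's visibility would be `spearman(self_scores, observer_scores)` across targets, and the resulting per-trait values could then be correlated with the self-minus-observer difference profile across the 30 subscales.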
We were also interested in whether the magnitude of the self-minus-observer differences is related to geographic, economic, and social
indicators. In general, the data of the 29 countries did not correlate significantly
with these country-level indicators (Table 2), except for a
significant (p = .008) negative correlation with the days required for
starting a business. Provided that it is not a statistical fluke, it remains
to be elucidated why, in countries with low bureaucratic and legal hurdles
that an entrepreneur must overcome to incorporate and register a new
firm, people generally see others more differently from how they see
themselves.
General Discussion
Mainstream social psychology, focusing on human inabilities,
has been engaged in expanding the list of errors in judgment (J. I.
Krueger & Funder, 2004). Although this study is also about the
disagreement between two perspectives from which personality
can be judged, its message is, overwhelmingly, about the remarkable accuracy of personality judgments. First, the level of self–
other agreement (the median rank-order...