Dis 2
Review Francis et al. (1999) and think about the "who" or "what" you will need to use
as your unit of analysis.

Post an assessment of the impact of unit-of-analysis selection in quantitative
business research. In your assessment, do the following:
• Describe the importance of ensuring the unit of analysis aligns with the research purpose.
• Explain the broader implications of selecting an incorrect unit of analysis for business practice.
• Analyze the relationship between sample size for the chosen unit of analysis and statistical power.
• Justify how and why the unit of analysis for your proposed quantitative study is appropriate for your research question.
Previous similar assignment:
https://www.studypool.com/discuss/4371635/Unit-of-Analysis-and-Sample-Size-Discussion-help
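The sample-size/statistical-power bullet above can be made concrete with a short numeric sketch. This is a minimal illustration, not a substitute for a formal power analysis: it assumes a two-sided, two-sample t-test with equal group sizes and uses the normal approximation to the noncentral t distribution, so values for small samples are slightly optimistic.

```python
from statistics import NormalDist

def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample t-test
    (normal approximation; d is Cohen's d effect size)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Noncentrality parameter for equal group sizes
    ncp = abs(d) * (n_per_group / 2) ** 0.5
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)

# For a fixed medium effect (d = 0.5), power rises with per-group n
for n in (20, 50, 100, 200):
    print(n, round(approx_power_two_sample(0.5, n), 2))
```

Running this shows power climbing from roughly a third at n = 20 per group toward near-certainty at n = 200, which is why the unit of analysis matters: aggregating to a coarser unit (e.g., firms instead of employees) can cut the effective n dramatically.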
Dis 3
To prepare for this Discussion, review the Lumley et al. (2002) article, as well as
Lessons 19–21 and 24 in the Green and Salkind (2017) text. Identify a research
example using your research proposal and consider the role and importance of the
assumptions underlying each parametric test.

Post a comparison of one-sample, paired-samples, and independent-samples t-tests
within the context of quantitative business research. In your comparison, do the
following:
• Describe the research example related to your research proposal.
• Describe a hypothetical example appropriate for each t-test, ensuring that the variables are appropriately identified.
• Analyze the assumptions associated with the independent-samples t-test and the implications when assumptions are violated.
• Explain options researchers have when assumptions are violated.
Previous similar assignment:
https://www.studypool.com/discuss/5062060/Discussion-Data-Assumptions-and-Parametric-StatisticalTests
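The three t-tests being compared can be sketched side by side with SciPy. The data and variable names below (store revenues, a training program, two branches) are invented purely to illustrate which research design pairs with which test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# One-sample: does mean store revenue differ from a $50k benchmark?
revenues = rng.normal(53, 8, size=30)
t1, p1 = stats.ttest_1samp(revenues, popmean=50)

# Paired-samples: the same employees measured before and after training
before = rng.normal(70, 10, size=25)
after = before + rng.normal(3, 4, size=25)  # scores are linked per person
t2, p2 = stats.ttest_rel(before, after)

# Independent-samples: two unrelated groups (customers of two branches)
branch_a = rng.normal(4.1, 0.6, size=40)
branch_b = rng.normal(3.8, 0.6, size=40)
t3, p3 = stats.ttest_ind(branch_a, branch_b)  # assumes equal variances

print(p1, p2, p3)
```

The key distinction is in the data structure: one group against a constant, two linked measurements per case, or two unrelated groups.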
Dis 4
Consider a possible theory for your proposed research problem and think about how the key
constructs, propositions, tenets, etc., of the theory may help you better understand your
business problem.
Post an analysis of the role of theory within the context of your quantitative business
research. In your analysis, do the following:
• Describe the central role theory plays in deductive reasoning when conducting quantitative business research.
• Explain the critical relationship between the theory, specific business problem, purpose statement, and research question for an applied research study.
• Provide at least one example from your own research that illustrates the impact of theory on the development of an applied research study.
Dis 5
Review Lesson 31 in the Green and Salkind (2017) text and consider how correlation
relates to your potential research study topic. Your potential topic may or may not be
appropriate for correlational methods, but for the purpose of this Discussion, assume it is.

Post an analysis of the difference between causation and correlation within the context
of your research study. In your analysis, do the following:
• Assess the implications for professional practice when a researcher implies causation after using correlation (e.g., bivariate correlation) analyses.
• Explain why the results of bivariate correlation analyses are considered weak in terms of internal validity.
• Explain how you would extend or modify a research design to examine a true cause-and-effect relationship.
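The correlation-versus-causation distinction can be demonstrated with a tiny simulation. Here a hypothetical confounder (overall market demand) drives both advertising spend and revenue; the two are strongly correlated even though neither causes the other, and the association vanishes once the confounder is controlled for. All variable names are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

demand = rng.normal(size=n)                 # hidden common cause
ad_spend = 2 * demand + rng.normal(size=n)  # driven by demand only
revenue = 3 * demand + rng.normal(size=n)   # driven by demand only

# Strong bivariate correlation despite no causal link between them
r_xy = np.corrcoef(ad_spend, revenue)[0, 1]

# Partial correlation: regress out demand, then correlate the residuals
res_x = ad_spend - np.polyval(np.polyfit(demand, ad_spend, 1), demand)
res_y = revenue - np.polyval(np.polyfit(demand, revenue, 1), demand)
r_partial = np.corrcoef(res_x, res_y)[0, 1]

print(round(r_xy, 2), round(r_partial, 2))
```

The bivariate r comes out around 0.85 while the partial correlation is near zero, which is exactly why bivariate correlation alone is weak evidence of cause and effect.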
Dis 6
Select Chaudhary et al. (2013) or Tang (2013). Assume you are contemplating using
the instrument identified in your chosen article to measure the purported construct(s) in
your study.
By Day 3
Post an explanation of the importance of psychometrically sound (reliable and valid)
instruments to measure constructs within quantitative business research. In your
explanation, do the following:
• Describe the similarities and differences of reliability and validity within the context of quantitative business research.
• Identify the instrument utilized in your chosen article, including relevant details such as requirements or tools.
• Discuss the concepts the instrument measures and their psychometric properties.
• Explain why you would or would not use this instrument in your own research study, providing support based upon your assessment of its psychometric properties.
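One common reliability statistic you may encounter when assessing an instrument is Cronbach's alpha (internal consistency). A minimal sketch of the computation, using invented Likert-scale responses rather than data from either article:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha; items is respondents x scale-items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 5-point Likert responses: 6 respondents, 4 items
scores = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
])
print(round(cronbach_alpha(scores), 2))
```

Because the four items move together across respondents, alpha here is high (about 0.96); alpha speaks only to reliability, while validity still has to be argued separately.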
Dis 7
Review Lesson 41 in the Green and Salkind (2017) text. Consider the impact on data
analysis when the data distribution does not meet expected assumptions. Also consider
conditions in which a nonparametric test is the most appropriate test.

Post an analysis of the relationship between data assumption violations and
nonparametric analyses. In your analysis, do the following:
• Compare the similarities and differences of parametric and nonparametric analyses in the context of data assumptions.
• Provide at least one example of a parametric statistical test and its nonparametric equivalent, and explain how these examples illustrate the comparison of the two types of analysis.
• Explain conditions under which you would use a nonparametric test (e.g., the Mann-Whitney U test over the independent-samples t-test), including supportive examples from the course resources for your explanation.
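The Mann-Whitney example in the last bullet can be sketched directly. The data below are invented: a heavily right-skewed outcome (log-normal, e.g. deal sizes) violates the t-test's normality assumption, and the Mann-Whitney U test is the usual rank-based fallback:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Skewed outcomes for two groups; normality is clearly violated
group_a = rng.lognormal(mean=1.0, sigma=1.0, size=30)
group_b = rng.lognormal(mean=1.6, sigma=1.0, size=30)

# Parametric test and its nonparametric equivalent
t_stat, p_t = stats.ttest_ind(group_a, group_b)
u_stat, p_u = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(round(p_t, 4), round(p_u, 4))
```

With small skewed samples like these, the rank-based test keeps its nominal Type I error rate, whereas the t-test's p-value rests on an assumption the data do not meet.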
ABACUS, Vol. 35, No. 2, 1999
JERE R. FRANCIS, DONALD J. STOKES AND DON ANDERSON
City Markets as a Unit of Analysis in
Audit Research and the Re-Examination
of Big 6 Market Shares
Big 6 market shares based on aggregate national data have been used in
prior research to infer market leadership and industry expertise, and to
differentiate Big 6 accounting firms from one another. In this study it is
demonstrated that further differences exist with respect to city-specific
audit markets, both between firms and within the same firm across different city markets. The specific finding is that the national market leader is
not the city-specific market leader the vast majority of the time. Usefulness of
the city-level unit of analysis is further demonstrated by re-examining the
1989 mergers creating Ernst & Young and Deloitte Touche. The primary
effect of the Ernst & Young merger was to increase market shares in
cities in which the pre-merger firms already had significant market shares,
resulting in an increase in the number of cities in which the merged firm
achieved top ranking. In contrast, the primary effect of the Deloitte Touche
merger was an expansion of the number of city-level markets in which
the merged firm had significant (though not leading) market shares. The
findings of this study suggest that, in order to move beyond our current
understanding, important audit research questions such as the reason for
particular auditor-client alignments, the competitive nature of markets,
audit pricing of reputations, and auditor reporting and independence issues
should be investigated in city-level markets where audit contracting occurs
and where Big 6 market shares (and presumably reputations) vary widely
from city to city.
Key words: Analysis unit; Auditing; Cities; Research.
Big 6 accounting firms are organized as national partnerships with national administrative offices that set firm-wide policies and provide technical support for their
city-based practice offices. There are also various arrangements for coordinating
international practices and, in some cases, sharing profits. While national and international institutional features are important in terms of understanding the organizational
structure of Big 6 firms, the fact remains that it is the city-based practice offices of
JERE R. FRANCIS is Professor of Accounting, University of Missouri-Columbia; DONALD J. STOKES is
Professor of Accounting, University of Technology, Sydney; and DON ANDERSON is Professor of Managerial Accounting, University of Queensland.
We appreciate financial support from the Australian Research Council, and comments on earlier versions
of the paper when presented at the 1996 annual meeting of the Accounting Association of Australia
and New Zealand, the 1998 annual meeting of the American Accounting Association, and workshops at
the following universities: Maastricht (Holland), Missouri, Monash (Australia), Rochester, Rutgers-Camden, Queensland (Australia), and University of Technology, Sydney (Australia).
these firms which contract for and oversee the delivery of audits, and which issue
audit reports for these clients who are headquartered in the same geographical
locale.[1]
In spite of the decentralized organizational structure for contracting and audit
reporting, the extant audit research literature is dominated by what might be termed
a 'national' level of analysis. For example, national-level Big 6 market shares have
been tabulated through the aggregation of client data from local practice offices to
study issues such as Big 6 concentration and market dominance (Zind and Zeghal,
1989; Tonge and Wootton, 1991; Doogar and Easley, forthcoming) and Big 6 industry specializations (Eichenseher and Danos, 1981; Craswell et al., 1995; Kwon 1996;
DeFond et al., 1998). The use of national market share data implies that audit
markets are national markets and that accounting firms are predominantly national
in operation rather than city-based. Since this is not strictly true, national analyses
are something of a fiction and therefore potentially misleading without also considering the underlying city-level audit markets from which national data are
constructed.
The purpose of this study is to extend the literature on within-Big 6 auditor differentiation by documenting variations in Big 6 market shares across city-specific
audit markets.[2] The city-level unit of analysis in this study shows that national
market share data obscure significant city-to-city variations, both between Big 6
firms and within the same Big 6 firm across different city markets. Market share is
important because it measures market leadership from which inferences can be
made about auditor reputations and expertise. These findings raise the possibility
that reputations and expertise of individual Big 6 offices are not standard and
uniform but vary from one locale to another along with the city-specific clienteles.
The value of a city-level unit of analysis is further demonstrated in the paper by re-examining the 1989 mergers creating Deloitte Touche and Ernst & Young. Prior
studies of the mergers primarily document the overall increase in market share
using aggregate national data (such as Table 1, below) without providing insight to
[1] To verify the validity of this assertion we randomly selected one hundred companies from the 1995
Compact Disclosure database that were audited by the Big 6. We compared the city address of each
company's corporate headquarters with the city address of the auditor as listed on the audit report in
each company's 10-K or annual report. Ninety-seven of the hundred companies examined were
audited by an office of the Big 6 located in the same metropolitan area, or the nearest major city
with a Big 6 office if corporate headquarters were remotely located. The three exceptions were:
Dillard Department Stores, audited by New York City but headquartered in Little Rock, Arkansas;
Kodak, audited by New York City but headquartered in Rochester, New York; and Maytag, audited
by Chicago, but headquartered in Newton, Iowa (Des Moines is the closest city with Big 6 offices).
[2] Several recent studies have begun to examine local and regional rather than 'national' aspects of
auditing. In the only study we know of that examines cities, Penno and Walther (1996) use accounting firm employment data to compare city-level market concentration ratios for accounting, advertising and law firms. A number of studies analyse state-level data: Young (1988) investigates variations
in state-level CPA exam pass rates; Wilson and Grimlund (1990) examine the impact of SEC sanctions on client retention at both a national level and in the specific state where the offending office
was located; Jeter and Shaw (1995) study variations in client solicitation rules across states; and Deis
and Giroux (1992) and O'Keefe et al. (1994) examine state-level markets for school district audits in
Texas and California, respectively, and the effect of auditor market shares on industry expertise and
audit quality.
the economic rationale for and consequences of the mergers (Minyard and Tabor,
1991; Wootton et al., 1994). The city-level analysis of these mergers in our study
indicates that the Ernst & Young merger was more of a leadership merger in which
the primary effect was to significantly increase market share in cities in which the
pre-merger firms already operated, resulting in a substantial increase in market
leadership (cities with top ranking). In contrast, the Deloitte Touche merger was
more of a coverage merger in which the merger increased coverage in the number
of city-level markets in which the merged firm had significant market shares (though
not necessarily market leadership).
Results of this study indicate that fundamental auditing phenomena such as
market structure, audit pricing, auditor reporting, and independence issues should
be investigated in city-specific markets where the clienteles and market shares of
individual Big 6 firms vary considerably from city to city. Here are examples of the
kinds of important questions that could be researched. Are city-level audit markets
more concentrated and therefore less competitive than national data imply? How
does city-level market leadership affect profitability? Do city-level market leaders
command the same kind of audit fee premiums that have been found using national
data? Is there a premium for city-level industry leaders? Is audit reporting affected
by the size of a specific client relative to a firm's city-level clientele? Is the audit
report less likely to be qualified ceteris paribus when a client is relatively large,
raising the possibility of independence problems? Even more fundamental is the
question of the extent to which Big 6 reputations and expertise are perceived to be
city-specific rather than standard and uniform across cities.
BACKGROUND
Big 6 accounting firms supply audit services through decentralized locally based
practice offices rather than larger more centralized offices, and clients demand services from these offices in order to lower audit contracting costs. These contracting
costs include: (a) search costs for the auditor to identify potential clients having
acceptable risk and revenue potential; (b) costs of delivering the audit, including
transportation of audit teams to client sites; (c) client search costs in establishing
the quality of the audit to be delivered by a particular accounting firm; and (d) client
costs in monitoring the delivery and quality of contracted services. The advantage
of having decentralized local offices (rather than larger centralized offices) is that
accounting firm personnel can develop better knowledge of existing and potential
clients in a particular location. Clients in turn have greater knowledge of and
confidence in the expertise of the locally based personnel who actually perform the
audits. The lowering of contracting costs thus comes about in large part from a
reduction in information asymmetry between auditor and client due to the mutual
knowledge they have when both operate out of the same geographical locale.[3]
[3] This argument is consistent with Coase's (1937) theory of the firm: that is, accounting firms decentralize, create multiple local offices and establish national offices to administer and coordinate the
activities of the local offices when it lowers the transaction costs of contracting with existing and
While the reputation and expertise of local office personnel are arguably the
most important factors to the client in selecting an auditor, the ability of a Big 6
accounting firm to deliver standardized quality across multiple offices involved in
the engagement, plus the ability (if needed) to supplement local office expertise
with experts from other offices, are also important features of audit contracting.
For larger clients with geographically dispersed operations, the local offices which
contract for audits must also credibly demonstrate their ability to oversee an audit
that may require the use of other offices of the firm.[4] Since all Big 6 firms have the
same kinds of multi-office national and international structures, the focus here is
on each unique office and the related city-level audit market to analyse market
share differences among the Big 6 firms and differences within the same Big 6 firm
across different city markets.
Why are market share and market leadership important? Market share is an indicator of a firm's market penetration and dominance. Marketing research studies
show a positive relationship between market share and firm profitability (Szymanski
et al., 1993). Market share leadership affects profitability through buyers' use of
market share as a signal of brand name quality and product superiority (Smallwood
and Conlisk, 1979) which leads to the ability to charge higher prices (Schroeter,
1988). The pricing effect of market leadership has been observed in the audit
market with respect to a Big 6 premium relative to non-Big 6 firms, and a further
premium for industry leadership within the Big 6 group of firms (Craswell et al.,
1995; DeFond et al., 1998). Market leadership can also improve profitability by
lower costs obtained from bargaining power with suppliers (Demsetz, 1973) and
the ability to achieve operating efficiencies through scale economies (Eichenseher
and Danos, 1981).
Two aspects of Big 6 city-level market leadership are measured in the study: (a)
the overall city leader; and (b) industry-specific market leaders within cities. Overall market share identifies the dominant Big 6 supplier in terms of total city market
share, while industry share measures dominant industry suppliers (experts) within
cities and is a basis for further differentiating one Big 6 firm from another at the
city level.[5] For each city, it is determined if the national market leader is the city
leader and, if not, a comparison is made of market share held by the city leader
potential clients. The value created by a national administrative office arises from quality control
activities and economies of scale in activities such as training and audit program development.
National offices also interact with regulators such as the FASB and SEC, and internationally with
the IASC, and thus serve as a vehicle for lobbying activities on behalf of the firm and its clients.
[4] These engagements are performed under the close supervision of the engagement partner in the
local office that contracts with the client and signs the audit report. The contracting office plans the
engagement, performs critical parts of the audit, coordinates and reviews the work delegated to
other offices, and has final responsibility for issuing the audit report.
[5] Eichenseher and Danos (1981), Danos and Eichenseher (1986), Haskins and Williams (1990), Craswell
et al. (1995) and Kwon (1996) use national market share data to identify industry specialists. However, national market share data show that all Big 6 accounting firms have substantial market shares
in virtually all industry sectors, thus making it difficult to meaningfully differentiate one Big 6 firm
from another. In contrast, there is much larger variation among Big 6 firms at the city level.
versus the national market leader. The purpose is to document the extent to which
national market share data do not accurately reflect city-specific market leaders.
NATIONAL MARKET SHARES
National-level market shares for the United States are calculated for 1988 and 1990
using all publicly listed companies in Standard & Poor's COMPUSTAT database.
The years 1988 and 1990 are chosen because these years represent the last full year
of the Big 8 and the first full year of the Big 6. These data facilitate a general
examination of how the two Big 8 mergers affected city-level market leadership
and a more detailed analysis of how the mergers affected the city-specific markets
of Ernst & Young and Deloitte Touche.
National market share data are reported in Table 1. Market shares are expressed
as percentages of total audits and percentages of clientele market value. The Big 8/
Big 6 are dominant, auditing around 81 per cent of clients (and 97 per cent of market value).[6] Six second-tier national firms audited 5 per cent of companies (1 per
cent of market value), and the remaining accounting firms audited 13 per cent of
companies (2 per cent of market value).[7] The results reported in our study use the
number of audits to measure market share. Based on number of audits performed,
national market leaders were Peat Marwick in 1988 with 14.6 per cent of audits,
and Ernst & Young in 1990 with 18.3 per cent of audits. The 1989 mergers did not
increase overall concentration by Big 6 firms; however, they did lead to a new
market leader and an increase in market share of the leading accounting firm.[8]
Table 2 reports the top-ranked accounting firm in terms of percentage of industry
audits performed nationally for twenty-seven broad industry groups.[9] On average,
the top-ranked firms had industry market shares of 18.2 per cent (21.6 per cent) in
1988 (1990). By contrast, the second-ranked firms had market shares averaging only
14.6 per cent (17.2 per cent) in 1988 (1990). Using a t-test, these mean differences
of 3.6 per cent in 1988 and 4.4 per cent in 1990 are significantly different from zero
(p < .01) indicating that the market share of the top-ranked firm is significantly
[6] Big 6 market shares are somewhat lower though still dominant outside the United States. Data for
1992 in International Accounting Trends and Auditing Trends (Center for International Financial
Analysis and Research Inc., 1995) indicate that Big 6 market shares outside the United States
averaged 61 per cent of publicly listed companies and 78 per cent of the sales of these companies.
[7] Only three of these second-tier national firms still exist today. Kenneth Leventhal was acquired by
Ernst & Young; Laventhol & Horwath declared bankruptcy; and Panell Kerr Forster voluntarily
dissolved its U.S. practice.
" The data were also analysed using client sales to develop a size-weighted market share metric. While
this can affect which specific Big 6 firm is identified as a city-level market leader, it has no impact on
the study's overall conclusion which is that national-level market share data do not accurately
identify city leaders the majority of the time.
[9] See the Appendix for industry definitions. Industry sample sizes range from sixty-seven to 713 and
were constructed by combining some two-digit SIC codes. Strict two-digit SIC codes were used to
specify industries with results comparable to those reported here but with much smaller sample sizes
for some industries.
[Table 1, reporting national auditor market share data (percentages of total audits and of clientele market value) for the Big 8 in 1988 and the Big 6 in 1990, is garbled in the source scan and omitted here.]
TABLE 2
COMPUSTAT SAMPLE OF PUBLICLY-LISTED COMPANIES: NATIONAL AUDITOR
MARKET SHARE DATA FOR 27 INDUSTRY GROUPS [a]

[Columns: SIC code; top-ranked CPA firm (%) in 1988 and 1990; sample sizes in 1988 and 1990. The detailed industry rows are garbled in the source scan and omitted; total sample sizes were 7,698 (1988) and 6,989 (1990).]

[a] AA = Arthur Andersen, AY = Arthur Young, CL = Coopers & Lybrand, DH = Deloitte Haskins
& Sells, DT = Deloitte Touche, EW = Ernst & Whinney, EY = Ernst & Young, PM = Peat Marwick,
PW = Price Waterhouse, TR = Touche Ross.

Number of industry groups in which top-ranked CPA firm:
1988: AA—9, AY—0, CL—4, DH—0, EW—2, PM—12, PW—0, TR—0
1990: AA—5, CL—1, DT—4, EY—13, PM—4, PW—0
greater than that of the second-ranked firm. Peat Marwick was the dominant industry leader in 1988, being ranked first in twelve of the twenty-seven industry
groups. As a consequence of its 1989 merger Ernst & Young became the dominant
industry leader in 1990 and was ranked first in thirteen of the twenty-seven industry
groups. As with overall market shares, the 1989 mergers did not affect Big 6 concentration but did lead to an increase in market share of the leading firm, and a
change in the firm having leadership in the most industries.
CITY MARKET SHARES
Information on city-level audit markets does not exist in any electronic database.
It was determined that the best approach to construct city-level market share data
was to read audit reports in order to identify the city-specific offices of accounting
firms performing the audits and issuing audit reports. In this way city-specific markets are built up as the summation of all companies audited by offices of accounting
firms located in each specific city.[10] It was next decided that the AICPA's on-line
NAARS database provided the most convenient access to a large sample of audit
reports for 1988 and 1990. NAARS is a full-text file of annual reports. Audit
reports were read for all companies on NAARS resulting in samples of 3,777
observations from 145 cities for 1988, and 3,125 observations from 148 cities for
1990. The NAARS population is smaller than the COMPUSTAT population; however, it is quite large with over 3,000 observations and provides a cost-effective
basis for constructing city-level audit markets for publicly listed companies.[11] Panel
A of Table 3 reports the analysis of these city markets. In order to determine if the
results are consistent for both larger and smaller city markets, separate analyses
are reported for cities with twenty or more audits, cities with ten to nineteen
audits, and cities with fewer than ten audits. Importantly, the city-specific market
leaders in the study are always one of the Big 8/Big 6 firms. That is, no city in the
study had a non-Big 8/6 auditor as market leader.
In 1988 (1990) the national top-ranked accounting firm is the city-specific market
leader in a minority of cities. Only 27 per cent (36 per cent) of the cities have the
national leader as city leader, with these cities having 30 per cent (32 per cent) of
the total audits in the sample. This pattern holds true for larger and smaller cities.
[10] All of the audit is attributed to the office which contracts for the audit and issues the audit report.
To the extent other offices are involved in the engagement, our analysis potentially mismeasures
city-level market shares. However, if this effect is randomized across the sample there would be no
systematic bias. Multiple offices in the same metropolitan area are coded as one city. Cities most
affected by this are larger locales such as New York City, Los Angeles and Chicago.
"
There is always the possibility that the NAARS sample is systematically different from the
COMPUSTAT sample of publicly listed companies. However, in actuality both sets of companies
are more accurately characterized as samples drawn from an underlying population of all publicly
listed companies. Strictly speaking, neither NAARS nor COMPUSTAT are true random samples,
but given that both the NAARS and COMPUSTAT samples are quite large relative to the estimated population of 10,000 or so publicly-listed companies, these is no concern with the possible
effects of sampling error though it is acknowledged that such error does (in theory) exist and could
potentially affect the results reported in the study.
TABLE 3
NAARS SAMPLE OF PUBLICLY-LISTED COMPANIES: ANALYSIS OF CITY-LEVEL
MARKET LEADERS
Panel A: City markets having national top-ranked firm as city leader

                     Year   Total cities   Total audits   Cities with national leader as city leader
                                                          # Cities (%)   # Audits (%)
20 or more audits    1988   37             3,144          10 (27)        1,001 (32)
                     1990   33             2,472          9 (27)         778 (31)
10-19 audits         1988   28             361            6 (21)         74 (21)
                     1990   24             304            8 (33)         86 (28)
[The remainder of Table 3 and the intervening pages (193–200), covering additional panels and the detailed city-level merger analyses, are garbled in the source scan and omitted.]
contrast, the merged firm EY had a 10 per cent or greater market share in nearly
all (thirty-four of thirty-seven) cities.
For Deloitte Touche, there is also a small coverage effect using the zero threshold. DHS had a pre-merger zero market share in one city and TR a zero market
share in four cities, whereas the pro-forma merged firm DT had no cities with zero
market share. The merger thus expanded by five cities the combined firm coverage.
A stronger coverage effect is evident using the 10 per cent threshold. DHS had 10
per cent coverage in twelve cities, and TR in nine cities. There were only five
common cities in which both firms had pre-merger coverage of 10 per cent. By
contrast, the merged firm DT had a 10 per cent or greater market share in thirty-two of thirty-seven cities.
There is also evidence that a leadership effect was achieved by both mergers.
Pre-merger, AY was top-ranked in two cities and EW was top-ranked in eight
cities, for a combined total of ten cities. A pro-forma analysis shows the merged
firm EY was top-ranked in fifteen cities, an increase of five cities over the pre-merger combined total. With respect to the DT merger, DHS was top-ranked in
two cities and TR was top-ranked in one city, for a combined total of three cities
pre-merger. The pro-forma merged firm DT was top-ranked in six cities, an increase of three cities over the pre-merger combined total.
A Comparison of the EY and DT Mergers
The Ernst & Young merger occurred between a larger firm (EW) and a comparatively smaller firm (AY). Based on aggregate national data in Table 1, EW was the
third-ranked firm in number of audits while Arthur Young was the smallest of the
Big 8 firms. In the thirty-seven cities analysed, EW was the larger of the two firms
in twenty-four cities, while AY was the larger firm in only nine cities, with a tie in
the remaining four cities. Even though EW was the dominant partner, the merged
firm expanded the number of cities in which market share exceeded 10 per cent.
Pre-merger, EW exceeded 10 per cent in twenty of thirty-seven cities, and AY in
only eleven of thirty-seven cities. However, on a pro-forma merged basis, EY
exceeded 10 per cent in thirty-four of thirty-seven cities.
By contrast, the Deloitte Touche merger occurred between two more equal-sized firms. Table 1 shows that DHS was the sixth-largest firm, and TR the
seventh-largest firm based on number of audits. The similarity of size is borne out
by the city-level analysis. In the thirty-seven cities analysed, DHS was the larger of
the two firms in sixteen cities, while TR was the larger firm in another sixteen
cities, with a tie in the remaining five cities. The merged firm DT greatly expanded
the number of cities in which market share exceeded 10 per cent. Pre-merger, DHS
exceeded 10 per cent in only twelve of thirty-seven cities, and TR in only nine of
thirty-seven cities. However, on a pro-forma merged basis, DT exceeded 10 per
cent in thirty-two of thirty-seven cities.
Both mergers expanded coverage in the number of cities with a market share of
10 per cent or more, though this effect was clearly stronger for the DT merger. The
reason for this is that the pre-merger dominance of EW limited the incremental
coverage effects in the EY merger. Recall that EW alone had a 10 per cent or
ABACUS
greater market share in twenty of thirty-seven cities. In the case of DHS and TR,
both firms had fewer cities with market shares of 10 per cent, so the effect of the
merger was more pronounced in creating market shares of 10 per cent or more
when combining clienteles in these cities.
Both mergers also had leadership effects in the sense that the merged firms had
pro-forma top rankings in more cities than the combined position pre-merger.
However, the merger of AY and EW had relatively more impact on leadership as
the merged firm achieved top-ranked status in fifteen cities compared to the separate pre-merger total of ten cities. By contrast, the DHS and TR merger achieved
top-ranked status in only six cities compared to the separate pre-merger total of
three cities. The EY merger had more impact on top rankings because EW already
had quite large market shares in many cities, even though it was not necessarily the
top-ranked firm in these cities.
In summary the EY merger had a stronger leadership effect compared to the DT
merger, whereas the DT merger had a stronger coverage effect relative to the EY
merger. Another consequence of the mergers was to create two firms that exhibited
coverage and top rankings comparable to the two pre-merger market leaders,
Arthur Andersen and Peat Marwick. A benchmark comparison in 1988 shows the
following. Arthur Andersen had no cities with 0 per cent market share, and had a
10 per cent or greater market share in thirty-one of thirty-seven cities. Peat Marwick
had no cities with 0 per cent market share, and had a 10 per cent or greater market
share in thirty-three of thirty-seven cities. Arthur Andersen was the top-ranked firm
in thirteen cities, and Peat Marwick was the top-ranked firm in ten cities. These
numbers are comparable to the pro-forma analysis of Ernst & Young and Deloitte
Touche.
The mergers thus resulted in Arthur Andersen, Deloitte Touche, Ernst & Young
and Peat Marwick having broadly similar U.S. practice profiles in terms of overall
size, city coverage and number of city-level top rankings. This left the remaining
two Big 8 firms, Coopers & Lybrand and Price Waterhouse, decidedly smaller by
comparison. In 1988, Price Waterhouse had 10 per cent or greater market share
in only nineteen of thirty-seven cities, and Coopers & Lybrand in only twenty
of thirty-seven cities. After recognizing the pro-forma effect of the EY and DT
mergers. Coopers & Lybrand and Price Waterhouse each had a top ranking in
only three cities. Given the size disparity created by the mergers, one could have
predicted the possibility of an eventual merger between Coopers & Lybrand and
Price Waterhouse, motivated in part to create sufficient city-level clienteles to
remain competitive with the other four Big 6 accounting firms.*
Merger Effects on City-Level Industry Shares
Coverage and leadership effects of the mergers on city-level industry shares are
analysed in a manner analogous to the preceding analysis of total city-level market
"
The proposed merger between Coopers & Lybrand and Price Waterhouse was annotinced on 18
September 1997 (see Wall Street Journal, 19 September 1997, pp. A3 and A4), and became effective
August 1998. The new firm is called PriceWaterhouseCoopers.
202
CITY V. NATIONAL MARKET LEADERSHIP IN AUDITING
shares. To do this the thirty-seven cities in our sample having twenty or more
audits in 1988 are used once again. These thirty-seven cities have a total of 725
city-level industries (i.e., the number of industries within each city, summed over
the thirty-seven cities). Each of the 725 city-level industries is treated as a unique
audit market. Recall that a coverage effect occurs if the merger expands coverage
in the number of industry markets, and a leadership effect occurs if the merger
increases the number of industry markets in which the firm is the market leader.
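The pro-forma arithmetic behind these two effects can be sketched in a few lines of Python. The city names and shares below are hypothetical illustrations, not the paper's data:

```python
# Pro-forma merger analysis sketch: a market share is a firm's fraction of
# audits in a city; the merged firm's share in each city is simply the sum
# of the two merging firms' pre-merger shares.

def coverage(shares, threshold=0.10):
    """Number of cities where a firm's share meets the threshold."""
    return sum(s >= threshold for s in shares.values())

# Hypothetical pre-merger shares for merging firms A and B in four cities
a = {"Atlanta": 0.12, "Boston": 0.06, "Chicago": 0.04, "Dallas": 0.15}
b = {"Atlanta": 0.05, "Boston": 0.07, "Chicago": 0.08, "Dallas": 0.02}
merged = {city: a[city] + b[city] for city in a}

# A coverage effect: separately the firms clear 10 per cent in 2 and 0
# cities, but the combined clientele clears it in all 4.
print(coverage(a), coverage(b), coverage(merged))  # 2 0 4
```

A leadership effect would be counted the same way, replacing the threshold test with a comparison of the merged share against every other firm's share in the city.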
Coverage effects are quite pronounced for both mergers. The pre-merger industry coverage in the 725 city-level industries was 24 per cent for AY and 32 per cent
for EW, well below Arthur Andersen's lead with 42 per cent. By contrast, the
pro-forma merged firm EY achieved a market leading coverage of 47 per cent. For
DHS and TR, the pre-merger coverage was 26 per cent and 25 per cent, respectively. The pro-forma merged firm DT achieved coverage of 42 per cent, a tie with
Arthur Andersen for the second best coverage after EY's leading coverage of
47 per cent.
Leadership effects are also evident in both mergers. AY and EW were pre-merger industry market leaders in ninety-seven and 145 city-level industry markets,
respectively. The pro-forma merged firm EY was market leader for 241 city-level
industry markets, which is actually one industry less than the simple summation of
the pre-merger totals (due to ties in some industries). In this respect, then, there
was not a leadership effect (i.e., creating new top rankings from the combined
clienteles). However, the merger did give EY the most top rankings, followed by
Arthur Andersen in second place with 209.
By contrast, the DT merger more clearly demonstrated leadership effects. DHS
and TR were pre-merger industry market leaders for ninety-eight and twenty-nine
city-level industry markets, respectively, for a total of 127 combined top rankings.
The pro-forma merged firm DT was market leader for 197 city-level industry markets, seventy more than the simple summation of the pre-merger totals. This gave DT
the third largest number of top rankings in cities after EY and Arthur Andersen.
As with total city-level market shares, the industry effect of the 1989 mergers was
to create four larger sized firms, leaving Coopers & Lybrand and Price Waterhouse
as decidedly smaller by comparison. Coopers & Lybrand and Price Waterhouse
had the smallest coverage percentages, 24 per cent and 30 per cent, respectively,
and the fewest number of top industry rankings, ninety-seven and 112, respectively. The next largest firm was Peat Marwick with a coverage of 32 per cent and
172 top industry rankings.
CONCLUSIONS
This study examined how national market share data obscure important city-level
variations within the dominant Big 6 group of auditors. The city-level analysis
shows there is considerable variation across cities in audit market shares, both
among the dominant Big 6 group of accounting firms and within the same Big 6
firm across cities. Reliance upon national market share data obscures this variation
and therefore cannot be used to accurately infer city-level market leaders for a
majority of cities. Overall, approximately 70 per cent of companies in the sample
are located in cities in which national leaders are not city-level leaders. These
results are robust for both larger and smaller cities. The study then demonstrated
how a city-level unit of analysis can enhance our understanding of audit market
phenomena by re-examining the 1989 mergers creating Ernst & Young and Deloitte
Touche. A city-by-city analysis provides insight into the economic rationale for and
consequences of the mergers that is not observable using national data.
Marketing research finds that market leadership increases brand-name recognition and the perception that the market leader is of higher quality, which in turn
leads to higher prices and greater profitability. In auditing research, the Big 6 have
generally been grouped together as a homogeneous set of firms in terms of brand-name reputation. Based on aggregate national data this grouping makes sense and
there is evidence of systematically higher prices charged by Big 6 firms relative to
the non-Big 6 (Craswell et al., 1995). However, given the documented variation
in city-level market leadership, it seems likely that reputations of individual Big 6
firms also vary from city to city. Market share variations among (and within) the
Big 6 firms are even greater with respect to industry specializations. One obvious
implication of these findings would be that reputations of individual Big 6 firms are
city-specific and linked to city-specific clienteles rather than standard and uniform
across cities as implied by analyses based on national-level data. This question, and the extent to which a firm can 'stretch' its expertise across multiple offices, are important issues in understanding accounting firm reputations and expertise.
The kind of city-to-city variation in the market shares of individual Big 6 accounting firms reported in this study has not been explicitly recognized in prior research
based on aggregate national data. Our findings strongly suggest that important
auditing phenomena such as auditor-client alignments, market structure, pricing,
audit reports, and independence issues should be investigated in these city-specific
markets. Such research is necessary to capture the salient features of audit contracting and Big 6 organizational structures, and to move beyond our current understanding of audit markets which is largely based on the use of national-level data.
REFERENCES
Coase, R., 'The Nature of the Firm', Economica, Vol. 4, 1937.
Craswell, A., J. Francis and S. Taylor, 'Auditor Brand Name Reputations and Industry Specializations',
Journal of Accounting and Economics, December 1995.
Danos, P., and J. Eichenseher, 'Long-Term Trends in Seller Concentration in the U.S. Audit Market',
The Accounting Review, October 1986.
DeFond, M., J. Francis and T. J. Wong, Auditor Industry Specialization and Market Segmentation:
Evidence From Hong Kong, working paper, University of Southern California, University of
Missouri and Hong Kong University of Science and Technology, 1998.
Demsetz, H., 'Industry Structure, Market Rivalry, and Public Policy', Journal of Law and Economics,
April 1973.
Deis, D., and G. Giroux, 'Determinants of Audit Quality in the Public Sector', The Accounting Review,
July 1992.
Doogar, R., and R. Easley, 'Concentration Without Differentiation: A New Look at the Determinants
of Audit Market Concentration', Journal of Accounting and Economics, forthcoming.
Eichenseher, J., and P. Danos, 'The Analysis of Industry Specific Concentration: Toward an Explanatory Model', The Accounting Review, July 1981.
Haskins, M., and D. Williams, 'A Contingent Model of Intra-Big Eight Auditor Changes', Auditing: A
Journal of Practice and Theory, Fall 1990.
Jeter, D., and P. Shaw, 'Solicitation and Auditor Reporting Decisions', The Accounting Review, April
1995.
Kwon, S., 'The Impact of Competition Within the Client's Industry on the Auditor Selection Decision',
Auditing: A Journal of Practice and Theory, Spring 1996.
Minyard, D., and R. Tabor, 'The Effect of Big Eight Mergers on Auditor Concentration', Accounting
Horizons, December 1991.
O'Keefe, T., R. King and K. Gaver, 'Audit Fees, Industry Specialization, and Compliance With GAAS
Reporting Standards', Auditing: A Journal of Practice and Theory, Fall 1994.
Penno, M., and B. Walther, 'The Concentration of Local Markets: A Study of Accounting, Advertising
and Law', Accounting Horizons, June 1996.
Post, A., Anatomy of a Merger: The Causes and Effects of Mergers and Acquisitions, Prentice-Hall,
1994.
Schroeter, J., 'Estimating the Degree of Market Power in the Beef Packing Industry', Review of Economics and Statistics, February 1988.
Smallwood, D., and J. Conlisk, 'Product Quality in Markets Where Consumers are Imperfectly Informed',
Quarterly Journal of Economics, February 1979.
Stevens, M., The Big Six: The Selling Out of America's Top Accounting Firms, Simon & Schuster, 1991.
Szymanski, D., S. Bharadwaj and P. R. Varadarajan, 'An Analysis of the Market-Share Profitability
Relationship', Journal of Marketing, July 1993.
Tonge, S., and C. Wootton, 'Auditor Concentration and Competition Among the Large Public Accounting Firms: Post-Merger Status and Future Implications', Journal of Accounting and Public
Policy, Summer 1991.
Wilson, T., and R. Grimlund, 'An Examination of the Importance of an Auditor's Reputation', Auditing:
A Journal of Practice and Theory, Spring 1990.
Wootton, C., S. Tonge and C. Wolk, 'Pre and Post Big 8 Mergers: Comparisons of Auditor Concentration', Accounting Horizons, September 1994.
Young, S., 'The Economic Theory of Regulation: Evidence From the Uniform CPA Examination', The
Accounting Review, April 1988.
Zind, R., and D. Zeghal, 'Some Characteristics of the Canadian Audit Industry', Contemporary Accounting Research, Fall 1989.
APPENDIX

INDUSTRY GROUPS

SIC codes   Industries
1-12, 14    Agriculture and mining
13          Oil and gas
15-17       Construction
20, 21      Food and tobacco
22, 23      Textiles and apparel
24-26       Wood products
27          Publishing
28, 29      Chemicals
30-32       Rubber and leather, glass and concrete
33, 34      Steel and metal
35          Heavy equipment
36          Electronics
37          Transportation vehicles
38, 39      Precision instruments
40-47       Transportation services
48          Telecommunications
49          Utilities
50, 51      Wholesalers
52-59       Retailers
60          Banks
61          Other financial institutions
62          Security dealers
63          Insurance
64, 65      Insurance and real estate agents
67          REITs and other investing institutions
70-79       Services
80+         Health care and professional services
Annu. Rev. Public Health 2002. 23:151–69
DOI: 10.1146/annurev.publheath.23.100901.140546
Copyright © 2002 by Annual Reviews. All rights reserved
THE IMPORTANCE OF THE NORMALITY
ASSUMPTION IN LARGE PUBLIC
HEALTH DATA SETS
Thomas Lumley, Paula Diehr, Scott Emerson, and Lu Chen
Department of Biostatistics, University of Washington, Box 357232, Seattle,
Washington 98195; e-mail: tlumley@u.washington.edu
Key Words parametric, nonparametric, Wilcoxon test, rank test, heteroscedasticity
■ Abstract It is widely but incorrectly believed that the t-test and linear regression
are valid only for Normally distributed outcomes. The t-test and linear regression
compare the mean of an outcome variable for different subjects. While these are valid
even in very small samples if the outcome variable is Normally distributed, their major
usefulness comes from the fact that in large samples they are valid for any distribution.
We demonstrate this validity by simulation in extremely non-Normal data. We discuss
situations in which other methods such as the Wilcoxon rank sum test and ordinal
logistic regression (proportional odds model) have been recommended, and conclude
that the t-test and linear regression often provide a convenient and practical alternative.
The major limitation on the t-test and linear regression for inference about associations
is not a distributional one, but whether detecting and estimating a difference in the
mean of the outcome answers the scientific question at hand.
INTRODUCTION
It is widely but incorrectly believed that the t-test and linear regression are valid
only for Normally distributed outcomes. This belief leads to the use of rank tests for
which confidence intervals are very hard to obtain and interpret and to cumbersome
data-dependent procedures where different transformations are examined until a
distributional test fails to reject Normality. In this paper we re-emphasize the
uncontroversial statistical facts that the validity of the t-test and linear regression
in sufficiently large samples depends only on assumptions about the variance of the
response and that violations of those assumptions can be handled easily for the t-test
(and with slightly more difficulty for linear regression). In addition to reviewing
the literature on the assumptions of the t-test, we demonstrate that the necessary
sample size is relatively modest by the standards of today’s public health research.
This is true even in one of the most extreme kinds of data we have encountered,
annualized medical costs. We should note that our discussion is entirely restricted
to inference about associations between variables. When linear regression is used to
predict outcomes for individuals, knowing the distribution of the outcome variable
is critical to computing valid prediction intervals.
The reason for the widespread belief in a Normality assumption is easy to see.
If outcomes are indeed Normally distributed then several different mathematical
criteria identify the t-test and ordinary least squares regression as optimal analyses. This relatively unusual convergence of criteria makes the Normal theory an
excellent example in mathematical statistics, and leads to its popularity in both
theoretical and applied textbooks. The fact that the Normality assumption is sufficient but not necessary for the validity of the t-test and least squares regression
is often ignored. This is relatively unimportant in theoretical texts, but seriously
misleading in applied books.
In small samples most statistical methods do require distributional assumptions,
and the case for distribution-free rank-based tests is relatively strong. However, in
the large data sets typical in public health research, most statistical methods rely
on the Central Limit Theorem, which states that the average of a large number of
independent random variables is approximately Normally distributed around the
true population mean. It is this Normal distribution of an average that underlies
the validity of the t-test and linear regression, but also of logistic regression and
of most software for the Wilcoxon and other rank tests.
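A minimal simulation (our own illustration, not from the article) shows the theorem at work: individual observations are drawn from a strongly skewed distribution, yet their sample means cluster symmetrically around the population mean.

```python
import numpy as np

# Draw 5000 independent samples of size 100 from a skewed Exponential(1)
# distribution and look at the distribution of the 5000 sample means.
rng = np.random.default_rng(0)
reps, n = 5000, 100
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# CLT: the means are centred on the true mean (1.0) with sd about
# 1/sqrt(n) = 0.1, and their skewness is far smaller than the
# exponential distribution's skewness of 2.
skew = ((means - means.mean()) ** 3).mean() / means.std() ** 3
print(means.mean(), means.std(), skew)
```

Repeating this with a larger n shrinks both the spread and the residual skewness of the means, which is why sample size, not the shape of the outcome distribution, is what matters in large data sets.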
In situations where estimation and comparison of means with the t-test and
linear regression is difficult because of extreme data distributions, it is important
to consider whether the mean is the primary target of estimation or whether some
other summary measure would be just as appropriate. Other tests and estimation
methods may give narrower confidence intervals and more powerful tests when
data are very non-Normal but at the expense of using some other summary measure
than the mean.
In this review we begin by giving the statistical background for the t-test and
linear regression and then review what the research literature and textbooks say
about these methods. We then present simulations based on sampling from a large
data set of medical cost data. These simulations show that linear regression and the
t-test can perform well in moderately large samples even from very non-Normal
data. Finally, we discuss some alternatives to the t-test and least squares regression
and present criteria for deciding which summary measure to estimate and what
statistical technique to use.
DEFINITIONS AND THEORETICAL ISSUES
Least-Squares Techniques
We will discuss first the two-sample t-test, and then linear regression. While the
t-test can be seen as merely a special case of linear regression, it is useful to
consider it separately. Some more details of the calculations and a review of the
Central Limit Theorem can be found in Appendix 1.
The t-Test
Two different versions of the two-sample t-test are usually taught and are available
in most statistical packages. The differences are that one assumes the two groups
have the same variance, whereas the other does not. The t-statistic, which does
not assume equal variances, is the statistic in Equation 1. In Appendix 1 we show
that, because of the Central Limit Theorem, this is normally distributed with unit
variance when the sample size is large, no matter what distribution Y has. Thus,
this version of the t-test will always be appropriate for large enough samples. Its
distribution in small samples is not exactly a t distribution even if the outcomes
are Normal. Approximate degrees of freedom for which the statistic has nearly a
t distribution in small samples are computed by many statistical packages.
    t = (Ȳ₁ − Ȳ₂) / √(s₁²/n₁ + s₂²/n₂)                                (1)
We next mention the version of the t-statistic that assumes the variances in the two
groups are equal. This, the original version of the test, is often used in introductory
statistics because when the data do have a Normal distribution, the statistic in
Equation 2 has exactly a t distribution with a known number of degrees of freedom.
One would rarely prefer this statistic in large samples, since Equation 1 is more
general and most statistical programs compute both versions. However, Equation
2 is useful in illustrating the problem of heteroscedasticity.
    t = (Ȳ₁ − Ȳ₂) / [ √( ((n₁ − 1)s₁² + (n₂ − 1)s₂²) / (n₁ + n₂ − 2) ) · √(1/n₁ + 1/n₂) ]    (2)
Equation 2 differs from Equation 1 in combining the two group variances to estimate a pooled standard deviation. It is identical to that in Equation 1 if either
n₁ = n₂ or s₁² = s₂². The two forms will be similar if n₁ and n₂ or σ₁² and σ₂² are
similar, as is often the case. However, it is possible for them to differ in extreme
situations. Suppose n₁ is much larger than n₂. In that case, the denominator of
the t-statistic in Equation 1 can be seen to be primarily a function of s₂², while
the denominator of the t-statistic in Equation 2 is primarily a function of s₁².
If the variances in the two groups are different, this can result in the two t-statistics
having different denominators. For example, if n₁ is ten times as big as n₂, and the
two variances also differ by a factor of 10, then Equation 1 will still be appropriate
but Equation 2 will be too small or too large by a factor of about 2, depending on
which group has the larger variance. In such an extreme case, it would be possible
to make an incorrect inference based on Equation 2. That is, the Central Limit
Theorem guarantees that the t-statistic in Equation 2 will be normally distributed,
but it may not have variance equal to 1. This is not a problem in practice because
we can always use Equation 1, but severe heteroscedasticity will cause problems
for linear regression, as is discussed below.
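The extreme case just described is easy to reproduce. In this sketch (our own illustration, coding the two formulas above directly) the larger group has one tenth the variance of the smaller group, and the two statistics differ by roughly a factor of two:

```python
import numpy as np

def t_unequal(y1, y2):
    # Equation 1: unequal-variance (Welch) t-statistic
    n1, n2 = len(y1), len(y2)
    return (y1.mean() - y2.mean()) / np.sqrt(y1.var(ddof=1) / n1 + y2.var(ddof=1) / n2)

def t_pooled(y1, y2):
    # Equation 2: pooled-variance t-statistic
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)
    return (y1.mean() - y2.mean()) / (np.sqrt(sp2) * np.sqrt(1 / n1 + 1 / n2))

rng = np.random.default_rng(1)
y1 = rng.normal(0.0, 1.0, size=1000)           # n1 ten times n2, variance 1
y2 = rng.normal(0.0, np.sqrt(10.0), size=100)  # variance ten times larger

# Same numerator, different denominators: here Equation 2 overstates |t|
# by about a factor of 2, as described in the text.
print(t_unequal(y1, y2), t_pooled(y1, y2))
```

Swapping which group has the larger variance flips the direction of the error, so Equation 2 can be too large or too small depending on the design.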
Linear Regression
As with the t-test, least-squares linear regression is usually introduced by assuming
that Y is Normally distributed, conditional on X. This is not quite the same as
saying that Y must be Normal; for example, Y for men and women could each have
a different Normal distribution that might appear bimodal when men and women
are considered together. That they were Normally distributed when controlling
for sex would satisfy the usual Normality assumption. Normality is not required
to fit a linear regression; but Normality of the coefficient estimates β̂ is needed
to compute confidence intervals and perform tests. As β̂ is a weighted sum of Y
(see Appendix 1), the Central Limit Theorem guarantees that it will be normally
distributed if the sample size is large enough, and so tests and confidence intervals
can be based on the associated t-statistic.
A more important assumption is that the variance of Y is constant. As with the
t-test, differences in the variance of Y for different values of X (heteroscedasticity)
result in coefficient estimates β̂ that still have a Normal distribution; as with Equation 2 above, the variance estimates may be incorrect. Specifically, if the predictor
X has a skewed distribution and Y has different variance for large and small values
of X, the variance of β̂ can be estimated incorrectly. This can be related to the
conditions for t-test (2) to be incorrect by writing the t-test as a linear regression
with a single binary predictor variable. A binary predictor X is skewed when the
proportion p with X = 0 and the proportion q = 1 − p with X = 1 are different
[the skewness is equal to (q − p)/√(pq)]. Thus the condition that X is skewed and Y is
heteroscedastic in this linear regression is the same as the condition that n and σ²
both differ between groups for the t-test. Modifications analogous to the t-statistic in Equation 1 to
provide reliable inference in the presence of substantial heteroscedasticity exist but
are not widely implemented in statistical software. In the case of the t-test, we saw
that heteroscedasticity must be extreme to cause large biases; in our simulations
below we examine this question further for linear regression.
LITERATURE REVIEW
An unwritten assumption of much of the literature on the t-test is that all two-sample
tests are effectively testing the same null hypothesis, so that it is meaningful to
compare the Type I and Type II error rates of different tests. This assumption
is frequently untrue, and testing for a difference in means between two samples
may have different implications than testing for a difference in medians or in
the proportion above a threshold. We defer until later a discussion of these other
important criteria for selecting an estimator or test. Most of the literature on the
assumptions of the t-test is concerned with the behavior of the t-test in relatively
small samples, where it is not clear if the Central Limit Theorem applies.
For linear regression, the statistical literature largely recognizes that heteroscedasticity may affect the validity of the method and non-Normality does not. The
literature has thus largely been concerned with how to model heteroscedasticity
and with methods that may be more powerful than linear regression for non-Normal
data. These issues are outside the scope of our review.
A number of authors have examined the level and power of the t-test in fairly
small samples, without comparisons to alternative tests. Barrett & Goldsmith (4)
examined the coverage of the t-test in three small data sets, and found good coverage
for sample sizes of 40 or more. Ratcliffe (22) looked at the effect on the t distribution
of non-Normality, and provided an estimate of how large n must be for the t-test to
be appropriate. He examined sample sizes of up to 80 and concluded that “extreme
non-Normality can as much as double the value of t at the 2.5% (one tail) probability
level for small samples, but increasing the sample sizes to 80, 50, 30, and 15 will
for practical purposes remove the effect of extreme skewness, moderate skewness,
extreme flatness, and moderate flatness, respectively.” We note that the one-tailed
tests he studied are more sensitive to skewness than two-tailed tests, where errors
in the two tails tend to compensate. Sullivan & d’Agostino (32) found that t-tests
produced appropriate significance levels even in the presence of small samples (50
or less) and distributions in which as many as 50% of the subjects attained scores
of zero.
Sawilowsky & Blair (23) examined the robustness of the t-test to departures
from Normality using Monte Carlo methods in 8 data sets with sample sizes up to
120. They found the t-test was robust to Type II error. Sawilowsky & Hillman (24)
showed that power calculations based on the t-test were appropriate, even when
the data were decidedly non-Normal. They examined sample sizes up to 80.
The bootstrap (12) provides another method of computing confidence intervals
and significance levels using the t-statistic. The bootstrap is a general-purpose
method for estimating the sampling distribution of any statistic computed from independent observations. The sampling distribution is, by definition, the distribution
of the statistic across repeated samples from the same population. The bootstrap
approximates this by assuming that the observed sample is representative of the
population and by taking repeated samples (with replacement) from the observed
sample. The bootstrap approach usually requires some programming even in statistical packages with built-in bootstrap facilities [e.g., Stata (29) and S-PLUS
(17)]. There is a wide theoretical and applied literature discussing and extending
the bootstrap, much of which is summarized in books by Efron & Tibshirani (12)
and Davison & Hinkley (9).
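A percentile bootstrap for the mean of skewed, cost-like data takes only a few lines; the data below are synthetic and illustrative, not those of the studies cited:

```python
import numpy as np

# Percentile bootstrap CI for a mean: resample the observed data with
# replacement many times and read off quantiles of the resampled means.
rng = np.random.default_rng(3)
costs = rng.lognormal(mean=6.0, sigma=1.5, size=300)   # skewed "cost" data

boot_means = np.array([
    rng.choice(costs, size=costs.size, replace=True).mean()
    for _ in range(2000)
])
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"sample mean {costs.mean():.0f}, 95% bootstrap CI ({lo:.0f}, {hi:.0f})")
```

With samples of this size the bootstrap interval is typically close to the t-based interval, consistent with the comparisons reported above.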
Bootstrapping for comparing means of non-Normal data has been evaluated
in the context of cost and cost-effectiveness studies. Barber & Thompson (3)
recommended a bootstrap approach for testing for differences in mean costs. They
presented two examples, with sample sizes of 184 and 32 patients, respectively.
In both cases, the p-values and the confidence intervals were very similar using
the t-test and using the bootstrap procedure. Rascati et al. (21) concluded that the
bootstrap was more appropriate, but they only examined the distribution of the
cost data, not the more relevant sampling distribution of the mean.
In a practical setting, the t-test should be discarded only if a replacement can
perform better, so comparisons with other tests are particularly important. Cohen &
Arthur (8) looked at samples of 25 per group and found that t-tests on raw, log, and
square transformed data; the Wilcoxon test; and a randomization test all exhibited
satisfactory levels of alpha error, with the randomization test and the t-test having
the greatest power. Stonehouse & Forrester (30) found that the unequal-variance
form of the t-test performed well in samples drawn from non-Normal distributions
but with different variances and sample sizes. The Wilcoxon test did not perform
as well. Zimmerman (34) compared the t-test to the Wilcoxon test when data were
non-Normal and heteroscedastic and found that the t-test performed better than the
Wilcoxon. Zimmerman & Zumbo (35) found that rank methods are as influenced by
unequal variances as are parametric tests, and recommended the t-test. Skovlund &
Fenstad (27) also found that the t-test was superior to the Wilcoxon when variances
were different.
Theoretical results on the properties of the t-test are mostly over 30 years old.
These papers mostly examine how the skewness and kurtosis of the outcome
distribution affects the t-statistic in fairly small samples. In principle, they could
be used to create a modified t-statistic that incorporated estimates of skewness and
kurtosis. At least one such test (7) has achieved some limited applied use. The
original references appear to be to Gayen (14) and Geary (15), who approximated
the distribution of the t-statistic in non-Normal distributions. They were followed
by other authors in producing better approximations for very small samples or
extreme non-Normality.
In contrast to the t-test, there has been little empirical research into the behavior
of linear regression for non-Normal data. Such research typically focuses on the
effects of extreme outliers, under the assumption that such outliers are caused
by errors or at least may be excluded from the analysis. When residuals are not
Normally distributed, these robust regression methods may be useful for finding
the line that best fits the majority of the data, ignoring some points that do not
fit well. Robust regression methods do not model the mean of Y but some other
summary of Y that varies from method to method. There is little literature on robust
regression at an elementary level, but chapters by Berk (5) and Goodall (16) are
at least addressed to the practising statistician rather than the theoretician.
Textbooks of biostatistics frequently describe linear regression solely in the
context of Normally distributed residuals [e.g., Altman (2), Fisher & van Belle
(13), Kleinbaum et al. (18)] where it is the optimal method for finding the best-fitting line; however, the least-squares method was invented as a nonparametric
approach. One of the inventors, Legendre [quoted by Smith (28)], wrote,
Of all the principles which can be proposed for that purpose, I think there is
none more general, more exact, and more easy of application, than that of which
we made use in the preceding researches, and which consists of rendering the
sum of squares of the errors a minimum.
Discussions of linear regression that do not suppose Normality are relatively rare.
One from an impeccable statistical authority is that of Stuart et al. (31). More
commonly, a Normality assumption is presented but is described as less important
than other assumptions of the model. For example, Kleinbaum et al. (18, p. 117)
wrote,
[Normality] is not necessary for the least-squares fitting of the regression
model but it is required in general for inference making . . . only extreme
departures of the distribution of Y from normality yield spurious results.
This is consistent with the fact that the Central Limit Theorem is more sensitive to
extreme distributions in small samples, as most textbook analyses are of relatively
small sets of data.
SIMULATIONS
The simulations in much of the statistical literature we reviewed refer to sample
sizes far smaller than those commonly encountered in public health research. In
an effort to fill part of this gap, this section describes some simulations that we
performed with larger samples. We used data from the evaluation of Washington
State’s Basic Health Plan, which provided subsidized health insurance for low-income residents, starting in 1989 (10, 19). The 6918 subjects in the study were
enrolled in four health plans, 26% in a health maintenance organization (HMO) and
74% in one of three independent practice associations (IPA). Subjects were aged
0 to 65 (mean 23 years) and were followed for an average of 22 months (range 1 to
44 months). Length of follow-up depended on when the person joined the program
relative to the end of the evaluation period, and is probably not related to the
person’s health. During the study period 79% used some services. As examples we
use the variables “cost of outpatient care,” age, sex, and self-rated general health.
The last variable is abbreviated EVGFP, for “excellent/very good/good/fair/poor.”
Example of Central Limit Theorem
The Central Limit Theorem depends on the sample size being “large enough,” but
provides little guidance on how large a sample might be necessary. We explored
this question using the cost variable in the Washington Basic Health Plan data.
Annualized outpatient cost has a very long right tail, as shown in Figure 1. We
truncated the histogram at $3000 so that the distribution for lower values could
be seen, but use the full distribution in the following analysis. The actual costs
ranged from $0 to $22,452, with a mean of $389. The standard deviation is $895,
standardized skewness is 8.8, and standardized kurtosis is 131.
Figure 2 shows the sampling distribution of 1000 means of random samples of
size 65, 129, 324, and 487 from this very non-Normal distribution (approximately
1%, 2%, 5%, and 7.5% of the population). The graph shows a histogram and a
smooth estimate of the distribution for each sample size. It is clear that the means
are close to Normally distributed even with these very extreme data and with
sample sizes as low as 65.
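The resampling experiment behind Figure 2 can be sketched in a few lines. Since the Basic Health Plan cost data are not reproduced here, a lognormal "population" stands in for the long-tailed cost variable; the parameters below are invented for illustration:

```python
import numpy as np
from scipy import stats

# A long-right-tailed stand-in population for the cost variable.
rng = np.random.default_rng(1)
population = rng.lognormal(mean=5.0, sigma=1.0, size=20_000)
pop_skew = stats.skew(population)

# Sampling distribution of the mean for several sample sizes,
# 1000 replicates each, sampling without replacement as in the paper.
mean_skew = {}
for n in (65, 129, 324, 487):
    means = [rng.choice(population, size=n, replace=False).mean()
             for _ in range(1000)]
    mean_skew[n] = stats.skew(means)
    print(f"n={n:3d}  skewness of 1000 sample means: {mean_skew[n]:.2f}")

print(f"skewness of the population itself: {pop_skew:.2f}")
```

The skewness of the sample means shrinks rapidly as n grows, even though the population itself is extremely skewed, which is the Central Limit Theorem at work.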
Figure 1 Distribution of annualized medical costs in the Washington Basic Health Plan.
Example for Linear Regression
Medical costs usually have the very non-Normal distribution we see here, but transformations are undesirable as our interest is in total (or mean) dollar costs rather
than, say, log dollars (11). We considered the 6918 subjects to be the population of
interest and drew samples of various sizes to determine whether the test statistics
of interest had the distribution that was expected.
In addition, there is substantial heteroscedasticity and a somewhat linear relation
between the mean and variance. In Figure 3 we divided subjects into groups by
age and sex and calculated the mean and standard deviation of cost for each group.
It is clear that the standard deviation increases strongly as the mean increases. The
data are as far from being Normal and homoscedastic as can be found in any real
examples.
We used these data to determine how large a sample would be needed for the
Central Limit Theorem to provide reliable results. For example, as illustrated on the
first line of Table 1, we drew 1000 1% samples, of average size 65, from the population. For each sample we calculated the regression of cost on age, sex, self-rated
health, and HMO (IPA = 0) versus Fee for Service (IPA = 1). For each parameter
in the regression model we calculated a 95% confidence interval and then checked
to see whether the confidence interval contained the true value.
Figure 2 Distribution of means of samples of annualized costs.
The percent of
times that the confidence interval included the value computed from the entire
population of 6918 is an estimate of the true amount of confidence (coverage) and
would be 95% if the data had been Normal to start with. For samples of size 65
and 129, some of the confidence interval coverages are below 90%. That means
that the true alpha level would be 10% or more, when the investigator believed it to
be 5%, yielding too many significant regression coefficients. Note that for sample
sizes of about 500 or more, the coverage for all regression coefficients is quite
close to 95%. Thus, even with these very extreme data, least-squares regression
performed well with 500 or more observations.
These results suggest that cost data can be analyzed using least-squares approaches with samples of 500 or more. Fortunately, such large samples are usually
the case in cost studies. With smaller samples, results for variables that are highly
significant (p < .001, for example) are probably reliable. Regression coefficients
with p-values between .001 and .10, say, might require additional analysis if they
are important.
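The coverage check described above can be sketched on synthetic data. The Basic Health Plan records are not public, so the age-cost relationship and multiplicative lognormal errors below are invented stand-ins for skewed, heteroscedastic cost data; the "population" least-squares slope plays the role of the true value, as in the paper:

```python
import numpy as np
from scipy import stats

# Synthetic population: mean cost rises with age, errors are long-tailed
# and multiplicative, so the SD grows with the mean (heteroscedasticity).
rng = np.random.default_rng(2)
N = 7000
age = rng.uniform(0, 65, N)
mean_cost = 100.0 + 5.0 * age
cost = mean_cost * rng.lognormal(-0.5, 1.0, N)   # lognormal(-0.5, 1) has mean 1

pop_slope = stats.linregress(age, cost).slope    # "population" value

def coverage(n, reps=1000):
    hits = 0
    for _ in range(reps):
        idx = rng.choice(N, size=n, replace=False)
        fit = stats.linregress(age[idx], cost[idx])
        half = 1.96 * fit.stderr                 # normal-theory 95% CI for the slope
        hits += (fit.slope - half <= pop_slope <= fit.slope + half)
    return hits / reps

cov = {n: coverage(n) for n in (65, 500)}
for n, c in cov.items():
    print(f"n={n:3d}  coverage of nominal 95% CI: {c:.3f}")
```

The printed coverages estimate how close the nominal 95% interval comes to its advertised level at each sample size; with data this skewed and heteroscedastic, some shortfall from 95% is expected, in the spirit of Table 1.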
For data without such long tails much smaller sample sizes suffice, as the
examples in the literature review indicate. For example, at one time a popular
method of generating Normally distributed data on a computer was to use the sum
of a sample of 12 uniformly distributed random numbers. The resulting distribution
was not just close enough to Normal for statistical purposes, it was effectively
indistinguishable from a Normal distribution. Similarly, the familiar rule that 2 × 2
tables should have expected counts of at least 5 for a χ2 test comes from applying
the Central Limit Theorem to binary variables.
Figure 3 The relationship between mean and standard deviation of annualized costs, in age-sex subgroups.
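The 12-uniform generator mentioned above is easy to verify directly:

```python
import numpy as np
from scipy import stats

# A sum of 12 Uniform(0,1) draws has mean 6 and variance 12 * (1/12) = 1,
# so subtracting 6 gives an approximately standard Normal variate.
rng = np.random.default_rng(3)
z = rng.uniform(0.0, 1.0, size=(100_000, 12)).sum(axis=1) - 6.0

ks = stats.kstest(z, "norm").statistic
print(f"mean={z.mean():.3f}  sd={z.std():.3f}  KS distance from N(0,1)={ks:.4f}")
```

The Kolmogorov-Smirnov distance from the standard Normal comes out tiny, bearing out the claim that 12 terms already make the sum effectively indistinguishable from Normal for practical purposes.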
ALTERNATIVES TO LEAST-SQUARES APPROACHES
The literature summarized above and our simulations illustrate that linear regression and the t-test can perform well with data that are far from Normal, at least
in the large samples usual in public health research. In this section we examine
alternatives to linear regression. In some disciplines these methods are needed to
handle small samples of non-Normal data, but in reviewing their appropriateness
for public health research we focus on other criteria. These methods usually come
with their own sets of assumptions and they are “alternatives” to least-squares
methods only when no specific summary statistic of interest can be identified, as
we discuss in the next section.
We examine the Wilcoxon rank-sum test as an alternative to the t-test and the
logistic and proportional odds models as alternatives to linear regression.
TABLE 1 Coverage(a) results for the mean and coefficients from multivariable
regression (based on 1000 replicates)
% of population   N in sample   Mean   b-age   b-sex   b-EVGFP   b-IPA
1                   65          88.5   89.7    96.4    88.8      93.1
2                  129          90.5   89.9    96.3    88.4      91.5
5                  324          92.4   89.9    97.5    91.3      93.8
7.5                487(b)       94.0   90.3    97.3    92.3      94.0
10                 649(c)       94.9   91.2    97.7    92.5      94.7
15                 973          95.8   92.9    98.3    94.3      96.0
20                1297          96.2   92.6    98.4    95.0      97.1
(a) Coverage is the % of time that the (nominal) 95% confidence interval included the true value, out of 1000 replicates.
(b) Sample sizes vary (468 to 500) because some of the data are missing.
(c) Sample sizes range from 629 to 669 because of missing data.
Wilcoxon and Other Nonparametric Tests
The Wilcoxon two-sample test is said to be nonparametric because no particular
distribution is assumed for the data. The test simply ranks all of the data and calculates the sum of the ranks for one of the groups. It is possible to test how likely that
sum would be under the null hypothesis that the two distributions were identical.
The Wilcoxon test can thus be performed without distributional assumptions even
in very small samples. It is sometimes described as a test for the median, but this
is not correct unless the distribution in the two groups is known a priori to have
the same shape. It is possible to construct distributions with arbitrarily different
medians for which the Wilcoxon test will not detect a difference.
The Wilcoxon test is widely known to be more powerful than the t-test when
the distribution of data in the two groups has long tails and has the same shape in
each group but has been shifted in location. Conversely, it is less powerful than
the t-test when the groups differ in the number and magnitude of extreme outlying
observations, as recognized in EPA guidelines for testing for environmental contamination in soil (33). Although its power relative to other tests depends on the
details of the null and alternative hypotheses, the Wilcoxon test always has the disadvantage that it does not test for equality in any easily described summary of the
data. This is illustrated by the analysis of Rascati et al. (21), who compared overall
medical costs for asthmatics prescribed steroids with those prescribed other treatments.
Although the mean cost was lower in the steroid group, a Wilcoxon test reported
significantly higher costs for that group. A related disadvantage is that it is not
easy to construct confidence intervals that correspond to the Wilcoxon test.
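The power trade-off described above can be illustrated with scipy, whose `mannwhitneyu` function implements the Wilcoxon rank-sum test. The t(3) distribution and the shift of 0.5 below are arbitrary choices for a long-tailed alternative with the same shape in both groups, the setting where the rank test shines:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def power(test, reps=1000, n=50, shift=0.5):
    """Fraction of simulated data sets in which `test` rejects at the 5% level."""
    hits = 0
    for _ in range(reps):
        a = stats.t(df=3).rvs(size=n, random_state=rng)
        b = stats.t(df=3).rvs(size=n, random_state=rng) + shift
        hits += test(a, b).pvalue < 0.05
    return hits / reps

t_power = power(stats.ttest_ind)
w_power = power(stats.mannwhitneyu)
print(f"t-test power: {t_power:.2f}   Wilcoxon power: {w_power:.2f}")
```

With these long-tailed, location-shifted groups the Wilcoxon test rejects more often than the t-test; rerunning with Normal data instead of t(3) reverses the gap only slightly, consistent with the Wilcoxon test's well-known near-efficiency under Normality.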
EXAMPLE As an example, we compared the outpatient cost for people who rated
themselves in poor health to those in fair health (n = 103 and 340, respectively).
The t-test showed that the mean costs in the poor and fair groups were $960 and
$727, respectively; the mean difference is $234; the 95% confidence interval for
the difference (−$72 to +$540); t = 1.51; p = 0.133. The Wilcoxon test provides
the information that the mean ranks of costs in the poor and fair groups were 245.51
and 214.88; that the sums of ranks were 25288 versus 73058; the Wilcoxon statistic
was 73058 and the p-value 0.033. The Wilcoxon test thus yielded a more significant
result than the t-test, but did not provide any useful descriptive statistics. The data
for the two groups did not seem to have the same shape, based on a histogram.
Logistic Regression
When the dependent variable is binary, the most common analytic method is
logistic regression. In this approach the assumptions fit the data. Further, the (exponentials of the) regression parameters can be interpreted as odds ratios, which
are nearly identical to relative risks when the event under study is rare.
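The rare-event approximation can be checked with a line or two of arithmetic; the risks below are invented for illustration:

```python
# When the outcome is rare, the odds ratio is close to the relative risk;
# when it is common, the two diverge.
def odds(p):
    return p / (1.0 - p)

# Rare outcome: 2% vs 1% risk.
rr_rare = 0.02 / 0.01                      # relative risk = 2.0
or_rare = odds(0.02) / odds(0.01)          # odds ratio ~ 2.02

# Common outcome: 60% vs 30% risk.
rr_common = 0.60 / 0.30                    # relative risk = 2.0
or_common = odds(0.60) / odds(0.30)        # odds ratio = 3.5

print(f"rare:   RR={rr_rare:.2f}  OR={or_rare:.2f}")
print(f"common: RR={rr_common:.2f}  OR={or_common:.2f}")
```

At 2% versus 1% the odds ratio of about 2.02 is practically the same as the relative risk of 2.0, but at 60% versus 30% an unchanged relative risk of 2.0 corresponds to an odds ratio of 3.5.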
Another possible approach is least-squares linear regression, letting Y be the 0/1
binary variable. Such an approach is not usually considered appropriate because
Y is not Normally distributed; however, the Central Limit Theorem ensures that
the regression coefficients will be Normally distributed for large enough samples.
Regression estimates would be a weighted sum of the Y’s, which are 0’s and 1’s.
The usual rule for the binomial distribution is that proportions are approximately
Normal if np > 5 and n(1 − p) > 5, which should hold for the large data sets we are
considering. Another objection to the linear regression approach is that estimated
proportions can be below 0 or greater than 1. This is a problem if the goal is to
predict a probability for an individual, and the sample is small. It will rarely be
a problem when the goal is to assess the effec...
