Research Paper Instructions
AD 685
Structure – word count
• Abstract – 150 words
• Introduction – 300 words
• Literature review – 600 words
• Methods & Data Collections – 250 words
• Interpretation and Discussion – 250 words
• Conclusion – 300 words
• MAX: 2000 words
Structure – In detail
• Abstract – Extremely short summary of your entire paper
• Introduction – Summary of your entire paper: why you chose this topic, why it is interesting
and important, what previous researchers have done on the topic, what you did, and what more
can be done
• Literature review – What previous researchers have done on this topic, and their areas of
agreement and disagreement
• Methods & Data Collections – How you collected the data and what type of analysis you did;
report summary statistics, regression results, scatter plots, diagrams, and anything else that you
think is important
• Interpretation and Discussion – Interpret your results and discuss how they compare with previous
research on this topic
• Conclusion – Summarize the paper, say what else can be done on this topic, and state what you
know now that you did not know before
• MAX: 2000 words
Resources:
• APA Style Guide:
https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/apa_sample_paper.html
• Writing Tips: http://www.tulane.edu/~bfleury/termpaper.html
• Boston University Writing Center:
http://www.bu.edu/writingprogram/the-writing-center/
• Boston University Writing and Reference Guide:
http://library.bu.edu/citeys
Introduction
to Econometrics
The Pearson Series in Economics
Abel/Bernanke/Croushore
Macroeconomics*
Bade/Parkin
Foundations of Economics*
Berck/Helfand
The Economics of the Environment
Bierman/Fernandez
Game Theory with Economic
Applications
Blanchard
Macroeconomics*
Blau/Ferber/Winkler
The Economics of Women, Men, and
Work
Boardman/Greenberg/Vining/Weimer
Cost-Benefit Analysis
Boyer
Principles of Transportation Economics
Branson
Macroeconomic Theory and Policy
Bruce
Public Finance and the American
Economy
Carlton/Perloff
Modern Industrial Organization
Case/Fair/Oster
Principles of Economics*
Chapman
Environmental Economics: Theory,
Application, and Policy
Cooter/Ulen
Law & Economics
Daniels/VanHoose
International Monetary & Financial
Economics
Downs
An Economic Theory of Democracy
Ehrenberg/Smith
Modern Labor Economics
Farnham
Economics for Managers
Folland/Goodman/Stano
The Economics of Health and
Health Care
Fort
Sports Economics
Froyen
Macroeconomics
Fusfeld
The Age of the Economist
Gerber
International Economics*
González-Rivera
Forecasting for Economics and Business
Gordon
Macroeconomics*
Greene
Econometric Analysis
Gregory
Essentials of Economics
Gregory/Stuart
Russian and Soviet Economic
Performance and Structure
Hartwick/Olewiler
The Economics of Natural Resource Use
Heilbroner/Milberg
The Making of the Economic Society
Heyne/Boettke/Prychitko
The Economic Way of Thinking
Holt
Markets, Games, and Strategic Behavior
Hubbard/O’Brien
Economics*
Money, Banking, and the Financial
System*
Hubbard/O’Brien/Rafferty
Macroeconomics*
Hughes/Cain
American Economic History
Husted/Melvin
International Economics
Jehle/Reny
Advanced Microeconomic Theory
Johnson-Lans
A Health Economics Primer
Keat/Young/Erfle
Managerial Economics
Klein
Mathematical Methods for Economics
Krugman/Obstfeld/Melitz
International Economics: Theory & Policy*
Laidler
The Demand for Money
Leeds/von Allmen
The Economics of Sports
Leeds/von Allmen/Schiming
Economics*
Lynn
Economic Development: Theory and
Practice for a Divided World
Miller
Economics Today*
Understanding Modern Economics
Miller/Benjamin
The Economics of Macro Issues
Miller/Benjamin/North
The Economics of Public Issues
Mills/Hamilton
Urban Economics
Mishkin
The Economics of Money, Banking, and
Financial Markets*
The Economics of Money, Banking, and
Financial Markets, Business School Edition*
Macroeconomics: Policy and Practice*
*denotes MyEconLab titles. Visit www.myeconlab.com to learn more.
Murray
Econometrics: A Modern Introduction
O’Sullivan/Sheffrin/Perez
Economics: Principles, Applications, and
Tools*
Parkin
Economics*
Perloff
Microeconomics*
Microeconomics: Theory and
Applications with Calculus*
Perloff/Brander
Managerial Economics and Strategy*
Phelps
Health Economics
Pindyck/Rubinfeld
Microeconomics*
Riddell/Shackelford/
Stamos/Schneider
Economics: A Tool for Critically
Understanding Society
Roberts
The Choice: A Fable of Free Trade and
Protection
Rohlf
Introduction to Economic Reasoning
Roland
Development Economics
Scherer
Industry Structure, Strategy, and Public
Policy
Schiller
The Economics of Poverty and
Discrimination
Sherman
Market Regulation
Stock/Watson
Introduction to Econometrics
Studenmund
Using Econometrics: A Practical Guide
Tietenberg/Lewis
Environmental and Natural Resource
Economics
Environmental Economics and Policy
Todaro/Smith
Economic Development
Waldman/Jensen
Industrial Organization: Theory and
Practice
Walters/Walters/Appel/
Callahan/Centanni/
Maex/O’Neill
Econversations: Today’s Students Discuss
Today’s Issues
Weil
Economic Growth
Williamson
Macroeconomics
Introduction
to Econometrics
Third Edition Update
James H. Stock
Harvard University
Mark W. Watson
Princeton University
Boston Columbus Indianapolis New York San Francisco Hoboken
Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montréal Toronto
Delhi Mexico City São Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo
Vice President, Product Management: Donna Battista
Acquisitions Editor: Christina Masturzo
Editorial Assistant: Christine Mallon
Vice President, Marketing: Maggie Moylan
Director, Strategy and Marketing: Scott Dustan
Manager, Field Marketing: Leigh Ann Sims
Product Marketing Manager: Alison Haskins
Executive Field Marketing Manager: Lori DeShazo
Senior Strategic Marketing Manager: Erin Gardner
Team Lead, Program Management: Ashley Santora
Program Manager: Carolyn Philips
Team Lead, Project Management: Jeff Holcomb
Project Manager: Liz Napolitano
Operations Specialist: Carol Melville
Cover Designer: Jon Boylan
Cover Art: Courtesy of Carolin Pflueger and the authors.
Full-Service Project Management, Design, and Electronic
Composition: Cenveo® Publisher Services
Printer/Binder: Edwards Brothers Malloy
Cover Printer: Lehigh-Phoenix Color/Hagerstown
Text Font: 10/14 Times Ten Roman
About the cover: The cover shows a heat chart of 270 monthly variables measuring different aspects of employment, production,
income, and sales for the United States, 1974–2010. Each horizontal line depicts a different variable, and the horizontal axis is the
date. Strong monthly increases in a variable are blue and sharp monthly declines are red. The simultaneous declines in many
of these measures during recessions appear in the figure as vertical red bands.
Credits and acknowledgments borrowed from other sources and reproduced, with permission, in this textbook appear on
appropriate page within text.
Photo Credits: page 410 left: Henrik Montgomery/Pressens Bild/AP Photo; page 410 right: Paul Sakuma/AP Photo;
page 428 left: Courtesy of Allison Harris; page 428 right: Courtesy of Allison Harris; page 669 top left: John McCombe/AP Photo;
bottom left: New York University/AFP/Newscom; top right: Denise Applewhite/Princeton University/AP Photo; bottom right:
Courtesy of the University of Chicago/AP Photo.
Copyright © 2015, 2011, 2007 Pearson Education, Inc. All rights reserved. Manufactured in the United States of America.
This publication is protected by Copyright, and permission should be obtained from the publisher prior to any prohibited
reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying,
recording, or likewise. To obtain permission(s) to use material from this work, please submit a written request to Pearson
Education, Inc., Permissions Department, 221 River Street, Hoboken, New Jersey 07030.
Many of the designations by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those
designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed in
initial caps or all caps.
Library of Congress Cataloging-in-Publication Data
Stock, James H.
Introduction to econometrics/James H. Stock, Harvard University, Mark W. Watson, Princeton University.—
Third edition update.
pages cm.—(The Pearson series in economics)
Includes bibliographical references and index.
ISBN 978-0-13-348687-2—ISBN 0-13-348687-7
1. Econometrics. I. Watson, Mark W. II. Title.
HB139.S765 2015
330.01’5195––dc23
2014018465
www.pearsonhighered.com
ISBN-10: 0-13-348687-7
ISBN-13: 978-0-13-348687-2
Brief Contents
Part One
Introduction and Review
Chapter 1 Economic Questions and Data 1
Chapter 2 Review of Probability 14
Chapter 3 Review of Statistics 65
Part Two
Fundamentals of Regression Analysis
Chapter 4 Linear Regression with One Regressor 109
Chapter 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals 146
Chapter 6 Linear Regression with Multiple Regressors 182
Chapter 7 Hypothesis Tests and Confidence Intervals in Multiple Regression 217
Chapter 8 Nonlinear Regression Functions 256
Chapter 9 Assessing Studies Based on Multiple Regression 315
Part Three
Further Topics in Regression Analysis
Chapter 10 Regression with Panel Data 350
Chapter 11 Regression with a Binary Dependent Variable 385
Chapter 12 Instrumental Variables Regression 424
Chapter 13 Experiments and Quasi-Experiments 475
Part Four
Regression Analysis of Economic Time Series Data
Chapter 14 Introduction to Time Series Regression and Forecasting 522
Chapter 15 Estimation of Dynamic Causal Effects 589
Chapter 16 Additional Topics in Time Series Regression 638
Part Five
The Econometric Theory of Regression Analysis
Chapter 17 The Theory of Linear Regression with One Regressor 676
Chapter 18 The Theory of Multiple Regression 705
Contents
Preface xxix
Part One
Introduction and Review
Chapter 1
Economic Questions and Data 1
1.1 Economic Questions We Examine 1
Question #1: Does Reducing Class Size Improve Elementary School Education? 2
Question #2: Is There Racial Discrimination in the Market for Home Loans? 3
Question #3: How Much Do Cigarette Taxes Reduce Smoking? 3
Question #4: By How Much Will U.S. GDP Grow Next Year? 4
Quantitative Questions, Quantitative Answers 5
1.2 Causal Effects and Idealized Experiments 5
Estimation of Causal Effects 6
Forecasting and Causality 7
1.3 Data: Sources and Types 7
Experimental Versus Observational Data 7
Cross-Sectional Data 8
Time Series Data 9
Panel Data 11
Chapter 2
Review of Probability 14
2.1 Random Variables and Probability Distributions 15
Probabilities, the Sample Space, and Random Variables 15
Probability Distribution of a Discrete Random Variable 16
Probability Distribution of a Continuous Random Variable 19
2.2 Expected Values, Mean, and Variance 19
The Expected Value of a Random Variable 19
The Standard Deviation and Variance 21
Mean and Variance of a Linear Function of a Random Variable 22
Other Measures of the Shape of a Distribution 23
2.3 Two Random Variables 26
Joint and Marginal Distributions 26
Conditional Distributions 27
Independence 31
Covariance and Correlation 31
The Mean and Variance of Sums of Random Variables 32
2.4 The Normal, Chi-Squared, Student t, and F Distributions 36
The Normal Distribution 36
The Chi-Squared Distribution 41
The Student t Distribution 41
The F Distribution 42
2.5 Random Sampling and the Distribution of the Sample Average 43
Random Sampling 43
The Sampling Distribution of the Sample Average 44
2.6 Large-Sample Approximations to Sampling Distributions 47
The Law of Large Numbers and Consistency 48
The Central Limit Theorem 50
Appendix 2.1 Derivation of Results in Key Concept 2.3 63
Chapter 3
Review of Statistics 65
3.1 Estimation of the Population Mean 66
Estimators and Their Properties 66
Properties of Ȳ 68
The Importance of Random Sampling 70
3.2 Hypothesis Tests Concerning the Population Mean 71
Null and Alternative Hypotheses 71
The p-Value 72
Calculating the p-Value When σY Is Known 73
The Sample Variance, Sample Standard Deviation, and Standard Error 74
Calculating the p-Value When σY Is Unknown 76
The t-Statistic 76
Hypothesis Testing with a Prespecified Significance Level 77
One-Sided Alternatives 79
3.3 Confidence Intervals for the Population Mean 80
3.4 Comparing Means from Different Populations 82
Hypothesis Tests for the Difference Between Two Means 82
Confidence Intervals for the Difference Between Two Population Means 84
3.5 Differences-of-Means Estimation of Causal Effects Using Experimental Data 84
The Causal Effect as a Difference of Conditional Expectations 85
Estimation of the Causal Effect Using Differences of Means 85
3.6 Using the t-Statistic When the Sample Size Is Small 87
The t-Statistic and the Student t Distribution 87
Use of the Student t Distribution in Practice 89
3.7 Scatterplots, the Sample Covariance, and the Sample Correlation 91
Scatterplots 91
Sample Covariance and Correlation 92
Appendix 3.1 The U.S. Current Population Survey 106
Appendix 3.2 Two Proofs That Ȳ Is the Least Squares Estimator of μY 107
Appendix 3.3 A Proof That the Sample Variance Is Consistent 108
Part Two
Fundamentals of Regression Analysis
Chapter 4
Linear Regression with One Regressor 109
4.1 The Linear Regression Model 109
4.2 Estimating the Coefficients of the Linear Regression Model 114
The Ordinary Least Squares Estimator 116
OLS Estimates of the Relationship Between Test Scores and the Student–Teacher Ratio 118
Why Use the OLS Estimator? 119
4.3 Measures of Fit 121
The R² 121
The Standard Error of the Regression 122
Application to the Test Score Data 123
4.4 The Least Squares Assumptions 124
Assumption #1: The Conditional Distribution of ui Given Xi Has a Mean of Zero 124
Assumption #2: (Xi, Yi), i = 1, …, n, Are Independently and Identically Distributed 126
Assumption #3: Large Outliers Are Unlikely 127
Use of the Least Squares Assumptions 128
4.5 Sampling Distribution of the OLS Estimators 129
The Sampling Distribution of the OLS Estimators 130
4.6 Conclusion 133
Appendix 4.1 The California Test Score Data Set 141
Appendix 4.2 Derivation of the OLS Estimators 141
Appendix 4.3 Sampling Distribution of the OLS Estimator 142
Chapter 5
Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals 146
5.1 Testing Hypotheses About One of the Regression Coefficients 146
Two-Sided Hypotheses Concerning β1 147
One-Sided Hypotheses Concerning β1 150
Testing Hypotheses About the Intercept β0 152
5.2 Confidence Intervals for a Regression Coefficient 153
5.3 Regression When X Is a Binary Variable 155
Interpretation of the Regression Coefficients 155
5.4 Heteroskedasticity and Homoskedasticity 157
What Are Heteroskedasticity and Homoskedasticity? 158
Mathematical Implications of Homoskedasticity 160
What Does This Mean in Practice? 161
5.5 The Theoretical Foundations of Ordinary Least Squares 163
Linear Conditionally Unbiased Estimators and the Gauss–Markov Theorem 164
Regression Estimators Other Than OLS 165
5.6 Using the t-Statistic in Regression When the Sample Size Is Small 166
The t-Statistic and the Student t Distribution 166
Use of the Student t Distribution in Practice 167
5.7 Conclusion 168
Appendix 5.1 Formulas for OLS Standard Errors 177
Appendix 5.2 The Gauss–Markov Conditions and a Proof of the Gauss–Markov Theorem 178
Chapter 6
Linear Regression with Multiple Regressors 182
6.1 Omitted Variable Bias 182
Definition of Omitted Variable Bias 183
A Formula for Omitted Variable Bias 185
Addressing Omitted Variable Bias by Dividing the Data into Groups 187
6.2 The Multiple Regression Model 189
The Population Regression Line 189
The Population Multiple Regression Model 190
6.3 The OLS Estimator in Multiple Regression 192
The OLS Estimator 193
Application to Test Scores and the Student–Teacher Ratio 194
6.4 Measures of Fit in Multiple Regression 196
The Standard Error of the Regression (SER) 196
The R² 196
The “Adjusted R²” 197
Application to Test Scores 198
6.5 The Least Squares Assumptions in Multiple Regression 199
Assumption #1: The Conditional Distribution of ui Given X1i, X2i, …, Xki Has a Mean of Zero 199
Assumption #2: (X1i, X2i, …, Xki, Yi), i = 1, …, n, Are i.i.d. 199
Assumption #3: Large Outliers Are Unlikely 199
Assumption #4: No Perfect Multicollinearity 200
6.6 The Distribution of the OLS Estimators in Multiple Regression 201
6.7 Multicollinearity 202
Examples of Perfect Multicollinearity 203
Imperfect Multicollinearity 205
6.8 Conclusion 206
Appendix 6.1 Derivation of Equation (6.1) 214
Appendix 6.2 Distribution of the OLS Estimators When There Are Two Regressors and Homoskedastic Errors 214
Appendix 6.3 The Frisch–Waugh Theorem 215
Chapter 7
Hypothesis Tests and Confidence Intervals in Multiple Regression 217
7.1 Hypothesis Tests and Confidence Intervals for a Single Coefficient 217
Standard Errors for the OLS Estimators 217
Hypothesis Tests for a Single Coefficient 218
Confidence Intervals for a Single Coefficient 219
Application to Test Scores and the Student–Teacher Ratio 220
7.2 Tests of Joint Hypotheses 222
Testing Hypotheses on Two or More Coefficients 222
The F-Statistic 224
Application to Test Scores and the Student–Teacher Ratio 226
The Homoskedasticity-Only F-Statistic 227
7.3 Testing Single Restrictions Involving Multiple Coefficients 229
7.4 Confidence Sets for Multiple Coefficients 231
7.5 Model Specification for Multiple Regression 232
Omitted Variable Bias in Multiple Regression 233
The Role of Control Variables in Multiple Regression 234
Model Specification in Theory and in Practice 236
Interpreting the R² and the Adjusted R² in Practice 237
7.6 Analysis of the Test Score Data Set 238
7.7 Conclusion 243
Appendix 7.1 The Bonferroni Test of a Joint Hypothesis 251
Appendix 7.2 Conditional Mean Independence 253
Chapter 8
Nonlinear Regression Functions 256
8.1 A General Strategy for Modeling Nonlinear Regression Functions 258
Test Scores and District Income 258
The Effect on Y of a Change in X in Nonlinear Specifications 261
A General Approach to Modeling Nonlinearities Using Multiple Regression 266
8.2 Nonlinear Functions of a Single Independent Variable 266
Polynomials 267
Logarithms 269
Polynomial and Logarithmic Models of Test Scores and District Income 277
8.3 Interactions Between Independent Variables 278
Interactions Between Two Binary Variables 279
Interactions Between a Continuous and a Binary Variable 282
Interactions Between Two Continuous Variables 286
8.4 Nonlinear Effects on Test Scores of the Student–Teacher Ratio 293
Discussion of Regression Results 293
Summary of Findings 297
8.5 Conclusion 298
Appendix 8.1 Regression Functions That Are Nonlinear in the Parameters 309
Appendix 8.2 Slopes and Elasticities for Nonlinear Regression Functions 313
Chapter 9
Assessing Studies Based on Multiple Regression 315
9.1 Internal and External Validity 315
Threats to Internal Validity 316
Threats to External Validity 317
9.2 Threats to Internal Validity of Multiple Regression Analysis 319
Omitted Variable Bias 319
Misspecification of the Functional Form of the Regression Function 321
Measurement Error and Errors-in-Variables Bias 322
Missing Data and Sample Selection 325
Simultaneous Causality 326
Sources of Inconsistency of OLS Standard Errors 329
9.3 Internal and External Validity When the Regression Is Used for Forecasting 331
Using Regression Models for Forecasting 331
Assessing the Validity of Regression Models for Forecasting 332
9.4 Example: Test Scores and Class Size 332
External Validity 332
Internal Validity 339
Discussion and Implications 341
9.5 Conclusion 342
Appendix 9.1 The Massachusetts Elementary School Testing Data 349
Part Three
Further Topics in Regression Analysis
Chapter 10
Regression with Panel Data 350
10.1 Panel Data 351
Example: Traffic Deaths and Alcohol Taxes 352
10.2 Panel Data with Two Time Periods: “Before and After” Comparisons 354
10.3 Fixed Effects Regression 357
The Fixed Effects Regression Model 357
Estimation and Inference 359
Application to Traffic Deaths 361
10.4 Regression with Time Fixed Effects 361
Time Effects Only 362
Both Entity and Time Fixed Effects 363
10.5 The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression 365
The Fixed Effects Regression Assumptions 365
Standard Errors for Fixed Effects Regression 367
10.6 Drunk Driving Laws and Traffic Deaths 368
10.7 Conclusion 372
Appendix 10.1 The State Traffic Fatality Data Set 380
Appendix 10.2 Standard Errors for Fixed Effects Regression 380
Chapter 11
Regression with a Binary Dependent Variable 385
11.1 Binary Dependent Variables and the Linear Probability Model 386
Binary Dependent Variables 386
The Linear Probability Model 388
11.2 Probit and Logit Regression 391
Probit Regression 391
Logit Regression 396
Comparing the Linear Probability, Probit, and Logit Models 398
11.3 Estimation and Inference in the Logit and Probit Models 398
Nonlinear Least Squares Estimation 399
Maximum Likelihood Estimation 400
Measures of Fit 401
11.4 Application to the Boston HMDA Data 402
11.5 Conclusion 409
Appendix 11.1 The Boston HMDA Data Set 418
Appendix 11.2 Maximum Likelihood Estimation 418
Appendix 11.3 Other Limited Dependent Variable Models 421
Chapter 12
Instrumental Variables Regression 424
12.1 The IV Estimator with a Single Regressor and a Single Instrument 425
The IV Model and Assumptions 425
The Two Stage Least Squares Estimator 426
Why Does IV Regression Work? 427
The Sampling Distribution of the TSLS Estimator 431
Application to the Demand for Cigarettes 433
12.2 The General IV Regression Model 435
TSLS in the General IV Model 437
Instrument Relevance and Exogeneity in the General IV Model 438
The IV Regression Assumptions and Sampling Distribution of the TSLS Estimator 439
Inference Using the TSLS Estimator 440
Application to the Demand for Cigarettes 441
12.3 Checking Instrument Validity 442
Assumption #1: Instrument Relevance 443
Assumption #2: Instrument Exogeneity 445
12.4 Application to the Demand for Cigarettes 448
12.5 Where Do Valid Instruments Come From? 453
Three Examples 454
12.6 Conclusion 458
Appendix 12.1 The Cigarette Consumption Panel Data Set 467
Appendix 12.2 Derivation of the Formula for the TSLS Estimator in Equation (12.4) 467
Appendix 12.3 Large-Sample Distribution of the TSLS Estimator 468
Appendix 12.4 Large-Sample Distribution of the TSLS Estimator When the Instrument Is Not Valid 469
Appendix 12.5 Instrumental Variables Analysis with Weak Instruments 471
Appendix 12.6 TSLS with Control Variables 473
Chapter 13
Experiments and Quasi-Experiments 475
13.1 Potential Outcomes, Causal Effects, and Idealized Experiments 476
Potential Outcomes and the Average Causal Effect 476
Econometric Methods for Analyzing Experimental Data 478
13.2 Threats to Validity of Experiments 479
Threats to Internal Validity 479
Threats to External Validity 483
13.3 Experimental Estimates of the Effect of Class Size Reductions 484
Experimental Design 485
Analysis of the STAR Data 486
Comparison of the Observational and Experimental Estimates of Class Size Effects 491
13.4 Quasi-Experiments 493
Examples 494
The Differences-in-Differences Estimator 496
Instrumental Variables Estimators 499
Regression Discontinuity Estimators 500
13.5 Potential Problems with Quasi-Experiments 502
Threats to Internal Validity 502
Threats to External Validity 504
13.6 Experimental and Quasi-Experimental Estimates in Heterogeneous Populations 504
OLS with Heterogeneous Causal Effects 505
IV Regression with Heterogeneous Causal Effects 506
13.7 Conclusion 509
Appendix 13.1 The Project STAR Data Set 518
Appendix 13.2 IV Estimation When the Causal Effect Varies Across Individuals 518
Appendix 13.3 The Potential Outcomes Framework for Analyzing Data from Experiments 520
Part Four Regression Analysis of Economic Time Series Data
Chapter 14
Introduction to Time Series Regression and Forecasting 522
14.1 Using Regression Models for Forecasting 523
14.2 Introduction to Time Series Data and Serial Correlation 524
Real GDP in the United States 524
Lags, First Differences, Logarithms, and Growth Rates 525
Autocorrelation 528
Other Examples of Economic Time Series 529
14.3 Autoregressions 531
The First-Order Autoregressive Model 531
The pth-Order Autoregressive Model 534
14.4 Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag Model 537
Forecasting GDP Growth Using the Term Spread 537
Stationarity 540
Time Series Regression with Multiple Predictors 541
Forecast Uncertainty and Forecast Intervals 544
14.5 Lag Length Selection Using Information Criteria 547
Determining the Order of an Autoregression 547
Lag Length Selection in Time Series Regression with Multiple Predictors 550
14.6 Nonstationarity I: Trends 551
What Is a Trend? 551
Problems Caused by Stochastic Trends 554
Detecting Stochastic Trends: Testing for a Unit AR Root 556
Avoiding the Problems Caused by Stochastic Trends 561
14.7 Nonstationarity II: Breaks 561
What Is a Break? 562
Testing for Breaks 562
Pseudo Out-of-Sample Forecasting 567
Avoiding the Problems Caused by Breaks 573
14.8 Conclusion 573
Appendix 14.1 Time Series Data Used in Chapter 14 583
Appendix 14.2 Stationarity in the AR(1) Model 584
Appendix 14.3 Lag Operator Notation 585
Appendix 14.4 ARMA Models 586
Appendix 14.5 Consistency of the BIC Lag Length Estimator 587
Chapter 15
Estimation of Dynamic Causal Effects 589
15.1 An Initial Taste of the Orange Juice Data 590
15.2 Dynamic Causal Effects 593
Causal Effects and Time Series Data 593
Two Types of Exogeneity 596
15.3 Estimation of Dynamic Causal Effects with Exogenous Regressors 597
The Distributed Lag Model Assumptions 598
Autocorrelated ut, Standard Errors, and Inference 599
Dynamic Multipliers and Cumulative Dynamic Multipliers 600
15.4 Heteroskedasticity- and Autocorrelation-Consistent Standard Errors 601
Distribution of the OLS Estimator with Autocorrelated Errors 602
HAC Standard Errors 604
15.5 Estimation of Dynamic Causal Effects with Strictly Exogenous Regressors 606
The Distributed Lag Model with AR(1) Errors 607
OLS Estimation of the ADL Model 610
GLS Estimation 611
The Distributed Lag Model with Additional Lags and AR(p) Errors 613
15.6 Orange Juice Prices and Cold Weather 616
15.7 Is Exogeneity Plausible? Some Examples 624
U.S. Income and Australian Exports 624
Oil Prices and Inflation 625
Monetary Policy and Inflation 626
The Growth Rate of GDP and the Term Spread 626
15.8 Conclusion 627
Appendix 15.1 The Orange Juice Data Set 634
Appendix 15.2 The ADL Model and Generalized Least Squares in Lag Operator Notation 634
Chapter 16
Additional Topics in Time Series Regression 638
16.1 Vector Autoregressions 638
The VAR Model 639
A VAR Model of the Growth Rate of GDP and the Term Spread 642
16.2 Multiperiod Forecasts 643
Iterated Multiperiod Forecasts 643
Direct Multiperiod Forecasts 645
Which Method Should You Use? 648
16.3 Orders of Integration and the DF-GLS Unit Root Test 649
Other Models of Trends and Orders of Integration 649
The DF-GLS Test for a Unit Root 651
Why Do Unit Root Tests Have Nonnormal Distributions? 654
16.4 Cointegration 656
Cointegration and Error Correction 656
How Can You Tell Whether Two Variables Are Cointegrated? 658
Estimation of Cointegrating Coefficients 659
Extension to Multiple Cointegrated Variables 661
Application to Interest Rates 662
16.5 Volatility Clustering and Autoregressive Conditional Heteroskedasticity 664
Volatility Clustering 664
Autoregressive Conditional Heteroskedasticity 666
Application to Stock Price Volatility 667
16.6 Conclusion 670
Part Five
The Econometric Theory of Regression Analysis
Chapter 17
The Theory of Linear Regression with One Regressor 676
17.1 The Extended Least Squares Assumptions and the OLS Estimator 677
The Extended Least Squares Assumptions 677
The OLS Estimator 679
17.2 Fundamentals of Asymptotic Distribution Theory 679
Convergence in Probability and the Law of Large Numbers 680
The Central Limit Theorem and Convergence in Distribution 682
Slutsky’s Theorem and the Continuous Mapping Theorem 683
Application to the t-Statistic Based on the Sample Mean 684
17.3 Asymptotic Distribution of the OLS Estimator and t-Statistic 685
Consistency and Asymptotic Normality of the OLS Estimators 685
Consistency of Heteroskedasticity-Robust Standard Errors 685
Asymptotic Normality of the Heteroskedasticity-Robust t-Statistic 687
17.4 Exact Sampling Distributions When the Errors Are Normally Distributed 687
Distribution of β̂1 with Normal Errors 687
Distribution of the Homoskedasticity-Only t-Statistic 689
17.5 Weighted Least Squares 690
WLS with Known Heteroskedasticity 690
WLS with Heteroskedasticity of Known Functional Form 691
Heteroskedasticity-Robust Standard Errors or WLS? 694
Appendix 17.1 The Normal and Related Distributions and Moments of Continuous Random Variables 700
Appendix 17.2 Two Inequalities 703
Chapter 18
The Theory of Multiple Regression 705
18.1 The Linear Multiple Regression Model and OLS Estimator in Matrix Form 706
The Multiple Regression Model in Matrix Notation 706
The Extended Least Squares Assumptions 708
The OLS Estimator 709
18.2 Asymptotic Distribution of the OLS Estimator and t-Statistic 710
The Multivariate Central Limit Theorem 710
Asymptotic Normality of β̂ 711
Heteroskedasticity-Robust Standard Errors 712
Confidence Intervals for Predicted Effects 713
Asymptotic Distribution of the t-Statistic 713
18.3 Tests of Joint Hypotheses 713
Joint Hypotheses in Matrix Notation 714
Asymptotic Distribution of the F-Statistic 714
Confidence Sets for Multiple Coefficients 715
18.4 Distribution of Regression Statistics with Normal Errors 716
Matrix Representations of OLS Regression Statistics 716
Distribution of β̂ with Normal Errors 717
Distribution of s²_û 718
Homoskedasticity-Only Standard Errors 718
Distribution of the t-Statistic 719
Distribution of the F-Statistic 719
18.5 Efficiency of the OLS Estimator with Homoskedastic Errors 720
The Gauss–Markov Conditions for Multiple Regression 720
Linear Conditionally Unbiased Estimators 720
The Gauss–Markov Theorem for Multiple Regression 721
18.6 Generalized Least Squares 722
The GLS Assumptions 723
GLS When Ω Is Known 725
GLS When Ω Contains Unknown Parameters 726
The Zero Conditional Mean Assumption and GLS 726
18.7 Instrumental Variables and Generalized Method of Moments Estimation 728
The IV Estimator in Matrix Form 729
Asymptotic Distribution of the TSLS Estimator 730
Properties of TSLS When the Errors Are Homoskedastic 731
Generalized Method of Moments Estimation in Linear Models 734
Appendix 18.1 Summary of Matrix Algebra 746
Appendix 18.2 Multivariate Distributions 749
Appendix 18.3 Derivation of the Asymptotic Distribution of β̂ 751
Appendix 18.4 Derivations of Exact Distributions of OLS Test Statistics with Normal Errors 752
Appendix 18.5 Proof of the Gauss–Markov Theorem for Multiple Regression 753
Appendix 18.6 Proof of Selected Results for IV and GMM Estimation 754
Appendix 757
References 765
Glossary 771
Index 779
Key Concepts
Part One
Introduction and Review
1.1 Cross-Sectional, Time Series, and Panel Data 12
2.1 Expected Value and the Mean 20
2.2 Variance and Standard Deviation 21
2.3 Means, Variances, and Covariances of Sums of Random Variables 35
2.4 Computing Probabilities Involving Normal Random Variables 37
2.5 Simple Random Sampling and i.i.d. Random Variables 44
2.6 Convergence in Probability, Consistency, and the Law of Large Numbers 48
2.7 The Central Limit Theorem 52
3.1 Estimators and Estimates 67
3.2 Bias, Consistency, and Efficiency 68
3.3 Efficiency of Ȳ: Ȳ Is BLUE 69
3.4 The Standard Error of Ȳ 75
3.5 The Terminology of Hypothesis Testing 78
3.6 Testing the Hypothesis E(Y) = μY,0 Against the Alternative E(Y) ≠ μY,0 79
3.7 Confidence Intervals for the Population Mean 81
Part Two
Fundamentals of Regression Analysis
4.1 Terminology for the Linear Regression Model with a Single Regressor 113
4.2 The OLS Estimator, Predicted Values, and Residuals 117
4.3 The Least Squares Assumptions 129
4.4 Large-Sample Distributions of β̂0 and β̂1 131
5.1 General Form of the t-Statistic 147
5.2 Testing the Hypothesis β1 = β1,0 Against the Alternative β1 ≠ β1,0 149
5.3 Confidence Interval for β1 154
5.4 Heteroskedasticity and Homoskedasticity 159
5.5 The Gauss–Markov Theorem for β̂1 165
6.1 Omitted Variable Bias in Regression with a Single Regressor 185
6.2 The Multiple Regression Model 192
6.3 The OLS Estimators, Predicted Values, and Residuals in the Multiple Regression Model 194
6.4 The Least Squares Assumptions in the Multiple Regression Model 201
6.5 Large-Sample Distribution of β̂0, β̂1, …, β̂k 202
7.1 Testing the Hypothesis βj = βj,0 Against the Alternative βj ≠ βj,0 219
7.2 Confidence Intervals for a Single Coefficient in Multiple Regression 220
7.3 Omitted Variable Bias in Multiple Regression 233
7.4 R² and R̄²: What They Tell You—and What They Don’t 238
8.1 The Expected Change on Y of a Change in X1 in the Nonlinear Regression Model (8.3) 263
8.2 Logarithms in Regression: Three Cases 276
8.3 A Method for Interpreting Coefficients in Regressions with Binary Variables 281
8.4 Interactions Between Binary and Continuous Variables 284
8.5 Interactions in Multiple Regression 289
9.1 Internal and External Validity 316
9.2 Omitted Variable Bias: Should I Include More Variables in My Regression? 321
9.3 Functional Form Misspecification 322
9.4 Errors-in-Variables Bias 324
9.5 Sample Selection Bias 326
9.6 Simultaneous Causality Bias 329
9.7 Threats to the Internal Validity of a Multiple Regression Study 330
Part Three
Further Topics in Regression Analysis
10.1 Notation for Panel Data 351
10.2 The Fixed Effects Regression Model 359
10.3 The Fixed Effects Regression Assumptions 366
11.1 The Linear Probability Model 389
11.2 The Probit Model, Predicted Probabilities, and Estimated Effects 394
11.3 Logit Regression 396
12.1 The General Instrumental Variables Regression Model and Terminology 436
12.2 Two Stage Least Squares 438
12.3 The Two Conditions for Valid Instruments 439
12.4 The IV Regression Assumptions 440
12.5 A Rule of Thumb for Checking for Weak Instruments 444
12.6 The Overidentifying Restrictions Test (The J-Statistic) 448
Part Four
Regression Analysis of Economic Time Series Data
14.1 Lags, First Differences, Logarithms, and Growth Rates 527
14.2 Autocorrelation (Serial Correlation) and Autocovariance 528
14.3 Autoregressions 535
14.4 The Autoregressive Distributed Lag Model 540
14.5 Stationarity 541
14.6 Time Series Regression with Multiple Predictors 542
14.7 Granger Causality Tests (Tests of Predictive Content) 543
14.8 The Augmented Dickey–Fuller Test for a Unit Autoregressive Root 559
14.9 The QLR Test for Coefficient Stability 566
14.10 Pseudo Out-of-Sample Forecasts 568
15.1 The Distributed Lag Model and Exogeneity 598
15.2 The Distributed Lag Model Assumptions 599
15.3 HAC Standard Errors 607
15.4 Estimation of Dynamic Multipliers Under Strict Exogeneity 616
16.1 Vector Autoregressions 639
16.2 Iterated Multiperiod Forecasts 646
16.3 Direct Multiperiod Forecasts 648
16.4 Orders of Integration, Differencing, and Stationarity 650
16.5 Cointegration 657
Part Five
The Econometric Theory of Regression Analysis
17.1 The Extended Least Squares Assumptions for Regression with a Single Regressor 678
18.1 The Extended Least Squares Assumptions in the Multiple Regression Model 707
18.2 The Multivariate Central Limit Theorem 711
18.3 Gauss–Markov Theorem for Multiple Regression 722
18.4 The GLS Assumptions 724
General Interest Boxes
The Distribution of Earnings in the United States in 2012 33
A Bad Day on Wall Street 39
Financial Diversification and Portfolios 46
Landon Wins! 70
The Gender Gap of Earnings of College Graduates in the United States 86
A Novel Way to Boost Retirement Savings 90
The “Beta” of a Stock 120
The Economic Value of a Year of Education: Homoskedasticity or
Heteroskedasticity? 162
The Mozart Effect: Omitted Variable Bias? 186
The Return to Education and the Gender Gap 287
The Demand for Economics Journals 290
Do Stock Mutual Funds Outperform the Market? 327
James Heckman and Daniel McFadden, Nobel Laureates 410
Who Invented Instrumental Variables Regression? 428
A Scary Regression 446
The Externalities of Smoking 450
The Hawthorne Effect 482
What Is the Effect on Employment of the Minimum Wage? 497
Can You Beat the Market? Part I 536
The River of Blood 546
Can You Beat the Market? Part II 570
Orange Trees on the March 623
NEWS FLASH: Commodity Traders Send Shivers Through Disney World 625
Nobel Laureates in Time Series Econometrics 669
Preface
Econometrics can be a fun course for both teacher and student. The real world
of economics, business, and government is a complicated and messy place,
full of competing ideas and questions that demand answers. Is it more effective
to tackle drunk driving by passing tough laws or by increasing the tax on alcohol?
Can you make money in the stock market by buying when prices are historically
low, relative to earnings, or should you just sit tight, as the random walk theory
of stock prices suggests? Can we improve elementary education by reducing class
sizes, or should we simply have our children listen to Mozart for 10 minutes a day?
Econometrics helps us sort out sound ideas from crazy ones and find quantitative
answers to important quantitative questions. Econometrics opens a window on
our complicated world that lets us see the relationships on which people, businesses, and governments base their decisions.
Introduction to Econometrics is designed for a first course in undergraduate econometrics. It is our experience that to make econometrics relevant in
an introductory course, interesting applications must motivate the theory and
the theory must match the applications. This simple principle represents a significant departure from the older generation of econometrics books, in which
theoretical models and assumptions do not match the applications. It is no wonder that some students question the relevance of econometrics after they spend
much of their time learning assumptions that they subsequently realize are unrealistic so that they must then learn “solutions” to “problems” that arise when
the applications do not match the assumptions. We believe that it is far better
to motivate the need for tools with a concrete application and then to provide a
few simple assumptions that match the application. Because the theory is immediately relevant to the applications, this approach can make econometrics come
alive.
New to the Third Edition
• Updated treatment of standard errors for panel data regression
• Discussion of when and why missing data can present a problem for regression
analysis
• The use of regression discontinuity design as a method for analyzing quasi-experiments
• Updated discussion of weak instruments
• Discussion of the use and interpretation of control variables integrated into
the core development of regression analysis
• Introduction of the “potential outcomes” framework for experimental data
• Additional general interest boxes
• Additional exercises, both pencil-and-paper and empirical
This third edition builds on the philosophy of the first and second editions
that applications should drive the theory, not the other way around.
One substantial change in this edition concerns inference in regression with
panel data (Chapter 10). In panel data, the data within an entity typically are
correlated over time. For inference to be valid, standard errors must be computed using a method that is robust to this correlation. The chapter on panel data
now uses one such method, clustered standard errors, from the outset. Clustered
standard errors are the natural extension to panel data of the heteroskedasticity-robust standard errors introduced in the initial treatment of regression analysis in
Part II. Recent research has shown that clustered standard errors have a number
of desirable properties, which are now discussed in Chapter 10 and in a revised
appendix to Chapter 10.
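The idea behind clustering can be sketched in a few lines of Python. This is an illustrative sketch under simplifying assumptions that are not taken from the text: a pooled regression with a single regressor, no fixed effects, and no small-sample correction. The function names `fit_ols` and `clustered_slope_se` are hypothetical. The key step is that the products of regressor deviations and residuals are summed within each entity before they are squared, which is what allows the errors to be correlated over time within an entity.

```python
import math
from collections import defaultdict


def fit_ols(x, y):
    """OLS fit of y = b0 + b1*x; returns (b0, b1, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid


def clustered_slope_se(x, resid, entity):
    """Cluster-robust standard error of the slope (no small-sample correction).

    `entity` gives the cluster label of each observation (hypothetical name).
    """
    n = len(x)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dev)
    # Sum (x - x_bar) * residual within each entity, then square the sums.
    within = defaultdict(float)
    for d, u, g in zip(dev, resid, entity):
        within[g] += d * u
    meat = sum(s * s for s in within.values())
    return math.sqrt(meat) / sxx


if __name__ == "__main__":
    # Toy panel: two entities observed for two periods each (made-up numbers).
    x = [0, 1, 2, 3]
    y = [1, 3, 2, 6]
    entity = ["A", "A", "B", "B"]
    b0, b1, resid = fit_ols(x, y)
    print("slope:", b1)
    print("clustered SE:", clustered_slope_se(x, resid, entity))
```

If every observation is given its own entity label, the same function returns a heteroskedasticity-robust standard error without a degrees-of-freedom adjustment, which is one way to see clustering as an extension of that idea.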
Another substantial set of changes concerns the treatment of experiments
and quasi-experiments in Chapter 13. The discussion of differences-in-differences
regression has been streamlined and draws directly on the multiple regression
principles introduced in Part II. Chapter 13 now discusses regression discontinuity
design, which is an intuitive and important framework for the analysis of quasi-experimental data. In addition, Chapter 13 now introduces the potential outcomes
framework and relates this increasingly commonplace terminology to concepts
that were introduced in Parts I and II.
This edition has a number of other significant changes. One is that it incorporates a precise but accessible treatment of control variables into the initial
discussion of multiple regression. Chapter 7 now discusses conditions for control variables being successful in the sense that the coefficient on the variable
of interest is unbiased even though the coefficients on the control variables
generally are not. Other changes include a new discussion of missing data
in Chapter 9, a new optional calculus-based appendix to Chapter 8 on slopes
and elasticities of nonlinear regression functions, and an updated discussion
in Chapter 12 of what to do if you have weak instruments. This edition also
includes new general interest boxes, updated empirical examples, and additional
exercises.
The Updated Third Edition
• The time series data used in Chapters 14–16 have been extended through the
beginning of 2013 and now include the Great Recession.
• The empirical analysis in Chapter 14 now focuses on forecasting the growth
rate of real GDP using the term spread, replacing the Phillips curve forecasts
from earlier editions.
• Several new empirical exercises have been added to each chapter. Rather
than include all of the empirical exercises in the text, we have moved many of
them to the Companion Website, www.pearsonhighered.com/stock_watson.
This has two main advantages: first, we can offer more and more in-depth
exercises, and second, we can add and update exercises between editions. We
encourage you to browse the empirical exercises available on the Companion
Website.
Features of This Book
Introduction to Econometrics differs from other textbooks in three main ways.
First, we integrate real-world questions and data into the development of the
theory, and we take seriously the substantive findings of the resulting empirical
analysis. Second, our choice of topics reflects modern theory and practice. Third,
we provide theory and assumptions that match the applications. Our aim is to
teach students to become sophisticated consumers of econometrics and to do so
at a level of mathematics appropriate for an introductory course.
Real-World Questions and Data
We organize each methodological topic around an important real-world question
that demands a specific numerical answer. For example, we teach single-variable
regression, multiple regression, and functional form analysis in the context of
estimating the effect of school inputs on school outputs. (Do smaller elementary
school class sizes produce higher test scores?) We teach panel data methods in the
context of analyzing the effect of drunk driving laws on traffic fatalities. We use
possible racial discrimination in the market for home loans as the empirical application for teaching regression with a binary dependent variable (logit and probit).
We teach instrumental variable estimation in the context of estimating the demand
elasticity for cigarettes. Although these examples involve economic reasoning, all
can be understood with only a single introductory course in economics, and many
can be understood without any previous economics coursework. Thus the instructor can focus on teaching econometrics, not microeconomics or macroeconomics.
We treat all our empirical applications seriously and in a way that shows
students how they can learn from data but at the same time be self-critical and
aware of the limitations of empirical analyses. Through each application, we teach
students to explore alternative specifications and thereby to assess whether their
substantive findings are robust. The questions asked in the empirical applications are important, and we provide serious and, we think, credible answers. We
encourage students and instructors to disagree, however, and invite them to reanalyze the data, which are provided on the textbook’s Companion Website (www.pearsonhighered.com/stock_watson).
Contemporary Choice of Topics
Econometrics has come a long way since the 1980s. The topics we cover reflect
the best of contemporary applied econometrics. One can only do so much in an
introductory course, so we focus on procedures and tests that are commonly used
in practice. For example:
• Instrumental variables regression. We present instrumental variables regression as a general method for handling correlation between the error term and
a regressor, which can arise for many reasons, including omitted variables
and simultaneous causality. The two assumptions for a valid instrument—
exogeneity and relevance—are given equal billing. We follow that presentation with an extended discussion of where instruments come from and with
tests of overidentifying restrictions and diagnostics for weak instruments,
and we explain what to do if these diagnostics suggest problems.
• Program evaluation. An increasing number of econometric studies analyze
either randomized controlled experiments or quasi-experiments, also known
as natural experiments. We address these topics, often collectively referred
to as program evaluation, in Chapter 13. We present this research strategy as
an alternative approach to the problems of omitted variables, simultaneous
causality, and selection, and we assess both the strengths and the weaknesses
of studies using experimental or quasi-experimental data.
• Forecasting. The chapter on forecasting (Chapter 14) considers univariate
(autoregressive) and multivariate forecasts using time series regression, not
large simultaneous equation structural models. We focus on simple and reliable tools, such as autoregressions and model selection via an information
criterion, that work well in practice. This chapter also features a practically
oriented treatment of stochastic trends (unit roots), unit root tests, tests for
structural breaks (at known and unknown dates), and pseudo out-of-sample
forecasting, all in the context of developing stable and reliable time series
forecasting models.
• Time series regression. We make a clear distinction between two very different applications of time series regression: forecasting and estimation of
dynamic causal effects. The chapter on causal inference using time series
data (Chapter 15) pays careful attention to when different estimation methods, including generalized least squares, will or will not lead to valid causal
inferences and when it is advisable to estimate dynamic regressions using
OLS with heteroskedasticity- and autocorrelation-consistent standard errors.
Theory That Matches Applications
Although econometric tools are best motivated by empirical applications, students need to learn enough econometric theory to understand the strengths and
limitations of those tools. We provide a modern treatment in which the fit between
theory and applications is as tight as possible, while keeping the mathematics at a
level that requires only algebra.
Modern empirical applications share some common characteristics: The data
sets typically are large (hundreds of observations, often more); regressors are
not fixed over repeated samples but rather are collected by random sampling (or
some other mechanism that makes them random); the data are not normally distributed; and there is no a priori reason to think that the errors are homoskedastic
(although often there are reasons to think that they are heteroskedastic).
These observations lead to important differences between the theoretical
development in this textbook and other textbooks:
• Large-sample approach. Because data sets are large, from the outset we use
large-sample normal approximations to sampling distributions for hypothesis
testing and confidence intervals. In our experience, it takes less time to teach
the rudiments of large-sample approximations than to teach the Student
t and exact F distributions, degrees-of-freedom corrections, and so forth.
This large-sample approach also saves students the frustration of discovering that, because of nonnormal errors, the exact distribution theory they just
mastered is irrelevant. Once taught in the context of the sample mean, the
large-sample approach to hypothesis testing and confidence intervals carries
directly through multiple regression analysis, logit and probit, instrumental
variables estimation, and time series methods.
• Random sampling. Because regressors are rarely fixed in econometric applications, from the outset we treat data on all variables (dependent and independent) as the result of random sampling. This assumption matches our
initial applications to cross-sectional data, it extends readily to panel and time
series data, and because of our large-sample approach, it poses no additional
conceptual or mathematical difficulties.
• Heteroskedasticity. Applied econometricians routinely use heteroskedasticity-robust standard errors to eliminate worries about whether heteroskedasticity
is present or not. In this book, we move beyond treating heteroskedasticity as an
exception or a “problem” to be “solved”; instead, we allow for heteroskedasticity
from the outset and simply use heteroskedasticity-robust standard errors.
We present homoskedasticity as a special case that provides a theoretical
motivation for OLS.
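To make the last point concrete, the following Python sketch compares a homoskedasticity-only standard error with one common form of the heteroskedasticity-robust standard error for the slope in a regression with a single regressor, and then forms a large-sample 95% confidence interval from the normal critical value 1.96. The function names, the toy data, and the specific degrees-of-freedom adjustment (n − 2) are assumptions made for illustration, not a definitive implementation.

```python
import math


def fit_ols(x, y):
    """OLS fit of y = b0 + b1*x; returns (b0, b1, residuals)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    return b0, b1, resid


def slope_standard_errors(x, resid):
    """Return (homoskedasticity-only SE, heteroskedasticity-robust SE) of the slope."""
    n = len(x)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dev)
    # Homoskedasticity-only: a single error variance estimate for all observations.
    s2 = sum(u * u for u in resid) / (n - 2)
    se_homo = math.sqrt(s2 / sxx)
    # Robust: each squared residual is weighted by its own squared deviation of x.
    numerator = sum(d * d * u * u for d, u in zip(dev, resid)) / (n - 2)
    denominator = (sxx / n) ** 2
    se_robust = math.sqrt(numerator / denominator / n)
    return se_homo, se_robust


def large_sample_ci(estimate, se, z=1.96):
    """Large-sample confidence interval based on the normal approximation."""
    return estimate - z * se, estimate + z * se


if __name__ == "__main__":
    x = [0, 1, 2, 3]          # toy data, made up for illustration
    y = [1, 3, 2, 6]
    b0, b1, resid = fit_ols(x, y)
    se_homo, se_robust = slope_standard_errors(x, resid)
    print("slope:", b1)
    print("homoskedasticity-only SE:", se_homo)
    print("heteroskedasticity-robust SE:", se_robust)
    print("95% CI (robust):", large_sample_ci(b1, se_robust))
```

With only four observations the normal approximation is of course not reliable; the toy data are there only so the arithmetic can be followed by hand.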
Skilled Producers, Sophisticated Consumers
We hope that students using this book will become sophisticated consumers of empirical analysis. To do so, they must learn not only how to use the tools of regression
analysis but also how to assess the validity of empirical analyses presented to them.
Our approach to teaching how to assess an empirical study is threefold. First,
immediately after introducing the main tools of regression analysis, we devote
Chapter 9 to the threats to internal and external validity of an empirical study.
This chapter discusses data problems and issues of generalizing findings to other
settings. It also examines the main threats to regression analysis, including omitted variables, functional form misspecification, errors-in-variables, selection, and
simultaneity—and ways to recognize these threats in practice.
Second, we apply these methods for assessing empirical studies to the empirical analysis of the ongoing examples in the book. We do so by considering alternative specifications and by systematically addressing the various threats to validity
of the analyses presented in the book.
Third, to become sophisticated consumers, students need firsthand experience as producers. Active learning beats passive learning, and econometrics is
an ideal course for active learning. For this reason, the textbook website features
data sets, software, and suggestions for empirical exercises of different scopes.
Approach to Mathematics and Level of Rigor
Our aim is for students to develop a sophisticated understanding of the tools of
modern regression analysis, whether the course is taught at a “high” or a “low”
level of mathematics. Parts I through IV of the text (which cover the substantive
material) are accessible to students with only precalculus mathematics. Parts I
through IV have fewer equations and more applications than many introductory
econometrics books and far fewer equations than books aimed at mathematical sections of undergraduate courses. But more equations do not imply a more
sophisticated treatment. In our experience, a more mathematical treatment does
not lead to a deeper understanding for most students.
That said, different students learn differently, and for mathematically well-prepared students, learning can be enhanced by a more explicitly mathematical
treatment. Part V therefore contains an introduction to econometric theory that
is appropriate for students with a stronger mathematical background. When the
mathematical chapters in Part V are used in conjunction with the material in Parts
I through IV, this book is suitable for advanced undergraduate or master’s level
econometrics courses.
Contents and Organization
There are five parts to Introduction to Econometrics. This textbook assumes that
the student has had a course in probability and statistics, although we review that
material in Part I. We cover the core material of regression analysis in Part II. Parts
III, IV, and V present additional topics that build on the core treatment in Part II.
Part I
Chapter 1 introduces econometrics and stresses the importance of providing
quantitative answers to quantitative questions. It discusses the concept of causality in statistical studies and surveys the different types of data encountered in
econometrics. Material from probability and statistics is reviewed in Chapters 2
and 3, respectively; whether these chapters are taught in a given course or are
simply provided as a reference depends on the background of the students.
Part II
Chapter 4 introduces regression with a single regressor and ordinary least squares
(OLS) estimation, and Chapter 5 discusses hypothesis tests and confidence intervals in the regression model with a single regressor. In Chapter 6, students learn
how they can address omitted variable bias using multiple regression, thereby estimating the effect of one independent variable while holding other independent
variables constant. Chapter 7 covers hypothesis tests, including F-tests, and confidence intervals in multiple regression. In Chapter 8, the linear regression model is
extended to models with nonlinear population regression functions, with a focus
on regression functions that are linear in the parameters (so that the parameters
can be estimated by OLS). In Chapter 9, students step back and learn how to
identify the strengths and limitations of regression studies, seeing in the process
how to apply the concepts of internal and external validity.
Part III
Part III presents extensions of regression methods. In Chapter 10, students learn
how to use panel data to control for unobserved variables that are constant over
time. Chapter 11 covers regression with a binary dependent variable. Chapter 12
shows how instrumental variables regression can be used to address a variety of
problems that produce correlation between the error term and the regressor, and
examines how one might find and evaluate valid instruments. Chapter 13 introduces students to the analysis of data from experiments and quasi-, or natural,
experiments, topics often referred to as “program evaluation.”
Part IV
Part IV takes up regression with time series data. Chapter 14 focuses on forecasting and introduces various modern tools for analyzing time series regressions,
such as unit root tests and tests for stability. Chapter 15 discusses the use of time
series data to estimate causal relations. Chapter 16 presents some more advanced
tools for time series analysis, including models of conditional heteroskedasticity.
Part V
Part V is an introduction to econometric theory. This part is more than an appendix
that fills in mathematical details omitted from the text. Rather, it is a self-contained
treatment of the econometric theory of estimation and inference in the linear regression
model. Chapter 17 develops the theory of regression analysis for a single regressor;
the exposition does not use matrix algebra, although it does demand a higher level of
mathematical sophistication than the rest of the text. Chapter 18 presents and studies
the multiple regression model, instrumental variables regression, and generalized
method of moments estimation of the linear model, all in matrix form.
Prerequisites Within the Book
Because different instructors like to emphasize different material, we wrote this
book with diverse teaching preferences in mind. To the maximum extent possible,
the chapters in Parts III, IV, and V are “stand-alone” in the sense that they do
not require first teaching all the preceding chapters. The specific prerequisites for
each chapter are described in Table I. Although we have found that the sequence
of topics adopted in the textbook works well in our own courses, the chapters
are written in a way that allows instructors to present topics in a different order
if they so desire.
Sample Courses
This book accommodates several different course structures.
TABLE I  Guide to Prerequisites for Special-Topic Chapters in Parts III, IV, and V

                               Prerequisite parts or chapters
             Part I   Part II           Part III                  Part IV                            Part V
Chapter      1–3      4–7, 9    8       10.1, 10.2   12.1, 12.2   14.1–14.4   14.5–14.8   15         17
10           Xᵃ       Xᵃ        X
11           Xᵃ       Xᵃ        X
12.1, 12.2   Xᵃ       Xᵃ        X
12.3–12.6    Xᵃ       Xᵃ        X       X            X
13           Xᵃ       Xᵃ        X       X            X
14           Xᵃ       Xᵃ        ᵇ
15           Xᵃ       Xᵃ        ᵇ                                 X
16           Xᵃ       Xᵃ        ᵇ                                 X           X           X
17           X        X         X
18           X        X         X                                                                    X
This table shows the minimum prerequisites needed to cover the material in a given chapter. For example, estimation of dynamic
causal effects with time series data (Chapter 15) first requires Part I (as needed, depending on student preparation, and except as
noted in footnote a), Part II (except for Chapter 8; see footnote b), and Sections 14.1 through 14.4.
ᵃ Chapters 10 through 16 use exclusively large-sample approximations to sampling distributions, so the optional Sections 3.6 (the
Student t distribution for testing means) and 5.6 (the Student t distribution for testing regression coefficients) can be skipped.
ᵇ Chapters 14 through 16 (the time series chapters) can be taught without first teaching Chapter 8 (nonlinear regression functions)
if the instructor pauses to explain the use of logarithmic transformations to approximate percentage changes.
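The logarithmic approximation mentioned in the second footnote can be checked numerically. The short Python sketch below (function names are hypothetical and the numbers are made up) compares the exact percentage change with 100 times the change in the natural logarithm; the two are close for small changes and drift apart as the change grows.

```python
import math


def exact_pct_change(old, new):
    """Exact percentage change from old to new."""
    return 100.0 * (new - old) / old


def log_pct_change(old, new):
    """Approximate percentage change: 100 times the change in the natural log."""
    return 100.0 * (math.log(new) - math.log(old))


if __name__ == "__main__":
    for new in (101.0, 105.0, 120.0, 150.0):
        exact = exact_pct_change(100.0, new)
        approx = log_pct_change(100.0, new)
        print(f"exact {exact:6.2f}%   log approximation {approx:6.2f}%")
```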
Standard Introductory Econometrics
This course introduces econometrics (Chapter 1) and reviews probability and statistics as needed (Chapters 2 and 3). It then moves on to regression with a single
regressor, multiple regression, the basics of functional form analysis, and the
evaluation of regression studies (all Part II). The course proceeds to cover regression with panel data (Chapter 10), regression with a limited dependent variable
(Chapter 11), and instrumental variables regression (Chapter 12), as time permits.
The course concludes with experiments and quasi-experiments in Chapter 13,
topics that provide an opportunity to return to the questions of estimating causal
effects raised at the beginning of the semester and to recapitulate core regression
methods. Prerequisites: Algebra II and introductory statistics.
Introductory Econometrics with Time Series and
Forecasting Applications
Like a standard introductory course, this course covers all of Part I (as needed)
and Part II. Optionally, the course next provides a brief introduction to panel data
(Sections 10.1 and 10.2) and takes up instrumental variables regression (Chapter 12,
or just Sections 12.1 and 12.2). The course then proceeds to Part IV, covering
forecasting (Chapter 14) and estimation of dynamic causal effects (Chapter 15). If
time permits, the course can include some advanced topics in time series analysis
such as volatility clustering and conditional heteroskedasticity (Section 16.5).
Prerequisites: Algebra II and introductory statistics.
Applied Time Series Analysis and Forecasting
This book also can be used for a short course on applied time series and forecasting, for which a course on regression analysis is a prerequisite. Some time is spent
reviewing the tools of basic regression analysis in Part II, depending on student
preparation. The course then moves directly to Part IV and works through forecasting (Chapter 14), estimation of dynamic causal effects (Chapter 15), and advanced
topics in time series analysis (Chapter 16), including vector autoregressions and
conditional heteroskedasticity. An important component of this course is hands-on
forecasting exercises, available to instructors on the book’s accompanying website.
Prerequisites: Algebra II and basic introductory econometrics or the equivalent.
Introduction to Econometric Theory
This book is also suitable for an advanced undergraduate course in which the
students have a strong mathematical preparation or for a master’s level course in
econometrics. The course briefly reviews the theory of statistics and probability as
necessary (Part I). The course introduces regression analysis using the nonmathematical, applications-based treatment of Part II. This introduction is followed by
the theoretical development in Chapters 17 and 18 (through Section 18.5). The
course then takes up regression with a limited dependent variable (Chapter 11)
and maximum likelihood estimation (Appendix 11.2). Next, the course optionally
turns to instrumental variables regression and generalized method of moments
(Chapter 12 and Section 18.7), time series methods (Chapter 14), and the estimation of causal effects using time series data and generalized least squares (Chapter
15 and Section 18.6). Prerequisites: Calculus and introductory statistics. Chapter 18
assumes previous exposure to matrix algebra.
Pedagogical Features
This textbook has a variety of pedagogical features aimed at helping students
understand, retain, and apply the essential ideas. Chapter introductions provide
real-world grounding and motivation, as well as brief road maps highlighting
the sequence of the discussion. Key terms are boldfaced and defined in context
throughout each chapter, and Key Concept boxes at regular intervals recap the
central ideas. General interest boxes provide interesting excursions into related
topics and highlight real-world studies that use the methods or concepts being
discussed in the text. A Summary concluding each chapter serves as a helpful
framework for reviewing the main points of coverage. The questions in the
Review the Concepts section check students’ understanding of the core content,
Exercises give more intensive practice working with the concepts and techniques
introduced in the chapter, and Empirical Exercises allow students to apply what
they have learned to answer real-world empirical questions. At the end of the
textbook, the Appendix provides statistical tables, the References section lists
sources for further reading, and a Glossary conveniently defines many key terms
in the book.
Supplements to Accompany the Textbook
The online supplements accompanying the third edition update of Introduction to
Econometrics include the Instructor’s Resource Manual, Test Bank, and PowerPoint® slides with text figures, tables, and Key Concepts. The Instructor’s Resource
Manual includes solutions to all the end-of-chapter exercises, while the Test
Bank, offered in Testgen, provides a rich supply of easily edited test problems and
questions of various types to meet specific course needs. These resources are available for download from the Instructor’s Resource Center at www.pearsonhighered.com/stock_watson.
Companion Website
The Companion Website, found at www.pearsonhighered.com/stock_watson,
provides a wide range of additional resources for students and faculty. These
resources include additional and more in-depth empirical exercises, data sets for the
empirical exercises, replication files for empirical results reported in the text,
practice quizzes, answers to end-of-chapter Review the Concepts questions and
Exercises, and EViews tutorials.
MyEconLab
The third edition update is accompanied by a robust MyEconLab course. The
MyEconLab course includes all the Review the Concepts questions as well as
some Exercises and Empirical Exercises. In addition, the enhanced eText available in MyEconLab for the third edition update includes URL links from the
Exercises and Empirical Exercises to questions in the MyEconLab course and to
the data that accompanies them. To register for MyEconLab and to learn more,
log on to www.myeconlab.com.
Acknowledgments
A great many people contributed to the first edition of this book. Our biggest
debts of gratitude are to our colleagues at Harvard and Princeton who used early
drafts of this book in their classrooms. At Harvard’s Kennedy School of Government, Suzanne Cooper provided invaluable suggestions and detailed comments
on multiple drafts. As a coteacher with one of the authors (Stock), she also helped
vet much of the material in this book while it was being developed for a required
course for master’s students at the Kennedy School. We are also indebted to two
other Kennedy School colleagues, Alberto Abadie and Sue Dynarski, for their
patient explanations of quasi-experiments and the field of program evaluation
and for their detailed comments on early drafts of the text. At Princeton, Eli
Tamer taught from an early draft and also provided helpful comments on the
penultimate draft of the book.
We also owe much to many of our friends and colleagues in econometrics
who spent time talking with us about the substance of this book and who collectively made so many helpful suggestions. Bruce Hansen (University of Wisconsin–Madison) and Bo Honore (Princeton) provided helpful feedback on very early
outlines and preliminary versions of the core material in Part II. Joshua Angrist
(MIT) and Guido Imbens (University of California, Berkeley) provided thoughtful suggestions about our treatment of materials on program evaluation. Our
presentation of the material on time series has benefited from discussions with
Yacine Ait-Sahalia (Princeton), Graham Elliott (University of California, San
Diego), Andrew Harvey (Cambridge University), and Christopher Sims (Princeton).
Finally, many people made helpful suggestions on parts of the manuscript close to
their area of expertise: Don Andrews (Yale), John Bound (University of Michigan),
Gregory Chow (Princeton), Thomas Downes (Tufts), David Drukker (StataCorp.),
Jean Baldwin Grossman (Princeton), Eric Hanushek (Hoover Institution), James
Heckman (University of Chicago), Han Hong (Princeton), Caroline Hoxby
(Harvard), Alan Krueger (Princeton), Steven Levitt (University of Chicago), Richard
Light (Harvard), David Neumark (Michigan State University), Joseph Newhouse
(Harvard), Pierre Perron (Boston University), Kenneth Warner (University of
Michigan), and Richard Zeckhauser (Harvard).
Many people were very generous in providing us with data. The California test score data were constructed with the assistance of Les Axelrod of the
Standards and Assessments Division, California Department of Education. We
are grateful to Charlie DePascale, Student Assessment Services, Massachusetts
Department of Education, for his help with aspects of the Massachusetts test
score data set. Christopher Ruhm (University of North Carolina, Greensboro)
graciously provided us with his data set on drunk driving laws and traffic fatalities. The research department at the Federal Reserve Bank of Boston deserves
thanks for putting together its data on racial discrimination in mortgage lending;
we particularly thank Geoffrey Tootell for providing us with the updated version
of the data set we use in Chapter 9 and Lynn Browne for explaining its policy
context. We thank Jonathan Gruber (MIT) for sharing his data on cigarette sales,
which we analyze in Chapter 12, and Alan Krueger (Princeton) for his help with
the Tennessee STAR data that we analyze in Chapter 13.
We thank several people for carefully checking the page proof for errors.
Kerry Griffin and Yair Listokin read the entire manuscript, and Andrew Fraker,
Ori Heffetz, Amber Henry, Hong Li, Alessandro Tarozzi, and Matt Watson
worked through several chapters.
In the first edition, we benefited from the help of an exceptional development
editor, Jane Tufts, whose creativity, hard work, and attention to detail improved
the book in many ways, large and small. Pearson provided us with first-rate support, starting with our excellent editor, Sylvia Mallory, and extending through the
entire publishing team. Jane and Sylvia patiently taught us a lot about writing,
organization, and presentation, and their efforts are evident on every page of this
book. We extend our thanks to the superb Pearson team, who worked with us on
the second edition: Adrienne D’Ambrosio (senior acquisitions editor), Bridget
Page (associate media producer), Charles Spaulding (senior designer), Nancy
Fenton (managing editor) and her selection of Nancy Freihofer and Thompson
Steele Inc. who handled the entire production process, Heather McNally (supplements coordinator), and Denise Clinton (editor-in-chief). Finally, we had the
benefit of Kay Ueno’s skilled editing in the second edition. We are also grateful to the excellent third edition Pearson team of Adrienne D’Ambrosio, Nancy
Fenton, and Jill Kolongowski, as well as Mary Sanger, the project manager with
Nesbitt Graphics. We also wish to thank the Pearson team who worked on the
third edition update: Christina Masturzo, Carolyn Philips, Liz Napolitano, and
Heidi Allgair, project manager with Cenveo® Publisher Services.
We also received a great deal of help and suggestions from faculty, students,
and researchers as we prepared the third edition and its update. The changes
made in the third edition incorporate or reflect suggestions, corrections, comments, data, and help provided by a number of researchers and instructors: Donald Andrews (Yale University), Jushan Bai (Columbia), James Cobbe (Florida
State University), Susan Dynarski (University of Michigan), Nicole Eichelberger
(Texas Tech University), Boyd Fjeldsted (University of Utah), Martina Grunow,
Daniel Hamermesh (University of Texas–Austin), Keisuke Hirano (University
of Arizona), Bo Honore (Princeton University), Guido Imbens (Harvard University), Manfred Keil (Claremont McKenna College), David Laibson (Harvard
University), David Lee (Princeton University), Brigitte Madrian (Harvard University), Jorge Marquez (University of Maryland), Karen Bennett Mathis (Florida Department of Citrus), Alan Mehlenbacher (University of Victoria), Ulrich
Müller (Princeton University), Serena Ng (Columbia University), Harry Patrinos
(World Bank), Zhuan Pei (Brandeis University), Peter Summers (Texas Tech
University), Andrey Vasnov (University of Sydney), and Douglas Young (Montana State University). We also benefited from student input from F. Hoces dela
Guardia and Carrie Wilson.
Thoughtful reviews for the third edition were prepared for Addison-Wesley
by Steve DeLoach (Elon University), Jeffrey DeSimone (University of Texas at
Arlington), Gary V. Engelhardt (Syracuse University), Luca Flabbi (Georgetown
University), Steffen Habermalz (Northwestern University), Carolyn J. Heinrich
(University of Wisconsin–Madison), Emma M. Iglesias-Vazquez (Michigan State
University), Carlos Lamarche (University of Oklahoma), Vicki A. McCracken
(Washington State University), Claudiney M. Pereira (Tulane University), and
John T. Warner (Clemson University). We also received very helpful input on
draft revisions of Chapters 7 and 10 from John Berdell (DePaul University), Janet
Kohlhase (University of Houston), Aprajit Mahajan (Stanford University), Xia
Meng (Brandeis University), and Chan Shen (Georgetown University).
Above all, we are indebted to our families for their endurance throughout this
project. Writing this book took a long time, and for them, the project must have
seemed endless. They, more than anyone else, bore the burden of this commitment, and for their help and support we are deeply grateful.
Chapter 1 Economic Questions and Data
Ask a half dozen econometricians what econometrics is, and you could get a half
dozen different answers. One might tell you that econometrics is the science of
testing economic theories. A second might tell you that econometrics is the set of
tools used for forecasting future values of economic variables, such as a firm’s sales,
the overall growth of the economy, or stock prices. Another might say that econometrics is the process of fitting mathematical economic models to real-world data.
A fourth might tell you that it is the science and art of using historical data to make
numerical, or quantitative, policy recommendations in government and business.
In fact, all these answers are right. At a broad level, econometrics is the science
and art of using economic theory and statistical techniques to analyze economic
data. Econometric methods are used in many branches of economics, including
finance, labor economics, macroeconomics, microeconomics, marketing, and economic policy. Econometric methods are also commonly used in other social sciences, including political science and sociology.
This book introduces you to the core set of methods used by econometricians.
We will use these methods to answer a variety of specific, quantitative questions
from the worlds of business and government policy. This chapter poses four of those
questions and discusses, in general terms, the econometric approach to answering
them. The chapter concludes with a survey of the main types of data available to
econometricians for answering these and other quantitative economic questions.
1.1 Economic Questions We Examine
Many decisions in economics, business, and government hinge on understanding
relationships among variables in the world around us. These decisions require
quantitative answers to quantitative questions.
This book examines several quantitative questions taken from current issues
in economics. Four of these questions concern education policy, racial bias in
mortgage lending, cigarette consumption, and macroeconomic forecasting.
Question #1: Does Reducing Class Size Improve
Elementary School Education?
Proposals for reform of the U.S. public education system generate heated debate.
Many of the proposals concern the youngest students, those in elementary schools.
Elementary school education has various objectives, such as developing social
skills, but for many parents and educators, the most important objective is basic
academic learning: reading, writing, and basic mathematics. One prominent proposal for improving basic learning is to reduce class sizes at elementary schools.
With fewer students in the classroom, the argument goes, each student gets more
of the teacher’s attention, there are fewer class disruptions, learning is enhanced,
and grades improve.
But what, precisely, is the effect on elementary school education of reducing
class size? Reducing class size costs money: It requires hiring more teachers and,
if the school is already at capacity, building more classrooms. A decision maker
contemplating hiring more teachers must weigh these costs against the benefits.
To weigh costs and benefits, however, the decision maker must have a precise
quantitative understanding of the likely benefits. Is the beneficial effect on basic
learning of smaller classes large or small? Is it possible that smaller class size actually has no effect on basic learning?
Although common sense and everyday experience may suggest that more
learning occurs when there are fewer students, common sense cannot provide a
quantitative answer to the question of what exactly is the effect on basic learning
of reducing class size. To provide such an answer, we must examine empirical
evidence—that is, evidence based on data—relating class size to basic learning in
elementary schools.
In this book, we examine the relationship between class size and basic learning, using data gathered from 420 California school districts in 1999. In the California data, students in districts with small class sizes tend to perform better on
standardized tests than students in districts with larger classes. While this fact is
consistent with the idea that smaller classes produce better test scores, it might
simply reflect many other advantages that students in districts with small classes
have over their counterparts in districts with large classes. For example, districts
with small class sizes tend to have wealthier residents than districts with large
classes, so students in small-class districts could have more opportunities for
learning outside the classroom. It could be these extra learning opportunities that
lead to higher test scores, not smaller class sizes. In Part II, we use multiple regression analysis to isolate the effect of changes in class size from changes in other
factors, such as the economic background of the students.
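The concern in the preceding paragraph can be illustrated with a small simulation. The sketch below is not the California data and does not use multiple regression; it swaps in a simpler within-group comparison. Every number in it (the score levels, the share of wealthy districts, the link between wealth and class size) is a hypothetical assumption chosen for illustration. Class size is given no causal effect at all, yet small-class districts still score higher on average because they tend to be wealthier.

```python
import random
from statistics import mean

def simulate_districts(n=420, seed=0):
    """Simulate hypothetical districts in which class size has NO effect on scores."""
    rng = random.Random(seed)
    districts = []
    for _ in range(n):
        wealthy = rng.random() < 0.5
        # Assumption: wealthy districts are more likely to have small classes.
        small_class = rng.random() < (0.8 if wealthy else 0.2)
        # Scores depend on wealth only; the class-size effect is zero by construction.
        score = 650 + (20 if wealthy else 0) + rng.gauss(0, 5)
        districts.append((wealthy, small_class, score))
    return districts

def gap(districts):
    """Average score in small-class districts minus average in large-class districts."""
    small = [score for _, small_class, score in districts if small_class]
    large = [score for _, small_class, score in districts if not small_class]
    return mean(small) - mean(large)

districts = simulate_districts()
raw_gap = gap(districts)                                   # mixes wealth and class size
gap_wealthy = gap([d for d in districts if d[0]])          # wealth held constant
gap_other = gap([d for d in districts if not d[0]])        # wealth held constant

print(f"raw gap: {raw_gap:.1f}")
print(f"gap among wealthy districts: {gap_wealthy:.1f}")
print(f"gap among other districts: {gap_other:.1f}")
```

Comparing districts of similar wealth removes most of the raw gap in this simulation. Multiple regression pursues the same idea of holding other factors constant, but does so with several control variables at once.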
Question #2: Is There Racial Discrimination
in the Market for Home Loans?
Most people buy their homes with the help of a mortgage, a large loan secured by
the value of the home. By law, U.S. lending institutions cannot take race into
account when deciding to grant or deny a request for a mortgage: Applicants who
are identical in all ways except their race should be equally likely to have their
mortgage applications approved. In theory, then, there should be no racial bias in
mortgage lending.
In contrast to this theoretical conclusion, researchers at the Federal Reserve
Bank of Boston found (using data from the early 1990s) that 28% of black applicants are denied mortgages, while only 9% of white applicants are denied. Do
these data indicate that, in practice, there is racial bias in mortgage lending? If so,
how large is it?
The fact that more black than white applicants are denied in the Boston Fed
data does not by itself provide evidence of discrimination by mortgage lenders
because the black and white applicants differ in many ways other than their race.
Before concluding that there is bias in the mortgage market, these data must be
examined more closely to see if there is a difference in the probability of being
denied for otherwise identical applicants and, if so, whether this difference is
large or small. To do so, in Chapter 11 we introduce econometric methods that
make it possible to quantify the effect of race on the chance of obtaining a mortgage, holding constant other applicant characteristics, notably their ability to
repay the loan.
Question #3: How Much Do Cigarette Taxes
Reduce Smoking?
Cigarette smoking is a major public health concern worldwide. Many of the costs
of smoking, such as the medical expenses of caring for those made sick by smoking
and the less quantifiable costs to nonsmokers who prefer not to breathe secondhand
cigarette smoke, are borne by other members of society. Because these costs are
borne by people other than the smoker, there is a role for government intervention
in reducing cigarette consumption. One of the most flexible tools for cutting
consumption is to increase taxes on cigarettes.
Basic economics says that if cigarette prices go up, consumption will go down.
But by how much? If the sales price goes up by 1%, by what percentage will the
quantity of cigarettes sold decrease? The percentage change in the quantity
demanded resulting from a 1% increase in price is the price elasticity of demand.
If we want to reduce smoking by a certain amount, say 20%, by raising taxes, then
we need to know the price elasticity of demand to calculate the price increase
necessary to achieve this reduction in consumption. But what is the price elasticity
of demand for cigarettes?
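To see why the elasticity matters for this calculation, the arithmetic can be sketched as follows. The function name and the elasticity values below are hypothetical and are not estimates from the text. The sketch also assumes the elasticity stays constant over the whole price change, which is only an approximation for changes as large as 20%.

```python
def required_price_increase(target_reduction_pct, elasticity):
    """Percentage price increase needed for a given percentage fall in quantity.

    Uses the approximation
        % change in quantity = elasticity x % change in price,
    where elasticity is negative for a downward-sloping demand curve.
    """
    if elasticity >= 0:
        raise ValueError("expected a negative price elasticity of demand")
    return -target_reduction_pct / elasticity

# Hypothetical elasticities, NOT estimates from the text.
for elasticity in (-0.5, -1.0, -2.0):
    increase = required_price_increase(20.0, elasticity)
    print(f"elasticity {elasticity}: raise price by {increase:.0f}%")
```

The less responsive demand is to price (the closer the elasticity is to zero), the larger the price increase needed to achieve the same 20% reduction.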
Although economic theory provides us with the concepts that help us answer
this question, it does not tell us the numerical value of the price elasticity of
demand. To learn the elasticity, we must examine empirical evidence about the
behavior of smokers and potential smokers; in other words, we need to analyze
data on cigarette consumption and prices.
The data we examine are cigarette sales, prices, taxes, and personal income
for U.S. states in the 1980s and 1990s. In these data, states with low taxes, and thus
low cigarette prices, have high smoking rates, and states with high prices have low
smoking rates. However, the analysis of these data is complicated because causality runs both ways: Low taxes lead to high demand, but if there are many smokers
in the state, then local politicians might try to keep cigarette taxes low to satisfy
their smoking constituents. In Chapter 12, we study methods for handling this
“simultaneous causality” and use those methods to estimate the price elasticity of
cigarette demand.
Question #4: By How Much Will U.S. GDP
Grow Next Year?
It seems that people always want a sneak preview of the future. What will sales be
next year at a firm that is considering investing in new equipment? Will the stock
market go up next month, and, if it does, by how much? Will city tax receipts next
year cover planned expenditures on city services? Will your microeconomics
exam next week focus on externalities or monopolies? Will Saturday be a nice day
to go to the beach?
One aspect of the future in which macroeconomists are particularly interested
is the growth of real economic activity, as measured by real gross domestic product
(GDP), during the next year. A management consulting firm might advise a manufacturing client to expand its capacity based on an upbeat forecast of economic
growth. Economists at the Federal Reserve Board in Washington, D.C., are mandated to set policy to keep real GDP near its potential in order to maximize
employment. If they forecast anemic GDP growth over the next year, they might
expand liquidity in the economy by reducing interest rates or other measures, in
an attempt to boost economic activity.
Professional economists who rely on precise numerical forecasts use econometric models to make those forecasts. A forecaster’s job is to predict the future
by using the past, and econometricians do this by using economic theory and
statistical techniques to quantify relationships in historical data.
The data we use to forecast the growth rate of GDP are past values of GDP
and the “term spread” in the United States. The term spread is the difference
between long-term and short-term interest rates. It measures, among other things,
whether investors expect short-term interest rates to rise or fall in the future. The
term spread is usually positive, but it tends to fall sharply before the onset of a
recession. One of the GDP growth rate forecasts we develop and evaluate in
Chapter 14 is based on the term spread.
Quantitative Questions, Quantitative Answers
Each of these four questions requires a numerical answer. Economic theory provides clues about that answer—for example, cigarette consumption ought to go
down when the price goes up—but the actual value of the number must be learned
empirically, that is, by analyzing data. Because we use data to answer quantitative
questions, our answers always have some uncertainty: A different set of data
would produce a different numerical answer. Therefore, the conceptual framework for the analysis needs to provide both a numerical answer to the question
and a measure of how precise the answer is.
The conceptual framework used in this book is the multiple regression model,
the mainstay of econometrics. This model, introduced in Part II, provides a mathematical way to quantify how a change in one variable affects another variable,
holding other things constant. For example, what effect does a change in class size
have on test scores, holding constant or controlling for student characteristics (such
as family income) that a school district administrator cannot control? What effect
does your race have on your chances of having a mortgage application granted,
holding constant other factors such as your ability to repay the loan? What effect
does a 1% increase in the price of cigarettes have on cigarette consumption, holding constant the income of smokers and potential smokers? The multiple regression model and its extensions provide a framework for answering these questions
using data and for quantifying the uncertainty associated with those answers.
1.2 Causal Effects and Idealized Experiments
Like many other questions encountered in econometrics, the first three questions
in Section 1.1 concern causal relationships among variables. In common usage, an
action is said to cause an outcome if the outcome is the direct result, or consequence,
of that action. Touching a hot stove causes you to get burned; drinking water
causes you to be less thirsty; putting air in your tires causes them to inflate; putting
fertilizer on your tomato plants causes them to produce more tomatoes. Causality
means that a specific action (applying fertilizer) leads to a specific, measurable
consequence (more tomatoes).
Estimation of Causal Effects
How best might we measure the causal effect on tomato yield (measured in kilograms) of applying a certain amount of fertilizer, say 100 grams of fertilizer per
square meter?
One way to measure this causal effect is to conduct an experiment. In that
experiment, a horticultural researcher plants many plots of tomatoes. Each plot
is tended identically, with one exception: Some plots get 100 grams of fertilizer
per square meter, while the rest get none. Moreover, whether a plot is fertilized
or not is determined randomly by a computer, ensuring that any other differences
between the plots are unrelated to whether they receive fertilizer. At the end of
the growing season, the horticulturalist weighs the harvest from each plot. The
difference between the average yield per square meter of the treated and
untreated plots is the effect on tomato production of the fertilizer treatment.
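The experiment just described can be mimicked in a short simulation. This is only a sketch: the yields, the size of the true effect, and the role of sunlight are hypothetical values built into the code, not results from any real experiment. It shows that, with random assignment, the difference in average yields between treated and untreated plots comes close to the effect built into the simulation.

```python
import random
from statistics import mean

def run_experiment(n_plots=1000, true_effect=2.0, seed=1):
    """Simulate a randomized fertilizer experiment and return the estimated effect.

    Yields are in hypothetical kilograms per square meter; true_effect is the
    causal effect of fertilizer that the simulation builds in.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_plots):
        sunlight = rng.gauss(0, 1)        # plot characteristic, unrelated to treatment
        fertilized = rng.random() < 0.5   # random assignment by the "computer"
        plot_yield = 5.0 + sunlight + rng.gauss(0, 0.5)
        if fertilized:
            plot_yield += true_effect
        (treated if fertilized else control).append(plot_yield)
    # Difference between average yields of treated and untreated plots.
    return mean(treated) - mean(control)

estimate = run_experiment()
print(f"estimated effect: {estimate:.2f} (true effect built in: 2.00)")
```

Because assignment does not depend on sunlight, sunnier plots are spread roughly evenly across the two groups, and their influence averages out of the difference.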
This is an example of a randomized controlled experiment. It is controlled in
the sense that there are both a control group that receives no treatment (no fertilizer) and a treatment group that receives the treatment (100 g/m² of fertilizer). It
is randomized in the sense that the treatment is assigned randomly. This random
assignment eliminates the possibility of a systematic relationship between, for
example, how sunny the plot is and whether it receives fertilizer so that the only
systematic difference between the treatment and control groups is the treatment.
If this experiment is properly implemented on a large enough scale, then it will
yield an estimate of the causal effect on the outcome of interest (tomato production) of the treatment (applying 100 g/m² of fertilizer).
In this book, the causal effect is defined to be the effect on an outcome of a
given action or treatment, as measured in an ideal randomized controlled experiment. In such an experiment, the only systematic reason for differences in outcomes between the treatment and control groups is the treatment itself.
It is possible to imagine an ideal randomized controlled experiment to answer
each of the first three questions in Section 1.1. For example, to study class size,
one can imagine randomly assigning “treatments” of different class sizes to different groups of students. If the experiment is designed and executed so that the only
systematic difference between the groups of students is their class size, then in
theory this experiment would estimate the effect on test scores of reducing class
size, holding all else constant.
The concept of an ideal randomized controlled experiment is useful because
it gives a definition of a causal effect. In practice, however, it is not possible to
perform ideal experiments. In fact, experiments are relatively rare in econometrics because often they are unethical, impossible to execute satisfactorily, or prohibitively expensive. The concept of the ideal randomized controlled experiment
does, however, provide a theoretical benchmark for an econometric analysis of
causal effects using actual data.
Forecasting and Causality
Although the first three questions in Section 1.1 concern causal effects, the
fourth—forecasting the growth rate of GDP—does not. You do not need to know
a causal relationship to make a good forecast. A good way to “forecast” whether
it is raining is to observe whether pedestrians are using umbrellas, but the act of
using an umbrella does not cause it to rain.
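The umbrella example can be made concrete with a simulation in which rain drives umbrella use and never the reverse. The probabilities below are hypothetical assumptions. The sketch shows two things: umbrellas predict rain well, and forcing everyone to carry an umbrella leaves the share of rainy days unchanged, which holds here by construction.

```python
import random

def simulate_days(n_days=10_000, seed=2, force_umbrellas=False):
    """Simulate (rain, umbrellas) pairs in which rain drives umbrella use.

    With force_umbrellas=True everyone carries an umbrella whatever the
    weather, which mimics intervening on the predictor.
    """
    rng = random.Random(seed)
    days = []
    for _ in range(n_days):
        rain = rng.random() < 0.3     # assumed chance of rain
        draw = rng.random()           # drawn either way, so the rain sequence is identical
        umbrellas = force_umbrellas or draw < (0.9 if rain else 0.05)
        days.append((rain, umbrellas))
    return days

def rain_share(days):
    """Fraction of simulated days on which it rains."""
    return sum(rain for rain, _ in days) / len(days)

observed = simulate_days()
# "Forecast" rain whenever umbrellas are seen, and score the forecast.
accuracy = sum(rain == umbrellas for rain, umbrellas in observed) / len(observed)
forced = simulate_days(force_umbrellas=True)

print(f"forecast accuracy: {accuracy:.3f}")
print(f"share of rainy days, observed: {rain_share(observed):.3f}")
print(f"share of rainy days, umbrellas forced: {rain_share(forced):.3f}")
```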
Even though forecasting need not involve causal relationships, economic
theory suggests patterns and relationships that might be useful for forecasting. As
we see in Chapter 14, multiple regression analysis allows us to quantify historical
relationships suggested by economic theory, to check whether those relationships
have been stable over time, to make quantitative forecasts about the future, and
to assess the accuracy of those forecasts.
1.3 Data: Sources and Types
In econometrics, data come from one of two sources: experiments or nonexperimental observations of the world. This book examines both experimental and
nonexperimental data sets.
Experimental Versus Observational Data
Experimental data come from experiments designed to evaluate a treatment or
policy or to investigate a causal effect. For example, the state of Tennessee
financed a large randomized controlled experiment examining class size in the
1980s. In that experiment, which we examine in Chapter 13, thousands of students
were randomly assigned to classes of different sizes for several years and were
given standardized tests annually.
The Tennessee class size experiment cost millions of dollars and required the
ongoing cooperation of many administrators, parents, and teachers over several
years. Because real-world experiments with human subjects are difficult to administer and to control, they have flaws relative to ideal randomized controlled experiments. Moreover, in some circumstances, experiments are not only expensive and
difficult to administer but also unethical. (Would it be ethical to offer randomly
selected teenagers inexpensive cigarettes to see how many they buy?) Because of
these financial, practical, and ethical problems, experiments in economics are
relatively rare. Instead, most economic data are obtained by observing real-world
behavior.
Data obtained by observing actual behavior outside an experimental setting
are called observational data. Observational data are collected using surveys, such
as telephone surveys of consumers, and administrative records, such as historical
records on mortgage applications maintained by lending institutions.
Observational data pose major challenges to econometric attempts to estimate causal effects, and the tools of econometrics are designed to tackle these
challenges. In the real world, levels of “treatment” (the amount of fertilizer in the
tomato example, the student–teacher ratio in the class size example) are not
assigned at random, so it is difficult to sort out the effect of the “treatment” from
other relevant factors. Much of econometrics, and much of this book, is devoted
to methods for meeting the challenges encountered when real-world data are used
to estimate causal effects.
Whether the data are experimental or observational, data sets come in three
main types: cross-sectional data, time series data, and panel data. In this book, you
will encounter all three types.
Cross-Sectional Data
Data on different entities—workers, consumers, firms, governmental units, and
so forth—for a single time period are called cross-sectional data. For example, the
data on test scores in California school districts are cross sectional. Those data are
for 420 entities (school districts) for a single time period (1999). In general, the
number of entities on which we have observations is denoted n; so, for example,
in the California data set, n = 420.
The California test score data set contains measurements of several different
variables for each district. Some of these data are tabulated in Table 1.1. Each row
lists data for a different district. For example, the average test score for the first
district (“district #1”) is 690.8; this is the average of the math and science test scores
for all fifth graders in that district in 1999 on a standardized test (the Stanford
Achievement Test). The average student–teacher ratio in that district is 17.89; that
is, the number of students in district #1 divided by the number of classroom teachers
in district #1 is 17.89. Average expenditure per pupil in district #1 is $6385. The
percentage of students in that district still learning English (that is, the percentage
of students for whom English is a second language and who are not yet proficient
in English) is 0%.

TABLE 1.1 Selected Observations on Test Scores and Other Variables for California School Districts in 1999

Observation (District)   District Average Test   Student–Teacher   Expenditure per   Percentage of Students
Number                   Score (fifth grade)     Ratio             Pupil ($)         Learning English
1                        690.8                   17.89             $6385             0.0%
2                        661.2                   21.52             5099              4.6
3                        643.6                   18.70             5502              30.0
4                        647.7                   17.36             7102              0.0
5                        640.8                   18.67             5236              13.9
...                      ...                     ...               ...               ...
418                      645.0                   21.89             4403              24.3
419                      672.2                   20.20             4776              3.0
420                      655.8                   19.04             5993              5.0

Note: The California test score data set is described in Appendix 4.1.

The remaining rows present data for other districts. The order of the rows is
arbitrary, and the number of the district, which is called the observation number,
is an arbitrarily assigned number that organizes the data. As you can see in the
table, all the variables listed vary considerably.
With cross-sectional data, we can learn about relationships among variables
by studying differences across people, firms, or other economic entities during a
single time period.
Time Series Data
Time series data are data for a single entity (person, firm, country) collected at
multiple time periods. Our data set on the growth rate of GDP and the term
spread in the United States is an example of a time series data set. The data set
contains observations on two variables (the growth rate of GDP and the term
spread) for a single entity (the United States) for 213 time periods. Each time
period in this data set is a quarter of a year (the first quarter is January, February, and March; the second quarter is April, May, and June; and so forth). The
observations in this data set begin in the first quarter of 1960, which is denoted
1960:Q1, and end in the first quarter of 2013 (2013:Q1). The number of observations (that is, time periods) in a time series data set is denoted T. Because
there are 213 quarters from 1960:Q1 to 2013:Q1, this data set contains T = 213
observations.

TABLE 1.2 Selected Observations on the Growth Rate of GDP and the Term Spread in the United States: Quarterly Data, 1960:Q1–2013:Q1

Observation   Date             GDP Growth Rate         Term Spread
Number        (year:quarter)   (% at an annual rate)   (% per year)
1             1960:Q1          8.8%                    0.6%
2             1960:Q2          −1.5                    1.3
3             1960:Q3          1.0                     1.5
4             1960:Q4          −4.9                    1.6
5             1961:Q1          2.7                     1.4
...           ...              ...                     ...
211           2012:Q3          2.7                     1.5
212           2012:Q4          0.1                     1.6
213           2013:Q1          1.1                     1.9

Note: The United States GDP and term spread data set is described in Appendix 14.1.

Some observations in this data set are listed in Table 1.2. The data in each row
correspond to a different time period (year and quarter). In the first quarter of
1960, for example, GDP grew 8.8% at an annual rate. In other words, if GDP
had continued growing for four quarters at its rate during the first quarter of 1960,
the level of GDP would have increased by 8.8%. In the first quarter of 1960, the
long-term interest rate was 4.5%, the short-term interest rate was 3.9%, so their
difference, the term spread, was 0.6%.
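The arithmetic in this paragraph can be sketched in Python. The compounding formula below follows the verbal description above (growth sustained for four quarters); it is an assumption about how the annual rate is formed, since the data set's exact construction is described in Appendix 14.1 rather than here. The function names are hypothetical.

```python
def annualize(quarterly_rate):
    """Annual growth implied by growing at quarterly_rate for four quarters."""
    return (1 + quarterly_rate) ** 4 - 1


def implied_quarterly(annual_rate):
    """Quarterly growth rate that compounds to annual_rate over four quarters."""
    return (1 + annual_rate) ** 0.25 - 1


def term_spread(long_rate, short_rate):
    """Term spread: long-term interest rate minus short-term interest rate."""
    return long_rate - short_rate


# 1960:Q1: GDP grew 8.8% at an annual rate (rates as decimals here)
q = implied_quarterly(0.088)
print(round(100 * annualize(q), 1))  # 8.8

# 1960:Q1: long-term rate 4.5%, short-term rate 3.9% (rates in percent here)
print(round(term_spread(4.5, 3.9), 1))  # 0.6
```

Under this compounding assumption, the quarter-over-quarter growth behind the 8.8% figure is a little over 2%, which is why quarterly growth is usually quoted at an annual rate for comparability.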
By tracking a single entity over time, time series data can be used to study the
evolution of variables over time and to forecast future values of those variables.
TABLE 1.3 Selected Observations on Cigarette Sales, Prices, and Taxes, by State and Year for U.S.
States, 1985–1995

Observation  State          Year  Cigarette Sales     Average Price per Pack  Total Taxes (cigarette
Number                            (packs per capita)  (including taxes)       excise tax + sales tax)
1            Alabama        1985  116.5               $1.022                  $0.333
2            Arkansas       1985  128.5               1.015                   0.370
3            Arizona        1985  104.5               1.086                   0.362
...          ...            ...   ...                 ...                     ...
47           West Virginia  1985  112.8               1.089                   0.382
48           Wyoming        1985  129.4               0.935                   0.240
49           Alabama        1986  117.2               1.080                   0.334
...          ...            ...   ...                 ...                     ...
96           Wyoming        1986  127.8               1.007                   0.240
97           Alabama        1987  115.8               1.135                   0.335
...          ...            ...   ...                 ...                     ...
528          Wyoming        1995  112.2               1.585                   0.360

Note: The cigarette consumption data set is described in Appendix 12.1.
Panel Data
Panel data, also called longitudinal data, are data for multiple entities in which
each entity is observed at two or more time periods. Our data on cigarette consumption and prices are an example of a panel data set, and selected variables and
observations in that data set are listed in Table 1.3. The number of entities in a
panel data set is denoted n, and the number of time periods is denoted T. In the
cigarette data set, we have observations on n = 48 continental U.S. states
(entities) for T = 11 years (time periods) from 1985 to 1995. Thus there is a total
of n × T = 48 × 11 = 528 observations.
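As a rough check on this count and on the observation numbers in Table 1.3, the sketch below assumes the observations are stacked year by year (all 48 states for 1985, then all 48 for 1986, and so on), which is how the table is laid out. The function name and the use of a state's position in the list (1 for Alabama, 48 for Wyoming) are assumptions made for illustration.

```python
N_STATES = 48                      # n: entities in the panel
FIRST_YEAR, LAST_YEAR = 1985, 1995
T = LAST_YEAR - FIRST_YEAR + 1     # T: number of annual time periods


def observation_number(state_position, year):
    """Row number when the panel is stacked one year after another.

    state_position is the state's place within a year's block
    (1 = Alabama, ..., 48 = Wyoming); a hypothetical convention
    matching the layout of Table 1.3.
    """
    return (year - FIRST_YEAR) * N_STATES + state_position


print(T)                             # 11
print(N_STATES * T)                  # 528
print(observation_number(1, 1986))   # 49  (Alabama, 1986)
print(observation_number(48, 1995))  # 528 (Wyoming, 1995)
```

The same formula reproduces the other rows shown in the table, for example observation 97 for Alabama in 1987.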
KEY CONCEPT 1.1 Cross-Sectional, Time Series, and Panel Data
• Cross-sectional data consist of multiple entities observed at a single time
period.
• Time series data consist of a single entity observed at multiple time periods.
• Panel data (also known as longitudinal data) consist of multiple entities,
where each entity is observed at two or more time periods.
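One way to make these three definitions concrete is to classify a data set by its number of entities and number of time periods. The function below is a minimal sketch under that reading; its name and the label for the degenerate one-entity, one-period case are assumptions, not terminology from the text.

```python
def data_structure(n_entities, n_periods):
    """Classify a data set by its number of entities and time periods."""
    if n_entities < 1 or n_periods < 1:
        raise ValueError("need at least one entity and one time period")
    if n_entities > 1 and n_periods == 1:
        return "cross-sectional"
    if n_entities == 1 and n_periods > 1:
        return "time series"
    if n_entities > 1 and n_periods > 1:
        return "panel"
    return "single observation"  # one entity, one period: not covered above


print(data_structure(1, 213))   # time series (U.S. GDP and term spread)
print(data_structure(48, 11))   # panel (cigarette consumption by state)
print(data_structure(100, 1))   # cross-sectional (hypothetical 100 entities)
```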
Some data from the cigarette consumption data set are listed in Table 1.3. The
first block of 48 observations lists the data for each state in 1985, organized alphabetica...