Quantitative finance, EViews, Target


NaavrNaqGvooref

Mathematics

Description

What factors influence Target's sales? Collect related data and analyze it.

We must use EViews and some of the academic methods from the textbook.
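The task calls for relating sales to candidate factors with regression methods. EViews itself is not shown here. As a hedged cross-check written in Python, the sketch below computes the summary statistics and the OLS regression with heteroskedasticity-robust standard errors that the Methods & Data Collections part of the attached instructions asks for. Everything in it is an assumption: the data are simulated (not Target figures), the regressor names `income` and `price_index` are hypothetical placeholders, and the robust standard errors use the HC1 sandwich formula, one common degrees-of-freedom-adjusted choice.

```python
import numpy as np


def describe(series):
    """Summary statistics for each named 1-D series: n, mean, sample SD, min, max."""
    rows = {}
    for name, x in series.items():
        x = np.asarray(x, dtype=float)
        rows[name] = (x.size, x.mean(), x.std(ddof=1), x.min(), x.max())
    return rows


def ols_robust(y, X):
    """OLS of y on X (plus an intercept) with HC1 heteroskedasticity-robust SEs.

    Returns (beta, se); index 0 is the intercept, then one entry per column of X.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    Z = np.column_stack([np.ones(n), X])        # design matrix with a constant
    k1 = Z.shape[1]                             # number of coefficients incl. intercept
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    beta = ZtZ_inv @ (Z.T @ y)                  # OLS: (Z'Z)^(-1) Z'y
    u = y - Z @ beta                            # OLS residuals
    meat = Z.T @ (Z * (u ** 2)[:, None])        # sum over i of u_i^2 * z_i z_i'
    cov = (n / (n - k1)) * ZtZ_inv @ meat @ ZtZ_inv  # HC1 sandwich estimator
    return beta, np.sqrt(np.diag(cov))


# Synthetic data only (NOT Target figures); regressor names are hypothetical.
rng = np.random.default_rng(0)
n = 500
income = rng.uniform(30, 70, n)                 # hypothetical driver 1
price_index = rng.uniform(90, 110, n)           # hypothetical driver 2
noise = rng.normal(0.0, 1.0 + 0.05 * income)    # error spread grows with income
sales = 10 + 0.8 * income - 0.3 * price_index + noise

stats = describe({"sales": sales, "income": income, "price_index": price_index})
for name, row in stats.items():
    print(name, "n=%d mean=%.2f sd=%.2f min=%.2f max=%.2f" % row)

beta, se = ols_robust(sales, np.column_stack([income, price_index]))
for name, b, s in zip(["const", "income", "price_index"], beta, se):
    print("%-12s coef=%8.3f  robust se=%6.3f  t=%6.2f" % (name, b, s, b / s))
```

With real data, replace the simulated arrays with the collected series and compare the coefficients with the corresponding EViews estimates as a consistency check; the robust standard errors may differ slightly depending on which finite-sample adjustment each program applies.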

Unformatted Attachment Preview

Research Paper Instructions, AD 685

Structure – word count
• Abstract – 150 words
• Introduction – 300 words
• Literature review – 600 words
• Methods & Data Collections – 250 words
• Interpretation and Discussion – 250 words
• Conclusion – 300 words
• MAX: 2000 words

Structure – in detail
• Abstract – An extremely short summary of your entire paper
• Introduction – A summary of your entire paper: why you chose this topic, why it is interesting, why it is important, what previous researchers have done on the topic, what you did, and what more can be done
• Literature review – What previous researchers have done on this topic, including areas of agreement and disagreement
• Methods & Data Collections – How you collected the data and what type of analysis you did; report summary statistics, regression results, scatter plots, diagrams, and anything else you think is important
• Interpretation and Discussion – Interpret your results and discuss how they match up with previous research on this topic
• Conclusion – Summarize the paper, say what else could be done on this topic, and state what you know now that you did not know before
• MAX: 2000 words

Resources:
• APA Style Guide: https://owl.purdue.edu/owl/research_and_citation/apa_style/apa_formatting_and_style_guide/apa_sample_paper.html
• Writing Tips: http://www.tulane.edu/~bfleury/termpaper.html
• Boston University Writing Center: http://www.bu.edu/writingprogram/the-writing-center/
• Boston University Writing and Reference Guide: http://library.bu.edu/citeys

Introduction to Econometrics, Third Edition Update
James H. Stock, Harvard University
Mark W. Watson, Princeton University
The Pearson Series in Economics. Copyright © 2015, 2011, 2007 Pearson Education, Inc.
ISBN-10: 0-13-348687-7; ISBN-13: 978-0-13-348687-2
Brief Contents

Part One Introduction and Review
Chapter 1 Economic Questions and Data
Chapter 2 Review of Probability
Chapter 3 Review of Statistics

Part Two Fundamentals of Regression Analysis
Chapter 4 Linear Regression with One Regressor
Chapter 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals
Chapter 6 Linear Regression with Multiple Regressors
Chapter 7 Hypothesis Tests and Confidence Intervals in Multiple Regression
Chapter 8 Nonlinear Regression Functions
Chapter 9 Assessing Studies Based on Multiple Regression

Part Three Further Topics in Regression Analysis
Chapter 10 Regression with Panel Data
Chapter 11 Regression with a Binary Dependent Variable
Chapter 12 Instrumental Variables Regression
Chapter 13 Experiments and Quasi-Experiments

Part Four Regression Analysis of Economic Time Series Data
Chapter 14 Introduction to Time Series Regression and Forecasting
Chapter 15 Estimation of Dynamic Causal Effects
Chapter 16 Additional Topics in Time Series Regression

Part Five The Econometric Theory of Regression Analysis
Chapter 17 The Theory of Linear Regression with One Regressor
Chapter 18 The Theory of Multiple Regression

Contents

Preface

Part One Introduction and Review

Chapter 1 Economic Questions and Data
1.1 Economic Questions We Examine
  Question #1: Does Reducing Class Size Improve Elementary School Education?
  Question #2: Is There Racial Discrimination in the Market for Home Loans?
  Question #3: How Much Do Cigarette Taxes Reduce Smoking?
  Question #4: By How Much Will U.S. GDP Grow Next Year?
  Quantitative Questions, Quantitative Answers
1.2 Causal Effects and Idealized Experiments
  Estimation of Causal Effects
  Forecasting and Causality
1.3 Data: Sources and Types
  Experimental Versus Observational Data
  Cross-Sectional Data
  Time Series Data
  Panel Data

Chapter 2 Review of Probability
2.1 Random Variables and Probability Distributions
  Probabilities, the Sample Space, and Random Variables
  Probability Distribution of a Discrete Random Variable
  Probability Distribution of a Continuous Random Variable
2.2 Expected Values, Mean, and Variance
  The Expected Value of a Random Variable
  The Standard Deviation and Variance
  Mean and Variance of a Linear Function of a Random Variable
  Other Measures of the Shape of a Distribution
2.3 Two Random Variables
  Joint and Marginal Distributions
  Conditional Distributions
  Independence
  Covariance and Correlation
  The Mean and Variance of Sums of Random Variables
2.4 The Normal, Chi-Squared, Student t, and F Distributions
  The Normal Distribution
  The Chi-Squared Distribution
  The Student t Distribution
  The F Distribution
2.5 Random Sampling and the Distribution of the Sample Average
  Random Sampling
  The Sampling Distribution of the Sample Average
2.6 Large-Sample Approximations to Sampling Distributions
  The Law of Large Numbers and Consistency
  The Central Limit Theorem
Appendix 2.1 Derivation of Results in Key Concept 2.3

Chapter 3 Review of Statistics
3.1 Estimation of the Population Mean
  Estimators and Their Properties
  Properties of Ȳ
  The Importance of Random Sampling
3.2 Hypothesis Tests Concerning the Population Mean
  Null and Alternative Hypotheses
  The p-Value
  Calculating the p-Value When σY Is Known
  The Sample Variance, Sample Standard Deviation, and Standard Error
  Calculating the p-Value When σY Is Unknown
  The t-Statistic
  Hypothesis Testing with a Prespecified Significance Level
  One-Sided Alternatives
3.3 Confidence Intervals for the Population Mean
3.4 Comparing Means from Different Populations
  Hypothesis Tests for the Difference Between Two Means
  Confidence Intervals for the Difference Between Two Population Means
3.5 Differences-of-Means Estimation of Causal Effects Using Experimental Data
  The Causal Effect as a Difference of Conditional Expectations
  Estimation of the Causal Effect Using Differences of Means
3.6 Using the t-Statistic When the Sample Size Is Small
  The t-Statistic and the Student t Distribution
  Use of the Student t Distribution in Practice
3.7 Scatterplots, the Sample Covariance, and the Sample Correlation
  Scatterplots
  Sample Covariance and Correlation
Appendix 3.1 The U.S. Current Population Survey
Appendix 3.2 Two Proofs That Ȳ Is the Least Squares Estimator of μY
Appendix 3.3 A Proof That the Sample Variance Is Consistent

Part Two Fundamentals of Regression Analysis

Chapter 4 Linear Regression with One Regressor
4.1 The Linear Regression Model
4.2 Estimating the Coefficients of the Linear Regression Model
  The Ordinary Least Squares Estimator
  OLS Estimates of the Relationship Between Test Scores and the Student–Teacher Ratio
  Why Use the OLS Estimator?
4.3 Measures of Fit
  The R²
  The Standard Error of the Regression
  Application to the Test Score Data
4.4 The Least Squares Assumptions
  Assumption #1: The Conditional Distribution of uᵢ Given Xᵢ Has a Mean of Zero
  Assumption #2: (Xᵢ, Yᵢ), i = 1, …, n, Are Independently and Identically Distributed
  Assumption #3: Large Outliers Are Unlikely
  Use of the Least Squares Assumptions
4.5 Sampling Distribution of the OLS Estimators
  The Sampling Distribution of the OLS Estimators
4.6 Conclusion
Appendix 4.1 The California Test Score Data Set
Appendix 4.2 Derivation of the OLS Estimators
Appendix 4.3 Sampling Distribution of the OLS Estimator

Chapter 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals
5.1 Testing Hypotheses About One of the Regression Coefficients
  Two-Sided Hypotheses Concerning β₁
  One-Sided Hypotheses Concerning β₁
  Testing Hypotheses About the Intercept β₀
5.2 Confidence Intervals for a Regression Coefficient
5.3 Regression When X Is a Binary Variable
  Interpretation of the Regression Coefficients
5.4 Heteroskedasticity and Homoskedasticity
  What Are Heteroskedasticity and Homoskedasticity?
  Mathematical Implications of Homoskedasticity
  What Does This Mean in Practice?
5.5 The Theoretical Foundations of Ordinary Least Squares
  Linear Conditionally Unbiased Estimators and the Gauss–Markov Theorem
  Regression Estimators Other Than OLS
5.6 Using the t-Statistic in Regression When the Sample Size Is Small
  The t-Statistic and the Student t Distribution
  Use of the Student t Distribution in Practice
5.7 Conclusion
Appendix 5.1 Formulas for OLS Standard Errors
Appendix 5.2 The Gauss–Markov Conditions and a Proof of the Gauss–Markov Theorem

Chapter 6 Linear Regression with Multiple Regressors
6.1 Omitted Variable Bias
  Definition of Omitted Variable Bias
  A Formula for Omitted Variable Bias
  Addressing Omitted Variable Bias by Dividing the Data into Groups
6.2 The Multiple Regression Model
  The Population Regression Line
  The Population Multiple Regression Model
6.3 The OLS Estimator in Multiple Regression
  The OLS Estimator
  Application to Test Scores and the Student–Teacher Ratio
6.4 Measures of Fit in Multiple Regression
  The Standard Error of the Regression (SER)
  The R²
  The "Adjusted R²"
  Application to Test Scores
6.5 The Least Squares Assumptions in Multiple Regression
  Assumption #1: The Conditional Distribution of uᵢ Given X₁ᵢ, X₂ᵢ, …, Xₖᵢ Has a Mean of Zero
  Assumption #2: (X₁ᵢ, X₂ᵢ, …, Xₖᵢ, Yᵢ), i = 1, …, n, Are i.i.d.
  Assumption #3: Large Outliers Are Unlikely
  Assumption #4: No Perfect Multicollinearity
6.6 The Distribution of the OLS Estimators in Multiple Regression
6.7 Multicollinearity
  Examples of Perfect Multicollinearity
  Imperfect Multicollinearity
6.8 Conclusion
Appendix 6.1 Derivation of Equation (6.1)
Appendix 6.2 Distribution of the OLS Estimators When There Are Two Regressors and Homoskedastic Errors
Appendix 6.3 The Frisch–Waugh Theorem

Chapter 7 Hypothesis Tests and Confidence Intervals in Multiple Regression
7.1 Hypothesis Tests and Confidence Intervals for a Single Coefficient
  Standard Errors for the OLS Estimators
  Hypothesis Tests for a Single Coefficient
  Confidence Intervals for a Single Coefficient
  Application to Test Scores and the Student–Teacher Ratio
7.2 Tests of Joint Hypotheses
  Testing Hypotheses on Two or More Coefficients
  The F-Statistic
  Application to Test Scores and the Student–Teacher Ratio
  The Homoskedasticity-Only F-Statistic
7.3 Testing Single Restrictions Involving Multiple Coefficients
7.4 Confidence Sets for Multiple Coefficients
7.5 Model Specification for Multiple Regression
  Omitted Variable Bias in Multiple Regression
  The Role of Control Variables in Multiple Regression
  Model Specification in Theory and in Practice
  Interpreting the R² and the Adjusted R² in Practice
7.6 Analysis of the Test Score Data Set
7.7 Conclusion
Appendix 7.1 The Bonferroni Test of a Joint Hypothesis
Appendix 7.2 Conditional Mean Independence

Chapter 8 Nonlinear Regression Functions
8.1 A General Strategy for Modeling Nonlinear Regression Functions
  Test Scores and District Income
  The Effect on Y of a Change in X in Nonlinear Specifications
  A General Approach to Modeling Nonlinearities Using Multiple Regression
8.2 Nonlinear Functions of a Single Independent Variable
  Polynomials
  Logarithms
  Polynomial and Logarithmic Models of Test Scores and District Income
8.3 Interactions Between Independent Variables
  Interactions Between Two Binary Variables
  Interactions Between a Continuous and a Binary Variable
  Interactions Between Two Continuous Variables
8.4 Nonlinear Effects on Test Scores of the Student–Teacher Ratio
  Discussion of Regression Results
  Summary of Findings
8.5 Conclusion
Appendix 8.1 Regression Functions That Are Nonlinear in the Parameters
Appendix 8.2 Slopes and Elasticities for Nonlinear Regression Functions

Chapter 9 Assessing Studies Based on Multiple Regression
9.1 Internal and External Validity
  Threats to Internal Validity
  Threats to External Validity
9.2 Threats to Internal Validity of Multiple Regression Analysis
  Omitted Variable Bias
  Misspecification of the Functional Form of the Regression Function
  Measurement Error and Errors-in-Variables Bias
  Missing Data and Sample Selection
  Simultaneous Causality
  Sources of Inconsistency of OLS Standard Errors
9.3 Internal and External Validity When the Regression Is Used for Forecasting
  Using Regression Models for Forecasting
  Assessing the Validity of Regression Models for Forecasting
9.4 Example: Test Scores and Class Size
  External Validity
  Internal Validity
  Discussion and Implications
9.5 Conclusion
Appendix 9.1 The Massachusetts Elementary School Testing Data

Part Three Further Topics in Regression Analysis

Chapter 10 Regression with Panel Data
10.1 Panel Data
  Example: Traffic Deaths and Alcohol Taxes
10.2 Panel Data with Two Time Periods: "Before and After" Comparisons
10.3 Fixed Effects Regression
  The Fixed Effects Regression Model
  Estimation and Inference
  Application to Traffic Deaths
10.4 Regression with Time Fixed Effects
  Time Effects Only
  Both Entity and Time Fixed Effects
10.5 The Fixed Effects Regression Assumptions and Standard Errors for Fixed Effects Regression
  The Fixed Effects Regression Assumptions
  Standard Errors for Fixed Effects Regression
10.6 Drunk Driving Laws and Traffic Deaths
10.7 Conclusion
Appendix 10.1 The State Traffic Fatality Data Set
Appendix 10.2 Standard Errors for Fixed Effects Regression

Chapter 11 Regression with a Binary Dependent Variable
11.1 Binary Dependent Variables and the Linear Probability Model
  Binary Dependent Variables
  The Linear Probability Model
11.2 Probit and Logit Regression
  Probit Regression
  Logit Regression
  Comparing the Linear Probability, Probit, and Logit Models
11.3 Estimation and Inference in the Logit and Probit Models
  Nonlinear Least Squares Estimation
  Maximum Likelihood Estimation
  Measures of Fit
11.4 Application to the Boston HMDA Data
11.5 Conclusion
Appendix 11.1 The Boston HMDA Data Set
Appendix 11.2 Maximum Likelihood Estimation
Appendix 11.3 Other Limited Dependent Variable Models

Chapter 12 Instrumental Variables Regression
12.1 The IV Estimator with a Single Regressor and a Single Instrument
  The IV Model and Assumptions
  The Two Stage Least Squares Estimator
  Why Does IV Regression Work?
  The Sampling Distribution of the TSLS Estimator
  Application to the Demand for Cigarettes
12.2 The General IV Regression Model
  TSLS in the General IV Model
  Instrument Relevance and Exogeneity in the General IV Model
  The IV Regression Assumptions and Sampling Distribution of the TSLS Estimator
  Inference Using the TSLS Estimator
  Application to the Demand for Cigarettes
12.3 Checking Instrument Validity
  Assumption #1: Instrument Relevance
  Assumption #2: Instrument Exogeneity
12.4 Application to the Demand for Cigarettes
12.5 Where Do Valid Instruments Come From?
  Three Examples
12.6 Conclusion
Appendix 12.1 The Cigarette Consumption Panel Data Set
Appendix 12.2 Derivation of the Formula for the TSLS Estimator in Equation (12.4)
Appendix 12.3 Large-Sample Distribution of the TSLS Estimator
Appendix 12.4 Large-Sample Distribution of the TSLS Estimator When the Instrument Is Not Valid
Appendix 12.5 Instrumental Variables Analysis with Weak Instruments
Appendix 12.6 TSLS with Control Variables

Chapter 13 Experiments and Quasi-Experiments
13.1 Potential Outcomes, Causal Effects, and Idealized Experiments
  Potential Outcomes and the Average Causal Effect
  Econometric Methods for Analyzing Experimental Data
13.2 Threats to Validity of Experiments
  Threats to Internal Validity
  Threats to External Validity
13.3 Experimental Estimates of the Effect of Class Size Reductions
  Experimental Design
  Analysis of the STAR Data
  Comparison of the Observational and Experimental Estimates of Class Size Effects
13.4 Quasi-Experiments
  Examples
  The Differences-in-Differences Estimator
  Instrumental Variables Estimators
  Regression Discontinuity Estimators
13.5 Potential Problems with Quasi-Experiments
  Threats to Internal Validity
  Threats to External Validity
13.6 Experimental and Quasi-Experimental Estimates in Heterogeneous Populations
  OLS with Heterogeneous Causal Effects
  IV Regression with Heterogeneous Causal Effects
13.7 Conclusion
Appendix 13.1 The Project STAR Data Set
Appendix 13.2 IV Estimation When the Causal Effect Varies Across Individuals
Appendix 13.3 The Potential Outcomes Framework for Analyzing Data from Experiments

Part Four Regression Analysis of Economic Time Series Data

Chapter 14 Introduction to Time Series Regression and Forecasting
14.1 Using Regression Models for Forecasting
14.2 Introduction to Time Series Data and Serial Correlation
  Real GDP in the United States
  Lags, First Differences, Logarithms, and Growth Rates
  Autocorrelation
  Other Examples of Economic Time Series
14.3 Autoregressions
  The First-Order Autoregressive Model
  The pth-Order Autoregressive Model
14.4 Time Series Regression with Additional Predictors and the Autoregressive Distributed Lag Model
  Forecasting GDP Growth Using the Term Spread
  Stationarity
  Time Series Regression with Multiple Predictors
  Forecast Uncertainty and Forecast Intervals
14.5 Lag Length Selection Using Information Criteria
  Determining the Order of an Autoregression
  Lag Length Selection in Time Series Regression with Multiple Predictors
14.6 Nonstationarity I: Trends
  What Is a Trend?
  Problems Caused by Stochastic Trends
  Detecting Stochastic Trends: Testing for a Unit AR Root
  Avoiding the Problems Caused by Stochastic Trends
14.7 Nonstationarity II: Breaks
  What Is a Break?
  Testing for Breaks
  Pseudo Out-of-Sample Forecasting
  Avoiding the Problems Caused by Breaks
14.8 Conclusion
Appendix 14.1 Time Series Data Used in Chapter 14
Appendix 14.2 Stationarity in the AR(1) Model
Appendix 14.3 Lag Operator Notation
Appendix 14.4 ARMA Models
Appendix 14.5 Consistency of the BIC Lag Length Estimator

Chapter 15 Estimation of Dynamic Causal Effects
15.1 An Initial Taste of the Orange Juice Data
15.2 Dynamic Causal Effects
  Causal Effects and Time Series Data
  Two Types of Exogeneity
15.3 Estimation of Dynamic Causal Effects with Exogenous Regressors
  The Distributed Lag Model Assumptions
  Autocorrelated uₜ, Standard Errors, and Inference
  Dynamic Multipliers and Cumulative Dynamic Multipliers
15.4 Heteroskedasticity- and Autocorrelation-Consistent Standard Errors
  Distribution of the OLS Estimator with Autocorrelated Errors
  HAC Standard Errors
15.5 Estimation of Dynamic Causal Effects with Strictly Exogenous Regressors
  The Distributed Lag Model with AR(1) Errors
  OLS Estimation of the ADL Model
  GLS Estimation
  The Distributed Lag Model with Additional Lags and AR(p) Errors
15.6 Orange Juice Prices and Cold Weather
15.7 Is Exogeneity Plausible? Some Examples
  U.S. Income and Australian Exports
  Oil Prices and Inflation
  Monetary Policy and Inflation
  The Growth Rate of GDP and the Term Spread
15.8 Conclusion
Appendix 15.1 The Orange Juice Data Set
Appendix 15.2 The ADL Model and Generalized Least Squares in Lag Operator Notation

Chapter 16 Additional Topics in Time Series Regression
16.1 Vector Autoregressions
  The VAR Model
  A VAR Model of the Growth Rate of GDP and the Term Spread
16.2 Multiperiod Forecasts
  Iterated Multiperiod Forecasts
  Direct Multiperiod Forecasts
  Which Method Should You Use?
16.3 Orders of Integration and the DF-GLS Unit Root Test
  Other Models of Trends and Orders of Integration
  The DF-GLS Test for a Unit Root
  Why Do Unit Root Tests Have Nonnormal Distributions?
16.4 Cointegration
  Cointegration and Error Correction
  How Can You Tell Whether Two Variables Are Cointegrated?
  Estimation of Cointegrating Coefficients
  Extension to Multiple Cointegrated Variables
  Application to Interest Rates
16.5 Volatility Clustering and Autoregressive Conditional Heteroskedasticity
  Volatility Clustering
  Autoregressive Conditional Heteroskedasticity
  Application to Stock Price Volatility
16.6 Conclusion

Part Five The Econometric Theory of Regression Analysis

Chapter 17 The Theory of Linear Regression with One Regressor
17.1 The Extended Least Squares Assumptions and the OLS Estimator
  The Extended Least Squares Assumptions
  The OLS Estimator
17.2 Fundamentals of Asymptotic Distribution Theory
  Convergence in Probability and the Law of Large Numbers
  The Central Limit Theorem and Convergence in Distribution
  Slutsky's Theorem and the Continuous Mapping Theorem
  Application to the t-Statistic Based on the Sample Mean
17.3 Asymptotic Distribution of the OLS Estimator and t-Statistic
  Consistency and Asymptotic Normality of the OLS Estimators
  Consistency of Heteroskedasticity-Robust Standard Errors
  Asymptotic Normality of the Heteroskedasticity-Robust t-Statistic
17.4 Exact Sampling Distributions When the Errors Are Normally Distributed
  Distribution of β̂₁ with Normal Errors
  Distribution of the Homoskedasticity-Only t-Statistic
17.5 Weighted Least Squares
  WLS with Known Heteroskedasticity
  WLS with Heteroskedasticity of Known Functional Form
  Heteroskedasticity-Robust Standard Errors or WLS?
Appendix 17.1 The Normal and Related Distributions and Moments of Continuous Random Variables
Appendix 17.2 Two Inequalities

Chapter 18 The Theory of Multiple Regression
18.1 The Linear Multiple Regression Model and OLS Estimator in Matrix Form
  The Multiple Regression Model in Matrix Notation
  The Extended Least Squares Assumptions
  The OLS Estimator
18.2 Asymptotic Distribution of the OLS Estimator and t-Statistic
  The Multivariate Central Limit Theorem
  Asymptotic Normality of β̂
  Heteroskedasticity-Robust Standard Errors
  Confidence Intervals for Predicted Effects
  Asymptotic Distribution of the t-Statistic
18.3 Tests of Joint Hypotheses
  Joint Hypotheses in Matrix Notation
  Asymptotic Distribution of the F-Statistic
  Confidence Sets for Multiple Coefficients
18.4 Distribution of Regression Statistics with Normal Errors
  Matrix Representations of OLS Regression Statistics
  Distribution of β̂ with Normal Errors
  Distribution of s_û²
  Homoskedasticity-Only Standard Errors
  Distribution of the t-Statistic
  Distribution of the F-Statistic
18.5 Efficiency of the OLS Estimator with Homoskedastic Errors
  The Gauss–Markov Conditions for Multiple Regression
  Linear Conditionally Unbiased Estimators
  The Gauss–Markov Theorem for Multiple Regression
18.6 Generalized Least Squares
  The GLS Assumptions
  GLS When Ω Is Known
  GLS When Ω Contains Unknown Parameters
  The Zero Conditional Mean Assumption and GLS
18.7 Instrumental Variables and Generalized Method of Moments Estimation
  The IV Estimator in Matrix Form
  Asymptotic Distribution of the TSLS Estimator
  Properties of TSLS When the Errors Are Homoskedastic
  Generalized Method of Moments Estimation in Linear Models
Appendix 18.1 Summary of Matrix Algebra
Appendix 18.2 Multivariate Distributions
Appendix 18.3 Derivation of the Asymptotic Distribution of β̂
Appendix 18.4 Derivations of Exact Distributions of OLS Test Statistics with Normal Errors
Appendix 18.5 Proof of the Gauss–Markov Theorem for Multiple Regression
Appendix 18.6 Proof of Selected Results for IV and GMM Estimation

Appendix
References
Glossary
Index

Key Concepts

Part One Introduction and Review
1.1 Cross-Sectional, Time Series, and Panel Data
2.1 Expected Value and the Mean
2.2 Variance and Standard Deviation
2.3 Means, Variances, and Covariances of Sums of Random Variables
2.4 Computing Probabilities Involving Normal Random Variables
2.5 Simple Random Sampling and i.i.d. Random Variables
2.6 Convergence in Probability, Consistency, and the Law of Large Numbers
2.7 The Central Limit Theorem
3.1 Estimators and Estimates
3.2 Bias, Consistency, and Efficiency
3.3 Efficiency of Ȳ: Ȳ Is BLUE
3.4 The Standard Error of Ȳ
3.5 The Terminology of Hypothesis Testing
3.6 Testing the Hypothesis E(Y) = μY,0 Against the Alternative E(Y) ≠ μY,0
3.7 Confidence Intervals for the Population Mean

Part Two Fundamentals of Regression Analysis
4.1 Terminology for the Linear Regression Model with a Single Regressor
4.2 The OLS Estimator, Predicted Values, and Residuals
4.3 The Least Squares Assumptions
4.4 Large-Sample Distributions of β̂₀ and β̂₁
5.1 General Form of the t-Statistic
5.2 Testing the Hypothesis β₁ = β₁,₀ Against the Alternative β₁ ≠ β₁,₀
5.3 Confidence Interval for β₁
5.4 Heteroskedasticity and Homoskedasticity
5.5 The Gauss–Markov Theorem for β̂₁
6.1 Omitted Variable Bias in Regression with a Single Regressor
6.2 The Multiple Regression Model
6.3 The OLS Estimators, Predicted Values, and Residuals in the Multiple Regression Model
6.4 The Least Squares Assumptions in the Multiple Regression Model
6.5 Large-Sample Distribution of β̂₀, β̂₁, …, β̂ₖ
7.1 Testing the Hypothesis βⱼ = βⱼ,₀ Against the Alternative βⱼ ≠ βⱼ,₀
7.2 Confidence Intervals for a Single Coefficient in Multiple Regression
7.3 Omitted Variable Bias in Multiple Regression
7.4 R² and R̄²: What They Tell You, and What They Don't
8.1 The Expected Change on Y of a Change in X₁ in the Nonlinear Regression Model (8.3)
8.2 Logarithms in Regression: Three Cases
8.3 A Method for Interpreting Coefficients in Regressions with Binary Variables
8.4 Interactions Between Binary and Continuous Variables
8.5 Interactions in Multiple Regression
9.1 Internal and External Validity
9.2 Omitted Variable Bias: Should I Include More Variables in My Regression?
9.3 Functional Form Misspecification
9.4 Errors-in-Variables Bias
9.5 Sample Selection Bias
9.6 Simultaneous Causality Bias
9.7 Threats to the Internal Validity of a Multiple Regression Study

Part Three Further Topics in Regression Analysis
10.1 Notation for Panel Data
10.2 The Fixed Effects Regression Model
10.3 The Fixed Effects Regression Assumptions
11.1 The Linear Probability Model
11.2 The Probit Model, Predicted Probabilities, and Estimated Effects
11.3 Logit Regression
12.1 The General Instrumental Variables Regression Model and Terminology
12.2 Two Stage Least Squares
12.3 The Two Conditions for Valid Instruments
12.4 The IV Regression Assumptions
12.5 A Rule of Thumb for Checking for Weak Instruments
12.6 The Overidentifying Restrictions Test (The J-Statistic)

Part Four Regression Analysis of Economic Time Series Data
14.1 Lags, First Differences, Logarithms, and Growth Rates
14.2 Autocorrelation (Serial Correlation) and Autocovariance
14.3 Autoregressions
14.4 The Autoregressive Distributed Lag Model
14.5 Stationarity
14.6 Time Series Regression with Multiple Predictors
14.7 Granger Causality Tests (Tests of Predictive Content)
14.8 The Augmented Dickey–Fuller Test for a Unit Autoregressive Root
14.9 The
QLR Test for Coefficient Stability 566 14.10 Pseudo Out-of-Sample Forecasts 568 15.1 The Distributed Lag Model and Exogeneity 598 15.2 The Distributed Lag Model Assumptions 599 15.3 HAC Standard Errors 607 15.4 Estimation of Dynamic Multipliers Under Strict Exogeneity 616 16.1 Vector Autoregressions 639 16.2 Iterated Multiperiod Forecasts 646 16.3 Direct Multiperiod Forecasts 648 16.4 Orders of Integration, Differencing, and Stationarity 650 16.5 Cointegration 657 Part five 559 Regression Analysis of Economic Time Series Data 17.1 The Extended Least Squares Assumptions for Regression with a Single Regressor 678 18.1 The Extended Least Squares Assumptions in the Multiple Regression Model 707 18.2 The Multivariate Central Limit Theorem 711 18.3 Gauss–Markov Theorem for Multiple Regression 722 18.4 The GLS Assumptions 724 xxv This page intentionally left blank General Interest Boxes The Distribution of Earnings in the United States in 2012 33 A Bad Day on Wall Street 39 Financial Diversification and Portfolios 46 Landon Wins! 70 The Gender Gap of Earnings of College Graduates in the United States 86 A Novel Way to Boost Retirement Savings 90 The “Beta” of a Stock 120 The Economic Value of a Year of Education: Homoskedasticity or Heteroskedasticity? 162 The Mozart Effect: Omitted Variable Bias? 186 The Return to Education and the Gender Gap 287 The Demand for Economics Journals 290 Do Stock Mutual Funds Outperform the Market? 327 James Heckman and Daniel McFadden, Nobel Laureates 410 Who Invented Instrumental Variables Regression? 428 A Scary Regression 446 The Externalities of Smoking 450 The Hawthorne Effect 482 What Is the Effect on Employment of the Minimum Wage? 497 Can You Beat the Market? Part I 536 The River of Blood 546 Can You Beat the Market? 
Part II 570 Orange Trees on the March 623 NEWS FLASH: Commodity Traders Send Shivers Through Disney World 625 Nobel Laureates in Time Series Econometrics 669 xxvii This page intentionally left blank Preface E conometrics can be a fun course for both teacher and student. The real world of economics, business, and government is a complicated and messy place, full of competing ideas and questions that demand answers. Is it more effective to tackle drunk driving by passing tough laws or by increasing the tax on alcohol? Can you make money in the stock market by buying when prices are historically low, relative to earnings, or should you just sit tight, as the random walk theory of stock prices suggests? Can we improve elementary education by reducing class sizes, or should we simply have our children listen to Mozart for 10 minutes a day? Econometrics helps us sort out sound ideas from crazy ones and find quantitative answers to important quantitative questions. Econometrics opens a window on our complicated world that lets us see the relationships on which people, businesses, and governments base their decisions. Introduction to Econometrics is designed for a first course in undergraduate econometrics. It is our experience that to make econometrics relevant in an introductory course, interesting applications must motivate the theory and the theory must match the applications. This simple principle represents a significant departure from the older generation of econometrics books, in which theoretical models and assumptions do not match the applications. It is no wonder that some students question the relevance of econometrics after they spend much of their time learning assumptions that they subsequently realize are unrealistic so that they must then learn “solutions” to “problems” that arise when the applications do not match the assumptions. 
We believe that it is far better to motivate the need for tools with a concrete application and then to provide a few simple assumptions that match the application. Because the theory is immediately relevant to the applications, this approach can make econometrics come alive.

New to the Third Edition
• Updated treatment of standard errors for panel data regression
• Discussion of when and why missing data can present a problem for regression analysis
• The use of regression discontinuity design as a method for analyzing quasi-experiments
• Updated discussion of weak instruments
• Discussion of the use and interpretation of control variables integrated into the core development of regression analysis
• Introduction of the “potential outcomes” framework for experimental data
• Additional general interest boxes
• Additional exercises, both pencil-and-paper and empirical

This third edition builds on the philosophy of the first and second editions that applications should drive the theory, not the other way around.

One substantial change in this edition concerns inference in regression with panel data (Chapter 10). In panel data, the data within an entity typically are correlated over time. For inference to be valid, standard errors must be computed using a method that is robust to this correlation. The chapter on panel data now uses one such method, clustered standard errors, from the outset. Clustered standard errors are the natural extension to panel data of the heteroskedasticity-robust standard errors introduced in the initial treatment of regression analysis in Part II. Recent research has shown that clustered standard errors have a number of desirable properties, which are now discussed in Chapter 10 and in a revised appendix to Chapter 10.

Another substantial set of changes concerns the treatment of experiments and quasi-experiments in Chapter 13.
The discussion of differences-in-differences regression has been streamlined and draws directly on the multiple regression principles introduced in Part II. Chapter 13 now discusses regression discontinuity design, which is an intuitive and important framework for the analysis of quasi-experimental data. In addition, Chapter 13 now introduces the potential outcomes framework and relates this increasingly commonplace terminology to concepts that were introduced in Parts I and II.

This edition has a number of other significant changes. One is that it incorporates a precise but accessible treatment of control variables into the initial discussion of multiple regression. Chapter 7 now discusses the conditions under which control variables succeed, in the sense that the coefficient on the variable of interest is unbiased even though the coefficients on the control variables generally are not. Other changes include a new discussion of missing data in Chapter 9, a new optional calculus-based appendix to Chapter 8 on slopes and elasticities of nonlinear regression functions, and an updated discussion in Chapter 12 of what to do if you have weak instruments. This edition also includes new general interest boxes, updated empirical examples, and additional exercises.

The Updated Third Edition
• The time series data used in Chapters 14–16 have been extended through the beginning of 2013 and now include the Great Recession.
• The empirical analysis in Chapter 14 now focuses on forecasting the growth rate of real GDP using the term spread, replacing the Phillips curve forecasts from earlier editions.
• Several new empirical exercises have been added to each chapter. Rather than include all of the empirical exercises in the text, we have moved many of them to the Companion Website, www.pearsonhighered.com/stock_watson. This has two main advantages: first, we can offer more exercises, and more in-depth ones; second, we can add and update exercises between editions.
We encourage you to browse the empirical exercises available on the Companion Website.

Features of This Book
Introduction to Econometrics differs from other textbooks in three main ways. First, we integrate real-world questions and data into the development of the theory, and we take seriously the substantive findings of the resulting empirical analysis. Second, our choice of topics reflects modern theory and practice. Third, we provide theory and assumptions that match the applications. Our aim is to teach students to become sophisticated consumers of econometrics and to do so at a level of mathematics appropriate for an introductory course.

Real-World Questions and Data
We organize each methodological topic around an important real-world question that demands a specific numerical answer. For example, we teach single-variable regression, multiple regression, and functional form analysis in the context of estimating the effect of school inputs on school outputs. (Do smaller elementary school class sizes produce higher test scores?) We teach panel data methods in the context of analyzing the effect of drunk driving laws on traffic fatalities. We use possible racial discrimination in the market for home loans as the empirical application for teaching regression with a binary dependent variable (logit and probit). We teach instrumental variable estimation in the context of estimating the demand elasticity for cigarettes. Although these examples involve economic reasoning, all can be understood with only a single introductory course in economics, and many can be understood without any previous economics coursework. Thus the instructor can focus on teaching econometrics, not microeconomics or macroeconomics.

We treat all our empirical applications seriously and in a way that shows students how they can learn from data but at the same time be self-critical and aware of the limitations of empirical analyses.
Through each application, we teach students to explore alternative specifications and thereby to assess whether their substantive findings are robust. The questions asked in the empirical applications are important, and we provide serious and, we think, credible answers. We encourage students and instructors to disagree, however, and invite them to reanalyze the data, which are provided on the textbook’s Companion Website (www.pearsonhighered.com/stock_watson).

Contemporary Choice of Topics
Econometrics has come a long way since the 1980s. The topics we cover reflect the best of contemporary applied econometrics. One can only do so much in an introductory course, so we focus on procedures and tests that are commonly used in practice. For example:
• Instrumental variables regression. We present instrumental variables regression as a general method for handling correlation between the error term and a regressor, which can arise for many reasons, including omitted variables and simultaneous causality. The two assumptions for a valid instrument—exogeneity and relevance—are given equal billing. We follow that presentation with an extended discussion of where instruments come from and with tests of overidentifying restrictions and diagnostics for weak instruments, and we explain what to do if these diagnostics suggest problems.
• Program evaluation. An increasing number of econometric studies analyze either randomized controlled experiments or quasi-experiments, also known as natural experiments. We address these topics, often collectively referred to as program evaluation, in Chapter 13. We present this research strategy as an alternative approach to the problems of omitted variables, simultaneous causality, and selection, and we assess both the strengths and the weaknesses of studies using experimental or quasi-experimental data.
• Forecasting. The chapter on forecasting (Chapter 14) considers univariate (autoregressive) and multivariate forecasts using time series regression, not large simultaneous equation structural models. We focus on simple and reliable tools, such as autoregressions and model selection via an information criterion, that work well in practice. This chapter also features a practically oriented treatment of stochastic trends (unit roots), unit root tests, tests for structural breaks (at known and unknown dates), and pseudo out-of-sample forecasting, all in the context of developing stable and reliable time series forecasting models.
• Time series regression. We make a clear distinction between two very different applications of time series regression: forecasting and estimation of dynamic causal effects. The chapter on causal inference using time series data (Chapter 15) pays careful attention to when different estimation methods, including generalized least squares, will or will not lead to valid causal inferences and when it is advisable to estimate dynamic regressions using OLS with heteroskedasticity- and autocorrelation-consistent standard errors.

Theory That Matches Applications
Although econometric tools are best motivated by empirical applications, students need to learn enough econometric theory to understand the strengths and limitations of those tools. We provide a modern treatment in which the fit between theory and applications is as tight as possible, while keeping the mathematics at a level that requires only algebra.
Modern empirical applications share some common characteristics: The data sets typically are large (hundreds of observations, often more); regressors are not fixed over repeated samples but rather are collected by random sampling (or some other mechanism that makes them random); the data are not normally distributed; and there is no a priori reason to think that the errors are homoskedastic (although often there are reasons to think that they are heteroskedastic). These observations lead to important differences between the theoretical development in this textbook and other textbooks:
• Large-sample approach. Because data sets are large, from the outset we use large-sample normal approximations to sampling distributions for hypothesis testing and confidence intervals. In our experience, it takes less time to teach the rudiments of large-sample approximations than to teach the Student t and exact F distributions, degrees-of-freedom corrections, and so forth. This large-sample approach also saves students the frustration of discovering that, because of nonnormal errors, the exact distribution theory they just mastered is irrelevant. Once taught in the context of the sample mean, the large-sample approach to hypothesis testing and confidence intervals carries directly through multiple regression analysis, logit and probit, instrumental variables estimation, and time series methods.
• Random sampling. Because regressors are rarely fixed in econometric applications, from the outset we treat data on all variables (dependent and independent) as the result of random sampling. This assumption matches our initial applications to cross-sectional data, it extends readily to panel and time series data, and because of our large-sample approach, it poses no additional conceptual or mathematical difficulties.
• Heteroskedasticity. Applied econometricians routinely use heteroskedasticity-robust standard errors to eliminate worries about whether heteroskedasticity is present or not. In this book, we move beyond treating heteroskedasticity as an exception or a “problem” to be “solved”; instead, we allow for heteroskedasticity from the outset and simply use heteroskedasticity-robust standard errors. We present homoskedasticity as a special case that provides a theoretical motivation for OLS.

Skilled Producers, Sophisticated Consumers
We hope that students using this book will become sophisticated consumers of empirical analysis. To do so, they must learn not only how to use the tools of regression analysis but also how to assess the validity of empirical analyses presented to them.

Our approach to teaching how to assess an empirical study is threefold. First, immediately after introducing the main tools of regression analysis, we devote Chapter 9 to the threats to internal and external validity of an empirical study. This chapter discusses data problems and issues of generalizing findings to other settings. It also examines the main threats to regression analysis, including omitted variables, functional form misspecification, errors-in-variables, selection, and simultaneity, and ways to recognize these threats in practice. Second, we apply these methods for assessing empirical studies to the empirical analysis of the ongoing examples in the book. We do so by considering alternative specifications and by systematically addressing the various threats to validity of the analyses presented in the book. Third, to become sophisticated consumers, students need firsthand experience as producers. Active learning beats passive learning, and econometrics is an ideal course for active learning. For this reason, the textbook website features data sets, software, and suggestions for empirical exercises of different scopes.
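The heteroskedasticity-robust standard errors described above under “Heteroskedasticity,” and the clustered standard errors that this edition uses for panel data in Chapter 10, can both be illustrated numerically. The following is a minimal sketch, not taken from the book, assuming Python with NumPy; the function name `ols_with_ses` and the simulated data are hypothetical. The robust variance uses the sandwich form (X′X)⁻¹(Σ ûᵢ² xᵢxᵢ′)(X′X)⁻¹ scaled by n/(n − k), which is one common small-sample adjustment (often called HC1). The clustered version instead sums the score X′g ûg within each entity g before forming the middle matrix, and it applies no small-sample factor here.

```python
import numpy as np


def ols_with_ses(X, y, clusters=None):
    """OLS coefficients with homoskedasticity-only, robust, and clustered SEs.

    X: (n, k) regressor matrix that already contains a column of ones.
    y: (n,) dependent variable.
    clusters: optional (n,) array of entity labels (hypothetical panel ids).
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta  # OLS residuals u_hat

    # Homoskedasticity-only: s^2 (X'X)^{-1}, with s^2 = SSR / (n - k).
    s2 = resid @ resid / (n - k)
    se_homo = np.sqrt(np.diag(s2 * XtX_inv))

    # Heteroskedasticity-robust sandwich; n / (n - k) is the HC1-style factor.
    meat = (X * resid[:, None] ** 2).T @ X  # sum of u_hat_i^2 * x_i x_i'
    v_robust = n / (n - k) * XtX_inv @ meat @ XtX_inv
    se_robust = np.sqrt(np.diag(v_robust))

    result = {"beta": beta, "se_homo": se_homo, "se_robust": se_robust}

    if clusters is not None:
        # Clustered: sum over entities g of (X_g' u_g)(X_g' u_g)'.
        meat_c = np.zeros((k, k))
        for g in np.unique(clusters):
            mask = clusters == g
            score = X[mask].T @ resid[mask]
            meat_c += np.outer(score, score)
        result["se_cluster"] = np.sqrt(np.diag(XtX_inv @ meat_c @ XtX_inv))
    return result


# Simulated cross-section (hypothetical): the error spread grows with |x|.
rng = np.random.default_rng(0)
n_obs = 1000
x = rng.normal(0.0, 1.0, n_obs)
err = rng.normal(0.0, 1.0, n_obs) * np.abs(x)
y = 1.0 + 2.0 * x + err
X = np.column_stack([np.ones(n_obs), x])

res = ols_with_ses(X, y)
print("coefficients:     ", res["beta"])
print("homoskedastic SEs:", res["se_homo"])
print("robust SEs:       ", res["se_robust"])
```

Because the simulated error spread grows with |x|, the robust standard error on the slope should come out noticeably larger than the homoskedasticity-only one. Under homoskedastic errors the two would be close in large samples, which is the sense in which homoskedasticity is a special case. Passing `clusters=np.arange(len(y))`, so that every observation is its own entity, makes the clustered standard errors equal to the robust ones without the n/(n − k) factor.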
Approach to Mathematics and Level of Rigor
Our aim is for students to develop a sophisticated understanding of the tools of modern regression analysis, whether the course is taught at a “high” or a “low” level of mathematics. Parts I through IV of the text (which cover the substantive material) are accessible to students with only precalculus mathematics. Parts I through IV have fewer equations and more applications than many introductory econometrics books and far fewer equations than books aimed at mathematical sections of undergraduate courses. But more equations do not imply a more sophisticated treatment. In our experience, a more mathematical treatment does not lead to a deeper understanding for most students.

That said, different students learn differently, and for mathematically well-prepared students, learning can be enhanced by a more explicitly mathematical treatment. Part V therefore contains an introduction to econometric theory that is appropriate for students with a stronger mathematical background. When the mathematical chapters in Part V are used in conjunction with the material in Parts I through IV, this book is suitable for advanced undergraduate or master’s level econometrics courses.

Contents and Organization
There are five parts to Introduction to Econometrics. This textbook assumes that the student has had a course in probability and statistics, although we review that material in Part I. We cover the core material of regression analysis in Part II. Parts III, IV, and V present additional topics that build on the core treatment in Part II.

Part I
Chapter 1 introduces econometrics and stresses the importance of providing quantitative answers to quantitative questions. It discusses the concept of causality in statistical studies and surveys the different types of data encountered in econometrics.
Material from probability and statistics is reviewed in Chapters 2 and 3, respectively; whether these chapters are taught in a given course or are simply provided as a reference depends on the background of the students.

Part II
Chapter 4 introduces regression with a single regressor and ordinary least squares (OLS) estimation, and Chapter 5 discusses hypothesis tests and confidence intervals in the regression model with a single regressor. In Chapter 6, students learn how they can address omitted variable bias using multiple regression, thereby estimating the effect of one independent variable while holding other independent variables constant. Chapter 7 covers hypothesis tests, including F-tests, and confidence intervals in multiple regression. In Chapter 8, the linear regression model is extended to models with nonlinear population regression functions, with a focus on regression functions that are linear in the parameters (so that the parameters can be estimated by OLS). In Chapter 9, students step back and learn how to identify the strengths and limitations of regression studies, seeing in the process how to apply the concepts of internal and external validity.

Part III
Part III presents extensions of regression methods. In Chapter 10, students learn how to use panel data to control for unobserved variables that are constant over time. Chapter 11 covers regression with a binary dependent variable. Chapter 12 shows how instrumental variables regression can be used to address a variety of problems that produce correlation between the error term and the regressor, and examines how one might find and evaluate valid instruments. Chapter 13 introduces students to the analysis of data from experiments and quasi-, or natural, experiments, topics often referred to as “program evaluation.”

Part IV
Part IV takes up regression with time series data.
Chapter 14 focuses on forecasting and introduces various modern tools for analyzing time series regressions, such as unit root tests and tests for stability. Chapter 15 discusses the use of time series data to estimate causal relations. Chapter 16 presents some more advanced tools for time series analysis, including models of conditional heteroskedasticity.

Part V
Part V is an introduction to econometric theory. This part is more than an appendix that fills in mathematical details omitted from the text. Rather, it is a self-contained treatment of the econometric theory of estimation and inference in the linear regression model. Chapter 17 develops the theory of regression analysis for a single regressor; the exposition does not use matrix algebra, although it does demand a higher level of mathematical sophistication than the rest of the text. Chapter 18 presents and studies the multiple regression model, instrumental variables regression, and generalized method of moments estimation of the linear model, all in matrix form.

Prerequisites Within the Book
Because different instructors like to emphasize different material, we wrote this book with diverse teaching preferences in mind. To the maximum extent possible, the chapters in Parts III, IV, and V are “stand-alone” in the sense that they do not require first teaching all the preceding chapters. The specific prerequisites for each chapter are described in Table I. Although we have found that the sequence of topics adopted in the textbook works well in our own courses, the chapters are written in a way that allows instructors to present topics in a different order if they so desire.

Sample Courses
This book accommodates several different course structures.
TABLE I Guide to Prerequisites for Special-Topic Chapters in Parts III, IV, and V

Prerequisite parts or chapters Part I Part II Part III Part IV 10.1, 10.2 12.1, 12.2 X X X Xa X X X 14 a X Xa b 15 Xa Xa b X 16 Xa Xa b X 17 X X X 18 X X X Chapter 1–3 4–7, 9 8 10 Xa Xa X 11 a X Xa X 12.1, 12.2 Xa Xa X 12.3–12.6 a X Xa 13 Xa X 14.1–14.4 Part V 14.5–14.8 15 X X 17 X

This table shows the minimum prerequisites needed to cover the material in a given chapter. For example, estimation of dynamic causal effects with time series data (Chapter 15) first requires Part I (as needed, depending on student preparation, and except as noted in footnote a), Part II (except for Chapter 8; see footnote b), and Sections 14.1 through 14.4.

a Chapters 10 through 16 use exclusively large-sample approximations to sampling distributions, so the optional Sections 3.6 (the Student t distribution for testing means) and 5.6 (the Student t distribution for testing regression coefficients) can be skipped.

b Chapters 14 through 16 (the time series chapters) can be taught without first teaching Chapter 8 (nonlinear regression functions) if the instructor pauses to explain the use of logarithmic transformations to approximate percentage changes.

Standard Introductory Econometrics
This course introduces econometrics (Chapter 1) and reviews probability and statistics as needed (Chapters 2 and 3). It then moves on to regression with a single regressor, multiple regression, the basics of functional form analysis, and the evaluation of regression studies (all Part II). The course proceeds to cover regression with panel data (Chapter 10), regression with a limited dependent variable (Chapter 11), and instrumental variables regression (Chapter 12), as time permits.
The course concludes with experiments and quasi-experiments in Chapter 13, topics that provide an opportunity to return to the questions of estimating causal effects raised at the beginning of the semester and to recapitulate core regression methods. Prerequisites: Algebra II and introductory statistics.

Introductory Econometrics with Time Series and Forecasting Applications
Like a standard introductory course, this course covers all of Part I (as needed) and Part II. Optionally, the course next provides a brief introduction to panel data (Sections 10.1 and 10.2) and takes up instrumental variables regression (Chapter 12, or just Sections 12.1 and 12.2). The course then proceeds to Part IV, covering forecasting (Chapter 14) and estimation of dynamic causal effects (Chapter 15). If time permits, the course can include some advanced topics in time series analysis such as volatility clustering and conditional heteroskedasticity (Section 16.5). Prerequisites: Algebra II and introductory statistics.

Applied Time Series Analysis and Forecasting
This book also can be used for a short course on applied time series and forecasting, for which a course on regression analysis is a prerequisite. Some time is spent reviewing the tools of basic regression analysis in Part II, depending on student preparation. The course then moves directly to Part IV and works through forecasting (Chapter 14), estimation of dynamic causal effects (Chapter 15), and advanced topics in time series analysis (Chapter 16), including vector autoregressions and conditional heteroskedasticity. An important component of this course is hands-on forecasting exercises, available to instructors on the book’s accompanying website. Prerequisites: Algebra II and basic introductory econometrics or the equivalent.
Introduction to Econometric Theory
This book is also suitable for an advanced undergraduate course in which the students have a strong mathematical preparation or for a master’s level course in econometrics. The course briefly reviews the theory of statistics and probability as necessary (Part I). The course introduces regression analysis using the nonmathematical, applications-based treatment of Part II. This introduction is followed by the theoretical development in Chapters 17 and 18 (through Section 18.5). The course then takes up regression with a limited dependent variable (Chapter 11) and maximum likelihood estimation (Appendix 11.2). Next, the course optionally turns to instrumental variables regression and generalized method of moments (Chapter 12 and Section 18.7), time series methods (Chapter 14), and the estimation of causal effects using time series data and generalized least squares (Chapter 15 and Section 18.6). Prerequisites: Calculus and introductory statistics. Chapter 18 assumes previous exposure to matrix algebra.

Pedagogical Features
This textbook has a variety of pedagogical features aimed at helping students understand, retain, and apply the essential ideas. Chapter introductions provide real-world grounding and motivation, as well as brief road maps highlighting the sequence of the discussion. Key terms are boldfaced and defined in context throughout each chapter, and Key Concept boxes at regular intervals recap the central ideas. General interest boxes provide interesting excursions into related topics and highlight real-world studies that use the methods or concepts being discussed in the text. A Summary concluding each chapter serves as a helpful framework for reviewing the main points of coverage.
The questions in the Review the Concepts section check students’ understanding of the core content, Exercises give more intensive practice working with the concepts and techniques introduced in the chapter, and Empirical Exercises allow students to apply what they have learned to answer real-world empirical questions. At the end of the textbook, the Appendix provides statistical tables, the References section lists sources for further reading, and a Glossary conveniently defines many key terms in the book.

Supplements to Accompany the Textbook
The online supplements accompanying the third edition update of Introduction to Econometrics include the Instructor’s Resource Manual, Test Bank, and PowerPoint® slides with text figures, tables, and Key Concepts. The Instructor’s Resource Manual includes solutions to all the end-of-chapter exercises, while the Test Bank, offered in TestGen, provides a rich supply of easily edited test problems and questions of various types to meet specific course needs. These resources are available for download from the Instructor’s Resource Center at www.pearsonhighered.com/stock_watson.

Companion Website
The Companion Website, found at www.pearsonhighered.com/stock_watson, provides a wide range of additional resources for students and faculty. These resources include more empirical exercises, and more in-depth ones, data sets for the empirical exercises, replication files for empirical results reported in the text, practice quizzes, answers to end-of-chapter Review the Concepts questions and Exercises, and EViews tutorials.

MyEconLab
The third edition update is accompanied by a robust MyEconLab course. The MyEconLab course includes all the Review the Concepts questions as well as some Exercises and Empirical Exercises.
In addition, the enhanced eText available in MyEconLab for the third edition update includes URL links from the Exercises and Empirical Exercises to questions in the MyEconLab course and to the data that accompany them. To register for MyEconLab and to learn more, log on to www.myeconlab.com.

Acknowledgments
A great many people contributed to the first edition of this book. Our biggest debts of gratitude are to our colleagues at Harvard and Princeton who used early drafts of this book in their classrooms. At Harvard’s Kennedy School of Government, Suzanne Cooper provided invaluable suggestions and detailed comments on multiple drafts. As a coteacher with one of the authors (Stock), she also helped vet much of the material in this book while it was being developed for a required course for master’s students at the Kennedy School. We are also indebted to two other Kennedy School colleagues, Alberto Abadie and Sue Dynarski, for their patient explanations of quasi-experiments and the field of program evaluation and for their detailed comments on early drafts of the text. At Princeton, Eli Tamer taught from an early draft and also provided helpful comments on the penultimate draft of the book.

We also owe much to many of our friends and colleagues in econometrics who spent time talking with us about the substance of this book and who collectively made so many helpful suggestions. Bruce Hansen (University of Wisconsin–Madison) and Bo Honore (Princeton) provided helpful feedback on very early outlines and preliminary versions of the core material in Part II. Joshua Angrist (MIT) and Guido Imbens (University of California, Berkeley) provided thoughtful suggestions about our treatment of materials on program evaluation. Our presentation of the material on time series has benefited from discussions with Yacine Ait-Sahalia (Princeton), Graham Elliott (University of California, San Diego), Andrew Harvey (Cambridge University), and Christopher Sims (Princeton).
Finally, many people made helpful suggestions on parts of the manuscript close to their area of expertise: Don Andrews (Yale), John Bound (University of Michigan), Gregory Chow (Princeton), Thomas Downes (Tufts), David Drukker (StataCorp.), Jean Baldwin Grossman (Princeton), Eric Hanushek (Hoover Institution), James Heckman (University of Chicago), Han Hong (Princeton), Caroline Hoxby (Harvard), Alan Krueger (Princeton), Steven Levitt (University of Chicago), Richard Light (Harvard), David Neumark (Michigan State University), Joseph Newhouse (Harvard), Pierre Perron (Boston University), Kenneth Warner (University of Michigan), and Richard Zeckhauser (Harvard). Many people were very generous in providing us with data. The California test score data were constructed with the assistance of Les Axelrod of the Standards and Assessments Division, California Department of Education. We are grateful to Charlie DePascale, Student Assessment Services, Massachusetts Department of Education, for his help with aspects of the Massachusetts test score data set. Christopher Ruhm (University of North Carolina, Greensboro) graciously provided us with his data set on drunk driving laws and traffic fatalities. The research department at the Federal Reserve Bank of Boston deserves thanks for putting together its data on racial discrimination in mortgage lending; we particularly thank Geoffrey Tootell for providing us with the updated version of the data set we use in Chapter 9 and Lynn Browne for explaining its policy context. We thank Jonathan Gruber (MIT) for sharing his data on cigarette sales, which we analyze in Chapter 12, and Alan Krueger (Princeton) for his help with the Tennessee STAR data that we analyze in Chapter 13. We thank several people for carefully checking the page proof for errors. Kerry Griffin and Yair Listokin read the entire manuscript, and Andrew Fraker, Ori Heffetz, Amber Henry, Hong Li, Alessandro Tarozzi, and Matt Watson worked through several chapters. 
In the first edition, we benefited from the help of an exceptional development editor, Jane Tufts, whose creativity, hard work, and attention to detail improved the book in many ways, large and small. Pearson provided us with first-rate support, starting with our excellent editor, Sylvia Mallory, and extending through the entire publishing team. Jane and Sylvia patiently taught us a lot about writing, organization, and presentation, and their efforts are evident on every page of this book. We extend our thanks to the superb Pearson team, who worked with us on the second edition: Adrienne D’Ambrosio (senior acquisitions editor), Bridget Page (associate media producer), Charles Spaulding (senior designer), Nancy Fenton (managing editor) and her selection of Nancy Freihofer and Thompson Steele Inc., who handled the entire production process, Heather McNally (supplements coordinator), and Denise Clinton (editor-in-chief). Finally, we had the benefit of Kay Ueno’s skilled editing in the second edition. We are also grateful to the excellent third edition Pearson team of Adrienne D’Ambrosio, Nancy Fenton, and Jill Kolongowski, as well as Mary Sanger, the project manager with Nesbitt Graphics. We also wish to thank the Pearson team who worked on the third edition update: Christina Masturzo, Carolyn Philips, Liz Napolitano, and Heidi Allgair, project manager with Cenveo® Publisher Services. We also received a great deal of help and suggestions from faculty, students, and researchers as we prepared the third edition and its update.
The changes made in the third edition incorporate or reflect suggestions, corrections, comments, data, and help provided by a number of researchers and instructors: Donald Andrews (Yale University), Jushan Bai (Columbia), James Cobbe (Florida State University), Susan Dynarski (University of Michigan), Nicole Eichelberger (Texas Tech University), Boyd Fjeldsted (University of Utah), Martina Grunow, Daniel Hamermesh (University of Texas–Austin), Keisuke Hirano (University of Arizona), Bo Honore (Princeton University), Guido Imbens (Harvard University), Manfred Keil (Claremont McKenna College), David Laibson (Harvard University), David Lee (Princeton University), Brigitte Madrian (Harvard University), Jorge Marquez (University of Maryland), Karen Bennett Mathis (Florida Department of Citrus), Alan Mehlenbacher (University of Victoria), Ulrich Müller (Princeton University), Serena Ng (Columbia University), Harry Patrinos (World Bank), Zhuan Pei (Brandeis University), Peter Summers (Texas Tech University), Andrey Vasnov (University of Sydney), and Douglas Young (Montana State University). We also benefited from student input from F. Hoces dela Guardia and Carrie Wilson. Thoughtful reviews for the third edition were prepared for Addison-Wesley by Steve DeLoach (Elon University), Jeffrey DeSimone (University of Texas at Arlington), Gary V. Engelhardt (Syracuse University), Luca Flabbi (Georgetown University), Steffen Habermalz (Northwestern University), Carolyn J. Heinrich (University of Wisconsin–Madison), Emma M. Iglesias-Vazquez (Michigan State University), Carlos Lamarche (University of Oklahoma), Vicki A. McCracken (Washington State University), Claudiney M. Pereira (Tulane University), and John T. Warner (Clemson University).
We also received very helpful input on draft revisions of Chapters 7 and 10 from John Berdell (DePaul University), Janet Kohlhase (University of Houston), Aprajit Mahajan (Stanford University), Xia Meng (Brandeis University), and Chan Shen (Georgetown University). Above all, we are indebted to our families for their endurance throughout this project. Writing this book took a long time, and for them, the project must have seemed endless. They, more than anyone else, bore the burden of this commitment, and for their help and support we are deeply grateful.

Introduction to Econometrics

Chapter 1 Economic Questions and Data

Ask a half dozen econometricians what econometrics is, and you could get a half dozen different answers. One might tell you that econometrics is the science of testing economic theories. A second might tell you that econometrics is the set of tools used for forecasting future values of economic variables, such as a firm’s sales, the overall growth of the economy, or stock prices. Another might say that econometrics is the process of fitting mathematical economic models to real-world data. A fourth might tell you that it is the science and art of using historical data to make numerical, or quantitative, policy recommendations in government and business.

In fact, all these answers are right. At a broad level, econometrics is the science and art of using economic theory and statistical techniques to analyze economic data. Econometric methods are used in many branches of economics, including finance, labor economics, macroeconomics, microeconomics, marketing, and economic policy. Econometric methods are also commonly used in other social sciences, including political science and sociology.

This book introduces you to the core set of methods used by econometricians. We will use these methods to answer a variety of specific, quantitative questions from the worlds of business and government policy.
This chapter poses four of those questions and discusses, in general terms, the econometric approach to answering them. The chapter concludes with a survey of the main types of data available to econometricians for answering these and other quantitative economic questions.

1.1 Economic Questions We Examine

Many decisions in economics, business, and government hinge on understanding relationships among variables in the world around us. These decisions require quantitative answers to quantitative questions. This book examines several quantitative questions taken from current issues in economics. Four of these questions concern education policy, racial bias in mortgage lending, cigarette consumption, and macroeconomic forecasting.

Question #1: Does Reducing Class Size Improve Elementary School Education?

Proposals for reform of the U.S. public education system generate heated debate. Many of the proposals concern the youngest students, those in elementary schools. Elementary school education has various objectives, such as developing social skills, but for many parents and educators, the most important objective is basic academic learning: reading, writing, and basic mathematics. One prominent proposal for improving basic learning is to reduce class sizes at elementary schools. With fewer students in the classroom, the argument goes, each student gets more of the teacher’s attention, there are fewer class disruptions, learning is enhanced, and grades improve. But what, precisely, is the effect on elementary school education of reducing class size? Reducing class size costs money: It requires hiring more teachers and, if the school is already at capacity, building more classrooms. A decision maker contemplating hiring more teachers must weigh these costs against the benefits. To weigh costs and benefits, however, the decision maker must have a precise quantitative understanding of the likely benefits.
Is the beneficial effect on basic learning of smaller classes large or small? Is it possible that smaller class size actually has no effect on basic learning? Although common sense and everyday experience may suggest that more learning occurs when there are fewer students, common sense cannot provide a quantitative answer to the question of what exactly is the effect on basic learning of reducing class size. To provide such an answer, we must examine empirical evidence—that is, evidence based on data—relating class size to basic learning in elementary schools.

In this book, we examine the relationship between class size and basic learning, using data gathered from 420 California school districts in 1999. In the California data, students in districts with small class sizes tend to perform better on standardized tests than students in districts with larger classes. While this fact is consistent with the idea that smaller classes produce better test scores, it might simply reflect many other advantages that students in districts with small classes have over their counterparts in districts with large classes. For example, districts with small class sizes tend to have wealthier residents than districts with large classes, so students in small-class districts could have more opportunities for learning outside the classroom. It could be these extra learning opportunities that lead to higher test scores, not smaller class sizes. In Part II, we use multiple regression analysis to isolate the effect of changes in class size from changes in other factors, such as the economic background of the students.

Question #2: Is There Racial Discrimination in the Market for Home Loans?

Most people buy their homes with the help of a mortgage, a large loan secured by the value of the home. By law, U.S.
lending institutions cannot take race into account when deciding to grant or deny a request for a mortgage: Applicants who are identical in all ways except their race should be equally likely to have their mortgage applications approved. In theory, then, there should be no racial bias in mortgage lending.

In contrast to this theoretical conclusion, researchers at the Federal Reserve Bank of Boston found (using data from the early 1990s) that 28% of black applicants are denied mortgages, while only 9% of white applicants are denied. Do these data indicate that, in practice, there is racial bias in mortgage lending? If so, how large is it? The fact that more black than white applicants are denied in the Boston Fed data does not by itself provide evidence of discrimination by mortgage lenders because the black and white applicants differ in many ways other than their race. Before concluding that there is bias in the mortgage market, these data must be examined more closely to see if there is a difference in the probability of being denied for otherwise identical applicants and, if so, whether this difference is large or small. To do so, in Chapter 11 we introduce econometric methods that make it possible to quantify the effect of race on the chance of obtaining a mortgage, holding constant other applicant characteristics, notably their ability to repay the loan.

Question #3: How Much Do Cigarette Taxes Reduce Smoking?

Cigarette smoking is a major public health concern worldwide. Many of the costs of smoking, such as the medical expenses of caring for those made sick by smoking and the less quantifiable costs to nonsmokers who prefer not to breathe secondhand cigarette smoke, are borne by other members of society. Because these costs are borne by people other than the smoker, there is a role for government intervention in reducing cigarette consumption. One of the most flexible tools for cutting consumption is to increase taxes on cigarettes.
Basic economics says that if cigarette prices go up, consumption will go down. But by how much? If the sales price goes up by 1%, by what percentage will the quantity of cigarettes sold decrease? The percentage change in the quantity demanded resulting from a 1% increase in price is the price elasticity of demand. If we want to reduce smoking by a certain amount, say 20%, by raising taxes, then we need to know the price elasticity of demand to calculate the price increase necessary to achieve this reduction in consumption. But what is the price elasticity of demand for cigarettes? Although economic theory provides us with the concepts that help us answer this question, it does not tell us the numerical value of the price elasticity of demand. To learn the elasticity, we must examine empirical evidence about the behavior of smokers and potential smokers; in other words, we need to analyze data on cigarette consumption and prices.

The data we examine are cigarette sales, prices, taxes, and personal income for U.S. states in the 1980s and 1990s. In these data, states with low taxes, and thus low cigarette prices, have high smoking rates, and states with high prices have low smoking rates. However, the analysis of these data is complicated because causality runs both ways: Low taxes lead to high demand, but if there are many smokers in the state, then local politicians might try to keep cigarette taxes low to satisfy their smoking constituents. In Chapter 12, we study methods for handling this “simultaneous causality” and use those methods to estimate the price elasticity of cigarette demand.

Question #4: By How Much Will U.S. GDP Grow Next Year?

It seems that people always want a sneak preview of the future. What will sales be next year at a firm that is considering investing in new equipment? Will the stock market go up next month, and, if it does, by how much?
Will city tax receipts next year cover planned expenditures on city services? Will your microeconomics exam next week focus on externalities or monopolies? Will Saturday be a nice day to go to the beach?

One aspect of the future in which macroeconomists are particularly interested is the growth of real economic activity, as measured by real gross domestic product (GDP), during the next year. A management consulting firm might advise a manufacturing client to expand its capacity based on an upbeat forecast of economic growth. Economists at the Federal Reserve Board in Washington, D.C., are mandated to set policy to keep real GDP near its potential in order to maximize employment. If they forecast anemic GDP growth over the next year, they might expand liquidity in the economy by reducing interest rates or taking other measures, in an attempt to boost economic activity.

Professional economists who rely on precise numerical forecasts use econometric models to make those forecasts. A forecaster’s job is to predict the future by using the past, and econometricians do this by using economic theory and statistical techniques to quantify relationships in historical data. The data we use to forecast the growth rate of GDP are past values of GDP and the “term spread” in the United States. The term spread is the difference between long-term and short-term interest rates. It measures, among other things, whether investors expect short-term interest rates to rise or fall in the future. The term spread is usually positive, but it tends to fall sharply before the onset of a recession. One of the GDP growth rate forecasts we develop and evaluate in Chapter 14 is based on the term spread.

Quantitative Questions, Quantitative Answers

Each of these four questions requires a numerical answer.
Economic theory provides clues about that answer—for example, cigarette consumption ought to go down when the price goes up—but the actual value of the number must be learned empirically, that is, by analyzing data. Because we use data to answer quantitative questions, our answers always have some uncertainty: A different set of data would produce a different numerical answer. Therefore, the conceptual framework for the analysis needs to provide both a numerical answer to the question and a measure of how precise the answer is.

The conceptual framework used in this book is the multiple regression model, the mainstay of econometrics. This model, introduced in Part II, provides a mathematical way to quantify how a change in one variable affects another variable, holding other things constant. For example, what effect does a change in class size have on test scores, holding constant or controlling for student characteristics (such as family income) that a school district administrator cannot control? What effect does your race have on your chances of having a mortgage application granted, holding constant other factors such as your ability to repay the loan? What effect does a 1% increase in the price of cigarettes have on cigarette consumption, holding constant the income of smokers and potential smokers? The multiple regression model and its extensions provide a framework for answering these questions using data and for quantifying the uncertainty associated with those answers.

1.2 Causal Effects and Idealized Experiments

Like many other questions encountered in econometrics, the first three questions in Section 1.1 concern causal relationships among variables. In common usage, an action is said to cause an outcome if the outcome is the direct result, or consequence, of that action.
Touching a hot stove causes you to get burned; drinking water causes you to be less thirsty; putting air in your tires causes them to inflate; putting fertilizer on your tomato plants causes them to produce more tomatoes. Causality means that a specific action (applying fertilizer) leads to a specific, measurable consequence (more tomatoes).

Estimation of Causal Effects

How best might we measure the causal effect on tomato yield (measured in kilograms) of applying a certain amount of fertilizer, say 100 grams of fertilizer per square meter? One way to measure this causal effect is to conduct an experiment. In that experiment, a horticultural researcher plants many plots of tomatoes. Each plot is tended identically, with one exception: Some plots get 100 grams of fertilizer per square meter, while the rest get none. Moreover, whether a plot is fertilized or not is determined randomly by a computer, ensuring that any other differences between the plots are unrelated to whether they receive fertilizer. At the end of the growing season, the horticulturalist weighs the harvest from each plot. The difference between the average yield per square meter of the treated and untreated plots is the effect on tomato production of the fertilizer treatment.

This is an example of a randomized controlled experiment. It is controlled in the sense that there are both a control group that receives no treatment (no fertilizer) and a treatment group that receives the treatment (100 g/m² of fertilizer). It is randomized in the sense that the treatment is assigned randomly. This random assignment eliminates the possibility of a systematic relationship between, for example, how sunny the plot is and whether it receives fertilizer, so that the only systematic difference between the treatment and control groups is the treatment.
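A minimal simulation of the tomato experiment makes this logic concrete. Everything in the sketch is hypothetical: the yield formula, the sunlight variable, and the assumed true effect of 1.5 kg per square meter are illustrative choices, not data from the text. Because the computer assigns fertilizer at random, sunlight is unrelated to treatment, so the simple difference in average yields should land close to the assumed effect.

```python
import random
import statistics

rng = random.Random(42)
N_PLOTS = 2000
TRUE_EFFECT = 1.5  # hypothetical kg/m^2 gain from 100 g/m^2 of fertilizer

# Sunlight differs across plots and also raises yields (hypothetical).
sunlight = [rng.uniform(0.0, 1.0) for _ in range(N_PLOTS)]

# Random assignment: the computer picks exactly half the plots for treatment,
# so by construction treatment status is unrelated to sunlight.
treated_ids = set(rng.sample(range(N_PLOTS), N_PLOTS // 2))

treated_yields, control_yields = [], []
for plot, sun in enumerate(sunlight):
    is_treated = plot in treated_ids
    # Yield per square meter: baseline + sunlight effect + treatment + noise.
    yield_kg = 4.0 + 3.0 * sun + (TRUE_EFFECT if is_treated else 0.0) + rng.gauss(0.0, 0.5)
    if is_treated:
        treated_yields.append(yield_kg)
    else:
        control_yields.append(yield_kg)

# Difference in average yield per square meter, treated minus untreated.
effect_estimate = statistics.mean(treated_yields) - statistics.mean(control_yields)
print(f"estimated effect: {effect_estimate:.2f} kg/m^2 (assumed true value {TRUE_EFFECT})")
```

Rerunning with a different seed gives a slightly different estimate; increasing the number of plots narrows that spread, which is one way to see why the experiment must be run "on a large enough scale."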
If this experiment is properly implemented on a large enough scale, then it will yield an estimate of the causal effect on the outcome of interest (tomato production) of the treatment (applying 100 g/m² of fertilizer).

In this book, the causal effect is defined to be the effect on an outcome of a given action or treatment, as measured in an ideal randomized controlled experiment. In such an experiment, the only systematic reason for differences in outcomes between the treatment and control groups is the treatment itself.

It is possible to imagine an ideal randomized controlled experiment to answer each of the first three questions in Section 1.1. For example, to study class size, one can imagine randomly assigning “treatments” of different class sizes to different groups of students. If the experiment is designed and executed so that the only systematic difference between the groups of students is their class size, then in theory this experiment would estimate the effect on test scores of reducing class size, holding all else constant.

The concept of an ideal randomized controlled experiment is useful because it gives a definition of a causal effect. In practice, however, it is not possible to perform ideal experiments. In fact, experiments are relatively rare in econometrics because often they are unethical, impossible to execute satisfactorily, or prohibitively expensive. The concept of the ideal randomized controlled experiment does, however, provide a theoretical benchmark for an econometric analysis of causal effects using actual data.

Forecasting and Causality

Although the first three questions in Section 1.1 concern causal effects, the fourth—forecasting the growth rate of GDP—does not. You do not need to know a causal relationship to make a good forecast. A good way to “forecast” whether it is raining is to observe whether pedestrians are using umbrellas, but the act of using an umbrella does not cause it to rain.
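The umbrella example shows that a forecast needs only a stable historical pattern, not a causal one. As a hedged sketch of that idea, the block below simulates a hypothetical quarterly series from an assumed first-order autoregression. The parameters c = 1.0 and phi = 0.4 are invented for illustration. The block then fits the series by least squares on its own lagged value and forecasts one period ahead. It uses only the series' own past; a forecast based on the term spread would add that variable as a further regressor.

```python
import numpy as np

rng = np.random.default_rng(7)
T = 2000
c_true, phi_true = 1.0, 0.4      # hypothetical AR(1) parameters

# Simulate y_t = c + phi * y_{t-1} + u_t with standard normal shocks.
y = np.empty(T)
y[0] = c_true / (1 - phi_true)   # start at the process mean
for t in range(1, T):
    y[t] = c_true + phi_true * y[t - 1] + rng.normal(0.0, 1.0)

# OLS of y_t on a constant and y_{t-1}: a purely historical relationship,
# with no claim that last quarter's value "causes" this quarter's.
X = np.column_stack([np.ones(T - 1), y[:-1]])
(c_hat, phi_hat), *_ = np.linalg.lstsq(X, y[1:], rcond=None)

# One-step-ahead forecast for the next, not-yet-observed period.
forecast = c_hat + phi_hat * y[-1]
print(f"phi_hat = {phi_hat:.2f}, forecast for period T+1 = {forecast:.2f}")
```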
Even though forecasting need not involve causal relationships, economic theory suggests patterns and relationships that might be useful for forecasting. As we see in Chapter 14, multiple regression analysis allows us to quantify historical relationships suggested by economic theory, to check whether those relationships have been stable over time, to make quantitative forecasts about the future, and to assess the accuracy of those forecasts.

1.3 Data: Sources and Types

In econometrics, data come from one of two sources: experiments or nonexperimental observations of the world. This book examines both experimental and nonexperimental data sets.

Experimental Versus Observational Data

Experimental data come from experiments designed to evaluate a treatment or policy or to investigate a causal effect. For example, the state of Tennessee financed a large randomized controlled experiment examining class size in the 1980s. In that experiment, which we examine in Chapter 13, thousands of students were randomly assigned to classes of different sizes for several years and were given standardized tests annually.

The Tennessee class size experiment cost millions of dollars and required the ongoing cooperation of many administrators, parents, and teachers over several years. Because real-world experiments with human subjects are difficult to administer and to control, they have flaws relative to ideal randomized controlled experiments. Moreover, in some circumstances, experiments are not only expensive and difficult to administer but also unethical. (Would it be ethical to offer randomly selected teenagers inexpensive cigarettes to see how many they buy?) Because of these financial, practical, and ethical problems, experiments in economics are relatively rare. Instead, most economic data are obtained by observing real-world behavior. Data obtained by observing actual behavior outside an experimental setting are called observational data.
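A small simulated example shows why observational data are harder to interpret than experimental data. The numbers are hypothetical and are not the California data. Family income is assumed to lower class size and to raise test scores, so districts are not "assigned" class sizes at random. A simple regression of scores on class size then picks up part of the income effect. A multiple regression that also includes income, in the spirit of the Part II methods, recovers the assumed causal effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process (illustrative numbers only).
income = rng.normal(0.0, 1.0, n)                            # standardized family income
class_size = 20.0 - 1.0 * income + rng.normal(0.0, 1.0, n)  # richer districts: smaller classes
TRUE_EFFECT = -1.0                                          # assumed effect of one more student per teacher
score = 650.0 + TRUE_EFFECT * class_size + 5.0 * income + rng.normal(0.0, 5.0, n)


def ols_coefficients(y, *regressors):
    """OLS coefficients [intercept, b1, b2, ...] from a least-squares fit."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


naive_slope = ols_coefficients(score, class_size)[1]               # ignores income
controlled_slope = ols_coefficients(score, class_size, income)[1]  # holds income constant

print(f"naive slope:      {naive_slope:.2f}")       # mixes class size and income effects
print(f"controlled slope: {controlled_slope:.2f}")  # close to TRUE_EFFECT
```

Under these assumptions the naive slope is roughly -3.5 rather than -1, because smaller classes and higher incomes travel together in the simulated data.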
Observational data are collected using surveys, such as telephone surveys of consumers, and administrative records, such as historical records on mortgage applications maintained by lending institutions. Observational data pose major challenges to econometric attempts to estimate causal effects, and the tools of econometrics are designed to tackle these challenges. In the real world, levels of “treatment” (the amount of fertilizer in the tomato example, the student–teacher ratio in the class size example) are not assigned at random, so it is difficult to sort out the effect of the “treatment” from other relevant factors. Much of econometrics, and much of this book, is devoted to methods for meeting the challenges encountered when real-world data are used to estimate causal effects.

Whether the data are experimental or observational, data sets come in three main types: cross-sectional data, time series data, and panel data. In this book, you will encounter all three types.

Cross-Sectional Data

Data on different entities (workers, consumers, firms, governmental units, and so forth) for a single time period are called cross-sectional data. For example, the data on test scores in California school districts are cross-sectional. Those data are for 420 entities (school districts) for a single time period (1999). In general, the number of entities on which we have observations is denoted n; so, for example, in the California data set, n = 420. The California test score data set contains measurements of several different variables for each district. Some of these data are tabulated in Table 1.1. Each row lists data for a different district.
TABLE 1.1 Selected Observations on Test Scores and Other Variables for California School Districts in 1999

| Observation (District) Number | District Average Test Score (fifth grade) | Student–Teacher Ratio | Expenditure per Pupil ($) | Percentage of Students Learning English |
|---|---|---|---|---|
| 1 | 690.8 | 17.89 | $6385 | 0.0% |
| 2 | 661.2 | 21.52 | 5099 | 4.6 |
| 3 | 643.6 | 18.70 | 5502 | 30.0 |
| 4 | 647.7 | 17.36 | 7102 | 0.0 |
| 5 | 640.8 | 18.67 | 5236 | 13.9 |
| ... | ... | ... | ... | ... |
| 418 | 645.0 | 21.89 | 4403 | 24.3 |
| 419 | 672.2 | 20.20 | 4776 | 3.0 |
| 420 | 655.8 | 19.04 | 5993 | 5.0 |

Note: The California test score data set is described in Appendix 4.1.

For example, the average test score for the first district (“district #1”) is 690.8; this is the average of the math and science test scores for all fifth graders in that district in 1999 on a standardized test (the Stanford Achievement Test). The average student–teacher ratio in that district is 17.89; that is, the number of students in district #1 divided by the number of classroom teachers in district #1 is 17.89. Average expenditure per pupil in district #1 is $6385. The percentage of students in that district still learning English—that is, the percentage of students for whom English is a second language and who are not yet proficient in English—is 0%.

The remaining rows present data for other districts. The order of the rows is arbitrary, and the number of the district, which is called the observation number, is an arbitrarily assigned number that organizes the data. As you can see in the table, all the variables listed vary considerably.

With cross-sectional data, we can learn about relationships among variables by studying differences across people, firms, or other economic entities during a single time period.

Time Series Data

Time series data are data for a single entity (person, firm, country) collected at multiple time periods. Our data set on the growth rate of GDP and the term spread in the United States is an example of a time series data set.
The data set contains observations on two variables (the growth rate of GDP and the term spread) for a single entity (the United States) for 213 time periods. Each time period in this data set is a quarter of a year (the first quarter is January, February, and March; the second quarter is April, May, and June; and so forth). The observations in this data set begin in the first quarter of 1960, which is denoted 1960:Q1, and end in the first quarter of 2013 (2013:Q1). The number of observations (that is, time periods) in a time series data set is denoted T. Because there are 213 quarters from 1960:Q1 to 2013:Q1, this data set contains T = 213 observations. Some observations in this data set are listed in Table 1.2.

TABLE 1.2 Selected Observations on the Growth Rate of GDP and the Term Spread in the United States: Quarterly Data, 1960:Q1–2013:Q1

| Observation Number | Date (year:quarter) | GDP Growth Rate (% at an annual rate) | Term Spread (% per year) |
|---|---|---|---|
| 1 | 1960:Q1 | 8.8% | 0.6% |
| 2 | 1960:Q2 | −1.5 | 1.3 |
| 3 | 1960:Q3 | 1.0 | 1.5 |
| 4 | 1960:Q4 | −4.9 | 1.6 |
| 5 | 1961:Q1 | 2.7 | 1.4 |
| ... | ... | ... | ... |
| 211 | 2012:Q3 | 2.7 | 1.5 |
| 212 | 2012:Q4 | 0.1 | 1.6 |
| 213 | 2013:Q1 | 1.1 | 1.9 |

Note: The United States GDP and term spread data set is described in Appendix 14.1.

The data in each row correspond to a different time period (year and quarter). In the first quarter of 1960, for example, GDP grew 8.8% at an annual rate. In other words, if GDP had continued growing for four quarters at its rate during the first quarter of 1960, the level of GDP would have increased by 8.8%. In the first quarter of 1960, the long-term interest rate was 4.5% and the short-term interest rate was 3.9%, so their difference, the term spread, was 0.6%. By tracking a single entity over time, time series data can be used to study the evolution of variables over time and to forecast future values of those variables.
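The arithmetic behind Table 1.2 and the description above can be checked with a few helper functions. The function names are illustrative. The function quarterly_rate_from_annual assumes that growth compounds over four quarters, which matches the text's description of an annual rate. Treat it as a sketch, not the book's own formula.

```python
def term_spread(long_rate_pct: float, short_rate_pct: float) -> float:
    """Term spread in percentage points: long-term minus short-term interest rate."""
    return long_rate_pct - short_rate_pct


def quarter_index(year: int, quarter: int) -> int:
    """Running count of quarters, so consecutive quarters differ by exactly 1."""
    return 4 * year + (quarter - 1)


def n_quarters(start: tuple, end: tuple) -> int:
    """Number of quarterly observations from start to end, inclusive.

    start and end are (year, quarter) pairs, e.g. (1960, 1) for 1960:Q1.
    """
    return quarter_index(*end) - quarter_index(*start) + 1


def quarterly_rate_from_annual(annual_pct: float) -> float:
    """Quarterly growth (%) that compounds to annual_pct over four quarters (assumed convention)."""
    return ((1 + annual_pct / 100) ** 0.25 - 1) * 100


T = n_quarters((1960, 1), (2013, 1))
spread_1960q1 = term_spread(4.5, 3.9)
q_growth_1960q1 = quarterly_rate_from_annual(8.8)

print(T)                         # 213
print(f"{spread_1960q1:.1f}")    # 0.6
print(f"{q_growth_1960q1:.2f}")  # 2.13
```

Under the compounding assumption, a quarter with about 2.13% growth corresponds to the 8.8% annual rate reported for 1960:Q1.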
Panel Data

Panel data, also called longitudinal data, are data for multiple entities in which each entity is observed at two or more time periods. Our data on cigarette consumption and prices are an example of a panel data set, and selected variables and observations in that data set are listed in Table 1.3. The number of entities in a panel data set is denoted n, and the number of time periods is denoted T. In the cigarette data set, we have observations on n = 48 continental U.S. states (entities) for T = 11 years (time periods) from 1985 to 1995. Thus there is a total of n × T = 48 × 11 = 528 observations.

TABLE 1.3 Selected Observations on Cigarette Sales, Prices, and Taxes, by State and Year for U.S. States, 1985–1995

| Observation Number | State | Year | Cigarette Sales (packs per capita) | Average Price per Pack (including taxes) | Total Taxes (cigarette excise tax + sales tax) |
|---|---|---|---|---|---|
| 1 | Alabama | 1985 | 116.5 | $1.022 | $0.333 |
| 2 | Arkansas | 1985 | 128.5 | 1.015 | 0.370 |
| 3 | Arizona | 1985 | 104.5 | 1.086 | 0.362 |
| ... | ... | ... | ... | ... | ... |
| 47 | West Virginia | 1985 | 112.8 | 1.089 | 0.382 |
| 48 | Wyoming | 1985 | 129.4 | 0.935 | 0.240 |
| 49 | Alabama | 1986 | 117.2 | 1.080 | 0.334 |
| ... | ... | ... | ... | ... | ... |
| 96 | Wyoming | 1986 | 127.8 | 1.007 | 0.240 |
| 97 | Alabama | 1987 | 115.8 | 1.135 | 0.335 |
| ... | ... | ... | ... | ... | ... |
| 528 | Wyoming | 1995 | 112.2 | 1.585 | 0.360 |

Note: The cigarette consumption data set is described in Appendix 12.1.

Key Concept 1.1 Cross-Sectional, Time Series, and Panel Data

• Cross-sectional data consist of multiple entities observed at a single time period.
• Time series data consist of a single entity observed at multiple time periods.
• Panel data (also known as longitudinal data) consist of multiple entities, where each entity is observed at two or more time periods.

Some data from the cigarette consumption data set are listed in Table 1.3. The first block of 48 observations lists the data for each state in 1985, organized alphabetica...
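A panel can be viewed as a set of (entity, time period) pairs. In the sketch below, the 48 state names are hypothetical placeholders because the excerpt lists only a few of them. The years and the row ordering (all states for 1985, then all states for 1986, and so on) follow Table 1.3. Slicing the panel by year gives a cross section, and slicing by state gives a time series, which mirrors the three definitions in Key Concept 1.1.

```python
from itertools import product

# Hypothetical placeholder labels for the n = 48 states (real names not listed in full).
states = [f"state_{i:02d}" for i in range(1, 49)]
years = list(range(1985, 1996))  # T = 11 periods, 1985 through 1995

# One observation per (entity, period) pair, ordered like Table 1.3:
# every state for 1985 first, then every state for 1986, and so on.
panel_keys = [(state, year) for year, state in product(years, states)]

n, T = len(states), len(years)
print(n, T, len(panel_keys))  # 48 11 528

# Slicing the panel recovers the other two data types.
cross_section_1985 = [k for k in panel_keys if k[1] == 1985]        # many entities, one period
time_series_state_1 = [k for k in panel_keys if k[0] == states[0]]  # one entity, many periods
```

With this ordering, observation 49 in Table 1.3 (the first state in 1986) sits at zero-based position 48 of panel_keys.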