Econometrics problem set **using RStudio**

Description

Attached are the notes for Unit 1 (read sections 1-10) before starting work on the problem set.

Unformatted Attachment Preview

1 Review: 1st Semester Econometrics

Review Outline:
1. Population Model and Model Assumptions
2. Statistical Tests (t-test and F-test)
3. Changing Units of Measurement
4. Log and Level Models
5. Failure of Model Assumptions
6. Variance of the OLS Estimator
7. Quadratics and Interactions
8. Indicator Variables
9. Fixed Effects
10. First Differencing

1.1 Population Model and Model Assumptions

Multiple Regression Model: Y = β0 + β1 X1 + β2 X2 + · · · + βk Xk + u, where Y and X1, X2, ... are data, u is the error term, and β0, β1, ..., βk are parameters.

• Terminology
– Y: dependent variable, explained variable, or LHS variable
– X's: independent variables, explanatory variables, or RHS variables
– Estimated parameters: β̂0, β̂1, ..., β̂k
– Ŷ: predicted dependent variable, or fitted Y: Ŷ = β̂0 + β̂1 X1 + β̂2 X2 + · · · + β̂k Xk
– û: residual, û = Y − Ŷ

• "Simple Regression Model": Y = β0 + β1 X1 + u
– Simple because there is only one RHS variable
– β0: intercept parameter
– β1: slope parameter
– Estimated slope parameter:
β̂1 = Σ_{i=1}^N (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^N (Xi − X̄)²
where X̄ and Ȳ are the sample means, and the subscript i indicates the value of the variable for each observation i in the sample
– We can approximate the estimated effect on Y as the product of the estimated slope parameter and the change in the RHS variable: ΔŶ ≈ β̂1 ΔX1

• β̂1 has the same interpretation in a regression model with multiple RHS variables as in simple regression, except that we now hold all other variables (X2, ..., Xk) constant

• We can write the predicted value of Y, or fitted value of Y, as: Ŷ = β̂0 + β̂1 X1 + β̂2 X2 + · · · + β̂k Xk

• Define the residual as: û = Y − Ŷ

• Parameter estimates are derived by minimizing the "mistakes"
– Select β̂0, β̂1, ..., β̂k to minimize Σ_{i=1}^N ûi²
– We can draw a simple picture that shows the regression line on our plotted sample of data
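To make the estimator concrete, here is a minimal R sketch on simulated data (all numbers and variable names are illustrative, not from the course materials); it computes the slope and intercept from the formulas above and checks them against lm():

set.seed(1)
x <- rnorm(100)
y <- 2 + 0.5 * x + rnorm(100)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope formula
b0 <- mean(y) - b1 * mean(x)                                     # intercept via the sample means
c(b0, b1)
coef(lm(y ~ x))   # identical estimates from lm()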
We Have Six Basic Assumptions:

1. Linear in the parameters: Y = β0 + β1 X1 + β2 X2 + · · · + βk Xk + u
• At first this may seem very restrictive, but we will see that it is a surprisingly flexible model
• It is important to recognize that the assumption is over the parameters
• We can still incorporate nonlinearities in the data

2. Random sampling: a random sample of n observations {(xi1, xi2, ..., xik, Yi) : i = 1, ..., n}
• The idea is to learn about the world (i.e. "the population") by taking a sample
• We want to test hypotheses about the population using the sample
• If the sample is non-random then we will not be able to make accurate statements about the population
• Inference errors made using a non-random sample are called Selection Bias

3. No Perfect Multicollinearity:
(a) No variable is a constant.
(b) We cannot write one variable as a linear combination of the other variables.

4. Zero Conditional Mean: E[u | X1, X2, ..., Xk] = 0
• Says that the expected value of the error term conditional on the RHS variables in the model is zero
• The assumption can fail if: (a) there is a RHS variable that "should" be in the model, or (b) there is measurement error in a RHS variable
• We sometimes refer to failure due to (a) as Omitted Variable Bias, which is closely related to Selection Bias

5. Homoskedasticity: Var[u | X1, X2, ..., Xk] = σ²
• Says that the variance of the error term conditional on the RHS variables in the model is equal to a constant
• Implies that the error term variance for each observation is the same
• This assumption is not necessary for accurate parameter estimates
• It is only necessary for doing hypothesis testing
• Failure of this assumption is called Heteroskedasticity

6. Normally distributed error term: u ∼ N(0, σ²)
• Says that the error term is distributed according to a Normal distribution with mean zero (from assumption 4) and constant variance (from assumption 5)
• This assumption is only necessary for doing hypothesis testing

Overview of OLS Assumptions

1. If assumptions 1-4 are valid then an estimator from our model (e.g. β̂1) is Unbiased:
• Unbiased because the expected value of the estimator (e.g. β̂1) equals the true value of the parameter (e.g. β1): E[β̂1] = β1
• Note that in most statistical (econometric) analysis that uses (reasonably) large datasets we essentially replace the concept of unbiasedness with consistency: the estimated parameter from the sample converges in probability to the true population parameter (e.g. β̂1 →p β1)

2. If assumptions 1-5 are valid then an estimator from our model (e.g. β̂1) is the Best Linear Unbiased Estimator (BLUE):
• This result follows from the Gauss-Markov Theorem
• Linear because the model is linear in parameters
• Unbiased because assumptions 1-4 are valid
• "Best" because the estimator has the smallest variance among all linear unbiased estimators [picture: two estimated parameter distributions, where the β̂BLUE distribution has the smaller variance]

3. If assumptions 1-6 are valid then we can do statistical tests, e.g. the t-test and F-test:
• Note that we can "easily" relax the normality assumption and still conduct statistical tests

Three OLS Facts

1. Σ_{i=1}^n ûi = 0.
• The sum of all the residuals (i.e. observation-specific prediction mistakes) in our sample equals zero
• This is why we use the sum of squared residuals as our criterion to find the estimated parameters

2. Σ_{i=1}^n xij ûi = 0 for all xi1, xi2, ..., xik
• The sum of the product of the residual and each variable (e.g. the numerical value of X1) across all observations in our sample is zero

3. The point (Ȳ, X̄1, ..., X̄k) is on the regression line

The "Partialling Out" Interpretation of Multiple Regression
• We interpret each β̂j (for the slope coefficients) as the effect of the independent variable on the outcome while holding all the other X's constant
⇒ That is, putting in each Xj "controls" for this factor when interpreting the β̂'s for the other RHS variables
• There is a neat mathematical way to see this using the simple regression model:
1. Run a multiple regression model: Y = β0 + β1 X1 + β2 X2 + u ⇒ Ŷ = β̂0 + β̂1 X1 + β̂2 X2, where we care about β̂1
2. Take a different 2-step approach:
(1) Estimate a simple regression model with only the independent variables: X1 = δ0 + δ1 X2 + u
- Calculate the predicted dependent variable: X̃1 = δ̃0 + δ̃1 X2
- Define ũ = X1 − X̃1
- We interpret ũ as the part of X1 not correlated with X2
(2) Estimate a model with the dependent variable we care about, Y, and the residual from the first step: Y = β1 ũ + e, where e is the error term
- This 2nd regression has no intercept (but that is not really important)
- Calculate the predicted dependent variable: Ẏ = β̇1 ũ
Q: How does β̂1 compare to β̇1? ⇒ They are the same!
• Intuition:
– ũ is the part of X1 uncorrelated with X2
– Using the part of X1 uncorrelated with X2 in a simple regression is the same thing as controlling for the effect of X2 in a multiple regression
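The partialling-out result is easy to verify numerically. A small R sketch with simulated (purely illustrative) data:

set.seed(2)
x2 <- rnorm(200)
x1 <- 0.6 * x2 + rnorm(200)
y  <- 1 + 2 * x1 - x2 + rnorm(200)
u_tilde <- resid(lm(x1 ~ x2))        # part of x1 uncorrelated with x2
coef(lm(y ~ x1 + x2))["x1"]          # beta1_hat from the multiple regression
coef(lm(y ~ u_tilde - 1))            # no-intercept regression of y on u_tilde: same number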
1.2 Statistical Tests

1.2.1 Testing a Hypothesis About a Single Population Parameter: The t Test

• It is important to remember that we never know βj with certainty; the βj are unknown features of the population
• The best we can do is hypothesize about the value of βj and then test the hypothesis using statistical inference
• Since we are working with a sample we use the t distribution
- Recall that the t distribution converges to the Normal distribution as the sample size gets larger (once n ≈ 100, the distributions are very close)
• So, instead of (β̂j − βj)/SE(β̂j) ∼ N(0, 1), we use (β̂j − βj)/SE(β̂j) ∼ t_{n−(k+1)}, where:
– Standard deviation of β̂j: SD(β̂j) = sqrt(Var[β̂j])
– Standard error of β̂j: SE(β̂j) = sqrt(V̂ar[β̂j])
∗ We use the standard error because we need to estimate Var[β̂j] using our estimate of σ²

Steps in Testing Our Parameters

1. Establish the null hypothesis, H0: βj = 0
• Most, but not all, of our H0 will test whether βj = 0
Example
• Consider the model: Y = β0 + β1 X1 + β2 X2 + u, with
Y ≡ global temperature change since 1750
X1 ≡ % of individuals in the world working as pirates
X2 ≡ carbon dioxide (and other GHG) emissions
• The null hypothesis is H0: β1 = 0
– In words: controlling for the level of emissions, the correlation between pirates and global temperature is zero

2. Define the t-statistic (or t ratio) for a null hypothesis of βj = 0 as: t_β̂j = β̂j / SE(β̂j)
• The t-stat measures how many estimated standard deviations β̂j is away from zero
• If the t-stat is "large", then we reject H0
• We define how "large" the t-stat must be for rejection of H0 based on our selection of a critical value
• The critical value c defines the level of t_β̂j at which we reject the null hypothesis (given a specific significance level)
• The typical significance level is 5%: if we select a 5% significance level, then we are willing to mistakenly reject H0 when it is true 5% of the time
• The probability value (p-value) answers the question: given the observed value of the t-statistic, what is the smallest significance level at which H0 would be rejected?

3. Roughly speaking, we reject the null in favor of the alternative hypothesis (that βj ≠ 0) when |t_β̂j| > 2 [picture: a roughly normal pdf for β̂j with shaded cutoffs at ±2]
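A minimal R sketch of the t-test mechanics on simulated data (illustrative, not from the course files): it recomputes the t-statistic and its two-sided p-value by hand and checks them against lm():

set.seed(3)
x <- rnorm(100)
y <- 1 + 0.3 * x + rnorm(100)
fit <- lm(y ~ x)
est <- coef(summary(fit))["x", ]
est["Estimate"] / est["Std. Error"]       # t-statistic by hand
est["t value"]                            # matches lm's reported t
2 * pt(-abs(est["Estimate"] / est["Std. Error"]), df = fit$df.residual)  # two-sided p-value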
1.2.2 Testing Hypotheses About Multiple Parameters Simultaneously: The F Test

Goodness-of-Fit Statistic, R²
• Define the following:
– Total sum of squares: SST ≡ Σ_{i=1}^n (Yi − Ȳ)²
– Explained sum of squares: SSE ≡ Σ_{i=1}^n (Ŷi − Ȳ)²
– Residual sum of squares: SSR ≡ Σ_{i=1}^n ûi²
• Define R²: R² = SSE/SST = 1 − SSR/SST
• We interpret R² as the share of the variation in the dependent variable in our sample that our model can explain
• R² mechanically increases with the number of explanatory variables X included in the regression model, provided that each new X variable added is correlated with Y
• It is not necessarily bad to have a low R² value
• R² is the key to running F-tests
• Adjusted R² is a slightly different formula that takes into account the number of explanatory variables: R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)]

Define the F-statistic:
F ≡ [(SSRr − SSRu)/q] / [SSRu/(n − (k + 1))]
where F ∼ F_{q, n−(k+1)} and
SSRr ≡ Σ_{i=1}^n ûi² from the Restricted Model
SSRu ≡ Σ_{i=1}^n ûi² from the Unrestricted Model
q: number of restrictions in H0 (i.e. numerator degrees of freedom)
n − (k + 1): degrees of freedom in the Unrestricted Model

Notes:
• (SSRr − SSRu) ≥ 0 is always true
• Reject the null if the F-statistic is "large", i.e. F > c
• Choose the significance level and look in the F table to find c (the default in Stata is a 5% significance level)
• If we reject H0, we say that the variables in H0 are jointly statistically significant at the given significance level
• If we fail to reject H0, we say that the variables are jointly insignificant
• It is possible for variables to be collectively statistically significant, but individually insignificant

F-test Example
Y = β0 + β1 X1 + β2 X2 + β3 X3 + β4 X4 + β5 X5 + u
H0: β3 = 0 AND β4 = 0 AND β5 = 0
H1: H0 not true.
1. Estimate the Unrestricted Model (i.e. the model without applying the null hypothesis)
• Calculate SSRu
2. Estimate the Restricted Model (i.e. the model after applying the null hypothesis)
• In this example, the Restricted Model is: Y = β0 + β1 X1 + β2 X2 + u
• Calculate SSRr
3. Calculate F using the formula
• In this example: q = 3 and n − (k + 1) = n − 6
4. Compare F to c in an F table to make a judgment about the null
• In practice, statistical software will calculate F and provide a probability value
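The restricted-vs-unrestricted comparison can be verified in R with anova(); the sketch below uses simulated data in which X3, X4, X5 are truly irrelevant (all names illustrative):

set.seed(4)
n <- 200
d <- data.frame(matrix(rnorm(n * 5), n, 5))     # columns named X1..X5
d$y <- 1 + 0.5 * d$X1 - 0.5 * d$X2 + rnorm(n)   # X3, X4, X5 do not enter the true model
unres <- lm(y ~ X1 + X2 + X3 + X4 + X5, data = d)
res   <- lm(y ~ X1 + X2, data = d)
anova(res, unres)                               # joint F-test of H0: beta3 = beta4 = beta5 = 0
SSRu <- sum(resid(unres)^2); SSRr <- sum(resid(res)^2)
((SSRr - SSRu) / 3) / (SSRu / (n - 6))          # the same F, computed from the formula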
1.2.3 Economic Significance vs. Statistical Significance

• Statistical significance is determined by t_β̂j (also the F-stat)
• Economic significance is determined by the size of β̂j
• It is possible to have a statistically significant variable that is not economically important
– This is most likely to happen with very large datasets with a very large sample size

Example (Wooldridge, p. 135): The question of interest is what company-level factors are correlated with a higher participation rate in a company retirement savings plan. The sample size is 1,534.
Y ≡ participation rate in the company savings plan
X1 ≡ match rate, X2 ≡ age of plan, X3 ≡ firm size
Results (standard errors in parentheses):
Ŷ = 80.29 + 5.44 X1 + 0.269 X2 − 0.00013 X3
    (0.78)   (0.52)    (0.045)    (0.00004)
Interpretation:
1. All variables are statistically significant at the 5% level (e.g. using the "rule of 2")
2. If the size of the company increases by 10,000, this is associated with a 1.3 percentage point decrease in the participation rate in the company savings plan
• A very large change in the size of a company is associated with a small change in participation in the savings plan
• This is not an economically important finding

1.3 Changing Units of Measurement

The units of measurement we use for our variables can, not surprisingly, affect how we interpret the coefficient estimates β̂0 and β̂1. Consider the following cases:

CASE 1: Scaling the units of measurement for the Y variable
• Suppose we run a simple regression of the hours of sleep an MSU student receives on a school night (Y) on the credit hours the student is enrolled in for the semester (X): HrsSleep = β0 + β1 (CreditHrs) + u
– We estimate that β̂0 = 10 and β̂1 = −0.25
• What would β̂0, β̂1 be if we ran the following regression instead? MinSleep = β0 + β1 (CreditHrs) + u
– That is, we changed the units of the data for the dependent variable
– New β̂0 = 10 · 60 = 600; new β̂1 = −0.25 · 60 = −15
– Intuition: both β̂0 and β̂1 are measured in terms of Y-units; if we change the dependent variable units then the estimators get scaled by the same amount

CASE 2: Scaling the units of measurement for the X variable
Suppose we run the new regression: HrsSleep = β0 + β1 (CreditMin) + u.
• Let "credit minutes" simply be credit hours times 60
Q: What is the new β̂1? ⇒ New β̂1 = Old β̂1 / 60
Recall: ΔŶ = β̂1 · ΔX
- So, the product of β̂1 and ΔX gives us our estimate of ΔŶ
- If we don't change the units of Y, then the change in units of X must be offset by the change in scale of β̂1
Q: What is the new β̂0? ⇒ New β̂0 = Old β̂0; β̂0 doesn't change! Why? Because β0 is measured in units of Y, which didn't change
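A quick R check of both rescaling cases on simulated data (the sleep/credit-hours numbers are illustrative only):

set.seed(5)
credit_hrs <- sample(9:21, 100, replace = TRUE)
hrs_sleep  <- 10 - 0.25 * credit_hrs + rnorm(100)
coef(lm(hrs_sleep ~ credit_hrs))            # baseline estimates
coef(lm(I(hrs_sleep * 60) ~ credit_hrs))    # Case 1: intercept and slope both scaled by 60
coef(lm(hrs_sleep ~ I(credit_hrs * 60)))    # Case 2: slope divided by 60, intercept unchanged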
1.4 Log and Level Models

Model            Dep Var   Ind Var   Interpretation
(1) Level-Level  Y         X         ΔY = β1 · ΔX
(2) Level-Log    Y         log(X)    ΔY = (β1/100) · %ΔX
(3) Log-Level    log(Y)    X         %ΔY = (100 · β1) · ΔX
(4) Log-Log      log(Y)    log(X)    %ΔY = β1 · %ΔX

• It is very common in economics to have the dependent variable in (natural) logarithmic form
• Terminology
– Economists often use log and ln interchangeably
– Δ is shorthand for a change
• Why use log transformations of the data?
– The most traditional reason is that a variable that is not normally distributed in levels may be normally distributed after a log transformation
∗ Thus, technically speaking, the OLS normality assumption will be satisfied; but the normality assumption can be relaxed when we are using a regression model to describe relationships in the sample (which is typically the case), rather than trying to predict individual data points
– The most relevant reasons focus on practicality
∗ A log transformation can make additive and linear models make more sense: a multiplicative model on the original scale corresponds to an additive model on the log scale
∗ Level effects are difficult to interpret; percent changes are easier to interpret
∗ A log transformation can lead the regression model to better explain the variation in the data (as measured by R²)
• One challenge in using a log transformation is when the data include zeros
– If the dependent variable includes zeros then we could use model (2)
∗ We could also use a limited dependent variable model such as Logit, Probit, or Poisson
– Or use a different transformation of the data: ln(original data + small number), where the "small number" could be 1 if the data are in the millions, or e.g. 10⁻⁷ if the data are on a smaller scale

1.5 Failure of Model Assumptions

In this class we will be focusing on selection bias
• Selection bias occurs when those observations in the sample that receive the "treatment" differ on observable characteristics (e.g. other X variables) from those observations that don't receive treatment
• If there is selection bias, we can think of the observations as selecting into treatment based on other characteristics
• The estimated β̂ parameter on the RHS treatment variable will be biased, e.g. E[β̂] ≠ β, when there is selection into treatment

1.5.1 Assumption 2 Fails

If Assumption 2 fails, so that the estimation sample is non-random, then you will often have to worry about selection bias.

An Example
• You are interested in the effect of (potentially) performance-enhancing drugs on academic performance
• Your causal question of interest is the effect that taking a prescription-level dose of Ritalin shortly before the test has on a student's SAT score
• Population model: SATscore = β0 + β1 1(TakesRitalin) + Controls + u
Notation:
- 1(·) is an indicator function: the variable = 1 if the statement inside the parentheses is true and = 0 otherwise
- "Controls" is shorthand for the other control variables that affect educational performance (along with their parameters)
• Two possible approaches:
1. Exactly control for all factors that affect education: not possible!
2. Use an experiment or experimental-type setting where we can think of Ritalin as randomly assigned
• "Ideal Experiment"
1. Sign in for the SAT and be given one of two pills (Ritalin or placebo) that you must swallow in front of an SAT test administrator
2. This type of experiment is not possible for ethical and legal reasons
• The usual alternative approach:
1. Get survey data on high school students who took the SAT, where the survey asks about Ritalin use and a bunch of other stuff
2. Control for all information in the survey
3. Cross fingers and hope for no selection bias, so that after including control variables, Ritalin use is as good as randomly assigned

Diagnostics for Selection Bias
• Ultimately, absent a truly random experiment, it is very difficult to know for sure that your model estimates do not suffer from selection bias
• However, there are two relatively simple things you can do as a researcher to get a sense of the potential selection bias (see the short sketch after this list):
1. Compare whether the values of the non-treatment RHS variables are similar based on whether the observation receives treatment
– It is important to compare RHS variables that are predetermined and/or not affected by whether the observation received treatment
– The most common way to conduct this comparison is to calculate the mean of each RHS variable for both groups (treated and non-treated) in the sample and to provide evidence on whether the means are similar using e.g. a t-test
– Another way to make the comparison is using figures that show the entire distribution of the values (not just the mean) of each RHS variable separately for both groups
– If the treated and non-treated groups are similar along the observable variables: the argument is stronger that there are no unobserved variables that differ between the groups and that lead to selection bias
– If the treated and non-treated groups are not similar along one or more observable variables: there is greater concern that the coefficient of interest will be biased
2. Estimate different models that range from a parsimonious specification to one that includes all of the available control variables
– What we would like to see is that the estimated coefficient on the parameter of interest is mostly "stable" across the model specifications
– If adding additional control variables does not have much of an effect on the estimated coefficient: this increases our confidence that the unobserved factors that are not controlled for in the model are not leading to selection bias
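A minimal R sketch of diagnostic 1 (a balance check) on simulated data, where treatment take-up depends on a predetermined characteristic (all names illustrative):

set.seed(6)
n <- 500
gpa   <- rnorm(n, mean = 3, sd = 0.4)          # a predetermined characteristic
treat <- rbinom(n, 1, plogis(2 * (gpa - 3)))   # selection: higher GPA, more likely treated
t.test(gpa ~ factor(treat))                    # group means differ: a warning sign of selection bias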
1.5.2 Assumption 3 Fails

1. If Assumption 3 fails then the model is characterized by multicollinearity
2. This is really just a technical assumption that is necessary to mechanically run OLS
3. Recall that there are two parts to this assumption:
(a) No variable is constant
(b) We can't write one variable as a linear combination of the others

An Example
Suppose you are interested in the role that money plays in getting politicians elected. You estimate the following county-level regression model using the 2016 presidential two-way (Democrat and Republican) vote share results. The population model:
Y = β0 + β1 X1 + β2 X2 + β3 X3 + u
Y: votes for President Trump
X1: dollars spent by President (then candidate) Trump
X2: dollars spent by Secretary Clinton
X3: total dollars spent by the two candidates
Q: What is the problem with this model?
– The problem is that X3 is a linear combination of X1 and X2
– We are unable to run this model because the parameters are undefined
Q: How could we adjust the model?
1. We could simply drop one of the variables, but this wouldn't allow us to separately estimate each of the three factors
2. Redefine X3 ≡ (Total spent)²; now X3 is not a linear combination of X1 and X2

Failure of Assumption 4: E[u | X1, ..., Xk] ≠ 0

Assumption 4 can fail if there is:
1. An omitted variable
2. Measurement error in an independent variable

Omitted Variable Bias
• Suppose the true population model is:
(1) Y = β0 + β1 X1 + β2 X2 + u
But instead we use the model:
(2) Y = β0 + β1 X1 + u
Estimating model (1): Ŷ = β̂0 + β̂1 X1 + β̂2 X2
Estimating model (2): Ỹ = β̃0 + β̃1 X1
Q: When will β̂1 = β̃1?
• The relationship is determined by the following equation: β̃1 = β̂1 + β̂2 · δ̃1
• The β̃1 from the incorrect model (2) equals the sum of:
(i) β̂1: the real effect of X1
(ii) the bias term: β̂2 · δ̃1
• The bias term accounts for the effect of X2 that is being incorrectly picked up by β̃1 in the simple regression model (2)
• To get δ̃1, run the simple regression X2 = δ0 + δ1 X1 + u, with predicted dependent variable X̃2 = δ̃0 + δ̃1 X1
• Interpretation of the bias term:
- β̂2 is the effect of X2 on Y in the true model
- δ̃1 measures the correlation between X1 and X2
• In other words, the bias term depends on:
(a) the importance of the omitted variable in explaining the outcome
(b) the relationship between the two X's
Q: If the importance of the omitted variable increases, what happens to the bias of β̃1? ⇒ The bias increases
Q: If the correlation between X1 and X2 increases, what happens to the bias of β̃1? ⇒ The bias increases
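The omitted variable bias identity can be confirmed in R; this sketch on simulated (illustrative) data reproduces β̃1 exactly from β̂1 + β̂2 · δ̃1:

set.seed(7)
n  <- 1000
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)                 # omitted variable correlated with x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)
long  <- coef(lm(y ~ x1 + x2))            # beta1_hat and beta2_hat
short <- coef(lm(y ~ x1))["x1"]           # beta1_tilde from the misspecified model
delta <- coef(lm(x2 ~ x1))["x1"]          # delta1_tilde
unname(long["x1"] + long["x2"] * delta)   # reproduces beta1_tilde exactly
short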
1.6 Variance of the OLS Estimator

1.6.1 The variance of our estimated slope parameter:
Var[β̂j] = σ² / [Σ_{i=1}^n (Xij − X̄j)² (1 − Rj²)]
• σ² comes from the assumption of homoskedasticity
• The Σ(Xij − X̄j)² term is essentially the Var[Xj]
• The (1 − Rj²) term captures the correlation between the jth variable and all the other independent variables:
(i) Rj² is the R² from the regression Xj = δ0 + δ1 X1 + δ2 X2 + · · · + δ(k−1) X(k−1) + u
(ii) Rj² is the proportion of the total variation in Xj explained by all the other X's

1.6.2 We would like Var[β̂j] to be low. When will Var[β̂j] be low?

1. A smaller σ² leads to a smaller Var[β̂j]
Intuition: σ² measures the (conditional) variance of the error term. The error term represents those factors correlated with the dependent variable that are not in the model. If σ² is low, then factors outside of the model have a limited influence on the dependent variable, so we would expect our model estimates to be more precise.
Role of the researcher: We can usually make σ² smaller by adding more variables to the regression so that fewer important variables are in the error term. For example, this is often why control variables are included in models even when the model is for a randomized experiment.

2. A larger Σ(Xij − X̄j)² leads to a smaller Var[β̂j]
Intuition: OLS fits a regression "line" across the range of the X values. If the data do not span the range of X values then we are, by definition, relying more on extrapolation. Whenever there is extrapolation we would expect the precision of our estimates to decrease. A larger Var[Xj] implies that there is less need for extrapolation.
Role of the researcher: We can (typically) increase Var[Xj] by increasing the sample size, or by choosing a same-sized sample with larger Var[Xj].

3. A larger (1 − Rj²) implies a smaller Var[β̂j]
• We would like Rj² to be smaller
• Rj² is smaller if we exclude some X's
• If we exclude X's that are important control variables then this might lead to omitted variable bias
• We can exclude X's that are closely correlated with another RHS variable

1.6.3 We usually have to estimate σ²:
V̂ar[β̂j] = σ̂² / [Σ_{i=1}^n (Xij − X̄j)² (1 − Rj²)]
We defined ûi ≡ Yi − Ŷi, so we can write:
Yi = Ŷi + ûi ⟺ β̂0 + β̂1 Xi + ûi = β0 + β1 Xi + ui ⟺ ûi = ui − (β̂0 − β0) − (β̂1 − β1) Xi
This shows us that ûi ≠ ui. However, taking expectations:
E[ûi] = E[ui] − E[β̂0 − β0] − E[(β̂1 − β1) Xi] = E[ui]
• If we knew ui, a sample estimator for E[u²] would be Σ_{i=1}^n ui² / n
• What if we plug in ûi in place of ui? Σ_{i=1}^n ûi² / n
• It turns out that this is a biased estimator of σ² because there are restrictions on the values ûi can take, given that Σ_{i=1}^n ûi = 0 and Σ_{i=1}^n Xi ûi = 0
• These two restrictions imply two fewer degrees of freedom (if we know all but two of the ûi we can compute the other two)
• Unbiased estimator: σ̂² = s² = [1/(n − 2)] Σ_{i=1}^n ûi² = [1/(n − 2)] · SSR

1.7 Quadratics and Interaction Terms

1.7.1 Models with Interaction Terms

The partial effect (correlation after controlling for other variables) of one variable on the dependent variable can depend on the magnitude of another variable.

Example (Wooldridge, p. 197): How much do various attributes of a house affect the price of the house?
Y = β0 + β1 X1 + β2 X2 + β3 X3 + β4 X4 + u, where
Y ≡ price of house; X1 ≡ square feet; X2 ≡ bedrooms; X3 ≡ square feet × bedrooms; X4 ≡ bathrooms
• X3 is the interaction variable
• Interpretation (level-level model): ΔPrice = [β2 + (β3 · sqft)] · ΔBedrooms
• Technically, β2 is the effect of bedrooms when sqft = 0. This is impossible!
- We need to remember that (like our intercept term) sometimes a literal interpretation at Xj = 0 does not make sense
- The good news is that there will never be a house with 0 square feet, so we should not have to worry about this interpretation for this model
• Statistical significance and testing coefficients:
- In the above example, if we test the coefficients separately (H0: β2 = 0 and H0: β3 = 0), we could get a case where we fail to reject both H0
- If we care about the overall effect of the number of bedrooms on the housing price, then we would want to do an F-test with H0: β2 = 0, β3 = 0
- Sometimes our "coefficient of interest" is on the interaction term (i.e. our research hypothesis is best answered by this coefficient); then a t-test is appropriate

1.7.2 Regression Models with Quadratics

• Quadratic functions are used to capture decreasing or increasing marginal effects
• Consider the following model: Y = β0 + β1 X + β2 X² + u
We can approximate how a change in X affects Y as: ΔŶ ≈ (β̂1 + 2β̂2 X) · ΔX ⟺ ΔŶ/ΔX ≈ β̂1 + 2β̂2 X
• It is possible for the two terms to have different signs

Example (Wooldridge, p. 197): How much does work experience affect your salary?
ŵage = 3.73 + 0.298 Exper − 0.0061 Exper²
       (0.35)  (0.041)       (0.0009)
So, Δŵage = (0.298 − 2 · 0.0061 · Exper) ΔExper
Numerical calculations:
- Going from 0 → 1 year of experience: Δŵage = $0.298, or 29.8 cents
- Going from 10 → 11 years of experience: Δŵage = 0.298 − 2 · 0.0061 · 10 = $0.176, or 17.6 cents
- At some point (i.e. at some number of years of experience), an increase in years of experience is predicted to decrease the wage. We can find this turning point, provided the β̂'s have different signs, as: 0.298 / (2 · 0.0061) = 24.4 years of experience.
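A short R sketch of the quadratic wage-experience example, simulating data from coefficients close to those above (illustrative only); it recovers the marginal effect at 10 years and the turning point:

set.seed(8)
exper <- runif(500, 0, 40)
wage  <- 3.73 + 0.298 * exper - 0.0061 * exper^2 + rnorm(500)
fit <- lm(wage ~ exper + I(exper^2))
b <- coef(fit)
unname(b["exper"] + 2 * b["I(exper^2)"] * 10)   # marginal effect at 10 years, near 0.176
unname(-b["exper"] / (2 * b["I(exper^2)"]))     # turning point, near 24.4 years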
1.8 Indicator Variables

• Can be used when we have qualitative information (gender, race, happiness, etc.)
• Can be used to make the model more flexible and less parametric

1.8.1 Indicators to Represent Qualitative Information

Example: How does gender (here defined as binary: male and female) correlate with the wage paid?

(1) Consider the following simple model for a worker's wage:
wage = β0 + δ0 female + β1 education + u
- The model assumes that the wage a worker is paid depends on two factors: education (measured in years) and whether the worker is male/female
- female is an indicator variable that = 1 if female and = 0 if not female (i.e. male)
- We could think of this equation in terms of expectations. Assume E[u | female, educ] = 0 (i.e. Assumption 4). Then:
δ0 = E[wage | female = 1, educ] − E[wage | female = 0, educ] = E[wage | female, educ] − E[wage | male, educ]
- Since we condition on education in each expectation, the difference is just the effect due to gender
- Intuitively, including an indicator variable allows the wage to be (sadly) higher for males than females at every level of education [picture: two parallel regression lines, one for males and one for females]

(2) Now consider a model for wage that interacts the female indicator with education:
wage = β0 + δ0 female + β1 education + β2 (education × female) + u
- The model differs from (1) only by the new interaction variable: education × female
- This model allows the correlation of wage and education to depend on whether the worker is male/female [picture: two regression lines with different slopes for males and females]

1.8.2 Indicator Variables for a Model with Multiple Categories

• Suppose your data are categorical data that involve several possible responses
• The two most basic ways to model these data are to include the data as a single variable, or to estimate a model with multiple indicator variables representing the different possible responses

Example: A survey asks 2 questions:
(1) How happy were you with your last product by this company? 1 = extremely unhappy, 2 = unhappy, 3 = OK, 4 = happy, 5 = very happy.
(2) What is the percent chance that you will purchase a product by this company in the future: 0.00 to 1.00.

Model 1: Single Categorical Variable Model
• We could write this model as: Y = β0 + β1 X1 + u
Y: the likelihood that you purchase the product again
X1: variable for how happy the customer was (values 1, 2, ..., 5)
• Interpretation: β̂1 is the effect of a one-unit increase on the happiness scale on the predicted likelihood of buying the product again in the future
• β̂1 assumes a constant marginal effect
- An increase on the happiness scale from 1 to 2 is the same as from 2 to 3, etc.
- This is an implicit restriction on the model

Model 2: Multiple Indicator Variable Model
• Define the following variables:
X1 = 1 if extremely unhappy, = 0 otherwise
X2 = 1 if unhappy, = 0 otherwise
X3 = 1 if happy, = 0 otherwise
X4 = 1 if very happy, = 0 otherwise
• The new model becomes: Y = β0 + β1 X1 + β2 X2 + β3 X3 + β4 X4 + u
• This model does not restrict the relationship between the estimated coefficients, and thus does not force a constant relationship between the likelihood of purchase and product happiness
Q: What is the interpretation of β̂1? β̂1 is the correlation between purchasing a future product and being "extremely unhappy" with the last purchase, relative to the customers who were OK with their last purchase (the omitted category)
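A small R sketch contrasting the two modeling choices on simulated survey data (variable names illustrative); factor() creates one dummy per category, and relevel() makes "OK" (= 3) the omitted base group:

set.seed(9)
happy <- sample(1:5, 300, replace = TRUE)
buy   <- 0.1 * happy + 0.1 * (happy == 5) + rnorm(300, sd = 0.2)  # steps are not constant
coef(lm(buy ~ happy))                                # Model 1: one slope for every step
coef(lm(buy ~ relevel(factor(happy), ref = "3")))    # Model 2: one dummy per category, "OK" omitted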
1.9 Fixed Effects

• Panel data allow for the use of fixed effects in a regression model
– Recall that panel data imply that there are repeated observations at different times for the same underlying unit
– In other words, there is a time dimension (e.g. daily, monthly, yearly, etc.) and a unit dimension (e.g. person, city, country, etc.)
• Write the regression equation for a panel with 2 years of data for each unit:
Yit = β0 + δ0 X1t + β1 X2it + ai + uit
– i indexes the unit dimension, t indexes the time dimension
– X1t is an indicator variable = 1 if the observation is in the 2nd year
– X2it is a variable that varies between individuals and over time
– ai + uit is the composite error term with two parts
– ai captures variables not in the model that vary only between units, but not over time (e.g. birthplace)
– uit captures variables not in the model that vary between units and over time (e.g. college GPA for current college students)
• The model is called the unobserved effects model or fixed effects model
• In the composite error, ai is the fixed effect (or unobserved heterogeneity) and uit is the time-varying error (or idiosyncratic error)

1.9.1 Unit and Time Fixed Effects

• The key advantage of fixed effects models is that a researcher can control for unobserved factors that could otherwise lead to bias in the estimation of the model parameters
• The way that the math works is as follows (see the short R sketch after these notes):
– Rewrite the above model (for simplicity we exclude the time indicator variable):
(1) Yit = β1 X1it + ai + uit, t = 1, 2, ..., T
– Take the time average for each individual i:
(2) Ȳi = β1 X̄1i + ai + ūi, where e.g. Ȳi = (1/T) Σ_{t=1}^T Yit
– Subtract (2) from (1): Yit − Ȳi = β1 (Xit − X̄i) + uit − ūi, or
(3) Ẏit = β1 Ẋit + u̇it, where the dot notation (e.g. Ẏit) just means time-demeaned
– Estimate (pooled) OLS on equation (3)
Notes:
1. Modern statistical programs such as Stata and R will do all of the math for you
2. The key OLS assumption for unbiased coefficient estimates is now: E[u̇it | Ẋit] = 0
- The key advantage of including unit fixed effects is that the number of omitted factors that could potentially bias our estimates is essentially cut in half
- Only unit-level factors omitted from the model that vary over time for the same units can lead to bias
- We can effectively control for all of the fixed unit-level factors without having to gather the data or even know which factors are important!
3. Of course, E[u̇it | Ẋit] = 0 could still fail if omitted time-varying factors are correlated with the X's
4. The correlations we want to measure in the model must involve X's that vary over time; otherwise these factors will be differenced away (and not estimated)
5. Researchers sometimes refer to the fixed effects model as using (only) within-unit variation to estimate the question of interest
6. If one or more of the Xit are measured with error, then a fixed effects model can make measurement error worse (and measurement error makes the bias in the β's worse)
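A minimal R sketch of the within transformation on a simulated panel (illustrative): pooled OLS is biased because x is correlated with the fixed effect, while the demeaned regression recovers the true slope:

set.seed(10)
N <- 50; Tp <- 4                        # 50 units, 4 periods
id <- rep(1:N, each = Tp)
a  <- rep(rnorm(N), each = Tp)          # unit fixed effect a_i
x  <- a + rnorm(N * Tp)                 # x correlated with a_i
y  <- 2 * x + a + rnorm(N * Tp)
coef(lm(y ~ x))["x"]                    # pooled OLS: biased away from 2
x_dot <- x - ave(x, id)                 # time-demean within each unit
y_dot <- y - ave(y, id)
coef(lm(y_dot ~ x_dot))["x_dot"]        # within (fixed effects) estimator: close to 2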
1.9.2 Applying Fixed Effects to Other Data Structures

1. We can apply fixed effects estimation techniques to many types of data structures to eliminate unobserved group or "cluster" fixed effects
2. We can include more than one type of "cluster" fixed effect in the same model
3. Key requirements:
(1) We need more than one cluster in the dataset (e.g. if the cluster is a school, then we need observations from more than one school)
(2) We need more than one observation within each different cluster in the dataset (e.g. we need data on more than one student in the same school)

Example: What is the effect of attending Head Start on child academic outcomes?
• Econometric model: Ysft = β1 X1s + β2 X2sft + αf + γt + usft
– There are now three subscripts to keep track of the structure of the data: s for student, f for family, t for year
– Ysft: test score at the end of 1st grade for student s in family f during year t
– X1s: whether the student attended Head Start prior to entering 1st grade
– X2sft: a control variable that varies by s, f, and t
– αf: family fixed effect
– γt: year fixed effect
• Using family fixed effects controls for unobserved and unchanging variables at the family level (e.g. characteristics about the parents that are fixed over the time period, characteristics about the home environment that are fixed over the time period)
• It is important to remember that we can only include family fixed effects in this model if there is more than one sibling from each family in the dataset

1.10 First Difference Estimation

• Motivation: we may be worried that the correlation between the composite error term and the X's is not equal to zero, i.e. E[(ai + uit) | Xit] ≠ 0
• This would lead to bias in our estimated coefficients
• Let the regression be: Yit = β0 + β1 X1it + ai + uit
• Looking just at period 1: Yi1 = β0 + β1 X1i1 + ai + ui1
• Looking just at period 2: Yi2 = β0 + β1 X1i2 + ai + ui2
• Subtracting period 1 from period 2 for each observation i, we get:
Yi2 − Yi1 = β1 (X1i2 − X1i1) + (ui2 − ui1) ⟺ ΔYi = β1 ΔX1i + Δui
We call β̂1 our "First-Difference Estimator" for the correlation between Y and X1
• Key: the unobserved ai has been "differenced away", so we no longer need to worry about these omitted factors biasing our coefficient estimates
Notes:
1. First differencing is most commonly used in analyses using time series data (i.e. datasets with many observations at different time intervals on few underlying units)
2. We must have variation over time in the X's; otherwise these factors will be differenced away. Dummy variables for race, gender, birthplace, etc. drop out of the first-differenced model
3. Sometimes when we difference the two years, there might not be much variation left in the variables, which leads to larger SEs. This can be a major drawback of first differencing.
4. When organizing panel data so that you can first difference, stack the observations for each unit in adjacent rows:
Observation      Data
Person 1, t=1    ·
Person 1, t=2    ·
Person 2, t=1    ·
Person 2, t=2    ·
...              ...
5. As with the fixed effects model, if one or more variables are measured with error, then first differencing will make measurement error worse (and measurement error makes the bias in the β's worse)

Choosing Between Fixed Effects and First Differencing
• When T = 2, first differencing and fixed effects are the same
• When T ≥ 3, the FE estimator will have smaller standard errors, provided there is no serial correlation
– Serial correlation occurs when the error terms are correlated for the same unit across time
– If there is serial correlation, then the assumption E[uit | X's] = 0 will be violated and our coefficient estimates will be biased
• If n is relatively small and T is relatively large, so that the dataset is a time series dataset, then we will want to first difference (for example: I = 10, T = 30, and therefore N = 300)
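A short R sketch of first differencing with T = 2 simulated periods (illustrative); differencing removes the unit effect a_i even though it is correlated with x:

set.seed(11)
N  <- 50
a  <- rnorm(N)                              # unit fixed effect a_i
x1 <- a + rnorm(N); x2 <- a + rnorm(N)      # X in periods 1 and 2
y1 <- 2 * x1 + a + rnorm(N)
y2 <- 2 * x2 + a + rnorm(N)
coef(lm(I(y2 - y1) ~ I(x2 - x1)))           # FD estimator: slope near 2, a_i differenced away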
1.11 Published Research Papers

1. Undergraduate Econometrics Instruction: Through Our Classes, Darkly (Joshua D. Angrist and Jörn-Steffen Pischke, Journal of Economic Perspectives, 2017).
2. Design-Based Research in Empirical Microeconomics (David Card, American Economic Review, 2022).
3. Estimating Safety by the Empirical Bayes Method: A Tutorial (Ezra Hauer, Douglas W. Harwood, Forrest M. Council, and Michael S. Griffith, Transportation Research Record, 2002).
4. Criminal Deterrence When There Are Offsetting Risks: Traffic Cameras, Vehicular Accidents, and Public Safety (Justin Gallagher and Paul J. Fisher, American Economic Journal: Economic Policy, 2020).
5. Will Studying Economics Make You Rich? A Regression Discontinuity Analysis of the Returns to College Major (Zachary Bleemer and Aashish Mehta, American Economic Journal: Applied Economics, 2022).
6. The Righteous and Reasonable Ambition to Become a Landholder: Land and Racial Inequality in the Postbellum South (Melinda C. Miller, Review of Economics and Statistics, 2019).
7. What Drives Racial and Ethnic Differences in High Cost Mortgages? The Role of High Risk Lenders (Patrick Bayer, Fernando Ferreira, and Stephen L. Ross, The Review of Financial Studies, 2017). [Note: Focus on Sections 1-3 and 6.]
8. The Impact of Jury Race in Criminal Trials (Shamena Anwar, Patrick Bayer, and Randi Hjalmarsson, Quarterly Journal of Economics, 2012).
9. Intended and Unintended Consequences of Youth Bicycle Helmet Laws (Christopher Carpenter and Mark Stehr, Journal of Law and Economics, 2011).

Explanation & Answer

View the attached explanation and answer, and let me know if you have any questions. Hey! I have answered your questions according to the file you sent me. Please feel free to ask any related questions. I hope you get good marks as well. Thanks!

BSE Fundamentals of Regression Analysis 2021/22

Assignment 1
Aysu Demir, Claudia Ochoa and Sandra Vicaria
1.
a) The dataset contains the variable lwklywge, defined as lwklywge = ln(wage). We therefore first
create a new variable called wage, where wage = exp(lwklywge):
generate wage = exp(lwklywge)

wage measures the weekly wage in levels. We now compute its mean for observations where the
variable educ equals 0, i.e. the average wage for people with zero years of education in the sample:
mean wage if educ == 0

Using this command we see that the mean wage for people who have no education is 241.4846, as
displayed in the output table. In the regression wage = β0 + β1*educ, this conditional mean is an
estimate of the intercept parameter: β̂0 = 241.4846, since E[wage | educ = 0] = β0.
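Because the problem set asks for RStudio, here is a minimal R equivalent of part (a). It assumes the data have already been loaded into a data frame called df with columns lwklywge and educ (the object and column names are assumptions, not from the course files):

df$wage <- exp(df$lwklywge)        # undo the log transform
mean(df$wage[df$educ == 0])        # mean weekly wage for educ == 0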

b) Using the histogram command we produce the density histogram for the weekly wage variable
(wage), and we then use the summarize command to obtain the basic descriptive statistics for wage.
histogram wage

summarize wage

summarize wage, detail

* The sample mean (439.4709) exceeds the sample median (384.7117), as expected for a right-skewed variable.
* The coefficient of skewness is 26.39. Since wage is the exponential of an approximately normal variable, it is roughly lognormally distributed, which is expected to exhibit right-skew, so a positive skewness coefficient is as expected.
c) Here we use the exact same commands as in the previous question, but for the variable lwklywge. The resulting graph is the density histogram for lwklywge, and the basic descriptive statistics come from summarize and summarize, detail.
histogram lwklywge

summarize lwklywge

summarize lwklywge, detail

* The sample mean (5.89) lies slightly below the sample median (5.95), as expected for a left-skewed variable.
* The coefficient of skewness is −2: the distribution is skewed to the left (the left tail is longer).
d) The first step is to compute the mean of lwklywge within each education level. We use "by" as a prefix together with the sort option so that egen computes the mean for each educ subgroup, and not over the entire dataset:
by educ, sort: egen conditional_mean = mean(lwklywge)

To see how these conditional means vary with education we use twoway connected, which produces a plot similar to the theoretical one from Mostly Harmless Econometrics referred to in the problem set.

twoway (connected conditional_mean educ, sort)
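As an R alternative for part (d), again assuming the hypothetical data frame df from part (a), the conditional means can be computed with aggregate() and plotted directly:

cef <- aggregate(lwklywge ~ educ, data = df, FUN = mean)  # conditional mean by education level
plot(cef$educ, cef$lwklywge, type = "b",
     xlab = "Years of education", ylab = "Mean log weekly wage")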

e)
regress lwklywge educ

The thick black function is an estimate of lwklywge = β0 + β1*educ, with β̂0 = 4.99 and
β̂1 = 0.070.
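The corresponding R estimate (same hypothetical df as above) is a single lm() call:

fit <- lm(lwklywge ~ educ, data = df)   # OLS of log weekly wage on education
coef(fit)                               # intercept near 4.99, slope near 0.070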
f) The thick black function illustrates the CEF (conditional expectation function) of lwklywge given years of education for our sample. The CEF shows that, in spite of the large variation in individual circumstances, people with more years of education tend, in general, to earn more.
g) We could use the thick black function for...
