Description
Consider the following three models of house sales where housingi is the house price, POPi is population, Yi is the level of income and ri is the interest rate for home mortgages (standard errors in parenthesis and t-stats reported below). The correlation matrix is below the estimations. Each cell in the table is the simple correlation coefficient between the two variables indicated in the corresponding row and column headings.
Model 1 (fitted values; standard errors in parentheses, t-stats below):

Housingi = -132 + 2225 ln(POPi) + 4500 ln(Yi) - 185 ri
s.e.:            (1612)          (3000)        (77)
t-stat:           1.38            1.50         -2.45
Adj-R2 = 0.784   N = 50

Model 2:

Housingi = -140 + 8700 ln(Yi) - 184 ri
s.e.:            (1548)        (72)
t-stat:           5.62         -2.56
Adj-R2 = 0.769

Model 3:

Housingi = -160 + 3575 ln(POPi) - 189 ri
s.e.:            (812)          (72)
t-stat:           4.40          -2.62
Adj-R2 = 0.761

Correlation Matrix:

            ln(POPi)   ln(Yi)    ri
ln(POPi)      1.00      0.92    0.15
ln(Yi)                  1.00   -0.34
ri                              1.00

a. Look at the simple correlation coefficients. Is there evidence of multicollinearity? If yes, for which variables?
b. Why does the coefficient on ln(POPi) increase when dropping ln(Yi) from the equation?
c. Even if you did not have the correlation matrix you could have determined whether there is multicollinearity. What could you say from comparing the regression output between the three models that would indicate whether there is multicollinearity?
d. Look at the coefficient on interest rates across the models. Why does it not change much across regressions while the coefficients on the other variables do?
e. Which model is preferred and why?
f. Do a Durbin-Watson test of first-order serial correlation for Model 1. Show the critical values and the decision rule. Suppose you obtain a DW statistic equal to DW = 0.25. Do we have evidence of serial correlation? Show your work.
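A minimal sketch, in Python, of the Durbin-Watson calculation and decision rule asked for in part (f). The critical values dL and dU must come from a DW table for N = 50 and k = 3 regressors; the values below are assumed for illustration, not given in the problem.

```python
# Durbin-Watson statistic: DW = sum_{t>=2} (e_t - e_{t-1})^2 / sum_t e_t^2
def durbin_watson(resid):
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

# Decision rule for positive first-order serial correlation:
#   DW < dL          -> reject H0 of no serial correlation
#   DW > dU          -> fail to reject H0
#   dL <= DW <= dU   -> inconclusive
dL, dU = 1.42, 1.67  # assumed 5% table values for N = 50, k = 3

dw = 0.25  # the statistic given in the problem
print(dw < dL)  # True: reject H0, evidence of positive serial correlation
```

Since DW = 0.25 is far below any plausible lower bound (DW near 0 signals strongly positively autocorrelated residuals, DW near 2 signals none), the conclusion does not hinge on the exact table values assumed here.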
2. You are the assistant to an economist hired by the Department of Education to determine how much spending on public education affects future salaries. For this, a sample of 120 individuals who graduated from public high schools was selected, and the amount of spending on public education (the average during the time the individuals were in school) was recorded. Data from a household survey (N = 120) allowed you to estimate the following regression model (IMPORTANT: t-statistics are in parentheses):
LN(sali) = 7.4 + 0.14 Educi + 0.025 LN(spendi) + 0.03 expi + 0.02 agei - 0.20 BLACKi
t-stats:        (2.41)       (3.05)            (1.53)      (0.97)       (2.00)
R2 = 0.75
where,
LN(sali) = the natural log of the annual salary of individual i
Educi = the years of education (12 = high school, 13 = one year of college, etc.)
LN(spendi) = the natural log of the (average) real annual spending in public education in the high school district from which individual i graduated (millions of dollars)
expi = years of experience working.
agei = the age of individual i
BLACKi = 1 if individual i is black, 0 otherwise
a. What are the expected signs of the coefficients? (The expected signs. Not the estimated ones.)
b. Which variables are statistically significant at the 5% level?
c. Are there any obvious signs of omitted variable bias? If yes, for which variable(s)?
d. Do you suspect multicollinearity? Explain why or why not. If yes, for which variable(s)?
e. What would be your primary concern from this dataset: serial correlation or heteroskedasticity? Why?
f. What is the interpretation of the R2?
g. What is the expected value of the dependent variable for a white 35-year-old with 16 years of education, 20 years of experience, and from a school district spending an average of 50 million dollars a year?
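The arithmetic for part (g) can be checked with a few lines of Python. The only assumptions here are that BLACK = 0 for a white individual and that spending enters through its natural log with spend = 50 (millions), as the variable definitions state.

```python
import math

# Plug the individual's values into the fitted equation:
# LN(sal) = 7.4 + 0.14*Educ + 0.025*LN(spend) + 0.03*exp + 0.02*age - 0.20*BLACK
ln_sal = (7.4
          + 0.14 * 16             # 16 years of education
          + 0.025 * math.log(50)  # spend = 50 million dollars
          + 0.03 * 20             # 20 years of experience
          + 0.02 * 35             # age 35
          - 0.20 * 0)             # white, so BLACK = 0

print(round(ln_sal, 2))  # predicted LN(salary), roughly 11.04
# The salary level is exp(ln_sal); this simple retransformation ignores
# the usual log-normal correction term.
print(round(math.exp(ln_sal)))
```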
h. Consider that the correlation between experience (exp) and age is 0.85 and the correlation between experience and education is 0.46.
i. A classmate argues that the correlation of experience and age is creating a problem of multicollinearity that is affecting the hypothesis tests of experience and age. Is this possible? Why?
ii. Another classmate argues that the correlation of experience and age is also creating a problem of multicollinearity that is affecting the hypothesis test of education. Is your classmate right or wrong? Why?
i. Suppose that a White test for heteroskedasticity produces a test statistic equal to 12. Conduct a hypothesis test for heteroskedasticity at the 5% significance level.
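For part (i), the White statistic (n times the R² of the auxiliary regression) is compared to a chi-square critical value whose degrees of freedom equal the number of slope terms in the auxiliary regression. The problem does not state that df, so the value below (df = 19: five levels, four squares since BLACK² = BLACK, and ten cross products) is an assumption for illustration; with a smaller auxiliary specification the conclusion could change.

```python
from scipy.stats import chi2

white_stat = 12.0  # given in the problem
df = 19            # assumed size of the auxiliary regression (see note above)

crit = chi2.ppf(0.95, df)  # 5% chi-square critical value
reject = white_stat > crit
print(round(crit, 2), reject)  # with df = 19, fail to reject homoskedasticity
```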
j. Now suppose instead that you have been hired to provide an independent evaluation of the study above. You read in the conclusions section of the report: βWe conclude that the amount of public spending in education has a positive and statistically significant impact on future salariesβ. Considering the econometric problems do you agree with this conclusion using the information above? Explain your answer. (Hint: This question is about the quality of the regression and not about what you might think. Assume there are no omitted variables. The answer can be only one sentence.)
Explanation & Answer
Hi, here is your assignment :)
1. Consider the following three models of house sales where housingi is the house price, POPi is
population, Yi is the level of income and ri is the interest rate for home mortgages (standard
errors in parenthesis and t-stats reported below). The correlation matrix is below the estimations.
Each cell in the table is the simple correlation coefficient between the two variables indicated in
the corresponding row and column headings.
a .Look at the simple correlation coefficients. Is there evidence of multicollinearity? If yes, for
which variables?
Answer:
Yes. The simple correlation between ln(POPi) and ln(Yi) is 0.92, which is very high, so there is evidence of multicollinearity between population and income. Note, however, that examining only pairwise correlations is limiting: the pairwise correlations can all be small and yet a linear dependence can still exist among three or more variables.
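The reasoning in part (a) can be mechanized as a simple scan of the pairwise correlations. The 0.8 cutoff below is a common rule of thumb, not a formal test, and, as noted above, small pairwise correlations can still hide a linear dependence among three or more variables.

```python
# Pairwise correlations from the problem's correlation matrix.
corr = {
    ("ln(POP)", "ln(Y)"): 0.92,
    ("ln(POP)", "r"): 0.15,
    ("ln(Y)", "r"): -0.34,
}

THRESHOLD = 0.8  # informal rule-of-thumb cutoff (an assumption)

suspect = [pair for pair, rho in corr.items() if abs(rho) > THRESHOLD]
print(suspect)  # only the (ln(POP), ln(Y)) pair exceeds the cutoff
```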
b. Why does the coefficient on ln(POPi) increase when dropping ln(Yi) from the equation?
Answer:
The coefficient on ln(POPi) increases when ln(Yi) is dropped from the equation because ln(POPi) and ln(Yi) are highly correlated: the correlation matrix shows a correlation of 0.92 between them. When ln(Yi) is omitted, ln(POPi) absorbs part of the positive effect of income on house prices, pushing its estimated coefficient upward.
c. Even if you did not have the correlation matrix you could have determined whether there is
multicollinearity. What could you say from comparing the regression output between the three
models that would indicate whether there is multicollinearity?
Answer:
Even if the correlation matrix is not given, we can still detect multicollinearity by comparing the three regressions: in Model 1 the standard errors are large and the t-statistics are low despite a high adjusted R², a classic symptom of multicollinearity. In Model 2, if we drop ln(Yi), then the standard ...
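The symptom described above, large standard errors that shrink when a collinear regressor is dropped, can be reproduced with a small simulation. Everything below is synthetic data, not the housing sample, and `ols_se` is a hypothetical helper written for this sketch.

```python
import numpy as np

# Two highly correlated regressors inflate OLS standard errors;
# dropping one of them shrinks the remaining coefficient's SE.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.3 * rng.normal(size=n)   # corr(x1, x2) is high
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)

def ols_se(X, y):
    """Return OLS coefficient standard errors (X must include a constant)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return np.sqrt(np.diag(cov))

X_full = np.column_stack([np.ones(n), x1, x2])
X_drop = np.column_stack([np.ones(n), x1])

se_full = ols_se(X_full, y)[1]   # SE of x1 with x2 included
se_drop = ols_se(X_drop, y)[1]   # SE of x1 with x2 dropped
print(se_full > se_drop)  # collinearity inflates the standard error
```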