ECON 7810 Albany College of Pharmacy Applied Econometrics Worksheet


Description

The question requires using R to analyse the data and answer it.

Part of the question:

Please provide a summary statistics table including the number of observations, mean, standard deviation, minimum and maximum of all variables in the dataset.

(4 points) (ii) Use the data to estimate the population model

attend = β0 + β1·hw1 + u

Unformatted Attachment Preview

Econ7810: Applied Econometrics, Fall 2021
Homework #2
Due date: 22 October 2021, 1pm.

Do not copy and paste the answers from your classmates. Two identical homeworks will be treated as cheating. Do not copy and paste the entire output of your statistical package; report only the relevant part of the output. Please also submit your R script for the empirical part. Please put all your work in one single file and upload it via Moodle.

Part I  Multiple Choice (3 points each, 21 points in total)

Please choose the answer that you think is appropriate.

1.1 When there are omitted variables in the regression which are determinants of the dependent variable, then
a. you cannot measure the effect of the omitted variable, but the estimator of your included variable(s) is (are) unaffected.
b. this has no effect on the estimator of your included variable because the other variable is not included.
c. this will always bias the OLS estimator of the included variable.
d. the OLS estimator is biased if the omitted variable is correlated with the included variable.

1.2 If you had a two-regressor regression model, then omitting one variable which is relevant
a. will have no effect on the coefficient of the included variable if the correlation between the excluded and the included variable is negative.
b. will always bias the coefficient of the included variable upwards.
c. can result in a negative value for the coefficient of the included variable, even though the coefficient will have a significant positive effect on Y if the omitted variable were included.
d. makes the sum of the product between the included variable and the residuals different from 0.

1.3 Consider the multiple regression model with two regressors X1 and X2, where both variables are determinants of the dependent variable. You first regress Y on X1 only and find no relationship. However, when regressing Y on X1 and X2, the slope coefficient changes by a large amount. This suggests that your first regression suffers from
a. heteroskedasticity
b. perfect multicollinearity
c. omitted variable bias
d. dummy variable trap

1.4 Imperfect multicollinearity
a. implies that it will be difficult to estimate precisely one or more of the partial effects using the data at hand
b. violates one of the four Least Squares assumptions in the multiple regression model
c. means that you cannot estimate the effect of at least one of the Xs on Y
d. suggests that a standard spreadsheet program does not have enough power to estimate the multiple regression model

1.5 If you reject a joint null hypothesis using the F-test in a multiple hypothesis setting, then
a. a series of t-tests may or may not give you the same conclusion.
b. the regression is always significant.
c. all of the hypotheses are always simultaneously rejected.
d. the F-statistic must be negative.

1.6 If the estimates of the coefficients of interest change substantially across specifications,
a. then this can be expected from sample variation
b. then you should change the scale of the variables to make the changes appear to be smaller
c. then this often provides evidence that the original specification had omitted variable bias
d. then choose the specification for which your coefficient of interest is most significant

1.7 You have estimated the following equation:

TestScore^ = 607.3 + 3.85·Income − 0.0423·Income²

where TestScore is the average of the reading and math scores on the Stanford 9 standardized test administered to 5th grade students in 420 California school districts in 1998 and 1999. Income is the average annual per capita income in the school district, measured in thousands of 1998 dollars. The equation
a. suggests a positive relationship between test scores and income for most of the sample.
b. is positive until a value of Income of 610.81.
c. does not make much sense since the square of income is entered.
d. suggests a positive relationship between test scores and income for all of the sample.

Part II  Short Questions (32 points in total)

Please limit your answer to less than or equal to 5 lines per sub-question.

(12 points) 2.1 This question is about an omitted variable bias. The following model estimates the effects of age on time spent sleeping by adults:

sleep^ = 3128.91 + 3.54·age,  n = 706, R² = 0.008,
         (59.47)   (1.47)

where sleep is measured in minutes per week and age is measured in years. The standard errors are given in parentheses.
(2 points) (i) Interpret the coefficient estimate on age.
(2 points) (ii) It is likely that adults trade off sleep for work. If you are also given the data for time spent working, measured in minutes per week, and include this variable (say totwrk) in the above regression, what would you expect the sign of the coefficient on totwrk to be?
(8 points) (iii) Part (ii) suggests that there might be an omitted variable bias in the original simple regression because totwrk is not included. For an omitted variable bias to exist, what additional condition needs to be met? Do you think this condition holds in reality? If yes, what do you expect the sign of the omitted variable bias to be?

(20 points) 2.2 You have collected data for 104 countries to address the difficult question of the determinants of differences in the standard of living among the countries of the world. You recall from your macroeconomics lectures that the neoclassical growth model suggests that output per worker (per capita income) levels are determined by, among others, the saving rate and the population growth rate. To test the predictions of this growth model, you run the following regression:

RelPersInc^ = 0.339 − 12.894·n + 1.397·sk,  R² = 0.621

where RelPersInc is GDP per worker relative to the United States, n is the average population growth rate, 1980-1990, and sk is the average investment share of GDP from 1960 to 1990 (remember investment equals saving).
(6 points) (i) Interpret the results. Do the signs correspond to what you expected them to be? Explain. (Hint: the Solow growth model predicts higher productivity with higher saving rates and lower population growth.)
(8 points) (ii) You remember that human capital, in addition to physical capital, also plays a role in determining the standard of living of a country. You therefore collect additional data on the average educational attainment in years for 1985, and add this variable (Educ) to the above regression. This results in the modified regression output:

RelPersInc^ = 0.046 − 5.869·n + 0.738·sk + 0.055·Educ,  R² = 0.775

When the variable Educ is omitted, what happens to the coefficient estimates of n and sk? Explain the reason and mechanism in detail.
(2 points) (iii) Upon checking the regression output, you realize that there are only 86 observations, since data for Educ is not available for all 104 countries in your sample. Do you have to modify some of your statements in (b)?
(4 points) (iv) Brazil has the following values in your sample: RelPersInc = 0.30, n = 0.021, sk = 0.169, Educ = 3.5. Does your equation overpredict or underpredict the relative GDP per worker? What would happen to this result if Brazil managed to double the average educational attainment?

Part III  Empirical part (47 points in total)

Please limit your answer to less than or equal to 10 lines per sub-question. PLEASE REPORT YOUR REGRESSION OUTCOMES IN TABLES, NOT SCREENSHOTS.

(30 points) 3.1 Use the data attendance2018.dta for this exercise and check the label of each variable for its meaning. Dr. Qin wants to study the relationship between the attendance rate (attend) and some characteristics/performance of students. attend is measured as a percent, and the score (hw1) has a maximum possible value of 100.
(4 points) (i) Please provide a summary statistics table including the number of observations, mean, standard deviation, minimum and maximum of all variables in the dataset.
(4 points) (ii) Use the data to estimate the population model

attend = β0 + β1·hw1 + u

Report the results in a table, including the sample size and R-squared. Interpret the coefficient β1. Does hw1 explain a lot of the variation in the attendance rate?
(4 points) (iii) Dr. Qin just found that another variable is available in this data set, entry_GPA, which is the GPA before the students were enrolled in the program. If Dr. Qin is interested in discovering the relationship between the attendance rate and the first homework score (hw1), should Dr. Qin include this variable in the regression? Explain.
(8 points) (iv) Dr. Qin decides to include entry_GPA. Please use the data to estimate the model

attend = β0 + β1·hw1 + β2·entry_GPA + u

Please report the result and interpret β1 and β2. Do they make sense? Please derive the sign of the possible bias if entry_GPA is excluded.
(4 points) (v) Dr. Qin wants to see if there is a gender gap in the attendance rate. Please suggest a regression model, estimate it and use the result to answer the question.
(6 points) (vi) Dr. Qin further wonders whether her teaching is equally attractive/boring to different ethnic groups of students. There are in total three ethnic groups in the class, indicated by three dummy variables: black, white and asian. Please suggest a regression model, estimate it and use the result to answer the question. You may use some tests if appropriate.

(17 points) 3.2 Please use VOTE2016.dta to answer the following questions. The following model can be used to study whether campaign expenditures affect election outcomes:

voteA = β0 + β1·log(expendA) + β2·log(expendB) + u                         (1)
voteA = β0 + β1·log(expendA) + β2·log(expendB) + β3·prtystrA + u           (2)

where voteA is the percentage of the vote received by Candidate A, expendA and expendB are campaign expenditures (in 1000 dollars) by Candidates A and B, and prtystrA is a measure of party strength for Candidate A (the percentage of the most recent presidential vote that went to A's party).
(4 points) (i) Please run regression (1) and report your result in a table. Does A's expenditure affect the outcome, and how? What about B's expenditure? (Hint: you need to first create the variables ln(expendA) and ln(expendB); the R function log() can do this.)
(8 points) (ii) Please run regression (2) and report your result in the same table. Does A's expenditure affect the outcome, and how? What about B's expenditure? Compare the results from (i) and (ii) and explain whether we should include prtystrA in the regression or not. If we exclude it, in which direction does the coefficient of interest tend to be biased?
(5 points) (iii) Can you tell whether a 1% increase in A's expenditures is offset by a 1% increase in B's expenditure? How? Please suggest a regression or test and then answer the question according to your result (see the R sketch below).
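For 3.2, a minimal R sketch of one possible approach is given here; it is an illustration, not the assignment's model answer. It assumes the file VOTE2016.dta can be read with the haven package, that the car package is available for the joint test, and that the variable names match the assignment text.

# Hedged sketch for 3.2: read the data, build the log-expenditure variables,
# estimate models (1) and (2), and test beta1 + beta2 = 0.
library(haven)   # assumed available, to read the Stata .dta file
library(car)     # assumed available, for linearHypothesis()

vote <- read_dta("VOTE2016.dta")            # assumes the file is in the working directory
vote$lexpendA <- log(vote$expendA)
vote$lexpendB <- log(vote$expendB)

m1 <- lm(voteA ~ lexpendA + lexpendB, data = vote)              # model (1)
m2 <- lm(voteA ~ lexpendA + lexpendB + prtystrA, data = vote)   # model (2)
summary(m1)
summary(m2)

# (iii) A 1% increase in A's spending is exactly offset by a 1% increase in B's
# spending when beta1 + beta2 = 0; that restriction can be tested directly:
linearHypothesis(m2, "lexpendA + lexpendB = 0")

Failing to reject H0: β1 + β2 = 0 would be consistent with a 1% increase in A's expenditure being offset by a 1% increase in B's.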
Applied Econometrics, ECON7810, Fall 2021, Lecture 5, Dr. Bei QIN (SW Ch. 5/6)

Linear Regression with Multiple Regressors (SW Chapter 6)

Outline
1. Omitted variable bias
2. Causality and regression analysis
3. Multiple regression and OLS
4. Measures of fit
5. Sampling distribution of the OLS estimator

Omitted Variable Bias (SW Section 6.1)
The error u is present because of factors, or variables, that influence Y but are not included in the regression function. There are always omitted variables. Sometimes, the omission of those variables can lead to bias in the OLS estimator. Sometimes, it does not.

The bias in the OLS estimator that occurs as a result of an omitted factor, or variable, is called omitted variable bias. The two conditions for omitted variable bias:
(1) Z is a determinant of Y (i.e. Z is part of u); and
(2) Z is correlated with the regressor X (i.e. corr(Z, X) ≠ 0).
Both conditions must hold for the omission of Z to result in omitted variable bias.

In the class size and test score example:
1. English language ability (whether the student has English as a second language) plausibly affects standardized test scores: Z is a determinant of Y.
2. Immigrant communities tend to be less affluent and thus have smaller school budgets and higher STR: Z is correlated with X.
So β̂1 is biased. What is the direction of this bias?
· What does common sense suggest?
· If common sense fails you, there is a formula…

A formula for omitted variable bias: recall the equation

β̂1 − β1 = [ (1/n) Σ vi ] / [ ((n − 1)/n)·s²X ],  where vi = (Xi − X̄)·ui ≈ (Xi − μX)·ui.

Under LSA #1, E[(Xi − μX)·ui] = cov(Xi, ui) = 0. But what if E[(Xi − μX)·ui] = cov(Xi, ui) = σXu ≠ 0?

Under LSA #2 and #3 (that is, even if LSA #1 is not true),

β̂1 − β1 = [ (1/n) Σ (Xi − X̄)·ui ] / [ (1/n) Σ (Xi − X̄)² ]  →p  σXu / σ²X = (σu/σX)·(σXu/(σX·σu)) = (σu/σX)·ρXu,

where ρXu = corr(X, u). If assumption #1 is correct, then ρXu = 0, but if not we have the omitted variable bias formula:

β̂1  →p  β1 + (σu/σX)·ρXu

· If an omitted variable Z is both (1) a determinant of Y (that is, it is contained in u) and (2) correlated with X, then ρXu ≠ 0 and the OLS estimator β̂1 is biased and is not consistent.
· For example, districts with few ESL students (1) do better on standardized tests and (2) have smaller classes (bigger budgets), so ignoring the many-ESL-students factor would result in overstating the class size effect.
· Why?
  o Districts with few ESL students (1) do better on standardized tests, so the number of ESL students enters the error term with a negative sign.
  o Districts with few ESL students (2) have smaller classes, so the number of ESL students is positively correlated with the STR.
  o ρXu < 0, so β̂1 < β1. If β1 < 0, this means that the effect of reducing the class size will be overestimated.
Is this actually going on in the CA data?
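The direction-of-bias argument above can be checked with a small simulation. The following is a hedged illustration added to these notes, not part of the slides; the data-generating process and all numbers are invented for demonstration.

# Hedged illustration (not from the slides): simulate an omitted variable Z that
# (1) determines Y and (2) is correlated with X, then compare the short and long
# regressions.
set.seed(1)
n <- 10000
z <- rnorm(n)
x <- 0.5 * z + rnorm(n)             # corr(X, Z) > 0
y <- 1 + 2 * x - 3 * z + rnorm(n)   # true beta1 = 2; Z enters the error with a negative sign

coef(lm(y ~ x))       # short regression: slope well below 2 (rho_Xu < 0, downward bias)
coef(lm(y ~ x + z))   # long regression: slope close to the true value 2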
· Districts with fewer English Learners have higher test scores.
· Districts with lower percent EL (PctEL) have smaller classes.
· Among districts with comparable PctEL, the effect of class size is small (recall the overall "test score gap" = 7.4).

Mozart effect
Omitted variable bias is common in studies using observational data. Here is another example:
· A study claimed that listening to Mozart for 10 to 15 minutes could temporarily raise your IQ by 8 or 9 points. Really?
  o A review of a dozen studies found that students who take optional music or arts courses in high school do have higher English or math test scores than those who don't. But…
  o Academically better students might have more time to take optional music or arts courses.
  o Those schools with a deeper musical curriculum might be just better schools.
· A randomized, controlled experiment fails to find a significant Mozart effect.

Causality and regression analysis
The test score/STR/English example shows that, if an omitted variable satisfies the two conditions for omitted variable bias, then the OLS estimator in the regression omitting that variable is biased and inconsistent. So, even if n is large, β̂1 will not be close to β1. This raises a deeper question: how do we define β1? That is, what precisely do we want to estimate when we run a regression? There are (at least) three possible answers:
1. We want to estimate the slope of a line through a scatterplot as a simple summary of the data, to which we attach no substantive meaning. This can be useful at times, but isn't very interesting intellectually and isn't what this course is about.
2. We want to make forecasts, or predictions, of the value of Y for an entity not in the data set, for which we know the value of X. Forecasting is an important job for economists, and excellent forecasts are possible using regression methods without needing to know causal effects. We will return to forecasting later in the course.
3. We want to estimate the causal effect on Y of a change in X. This is why we are interested in the class size effect. Suppose the school board decided to cut class size by 2 students per class (holding all other factors fixed). What would be the effect on test scores? This is a causal question (what is the causal effect on test scores of STR?), so we need to estimate this causal effect.

What, precisely, is a causal effect?
· "Causality" is a complex concept! In this course, we take a practical approach to defining causality: a causal effect is defined to be the effect measured in an ideal randomized controlled experiment.

Ideal Randomized Controlled Experiment
· Randomized: subjects from the population of interest are randomly assigned to a treatment or control group (so there are no confounding factors).
· Controlled: having a control group permits measuring the differential effect of the treatment.
· Experiment: the treatment is assigned as part of the experiment; the subjects have no choice, so there is no "reverse causality" in which subjects choose the treatment they think will work best.

Back to class size: imagine an ideal randomized controlled experiment for measuring the effect on TestScore of reducing STR.
(1) In that experiment, students would be randomly assigned to classes, which would have different sizes.
(2) Because they are randomly assigned, all student characteristics (and thus ui) would be distributed independently of STRi.
(3) Thus, E(ui | STRi) = 0; that is, LSA #1 holds in a randomized controlled experiment.

How does our observational data differ from this ideal?
· The treatment is not randomly assigned.
· Consider PctEL, the percent of English learners in the district. It plausibly satisfies the two criteria for omitted variable bias: Z = PctEL is (1) a determinant of Y and (2) correlated with the regressor X.
· Thus, the "control" and "treatment" groups differ in a systematic way, so corr(STR, PctEL) ≠ 0.
· (Randomized + controlled) means that any differences (other than the treatment) between the treatment and control groups are random, not systematically related to the treatment.
· But, with observational data, we can eliminate the difference in PctEL between the large (control) and small (treatment) groups by examining the effect of class size among districts with the same PctEL.
  o If the only systematic difference between the large and small class size groups is in PctEL, then we are back to the randomized controlled experiment, within each PctEL group.
  o This is one way to "control" for the effect of PctEL when estimating the effect of STR.

Three ways to overcome omitted variable bias
1. Run a randomized controlled experiment in which the treatment (STR) is randomly assigned: then PctEL is still a determinant of TestScore, but PctEL is uncorrelated with STR.
2. Adopt the "cross tabulation" approach, with finer gradations of STR and PctEL; within each group, all classes have the same PctEL, so we control for PctEL.
3. Use a regression in which the omitted variable (PctEL) is no longer omitted: include PctEL as an additional regressor in a multiple regression.

The Population Multiple Regression Model (SW Section 6.2)
Consider the case of two regressors:

Yi = β0 + β1·X1i + β2·X2i + ui,  i = 1,…,n

· Y is the dependent variable; X1, X2 are the two regressors (independent variables).
· (Yi, X1i, X2i) denote the ith observation on Y, X1, and X2.
· β0 = unknown population intercept.
· β1 = effect on Y of a change in X1, holding X2 constant.
· β2 = effect on Y of a change in X2, holding X1 constant.
· ui = the regression error (omitted factors).

Interpretation of coefficients in multiple regression
Consider changing X1 by ΔX1 while holding X2 constant.
Population regression line before the change: Y = β0 + β1·X1 + β2·X2.
Population regression line after the change: Y + ΔY = β0 + β1·(X1 + ΔX1) + β2·X2.
Difference: ΔY = β1·ΔX1. So:
β1 = ΔY/ΔX1, holding X2 constant;
β2 = ΔY/ΔX2, holding X1 constant;
β0 = predicted value of Y when X1 = X2 = 0.

The OLS Estimator in Multiple Regression (SW Section 6.3)
With two regressors, the OLS estimator solves:

min over (b0, b1, b2) of  Σ [Yi − (b0 + b1·X1i + b2·X2i)]²

· This minimization problem is solved using calculus. It yields the OLS estimators of β0, β1 and β2; the formulas are complicated and we will not derive them.

Example: the California test score data.
Regression of TestScore against STR:
TestScore^ = 698.9 − 2.28·STR
Now include the percent of English Learners in the district (PctEL):
TestScore^ = 686.0 − 1.10·STR − 0.65·PctEL
· What happens to the coefficient on STR? Why? (Note: corr(STR, PctEL) = 0.19)

Multiple regression in R
TestScore^ = 686.0 − 1.10·STR − 0.65·PctEL
(Regression printout not reproduced in this preview; see the R sketch after these notes.) More on this printout later…

Measures of Fit for Multiple Regression (SW Section 6.4)
Yi = Ŷi + ûi
R² = fraction of the variance of Y explained by the Xs.
R̄² = "adjusted R²" = R² with a degrees-of-freedom correction.
The R² is the fraction of the variance explained, with the same definition as in regression with a single regressor:

R² = ESS/TSS = 1 − SSR/TSS,

where ESS = Σ (Ŷi − Ȳ)², SSR = Σ ûi², TSS = Σ (Yi − Ȳ)².
· The R² always increases when you add another regressor, which is a bit of a problem for a measure of "fit".
The R̄² (the "adjusted R²") corrects this problem by "penalizing" you for including another regressor; the R̄² does not necessarily increase when you add another regressor:

R̄² = 1 − [(n − 1)/(n − k − 1)]·(SSR/TSS)

Note that R̄² < R²; however, if n is large the two will be very close.

Measures of fit, ctd. Test score example:
(1) TestScore^ = 698.9 − 2.28·STR, R² = .05
(2) TestScore^ = 686.0 − 1.10·STR − 0.65·PctEL, R² = .426, R̄² = .424
· What, precisely, does this tell you about the fit of regression (2) compared with regression (1)?
· Why are the R² and the R̄² so close in (2)?

The Least Squares Assumptions for Multiple Regression (SW Section 6.5)
Yi = β0 + β1·X1i + β2·X2i + … + βk·Xki + ui,  i = 1,…,n
1. The conditional distribution of u given the X's has mean zero, that is, E(ui | X1i = x1,…, Xki = xk) = 0.
2. (X1i,…, Xki, Yi), i = 1,…,n, are i.i.d.
3. Large outliers are unlikely: X1,…, Xk, and Y have finite fourth moments: E(X1i⁴) < ∞,…, E(Xki⁴) < ∞, E(Yi⁴) < ∞.
4. There is no perfect multicollinearity.

Assumption #1: the conditional mean of u given the included Xs is zero, E(u | X1 = x1,…, Xk = xk) = 0.
· This has the same interpretation as in regression with a single regressor.
· Failure of this condition leads to omitted variable bias: specifically, if an omitted variable (1) belongs in the equation (so is in u) and (2) is correlated with an included X, then this condition fails and there is OV bias.
· The best solution, if possible, is to include the omitted variable in the regression.

Assumption #2: (X1i,…, Xki, Yi), i = 1,…,n, are i.i.d. This is satisfied automatically if the data are collected by simple random sampling.

Assumption #3: large outliers are rare (finite fourth moments). This is the same assumption as we had before for a single regressor. As in the case of a single regressor, OLS can be sensitive to large outliers, so you need to check your data (scatterplots!) to make sure there are no crazy values (typos or coding errors).

Assumption #4: there is no perfect multicollinearity. Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors. Example: suppose you accidentally include STR twice. In that regression, β1 is the effect on TestScore of a unit change in STR, holding STR constant. We will return to perfect (and imperfect) multicollinearity shortly, with more examples.
With these least squares assumptions in hand, we can now derive the sampling distribution of β̂1, β̂2,…, β̂k.

The Sampling Distribution of the OLS Estimator (SW Section 6.6)
Under the four Least Squares Assumptions,
· β̂1 is unbiased.
· var(β̂1) is inversely proportional to n.
· Other than its mean and variance, the exact (finite-n) distribution of β̂1 is very complicated; but for large n:
  o β̂1 is consistent: β̂1 →p β1.
  o (β̂1 − E(β̂1)) / sd(β̂1) is approximately distributed N(0, 1).
  o These statements hold for β̂1,…, β̂k.
Conceptually, there is nothing new here!

Multicollinearity, Perfect and Imperfect (SW Section 6.7)
Perfect multicollinearity is when one of the regressors is an exact linear function of the other regressors. Some more examples of perfect multicollinearity:
1. The example from before: you include STR twice.
2. Regress TestScore on a constant, D, and B, where Di = 1 if STR ≤ 20 and 0 otherwise, and Bi = 1 if STR > 20 and 0 otherwise, so Bi = 1 − Di and there is perfect multicollinearity.

The dummy variable trap
Suppose you have a set of multiple binary (dummy) variables which are mutually exclusive and exhaustive; that is, there are multiple categories and every observation falls in one and only one category (Freshmen, Sophomores, Juniors, Seniors, Other). If you include all these dummy variables and an intercept, you will have perfect multicollinearity; this is sometimes called the dummy variable trap.
· Why is there perfect multicollinearity here?
· Solutions to the dummy variable trap: 1. omit one of the groups (e.g. Senior), or 2. omit the intercept.

Perfect multicollinearity, ctd.
· Perfect multicollinearity usually reflects a mistake in the definitions of the regressors, or an oddity in the data.
· If you have perfect multicollinearity, your statistical software will let you know.
· The solution to perfect multicollinearity is to modify your list of regressors so that you no longer have perfect multicollinearity.

Imperfect multicollinearity
Imperfect and perfect multicollinearity are quite different despite the similarity of the names. Imperfect multicollinearity occurs when two or more regressors are very highly, but not perfectly, correlated. It implies that one or more of the regression coefficients will be imprecisely estimated.
· The idea: the coefficient on X1 is the effect of X1 holding X2 constant; but if X1 and X2 are highly correlated, there is very little variation in X1 once X2 is held constant, so the data don't contain much information about what happens when X1 changes but X2 doesn't.
· Imperfect multicollinearity (correctly) results in large standard errors for one or more of the OLS coefficients.
· The math? See SW, Appendix 6.2.

Next topic: hypothesis tests and confidence intervals…
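Since the "Multiple regression in R" slide refers to a printout that did not survive in this preview, here is a hedged sketch of how those two regressions could be reproduced. It assumes the CASchools data shipped with the AER package; the variable names (students, teachers, english, read, math) are that package's and need not match the course files.

# Hedged sketch of the slides' California test score regressions.
library(AER)
data("CASchools")

CASchools$STR   <- CASchools$students / CASchools$teachers   # student-teacher ratio
CASchools$score <- (CASchools$read + CASchools$math) / 2     # average test score

summary(lm(score ~ STR, data = CASchools))            # roughly 698.9 - 2.28*STR
summary(lm(score ~ STR + english, data = CASchools))  # roughly 686.0 - 1.10*STR - 0.65*PctEL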

Explanation & Answer

Please view explanation and answer below.

Part III

Empirical Part

3.1 Use the data attendance2018.dta

I.

Summary statistics table

. summarize attend hw1 entry_GPA

     Variable |      Obs        Mean    Std. Dev.        Min        Max
 -------------+---------------------------------------------------------
       attend |      658    71.20745     18.83009          7       97.5
          hw1 |      658    69.89457     10.69598     40.625        100
    entry_GPA |      658    2.801016     .5563479      .9427      3.971
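Since the assignment asks for R (the table above comes from Stata's summarize), an equivalent summary can be produced as follows. This is a minimal sketch, assuming attendance2018.dta sits in the working directory and the haven package is available.

# Hedged R equivalent of the summary statistics table above.
library(haven)   # assumed available, to read the Stata .dta file

attendance <- read_dta("attendance2018.dta")

data.frame(
  Obs  = sapply(attendance, function(x) sum(!is.na(x))),
  Mean = sapply(attendance, mean, na.rm = TRUE),
  SD   = sapply(attendance, sd,   na.rm = TRUE),
  Min  = sapply(attendance, min,  na.rm = TRUE),
  Max  = sapply(attendance, max,  na.rm = TRUE)
)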

II.

Estimation
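A hedged R sketch of the simple regression for part (ii), reusing the attendance data frame from the sketch above:

# Simple regression of the attendance rate on the first homework score
model1 <- lm(attend ~ hw1, data = attendance)
summary(model1)   # coefficients, R-squared, adjusted R-squared, F-statistic
nobs(model1)      # sample size actually used in the regression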

Interpretation
There were 658 observations in the data set. From the regression output, at the 0.05 significance level the overall model is significant, F(1, 656) = 143.89, p < 0.01. This means that hw1 has a statistically significant effect on the attendance rate. The adjusted R-squared value of 0.1786 indicates that the independent variable hw1 explains only 17.86% (quite low) of the variation in the dependent variable, attendance rate. The variable hw1 had a coeffic...

