ECON7300: Statistical Project Assignment, Semester 2, 2016
Instructions for Dataset 1: Simple Regression Analysis (30 marks)
Paul is a student currently enrolled in Economics. He decided to try to improve his
results. He needed to know the secret of success for university students. After many
hours of discussion with more successful students, he postulated a theory: the longer
one studied, the better one’s grade. To test the theory, he took a random sample of 96
students in an economics subject and asked each student to report the average amount
of time he/she studied economics and the final mark received.
The variables in the dataset are:
Study time (X, in hours)
Final mark (Y, out of 100)
The dependent variable for your analysis is Final mark.
Answer the following questions using dataset 1.
(a) Estimate a regression model using X to predict Y (state the simple linear
regression equation).
(b) Interpret the meaning of the slope.
(c) Predict Y when X = 26.
(d) Compute the coefficient of determination and interpret its meaning.
(e) Compute the standard error of the estimate and interpret its meaning. Judge the
magnitude of the standard error of the estimate.
(f) Perform a residual analysis (plot the residuals) and evaluate whether the
assumptions of regression have been violated.
(g) Test the significance of the slope using the t test (follow all the necessary
steps). Assume a 5% level of significance.
(h) Test the significance of the slope using the F test (follow all the necessary
steps). Assume a 5% level of significance.
(i) Test the significance of the correlation coefficient (follow all the necessary
steps). Assume a 5% level of significance.
(j) Compute a 95% confidence interval estimate of the mean Y for all students when
X = 26 and interpret its meaning.
(k) Compute a 95% prediction interval of Y for an individual student when X = 26 and
interpret its meaning.
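
Parts (a), (b), (d), (e) and (g) all rest on the same least-squares quantities. The sketch below shows one way to compute them by hand in Python. The function name simple_ols and the five-point sample are illustrative assumptions, not part of the assignment, so the printed values will not match Dataset 1.

```python
import math

def simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x with the usual summary statistics."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)                       # variation in X
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total sum of squares

    b1 = sxy / sxx                    # slope
    b0 = y_bar - b1 * x_bar           # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    s_yx = math.sqrt(sse / (n - 2))   # standard error of the estimate
    se_b1 = s_yx / math.sqrt(sxx)     # standard error of the slope
    return {
        "intercept": b0,
        "slope": b1,
        "r_squared": 1 - sse / sst,   # coefficient of determination
        "std_error": s_yx,
        "t_slope": b1 / se_b1,        # t statistic for H0: slope = 0
    }

# Hypothetical toy sample (not the assignment data).
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
fit = simple_ols(hours, marks)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

The t statistic would then be compared with the critical t value for n - 2 degrees of freedom at the chosen significance level.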
Instructions for Dataset 4: Multiple Regression Analysis (45 marks)
Life insurance companies are keenly interested in predicting how long their customers
will live, because their premiums and profitability depend on such numbers. A
statistician for one insurance company gathered data from 100 recently deceased male
customers. He recorded the age at death of the customer, ages at death of his mother
and father, and whether the man was a smoker.
The variables in the dataset are:
Longevity (Y, age at death of the male customer)
Mother (X1, age at death of his mother)
Father (X2, age at death of his father)
Smoker (X3, coded 1 if the man was a smoker, and 0 if the man was not a
smoker)
The dependent variable for your analysis is Longevity.
Answer the following questions using dataset 4.
(a) Estimate a regression model using X1 and X2 to predict Y (state the multiple
regression equation).
(b) Interpret the meaning of the slopes.
(c) Predict Y when X1 = 72 and X2 = 65.
(d) Compute a 95% confidence interval estimate of the mean Y for all male
customers when X1 = 72 and X2 = 65 and interpret its meaning.
(e) Compute a 95% prediction interval of Y for a male customer when X1 = 72 and
X2 = 65 and interpret its meaning.
(f) Plot the residuals to test the assumptions of the regression model. Is there any
evidence of violation of the regression assumptions? Explain.
(g) Determine the variance inflation factor (VIF) for each independent variable (X1
and X2) in the model. Is there reason to suspect the existence of collinearity?
(h) At the 0.05 level of significance, determine whether each independent variable
(X1 and X2) makes a significant contribution to the regression model (use t tests
and follow all the necessary steps). On the basis of these results, indicate the
independent variables to include in the model.
(i) Test for the significance of the overall multiple regression model at the 5% level
of significance.
(j) Determine whether there is a significant relationship between Y and each
independent variable (X1 and X2) at the 5% level of significance (hint: testing
portions of the multiple regression model using the partial F test).
(k) Compute the coefficients of partial determination and interpret their meaning.
(l) Estimate a regression model using X1, X2 and X3 to predict Y (state the multiple
regression equation, the regression equation for smoker, the regression equation
for non-smoker) and interpret the coefficient for X3.
(m) Estimate a regression model using X1, X2, X3, an interaction between X1 and
X2, an interaction between X1 and X3, and an interaction between X2 and X3 to
predict Y.
(n) Test whether the three interactions significantly improve the regression model.
Assume a 5% level of significance (hint: test the joint significance of the three
interaction terms using the partial F test. If you reject the null hypothesis, test the
contribution of each interaction separately (using the partial F test) in order to
determine which interaction terms to include in the model).
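
The partial F test in parts (j) and (n) and the VIF in part (g) can both be expressed through residual sums of squares from nested regressions. The following is a minimal sketch in plain Python, assuming the normal equations are solved directly. The helper names (solve, ols, partial_f, vif) are hypothetical, and the sample uses only the first eight records of Dataset 4 for illustration, so its results will not match the full 100-record analysis.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(columns, y):
    """Regress y on an intercept plus the given predictor columns; return (coefs, SSE)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    sse = sum((yi - sum(c * v for c, v in zip(coefs, r))) ** 2 for r, yi in zip(rows, y))
    return coefs, sse

def partial_f(full_cols, reduced_cols, y):
    """F = [(SSE_reduced - SSE_full) / q] / [SSE_full / (n - k - 1)]."""
    _, sse_full = ols(full_cols, y)
    _, sse_reduced = ols(reduced_cols, y)
    q = len(full_cols) - len(reduced_cols)   # number of terms being tested
    df_full = len(y) - len(full_cols) - 1    # residual df of the full model
    return ((sse_reduced - sse_full) / q) / (sse_full / df_full)

def vif(target, others):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing X_j on the other X's."""
    mean = sum(target) / len(target)
    sst = sum((v - mean) ** 2 for v in target)
    _, sse = ols(others, target)
    return sst / sse                         # identical to 1 / (1 - R_j^2)

# First eight records of Dataset 4 only (illustration, not the full sample).
longevity = [80, 73, 70, 72, 79, 83, 70, 72]
mother = [85, 88, 66, 72, 88, 90, 67, 76]
father = [78, 63, 75, 67, 73, 72, 65, 71]

coefs, sse = ols([mother, father], longevity)
print("b0, b1, b2:", [round(c, 4) for c in coefs])
print("VIF for X1:", round(vif(mother, [father]), 4))
print("Partial F for X2 given X1:", round(partial_f([mother, father], [mother], longevity), 4))
```

The same partial_f call covers part (n) if the full model's column list includes the three interaction columns (products of the relevant predictors) and the reduced list omits them.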
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.834386
R Square             0.6962
Adjusted R Square    0.692968
Standard Error       8.645672
Observations         96

ANOVA
             df    SS          MS          F           Significance F
Regression    1    16101.68    16101.68    215.4139    4.7E-26
Residual     94    7026.279    74.74765
Total        95    23127.96

              Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept     25.51259      3.574654        7.13708   1.97E-10  18.41503   32.61015   18.41503     32.61015
X Variable 1  1.750642      0.119278        14.67698  4.7E-26   1.513813   1.987472   1.513813     1.987472

RESIDUAL OUTPUT

Observation  Predicted Y  Residuals
1            78.03187     -7.03187
2            34.26581     -4.26581
3            88.53572     -6.53572
4            90.28636     7.713638
5            81.53315     -3.53315
6            65.77737     7.222631
7            85.03444     -3.03444
8            85.03444     8.965565
9            100.7902     -1.79022
10           85.03444     -0.03444
11           81.53315     -7.53315
12           78.03187     0.968134
13           88.53572     -6.53572
14           95.53829     -7.53829
15           67.52801     -12.528
16           69.27865     -7.27865
17           76.28122     14.71878
18           62.27608     3.723916
19           79.78251     6.217492
20           78.03187     -5.03187
21           83.28379     6.716207
22           78.03187     9.968134
23           83.28379     7.716207
24           64.02673     -0.02673
25           76.28122     6.718777
26           67.52801     19.47199
27           78.03187     17.96813
28           79.78251     4.217492
29           83.28379     8.716207
30           69.27865     12.72135
31           92.037       -4.037
32           71.0293      3.970704
33           76.28122     3.718777
34           95.53829     -1.53829
35           85.03444     7.965565
36           86.78508     7.214922
37           76.28122     19.71878
38           90.28636     -9.28636
39           86.78508     -9.78508
40           76.28122     -6.28122
41           97.28893     -5.28893
42           88.53572     -13.5357
43           83.28379     -1.28379
44           97.28893     -8.28893
45           74.53058     7.469419
46           72.77994     -12.7799
47           74.53058     -3.53058
48           79.78251     -0.78251
49           69.27865     -12.2787
50           74.53058     0.469419
51           43.01902     1.980982
52           69.27865     -6.27865
53           69.27865     3.721346
54           79.78251     15.21749
55           50.02159     -13.0216
56           93.78765     1.212353
57           74.53058     1.469419
58           78.03187     12.96813
59           72.77994     4.220062
60           58.7748      -7.7748
61           69.27865     -14.2787
62           88.53572     -1.53572
63           79.78251     -1.78251
64           39.51773     -6.51773
65           88.53572     4.46428
66           69.27865     19.72135
67           64.02673     -14.0267
68           83.28379     -4.28379
69           81.53315     5.46685
70           50.02159     -0.02159
71           99.03957     -18.0396
72           78.03187     -10.0319
73           78.03187     -1.03187
74           65.77737     1.222631
75           55.27351     -3.27351
76           85.03444     8.965565
77           72.77994     -2.77994
78           76.28122     -5.28122
79           71.0293      -6.0293
80           69.27865     -16.2787
81           76.28122     0.718777
82           44.76966     10.23034
83           83.28379     9.716207
84           90.28636     -7.28636
85           76.28122     0.718777
86           74.53058     -3.53058
87           57.02416     10.97584
88           57.02416     -13.0242
89           81.53315     8.46685
90           86.78508     0.214922
91           88.53572     -3.53572
92           83.28379     12.71621
93           76.28122     -10.2812
94           79.78251     3.217492
95           85.03444     0.965565
96           53.52287     -1.52287

[Figure: X Variable 1 Residual Plot. Residuals (vertical axis, -20 to 40) plotted against X Variable 1 (horizontal axis, 0 to 50).]
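
Several entries in the summary output are tied together by simple identities, which gives a quick consistency check before the figures are used in parts (c), (d), (e), (g), (h) and (i). The sketch below recomputes them from the reported sums of squares and coefficients. The variable names are illustrative, and the inputs are the rounded values printed in the output, so agreement is only to a few decimal places.

```python
import math

# Figures copied from the summary output (rounded as reported).
ssr, sse, sst = 16101.68, 7026.279, 23127.96
n = 96
b0, b1 = 25.51259, 1.750642
se_b1 = 0.119278

r_squared = ssr / sst            # coefficient of determination
mse = sse / (n - 2)              # residual mean square
std_error = math.sqrt(mse)       # standard error of the estimate
f_stat = (ssr / 1) / mse         # F = MSR / MSE, with 1 regression df
t_slope = b1 / se_b1             # t statistic for the slope

# t statistic for the correlation coefficient; r takes the sign of the slope.
r = math.copysign(math.sqrt(r_squared), b1)
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)

y_hat_26 = b0 + b1 * 26          # point prediction at X = 26

print(f"R^2 = {r_squared:.4f}, S_YX = {std_error:.4f}")
print(f"F = {f_stat:.3f}, t^2 = {t_slope ** 2:.3f}, t_corr = {t_corr:.3f}")
print(f"Predicted Y at X = 26: {y_hat_26:.4f}")
```

In simple regression the F statistic equals the square of the slope's t statistic, and the t statistic for the correlation coefficient coincides with the one for the slope, so the three tests lead to the same decision.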
Sheet1
Study time  Final mark
30          71
5           30
36          82
37          98
32          78
23          73
34          82
34          94
43          99
34          85
32          74
30          79
36          82
40          88
24          55
25          62
29          91
21          66
31          86
30          73
33          90
30          88
33          91
22          64
29          83
24          87
30          96
31          84
33          92
25          82
38          88
26          75
29          80
40          94
34          93
35          94
29          96
37          81
35          77
29          70
41          92
36          75
33          82
41          89
28          82
27          60
28          71
31          79
25          57
28          75
10          45
25          63
25          73
31          95
14          37
39          95
28          76
30          91
27          77
19          51
25          55
36          87
31          78
8           33
36          93
25          89
22          50
33          79
32          87
14          50
42          81
30          68
30          77
23          67
17          52
34          94
27          70
29          71
26          65
25          53
29          77
11          55
33          93
37          83
29          77
28          71
18          68
18          44
32          90
35          87
36          85
33          96
29          66
31          83
34          86
16          52
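
Parts (j) and (k) differ only in the term under the square root: the confidence interval for the mean response uses h = 1/n + (X0 - X̄)²/SSX, while the prediction interval for an individual uses 1 + h. The sketch below applies both to the 96 observations listed above at X = 26. The function name intervals_at is hypothetical, and the critical value 1.9855 is an assumption, taken as the t value with 94 degrees of freedom implied by the half-width of the 95% intervals in the summary output. In practice it would be read from a t table.

```python
import math

# Study time (hours) and final mark for the 96 students, in the order listed above.
study_time = [
    30, 5, 36, 37, 32, 23, 34, 34, 43, 34, 32, 30,
    36, 40, 24, 25, 29, 21, 31, 30, 33, 30, 33, 22,
    29, 24, 30, 31, 33, 25, 38, 26, 29, 40, 34, 35,
    29, 37, 35, 29, 41, 36, 33, 41, 28, 27, 28, 31,
    25, 28, 10, 25, 25, 31, 14, 39, 28, 30, 27, 19,
    25, 36, 31, 8, 36, 25, 22, 33, 32, 14, 42, 30,
    30, 23, 17, 34, 27, 29, 26, 25, 29, 11, 33, 37,
    29, 28, 18, 18, 32, 35, 36, 33, 29, 31, 34, 16,
]
final_mark = [
    71, 30, 82, 98, 78, 73, 82, 94, 99, 85, 74, 79,
    82, 88, 55, 62, 91, 66, 86, 73, 90, 88, 91, 64,
    83, 87, 96, 84, 92, 82, 88, 75, 80, 94, 93, 94,
    96, 81, 77, 70, 92, 75, 82, 89, 82, 60, 71, 79,
    57, 75, 45, 63, 73, 95, 37, 95, 76, 91, 77, 51,
    55, 87, 78, 33, 93, 89, 50, 79, 87, 50, 81, 68,
    77, 67, 52, 94, 70, 71, 65, 53, 77, 55, 93, 83,
    77, 71, 68, 44, 90, 87, 85, 96, 66, 83, 86, 52,
]

T_CRIT = 1.9855  # assumed critical t (two-tailed 5%, 94 df)

def intervals_at(x0, x, y, t_crit):
    """Fit Y on X and return the fit plus 95% CI (mean) and PI (individual) at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ssx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    y_hat = b0 + b1 * x0
    h = 1 / n + (x0 - x_bar) ** 2 / ssx
    ci_half = t_crit * s_yx * math.sqrt(h)        # mean response
    pi_half = t_crit * s_yx * math.sqrt(1 + h)    # individual response
    return {
        "intercept": b0, "slope": b1, "y_hat": y_hat,
        "ci": (y_hat - ci_half, y_hat + ci_half),
        "pi": (y_hat - pi_half, y_hat + pi_half),
    }

result = intervals_at(26, study_time, final_mark, T_CRIT)
print("Fitted line: Y =", round(result["intercept"], 4), "+", round(result["slope"], 4), "X")
print("95% CI for mean Y at X = 26:", tuple(round(v, 3) for v in result["ci"]))
print("95% PI for individual Y at X = 26:", tuple(round(v, 3) for v in result["pi"]))
```

The prediction interval is always wider than the confidence interval at the same X, because it adds the variability of a single student around the regression line to the uncertainty in the line itself.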
Longevity  Mother  Father  Smoker
80         85      78      0
73         88      63      1
70         66      75      1
72         72      67      1
79         88      73      0
83         90      72      0
70         67      65      1
72         76      71      1
72         66      75      1
71         78      64      1
67         69      66      1
74         71      76      0
80         74      77      0
63         68      66      1
71         70      70      1
66         64      66      1
74         82      71      0
71         71      71      1
65         75      60      1
74         76      66      0
66         69      68      1
71         76      67      1
73         69      80      1
74         79      70      0
68         74      62      1
77         81      78      0
77         85      69      0
74         77      69      1
68         72      71      1
68         75      63      1
74         82      66      0
71         73      72      1
80         84      73      0
72         75      61      1
77         82      76      0
76         88      69      0
62         65      57      1
70         75      57      1
71         72      67      1
69         72      63      1
73         79      71      1
72         77      69      1
84         80      90      0
73         76      74      1
70         78      70      1
78         75      74      0
82         83      78      0
77         75      66      0
67         68      69      1
72         69      72      1
67         67      67      1
72         78      64      1
75         73      69      0
72         71      66      1
75         73      73      0
80         84      75      0
72         78      64      1
71         76      61      1
71         77      70      1
89         86      77      0
74         79      71      0
72         72      77      1
74         78      68      0
77         85      77      0
77         72      72      0
73         67      71      1
72         74      64      1
65         66      63      1
81         82      77      0
76         80      70      0
64         71      67      1
72         73      69      1
69         75      67      1
59         59      62      1
69         73      69      1
76         82      65      0
63         71      56      1
66         74      64      1
72         85      60      1
68         72      66      1
73         80      75      1
78         82      72      0
83         82      82      0
78         82      70      0
67         73      62      1
70         77      60      1
68         74      79      1
66         80      63      1
75         76      72      0
71         71      60      1
72         77      64      1
71         76      67      1
77         79      74      0
67         68      68      1
68         71      61      1
67         76      62      1
74         77      64      0
70         66      68      1
72         78      70      1
69         73      60      1