Please type your name here
On 1 through 15, no partial credit. No comments or work will be considered in grading. 2 points each.
1
2
3
4
5
6
7
8
9
10
11   outlier? Yes or no   why?
12
13
14
15
to be used for grading:
problem        possible points   your points
1 through 15   30
16             15
17             15
18             20
19             20
total          100               0
Questions 1 through 15 are 2 points each.
1. True False In simple linear regression, the standard error of the fit, often denoted Se, is in the same u…
2. True False The correlation coefficient r has the same sign as the slope b1 in the least squares fit Y =…
3. True False A correlation of 0.5 could correspond to a scatter plot with randomly scattered points tiltin…
4. True False In multiple regression, the null hypothesis for the F-test is that there is a linear relationship between the dependent variable and the set of independent variables.
5. True False In simple linear regression, if the 95% confidence interval for the slope includes 0 then th…
6. True False In multiple regression, if the F-test has a p-value less than .05 then all of the slopes will al…
7. True False In regression, R-squared is the proportion of the variance in the dependent variable that is…
8. ___ A medical researcher wondered if there is a significant difference between the mean birth weight of boy and girl babies. Random samples of 5 babies’ weights (in pounds) for each gender showed the following:
Boys    8     4.7   7.3   6.2   3.4
Girls   5.3   2.8   6.4   6.8   7.4
To test the researcher’s hypothesis, we should use the:
a. paired (dependent) samples t-test
b. independent samples t-test
c. large-sample z-test
d. t-test for correlation
9. ___ In a test of a new surgical procedure, five surgeons participated in a study at a hospital. Each surgeon was assigned two patients of the same age, gender, and overall health. One patient was operated upon in the old way, and the other in the new way. Both procedures are considered equally safe. The goal of the study was to compare average times to perform the surgery in the two different ways. The surgery times are shown below:
Surgeon   Allen   Bob   Chloe   Daphne   Edgar
Old way   36      55    28      40       62
New way   31      45    28      35       57
Which test should we use to test for zero difference in mean times?
a. Paired t-test
b. Independent samples t-test
c. Independent samples z-test
d. Cannot be sure without knowing α.
10. ______ To test whether a linear relationship using only a single independent variable is significant, w…
a. The hypothesis that the correlation is 0.
b. The hypothesis that the slope is 0.
c. Whether the p-value of the F-statistic is less than .05.
d. All of the above.
e. None of the above.
11. The regression equation Salary = 25,000 + 3200 YearsExperience + 1400 YearsCollege describes employee salaries at a company. The standard error of the fit is 2800. An employee has 10 years' experience and 4 years of college. This employee’s salary is $66,000. Is this employee’s salary…
For questions 12 through 15:
In a Case-Control study, the Cases (those with disease) and Controls (those without disease) are compared with respect to their past exposure or not to the factor of interest.
In Cohort studies, cohort groups are defined as those exposed and not exposed to a factor, and they are…
12. Which of these two types of studies is also called a prospective study?
13. Which of these two types of studies is also called a retrospective study?
14. True False The incidence rate for those exposed in a Cohort study has as the denominator the tot…
15. True False In a Case-Control study, to find the prevalence of exposure among the cases, the deno…
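The arithmetic behind questions 8, 9, and 11 can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the grading key: it computes a pooled independent-samples t statistic for the birth weights, a paired t statistic for the surgery times (and, for contrast, the pooled statistic that ignores the pairing), and the question 11 residual expressed in multiples of the standard error of the fit. The function names `pooled_t` and `paired_t` are hypothetical helpers introduced here.

```python
import math
from statistics import mean, stdev, variance

# Data from questions 8 and 9.
boys = [8, 4.7, 7.3, 6.2, 3.4]
girls = [5.3, 2.8, 6.4, 6.8, 7.4]
old_way = [36, 55, 28, 40, 62]  # Allen, Bob, Chloe, Daphne, Edgar
new_way = [31, 45, 28, 35, 57]  # same surgeons, same order


def pooled_t(x, y):
    """Independent-samples t statistic using a pooled (equal) variance estimate."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2)), n1 + n2 - 2


def paired_t(x, y):
    """Paired t statistic: a one-sample t-test on the within-pair differences x - y."""
    d = [a - b for a, b in zip(x, y)]
    return mean(d) / (stdev(d) / math.sqrt(len(d))), len(d) - 1


t_weights, df_weights = pooled_t(boys, girls)     # two unrelated groups of babies
t_paired, df_paired = paired_t(old_way, new_way)  # each surgeon does both procedures
t_naive, df_naive = pooled_t(old_way, new_way)    # ignores the pairing, for contrast

# Question 11: residual of the $66,000 salary in units of the standard error of the fit.
predicted_salary = 25_000 + 3200 * 10 + 1400 * 4
residual_in_se = (66_000 - predicted_salary) / 2800

print(f"birth weights: t = {t_weights:.3f}, df = {df_weights}")
print(f"surgery, paired: t = {t_paired:.3f}, df = {df_paired}")
print(f"surgery, pairing ignored: t = {t_naive:.3f}, df = {df_naive}")
print(f"salary: predicted = {predicted_salary}, residual = {residual_in_se:.2f} Se")
```

Ignoring the pairing leaves the large surgeon-to-surgeon variation in the standard error, so the pooled statistic is much smaller than the paired one on the same numbers. For question 11, a common rule of thumb (assumed here, not stated in the exam) flags a point as an outlier when its residual exceeds about 2 Se in absolute value.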
16. (15 pts) A published study reported that overweight people on low-carbohydrate/Mediterranean diets lost more
weight and got greater cardiovascular benefits than people on a conventional low-fat diet. To verify this, a nutritionist
followed 30 dieters on low-carbohydrate/Mediterranean diets and 30 dieters on conventional low-fat diets and
measured the weight loss (in pounds) of each dieter. The data set is included in the excel answer file.
a. What hypothesis should be used to show that average weight loss for those on low-carbohydrate/Mediterranean
diets is statistically significantly greater than average weight loss on conventional low-fat diets?
b. Use statistical software to perform this hypothesis test; assume a significance level of .05 and that the population
variances are equal. What are the critical value, test statistic, and p-value?
c. What is the conclusion, in the context of the problem?
16. 15 pts
null hypothesis:
alternative hypothesis:
critical value =
test statistic =
p-value =
Conclusion:
c. Interpretation of conclusion in context of problem:
Please show work here.
Low-carb/Mediterranean Diets   Low-fat Diet
9.5                            6.5
8.1                            5.8
10.4                           9.9
11.9                           5.1
11.8                           8
12.6                           6.3
6.7                            6.3
9.6                            4.4
11.6                           5.7
8.4                            5.9
9                              6.8
7.5                            5.1
7.2                            6.3
8.5                            5.5
8.8                            5.5
6.8                            5.9
9.1                            6.9
9.4                            9.1
10.2                           8
9.5                            8.9
9.5                            3.4
9.4                            12
9.9                            9.7
9.2                            13
11.3                           13.6
9                              4.6
6.2                            4.6
6.7                            4.6
7.1                            11
4.5                            3.9
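For part b, the pooled (equal-variance) two-sample t statistic can be sketched as below. This assumes the values above are read row by row as (Low-carb/Mediterranean, Low-fat) pairs; the exam itself expects statistical software, such as Excel's "t-Test: Two-Sample Assuming Equal Variances" tool. The hypotheses are one-sided, H0: μ_lowcarb − μ_lowfat ≤ 0 versus H1: μ_lowcarb − μ_lowfat > 0, so H0 is rejected at α = .05 when t exceeds the upper-tail critical value t(.05, 58), which is about 1.67. The sketch does not compute the p-value; take that from the software output.

```python
import math
from statistics import mean, variance

# Weight loss in pounds, read row by row from the data table above
# (first value of each row = Low-carb/Mediterranean, second = Low-fat).
low_carb = [9.5, 8.1, 10.4, 11.9, 11.8, 12.6, 6.7, 9.6, 11.6, 8.4,
            9, 7.5, 7.2, 8.5, 8.8, 6.8, 9.1, 9.4, 10.2, 9.5,
            9.5, 9.4, 9.9, 9.2, 11.3, 9, 6.2, 6.7, 7.1, 4.5]
low_fat = [6.5, 5.8, 9.9, 5.1, 8, 6.3, 6.3, 4.4, 5.7, 5.9,
           6.8, 5.1, 6.3, 5.5, 5.5, 5.9, 6.9, 9.1, 8, 8.9,
           3.4, 12, 9.7, 13, 13.6, 4.6, 4.6, 4.6, 11, 3.9]


def pooled_t(x, y):
    """Two-sample t statistic for mean(x) - mean(y), assuming equal population variances."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, df


t_stat, df = pooled_t(low_carb, low_fat)
print(f"mean difference = {mean(low_carb) - mean(low_fat):.3f} lb")
print(f"t = {t_stat:.3f} with df = {df}")
```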
17. (15 pts) Consider the following three Case-Control studies to determine whether there is an association between a factor and a
disease. The Cases are patients with brain tumors and the Controls are patients without brain tumors. The three factors considered
are: Gender, Smoking, and Cellular Phone Use. The three tables below show the numbers of Cases with and without exposure to
each factor, and the numbers of Controls with and without exposure to each factor.
For each study: find the prevalence of exposure and the odds of exposure for the Cases, find the prevalence of exposure and the
odds of exposure for the Controls, and calculate the odds ratio of Cases to Controls. Interpret, in the context of the problem, the
odds ratio. The data sets are included in the excel answer file.
a. First is a study to determine if ‘being Male’ (Gender) is a factor with brain tumors. This means that exposure is defined as being
male, and no exposure is being female.
b. The second study looks at whether smoking is a factor with brain tumors.
c. The third study looks at whether cellular phone usage is a factor with brain tumors.
17. 15 pts
Study 1
            prevalence of exposure   odds of exposure   odds ratio
Cases
Controls
Interpretation:
Study 2
            prevalence of exposure   odds of exposure   odds ratio
Cases
Controls
Interpretation:
Study 3
            prevalence of exposure   odds of exposure   odds ratio
Cases
Controls
Interpretation:

Study 1 - Gender and Brain Tumor
exposure                    Brain Tumor (Cases)   No Brain Tumor (Controls)
Male                        11                    10
Female                      9                     10
total                       20                    20

Study 2 - Smoking and Brain Tumor
exposure                    Brain Tumor (Cases)   No Brain Tumor (Controls)
Smoker                      13                    9
Nonsmoker                   7                     11
total                       20                    20

Study 3 - Cellular Phone Use and Brain Tumor
exposure                    Brain Tumor (Cases)   No Brain Tumor (Controls)
Use cellular phone          11                    11
Do not use cellular phone   9                     9
total                       20                    20
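The prevalence, odds, and odds-ratio calculations asked for in question 17 follow directly from the counts in the three tables above. The sketch below is a minimal illustration; the dictionary labels and the helper `prevalence_and_odds` are hypothetical names introduced here. Prevalence of exposure is exposed / total, odds of exposure is exposed / not exposed, and the odds ratio divides the Cases' odds by the Controls' odds.

```python
# Counts from the three tables above: (exposed, not exposed) for Cases and for Controls.
studies = {
    "Study 1 (exposure = male)": {"cases": (11, 9), "controls": (10, 10)},
    "Study 2 (exposure = smoker)": {"cases": (13, 7), "controls": (9, 11)},
    "Study 3 (exposure = uses cellular phone)": {"cases": (11, 9), "controls": (11, 9)},
}


def prevalence_and_odds(exposed, unexposed):
    """Return (prevalence of exposure, odds of exposure) for one group."""
    return exposed / (exposed + unexposed), exposed / unexposed


results = {}
for name, groups in studies.items():
    prev_cases, odds_cases = prevalence_and_odds(*groups["cases"])
    prev_controls, odds_controls = prevalence_and_odds(*groups["controls"])
    results[name] = {
        "prevalence_cases": prev_cases,
        "odds_cases": odds_cases,
        "prevalence_controls": prev_controls,
        "odds_controls": odds_controls,
        "odds_ratio": odds_cases / odds_controls,  # Cases' odds relative to Controls' odds
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

An odds ratio above 1 means the odds of exposure are higher among the Cases than among the Controls, and a ratio of exactly 1 means the exposure odds are the same in both groups.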
18. (20 pts) An article in the New York Times discussed the relationship between Scholastic Aptitude Test (SAT) scores
and the test-takers’ family incomes. It commented that the wealthier a student’s family, the higher the SAT score.
Another common conjecture is that the student’s high school grade point average (GPA) is a good predictor of the
student’s SAT score. In the Data Analysis output below, I used regression to find the least squares fit to model SAT
score using GPA and Income. The data set I used had 24 students’ SAT scores, the students’ family Incomes (in $), and
their high school GPAs. Please use the output given to answer the following questions. It is included in the answer file.
a. What is the least squares fit equation to predict SAT using Income and GPA? Please state it in equation form: Y =
constant + slope*X1 + slope*X2 where you include the constant and slope values.
b. What is the approximate size of errors when using your equation to predict SAT? Compare this to the standard deviation
of the SAT scores, which is 84.47.
c. What fraction of the variation in SAT is explained using Income and GPA in the least squares equation?
d. What is the F-statistic of the equation? Please interpret it in the context of this problem. What hypothesis does it
test? What is its p-value? What is your conclusion with respect to the hypothesis?
e. Interpret the two slopes. Are the slope estimates significant? Explain.
f. Use the equation to predict the SAT of a student with Income = average of the Incomes ($72,833) and GPA =
average of the GPAs (3.28). What is your prediction?
18. 20 pts
a. equation is
b. size of errors:
comparison:
c. fraction is:
d. Fstat:
p-value:
Interpretation of F-stat:
H0:
H1:
Conclusion:
e. slope of Income is
Interpretation of slope:
test statistic:
p-value:
H0:
H1:
Is it significant? Why?
e. slope of GPA is
Interpretation of slope:
test statistic:
p-value:
H0:
H1:
Is it significant? Why?
f. SAT prediction:
Regression Statistics
Multiple R           0.930
R Square             0.865
Adjusted R Square    0.852
Standard Error       32.490
Observations         24

ANOVA
             df   SS           MS          F        Significance F
Regression   2    141927.957   70963.979   67.225   7.44E-10
Residual     21   22167.876    1055.613
Total        23   164095.833

             Coefficients   Standard Error   t Stat   P-value    Lower 95%   Upper 95%
Intercept    1104.258       54.752           20.168   3.17E-15   990.394     1218.122
Income       0.0017         0.00025          6.770    1.07E-06   0.001       0.002
GPA          150.992        15.093           10.004   1.92E-09   119.604     182.380
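Several of the summary numbers in this output can be recomputed from the ANOVA sums of squares, which is a useful check when answering parts b through d and f. The sketch below assumes only the printed values: R Square = SS Regression / SS Total, Standard Error = √(MS Residual), F = MS Regression / MS Residual, and the SD of SAT = √(SS Total / 23). The Income coefficient is displayed rounded to 0.0017, so the part f prediction computed here is approximate and may differ slightly from one using the unrounded coefficient in the answer file.

```python
import math

# Numbers copied from the Data Analysis output above.
ss_regression, ss_residual, ss_total = 141927.957, 22167.876, 164095.833
df_regression, df_residual = 2, 21
coefficients = {"Intercept": 1104.258, "Income": 0.0017, "GPA": 150.992}

r_squared = ss_regression / ss_total                                    # part c
standard_error = math.sqrt(ss_residual / df_residual)                   # part b
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)  # part d
sd_sat = math.sqrt(ss_total / (df_regression + df_residual))            # n - 1 = 23 df


def predict_sat(income, gpa):
    """Part f: plug values into SAT = b0 + b1*Income + b2*GPA."""
    return (coefficients["Intercept"]
            + coefficients["Income"] * income
            + coefficients["GPA"] * gpa)


prediction = predict_sat(72_833, 3.28)
print(f"R^2 = {r_squared:.3f}, Se = {standard_error:.2f}, F = {f_stat:.3f}, SD(SAT) = {sd_sat:.2f}")
print(f"predicted SAT = {prediction:.1f}")
```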
19. (20 pts) Investigate a multiple regression model of employee wages (Wage) considering the following variables: years
of higher education (EDUC), years of experience (EXPER), age (AGE), and gender (GENDER= 1 for male, 0 for female).
There are 50 observations, based on interviews with employees, given in the exam answer file. Wage is in units of
$10,000’s; education, experience, and age are in years.
a. First consider a model that uses all four of the independent variables. Give the prediction equation and include the
software run that produced the model. Is the model significant? Comment on this model, referencing the usual measures to
assess the fit.
b. Predict the average Wage for a 40-year-old male with 10 years of higher education and 5 years of experience. Predict
the average Wage of a 40-year-old female with the same qualifications.
c. Interpret the Gender coefficient. Is it significant at the 5% level? Why?
d. Remove any variables that are not significant and build a new regression model. Give the prediction equation and
include the software run that produced the model. Comment on this model, referencing the usual measures to assess the fit.
e. Find the correlation matrix and use it to explain the differences between the two models and why some of the variables were
not significant in the first model.
19. 20 pts
a. equation is:
Significant? (yes or no)   Why?
Comments:
b. prediction (male) is:
prediction (female) is:
c. Interpretation of Gender coefficient:
d. Which variables, if any, are not significant?
new equation is:
Significant? (yes or no)   Why?
Comments:
e. Comments on differences after viewing the correlation matrix. Please show the matrix to the right here:
Note: Wage is in $10,000s; EDUC, EXPER, and AGE are in years; GENDER = 1 for male, 0 for female.
Wage    EDUC   EXPER   AGE   Gender
37.85   11     2       40    1
21.72   4      1       39    0
14.34   4      2       38    0
21.26   5      9       53    1
24.65   6      15      59    1
25.65   6      12      36    1
15.45   9      5       45    0
20.39   4      12      37    0
29.13   5      14      37    1
27.33   11     3       43    1
18.02   8      5       32    0
20.39   9      18      40    1
24.18   7      1       49    1
17.29   4      10      43    0
15.61   1      9       31    0
35.07   9      22      45    0
40.33   11     3       31    1
20.39   4      14      55    0
16.61   6      5       30    1
16.33   9      3       28    0
23.15   6      15      60    1
20.39   4      13      32    0
14.88   4      9       58    1
13.88   5      4       28    0
17.65   6      5       40    1
15.45   6      2       37    0
26.35   4      18      52    1
19.15   6      4       44    0
16.61   6      4       57    0
18.39   9      3       30    1
15.45   5      8       43    0
18.02   7      6       31    1
13.44   4      3       33    0
17.66   6      23      51    1
16.96   4      15      37    0
14.34   4      9       45    0
15.45   6      3       55    0
17.43   5      14      57    0
35.89   9      16      36    1
20.39   4      20      60    1
11.81   4      5       35    0
15.45   9      10      34    0
17.66   5      4       28    1
13.87   6      1       25    0
16.35   7      10      43    1
15.45   9      2       42    1
23.67   4      17      47    0
16.02   11     2       46    1
23.15   4      15      52    0
24.18   8      11      64    0