19 Statistics problems

User Generated

fglyrf

Mathematics

Description

The problems are in the attached document

Homework #2.xlsx 

Unformatted Attachment Preview

Please type your name here

On 1 through 15, no partial credit. No comments or work will be considered in grading. 2 points each.

Answer cells: 1 2 3 4 5 6 7 8 9 10 11 (outlier? Yes or no; why?) 12 13 14 15

To be used for grading:

problem         possible points
1 through 15    30
16              15
17              15
18              20
19              20
total           100
your points     0

Questions 1 through 15 are 2 points each.

1. True False  In simple linear regression, the standard error of the fit, often denoted Se, is in the same u[...]

2. True False  The correlation coefficient r has the same sign as the slope b1 in the least squares fit Y = [...]

3. True False  A correlation of 0.5 could correspond to a scatter plot with randomly scattered points tiltin[...]

4. True False  In multiple regression, the null hypothesis for the F-test is that there is a linear relationship between the dependent variable and the set of independent variables.

5. True False  In simple linear regression, if the 95% confidence interval for the slope includes 0 then th[...]

6. True False  In multiple regression, if the F-test has a p-value less than .05 then all of the slopes will al[...]

7. True False  In regression, R-squared is the proportion of the variance in the dependent variable that is [...]

8. ___ A medical researcher wondered if there is a significant difference between the mean birth weight of boy and girl babies. Random samples of 5 babies' weights (in pounds) for each gender showed the following:

Boys    8     4.7   7.3   6.2   3.4
Girls   5.3   2.8   6.4   6.8   7.4

To test the researcher's hypothesis, we should use the:
a. paired (dependent) samples t-test
b. independent samples t-test
c. large-sample z-test
d. t-test for correlation

9. ___ In a test of a new surgical procedure, the five surgeons participated in a study at a hospital. Each surgeon was assigned two patients of the same age, gender, and overall health. One patient was operated upon in the old way, and the other in the new way. Both procedures are considered equally safe. The goal of the study was to compare average times to perform the surgery in the two different ways. The surgery times are shown below:

Surgeon   Allen   Bob   Chloe   Daphne   Edgar
Old way   36      55    28      40       62
New way   31      45    28      35       57

Which test should we use to test for zero difference in mean times?
a. Paired t-test
b. Independent samples t-test
c. Independent samples z-test
d. Cannot be sure without knowing α.

10. ___ To test whether a linear relationship using only a single independent variable is significant, w[...]
a. The hypothesis that the correlation is 0.
b. The hypothesis that the slope is 0.
c. Whether the p-value of the F-statistic is less than .05.
d. All of the above.
e. None of the above.

11. The regression equation Salary = 25,000 + 3200 YearsExperience + 1400 YearsCollege describes employee salaries at a company. The standard error of the fit is 2800. An employee has 10 years' experience and 4 years of college. This employee's salary is $66,000. Is this employee's salary [...]

For questions 12 through 15:
In a Case-Control study, the Cases (those with disease) and Controls (those without disease) are compared with respect to their past exposure or not to the factor of interest.
In Cohort studies, cohort groups are defined as those exposed and not exposed to a factor, and they are [...]

12. Which of these two types of studies is also called a prospective study?

13. Which of these two types of studies is also called a retrospective study?

14. True False  The incidence rate for those exposed in a Cohort study has as the denominator the tot[...]

15. True False  In a Case-Control study, to find the prevalence of exposure among the cases, the deno[...]

16. (15 pts) A published study reported that overweight people on low-carbohydrate/Mediterranean diets lost more weight and got greater cardiovascular benefits than people on a conventional low-fat diet. To verify this, a nutritionist followed 30 dieters on low-carbohydrate/Mediterranean diets and 30 dieters on conventional low-fat diets and measured weight loss (in pounds) for each. The data set is included in the Excel answer file.
a. What hypothesis should be used to show that average weight loss for those on low-carbohydrate/Mediterranean diets is statistically significantly greater than average weight loss on conventional low-fat diets?
b. Use statistical software to perform this hypothesis test; assume a significance level of .05 and that the population variances are equal. What are the critical value, test statistic, and p-value?
c. What is the conclusion, in the context of the problem?

16. 15 pts
null hypothesis:
alternative hypothesis:
critical value =
test statistic =
p-value =
Conclusion:
c. Interpretation of conclusion in context of problem:
Please show work here.

Low-carb/Mediterranean Diets   Low-fat Diet
9.5                            6.5
8.1                            5.8
10.4                           9.9
11.9                           5.1
11.8                           8
12.6                           6.3
6.7                            6.3
9.6                            4.4
11.6                           5.7
8.4                            5.9
9                              6.8
7.5                            5.1
7.2                            6.3
8.5                            5.5
8.8                            5.5
6.8                            5.9
9.1                            6.9
9.4                            9.1
10.2                           8
9.5                            8.9
9.5                            3.4
9.4                            12
9.9                            9.7
9.2                            13
11.3                           13.6
9                              4.6
6.2                            4.6
6.7                            4.6
7.1                            11
4.5                            3.9

17. (15 pts) Consider the following three Case-Control studies to determine whether there is an association between a factor and a disease. The Cases are patients with brain tumors and the Controls are patients without brain tumors. The three factors considered are: Gender, Smoking, and Cellular Phone Use. The three tables below show the numbers of Cases with and without exposure to each factor, and the number of Controls with and without exposure to each factor.
For each study: find the prevalence of exposure and the odds of exposure for the Cases, find the prevalence of exposure and the odds of exposure for the Controls, and calculate the odds ratio of Cases to Controls. Interpret, in the context of the problem, the odds ratio. The data sets are included in the Excel answer file.
a. First is a study to determine if 'being Male' (Gender) is a factor with brain tumors. This means that exposure is defined as being male, and no exposure is being female.
b. The second study looks at whether smoking is a factor with brain tumors.
c. The third study looks at whether cellular phone usage is a factor with brain tumors.

17. 15 pts
Study 1              prevalence of exposure   odds of exposure   odds ratio
Cases
Controls
Interpretation:

Study 2              prevalence of exposure   odds of exposure   odds ratio
Cases
Controls
Interpretation:

Study 3              prevalence of exposure   odds of exposure   odds ratio
Cases
Controls
Interpretation:

Study 1: Gender and Brain Tumor
exposure                    Brain Tumor (Cases)   No Brain Tumor (Controls)
Male                        11                    10
Female                      9                     10
total                       20                    20

Study 2: Smoking and Brain Tumor
exposure                    Brain Tumor (Cases)   No Brain Tumor (Controls)
Smoker                      13                    9
Nonsmoker                   7                     11
total                       20                    20

Study 3: Cellular Phone Use and Brain Tumor
exposure                    Brain Tumor (Cases)   No Brain Tumor (Controls)
Use cellular phone          11                    11
Do not use cellular phone   9                     9
total                       20                    20

18. (20 pts) An article in the New York Times discussed the relationship between Scholastic Aptitude Test (SAT) scores and the test-takers' family incomes. It commented that the wealthier a student's family, the higher the SAT score. Another common conjecture is that the student's high school grade point average (GPA) is a good predictor of the student's SAT score.
In the Data Analysis output below, I used regression to find the least squares fit to model SAT score using GPA and Income. The data set I used had 24 students' SAT scores, the students' family Incomes (in $) and high school GPA's. Please use the output given to answer the following questions. It is included in the answer file.
a. What is the least squares fit equation to predict SAT using Income and GPA? Please state it in equation form: Y = constant + slope*X1 + slope*X2 where you include the constant and slope values.
b. What is the approximate size of errors when using your equation to predict SAT? Compare this to the standard deviation of the SAT scores, which is 84.47.
c. What fraction of the variation in SAT is explained using Income and GPA in the least squares equation?
d. What is the F-statistic of the equation? Please interpret it in the context of this problem. What hypothesis does it test? What is its p-value? What is your conclusion with respect to the hypothesis?
e. Interpret the two slopes. Are the slope estimates significant? Explain.
f. Use the equation to predict the SAT of a student with Income = average of the Incomes ($72,833) and GPA = average of the GPA's (3.28). What is your prediction?

18. 20 pts
a. equation is
b. size of errors:    comparison:
c. fraction is:
d. Fstat:    p-value:
   Interpretation of F-stat:
   H0:
   H1:
   Conclusion:
e. slope of Income is
   Interpretation of slope:
   test statistic:    p-value
   H0:
   H1:
   Is it significant? Why?
e. slope of GPA is
   Interpretation of slope:
   test statistic:    p-value
   H0:
   H1:
   Is it significant? Why?
f. SAT prediction:

Regression Statistics
Multiple R          0.930
R Square            0.865
Adjusted R Square   0.852
Standard Error      32.490
Observations        24

ANOVA
             df   SS           MS          F        Significance F
Regression   2    141927.957   70963.979   67.225   7.44E-10
Residual     21   22167.876    1055.613
Total        23   164095.833

            Coefficients   Standard Error   t Stat   P-value    Lower 95%   Upper 95%
Intercept   1104.258       54.752           20.168   3.17E-15   990.394     1218.122
Income      0.0017         0.00025          6.770    1.07E-06   0.001       0.002
GPA         150.992        15.093           10.004   1.92E-09   119.604     182.380

19. (20 pts) Investigate a multiple regression model of employee wages (Wage) considering the following variables: years of higher education (EDUC), years of experience (EXPER), age (AGE), and gender (GENDER = 1 for male, 0 for female). There are 50 observations, based on interviews with employees, given in the exam answer file. Wage is in units of $10,000's; education, experience, and age are in years.
a. First consider a model that uses all four of the independent variables. Give the prediction equation and include the software run that produced the model. Is the model significant? Comment on this model, referencing the usual measures to assess the fit.
b. Predict the average Wage for a 40-year-old male with 10 years of higher education and 5 years of experience. Predict the average Wage of a 40-year-old female with the same qualifications.
c. Interpret the Gender coefficient. Is it significant at the 5% level? Why?
d. Remove any variables that are not significant and build a new regression model. Give the prediction equation and include the software run that produced the model. Comment on this model, referencing the usual measures to assess the fit.
e. Find the correlation matrix and with it explain the differences in the two models and why some of the variables were not significant in the first model.

19. 20 pts
a. equation is:
   Significant? yes or no    Why?
   Comments:
b. prediction is:
   prediction is:
c. Interpretation of Gender coefficient:
d. Which variables, if any, are not significant?
   new equation is:
   Significant? yes or no    Why?
   Comments:
e. Comments of differences after viewing the correlation matrix. Please show matrix to the right here:

Note that: Wage is in $10,000's, and EDUC and AGE are in years, and GENDER = 1 for male, 0 for female.

Wage    EDUC   EXPER   AGE   Gender
37.85   11     2       40    1
21.72   4      1       39    0
14.34   4      2       38    0
21.26   5      9       53    1
24.65   6      15      59    1
25.65   6      12      36    1
15.45   9      5       45    0
20.39   4      12      37    0
29.13   5      14      37    1
27.33   11     3       43    1
18.02   8      5       32    0
20.39   9      18      40    1
24.18   7      1       49    1
17.29   4      10      43    0
15.61   1      9       31    0
35.07   9      22      45    0
40.33   11     3       31    1
20.39   4      14      55    0
16.61   6      5       30    1
16.33   9      3       28    0
23.15   6      15      60    1
20.39   4      13      32    0
14.88   4      9       58    1
13.88   5      4       28    0
17.65   6      5       40    1
15.45   6      2       37    0
26.35   4      18      52    1
19.15   6      4       44    0
16.61   6      4       57    0
18.39   9      3       30    1
15.45   5      8       43    0
18.02   7      6       31    1
13.44   4      3       33    0
17.66   6      23      51    1
16.96   4      15      37    0
14.34   4      9       45    0
15.45   6      3       55    0
17.43   5      14      57    0
35.89   9      16      36    1
20.39   4      20      60    1
11.81   4      5       35    0
15.45   9      10      34    0
17.66   5      4       28    1
13.87   6      1       25    0
16.35   7      10      43    1
15.45   9      2       42    1
23.67   4      17      47    0
16.02   11     2       46    1
23.15   4      15      52    0
24.18   8      11      64    0
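For problem 16, the equal-variance (pooled) two-sample t statistic can be computed directly from the two columns of weight-loss data. The following is only a sketch using the Python standard library: it assumes the flattened data in the preview alternates low-carb/Mediterranean and low-fat values row by row, and the names `low_carb`, `low_fat`, and `pooled_t` are made up for illustration.

```python
import math
from statistics import mean, variance

# Weight loss in pounds. ASSUMPTION: the preview lists the two columns
# interleaved row by row (low-carb/Mediterranean first, then low-fat).
low_carb = [9.5, 8.1, 10.4, 11.9, 11.8, 12.6, 6.7, 9.6, 11.6, 8.4,
            9, 7.5, 7.2, 8.5, 8.8, 6.8, 9.1, 9.4, 10.2, 9.5,
            9.5, 9.4, 9.9, 9.2, 11.3, 9, 6.2, 6.7, 7.1, 4.5]
low_fat = [6.5, 5.8, 9.9, 5.1, 8, 6.3, 6.3, 4.4, 5.7, 5.9,
           6.8, 5.1, 6.3, 5.5, 5.5, 5.9, 6.9, 9.1, 8, 8.9,
           3.4, 12, 9.7, 13, 13.6, 4.6, 4.6, 4.6, 11, 3.9]


def pooled_t(a, b):
    """Return the pooled (equal-variance) t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / std_err, df


t_stat, df = pooled_t(low_carb, low_fat)
print(f"mean difference = {mean(low_carb) - mean(low_fat):.3f}")
print(f"t = {t_stat:.3f}, df = {df}")
```

The one-sided critical value and p-value need the t distribution, which the standard library does not provide; if SciPy is available, `scipy.stats.t.ppf(0.95, df)` and `scipy.stats.t.sf(t_stat, df)` are one way to obtain them.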
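For problem 17, prevalence of exposure, odds of exposure, and the odds ratio all follow from the exposed and unexposed counts in each group. A minimal sketch, assuming the counts read from the three study tables (exposed, unexposed); the function and variable names are hypothetical:

```python
def exposure_stats(exposed, unexposed):
    """Return (prevalence of exposure, odds of exposure) for one group."""
    total = exposed + unexposed
    return exposed / total, exposed / unexposed


def odds_ratio(cases, controls):
    """Odds of exposure among Cases divided by odds of exposure among Controls."""
    _, odds_cases = exposure_stats(*cases)
    _, odds_controls = exposure_stats(*controls)
    return odds_cases / odds_controls


# (exposed, unexposed) counts taken from the three study tables.
studies = {
    "Gender (male)": {"cases": (11, 9), "controls": (10, 10)},
    "Smoking": {"cases": (13, 7), "controls": (9, 11)},
    "Cellular phone use": {"cases": (11, 9), "controls": (11, 9)},
}

for name, groups in studies.items():
    prev_cases, odds_cases = exposure_stats(*groups["cases"])
    prev_controls, odds_controls = exposure_stats(*groups["controls"])
    ratio = odds_ratio(groups["cases"], groups["controls"])
    print(f"{name}: cases prevalence={prev_cases:.2f} odds={odds_cases:.3f}; "
          f"controls prevalence={prev_controls:.2f} odds={odds_controls:.3f}; "
          f"odds ratio={ratio:.3f}")
```

An odds ratio above 1 means exposure is more common among Cases than among Controls, and a ratio of exactly 1 means the odds of exposure are the same in both groups.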
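For problem 18, the summary figures in the regression output are tied together by simple arithmetic, and the fitted equation gives the prediction in part f. The sketch below plugs in the coefficients and sums of squares as printed in the output. Note that the Income coefficient is shown rounded to 0.0017, so the prediction is approximate; the name `predict_sat` is hypothetical.

```python
import math

# Values copied from the regression output (Income slope is shown rounded).
intercept, b_income, b_gpa = 1104.258, 0.0017, 150.992
ss_regression, ss_residual, ss_total = 141927.957, 22167.876, 164095.833
df_regression, df_residual = 2, 21


def predict_sat(income, gpa):
    """Least squares prediction: SAT = intercept + b_income*Income + b_gpa*GPA."""
    return intercept + b_income * income + b_gpa * gpa


ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual       # tests H0: both slopes are 0
r_squared = ss_regression / ss_total       # fraction of variation explained
std_error = math.sqrt(ms_residual)         # typical size of prediction errors

print(f"F = {f_stat:.3f}, R^2 = {r_squared:.3f}, Se = {std_error:.3f}")
print(f"Predicted SAT at mean Income and GPA: {predict_sat(72833, 3.28):.1f}")
```

Recomputing F, R Square, and the Standard Error from the ANOVA table is a quick consistency check on the reconstructed output before using it to answer parts b through d.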