SOLUTION: segment addition postulate - Algebra

Directions: Complete the following questions. The questions have been separated into 4 parts of similar material. Parts 1, 2, and 3 use only the corona_train data, while Part 4 uses the corona_test data. Use the R Markdown starter file hw7_starter.Rmd.

Part 1 - Odds

1. Using the training dataset, compute the odds that a county has reported a Coronavirus-related death. (2pts)
2. Do the odds of a Coronavirus-related death vary by Census region? Compute the odds that a county has reported a Coronavirus-related death for each Census region within the United States. Compare these values to address the question. (3pts)

Part 2 - Simple Logistic Regression

3. Build a plot (or plots) to explore how the logarithm of the population density predicts whether a county has recorded a Coronavirus-related death. Briefly discuss the results of your plot. (2pts)
4. Build a simple logistic model to statistically determine whether the logarithm of the population density predicts the probability that a county has reported a Coronavirus-related death. Support your findings with an appropriate hypothesis test. (3pts)

Part 3 - Multiple Logistic Regression Models

5. Fit a multiple logistic regression model to predict the probability that a county has reported a Coronavirus-related death, using as predictors the Census region, the logarithm of population density, the cumulative coronavirus rate, the median county age, the median income, the percent of the county that are U.S. citizens, the percent with a college degree, the percent of the population that are veterans of the U.S. armed services, the percent with healthcare, and the percent that voted for President Trump in the 2016 general election. Conduct an appropriate test to determine whether this model significantly predicts the probability that a county has reported a Coronavirus-related death. (3pts)
6. Perform a backward selection procedure on the model from question 5. Which variable(s) has/have been removed from the model? (2pts)
7. We will now continue a backward selection procedure, but this time using likelihood ratio tests. Using the drop1() function to determine which predictors are significant, iteratively remove all insignificant predictors from the model in question 6. That is, look at the drop1() output for the model in question 6, refit the model after removing all insignificant terms, look at the drop1() output again, refit after removing all insignificant terms, and so on. Continue this process until all predictors are significant. What predictor variables remain in the model? (4pts)
8. The starter file contains some code to help you along on this problem. Build a table to compare the AIC, BIC, and a Pseudo-R-squared for the models fit in questions 5, 6, and 7. Which model is best with respect to each metric? (3pts)
9. Code was supplied for a Pseudo-R-squared calculation in question 8. Explain how this value mimics the traditional R-squared value used in multiple linear regression. (2pts)
10. For the model with the best BIC among those fit in questions 5, 6, and 7, interpret the coefficient regionWest. Be sure to explain this coefficient in terms of odds (not log-odds, which do not provide a nice interpretation). How does this compare to the results in question 2? Why might they be similar or different? (3pts)

Part 4 - Prediction

11. We will use the three fitted models built above to predict whether a county in the testing dataset will have a Coronavirus-related death. Some code is supplied in the starter file; edit and replicate it so that it makes predictions using all three models. Briefly describe what this code is doing. (2pts)
12. Calculate and discuss the accuracy, sensitivity, and specificity of all three models for predicting whether a county has reported a Coronavirus-related death. Which model appears to be the best at predicting whether a county has a Coronavirus-related death? Code is provided for the confusion matrix of the first model. Replicate this code to generate the confusion matrices for the other two models. (6pts)
13. Using the best model from the previous question, compute the sensitivity and specificity if the probability threshold (the 0.5 provided in the code for question 11) were 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9. Use these values to complete the table in the starter file. Which threshold appears to be the best choice? (5pts)

NOTE: The ideas of sensitivity and specificity are VERY relevant in today's society as scientists develop tests for the COVID-19 Coronavirus, both antibody tests and tests that detect the disease. We felt it prudent to introduce these topics under the current circumstances.

Some coding hints

We have covered a lot this semester... In an effort to help you with some of the necessary coding, we provide the following hints, but note that additional code is needed for all of them to work:
xtabs() can be used in questions 1, 2, and 12
ggplot() is needed in question 3
glm(), drop1() and/or anova() are needed in questions 4, 5, 7, and 8
stats::step() is needed in question 6
summary() will provide output with model coefficients; you can also use coef()

I need the .Rmd and .html files at the end.
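
For questions 1 and 2, the odds of a death are the number of counties that reported one divided by the number that did not, which equals p / (1 - p) for the sample proportion p. Below is a minimal R sketch of that arithmetic with xtabs(). It assumes a simulated data frame `sim` whose columns `death` (coded 0/1) and `region` are hypothetical stand-ins for the corona_train variables, not the real data:

```r
# Hypothetical stand-in for corona_train: `death` (1 = county reported a death)
# and `region` are simulated columns, not the real data.
set.seed(1)
n <- 400
regions <- c("Midwest", "Northeast", "South", "West")
sim <- data.frame(region = factor(sample(regions, n, replace = TRUE), levels = regions))
sim$death <- rbinom(n, 1, prob = 0.4)

# Overall odds: (# counties with a death) / (# counties without one),
# which equals p / (1 - p) for the sample proportion p
tab <- xtabs(~ death, data = sim)
odds_overall <- as.numeric(tab["1"]) / as.numeric(tab["0"])

# Odds within each region: cross-tabulate region by death, then divide the columns
tab_region <- xtabs(~ region + death, data = sim)
odds_by_region <- tab_region[, "1"] / tab_region[, "0"]

odds_overall
round(odds_by_region, 3)

# Ratio of each region's odds to the first (baseline) region's odds
round(odds_by_region / odds_by_region[["Midwest"]], 3)
```

Dividing the regional odds by a baseline region gives unadjusted odds ratios. These are the quantities that question 10 compares against a model coefficient.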
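
Questions 4 through 7 chain together several standard glm() tools: the Wald z-test from summary(), a likelihood ratio test via anova(), a global test against the intercept-only model, AIC-based backward selection with step(), and repeated drop1() likelihood ratio tests. The following is a hedged sketch of that workflow on simulated data. The data frame `sim` and its columns `region`, `log_density`, `age`, and `noise` are hypothetical stand-ins for the corona_train variables, and the 0.05 cutoff in the loop is an assumed significance level:

```r
# Simulated, hypothetical data (not corona_train); only log_density truly matters
set.seed(2)
n <- 500
regions <- c("Midwest", "Northeast", "South", "West")
sim <- data.frame(
  region      = factor(sample(regions, n, replace = TRUE), levels = regions),
  log_density = rnorm(n, mean = 4, sd = 1.5),
  age         = rnorm(n, mean = 41, sd = 5),
  noise       = rnorm(n)
)
sim$death <- rbinom(n, 1, plogis(-4 + 0.9 * sim$log_density))

# Simple logistic regression (question 4 style)
fit_simple <- glm(death ~ log_density, family = binomial, data = sim)
summary(fit_simple)$coefficients   # Wald z-test for the slope
anova(fit_simple, test = "Chisq")  # likelihood ratio test vs. intercept-only model

# Overall test for a multiple model (question 5 style)
fit_full <- glm(death ~ region + log_density + age + noise, family = binomial, data = sim)
fit_null <- glm(death ~ 1, family = binomial, data = sim)
global <- anova(fit_null, fit_full, test = "Chisq")
global

# AIC-based backward selection (question 6 style)
fit_step <- step(fit_full, direction = "backward", trace = 0)
formula(fit_step)

# Repeated drop1() likelihood ratio tests (question 7 style): drop every term
# with p > 0.05, refit, and repeat until all remaining terms are significant
fit_lrt <- fit_step
repeat {
  d <- drop1(fit_lrt, test = "LRT")
  p <- setNames(d[["Pr(>Chi)"]][-1], rownames(d)[-1])  # row 1 is "<none>"
  drop_terms <- names(p)[p > 0.05]
  if (length(drop_terms) == 0) break
  fit_lrt <- update(fit_lrt, as.formula(paste(". ~ . -", paste(drop_terms, collapse = " - "))))
}
formula(fit_lrt)
```

In the loop, update() with `. ~ . - term` removes terms from the current formula, so each pass refits the smaller model before drop1() runs again. step() ranks models by AIC rather than by p-values. A 1-df term survives AIC selection whenever its likelihood ratio statistic exceeds 2 (p below roughly 0.157), so the 0.05 rule can remove terms that step() kept.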
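
For questions 8 through 10, the sketch below assumes McFadden's pseudo-R-squared, 1 - logLik(model) / logLik(null model). The starter file's version may differ. For a 0/1 response this equals 1 - deviance / null deviance, so it mimics R-squared = 1 - SSE/SST by replacing the sums of squares with deviances. The fitted model is compared with an intercept-only model that predicts the same probability for every county. Exponentiating a logistic coefficient turns it into an odds ratio. All data, column names, and model names here are simulated and hypothetical:

```r
# Simulated, hypothetical data: West has a true log-odds shift of 0.5
set.seed(3)
n <- 600
regions <- c("Midwest", "Northeast", "South", "West")
sim <- data.frame(
  region      = factor(sample(regions, n, replace = TRUE), levels = regions),
  log_density = rnorm(n, 4, 1.5),
  noise       = rnorm(n)
)
sim$death <- rbinom(n, 1, plogis(-3.5 + 0.8 * sim$log_density + 0.5 * (sim$region == "West")))

# Three nested candidate models, standing in for the models of questions 5, 6, and 7
fit_null <- glm(death ~ 1, family = binomial, data = sim)
models <- list(
  full    = glm(death ~ region + log_density + noise, family = binomial, data = sim),
  reduced = glm(death ~ region + log_density, family = binomial, data = sim),
  simple  = glm(death ~ log_density, family = binomial, data = sim)
)

# McFadden's pseudo-R^2 (assumed choice): 1 - logLik(model) / logLik(null model)
mcfadden <- function(fit) 1 - as.numeric(logLik(fit)) / as.numeric(logLik(fit_null))

comparison <- data.frame(
  model     = names(models),
  AIC       = sapply(models, AIC),
  BIC       = sapply(models, BIC),
  pseudo_R2 = sapply(models, mcfadden),
  row.names = NULL
)
comparison

# Odds ratio for regionWest: West vs. the baseline level (Midwest),
# holding the other predictors in the model fixed
or_adjusted <- exp(coef(models$reduced)[["regionWest"]])

# Unadjusted counterpart in the spirit of question 2: ratio of the raw odds
tab <- xtabs(~ region + death, data = sim)
raw_odds <- tab[, "1"] / tab[, "0"]
or_unadjusted <- raw_odds[["West"]] / raw_odds[["Midwest"]]

c(adjusted = or_adjusted, unadjusted = or_unadjusted)
```

The adjusted and unadjusted odds ratios can differ because the model coefficient holds the other predictors constant, while the raw regional odds do not.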
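
For questions 11 through 13, predict(..., type = "response") returns predicted probabilities, and a cutoff turns them into 0/1 predictions. Sensitivity is the share of counties with a death that are predicted to have one. Specificity is the share of counties without a death that are predicted not to have one. Below is a minimal sketch of these steps. It assumes simulated `train`/`test` data frames built by a hypothetical helper `make_data()`, and a hypothetical helper `classify_metrics()` that computes the three rates from a confusion matrix:

```r
set.seed(4)
# Hypothetical data generator standing in for corona_train / corona_test
make_data <- function(n) {
  d <- data.frame(log_density = rnorm(n, 4, 1.5), age = rnorm(n, 41, 5))
  d$death <- rbinom(n, 1, plogis(-4 + 0.9 * d$log_density))
  d
}
train <- make_data(600)
test  <- make_data(300)

fit <- glm(death ~ log_density + age, family = binomial, data = train)

# Predicted probability of a death for each test county
p_hat <- predict(fit, newdata = test, type = "response")

# Accuracy, sensitivity, and specificity at a given probability threshold
classify_metrics <- function(p, y, threshold = 0.5) {
  pred <- factor(as.integer(p > threshold), levels = c(0, 1))
  obs  <- factor(y, levels = c(0, 1))  # fixed levels keep a 2x2 table at any threshold
  cm <- table(predicted = pred, observed = obs)
  c(accuracy    = sum(diag(cm)) / sum(cm),
    sensitivity = cm["1", "1"] / sum(cm[, "1"]),  # true positive rate
    specificity = cm["0", "0"] / sum(cm[, "0"]))  # true negative rate
}

# Confusion matrix and metrics at the 0.5 cutoff
table(predicted = as.integer(p_hat > 0.5), observed = test$death)
classify_metrics(p_hat, test$death)

# Threshold sweep over 0.1, 0.2, ..., 0.9
thresholds <- seq(0.1, 0.9, by = 0.1)
sweep <- t(sapply(thresholds, function(th) classify_metrics(p_hat, test$death, th)))
sweep <- data.frame(threshold = thresholds, sweep)
sweep
```

Raising the threshold can only lower (or keep) sensitivity and raise (or keep) specificity. The sweep table therefore shows the trade-off that question 13 asks you to judge.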
