CDS 101 R Project

User Generated

uhyx833

Description

CDS 101 Final Exam Project

1. Read in the file and select columns.
   a. Keep the following columns:
      i. INSTNM
      ii. CITY
      iii. STABBR
      iv. ADM_RATE
      v. COSTT4_A
      vi. MD_EARN_WNE_P10
   b. Save the dataframe to the variable "data".
2. Rename columns.
   a. Use the following new names:
      i. INSTNM -> NAME
      ii. STABBR -> STATE
      iii. COSTT4_A -> AVG_COST
      iv. MD_EARN_WNE_P10 -> MED_10YR_PAY
   b. Save the dataframe to the variable "data_renamed".
3. Count how many missing values there are for each state/territory.
   a. Answer:
      i. Which state/territory has the most missing values?
         1. How many were there?
      ii. Which states/territories have no missing values?
   b. Hint:
      i. Google "US postal code" and the abbreviation to find the full name of the state/territory.
4. Using the data_renamed dataframe, impute missing tuition costs (AVG_COST) with the median cost. Save to the variable data_imputed_cost.
5. Using the data_imputed_cost dataframe, impute missing admission rates (ADM_RATE) with the median admission rate. Save to the variable data_imputed_admissions.
6. Using the data_imputed_admissions dataframe, impute missing MED_10YR_PAY values with the median 10-year pay. Save to the variable data_imputed.
7. Using the data_imputed dataframe, calculate the average cost per institution by state and order the values from smallest to largest. Save to the variable state_price.
   a. Which state/territory has the highest price? What is it?
   b. Which state/territory has the lowest price? What is it?
8. Using the state_price dataframe, create a scatterplot with state as the explanatory variable and price as the response variable. Add color, a descriptive title, and axis labels.
9. Even after adjusting the x-axis labels, it is not very clear which data point belongs to which state. Use ggplotly() to make the graph interactive so it is easier to see which data point corresponds to which state/price.
10. Using the data_imputed dataset, make a linear regression model with average price as the explanatory variable and median debt as the response variable.
   a. Print out the summary statistics.
   b. What is R^2?
11. Use the tidy() function to report the slope and intercept of the model.
12. Use the glance() function to find the r-squared value.
   a. Colleges that have a lower admission rate are considered more selective and prestigious. Is there a relationship between how selective/prestigious a college is and the earnings 10 years after graduation? Use the r-squared value to justify your answer.
13. Hypothesis test: average tuition price vs. George Mason
   a. You will perform a one-sided hypothesis test to determine whether the average cost of attending George Mason University is higher than $24,537.32 (the average cost of universities in this dataset). The average university cost (mean_cost) and the average cost of GMU (cost_obs_stat) have been calculated for you.
14. Generate the null distribution and p-value.
15. Visualize the results and shade to the right. Give the graph a title and label both the x and y axes.
   a. Using a significance level of alpha = 0.05, decide whether to reject or fail to reject the null hypothesis. What does this mean in terms of the cost of GMU compared to the average tuition cost?

Answer, question 12: Use the glance() function to find the r-squared value.

    glance(reg_model)

    r.squared  adj.r.squared  sigma     statistic  p.value  df  logLik     AIC       BIC       deviance      df.residual  nobs
    0.1423654  0.1421989      14669.76  855.3862   0        1   -56768.34  113542.7  113562.3  1.108935e+12  5153         5155

The r-squared value is 0.1423654. This R^2 is low: average price explains only about 14.24% of the variation in the response variable, which means there is not a strong linear relationship between the explanatory and response variables.

Question 12a: Colleges that have a lower admission rate are considered more selective and prestigious. Is there a relationship between how selective/prestigious a college is and the earnings 10 years after graduation? Use the r-squared value to justify your answer.

    reg_model2
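Steps 1 and 2 can be sketched in base R as below. This is a minimal sketch, not the course's reference solution: the assignment does not give the file path, so a small hypothetical data frame (`raw`, with made-up rows and an extra `UNITID` column to be dropped) stands in for what `read.csv()` would return. The column names are the ones listed in steps 1 and 2.

```r
# Hypothetical stand-in for the raw file; in the assignment this would come
# from read.csv() (or readr::read_csv()) on the course data file.
raw <- data.frame(
  UNITID          = 1:3,                    # extra column, dropped in step 1
  INSTNM          = c("College A", "College B", "College C"),
  CITY            = c("Town 1", "Town 2", "Town 3"),
  STABBR          = c("VA", "VA", "MD"),
  ADM_RATE        = c(0.50, NA, 0.80),
  COSTT4_A        = c(20000, 30000, NA),
  MD_EARN_WNE_P10 = c(45000, NA, 50000)
)

# Step 1: keep only the six columns the assignment lists.
keep <- c("INSTNM", "CITY", "STABBR", "ADM_RATE", "COSTT4_A", "MD_EARN_WNE_P10")
data <- raw[, keep]

# Step 2: map old names to new names; columns not in the map keep their names.
new_names <- c(INSTNM = "NAME", STABBR = "STATE",
               COSTT4_A = "AVG_COST", MD_EARN_WNE_P10 = "MED_10YR_PAY")
data_renamed <- data
to_change <- names(data_renamed) %in% names(new_names)
names(data_renamed)[to_change] <- unname(new_names[names(data_renamed)[to_change]])

print(names(data_renamed))
```

With dplyr, the same two steps are `select(INSTNM, CITY, STABBR, ADM_RATE, COSTT4_A, MD_EARN_WNE_P10)` followed by `rename(NAME = INSTNM, STATE = STABBR, AVG_COST = COSTT4_A, MED_10YR_PAY = MD_EARN_WNE_P10)`; note that `rename()` takes `new = old`.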
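For step 3, one reasonable reading of "how many missing values" is the total number of NA cells across all columns, summed within each state. That interpretation is an assumption; counting rows that contain any NA would give different numbers. The toy `data_renamed` below is hypothetical and only illustrates the mechanics.

```r
# Hypothetical toy version of data_renamed (only the columns this step needs).
data_renamed <- data.frame(
  NAME         = c("A", "B", "C", "D"),
  STATE        = c("VA", "VA", "MD", "DC"),
  ADM_RATE     = c(0.5, NA, 0.8, 0.6),
  AVG_COST     = c(20000, 30000, NA, 40000),
  MED_10YR_PAY = c(45000, NA, 50000, 60000)
)

# Number of NA cells in each row, across all columns.
na_per_row <- rowSums(is.na(data_renamed))

# Sum those counts within each state, then sort from most to fewest.
missing_by_state <- sapply(split(na_per_row, data_renamed$STATE), sum)
missing_by_state <- sort(missing_by_state, decreasing = TRUE)

most_missing <- names(missing_by_state)[1]                     # state with the most NAs
none_missing <- names(missing_by_state)[missing_by_state == 0] # states with zero NAs
print(missing_by_state)
```

If two states tie for the most missing values, `names(missing_by_state)[1]` reports only one of them; `missing_by_state[missing_by_state == max(missing_by_state)]` shows all of them.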
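Steps 4 to 6 chain three median imputations. A minimal base-R sketch follows; `impute_median` is a hypothetical helper name (not part of the assignment), and the toy `data_renamed` is made up for illustration.

```r
# Hypothetical toy version of data_renamed.
data_renamed <- data.frame(
  STATE        = c("VA", "VA", "MD", "DC"),
  ADM_RATE     = c(0.5, NA, 0.8, 0.6),
  AVG_COST     = c(20000, 30000, NA, 40000),
  MED_10YR_PAY = c(45000, NA, 50000, 60000)
)

# Hypothetical helper: replace NAs with the median of the non-missing values.
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}

# Step 4: tuition cost.
data_imputed_cost <- data_renamed
data_imputed_cost$AVG_COST <- impute_median(data_imputed_cost$AVG_COST)

# Step 5: admission rate, starting from the step-4 result.
data_imputed_admissions <- data_imputed_cost
data_imputed_admissions$ADM_RATE <- impute_median(data_imputed_admissions$ADM_RATE)

# Step 6: 10-year median earnings, starting from the step-5 result.
data_imputed <- data_imputed_admissions
data_imputed$MED_10YR_PAY <- impute_median(data_imputed$MED_10YR_PAY)

print(data_imputed)
```

Each column's median comes from that column's own non-missing values, so the order of steps 4 to 6 does not change the result; the chained variables just match the names the assignment asks for. In dplyr the same idea is `mutate(AVG_COST = ifelse(is.na(AVG_COST), median(AVG_COST, na.rm = TRUE), AVG_COST))`.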
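Step 7 is a grouped mean followed by a sort. The sketch below uses base R's `aggregate()` on a hypothetical toy `data_imputed`; it keeps the column name `AVG_COST` for the per-state mean.

```r
# Hypothetical toy version of data_imputed (no NAs left after steps 4-6).
data_imputed <- data.frame(
  STATE    = c("VA", "VA", "MD", "DC"),
  AVG_COST = c(20000, 30000, 30000, 40000)
)

# Mean cost per institution within each state.
state_price <- aggregate(AVG_COST ~ STATE, data = data_imputed, FUN = mean)

# Order from cheapest to most expensive and reset the row names.
state_price <- state_price[order(state_price$AVG_COST), ]
rownames(state_price) <- NULL

lowest  <- state_price[1, ]                   # step 7b
highest <- state_price[nrow(state_price), ]   # step 7a
print(state_price)
```

The dplyr equivalent is `group_by(STATE) %>% summarize(AVG_COST = mean(AVG_COST)) %>% arrange(AVG_COST)`.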
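Steps 8 and 9 can be sketched with ggplot2 and plotly's `ggplotly()`, assuming both packages are installed. The toy `state_price`, the title, and the axis labels are illustrative choices, not the assignment's required wording. Coloring by cost is one way to satisfy "add color".

```r
library(ggplot2)

# Hypothetical toy version of state_price.
state_price <- data.frame(
  STATE    = c("VA", "MD", "DC"),
  AVG_COST = c(25000, 30000, 40000)
)

# Fix the x-axis order to match the sorted costs (otherwise ggplot orders
# character values alphabetically).
state_price$STATE <- factor(state_price$STATE,
                            levels = state_price$STATE[order(state_price$AVG_COST)])

p <- ggplot(state_price, aes(x = STATE, y = AVG_COST, color = AVG_COST)) +
  geom_point(size = 2) +
  labs(
    title = "Average cost of attendance per institution, by state/territory",
    x     = "State/territory (postal code)",
    y     = "Average cost (USD)",
    color = "Average cost"
  ) +
  # Rotate the state codes so that many labels do not overlap.
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

# Step 9: interactive version; the hover tooltip shows each point's state and cost.
if (requireNamespace("plotly", quietly = TRUE)) {
  p_interactive <- plotly::ggplotly(p)
}
```

Converting `STATE` to a factor ordered by cost keeps the x-axis in the same smallest-to-largest order as `state_price`, which makes the rising trend visible before any hovering.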
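Steps 10 to 12 fit simple linear regressions. The base-R sketch below reads the same numbers that `broom::tidy()` (the `estimate` column) and `broom::glance()` (the `r.squared` column) report. Two assumptions: the toy `data_imputed` is hypothetical, and because the columns selected in step 1 include no debt variable, `MED_10YR_PAY` stands in for the response. The second model, with `ADM_RATE` as the explanatory variable, is one reading of question 12a.

```r
# Hypothetical toy version of data_imputed; MED_10YR_PAY is a stand-in response.
data_imputed <- data.frame(
  AVG_COST     = c(10000, 20000, 30000, 40000, 50000),
  ADM_RATE     = c(0.90, 0.70, 0.60, 0.40, 0.20),
  MED_10YR_PAY = c(31000, 39000, 52000, 58000, 70000)
)

# Step 10: response ~ explanatory.
reg_model <- lm(MED_10YR_PAY ~ AVG_COST, data = data_imputed)
print(summary(reg_model))   # step 10a: coefficient table, R^2, F test

# Step 11: intercept and slope (what broom::tidy() lists under `estimate`).
estimates <- coef(reg_model)
intercept <- estimates[["(Intercept)"]]
slope     <- estimates[["AVG_COST"]]

# Steps 10b/12: R^2 (what broom::glance() lists as `r.squared`).
r2 <- summary(reg_model)$r.squared

# Step 12a: admission rate as the explanatory variable.
reg_model2 <- lm(MED_10YR_PAY ~ ADM_RATE, data = data_imputed)
r2_adm <- summary(reg_model2)$r.squared

cat("intercept:", intercept, " slope:", slope, " R^2:", round(r2, 4), "\n")
```

For a simple linear regression, R^2 equals the squared correlation between the two variables, so the 0.1423654 in the glance() output above corresponds to a correlation of magnitude about 0.377.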
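Steps 13 to 15 describe a one-sided test against a point null of $24,537.32. A base-R sketch of a bootstrap null distribution for the mean follows: the data are shifted so their mean equals the null value, resampled with replacement, and the right-tailed p-value is the share of resampled means at least as large as the observed statistic. The toy `costs` vector and the value of `cost_obs_stat` are hypothetical placeholders (the real GMU figure is provided in the assignment, not here); 1,000 replicates is an arbitrary choice.

```r
set.seed(101)  # reproducible resampling

# Hypothetical toy AVG_COST values; the real column is much longer.
costs <- c(15000, 18000, 22000, 24000, 25000, 27000, 30000, 33000)

mean_cost     <- 24537.32   # null value stated in step 13
cost_obs_stat <- 26000      # hypothetical placeholder for GMU's cost

# Step 14: shift the data so its mean equals the null value, then resample it.
shifted   <- costs - mean(costs) + mean_cost
null_dist <- replicate(1000, mean(sample(shifted, replace = TRUE)))

# Right-tailed p-value: share of null means at least as large as the observed value.
p_value <- mean(null_dist >= cost_obs_stat)

# Step 15: histogram; bars whose midpoint is right of the observed value are shaded.
h <- hist(null_dist, breaks = 30, plot = FALSE)
plot(h,
     col  = ifelse(h$mids >= cost_obs_stat, "tomato", "grey80"),
     main = "Bootstrap null distribution of the mean cost",
     xlab = "Mean cost under H0 (USD)",
     ylab = "Count")
abline(v = cost_obs_stat, lwd = 2)

reject_h0 <- p_value < 0.05   # step 15a, alpha = 0.05
cat("p-value:", p_value, " reject H0:", reject_h0, "\n")
```

If the course uses the infer package, the corresponding pipeline is `specify(response = AVG_COST)`, `hypothesize(null = "point", mu = mean_cost)`, `generate(reps = 1000, type = "bootstrap")`, `calculate(stat = "mean")`, then `get_p_value(obs_stat = cost_obs_stat, direction = "right")` and `visualize() + shade_p_value(obs_stat = cost_obs_stat, direction = "right")`; infer's point-null bootstrap also shifts the data to the null mean before resampling.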