ALY 6015 Northeastern University Regression Models Using R Question

Description

Deliverables (HOW)

1. Load the Ames housing dataset.

2. Perform Exploratory Data Analysis and use descriptive statistics to describe the data.

3. Prepare the dataset for modeling by imputing missing values with the variable's mean value or any other value that you prefer.

4. Use the "cor()" function to produce a correlation matrix of the numeric values.

5. Produce a plot of the correlation matrix, and explain how to interpret it. (Hint: check the corrplot or ggcorrplot libraries.)

6. Make a scatter plot of SalePrice against the continuous X variable that has the highest correlation with SalePrice. Do the same for the X variable that has the lowest correlation with SalePrice. Finally, make a scatter plot of SalePrice against the X variable whose correlation with SalePrice is closest to 0.5. Interpret the scatter plots and describe how the patterns differ.

7. Using at least 3 continuous variables, fit a regression model in R.

8. Report the model in equation form and interpret each coefficient of the model in the context of this problem.

9. Use the "plot()" function to plot your regression model. Interpret the four graphs that are produced.

10. Check your model for multicollinearity and report your findings. What steps would you take to correct multicollinearity if it exists?

11. Check your model for outliers and report your findings. Should these observations be removed from the model?

12. Attempt to correct any issues that you have discovered in your model. Did your changes improve the model? Why or why not?

13. Use the all subsets regression method to identify the "best" model. State the preferred model in equation form.

14. Compare the preferred model from step 13 with your model from step 12. How do they differ? Which model do you prefer and why?
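Steps 3, 4, and 6 can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame `homes` is a small synthetic stand-in for the Ames data, its column names and values are hypothetical, and "lowest correlation" is read here as weakest in absolute value, which is one of several possible readings.

```r
# Synthetic stand-in for the Ames data; names and values are hypothetical.
set.seed(42)
n <- 200
homes <- data.frame(
  Gr.Liv.Area  = rnorm(n, mean = 1500, sd = 400),
  Lot.Frontage = rnorm(n, mean = 70, sd = 20),
  Misc.Val     = rnorm(n, mean = 50, sd = 30)
)
homes$SalePrice <- 100 * homes$Gr.Liv.Area + 600 * homes$Lot.Frontage +
  rnorm(n, sd = 30000)
homes$Lot.Frontage[sample(n, 25)] <- NA  # inject some missing values

# Step 3: replace the NAs in a numeric vector with that vector's mean.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}
numeric_cols <- sapply(homes, is.numeric)
homes[numeric_cols] <- lapply(homes[numeric_cols], impute_mean)

# Step 4: correlation matrix of the numeric columns.
cor_matrix <- cor(homes[numeric_cols])

# Step 6: correlation of each predictor with SalePrice.
price_cor <- cor_matrix[, "SalePrice"]
price_cor <- price_cor[names(price_cor) != "SalePrice"]

highest   <- names(which.max(price_cor))              # strongest positive
lowest    <- names(which.min(abs(price_cor)))         # weakest in absolute value (assumption)
near_half <- names(which.min(abs(price_cor - 0.5)))   # closest to 0.5

# Scatter plot for the most strongly correlated predictor.
plot(homes[[highest]], homes$SalePrice,
     xlab = highest, ylab = "SalePrice",
     main = paste("SalePrice vs", highest))
```

The same `plot()` call can be repeated with `lowest` and `near_half` to produce the other two scatter plots that step 6 asks for.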


Explanation & Answer


Regression models using R
A regression model using at least three continuous variables.
I chose to predict sale price using the following variables: Lot Frontage, Lot Area, Overall Quality,
Overall Condition, Total Basement Square Feet (Total Bsmt SF), Above-Ground Living Area (Gr Liv Area),
and Garage Area.
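A model with these seven predictors could be fit along the following lines. This is a minimal sketch, not the author's actual code: the data frame `ames_like` is synthetic, the column names are assumed to follow what `read.csv()` would produce from the Ames file, and the variance inflation factors are computed by hand in base R (as 1 / (1 - R²) from regressing each predictor on the others) rather than with an add-on package.

```r
# Synthetic stand-in data; column names and values are assumptions.
set.seed(1)
n <- 300
ames_like <- data.frame(
  Lot.Frontage  = rnorm(n, 70, 20),
  Lot.Area      = rnorm(n, 10000, 3000),
  Overall.Qual  = sample(1:10, n, replace = TRUE),
  Overall.Cond  = sample(1:9, n, replace = TRUE),
  Total.Bsmt.SF = rnorm(n, 1000, 300),
  Gr.Liv.Area   = rnorm(n, 1500, 400),
  Garage.Area   = rnorm(n, 470, 150)
)
ames_like$SalePrice <- 20000 * ames_like$Overall.Qual +
  50 * ames_like$Gr.Liv.Area + 30 * ames_like$Total.Bsmt.SF +
  rnorm(n, sd = 20000)

# Fit the multiple linear regression.
fit <- lm(SalePrice ~ Lot.Frontage + Lot.Area + Overall.Qual + Overall.Cond +
            Total.Bsmt.SF + Gr.Liv.Area + Garage.Area, data = ames_like)

coefs     <- coef(fit)               # intercept plus one slope per predictor
r_squared <- summary(fit)$r.squared  # share of variance explained

# Multicollinearity check: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from
# regressing predictor j on all the other predictors.
predictors <- names(coefs)[-1]
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux <- lm(reformulate(others, response = p), data = ames_like)
  1 / (1 - summary(aux)$r.squared)
})

print(round(coefs, 2))
print(round(vif, 2))
```

Each slope in `coefs` is read as the expected change in sale price for a one-unit increase in that predictor with the others held fixed. Calling `plot(fit)` on the fitted object produces the four diagnostic graphs mentioned in the deliverables.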
The correlation matrix shows that each of these predictors has a significant linear relationship
with the response variable, sale price, although the strength of that relationship varies widely.
Correlation with SalePrice

Variable            Corr
Overall Quality     0.7993
Gr Liv Area         0.7068
Total Bsmt SF       0.6323
Garage Area         0.6404
Overall Condition  -0.1017
Lot Area            0.2665
Lot Frontage        0.3573

Overall Quality, Above-Ground Living Area, Total Basement Square Feet, and Garage Area
have the strongest positive correlations with sale price (about 0.63 to 0.80). Lot Frontage and Lot Area
have weak but positive, significant relationships with sale price (0.36 and 0.27). Overall Condition
has a weak negative (-0.10) but significant linear relationship with sale price.
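A correlation matrix by itself reports only the size and sign of each coefficient, so a claim of significance needs a separate test. One way to check it is `cor.test()`, sketched below on synthetic data. The vectors `quality`, `condition`, and `price` are hypothetical stand-ins, not the Ames values, and the result for the weak predictor will depend on the sample.

```r
# Synthetic data: one strong and one weak predictor of price (hypothetical).
set.seed(7)
n <- 500
quality   <- sample(1:10, n, replace = TRUE)
condition <- sample(1:9, n, replace = TRUE)
price <- 25000 * quality - 1500 * condition + rnorm(n, sd = 40000)

# Pearson correlation test: H0 is that the true correlation is zero.
strong <- cor.test(quality, price)
weak   <- cor.test(condition, price)

results <- data.frame(
  variable = c("quality", "condition"),
  r        = c(strong$estimate, weak$estimate),
  p_value  = c(strong$p.value, weak$p.value),
  row.names = NULL
)
print(results)
```

With a data set as large as Ames, even a coefficient near -0.10 can come out significant, so the p-value should be read together with the size of `r` rather than in place of it.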
12.A regres...