Description
Topics:
1. Statistical learning
2. R language
3. Computer science
This exercise relates to the College data set, which can be found in the file College.csv. It contains a number of variables for 777 different universities and colleges in the US. The variables are:
• Private : Public/private indicator
• Apps : Number of applications received
• Accept : Number of applicants accepted
• Enroll : Number of new students enrolled
• Top10perc : New students from top 10% of high school class
• Top25perc : New students from top 25% of high school class
• F.Undergrad : Number of full-time undergraduates
• P.Undergrad : Number of part-time undergraduates
• Outstate : Out-of-state tuition
• Room.Board : Room and board costs
• Books : Estimated book costs
• Personal : Estimated personal spending
• PhD : Percent of faculty with Ph.D.’s
• Terminal : Percent of faculty with terminal degree
• S.F.Ratio : Student/faculty ratio
• perc.alumni : Percent of alumni who donate
• Expend : Instructional expenditure per student
• Grad.Rate : Graduation rate
Before reading the data into R, it can be viewed in Excel or a text editor.
(a) Use the read.csv() function to read the data into R. Call the loaded data college. Make sure that you have the directory set to the correct location for the data.
(b) Look at the data using the fix() function. You should notice that the first column is just the name of each university. We don’t really want R to treat this as data. However, it may be handy to have these names for later. Try the following commands:
> rownames(college) = college[, 1]
> fix(college)
You should see that there is now a row.names column with the name of each university recorded. This means that R has given each row a name corresponding to the appropriate university. R will not try to perform calculations on the row names. However, we still need to eliminate the first column in the data where the names are stored.
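The row-name step in part (b) can be tried without the real file. The following is a minimal sketch on a made-up three-row data frame; the column values and the names "College A" to "College C" are illustrative assumptions, not rows from College.csv.

```r
# Toy stand-in for the College data (made-up rows, NOT the real data).
college <- data.frame(
  Name    = c("College A", "College B", "College C"),
  Private = c("Yes", "No", "Yes"),
  Apps    = c(1000, 2500, 400)
)

# Use the first column (the institution names) as row names ...
rownames(college) <- college[, 1]

# ... then drop that column so R no longer treats the names as data.
college <- college[, -1]

print(rownames(college))
print(names(college))
```

After these two steps the names are still available through rownames(college), but functions such as summary() only see the remaining columns.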
Explanation & Answer
Running head: CS 5565, LAB1 (INTRODUCTION)
CS 5565, LAB1 (Introduction)
Name
Institution
1 (10 points) This exercise relates to the College data set, which can be found in the file College.csv. It contains a number of variables for 777 different universities and colleges in the US.
a) Use the read.csv() function to read the data into R. Call the loaded data college. Make sure that you have the directory set to the correct location for the data.
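For reference, reading from a file path with read.csv() might look like the sketch below. It writes a small made-up CSV to a temporary directory so that it runs anywhere; the file name College_toy.csv and its contents are assumptions for illustration only. With the real data you would point csv_path at your copy of College.csv.

```r
# Build a tiny made-up CSV in a temporary directory (NOT the real data).
csv_path <- file.path(tempdir(), "College_toy.csv")
toy <- data.frame(
  Name    = c("College A", "College B"),
  Private = c("Yes", "No"),
  Apps    = c(1000, 2500)
)
write.csv(toy, csv_path, row.names = FALSE)

# Checking the path first gives a clearer error than a failed read.
stopifnot(file.exists(csv_path))

# Read the file back; the first row is used as the header by default.
college <- read.csv(csv_path)

print(dim(college))
print(names(college))
```

Using a full path built with file.path() avoids depending on the current working directory; getwd() and setwd() are the alternative if you prefer to pass only the file name.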
library(ISLR)
data(College)
college 25000, ])
3624
Max.
48090
## [1] "Rutgers at New Brunswick"
Detaching the College data set.
detach(college)
2 (10 points) This exercise involves the Auto data set studied in the lab. Make sure that the missing values have been removed from the data.
a) Which of the predictors are quantitative, and which are qualitative?
lapply(auto, class)
## $mpg
## [1] "numeric"
##
## $cylinders
## [1] "numeric"
##
## $displacement
## [1] "numeric"
##
## $horsepower
## [1] "numeric"
##
## $weight
## [1] "numeric"
##
## $acceleration
## [1] "numeric"
##
## $year
## [1] "numeric"
##
## $origin
## [1] "numeric"
##
## $name
## [1] "factor"
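Note that class() reports origin as numeric even though it is a code for the region of origin, so the storage class alone does not settle which predictors are qualitative. A minimal sketch of making this explicit, using a made-up three-row data frame (the values are illustrative assumptions, not the real Auto data):

```r
# Toy stand-in for the Auto data (made-up values, NOT the real data set).
auto <- data.frame(
  mpg    = c(18, 15, 27),
  origin = c(1, 1, 3),
  name   = factor(c("car a", "car b", "car c"))
)

# class() alone reports `origin` as numeric ...
col_classes <- sapply(auto, class)
print(col_classes)

# ... so recode it as a factor to mark it as qualitative.
auto$origin <- factor(auto$origin)

# Now is.numeric() separates quantitative from qualitative columns.
is_quantitative <- sapply(auto, is.numeric)
print(names(auto)[is_quantitative])
print(names(auto)[!is_quantitative])
```

Whether to treat a column such as cylinders or year as quantitative is a modeling judgment; the sketch only recodes origin.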
b) What is the range of each quantitative predictor? You can answer this using the range() function.
# columns qualitative
cols.qlt = names(auto) %in% c("name", "origin")
# app...
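As a minimal sketch of the general idea (not necessarily how the listing above continues), range() can be applied to every quantitative column with sapply(). The data frame below is made up for illustration and is not the real Auto data; only the cols.qlt mask follows the code shown above.

```r
# Toy stand-in for the Auto data (made-up values, NOT the real data set).
auto <- data.frame(
  mpg    = c(18, 15, 27),
  weight = c(3504, 3693, 2130),
  origin = c(1, 1, 3),
  name   = factor(c("car a", "car b", "car c"))
)

# Logical mask marking the qualitative columns.
cols.qlt <- names(auto) %in% c("name", "origin")

# Apply range() to each remaining column; the result is a matrix with
# one column per predictor, row 1 = minimum, row 2 = maximum.
ranges <- sapply(auto[, !cols.qlt, drop = FALSE], range)
print(ranges)
```

The drop = FALSE argument keeps the subset a data frame even if only one quantitative column remains, so sapply() still iterates over columns.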