Homework 2 - dplyr
Due 10/27 by Midnight
First, load the packages needed.
library(dplyr)
library(tidyr)
library(ggplot2)
Part 1 - Cleaning without dplyr
Instructions:
Without using dplyr, complete each of the steps below.
Each step should be one line of code.
zillow <- read.csv('https://s3.amazonaws.com/douglas2/data/Sale_Prices_State.csv',
                   stringsAsFactors = FALSE)
1. Remove the RegionID and SizeRank columns
zillow2 = zillow[ , -c(1, 3)]
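Dropping columns by position works only while the file keeps its current column order. As a hedged alternative, the sketch below drops the same two columns by name. The data frame `toy` and its values are made up for illustration and are not taken from the Zillow file.

```r
# Toy data frame with the same leading columns as the Zillow file (values are made up)
toy <- data.frame(
  RegionID   = c(1, 2),
  RegionName = c("Alabama", "Alaska"),
  SizeRank   = c(24, 48),
  X2012.06   = c(108800, NA),
  stringsAsFactors = FALSE
)

# Keep every column whose name is not in the drop list
drop_cols <- c("RegionID", "SizeRank")
toy2 <- toy[ , !(names(toy) %in% drop_cols)]

names(toy2)
```

Selecting by name keeps the step correct even if a column is later added or reordered in the source file.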
2. Reshape the data appropriately using the gather() function
zillow2 = gather(zillow2, 'Date', 'Sales', -RegionName)
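To see what the reshape does, here is a minimal sketch on a made-up two-state data frame (`wide` and its values are illustrative only). It also shows `pivot_longer()`, the newer tidyr interface for the same operation, assuming tidyr 1.0.0 or later is installed.

```r
library(tidyr)

# Wide toy data: one row per state, one column per month (values are made up)
wide <- data.frame(
  RegionName = c("Alabama", "Alaska"),
  X2012.06   = c(108800, 250000),
  X2012.07   = c(111500, 252000),
  stringsAsFactors = FALSE
)

# Stack every column except RegionName into Date/Sales pairs
long <- gather(wide, "Date", "Sales", -RegionName)

# The same reshape with the newer interface
long2 <- pivot_longer(wide, -RegionName, names_to = "Date", values_to = "Sales")

dim(long)
```

Each state now has one row per month, which is the layout the later steps expect.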
3. Remove the X from the variable containing the date information
zillow2$Date = gsub('X', '', zillow2$Date)
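The leading X most likely comes from read.csv(), whose default check.names = TRUE passes headers through make.names() and prefixes names that start with a digit. The sketch below, on made-up strings, compares gsub() with a slightly stricter sub() call that removes only a leading X; both give the same result here.

```r
dates <- c("X2012.06", "X2012.07")

# gsub() removes every "X" anywhere in the string
all_removed <- gsub("X", "", dates)

# sub() with ^ removes only a single leading "X"
leading_removed <- sub("^X", "", dates)

leading_removed
```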
4. Using the separate() function, split the column containing the date information into
Year and Month columns
zillow2 = separate(zillow2, Date, c('Year', 'Month'), sep = '\\.')
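The sep argument is treated as a regular expression, which is why the dot is escaped. A minimal base R sketch of the difference, using strsplit() on a made-up string rather than separate() itself:

```r
date_string <- "2012.06"

# Unescaped: "." is a regex wildcard, so every character is treated as a separator
unescaped <- strsplit(date_string, ".")[[1]]

# Escaped: "\\." matches a literal dot only
escaped <- strsplit(date_string, "\\.")[[1]]

# fixed = TRUE turns off regex interpretation altogether
fixed <- strsplit(date_string, ".", fixed = TRUE)[[1]]

escaped
```

Only the escaped and fixed versions return the year and month pieces; the unescaped split leaves nothing but empty strings.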
5. Convert Year to a numeric variable
zillow2$Year = as.numeric(zillow2$Year)
6. Convert Month to a numeric variable
zillow2$Month = as.numeric(zillow2$Month)
7. Create a new variable, Date, defined as Year+Month/12 (I want this variable for easy
plotting)
zillow2$Date = zillow2$Year + (zillow2$Month / 12)
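A short worked example of the formula, on made-up year and month values, shows how the decimal date behaves at a year boundary:

```r
year  <- c(2012, 2012, 2013)
month <- c(6, 12, 1)

# Decimal date as defined in step 7: the month is added as a fraction of a year
decimal_date <- year + month / 12

decimal_date
```

Under this definition December lands exactly on the next whole year (2012 + 12/12 = 2013), but the values still increase month by month, so the variable sorts and plots in the right order.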
8. Remove any rows where the median sale price is missing (hint: is.na(x) identifies
missing values and !is.na(x) identifies non-missing values, where x is the column you
are interested in; both return a TRUE/FALSE vector)

zillow2 = zillow2[!is.na(zillow2$Sales), ]
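The logical-index pattern can be sketched on a small made-up data frame (`toy` is illustrative only):

```r
toy <- data.frame(
  RegionName = c("Alabama", "Alaska", "Arizona"),
  Sales      = c(108800, NA, 150000),
  stringsAsFactors = FALSE
)

# is.na() flags the missing values; ! inverts the flags
keep <- !is.na(toy$Sales)

# Logical row index: only rows where keep is TRUE survive
toy_clean <- toy[keep, ]

nrow(toy_clean)
```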
9. Order the data by state and date (hint: data = data[order(variable1, variable2), ])
zillow2 = zillow2[order(zillow2$RegionName, zillow2$Date), ]
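order() returns row positions rather than sorted values, which is why it is used inside the row index. A minimal sketch on made-up data:

```r
toy <- data.frame(
  RegionName = c("Alaska", "Alabama", "Alabama"),
  Date       = c(2012.5, 2013, 2012.5),
  stringsAsFactors = FALSE
)

# Row positions sorted by RegionName, with ties broken by Date
idx <- order(toy$RegionName, toy$Date)

# Reorder the rows using those positions
toy_sorted <- toy[idx, ]

toy_sorted
```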
10. Print out the data structure using the str() function
str(zillow2)
'data.frame':   3994 obs. of  5 variables:
 $ RegionName: chr  "Alabama" "Alabama" "Alabama" "Alabama" ...
 $ Year      : num  2012 2012 2012 2012 2012 ...
 $ Month     : num  6 7 8 9 10 11 12 1 2 3 ...
 $ Sales     : num  108800 111500 112200 115400 115000 ...
 $ Date      : num  2012 2013 2013 2013 2013 ...
Part 2 - Cleaning with dplyr
Instructions:
1. select - Remove the RegionID and SizeRank columns
2. Reshape the data appropriately using the gather() function
3. mutate - Remove the X from the variable containing the date information
4. Using the separate() function, split the column containing the date information into
Year and Month columns
5. mutate - Convert Year and Month to numeric variables, and create a new variable, Date,
defined as Year+(Month/12)
6. filter - Remove any rows where the median sale price is missing
7. arrange - Order the data by state and date
When finished, connect your dplyr chain to the original read.csv() statement, so your
data reads in “cleaned” and ready to go, like this:
zillow <- read.csv() %>%
select() %>%
mutate()...
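As a rough illustration of this pattern, the sketch below pipes read.csv() straight into a chain that follows steps 1 to 7, using a tiny inline CSV instead of the real file. The CSV text, its values, and the assumption that the real headers look like 2012-06 are all made up for the example.

```r
library(dplyr)
library(tidyr)

# Made-up CSV text standing in for the real file; the column layout is assumed
csv_text <- "RegionID,RegionName,SizeRank,2012-06,2012-07
1,Alaska,48,250000,
2,Alabama,24,108800,111500"

toy <- read.csv(text = csv_text, stringsAsFactors = FALSE) %>%
  select(-RegionID, -SizeRank) %>%                     # step 1
  gather("Date", "Price", -RegionName) %>%             # step 2
  mutate(Date = gsub("X", "", Date)) %>%               # step 3
  separate(Date, c("Year", "Month"), sep = "\\.") %>%  # step 4
  mutate(                                              # step 5
    Year  = as.numeric(Year),
    Month = as.numeric(Month),
    Date  = Year + Month / 12
  ) %>%
  filter(!is.na(Price)) %>%                            # step 6
  arrange(RegionName, Date)                            # step 7

toy
```

Because the chain starts at read.csv(), no intermediate object is created and the data arrives already cleaned.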
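zillow <- read.csv('https://s3.amazonaws.com/douglas2/data/Sale_Prices_State.csv',
                   stringsAsFactors = FALSE)
zillow2 = zillow %>%
  select(-RegionID, -SizeRank) %>%
  gather('Date', 'Price', -RegionName) %>%
  mutate(Date = gsub('X', '', Date)) %>%
  separate(Date, c('Year', 'Month'), sep = '\\.') %>%
  mutate(
    Year = as.numeric(Year),
    Month = as.numeric(Month),
    Date = Year + (Month / 12)    # step 5: decimal date for plotting
  ) %>%
  filter(!is.na(Price)) %>%       # step 6: drop rows with a missing price
  arrange(RegionName, Date)       # step 7: order by state, then date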
