Clean & Structure the Data Using ggplot2: Command Code Notes

HW 5 - ggplot2
Due Monday, 11/2 by Midnight
Part 1 - Movies and Money
Data prep
The data for this part contains information on the top 50 films by worldwide gross.
Clean and structure the data into the following format (pay attention to the columns and
variable types):
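library(dplyr)  # provides %>% and mutate()

movies <- read.csv('https://s3.amazonaws.com/douglas2/data/movies.csv',
                   stringsAsFactors = FALSE) %>%
  mutate(
    # Strip "$" and "," so the gross parses as a number
    Worldwide.gross = as.numeric(gsub('\\$|,', '', Worldwide.gross)),
    # Parse the release date strings into Date objects
    Released = as.Date(Released, "%Y-%m-%d"),
    # Flag films whose Awards text contains "won" (case-insensitive)
    OscarWin = ifelse(grepl("Won", Awards, ignore.case = TRUE), "Yes", "No")
  )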
str(movies)
'data.frame': 50 obs. of 13 variables:
 $ Rank           : int 1 2 3 4 5 6 7 8 9 10 ...
 $ Title          : chr "Avatar" "Titanic" "Star Wars: The Force Awakens" "Jurassic World" ...
 $ Worldwide.gross: num 2.79e+09 2.19e+09 2.07e+09 1.67e+09 1.52e+09 ...
 $ Year           : int 2009 1997 2015 2015 2012 2015 2015 2011 2013 2013 ...
 $ Rated          : chr "PG-13" "PG-13" "PG-13" "PG-13" ...
 $ Released       : Date, format: "2009-12-18" "1997-12-19" ...
 $ Genre          : chr "Action" "Drama" "Action" "Action" ...
 $ Director       : chr "James Cameron" "James Cameron" "J.J. Abrams" "Colin Trevorrow" ...
 $ Awards         : chr "Won 3 Oscars. Another 80 wins & 121 nominations." "Won 11 Oscars. Another 110 wins & 73 nominations." "Nominated for 5 Oscars. Another 48 wins & 104 nominations." "6 wins & 53 nominations." ...
 $ imdbRating     : num 7.9 7.7 8.2 7 8.1 7.2 7.5 8.1 7.6 7.2 ...
 $ imdbVotes      : int 890617 796903 575439 421709 1003301 280028 467996 546266 424522 560586 ...
 $ leadActor      : chr "Sam Worthington" "Leonardo DiCaprio" "Harrison Ford" "Chris Pratt" ...
 $ OscarWin       : chr "Yes" "Yes" "No" "No" ...
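
To see what each cleaning step in the data prep does on its own, here is a minimal base R sketch on a few toy values. The input strings are hypothetical stand-ins rather than rows read from movies.csv, and the names gross_raw, awards_raw, and strict_win are made up for illustration.

```r
# Toy inputs (hypothetical values, not read from movies.csv)
gross_raw  <- c("$2,787,965,087", "$1,670,400,637")
awards_raw <- c("Won 3 Oscars. Another 80 wins & 121 nominations.",
                "6 wins & 53 nominations.")

# Remove dollar signs and thousands separators, then convert to numeric
gross_num <- as.numeric(gsub("\\$|,", "", gross_raw))

# "Yes" when the awards text contains "won" in any letter case
oscar_win <- ifelse(grepl("Won", awards_raw, ignore.case = TRUE), "Yes", "No")

# Parse year-month-day strings into Date objects
released <- as.Date(c("2009-12-18", "2015-06-12"), format = "%Y-%m-%d")

# An assumed stricter alternative: require "Won <number> Oscar"
strict_win <- grepl("Won [0-9]+ Oscar", awards_raw)

str(gross_num)
print(oscar_win)
print(class(released))
```

Note that the pattern "Won" flags any award text containing that word, not only Oscars. The stricter pattern is one possible refinement, not something the assignment asks for.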

Plot 1 - Average gross by genre since 2000
The plot below displays the average worldwide gross by genre for movies released in the
year 2000 or later. In addition to producing the plot, format the y-axis appropriately: load
the scales package and add labels = dollar to the appropriate scale function (here,
scale_y_continuous).
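library(dplyr)
library(ggplot2)
library(scales)  # provides dollar for the axis labels

movies %>%
  # Keep only films released in 2000 or later, as the plot description requires
  filter(Year >= 2000) %>%
  group_by(Genre) %>%
  summarise(AverageGross = mean(Worldwide.gross)) %>%
  ggplot(aes(x = Genre, y = AverageGross)) +
  geom_col() +
  scale_y_continuous(name = "Average Gross ($)", labels = dollar)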
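
The aggregation behind this plot can be checked without drawing anything. The sketch below is a minimal illustration on a toy data frame: the rows in toy are invented, and only the column names (Genre, Year, Worldwide.gross) mirror the movies data. It filters to 2000 or later, averages by genre, and shows the label that dollar from the scales package would produce for each bar.

```r
library(dplyr)
library(scales)

# Toy data (hypothetical rows; column names follow the movies data frame)
toy <- data.frame(
  Genre = c("Action", "Action", "Drama", "Animation"),
  Year = c(2009, 2015, 1997, 2013),
  Worldwide.gross = c(2.0e9, 1.0e9, 2.2e9, 1.2e9)
)

avg_by_genre <- toy %>%
  filter(Year >= 2000) %>%                       # drop pre-2000 releases
  group_by(Genre) %>%
  summarise(AverageGross = mean(Worldwide.gross))

# dollar() turns numbers into currency strings for use as axis labels
avg_by_genre$Label <- dollar(avg_by_genre$AverageGross)

print(avg_by_genre)
```

In this toy example the only Drama row predates 2000, so the year filter removes that genre entirely. Whether the same holds for the real data depends on its contents.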
