Harvard University Database Management and Modeling MySQL Task

Description

R Homework

Database Management and Modeling

General exercises

  • Create a text vector called Months with names of the 12 months of the year.
  • Create a numeric vector called Summer that holds the calendar month index positions of the summer months (inclusive, with 4 elements in all).
  • Use vector indexing to extract the text values of Months, indexed by Summer.
  • Multiply Summer by 3. What are the values of Months, when indexed by Summer multiplied by 3? Why do you get that answer?
  • What is the mean (average) summer month, as an integer value? Which value of Months corresponds to it? Why do you get that answer?
  • Use the floor() and ceiling() functions to return the lower and upper limits of Months for the average Summer month. (Hint: to find out how a function works, use R help if needed.)
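
The indexing behavior these exercises probe can be sketched as follows. This is an illustration under stated assumptions, not the assignment's official answer: it takes the summer months to be June through September (indices 6 to 9), and it uses R's built-in month.name constant rather than typing the names out.

```r
# Assumption: "summer" means June through September (4 months, inclusive).
Months <- month.name   # built-in character vector "January" ... "December"
Summer <- 6:9

# Plain vector indexing returns the month names at those positions.
Months[Summer]         # "June" "July" "August" "September"

# Summer * 3 is 18, 21, 24, 27. Months has only 12 elements,
# so every one of those positions is out of range and R returns NA.
Months[Summer * 3]

# The mean of 6:9 is 7.5. A fractional index is truncated toward zero,
# so R looks up position 7.
mean(Summer)           # 7.5
Months[mean(Summer)]   # "July"

# floor() and ceiling() make the two neighboring positions explicit.
Months[floor(mean(Summer))]    # "July"
Months[ceiling(mean(Summer))]  # "August"
```

Writing the 12 names by hand with c("January", ...) gives the same vector; month.name is used here only to keep the sketch short.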

E-commerce Data for Exercises

The data set comprises responses to intercept surveys asked when users visited the site, along with data about each user’s site activity such as number of pages visited and whether a sale was completed. Identifying details for the site and customers have been removed but the observations otherwise are actual data.

We will load the data set first, and then explain a few of its observations. To load the data from CSV format, use the following command (or load ecommerce-data.csv from a local location if you have downloaded it, as noted in Section 1.6.3).

ecomm.df <- read.csv("https://goo.gl/hzRyFd")

summary(ecomm.df)
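
The choice between the URL and a local copy can be wrapped in a small helper. The sketch below is hypothetical: the function name LoadEcomm and its argument names are not part of the assignment, and it assumes a downloaded file sits in the working directory under the name ecommerce-data.csv.

```r
# Hypothetical helper (not from the assignment): read the local CSV when it
# exists, otherwise fall back to the URL given in the text.
LoadEcomm <- function(local.path = "ecommerce-data.csv",
                      url = "https://goo.gl/hzRyFd") {
  src <- if (file.exists(local.path)) local.path else url
  read.csv(src)
}

# Usage (needs either the local file or network access):
# ecomm.df <- LoadEcomm()
# dim(ecomm.df)   # number of rows (observations) and columns (variables)
# str(ecomm.df)   # type and first few values of each variable
```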

  1. How many observations and variables are in the e-commerce data set?
  2. Compute a frequency table for the country of origin for site visits. After the United States, which country had the most visitors?
  3. Compute a two-way frequency table for the intent to purchase (intentWasPlanningToBuy), broken out by user profile.
  4. What proportion of parents intended to purchase? What proportion of teachers did? For each one, omit observations for whom the intent is unknown (blank).
  5. Among US states (recorded in the variable region), which state had the most visitors and how many?
  6. Solve the previous problem for the state with the most visitors, using the which.max() function (or repeat the same answer, if you already used it).
  7. Draw a histogram for the number of visits to the site (behavNumVisits). Adjust it for more detail in the lower values. Color the bars and add a density line.
  8. Draw a horizontal boxplot for the number of site visits.
  9. Which chart from the previous two exercises, a histogram or a boxplot, is more useful to you, and why?
  10. Draw a boxplot for site visits broken out with a unique row for each profile type. (Note: if the chart margins make it unreadable, try the following command before plotting: par(mar=c(3, 12, 2, 2)). After plotting, you can use the command par(mar=c(5, 4, 4, 2) + 0.1) to reset the chart margins.)
  11. *Write a function called MeanMedDiff that returns the absolute difference between the mean and the median of a vector.
  12. *What is the mean-median difference for number of site visits?
  13. *What is the mean-median difference for site visits, after excluding the person who had the most visits?
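
The techniques these exercises call for can be sketched on a small stand-in data frame. Everything in toy.df is made up for illustration and says nothing about the real answers. The column name profile is an assumption about the data set; intentWasPlanningToBuy, region, and behavNumVisits are the names given in the exercises. The helper PlotVisits and the na.rm = TRUE choice inside MeanMedDiff are likewise assumptions rather than requirements of the assignment.

```r
# Hypothetical toy data standing in for ecomm.df (all values are invented).
toy.df <- data.frame(
  profile = c("Parent", "Parent", "Teacher", "Teacher", "Parent", "Teacher"),
  intentWasPlanningToBuy = c("Yes", "No", "Yes", "", "Yes", "No"),
  region = c("Texas", "California", "Texas", "Texas", "New York", "California"),
  behavNumVisits = c(1, 2, 1, 3, 40, 1),
  stringsAsFactors = FALSE
)

# Two-way frequency table: profile (rows) by purchase intent (columns).
intent.tab <- table(toy.df$profile, toy.df$intentWasPlanningToBuy)

# Row proportions after dropping rows where intent is blank.
known <- toy.df[toy.df$intentWasPlanningToBuy != "", ]
intent.prop <- prop.table(table(known$profile, known$intentWasPlanningToBuy),
                          margin = 1)

# Region with the most visits: which.max() gives the position of the
# largest count, and indexing the table keeps its name.
region.tab <- table(toy.df$region)
top.region <- region.tab[which.max(region.tab)]

# Absolute difference between mean and median.
# Assumption: missing values are ignored (na.rm = TRUE).
MeanMedDiff <- function(x) {
  abs(mean(x, na.rm = TRUE) - median(x, na.rm = TRUE))
}

visits <- toy.df$behavNumVisits
diff.all  <- MeanMedDiff(visits)                      # all visitors
diff.trim <- MeanMedDiff(visits[-which.max(visits)])  # top visitor removed

# Hypothetical plotting helper: histogram with a density line, then a
# horizontal boxplot. The breaks and xlim values are arbitrary starting points.
PlotVisits <- function(x, breaks = 100, xlim = c(0, 20)) {
  hist(x, breaks = breaks, xlim = xlim, freq = FALSE,
       col = "lightblue", main = "Number of visits", xlab = "Visits")
  lines(density(x, na.rm = TRUE), col = "darkblue", lwd = 2)
  boxplot(x, horizontal = TRUE, xlab = "Visits")
}
```

On the real data the same calls would take ecomm.df in place of toy.df, for example MeanMedDiff(ecomm.df$behavNumVisits). A per-profile boxplot can be drawn with boxplot(behavNumVisits ~ profile, data = ecomm.df, horizontal = TRUE, las = 1), again assuming the column is named profile.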

Explanation & Answer

Hello, I have finished the task. I filled in the text answers in the original homework assignment and copied all the graphs into this document. The R file used to generate them is available in analysis.R. If you have any questions, please let me know.
