STAT 250 GMU 2018 Movies STAT

Anonymous
Asked: Feb 13th, 2019

Question Description

STAT 250 Spring 2019 Data Analysis Assignment 1

Your submitted document should include the following items. Points will be deducted if the following are not included.

  1. Type your Name and STAT 250 with your correct section number (e.g. STAT 250-xxx) right justified and then Data Analysis Assignment #1 centered on the top of page 1 below your name, then begin your document.
  2. Number your pages across your entire solutions document.
  3. Your document should include the ANSWERS ONLY with each answer labeled by its corresponding number and subpart. Keep the answers in order. Do not include the questions in your submitted document.
  4. Generate all requested graphs and tables using StatCrunch.
  5. Upload your document onto Blackboard as a Word (docx) file or pdf file using the link provided by your instructor. You are responsible for uploading a readable file.

Full assignment instructions, as well as an example, are attached as a Word file.

Access to StatCrunch is required.

https://www.statcrunch.com/5.0/group.php?groupid=7...

I will provide the login info.

Elements of good technical writing:

Use complete and coherent sentences to answer the questions.
Graphs must be appropriately titled and should refer to the context of the question.
Graphical displays must include labels with units if appropriate for each axis.
Units should always be included when referring to numerical values.
When making a comparison you must use comparative language, such as “greater than”, “less than”, or “about the same as.”
Ensure that all graphs and tables appear on one page and are not split across two pages.
Type all mathematical calculations when directed to compute an answer ‘by-hand.’ Pictures of actual handwritten work are not accepted on this assignment.
When writing mathematical expressions into your document you may use either an equation editor or common shortcuts such as: √x can be written as sqrt(x), p̂ can be written as p-hat, x̄ can be written as x-bar.

Problem 1: 2018 Movies

Moviepass is a subscription service that allows users to see one movie per day at select theaters.
AMC Theatres released its own movie subscription service, called A-List, to compete with Moviepass; it allows users to see up to three movies per week. Raw data were collected from one user who purchased an annual Moviepass subscription in January 2018 and a subscription to A-List in November 2018. The data set found in our StatCrunch Group presents the 169 movies seen along with other variables describing each movie. The data set is called “2018 Movies.”

a) Use StatCrunch to create a one-way table for the variable “Genre” using both counts and percentages. Select Stat → Tables → Frequency and select both ‘Frequency’ and ‘Percent of total’ in the Statistic(s) box by holding down the Ctrl Key (Command Key on Macs) when making these selections. Copy your table into your document and then manually round the values in the ‘Percent of total’ column to two decimal places in the StatCrunch table that you have copied into your document.

b) Interpret your findings from the table in part (a) by identifying the least and most popular genre by percent of total. Use complete sentences with context and include the genre and percentage in the sentences.

c) Use StatCrunch to generate a two-way table for the variables “Genre” and “Viewer Rating”. Go to Stat → Tables → Contingency → With Data (since you have the raw data in StatCrunch). Select “Genre” as your row variable and “Viewer Rating” as your column variable. In the display box, select only Percent of Total. Lastly, deselect “Chi-Square test for independence”, which is highlighted by default, by holding the Ctrl key and clicking on it. Copy your table into your document.

d) How many and what percentage of the 169 movies did the viewer dislike? Answer this question in a complete sentence.

e) What values are the same when looking at both your one-way table and your two-way table? Be specific if referencing rows or columns.
f) Now, create two more two-way tables keeping “Genre” as your row variable and “Viewer Rating” as your column variable. One table needs to include row percentages and the other needs to include column percentages. To do this, change what you select in the display box from percent of total (in part (c)) to row percent for the first table and column percent for the second table. Include both tables in your document.

g) Specifically interpret the meaning of the row percentage found in the “Children’s/Animated” and “Liked” cell. Note that there are 14 movies in that cell.

h) Now, specifically interpret the meaning of the column percentage found in the “Children’s/Animated” and “Liked” cell. Note that there are 14 movies in that cell.

Problem 2: 2018 Movies Revisited

Which genre is most popular among the 169 movies seen? Use the “2018 Movies” data set posted in our StatCrunch group to answer the following questions.

a) Using the variable named “Genre”, produce a relative frequency bar chart using Graph → Bar Plot → With Data. Please properly label axes and provide a meaningful title and copy it into your document.

b) Using the variable “Genre”, produce a relative frequency Pareto chart. Begin with your bar chart, and edit it by changing “Order by” to Count Descending. Properly title and label your graph and copy it into your document.

c) Using the variable “Genre”, produce a Pie Chart using Graph → Pie Chart → With Data. Add an appropriate title and copy this entire graph including the legend into your document.

d) Use the three graphs to answer the question: Which genre of movie did this individual see the most of? Present both the count and the proportion and write your answer in one sentence.

e) Now produce two grouped relative frequency bar charts (to copy to your document) by following the directions below. Go to Graph → Bar Plot → With Data.
For the first grouped bar chart, graph the variable “Viewer Rating” and group by “Genre.” To “group by”, click the arrow next to the Group by box (the third box down) and select the variable you are asked to group by. In the Type box (5th box down from the top) choose relative frequency within category. Title these graphs clearly. You may keep the default labels for the x and y-axis.
For the second grouped bar chart, graph the variable “Genre” and group by “Viewer Rating.” In the Type box (5th box down from the top) choose relative frequency within category. Title these graphs clearly. You may keep the default labels for the x and y-axis.

f) Compare the graph variable among the categories of the genres. Describe what you see from each graph in one sentence each. Specifically with the graph grouped by Viewer Rating, revisit your answer to 1(h) in your comment.

Problem 3: Metro Bike Share

On July 7, 2016, the Los Angeles County Metropolitan Transportation Authority launched a bicycle sharing system called Metro Bike Share. The system uses a fleet of about 1,400 bikes and includes 93 stations in Downtown Los Angeles, Venice, and the Port of Los Angeles. It is the first bike share system in the United States to be integrated as part of the city’s existing public transit system. The “Metro Bike Share” data set includes a random sample of 300 trips lasting between one and 60 minutes. Twelve variables are included for each observation. The Duration variable indicates the length of the trip in minutes.

a) Create a frequency histogram for the variable “Duration” by using Graph → Histogram. Properly title and label your graph and copy it into your document.

b) Interpret the shape of this distribution in one complete sentence.

c) Use StatCrunch to obtain the sample size, mean, and standard deviation for the “Duration” variable by using Stat → Summary Stats → Columns.
Note: in the Statistics box, select the summary statistics listed above in the exact order given. Copy the entire table into your document and manually round each value to two decimal places.

d) Use StatCrunch to obtain the five number summary and the IQR for the “Duration” variable (the five number summary includes Min, Q1, Median, Q3, Max). Go to Stat → Summary Stats → Columns to obtain these values. Note: in the Statistics box, select the summary statistics listed above in the exact order given. Copy the entire table into your document and manually round each value to two decimal places.

e) Choose the appropriate summary statistics for center and spread (presented in either 3c or 3d) based on your stated shape of the distribution in 3b.

f) Use your summary statistics from part 3d and determine the fences used to mathematically identify outliers for the “Duration” variable. To do this, show all steps in your calculations manually including how you obtained the upper and lower fences. Please type your work and calculations.

g) Construct a horizontally oriented boxplot of the “Duration” variable by using Graph → Boxplot. To do this, click the “Draw boxes horizontally” box. Properly title and label this graph and copy it into your document.

h) How many outliers do you identify (please use both the boxplot and your results from 3f)? Write your response in a complete sentence.

Problem 4: SAT Scores

This data set presents SAT Verbal and Math scores for a random sample of 300 individuals. In addition, each individual’s gender and college are recorded. The sample was collected from one of six colleges (numbered 1 – 6). The data set is called “SAT Scores.”

a) Construct two relative frequency histograms using the “Math” variable (one for Males and one for Females). To do this, go to Graph → Histogram. Select Math to enter it in the graph box and then click the arrow in the “Group by:” box and select Gender. Properly title and label your graphs.
Finally, below the titling area, under “For multiple graphs” change Columns per page from 1 to 2 and click Compute! Once the graph is computed, click the three lines in the bottom left of the leftmost graph. Select x-axis and change the minimum to 250 and select the y-axis and change the maximum to 0.24. (This is needed so that both graphs have the same scale on the x- and y-axes.) Copy and paste your graphs into your document.

b) Describe the shape of each distribution in context in one sentence each.

c) Use StatCrunch to obtain the sample size (n), the mean, and the standard deviation of the “Math” variable by Gender (using “Group by:”). Copy and paste the table into your document. Round your answers to whole numbers in your document.

For parts 4d-4f, determine how well the Empirical Rule does in predicting the percentage of observations within some number of standard deviations of the mean.

d) Use your rounded summary statistics for females from part 4c to calculate the intervals corresponding to one, two, and three standard deviations about the mean SAT Math Score. Type your work showing how you obtained these intervals. Round the endpoints of the final intervals correctly to whole numbers and clearly label and list these three intervals in your document as shown below:

68% interval (lower value, upper value)
95% interval (lower value, upper value)
99.7% interval (lower value, upper value)

e) Use StatCrunch to determine the count and percentage of observations falling in each of these intervals by following the instructions listed below or using another appropriate counting method. Properly label and list these counts and percentages in your document. Start in the “Female Math SAT Scores” data set (found in your StatCrunch Group). Go to Data → Row Selection → Interactive Tools. In the slider selectors box, click the variable Math into the variable box. Then click Compute.
The box that appears has a slider under the word Math that allows you to create the ranges of scores that you determined in 4d. Use the slider to obtain the count for each interval by looking at the “# rows selected” presented in the first line of the box. Calculate the percentages from the counts you obtained for each interval and include them in your document.

f) Does each of the three percentages found in part 4e match what the Empirical Rule predicts? Compare your results in 4e with the expected percentages stated in the Empirical Rule. State your answer in one to three sentences.

g) Suppose a new female student with a Math SAT score of 700 was recorded. Calculate the z-score of this ‘new’ score and explain in a complete sentence what this z-score indicates.
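The by-hand calculations requested in parts 3f, 4d, and 4g follow standard formulas, and a short Python sketch can help check the typed arithmetic. This is only an illustration, not part of the assignment or of StatCrunch: the function names are made up here, the fences assume the usual 1.5 × IQR rule, and the quartiles, mean, and standard deviation passed in at the bottom are hypothetical placeholders, not values from the “Metro Bike Share” or “SAT Scores” data sets. Only the score of 700 comes from part 4g.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence), assuming the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def empirical_rule_intervals(mean, sd):
    """Return mean +/- k*sd for k = 1, 2, 3, keyed by the expected percentage."""
    return {
        label: (mean - k * sd, mean + k * sd)
        for k, label in ((1, "68%"), (2, "95%"), (3, "99.7%"))
    }


def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# Hypothetical placeholder inputs; substitute the rounded StatCrunch results.
example_fences = outlier_fences(q1=10, q3=30)
example_intervals = empirical_rule_intervals(mean=500, sd=100)
example_z = z_score(700, mean=500, sd=100)

print("Fences:", example_fences)
for label, (lower, upper) in example_intervals.items():
    print(f"{label} interval ({lower}, {upper})")
print("z-score:", example_z)
```

An observation counts as an outlier under this rule when it falls below the lower fence or above the upper fence, and a positive z-score means the score lies above the mean.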
Sample Solution to Display Formatting

Problem X: Students’ Grades

A random sample of 30 students was selected from a STAT 250 course taught during the summer session and their first exam scores were recorded.

a) Create a histogram in StatCrunch. Be sure to title and label it correctly.

b) Interpret the histogram’s shape.

See the sample solution and formatting below.

Notes about submission

Following the main points below will help you submit a professionally completed assignment.

1) Right justify your name and provide your correct section and the due date.
2) Center the specific homework assignment title.
3) Bold each complete problem number.
4) The graph can be around the below size for readability (click on the graph once and only adjust the size of the graph by using the bottom right dot).
5) Remember not to include the questions in your answer. Only provide answers. Please keep the assignment in problem and part order (present 1a, then 1b, and so on).

Kenneth Strazzeri
STAT 250-0xx (your correct section)

Data Analysis Assignment 1

Problem X
a) [histogram]
b) The shape of this distribution is left skewed because I see the majority of the data values falling in the upper end of the distribution and a few 50s and 60s skewing the shape. There do not seem to be any outliers visible on the graph.

Tutor Answer

Effective_Meg
School: Cornell University

All done. I have also attached a PDF copy of the solutions. Let me know if you need anything else.

Name
STAT 250-0xx (your correct section)

Data Analysis Assignment 1

Problem 1: 2018 Movies
a) One-way table for the Genre.
Frequency table results for Genre:
Count = 169

Genre                  Frequency   Percent of Total
Action/Adventure       25          14.79
Children's/Animated    21          12.43
Comedy                 31          18.34
Drama                  50          29.59
Horror/Thriller        8           4.73
Mystery/Crime          14          8.28
Sci-Fi/Fantasy         20          11.83

b) According to the frequency table for Genre, Horror/Thriller is the least popular genre at 4.73% of the 169 movies, whereas Drama is the most popular genre at 29.59% of the total.
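The percentages in the one-way table can be double-checked outside StatCrunch. The following is a minimal Python sketch, not part of the StatCrunch output, that recomputes “Percent of Total” from the frequencies in the table for part (a); the variable names are chosen here for illustration.

```python
# Genre frequencies copied from the one-way table in part (a).
genre_counts = {
    "Action/Adventure": 25,
    "Children's/Animated": 21,
    "Comedy": 31,
    "Drama": 50,
    "Horror/Thriller": 8,
    "Mystery/Crime": 14,
    "Sci-Fi/Fantasy": 20,
}

total = sum(genre_counts.values())  # should equal the 169 movies seen

# Percent of total for each genre, rounded to two decimal places.
percent_of_total = {
    genre: round(100 * count / total, 2)
    for genre, count in genre_counts.items()
}

# Genres with the smallest and largest frequency.
least_popular = min(genre_counts, key=genre_counts.get)
most_popular = max(genre_counts, key=genre_counts.get)

print("Least popular:", least_popular, percent_of_total[least_popular])
print("Most popular:", most_popular, percent_of_total[most_popular])
```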

c) Two-way table for the variables “Genre” and “Viewer Rating”.
Contingency table results:
Rows: Genre
Columns: Viewer Rating
Cell format: Count (Percent of total)

Genre                  Disliked       Liked           Total
Action/Adventure       2 (1.18%)      23 (13.61%)     25 (14.79%)
Children's/Animated    7 (4.14%)      14 (8.28%)      21 (12.43%)
Comedy                 6 (3.55%)      25 (14.79%)     31 (18.34%)
Drama                  12 (7.1%)      38 (22.49%)     50 (29.59%)
Horror/Thriller        2 (1.18%)      6 (3.55%)       8 (4.73%)
Mystery/Crime          1 (0.59%)      13 (7.69%)      14 (8.28%)
Sci-Fi/Fantasy         6 (3.55%)      14 (8.28%)      20 (11.83%)
Total                  36 (21.3%)     133 (78.7%)     169 (100%)

d) The viewer disliked 36 of the 169 movies, which accounts for 21.3% of the movies.
e) When comparing the one-way table with the two-way table, the values in the “Frequency” column of the one-way table are the same as the counts in the “Total” column of the two-way table, and the “Percent of Total” values match the percentages in that column.
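The answers to parts (d) and (e) can be checked from the cell counts alone. Below is a minimal Python sketch, not part of the StatCrunch output, that rebuilds the margins of the two-way table from the Disliked and Liked counts; the names used are illustrative.

```python
# (disliked, liked) counts per genre, copied from the two-way table in part (c).
ratings_by_genre = {
    "Action/Adventure": (2, 23),
    "Children's/Animated": (7, 14),
    "Comedy": (6, 25),
    "Drama": (12, 38),
    "Horror/Thriller": (2, 6),
    "Mystery/Crime": (1, 13),
    "Sci-Fi/Fantasy": (6, 14),
}

# Row totals: these should match the "Frequency" column of the one-way table.
row_totals = {
    genre: disliked + liked
    for genre, (disliked, liked) in ratings_by_genre.items()
}

# Column totals and the grand total.
total_disliked = sum(disliked for disliked, _ in ratings_by_genre.values())
total_liked = sum(liked for _, liked in ratings_by_genre.values())
grand_total = total_disliked + total_liked


def percent(part, whole):
    """Percentage of whole made up by part, rounded to two decimals."""
    return round(100 * part / whole, 2)


print("Disliked:", total_disliked, percent(total_disliked, grand_total))
print("Liked:", total_liked, percent(total_liked, grand_total))
print("Row totals:", row_totals)
```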

f) Two more two-way tables for row percentages and column percentages.
Contingency table results:
Rows: Genre
Columns: Viewer Rating
Cell format: Count (Row percent)

Genre                  Disliked       Liked           Total
Action/Adventure       2 (8%)         23 (92%)        25 (100%)
Children's/Animated    7 (33.33%)     14 (66.67%)     21 (100%)
Comedy                 6 (19.35%)     25 (80.65%)     31 (100%)
Drama                  12 (24%)       38 (76%)        50 (100%)
Horror/Thriller        2 (25%)        6 (75%)         8 (100%)
Mystery/Crime          1 (7.14%)      13 (92.86%)     14 (100%)
Sci-Fi/Fantasy         6 (30%)        14 (70%)        20 (100%)
Total                  36 (21.3%)     133 (78.7%)     169 (100%)

Contingency table results:
Rows: Genre
Columns: Viewer Rating
Cell format: Count (Column percent)

Genre                  Disliked       Liked           Total
Action/Adventure       2 (5.56%)      23 (17.29%)     25 (14.79%)
Children's/Animated    7 (19.44%)     14 (10.53%)     21 (12.43%)
Comedy                 6 (16.67%)     25 (18.8%)      31 (18.34%)
Drama                  12 (33.33%)    38 (28.57%)     50 (29.59%)
Horror/Thriller        2 (5.56%)      6 (4.51%)       8 (4.73%)
Mystery/Crime          1 (2.78%)      13 (9.77%)      14 (8.28%)
Sci-Fi/Fantasy         6 (16.67%)     14 (10.53%)     20 (11.83%)
Total                  36 (100%)      133 (100%)      169 (100%)

g) The Children’s/Animated movies that were liked were 14 accounting for 66.67% of al...
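The row and column percentages for the “Children’s/Animated” and “Liked” cell use the same count of 14 but different denominators: the row percent divides by the genre's row total, while the column percent divides by the Liked column total. A minimal Python sketch of that difference, not part of the StatCrunch output and with illustrative variable names, using the counts from the part (f) tables:

```python
# Counts copied from the two-way tables in part (f).
liked_childrens = 14   # movies in the "Children's/Animated" and "Liked" cell
childrens_total = 21   # row total: all Children's/Animated movies
liked_total = 133      # column total: all movies the viewer liked

# Row percent: of the Children's/Animated movies, what share was liked?
row_percent = round(100 * liked_childrens / childrens_total, 2)

# Column percent: of the liked movies, what share was Children's/Animated?
column_percent = round(100 * liked_childrens / liked_total, 2)

print("Row percent:", row_percent)
print("Column percent:", column_percent)
```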

Review

Anonymous
Thanks, good work
