__Data Analysis__

Your submitted document should include the following items.

- Type your Name and STAT 250 with your correct section number (e.g. STAT 250-xxx) right justified, then Data Analysis Assignment #1 centered at the top of page 1 below your name, then begin your document.
- Number your pages across your entire solutions document.
- Your document should include the ANSWERS ONLY with each answer labeled by its corresponding number and subpart. Keep the answers in order. Do __not__ include the questions in your submitted document.
- Generate all requested graphs and tables using __StatCrunch__.
- Upload your document onto Blackboard as a __Word (docx) file or pdf file__ using the link provided by your instructor. It is your responsibility to upload a readable file.

Full assignment instructions, as well as an example, are attached as a Word file.

Access to StatCrunch is required.

https://www.statcrunch.com/5.0/group.php?groupid=8220

I will provide the

Extra Notes:

- Each graph title should

- For the questions that require calculation, you can work them out on paper but must type the solution into the Word document.

- Please

Tags:
data analysis
Confidence intervals
statistics
Northern Virginia Community College
food prices
STAT250
Frequency distribution

STAT 250 Summer 2019 Data Analysis Assignment 4
Your submitted document should include the following items. Points will be deducted if the
following are not included.
1. Type your Name and STAT 250 with your correct section number (e.g. STAT 250-xxx)
right justified and then Data Analysis Assignment #4 centered on the top of page 1
below your name, then begin your document.
2. Number your pages across your entire solutions document.
3. Your document should include the ANSWERS ONLY with each answer labeled by its
corresponding number and subpart. Keep the answers in order. Do not include the
questions in your submitted document.
4. Generate all requested graphs and tables using StatCrunch.
5. Upload your document onto Blackboard as a Word (docx) file or pdf file using the link
provided by your instructor. It is your responsibility to upload a readable file.
6. You may not work with other individuals on this assignment. It is an honor code
violation if you do. In addition, using materials from a previous semester of STAT 250
(whether your own or someone else’s) is cheating.
Elements of good technical writing:
Use complete and coherent sentences to answer the questions.
Graphs must be appropriately titled and should refer to the context of the question.
Graphical displays must include labels with units if appropriate for each axis.
Units should always be included when referring to numerical values.
When making a comparison you must use comparative language, such as “greater than”, “less
than”, or “about the same as.”
Ensure that all graphs and tables appear on one page and are not split across two pages.
Type all mathematical calculations when directed to compute an answer ‘by-hand.’
Pictures of actual handwritten work are not accepted on this assignment.
When writing mathematical expressions into your document you may use either an equation
editor or common shortcuts such as:
√x can be written as sqrt(x), p̂ can be written as p-hat, x̄ can be written as x-bar.
Problem 1: Appropriateness of Inference
For the following scenarios, answer the questions for each part. In each part, the underlined text
is the name of the StatCrunch data set to be used for that part. Please note, do not conduct
inference in either of these parts; just answer each question.
a) Food Prices: Target versus Safeway. Grocery prices of the same randomly selected
items were collected and compared from Target and Safeway. Imagine you were
interested in conducting a hypothesis test to determine whether the mean prices were
significantly different. Note: to answer the questions below, subtract Target price –
Safeway price (i.e. subtract Safeway price from Target price).
i) What is (are) the parameter(s) of interest? Choose one of the following symbols:
μ (the mean of one sample), μ_D (the mean difference from paired (dependent)
samples), or μ1 − μ2 (the mean difference of two independent samples), and describe the
parameter in context of this question in one sentence.
ii) Depending on your answer to part (i), construct one or two relative frequency
histograms. Remember to properly title and label the graph(s). Copy and paste these
graphs into your document.
iii) Describe the shape of the histogram(s) in one sentence.
iv) Depending on your answer to part (i), construct one or two boxplots and copy and
paste these graphs into your document.
v) Does the boxplot (or do the boxplots) show any outliers? Answer this question in one
sentence and identify any outliers if they are present.
vi) Considering your answers to parts (iii) and (v), is inference appropriate in this case?
Why or why not? Defend your answer using the graphs in two to three sentences.
b) GMU Health Center Waiting Time. During the flu season, it is known that the waiting
time at the GMU Health Center can be extreme. A statistics student wanted to test her
claim that the wait time was greater than 100 minutes. She took a random sample of wait
times during the flu season and recorded them in StatCrunch.
i) What is (are) the parameter(s) of interest? Choose one of the following symbols:
μ (the mean of one sample), μ_D (the mean difference of two paired (dependent)
samples), or μ1 − μ2 (the mean difference of two independent samples), and describe the
parameter in context of this question in one sentence.
ii) Depending on your answer to part (i), construct one or two relative frequency
histograms. Remember to properly title and label the graph(s). Copy and paste the
graph(s) into your document.
iii) Describe the shape of the histogram(s) in one sentence.
iv) Depending on your answer to part (i), construct one or two boxplots and copy and
paste these graphs into your document.
v) Does the boxplot (or do the boxplots) show any outliers? Answer this question in one
sentence and identify any outliers if they are present.
vi) Considering the answers provided in parts (iii) and (v), is inference appropriate in this
case? Why or why not? Defend your answer using the graphs in two to three
sentences.
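The outlier check behind parts (v) and (vi) can be sketched in code. The example below is only an illustration, assuming the common 1.5 × IQR boxplot rule; the wait times are hypothetical values, not the StatCrunch data set, and the helper name `iqr_outliers` is made up for this sketch. StatCrunch may compute quartiles slightly differently, so its fences can differ from these.

```python
import statistics

def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    # Quartiles by linear interpolation over the observed data ("inclusive").
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return lower, upper, flagged

# Hypothetical wait times in minutes (NOT the actual assignment data).
wait_times = [62, 75, 81, 88, 90, 94, 97, 101, 105, 110, 118, 260]

lower, upper, outliers = iqr_outliers(wait_times)
print(f"fences: ({lower:.2f}, {upper:.2f}) minutes; outliers: {outliers}")
```

A value flagged this way, combined with a strongly skewed histogram and a small sample, is the kind of evidence part (vi) asks you to weigh when deciding whether t-based inference is appropriate.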
Problem 2: GPA of Students Depending on Where They Sit.
A professor wanted to know whether there was a difference in students’ grade point averages
(GPA) depending on whether they sit in the front half of the classroom versus the back half of
the classroom. In a previous semester, a random sample of students was selected from the front
of a classroom and another random sample was selected from the back of a classroom and the
student’s current GPA was recorded. The data provided in StatCrunch represent the GPAs from
each random sample. The file is called “GPA Versus Seating Location.” At the 0.01
significance level, can the professor conclude from these data that the mean GPA for front sitters
is higher than back sitters? Assume all conditions for conducting inference are satisfied.
Conduct a full hypothesis test by following the steps below. Enter an answer for each of
these steps in your document.
a) Define the population parameter of interest in context of this question in one
sentence.
b) State the null and alternative hypotheses using correct notation.
c) State the significance level for this problem.
d) Calculate the test statistic in StatCrunch using STAT → T Stats → 2 Sample →
With Data. Copy and paste the output table into your document.
e) Label the p-value seen in your output table produced in part (d) using the
probability notation (it begins with P(…)).
f) State whether you reject or do not reject the null hypothesis and your reason for
your answer in one sentence.
g) State your conclusion in context of the problem (i.e. interpret your results and/or
answer the question being posed) in one or two complete sentences.
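For intuition about what the two-sample output in part (d) reports, here is a minimal sketch of the test statistic and degrees of freedom, assuming the unpooled (Welch) form of the two-sample t procedure. The GPA values are hypothetical, not the “GPA Versus Seating Location” data, and `welch_t` is a made-up helper name. The p-value itself needs the t distribution, so it is left to StatCrunch.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Return (t, df) for the unpooled two-sample t statistic (sample1 - sample2)."""
    n1, n2 = len(sample1), len(sample2)
    # Squared standard error contributed by each sample.
    v1 = statistics.variance(sample1) / n1
    v2 = statistics.variance(sample2) / n2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical GPAs (NOT the actual assignment data).
front = [3.6, 3.2, 3.8, 3.4, 3.0, 3.5]
back = [2.9, 3.1, 2.7, 3.3, 2.8, 3.0]

t, df = welch_t(front, back)
print(f"t = {t:.4f}, df = {df:.2f}")
```

With the alternative "front mean is higher than back mean", the p-value is the upper-tail probability beyond this t value, and it is compared with the 0.01 significance level.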
Problem 3: Metal Hardness Testing
The manufacturer of hardness testing equipment uses steel-ball indenters to indent metal that is
being tested. However, the manufacturer thinks there might be a difference in hardness reading
when using a diamond indenter. The metal specimens to be tested are large enough so that two
indentations can be made. Therefore, the manufacturer wants to use both indenters on each
specimen and compare the readings. The order of the indentations will be random. This
particular design is called the paired design (or matched pairs design or dependent samples
design). Assume all conditions are satisfied in this problem. The data set used for this problem
is called “Metal Hardness Testing”.
a) Calculate the difference between specimens by subtracting Steel Ball – Diamond. For
example, the first difference is 51 – 52 = -1. List the difference for each of the 14 pairs in
your document.
b) For the first piece of metal, which indenter produced the larger hardness reading?
Answer this question in a complete sentence.
c) Obtain the mean of these differences and the standard deviation of these differences in
StatCrunch. You may copy and paste the box that you obtain from StatCrunch or list the
values. Please round these values to four decimal places.
d) Construct a 95% confidence interval using the above data. Please do this “by hand”
using the formula and showing your work (please type your work). Use your t-table
(found in the last page of our formula packet) to obtain your t* critical value needed for
the confidence interval. Present this confidence interval as (lower limit, upper limit).
e) Use StatCrunch to obtain a 95% confidence interval for the above data by selecting:
Stat → T Stats → Paired. Enter Steel Ball for Sample 1 and Diamond for Sample 2.
Copy and paste your output into your document.
f) Does your confidence interval capture 0? Answer this question and briefly explain what
this implies in one or two sentences in the context of the question.
g) Using your answer to part (f), imagine you were using a hypothesis test to determine if a
significant difference exists in mean hardness reading between the two indenters (the
hypotheses would be H0: μ_D = 0 vs Ha: μ_D ≠ 0). What decision and conclusion can be
made in this case? Provide an answer and a reason for your choice in one or two
sentences. Please only use your confidence interval to answer this question (i.e. do not
run this hypothesis test).
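The "by hand" interval in part (d) follows the paired formula d-bar ± t* · s_d / sqrt(n). The sketch below walks through that arithmetic under stated assumptions: the 14 differences are hypothetical (only the first, 51 – 52 = -1, comes from the assignment text), `paired_ci` is a made-up helper name, and t* = 2.160 is the standard t-table value for 95% confidence with df = 14 – 1 = 13.

```python
import math
import statistics

T_STAR = 2.160  # t-table critical value: 95% confidence, df = 13

def paired_ci(differences, t_star):
    """Return (lower, upper) for the mean of paired differences."""
    n = len(differences)
    d_bar = statistics.mean(differences)   # mean of the differences
    s_d = statistics.stdev(differences)    # sample standard deviation
    margin = t_star * s_d / math.sqrt(n)   # margin of error
    return d_bar - margin, d_bar + margin

# Hypothetical Steel Ball - Diamond differences (NOT the actual data set).
diffs = [-1, 2, 0, -3, 1, -2, 4, -1, 0, 2, -4, 1, -2, 2]

lower, upper = paired_ci(diffs, T_STAR)
print(f"95% CI for the mean difference: ({lower:.4f}, {upper:.4f})")
print("captures 0:", lower < 0 < upper)
```

For these made-up differences the interval captures 0, which corresponds to not rejecting H0: μ_D = 0 at the 0.05 level in a two-sided test. Your own conclusion depends on the interval from the real data.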
Problem 4: Lego Prices
The data set named “Lego Prices” contains a selection of Lego sets sold on the Lego website in
August 2016. The goal of this problem is to explore one variable (the number of Pieces a set
contains) that may help a buyer predict the price of a Lego Set. The Price variable is the
response variable in this problem.
a) Investigate the relationship between the explanatory variable “Pieces” and response
variable “Price” by doing the following:
i) Make a scatterplot and copy and paste it in your solutions (use Graph → Scatter
Plot in StatCrunch).
ii) Calculate the correlation coefficient (use Stat → Summary Stats → Correlation in
StatCrunch). Provide this value in your document.
iii) Interpret the scatterplot and correlation coefficient in terms of trend, strength, and
shape (form) in one complete sentence.
b) Using the “Pieces” variable as the explanatory variable, run a Simple Linear Regression
analysis in StatCrunch. Use Stat → Regression → Simple Linear. Copy and paste only
the StatCrunch results output (no tables).
c) Add the fitted line plot to your document. This graph appears on page 2 of your output.
d) Type the regression equation into your document.
e) Interpret the slope of the regression line (in context of this data set).
f) Is it meaningful to interpret the y-intercept? Why or why not?
g) State r-squared (i.e., the coefficient of determination) and explain what this value means
in context of the data set.
h) Use the regression equation from part (d) to predict the price of a randomly selected set
containing 556 pieces. State your predicted value in a sentence that is in context of the
data. Do not forget to mention the units. Note: You can do this calculation “by hand” or
using StatCrunch.
i) Is your prediction in part (h) an example of extrapolation? Why or why not?
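The quantities requested in Problem 4 (correlation, regression equation, r-squared, a prediction, and the extrapolation check) can be sketched with the usual least-squares formulas. The pieces and prices below are hypothetical, not the “Lego Prices” data, and the helper names `fit_line` and `is_extrapolation` are made up for this example, so the numbers will not match your StatCrunch output.

```python
import math

def fit_line(x, y):
    """Return (slope, intercept, r) for the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient
    return slope, intercept, r

def is_extrapolation(x_new, x):
    """A prediction is extrapolation when x_new lies outside the observed x range."""
    return not (min(x) <= x_new <= max(x))

# Hypothetical Lego sets (NOT the actual data set).
pieces = [100, 250, 400, 600, 900]
price = [9.99, 29.99, 39.99, 59.99, 99.99]  # dollars

slope, intercept, r = fit_line(pieces, price)
predicted = intercept + slope * 556
print(f"Price-hat = {intercept:.4f} + {slope:.4f} * Pieces")
print(f"r = {r:.4f}, r-squared = {r ** 2:.4f}")
print(f"predicted price for 556 pieces: ${predicted:.2f}")
print("extrapolation:", is_extrapolation(556, pieces))
```

The slope is read as the predicted change in price (in dollars) for each additional piece, and r-squared as the proportion of the variation in price explained by the number of pieces.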
Sample Solution to Display Formatting
Problem X: Students’ Grades
A random sample of 30 students was selected from a STAT 250 course taught during the
summer session and their first exam scores were recorded.
a) Create a histogram in StatCrunch. Be sure to title and label it correctly.
b) Interpret the histogram’s shape
See sample solution and formatting on page 2.
Notes about submission
Following the main points will help you submit a professionally completed assignment.
1) Right justify your name and provide your correct section and the due date.
2) Center the specific homework assignment title.
3) Bold each complete problem number.
4) The graph can be around the below size for readability (click on the graph once and only
adjust the size of the graph by using the bottom right dot).
5) Remember not to include the questions in your answer. Only provide answers. Please
keep the assignment in problem and part order (present 1a, then 1b, and so on).
Kenneth Strazzeri
STAT 250-0xx (your correct section)
Data Analysis Assignment 1
Problem X
a)
b) The shape of this distribution is left skewed because I see the majority of the data values
falling in the upper end of the distribution and a few 50s and 60s skewing the shape. There do
not seem to be any outliers visible on the graph.
...

Looks good now. Just let me know.

Name

Course

Data Analysis Assignment

Problem 1: Appropriateness of Inference

a) Food Prices: Target versus Safeway

i)

μ_D, because the mean difference from the paired (dependent) samples needs to be

equal to zero if there is no significant difference.

ii)

iii) The graph shape is only slightly skewed to the left implying that only a few points are

below the average prices.

iv)

v) No, it does not show any outliers.

vi) Inference is not appropriate because the mean difference is insignificant and there

are no outliers.

b) GMU Health Center Waiting Time.

i)

μ, because it's one sample being tested for average time.

ii)

iii) It is skewed to the left implying that more than 50% of the participants experienced

waiting time less than 100.

iv) Yes, the boxplot shows one outlier above 250 minutes of waiting time.

v) Yes, inference ...

