STA 2023 Statistics and the Empirical Rule with Sample Data Problems

User Generated

Enpury_B

Description

Hi,

I need detailed solutions for all the problems in the attached assignment.

Thanks

Unformatted Attachment Preview

Data Set R

31.0 32.8 34.7 34.9 30.8 38.3 31.6 31.1 34.9 32.0
33.3 33.5 31.3 28.2 29.3 34.0 31.4 26.4 39.5 26.9
33.8 39.0 29.2 29.3 27.4 32.3 32.5 38.4 31.6 29.1
25.8 28.3 37.6 33.9 29.1 17.9 33.7 41.1 38.6 38.2
21.4 28.0 28.9 38.3 23.6 27.2

Sample Data, Statistics, and the Empirical Rule
STA2023 Application (2017)

Purpose: The purpose of this assignment is to organize a random sample of data values and create statistics, tables, and a graph based on the data. Then the sample data will be compared to the Empirical Rule. The information will then be analyzed in a written summary.

Part 1: Random Data, Statistics, and the Empirical Rule

Methods: Use Excel (or similar software) to create the tables and graph. Then copy the items and paste them into a Word document. The tables should be formatted vertically, have borders, and be given the labels and titles stated in the assignment. The proper symbols should be used. Do not submit this assignment as an Excel file. The completed assignment should be a Word (or .pdf) document.

1. The data values and relevant information are posted in the course website. Use the data set (P, Q, R, S, or T) assigned to you by your instructor to complete this application. For the purpose of this application, treat the data set as if it represented a certain random variable and was a valid random sample gathered by a researcher from a normally distributed population. The sample data was actually found with an online Gaussian random number generator that creates normally distributed data values. The random number generator simulates the results of a researcher finding those values through observation or experimentation.

2. Use technology (Excel, graphing calculator, etc.) to sort the sample data values from low to high. Use Excel or similar software to put the data into a table with about 5 or 6 columns. Label this “Table 1: Sorted Set of Sample Data.”

3. Using 5 to 10 class intervals, organize the sample data as a frequency distribution in a table. The intervals of the frequency distribution should be rounded to the tenths so that they match the data. Label this “Table 2: Frequency Distribution.”

4. Use Excel (or similar software) to construct a frequency histogram to illustrate the data. Give the axes the proper titles. Label this “Graph 1: Histogram.”

5. Use Table 2, the frequency distribution, to find the midpoints of each class interval. Create a new frequency distribution with the midpoints in the left column and the frequencies in the right column. Label this “Table 3: Frequency Distribution with Midpoints.”

6. Use technology to find the mean, median, standard deviation, and variance of the sample data organized in Table 3 (from step 5 above). Put these values into a table with the proper symbol in the left column and the value of the statistic in the right column. Also, from the original data set, put the values of the range and sample size in the table. The median and range do not generally have symbols so the terms “Median” and “Range” can be used in the left column. Identify the modal class (the one with the highest frequency). Put the terms “Modal Class” in the left column and the class interval in the right column. The statistics should be rounded properly (one more decimal place than the data). Label this “Table 4: Summary Statistics.”

7. Use the sample mean and standard deviation to find the values related to the Empirical Rule.

The Empirical Rule: For a set of data whose distribution is approximately normal,
• about 68% of the data are within one standard deviation of the mean.
• about 95% of the data are within two standard deviations of the mean.
• about 99.7% of the data are within three standard deviations of the mean.

Use the value of n and the percents listed above to find how many data values should be within each category. Then use the sample mean and standard deviation to find the lower and upper cut-off values in each category. Then use the sorted list of data to determine how many values are actually in each category. Put the values into a table as shown in the example and label it “Table 5: The Empirical Rule.”

Part 2: Written Introduction and Summary

1. Write an introduction to this application. Discuss the random variable and the source of the sample data. Refer to the textbook or class notes to describe the basic components of the application and the statistical concepts that are applied. The introduction should be at least 50 words and be written with proper grammar and spelling.

2. Write a summary of the application, considering the following topics. The summary should be at least 150 words and be written with proper grammar and spelling. Refer to the tables and graph (by label and number) throughout the summary. Use the proper statistical terms and symbols in the summary. Write about the concepts in the application rather than the steps followed. Do not number the parts or steps in the application except for the proper numbering of the tables and graph.
• Discuss the difference between population and sample values.
• Discuss how the frequency distribution was created, specifically referring to the intervals and class width.
• Describe the features of the histogram, including the axes and the shape of the distribution, especially whether it meets the criteria of an approximately normal distribution.
• Discuss measures of center and variation as related to the sample data; given the values used in the random number generator, compare the sample values to those of the population.
• Discuss the Empirical Rule and compare the actual number of values within each category of the Empirical Rule to the number that should be there.
• Discuss how the Empirical Rule is related to the criteria for significantly low or high values.
• Discuss characteristics of the sample data such as significantly low/high values and skewness, providing support and examples. Use both the histogram and measures of center to identify whether there is skewness.

Part 3: Format Requirements

• Include a title page with your name, STA2023 Application 1, the word count for the introduction, the word count for the summary, and which data set is used.
• The introduction and summary should be written in paragraphs that are typed and double-spaced, with 1-inch margins and a readable font type and font size. The preferred font size could be 12 Times New Roman.
• The introduction by itself should be at least 50 words.
• The summary should be at least 150 words.
• The introduction and summary should be college-level writing with proper grammar and spelling.
• Do not use first or second person (I, you, etc.).
• Do not write about the parts of the assignment or the use of a calculator or Excel to complete steps in the application. Write about the concepts used in the application.
• Throughout the introduction, tables, graph, and summary, proper statistical symbols and terms should be used.
• The tables should be formatted vertically, have 2 columns (unless specified otherwise), have borders, and be given the labels and titles stated in the assignment.
• The title page, introduction, tables, graphs, and summary should be a single document (.doc, .docx, or .pdf) which is submitted to the Assignments link in Falcon Online by the due date. Assemble the application in the following order to create a single document.
  o Title page
  o Introduction
  o Tables and Graph
  o Summary

Part 4: Grading

The application is worth up to 30 points, based on the following criteria.
• The statistical content (Part 1) is worth up to 10 points, based on completion of all the listed requirements.
• The introduction and written summary (Part 2) are worth up to 10 points, based on completion of all the listed requirements.
• The format (Part 3) of the complete document submitted to the Assignments link is worth up to 10 points, based on completion of all the listed requirements.

Rubric/allocation of 10 points:
• 0 points for completing none of the requirements
• 2 points for attempting the requirements
• 4 points for completing some of the requirements
• 8 points for completing most of the requirements
• 10 points for completing all of the requirements

Template for Table 5

Sample size: n = 48

Table 5: The Empirical Rule

| Category of Empirical Rule | Number of data values which should be within cut-off values | Lower cut-off | Upper cut-off | Number of data values which actually are within cut-off values |
|---|---|---|---|---|
| 68% | 33 | x̄ – s = 30.31 | x̄ + s = 37.85 | 30 |
| 95% | 46 | x̄ – 2s = 26.54 | x̄ + 2s = 41.26 | 45 |
| 99.70% | 48 | x̄ – 3s = 22.77 | x̄ + 3s = 45.39 | 48 |

Explanation & Answer

Attached. Please let me know if you have any questions or need revisions.

Sample Data, Statistics, and the Empirical Rule

Introduction:
A random variable is a real-valued function defined over a sample space; that is, it assigns a
number to each element of the sample space. Sample data are drawn from the sample space. The
sample space contains all possible outcomes of an experiment and is treated here as the
population. When a few outcomes are selected from the sample space, that subset is called a
sample.

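Table 1: Sorted Set of Sample Data

| Column1 | Column2 | Column3 | Column4 | Column5 |
|---|---|---|---|---|
| 17.9 | 28.3 | 31.3 | 33.7 | 38.3 |
| 21.4 | 28.9 | 31.4 | 33.8 | 38.4 |
| 23.6 | 29.1 | 31.6 | 33.9 | 38.6 |
| 25.8 | 29.1 | 31.6 | 34.0 | 39.0 |
| 26.4 | 29.2 | 32.0 | 34.7 | 39.5 |
| 26.9 | 29.3 | 32.3 | 34.9 | 41.1 |
| 27.2 | 29.3 | 32.5 | 34.9 | |
| 27.4 | 30.8 | 32.8 | 37.6 | |
| 28.0 | 31.0 | 33.3 | 38.2 | |
| 28.2 | 31.1 | 33.5 | 38.3 | |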

Table 2: Frequency Distribution

| Class intervals | Frequency |
|---|---|
| 15-20 | 1 |
| 20-25 | 2 |
| 25-30 | 14 |
| 30-35 | 20 |
| 35-40 | 8 |
| 40-45 | 1 |
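The counts in Table 2 can be checked with a short script. The following is a minimal sketch, not the method used in the original answer: the function name `frequency_distribution` is made up for illustration, and treating each class as the half-open interval [lower, upper) is an assumption, since the table does not say where a value equal to a boundary would go (no value in Data Set R falls on one).

```python
# Data Set R from the assignment (46 values).
data = [
    31.0, 32.8, 34.7, 34.9, 30.8, 38.3, 31.6, 31.1, 34.9, 32.0,
    33.3, 33.5, 31.3, 28.2, 29.3, 34.0, 31.4, 26.4, 39.5, 26.9,
    33.8, 39.0, 29.2, 29.3, 27.4, 32.3, 32.5, 38.4, 31.6, 29.1,
    25.8, 28.3, 37.6, 33.9, 29.1, 17.9, 33.7, 41.1, 38.6, 38.2,
    21.4, 28.0, 28.9, 38.3, 23.6, 27.2,
]

# Class boundaries taken from Table 2 (class width 5).
edges = [15, 20, 25, 30, 35, 40, 45]


def frequency_distribution(values, edges):
    """Count the values in each class, assuming classes are [lower, upper)."""
    counts = []
    for lower, upper in zip(edges[:-1], edges[1:]):
        counts.append(sum(1 for v in values if lower <= v < upper))
    return counts


frequencies = frequency_distribution(data, edges)

# Print one row per class, in the same layout as Table 2.
for lower, upper, f in zip(edges[:-1], edges[1:], frequencies):
    print(f"{lower}-{upper}: {f}")
```

The frequencies should add up to the sample size, which is a quick way to confirm that no value was dropped or counted twice.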

Graph 1: Histogram

[Histogram: frequency on the vertical axis (scale 0 to 25) against class interval on the horizontal axis (15-20, 20-25, 25-30, 30-35, 35-40, 40-45).]
Table 3: Frequency Distribution with Midpoints

| Class intervals | Midpoint | Frequency |
|---|---|---|
| 15-20 | 17.5 | 1 |
| 20-25 | 22.5 | 2 |
| 25-30 | 27.5 | 14 |
| 30-35 | 32.5 | 20 |
| 35-40 | 37.5 | 8 |
| 40-45 | 42.5 | 1 |


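Table 4: Summary Statistics

| Class intervals | Midpoint (d) | Frequency (f) | fd | d − x̄ |
|---|---|---|---|---|
| 15-20 | 17.5 | 1 | 17.5 | -13.80 |
| 20-25 | 22.5 | 2 | 45 | -8.80 |
| 25-30 | 27.5 | 14 | 385 | -3.80 |
| 30-35 | 32.5 | 20 | 650 | 1.20 |
| 35-40 | 37.5 | 8 | 300 | 6.20 |
| 40-45 | 42.5 | 1 | 42.5 | 11.20 |
| Total | | 46 | 1440 | |

Ʃf = 46

Ʃfd = 1440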

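$$\bar{x} = \frac{\sum fd}{\sum f} = \frac{1440}{46} = 31.304$$

$$\sum f(d - \bar{x})^2 = 1009.24$$

$$\sigma^2 = \frac{\sum f(d - \bar{x})^2}{\sum f} = \frac{1009.24}{46} = 21.94$$

$$\sigma = \sqrt{21.94} = 4.68$$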
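The grouped mean, variance, and standard deviation above can be reproduced from the midpoints and frequencies. This is a minimal sketch with illustrative variable names. The answer divides by Ʃf, which is the population form; because the assignment asks for sample statistics, the sketch also prints the n − 1 form as a comparison, and that second calculation is an addition that does not appear in the original answer.

```python
import math

# Midpoints (d) and frequencies (f) from Table 3.
midpoints = [17.5, 22.5, 27.5, 32.5, 37.5, 42.5]
frequencies = [1, 2, 14, 20, 8, 1]

n = sum(frequencies)                                        # sum of f
sum_fd = sum(f * d for f, d in zip(frequencies, midpoints))  # sum of f*d
mean = sum_fd / n

# Sum of f * (d - mean)^2, the numerator of the variance.
sum_sq = sum(f * (d - mean) ** 2 for f, d in zip(frequencies, midpoints))

# Population form (divide by n), as used in the answer.
pop_variance = sum_sq / n
pop_sd = math.sqrt(pop_variance)

# Sample form (divide by n - 1), shown only for comparison.
sample_variance = sum_sq / (n - 1)
sample_sd = math.sqrt(sample_variance)

print(f"n = {n}, sum fd = {sum_fd}, mean = {mean:.3f}")
print(f"sum f(d - mean)^2 = {sum_sq:.2f}")
print(f"divide by n:     variance = {pop_variance:.2f}, sd = {pop_sd:.2f}")
print(f"divide by n - 1: variance = {sample_variance:.2f}, sd = {sample_sd:.2f}")
```

With 46 values the two divisors give results that differ only in the first decimal place, but the choice affects which symbol (σ or s) is the proper one to report.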
Position of the median = (46/2)th term = 23rd term

Median class = 30-35 interval

$$\text{Median} = l + \left(\frac{\frac{n}{2} - c}{f}\right) h$$

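| Class intervals | f(d − x̄)² | Cumulative frequency |
|---|---|---|
| 15-20 | 190.44 | 1 |
| 20-25 | 154.88 | 3 |
| 25-30 | 202.16 | 17 |
| 30-35 | 28.8 | 37 |
| 35-40 | 307.52 | 45 |
| 40-45 | 125.44 | 46 |
| Total | 1009.24 | |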

l = lower limit of the median class interval
c = cumulative frequency of the class preceding the median class
f = frequency of the median class
h = width of the class interval
n = sum of the frequencies

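$$\text{Median} = 30 + \left(\frac{23 - 17}{20}\right) \times 5 = 31.50$$

Modal class = 30-35

| Statistic | Value |
|---|---|
| Mean (x̄) | 31.30 |
| Variance (σ²) | 21.94 |
| Standard deviation (σ) | 4.68 |
| Median | 31.50 |
| Modal class | 30-35 |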
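The interpolation formula for the grouped median, with l, c, f, h, and n as defined above, can be sketched as follows. The function names `grouped_median` and `modal_class` are hypothetical helpers written for this illustration; they are not part of any library.

```python
# Class boundaries and frequencies from Table 2.
edges = [15, 20, 25, 30, 35, 40, 45]
frequencies = [1, 2, 14, 20, 8, 1]


def grouped_median(edges, frequencies):
    """Median of grouped data: l + ((n/2 - c) / f) * h."""
    n = sum(frequencies)
    half = n / 2
    cumulative = 0  # c: cumulative frequency before the current class
    for i, f in enumerate(frequencies):
        if cumulative + f >= half:
            lower = edges[i]                 # l: lower limit of median class
            width = edges[i + 1] - edges[i]  # h: class width
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one class")


def modal_class(edges, frequencies):
    """Label of the class with the highest frequency."""
    i = frequencies.index(max(frequencies))
    return f"{edges[i]}-{edges[i + 1]}"


print("Median:", grouped_median(edges, frequencies))
print("Modal class:", modal_class(edges, frequencies))
```

Because the 23rd term lies in the 30-35 class, any value the formula returns has to fall between 30 and 35, which is a useful sanity check on a hand calculation.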

Table 5: The Empirical Rule

| Empirical rule (%) | Number of data |
|---|---|
| 68 | 31.28 |
| 95 | 43.7 |
| 99.7 | 45.86 |
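The assignment's template for Table 5 also asks for the lower and upper cut-off values and for the number of data values actually inside each pair of cut-offs. The sketch below shows one way to compute those columns. It assumes the mean (31.30) and standard deviation (4.68) reported in Table 4; using a sample standard deviation instead would shift the cut-offs slightly. The dictionary keys are illustrative names only.

```python
# Data Set R from the assignment (46 values).
data = [
    31.0, 32.8, 34.7, 34.9, 30.8, 38.3, 31.6, 31.1, 34.9, 32.0,
    33.3, 33.5, 31.3, 28.2, 29.3, 34.0, 31.4, 26.4, 39.5, 26.9,
    33.8, 39.0, 29.2, 29.3, 27.4, 32.3, 32.5, 38.4, 31.6, 29.1,
    25.8, 28.3, 37.6, 33.9, 29.1, 17.9, 33.7, 41.1, 38.6, 38.2,
    21.4, 28.0, 28.9, 38.3, 23.6, 27.2,
]

# Assumed inputs: the grouped mean and standard deviation from Table 4.
mean = 31.30
sd = 4.68
n = len(data)

rows = []
for k, share in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    lower = mean - k * sd
    upper = mean + k * sd
    rows.append({
        "percent": share * 100,
        "expected": share * n,   # how many values the rule predicts
        "lower": lower,
        "upper": upper,
        # how many values really lie between the cut-offs (inclusive)
        "actual": sum(1 for v in data if lower <= v <= upper),
    })

for r in rows:
    print(f"{r['percent']:g}%: expected {r['expected']:.2f}, "
          f"cut-offs {r['lower']:.2f} to {r['upper']:.2f}, "
          f"actual {r['actual']}")
```

Comparing the `expected` and `actual` columns row by row is what the summary is asked to discuss.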

Summary:
A population includes all of the elements of a set of data. A sample consists of one or more
observations drawn from the population. A frequency distribution lists the distinct values (or
classes of values) of a variable together with the number of times each occurs, so it shows how
the frequencies are distributed over the values. To build a frequency distribution, first find the
range (the difference between the highest and lowest observations), then decide on the number of
classes, use the two to estimate the class width, set up the class intervals, and count the
frequency in each interval.
Histograms are usually drawn as vertical rectangular bars with no gaps between the bars. When
the classes have equal width, the total area of the bars is proportional to the total frequency.
The x-axis shows the class boundaries and the y-axis shows the frequencies. Joining the midpoints
of the tops of the bars gives a roughly bell-shaped curve. Because the normal distribution is so
widely used to model continuous data, a histogram is a helpful first check on whether the data
are approximately normal.
Measures of center, also called measures of central tendency, are the mean, median, and mode.
Of the three, the mean is generally regarded as the best measure, but it has disadvantages (it is
sensitive to extreme values), and the median and mode can play a better role in describing the
location of the sample observations. Measures of variation, also called measures of dispersion,
include the standard deviation, the range, and the interquartile range. The standard deviation is
the most reliable of these; the range is a poor measure of dispersion because it depends only on
the two extreme values. A random number generator is sometimes used to simulate sample data
drawn from a population.
The Empirical Rule states that, for a normal distribution, almost all observed data fall within
three standard deviations of the mean. In particular, the rule predicts that about 68% of
observations fall within one standard deviation of the mean (μ ± σ), about 95% within two
standard deviations (μ ± 2σ), and about 99.7% within three standard deviations (μ ± 3σ).
Skewness and symmetry can be identified using graphical techniques such as histograms, which
show the shape of a data distribution clearly.
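One way to support the discussion of significantly low or high values and of skewness is to check the raw data directly. The sketch below assumes the common criterion that a value is significantly low or high when it lies at least two standard deviations from the mean; the source does not state which criterion the course uses. It works from the 46 raw values with Python's `statistics` module, so its mean and standard deviation differ a little from the grouped values in Table 4.

```python
import statistics

# Data Set R from the assignment (46 values).
data = [
    31.0, 32.8, 34.7, 34.9, 30.8, 38.3, 31.6, 31.1, 34.9, 32.0,
    33.3, 33.5, 31.3, 28.2, 29.3, 34.0, 31.4, 26.4, 39.5, 26.9,
    33.8, 39.0, 29.2, 29.3, 27.4, 32.3, 32.5, 38.4, 31.6, 29.1,
    25.8, 28.3, 37.6, 33.9, 29.1, 17.9, 33.7, 41.1, 38.6, 38.2,
    21.4, 28.0, 28.9, 38.3, 23.6, 27.2,
]

mean = statistics.mean(data)
median = statistics.median(data)
sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

# Assumed criterion: two or more standard deviations away from the mean.
low_cut = mean - 2 * sd
high_cut = mean + 2 * sd
low_values = sorted(v for v in data if v <= low_cut)
high_values = sorted(v for v in data if v >= high_cut)

print(f"mean = {mean:.2f}, median = {median:.2f}, s = {sd:.2f}")
print(f"significantly low  (<= {low_cut:.2f}): {low_values}")
print(f"significantly high (>= {high_cut:.2f}): {high_values}")
```

A mean and median that sit close together suggest little skewness, while significant values on only one side of the mean point to a longer tail in that direction; the histogram should be read alongside these numbers.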



