BIA652 Homework – Statistical Inference Review – 01
(Please show all work, copy/paste as needed from any computer output)
1. Television viewing reached a new high when the Nielsen Company reported a mean daily
viewing time of 8.35 hours per household (USA Today, November 11, 2009). Use a normal
probability distribution with a standard deviation of 2.5 hours to answer the following questions
about daily television viewing per household.
a. What is the probability that a household views television between 5 and 10 hours a
day?
b. How many hours of television viewing must a household have in order to be in the top
3% of all television viewing households?
c. What is the probability that a household views television more than 3 hours a day?
2. According to the Sleep Foundation, the average night’s sleep is 6.8 hours (Fortune, March 20,
2006). Assume the standard deviation is .6 hours and that the probability distribution is normal.
a. What is the probability that a randomly selected person sleeps more than 8 hours?
b. What is the probability that a randomly selected person sleeps 6 hours or less?
c. Doctors suggest getting between 7 and 9 hours of sleep each night. What percentage
of the population gets this much sleep?
3. The mean preparation fee H&R Block charged retail customers last year was $183 (The Wall
Street Journal, March 7, 2012). Use this price as the population mean and assume the population
standard deviation of preparation fees is $50.
a. What is the probability that the mean price for a sample of 30 H&R Block retail
customers is within $8 of the population mean?
b. What is the probability that the mean price for a sample of 50 H&R Block retail
customers is within $8 of the population mean?
c. What is the probability that the mean price for a sample of 100 H&R Block retail
customers is within $8 of the population mean?
d. Which, if any, of the sample sizes in parts (a), (b), and (c) would you recommend to
have at least a .95 probability that the sample mean is within $8 of the population mean?
4. The latest available data showed health expenditures were $8086 per person in the United
States or 17.6% of gross domestic product (Centers for Medicare & Medicaid Services website,
April 1, 2012). Use $8086 as the population mean and suppose a survey research firm will take a
sample of 100 people to investigate the nature of their health expenditures. Assume the
population standard deviation is $2500.
a. Show the sampling distribution of the mean amount of health care expenditures for a
sample of 100 people.
b. What is the probability the sample mean will be within ± $200 of the population mean?
c. What is the probability the sample mean will be greater than $9000? If the survey
research firm reports a sample mean greater than $9000, would you question whether the
firm followed correct sampling procedures? Why or why not?
Continuous Probability Distributions
Uniform Probability Distribution
Normal Probability Distribution
Normal Approximation of Binomial Probabilities
Exponential Probability Distribution
[Figure: sketches of the uniform, normal, and exponential density functions f(x).]
TFB 1
Continuous Probability Distributions
A continuous random variable can assume any value
in an interval on the real line or in a collection of
intervals.
It is not possible to talk about the probability of the
random variable assuming a particular value.
Instead, we talk about the probability of the random
variable assuming a value within a given interval.
TFB 2
Continuous Probability Distributions
The probability of the random variable assuming a
value within some given interval from x1 to x2 is
defined to be the area under the graph of the
probability density function between x1 and x2.
[Figure: uniform, normal, and exponential density functions f(x), each with the area between x1 and x2 marked.]
TFB 3
Uniform Probability Distribution
A random variable is uniformly distributed
whenever the probability is proportional to the
interval’s length.
The uniform probability density function is:
f(x) = 1/(b – a) for a < x < b
f(x) = 0 elsewhere
where: a = smallest value the variable can assume
b = largest value the variable can assume
TFB 4
Uniform Probability Distribution
Expected Value of x
E(x) = (a + b)/2
Variance of x
Var(x) = (b - a)^2/12
TFB 5
Uniform Probability Distribution
Example: Slater's Buffet
Slater customers are charged for the amount of
salad they take. Sampling suggests that the amount
of salad taken is uniformly distributed between 5
ounces and 15 ounces.
TFB 6
Uniform Probability Distribution
Uniform Probability Density Function
f(x) = 1/10 for 5 < x < 15
f(x) = 0 elsewhere
where:
x = salad plate filling weight
TFB 7
Uniform Probability Distribution
Expected Value of x
E(x) = (a + b)/2
= (5 + 15)/2
= 10
Variance of x
Var(x) = (b - a)^2/12
= (15 – 5)^2/12
= 8.33
TFB 8
Uniform Probability Distribution
Uniform Probability Distribution
for Salad Plate Filling Weight
[Figure: f(x) = 1/10 for salad weight x between 5 and 15 oz., and 0 elsewhere.]
TFB 9
Uniform Probability Distribution
What is the probability that a customer
will take between 12 and 15 ounces of salad?
P(12 < x < 15) = (1/10)(3) = .3
[Figure: the uniform density f(x) = 1/10 over 5 to 15 oz., with the area between 12 and 15 oz. marked.]
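The Slater's Buffet numbers can be checked with a few lines of Python. This is a minimal sketch rather than part of the original slides; the helper names (uniform_mean, uniform_variance, uniform_prob) are made up for illustration.

```python
def uniform_mean(a, b):
    """Expected value of a uniform distribution on (a, b)."""
    return (a + b) / 2


def uniform_variance(a, b):
    """Variance of a uniform distribution on (a, b)."""
    return (b - a) ** 2 / 12


def uniform_prob(a, b, x1, x2):
    """P(x1 < x < x2): the rectangle area, clipped to the support (a, b)."""
    lo = max(a, x1)
    hi = min(b, x2)
    return max(0.0, hi - lo) / (b - a)


# Salad weights are assumed uniform between 5 and 15 ounces.
mean = uniform_mean(5, 15)
variance = uniform_variance(5, 15)
p_12_to_15 = uniform_prob(5, 15, 12, 15)

print(mean, round(variance, 2), p_12_to_15)
```

The probability is simply width times height, which is why the clipping step is the only care needed when the interval extends past 5 or 15.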
TFB 10
Area as a Measure of Probability
The area under the graph of f(x) and probability are
identical.
This is valid for all continuous random variables.
The probability that x takes on a value between some
lower value x1 and some higher value x2 can be found
by computing the area under the graph of f(x) over
the interval from x1 to x2.
TFB 11
Normal Probability Distribution
The normal probability distribution is the most
important distribution for describing a continuous
random variable.
It is widely used in statistical inference.
It has been used in a wide variety of applications
including:
• Heights of people
• Rainfall amounts
• Test scores
• Scientific measurements
Abraham de Moivre, a French mathematician,
derived the normal distribution in 1733.
The result appears in later editions of his book
The Doctrine of Chances.
TFB 12
Normal Probability Distribution
Normal Probability Density Function
$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(x-\mu)^2 / (2\sigma^2)} $$
where:
μ = mean
σ = standard deviation
π = 3.14159
e = 2.71828
TFB 13
Normal Probability Distribution
Characteristics
The distribution is symmetric; its skewness
measure is zero.
TFB 14
Normal Probability Distribution
Characteristics
The entire family of normal probability
distributions is defined by its mean μ and its
standard deviation σ.
[Figure: normal curve with the mean μ and the standard deviation σ marked.]
TFB 15
Normal Probability Distribution
Characteristics
The highest point on the normal curve is at the
mean, which is also the median and mode.
TFB 16
Normal Probability Distribution
Characteristics
The mean can be any numerical value: negative,
zero, or positive.
[Figure: normal curves centered at -10, 0, and 25.]
TFB 17
Normal Probability Distribution
Characteristics
The standard deviation determines the width of the
curve: larger values result in wider, flatter curves.
[Figure: normal curves with σ = 15 and σ = 25.]
TFB 18
Normal Probability Distribution
Characteristics
Probabilities for the normal random variable are
given by areas under the curve. The total area
under the curve is 1 (.5 to the left of the mean and
.5 to the right).
[Figure: normal curve with area .5 on each side of the mean.]
TFB 19
Normal Probability Distribution
Characteristics (basis for the empirical rule)
68.26% of values of a normal random variable
are within +/- 1 standard deviation of its mean.
95.44% of values of a normal random variable
are within +/- 2 standard deviations of its mean.
99.73% of values of a normal random variable
are within +/- 3 standard deviations of its mean.
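These percentages can be reproduced from the standard normal cumulative distribution. The sketch below assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; the function name prob_within is made up for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def prob_within(k):
    """P(-k < z < k) for a standard normal random variable z."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within +/- {k} standard deviations: {prob_within(k):.4f}")
```

Small differences in the last digit relative to the slide come from the slide using a four-decimal table.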
TFB 20
Normal Probability Distribution
Characteristics (basis for the empirical rule)
[Figure: normal curve marking the intervals μ ± 1σ, μ ± 2σ, and μ ± 3σ with the three percentages listed above.]
TFB 21
Standard Normal Probability Distribution
Characteristics
A random variable having a normal distribution
with a mean of 0 and a standard deviation of 1 is
said to have a standard normal probability
distribution.
TFB 22
Standard Normal Probability Distribution
Characteristics
The letter z is used to designate the standard
normal random variable.
[Figure: standard normal curve centered at z = 0 with σ = 1.]
TFB 23
Standard Normal Probability Distribution
Converting to the Standard Normal Distribution
z = (x - μ)/σ
We can think of z as a measure of the number of
standard deviations x is from μ.
TFB 24
Standard Normal Probability Distribution
Example: Pep Zone
Pep Zone sells auto parts and supplies including
a popular multi-grade motor oil. When the stock of
this oil drops to 20 gallons, a replenishment order is
placed.
The store manager is concerned that sales are
being lost due to stockouts while waiting for a
replenishment order.
TFB 25
Standard Normal Probability Distribution
Example: Pep Zone
It has been determined that demand during
replenishment lead-time is normally distributed
with a mean of 15 gallons and a standard deviation
of 6 gallons.
The manager would like to know the probability
of a stockout during replenishment lead-time. In
other words, what is the probability that demand
during lead-time will exceed 20 gallons?
P(x > 20) = ?
TFB 26
Standard Normal Probability Distribution
Solving for the Stockout Probability
Step 1: Convert x to the standard normal distribution.
z = (x - μ)/σ
= (20 - 15)/6
= .83
Step 2: Find the area under the standard normal
curve to the left of z = .83.
see next slide
TFB 27
Standard Normal Probability Distribution
Cumulative Probability Table for
the Standard Normal Distribution
 z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 .      .      .      .      .      .      .      .      .      .      .
 .5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
 .6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
 .7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
 .8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
 .9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
 .      .      .      .      .      .      .      .      .      .      .
P(z < .83) = .7967 (row .8, column .03)
TFB 28
Standard Normal Probability Distribution
Solving for the Stockout Probability
Step 3: Compute the area under the standard normal
curve to the right of z = .83.
P(z > .83) = 1 – P(z < .83)
= 1- .7967
= .2033
This is the probability of a stockout, P(x > 20).
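The three steps above can also be carried out without a table. The following is a sketch assuming Python 3.8 or later (statistics.NormalDist); the variable names are illustrative only.

```python
from statistics import NormalDist

# Demand during replenishment lead-time: mean 15 gallons, std. dev. 6 gallons.
demand = NormalDist(mu=15, sigma=6)

# Direct calculation, with no rounding of z.
p_stockout = 1 - demand.cdf(20)

# Table-style calculation: round z to two decimals first, as on the slide.
z = round((20 - 15) / 6, 2)
p_stockout_table = 1 - NormalDist().cdf(z)

print(f"z = {z}")
print(f"P(x > 20), exact z:   {p_stockout:.4f}")
print(f"P(x > 20), rounded z: {p_stockout_table:.4f}")
```

The two answers differ slightly in the third decimal because the table lookup rounds z to .83 before finding the area.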
TFB 29
Standard Normal Probability Distribution
Solving for the Stockout Probability
[Figure: standard normal curve with area .7967 to the left of z = .83 and area 1 - .7967 = .2033 to the right.]
TFB 30
Standard Normal Probability Distribution
If the manager of Pep Zone wants the probability
of a stockout during replenishment lead-time to be
no more than .05, what should the reorder point be?
(Hint: Given a probability, we can use the standard
normal table in an inverse fashion to find the
corresponding z value.)
TFB 31
Standard Normal Probability Distribution
Solving for the Reorder Point
[Figure: standard normal curve with area .9500 to the left of z.05 and area .0500 to the right.]
TFB 32
Standard Normal Probability Distribution
Solving for the Reorder Point
Step 1: Find the z-value that cuts off an area of .05
in the right tail of the standard normal
distribution.
 z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
 .      .      .      .      .      .      .      .      .      .      .
1.5   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
1.6   .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
1.7   .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
1.8   .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
1.9   .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
 .      .      .      .      .      .      .      .      .      .      .
We look up the complement of the tail area (1 - .05 = .95).
The value .9500 lies midway between .9495 (z = 1.64) and .9505 (z = 1.65), so z.05 = 1.645.
TFB 33
Standard Normal Probability Distribution
Solving for the Reorder Point
Step 2: Convert z.05 to the corresponding value of x.
x = μ + z.05 σ
= 15 + 1.645(6)
= 24.87 or 25
A reorder point of 25 gallons will place the probability
of a stockout during leadtime at (slightly less than) .05.
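The inverse lookup can be sketched in Python as well. This assumes Python 3.8 or later, where statistics.NormalDist provides inv_cdf; the variable names are illustrative.

```python
import math
from statistics import NormalDist

demand = NormalDist(mu=15, sigma=6)

# Find x such that P(demand <= x) = .95, i.e. P(stockout) = .05.
reorder_point = demand.inv_cdf(0.95)

# Round up to a whole gallon so the stockout probability stays at or below .05.
reorder_gallons = math.ceil(reorder_point)
p_stockout_at_reorder = 1 - demand.cdf(reorder_gallons)

print(f"reorder point: {reorder_point:.2f} gallons")
print(f"P(stockout) at {reorder_gallons} gallons: {p_stockout_at_reorder:.4f}")
```

Rounding up rather than to the nearest gallon is a design choice here: it guarantees the target probability is not exceeded.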
TFB 34
Normal Probability Distribution
Solving for the Reorder Point
[Figure: normal curve with mean 15. The area to the left of x = 24.87 is the probability of no stockout during replenishment lead-time (.95); the area to the right is the probability of a stockout (.05).]
TFB 35
Standard Normal Probability Distribution
Solving for the Reorder Point
By raising the reorder point from 20 gallons to
25 gallons on hand, the probability of a stockout
decreases from about .20 to .05.
This is a significant decrease in the chance that
Pep Zone will be out of stock and unable to meet a
customer’s desire to make a purchase.
TFB 36
Excel NORMDIST
TFB 37
Excel NORMINV
TFB 38
Excel NORMSDIST
TFB 39
Excel NORMSINV
TFB 40
Excel STANDARDIZE
TFB 41
Normal Approximation of Binomial Probabilities
When the number of trials, n, becomes large,
evaluating the binomial probability function by hand
or with a calculator is difficult.
The normal probability distribution provides an
easy-to-use approximation of binomial probabilities
where np > 5 and n(1 - p) > 5.
In the definition of the normal curve, set
μ = np and σ = √(np(1 - p))
TFB 42
Normal Approximation of Binomial Probabilities
Add and subtract a continuity correction factor
because a continuous distribution is being used to
approximate a discrete distribution.
For example, P(x = 12) for the discrete binomial
probability distribution is approximated by
P(11.5 < x < 12.5) for the continuous normal
distribution.
TFB 43
Normal Approximation of Binomial Probabilities
Example
Suppose that a company has a history of making
errors in 10% of its invoices. A sample of 100
invoices has been taken, and we want to compute
the probability that 12 invoices contain errors.
In this case, we want to find the binomial
probability of 12 successes in 100 trials. So, we set:
μ = np = 100(.1) = 10
σ = √(np(1 - p)) = [100(.1)(.9)]^(1/2) = 3
TFB 44
Normal Approximation of Binomial Probabilities
Normal Approximation to a Binomial Probability
Distribution with n = 100 and p = .1
[Figure: normal curve with μ = 10 and σ = 3; the area P(11.5 < x < 12.5) is the probability of 12 errors.]
TFB 45
Normal Approximation of Binomial Probabilities
Normal Approximation to a Binomial Probability
Distribution with n = 100 and p = .1
P(x < 12.5) = .7967
[Figure: area under the curve to the left of 12.5, with the mean at 10.]
TFB 46
Normal Approximation of Binomial Probabilities
Normal Approximation to a Binomial Probability
Distribution with n = 100 and p = .1
P(x < 11.5) = .6915
[Figure: area under the curve to the left of 11.5, with the mean at 10.]
TFB 47
Normal Approximation of Binomial Probabilities
The Normal Approximation to the Probability
of 12 Successes in 100 Trials is .1052
P(x = 12)
= .7967 - .6915
= .1052
[Figure: area under the curve between 11.5 and 12.5, with the mean at 10.]
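To see how good the approximation is for the invoice example, the exact binomial probability can be computed alongside it. This is a sketch assuming Python 3.8 or later (math.comb and statistics.NormalDist); the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, k = 100, 0.1, 12  # 100 invoices, 10% error rate, exactly 12 errors

# Exact binomial probability of k successes in n trials.
exact = comb(n, k) * p**k * (1 - p) ** (n - k)

# Normal approximation with the continuity correction (k - .5, k + .5).
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(k + 0.5) - approx_dist.cdf(k - 0.5)

print(f"exact binomial:       {exact:.4f}")
print(f"normal approximation: {approx:.4f}")
```

The approximation here does not round z, so it can differ from the table-based .1052 in the third decimal.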
TFB 48
Exponential Probability Distribution
The exponential probability distribution is useful in
describing the time it takes to complete a task.
The exponential random variables can be used to
describe:
•Time between vehicle arrivals at a toll booth
•Time required to complete a questionnaire
•Distance between major defects in a highway
In waiting line applications, the exponential
distribution is often used for service times.
TFB 49
Exponential Probability Distribution
A property of the exponential distribution is that the
mean and standard deviation are equal.
The exponential distribution is skewed to the right.
Its skewness measure is 2.
TFB 50
Exponential Probability Distribution
Density Function
$$ f(x) = \frac{1}{\mu} e^{-x/\mu} \quad \text{for } x > 0 $$
where:
μ = expected value or mean
e = 2.71828
TFB 51
Exponential Probability Distribution
Cumulative Probabilities
$$ P(x \le x_0) = 1 - e^{-x_0/\mu} $$
where:
x0 = some specific value of x
TFB 52
Exponential Probability Distribution
Example: Al’s Full-Service Pump
The time between arrivals of cars at Al’s full-service gas pump follows an exponential probability
distribution with a mean time between arrivals of 3
minutes. Al would like to know the probability that
the time between two successive arrivals will be 2
minutes or less.
TFB 53
Exponential Probability Distribution
Example: Al’s Full-Service Pump
P(x < 2) = 1 - 2.71828^(-2/3) = 1 - .5134 = .4866
[Figure: exponential density f(x) versus time between successive arrivals (mins.), from 0 to 10.]
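The cumulative probability formula translates directly into code. The following is a minimal sketch; the function name expon_cdf is made up for illustration.

```python
from math import exp


def expon_cdf(x0, mean):
    """P(x <= x0) for an exponential distribution with the given mean."""
    if x0 <= 0:
        return 0.0
    return 1 - exp(-x0 / mean)


# Mean time between arrivals at Al's pump is 3 minutes.
p_two_minutes_or_less = expon_cdf(2, mean=3)
print(f"P(x <= 2) = {p_two_minutes_or_less:.4f}")
```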
TFB 54
Excel EXPONDIST
TFB 55
Relationship between the Poisson
and Exponential Distributions
The Poisson distribution provides an appropriate description
of the number of occurrences per interval.
The exponential distribution provides an appropriate description
of the length of the interval between occurrences.
TFB 56
End
TFB 57
Sampling and Sampling Distributions
Selecting a Sample
Point Estimation
Introduction to Sampling Distributions
Sampling Distribution of x̄
Sampling Distribution of p̄
Properties of Point Estimators
Other Sampling Methods
TFB 1
Introduction
An element is the entity on which data are collected.
A population is a collection of all the elements of
interest.
A sample is a subset of the population.
The sampled population is the population from
which the sample is drawn.
A frame is a list of the elements that the sample will
be selected from.
TFB 2
Introduction
The reason we select a sample is to collect data to
answer a research question about a population.
The sample results provide only estimates of the
values of the population characteristics.
The reason is simply that the sample contains only
a portion of the population.
With proper sampling methods, the sample results
can provide “good” estimates of the population
characteristics.
TFB 3
Selecting a Sample
Sampling from a Finite Population
Sampling from an Infinite Population
TFB 4
Sampling from a Finite Population
Finite populations are often defined by lists such as:
• Organization membership roster
• Credit card account numbers
• Inventory product numbers
A simple random sample of size n from a finite
population of size N is a sample selected such that
each possible sample of size n has the same probability
of being selected.
TFB 5
Sampling from a Finite Population
Replacing each sampled element before selecting
subsequent elements is called sampling with
replacement.
Sampling without replacement is the procedure
used most often.
In large sampling projects, computer-generated
random numbers are often used to automate the
sample selection process.
TFB 6
Sampling from a Finite Population
Example: St. Andrew’s College
St. Andrew’s College received 900 applications for
admission in the upcoming year from prospective
students. The applicants were numbered, from 1 to
900, as their applications arrived. The Director of
Admissions would like to select a simple random
sample of 30 applicants.
TFB 7
Sampling from a Finite Population
Example: St. Andrew’s College
Step 1: Assign a random number to each of the 900
applicants.
The random numbers generated by Excel’s
RAND function follow a uniform probability
distribution between 0 and 1.
Step 2: Select the 30 applicants corresponding to the
30 smallest random numbers.
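The two-step procedure can be sketched in Python, with random.Random standing in for Excel's RAND function. The seed value is arbitrary and used only to make the run repeatable.

```python
import random

rng = random.Random(652)  # arbitrary seed, chosen only for repeatability

applicants = range(1, 901)  # applicants numbered 1 to 900

# Step 1: assign a uniform(0, 1) random number to each applicant.
keyed = [(rng.random(), applicant) for applicant in applicants]

# Step 2: keep the applicants with the 30 smallest random numbers.
keyed.sort()
sample = [applicant for _, applicant in keyed[:30]]

print(sorted(sample))
```

In practice, random.sample(applicants, 30) draws a simple random sample without replacement in one call; the longer form is shown because it mirrors the steps on the slide.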
TFB 8
Sampling from an Infinite Population
Sometimes we want to select a sample, but find it is
not possible to obtain a list of all elements in the
population.
As a result, we cannot construct a frame for the
population.
Hence, we cannot use the random number selection
procedure.
Most often this situation occurs in infinite population
cases.
TFB 9
Sampling from an Infinite Population
Populations are often generated by an ongoing process
where there is no upper limit on the number of units
that can be generated.
Some examples of on-going processes, with infinite
populations, are:
• parts being manufactured on a production line
• transactions occurring at a bank
• telephone calls arriving at a technical help desk
• customers entering a store
TFB 10
Sampling from an Infinite Population
In the case of an infinite population, we must select
a random sample in order to make valid statistical
inferences about the population from which the
sample is taken.
A random sample from an infinite population is a
sample selected such that the following conditions
are satisfied.
• Each element selected comes from the population
of interest.
• Each element is selected independently.
TFB 11
Point Estimation
Point estimation is a form of statistical inference.
In point estimation we use the data from the sample
to compute a value of a sample statistic that serves
as an estimate of a population parameter.
We refer to x̄ as the point estimator of the population
mean μ.
s is the point estimator of the population standard
deviation σ.
p̄ is the point estimator of the population proportion p.
TFB 12
Point Estimation
Example: St. Andrew’s College
Recall that St. Andrew’s College received 900
applications from prospective students. The
application form contains a variety of information
including the individual’s Scholastic Aptitude Test
(SAT) score and whether or not the individual desires
on-campus housing.
At a meeting in a few hours, the Director of
Admissions would like to announce the average SAT
score and the proportion of applicants that want to
live on campus, for the population of 900 applicants.
TFB 13
Point Estimation
Example: St. Andrew’s College
However, the necessary data on the applicants have
not yet been entered in the college’s computerized
database. So, the Director decides to estimate the
values of the population parameters of interest based
on sample statistics. The sample of 30 applicants is
selected using computer-generated random numbers.
TFB 14
Point Estimation
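x̄ as Point Estimator of μ
$$ \bar{x} = \frac{\sum x_i}{30} = \frac{50{,}520}{30} = 1684 $$
s as Point Estimator of σ
$$ s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{29}} = \sqrt{\frac{210{,}512}{29}} = 85.2 $$
p̄ as Point Estimator of p
$$ \bar{p} = \frac{20}{30} = .67 $$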
Note: Different random numbers would have
identified a different sample which would have
resulted in different point estimates.
TFB 15
Point Estimation
Once all the data for the 900 applicants were entered
in the college’s database, the values of the population
parameters of interest were calculated.
Population Mean SAT Score
$$ \mu = \frac{\sum x_i}{900} = 1697 $$
Population Standard Deviation for SAT Score
$$ \sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{900}} = 87.4 $$
Population Proportion Wanting On-Campus Housing
$$ p = \frac{648}{900} = .72 $$
TFB 16
Summary of Point Estimates
Obtained from a Simple Random Sample
Population Parameter                               Parameter Value   Point Estimator                                Point Estimate
μ = Population mean SAT score                      1697              x̄ = Sample mean SAT score                      1684
σ = Population std. deviation for SAT score        87.4              s = Sample standard deviation for SAT score    85.2
p = Population proportion wanting campus housing   .72               p̄ = Sample proportion wanting campus housing   .67
TFB 17
Practical Advice
The target population is the population we want to
make inferences about.
The sampled population is the population from
which the sample is actually taken.
Whenever a sample is used to make inferences
about a population, we should make sure that the
targeted population and the sampled population
are in close agreement.
TFB 18
Sampling Distribution of x̄
Process of Statistical Inference
Population with mean μ = ?
A simple random sample of n elements is selected
from the population.
The sample data provide a value for
the sample mean x̄.
The value of x̄ is used to make inferences about
the value of μ.
TFB 19
Sampling Distribution of x̄
The sampling distribution of x̄ is the probability
distribution of all possible values of the sample
mean x̄.
• Expected Value of x̄
E(x̄) = μ
where: μ = the population mean
When the expected value of the point estimator
equals the population parameter, we say the point
estimator is unbiased.
TFB 20
Sampling Distribution of x̄
• Standard Deviation of x̄
We will use the following notation to define the
standard deviation of the sampling distribution of x̄.
σ_x̄ = the standard deviation of x̄
σ = the standard deviation of the population
n = the sample size
N = the population size
TFB 21
Sampling Distribution of x̄
• Standard Deviation of x̄
Finite Population
$$ \sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \left( \frac{\sigma}{\sqrt{n}} \right) $$
Infinite Population
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} $$
• A finite population is treated as being
infinite if n/N < .05.
• $\sqrt{(N-n)/(N-1)}$ is the finite population
correction factor.
• σ_x̄ is referred to as the standard error of the
mean.
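The choice between the two formulas can be written as a small function. This is a sketch, not a library routine: the name standard_error_mean and the optional N argument are assumptions made for illustration, and the example values (σ = 87.4, N = 900) are the St. Andrew's College figures used elsewhere in these slides.

```python
from math import sqrt


def standard_error_mean(sigma, n, N=None):
    """Standard error of the sample mean.

    If a population size N is given and n/N is .05 or more, the finite
    population correction factor sqrt((N - n)/(N - 1)) is applied.
    Otherwise the population is treated as infinite.
    """
    se = sigma / sqrt(n)
    if N is not None and n / N >= 0.05:
        se *= sqrt((N - n) / (N - 1))
    return se


se_30 = standard_error_mean(87.4, 30, N=900)    # n/N = .033, no correction
se_100 = standard_error_mean(87.4, 100, N=900)  # n/N = .111, correction applied

print(f"n = 30:  {se_30:.2f}")
print(f"n = 100: {se_100:.2f}")
```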
TFB 22
Sampling Distribution of x̄
When the population has a normal distribution, the
sampling distribution of x̄ is normally distributed
for any sample size.
In most applications, the sampling distribution of x̄
can be approximated by a normal distribution
whenever the sample size is 30 or more.
In cases where the population is highly skewed or
outliers are present, samples of size 50 may be
needed.
TFB 23
Sampling Distribution of x̄
The sampling distribution of x̄ can be used to
provide probability information about how close
the sample mean x̄ is to the population mean μ.
TFB 24
The Distribution of the Sample Mean
The Case of a Normal Population Distribution
[Figure: a normal population distribution and the sampling distributions of X̄.]
TFB 25
Simulation Example: Platelet Sizes…
TFB 26
Simulation Example: Platelet Sizes…
TFB 27
Simulation Example: Platelet Sizes…
TFB 28
Central Limit Theorem
When the population from which we are selecting
a random sample does not have a normal distribution,
the central limit theorem is helpful in identifying the
shape of the sampling distribution of x̄.
CENTRAL LIMIT THEOREM
In selecting random samples of size n from a
population, the sampling distribution of the sample
mean can be approximated by a normal
distribution as the sample size becomes large.
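A short simulation can illustrate the theorem. The sketch below draws samples from an exponential population, which is strongly right-skewed; the population mean of 3, the sample size of 30, the 5,000 repetitions, and the seed are all arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)  # arbitrary seed for repeatability

POP_MEAN = 3.0     # exponential population: mean and std. dev. both equal 3
SAMPLE_SIZE = 30
REPETITIONS = 5000


def one_sample_mean(n):
    """Draw n values from the exponential population and return their mean."""
    return mean([rng.expovariate(1 / POP_MEAN) for _ in range(n)])


sample_means = [one_sample_mean(SAMPLE_SIZE) for _ in range(REPETITIONS)]

# Theory: E(x-bar) = 3 and the standard error is 3 / sqrt(30).
print(f"mean of sample means:    {mean(sample_means):.3f}")
print(f"std. dev. of the means:  {stdev(sample_means):.3f}")
print(f"theoretical std. error:  {POP_MEAN / sqrt(SAMPLE_SIZE):.3f}")
```

A histogram of sample_means would look roughly bell-shaped even though the population itself is skewed.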
TFB 29
The Distribution of the Sample Mean
The Central Limit Theorem
The Central Limit
Theorem illustrated.
TFB 30
Simulation Example: Electronic Control Lifetimes…
TFB 31
Simulation Example: Electronic Control
Lifetimes…(cont’d)
TFB 32
Simulation Example: Electronic Control
Lifetimes…(cont’d)
TFB 33
Sampling Distribution of x̄
Example: St. Andrew’s College
[Figure: sampling distribution of x̄ for SAT scores.]
E(x̄) = 1697
$$ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{87.4}{\sqrt{30}} = 15.96 $$
TFB 34
Sampling Distribution of x̄
Example: St. Andrew’s College
What is the probability that a simple random
sample of 30 applicants will provide an estimate of
the population mean SAT score that is within +/- 10
of the actual population mean μ?
In other words, what is the probability that x̄ will
be between 1687 and 1707?
TFB 35
Sampling Distribution of x̄
Example: St. Andrew’s College
Step 1: Calculate the z-value at the upper endpoint of
the interval.
z = (1707 - 1697)/15.96= .63
Step 2: Find the area under the curve to the left of the
upper endpoint.
P(z < .63) = .7357
TFB 36
Sampling Distribution of x̄
Example: St. Andrew’s College
Cumulative Probabilities for
the Standard Normal Distribution
TFB 37
Sampling Distribution of x̄
Example: St. Andrew’s College
[Figure: sampling distribution of x̄ for SAT scores with σ_x̄ = 15.96; the area to the left of 1707 is .7357.]
TFB 38
Sampling Distribution of x̄
Example: St. Andrew’s College
Step 3: Calculate the z-value at the lower endpoint of
the interval.
z = (1687 - 1697)/15.96= - .63
Step 4: Find the area under the curve to the left of the
lower endpoint.
P(z < -.63) = .2643
TFB 39
Sampling Distribution of x̄ for SAT Scores
Example: St. Andrew’s College
[Figure: sampling distribution of x̄ for SAT scores with σ_x̄ = 15.96; the area to the left of 1687 is .2643.]
TFB 40
Sampling Distribution of x̄ for SAT Scores
Example: St. Andrew’s College
Step 5: Calculate the area under the curve between
the lower and upper endpoints of the interval.
P(-.63 < z < .63) = P(z < .63) - P(z < -.63)
= .7357 - .2643
= .4714
The probability that the sample mean SAT score will
be between 1687 and 1707 is:
P(1687 < x̄ < 1707) = .4714
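The five steps can be condensed into a direct calculation on the sampling distribution. This sketch assumes Python 3.8 or later (statistics.NormalDist); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 1697, 87.4, 30

# Sampling distribution of the sample mean: normal with mean mu and
# standard error sigma / sqrt(n).
standard_error = sigma / sqrt(n)
xbar_dist = NormalDist(mu=mu, sigma=standard_error)

p_within_10 = xbar_dist.cdf(mu + 10) - xbar_dist.cdf(mu - 10)

print(f"standard error: {standard_error:.2f}")
print(f"P(1687 < x-bar < 1707): {p_within_10:.4f}")
```

Because z is not rounded to two decimals here, the result is slightly below the table-based .4714.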
TFB 41
Sampling Distribution of x̄ for SAT Scores
Example: St. Andrew’s College
[Figure: sampling distribution of x̄ for SAT scores with σ_x̄ = 15.96; the area between 1687 and 1707 is .4714.]
TFB 42
Relationship Between the Sample Size
and the Sampling Distribution of x̄
Example: St. Andrew’s College
• Suppose we select a simple random sample of 100
applicants instead of the 30 originally considered.
• E(x̄) = μ regardless of the sample size. In our
example, E(x̄) remains at 1697.
• Whenever the sample size is increased, the standard
error of the mean σ_x̄ is decreased. With the increase
in the sample size to n = 100, the standard error of
the mean is decreased from 15.96 to:
$$ \sigma_{\bar{x}} = \sqrt{\frac{N-n}{N-1}} \left( \frac{\sigma}{\sqrt{n}} \right) = \sqrt{\frac{900-100}{900-1}} \left( \frac{87.4}{\sqrt{100}} \right) = .94333(8.74) = 8.2 $$
The finite population correction factor is applied here because n/N = 100/900 = .11 is not less than .05.
TFB 43
Relationship Between the Sample Size
and the Sampling Distribution of x̄
Example: St. Andrew’s College
[Figure: two sampling distributions of x̄ centered at E(x̄) = 1697, one with σ_x̄ = 8.2 for n = 100 and one with σ_x̄ = 15.96 for n = 30.]
TFB 44
Relationship Between the Sample Size
and the Sampling Distribution of x̄
Example: St. Andrew’s College
• Recall that when n = 30, P(1687 < x̄ < 1707) = .4714.
• We follow the same steps to solve for P(1687 < x̄
< 1707) when n = 100 as we showed earlier when
n = 30.
• Now, with n = 100, P(1687 < x̄ < 1707) = .7776.
• Because the sampling distribution with n = 100 has a
smaller standard error, the values of x̄ have less
variability and tend to be closer to the population
mean than the values of x̄ with n = 30.
TFB 45
Relationship Between the Sample Size
and the Sampling Distribution of x̄
Example: St. Andrew’s College
[Figure: sampling distribution of x̄ for SAT scores with σ_x̄ = 8.2; the area between 1687 and 1707 is .7776.]
TFB 46
Sampling Distribution of p̄
Making Inferences about a Population Proportion
Population with proportion p = ?
A simple random sample of n elements is selected
from the population.
The sample data provide a value for the
sample proportion p̄.
The value of p̄ is used to make inferences
about the value of p.
TFB 47
Sampling Distribution of p̄
The sampling distribution of p̄ is the probability
distribution of all possible values of the sample
proportion p̄.
• Expected Value of p̄
E(p̄) = p
where:
p = the population proportion
TFB 48
Sampling Distribution of p̄
• Standard Deviation of p̄
Finite Population
$$ \sigma_{\bar{p}} = \sqrt{\frac{N-n}{N-1}} \sqrt{\frac{p(1-p)}{n}} $$
Infinite Population
$$ \sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} $$
• σ_p̄ is referred to as the standard error of
the proportion.
• $\sqrt{(N-n)/(N-1)}$ is the finite population
correction factor.
TFB 49
Form of the Sampling Distribution of p̄
The sampling distribution of p̄ can be approximated
by a normal distribution whenever the sample size
is large enough to satisfy the two conditions:
np > 5
and
n(1 – p) > 5
. . . because when these conditions are satisfied, the
probability distribution of x in the sample proportion,
p̄ = x/n, can be approximated by a normal distribution
(and because n is a constant).
TFB 50
Sampling Distribution of p̄
Example: St. Andrew’s College
Recall that 72% of the prospective students applying
to St. Andrew’s College desire on-campus housing.
What is the probability that a simple random sample
of 30 applicants will provide an estimate of the
population proportion of applicants desiring on-campus
housing that is within plus or minus .05 of the actual
population proportion?
TFB 51
Sampling Distribution of p̄
Example: St. Andrew’s College
For our example, with n = 30 and p = .72, the
normal distribution is an acceptable approximation
because:
np = 30(.72) = 21.6 > 5
and
n(1 - p) = 30(.28) = 8.4 > 5
TFB 52
Sampling Distribution of p̄
Example: St. Andrew’s College
[Figure: sampling distribution of p̄.]
E(p̄) = .72
$$ \sigma_{\bar{p}} = \sqrt{\frac{.72(1 - .72)}{30}} = .082 $$
TFB 53
Sampling Distribution of p̄
Example: St. Andrew’s College
Step 1: Calculate the z-value at the upper endpoint
of the interval.
z = (.77 - .72)/.082 = .61
Step 2: Find the area under the curve to the left of
the upper endpoint.
P(z < .61) = .7291
TFB 54
Sampling Distribution of p̄
Example: St. Andrew’s College
Cumulative Probabilities for
the Standard Normal Distribution
TFB 55
Sampling Distribution of p̄
Example: St. Andrew’s College
[Figure: sampling distribution of p̄ with σ_p̄ = .082; the area to the left of .77 is .7291.]
TFB 56
Sampling Distribution of p̄
Example: St. Andrew’s College
Step 3: Calculate the z-value at the lower endpoint of
the interval.
z = (.67 - .72)/.082 = - .61
Step 4: Find the area under the curve to the left of the
lower endpoint.
P(z < -.61) = .2709
TFB 57
Sampling Distribution of p̄
Example: St. Andrew’s College
[Figure: sampling distribution of p̄ with σ_p̄ = .082; the area to the left of .67 is .2709.]
TFB 58
Sampling Distribution of p̄
Example: St. Andrew’s College
Step 5: Calculate the area under the curve between
the lower and upper endpoints of the interval.
P(-.61 < z < .61) = P(z < .61) - P(z < -.61)
= .7291 - .2709
= .4582
The probability that the sample proportion of applicants
wanting on-campus housing will be within +/-.05 of the
actual population proportion is:
P(.67 < p̄ < .77) = .4582
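The same calculation for the sample proportion can be sketched in Python, including the check that the normal approximation is reasonable. This assumes Python 3.8 or later (statistics.NormalDist); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.72, 30

# Conditions for using the normal approximation.
approximation_ok = n * p > 5 and n * (1 - p) > 5

# Standard error of the proportion (infinite-population form, as on the slide).
standard_error = sqrt(p * (1 - p) / n)
pbar_dist = NormalDist(mu=p, sigma=standard_error)

p_within_05 = pbar_dist.cdf(p + 0.05) - pbar_dist.cdf(p - 0.05)

print(f"normal approximation acceptable: {approximation_ok}")
print(f"standard error: {standard_error:.3f}")
print(f"P(.67 < p-bar < .77): {p_within_05:.4f}")
```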
TFB 59
Sampling Distribution of p̄
Example: St. Andrew’s College
[Figure: sampling distribution of p̄ with σ_p̄ = .082; the area between .67 and .77 is .4582.]
TFB 60
Properties of Point Estimators
Before using a sample statistic as a point estimator,
statisticians check to see whether the sample statistic
has the following properties associated with good
point estimators.
Unbiased
Efficiency
Consistency
TFB 61
Properties of Point Estimators
Unbiased
If the expected value of the sample statistic is
equal to the population parameter being estimated,
the sample statistic is said to be an unbiased
estimator of the population parameter.
TFB 62
Properties of Point Estimators
Efficiency
Given the choice of two unbiased estimators of
the same population parameter, we would prefer to
use the point estimator with the smaller standard
deviation, since it tends to provide estimates closer to
the population parameter.
The point estimator with the smaller standard
deviation is said to have greater relative efficiency
than the other.
TFB 63
Properties of Point Estimators
Consistency
A point estimator is consistent if the values of
the point estimator tend to become closer to the
population parameter as the sample size becomes
larger.
TFB 64
Other Sampling Methods
Stratified Random Sampling
Cluster Sampling
Systematic Sampling
Convenience Sampling
Judgment Sampling
TFB 65
Stratified Random Sampling
The population is first divided into groups of
elements called strata.
Each element in the population belongs to one and
only one stratum.
Best results are obtained when the elements within
each stratum are as much alike as possible (i.e. a
homogeneous group).
TFB 66
Stratified Random Sampling
A simple random sample is taken from each stratum.
Formulas are available for combining the stratum
sample results into one population parameter
estimate.
Advantage: If strata are homogeneous, this method
is as “precise” as simple random sampling but with
a smaller total sample size.
Example: The basis for forming the strata might be
department, location, age, industry type, and so on.
TFB 67
Cluster Sampling
The population is first divided into separate groups
of elements called clusters.
Ideally, each cluster is a representative small-scale
version of the population (i.e. heterogeneous group).
A simple random sample of the clusters is then taken.
All elements within each sampled (chosen) cluster
form the sample.
TFB 68
Cluster Sampling
Example: A primary application is area sampling,
where clusters are city blocks or other well-defined
areas.
Advantage: The close proximity of elements can be
cost effective (i.e. many sample observations can be
obtained in a short time).
Disadvantage: This method generally requires a
larger total sample size than simple or stratified
random sampling.
TFB 69
Systematic Sampling
If a sample size of n is desired from a population
containing N elements, we might sample one
element for every N/n elements in the population.
We randomly select one of the first N/n elements
from the population list.
We then select every (N/n)th element that follows in
the population list.
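The procedure can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `systematic_sample` is hypothetical, and it assumes the sampling interval is k = N // n (population size divided by sample size, rounded down).

```python
import random

def systematic_sample(population, n, rng=None):
    """Pick every k-th element (k = N // n) after a random start among the first k."""
    rng = rng or random.Random()
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    k = N // n                    # sampling interval
    start = rng.randrange(k)      # random position among the first k elements
    return [population[start + i * k] for i in range(n)]

# Hypothetical population list of 1,000 numbered elements, sample of 10
listing = list(range(1000))
sample = systematic_sample(listing, 10, random.Random(0))
print(sample)
```

Only the starting position is random; every later pick is determined by it, which is why the ordering of the population list matters.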
TFB 70
Systematic Sampling
This method has the properties of a simple random
sample, especially if the list of the population
elements is a random ordering.
Advantage: The sample usually will be easier to
identify than it would be if simple random sampling
were used.
Example: Selecting every 100th listing in a telephone
book after the first randomly selected listing.
TFB 71
Convenience Sampling
It is a nonprobability sampling technique. Items are
included in the sample without known probabilities
of being selected.
The sample is identified primarily by convenience.
Example: A professor conducting research might
use student volunteers to constitute a sample.
TFB 72
Convenience Sampling
Advantage: Sample selection and data collection are
relatively easy.
Disadvantage: It is impossible to determine how
representative of the population the sample is.
TFB 73
Judgment Sampling
The person most knowledgeable on the subject of
the study selects elements of the population that he
or she feels are most representative of the population.
It is a nonprobability sampling technique.
Example: A reporter might sample three or four
senators, judging them as reflecting the general
opinion of the senate.
TFB 74
Judgment Sampling
Advantage: It is a relatively easy way of selecting a
sample.
Disadvantage: The quality of the sample results
depends on the judgment of the person selecting the
sample.
TFB 75
Recommendation
It is recommended that probability sampling methods
(simple random, stratified, cluster, or systematic) be
used.
For these methods, formulas are available for
evaluating the “goodness” of the sample results in
terms of the closeness of the results to the population
parameters being estimated.
An evaluation of the goodness cannot be made with
non-probability (convenience or judgment) sampling
methods.
TFB 76
End
TFB 77
Statistical Inference – Part I
Interval Estimation
Population Mean: $\sigma$ Known
Population Mean: $\sigma$ Unknown
Determining the Sample Size
Population Proportion
TFB 1
Margin of Error and the Interval Estimate
A point estimator cannot be expected to provide the
exact value of the population parameter.
An interval estimate can be computed by adding and
subtracting a margin of error to the point estimate.
Point Estimate +/- Margin of Error
The purpose of an interval estimate is to provide
information about how close the point estimate is to
the value of the parameter.
TFB 2
Margin of Error and the Interval Estimate
The general form of an interval estimate of a
population mean is
$\bar{x} \pm$ Margin of Error
TFB 3
Interval Estimate of a Population Mean:
$\sigma$ Known
In order to develop an interval estimate of a
population mean, the margin of error must be
computed using either:
• the population standard deviation $\sigma$, or
• the sample standard deviation s
$\sigma$ is rarely known exactly, but often a good estimate
can be obtained based on historical data or other
information.
We refer to such cases as the $\sigma$ known case.
TFB 4
Interval Estimate of a Population Mean:
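$\sigma$ Known
There is a $1 - \alpha$ probability that the value of a
sample mean will provide a margin of error of $z_{\alpha/2}\,\sigma_{\bar{x}}$
or less.
[Figure: sampling distribution of $\bar{x}$; the central $1 - \alpha$ of all $\bar{x}$ values lie within $z_{\alpha/2}\,\sigma_{\bar{x}}$ of $\mu$, with area $\alpha/2$ in each tail.]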
TFB 5
Interval Estimate of a Population Mean:
$\sigma$ Known
[Figure: sampling distribution of $\bar{x}$ with area $\alpha/2$ in each tail and $1 - \alpha$ of all $\bar{x}$ values in between. Intervals $\bar{x} \pm z_{\alpha/2}\,\sigma_{\bar{x}}$ built around sample means in the central region include $\mu$; an interval built around a sample mean in either tail does not include $\mu$.]
TFB 6
Interval Estimate of a Population Mean:
$\sigma$ Known
Interval Estimate of $\mu$
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
where:
$\bar{x}$ is the sample mean
$1 - \alpha$ is the confidence coefficient
$z_{\alpha/2}$ is the z value providing an area of
$\alpha/2$ in the upper tail of the standard
normal probability distribution
$\sigma$ is the population standard deviation
n is the sample size
TFB 7
Interval Estimate of a Population Mean:
$\sigma$ Known
Values of $z_{\alpha/2}$ for the Most Commonly Used
Confidence Levels
Confidence                              Table
Level        $\alpha$    $\alpha/2$     Look-up Area    $z_{\alpha/2}$
90%          .10         .05            .9500           1.645
95%          .05         .025           .9750           1.960
99%          .01         .005           .9950           2.576
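The tabled values can be reproduced without a printed table. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper name `z_upper_tail` is hypothetical.

```python
from statistics import NormalDist

def z_upper_tail(confidence):
    """z value leaving an area of alpha/2 in the upper tail of the standard normal."""
    alpha = 1 - confidence
    # The table look-up area is the cumulative probability 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  z = {z_upper_tail(level):.3f}")
```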
TFB 8
Meaning of Confidence
Because 90% of all the intervals constructed using
$\bar{x} \pm 1.645\,\sigma_{\bar{x}}$ will contain the population mean,
we say we are 90% confident that the interval
$\bar{x} \pm 1.645\,\sigma_{\bar{x}}$ includes the population mean $\mu$.
We say that this interval has been established at the
90% confidence level.
The value .90 is referred to as the confidence
coefficient.
TFB 9
Interval Estimate of a Population Mean:
$\sigma$ Known
Example: Discount Sounds
Discount Sounds has 260 retail outlets throughout
the United States. The firm is evaluating a potential
location for a new outlet, based in part, on the mean
annual income of the individuals in the marketing
area of the new location.
A sample of size n = 36 was taken; the sample
mean income is $41,100. The population is not
believed to be highly skewed. The population
standard deviation is estimated to be $4,500, and the
confidence coefficient to be used in the interval
estimate is .95.
TFB 10
Interval Estimate of a Population Mean:
$\sigma$ Known
Example: Discount Sounds
95% of the sample means that can be observed
are within $\pm 1.96\,\sigma_{\bar{x}}$ of the population mean $\mu$.
The margin of error is:
$z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 1.96 \left( \frac{4{,}500}{\sqrt{36}} \right) = 1{,}470$
Thus, at 95% confidence,
the margin of error is $1,470.
TFB 11
Interval Estimate of a Population Mean:
$\sigma$ Known
Example: Discount Sounds
Interval estimate of $\mu$ is:
$41,100 ± $1,470
or
$39,630 to $42,570
We are 95% confident that the interval contains the
population mean.
TFB 12
Interval Estimate of a Population Mean:
$\sigma$ Known
Example: Discount Sounds
Confidence   Margin
Level        of Error   Interval Estimate
90%          1233.75    39,866.25 to 42,333.75
95%          1470.00    39,630.00 to 42,570.00
99%          1932.00    39,168.00 to 43,032.00
In order to have a higher degree of confidence,
the margin of error and thus the width of the
confidence interval must be larger.
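The three rows of the Discount Sounds table can be recomputed as a check. This sketch is illustrative only; it hard-codes the rounded table z values (1.645, 1.960, 2.576) so the output lines up with the slide, and the function name `z_interval` is hypothetical.

```python
from math import sqrt

# Rounded z values for common confidence levels, as tabled earlier in the deck
Z_TABLE = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def z_interval(x_bar, sigma, n, confidence):
    """Return (margin of error, lower limit, upper limit) for the sigma-known case."""
    margin = Z_TABLE[confidence] * sigma / sqrt(n)
    return margin, x_bar - margin, x_bar + margin

# Discount Sounds: n = 36, sample mean $41,100, sigma = $4,500
results = {level: z_interval(41_100, 4_500, 36, level) for level in Z_TABLE}
for level, (margin, lower, upper) in results.items():
    print(f"{level:.0%}: margin {margin:,.2f}, interval {lower:,.2f} to {upper:,.2f}")
```

Raising the confidence level changes only the z multiplier, which is why the margin of error and the interval width grow together.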
TFB 13
Interval Estimate of a Population Mean:
$\sigma$ Known
Adequate Sample Size
In most applications, a sample size of n = 30 is
adequate.
If the population distribution is highly skewed or
contains outliers, a sample size of 50 or more is
recommended.
TFB 14
Interval Estimate of a Population Mean:
$\sigma$ Known
Adequate Sample Size (continued)
If the population is not normally distributed but is
roughly symmetric, a sample size as small as 15
will suffice.
If the population is believed to be at least
approximately normal, a sample size of less than 15
can be used.
TFB 15
Interval Estimate of a Population Mean:
$\sigma$ Unknown
If an estimate of the population standard deviation $\sigma$
cannot be developed prior to sampling, we use the
sample standard deviation s to estimate $\sigma$.
This is the $\sigma$ unknown case.
In this case, the interval estimate for $\mu$ is based on the
t distribution.
(We’ll assume for now that the population is
normally distributed.)
TFB 16
t Distribution
William Gosset, writing under the name “Student”,
is the founder of the t distribution.
Gosset was an Oxford graduate in mathematics
and worked for the Guinness Brewery in Dublin.
He developed the t distribution while working on
small-scale materials and temperature experiments.
TFB 17
t Distribution
The t distribution is a family of similar probability
distributions.
A specific t distribution depends on a parameter
known as the degrees of freedom.
Degrees of freedom refer to the number of
independent pieces of information that go into the
computation of s.
TFB 18
t Distribution
A t distribution with more degrees of freedom has
less dispersion.
As the degrees of freedom increases, the difference
between the t distribution and the standard
normal probability distribution becomes smaller
and smaller.
TFB 19
t Distribution
[Figure: the standard normal distribution compared with t distributions with 20 and 10 degrees of freedom, all centered at 0 on a common z, t axis.]
TFB 20
t Distribution
For more than 100 degrees of freedom, the standard
normal z value provides a good approximation to
the t value.
The standard normal z values can be found in the
infinite degrees of freedom ($\infty$) row of the t distribution table.
TFB 21
t Distribution
Standard normal
z values
TFB 22
Interval Estimate of a Population Mean:
$\sigma$ Unknown
Interval Estimate
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$
where: $1 - \alpha$ = the confidence coefficient
$t_{\alpha/2}$ = the t value providing an area of $\alpha/2$
in the upper tail of a t distribution
with n - 1 degrees of freedom
s = the sample standard deviation
TFB 23
Interval Estimate of a Population Mean:
$\sigma$ Unknown
Example: Apartment Rents
A reporter for a student newspaper is writing an
article on the cost of off-campus housing. A sample of
16 one-bedroom apartments within a half-mile of
campus resulted in a sample mean of $750 per month
and a sample standard deviation of $55.
Let us provide a 95% confidence interval estimate
of the mean rent per month for the population of
one-bedroom apartments within a half-mile of
campus. We will assume this population to be
normally distributed.
TFB 24
Interval Estimate of a Population Mean:
$\sigma$ Unknown
At 95% confidence, $\alpha$ = .05, and $\alpha/2$ = .025.
$t_{.025}$ is based on n - 1 = 16 - 1 = 15 degrees of freedom.
In the t distribution table we see that $t_{.025}$ = 2.131.
TFB 25
Interval Estimate of a Population Mean:
$\sigma$ Unknown
Interval Estimate
$\bar{x} \pm t_{.025} \frac{s}{\sqrt{n}}$
$750 \pm 2.131 \frac{55}{\sqrt{16}} = 750 \pm 29.30$
The margin of error is 29.30.
We are 95% confident that the mean rent per month
for the population of one-bedroom apartments within
a half-mile of campus is between $720.70 and $779.30.
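The arithmetic for the apartment rents interval can be checked with a short script. This is a sketch under one stated assumption: the t value 2.131 is taken from the t table (15 degrees of freedom, upper-tail area .025) rather than computed, because Python's standard library has no t distribution. The function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_value):
    """Interval estimate x_bar +/- t * s / sqrt(n) for the sigma-unknown case."""
    margin = t_value * s / sqrt(n)
    return margin, x_bar - margin, x_bar + margin

# Apartment rents: n = 16, sample mean $750, sample standard deviation $55
# t_value = 2.131 is the table value for 15 degrees of freedom (assumed input)
margin, lower, upper = t_interval(750, 55, 16, 2.131)
print(f"margin {margin:.2f}, interval {lower:.2f} to {upper:.2f}")
```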
TFB 26
Excel TDIST
TFB 27
Excel TINV
TFB 28
Interval Estimate of a Population Mean:
$\sigma$ Unknown
Adequate Sample Size
In most applications, a sample size of n = 30 is
adequate when using the expression
$\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$ to
develop an interval estimate of a population mean.
If the population distribution is highly skewed or
contains outliers, a sample size of 50 or more is
recommended.
TFB 29
Interval Estimate of a Population Mean:
$\sigma$ Unknown
Adequate Sample Size (continued)
If the population is not normally distributed but is
roughly symmetric, a sample size as small as 15
will suffice.
If the population is believed to be at least
approximately normal, a sample size of less than 15
can be used.
TFB 30
Summary of Interval Estimation Procedures
for a Population Mean
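Can the population standard deviation $\sigma$ be assumed known?
Yes: $\sigma$ Known Case. Use $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
No: Use the sample standard deviation s to estimate $\sigma$ ($\sigma$ Unknown Case). Use $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$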
TFB 31
Sample Size for an Interval Estimate
of a Population Mean
Let E = the desired margin of error.
E is the amount added to and subtracted from the
point estimate to obtain an interval estimate.
If a desired margin of error is selected prior to
sampling, the sample size necessary to satisfy the
margin of error can be determined.
TFB 32
Sample Size for an Interval Estimate
of a Population Mean
Margin of Error
$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
Necessary Sample Size
$n = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2}$
TFB 33
Sample Size for an Interval Estimate
of a Population Mean
The Necessary Sample Size equation requires a
value for the population standard deviation $\sigma$.
If $\sigma$ is unknown, a preliminary or planning value
for $\sigma$ can be used in the equation.
1. Use the estimate of the population standard
deviation computed in a previous study.
2. Use a pilot study to select a preliminary sample and
use the sample standard deviation from that sample.
3. Use judgment or a “best guess” for the value of $\sigma$.
TFB 34
Sample Size for an Interval Estimate
of a Population Mean
Example: Discount Sounds
Recall that Discount Sounds is evaluating a
potential location for a new retail outlet, based in
part, on the mean annual income of the individuals in
the marketing area of the new location.
Suppose that Discount Sounds’ management team
wants an estimate of the population mean such that
there is a .95 probability that the sampling error is
$500 or less.
How large a sample size is needed to meet the
required precision?
TFB 35
Sample Size for an Interval Estimate
of a Population Mean
$z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 500$
At 95% confidence, $z_{.025}$ = 1.96. Recall that $\sigma$ = 4,500.
$n = \frac{(1.96)^2 (4{,}500)^2}{(500)^2} = 311.17$, which rounds up to 312
A sample of size 312 is needed to reach a desired
precision of ± $500 at 95% confidence.
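The sample size calculation is easy to script. The sketch below is illustrative; the function name `sample_size_for_mean` is hypothetical, and it rounds up because any fractional result means one more observation is needed to stay within the margin of error.

```python
from math import ceil

def sample_size_for_mean(z, sigma, margin):
    """Smallest n for which z * sigma / sqrt(n) does not exceed the desired margin."""
    return ceil((z * sigma / margin) ** 2)

# Discount Sounds: z = 1.96 (95% confidence), planning value sigma = 4,500, E = 500
n_needed = sample_size_for_mean(1.96, 4_500, 500)
print(n_needed)
```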
TFB 36
Interval Estimate
of a Population Proportion
The general form of an interval estimate of a
population proportion is
$\bar{p} \pm$ Margin of Error
TFB 37
Interval Estimate
of a Population Proportion
The sampling distribution of $\bar{p}$ plays a key role in
computing the margin of error for this interval
estimate.
The sampling distribution of $\bar{p}$ can be approximated
by a normal distribution whenever $np \ge 5$ and
$n(1 - p) \ge 5$.
TFB 38
Interval Estimate
of a Population Proportion
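Normal Approximation of Sampling Distribution of $\bar{p}$
$\sigma_{\bar{p}} = \sqrt{\frac{p(1 - p)}{n}}$
[Figure: sampling distribution of $\bar{p}$, centered at p; the central $1 - \alpha$ of all $\bar{p}$ values lie within $z_{\alpha/2}\,\sigma_{\bar{p}}$ of p, with area $\alpha/2$ in each tail.]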
TFB 39
Interval Estimate
of a Population Proportion
Interval Estimate
$\bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
where: $1 - \alpha$ is the confidence coefficient
$z_{\alpha/2}$ is the z value providing an area of
$\alpha/2$ in the upper tail of the standard
normal probability distribution
$\bar{p}$ is the sample proportion
TFB 40
Interval Estimate
of a Population Proportion
Example: Political Science, Inc.
Political Science, Inc. (PSI) specializes in voter polls
and surveys designed to keep political office seekers
informed of their position in a race.
Using telephone surveys, PSI interviewers ask
registered voters who they would vote for if the
election were held that day.
TFB 41
Interval Estimate
of a Population Proportion
Example: Political Science, Inc.
In a current election campaign, PSI has just found
that 220 registered voters, out of 500 contacted, favor
a particular candidate. PSI wants to develop a 95%
confidence interval estimate for the proportion of the
population of registered voters that favor the
candidate.
TFB 42
Interval Estimate
of a Population Proportion
$\bar{p} \pm z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
where: n = 500, $\bar{p}$ = 220/500 = .44, $z_{\alpha/2}$ = 1.96
$.44 \pm 1.96 \sqrt{\frac{.44(1 - .44)}{500}} = .44 \pm .0435$
PSI is 95% confident that the proportion of all voters
that favor the candidate is between .3965 and .4835.
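The PSI interval can be verified with a few lines of code. This is a minimal sketch; the function name `proportion_interval` is hypothetical, and the z value 1.96 is passed in as the rounded table value used on the slide.

```python
from math import sqrt

def proportion_interval(successes, n, z):
    """Interval estimate p_bar +/- z * sqrt(p_bar * (1 - p_bar) / n)."""
    p_bar = successes / n
    margin = z * sqrt(p_bar * (1 - p_bar) / n)
    return p_bar - margin, p_bar + margin

# Political Science, Inc.: 220 of 500 registered voters favor the candidate
lower, upper = proportion_interval(220, 500, 1.96)
print(f"{lower:.4f} to {upper:.4f}")
```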
TFB 43
Sample Size for an Interval Estimate
of a Population Proportion
Margin of Error
$E = z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}}$
Solving for the necessary sample size, we get
$n = \frac{(z_{\alpha/2})^2\, \bar{p}(1 - \bar{p})}{E^2}$
However, $\bar{p}$ will not be known until after we have
selected the sample. We will use the planning value
p* for $\bar{p}$.
TFB 44
Sample Size for an Interval Estimate
of a Population Proportion
Necessary Sample Size
$n = \frac{(z_{\alpha/2})^2\, p^*(1 - p^*)}{E^2}$
The planning value p* can be chosen by:
1. Using the sample proportion from a previous
sample of the same or similar units.
2. Selecting a preliminary sample and using the
sample proportion from this sample.
3. Using judgment or a “best guess” for the p* value.
4. Otherwise, using .50 as the p* value.
TFB 45
Sample Size for an Interval Estimate
of a Population Proportion
Example: Political Science, Inc.
Suppose that PSI would like a .99 probability that
the sample proportion is within ± .03 of the
population proportion.
How large a sample size is needed to meet the
required precision? (A previous sample of similar
units yielded .44 for the sample proportion.)
TFB 46
Sample Size for an Interval Estimate
of a Population Proportion
$z_{\alpha/2} \sqrt{\frac{\bar{p}(1 - \bar{p})}{n}} = .03$
At 99% confidence, $z_{.005}$ = 2.576. Recall that $\bar{p}$ = .44.
$n = \frac{(z_{\alpha/2})^2\, \bar{p}(1 - \bar{p})}{E^2} = \frac{(2.576)^2 (.44)(.56)}{(.03)^2} \approx 1817$
A sample of size 1817 is needed to reach a desired
precision of ± .03 at 99% confidence.
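The same calculation in code, as an illustrative sketch. The function name `sample_size_for_proportion` is hypothetical; the result is rounded up to the next whole observation, matching the convention used for the Discount Sounds sample size.

```python
from math import ceil

def sample_size_for_proportion(z, p_star, margin):
    """Smallest n for which z * sqrt(p*(1 - p*) / n) does not exceed the desired margin."""
    return ceil(z ** 2 * p_star * (1 - p_star) / margin ** 2)

# Political Science, Inc.: z = 2.576 (99% confidence), planning value p* = .44, E = .03
n_needed = sample_size_for_proportion(2.576, 0.44, 0.03)
print(n_needed)
```

The planning value is an ordinary argument, so a more conservative choice such as p* = .5 can be tried by changing one number.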
TFB 47
Sample Size for an Interval Estimate
of a Population Proportion
Note: We used .44 as the best estimate of p in the
preceding expression. If no information is available
about p, then .5 is often assumed because p(1 - p) is
largest at p = .5, which gives the largest required
sample size. If we had used p = .5, the computed n
would have been 1843.27, which rounds up to 1844.
TFB 48
End Statistical Inference – Part I
Interval Estimation
TFB 49
Statistical Inference – Part II
Hypothesis Testing
Developing Null and Alternative Hypotheses
Type I and Type II Errors
Population Mean: $\sigma$ Known
Population Mean: $\sigma$ Unknown
Population Proportion
Hypothesis Testing and Decision Making
Calculating the Probability of Type II Errors
Determining the Sample Size for
a Hypothesis Test About a Population Mean
TFB 1
Hypothesis Testing
Hypothesis testing can be used to determine whether
a statement about the value of a population parameter
should or should not be rejected.
The null hypothesis, denoted by H0 , is a tentative
assumption about a population parameter.
The alternative hypothesis, denoted by Ha, is the
opposite of what is stated in the null hypothesis.
The hypothesis testing procedure uses data from a
sample to test the two competing statements
indicated by H0 and Ha.
TFB 2
Developing Null and Alternative Hypotheses
• It is not always obvious how the null and alternative
hypotheses should be formulated.
• Care must be taken to structure the hypotheses
appropriately so that the test conclusion provides
the information the researcher wants.
• The context of the situation is very important in
determining how the hypotheses should be stated.
• In some cases it is easier to identify the alternative
hypothesis first. In other cases the null is easier.
• Correct hypothesis formulation will take practice.
TFB 3
Developing Null and Alternative Hypotheses
Alternative Hypothesis as a Research Hypothesis
• Many applications of hypothesis testing involve
an attempt to gather evidence in support of a
research hypothesis.
• In such cases, it is often best to begin with the
alternative hypothesis and make it the conclusion
that the researcher hopes to support.
• The conclusion that the research hypothesis is true
is made if the sample data provide sufficient
evidence to show that the null hypothesis can be
rejected.
TFB 4
Developing Null and Alternative Hypotheses
Alternative Hypothesis as a Research Hypothesis
• Example:
A new teaching method is developed that is
believed to be better than the current method.
• Alternative Hypothesis:
The new teaching method is better.
• Null Hypothesis:
The new method is no better than the old method.
TFB 5
Developing Null and Alternative Hypotheses
Alternative Hypothesis as a Research Hypothesis
• Example:
A new sales force bonus plan is developed in an
attempt to increase sales.
• Alternative Hypothesis:
The new bonus plan increases sales.
• Null Hypothesis:
The new bonus plan does not increase sales.
TFB 6
Developing Null and Alternative Hypotheses
Alternative Hypothesis as a Research Hypothesis
• Example:
A new drug is developed with the goal of lowering
blood pressure more than the existing drug.
• Alternative Hypothesis:
The new drug lowers blood pressure more than
the existing drug.
• Null Hypothesis:
The new drug does not lower blood pressure more
than the existing drug.
TFB 7
Developing Null and Alternative Hypotheses
Null Hypothesis as an Assumption to be Challenged
• We might begin with a belief or assumption that
a statement about the value of a population
parameter is true.
• We then use a hypothesis test to challenge the
assumption and determine if there is statistical
evidence to conclude that the assumption is
incorrect.
• In these situations, it is helpful to develop the null
hypothesis first.
TFB 8
Developing Null and Alternative Hypotheses
Null Hypothesis as an Assumption to be Challenged
• Example:
The label on a soft drink bottle states that it
contains 67.6 fluid ounces.
• Null Hypothesis:
The label is correct. $\mu \ge 67.6$ ounces.
• Alternative Hypothesis:
The label is incorrect. $\mu < 67.6$ ounces.
TFB 9
Summary of Forms for Null and Alternative
Hypotheses about a Population Mean
The equality part of the hypotheses always appears
in the null hypothesis.
In general, a hypothesis test about the value of a
population mean $\mu$ must take one of the following
three forms (where $\mu_0$ is the hypothesized value of
the population mean).
One-tailed (lower-tail): $H_0: \mu \ge \mu_0$, $H_a: \mu < \mu_0$
One-tailed (upper-tail): $H_0: \mu \le \mu_0$, $H_a: \mu > \mu_0$
Two-tailed: $H_0: \mu = \mu_0$, $H_a: \mu \ne \mu_0$
TFB 10
Null and Alternative Hypotheses
Example: Metro EMS
A major west coast city provides one of the most
comprehensive emergency medical services in the
world. Operating in a multiple hospital system
with approximately 20 mobile medical units, the
service goal is to respond to medical emergencies
with a mean time of 12 minutes or less.
The director of medical services wants to
formulate a hypothesis test that could use a sample
of emergency response times to determine whether
or not the service goal of 12 minutes or less is being
achieved.
TFB 11
Null and Alternative Hypotheses
$H_0: \mu \le 12$
The emergency service is meeting
the response goal; no follow-up
action is necessary.
$H_a: \mu > 12$
The emergency service is not
meeting the response goal;
appropriate follow-up action is
necessary.
where: $\mu$ = mean response time for the population
of medical emergency requests
TFB 12
Type I Error
Because hypothesis tests are based on sample data,
we must allow for the possibility of errors.
A Type I error is rejecting H0 when it is true.
The probability of making a Type I error when the
null hypothesis is true as an equality is called the
level of significance.
Applications of hypothesis testing that only control
the Type I error are often called significance tests.
TFB 13
Type II Error
A Type II error is accepting H0 when it is false.
It is difficult to control for the probability of making
a Type II error.
Statisticians avoid the risk of making a Type II
error by using “do not reject H0” and not “accept H0”.
TFB 14
Type I and Type II Errors
                           Population Condition
Conclusion                 H0 True              H0 False
                           ($\mu \le 12$)       ($\mu > 12$)
Accept H0                  Correct              Type II Error
(Conclude $\mu \le 12$)    Decision
Reject H0                  Type I Error         Correct
(Conclude $\mu > 12$)                           Decision
TFB 15
p-Value Approach to
One-Tailed Hypothesis Testing
The p-value is the probability, computed using the
test statistic, that measures the support (or lack of
support) provided by the sample for the null
hypothesis.
If the p-value is less than or equal to the level of
significance $\alpha$, the value of the test statistic is in the
rejection region.
Reject H0 if the p-value $\le \alpha$.
TFB 16
Suggested Guidelines for Interpreting p-Values
Less than .01
Overwhelming evidence to conclude Ha is true.
Between .01 and .05
Strong evidence to conclude Ha is true.
Between .05 and .10
Weak evidence to conclude Ha is true.
Greater than .10
Insufficient evidence to conclude Ha is true.
TFB 17
Lower-Tailed Test About a Population Mean:
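$\sigma$ Known
p-Value Approach
[Figure: sampling distribution of $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ for a lower-tailed test with $\alpha$ = .10. The test statistic z = -1.46 lies to the left of the critical value $-z_{\alpha}$ = -1.28; the p-value is the area to the left of z = -1.46.]
p-Value < $\alpha$, so reject H0.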
TFB 18
Upper-Tailed Test About a Population Mean:
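$\sigma$ Known
p-Value Approach
[Figure: sampling distribution of $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ for an upper-tailed test with $\alpha$ = .04. The test statistic z = 2.29 lies to the right of the critical value $z_{\alpha}$ = 1.75; the p-value is the area to the right of z = 2.29.]
p-Value < $\alpha$, so reject H0.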
TFB 19
Critical Value Approach to
One-Tailed Hypothesis Testing
The test statistic z has a standard normal probability
distribution.
We can use the standard normal probability
distribution table to find the z-value with an area
of $\alpha$ in the lower (or upper) tail of the distribution.
The value of the test statistic that establishes the
boundary of the rejection region is called the
critical value for the test.
The rejection rule is:
• Lower tail: Reject H0 if $z \le -z_{\alpha}$
• Upper tail: Reject H0 if $z \ge z_{\alpha}$
TFB 20
Lower-Tailed Test About a Population Mean:
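$\sigma$ Known
Critical Value Approach
[Figure: sampling distribution of $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ for a lower-tailed test with $\alpha$ = .10. Reject H0 for z at or below the critical value $-z_{\alpha}$ = -1.28; do not reject H0 otherwise.]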
TFB 21
Upper-Tailed Test About a Population Mean:
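$\sigma$ Known
Critical Value Approach
[Figure: sampling distribution of $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ for an upper-tailed test with $\alpha$ = .05. Reject H0 for z at or above the critical value $z_{\alpha}$ = 1.645; do not reject H0 otherwise.]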
TFB 22
Steps of Hypothesis Testing
Step 1. Develop the null and alternative hypotheses.
Step 2. Specify the level of significance $\alpha$.
Step 3. Collect the sample data and compute the
value of the test statistic.
p-Value Approach
Step 4. Use the value of the test statistic to compute the
p-value.
Step 5. Reject H0 if p-value $\le \alpha$.
TFB 23
Steps of Hypothesis Testing
Critical Value Approach
Step 4. Use the level of significance $\alpha$ to determine the
critical value and the rejection rule.
Step 5. Use the value of the test statistic and the rejection
rule to determine whether to reject H0.
TFB 24
One-Tailed Tests About a Population Mean:
$\sigma$ Known
Example: Metro EMS
The response times for a random sample of 40
medical emergencies were tabulated. The sample
mean is 13.25 minutes. The population standard
deviation is believed to be 3.2 minutes.
The EMS director wants to perform a hypothesis
test, with a .05 level of significance, to determine
whether the service goal of 12 minutes or less is
being achieved.
TFB 25
One-Tailed Tests About a Population Mean:
$\sigma$ Known
p-Value and Critical Value Approaches
1. Develop the hypotheses.
$H_0: \mu \le 12$
$H_a: \mu > 12$
2. Specify the level of significance.
$\alpha$ = .05
3. Compute the value of the test statistic.
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{13.25 - 12}{3.2/\sqrt{40}} = 2.47$
TFB 26
One-Tailed Tests About a Population Mean:
$\sigma$ Known
p-Value Approach
4. Compute the p-value.
For z = 2.47, cumulative probability = .9932.
p-value = 1 - .9932 = .0068
5. Determine whether to reject H0.
Because p-value = .0068 < $\alpha$ = .05, we reject H0.
There is sufficient statistical evidence
to infer that Metro EMS is not meeting
the response goal of 12 minutes.
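The Metro EMS test can be reproduced numerically. The sketch below is illustrative, assuming Python's standard-library `statistics.NormalDist`; the function name `upper_tail_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_z_test(x_bar, mu0, sigma, n):
    """Return (z, p-value) for H0: mu <= mu0 versus Ha: mu > mu0 with sigma known."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    p_value = 1 - NormalDist().cdf(z)   # area in the upper tail beyond z
    return z, p_value

# Metro EMS: n = 40, sample mean 13.25 minutes, sigma = 3.2 minutes, goal of 12 minutes
z, p_value = upper_tail_z_test(13.25, 12, 3.2, 40)
alpha = 0.05
reject_h0 = p_value <= alpha
print(round(z, 2), round(p_value, 4), reject_h0)
```

Since z is not rounded before the look-up, the computed p-value can differ slightly from the table-based .0068.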
TFB 27
One-Tailed Tests About a Population Mean:
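$\sigma$ Known
p-Value Approach
[Figure: sampling distribution of $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ with $\alpha$ = .05. The test statistic z = 2.47 lies to the right of the critical value $z_{\alpha}$ = 1.645; the p-value is the area to the right of z = 2.47.]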
TFB 28
One-Tailed Tests About a Population Mean:
$\sigma$ Known
Critical Value Approach
4. Determine the critical value and rejection rule.
For $\alpha$ = .05, $z_{.05}$ = 1.645
Reject H0 if $z \ge 1.645$
5. Determine whether to reject H0.
Because 2.47 > 1.645, we reject H0.
There is sufficient statistical evidence
to infer that Metro EMS is not meeting
the response goal of 12 minutes.
TFB 29
p-Value Approach to
Two-Tailed Hypothesis Testing
Compute the p-value using the following three steps:
1. Compute the value of the test statistic z.
2. If z is in the upper tail (z > 0), compute the
probability that z is greater than or equal to the
value of the test statistic. If z is in the lower tail
(z < 0), compute the probability that z is less than or
equal to the value of the test statistic.
3. Double the tail area obtained in step 2 to obtain
the p –value.
The rejection rule:
Reject H0 if the p-value $\le \alpha$.
TFB 30
Critical Value Approach to
Two-Tailed Hypothesis Testing
The critical values will occur in both the lower and
upper tails of the standard normal curve.
Use the standard normal probability distribution
table to find $z_{\alpha/2}$ (the z-value with an area of $\alpha/2$ in
the upper tail of the distribution).
The rejection rule is:
Reject H0 if $z \le -z_{\alpha/2}$ or $z \ge z_{\alpha/2}$.
TFB 31
Two-Tailed Tests About a Population Mean:
$\sigma$ Known
Example: Glow Toothpaste
The production line for Glow toothpaste is
designed to fill tubes with a mean weight of 6 oz.
Periodically, a sample of 30 tubes will be selected in
order to check the filling process.
Quality assurance procedures call for the
continuation of the filling process if the sample
results are consistent with the assumption that the
mean filling weight for the population of toothpaste
tubes is 6 oz.; otherwise the process will be adjusted.
TFB 32
Two-Tailed Tests About a Population Mean:
$\sigma$ Known
Example: Glow Toothpaste
Assume that a sample of 30 toothpaste tubes
provides a sample mean of 6.1 oz. The population
standard deviation is believed to be 0.2 oz.
Perform a hypothesis test, at the .03 level of
significance, to help determine whether the filling
process should continue operating or be stopped and
corrected.
TFB 33
Two-Tailed Tests About a Population Mean:
$\sigma$ Known
p-Value and Critical Value Approaches
1. Determine the hypotheses.
$H_0: \mu = 6$
$H_a: \mu \ne 6$
2. Specify the level of significance.
$\alpha$ = .03
3. Compute the value of the test statistic.
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{6.1 - 6}{.2/\sqrt{30}} = 2.74$
TFB 34
Two-Tailed Tests About a Population Mean:
$\sigma$ Known
p-Value Approach
4. Compute the p-value.
For z = 2.74, cumulative probability = .9969
p-value = 2(1 - .9969) = .0062
5. Determine whether to reject H0.
Because p-value = .0062 < $\alpha$ = .03, we reject H0.
There is sufficient statistical evidence to
infer that the alternative hypothesis is true
(i.e. the mean filling weight is not 6 ounces).
TFB 35
Two-Tailed Tests About a Population Mean:
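$\sigma$ Known
p-Value Approach
[Figure: standard normal curve for the two-tailed test. Half of the p-value (.0031) lies to the left of z = -2.74 and half (.0031) to the right of z = 2.74. The critical values $-z_{\alpha/2}$ = -2.17 and $z_{\alpha/2}$ = 2.17 each cut off a tail area of $\alpha/2$ = .015.]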
TFB 36
Two-Tailed Tests About a Population Mean:
$\sigma$ Known
Critical Value Approach
4. Determine the critical value and rejection rule.
For $\alpha/2$ = .03/2 = .015, $z_{.015}$ = 2.17
Reject H0 if $z \le -2.17$ or $z \ge 2.17$
5. Determine whether to reject H0.
Because 2.74 > 2.17, we reject H0.
There is sufficient statistical evidence to
infer that the alternative hypothesis is true
(i.e. the mean filling weight is not 6 ounces).
TFB 37
Two-Tailed Tests About a Population Mean:
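$\sigma$ Known
Critical Value Approach
[Figure: sampling distribution of $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ for the two-tailed test. Reject H0 in either tail, below -2.17 or above 2.17, each with area $\alpha/2$ = .015; do not reject H0 between the two critical values.]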
TFB 38
Confidence Interval Approach to
Two-Tailed Tests About a Population Mean
Select a simple random sample from the population
and use the value of the sample mean $\bar{x}$ to develop
the confidence interval for the population mean $\mu$.
(Confidence intervals are covered in Chapter 8.)
If the confidence interval contains the hypothesized
value $\mu_0$, do not reject H0. Otherwise, reject H0.
(Actually, H0 should be rejected if $\mu_0$ happens to be
equal to one of the end points of the confidence
interval.)
TFB 39
Confidence Interval Approach to
Two-Tailed Tests About a Population Mean
The 97% confidence interval for $\mu$ is
$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = 6.1 \pm 2.17(.2/\sqrt{30}) = 6.1 \pm .07924$
or 6.02076 to 6.17924
Because the hypothesized value for the
population mean, $\mu_0$ = 6, is not in this interval,
the hypothesis-testing conclusion is that the
null hypothesis, $H_0: \mu = 6$, can be rejected.
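The three approaches to the Glow Toothpaste test (p-value, critical value, confidence interval) should always agree. The sketch below runs all three side by side; it is an illustration that assumes Python's standard-library `statistics.NormalDist`, and the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

# Glow Toothpaste: H0: mu = 6 versus Ha: mu != 6
x_bar, mu0, sigma, n, alpha = 6.1, 6.0, 0.2, 30, 0.03
std_error = sigma / sqrt(n)

# 1. p-value approach: double the tail area beyond |z|
z = (x_bar - mu0) / std_error
p_value = 2 * (1 - std_normal.cdf(abs(z)))
reject_by_p = p_value <= alpha

# 2. Critical value approach: compare |z| with the z value for area alpha/2
z_crit = std_normal.inv_cdf(1 - alpha / 2)
reject_by_crit = abs(z) >= z_crit

# 3. Confidence interval approach: reject if mu0 is not strictly inside the interval
lower = x_bar - z_crit * std_error
upper = x_bar + z_crit * std_error
reject_by_ci = not (lower < mu0 < upper)

print(round(z, 2), round(p_value, 4), round(z_crit, 2))
print(round(lower, 4), round(upper, 4))
print(reject_by_p, reject_by_crit, reject_by_ci)
```

The critical value is computed rather than read from a table, so the interval limits can differ from the slide's in the later decimal places.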
TFB 40
Tests About a Population Mean:
$\sigma$ Unknown
Test Statistic
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
This test statistic has a t distribution
with n - 1 degrees of freedom.
Tests About a Population Mean:
σ Unknown
Rejection Rule: p-Value Approach
Reject H0 if p-value ≤ α
Rejection Rule: Critical Value Approach
H0: μ ≥ μ0    Reject H0 if t < −t_α
H0: μ ≤ μ0    Reject H0 if t > t_α
H0: μ = μ0    Reject H0 if t < −t_{α/2} or t > t_{α/2}
p-Values and the t Distribution
The format of the t distribution table provided in most
statistics textbooks does not have sufficient detail
to determine the exact p-value for a hypothesis test.
However, we can still use the t distribution table to
identify a range for the p-value.
An advantage of computer software packages is that
the computer output will provide the p-value for the
t distribution.
Example: Highway Patrol
One-Tailed Test About a Population Mean: σ Unknown
A State Highway Patrol periodically samples
vehicle speeds at various locations on a particular
roadway. The sample of vehicle speeds is used to
test the hypothesis H0: μ ≤ 65.
The locations where H0 is rejected are deemed the
best locations for radar traps. At Location F, a
sample of 64 vehicles shows a mean speed of 66.2
mph with a standard deviation of 4.2 mph. Use
α = .05 to test the hypothesis.
One-Tailed Test About a Population Mean:
σ Unknown
p-Value and Critical Value Approaches
1. Determine the hypotheses.
H0: μ ≤ 65
Ha: μ > 65
2. Specify the level of significance.
α = .05
3. Compute the value of the test statistic.
$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{66.2 - 65}{4.2/\sqrt{64}} = 2.286$
One-Tailed Test About a Population Mean:
σ Unknown
p-Value Approach
4. Compute the p-value.
For t = 2.286, the p-value must be less than .025
(for t = 1.998) and greater than .01 (for t = 2.387).
.01 < p-value < .025
5. Determine whether to reject H0.
Because p-value < α = .05, we reject H0.
We are at least 95% confident that the mean speed
of vehicles at Location F is greater than 65 mph.
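The table only brackets the p-value between .01 and .025; software can compute it directly. The sketch below is one way to do that with Python's standard library alone, integrating the t density numerically with Simpson's rule. The helper names t_pdf and t_upper_tail are hypothetical, written for this illustration; a statistics package would normally supply the t distribution.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = lgamma((df + 1) / 2) - lgamma(df / 2)
    return exp(log_coef) / sqrt(df * pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the probability lies above 0; subtract the area between 0 and t
    return 0.5 - total * h / 3

# Highway Patrol example: x_bar = 66.2, mu0 = 65, s = 4.2, n = 64
x_bar, mu0, s, n, alpha = 66.2, 65.0, 4.2, 64, 0.05
t_stat = (x_bar - mu0) / (s / sqrt(n))
p_value = t_upper_tail(t_stat, df=n - 1)

print(f"t = {t_stat:.3f}, upper-tail p-value = {p_value:.4f}")
print("reject H0:", p_value <= alpha)
```

The computed p-value should fall inside the range read from the table, .01 < p-value < .025.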
One-Tailed Test About a Population Mean:
σ Unknown
Critical Value Approach
4. Determine the critical value and rejection rule.
For α = .05 and d.f. = 64 − 1 = 63, t_{.05} = 1.669
Reject H0 if t > 1.669
5. Determine whether to reject H0.
Because 2.286 > 1.669, we reject H0.
We are at least 95% confident that the mean speed
of vehicles at Location F is greater than 65 mph.
Location F is a good candidate for a radar trap.
One-Tailed Test About a Population Mean:
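σ Unknown
[Figure: t distribution with 63 degrees of freedom. Do not reject H0 to the left of the critical value t_α = 1.669; reject H0 in the upper tail to its right.]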
A Summary of Forms for Null and Alternative
Hypotheses About a Population Proportion
The equality part of the hypotheses always appears
in the null hypothesis.
In general, a hypothesis test about the value of a
population proportion p must take one of the
following three forms (where p0 is the hypothesized
value of the population proportion).
H0: p ≥ p0      H0: p ≤ p0      H0: p = p0
Ha: p < p0      Ha: p > p0      Ha: p ≠ p0
One-tailed      One-tailed      Two-tailed
(lower tail)    (upper tail)
Tests About a Population Proportion
Test Statistic
$z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}}$
where:
$\sigma_{\bar{p}} = \sqrt{\frac{p_0(1 - p_0)}{n}}$
assuming np > 5 and n(1 – p) > 5
Tests About a Population Proportion
Rejection Rule: p-Value Approach
Reject H0 if p-value ≤ α
Rejection Rule: Critical Value Approach
H0: p ≤ p0    Reject H0 if z > z_α
H0: p ≥ p0    Reject H0 if z < −z_α
H0: p = p0    Reject H0 if z < −z_{α/2} or z > z_{α/2}
Two-Tailed Test About a
Population Proportion
Example: National Safety Council (NSC)
For a Christmas and New Year’s week, the
National Safety Council estimated that 500 people
would be killed and 25,000 injured on the nation’s
roads. The NSC claimed that 50% of the accidents
would be caused by drunk driving.
A sample of 120 accidents showed that 67 were
caused by drunk driving. Use these data to test the
NSC’s claim with α = .05.
Two-Tailed Test About a
Population Proportion
p-Value and Critical Value Approaches
1. Determine the hypotheses.
H0: p = .5
Ha: p ≠ .5
2. Specify the level of significance.
α = .05
3. Compute the value of the test statistic.
$\sigma_{\bar{p}} = \sqrt{\frac{p_0(1 - p_0)}{n}} = \sqrt{\frac{.5(1 - .5)}{120}} = .045644$
(A common error is using p̄ in place of p0 in this formula.)
$z = \frac{\bar{p} - p_0}{\sigma_{\bar{p}}} = \frac{(67/120) - .5}{.045644} = 1.28$
Two-Tailed Test About a
Population Proportion
p-Value Approach
4. Compute the p-value.
For z = 1.28, cumulative probability = .8997
p-value = 2(1 − .8997) = .2006
5. Determine whether to reject H0.
Because p-value = .2006 > α = .05, we cannot reject H0.
Two-Tailed Test About a
Population Proportion
Critical Value Approach
4. Determine the critical values and rejection rule.
For α/2 = .05/2 = .025, z_{.025} = 1.96
Reject H0 if z < -1.96 or z > 1.96
5. Determine whether to reject H0.
Because −1.96 < 1.278 < 1.96, we cannot reject H0.
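The NSC calculation can be reproduced in a few lines. This is a minimal sketch using only Python's standard library, with the slide's figures (67 of 120 accidents, p0 = .5, α = .05); the variable names are illustrative. Because it carries the unrounded z of about 1.278 rather than 1.28, its p-value comes out near .201 instead of the tabled .2006.

```python
from math import sqrt
from statistics import NormalDist

# National Safety Council example: 67 of 120 accidents, claim p0 = .5
n, successes, p0, alpha = 120, 67, 0.5, 0.05

p_bar = successes / n
# The standard error uses the hypothesized p0, not the sample proportion
se = sqrt(p0 * (1 - p0) / n)
z = (p_bar - p0) / se

std_normal = NormalDist()
p_value = 2 * (1 - std_normal.cdf(abs(z)))     # two-tailed p-value
z_crit = std_normal.inv_cdf(1 - alpha / 2)     # about 1.96

print(f"standard error = {se:.6f}, z = {z:.3f}")
print(f"p-value = {p_value:.4f}, critical value = {z_crit:.2f}")
print("reject H0:", abs(z) > z_crit)
```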
Type I and Type II Errors
                      Population Condition
Conclusion            H0 True (μ ≤ 12)     H0 False (μ > 12)
Accept H0             Correct Decision     Type II Error
(Conclude μ ≤ 12)
Reject H0             Type I Error         Correct Decision
(Conclude μ > 12)
Hypothesis Testing and Decision Making
In many decision-making situations the decision
maker may want, and in some cases may be forced,
to take action with both the conclusion do not reject
H0 and the conclusion reject H0.
In such situations, it is recommended that the
hypothesis-testing procedure be extended to include
consideration of making a Type II error.
Calculating the Probability of a Type II Error
in Hypothesis Tests About a Population Mean
1. Formulate the null and alternative hypotheses.
2. Using the critical value approach, use the level of
significance to determine the critical value and
the rejection rule for the test.
3. Using the rejection rule, solve for the value of the
sample mean corresponding to the critical value of
the test statistic.
Calculating the Probability of a Type II Error
in Hypothesis Tests About a Population Mean
4. Use the results from step 3 to state the values of the
sample mean that lead to the acceptance of H0; this
defines the acceptance region.
5. Using the sampling distribution of x̄ for a value of μ
satisfying the alternative hypothesis, and the acceptance
region from step 4, compute the probability that the
sample mean will be in the acceptance region. (This is
the probability of making a Type II error at the chosen
value of μ.)
Calculating the Probability
of a Type II Error
Example: Metro EMS (revisited)
Recall that the response times for a random sample
of 40 medical emergencies were tabulated. The sample
mean is 13.25 minutes. The population standard
deviation is believed to be 3.2 minutes.
The EMS director wants to perform a hypothesis test,
with a .05 level of significance, to determine whether or
not the service goal of 12 minutes or less is being
achieved.
Calculating the Probability
of a Type II Error
1. Hypotheses are: H0: μ ≤ 12 and Ha: μ > 12
2. Rejection rule is: Reject H0 if z > 1.645
3. Value of the sample mean that identifies
the rejection region:
$z = \frac{\bar{x} - 12}{3.2/\sqrt{40}} = 1.645$
$\bar{x} = 12 + 1.645\left(\frac{3.2}{\sqrt{40}}\right) = 12.8323$
4. We will accept H0 when x̄ ≤ 12.8323
Calculating the Probability
of a Type II Error
5. Probabilities that the sample mean will be
in the acceptance region:
Values of μ    z = (12.8323 − μ)/(3.2/√40)    β        1 − β
14.0           -2.31                          .0104    .9896
13.6           -1.52                          .0643    .9357
13.2           -0.73                          .2327    .7673
12.8323         0.00                          .5000    .5000
12.8            0.06                          .5239    .4761
12.4            0.85                          .8023    .1977
12.0001         1.645                         .9500    .0500
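The table can be regenerated with a short script. The sketch below assumes the Metro EMS setup from the preceding steps (μ0 = 12, σ = 3.2, n = 40, α = .05) and uses only Python's standard library; the function name beta is illustrative. The slide rounds z to two decimals before looking up the probability, so the script's values may differ from the table in the third or fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Metro EMS setup: H0: mu <= 12, sigma = 3.2, n = 40, alpha = .05
mu0, sigma, n, alpha = 12.0, 3.2, 40, 0.05
se = sigma / sqrt(n)                               # standard error of the sample mean
c = mu0 + NormalDist().inv_cdf(1 - alpha) * se     # upper end of the acceptance region

def beta(mu):
    """P(Type II error) = P(sample mean <= c) when the true mean is mu."""
    return NormalDist(mu, se).cdf(c)

print(f"accept H0 when the sample mean is at most {c:.4f}")
for mu in (14.0, 13.6, 13.2, 12.8, 12.4, 12.0001):
    b = beta(mu)
    print(f"mu = {mu:<8} z = {(c - mu) / se:6.2f}  beta = {b:.4f}  power = {1 - b:.4f}")
```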
Calculating the Probability
of a Type II Error
Observations about the preceding table:
When the true population mean μ is close to
the null hypothesis value of 12, there is a high
probability that we will make a Type II error.
Example: μ = 12.0001, β = .9500
When the true population mean μ is far above
the null hypothesis value of 12, there is a low
probability that we will make a Type II error.
Example: μ = 14.0, β = .0104
Power of the Test
The probability of correctly rejecting H0 when it is
false is called the power of the test.
For any particular value of μ, the power is 1 − β.
We can show graphically the power associated
with each value of μ; such a graph is called a
power curve. (See next slide.)
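Points on a power curve can be computed in closed form. The sketch below assumes the Metro EMS upper-tail test (μ0 = 12, σ = 3.2, n = 40, α = .05) and evaluates the power at a few illustrative values of μ; the function name power and the grid are choices made for this example. At μ = μ0 the power equals α, and it rises toward 1 as μ moves above 12.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 12.0, 3.2, 40, 0.05
z_alpha = NormalDist().inv_cdf(1 - alpha)

def power(mu):
    """P(reject H0) for the upper-tail z test when the true mean is mu."""
    shift = (mu - mu0) / (sigma / sqrt(n))     # true mean's distance from mu0 in standard errors
    return 1 - NormalDist().cdf(z_alpha - shift)

grid = [12.0 + 0.5 * k for k in range(6)]      # 12.0, 12.5, ..., 14.5
curve = [(mu, power(mu)) for mu in grid]
for mu, pw in curve:
    print(f"mu = {mu:4.1f}  power = {pw:.4f}")
```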
Power Curve
[Figure: power curve. Vertical axis: probability of correctly rejecting the null hypothesis, from 0.00 to 1.00. Horizontal axis: μ, from 11.5 to 14.5. The curve is drawn over the region where H0 is false.]
Determining the Sample Size for a Hypothesis Test
About a Population Mean
The specified level of significance determines the
probability of making a Type I error.
By controlling the sample size, the probability of
making a Type II error is controlled.
Determining the Sample Size for a Hypothesis Test
About a Population Mean
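[Figure: two sampling distributions of x̄ for H0: μ ≤ μ0, Ha: μ > μ0. Upper panel: the distribution when H0 is true and μ = μ0, with "Reject H0" marked to the right of the critical value c. Lower panel: the distribution when H0 is false and μa > μ0, with β marked as the area to the left of c.]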
Determining the Sample Size for a Hypothesis Test
About a Population Mean
$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_0 - \mu_a)^2}$
where
z_α = z value providing an area of α in the tail
z_β = z value providing an area of β in the tail
σ = population standard deviation
μ0 = value of the population mean in H0
μa = value of the population mean used for the
Type II error
Note: In a two-tailed hypothesis test, use z_{α/2} not z_α
Determining the Sample Size for a Hypothesis Test
About a Population Mean
Let’s assume that the director of medical
services makes the following statements about the
allowable probabilities for the Type I and Type II
errors:
• If the mean response time is μ = 12 minutes, I am
willing to risk an α = .05 probability of rejecting H0.
• If the mean response time is 0.75 minutes over the
specification (μ = 12.75), I am willing to risk a β = .10
probability of not rejecting H0.
Determining the Sample Size for a Hypothesis Test
About a Population Mean
α = .05, β = .10
z_α = 1.645, z_β = 1.28
μ0 = 12, μa = 12.75
σ = 3.2
$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_0 - \mu_a)^2} = \frac{(1.645 + 1.28)^2 (3.2)^2}{(12 - 12.75)^2} = 155.75 \approx 156$
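The sample size formula is easy to wrap in a function. The sketch below is one possible version using Python's standard library; the name sample_size and its two_tailed flag are hypothetical, introduced for this illustration. It uses unrounded z values, so the intermediate result is about 155.9 rather than 155.75, but it rounds up to the same n = 156.

```python
from math import ceil
from statistics import NormalDist

def sample_size(alpha, beta, sigma, mu0, mua, two_tailed=False):
    """Unrounded n from (z_alpha + z_beta)^2 * sigma^2 / (mu0 - mua)^2."""
    tail = alpha / 2 if two_tailed else alpha  # two-tailed tests use z_{alpha/2}
    z_a = NormalDist().inv_cdf(1 - tail)
    z_b = NormalDist().inv_cdf(1 - beta)
    return (z_a + z_b) ** 2 * sigma ** 2 / (mu0 - mua) ** 2

# Director's requirements: alpha = .05 at mu = 12, beta = .10 at mu = 12.75
n_exact = sample_size(alpha=0.05, beta=0.10, sigma=3.2, mu0=12.0, mua=12.75)
n_needed = ceil(n_exact)                       # always round up to a whole observation

print(f"unrounded n = {n_exact:.2f}, required sample size = {n_needed}")
```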
Relationship Among α, β, and n
Once two of the three values are known, the other
can be computed.
For a given level of significance α, increasing the
sample size n will reduce β.
For a given sample size n, decreasing α will increase
β, whereas increasing α will decrease β.
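These relationships can be checked numerically. The sketch below reuses the Metro EMS figures as default arguments (σ = 3.2, μ0 = 12, μa = 12.75) for an upper-tail z test; the function name beta and the particular values of n and α tried are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def beta(alpha, n, sigma=3.2, mu0=12.0, mua=12.75):
    """P(Type II error) for an upper-tail z test when the true mean is mua."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(z_alpha - (mua - mu0) * sqrt(n) / sigma)

# Fixed alpha = .05: a larger n gives a smaller beta
by_n = [(n, beta(0.05, n)) for n in (40, 80, 156)]
# Fixed n = 40: a smaller alpha gives a larger beta
by_alpha = [(a, beta(a, 40)) for a in (0.01, 0.05, 0.10)]

for n, b in by_n:
    print(f"alpha = .05, n = {n:3d}: beta = {b:.4f}")
for a, b in by_alpha:
    print(f"n = 40, alpha = {a:.2f}: beta = {b:.4f}")
```

With n = 156 and α = .05, β comes out close to the .10 target used in the sample size calculation.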
End of Statistical Inference – Part II
Hypothesis Testing