IEGR 351
CHAPTER 10: INFERENCES CONCERNING
PROPORTIONS
Sections 10.1-10.3
Agenda
■ The Estimation of Proportions
■ Hypothesis Concerning One Proportion
■ Hypothesis Concerning Several Proportions
The Estimation of Proportions
■ This chapter discusses how to estimate proportions and how to test hypotheses about them.
■ Some examples we can use to think about this are acceptance sampling or life
testing of a component.
■ The information we need to estimate a proportion is the number of times, X,
that an appropriate event occurs in n trials, occasions, or observations.
■ The point estimator of the true proportion p is the sample proportion $\frac{X}{n}$
■ If the n trials in our problem satisfy the assumptions of the binomial distribution,
the number of successes has mean $np$ and standard deviation $\sqrt{np(1-p)}$. Dividing by n
gives the mean and standard deviation of the proportion of successes:
$\frac{np}{n} = p$ and $\frac{\sqrt{np(1-p)}}{n} = \sqrt{\frac{p(1-p)}{n}}$
Estimation of Proportions
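■ The large-sample confidence interval associated with the proportion is as follows
■ $\frac{x}{n} - z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1-\frac{x}{n}\right)}{n}} < p < \frac{x}{n} + z_{\alpha/2}\sqrt{\frac{\frac{x}{n}\left(1-\frac{x}{n}\right)}{n}}$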
■ The maximum error associated with proportions is as follows
■ $E = z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
■ This is the equation to find the sample size if we know the proportion
■ $n = p(1-p)\left[\frac{z_{\alpha/2}}{E}\right]^2$
■ This is the equation to find the sample size if we do not know the proportion
■ $n = \frac{1}{4}\left[\frac{z_{\alpha/2}}{E}\right]^2$
■ Let’s take a look at some examples
Estimation of Proportions
■ If x=36 of n=100 persons interviewed are familiar with the tax incentives
for installing certain energy-saving devices, construct a 95% confidence
interval for the corresponding true proportion.
■ $\frac{x}{n} = \frac{36}{100} = 0.36$
■ And $z_{\alpha/2} = z_{0.025} = 1.96$
■ $0.36 - 1.96\sqrt{\frac{(0.36)(0.64)}{100}} < p < 0.36 + 1.96\sqrt{\frac{(0.36)(0.64)}{100}}$, or $0.266 < p < 0.454$
■ We are 95% confident that the population proportion of persons familiar
with the tax incentives, p, is contained in the interval from 0.266 to 0.454.
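The interval above can be checked with a short Python sketch. The function name `proportion_ci` is only illustrative, and the sketch assumes we compute $z_{\alpha/2}$ with the standard library's `NormalDist` instead of reading it from the table, so it uses 1.95996 rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, confidence=0.95):
    """Large-sample confidence interval for a binomial proportion p."""
    p_hat = x / n
    # z_{alpha/2}: upper alpha/2 point of the standard normal distribution
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = proportion_ci(36, 100)
print(f"{low:.3f} < p < {high:.3f}")  # 0.266 < p < 0.454
```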
Estimation of Proportions
■ In a sample survey conducted in a large city, 136 of 400 persons answered yes to
the question of whether their city’s public transportation is adequate. With 99%
confidence what can we say about the maximum error, if
$\frac{x}{n} = \frac{136}{400} = 0.34$ is used as an estimate of the corresponding true proportion?
■ And $z_{\alpha/2} = z_{0.005} = 2.575$
■ $E = 2.575\sqrt{\frac{(0.34)(0.66)}{400}} = 0.061$
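As a quick check of the maximum error in this example, here is a minimal Python sketch. The helper name `max_error` is hypothetical, and $z_{\alpha/2}$ is computed with `NormalDist` (about 2.5758) rather than taken from the table.

```python
from math import sqrt
from statistics import NormalDist


def max_error(p_hat, n, confidence):
    """Maximum error E = z_{alpha/2} * sqrt(p(1-p)/n), with p estimated by p_hat."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_hat * (1 - p_hat) / n)


E = max_error(136 / 400, 400, 0.99)
print(round(E, 3))  # 0.061
```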
Estimation of Proportions
■ Suppose that we want to estimate the true proportion of defectives in a very large
shipment of adobe bricks, and that we want to be at least 95% confident that the
error is at most 0.04. How large a sample will we need if
■ We have no idea what the true proportion might be;
■ $n = \frac{1}{4}\left[\frac{1.96}{0.04}\right]^2 = 600.25 \approx 601$ (round up to the nearest integer)
■ We know that the true proportion does not exceed 0.12?
■ $n = (0.12)(0.88)\left[\frac{1.96}{0.04}\right]^2 = 253.55 \approx 254$
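Both sample-size formulas can be sketched in one small Python function. The name `sample_size` and the convention that `p=None` means "no idea about the proportion" are assumptions made for this illustration; the table value 1.96 is passed in directly.

```python
from math import ceil


def sample_size(E, z, p=None):
    """Smallest n so that the maximum error is at most E.

    With p=None we use the worst case p(1-p) = 1/4.
    """
    pq = 0.25 if p is None else p * (1 - p)
    # Always round up: rounding down would make the error exceed E
    return ceil(pq * (z / E) ** 2)


print(sample_size(0.04, 1.96))          # 601
print(sample_size(0.04, 1.96, p=0.12))  # 254
```

Rounding up rather than to the nearest integer is what guarantees the "at most 0.04" requirement.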
Hypothesis Concerning One Proportion
■ Many methods used in sampling inspection, quality control, and reliability verification are
based on tests of the null hypothesis that a proportion (percentage or probability) equals some
specified constant.
■ We will look at approximate large-sample tests based on the normal approximation to the
binomial distribution; the critical regions are given in the table below.
■ Null hypothesis: $p = p_0$
■ Alternative hypothesis: $p < p_0$, $p > p_0$, or $p \neq p_0$
■ This test is used when we want to control the uniformity of a product or operation.
■ The test statistic used for this is below
■ $Z = \frac{X - np_0}{\sqrt{np_0(1-p_0)}}$
Critical Regions for Testing $p = p_0$ (large sample)
| Alternative hypothesis | Reject null hypothesis if: |
|---|---|
| $p < p_0$ | $Z < -z_{\alpha}$ |
| $p > p_0$ | $Z > z_{\alpha}$ |
| $p \neq p_0$ | $Z < -z_{\alpha/2}$ or $Z > z_{\alpha/2}$ |
Hypotheses Concerning One Proportion
■ Transceivers provide wireless communications among electronic components of
consumer products. Responding to a need for a fast, low-cost test of Bluetooth-capable
transceivers, engineers developed a product test at the wafer level. In one
set of trials with 60 devices selected from different wafer lots, 48 devices passed.
Test the null hypothesis p = 0.70 against the alternative hypothesis p > 0.70 at the
0.05 level of significance.
■ Parameter of interest: the proportion p of transceivers that pass the test
■ Null hypothesis: p = 0.70
■ Alternative hypothesis: p > 0.70
■ Level of significance: $\alpha = 0.05$
■ Test Statistic: $Z = \frac{X - np_0}{\sqrt{np_0(1-p_0)}}$
Hypothesis Concerning One Proportion
■ Criterion: Reject the null hypothesis if $Z > z_{0.05} = 1.645$
■ Calculation:
■ $z = \frac{48 - 60(0.70)}{\sqrt{60(0.70)(0.30)}} = 1.69$
■ Decision: Since $z = 1.69$ is greater than 1.645, we reject the null hypothesis at level
0.05. In other words, there is sufficient evidence to conclude that the proportion of
good transceivers that would be produced is greater than 0.70. The P-value,
$P(Z > 1.69) = 0.0455$, somewhat strengthens this conclusion.
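The transceiver test can be reproduced with a minimal Python sketch. The function name `one_proportion_z` is illustrative only, and the critical value and P-value come from `NormalDist` rather than the table, so the P-value differs slightly in the fourth decimal from the slide, which rounds z to 1.69 first.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(x, n, p0):
    """Large-sample test statistic Z = (X - n*p0) / sqrt(n*p0*(1 - p0))."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0))


z = one_proportion_z(48, 60, 0.70)
z_crit = NormalDist().inv_cdf(1 - 0.05)  # right-tailed test, alpha = 0.05
p_value = 1 - NormalDist().cdf(z)        # P(Z > z)

print(round(z, 2))  # 1.69
print(z > z_crit)   # True, so reject the null hypothesis
```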
Hypotheses Concerning Several Proportions
■ In this case we are testing whether two or more binomial populations have
the same parameter p. The null hypothesis of interest is
$p_1 = p_2 = \cdots = p_k$, and the alternative hypotheses are similar to those we have seen
before.
■ The test statistic for a test concerning the difference between two proportions is
below
■ $Z = \frac{\frac{X_1}{n_1} - \frac{X_2}{n_2}}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$ with $\hat{p} = \frac{X_1+X_2}{n_1+n_2}$
■ The large sample confidence interval for the difference between two
proportions is below
■ $\frac{x_1}{n_1} - \frac{x_2}{n_2} \pm z_{\alpha/2}\sqrt{\frac{\frac{x_1}{n_1}\left(1-\frac{x_1}{n_1}\right)}{n_1} + \frac{\frac{x_2}{n_2}\left(1-\frac{x_2}{n_2}\right)}{n_2}}$
Hypotheses Concerning Several Proportions
■ A study shows that 16 of 200 tractors produced on one assembly line required extensive
adjustments before they could be shipped, while the same was true for 14 out of 400
tractors produced on another assembly line. At the 0.01 level of significance, does this
support the claim that the second production line does superior work?
■ Parameter of interest: the difference $p_1 - p_2$ between the proportions of tractors
requiring extensive adjustments on the two lines
■ Null hypothesis: $p_1 = p_2$; this can also be written as $p_1 - p_2 = \delta_0$ with $\delta_0 = 0$
■ Alternative hypothesis: $p_1 > p_2$
■ Level of significance: $\alpha = 0.01$
■ Test Statistic: $Z = \frac{\frac{X_1}{n_1} - \frac{X_2}{n_2}}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$ with $\hat{p} = \frac{X_1+X_2}{n_1+n_2}$
■ Criterion: Reject the null hypothesis if $Z > 2.33$, where Z is given by the formula above
Hypotheses Concerning Several Proportions
■ Calculations
■ $Z = \frac{\frac{16}{200} - \frac{14}{400}}{\sqrt{(0.05)(0.95)\left(\frac{1}{200}+\frac{1}{400}\right)}} = 2.38$ with $\hat{p} = \frac{16+14}{200+400} = 0.05$
■ Decision: Since Z = 2.38 exceeds 2.33, the null hypothesis must be rejected; we conclude
that the true proportion of tractors requiring extensive adjustments is greater for the first
assembly line than for the second. The P-value is 0.0087.
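The tractor calculation can be sketched in Python as follows. The function name `two_proportion_z` is an assumption made for illustration; the pooled estimate $\hat{p}$ is computed exactly as in the test statistic above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: p1 = p2, using the pooled estimate of p."""
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se


z = two_proportion_z(16, 200, 14, 400)
p_value = 1 - NormalDist().cdf(z)  # right-tailed, alternative p1 > p2

print(round(z, 2))     # 2.38
print(p_value < 0.01)  # True, so reject at the 0.01 level
```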
Hypothesis Concerning Several Proportions
■ With reference to the example on the previous slide, let’s find the 95% confidence
interval for $p_1 - p_2$.
■ $0.08 - 0.035 \pm 1.96\sqrt{\frac{(0.08)(0.92)}{200} + \frac{(0.035)(0.965)}{400}}$, or $0.003 < p_1 - p_2 < 0.087$
■ The rate of extensive adjustments for the first assembly line is between 3 and 87 per
1,000 higher than the rate for the second assembly line.
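A short Python sketch of this interval is given below. The name `diff_proportion_ci` is hypothetical; note that, unlike the test statistic, the interval uses the two separate sample proportions and not the pooled estimate.

```python
from math import sqrt


def diff_proportion_ci(x1, n1, x2, n2, z):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    half_width = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - half_width, diff + half_width


low, high = diff_proportion_ci(16, 200, 14, 400, z=1.96)
print(f"{low:.3f} < p1 - p2 < {high:.3f}")  # 0.003 < p1 - p2 < 0.087
```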
IEGR 351
CHAPTER 9: INFERENCES CONCERNING
VARIANCES
Part 1: Sections 9.1-9.3
Agenda
■ The Estimation of Variances
■ Hypothesis Concerning One Variance
■ Hypothesis Concerning Two Variances
The Estimation of Variances
■ Although the sample variance is an unbiased estimator of $\sigma^2$, it does not follow that the
sample standard deviation is also an unbiased estimator of $\sigma$.
■ $S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}$ is an unbiased estimator of $\sigma^2$
■ For large samples it is common practice to estimate $\sigma$ with $s$
■ Population standard deviations can also be estimated in terms of the sample range $R$
■ $\frac{R}{d_2}$: this is a good estimate to use when the sample size is less than or equal to 5; for
larger samples it is more appropriate to use $s$ as an estimate of $\sigma$. For samples from a
normal population the range has mean $d_2\sigma$ and standard deviation $d_3\sigma$; below are
the values of $d_2$ and $d_3$ for different n.
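| n | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|
| $d_2$ | 1.128 | 1.693 | 2.059 | 2.326 | 2.534 | 2.704 | 2.847 | 2.970 | 3.078 |
| $d_3$ | 0.853 | 0.888 | 0.880 | 0.864 | 0.848 | 0.833 | 0.820 | 0.808 | 0.797 |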
Estimation of Variances
■ Let’s use data in the previous chapter to illustrate using the
range estimates.
■ Mine 1: 8,260 8,130 8,350 8,070 8,340
■ Mine 2: 7,950 7,890 7,900 8,140 7,920 7,840
■ Let’s look at the first sample and you can try the second sample
to get practice.
■ The range is the difference between the largest and smallest
values of the sample; we then look up the value of $d_2$ for the
sample size (here n = 5, so $d_2 = 2.326$)
■ $\frac{R}{d_2} = \frac{8350 - 8070}{2.326} = 120.4$
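The range estimate for Mine 1 can be sketched in Python. The dictionary `D2` simply copies the $d_2$ row of the table, and the function name `sigma_from_range` is illustrative; the same call on the Mine 2 data can be used to check your own practice answer.

```python
# d2 constants from the table, indexed by sample size n
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326, 6: 2.534,
      7: 2.704, 8: 2.847, 9: 2.970, 10: 3.078}


def sigma_from_range(sample):
    """Estimate sigma as R / d2, where R is the sample range."""
    sample_range = max(sample) - min(sample)
    return sample_range / D2[len(sample)]


mine1 = [8260, 8130, 8350, 8070, 8340]
print(round(sigma_from_range(mine1), 1))  # 120.4
```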
Estimation of Variances
■ In practical applications, interval estimates of $\sigma$ or $\sigma^2$ are based on the
sample standard deviation or the sample variance. For random samples
from normal populations, we make use of Theorem 6.5, according to which
■ $\frac{(n-1)S^2}{\sigma^2}$
is a random variable having the chi square distribution with n-1 degrees of
freedom.
■ We can assert with probability $1-\alpha$ that the following inequality will be satisfied.
■ $\chi^2_{1-\alpha/2} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{\alpha/2}$
■ Solving this inequality for $\sigma^2$ gives the confidence interval for $\sigma^2$; taking
square roots gives the confidence interval for $\sigma$.
Estimation of Variances
■ $\frac{(n-1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
■ The confidence intervals for $\sigma$ and $\sigma^2$ obtained by taking equal tails, as in the above formula, do not actually give the
narrowest confidence interval, because the chi square distribution is not symmetrical.
■ Let’s look at an example to use these equations
■ Suppose the refractive indices of 20 pieces of glass (randomly selected from a large shipment purchased by the optical
firm) have a variance of $1.20 \times 10^{-4}$. Construct a 95% confidence interval for $\sigma$, the standard deviation of the population
sampled.
■ Solution: For 20-1=19 degrees of freedom, $\chi^2_{0.975} = 8.907$ and $\chi^2_{0.025} = 32.852$ according to Table 5 (the chi
square table). Substituting these values into the equation gives
■ $\frac{(19)(1.20 \times 10^{-4})}{32.852} < \sigma^2 < \frac{(19)(1.20 \times 10^{-4})}{8.907}$, or $0.0000694 < \sigma^2 < 0.000256$; taking square roots, $0.0083 < \sigma < 0.0160$
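The substitution for the refractive-index example can be carried out with a small Python sketch. The function name `variance_ci` is hypothetical, and the two chi square values are passed in from the table because the standard library has no chi square quantile function.

```python
from math import sqrt


def variance_ci(s_sq, n, chi2_lower, chi2_upper):
    """Equal-tails confidence interval for sigma^2.

    chi2_lower = chi^2_{1-alpha/2} and chi2_upper = chi^2_{alpha/2},
    both for n-1 degrees of freedom (table values).
    """
    return (n - 1) * s_sq / chi2_upper, (n - 1) * s_sq / chi2_lower


lo, hi = variance_ci(1.20e-4, 20, chi2_lower=8.907, chi2_upper=32.852)
print(f"{sqrt(lo):.4f} < sigma < {sqrt(hi):.4f}")  # 0.0083 < sigma < 0.0160
```

Note that the larger chi square value goes with the lower limit, because it sits in the denominator.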
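Hypothesis Concerning One Variance
■ Null hypothesis: $\sigma^2 = \sigma_0^2$
■ Alternative hypothesis: $\sigma^2 < \sigma_0^2$, $\sigma^2 > \sigma_0^2$, or $\sigma^2 \neq \sigma_0^2$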
■ This test is used when we want to control the uniformity of a product or operation.
■ The test statistic used for this is below
■ $\chi^2 = \frac{(n-1)S^2}{\sigma_0^2}$ with n-1 degrees of freedom
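■ Below are the critical regions
Critical Regions for Testing $\sigma^2 = \sigma_0^2$ (normal population)
| Alternative hypothesis | Reject null hypothesis if: |
|---|---|
| $\sigma^2 < \sigma_0^2$ | $\chi^2 < \chi^2_{1-\alpha}$ |
| $\sigma^2 > \sigma_0^2$ | $\chi^2 > \chi^2_{\alpha}$ |
| $\sigma^2 \neq \sigma_0^2$ | $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$ |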
Hypothesis Concerning One Variance
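■ Remember that the chi square distribution is not symmetrical; the two-tailed test uses equal
tails as a matter of mathematical convenience. For moderate degrees of freedom, the equal-tails
test and the test based on the narrowest interval are nearly the same.
■ Let’s take a look at an example; we will still be following the same 8 step procedure.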
■ The lapping process which is used to grind certain silicon wafers to the proper
thickness is acceptable only if $\sigma$, the population standard deviation of the thickness
of dice cut from the wafers, is at most 0.50 mil. Use the 0.05 level of significance to
test the null hypothesis $\sigma = 0.50$ against the alternative hypothesis $\sigma > 0.50$, if the
thicknesses of 15 dice cut from such wafers have a standard deviation of 0.64 mil.
■ Parameter of interest: the population standard deviation $\sigma$ of the thickness of dice
from the lapping process
■ Null hypothesis: $\sigma = 0.50$
■ Alternative hypothesis: $\sigma > 0.50$
■ Level of significance: $\alpha = 0.05$
■ Test Statistic: $\chi^2 = \frac{(n-1)S^2}{\sigma_0^2}$
Hypotheses Concerning One Variance
■ Criterion: Reject the null hypothesis if $\chi^2 > 23.685$, the value of
$\chi^2_{0.05}$ for 14 degrees of freedom
■ Calculation:
■ $\chi^2 = \frac{(15-1)(0.64)^2}{(0.50)^2} = 22.94$
■ Decision: Since χ2 =22.94 does not exceed 23.685, the null
hypothesis cannot be rejected; even though the sample
standard deviation exceeds 0.50, there is not sufficient
evidence to conclude that the lapping process is unsatisfactory.
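The lapping-process test can be checked with a minimal Python sketch. The function name `chi2_statistic` is illustrative, and the critical value 23.685 is copied from the chi square table for 14 degrees of freedom.

```python
def chi2_statistic(s, n, sigma0):
    """Test statistic chi^2 = (n - 1) * s^2 / sigma0^2 for H0: sigma = sigma0."""
    return (n - 1) * s ** 2 / sigma0 ** 2


chi2 = chi2_statistic(s=0.64, n=15, sigma0=0.50)
chi2_crit = 23.685  # chi^2_{0.05} for 14 degrees of freedom (table value)

print(round(chi2, 2))    # 22.94
print(chi2 > chi2_crit)  # False, so the null hypothesis cannot be rejected
```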
Hypotheses Concerning Two Variances
■ Back in chapter 8 we performed a two-sample t test that assumes the variances of the two
populations sampled are equal. This section of the book tests the null hypothesis $\sigma_1^2 = \sigma_2^2$,
which applies to independent random samples from two normal populations.
■ Statistic for test of equality of two variances (normal populations)
■ $F = \frac{S_1^2}{S_2^2}$
■ When the null hypothesis $\sigma_1^2 = \sigma_2^2$ is true, F has the F distribution with $n_1 - 1$ and
$n_2 - 1$ degrees of freedom, so the ratio of the sample variances $S_1^2$ and $S_2^2$ provides a
statistic on which tests of the null hypothesis can be based.
■ The critical regions are on the next page. Since the tables we have only show the
alpha values for the right-hand tail, for a left-tail test you can use the reciprocal of the
original test statistic and make use of the relation
■ $F_{1-\alpha}(\nu_1, \nu_2) = \frac{1}{F_{\alpha}(\nu_2, \nu_1)}$
Hypotheses Concerning Two Variances
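Critical regions for testing $\sigma_1^2 = \sigma_2^2$ (normal populations)
| Alternative hypothesis | Test statistic | Reject null hypothesis if: |
|---|---|---|
| $\sigma_1^2 < \sigma_2^2$ | $F = \frac{S_2^2}{S_1^2}$ | $F > F_{\alpha}(n_2 - 1, n_1 - 1)$ |
| $\sigma_1^2 > \sigma_2^2$ | $F = \frac{S_1^2}{S_2^2}$ | $F > F_{\alpha}(n_1 - 1, n_2 - 1)$ |
| $\sigma_1^2 \neq \sigma_2^2$ | $F = \frac{S_M^2}{S_m^2}$ | $F > F_{\alpha/2}(n_M - 1, n_m - 1)$ |
Here $S_M^2$ is the larger of the two sample variances and $S_m^2$ the smaller, with $n_M$ and
$n_m$ the corresponding sample sizes.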
Note: Similar to the chi squared test, equal tails are used in the two-tailed test as
a matter of mathematical convenience, even though the F distribution is not
symmetrical
■ Confidence Interval
■ $F_{1-\alpha/2}(n_1 - 1, n_2 - 1)\,\frac{s_2^2}{s_1^2} < \frac{\sigma_2^2}{\sigma_1^2} < F_{\alpha/2}(n_1 - 1, n_2 - 1)\,\frac{s_2^2}{s_1^2}$
Hypotheses Concerning Two Variances
■ It is desired to determine whether there is less variability in the silver
plating done by Company 1 than in that done by Company 2. If
independent random samples of size 12 of the two companies’ work
yield $s_1 = 0.035$ mil and $s_2 = 0.062$ mil, test the null hypothesis $\sigma_1^2 = \sigma_2^2$
against the alternative hypothesis $\sigma_1^2 < \sigma_2^2$ at the 0.05 level of
significance.
■ Parameter of interest: variability in silver plating
■ Null hypothesis: $\sigma_1^2 = \sigma_2^2$
■ Alternative hypothesis: $\sigma_1^2 < \sigma_2^2$
■ Level of Significance: $\alpha = 0.05$
■ Test Statistic: $F = \frac{S_2^2}{S_1^2}$ (the alternative is $\sigma_1^2 < \sigma_2^2$, so $S_2^2$ goes in the numerator)
Hypotheses Concerning Two Variances
■ Criterion: Reject the null hypothesis if $F > 2.82$, the value of $F_{0.05}$ for 11
and 11 degrees of freedom
■ Calculations: $F = \frac{(0.062)^2}{(0.035)^2} = 3.14$
■ Decision: Since F = 3.14 exceeds 2.82, the null hypothesis must be
rejected; in other words, the data support the contention that the
plating done by Company 1 is less variable than that done by
Company 2.
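A quick Python check of the silver-plating calculation is sketched below. The variable names are illustrative, and the critical value 2.82 is the table value of $F_{0.05}$ for 11 and 11 degrees of freedom quoted above.

```python
s1, s2 = 0.035, 0.062  # sample standard deviations in mil
n1 = n2 = 12

# Alternative is sigma1^2 < sigma2^2, so the statistic is S2^2 / S1^2
F = s2 ** 2 / s1 ** 2
F_crit = 2.82  # F_{0.05}(n2 - 1, n1 - 1) = F_{0.05}(11, 11), table value

print(round(F, 2))  # 3.14
print(F > F_crit)   # True, so reject the null hypothesis
```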
Hypotheses Concerning Two Variances
■ Let’s use this data again
■ Mine 1: 8,260 8,130 8,350 8,070 8,340
■ Mine 2: 7,950 7,890 7,900 8,140 7,920 7,840
■ Use the 0.02 level of significance to test whether it is reasonable
to assume that the variances of the two populations sampled are
equal.
■ Parameter of interest: variability of the measurements from the two mines
■ Null hypothesis: $\sigma_1^2 = \sigma_2^2$
■ Alternative hypothesis: $\sigma_1^2 \neq \sigma_2^2$
■ Level of Significance: $\alpha = 0.02$
■ Test Statistic: $F = \frac{S_1^2}{S_2^2}$, since $s_1^2$ is the larger sample variance here
Hypotheses Concerning Two Variances
■ Criterion: Reject the null hypothesis if $F > 11.4$, the value of $F_{0.01}$ for 4 and
5 degrees of freedom
■ Calculations: $F = \frac{15750}{10920} = 1.44$
■ Decision: Since F = 1.44 does not exceed 11.4, the null hypothesis
cannot be rejected; there is no real statistical reason to doubt the
equality of the variances of the two populations.
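The sample variances 15750 and 10920 used in the calculation can be recomputed from the raw mine data with a short Python sketch. It relies on `statistics.variance`, which divides by n-1; the critical value 11.4 is the table value quoted above.

```python
from statistics import variance

mine1 = [8260, 8130, 8350, 8070, 8340]
mine2 = [7950, 7890, 7900, 8140, 7920, 7840]

s1_sq = variance(mine1)  # sample variance, divisor n - 1
s2_sq = variance(mine2)

# Two-sided test: larger sample variance goes in the numerator
F = max(s1_sq, s2_sq) / min(s1_sq, s2_sq)
F_crit = 11.4  # F_{0.01}(4, 5), table value

print(s1_sq, s2_sq)
print(round(F, 2))  # 1.44
print(F > F_crit)   # False, so the null hypothesis cannot be rejected
```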
Hypotheses Concerning Two Variances
■ Let’s say we have the following data from an earlier problem based on making green
gasoline from sucrose. The equal sample sizes are $n_1 = n_2 = 9$, $s_1^2 = 0.4548$, and
$s_2^2 = 0.1089$. Obtain a 98% confidence interval for $\frac{\sigma_2^2}{\sigma_1^2}$.
■ Since the degrees of freedom for the F are $(n_1 - 1, n_2 - 1) = (8, 8)$ and $\alpha/2 = 0.01$, we
find $F_{0.01} = 6.03$ and $F_{0.99} = \frac{1}{F_{0.01}} = 1/6.03$. The 98% confidence interval for
$\frac{\sigma_2^2}{\sigma_1^2}$ becomes
■ $\left(\frac{1}{6.03}\cdot\frac{0.1089}{0.4548},\; 6.03\cdot\frac{0.1089}{0.4548}\right)$ or $(0.04, 1.44)$
■ The width of the interval illustrates the large amount of variability in variances when
sample sizes are small. The second variance $\sigma_2^2$ could be as small as one-twenty-fifth of
$\sigma_1^2$ or it could be larger than $\sigma_1^2$.
■ The procedures used in this chapter depend strongly on the assumption that the
underlying population is normal. The sampling variability of the sample variance can change
when the population departs from normality.
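The green gasoline interval above can be reproduced with a small Python sketch. The function name `variance_ratio_ci` and its argument names are hypothetical; the two F values are table lookups, and the lower limit uses the reciprocal relation $F_{1-\alpha/2}(\nu_1, \nu_2) = 1/F_{\alpha/2}(\nu_2, \nu_1)$.

```python
def variance_ratio_ci(s1_sq, s2_sq, f_rev, f_fwd):
    """Confidence interval for sigma2^2 / sigma1^2.

    f_rev = F_{alpha/2}(n2 - 1, n1 - 1), used through its reciprocal
    f_fwd = F_{alpha/2}(n1 - 1, n2 - 1)
    """
    ratio = s2_sq / s1_sq
    return ratio / f_rev, ratio * f_fwd


# Equal sample sizes, so both table values are F_{0.01}(8, 8) = 6.03
low, high = variance_ratio_ci(0.4548, 0.1089, f_rev=6.03, f_fwd=6.03)
print(round(low, 2), round(high, 2))  # 0.04 1.44
```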
IEGR 351
Directions: Please review the slides on Inference and the new slides I will put up on proportions.
Remember to show all 8 steps if doing testing to get full points.
Problem 1.
Using the data below estimate the 𝜎 for the Brinell hardness of Alloy 1 in terms of the sample range.
| Alloy 1 | Alloy 2 |
|---|---|
| 66.3 | 71.3 |
| 63.5 | 60.4 |
| 64.9 | 62.6 |
| 61.8 | 63.9 |
| 64.3 | 68.8 |
| 64.7 | 70.1 |
| 65.1 | 64.8 |
| 64.5 | 68.9 |
| 68.4 | 65.8 |
| 63.2 | 66.2 |
Problem 2.
Using the data below construct a 99% confidence interval for the variance of the yield of carbon chain
lengths n=9.
0.63 2.64 1.85 1.68 1.09 1.67 0.73 1.04 0.68
Problem 3.
If 12 determinations of the specific heat of iron have a standard deviation of 0.0086, test the null
hypothesis that 𝜎 = 0.010 for such determinations. Use the alternative hypothesis 𝜎 ≠ 0.010 and the
level of significance 𝛼 = 0.01.
Problem 4.
Pull strength tests on 10 soldered leads for a semiconductor device yield the following results in
pounds-force required to rupture the bond:
15.8 12.7 13.2 16.9 10.6 18.8 11.1 14.3 17.0 12.5
Another set of 8 leads was tested after encapsulation to determine whether the pull strength has been
increased by encapsulation of the device, with the following result:
24.9 23.6 19.8 22.1 20.4 21.6 21.8 22.5
Use 0.02 level of significance to test whether it is reasonable to assume that the two samples come from
populations of equal variances.
Problem 5.
In a random sample of 200 claims filed against an insurance company writing collision insurance for cars,
84 exceeded $3,500. Construct a 95% confidence interval for the true proportion of claims filed against
this insurance company that exceed $3,500 using the large sample confidence interval formula.
Problem 6.
In a recent study, 69 of 120 meteorites were observed to enter the earth’s atmosphere with a velocity of
less than 26 miles per second. If we estimate the corresponding true proportion as $\frac{69}{120} = 0.575$,
what can we say with 95% confidence about the maximum error?
Problem 7.
What is the smallest sample size required to estimate an unknown proportion of customers who
would pay for an additional service to within a maximum error of 0.06 with at least 95% confidence?
Problem 8.
A manufacturer of submersible pumps claims that at most 30% of the pumps require repairs within the
first 5 years of operation. If a random sample of 120 of these pumps includes 47 which require repairs
within the first 5 years, test the null hypothesis p = 0.30 against the alternative hypothesis p > 0.30 at
the 0.05 level of significance.