One-Way ANOVA Revisited
Module Six introduced ANOVA. The following is a summary:
An experimental design is the plan used to test a hypothesis.
The experimenter controls one or more independent variables. These variables are
called factors or treatment variables. In the Module Six example, the independent
variable was temperature.
Each independent variable has two or more levels, also called categories or
classifications. In the Module Six example, these were the individual temperatures
tested: 68°, 72°, and 76°.
The experimenter observes the effects of the independent variable on the dependent
variable. In the Module Six example, the level of production was the dependent
variable.
ANOVA evaluates the differences among the means of three or more populations using the
following assumptions:
Populations are normally distributed.
Populations have equal variances.
Samples are randomly and independently drawn.
In the Module Six example, the test was carried out under these assumptions, and the null hypothesis stated
that temperature had no effect on production levels. The calculations provided sufficient
evidence against the null hypothesis, so it was rejected in favor of the alternative
hypothesis that temperature has an effect on production levels.
Randomized Experiments
In a completely randomized design:
Experimental units (subjects) are assigned randomly to samples.
There is only one factor or independent variable.
QSO 530 Module 7
There are two or more treatment levels or categories.
The experiment is analyzed by a one-way ANOVA.
If all factor levels have equal sample sizes, the design is called a Balanced Design.
One-Way ANOVA Hypotheses
As discussed in Module Six, the null hypothesis, H0, states that all population means are
equal and there is no treatment effect. Thus,
$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$
where k is the number of treatment levels. The graph below demonstrates a true null hypothesis:
[Figure: μ1 = μ2 = μ3]
The alternative hypothesis, HA, states that not all of the population means are equal: at least
one population mean is different. This means there is a treatment effect. However, it does not
mean that all population means are different, as some pairs may be equal. The diagrams below
show a case where the null hypothesis is false but only one mean differs from the others.
[Figure: μ1 = μ2 ≠ μ3]
Finally, the null hypothesis can be false with all of the means different from one another.
Again, the alternative hypothesis is true, as seen below.
ANOVA of Randomized Experiments
Module Six showed that SS(total) is made up of SS(factor) and SS(error). This same idea is
presented as
SST = SSB + SSW
Where:
SST, the Total Sum of Squares, measures the aggregate dispersion of the individual data
values across all factor levels
SSB, the Sum of Squares Between, measures the dispersion among the factor sample
means
SSW, the Sum of Squares Within, measures the dispersion among the values
within each factor level
The diagram below shows how SST is made up of SSB and SSW.
The formula for SST is
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{\bar{x}}\right)^2$$
Where:
k = number of populations (or levels of treatment)
$n_i$ = sample size from population i
$x_{ij}$ = jth measurement from population i
$\bar{\bar{x}}$ = the grand mean, or the mean of all data values
The diagram below shows how the individual data points within each population are distributed
about the grand mean. The red line indicates the value of $\bar{\bar{x}}$ (the grand mean).
[Figure: Response, X, plotted for Group 1, Group 2, and Group 3]
Based on the above diagram, the following formula can be developed for SST:
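$$SST = (x_{11}-\bar{\bar{x}})^2 + (x_{12}-\bar{\bar{x}})^2 + \cdots + (x_{31}-\bar{\bar{x}})^2 + (x_{32}-\bar{\bar{x}})^2 + \cdots + (x_{kn_k}-\bar{\bar{x}})^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{\bar{x}})^2$$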
Notice that this expansion is the same double sum given in the formula for SST above. Again,
SST is made up of two components, SSB and SSW.
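To make the double sum concrete, the following minimal Python sketch computes SST directly from its definition. It borrows the club distances from the F-Test Example later in this module as sample data; the function name `total_sum_of_squares` is illustrative, not part of any library.

```python
# Club distances from the F-Test Example (one list per group/treatment level).
groups = [
    [254, 263, 241, 237, 251],  # Club 1
    [234, 218, 235, 227, 216],  # Club 2
    [200, 222, 197, 206, 204],  # Club 3
]


def total_sum_of_squares(groups):
    """SST: squared deviation of every x_ij from the grand mean, summed over i and j."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    # Outer loop over populations i = 1..k, inner loop over measurements j = 1..n_i.
    return sum((x - grand_mean) ** 2 for group in groups for x in group)


print(total_sum_of_squares(groups))  # 5836.0
```

The result matches the Total SS in the Excel output for the club example.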
Sum of Squares Between
Remember that SSB is the variation due to differences among the populations (levels of
treatment) in a randomized experiment. The diagram below shows how the mean of each
group is distributed about the grand mean.
[Figure: Response, X, for Group 1, Group 2, and Group 3, showing each group mean relative to the grand mean]
SSB sums the squared differences between each sample mean and the grand mean, with each
squared difference weighted by that group's sample size. Based on the above diagram, SSB is
computed as follows:
$$SSB = n_1(\bar{x}_1-\bar{\bar{x}})^2 + n_2(\bar{x}_2-\bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k-\bar{\bar{x}})^2 = \sum_{i=1}^{k} n_i(\bar{x}_i-\bar{\bar{x}})^2$$
Where:
SSB = Sum of Squares Between
k = number of populations (3 in the diagram above)
$n_i$ = sample size from population i
$\bar{x}_i$ = sample mean from population i
$\bar{\bar{x}}$ = grand mean (mean of all data values)
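The SSB formula can be sketched the same way. This hypothetical helper, `sum_of_squares_between`, weights each squared gap between a group mean and the grand mean by the group's size, again using the club distances from the F-Test Example as assumed sample data.

```python
groups = [
    [254, 263, 241, 237, 251],  # Club 1
    [234, 218, 235, 227, 216],  # Club 2
    [200, 222, 197, 206, 204],  # Club 3
]


def sum_of_squares_between(groups):
    """SSB: n_i * (sample mean_i - grand mean)^2, summed over the k groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)


print(round(sum_of_squares_between(groups), 1))  # 4716.4
```

If every group had the same sample mean, each term would be zero and SSB would be 0.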
Notice that the sample means generally differ from one another. SSB measures the part of the
total variation that is due to these differences among the populations. In the graph below,
the means μi and μj differ.
Dividing SSB by its degrees of freedom gives the Mean Square Between (MSB), the average
between-group variation. Thus,
$$MSB = \frac{SSB}{k-1}$$
Where:
k-1 is the degrees of freedom for SSB
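As a worked illustration, take the three-club F-Test Example later in this module, whose Excel output reports SSB = 4716.4 with k = 3 clubs:

$$MSB = \frac{SSB}{k-1} = \frac{4716.4}{3-1} = 2358.2$$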
Sum of Squares Within
The other component of SST is SSW (Sum of Squares Within). This value measures the
variation within each population. The diagram below shows how each data value varies about
the mean of its own group.
[Figure: Response, X, for Group 1, Group 2, and Group 3, showing each value's deviation from its group mean]
SSW sums the squared deviations of the values in each group from that group's mean and then
adds these sums over all groups. Based on the above diagram, the following formula for SSW is developed:
$$SSW = (x_{11}-\bar{x}_1)^2 + (x_{12}-\bar{x}_1)^2 + \cdots + (x_{kn_k}-\bar{x}_k)^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(x_{ij}-\bar{x}_i)^2$$
Where:
SSW = Sum of Squares Within
k = number of populations
$n_i$ = sample size from population i
$\bar{x}_i$ = sample mean from population i
$x_{ij}$ = jth measurement from population i
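A minimal sketch of SSW, again using the club distances as assumed sample data. Each value is compared with its own group mean rather than the grand mean. The last line checks the partition SST = SSB + SSW numerically; the helper names are illustrative only.

```python
import math

groups = [
    [254, 263, 241, 237, 251],  # Club 1
    [234, 218, 235, 227, 216],  # Club 2
    [200, 222, 197, 206, 204],  # Club 3
]


def sum_of_squares_within(groups):
    """SSW: (x_ij - mean of group i)^2, summed within each group and then over groups."""
    total = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        total += sum((x - group_mean) ** 2 for x in g)
    return total


def sst_and_ssb(groups):
    """Return (SST, SSB) so the partition SST = SSB + SSW can be checked."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    sst = sum((x - grand_mean) ** 2 for x in values)
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return sst, ssb


ssw = sum_of_squares_within(groups)
sst, ssb = sst_and_ssb(groups)
print(round(ssw, 1))                 # 1119.6
print(math.isclose(sst, ssb + ssw))  # True
```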
The Mean Square Within (MSW) is SSW divided by its degrees of freedom. Thus,
$$MSW = \frac{SSW}{N-k}$$
Where:
N-k is the degrees of freedom for SSW, and N is the total number of observations across all populations
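Continuing the same worked illustration, the Excel output for the club example reports SSW = 1119.6 for N = 15 measurements in k = 3 groups:

$$MSW = \frac{SSW}{N-k} = \frac{1119.6}{15-3} = 93.3$$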
F-Test Statistic
The F-test statistic is the ratio of the between-groups estimate of variance to the within-groups
estimate of variance. Thus,
$$F = \frac{MSB}{MSW}$$
Where:
MSB = the mean square between
MSW = the mean square within
This ratio, F, can never be negative. There are two different degrees of freedom related to
F, which are seen in the calculation of MSB and MSW. The degrees of freedom used to
calculate MSB, df1, is k-1 where k is the number of populations (levels of treatment); df1 is
typically small.
The degrees of freedom used to calculate MSW, df2, is N-k where N is the sum of sample
sizes from all populations. df2 is typically large.
If the null hypothesis H0: μ1 = μ2 = … = μk is true, MSB and MSW both estimate the same
population variance, so the ratio F will tend to be close to 1.
If H0 is false, MSB also reflects the differences among the population means, so F will tend
to be larger than 1.
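The claim that F stays near 1 under H0 and grows when H0 is false can be checked with a small simulation. The sketch below is an illustration under stated assumptions, not part of the module: it draws normal data with a common standard deviation (matching the ANOVA assumptions), uses three groups of five like the club example, and picks true means of 0, 0, 0 (H0 true) and 0, 0, 2 (H0 false) arbitrarily. The helpers `f_statistic` and `average_f` are hypothetical.

```python
import random
import statistics


def f_statistic(groups):
    """One-way ANOVA F = MSB / MSW for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n_total - k))


def average_f(true_means, n=5, sd=1.0, reps=2000, seed=1):
    """Average F over `reps` simulated experiments.

    Each experiment draws n normal values (common standard deviation sd)
    for every population mean in true_means.
    """
    rng = random.Random(seed)
    f_values = []
    for _ in range(reps):
        groups = [[rng.gauss(mu, sd) for _ in range(n)] for mu in true_means]
        f_values.append(f_statistic(groups))
    return statistics.mean(f_values)


print(round(average_f([0, 0, 0]), 2))  # H0 true: all three means equal
print(round(average_f([0, 0, 2]), 2))  # H0 false: third mean shifted by 2 sd
```

With df1 = 2 and df2 = 12, the theoretical long-run average of F under H0 is df2/(df2 - 2) = 1.2, so "close to 1" should be read as "on the order of 1" rather than exactly 1.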
The following table summarizes the above information as an ANOVA table for a randomized
experiment.
| Source of Variance | SS | df | MS | F Ratio |
|---|---|---|---|---|
| Between Samples | SSB | k-1 | MSB = SSB/(k-1) | F = MSB/MSW |
| Within Samples | SSW | N-k | MSW = SSW/(N-k) | |
| Total Variance | SST = SSB + SSW | N-1 | | |
Note in the above table that:
k = number of populations
N = sum of the sample sizes from all populations
df = degrees of freedom
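The whole table can be assembled in one pass. This is a minimal Python sketch under the same assumptions as the rest of this module (independent random samples, one list per population); `anova_table` is a hypothetical helper, and the club distances from the F-Test Example serve as sample input.

```python
# Club distances from the F-Test Example (assumed sample input).
groups = [
    [254, 263, 241, 237, 251],  # Club 1
    [234, 218, 235, 227, 216],  # Club 2
    [200, 222, 197, 206, 204],  # Club 3
]


def anova_table(groups):
    """Build the rows of a one-way ANOVA table (SS, df, MS, F)."""
    k = len(groups)                        # number of populations
    n_total = sum(len(g) for g in groups)  # N, total sample size
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return {
        "Between": {"SS": ssb, "df": k - 1, "MS": msb, "F": msb / msw},
        "Within": {"SS": ssw, "df": n_total - k, "MS": msw},
        "Total": {"SS": ssb + ssw, "df": n_total - 1},
    }


for source, row in anova_table(groups).items():
    print(f"{source:<8}", {name: round(value, 3) for name, value in row.items()})
```

For the club data this should reproduce the SS, df, MS, and F columns of the Excel output for the example (SS = 4716.4, 1119.6, and 5836.0; F ≈ 25.275).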
F-Test Example
Suppose a golf club manufacturer wants to know how its club compares with the competition.
The company randomly selects five measurements (driving distances) for each of three clubs
from trials on an automated driving machine. The company must determine whether, at the .05
significance level, there is a difference in the mean distance.
The following table provides the selected measurements for each club:
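| Club 1 | Club 2 | Club 3 |
|---|---|---|
| 254 | 234 | 200 |
| 263 | 218 | 222 |
| 241 | 235 | 197 |
| 237 | 227 | 206 |
| 251 | 216 | 204 |
| $\bar{x}_1$ = 249.2 | $\bar{x}_2$ = 226.0 | $\bar{x}_3$ = 205.8 |

Grand mean: $\bar{\bar{x}}$ = 227.0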
The graph below provides a scatter diagram of the club distances.
In this experiment, H0 states that there is no difference in the mean distance that the clubs
hit a golf ball. Thus, H0: μ1 = μ2 = μ3.
The alternative hypothesis is HA: not all μi are equal.
α = .05
df1 = k - 1 = 3 - 1 = 2
df2 = N - k = 15 - 3 = 12
Using the F-distribution table with df1 = 2 and df2 = 12, the critical value is Fα = F0.05 = 3.885.
This means that any value of F above 3.885 will cause H0 to be rejected.
Based on calculations, MSB = 2358.2 and MSW = 93.3. Thus,
$$F = \frac{MSB}{MSW} = \frac{2358.2}{93.3} = 25.275$$
Because F = 25.275 > Fα = 3.885, H0 is rejected: there is sufficient evidence at the .05
significance level that at least one population mean, μi, differs from the rest. This means that
at least one of the clubs hits the ball, on average, a greater or lesser distance than the others.
The diagram below shows the critical value in relation to the calculated F value.
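For the special case df1 = 2 (three groups, as here), the upper tail of the F distribution has a closed form, P(F > f) = (1 + 2f/df2)^(-df2/2), so the table value and a p-value can be checked without a lookup. The sketch below relies on that formula, which does not hold for other values of df1; in general, use an F table, Excel's F.INV.RT and F.DIST.RT functions, or `scipy.stats.f.ppf` and `scipy.stats.f.sf`. The helper names are hypothetical.

```python
def f_critical_df1_2(alpha, df2):
    """Upper-tail critical value F_alpha for an F(2, df2) distribution.

    Valid ONLY when df1 = 2 (k = 3 groups). Solves
    alpha = (1 + 2 f / df2) ** (-df2 / 2) for f.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


def f_p_value_df1_2(f, df2):
    """Right-tail p-value P(F > f) for F(2, df2); again valid only when df1 = 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


alpha, df2 = 0.05, 12   # values from the club example
f_calc = 2358.2 / 93.3  # MSB / MSW
f_crit = f_critical_df1_2(alpha, df2)

print(round(f_crit, 3))                       # 3.885
print(f"{f_p_value_df1_2(f_calc, df2):.2e}")  # 4.99e-05
print("reject H0" if f_calc > f_crit else "do not reject H0")  # reject H0
```

Both numbers agree with the F crit and P-value columns of the Excel output for this example.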
Microsoft Excel can also be used to calculate these values. The following is an example of the
ANOVA output that Excel creates for this data.
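SUMMARY

| Groups | Count | Sum | Average | Variance |
|---|---|---|---|---|
| Club 1 | 5 | 1246 | 249.2 | 108.2 |
| Club 2 | 5 | 1130 | 226 | 77.5 |
| Club 3 | 5 | 1029 | 205.8 | 94.2 |

ANOVA

| Source of Variation | SS | df | MS | F | P-value | F crit |
|---|---|---|---|---|---|---|
| Between Groups | 4716.4 | 2 | 2358.2 | 25.275 | 4.99E-05 | 3.885 |
| Within Groups | 1119.6 | 12 | 93.3 | | | |
| Total | 5836.0 | 14 | | | | |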
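Excel's SUMMARY columns can be reproduced with Python's standard `statistics` module. One point worth noticing: the Variance column is the sample variance (dividing by n - 1), and multiplying each group's variance by n - 1 and summing gives SSW. This is a minimal sketch; `summary_row` is a hypothetical helper.

```python
import statistics

# Club distances from the F-Test Example.
clubs = {
    "Club 1": [254, 263, 241, 237, 251],
    "Club 2": [234, 218, 235, 227, 216],
    "Club 3": [200, 222, 197, 206, 204],
}


def summary_row(values):
    """Count, Sum, Average, and Variance, as in Excel's SUMMARY table.

    statistics.variance is the *sample* variance (divides by n - 1).
    """
    return len(values), sum(values), statistics.mean(values), statistics.variance(values)


for name, values in clubs.items():
    count, total, average, variance = summary_row(values)
    print(f"{name}: {count} {total} {average:.1f} {variance:.1f}")

# (n_i - 1) * s_i^2 is each group's within-group sum of squares; their total is SSW.
ssw = sum((len(v) - 1) * statistics.variance(v) for v in clubs.values())
print(round(ssw, 1))  # 1119.6
```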