7-1 Discussion: Analysis of Variance (ANOVA) – Part 2

User Generated

Xhzzvr108

Mathematics

Description

1) Share with your classmates an example of when the F-statistic would be a useful value in data analysis. Provide details of what you would measure and how the F-statistic would be used.

Refer to the Discussion Rubric for directions on completing these discussions.


2) Calculate MSB and MSW for the example problem in the lecture. Compare with the value in the Excel ANOVA table of the F-Test example.

Unformatted Attachment Preview

QSO 530 Module 7

One-Way ANOVA Revisited

Module Six introduced ANOVA. The following is a summary:

- An experimental design is the plan used to test a hypothesis.
- The experimenter controls one or more independent variables. These variables are called factors or treatment variables. In the Module Six example, the independent variable was temperature.
- Each independent variable has two or more levels, also called categories or classifications. In the Module Six example, these were the individual temperatures tested: 68°, 72°, and 76°.
- The experimenter observes the effects of the independent variable on the dependent variable. In the Module Six example, the level of production was the dependent variable.

ANOVA evaluates the differences among the means of three or more populations using the following assumptions:

- Populations are normally distributed.
- Populations have equal variances.
- Samples are randomly and independently drawn.

In the Module Six example, the null hypothesis assumed the above, and as a result, stated that the temperature had no effect on production levels. However, the calculations showed the null hypothesis to be false. As a result, the null hypothesis was rejected, and the alternate hypothesis, that temperature has an effect on production levels, was accepted.

Randomized Experiments

When experiments are randomized by design:

- Experimental units (subjects) are assigned randomly to samples.
- There is only one factor or independent variable.
- There are two or more treatment levels or categories.
- The experiment is analyzed by a one-way ANOVA.

If all factor levels have equal sample sizes, it is called a Balanced Design.

One-Way ANOVA Hypothesis

As discussed in Module Six, the null hypothesis, $H_0$, assumes that all population means are equal and there is no treatment effect. Thus,

$$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$$

where k is the number of treatment levels.
The graph below demonstrates a true null hypothesis: $\mu_1 = \mu_2 = \mu_3$

The alternative hypothesis, $H_A$, assumes that not all of the population means are the same and at least one population mean is different. This means there is a treatment effect. However, it does not mean that all population means are different, as some pairs may be the same. The diagrams below show a false null hypothesis in which only one mean is different from the others: $\mu_1 = \mu_2 \neq \mu_3$

Finally, the null hypothesis can be false, and all means are different. Again, this results in a true alternate hypothesis, as seen below.

ANOVA of Randomized Experiments

Module Six showed that SS(total) is made up of SS(factor) and SS(error). This same idea is presented as

$$SST = SSB + SSW$$

Where:

- SST is the Total Sum of Squares and the aggregate dispersion of the individual data values across the various factor levels
- SSB is the Sum of Squares Between and the dispersion among the factor sample means
- SSW is the Sum of Squares Within and the dispersion that exists among the values within a particular factor level

The diagram below shows how SST is made up of both SSB and SSW values.

The formula for SST is

$$SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{\bar{x}})^2$$

Where:

- $k$ = number of populations (or levels of treatment)
- $n_i$ = sample size from population i
- $x_{ij}$ = jth measurement from population i
- $\bar{\bar{x}}$ = the grand mean or the mean of all data values

The diagram below shows how the individual data points within each population are distributed about the grand mean. The red line in the diagram indicates the value of $\bar{\bar{x}}$ (the grand mean).

[Figure: Response, X for Group 1, Group 2, and Group 3, with the grand mean drawn as a red line]

Based on the above diagram, the following formula can be developed for SST:

$$SST = (x_{11} - \bar{\bar{x}})^2 + (x_{12} - \bar{\bar{x}})^2 + \cdots + (x_{31} - \bar{\bar{x}})^2 + (x_{32} - \bar{\bar{x}})^2 + \cdots + (x_{k n_k} - \bar{\bar{x}})^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{\bar{x}})^2$$

Notice that the result is the original formula for SST that is provided above. Again, SST is made up of two components, SSB and SSW.
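The lecture states that SST splits into SSB and SSW without showing why. A short derivation sketch follows, writing $\bar{x}_i$ for the sample mean of level i and $\bar{\bar{x}}$ for the grand mean (this notation is an assumption chosen for the sketch). The two sums it ends with are the quantities the lecture goes on to define as SSW and SSB.

```latex
\begin{align*}
x_{ij} - \bar{\bar{x}}
  &= (x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{\bar{x}}) \\[4pt]
\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{\bar{x}})^2
  &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2
   + \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2
   + 2\sum_{i=1}^{k} (\bar{x}_i - \bar{\bar{x}})
      \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) \\[4pt]
  &= \underbrace{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2}_{SSW}
   + \underbrace{\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2}_{SSB}
\end{align*}
```

The cross term drops out because, within any level, the deviations from that level's own mean sum to zero: $\sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i) = 0$.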
Sum of Squares Between

Remember that SSB is the variation due to differences among the populations (levels of treatment) in a random experiment. The diagram below shows how the mean of each population is distributed about the grand mean.

[Figure: Response, X for Group 1, Group 2, and Group 3, showing each group mean relative to the grand mean]

SSB is the sum of the squared differences between each sample mean and the grand mean, with each term weighted by its sample size. Based on the above diagram, the formula for SSB is:

$$SSB = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2 = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{\bar{x}})^2$$

Where:

- SSB = Sum of Squares Between
- $k$ = number of populations (3 in the diagram above)
- $n_i$ = sample size from population i
- $\bar{x}_i$ = sample mean from population i
- $\bar{\bar{x}}$ = grand mean (mean of all data values)

Notice that each population may have a mean that is different from the means of the other populations. This variation is due to differences among the populations. In the graph below, there is a difference between the means $\mu_i$ and $\mu_j$.

Dividing SSB by its degrees of freedom gives the Mean Square Between (MSB). Thus,

$$MSB = \frac{SSB}{k - 1}$$

Where: $k - 1$ is the degrees of freedom

Sum of Squares Within

The other component of SST is SSW (Sum of Squares Within). This value provides an indicator of the variation within the population itself. The diagram below shows that each data value varies about the mean of its own population.

[Figure: Response, X for Group 1, Group 2, and Group 3, showing each data value relative to its group mean]

SSW sums the variation within each group and then adds over all groups. Based on the above diagram, the following formula for SSW is developed:

$$SSW = (x_{11} - \bar{x}_1)^2 + (x_{12} - \bar{x}_1)^2 + \cdots + (x_{k n_k} - \bar{x}_k)^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2$$

Where:

- SSW = Sum of Squares Within
- $k$ = number of populations
- $n_i$ = sample size from population i
- $\bar{x}_i$ = sample mean from population i
- $x_{ij}$ = jth measurement from population i

The Mean Square Within (MSW) is the mean of SSW.
Thus,

$$MSW = \frac{SSW}{N - k}$$

Where: $N - k$ is the degrees of freedom for the SSW value

F-Test Statistic

The F-Test Statistic is the ratio of the between estimate of variance and the within estimate of variance. Thus,

$$F = \frac{MSB}{MSW}$$

Where:

- MSB = the mean squares between variances
- MSW = the mean squares within variances

This ratio, F, must always be positive. There are two different degrees of freedom related to F, which are seen in the calculation of MSB and MSW. The degrees of freedom used to calculate MSB, df1, is k-1 where k is the number of populations (levels of treatment); df1 is typically small. The degrees of freedom used to calculate MSW, df2, is N-k where N is the sum of sample sizes from all populations. df2 is typically large. As a result, the ratio, F, will be close to 1 if the null hypothesis $H_0$ is true. Thus, if $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ is true, F will be close to 1. If $H_0$ is false, the ratio, F, will be larger than 1.

The following table summarizes the above information regarding Variance within a Random Experiment.

Source of Variance   SS                df      MS                    F Ratio
Between Samples      SSB               k - 1   MSB = SSB / (k - 1)   F = MSB / MSW
Within Samples       SSW               N - k   MSW = SSW / (N - k)
Total Variance       SST = SSB + SSW   N - 1

Note in the above table that:

- k = number of populations
- N = sum of the sample sizes from all populations
- df = degrees of freedom

F-Test Example

Suppose a club manufacturer wants to know how their club compares with the competition. The company will randomly select five measurements (distances) for each of the three clubs from trials on an automated driving machine. The company must determine if, at the .05 significance level, there is a difference in the mean distance. The following table provides the selected measurements for each club:

         Club 1   Club 2   Club 3
         254      234      200
         263      218      222
         241      235      197
         237      227      206
         251      216      204
Mean     249.2    226.0    205.8

That is, $\bar{x}_1 = 249.2$, $\bar{x}_2 = 226.0$, $\bar{x}_3 = 205.8$, and the grand mean is $\bar{\bar{x}} = 227.0$.

The graph below provides a scatter diagram of the club distances.
In this experiment, $H_0$ assumes that there is no difference in the distance that any of the clubs will hit a golf ball. Thus, $H_0: \mu_1 = \mu_2 = \mu_3$. The alternate hypothesis is $H_A$: the $\mu_i$ are not all equal.

α = .05
df1 = k - 1 = 3 - 1 = 2
df2 = N - k = 15 - 3 = 12

Using the F-Distribution tables, $F_\alpha = F_{0.05} = 3.885$. This means that a value of F above 3.885 will cause $H_0$ to be rejected. Based on calculations, MSB = 2358.2 and MSW = 93.3. Thus

$$F = \frac{MSB}{MSW} = \frac{2358.2}{93.3} = 25.275$$

$F > F_\alpha$ and $H_0$ is rejected because there is sufficient evidence that at least one population mean, $\mu_i$, differs from the rest. This means that at least one of the clubs hits, on average, at a greater or a lesser distance than the rest of the clubs. The diagram below shows the Critical Value in relation to the calculated F value.

MS Excel can also be used to calculate these values. Following is an example of the ANOVA chart that gets created in Excel.

SUMMARY
Groups   Count   Sum    Average   Variance
Club 1   5       1246   249.2     108.2
Club 2   5       1130   226       77.5
Club 3   5       1029   205.8     94.2

ANOVA
Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.885
Within Groups         1119.6   12   93.3
Total                 5836.0   14
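For the second discussion question, MSB and MSW can be recomputed from the raw club distances and checked against the Excel ANOVA table. The following is a minimal sketch in plain Python. The variable names are illustrative assumptions, and the critical value 3.885 is copied from the lecture rather than computed.

```python
# Distances from the lecture's F-Test example, one list per club.
clubs = {
    "Club 1": [254, 263, 241, 237, 251],
    "Club 2": [234, 218, 235, 227, 216],
    "Club 3": [200, 222, 197, 206, 204],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


all_values = [x for group in clubs.values() for x in group]
grand_mean = mean(all_values)   # mean of all data values
k = len(clubs)                  # number of populations (treatment levels)
n_total = len(all_values)       # N, the sum of the sample sizes

# SSB: squared distance of each sample mean from the grand mean,
# weighted by that sample's size.
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in clubs.values())

# SSW: squared distance of each value from its own sample mean.
ssw = sum((x - mean(g)) ** 2 for g in clubs.values() for x in g)

msb = ssb / (k - 1)         # df1 = k - 1 = 2
msw = ssw / (n_total - k)   # df2 = N - k = 12
f_stat = msb / msw

F_CRIT = 3.885  # critical value quoted in the lecture, not computed here

print(f"SSB = {ssb:.1f}, SSW = {ssw:.1f}")
print(f"MSB = {msb:.1f}, MSW = {msw:.1f}")
print(f"F = {f_stat:.3f}, reject H0: {f_stat > F_CRIT}")
```

The results agree with the Excel table: SSB = 4716.4 and SSW = 1119.6, so MSB = 4716.4 / 2 = 2358.2 and MSW = 1119.6 / 12 = 93.3, giving F = 25.275.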

Explanation & Answer

Attached.

Surname 1

Name
Supervisor
Course
Date

1. Use of the F-statistic in data analysis

The F-statistic is the value you get when you run an ANOVA test or a regression analysis to find out
whether there is a significant difference between the means of two or more populations. An F-test is
also run to tell whether two or more variables are jointly significant (Douglas).
The F-statistic is used to decide whether to reject or fail to reject a null hypothesis about the
populations being compared. It is a ratio of two mean squares of the data (Bolin).
Example
A bank manager wanted to know whether the performance of tellers is the same in two different
branches of the same bank. He moved two tellers from branch A to branch B to conduct the test and
collected the following data on the performance of the tellers in the two branches.
A: 156, 278, 134, 202, 236, 198, 187, 199, 143, 165, 223
B: 345, 332, 309, 367, 388, 312, 355, 363, 381
The manager decides to use the F-test to determine whether performance in the two branches is the same.
H0: The performance of the tellers in the two branches is the same.
H1: The performance of the tellers in the two branches is not the same.
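One way the manager's F-statistic could be computed is sketched below. It assumes the test is treated as a one-way ANOVA with two groups (branch A and branch B), which may not be the exact procedure the answer goes on to use. The helper `one_way_f` is a hypothetical name introduced only for this illustration.

```python
# Teller performance figures from the example, one list per branch.
branch_a = [156, 278, 134, 202, 236, 198, 187, 199, 143, 165, 223]
branch_b = [345, 332, 309, 367, 388, 312, 355, 363, 381]


def one_way_f(*groups):
    """Return (F, df1, df2) for a one-way ANOVA over the given samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]

    # Between-group and within-group sums of squares.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df1 = len(groups) - 1                # k - 1
    df2 = len(all_values) - len(groups)  # N - k
    return (ssb / df1) / (ssw / df2), df1, df2


f_stat, df1, df2 = one_way_f(branch_a, branch_b)
print(f"F = {f_stat:.2f} with df1 = {df1} and df2 = {df2}")
```

The resulting F would then be compared with the critical value from the F-distribution table for df1 = 1 and df2 = 18 at the chosen significance level. H0 is rejected if F exceeds that value.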

He decides to use a 5% level of significance.
The first step is to calculate the sample means.
B: (345+332+309+367+388+312+355+363+381)/9
=...

