1. Analysis of Variance
We have discussed how to develop interval estimates and how to conduct hypothesis tests for situations
involving a single population mean and a single population proportion. But what if we wanted to compare
the means of two different populations? For example, what if we wanted to learn whether there is a
difference in post-graduate earnings between men and women? To identify any differences, we would
take two independent random samples, one from the population of men and one from the population of
women, and calculate the mean of each sample. If we assume that the population standard deviations σ1
and σ2 are known, we would subtract the two sample means to obtain the point estimator of the
difference, x̄1 − x̄2, and determine its standard error using the formula:
$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
where n1 and n2 are the sample sizes. As before, we would then choose the significance level and find
the margin of error.
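The known-standard-deviation case can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name `two_sample_z` and all of the earnings figures are hypothetical and do not come from any study in this text.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Compare two means when both population standard deviations are known."""
    point_estimate = mean1 - mean2
    # Standard error of the difference between the two sample means.
    standard_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    # Critical z value for the requested confidence level (two-sided).
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * standard_error
    # Test statistic and two-tailed p-value for H0: mu1 - mu2 = 0.
    z = point_estimate / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "point_estimate": point_estimate,
        "standard_error": standard_error,
        "interval": (point_estimate - margin, point_estimate + margin),
        "z": z,
        "p_value": p_value,
    }


# Hypothetical inputs: sample means, assumed-known sigmas, and sample sizes.
result = two_sample_z(62_000, 58_000, 9_000, 8_000, 100, 100)
print(result)
```

With these made-up numbers, the interval estimate does not contain 0 and the p-value is below .05, so the two views of the same test agree.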
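If, however, the population standard deviations are not known, we can use the sample standard
deviations s1 and s2 (from samples of sizes n1 and n2) and the formula:
$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
We can then calculate the degrees of freedom for t based on the formula:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
and determine the range for the p-value. In an actual study, we would state the hypotheses, select
the samples, and then run the statistical analysis. For the example above, let: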
μ1= the population mean post-graduate earnings of men
μ2= the population mean post-graduate earnings of women
The hypotheses would then be written as follows:
H0: μ1−μ2 = 0 (There is no difference in post-graduate earnings between men and women.)
Ha: μ1−μ2 ≠ 0 (There is a difference in post-graduate earnings between men and women.)
We could then use a simple random selection procedure to select data from men and women. For
example, we could take a list of men and a list of women and randomly draw names from each list.
We would then calculate the mean post-graduate earnings for the men and for the women, calculate the
sample standard deviations, compute the test statistic and the degrees of freedom, and find the p-value
for a two-tailed test. At a 95% confidence level (α = .05), H0 is rejected if the p-value is less than or
equal to .05. Otherwise, H0 is not rejected. For example, a p-value between .05 and .10 would mean
that H0 is not rejected.
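The unknown-standard-deviation calculation can be sketched as follows. The sketch assumes that the degrees-of-freedom formula intended here is the usual unequal-variance (Welch-Satterthwaite) approximation. The function name `welch_t` and the earnings samples are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(sample1, sample2):
    """Return (t, df) for the difference of two means with unknown sigmas."""
    n1, n2 = len(sample1), len(sample2)
    # Each sample's contribution to the squared standard error: s^2 / n.
    v1 = stdev(sample1) ** 2 / n1
    v2 = stdev(sample2) ** 2 / n2
    standard_error = sqrt(v1 + v2)
    # Test statistic for H0: mu1 - mu2 = 0.
    t = (mean(sample1) - mean(sample2)) / standard_error
    # Approximate degrees of freedom (assumed Welch-Satterthwaite form).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical post-graduate earnings, in thousands of dollars.
men = [52, 61, 58, 66, 49, 63]
women = [50, 57, 54, 60, 47, 59]

t_stat, dof = welch_t(men, women)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

The degrees of freedom are normally rounded down before looking up the p-value range in a t table. If SciPy is available, `scipy.stats.ttest_ind(men, women, equal_var=False)` performs the same kind of test and reports the p-value directly.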
In general, the hypotheses for the difference of two means are written as
Two-Tailed:
H0: μ1= μ2
Ha: μ1≠ μ2 or μ1−μ2≠ 0
Lower Tail:
H0: μ1≥ μ2
Ha: μ1< μ2 or μ1−μ2< 0
Upper Tail:
H0: μ1≤ μ2
Ha: μ1> μ2 or μ1−μ2> 0
The data generated by such studies can be analyzed with the statistical procedure
called analysis of variance, or ANOVA. ANOVA can be used to test whether the means of three or more
populations are equal, in much the same way that we have already done with two populations.
To use ANOVA, you must assume the following:
Criterion 1
Each population has a normal distribution.
Criterion 2
The variance is the same for all populations.
Criterion 3
The observations are independent.
ANOVA is used here when the data are divided into groups according to only one factor, such as the type
of degree earned, and we compare the group means of a response such as post-graduate pay. The
analysis usually asks two questions:
a. Is there a significant difference between the groups?
b. If there is a difference, which groups are significantly different from which others?
In other words, to determine whether or not the population means are equal, we compare the
variability among the sample means with the variability of the data within each sample.
Statistical tests are then provided to compare group means, group medians, and group standard
deviations.
For example, if we have three sets of data on post-graduate pay increases, one each for graduates with
an M.S. in Leadership, an M.S. in Management, and an M.S. in Organizational Leadership, and we wanted
to know if there was a difference in mean pay increase among them, we would use ANOVA. For this
example, we would let:
μ1= the population mean post-graduate pay increase of graduates with an M.S. in Leadership
μ2 = the population mean post-graduate pay increase of graduates with an M.S. in Management
μ3 = the population mean post-graduate pay increase of graduates with an M.S. in Organizational
Leadership
The hypotheses would then be written as follows:
H0: μ1=μ2=μ3 (The population means are equal.)
Ha: Not all the population means are equal to each other.
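After calculating the sample means, we would compute two estimates of the population variance. The
between-groups estimate is based on the variability among the sample means: with equal sample sizes,
it is (Number of Data in Each Sample)(Sample Variance of the Sample Means). The within-groups
estimate is the average of the sample variances (the sample standard deviations squared). We would
then take the ratio of the between-groups estimate to the within-groups estimate. If the null hypothesis
is true, the ratio should be close to 1. If the ratio is much larger than 1, so that it exceeds the critical
value of the F distribution, we can reject the null hypothesis.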
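The ratio test can be illustrated with a short Python sketch. It assumes equally sized samples, and the pay-increase figures (in thousands of dollars) and the function name `one_way_anova_f` are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
from statistics import mean, variance


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for equally sized groups."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal sample sizes")
    group_means = [mean(g) for g in groups]
    # Between-groups estimate: sample size times the variance of the sample means.
    between = n * variance(group_means)
    # Within-groups estimate: the average of the sample variances.
    within = mean(variance(g) for g in groups)
    return between / within, k - 1, k * (n - 1)


# Hypothetical pay increases for the three degree programs.
leadership = [4.0, 5.0, 6.0, 5.0]
management = [6.0, 7.0, 8.0, 7.0]
org_leadership = [5.0, 6.0, 7.0, 6.0]

f_ratio, df_between, df_within = one_way_anova_f(
    [leadership, management, org_leadership]
)
print(f"F = {f_ratio:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

For these made-up samples the ratio is about 6 with (2, 9) degrees of freedom. The tabled critical F value at the .05 level for those degrees of freedom is about 4.26, so H0 would be rejected.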
2. Analysis of Correlation
The test of independence seeks to identify whether or not variables in a given sample have a
relationship. For example, does being a man or a woman impact the amount of post-graduate pay
increase? The following is a contingency table with data collected for graduates:
Pay increase          Male    Female
$0-$3,000              456       709
$3,001-$6,000          302       958
$6,001-$9,000          909       387
$9,001-$12,000         536       282
More than $12,000      736       815
The hypotheses are:
H0: Gender and post-graduate pay are independent of each other
Ha: Gender and post-graduate pay are dependent
In general, if one makes a contingency table (as above), the hypotheses for tests of independence are
written as:
H0: The row and column are independent of each other
Ha: The row and column are dependent
The test statistic used has the same form as in the chi-square goodness-of-fit test. The principle behind
the test for independence is the same as the principle behind the goodness-of-fit test. In fact, you can
think of the test for independence as a goodness-of-fit test where the data are arranged in a table called
a contingency table.
For a chi-square goodness-of-fit test, we determine whether or not the population proportions are
equal to given values b1, b2, ..., bk (where all the bi's add up to 1):
H0: p1 = b1, p2 = b2, p3 = b3, ..., pk = bk
Ha: H0 is not true
or whether all the population proportions are equal to each other:
H0: p1 = p2 = p3 = ... = pk
Ha: H0 is not true
The test statistic for goodness of fit compares the sample of observed results with the expected results
under the assumption that the null hypothesis is true. The formula is
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
where
fi = observed frequency for category i
ei = expected frequency for category i
k = the number of categories
Note: The test statistic has a chi-square distribution with k – 1 degrees of freedom provided that the
expected frequencies are 5 or more for all categories.
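A small Python sketch of the goodness-of-fit calculation follows. The observed counts and hypothesized proportions are hypothetical, as is the function name `goodness_of_fit`.

```python
def goodness_of_fit(observed, proportions):
    """Return (chi_square, degrees_of_freedom) for a goodness-of-fit test."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must add up to 1")
    n = sum(observed)
    # Expected frequency = hypothesized proportion times the sample size.
    expected = [n * p for p in proportions]
    if min(expected) < 5:
        raise ValueError("every expected frequency should be 5 or more")
    # Sum of (observed - expected)^2 / expected over all k categories.
    chi_square = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return chi_square, len(observed) - 1


# Hypothetical sample of 200 observations in k = 3 categories.
observed_counts = [48, 98, 54]
hypothesized = [0.30, 0.50, 0.20]

chi_square, dof = goodness_of_fit(observed_counts, hypothesized)
print(f"chi-square = {chi_square:.2f} with {dof} degrees of freedom")
```

Here the expected frequencies are 60, 100, and 40, so the statistic is 2.40 + 0.04 + 4.90 = 7.34 with 2 degrees of freedom. The tabled critical value at the .05 level for 2 degrees of freedom is 5.991, so H0 would be rejected for this made-up sample.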
The following are properties of the test for independence:
The data are the observed frequencies.
The data are arranged into a contingency table that lists all combinations of the categories of
the two variables.
The degrees of freedom are the product of the degrees of freedom for the row variable and the
degrees of freedom for the column variable: (number of rows − 1)(number of columns − 1).
It is always a right tail test.
It has a chi-square distribution.
The expected value is computed by taking the row total times the column total and dividing by
the grand total.
The value of the test statistic doesn't change if the order of the rows or columns is switched.
The value of the test statistic doesn't change if the rows and columns are interchanged.
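The test statistic for independence is
$$\chi^2 = \sum_{i}\sum_{j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$
where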
fij = observed frequency for contingency table category in row i and column j
eij = expected frequency for contingency table category in row i and column j based on the assumption
of independence
Note: With n rows and m columns in the contingency table, the test statistic has a chi-square
distribution with (n - 1)(m -1) degrees of freedom provided that the expected frequencies are five or
more for all categories.
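Applied to the gender and pay-increase contingency table given earlier, the calculation can be sketched in Python as follows. The counts are the ones in the table. The function name `chi_square_independence` and the choice of a .05 significance level are assumptions made for illustration.

```python
def chi_square_independence(table):
    """Return (chi_square, df, expected) for a test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected frequency = row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_square = sum(
        (f - e) ** 2 / e
        for f_row, e_row in zip(table, expected)
        for f, e in zip(f_row, e_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df, expected


# Rows: pay-increase brackets; columns: Male, Female.
pay_table = [
    [456, 709],  # $0-$3,000
    [302, 958],  # $3,001-$6,000
    [909, 387],  # $6,001-$9,000
    [536, 282],  # $9,001-$12,000
    [736, 815],  # More than $12,000
]

chi_square, dof, expected = chi_square_independence(pay_table)
CRITICAL_VALUE = 9.488  # tabled chi-square value for alpha = .05 and 4 df
print(f"chi-square = {chi_square:.1f} with {dof} degrees of freedom")
print("reject H0" if chi_square > CRITICAL_VALUE else "do not reject H0")
```

With 5 rows and 2 columns there are (5 − 1)(2 − 1) = 4 degrees of freedom. The statistic for this table is far above the critical value, so H0 (independence) would be rejected at the .05 level.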
So, to conduct a goodness-of-fit test or a test of independence, we would follow these steps:
Step 1
State the null and alternative hypotheses.
Step 2
Select a random sample and record the observed frequencies.
Step 3
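Determine the expected frequency in each category. For a goodness-of-fit test, multiply the category
probability by the sample size. For a test of independence, multiply the row total by the column total
and divide by the grand total.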
Step 4
Compute the test statistic.
Step 5
Reject or do not reject H0 at the chosen level of significance.
Variables in statistical data are often related. Your ability to test for such relationships will
allow you to draw conclusions from the information that can be helpful in understanding the
characteristics of your studied population.