Current Date
From: Your Name
To: InstructorSalutation InstructorLastName (e.g. Dr. Sherman)
Subj: Name of your issue analysis (edit this title)
1. Identify the issue. In this paragraph, summarize the analysis project. In other words, identify the issue
under investigation so that the reviewer knows what this analysis is about. This could be a problem or an
opportunity. What is the “issue” or “opportunity” that is better understood by examining this question
(you can be creative here, tie it back to a business function or need).
2. Analyze the issue. In this paragraph, briefly describe: a. where the data came from b. the steps you
took to prepare the data for analysis c. how you analyzed the data d. the results of your analysis i. you
should have a table with summary statistics ii. you should have some type of graphic (histogram, box
plot, scatter graph, etc.) that illustrates your findings
3. Discussion. In this paragraph, in clear language, what did you find in your analysis, and what are the
implications of your findings (for this exercise, you can be creative). Also, briefly discuss which tool you
liked better to do the ANOVA (Excel, or SAS). Why?
4. Conclusion. In this paragraph, briefly discuss what you learned in doing this exercise.
Data Analysis Exercise 4 (DAX4)
ANOVA Comparison between SAS and Excel
In this exercise, we will compare doing an analysis on the same dataset in Excel and SAS, and then report on our analysis.
The data that we will use for this analysis is called SASHELP.CARS, and the analysis we will do is an ANOVA. I have
explored the dataset and have developed a research question: Is there a statistically significant difference in the weight
of sedans based on their origin? To find the answer to this question, I am going to analyze the dataset and perform an
ANOVA comparing the sample means of the weight variable by continent of origin (USA, Asia, or Europe).
An ANOVA is a set of statistical methods used mainly to compare the means of two or more samples. Estimates of
variance are the key intermediate statistics calculated, hence the reference to variance in the title ANOVA.1
In testing the null hypothesis for a One-Way ANOVA, we generally follow these steps:
1. State the plain language research question: Is the average weight of sedans in this dataset statistically similar
regardless of origin?
2. State the hypotheses:
• Null hypothesis – HO: µAsia = µEurope = µUSA
• Alternative hypothesis – HA: Not all means are equal.
3. State the criteria for rejecting HO:
• α = 0.05
4. Test the assumptions for the One-Way ANOVA:
• Assumption of Independence (Research Design)
• Assumption of Normality (Shapiro-Wilk Pr α, then retain the null hypothesis
• If p < α, then reject the null hypothesis, and accept the alternative hypothesis
Please watch the video linked to this module for detailed instructions. In general, you are going to:
1. Download “Business Memo Template DAX4.docx” from the DAX4 Assignment.
2. Save the downloaded file as a Word file and name it “LastName DAX4” without the quotation marks. The file
extension will be added automatically. For example, my file would be “Bohler DAX4.docx”. Again, ignore the
quotation marks in this part of the instructions.
3. Edit the date, from, and to fields of the memo to have the correct information.
4. For the subject area, change the title to something like “Examination of Sedan Weight by Origin” or something
similar.
5. In paragraph 1, Identify the issue, discuss your research question. Why do you care if there is any difference by
weight of sedans by country of origin? Be creative, but logical. Try to tie your interest in this question to some
business function (Finance & Accounting, Sales & Marketing, Human Resources, or Manufacturing &
1
http://www.statsdirect.com/help/analysis_of_variance/anova.htm
1
Data Analysis Exercise 4 (DAX4)
ANOVA Comparison between SAS and Excel
Operations). Clearly state your research question “Is there a statistically significant difference in the weight of
sedans based on their origin?” Clearly state the statistical question (you can copy and paste the following):
•
Null hypothesis – HO: µAsia = µEurope = µUSA
•
Alternative hypothesis – HA: Not all means are equal.
6. In paragraph 2, Analyze the issue, discuss your analysis. This is your “methodology” or recipe for analyzing the
data to answer your research question. The following steps will help you fill out paragraph 2.
7. Open SAS.
8. Find the SASHELP.CARS data
9. Explore the SASHELP.CARS data (e.g. use the Characterize Data task). Choose the Origin and Weight variables to
explore. Run the task (F3). Check the LOG for Errors, Warnings, and Notes, if there are any issues, fix the
problem and rerun the task. Look at the OUTPUT. What are the values for “Origin”? What is the frequency for
each “Origin” value? What is the N, Minimum, Mean, Maximum, and Standard Deviation of Weight for the
dataset? Are there any missing values? Add any appropriate charts and tables to your Business Memo.
10. To perform an ANOVA, we need to make a few assumptions:
a. Independence: The observations are random and independent samples from the population.
b. Normality: Each group sample is drawn from a normally distributed population (errors are normally
distributed).
c. Homogeneity of variance: The variances of the distributions in the populations are equal.
11. Test for Independence: This is a methodological concern and is determined by the set up for the study. For this
assignment, please consider that the assumption of independence has been met in the research study design.
12. Test for Normality: In SAS, run the “Summary Statistics” task. On the DATA tab, select the “SASHELP.CARS”
dataset. Use Weight for the “Analysis Variable” and “Origin” for the “Classification Variable”. On the OPTIONS
tab, expand “Statistics”, expand “Basic Statistics” and choose all options (e.g. Mean, Standard Deviation, etc.).
Expand “Additional Statistics” and choose “Standard Error”, “Variance”, “Confidence limits for the mean”
(Confidence level: 95%), “Coefficient of variation”, “Skewness”, and “Kurtosis”. Expand PLOTS and choose
“Histogram” and “Add normal density curve”, “Comparative box plot”, and add inset statistics (overall and
group). Run the task (F3). Check the LOG for Errors, Warnings, and Notes, if there are any issues, fix the problem
and rerun the task. Look at the OUTPUT. Do you think we met the assumption of Normality? There are many
things to consider (sample size for example) when determining the answer to this question, but some general
rules of thumb to consider include:
a. Is Skewness more than ± 0.8? If yes, that is a concern.
b. Is Kurtosis more than ± 3.0? If yes, that is a concern.
2
Data Analysis Exercise 4 (DAX4)
ANOVA Comparison between SAS and Excel
c. Do the histograms appear normal (more or less)? If they do not approximate the normal distribution
curve, that could be an issue that indicates that the normality assumption for an ANOVA is invalid.
d. Look at the boxplots. Are there any extreme outliers?
e. If your data does not violate these rules of thumb, you can accept the ANOVA assumptions have been
satisfied for this analysis.
Additional test for normality
(Optional for this exercise)
Open the “t Test” task, and on the DATA tab, select the SASHELP.CARS dataset, One-sample test, and select
“Weight” as the “Analysis variable”. Click on the Filter icon and the Type = “Sedan” filter by typing it into the text
box. On the OPTIONS tab, make sure “Tails:” is “Two-tailed test” and the “Tests for normality” box is checked. Run
the tasks. Scroll down the input to you find the “Tests for Normality” table and review the “p Value” for the
“Shapiro-Wilk” statistic. If this value is greater than alpha (α = 0.05) then, accept the assumption that the data came
from a normally distributed population. If the Shapiro-Wilk p Value is less than α, then you need to consider if your
assumption of normality is valid.
In our example, the Shapiro-Wilk p Value is 0.0113 which is less than 0.05, which would mean that we should reject
the null hypothesis that the population that Weight came from was normally distributed. However, Shapiro-Wilk is
very conservative, and ANOVA is robust against the assumption of normality given a large enough data set and the
group sizes are approximately equal. For our example: Asia n=94, Europe n=78, and USA n=90. That means in our
case, we can consider our population as nearly normal in distribution and proceed with our analysis.
If there no other concerns from the examination of the distribution of the data, then you can consider the
assumption of normality to have been met.
13. Test for homogeneity of variance: We will examine this assumption by running the ANOVA task, alternatively
you can run the “Distribution Analysis” task, but it is more involved and most of the information we will need
can be found in the OUTPUT from the ANOVA Diagnostic Plots.
14. Select the “One-Way ANOVA” task.
15. Perform an ANOVA on the SASHELP.CARS. On the DATA tab, select the SASHELP.CARS dataset.
16. Click on the filter icon and in the text box that opens input the following: Type = “Sedan”
17. Select data using looking at “Weight” as the Dependent variable and the “Origin” variable as providing the
categories. Under the OPTIONS tab, select all plots as Individual Plots, and leave the other options to default.
18. Run the task (F3). Is the mean Weight statistically different for the different values of “Origin” in the filtered
(Sedan) data set? Report your findings in the Business Memo.
19. Download the SASHELP.CARS data set from the SAS Library. Refer to DAX 3 if you need help remembering how
to download data from SAS.
20. Open Excel. If you have not already done so, enable the Data Analysis tools in Excel.
21. Open the SASHELP.CARS data file in Excel.
3
Data Analysis Exercise 4 (DAX4)
ANOVA Comparison between SAS and Excel
22. Prepare the data for analysis in Excel (Excel requires the data be in columns), by putting the Origin and Weight
data into a separate sheet. Rename the sheet “ANOVA Data”. I suggest using the Excel Table function to help
organize and sort your data.
23. Run the Excel “Descriptive Statistics” task on the data, compare your results to the SAS descriptive stats.
24. Perform an ANOVA on the SASHELP.CARS data looking at “Origin” (Asia, Europe, and USA) and Weight. Is the
Weight of the three values of “Origin” statistically different in this data set?
25. In paragraph 3, Discussion, record the findings of your analysis.
26. Compare the results from both Excel and SAS. Which tool did you like best for your analysis? Why? Report your
observations in the Discussion area.
27. Any table, chart, or figure you include in your report must have a caption, and that table, chart, or figure must
be referenced in your discussion. Include any relevant details (P values, tables, etc.) in your discussion to
adequately justify any conclusions.
28. Spell check and grammar check your report. Your report should not exceed three pages, have at least one table,
and one graphic.
29. In paragraph 4, Conclusion, clearly answer your research question, “Is the average weight of sedans in this
dataset statistically similar regardless of origin?” and tie it back to your interest in this analysis (your creative
writing from paragraph 1).
30. Save your Word file as a PDF file as “LastName DAX4” without the quotation marks. The file extension will be
added automatically. For example, my file would be “Bohler DAX4.pdf”. Again, ignore the quotation marks in the
instructions.
31. Upload your completed Business Memo to this assignment. Now you know how to do an ANOVA in SAS and
Excel! Try this exercise using a different dataset to see if you understand how to apply this new knowledge and
skills to other real-world questions.
4
Purchase answer to see full
attachment