Week 6 Assignment

Anonymous

Question Description

This signature assignment is designed to align with specific program student learning outcome(s) in your program. Program Student Learning Outcomes are broad statements that describe what students should know and be able to do upon completion of their degree. The signature assignments might be graded with an automated rubric that allows the University to collect data that can be aggregated across a location or college/school and used for program improvements.

Purpose of Assignment

The purpose of this assignment is for students to synthesize the concepts learned throughout the course. This assignment will provide students an opportunity to build critical thinking skills, develop businesses and organizations, and solve problems requiring data by compiling all pertinent information into one report.

Assignment Steps

Resources: Microsoft Excel®, Signature Assignment Databases, Signature Assignment Options, Part 3: Inferential Statistics

Scenario: Upon successful completion of the MBA program, say you work in the analytics department for a consulting company. Your assignment is to analyze the following databases:

• Consumer Food

Select one of the databases based on the information in the Signature Assignment Options (Option 3 has already been selected).

Option 3: Consumer Food

The consumer food database contains five variables: Annual Food Spending per Household, Annual Household Income, Non-Mortgage Household Debt, Geographic Region of the U.S. of the Household, and Household Location. There are 200 entries for each variable in this database representing 200 different households from various regions and locations in the United States. Annual Food Spending per Household, Annual Household Income, and Non-Mortgage Household Debt are all given in dollars. The variable Region tells in which one of four regions the household resides. In this variable, the Northeast is coded as 1, the Midwest is coded 2, the South is coded as 3, and the West is coded as 4. The variable Location is coded as 1 if the household is in a metropolitan area and 2 if the household is outside a metro area. The data in this database were randomly derived and developed based on actual national norms.
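The coding scheme described above can be captured in two small lookup tables. The following is a minimal sketch: the mappings come from the description, but the record layout, the key names, and the example values are assumptions made for illustration and are not taken from the actual spreadsheet.

```python
# Code-to-label mappings taken from the database description above.
REGION_LABELS = {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"}
LOCATION_LABELS = {1: "Metro", 2: "Outside metro"}


def decode_household(row):
    """Return a copy of one record with Region and Location codes replaced by labels.

    `row` uses a hypothetical dict layout; the real column names may differ.
    """
    decoded = dict(row)
    decoded["region"] = REGION_LABELS[row["region"]]
    decoded["location"] = LOCATION_LABELS[row["location"]]
    return decoded


# Made-up record for illustration only (not a row from the database).
example = {"food": 9000, "income": 60000, "debt": 15000, "region": 2, "location": 1}
print(decode_household(example))
```

Decoding the codes before any grouping step makes it harder to treat Region or Location as if they were numeric quantities.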

Provide a 1,600-word detailed, statistical report including the following:

• Explain the context of the case
• Provide a research foundation for the topic
• Present graphs
• Explain outliers
• Prepare calculations
• Conduct hypotheses tests
• Discuss inferences you have made from the results

This assignment is broken down into four parts:

• Part 1 - Preliminary Analysis
• Part 2 - Examination of Descriptive Statistics
• Part 3 - Examination of Inferential Statistics
• Part 4 - Conclusion/Recommendations

Part 1 - Preliminary Analysis (3-4 paragraphs)

Generally, as a statistics consultant, you will be given a problem and data. At times, you may have to gather additional data. For this assignment, assume all the data is already gathered for you.

State the objective:

• What are the questions you are trying to address?

Describe the population in the study clearly and in sufficient detail:

• What is the sample?

Discuss the types of data and variables:

• Are the data quantitative or qualitative?
• What are levels of measurement for the data?

Part 2 - Descriptive Statistics (3-4 paragraphs)

Examine the given data.

Present the descriptive statistics (mean, median, mode, range, standard deviation, variance, CV, and five-number summary).
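The measures listed above can be computed in Excel or with a short script. The following is a minimal sketch using only Python's standard library. The function name `describe` and the sample values are made up for illustration, and quartile conventions vary between tools, so the five-number summary may differ slightly from a spreadsheet's.

```python
import statistics


def describe(values):
    """Return the descriptive statistics requested above for one numeric column."""
    data = sorted(values)
    mean = statistics.mean(data)
    stdev = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": mean,
        "median": median,
        "mode": statistics.multimode(data),  # a list, since ties are possible
        "range": data[-1] - data[0],
        "variance": statistics.variance(data),
        "stdev": stdev,
        "cv_percent": 100 * stdev / mean,  # coefficient of variation
        "five_number": (data[0], q1, median, q3, data[-1]),
    }


# Made-up values for illustration only; not taken from the Consumer Food database.
sample = [2, 4, 4, 5, 7, 9, 11]
summary = describe(sample)
for name, value in summary.items():
    print(name, value)
```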

Identify any outliers in the data.

Present any graphs or charts you think are appropriate for the data.

Note: Ideally, we want to assess the conditions of normality too. However, for the purpose of this exercise, assume data is drawn from normal populations.

Part 3 - Inferential Statistics (2-3 paragraphs)

Use the Part 3: Inferential Statistics document.

• Create (formulate) hypotheses
• Run formal hypothesis tests
• Make decisions. Your decisions should be stated in non-technical terms.

Hint: A final conclusion saying "reject the null hypothesis" by itself without explanation is basically worthless to those who hired you. Similarly, stating the conclusion is false or rejected is not sufficient.

Part 4 - Conclusion and Recommendations (1-2 paragraphs)

Include the following:

• What do you infer from the statistical analysis?
• State the interpretations in non-technical terms. What information might lead to a different conclusion?
• Are there any variables missing?
• What additional information would be valuable to help draw a more certain conclusion?

Format your assignment consistent with APA format.


strongboss5
School: UC Berkeley

Kindly see attached a 1630-word essay with the statistical report following the outline provided in the grading rubric and the corresponding plagiarism report

WEEK 6 ASSIGNMENT: SIGNATURE

(NAME)
(PROFESSOR’S NAME)
(COURSE)
(DATE)


This report presents the analysis of the data contained in the Consumer Food database, carried out as part of a role in the analytics department of a consulting company taken up after successfully completing the MBA program.

Part 1.
The Consumer Food database contains the annual food spending, the annual household income and the non-mortgage household debt of 200 households located in different regions and locations of the USA. The database is assumed to contain a representative sample of the population of households in the Northeast, Midwest, South and West regions of the United States. These households are also classified as lying inside or outside a metropolitan area.

The main objective of the analysis was to determine whether the annual food spending of households located in the Midwest is significantly higher than the US national average. In addition, the database was characterized descriptively through measures of central tendency (mean, median and mode) and of variability (standard deviation, variance, range, coefficient of variation and five-number summary).

The database consists of five columns. The first three contain quantitative data: the annual food spending, the annual household income and the non-mortgage debt of each household. These are dollar amounts with a true zero, so their level of measurement is ratio. The remaining two columns contain qualitative data: the location of the household (inside or outside a metropolitan area) and its region (Northeast, Midwest, South or West). Both are nominal variables stored as numeric codes.
The analysis addresses the following questions:

- What are the characteristic measures of central tendency of the data contained in the database?
- What are the characteristic measures of variability of the data contained in the database?
- Does the database contain any outliers?
- Is the annual food spending of the households located in the Midwest region higher than the national average of $8,000.00?
- Is there any difference in annual food spending between households located in a metropolitan area and those outside it?
- Does the region have any influence on the annual food spending, the annual household income or the non-mortgage debt?

Part 2.
The descriptive analysis evaluates the measures of central tendency and variability of the variables contained in the database and checks them for outliers.

The outcomes of these analyses are summarized in tables 1 and 2 and in figure 1. Table 1 presents the mean, median and mode as measures of central tendency, together with the measures of variability: standard deviation, variance, range, coefficient of variation and five-number summary. Table 2 presents the outlier analysis, in which any value lying outside the 99% interval of a variable (its mean plus or minus 2.575 standard deviations) is treated as an outlier. Finally, figure 1 summarizes the descriptive statistics graphically: its box and whisker plots show the central tendency, the variability of the results and the identified outliers.
Table 1. Summary of the results obtained for the descriptive statistics of the quantitative variables

Table 1 shows that the mean and median values are close to each other, which suggests that the assumption of normality might be valid. The mode, on the other hand, does not seem to be a representative measure of central tendency for the quantitative variables, given the small sample size and the large variability of the results. The mode is the best descriptor of central tendency for the qualitative variables. For these, the database indicates that the modal region is the Northeast and the modal location is the metropolitan area.


Table 2. Identification of outliers

Table 2 presents the calculated 99% intervals for each of the variables and the resulting outliers. These intervals were calculated as μ ± 2.575σ, where μ is the mean and σ the standard deviation of the variable. The value 2.575 is the z value that leaves 0.5% in each tail of the standard normal distribution, so the interval covers 99% of a normal population. Any value outside these intervals is considered an outlier, since under the normality assumption only about 1% of observations are expected to fall there (Ashanulla, 2003). Interestingly, the only outlier in the database is a single household that has both an unexpectedly high annual food spending and an unexpectedly high annual household income.
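The μ ± 2.575σ rule described above can be sketched in a few lines of standard-library Python. The data below are made up, the function name `flag_outliers` is hypothetical, and the use of the sample standard deviation is an assumption, since the report does not state which divisor was used.

```python
import statistics

# z value leaving 0.5% in each tail of the standard normal distribution.
z_99 = statistics.NormalDist().inv_cdf(0.995)


def flag_outliers(values, z=2.575):
    """Return (lower, upper, outliers) using the mean +/- z * stdev rule."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # assumption: sample standard deviation
    lower, upper = mean - z * stdev, mean + z * stdev
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


# Made-up data: nineteen ordinary values and one extreme value.
data = [10] * 10 + [12] * 9 + [100]
lower, upper, outliers = flag_outliers(data)
print(f"z = {z_99:.4f}, limits = ({lower:.2f}, {upper:.2f}), outliers = {outliers}")
```

Because an extreme value inflates both the mean and the standard deviation, this rule can mask outliers in small samples. Quartile-based fences are a common alternative.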

Figure 1. Box and whisker plot of the three quantitative variables contained in the database. The boxes span the first to third quartiles, with the middle line representing the median. The whiskers cover the 1st to 25th and the 75th to 99th percentiles. Identified outliers are marked by an asterisk.


Part 3.
Evaluation of the annual food spending in Midwestern houses
A hypothesis test was conducted to evaluate whether the annual food spending of Midwestern households is higher than the national average. For this purpose, a subsample containing only the households located in the Midwest was selected.

A one-sample t-test was used because the subsample is relatively small (it includes only the 45 households located in the Midwest) and the population standard deviation of the annual food spending is unknown (O’Hagan & Forster, 2009). The outcomes of the hypothesis test are presented in table 3.
Table 3. Summary of the outcome of the hypothesis test carried out

The resulting p-value is 0.032. Since this value is higher than the chosen significance level of 0.01, there is not enough evidence to reject the null hypothesis. The mean annual food spending of Midwestern households is therefore not significantly higher than the national average, and the small difference observed can be attributed to random variation.
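The decision rule used above (compare the p-value of a one-sided, one-sample t-test with the significance level) can be sketched as follows. This is an illustration under stated assumptions, not the calculation behind table 3: the spending figures are made up, the helper names are hypothetical, and the tail probability is approximated by numerically integrating the t density rather than taken from a statistics package.

```python
import math
import statistics


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's t with df degrees of freedom (Simpson's rule)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    tail = 0.5 - total * h / 3  # area beyond |t| on one side
    return tail if t >= 0 else 1 - tail


def one_sample_t_greater(values, mu0):
    """Test H0: mean <= mu0 against H1: mean > mu0; return (t, df, p_value)."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)
    t = (statistics.mean(values) - mu0) / se
    return t, n - 1, t_upper_tail(t, n - 1)


# Made-up spending figures in thousands of dollars; not the Midwest subsample.
spending = [8.1, 8.4, 7.9, 8.6, 8.3, 8.0, 8.5, 8.2]
t, df, p = one_sample_t_greater(spending, mu0=8.0)
alpha = 0.01
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"t = {t:.3f}, df = {df}, p = {p:.4f}: {decision}")
```

In practice a statistics package (for example `scipy.stats.ttest_1samp`) would normally replace the hand-written tail integral.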

Evaluation of the effect of the location of the house on the annual food spending
A hypothesis test was conducted to evaluate whether the annual food spending of metropolitan and non-metropolitan households differs. For this purpose, the sample was divided into metropolitan households and households outside metropolitan areas. The outcomes of the hypothesis test are presented in table 4.
Table 4. Summary of the outcome of the hypothesis test carried out

As table 4 shows, the resulting p-value is 0.0089. Since this value is lower than the chosen significance level of 0.01, there is enough evidence to reject the null hypothesis, meaning that annual food spending differs significantly between households inside and outside metropolitan areas.
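A two-group comparison of this kind can be sketched as below. The report does not say which two-sample test was used, so the sketch assumes Welch's t-test (unequal variances) with a two-sided alternative. The data and helper names are made up for illustration.

```python
import math
import statistics


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's t with df degrees of freedom (Simpson's rule)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    tail = 0.5 - total * h / 3
    return tail if t >= 0 else 1 - tail


def welch_t_test(a, b):
    """Two-sided Welch t-test for H0: mean(a) == mean(b); return (t, df, p_value)."""
    va = statistics.variance(a) / len(a)  # squared standard error of each mean
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, 2 * t_upper_tail(abs(t), df)


# Made-up groups for illustration; not the metro / non-metro subsamples.
metro = [10, 12, 14, 16, 18]
non_metro = [6, 8, 10, 12, 14]
t, df, p = welch_t_test(metro, non_metro)
print(f"t = {t:.3f}, df = {df:.1f}, p = {p:.4f}")
```

If the two groups can be assumed to share a common variance, a pooled-variance t-test would be the usual alternative.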
Effect of the region on the annual food spending, annual household income and non-mortgage debt
Three independent one-way ANOVA tests were conducted to evaluate whether the region has any influence on the quantitative variables contained in the database. The outcomes of these ANOVA tests are presented in tables 5-7, together with the descriptive characteristics of each variable in the different regions.
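The F statistic behind a one-way ANOVA can be sketched as follows. The groups are made up and the function name `one_way_anova` is hypothetical. The sketch stops at the F statistic and its degrees of freedom, since the p-value would then be read from the F distribution, typically with a statistics package such as `scipy.stats.f_oneway`.

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each observation around its own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up groups for illustration; not the regional subsamples.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(groups)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the group means differ by more than the spread within the groups would explain.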
Table 5. ANOVA test for the evaluation of the effect of the region on the annual food spending

Table 6. ANOVA test for the evaluation of the effect of the region on the annual household income


Table 7. ANOVA test for the evaluation of the effect of the region on the non-mortgage debt

As can be observed, the p-values resulting from the one-way ANOVA analyses are below 0.01 in tables 5 and 6 and above 0.01 in table 7. This result indicates that the region in which the households are located has a significant influence on the non-mortgage debt (therefore resulting in a lower p-value), whereas it does not have such a significant influence on the annual food spending or the annual household income. However, considering the relatively small differen...
