STAT200 Introduction to Statistics
Assignment #1: Descriptive Statistics Data Analysis Plan
Assignment #1: Prepare Descriptive Statistics Data Analysis Plan
Before conducting any statistical analyses, researchers develop a plan for how they will analyze their
data to answer their research questions. The purpose of this assignment is to provide an experience
developing a descriptive statistics analysis plan. Note: This first assignment is a plan only; no statistics
will be calculated or graphs created. The second assignment will involve carrying out the plan, after
receiving feedback from your instructor.
Assignment Steps:
Step #1: Review the STAT200 data set file. (Note: This data set will be used for all three of this term’s
written assignments).
The data is a subsample from the US Department of Labor’s Consumer Expenditure Surveys (CE) and
provides information about the composition of households and their annual expenditures
(https://www.bls.gov/cex/). Detailed information on the sample and variables is included with the data
set file; please carefully review this information to familiarize yourself with the data (Note: This
information will be used in Assignment #2 to describe the dataset).
Step #2: Develop descriptive statistics data analysis plan.
➢ Task 1: Develop scenario. Imagine that you are head of a household and have to determine a
household budget plan based on the data available from the dataset. For instance, you are a
35-year-old single parent with a high school diploma and one child.
➢ Task 2: Select variables for analysis that match the scenario developed in Task 1. The data set
provides information on household consumption; there are socioeconomic variables and
expenditure variables. The socioeconomic variable names start with “SE-” and the expenditure
variable names start with “USD-”; all expenditures are in US dollars. All students must use
income as one variable. Select two additional socioeconomic variables (one qualitative and one
quantitative) and two expenditures for your analysis that match the scenario you developed for
Task 1. For instance, using the example scenario of a 35-year-old single parent with a high
school diploma and one child, you could select “income,” “education,” and “number of children”
as socioeconomic variables and then pick two household expenditure items to show the
distribution of costs and compare that with your income. When selecting variables, think about
the following three questions:
o Why am I choosing these variables?
o What interests me about these variables?
o What do I think will be the outcome?
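The naming convention described in Task 2 can be used to sort the variables into the two groups before choosing among them. The sketch below is illustrative only and is not part of the plan you submit. The variable names are taken from the data dictionary supplied with the data set; the function name `split_by_prefix` is a hypothetical helper introduced for this example.

```python
# Variable names as listed in the data dictionary for the STAT200 data set.
VARIABLES = [
    "SE-MaritalStatus",
    "SE-Income",
    "SE-AgeHeadHousehold",
    "SE-FamilySize",
    "USD-AnnualExpenditures",
    "USD-Housing",
    "USD-Electricity",
    "USD-Water",
]


def split_by_prefix(names):
    """Group variable names by their prefix (hypothetical helper)."""
    socioeconomic = [name for name in names if name.startswith("SE-")]
    expenditures = [name for name in names if name.startswith("USD-")]
    return socioeconomic, expenditures


socioeconomic, expenditures = split_by_prefix(VARIABLES)
print("Socioeconomic variables:", socioeconomic)
print("Expenditure variables:", expenditures)
```

Listing the two groups side by side makes it easier to check that a selection includes income, one qualitative and one quantitative socioeconomic variable, and two expenditures.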
➢ Task 3: Determine appropriate measures of central tendency and dispersion for the selected
variables. For each quantitative variable, select at least one measure of central tendency and at
least one measure of dispersion (please see the table below for a list of measures). For the qualitative
variable, select one measure of central tendency. When determining the measures of central
tendency and dispersion, think about what is appropriate given the level of measurement and
type of variable. We recommend referring to the text and information posted in our LEO classroom
to help with this task (Note: you will use this information to provide a rationale for your choice
of measures).
Measures of Central Tendency
● Mean
● Mode
● Median
Measures of Dispersion
● Range
● Sample Standard Deviation
● Variance
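To make the listed measures concrete, here is a minimal sketch of how each one could be computed with Python's standard `statistics` module. It is for illustration only, since no statistics are calculated in Assignment #1. The values are the SE-FamilySize column from the data set; the choice of that variable and the names used in the code are assumptions made for the example.

```python
import statistics

# SE-FamilySize values for the 30 households in the STAT200 data set.
family_size = [4, 2, 1, 3, 2, 4, 2, 2, 2, 4, 3, 2, 2, 4, 1,
               3, 6, 5, 3, 3, 4, 4, 4, 3, 2, 3, 4, 6, 3, 5]

# Measures of central tendency
mean = statistics.mean(family_size)
median = statistics.median(family_size)
modes = statistics.multimode(family_size)  # every value tied for most frequent

# Measures of dispersion
value_range = max(family_size) - min(family_size)
sample_variance = statistics.variance(family_size)  # divides by n - 1
sample_std_dev = statistics.stdev(family_size)      # square root of sample variance

print(f"mean={mean:.2f}, median={median}, modes={modes}")
print(f"range={value_range}, variance={sample_variance:.3f}, "
      f"standard deviation={sample_std_dev:.3f}")
```

Note that a variable can have more than one mode, which is one reason the mode is often paired with a frequency table when it is reported.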
➢ Task 4: Determine appropriate graph and/or table for each of the selected variables. Select
one graph or table for each variable (please see the table below for a list of graphs and tables). When
determining the graphs and tables, think about what is appropriate given the level of
measurement and type of variable. We recommend referring to the text and information posted in
our LEO classroom to help with this task (Note: you will use this information to provide a
rationale for your choice of graphs and/or tables).
Types of Graphs
● Pie Chart
● Bar Chart
● Histogram
● Box Plots (also known as Box-and-Whiskers Plot)
Types of Tables
● Frequency Table
● Relative Frequency Table
● Grouped Frequency Table
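The three table types differ mainly in what is counted. The sketch below is illustrative only (no tables are compiled for Assignment #1). It builds a frequency and relative frequency table for the qualitative variable SE-MaritalStatus and a grouped frequency table for the quantitative variable SE-AgeHeadHousehold. The values come from the data set; the 10-year class width is an assumption chosen for the example.

```python
from collections import Counter

# SE-MaritalStatus: households 1-15 are "Not Married", 16-30 are "Married".
marital_status = ["Not Married"] * 15 + ["Married"] * 15

# Frequency and relative frequency table for a qualitative variable.
counts = Counter(marital_status)
total = sum(counts.values())
print(f"{'Category':<12} {'Frequency':>9} {'Relative Frequency':>19}")
for category, frequency in counts.items():
    print(f"{category:<12} {frequency:>9} {frequency / total:>19.2f}")

# SE-AgeHeadHousehold values for the same 30 households.
ages = [53, 39, 51, 43, 59, 52, 48, 49, 59, 51, 53, 54, 44, 56, 60,
        34, 37, 29, 56, 54, 23, 52, 58, 22, 36, 51, 41, 38, 56, 37]

# Grouped frequency table: 10-year classes (class width is an assumption).
grouped = Counter((age // 10) * 10 for age in ages)
print(f"{'Age class':<10} {'Frequency':>9}")
for lower in sorted(grouped):
    print(f"{lower}-{lower + 9:<7} {grouped[lower]:>9}")
```

A grouped table is usually the better fit for a quantitative variable with many distinct values, because an ungrouped table would have nearly one row per observation.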
Step #3: Complete the “Assignment #1: Descriptive Statistics Data Analysis Plan Template.”
Remember, you will not be conducting any statistical analysis, drawing any graphs, or compiling any
tables for the first assignment. Rather, you need to wait for feedback from your instructor on this
assignment and use that feedback to complete Assignment #2.
Here are the main sections for this assignment (i.e., completing the plan template):
✓ Identifying Information. Fill in information on name, class, instructor, and date.
✓ Scenario. In this section, briefly (2-3 sentences) describe the scenario you developed in Step #2,
Task 1.
✓ Complete Table 1: Variables Selected for the Analysis. Enter information on the variables selected
for analysis in Step #2, Task 2. For each selected variable, be sure to include its name as listed in
the data set, description, and variable type.
✓ Reason(s) for Selecting the Variables and Expected Outcome(s): In this section, for each
selected variable, please answer the following questions:
✓ Why did I choose this variable?
✓ What interests me about this variable?
✓ What do I think will be the outcome?
✓ Complete Table 2. Numerical Summaries of the Selected Variables. Enter information on
selected measures of central tendency and dispersion for each selected variable. Be sure to
briefly explain why you chose those measurements. Note: The information for the required
variable, “Income,” has already been completed and can be used as a guide for completing
information on the remaining variables.
✓ Complete Table 3. Type of Graphs and/or Tables for Selected Variables. Enter information on
selected graph and/or table for each selected variable. Be sure to briefly explain why you
chose those graphs and/or tables. Note: The information for the required variable, “Income,” has
already been completed and can be used as a guide for completing information on the
remaining variables.
Assignment Submission: Name the file that contains your completed “Assignment #1: Descriptive
Statistics Data Analysis Plan Template” using the following format: “Assignment1-StudentLastName.”
Then, submit the file via the Assignments area in the LEO classroom in the “Assignment #1: Descriptive
Statistics Data Analysis Plan” folder and wait for your instructor’s feedback.
Grading Rubric for Written Assignment #1
Scenario and Selection of Related Variables (20%)
● Clear description of scenario.
● Selected variables and reasons are appropriate for the scenario.
Selection of Measures of Central Tendency and Dispersion (30%)
For each variable:
● Appropriate measures selected.
● Rationale is provided and appropriate.
Selection of Graphs and/or Tables (30%)
For each variable:
● Appropriate graphs and/or tables selected.
● Rationale is provided and appropriate.
Writing Quality (20%)
● Completes all sections of template.
● Writes clearly, concisely, and with few errors.
University of Maryland University College
STAT200 - Assignment #1: Descriptive Statistics Data Analysis Plan
Identifying Information
Student (Full Name):
Class:
Instructor:
Date:
Scenario: The purpose of this assignment is to give you a chance to do an actual statistical
project. The dataset contains 8 variables.
Choose 4 variables that you think should relate to each other and describe how you think they
should relate. List the four variables and write a few lines describing four scenarios for how the
variables relate (Income must be one of the variables).
Give a link to a study or research that supports the reason the variables you chose should
relate.
For example, suppose two variables named annual alcohol budget per household and average
years of education per household were in my dataset. The link that supports my belief is below:
https://news.gallup.com/poll/184358/drinking-highest-among-educated-upper-incomeamericans.aspx
Use Table 1 to report the variables selected for this assignment. Note: The information for the
required variable, “Income,” has already been completed and can be used as a guide for
completing information on the remaining variables.
Table 1. Variables Selected for the Analysis
Variable Name in the Data Set | Description (See the data dictionary for describing the variables.) | Continuous or Discrete Variable | Type of Variable (Qualitative or Quantitative)
Variable 1: “Income” | Annual household income in USD. | Continuous | Quantitative
Variable 2: | | |
Variable 3: | | |
Variable 4: | | |
Variable 5: | | |
Reason(s) for Selecting the Variables and Expected Outcome(s):
1. Variable 1: “Income”
2. Variable 2: “ ”
3. Variable 3: “ ”
4. Variable 4: “ ”
5. Variable 5: “ ”
Data Set Description:
Proposed Data Analysis:
Measures of Central Tendency and Dispersion
Complete Table 2. Numerical Summaries of the Selected Variables and briefly explain why
you chose those measurements. Note: The information for the required variable, “Income,”
has already been completed and can be used as a guide for completing information on the
remaining variables.
Table 2. Numerical Summaries of the Selected Variables
Variable Name | Measures of Central Tendency and Dispersion | Rationale for Why Appropriate
Variable 1: “Income” | Number of Observations; Median; Sample Standard Deviation | I am using median for two reasons: (1) If there are any outliers or the data is not normally distributed, the median is the best measure of central tendency. (2) The variable is quantitative. I am using sample standard deviation for three reasons: (1) The data is a sample from a larger data set. (2) It is the most commonly used measure of dispersion. (3) The variable is quantitative.
Variable 2: | |
Variable 3: | |
Variable 4: | |
Variable 5: | |
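The first reason given for “Income” (the median holds up better when there are outliers) can be checked with a small numerical sketch. The five incomes below are the first five SE-Income values in the data set; the sixth value of 500,000 is a hypothetical outlier added purely for illustration and does not appear in the data.

```python
import statistics

# First five SE-Income values from the STAT200 data set.
incomes = [97681, 96727, 95432, 96928, 94929]

# Hypothetical outlier added only for illustration; it is NOT in the data set.
incomes_with_outlier = incomes + [500000]

mean_shift = statistics.mean(incomes_with_outlier) - statistics.mean(incomes)
median_shift = statistics.median(incomes_with_outlier) - statistics.median(incomes)

print(f"Change in mean after adding the outlier:   {mean_shift:,.1f}")
print(f"Change in median after adding the outlier: {median_shift:,.1f}")
```

The mean moves by tens of thousands of dollars while the median barely changes, which is the behavior the rationale in Table 2 relies on.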
Graphs and/or Tables
Complete Table 3. Type of Graphs and/or Tables for Selected Variables and briefly explain
why you chose those graphs and/or tables. Note: The information for the required variable,
“Income,” has already been completed and can be used as a guide for completing information
on the remaining variables.
Table 3. Type of Graphs and/or Tables for Selected Variables
Variable Name | Graph and/or Table | Rationale for Why Appropriate
Variable 1: “Income” | Graph: Histogram | I will use the histogram to show the distribution of the data, including whether it is approximately normal. A histogram is one of the best plots for showing the distribution of continuous, quantitative data.
Variable 2: | |
Variable 3: | |
Variable 4: | |
Variable 5: | |
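As a rough sketch of what a histogram summarizes, the code below sorts the SE-Income values from the data set into equal-width classes and prints a text bar for each class. It is illustrative only, since no graphs are drawn for Assignment #1. The class width of 5,000 is an assumption chosen for the example, and a different width would give a different picture.

```python
from collections import Counter

# SE-Income for the 30 households in the STAT200 data set.
income = [97681, 96727, 95432, 96928, 94929, 95744, 95366, 96697, 96572, 96653,
          96664, 96621, 96886, 96244, 94867, 98351, 109312, 111478, 107511, 95835,
          110553, 95706, 110651, 98491, 99610, 97663, 115766, 107235, 106627, 109523]

BIN_WIDTH = 5000  # class width chosen for illustration (an assumption)

# Map each income to the lower limit of its class, then count per class.
bins = Counter((value // BIN_WIDTH) * BIN_WIDTH for value in income)

# Print every class between the smallest and largest, including empty ones.
for lower in range(min(bins), max(bins) + BIN_WIDTH, BIN_WIDTH):
    count = bins.get(lower, 0)
    print(f"{lower:>7,}-{lower + BIN_WIDTH - 1:>7,} | {'#' * count} ({count})")
```

Looking at the class counts before choosing summary measures helps judge whether the data look roughly symmetric or skewed, which in turn informs the choice between mean and median.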
Dataset for Written Assignments
Description of Dataset:
The data is a random sample from the US Department of Labor’s 2016 Consumer Expenditure Surveys (CE) and provides information about the
composition of households and their annual expenditures (https://www.bls.gov/cex/). It contains information from 30 households, where a survey
responder provided the requested information; it is all self-reported information. This dataset contains four socioeconomic variables (whose names start
with SE) and four expenditure variables (whose names start with USD).
Description of Variables/Data Dictionary:
The following table is a data dictionary that describes the variables and their locations in this dataset (Note: Dataset is on second page of this document):
Variable Name | Location in Dataset | Variable Description | Coding
UniqueID# | First Column | Unique number used to identify each survey responder | Each responder has a unique number from 1-30
SE-MaritalStatus | Second Column | Marital Status of Head of Household | Not Married/Married
SE-Income | Third Column | Annual Household Income | Amount in US Dollars
SE-AgeHeadHousehold | Fourth Column | Age of the Head of Household | Age in Years
SE-FamilySize | Fifth Column | Total Number of People in Family (Both Adults and Children) | Number of People in Family
USD-AnnualExpenditures | Sixth Column | Total Amount of Annual Expenditures | Amount in US Dollars
USD-Housing | Seventh Column | Total Amount of Annual Expenditure on Housing | Amount in US Dollars
USD-Electricity | Eighth Column | Total Amount of Annual Expenditure on Electricity | Amount in US Dollars
USD-Water | Ninth Column | Total Amount of Annual Expenditure on Water | Amount in US Dollars
How to read the data set: Each row contains information from one household. For instance, the first row of the dataset starting on the next page shows
us that: the head of household is not married and is 53 years old, has an annual household income of $97,681, a family size of 4, annual expenditures of
$56,124, and spends $18,676 on housing, $1,468 on electricity, and $551 on water.
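One way to picture a row is as a single record with one field per column. The sketch below represents the first household this way and compares its housing cost with its income. The values are those of the first row described above; the class name `Household` and its field names are hypothetical, chosen only for this example, and are not official names from the data set.

```python
from dataclasses import dataclass


@dataclass
class Household:
    """One row of the data set; field names are illustrative, not official."""
    unique_id: int
    marital_status: str
    income: int
    age_head_household: int
    family_size: int
    annual_expenditures: int
    housing: int
    electricity: int
    water: int


# First row of the data set, as described in the text above.
first = Household(
    unique_id=1,
    marital_status="Not Married",
    income=97681,
    age_head_household=53,
    family_size=4,
    annual_expenditures=56124,
    housing=18676,
    electricity=1468,
    water=551,
)

# Compare one expenditure with income, as suggested in Step #2, Task 2.
housing_share = first.housing / first.income
print(f"Household {first.unique_id} spends {housing_share:.1%} of its income on housing.")
```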
UniqueID# | SE-MaritalStatus | SE-Income | SE-AgeHeadHousehold | SE-FamilySize | USD-AnnualExpenditures | USD-Housing | USD-Electricity | USD-Water
1 | Not Married | 97681 | 53 | 4 | 56124 | 18676 | 1468 | 551
2 | Not Married | 96727 | 39 | 2 | 56440 | 18376 | 1441 | 542
3 | Not Married | 95432 | 51 | 1 | 55120 | 18391 | 1458 | 548
4 | Not Married | 96928 | 43 | 3 | 55932 | 18701 | 1479 | 520
5 | Not Married | 94929 | 59 | 2 | 55247 | 18483 | 1451 | 546
6 | Not Married | 95744 | 52 | 4 | 55963 | 18435 | 1465 | 555
7 | Not Married | 95366 | 48 | 2 | 57082 | 18576 | 1478 | 538
8 | Not Married | 96697 | 49 | 2 | 56453 | 18520 | 1469 | 545
9 | Not Married | 96572 | 59 | 2 | 56515 | 18648 | 1480 | 552
10 | Not Married | 96653 | 51 | 4 | 56488 | 18838 | 1470 | 535
11 | Not Married | 96664 | 53 | 3 | 55558 | 18502 | 1478 | 553
12 | Not Married | 96621 | 54 | 2 | 55746 | 18149 | 1455 | 540
13 | Not Married | 96886 | 44 | 2 | 55321 | 18312 | 1450 | 523
14 | Not Married | 96244 | 56 | 4 | 56051 | 18484 | 1457 | 539
15 | Not Married | 94867 | 60 | 1 | 55512 | 18633 | 1485 | 523
16 | Married | 98351 | 34 | 3 | 76558 | 26513 | 1342 | 547
17 | Married | 109312 | 37 | 6 | 80801 | 25392 | 1514 | 743
18 | Married | 111478 | 29 | 5 | 82699 | 24949 | 1503 | 814
19 | Married | 107511 | 56 | 3 | 83347 | 22915 | 1723 | 773
20 | Married | 95835 | 54 | 3 | 73092 | 23252 | 1300 | 705
21 | Married | 110553 | 23 | 4 | 81419 | 26991 | 1421 | 719
22 | Married | 95706 | 52 | 4 | 71597 | 22376 | 1315 | 694
23 | Married | 110651 | 58 | 4 | 83766 | 22899 | 1682 | 754
24 | Married | 98491 | 22 | 3 | 75996 | 26283 | 1326 | 620
25 | Married | 99610 | 36 | 2 | 73550 | 27164 | 1330 | 627
26 | Married | 97663 | 51 | 3 | 72971 | 23150 | 1320 | 689
27 | Married | 115766 | 41 | 4 | 83448 | 25679 | 1511 | 767
28 | Married | 107235 | 38 | 6 | 83471 | 26074 | 1486 | 769
29 | Married | 106627 | 56 | 3 | 82676 | 22414 | 1688 | 709
30 | Married | 109523 | 37 | 5 | 84002 | 26771 | 1457 | 768

