1 Quantitative Problem Solving
Chapter Learning Objectives
After reading this chapter, you should be able to do the following:
1. Distinguish among data of nominal, ordinal, interval, and ratio scale.
2. Associate the measures of central tendency with the scale of the data for which those measures
are appropriate descriptive statistics.
3. Calculate the mean, median, and mode for a set of data.
4. Calculate and interpret the variance and the standard deviation for a set of data.
5. Distinguish between the characteristics and the notation used for sample and population data.
6. Describe guidelines for reporting statistical data.
© 2016 Bridgepoint Education, Inc. All rights reserved. Not for resale or redistribution.
Introduction
People approach thinking in different ways (Witkin, Moore, Goodenough, & Cox, 1977). Field-dependent learners process information as a whole. These people tend to view complex
problems or tasks in their entirety and usually resist taking them apart; they shy away from
analytical tasks. People with this type of cognitive style are sometimes referred to as “global
thinkers.”
In contrast, field-independent learners are inclined to break complex problems into pieces.
They come to understand the whole by reducing it to its elements. People with this type of
cognitive style are sometimes called “analytical thinkers.”
What does this theory about cognitive-style preferences have to do with statistical analysis?
Google the phrase “statistical analysis” and the common element among the several definitions will be that it involves gathering quantitative data so that the whole can be understood
by analyzing a part. With its concern for measurement, detail, and mathematical precision,
statistical analysis sounds like work that favors field-independent people. However, the field-dependent person’s approach is an asset, for example, when an instructor needs to have a
sense of whether a group of psychology students understands the distinction between short- and long-term memory. So where do the demands of statistical analysis leave students for
whom an analytical approach is not second nature? For these people, data analysis can seem
rather foreign. This book aims to help everyone who studies the behavioral sciences to tackle
quantitative problems—especially students who do not naturally gravitate to data analysis. If
analytical tasks happen to be the reader’s preference, all well and good; even so, the rest of us
also need to navigate our way through statistical problems. The author’s intent is to appeal to
both analytical and global thinkers.
1.1 How Much Math Will You Need?
For researchers, statistical analysis is not the end goal: It’s a tool. The concepts and skills
involved allow us to interpret the numerical measures we encounter and answer the questions people ask about human behavior. We are not primarily interested in statistical theory. To that end, the text explains what the formulas mean, and in some cases why they
have the form they have, but not how they were derived. With such a practical approach,
the math readers probably learned in secondary school should be adequate. Ordinarily, the
text offers nothing more to worry about than order-of-operations questions. Brushing away
the cobwebs over whether to multiply or add first when both are required, and reviewing
how to deal with parentheses and exponents when they occur in a formula will be sufficient
for the calculations and explanations here. If these topics are not familiar, review “please
excuse my dear Aunt Sally,” or whatever memory aid you used in high school to prompt
you to
first, do what’s in the parentheses;
then deal with any exponents (any squaring, for example);
then do any multiplication and division, working from left to right, and
last, any addition and subtraction, also from left to right.
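The four rules above can be checked with a short Python sketch. The expression is invented purely for illustration; Python happens to follow the same order of operations, so evaluating it in one line and evaluating it rule by rule should agree.

```python
# An invented expression that needs all four rules.
expression_value = 2 + 3 * (7 - 5) ** 2 / 4

# The same calculation, one rule at a time.
inside_parentheses = 7 - 5                 # first: parentheses
squared = inside_parentheses ** 2          # then: exponents
product_quotient = 3 * squared / 4         # then: multiply and divide, left to right
step_by_step_value = 2 + product_quotient  # last: addition and subtraction

print(expression_value, step_by_step_value)
```

Both values come out the same, which is the point of the memory aid: the order is fixed, however the expression is typed.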
In some contemporary statistics classes, the student does not do very much mathematical calculation. Computer analysis is so easy and so readily
available that many of those involved in quantitative analysis say the manual mathematical calculations that were once mainstays in statistics classes
are no longer necessary. Dedicated statistical packages are generally accessible, and spreadsheet programs such as Microsoft Excel can perform all of the
basic calculations, as well as many of the simpler
statistical tests. With all of these resources, why do
anything by hand?
Those who take the time to learn and complete
the calculations by hand come to understand the
underlying concepts more clearly than those who
have the computer do all the work. Computer output provides a result to view, and with proper input,
accuracy and precision are usually not problematic,
but the tables and statistics often do not communicate the logic involved very well. Also, the
computer software usually provides little guidance about what the output means or how the
results were derived and why.
Hand calculation requires that a researcher consider each step in the solution. Taken incrementally, the process makes the reasoning more coherent. Each of the 10 chapters in this
textbook has been written so that readers can explain what they have done and why. Once the
logic and the processes are familiar, we can use Excel to verify our calculations and to
save time, particularly with larger data sets.
1.2 What About the Notation Symbols?
Like any discipline, statistical analysis has symbols and language that are particular to it. Like
many math-based subjects, statistical analysis makes some use of symbols, some of them
from the Greek alphabet, to indicate procedures or values that are used frequently. Try not to
be distracted by them; they often involve procedures or values that the reader has probably
used many times. For example, the upper-case Greek letter ∑ (sigma) symbolizes summation.
It indicates that several values are to be added together. Further, ∑x means that several
values, each of them referred to as “x,” are to be summed. If each of four subjects in a group
is administered a verbal ability test, resulting in scores of 23, 35, 36, and 42, x refers to the
individual scores, and ∑x = 136.
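For readers who like to confirm a result by machine, the summation can be sketched in Python. The variable names are only illustrative; the built-in sum function plays the role of ∑.

```python
# The four verbal ability scores from the example.
scores = [23, 35, 36, 42]

# sum() adds every x in the list, which is what the sigma notation asks for.
sum_x = sum(scores)
print(sum_x)  # 136
```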
Sometimes the choice between Greek and traditional roman letters indicates whether a value
describes a population or a sample. A population is all members of a defined group. All the members of your family constitute a population, as do all Nevada voters, or all psychologists in the
country. Any subset of a population is called a sample. The average, or the mean, of some
population characteristic—the mean age of Nevada voters, for example, or the mean number of years of education among psychologists—is indicated by the lowercase Greek letter µ
(which is pronounced “mew”—like the
cat, not “moo” like the cow). This book
will follow the convention of journals
of the American Psychological Association, and the mean of the sample
will be indicated by the upper-case
roman M.
Although the idea of an average score
is not a new concept, the symbol µ may
be. The symbol is just a shorthand way
to mathematically represent the average, or what we will call from this point
forward the mean. If the ages of the
members of your family are 14, 21, 45,
and 47, and if those ages constitute the
entire group, then µ = 31.750 [(14 + 21 + 45 + 47)/4].
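The family example can be reproduced in a few lines of Python. This is only a sketch; the name mu is an assumption chosen to echo the symbol µ.

```python
# Ages of every member of the family, so this is a population, not a sample.
ages = [14, 21, 45, 47]

# The population mean: the sum of the values divided by how many there are.
mu = sum(ages) / len(ages)
print(mu)  # 31.75
```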
People working with statistics find the symbols very helpful. They provide an economical
way to communicate procedures and values that are used repeatedly, as some tend to be
in statistical analysis. None of the symbols we use represent concepts or operations that
are very complicated; they are just a briefer way to
refer to common operations or characteristics.
Try It!: #1
If a population is all members of a
defined group, and your class has seven
students, can its population really be
only 7?
The point of this discussion is that statistical analysis need not be the exclusive domain of those with
an analytical style. It can be used by people who are
global thinkers as well. The math in this book will be
entirely manageable, and the notation that is used in
the formulas is for efficiency, not mystery.
1.3 Why Do We Use Statistics?
Many of the problems with which behavioral science professionals grapple are complex and
nuanced because the people we study are complex and nuanced. From one point of view,
behavioral scientists have a much more difficult job when it comes to research than, say,
chemists have. If two hydrogen atoms combine with one oxygen atom, the result is always
H2O, water. Human behavior usually does not manifest such consistency.
In a study of childhood obesity, Hewer (2014) reported that to understand children’s food
choices, researchers included brain-scan data they had gathered, the number of children who selected healthy food, the number of children who ate what they took,
and other measures that the researchers thought were related to childhood obesity. These
different measures were all necessary because of subject-to-subject variation. Statistical
analysis must accommodate variability. And since researchers cannot analyze what they cannot describe, they begin with descriptive statistics.
Descriptive statistics are an economical way to describe several measures. Note that descriptive statistics are generalizations. For example, in the obesity study (Hewer, 2014), researchers found that if they labeled carrots as “x-ray vision carrots,” 66% of the students in five
schools consumed what they took. That doesn’t mean that 66% of the students in each school
ate their carrots; the percentage shows a generalization over the five schools. In some schools
the result was likely more than 66%, while in other schools, likely less.
Meanwhile, Kiecolt-Glaser and others (2011) studied a group of older adults who were providing care for a family member with dementia. They found that the caregivers had a higher
probability of disease late in life than other similar adults not caring for such a family member. Their generalization was that higher-stress activities result in a greater propensity for
disease later in life. Studies such as this one result in inferential statistics; the researcher
infers from a portion or a sample of the whole what is likely to occur in the entire group
or population of similar people. In the earlier study by Hewer (2014), the objective was to
understand how most children select food based on the behavior of a sample of such children;
researchers were inferring population characteristics, what all children will do, from what a
sample of children did.
Sometimes the inference is from a sample to a population not specifically represented in the
sample. For many years, it has been common in psychology to study one species and generalize to another, a practice that continues. Recently, Robinson (2014) reported studying rats to
understand substance-abuse relapse in humans.
The ability to use the sample as a window on a population requires, of course, that the sample be representative of the population, at least in the relevant characteristic(s). Robinson’s
(2014) burden is to demonstrate that at least in the instance of relapse, what happens to laboratory rats can inform what occurs with people, in spite of the fact that two different species
are represented.
When the appropriate conventions are observed, samples can provide great insight regarding
a population without the need to study the entire population—an important economy since
studying the entire population is often not practical or perhaps even possible.
Although inferential statistical analysis drives most psychological research, inferential procedures always involve descriptive statistics and our initial interest is in these measures. We
want to learn to calculate and interpret them. After we have developed some facility with
descriptive procedures, the discussion will turn to inferential statistical analysis.
1.4 Describing Data
There are many ways to describe data, including the exclusively verbal descriptions that
ethnographers and anthropologists sometimes use. However, quantitative research and the
statistical analysis upon which it is based require numeric descriptions. Numerical measures
differ in kind; one way to classify them is in terms of scale. Data scale refers to the type and
amount of information that each measure provides, and data can be categorized into four scales:
1. nominal
2. ordinal
3. interval
4. ratio
Nominal Data
The word nominal refers to the name assigned to the data. Nominal data are also sometimes called categorical data because the name indicates the category to which an individual belongs, in which case the only analysis that can be performed is to count the number
that occur in a particular category. In an effort to analyze the demographic makeup of those
involved in an alcoholics support group, for example, a researcher might gather data on gender, race, ethnicity, marital status, or religious affiliation by assigning numbers to each of the
data categories. For ethnic group membership, perhaps 1 indicates African-American, 2 designates Asian, 3 Caucasian, 4 Hispanic, and so on. The numbers simplify summarizing the
number of participants but beyond indicating the category to which an individual belongs,
the numbers themselves have no mathematical meaning. It would not make sense to try to
compute “mean” ethnicity, for example, by adding up the 1s, 2s, 3s, and 4s and then dividing
by the number of participants. The numbers, in this case, are just labels.
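A brief Python sketch may make the point concrete. The ten codes below are hypothetical, invented only for illustration, and the alternative labeling scheme is likewise an assumption. Counting works under either scheme, but the arithmetic "mean" changes when the labels change, which shows that it carries no information.

```python
from collections import Counter

# Hypothetical category codes for 10 group members (illustrative only):
# 1 = African-American, 2 = Asian, 3 = Caucasian, 4 = Hispanic
ethnicity_codes = [3, 1, 4, 3, 2, 1, 3, 4, 3, 1]

# Counting how many individuals fall in each category is a legitimate analysis.
category_counts = Counter(ethnicity_codes)
for code in sorted(category_counts):
    print(code, category_counts[code])

# Relabel the same categories with different (equally arbitrary) numbers.
relabel = {1: 2, 2: 1, 3: 4, 4: 3}
relabeled_codes = [relabel[code] for code in ethnicity_codes]

# The group sizes are unchanged, but the "mean" depends entirely on the labels.
mean_original = sum(ethnicity_codes) / len(ethnicity_codes)
mean_relabeled = sum(relabeled_codes) / len(relabeled_codes)
print(mean_original, mean_relabeled)
```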
Ordinal Data
Ordinal data allow a researcher to rank individuals relative to the others in the group. Ordinal data provide more information than nominal data, enough in fact that individuals can be
ranked according to whatever quality is being measured. Higher values indicate more of the
measured characteristic. The limitation of this scale is that ordinal measures do not indicate
how much more of whatever is measured one individual possesses than another. In contrast,
when comparing nominal data values, a higher (or
lower) number simply means that the individual
belongs to a different group. Noting that one individual is faster, or more creative, or more successful
than another is an example of an ordinal measure.
The measure is not very precise, but it does provide enough information for a ranking. A student
who has an intelligence score placing her at the
90th percentile has a higher intelligence score than
someone at the 85th percentile, but the amount of
increase in intelligence is not clear. First prize at a
car show is better than second but does not indicate
the margin of victory.
Interval Data
Fahrenheit temperature measurements
are an example of interval scale data.
With interval data, the size of the gap, or the interval
(thus the name), between individual cases is known
because the difference between consecutive data
points is constant anywhere along the number line. That rather abstract definition just means
that the difference between 4 and 6 on an interval scale is the same amount of difference as
that between 17 and 19.
Perhaps a psychologist develops a measure of verbal aptitude. If aptitude scores are based
on the number of items the respondent scores correctly, and if each item carries the same
weight, the increase in aptitude from scores of 12 correct responses to 17 is the same as the
increase from 16 to 21.
Interval scale measurement is quite common in the social sciences. Many of the mental measurement instruments psychologists use to gauge anger, depression, aptitude, intelligence,
and so on produce interval scale data. However, many are also ordinal scale. Though researchers will sometimes argue that ordinal scale data with a large number of categories can be
treated as interval scale data, several ordinal categories do not magically transform ordinal
data into actual interval data. It is not possible to directly measure mental states, for example,
nor is it possible to demonstrate that the intervals between consecutive ordinal scale values
are truly equal. To state that these types of instruments actually produce interval data instead
of ordinal data may be wishful thinking.
Ratio Data
Ratio data have all the characteristics that interval data have and two more. First, a 0 in ratio
measure indicates the absence of the characteristic being measured. With interval scale data,
0 does not mean that none of the characteristic is present; it is just a point on the scale midway between −1 and +1. The fact that the temperature is 0 degrees Fahrenheit does not mean
that it cannot become colder; 0 is just a point on the scale. (The Kelvin scale is a different matter, of course, where 0 is the coldest possible temperature.) In the aptitude example above, if
someone missed every item on the measure, it is probably not wise to conclude that the individual has zero aptitude for whatever is gauged. Likewise, we would not argue that someone
who misses every item on a spelling test cannot spell at all, despite the individual’s answering
every item incorrectly. In terms of the construct measured (spelling or aptitude), neither test
provides ratio scale data.
The other quality that sets ratio data apart has to do with the name. With these data, what are
called “ratio comparisons” are possible. A person who is 6 feet tall can be accurately said to
be twice as tall as a child who is 3 feet tall, or a 20-year-old is half as old as someone who is
40, or someone who makes $90,000 a year has an income three times greater than the person
who makes $30,000. Comparisons like these only make
sense with ratio data.
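The difference between interval and ratio comparisons can be sketched with the temperature scales mentioned earlier. The two temperatures are arbitrary values chosen for illustration, and the function name is hypothetical; the conversion itself is the standard Fahrenheit-to-Kelvin formula.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert degrees Fahrenheit to kelvins, a scale with a true zero."""
    return (deg_f - 32) * 5 / 9 + 273.15

# Two arbitrary temperatures on the Fahrenheit (interval) scale.
cool_f, warm_f = 40.0, 80.0

# Dividing Fahrenheit values gives 2.0, but 0 degrees F is not "no temperature",
# so "twice as warm" is not a meaningful statement.
ratio_fahrenheit = warm_f / cool_f

# On the Kelvin (ratio) scale the same two temperatures differ by only about 8%.
ratio_kelvin = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)
print(ratio_fahrenheit, round(ratio_kelvin, 3))
```

The size of the gap (40 degrees Fahrenheit) is meaningful on either scale; only the ratio depends on where zero sits.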
Except for demographic characteristics like height,
weight, age, and income, ratio scale measurement is
very uncommon in psychology and in fact in all the
social sciences. Mental measurements are rarely—
perhaps never—ratio scale.
Identifying the scale of the data is important because
to some extent the statistics that can be calculated to
describe the data depend upon their scale. Likewise,
some of the statistical tests this text will calculate later are tied to data scale. Tests that require
interval data will also accommodate ratio data, so the distinction between those two is not
important in terms of which test to use. The differences between nominal, ordinal, and interval data matter a great deal, however.
Try It!: #2
a. Someone comments about a student, “He’s more ambitious than his classmates.” What scale of data is involved in such a statement?
b. What scale of data is in your bank statement?
1.5 Descriptive Statistics
As we suggested earlier, descriptive statistics are measures that summarize larger groups of
data. They make describing the data set possible without the need to recite each of the individual measures.
There are several descriptive statistics, but the most common are measures of central
tendency and measures of variability. The latter are also sometimes called measures of
dispersion. First, we will look at measures of central tendency.
Measures of Central Tendency
Three different measures of central tendency are commonly used in data description: the
mode, the median, and the mean. Although each indicates what is most typical in a data set,
each measure relies on a different definition of “typical.” As a result, these three statistics
complement each other.
The Mode
The mode (Mo) is defined as the most frequently occurring value in a group. It is the only
statistic for indicating central tendency that is appropriate when the data are nominal scale.
Perhaps someone is interested in the residential background of 20 military personnel being
treated for post-traumatic stress disorder (PTSD). If they indicate their predeployment residences on a questionnaire, and if 1 indicates urban, 2 suburban, 3 semi-rural, and 4 rural, the
results might be as follows:
1, 2, 2, 1, 1, 3, 4, 1, 4, 3, 2, 1, 1, 2, 1, 3, 2, 1, 1, 2
Remember that the numbers indicate just where they lived prior to deployment. If those in
each category are counted, the result is:
1s: 9
2s: 6
3s: 3
4s: 2
The mode, the most commonly occurring value, is 1 (Mo = 1), which indicates that in this group
of 20 military personnel being treated for PTSD, more come from urban backgrounds than
from any of the alternatives.
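The tally and the mode can be verified with a short Python sketch. It uses the 20 residence codes from the example; the variable names are only illustrative.

```python
from collections import Counter

# Residence codes: 1 = urban, 2 = suburban, 3 = semi-rural, 4 = rural
residences = [1, 2, 2, 1, 1, 3, 4, 1, 4, 3, 2, 1, 1, 2, 1, 3, 2, 1, 1, 2]

# Count how many times each code occurs.
counts = Counter(residences)
print(dict(sorted(counts.items())))

# The mode is the code with the largest count.
mode_value, mode_count = counts.most_common(1)[0]
print(mode_value, mode_count)
```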
The mode is often calculated for ordinal, interval, and ratio data as well, where it also indicates the most frequently occurring value. In such cases, it is usually accompanied by some
other measure(s) of central tendency. However, for nominal data, the mode is the only measure of central tendency that makes sense.
In some instances, there can be more than one mode. In the PTSD example above, had three more
suburban service personnel been diagnosed with the condition, the results would include two
most frequently occurring values (9 ones and 9 twos). Such data have a bimodal distribution.
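The bimodal case can be sketched the same way. The example below assumes Python 3.8 or later, where the standard library's statistics.multimode returns every value tied for most frequent; the three added suburban codes follow the "what if" in the paragraph above.

```python
from statistics import multimode

# Original residence codes: 1 = urban, 2 = suburban, 3 = semi-rural, 4 = rural
residences = [1, 2, 2, 1, 1, 3, 4, 1, 4, 3, 2, 1, 1, 2, 1, 3, 2, 1, 1, 2]

# Suppose three more suburban (code 2) personnel had been diagnosed.
residences_with_three_more = residences + [2, 2, 2]

# multimode lists all values that share the highest frequency.
modes = multimode(residences_with_three_more)
print(modes)
```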
The Median
When scores are arranged either from largest to smallest or from smallest to largest, the
median (Mdn) is the middlemost score. Another way to describe it is that the median is that
point on a scale where an equal number of scores have greater values and lesser values. The
median requires data of at least ordinal scale. Since interval and ratio data can also be ranked,
the median can be calculated for any data beyond nominal scale. The median is not calculated
for nominal data because there, the numbers are only category labels, and the “middle” would
be an arbitrary value based entirely on the numbers used as labels.
Suppose all freshmen students at a university are ranked in order of their academic performance at the end of the year. These are the class rankings for nine students in the department
of psychology:
3, 7, 13, 15, 17, 33, 36, 42, 51
The median is the middle ranking. Among nine rankings, the middle ranking is the fifth, which
for these class rankings is 17, so the Mdn ranking for psychology freshmen is 17. Note that this
data set has no mode. All the rankings occur with the same frequency, 1.
If 10 psychology students were ranked, and the 10th had a class ranking of 52, the results
would show the following:
3, 7, 13, 15, 17, 33, 36, 42, 51, 52
In this case, with an even number of scores, two scores occur in the middle of the distribution.
The median is the average, or mean, of those two middle scores. The fifth and sixth scores are
17 and 33: (17 + 33)/2 = 50/2 = 25. With the additional student’s class ranking added, the Mdn
ranking for psychology freshmen changes to 25.
Note that the change to the median that results from adding the ranking of 52 is substantial, not
because of the magnitude of the 52, but because the size of the gap from 17 to 33 is so large. Had
the next ranking after 17 been 18, for example, adding the ranking of 52 to the data set would have made
Mdn = 17.5. The variability in the data set greatly affected this measure of central tendency.
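The three medians discussed above can be confirmed with Python's standard library. This is a sketch using the rankings from the text; the third list replaces the ranking of 33 with 18 to mirror the "had the next ranking been 18" scenario.

```python
from statistics import median

# Class rankings for nine psychology freshmen.
nine_rankings = [3, 7, 13, 15, 17, 33, 36, 42, 51]

# Adding a tenth student with a ranking of 52.
ten_rankings = nine_rankings + [52]

# The same ten rankings if the ranking after 17 had been 18 instead of 33.
ten_rankings_small_gap = [3, 7, 13, 15, 17, 18, 36, 42, 51, 52]

# With an odd count the median is the middle value; with an even count
# it is the mean of the two middle values.
mdn_nine = median(nine_rankings)
mdn_ten = median(ten_rankings)
mdn_small_gap = median(ten_rankings_small_gap)
print(mdn_nine, mdn_ten, mdn_small_gap)
```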
The Mean
The mean (M) is the average of a set of values, but hereafter the text will use the term mean
rather than average. To calculate the mean, the data must be interval or ratio scale. Suppose a
data analyst is interested in gauging the level of depression among 10 social workers. A psychologist administers a depression scale for the 10 and finds the following results:
3, 4, 4, 5, 5, 6, 6, 7, 7, 8
Calculating a mean might not be new, but the statistical symbols might be, so note Formula 1.1:
Formula 1.1
M = ∑x / n
where
M = the mean
x = each value in the set (in this case, each depression score)
∑x = the sum of the individual depression scores
n = the number of values or scores
For the depression data, verify that
∑x = 55
n = 10,
M = ∑x / n = 55 / 10 = 5.5
M 5 5.5 indicates that the mean level of depression is 5.5. Without something to which the
mean can be compared, the number by itself is not very revealing. It is not clear whether 5.5
is the mean level of depression for all people, or all college-educated people, or all social
workers. All we know is that for this group, 5.5 is the arithmetic mean.
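Formula 1.1 translates almost symbol for symbol into Python. The sketch below uses the ten depression scores; the names sum_x, n, and M are chosen only to match the notation in the formula.

```python
# Depression scores for the 10 social workers.
depression_scores = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]

sum_x = sum(depression_scores)  # the sum of the individual scores
n = len(depression_scores)      # the number of scores
M = sum_x / n                   # Formula 1.1: the sample mean

print(sum_x, n, M)  # 55 10 5.5
```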
Although there have been variations in the past, most of the major journals in psychology and
many other disciplines currently use M to indicate the mean of a sample. (Recall from earlier
in the chapter that µ indicates the mean of a population.) This text will follow the convention of using M
for the sample mean.
The median can also be calculated for interval
data. With an even number of scores, as in the
depression-score sample, the median will be the
midpoint between the two middle scores: 5 and
6. For the depression scores, Mdn = 5.5.
Calculating the mode for the depression scores
shows that four values occur with the same frequency, so there are four modes, Mo = 4, 5, 6, and
7. In small data sets, the mode is usually not very
revealing and often not calculated at all, but the
most frequent score or measurement—the mode—
can be very informative with larger groups.
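Both the median and the four modes of the depression scores can be checked in a few lines. The sketch assumes Python 3.8 or later for statistics.multimode.

```python
from statistics import median, multimode

# Depression scores for the 10 social workers.
depression_scores = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]

# Even number of scores: the median is midway between the two middle scores.
mdn = median(depression_scores)

# Every value that ties for the highest frequency is a mode.
modes = multimode(depression_scores)

print(mdn, modes)
```

With four modes among only ten scores, the output illustrates why the mode says little about a small data set.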
The fact that the mean and median have the same value (M = 5.5, Mdn = 5.5) will be particularly
relevant to the discussion of normality in Chapter 2. For now, just note that the mean and the
median have the same value and for our set of depression scores at least, the mode is not helpful.
Measures of Variability
Measures of central tendency are easier to understand when accompanied by variability
measures, and these two descriptive measures often go hand-in-hand. As the name suggests, variability measures indicate how scores differ from each other. Generally speaking,
large values indicate substantial variability. Relatively small variability values indicate that
the data are quite similar or quite homogeneous. For example, among university students
who have all met entrance requirements to a prestigious school, there will be less variability in scholastic aptitude than among people of the same age selected randomly from the
general population.
The Range
When a researcher needs a quick measure of data variability, the range (R) is the easiest
to calculate. It is the difference between the highest and lowest values in a data set. For the
depression data
3, 4, 4, 5, 5, 6, 6, 7, 7, 8,
R = 5 (8 − 3)
It is common to hear people say, “Scores ranged from ___________ to _____________ ,” but technically the range is just one value: the difference between the highest and lowest measures. For
that reason R is not very informative when reported in isolation from other statistics. The
depression data’s R = 5 does not reveal much.
A research report in which the authors report only that R = 5 does not indicate how many
values were involved or what the highest and lowest values were. For example, if two clerical
workers from the same state office as the social workers are also measured on the depression
scale and have scores of 9 and 14, then the range for those two scores is also R = 5. The clerical workers’ scores are quite different from the social workers’ scores in terms of quantity
and value, but the range does not reflect either of those differences.
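The limitation is easy to see in a short Python sketch. The helper name data_range is hypothetical; it simply subtracts the lowest value from the highest.

```python
def data_range(values):
    """Return the range: the highest value minus the lowest value."""
    return max(values) - min(values)

# Ten social workers and two clerical workers measured on the same scale.
social_workers = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]
clerical_workers = [9, 14]

range_social = data_range(social_workers)
range_clerical = data_range(clerical_workers)
print(range_social, range_clerical)
```

The two groups differ in size and in the level of their scores, yet both calls return the same single number.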
The Variance and the Standard Deviation
While the range indicates how much difference there is between the high and low scores,
other measures of variability are based on how much individual scores in a data set vary from
the data mean, M. This describes both the variance (s²) and the standard deviation (s),
which are anchored to the mean of the group. For either statistic, large measures (and it is
difficult at this point to know what “large” is) indicate that individual values in the group tend
to differ more from the mean (M) of the group than smaller measures would indicate. As the s and s² notations
suggest, the two statistics’ formulas (see 1.2 and 1.3 below) are very similar. Note that in this
text, while we will initially distinguish between sample and population standard deviations,
we will use standard deviation as shorthand for sample standard deviation, since it is very
rare that anyone has access to population data.
Formula 1.2
The formula for the variance (s²) is expressed as:
s² = ∑(x − M)² / (n − 1)
Formula 1.3
The standard deviation (s) is expressed as:
s = √[∑(x − M)² / (n − 1)]
In the case of either formula,
∑ = summation,
x = each score in the group,
M = the mean of the group,
n = the number of scores in the sample.
Note that if the standard deviation (s) is multiplied by itself the result is the variance (s²).
From the other direction, taking the square root of the variance produces the standard deviation. Why include both of these closely related statistics? The answer is that some of the more
involved procedures we will use later call for the standard deviation (s), some for the variance
(s²). Figure 1.1 depicts how to use the formula to calculate the variance.
Figure 1.1: Calculating the variance
1. Determine the group mean: ∑x / n
2. Subtract the mean from each score: (x − M)
3. Square each x − M difference: (x − M)²
4. Sum all the squared differences: ∑(x − M)²
5. Divide the sum of the squared differences by the number of scores, minus one: ∑(x − M)² / (n − 1)
To calculate the variance for the depression data (3, 4, 4, 5, 5, 6, 6, 7, 7, 8), we follow this
procedure:
1. Determine the mean: M = ∑x / n = 55 / 10 = 5.5
2. Subtract M from each x.
3 − 5.5 = −2.5
4 − 5.5 = −1.5
4 − 5.5 = −1.5
5 − 5.5 = −0.5
5 − 5.5 = −0.5
6 − 5.5 = 0.5
6 − 5.5 = 0.5
7 − 5.5 = 1.5
7 − 5.5 = 1.5
8 − 5.5 = 2.5
3. Square each x − M difference. Remember that squaring a negative number results in
a positive.
(−2.5)² = 6.25
(−1.5)² = 2.25
(−1.5)² = 2.25
(−0.5)² = 0.25
(−0.5)² = 0.25
0.5² = 0.25
0.5² = 0.25
1.5² = 2.25
1.5² = 2.25
2.5² = 6.25
4. Sum the squared differences.
6.25 + 2.25 + 2.25 + 0.25 + 0.25 + 0.25 + 0.25 + 2.25 + 2.25 + 6.25 = 22.50
5. Divide the sum by the number of scores (10) minus 1.
s² = 22.50 / 9 = 2.50
Determining the standard deviation requires one more step.
6. The standard deviation (s) equals the square root of the variance, or
s = √2.50
s = 1.581
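For readers who want to check the hand calculation, the six steps above can be sketched in Python. This is only an illustration, not part of the text's Excel workflow; the variable names are hypothetical, and the last two lines cross-check the result against the sample functions in Python's standard `statistics` module.

```python
import math
import statistics

# Depression scores from the worked example
scores = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]

n = len(scores)
mean = sum(scores) / n                     # step 1: M = 55 / 10
deviations = [x - mean for x in scores]    # step 2: x - M for each score
squared = [d ** 2 for d in deviations]     # step 3: square each difference
sum_sq = sum(squared)                      # step 4: sum of squared differences
variance = sum_sq / (n - 1)                # step 5: divide by n - 1
std_dev = math.sqrt(variance)              # step 6: square root of the variance

print(mean, sum_sq, variance, round(std_dev, 3))  # 5.5 22.5 2.5 1.581

# Cross-check against the standard library's sample statistics
assert math.isclose(variance, statistics.variance(scores))
assert math.isclose(std_dev, statistics.stdev(scores))
```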
Note that the way the variance and standard deviation are determined emphasizes their relationship to the mean. Although all statistical packages and spreadsheets will calculate these
measures, and in fact almost any other descriptive statistic as well, there is value to doing a
few of these calculations by hand with the steps given. The repeated x − M calculations
(step 2) remind us that the central component in both of these statistics is the difference
between individual scores and the mean. Because subtracting the mean from each individual
value (x) indicates how far individual data points are from M, why not just sum those
differences to get a total of the differences between the xs and M? Why square the
result?
The answer is that some of those differences are going to be positive (when x > M)
and some are going to be negative (when x < M), as was the case with the depression scores. The positive and negative differences always cancel, so summing them would result in
0 for any data set, which would not provide any useful information about data variability. When the differences
are squared, however, all the negatives become positives, and when they are averaged (step 5, dividing the
sum of the squares by n − 1; the −1 will be discussed
later), the result is a value that provides a gauge of the
typical squared distance between the scores and the mean.
TIP
When using a calculator to determine a
square root, be aware of some functionality
differences. Some calculators allow entering
the value and then pressing the square root
button. Others require that the square root
function be pressed before entering the value.
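The reason for squaring the x − M differences can be seen in a short Python sketch using the depression scores. The variable names are illustrative only; the point is that the unsquared differences cancel, while the squared differences do not.

```python
scores = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]
mean = sum(scores) / len(scores)

deviations = [x - mean for x in scores]

# Positive and negative differences cancel, so their sum says nothing about spread
sum_of_deviations = sum(deviations)

# Squaring removes the signs, so the sum now grows with the spread of the scores
sum_of_squares = sum(d ** 2 for d in deviations)

print(sum_of_deviations, sum_of_squares)  # 0.0 22.5
```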
The variance and standard deviation have more than
purely descriptive functions. When we explore the t test,
for example, we will find that when two sample sizes are equal, we need the standard deviation to complete the test. When sample sizes are not equal, we use the variance. But for
now, our interest lies just in describing data sets. The depression scores are fairly similar;
the 10 scores have a range of only five points and what readers will come to recognize, with
some practice, as a comparatively small variance and standard deviation: s² = 2.50 and
s = 1.581.
The name standard deviation suggests that what this statistic is indicating is the “standard”
deviation of all the individual data points from the mean of the group. While it is not, strictly
speaking, an average or mean deviation, it is similar. The result indicates that the “standard”
deviation of individual data points from the mean of the sample is 1.581 points. As we noted
above, the variance answers the same question about typical variability among the values in a
group, but it does so without the final square root, which means that the variance is expressed in squared units and will
have the larger value whenever the standard deviation is greater than 1.
Alternative Formulas
Formulas 1.2 (the variance) and 1.3 (the standard deviation) are examples of conceptual formulas, chosen because they make it easier to understand how the resulting value is derived and
what it means. A scan of other statistics books indicates
that there are alternative formulas for calculating standard deviation and variance. Some of these are easier to
use, particularly with large data sets, but they communicate less than the ones we have used here. Because we
want to emphasize clarity over ease of calculation (and
because we know that your calculator or computer is
going to do the work in any case), this text will use Formulas 1.2 and 1.3 and keep the data sets small.
Try It!: #3
In a set of n = 5 where M = 6.0, the lowest
value is 3 and the highest is 11. If a value
of 12 is added, which will be affected most,
R or s?
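The text does not say which alternative formulas it has in mind. One widely used example, offered here only as an illustration, is the computational formula, which works from the sum of the scores and the sum of the squared scores instead of the individual x − M differences. It is algebraically equivalent to Formula 1.2. For the depression data, ∑x = 55 and ∑x² = 325, so:

```latex
s^2 = \frac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n - 1}
    = \frac{325 - \dfrac{55^2}{10}}{10 - 1}
    = \frac{325 - 302.5}{9}
    = \frac{22.5}{9}
    = 2.50
```

The result matches the value obtained with Formula 1.2, but the steps no longer show how far each score falls from the mean.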
Apply It!
A Study on Studying
As part of her thesis, Anna, a graduate student in psychology, is studying college failure rates
among first-year students. She has just read results from a nationwide study that found that
the majority of students who dropped out in their first year studied less than 11 hours per
week. Anna decides to conduct a survey of first-year students at her college. She chooses a
random sample of 100 freshmen and asks each how many hours they study in a typical week.
Rather than present all 100 responses, she will summarize them with descriptive statistics.
The difference between studying 6 hours and 8 hours a week is precisely the same amount of
difference in study time as the difference between studying 1 and 3 hours a week, so the data
are at least interval scale. However, since a response of “zero” would indicate that the responding student did not study at all, students’ responses actually constitute ratio scale data.
Anna calculates several measures of central tendency, and discovers these results:
• Mode = 16 hours
• Median = 15.5 hours
• Mean (or average) = 16.5 hours
As you can see, values for the mode, median,
and mean are all closely grouped. Since measures of central tendency are more informative
when accompanied by a variability measure,
Anna proceeds to calculate the range and standard deviation. She finds that the range (R) is
33, meaning the difference between the student
who studies the most and the one who studies
the least is 33 hours.
In addition, Anna calculates the sample standard deviation (s) for the responses and finds that
s = 5.5; the standard, or typical, deviation from the mean of 16.5 hours of study is 5.5 hours.
Consider the implication. Students who are one standard deviation (5.5 hours) or more below
the mean of 16.5 hours study 11 hours per week, or less. They represent the very group most
likely to drop out during their first year of college study.
At the end of the study, Anna presented her results to the dean of students. He asked her to
make a presentation to the freshman class emphasizing the relationship between study time
and the tendency to drop out of school.
Apply It! boxes written by Shawn Murphy
The Impact of Different Score Values
In our example, the highest depression score was 8. If instead it had been a 12, how would
that affect the variance and the standard deviation? Because 12 is more distant from the mean
than 8, and because both the variance and standard deviation formulas call for the x − M differences to be squared, a more extreme value invariably increases the value of the statistics.
If s² and s (and M) are recalculated to reflect the change, note that the mean changes to
M = 5.9; s², the variance, becomes 6.322, compared to s² = 2.50 in the original data set; and s,
the standard deviation, becomes 2.514, compared to s = 1.581 originally.
TIP
Calculators with a built-in standard deviation
function make it possible to enter the values and produce the result without the several x − M steps. Directions vary somewhat
depending upon the particular calculator, but
most have a key marked something like “σxn−1”
or “σn−1,” or SD for the standard deviation. The
σ is the lowercase Greek letter sigma, which is
the equivalent of s. It is often used as a symbol
for the standard deviation, as we will see later
in the chapter. A σ² key for the variance is less
common because it is the lesser-used of the
two statistics, and in any case, using the x² key
on the calculator to square the standard deviation will produce the variance.
Extreme values have a disproportionate impact on the
value of these statistics.
Because both statistics are based on the square of the
difference between individual scores and the mean, and
because extreme scores produce the largest squared
differences, extreme values have a disproportionate
effect on the size of s² and s, as we noted above. Scores
much smaller than the mean also have great impact
because the issue is the difference between x and M, not
just the magnitude of the score. If one of the 4s in the
data set is changed to 0 so that the depression scores
become 0, 3, 4, 5, 5, 6, 6, 7, 7, 8, then the mean reduces
to M = 5.10; s² becomes 5.433 instead of the original
2.50; and s becomes 2.331 instead of the original 1.581.
To summarize, scores quite similar to the mean of a distribution tend to shrink the values of the standard deviation and the variance. Scores substantially different from the mean in either direction make
those measures of variability larger. The effect of extreme scores, called outliers, can be to substantially distort values and statistical procedures, particularly when sample sizes are small.
The problems of how to identify them and what to do with them will come up later in the book.
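The effect of a single extreme score in either direction can be reproduced with a brief Python sketch. It uses the standard `statistics` module; the dictionary labels and variable names are illustrative assumptions, and the three data sets are the ones described above.

```python
import statistics

datasets = {
    "original": [3, 4, 4, 5, 5, 6, 6, 7, 7, 8],
    "8 changed to 12": [3, 4, 4, 5, 5, 6, 6, 7, 7, 12],
    "4 changed to 0": [0, 3, 4, 5, 5, 6, 6, 7, 7, 8],
}

# For each data set: (mean, sample variance, sample standard deviation)
summary = {
    label: (
        round(statistics.mean(values), 2),
        round(statistics.variance(values), 3),
        round(statistics.stdev(values), 3),
    )
    for label, values in datasets.items()
}

for label, (m, var, sd) in summary.items():
    print(f"{label}: M = {m}, variance = {var}, s = {sd}")
```

One changed score out of ten more than doubles the variance in both altered data sets, whether the extreme value lies above or below the mean.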
Although they are all variability statistics, there is an important contrast between the variance
and standard deviation statistics on the one hand and the measure that is the range on the
other. Since the range is based on the lowest and highest score in a distribution, the R value
is unaffected by any of the values that fall between the extremes; no additional value between
the highest and lowest values will change the range. The number of scores within the range can
increase, but R will not change unless the lowest or the highest value changes.
Populations Versus Samples and a Correction for a Biased Estimator
Earlier, the chapter defined population as all possible members of a specified group. The
sample was any subset of the population. A population can include “every psychologist in San
Diego,” or “all law enforcement officers in the state,” or “everyone in your family.” Remove at
least one person from any of those populations, and the resulting group is a sample. Because
samples are easier to access than populations, most research is based on samples. Gathering
data on populations, particularly when they number in the thousands or millions, is far too
time-consuming and expensive to be feasible.
When important parameters for gathering the sample are met, the sample’s characteristics,
including its descriptive statistics, will mirror those of the population. While no sample can
be an exact duplicate of its parent population, samples can be similar enough for useful inferential analysis. This is the principle upon which political polling is based, of course. A few
thousand people who accurately represent the characteristics of the entire electorate may
reveal how an election will likely turn out; a carefully selected group of law enforcement officers may suggest the stress level in all law enforcement officers; a few consumers may indicate what the tendency will be among the population of all consumers.
Sometimes making the sample representative involves more than careful sampling technique.
As it turns out, samples tend to be less variable than populations. If we were to conduct a survey of stress levels among law enforcement officers, we would probably find that the sample
produces stress scores that are less variable than stress in the entire population of police officers. If researchers do not make some adjustment (described below), the result could easily
be a consistent underestimation of stress variability, which constitutes bias. Note that in statistical usage and in this example, bias means that stress among law enforcement officers is
consistently, and probably unintentionally, underestimated. It does not carry the more widely
used sense of intentional discrimination.
Since careful sampling is not enough to counter the bias, researchers make a mathematical
adjustment for variances and standard deviations in samples so that they will more accurately reflect the population. This “correction for a biased estimator” is the −1 in the denominators of Formulas 1.2 and 1.3. The adjustment slightly increases the value of the variability
measure, with the greatest correction occurring when samples are smallest and the potential
for distortion is greatest.
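A small Python sketch can make the bias visible. It rests on assumptions that are not in the text: the ten depression scores are treated as if they were an entire population, and every possible ordered sample of three scores, drawn with replacement, is examined. Averaged over all of those samples, dividing by n understates the population variance, while dividing by n − 1 recovers it.

```python
from itertools import product

# Assumption for illustration: treat these ten scores as a whole population
population = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]
N = len(population)
mu = sum(population) / N
population_variance = sum((x - mu) ** 2 for x in population) / N


def sum_of_squares(sample):
    """Sum of squared differences between each score and the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)


n = 3
# Every ordered sample of size n drawn with replacement (10 ** 3 = 1000 samples)
samples = list(product(population, repeat=n))

# Average variance estimate across all samples, without and with the correction
avg_uncorrected = sum(sum_of_squares(s) / n for s in samples) / len(samples)
avg_corrected = sum(sum_of_squares(s) / (n - 1) for s in samples) / len(samples)

print(population_variance)        # 2.25
print(round(avg_uncorrected, 3))  # about 1.5, too small
print(round(avg_corrected, 3))    # about 2.25, matches the population
```

Any single sample can still miss in either direction; the correction removes only the consistent tendency to underestimate.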
If all the data are available for every possible member of a group, bias is not a concern, and the
adjustment is therefore unnecessary. In the case of population data, should they be available,
the formulas for variance and standard deviation are as follows:
Formula 1.4
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
Formula 1.5
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Try It!: #4
What constitutes bias in statistics?
Besides the absence of the −1 in the denominator, note that sigma (σ) has been substituted
for s, and mu (µ) has been substituted for M in both formulas. The σ indicates the population
standard deviation, and σ² indicates the population variance. Earlier in the chapter, we noted
that µ indicates the population mean.
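Statistical software usually keeps the sample and population versions apart as separate functions. As one illustration, Python's standard `statistics` module offers both; the sketch below applies each to the depression scores, first as a sample and then, purely for the sake of comparison, as though the ten scores were a complete population.

```python
import statistics

scores = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]

# Sample versions divide by n - 1 (Formulas 1.2 and 1.3)
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

# Population versions divide by N (Formulas 1.4 and 1.5)
population_variance = statistics.pvariance(scores)
population_sd = statistics.pstdev(scores)

print(sample_variance, round(sample_sd, 3))   # 2.5 1.581
print(population_variance, population_sd)     # 2.25 1.5
```

The population values are smaller because the same sum of squared differences (22.5) is divided by 10 instead of 9.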
Differentiating Sample and Population Characteristics
The descriptive characteristics of populations are referred to as parameters, and technically,
the word “statistic” refers to a sample characteristic. To summarize,
                     Population Parameter    Sample Statistic
Mean                 µ                       M
Standard Deviation   σ                       s
The Statistic or the Parameter?
Unless we are working with a relatively small
population—such as the population of a particular
social worker’s clients or the population of a family
or social group—we generally do not have all the
data. Exceptions are found, of course: Researchers
sometimes work with data from the U.S. Census
which, by definition, includes the entire population
of the country, and testing agencies will sometimes
provide means for the entire population who took
a particular test. But researchers more commonly
have access only to sample data. This book will
use Formula 1.3 and calculate the sample standard
deviation.
Understanding Degrees of Freedom
The n − 1 also gives us a chance to introduce degrees of freedom (df). Degrees of freedom are
one of those odd statistical abstractions that are difficult to explain briefly but affect the way
that many procedures are computed and interpreted, such as t tests (Chapter 5) and analysis of
variance (Chapter 6). Degrees of freedom are the number of scores in a calculation that are free
to vary when the final result of the calculation is known. Consider the following example.
If the sum of three integers is 6, or ___ + ___ + ___ = 6, then the first two of
those three integers can be any integers. They could be 2 + 2, or 3 + 1, or
−3 + 10, or any other two values, as long as the value of the third integer
makes the sum 6. The third value cannot vary; it must be 2 in the first and second examples above, and −1 in the third example. This problem has 2 degrees
of freedom.
If the number of integers that make up the problem is n, degrees of freedom (df) for a problem
like the three-integer addition problem above are n − 1. The standard deviation and the variance also have n − 1 degrees of freedom. Both are built on differences from the mean, and once M is known, all the
scores in the data set except for one can have any value, but the final score must be whatever
it takes to make the mean what it is. Degrees of freedom will come up with other procedures
as we move through the book.
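The idea that the last value is forced once the result is known can be sketched in a few lines of Python. The function and variable names are hypothetical. The first part reproduces the three-integer example; the second applies the same reasoning to the depression scores, where knowing the mean and any nine scores fixes the tenth.

```python
def third_integer(first, second, total=6):
    """Return the only value that makes three integers sum to the known total."""
    return total - first - second


# The first two integers are free to vary; the third is determined
forced_values = [third_integer(2, 2), third_integer(3, 1), third_integer(-3, 10)]
print(forced_values)  # [2, 2, -1]

# Same logic with a mean: n - 1 scores are free, the last one is not
scores = [3, 4, 4, 5, 5, 6, 6, 7, 7, 8]
n = len(scores)
mean = sum(scores) / n
free_scores = scores[:-1]
last_score = n * mean - sum(free_scores)
print(last_score)  # 8.0
```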
TIP
Calculators with a built-in standard deviation
function generally have keys for both the sample
standard deviation and the population standard
deviation. They usually distinguish the sample
standard deviation with something like an n − 1
on the key or in the display. Unfortunately, some
manufacturers use s for both the sample and the
population standard deviation, so look for the
n − 1 to identify the sample statistic.
Reviewing Results
Do not become so busy calculating variances or standard deviations that you set common sense aside. All
measures of variability have this in common: the calculated value cannot be a number less than 0. There is no
such thing as negative variance. If a school psychologist
measures intelligence scores for a variety of students
who have learning disabilities and finds that they all
have the same score, the scores have no variation, and
s², s, and R will all equal 0. If, in the course of all the
number-crunching, a negative value somehow emerges for a variability statistic, look for a
calculation error.
1.6 Calculating Descriptive Statistics with Excel
Now that you can calculate descriptive statistics by hand and understand what they mean, we
can begin to rely on Excel.
Excel, which is part of Microsoft Office, is an electronic spreadsheet program. Like all spreadsheets,
it is laid out in the rows and columns of a ledger. Computer spreadsheets were originally intended
for businesspeople who kept track of and manipulated large amounts of numeric data. Although
the commands vary for different spreadsheet programs, they all produce descriptive statistics,
and most programs, including Excel, will also complete some of the basic statistical tests.
Descriptive statistics in Excel can be obtained in two ways. They can be calculated directly
by entering the step-by-step commands, as we did when we used the calculator, or they can
be part of a package in Excel called “Descriptive Statistics.” First, we will try the individual
commands.
Consider a psychologist who is interested in the cognitive characteristics of teenage boys who
get into trouble with the law. The psychologist gathers data on problem-solving ability among
teenage boys consigned to juvenile hall. Using the Problem-solving Aptitude Test (PAT) and having secured the appropriate permissions, the psychologist tests 12 randomly selected juvenile
offenders. These are the resulting PAT scores: 11, 14, 14, 15, 17, 17, 17, 19, 22, 22, 23, 27.
Navigating Excel
The individual boxes in an Excel spreadsheet are called “cells.” Each cell is identified by the
column and row in which it is located. The columns are labeled from left to right, alphabetically from column A. The rows are numbered down the left side of the window beginning
with 1. Cell A1 is the top cell in column A. The next cell down is cell A2, and so on. When identifying a cell, the column letter is named first, followed by the row number. When entering cell
locations in Excel, do not put a space between letter and number.
This text uses the Office 2010 version of the software. Although Excel is updated from version
to version, and the appearance and some of the procedures change, the commands remain
quite consistent.
The steps for entering the PAT data into the spreadsheet are as follows:
1. Place the cursor in the first cell (for example, cell A1). This can be done by using the
arrow keys in the keypad to move the cursor, by clicking the mouse on the particular
cell, or by using the touchpad on a laptop.
2. In cell A1, type the number 11, followed by the Enter key. Typing Enter moves the
cursor to the next cell down.
3. Enter each of the other values so that the data are arranged vertically in all the cells
from A1 to A12. The spreadsheet should look like Figure 1.2.
Figure 1.2: A data set entered in Excel
Figure 1.2 depicts how Excel displays a data set as a vertical list.
Source: Microsoft Excel. Used with permission from Microsoft.
Entering the Command for the Mean
An equal sign (=) entered into a cell tells Excel that a specific command or a formula will follow.
To calculate the mean for the 12 PAT scores so they appear in cell A13, perform these steps:
1. Place the cursor in cell A13.
2. Type in =average(a1:a12), which averages the data in cells A1 to A12. (Note that
the Excel command is average rather than mean.) Press Enter. Or, type in =average(
and then use the mouse to highlight the cells that will be included, followed by Enter.
The value that appears in cell A13 is the mean, 18.16667.
3. On the Home tab, click the arrow in the bottom right corner of the Number group
(it is in the middle near the top of the screen).
4. Under Category, click Number, and then, using the arrows on the right, indicate
the number of decimal places. Round to 3 decimal places and click OK. That will make
M = 18.167.
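Readers without Excel at hand can confirm the same mean with a short Python sketch. This is a parallel check, not a description of how Excel works internally; `statistics.mean` plays the role of the average command, and `round` stands in for the decimal-place setting.

```python
import statistics

# The 12 PAT scores entered in cells A1 to A12
pat_scores = [11, 14, 14, 15, 17, 17, 17, 19, 22, 22, 23, 27]

mean = statistics.mean(pat_scores)   # 218 / 12
mean_rounded = round(mean, 3)        # displayed to 3 decimal places

print(mean_rounded)  # 18.167
```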
Using the Descriptive Statistics Option
Entering the specific command is fine if that is the only statistic needed, but if we need a more
comprehensive description, the mean is one of several descriptive values in the Descriptive
Statistics option. For the PAT data list, to perform the commands for that package of statistics
complete the following steps:
1. From the Home tab, click the Data tab, which is four to the right at the top of the page.
2. Click the Data Analysis window at the extreme right just below the tabs. This will
open a small window in the page with a list of options. (If the Data Analysis option
does not appear, you will need to add it by enabling Excel's Analysis ToolPak add-in.)
3. Click on the Descriptive Statistics option, and then click OK.
4. In the small window labeled Input Range, type in the cells for which you wish the
values to be included (A1:A12) just as we did when we entered the formula for
the mean. (Column letters may be entered as either upper- or lower-case.) Note
that the default setting for data is “Grouped by” columns. If the data were listed
along a row, the default would need to be changed.
5. Click Output Range and indicate where the results display is to begin, perhaps cell
C1, so that results are next to the original data but not on top of them.
6. Finally, click the particular output you wish, which is Summary statistics.
7. Click OK.
Figure 1.3 shows the results of the Descriptive Statistics option with the values rounded to
three decimals.
Figure 1.3: The Excel data analysis, Descriptive Statistics option
Figure 1.3 shows how Excel displays the results of the Descriptive Statistics option, which performs a
series of statistical calculations.
Source: Microsoft Excel. Used with permission from Microsoft.
Note the following Excel results:
M = 18.167
s = 4.589
Mdn = 17
Mo = 17
R = 16
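The values listed above can be cross-checked outside Excel. The following Python sketch uses the standard `statistics` module on the same 12 PAT scores; the dictionary keys are illustrative labels, not Excel's output labels.

```python
import statistics

pat_scores = [11, 14, 14, 15, 17, 17, 17, 19, 22, 22, 23, 27]

results = {
    "M": round(statistics.mean(pat_scores), 3),
    "s": round(statistics.stdev(pat_scores), 3),        # sample standard deviation
    "variance": round(statistics.variance(pat_scores), 3),
    "Mdn": statistics.median(pat_scores),
    "Mo": statistics.mode(pat_scores),
    "R": max(pat_scores) - min(pat_scores),
}

for name, value in results.items():
    print(f"{name} = {value}")
```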
With values from 11 to 27 in the data set, what happens to the range if a second value of 27 is
added? The answer, of course, is that the range is unaffected since it is a measure of the difference between the highest and lowest values in the data set, and the highest and lowest values
are not altered by the addition of the new value.
Is the same true of the standard deviation? Is it also unaffected? Although it too is a measure
of variability, unlike the range, the standard deviation statistic indicates how much individual
values tend to vary from the mean of the data set. A value of 27 is substantially different from
the mean of 18.167, so the value of the standard deviation (and the variance) must increase.
Recalculating the values with that additional 27 produces a result of s = 5.031, compared to
the original s = 4.589.
On the other hand, if an additional value near the mean were added, say something like x = 18,
what would be the effect on the standard deviation? The typical variability and therefore the
value of the standard deviation would diminish.
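Both scenarios can be tried directly. The Python sketch below, with a hypothetical helper named `spread`, reports the range and the sample standard deviation for the original PAT scores, for the scores with a second 27 added, and for the scores with an 18 added.

```python
import statistics

pat_scores = [11, 14, 14, 15, 17, 17, 17, 19, 22, 22, 23, 27]


def spread(values):
    """Return (range, sample standard deviation rounded to 3 decimal places)."""
    return max(values) - min(values), round(statistics.stdev(values), 3)


original = spread(pat_scores)
with_second_27 = spread(pat_scores + [27])   # extra value far from the mean
with_18 = spread(pat_scores + [18])          # extra value near the mean

print(original)        # (16, 4.589)
print(with_second_27)  # (16, 5.031)
print(with_18)         # (16, 4.394)
```

The range stays at 16 in all three cases, while the standard deviation rises with the extreme value and falls with the value near the mean.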
The output indicates that besides the mean, standard deviation, and range, Excel produces
other descriptive statistics, including the median (Mdn), the mode (Mo), the lowest and highest
values, and the variance (s²). It will also produce some statistics not yet introduced. Chapter 2
discusses two other statistics Excel produces, skewness and kurtosis. The standard error will
be discussed in Chapter 4.
1.7 The Language of Research
When a formal plan is developed to gather and analyze data, the plan is called a research
design. A research design allows the researchers to gather the relevant data and perform the
analyses needed to answer a research question. The statistical procedures in this chapter and
those that follow are often elements of research designs.
Sometimes a study is conducted to obtain descriptive statistics. For example, someone needs to
know the mean level of education among those who are unemployed, or how much variation is
in autistic clients’ verbal behaviors. Often, however, descriptive statistics are calculated as part
of some more involved research project, like a senior paper or a research report.
Researchers take the time to calculate descriptive statistics because they provide an economical way to understand a larger body of data. In the case of measures of central tendency
and variability, descriptive statistics indicate what is most typical, and how much individual
measures tend to stray from what is typical. Measures of variability are necessary precisely because
of differences among the measures. If there were
no variability, we would have what is called a
constant value. Constants hold little interest for the
researcher. If Alfred Binet, the father of intelligence
testing, had not observed differences in intellectual
ability, he would have had no reason to develop an
intelligence test. Variables are what make measures interesting and worth studying.
Qualitative and Quantitative Variables
Qualitative variables are difficult to measure and
reduce to a number. Often they are the nominal-scale,
or categorical, variables mentioned earlier that are used to classify people’s demographic characteristics, such as religious persuasion or national origin. Sometimes qualitative
variables refer to emotional characteristics, such as passion or discouragement, or they may
refer to traits such as intelligence or creativity that can be difficult to quantify.
Quantitative variables are any variables where numbers reflect the amount of what is measured, such as age or spelling ability. With these variables, a greater value indicates more
of the measured characteristic. Since many demographic variables are qualitative, quantitative research typically involves both kinds of variables. Strictly speaking, however, research
involving procedures designed specifically for qualitative and quantitative variables is called
mixed methods research.
Dependent and Independent Variables
Research designs always identify the variables the researcher believes to be relevant to an
outcome. The outcome itself is the dependent variable; it is the affected variable or the consequence variable. The variable thought to help bring about the effect is the independent
variable. It is tempting to say that the independent variable causes the dependent variable,
but cause is difficult to demonstrate in social science research. The problem is not that causes
do not exist; the problem is that—particularly with human-subjects research—they are difficult to verify.
Perhaps, having read the research on community service, a psychologist wonders whether
service to other people reduces individuals’ feelings of discouragement. The psychologist
develops a plan, a research design, to see whether serving in a soup kitchen for the poor (the
independent variable) is associated with lower feelings of discouragement in the volunteer
(the dependent variable).
As the psychologist executes the design, descriptive statistics will be calculated for the independent and dependent variables—the mean level of discouragement, the standard deviation
of the hours served by subjects in the experiment. But in this instance, the descriptive statistics are components of a broader purpose, which is to determine the relationship between the
independent and dependent variables. By the way, even if discouragement declines among
those serving, it will be difficult to attribute the change to serving. For instance, if discouragement declines among people who serve, increased social interaction might be the factor
that reduces discouragement, or perhaps goal-directed behavior or some other variable is the
cause of the outcome.
Summary and Resources
Chapter Summary
Part of the transition into any new discipline is learning the terminology necessary to have a
common language. Part of the language of statistical analysis is labeling data, which brought
us to the scale of the data. Recall that scale (nominal, ordinal, interval, and ratio) refers to
the kind and the quantity of information that the data provide (Objective 1). Data scale also
helps us determine which statistics we can calculate (Objective 2).
Descriptive statistics can provide a great economy when data sets increase in size. The central tendency measures (mean, median, and mode) suggest what is most representative in a
data set, although they each define what is typical differently. For that reason, these measures are often reported together (Objective 3).
Measures of variability complement measures of central tendency. When the standard deviation, the variance, or the range is calculated and reported with the mean, we have a view of
not just what is most typical but also of how homogeneous the data are (Objective 4).
The descriptive characteristics of populations are referred to as parameters, and technically, the word statistic refers to a sample characteristic. Recall that the notations used for
the mean and standard deviation in samples are M and s, respectively, and that the notations
used for population are µ and σ, respectively (Objective 5). To gain a working knowledge
of statistics, it is important to study often—daily, if possible. Several frequent, brief study
sessions tend to be more productive than less frequent, more intensive study periods.
In statistical analysis, we face problems that are not common to many other disciplines.
Unlike English or journalism, for example, statistical reasoning emerges less often in ordinary conversation. Outside of study sessions, it can be helpful to look for ways to apply concepts, which is one of the reasons that frequent study is important. As sensitivity increases,
students begin to recognize situations in which people calculate the wrong descriptive
statistics or explain their data improperly. Those experiences should prompt the healthy
skepticism important to all scholarship.
Many dedicated software packages are available for statistical analysis: SPSS, SAS, and
SYSTAT are three of the more prominent. However, Microsoft Excel is probably more
accessible than any of them, so it will be used in this course. Besides doing calculation
and analysis, Excel also makes it easy to produce some of the graphs and other data displays
that later chapters will use.
Statistical concepts have an incremental nature. The topics developed in each chapter
become part of a more involved topic in later chapters. Virtually no concept is raised,
discussed, and then permanently set aside, so frequent review is valuable. The questions
in the Try It! boxes are intended to keep students thinking about important concepts. The
answers to these questions follow.
Besides the homework that instructors assign, the end-of-chapter review questions will
help students judge their level of understanding. If the questions seem difficult, reread the
relevant section, and then tackle the problem again. The answers to the odd-numbered
items are in the back of the book.
The author’s hope is that students do not attempt to simply push their way through this
course, but rather become enamored with the concepts. To that end, more important than
students’ natural talent are their drive, their frequent study, and their willingness to be open
to thinking differently—all variables students can control. Onward!
Key Terms
bias In statistical analysis, a consistent error
of the same nature. If a sample is drawn
which distorts some characteristic of the population, the nature of the parent population
will be misrepresented in all experiments
involving the sample. Results are distorted
(biased) in a way that can’t be corrected by
simply repeating the experiment.
constants Also called constant values; have
only one value. For example, the temperature at which water boils at sea level is a
constant value, 212 degrees Fahrenheit.
data scale The kind of information that
data values provide. The scales of data
include nominal data, which define category;
ordinal data, which allow ranking; interval data, which have consistent increases/
decreases between consecutive data points;
and ratio data, which have a meaningful 0.
degrees of freedom (df) The number of
measures in a procedure that are free to
vary, or to have any value, when the result
is known. For the standard deviation, for
example, df = n − 1, which means that if
there are 10 values on which the standard
deviation calculation is based, 9 of them may
have any value, so long as the 10th produces
the correct result.
dependent variable In a research problem,
the variable affected by the treatment.
descriptive statistics Provide values that
define the characteristics of a data set. Typical descriptive statistics are measures of
what is most typical and measures of how
different individual values are from each
other.
independent variable In a research
problem, the antecedent variable expected
to explain any change in the dependent
variable.
inferential statistics Procedures that allow
the analyst to draw inferences or conclusions from the data. It is common in statistical analysis, for example, to infer the
characteristics of the population from what
occurs in a sample.
mean (M) The arithmetic average of a set of
values. The mean of a population is represented by the symbol µ.
measures of central tendency Indicate
what measure is most typical in a data set
and include the mean, median, and mode.
measures of variability Indicate how
much variety there is in a set of data values.
Also called measures of dispersion.
median (Mdn) The middle number when
data are ordered.
mixed methods research Research involving both qualitative and quantitative variables, as well as methods that are appropriate to each.
mode (Mo) The most frequently occurring
value in a set.
parameters Population characteristics;
symbolized by Greek letters, such as µ for
the population mean and σ for the population standard deviation.
population Includes all members of a
defined group.
qualitative variables Defined by the kind
of characteristic they represent, such as
gender or eye color.
quantitative variables Defined by the
amount of the characteristic they represent,
such as intelligence.
range (R) The difference between the highest and lowest values in a data set.
research design A formal plan for conducting a study. It specifies the variables to be
studied; indicates who the subjects in the
experiment will be, as well as how they will
be selected; specifies how the data will be
gathered, including how the independent
variable will be manipulated; and indicates
the type of analysis to be used.
sample Any subset of a population.
standard deviation The sample standard
deviation (s) and the population standard
deviation (σ) indicate how much individual
scores tend to differ from the mean of the
respective group.
statistic A characteristic of a sample. Some
common statistics are the mean (M), the
standard deviation (s), and the range (R).
variables Characteristics that can have
changing values.
variance (s²) The square of the standard
deviation. The variance is one measure of
how much individual scores differ from the
mean of the group.
Review Questions
Answers to the odd-numbered questions are provided in Appendix A.
1. A researcher is interested in how people of different political affiliations contrast.
a. Their different political affiliations represent data of what scale?
b. If the contrasted characteristic is income, what is the scale of that variable?
c. Arranging people from most involved to least involved in politics results in data
of what scale?
d. What measures of central tendency should be calculated for ordinal scale data?
2. A psychologist tracks the number of times a subject responds to a stimulus.
a. Of what scale are the data indicating the number of responses?
b. What measure or measures of central tendency are appropriate for data of this
scale?
3. A group of clients being treated for substance abuse have the following measures on
a compulsive behavior scale: 55, 47, 62, 27, 50, 49, 66, 53, 50, 44, 63, 59. Calculate
the following:
a. the mean b. the median c. the range d. the standard deviation
4. Subjects in a research project are classified according to whether they have been
involved in a relaxation therapy program. After some receive the therapy, they rank
themselves from 1 to 10 according to how calm they feel.
a. Of what scale are the data regarding whether they participated?
b. What measure(s) of central tendency is/are appropriate for the attended/did-not-attend data?
c. What is the scale of the level-of-calm data?
5. These scores are measures of compulsive behavior for those involved in a therapy
session for people with compulsive disorders: 24, 25, 28, 28, 31, 33, 36, 36, 36, 39,
40, 53, 54.
a. Calculate the mean, the variance, and the range.
b. If a new client with a score of 35 is added,
i. What happens to the mode? Why?
ii. What is the effect on the variance?
iii. What is the effect on the range?
iv. Why are the range and variance affected differently?
v. Would adding a 22 increase or decrease the variance? Why?
6. Why is calculating the mean inappropriate for data of ordinal scale?
7. Why are the standard deviation and variance inappropriate for ordinal scale data?
8. A researcher is trying to predict retirements and has surveyed the number of years a
group of social workers have been employed. The data are as follows: 2, 5, 8, 11, 13,
16, 22, 27.
a. What is the scale of the data?
b. Calculate values for the mean and the median.
c. Without doing the calculations, explain the effect on the standard deviation of
including a ninth person who has been employed 13 years.
d. What will be the effect of that ninth person’s tenure on the value of the range?
e. Which would increase most dramatically by the addition of someone who had
been employed 28 years: the range or the standard deviation?
Check your answers to c, d, and e with calculations.
9. Determine the scale in each situation below. Psychologists are:
a. listed according to how many clients they see.
b. classified according to whether they are public employees or private practitioners.
c. ranked according to how much their receptionists like working for them.
For each of the situations in a, b, and c, what is the most sophisticated measure of
central tendency that can be calculated?
For which situation should one calculate a standard deviation?
10. What is the difference between M and µ?
11. What does σ symbolize?
12. Why is the “correction for a biased estimator” inserted into the standard deviation
and variance formulas for samples?
13. A researcher wants to determine the impact that positive reinforcement has on
subjects’ response rates.
a. What is the independent variable?
b. What is the dependent variable?
14. Using your recently developed “AssessmeNt Gauging Supervisors’ Tension” instrument (ANGST, for short), you gather anxiety data for officers in law enforcement with
supervisory responsibility. Their scores are as follows: 11, 19, 20, 21, 19, 14, 19, 21.
What are the values of the following?
a. n
b. M
c. Mdn
d. s²
Answers to Try It! Questions
1. Absolutely! You must have driven through rural little towns out in the middle of
nowhere with signs indicating a population of 22. All that is required for a population is that all individuals in the described group be included. Although we tend to
think of populations as large by definition, they need not be.
2. a. Anything that references a “more than” or “less than” comparison will be ordinal
scale.
b. Your bank statement is ratio scale. If it says you have a zero balance, it means you
have no money in the account.
3. The effect on the range is going to be minimal; it is going to increase by just one.
While it is impossible to know the exact effect on the standard deviation without
having all of the data, it is likely to be more than one because the difference
between the individual score, 12 in this case, and the mean (M = 6.0) will be
squared (12 − 6 = 6, 6² = 36) and then summed with the other squared differences.
4. In statistics, bias means that a consistent error is made in the same direction. A
test that consistently predicts the problem-solving ability of members of one group
better than it predicts for another has biased scores. In our case, using population
formulas to calculate the standard deviation from sample data will consistently
underestimate the variability in the parent population. That consistent error constitutes bias and is why the “correction for a biased estimator” (the −1 correction in
the denominator) is inserted into the formula for sample data.
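The bias described in answer 4 can be seen in a small simulation. The Python sketch below is only an illustration under stated assumptions: the parent population (10,000 normally distributed values), the sample size of 5, the number of trials, and the function name average_sample_variances are all invented for the example. It compares variances rather than standard deviations because the n − 1 correction is easiest to see there.

```python
import random
import statistics

random.seed(0)  # fixed seed so that repeated runs give the same numbers

# Hypothetical parent population, invented for this illustration.
population = [random.gauss(100, 15) for _ in range(10_000)]
sigma_squared = statistics.pvariance(population)  # population variance

def average_sample_variances(n, trials=5_000):
    """Average both variance estimates over many random samples of size n."""
    biased_total = 0.0
    corrected_total = 0.0
    for _ in range(trials):
        sample = random.sample(population, n)
        biased_total += statistics.pvariance(sample)    # population formula: divide by n
        corrected_total += statistics.variance(sample)  # sample formula: divide by n - 1
    return biased_total / trials, corrected_total / trials

biased_avg, corrected_avg = average_sample_variances(n=5)

print(f"Population variance:               {sigma_squared:.1f}")
print(f"Average using n in denominator:     {biased_avg:.1f}")
print(f"Average using n - 1 in denominator: {corrected_avg:.1f}")
```

Across many samples, the population formula lands consistently below the true population variance, while the n − 1 version averages close to it. The error is always in the same direction, which is what makes it bias rather than random error.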
For this discussion, you will organize data sets into meaningful number groupings, calculate
basic descriptive values, and communicate written critiques of statistical analyses.
This exercise requires the use of a descriptive statistics calculator. You can find this tool in some
versions of Excel (as part of the Analysis ToolPak) or you can use one of the many free online
descriptive calculators such as the Descriptive Statistics Calculator by Calculator Soup.
To begin, come up with 20 different data points (that will form a set of data) and enter them into
the first column of an Excel spreadsheet. The data points can be any numbers you want as long
as there are 20 of them.
You will then use the descriptive statistics option in your descriptive statistics program or
calculator. This is explained in Chapter 1 of your course text. You should get an output similar
to the image in Figure 1.1.
This output must contain the following values: mean, standard error, median, mode, standard
deviation, sample variance, kurtosis, skewness, range, minimum, maximum, sum, and
count. Address the following points in your initial post:
• Begin your discussion by reporting your results for each of the values listed above.
• Based on this output, which single value best describes this set of data and why?
• If you could pick three of these values instead of only one, which three would you choose and why?
It is important to note that the answers to these questions may be different for each of you since
you are each using unique sets of data.
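For readers who want to cross-check a calculator's output, the following Python sketch computes the same thirteen values for a set of 20 numbers. It is not part of the assignment, which asks for Excel or an online calculator. The data set and the function name describe are invented for illustration, and the skewness and kurtosis lines assume the adjusted sample formulas that Excel documents for its SKEW and KURT functions; other tools may use different definitions and report slightly different numbers.

```python
import math
import statistics

def describe(values):
    """Return the descriptive values listed in the exercise as a dictionary.

    Needs at least 4 values with some variability, because the kurtosis
    formula divides by (n - 3) and the standardized scores divide by s.
    """
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)          # sample standard deviation (n - 1)
    z = [(x - mean) / s for x in values]  # standardized scores

    # Adjusted sample skewness and excess kurtosis (assumed to match
    # Excel's SKEW and KURT worksheet functions).
    skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurtosis = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
                * sum(v ** 4 for v in z)
                - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

    return {
        "mean": mean,
        "standard error": s / math.sqrt(n),
        "median": statistics.median(values),
        # multimode returns every most-frequent value as a list; if no value
        # repeats, it returns all of them, whereas Excel reports #N/A.
        "mode": statistics.multimode(values),
        "standard deviation": s,
        "sample variance": statistics.variance(values),
        "kurtosis": kurtosis,
        "skewness": skewness,
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "sum": sum(values),
        "count": n,
    }

# Hypothetical data set of 20 points, invented for this example.
data = [12, 15, 15, 18, 20, 21, 22, 22, 22, 25,
        27, 28, 30, 31, 33, 35, 36, 40, 42, 46]

summary = describe(data)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Because each student enters a different data set, the printed values will differ from person to person; the sketch only shows how each reported value is derived from the raw numbers.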
For this exercise, you may elect to use the "Analysis ToolPak" within Excel. This feature is
already part of the 2007 and 2010 Excel program for Windows; however, it must be
activated. The following directions were provided by the Help function within Excel. If you
experience issues while following these steps, utilize the Help function within Excel or contact
Microsoft's technical support for Excel.
The Analysis ToolPak includes the tools described below. To access these tools, click Data
Analysis in the Analysis group on the Data tab. If the Data Analysis command is not available,
you need to load the Analysis ToolPak add-in program.
Load the Analysis ToolPak:
1. Click the File tab, click Options, and then click the Add-Ins category.
2. In the Manage box, select Excel Add-ins and then click Go.
3. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
Tip: If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it.
If you are prompted that the Analysis ToolPak is not currently installed on your computer, click
Yes to install it.
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php