5
Reinforcement
Learning Objectives
After reading this chapter, you should be able to do the following:
• Define Thorndike’s Law of Effect and explore the difference between reinforcement and
classical conditioning.
• Understand the Premack principle and its practical applications.
• Identify three different types of reinforcers: primary, secondary, and social.
• Recognize the importance of the concepts of positive and negative reinforcement.
• Describe why a delay in the presentation of a reinforcer can seriously undermine its effectiveness.
• Explain the various schedules of reinforcement and the effects of these schedules on rate and
pattern of responding.
• Examine the relationship between motivation and reinforcement, including contrast effects
and the Yerkes-Dodson law.
• Discuss the concept of stimulus control.
Using a reward is one of the most obvious ways to encourage a behavior. Parents praise
children for good behavior; companies pay salespeople bonuses for high output; universities promote productive researchers. There is nothing new or profound about the idea
of using rewards to increase desirable behavior—the principle was probably known long
before the discovery of fire.
If the principle of reward is so obvious, though, why is behavior often so hard to change?
Why do parents find it so difficult to get their teenage children to clean their rooms? Or, to
take a more immediately relevant example, why do students sometimes find it so difficult
to make themselves study? There are, after all, very powerful rewards for studying: in the
short term, good course grades; in the longer term, a better job. Yet students often leave
studying until the last minute, and some don’t get around to it at all. Similarly, smoking
and overeating can take years off our lives, and people are often desperate to give up these
habits; yet the habits persist. Why is behavior in these situations apparently so irrational,
with rewards as potent as a good job and longer life having little effect? Clearly, the principle of reward cannot be quite as simple as it sounds.
To understand why rewards seem to control behavior in some situations but not others,
we will examine experimental research into the principles that determine the effectiveness of rewards. Then, in Chapter 6, we will examine some of the attempts that have been
made to apply the principles discovered in the laboratory in real life, and what these
attempts have revealed about both the strengths and weaknesses of rewards as a tool for
altering behavior. We will begin, though, with the first experimental study of rewards, by
Edward Lee Thorndike.
5.1 Thorndike’s Law of Effect
Edward Thorndike (1874-1949) was an American psychologist who spent nearly all of his academic life at Columbia University in New York City. His contributions in the areas of intelligence and learning helped develop the field of educational psychology.
Thorndike’s research, like Pavlov’s, had its roots in the philosophy of Associationism, but
its most immediate antecedent
was the publication in 1859 of
Charles Darwin’s Origin of Species. Darwin’s theory of evolution proposed that man was but
one animal species among many,
and this claim triggered a surge
of interest in the intelligence and
reasoning powers of animals. If
Darwin was correct in his belief
that we are closely related to
other animal species, then the
traditional view that animals
are dumb brutes becomes far
less attractive. After all, if our
close relatives were dumb, what
might that imply about us?
Are Animals Intelligent?
Rodin's The Thinker poses an interesting question: Can animal species think as deeply as humans? Thorndike's research suggested that they can't, but, as we shall see in Chapter 8, this conclusion was to prove controversial.

To lay the basis for a more realistic judgment, a contemporary of Darwin's named George Romanes collected observations of animal behavior from reliable observers around the world. When published, the material in Romanes's Animal Intelligence (1881) seemed to strongly support Darwin's thesis, as anecdote after anecdote revealed impressive powers of reasoning. Thorndike, however, was skeptical of these accounts. For one thing, he wondered if the observations were entirely accurate—observers' memories might have become distorted over time, and anecdotes might have been exaggerated as they were told and retold. Even if an incident were described correctly, it might not be representative of the species' typical behavior. As Thorndike noted,
Dogs get lost hundreds of times, and no one ever notices it or sends an
account of it to a scientific magazine. But let one find his way from Brooklyn to Yonkers and the fact immediately becomes a circulating anecdote.
Thousands of cats on thousands of occasions sit helplessly yowling, and
no one takes thought of it or writes to his friend, the professor; but let one
cat claw at the knob of a door supposedly as a signal to be let out, and
straightaway this cat becomes the representative of the cat-mind in all the
books. (Thorndike, 1898, p. 4)
The Law of Effect
To remedy these defects, Thorndike argued, “experiment must be substituted for observation and the collection of anecdotes. Thus . . . you can repeat the conditions at will, so
as to see whether or not the animal’s behavior is due to mere coincidence.” To this end,
Thorndike began to study learning in animals using an apparatus that he called a puzzle
box. Basically, it was little more than a wooden crate with a door that could be opened by
a special mechanism, such as a latch or rope (see Figure 5.1). Thorndike placed a dish containing food outside the box but visible through its slats, then put the animal to be tested
inside and observed its reactions.
Figure 5.1: Thorndike’s puzzle box
In Thorndike’s design, a dish of food was placed outside of the box, visible through the slats in the box.
Thorndike found that animal subjects placed in the box would eventually locate the release apparatus
and the latency of this response was shorter with each subsequent trial.
Source: Thorndike, 1911
When a hungry cat was first placed in the box, it would scramble around, frantically clawing and biting at the sides of the apparatus in an attempt to escape and reach the food.
After approximately 5 to 10 minutes of struggling, the cat would eventually stumble on
the correct response—for example, pressing a latch—and, finding the door open, would
rush out and eat the food. According to Romanes’s anecdotes, this success should have
led to the immediate repetition of the successful response on the following trial. Instead,
Thorndike found that the animal generally repeated the frantic struggling observed on
the first trial. When the cat finally did repeat the correct response, however, the latency
of this response—the amount of time that elapsed from the point when it entered the box
to the point when it performed the response—was generally shorter than it had been on
the first trial. It became shorter still on the third trial, and so on. Figure 5.2 presents representative records of the performance of two cats. Progress in both cases was gradual and
marked by occasional reversals, but on average the time to escape became progressively
shorter as training continued.
Figure 5.2: Changes in latency
This figure shows the changes in the latency of escape from the puzzle box over trials for two of
Thorndike’s cats.
This slow and irregular improvement did not suggest that the cats had formed any rational understanding of the situation. Instead, Thorndike argued, the food reward was gradually stamping in an association between the escape response and the stimuli present
when it was made (the visual appearance of the box, its odor, and so on). This stimulus-response or S-R association would be strengthened every time the cat was rewarded,
until eventually the box cues would elicit the correct response the instant the cat was
placed inside it. Thorndike repeated this experiment with other responses and also with
other species, including chicks, dogs, and monkeys. The basic pattern of the results was
almost always the same: a gradual improvement over many trials. This uniform pattern
suggested that the gradual strengthening effect of rewards was not confined to a single
situation or species but, rather, represented a general law of behavior, which Thorndike
called the Law of Effect:
Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things
being equal, be more firmly connected with the situation, so that, when it
recurs, they will be more likely to recur . . . . The greater the satisfaction . . .
the greater the strengthening . . . of the bond. (Thorndike, 1911, p. 24)
Some Controversial Issues
When Thorndike’s findings were published, they aroused considerable controversy. One
focus of debate was his claim that much if not all of animals’ seemingly intelligent behavior could be explained by the formation of associations. We will examine this issue in more
depth in Chapter 8; for now, we will simply focus on two other aspects of his findings that
attracted attention.
I Can’t Get No Satisfaction
Thorndike was criticized by behaviorists for his use of the term satisfaction, which refers
to a subjective or mental state. We can’t see into the mind of a cat, so how can we know
whether it is experiencing satisfaction? This difficulty in assessing satisfaction makes the
Law of Effect potentially circular: A response will increase if it is followed by a satisfying
outcome, but the only way we know whether the outcome is satisfying is if the response
increases! In fact, Thorndike was aware of this problem, and he proposed an independent
and objective test for determining whether a consequence was satisfying.
By a satisfying state of affairs is meant one which the animal does nothing
to avoid, often doing such things as attain and preserve it. (Thorndike,
1911, p. 245)
In other words, if a cat repeatedly tries to obtain food in one situation—for example, by jumping up onto a table where food is kept—then by definition this food must be satisfying, and the Law of Effect now allows us to predict that the food will also be an effective reward for other behaviors, such as escaping from the puzzle box. Meehl (1950) later labeled this property of rewards transituationality.

Lasagna was a powerful reinforcer for Garfield, the famous cartoon cat created by Jim Davis in the late 1970s, to the point where it sometimes seemed as if he would do anything to obtain it. This property of a reinforcer—that its effectiveness is not confined to one situation but can strengthen a wide range of behaviors in a wide range of situations—is known as transituationality.

Thorndike's objective definition of satisfaction saves the Law of Effect from circularity, but the term still bothered learning theorists because of its subjective connotation that a reward is emotionally satisfying. An experiment by Sheffield, Wulff, and Backer (1951) illustrates the dangers. To study what events are rewarding, they used an apparatus called a straight-alley maze, which consists of a start box and a goal box connected by a long alley (see Figure 5.3). To find out if a stimulus is rewarding, the stimulus is placed in the goal box, the subject is placed in the start box, and the experimenter records how long it takes for the subject to run to the goal box. If the stimulus is rewarding, it should strengthen the response of running, and the speed of running down the alley should thus increase over trials.
Figure 5.3: Straight-alley maze
In a straight-alley maze, a rat runs from a start box (here, at the left) to a goal box; the goal box usually contains a reward such as food.
The experimenters used male rats as subjects and a receptive female in the goal box as
the reward. The normal copulatory pattern in rats consists of a series of 8 to 12 intromissions and withdrawals by the male until it finally ejaculates. When the male reached the
goal box, the experimenters allowed it two intromissions, and then abruptly removed it
from the goal box. Intuitively, it is not obvious that males would regard this as a satisfying
experience, but it proved to be a very powerful reward, as their speed of running down
the alley increased over trials by a factor of eight!
Such evidence makes it at least questionable whether all events that strengthen behavior
are emotionally satisfying, and it has led learning theorists to prefer the more objective
term “reinforcer” to “reward.” A reinforcer can be defined as an event that increases the
probability of a response when presented after that response. Similarly, we can define
reinforcement as an increase in the probability of a response caused by the presentation
of a reinforcer following that response. So, for example, if a child receives a piece of candy
every time she does her homework, and this makes her sit down to do her homework
more frequently, the candy is considered a reinforcer; it has strengthened or reinforced
this behavior.
Reinforcement Versus Conditioning

The main difference between classical conditioning and reinforcement is that in classical conditioning, a US such as food is delivered regardless of what a person is doing at the time, whereas in reinforcement the person must perform a response—for example, work—to obtain it.

A further issue arising from Thorndike's work concerned the relationship between the learning he described and that described by Pavlov. The procedures used by the two investigators clearly differed. Pavlov arranged a contingency between two stimuli: food, for example, was presented following a tone but not in its absence. Thorndike, on the other hand, arranged a contingency between a response and a stimulus: Food was presented only after a correct response. If we use the symbol S to represent a stimulus, R to represent a response, and S* to represent a consequence, then we can represent the two forms of learning as follows:

Classical conditioning: S → S*
Reinforcement: R → S*
Carrying this point a bit further, in classical conditioning the presentation of food depends
solely on whether the CS has been presented: Whether a dog salivates has no effect on
whether food is given. In reinforcement, on the other hand, the presentation of food
depends crucially on the subject’s response: No response, no food.
In procedural terms, classical conditioning and reinforcement clearly differ. This, however, does not necessarily mean that the learning processes involved are different. As we
suggested in our discussion of classical conditioning and causal learning, a single learning process could be involved in detecting relationships between events, regardless of the
nature of the events concerned. Thus, although the procedures used in classical conditioning and reinforcement are different, the underlying processes could be the same.
If the learning processes were the same, we should expect the principles of classical
conditioning and reinforcement to be similar if not identical. Just as classical conditioning depends on how closely the US follows the CS, for example, so the effectiveness of
reinforcement should depend on how closely the reinforcer follows the response. As we
shall see shortly, contiguity is indeed critical in reinforcement, and many of the other
principles of conditioning and reinforcement also turn out to be the same. (For further
discussion, see Colwill & Rescorla, 1990; Williams, Preston, & de Kervor, 1990.) For our
present purposes, though, the key point to note is the distinction between the two procedures. In both reinforcement and classical conditioning, a response is strengthened
because of the presentation of an event such as food. However, in reinforcement food is
delivered following a response, whereas in classical conditioning food is delivered following a stimulus.
5.2 The Reinforcer
One obvious determinant of whether a reward will strengthen behavior is the attractiveness of the reward. As in the classic recipe for elephant stew—where the first step is said
to be “catch an elephant”—the first step in using reinforcement effectively is to identify a
suitable reinforcer.
Primary Reinforcers
The most obvious candidates for reinforcers are stimuli that are necessary for survival,
such as food and water. It makes sense that such stimuli would become reinforcing in the
course of evolution because an animal that repeats a response that has led to food is likely
to have a better chance of obtaining food in the future. Thus, a gene that enabled food to
be established as a reinforcer would be likely to be transmitted to future generations. It
therefore came as no surprise when early research demonstrated that stimuli such as food,
water, and sexual intercourse were all reinforcing. These types of stimuli are known as primary reinforcers, and they are effective essentially from birth.

In the early 1950s, evidence began to accumulate that not all reinforcers were necessary for survival, at least not in the simple physical sense in which food is reinforcing. In an experiment by Butler (1954), monkeys were placed in an enclosed cage with two wooden panels, one painted yellow and the other blue. If a monkey pushed open the blue door, it was allowed to look out into the experimental room beyond for a period of 30 seconds. If it pushed against the yellow door, an opaque screen immediately came down, terminating the trial. In this experiment, the reinforcer was the monkey's view of the room—in other words, access to visual stimuli outside the confines of its box. Not only did the monkeys quickly solve this problem, learning to push only the blue door, regardless of the side on which it was presented, they proved remarkably persistent in performing the response. In one experiment in which there was a trial once a minute—that is, a 30-second opportunity to look out into the room, followed by a 30-second blank interval—one subject responded on every single trial for nine hours without a break. A second subject responded for 11 hours, and a third for an extraordinary 19 consecutive hours.

A rat's reward for navigating a maze successfully would be cheese, but research on sensory reinforcement suggests that a rat might enjoy simply running through the maze! That is, if the rat were given a choice between a path that always led to a complex maze and a second path that always led to an empty box of the same size, it might learn to choose the path that led to the more complex (in human terms, interesting) environment.
Visual access to the surrounding room was clearly not necessary for the monkeys’ survival in any direct sense, but it proved to be a remarkably potent reinforcer. As Butler
commented, “That monkeys would work as long and as persistently for food is highly
unlikely.” Visual stimulation now appears to be only one example of a large set of events
that Kish (1966) has referred to as sensory reinforcers. The most important characteristic of these reinforcers seems to be that they provide variety in our perceptual environment. Rats, for example, prefer to explore complex mazes with many turns rather than to
explore simple ones (Montgomery, 1954), and humans confined in a dark room will push
a button that turns on a panel of flashing lights, with the rate of button-pushing increasing as the pattern of lights becomes less predictable (Jones, Wilkinson, & Braden, 1961).
Sensory reinforcers are also primary reinforcers, since they require no special training to
be effective.
The Premack Principle
The evidence that sensory stimulation can be reinforcing suggests that not all reinforcers
are physically necessary for survival. Is there any other characteristic, then, that reinforcers such as hamburgers, sex, and flashing lights share? Perhaps the most useful integrating
principle is one suggested by David Premack
(1965, 1971). Premack argued that different experiences all have different values for us and that these
values can be inferred by observing the amount of
time in which we engage in these activities when
they are freely available. The common characteristic of reinforcers, said Premack, is that they are all
high-probability activities. Is it possible, then, that
any high-probability activity will reinforce any
response that has a lower probability?
Video games are clearly not necessary
for physical survival, but some children
would gladly spend most of their free time
playing them. For these children, playing
with video games is a high-probability
activity, and according to David Premack,
access to any high-probability activity
can be used to reinforce less probable
activities. Parents who allow children to
play video games only when they have
completed their homework are effectively
applying the Premack principle.
Suppose that a group of children were given free
access to a number of foods and were found to
prefer potatoes to spinach, but to strongly prefer ice cream to both of them. If high-probability
responses reinforce lower-probability responses,
then—as all parents know—we should be able to
use access to ice cream to reinforce eating spinach.
However, we should also be able to use access to
potatoes to reinforce eating spinach, albeit less
effectively, because eating potatoes is also a higher-probability response. Premack (1965) tested predictions like these in a series of experiments involving rats and children, and on the whole the results
were positive. The suggestion that more probable
responses will reinforce less probable responses
thus became known as the Premack principle.
A subtle-but-important elaboration of this principle is known as the response deprivation hypothesis (Timberlake & Allison, 1974). This hypothesis
states that whether an activity will serve as a reinforcer depends on whether the current level of the
activity is below its preferred level. If a child, given a free choice, will eat two ice cream
bars a day, then access to ice cream will be reinforcing if the child currently has access to
less than two bars. If, however, the child has already eaten four ice cream bars, access to
ice cream will most likely not be an effective reinforcer.
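As an informal illustration of this rule (the function and the numbers below are invented for this example, not taken from Timberlake and Allison), the hypothesis can be stated as a simple comparison between current access and the freely chosen baseline:

```python
def would_reinforce(baseline_level, current_level):
    # Response deprivation rule (informal sketch): an activity should serve as a
    # reinforcer only when access to it is currently below its preferred level.
    return current_level < baseline_level

# Hypothetical numbers based on the ice cream example above:
# a child who, given free choice, eats two ice cream bars a day.
print(would_reinforce(baseline_level=2, current_level=1))  # True: below baseline, so reinforcing
print(would_reinforce(baseline_level=2, current_level=4))  # False: already above baseline
```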
A “Childish” Application
Homme, deBaca, Devine, Steinhorst, and Rickert (1963) reported a particularly delightful application of the Premack principle. The subjects were unruly three-year-olds who
repeatedly ignored their nursery school teacher’s instructions and, instead, raced around
the room screaming and pushing furniture. One common reaction of adults in such situations is to lose their tempers and punish the children to get them to do as they are told.
Instead, Homme and his co-workers set out to reinforce good behavior through a judicious
application of the Premack principle. They reinforced the children’s behavior whenever
the children sat and played quietly for a specified period of time, with the reinforcer being
several minutes of uninterrupted running and screaming! In only a few days, the children
were obeying the teacher’s instructions almost perfectly, so that “an observer, new on the
scene, almost certainly would have assumed extensive aversive control was being used”
(Homme et al., 1963). Later on, new and even better reinforcers were developed through
continued observation of the children’s behavior, including such unusual rewards as
allowing the children to throw a plastic cup across the room, to kick a wastepaper basket,
and, best of all, to push the teacher around the room in a swivel chair on rolling wheels!
The moral of this story is that it is a mistake to think of reinforcers in terms of a restricted
list of “approved” stimuli. There is no magic list of reinforcers; the best way to determine
what will be reinforcing is to observe what activities a subject engages in when given a
free choice.
Secondary Reinforcers
In contrast to primary reinforcers, which are effective from birth, some of the most powerful reinforcers affecting our behavior are secondary or conditioned reinforcers, which
have acquired their reinforcing properties through experience. Money, for example, is not
at first an effective reinforcer. Showering an infant
with dollar bills is unlikely to have any discernible
impact on the infant’s behavior. As we grow older,
though, money becomes increasingly important;
in some cases, it becomes an obsession. How, then,
do secondary reinforcers, such as money or the
word “good,” acquire their reinforcing properties?
John B. Wolfe (1936) made one of the first attempts
to answer this question by examining whether
the powerful effects of money in real life could
be reproduced in the animal laboratory. Using six
chimpanzees as subjects, Wolfe first trained them
to place a token into a vending machine to obtain
grapes. Once they had mastered this task, they
were given a heavy lever to operate to obtain further tokens; Wolfe found that they would work
as hard to operate the lever when the reward was
tokens as when the reward was the grapes themselves. Furthermore, their behavior bore some
striking similarities to that of humans regarding
money. In one experiment in which the chimpanzees were tested in pairs, Wolfe found that the
dominant member of the pair would sometimes
push aside its subordinate to gain access to the
lever. If the subordinate had already amassed a
pile of tokens, then the dominant member might
simply take them away. In one of the pairs, however, the subordinate, a chimp named Bula, developed an effective counterstrategy. She would turn toward her partner, Bimba, extend her hand palm up, and begin to whine. This apparent begging was invariably successful: As soon as Bula began to whine, Bimba would quickly hand her one of the tokens and would continue doing so until she stopped whining.

The tokens that issue from slot machines can be thought of as secondary reinforcers. Some individuals will sit in front of a slot machine for hours, waiting for the occasional payout of tokens, which they can exchange for real money. The reinforcing properties of tokens are thus acquired through their association with a conditioned reinforcer such as money, which itself becomes a reinforcer through its association with primary reinforcers such as food and other goods that are necessary for survival.
As in this study, secondary reinforcers generally acquire their reinforcing properties
through pairing with primary reinforcers. (For reviews, see Fantino, 1977, and Williams,
1994. For a possible exception, see Lieberman, 1972, and Lieberman, Cathro, Nichol, &
Watson, 1997.) If you wanted to establish a word such as “good” as a secondary reinforcer
for a child, for example, you would want to ensure that this word was followed by other
reinforcers such as hugs or candy. And, once “good” had become an effective reinforcer,
you would want to continue to pair it with backup reinforcers at least occasionally. As
with classical conditioning, continually presenting a secondary reinforcer by itself is likely
to extinguish its reinforcing properties (for example, Warren & Cairns, 1972).
Social Reinforcers
A third category (one not usually treated separately) is that of social reinforcers—stimuli
whose reinforcing properties derive uniquely from their origin in the behavior of other
members of the same species. Stimuli such as praise, affection, comfort and even simple
attention from another person can be reinforcing. In fact, this category of reinforcers is
probably the one that we encounter most often in our daily lives, and it plays an important—and often underestimated—role in controlling our behavior.
Social reinforcers are a blend of both primary and secondary reinforcers. Poulson (1983)
found that an adult’s smile could reinforce behavior in infants as young as three months,
suggesting that smiling is innately reinforcing.
But considerable evidence also indicates that the
power of social reinforcers can be altered by pairing them with other reinforcers. The reinforcing
properties of the word “good,” for example, can be
increased by following this instance of praise with
candy (Warren & Cairns, 1972). Thus, although
social reinforcement might have an innate basis,
experience also plays an important role.
Actress Judy Garland hugging her daughter Liza Minnelli. Hugging can be a very powerful social reinforcer for children.
We can illustrate the power of social reinforcers
with a study by Allen, Hart, Buell, Harris, and Wolf
(1964). The subject was a four-year-old girl, Ann,
who had just started nursery school. From the time
of her arrival, she interacted more frequently with
the adults than the other children, and as time went
on she developed a variety of behavioral problems. She complained frequently about skin abrasions that no one else could see; she spoke in a low
voice that was very difficult to hear; and she spent
increasing amounts of time standing by herself,
pulling at her lower lip and fingering her cheek.
One possible analysis of Ann’s behavior might
have been that she was an insecure and unhappy
child, and thus needed as much comfort and reassurance as possible to help her adjust
to her new surroundings. The authors’ analysis, however, was quite different. They
noticed that a common feature of Ann’s problem behaviors was that they elicited adult
attention. If she stood by herself, for example, a teacher was likely to come over to ask
what was wrong. If adult attention was reinforcing for Ann, the teachers were inadvertently encouraging the very behaviors they were trying to eliminate. The authors’
advice to the teachers, therefore, was to change the reinforcement contingencies by
paying attention to Ann whenever she played with others but to ignore her when she
stood alone. When Ann did talk or play with other children, a teacher would come over
to Ann, smile, and talk to her about what she was doing. In this way, the teachers began
to reinforce Ann’s social behavior by following her interactions with other children
with attention.
The result was a dramatic transformation in Ann’s behavior. After just a single day, the
proportion of her time spent in social play increased from 10% to 60%, and this higher
level was maintained over subsequent weeks. The frequency of reinforcement—in other
words, the number of times the teachers came over to Ann while she was playing with
others—was then gradually reduced and eventually faded out altogether, but Ann’s social
play remained at a high level. (As her skills in playing with other children increased, this
play probably became its own source of reinforcement.)
Social reinforcers can be very powerful: Even
a small shift in adult attention—not money, not
candy, but simple attention—was sufficient to
substantially alter Ann’s behavior. Also, as often
happens, the crucial role of social reinforcement
in directing Ann’s behavior was not at first appreciated. Actions such as paying attention to someone are such a common part of our lives that we
take them for granted, but, as we shall see again
in other applications, social reinforcement can
play a very powerful role in controlling behavior.
Negative Reinforcers
There is another class of reinforcers that we need
to discuss at least briefly before proceeding. All
of the reinforcers we have discussed to this point
have been positive reinforcers, stimuli whose
presentation will strengthen preceding responses.
However, certain stimuli will strengthen behavior if they are removed, and these stimuli are called
negative reinforcers. Suppose, for example, that
you have the misfortune to move into an apartment where your neighbor plays appallingly
loud music every night. And further suppose
that the room contains a white button mounted
on a wall, and you discover that each time you
push the button the noise stops for one minute.
You would likely develop a real fondness for pushing the button. This is considered an instance of negative reinforcement: The reinforcer is the cessation of the loud music, and the response of pushing the button is reinforced, or strengthened, every time the music stops. In other words, the reinforcer is the removal of an unpleasant stimulus, rather than the presentation of a pleasant one. (Another example would be taking an aspirin to relieve the pain of a headache; this behavior would be reinforced by the termination of pain.) To recap, we talk about a stimulus as a positive reinforcer when its presentation strengthens a response, but as a negative reinforcer when it is the removal of the stimulus that is reinforcing.

When we take aspirin to relieve a headache, the subsequent reduction in pain is very reinforcing and makes it more likely that we will take aspirin again the next time we have a headache. Because the reinforcer is the removal of something—in this case, headache pain—it is called a negative reinforcer.
Note that positive and negative reinforcement are both forms of reinforcement: In both
cases, the outcome is a strengthening of a response. The term “negative reinforcement” is
sometimes misused to mean punishment, but that is a mistake to avoid. If the term
reinforcement is used, it always means a strengthening of behavior; in negative reinforcement this is achieved by removing an unpleasant or undesirable stimulus.
In some ways the distinction between positive and negative reinforcement is purely technical—is the result achieved by presenting a stimulus or removing it?—but it is important
to use the terms correctly so as to avoid misunderstandings.
5.3 Delay of Reinforcement
Having identified a wide variety of potential reinforcers, we turn now to the question of
what determines whether a particular reinforcer will be effective in practice.
Research with Animals
Because of the critical importance of contiguity in classical conditioning, where delays
of even a few seconds between the CS and the US could prevent conditioning, researchers assumed that contiguity would also be critical in reinforcement. Early attempts to
demonstrate this, however, encountered unexpected difficulties. In a study by Wolfe
(1934), for example, rats were allowed to run through a T-shaped maze, or T-maze, to
obtain food (see Figure 5.4). Subjects received food only if they turned to the right, and
Wolfe anticipated that imposing a delay between turning to the right and obtaining food
would make learning difficult. In fact, he found that rats were able to learn the correct
path even with delays of 20 minutes between the time they chose the correct path and
the time they obtained food.
Figure 5.4: T-maze
In a T-maze, the rat runs from the start box to a choice point, where it can turn to either the right or the left. In a typical experiment, only one of the goal boxes contains a reward.
The reason, researchers eventually discovered, was that although the primary reinforcer,
food, was delayed, the response was still producing immediate secondary reinforcement.
When the rats made a correct turn, they immediately entered a delay box where they were
held until they were released into the goal box that contained the food. This meant that
the stimuli of the delay box were present just before they obtained food, and this resulted
in the delay box becoming a reinforcing stimulus: When the rats made a correct response
on subsequent trials, simply entering the delay box reinforced this response (Grice, 1948).
Subsequent research showed that when immediate secondary reinforcement is eliminated,
however, contiguity is just as important in reinforcement as it is in conditioning. A study
by Dickinson, Watt, and Griffiths (1992) provides one example. The authors trained rats in
a standard testing apparatus called an operant chamber or, more colloquially, a Skinner
box. This apparatus was developed by one of the most influential figures in the history
of animal learning research, B. F. Skinner, and is essentially a descendant of the puzzle
box developed by Thorndike. In Thorndike’s box, subjects had to open a door to escape
from the box and obtain food; in the Skinner box, animals remain in the box and make a
response such as pressing a lever to obtain food. (See Figure 5.5 for an illustration of a typical Skinner box.) Because rats are free to press the lever again as soon as they have eaten
the food, it is possible to deliver many reinforcers in a very short period of time, making
the Skinner box a very efficient apparatus for studying the development of learning.
Figure 5.5: Skinner box
A Skinner box, or operant chamber, for rats. When the rat presses the bar, a food pellet is delivered to a tray located below the bar. (Labeled parts of the apparatus include a speaker, signal lights, a lever, a pellet dispenser and dispenser tube, a food cup, and an electric grid wired to a shock generator.)
In the Dickinson et al. study, the time between pressing the lever and obtaining food was
varied in different groups: Some rats received a food pellet 2 seconds after pressing the
lever, others after delays of up to 64 seconds. As shown in Figure 5.6, the delay used had
a powerful effect on the rate at which the rats pressed the lever. An increase in the delay
of just a few seconds produced sharply lower rates of responding, and responding ceased
altogether when the delay reached 64 seconds.
Figure 5.6: Effects of delayed reinforcement
This figure demonstrates the effects of delayed reinforcement on bar-pressing in rats, plotting lever presses per minute against the delay of reinforcement in seconds. The longer reinforcement was delayed, the lower was the rate of responding.
Source: From Dickinson, A., Watt, A., & Griffiths, W. J. (1992). Free-operant acquisition with delayed reinforcement. Quarterly Journal
of Experimental Psychology: Comparative and Physiological Psychology, 45B, 241–258, Figure 6. Reprinted by permission of Taylor &
Francis Group.
Why should a delay of just a few seconds have such a powerful impact? At first, learning theorists thought it was because rats have poor memories, so that if a reward were
delayed, they wouldn’t be able to remember the response that produced it. However,
later research made it clear that rats can remember their responses for surprisingly long
periods—in one study by Capaldi (1971), for 24 hours. It now looks as if the problem is not
that rats can’t remember their responses, but rather that they have difficulty figuring out
which of the many responses they have made produced the reward.
From the experimenter’s point of view, the correct response in the Dickinson et al. study
seems obvious, but from the rat’s perspective the situation was far more confusing. Prior
to finding the food it would have been engaged in a continuous stream of activity—grooming, exploring the cage, and so on—and this behavior would have continued during the
delay interval. At any given moment, moreover, it would have been performing many
responses simultaneously. As it pressed the lever, it might have been holding its head at
a 45-degree angle, breathing rapidly, curling its tail to its left side, and so on. Rather than
the simple situation depicted in Figure 5.7a, with just a single response preceding food,
the rat would have experienced a situation more like that shown in Figure 5.7b, with the
correct response (Rc) embedded in a sea of other behaviors.
Figure 5.7: Dickinson study from the experimenter's and rat's perspectives
A situation in which a rat receives food after a delay, (a) from the experimenter's perspective, in which only a single response precedes food, and (b) from the rat's perspective, in which the correct response was only one of many which it made, both simultaneously and sequentially, before receiving the food.
Model created by author
Viewed from this perspective, it is not surprising that rats have difficulty figuring out
which of their behaviors produced food. There are so many possibilities, it’s a wonder
that they ever do solve such problems. If the correct response is made more obvious or
marked by having it produce a distinctive stimulus—for example, a brief tone or flash of
light—then rats turn out to be much better at solving the problem (for example, Lieberman, McIntosh, & Thomas, 1979).
Research with Humans
What, then, of humans: Will a reinforcer be effective only if it occurs within seconds of
the behavior to be strengthened? You know from your own experience that this is not
so: A good grade for an essay, for example, can influence your future behavior even
if there was a delay of many days between your writing the essay and receiving the
grade. How, then, can we reconcile the evidence from animal research with our everyday experience?
One obvious answer is language. If you received a reward without any explanation—
think of a mysterious stranger approaching you, silently handing you $500 and walking away—then you, like the rats in Dickinson et al.’s study, might also struggle to
understand why you were being rewarded. Indeed, when experiments with humans
employ procedures that parallel those used with animals, where rewards are provided
without explanation, the results are almost uncannily similar (e.g., Shanks, Pearson, &
Dickinson, 1989; Lieberman, Vogel & Nisbet, 2008). Fortunately for us, the relationship
between our behavior and its rewards is rarely this opaque because we possess language.
If a father decides to reward his young daughter for some exemplary behavior, he doesn’t
just hand her a new toy without a word; he explains what it is for. Language can bridge
the gap between a response and a reward symbolically, even when physically they are
widely separated in time.
Delay Reduces Incentive
Our possession of language means that a delay in the presentation of a reward need not
be nearly as catastrophic for people as for rats. Nevertheless, we are going to suggest that
it is still desirable—and sometimes even vital—to reward behaviors as quickly as circumstances allow. One reason is that rewards tend to be perceived as less attractive when they
are delayed. If you were offered a choice between receiving $100 now or in a year, would
you find these options equally attractive? It seems unlikely. In the jargon of the field, a
delayed reward has less attractive incentive value, and we are less motivated to work to
obtain it.
One example was reported in a study by Rachlin and Green (1972) using pigeons as subjects. They trained the pigeons in a Skinner box containing two circular plastic disks called
keys. If the birds pecked the key on the left (key 1), grain was immediately made available
for 2 seconds, whereas if they pecked the key on the right (key 2), grain was made available for 4 seconds, but only after a delay of 4 seconds.
R1 → 2 seconds of food (immediate)
R2 → 4-second delay → 4 seconds of food
The time between trials was held constant, so that over the course of a session, a bird that
always pecked key 2 would receive twice as much food as a bird that always pecked key 1.
Despite this, the pigeons pecked key 1 on 95% of the trials. They preferred to receive half
as much food rather than wait four seconds for the larger amount.
You might be tempted to dismiss this result as evidence of pigeons’ lack of intelligence,
but Kirby and Herrnstein (1995) found that humans discount delayed reinforcers in much
the same way. To assess the value of delayed rewards, they offered college students a
choice between a smaller amount of money to be delivered soon and a larger amount to
be delivered later. For example, subjects were asked if they would prefer $12 in 6 days or
$16 in 12 days. The students were offered a number of such choices, and, to ensure that
they would take these choices seriously, they were told that one of their choices would be
selected at random at the end of the session, and they would actually receive the option
they had chosen. Rationally, you might think that the students would have preferred
receiving $16 to $12. Since both rewards were substantially delayed anyway, surely it
would be better to wait a few more days and receive 33% more money? Apparently not, as
most participants preferred the smaller sum that was delivered sooner. Like pigeons, we
seem to value rewards less when they are delayed. (See also Kirby, 1997.)
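The size of this effect can be described with a simple discounting equation. The hyperbolic form V = A / (1 + kD) is widely used in this literature, but note that neither the equation nor the value of k below comes from the Kirby and Herrnstein study; they are assumptions chosen purely to illustrate how a smaller-sooner reward can end up with the higher subjective value.

```python
# Illustrative sketch of delay discounting (hyperbolic form V = A / (1 + k*D)).
# The discount rate k = 0.1 per day is an arbitrary example, not an estimate
# from the studies described in this section.
def discounted_value(amount, delay_days, k=0.1):
    """Subjective value of a reward of size `amount` delivered after `delay_days`."""
    return amount / (1 + k * delay_days)

sooner = discounted_value(12, 6)    # $12 in 6 days  -> 7.50
later = discounted_value(16, 12)    # $16 in 12 days -> about 7.27
print(sooner > later)  # True: with this k, the smaller-sooner reward is valued more
```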
An intriguing study by Madden, Petry, Badger, and Bickel (1997) suggests that this preference for immediate rewards is even more pronounced in drug addicts. The authors asked
addicts and control subjects to choose between hypothetical options similar to those used by
Kirby and Herrnstein. They found that addicts were much more likely to choose the more
immediate reward—they seemed less able to tolerate delayed gratification. One possible
reason for this result is that addicts’ experiences with drugs increased their need for immediate reinforcement. Another possibility is that the addicts had a preexisting need for immediate reinforcement that made them more vulnerable to addiction in the first place. Resolving
this question would require longitudinal data, tracking whether individuals who are poor
at tolerating delayed rewards are more likely to later develop problems such as addiction.
Reinforcing Homework
A study by Phillips (1968) provides a real-life example of the value of providing reinforcement quickly. To improve procedures for treating juvenile delinquents, Phillips established a residential home for boys, called Achievement Place. Because one problem shared by most delinquents is failure in school, which in turn reflects an almost total failure to do any assigned homework, Phillips set out to encourage homework completion through the use of reinforcers. Whenever an assignment was completed to an acceptable standard, the boys were allowed to stay up for one hour past their normal bedtime on weekends. This reward was known as "weekly time." The effect of this reward on the behavior of one boy, Tom, is shown in Figure 5.8. Over a 14-day period, Tom did not complete a single assignment.

Many students have difficulty studying in a consistent way; part of the reason for this is that the reinforcers for studying (such as getting a good grade, obtaining a job, etc.) are delayed rather than immediate.
One possible explanation for this failure was that the reinforcer being used was not sufficiently attractive; maybe Tom didn’t value being allowed to stay up late. Another possible explanation was the delay between completing an assignment during the week and
being allowed to stay up late on the weekend. To find out, Phillips used exactly the same
reinforcer in the next phase of the study—one hour of late time for each correct assignment—but now allowed Tom to stay up on the night that an assignment was completed
rather than waiting until the weekend. These results are also shown in Figure 5.8, in the
section labeled “daily time.”
Figure 5.8: The Phillips study: different reinforcement conditions
(Assignments completed, in percent, are plotted across sessions for the "weekly time" and "daily time" conditions.)
This figure displays the percentage of homework assignments completed by Tom under two conditions.
In both, the reward for successful completion of homework was being allowed to stay up an extra hour.
In the first condition, the reward could not be collected until the weekend (“weekly late time”), and
Tom did not complete a single assignment. When he could stay up the same night (“daily late time”),
his performance improved substantially.
Source: Adapted from Phillips, 1968
We can see that the percentage of homework assignments completed rose immediately
from 0 to an average of 50%. Even though the same reinforcer was used in both conditions, its effectiveness varied dramatically depending on the delay in its presentation.
Thus, although reinforcers can be effective after a delay, as a general rule they should be
delivered as soon after a response as possible if they are to achieve their full potential.
Failure to adhere to this principle may be one of the most important reasons that reinforcers are sometimes ineffective.
At the beginning of the chapter, we referred to the puzzle of why students have difficulty
studying despite the potent rewards—good grades, a job that pays well—contingent on
this behavior. One important reason is almost certainly the delay involved in reinforcement. The reinforcers for studying arrive only after very long delays, whereas those for
alternative activities, such as going to a movie or a football game, are essentially immediate. This principle is illustrated in the diagram below, where SR represents any of the possible reinforcers for going to a movie (being with friends, the enjoyment of watching the
movie itself, etc.). SR follows R movie almost immediately. In contrast, the reinforcers for
studying (R studying) come much later in time.
R movie → SR (almost immediate)
R studying → . . . → SR (long delay)
The student who doesn’t study might thus be behaving much like the pigeon in the Rachlin and Green study: he knows that in the long term one response produces much more
valuable consequences, but is nevertheless unable to resist the temptation of immediate
gratification. The moral to this section can thus be summarized very simply: For a reinforcer to be maximally effective, it should be presented as soon as possible after a response.
5.4 Schedules of Reinforcement
One of the most important factors determining the effect of reinforcement was discovered
by accident. When Skinner was carrying out the research for his Ph.D., he ran his experiments on weekends as well as during the week, and one Saturday he discovered that his
supply of pellets would not last until Monday. So instead of reinforcing every bar-press as
he had done in the past, he decided to reinforce only one per minute. This had two gratifying consequences:
1. His supply of pellets lasted almost indefinitely.
2. The rats continued to respond and, after some initial hesitation, did so at a steady
rate.
Over time, Skinner tried several different rules, or reinforcement schedules, for deciding
which responses to reinforce, and he found that the choice of schedule had important consequences for how his animals responded. We will begin by defining some of the schedules he used and then look at their effects on behavior.
Ratio and Interval Schedules
The simplest schedule is to reinforce a response every time it occurs. This schedule is
known, not unreasonably, as a continuous reinforcement (CRF) schedule. In the real
world, though, behavior is rarely reinforced so consistently. Children, for example, are
not praised every time they tell the truth, and factory workers are not paid every time they
tighten a screw. Instead, most behavior is reinforced on intermittent, or partial, reinforcement schedules.
Two types of partial reinforcement schedules have been studied most often: ratio schedules and interval schedules. In a ratio schedule, reinforcement depends on the number
of responses that have been emitted. In factories, for example, workers’ wages used to
depend solely on the number of responses they made—for example, the number of dresses
made—regardless of how long it took. In an interval schedule, on the other hand, the passage of time since the last reinforcement, rather than the number of responses, determines
whether the next response will be reinforced. Whether you find mail the next time you go
to your mailbox, for example, will depend on how long it has been since the last time you
found mail, not on how often you visited the mailbox in the interim. Note that obtaining
reinforcement in an interval schedule still requires
a response: You do not obtain mail unless you go to
the mailbox. The length of the interval determines
when reinforcement becomes available; a response
is still necessary to actually obtain it.
Further complicating matters, ratio and interval schedules can be subdivided according to
whether the requirement for reinforcement is
fixed or variable. In a fixed interval (FI) schedule,
the interval that must elapse before a response
can be reinforced is always the same, whereas in
a variable interval (VI) schedule this interval is
varied. In an FI 60-second schedule, for example,
60 seconds must always elapse following a reinforcement before a response can be reinforced
again, whereas in a VI 60-second schedule, the
interval might be as short as 5 seconds or as long
as 2 minutes. (The 60 seconds in the schedule’s
name refers to the average.)
Ratio schedules are subdivided in a similar way. In a fixed ratio (FR) schedule, the number of responses required for reinforcement is always the same. For example, a rat who receives a food pellet after every three lever presses is on a fixed ratio schedule. In a variable ratio (VR) schedule, the number of responses required to obtain reinforcement varies across successive reinforcements. For example, FR 30 means that every 30th response will be reinforced; VR 30 means that an average of 30 responses (sometimes only 5 responses, sometimes 50, and so on) will be required for reinforcement. A slot machine in a casino is a classic example of a VR schedule: Payoffs depend on how many times the machine is played, but the jackpot is made unpredictable to prevent players from playing only when a machine has been in use by others for a long time. Figure 5.9 summarizes these four schedules.

Trips to a mailbox are reinforced on an FI schedule—a set period must elapse following a delivery before another trip can be reinforced.
Figure 5.9: Partial reinforcement schedules
The most commonly studied types are ratio (where reinforcement depends on the number of responses
emitted) and interval (where reinforcement depends on time since the last reinforced response).
Schedules are further subdivided according to whether the schedule requirement is fixed or variable.
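To make the four definitions concrete, the sketch below shows one way the reinforcement decision could be programmed. It is an illustrative approximation rather than code from any study described here; in particular, the VR rule is implemented as a constant per-response probability (strictly speaking, a random-ratio rule).

```python
import random

# Illustrative sketch of the four partial reinforcement schedules.
# The parameter values (30 responses, 60 seconds) are arbitrary examples.

def fixed_ratio(responses_since_reinforcement, ratio=30):
    # FR: reinforce once a fixed number of responses has been made
    return responses_since_reinforcement >= ratio

def variable_ratio(mean_ratio=30):
    # VR (approximated as a random ratio): each response has a 1/mean_ratio
    # chance of reinforcement, so the required count varies around the mean
    return random.random() < 1.0 / mean_ratio

def fixed_interval(seconds_since_reinforcement, interval=60):
    # FI: the first response made after a fixed interval has elapsed is reinforced
    return seconds_since_reinforcement >= interval

def variable_interval(seconds_since_reinforcement, scheduled_interval):
    # VI: same rule as FI, but the scheduled interval is redrawn (for example,
    # around a 60-second average) after each reinforcement
    return seconds_since_reinforcement >= scheduled_interval
```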
Patterns of Responding
Learning the distinctions among the various schedules can be tedious, but each schedule
has somewhat different effects on behavior, and these differences can be important. Figure
5.10 presents cumulative records illustrating the typical patterns of responding obtained
under FI and FR schedules of reinforcement. In a cumulative response record, time is
plotted along the x-axis and the y-axis shows the cumulative or total number of responses
made since the beginning of the session. If a rat were to press a lever at a steady rate of one
press every second, this would appear on a cumulative record as an ascending straight
line. The faster the rat responds, the more steeply the line will rise.
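As an informal illustration (the response times below are invented), a cumulative record is simply a running count of responses paired with the time of each response:

```python
# Hypothetical times (in seconds) at which a rat pressed the lever.
response_times = [1, 2, 3, 5, 8, 9, 10, 11]

# Each response raises the running total by one; plotting these (time, total)
# pairs produces the ascending line described above, and faster responding
# produces a steeper slope.
cumulative_record = [(t, count + 1) for count, t in enumerate(response_times)]
print(cumulative_record)  # [(1, 1), (2, 2), (3, 3), (5, 4), ...]
```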
Figure 5.10: Typical cumulative response records
This figure shows the typical cumulative response records generated by two types of schedules: (a) fixed interval (FI); (b) fixed ratio (FR). In both panels, time is plotted on the x-axis and cumulative responses on the y-axis. The short diagonal marks indicate presentations of a reinforcer.
In an FI schedule (Figure 5.10a), reinforcement becomes available only after a fixed period
of time has elapsed following the previous reinforcement; each short diagonal mark on
the record indicates the occurrence of a reinforcer. We can see that immediately after
reinforcement, subjects respond at a very low rate, but this rate steadily accelerates and
reaches a peak just before the next reinforcement is due. Thus, subjects tend to respond in
a cyclical pattern.
Because of its appearance when graphed, this positively accelerated response pattern is
called an FI scallop, and it has important implications for the practical use of FI schedules. For example, if you were a parent who wanted to encourage your daughter to study
by praising this behavior, it would be a great mistake to visit her room only at regular,
hourly intervals. If your praise were the main reinforcer for studying, it is likely that
your daughter would begin to study at regular, hourly intervals. Ironically, psychology
professors (including those teaching learning) make exactly this mistake by scheduling
exams at predictable, fixed intervals, with the result that students’ studying often takes
the form of a classic FI scallop: a zero or very low rate of studying immediately after
an exam, gradually rising to a frantic peak the night before the next exam! Mawhinney,
Bostow, Laws, Blumenfeld, and Hopkins (1971) reported evidence that studying really
does follow this pattern. To estimate the amount of time students spend studying, they
monitored the use of course material in the library. When exams were scheduled daily,
students maintained a constant rate of studying of around 60 minutes per day. When
exams were scheduled at 3-week intervals, students spent an average of only 15 minutes studying during the first session following an exam, but study time then increased
steadily over days, reaching a peak of almost two hours during the session just before
the next exam. In this case the effects of an FI schedule on students were almost identical to those on rats and pigeons, but this is not always the case. We will discuss this
anomaly further in Chapter 8.
Figure 5.10b shows the typical response pattern under an FR schedule. Here, reinforcement is contingent on a fixed number of responses, and the result is generally "pause-and-run" behavior. Subjects pause for a while after reinforcement (as shown by the level
horizontal line after the diagonal mark on the graph), but once they begin to respond,
they respond steadily until they earn another reinforcer. If the ratio requirement is too
great, however, ratio strain may be observed: Subjects will begin to respond, then pause,
respond a bit more, pause again, and so on. If the schedule requirement is not reduced at
this point, subjects soon cease to respond altogether.
In VI and VR schedules, by contrast, the requirement for reinforcement is varied, with
the result that a response can be reinforced at any time. The result is that these schedules
produce much steadier rates of responding, without such obvious pauses.
The Partial Reinforcement Effect
Having described the properties of the five schedules most often studied—CRF, FI, FR,
VI, and VR—can we now say which one is best? If our goal were to ensure as strong
a response as possible, the obvious answer would seem to be to reinforce the desired
response every time it occurred (CRF)—the more often a response is reinforced, the stronger it should be. Indeed, in some respects that is so, but reinforcing every response can
sometimes have unintended consequences. Consider the following experiment by Lewis
and Duncan (1956). The subjects in this study were college students, and they were given
an opportunity to play a slot machine. They were told that they could play as long as
they wanted, and that each time they won they would earn five cents. The percentage
of reinforcement was varied across groups during the first phase: One group was never
reinforced; a second group was reinforced once; and so on. Reinforcement was then discontinued, and the experimenters monitored how long subjects continued to play.
You might think that the higher the percentage of reinforcement, the stronger the response,
and thus the longer subjects would continue to play. As shown in Figure 5.11, however,
that was not the case. Quite the contrary, the lower the percentage of reinforcement during
training, the longer subjects played during extinction. This counterintuitive result—that
partial reinforcement during training increases responding during extinction—is called
the partial reinforcement effect (PRE). It should be noted that in the Lewis and Duncan
experiment, the no-reinforcement condition (0%) resulted in the highest levels of responding during extinction, but this is not usually the case. The persistent responding in this
group was probably caused by the wording of the instructions, which implied that some
reinforcement would be given if subjects responded. When this reinforcement was not
forthcoming following the first eight plays, participants kept trying.
Figure 5.11: Partial reinforcement on responding during extinction
This graph displays the effect of partial reinforcement on responding during extinction: plays to extinction (mean log) is plotted against percent reinforcement (0 to 100). The lower the percentage of reinforcement college students received for playing a slot machine during training, the longer they persisted in playing during extinction.
Source: Adapted from Lewis & Duncan, 1956
The partial reinforcement effect was so surprising to psychologists that at first it was called
“Humphreys’ paradox,” after the researcher who discovered it. Various explanations
were proposed, but most learning psychologists now agree that the fundamental cause
is the difficulty subjects have in judging whether further responding is likely to produce
reinforcement. For subjects who have always been reinforced, the transition to extinction
is obvious, and they are likely to quit responding quickly. For subjects who have received
reinforcement after long periods of nonreinforcement during training, on the other hand,
the transition to extinction is less obvious, and they are more likely to persist in the hope
that they will eventually be reinforced.
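One way to make this explanation concrete is with a toy simulation, sketched below in Python. The response counts, the 10% reinforcement probability, and the idea of tracking the longest "dry spell" are all assumptions made for the illustration, not part of any experiment described here; the point is simply that a history of continuous reinforcement contains no long runs of unreinforced responses, so even a brief dry spell signals that something has changed, whereas a history of intermittent reinforcement contains many such runs.

import random

def longest_dry_run(reinforced_flags):
    """Return the longest run of consecutive unreinforced responses in a
    training history (a list of True/False reinforcement outcomes)."""
    longest = current = 0
    for reinforced in reinforced_flags:
        current = 0 if reinforced else current + 1
        longest = max(longest, current)
    return longest

random.seed(1)
n_responses = 1000
crf_history = [True] * n_responses                                  # every response reinforced
vr_history = [random.random() < 0.10 for _ in range(n_responses)]   # roughly 1 response in 10

# A subject that quits only when the current dry spell clearly exceeds anything
# it experienced during training will stop almost immediately after continuous
# reinforcement, but may persist through dozens of unreinforced responses after
# intermittent reinforcement.
print("CRF longest dry run:", longest_dry_run(crf_history))
print("VR 10 longest dry run:", longest_dry_run(vr_history))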
Tantrum behavior in children provides a real-life example of the partial reinforcement
effect. When parents pay attention to a child having a tantrum, their attention can reinforce this behavior. Sometimes parents realize this is the case, so they try hard to ignore
the tantrum. If, with great effort, they manage to ignore their child’s tantrums 90% of the
time, they might then be baffled when the tantrums continue, but this persistence follows
directly from the partial reinforcement effect: By reinforcing the behavior on a partial reinforcement schedule (in this case, a VR10), the parents are in fact increasing the persistence
of the behavior, as the child learns that persistence will eventually pay off. If parents do
decide to ignore tantrums, it is very important that they do so consistently, as even one or
two reinforcements can dramatically increase the time required for extinction.
Choosing a Schedule
Let us now return to the question of which schedule is best. Reinforcing every response
(CRF) has some important advantages, but it also has some serious disadvantages. As we
have just seen, continuous reinforcement does not encourage persistent responding—if
reinforcement is not available for a while, there is a greater likelihood that responding will
cease. A further disadvantage is that continuous reinforcement is often costly: In monetary terms, every response must be paid for with a reinforcer, and it also requires considerable time and effort from the person delivering the reinforcer, who must be present whenever the desired response occurs.
Given these problems, the optimum strategy for producing durable responding is usually
to begin by reinforcing every response, but then to gradually reduce the rate of reinforcement to the lowest level that will maintain a satisfactory response rate. For this purpose,
schedules with variable reinforcement requirements are generally preferable to schedules
with fixed requirements because the unpredictability of reinforcement generates more
consistent and rapid responding. Our search for the “best” schedule, therefore, has narrowed to two candidates: VR and VI. Which should you use?
The answer turns out to be a bit complicated. A VR schedule normally generates a higher
rate of response than a VI schedule because reinforcement on a VR schedule directly
depends on the number of responses: If a subject doubles the number of responses he or
she makes, the reinforcements will also double. On the other hand, if the VR requirement
is set too high, subjects will abruptly quit, whereas VI schedules can maintain a low but
steady rate of responding even when reinforcement is infrequent. In sum, a VR or a VI
schedule is generally the most effective in maintaining persistent responding; a VR schedule will tend to generate higher response rates, but if reinforcement is to be delivered only
infrequently, then a VI schedule is more likely to sustain responding.
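The difference can be seen in a back-of-the-envelope calculation, sketched below in Python. The function names and the simple ceiling approximation for the VI schedule are assumptions made for the illustration, but the pattern they produce matches the point above: doubling the response rate doubles the reinforcers earned on a VR 30 schedule, while earnings on a VI 60-second schedule level off once responding is frequent enough to collect each reinforcer soon after it becomes available.

def reinforcers_per_minute_vr(responses_per_minute, n):
    """VR n: on average one reinforcer per n responses, so earnings grow in
    direct proportion to the response rate."""
    return responses_per_minute / n

def reinforcers_per_minute_vi(responses_per_minute, t_seconds):
    """VI t: a reinforcer becomes available roughly every t seconds and a single
    response collects it, so earnings level off near 60/t per minute (a crude
    ceiling approximation)."""
    ceiling = 60.0 / t_seconds
    return min(responses_per_minute, ceiling)

for rate in (10, 20, 40, 80):    # responses per minute
    print(rate,
          round(reinforcers_per_minute_vr(rate, n=30), 2),
          round(reinforcers_per_minute_vi(rate, t_seconds=60), 2))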
A Criminally Successful Application
By now, your feelings about schedules might resemble those of the child whose review
of a book about penguins began, “This book told me more about penguins than I wanted
to know.” Learning the technical distinctions among schedules is tedious, but as we suggested earlier, different schedules can have very different effects, and when used imaginatively, schedules can be powerful tools for altering behavior.
In a striking demonstration of the importance of the schedule used, Kandel, Ayllon, and
Roberts (1976) used reinforcement as part of a remedial high school education program in
a Georgia state prison. The subjects were two inmates, one with a measured IQ of 65, the
other with an IQ of 91. To reinforce studying, they were awarded points whenever they
passed a test with a score of 80% or better, and these points could then be exchanged for
a variety of reinforcers such as cigarettes, cookies, and extra visiting privileges. With 1000
points, for example, a convict could buy a radio as a present for his family.
The program produced significant progress, but not as much as the authors had hoped. One
possible explanation was that the inmates simply were not bright enough to progress any
faster. (With IQs of 65 and 91, it was perhaps remarkable that they progressed as fast as they
did.) Another possibility was that the reinforcement schedule did not provide sufficient
incentive for the hard work required. To find out, the authors devised a new schedule in
which the faster the inmates progressed, the more points they earned. If an inmate completed one grade level in a subject in 90 days, for example, he received 120 points; if he did it
in only 4 days, he received 900 points; and if he did it in only 1 day he received 4700 points.
The result was a quite staggering rate of progress. Under the old schedule, one of the convicts, Sanford, had completed ninth-grade English in three months—all things considered,
not unimpressive. Under the new schedule, he completed tenth-, eleventh-, and part of
twelfth-grade English in just one week. He often missed recreational periods and stayed up
all night to work. As he remarked to one of the instructors, he wanted to “get when the gettin’ was good.” During the five months of the program—standard reinforcement schedule
as well as enriched—he advanced 4.6 years in high school arithmetic, 4.9 years in reading,
and 6.6 years in language. In other words, he completed almost five years of high school in
five months—roughly 12 times the normal rate. And Sanford was the one with an IQ of 65!
These results have at least two important implications. First, and most relevant to our
current concern, they illustrate how powerfully the choice of reinforcement schedule can
determine the effectiveness of reinforcement. More generally, they hint at how often we
underestimate people’s ability to learn and change. Knowing Sanford’s criminal record
and apparent IQ, few would have believed that he was capable of such progress. But under
appropriate learning conditions, all of us—learning disabled as well as gifted, criminal as
well as non-criminal—might be capable of far more learning than is commonly assumed.
Too often, we blame failure on the learner: “Oh, he’s too stupid.” “She’s just not trying.” A
much more productive reaction to failure might be to assume that our teaching methods
are at fault and to search for better methods. We have now seen two examples in which
a critical reexamination of teaching procedures led to dramatic improvements in learning—Phillip’s change to immediate reinforcement at Achievement Place, and the Kandel
group’s imaginative use of a new reinforcement schedule—and we shall encounter others
as we proceed. Greater faith in human potential can sometimes pay handsome dividends.
5.5 Motivation
Whether you performed a response to obtain a reinforcer would depend not only on whether
you believed the response would produce the reinforcer (learning) but also on whether you
wanted the reinforcer (motivation). To take a simple example, whether you insert a coin
into a vending machine to obtain a cup of coffee would depend not only on whether you
had learned that a coin would
operate the machine but also on
whether you wanted coffee.
Motivation, in turn, depends
partly on how long you have
been deprived of that reinforcer
and partly on its attractiveness.
How hard you would work
to obtain food, for example,
would depend on how hungry
you were and how much you
liked that food. To use a carrot-and-stick analogy, deprivation functions as a stick to drive us forward, and the reinforcer functions as a carrot to attract us; we need to consider both in predicting how effective a reinforcer will be. Figure 5.12 summarizes these concepts: Whether you perform a reinforced response depends on both learning and motivation, and motivation in turn depends on the amount of reinforcement available and how long you have been deprived of it.
The effectiveness of a reinforcer depends not only on what it is—in this case, an apple—but also on our motivation to obtain it—for example, whether we're hungry.
Figure 5.12: The effect of a reinforcer depends on learning and motivation
[Diagram: learning (the R → S^R association) and motivation (the value of S^R, which depends on the amount of S^R and on deprivation) jointly determine performance of the response.]
If a response is reinforced, future performance of that response will depend on both learning (knowing
that the response produces a reinforcer) and motivation (wanting the reinforcer).
On the surface, the concept of motivation is simple—the more you want a reinforcer, the harder
you will work to obtain it. When this concept is
examined more closely, though, it turns out to
be surprisingly complex. In this section we will
examine two of these complications. (For further
examples, see Bolles, 1975, and Balleine, 2001.)
Contrast Effects
The attractiveness of a reinforcer is referred to as
its incentive value. One determinant of incentive
value is the nature or quality of the reinforcer—
most children, for example, can be relied on to
prefer ice cream to spinach—and another is the
amount or quantity provided. In one examination
of the effect of amount, Crespi (1942) trained rats
to run down a straight-alley maze to a goal box
containing either 1, 16, or 256 pellets of food. A
larger amount should have a greater incentive
value, so the rats should run faster to obtain it. As
shown in the left-hand section of Figure 5.13, that
is exactly what Crespi found.
The effectiveness of a reinforcer depends
in part on the amount offered—the greater
the amount, the harder we will work to
obtain it.
Figure 5.13: The effect of amount of reinforcement on running speed in rats
[Graph: running speed (ft/sec) plotted against trials for the preshift and postshift phases. The three curves are labeled 256→16 pellets, 16→16 pellets, and 1→16 pellets; the postshift overshoot is marked "elation" and the undershoot "depression."]
This figure features the effect of amount of reinforcement on running speed in rats. During the pre-shift
phase, shown at the left, rats received either 1, 16, or 256 pellets of food after running down an alley;
the larger the reward, the faster they ran. In the test phase, all the rats received the same reward of 16
pellets, but the effect of this reward depended on what amount they had received previously.
During the initial, or preshift, phase (left portion of the graph), groups received either 1,
16, or 256 pellets of food on each trial. As we can see, the group of rats being reinforced
with 256 pellets had the fastest speed, running as much as 4.0 feet/second. The group
reinforced with 16 pellets had a more moderate running speed, and the group given
the reinforcer with the lowest incentive value (1 pellet) never achieved a running speed
greater than 1.0 feet/second. At the twentieth trial, the researchers gave the same number
of pellets (16) to all three groups. The results of this shift are shown on the right (postshift)
side of the graph. The group previously given 1 pellet ran faster than the group already
accustomed to 16 pellets, resulting in an elation effect, or positive contrast. The group previously given 256 pellets ran slower than the group accustomed to 16 pellets, resulting in
a depression effect, or negative contrast. (Adapted from Crespi, 1942.)
This result could be explained in two quite different ways. The first possibility is the one
we have already considered, that quantity affects motivation. The group that received 256
pellets would have found this reward more attractive and therefore ran faster to obtain
it. The second possibility is that quantity affects learning. According to Thorndike’s Law
of Effect, satisfaction stamps in an association between a response and the situation in
which it is made, and the greater the satisfaction, the stronger the association. According
to this interpretation, the group receiving 256 pellets ran faster because the larger reward
produced a stronger association between the alley cues and the response of running. So,
does amount of reinforcement affect learning or motivation?
To find out, Crespi ran a second (postshift) phase in which he gave all three groups 16 pellets when they reached the goal box. For the group switched from 256 pellets to 16 pellets,
a motivational interpretation predicts that they should now be less motivated to reach the
goal box and thus should run more slowly. According to the Law of Effect, on the other
hand, this group should continue to run quickly because they still receive food every time
they reach the goal box, and this reward should continue to strengthen the response. The
results, shown on the right-hand side of the figure, supported the motivational interpretation, as reducing the amount of food produced a precipitous drop in running speed.
Similarly, increasing the amount from 1 pellet to 16 pellets produced a sharp increase
in running speed. These results strongly supported a motivational interpretation. In the
words of Logan and Wagner,
If a rat’s speed of running decreases over a series of trials after its reward
has been reduced, it is unreasonable to conclude that the current trials have
caused the animal to know less about the runway or about the appropriateness of running. Common sense says that the animal simply learned that
he would receive a smaller reward as a consequence of the running. (Logan
& Wagner, 1965, p. 43)
As this quote suggests, amount of reward did have some effect on learning, in the sense
that the amount of food in the goal box was one of the things that subjects learned. The
main effect of reward, however, was clearly on motivation—when the amount was
increased, for example, the rats immediately began to run faster.
The fact that the amount of reinforcement affects motivation is perhaps not altogether
surprising, but one aspect of Crespi’s results was less easily predicted. If you look again at
the right-hand side of the graph, you will see that the running speeds of the three groups
differed substantially in the second phase, even though all were now receiving the same
reward. In the group shifted from 1 pellet to 16 pellets, running speed not only increased
to the level of those given 16 pellets throughout, but significantly exceeded it. Conversely, running speed in the 256→16 group not only fell to the level of the group trained on 16 pellets throughout but dropped significantly below it.
Crespi called the overshoot in the group shifted from 1 to 16 pellets an “elation effect,”
implying that the rats were so excited over this improvement in their circumstances that
they ran especially fast. On similar reasoning, he labeled the undershoot in the group
switched from 256 pellets to 16 pellets a “depression effect.” Other learning psychologists,
however, were unhappy with these terms. Aside from the problem of knowing what a rat
is feeling, the terms elation and depression imply emotional effects that should disappear as
subjects become accustomed to the new levels of reinforcement. In some cases, however,
the effects are enduring. (See Flaherty, 1996, for a review.) Psychologists have thus come
to prefer the more neutral terminology of contrast effects to describe these phenomena,
emphasizing that the effect of any reinforcer depends on how it contrasts with reinforcers experienced previously. Crespi’s elation effect is now called positive contrast, and the
depression effect is called negative contrast.
Contrast effects suggest that the effects of reinforcement depend on subjects’ expectations.
If you expect 1 pellet, 16 pellets may seem marvelous; if you expect 256 pellets, 16 pellets
may come as a disappointment. The importance of expectations in reinforcement might
remind you of classical conditioning, where we encountered a similar phenomenon in
our discussion of the Rescorla-Wagner model. There, too, the effect of a US depended on
a subject’s expectations, expressed in terms of V, as the same US could produce either an
increase or a decrease in associative strength depending on what subjects were expecting.
When an important event such as food or shock occurs, we seem to evaluate it relative to
our expectations, and this comparison or contrast then determines how we react. (For a
more detailed analysis of the mechanisms underlying contrast effects, including a discussion of factors not considered here, see Williams, 1997.)
One practical implication is that in choosing a reinforcer it is important to consider what
reinforcers a person has experienced previously. If you own a car and a television, the
promise of a bicycle as a reward might not be very exciting, but if you grew up in poverty,
it might seem priceless. This might explain the age-old parental complaint, “Kids today
just don’t appreciate the value of money. Why, when I was a kid . . .” When standards of
living improve, people become accustomed to the new levels; what was once a powerful
reinforcer might now seem drab and unexciting by comparison.
The Yerkes-Dodson Law
So far, we have assumed that motivation affects performance. In Crespi’s experiment,
for example, the group trained with a small reward seemed to learn just as well as the
group trained with a large reward—when the small-reward group was shifted to a larger
reward, they immediately ran faster. On the other hand, the response the rats had to learn
to obtain food was very, very simple—just to run down an alley. Even if larger amounts
did produce better learning, it might have been difficult to detect in this situation because
the group receiving the smaller
amount would have learned
so quickly. Thus, although the
behavior of the rats that were
shifted from 256 pellets to 16
pellets provided clear evidence
that amount affects motivation, it remained possible that
amount also affects learning,
and that we could observe such
effects if we used more difficult
tasks.
According to the Yerkes-Dodson law, high motivation is likely to interfere with a person's ability to complete a difficult task such as a crossword puzzle, but it enhances performance on a simpler task such as a game of tic-tac-toe.
To provide a fairer test of whether motivation affects learning, Broadhurst (1957) trained rats on a visual discrimination task in a Y-shaped maze. The maze was flooded with water, and a
platform located in one arm of the Y allowed the rats to escape. This is an example of negative reinforcement, in which the reinforcer is the termination of an aversive stimulus,
rather than the presentation of a desirable one. (As mentioned earlier, negative reinforcement is not punishment: In negative reinforcement as in positive reinforcement, behavior
is strengthened; the difference lies solely in whether this is achieved by the presentation
of a stimulus or its removal.) The position of the platform in this experiment was shifted
randomly over trials, but its current location was always signaled by the illumination of
the arms; the brighter of the two arms always contained the platform. To assess the effects
of motivation on learning, Broadhurst varied how long the rats were held underwater
before being allowed to swim through the maze; the confinement period ranged from
zero to eight seconds. In addition, he examined the role of problem difficulty by varying
the relative brightness of the alleys in different groups. For the easiest problem, the correct
alley was 300 times brighter than the incorrect one, whereas for the most difficult problem
the illumination ratio was only 15 to 1.
The results for the different groups are shown in Figure 5.14, which plots in three-dimensional form the percentage of correct responses over the first 100 trials as a function of both drive level and problem difficulty. In all three problems, drive level did
influence learning, but the optimal level of motivation varied with the difficulty of the
problem. On the easy problem, drive seemed to enhance learning uniformly: The longer
that subjects were deprived of air, the fewer errors they made while learning. On the
difficult problem, on the other hand, the fastest learning occurred with deprivations of
only two seconds; increases in deprivation beyond this value resulted in a substantial
decrease in learning.
Figure 5.14: Visual discrimination experiment
[Three-dimensional graph: percentage of correct responses during learning plotted as a function of motivation (delay in seconds, 0 to 8) and problem difficulty (easy, moderate, difficult).]
Results of a visual discrimination experiment to determine the effects of motivation on learning. The
percentage of correct responses on a discrimination learning task was affected by both motivation and
problem difficulty.
Source: Data from Broadhurst, 1957
Broadhurst’s results suggest that motivation does affect learning, but that the relationship
is complex. With relatively simple problems, increasing motivation enhances learning, but
on more difficult problems high motivation can actually be harmful. This inverse relationship between task difficulty and optimum motivation—the more difficult the problem, the
lower the optimum level of motivation—has been observed in a number of other studies
(for example, Bregman & McAllister, 1982; Hochauser & Fowler, 1975). The phenomenon is
known as the Yerkes-Dodson law, named for the two psychologists who first discovered it.
A possible educational example might be the use of an attractive reward to encourage
teenagers to get good grades in a mathematics course. For a student who finds math easy,
the promise of an iPad for earning an A might be a powerful and effective incentive. For a
student who finds math difficult, on the other hand, offering this reward might actually
lead to poorer performance. The reasons that high motivation interferes with the learning
of difficult tasks are not fully understood, but the most likely explanation is that motivation affects attention. According to Easterbrook (1959), attention becomes more highly
focused when we are aroused; we concentrate more intensely on only a few stimuli while
effectively ignoring all others. For simple problems, in which the relevant cues are obvious, focused attention is likely to facilitate learning. For problems in which the important
cues are more subtle, however, a subject that focuses attention too narrowly might miss
the critical cues and thus take much longer to solve the problem. The result is that high
motivation helps subjects to solve simple problems but impairs performance on more
difficult tasks. (For experimental support, see Telegdy & Cohen, 1971; Geen, 1985; for an
alternative explanation of the effects of motivation, see Humphreys & Revelle, 1984.)
In summary, we have seen that the effectiveness of a reinforcer depends on whether subjects are motivated to obtain it, and this in turn depends on its attractiveness or incentive
value (the carrot) and how long subjects have been deprived of it (the stick). In general,
stronger motivation produces better performance, but we have also seen two complications—that the incentive value of a reinforcer depends on how it contrasts with previous
reinforcers, and that motivation can affect learning as well as performance. As with so
many other aspects of reinforcement, the concept of motivation is simple on the surface
but considerably more complex when examined closely.
5.6 The Role of the Stimulus
One basic aspect of Thorndike’s Law of Effect, the assumption that a response will be
strengthened if it is rewarded, seems little more than common sense. Thorndike’s version,
however, is subtly different—it does not say that a reward will strengthen a response in
a general sense, but in the particular situation where the reward was received (he wrote,
“responses. . .followed by satisfaction [will be] more firmly connected with the situation”). To
see the importance of this distinction, consider a child praised for cleaning her room. According to Thorndike, the effect might not be a general increase in cleaning her room, as her parents might fervently be hoping, but rather an increase only in the situation where the reward
was given. For example, because
her parents had been present
when she was rewarded, she
might learn to clean her room
only when they are there, not
exactly the intended outcome.
Stimulus Control
Children may be well-behaved in a classroom setting but not so compliant or easy to get along with at home. This illustrates the phenomenon of stimulus control, in which the probability of a response depends on the particular stimuli that are present.
Thorndike did not systematically test this assumption, but later research was to support it. In one classic study, Guttman and Kalish trained pigeons to peck at a circular plastic disk, or key, mounted on one wall of a Skinner box. The key was illuminated with a yellowish-orange
light of 580 nanometers (a nanometer is a measure of a light’s wavelength, which determines its color) and pecks at the key were occasionally reinforced with access to a grain
dish located below the key. To find out what the birds had learned during the training
phase, Guttman and Kalish ran a test session in which they varied the color on the key.
Sometimes it was illuminated with a green light (550 nm), sometimes with a red light (640
nm), and so on. During this session, responding was not reinforced in the presence of any
of the colors. As shown in Figure 5.15, the birds responded vigorously whenever the key
was illuminated with the yellowish-orange training stimulus (580 nm), but responding
fell off sharply when the test wavelengths diverged from this value. Contrary to our earlier analysis, reinforcement did not result in a general tendency to peck the key but, rather,
to peck the particular stimulus that had been present during reinforcement. Subsequent
experiments have extended this finding, showing that even seemingly irrelevant features
of the training situation (for example, the appearance of the walls, the texture of the floor)
can acquire control over the reinforced response, so that subjects respond less when these
stimuli are altered (see Balsam & Tomie, 1985).
Figure 5.15: Generalization of responding to colors of different wavelengths
[Graph: number of responses (0 to 300) plotted against wavelength (510 to 630 nanometers), with responding peaking at the 580-nm training stimulus.]
The pigeons’ pecking was reinforced during training only in the presence of the 580-nm stimulus
(indicated by the arrow).
Source: Adapted from Guttman & Kalish, 1956
As you can see by looking at Figure 5.15, the 580-nm training stimulus received the largest number of responses (300), with a gradual decrease in responding the more a stimulus
diverged from that particular wavelength. For example, a stimulus light of 590 nanometers
didn’t produce as much response as the training stimulus, but it had considerably more
effect than, say, a stimulus of only 530 nanometers, which produced practically no response
at all. In other words, Guttman and Kalish’s experiment demonstrates how the response to
the training stimulus spread to similar stimuli, a phenomenon known as generalization. As
the training and test stimuli became less similar, responding declined, and this progressive
decline in response is called a generalization gradient (a gradient is an incline or slope).
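To give a numerical feel for what a generalization gradient looks like, the Python sketch below models responding as a bell-shaped function of the distance between a test wavelength and the 580-nm training stimulus. The Gaussian form, the width value, and the exact numbers it prints are illustrative assumptions rather than Guttman and Kalish's data; the only features borrowed from the study are the 580-nm S+ and the roughly 300-response peak.

import math

def modeled_responses(test_wavelength, s_plus=580.0, peak=300.0, width=15.0):
    """Toy generalization gradient: responding is highest at the training
    stimulus (S+) and falls off smoothly as the test stimulus becomes less
    similar; width controls how sharply the gradient drops."""
    distance = test_wavelength - s_plus
    return peak * math.exp(-(distance ** 2) / (2 * width ** 2))

for wavelength in (530, 550, 570, 580, 590, 610, 630):
    print(wavelength, round(modeled_responses(wavelength)))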
This gradient illustrates the phenomenon of stimulus control, in which the probability of
a response varies depending on what stimuli are present. In this case, the color of the key
acquired control over the birds’ pecking, so that changes in this color affected their responding. Similarly, human behavior often comes under the control of stimuli that are present
when we are reinforced, sometimes without our realizing it. A businessman, for example,
may give generously to charity when in church, while behaving ruthlessly at work, and most
of us behave quite differently when in the presence of a superior—a parent, a teacher, or an
employer—than when we are with friends. We are not quite as consistent as the concept of
personality might imply, as our behavior can vary substantially depending on the situation.
Attention
Thorndike was thus right: When a response is reinforced, it will become associated with
the stimuli present at the time. But which stimuli? Will all the stimuli present acquire
control or only some? And if only some, which? The first question—whether all stimuli
will acquire control—proved surprisingly difficult to answer, but when an answer did
emerge, it was simple. We are constantly bombarded by stimuli—many thousands of
lights, sounds, and odors every second—and we can only attend to a fraction of them.
The inevitable consequence is that only some of the stimuli present when a response is
reinforced will come to control it.
The first really clear demonstration of this came in an experiment using two pigeons as
subjects (Reynolds, 1961). The pigeons were trained on a successive discrimination in
which two stimuli were presented alternately for three minutes at a time. When the key
was illuminated with the outline of a white triangle against a red background (S+), pecking was occasionally reinforced; when the stimulus was the outline of a white circle against a green background (S−), no reinforcement was given. (See Figure 5.16a.)
Both birds quickly learned to peck S+, but not S−. To find out exactly what the birds had
learned about each stimulus, Reynolds now presented the elements of each compound separately, illuminating the key with either the circle, the triangle, red, or green. Figure 5.16b
shows the results for bird number 1. If all stimuli present during reinforcement acquire
control over responding, the red and triangle components should have elicited roughly
equal responding, but this was not the case: The first bird responded vigorously when the
key was red, but ignored the triangle. The second bird, on the other hand, responded at a
high rate when the triangle was present, but virtually not at all when the key was red (see
Figure 5.16c). Each bird, in other words, learned about only one of the two stimuli present.
Figure 5.16: Selective attention
[Panel (a) shows the training stimuli: S+ (white triangle on a red background) and S− (white circle on a green background). Panels (b) and (c) show responses per minute by bird #1 and bird #2 to the separate test stimuli, including the red (R) and green (G) colors.]
In Reynolds’ experiment, two pigeons received food when they pecked a triangle on a red background,
but not when they pecked a circle on a green background. When each element was presented separately
in the test phase, the results showed that one bird had learned only about the color red, the other only
about the triangle.
Source: Adapted from Reynolds, 1961
These results illustrate the empirical phenomenon of selective attention, in which only a
subset of the stimuli present comes to control responding (see also Langley & Riley, 1993).
We can thus reformulate the principle of reinforcement as follows: When a response is reinforced, some subset of the stimuli present is likely to acquire control over it, so that the response will
become more likely when these stimuli, or others similar to them, are present.
Practical Applications
In some applications involving reward, the goal is to have a behavior occur as widely
as possible, regardless of the situation. If you were a parent trying to train a child to be
honest, you would probably want this behavior to occur very widely. In other situations,
however, your goal might be to have a behavior occur only in specific settings. A child, for
example, needs to learn to cross a street only when the light is green, not red. In the following sections we will look at what can be done to achieve each of these goals.
Encouraging Discrimination
In cases where we want a behavior to occur only in particular settings, one useful technique is to provide discrimination training. In this procedure, training is provided not
only in the situation where we want the behavior
to occur (S+), but also in situations where we do
not want it to occur (S−). Presentations of the situations are alternated, and behavior is reinforced
only in the positive situation:
S+: R → S^R
S−: R → ___
This is the procedure Reynolds used to train his
birds to peck the red triangle but not the green
circle, and the outcome he obtained—differential responding to the two stimuli—is called a
discrimination.
A study by Redd and Birnbrauer (1969) illustrates how discrimination training can determine the outcome when behavior is reinforced. The participants were mentally disabled 12- to 15-year-old boys, and the purpose of the study was to examine how the reinforcement contingencies established by adults can shape children's behavior. When one of the experimenters was present, the boys were reinforced for playing cooperatively (a typical reward was candy and praise); when the other experimenter was present, the boys were reinforced equally often, but the rewards were delivered at random intervals, regardless of how the boys were behaving. When the boys were later tested, they were far more likely to engage in cooperative play when the first experimenter was present. The experimenters then reversed roles, with the second experimenter being the one who reinforced cooperative play, and the boys altered their behavior accordingly. As a result of discrimination training, cooperative play occurred only in the situation where it was reinforced.
One of the myths surrounding George Washington's childhood is that he cut down his father's cherry tree and admitted to the misdeed by famously saying, "I cannot tell a lie." Honesty is often thought to be a personality trait that remains relatively consistent across situations, but research shows that people can be impressively honest in some situations and yet deeply dishonest in others.
Although the experimenter in this study deliberately arranged discrimination training, we
often encounter similar contingencies in real life. If a child’s cooperative behavior is praised
by a parent but ignored by a teacher, it would not be surprising if the child learned to
behave differently at home and in school. Indeed, considerable evidence indicates that our
behavior is not as consistent as we think. In one early study by Hartshorne and May (1928),
children were given opportunities to behave dishonestly at home, at school, and at play.
We tend to think of honesty as a personality trait, and thus expect children who are honest
in one situat...