Comparing and Contrasting Theories of Learning, Psychology homework help

Humanities

Description

Comparing and Contrasting Theories of Learning

Compare and contrast:

  • Humanism vs. Behaviorism

http://wmich.edu/academicsuccess

http://www.rosejourn.com/index.php/rose/article/view/99/124

http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0104-40602015000200019&lng=en&tlng=en


You will apply basic research to your project by taking time to develop your expertise in the subject; this will aid you in completing the assignment successfully. The research can be accomplished by reading online material or by consulting articles and e-books available from the Ashford University Library. You must use at least three scholarly sources.

Unformatted Attachment Preview

5 Reinforcement Learning Objectives After reading this chapter, you should be able to do the following: • Define Thorndike’s Law of Effect and explore the difference between reinforcement and classical conditioning. • Understand the Premack principle and its practical applications. • Identify three different types of reinforcers: primary, secondary, and social. • Recognize the importance of the concepts of positive and negative reinforcement. • Describe why a delay in the presentation of a reinforcer can seriously undermine its effectiveness. • Explain the various schedules of reinforcement and the effects of these schedules on rate and pattern of responding. • Examine the relationship between motivation and reinforcement, including contrast effects and the Yerkes-Dodson law. • Discuss the concept of stimulus control. lie6674X_05_c05_153-200.indd 153 4/9/12 8:21 AM Section 5.1 Thorndike’s Law of Effect CHAPTER 5 Using a reward is one of the most obvious ways to encourage a behavior. Parents praise children for good behavior; companies pay salespeople bonuses for high output; universities promote productive researchers. There is nothing new or profound about the idea of using rewards to increase desirable behavior—the principle was probably known long before the discovery of fire. If the principle of reward is so obvious, though, why is behavior often so hard to change? Why do parents find it so difficult to get their teenage children to clean their rooms? Or, to take a more immediately relevant example, why do students sometimes find it so difficult to make themselves study? There are, after all, very powerful rewards for studying: in the short term, good course grades; in the longer term, a better job. Yet students often leave studying until the last minute, and some don’t get around to it at all. Similarly, smoking and overeating can take years off our lives, and people are often desperate to give up these habits; yet the habits persist. Why is behavior in these situations apparently so irrational, with rewards as potent as a good job and longer life having little effect? Clearly, the principle of reward cannot be quite as simple as it sounds. To understand why rewards seem to control behavior in some situations but not others, we will examine experimental research into the principles that determine the effectiveness of rewards. Then, in Chapter 6, we will examine some of the attempts that have been made to apply the principles discovered in the laboratory in real life, and what these attempts have revealed about both the strengths and weaknesses of rewards as a tool for altering behavior. We will begin, though, with the first experimental study of rewards, by Edward Lee Thorndike. 5.1 Thorndike’s Law of Effect Edward Thorndike (1874-1949), shown at the left of this photo, was an American psychologist who spent nearly all of his academic life at Columbia University in New York City. His contributions in the areas of intelligence and learning helped develop the field of educational psychology. lie6674X_05_c05_153-200.indd 154 Thorndike’s research, like Pavlov’s, had its roots in the philosophy of Associationism, but its most immediate antecedent was the publication in 1859 of Charles Darwin’s Origin of Species. Darwin’s theory of evolution proposed that man was but one animal species among many, and this claim triggered a surge of interest in the intelligence and reasoning powers of animals. 
If Darwin was correct in his belief that we are closely related to other animal species, then the traditional view that animals are dumb brutes becomes far less attractive. After all, if our close relatives were dumb, what might that imply about us?

Are Animals Intelligent?

To lay the basis for a more realistic judgment, a contemporary of Darwin's named George Romanes collected observations of animal behavior from reliable observers around the world. When published, the material in Romanes's Animal Intelligence (1881) seemed to strongly support Darwin's thesis, as anecdote after anecdote revealed impressive powers of reasoning. Thorndike, however, was skeptical of these accounts. For one thing, he wondered if the observations were entirely accurate—observers' memories might have become distorted over time, and anecdotes might have been exaggerated as they were told and retold. Even if an incident were described correctly, it might not be representative of the species' typical behavior. As Thorndike noted,

Dogs get lost hundreds of times, and no one ever notices it or sends an account of it to a scientific magazine. But let one find his way from Brooklyn to Yonkers and the fact immediately becomes a circulating anecdote. Thousands of cats on thousands of occasions sit helplessly yowling, and no one takes thought of it or writes to his friend, the professor; but let one cat claw at the knob of a door supposedly as a signal to be let out, and straightaway this cat becomes the representative of the cat-mind in all the books. (Thorndike, 1898, p. 4)

Photo caption: Rodin's The Thinker poses an interesting question: Can animal species think as deeply as humans? Thorndike's research suggested that they can't, but, as we shall see in Chapter 8, this conclusion was to prove controversial.

The Law of Effect

To remedy these defects, Thorndike argued, "experiment must be substituted for observation and the collection of anecdotes. Thus . . . you can repeat the conditions at will, so as to see whether or not the animal's behavior is due to mere coincidence." To this end, Thorndike began to study learning in animals using an apparatus that he called a puzzle box. Basically, it was little more than a wooden crate with a door that could be opened by a special mechanism, such as a latch or rope (see Figure 5.1). Thorndike placed a dish containing food outside the box but visible through its slats, then put the animal to be tested inside and observed its reactions.

Figure 5.1: Thorndike's puzzle box. In Thorndike's design, a dish of food was placed outside of the box, visible through the slats in the box. Thorndike found that animal subjects placed in the box would eventually locate the release apparatus, and the latency of this response was shorter with each subsequent trial. (Source: Thorndike, 1911)

When a hungry cat was first placed in the box, it would scramble around, frantically clawing and biting at the sides of the apparatus in an attempt to escape and reach the food. After approximately 5 to 10 minutes of struggling, the cat would eventually stumble on the correct response—for example, pressing a latch—and, finding the door open, would rush out and eat the food. According to Romanes's anecdotes, this success should have led to the immediate repetition of the successful response on the following trial.
Instead, Thorndike found that the animal generally repeated the frantic struggling observed on the first trial. When the cat finally did repeat the correct response, however, the latency of this response—the amount of time that elapsed from the point when it entered the box to the point when it performed the response—was generally shorter than it had been on the first trial. It became shorter still on the third trial, and so on. Figure 5.2 presents representative records of the performance of two cats. Progress in both cases was gradual and marked by occasional reversals, but on average the time to escape became progressively shorter as training continued. lie6674X_05_c05_153-200.indd 156 3/15/12 7:53 AM Section 5.1 Thorndike’s Law of Effect CHAPTER 5 Figure 5.2: Changes in latency This figure shows the changes in the latency of escape from the puzzle box over trials for two of Thorndike’s cats. This slow and irregular improvement did not suggest that the cats had formed any rational understanding of the situation. Instead, Thorndike argued, the food reward was gradually stamping in an association between the escape response and the stimuli present when it was made (the visual appearance of the box, its odor, and so on). This stimulusresponse or S-R association would be strengthened every time the cat was rewarded, until eventually the box cues would elicit the correct response the instant the cat was placed inside it. Thorndike repeated this experiment with other responses and also with other species, including chicks, dogs, and monkeys. The basic pattern of the results was almost always the same: a gradual improvement over many trials. This uniform pattern suggested that the gradual strengthening effect of rewards was not confined to a single situation or species but, rather, represented a general law of behavior, which Thorndike called the Law of Effect: Of several responses made to the same situation, those which are accompanied or closely followed by satisfaction to the animal will, other things being equal, be more firmly connected with the situation, so that, when it recurs, they will be more likely to recur . . . . The greater the satisfaction . . . the greater the strengthening . . . of the bond. (Thorndike, 1911, p. 24) Some Controversial Issues When Thorndike’s findings were published, they aroused considerable controversy. One focus of debate was his claim that much if not all of animals’ seemingly intelligent behavior could be explained by the formation of associations. We will examine this issue in more depth in Chapter 8; for now, we will simply focus on two other aspects of his findings that attracted attention. lie6674X_05_c05_153-200.indd 157 3/15/12 7:53 AM Section 5.1 Thorndike’s Law of Effect CHAPTER 5 I Can’t Get No Satisfaction Thorndike was criticized by behaviorists for his use of the term satisfaction, which refers to a subjective or mental state. We can’t see into the mind of a cat, so how can we know whether it is experiencing satisfaction? This difficulty in assessing satisfaction makes the Law of Effect potentially circular: A response will increase if it is followed by a satisfying outcome, but the only way we know whether the outcome is satisfying is if the response increases! In fact, Thorndike was aware of this problem, and he proposed an independent and objective test for determining whether a consequence was satisfying. By a satisfying state of affairs is meant one which the animal does nothing to avoid, often doing such things as attain and preserve it. 
(Thorndike, 1911, p. 245)

In other words, if a cat repeatedly tries to obtain food in one situation—for example, by jumping up onto a table where food is kept—then by definition this food must be satisfying, and the Law of Effect now allows us to predict that the food will also be an effective reward for other behaviors, such as escaping from the puzzle box. Meehl (1950) later labeled this property of rewards transituationality.

Photo caption: Lasagna was a powerful reinforcer for Garfield, the famous cartoon cat created by Jim Davis in the late 1970s, to the point where it sometimes seemed as if he would do anything to obtain it. This property of a reinforcer—that its effectiveness is not confined to one situation but can strengthen a wide range of behaviors in a wide range of situations—is known as transituationality.

Thorndike's objective definition of satisfaction saves the Law of Effect from circularity, but the term still bothered learning theorists because of its subjective connotation that a reward is emotionally satisfying. An experiment by Sheffield, Wulff, and Backer (1951) illustrates the dangers. To study what events are rewarding, they used an apparatus called a straight-alley maze, which consists of a start box and a goal box connected by a long alley (see Figure 5.3). To find out if a stimulus is rewarding, the stimulus is placed in the goal box, the subject is placed in the start box, and the experimenter records how long it takes for the subject to run to the goal box. If the stimulus is rewarding, it should strengthen the response of running, and the speed of running down the alley should thus increase over trials.

Figure 5.3: Straight-alley maze. In a straight-alley maze, a rat runs from a start box (here, at the left) to a goal box; the goal box usually contains a reward such as food.

The experimenters used male rats as subjects and a receptive female in the goal box as the reward. The normal copulatory pattern in rats consists of a series of 8 to 12 intromissions and withdrawals by the male until it finally ejaculates. When the male reached the goal box, the experimenters allowed it two intromissions, and then abruptly removed it from the goal box. Intuitively, it is not obvious that males would regard this as a satisfying experience, but it proved to be a very powerful reward, as their speed of running down the alley increased over trials by a factor of eight! Such evidence makes it at least questionable whether all events that strengthen behavior are emotionally satisfying, and it has led learning theorists to prefer the more objective term "reinforcer" to "reward." A reinforcer can be defined as an event that increases the probability of a response when presented after that response. Similarly, we can define reinforcement as an increase in the probability of a response caused by the presentation of a reinforcer following that response. So, for example, if a child receives a piece of candy every time she does her homework, and this makes her sit down to do her homework more frequently, the candy is considered a reinforcer; it has strengthened or reinforced this behavior.
Reinforcement Versus Conditioning The main difference between classical conditioning and reinforcement is that in classical conditioning, a US such as food is delivered regardless of what a person is doing at the time, whereas in reinforcement the person must perform a response—for example, work—to obtain it. lie6674X_05_c05_153-200.indd 159 A further issue arising from Thorndike’s work concerned the relationship between the learning he described and that described by Pavlov. The procedures used by the two investigators clearly differed. Pavlov arranged a contingency between two stimuli: food, for example, was presented following a tone but not in its absence. Thorndike, on the other hand, arranged a contingency between a response and a stimulus: Food was presented only after a correct response. If we used the 3/15/12 7:53 AM CHAPTER 5 Section 5.2 The Reinforcer symbol S to represent a stimulus, R to represent a response, and S* to represent a consequence, then we can represent the two forms of learning as follows: Classical conditioning: S S* Reinforcement: R S* Carrying this point a bit further, in classical conditioning the presentation of food depends solely on whether the CS has been presented: Whether a dog salivates has no effect on whether food is given. In reinforcement, on the other hand, the presentation of food depends crucially on the subject’s response: No response, no food. In procedural terms, classical conditioning and reinforcement clearly differ. This, however, does not necessarily mean that the learning processes involved are different. As we suggested in our discussion of classical conditioning and causal learning, a single learning process could be involved in detecting relationships between events, regardless of the nature of the events concerned. Thus, although the procedures used in classical conditioning and reinforcement are different, the underlying processes could be the same. If the learning processes were the same, we should expect the principles of classical conditioning and reinforcement to be similar if not identical. Just as classical conditioning depends on how closely the US follows the CS, for example, so the effectiveness of reinforcement should depend on how closely the reinforcer follows the response. As we shall see shortly, contiguity is indeed critical in reinforcement, and many of the other principles of conditioning and reinforcement also turn out to be the same. (For further discussion, see Colwill & Rescorla, 1990; Williams, Preston, & de Kervor, 1990.) For our present purposes, though, the key point to note is the distinction between the two procedures. In both reinforcement and classical conditioning, a response is strengthened because of the presentation of an event such as food: However, in reinforcement food is delivered following a response, whereas in classical conditioning food is delivered following a stimulus. 5.2 The Reinforcer One obvious determinant of whether a reward will strengthen behavior is the attractiveness of the reward. As in the classic recipe for elephant stew—where the first step is said to be “catch an elephant”—the first step in using reinforcement effectively is to identify a suitable reinforcer. Primary Reinforcers The most obvious candidates for reinforcers are stimuli that are necessary for survival, such as food and water. 
It makes sense that such stimuli would become reinforcing in the course of evolution because an animal that repeats a response that has led to food is likely to have a better chance of obtaining food in the future. Thus, a gene that enabled food to be established as a reinforcer would be likely to be transmitted to future generations. It therefore came as no surprise when early research demonstrated that stimuli such as food, water, and sexual intercourse were all reinforcing. These types of stimuli are known as primary reinforcers, and they are effective essentially from birth.

In the early 1950s, evidence began to accumulate that not all reinforcers were necessary for survival, at least not in the simple physical sense in which food is reinforcing. In an experiment by Butler (1954), monkeys were placed in an enclosed cage with two wooden panels, one painted yellow and the other blue. If a monkey pushed open the blue door, it was allowed to look out into the experimental room beyond for a period of 30 seconds. If it pushed against the yellow door, an opaque screen immediately came down, terminating the trial. In this experiment, the reinforcer was the monkey's view of the room—in other words, access to visual stimuli outside the confines of its box. Not only did the monkeys quickly solve this problem, learning to push only the blue door, regardless of the side on which it was presented, they proved remarkably persistent in performing the response. In one experiment in which there was a trial once a minute—that is, a 30-second opportunity to look out into the room, followed by a 30-second blank interval—one subject responded on every single trial for nine hours without a break. A second subject responded for 11 hours, and a third for an extraordinary 19 consecutive hours. Visual access to the surrounding room was clearly not necessary for the monkeys' survival in any direct sense, but it proved to be a remarkably potent reinforcer. As Butler commented, "That monkeys would work as long and as persistently for food is highly unlikely."

Photo caption: A rat's reward for navigating this maze successfully would be cheese, but research on sensory reinforcement suggests that a rat might enjoy simply running through the maze! That is, if the rat were given a choice between a path that always led to this maze and a second path that always led to an empty box of the same size, it might learn to choose the path that led to the more complex (in human terms, interesting) environment.

Visual stimulation now appears to be only one example of a large set of events that Kish (1966) has referred to as sensory reinforcers. The most important characteristic of these reinforcers seems to be that they provide variety in our perceptual environment. Rats, for example, prefer to explore complex mazes with many turns rather than to explore simple ones (Montgomery, 1954), and humans confined in a dark room will push a button that turns on a panel of flashing lights, with the rate of button-pushing increasing as the pattern of lights becomes less predictable (Jones, Wilkinson, & Braden, 1961). Sensory reinforcers are also primary reinforcers, since they require no special training to be effective.

The Premack Principle

The evidence that sensory stimulation can be reinforcing suggests that not all reinforcers are physically necessary for survival.
Is there any other characteristic, then, that reinforcers such as hamburgers, sex, and flashing lights share? Perhaps the most useful integrating lie6674X_05_c05_153-200.indd 161 3/15/12 7:53 AM Section 5.2 The Reinforcer CHAPTER 5 principle is one suggested by David Premack (1965, 1971). Premack argued that different experiences all have different values for us and that these values can be inferred by observing the amount of time in which we engage in these activities when they are freely available. The common characteristic of reinforcers, said Premack, is that they are all high-probability activities. Is it possible, then, that any high-probability activity will reinforce any response that has a lower probability? Video games are clearly not necessary for physical survival, but some children would gladly spend most of their free time playing them. For these children, playing with video games is a high-probability activity, and according to David Premack, access to any high-probability activity can be used to reinforce less probable activities. Parents who allow children to play video games only when they have completed their homework are effectively applying the Premack principle. Suppose that a group of children were given free access to a number of foods and were found to prefer potatoes to spinach, but to strongly prefer ice cream to both of them. If high-probability responses reinforce lower-probability responses, then—as all parents know—we should be able to use access to ice cream to reinforce eating spinach. However, we should also be able to use access to potatoes to reinforce eating spinach, albeit less effectively, because eating potatoes is also a higherprobability response. Premack (1965) tested predictions like these in a series of experiments involving rats and children, and on the whole the results were positive. The suggestion that more probable responses will reinforce less probable responses thus became known as the Premack principle. A subtle-but-important elaboration of this principle is known as the response deprivation hypothesis (Timberlake & Allison, 1974). This hypothesis states that whether an activity will serve as a reinforcer depends on whether the current level of the activity is below its preferred level. If a child, given a free choice, will eat two ice cream bars a day, then access to ice cream will be reinforcing if the child currently has access to less than two bars. If, however, the child has already eaten four ice cream bars, access to ice cream will most likely not be an effective reinforcer. A “Childish” Application Homme, deBaca, Devine, Steinhorst, and Rickert (1963) reported a particularly delightful application of the Premack principle. The subjects were unruly three-year-olds who repeatedly ignored their nursery school teacher’s instructions and, instead, raced around the room screaming and pushing furniture. One common reaction of adults in such situations is to lose their tempers and punish the children to get them to do as they are told. Instead, Homme and his co-workers set out to reinforce good behavior through a judicious application of the Premack principle. They reinforced the children’s behavior whenever the children sat and played quietly for a specified period of time, with the reinforcer being several minutes of uninterrupted running and screaming! 
In only a few days, the children lie6674X_05_c05_153-200.indd 162 3/15/12 7:53 AM Section 5.2 The Reinforcer CHAPTER 5 were obeying the teacher’s instructions almost perfectly, so that “an observer, new on the scene, almost certainly would have assumed extensive aversive control was being used” (Homme et al., 1963). Later on, new and even better reinforcers were developed through continued observation of the children’s behavior, including such unusual rewards as allowing the children to throw a plastic cup across the room, to kick a wastepaper basket, and, best of all, to push the teacher around the room in a swivel chair on rolling wheels! The moral of this story is that it is a mistake to think of reinforcers in terms of a restricted list of “approved” stimuli. There is no magic list of reinforcers; the best way to determine what will be reinforcing is to observe what activities a subject engages in when given a free choice. Secondary Reinforcers In contrast to primary reinforcers, which are effective from birth, some of the most powerful reinforcers affecting our behavior are secondary or conditioned reinforcers, which have acquired their reinforcing properties through experience. Money, for example, is not at first an effective reinforcer. Showering an infant with dollar bills is unlikely to have any discernible impact on the infant’s behavior. As we grow older, though, money becomes increasingly important; in some cases, it becomes an obsession. How, then, do secondary reinforcers, such as money or the word “good,” acquire their reinforcing properties? John B. Wolfe (1936) made one of the first attempts to answer this question by examining whether the powerful effects of money in real life could be reproduced in the animal laboratory. Using six chimpanzees as subjects, Wolfe first trained them to place a token into a vending machine to obtain grapes. Once they had mastered this task, they were given a heavy lever to operate to obtain further tokens; Wolfe found that they would work as hard to operate the lever when the reward was tokens as when the reward was the grapes themselves. Furthermore, their behavior bore some striking similarities to that of humans regarding money. In one experiment in which the chimpanzees were tested in pairs, Wolfe found that the dominant member of the pair would sometimes push aside its subordinate to gain access to the lever. If the subordinate had already amassed a pile of tokens, then the dominant member might simply take them away. In one of the pairs, however, the subordinate, a chimp named Bula, developed an effective counterstrategy. She would turn toward her partner, Bimba, extend her hand palm up, and begin to whine. This apparent begging lie6674X_05_c05_153-200.indd 163 The tokens that issue from slot machines can be thought of as secondary reinforcers. Some individuals will sit in front of a slot machine for hours, waiting for the occasional payout of tokens, which they can exchange for real money. The reinforcing properties of tokens are thus acquired through their association with a conditioned reinforcer such as money, which itself becomes a reinforcer through its association with primary reinforcers such as food and other goods that are necessary for survival. 3/15/12 7:53 AM Section 5.2 The Reinforcer CHAPTER 5 was invariably successful: As soon as Bula began to whine, Bimba would quickly hand her one of the tokens and would continue doing so until she stopped whining. 
As in this study, secondary reinforcers generally acquire their reinforcing properties through pairing with primary reinforcers. (For reviews, see Fantino, 1977, and Williams, 1994. For a possible exception, see Lieberman, 1972, and Lieberman, Cathro, Nichol, & Watson, 1997.) If you wanted to establish a word such as “good” as a secondary reinforcer for a child, for example, you would want to ensure that this word was followed by other reinforcers such as hugs or candy. And, once “good” had become an effective reinforcer, you would want to continue to pair it with backup reinforcers at least occasionally. As with classical conditioning, continually presenting a secondary reinforcer by itself is likely to extinguish its reinforcing properties (for example, Warren & Cairns, 1972). Social Reinforcers A third category (one not usually treated separately) is that of social reinforcers—stimuli whose reinforcing properties derive uniquely from their origin in the behavior of other members of the same species. Stimuli such as praise, affection, comfort and even simple attention from another person can be reinforcing. In fact, this category of reinforcers is probably the one that we encounter most often in our daily lives, and it plays an important—and often underestimated role—in controlling our behavior. Social reinforcers are a blend of both primary and secondary reinforcers. Poulson (1983) found that an adult’s smile could reinforce behavior in infants as young as three months, suggesting that smiling is innately reinforcing. But considerable evidence also indicates that the power of social reinforcers can be altered by pairing them with other reinforcers. The reinforcing properties of the word “good,” for example, can be increased by following this instance of praise with candy (Warren & Cairns, 1972). Thus, although social reinforcement might have an innate basis, experience also plays an important role. In the photograph above, actress Judy Garland is shown hugging her daughter Liza Minnelli. Hugging can be a very powerful social reinforcer for children. lie6674X_05_c05_153-200.indd 164 We can illustrate the power of social reinforcers with a study by Allen, Hart, Buell, Harris, and Wolf (1964). The subject was a four-year-old girl, Ann, who had just started nursery school. From the time of her arrival, she interacted more frequently with the adults than the other children, and as time went on she developed a variety of behavioral problems. She complained frequently about skin abrasions that no one else could see; she spoke in a low voice that was very difficult to hear; and she spent increasing amounts of time standing by herself, pulling at her lower lip and fingering her cheek. One possible analysis of Ann’s behavior might have been that she was an insecure and unhappy 3/15/12 7:53 AM Section 5.2 The Reinforcer CHAPTER 5 child, and thus needed as much comfort and reassurance as possible to help her adjust to her new surroundings. The authors’ analysis, however, was quite different. They noticed that a common feature of Ann’s problem behaviors was that they elicited adult attention. If she stood by herself, for example, a teacher was likely to come over to ask what was wrong. If adult attention was reinforcing for Ann, the teachers were inadvertently encouraging the very behaviors they were trying to eliminate. 
The authors’ advice to the teachers, therefore, was to change the reinforcement contingencies by paying attention to Ann whenever she played with others but to ignore her when she stood alone. When Ann did talk or play with other children, a teacher would come over to Ann, smile, and talk to her about what she was doing. In this way, the teachers began to reinforce Ann’s social behavior by following her interactions with other children with attention. The result was a dramatic transformation in Ann’s behavior. After just a single day, the proportion of her time spent in social play increased from 10% to 60%, and this higher level was maintained over subsequent weeks. The frequency of reinforcement—in other words, the number of times the teachers came over to Ann while she was playing with others—was then gradually reduced and eventually faded out altogether, but Ann’s social play remained at a high level. (As her skills in playing with other children increased, this play probably became its own source of reinforcement.) Social reinforcers can be very powerful: Even a small shift in adult attention—not money, not candy, but simple attention—was sufficient to substantially alter Ann’s behavior. Also, as often happens, the crucial role of social reinforcement in directing Ann’s behavior was not at first appreciated. Actions such as paying attention to someone are such a common part of our lives that we take them for granted, but, as we shall see again in other applications, social reinforcement can play a very powerful role in controlling behavior. Negative Reinforcers There is another class of reinforcers that we need to discuss at least briefly before proceeding. All of the reinforcers we have discussed to this point have been positive reinforcers, stimuli whose presentation will strengthen preceding responses. However, certain stimuli will strengthen behavior if they are removed, and these stimuli are called negative reinforcers. Suppose, for example, that you have the misfortune to move into an apartment where your neighbor plays appallingly loud music every night. And further suppose that the room contains a white button mounted on a wall, and you discover that each time you push the button the noise stops for one minute. lie6674X_05_c05_153-200.indd 165 When we take aspirin to relieve a headache, the subsequent reduction in pain is very reinforcing and makes it more likely that we will take aspirin again the next time we have a headache. Because the reinforcer is the removal of something—in this case, headache pain—it is called a negative reinforcer. 3/15/12 7:53 AM Section 5.3 Delay of Reinforcement CHAPTER 5 You would likely develop a real fondness for pushing the button. This is considered an instance of negative reinforcement: The reinforcer is the cessation of the loud music, and the response of pushing the button is reinforced, or strengthened, every time the music stops. In other words, the reinforcer is the removal of an unpleasant stimulus, rather than the presentation of a pleasant one. (Another example would be taking an aspirin to relieve the pain of a headache; this behavior would be reinforced by the termination of pain.) To recap, we talk about a stimulus as a positive reinforcer when its presentation strengthens a response, but as a negative reinforcer when it is the removal of the stimulus that is reinforcing. Note that positive and negative reinforcement are both forms of reinforcement: In both cases, the outcome is a strengthening of a response. 
The term “negative reinforcement” is sometimes misused to mean punishment, but that is a mistake to try to avoid. If the term reinforcement is used, it always means a strengthening of behavior; in negative reinforcement this is achieved by removing an unpleasant or undesirable stimulus. In some ways the distinction between positive and negative reinforcement is purely technical—is the result achieved by presenting a stimulus or removing it?—but it is important to use the terms correctly so as to avoid misunderstandings. 5.3 Delay of Reinforcement Having identified a wide variety of potential reinforcers, we turn now to the question of what determines whether a particular reinforcer will be effective in practice. Research with Animals Because of the critical importance of contiguity in classical conditioning, where delays of even a few seconds between the CS and the US could prevent conditioning, researchers assumed that contiguity would also be critical in reinforcement. Early attempts to demonstrate this, however, encountered unexpected difficulties. In a study by Wolfe (1934), for example, rats were allowed to run through a T-shaped maze, or T-maze, to obtain food (see Figure 5.4). Subjects received food only if they turned to the right, and Wolfe anticipated that imposing a delay between turning to the right and obtaining food would make learning difficult. In fact, he found that rats were able to learn the correct path even with delays of 20 minutes between the time they chose the correct path and the time they obtained food. lie6674X_05_c05_153-200.indd 166 3/15/12 7:53 AM CHAPTER 5 Section 5.3 Delay of Reinforcement Figure 5.4: T-maze Goal Box Goal Box food Start Box In a T-maze, the rat runs from the start box to a choice point, where it can turn to either the right or the left. In a typical experiment, only one of the goal boxes contains a reward. b. T-maze The reason, researchers eventually discovered, was that although the primary reinforcer, food, was delayed, the response was still producing immediate secondary reinforcement. When the rats made a correct turn, they immediately entered a delay box where they were held until they were released into the goal box that contained the food. This meant that the stimuli of the delay box were present just before they obtained food, and this resulted in the delay box becoming a reinforcing stimulus: When the rats made a correct response on subsequent trials, simply entering the delay box reinforced this response (Grice, 1948). Subsequent research showed that when immediate secondary reinforcement is eliminated, however, contiguity is just as important in reinforcement as it is in conditioning. A study by Dickinson, Watt, and Griffiths (1992) provides one example. The authors trained rats in a standard testing apparatus called an operant chamber or, more colloquially, a Skinner box. This apparatus was developed by one of the most influential figures in the history of animal learning research, B. F. Skinner, and is essentially a descendent of the puzzle box developed by Thorndike. In Thorndike’s box, subjects had to open a door to escape from the box and obtain food; in the Skinner box, animals remain in the box and make a response such as pressing a lever to obtain food. (See Figure 5.5 for an illustration of a typical Skinner box.) 
Because rats are free to press the lever again as soon as they have eaten the food, it is possible to deliver many reinforcers in a very short period of time, making the Skinner box a very efficient apparatus for studying the development of learning. lie6674X_05_c05_153-200.indd 167 3/15/12 7:53 AM CHAPTER 5 Section 5.3 Delay of Reinforcement Figure 5.5: Skinner box Speaker Pellet dispenser Signal lights Lever Dispenser tube Food cup Electric grid To shock generator A Skinner box, or operant chamber, for rats. When the rat presses the bar, a food pellet is delivered to a tray located below the bar. In the Dickinson et al. study, the time between pressing the lever and obtaining food was varied in different groups: Some rats received a food pellet 2 seconds after pressing the lever, others after delays of up to 64 seconds. As shown in Figure 5.6, the delay used had a powerful effect on the rate at which the rats pressed the lever. An increase in the delay of just a few seconds produced sharply lower rates of responding, and responding ceased altogether when the delay reached 64 seconds. lie6674X_05_c05_153-200.indd 168 3/15/12 7:53 AM CHAPTER 5 Section 5.3 Delay of Reinforcement Figure 5.6: Effects of delayed reinforcement Lever Pressure per Minute 20 15 10 5 0 0 20 40 60 Delay of Reinforcement (seconds) This figure demonstrates the effects of delayed reinforcement on bar-pressing in rats. The longer reinforcement was delayed, the lower was the rate of responding. Source: From Dickinson, A., Watt, A., & Griffiths, W. J. (1992). Free-operant acquisition with delayed reinforcement. Quarterly Journal of Experimental Psychology: Comparative and Physiological Psychology, 45B, 241–258, Figure 6. Reprinted by permission of Taylor & Francis Group. Why should a delay of just a few seconds have such a powerful impact? At first, learning theorists thought it was because rats have poor memories, so that if a reward were delayed, they wouldn’t be able to remember the response that produced it. However, later research made it clear that rats can remember their responses for surprisingly long periods—in one study by Capaldi (1971), for 24 hours. It now looks as if the problem is not that rats can’t remember their responses, but rather that they have difficulty figuring out which of the many responses they have made produced the reward. From the experimenter’s point of view, the correct response in the Dickinson et al. study seems obvious, but from the rat’s perspective the situation was far more confusing. Prior to finding the food it would have been engaged in a continuous stream of activity—grooming, exploring the cage, and so on—and this behavior would have continued during the delay interval. At any given moment, moreover, it would have been performing many lie6674X_05_c05_153-200.indd 169 6/5/13 11:37 AM CHAPTER 5 Section 5.3 Delay of Reinforcement responses simultaneously. As it pressed the lever, it might have been holding its head at a 45-degree angle, breathing rapidly, curling its tail to its left side, and so on. Rather than the simple situation depicted in Figure 5.7a, with just a single response preceding food, the rat would have experienced a situation more like that shown in Figure 5.7b, with the correct response (Rc) embedded in a sea of other behaviors. 
Figure 5.7: Dickinson study from the experimenter’s and rat’s perspectives R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R Rc Food R R R R R R Rc R R R R R R Food R R R R R R R R R R R R R R R R R R R R R R R R R R a. b. A situation in which a rat receives food after a delay, (a) from the experimenter’s perspective, in which only a single response precedes food, and (b) from the rat’s perspective, in which the correct response was only one of many which it made, both simultaneously and sequentially, before receiving the food. Model created by author Viewed from this perspective, it is not surprising that rats have difficulty figuring out which of their behaviors produced food. There are so many possibilities, it’s a wonder that they ever do solve such problems. If the correct response is made more obvious or marked by having it produce a distinctive stimulus—for example, a brief tone or flash of light—then rats turn out to be much better at solving the problem (for example, Lieberman, McIntosh, & Thomas, 1979). Research with Humans What, then, of humans: Will a reinforcer be effective only if it occurs within seconds of the behavior to be strengthened? You know from your own experience that this is not so: A good grade for an essay, for example, can influence your future behavior even if there was a delay of many days between your writing the essay and receiving the grade. How, then, can we reconcile the evidence from animal research with our everyday experience? One obvious answer is language. If you received a reward without any explanation— think of a mysterious stranger approaching you, silently handing you $500 and walking away—then you, like the rats in Dickinson et al.’s study, might also struggle to lie6674X_05_c05_153-200.indd 170 3/15/12 7:53 AM CHAPTER 5 Section 5.3 Delay of Reinforcement understand why you were being rewarded. Indeed, when experiments with humans employ procedures that parallel those used with animals, where rewards are provided without explanation, the results are almost uncannily similar (e.g, Shanks, Pearson, & Dickinson, 1989; Lieberman, Vogel & Nisbet, 2008). Fortunately for us, the relationship between our behavior and its rewards is rarely this opaque because we possess language. If a father decides to reward his young daughter for some exemplary behavior, he doesn’t just hand her a new toy without a word; he explains what it is for. Language can bridge the gap between a response and a reward symbolically, even when physically they are widely separated in time. Delay Reduces Incentive Our possession of language means that a delay in the presentation of a reward need not be nearly as catastrophic for people as for rats. Nevertheless, we are going to suggest that it is still desirable—and sometimes even vital—to reward behaviors as quickly as circumstances allow. One reason is that rewards tend to be perceived as less attractive when they are delayed. If you were offered a choice between receiving $100 now or in a year, would you find these options equally attractive? It seems unlikely. In the jargon of the field, a delayed reward has less attractive incentive value, and we are less motivated to work to obtain it. One example was reported in a study by Rachlin and Green (1972) using pigeons as subjects. They trained the pigeons in a Skinner box containing two circular plastic disks called keys. 
If the birds pecked the key on the left (key 1), grain was immediately made available for 2 seconds, whereas if they pecked the key on the right (key 2), grain was made available for 4 seconds, but only after a delay of 4 seconds.

R1 → 2 seconds of food (immediately)
R2 → 4-second delay → 4 seconds of food

The time between trials was held constant, so that over the course of a session, a bird that always pecked key 2 would receive twice as much food as a bird that always pecked key 1. Despite this, the pigeons pecked key 1 on 95% of the trials. They preferred to receive half as much food rather than wait four seconds for the larger amount.

You might be tempted to dismiss this result as evidence of pigeons' lack of intelligence, but Kirby and Herrnstein (1995) found that humans discount delayed reinforcers in much the same way. To assess the value of delayed rewards, they offered college students a choice between a smaller amount of money to be delivered soon and a larger amount to be delivered later. For example, subjects were asked if they would prefer $12 in 6 days or $16 in 12 days. The students were offered a number of such choices, and, to ensure that they would take these choices seriously, they were told that one of their choices would be selected at random at the end of the session, and they would actually receive the option they had chosen. Rationally, you might think that the students would have preferred receiving $16 to $12. Since both rewards were substantially delayed anyway, surely it would be better to wait a few more days and receive 33% more money? Apparently not, as most participants preferred the smaller sum that was delivered sooner. Like pigeons, we seem to value rewards less when they are delayed. (See also Kirby, 1997.)

An intriguing study by Madden, Petry, Badger, and Bickel (1997) suggests that this preference for immediate rewards is even more pronounced in drug addicts. The authors asked addicts and control subjects to choose between hypothetical options similar to those used by Kirby and Herrnstein. They found that addicts were much more likely to choose the more immediate reward—they seemed less able to tolerate delayed gratification. One possible reason for this result is that addicts' experiences with drugs increased their need for immediate reinforcement. Another possibility is that the addicts had a preexisting need for immediate reinforcement that made them more vulnerable to addiction in the first place. Resolving this question would require longitudinal data, tracking whether individuals who are poor at tolerating delayed rewards are more likely to later develop problems such as addiction.

Reinforcing Homework

A study by Phillips (1968) provides a real-life example of the value of providing reinforcement quickly. To improve procedures for treating juvenile delinquents, Phillips established a residential home for boys, called Achievement Place. Because one problem shared by most delinquents is failure in school, which in turn reflects an almost total failure to do any assigned homework, Phillips set out to encourage homework completion through the use of reinforcers. Whenever an assignment was completed to an acceptable standard, the boys were allowed to stay up for one hour past their normal bedtime on weekends. This reward was known as "weekly time." The effect of this reward on the behavior of one boy, Tom, is shown in Figure 5.8. Over a 14-day period, Tom did not complete a single assignment.

Photo caption: Many students have difficulty studying in a consistent way; part of the reason for this is that the reinforcers for studying (such as getting a good grade, obtaining a job, etc.) are delayed rather than immediate.

One possible explanation for this failure was that the reinforcer being used was not sufficiently attractive; maybe Tom didn't value being allowed to stay up late. Another possible explanation was the delay between completing an assignment during the week and being allowed to stay up late on the weekend. To find out, Phillips used exactly the same reinforcer in the next phase of the study—one hour of late time for each correct assignment—but now allowed Tom to stay up on the night that an assignment was completed rather than waiting until the weekend. These results are also shown in Figure 5.8, in the section labeled "daily time."

Figure 5.8: The Phillips study: different reinforcement conditions (y-axis: Assignments Completed, percent; x-axis: Sessions; conditions: weekly time, daily time). This figure displays the percentage of homework assignments completed by Tom under two conditions. In both, the reward for successful completion of homework was being allowed to stay up an extra hour. In the first condition, the reward could not be collected until the weekend ("weekly late time"), and Tom did not complete a single assignment. When he could stay up the same night ("daily late time"), his performance improved substantially. (Source: Adapted from Phillips, 1968)

We can see that the percentage of homework assignments completed rose immediately from 0 to an average of 50%. Even though the same reinforcer was used in both conditions, its effectiveness varied dramatically depending on the delay in its presentation. Thus, although reinforcers can be effective after a delay, as a general rule they should be delivered as soon after a response as possible if they are to achieve their full potential. Failure to adhere to this principle may be one of the most important reasons that reinforcers are sometimes ineffective.

At the beginning of the chapter, we referred to the puzzle of why students have difficulty studying despite the potent rewards—good grades, a job that pays well—contingent on this behavior. One important reason is almost certainly the delay involved in reinforcement. The reinforcers for studying arrive only after very long delays, whereas those for alternative activities, such as going to a movie or a football game, are essentially immediate. This principle is illustrated in the diagram below, where SR represents any of the possible reinforcers for going to a movie (being with friends, the enjoyment of watching the movie itself, etc.). SR follows R movie almost immediately. In contrast, the reinforcers for studying (R studying) come much later in time.

R movie → SR (immediate)
R studying → . . . long delay . . . → SR

The student who doesn't study might thus be behaving much like the pigeon in the Rachlin and Green study: he knows that in the long term one response produces much more valuable consequences, but is nevertheless unable to resist the temptation of immediate gratification. The moral to this section can thus be summarized very simply: For a reinforcer to be maximally effective, it should be presented as soon as possible after a response.
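The chapter describes delay discounting only qualitatively. As a rough illustration, the sketch below uses a hyperbolic discounting model, a standard assumption in this research area rather than anything reported by Rachlin and Green or Kirby and Herrnstein; the discount rate k and the function name are invented here purely for the example.

```python
# Illustrative sketch only, assuming a hyperbolic discounting model,
# V = A / (1 + k * D): the subjective value V of an amount A shrinks as
# the delay D grows. The rate k = 0.1 is an arbitrary example value,
# not a figure from the chapter or the cited studies.

def discounted_value(amount, delay, k=0.1):
    """Subjective value of a reward of size `amount` after `delay` time units."""
    return amount / (1 + k * delay)

# A Kirby & Herrnstein-style choice: $12 in 6 days versus $16 in 12 days.
sooner = discounted_value(12, 6)    # about 7.5 when k = 0.1
later = discounted_value(16, 12)    # about 7.3 when k = 0.1
print(f"$12 in 6 days  -> subjective value {sooner:.2f}")
print(f"$16 in 12 days -> subjective value {later:.2f}")
print("Prefer the sooner reward" if sooner > later else "Prefer the later reward")
```

With a steep enough discount rate, the smaller, sooner reward comes out ahead, matching the choices most participants made; with a much smaller k (say 0.01), the larger, later reward wins, which is one way to think about the individual differences reported for the addicts in the Madden et al. study.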
5.4 Schedules of Reinforcement

One of the most important factors determining the effect of reinforcement was discovered by accident. When Skinner was carrying out the research for his Ph.D., he ran his experiments on weekends as well as during the week, and one Saturday he discovered that his supply of pellets would not last until Monday. So instead of reinforcing every bar-press as he had done in the past, he decided to reinforce only one per minute. This had two gratifying consequences:

1. His supply of pellets lasted almost indefinitely.
2. The rats continued to respond and, after some initial hesitation, did so at a steady rate.

Over time, Skinner tried several different rules, or reinforcement schedules, for deciding which responses to reinforce, and he found that the choice of schedule had important consequences for how his animals responded. We will begin by defining some of the schedules he used and then look at their effects on behavior.

Ratio and Interval Schedules

The simplest schedule is to reinforce a response every time it occurs. This schedule is known, not unreasonably, as a continuous reinforcement (CRF) schedule. In the real world, though, behavior is rarely reinforced so consistently. Children, for example, are not praised every time they tell the truth, and factory workers are not paid every time they tighten a screw. Instead, most behavior is reinforced on intermittent, or partial, reinforcement schedules. Two types of partial reinforcement schedules have been studied most often: ratio schedules and interval schedules. In a ratio schedule, reinforcement depends on the number of responses that have been emitted. In factories, for example, workers' wages used to depend solely on the number of responses they made—for example, the number of dresses made—regardless of how long it took. In an interval schedule, on the other hand, the passage of time since the last reinforcement, rather than the number of responses, determines whether the next response will be reinforced. Whether you find mail the next time you go to your mailbox, for example, will depend on how long it has been since the last time you found mail, not on how often you visited the mailbox in the interim. Note that obtaining reinforcement in an interval schedule still requires a response: You do not obtain mail unless you go to the mailbox. The length of the interval determines when reinforcement becomes available; a response is still necessary to actually obtain it.

Photo caption: Trips to a mailbox are reinforced on an FI schedule—a set period must elapse following a delivery before another trip can be reinforced.

Further complicating matters, ratio and interval schedules can be subdivided according to whether the requirement for reinforcement is fixed or variable. In a fixed interval (FI) schedule, the interval that must elapse before a response can be reinforced is always the same, whereas in a variable interval (VI) schedule this interval is varied. In an FI 60-second schedule, for example, 60 seconds must always elapse following a reinforcement before a response can be reinforced again, whereas in a VI 60-second schedule, the interval might be as short as 5 seconds or as long as 2 minutes. (The 60 seconds in the schedule's name refers to the average.) Ratio schedules are subdivided in a similar way. In a fixed ratio (FR) schedule, the number of responses required for reinforcement is always the same.
For example, a rat who receives a food can be reinforced. pellet after every three lever presses is on a fixed ratio schedule. In a variable ratio (VR) schedule, the number of responses required to obtain reinforcement varies across successive reinforcements. For example, FR 30 means that every 30th response will be reinforced; VR 30 means that an average of 30 responses (sometimes only 5 responses, sometimes 50, and so on) will be required for reinforcement. A slot machine in a casino is a classic example of a VR schedule: Payoffs depend on how many times the machine is played, but the jackpot is made unpredictable to prevent players playing only when a machine has been in use by others for a long time. Figure 5.9 summarizes these four schedules. lie6674X_05_c05_153-200.indd 175 3/15/12 7:53 AM CHAPTER 5 Section 5.4 Schedules of Reinforcement Figure 5.9: Partial reinforcement schedules The most commonly studied types are ratio (where reinforcement depends on the number of responses emitted) and interval (where reinforcement depends on time since the last reinforced response). Schedules are further subdivided according to whether the schedule requirement is fixed or variable. Patterns of Responding Learning the distinctions among the various schedules can be tedious, but each schedule has somewhat different effects on behavior, and these differences can be important. Figure 5.10 presents cumulative records illustrating the typical patterns of responding obtained under FI and FR schedules of reinforcement. In a cumulative response record, time is plotted along the x-axis and the y-axis shows the cumulative or total number of responses made since the beginning of the session. If a rat were to press a lever at a steady rate of one press every second, this would appear on a cumulative record as an ascending straight line. The faster the rat responds, the more steeply the line will rise. FI Time Cumulative Responses Cumulative Responses Figure 5.10: Typical cumulative response records FR Time b. a. This figure shows the typical cumulative response records generated by two types of schedules: (a) fixed interval (FI); (b) fixed ratio (FR). The short diagonal marks indicate presentations of a reinforcer. lie6674X_05_c05_153-200.indd 176 3/15/12 7:53 AM Section 5.4 Schedules of Reinforcement CHAPTER 5 In an FI schedule (Figure 5.10a), reinforcement becomes available only after a fixed period of time has elapsed following the previous reinforcement; each short diagonal mark on the record indicates the occurrence of a reinforcer. We can see that immediately after reinforcement, subjects respond at a very low rate, but this rate steadily accelerates and reaches a peak just before the next reinforcement is due. Thus, subjects tend to respond in a cyclical pattern. Because of its appearance when graphed, this positively accelerated response pattern is called an FI scallop, and it has important implications for the practical use of FI schedules. For example, if you were a parent who wanted to encourage your daughter to study by praising this behavior, it would be a great mistake to visit her room only at regular, hourly intervals. If your praise were the main reinforcer for studying, it is likely that your daughter would begin to study at regular, hourly intervals. 
Patterns of Responding

Learning the distinctions among the various schedules can be tedious, but each schedule has somewhat different effects on behavior, and these differences can be important. Figure 5.10 presents cumulative records illustrating the typical patterns of responding obtained under FI and FR schedules of reinforcement. In a cumulative response record, time is plotted along the x-axis and the y-axis shows the cumulative or total number of responses made since the beginning of the session. If a rat were to press a lever at a steady rate of one press every second, this would appear on a cumulative record as an ascending straight line. The faster the rat responds, the more steeply the line will rise.

[Figure 5.10: Typical cumulative response records generated by two types of schedules: (a) fixed interval (FI); (b) fixed ratio (FR). Each panel plots cumulative responses against time; the short diagonal marks indicate presentations of a reinforcer.]

In an FI schedule (Figure 5.10a), reinforcement becomes available only after a fixed period of time has elapsed following the previous reinforcement; each short diagonal mark on the record indicates the occurrence of a reinforcer. We can see that immediately after reinforcement, subjects respond at a very low rate, but this rate steadily accelerates and reaches a peak just before the next reinforcement is due. Thus, subjects tend to respond in a cyclical pattern. Because of its appearance when graphed, this positively accelerated response pattern is called an FI scallop, and it has important implications for the practical use of FI schedules. For example, if you were a parent who wanted to encourage your daughter to study by praising this behavior, it would be a great mistake to visit her room only at regular, hourly intervals. If your praise were the main reinforcer for studying, it is likely that your daughter would soon be studying only in the period just before each of your hourly visits.

Ironically, psychology professors (including those teaching learning) make exactly this mistake by scheduling exams at predictable, fixed intervals, with the result that students' studying often takes the form of a classic FI scallop: a zero or very low rate of studying immediately after an exam, gradually rising to a frantic peak the night before the next exam! Mawhinney, Bostow, Laws, Blumenfeld, and Hopkins (1971) reported evidence that studying really does follow this pattern. To estimate the amount of time students spend studying, they monitored the use of course material in the library. When exams were scheduled daily, students maintained a constant rate of studying of around 60 minutes per day. When exams were scheduled at 3-week intervals, students spent an average of only 15 minutes studying during the first session following an exam, but study time then increased steadily over days, reaching a peak of almost two hours during the session just before the next exam. In this case the effects of an FI schedule on students were almost identical to those on rats and pigeons, but this is not always the case. We will discuss this anomaly further in Chapter 8.

Figure 5.10b shows the typical response pattern under an FR schedule. Here, reinforcement is contingent on a fixed number of responses, and the result is generally "pause-and-run" behavior. Subjects pause for a while after reinforcement (as shown by the level horizontal line after the diagonal mark on the graph), but once they begin to respond, they respond steadily until they earn another reinforcer. If the ratio requirement is too great, however, ratio strain may be observed: Subjects will begin to respond, then pause, respond a bit more, pause again, and so on. If the schedule requirement is not reduced at this point, subjects soon cease to respond altogether. In VI and VR schedules, by contrast, the requirement for reinforcement is varied, so that a response can be reinforced at any time. As a result, these schedules produce much steadier rates of responding, without such obvious pauses.

The Partial Reinforcement Effect

Having described the properties of the five schedules most often studied—CRF, FI, FR, VI, and VR—can we now say which one is best? If our goal were to ensure as strong a response as possible, the obvious answer would seem to be to reinforce the desired response every time it occurred (CRF)—the more often a response is reinforced, the stronger it should be. Indeed, in some respects that is so, but reinforcing every response can sometimes have unintended consequences. Consider the following experiment by Lewis and Duncan (1956). The subjects in this study were college students, and they were given an opportunity to play a slot machine. They were told that they could play as long as they wanted, and that each time they won they would earn five cents. The percentage of reinforcement was varied across groups during the first phase: One group was never reinforced; a second group was reinforced once; and so on. Reinforcement was then discontinued, and the experimenters monitored how long subjects continued to play. You might think that the higher the percentage of reinforcement, the stronger the response, and thus the longer subjects would continue to play. As shown in Figure 5.11, however, that was not the case.
Quite the contrary, the lower the percentage of reinforcement during training, the longer subjects played during extinction. This counterintuitive result—that partial reinforcement during training increases responding during extinction—is called the partial reinforcement effect (PRE). It should be noted that in the Lewis and Duncan experiment, the no-reinforcement condition (0%) resulted in the highest levels of responding during extinction, but this is not usually the case. The persistent responding in this group was probably caused by the wording of the instructions, which implied that some reinforcement would be given if subjects responded. When this reinforcement was not forthcoming following the first eight plays, participants kept trying.

[Figure 5.11: The effect of partial reinforcement on responding during extinction. The graph plots the mean log number of plays to extinction against the percentage of reinforcement (0 to 100) received during training. The lower the percentage of reinforcement college students received for playing a slot machine during training, the longer they persisted in playing during extinction. Source: Adapted from Lewis & Duncan, 1956.]

The partial reinforcement effect was so surprising to psychologists that at first it was called "Humphreys' paradox," after the researcher who discovered it. Various explanations were proposed, but most learning psychologists now agree that the fundamental cause is the difficulty subjects have in judging whether further responding is likely to produce reinforcement. For subjects who have always been reinforced, the transition to extinction is obvious, and they are likely to quit responding quickly. For subjects who have received reinforcement after long periods of nonreinforcement during training, on the other hand, the transition to extinction is less obvious, and they are more likely to persist in the hope that they will eventually be reinforced.

Tantrum behavior in children provides a real-life example of the partial reinforcement effect. When parents pay attention to a child having a tantrum, their attention can reinforce this behavior. Sometimes parents realize this is the case, so they try hard to ignore the tantrum. If, with great effort, they manage to ignore their child's tantrums 90% of the time, they might then be baffled when the tantrums continue, but this persistence follows directly from the partial reinforcement effect: By reinforcing the behavior on a partial reinforcement schedule (in this case, a VR 10), the parents are in fact increasing the persistence of the behavior, as the child learns that persistence will eventually pay off. If parents do decide to ignore tantrums, it is very important that they do so consistently, as even one or two reinforcements can dramatically increase the time required for extinction.

Choosing a Schedule

Let us now return to the question of which schedule is best. Reinforcing every response (CRF) has some important advantages, but it also has some serious disadvantages. As we have just seen, continuous reinforcement does not encourage persistent responding—if reinforcement is not available for a while, there is a greater likelihood that responding will cease.
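One way to see why the transition to extinction is harder to detect after partial reinforcement is to ask what the learner's training history already looked like. The toy sketch below is my own illustration, not an analysis from the text; it treats a VR 10 schedule as if each response had a 1-in-10 chance of being reinforced and compares the longest unreinforced run experienced during training under CRF and under that schedule.

import random

def longest_unreinforced_run(p_reinforce, n_responses=500, seed=1):
    """Longest streak of unreinforced responses during training (toy model)."""
    rng = random.Random(seed)
    longest = run = 0
    for _ in range(n_responses):
        if rng.random() < p_reinforce:   # this response happened to be reinforced
            run = 0
        else:                            # another unreinforced response in a row
            run += 1
            longest = max(longest, run)
    return longest

print("CRF  :", longest_unreinforced_run(1.0))   # 0: extinction is obvious at once
print("VR 10:", longest_unreinforced_run(0.1))   # often 20 or more: extinction looks like training

A child whose tantrums were reinforced on something like a VR 10 schedule has already sat through long stretches of being ignored, so the start of a genuine extinction program is not, at first, distinguishable from training; under CRF, a single unreinforced response is already unlike anything in the child's history.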
A further disadvantage is that continuous reinforcement is often costly: In monetary terms, it costs whatever the reinforcer is worth on every single response, and it also demands considerable time and effort from the person delivering the reinforcer, who must be present whenever the desired response occurs. Given these problems, the optimum strategy for producing durable responding is usually to begin by reinforcing every response, but then to gradually reduce the rate of reinforcement to the lowest level that will maintain a satisfactory response rate. For this purpose, schedules with variable reinforcement requirements are generally preferable to schedules with fixed requirements because the unpredictability of reinforcement generates more consistent and rapid responding.

Our search for the "best" schedule, therefore, has narrowed to two candidates: VR and VI. Which should you use? The answer turns out to be a bit complicated. A VR schedule normally generates a higher rate of response than a VI schedule because reinforcement on a VR schedule directly depends on the number of responses: If a subject doubles the number of responses he or she makes, the reinforcements will also double. On the other hand, if the VR requirement is set too high, subjects will abruptly quit, whereas VI schedules can maintain a low but steady rate of responding even when reinforcement is infrequent. In sum, a VR or a VI schedule is generally the most effective in maintaining persistent responding; a VR schedule will tend to generate higher response rates, but if reinforcement is to be delivered only infrequently, then a VI schedule is more likely to sustain responding.

A Criminally Successful Application

By now, your feelings about schedules might resemble those of the child whose review of a book about penguins began, "This book told me more about penguins than I wanted to know." Learning the technical distinctions among schedules is tedious, but as we suggested earlier, different schedules can have very different effects, and when used imaginatively, schedules can be powerful tools for altering behavior. In a striking demonstration of the importance of the schedule used, Kandel, Ayllon, and Roberts (1976) used reinforcement as part of a remedial high school education program in a Georgia state prison. The subjects were two inmates, one with a measured IQ of 65, the other with an IQ of 91. To reinforce studying, they were awarded points whenever they passed a test with a score of 80% or better, and these points could then be exchanged for a variety of reinforcers such as cigarettes, cookies, and extra visiting privileges. With 1,000 points, for example, a convict could buy a radio as a present for his family. The program produced significant progress, but not as much as the authors had hoped. One possible explanation was that the inmates simply were not bright enough to progress any faster. (With IQs of 65 and 91, it was perhaps remarkable that they progressed as fast as they did.) Another possibility was that the reinforcement schedule did not provide sufficient incentive for the hard work required. To find out, the authors devised a new schedule in which the faster the inmates progressed, the more points they earned.
If an inmate completed one grade level in a subject in 90 days, for example, he received 120 points; if he did it in only 4 days, he received 900 points; and if he did it in only 1 day, he received 4,700 points. The result was a quite staggering rate of progress. Under the old schedule, one of the convicts, Sanford, had completed ninth-grade English in three months—all things considered, not unimpressive. Under the new schedule, he completed tenth-, eleventh-, and part of twelfth-grade English in just one week. He often missed recreational periods and stayed up all night to work. As he remarked to one of the instructors, he wanted to "get when the gettin' was good." During the five months of the program—standard reinforcement schedule as well as enriched—he advanced 4.6 years in high school arithmetic, 4.9 years in reading, and 6.6 years in language. In other words, he completed almost five years of high school in five months—roughly 12 times the normal rate. And Sanford was the one with an IQ of 65!

These results have at least two important implications. First, and most relevant to our current concern, they illustrate how powerfully the choice of reinforcement schedule can determine the effectiveness of reinforcement. More generally, they hint at how often we underestimate people's ability to learn and change. Knowing Sanford's criminal record and apparent IQ, few would have believed that he was capable of such progress. But under appropriate learning conditions, all of us—learning disabled as well as gifted, criminal as well as non-criminal—might be capable of far more learning than is commonly assumed. Too often, we blame failure on the learner: "Oh, he's too stupid." "She's just not trying." A much more productive reaction to failure might be to assume that our teaching methods are at fault and to search for better methods. We have now seen two examples in which a critical reexamination of teaching procedures led to dramatic improvements in learning—Phillips's change to immediate reinforcement at Achievement Place, and the Kandel group's imaginative use of a new reinforcement schedule—and we shall encounter others as we proceed. Greater faith in human potential can sometimes pay handsome dividends.

5.5 Motivation

Whether you performed a response to obtain a reinforcer would depend not only on whether you believed the response would produce the reinforcer (learning) but also on whether you wanted the reinforcer (motivation). To take a simple example, whether you insert a coin into a vending machine to obtain a cup of coffee would depend not only on whether you had learned that a coin would operate the machine but also on whether you wanted coffee. Motivation, in turn, depends partly on how long you have been deprived of that reinforcer and partly on its attractiveness. How hard you would work to obtain food, for example, would depend on how hungry you were and how much you liked that food.

[Photo caption: The effectiveness of a reinforcer depends not only on what it is—in this case, an apple—but also on our motivation to obtain it—for example, whether we're hungry.]

To use a carrot-and-stick analogy, deprivation functions as a stick to drive us forward, and the reinforcer functions as a carrot to attract us; we need to consider both in predicting how effective a reinforcer will be.
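Before turning to the figure that summarizes these ideas, it may help to state the claim in a toy formula. The sketch below is my own illustrative formalization, not a model from the text; the multiplicative form and all of the numbers are assumptions.

def response_tendency(learning, incentive_value, deprivation):
    """Toy model: all inputs assumed to lie between 0 and 1; the multiplicative
    form is an illustrative assumption, not an established law."""
    motivation = incentive_value * deprivation
    return learning * motivation

# A well-learned response for an attractive reinforcer, but with little deprivation
# (e.g., coffee right after two cups), yields little tendency to respond:
print(response_tendency(learning=0.9, incentive_value=0.8, deprivation=0.1))  # 0.072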
Figure 5.12 summarizes these concepts: Whether you perform a reinforced response depends on both learning and motivation, and motivation in turn depends on the amount of reinforcement available and how long you have been deprived of it.

[Figure 5.12: The effect of a reinforcer depends on learning and motivation. Schematic: learning (knowing that R produces SR) and motivation (the value of SR, which depends on the amount of SR and on deprivation) jointly determine performance of the response. If a response is reinforced, future performance of that response will depend on both learning (knowing that the response produces a reinforcer) and motivation (wanting the reinforcer).]

On the surface, the concept of motivation is simple—the more you want a reinforcer, the harder you will work to obtain it. When this concept is examined more closely, though, it turns out to be surprisingly complex. In this section we will examine two of these complications. (For further examples, see Bolles, 1975, and Balleine, 2001.)

Contrast Effects

The attractiveness of a reinforcer is referred to as its incentive value. One determinant of incentive value is the nature or quality of the reinforcer—most children, for example, can be relied on to prefer ice cream to spinach—and another is the amount or quantity provided. In one examination of the effect of amount, Crespi (1942) trained rats to run down a straight-alley maze to a goal box containing either 1, 16, or 256 pellets of food. A larger amount should have a greater incentive value, so the rats should run faster to obtain it. As shown in the left-hand section of Figure 5.13, that is exactly what Crespi found. The effectiveness of a reinforcer depends in part on the amount offered—the greater the amount, the harder we will work to obtain it.

[Figure 5.13: The effect of amount of reinforcement on running speed in rats. The graph plots running speed (ft/sec) against trials, with separate preshift and postshift panels; the postshift curves illustrate the elation and depression effects. During the pre-shift phase, shown at the left, rats received either 1, 16, or 256 pellets of food after running down an alley; the larger the reward, the faster they ran. In the test phase, all the rats received the same reward of 16 pellets, but the effect of this reward depended on what amount they had received previously.]

During the initial, or preshift, phase (left portion of the graph), groups received either 1, 16, or 256 pellets of food on each trial. As we can see, the group of rats being reinforced with 256 pellets had the fastest speed, running as much as 4.0 feet per second. The group reinforced with 16 pellets had a more moderate running speed, and the group given the reinforcer with the lowest incentive value (1 pellet) never achieved a running speed greater than 1.0 feet per second. At the twentieth trial, all three groups began to receive the same number of pellets (16). The results of this shift are shown on the right (postshift) side of the graph. The group previously given 1 pellet ran faster than the group already accustomed to 16 pellets, resulting in an elation effect, or positive contrast. The group previously given 256 pellets ran slower than the group accustomed to 16 pellets, resulting in a depression effect, or negative contrast. (Adapted from Crespi, 1942.)
This result could be explained in two quite different ways. The first possibility is the one we have already considered, that quantity affects motivation. The group that received 256 pellets would have found this reward more attractive and therefore ran faster to obtain it. The second possibility is that quantity affects learning. According to Thorndike's Law of Effect, satisfaction stamps in an association between a response and the situation in which it is made, and the greater the satisfaction, the stronger the association. According to this interpretation, the group receiving 256 pellets ran faster because the larger reward produced a stronger association between the alley cues and the response of running.

So, does amount of reinforcement affect learning or motivation? To find out, Crespi ran a second (postshift) phase in which he gave all three groups 16 pellets when they reached the goal box. For the group switched from 256 pellets to 16 pellets, a motivational interpretation predicts that they should now be less motivated to reach the goal box and thus should run more slowly. According to the Law of Effect, on the other hand, this group should continue to run quickly because they still receive food every time they reach the goal box, and this reward should continue to strengthen the response. The results, shown on the right-hand side of the figure, supported the motivational interpretation, as reducing the amount of food produced a precipitous drop in running speed. Similarly, increasing the amount from 1 pellet to 16 pellets produced a sharp increase in running speed. These results strongly supported a motivational interpretation. In the words of Logan and Wagner,

If a rat's speed of running decreases over a series of trials after its reward has been reduced, it is unreasonable to conclude that the current trials have caused the animal to know less about the runway or about the appropriateness of running. Common sense says that the animal simply learned that he would receive a smaller reward as a consequence of the running. (Logan & Wagner, 1965, p. 43)

As this quote suggests, amount of reward did have some effect on learning, in the sense that the amount of food in the goal box was one of the things that subjects learned. The main effect of reward, however, was clearly on motivation—when the amount was increased, for example, the rats immediately began to run faster.

The fact that the amount of reinforcement affects motivation is perhaps not altogether surprising, but one aspect of Crespi's results was less easily predicted. If you look again at the right-hand side of the graph, you will see that the running speeds of the three groups differed substantially in the second phase, even though all were now receiving the same reward. In the group shifted from 1 pellet to 16 pellets, running speed not only increased to the level of those given 16 pellets throughout, but significantly exceeded it. Conversely, running speed in the group shifted from 256 pellets to 16 pellets not only fell to the level of the group trained on 16 pellets throughout but dropped significantly below it. Crespi called the overshoot in the group shifted from 1 to 16 pellets an "elation effect," implying that the rats were so excited over this improvement in their circumstances that they ran especially fast.
On similar reasoning, he labeled the undershoot in the group switched from 256 pellets to 16 pellets a "depression effect." Other learning psychologists, however, were unhappy with these terms. Aside from the problem of knowing what a rat is feeling, the terms elation and depression imply emotional effects that should disappear as subjects become accustomed to the new levels of reinforcement. In some cases, however, the effects are enduring. (See Flaherty, 1996, for a review.) Psychologists have thus come to prefer the more neutral terminology of contrast effects to describe these phenomena, emphasizing that the effect of any reinforcer depends on how it contrasts with reinforcers experienced previously. Crespi's elation effect is now called positive contrast, and the depression effect is called negative contrast.

Contrast effects suggest that the effects of reinforcement depend on subjects' expectations. If you expect 1 pellet, 16 pellets may seem marvelous; if you expect 256 pellets, 16 pellets may come as a disappointment. The importance of expectations in reinforcement might remind you of classical conditioning, where we encountered a similar phenomenon in our discussion of the Rescorla-Wagner model. There, too, the effect of a US depended on a subject's expectations, expressed in terms of V, as the same US could produce either an increase or a decrease in associative strength depending on what subjects were expecting. When an important event such as food or shock occurs, we seem to evaluate it relative to our expectations, and this comparison or contrast then determines how we react. (For a more detailed analysis of the mechanisms underlying contrast effects, including a discussion of factors not considered here, see Williams, 1997.)

One practical implication is that in choosing a reinforcer it is important to consider what reinforcers a person has experienced previously. If you own a car and a television, the promise of a bicycle as a reward might not be very exciting, but if you grew up in poverty, it might seem priceless. This might explain the age-old parental complaint, "Kids today just don't appreciate the value of money. Why, when I was a kid . . ." When standards of living improve, people become accustomed to the new levels; what was once a powerful reinforcer might now seem drab and unexciting by comparison.

The Yerkes-Dodson Law

So far, we have assumed that motivation affects performance rather than learning. In Crespi's experiment, for example, the group trained with a small reward seemed to learn just as well as the group trained with a large reward—when the small-reward group was shifted to a larger reward, they immediately ran faster. On the other hand, the response the rats had to learn to obtain food was very, very simple—just to run down an alley. Even if larger amounts did produce better learning, it might have been difficult to detect in this situation because the group receiving the smaller amount would have learned so quickly. Thus, although the behavior of the rats that were shifted from 256 pellets to 16 pellets provided clear evidence that amount affects motivation, it remained possible that amount also affects learning, and that we could observe such effects if we used more difficult tasks.
To provide a fairer test of whether motivation affects learning, Broadhurst (1957) trained rats on a visual discrimination task in a Y-shaped maze. The maze was flooded with water, and a platform located in one arm of the Y allowed the rats to escape. This is an example of negative reinforcement, in which the reinforcer is the termination of an aversive stimulus, rather than the presentation of a desirable one. (As mentioned earlier, negative reinforcement is not punishment: In negative reinforcement as in positive reinforcement, behavior is strengthened; the difference lies solely in whether this is achieved by the presentation of a stimulus or its removal.)

[Photo caption: According to the Yerkes-Dodson law, high motivation is likely to interfere with a person's ability to complete a difficult task such as a crossword puzzle, but it enhances performance on a simpler task such as a game of tic-tac-toe.]

The position of the platform in this experiment was shifted randomly over trials, but its current location was always signaled by the illumination of the arms; the brighter of the two arms always contained the platform. To assess the effects of motivation on learning, Broadhurst varied how long the rats were held underwater before being allowed to swim through the maze; the confinement period ranged from zero to eight seconds. In addition, he examined the role of problem difficulty by varying the relative brightness of the alleys in different groups. For the easiest problem, the correct alley was 300 times brighter than the incorrect one, whereas for the most difficult problem the illumination ratio was only 15 to 1.

The results for the different groups are shown in Figure 5.14, which plots in three-dimensional form the percentage of correct responses over the first 100 trials as a function of both drive level and problem difficulty. In all three problems, drive level did influence learning, but the optimal level of motivation varied with the difficulty of the problem. On the easy problem, drive seemed to enhance learning uniformly: The longer that subjects were deprived of air, the fewer errors they made while learning. On the difficult problem, on the other hand, the fastest learning occurred with deprivations of only two seconds; increases in deprivation beyond this value resulted in a substantial decrease in learning.

[Figure 5.14: Results of a visual discrimination experiment to determine the effects of motivation on learning. A three-dimensional plot of percent correct learning as a function of motivation (delay underwater, 0 to 8 seconds) and problem difficulty (easy, moderate, and difficult). The percentage of correct responses on a discrimination learning task was affected by both motivation and problem difficulty. Source: Data from Broadhurst, 1957.]

Broadhurst's results suggest that motivation does affect learning, but that the relationship is complex. With relatively simple problems, increasing motivation enhances learning, but on more difficult problems high motivation can actually be harmful. This inverse relationship between task difficulty and optimum motivation—the more difficult the problem, the lower the optimum level of motivation—has been observed in a number of other studies (for example, Bregman & McAllister, 1982; Hochauser & Fowler, 1975). The phenomenon is known as the Yerkes-Dodson law, named for the two psychologists who first discovered it.
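The inverted-U pattern implied by the Yerkes-Dodson law can be sketched with a toy function. The code below is my own illustration rather than anything estimated by Broadhurst or by Yerkes and Dodson; the quadratic form, the width parameter, and the choice of optimal arousal levels are all assumptions.

def performance(arousal, optimal_arousal, width=3.0):
    """Inverted-U: best at optimal_arousal, falling off on either side.
    The quadratic form and the numbers below are illustrative assumptions."""
    return max(0.0, 1.0 - ((arousal - optimal_arousal) / width) ** 2)

# Suppose the easy problem's optimum lies at high arousal and the difficult
# problem's optimum at low arousal, as in Broadhurst's pattern of results:
for arousal in range(0, 9, 2):                       # e.g., seconds of confinement
    easy = performance(arousal, optimal_arousal=8)
    hard = performance(arousal, optimal_arousal=2)
    print(f"arousal {arousal}: easy {easy:.2f}  difficult {hard:.2f}")

Over the range shown, performance on the easy task rises steadily with arousal, while performance on the difficult task peaks early and then falls, which is the qualitative pattern the text describes.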
A possible educational example might be the use of an attractive reward to encourage teenagers to get good grades in a mathematics course. For a student who finds math easy, the promise of an iPad for earning an A might be a powerful and effective incentive. For a student who finds math difficult, on the other hand, offering this reward might actually lead to poorer performance. The reasons that high motivation interferes with the learning of difficult tasks are not fully understood, but the most likely explanation is that motivation affects attention. According to Easterbrook (1959), attention becomes more highly focused when we are aroused; we concentrate more intensely on only a few stimuli while effectively ignoring all others. For simple problems, in which the relevant cues are obvious, focused attention is likely to facilitate learning. For problems in which the important cues are more subtle, however, a subject that focuses attention too narrowly might miss the critical cues and thus take much longer to solve the problem. The result is that high motivation helps subjects to solve simple problems but impairs performance on more difficult tasks. (For experimental support, see Telegdy & Cohen, 1971; Geen, 1985; for an alternative explanation of the effects of motivation, see Humphreys & Revelle, 1984.)

In summary, we have seen that the effectiveness of a reinforcer depends on whether subjects are motivated to obtain it, and this in turn depends on its attractiveness or incentive value (the carrot) and how long subjects have been deprived of it (the stick). In general, stronger motivation produces better performance, but we have also seen two complications—that the incentive value of a reinforcer depends on how it contrasts with previous reinforcers, and that motivation can affect learning as well as performance. As with so many other aspects of reinforcement, the concept of motivation is simple on the surface but considerably more complex when examined closely.

5.6 The Role of the Stimulus

One basic aspect of Thorndike's Law of Effect, the assumption that a response will be strengthened if it is rewarded, seems little more than common sense. Thorndike's version, however, is subtly different—it does not say that a reward will strengthen a response in a general sense, but in the particular situation where the reward was received (he wrote, "responses . . . followed by satisfaction [will be] more firmly connected with the situation"). To see the importance of this distinction, consider a child praised for cleaning her room. According to Thorndike, the effect might not be a general increase in cleaning her room, as her parents might fervently be hoping, but rather an increase only in the situation where the reward was given. For example, because her parents had been present when she was rewarded, she might learn to clean her room only when they are there, not exactly the intended outcome.

Stimulus Control

[Photo caption: Children may be well-behaved in a classroom setting but not so compliant or easy to get along with at home. This illustrates the phenomenon of stimulus control, in which the probability of a response depends on the particular stimuli that are present.]

Thorndike did not systematically test this assumption, but later research was to support it. In one classic study, Guttman and Kalish (1956) trained pigeons to peck at a circular plastic disk, or key, mounted on one wall of a Skinner box. The key was illuminated with a yellowish-orange
light of 580 nanometers (a nanometer is a measure of a light's wavelength, which determines its color), and pecks at the key were occasionally reinforced with access to a grain dish located below the key. To find out what the birds had learned during the training phase, Guttman and Kalish ran a test session in which they varied the color on the key. Sometimes it was illuminated with a green light (550 nm), sometimes with a red light (640 nm), and so on. During this session, responding was not reinforced in the presence of any of the colors. As shown in Figure 5.15, the birds responded vigorously whenever the key was illuminated with the yellowish-orange training stimulus (580 nm), but responding fell off sharply when the test wavelengths diverged from this value. Contrary to our earlier analysis, reinforcement did not result in a general tendency to peck the key but, rather, to peck the particular stimulus that had been present during reinforcement. Subsequent experiments have extended this finding, showing that even seemingly irrelevant features of the training situation (for example, the appearance of the walls, the texture of the floor) can acquire control over the reinforced response, so that subjects respond less when these stimuli are altered (see Balsam & Tomie, 1985).

[Figure 5.15: Generalization of responding to colors of different wavelengths. The graph plots the number of responses against wavelength (510 to 630 nanometers). The pigeons' pecking was reinforced during training only in the presence of the 580-nm stimulus (indicated by the arrow). Source: Adapted from Guttman & Kalish, 1956.]
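Because the figure itself cannot be reproduced here, a toy version of such a gradient may help. The bell-shaped (Gaussian) form and every number in the sketch below are my own illustrative assumptions, not Guttman and Kalish's data.

import math

def responses(wavelength_nm, peak_nm=580, peak_responses=300, spread_nm=25):
    """Toy generalization gradient: responding is strongest at the training
    wavelength and falls off as test wavelengths diverge from it."""
    return peak_responses * math.exp(-((wavelength_nm - peak_nm) ** 2) / (2 * spread_nm ** 2))

for wl in range(510, 640, 10):
    print(wl, "nm:", round(responses(wl)))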
As you can see by looking at Figure 5.15, the 580-nm training stimulus received the largest number of responses (300), with a gradual decrease in responding the more a test stimulus diverged from that particular wavelength. For example, a stimulus light of 590 nanometers didn't produce as much responding as the training stimulus, but it had considerably more effect than, say, a stimulus of only 530 nanometers, which produced practically no responding at all. In other words, Guttman and Kalish's experiment demonstrates how the response to the training stimulus spread to similar stimuli, a phenomenon known as generalization. As the training and test stimuli became less similar, responding declined, and this progressive decline in response is called a generalization gradient (a gradient is an incline or slope). This gradient illustrates the phenomenon of stimulus control, in which the probability of a response varies depending on what stimuli are present. In this case, the color of the key acquired control over the birds' pecking, so that changes in this color affected their responding.

Similarly, human behavior often comes under the control of stimuli that are present when we are reinforced, sometimes without our realizing it. A businessman, for example, may give generously to charity when in church while behaving ruthlessly at work, and most of us behave quite differently in the presence of a superior—a parent, a teacher, or an employer—than when we are with friends. We are not quite as consistent as the concept of personality might imply, as our behavior can vary substantially depending on the situation.

Attention

Thorndike was thus right: When a response is reinforced, it will become associated with the stimuli present at the time. But which stimuli? Will all the stimuli present acquire control or only some? And if only some, which? The first question—whether all stimuli will acquire control—proved surprisingly difficult to answer, but when an answer did emerge, it was simple. We are constantly bombarded by stimuli—many thousands of lights, sounds, and odors every second—and we can attend to only a fraction of them. The inevitable consequence is that only some of the stimuli present when a response is reinforced will come to control it.

The first really clear demonstration of this came in an experiment using two pigeons as subjects (Reynolds, 1961). The pigeons were trained on a successive discrimination in which two stimuli were presented alternately for three minutes at a time. When the key was illuminated with the outline of a white triangle against a red background (S+), pecking was occasionally reinforced; when the stimulus was the outline of a white circle against a green background (S−), no reinforcement was given. (See Figure 5.16a.) Both birds quickly learned to peck S+, but not S−. To find out exactly what the birds had learned about each stimulus, Reynolds now presented the elements of each compound separately, illuminating the key with either the circle, the triangle, red, or green. Figure 5.16b shows the results for bird number 1. If all stimuli present during reinforcement acquire control over responding, the red and triangle components should have elicited roughly equal responding, but this was not the case: The first bird responded vigorously when the key was red, but ignored the triangle. The second bird, on the other hand, responded at a high rate when the triangle was present, but virtually not at all when the key was red (see Figure 5.16c). Each bird, in other words, learned about only one of the two stimuli present.

[Figure 5.16: Selective attention. Panel (a) shows the S+ (white triangle on a red background) and S− (white circle on a green background) training stimuli; panels (b) and (c) show responses per minute to the separate test elements for bird #1 and bird #2. In Reynolds' experiment, two pigeons received food when they pecked a triangle on a red background, but not when they pecked a circle on a green background. When each element was presented separately in the test phase, the results showed that one bird had learned only about the color red, the other only about the triangle. Source: Adapted from Reynolds, 1961.]

These results illustrate the empirical phenomenon of selective attention, in which only a subset of the stimuli present comes to control responding (see also Langley & Riley, 1993). We can thus reformulate the principle of reinforcement as follows: When a response is reinforced, some subset of the stimuli present is likely to acquire control over it, so that the response will become more likely when these stimuli, or others similar to them, are present.

Practical Applications

In some applications involving reward, the goal is to have a behavior occur as widely as possible, regardless of the situation. If you were a parent trying to train a child to be honest, you would probably want this behavior to occur very widely.
In other situations, however, your goal might be to have a behavior occur only in specific settings. A child, for example, needs to learn to cross a street only when the light is green, not red. In the following sections we will look at what can be done to achieve each of these goals.

Encouraging Discrimination

In cases where we want a behavior to occur only in particular settings, one useful technique is to provide discrimination training. In this procedure, training is provided not only in the situation where we want the behavior to occur (S+), but also in situations where we do not want it to occur (S−). Presentations of the situations are alternated, and behavior is reinforced only in the positive situation:

S+: R → SR
S−: R → (no reinforcement)

This is the procedure Reynolds used to train his birds to peck the red triangle but not the green circle, and the outcome he obtained—differential responding to the two stimuli—is called a discrimination.

A study by Redd and Birnbrauer (1969) illustrates how discrimination training can determine the outcome when behavior is reinforced. The participants were mentally disabled 12- to 15-year-old boys, and the purpose of the study was to examine how the reinforcement contingencies established by adults can shape children's behavior. When one of the experimenters was present, the boys were reinforced for playing cooperatively (a typical reward was candy and praise); when the other experimenter was present, the boys were reinforced equally often, but the rewards were delivered at random intervals, regardless of how the boys were behaving. When the boys were later tested, they were far more likely to engage in cooperative play when the first experimenter was present. The experimenters then reversed roles, with the second experimenter being the one who reinforced cooperative play, and the boys altered their behavior accordingly. As a result of discrimination training, cooperative play occurred only in the situation where it was reinforced.

[Photo caption: One of the myths surrounding George Washington's childhood is that he cut down his father's cherry tree and admitted to the misdeed by famously saying, "I cannot tell a lie." Honesty is often thought to be a personality trait that remains relatively consistent across situations, but research shows that people can be impressively honest in some situations and yet deeply dishonest in others.]

Although the experimenter in this study deliberately arranged discrimination training, we often encounter similar contingencies in real life. If a child's cooperative behavior is praised by a parent but ignored by a teacher, it would not be surprising if the child learned to behave differently at home and in school. Indeed, considerable evidence indicates that our behavior is not as consistent as we think. In one early study by Hartshorne and May (1928), children were given opportunities to behave dishonestly at home, at school, and at play. We tend to think of honesty as a personality trait, and thus expect children who are honest in one situat...
Explanation & Answer

Humanism vs. Behaviorism

The humanistic learning theory is a philosophy which holds that learning is a personal act aimed at fulfilling one's potential. The theory focuses on human potential, dignity, and freedom, and it assumes that people act with intentionality and according to values that push them to achieve their objectives. Learning is accomplished by discovering knowledge and constructing meaning. Humanism looks at the individual as a whole instead of concentrating on a single factor (Friedman and Miriam, 2016).

The behaviorism theory, on the other hand, states that the learning process involves acquiring new behaviors based on the surrounding environment. The theory concentrates on observable behaviors and disregards any independent activities of the mind. Unlike humanism, the behaviorist theory operates on the principle of "stimulus-response." It states that all behavior is caused by external stimuli and can be explained without considering the internal mental state of an individual. Behaviorism pays attention to factors external to the individual, thus ignoring the mental processes which cannot be seen (Friedman and Miriam, 2016).

Difference between the two theories

The two theories view the learning process differently: behaviorism states that learn...

