How Does Operant Conditioning Occur?

To show how positive reinforcement works, Skinner placed a hungry rat in the operant conditioning chamber. On one side of the chamber was a lever that dropped food pellets into the chamber when pressed. As the rat moved around the box, at some point it would accidentally press the lever, and a pellet of food would immediately drop into the chamber. Over time, the rat learned that pressing the lever led to food being released, and it quickly learned to go directly to the lever whenever it was placed in the chamber.

Receiving food every time it pressed the lever acted as positive reinforcement, ensuring that the rat would keep pressing the lever again and again.

Negative reinforcement: This refers to consequences where an unfavorable event or outcome is removed following a certain behavior. In this case, the behavior is strengthened not by the desire to get something good, but rather by the desire to escape an unpleasant condition. A good example of negative reinforcement is a teacher promising to exempt students who have perfect attendance from the final test.

This encourages them to attend all classes. Such responses are referred to as negative reinforcement because the removal of the unfavorable event or outcome is rewarding to the individual. While the students have not actually received anything, not having to sit a test can still be seen as a reward. To show how negative reinforcement works, Skinner placed a rat in the operant conditioning chamber and then delivered an unpleasant electric current through the floor of the chamber. As the rat moved about in discomfort, it would accidentally knock the lever, switching off the electric current immediately.

Over time, the rat learns that it can escape the unpleasant electric current by pressing the lever, and it starts going directly to the lever every time the current is switched on.

Punishment refers to any adverse or unwanted environmental consequence of an action that reduces the probability of the action being repeated.

In other words, punishment weakens behavior. There are two types of punishment.

Positive punishment: This refers to consequences where an unfavorable or unpleasant event or outcome is presented or applied following a certain behavior in order to discourage the behavior. For instance, getting fined for a traffic infraction is an example of positive punishment: an unfavorable outcome (payment of the fine) is applied to discourage you from committing the infraction again.

Negative punishment: This refers to consequences where a favorable or pleasant outcome is removed following a certain behavior. This can also be referred to as punishment by removal. An example of negative punishment is a parent denying a child the opportunity to watch television following the child's misbehavior.

Sometimes it can be challenging to distinguish between punishment and negative reinforcement. What you need to remember is that reinforcement (both positive and negative) is meant to strengthen behavior, while punishment is used to weaken behavior.
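The two distinctions above are independent of each other, which makes the four terms easy to derive mechanically. Here is a minimal sketch in Python (the function name and its boolean parameters are our own illustrative choices, not standard terminology):

```python
def classify_consequence(stimulus_added: bool, behavior_increases: bool) -> str:
    """Name the operant-conditioning consequence.

    'Positive' means a stimulus is added; 'negative' means one is removed.
    'Reinforcement' means the behavior increases; 'punishment' means it decreases.
    """
    sign = "positive" if stimulus_added else "negative"
    kind = "reinforcement" if behavior_increases else "punishment"
    return f"{sign} {kind}"

# Food pellet added, lever pressing increases:
print(classify_consequence(True, True))    # positive reinforcement
# Electric current removed, lever pressing increases:
print(classify_consequence(False, True))   # negative reinforcement
# Traffic fine imposed, speeding decreases:
print(classify_consequence(True, False))   # positive punishment
# Television privileges removed, misbehavior decreases:
print(classify_consequence(False, False))  # negative punishment
```

Note that only the second question (does the behavior increase or decrease?) separates negative reinforcement from punishment, which is exactly where confusion usually arises.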

It is also worth noting that reinforcement is more effective at bringing about behavior change than punishment, for a number of reasons.

Apart from reinforcement and punishment, behaviorists discovered that operant conditioning is also influenced by reinforcement schedules. A reinforcement schedule refers to the rules that determine when and how often reinforcements are delivered.

Reinforcement schedules have an impact on how quickly behaviors are learned and the strength of the acquired behavior. There are several different delivery schedules that can be used to influence the operant conditioning process.

Continuous reinforcement: This is a schedule where a reinforcement is delivered immediately every time a response occurs. For instance, a food pellet is dropped immediately every time the lever is pressed. With continuous reinforcement, new behaviors are learned relatively quickly. However, the response rate (the rate at which the rat presses the lever) is quite low, and the learned behavior is forgotten very quickly once reinforcement stops.

Fixed ratio reinforcement: This is a schedule where the reinforcement is delivered only after a behavior or response has occurred a specified number of times. For instance, a pellet of food is released every fifth time the rat presses the lever. With fixed ratio schedules, the response rate as well as the extinction rate (the rate at which the learned behavior is forgotten) is medium.

Fixed interval reinforcement: This refers to a schedule where reinforcement is delivered after a specified interval of time, provided the correct response has been made at least once.

The response rate is medium, though responses tend to increase as the end of the interval approaches and slow down following the delivery of the reinforcement.

Variable ratio reinforcement: This refers to a reinforcement schedule where reinforcement is delivered after an unpredictable number of responses.

A good example of variable ratio reinforcement is gambling. Variable ratio reinforcement results in a very high response rate and a very slow extinction rate, which explains why gambling becomes addictive.

Variable interval reinforcement: This refers to a reinforcement schedule where reinforcement is delivered after an unpredictable interval of time has elapsed, provided the correct response has been made at least once.

Variable interval reinforcement also results in a very high response rate and a very slow extinction rate.

Apart from reinforcement schedules, a few other factors influence the effectiveness of reinforcement and punishment.

Satiation and deprivation: If the individual has received enough of the reward to satiate his or her craving, the individual will be less inclined to display the desired behavior.
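To make the ratio schedules described above concrete, here is a small simulation sketch in Python (the class names, the ratio of five, and the particular random draw are our own illustrative assumptions, not part of Skinner's apparatus):

```python
import random

class FixedRatioDispenser:
    """Delivers a pellet after every `ratio`-th lever press."""
    def __init__(self, ratio):
        self.ratio = ratio
        self.count = 0

    def press(self):
        self.count += 1
        if self.count == self.ratio:
            self.count = 0
            return True    # pellet delivered
        return False

class VariableRatioDispenser:
    """Delivers a pellet after an unpredictable number of presses,
    averaging `mean_ratio` presses per pellet (like a slot machine)."""
    def __init__(self, mean_ratio, rng=None):
        self.rng = rng or random.Random()
        self.mean_ratio = mean_ratio
        self._draw_next_requirement()

    def _draw_next_requirement(self):
        # Uniform between 1 and 2*mean-1, so the average requirement is `mean_ratio`.
        self.required = self.rng.randint(1, 2 * self.mean_ratio - 1)
        self.count = 0

    def press(self):
        self.count += 1
        if self.count >= self.required:
            self._draw_next_requirement()
            return True    # pellet delivered
        return False

# A fixed-ratio-5 schedule rewards exactly every fifth press:
fr = FixedRatioDispenser(5)
rewards = [fr.press() for _ in range(10)]
print(rewards)  # [False, False, False, False, True, False, False, False, False, True]
```

Over many presses the variable-ratio dispenser also pays out roughly once per five presses on average, but the subject can never predict which press will pay off; that unpredictability is the property associated with the high response rate and slow extinction described above.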

When the individual has been deprived of the reward, on the other hand, the effectiveness of the reinforcement increases due to the increased craving for the reward. This explains why Skinner used hungry rats in his experiments.

Immediacy: Learning occurs faster when the consequence (reinforcement or punishment) is delivered immediately after an action or behavior. The more the consequence is delayed, the less effective it becomes.

Consistency: Reinforcements that are consistently delivered following every correct response lead to faster learning.

Intermittent delivery of reinforcements leads to slower learning, but the learned behavior is then harder to extinguish than when reinforcements are consistently delivered after each correct response.

Size: The amount of reinforcement or punishment also affects the effectiveness of the consequence. When the reward is too small, it might not seem worthwhile to put in a lot of effort to display the desired behavior.

Similarly, when the punishment is too small, the benefits of engaging in the unwanted behavior might outweigh the discomfort of experiencing the punishment.

Operant conditioning can be applied in the workplace in various ways, from establishing corporate culture and managing interactions between employees to helping an organization achieve its annual targets.

Positive reinforcement, one of the key components of operant conditioning, can be used to increase productivity in the workplace.

Skinner: Skinner was responsible for defining the segment of behaviorism known as operant conditioning, a process by which an organism learns from its physical environment. He also believed that a learned association could end, or become extinct, if the reinforcement or punishment was removed.

In his first work with rats, Skinner would place the rats in a Skinner box with a lever attached to a feeding tube. Whenever a rat pressed the lever, food would be released.

After multiple trials, the rats learned the association between the lever and food and began to spend more of their time in the box procuring food than performing any other action. It was through this early work that Skinner started to understand the effects of behavioral contingencies on actions.

He discovered that the rate of response—as well as changes in response features—depended on what occurred after the behavior was performed, not before. Skinner named these actions operant behaviors because they operated on the environment to produce an outcome.

The process by which one could arrange the contingencies of reinforcement responsible for producing a certain behavior then came to be called operant conditioning. In a similar experiment with pigeons, he discerned that a pigeon could form a spurious causal association between its own actions and the presentation of a reward. In his operant conditioning experiments, Skinner often used an approach called shaping.

Instead of rewarding only the target, or desired, behavior, the process of shaping involves the reinforcement of successive approximations of the target behavior. Behavioral approximations are behaviors that, over time, grow increasingly closer to the actual desired response.

Skinner believed that all behavior is predetermined by past and present events in the objective world. He did not include room in his research for ideas such as free will or individual choice; instead, he posited that all behavior could be explained using learned, physical aspects of the world, including life history and evolution. His work remains extremely influential in the fields of psychology, behaviorism, and education.

Shaping is a method of operant conditioning by which successive approximations of a target behavior are reinforced.

The method requires that the subject perform behaviors that at first merely resemble the target behavior; through reinforcement, these behaviors are gradually changed, or shaped, to encourage the performance of the target behavior itself. Shaping is useful because it is often unlikely that an organism will display anything but the simplest of behaviors spontaneously. It is a very useful tool for training animals, such as dogs, to perform difficult tasks.

Dog show: Dog training often uses the shaping method of operant conditioning.

In shaping, behaviors are broken down into many small, achievable steps. To test this method, B. F. Skinner performed shaping experiments on rats, which he placed in an apparatus known as a Skinner box that monitored their behaviors. The target behavior for the rat was to press a lever that would release food. Initially, rewards are given for even crude approximations of the target behavior; in other words, even taking a step in the right direction is rewarded.

Then, the trainer rewards a behavior that is one step closer, or one successive approximation nearer, to the target behavior. For example, Skinner would reward the rat for taking a step toward the lever, for standing on its hind legs, and for touching the lever—all of which were successive approximations toward the target behavior of pressing the lever.

As the subject moves through each behavior trial, rewards for old, less approximate behaviors are discontinued in order to encourage progress toward the desired behavior.

For example, once the rat had touched the lever, Skinner might stop rewarding it for simply taking a step toward the lever. In this way, shaping uses operant-conditioning principles to train a subject by rewarding proper behavior and discouraging improper behavior. This process has been replicated with other animals—including humans—and is now common practice in many training and teaching methods. It is commonly used to train dogs to follow verbal commands or become house-broken: while puppies can rarely perform the target behavior automatically, they can be shaped toward this behavior by successively rewarding behaviors that come close.

Shaping is also a useful technique in human learning. For example, if a father wants his daughter to learn to clean her room, he can use shaping to help her master steps toward the goal. First, she cleans up one toy and is rewarded. Second, she cleans up five toys; third, she chooses whether to pick up ten toys or put her books and clothes away; fourth, she cleans up everything except two toys. Through a series of rewards, she finally learns to clean her entire room.
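As a loose numerical analogy (not Skinner's actual procedure, just an illustration of the idea), shaping can be pictured as reinforcing any spontaneous variation that comes closer to the target than anything reinforced before:

```python
import random

def shape_toward(target, start, steps=2000, rng=None):
    """Illustrative 'shaping' loop: a one-dimensional behavior value drifts
    toward a target because only closer-than-ever approximations are
    reinforced (kept); unreinforced variations are simply not repeated."""
    rng = rng or random.Random(42)
    behavior = start
    best_distance = abs(target - start)
    for _ in range(steps):
        attempt = behavior + rng.uniform(-1.0, 1.0)  # spontaneous variation
        if abs(target - attempt) < best_distance:    # a closer approximation?
            behavior = attempt                       # reinforce it: it sticks
            best_distance = abs(target - attempt)
    return behavior

# Starting far from the target, rewarding ever-closer approximations
# walks the behavior to the target:
final = shape_toward(target=10.0, start=0.0)
```

The key design point mirrors the text: the criterion for reward tightens automatically as performance improves, so early, crude approximations are rewarded at first and later cease to earn anything.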

Reinforcement and punishment are principles of operant conditioning that increase or decrease the likelihood of a behavior. Reinforcement means you are increasing a behavior: it is any consequence or outcome that increases the likelihood of a particular behavioral response and that therefore reinforces the behavior. The strengthening effect on the behavior can manifest in multiple ways, including higher frequency, longer duration, greater magnitude, and shorter latency of response.

Punishment means you are decreasing a behavior: it is any consequence or outcome that decreases the likelihood of a behavioral response. Extinction , in operant conditioning, refers to when a reinforced behavior is extinguished entirely. This occurs at some point after reinforcement stops; the speed at which this happens depends on the reinforcement schedule, which is discussed in more detail in another section.

Both reinforcement and punishment can be positive or negative. In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something and negative means you are taking something away. All of these methods can manipulate the behavior of a subject, but each works in a unique fashion.

Similarly, reinforcement always means you are increasing or maintaining the level of a behavior, and punishment always means you are decreasing the level of a behavior. The stimulus used to reinforce a certain behavior can be either primary or secondary.

Skinner described how a pigeon can be shaped to peck a particular spot on the wall of its box: reinforcement is first given for any behavior oriented toward the spot. We then withhold reinforcement until a slight movement is made toward the spot.

This again alters the general distribution of behavior without producing a new unit. We continue by reinforcing positions successively closer to the spot, then by reinforcing only when the head is moved slightly forward, and finally only when the beak actually makes contact with the spot.

The original probability of the response in its final form is very low; in some cases it may even be zero. In this way we can build complicated operants which would never appear in the repertoire of the organism otherwise. By reinforcing a series of successive approximations, we bring a rare response to a very high probability in a short time. The total act of turning toward the spot from any point in the box, walking toward it, raising the head, and striking the spot may seem to be a functionally coherent unit of behavior; but it is constructed by a continual process of differential reinforcement from undifferentiated behavior, just as the sculptor shapes his figure from a lump of clay.

The clicker training featured in the chicken and goat videos, and used by many for training dogs, combines classical and operant conditioning. Classical conditioning is used to make the clicking sound into a conditioned stimulus, which is then used for positive reinforcement in operant conditioning. Several real-world examples of operant conditioning have already been mentioned: rewarding a child for good behavior or punishing a child for bad behavior, slot machines, and pop quizzes.

In zoos and other animal facilities, keepers use operant conditioning in order to train animals to move between different parts of their enclosures, to present body parts for inspection, or to ensure that veterinary examinations are conducted safely.

Operant conditioning can also explain why some zoo animals display stereotypies or repetitive behaviors. To understand how this works, let's return to Skinner's pigeons. In one experiment, Skinner placed the birds into their boxes, and set the food reward to be delivered at a systematic interval regardless of the birds' behaviors. The pigeons went on to develop what Skinner referred to as "superstitious behaviors," as the result of accidental juxtapositions between their overt behaviors and the presentation of the food reward.

One pigeon turned counter-clockwise in the cage just before a reward was presented, which led the pigeon to learn an association between the counter-clockwise turn and food. The pigeon spent its time turning 'round and 'round waiting for the reward.

Another thrust its head into one corner of the cage to elicit the food. Two birds swayed their heads from left to right, and another bird had been conditioned to peck towards, almost but not quite touching, the floor.

Stereotypical behaviors in captive animals can result from a number of sources, but accidental operant conditioning might explain a large proportion of them. Indeed, the most common form of stereotypical behavior in zoo animals is pacing (when combined with stereotypic swimming patterns), followed by various forms of swaying or head bobbing. Luckily, principles of operant conditioning can also be used to remedy these sorts of problems.


The views expressed are those of the author(s) and are not necessarily those of Scientific American. Jason G. Goldman is a science journalist based in Los Angeles. He has written about animal behavior, wildlife biology, conservation, and ecology for Scientific American, Los Angeles magazine, the Washington Post, the Guardian, the BBC, Conservation magazine, and elsewhere.

He enjoys sharing his wildlife knowledge on television and on the radio, and often speaks to the public about wildlife and science communication.


