Question About Reinforcement...


So, I'm re-reading some Steven Lindsay. Here's a quote from volume 1:

"During the training process, dogs definitely form certain predictions and expectations about outcomes associated with their behaviour. Extrapolating from the foregoing analysis of classical conditioning to instrumental learning, if a dog receives a reward that is significantly smaller than expected, the outcome is perceived as punitive (disappointment), resulting in the trial rendering the response weaker. If, on the other hand, the reward exactly matches the dog's expectations, then the instrumental response that resulted in reward is neither rendered stronger nor weaker than it was before reinforcement. A reinforcer that does not result in additional learning (acquisition or extinction) might aptly be termed a verifier, serving to confirm the status quo but not resulting in any new learning. This general theory suggests that a third instrumental outcome exists in learning besides rewards and punishers (i.e., verifying events that function to maintain behaviour at the same level of probability). For new instrumental learning to take place, the reward must exceed a dog's expectations..."

Now, so far as I can figure it, this is saying that to simply maintain behaviours, the reward must always be at least as good as the dog expects. Giving rewards that are less good than the dog expects actually promotes extinction. If the dog thinks it will get roast meat for sitting, and you give kibble, you've just weakened (punished) the behaviour, not reinforced it. And to strengthen the behaviour, the reward must always be better than expected: more frequent, bigger, or qualitatively better.
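The expectation idea in the quote maps neatly onto a prediction-error rule. The sketch below assumes a Rescorla-Wagner-style update (my choice of model; the quote doesn't name one) and treats reward value as a single number on an arbitrary, made-up scale, which is a big simplification. The sign of the prediction error gives Lindsay's three outcomes: positive is a reward, negative is a punisher ("disappointment"), and zero is a "verifier".

```python
import math


def update_expectation(expected: float, received: float,
                       learning_rate: float = 0.2) -> tuple[float, float]:
    """Prediction-error update in the style of Rescorla-Wagner (assumed model).

    Returns (prediction_error, new_expectation). Reward values are on an
    arbitrary, hypothetical scale (e.g. kibble = 3, roast meat = 10).
    """
    error = received - expected
    return error, expected + learning_rate * error


def classify_outcome(expected: float, received: float,
                     tolerance: float = 1e-9) -> str:
    """Label an outcome using the three categories described in the quote."""
    error = received - expected
    if math.isclose(error, 0.0, abs_tol=tolerance):
        return "verifier"  # matches expectation: no new learning
    return "reward" if error > 0 else "punisher"  # better / worse than expected


if __name__ == "__main__":
    # Dog expects roast meat (10) but gets kibble (3): negative error.
    print(classify_outcome(10.0, 3.0))  # punisher
    # Repeating the same reward drives the error towards zero over trials.
    expectation = 0.0
    for trial in range(1, 11):
        error, expectation = update_expectation(expectation, 10.0)
        print(f"trial {trial}: error={error:.2f}, expectation={expectation:.2f}")
```

With a constant reward the error shrinks by a factor of (1 - learning_rate) each trial, which is one way to read the claim that an exactly-expected reward produces no further learning.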

Now, can someone enlighten me as to how this works in actual training? Although this makes sense in some ways, in other ways it seems counterintuitive to me.

I think I actually do the opposite - I fade rewards as my dog gets more fluent at behaviours. I think this is the goal of many trainers. With us, the really good rewards are saved for hard behaviours (such as brand new behaviours, scary behaviours, or behaviours around high distraction environments). When she's fluent in one behaviour in a particular environment, we mostly drop the roast meat or sausage & she just gets praise or petting or kibble for doing it. I'm no world class dog trainer but this seems to work for us? These behaviours do not appear to go extinct, despite the reward getting progressively crappier.

What am I missing?

IMO it's about what the dog expects more than anything. If you reward with a lot of different things, sometimes a jackpot, sometimes something little, sometimes nothing at all, then what is the dog to expect? You've pretty much short-circuited their reward expectations and introduced a whole lot of "maybe", which does some cool things to dopamine reward systems if you keep dropping in great rewards.

On the other hand, take clicker training, for example. I pretty much always deliver the same reward, yet the behaviour only gets stronger. What's going on here? Lindsay briefly touched on this when he was in Sydney, pointing out that the clicker serves as a surprise every time purely because of the sharp noise it makes. So they get surprise, then treat, which is practically the same thing as "OMG, that was better than I expected!"

And in yet another confusing scenario, I reward recalls with the best treats I can get my hands on. My dogs come galloping when they are recalled because they anticipate something awesome. This is IMO different to strengthening, weakening or maintaining behaviour. It's creating anticipation for a highly valued reward. Again, kicking that dopamine reward system into gear.

I'm thinking out loud here (half asleep too): by reinforcing a behaviour you have built up a history, so you've strengthened that behaviour through reinforcement (does that make sense?). Over time, with reinforcement, you've developed both a muscle memory, i.e. sit means put bottom on ground, and a reward history, which is where the dopamine hit Corvus mentions comes in. From memory, didn't Lindsay mention that surprise is the biggest reinforcer, such as using a reward that is unexpected? (The dopamine hit transfers from the reward to the click over time, so the click becomes a reward in itself.) It might be treats or it might be tug, and it's the dopamine rush associated with that reward that reinforces the behaviour. If I use kibble as a reward for a sit I will get a reasonable sit, but if I throw a ball for a sit that I think is "just right", will my dog remember that and repeat that performance in the hope that she will get the ball again? Yes! (Hmm, sleep beckons, I'm not sure I'm making sense!)

I think I actually do the opposite - I fade rewards as my dog gets more fluent at behaviours. I think this is the goal of many trainers. With us, the really good rewards are saved for hard behaviours (such as brand new behaviours, scary behaviours, or behaviours around high distraction environments)

Bob Bailey (and myself and others) have long promoted the idea of not using jackpots. Rewards themselves aren't faded, but the schedule of reinforcement may be manipulated. However, I don't recommend the sort of standardised reward systems used in laboratories, for practical reasons: apart from making the whole process a bit tedious in real life, they don't work when the job at hand might be significantly more reinforcing than the reinforcers you can more easily manipulate (such as food). For this I have been experimenting with what I call a "scaffolding" approach, where basic responses are built with food (or some other convenient reinforcer), then the big rewards are introduced when the dog is fluent at all the things we need them to do around the big reward. It's not really different to what other trainers do, but I'm giving it a fancy name and making some of the concepts a bit clearer to make it a more efficient, better-defined process for dog trainers.
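To make the difference between fading the reward and thinning the schedule concrete, here is a minimal sketch of one common schedule, variable ratio (VR). The reward itself never changes; only the number of correct responses needed to earn it does. The class name and the way each requirement is drawn (uniformly from 1 to 2*mean_ratio - 1, so it averages `mean_ratio`) are my own illustrative assumptions, not Bob Bailey's or anyone else's protocol.

```python
import random
from typing import Optional


class VariableRatioSchedule:
    """Hypothetical helper: reinforce after a random number of correct responses.

    Each response requirement is drawn uniformly from 1..(2*mean_ratio - 1),
    so the average requirement is mean_ratio. mean_ratio=1 is continuous
    reinforcement (every response rewarded, i.e. FR1).
    """

    def __init__(self, mean_ratio: int, rng: Optional[random.Random] = None):
        if mean_ratio < 1:
            raise ValueError("mean_ratio must be at least 1")
        self.mean_ratio = mean_ratio
        self.rng = rng if rng is not None else random.Random()
        self.count = 0
        self.requirement = self._draw_requirement()

    def _draw_requirement(self) -> int:
        return self.rng.randint(1, 2 * self.mean_ratio - 1)

    def respond(self) -> bool:
        """Record one correct response; return True if it earns the (same) reward."""
        self.count += 1
        if self.count >= self.requirement:
            self.count = 0
            self.requirement = self._draw_requirement()
            return True
        return False


if __name__ == "__main__":
    # Thin the schedule across sessions while keeping the reward identical.
    rng = random.Random(42)
    for mean_ratio in (1, 2, 4):
        schedule = VariableRatioSchedule(mean_ratio, rng)
        rewarded = sum(schedule.respond() for _ in range(100))
        print(f"VR{mean_ratio}: {rewarded} of 100 responses rewarded")
```

Compared with fading the reward as described in the first post, the variable being changed here is the ratio, not the value of what the dog gets.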

We mustn't forget "matching law" in all of this, which you might find in Lindsay under "concurrent schedules" or "generalised matching law". Matching law quite reliably predicts (at least in animals) what happens when the dog has choices about what he can do, what happens when you have lots of little reinforcers vs one big reinforcer, what happens when a reinforcer for one response is delivered immediately vs a reinforcer for a different response being delivered after a delay and that sort of thing.
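For anyone who hasn't met it, the strict matching law (Herrnstein) says the share of responses given to one option matches the share of reinforcement obtained from it: B1/(B1 + B2) = R1/(R1 + R2). The generalised version adds a sensitivity exponent a and a bias b: B1/B2 = b(R1/R2)^a, i.e. log(B1/B2) = a log(R1/R2) + log b. A minimal sketch of both predictions follows; the reinforcement rates in the example are made up, and the sketch only handles reinforcement rate (extensions that fold in magnitude and delay exist but aren't shown).

```python
def strict_matching(r1: float, r2: float) -> float:
    """Predicted proportion of responses to option 1: B1/(B1+B2) = R1/(R1+R2)."""
    if r1 < 0 or r2 < 0 or r1 + r2 == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return r1 / (r1 + r2)


def generalised_matching_ratio(r1: float, r2: float,
                               sensitivity: float = 1.0,
                               bias: float = 1.0) -> float:
    """Predicted response ratio B1/B2 = bias * (R1/R2) ** sensitivity.

    Equivalent to log(B1/B2) = sensitivity*log(R1/R2) + log(bias).
    sensitivity < 1 is 'undermatching'; bias != 1 is a preference for one
    option that isn't explained by reinforcement rate.
    """
    if r1 <= 0 or r2 <= 0 or bias <= 0:
        raise ValueError("rates and bias must be positive")
    return bias * (r1 / r2) ** sensitivity


if __name__ == "__main__":
    # Hypothetical: option 1 pays off 30 times an hour, option 2 only 10.
    print(strict_matching(30, 10))             # 0.75
    print(generalised_matching_ratio(30, 10))  # 3.0
    print(generalised_matching_ratio(30, 10, sensitivity=0.8))  # undermatching
```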

I'm not sure that I completely follow Lindsay's logic behind a third class of reward. The definition of reinforcement has always been something that "increases or maintains a response" to the best of my recollection. And that is all the proof you need of reinforcement - was the response maintained or increased?

In this day and age of being able to look very closely inside the brain, we're discovering all sorts of things about reinforcement and punishment. For example, negative reinforcement lights up the same parts of the brain as positive reinforcement (at least once the aversive stimulus ceases). Learning is more relative than absolute. Each time we click the clicker there isn't a column inside the brain that gets a token added to it; rather, something is increased relative to something else, from what I can gather.

I think I actually do the opposite - I fade rewards as my dog gets more fluent at behaviours. I think this is the goal of many trainers. With us, the really good rewards are saved for hard behaviours (such as brand new behaviours, scary behaviours, or behaviours around high distraction environments). When she's fluent in one behaviour in a particular environment, we mostly drop the roast meat or sausage & she just gets praise or petting or kibble for doing it. I'm no world class dog trainer but this seems to work for us? These behaviours do not appear to go extinct, despite the reward getting progressively crappier.

What am I missing?

I think depending on the dog, it is possible to substitute an external reward with the self rewarding nature of the activity with no visible extinction of behaviour. My dogs don't get anywhere near the reward in a trial that they do in training...and yet they continually give me noticeably MORE in a trial.

...and yet they continually give me noticeably MORE in a trial.

Extinction?

huh? What I am trying to say is that once a behaviour becomes self-rewarding, it is possible to reward less with no detrimental effect.

Edited by Vickie

...and yet they continually give me noticeably MORE in a trial.

Extinction?

huh? What I am trying to say is that once a behaviour becomes self-rewarding, it is possible to reward less with no detrimental effect.

Responses typically increase temporarily during unreinforced trials. It is a well documented extinction effect. Some of these behaviours are reinforcing in themselves, but the rate of reinforcement during a trial would be typically much lower than during training so you would expect responding to increase temporarily as an extinction effect, particularly if your dog has an expectation of some other reward.

It doesn't mean the response will be extinguished, but it is still an effect of extinction; a temporarily increased rate of responding.

Aidan, I don't think extinction would be the case at all.

It doesn't take long for a dog to cotton on to the differences between a trial and training such as different location, time of day, duration of outing, level of distractions, people, dogs etc (as I mentioned in the potential of dogs thread). Some people (including myself) have noticed behaviours that are different at training and at a trial - the dogs know the difference. Extinction might occur if the dog thought it would get a certain reward and didn't, but once they have been to a few trials they KNOW what the reward is going to be like at a trial and that it is going to be different to training.

So are you saying that you wouldn't get a similarly increased rate of responding in a training situation with no reinforcers for a single run?

By what other mechanism would you attribute this increased rate of responding to a different environment? The discriminative stimuli are the same, the consequences are different. The shoe fits...

That's right - at training, if you rewarded the same as at a trial, you would not get the same response as you get in a trial :vomit:

I think that, like people, they do respond to the different environment. And to the handler, and I know I act differently at a trial than at training. More nervous, more excited. I'm sure my dog picks up on that. And once they have done a few, they look forward to their runs.

I think that, like people, they do respond to the different environment. And to the handler, and I know I act differently at a trial than at training. More nervous, more excited. I'm sure my dog picks up on that. And once they have done a few, they look forward to their runs.

OK, I'm getting all that. What I'd like to determine is by what mechanism would this lead to an increased rate of responding (if not extinction)? Why does the response differ in a different environment? If it is an identifiable phenomenon then it isn't simply a chance event, so there must be a reason, some mechanism of behaviour that causes it.

Are we on the same page with extinction - that unreinforced trials often lead to a temporary increase in the rate of responding?

From what I know of extinction I can't see how it could account for performance in competitions. Unless you are seeing extinction differently to how I do. If you compete often, the dog forms a correlation between trials and certain things, and if extinction was at play you would get a decrease in performance, as the dog recognises it as a trial.

I know I perform differently when I compete to when I train (in several disciplines) because of the different conditions, nervousness, excitement etc. Why not the same for dogs?

Edited by Kavik

From what I know of extinction I can't see how it could account for performance in competitions. Unless you are seeing extinction differently to how I do. If you compete often, the dog forms a correlation between trials and certain things, and if extinction was at play you would get a decrease in performance, as the dog recognises it as a trial.

What normally happens is that you get an "extinction curve". Particularly if a response has been reinforced quite a lot, you get an initial increase in the rate of responding before it drops off. A simple example is if you teach a pigeon to peck a key and reinforce each key peck on a fixed ratio of 1 reward for each key press (FR1), then suddenly stop reinforcing, the pigeon will peck harder and faster. It's as if he thinks "damned thing must be broken, surely if I hit it harder it will come good!", and then after a while of doing this he thinks "hmm, this isn't working... maybe I'll go over here and groom, come back later and see if they fixed it for me".

If a response is strongly conditioned, the increased rate of responding can continue for some time. An entire agility trial would not be out of the question, by any stretch. Plus, particularly in sports like Agility (and not so much in Obedience), there are still reinforcers in a trial. So you get what Gary Wilkes calls "riding the extinction bursts": the increased rate of responding IS reinforced. Add a stimulus condition (the trial environment) and voilà, you have conditioned an increased rate of responding under those discriminative stimuli.

I know I perform differently when I compete to when I train (in several disciplines) because of the different conditions, nervousness, excitement etc. Why not the same for dogs?

Possibly! Not something we could prove or disprove easily. We could run a bunch of mock trial and genuine trial tests and look for statistically significant differences in the rates of responding between the two, but there are a ton of confounders. In any case, the responses are emitted for the same discriminative stimuli in all cases (training or trial), and the rate of reinforcement is reduced in a trial; we know those things for a fact, therefore we know that there will be extinction effects. This is not to say that your hypothesis would be disproved though; there can be more than one reason for changes in behaviour.

ETA: see Fig 8 here: http://psychclassics.asu.edu/Skinner/Theories/ for an example of an extinction curve. The cumulative responses on the Y-axis build rapidly, then level off over time, i.e. the rate of responding increases, then reduces until the response is extinguished.
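A cumulative record like the one in Fig 8 plots the running total of responses against time, so its slope is the rate of responding: steep while responding is fast, flat once it stops. A minimal sketch of computing both from a list of response timestamps; the timestamps are invented for illustration and are not data from Skinner's figure.

```python
import math
from bisect import bisect_left, bisect_right


def cumulative_record(times: list[float], sample_points: list[float]) -> list[int]:
    """Total responses made at or before each sample time (the Y-axis of the plot)."""
    ordered = sorted(times)
    return [bisect_right(ordered, t) for t in sample_points]


def local_rates(times: list[float], window: float, t_end: float) -> list[float]:
    """Responses per unit time in consecutive windows [start, end) from 0 to t_end.

    Each value is the slope of the cumulative record over that window;
    the last window is clipped at t_end.
    """
    if window <= 0 or t_end <= 0:
        raise ValueError("window and t_end must be positive")
    ordered = sorted(times)
    rates = []
    for i in range(math.ceil(t_end / window)):
        start = i * window
        end = min(start + window, t_end)
        count = bisect_left(ordered, end) - bisect_left(ordered, start)
        rates.append(count / (end - start))
    return rates


if __name__ == "__main__":
    # Invented timestamps (seconds): a fast run of responses, then it tails off.
    responses = [0.5, 1.0, 1.5, 2.5, 3.0, 3.5, 6.0, 9.0]
    print(cumulative_record(responses, [0, 2, 4, 8, 10]))  # [0, 3, 6, 7, 8]
    print(local_rates(responses, window=2.0, t_end=10.0))  # [1.5, 1.5, 0.0, 0.5, 0.5]
```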

Edited by Aidan

Well, we'll just have to agree to disagree :vomit:

I think if, say, you compete every two weeks and extinction was at work, you would quickly see dips in performance.

And this does happen with some behaviours at a trial. Take stopped contacts, for example: if you release early in trials but make the dog wait and reward in training, you will find the dog behaves differently in trials and may not give you as good a stop (this happens often). The dog learns that the trial is a different environment and that different things will or won't be rewarded there.

But since this does not happen to performances in general and many dogs (such as Vickie's) perform better at a trial, I don't see how extinction could be at play.

Since dogs often act differently in different scenarios, as they learn that specific behaviours are acceptable in different places, I don't see why a competition would be any different: there they are more excited and give a better performance.

My post was simply in response to the question Staranais asked. She asked why the reduction in reward was not adversely affecting performance. I am suggesting that the performance has become self rewarding enough to provide as much incentive as the external rewards she started with.

I believe this is possible (for some dogs) & I believe this has occurred to an extent with my dogs in agility.

A few years ago I spent 6 months training Trim with no food & no toys. Despite the fact that all her foundation had been done with toys, I saw no decrease in her enthusiasm for agility. I reverted back to toys because it gives me finer & more precise control over what I am rewarding (and because it seems to make seminar presenters happier :crossfingers: ).

Despite the fact that she is a food-motivated pig & loves her treats as rewards for tricks, it took me ages to get her to take food in agility. Once I got her to take it, she would spit it to the side & remain focussed on the next obstacle. Eventually I got her to actually swallow it. The only way I could do this was to teach her that agility stopped if she didn't take the food.

In training now we run 3-8 obstacle short sequences & she gets rewarded heavily. In a trial, she runs 24 obstacles & gets rewarded with her lead to tug with at the end. She will still often pull off the tug & try to get back into the ring if she possibly can. I highly doubt I could ever get her to take food or tug in the middle of a trial run. She has been trialling for 4 years, she continues to give me a better performance in a trial than she does in training.

Same with Shine, she loves to tug ( :rolleyes: sometimes a little too much as Kavik will tell you after seeing the blood dripping from my hand on Saturday after her run) but I still struggle to get her to tug once she knows she is going into the ring. She will only tug when we're done.

Trying to reward either of them on sheep, with anything other than the chance to work, would be ludicrous. I see this as similar except that they are much more serious/intense about the sheep than they are about agility.

My post was simply in response to the question Staranais asked. She asked why the reduction in reward was not adversely affecting performance. I am suggesting that the performance has become self rewarding enough to provide as much incentive as the external rewards she started with.

I don't disagree with that at all. I was offering another perspective that extinction can explain an increase in responding. The flip-side of that is that not seeing a decrease (Staranais' observation) does not prove that a stimulus is a reinforcer. Although I think in her case the rewards she mentioned probably were reinforcers, and that they were above the threshold of expectation for her dog despite being less than earlier rewards.

I think if, say, you compete every two weeks and extinction was at work, you would quickly see dips in performance.

If reinforcement was discontinued completely then you could predict that. But this is virtually impossible in agility.

And this does happen with some behaviours at a trial. Take stopped contacts, for example: if you release early in trials but make the dog wait and reward in training, you will find the dog behaves differently in trials and may not give you as good a stop (this happens often).

So what happens after a sub-par stop? He has a choice, right? This is a real-world example of concurrent schedules.

If the dog is not giving you reliable stops, you have a few choices. Some people have no problem with this and choose to do nothing so the behaviour continues. If you compete in ADAA you now have the choice to do a run as Not For Competition and bring a tug toy to reward in the ring, so you can reward your stops in the competition environment. Or some people will pull their dog out of the ring and not let them continue if they have not given a good stopped contact (providing they are sure the dog understands what is required).

Luckily I haven't been faced with this decision yet :crossfingers:

My dog's issue is start lines :rolleyes:

If you compete in ADAA you now have the choice to do a run as Not For Competition and bring a tug toy to reward in the ring, so you can reward your stops in the competition environment.

That's a good idea! That would be good in other disciplines too.

I should have been clearer, what is your dog actually doing? Running off to another obstacle instead of stopping to criteria?
