Easy! When the problem you are trying to solve cannot be described as a supervised learning (SL) problem, but can be described as a reinforcement learning (RL) problem :)
Okay, I’ll elaborate. Don’t think of SL and RL as competing solution methods to a problem. Rather, SL and RL are classes of problems to solve, each of which can be solved in many different ways. Let’s talk about these two problem classes briefly.
Reinforcement Learning
In reinforcement learning, our AI makes observations and chooses actions. It wants to choose actions so that it earns a very high total “score,” where the score is determined by the positive or negative rewards it receives each time it chooses an action. The AI must therefore learn which action to take given the current observation by trying different actions in different contexts and inferring how good each action is relative to the others it could have taken.
In RL, it is often also the case that the AI’s choice of action affects what observation it will see next and what future rewards it will be able to receive. The AI must live with the consequences of its decisions. Consequently, it cannot magically test what reward each action would receive for every conceivable observation and action choice. Nor can the agent go back in time to a previous observation and see what would happen if it tried a different action. It only gets to test what will happen for the world as it is now, and can only affect what it will see next insofar as it can choose actions that take it where it wants to go.
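To make that interaction loop concrete, here is a minimal sketch in Python. The `env` and `agent` objects are hypothetical stand-ins rather than any particular library’s API; the point is only that the agent sees one observation at a time, chooses an action, receives a reward, and cannot rewind.

```python
# Minimal sketch of the RL interaction loop described above.
# `env` and `agent` are hypothetical stand-ins, not a specific library's API.
def run_episode(env, agent):
    total_reward = 0.0
    observation = env.reset()          # the world as it is now
    done = False
    while not done:
        action = agent.act(observation)               # choose an action
        observation, reward, done = env.step(action)  # live with the consequences
        agent.learn(observation, reward)              # infer how good that choice was
        total_reward += reward                        # the "score" to maximize
    return total_reward
```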
Supervised Learning
Supervised learning can also be framed as an AI making observations and making some decision in response. But there is a significant twist to this story. In SL, the AI is told not only how good or bad its action was, but also which action it should have taken.
This additional information is why it’s called supervised learning. It’s as if the AI has a supervisor watching its decisions and telling it what it should have done. Because of this, whether or not the AI’s decisions affect what observations it sees next is somewhat irrelevant: for any observation it has seen, the AI knows what it should have done and can replay that in its head as many times as it wants.
This property is also why SL typically operates on fixed datasets. Someone collects a bunch of observations, writes down what the answer should have been for each, and then the AI thinks about that for a while to internalize it so that it knows what to do in the future.
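Under these assumptions, a hedged sketch of that process might look like the following. The `model.predict` and `model.update` methods are hypothetical placeholders, not a real library’s API; what matters is that the correct answer for every observation is known up front, so the learner can replay the data as many times as it likes.

```python
# Minimal sketch of supervised learning on a fixed dataset.
# `model.predict` and `model.update` are hypothetical placeholders.
def train(model, dataset, epochs=10):
    for _ in range(epochs):                       # replay the dataset repeatedly
        for observation, correct_action in dataset:
            prediction = model.predict(observation)
            # The supervisor's signal: not just how good the guess was,
            # but exactly which answer should have been given.
            model.update(observation, correct_action, prediction)
    return model
```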
Comparing the two
If you want an analogy, the RL class of problems is like taking an exam where all you get back is your grade. The teacher may then choose the next exam based on how you did on this one, but you don’t get to directly choose which questions they will ask next.
SL is like taking an exam, and then the teacher explains how to solve each of the problems.*
SL is a subset of RL
After thinking about these problem classes, you may notice that SL problems are arguably a subset of the class of problems RL describes. That is, SL simply gives you more information than what the RL definition commits to giving you.
Consequently, you may ask yourself, “Can I use methods that solve RL problems to solve SL problems?”
The answer is yes. But why would you want to?
Choosing to use a method that solves RL problems, and thereby ignoring the additional information available in a supervised learning problem, is like getting your exam back and refusing to watch the teacher show you how you should have solved the problems you got wrong or were unsure of.
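To see what that information loss looks like in practice, here is a hedged sketch of how a labeled dataset could be wrapped to look like a one-step RL problem. `SupervisedAsRL` is a hypothetical wrapper, not a real library class: it hides the label and reports only a reward, which is exactly the impoverished signal an RL method settles for.

```python
import random

# Hypothetical wrapper: recasts an SL problem as a one-step RL problem by
# discarding the label and exposing only a reward.
class SupervisedAsRL:
    def __init__(self, dataset):
        self.dataset = dataset  # list of (observation, correct_action) pairs

    def reset(self):
        self.observation, self.correct_action = random.choice(self.dataset)
        return self.observation

    def step(self, action):
        # The agent learns only *how good* its action was (the reward),
        # never *which* action it should have taken (the label stays hidden).
        reward = 1.0 if action == self.correct_action else 0.0
        done = True  # each "exam question" is a one-step episode
        return None, reward, done
```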
You should always use solutions that fit the narrowest definition of the problem you’re trying to solve so that you can maximally benefit from all sources of information.
—
* The astute reader might ask: in SL, might it be worthwhile for the student to deliberately get some answers wrong to affect what the next exam is? Yes, that is possible! In this setting the problem class is less narrow than what SL usually describes, but still narrower than RL.