Think about what we are actually trying to achieve with TD -
We have an agent that performs a series of actions and ends up with a reward. We want to know which of those actions were responsible for the reward.
Unfortunately, in most cases we have no easy way of figuring that out. So we use a simple heuristic - actions that are closer in time to the reward are more likely to have been responsible for it.
This makes sense intuitively if we think about real-life situations. For example, if we end up dead (remember that rewards can also be negative), the root cause is most likely that we didn’t look before crossing the road (3 seconds ago), less likely that we didn’t get a good night’s sleep (8 hours ago), and even less likely the sandwich we ate 3 years ago.
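As a toy numeric sketch of that heuristic (not any particular algorithm - the decay factor and the timeline below are invented for illustration), we can weight each past action’s share of the credit by how long ago it happened:

```python
decay = 0.8  # hypothetical per-step decay; closer to 1 means credit reaches further back

# Actions taken before the (negative) reward, newest first, with a rough
# "how many steps ago" for each - all numbers invented for illustration.
actions = ["didn't look before crossing", "slept badly", "ate a sandwich"]
steps_before_reward = [1, 10_000, 10_000_000]

for action, k in zip(actions, steps_before_reward):
    credit = decay ** k  # credit decays exponentially with temporal distance
    print(f"{action}: credit weight {credit:.3g}")

# The most recent action gets nearly all the credit (0.8); the sandwich gets ~0.
```

This kind of exponential decay with temporal distance is essentially what an eligibility trace implements, which brings us to the question of trace length.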
So what does having a longer trace mean? It means we look further into the past to try to find explanations for the rewards we get. That makes us more likely to overfit by seeing patterns where there are none, or in other words, to model noise in the training data. In the example above, if our agent dies 10 times and we look for a pattern in the 3 years’ worth of actions before each death, we will probably find one, but it will most likely be just coincidence (overfitting).
Having a short trace means we are less likely to find those spurious patterns, but also that we may fail to credit actions whose reward only arrives much later. That’s bias (not finding a pattern when there is one).
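To make the trade-off concrete, here is a minimal sketch of tabular TD(λ) with accumulating eligibility traces, where λ controls the effective trace length. The environment (a 5-state chain with a single negative reward at the end), the step size, and the discount are all made up for illustration:

```python
import numpy as np

def td_lambda_episode(values, episode, alpha=0.1, gamma=0.99, lam=0.9):
    """Run one episode of TD(lambda) value updates over a list of
    (state, reward, next_state) transitions."""
    traces = np.zeros_like(values)
    for state, reward, next_state in episode:
        td_error = reward + gamma * values[next_state] - values[state]
        traces *= gamma * lam       # credit for older states fades each step
        traces[state] += 1.0        # bump eligibility of the state we just left
        values += alpha * td_error * traces  # every still-eligible state shares the update
    return values

# Hypothetical 5-state chain; the only reward is -1 on the final transition ("we die").
episode = [(0, 0.0, 1), (1, 0.0, 2), (2, 0.0, 3), (3, -1.0, 4)]
for lam in (0.0, 0.9):
    values = td_lambda_episode(np.zeros(5), episode, lam=lam)
    print(f"lambda={lam}: {np.round(values, 3)}")
```

With λ = 0 only the state right before the death gets blamed (short trace, more bias); with λ = 0.9 the blame reaches all the way back to the start of the episode (long trace, more variance).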
This goes beyond TD and reinforcement learning and applies to machine learning in general - given enough noise, we will always be able to find patterns. But those patterns won’t be predictive of future inputs, and that’s one definition of overfitting.