First recall that a policy [math]\pi[/math] maps each state [math]s[/math] and action [math]a[/math] to the probability [math]\pi(a|s)[/math] of taking action [math]a[/math] when in state [math]s[/math].
The state value function, [math]V_\pi(s)[/math], is the expected return when starting in state [math]s[/math] and following [math]\pi[/math] thereafter.
Similarly, the state-action value function, [math]Q_\pi(s, a)[/math], is the expected return when starting in state [math]s[/math], taking action [math]a[/math], and following policy [math]\pi[/math] thereafter.
Read these 3 times out loud and you’ll get the difference.
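To make the connection concrete: [math]V_\pi(s)[/math] is just the [math]\pi[/math]-weighted average of [math]Q_\pi(s, \cdot)[/math]. Here's a minimal Python sketch of that identity; the state, actions, and numbers are all made up for illustration.

```python
# pi[s][a]: probability of taking action a in state s (illustrative values)
pi = {"s0": {"left": 0.4, "right": 0.6}}

# Q[s][a]: expected return of taking a in s, then following pi (illustrative values)
Q = {"s0": {"left": 1.0, "right": 2.0}}

def state_value(s):
    # V_pi(s) = sum over a of pi(a|s) * Q_pi(s, a)
    return sum(pi[s][a] * Q[s][a] for a in pi[s])

print(state_value("s0"))  # 0.4*1.0 + 0.6*2.0 = 1.6
```

So if you know [math]Q_\pi[/math] and the policy, you get [math]V_\pi[/math] for free; the reverse is not true, which is one reason action-value methods like Q-learning work with [math]Q[/math] directly.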