Jay Wang

First, recall that a policy [math]\pi[/math] maps each state [math]s[/math] and action [math]a[/math] to the probability [math]\pi(a|s)[/math] of taking action [math]a[/math] when in state [math]s[/math].

The state value function, [math]V_\pi(s)[/math], is the expected return when starting in state [math]s[/math] and following [math]\pi[/math] thereafter.

Similarly, the state-action value function, [math]Q_\pi(s, a)[/math], is the expected return when starting in state [math]s[/math], taking action [math]a[/math], and following policy [math]\pi[/math] thereafter.

Read these 3 times out loud and you’ll get the difference.
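
If it helps to see the two side by side, here is a minimal sketch on a tiny made-up MDP. Everything in it (the two states, two actions, rewards, discount `GAMMA`, and the helper names `q_from_v` and `evaluate_policy`) is an assumption invented for illustration, not something from the question. It evaluates a fixed policy iteratively, then checks the standard identity [math]V_\pi(s) = \sum_a \pi(a|s) Q_\pi(s, a)[/math]: the state value is the policy-weighted average of the state-action values.

```python
# Hypothetical 2-state, 2-action MDP, invented purely for illustration.
GAMMA = 0.9          # discount factor (assumed)
STATES = [0, 1]
ACTIONS = [0, 1]

# P[s][a] = list of (probability, next_state, reward) outcomes
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    1: {0: [(1.0, 0, 2.0)], 1: [(1.0, 1, 0.0)]},
}

# pi[s][a] = probability of taking action a in state s (uniform random policy)
pi = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 1: 0.5}}


def q_from_v(V, s, a):
    """Expected return of taking action a in state s, then following the policy."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(pi, sweeps=1000):
    """Iterative policy evaluation: repeatedly apply the Bellman expectation backup."""
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        V = {s: sum(pi[s][a] * q_from_v(V, s, a) for a in ACTIONS) for s in STATES}
    return V


V = evaluate_policy(pi)
Q = {(s, a): q_from_v(V, s, a) for s in STATES for a in ACTIONS}

for s in STATES:
    averaged = sum(pi[s][a] * Q[(s, a)] for a in ACTIONS)
    print(f"V({s}) = {V[s]:.3f}, sum_a pi(a|{s}) Q({s},a) = {averaged:.3f}")
    for a in ACTIONS:
        print(f"  Q({s},{a}) = {Q[(s, a)]:.3f}")
```

Note that [math]Q_\pi(s, a)[/math] differs across actions in the same state, because the first action is fixed by you rather than drawn from [math]\pi[/math], while [math]V_\pi(s)[/math] is a single number per state.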
