Bellman Equations

Recall the intuition: if we denote by \(G_t\) the return, i.e. the (possibly discounted) sum of rewards from time \(t\) onward, then we have the recurrence relation \(G_t = R_t + \gamma G_{t+1}\).
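The recurrence can be checked numerically: computing returns backward over a reward sequence gives exactly \(G_t = R_t + \gamma G_{t+1}\) at every step. A minimal sketch (the reward values and \(\gamma\) below are made-up illustrative numbers):

```python
def returns(rewards, gamma):
    """Compute G_t for every t via the backward recurrence G_t = R_t + gamma * G_{t+1}."""
    G = 0.0
    out = []
    for r in reversed(rewards):  # walk from the final reward back to t = 0
        G = r + gamma * G
        out.append(G)
    return out[::-1]  # reverse so out[t] == G_t

rewards = [1.0, 0.0, 2.0]  # illustrative reward sequence R_0, R_1, R_2
gamma = 0.9
G = returns(rewards, gamma)

# The recurrence holds at every interior step.
for t in range(len(rewards) - 1):
    assert abs(G[t] - (rewards[t] + gamma * G[t + 1])) < 1e-9
```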

For Markov Decision Processes (MDPs), the Bellman equations are as follows:

\(V_\pi(s) = \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, V_\pi(s')\bigr]\)

\(Q_\pi(s, a) = \sum_{s', r} p(s', r \mid s, a)\,\Bigl[r + \gamma \sum_{a'} \pi(a' \mid s')\, Q_\pi(s', a')\Bigr]\)

They can be derived by expanding \(V_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]\) with the recurrence \(G_t = R_t + \gamma G_{t+1}\) and applying the law of total expectation.
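To see the Bellman equation for \(V_\pi\) hold concretely, here is a minimal numerical sketch: iterative policy evaluation on a made-up 2-state, 2-action MDP (all transition probabilities, rewards, and the uniform policy below are illustrative assumptions), followed by a check that the fixed point satisfies the equation.

```python
import numpy as np

gamma = 0.9
# P[s, a, s']: transition probabilities (made-up values; each row sums to 1)
P = np.array([
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.3, 0.7]],
])
# R[s, a]: expected immediate reward (made-up values)
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])
# pi[s, a]: a uniform-random policy
pi = np.full((2, 2), 0.5)

# Fixed-point iteration on V <- sum_a pi(a|s) [R(s,a) + gamma * sum_s' P(s'|s,a) V(s')]
V = np.zeros(2)
for _ in range(1000):
    V = np.einsum("sa,sa->s", pi, R + gamma * np.einsum("sat,t->sa", P, V))

# The converged V satisfies the Bellman equation to numerical precision.
bellman_rhs = np.einsum("sa,sa->s", pi, R + gamma * np.einsum("sat,t->sa", P, V))
assert np.allclose(V, bellman_rhs)
```

Since the Bellman operator is a \(\gamma\)-contraction, the iteration converges to the unique fixed point \(V_\pi\) regardless of the starting values.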

Anki

Derive the Bellman equation for \(V_\pi(s)\).

Derive the Bellman equation for \(Q_\pi(s,a)\).
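For reference (check against this after attempting the card), one derivation sketch for \(V_\pi\), conditioning on the first action and transition and using the law of total expectation:

\begin{align*}
V_\pi(s) &= \mathbb{E}_\pi[G_t \mid S_t = s] \\
&= \mathbb{E}_\pi[R_t + \gamma G_{t+1} \mid S_t = s] \\
&= \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, \mathbb{E}_\pi[G_{t+1} \mid S_{t+1} = s']\bigr] \\
&= \sum_a \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[r + \gamma\, V_\pi(s')\bigr].
\end{align*}

The derivation for \(Q_\pi(s, a)\) is analogous, except the first action \(a\) is given, so the outer sum over \(\pi(a \mid s)\) is dropped and the policy instead appears inside, averaging \(Q_\pi(s', a')\) over the next action \(a'\).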
