Bellman Equations
For Markov Decision Processes (MDPs), the Bellman equations are as follows.
- Bellman equation for state-value function (3.14 in cite:sutton-2018-reinf)
- Bellman equation for action-value function
- Bellman optimality equation for state-value function (a special case of the Bellman equation for state-value functions, applied to the optimal value function, which does not depend on a generic policy \(\pi\); p. 63 of cite:sutton-2018-reinf)
- Bellman optimality equation for action-value function
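Written out in the notation of cite:sutton-2018-reinf (where \(p(s',r \mid s,a)\) denotes the MDP dynamics and \(\gamma\) the discount factor), the four equations above are:
\begin{align*}
v_\pi(s) &= \sum_{a}\pi(a\mid s)\sum_{s',r}p(s',r\mid s,a)\bigl[r+\gamma\,v_\pi(s')\bigr] \\
q_\pi(s,a) &= \sum_{s',r}p(s',r\mid s,a)\Bigl[r+\gamma\sum_{a'}\pi(a'\mid s')\,q_\pi(s',a')\Bigr] \\
v_*(s) &= \max_{a}\sum_{s',r}p(s',r\mid s,a)\bigl[r+\gamma\,v_*(s')\bigr] \\
q_*(s,a) &= \sum_{s',r}p(s',r\mid s,a)\Bigl[r+\gamma\max_{a'}q_*(s',a')\Bigr]
\end{align*}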
They can be derived via:
- Backward reasoning for deterministic dynamics, as done in Topics in Reinforcement Learning (Arizona State University CSE691).
- Computing \(\mathbb{E}_{A_t,R_{t+1},S_{t+1},A_{t+1},\cdots}[G_{t+1} \mid S_t=s]\) and using the Markov property, where \(G_{t+1}=\sum_{i=t+1}^{\infty}\gamma^{i-(t+1)}R_{i+1}\); see the sketch below.
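For the state-value function, the expectation-based derivation is, in sketch form (the Markov property is what allows the conditioning on \(S_t=s\), \(A_t=a\), \(R_{t+1}=r\) to be replaced by \(S_{t+1}=s'\) alone in the inner expectation):
\begin{align*}
v_\pi(s) &= \mathbb{E}_\pi[G_t \mid S_t=s] \\
&= \mathbb{E}_\pi[R_{t+1}+\gamma\,G_{t+1} \mid S_t=s] \\
&= \sum_{a}\pi(a\mid s)\sum_{s',r}p(s',r\mid s,a)\bigl[r+\gamma\,\mathbb{E}_\pi[G_{t+1} \mid S_{t+1}=s']\bigr] \\
&= \sum_{a}\pi(a\mid s)\sum_{s',r}p(s',r\mid s,a)\bigl[r+\gamma\,v_\pi(s')\bigr]
\end{align*}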