Policy evaluation

In an MDP, policy evaluation is the problem of computing the state-value function \(V^{\pi}(s)\) for every state \(s\) under a fixed policy \(\pi\). It is based on the following Bellman equation:

\(V^{\pi}(s) = \sum_{s'}\sum_{a}p(s',a|s)[r(s,a,s')+\gamma V^{\pi}(s')]=\sum_{s'}\sum_{a}p(s'|s,a)\pi(a|s)[r(s,a,s')+\gamma V^{\pi}(s')]\)

Here \(\pi(a|s) = p(a|s)\) is the probability that the policy selects action \(a\) in state \(s\) (so the joint probability factors as \(p(s',a|s) = p(s'|s,a)\,\pi(a|s)\)), and \(r(s,a,s')\) is the expected reward received when taking action \(a\) in state \(s\) and transitioning to state \(s'\).
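
Below is a minimal sketch of iterative policy evaluation, which repeatedly applies the Bellman backup above until \(V^{\pi}\) stops changing; since the backup is a \(\gamma\)-contraction for \(\gamma < 1\), the iteration converges to the unique fixed point. The 3-state MDP, its transition and reward arrays, and the uniform policy are all made up for illustration; only the update rule mirrors the equation.

#+begin_src python
import numpy as np

# Hypothetical toy MDP: 3 states, 2 actions (all numbers are illustrative).
n_states, n_actions = 3, 2

# P[s, a, s'] = p(s' | s, a): transition probabilities.
P = np.array([
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],
    [[0.0, 0.5, 0.5], [0.0, 0.1, 0.9]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],   # state 2 is absorbing
])

# R[s, a, s'] = r(s, a, s'): expected reward for each transition.
R = np.zeros((n_states, n_actions, n_states))
R[1, :, 2] = 1.0   # reward for reaching the absorbing state

# pi[s, a] = pi(a | s): a fixed (here uniform) stochastic policy.
pi = np.full((n_states, n_actions), 1.0 / n_actions)

gamma = 0.9

def policy_evaluation(P, R, pi, gamma, tol=1e-8):
    """Iterate the Bellman expectation backup until V converges."""
    V = np.zeros(P.shape[0])
    while True:
        # V_new(s) = sum_a pi(a|s) sum_s' p(s'|s,a) [r(s,a,s') + gamma V(s')]
        V_new = np.einsum("sa,sat,sat->s", pi, P, R + gamma * V[None, None, :])
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

print(policy_evaluation(P, R, pi, gamma))
#+end_src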
