Markov Decision Processes (MDPs)

A Markov Decision Process (MDP) is a 5-tuple \((S,A,P_a,R_a,\gamma)\) where

- \(S\) is a set of states,
- \(A\) is a set of actions,
- \(P_a(s,s')\) is the probability that taking action \(a\) in state \(s\) leads to state \(s'\),
- \(R_a(s,s')\) is the immediate reward received after transitioning from \(s\) to \(s'\) under action \(a\),
- \(\gamma \in [0,1)\) is the discount factor.

The optimal state-value function \(V^*(s)\) and the optimal action-value function \(Q^*(s,a)\) are the pointwise maxima of the corresponding policy-dependent functions over the policy space. Every optimal policy achieves both the optimal state-value and the optimal action-value function.
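Written out, the definitions above read:

\[ V^*(s) = \max_{\pi} V^{\pi}(s), \qquad Q^*(s,a) = \max_{\pi} Q^{\pi}(s,a), \]

where \(V^{\pi}\) and \(Q^{\pi}\) denote the value functions of a fixed policy \(\pi\).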

One key tool for solving MDPs is the set of Bellman equations, which express the state/action value functions, and their optimal counterparts, recursively in terms of themselves.
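The Bellman optimality equation \(V^*(s) = \max_a \sum_{s'} P_a(s,s')\,[R_a(s,s') + \gamma V^*(s')]\) suggests a fixed-point algorithm: repeatedly apply the right-hand side until the values stop changing (value iteration). Below is a minimal sketch on a hypothetical two-state, two-action MDP; the transition and reward arrays are made up for illustration and are not from the text.

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions.
# P[a][s][s'] = P_a(s, s'), R[a][s][s'] = R_a(s, s').
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.5, 0.5], [0.0, 1.0]],   # action 1
])
R = np.array([
    [[1.0, 0.0], [0.0, 0.0]],   # action 0
    [[0.0, 2.0], [0.0, 1.0]],   # action 1
])
gamma = 0.9

def value_iteration(P, R, gamma, tol=1e-8):
    """Apply the Bellman optimality operator until convergence."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[a, s] = sum_{s'} P_a(s, s') * (R_a(s, s') + gamma * V(s'))
        Q = np.einsum('ast,ast->as', P, R + gamma * V)
        V_new = Q.max(axis=0)            # maximize over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

V_star, policy = value_iteration(P, R, gamma)
print("V* =", V_star, "greedy policy =", policy)
```

Because the Bellman optimality operator is a \(\gamma\)-contraction in the sup norm, this iteration converges to \(V^*\) from any starting point, and the greedy policy with respect to \(V^*\) is optimal.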
