Inverse Reinforcement Learning (IRL)
Given:
- States \(s \in \mathcal{S}\) and actions \(a \in \mathcal{A}\).
- (Sometimes) Transition probabilities \(p(s'\mid s,a)\).
- Sample trajectories \(\{\tau_i\}\) obtained by running the (unknown) optimal policy \(\pi^*(\tau)\).
Learn:
- The reward function \(r_\psi(s,a)\) where \(\psi\) represents reward parameters.
Then use the learned reward function to recover \(\pi^*(a\mid s)\).
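To make the setup above concrete, here is a minimal sketch of maximum-entropy IRL in the tabular setting: fit reward parameters \(\psi\) by matching expert state visitations against the visitations of the soft-optimal policy under the current reward. The 5-state chain MDP, horizon, demonstration generator, and learning rate are all illustrative assumptions, not taken from the lecture.

```python
import numpy as np

# Hypothetical toy MDP: a 5-state chain, actions 0=left / 1=right,
# deterministic transitions, fixed-horizon episodes. Expert demos all
# walk right, so the recovered reward should peak at the last state.
N_STATES, N_ACTIONS, HORIZON = 5, 2, 6

def step(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)

# Transition tensor P[s, a, s'] for the deterministic dynamics.
P = np.zeros((N_STATES, N_ACTIONS, N_STATES))
for s in range(N_STATES):
    for a in range(N_ACTIONS):
        P[s, a, step(s, a)] = 1.0

def expert_trajectories(n=20):
    # Expert always moves right from a uniformly random start state.
    rng = np.random.default_rng(0)
    trajs = []
    for _ in range(n):
        s = int(rng.integers(0, N_STATES))
        traj = [s]
        for _ in range(HORIZON - 1):
            s = step(s, 1)
            traj.append(s)
        trajs.append(traj)
    return trajs

def expert_visitation(trajs):
    # Average per-trajectory state-visitation counts.
    mu = np.zeros(N_STATES)
    for traj in trajs:
        for s in traj:
            mu[s] += 1.0
    return mu / len(trajs)

def soft_policies(r):
    # Backward soft value iteration (Ziebart-style): per-timestep
    # softmax policies under the current reward r(s).
    V = r.copy()                           # value at the final timestep
    policies = []
    for _ in range(HORIZON - 1):
        Q = r[:, None] + P @ V             # Q[s, a] = r(s) + V(s')
        Qmax = Q.max(axis=1, keepdims=True)
        pi = np.exp(Q - Qmax)
        pi /= pi.sum(axis=1, keepdims=True)
        policies.append(pi)
        V = Qmax[:, 0] + np.log(np.exp(Q - Qmax).sum(axis=1))  # logsumexp
    return policies[::-1]                  # time-ordered: t = 0 .. T-2

def expected_visitation(policies):
    # Forward pass: propagate the state distribution and sum over time.
    D = np.full(N_STATES, 1.0 / N_STATES)  # uniform start, like the expert
    mu = D.copy()
    for pi in policies:
        D = np.einsum('s,sa,san->n', D, pi, P)
        mu += D
    return mu

# MaxEnt IRL gradient ascent: grad = expert visitations - policy visitations.
psi = np.zeros(N_STATES)                   # reward parameters, one per state
mu_expert = expert_visitation(expert_trajectories())
for _ in range(200):
    mu_policy = expected_visitation(soft_policies(psi))
    psi += 0.1 * (mu_expert - mu_policy)

# State 4 (the expert's goal) should receive the highest learned reward.
print(psi.argmax())
```

With one-hot state features, the gradient reduces to the visitation-count difference; with general features \(\phi(s)\), the same update matches feature expectations instead.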