Inverse Reinforcement Learning (IRL)

[Definition from here] Given:

Learn:

Then use the learnt reward function to learn the optimal policy \(\pi^*(a\mid s)\).
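As a rough illustration of this second step only, the sketch below runs value iteration on a toy MDP and reads off a greedy policy. Everything in it is an assumption made for illustration and does not come from the lecture: the 4-state chain, the deterministic `step` dynamics, the discount `gamma`, and the hand-written `learned_reward` table that stands in for whatever an IRL method would output.

```python
# Hypothetical setup: a 4-state chain (states 0..3) with two actions,
# 0 = move left and 1 = move right. None of this comes from the lecture.

def step(state, action, n_states):
    """Deterministic toy dynamics: move one cell, clipped to the chain."""
    delta = 1 if action == 1 else -1
    return min(max(state + delta, 0), n_states - 1)


def plan_with_reward(learned_reward, n_actions=2, gamma=0.9, tol=1e-8):
    """Value iteration on the toy chain using a tabular reward r[s][a].

    Returns a deterministic greedy policy (one action per state) and the
    converged state values.
    """
    n_states = len(learned_reward)

    def q_values(s, values):
        # One-step lookahead: immediate reward plus discounted next value.
        return [
            learned_reward[s][a] + gamma * values[step(s, a, n_states)]
            for a in range(n_actions)
        ]

    values = [0.0] * n_states
    while True:
        new_values = [max(q_values(s, values)) for s in range(n_states)]
        change = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if change < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = []
    for s in range(n_states):
        q = q_values(s, values)
        policy.append(max(range(n_actions), key=lambda a: q[a]))
    return policy, values


# Made-up stand-in for an IRL output: moving right is rewarded near the
# right end of the chain, everything else gets zero.
learned_reward = [[0.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
policy, values = plan_with_reward(learned_reward)
```

With a tabular reward and known dynamics, planning is enough to recover a policy. In larger problems this step would typically be replaced by a standard RL algorithm trained against the learnt reward.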

Some papers mentioned from Lecture linked above:
