A Game-Theoretic Approach to Offline Reinforcement Learning - 2022
Details
Title: A Game-Theoretic Approach to Offline Reinforcement Learning
Author(s): Simons Institute
Link(s): https://www.youtube.com/watch?v=UhqDcHrWn3E
Rough Notes
Collected data often lacks diversity. How can we make decisions when we have uncertainty caused by this lack of diversity?
Offline RL - the goal is to learn good policies from non-exploratory datasets. Challenge - missing data coverage means we cannot evaluate policies well, e.g. how can we know whether a driving behaviour is unsafe if all collected trajectories show safe driving?
One approach is the principle of pessimism - optimize for the worst-case scenario; however, defining the worst case is itself another problem.
How do we trade off conservatism and generalization?
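A rough sketch of what pessimism can look like formally (my own notation, roughly in the Bellman-consistent-pessimism style, not copied from the talk; here $f(s,\pi)$ stands for $\mathbb{E}_{a \sim \pi}[f(s,a)]$): evaluate each candidate policy with the worst value function that is still consistent with the data, then pick the best such policy.

$$
\hat{\pi} \in \arg\max_{\pi}\; \min_{f \in \mathcal{F}_{D}(\pi)} f(s_0, \pi),
\qquad
\mathcal{F}_{D}(\pi) = \Big\{ f \in \mathcal{F} : \mathbb{E}_{(s,a,r,s') \sim D}\big[\big(f(s,a) - r - \gamma f(s', \pi)\big)^2\big] \le \varepsilon \Big\}.
$$

On this reading, the conservatism-vs-generalization tension lives in the version space $\mathcal{F}_{D}(\pi)$: a larger set is safer but more pessimistic, while a smaller set commits harder to what the limited data seems to say.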
This work offers a game-theoretic framework for offline RL.
The Relative Pessimism game (MDP analogy counterpart: behaviour regularization) is more relevant to me than the Absolute Pessimism game (MDP analogy counterpart: algorithms based on bonuses/truncation). The game-theoretic formulations give insight into algorithms that are less conservative.
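My rough reconstruction of the contrast (notation mine, in the style of the ATAC paper by Cheng et al. 2022, which I believe is what this talk covers): the Absolute Pessimism game scores the adversarially chosen $f$ at the initial state, as in the sketch above, whereas the Relative Pessimism game scores the learner's policy against the actions actually taken in the data:

$$
\hat{\pi} \in \arg\max_{\pi}\; \min_{f \in \mathcal{F}_{D}(\pi)} \ \mathbb{E}_{(s,a) \sim D}\big[ f(s, \pi) - f(s, a) \big].
$$

The adversary can only make $\pi$ look bad relative to the behaviour policy, never in absolute terms; the behaviour policy itself gets relative value (about) zero under every $f$, so the learner can never be forced below it. That is the sense in which this game corresponds to behaviour regularization and supports less conservative algorithms.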
Offline RL + Relative Pessimism = Imitation Learning + Bellman Regularization
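If I am reconstructing the slogan correctly (again ATAC-style, notation mine), the regularized / Lagrangian form of the relative-pessimism game makes the decomposition explicit:

$$
\max_{\pi}\; \min_{f \in \mathcal{F}} \ \underbrace{\mathbb{E}_{(s,a) \sim D}\big[ f(s, \pi) - f(s, a) \big]}_{\text{imitation-style term}} \;+\; \beta\, \underbrace{\mathbb{E}_{(s,a,r,s') \sim D}\big[\big(f(s,a) - r - \gamma f(s', \pi)\big)^2\big]}_{\text{Bellman regularization}}.
$$

At $\beta = 0$ the dynamics play no role and the learner's best response is simply to match the data actions, i.e. pure imitation learning; $\beta > 0$ re-introduces Bellman consistency. The claimed payoff (as I understood it) is robust policy improvement: for any $\beta \ge 0$ the learned policy should, in theory, be no worse than the behaviour policy.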