A Game-Theoretic Approach to Offline Reinforcement Learning - 2022

Details

Title: A Game-Theoretic Approach to Offline Reinforcement Learning
Author(s): Simons Institute
Link(s): https://www.youtube.com/watch?v=UhqDcHrWn3E

Rough Notes

Collected data often lacks diversity. How can we make decisions when we have uncertainty caused by this lack of diversity?

Offline RL - the goal is to learn good policies from non-exploratory datasets. Challenge - missing data coverage means we cannot evaluate every policy reliably, e.g. how can we tell that a driving behaviour is unsafe if all collected trajectories show safe driving?

One approach is the principle of pessimism - optimize for the worst-case scenario consistent with the data. However, defining the worst case is itself another problem.
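A standard way to formalise (absolute) pessimism - written here as my own sketch, not notation taken from the talk - is to build a version space \(\mathcal{M}(D)\) of models (or value functions) statistically consistent with the dataset \(D\) and optimize the worst case over it:

\[ \hat{\pi} = \arg\max_{\pi} \; \min_{M \in \mathcal{M}(D)} J_M(\pi), \]

where \(J_M(\pi)\) is the return of \(\pi\) under model \(M\). The "another problem" is exactly how to construct \(\mathcal{M}(D)\) (or equivalent bonuses/truncations) without being overly conservative.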

How do we trade off conservatism and generalization?

This work offers a game-theoretic framework for offline RL.

The Relative Pessimism game (MDP analogy counterpart: behaviour regularization) is more relevant for me than the Absolute Pessimism game (MDP analogy counterpart: algorithms based on bonuses/truncations). The game-theoretic formulations give insight into designing algorithms that are less conservative.
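If I understand correctly, this corresponds to the ATAC line of work (Cheng et al., "Adversarially Trained Actor Critic for Offline Reinforcement Learning", 2022), where relative pessimism means optimizing the worst-case improvement over the behaviour policy rather than the worst-case absolute return. A rough sketch of the resulting game (my reconstruction - the symbols \(\mathcal{F}\), \(\beta\), \(\mathcal{E}_D\) are how I remember them, not quoted from the talk):

\[ \hat{\pi} = \arg\max_{\pi} \; \min_{f \in \mathcal{F}} \; \mathbb{E}_{(s,a) \sim D}\big[ f(s, \pi(s)) - f(s, a) \big] \;+\; \beta\, \mathcal{E}_D(f, \pi), \]

where the adversary picks a critic \(f\) from a function class \(\mathcal{F}\), the first term is the pessimistic advantage of \(\pi\) relative to the data-collecting policy (hence "relative"), and \(\mathcal{E}_D(f, \pi)\) is a squared Bellman-error penalty with weight \(\beta \ge 0\) that forces the adversary to stay approximately Bellman-consistent.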

Offline RL + Relative Pessimism = Imitation Learning + Bellman Regularization
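Why this identity holds under the sketch above (my reading, not a quote from the talk): with \(\beta = 0\) and mild conditions on \(\mathcal{F}\) (e.g. it contains constant functions),

\[ \max_{\pi} \; \min_{f \in \mathcal{F}} \; \mathbb{E}_{(s,a) \sim D}\big[ f(s, \pi(s)) - f(s, a) \big] \;\le\; 0, \]

and the maximum is attained when \(\pi\) reproduces the actions in \(D\), i.e. pure imitation learning. Turning on \(\beta > 0\) adds Bellman regularization, which is what lets the learner improve beyond the behaviour policy while staying anchored to the data.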
