Survival Instinct in Offline Reinforcement Learning and Implicit Human Bias in Data - 2023

Details

Title: Survival Instinct in Offline Reinforcement Learning and Implicit Human Bias in Data
Author(s): Li, Anqi and Misra, Dipendra and Kolobov, Andrey and Cheng, Ching-An
Link(s):

Rough Notes

Thought-provoking work showing that offline reinforcement learning (RL) can find good policies on many benchmarks even when the reward labels are wrong, such as 0 everywhere or the negatives of the true rewards. The authors claim to prove that this robustness is due to the interplay between the pessimism used in offline RL algorithms and the implicit bias in the data collection process.
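To make the claimed mechanism concrete for myself, here is a toy sketch. It is not the paper's algorithm or experimental setup. Everything in it is an assumption made for illustration: the three-state deterministic MDP, the `relabel` and `pessimistic_greedy_policy` helpers, and the `PENALTY` constant that stands in for pessimism by assigning a very low value to any (state, action) pair missing from the data. The dataset is biased in the sense that it only ever contains the "safe" action.

```python
GAMMA = 0.9
# Hypothetical pessimistic value for (state, action) pairs absent from the data.
# Chosen to lie below any return achievable with rewards in [-1, 1].
PENALTY = -100.0


def relabel(dataset, scheme):
    """Return a copy of the dataset with wrong reward labels."""
    if scheme == "zero":
        return [(s, a, 0.0, s2) for (s, a, r, s2) in dataset]
    if scheme == "negative":
        return [(s, a, -r, s2) for (s, a, r, s2) in dataset]
    raise ValueError(f"unknown scheme: {scheme}")


def pessimistic_greedy_policy(dataset, states, actions, iters=200):
    """Tabular Q-iteration that backs up only pairs seen in the data.

    Unseen pairs are pinned to PENALTY (a crude stand-in for pessimism).
    Assumes deterministic transitions, one transition per seen pair.
    """
    seen = {(s, a): (r, s2) for (s, a, r, s2) in dataset}
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(iters):
        new_q = {}
        for s in states:
            for a in actions:
                if (s, a) in seen:
                    r, s2 = seen[(s, a)]
                    new_q[(s, a)] = r + GAMMA * max(q[(s2, b)] for b in actions)
                else:
                    new_q[(s, a)] = PENALTY
        q = new_q
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


STATES = [0, 1, 2]
ACTIONS = ["safe", "risky"]
# Biased data: only the "safe" action was ever recorded, with true reward 1.
DATA = [(0, "safe", 1.0, 1), (1, "safe", 1.0, 2), (2, "safe", 1.0, 0)]

for scheme in ("zero", "negative"):
    policy = pessimistic_greedy_policy(relabel(DATA, scheme), STATES, ACTIONS)
    print(scheme, policy)
```

Under these assumptions both the zero-reward and negated-reward variants select "safe" in every state, because the penalty on unseen actions dominates whatever the reward labels say. The policy is therefore determined by which actions the data covers, not by the labels.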

[#DOUBT What exactly are the datasets used?]
