Bayes-Adaptive POMDPs - 2007
Details
Title: Bayes-Adaptive POMDPs
Author(s): Ross, Stephane and Chaib-draa, Brahim and Pineau, Joelle
Link(s): https://papers.nips.cc/paper/2007/hash/3b3dbaf68507998acd6a5a5254ab2d76-Abstract.html
Rough Notes
The authors present a new model that extends ideas from Bayesian Reinforcement Learning on Markov Decision Processes (MDPs) to Partially Observable MDPs (POMDPs).
Bayesian RL accounts for the uncertainty in the model parameters during planning while learning their values through experience. It has previously been applied to MDPs, and this paper extends it to the POMDP setting, which poses two challenges:
- How to update Dirichlet parameters when the state is a hidden variable.
- How to approximate the infinite dimensional belief space to perform belief monitoring and compute the optimal policy.
This paper tackles the first problem by including the Dirichlet parameters in the state space and maintaining belief states over these parameters, and the second problem by bounding the space of Dirichlet parameters to a finite subspace that is sufficient for \(\epsilon\)-optimal solutions.
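Schematically, augmenting the state with the Dirichlet count vectors \((\phi, \psi)\) yields a belief update of the following form (a sketch of the construction, not the paper's exact notation; \(\eta\) is a normalizing constant, and \(\delta_{sas'}\), \(\delta_{s'az}\) denote incrementing the single count matching the transition taken and the observation received):

\[
b'(s',\phi',\psi') = \eta \sum_{s\in S} b(s,\phi,\psi)\, T_{\phi}^{sas'}\, O_{\psi}^{s'az}\, \mathbb{I}[\phi'=\phi+\delta_{sas'}]\, \mathbb{I}[\psi'=\psi+\delta_{s'az}],
\]

where \(T_{\phi}^{sas'} = \phi_{sas'}/\sum_{s''}\phi_{sas''}\) and \(O_{\psi}^{s'az} = \psi_{s'az}/\sum_{z'}\psi_{s'az'}\) are the expected transition and observation probabilities under the current counts.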
The authors assume finite and known state, action, and observation spaces, with unknown or only partially known transition and observation probabilities; the reward function is assumed known. To model the uncertainty in the transition parameters \(T^{sas'}=P(S_{t+1}=s'|S_t=s,A_t=a)\) and observation parameters \(O^{saz}=P(Z_t=z|S_t=s,A_{t-1}=a)\), Dirichlet distributions are used.
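As a concrete illustration, here is a minimal Python sketch (array shapes and function names are my own, not from the paper) of maintaining Dirichlet counts over \(T\) and \(O\) and performing one exact belief update over hyperstates \((s, \phi, \psi)\):

```python
import numpy as np

n_states, n_actions, n_obs = 3, 2, 2  # |S|, |A|, |Z| assumed known and finite

# Dirichlet counts: phi[s, a, s'] for transitions, psi[s, a, z] for
# observations (z received upon arriving in state s after action a).
phi0 = np.ones((n_states, n_actions, n_states))  # uniform prior
psi0 = np.ones((n_states, n_actions, n_obs))

def expected_T(phi):
    """Posterior mean of T^{sas'} = P(S_{t+1}=s' | S_t=s, A_t=a)."""
    return phi / phi.sum(axis=2, keepdims=True)

def expected_O(psi):
    """Posterior mean of O^{saz} = P(Z_t=z | S_t=s, A_{t-1}=a)."""
    return psi / psi.sum(axis=2, keepdims=True)

def belief_update(belief, a, z):
    """Exact update of a belief over hyperstates (s, phi, psi).

    Each hyperstate branches into |S| successors (one per possible
    hidden s'), so the support grows by a factor of |S| per step --
    this is why bounding the parameter space matters for tractability.
    `belief` is a list of (s, phi, psi, weight) tuples.
    """
    successors = []
    for s, phi, psi, w in belief:
        T = expected_T(phi)[s, a]                    # E[P(s' | s, a)]
        for s2 in range(n_states):
            w2 = w * T[s2] * expected_O(psi)[s2, a, z]
            if w2 == 0.0:
                continue
            phi2 = phi.copy(); phi2[s, a, s2] += 1   # count the transition
            psi2 = psi.copy(); psi2[s2, a, z] += 1   # count the observation
            successors.append((s2, phi2, psi2, w2))
    total = sum(w for *_, w in successors)
    return [(s, f, p, w / total) for s, f, p, w in successors]

# Usage: known start state 0, uninformative priors over the model parameters.
belief = [(0, phi0, psi0, 1.0)]
belief = belief_update(belief, a=1, z=0)
```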