Common Inaccuracies in Multi-Agent RL Research - 2021

Details

Title: Common Inaccuracies in Multi-Agent RL Research
Author(s): Albrecht, Stefano
Link(s): https://agents.inf.ed.ac.uk/blog/multiagent-rl-inaccuracies/

Rough Notes

Markov games were not introduced in (Littman, 1994) (which also deals only with fully observed states and actions); they date back to (Shapley, 1953), where they are called stochastic games. The two names reflect a difference in convention between the machine learning community and the game theory community.

Regarding the reward definition for a single agent in the Multi-Agent Reinforcement Learning (MARL) setting, defining the objective of agent \(i\) as maximizing \(R_i = \sum_{t}\gamma^t r_{i,t}\) is ill-posed: the return achieved by agent \(i\)'s policy \(\pi_i\) depends on the other agents' policies \(\pi_{-i}\), and without specifying \(\pi_{-i}\) we cannot say whether \(\pi_i\) achieves its goal of maximizing agent \(i\)'s expected return. The recommendation is to define the learning objective in an MARL setting with \(N\) agents as finding a joint policy \((\pi_1,\cdots,\pi_N)\) such that \(\forall i:\ \pi_i \in \text{argmax}_{\pi'_i}\mathbb{ E }[R_i \mid \pi'_i, \pi_{-i}]\). (#DOUBT It says \(\pi\) here is a Nash equilibrium - need to understand that).
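A minimal sketch of what that best-response condition means, using a hypothetical one-shot 2x2 coordination game (so the return \(R_i\) is just a single reward): a joint policy satisfies the condition above exactly when no agent can increase its expected return \(\mathbb{ E }[R_i \mid \pi_i, \pi_{-i}]\) by unilaterally changing its own policy, which is the definition of a Nash equilibrium. The payoff matrices and policies here are illustrative, not from the post.

```python
import numpy as np

# Hypothetical 2-player, 2-action coordination game: entry [a1, a2] is the
# reward each agent receives for the joint action (a1, a2).
R1 = np.array([[2.0, 0.0],
               [0.0, 1.0]])  # agent 1's rewards
R2 = np.array([[2.0, 0.0],
               [0.0, 1.0]])  # agent 2's rewards

def expected_return(R, pi1, pi2):
    """E[R_i | pi_1, pi_2] for the agent with payoff matrix R (one-shot game)."""
    return pi1 @ R @ pi2

def is_nash(pi1, pi2, tol=1e-9):
    """True iff each policy is a best response to the other, i.e. neither agent
    can improve its expected return by unilaterally switching to any pure action."""
    v1 = expected_return(R1, pi1, pi2)
    v2 = expected_return(R2, pi1, pi2)
    best1 = max(expected_return(R1, np.eye(2)[a], pi2) for a in range(2))
    best2 = max(expected_return(R2, pi1, np.eye(2)[a]) for a in range(2))
    return v1 >= best1 - tol and v2 >= best2 - tol

print(is_nash(np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # True: both play action 0
print(is_nash(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # False: agent 1 would deviate
```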

Action spaces do not, in general, increase exponentially with the number of agents. For example, in a setting with 1000 control variables, each taking \(k\) values, a single agent would face an action space of size \(k^{1000}\). With \(N\) agents, each responsible for \(\frac{1000}{N}\) of the control variables, each agent only chooses among \(k^{\frac{1000}{N}}\) actions, so the per-step decision problem has complexity \(\mathcal{ O }(N k^{\frac{1000}{N}})\), even though the total number of joint actions is still \(k^{1000}\). The catch is that the agents need to coordinate their decisions.
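A quick back-of-the-envelope sketch of that scaling argument (the concrete values of \(M\) and \(k\) below are illustrative, not from the post): splitting the control variables across agents shrinks each agent's own action set, and hence the total per-agent search effort \(\mathcal{ O }(N k^{M/N})\), dramatically, while the joint action space \(k^{M}\) stays the same.

```python
from math import log10

# Illustrative numbers: M control variables, each taking k values,
# split evenly across N agents (assume N divides M).
M, k = 1000, 5

print(f"joint action space: {k}^{M}  (~10^{M * log10(k):.0f} joint actions)")
for N in (1, 10, 100, 1000):
    per_agent_exp = M // N                              # each agent controls M/N variables
    # total effort if each agent searches only its own action set: N * k^(M/N)
    total_digits = log10(N) + per_agent_exp * log10(k)
    print(f"N={N:4d}: per-agent |A_i| = {k}^{per_agent_exp}, "
          f"sum of per-agent searches ~10^{total_digits:.0f}")
```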
