Filippos Christianos
Much emphasis in reinforcement learning research is placed on exploration, ensuring that optimal policies are found. In multi-agent deep reinforcement learning (MARL), efficient exploration is even more important, since the joint state-action space grows exponentially with the number of agents and quickly exceeds our computational capabilities. We motivate and experimentally investigate coordinating exploration between agents to improve efficiency.
We have proposed a general method for efficient exploration by sharing experience amongst agents. The resulting algorithm, Shared Experience Actor-Critic (SEAC) [1], applies experience sharing in an actor-critic framework. We evaluated SEAC in a collection of sparse-reward multi-agent environments and found that it consistently outperforms existing baselines. In some harder environments, experience sharing makes the difference between learning to solve the task and not learning at all.
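To make the idea concrete, below is a minimal sketch of how experience sharing might enter an actor-critic policy-gradient update: each agent learns from its own transitions and, with an importance-sampling correction, from transitions collected by the other agents. This is an illustrative assumption-laden example, not the authors' reference implementation; names such as `lam` (the sharing weight) and the toy dimensions are placeholders.

```python
# Illustrative sketch only: experience sharing in an actor-critic policy loss.
# Agent i trains on its own batch and, importance-weighted, on other agents' batches.

import torch
import torch.nn as nn

obs_dim, n_actions, n_agents, batch = 8, 4, 2, 32

# One independent policy and value head per agent (no parameter sharing assumed).
policies = [nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
            for _ in range(n_agents)]
values = [nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))
          for _ in range(n_agents)]

def actor_loss(i, batches, lam=1.0):
    """Policy loss for agent i from its own batch plus the other agents' batches.

    batches[k] = (obs, act, ret) collected by agent k under its own policy.
    Off-policy terms are weighted by pi_i(a|o) / pi_k(a|o) (importance sampling).
    """
    total = 0.0
    for k, (obs, act, ret) in enumerate(batches):
        logp_i = torch.log_softmax(policies[i](obs), -1).gather(
            1, act.unsqueeze(1)).squeeze(1)
        adv = (ret - values[i](obs).squeeze(1)).detach()  # advantage estimate
        if k == i:
            # Standard on-policy policy-gradient term.
            total = total - (logp_i * adv).mean()
        else:
            with torch.no_grad():  # behaviour policy of the collecting agent
                logp_k = torch.log_softmax(policies[k](obs), -1).gather(
                    1, act.unsqueeze(1)).squeeze(1)
            ratio = torch.exp(logp_i.detach() - logp_k)  # importance weight
            total = total - lam * (ratio * logp_i * adv).mean()
    return total

# Toy usage with random data standing in for sampled environment transitions.
batches = [(torch.randn(batch, obs_dim),
            torch.randint(n_actions, (batch,)),
            torch.randn(batch)) for _ in range(n_agents)]
actor_loss(0, batches).backward()
```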
With strong evidence that coordinated exploration can improve MARL algorithms, we continue to search for better ways to approach this problem. Current research directions include (but are not limited to) improving SEAC by relaxing its assumptions, gaining a better theoretical understanding of how coordination affects MARL, and combining our findings with current research on intrinsic exploration.
[1] Filippos Christianos, Lukas Schäfer, and Stefano V. Albrecht. "Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning". In: 34th Conference on Neural Information Processing Systems. 2020.