IPAB Workshop | Edinburgh Centre for Robotics

Speaker: Leonardo Hinckeldey

Title: Assistax: A Benchmark for Robust Reinforcement Learning in Collaborative Multi-Agent Settings

Abstract: Reinforcement learning (RL) holds great potential for building autonomous systems, yet its real-world application faces significant challenges. One critical hurdle is the need for RL agents to collaborate effectively with a diverse range of previously unseen partners, including humans – a property known as ad-hoc teamwork (AHT). Current multi-agent RL (MARL) algorithms often result in agents overfitting to their training partners, leading to poor performance when interacting with novel agents.

Despite the clear importance of robust, adaptable agents capable of on-the-fly cooperation in multi-agent environments, RL lacks rigorous benchmarks specifically targeting more complex, realistic tasks. To address this gap, we introduce Assistax, a novel simulation benchmark focused on real-world assisted healthcare tasks. Assistax provides a range of diverse pre-trained partner agents to evaluate the AHT capabilities of RL algorithms, offering a more comprehensive assessment of adaptability and collaboration. Leveraging JAX for acceleration, Assistax offers an efficient framework up to 150 times faster than comparable benchmark environments, making RL research in complex, realistic multi-agent scenarios more accessible to the research community.

Speaker: Hasra Dodampe Gamage

Title: Collaborate and Explain on-the-fly: Knowledge-based Reasoning and Interactive Learning in Ad Hoc Teamwork

Abstract: Ad hoc teamwork refers to the problem of enabling an agent, such as a software system or a robot, to collaborate with other agents without prior coordination. For such teamwork, the ad hoc agent needs the ability to adapt to previously unseen situations. State-of-the-art frameworks for ad hoc teamwork often pursue a data-driven approach, using substantial computation and a large labeled dataset of prior observations to model the behavior of other agents and to determine the ad hoc agent's behavior. These frameworks have difficulty accommodating rapid incremental revisions or transparency, and the necessary resources (e.g., training examples, computation) are not always readily available in many practical domains. We present an architecture that seeks to address these challenges by drawing on insights into human cognition. It embeds principles such as step-wise iterative refinement and ecological rationality that enable an ad hoc agent to perform non-monotonic logical reasoning with prior commonsense domain knowledge and models learned rapidly from limited examples to predict the behavior of other agents. We evaluate our architecture in simulated benchmark domains, demonstrating the ability to make decisions reliably and efficiently with orders of magnitude fewer resources than state-of-the-art frameworks. Furthermore, the ad hoc agent is able to incrementally acquire previously unknown domain knowledge governing actions and change while providing relational descriptions as on-demand explanations of its decisions in response to different types of questions.

________________________________________________________________________________
Microsoft Teams
Join the meeting now<https://teams.microsoft.com/l/meetup-join/19%3ameeting_NzQwOWU1NDQtOTZjOC00NTBmLWE2YzEtZjBhN2QzM2YzNDMy%40thread.v2/0?context=%7b%22Tid%22%3a%222e9f06b0-1669-4589-8789-10a06934dc61%22%2c%22Oid%22%3a%22e361e7b5-d66e-470e-9272-be8eb05a925e%22%7d>
Meeting ID: 332 563 314 549
Passcode: kVvq8S

Date:

Thursday, 31 October, 2024 - 13:00

Speaker:

Leonardo Hinckeldey & Hasra Dodampe Gamage

Location:

Informatics Forum, G.03