Multi-Agent Self-Driving (MASD) systems provide an effective solution for coordinating autonomous vehicles to reduce congestion and enhance both safety and operational efficiency in future intelligent transportation systems. Recently, Multi-Agent Reinforcement Learning (MARL) has emerged as a promising approach for developing advanced end-to-end MASD systems. However, achieving efficient collaboration in dynamic MASD systems remains a significant challenge due to the complex interactions among agents in dense scenarios. To address this, we propose a novel collaborative (CO-) interaction-aware (-IN) MARL framework, named COIN. Specifically, we propose a new counterfactual individual-global twin delayed deep deterministic policy gradient (CIG-TD3) algorithm, designed in a centralized training and decentralized execution (CTDE) manner, which aims to jointly optimize the individual objectives (navigation) and the global objectives (collaboration) of agents. To further enhance the optimization of global collaborative objectives, we introduce a novel dual-level interaction-aware centralized critic architecture that employs variational inference to identify local interactions between key pairs of agents through their impact on state transitions and rewards. We then combine agents' local interaction features with their state-action information and use an attention mechanism to capture global interactions across agents. This enables COIN to capture both micro-level agent interactions and macro-level system-wide interactions, yielding more accurate global value estimation, which in turn improves credit assignment and facilitates the learning of more efficient collaborative navigation strategies. We conduct extensive simulation experiments in dense urban traffic environments, including roundabouts, intersections, and bottlenecks, which demonstrate that COIN consistently outperforms other advanced baseline methods in both safety and efficiency across various system sizes.
These results highlight its superiority in complex and dynamic MASD scenarios, as further validated through real-world robot demonstrations.
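The abstract states that the centralized critic combines each agent's local interaction features with its state-action information and applies an attention mechanism across agents, but it does not give the exact architecture. The sketch below illustrates that global-interaction step under the assumption of standard scaled dot-product self-attention over agents. The function names, feature dimensions, random projection weights, and the linear value head are hypothetical illustrations rather than COIN's actual design, and the local interaction features are taken as given inputs instead of being produced by the variational-inference module.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_interaction_attention(sa_feats, local_feats, w_q, w_k, w_v):
    """Scaled dot-product self-attention across agents (hypothetical sketch).

    sa_feats:    (n_agents, d_sa) per-agent state-action embeddings
    local_feats: (n_agents, d_loc) per-agent local interaction features; in COIN
                 these come from a variational-inference module, here they are
                 simply passed in (assumption)
    w_q, w_k, w_v: (d_sa + d_loc, d_att) projection matrices
    Returns the attended features (n_agents, d_att) and the attention weights
    (n_agents, n_agents); row i shows how much agent i attends to each agent.
    """
    x = np.concatenate([sa_feats, local_feats], axis=1)  # fuse both feature types
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])  # pairwise agent-to-agent relevance
    weights = softmax(scores, axis=1)  # each row sums to 1
    return weights @ v, weights


def value_head(attended, w_out):
    # Hypothetical linear head: one scalar value estimate per agent.
    return attended @ w_out


# Usage with random inputs (4 agents; all dimensions are arbitrary choices).
rng = np.random.default_rng(0)
n_agents, d_sa, d_loc, d_att = 4, 6, 3, 8
sa = rng.normal(size=(n_agents, d_sa))
loc = rng.normal(size=(n_agents, d_loc))
w_q, w_k, w_v = (rng.normal(size=(d_sa + d_loc, d_att)) for _ in range(3))
attended, weights = global_interaction_attention(sa, loc, w_q, w_k, w_v)
values = value_head(attended, rng.normal(size=(d_att, 1)))
print(attended.shape, weights.shape, values.shape)  # (4, 8) (4, 4) (4, 1)
```

In this sketch, each agent's value estimate depends on every other agent's fused features through the attention weights, which is one common way to let a centralized critic account for system-wide interactions. Training these weights with the CIG-TD3 objective is outside the scope of this illustration.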
In the highly interactive (i.e., most challenging) intersection environment, COIN demonstrates superior control and collaboration capabilities, with almost no collisions and very few off-road incidents. By accurately modeling local and global interactions among vehicles during policy optimization, COIN achieves efficient and rational yielding and overtaking behaviors, thereby enhancing overall safety and traffic efficiency.
In the roundabout environment, COIN demonstrates strong stability and safety, maintaining smoother trajectories and more orderly traffic flow than the other baselines. In contrast, TraCo and CoPO often cause collisions or deviations when vehicles exit the roundabout, while IPPO and ITD3 exhibit poorer stability, leading to frequent collisions during navigation.
COIN effectively models both local and global interactions, allowing it to efficiently coordinate vehicle passing sequences and significantly reduce conflicts and congestion. In the later stages, when interactions become less frequent, COIN maintains highly consistent trajectories with very low error rates, highlighting its ability to effectively balance individual navigation and global cooperation objectives.
The demonstration shows the robots navigating efficiently to their destinations while cooperating to avoid collisions in a real-world intersection mock-up, highlighting COIN’s effectiveness in complex MASD systems and its potential for end-to-end navigation in realistic and challenging traffic environments.