A terrain-aware active gaze learning framework for agile and generalizable humanoid locomotion across gaps, sparse footholds, stairs, narrow beams, and outdoor terrain.
TAGA enables agile and robust humanoid locomotion across diverse challenging terrains. Deployed on a Unitree G1 with onboard Jetson Orin inference, the robot traverses gaps of up to 1.2 m, narrow beams, sparse stepping stones, stairs, and outdoor terrain.
Agile humanoid locomotion across diverse challenging terrain demands both wide perceptual coverage and precise local geometry understanding. Motivated by the way humans selectively look at relevant terrain during locomotion, we introduce TAGA, a Terrain-aware Active Gaze learning framework for Attention-based humanoid control. By fusing vision, proprioception, and motion commands, our framework guides the model to learn anticipatory cues and actively attend to specific areas of the height scan, passing only these informative regions to the downstream network. This adaptively increases the information density of observations under tight onboard computational constraints, thus enabling fine-grained perceptive locomotion over larger-scale terrains. We find that such gaze behaviors can emerge naturally through reinforcement learning alone, without additional supervision or explicit guidance, and that they significantly improve training efficiency. As a result, the trained policy demonstrates robust and generalizable locomotion in simulation and on hardware, including reliable terrain-aware foothold selection, elevated-platform traversal, competitive sparse-foothold traversal, and the largest reported real-world gap traversal distance of 1.2 m among perceptive humanoid locomotion systems, while maintaining stability under severe perceptual disturbances and environmental interference.
TAGA fuses a forward-facing depth image, proprioceptive history, and motion commands to predict a task-relevant gaze location on the height scan. The selected terrain patch increases local information density while keeping onboard computation compact.
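The gaze-and-crop step described above can be illustrated with a minimal sketch. It assumes the height scan is a 2D grid and the gaze location is an integer cell index. The function name `crop_patch`, the grid size, and the clamping at the borders are illustrative assumptions, not details taken from the TAGA implementation.

```python
from typing import List

Grid = List[List[float]]


def crop_patch(height_scan: Grid, gaze_row: int, gaze_col: int, size: int) -> Grid:
    """Return a size x size patch centered as closely as possible on the gaze cell.

    Hypothetical helper: the source does not specify how the patch is extracted.
    """
    rows, cols = len(height_scan), len(height_scan[0])
    if size > rows or size > cols:
        raise ValueError("patch is larger than the height scan")
    half = size // 2
    # Clamp the top-left corner so the patch never leaves the scan.
    top = min(max(gaze_row - half, 0), rows - size)
    left = min(max(gaze_col - half, 0), cols - size)
    return [row[left:left + size] for row in height_scan[top:top + size]]


# Toy 6 x 8 scan whose cell value encodes its own (row, col) as row * 10 + col.
scan = [[float(r * 10 + c) for c in range(8)] for r in range(6)]

# A gaze in the top-right corner is clamped so the patch stays inside the scan.
patch = crop_patch(scan, gaze_row=0, gaze_col=7, size=3)
# patch == [[5.0, 6.0, 7.0], [15.0, 16.0, 17.0], [25.0, 26.0, 27.0]]
```

In this sketch the downstream network only ever sees `size * size` cells, however large the full scan is. That is one plausible reading of how a small patch can cover a large terrain while keeping the observation compact.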
A visuomotor fusion encoder applies cross-attention over the cropped terrain region, producing a terrain-aware representation for the downstream locomotion policy. A mixture-of-experts action decoder then adapts control to diverse terrain conditions.
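As a rough illustration of the two components named above, the sketch below implements single-query scaled dot-product cross-attention over terrain patch features, followed by a densely gated mixture of expert action vectors. The page does not specify the attention variant, the gating scheme, or any dimensions. The function names, the absence of learned projections, and the soft (non-sparse) gating are therefore assumptions made for clarity.

```python
import math
from typing import List

Vec = List[float]


def softmax(xs: Vec) -> Vec:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(query: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """One query (e.g. a proprioception/command embedding) attends over
    terrain-cell features given as keys and values."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def mixture_of_experts(gate_logits: Vec, expert_actions: List[Vec]) -> Vec:
    """Blend per-expert action vectors with softmax gate weights (dense gating)."""
    gates = softmax(gate_logits)
    dim = len(expert_actions[0])
    return [sum(g * a[i] for g, a in zip(gates, expert_actions)) for i in range(dim)]


# Toy example: two terrain cells, two experts, two-dimensional features and actions.
terrain_feature = cross_attention(
    query=[0.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[2.0, 0.0], [0.0, 4.0]],
)
action = mixture_of_experts(
    gate_logits=[0.0, 0.0],
    expert_actions=[[1.0, -1.0], [3.0, 1.0]],
)
```

With a zero query and equal gate logits, both softmaxes are uniform, so the outputs reduce to plain averages of the values and of the expert actions. A trained model would instead concentrate its weights on the terrain cells and experts relevant to the current situation.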
TAGA is evaluated across simulated terrains and hardware deployments, including sparse footholds, stairs, beams, gaps, and outdoor environments.
Curriculum training covers challenging terrain families and supports robust skill acquisition before deployment-oriented randomization.
Hardware experiments demonstrate agile locomotion on a Unitree G1, including large gap traversal and sparse foothold selection.
The terrain set spans gaps, stairs, sparse stepping stones, narrow beams, slopes, and obstacle-crossing scenarios.
@article{li2026taga,
title={TAGA: Terrain-aware Active Gaze Learning for Generalizable Agile Humanoid Locomotion},
author={Li, Peizhuo and Li, Hongyi and Fan, Mingfeng and Xu, Fangzhou and Liao, Shuhao and Ma, Yuxuan and Zeng, Zicheng and Wang, Ze and Jin, Yongbin and Cao, Yuhong and Wang, Hongtao and Sartoretti, Guillaume},
journal={arXiv preprint},
year={2026}
}