TAGA: Terrain-aware Active Gaze Learning for Generalizable Agile Humanoid Locomotion

A terrain-aware active gaze learning framework for agile and generalizable humanoid locomotion across gaps, sparse footholds, stairs, narrow beams, and outdoor terrain.

Peizhuo LI¹*, Hongyi LI²*, Mingfeng FAN¹*, Fangzhou XU¹, Shuhao LIAO¹, Yuxuan MA¹, Zicheng ZENG³, Ze WANG², Yongbin JIN²†, Yuhong CAO¹†, Hongtao WANG², Guillaume SARTORETTI¹

¹MarmotLab, National University of Singapore; ²Center of X-Mechanics, Zhejiang University; ³South China University of Technology
*Equal contribution. †Corresponding authors. Under review.

Abstract

Agile humanoid locomotion across diverse, challenging terrain demands both wide perceptual coverage and precise understanding of local geometry. Motivated by the way humans selectively look at relevant terrain during locomotion, we introduce TAGA, a Terrain-aware Active Gaze learning framework for Attention-based humanoid control. By fusing vision, proprioception, and motion commands, the framework learns anticipatory cues, actively attends to specific areas of the height scan, and passes only these informative regions to the downstream network. This adaptively increases the information density of observations under tight onboard computational constraints, enabling fine-grained perceptive locomotion over larger-scale terrains. We find that such gaze behaviors emerge naturally through reinforcement learning alone, without additional supervision or explicit guidance, and that they significantly improve training efficiency. As a result, the trained policy demonstrates robust and generalizable locomotion in simulation and on hardware, including reliable terrain-aware foothold selection, elevated-platform traversal, competitive sparse-foothold traversal, and the largest reported real-world gap traversal distance of 1.2 m among perceptive humanoid locomotion systems, while maintaining stability under severe perceptual disturbances and environmental interference.

Method Overview

Active gaze for terrain-aware control

TAGA fuses a forward-facing depth image, proprioceptive history, and motion commands to predict a task-relevant gaze location on the height scan. The selected terrain patch increases local information density while keeping onboard computation compact.
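
To make the gaze-and-crop step concrete, the following is a minimal sketch of how a predicted gaze location might select a patch of the height scan. It is an illustration under stated assumptions, not the authors' implementation: the function name `crop_gaze_patch`, the normalized `(u, v)` gaze convention, the square patch shape, and the row-major grid layout are all hypothetical, and the gaze point here is supplied by hand rather than predicted by a network.

```python
def crop_gaze_patch(height_scan, gaze_uv, patch_size):
    """Crop a square patch of a 2D height scan around a gaze point.

    Assumptions (not from the paper):
      height_scan: list of rows, each a list of terrain heights.
      gaze_uv: (u, v) in [0, 1]; u runs along columns, v along rows.
      patch_size: side length of the patch, in grid cells.
    Returns the patch and the (row, col) index of its top-left cell.
    """
    rows, cols = len(height_scan), len(height_scan[0])
    if patch_size > rows or patch_size > cols:
        raise ValueError("patch_size exceeds the height scan dimensions")

    # Clamp the gaze to the valid range so out-of-range predictions stay usable.
    u = min(max(gaze_uv[0], 0.0), 1.0)
    v = min(max(gaze_uv[1], 0.0), 1.0)

    # Map the normalized gaze to the nearest grid cell (round half up).
    center_row = int(v * (rows - 1) + 0.5)
    center_col = int(u * (cols - 1) + 0.5)

    # Shift the window inward at the borders so the patch is always full size.
    top = min(max(center_row - patch_size // 2, 0), rows - patch_size)
    left = min(max(center_col - patch_size // 2, 0), cols - patch_size)

    patch = [row[left:left + patch_size]
             for row in height_scan[top:top + patch_size]]
    return patch, (top, left)
```

Calling `crop_gaze_patch(scan, (0.5, 0.5), 3)` on a 6 x 8 grid returns the 3 x 3 block around the grid center. Shifting the window at the borders, rather than zero-padding, is one possible design choice; it keeps every returned cell a real terrain measurement.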

A visuomotor fusion encoder applies cross-attention over the cropped terrain region, producing a terrain-aware representation for the downstream locomotion policy. A mixture-of-experts action decoder then adapts control to diverse terrain conditions.
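
The two components above can be sketched in a few lines of plain Python. This is a simplified illustration, assuming a single attention head, a single query vector (for example, an embedding of proprioception and commands) attending over terrain-patch tokens, and a dense softmax-gated mixture of linear experts. The page does not specify these details, so the helper names `cross_attention` and `moe_decode`, the choice of query, and the gating scheme are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, vec) for row in matrix]


def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention of one query over N tokens.

    Assumption: keys/values come from terrain-patch tokens and the query
    from the robot's state embedding; the paper may arrange this differently.
    """
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, key) * scale for key in keys])
    dim = len(values[0])
    context = [sum(w * val[i] for w, val in zip(weights, values))
               for i in range(dim)]
    return context, weights


def moe_decode(feature, gate_matrix, expert_matrices):
    """Dense mixture of linear experts: every expert runs, and a softmax
    gate blends their outputs (a hypothetical gating scheme)."""
    gates = softmax(matvec(gate_matrix, feature))
    outputs = [matvec(weights, feature) for weights in expert_matrices]
    action_dim = len(outputs[0])
    action = [sum(g * out[i] for g, out in zip(gates, outputs))
              for i in range(action_dim)]
    return action, gates
```

In this sketch the attention weights indicate which patch tokens contribute most to the terrain-aware feature, and the gate values indicate how strongly each expert shapes the final action. A real policy would use learned projections, multiple heads, and nonlinear experts.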

Simulation and Real-World Results

TAGA is evaluated across simulated terrains and hardware deployments, including sparse footholds, stairs, beams, gaps, and outdoor environments.

Simulation Evaluation

Curriculum training covers challenging terrain families and supports robust skill acquisition before deployment-oriented randomization.

Real-World Evaluation

Hardware experiments demonstrate agile locomotion on a Unitree G1, including large gap traversal and sparse foothold selection.

Terrain Diversity

The terrain set spans gaps, stairs, sparse stepping stones, narrow beams, slopes, and obstacle-crossing scenarios.

BibTeX

@article{li2026taga,
  title={TAGA: Terrain-aware Active Gaze Learning for Generalizable Agile Humanoid Locomotion},
  author={Li, Peizhuo and Li, Hongyi and Fan, Mingfeng and Xu, Fangzhou and Liao, Shuhao and Ma, Yuxuan and Zeng, Zicheng and Wang, Ze and Jin, Yongbin and Cao, Yuhong and Wang, Hongtao and Sartoretti, Guillaume},
  journal={arXiv preprint},
  year={2026}
}