TAGA: Terrain-aware Active Gaze Learning for Generalizable Agile Humanoid Locomotion

A terrain-aware active gaze learning framework for agile and generalizable humanoid locomotion across gaps, sparse footholds, stairs, narrow beams, and outdoor terrain.

Peizhuo LI1,*, Hongyi LI2,*, Mingfeng FAN1,*, Fangzhou XU1, Shuhao LIAO1, Yuxuan MA1, Zicheng ZENG3, Ze WANG2, Yongbin JIN2,†, Yuhong CAO1,†, Hongtao WANG2, Guillaume SARTORETTI1
1MarmotLab, National University of Singapore    2Center of X-Mechanics, Zhejiang University    3South China University of Technology
*Equal contribution    †Corresponding authors    Under Review

TAGA enables agile and robust humanoid locomotion across diverse challenging terrains. Deployed on a Unitree G1 with onboard Jetson Orin inference, the robot traverses gaps of up to 1.2 m, narrow beams, sparse stepping stones, stairs, and outdoor terrain.

Abstract

Agile humanoid locomotion across diverse challenging terrain demands both wide perceptual coverage and precise understanding of local geometry. Motivated by the way humans selectively look at relevant terrain during locomotion, we introduce TAGA, a Terrain-aware Active Gaze learning framework for Attention-based humanoid control. By fusing vision, proprioception, and motion commands, our framework guides the model to learn anticipatory cues and to actively attend to specific areas of the height scan, selectively passing these informative regions to the downstream network. This adaptively increases the information density of observations under tight onboard computational constraints, enabling fine-grained perceptive locomotion over larger-scale terrain. We find that such gaze behaviors emerge naturally through reinforcement learning alone, without additional supervision or explicit guidance, and that they significantly improve training efficiency. As a result, the trained policy demonstrates robust and generalizable locomotion in simulation and on hardware, including reliable terrain-aware foothold selection, elevated-platform traversal, competitive sparse-foothold traversal, and a real-world gap traversal of 1.2 m, the largest reported among perceptive humanoid locomotion systems, while maintaining stability under severe perceptual disturbances and environmental interference.

Method Overview

[Figure: TAGA architecture overview]

Active gaze for terrain-aware control

TAGA fuses a forward-facing depth image, proprioceptive history, and motion commands to predict a task-relevant gaze location on the height scan. The selected terrain patch increases local information density while keeping onboard computation compact.
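The gaze-and-crop step can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the following: the height scan is a 2D grid; a hypothetical linear `predict_gaze` head maps fused depth, proprioception, and command features to a normalized (row, col) gaze point in [-1, 1]; and a hypothetical `crop_patch` cuts a fixed-size square patch around that point. The real gaze head, scan resolution, and patch size are not specified here, so all sizes below are placeholders.

```python
import numpy as np


def predict_gaze(features: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Hypothetical gaze head: linear layer + tanh, mapping fused
    (depth, proprioception, command) features to a normalized
    (row, col) gaze point in [-1, 1]."""
    return np.tanh(features @ W + b)


def crop_patch(height_scan: np.ndarray, gaze: np.ndarray, patch: int) -> np.ndarray:
    """Return a patch x patch window of the height scan centred on the gaze
    point; the window is shifted inward so it never leaves the scan."""
    rows, cols = height_scan.shape
    # Map normalized gaze coordinates in [-1, 1] to integer grid indices.
    r = int(round((float(gaze[0]) + 1.0) / 2.0 * (rows - 1)))
    c = int(round((float(gaze[1]) + 1.0) / 2.0 * (cols - 1)))
    half = patch // 2
    # Clamp the top-left corner so the whole window stays inside the grid.
    r0 = min(max(r - half, 0), rows - patch)
    c0 = min(max(c - half, 0), cols - patch)
    return height_scan[r0:r0 + patch, c0:c0 + patch]


rng = np.random.default_rng(0)
scan = rng.normal(size=(64, 64))           # large height scan (hypothetical size)
features = rng.normal(size=32)             # fused observation features (hypothetical)
W, b = 0.1 * rng.normal(size=(32, 2)), np.zeros(2)

gaze = predict_gaze(features, W, b)
local = crop_patch(scan, gaze, patch=16)   # only this 16x16 patch goes downstream
print(gaze, local.shape)
```

Because the crop uses integer indexing, it is not differentiable with respect to the gaze point. One way to learn the gaze with reinforcement learning alone, assumed here rather than taken from the paper, is to treat the gaze location as an extra output of the policy's action distribution so that the reward shapes it directly.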

A visuomotor fusion encoder applies cross-attention over the cropped terrain region, producing a terrain-aware representation for the downstream locomotion policy. A mixture-of-experts action decoder then adapts control to diverse terrain conditions.
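The fusion encoder and decoder can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not TAGA's actual architecture:

- The cropped patch is split into non-overlapping sub-patch tokens by a hypothetical `patch_to_tokens`.
- A single fused query vector (proprioception, commands, and depth features) attends over those tokens with single-head scaled dot-product cross-attention.
- The mixture-of-experts decoder is a softmax gate over linear experts.

All dimensions and the random weights are placeholders.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def patch_to_tokens(patch: np.ndarray, k: int) -> np.ndarray:
    """Split an (H, W) terrain patch into non-overlapping k x k tokens,
    returned row-major as an (H*W / k^2, k*k) array."""
    h, w = patch.shape
    return patch.reshape(h // k, k, w // k, k).transpose(0, 2, 1, 3).reshape(-1, k * k)


def cross_attention(query, tokens, Wq, Wk, Wv):
    """Single-head cross-attention: one fused query attends over terrain tokens."""
    q = query @ Wq                                      # (d,)
    keys = tokens @ Wk                                  # (N, d)
    values = tokens @ Wv                                # (N, d)
    weights = softmax(keys @ q / np.sqrt(q.shape[0]))   # (N,) attention over tokens
    return weights @ values, weights                    # (d,), (N,)


def moe_decoder(z, gate_W, expert_Ws):
    """Softmax-gated mixture of linear experts producing one action vector."""
    gate = softmax(z @ gate_W)                        # (E,) expert weights
    outputs = np.stack([z @ W for W in expert_Ws])    # (E, A) per-expert actions
    return gate @ outputs, gate                       # (A,), (E,)


rng = np.random.default_rng(0)
Q_DIM, D, N_EXPERTS, ACT_DIM = 48, 32, 4, 12   # hypothetical sizes

patch = rng.normal(size=(16, 16))               # cropped terrain patch
tokens = patch_to_tokens(patch, k=4)            # 16 tokens, 16 features each
query = rng.normal(size=Q_DIM)                  # fused proprio/command/depth feature

terrain_feat, attn = cross_attention(
    query, tokens,
    rng.normal(size=(Q_DIM, D)), rng.normal(size=(16, D)), rng.normal(size=(16, D)),
)
h = np.concatenate([terrain_feat, query])       # decoder input
action, gate = moe_decoder(
    h, rng.normal(size=(D + Q_DIM, N_EXPERTS)),
    [rng.normal(size=(D + Q_DIM, ACT_DIM)) for _ in range(N_EXPERTS)],
)
print(action.shape, attn.round(3), gate.round(3))
```

In this sketch, a single fused query keeps the attention cost linear in the number of terrain tokens. The soft gate lets different experts specialize for different terrain conditions while the whole decoder stays differentiable.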

Simulation and Real-World Results

TAGA is evaluated across simulated terrains and hardware deployments, including sparse footholds, stairs, beams, gaps, and outdoor environments.


Simulation Evaluation

Curriculum training covers challenging terrain families and supports robust skill acquisition before deployment-oriented randomization.
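A terrain curriculum of this kind is often implemented as per-robot difficulty levels that move up or down with episode performance. The sketch below assumes such a level-based scheme. The promotion and demotion thresholds are hypothetical and are expressed as fractions of the terrain tile length. The actual TAGA curriculum rules are not described on this page.

```python
from dataclasses import dataclass


@dataclass
class TerrainCurriculum:
    """Hypothetical level-based terrain curriculum for one terrain family."""
    num_levels: int
    promote_frac: float = 0.5   # walked more than this fraction of the tile: harder
    demote_frac: float = 0.25   # walked less than this fraction of the tile: easier

    def update(self, level: int, distance: float, tile_length: float) -> int:
        """Return the next difficulty level given the distance covered in an episode."""
        if distance > self.promote_frac * tile_length:
            return min(level + 1, self.num_levels - 1)
        if distance < self.demote_frac * tile_length:
            return max(level - 1, 0)
        return level


cur = TerrainCurriculum(num_levels=10)
levels = [0, 3, 9, 5]                    # current level of four robots
distances = [5.0, 1.0, 7.0, 3.0]         # distance walked this episode (m)
next_levels = [cur.update(l, d, tile_length=8.0) for l, d in zip(levels, distances)]
print(next_levels)  # [1, 2, 9, 5]
```

In this sketch, one `TerrainCurriculum` would be kept per terrain family, so progress on stairs does not depend on progress on gaps. Deployment-oriented randomization could then be layered on top once skills are acquired.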


Real-World Evaluation

Hardware experiments demonstrate agile locomotion on a Unitree G1, including large gap traversal and sparse foothold selection.

[Figure: TAGA terrain set]

Terrain Diversity

The terrain set spans gaps, stairs, sparse stepping stones, narrow beams, slopes, and obstacle-crossing scenarios.

BibTeX

@article{li2026taga,
  title={TAGA: Terrain-aware Active Gaze Learning for Generalizable Agile Humanoid Locomotion},
  author={Li, Peizhuo and Li, Hongyi and Fan, Mingfeng and Xu, Fangzhou and Liao, Shuhao and Ma, Yuxuan and Zeng, Zicheng and Wang, Ze and Jin, Yongbin and Cao, Yuhong and Wang, Hongtao and Sartoretti, Guillaume},
  journal={arXiv preprint},
  year={2026}
}