KiVi: Kinesthetic-Visuospatial Integration for Dynamic and Safe Egocentric Legged Locomotion

Li, Peizhuo; Li, Hongyi; Ma, Yuxuan; Chang, Linnan; Yang, Xinrong; Yu, Ruiqi; Liao, Shuhao; Zhang, Yifeng; Cao, Yuhong; Zhu, Qiuguo; Sartoretti, Guillaume

A bio-inspired quadruped locomotion framework that separates proprioceptive kinesthetics from visuospatial terrain reasoning, enabling dynamic traversal and graceful fallback under corrupted or unavailable vision.

Peizhuo Li¹*, Hongyi Li¹,²*, Yuxuan Ma¹*, Linnan Chang¹, Xinrong Yang¹, Ruiqi Yu³, Shuhao Liao¹, Yifeng Zhang¹, Yuhong Cao¹†, Qiuguo Zhu³, Guillaume Sartoretti¹

¹MARMot Lab, National University of Singapore
²Center of X-Mechanics, Zhejiang University
³Robot and Robot Intelligence Lab, Zhejiang University
*Equal contribution. †Corresponding author.
IROS 2026

Abstract

Vision-based locomotion has shown great promise in enabling legged robots to perceive and adapt to complex environments. However, visual information is inherently fragile, being vulnerable to occlusions, reflections, and lighting changes, which often cause instability in locomotion. Inspired by animal sensorimotor integration, we propose KiVi, a Kinesthetic-Visuospatial integration framework, where kinesthetics encodes proprioceptive sensing of body motion and visuospatial reasoning captures visual perception of surrounding terrain. KiVi separates these pathways, leveraging proprioception as a stable backbone while selectively incorporating vision for terrain awareness and obstacle avoidance. Combined with memory-enhanced attention, this design allows robust interpretation of visual cues while maintaining fallback stability through proprioception. Experiments show that KiVi enables quadruped robots to traverse diverse terrains and operate reliably in unstructured outdoor environments, remaining robust to out-of-distribution visual noise and occlusion unseen during training.

Method Overview

Modality-separated robust control

KiVi uses a dual-branch estimator with a Kinesthetic Module for proprioceptive body-motion sensing and a Visuospatial Module for visual terrain reasoning.

The kinesthetic branch provides a stable locomotion backbone, while the visuospatial branch uses memory-enhanced attention to reconstruct terrain structure and anticipate obstacles. Their latent representations are integrated by the downstream actor for dynamic, terrain-aware control.
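As a rough illustration of the modality separation described above, the following sketch encodes proprioceptive and visual inputs in two independent branches and hands the concatenated latents to an actor. It is not the authors' implementation: the single tanh layers stand in for the real modules (the visuospatial branch in KiVi uses memory-enhanced attention), and every name and dimension here (`DualBranchPolicy`, 45 proprioceptive inputs, 64 visual features, 16-dimensional latents, 12 joint targets) is a hypothetical choice made for the example.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, bias) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(x, weights, bias):
    """Apply a dense layer to a plain Python list."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class Encoder:
    """Placeholder one-layer encoder (assumption: stands in for a real module)."""

    def __init__(self, n_in, n_latent, rng):
        self.weights, self.bias = make_layer(n_in, n_latent, rng)

    def __call__(self, x):
        return [math.tanh(v) for v in linear(x, self.weights, self.bias)]


class DualBranchPolicy:
    """Two separate encoders whose latents are concatenated for the actor."""

    def __init__(self, n_proprio, n_visual, n_latent, n_actions, seed=0):
        rng = random.Random(seed)
        self.kinesthetic = Encoder(n_proprio, n_latent, rng)    # body-motion branch
        self.visuospatial = Encoder(n_visual, n_latent, rng)    # terrain branch
        self.actor_w, self.actor_b = make_layer(2 * n_latent, n_actions, rng)

    def encode(self, proprio, visual):
        # Each branch only ever sees its own modality.
        return self.kinesthetic(proprio), self.visuospatial(visual)

    def act(self, proprio, visual):
        z_k, z_v = self.encode(proprio, visual)
        # List concatenation joins the two latents before the actor head.
        return linear(z_k + z_v, self.actor_w, self.actor_b)


policy = DualBranchPolicy(n_proprio=45, n_visual=64, n_latent=16, n_actions=12)
action = policy.act([0.0] * 45, [0.0] * 64)
```

Because the branches share no weights, changing the visual input in this sketch leaves the kinesthetic latent untouched, which is the property the separation is meant to provide.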

Single-stage actor-critic training with privileged critic information.
MemTransformer-based temporal memory for visual terrain understanding.
Graceful fallback to proprioception when vision is unreliable.
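To make the idea of temporal memory for visual features more concrete, here is a minimal sketch of generic scaled dot-product attention over a fixed-length buffer of past feature vectors. It is not the MemTransformer architecture itself, whose details are not given on this page; the `FeatureMemory` class, its capacity, and the use of the current feature as the query are all assumptions made for illustration.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention; memory entries serve as keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in memory]
    weights = softmax(scores)
    context = [
        sum(w * entry[i] for w, entry in zip(weights, memory))
        for i in range(len(query))
    ]
    return context, weights


class FeatureMemory:
    """Rolling buffer of recent visual features (hypothetical helper)."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop out automatically

    def update(self, feature):
        # Store the newest feature, then attend over everything remembered.
        self.buffer.append(list(feature))
        return attend(feature, self.buffer)


memory = FeatureMemory(capacity=3)
for frame_feature in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]):
    context, weights = memory.update(frame_feature)
```

The returned `context` blends the current feature with remembered ones, so a single corrupted frame contributes only part of the summary passed downstream. How KiVi weights or filters unreliable frames is learned and is not captured by this sketch.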

Simulation and Real-World Results

KiVi is evaluated in simulation and on DeepRobotics Lite3 hardware across visual corruption, terrain traversability, and outdoor disturbance tests.

Diverse Terrain Curriculum

Training spans stairs, platforms, random rough terrain, slopes, gaps, and high walls with increasing procedural difficulty.
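The page does not state how difficulty is increased, so the sketch below borrows a rule that is common in legged-locomotion reinforcement learning: promote a robot to a harder level when it covers most of its terrain patch, and demote it when it covers little. The terrain names come from the list above; the class name, the ten levels, and the 0.8 and 0.4 thresholds are hypothetical values chosen only for this example.

```python
from dataclasses import dataclass

# Terrain families named in the text above.
TERRAIN_TYPES = ("stairs", "platforms", "random rough", "slopes", "gaps", "high walls")


@dataclass
class TerrainCurriculum:
    """Per-robot difficulty level with promote/demote thresholds (assumed values)."""

    max_level: int = 9
    promote_fraction: float = 0.8   # assumption: progress needed to move up
    demote_fraction: float = 0.4    # assumption: progress below which we move down
    level: int = 0

    def update(self, distance_travelled: float, terrain_length: float) -> int:
        """Adjust the level after one episode and return the new level."""
        progress = distance_travelled / terrain_length
        if progress >= self.promote_fraction:
            self.level = min(self.level + 1, self.max_level)
        elif progress < self.demote_fraction:
            self.level = max(self.level - 1, 0)
        return self.level

    def difficulty(self) -> float:
        """Normalized difficulty in [0, 1], usable to scale terrain parameters."""
        return self.level / self.max_level


curriculum = TerrainCurriculum()
history = [
    curriculum.update(distance_travelled=d, terrain_length=8.0)
    for d in (7.0, 4.0, 1.0, 1.0)
]
```

A procedural generator could map `difficulty()` to quantities such as step height or gap width for each entry in `TERRAIN_TYPES`; that mapping is left out because the page gives no values for it.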

Outdoor Traversability

With a constant forward command, the robot traverses tree roots, stairs, elevated platforms, and dynamic pedestrian scenarios.

[Figure: KiVi under tall grass and camera occlusion]

Visual Robustness

Under tall grass and complete camera occlusion, KiVi maintains stable locomotion by falling back to proprioceptive control.

Real-World Visual Disturbances

Reflective surfaces create structured depth artifacts, yet KiVi maintains stable locomotion.

Key findings

5/5 success on high platforms, obstacle avoidance, tall grass, and camera-blocking tests.
Robust to out-of-distribution visual noise such as reflections, vegetation, and complete camera occlusion.
Runs onboard with depth acquisition at 10 Hz, policy inference at 50 Hz, and low-level PD control at 200 Hz.
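The three rates quoted above imply a nested schedule: at 200 Hz, the policy runs on every 4th control tick and a new depth frame arrives on every 20th, so each depth frame is reused for 5 consecutive policy steps. The sketch below counts those events and includes a standard joint-level PD law. The function names and the gains in the usage line are illustrative assumptions, not values from the paper.

```python
CONTROL_HZ = 200   # low-level PD control rate (from the text)
POLICY_HZ = 50     # policy inference rate (from the text)
DEPTH_HZ = 10      # depth acquisition rate (from the text)


def pd_torque(q_target, q, qd, kp, kd):
    """Standard PD law: pull the joint toward q_target and damp its velocity."""
    return kp * (q_target - q) - kd * qd


def run_schedule(duration_s):
    """Count how often each loop fires when driven by the 200 Hz control tick."""
    policy_every = CONTROL_HZ // POLICY_HZ   # 4 control ticks per policy step
    depth_every = CONTROL_HZ // DEPTH_HZ     # 20 control ticks per depth frame
    counts = {"depth": 0, "policy": 0, "pd": 0}
    for tick in range(round(duration_s * CONTROL_HZ)):
        if tick % depth_every == 0:
            counts["depth"] += 1    # a new depth frame would be read here
        if tick % policy_every == 0:
            counts["policy"] += 1   # the policy would emit new joint targets here
        counts["pd"] += 1           # PD tracking of the latest targets, every tick
    return counts


counts = run_schedule(1.0)
torque = pd_torque(q_target=0.5, q=0.3, qd=1.0, kp=20.0, kd=0.5)  # hypothetical gains
```

Over one second this yields 10 depth reads, 50 policy steps, and 200 PD updates, matching the quoted rates. On real hardware these loops would typically run in separate threads or processes rather than in one counter-driven loop.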

BibTeX

@inproceedings{li2026kivi,
  title={KiVi: Kinesthetic-Visuospatial Integration for Dynamic and Safe Egocentric Legged Locomotion},
  author={Li, Peizhuo and Li, Hongyi and Ma, Yuxuan and Chang, Linnan and Yang, Xinrong and Yu, Ruiqi and Liao, Shuhao and Zhang, Yifeng and Cao, Yuhong and Zhu, Qiuguo and Sartoretti, Guillaume},
  booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2026},
  eprint={2509.23650},
  archivePrefix={arXiv},
  primaryClass={cs.RO}
}
