Sampling-Based System Identification with Active Exploration for Legged Robot Sim2Real Learning

Abstract

Sim-to-real discrepancies prevent learning-based policies from performing high-precision tasks in the real world. While Domain Randomization (DR) is commonly used to bridge this gap, it often relies on heuristics and, when not properly tuned, can lead to overly conservative policies with degraded performance. System Identification (Sys-ID) offers a targeted approach, but standard techniques rely on differentiable dynamics and/or direct torque measurement, assumptions that rarely hold for contact-rich legged systems. To this end, we present Sampling-based Parameter Identification with Active Exploration (SPI-Active), a two-stage framework that estimates physical parameters of legged robots to minimize the sim-to-real gap. SPI-Active robustly identifies key physical parameters through massively parallel sampling, minimizing state prediction errors between simulated and real-world trajectories. To further improve the informativeness of collected data, we introduce an active exploration strategy that maximizes the Fisher Information of the collected real-world trajectories by optimizing the input commands of an exploration policy. This targeted exploration leads to accurate identification and better generalization across diverse tasks. Experimental results demonstrate that SPI-Active enables precise sim-to-real transfer of learned policies, outperforming baselines by 42-63% in various locomotion tasks.

Method

SPI-Active is a two-stage framework that estimates physical parameters of legged robots to minimize the sim-to-real gap. The first stage collects real-world trajectories using RL policies or motion priors, then estimates the desired parameters with sampling-based optimization. The second stage refines the stage 1 parameters: the input commands of a multi-behavioral policy are optimized to maximize Fisher Information, so that the collected data is more informative. Training with the parameters identified by SPI-Active leads to improved sim-to-real transfer in downstream tasks.
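The first stage can be illustrated with a minimal sketch, under heavy simplifying assumptions. The code below is not the authors' implementation: the `simulate` function is a hypothetical toy stand-in for a physics simulator (a 1-D point mass with viscous damping), the "real" trajectory is generated from the same toy model, and a cross-entropy-style sampling loop is assumed as one possible form of sampling-based optimization. All names and hyperparameters are illustrative.

```python
import numpy as np

def simulate(params, controls, dt=0.02):
    """Hypothetical toy simulator: 1-D point mass with viscous damping.

    params has shape (n_candidates, 2) with columns [mass, damping].
    All candidates are rolled out in parallel (vectorized over rows).
    Returns states of shape (n_candidates, T, 2) holding [position, velocity].
    """
    mass, damping = params[:, 0], params[:, 1]
    pos = np.zeros(len(params))
    vel = np.zeros(len(params))
    states = []
    for u in controls:
        acc = (u - damping * vel) / mass
        vel = vel + dt * acc
        pos = pos + dt * vel
        states.append(np.stack([pos, vel], axis=-1))
    return np.stack(states, axis=1)

def identify(real_states, controls, low, high,
             n_samples=512, n_elite=32, n_iters=30, seed=0):
    """Sampling-based identification (cross-entropy-style, an assumption).

    Repeatedly samples candidate parameters, scores each by the mean squared
    state prediction error against the recorded trajectory, and refits the
    sampling distribution to the lowest-error candidates.
    """
    rng = np.random.default_rng(seed)
    mean = 0.5 * (low + high)
    std = 0.5 * (high - low)
    for _ in range(n_iters):
        cand = rng.normal(mean, std, size=(n_samples, len(mean)))
        cand = np.clip(cand, low, high)  # keep parameters physically plausible
        pred = simulate(cand, controls)
        cost = np.mean((pred - real_states[None]) ** 2, axis=(1, 2))
        elite = cand[np.argsort(cost)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6  # small floor avoids a zero-width sampler
    return mean

# Stand-in for a recorded real-world trajectory (generated by the toy model).
t = np.arange(500) * 0.02
controls = 2.0 + 3.0 * np.sin(2.0 * np.pi * 0.5 * t)
true_params = np.array([[2.0, 0.5]])
real_states = simulate(true_params, controls)[0]

estimate = identify(real_states, controls,
                    low=np.array([0.5, 0.05]), high=np.array([5.0, 2.0]))
print(estimate)
```

Because the candidates are only evaluated through forward rollouts, this kind of loop needs neither gradients of the dynamics nor torque measurements, which is the property the method relies on for contact-rich systems.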

Performance in Sim-to-real Locomotion Tasks

We compare the real-world performance of SPI-Active with the Vanilla baseline on the following tasks: Forward Jump, Yaw Jump, Velocity Tracking, and Attitude Tracking.

Forward Jump (Vanilla)

Forward Jump (SPI-Active)

Yaw Jump (Vanilla)

Yaw Jump (SPI-Active)

Velocity Tracking (Vanilla)

Velocity Tracking (SPI-Active)

Attitude Tracking (Vanilla)

Attitude Tracking (SPI-Active)

Generalization to Humanoids

Velocity Tracking (Vanilla)

Velocity Tracking (SPI-Active)

Open-Loop Weave Pole Navigation

Vanilla

SPI-Active

Data Collection with Active Exploration
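The idea of choosing input commands that maximize Fisher Information can be sketched on a toy problem. The code below is an illustration under stated assumptions, not the authors' implementation: the simulator is a hypothetical 1-D point mass, observation noise is assumed Gaussian with a fixed standard deviation, state sensitivities are approximated by central finite differences, the command is a hypothetical (offset, amplitude, frequency) triple for an open-loop force signal, and the log-determinant of the Fisher Information Matrix is assumed as the scalar objective. The source does not specify these choices here.

```python
import numpy as np

def simulate(params, controls, dt=0.02):
    """Hypothetical toy simulator: 1-D point mass, params = [mass, damping]."""
    mass, damping = params
    pos, vel = 0.0, 0.0
    states = []
    for u in controls:
        vel += dt * (u - damping * vel) / mass
        pos += dt * vel
        states.append((pos, vel))
    return np.asarray(states)  # shape (T, 2)

def fisher_information(params, controls, noise_std=0.01, eps=1e-4):
    """FIM = J^T J / sigma^2, with J the sensitivity of all states to params.

    Assumes independent Gaussian observation noise with std noise_std.
    J is approximated with central finite differences through the simulator.
    """
    params = np.asarray(params, dtype=float)
    cols = []
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = eps
        diff = simulate(params + step, controls) - simulate(params - step, controls)
        cols.append(diff.reshape(-1) / (2.0 * eps))
    jac = np.stack(cols, axis=1)  # shape (T * 2, n_params)
    return jac.T @ jac / noise_std ** 2

def make_controls(command, n_steps=500, dt=0.02):
    """Map a command (offset, amplitude, frequency in Hz) to a force signal."""
    offset, amplitude, freq_hz = command
    t = np.arange(n_steps) * dt
    return offset + amplitude * np.sin(2.0 * np.pi * freq_hz * t)

def best_command(params, n_samples=128, seed=0):
    """Random search over bounded commands for the largest log det(FIM)."""
    rng = np.random.default_rng(seed)
    low = np.array([0.0, 0.0, 0.1])
    high = np.array([2.0, 3.0, 2.0])
    commands = rng.uniform(low, high, size=(n_samples, 3))
    scores = [
        np.linalg.slogdet(fisher_information(params, make_controls(c)))[1]
        for c in commands
    ]
    best = int(np.argmax(scores))
    return commands[best], scores[best]

# Evaluate informativeness around a hypothetical stage 1 estimate.
stage1_params = np.array([1.8, 0.6])
command, score = best_command(stage1_params)
print(command, score)
```

Data collected with the selected command would then be fed back into the identification step to refine the parameter estimate. The command bounds matter: in this toy model the information grows with input magnitude, so without limits the search would simply push the commands as large as possible.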

BibTeX

@article{sobanbabu2025spiactive,
      title={Sampling-Based System Identification with Active Exploration for Legged Robot Sim2Real Learning}, 
      author={Nikhil Sobanbabu and Guanqi He and Tairan He and Yuxiang Yang and Guanya Shi},
      year={2025},
      url={https://arxiv.org/abs/2505.14266}, 
}