Full-Order Sampling-Based MPC for Torque-Level Locomotion Control via Diffusion-Style Annealing

* Equal contribution
Carnegie Mellon University

Overview Video

Highlights

Go2 Tracking (Real-World)

Go2 Precise Jump (Real-World)

Go2 Jump (Real-world, viz in sim)

H1 Jogging (Sim)

Go2 Climb (Sim)

H1 Push Light Box (Sim)

Go2 Tracking (Real-world, viz in sim)

Go2 Tracking Forward (Sim)

Go2 Tracking Backward (Sim)

Abstract

Due to high dimensionality and non-convexity, real-time optimal control using full-order dynamics models for legged robots is challenging. Therefore, Nonlinear Model Predictive Control (NMPC) approaches are often limited to reduced-order models or local approximations of full models. Sampling-based MPC has shown potential in nonconvex and even discontinuous problems, but often yields suboptimal solutions with high variance, which limits its applications in high-dimensional locomotion. This work introduces DIAL-MPC (Diffusion-Inspired Annealing for Legged MPC), a sampling-based MPC framework with a novel diffusion-style annealing process. This annealing process is supported by our theoretical landscape analysis of Model Predictive Path Integral Control (MPPI) and the connection between MPPI and single-step diffusion. Algorithmically, DIAL-MPC iteratively refines solutions online and achieves both global coverage and local convergence. In quadrupedal torque-level control tasks, DIAL-MPC reduces the tracking error of standard MPPI by 13.4 times and outperforms reinforcement learning (RL) policies by 40% in challenging jumping tasks without any training. In particular, DIAL-MPC enables precise real-world quadrupedal jumping with payload. To the best of our knowledge, DIAL-MPC is the first training-free method that optimizes directly over full-order quadruped dynamics in real time.

Teaser

DIAL-MPC is a sampling-based MPC framework with a novel diffusion-style annealing process.

What is DIAL-MPC?

DIAL-MPC is a training-free full-order torque-level legged robot controller: DIAL-MPC is an MPC framework that can optimize over full-order legged robot dynamics in real-time without heavy assumptions on dynamics and cost functions (i.e., plug-and-play with any model and cost function). It doesn't require reduced-order modeling, linearization, convexification, or predefined contact sequences. To our knowledge, this is the first framework achieving both real-time flexibility and RL-level agility in legged locomotion.

Diffusion-Inspired Annealing for Legged MPC (DIAL-MPC) = Sampling-based MPC + Diffusion-style Annealing: To achieve efficient real-time locomotion, DIAL-MPC extends MPPI with a diffusion-style annealing process at both the trajectory level and the action level, which gives better global coverage and local convergence.

Trajectory-level annealing: at a given time step, DIAL-MPC optimizes the planned trajectory iteratively.

Action-level annealing: across different time steps, the same action is re-optimized as the horizon recedes, using a scheduled sampling kernel.
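The two annealing levels can be pictured as a grid of sampling-kernel widths, one per refinement iteration and per action in the horizon. The sketch below is only an illustration under assumed conventions: the function name `annealing_schedule`, the exponential decay form, and the parameters `sigma_max`, `beta_traj`, and `beta_act` are hypothetical and are not taken from the DIAL-MPC implementation. It assumes the kernel shrinks over refinement iterations (trajectory level) and is smaller for actions close to execution, which have already been refined over earlier control steps (action level).

```python
import numpy as np

def annealing_schedule(n_iters, horizon, sigma_max=1.0, beta_traj=0.5, beta_act=2.0):
    """Return an (n_iters, horizon) grid of sampling standard deviations.

    Illustrative assumption, not the published schedule:
    - rows: refinement iterations at one control step; noise decays with the iteration index
    - columns: actions along the horizon; noise grows toward the far end of the horizon
    """
    i = np.arange(n_iters)[:, None]   # iteration index, shape (n_iters, 1)
    h = np.arange(horizon)[None, :]   # action index, shape (1, horizon)
    traj_decay = np.exp(-i / (beta_traj * n_iters))
    act_decay = np.exp(-(horizon - 1 - h) / (beta_act * horizon))
    return sigma_max * traj_decay * act_decay

# Example: 4 refinement iterations over a 10-step horizon.
sigmas = annealing_schedule(4, 10)
```

With this layout, `sigmas[0, -1]` is the widest kernel (first iteration, last action in the horizon) and `sigmas[-1, 0]` is the narrowest (last iteration, next action to execute).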

How Does DIAL-MPC Work?

MPPI is a single-stage diffusion process. Given the following non-convex cost function, MPPI aims to optimize over p0. Our theoretical analysis reveals that MPPI is a single-stage diffusion process, i.e., each step of MPPI can be viewed as an ascent step on a noise-conditioned score function of p0 convolved with the sampling kernel, where the score function is approximated with Monte Carlo sampling.
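A minimal NumPy sketch of this view follows. It assumes p0(u) is proportional to exp(-J(u)/lam) for a cost J and temperature lam, and that the sampling kernel is an isotropic Gaussian of width sigma. Under those assumptions, the point u + sigma^2 * grad log p_sigma(u) (where p_sigma is p0 convolved with the kernel) equals the mean of p0 reweighted by the kernel around u, and the softmax-weighted sample average estimates that mean. The names `mppi_step`, `cost`, `lam`, and `sigma` are illustrative and do not come from the DIAL-MPC code base.

```python
import numpy as np

def mppi_step(u, cost, sigma, lam, n_samples, rng):
    """One MPPI-style update: softmax-weighted mean of Gaussian samples around u."""
    samples = u + sigma * rng.standard_normal((n_samples,) + u.shape)
    costs = np.array([cost(w) for w in samples])
    # Subtract the minimum cost for numerical stability; the weights are unchanged after normalization.
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    return np.tensordot(weights, samples, axes=1)

# Sanity check on a case with a closed form: a Gaussian p0 with mean m and std s.
# Here p_sigma is Gaussian with variance s^2 + sigma^2, so the score-ascent target
# is u - sigma^2 * (u - m) / (s^2 + sigma^2), which is 1.6 for the values below.
m, s, sigma = 2.0, 0.5, 1.0
quadratic_cost = lambda w: float(np.sum((w - m) ** 2) / (2.0 * s ** 2))
u0 = np.zeros(1)
u1 = mppi_step(u0, quadratic_cost, sigma, lam=1.0, n_samples=50000,
               rng=np.random.default_rng(0))
analytic = u0 - sigma ** 2 * (u0 - m) / (s ** 2 + sigma ** 2)
```

The Monte Carlo result `u1` should agree with `analytic` up to sampling noise; for non-Gaussian p0 no closed form is available, and only the sampled estimate can be computed.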

Demo Task

MPPI suffers from suboptimal solutions or high variance. Because p0 is sparse and non-smooth, MPPI optimizes either an over-smoothed function (sampling kernel too large) or a highly non-smooth function (sampling kernel too small), leading to suboptimal solutions or high variance. A diffusion process overcomes this problem by iteratively refining solutions over a series of smoothing levels.

DIAL-MPC

DIAL-MPC uses a multi-stage diffusion process for better coverage and convergence. Compared with MPPI, which uses only a single-stage diffusion process, DIAL-MPC iteratively refines solutions in a diffusion-style manner, leading to better coverage and convergence for contact-rich locomotion tasks.
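The multi-stage idea can be sketched on a toy problem: repeat the sampling update while shrinking the kernel width, so that early stages explore a heavily smoothed landscape and later stages converge locally. Everything in this sketch is an assumption made for illustration: the 1-D cost `toy_cost`, the geometric schedule from 2.0 down to 0.05, the temperature `lam`, and the sample count. It is not the DIAL-MPC algorithm itself, which operates on full robot trajectories.

```python
import numpy as np

def toy_cost(u):
    """Hypothetical non-convex cost: quadratic bowl plus ripples, global minimum at u = 3."""
    return float(np.sum(0.1 * (u - 3.0) ** 2 + 1.0 - np.cos(4.0 * (u - 3.0))))

def weighted_update(u, cost, sigma, lam, n_samples, rng):
    """Single sampling stage: softmax-weighted mean of Gaussian samples around u."""
    samples = u + sigma * rng.standard_normal((n_samples,) + u.shape)
    costs = np.array([cost(w) for w in samples])
    weights = np.exp(-(costs - costs.min()) / lam)
    return np.tensordot(weights / weights.sum(), samples, axes=1)

def annealed_refine(u0, cost, sigmas, lam=0.1, n_samples=4000, seed=0):
    """Multi-stage refinement: apply the update once per kernel width, widest first."""
    rng = np.random.default_rng(seed)
    u = np.asarray(u0, dtype=float)
    for sigma in sigmas:
        u = weighted_update(u, cost, sigma, lam, n_samples, rng)
    return u

# Anneal from a wide kernel (coverage) to a narrow one (convergence), starting far from the optimum.
sigmas = np.geomspace(2.0, 0.05, num=20)
u_final = annealed_refine(np.zeros(1), toy_cost, sigmas)
```

Passing a single-element schedule such as `[2.0]` or `[0.05]` reduces this loop to one single-stage update, which is a convenient way to compare against the fixed-kernel behavior described above on the same toy cost.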

DIAL-MPC

Compare With Other Methods

Compared with RL: DIAL-MPC is training-free and can achieve higher-precision control in tasks that require test-time zero-shot adaptation (e.g., robots with payload), thanks to its model-based nature and diffusion-style annealing process.

Compared with Standard Sampling-based MPC: DIAL-MPC achieves more agile motion and higher-precision control with a better sampling strategy inspired by diffusion models.

Compared with Nonlinear MPC: DIAL-MPC can handle full-order dynamics and arbitrary cost functions in a plug-and-play manner, while existing MPC methods require either reduced-order modeling, specialized cost functions, linearization/convexification, or predefined contact sequences.

Aspect | DIAL-MPC | Sampling-based MPC Baselines | Nonlinear MPC | Model-Free RL
Agile Motion | Yes | No | Need careful system identification and solver design | Yes
High-precision Control | Yes | No | Yes | Need careful reward engineering and training design
Full-Order Dynamics | Yes | Yes | No, especially with contact | Yes
Arbitrary Cost | Yes | Yes | No | Yes
Training-Free | Yes | Yes | Yes | No
Test-time Generalization | Yes | Yes | Yes | Need extra components in training stage

BibTeX


@misc{xue2024fullordersamplingbasedmpctorquelevel,
    title={Full-Order Sampling-Based MPC for Torque-Level Locomotion Control via Diffusion-Style Annealing},
    author={Haoru Xue and Chaoyi Pan and Zeji Yi and Guannan Qu and Guanya Shi},
    year={2024},
    eprint={2409.15610},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2409.15610},
}