Model-based Diffusion for Trajectory Optimization

* Equal contribution + Equal advising
Carnegie Mellon University
NeurIPS 2024

Generated Trajectories

Visualization of generated trajectories (yellow videos) together with the corresponding diffusion process (red videos).

Abstract

Recent advances in diffusion models have demonstrated their strong capabilities in generating high-fidelity samples from complex distributions through an iterative refinement process. Despite the empirical success of diffusion models in motion planning and control, the model-free nature of these methods does not leverage readily available model information and limits their generalization to new scenarios beyond the training data (e.g., new robots with different dynamics). In this work, we introduce Model-Based Diffusion (MBD), an optimization approach using the diffusion process to solve trajectory optimization (TO) problems without data. The key idea is to explicitly compute the score function by leveraging the model information in TO problems, which is why we refer to our approach as model-based diffusion. Moreover, although MBD does not require external data, it can be naturally integrated with data of diverse qualities to steer the diffusion process. We also reveal that MBD has interesting connections to sampling-based optimization. Empirical evaluations show that MBD outperforms state-of-the-art reinforcement learning and sampling-based TO methods in challenging contact-rich tasks. Additionally, MBD’s ability to integrate with data enhances its versatility and practical applicability, even with imperfect and infeasible data (e.g., partial-state demonstrations for high-dimensional humanoids), beyond the scope of standard diffusion models. Videos and codes are available in the supplementary materials.

Teaser

Model-based Diffusion enables flexible trajectory generation without any data.

Why Model-Based Diffusion?

MBD doesn’t require external data: Diffusion-based planners rely on large-scale, high-quality demonstration data. By leveraging the model information, MBD needs no data, yet it can still be naturally integrated with data to steer the diffusion process.

MBD delivers good performance in a short time: MBD can generate high-quality motion plans for contact-rich tasks with nonconvex cost functions within tens of seconds, with performance comparable to RL. (Note: the comparison between MBD and RL is not apples-to-apples. For MBD, the planned actions are simply replayed in open loop, whereas RL produces a closed-loop policy.) Here are some examples:

Interactive trajectories: Ant, HalfCheetah, Hopper, Walker2d, PushT, Humanoid Jogging, Humanoid Standup, Humanoid Run

Diffusion process: Ant, HalfCheetah, Hopper, Walker2d, PushT, Humanoid Jogging, Humanoid Standup, Humanoid Run

How Does Model-Based Diffusion Work?

MBD as a zeroth-order optimization method: Given an optimization problem (which provides the model information), MBD searches for its global optimum through the diffusion process, using only evaluations of the objective rather than its gradients.
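To make the "zeroth-order" reading concrete, here is a minimal sketch that turns a cost into a sampling target using cost evaluations only. It assumes, since this page does not state it, that the target density is proportional to exp(-J(y)/λ). The temperature λ, the toy cost `double_well`, and the helper `boltzmann_weights` are hypothetical names chosen for illustration.

```python
import math

def double_well(y):
    """Hypothetical non-convex cost: global minimum near y = -1.04, local minimum near y = 0.96."""
    return (y * y - 1.0) ** 2 + 0.3 * y

def boltzmann_weights(cost, points, temperature):
    """Normalized weights proportional to exp(-cost / temperature); no gradients are used."""
    costs = [cost(y) for y in points]
    c_min = min(costs)  # subtract the minimum so the exponentials cannot overflow
    unnorm = [math.exp(-(c - c_min) / temperature) for c in costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Evaluate the assumed target on a grid for a warm and a cold temperature.
grid = [-2.0 + 0.01 * k for k in range(401)]
for lam in (1.0, 0.1):
    w = boltzmann_weights(double_well, grid, lam)
    left_mass = sum(wi for y, wi in zip(grid, w) if y < 0.0)
    print(f"lambda={lam}: probability mass on the global-minimum side = {left_mass:.3f}")
```

Lowering the temperature concentrates the mass around the global minimum, which is why sampling from such a target can double as global optimization.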

Score function estimation: We propose to compute the score function by leveraging the model information and Monte Carlo approximation.
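The idea of estimating the score from the model with Monte Carlo samples can be sketched for a scalar variable as follows. The sketch assumes a variance-preserving Gaussian forward process, y_i ~ N(sqrt(ᾱ) y_0, 1 - ᾱ), and a target proportional to exp(-cost/temperature). Neither assumption is spelled out on this page, and `estimate_score` and `gaussian_cost` are hypothetical names. Under those assumptions, the score equals (sqrt(ᾱ) E[y_0 | y_i] - y_i) / (1 - ᾱ), and the conditional mean is approximated by a weighted average of samples drawn around y_i / sqrt(ᾱ).

```python
import math
import random

def estimate_score(cost, y_i, alpha_bar, temperature=1.0, num_samples=20000, rng=None):
    """Monte Carlo estimate of d/dy log p_i(y_i) for a scalar variable.

    Assumes p_0(y) is proportional to exp(-cost(y) / temperature) and that the
    forward process is y_i ~ N(sqrt(alpha_bar) * y_0, 1 - alpha_bar).
    """
    rng = rng or random.Random(0)
    center = y_i / math.sqrt(alpha_bar)
    std = math.sqrt(1.0 / alpha_bar - 1.0)
    # Candidate clean samples y_0 that are consistent with the noisy point y_i.
    samples = [rng.gauss(center, std) for _ in range(num_samples)]
    costs = [cost(y0) for y0 in samples]
    c_min = min(costs)  # stabilizes the exponentials
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    # Self-normalized weighted mean approximates E[y_0 | y_i].
    y0_bar = sum(w * y0 for w, y0 in zip(weights, samples)) / sum(weights)
    return (math.sqrt(alpha_bar) * y0_bar - y_i) / (1.0 - alpha_bar)

def gaussian_cost(y):
    """Cost whose target at temperature 1 is N(1, 0.5^2), so the score is known in closed form."""
    return (y - 1.0) ** 2 / (2 * 0.5 ** 2)

alpha_bar, y = 0.5, 0.3
# For a Gaussian target, p_i = N(sqrt(alpha_bar) * 1, alpha_bar * 0.25 + 1 - alpha_bar).
exact = -(y - math.sqrt(alpha_bar) * 1.0) / (alpha_bar * 0.25 + 1.0 - alpha_bar)
print("Monte Carlo:", estimate_score(gaussian_cost, y, alpha_bar), "closed form:", exact)
```

The Gaussian case is used only as a sanity check because its score is available analytically; the estimator itself needs nothing beyond cost evaluations.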

Denoising process: We propose Monte Carlo score ascent (MCSA) as a replacement for the reverse SDE in the backward process, which converges faster to the target distribution.
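A minimal sketch of a score-ascent backward pass on a scalar toy problem is given below. It is an illustration under stated assumptions, not the authors' implementation: the linear noise schedule, the temperature, the sample count, the update rule y_{i-1} = (y_i + (1 - ᾱ_i) · score) / sqrt(α_i), and the names `mcsa_minimize` and `double_well` are all assumptions made for this example.

```python
import math
import random

def double_well(y):
    """Hypothetical non-convex cost: global minimum near y = -1.04, local minimum near y = 0.96."""
    return (y * y - 1.0) ** 2 + 0.3 * y

def mcsa_minimize(cost, num_steps=50, num_samples=512, temperature=0.1,
                  beta_start=1e-3, beta_end=0.2, seed=0):
    """Run a noise-free score-ascent backward pass and return the final scalar iterate."""
    rng = random.Random(seed)
    # Assumed linear noise schedule; alpha_bars[i] is the cumulative product of alphas.
    betas = [beta_start + (beta_end - beta_start) * k / (num_steps - 1)
             for k in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)

    y = rng.gauss(0.0, 1.0)  # start from the noise prior
    for i in reversed(range(num_steps)):
        a_bar = alpha_bars[i]
        # Monte Carlo score estimate at the current noise level.
        center = y / math.sqrt(a_bar)
        std = math.sqrt(1.0 / a_bar - 1.0)
        samples = [rng.gauss(center, std) for _ in range(num_samples)]
        costs = [cost(s) for s in samples]
        c_min = min(costs)
        weights = [math.exp(-(c - c_min) / temperature) for c in costs]
        y0_bar = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
        score = (math.sqrt(a_bar) * y0_bar - y) / (1.0 - a_bar)
        # Score-ascent step: step size (1 - a_bar), and no noise is injected.
        y = (y + (1.0 - a_bar) * score) / math.sqrt(alphas[i])
    return y

y_star = mcsa_minimize(double_well)
print("solution:", y_star, "cost:", double_well(y_star))
```

With this update, each step moves the iterate to the rescaled weighted mean of the sampled candidates, so the sampling spread shrinks as the noise level decreases.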

Model-Based Diffusion

Reverse SDE vs. Monte Carlo score ascent (MCSA) on a synthetic highly non-convex objective function. (a) Synthesized objective function with multiple local minima. (b) The intermediate stage density. (c) Reverse SDE vs. MCSA: Background colors represent the density at different stages. MCSA converges faster due to larger step size and lower sampling noise while still capturing the multimodality.
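The "larger step size and lower sampling noise" in the caption can be read off by writing the two updates side by side. The formulas below use standard DDPM-style notation (α_i for the per-step factor, ᾱ_i for its cumulative product, z for standard Gaussian noise). This notation and this particular discretization are assumptions, since the page itself does not give the equations.

```latex
% Reverse SDE, one common discretization, with z \sim \mathcal{N}(0, I):
Y^{(i-1)} = \frac{1}{\sqrt{\alpha_i}}
  \left( Y^{(i)} + (1-\alpha_i)\, \nabla_{Y^{(i)}} \log p_i\big(Y^{(i)}\big) \right)
  + \sqrt{1-\alpha_i}\; z

% Monte Carlo score ascent: step size 1-\bar{\alpha}_i \ge 1-\alpha_i, no injected noise:
Y^{(i-1)} = \frac{1}{\sqrt{\alpha_i}}
  \left( Y^{(i)} + (1-\bar{\alpha}_i)\, \nabla_{Y^{(i)}} \log p_i\big(Y^{(i)}\big) \right)
```

Since ᾱ_i is a product of factors in (0, 1) that includes α_i, the step size 1 - ᾱ_i is never smaller than 1 - α_i.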

MBD vs. Model-Free Diffusion

MBD leverages model information to compute the score function explicitly, while model-free diffusion learns the score function from data.

| Aspect | Model-Based Diffusion (MBD) | Model-Free Diffusion (MFD) |
| --- | --- | --- |
| Target distribution | Known, but hard to sample | Unknown, but have data from it |
| Objective | Sample high-likelihood solution | Generate diverse samples |
| Score approximation | From model + data (optional) | From data |
| Backward process | Monte Carlo score ascent | Reverse SDE |

BibTeX


@misc{pan2024modelbaseddiffusiontrajectoryoptimization,
  title={Model-Based Diffusion for Trajectory Optimization},
  author={Chaoyi Pan and Zeji Yi and Guanya Shi and Guannan Qu},
  year={2024},
  eprint={2407.01573},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2407.01573},
}