OPT-Mimic: Imitation of Optimized Trajectories for Dynamic Quadruped Behaviors

International Conference on Robotics and Automation (ICRA), 2023

YUNI FUCHIOKA, The University of British Columbia, Canada
ZHAOMING XIE, Stanford University, United States of America
MICHIEL VAN DE PANNE, The University of British Columbia, Canada

Opt-Mimic teaser image

Paper: arXiv
Code: Trajectory Optimization | Reinforcement Learning | Robot Deployment

Reinforcement Learning (RL) has seen many recent successes for quadruped robot control. The imitation of reference motions provides a simple and powerful prior for guiding learned policies towards desired behaviours without the need for meticulous reward design. While much work uses motion capture data or hand-crafted trajectories as the reference motion, relatively little work has explored the use of reference motions coming from model-based trajectory optimization. In this work, we investigate several design considerations that arise with such a framework, as demonstrated through four dynamic behaviours: trot, front hop, 180 backflip, and biped stepping. These are trained in simulation and transferred to a physical Solo 8 quadruped robot without further adaptation. In particular, we explore the space of feed-forward designs afforded by the trajectory optimizer to understand their impact on RL learning efficiency and sim-to-real transfer. These findings contribute to the long-standing goal of producing robot controllers that combine the interpretability and precision of model-based optimization with the robustness that model-free RL-based controllers offer.
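As a rough illustration of what a feed-forward design choice can look like, the sketch below builds a joint-level PD command from a policy output plus optional reference terms taken from an optimized trajectory. The function name, the `FeedForwardConfig` fields, and the gain values are hypothetical and are not taken from the paper or its code; the designs actually evaluated in the paper may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FeedForwardConfig:
    """Hypothetical switches for which reference terms are fed forward."""
    use_ref_position: bool = True   # add reference joint angles to the policy output
    use_ref_torque: bool = False    # add reference joint torques to the PD output
    kp: float = 3.0                 # illustrative proportional gain (assumption)
    kd: float = 0.3                 # illustrative derivative gain (assumption)


def joint_torques(
    policy_action: Sequence[float],
    q: Sequence[float],
    qd: Sequence[float],
    q_ref: Sequence[float],
    tau_ref: Sequence[float],
    cfg: FeedForwardConfig,
) -> List[float]:
    """Compute per-joint torques from a PD law with optional feed-forward terms.

    policy_action is interpreted as a joint-angle target; when
    use_ref_position is set it acts as a residual on top of q_ref.
    """
    torques = []
    for a, qi, qdi, qri, tri in zip(policy_action, q, qd, q_ref, tau_ref):
        q_target = a + (qri if cfg.use_ref_position else 0.0)
        tau = cfg.kp * (q_target - qi) - cfg.kd * qdi
        if cfg.use_ref_torque:
            tau += tri  # torque feed-forward from the optimized trajectory
        torques.append(tau)
    return torques


# Usage: a zero policy action on a robot that is exactly on the reference.
q_now = [0.1, -0.2]
position_only = joint_torques(
    [0.0, 0.0], q_now, [0.0, 0.0], q_now, [0.5, -0.5], FeedForwardConfig()
)
with_torque = joint_torques(
    [0.0, 0.0], q_now, [0.0, 0.0], q_now, [0.5, -0.5],
    FeedForwardConfig(use_ref_torque=True),
)
```

With position feed-forward, a zero action means "follow the reference", so the policy only has to learn corrections. Without it, the policy must reproduce the whole motion on its own.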


In our motion generation framework, trajectory optimization is used to produce open-loop trajectories that are feasible for the simplified Single Rigid Body (SRB) model. These trajectories are then tracked by imitation-based Reinforcement Learning (RL) to produce closed-loop feedback controllers capable of executing the motions on a full-order model. The resulting controllers can be deployed directly on a physical robot without additional online model adaptation.
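To make the tracking step more concrete, here is a minimal sketch of an imitation reward that scores how closely the simulated robot follows one frame of an optimized reference trajectory. The exponential-of-squared-error form is a common choice in motion imitation work and is an assumption here; the state keys, weights, and scale factors are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, Sequence


def tracking_reward(
    state: Dict[str, Sequence[float]],
    reference: Dict[str, Sequence[float]],
    weights: Dict[str, float],
    scales: Dict[str, float],
) -> float:
    """Weighted sum of exp(-scale * squared error) terms, one per tracked quantity."""
    reward = 0.0
    for key, weight in weights.items():
        err_sq = sum((s - r) ** 2 for s, r in zip(state[key], reference[key]))
        reward += weight * math.exp(-scales[key] * err_sq)
    return reward


# Hypothetical tracked quantities and tuning values (assumptions).
weights = {"base_position": 0.5, "joint_angles": 0.5}
scales = {"base_position": 20.0, "joint_angles": 5.0}

reference_frame = {"base_position": [0.0, 0.0, 0.25], "joint_angles": [0.8, -1.6]}
on_reference = dict(reference_frame)
off_reference = {"base_position": [0.0, 0.0, 0.20], "joint_angles": [0.7, -1.5]}

r_perfect = tracking_reward(on_reference, reference_frame, weights, scales)
r_off = tracking_reward(off_reference, reference_frame, weights, scales)
```

Each term is bounded between 0 and its weight, so the total reward peaks when the robot is exactly on the reference and falls off smoothly as tracking error grows.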

System diagram

Results Video

Presentation Video


  doi = {10.48550/ARXIV.2210.01247},
  url = {https://arxiv.org/abs/2210.01247},
  author = {Fuchioka, Yuni and Xie, Zhaoming and van de Panne, Michiel},
  keywords = {Robotics (cs.RO), FOS: Computer and information sciences},
  title = {OPT-Mimic: Imitation of Optimized Trajectories for Dynamic Quadruped Behaviors},
  publisher = {arXiv},
  year = {2023},
  copyright = {Creative Commons Attribution 4.0 International}