Learning Spring Mass Locomotion: Guiding Policies with a Reduced-Order Model

Physical knowledge of legged locomotion + Deep RL

a control hierarchy:

higher level controllers
- based on a reduced-order physical model
- task-specific

lower level controllers
- based on a learned policy
- applicable to many tasks

Problems with existing methods:

heuristic (experience-based) method: the reward function must be sufficiently detailed
single expert trajectory method: restricts solutions to the neighborhood of the reference trajectory
- ex) Feedback Control For Cassie With Deep Reinforcement Learning
- to provide reference information for a variety of speeds, the single trajectory was “stretched” and “compressed” to higher and lower speeds ⇒ this sometimes creates infeasible trajectories
⇒ the trajectory-matching reward signal might conflict with the dynamics of the system
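
To make the feasibility problem concrete, the sketch below rescales a toy reference by the speed ratio and checks it against a reach limit. Everything in it is an assumption made for illustration: the notes do not say how the cited work stretched its trajectory, so the uniform scaling of horizontal hip-to-foot offsets, the names `scale_reference` and `is_reachable`, the 1.0 m reach limit, and the sample numbers are all hypothetical.

```python
import math

LEG_REACH = 1.0  # hypothetical maximum hip-to-foot distance, in metres


def scale_reference(foot_offsets, ref_speed, target_speed):
    """Stretch or compress the horizontal hip-to-foot offsets by the speed ratio.

    This is one assumed way to retarget a single reference to a new speed.
    """
    ratio = target_speed / ref_speed
    return [(x * ratio, z) for x, z in foot_offsets]


def is_reachable(foot_offsets, reach=LEG_REACH):
    """Per-sample flag: is the foot within the leg's reach of the hip?"""
    return [math.hypot(x, z) <= reach for x, z in foot_offsets]


# Toy reference at 1 m/s: the foot sweeps from 0.3 m ahead of the hip
# to 0.3 m behind it, with the hip 0.9 m above the ground.
reference = [(x, -0.9) for x in (0.3, 0.15, 0.0, -0.15, -0.3)]

slow = scale_reference(reference, ref_speed=1.0, target_speed=0.5)
fast = scale_reference(reference, ref_speed=1.0, target_speed=2.0)
```

In this toy case the compressed trajectory stays within reach, while the stretched one places the foot beyond the leg's reach at the start and end of the sweep. A reward that tracks those samples cannot be fully satisfied by any physical motion, which is the kind of conflict noted above.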

In this paper:

mitigates the reward conflict by combining reduced-order model trajectories with inverse kinematics to produce feasible walking trajectories.
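
As a rough illustration of the inverse kinematics step, the sketch below maps a body (hip) position and a foot position, as a reduced-order model might supply them, to joint angles. It assumes a planar two-link leg with hypothetical link lengths, which is a simplification and not the robot's actual kinematics. The function names and the angle conventions are also assumptions.

```python
import math


def two_link_ik(hip, foot, thigh=0.5, shin=0.5):
    """Planar two-link inverse kinematics (illustrative simplification).

    hip, foot: (x, z) positions. Returns (hip_angle, knee_angle) in radians.
    hip_angle is measured from straight down, positive toward +x.
    knee_angle is the knee flexion, 0 for a straight leg.
    """
    dx = foot[0] - hip[0]
    dz = foot[1] - hip[1]
    dist = math.hypot(dx, dz)
    if dist == 0.0 or dist > thigh + shin or dist < abs(thigh - shin):
        raise ValueError("foot target is out of reach")

    # Law of cosines gives the interior knee angle of the hip-knee-foot triangle.
    cos_interior = (thigh**2 + shin**2 - dist**2) / (2 * thigh * shin)
    knee_angle = math.pi - math.acos(max(-1.0, min(1.0, cos_interior)))

    # Angle between the thigh and the hip-to-foot line.
    cos_alpha = (thigh**2 + dist**2 - shin**2) / (2 * thigh * dist)
    alpha = math.acos(max(-1.0, min(1.0, cos_alpha)))

    leg_angle = math.atan2(dx, -dz)  # 0 when the foot is directly below the hip
    return leg_angle + alpha, knee_angle


def two_link_fk(hip, hip_angle, knee_angle, thigh=0.5, shin=0.5):
    """Forward kinematics with the same conventions, used to check the IK."""
    knee_x = hip[0] + thigh * math.sin(hip_angle)
    knee_z = hip[1] - thigh * math.cos(hip_angle)
    shin_angle = hip_angle - knee_angle
    return (knee_x + shin * math.sin(shin_angle),
            knee_z - shin * math.cos(shin_angle))


# Example: hip 0.9 m above the ground, foot 0.2 m ahead on the ground.
angles = two_link_ik(hip=(0.0, 0.9), foot=(0.2, 0.0))
recovered_foot = two_link_fk((0.0, 0.9), *angles)
```

Because the joint angles are solved from positions that the leg can actually reach, out-of-reach targets are rejected rather than tracked, which is the sense in which this step keeps the reference feasible.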

Contents:

Control Hierarchy

Velocity command & clock $\overset{\text{input}}{\longrightarrow}$ library of reduced-order model motions $\overset{\text{returns}}{\longrightarrow}$ positions and velocities of the reduced-order model’s body and feet $\overset{\text{input}}{\longrightarrow}$ learned policy
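
The data flow above can be sketched as a lookup. The notes only state the inputs and outputs, so the library layout, the nearest-speed selection, the names `build_toy_library` and `lookup_reference`, and the placeholder values are all assumptions. The toy library does not contain real reduced-order model output, and only one foot is stored for brevity.

```python
def build_toy_library(speeds, samples_per_cycle=4):
    """Hypothetical library: one gait cycle of placeholder samples per speed."""
    library = {}
    for speed in speeds:
        cycle = []
        for i in range(samples_per_cycle):
            phase = i / samples_per_cycle
            cycle.append({
                "body_pos": (speed * phase, 0.9),  # placeholder values
                "body_vel": (speed, 0.0),
                "foot_pos": (0.0, 0.0),
                "foot_vel": (0.0, 0.0),
            })
        library[speed] = cycle
    return library


def lookup_reference(library, speed_cmd, clock):
    """Return the reference vector for a velocity command and a clock in [0, 1).

    Picks the stored speed nearest to the command (an assumed selection rule)
    and the sample at the current phase of the gait cycle.
    """
    nearest_speed = min(library, key=lambda s: abs(s - speed_cmd))
    cycle = library[nearest_speed]
    index = int((clock % 1.0) * len(cycle)) % len(cycle)
    sample = cycle[index]
    return [*sample["body_pos"], *sample["body_vel"],
            *sample["foot_pos"], *sample["foot_vel"]]


library = build_toy_library([0.0, 0.5, 1.0])
reference = lookup_reference(library, speed_cmd=0.6, clock=0.5)
```

The returned list is what would be handed to the learned policy as its reference input. Any other inputs the policy receives are outside what these notes describe.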