Physical knowledge of legged locomotion + Deep RL
a control hierarchy:
higher level controllers - based on reduced-order physical model
- task-specific
lower level controllers - based on a learned policy
- applicable to many tasks
Problems with existing methods:
heuristic (experience-based) method: the reward function must be specified in sufficient detail
single expert trajectory method: restricts solutions to stay close to the reference trajectory
e.g. Feedback Control For Cassie With Deep Reinforcement Learning
to provide useful reference information for a variety of speeds, the single trajectory was “stretched” and “compressed” to higher and lower speeds. ⇒ sometimes creates infeasible trajectories
⇒ the trajectory-matching reward signal might conflict with the dynamics of the system
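A toy sketch of why stretching a single reference can become infeasible. This is not the cited paper's procedure: the sinusoidal foot trajectory, the linear scaling rule in `scale_reference`, and the 0.5 m `leg_reach` limit are all made-up assumptions, chosen only to show how a scaled reference can leave the reachable set.

```python
import numpy as np

def scale_reference(ref_foot_x, nominal_speed, target_speed):
    """Stretch or compress forward foot offsets by the speed ratio (assumed rule)."""
    return ref_foot_x * (target_speed / nominal_speed)

def is_reachable(foot_x, leg_reach):
    """True if every foot offset stays within the assumed horizontal leg reach."""
    return bool(np.max(np.abs(foot_x)) <= leg_reach)

# Hypothetical single expert trajectory: foot offset from the hip over one
# gait cycle, recorded at a nominal speed of 1.0 m/s.
phase = np.linspace(0.0, 1.0, 50)
ref_foot_x = 0.2 * np.sin(2.0 * np.pi * phase)  # metres
leg_reach = 0.5                                  # metres, assumed limit

for speed in (0.5, 1.0, 3.0):
    scaled = scale_reference(ref_foot_x, nominal_speed=1.0, target_speed=speed)
    print(speed, is_reachable(scaled, leg_reach))
```

With these numbers the 3.0 m/s reference asks for foot offsets beyond the assumed reach, so a reward that tracks it would push the policy toward a motion the system cannot perform.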
In this paper:
Contents:
Velocity command & clock $\xrightarrow{\text{input}}$ library of reduced-order model motions $\xrightarrow{\text{returns}}$ positions and velocities of the reduced-order model’s body and feet $\xrightarrow{\text{input}}$ learned policy
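A minimal sketch of the data flow above, under stated assumptions. The class `MotionLibrary`, the nearest-speed lookup, the 18-dimensional state layout (body, left foot, right foot; 3D position and velocity each), and the `placeholder_policy` linear map with 10 outputs are all hypothetical stand-ins. The paper's actual library structure and policy network are not specified in these notes.

```python
import numpy as np

STATE_DIM = 18  # assumed: (body, left foot, right foot) x (position, velocity) x 3D

class MotionLibrary:
    """Hypothetical table of reduced-order model motions, indexed by speed and phase."""

    def __init__(self, speeds, motions):
        self.speeds = np.asarray(speeds, dtype=float)    # shape (S,)
        self.motions = np.asarray(motions, dtype=float)  # shape (S, P, STATE_DIM)

    def query(self, speed_cmd, clock):
        """Return the stored state for the nearest speed at the given clock phase."""
        s = int(np.argmin(np.abs(self.speeds - speed_cmd)))
        n_phase = self.motions.shape[1]
        p = int(np.floor((clock % 1.0) * n_phase)) % n_phase
        return self.motions[s, p]

def placeholder_policy(reference, weights):
    """Stand-in for the learned policy: a fixed linear map squashed by tanh."""
    return np.tanh(weights @ reference)

# Velocity command & clock -> library -> reduced-order state -> policy
library = MotionLibrary(speeds=[0.0, 1.0, 2.0],
                        motions=np.zeros((3, 4, STATE_DIM)))  # placeholder data
reference = library.query(speed_cmd=0.9, clock=0.5)
rng = np.random.default_rng(0)
weights = rng.normal(size=(10, STATE_DIM))  # 10 outputs is an assumed action size
action = placeholder_policy(reference, weights)
```

A real policy would likely also receive the robot's own state. It is omitted here because the notes only list the library output as the policy input.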