Sim-to-Real Learning of All Common Bipedal Gaits via Periodic Reward Composition

Approaches for learning legged locomotion and their challenges:

Using reference motions to guide learning
- finding high-quality reference motions can be difficult and the trajectories themselves narrowly constrain the space of learned motion
Using reference-free reward functions
- massive variance in policy behavior

In this paper:

The authors propose a reward-specification framework based on composing simple probabilistic periodic costs on basic forces and velocities.

The gait reward should be

specific enough to produce the desired gait
but not overly constraining, since terrain and dynamic conditions make the exact gait uncertain
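
One way to picture "specific but not overly constraining" is a phase indicator with soft edges. The sketch below is only an illustration, not the paper's formulation: it assumes the start and end of the swing phase are each perturbed by independent Gaussian noise, and returns the probability that the foot is in swing at a given point in the cycle. The names `normal_cdf` and `expected_swing`, the default boundaries, and the Gaussian noise model are all assumptions made for this example.

```python
import math


def normal_cdf(x: float, mean: float, std: float) -> float:
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def expected_swing(phase: float,
                   swing_start: float = 0.25,
                   swing_end: float = 0.75,
                   std: float = 0.02) -> float:
    """Probability that the foot is in swing at `phase` (cycle time in [0, 1]).

    Assumption: the swing start and end are independent Gaussian random
    variables centred on `swing_start` and `swing_end`. Wrap-around at the
    cycle boundary is ignored, so the boundaries should stay well inside (0, 1).
    """
    started = normal_cdf(phase, swing_start, std)        # P(start <= phase)
    not_ended = 1.0 - normal_cdf(phase, swing_end, std)  # P(phase < end)
    return started * not_ended


if __name__ == "__main__":
    for phase in (0.0, 0.23, 0.25, 0.27, 0.5, 0.75, 1.0):
        print(f"phase={phase:.2f}  P(swing)={expected_swing(phase):.3f}")
```

Deep inside the swing window the indicator is close to 1, far outside it is close to 0, and near the boundaries it changes smoothly. A larger `std` makes the reward more tolerant of early or late foot contact.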

Contributions of this paper:

a framework for designing reward functions for gaits that are characterized by periodic swing and stance phases.
- Swing phase: foot forces are penalized while velocities are allowed → the policy learns to lift the foot
demonstrate this framework for sim-to-real RL of all common bipedal gaits, including walking, running, galloping, skipping, and hopping, without using a motion capture dataset or reference trajectories
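
The swing-phase rule above can be sketched as a small per-foot reward. This is a simplified illustration under stated assumptions, not the paper's implementation: it uses a hard (non-probabilistic) phase boundary, assumes the stance phase mirrors the swing rule (foot speed penalized, foot force allowed), and picks arbitrary penalty scales. All function and parameter names (`swing_indicator`, `foot_reward`, `gait_reward`, `swing_ratio`, `force_scale`, `speed_scale`, `offsets`) are hypothetical.

```python
import math
from typing import Sequence


def swing_indicator(phase: float, swing_ratio: float = 0.5) -> float:
    """Return 1.0 while the foot is in swing and 0.0 while it is in stance.

    `phase` is cycle time; the first `swing_ratio` fraction of each cycle is
    treated as swing (a hard boundary, chosen here for simplicity).
    """
    return 1.0 if (phase % 1.0) < swing_ratio else 0.0


def foot_reward(phase: float, foot_force: float, foot_speed: float,
                swing_ratio: float = 0.5,
                force_scale: float = 0.01,
                speed_scale: float = 1.0) -> float:
    """Reward for one foot: penalize force in swing, speed in stance."""
    swing = swing_indicator(phase, swing_ratio)
    stance = 1.0 - swing
    # Bounded penalties in [0, 1); the scales are illustrative assumptions.
    force_penalty = 1.0 - math.exp(-force_scale * abs(foot_force))
    speed_penalty = 1.0 - math.exp(-speed_scale * abs(foot_speed))
    return -(swing * force_penalty + stance * speed_penalty)


def gait_reward(phase: float, forces: Sequence[float],
                speeds: Sequence[float], offsets: Sequence[float]) -> float:
    """Sum the per-foot rewards, shifting each foot's cycle by its offset."""
    return sum(foot_reward(phase + offset, force, speed)
               for force, speed, offset in zip(forces, speeds, offsets))


if __name__ == "__main__":
    # Feet half a cycle apart: left foot in swing (no force, moving),
    # right foot in stance (loaded, not moving). Nothing is penalized.
    print(gait_reward(0.25, forces=(0.0, 100.0), speeds=(2.0, 0.0),
                      offsets=(0.0, 0.5)))
    # Same measurements with both feet in swing: the loaded foot is penalized.
    print(gait_reward(0.25, forces=(0.0, 100.0), speeds=(2.0, 0.0),
                      offsets=(0.0, 0.0)))
```

In this sketch the per-foot phase offsets are what distinguish gaits: offsets half a cycle apart give an alternating pattern, while equal offsets put both feet in swing and stance together. No reference trajectory appears anywhere in the reward.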

Contents: