Concurrent Training of a Control Policy and a State Estimator for Dynamic and Robust Legged Locomotion


Previous problems:

In this paper:

Concurrently train both the control policy and the state estimator.
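One rough way to write down what "concurrent" could mean here. This assumes, and these notes do not state it, that the estimator is fit by supervised regression against ground-truth states available in simulation while the policy is optimized with an RL objective in the same training iteration. All symbols ($f_\psi$, $\pi_\theta$, $o_t$, $x_t$, $c_t$) are illustrative names, not notation from the paper.

$$
\mathcal{L}_{\text{est}}(\psi) = \mathbb{E}_t\left[\lVert \hat{x}_t - x_t \rVert^2\right], \qquad \hat{x}_t = f_\psi(o_t)
$$

$$
J(\theta) = \mathbb{E}\left[\sum_t \gamma^t r_t\right], \qquad a_t \sim \pi_\theta(\cdot \mid o_t, \hat{x}_t, c_t)
$$

Here $o_t$ is the sensor observation, $x_t$ the true state, $\hat{x}_t$ its estimate, and $c_t$ the command. Under this reading, each iteration takes a step that decreases $\mathcal{L}_{\text{est}}$ and a step that increases $J$ on the same batch of rollouts.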

Contents:

Method

Controller input: velocity in the forward and lateral directions, and yaw rate

Estimator $\xrightarrow{\text{state variables for control}}$ Actor $\xrightarrow{\text{actuator commands}}$ Critic (the critic helps reduce variance in the policy gradient estimate from the RL algorithm)
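A minimal sketch of this data flow, to make the three roles concrete. Everything in it is an assumption for illustration: the layer sizes, the random linear maps standing in for trained networks, and the choice to model the critic as a state-value function evaluated on the same input as the actor (the paper's diagram may wire it differently). The `advantage` helper shows the usual way a critic lowers variance: its value is subtracted from the observed return as a baseline before weighting the policy gradient.

```python
import random

random.seed(0)

# Hypothetical sizes: sensor observation, estimated state, command, actuator targets.
N_OBS, N_EST, N_CMD, N_ACT = 8, 3, 3, 12


def linear(n_in, n_out):
    """Random (n_out x n_in) weight matrix; a stand-in for a trained network."""
    return [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def apply(weights, x):
    """Matrix-vector product using plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


estimator_w = linear(N_OBS, N_EST)                 # observation -> estimated state
actor_w = linear(N_OBS + N_EST + N_CMD, N_ACT)     # policy input -> actuator commands
critic_w = linear(N_OBS + N_EST + N_CMD, 1)        # policy input -> scalar value (assumed)


def step(obs, command):
    """One forward pass: estimator feeds the actor; the critic scores the same input."""
    est_state = apply(estimator_w, obs)
    policy_in = obs + est_state + command          # list concatenation
    action = apply(actor_w, policy_in)
    value = apply(critic_w, policy_in)[0]
    return est_state, action, value


def advantage(observed_return, value):
    """Baseline-subtracted return used to weight the policy gradient."""
    return observed_return - value
```

For example, `step([0.0] * N_OBS, [0.5, 0.0, 0.1])` passes a command of 0.5 forward velocity, 0.0 lateral velocity, and 0.1 yaw rate, and returns a 3-element state estimate, a 12-element action, and one scalar value.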

Algorithm:

Overall framework