Concurrent Training of a Control Policy and a State Estimator for Dynamic and Robust Legged Locomotion
Previous problems:
- Some control approaches rely on estimated state as input, but existing state estimation algorithms are unreliable on challenging terrains such as ice or sand.
- Some rely on contact states, which can be estimated from either a model or sensors; however, model-based estimation is computationally costly, and foot sensors can be damaged during foot landing.
- Terrain can be estimated with a trained NN from the proprioceptive state history, but this has two problems: “latent vectors…cannot be used in conjunction with other modules that require state information”, and training the encoder is costly.
In this paper:
Concurrently train both the control policy and the state estimator.
Contents:
Method
Controller input: commanded velocity in the forward and lateral directions, and yaw rate
Estimator $\xrightarrow{\text{state variables for control}}$ Actor $\xrightarrow{\text{actuator commands}}$ Critic: helps reduce variance in the policy gradient estimates of the RL algorithm
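A minimal sketch of this pipeline, assuming PyTorch; the dimensions (`OBS_DIM`, `EST_DIM`, `ACT_DIM`) and MLP sizes are hypothetical placeholders, not the paper's architecture.

```python
# Sketch of the Estimator -> Actor pipeline plus a Critic used during
# training. All dimensions and layer sizes are assumptions for illustration.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=(256, 128)):
    layers, last = [], in_dim
    for h in hidden:
        layers += [nn.Linear(last, h), nn.ELU()]
        last = h
    layers.append(nn.Linear(last, out_dim))
    return nn.Sequential(*layers)

OBS_DIM = 48   # proprioceptive observation + command (forward/lateral vel., yaw rate); assumed size
EST_DIM = 7    # estimated state variables for control; assumed size
ACT_DIM = 12   # actuator commands (e.g., joint targets); assumed size

class Estimator(nn.Module):
    """Maps proprioceptive observations to estimated state variables."""
    def __init__(self):
        super().__init__()
        self.net = mlp(OBS_DIM, EST_DIM)
    def forward(self, obs):
        return self.net(obs)

class Actor(nn.Module):
    """Policy: observation + estimated state -> Gaussian over actuator commands."""
    def __init__(self):
        super().__init__()
        self.net = mlp(OBS_DIM + EST_DIM, ACT_DIM)
        self.log_std = nn.Parameter(torch.zeros(ACT_DIM))
    def forward(self, obs, est_state):
        mean = self.net(torch.cat([obs, est_state], dim=-1))
        return torch.distributions.Normal(mean, self.log_std.exp())

class Critic(nn.Module):
    """Value function, used only during training for variance reduction."""
    def __init__(self):
        super().__init__()
        self.net = mlp(OBS_DIM + EST_DIM, 1)
    def forward(self, obs, est_state):
        return self.net(torch.cat([obs, est_state], dim=-1)).squeeze(-1)
```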
Algorithm (see the sketch after this list):
- PPO for Actor & Critic
- Supervised learning for Estimator
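A minimal sketch of one concurrent update step, assuming the modules above: a simplified clipped PPO surrogate trains the Actor and Critic, while an MSE loss against the simulator's ground-truth state (`true_state`, a hypothetical batch field) supervises the Estimator. The loss coefficients, shared optimizer, and batch layout are assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def concurrent_update(estimator, actor, critic, optimizer, batch,
                      clip=0.2, value_coef=0.5, est_coef=1.0):
    # `optimizer` is assumed to hold the parameters of all three modules.
    obs, act, old_logp, adv, ret, true_state = (
        batch["obs"], batch["act"], batch["old_logp"],
        batch["adv"], batch["ret"], batch["true_state"],
    )

    est_state = estimator(obs)

    # PPO clipped surrogate for the Actor; the estimate is detached so the
    # Estimator is trained only by the supervised loss (a design assumption).
    dist = actor(obs, est_state.detach())
    logp = dist.log_prob(act).sum(-1)
    ratio = torch.exp(logp - old_logp)
    policy_loss = -torch.min(ratio * adv,
                             torch.clamp(ratio, 1 - clip, 1 + clip) * adv).mean()

    # Value regression for the Critic.
    value_loss = F.mse_loss(critic(obs, est_state.detach()), ret)

    # Supervised learning for the Estimator against simulator ground truth.
    est_loss = F.mse_loss(est_state, true_state)

    loss = policy_loss + value_coef * value_loss + est_coef * est_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Joint backpropagation through a single summed loss is what makes the training concurrent here: every batch updates the policy, the value function, and the estimator together.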
Overall framework