Concurrent Training of a Control Policy and a State Estimator for Dynamic and Robust Legged Locomotion
Previous problems:
- Some control approaches rely on estimated state as input, but existing state estimation algorithms are unreliable on challenging terrains such as ice or sand.
- Some rely on contact states, which can be estimated from either a model or sensors; however, model-based estimation is computationally costly, and foot sensors can be damaged during foot landing.
- Terrain can be estimated with a trained NN from the proprioceptive state history, but this has two problems: “latent vectors…cannot be used in conjunction with other modules that require state information”, and training the encoder is costly.
In this paper:
Concurrently train both the control policy and the state estimator.
Contents:
Method
Controller input: commanded velocity in the forward and lateral directions, and yaw rate
Estimator $\xrightarrow{\text{state variables for control}}$ Actor $\xrightarrow{\text{actuator commands}}$ Critic: helps reduce variance in the policy gradient estimates of the RL algorithm
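A minimal sketch of this pipeline, assuming PyTorch; the dimensions (`OBS_DIM`, `EST_DIM`, `ACT_DIM`) and MLP sizes are hypothetical placeholders, not the paper's architecture.

```python
# Sketch of the Estimator -> Actor pipeline plus a Critic used during
# training. All dimensions and layer sizes are assumptions for illustration.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=(256, 128)):
    layers, last = [], in_dim
    for h in hidden:
        layers += [nn.Linear(last, h), nn.ELU()]
        last = h
    layers.append(nn.Linear(last, out_dim))
    return nn.Sequential(*layers)

OBS_DIM = 48   # proprioceptive observation + command (forward/lateral vel., yaw rate); assumed size
EST_DIM = 7    # estimated state variables for control; assumed size
ACT_DIM = 12   # actuator commands (e.g., joint targets); assumed size

class Estimator(nn.Module):
    """Maps proprioceptive observations to estimated state variables."""
    def __init__(self):
        super().__init__()
        self.net = mlp(OBS_DIM, EST_DIM)
    def forward(self, obs):
        return self.net(obs)

class Actor(nn.Module):
    """Policy: observation + estimated state -> Gaussian over actuator commands."""
    def __init__(self):
        super().__init__()
        self.net = mlp(OBS_DIM + EST_DIM, ACT_DIM)
        self.log_std = nn.Parameter(torch.zeros(ACT_DIM))
    def forward(self, obs, est_state):
        mean = self.net(torch.cat([obs, est_state], dim=-1))
        return torch.distributions.Normal(mean, self.log_std.exp())

class Critic(nn.Module):
    """Value function, used only during training for variance reduction."""
    def __init__(self):
        super().__init__()
        self.net = mlp(OBS_DIM + EST_DIM, 1)
    def forward(self, obs, est_state):
        return self.net(torch.cat([obs, est_state], dim=-1)).squeeze(-1)
```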
Algorithm (see the sketch after this list):
- PPO for Actor & Critic
- Supervised learning for Estimator
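A minimal sketch of one concurrent update step, assuming the modules above: a simplified clipped PPO surrogate trains the Actor and Critic, while an MSE loss against the simulator's ground-truth state (`true_state`, a hypothetical batch field) supervises the Estimator. The loss coefficients, shared optimizer, and batch layout are assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def concurrent_update(estimator, actor, critic, optimizer, batch,
                      clip=0.2, value_coef=0.5, est_coef=1.0):
    # `optimizer` is assumed to hold the parameters of all three modules.
    obs, act, old_logp, adv, ret, true_state = (
        batch["obs"], batch["act"], batch["old_logp"],
        batch["adv"], batch["ret"], batch["true_state"],
    )

    est_state = estimator(obs)

    # PPO clipped surrogate for the Actor; the estimate is detached so the
    # Estimator is trained only by the supervised loss (a design assumption).
    dist = actor(obs, est_state.detach())
    logp = dist.log_prob(act).sum(-1)
    ratio = torch.exp(logp - old_logp)
    policy_loss = -torch.min(ratio * adv,
                             torch.clamp(ratio, 1 - clip, 1 + clip) * adv).mean()

    # Value regression for the Critic.
    value_loss = F.mse_loss(critic(obs, est_state.detach()), ret)

    # Supervised learning for the Estimator against simulator ground truth.
    est_loss = F.mse_loss(est_state, true_state)

    loss = policy_loss + value_coef * value_loss + est_coef * est_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Joint backpropagation through a single summed loss is what makes the training concurrent here: every batch updates the policy, the value function, and the estimator together.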
Overall framework