Feedback Control For Cassie With Deep Reinforcement Learning
Problems with model-based approaches:
⇒ the controller is not fully aware of all the details of the physical system (torque limits, joint limits, etc.)
Alternative solution ⇒ Deep RL offers a model-free approach
In this paper:
Summarized approach:
Formulate the feedback control problem as a search for an optimal imitation policy in a Markov Decision Process (MDP), then apply DRL to train controllers for bipedal walking tasks in a model-free manner, using only a single reference motion
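As a rough, self-contained illustration of this model-free recipe, here is a minimal REINFORCE-style sketch in Python. The 1-D double-integrator dynamics, linear-Gaussian policy, and exponential imitation reward are all illustrative assumptions, not the paper's actual setup; the point is the loop itself, which scores rollouts by closeness to a single reference motion $\hat{X}$ and improves the policy from samples alone:

```python
import numpy as np

# Toy 1-D double integrator (dt = 0.1): x = [position, velocity], scalar u.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
T = 50

# A single reference motion X_hat: move from 0 to 1 at constant velocity.
X_hat = np.stack([np.linspace(0.0, 1.0, T), np.full(T, 1.0 / (0.1 * T))], axis=1)

def rollout(theta, sigma):
    """One episode with a linear-Gaussian policy u = theta @ s + noise."""
    x = np.zeros(2)
    S, U, R = [], [], []
    for t in range(T):
        s = np.concatenate([x, X_hat[t]])         # state + reference to imitate
        u = theta @ s + sigma * np.random.randn()
        r = np.exp(-np.sum((x - X_hat[t]) ** 2))  # imitation reward
        S.append(s); U.append(u); R.append(r)
        x = A @ x + (B * u).ravel()               # model used ONLY to simulate
    return np.array(S), np.array(U), np.array(R)

theta, sigma, lr = np.zeros(4), 0.1, 1e-3
for _ in range(2000):
    S, U, R = rollout(theta, sigma)
    G = np.cumsum(R[::-1])[::-1]                  # reward-to-go
    # REINFORCE: grad log N(u | theta @ s, sigma^2) = (u - theta @ s) s / sigma^2
    theta += lr * ((U - S @ theta)[:, None] * S).T @ G / (sigma ** 2 * T)
```

Note that the learner never uses A and B directly; they appear only inside the simulator, which is what "model-free" means here.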
Contents
A. RL & Policy Gradient Methods
B. Feedback Control
Trajectory optimization is often done offline to produce a nominal trajectory $(\hat{X}, \hat{U})$ that satisfies the equations of motion.
Then a feedback law $u_t = g(x_t, \hat{x}_t)$ is computed online to track the nominal trajectory by minimizing some distance metric in $X$ and $U$.
→ involves solving a QP by linearizing the system dynamics along the nominal trajectory [14], [1]
→ a popular choice is the Time-Varying Linear Quadratic Regulator (TVLQR) [15]:
minimize $\displaystyle\sum_{t=1}^{T-1} \delta_{u_t}^{T} R \delta_{u_t} + \delta_{x_t}^{T} Q \delta_{x_t}$
subject to $\delta_{x_{t+1}} = A_t \delta_{x_t} + B_t \delta_{u_t}$
where $\delta_{x_t} = x_t - \hat{x}_t$, $\delta_{u_t} = u_t - \hat{u}_t$, and $A_t$, $B_t$ come from linearizing the dynamics about $(\hat{x}_t, \hat{u}_t)$.
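A minimal numpy sketch of the TVLQR backward pass, assuming the linearizations $A_t$, $B_t$ and cost weights $Q$, $R$ are already available (function and variable names are illustrative, not from [15]): the finite-horizon Riccati recursion runs backward in time to produce time-varying gains $K_t$, and the online tracking law is then $u_t = \hat{u}_t - K_t(x_t - \hat{x}_t)$.

```python
import numpy as np

def tvlqr_gains(A_seq, B_seq, Q, R, Q_final):
    """Backward Riccati recursion for the finite-horizon TVLQR above."""
    P = Q_final                              # terminal cost-to-go Hessian
    K_seq = []
    for A_t, B_t in zip(reversed(A_seq), reversed(B_seq)):
        # Optimal gain: K_t = (R + B^T P B)^{-1} B^T P A
        K_t = np.linalg.solve(R + B_t.T @ P @ B_t, B_t.T @ P @ A_t)
        # Riccati update (this compact form is valid when K_t is optimal)
        P = Q + A_t.T @ P @ (A_t - B_t @ K_t)
        K_seq.append(K_t)
    return K_seq[::-1]                       # gains ordered t = 0 .. T-2

def track(x_t, x_hat_t, u_hat_t, K_t):
    """Online tracking law: u_t = u_hat_t - K_t (x_t - x_hat_t)."""
    return u_hat_t - K_t @ (x_t - x_hat_t)
```

The gains are computed once offline from the nominal trajectory; only the cheap `track` step runs online.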
C. Feedback Control Problem Interpreted as RL Problem
Given the dynamical system $x_{t+1} = f(x_t, u_t)$ and the reference motion $\hat{X}$, we can formulate an MDP
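A minimal sketch of what such an MDP could look like in code, assuming a generic discrete-time dynamics function `f` and a Gaussian-shaped imitation reward (the class name, observation layout, and reward weight `w` are assumptions for illustration, not the paper's exact design):

```python
import numpy as np

class ImitationMDP:
    """Sketch of the MDP induced by dynamics f and reference motion X_hat."""

    def __init__(self, f, X_hat, w=10.0):
        self.f, self.X_hat, self.w = f, X_hat, w
        self.reset()

    def reset(self):
        self.t, self.x = 0, self.X_hat[0].copy()
        return self._obs()

    def _obs(self):
        # Observe the current state together with the reference to track.
        return np.concatenate([self.x, self.X_hat[self.t]])

    def step(self, u):
        self.x = self.f(self.x, u)            # x_{t+1} = f(x_t, u_t)
        self.t += 1
        err = self.x - self.X_hat[self.t]
        reward = np.exp(-self.w * err @ err)  # imitation reward in (0, 1]
        done = self.t >= len(self.X_hat) - 1
        return self._obs(), reward, done
```

A policy trained with any model-free DRL algorithm then interacts with this MDP exactly as it would with a standard gym-style environment.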