Feedback Control For Cassie With Deep Reinforcement Learning
Problems with model-based approaches:
⇒ the controller is not fully aware of all the details of the physical system (torque limits, joint limits, etc.)
Alternative solution ⇒ Deep RL offers a model-free approach
In this paper:
Summarized approach:
Formulate the feedback control problem as a search for an optimal imitation policy in a Markov Decision Process (MDP), then apply DRL to train controllers for bipedal walking tasks in a model-free manner, using only a single reference motion
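As a rough, self-contained illustration of this model-free recipe, here is a minimal REINFORCE-style sketch in Python. The 1-D double-integrator dynamics, linear-Gaussian policy, and exponential imitation reward are all illustrative assumptions, not the paper's actual setup; the point is the loop itself, which scores rollouts by closeness to a single reference motion $\hat{X}$ and improves the policy from samples alone:

```python
import numpy as np

# Toy 1-D double integrator (dt = 0.1): x = [position, velocity], scalar u.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
T = 50

# A single reference motion X_hat: move from 0 to 1 at constant velocity.
X_hat = np.stack([np.linspace(0.0, 1.0, T), np.full(T, 1.0 / (0.1 * T))], axis=1)

def rollout(theta, sigma):
    """One episode with a linear-Gaussian policy u = theta @ s + noise."""
    x = np.zeros(2)
    S, U, R = [], [], []
    for t in range(T):
        s = np.concatenate([x, X_hat[t]])         # state + reference to imitate
        u = theta @ s + sigma * np.random.randn()
        r = np.exp(-np.sum((x - X_hat[t]) ** 2))  # imitation reward
        S.append(s); U.append(u); R.append(r)
        x = A @ x + (B * u).ravel()               # model used ONLY to simulate
    return np.array(S), np.array(U), np.array(R)

theta, sigma, lr = np.zeros(4), 0.1, 1e-3
for _ in range(2000):
    S, U, R = rollout(theta, sigma)
    G = np.cumsum(R[::-1])[::-1]                  # reward-to-go
    # REINFORCE: grad log N(u | theta @ s, sigma^2) = (u - theta @ s) s / sigma^2
    theta += lr * ((U - S @ theta)[:, None] * S).T @ G / (sigma ** 2 * T)
```

Note that the learner never uses A and B directly; they appear only inside the simulator, which is what "model-free" means here.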
Contents
A. RL & Policy Gradient Methods
B. Feedback Control
Trajectory optimization is often done offline to produce a nominal trajectory $(\hat{X}, \hat{U})$ that satisfies the equations of motion.
Then a feedback law $u_t = g(x_t, \hat{x}_t)$ is computed online to track the nominal trajectory by minimizing some distance metric in $X$ and $U$.
→ involves solving a QP by linearizing the system dynamics along the nominal trajectory [14], [1]
→ a popular choice is the Time-Varying Linear Quadratic Regulator (TVLQR) [15]:
minimize $\displaystyle\sum_{t=1}^{T-1} \delta_{u_t}^{T} R \delta_{u_t} + \delta_{x_t}^{T} Q \delta_{x_t}$
subject to $\delta_{x_{t+1}} = A_t \delta_{x_t} + B_t \delta_{u_t}$
where $\delta_{x_t} = x_t - \hat{x}_t$, $\delta_{u_t} = u_t - \hat{u}_t$, and $A_t$, $B_t$ come from linearizing the dynamics about $(\hat{x}_t, \hat{u}_t)$.
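A minimal numpy sketch of the TVLQR backward pass, assuming the linearizations $A_t$, $B_t$ and cost weights $Q$, $R$ are already available (function and variable names are illustrative, not from [15]): the finite-horizon Riccati recursion runs backward in time to produce time-varying gains $K_t$, and the online tracking law is then $u_t = \hat{u}_t - K_t(x_t - \hat{x}_t)$.

```python
import numpy as np

def tvlqr_gains(A_seq, B_seq, Q, R, Q_final):
    """Backward Riccati recursion for the finite-horizon TVLQR above."""
    P = Q_final                              # terminal cost-to-go Hessian
    K_seq = []
    for A_t, B_t in zip(reversed(A_seq), reversed(B_seq)):
        # Optimal gain: K_t = (R + B^T P B)^{-1} B^T P A
        K_t = np.linalg.solve(R + B_t.T @ P @ B_t, B_t.T @ P @ A_t)
        # Riccati update (this compact form is valid when K_t is optimal)
        P = Q + A_t.T @ P @ (A_t - B_t @ K_t)
        K_seq.append(K_t)
    return K_seq[::-1]                       # gains ordered t = 0 .. T-2

def track(x_t, x_hat_t, u_hat_t, K_t):
    """Online tracking law: u_t = u_hat_t - K_t (x_t - x_hat_t)."""
    return u_hat_t - K_t @ (x_t - x_hat_t)
```

The gains are computed once offline from the nominal trajectory; only the cheap `track` step runs online.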
C. Feedback Control Problem Interpreted as RL Problem
Given the dynamical system $x_{t+1} = f(x_t, u_t)$ and the reference motion $\hat{X}$, we can formulate an MDP
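A minimal sketch of what such an MDP could look like in code, assuming a generic discrete-time dynamics function `f` and a Gaussian-shaped imitation reward (the class name, observation layout, and reward weight `w` are assumptions for illustration, not the paper's exact design):

```python
import numpy as np

class ImitationMDP:
    """Sketch of the MDP induced by dynamics f and reference motion X_hat."""

    def __init__(self, f, X_hat, w=10.0):
        self.f, self.X_hat, self.w = f, X_hat, w
        self.reset()

    def reset(self):
        self.t, self.x = 0, self.X_hat[0].copy()
        return self._obs()

    def _obs(self):
        # Observe the current state together with the reference to track.
        return np.concatenate([self.x, self.X_hat[self.t]])

    def step(self, u):
        self.x = self.f(self.x, u)            # x_{t+1} = f(x_t, u_t)
        self.t += 1
        err = self.x - self.X_hat[self.t]
        reward = np.exp(-self.w * err @ err)  # imitation reward in (0, 1]
        done = self.t >= len(self.X_hat) - 1
        return self._obs(), reward, done
```

A policy trained with any model-free DRL algorithm then interacts with this MDP exactly as it would with a standard gym-style environment.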