
Problems with model-based approaches:

⇒ the controller is not fully aware of all the system's details (torque limits, joint limits, etc.)

Alternative solution ⇒ Deep RL (DRL) offers a model-free approach

In this paper:

Summarized approach:

Formulate the feedback control problem as a search for an optimal imitation policy for a Markov Decision Process (MDP) → apply DRL to train controllers for bipedal walking tasks in a model-free manner, using a single reference motion
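To make the idea of an "imitation policy" concrete, here is a minimal sketch of a per-step reward that an MDP for imitation could use. The exponential form, the `scale` parameter, and the function name `imitation_reward` are assumptions made for illustration; these notes do not state the reward the paper actually uses.

```python
import numpy as np

def imitation_reward(state, ref_state, scale=1.0):
    """Hypothetical imitation reward in (0, 1].

    Returns 1.0 when the state matches the reference motion exactly and
    decays toward 0 as the squared tracking error grows.
    """
    err = np.asarray(state, dtype=float) - np.asarray(ref_state, dtype=float)
    return float(np.exp(-scale * np.dot(err, err)))

# Perfect tracking gives the maximum reward; any deviation gives less.
perfect = imitation_reward([0.1, -0.2], [0.1, -0.2])
off_track = imitation_reward([1.0, 0.0], [0.0, 0.0])
```

A policy trained to maximize the sum of such rewards is pushed toward the reference motion without ever being given a dynamics model, which is the model-free aspect noted above.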

Contents

Background

A. RL & Policy Gradient Methods

B. Feedback Control

Trajectory optimization is often done offline to produce a nominal trajectory, with states $\hat{X}$ and control inputs $\hat{U}$, that satisfies the equations of motion.
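What it means for a nominal trajectory to satisfy the equations of motion can be sketched on a toy system. The example below assumes a discrete-time double integrator as a stand-in for the robot dynamics and a time step `DT`; the names `step`, `rollout`, and `dynamics_residual` are hypothetical and are not taken from the paper.

```python
import numpy as np

DT = 0.01  # assumed integration time step (seconds)

def step(x, u, dt=DT):
    """Toy dynamics: x = [position, velocity], u = acceleration."""
    pos, vel = x
    return np.array([pos + dt * vel, vel + dt * u])

def rollout(x0, U_hat):
    """Build the nominal states X_hat by applying the nominal inputs U_hat."""
    X_hat = [np.asarray(x0, dtype=float)]
    for u in U_hat:
        X_hat.append(step(X_hat[-1], u))
    return np.array(X_hat)

def dynamics_residual(X_hat, U_hat):
    """Largest violation of x[t+1] = f(x[t], u[t]) along the trajectory."""
    return max(
        np.abs(X_hat[t + 1] - step(X_hat[t], U_hat[t])).max()
        for t in range(len(U_hat))
    )

# A nominal input sequence and the state trajectory it generates.
U_hat = np.ones(100)
X_hat = rollout([0.0, 0.0], U_hat)

# A trajectory whose states were edited by hand no longer obeys the dynamics.
X_bad = X_hat.copy()
X_bad[50, 0] += 0.1
```

Here `dynamics_residual(X_hat, U_hat)` is zero because the states were generated by the dynamics themselves, while the hand-edited `X_bad` gives a nonzero residual. A real trajectory optimizer would also minimize a cost and enforce limits, which this sketch omits.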