Problems with model-based approaches:
⇒ the controller is not fully aware of all the details of the robot (torque limits, joint limits, etc.)
Alternative solution ⇒ Deep RL (DRL) offers a model-free approach
In this paper:
Summarized approach:
Formulate the feedback control problem as searching for an optimal imitation policy for a Markov Decision Process → apply DRL to train controllers for bipedal walking tasks in a model-free manner with a single reference motion
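The imitation objective can be made concrete with a per-step reward that scores how closely the robot tracks the single reference motion at the current step. This is a minimal sketch under stated assumptions: the exponential-of-squared-error form, the equal weights, the choice of joint and center-of-mass terms, and the names `imitation_reward` and `reference_at` are all hypothetical and are not taken from the paper's exact reward.

```python
import math
from typing import List, Sequence


def reference_at(ref_motion: List[List[float]], step: int) -> List[float]:
    """Hypothetical helper: replay a single periodic reference motion by wrapping the step index."""
    return ref_motion[step % len(ref_motion)]


def imitation_reward(
    joint_pos: Sequence[float],
    ref_joint_pos: Sequence[float],
    com_pos: Sequence[float],
    ref_com_pos: Sequence[float],
    w_joint: float = 0.5,
    w_com: float = 0.5,
) -> float:
    """Hypothetical imitation reward: equals w_joint + w_com when tracking is perfect."""
    # Squared tracking errors for joint angles and center-of-mass position.
    joint_err = sum((q - q_ref) ** 2 for q, q_ref in zip(joint_pos, ref_joint_pos))
    com_err = sum((c - c_ref) ** 2 for c, c_ref in zip(com_pos, ref_com_pos))
    # exp(-err) maps each error into (0, 1], so larger deviations earn less reward.
    return w_joint * math.exp(-joint_err) + w_com * math.exp(-com_err)
```

A DRL algorithm (for example a policy gradient method) would then search for the policy that maximizes the expected discounted sum of such a reward over the MDP's episodes.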
Contents
A. RL & Policy Gradient Methods
B. Feedback Control
Trajectory optimization is often done offline to produce a nominal trajectory, i.e. state and input sequences $\hat{X}$ and $\hat{U}$, that satisfies the equations of motion
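A nominal trajectory played back open-loop cannot correct deviations; a feedback controller tracks it at run time. The sketch below illustrates this under loudly stated assumptions: a hypothetical 1-D double integrator (explicit Euler, time step `dt`), a nominal trajectory generated by integrating the dynamics under $\hat{U}$ so that $\hat{X}$ satisfies the equations of motion by construction, and a hand-picked linear tracking law $u_t = \hat{u}_t + K(\hat{x}_t - x_t)$. The system, the gains, and all function names are illustrative only.

```python
from typing import List, Tuple

State = Tuple[float, float]  # (position, velocity)


def step(x: State, u: float, dt: float) -> State:
    """Hypothetical double-integrator dynamics (explicit Euler): p' = v, v' = u."""
    p, v = x
    return (p + dt * v, v + dt * u)


def rollout_nominal(x0: State, U_hat: List[float], dt: float) -> List[State]:
    """Integrate the dynamics under U_hat, so X_hat satisfies the equations of motion."""
    X_hat = [x0]
    for u in U_hat:
        X_hat.append(step(X_hat[-1], u, dt))
    return X_hat


def track(
    x0: State,
    X_hat: List[State],
    U_hat: List[float],
    K: Tuple[float, float],
    dt: float,
) -> State:
    """Apply u_t = u_hat_t + K (x_hat_t - x_t); K = (0, 0) reduces to open-loop playback."""
    x = x0
    for t, u_hat in enumerate(U_hat):
        err_p = X_hat[t][0] - x[0]  # position tracking error
        err_v = X_hat[t][1] - x[1]  # velocity tracking error
        u = u_hat + K[0] * err_p + K[1] * err_v
        x = step(x, u, dt)
    return x


if __name__ == "__main__":
    dt = 0.01
    U_hat = [1.0] * 500  # constant nominal acceleration for 5 s
    X_hat = rollout_nominal((0.0, 0.0), U_hat, dt)
    x0 = (0.1, 0.0)  # real system starts 0.1 away from the nominal start
    open_loop = track(x0, X_hat, U_hat, (0.0, 0.0), dt)
    closed_loop = track(x0, X_hat, U_hat, (20.0, 8.0), dt)
    print("open-loop final error:", abs(open_loop[0] - X_hat[-1][0]))
    print("closed-loop final error:", abs(closed_loop[0] - X_hat[-1][0]))
```

With `K = (0, 0)` the initial 0.1 position offset persists for the whole rollout, while the feedback gains drive it toward zero. The DRL approach in these notes can be read as replacing this hand-tuned linear term with a learned policy.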