In this paper:
Contents:
Modular Controller Design
the most popular conventional approach to controlling physical legged systems.
breaks the control problem down into smaller submodules. Each module is based on template dynamics or heuristics and generates reference values for the next module.
Template-dynamics-based control module
approximates the robot as a point mass with a massless limb to compute the next foothold position
Given the foothold positions, the next module computes a parameterized trajectory for the foot to follow. The last module tracks the trajectory with a simple PID controller (see the sketch below).
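A minimal sketch of this three-module pipeline, assuming a Raibert-style point-mass foothold heuristic and a per-joint PID tracker (all names, gains, and the trajectory parameterization are illustrative assumptions, not the paper's):

```python
import numpy as np

def plan_foothold(com_pos, com_vel, step_time, k_vel=0.03):
    """Template-dynamics module: treat the robot as a point mass with a
    massless limb and place the next foothold ahead of the hip
    (Raibert-style heuristic; the gain k_vel is an assumed value)."""
    return com_pos + com_vel * step_time / 2.0 + k_vel * com_vel

def foot_trajectory(start, goal, swing_height, t):
    """Trajectory module: parameterized swing trajectory between footholds,
    here a linear blend plus a sinusoidal vertical clearance, t in [0, 1]."""
    p = (1.0 - t) * start + t * goal
    p[2] += swing_height * np.sin(np.pi * t)  # lift the foot mid-swing
    return p

class PID:
    """Tracking module: per-joint PID on the reference trajectory."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, ref, meas):
        err = ref - meas
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv
```

Each module consumes the previous module's reference values, which is why an error in an upstream approximation propagates downstream.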
Drawbacks:
limited detail in the modeling constrains the model's accuracy ⇒ limits the operational state domain of each module (e.g., slow acceleration, a fixed upright body pose, and limited limb velocities)
designing modular controllers is extremely laborious, and this arduous work must be repeated for every new robot or even for every new maneuver
Trajectory Optimization (TO)
able to mitigate the aforementioned problems.
two modules: planning and tracking
a series of approximations is employed to reduce complexity (a generic formulation is sketched below).
💡 Need to find more about TO…
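For reference, a generic direct-transcription TO problem (standard textbook notation, not taken from the paper): the planning module solves something like

$$
\begin{aligned}
\min_{x_{0:N},\, u_{0:N-1}} \quad & \sum_{k=0}^{N-1} \ell(x_k, u_k) + \ell_f(x_N) \\
\text{s.t.} \quad & x_{k+1} = f(x_k, u_k), \quad k = 0, \dots, N-1 \\
& x_0 = x_{\text{init}}, \qquad g(x_k, u_k) \le 0 \quad \text{(contact, torque, joint limits)}
\end{aligned}
$$

where $f$ is a simplified dynamics model (this is where the approximations above enter), and the tracking module then follows the optimized trajectory.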
Reinforcement Learning
overcomes the limitations of prior model-based approaches by learning effective controllers directly from experience.
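For reference, the standard objective behind "learning from experience" (textbook formulation, not specific to this paper): find policy parameters $\theta$ that maximize the expected discounted return

$$
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t) \right],
\qquad \theta^{\ast} = \arg\max_{\theta} J(\theta)
$$

where trajectories $\tau$ are gathered by running the policy itself rather than derived from a hand-built model.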
Drawbacks:
Direct application of learning methods to physical legged systems is complicated, so such methods are often applied only to simple models.
TO + DRL
improve simulation fidelity analytically
improve simulation fidelity in a data-driven way (e.g., the actuator network below)
Command input: forward velocity, lateral velocity, and yaw rate.
Policy network: maps the observation of the current state and the joint state history to the joint position targets
Actuator network: maps the joint state history and the joint position targets to 12 joint torque values
Rigid-body simulator: outputs the next state of the robot, given the joint torques and the current state as input
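A sketch of one control step through this pipeline (the history length, array shapes, and every function name are assumptions for illustration; the paper's exact observation layout and actuator-network inputs may differ):

```python
import numpy as np

N_JOINTS = 12  # quadruped with 12 actuated joints
HIST = 3       # assumed length of the joint state history

def control_step(policy_net, actuator_net, simulator,
                 command, state, joint_hist):
    """One step: command + state + joint history -> torques -> next state.

    command:    (3,) forward velocity, lateral velocity, yaw rate
    state:      current robot state from the rigid-body simulator
    joint_hist: past joint positions/velocities, shape (HIST, 2 * N_JOINTS)
    """
    # Policy network: observation of the current state plus the joint
    # state history -> 12 joint position targets
    obs = np.concatenate([command, state, joint_hist.ravel()])
    q_target = policy_net(obs)                       # shape (N_JOINTS,)

    # Actuator network: joint state history plus position targets -> 12
    # joint torques; this learned model is the data-driven fidelity step
    tau = actuator_net(np.concatenate([joint_hist.ravel(), q_target]))

    # Rigid-body simulator: joint torques + current state -> next state
    next_state = simulator(state, tau)
    return next_state, q_target
```

Note the division of labor: the learned actuator network replaces an analytical actuator model, while the rigid-body simulator still handles the articulated dynamics.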