Existing work:
In this paper:
uses recurrent neural networks (RNNs) for sim-to-real biped locomotion, so that policies can learn to use internal memory
RNNs are found to outperform memoryless policies in simulation but not on the real biped, because they overfit to the simulation physics
⇒ use dynamics randomization in training to prevent overfitting
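
Sketch of what dynamics randomization could look like in a training loop: physical parameters are resampled at every episode reset, so the policy never sees one fixed set of simulation physics. The parameter names, nominal values, and scale ranges below are hypothetical placeholders and are not taken from the paper; the simulator and policy update are left out.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Dynamics:
    """Hypothetical set of simulator parameters (illustrative only)."""
    joint_damping: float
    link_mass: float
    ground_friction: float


# Assumed nominal values; a real setup would read these from the robot model.
NOMINAL = Dynamics(joint_damping=1.0, link_mass=10.0, ground_friction=1.0)

# Assumed (low, high) multiplicative scales applied to each nominal value.
SCALE_RANGES = {
    "joint_damping": (0.5, 1.5),
    "link_mass": (0.7, 1.3),
    "ground_friction": (0.5, 1.1),
}


def sample_dynamics(rng: random.Random) -> Dynamics:
    """Draw one randomized parameter set by scaling each nominal value."""
    return Dynamics(**{
        name: getattr(NOMINAL, name) * rng.uniform(low, high)
        for name, (low, high) in SCALE_RANGES.items()
    })


def train(num_episodes: int, seed: int = 0) -> List[Dynamics]:
    """Resample dynamics once per episode and return the sets that were used."""
    rng = random.Random(seed)
    used = []
    for _ in range(num_episodes):
        dyn = sample_dynamics(rng)  # new physics at every episode reset
        # A real loop would configure the simulator with `dyn`,
        # roll out the policy, and apply a learning update here.
        used.append(dyn)
    return used


if __name__ == "__main__":
    for dyn in train(3):
        print(dyn)
```

Because the true parameters are hidden from the policy and change between episodes, a recurrent policy can in principle use its memory to adapt to the current dynamics instead of memorizing one simulator configuration.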
Contents: