Academic Paper

Curriculum Reinforcement Learning From Avoiding Collisions to Navigating Among Movable Obstacles in Diverse Environments
Document Type
Periodical
Source
IEEE Robotics and Automation Letters (IEEE Robot. Autom. Lett.), 8(5):2740-2747, May 2023
Subject
Robotics and Control Systems
Computing and Processing
Components, Circuits, Devices and Systems
Navigation
Training
Task analysis
Convergence
Manipulators
Measurement
Collision avoidance
curriculum learning
deep reinforcement learning
movable obstacles
search and rescue robots
Language
ISSN
2377-3766
2377-3774
Abstract
Curriculum learning has proven highly effective at speeding up training convergence and improving performance in a variety of tasks. Researchers have been studying how a curriculum can be constituted to train reinforcement learning (RL) agents in various application domains. However, discovering curriculum sequencing requires ranking sub-tasks or samples in order of difficulty, which has not yet been sufficiently studied for robot navigation problems. It is still an open question what navigation strategies can be learned and transferred during multi-stage transfer learning from easy to hard. Furthermore, despite some attempts at learning real robot manipulation tasks using a curriculum, most existing works are limited to toy or simulated settings rather than realistic scenarios. To address these issues, we first investigated how model convergence in diverse environments relates to navigation strategies and difficulty metrics. We found that policies can be trained from scratch in only some of the environments, such as a relatively open tunnel-like environment that only required wall following. We then carried out two-stage transfer learning for more difficult environments. We found this approach effective for goal navigation, but it failed for more complex tasks where movable obstacles may be on the navigation path. To facilitate more complex policies in the navigation among movable obstacles (NAMO) task, another curriculum was developed, with distance and pace functions appropriate to the difficulty of the environment. The proposed scheme proved effective, and the strategies learned are discussed via comprehensive evaluations conducted in simulated and real environments.