Academic paper

Adaptive Formation Motion Planning and Control of Autonomous Underwater Vehicles Using Deep Reinforcement Learning
Document Type
Periodical
Source
IEEE Journal of Oceanic Engineering, 49(1):311-328, Jan. 2024
Subject
Geoscience
Power, Energy and Industry Applications
Formation control
Collision avoidance
Planning
Behavioral sciences
Autonomous underwater vehicles
Vehicle dynamics
Task analysis
Actor-critic network
adaptive formation control
deep reinforcement learning (DRL)
multiple autonomous underwater vehicles (AUVs)
motion planning
obstacle avoidance
Language
English
ISSN
0364-9059
1558-1691
2373-7786
Abstract
Creating safe paths in unknown and uncertain environments is a challenging aspect of leader–follower formation control. In this architecture, the leader moves toward the target by taking optimal actions, and the followers must also avoid obstacles while maintaining the desired formation shape. Most studies in this field have treated formation control and obstacle avoidance separately. This article proposes a new approach based on deep reinforcement learning for end-to-end motion planning and control of underactuated autonomous underwater vehicles (AUVs). The aim is to design optimal adaptive distributed controllers, based on an actor–critic structure, for AUV formation motion planning. This is accomplished by controlling the speed and heading of the AUVs. For obstacle avoidance, two approaches are developed. In the first approach, control policies are designed for the leader and the followers such that each learns its own collision-free path; the followers additionally adhere to an overall formation-maintenance policy. In the second approach, only the leader learns the control policy and safely leads the whole group toward the target, while the followers' control policy is to maintain a predetermined distance and angle. The robustness of the proposed method is demonstrated under realistically perturbed conditions, including ocean currents, communication delays, and sensing errors. The efficiency of the algorithms is evaluated and verified through a number of computer-based simulations.
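In the second approach described above, each follower simply holds a predetermined distance and angle relative to the leader. A minimal geometric sketch of that follower policy is given below; all function names, gains, and the proportional control law are illustrative assumptions for exposition, not the paper's learned DRL policies, which output speed and heading commands directly.

```python
import math

def follower_setpoint(leader_x, leader_y, leader_yaw, dist, angle):
    """Desired follower position: an offset from the leader at a fixed
    distance and a fixed angle measured in the leader's body frame."""
    bearing = leader_yaw + angle
    return (leader_x + dist * math.cos(bearing),
            leader_y + dist * math.sin(bearing))

def heading_speed_command(fx, fy, fyaw, tx, ty, k_speed=0.5, k_yaw=1.0):
    """Simple proportional speed/heading law that steers the follower
    toward its setpoint (a stand-in for the paper's learned controller)."""
    dx, dy = tx - fx, ty - fy
    range_err = math.hypot(dx, dy)           # distance to the setpoint
    yaw_err = math.atan2(dy, dx) - fyaw      # heading error to the setpoint
    # wrap the heading error into [-pi, pi]
    yaw_err = math.atan2(math.sin(yaw_err), math.cos(yaw_err))
    return k_speed * range_err, k_yaw * yaw_err
```

For example, with the leader at the origin heading along +x, a follower assigned distance 2 and angle π sits directly behind the leader at (-2, 0), and the proportional law commands zero turn rate whenever the follower already points at its setpoint.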