Academic Article

Multi-Agent Deep Reinforcement Learning for Joint Decoupled User Association and Trajectory Design in Full-Duplex Multi-UAV Networks
Document Type
Periodical
Source
IEEE Transactions on Mobile Computing, 22(10):6056-6070, Oct. 2023
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Autonomous aerial vehicles
Trajectory
Optimization
Uncertainty
Reinforcement learning
Games
Full-duplex system
Decoupled UL-DL association
in-band full-duplex communication
multi-agent deep reinforcement learning
proximal policy optimization
trajectory design
unmanned aerial vehicle (UAV)
Language
English
ISSN
1536-1233
1558-0660
2161-9875
Abstract
In multi-UAV networks, the downlink (DL) and uplink (UL) associations between a UAV and a user equipment (UE) are typically coupled, which restricts each UE to associating with the same UAV for both DL and UL. However, this mode may not be efficient, since UAV networks can be heterogeneous (e.g., multi-tier UAV networks) and can experience high link uncertainty due to the mobility of UAVs. The introduction of full-duplex communication in a multi-UAV network further complicates the UE-UAV association. For this reason, the idea of DL-UL decoupling (DUDe) is introduced in this work, which allows each UE to associate with separate UAVs for UL and DL transmissions. Moreover, the UE-UAV association depends on the flight trajectories of the UAVs, which makes the DUDe design challenging. In this article, we study the joint decoupled UL-DL association and trajectory design problem for full-duplex multi-UAV networks. A joint optimization problem is formulated with the objective of maximizing the UEs' sum-rate in both UL and DL. Since the problem is non-convex with sophisticated states and an individual UAV may not know the reward functions of other UAVs, a robust partially observable Markov decision process (POMDP) model is proposed to characterize the model uncertainty. A multi-agent deep reinforcement learning (MADRL) approach is then proposed that enables each UAV to select its policy in a distributed manner. To train the actor-critic neural networks in the MADRL approach, an improved clip and count-based proximal policy optimization (PPO) algorithm is developed. In particular, a modified clip distribution is designed to deal with the hard restriction between the current and old policies, and an intrinsic reward is introduced to enhance the exploration capability. Simulation results illustrate the superiority of the proposed schemes compared to the benchmarks. The code is publicly available on GitHub (https://github.com/isdai/MADRL-PPO).
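
For reference, a minimal sketch in Python of the two PPO ingredients named in the abstract (a clipped surrogate objective and a count-based intrinsic exploration reward) is given below. This is not the authors' released implementation: the function names, the clip parameter clip_eps, the bonus weight beta, and the reward-shaping usage are illustrative assumptions, and the standard hard clip shown here is what the paper's modified clip distribution is designed to replace.

    # Illustrative sketch only -- not the authors' released code.
    # Standard PPO clipping plus a count-based intrinsic reward, as a point of
    # reference for the "improved clip and count-based PPO" described above.
    import numpy as np
    import torch

    def ppo_clip_loss(ratio, advantage, clip_eps=0.2):
        # Standard clipped surrogate loss; the article replaces this hard clip
        # with a modified clip distribution (details in the paper).
        unclipped = ratio * advantage
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
        return -torch.min(unclipped, clipped).mean()

    def count_based_bonus(visit_counts, state_key, beta=0.1):
        # Count-based intrinsic reward: decays with the number of visits to a
        # discretized state, encouraging each UAV agent to keep exploring.
        visit_counts[state_key] = visit_counts.get(state_key, 0) + 1
        return beta / np.sqrt(visit_counts[state_key])

    # Example usage (hypothetical): add the intrinsic bonus to the extrinsic
    # sum-rate reward before advantage estimation, then apply the clipped loss.
    counts = {}
    ratio = torch.tensor([1.1, 0.8, 1.3])        # pi_new / pi_old for 3 samples
    advantage = torch.tensor([0.5, -0.2, 1.0])   # estimated advantages
    loss = ppo_clip_loss(ratio, advantage)
    r_total = 1.7 + count_based_bonus(counts, state_key=(3, 5))  # 1.7: extrinsic reward
    print(float(loss), r_total)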