Journal Article

MOIPC-MAAC: Communication-Assisted Multiobjective MARL for Trajectory Planning and Task Offloading in Multi-UAV-Assisted MEC
Document Type
Periodical
Source
IEEE Internet of Things Journal (IEEE Internet Things J.), 11(10):18483-18502, May 2024
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Autonomous aerial vehicles
Task analysis
Trajectory planning
Optimization
Reinforcement learning
Energy consumption
Internet of Things
Joint trajectory planning and task offloading (JTPTO)
multiagent reinforcement learning (MARL)
multiobjective reinforcement learning
unmanned aerial vehicle (UAV) cooperation
UAV-assisted mobile edge computing (MEC)
Language
English
ISSN
2327-4662
2372-2541
Abstract
Existing joint trajectory planning and task offloading (JTPTO) methods provide ultralow-latency services for mobile devices (MDs) in unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC). However, UAVs typically serve MDs under partial observation, and the resulting information loss makes optimal service performance difficult to achieve. Moreover, the JTPTO problem typically involves multiobjective optimization, which is challenging because the objectives may conflict with one another. In this article, we present a decentralized JTPTO method based on a multiobjective and independently predicted communication multiagent actor–critic (MOIPC-MAAC). First, an IPC network is designed to help UAV agents learn a prior for inter-UAV communication. UAV agents learn this prior through causal reasoning; it maps a UAV's observation to a level of confidence in choosing communication partners. The effect of one UAV on another is predicted through the critic network in multiagent reinforcement learning (MARL) and measured to indicate the necessity of UAV-to-UAV communication. Further, we regularize JTPTO policies to utilize exchanged messages more effectively. Second, a generalized variant of the Bellman optimality operator with multiple objectives is applied to the JTPTO problem, allowing a single parameterized expression to be learned that encompasses all optimal JTPTO policies across the space of preferences. Experiments show that, compared to existing solutions, MOIPC-MAAC reduces system cost by 14.23%-19.56% and communication cost to approximately 11.23%. Moreover, compared to training from scratch, MOIPC-MAAC accelerates adaptation to new JTPTO tasks with unknown preferences by 13.12%.
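The abstract's second contribution — a generalized multiobjective Bellman operator that covers all optimal policies across the preference space — can be sketched in miniature. The snippet below is a hypothetical illustration, not the paper's algorithm: it shows a tabular, vector-valued Q-function updated under a preference weight vector w, where actions are chosen by scalarizing the objective vector with w. All names (`Q`, `update_q`, the state/action sizes) are illustrative assumptions.

```python
import numpy as np

# Illustrative setup: 4 states, 2 actions, 2 objectives (e.g., latency vs. energy).
N_STATES, N_ACTIONS, N_OBJ = 4, 2, 2
GAMMA, ALPHA = 0.9, 0.5

# Q holds a vector of expected returns, one entry per objective.
Q = np.zeros((N_STATES, N_ACTIONS, N_OBJ))

def greedy_action(Q, s, w):
    """Pick the action maximizing the preference-scalarized value w . Q(s, a)."""
    return int(np.argmax(Q[s] @ w))

def update_q(Q, s, a, r_vec, s_next, w):
    """One TD step on the vector-valued Q under preference weights w."""
    a_next = greedy_action(Q, s_next, w)
    target = r_vec + GAMMA * Q[s_next, a_next]  # vector-valued Bellman target
    Q[s, a] += ALPHA * (target - Q[s, a])

# One update with a preference favoring the first objective.
w = np.array([0.8, 0.2])
update_q(Q, s=0, a=1, r_vec=np.array([1.0, -0.5]), s_next=2, w=w)
print(Q[0, 1])  # vector estimate for (state 0, action 1)
```

Conditioning the learned function on w (rather than fixing one scalarization) is what would let a single parameterized model represent the whole family of preference-optimal policies, which is consistent with the adaptation-to-unknown-preferences result the abstract reports.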