Academic Paper

Multi-Agent Reinforcement Learning-Based Trading Decision-Making in Platooning-Assisted Vehicular Networks
Document Type
Periodical
Source
IEEE/ACM Transactions on Networking, 32(3):2143-2158, Jun. 2024
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Vehicle dynamics
Resource management
Dynamic scheduling
Training
Task analysis
Optimization
Decision making
Platooning-assisted vehicular networks
trading market
multi-agent reinforcement learning (MARL)
Language
English
ISSN
1063-6692
1558-2566
Abstract
Exploiting the stable formation and cloud-native functions of vehicle platoons allows for flexible resource provisioning in infrastructure-limited environments, particularly for dynamic and compute-intensive applications. To realize this potential, we propose a trading market that encourages interactions between service supporters (vehicle platoons) and requesters (task vehicles). Current trading decisions based on game theory and negotiation can incur unpredictable handover costs and increased communication overhead in dynamic environments. Moreover, existing research tends to overlook a mutually beneficial trading philosophy, focusing on either the service supporters' profitability or the user experience of resource-constrained requesters. To address these issues, we formulate a multi-objective optimization problem that models environmental dynamics and uncertainty, aiming to maximize both platoons' and task vehicles' long-term utilities while maintaining a satisfactory service access ratio. To solve the problem within acceptable time frames, we develop a global-local training architecture that incorporates a hybrid action space and prioritized sampling into a multi-agent reinforcement learning algorithm based on the twin delayed deep deterministic policy gradient (GL-HPMATD3). This approach enables the trading market to reach consensus on key issues, including service request selection, resource allocation, and trading pricing. Through extensive experiments and comparisons, we demonstrate our mechanism's superior performance in convergence, service access ratio, player utility, execution latency, and trading pricing relative to several state-of-the-art and baseline methods.
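The TD3 backbone named in the abstract rests on two standard ingredients: a clipped double-Q bootstrap target and target-policy smoothing. A minimal sketch of both follows; the function names and hyperparameter values are illustrative and are not taken from the paper, which extends TD3 to a multi-agent, hybrid-action setting (GL-HPMATD3):

```python
import random

def td3_target(q1_next, q2_next, reward, done, gamma=0.99):
    # Clipped double-Q: bootstrap from the smaller of the two target
    # critics' estimates to curb the overestimation bias of
    # single-critic deterministic policy gradient methods.
    bootstrap = 0.0 if done else gamma * min(q1_next, q2_next)
    return reward + bootstrap

def smoothed_target_action(mu, noise_std=0.2, noise_clip=0.5,
                           act_limit=1.0, rng=random):
    # Target-policy smoothing: add clipped Gaussian noise to the
    # target actor's action before evaluating the target critics,
    # which regularizes the learned value function.
    eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
    return max(-act_limit, min(act_limit, mu + eps))
```

In the multi-agent variant, each platoon or task-vehicle agent would maintain its own actor while critics are trained with such targets; the delayed actor updates of TD3 (updating the policy less often than the critics) are omitted here for brevity.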