Academic Article

Downlink Scheduler for Delay Guaranteed Services Using Deep Reinforcement Learning
Document Type
Periodical
Author
Source
IEEE Transactions on Mobile Computing, 23(4):3376-3390, Apr. 2024
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Signal Processing and Analysis
Delays
Scheduling
Downlink
Throughput
Optimal scheduling
Wireless networks
Simulation
Resource allocation
packet selection
delay and network utility optimality
deep reinforcement learning
Language
ISSN
1536-1233
1558-0660
2161-9875
Abstract
In this article, we propose a novel scheduling scheme to guarantee per-packet delay in single-hop wireless networks for delay-critical applications. We consider several classes of packets with different delay requirements, where high-class packets yield high utility after successful transmission. Considering the correlation of delays among competing packets, we apply a delay-laxity concept and introduce a new output gain function for scheduling decisions. In particular, the selection of a packet takes into account not only its output gain but also the delay-laxity of other packets. In this context, we formulate a multi-objective optimization problem aiming to minimize the average queue length while maximizing the average output gain under the constraint of guaranteeing per-packet delay. However, due to the uncertainty in the environment (e.g., time-varying channel conditions and random packet arrivals), it is difficult and often impractical to solve this problem using traditional optimization techniques. We develop a deep reinforcement learning (DRL)-based framework to solve it. Specifically, we decompose the original optimization problem into a set of scalar optimization subproblems and model each of them as a partially observable Markov decision process (POMDP). We then resort to a Double Deep Q Network (DDQN)-based algorithm to learn an optimal scheduling policy for each subproblem, which can handle the large-scale state space and reduce Q-value overestimation. Simulation results show that our proposed DDQN-based algorithm outperforms the conventional Q-learning algorithm in terms of reward and learning speed. In addition, our proposed scheduling scheme can achieve significant reductions in average delay and delay outage drop rate compared to other benchmark schemes.
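The abstract's stated reason for choosing DDQN over standard Q-learning/DQN is the reduction of Q-value overestimation. The core of that mechanism can be illustrated by the Double-DQN target computation: the online network selects the greedy next action, while a separate target network evaluates it. This is a minimal sketch of the general Double-DQN target rule, not code from the paper; the function names and the scalar reward/done form are illustrative.

```python
import numpy as np

def ddqn_target(q_online_next, q_target_next, reward, gamma=0.99, done=False):
    """Double-DQN bootstrap target for one transition.

    q_online_next: Q-values of the next state from the online network
    q_target_next: Q-values of the next state from the target network
    The online net picks the action; the target net scores it, which
    decouples selection from evaluation and curbs overestimation.
    """
    if done:
        return reward
    a_star = int(np.argmax(q_online_next))          # selection: online net
    return reward + gamma * q_target_next[a_star]   # evaluation: target net

def dqn_target(q_target_next, reward, gamma=0.99, done=False):
    """Vanilla DQN target for comparison: max over the target net alone,
    which both selects and evaluates, biasing targets upward."""
    if done:
        return reward
    return reward + gamma * np.max(q_target_next)
```

When the two networks disagree on which action is best, the vanilla target takes the (possibly noisy) maximum, while the Double-DQN target scores the online net's choice with the target net, yielding a lower, less biased estimate.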