Academic paper

Deep Reinforcement Learning for Optimization of RAN Slicing Relying on Control- and User-Plane Separation
Document Type
Periodical
Source
IEEE Internet of Things Journal (IEEE Internet Things J.). 11(5):8485-8498 Mar, 2024
Subject
Computing and Processing
Communication, Networking and Broadcast Technologies
Optimization
Resource management
Radio access networks
Base stations
Reinforcement learning
Network slicing
Deep learning
Asynchronous advantage actor–critic (A3C)
control- and user-plane separation (CUPS)
Lyapunov optimization
radio access network (RAN) slicing
Language
ISSN
2327-4662
2372-2541
Abstract
The rapid development of radio access network (RAN) slicing and control- and user-plane separation (CUPS) has created a new paradigm for future networks, namely, CUPS-based RAN slicing. In this article, we formulate the utility optimization problems of the CUPS-based RAN slicing system and propose a Lyapunov-based deep reinforcement learning (L-DRL) framework to solve them. Specifically, we propose that the control plane (CP) and user plane (UP) slices should control their respective power and subcarrier resources. First, we provide coverage-driven slices in the CP for coverage control and data-driven slices in the UP for diverse user requests, and we account for the influence of the coverage-driven slices on the data-driven slices. Second, we define the system's utility as income minus cost, and we formulate the utility maximization problem of the UP as a mixed-integer nonlinear programming (MINLP) problem, which is NP-hard because it involves both continuous actions (deployment density and power allocation) and a discrete action (subcarrier allocation). Furthermore, we design an alternating optimization method for the CP and UP based on the deployment densities. Finally, we develop a novel framework for mixed-action optimization problems and propose a specific Lyapunov-based asynchronous advantage actor–critic (L-A3C) algorithm. Simulation results demonstrate that the proposed L-A3C algorithm converges better than the standard A3C algorithm while achieving higher performance than Lyapunov optimization alone. Moreover, the proposed CUPS-based RAN slicing scheme surpasses the benchmark RAN slicing schemes in terms of achievable rate and delay.
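The abstract names three ingredients: a utility defined as income minus cost, Lyapunov optimization, and a mixed action space (continuous power, discrete subcarrier). The sketch below illustrates how these can fit together in a generic drift-plus-penalty objective. It is not the article's formulation: the virtual queue, the rate model `log2(1 + gain * power)`, the linear power price, the trade-off weight `v`, and every function name are assumptions made for illustration, and a brute-force search over a small power grid stands in for the learned L-A3C policy.

```python
import math


def utility(income, cost):
    """Utility as described in the abstract: income minus cost."""
    return income - cost


def update_virtual_queue(q, arrival, service):
    """Standard Lyapunov virtual-queue update (assumed, not from the article)."""
    return max(q + arrival - service, 0.0)


def drift_plus_penalty_reward(q, arrival, service, income, cost, v):
    """Generic drift-plus-penalty score: v weights utility against queue drift."""
    return v * utility(income, cost) - q * (arrival - service)


def best_mixed_action(q, gains, power_levels, price, v, arrival):
    """Brute-force stand-in for a learned policy over a mixed action.

    Discrete part: subcarrier index k. Continuous part: power p, sampled
    here on a hypothetical grid `power_levels`.
    """
    best = None
    for k, gain in enumerate(gains):
        for p in power_levels:
            rate = math.log2(1.0 + gain * p)  # assumed rate model
            score = drift_plus_penalty_reward(
                q, arrival, rate, income=rate, cost=price * p, v=v
            )
            if best is None or score > best[0]:
                best = (score, k, p)
    return best


if __name__ == "__main__":
    score, subcarrier, power = best_mixed_action(
        q=0.0, gains=[1.0, 3.0], power_levels=[0.0, 1.0],
        price=0.5, v=1.0, arrival=1.0,
    )
    print(subcarrier, power, score)
```

In a reinforcement learning setting, a score of this kind would typically serve as the per-step reward, with the virtual queue carried in the agent's state. Whether the article does so is not stated in the abstract.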