Academic Article

Scaling UPF Instances in 5G/6G Core With Deep Reinforcement Learning
Document Type
Periodical
Source
IEEE Access, vol. 9, pp. 165892-165906, 2021
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
5G mobile communication
Cloud computing
Containers
Support vector machines
Q-learning
Virtual machining
Task analysis
5G
6G
core
PDU session
UPF
deep reinforcement learning
Kubernetes
proximal policy optimization
Language
English
ISSN
2169-3536
Abstract
In the 5G core and the upcoming 6G core, the User Plane Function (UPF) transports data to and from subscribers within Protocol Data Unit (PDU) sessions. The UPF is generally implemented in software and packaged into either a virtual machine or a container that can be launched as a UPF instance with a specific resource requirement in a cluster. To limit the resources consumed by UPF instances, the number of running instances should track the number of PDU sessions requested by subscribers, which is typically controlled by a scaling algorithm. In this paper, we investigate the application of Deep Reinforcement Learning (DRL) to scaling UPF instances packaged as containers in the Kubernetes container-orchestration framework. We formulate a threshold-based reward function and adapt the proximal policy optimization (PPO) algorithm to this task. In addition, we apply a support vector machine (SVM) classifier to handle cases in which the agent suggests an unwanted action because of its stochastic policy. Extensive numerical results show that our approach outperforms Kubernetes’s built-in Horizontal Pod Autoscaler (HPA): DRL saves 2.7–3.8% of the average number of Pods, while the SVM classifier achieves savings of 0.7–4.5% compared with HPA.
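The abstract does not spell out the reward formulation or the classifier setup, so the following is only a minimal Python sketch of the general idea: a threshold-based reward that scores per-Pod utilization against an assumed target band, and an SVM filter that can veto an action sampled from the stochastic PPO policy. The function names, the thresholds (0.4/0.8), the reward values, and the action encoding (-1 = scale in, 0 = keep, +1 = scale out) are illustrative assumptions, not the authors' exact method.

```python
# Sketch only: threshold-based reward and an SVM action filter for a
# UPF-scaling agent. Thresholds, reward values, and action encoding are assumed.
import numpy as np
from sklearn.svm import SVC


def threshold_reward(sessions, pods, capacity_per_pod, low=0.4, high=0.8):
    """Reward keeping per-Pod utilization inside an assumed band [low, high].

    sessions:         current number of PDU sessions
    pods:             current number of UPF Pods
    capacity_per_pod: PDU sessions one UPF instance can serve (assumed fixed)
    """
    utilization = sessions / max(pods * capacity_per_pod, 1)
    if utilization > 1.0:            # overload: some sessions cannot be served
        return -1.0
    if low <= utilization <= high:   # desired operating band
        return 1.0
    return -0.5                      # over- or under-provisioned


class ActionFilter:
    """SVM classifier that flags likely-unwanted actions from the stochastic
    policy; if an action is rejected, fall back to keeping the Pod count."""

    def __init__(self):
        self.clf = SVC(kernel="rbf")

    def fit(self, state_action_pairs, labels):
        # labels: 1 if the (state, action) pair was acceptable, 0 otherwise
        self.clf.fit(state_action_pairs, labels)

    def apply(self, state, action):
        x = np.append(np.asarray(state, dtype=float), action).reshape(1, -1)
        return action if self.clf.predict(x)[0] == 1 else 0
```

In such a setup, the reward would be computed by the agent after each scaling decision, while the filter would sit between the PPO policy output and the call that actually changes the number of UPF Pods in the Kubernetes cluster.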