Academic Paper

Significant Sampling for Shortest Path Routing: A Deep Reinforcement Learning Solution
Document Type
Conference
Source
2019 IEEE Global Communications Conference (GLOBECOM), pp. 1-7, Dec. 2019
Subject
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Engineering Profession
General Topics for Engineers
Power, Energy and Industry Applications
Signal Processing and Analysis
Routing
Machine learning
Learning (artificial intelligence)
Delays
Monitoring
Markov processes
Language
ISSN
2576-6813
Abstract
We face a growing ecosystem of applications that produce and consume data at unprecedented rates and with strict latency requirements. Meanwhile, the bursty and unpredictable nature of their traffic can induce highly dynamic environments within networks which endanger their own viability. Unencumbered operation of these applications requires rapid (re)actions by Network Management and Control (NMC) systems, which themselves depend on timely collection of network state information. Given the size of today's networks, collecting detailed network state is prohibitively costly in terms of network transport and computational resources. Thus, judicious sampling of network states is necessary for a cost-effective NMC system. This paper proposes a deep reinforcement learning (DRL) solution that learns the principle of significant sampling and effectively balances the need for accurate state information against the cost of sampling. Modeling the problem as a Markov Decision Process, we treat the NMC system as an agent that samples the state of various network elements to make optimal routing decisions. The agent periodically receives a reward commensurate with the quality of its routing decisions. The decision on when to sample progressively improves as the agent learns the relationship between the sampling frequency and the reward function. We show that our solution achieves performance comparable to the recently published analytical optimum without requiring explicit knowledge of the traffic model. Furthermore, we show that our solution can adapt to new environments, a feature that has been largely absent from analytical treatments of the problem.
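
The trade-off described in the abstract (accurate state information versus sampling cost, framed as a Markov Decision Process) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two-path network, the LOW/HIGH delay chain, the constants, and the class and function names are hypothetical, and plain tabular Q-learning stands in for the deep reinforcement learning agent.

```python
import random

# Toy setting (all values are illustrative assumptions, not from the paper):
# two paths whose delays flip between LOW and HIGH as independent Markov chains.
LOW, HIGH = 1.0, 5.0
FLIP_PROB = 0.1      # chance that a path's delay changes state at each step
SAMPLE_COST = 0.5    # price paid each time the agent samples the network
MAX_AGE = 10         # cap on "steps since last sample" (the agent's state)


class TwoPathEnv:
    """Hypothetical environment: the agent only sees delays when it samples."""

    def __init__(self, rng):
        self.rng = rng
        self.delays = [LOW, HIGH]        # true, hidden delays
        self.known = list(self.delays)   # last sampled view of the delays
        self.age = 0                     # steps since the last sample

    def step(self, sample):
        # The true delays evolve whether or not the agent looks.
        for i in range(2):
            if self.rng.random() < FLIP_PROB:
                self.delays[i] = HIGH if self.delays[i] == LOW else LOW
        if sample:
            self.known = list(self.delays)
            self.age = 0
        else:
            self.age = min(self.age + 1, MAX_AGE)
        # Route on the path that looks shortest under the (possibly stale) view.
        path = 0 if self.known[0] <= self.known[1] else 1
        reward = -self.delays[path] - (SAMPLE_COST if sample else 0.0)
        return self.age, reward


def train(steps=20000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning over state = age of information, action = sample or not."""
    rng = random.Random(seed)
    env = TwoPathEnv(rng)
    q = [[0.0, 0.0] for _ in range(MAX_AGE + 1)]  # q[age][action]
    state = 0
    for _ in range(steps):
        if rng.random() < epsilon:                # explore
            action = rng.randrange(2)
        else:                                     # exploit current estimates
            action = 0 if q[state][0] >= q[state][1] else 1
        next_state, reward = env.step(sample=bool(action))
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q


q_table = train()
policy = [int(row[1] > row[0]) for row in q_table]  # 1 = sample at this age
print(policy)
```

The printed list gives, for each age of information, whether the learned policy samples (1) or keeps routing on the stale view (0). Raising `SAMPLE_COST` or lowering `FLIP_PROB` in this toy model would be expected to push the agent toward sampling less often, which is the qualitative behavior the abstract refers to as significant sampling.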