Journal Article

GRL-PS: Graph Embedding-Based DRL Approach for Adaptive Path Selection
Document Type
Periodical
Source
IEEE Transactions on Network and Service Management, 20(3):2639-2651, Sep. 2023
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Routing
Heuristic algorithms
Topology
Network topology
Markov processes
Decision making
Throughput
Forwarding path selection
adaptive path selection
deep reinforcement learning
graph representation learning
Language
English
ISSN
1932-4537
2373-7379
Abstract
Forwarding path selection for data traffic is one of the most fundamental operations in computer networks, and its performance drastically impacts both transmission efficiency and reliability across network domains. Although deep reinforcement learning (DRL) has attracted considerable attention for path selection as an alternative to hand-tuned heuristics, few works have considered how to exploit the graph-structured information in networks to improve routing and forwarding efficiency. In fact, generating a route is essentially the process of finding a subgraph within a graph-structured network. To this end, this paper proposes an effective and novel graph embedding-based DRL framework for adaptive path selection (termed GRL-PS), aiming to reduce end-to-end (E2E) latency and improve network throughput while maintaining stability in dynamically changing environments. Specifically, graph representation learning (GRL) is deployed as an effective enabler for the DRL agent to learn the relational knowledge of interacting entities for routing decisions. However, training such an agent in a dynamically changing environment encounters a knowledge acquisition bottleneck, since the DRL agent is forced to learn every task from scratch. To improve behavioral adaptation and to acquire skills beyond what the source policy can teach, we introduce potential-based reward shaping as a means of knowledge transfer to guide the agent in unfamiliar conditions with sparse rewards. Experimental results show that, compared with baseline methods, our solution achieves near-optimal performance in terms of both latency and throughput, especially in large-scale dynamic networks.
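The paper's exact GRL architecture is not given in this record, but the core idea the abstract describes (encoding the network graph into embeddings that the DRL agent consumes as its state) can be sketched minimally. Everything below is a hypothetical illustration: the mean-neighborhood aggregation, the feature choices, and all names are assumptions, not the authors' model.

```python
import numpy as np

# Hypothetical sketch: one simple message-passing scheme that turns a
# network topology plus per-node features into node embeddings, which a
# DRL policy could then use as its observation. GRL-PS's actual
# embedding model may differ entirely.

def embed_nodes(features, adjacency, weight, rounds=2):
    """features: (N, d) node features (e.g., queue length, link load);
    adjacency: (N, N) 0/1 connectivity matrix; weight: (d, d) projection
    that would be learned jointly with the DRL agent in practice."""
    h = features
    deg = np.maximum(adjacency.sum(axis=1, keepdims=True), 1.0)
    for _ in range(rounds):
        neighbor_mean = (adjacency @ h) / deg      # aggregate neighbor states
        h = np.tanh((h + neighbor_mean) @ weight)  # combine and transform
    return h  # (N, d) embeddings; flatten or pool into the DRL state

# Toy usage on a 3-node topology:
rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
X = rng.normal(size=(3, 4))           # per-node features
W = rng.normal(size=(4, 4)) * 0.1     # stand-in for learned weights
state = embed_nodes(X, A, W).ravel()  # observation vector for the agent
```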
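Potential-based reward shaping, which the abstract names as the knowledge-transfer mechanism, is a standard technique (Ng et al., 1999): the agent receives r + gamma*Phi(s') - Phi(s) in place of the raw reward r, which densifies sparse rewards without changing the optimal policy. The potential function below is an assumption chosen for the path-selection setting; the paper's actual Phi is not specified here.

```python
GAMMA = 0.99  # discount factor (assumed)

def potential(state):
    """Hypothetical potential Phi(s): states closer to the destination
    (fewer estimated remaining hops) get higher potential."""
    return -float(state["est_hops_to_dst"])

def shaped_reward(reward, state, next_state, done):
    """r' = r + gamma * Phi(s') - Phi(s), with Phi of a terminal state
    taken as 0 so shaping stays policy-invariant across episodes."""
    phi_next = 0.0 if done else potential(next_state)
    return reward + GAMMA * phi_next - potential(state)
```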