Journal Article

Transfer-Based DRL for Task Scheduling in Dynamic Environments for Cognitive Radar
Document Type
Periodical
Source
IEEE Transactions on Aerospace and Electronic Systems, 60(1):37-50, Feb. 2024
Subject
Aerospace
Robotics and Control Systems
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Task analysis
Training
Dynamic scheduling
Reinforcement learning
Cognitive radar
Artificial neural networks
Aerodynamics
deep reinforcement learning
multifunction radar
radar resource management
task scheduling
transfer learning
Language
English
ISSN
0018-9251
1557-9603
2371-9877
Abstract
Cognitive radars sense, interact with, and learn from the environment continuously. This paradigm can be applied to a multifunction radar (MFR), which performs multiple functions, such as surveillance, tracking, and communications. To execute these tasks, a radar resource management (RRM) module assigns the available resources to these functions while accounting for task parameters, including priority. This article focuses on the problem of task scheduling within a time window. For the time resource, RRM becomes especially challenging because 1) task requirements can be extremely heterogeneous, with multiple priority categories, and 2) the scheduling policy should adapt to a dynamic environment. Adapting to a nonstationary environment is a key benefit of cognitive radar. While previous works have developed effective techniques for homogeneous tasks in static environments, this article makes two key contributions: first, we formulate a fairly general model for the distributions of task parameters, specifically task priorities and delay tolerances; second, we develop the use of transfer learning (TL) within a deep reinforcement learning (DRL) framework to address the challenge of adapting to a varying environment. Our approach builds on a Monte Carlo Tree Search (MCTS) aided by a deep neural network (DNN). We show that TL accelerates training by transferring the policy learned by training the DNN-based MCTS on an initial parameter distribution (environment) to the policy required for a new distribution. We show that our TL-based approach adapts to both rapid and gradual changes in the environment. Our results illustrate the robustness and the computational gains achieved. Moreover, they show that a multinetwork approach is preferable to a single network trained on a series of differing environments.
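To make the transfer-learning idea in the abstract concrete, the sketch below (not the authors' code) warm-starts a task-scoring policy network trained on one task-parameter distribution ("environment") and fine-tunes it on a new one. The class and function names, the three-feature task representation (priority, delay tolerance, duration), and the supervised proxy objective standing in for DRL training are all illustrative assumptions; the paper itself uses a DNN-aided MCTS trained by DRL.

```python
# Minimal sketch of policy transfer across task-parameter distributions.
# All names and the toy objective are hypothetical, not from the paper.
import torch
import torch.nn as nn

class TaskPolicyNet(nn.Module):
    """Scores a task from (priority, delay tolerance, duration);
    a higher score means schedule earlier. Purely illustrative."""
    def __init__(self, n_features: int = 3, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, task_features: torch.Tensor) -> torch.Tensor:
        return self.net(task_features).squeeze(-1)

def sample_tasks(n: int, priority_scale: float) -> torch.Tensor:
    """Draw (priority, delay_tolerance, duration) triples; priority_scale
    stands in for the environment's task-parameter distribution."""
    priority = priority_scale * torch.rand(n, 1)
    delay_tol = torch.rand(n, 1)
    duration = 0.1 + torch.rand(n, 1)
    return torch.cat([priority, delay_tol, duration], dim=1)

def train(policy: TaskPolicyNet, priority_scale: float,
          steps: int = 200, lr: float = 1e-3) -> TaskPolicyNet:
    """Toy supervised proxy for DRL training: regress the score toward a
    priority-over-tolerance heuristic under the given distribution."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):
        tasks = sample_tasks(64, priority_scale)
        target = tasks[:, 0] / (tasks[:, 1] + 0.1)  # stand-in "good" score
        loss = nn.functional.mse_loss(policy(tasks), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy

# Source environment: train from scratch.
source_policy = train(TaskPolicyNet(), priority_scale=1.0)

# New environment: transfer (warm-start) rather than retraining from scratch.
target_policy = TaskPolicyNet()
target_policy.load_state_dict(source_policy.state_dict())  # the TL step
target_policy = train(target_policy, priority_scale=3.0, steps=50)  # short fine-tune
```

The transfer step is just the `load_state_dict` copy: fine-tuning the warm-started network needs far fewer steps than training from scratch, which mirrors the accelerated-training claim; the abstract's multinetwork finding would correspond to keeping a separate fine-tuned network per environment rather than overwriting one network in sequence.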