Academic Article

Incorporating Online Learning Into MCTS-Based Intention Progression
Document Type
Periodical
Source
IEEE Access, vol. 12, pp. 56400-56413, 2024
Subject
Aerospace
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Engineered Materials, Dielectrics and Plasmas
Engineering Profession
Fields, Waves and Electromagnetics
General Topics for Engineers
Geoscience
Nuclear Engineering
Photonics and Electrooptics
Power, Energy and Industry Applications
Robotics and Control Systems
Signal Processing and Analysis
Transportation
Simulation
Monte Carlo methods
Decision making
Maintenance
Computational modeling
Taxonomy
Search problems
Electronic learning
BDI agents
intention progression problem
Monte-Carlo tree search
online learning
Language
English
ISSN
2169-3536
Abstract
Agents have been applied to a wide variety of fields, including power systems and spacecraft. Belief-Desire-Intention (BDI) agents, one of the most widely used and researched architectures, have the advantage of being able to pursue multiple goals in parallel. Deciding “what to do” next at each deliberation cycle is therefore critical for BDI agents; this problem is known as the intention progression problem (IPP). The majority of existing approaches to the IPP overlook the significance of runtime historical data, thereby limiting the adaptability and decision-making capabilities of agents. In this paper, we propose to incorporate online learning into the current state-of-the-art intention progression approach $S_{A}$ to overcome these limitations. Doing so not only prevents $S_{A}$ from consuming computational resources on ineffective and inefficient simulations, but also significantly improves the execution efficiency of the agent; in large-scale problem domains in particular, this improvement substantially enhances the agent's planning capability. Specifically, we propose the $SA_{Q}$ and $SA_{L}$ schedulers, both of which learn how to generate “reasonable” rollouts during the simulation phase of MCTS from historical simulation data gathered at run time. We compare the performance of our approach with the state-of-the-art $S_{A}$ in a range of scenarios of increasing difficulty. The results demonstrate that our approaches outperform $S_{A}$, both in the number of goals achieved and in the computational overhead required.
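To illustrate the general idea described in the abstract, the sketch below shows one way a rollout policy could be learned online from earlier simulations and used to bias the simulation phase of MCTS. This is a minimal, hypothetical example assuming a tabular Q-learning scheme; the names (RolloutLearner, choose, update, rollout) and parameters are illustrative and do not reflect the paper's actual $SA_{Q}$ or $SA_{L}$ schedulers.

    # Illustrative sketch (not the authors' implementation): biasing the MCTS
    # rollout phase with a Q-table learned online from earlier simulations.
    # All names and hyperparameters here are hypothetical.
    import random
    from collections import defaultdict

    class RolloutLearner:
        """Tabular value estimates over (state, action) pairs seen in rollouts."""

        def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.2):
            self.q = defaultdict(float)   # values keyed by (state, action)
            self.alpha = alpha            # learning rate
            self.gamma = gamma            # discount applied along the rollout
            self.epsilon = epsilon        # exploration rate during rollouts

        def choose(self, state, actions):
            """Epsilon-greedy action choice for the simulation (rollout) phase."""
            if random.random() < self.epsilon:
                return random.choice(actions)
            return max(actions, key=lambda a: self.q[(state, a)])

        def update(self, trajectory, reward):
            """Back up the terminal reward (e.g. goals achieved) along the rollout."""
            g = reward
            for state, action in reversed(trajectory):
                key = (state, action)
                self.q[key] += self.alpha * (g - self.q[key])
                g *= self.gamma

    def rollout(state, legal_actions, step, is_terminal, score, learner, max_depth=50):
        """One learned rollout: returns the simulation value used by MCTS backup."""
        trajectory = []
        for _ in range(max_depth):
            if is_terminal(state):
                break
            action = learner.choose(state, legal_actions(state))
            trajectory.append((state, action))
            state = step(state, action)
        value = score(state)
        learner.update(trajectory, value)   # online learning from this simulation
        return value

Here the caller supplies the domain functions (legal_actions, step, is_terminal, score); reusing the same learner across simulations is what makes rollouts progressively more informed rather than uniformly random, which is the effect the abstract attributes to incorporating online learning into the simulation phase.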