Academic Paper

Fuzzy Feedback Multiagent Reinforcement Learning for Adversarial Dynamic Multiteam Competitions
Document Type
Periodical
Source
IEEE Transactions on Fuzzy Systems (IEEE Trans. Fuzzy Syst.), 32(5):2811-2824, May 2024
Subject
Computing and Processing
Heuristic algorithms
Task analysis
Optimization
Fuzzy systems
Bayes methods
Reinforcement learning
Real-time systems
Bayesian optimization (BayesOpt)
fuzzy feedback control
multiagent systems
multiteam competition
reinforcement learning (RL)
Language
ISSN
1063-6706
1941-0034
Abstract
A large proportion of recent studies on cooperative multiagent reinforcement learning (MARL) focus on the policy-learning process in scenarios with stationary opponents (or without opponents). This article instead investigates a different challenge: achieving team superiority in competitions where the opponents themselves evolve through MARL. We aim to enhance the competitiveness of such MARL learners by enabling them to adjust their own learning settings dynamically, so that they can quickly counter policy shifts of competitor learners or learn faster to suppress the opponents. We propose a competitive automultiagent learner with fuzzy feedback (CALF) with two essential highlights: 1) CALF establishes feedback controllers that make real-time adjustments based on fuzzy logic, using human-readable fuzzy rules to provide explainability and flexibility; 2) CALF integrates Bayesian optimization to search and optimize the feedback fuzzy logic rules automatically. CALF can be used to apply real-time adjustments to MARL hyperparameters and intrinsic rewards. We also give empirical results showing that CALF significantly promotes team competitiveness in adversarial competitions, ranging from small-scale tasks involving two teams to large-scale tasks involving three teams and hundreds of agents. Furthermore, CALF exhibits superior competitiveness against established competitors, such as Qmix, Qtran, and Qplex, in dynamic competitive environments. Moreover, the experiments demonstrate that integrating fuzzy logic with Bayesian optimization offers considerable transferability and explainability, enabling a CALF-implemented learner optimized in one scenario to be transferred to other distinct scenarios.
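
The abstract does not give CALF's actual rule base, so the following is only a minimal Python sketch of the general idea of a fuzzy feedback controller that adjusts a learning setting in real time. It assumes a single input (the team's recent win rate), three triangular fuzzy sets, and a zero-order Sugeno-style weighted average as defuzzification. The names `tri` and `lr_multiplier`, the set boundaries, and the default consequent values are all hypothetical and not taken from the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def lr_multiplier(win_rate, consequents=(2.0, 1.0, 0.5)):
    """Map a recent win rate in [0, 1] to a learning-rate multiplier.

    Hypothetical rule base (not from the paper):
      IF win rate is LOW    THEN multiplier = consequents[0]
      IF win rate is MEDIUM THEN multiplier = consequents[1]
      IF win rate is HIGH   THEN multiplier = consequents[2]
    """
    x = min(max(win_rate, 0.0), 1.0)  # clamp the feedback signal to [0, 1]
    strengths = (
        tri(x, -0.5, 0.0, 0.5),  # LOW, peak at 0.0
        tri(x, 0.0, 0.5, 1.0),   # MEDIUM, peak at 0.5
        tri(x, 0.5, 1.0, 1.5),   # HIGH, peak at 1.0
    )
    total = sum(strengths)  # these three sets always sum to 1 on [0, 1]
    # Weighted average of the rule consequents by rule firing strength.
    return sum(w * c for w, c in zip(strengths, consequents)) / total


if __name__ == "__main__":
    for rate in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(rate, lr_multiplier(rate))
```

In a setup like this, the tuple `consequents` is the kind of small, human-readable parameter set that a black-box search such as Bayesian optimization could tune, scoring each candidate by the competitive outcome it produces. That tuning loop is omitted here.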