Academic Paper

Value-Based Reinforcement Learning for Selective Disassembly Sequence Optimization Problems: Demonstrating and Comparing a Proposed Model
Document Type
Periodical
Source
IEEE Systems, Man, and Cybernetics Magazine (IEEE Syst. Man Cybern. Mag.), 10(2):24-31, Apr. 2024
Subject
Signal Processing and Analysis
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
General Topics for Engineers
Power, Energy and Industry Applications
Sequential analysis
Q-learning
Mathematical models
Optimization
Cybernetics
Genetic algorithms
Language
ISSN
2380-1298
2333-942X
Abstract
Selective optimal disassembly sequencing (SODS) is a methodology for disassembling waste products; mathematically, it is an optimization problem. In existing research, however, the coupling between the optimization algorithm and the established model is tied to specific processes, so the approaches generalize poorly. Because each product to be disassembled has unique characteristics, most disassembly sequences require the mathematical model to be modified or even rebuilt. In this article, reinforcement learning (RL) is used to produce a single-item selective disassembly sequence based on the AND/OR graph. First, the AND/OR graph is mapped to a value matrix that represents both the precedence relationships among components and the value of each component. Second, on the basis of the established mathematical model and graph, value-based RL is used to solve the selective disassembly sequencing problem. Finally, the experimental results of the genetic algorithm (GA), Sarsa, deep Q-learning (DQN), and CPLEX are compared to verify the correctness of the proposed model and the effectiveness of the RL algorithm.
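To make the idea of value-based RL for selective disassembly concrete, the following is a minimal sketch and not the article's model. It assumes a made-up four-part product, replaces the AND/OR graph and value matrix with plain precedence sets and per-part net values, and uses tabular Q-learning as one representative value-based method. All names (`PRECEDES`, `VALUE`, `TARGET`, `feasible`, `q_learning`, `greedy_sequence`) and all numbers are hypothetical.

```python
import random

# Hypothetical product: PRECEDES[p] lists the parts that must be removed before p.
PRECEDES = {0: set(), 1: {0}, 2: {0}, 3: {1}}
# Made-up net value of removing each part (recovered value minus removal cost).
VALUE = {0: -1.0, 1: 2.0, 2: 0.5, 3: 5.0}
# Selective disassembly: an episode ends once the target part is removed.
TARGET = 3


def feasible(removed):
    """Parts not yet removed whose predecessors have all been removed."""
    return [p for p in PRECEDES if p not in removed and PRECEDES[p] <= removed]


def q_learning(episodes=3000, alpha=0.5, gamma=1.0, epsilon=0.3, seed=0):
    """Tabular Q-learning; the state is the set of parts already removed."""
    rng = random.Random(seed)
    q = {}  # maps (state, action) to an estimated return, default 0.0
    for _ in range(episodes):
        removed = frozenset()
        while TARGET not in removed:
            actions = feasible(removed)
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q.get((removed, a), 0.0))
            nxt = removed | {action}
            if TARGET in nxt:
                future = 0.0  # terminal state has no future value
            else:
                future = max(q.get((nxt, b), 0.0) for b in feasible(nxt))
            old = q.get((removed, action), 0.0)
            target = VALUE[action] + gamma * future
            q[(removed, action)] = old + alpha * (target - old)
            removed = nxt
    return q


def greedy_sequence(q):
    """Follow the learned Q-values greedily from the fully assembled product."""
    removed, sequence = frozenset(), []
    while TARGET not in removed:
        action = max(feasible(removed), key=lambda a: q.get((removed, a), 0.0))
        sequence.append(action)
        removed = removed | {action}
    return sequence


if __name__ == "__main__":
    seq = greedy_sequence(q_learning())
    print("sequence:", seq, "total value:", sum(VALUE[p] for p in seq))
```

In this toy setup the agent has to decide whether to remove the low-value part 2 before reaching the target, which is the kind of trade-off a selective sequence involves. Swapping the update rule for an on-policy one would give a Sarsa-style variant under the same assumptions.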