Academic paper

A Robust Log Classification Approach Based on Natural Language Processing
Document Type
Conference
Source
2023 3rd International Conference on Computer, Control and Robotics (ICCCR), pp. 152-157, Mar. 2023
Subject
Computing and Processing
Robotics and Control Systems
Computational modeling
Semantics
Digital representation
Bit error rate
Process control
Feature extraction
Natural language processing
log classification
natural language processing
Part-of-Speech
word embeddings
Language
Abstract
Log data, which record the operating state of a computer system, are essential for understanding the state of that system. Log classification helps engineers monitor system status and analyze system failures. To improve the representation quality of log templates and reduce the inference time of the classification model, we propose a new log classification method based on natural language processing techniques. In this paper, three embedding methods are used to vectorize the words of each log template. Together they improve the numerical representation of log templates by making full use of the semantic, part-of-speech (PoS), and positional information of the words in each template. This word vectorization gives the log classification model more informative inputs and helps it produce better results. The classification model consists of a TextCNN and a nonlinear classifier. We use knowledge distillation to transfer knowledge from BERT to the TextCNN, improving both the accuracy and the efficiency of the proposed classification model. The effectiveness of our approach is evaluated on five public datasets and one private dataset collected from a leading global e-commerce company. The experimental results show that the proposed method achieves better classification results than other state-of-the-art log classification methods.
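The abstract names two technical ingredients without giving formulas: word vectors built from semantic, PoS, and positional embeddings, and distillation from a BERT teacher into a TextCNN student. The NumPy sketch below illustrates both ideas under loudly stated assumptions, and it is not the paper's implementation. The abstract does not say how the three embeddings are combined, so concatenation is assumed here. Sinusoidal (Transformer-style) position encodings are also an assumed choice. The loss is the standard Hinton-style distillation objective, and the temperature `T` and weight `alpha` are illustrative values, not the paper's. All function names are hypothetical. The teacher and student logits are plain arrays standing in for BERT and TextCNN outputs.

```python
import numpy as np

def sinusoidal_position(n_positions, dim):
    """Transformer-style sinusoidal position encodings (an assumed choice)."""
    pos = np.arange(n_positions)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    # Even dimensions use sine, odd dimensions use cosine.
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def vectorize_template(tokens, tags, word_emb, tag_emb, pos_dim):
    """Hypothetical helper: concatenate semantic, PoS, and position vectors per token."""
    semantic = np.stack([word_emb[t] for t in tokens])
    part_of_speech = np.stack([tag_emb[g] for g in tags])
    position = sinusoidal_position(len(tokens), pos_dim)
    return np.concatenate([semantic, part_of_speech, position], axis=1)

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style KD: alpha * hard-label CE + (1 - alpha) * T^2 * KL(teacher || student)."""
    n = len(labels)
    # Cross-entropy of the student against the ground-truth labels.
    p_s = softmax(student_logits)
    hard = -np.mean(np.log(p_s[np.arange(n), labels]))
    # KL divergence between temperature-softened teacher and student distributions.
    p_t = softmax(teacher_logits, T)
    q_s = softmax(student_logits, T)
    soft = np.mean(np.sum(p_t * (np.log(p_t) - np.log(q_s)), axis=-1))
    return alpha * hard + (1 - alpha) * (T ** 2) * soft

# Usage with toy random embeddings for a four-token log template.
rng = np.random.default_rng(0)
tokens = ["connection", "closed", "by", "<*>"]
tags = ["NOUN", "VERB", "ADP", "X"]
word_emb = {w: rng.normal(size=8) for w in tokens}
tag_emb = {g: rng.normal(size=4) for g in tags}
x = vectorize_template(tokens, tags, word_emb, tag_emb, pos_dim=4)
print(x.shape)  # (4, 16): 8 semantic + 4 PoS + 4 position dimensions per token
```

Concatenation keeps the three information sources in separate dimensions, so a downstream convolution can weight them independently. Summing them, as BERT does with its input embeddings, would instead require all three to share one dimensionality.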