Academic Paper

Offensive Chinese Text Detection Based on Multi-Feature Fusion
Document Type
Conference
Source
2023 4th International Symposium on Computer Engineering and Intelligent Communications (ISCEIC), pp. 460-465, Aug. 2023
Subject
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Computational modeling
Semantics
Text categorization
Text detection
Syntactics
Feature extraction
Object recognition
offensive text classification
character-words-sentence vector fusion
Wobert
ALBERT
Abstract
To purify the online environment, it is essential to identify objectionable content, including offensive text. However, some offensive texts are expressed subtly, so their offensiveness is not apparent from their literal features and they are difficult to detect. To improve the detection of offensive Chinese text, we propose a method based on multi-feature fusion. First, we combine the word vectors obtained from Wobert with the character vectors obtained from ALBERT. An attention mechanism assigns greater weight to key features within the word vectors. Next, we merge this fusion vector with the sentence vector generated by ALBERT, which encodes contextual semantics and syntactic information. The result is a new fusion vector that captures information at the character, word, and sentence levels. Finally, a fully connected layer processes this three-level fusion vector to produce the detection result. Experimental results show that fusing information from multiple levels yields a more comprehensive characterization of offensive text and substantially improves detection performance on offensive Chinese text.
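The abstract names the three fusion levels but not the exact operators, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the ALBERT character vectors, the Wobert word vectors, and the ALBERT sentence vector are already computed (toy random values stand in for them); character vectors are mean-pooled; the attention step is modeled as softmax-weighted pooling of the word vectors against a hypothetical learned query vector; both fusion steps are assumed to be plain concatenation; and the fully connected layer maps the fused vector to two class probabilities. All function names and the label order are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def softmax(xs: Vector) -> Vector:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_pool(vectors: List[Vector]) -> Vector:
    # Assumed character-level summary: element-wise mean of character vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def attention_pool(vectors: List[Vector], query: Vector) -> Vector:
    # Assumed attention: score each word vector against a hypothetical learned
    # query, normalise the scores with softmax, and return the weighted sum,
    # so words with higher scores contribute more to the word-level vector.
    weights = softmax([dot(v, query) for v in vectors])
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    # Fully connected layer: one weight row per output class.
    if any(len(row) != len(x) for row in weight):
        raise ValueError("weight rows must match the fused vector's dimension")
    return [dot(row, x) + b for row, b in zip(weight, bias)]


def fuse_and_classify(char_vecs: List[Vector], word_vecs: List[Vector],
                      sent_vec: Vector, query: Vector,
                      fc_weight: List[Vector], fc_bias: Vector) -> Vector:
    char_level = mean_pool(char_vecs)
    word_level = attention_pool(word_vecs, query)
    char_word = char_level + word_level       # step 1: character + word fusion (concatenation assumed)
    fused = char_word + sent_vec              # step 2: add the sentence level (concatenation assumed)
    return softmax(linear(fused, fc_weight, fc_bias))  # step 3: FC layer -> class probabilities


if __name__ == "__main__":
    rng = random.Random(0)

    def rand_vec(d: int) -> Vector:
        return [rng.uniform(-1.0, 1.0) for _ in range(d)]

    d = 4                                       # toy embedding size (real models use far more)
    char_vecs = [rand_vec(d) for _ in range(6)]  # stand-in for 6 ALBERT character vectors
    word_vecs = [rand_vec(d) for _ in range(3)]  # stand-in for 3 Wobert word vectors
    sent_vec = rand_vec(d)                       # stand-in for the ALBERT sentence vector
    query = rand_vec(d)                          # hypothetical attention query
    fc_weight = [rand_vec(3 * d) for _ in range(2)]  # 2 classes: index 0 = non-offensive, 1 = offensive (assumed)
    fc_bias = [0.0, 0.0]
    print(fuse_and_classify(char_vecs, word_vecs, sent_vec, query, fc_weight, fc_bias))
```

In a trained model the query, the fully connected weights, and usually the encoders themselves would be learned from labeled data; here they are random, so the printed probabilities carry no meaning beyond showing how the character, word, and sentence levels flow into a single classifier input of size 3 × d.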