Academic article

CFNAM-PG: Bridging Phonetic and Glyphic Information for Chinese Full Name and Abbreviation Matching Based on Simbert and DenseNet

Document Type

article

Author

Dongsheng Wang; Yue Feng; Jiawei Li; Sha Liu; Miaomiao Zhou; Diming Zhang; Huige Li

Source

International Journal of Computational Intelligence Systems, Vol 17, Iss 1, Pp 1-14 (2024)

Subject

Near homophone
Near homoglyph
Multimodal feature fusion
Full name
Abbreviation matching
Electronic computers. Computer science
QA75.5-76.95

Language

English

ISSN

1875-6883

Abstract

Matching abbreviated names with their full names (full-abbr matching) plays a key role in data integration, address matching, information retrieval, and other fields. Traditional full-abbr matching techniques often struggle with near homophones and near homoglyphs. First, a near-homophone full-abbr matching model based on Simbert and VGG was proposed. It integrates character and speech features, leverages a speech recognition model, and incorporates a brain-like dual-process cognitive learning mechanism that combines linguistic knowledge with neural networks. Second, to address near-homoglyph full-abbr matching in Chinese, a DenseNet-based model that fuses glyph structure and image features was proposed, in which statistical feature extractors separately extract feature vectors for the glyphic features: stroke, Wubi, and structural features. Lastly, the near-homophone and near-homoglyph models were coupled to work together on the full-abbr matching task, with expert knowledge used as a component of the feature optimizer. Experimental results showed that the integrated model significantly increased matching accuracy to 87.5%, a 12.3% improvement.

Online Access

Open Access (DOAJ); Web of Science; JCR journal information; Scopus; Find it@PNU
