Academic Paper

EHR-HGCN: An Enhanced Hybrid Approach for Text Classification Using Heterogeneous Graph Convolutional Networks in Electronic Health Records
Document Type
Periodical
Source
IEEE Journal of Biomedical and Health Informatics (IEEE J. Biomed. Health Inform.), vol. 28, no. 3, pp. 1668-1679, Mar. 2024
Subject
Bioengineering
Communication, Networking and Broadcast Technologies
Components, Circuits, Devices and Systems
Computing and Processing
Signal Processing and Analysis
Text categorization
Convolutional neural networks
Feature extraction
Graph neural networks
Encoding
Electronic medical records
Context modeling
Text classification
heterogeneous graph convolutional network
graph classification
electronic health records
Language
English
ISSN
2168-2194
2168-2208
Abstract
Text classification is a central task in natural language processing, with important applications in understanding the knowledge contained in biomedical texts, including electronic health records (EHR). In this article, we propose a novel heterogeneous graph convolutional network method for classifying EHR texts. Our method, called EHR-HGCN, combines context-sensitive word and sentence embeddings with structural information about sentence-level and word-level relations to perform text classification. EHR-HGCN reframes EHR text classification as a graph classification task, representing each document as a heterogeneous graph to better capture its structure. To mine contextual information from a document, EHR-HGCN first applies a bidirectional recurrent neural network (BiRNN) to word embeddings obtained via Global Vectors for word representation (GloVe), producing context-sensitive word-level and sentence-level embeddings. To mine structural relationships from the document, EHR-HGCN then constructs a heterogeneous graph over the word and sentence embeddings, in which sentence-word and word-word relationships are represented as edges. Finally, a heterogeneous graph convolutional neural network classifies each document from its graph representation. We evaluate EHR-HGCN on a variety of standard text classification benchmarks and find that it achieves higher accuracy and F1-score than other representative machine learning and deep learning methods. We also apply EHR-HGCN to the MedLit benchmark and find that it achieves high accuracy and F1-score on the task of section classification in EHR texts. Our ablation experiments show that the heterogeneous graph construction and the heterogeneous graph convolutional network are critical to the performance of EHR-HGCN.
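The abstract describes the graph-construction and graph-classification steps only at a high level. The following Python/NumPy sketch illustrates one plausible reading of those steps for a single document; it is not the authors' implementation. Several parts are assumptions: the word-word edge rule (co-occurrence within a sliding window), the use of one weight matrix per edge type (in the spirit of relational GCNs), the mean-pooling readout, and the helper names `build_hetero_graph`, `normalize`, `hetero_gcn_layer`, and `classify_document`, which are all hypothetical. Random vectors stand in for the BiRNN-over-GloVe embeddings, and the weights are untrained, so the output probabilities only demonstrate the data flow.

```python
import numpy as np


def build_hetero_graph(sentences, window=2):
    """Build sentence-word and word-word adjacency matrices for one document.

    Assumption (not specified in the abstract): a word-word edge joins two
    distinct words that co-occur within `window` positions in one sentence.
    Node order: sentence nodes first, then one node per unique word.
    """
    vocab = sorted({w for s in sentences for w in s})
    idx = {w: i for i, w in enumerate(vocab)}
    n_s = len(sentences)
    n = n_s + len(vocab)
    a_sw = np.zeros((n, n))  # sentence-word edges
    a_ww = np.zeros((n, n))  # word-word edges
    for si, sent in enumerate(sentences):
        for pos, w in enumerate(sent):
            wi = n_s + idx[w]
            a_sw[si, wi] = a_sw[wi, si] = 1.0
            # Link this word to the next `window` words in the same sentence.
            for other in sent[pos + 1: pos + 1 + window]:
                oi = n_s + idx[other]
                if oi != wi:
                    a_ww[wi, oi] = a_ww[oi, wi] = 1.0
    return vocab, a_sw, a_ww


def normalize(adj):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))  # degree >= 1 due to self-loops
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def hetero_gcn_layer(h, adjs, weights):
    """One relation-aware layer: ReLU(sum_r A_hat_r @ H @ W_r), one W_r per edge type."""
    out = sum(normalize(a) @ h @ w for a, w in zip(adjs, weights))
    return np.maximum(out, 0.0)


def classify_document(sentences, dim=8, n_classes=3, seed=0):
    """Return class probabilities for one document (untrained, illustrative only)."""
    rng = np.random.default_rng(seed)
    vocab, a_sw, a_ww = build_hetero_graph(sentences)
    n = len(sentences) + len(vocab)
    # Placeholder node features standing in for BiRNN sentence/word embeddings.
    h = rng.normal(size=(n, dim))
    layer_weights = [rng.normal(scale=0.1, size=(dim, dim)) for _ in range(2)]
    h = hetero_gcn_layer(h, [a_sw, a_ww], layer_weights)
    graph_vec = h.mean(axis=0)  # mean-pool readout over all nodes (assumption)
    logits = graph_vec @ rng.normal(scale=0.1, size=(dim, n_classes))
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    return probs / probs.sum()


if __name__ == "__main__":
    doc = [["patient", "denies", "chest", "pain"],
           ["chest", "x", "ray", "normal"]]
    vocab, a_sw, a_ww = build_hetero_graph(doc)
    print(len(vocab), a_sw.shape)  # 7 (9, 9)
    print(classify_document(doc))
```

Keeping sentence-word and word-word edges in separate adjacency matrices, each with its own weight matrix, is what makes the layer heterogeneous in this sketch; a homogeneous GCN would merge both edge types into one matrix with shared weights.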