Academic Paper

Heterogeneous Graph Contrastive Learning With Metapath-Based Augmentations
Document Type
Periodical
Source
IEEE Transactions on Emerging Topics in Computational Intelligence, 8(1):1003-1014, Feb. 2024
Subject
Computing and Processing
Task analysis
Training
Semantics
Representation learning
Mutual information
Data augmentation
Computational intelligence
Graph neural network
graph representation learning
contrastive learning
Language
English
ISSN
2471-285X
Abstract
Heterogeneous graph contrastive learning is an effective method for learning discriminative node representations in heterogeneous graphs when labels are absent. To exploit metapaths in the contrastive learning process, previous methods construct multiple metapath-based graphs from the original graph and then perform data augmentation and contrastive learning on each graph separately. However, this paradigm suffers from three defects: 1) It does not consider an augmentation scheme over the whole set of metapath-based graphs, which prevents these methods from fully leveraging the information in the metapath-based graphs to achieve better performance. 2) The final node embeddings are not optimized directly by the contrastive objective, so they are not guaranteed to be sufficiently distinctive, which leads to suboptimal performance on downstream tasks. 3) The computational complexity of its contrastive objective is high. To tackle these defects, we propose a Heterogeneous Graph Contrastive learning model with Metapath-based Augmentations (HGCMA), which is designed for downstream tasks with a small amount of labeled data. To address the first defect, HGCMA uses both semantic-level and node-level augmentation schemes: the semantic-level scheme randomly masks an entire metapath-based graph, and the node-level scheme randomly masks a certain ratio of edges in each metapath-based graph. To address the second and third defects, we use a two-stage attention aggregation graph encoder to output the final node embeddings and optimize them directly with the contrastive objective. Extensive experiments on three public datasets validate the effectiveness of HGCMA compared with state-of-the-art methods.
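The abstract describes the two augmentation schemes only at a high level. The sketch below illustrates one possible reading, assuming that metapath-based graphs are stored as undirected edge sets over target nodes (e.g. papers), that the semantic-level scheme masks one whole metapath-based graph per view, and that the node-level scheme masks a fixed ratio of edges uniformly at random in every graph. The `semantic_fusion` helper stands in for the second stage of the encoder as a HAN-style softmax over per-metapath scores; this is an assumption, since the abstract only names "a two-stage attention aggregation graph encoder". All function names and the toy paper/author/subject data are hypothetical and are not taken from HGCMA's code.

```python
import math
import random
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]                 # undirected edge (u, v) with u < v
MetapathGraphs = Dict[str, Set[Edge]]  # metapath name -> edge set over target nodes


def metapath_graph(target_to_mid: Dict[int, Set[int]]) -> Set[Edge]:
    """Build a symmetric metapath-based graph such as P-A-P: two target nodes
    are linked when they share at least one intermediate node (e.g. an author)."""
    mid_to_targets: Dict[int, Set[int]] = {}
    for target, mids in target_to_mid.items():
        for mid in mids:
            mid_to_targets.setdefault(mid, set()).add(target)
    edges: Set[Edge] = set()
    for members in mid_to_targets.values():
        for u in members:
            for v in members:
                if u < v:
                    edges.add((u, v))
    return edges


def semantic_level_mask(graphs: MetapathGraphs, rng: random.Random) -> MetapathGraphs:
    """Semantic-level augmentation (assumed form): mask one whole
    metapath-based graph chosen uniformly at random."""
    dropped = rng.choice(sorted(graphs))
    return {name: set(edges) for name, edges in graphs.items() if name != dropped}


def node_level_mask(graphs: MetapathGraphs, ratio: float,
                    rng: random.Random) -> MetapathGraphs:
    """Node-level augmentation (assumed form): mask the given ratio of edges
    in every metapath-based graph, uniformly at random."""
    masked: MetapathGraphs = {}
    for name, edges in graphs.items():
        ordered = sorted(edges)  # sort so the result depends only on rng's state
        n_drop = int(len(ordered) * ratio)
        dropped = set(rng.sample(ordered, n_drop))
        masked[name] = {e for e in ordered if e not in dropped}
    return masked


def semantic_fusion(per_metapath: Dict[str, List[float]],
                    scores: Dict[str, float]) -> List[float]:
    """Second-stage aggregation (assumed HAN-style): softmax the scores of the
    metapaths present in `per_metapath` and return the weighted sum of their
    node embeddings, so a view with a masked graph fuses the remaining ones."""
    names = sorted(per_metapath)
    top = max(scores[n] for n in names)  # subtract the max for numerical stability
    weights = [math.exp(scores[n] - top) for n in names]
    total = sum(weights)
    dim = len(per_metapath[names[0]])
    return [sum(w / total * per_metapath[n][i] for w, n in zip(weights, names))
            for i in range(dim)]


# Hypothetical toy data: papers 0-3, authors 10-12, subjects 20-21.
rng = random.Random(0)
paper_author = {0: {10}, 1: {10, 11}, 2: {11}, 3: {12}}
paper_subject = {0: {20}, 1: {21}, 2: {20}, 3: {20}}
graphs = {"PAP": metapath_graph(paper_author), "PSP": metapath_graph(paper_subject)}
view_1 = semantic_level_mask(graphs, rng)             # one metapath-based graph masked
view_2 = node_level_mask(graphs, ratio=0.5, rng=rng)  # ~half of each graph's edges masked
```

Because `semantic_fusion` only weights the metapaths present in its input, a view produced by `semantic_level_mask` can be fused over the remaining graphs without special casing. In a full encoder, the per-metapath embeddings and scores would come from the first (node-level) attention stage, which is not sketched here.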