Academic Paper

基于互信息最大化和聚类感知的节点表示学习 / Node representation learning based on mutual information maximization and cluster perception
Document Type
Academic Journal
Author
Source
云南大学学报(自然科学版) / Journal of Yunnan University (Natural Sciences Edition). 46(1):15-22
Subject
节点表示学习
互信息
聚类感知
节点分类
节点聚类
node representation learning
mutual information
cluster perception
node classification
node clustering
Language
Chinese
ISSN
0258-7971
Abstract
节点表示学习是研究各类图结构数据的基础.图结构数据具有复杂的结构关系和丰富的节点信息,因此如何融合图结构和节点信息学习高质量的节点表示仍是一个挑战性问题.为此,提出一种基于互信息最大化和聚类感知的节点表示学习模型.首先,对原始图使用图扩散方法构造扩散图;然后,使用图卷积网络编码两个图到低维特征空间获得节点表示和全局表示;最后,基于互信息最大化原理,最大化一个图的节点表示和另一个图的全局表示间的一致性,反之亦然.同时,将语义相似的节点聚类到同一个簇,并最大化两个图的节点表示间的聚类一致性.在两个引文数据集上的节点分类和节点聚类的实验结果表明,该模型的性能在多项指标上都优于基线方法.以Cora数据集为例,在节点分类任务上,该模型对比基线方法在准确率和F1 值指标上分别提高了2.7和 0.6 个百分点.
Node representation learning is a fundamental technique for studying graph-structured data. Such data exhibits complex structural relationships and rich node information, so integrating graph structure with node information to learn high-quality node representations remains a challenging problem. To address this, a node representation learning model based on mutual information maximization and cluster perception is proposed. First, a diffusion graph is constructed by applying a graph diffusion method to the original graph. Then, a graph convolutional network encodes the two graphs into a low-dimensional latent space to obtain node representations and global representations. Finally, following the principle of mutual information maximization, the agreement between the node representations of one graph and the global representation of the other graph is maximized, and vice versa. Meanwhile, nodes with similar semantics are grouped into the same cluster, and the clustering consistency between the node representations of the two graphs is maximized. Experimental results for node classification and node clustering on two citation datasets show that the proposed model outperforms baseline methods on several metrics. Taking the Cora dataset as an example, on the node classification task the model improves accuracy and F1 score by 2.7 and 0.6 percentage points, respectively, compared with the baseline methods.
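The pipeline described in the abstract can be illustrated with a minimal NumPy sketch of one forward pass. This is not the authors' implementation. The abstract does not say which diffusion method, encoder depth, discriminator, or clustering objective is used, so the following are all assumptions made for illustration: personalized PageRank diffusion, a one-layer GCN per view, a bilinear discriminator with feature shuffling for negatives, and a softmax assignment over hypothetical cluster centers. All function and variable names are hypothetical, and no training loop is shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sym_normalize(adj):
    """Return D^-1/2 (A + I) D^-1/2, the usual GCN propagation matrix."""
    a = adj + np.eye(adj.shape[0])          # self-loops keep every degree > 0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def ppr_diffusion(adj, alpha=0.2):
    """Assumed diffusion: closed-form personalized PageRank,
    alpha * (I - (1 - alpha) * A_norm)^-1, with A_norm from sym_normalize."""
    n = adj.shape[0]
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * sym_normalize(adj))

def gcn_layer(prop, x, w):
    """One graph-convolution layer: ReLU(prop @ x @ w)."""
    return np.maximum(prop @ x @ w, 0.0)

def readout(h):
    """Global (graph-level) representation: sigmoid of the mean node vector."""
    return sigmoid(h.mean(axis=0))

def cross_view_loss(h_pos, h_neg, s_other, w_disc):
    """Binary cross-entropy that scores nodes of one view against the
    global representation of the other view (assumed bilinear scorer)."""
    eps = 1e-9
    pos = sigmoid(h_pos @ w_disc @ s_other)  # real nodes should score high
    neg = sigmoid(h_neg @ w_disc @ s_other)  # corrupted nodes should score low
    return -(np.log(pos + eps).mean() + np.log(1.0 - neg + eps).mean())

def soft_assign(h, centers, tau=0.5):
    """Soft cluster assignment: row-wise softmax over similarity to centers."""
    logits = h @ centers.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def cluster_consistency_loss(p1, p2):
    """Symmetric cross-entropy between the two views' cluster assignments."""
    eps = 1e-9
    ce12 = -(p1 * np.log(p2 + eps)).sum(axis=1).mean()
    ce21 = -(p2 * np.log(p1 + eps)).sum(axis=1).mean()
    return 0.5 * (ce12 + ce21)

# Toy example: a 4-node path graph with random features and weights.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 5))                 # node features
w1 = rng.normal(size=(5, 3))                # encoder weights, original graph
w2 = rng.normal(size=(5, 3))                # encoder weights, diffusion graph
w_disc = rng.normal(size=(3, 3))            # bilinear discriminator weights
centers = rng.normal(size=(2, 3))           # hypothetical cluster centers

a_hat = sym_normalize(adj)                  # view 1: original graph
s_diff = ppr_diffusion(adj)                 # view 2: diffusion graph
x_shuf = x[rng.permutation(4)]              # corrupted features for negatives

h1, h2 = gcn_layer(a_hat, x, w1), gcn_layer(s_diff, x, w2)
h1_neg, h2_neg = gcn_layer(a_hat, x_shuf, w1), gcn_layer(s_diff, x_shuf, w2)
s1, s2 = readout(h1), readout(h2)

# Node representations of one view vs. global representation of the other.
mi_loss = (cross_view_loss(h1, h1_neg, s2, w_disc)
           + cross_view_loss(h2, h2_neg, s1, w_disc))
p1, p2 = soft_assign(h1, centers), soft_assign(h2, centers)
total_loss = mi_loss + cluster_consistency_loss(p1, p2)
print(float(total_loss))
```

In a real model the weights and cluster centers would be learned by minimizing `total_loss` with a gradient-based optimizer. The relative weighting of the two loss terms is another choice the abstract does not specify, so the sketch simply adds them.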