학술논문

Towards Generalised Pre-Training of Graph Models
Document Type
Working Paper
Source
Subject
Computer Science - Machine Learning
Computer Science - Artificial Intelligence
Language
Abstract
The principal benefit of unsupervised representation learning is that a pre-trained model can be fine-tuned where data or labels are scarce. Existing approaches for graph representation learning are domain specific, maintaining consistent node and edge features across the pre-training and target datasets. This has precluded transfer to multiple domains. In this work we present Topology Only Pre-Training, a graph pre-training method based on node and edge feature exclusion. Separating graph learning into two stages, topology and features, we use contrastive learning to pre-train models over multiple domains. These models show positive transfer on evaluation datasets from multiple domains, including domains not present in pre-training data. On 75% of experiments, ToP models perform significantly ($P \leq 0.01$) better than a supervised baseline. These results include when node and edge features are used in evaluation, where performance is significantly better on 85.7% of tasks compared to single-domain or non-pre-trained models. We further show that out-of-domain topologies can produce more useful pre-training than in-domain. We show better transfer from non-molecule pre-training, compared to molecule pre-training, on 79% of molecular benchmarks.
Comment: 23 pages, 5 figures, 11 tables. For in-development code see https://github.com/neutralpronoun/general-gcl