Self-Supervised Learning for Graph Data

Gayathri Pulagam
6 min read · Apr 25, 2021

Deep Learning has been a subject of interest for solving many complex machine learning problems, and more recently for graph data. However, most existing solutions are supervised or semi-supervised and rely heavily on labels in the data, which leads to over-fitting and weak overall robustness. Self-Supervised Learning (SSL) is an up-and-coming alternative that mines useful information from unlabelled data, making it a very interesting choice in the field of graph data.

What makes self-supervised learning more suitable for graph data?

SSL helps capture the structural and attributive information present in graph data, which would otherwise be ignored when only labelled data is used.

Getting labelled graph data is expensive and often impractical for real-world data. Because of the general and complex structure of graphs, SSL pretext tasks are a natural fit in this context.

How does self-supervised learning work on graph data?

Self-supervised models learn generalized information from unlabelled graph data by performing pretext tasks. A pretext task is a task (or combination of tasks) designed to generate useful feature representations without labelled data, which can then be used in other downstream tasks.

Graph Data and definitions

Definition of Graph

A graph is a set of nodes and a set of edges. An adjacency matrix is used to represent the topology of a graph. A graph whose nodes and edges have their own attributes (features) is called an attributed graph. A heterogeneous graph has more than one type of node or edge, while a homogeneous graph has a single node and edge type.
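
To make the definitions concrete, here is a minimal sketch of a small attributed graph in Python (the 4-node topology and 8-dimensional features are arbitrary illustrative choices, not from the survey): the adjacency matrix A captures the topology and the feature matrix X holds the node attributes.

```python
import numpy as np

# Symmetric adjacency matrix -> undirected 4-node graph
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.random.rand(4, 8)           # one 8-dimensional attribute vector per node

# The same topology viewed as an edge list of (source, target) pairs
edges = list(zip(*np.nonzero(A)))
print(edges)
```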

Types of Downstream Graph Analysis Tasks

An embedding is created by a neural network (encoder) from an input graph and then fed into an output head to perform different downstream tasks (a minimal code sketch follows the list below). There are three main types of downstream tasks, which can be summarized as below -

  1. Node-level tasks are tasks related to the nodes in a graph, for example, node classification, where a model trained on a small set of labelled nodes predicts the labels of the remaining nodes.
  2. Link-level tasks focus on the edges and the representations of node pairs, for instance, link prediction, where the goal is to predict whether an edge exists between two nodes.
  3. Graph-level tasks target the representation of entire graphs; models learn from multiple graphs and predict the properties of individual graphs.
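
As a rough illustration of the encoder-plus-output-head pattern described above, the sketch below uses a simplified single-layer GCN-style encoder and assumed sizes (not any specific model from the survey) to produce node embeddings and map them to class logits for node classification.

```python
import torch
import torch.nn as nn

class GCNEncoder(nn.Module):
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, hid_dim)

    def forward(self, A_hat, X):
        # A_hat: normalized adjacency (N x N); X: node features (N x in_dim)
        return torch.relu(A_hat @ self.lin(X))

encoder = GCNEncoder(in_dim=8, hid_dim=16)
head = nn.Linear(16, 3)              # output head for a 3-class node classification task

A_hat = torch.eye(5)                 # placeholder normalized adjacency for 5 nodes
X = torch.rand(5, 8)
logits = head(encoder(A_hat, X))     # shape (5, 3): one class score vector per node
```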

Self-Supervised Training Schemes

3 types of self-supervised training schemes (Photo source)

Based on the relationship between the graph encoder, the self-supervised pretext tasks, and the downstream tasks, self-supervised training schemes can be classified into 3 types (a hedged code sketch follows the list) -

  1. Pre-training and Fine-tuning is the first training scheme, wherein the encoder is pre-trained with pretext tasks and later fine-tuned on specific downstream tasks.
  2. Joint learning is a scheme where the encoder is trained with both the pretext and downstream tasks together.
  3. Unsupervised representation learning, where the encoder is first pre-trained with pretext tasks and then its parameters are frozen while the model is trained on the downstream tasks. In this training scheme there is no supervision during encoder training.
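
A hedged sketch of how the three schemes differ in code, using a toy linear encoder, a placeholder pretext objective, and a cross-entropy downstream loss (all stand-ins chosen for illustration, not taken from the survey):

```python
import torch
import torch.nn as nn

encoder = nn.Linear(8, 16)
head = nn.Linear(16, 3)
X, y = torch.rand(5, 8), torch.randint(0, 3, (5,))
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()))

def pretext_loss(z):                 # placeholder pretext objective
    return z.pow(2).mean()
downstream_loss = nn.CrossEntropyLoss()

# (1) Pre-training and fine-tuning: two separate stages on the same encoder.
pretext_loss(encoder(X)).backward()
opt.step(); opt.zero_grad()          # stage 1: pretext task only
downstream_loss(head(encoder(X)), y).backward()
opt.step(); opt.zero_grad()          # stage 2: downstream task only

# (2) Joint learning: one stage, weighted sum of both objectives.
loss = downstream_loss(head(encoder(X)), y) + 0.5 * pretext_loss(encoder(X))
loss.backward(); opt.step(); opt.zero_grad()

# (3) Unsupervised representation learning: pretext stage as in (1), then
#     freeze the encoder and train only the head on the downstream task.
for p in encoder.parameters():
    p.requires_grad = False
downstream_loss(head(encoder(X)), y).backward()
opt.step()
```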

Types of Graph Self-Supervised Learning

Different branches of existing graph self-supervised learning solutions (photo source)

In this section, we explore four different categories of pretext task design in graph self-supervised learning -

Masked Feature Regression (MFR)

This technique is inspired by image inpainting in computer vision, the process of restoring damaged images by filling in masked pixels. In the context of graph data, the features of nodes and edges are masked with zeros or other tokens. The goal is then to use a Graph Neural Network (GNN) to recover the masked features based on the unmasked data.

Existing approaches in this branch can be summarized as below (a minimal code sketch follows the list) -

  1. Masked node feature regression for graph completion — enables the GNN to extract features from the surrounding context
  2. AttributeMask — its objective is to reconstruct the dense feature matrix processed by PCA
  3. AttrMasking — replaces the attributes of edges and nodes with special masks, forcing the GNN to rebuild them concurrently
  4. Reconstruction techniques — reconstruct features or embeddings from clean or corrupted input and use them to train the encoder in a joint-learning fashion
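
Below is a minimal sketch of the masked feature regression idea, assuming a toy feature matrix and a small multi-layer perceptron standing in for the GNN: the features of a few nodes are zeroed out and the model is trained to reconstruct them.

```python
import torch
import torch.nn as nn

N, F = 6, 8
X = torch.rand(N, F)                          # original node features
masked_idx = torch.tensor([0, 3])             # nodes whose features get masked
X_masked = X.clone()
X_masked[masked_idx] = 0.0                    # replace masked features with zeros

# A small MLP stands in for a real GNN encoder/decoder in this sketch.
gnn = nn.Sequential(nn.Linear(F, 16), nn.ReLU(), nn.Linear(16, F))
X_rec = gnn(X_masked)                         # reconstructed feature matrix

# Regress only over the masked nodes, as in graph completion.
loss = nn.functional.mse_loss(X_rec[masked_idx], X[masked_idx])
loss.backward()
```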

Auxiliary Property Prediction (APP)

This branch extracts self-supervision signals from the underlying structural and attributive information of the graph. This can be done using regression- or classification-based approaches, summarized as below (a small code sketch follows the list) -

  1. Regression-based Approach (R-APP) — In this approach, local properties of the graph are learnt, for example, representative node properties with respect to the overall structure of the graph. This information can then be used to predict the properties of unlabelled nodes, for instance based on predefined clusters in the graph
  2. Classification-based Approach (C-APP) — In contrast to R-APP, this approach relies on constructing pseudo labels. Examples include assigning pseudo labels during training and using these self-supervised labels (attributive), grouping nodes based on inherent topology (structural), and graph property prediction, such as statistical properties or centrality of nodes
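
The following sketch illustrates a regression-based auxiliary property task under simple assumptions (node degree as the self-supervised target, a linear encoder); the property is computed directly from the adjacency matrix, so no manual labels are involved.

```python
import torch
import torch.nn as nn

N, F = 6, 8
A = (torch.rand(N, N) < 0.4).float()
A = torch.triu(A, diagonal=1)
A = A + A.t()                                 # random symmetric adjacency, no self-loops
X = torch.rand(N, F)

degree = A.sum(dim=1, keepdim=True)           # structural property, obtained for free

encoder = nn.Linear(F, 16)
reg_head = nn.Linear(16, 1)                   # auxiliary regression head

pred = reg_head(torch.relu(encoder(X)))
loss = nn.functional.mse_loss(pred, degree)   # predict each node's degree
loss.backward()
```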

Same-Scale Contrasting (SSC)

This branch of methods learns by predicting the similarity between two elements of the same scale in a graph, for example, node-node contrasting or graph-graph contrasting. Its sub-branches can be summarized as below (a contrastive-loss sketch follows the list) -

  1. Context-Based Approaches (C-SSC) — The main idea of this method is to pull contextual nodes closer in the embedding space, under the assumption that contextually similar nodes are more likely to be interconnected in the graph
  2. Augmentation-Based Approaches (A-SSC) — Augmented data samples are generated from the original data; samples derived from the same source are regarded as positive pairs, while samples from different sources are regarded as negative pairs
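
A small sketch of the augmentation-based idea, with feature dropout as an assumed (illustrative) augmentation and an InfoNCE-style loss: two corrupted views of the same node form a positive pair, while the other nodes act as negatives.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

N, D = 6, 8
X = torch.rand(N, D)
encoder = nn.Linear(D, 16)

def augment(x, drop=0.2):
    # feature dropout as a cheap augmentation
    return x * (torch.rand_like(x) > drop).float()

z1 = F.normalize(encoder(augment(X)), dim=1)  # embeddings of view 1
z2 = F.normalize(encoder(augment(X)), dim=1)  # embeddings of view 2

logits = z1 @ z2.t() / 0.5                    # cosine similarities scaled by a temperature
labels = torch.arange(N)                      # node i in view 1 matches node i in view 2
loss = F.cross_entropy(logits, labels)
loss.backward()
```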

Cross-Scale Contrasting (CSC)

As opposed to SSC, this approach learns representations by contrasting elements at different scales of the graph, for example, node-graph contrasting or node-subgraph contrasting.
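
As a rough sketch of node-graph contrasting (in the spirit of Deep Graph Infomax, but heavily simplified), node embeddings from the real graph are scored against a graph-level summary vector as positives, while embeddings computed from shuffled features serve as negatives; the bilinear discriminator here is an illustrative choice.

```python
import torch
import torch.nn as nn

N, D, H = 6, 8, 16
X = torch.rand(N, D)
encoder = nn.Linear(D, H)
W = nn.Parameter(torch.randn(H, H) * 0.1)           # bilinear discriminator weights

h_pos = torch.relu(encoder(X))                       # node embeddings from the real graph
h_neg = torch.relu(encoder(X[torch.randperm(N)]))    # embeddings from corrupted (shuffled) features
s = torch.sigmoid(h_pos.mean(dim=0))                 # graph-level summary vector

def score(h):
    return torch.sigmoid(h @ W @ s)                  # node-vs-graph agreement score

loss = -(torch.log(score(h_pos) + 1e-8).mean()
         + torch.log(1 - score(h_neg) + 1e-8).mean())
loss.backward()
```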

Hybrid Self-supervised Learning

In hybrid learning, instead of using a single approach, different types of pretext tasks are combined for better performance.

For example, GPT-GNN combines MFR and C-SSC into a graph generation task to pre-train a Graph Neural Network.

Graph-Bert uses node feature reconstruction (MFR) together with graph structure recovery (C-SSC) to pre-train a graph transformer model.

Table summarizing existing approaches under each of the four Graph SSL categories (Photo source)

Challenges

  1. Lack of theoretical foundation — Existing methods rely on intuition or empirical experiments. A strong theoretical foundation for graph SSL would close the gap between empirical SSL and graph theory
  2. Augmentation — Since many graph SSL approaches are augmentation-based, graph data augmentation schemes should be explored further
  3. Pretext tasks for complex graphs — Existing approaches mostly target attributed graphs, and only a few focus on complex graphs. Designing pretext tasks for more complex and more ubiquitous types of graphs would be a promising direction

Conclusion

Graph self-supervised learning is an interesting topic to explore, since most real-world data is graph-structured and generally unlabelled. Approaches like these provide better generalization and more robust models. Using these methods, we can learn the structural and attributive information present in graphs, which is often ignored when only labelled data is used.

This article is a summary of the paper Graph Self-Supervised Learning: A Survey
