Author: Arnaud Autef

<aside> 💡 Today, we review SimCLR, a high-performing self-supervised learning algorithm for image data.

SimCLR leverages contrastive learning to learn useful features from vast amounts of unlabeled images.

The SimCLR features allow a mere linear classifier to reach up to 76.5% top-1 and 93.2% top-5 accuracy on ImageNet.

</aside>

Graph taken from the SimCLR paper. This presents the "Top-1" accuracy of a linear classifier trained on frozen SimCLR features, on the ImageNet dataset.

Contrastive Learning in a few words

At a high level, contrastive learning refers to tasks where:

The dataset $\mathcal{D} = \{(q, x^+, x^-)\}$ is a collection of tuples, each tuple containing
- a query or anchor $q \in \mathbb{R}^p$
- one positive key $x^+ \in \mathbb{R}^p$
- $N$ negative keys $x^- = (x^-_i)_{1 \le i \le N}$ in $\mathbb{R}^p$
The machine learning model consists of an encoder $f_\theta$ that maps individual data points to a $d$-dimensional representation

$$ x \in \mathbb{R}^p \mapsto h = f_\theta(x) \in \mathbb{R}^d $$
A scoring function $s$ assigns a similarity to pairs of vectors in the representation space $\mathbb{R}^d$

$$ (h_1, h_2) \in \mathbb{R}^d \times \mathbb{R}^d \mapsto s(h_1, h_2) \in \mathbb{R} $$
The task is to learn a model that assigns a larger similarity score to a query and its positive example than to that query and any of its negative examples

$$ \tag{1} \forall (q, x^+, x^-) \in \mathcal{D},~\forall i,~\quad s(f_\theta(q), f_\theta(x^+)) \ge s(f_\theta(q), f_\theta(x^-_i)) $$
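As a minimal sketch of constraint (1), the snippet below checks it for a single tuple $(q, x^+, x^-)$. The linear encoder, the choice of cosine similarity as the scoring function $s$, and all function names are assumptions made for illustration; nothing above fixes a particular $f_\theta$ or $s$.

```python
import math

def encode(x, weights):
    """Toy linear encoder f_theta: R^p -> R^d (stand-in for a neural network)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def cosine_similarity(h1, h2):
    """Assumed scoring function s(h1, h2): cosine of the angle between h1 and h2.

    Not defined for zero vectors (division by zero).
    """
    dot = sum(a * b for a, b in zip(h1, h2))
    norm1 = math.sqrt(sum(a * a for a in h1))
    norm2 = math.sqrt(sum(b * b for b in h2))
    return dot / (norm1 * norm2)

def satisfies_constraint(q, x_pos, x_negs, weights):
    """Check constraint (1) for one tuple: the positive must score at least
    as high as every negative."""
    h_q = encode(q, weights)
    pos_score = cosine_similarity(h_q, encode(x_pos, weights))
    return all(
        pos_score >= cosine_similarity(h_q, encode(x_neg, weights))
        for x_neg in x_negs
    )

# Identity encoder with p = d = 2, purely for illustration.
identity = [[1.0, 0.0], [0.0, 1.0]]
q = [1.0, 0.0]
x_pos = [0.9, 0.1]                   # nearly aligned with the query
x_negs = [[0.0, 1.0], [-1.0, 0.2]]   # orthogonal / nearly opposite to the query

print(satisfies_constraint(q, x_pos, x_negs, identity))  # True
```

Training replaces this hard check with a differentiable loss, since (1) by itself does not say how to update $\theta$.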

<aside> 💡 From the above, we need a couple more ingredients to get to a practical task and algorithm:

1 - A smart definition of queries, positive examples and negative examples from a dataset of unlabeled data points, one that makes sense and will yield good representations.
2 - A loss function that favors models satisfying the similarity constraints (1).

→ Let's see what those are for SimCLR!

</aside>

SimCLR through the lens of contrastive learning

Queries, positive and negative examples for SimCLR

SimCLR is a form of instance-level discrimination:

Start from a raw dataset of unlabeled data points $x$ (images in the case of SimCLR).
Transform each data point $x$ twice with a stochastic function $\mathcal{T}$, to get two transformed points $\tilde{x}_1,~\tilde{x}_2$
Each transformed data point $\tilde{x}_1$ is a query, and
- $\tilde{x}_2$ is its positive example
- all other points $\tilde{x}'_j$, the transformations of other data points $x' \neq x$, are its negative examples
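The construction above can be sketched on toy vectors as follows. The Gaussian-noise transform is a hypothetical stand-in for $\mathcal{T}$ (SimCLR itself works on images), and the helper names `stochastic_transform`, `make_views` and `contrastive_tuple` are introduced here only for illustration.

```python
import random

def stochastic_transform(x, rng, noise_scale=0.1):
    """Hypothetical stand-in for T: add Gaussian noise to each coordinate."""
    return [xi + rng.gauss(0.0, noise_scale) for xi in x]

def make_views(batch, rng):
    """Apply T twice to every data point: views[i] = (x~_1, x~_2) for batch[i]."""
    return [
        (stochastic_transform(x, rng), stochastic_transform(x, rng))
        for x in batch
    ]

def contrastive_tuple(views, i):
    """Build (query, positive, negatives) for the first view of data point i.

    Negatives are both views of every *other* data point in the batch.
    """
    query, positive = views[i]
    negatives = [v for j, pair in enumerate(views) if j != i for v in pair]
    return query, positive, negatives

rng = random.Random(0)  # seeded so the example is reproducible
batch = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
views = make_views(batch, rng)
query, positive, negatives = contrastive_tuple(views, 0)

print(len(negatives))  # 2 * (len(batch) - 1) = 4
```

Under this reading, no negatives need to be sampled or labeled explicitly: with a batch of $B$ data points, each query gets $2(B - 1)$ negatives from the other points' views.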