[MICLe] Multi-Instance Contrastive Learning
Google’s Research Team
{Medical Imaging Analysis}
Paper:
Code:
MICLe uses multiple images of the same underlying pathology per patient case to construct more informative positive pairs for self-supervised learning.
A couple of things to keep in mind about the illustrated approach:
Step (1) is carried out using SimCLR, another framework designed by Google for self-supervised representation learning on images.
Unlike step (1), steps (2) and (3) are task and dataset-specific.
SimCLR stands for “A Simple Framework for Contrastive Learning of Visual Representations”. It significantly advanced the state of the art in self-supervised and semi-supervised learning, achieving a new record for image classification with a limited amount of class-labelled data.
SimCLR first learns generic representations of images on an unlabelled dataset, and can then be fine-tuned with a small number of labelled images to achieve good performance on a given classification task (such as a medical imaging task).
The generic representations are learned by simultaneously maximizing agreement between differently transformed views of the same image and minimizing agreement between transformed views of different images, following a method called contrastive learning. Updating the parameters of a neural network using this contrastive objective causes representations of corresponding views to “attract” each other, while representations of non-corresponding views “repel” each other.
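This “attract/repel” objective can be sketched as an NT-Xent-style contrastive loss. The sketch below is a minimal NumPy illustration, not the paper's implementation; the batch size and temperature are illustrative assumptions.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive (NT-Xent-style) loss over two sets of corresponding views.

    z1, z2: (N, d) arrays of projections of the two augmented views of N images;
    row i of z1 and row i of z2 form a positive pair, everything else is negative.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)              # (2N, d): all views together
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # normalize -> cosine similarity
    sim = (z @ z.T) / temperature                     # (2N, 2N) pairwise similarities
    np.fill_diagonal(sim, -np.inf)                    # a view is not its own positive
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of each positive
    # cross-entropy: pull the positive's similarity up, push all negatives down
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

Minimizing this loss drives corresponding views together and non-corresponding views apart, exactly the attract/repel behaviour described above.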
To begin, SimCLR randomly draws examples from the original dataset, transforming each example twice using a combination of simple augmentations, creating two sets of corresponding views.
It then computes the image representation using a CNN based on the ResNet architecture.
Finally, SimCLR computes a non-linear projection of the image representation using a fully-connected network (i.e., MLP), which amplifies the invariant features and maximizes the ability of the network to identify different transformations of the same image.
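The three steps above (encode each augmented view, then project it non-linearly) can be sketched as a forward pass. The encoder below is a random linear map standing in for the ResNet, and all dimensions (32×32×3 inputs, 512-d representation, 128-d projection) are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder encoder f(.): in SimCLR this is a ResNet CNN; here a random
# linear map is used purely to illustrate the shapes involved.
W_enc = rng.normal(size=(3072, 512)) * 0.01

def encoder(x):
    """h = f(x): flatten each 32x32x3 image and map it to a 512-d representation."""
    return x.reshape(x.shape[0], -1) @ W_enc

# Non-linear projection head g(.): a small 2-layer MLP, as in SimCLR.
W1 = rng.normal(size=(512, 512)) * 0.01
W2 = rng.normal(size=(512, 128)) * 0.01

def projection_head(h):
    """z = g(h): ReLU MLP mapping the representation to the contrastive space."""
    return np.maximum(h @ W1, 0.0) @ W2

# Forward pass for one batch of two augmented views (augmentations elided).
x1 = rng.normal(size=(8, 32, 32, 3))   # view 1 of 8 images
x2 = rng.normal(size=(8, 32, 32, 3))   # view 2 of the same 8 images
z1, z2 = projection_head(encoder(x1)), projection_head(encoder(x2))
```

The contrastive loss is computed on `z1` and `z2`; after pre-training, the projection head is discarded and the encoder output `h` is what gets fine-tuned.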
The trained model not only does well at identifying different transformations of the same image, but also learns representations of similar concepts (e.g., chairs or dogs), which can later be associated with labels through fine-tuning.
After the initial pre-training with SimCLR on unlabelled natural images is complete, the model is trained to capture the special characteristics of medical image datasets. This, too, could be done with SimCLR, but SimCLR constructs positive pairs only through augmentation and does not readily leverage patients’ metadata for positive-pair construction. Hence MICLe is used here.
Given multiple images of a given patient case, MICLe constructs a positive pair for self-supervised contrastive learning by drawing two crops from two distinct images from the same patient case. Such images may be taken from different viewing angles and show different body parts with the same underlying pathology.
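The pair-construction rule just described can be sketched in a few lines. The `crop` function is a hypothetical random-crop augmentation; the fallback for single-image cases is an assumption that makes the sketch self-contained.

```python
import random

def micle_positive_pair(patient_images, crop, rng=random):
    """Construct a MICLe positive pair: two crops drawn from two distinct
    images of the same patient case when available; with only one image,
    this reduces to an augmentation-only (SimCLR-style) pair.

    `crop` is any random-crop augmentation function (hypothetical here).
    """
    if len(patient_images) >= 2:
        img_a, img_b = rng.sample(patient_images, 2)   # two distinct images
    else:
        img_a = img_b = patient_images[0]              # single-image fallback
    return crop(img_a), crop(img_b)
```

Because the two crops may come from different viewpoints or body parts, the resulting positives carry more variation than augmentations of a single image could.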
This presents a great opportunity for self-supervised learning algorithms to learn representations that are robust to changes of viewpoint, imaging conditions, and other confounding factors in a direct way.
During fine-tuning, the model is trained end-to-end on the downstream supervised task’s dataset, using the weights of the pre-trained network as initialization.
For data augmentation during fine-tuning, random colour augmentation, cropping with resizing, blurring, rotation and flipping were applied to the images in both tasks (Dermatology and Chest X-rays).
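A crude NumPy sketch of such an augmentation chain is below. Every parameter (jitter range, crop fraction, the box-style blur, 90-degree rotation steps) is an illustrative assumption, not the paper's configuration, and the sketch assumes square images.

```python
import numpy as np

def finetune_augment(img, rng):
    """Illustrative fine-tuning augmentations on a square (H, W, C) image:
    colour jitter, random crop with resize, crude blur, rotation, flip."""
    img = img * rng.uniform(0.8, 1.2)                 # colour/brightness jitter
    h, w = img.shape[:2]
    ch, cw = int(h * 0.8), int(w * 0.8)               # crop to 80% of each side...
    y = int(rng.integers(0, h - ch + 1))
    x = int(rng.integers(0, w - cw + 1))
    crop = img[y:y + ch, x:x + cw]
    ys = np.arange(h) * ch // h                       # ...then nearest-neighbour
    xs = np.arange(w) * cw // w                       # resize back to (h, w)
    img = crop[np.ix_(ys, xs)]
    img = (img + np.roll(img, 1, 0) + np.roll(img, -1, 0)) / 3  # crude blur
    img = np.rot90(img, k=int(rng.integers(0, 4)))    # rotation (90-degree steps)
    if rng.random() < 0.5:
        img = np.flip(img, axis=1)                    # horizontal flip
    return img
```

In practice each transform would be applied with its own probability and parameter range found by the hyperparameter search; the sketch just shows the composition.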
For every combination of pretraining strategy and downstream fine-tuning task, an extensive hyperparameter search was performed.
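Such a search, at its simplest, is an exhaustive sweep over a grid of candidate values. The sketch below is a generic grid search, not the paper's search procedure; `train_and_eval` is a hypothetical callable returning a validation metric to maximise.

```python
from itertools import product

def grid_search(train_and_eval, grid):
    """Exhaustive hyperparameter search: evaluate every combination in
    `grid` (a dict of name -> candidate values) and return the best one."""
    names = list(grid)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        cfg = dict(zip(names, values))
        score = train_and_eval(**cfg)     # e.g. fine-tune, then validate
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Usage: `grid_search(run_finetuning, {"lr": [0.1, 0.01], "wd": [0.0, 1e-4]})` would fine-tune four times and keep the configuration with the best validation score.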