Self-Supervised Learning

Y. LeCun, I. Misra, "Self-supervised learning: The Dark Matter of Intelligence", Facebook AI, 2021.

Update

A Framework For Contrastive Self-Supervised Learning And Designing A New Approach
Spatial Cross-Attention Improves Self-Supervised Visual Representation Learning

Distinguishes both similarity and dissimilarity (contrastive)

Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International conference on machine learning. pp. 1597–1607. PMLR (2020)

He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 9729–9738 (2020)

Distinguishes similarity only (self-distillation)

Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P., Buchatskaya, E., Doersch, C., Avila Pires, B., Guo, Z., Gheshlaghi Azar, M., et al.: Bootstrap your own latent: a new approach to self-supervised learning. Advances in Neural Information Processing Systems 33, 21271–21284 (2020)
Caron, M., Touvron, H., Misra, I., Jégou, H., Mairal, J., Bojanowski, P., Joulin, A.: Emerging properties in self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 9650–9660 (2021).

Unfiltered

Bao, H., Dong, L., Wei, F.: BEiT: BERT pre-training of image transformers. arXiv preprint arXiv:2106.08254 (2021).

Applications

S. T. Ly, B. Lin, H. Q. Vo, D. Maric, B. Roysam, H. V. Nguyen, "Student Collaboration Improves Self-Supervised Learning: Dual-Loss Adaptive Masked Autoencoder for Brain Cell Image Analysis", in arXiv:2205.05194, 2022.

[MICLe] S. Azizi, B. Mustafa, F. Ryan, Z. Beaver, J. Freyberg, J. Deaton, A. Loh, A. Karthikesalingam, S. Kornblith, T. Chen, V. Natarajan, M. Norouzi, "Big Self-Supervised Models Advance Medical Image Classification", in ICCV 2021. [Paper] [Link]
"Self-Supervised Learning Based on Spatial Awareness for Medical Image Analysis"

Contrastive Learning

[CPC] A. Oord, Y. Li, O. Vinyals, Representation Learning with Contrastive Predictive Coding, in arXiv:1807.03748, 2018.
[DIM] R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, P. Bachman, A. Trischler, Y. Bengio, Learning deep representations by mutual information estimation and maximization, ICLR, 2019. [Code]
[AMDIM] P. Bachman, R. D. Hjelm, W. Buchwalter, Learning Representations by Maximizing Mutual Information Across Views, NeurIPS, 2019. [Code]
[CSL] W. Falcon, K. Cho, "A Framework For Contrastive Self-Supervised Learning And Designing A New Approach", arXiv preprint arXiv:2009.00104, 2020.
[CMC] Y. Tian, D. Krishnan, P. Isola, Contrastive Multiview Coding, ECCV, 2020. [Code]
[CPC V2] O. Henaff, A. Razavi, C. Doersch, S. Eslami, A. Oord, Data-Efficient Image Recognition with Contrastive Predictive Coding, ICML, 2020
[MoCo] K. He, H. Fan, Y. Wu, S. Xie, R. Girshick, Momentum Contrast for Unsupervised Visual Representation Learning, CVPR, 2020. [Code]
[MoCov2]
[MoCo V2+] "Improved Baselines with Momentum Contrastive Learning"
[SimCLR] T. Chen, S. Kornblith, M. Norouzi, G. Hinton, A Simple Framework for Contrastive Learning of Visual Representations, ICML, 2020. [Code]
[YADIM]
[DRC] Deep Robust Clustering by Contrastive Learning
[EqCo] "EqCo: Equivalent Rules for Self-supervised Contrastive Learning"
[InterCLR] J. Xie, X. Zhan, Z. Liu, Y. S. Ong, C. C. Loy, "Delving into Inter-Image Invariance for Unsupervised Visual Representations", International Journal of Computer Vision (IJCV), 2022.

Update

[Exemplar]
[InstDict]
[OBoW] "OBoW: Online Bag-of-Visual-Words Generation for Self-Supervised Learning", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6830-6840, 2021.
[BoWNet] S. Gidaris, A. Bursuc, N. Komodakis, P. Pérez, and M. Cord, "Learning representations by predicting bags of visual words", In CVPR, 2020.
[PCL] J. Li, P. Zhou, C. Xiong, R. Socher, and S. Hoi, "Prototypical contrastive learning of unsupervised representations", In ICLR, 2021.
[Barlow Twins] "Barlow Twins: Self-Supervised Learning via Redundancy Reduction", ICML 2021.
"Understanding Deep Contrastive Learning via Coordinate-wise Optimization"
[DeepCluster V2] "Unsupervised Learning of Visual Features by Contrasting Cluster Assignments"
[DINO] "Emerging Properties in Self-Supervised Vision Transformers"
[NNBYOL] [NNCLR] [NNSiam] D. Dwibedi, Y. Aytar, J. Tompson, P. Sermanet, A. Zisserman, "With a Little Help from My Friends: Nearest-Neighbor Contrastive Learning of Visual Representations", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9588-9597, 2021. [Code]
[ReSSL] "ReSSL: Relational Self-Supervised Learning with Weak Augmentation"
[SimSiam] X. Chen, K. He, Exploring Simple Siamese Representation Learning, CVPR, 2021.
[SupCon] P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y. Tian, P. Isola, A. Maschinot, C. Liu, and D. Krishnan, "Supervised Contrastive Learning", in Advances in Neural Information Processing Systems 33 (NeurIPS), 2020.
[W-MSE] "Whitening for Self-Supervised Representation Learning", ICML, 2021
[DCL] [DCLW] "Decoupled Contrastive Learning", ECCV, 2022.

Self-Supervision:

NPID, NPID++, DeepClusterV2, ClusterFit,

[SEER] P. Goyal, M. Caron, B. Lefaudeux, M. Xu, P. Wang, V. Pai, M. Singh, V. Liptchinsky, I. Misra, A. Joulin, P. Bojanowski, Self-supervised Pretraining of Visual Features in the Wild, arxiv, 2021. [Code]
[MI] M. Tschannen, J. Djolonga, P. K. Rubenstein, S. Gelly, M. Lucic, On Mutual Information Maximization for Representation Learning, ICLR, 2020. [Code]
[AND] J. Huang, Q. Dong, S. Gong, X. Zhu, Unsupervised Deep Learning by Neighbourhood Discovery, ICML, 2019. [Code]
Z. Wu, Y. Xiong, S. X. Yu, D. Lin, Unsupervised Feature Learning via Non-Parametric Instance Discrimination, CVPR, 2018. [Code]
Wang, Xiaolong and He, Kaiming and Gupta, Abhinav, Transitive Invariance for Self-supervised Visual Representation Learning, ICCV, 2017.
Li, Dong and Hung, Wei-Chih and Huang, Jia-Bin and Wang, Shengjin and Ahuja, Narendra and Yang, Ming-Hsuan, Unsupervised Visual Representation Learning by Graph-based Consistent Constraints, ECCV, 2016. [Code]

[InfoMin Aug] Tian et al. (2020)

Contrastive learning [15, 29, 68, 77, 81]

Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, and Andrew Zisserman. Temporal cycle-consistency learning. In CVPR, 2019.
Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine, and Google Brain. Time-contrastive networks: Self-supervised learning from video. In ICRA, 2018.
Xiaolong Wang and Abhinav Gupta. Unsupervised learning of visual representations using videos. In ICCV, 2015.
Zhirong Wu, Yuanjun Xiong, Stella X Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance discrimination. In CVPR, 2018.

Maximizing Mutual Information [4, 29, 31]

"Noise Contrastive Estimation and Negative Sampling for Conditional Models: Consistency and Statistical Efficiency" (NCE loss)
"On Variational Bounds of Mutual Information" (InfoNCE loss)
X. Ji, J.F. Henriques, and A. Vedaldi. Invariant information clustering for unsupervised image classification and segmentation. In ICCV, 2019.
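
As a reading aid for the InfoNCE entry above, here is a minimal sketch of the loss in plain Python. It assumes the common batch formulation in which queries[i] and keys[i] form the positive pair and every other key in the batch serves as a negative. The function name info_nce_loss, the helper _normalise, and the default temperature of 0.1 are illustrative choices, not taken from any of the listed papers.

```python
import math
import random


def _normalise(vector):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def info_nce_loss(queries, keys, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper, for illustration only).

    queries[i] and keys[i] are assumed to be two views of the same sample;
    keys[j] with j != i are treated as negatives for queries[i].
    """
    q = [_normalise(v) for v in queries]
    k = [_normalise(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Temperature-scaled cosine similarity of query i against every key.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        # Numerically stable log-sum-exp gives the log of the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability of picking the positive key.
        total += log_denominator - logits[i]
    return total / len(q)


rng = random.Random(0)
anchors = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]
# Slightly perturbed copies stand in for augmented views of the same samples.
aligned = [[x + 0.01 * rng.gauss(0, 1) for x in row] for row in anchors]
# Independent vectors stand in for unrelated samples.
unrelated = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]

loss_aligned = info_nce_loss(anchors, aligned)
loss_unrelated = info_nce_loss(anchors, unrelated)
```

In this toy setup the loss is lower when each key is a lightly perturbed copy of its query than when the keys are unrelated, which is the behaviour the contrastive objectives in this list rely on.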

Unfiltered:

SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption, 2021 [Paper]
Improving Contrastive Learning by Visualizing Feature Transformation, 2021
Video Contrastive Learning with Global Context, 2021
S. Chopra, R. Hadsell, Y. LeCun, "Learning a Similarity Metric Discriminatively, with Application to Face Verification", CVPR, 2005.

Application:

Semi-weakly Supervised Contrastive Representation Learning for Retinal Fundus Images, 2021 [Paper]
The Effect of the Loss on Generalization: Empirical Study on Synthetic Lung Nodule Data [Paper]
Towards Domain-Agnostic Contrastive Learning [Paper] [Personal Summary]
"A Framework For Contrastive Self-Supervised Learning And Designing A New Approach" [Review]

Non-Contrastive

[DeeperCluster]
[MoCo-v2]
[ClusterFit]
[SwAV]
[SimSiam]

To-do List:

[MIM] J. Zhou, C. Wei, H. Wang, W. Shen, C. Xie, A. Yuille, T. Kong, "iBOT: Image BERT Pre-Training with Online Tokenizer", in arXiv:2111.07832, 2021. [Paper]
[TWIST] F. Wang, T. Kong, R. Zhang, H. Liu, and H. Li, "Self-Supervised Learning by Estimating Twin Class Distributions", in arXiv preprint arXiv:2110.07402, 2021. [Paper]

Clustering-based

[SwAV] M. Caron, I. Misra, J. Mairal, P. Goyal, P. Bojanowski, and A. Joulin, "Unsupervised Learning of Visual Features by Contrasting Cluster Assignments", NeurIPS, 2020. [Code]
[SeLa] Y. M. Asano, C. Rupprecht, and A. Vedaldi, "Self-labelling Via Simultaneous Clustering and Representation Learning", ICLR, 2020. [Code]
[ClusterFit] X. Yan, I. Misra, A. Gupta, D. Ghadiyaram, and D. Mahajan, "ClusterFit: Improving Generalization of Visual Representations", CVPR, 2020.
[DeeperCluster] M. Caron, P. Bojanowski, J. Mairal, and A. Joulin, "Unsupervised Pre-Training of Image Features on Non-Curated Data", ICCV, 2019. [Code]
[DeepCluster] M. Caron, P. Bojanowski, A. Joulin, and M. Douze, "Deep Clustering for Unsupervised Learning of Visual Features", ECCV, 2018. [Code]
J. Yang, D. Parikh, and D. Batra, "Joint Unsupervised Learning of Deep Representations and Image Clusters", CVPR, 2016. [Code]
[DEC] J. Xie, R. Girshick, and A. Farhadi, "Unsupervised Deep Embedding for Clustering Analysis", ICML, 2016. [Code]
[DeepDPM] M. Ronen, S. Finder, and O. Freifeld, "DeepDPM: Deep Clustering With An Unknown Number of Clusters", in Proceedings of CVPR, 2022. [Paper] [Code] [Paper Summary]

Clustering [6, 7, 56, 78]

Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning of visual features. In ECCV, 2018.
Mathilde Caron, Piotr Bojanowski, Julien Mairal, and Armand Joulin. Unsupervised pre-training of image features on non-curated data. In ICCV, 2019.
Xiaolong Wang, Kaiming He, and Abhinav Gupta. Transitive invariance for self-supervised visual representation learning. In ICCV, pages 1329–1338, 2017.

Knowledge Transfer

Mehdi Noroozi, Ananth Vinjimoor, Paolo Favaro, and Hamed Pirsiavash. Boosting self-supervised learning via knowledge transfer. In CVPR, 2018.

Bootstrapping

[BYOL] J. Grill, F. Strub, F. Altché, C. Tallec, P. H. Richemond, E. Buchatskaya, C. Doersch, B. A. Pires, Z. Guo, M. G. Azar, B. Piot, K. Kavukcuoglu, R. Munos, M. Valko, Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning, NeurIPS, 2020. [Code]

Regularization

[VIbCReg] "Computer Vision Self-supervised Learning Methods on Time Series"
[VICReg] A. Bardes, J. Ponce, and Y. LeCun, "VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning", in ICLR, 2022 [Paper]

Masked Image Modelling

[SimMIM] "SimMIM: a Simple Framework for Masked Image Modeling" [Ref]
[ViTMAE] K He, X Chen, S Xie, Y Li, P Dollár, R Girshick "Masked Autoencoders Are Scalable Vision Learners", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16000-16009, 2022.
[L-MAE] "Label Mask AutoEncoder (L-MAE): A Pure Transformer Method to Augment Semantic Segmentation Datasets"

[UNETR+MAE] L Zhou, H Liu, J Bae, J He, D Samaras, P Prasanna, "Self Pre-training with Masked Autoencoders for Medical Image Analysis", arXiv preprint arXiv:2203.05573, 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
J Li, J Chen, Y Tang, C Wang, BA Landman, SK Zhou, "Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives", Medical Image Analysis, 2023
"Advances in Medical Image Analysis with Vision Transformers: A Comprehensive Review"
"SB-SSL: Slice-Based Self-supervised Transformers for Knee Abnormality Classification from MRI"
[MT-UNet] "SLMT-Net: A Self-supervised Learning based Multi-scale Transformer Network for Cross-Modality MR Image Synthesis"
"MaeFE: Masked Autoencoders Family of Electrocardiogram for Self-Supervised Pretraining and Transfer Learning"
"Masked Autoencoders for Low dose CT denoising"
"Swin MAE: Masked Autoencoders for Small Datasets"
"Contrastive Masked Autoencoders are Stronger Vision Learners"

Pretext Tasks

Geometric Transformation

[Geometric Rotation Prediction] https://sh-tsang.medium.com/review-rotnet-unsupervised-representation-learning-by-predicting-image-rotations-60f4e4f3cf67
[Exemplar] https://sh-tsang.medium.com/review-exemplar-cnn-discriminative-unsupervised-feature-learning-with-convolutional-neural-fa68abe937cc
[RelPatchLoc] https://medium.com/nerd-for-tech/review-unsupervised-visual-representation-learning-by-context-prediction-self-supervised-51a1d7ce6aff
[Context Prediction]
[Jigsaw Puzzle] https://sh-tsang.medium.com/review-unsupervised-learning-of-visual-representations-by-solving-jigsaw-puzzles-50b14d755004
[Frame Order Recognition]
[PIRL] I. Misra, L. Maaten, Self-Supervised Learning of Pretext-Invariant Representations, CVPR, 2020.
[AET] L. Zhang, G. J. Qi, L. Wang, J. Luo, "AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transformations rather than Data", in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[Colorization] R. Zhang, P. Isola, and A. A. Efros, “Colorful image colorization.” in ECCV, vol. 9907. Springer, pp. 649–666, 2016.
[Rotations] S. Gidaris, P. Singh, and N. Komodakis, “Unsupervised representation learning by predicting image rotations,” in ICLR, 2018.
[Jigsaw] M. Noroozi and P. Favaro, “Unsupervised learning of visual representations by solving jigsaw puzzles.” in ECCV, pp. 69–84, 2016.

Update

Pre-text tasks

Carl Doersch, Abhinav Gupta, and Alexei A Efros. Unsupervised visual representation learning by context prediction. In ICCV, pages 1422–1430, 2015.

Video - Ordering Video Frames [1, 18, 34, 41, 51, 79, 83]

Unaiza Ahsan, Rishi Madhok, and Irfan Essa. Video jigsaw: Unsupervised learning of spatiotemporal context for video action recognition. In WACV, 2019.
Basura Fernando, Hakan Bilen, Efstratios Gavves, and Stephen Gould. Self-supervised video representation learning with odd-one-out networks. In CVPR, 2017.
Dahun Kim, Donghyeon Cho, and In So Kweon. Self-supervised video representation learning with space-time cubic puzzles. In AAAI, volume 33, 2019.
Hsin-Ying Lee, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. Unsupervised representation learning by sorting sequences. In CVPR, 2017.
Ishan Misra, C Lawrence Zitnick, and Martial Hebert. Shuffle and learn: unsupervised learning using temporal order verification. In ECCV, 2016.
Donglai Wei, Joseph Lim, Andrew Zisserman, and William T. Freeman. Learning and using the arrow of time. In CVPR, 2018.
Dejing Xu, Jun Xiao, Zhou Zhao, Jian Shao, Di Xie, and Yueting Zhuang. Self-supervised spatiotemporal learning via video clip order prediction. In CVPR, 2019.

Video - Tracking [62, 77]

Deepak Pathak, Ross Girshick, Piotr Dollar, Trevor Darrell, and Bharath Hariharan. Learning features by watching objects move. In CVPR, 2017.
Xiaolong Wang and Abhinav Gupta. Unsupervised learning of visual representations using videos. In ICCV, 2015.

Audio - Cross-modal Signals [2, 3, 19, 36, 60, 61]

Relja Arandjelovic and Andrew Zisserman. Look, listen and learn. In ICCV, 2017.
Relja Arandjelovic and Andrew Zisserman. Objects that sound. In ECCV, 2018.
Ruohan Gao, Rogerio Feris, and Kristen Grauman. Learning to separate object sounds by watching unlabeled video. In ECCV, 2018.
Bruno Korbar, Du Tran, and Lorenzo Torresani. Cooperative learning of audio and video models from self-supervised synchronization. In NeurIPS, 2018.
Andrew Owens and Alexei A Efros. Audio-visual scene analysis with self-supervised multisensory features. In ECCV, 2018.
Andrew Owens, Jiajun Wu, Josh H McDermott, William T Freeman, and Antonio Torralba. Ambient sound provides supervision for visual learning. In ECCV, 2016.

Image - Image Colorization [9, 30, 38, 39, 86, 87]

Aditya Deshpande, Jason Rock, and David Forsyth. Learning large-scale automatic image colorization. In ICCV, 2015.
Satoshi Iizuka, Edgar Simo-Serra, and Hiroshi Ishikawa. Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Transactions on Graphics, 35(4):110, 2016.

Image - Orientation Prediction [20]

Image - Affine Transform Prediction [85]

Image - Predicting Contextual Image Patches [10]

Image - Reordering Image Patches [5, 21, 53, 54, 56]

Image - Counting Visual Primitives [55]

Image - Combinations [11]

Reconstruction

Image Colorization

Colorful Image Colorization | Real-Time User-Guided Image Colorization with Learned Deep Priors | Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification

Image Superresolution

Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network

Image Inpainting

Context encoders: Feature learning by inpainting

Cross-Channel Prediction

Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction

Common Sense Tasks

Image Jigsaw Puzzle

Unsupervised learning of visual representations by solving jigsaw puzzles

Context Prediction

Unsupervised Visual Representation Learning by Context Prediction

Geometric Transformation Recognition

Unsupervised Representation Learning by Predicting Image Rotations

Automatic Label Generation

Image Clustering

Synthetic Imagery

Ren et al.

SSL From Video

Frame Order Verification

Survey

Jing, et al. “Self-Supervised Visual Feature Learning with Deep Neural Networks: A Survey.”

References

Self-supervised Learning:

Contrastive Learning:

Applications:

https://paulxiong.medium.com/self-supervised-learning-advances-medical-image-classification-f1a70a85bd8

Done

Self-Supervised Learning

Overview

Dataset

Metrics

Categories

Libraries

For Speech

For Vision

Papers

Theory and Survey

Theory

Survey