[Geometric Rotation Prediction] https://sh-tsang.medium.com/review-rotnet-unsupervised-representation-learning-by-predicting-image-rotations-60f4e4f3cf67
[Context Prediction]
[Jigsaw Puzzle] https://sh-tsang.medium.com/review-unsupervised-learning-of-visual-representations-by-solving-jigsaw-puzzles-50b14d755004
[Frame Order Recognition]
[PIRL] I. Misra and L. van der Maaten, “Self-supervised learning of pretext-invariant representations,” in CVPR, 2020.
[AET] L. Zhang, G.-J. Qi, L. Wang, and J. Luo, “AET vs. AED: Unsupervised representation learning by auto-encoding transformations rather than data,” in CVPR, 2019.
[Colorization] R. Zhang, P. Isola, and A. A. Efros, “Colorful image colorization,” in ECCV, vol. 9907, Springer, pp. 649–666, 2016.
[Rotations] S. Gidaris, P. Singh, and N. Komodakis, “Unsupervised representation learning by predicting image rotations,” in ICLR, 2018.
[Jigsaw] M. Noroozi and P. Favaro, “Unsupervised learning of visual representations by solving jigsaw puzzles,” in ECCV, pp. 69–84, 2016.
Update
Carl Doersch, Abhinav Gupta, and Alexei A Efros. Unsupervised visual representation learning by context prediction. In ICCV, pages 1422–1430, 2015.
Video - Ordering Video Frames [1, 18, 34, 41, 51, 79, 83]
Unaiza Ahsan, Rishi Madhok, and Irfan Essa. Video jigsaw: Unsupervised learning of spatiotemporal context for video action recognition. In WACV, 2019.
Basura Fernando, Hakan Bilen, Efstratios Gavves, and Stephen Gould. Self-supervised video representation learning with odd-one-out networks. In CVPR, 2017.
Dahun Kim, Donghyeon Cho, and In So Kweon. Self-supervised video representation learning with space-time cubic puzzles. In AAAI, volume 33, 2019.
Hsin-Ying Lee, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. Unsupervised representation learning by sorting sequences. In CVPR, 2017.
Ishan Misra, C Lawrence Zitnick, and Martial Hebert. Shuffle and learn: unsupervised learning using temporal order verification. In ECCV, 2016.
Donglai Wei, Joseph Lim, Andrew Zisserman, and William T. Freeman. Learning and using the arrow of time. In CVPR, 2018.
Dejing Xu, Jun Xiao, Zhou Zhao, Jian Shao, Di Xie, and Yueting Zhuang. Self-supervised spatiotemporal learning via video clip order prediction. In CVPR, 2019.
Video - Tracking [62, 77]
Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. Learning features by watching objects move. In CVPR, 2017.
Xiaolong Wang and Abhinav Gupta. Unsupervised learning of visual representations using videos. In ICCV, 2015.
Audio - Cross-modal Signals [2, 3, 19, 36, 60, 61]
Relja Arandjelovic and Andrew Zisserman. Look, listen and learn. In ICCV, 2017.
Relja Arandjelovic and Andrew Zisserman. Objects that sound. In ECCV, 2018.
Ruohan Gao, Rogerio Feris, and Kristen Grauman. Learning to separate object sounds by watching unlabeled video. In ECCV, 2018.
Bruno Korbar, Du Tran, and Lorenzo Torresani. Cooperative learning of audio and video models from self-supervised synchronization. In NeurIPS, 2018.
Andrew Owens and Alexei A Efros. Audio-visual scene analysis with self-supervised multisensory features. In ECCV, 2018.
Andrew Owens, Jiajun Wu, Josh H McDermott, William T Freeman, and Antonio Torralba. Ambient sound provides supervision for visual learning. In ECCV, 2016.
Image - Image Colorization [9, 30, 38, 39, 86, 87]
Aditya Deshpande, Jason Rock, and David Forsyth. Learning large-scale automatic image colorization. In ICCV, 2015.
Satoshi Iizuka, Edgar Simo-Serra, and Hiroshi Ishikawa. Let there be color!: joint end-to-end learning of global and local image priors for automatic image colorization with simultaneous classification. ACM Transactions on Graphics, 35(4):110, 2016.
Image - Orientation Prediction [20]
Image - Affine Transform Prediction [85]
Image - Predicting Contextual Image Patches [10]
Image - Reordering Image Patches [5, 21, 53, 54, 56]
Image - Counting Visual Primitives [55]
Image - Combinations [11]