01_Classification
In computer vision, CNNs have been the dominant models for vision tasks since 2012. More recently, computer vision and NLP have been converging on a shared, more efficient class of architectures.
[HRNet] "Deep high-resolution representation learning for visual recognition", 2020. (maintain high-resolution representation throughout the whole network).
[AlexNet] ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
[VGG] Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG)
[NIN] Network In Network (NIN)
[GoogLeNet] Going Deeper with Convolutions (GoogLeNet)
[ResNet] Deep Residual Learning for Image Recognition (ResNet)
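The key idea in the ResNet paper above is the residual block, which learns a residual function F(x) and outputs F(x) + x, so the identity mapping is trivially representable. A minimal NumPy sketch (illustrative only; the toy transform and weights are not from the paper):

```python
import numpy as np

def residual_block(x, transform):
    """Residual connection: output F(x) + x. If F collapses to zero,
    the block degenerates to the identity, which eases optimization
    of very deep stacks."""
    return transform(x) + x

# Toy residual branch F: a ReLU-gated linear map (weights illustrative).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.1
f = lambda v: np.maximum(W @ v, 0.0)

x = np.ones(4)
y = residual_block(x, f)

# With F == 0 the block is exactly the identity mapping.
identity_out = residual_block(x, lambda v: np.zeros_like(v))
assert np.allclose(identity_out, x)
```

The skip connection is what lets gradients flow directly to earlier layers, which is why plain stacks degrade with depth while residual stacks do not.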
[DenseNet] Densely Connected Convolutional Networks (DenseNet)
[SENet] Squeeze-and-Excitation Networks (SENet)
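The SE block from the paper above recalibrates channels: a global average pool ("squeeze") feeds two fully connected layers (ReLU, then sigmoid) whose output rescales each channel ("excitation"). A minimal NumPy sketch, with illustrative shapes and without the bottleneck bias terms:

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.
    Squeeze: global average pool per channel -> (C,).
    Excite: FC + ReLU down to C/r dims, FC + sigmoid back to C,
    giving per-channel scales in (0, 1) that reweight the input."""
    z = x.mean(axis=(1, 2))                  # squeeze: (C,)
    s = np.maximum(w1 @ z, 0.0)              # FC + ReLU (reduced dim)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))      # FC + sigmoid: (C,)
    return x * s[:, None, None]              # channel-wise reweighting

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))             # reduction ratio r = 4
w2 = rng.standard_normal((8, 2))
y = se_block(x, w1, w2)
assert y.shape == x.shape
```

Because the sigmoid scales lie in (0, 1), the block can only attenuate channels, never amplify them, which acts as a learned, input-dependent channel gate.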
[GENet] Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks (GENet)
Convolutional Neural Networks with Layer Reuse (LruNet)
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond (GCNet)
Rethinking ImageNet Pre-training
Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation
Inceptions [68]
ResNe(X)t [28, 87]
MobileNet [34]
RegNet [54]
[Depthwise Separable Convolution] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
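The efficiency of the depthwise separable convolution in MobileNets comes straight from the parameter count: a k×k depthwise convolution (one filter per input channel) followed by a 1×1 pointwise convolution replaces a full k×k convolution. A quick arithmetic check (sizes are illustrative):

```python
def standard_conv_params(k, c_in, c_out):
    # Full convolution: every output channel sees every input channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # k x k depthwise (one filter per input channel) + 1x1 pointwise.
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 128, 128)        # 147456
sep = depthwise_separable_params(3, 128, 128)  # 1152 + 16384 = 17536
ratio = std / sep                              # roughly 8.4x fewer parameters
```

For 3×3 kernels the saving approaches 9x as the channel count grows, which is why the factorization dominates mobile architectures.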
[MBConv] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Involution: Inverting the Inherence of Convolution for Visual Recognition
EfficientNetV2: Smaller Models and Faster Training [Paper] [Code]
Bag of Tricks for Image Classification with Convolutional Neural Networks, CVPR 2019 [Paper] [Code]
AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing [Paper]
Demystifying local vision transformer: Sparse connectivity, weight sharing, and dynamic weight.
[Fix-EfficientNet] H Touvron, A Vedaldi, M Douze, H Jégou, "Fixing the train-test resolution discrepancy", Advances in Neural Information Processing Systems 32 (NeurIPS), 2019.
[ConvMixer] A. Trockman, and J. Z. Kolter, "Patches Are All You Need?", in arXiv:2201.09792, 2022.
GFNet (Fast Fourier Transform (FFT))
[ConvNeXt] Z. Liu, H. Mao, C. Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A ConvNet for the 2020s", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11976-11986, 2022. [Code] [Video]
[ConvNeXt V2] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, and S. Xie "ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders", in arXiv:2301.00808, 2023.
[NFNet] A Brock, S De, SL Smith, K Simonyan, "High-Performance Large-Scale Image Recognition Without Normalization", Proceedings of the 38th International Conference on Machine Learning, PMLR 139:1059-1071, 2021.
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (SqueezeNet)
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (MobileNet V1)
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (ShuffleNet V1)
Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation (MobileNet V2)
SqueezeNext: Hardware-Aware Neural Network Design (SqueezeNext)
CondenseNet: An Efficient DenseNet using Learned Group Convolutions (CondenseNet)
Pelee: A Real-Time Object Detection System on Mobile Devices (PeleeNet)
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design (ShuffleNet V2)
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation (ESPNet)
ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions (ChannelNets)
ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network (ESPNetv2)
Interleaved Group Convolutions for Deep Neural Networks (IGCV1)
IGCV2: Interleaved Structured Sparse Convolutional Neural Networks (IGCV2)
IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks (IGCV3)
MnasNet: Platform-Aware Neural Architecture Search for Mobile (MnasNet)
FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search (FBNet)
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (EfficientNet)
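EfficientNet's contribution is compound scaling: depth, width, and input resolution are scaled jointly as d = α^φ, w = β^φ, r = γ^φ under the constraint α·β²·γ² ≈ 2, so total FLOPs grow by roughly 2^φ. A quick check with the base coefficients reported in the paper (α = 1.2, β = 1.1, γ = 1.15):

```python
# Compound scaling: one exponent phi scales depth, width, and
# resolution together, instead of tuning each dimension independently.
alpha, beta, gamma = 1.2, 1.1, 1.15   # EfficientNet-B0 coefficients

phi = 1
depth_mult = alpha ** phi
width_mult = beta ** phi
res_mult = gamma ** phi

# FLOPs scale ~ depth * width^2 * resolution^2, so the constraint
# alpha * beta**2 * gamma**2 ≈ 2 doubles FLOPs per unit of phi.
flops_growth = alpha * beta**2 * gamma**2   # ≈ 1.92
```

Larger variants (B1–B7) are then obtained simply by increasing φ, rather than re-searching the architecture.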
DiCENet: Dimension-wise Convolutions for Efficient Networks (DiCENet)
Hybrid Composition with IdleBlock: More Efficient Networks for Image Recognition
An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection
RepVGG: Making VGG-style ConvNets Great Again [Paper] [Fast Read]
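The structural re-parameterization behind RepVGG can be illustrated in the single-channel, stride-1 case: at inference time, a parallel 3×3 conv, 1×1 conv, and identity branch fold into one equivalent 3×3 kernel. A minimal NumPy sketch (illustrative only; batch-norm folding and biases are omitted):

```python
import numpy as np

def merge_branches(k3, k1):
    """Fold a parallel 1x1 conv and an identity branch into one 3x3
    kernel (single channel, stride 1): the 1x1 kernel sits at the 3x3
    centre, and identity is a 3x3 kernel with a 1 at the centre."""
    merged = k3.copy()
    merged[1, 1] += k1 + 1.0
    return merged

def conv3x3(x, k):
    """Naive same-padded 3x3 convolution (cross-correlation) on a 2-D map."""
    h, w = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = (xp[i:i + 3, j:j + 3] * k).sum()
    return out

rng = np.random.default_rng(3)
x = rng.standard_normal((5, 5))
k3 = rng.standard_normal((3, 3))
k1 = rng.standard_normal(())         # scalar 1x1 kernel

branch_sum = conv3x3(x, k3) + k1 * x + x      # training-time branches
merged_out = conv3x3(x, merge_branches(k3, k1))  # deployed single branch
assert np.allclose(branch_sum, merged_out)
```

This is why RepVGG trains as a multi-branch network but deploys as a plain VGG-style stack of 3×3 convolutions with no inference-time overhead.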
ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks [Paper]
Diverse Branch Block: Building a Convolution as an Inception-like Unit [Paper]
"Are we ready for a new paradigm shift? A Survey on Visual Deep MLP"
[MLP-Mixer] I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit, M. Lucic, A. Dosovitskiy, "MLP-Mixer: An all-MLP Architecture for Vision", Part of Advances in Neural Information Processing Systems 34 (NeurIPS), 2021. (based on two types of MLPs)
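The two MLP types in MLP-Mixer alternate per layer: a token-mixing MLP applied across the patch (sequence) axis, shared over channels, and a channel-mixing MLP applied across channels, shared over tokens. A stripped-down NumPy sketch (LayerNorm, GELU, and the hidden expansion are omitted; shapes and weights are illustrative):

```python
import numpy as np

def mixer_layer(x, w_tok, w_ch):
    """One simplified Mixer layer on a token matrix x of shape (S, C):
    token mixing applies the same (S, S) map to every channel column;
    channel mixing applies the same (C, C) map to every token row.
    Both sub-blocks keep the residual connection of the paper."""
    x = x + (w_tok @ x)    # token mixing: mixes rows, per channel
    x = x + (x @ w_ch.T)   # channel mixing: mixes columns, per token
    return x

rng = np.random.default_rng(2)
s, c = 16, 8                               # 16 patches, 8 channels
x = rng.standard_normal((s, c))
w_tok = rng.standard_normal((s, s)) * 0.1
w_ch = rng.standard_normal((c, c)) * 0.1
y = mixer_layer(x, w_tok, w_ch)
assert y.shape == (s, c)
```

Replacing self-attention with the fixed (S, S) token-mixing map is exactly what makes the architecture "all-MLP": spatial interaction is learned but input-independent.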
FNet: Mixing Tokens with Fourier Transforms, arXiv 2021 [Paper] [Code] (based on unparameterized Fourier Transform)
[EANet] M. H. Guo, Z. N. Liu, T. J. Mu, and S. M. Hu, "Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks", in arXiv preprint arXiv:2105.02358, 2021. [Personal Summary]
[ViP] Q. Hou, Z. Jiang, L. Yuan, M. M. Cheng, S. Yan, J. Feng, "Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
ResMLP: Feedforward networks for image classification with data-efficient training
RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition [Paper]
[gMLP] H. Liu, Z. Dai, D. So, Q. V. Le, "Pay Attention to MLPs", Advances in Neural Information Processing Systems (NeurIPS), 2021. (based on MLP with gating)
RaftMLP: Do MLP-based Models Dream of Winning Over Computer Vision? [Paper]
S²-MLPv2: Improved Spatial-Shift MLP Architecture for Vision [Paper]
References: