01_Classification
In computer vision, CNNs have been the dominant models for vision tasks since 2012. More recently, computer vision and NLP have been converging on a shared, more efficient class of architectures.
[HRNet] "Deep high-resolution representation learning for visual recognition", 2020. (maintain high-resolution representation throughout the whole network).
[AlexNet] ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
[VGG] Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG)
[NIN] Network In Network (NIN)
[GoogLeNet] Going Deeper with Convolutions (GoogLeNet)
[ResNet] Deep Residual Learning for Image Recognition (ResNet)
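The key idea in the ResNet paper above is the residual block, which learns a residual function F(x) and outputs F(x) + x, so the identity mapping is trivially representable. A minimal NumPy sketch (illustrative only; the toy transform and weights are not from the paper):

```python
import numpy as np

def residual_block(x, transform):
    """Residual connection: output F(x) + x. If F collapses to zero,
    the block degenerates to the identity, which eases optimization
    of very deep stacks."""
    return transform(x) + x

# Toy residual branch F: a ReLU-gated linear map (weights illustrative).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)) * 0.1
f = lambda v: np.maximum(W @ v, 0.0)

x = np.ones(4)
y = residual_block(x, f)

# With F == 0 the block is exactly the identity mapping.
identity_out = residual_block(x, lambda v: np.zeros_like(v))
assert np.allclose(identity_out, x)
```

The skip connection is what lets gradients flow directly to earlier layers, which is why plain stacks degrade with depth while residual stacks do not.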
[DenseNet] Densely Connected Convolutional Networks (DenseNet)
[SENet] Squeeze-and-Excitation Networks (SENet)
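The SE block from the paper above recalibrates channels: a global average pool ("squeeze") feeds two fully connected layers (ReLU, then sigmoid) whose output rescales each channel ("excitation"). A minimal NumPy sketch, with illustrative shapes and without the bottleneck bias terms:

```python
import numpy as np

def se_block(x, w1, w2):
    """Squeeze-and-Excitation on a (C, H, W) feature map.
    Squeeze: global average pool per channel -> (C,).
    Excite: FC + ReLU down to C/r dims, FC + sigmoid back to C,
    giving per-channel scales in (0, 1) that reweight the input."""
    z = x.mean(axis=(1, 2))                  # squeeze: (C,)
    s = np.maximum(w1 @ z, 0.0)              # FC + ReLU (reduced dim)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))      # FC + sigmoid: (C,)
    return x * s[:, None, None]              # channel-wise reweighting

rng = np.random.default_rng(1)
x = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))             # reduction ratio r = 4
w2 = rng.standard_normal((8, 2))
y = se_block(x, w1, w2)
assert y.shape == x.shape
```

Because the sigmoid scales lie in (0, 1), the block can only attenuate channels, never amplify them, which acts as a learned, input-dependent channel gate.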
[GENet] Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks (GENet)
Convolutional Neural Networks with Layer Reuse (LruNet)
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond (GCNet)
Rethinking ImageNet Pre-training
Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation
Inceptions [68]
ResNe(X)t [28, 87]
MobileNet [34]
RegNet [54]
[Depthwise Separable Convolution] MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
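The efficiency of the depthwise separable convolution in MobileNets comes straight from the parameter count: a k×k depthwise convolution (one filter per input channel) followed by a 1×1 pointwise convolution replaces a full k×k convolution. A quick arithmetic check (sizes are illustrative):

```python
def standard_conv_params(k, c_in, c_out):
    # Full convolution: every output channel sees every input channel.
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # k x k depthwise (one filter per input channel) + 1x1 pointwise.
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 128, 128)        # 147456
sep = depthwise_separable_params(3, 128, 128)  # 1152 + 16384 = 17536
ratio = std / sep                              # roughly 8.4x fewer parameters
```

For 3×3 kernels the saving approaches 9x as the channel count grows, which is why the factorization dominates mobile architectures.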
[MBConv] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Involution: Inverting the Inherence of Convolution for Visual Recognition
EfficientNetV2: Smaller Models and Faster Training [Paper] [Code]
Bag of Tricks for Image Classification with Convolutional Neural Networks, CVPR 2019 [Paper] [Code]
AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing [Paper]
Demystifying local vision transformer: Sparse connectivity, weight sharing, and dynamic weight.
[Fix-EfficientNet] H Touvron, A Vedaldi, M Douze, H Jégou, "Fixing the train-test resolution discrepancy", Advances in Neural Information Processing Systems 32 (NeurIPS), 2019.
[ConvMixer] A. Trockman, and J. Z. Kolter, "Patches Are All You Need?", in arXiv:2201.09792, 2022.
GFNet (Fast Fourier Transform (FFT))
[ConvNeXt] Z. Liu, H. Mao, C. Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A ConvNet for the 2020s", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11976-11986, 2022. [Code] [Video]
[ConvNeXt V2] S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, and S. Xie "ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders", in arXiv:2301.00808, 2023.
[NFNet] A Brock, S De, SL Smith, K Simonyan, "High-Performance Large-Scale Image Recognition Without Normalization", Proceedings of the 38th International Conference on Machine Learning, PMLR 139:1059-1071, 2021.
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (SqueezeNet)
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (MobileNet V1)
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (ShuffleNet V1)
Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation (MobileNet V2)
SqueezeNext: Hardware-Aware Neural Network Design (SqueezeNext)
CondenseNet: An Efficient DenseNet using Learned Group Convolutions (CondenseNet)
Pelee: A Real-Time Object Detection System on Mobile Devices (PeleeNet)
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design (ShuffleNet V2)
ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation (ESPNet)
ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions (ChannelNets)
ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network (ESPNetv2)
Interleaved Group Convolutions for Deep Neural Networks (IGCV1)
IGCV2: Interleaved Structured Sparse Convolutional Neural Networks (IGCV2)
IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks (IGCV3)
MnasNet: Platform-Aware Neural Architecture Search for Mobile (MnasNet)
FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search (FBNet)
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (EfficientNet)
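EfficientNet's contribution is compound scaling: depth, width, and input resolution are scaled jointly as d = α^φ, w = β^φ, r = γ^φ under the constraint α·β²·γ² ≈ 2, so total FLOPs grow by roughly 2^φ. A quick check with the base coefficients reported in the paper (α = 1.2, β = 1.1, γ = 1.15):

```python
# Compound scaling: one exponent phi scales depth, width, and
# resolution together, instead of tuning each dimension independently.
alpha, beta, gamma = 1.2, 1.1, 1.15   # EfficientNet-B0 coefficients

phi = 1
depth_mult = alpha ** phi
width_mult = beta ** phi
res_mult = gamma ** phi

# FLOPs scale ~ depth * width^2 * resolution^2, so the constraint
# alpha * beta**2 * gamma**2 ≈ 2 doubles FLOPs per unit of phi.
flops_growth = alpha * beta**2 * gamma**2   # ≈ 1.92
```

Larger variants (B1–B7) are then obtained simply by increasing φ, rather than re-searching the architecture.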
DiCENet: Dimension-wise Convolutions for Efficient Networks (DiCENet)
Hybrid Composition with IdleBlock: More Efficient Networks for Image Recognition
An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection
RepVGG: Making VGG-style ConvNets Great Again [Paper] [Fast Read]
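The structural re-parameterization behind RepVGG can be illustrated in the single-channel, stride-1 case: at inference time, a parallel 3×3 conv, 1×1 conv, and identity branch fold into one equivalent 3×3 kernel. A minimal NumPy sketch (illustrative only; batch-norm folding and biases are omitted):

```python
import numpy as np

def merge_branches(k3, k1):
    """Fold a parallel 1x1 conv and an identity branch into one 3x3
    kernel (single channel, stride 1): the 1x1 kernel sits at the 3x3
    centre, and identity is a 3x3 kernel with a 1 at the centre."""
    merged = k3.copy()
    merged[1, 1] += k1 + 1.0
    return merged

def conv3x3(x, k):
    """Naive same-padded 3x3 convolution (cross-correlation) on a 2-D map."""
    h, w = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = (xp[i:i + 3, j:j + 3] * k).sum()
    return out

rng = np.random.default_rng(3)
x = rng.standard_normal((5, 5))
k3 = rng.standard_normal((3, 3))
k1 = rng.standard_normal(())         # scalar 1x1 kernel

branch_sum = conv3x3(x, k3) + k1 * x + x      # training-time branches
merged_out = conv3x3(x, merge_branches(k3, k1))  # deployed single branch
assert np.allclose(branch_sum, merged_out)
```

This is why RepVGG trains as a multi-branch network but deploys as a plain VGG-style stack of 3×3 convolutions with no inference-time overhead.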
ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks [Paper]
Diverse Branch Block: Building a Convolution as an Inception-like Unit [Paper]
"Are we ready for a new paradigm shift? A Survey on Visual Deep MLP"
[MLP-Mixer] I. O. Tolstikhin, N. Houlsby, A. Kolesnikov, L. Beyer, X. Zhai, T. Unterthiner, J. Yung, A. Steiner, D. Keysers, J. Uszkoreit, M. Lucic, A. Dosovitskiy, "MLP-Mixer: An all-MLP Architecture for Vision", Part of Advances in Neural Information Processing Systems 34 (NeurIPS), 2021. (based on two types of MLPs)
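The two MLP types in MLP-Mixer alternate per layer: a token-mixing MLP applied across the patch (sequence) axis, shared over channels, and a channel-mixing MLP applied across channels, shared over tokens. A stripped-down NumPy sketch (LayerNorm, GELU, and the hidden expansion are omitted; shapes and weights are illustrative):

```python
import numpy as np

def mixer_layer(x, w_tok, w_ch):
    """One simplified Mixer layer on a token matrix x of shape (S, C):
    token mixing applies the same (S, S) map to every channel column;
    channel mixing applies the same (C, C) map to every token row.
    Both sub-blocks keep the residual connection of the paper."""
    x = x + (w_tok @ x)    # token mixing: mixes rows, per channel
    x = x + (x @ w_ch.T)   # channel mixing: mixes columns, per token
    return x

rng = np.random.default_rng(2)
s, c = 16, 8                               # 16 patches, 8 channels
x = rng.standard_normal((s, c))
w_tok = rng.standard_normal((s, s)) * 0.1
w_ch = rng.standard_normal((c, c)) * 0.1
y = mixer_layer(x, w_tok, w_ch)
assert y.shape == (s, c)
```

Replacing self-attention with the fixed (S, S) token-mixing map is exactly what makes the architecture "all-MLP": spatial interaction is learned but input-independent.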
FNet: Mixing Tokens with Fourier Transforms, arXiv 2021 [Paper] [Code] (based on unparameterized Fourier Transform)
[EANet] M. H. Guo, Z. N. Liu, T. J. Mu, and S. M. Hu, "Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks", in arXiv preprint arXiv:2105.02358, 2021. [Personal Summary]
[ViP] Q. Hou, Z. Jiang, L. Yuan, M. M. Cheng, S. Yan, J. Feng, "Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022.
ResMLP: Feedforward networks for image classification with data-efficient training
RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition [Paper]
[gMLP] H. Liu, Z. Dai, D. So, Q. V. Le, "Pay Attention to MLPs", Advances in Neural Information Processing Systems (NeurIPS), 2021. (based on MLP with gating)
RaftMLP: Do MLP-based Models Dream of Winning Over Computer Vision? [Paper]
S²-MLPv2: Improved Spatial-Shift MLP Architecture for Vision [Paper]
References: