1. Classification/ Backbone Enhancement
2. Object Detection
3. Segmentation
4. Transformer in Visual
5. Tracking
6. Anomaly/ Defect Detection
7. Data augmentation
8. GAN
9. Medical Imaging
10. Image Clustering
1. Classification/ Backbone Enhancement
- ReXNet: Diminishing Representational Bottleneck on Convolutional Neural Network [Paper] [Code]
- Involution: Inverting the Inherence of Convolution for Visual Recognition [Paper] [Code]
- Coordinate Attention for Efficient Mobile Network Design [Paper] [Code]
- Inception Convolution with Efficient Dilation Search [Paper] [Code]
- RepVGG: Making VGG-style ConvNets Great Again [Paper] [Code]
1.1 Fine-grained classification
- Re-labeling ImageNet: from Single to Multi-Labels, from Global to Localized Labels [Paper]
- Differentiable Patch Selection for Image Recognition [Paper]
- Fine-grained Angular Contrastive Learning with Coarse Labels [Paper]
- Few-Shot Classification with Feature Map Reconstruction Networks [Paper]
- A Realistic Evaluation of Semi-Supervised Learning for Fine-Grained Classification [Paper]
1.2 Image Classification
- MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition [Paper]
- PML: Progressive Margin Loss for Long-tailed Age Classification [Paper]
- Contrastive Learning based Hybrid Networks for Long-Tailed Image Classification [Paper]
- Capsule Network is Not More Robust than Convolutional Network [Paper]
- Model-Contrastive Federated Learning [Paper]
- Towards Good Practices for Efficiently Annotating Large-Scale Image Classification Datasets [Paper]
- Correlated Input-Dependent Label Noise in Large-Scale Image Classification [Paper]
1.3 Semi-supervised image classification
- SimPLE: Similar Pseudo Label Exploitation for Semi-Supervised Classification [Paper]
1.4 Long-tail visual recognition
- Distribution Alignment: A Unified Framework for Long-tail Visual Recognition [Paper]
- Improving Calibration for Long-Tailed Recognition [Paper]
- Adversarial Robustness under Long-Tailed Distribution [Paper]
2. Object Detection
2.1 Object Detection on COCO
- VarifocalNet: An IoU-aware Dense Object Detector [Paper]
- You Only Look One-level Feature [Paper]
- Multiple Instance Active Learning for Object Detection [Paper] [Code]
- Positive-Unlabeled Data Purification in the Wild for Object Detection [Paper]
- Depth from Camera Motion and Object Detection [Paper]
- Towards Open World Object Detection [Paper] [Code]
- General Instance Distillation for Object Detection [Paper]
- Distilling Object Detectors via Decoupled Features [Paper]
- MeGA-CDA: Memory Guided Attention for Category-Aware Unsupervised Domain Adaptive Object Detection [Paper]
- Informative and Consistent Correspondence Mining for Cross-Domain Weakly Supervised Object Detection [Paper]
- Sparse R-CNN: End-to-End Object Detection with Learnable Proposals [Paper]
- OPANAS: One-Shot Path Aggregation Network Architecture Search for Object Detection [Paper] [Code]
- End-to-End Object Detection with Fully Convolutional Network [Paper]
- Robust and Accurate Object Detection via Adversarial Learning [Paper]
- I^3Net: Implicit Instance-Invariant Network for Adapting One-Stage Object Detectors [Paper]
- OTA: Optimal Transport Assignment for Object Detection [Paper]
- Scale-aware Automatic Augmentation for Object Detection [Paper]
- A Closer Look at Fourier Spectrum Discrepancies for CNN-generated Images Detection [Paper]
- Group Collaborative Learning for Co-Salient Object Detection [Paper]
- IQDet: Instance-wise Quality Distribution Sampling for Object Detection [Paper]
- Domain-Specific Suppression for Adaptive Object Detection [Paper]
2.2 Few-Shot Object Detection
- Semantic Relation Reasoning for Shot-Stable Few-Shot Object Detection [Paper]
- Dense Relation Distillation with Context-aware Aggregation for Few-Shot Object Detection [Paper]
- FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding [Paper]
- Generalized Few-Shot Object Detection without Forgetting [Paper]
2.3 Multi-target Detection
- There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge [Paper]
2.4 3D Target Detection
- Categorical Depth Distribution Network for Monocular 3D Object Detection [Paper]
- 3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection [Paper]
- ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection [Paper]
- Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection [Paper]
- MonoRUn: Monocular 3D Object Detection by Self-Supervised Reconstruction and Uncertainty Propagation [Paper]
- M3DSSD: Monocular 3D Single Stage Object Detector [Paper]
- GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection [Paper]
- LiDAR R-CNN: An Efficient and Universal 3D Object Detection [Paper]
- Exploring intermediate representation for monocular vehicle pose estimation [Paper]
- Delving into Localization Errors for Monocular 3D Object Detection [Paper]
- HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection [Paper]
- Objects are Different: Flexible Monocular 3D Object Detection [Paper]
- Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds [Paper]
- PointAugmenting: Cross-Modal Augmentation for 3D Object Detection [Paper]
- SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud [Paper]
2.5 Rotating target detection
- Dense Label Encoding for Boundary Discontinuity Free Rotation Detection [Paper]
2.6 Object localization
- Unveiling the Potential of Structure-Preserving for Weakly Supervised Object Localization [Paper]
2.7 Dense object detection
- Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection [Paper] [Code]
2.8 Salient object detection
- Deep RGB-D Saliency Detection with Depth-Sensitive Attention and Automatic Multi-Modal Fusion [Paper]
- Weakly Supervised Video Salient Object Detection [Paper]
- Uncertainty-aware Joint Salient Object and Camouflaged Object Detection [Paper]
2.9 Semi-supervised/ Weakly supervised target detection
- Data-Uncertainty Guided Multi-Phase Learning for Semi-Supervised Object Detection [Paper]
- Points as Queries: Weakly Semi-supervised Object Detection by Points [Paper]
- DAP: Detection-Aware Pre-training with Weak Supervision [Paper]
2.10 Long-tail target detection
- Adaptive Class Suppression Loss for Long-Tail Object Detection [Paper]
2.11 OOD Detection
- MOS: Towards Scaling Out-of-distribution Detection for Large Semantic Space [Paper]
- MOOD: Multi-level Out-of-distribution Detection [Paper]
3. Segmentation
- Information-Theoretic Segmentation by Inpainting Error Maximization [Paper]
- Simultaneously Localize, Segment and Rank the Camouflaged Objects [Paper]
- Capturing Omni-Range Context for Omnidirectional Segmentation [Paper]
- Boundary IoU: Improving Object-Centric Image Segmentation Evaluation [Paper]
- Locate then Segment: A Strong Pipeline for Referring Image Segmentation [Paper]
- InverseForm: A Loss Function for Structured Boundary-Aware Segmentation [Paper]
- Omnimatte: Associating Objects and Their Effects in Video [Paper]
3.1 Panoptic/ Panorama Segmentation
- Fully Convolutional Networks for Panoptic Segmentation [Paper] [Code]
- Cross-View Regularization for Domain Adaptive Panoptic Segmentation [Paper]
- 4D Panoptic LiDAR Segmentation [Paper]
- Panoptic-PolarNet: Proposal-free LiDAR Point Cloud Panoptic Segmentation [Paper]
- Panoptic Segmentation Forecasting [Paper]
- Exemplar-Based Open-Set Panoptic Segmentation Network [Paper]
3.2 Instance segmentation
- Zero-Shot Instance Segmentation [Paper]
- Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers [Paper]
- Weakly Supervised Instance Segmentation for Videos with Temporal Mask Consistency [Paper]
- FAPIS: A Few-shot Anchor-free Part-based Instance Segmenter [Paper]
- Weakly-supervised Instance Segmentation via Class-agnostic Learning with Salient Images [Paper]
- Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation [Paper]
- RefineMask: Towards High-Quality Instance Segmentation with Fine-Grained Features [Paper]
- A^2-FPN: Attention Aggregation based Feature Pyramid Network for Instance Segmentation [Paper]
- Incremental Few-Shot Instance Segmentation [Paper]
3.3 Semantic segmentation
- PLOP: Learning without Forgetting for Continual Semantic Segmentation [Paper]
- Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges [Paper]
- Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation [Paper]
- Semi-supervised Domain Adaptation based on Dual-level Domain Mixing for Semantic Segmentation [Paper]
- Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing [Paper]
- Learning Statistical Texture for Semantic Segmentation [Paper]
- MetaCorrection: Domain-aware Meta Loss Correction for Unsupervised Domain Adaptation in Semantic Segmentation [Paper]
- Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations [Paper]
- Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion [Paper]
- Rethinking BiSeNet For Real-time Semantic Segmentation [Paper]
- BBAM: Bounding Box Attribution Map for Weakly Supervised Semantic and Instance Segmentation [Paper]
- Anti-Adversarially Manipulated Attributions for Weakly and Semi-Supervised Semantic Segmentation [Paper]
- Cross-Dataset Collaborative Learning for Semantic Segmentation [Paper]
- Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization [Paper]
- Non-Salient Region Object Mining for Weakly Supervised Semantic Segmentation [Paper]
- Source-Free Domain Adaptation for Semantic Segmentation [Paper]
- PiCIE: Unsupervised Semantic Segmentation using Invariance and Equivariance in Clustering [Paper]
- Background-Aware Pooling and Noise-Aware Loss for Weakly-Supervised Semantic Segmentation [Paper]
- Progressive Semantic Segmentation [Paper]
- Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalization [Paper]
- DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation [Paper] [Code]
- Self-supervised Augmentation Consistency for Adapting Semantic Segmentation [Paper] [Code]
- Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation [Paper]
3.4 Scene understanding/scene analysis
- Exploring Data Efficient 3D Scene Understanding with Contrastive Scene Contexts [Paper]
- Monte Carlo Scene Search for 3D Scene Understanding [Paper]
- Bidirectional Projection Network for Cross Dimension Scene Understanding [Paper]
- RobustNet: Improving Domain Generalization in Urban-Scene Segmentation via Instance Selective Whitening [Paper]
- CoCoNets: Continuous Contrastive 3D Scene Representations [Paper]
- Exploiting Edge-Oriented Reasoning for 3D Point-based Scene Graph Analysis [Paper]
- SceneGraphFusion: Incremental 3D Scene Graph Prediction from RGB-D Sequences [Paper]
- Probabilistic Modeling of Semantic Ambiguity for Scene Graph Generation [Paper]
- Fully Convolutional Scene Graph Generation [Paper]
- Bipartite Graph Network with Adaptive Message Passing for Unbiased Scene Graph Generation [Paper]
3.5 Matting
- Real-Time High Resolution Background Matting [Paper]
3.6 Action segmentation
- Global2Local: Efficient Structure Search for Video Action Segmentation [Paper]
- Temporal Action Segmentation from Timestamp Supervision [Paper]
- Temporally-Weighted Hierarchical Clustering for Unsupervised Action Segmentation [Paper]
- Action Shuffle Alternating Learning for Unsupervised Action Segmentation [Paper]
- Anchor-Constrained Viterbi for Set-Supervised Action Segmentation [Paper]
3.7 LiDAR segmentation
- Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR Segmentation [Paper]
3.8 Video segmentation
- Modular Interactive Video Object Segmentation: Interaction-to-Mask, Propagation and Difference-Aware Fusion [Paper]
- Learning to Recommend Frame for Interactive Video Object Segmentation in the Wild [Paper]
- Efficient Regional Memory Network for Video Object Segmentation [Paper]
- Learning Position and Target Consistency for Memory-based Video Object Segmentation [Paper]
- Guided Interactive Video Object Segmentation Using Reliability-Based Attention Maps [Paper]
- Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [Paper]
- SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation [Paper]
- Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation [Paper]
- Self-Guided and Cross-Guided Learning for Few-Shot Segmentation [Paper]
- Adaptive Prototype Learning and Allocation for Few-Shot Segmentation [Paper]
- Camouflaged Object Segmentation with Distraction Mining [Paper]
- Deep Video Matting via Spatio-Temporal Alignment and Aggregation [Paper]
- Omni-supervised Point Cloud Segmentation via Gradual Receptive Field Component Reasoning [Paper]
4. Transformer in Visual
- Transformer Interpretability Beyond Attention Visualization [Paper] [Code]
- MIST: Multiple Instance Spatial Transformer Network [Paper]
- Variational Transformer Networks for Layout Generation [Paper]
4.1 Action recognition
- 3D Vision Transformers for Action Recognition [Paper]
4.2 Target Detection
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [Paper] [Code]
4.3 Image Processing
- Pre-Trained Image Processing Transformer [Paper]
4.4 Human-object interaction
- End-to-End Human Object Interaction Detection with HOI Transformer [Paper]
- HOTR: End-to-End Human-Object Interaction Detection with Transformers [Paper] [Code]
4.5 Image segmentation
- Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers [Paper]
- [VisTR] End-to-End Video Instance Segmentation with Transformers [Paper] [Code]
4.6 Tracking
- Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking [Paper]
- Transformer Tracking [Paper]
4.7 Action prediction
- Multimodal Motion Prediction with Stacked Transformers [Paper]
4.8 Self-attention mechanism
- Scaling Local Self-Attention For Parameter Efficient Visual Backbones [Paper]
4.9 Retrieval
- Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning [Paper]
4.10 Feature matching
- LoFTR: Detector-Free Local Feature Matching with Transformers [Paper]
4.11 Pose recognition
- Pose Recognition with Cascade Transformers [Paper]
4.12 Autonomous driving
- Multi-Modal Fusion Transformer for End-to-End Autonomous Driving [Paper]
5. Tracking
- Rotation Equivariant Siamese Networks for Tracking [Paper]
- Multiple Object Tracking with Correlation Learning [Paper]
- Graph Attention Tracking [Paper]
- LightTrack: Finding Lightweight Neural Networks for Object Tracking via One-Shot Architecture Search [Paper]
- Track, Check, Repeat: An EM Approach to Unsupervised Tracking [Paper]
5.1 Multi-target tracking
- Probabilistic Tracklet Scoring and Inpainting for Multiple Object Tracking [Paper]
- Track to Detect and Segment: An Online Multi-Object Tracker [Paper]
- Multiple Object Tracking with Correlation Learning [Paper]
- Learning a Proposal Classifier for Multiple Object Tracking [Paper]
- Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking [Paper]
- Online Multiple Object Tracking with Cross-Task Synergy [Paper]
5.2 Visual-target tracking
- IoU Attack: Towards Temporally Coherent Black-Box Adversarial Attack for Visual Object Tracking [Paper]
- Learning to Track Instances without Video Annotations [Paper]
5.3 Single-target tracking
- Towards More Flexible and Accurate Object Tracking with Natural Language: Algorithms and Benchmark [Paper]
- SiamGAT: Graph Attention Tracking [Paper]
7. Data augmentation
- KeepAugment: A Simple Information-Preserving Data Augmentation [Paper]
8. GAN
- Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing [Paper]
- Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation [Paper]
- Hijack-GAN: Unintended-Use of Pretrained, Black-Box GANs [Paper]
- Image-to-image Translation via Hierarchical Style Disentanglement [Paper]
- Efficient Conditional GAN Transfer with Knowledge Propagation across Classes [Paper]
- Anycost GANs for Interactive Image Synthesis and Editing [Paper]
- TediGAN: Text-Guided Diverse Image Generation and Manipulation [Paper]
- Generative Hierarchical Features from Synthesizing Images [Paper]
- Teachers Do More Than Teach: Compressing Image-to-Image Models [Paper]
- PISE: Person Image Synthesis and Editing with Decoupled GAN [Paper]
- LOHO: Latent Optimization of Hairstyles via Orthogonalization [Paper]
- HumanGAN: A Generative Model of Humans Images [Paper]
- HistoGAN: Controlling Colors of GAN-Generated and Real Images via Color Histograms [Paper] https://arxiv.org/abs/2011.11731
- DivCo: Diverse Conditional Image Synthesis via Contrastive Generative Adversarial Network [Paper] https://arxiv.org/abs/2103.07893
- Training Generative Adversarial Networks in One Stage [Paper] https://arxiv.org/abs/2103.00430
- Closed-Form Factorization of Latent Semantics in GANs [Paper] https://arxiv.org/abs/2007.06600 [Code] https://github.com/genforce/sefa
- pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis [Paper]
- ReMix: Towards Image-to-Image Translation with Limited Data [Paper]
- Unsupervised Disentanglement of Linear-Encoded Facial Semantics [Paper]
- Content-Aware GAN Compression [Paper]
- Regularizing Generative Adversarial Networks under Limited Data [Paper]
- Where and What? Examining Interpretable Disentangled Representations [Paper]
- Few-shot Image Generation via Cross-domain Correspondence [Paper]
- DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort [Paper]
- Surrogate Gradient Field for Latent Space Manipulation [Paper]
- StylePeople: A Generative Model of Fullbody Human Avatars [Paper]
- Ensembling with Deep Generative Views [Paper]
- Continuous Face Aging via Self-estimated Residual Age Embedding [Paper]
8.1 Image to image translation
- Memory-guided Unsupervised Image-to-image Translation [Paper]
- Image-to-image Translation via Hierarchical Style Disentanglement [Paper]
- CoMoGAN: continuous model-guided image-to-image translation [Paper]
- Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation [Paper]
8.2 Image editing
- StyleMapGAN: Exploiting Spatial Dimensions of Latent in GAN for Real-time Image Editing [Paper]
9. Medical Imaging
- Deep Learning for Chest X-ray Analysis: A Survey [Paper]
- 3D Graph Anatomy Geometry-Integrated Network for Pancreatic Mass Segmentation, Diagnosis, and Quantitative Patient Management [Paper]
- Deep Lesion Tracker: Monitoring Lesions in 4D Longitudinal Imaging Studies [Paper]
- Automatic Vertebra Localization and Identification in CT by Spine Rectification and Anatomically-constrained Optimization [Paper]
- Multi-institutional Collaborations for Improving Deep Learning-based Magnetic Resonance Image Reconstruction Using Federated Learning [Paper]
- DeepTag: An Unsupervised Deep Learning Method for Motion Tracking on Cardiac Tagging Magnetic Resonance Images [Paper]
- Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles [Paper]
- XProtoNet: Diagnosis in Chest Radiography with Global and Local Explanations [Paper]
9.1 Medical image segmentation
- FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space [Paper] [Code]
- DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets [Paper]
- DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation [Paper]
- DARCNN: Domain Adaptive Region-based Convolutional Neural Network for Unsupervised Instance Segmentation in Biomedical Images [Paper]
- Every Annotation Counts: Multi-label Deep Supervision for Medical Image Segmentation [Paper]
9.2 Medical image synthesis
- Brain Image Synthesis with Unsupervised Multivariate Canonical CSCℓ4Net [Paper]
References (update 25/05)
Sorting Papers: https://github.com/52CV/CVPR-2021-Papers
Paperwithcode: https://github.com/amusi/CVPR2021-Papers-with-Code
From VinAI Research: https://fb.watch/v/2Xb8Xe2Nq/
https://blog.kitware.com/demos/cvpr-2021-papers/?filter=authors&search=