02_Detection

Localizing an object in a picture means predicting a bounding box around the object and can be expressed as a regression task.
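The regression view can be made concrete with a minimal sketch. It assumes boxes are given in corner format (x1, y1, x2, y2). A detector regresses the four coordinates, a plain L1 loss measures the coordinate error, and Intersection over Union (IoU) measures how well a predicted box overlaps the ground truth. The helper names `iou` and `l1_box_loss` are hypothetical and are not taken from any particular library.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def l1_box_loss(pred, target):
    """Mean absolute error over the four box coordinates (a plain regression loss)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / 4


# Two 2x2 boxes that overlap in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(l1_box_loss((0, 0, 2, 2), (1, 1, 3, 3)))
```

Real detectors usually regress encoded offsets relative to an anchor or a reference point rather than raw corners. They also often add IoU-based losses such as GIoU or DIoU, but the principle is the same.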

Overview

Dataset

Metrics

Papers

Transformer-based

Plain Backbone

[ViTDet] Y Li, H Mao, R Girshick, K He, "Exploring plain vision transformer backbones for object detection", arXiv preprint arXiv:2203.16527, 2022

Hybrid (CNN + Transformer)

[DETR] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” arXiv preprint arXiv:2005.12872, 2020. [Fast Read]

DINO

Classical convolution-based object detectors [31,35,19,2,12] have driven much of the progress in object detection.

DyHead [7], Swin [23] and SwinV2 [22] with HTC++ [4]

At the time DINO was published (2022), the best detection models were based on improved classical detectors such as DyHead [8] and HTC [4]. For example, the best result presented in SwinV2 [22] was obtained with the HTC++ [4,23] framework.

DETR converges slowly during training, and the meaning of its queries is unclear. Proposed remedies include deformable attention [41], decoupling positional and content information [25], and providing spatial priors [11,39,37].
DAB-DETR [21] formulates DETR queries as dynamic anchor boxes (DAB), which bridges the gap between classical anchor-based detectors and DETR-like ones. DN-DETR [17] further addresses the instability of bipartite matching by introducing a denoising (DN) technique.
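The bipartite matching mentioned above assigns each ground-truth object to exactly one query (prediction). Predictions left unassigned are trained as "no object". The sketch below makes several simplifying assumptions:

- The matching cost is only the L1 distance between boxes. DETR's actual matching cost also includes classification-probability and generalized-IoU terms.
- A brute-force search over permutations stands in for the Hungarian algorithm.

The function names are hypothetical.

```python
from itertools import permutations


def l1_cost(pred_box, gt_box):
    """Sum of absolute coordinate differences between two boxes."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box))


def bipartite_match(pred_boxes, gt_boxes):
    """Return sorted (pred_index, gt_index) pairs minimising the total L1 cost.

    Brute force over all one-to-one assignments: fine for a toy example but
    exponential in general (the Hungarian algorithm solves it in polynomial time).
    Assumes len(pred_boxes) >= len(gt_boxes); unmatched predictions mean "no object".
    """
    best_cost, best_pairs = float("inf"), []
    # perm[g] is the prediction index assigned to ground-truth box g.
    for perm in permutations(range(len(pred_boxes)), len(gt_boxes)):
        cost = sum(l1_cost(pred_boxes[p], gt_boxes[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost = cost
            best_pairs = [(p, g) for g, p in enumerate(perm)]
    return sorted(best_pairs), best_cost


preds = [(0, 0, 1, 1), (5, 5, 6, 6), (2, 2, 3, 3)]
gts = [(5, 5, 6, 6), (0, 0, 1, 1)]
print(bipartite_match(preds, gts))  # prediction 2 stays unmatched ("no object")
```

The instability DN-DETR targets comes from this step. Small changes in predictions between training iterations can flip which query is matched to which object.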
By improving the denoising training, query initialization, and box prediction, DINO builds a new DETR-like model on top of DN-DETR [17], DAB-DETR [21], and Deformable DETR [41].
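Denoising training feeds the decoder noised copies of the ground-truth boxes and asks it to reconstruct the originals, so these queries need no bipartite matching. Below is a minimal sketch of the box noise, assuming boxes in (cx, cy, w, h) format and a single noise scale `lam`:

- The center is shifted by at most `lam * w / 2` horizontally and `lam * h / 2` vertically.
- The width and height are rescaled by a factor in `[1 - lam, 1 + lam]`.

DN-DETR uses separate hyperparameters for shifting and scaling and also flips class labels, which this sketch omits. The name `noise_box` and the value 0.4 are illustrative assumptions.

```python
import random


def noise_box(box, lam=0.4, rng=random):
    """Return a noised copy of a (cx, cy, w, h) box (DN-style sketch, hypothetical helper)."""
    cx, cy, w, h = box
    # Center shift bounded by a fraction of half the box size.
    dx = rng.uniform(-lam, lam) * w / 2
    dy = rng.uniform(-lam, lam) * h / 2
    # Width/height scaling within [1 - lam, 1 + lam].
    new_w = w * rng.uniform(1 - lam, 1 + lam)
    new_h = h * rng.uniform(1 - lam, 1 + lam)
    return (cx + dx, cy + dy, new_w, new_h)


rng = random.Random(0)  # seeded for reproducibility
print(noise_box((0.5, 0.5, 0.2, 0.4), lam=0.4, rng=rng))
```

DINO's contrastive variant builds on this idea. It uses a small noise scale for positive queries (reconstruct the box) and a larger one for negative queries (predict "no object").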

DETR Variants:

- [DE-DETRs] [DELA-DETR] [DE-CondDETR] [DELA-CondDETR] W. Wang, J. Zhang, Y. Cao, Y. Shen, and D. Tao, "Towards Data-Efficient Detection Transformers", in ECCV, 2022.
- [CondDETR] D Meng, X Chen, Z Fan, G Zeng, H Li, Y Yuan, L Sun, J. Wang, "Conditional DETR for Fast Training Convergence", Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3651-3660, 2021.
- [FP-DETR] "FP-DETR: Detection Transformer Advanced by Fully Pre-training", ICLR, 2022.
- [Deformable-DETR] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, J. Dai, "Deformable DETR: Deformable transformers for end-to-end object detection", arXiv preprint arXiv:2010.04159, 2020. [Code] [Fast Read]
- [ACT] M. Zheng, P. Gao, R. Zhang, K. Li, X. Wang, H. Li, and H. Dong, "End-to-end object detection with adaptive clustering transformer", arXiv preprint arXiv:2011.09315, 2020.
- [DINO] H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H. Y. Shum, "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection", arXiv preprint arXiv:2203.03605, 2022. [Fast Read]
- [UP-DETR] Z. Dai, B. Cai, Y. Lin, J. Chen, "UP-DETR: Unsupervised Pre-Training for Object Detection With Transformers", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1601-1610, 2021.
- [DAB-DETR] S Liu, F Li, H Zhang, X Yang, X Qi, H Su, J Zhu, L. Zhang, "DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR", ICLR, 2022.
- [DN-DETR] F Li, H Zhang, S Liu, J Guo, LM Ni, L Zhang, "DN-DETR: Accelerate DETR Training by Introducing Query DeNoising", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13619-13627, 2022.
- DETReg: Unsupervised Pretraining with Region Priors for Object Detection_2021

CNN-based

YOLOv8 [Colab]
"YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors"
YOLOv6 [Code] [Fast Read]
[DAMO-YOLO] X Xu, Y Jiang, W Chen, Y Huang, Y Zhang, X Sun, "DAMO-YOLO: A Report on Real-Time Object Detection Design", arXiv preprint arXiv:2211.15444, 2022.
PP-YOLOE
YOLOX
YOLOR
YOLOS
PP-YOLOv2
Scaled YOLOv4
PP-YOLO
YOLOv5
YOLOv4
YOLOv3
YOLO9000
YOLOv1
Learning to Discover and Detect Objects
YOLOX-PAI
[RCNET] "RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection", pp. 5637-5645, ACM Multimedia, 2021. [Fast Read] (RevFP, Cross-scale Shift Network)
Rich feature hierarchies for accurate object detection and semantic segmentation(R-CNN)
SSD: Single Shot MultiBox Detector(SSD)
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks(Faster R-CNN)
Feature Pyramid Networks for Object Detection(FPN)
Is Faster R-CNN Doing Well for Pedestrian Detection?(RPN_BF)
Training Region-based Object Detectors with Online Hard Example Mining(OHEM)
Receptive Field Block Net for Accurate and Fast Object Detection(RFBNet)
Focal Loss for Dense Object Detection(RetinaNet)
Single-Shot Refinement Neural Network for Object Detection(RefineDet)
PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection(PVANET)
Multi-label learning of part detectors for heavily occluded pedestrian detection(JL-TopS)
Graininess-aware Deep Feature Learning for Pedestrian Detection(GDFL)
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network(M2Det)
CFENet: An Accurate and Efficient Single-Shot Object Detector for Autonomous Driving(CFENet)
ScratchDet: Training Single-Shot Object Detectors from Scratch(ScratchDet)
Pooling Pyramid Network for Object Detection(PPN)
ThunderNet: Towards Real-time Generic Object Detection(ThunderNet)
Light-Weight RetinaNet for Object Detection
CornerNet: Detecting Objects as Paired Keypoints(CornerNet)
Bottom-up Object Detection by Grouping Extreme and Center Points(ExtremeNet)
RepPoints: Point Set Representation for Object Detection(RepPoints)
FCOS: Fully Convolutional One-Stage Object Detection(FCOS)
Mask-Guided Attention Network for Occluded Pedestrian Detection
Learning Rich Features at High-Speed for Single-Shot Object Detection
Dynamic Anchor Feature Selection for Single-Shot Object Detection
Contextual Attention for Hand Detection in the Wild
Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
Multiple Anchor Learning for Visual Object Detection
NETNet: Neighbor Erasing and Transferring Network for Better Single Shot Object Detection
Is Sampling Heuristics Necessary in Training Deep Object Detectors?
EPSANet: An Efficient Pyramid Split Attention Block on Convolutional Neural Network [Paper] [Code]
Towards Open World Object Detection_CVPR_2021 [Paper]
End-to-End Semi-Supervised Object Detection with Soft Teacher_2021 [Paper]

AugFPN: Improving Multi-Scale Feature Learning for Object Detection [Paper] [Code] [Personal Summary]
Bag of Freebies for Training Object Detection Neural Networks_arXiv_2019 [Paper] [Personal Summary]
- You Only Learn One Representation: Unified Network for Multiple Tasks [Paper] [Code]

NMS-free

- Qiang Chen, Yingming Wang, Tong Yang, Xiangyu Zhang, Jian Cheng, and Jian Sun. You only look one-level feature. In CVPR, 2021.
- Jianfeng Wang, Lin Song, Zeming Li, Hongbin Sun, Jian Sun, and Nanning Zheng. End-to-end object detection with fully convolutional network. In CVPR, 2020.
- Qiang Zhou, Chaohui Yu, Chunhua Shen, Zhibin Wang, and Hao Li. Object detection made simpler by eliminating heuristic NMS. arXiv preprint arXiv:2101.11782, 2021.
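For context, the step these NMS-free approaches try to eliminate is usually greedy non-maximum suppression (NMS). Boxes are visited in descending score order, and a box is kept only if its IoU with every box already kept is at or below a threshold. Below is a minimal pure-Python sketch; the threshold 0.5 and the function names are assumptions for illustration.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if no higher-scoring kept box overlaps it too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # box 1 overlaps box 0 with IoU 0.81 and is suppressed
```

Because the threshold is a hand-tuned heuristic, crowded scenes with heavily overlapping true objects are a known weak spot. This is one motivation for detectors that predict a single box per object directly.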

Anchor-free

- Anchor point-based
  1. Zheng Ge, Songtao Liu, Feng Wang, Zeming Li, and Jian Sun. YOLOX: Exceeding YOLO series in 2021. arXiv preprint arXiv:2107.08430, 2021.
  2. Zhi Tian, Chunhua Shen, Hao Chen, and Tong He. FCOS: Fully convolutional one-stage object detection. In Proc. Int. Conf. Computer Vision (ICCV), 2019.
- Keypoint-based
  1. Hei Law and Jia Deng. CornerNet: Detecting objects as paired keypoints. In Proceedings of the European conference on computer vision (ECCV), pages 734–750, 2018.
  2. Ze Yang, Shaohui Liu, Han Hu, Liwei Wang, and Stephen Lin. RepPoints: Point set representation for object detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9657–9666, 2019.
  3. Xingyi Zhou, Dequan Wang, and Philipp Krähenbühl. Objects as points. arXiv preprint arXiv:1904.07850, 2019.
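As an illustration of the anchor point-based family, FCOS regresses, for each feature-map location (x, y) inside a ground-truth box, the four distances (l, t, r, b) to the box sides. It also predicts a "centerness" score, sqrt(min(l, r)/max(l, r) · min(t, b)/max(t, b)), that down-weights locations near box borders. The sketch below computes both for a single point and box. It omits FCOS's per-FPN-level regression ranges and its rule for points inside several boxes, and the function names are hypothetical.

```python
import math


def fcos_targets(x, y, box):
    """Distances (l, t, r, b) from location (x, y) to the sides of box (x1, y1, x2, y2).

    Returns None when the location is not strictly inside the box
    (treated as a negative sample in this sketch).
    """
    x1, y1, x2, y2 = box
    l, t, r, b = x - x1, y - y1, x2 - x, y2 - y
    if min(l, t, r, b) <= 0:
        return None
    return l, t, r, b


def centerness(l, t, r, b):
    """1.0 at the box center, decaying toward 0 near the borders."""
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))


targets = fcos_targets(2, 5, (0, 0, 10, 10))
print(targets, centerness(*targets))
```

Keypoint-based methods such as CornerNet and CenterNet ("Objects as points") instead predict heatmaps of corners or centers and decode boxes from the detected peaks.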

Shallow Network

Robust Real-Time Face Detection(Haar+Adaboost)
Integral Channel Features(ICF)
The Fastest Pedestrian Detector in the West(FPDW)
Fast Feature Pyramids for Object Detection(ACF)
Local Decorrelation for Improved Pedestrian Detection(LDCF)
Convolutional Channel Features(CCF)
Informed Haar-like Features Improve Pedestrian Detection(InformedHaar)
Fast Pedestrian Detection for Mobile Devices(FastCF)
Pedestrian detection at 100 Frames Per Second(VeryFast)
To Boost or Not to Boost? On the Limits of Boosted Trees for Object Detection(ACF+/LDCF+)
Filtered channel features for pedestrian detection(Checkerboard)
Pedestrian Detection Inspired by Appearance Constancy and Shape Symmetry(NNNF)
Aggregate Channel Features for Multi-view Face Detection(ACFFace)
Pedestrian Detection with Spatially Pooled Features and Structured Ensemble Learning(SpatialPooling+)
BAdaCost: Multi-class Boosting with Costs(BAdaCost)
Exploring Prior Knowledge for Pedestrian Detection(SCCPriors)
A Fast, Modular Scene Understanding System using Context-Aware Object Detection(SC-ACF)
Ten Years of Pedestrian Detection, What Have We Learned?(Katamari)
How Far are We from Solving Pedestrian Detection?
What Can Help Pedestrian Detection?
Taking a Deeper Look at Pedestrians
Semantic Channels for Fast Pedestrian Detection(MRFC+Semantic)
Fast Boosting based Detection using Scale Invariant Multimodal Multiresolution Filtered Features
Learning Multilayer Channel Features for Pedestrian Detection
Fast and Robust Object Detection Using Visual Subcategories
Learning to Detect Vehicles by Clustering Appearance Patterns(Subcat)
Looking at Pedestrians at Different Scales: A Multiresolution Approach and Evaluations(MR-ACF)
Multiresolution models for object detection
Face Detection without Bells and Whistles
Fast Detection of Multiple Objects in Traffic Scenes With a Common Detection Framework
An Exploration of Why and When Pedestrian Detection Fails
Discriminative Sub-categorization

Others

Model Ensemble

- Ensemble Methods for Object Detection_ECAI_2020 [Paper]

Neck

- ASFF_Sim
- GSConv
- Focus layer (YOLOv5)

Head

- TOOD-Head

Tiny Object Detection

https://github.com/kuanhungchen/awesome-tiny-object-detection#papers

Object Detection from 2018-2020

2020

[AAAI] Arbitrary-Oriented Object Detection with Circular Smooth Label
[AAAI] CBNet: A Novel Composite Backbone Network Architecture for Object Detection
[AAAI] Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression
[AAAI] Progressive Feature Polishing Network for Salient Object Detection
[BMVC] Cascade RetinaNet: Maintaining Consistency for Single-Stage Object Detection
[CVPR] AugFPN: Improving Multi-scale Feature Learning for Object Detection
[CVPR] ContourNet: Taking a Further Step toward Accurate Arbitrary-shaped Scene Text Detection
[CVPR] Delving into Online High-quality Anchors Mining for Detecting Outer Faces
[CVPR] Detection in Crowded Scenes: One Proposal, Multiple Predictions
[CVPR] Learning from Noisy Anchors for One-stage Object Detection
[CVPR] Multiple Anchor Learning for Visual Object Detection
[CVPR] PolarMask: Single Shot Instance Segmentation with Polar Representation
[CVPR] [– for Tony] Revisiting the Sibling Head in Object Detector: https://arxiv.org/pdf/2003.07540.pdf
[ECCV] Dynamic R-CNN: Towards High Quality Object Detection via Dynamic Training
[ECCV] PIoU Loss: Towards Accurate Oriented Object Detection in Complex Environments
[ECCV] Probabilistic Anchor Assignment with IoU Prediction for Object Detection
[ECCV] Rotation-robust Intersection over Union for 3D Object Detection
[IEEE JSTARS] Learning Point-guided Localization for Detection in Remote Sensing Images
[IEEE TGRS] Adaptive Period Embedding for Representing Oriented Objects in Aerial Images
[IEEE TCSVT] Joint Anchor-Feature Refinement for Real-Time Accurate Object Detection in Images and Videos
[Neurocomputing] Recent Advances in Deep Learning for Object Detection
[Neurocomputing] [– for Tony] Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection: https://arxiv.org/pdf/1803.08208.pdf
[Remote Sens.] EFN: Field-based Object Detection for Aerial Images
[Remote Sens.] Single-Stage Rotation-Decoupled Detector for Oriented Object
[Remote Sens.] A2S-Det: Efficiency Anchor Matching in Aerial Image Oriented Object Detection
[WACV] [– for Tony] Improving Object Detection with Inverted Attention: https://arxiv.org/pdf/1903.12255.pdf
[WACV] [– for Tony] Propose-and-Attend Single Shot Detector: https://arxiv.org/abs/1907.12736
Align Deep Features for Oriented Object Detection
AMRNet: Chips Augmentation in Aerial Images Object Detection
BBRefinement: An universal scheme to improve precision of box object detectors
Conditional Convolutions for Instance Segmentation
Cross-layer Feature Pyramid Network for Salient Object Detection: https://arxiv.org/pdf/2002.10864.pdf
EAGLE: Large-scale Vehicle Detection Dataset in Real-World Scenarios using Aerial Imagery
Extended Feature Pyramid Network for Small Object Detection: https://arxiv.org/pdf/2003.07021v1.pdf
FeatureNMS: Non-Maximum Suppression by Learning Feature Embeddings
Feature Pyramid Grids
IterDet: Iterative Scheme for Object Detection in Crowded Environments
Location-Aware Feature Selection for Scene Text Detection
Objects detection for remote sensing images based on polar coordinates
Scale-Invariant Multi-Oriented Text Detection in Wild Scene Images
Scaled-YOLOv4: Scaling Cross Stage Partial Network

2019

[AAAI] Gradient Harmonized Single-stage Detector
[AAAI] M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network
[BMVC] Rethinking Classification and Localization for Cascade R-CNN
[CVPR] Assisted Excitation of Activations: A Learning Technique to Improve Object
[CVPR] Borrow from Anywhere: Pseudo Multi-modal Object Detection in Thermal Imagery
[CVPR] Dual Attention Network for Scene Segmentation
[CVPR] Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
[CVPR] Learning RoI Transformer for Detecting Oriented Objects in Aerial Images
[CVPR] Learning Instance Activation Maps for Weakly Supervised Instance Segmentation
[CVPR] Libra R-CNN: Towards Balanced Learning for Object Detection
[CVPR] Panoptic Segmentation
[CVPR] Region Proposal by Guided Anchoring
[CVPR] ScratchDet: Training Single-Shot Object Detectors from Scratch
[CVPR] Softer-NMS: Rethinking Bounding Box Regression for Accurate Object Detection
[CVPR] Spatial-aware Graph Relation Network for Large-scale Object Detection
[CVPR] Weakly Supervised Learning of Instance Segmentation with Inter-pixel Relations
[ICCV] Dynamic Multi-scale Filters for Semantic Segmentation
[ICCV] EGNet: Edge Guidance Network for Salient Object Detection
[ICCV] FCOS: Fully Convolutional One-Stage Object Detection
[ICCV] InstaBoost: Boosting Instance Segmentation via Probability Map Guided
[ICCV] Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving
[ICCV] Matrix Nets: A New Deep Architecture for Object Detection
[ICCV] ThunderNet: Towards Real-time Generic Object Detection
[ICCV] Scale-Aware Trident Networks for Object Detection
[ICCV] SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects
[ICIP] SSSDET: Simple Short and Shallow Network for Resource Efficient Vehicle Detection in Aerial Scenes
[ICLR] Approximating CNNs with Bag-of-local-Features models works surprisingly well on ImageNet
[ICLR] ImageNet-trained CNNs are biased towards texture: increasing shape bias improves accuracy and robustness
[ICLR] Why do deep convolutional networks generalize so poorly to small image transformations?
[ICML] How much real data do we actually need: Analyzing object detection performance using synthetic and real data
[ICML] Making Convolutional Networks Shift-Invariant Again
[ICTAI] Twin Feature Pyramid Networks for Object Detection
[IEEE Access] A Real-Time Scene Text Detector with Learned Anchor
[IEEE Trans Geosci Remote Sens] CAD-Net: A Context-Aware Detection Network for Objects in Remote Sensing Imagery
[IJCAI] Omnidirectional Scene Text Detection with Sequential-free Box Discretization
[J. Big Data] A survey on Image Data Augmentation for Deep Learning
[NeurIPS] Cascade RPN: Delving into High-Quality Region Proposal Network with Adaptive Convolution
[NeurIPS] FreeAnchor: Learning to Match Anchors for Visual Object Detection
A Preliminary Study on Data Augmentation of Deep Learning for Image Classification
Bag of Freebies for Training Object Detection Neural Networks
Consistent Optimization for Single-Shot Object Detection
Deep Learning for 2D and 3D Rotatable Data: An Overview of Methods
Double-Head RCNN: Rethinking Classification and Localization for Object Detection
IENet: Interacting Embranchment One Stage Anchor Free Detector for Orientation Aerial Object Detection
IoU-uniform R-CNN: Breaking Through the Limitations of RPN
Is Sampling Heuristics Necessary in Training Deep Object Detectors?
Learning Data Augmentation Strategies for Object Detection
Learning from Noisy Anchors for One-stage Object Detection
Light-Head R-CNN: In Defense of Two-Stage Object Detector
MMDetection: Open MMLab Detection Toolbox and Benchmark
Multi-Scale Attention Network for Crowd Counting
Natural Adversarial Examples
Needles in Haystacks: On Classifying Tiny Objects in Large Images
Revisiting Feature Alignment for One-stage Object Detection
Ship Detection: An Improved YOLOv3 Method

2018

[ACCV] Reverse Densely Connected Feature Pyramid Network for Object Detection
[BMVC] Enhancement of SSD by concatenating feature maps for object detection
[CVPR] An Analysis of Scale Invariance in Object Detection
[CVPR] Cascade R-CNN: Delving into High Quality Object Detection
[CVPR] DOTA: A Large-scale Dataset for Object Detection in Aerial Images
[CVPR] Path Aggregation Network for Instance Segmentation
[CVPR] Pseudo Mask Augmented Object Detection
[CVPR] Rotation Sensitive Regression for Oriented Scene Text Detection
[CVPR] Scale-Transferable Object Detection
[CVPR] Single-Shot Object Detection with Enriched Semantics
[CVPR] Single-Shot Refinement Neural Network for Object Detection
[CVPR] Squeeze-and-Excitation Networks
[CVPR] Weakly Supervised Instance Segmentation using Class Peak Response
[ECCV] Acquisition of Localization Confidence for Accurate Object Detection
[ECCV] Deep Feature Pyramid Reconfiguration for Object Detection
[ECCV] DetNet: A Backbone network for Object Detection
[ECCV] Learning to Segment via Cut-and-Paste
[ECCV] Modeling Visual Context is Key to Augmenting Object Detection Datasets
[ECCV] Receptive Field Block Net for Accurate and Fast Object Detection
[ICLR] Multi-Scale Dense Convolutional Networks for Efficient Prediction
[ICANN] Further advantages of data augmentation on convolutional neural networks
[IEEE ISBI] A Novel Focal Tversky loss function with improved Attention U-Net for lesion segmentation
[IEEE TIP] TextBoxes++: A single-shot oriented scene text detector
[IEEE Trans Multimedia] Arbitrary-oriented scene text detection via rotation proposals
[IJAC] An Overview of Contour Detection Approaches
[IJCV] What Makes Good Synthetic Training Data for Learning Disparity and Optical Flow Estimation?
[J Mach Learn Res] Neural Architecture Search: A Survey
[Remote Sens.] Automatic Ship Detection of Remote Sensing Images from Google Earth in Complex Scenes Based on Multi-Scale Rotation Dense Feature Pyramid Networks
[VISIGRAPP] Learning Transformation Invariant Representations with Weak Supervision
[WACV] Understanding Convolution for Semantic Segmentation
Data Augmentation by Pairing Samples for Images Classification
MDSSD: Multi-scale Deconvolutional Single Shot Detector for Small Objects
RAM: Residual Attention Module for Single Image Super-Resolution
R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection

References

Visual attention mechanism | The relationship and difference between Non-local module and Self-attention? [Link]
Visual attention mechanism | Visual attention mechanism for classification: SEnet, CBAM, SKNet [Link]
Summary | Pytorch-based YOLO target detection project engineering collection [Link]
https://zhuanlan.zhihu.com/p/140022058
https://towardsdatascience.com/ensemble-learning-bagging-boosting-3098079e5422
YOLO Series: https://viblo.asia/p/tong-hop-kien-thuc-tu-yolov1-den-yolov5-phan-3-63vKjgJ6Z2R
https://github.com/srebroa/awesome-yolo
