A Feature Pyramid Network (FPN) is a basic component of recognition systems for detecting objects at different scales. Recognizing objects whose sizes vary widely is a fundamental challenge in computer vision. An FPN addresses it by extracting features at multiple scales and fusing them, which improves the model's accuracy.
The main challenges facing multi-scale object detection:
How to learn multi-scale feature representations that carry strong semantic information?
How to design a general feature representation that serves multiple sub-problems in object detection, such as classification, localization, and segmentation?
How to compute multi-scale feature representations efficiently?
[FPN] T. Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature Pyramid Networks for Object Detection", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2117-2125, 2017.
[Nas-FPN]
[PAFPN]
[BFP]
[AugFPN] AugFPN: Improving Multi-scale Feature Learning for Object Detection
[BiFPN] EfficientDet: Scalable and Efficient Object Detection
[RCNet] Z. Zong, Q. Cao, and B. Leng, "RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection", Proceedings of the 29th ACM International Conference on Multimedia, pp. 5637–5645, 2021.
[FPG] Feature Pyramid Grids
[CE-FPN] CE-FPN: Enhancing Channel Information for Object Detection
[SpineNet] SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
SPP (Spatial Pyramid Pooling): Pools features at multiple scales to increase invariance to spatial transformations.
PPM (Pyramid Pooling Module): Similar to SPP, but with more flexibility in pooling scales and fusion strategies.
ASPP (Atrous Spatial Pyramid Pooling): Combines spatial pyramid pooling with atrous convolution (dilated convolution) to capture multi-scale contextual information without losing resolution.
FPN (Feature Pyramid Networks): Constructs a multi-scale feature pyramid from a single-scale input image, allowing for object detection and segmentation at different scales.
PANet (Path Aggregation Network): Enhances FPN by adding bottom-up paths for feature propagation, further improving feature representation.
Generalized-FPN: Generalizes the concept of FPN to different backbone architectures and tasks.
RFB (Receptive Field Block Net): Combines multi-branch convolutions with different dilation rates to expand receptive fields for better context aggregation.
ASFF (Adaptive Spatial Feature Fusion): Adaptively fuses features from different levels of a feature pyramid for more effective information integration.
SFAM (Scale-wise Feature Aggregation Module): Aggregates features from multiple scales in a scale-aware manner for robust feature representation.
LawinASPP: Uses large window attention to capture multi-scale contextual information.
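The FPN entry in the list above can be made concrete with a small sketch of its top-down pathway: the coarsest map is upsampled by a factor of two (nearest neighbour) and added element-wise to the next finer lateral map, and so on down the pyramid. This is a minimal pure-Python illustration, not a reference implementation. It assumes single-channel maps stored as nested lists, treats the lateral 1x1 convolutions as already applied, and omits the 3x3 smoothing convolution that follows each merge. The helper names and the toy values are hypothetical.

```python
def upsample2x_nearest(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map (list of rows)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column twice
        out.append(wide)
        out.append(list(wide))                     # repeat each row twice
    return out


def top_down_merge(laterals):
    """Fuse lateral maps ordered fine -> coarse, each half the size of the previous.

    Start from the coarsest map, then repeatedly upsample the running result
    and add it element-wise to the next finer lateral map.
    """
    merged = [laterals[-1]]
    for lateral in reversed(laterals[:-1]):
        up = upsample2x_nearest(merged[0])
        fused = [[a + b for a, b in zip(row_lat, row_up)]
                 for row_lat, row_up in zip(lateral, up)]
        merged.insert(0, fused)
    return merged  # same order and shapes as the input


# Toy single-channel lateral maps of size 4x4, 2x2 and 1x1 (hypothetical values).
c3 = [[1.0] * 4 for _ in range(4)]
c4 = [[10.0] * 2 for _ in range(2)]
c5 = [[100.0]]
p3, p4, p5 = top_down_merge([c3, c4, c5])
```

After the merge, every level keeps its own resolution but also contains the contribution of all coarser levels, which is how semantically strong features reach the high-resolution maps.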
SPP integrates Spatial Pyramid Matching (SPM) [39] into the CNN so that it outputs a fixed-length feature vector regardless of the input size.
In YOLOv3 [63], Redmon and Farhadi integrated an improved SPP module between the backbone and the FPN, forming YOLOv3-SPP.
YOLOv3-608-SPP improves AP50 by 2.7% at the cost of 0.5% extra computation.
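The fixed-length property noted for SPP can be illustrated with a short sketch: the feature map is max-pooled into n x n bins for several values of n, and the results are concatenated, so the output length depends only on the pyramid levels and not on the map size. This is a simplified single-channel version in pure Python. The function name and the levels (1, 2, 4) are assumptions chosen for illustration, and the sketch shows the fixed-length form of SPP, not the improved module used in YOLOv3-SPP.

```python
def spp_vector(fmap, levels=(1, 2, 4)):
    """Max-pool a 2-D map into n x n bins per level and concatenate the results."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for n in levels:
        for i in range(n):
            # Bin edges: floor for the start, ceil for the end, so bins cover the map.
            r0, r1 = (i * h) // n, -((-(i + 1) * h) // n)
            for j in range(n):
                c0, c1 = (j * w) // n, -((-(j + 1) * w) // n)
                out.append(max(fmap[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out


# Two maps of different sizes (hypothetical values) give vectors of equal length.
small = [[r * 7 + c for c in range(7)] for r in range(5)]   # 5 x 7
large = [[r * 8 + c for c in range(8)] for r in range(8)]   # 8 x 8
vec_small = spp_vector(small)
vec_large = spp_vector(large)
```

With levels (1, 2, 4) each channel contributes 1 + 4 + 16 = 21 values, which is what lets a fully connected layer follow a feature map of arbitrary size.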
ASPP is an SPP-style module built from dilated (atrous) convolutions.
Improves dilated convolution to obtain more comprehensive spatial coverage of the feature maps.
RFB [47] increases the AP50 of SSD on MS COCO by 5.7% at the cost of only 7% extra inference time.
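ASPP and RFB, as described above, both rest on dilated convolution: the kernel taps are spaced `rate` positions apart, so a kernel of size k covers k + (k - 1)(rate - 1) input positions without extra weights and without reducing resolution. The sketch below shows this in one dimension and runs the same kernel at several rates in parallel. It is an illustration under simplifying assumptions: real modules work on 2-D multi-channel maps, learn a separate kernel per branch, and fuse the branch outputs. The function names and the rates (1, 2, 4) are hypothetical.

```python
def dilated_conv1d(signal, kernel, rate):
    """1-D cross-correlation with zero padding and the given dilation rate."""
    k = len(kernel)
    span = (k - 1) * rate              # distance between first and last tap
    pad = span // 2
    padded = [0.0] * pad + list(signal) + [0.0] * (span - pad)
    # Output has the same length as the input (resolution is preserved).
    return [sum(kernel[t] * padded[i + t * rate] for t in range(k))
            for i in range(len(signal))]


def effective_kernel_size(k, rate):
    """Number of input positions spanned by a size-k kernel at this rate."""
    return k + (k - 1) * (rate - 1)


def parallel_dilated_branches(signal, kernel, rates=(1, 2, 4)):
    """Apply the same kernel at several rates; return one output per branch."""
    return [dilated_conv1d(signal, kernel, r) for r in rates]


# A unit impulse makes the tap spacing of each branch visible.
impulse = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
branches = parallel_dilated_branches(impulse, [1.0, 1.0, 1.0])
```

Each branch responds to the impulse at positions spaced `rate` apart, so larger rates gather context from farther away while every output keeps the input length.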