A Survey on Object Instance Segmentation
Paper: https://link.springer.com/article/10.1007/s42979-022-01407-3
Code:
1) Motivation, Objectives and Related Works:
Motivation:
In recent years, instance segmentation has become a key research area in computer vision. It has been applied in varied domains such as robotics, healthcare and intelligent driving.
Instance segmentation not only detects the location of each object but also marks the edges of every individual instance, so it addresses object detection and semantic segmentation at the same time.
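To make the relationship concrete, here is a minimal sketch, not taken from the survey, of how a single instance-level labeling already contains both a detection result and a semantic segmentation result. The toy map, the class names and the helper functions `boxes_from_instances` and `semantic_from_instances` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy image: 0 is background, each positive integer marks
# the pixels of one object instance.
INSTANCE_MAP: List[List[int]] = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [3, 3, 0, 0, 0, 0],
]
# Hypothetical class label for each instance id.
INSTANCE_CLASS: Dict[int, str] = {1: "car", 2: "car", 3: "person"}

Box = Tuple[int, int, int, int]  # (row_min, col_min, row_max, col_max), inclusive


def boxes_from_instances(instance_map: List[List[int]]) -> Dict[int, Box]:
    """Detection view: the tightest bounding box around each instance."""
    boxes: Dict[int, Box] = {}
    for r, row in enumerate(instance_map):
        for c, inst in enumerate(row):
            if inst == 0:
                continue  # background pixels belong to no instance
            if inst not in boxes:
                boxes[inst] = (r, c, r, c)
            else:
                r0, c0, r1, c1 = boxes[inst]
                boxes[inst] = (min(r0, r), min(c0, c), max(r1, r), max(c1, c))
    return boxes


def semantic_from_instances(
    instance_map: List[List[int]], instance_class: Dict[int, str]
) -> List[List[str]]:
    """Semantic view: replace each instance id with its class label."""
    return [
        [instance_class.get(inst, "background") for inst in row]
        for row in instance_map
    ]


if __name__ == "__main__":
    print(boxes_from_instances(INSTANCE_MAP))
    for row in semantic_from_instances(INSTANCE_MAP, INSTANCE_CLASS):
        print(row)
```

In the semantic view the two cars share one label and can no longer be told apart, while the instance map keeps them separate. That per-object separation is what instance segmentation adds on top of semantic segmentation.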
Objectives:
Give a detailed introduction to the instance segmentation technology based on deep learning, reinforcement learning and transformers.
Introduction:
Object classification [110–112] [8-10]
110. Krizhevsky A, Sutskever I, Hinton G. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012;25.
111. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. 2014. arXiv preprint arXiv:1409.1556.
112. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. 2016.
8. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. 2016.
9. Krizhevsky A, Sutskever I, Hinton G. Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst. 2012;25:1097–105.
10. Tang P, Wang X, Huang Z, Bai X, Liu W. Deep patch learning for weakly supervised object classification and discovery. Pattern Recogn. 2017;71:446–59.
Object detection [15, 29, 113, 114] [11–20]
15. Ren S, He K, Girshick R, Sun J. Faster r-cnn: towards real-time object detection with region proposal networks. 2015. arXiv preprint arXiv:1506.01497.
29. Girshick R. Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp. 1440–1448. 2015.
113. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788. 2016.
114. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg A. Ssd: single shot multibox detector. In: European conference on computer vision, pp. 21–37. 2016.
11. Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580–587. 2014.
12. Huang L, Yang Y, Deng Y, Yu Y. Densebox: unifying landmark localization with end to end object detection. 2015. arXiv preprint arXiv:1509.04874.
13. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg A. Ssd: single shot multibox detector. In: European conference on computer vision, pp. 21–37. 2016.
14. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788. 2016.
16. Tang P, Wang C, Wang X, Liu W, Zeng W, Wang J. Object detection in videos by high quality object linking. IEEE Trans Pattern Anal Mach Intell. 2019;42(5):1272–8.
17. Liu L, Ouyang W, Wang X, Fieguth P, Chen J, Liu X, Pietikäinen M. Deep learning for generic object detection: a survey. Int J Comput Vis. 2020;128(2):261–318.
18. Gidaris S, Komodakis N. Object detection via a multi-region and semantic segmentation-aware cnn model. In: Proceedings of the IEEE international conference on computer vision, pp. 1134–1142. 2015.
19. Stewart R, Andriluka M, Ng A. End-to-end people detection in crowded scenes. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2325–2333. 2016.
20. Dai J, Li Y, He K, Sun J. R-fcn: object detection via region-based fully convolutional networks. 2016. arXiv preprint arXiv:1605.06409.
Semantic segmentation [21, 23, 24]
21. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille A. Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. IEEE Trans Pattern Anal Mach Intell. 2017;40(4):834–48.
22. Huang Z, Wang X, Huang L, Huang C, Wei Y, Liu W. Ccnet: Criss-cross attention for semantic segmentation. In: Proceedings of the IEEE/CVF international conference on computer vision, pp. 603–612. 2019.
23. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440. 2015.
24. Zhao H, Shi J, Qi X, Wang X, Jia J. Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2881–2890. 2017.
Instance segmentation [28, 54].
28. He K, Gkioxari G, Dollár P, Girshick R. Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp. 2961–2969. 2017.
54. Huang Z, Huang L, Gong Y, Huang C, Wang X. Mask scoring r-cnn. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 6409–6418. 2019.
Related Works:
Convolutional Neural Networks (CNNs) based Instance Segmentation [1–5]
Reinforcement Learning (RL) based Instance Segmentation [6, 7].
Transformers based Instance Segmentation
computer vision problem [115, 116].
image recognition [117, 118],
object detection [119, 120],
segmentation [121]
other use cases [122, 123].
Survey papers on instance segmentation technology [133, 134].
Contribution:
This is the first survey paper on instance segmentation that broadly covers the technology based on different techniques such as deep learning, reinforcement learning and transformers. Specifically, we present a comprehensive overview of more than 40 papers to cover the recent progress in detail.
We provide complete coverage of this field by sorting the papers based on the techniques used. This survey paper presents the different applications using instance segmentation technology in Fig. 3.
Finally, we provide a discussion about the key challenges, highlighting open problems and outlining the future scope.
2) Methodology:
Instance Segmentation Using Deep Learning:
Proposal-based Approaches
Mask R-CNN
Instance-Sensitive Fully Convolutional Networks (FCNs)
Fully Convolutional Instance-Aware Semantic Segmentation
Boundary-Aware Instance Segmentation
S4Net: Single-Stage Salient‑Instance Segmentation
PANet: Path Aggregation Network for Instance Segmentation
MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
TensorMask: A Foundation for Dense Object Segmentation
ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors
YOLACT: Real-Time Instance Segmentation
Hybrid Task Cascade for Instance Segmentation
Mask Scoring R-CNN
BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation
Mask Encoding for Single-Shot Instance Segmentation
Boundary-Preserving Mask R-CNN
Proposal-Free Approaches
PolarMask: Single-Shot Instance Segmentation with Polar Representation
PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond
RetinaMask: Learning to Predict Masks Improves State-of-the-Art Single-Shot Detection for Free
Deep Watershed Transform for Instance Segmentation
InstanceCut: from Edges to Instances with MultiCut
Recurrent Instance Segmentation
Iterative Instance Segmentation
Pixelwise Instance Segmentation with a Dynamically Instantiated Network
End-to-End Instance Segmentation with Recurrent Attention
SGN: Sequential Grouping Networks for Instance Segmentation
Proposal-Free Network for Instance-Level Object Segmentation
Distance to Center of Mass Encoding for Instance Segmentation
TernausNetV2: Fully Convolutional Network for Instance Segmentation
Weakly Supervised Instance Segmentation Using Class Peak Response
SSAP: Single-Shot Instance Segmentation with Affinity Pyramid
DeeperLab: Single-Shot Image Parser
Pose2Seg: Detection Free Human Instance Segmentation
PolyTransform: Deep Polygon Transformer for Instance Segmentation
CenterMask: Real-Time Anchor-Free Instance Segmentation
SOLO: Segmenting Objects by Locations
SOLOv2: Dynamic and Fast Instance Segmentation
Instance Segmentation Using Reinforcement Learning:
Actor-Critic Instance Segmentation
Reinforced Coloring for End-to-End Instance Segmentation
Instance Segmentation Based on Transformers:
ISTR: End-to-End Instance Segmentation with Transformers
SOTR: Segmenting Objects with Transformers
SOIT: Segmenting Objects with Instance-Aware Transformers
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation
3) Experimental Results:
Dataset:
Metrics:
Results on instance segmentation based on different techniques along with datasets
Results on instance segmentation for other datasets
Experimental Results:
Ablations:
References: