Instance Segmentation
Instance segmentation, which aims to assign a pixel-wise instance mask with a category label to each object in an image, has great potential in various computer vision applications, such as autonomous driving and robotics.
0) Overview:
Datasets:
MS COCO Dataset
Cityscapes Dataset
The Mapillary Vistas Dataset (MVD)
Pascal VOC 2012 Dataset
CVPPP Dataset
KITTI Dataset
Metrics:
Purpose:
1) Papers:
BPR:
Two-stage methods usually follow the classical detect-then-segment strategy.
Mask R-CNN [13] builds on the two-stage detector Faster R-CNN [32]: it first detects objects in an image and then performs binary segmentation within each detected bounding box.
Following Mask R-CNN, PANet [25] enhances feature representation through bottom-up path augmentation.
Mask Scoring R-CNN [14] adds an additional mask-IoU head to re-score the mask predictions.
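To make the re-scoring idea concrete, here is a minimal sketch of the quantity a mask-IoU head is trained to predict, and of one way the prediction can be used. The function names `mask_iou` and `rescore` are hypothetical, and combining the two scores by multiplication is an assumption of this sketch, not something stated in these notes.

```python
def mask_iou(pred, target):
    """Intersection over union of two binary masks.

    Both masks are equally sized 2D lists of 0/1 values.
    Returns 0.0 when both masks are empty.
    """
    inter = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter / union if union else 0.0


def rescore(cls_score, predicted_iou):
    # Assumption: the final mask score is the classification confidence
    # scaled by the IoU that the extra head predicts for the mask.
    return cls_score * predicted_iou


pred = [[1, 1],
        [0, 0]]
target = [[1, 0],
          [1, 0]]
iou = mask_iou(pred, target)   # 1 overlapping pixel, 3 pixels in the union
score = rescore(0.9, iou)
```

At training time the ground-truth IoU computed this way would serve as the regression target for the head; at inference only the predicted value is available, so a confident class score attached to a poor mask is scaled down.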
One-stage methods have recently attracted more attention due to the rapid development of one-stage detectors [22, 37, 53].
Some methods [2, 3, 19, 46, 51] continue to adopt the detect-then-segment strategy but replace the detector with a one-stage alternative.
YOLACT [2] achieves real-time speed by learning a set of prototype masks and assembling them with learned linear coefficients.
BlendMask [3] further improves on this idea by assembling with attention maps.
Some recently proposed methods [6, 36, 40, 41] eliminate the need for detection by directly segmenting objects in a location-wise manner.
CondInst [36] and SOLOv2 [41] achieve remarkable performance with high efficiency.
In addition, some approaches [9, 11, 30] are built upon semantic segmentation models; they usually learn pixel-wise embeddings and then cluster them into instances.
Several works [1, 31, 42, 45] replace the pixel-wise instance representation with a contour-based representation.
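The prototype assembly described for YOLACT can be sketched as a per-pixel linear combination of prototype maps. This is an illustration only: the function name `assemble_mask`, the sigmoid applied to the combined value, and the 0.5 threshold are assumptions of this sketch, and the cropping of the result to the detected box is omitted.

```python
import math


def assemble_mask(prototypes, coeffs, threshold=0.5):
    """Combine k prototype maps into one binary instance mask.

    prototypes: list of k maps, each an H x W 2D list of floats.
    coeffs: list of k floats, the per-instance linear coefficients.
    """
    height = len(prototypes[0])
    width = len(prototypes[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Linear combination of the prototype responses at this pixel.
            logit = sum(c * p[y][x] for c, p in zip(coeffs, prototypes))
            # Assumed squashing step: map the combination to (0, 1).
            prob = 1.0 / (1.0 + math.exp(-logit))
            row.append(1 if prob > threshold else 0)
        mask.append(row)
    return mask


# Two 1 x 2 prototypes shared by all instances in the image.
prototypes = [[[1.0, -1.0]],
              [[0.0, 2.0]]]
mask_a = assemble_mask(prototypes, [1.0, 1.0])    # logits: 1.0, 1.0
mask_b = assemble_mask(prototypes, [1.0, -1.0])   # logits: 1.0, -3.0
```

Because the prototypes are shared across the image and only the short coefficient vector differs per instance, the per-instance cost is small, which is consistent with the real-time speed noted above.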
References: