0) Overview:
Dataset:
When annotating for instance segmentation, human annotators usually first localize and categorize each object in the given image, and then explicitly or implicitly segment coarse instance masks at a low resolution.
To obtain a high-quality mask, the annotators need to repeatedly zoom into the local boundary regions and trace sharper boundaries at a higher resolution.
Intuitively, high-level semantics are required to localize and roughly segment objects, while low-level details (e.g. color consistency and contrast) are more critical for segmenting the local boundary regions.
Metrics:
Purpose:
The quality of the predicted instance mask is still not satisfactory.
One of the most important problems is the imprecise segmentation around instance boundaries.
1) Papers:
BPR:
Designing a boundary-aware segmentation model by integrating an extra and specialized module to process boundaries.
[BMask R-CNN] and [Gated-SCNN] employ an extra branch to enhance the boundary awareness of mask features by estimating boundaries directly, which can fix the optimization bias to some extent, while the low-resolution issue remains unsolved.
[DecoupleSegNets] "Improving semantic segmentation via decoupled body and edge supervision", 2020.
[Focal-BG] "Focal boundary guided salient object detection", 2019.
[HMEDN] "High-resolution encoder-decoder networks for low-contrast medical image segmentation", 2019.
[CGBNet] employs a boundary delineation refinement module to obtain better boundaries in semantic segmentation.
[PointRend] iteratively samples the feature points with unreliable predictions and refines them with a shared MLP.
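A minimal sketch of the point-selection step described for PointRend, assuming a foreground-probability map and treating closeness to 0.5 as the measure of unreliability. The function name `select_uncertain_points` and the parameter `k` are hypothetical, and the shared MLP that refines the selected points is not shown.

```python
import numpy as np

def select_uncertain_points(prob, k):
    """Return (row, col) indices of the k pixels whose probability is closest to 0.5."""
    # Assumption: uncertainty is the negative distance to the 0.5 decision threshold.
    uncertainty = -np.abs(prob - 0.5)
    # Sort ascending, then reverse so the most uncertain pixels come first.
    flat = np.argsort(uncertainty, axis=None, kind="stable")[::-1][:k]
    rows, cols = np.unravel_index(flat, prob.shape)
    return np.stack([rows, cols], axis=1)

prob = np.array([[0.95, 0.90, 0.10],
                 [0.85, 0.55, 0.05],
                 [0.60, 0.48, 0.02]])
points = select_uncertain_points(prob, k=2)
```

In an iterative scheme, only the features at `points` would be re-predicted at each step, which keeps the refinement cost far below that of re-predicting the whole map.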
IABL (Wang et al., 2022) and ContourLoss (Chen et al., 2020) use boundary-aware loss functions to improve the boundary quality. B2Inst (Kim et al., 2021) uses an extra boundary basis, and RefineMask (Zhang et al., 2021) proposes a multistage boundary refinement module to improve the boundary quality for instance segmentation. These specialized modules usually process boundaries at a low resolution because of GPU memory constraints, so the low-resolution issue remains unsolved. By contrast, the proposed BPR method processes only small boundary patches, so the patches can be up-sampled to larger scales before being fed into the refinement network.
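The patch-based idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the BPR implementation: boundary pixels are taken to be foreground pixels with a 4-connected background neighbour, patches are subsampled with a fixed stride, and up-sampling is nearest-neighbour. The names `boundary_pixels` and `extract_boundary_patches` and the parameters `patch`, `stride`, and `scale` are hypothetical.

```python
import numpy as np

def boundary_pixels(mask):
    """Foreground pixels that have at least one 4-connected background neighbour."""
    m = mask.astype(bool)
    padded = np.pad(m, 1, constant_values=False)
    # A pixel is interior if it and its four neighbours are all foreground.
    interior = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return m & ~interior

def extract_boundary_patches(image, mask, patch=8, stride=4, scale=2):
    """Crop patch x patch windows around every stride-th boundary pixel and upsample them.

    Assumes the image is at least patch pixels wide and high.
    """
    ys, xs = np.nonzero(boundary_pixels(mask))
    half = patch // 2
    h, w = mask.shape
    patches = []
    for y, x in list(zip(ys, xs))[::stride]:
        # Clamp the window so it stays inside the image.
        top = int(np.clip(y - half, 0, h - patch))
        left = int(np.clip(x - half, 0, w - patch))
        crop = image[top:top + patch, left:left + patch]
        # Nearest-neighbour upsampling: the small patch is enlarged before refinement.
        crop = crop.repeat(scale, axis=0).repeat(scale, axis=1)
        patches.append(((top, left), crop))
    return patches

mask = np.zeros((32, 32), dtype=np.uint8)
mask[8:24, 8:24] = 1
image = np.random.default_rng(0).random((32, 32))
patches = extract_boundary_patches(image, mask)
```

Because each crop covers only a small window, enlarging it costs little memory compared with up-sampling the whole instance, which is the point the paragraph above makes.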
Refining the boundaries based on the results of existing segmentation models with a post-processing scheme.
[SegFix] is a general refinement mechanism, which replaces the unreliable (coarse) predictions of boundary pixels with the predictions of interior pixels. The effectiveness of SegFix highly depends on the accuracy of boundary prediction. However, it is very challenging to directly estimate precise instance boundaries. Intuitively, the instance segmentation task could easily be settled if the precise boundaries are already given.
[PolyTransform] transforms the contour of an instance into a set of polygon vertices. A Transformer-based network is applied to predict the offsets of vertices towards object boundaries. It achieves superior performance but suffers from a large computational overhead due to the large instance patch and the heavy Transformer architecture.
[BPR] Correcting the erroneous pixels near object boundaries can improve the mask quality substantially.
[DeepStrip] proposes to convert the boundary regions into a strip image and compute a boundary prediction in the strip domain.
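The SegFix idea noted above, replacing unreliable boundary predictions with the predictions of interior pixels, can be sketched as follows. This is a minimal illustration, assuming the boundary map and the per-pixel offsets pointing to an interior pixel are already given (in practice they would have to be predicted); the function name `refine_with_offsets` and the `(dy, dx)` offset layout are hypothetical.

```python
import numpy as np

def refine_with_offsets(labels, boundary, offsets):
    """Replace each boundary pixel's label with the label found at (y + dy, x + dx)."""
    h, w = labels.shape
    refined = labels.copy()
    ys, xs = np.nonzero(boundary)
    # Assumption: offsets[..., 0] is dy and offsets[..., 1] is dx, in whole pixels.
    src_y = np.clip(ys + offsets[ys, xs, 0], 0, h - 1)
    src_x = np.clip(xs + offsets[ys, xs, 1], 0, w - 1)
    # Read from the original labels so replacements do not cascade.
    refined[ys, xs] = labels[src_y, src_x]
    return refined

# Coarse prediction in which column 2 is treated as an unreliable boundary column.
labels = np.array([[0, 0, 1, 1, 1],
                   [0, 0, 1, 1, 1]])
boundary = np.zeros(labels.shape, dtype=bool)
boundary[:, 2] = True
offsets = np.zeros(labels.shape + (2,), dtype=int)
offsets[:, 2] = (0, -2)  # each boundary pixel points two pixels to the left
refined = refine_with_offsets(labels, boundary, offsets)
```

The sketch makes the dependency stated above concrete: if `boundary` or `offsets` are wrong, the copied labels are wrong too.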
BMask RCNN:
Instance Refinement
HED [49], a fully convolutional holistically-nested edge detector, performs image-to-image prediction and is trained end to end.
"Holistically-nested edge detection".
CASENet [52] presents a challenging new task, semantic boundary detection, which aims to detect category-aware boundaries.
"Casenet: Deep category-aware semantic edge detection".
[1,53] investigate the label misalignment problem caused by noisy labels in semantic boundary detection.
"Devil is in the edges: learning semantic boundaries from noisy annotations".
"Simultaneous edge alignment and learning".
[50] proposes a geometry-aware loss function for object skeleton detection in natural images.
"Learning geometry-aware skeleton detection".
Semantic Refinement
Chen et al. [10] apply a fully connected conditional random field (CRF) [32] to capture spatial details and refine boundaries.
"Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs".
"Efficient inference in fully connected CRFs with gaussian edge potentials".
Recent semantic segmentation methods [3,8,23,45,51] leverage predicted boundaries or edges to facilitate semantic segmentation.
"Semantic segmentation with boundary neural fields".
"Semantic image segmentation with task-specific edge detection using CNNs and a discriminatively trained domain transform".
"Object boundary guided semantic segmentation".
"Gated-SCNN: gated shape CNNs for semantic segmentation".
"Learning a discriminative feature network for semantic segmentation".
[11,54] refine segmentation results with direction fields learned from predicted boundaries.
"Learning directional feature maps for cardiac MRI segmentation".
"SegFix: model-agnostic boundary refinement for segmentation".
Zimmermann et al. propose an edge agreement head [55] to focus on the boundaries of instances with an auxiliary edge loss.
"Faster training of mask R-CNN by focusing on instance boundaries".
[BMask R-CNN] explicitly predicts instance-level boundaries, from which we obtain instance shape information for better mask localization.
==> Compared to semantic segmentation, boundaries in instance segmentation have dual relations to the masks.
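One direction of the mask/boundary relation, deriving an edge map from a mask and penalizing disagreement between predicted and ground-truth edges, can be sketched as an auxiliary edge loss. This is a minimal illustration, assuming Sobel filters and a mean squared error, which may differ from the choices made in the cited papers; the names `filter3x3`, `edge_map`, and `edge_agreement_loss` are hypothetical.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def filter3x3(img, kernel):
    """3x3 cross-correlation with edge-replicated padding (output keeps the input size)."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * padded[i:i + h, j:j + w]
    return out

def edge_map(mask):
    """Gradient magnitude of a (soft or binary) mask."""
    gx = filter3x3(mask, SOBEL_X)
    gy = filter3x3(mask, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)

def edge_agreement_loss(pred_mask, gt_mask):
    """Mean squared difference between the edge maps of prediction and ground truth."""
    return float(np.mean((edge_map(pred_mask) - edge_map(gt_mask)) ** 2))

gt = np.zeros((8, 8))
gt[2:6, 2:6] = 1.0
shifted = np.zeros((8, 8))
shifted[2:6, 3:7] = 1.0  # same shape, moved one pixel to the right
loss_same = edge_agreement_loss(gt, gt)
loss_shifted = edge_agreement_loss(shifted, gt)
```

A loss of this kind is zero only when the two edge maps coincide, so it concentrates the training signal on boundary pixels rather than on the mask interior.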
References: