0) Overview:
Generally speaking, object segmentation methods can be divided into three categories:
Unsupervised methods, such as K-means [1], EM [2], FH segmentation [3], active contours [4], normalized cut [5], mean-shift clustering [6], MLSS [7], and SAS [8], operate without prior knowledge about the images.
Semi-supervised methods [9]–[12] let users label pixels as foreground or background through interactive segmentation. The user input indicates where the object is (location information), and the colour and texture information contained in the scribbles provides prior knowledge about what the object is.
Fully-supervised methods are mostly designed to detect and extract specific semantic object categories in images, and they require an accurately labelled training dataset. Deep learning-based semantic segmentation solutions have recently achieved significant improvements and attracted a lot of attention [13]–[17].
UNSUPERVISED SEGMENTATION
A typical unsupervised segmentation algorithm contains two parts:
feature extraction from the image’s pixels
division of the image into non-overlapping regions by pixel clustering, as in normalized cut [5], MLSS [7], and SAS [8].
For example:
In SAS [8], a segmentation framework based on bipartite graph partitioning is designed to aggregate multi-layer superpixels.
In MLSS [7], a semi-supervised learning strategy generates pairwise affinities from a sparse graph constructed over pixels and over-segmented regions. These pairwise affinities are then fed to spectral segmentation algorithms.
However, the performance of these methods may suffer from two principal drawbacks: they are sensitive to segmentation parameters such as the number of clusters, and the overall pipeline is complex and cannot be optimized jointly.
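The two-part pipeline described above (per-pixel feature extraction followed by pixel clustering) can be illustrated with a minimal sketch. It uses plain k-means on colour plus position features, which is only one simple choice and not the procedure of any cited method. The function names, the `spatial_weight` parameter, and the deterministic farthest-point initialization are assumptions made for this illustration.

```python
import numpy as np

def pixel_features(image, spatial_weight=0.5):
    """Step 1: one feature vector per pixel (colour channels + weighted y, x)."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Normalize coordinates to [0, 1] so they are comparable to colour values.
    coords = np.stack([ys / max(h - 1, 1), xs / max(w - 1, 1)], axis=-1)
    feats = np.concatenate([image.astype(float), spatial_weight * coords], axis=-1)
    return feats.reshape(h * w, -1)

def farthest_point_init(features, k):
    """Deterministic initialization (an assumption of this sketch):
    start at the first pixel, then repeatedly add the farthest remaining point."""
    centers = [features[0]]
    d2 = ((features - features[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        idx = int(d2.argmax())
        centers.append(features[idx])
        d2 = np.minimum(d2, ((features - features[idx]) ** 2).sum(axis=1))
    return np.array(centers, dtype=float)

def kmeans_labels(features, k, n_iter=20):
    """Step 2: cluster the feature vectors; returns one cluster index per pixel."""
    centers = farthest_point_init(features, k)
    for _ in range(n_iter):
        # Squared distance of every pixel to every center (fine for small images).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return labels

def segment(image, k):
    """Label map of shape (H, W); every pixel belongs to exactly one region."""
    h, w, _ = image.shape
    return kmeans_labels(pixel_features(image), k).reshape(h, w)

# Toy image: dark left half, bright right half.
image = np.zeros((8, 8, 3))
image[:, 4:] = 1.0
print(segment(image, k=2))
```

The number of clusters `k` has to be chosen by hand, and changing it changes the resulting regions, which is the parameter sensitivity noted above.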
W-Net [18], in turn, involves techniques such as designing complex loss functions.
Related Works:
Many unsupervised segmentation methods have been proposed, such as mean-shift (MS) [6], k-means [1], normalized cuts (NCuts) [5], Felzenszwalb and Huttenlocher’s graph-based method (FH) [3], SDTV [19], KM [20], UCM [21], CCP [22], MLSS [7], and SAS [8].
Mean-shift (MS) [6] builds a non-parametric probability distribution in a feature space and applies mean shift filtering in this domain to yield a convergence point for each pixel.
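The idea of moving each feature point to a convergence point can be sketched as follows. This is a simplified illustration with a Gaussian kernel, not the exact filtering procedure of [6]. The names `bandwidth` and `merge_radius` and the greedy grouping step are assumptions of this sketch.

```python
import numpy as np

def mean_shift_modes(points, bandwidth, n_iter=50, tol=1e-5):
    """Shift every point towards the local density maximum of the point set."""
    points = np.asarray(points, dtype=float)
    modes = points.copy()
    for _ in range(n_iter):
        # Gaussian kernel weights between current positions and the data.
        d2 = ((modes[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
        weights = np.exp(-0.5 * d2 / bandwidth ** 2)
        # Each position moves to the kernel-weighted mean of the data.
        shifted = weights @ points / weights.sum(axis=1, keepdims=True)
        converged = np.abs(shifted - modes).max() < tol
        modes = shifted
        if converged:
            break
    return modes

def group_modes(modes, merge_radius):
    """Give the same label to points whose convergence points nearly coincide."""
    labels = np.empty(len(modes), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for j, c in enumerate(centers):
            if np.linalg.norm(m - c) < merge_radius:
                labels[i] = j
                break
        else:  # no existing center is close enough: start a new group
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels

# Toy 1-D feature space with two well-separated groups of values.
points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
modes = mean_shift_modes(points, bandwidth=0.5)
print(group_modes(modes, merge_radius=0.25))
```

For an image, `points` would hold one feature vector per pixel (for example colour and position). The number of segments is not fixed in advance but follows from the bandwidth.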
Normalized cuts (NCuts) [5] focuses on minimizing the similarity between groups while maximizing the associations within groups.
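To make this trade-off concrete, the sketch below evaluates the commonly stated form of the criterion, Ncut(A, B) = cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V), and computes a two-way partition from the standard spectral relaxation (the second-smallest eigenvector of the normalized graph Laplacian). It assumes a symmetric, non-negative affinity matrix `W` with no isolated nodes. The function names and the thresholding of the eigenvector at zero are illustrative choices and are not taken from [5].

```python
import numpy as np

def ncut_value(W, mask):
    """Normalized cut of the partition (mask, ~mask) for affinity matrix W."""
    cut = W[mask][:, ~mask].sum()     # total affinity between the two groups
    assoc_a = W[mask].sum()           # affinity from group A to all nodes
    assoc_b = W[~mask].sum()          # affinity from group B to all nodes
    return cut / assoc_a + cut / assoc_b

def ncut_bipartition(W):
    """Approximate the minimum normalized cut by a spectral relaxation."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    y = d_inv_sqrt * vecs[:, 1]       # second-smallest eigenvector, rescaled
    return y > 0                      # split nodes by sign

# Two tightly connected triples of nodes joined by one weak edge.
W = np.zeros((6, 6))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                W[i, j] = 1.0
W[2, 3] = W[3, 2] = 0.1

mask = ncut_bipartition(W)
print(mask, ncut_value(W, mask))
```

In an image setting, the nodes would be pixels or superpixels and `W` would encode their pairwise similarity.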
Other methods can be divided into two categories: region-based and contour-based methods.
Region-based unsupervised segmentation methods focus on finding the similarity among neighboring pixels and merging them using features including color, texture, contour, or luminance.
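As a minimal illustration of merging similar neighboring pixels, the sketch below joins 4-connected neighbors whose intensity difference is below a threshold, using a union-find structure. It uses a single grayscale feature and a fixed `threshold`, both of which are simplifying assumptions. The region-based methods cited here use richer features and merging criteria.

```python
import numpy as np

def merge_similar_neighbors(gray, threshold):
    """Label map in which neighboring pixels with similar intensity share a region."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    parent = list(range(h * w))       # union-find forest over pixel indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as root so labels follow scan order.
            parent[max(ri, rj)] = min(ri, rj)

    for y in range(h):
        for x in range(w):
            i = y * w + x
            # Compare with the right and the lower neighbor.
            if x + 1 < w and abs(gray[y, x] - gray[y, x + 1]) < threshold:
                union(i, i + 1)
            if y + 1 < h and abs(gray[y, x] - gray[y + 1, x]) < threshold:
                union(i, i + w)

    roots = np.array([find(i) for i in range(h * w)])
    _, labels = np.unique(roots, return_inverse=True)  # relabel as 0..R-1
    return labels.reshape(h, w)

gray = np.array([[0.0, 0.0, 1.0, 1.0],
                 [0.0, 0.0, 1.0, 1.0]])
print(merge_similar_neighbors(gray, threshold=0.5))
```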
Superpixels are often used as important cues to aid segmentation. One typical work is MLSS [7], which proposes a multi-layer semi-supervised learning scheme to construct a dense affinity matrix over pixels and superpixels for spectral clustering.
Another notable work is SAS [8], a segmentation framework based on bipartite graph partitioning that is designed to aggregate multi-layer superpixels.
Contour-based methods focus on generating segmentation masks via contour cues.
In [21], image segmentation is cast as a contour detection problem. A contour detector using multiscale local brightness, color, and texture cues is proposed first. An Ultrametric Contour Map (UCM) is then constructed by generating a hierarchical region tree from the contours.
In CCP [22], a contour-guided color palette (CCP) is designed first. The result is then fine-tuned by post-processing techniques such as leakage avoidance, fake boundary removal, and small region merging to generate robust segmentation masks.
1) Dataset and Metrics:
Paper List:
2) Methods:
References: