0) Motivation, Objectives and Related works:
Objectives:
In this post, I will cover recent clustering techniques that leverage deep learning.
The goal of most of these techniques is to cluster the data points so that points belonging to the same ground-truth class are assigned to the same cluster.
Deep learning-based clustering techniques differ from traditional clustering in that they group data points by learning complex, non-linear patterns rather than relying on simple pre-defined metrics such as intra-cluster Euclidean distance (a minimal pipeline sketch follows below).
Non-parametric deep clustering: methods that apply deep clustering when the number of clusters is not known a priori and must be inferred.
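To make the distinction above concrete, here is a minimal sketch of the generic pipeline most deep clustering methods build on: learn a non-linear embedding (here with a small autoencoder) and then cluster in that embedding space. The architecture, training loop, and hyper-parameters are illustrative assumptions, not any specific paper's method.

```python
# Generic deep clustering sketch: embed, then cluster (illustrative only).
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 500), nn.ReLU(),
                                     nn.Linear(500, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 500), nn.ReLU(),
                                     nn.Linear(500, in_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def deep_cluster(x, n_clusters=10, epochs=50, lr=1e-3):
    """x: (N, 784) float tensor, e.g. flattened MNIST images scaled to [0, 1]."""
    model = AutoEncoder(in_dim=x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):                      # reconstruction pre-training (full batch for brevity)
        opt.zero_grad()
        recon, _ = model(x)
        loss = nn.functional.mse_loss(recon, x)
        loss.backward()
        opt.step()
    with torch.no_grad():                        # cluster in the learned latent space
        _, z = model(x)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(z.numpy())
```

Most of the methods discussed later refine this recipe, e.g., by training the embedding and the cluster assignments jointly instead of running k-means as a post-hoc step.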
1) Datasets and Metrics:
Datasets:
MNIST: Consists of 70,000 images of hand-written digits at 28 × 28 pixels. The digits are centered and size-normalized (LeCun, 1998).
[USPS] J. J. Hull. "A database for handwritten text recognition research." IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 16, No. 5, pp. 550-554, 1994.
[Fashion-MNIST] H. Xiao, K. Rasul, R. Vollgraf. "Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms." arXiv preprint arXiv:1708.07747, 2017.
[COIL20]: Contains 1,440 grayscale images (32 × 32) of 20 objects; for each object, 72 images were taken at 5-degree rotation intervals (Nene et al., 1996).
[STL10] A. Coates, A. Ng, and H. Lee. "An analysis of single-layer networks in unsupervised feature learning." In Geoffrey Gordon, David Dunson, and Miroslav Dudík, editors, AISTATS, 2011.
[Reuter10k] David D. Lewis, Yiming Yang, Tony Russell-Rose, and Fan Li. "RCV1: A new benchmark collection for text categorization research." JMLR, 2004.
[ImageNet] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. "ImageNet: A large-scale hierarchical image database." In CVPR, 2009.
COIL-20 http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php
CMU PIE http://www.cs.cmu.edu/afs/cs/project/PIE/MultiPie/Multi-Pie/Home.html
Yale-B http://vision.ucsd.edu/~leekc/ExtYaleDatabase/Yale%20Face%20Database.htm
MNIST http://yann.lecun.com/exdb/mnist/index.html
CIFAR http://www.cs.toronto.edu/~kriz/cifar.html
STL-10
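Several of these benchmarks can also be pulled programmatically, for example via torchvision (an assumed dependency; the root path below is an arbitrary choice):

```python
# Load a few of the benchmark datasets listed above via torchvision (assumed installed).
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=to_tensor)
fmnist = datasets.FashionMNIST(root="./data", train=True, download=True, transform=to_tensor)
stl10 = datasets.STL10(root="./data", split="unlabeled", download=True, transform=to_tensor)
print(len(mnist), mnist[0][0].shape)   # 60000 training images, torch.Size([1, 28, 28])
```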
Metrics:
Validation and Assessment:
Clustering Validation: Evaluate the goodness of the clustering
External - Supervised: Employ criteria not inherent to the dataset.
Internal - Unsupervised: Criteria are derived from the data itself.
Relative: Compare different clusterings, usually those obtained via different parameter settings for the same algorithm.
Clustering Stability: Understand the sensitivity of the clustering result to various algorithm parameters (e.g., to the number of clusters).
Clustering Tendency: Assess whether clustering is applicable at all (whether the data has any inherent grouping structure).
External - Supervised Metrics:
[ACC] Clustering accuracy - range [0, 1]; requires the best one-to-one matching between cluster ids and ground-truth classes (see the sketch after this list).
[NMI] Normalized mutual information - range [0, 1]
Strehl, A. and Ghosh, J. (2002). Cluster ensembles—a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, 3:583–617.
Vinh, N. X., Epps, J., and Bailey, J. (2010). Information-theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance. Journal of Machine Learning Research, 11:2837–2854.
Cai, D., He, X., and Han, J. (2011). Locally consistent concept factorization for document clustering. IEEE Transactions on Knowledge and Data Engineering, 23(6):902–913.
[ARI] Adjusted Rand Index - range [-1, 1]
Purity: fraction of points assigned to the majority ground-truth class of their cluster.
Maximum Matching: like purity, but each ground-truth class may be matched to at most one cluster (one-to-one matching).
F-Measure: harmonic mean of per-cluster precision and recall against the best-matching class.
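Below are hedged sketches of ACC, purity, NMI, and ARI, assuming scikit-learn and SciPy are available; the toy label arrays at the bottom are illustrative. ACC uses the Hungarian algorithm (linear_sum_assignment) to find the best one-to-one mapping between cluster ids and classes, while purity simply credits each cluster with its majority class.

```python
# External metric sketches (assumes scikit-learn and SciPy).
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    """ACC: best one-to-one match between cluster ids and classes (Hungarian algorithm)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = max(y_pred.max(), y_true.max()) + 1
    w = np.zeros((n, n), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1                                   # contingency table
    row, col = linear_sum_assignment(w.max() - w)      # maximize matched points
    return w[row, col].sum() / y_pred.size

def purity(y_true, y_pred):
    """Each cluster is credited with its most frequent ground-truth class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return sum(np.bincount(y_true[y_pred == c]).max()
               for c in np.unique(y_pred)) / y_pred.size

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 0, 0, 2, 2]                            # same partition, permuted ids
print(clustering_accuracy(y_true, y_pred))             # 1.0
print(purity(y_true, y_pred))                          # 1.0
print(normalized_mutual_info_score(y_true, y_pred))    # 1.0
print(adjusted_rand_score(y_true, y_pred))             # 1.0
```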
Internal - Unsupervised Metrics:
[Sil] Silhouette Score - range [-1, 1]
Relative:
[Sil] Silhouette Score - range [-1, 1]
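The silhouette score needs only the data and the cluster assignments, so it serves both as an internal criterion and for relative comparison across runs, e.g., when sweeping the number of clusters. A minimal sketch, assuming scikit-learn and using synthetic toy data:

```python
# Internal/relative validation sketch: sweep k and compare silhouette scores.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)        # toy data stand-in
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))                 # typically peaks near the true k=4
```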
2) Methods:
Similarity/Distance Calculation Methods:
A similarity measure quantifies how alike two data points are; it is the counterpart of a distance (dissimilarity) measure, with higher values meaning more similar.
AKA: Affinity Measure, Relatedness Function.
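As a quick illustration (scikit-learn assumed; the RBF bandwidth gamma below is an arbitrary example value), the same set of points can be compared with a distance, a cosine similarity, or a Gaussian affinity:

```python
# Common pairwise distance/similarity/affinity choices.
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances, cosine_similarity, rbf_kernel

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(euclidean_distances(X))     # dissimilarity: larger = less alike
print(cosine_similarity(X))       # similarity in [-1, 1], ignores magnitude
print(rbf_kernel(X, gamma=0.5))   # Gaussian affinity in (0, 1], larger = more alike
```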
The Measurement Indicators of Clustering:
External indicators are evaluated against reference information external to the data, such as ground-truth labels.
Internal indicators can be evaluated from the data itself, without any external reference.
References: