K-Modes

K-Modes

Labels / Tags

  • Clustering

Principle

K-Modes clustering algorithm is an unsupervised algorithm technique working on binary data ({0, 1}^d). It generate a list of k prototypes by iterating over input dataset, each time a new partitioning is generated associating points to their closest prototype. Then each prototype is updated by computing the mode of each cluster. Iterative process can be stopped in various way such as setting a maximal number of iteration, wait until prototypes shift is under a threshold, or by looking for WSSE variation.

Initial prototypes can be given or automatically set, for example they can be chosed randomly within input dataset or through others technique la K-Means++ ref_to_add_same_as_kmeans.

Scalability

Computational complexity is in O(n.k.iter.d).

Where :

  • n is the number of data points.
  • K is the number of prototypes.
  • iter is the number of iterations.
  • d is the dimensionality of the data.

Input

  • A collection of binary vector, i.e. {0, 1}^d vectors where $d$ is the dimmensionality of the data.

Parameters

  • A binary metric, by default Hamming.
  • Stopping criteria. Many strategies have been developped. Actually A, B, C are available.
  • K : prototype number intialization.
  • (Optionally) list of K prototypes.
  • IterMax : The maximum number of iterations.

Ouput

The KModesModel containing the K learned prototypes (final clusters mean) as binary vector.

Predictor

K-Modes algorithm returns a KModesModel describes above. In order to obtain the full HardClustering over input dataset or on another data a predictor is required. It is build with the KModesModel and allow to predict for one point to which cluster it belongs or for a collection of data it returns a whole HardClustering.

Associated visualization

Prototypes like.

Practical strategies

Many strategies are born around K-Modes algorithm and how to get the better of it.

Find the best $K$ value

TODO : Describe evolution of an index with K parameter.

Choose the right stopping criteria

How to stop the algorithm at the right moment is an important factor which can imply to balance between diminish execution time at acceptable level and get accurate enough results

Business case

Usage