Self-Organizing Map (SOM)

Labels / Tags

  • Clustering
  • Hard
  • Numerical vector

Principle

A self-organizing map (SOM) is a clustering algorithm that also projects the data onto a low-dimensional space, generally two-dimensional. The basic model proposed by Kohonen consists of a discrete set $C$ of cells called the map. The size of the grid $C$ is denoted by $k$ and must be provided a priori. A variety of self-organizing models have been derived from Kohonen's original model. They differ from each other but share the same idea: depict large data sets through simple geometric relationships on a reduced topology (1D or 2D). The grid defines a topological order over its $k$ cells, and each cell $c$ has its own cluster. The self-organizing process relies on a neighbourhood function to preserve the topological relationships between cells: whenever the prototypes are updated, the neighbourhood function determines how strongly each cell of the grid is affected.
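The process described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the implementation documented here: it assumes an online Kohonen-style update, a Gaussian neighbourhood, linear decay of the learning rate and radius, and prototypes initialized from random data points. The names `train_som`, `lr0` and `sigma0` are hypothetical.

```python
import math
import random

def train_som(data, h, w, iters=20, lr0=0.5, sigma0=None, seed=0):
    """Online Kohonen-style training; returns a list of K = h * w prototypes."""
    rng = random.Random(seed)
    k = h * w
    # Grid coordinates of each cell, row-major: cell c sits at (c // w, c % w).
    coords = [(c // w, c % w) for c in range(k)]
    # Assumption: prototypes start as randomly chosen data points.
    protos = [list(rng.choice(data)) for _ in range(k)]
    sigma0 = sigma0 if sigma0 is not None else max(h, w) / 2.0
    for t in range(iters):
        # Assumption: learning rate and radius both decay linearly with t.
        frac = 1.0 - t / iters
        lr = lr0 * frac
        sigma = max(sigma0 * frac, 0.5)
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            # Best matching unit: closest prototype (squared Euclidean distance).
            bmu = min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(x, protos[c])))
            for c in range(k):
                # Gaussian neighbourhood weight from the grid distance to the BMU.
                g2 = ((coords[c][0] - coords[bmu][0]) ** 2
                      + (coords[c][1] - coords[bmu][1]) ** 2)
                weight = math.exp(-g2 / (2.0 * sigma ** 2))
                # Pull prototype c towards x, more strongly when close to the BMU.
                protos[c] = [p + lr * weight * (a - p) for p, a in zip(protos[c], x)]
    return protos
```

Calling `train_som(data, h=2, w=3)` on a list of equal-length numerical vectors returns six prototypes in row-major grid order. Because every cell is updated, not only the best matching unit, neighbouring cells end up with similar prototypes, which is what preserves the topology.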

Scalability

Computational complexity is O(n.k.iter.d), where:

  • n is the number of data points.
  • k is the number of prototypes.
  • iter is the number of iterations.
  • d is the dimensionality of the data.
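To get a feel for this bound, the sketch below counts coordinate-level operations under the assumption that each iteration compares every data point with every prototype. The function name `som_cost` is hypothetical, and the result is an order of magnitude, not a measured running time.

```python
def som_cost(n, k, iters, d):
    """Rough count of coordinate-level operations for SOM training."""
    per_point = k * d              # compare one point with k prototypes of dimension d
    per_iteration = n * per_point  # one pass over the n data points
    return iters * per_iteration

# Example: 10,000 points, a 10 x 10 grid, 50 iterations, 20 dimensions.
print(som_cost(n=10_000, k=100, iters=50, d=20))  # 1000000000
```

The cost grows linearly in each factor, so doubling the grid size k doubles the work for the same data.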

Input

  • A collection of numerical vectors (in R^d).

Parameters

  • A continuous metric, Euclidean by default.
  • Stopping criteria. Many strategies have been developed; currently A, B, C are available.
  • Initialization of the K prototypes, where K is also denoted h.w.
  • (Optionally) a list of K prototypes.
  • Maximum number of iterations.
  • Neighbourhood function.
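As an illustration of the neighbourhood function parameter, here are two kernels commonly used with SOMs, a Gaussian and a "bubble" (hard cut-off). This is a sketch under stated assumptions: cells are numbered row-major on a grid of width w, and the function names are hypothetical, not part of the documented API.

```python
import math

def grid_distance_sq(c1, c2, w):
    """Squared distance between two cells on a grid of width w (row-major ids)."""
    r1, col1 = divmod(c1, w)
    r2, col2 = divmod(c2, w)
    return (r1 - r2) ** 2 + (col1 - col2) ** 2

def gaussian_neighbourhood(c, bmu, w, sigma):
    """Weight in (0, 1] that decays smoothly with the grid distance to the BMU."""
    return math.exp(-grid_distance_sq(c, bmu, w) / (2.0 * sigma ** 2))

def bubble_neighbourhood(c, bmu, w, radius):
    """Weight 1.0 inside the given grid radius around the BMU, 0.0 outside."""
    return 1.0 if grid_distance_sq(c, bmu, w) <= radius ** 2 else 0.0
```

In both cases the weight depends only on positions in the grid, not on distances in the data space. Shrinking `sigma` or `radius` over the iterations moves the map from coarse global ordering to local fine-tuning.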

Output

SOMModel ref_to_SOMModel_type

SOMModel contains the grid of K = w.h prototypes.

Predictor

If we ignore the grid structure linking the SOMModel prototypes and treat them as a plain collection of prototypes, ClosestPrototypePredictor [ref_to_ClosestPrototype] is a good starting point: for a new data point, it assigns the ClusterId of the closest SOMModel prototype (a numerical vector) with respect to the dissimilarity measure used.
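The closest-prototype rule can be illustrated as follows. This sketch does not reproduce the actual ClosestPrototypePredictor interface: `closest_prototype` and `euclidean` are hypothetical helpers, and the index of the prototype in the list is assumed to play the role of the ClusterId.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two numerical vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def closest_prototype(prototypes, x, dissimilarity=euclidean):
    """Return the index (used as ClusterId) of the prototype closest to x."""
    return min(range(len(prototypes)),
               key=lambda c: dissimilarity(x, prototypes[c]))

prototypes = [[0.0, 0.0], [5.0, 5.0]]
print(closest_prototype(prototypes, [4.0, 4.5]))  # 1
```

Any other dissimilarity measure can be passed in place of `euclidean`, as long as it takes two vectors and returns a number that is smaller for more similar vectors.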

Associated visualization

  • SOM-like
  • 2D/3D numerical vector
  • R^n numerical vector

Practical strategies

Business case

Usage

Tools for visualization