Epsilon Proximity

Labels / Tags

  • Clustering

Principle

Epsilon Proximity is a clustering algorithm working on any space as long as a metric associated to input data type is given. It consists to gather every points closer than $\epsilon$ each others pair to pair into same cluster.

It can also be seen as a graph where vertices are points and edges between two vertices exists if and only if distance between those two points is under $\epsilon$. Thus there will be as many connected component than clusters.

Scalability

Computational complexity is in $O(n^2)$.

Where :

  • n is the number of data points.

Input

  • A collection of data representation (R) of any nature.

Parameters

  • A metric on R.
  • $\epsilon$ : the threshold under which two points are considered into same cluster.

Output

As with DBSCAN ref_todo_ algorithm, $\epsilon$-Proximity return a Clusters ref_to_Clusters_type model i.e. a collection of Cluster ref_to_Cluster_type. For whose unfamaliar with HephIA concepts, a Clusters model has a sibling which is the HardClustering model. Both have cluster’s affection for each data observation and differ in their data structure.

Predictor

$\epsilon$-Proximity algorithm output beeing a Clusters it already contains affectation for each observation because they are gather into their respective cluster.

Some predictor can directly or through a conversion be used with Clusters model such has ClosestPrototypePredictor or KNNHardClusteringPredictor.

Associated visualization

  • Clusters like.
  • HardClustering like.
  • Prototypes like.

Practical strategies

Business case

Usage