Epsilon Proximity
Labels / Tags
- Clustering
Principle
Epsilon Proximity is a clustering algorithm working on any space as long as a metric associated to input data type is given. It consists to gather every points closer than $\epsilon$ each others pair to pair into same cluster.
It can also be seen as a graph where vertices are points and edges between two vertices exists if and only if distance between those two points is under $\epsilon$. Thus there will be as many connected component than clusters.
Scalability
Computational complexity is in $O(n^2)$.
Where :
- n is the number of data points.
Input
- A collection of data representation (
R) of any nature.
Parameters
- A metric on
R. - $\epsilon$ : the threshold under which two points are considered into same cluster.
Output
As with DBSCAN ref_todo_ algorithm, $\epsilon$-Proximity return a Clusters ref_to_Clusters_type model i.e. a collection of Cluster ref_to_Cluster_type. For whose unfamaliar with HephIA concepts, a Clusters model has a sibling which is the HardClustering model. Both have cluster’s affection for each data observation and differ in their data structure.
Predictor
$\epsilon$-Proximity algorithm output beeing a Clusters it already contains affectation for each observation because they are gather into their respective cluster.
Some predictor can directly or through a conversion be used with Clusters model such has ClosestPrototypePredictor or KNNHardClusteringPredictor.
Associated visualization
- Clusters like.
- HardClustering like.
- Prototypes like.
Practical strategies
Recommended association
Business case
Usage