Local Area


labels/tags

Principle

LocalArea is an iterative clustering algorithm. At each iteration it picks a prototype from the data not yet clustered, either at random or with any already implemented picking strategy. Once the prototype is chosen, all remaining observations within a user-defined area around it are assigned to its cluster. The area can be an $\epsilon$-area, a $KNN$-area, or any other implemented area definition. The default implementation is quadratic, although linear versions exist (some use hashing techniques). An advantage of this algorithm is that it finds the number of clusters automatically.
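The loop described above can be sketched as follows. This is a minimal illustration that assumes an $\epsilon$-area and a uniformly random picking strategy; the names `local_area_epsilon`, `euclidean` and `points` are hypothetical and are not the library's API.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

R = TypeVar("R")  # representation type of one observation


def local_area_epsilon(
    data: Sequence[R],
    dissimilarity: Callable[[R, R], float],
    epsilon: float,
    seed: int = 0,
) -> List[Tuple[int, List[int]]]:
    """Return a list of (prototype index, member indices) pairs.

    Assumption: the area is an epsilon-ball around the prototype and the
    prototype is picked uniformly at random among unclustered points.
    """
    rng = random.Random(seed)
    remaining = list(range(len(data)))
    clusters: List[Tuple[int, List[int]]] = []
    while remaining:
        # Pick a prototype among the observations not yet clustered.
        proto = rng.choice(remaining)
        # Assign every remaining observation inside the area to this cluster.
        members = [
            i for i in remaining
            if dissimilarity(data[proto], data[i]) <= epsilon
        ]
        clusters.append((proto, members))
        assigned = set(members)
        remaining = [i for i in remaining if i not in assigned]
    return clusters


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two well-separated groups of points: the number of clusters is not given.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
clusters = local_area_epsilon(points, euclidean, epsilon=1.0)
```

With this toy data the two groups are recovered whatever prototype is drawn first, because every point of a group lies within `epsilon` of every other point of the same group.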

Scalability

Computational complexity is $O(n^2)$, where:

  • n is the number of data points.
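One way to see where the quadratic bound comes from, assuming each iteration compares the chosen prototype with every observation still unclustered: in the worst case every cluster contains only its prototype, so the iteration with $k$ observations remaining costs $k$ dissimilarity evaluations, and the total is

$$\sum_{k=1}^{n} k = \frac{n(n+1)}{2} = O(n^2).$$

Under the same assumption, the best case is a single area covering all the data, which costs only $n$ evaluations.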

Input

General idea

This clustering algorithm depends essentially on a dissimilarity measure defined over the representation type of the data it handles. It can therefore handle any data type, provided an associated dissimilarity measure is supplied. The input consists of:

  • A collection of data of any type, let’s define it as R (Representation).
  • A dissimilarity measure on R.

Any custom representation R can be added if necessary.

Available versions

Numerical vector collection

See standard input data for continuous data.

Binary vector collection

Categorical vector collection

Monovariate temporal series collection

Multivariate temporal series collection

Hyperparameters

  • Metric suited to the data type:
    • Continuous :
      • Euclidean
      • Minkowski
    • Binary :
      • Hamming
      • MeanMahanttan
      • Jaccard
      • Vari
      • PatternDifference
      • ShapeDifference
      • SizeDifference
    • Mixed (Binary & Continuous)
    • Monovariate Temporal Series
    • Multivariate Temporal Series
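As an illustration of what a metric on binary vectors looks like, here is a minimal sketch of two entries from the list above, Hamming and Jaccard, using their standard textbook definitions. The function names are hypothetical, and the library's own implementations may normalize differently.

```python
from typing import Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where the two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - |intersection| / |union| of the positions set to 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Convention assumed here: two all-zero vectors are at distance 0.
    return 0.0 if either == 0 else 1.0 - both / either


u = [1, 0, 1, 1]
v = [1, 1, 0, 1]
d_hamming = hamming(u, v)   # positions 1 and 2 differ
d_jaccard = jaccard(u, v)   # 2 shared ones out of 4 positions set in either
```

Any function with this shape (two observations in, one non-negative number out) can play the role of the dissimilarity measure on R.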

Output format

A model containing the $K$ clusters with their associated prototypes.
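To make the shape of such a model concrete, here is a hypothetical sketch: a mapping from cluster identifier to prototype, plus the dissimilarity used to build it. The class `LocalAreaModel` and its `predict` method are assumptions for illustration only. In particular, assigning a new observation to its nearest prototype is one plausible use of the model, not necessarily what the library does.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, Tuple, TypeVar

R = TypeVar("R")  # representation type of one observation


@dataclass
class LocalAreaModel(Generic[R]):
    # Cluster identifier -> prototype of that cluster (K entries).
    prototypes: Dict[int, R]
    dissimilarity: Callable[[R, R], float]

    def predict(self, x: R) -> int:
        """Return the identifier of the cluster with the nearest prototype."""
        return min(
            self.prototypes,
            key=lambda k: self.dissimilarity(self.prototypes[k], x),
        )


def manhattan(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


model = LocalAreaModel(
    prototypes={0: (0.0, 0.0), 1: (5.0, 5.0)},
    dissimilarity=manhattan,
)
label = model.predict((4.0, 4.5))
```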

Associated visualization

Prototypes like. Clusters like.

Business case

Usage