Local Area
labels/tags
- Tagged according to tag list
- clustering
Principle
The LocalArea clustering algorithm is iterative. At each step it picks a prototype from the data not yet clustered, either at random or with any already implemented picking strategy. Once the prototype is chosen, all observations within a user-defined area around it are assigned to its cluster. The area can be an $\epsilon$-area, a $KNN$-area, or any other implemented area definition. The default implementation is quadratic, although linear versions exist (some use hashing techniques). An advantage of this algorithm is that it finds the number of clusters automatically.
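The loop described above can be illustrated with a minimal sketch. It assumes numerical vectors, the Euclidean distance, a random picking strategy and an $\epsilon$-area; the function name `local_area_clustering` and its parameters are hypothetical and are not taken from the library.

```python
import math
import random


def local_area_clustering(points, epsilon, seed=0):
    """Return {prototype index: [member indices]} using an epsilon-area rule.

    Illustrative sketch only: names and signature are assumptions.
    """
    rng = random.Random(seed)
    unclustered = set(range(len(points)))
    clusters = {}
    while unclustered:
        # Random picking strategy: draw the prototype among unclustered points.
        prototype = rng.choice(sorted(unclustered))
        # Epsilon-area: every unclustered point within `epsilon` of the prototype.
        # The prototype itself is always included because its distance is 0.
        members = [
            i for i in sorted(unclustered)
            if math.dist(points[prototype], points[i]) <= epsilon
        ]
        clusters[prototype] = members
        unclustered.difference_update(members)
    return clusters


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
clusters = local_area_clustering(points, epsilon=1.0)
```

The number of clusters is never passed in: it falls out of how many prototypes are needed to cover the data with the chosen area.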
Scalability
Computational complexity is $O(n^2)$, where:
- n is the number of data points.
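A rough justification of the quadratic bound, assuming each round compares the chosen prototype with every other point still unclustered and that, in the worst case, a round removes only the prototype itself: round $k$ then costs $n - k$ dissimilarity evaluations, so the total is

$$\sum_{k=1}^{n} (n - k) = \frac{n(n-1)}{2} = O(n^2)$$

When areas capture many points per round, far fewer rounds are needed and the practical cost is lower.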
Input
General idea
This clustering algorithm depends essentially on a metric over the representation type of the data it deals with. It can therefore handle any data type, provided an associated dissimilarity measure is supplied for it.
- A collection of data of any type, let’s define it as R (Representation).
- A dissimilarity measure on R.
Any custom representation R can be added if necessary.
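To illustrate how a custom representation R plugs in, here is a hedged sketch in which the clustering routine only sees the data through a user-supplied dissimilarity function. The names `cluster_with_dissimilarity` and `set_dissimilarity` are hypothetical, and picking prototypes in input order is an assumption made to keep the example deterministic.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

R = TypeVar("R")  # any representation type


def cluster_with_dissimilarity(
    data: Sequence[R],
    dissimilarity: Callable[[R, R], float],
    epsilon: float,
) -> Dict[int, List[int]]:
    """Epsilon-area clustering driven only by `dissimilarity`.

    Assumes dissimilarity(x, x) == 0 and epsilon >= 0, so that each
    prototype always belongs to its own area.
    """
    unclustered = list(range(len(data)))
    clusters: Dict[int, List[int]] = {}
    while unclustered:
        prototype = unclustered[0]  # input-order picking (assumption)
        members = [
            i for i in unclustered
            if dissimilarity(data[prototype], data[i]) <= epsilon
        ]
        clusters[prototype] = members
        unclustered = [i for i in unclustered if i not in members]
    return clusters


def set_dissimilarity(a: frozenset, b: frozenset) -> float:
    """Hypothetical custom measure: size of the symmetric difference."""
    return float(len(a ^ b))


baskets = [frozenset("abc"), frozenset("abd"), frozenset("xyz"), frozenset("xy")]
result = cluster_with_dissimilarity(baskets, set_dissimilarity, epsilon=2.0)
```

Here R is `frozenset`; swapping in another type only requires another dissimilarity function with the same two-argument shape.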
Available version
Numerical vector collection
see standard input data for continuous data
Binary vector collection
Categorical vector collection
Monovariate temporal series collection
Multivariate temporal series collection
Hyperparameters
- Metric fitting with data type.
  - Continuous :
    - Euclidean
    - Minkowski
  - Binary :
    - Hamming
    - MeanMahanttan
    - Jaccard
    - Vari
    - PatternDifference
    - ShapeDifference
    - SizeDifference
  - Mixed (Binary & Continuous)
  - Monovariate Temporal Series
  - Multivariate Temporal Series
- Continuous :
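For reference, a few of the metrics listed above can be written out using their textbook definitions. This is a sketch under the assumption that the library follows the usual formulas; its actual implementations may differ, for instance in normalisation, and the function names below are illustrative only.

```python
import math


def euclidean(x, y):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def minkowski(x, y, p):
    """Generalises Euclidean (p=2) and Manhattan (p=1) distances."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def hamming(x, y):
    """Number of positions where two binary vectors differ."""
    return sum(a != b for a, b in zip(x, y))


def jaccard(x, y):
    """1 - |intersection| / |union| of the positions set to 1."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Two all-zero vectors are treated as identical (a convention chosen here).
    return 0.0 if either == 0 else 1.0 - both / either
```

Any of these can serve as the dissimilarity measure, as long as it matches the data type of the collection being clustered.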
Output format
A model containing the $K$ clusters with their associated prototypes.
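One possible shape for such a model is sketched below. The class `LocalAreaModel`, its fields, and the nearest-prototype rule for new points are all assumptions made for illustration; the source does not specify the model's actual structure.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


@dataclass
class LocalAreaModel:
    """Hypothetical output container: cluster id -> prototype and members."""
    prototypes: Dict[int, Vector]
    members: Dict[int, List[Vector]]

    @property
    def k(self) -> int:
        # K is the number of clusters found, not a user input.
        return len(self.prototypes)

    def closest_cluster(self, x: Vector) -> int:
        # Assumption: a new point goes to the cluster with the nearest prototype.
        return min(self.prototypes, key=lambda c: math.dist(self.prototypes[c], x))


model = LocalAreaModel(
    prototypes={0: (0.0, 0.0), 1: (5.0, 5.0)},
    members={0: [(0.0, 0.0), (0.1, 0.0)], 1: [(5.0, 5.0), (5.1, 5.0)]},
)
```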
Associated visualization
Prototypes like. Clusters like.
Business case
Usage