Ball Hall
Labels / Tags
- Internal quality index
- NumericalEvaluation
Principle
The Ball Hall index is the average, over all clusters, of their mean dispersion, where the mean dispersion of a cluster is the average distance between its points and the prototype of that cluster.
The smaller the score, the more compact the clusters are.
It works on any kind of representation R as long as a dissimilarity measure is associated with it.
$ S = \frac{1}{K}\sum_{k=1}^K \frac{1}{n_k}\sum_{i\in{I_k}} D(x_i, p_k) $
Where :
- $K$ is the number of clusters.
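- $n_k$ is the number of elements in cluster $k$.
- $I_k$ is the set of indices of the elements of cluster $k$.
- $x_i$ is the $i$-th element of the data set.
- $p_k$ is the prototype of cluster $k$.
- $D(., .)$ is the dissimilarity measure.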
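The formula above can be sketched in a few lines of Python. This is only an illustration, not the library's implementation: the names `ball_hall` and `mean_vector` are hypothetical, the prototype $p_k$ is assumed to be obtained by applying the aggregator to the members of cluster $k$, and empty clusters are assumed to be skipped.

```python
from math import dist
from typing import Callable, Sequence, TypeVar

R = TypeVar("R")


def ball_hall(
    clusters: Sequence[Sequence[R]],
    metric: Callable[[R, R], float],
    aggregator: Callable[[Sequence[R]], R],
) -> float:
    """Average, over non-empty clusters, of the mean distance to the prototype."""
    dispersions = []
    for members in clusters:
        if not members:
            continue  # assumption: empty clusters do not contribute
        prototype = aggregator(members)  # p_k (assumed to come from the aggregator)
        total = sum(metric(x, prototype) for x in members)
        dispersions.append(total / len(members))  # (1 / n_k) * sum of D(x_i, p_k)
    if not dispersions:
        raise ValueError("at least one non-empty cluster is required")
    return sum(dispersions) / len(dispersions)  # (1 / K) * sum over clusters


def mean_vector(points):
    """Component-wise mean of equally sized numerical vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


clusters = [
    [(0.0, 0.0), (2.0, 0.0)],  # prototype (1, 0), mean dispersion 1
    [(0.0, 3.0), (0.0, 7.0)],  # prototype (0, 5), mean dispersion 2
]
score = ball_hall(clusters, dist, mean_vector)
print(score)  # 1.5
```

Each element is visited once after the prototypes are known, so the sketch performs one dissimilarity evaluation per element.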
Scalability
Computational complexity is in $O(n)$, where $n$ is the total number of elements.
Input
A collection of values of the same type, from the usual ones to any other, as long as an associated dissimilarity measure exists.
Currently supported types are :
- Numerical vector
- Binary vector
- Mixed Vector
- Monovariate time series
- Multivariate time series
Parameters
1 : metric, a dissimilarity measure for the given input type.
Let R be the type of the data values. The given metric parameter must then be of type :
- $D$ : (R, R) => Numerical value
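As an illustration of the (R, R) => Numerical value shape, here are two possible metrics, one for numerical vectors and one for binary vectors. The function names `euclidean` and `hamming` are chosen for this sketch and are not taken from the library; any function with the same shape would do.

```python
from math import sqrt
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two numerical vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a: Sequence[int], b: Sequence[int]) -> float:
    """Number of positions at which two binary vectors of equal length differ."""
    return float(sum(x != y for x, y in zip(a, b)))


d_num = euclidean((0.0, 0.0), (3.0, 4.0))    # 5.0
d_bin = hamming((1, 0, 1, 1), (1, 1, 0, 1))  # 2.0
```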
2 : aggregator, an aggregating function for a collection of values.
It must take the following form : [R] => R, where [R] is a collection of values of type R.
Many aggregators exist, covering a wide range of data value types.
The following subsections list some of them, organized by data value type.
Numerical value
- mean
- median
- mode
- medoid
Binary value
- mode
- Threshold vote
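The aggregators listed above could look like the following sketch. These definitions are assumptions made for illustration, not the library's own: mean and median are taken component-wise, mode returns the most frequent whole value, medoid returns the member with the smallest total dissimilarity to the others, and threshold vote sets a bit to 1 when the fraction of ones at that position reaches a hypothetical `threshold` parameter.

```python
from collections import Counter
from math import dist
from statistics import median


def mean(points):
    """Component-wise mean of numerical vectors."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def componentwise_median(points):
    """Component-wise median of numerical vectors."""
    return tuple(float(median(c)) for c in zip(*points))


def mode(points):
    """Most frequent whole value; ties go to the first one encountered."""
    return Counter(points).most_common(1)[0][0]


def medoid(points, metric):
    """Member of the collection with the smallest summed dissimilarity to all members."""
    return min(points, key=lambda c: sum(metric(c, x) for x in points))


def threshold_vote(points, threshold=0.5):
    """Bit-wise vote: 1 where the share of ones is at least `threshold`."""
    return tuple(int(sum(c) / len(points) >= threshold) for c in zip(*points))


nums = [(1.0, 2.0), (3.0, 4.0), (5.0, 12.0)]
bits = [(1, 0, 1), (1, 1, 0), (1, 0, 1), (0, 0, 1)]

num_mean = mean(nums)                  # (3.0, 6.0)
num_median = componentwise_median(nums)  # (3.0, 4.0)
num_medoid = medoid(nums, dist)        # (3.0, 4.0)
bit_mode = mode(bits)                  # (1, 0, 1)
bit_vote = threshold_vote(bits)        # (1, 0, 1)
```

Since `medoid` needs a metric, it has to be bound first to match the [R] => R form, for example with `lambda pts: medoid(pts, dist)`.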
Output
A NumericalEvaluation, which here is a single numerical value.
Associated visualization
Any visualization that handles a collection of numerical values :
- Bar chart / Histogram
Business case
Usage