Davies Bouldin
Labels / Tags
- Tagged according to tag list
- quality index
Principle
The Davies–Bouldin index (DBI), introduced by David L. Davies and Donald W. Bouldin in 1979, is a metric for evaluating clustering algorithms. It is an internal evaluation scheme: the quality of the clustering is assessed using only quantities and features inherent to the dataset, without external reference labels. The drawback is that a good value reported by this method does not necessarily imply the best information retrieval.
The smaller the score, the more compact and better separated the clusters are.
It works on any kind of representation R as long as a dissimilarity measure is associated with it.
$\delta_k = \frac{1}{n_k}\sum_{i \in I_k} D(x_i, p_k)$
$\Delta_{kk'} = D(p_k, p_{k'})$
$S = \frac{1}{K}\sum_{k=1}^K \max_{k' \neq k}\frac{\delta_k + \delta_{k'}}{\Delta_{kk'}}$
Where:
- $K$ is the number of clusters.
- $I_k$ is the set of elements of cluster $k$, and $n_k$ is its size.
- $p_k$ is the barycenter of cluster $k$.
- $\delta_k$ is the mean dissimilarity between the elements $x_i$ of cluster $k$ and its barycenter $p_k$.
- $\Delta_{kk'}$ is the distance between the barycenters $p_k$ and $p_{k'}$ of clusters $k$ and $k'$.
- $D(., .)$ is the dissimilarity measure.
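The formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `davies_bouldin` and `mean_point` are hypothetical, the default metric is assumed to be the Euclidean distance on numerical vectors, and the default aggregator is assumed to be the component-wise mean.

```python
from math import dist          # Euclidean distance, used here as the assumed default D
from statistics import fmean


def mean_point(points):
    """Component-wise mean of equal-length numerical vectors (assumed default aggregator)."""
    return tuple(fmean(column) for column in zip(*points))


def davies_bouldin(points, labels, metric=dist, aggregator=mean_point):
    """Compute S from the formulas above for elements `points` with cluster `labels`."""
    # Group the elements by cluster label: this builds the sets I_k.
    clusters = {}
    for x, k in zip(points, labels):
        clusters.setdefault(k, []).append(x)
    if len(clusters) < 2:
        raise ValueError("the index needs at least two clusters")

    # p_k: barycenter (prototype) of each cluster.
    protos = {k: aggregator(xs) for k, xs in clusters.items()}
    # delta_k: mean dissimilarity between the elements of cluster k and p_k.
    delta = {k: fmean(metric(x, protos[k]) for x in xs) for k, xs in clusters.items()}

    # S: average over k of the worst ratio (delta_k + delta_k') / Delta_kk'.
    # Note: two clusters sharing the same barycenter would cause a division by zero.
    total = 0.0
    for k in clusters:
        total += max(
            (delta[k] + delta[j]) / metric(protos[k], protos[j])
            for j in clusters
            if j != k
        )
    return total / len(clusters)


# Toy data: two tight clusters of two points each, 10 units apart.
toy_points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
toy_labels = [0, 0, 1, 1]
print(davies_bouldin(toy_points, toy_labels))  # 0.2
```

In the toy data, each cluster has $\delta_k = 1$ and the barycenters are at distance 10, so both ratios equal $2/10$ and their average is 0.2.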
Scalability
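Computational complexity is in $O(n)$ dissimilarity evaluations for a fixed number of clusters $K$; comparing every pair of barycenters adds an $O(K^2)$ term.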
Memory complexity is in $O(n)$.
Input
A collection of values of the same type, from the usual ones to any other, as long as an associated dissimilarity measure exists.
Currently supported types are:
- Numerical vector
- Binary vector
- Mixed vector
- Monovariate time series
- Multivariate time series
Parameters
TODO: copy and paste from Ball Hall
- metric: a dissimilarity measure on R
- aggregator: an aggregating lambda [R] => R (mean, mode, medoid, median, …)
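To illustrate the aggregator parameter, here is a hedged sketch of a medoid aggregator, which only needs the dissimilarity measure and therefore applies to representations where a mean is not defined. The names `medoid` and `hamming` are hypothetical and chosen for this example only; binary vectors with the Hamming distance are assumed as the representation R.

```python
from functools import partial


def hamming(u, v):
    """Number of positions where two equal-length binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def medoid(elements, metric):
    """Return the element whose summed dissimilarity to all elements is smallest.

    This costs a quadratic number of metric evaluations in the cluster size.
    """
    return min(elements, key=lambda c: sum(metric(c, x) for x in elements))


# Bind the metric so the aggregator has the expected shape [R] => R.
hamming_medoid = partial(medoid, metric=hamming)

cluster = [(0, 0, 1), (0, 1, 1), (1, 1, 1)]
print(hamming_medoid(cluster))  # (0, 1, 1)
```

The middle vector is selected because its summed Hamming distance to the cluster (2) is lower than that of the other two candidates (3 each).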
Output format
A score as a real value.
Associated visualization
Any visualization dealing with a collection of numerical values:
- Bar chart / Histogram
Business case
Usage