Labels / Tags

  • Tagged according to tag list
  • quality index

Principle

The Davies–Bouldin index (DBI), introduced by David L. Davies and Donald W. Bouldin in 1979, is a metric for evaluating clustering algorithms. It is an internal evaluation scheme: the quality of the clustering is assessed using only quantities and features inherent to the dataset. The drawback is that a good value reported by this method does not necessarily imply the best information retrieval.

The smaller the score, the more compact and better separated the clusters are.

It works on any kind of representation R as long as a dissimilarity measure is associated with it.

$\delta_k = \frac{1}{n_k}\sum_{i\in{I_k}} D(x_i, p_k)$

$\Delta_{kk'} = D(p_k, p_{k'})$

$S = \frac{1}{K}\sum_{k=1}^K \max_{k' \neq k}\frac{\delta_k + \delta_{k'}}{\Delta_{kk'}}$

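Where:

  • $K$ is the number of clusters.
  • $I_k$ is the set of elements of cluster $k$, and $n_k$ is the number of elements in it.
  • $p_k$ is the barycenter of cluster $k$.
  • $\delta_k$ is the mean dissimilarity between the elements of cluster $k$ and its barycenter $p_k$.
  • $\Delta_{kk'}$ is the distance between the barycenters $p_k$ and $p_{k'}$ of clusters $k$ and $k'$.
  • $D(., .)$ is the dissimilarity measure.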
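
The three formulas above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation behind this page: the names `davies_bouldin` and `mean_point` are hypothetical, points are assumed to be numerical vectors stored as tuples, and the Euclidean distance stands in for $D$.

```python
from math import dist  # Euclidean distance, available from Python 3.8


def mean_point(points):
    """Component-wise mean (barycenter) of a list of equal-length vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def davies_bouldin(clusters, metric=dist, aggregator=mean_point):
    """Score S for a list of clusters, each given as a list of points.

    Hypothetical helper: assumes at least two non-empty clusters whose
    prototypes are pairwise distinct (otherwise Delta_kk' is zero).
    """
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    # p_k: one prototype per cluster
    prototypes = [aggregator(c) for c in clusters]
    # delta_k: mean dissimilarity between the members and their prototype
    deltas = [
        sum(metric(x, p) for x in c) / len(c)
        for c, p in zip(clusters, prototypes)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # largest ratio (delta_k + delta_k') / Delta_kk' over the other clusters
        total += max(
            (deltas[i] + deltas[j]) / metric(prototypes[i], prototypes[j])
            for j in range(k)
            if j != i
        )
    return total / k


clusters = [
    [(0.0, 0.0), (0.0, 2.0)],  # barycenter (0, 1), delta = 1
    [(4.0, 0.0), (4.0, 2.0)],  # barycenter (4, 1), delta = 1
]
score = davies_bouldin(clusters)
print(score)  # 0.5, i.e. (1 + 1) / 4 for both clusters
```

In this toy case both clusters have $\delta_k = 1$ and the barycenters are 4 apart, so each ratio is $2/4$ and the score is $0.5$.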

Scalability

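Computational complexity is in $O(n + K^2)$ dissimilarity evaluations, that is $O(n)$ when the number of clusters $K$ is small compared to $n$ and the aggregator itself runs in linear time.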

Memory complexity is in $O(n)$.

Input

A collection of values of the same type, from the usual ones to any other type, as long as an associated dissimilarity measure exists.

Currently supported types are:

  • Numerical vector
  • Binary vector
  • Mixed vector
  • Monovariate time series
  • Multivariate time series

Parameters

TODO: copy and paste from Ball Hall

  • metric : a dissimilarity measure on R
  • aggregator : an aggregating function [R] => R that reduces the elements of a cluster to a single representative, e.g. mean, mode, medoid, median, …
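
To make the role of the aggregator concrete, here is a hedged sketch of three possible [R] => R functions for numerical vectors. The function names are hypothetical and the Euclidean distance is only an assumed choice of metric; the medoid is the only one of the three that needs the metric, and it compares every pair of elements in the cluster.

```python
from math import dist
from statistics import median


def mean_aggregator(points):
    """Component-wise mean of the cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def median_aggregator(points):
    """Component-wise median of the cluster."""
    return tuple(median(coords) for coords in zip(*points))


def medoid_aggregator(points, metric=dist):
    """Cluster member with the smallest total dissimilarity to all members."""
    return min(points, key=lambda p: sum(metric(p, q) for q in points))


# One outlier at x = 14 pulls the mean away from the bulk of the cluster.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (14.0, 0.0)]
print(mean_aggregator(cluster))    # (4.0, 0.0)
print(median_aggregator(cluster))  # (2.0, 0.0)
print(medoid_aggregator(cluster))  # (2.0, 0.0)
```

The mean is sensitive to the outlier while the median and the medoid are not; the medoid is also always an actual element of the cluster, which matters for types where a mean is not defined.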

Output format

A score as a real value.

Associated visualization

Any visualization suited to a collection of numerical values:

  • Bar chart / Histogram

Business case

Usage