Ball Hall

Labels / Tags

  • Internal quality index
  • NumericalEvaluation

Principle

The Ball Hall index is the average, over all clusters, of their mean dispersion, where the mean dispersion of a cluster is the average distance between its points and its prototype.

The smaller the score, the more compact the clusters.

It works on any kind of representation R as long as a dissimilarity measure is associated with it.

$ S = \frac{1}{K}\sum_{k=1}^K \frac{1}{n_k}\sum_{i\in{I_k}} D(x_i, p_k) $

Where :

  • $K$ is the number of clusters.
  • $n_k$ is the number of elements in cluster $k$.
  • $I_k$ is the set of elements of cluster $k$.
  • $x_i$ is the $i$-th element.
  • $p_k$ is the prototype of cluster $k$.
  • $D(., .)$ is the dissimilarity measure.
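
As a minimal sketch of the formula above, assuming Python and plain tuples as numerical vectors: the names ball_hall and mean_vector are illustrative and not part of any library, and the prototype $p_k$ is assumed to be obtained by applying an aggregator to the members of cluster $k$.

```python
from collections import defaultdict
from math import dist  # Euclidean distance between two coordinate sequences


def ball_hall(points, labels, metric, aggregator):
    """Average, over clusters, of the mean dissimilarity to the cluster prototype."""
    # Group the points by cluster label: clusters[k] holds the members of cluster k.
    clusters = defaultdict(list)
    for x, k in zip(points, labels):
        clusters[k].append(x)

    total = 0.0
    for members in clusters.values():
        prototype = aggregator(members)  # p_k (assumption: prototype = aggregate)
        # Mean dispersion of this cluster: (1 / n_k) * sum of D(x_i, p_k).
        total += sum(metric(x, prototype) for x in members) / len(members)
    return total / len(clusters)  # average over the K clusters


def mean_vector(vectors):
    """Component-wise mean of a collection of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]
score = ball_hall(points, labels, dist, mean_vector)
print(score)  # 1.5: the two cluster dispersions are 1.0 and 2.0
```

Each element is visited once to group it and once to measure its dissimilarity to its prototype.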

Scalability

Computational complexity is $O(n)$, where $n$ is the total number of elements.

Input

A collection of values of the same type, from the usual ones to any other, as long as an associated dissimilarity measure exists.

Currently supported types are :

  • Numerical vector
  • Binary vector
  • Mixed Vector
  • Monovariate time series
  • Multivariate time series

Parameters

1 : metric, a dissimilarity measure for the given input type.

Let R be the type of the data values. The given metric parameter must then be of type :

  • $D$ : (R, R) => Numerical value
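
For illustration, here are two functions with this shape, sketched in Python under the assumption that values are equal-length tuples. The names euclidean and hamming are chosen for this example only; any function taking two values of type R and returning a number would fit.

```python
from math import sqrt


def euclidean(a, b):
    """Dissimilarity for numerical vectors: straight-line distance."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Dissimilarity for binary vectors: number of differing coordinates."""
    return sum(x != y for x, y in zip(a, b))


d_numeric = euclidean((0.0, 0.0), (3.0, 4.0))   # 5.0
d_binary = hamming((1, 0, 1, 1), (1, 1, 0, 1))  # 2
```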

2 : aggregator, an aggregating function for a collection of values.

It must take the following form : [R] => R, where [R] is a collection of values of type R.

Many aggregators exist, covering a wide range of data value types.

The following subsections present some of them, organized by data value type.

Numerical value

mean

median

mode

medoid

Binary value

mode

Threshold vote
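
The aggregators listed above could be sketched in Python as follows, assuming values are equal-length tuples and that mean, median and mode are applied coordinate by coordinate. All function names are illustrative. The threshold vote is not defined in this page, so the version below (a coordinate is 1 when its share of ones reaches a threshold) is an assumption. The medoid also needs a metric, so it is bound with functools.partial to obtain the [R] => R form.

```python
from functools import partial
from math import dist
from statistics import mean, median, mode


def componentwise(reducer):
    """Lift a scalar reducer to vectors by applying it to each coordinate."""
    def aggregate(vectors):
        return tuple(reducer(column) for column in zip(*vectors))
    return aggregate


mean_vector = componentwise(mean)
median_vector = componentwise(median)
mode_vector = componentwise(mode)


def medoid(vectors, metric):
    """Member of the collection with the smallest summed dissimilarity to all members."""
    return min(vectors, key=lambda c: sum(metric(c, x) for x in vectors))


def threshold_vote(vectors, threshold=0.5):
    """Assumed definition: a coordinate is 1 when its share of ones reaches the threshold."""
    n = len(vectors)
    return tuple(int(sum(column) / n >= threshold) for column in zip(*vectors))


numeric = [(1.0, 2.0), (3.0, 2.0), (8.0, 5.0)]
binary = [(1, 0, 1), (1, 1, 0), (0, 0, 1)]

euclidean_medoid = partial(medoid, metric=dist)  # now of the form [R] => R

print(mean_vector(numeric))       # (4.0, 3.0)
print(median_vector(numeric))     # (3.0, 2.0)
print(euclidean_medoid(numeric))  # (3.0, 2.0)
print(mode_vector(binary))        # (1, 0, 1)
print(threshold_vote(binary))     # (1, 0, 1)
```

Unlike the mean, the medoid always returns an existing member of the collection, which is useful when averaging values of type R is not meaningful.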

Output

A NumericalEvaluation, which here is a single numerical value.

Associated visualization

Any visualization dealing with a numerical collection :

  • Bar chart / Histogram

Business case

Usage