Clustering Internal Indices

Clustering internal indices are measures that evaluate the quality of a clustering result using only the clustered data itself, without reference to ground-truth labels. Three indices are currently available:

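- Davies-Bouldin index (stored under the name “davisBuldin”): the average, over all clusters, of the highest similarity between a cluster and any other cluster, where similarity is the ratio of within-cluster scatter to the distance between centroids. Lower values indicate compact, well-separated clusters.
- Ball-Hall index: the mean, over all clusters, of the average squared distance between the points of a cluster and its centroid, i.e. the mean within-cluster dispersion.
- wGSS index: the within-group sum of squares, i.e. the squared distance between each point and the centroid of its cluster, summed over all points.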
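
To make the three indices concrete, here is a minimal pure-Python sketch that uses their common textbook definitions with the Euclidean metric. The function names `centroid` and `internal_indices` are hypothetical and chosen for illustration; the platform's own implementation is not shown in this page and may differ in details such as normalization or the handling of edge cases.

```python
import math
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def internal_indices(points, labels):
    """Compute the three indices for one clustering (needs at least 2 clusters)."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(tuple(point))
    centers = {k: centroid(ps) for k, ps in clusters.items()}

    # Sum of squared Euclidean distances to the centroid, per cluster.
    sq_disp = {k: sum(math.dist(p, centers[k]) ** 2 for p in ps)
               for k, ps in clusters.items()}
    # Mean (unsquared) distance to the centroid, used by Davies-Bouldin.
    scatter = {k: sum(math.dist(p, centers[k]) for p in ps) / len(ps)
               for k, ps in clusters.items()}

    n_clusters = len(clusters)
    wgss = sum(sq_disp.values())
    ball_hall = sum(sq_disp[k] / len(clusters[k]) for k in clusters) / n_clusters
    # For each cluster, keep the largest similarity to any other cluster.
    # Assumes distinct centroids; coincident ones would divide by zero.
    davies_bouldin = sum(
        max((scatter[k] + scatter[j]) / math.dist(centers[k], centers[j])
            for j in clusters if j != k)
        for k in clusters
    ) / n_clusters
    # Result keys reuse the identifiers from the "indices_name" parameter.
    return {"ballHall": ball_hall, "davisBuldin": davies_bouldin, "wGSS": wgss}


# Illustrative data only: two tight clusters that lie far apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
print(internal_indices(points, labels))
```

The result keys match the index identifiers used in the representation, so the output maps directly onto the observations described later on this page.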

Representation

The keys _id, name, stepId, domainInformation, dataset, project, processingInfo, creationTS and latestUpdateTS are unchanged and follow the standard representation. Only the dataSpecification keys change; they are described below:

dataSpecification:
- keyword: Its value is “clusteringInternalIndices”. See the mandatory keys.
- valueType: The type of the processing data values, so it depends on them. See the mandatory keys.
- meaning: Its value is “clustering internal indices”. See the mandatory keys.
- view: See the mandatory keys.
- dataLocationId: See the mandatory keys.

Currently, only numerical vectors can be saved during raw data loading, so there is a single template for now.

JSON template for the numerical vector processing data representation:


{
  "dataSpecification": {
    "keyword": "clusteringInternalIndices",
    "valueType": {
      "dataType": "numerical",
      "structureType": "scalar"
    },
    "meaning": "clustering internal indices",
    "view": {
      "name": "view_02-02-2023_10:06:08",
      "id": "63db8b1047b888b6942d7367"
    },
    "dataLocationId": "63db8d55c8a49c108a4d38af",
    "inPutParameters": {
      "dataLocationOfNormalization": "63db8baff5df0b234518d0e3",
      "metric": "euclidean",
      "indices_name": [
        "ballHall",
        "davisBuldin",
        "wGSS"
      ]
    }
  }
}
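
As an illustration of how the key list above could be checked against a representation, here is a small sketch. The helper `check_data_specification` and the constant `REQUIRED_KEYS` are hypothetical names, not part of any documented API. The checks cover only the five dataSpecification keys listed on this page and the two fixed values it gives for keyword and meaning.

```python
import json

# The five dataSpecification keys listed on this page.
REQUIRED_KEYS = {"keyword", "valueType", "meaning", "view", "dataLocationId"}


def check_data_specification(representation):
    """Return a list of problems; an empty list means every check passed."""
    spec = representation.get("dataSpecification", {})
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(spec))]
    if spec.get("keyword") != "clusteringInternalIndices":
        problems.append("unexpected keyword")
    if spec.get("meaning") != "clustering internal indices":
        problems.append("unexpected meaning")
    return problems


# Values copied from the template on this page (inPutParameters omitted).
template = json.loads("""
{
  "dataSpecification": {
    "keyword": "clusteringInternalIndices",
    "valueType": {"dataType": "numerical", "structureType": "scalar"},
    "meaning": "clustering internal indices",
    "view": {"name": "view_02-02-2023_10:06:08", "id": "63db8b1047b888b6942d7367"},
    "dataLocationId": "63db8d55c8a49c108a4d38af"
  }
}
""")
print(check_data_specification(template))
```

Returning a list of problems rather than raising on the first one lets a caller report every missing key at once.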

Observations

Observations are saved in MongoDB. For example, the document stored for the Ball-Hall index:

{
  "_id": "ObjectId(63db8d55c8a49c108a4d38b2)",
  "name": "ballHall",
  "value": 1686361.751482711,
  "representationId": "ObjectId(63db8d55c8a49c108a4d38ac)"
}
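
The sketch below shows one way computed index values could be turned into documents of this shape. It assumes one document per index, which the single example above suggests but does not confirm. The function name `build_observations` is hypothetical, the representation id is kept as a plain string rather than an ObjectId, and "_id" is left out on the assumption that MongoDB assigns it on insertion. The insertion itself is not shown.

```python
def build_observations(indices, representation_id):
    """Build one observation document per index name (assumed layout)."""
    return [
        {
            "name": name,
            "value": float(value),
            # Plain string here; the stored document shows an ObjectId.
            "representationId": representation_id,
        }
        for name, value in indices.items()
    ]


# Name, value and id taken from the example document above.
documents = build_observations(
    {"ballHall": 1686361.751482711},
    "63db8d55c8a49c108a4d38ac",
)
print(documents)
```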