Clustering Internal Indices

Clustering internal indices are measures that evaluate the quality of a clustering result using only the clustered data itself, without external reference labels. The internal index process currently supports three indices:

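  • Davies-Bouldin index (davisBuldin): based on the maximum similarity between each cluster and its most similar neighbor, which makes it sensitive to cluster separation. Lower values indicate better-separated clusters.
  • Ball-Hall index (ballHall): the mean, over all clusters, of the average squared distance between the points and their cluster centroid. It reflects how dispersed (voluminous) the clusters are.
  • wGSS index (wGSS): the within-group sum of squares, i.e. the total squared distance between the points and their cluster centroid.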
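As an illustration, the three indices listed above can be sketched in pure Python using their usual textbook definitions and the Euclidean metric. This is only a minimal sketch, not the platform's implementation: the function names (`wgss`, `ball_hall`, `davies_bouldin`) and the toy data are assumptions made for the example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def group_by_label(points, labels):
    """Group points into clusters according to their labels."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def centroid(members):
    """Coordinate-wise mean of the points of one cluster."""
    n = len(members)
    return tuple(sum(coords) / n for coords in zip(*members))


def wgss(points, labels):
    """Within-group sum of squares: total squared distance to the centroids."""
    total = 0.0
    for members in group_by_label(points, labels).values():
        c = centroid(members)
        total += sum(dist(p, c) ** 2 for p in members)
    return total


def ball_hall(points, labels):
    """Mean over clusters of the average squared distance to the centroid."""
    dispersions = []
    for members in group_by_label(points, labels).values():
        c = centroid(members)
        dispersions.append(sum(dist(p, c) ** 2 for p in members) / len(members))
    return sum(dispersions) / len(dispersions)


def davies_bouldin(points, labels):
    """Mean over clusters of the worst (largest) similarity to another cluster.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = group_by_label(points, labels)
    cents = {k: centroid(m) for k, m in clusters.items()}
    # Scatter of a cluster: average distance of its points to the centroid.
    scatter = {
        k: sum(dist(p, cents[k]) for p in m) / len(m) for k, m in clusters.items()
    }
    worst = [
        max(
            (scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst) / len(worst)


# Toy data (hypothetical): two well-separated clusters of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]
labels = [0, 0, 1, 1]

print(wgss(points, labels), ball_hall(points, labels), davies_bouldin(points, labels))
```

On this toy data every point lies at distance 1 from its centroid and the centroids are 4 apart, which makes the three values easy to check by hand.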

Representation

The keys _id, name, stepId, domainInformation, dataset, project, processingInfo, creationTS and latestUpdateTS are unchanged and follow the standard representation. Only the dataSpecification keys change; they are described below.

Currently, only numerical vectors can be saved during raw data loading, so there is a single template at the moment.

JSON template for the numerical vector processing data representation:


{
  "dataSpecification": {
    "keyword": "clusteringInternalIndices",
    "valueType": {
      "dataType": "numerical",
      "structureType": "scalar"
    },
    "meaning": "clustering internal indices",
    "view": {
      "name": "view_02-02-2023_10:06:08",
      "id": "63db8b1047b888b6942d7367"
    },
    "dataLocationId": "63db8d55c8a49c108a4d38af",
    "inPutParameters": {
      "dataLocationOfNormalization": "63db8baff5df0b234518d0e3",
      "metric": "euclidean",
      "indices_name": [
        "ballHall",
        "davisBuldin",
        "wGSS"
      ]
    }
  }
}
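To show how the template above might be consumed, the following sketch reads the metric and the requested index names from `inPutParameters`. The helper `read_requested_indices` and the `KNOWN_INDICES` set are hypothetical names introduced for this example only; the key names and values are copied from the template.

```python
import json

# Index identifiers as they appear in the template (assumed to be the full set).
KNOWN_INDICES = {"ballHall", "davisBuldin", "wGSS"}

representation_json = """
{
  "dataSpecification": {
    "keyword": "clusteringInternalIndices",
    "inPutParameters": {
      "dataLocationOfNormalization": "63db8baff5df0b234518d0e3",
      "metric": "euclidean",
      "indices_name": ["ballHall", "davisBuldin", "wGSS"]
    }
  }
}
"""


def read_requested_indices(representation):
    """Return (metric, index names) and reject unknown index names."""
    params = representation["dataSpecification"]["inPutParameters"]
    names = params["indices_name"]
    unknown = [name for name in names if name not in KNOWN_INDICES]
    if unknown:
        raise ValueError(f"unknown indices: {unknown}")
    return params["metric"], names


metric, names = read_requested_indices(json.loads(representation_json))
print(metric, names)
```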

Observations

We save the observations in MongoDB. For example:

{
  "_id": "ObjectId(63db8d55c8a49c108a4d38b2)",
  "name": "ballHall",
  "value": 1686361.751482711,
  "representationId": "ObjectId(63db8d55c8a49c108a4d38ac)"
}
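As a rough sketch of how such observation documents could be assembled, the snippet below builds one document per index, mirroring the fields of the example above. The helper `build_observations` is a hypothetical name, and `_id` is deliberately omitted on the assumption that MongoDB assigns it on insertion.

```python
def build_observations(index_values, representation_id):
    """Build one observation document per computed index value."""
    return [
        {
            "name": name,
            "value": float(value),
            "representationId": representation_id,
        }
        for name, value in index_values.items()
    ]


observations = build_observations(
    {
        "ballHall": 1686361.751482711,  # value taken from the example above
        "wGSS": 4.0,  # toy value for illustration, not a real result
    },
    "ObjectId(63db8d55c8a49c108a4d38ac)",
)

for doc in observations:
    print(doc)
```

With pymongo, a list like this could then be passed to a collection's `insert_many` method; the collection to use is not specified here.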