Clustering internal indices

Clustering internal indices measure the quality of a clustering result using only the data and the cluster assignments, without any ground-truth labels. The clusteringInternalIndices process currently supports three indices:

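  • Davies-Bouldin index (davisBuldin): for each cluster, takes the maximum similarity to any other cluster (within-cluster scatter relative to the distance between centroids) and averages these values. It is sensitive to cluster separation; lower is better.
  • Ball-Hall index (ballHall): the mean, over all clusters, of the mean squared distance between the points of a cluster and its centroid. It reflects how much volume the clusters occupy.
  • wGSS index (wGSS): the within-group sum of squares, i.e. the sum over all points of the squared distance to the centroid of their own cluster.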
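
As a rough illustration of the three indices above, the following minimal sketch computes them from their standard textbook definitions with the Euclidean metric. It is not the code used by the clusteringInternalIndices process; the function names (group_by_cluster, centroid, wgss, ball_hall, davies_bouldin) and the toy data are assumptions made for this example.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, available since Python 3.8


def group_by_cluster(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(tuple(point))
    return dict(clusters)


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(members) for axis in zip(*members))


def wgss(points, labels):
    """Within-group sum of squares: squared distance of every point to its own centroid."""
    total = 0.0
    for members in group_by_cluster(points, labels).values():
        c = centroid(members)
        total += sum(dist(p, c) ** 2 for p in members)
    return total


def ball_hall(points, labels):
    """Mean over clusters of the mean squared distance to the cluster centroid."""
    per_cluster = []
    for members in group_by_cluster(points, labels).values():
        c = centroid(members)
        per_cluster.append(sum(dist(p, c) ** 2 for p in members) / len(members))
    return sum(per_cluster) / len(per_cluster)


def davies_bouldin(points, labels):
    """Mean over clusters of the largest similarity ratio to any other cluster."""
    clusters = list(group_by_cluster(points, labels).values())
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin needs at least two clusters")
    cents = [centroid(m) for m in clusters]
    # scatter[i]: mean Euclidean distance from the points of cluster i to its centroid
    scatter = [sum(dist(p, c) for p in m) / len(m) for m, c in zip(clusters, cents)]
    worst = []
    for i in range(len(clusters)):
        # Assumes distinct centroids; coinciding centroids would divide by zero.
        ratios = [
            (scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in range(len(clusters))
            if j != i
        ]
        worst.append(max(ratios))
    return sum(worst) / len(worst)


# Toy data: two tight clusters of two points each, 10 units apart.
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels = [0, 0, 1, 1]
print(wgss(points, labels), ball_hall(points, labels), davies_bouldin(points, labels))
```

On this toy data every point lies at distance 1 from its centroid and the centroids are 10 apart, so wGSS is 4, Ball-Hall is 1 and Davies-Bouldin is 0.2. For all three indices, smaller values correspond to more compact clusters.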

Hyper-parameters

  • dataLocationIdOfNormalization: The dataLocationId of the normalization process, used to read the dataset from Parquet.
  • metric: The distance metric to use. Currently only euclidean is supported.
  • indices_name: The list of indices to compute, chosen from davisBuldin, ballHall and wGSS.

Example JSON payload:

{
  "processingKeyword": "clusteringInternalIndices",
  "customer": "hephia",
  "name": "hephia_clusteringInternalIndices",
  "creationTS": 1675331301,
  "latestUpdateTS": 1675331301,
  "status": "1",
  "dataLocations": [
    {
      "role": "parquet",
      "dataLocationId": "63db8494eac14d53c1dce3ed"
    }
  ],
  "processingContext": {
    "processingName": "clusteringInternalIndices",
    "editionContext": "notebook",
    "callingContext": "ds-lab",
    "view": {
      "name": "view_02-02-2023_09:29:56",
      "id": "63db82947926d269918c5113"
    },
    "dataset": {
      "name": "cii_0006"
    },
    "project": {
      "id": 190242736,
      "name": "datasets"
    }
  },
  "stepId": 1,
  "hyperParameters": {
    "dataLocationIdOfNormalization": "63db8494eac14d53c1dce3ed",
    "metric": "euclidean",
    "indices_name": [
      "ballHall",
      "davisBuldin",
      "wGSS"
    ]
  }
}
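
Before submitting a payload like the one above, it can help to check the hyperParameters block on the client side. The sketch below is a hypothetical helper, not part of the service: the function name validate_hyper_parameters and the error messages are assumptions, and the allowed values are taken only from the hyper-parameter list in this document.

```python
import json

# Allowed values as documented in the Hyper-parameters section.
ALLOWED_METRICS = ("euclidean",)
ALLOWED_INDICES = ("davisBuldin", "ballHall", "wGSS")


def validate_hyper_parameters(payload):
    """Return a list of problems found in payload['hyperParameters'] (empty if none)."""
    problems = []
    hp = payload.get("hyperParameters", {})

    location_id = hp.get("dataLocationIdOfNormalization")
    if not isinstance(location_id, str) or not location_id:
        problems.append("dataLocationIdOfNormalization must be a non-empty string")

    if hp.get("metric") not in ALLOWED_METRICS:
        problems.append("metric must be one of " + ", ".join(ALLOWED_METRICS))

    names = hp.get("indices_name")
    if not isinstance(names, list) or not names:
        problems.append("indices_name must be a non-empty list")
    else:
        unknown = [n for n in names if n not in ALLOWED_INDICES]
        if unknown:
            problems.append("unknown indices: " + ", ".join(map(str, unknown)))

    return problems


# Usage with only the hyperParameters part of the example payload.
payload = json.loads("""
{
  "hyperParameters": {
    "dataLocationIdOfNormalization": "63db8494eac14d53c1dce3ed",
    "metric": "euclidean",
    "indices_name": ["ballHall", "davisBuldin", "wGSS"]
  }
}
""")
print(validate_hyper_parameters(payload))
```

An empty list means nothing was flagged. Note that the identifiers are matched exactly as spelled in the payload (for example davisBuldin), since that is the spelling this document uses.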