Clustering internal indices
Clustering internal indices are measures that evaluate the quality of a clustering result using only the data and the cluster assignments, without external ground-truth labels. The clusteringInternalIndices process currently supports three indices:
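- Davies-Bouldin index (identifier davisBuldin): based on the maximum similarity between each cluster and its most similar neighbor, which makes it sensitive to cluster separation. Lower values indicate better-separated clusters.
- Ball-Hall index (identifier ballHall): the mean, over all clusters, of the mean squared distance between the points and their cluster centroid. It reflects the dispersion (volume) of the clusters.
- wGSS index (identifier wGSS): the within-group sum of squares, i.e. the total squared distance between the points and their cluster centroid.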
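As an illustration of what these three indices measure, here is a minimal pure-Python sketch using the usual textbook definitions with the Euclidean metric: wGSS as the total squared distance to the cluster centroids, Ball-Hall as the mean of the per-cluster mean squared distances, and Davies-Bouldin as the average, over clusters, of the worst ratio (scatter_i + scatter_j) / distance(centroid_i, centroid_j). The function names are hypothetical and this is not the implementation used by the clusteringInternalIndices process, whose exact formulas may differ.

```python
import math
from collections import defaultdict


def group_by_label(points, labels):
    """Group points into clusters keyed by their label."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    return clusters


def centroid(cluster):
    """Coordinate-wise mean of the points in one cluster."""
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]


def wgss(points, labels):
    """Within-group sum of squares: total squared distance to centroids."""
    total = 0.0
    for cluster in group_by_label(points, labels).values():
        c = centroid(cluster)
        total += sum(math.dist(p, c) ** 2 for p in cluster)
    return total


def ball_hall(points, labels):
    """Mean over clusters of the mean squared distance to the centroid."""
    means = []
    for cluster in group_by_label(points, labels).values():
        c = centroid(cluster)
        means.append(sum(math.dist(p, c) ** 2 for p in cluster) / len(cluster))
    return sum(means) / len(means)


def davies_bouldin(points, labels):
    """Average, over clusters, of the worst scatter-to-separation ratio.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = list(group_by_label(points, labels).values())
    cents = [centroid(cl) for cl in clusters]
    # Scatter of a cluster: mean distance of its points to its centroid.
    scatter = [
        sum(math.dist(p, c) for p in cl) / len(cl)
        for cl, c in zip(clusters, cents)
    ]
    worst = []
    for i in range(len(clusters)):
        worst.append(max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(len(clusters)) if j != i
        ))
    return sum(worst) / len(worst)


# Toy data: two tight clusters, ten units apart.
points = [[0, 0], [0, 2], [10, 0], [10, 2]]
labels = [0, 0, 1, 1]
print(wgss(points, labels))            # 4.0
print(ball_hall(points, labels))       # 1.0
print(davies_bouldin(points, labels))  # 0.2
```

For all three indices in this form, smaller values correspond to more compact (and, for Davies-Bouldin, better separated) clusters.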
Hyper-parameters
- dataLocationIdOfNormalization: the dataLocationId of the normalization process, used to load the dataset from Parquet.
- metric: the distance metric to use. For the moment only euclidean is available.
- indices_name: the indices to compute, chosen among davisBuldin, ballHall and wGSS.
Payload JSON template example:
{
  "processingKeyword": "clusteringInternalIndices",
  "customer": "hephia",
  "name": "hephia_clusteringInternalIndices",
  "creationTS": 1675331301,
  "latestUpdateTS": 1675331301,
  "status": "1",
  "dataLocations": [
    {
      "role": "parquet",
      "dataLocationId": "63db8494eac14d53c1dce3ed"
    }
  ],
  "processingContext": {
    "processingName": "clusteringInternalIndices",
    "editionContext": "notebook",
    "callingContext": "ds-lab",
    "view": {
      "name": "view_02-02-2023_09:29:56",
      "id": "63db82947926d269918c5113"
    },
    "dataset": {
      "name": "cii_0006"
    },
    "project": {
      "id": 190242736,
      "name": "datasets"
    }
  },
  "stepId": 1,
  "hyperParameters": {
    "dataLocationIdOfNormalization": "63db8494eac14d53c1dce3ed",
    "metric": "euclidean",
    "indices_name": [
      "ballHall",
      "davisBuldin",
      "wGSS"
    ]
  }
}
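Before sending such a payload, it can be useful to check the hyperParameters block against the values listed in this section. The sketch below is a hypothetical client-side helper (check_hyper_parameters is not part of the service), and it assumes that all three hyper-parameters are required, which this page does not state explicitly.

```python
# Values documented in this section; the identifiers are kept exactly as
# the payload spells them (including "davisBuldin").
SUPPORTED_METRICS = {"euclidean"}
SUPPORTED_INDICES = {"davisBuldin", "ballHall", "wGSS"}


def check_hyper_parameters(payload):
    """Return a list of problems found; an empty list means none."""
    problems = []
    hp = payload.get("hyperParameters", {})

    # Assumption: the normalization data location is mandatory.
    if not hp.get("dataLocationIdOfNormalization"):
        problems.append("dataLocationIdOfNormalization is missing")

    if hp.get("metric") not in SUPPORTED_METRICS:
        problems.append("unsupported metric: %r" % hp.get("metric"))

    names = hp.get("indices_name") or []
    if not names:
        problems.append("indices_name is empty")
    unknown = [n for n in names if n not in SUPPORTED_INDICES]
    if unknown:
        problems.append("unknown indices: %s" % ", ".join(unknown))

    return problems


# Only the part of the example payload that the check reads.
payload = {
    "processingKeyword": "clusteringInternalIndices",
    "hyperParameters": {
        "dataLocationIdOfNormalization": "63db8494eac14d53c1dce3ed",
        "metric": "euclidean",
        "indices_name": ["ballHall", "davisBuldin", "wGSS"],
    },
}
print(check_hyper_parameters(payload))  # []
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.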