Clustering Internal Indices
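Clustering internal indices are measures that evaluate the quality of a clustering result using only the clustered data itself, without external reference labels. Three indices are currently available:
- Davies-Bouldin index (davisBuldin): the average, over all clusters, of the maximum similarity ratio between each cluster and its most similar cluster, where the ratio combines within-cluster scatter and the distance between centroids. Lower values indicate more compact and better-separated clusters.
- Ball-Hall index (ballHall): the mean, over all clusters, of the mean squared distance between the points of a cluster and its centroid. It reflects the dispersion (volume) of the clusters.
- wGSS index (wGSS): the within-group sum of squares, i.e. the total squared distance between each point and the centroid of its cluster.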
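The three indices can be sketched in plain Python as follows. This is a minimal illustration assuming the Euclidean metric and the usual textbook definitions (the same Davies-Bouldin formula as scikit-learn's `davies_bouldin_score`); the helper names `wgss`, `ball_hall` and `davies_bouldin` are hypothetical, and the tool's actual implementation may normalize or compute things differently.

```python
import math
from collections import defaultdict


def _clusters(points, labels):
    """Group points by label and return a list of (members, centroid) pairs."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    result = []
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        result.append((members, centroid))
    return result


def wgss(points, labels):
    """Within-group sum of squares: squared distances to each cluster centroid, summed."""
    return sum(
        math.dist(p, centroid) ** 2
        for members, centroid in _clusters(points, labels)
        for p in members
    )


def ball_hall(points, labels):
    """Ball-Hall: mean over clusters of the mean squared distance to the centroid."""
    dispersions = [
        sum(math.dist(p, centroid) ** 2 for p in members) / len(members)
        for members, centroid in _clusters(points, labels)
    ]
    return sum(dispersions) / len(dispersions)


def davies_bouldin(points, labels):
    """Davies-Bouldin: mean over clusters of max_j (s_i + s_j) / d(c_i, c_j).

    Requires at least two clusters with distinct centroids.
    """
    groups = _clusters(points, labels)
    # s_i: average distance of the members of cluster i to its centroid.
    scatter = [sum(math.dist(p, c) for p in m) / len(m) for m, c in groups]
    k = len(groups)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(groups[i][1], groups[j][1])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k


# Two well-separated clusters on a line.
points = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
labels = [0, 0, 1, 1]
print(wgss(points, labels))            # 4.0
print(ball_hall(points, labels))       # 1.0
print(davies_bouldin(points, labels))  # 0.2
```

Note that wGSS grows with the number of points, whereas Ball-Hall divides by cluster sizes and by the number of clusters, which makes it easier to compare across datasets of different sizes.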
Representation
The keys _id, name, stepId, domainInformation, dataset, project, processingInfo, creationTS and latestUpdateTS are unchanged and follow the standard representation. Only the dataSpecification keys differ; they are described below:
dataSpecification:
- keyword: its value is “clusteringInternalIndices”. See the mandatory keys.
- valueType: the type of the processing data values, so it depends on those values. See the mandatory keys.
- meaning: its value is “clustering internal indices”. See the mandatory keys.
- view: see the mandatory keys.
- dataLocationId: see the mandatory keys.
Currently, only numerical vectors can be saved during raw data loading, so there is a single template at the moment.
JSON template for the numerical vector processing data representation:
{
  "dataSpecification": {
    "keyword": "clusteringInternalIndices",
    "valueType": {
      "dataType": "numerical",
      "structureType": "scalar"
    },
    "meaning": "clustering internal indices",
    "view": {
      "name": "view_02-02-2023_10:06:08",
      "id": "63db8b1047b888b6942d7367"
    },
    "dataLocationId": "63db8d55c8a49c108a4d38af",
    "inPutParameters": {
      "dataLocationOfNormalization": "63db8baff5df0b234518d0e3",
      "metric": "euclidean",
      "indices_name": [
        "ballHall",
        "davisBuldin",
        "wGSS"
      ]
    }
  }
}
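A dataSpecification block shaped like this template could be assembled programmatically. The sketch below is hypothetical: the function `build_data_specification` and its parameters are illustrative names, not part of the documented API, and the allowed index names are simply those listed in the template's "indices_name" array.

```python
import json

# Index names as they appear in the "indices_name" array of the template.
SUPPORTED_INDICES = ("ballHall", "davisBuldin", "wGSS")


def build_data_specification(view_name, view_id, data_location_id,
                             normalization_location_id, indices,
                             metric="euclidean"):
    """Hypothetical helper returning a dict shaped like the JSON template."""
    unknown = [name for name in indices if name not in SUPPORTED_INDICES]
    if unknown:
        raise ValueError(f"unsupported indices: {unknown}")
    return {
        "dataSpecification": {
            "keyword": "clusteringInternalIndices",
            "valueType": {"dataType": "numerical", "structureType": "scalar"},
            "meaning": "clustering internal indices",
            "view": {"name": view_name, "id": view_id},
            "dataLocationId": data_location_id,
            "inPutParameters": {
                "dataLocationOfNormalization": normalization_location_id,
                "metric": metric,
                "indices_name": list(indices),
            },
        }
    }


# Rebuild the template above from its individual values.
spec = build_data_specification(
    "view_02-02-2023_10:06:08",
    "63db8b1047b888b6942d7367",
    "63db8d55c8a49c108a4d38af",
    "63db8baff5df0b234518d0e3",
    ["ballHall", "davisBuldin", "wGSS"],
)
print(json.dumps(spec, indent=2))
```

Rejecting unknown index names up front keeps a typo in "indices_name" from silently producing a representation with no matching observations.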
Observations
Observations are stored in MongoDB. Example of an observation document:
{
  "_id": "ObjectId(63db8d55c8a49c108a4d38b2)",
  "name": "ballHall",
  "value": 1686361.751482711,
  "representationId": "ObjectId(63db8d55c8a49c108a4d38ac)"
}
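Observation documents of this shape could be prepared from a mapping of index names to computed values, as in the hypothetical sketch below (`build_observations` is an illustrative name). It omits "_id" on the assumption that MongoDB generates it on insertion, and it keeps "representationId" as a plain string; with PyMongo one would typically wrap it in `bson.ObjectId` and pass the list to `Collection.insert_many`, which is not shown here.

```python
def build_observations(representation_id, index_values):
    """Return one observation document per computed index (hypothetical helper).

    "_id" is left out so that the database can generate it on insertion.
    """
    return [
        {"name": name, "value": float(value), "representationId": representation_id}
        for name, value in index_values.items()
    ]


# Reproduces the body of the example document above, without its "_id".
docs = build_observations("63db8d55c8a49c108a4d38ac", {"ballHall": 1686361.751482711})
print(docs)
```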