AHC
Agglomerative Hierarchical Clustering (AHC), known in French as Classification Ascendante Hiérarchique (CAH), is a clustering algorithm that builds a hierarchy of clusters by successively merging smaller clusters into larger ones.
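The merging idea can be illustrated with a minimal pure-Python sketch. It assumes one-dimensional points and single linkage (the distance between two clusters is the smallest distance between any two of their members). The function names `single_linkage` and `agglomerate` are hypothetical; this illustrates the general principle and is not the implementation used by this project.

```python
from itertools import combinations


def single_linkage(a, b):
    """Smallest absolute distance between a member of a and a member of b."""
    return min(abs(x - y) for x in a for y in b)


def agglomerate(points, target_clusters=1):
    """Repeatedly merge the two closest clusters until target_clusters remain.

    Returns the remaining clusters and the list of merges, in order.
    """
    clusters = [[p] for p in points]  # start with one cluster per point
    history = []
    while len(clusters) > target_clusters:
        # Pick the pair of cluster indices with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: single_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        history.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # Keep every other cluster and append the merged one.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history


clusters, history = agglomerate([1.0, 1.2, 5.0, 5.1], target_clusters=2)
```

Each entry of `history` records one merge, so reading it from first to last gives the hierarchy from the leaves upward.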
Representations
The keys _id, name, stepId, domainInformation, dataset, project, processingInfo, creationTS and latestUpdateTS
are unchanged and follow the standard representation format. Only the dataSpecification keys
differ; they are described below:
dataSpecification:
- keyword: Its value is "hardClustering" for the hardClustering representation and "prototypes" for the prototypes representation. See the mandatory keys.
- valueType: The type of the processing data values; it therefore depends on them. See the mandatory keys.
- meaning: Its value is "hierarchical clustering with bisecting Kmeans". See the mandatory keys.
- view: See the mandatory keys.
- dataLocationId: See the mandatory keys.
Currently, only numerical vectors can be saved during raw data loading, so there is a single template at this time.
JSON template for the processing data representation of numerical vectors:
- Representation of prototypes:
{
  "dataSpecification": {
    "keyword": "prototypes",
    "valueType": {
      "datatype": "numericalData",
      "structureType": "Hierarchical Clustering"
    },
    "meaning": "hierarchical clustering with bisecting Kmeans",
    "view": {
      "name": "view_02-02-2023_11:07:45",
      "id": "63db99816b332aad216c21e6"
    },
    "dataLocationId": "63db9a82a52c4511df13c8f5",
    "hyperParameters": {
      "maxNbClusters": 8,
      "numberOfClusters": 2,
      "seed": 1
    }
  }
}
- Representation of hardClustering:
{
  "dataSpecification": {
    "keyword": "hardClustering",
    "valueType": {
      "datatype": "numericalData",
      "structureType": "Hierarchical Clustering"
    },
    "meaning": "hierarchical clustering with bisecting Kmeans",
    "view": {
      "name": "view_02-02-2023_11:07:45",
      "id": "63db99816b332aad216c21e6"
    },
    "dataLocationId": "63db9a83a52c4511df13c902",
    "hyperParameters": {
      "maxNbClusters": 8,
      "numberOfClusters": 2,
      "seed": 1
    }
  }
}
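The two templates differ only in their keyword and dataLocationId, so they could be produced by one helper. The following is a minimal sketch under that assumption; the function `build_data_specification`, its parameter names and the `ALLOWED_KEYWORDS` set are hypothetical and not part of any documented API.

```python
import json

# The two keyword values described for this processing.
ALLOWED_KEYWORDS = {"prototypes", "hardClustering"}


def build_data_specification(keyword, view_name, view_id,
                             data_location_id, hyper_parameters):
    """Return a dict shaped like the dataSpecification templates (hypothetical helper)."""
    if keyword not in ALLOWED_KEYWORDS:
        raise ValueError(f"unsupported keyword: {keyword!r}")
    return {
        "dataSpecification": {
            "keyword": keyword,
            "valueType": {
                "datatype": "numericalData",
                "structureType": "Hierarchical Clustering",
            },
            "meaning": "hierarchical clustering with bisecting Kmeans",
            "view": {"name": view_name, "id": view_id},
            "dataLocationId": data_location_id,
            # Copy so later changes to the caller's dict do not leak in.
            "hyperParameters": dict(hyper_parameters),
        }
    }


# Values copied from the prototypes template.
prototypes_repr = build_data_specification(
    keyword="prototypes",
    view_name="view_02-02-2023_11:07:45",
    view_id="63db99816b332aad216c21e6",
    data_location_id="63db9a82a52c4511df13c8f5",
    hyper_parameters={"maxNbClusters": 8, "numberOfClusters": 2, "seed": 1},
)
print(json.dumps(prototypes_repr, indent=2))
```

Passing `keyword="hardClustering"` with the other dataLocationId yields the second template; any other keyword raises a `ValueError`.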
Observations
- The prototypes observations are saved in MongoDB:
{
  "_id": "ObjectId(63db9a83a52c4511df13c8f9)",
  "clustersNumber": 3,
  "prototypes": [
    {
      "clusterId": 0,
      "prototype": [
        {
          "columnName": "cp_a_1_bar",
          "value": 5.632708722788574e-17
        },
        {
          "columnName": "cp_a_2_bar",
          "value": -3.2857467549600007e-16
        },
        {
          "columnName": "cp_r_bar",
          "value": 7.510278297051431e-17
        }
      ]
    }
  ],
  "representationId": "ObjectId(63db9a82a52c4511df13c8f2)"
}
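A consumer of this document will typically want each prototype as a mapping from column name to value. The sketch below assumes the document has already been loaded as a plain Python dict with the structure shown above; the function name `prototypes_to_vectors` is hypothetical.

```python
def prototypes_to_vectors(document):
    """Map each clusterId to a {columnName: value} dict (hypothetical helper)."""
    return {
        entry["clusterId"]: {
            column["columnName"]: column["value"]
            for column in entry["prototype"]
        }
        for entry in document["prototypes"]
    }


# The observation shown above, written as a Python dict.
observation = {
    "_id": "ObjectId(63db9a83a52c4511df13c8f9)",
    "clustersNumber": 3,
    "prototypes": [
        {
            "clusterId": 0,
            "prototype": [
                {"columnName": "cp_a_1_bar", "value": 5.632708722788574e-17},
                {"columnName": "cp_a_2_bar", "value": -3.2857467549600007e-16},
                {"columnName": "cp_r_bar", "value": 7.510278297051431e-17},
            ],
        }
    ],
    "representationId": "ObjectId(63db9a82a52c4511df13c8f2)",
}

vectors = prototypes_to_vectors(observation)
```

Here `vectors[0]` holds the three column values of the prototype for cluster 0, keyed by column name.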
- The hardClustering observations are saved in Parquet:
