AHC

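Agglomerative Hierarchical Clustering (AHC; in French, Classification Ascendante Hiérarchique, CAH) is a clustering algorithm that builds a hierarchy of clusters bottom-up: it starts with one cluster per observation and successively merges smaller clusters into larger ones.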
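The merge loop described above can be sketched in plain Python. This is a minimal illustration, not the implementation behind this step. It assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members), and the function name agglomerative_clustering is hypothetical. Other criteria such as average or Ward linkage would only change the linkage function.

```python
import math
from typing import List, Sequence


def agglomerative_clustering(points: Sequence[Sequence[float]], n_clusters: int) -> List[int]:
    """Naive bottom-up clustering: start with one cluster per point and
    repeatedly merge the two closest clusters (single linkage) until
    n_clusters remain. Returns one cluster label per point."""
    if not 1 <= n_clusters <= len(points):
        raise ValueError("n_clusters must be between 1 and len(points)")
    # Each cluster is a list of point indices; initially every point is alone.
    clusters: List[List[int]] = [[i] for i in range(len(points))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Merge cluster y into cluster x (y > x, so index x stays valid).
        clusters[x].extend(clusters.pop(y))

    # Give every point the index of the cluster it ended up in.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
print(agglomerative_clustering(points, 3))  # → [0, 0, 1, 1, 2]
```

The exhaustive pair search makes this O(n³), so it only suits small examples; library implementations use more efficient update schemes.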

Representations

The keys _id, name, stepId, domainInformation, dataset, project, processingInfo, creationTS and latestUpdateTS are unchanged and follow the classic representation format; only the dataSpecification keys differ, as described below:

JSON templates for the representation of numerical vector processing data:

  • Representation of prototypes:
{
  "dataSpecification": {
    "keyword": "prototypes",
    "valueType": {
      "datatype": "numericalData",
      "structureType": "Hierarchical Clustering"
    },
    "meaning": "hierarchical clustering with bisecting Kmeans",
    "view": {
      "name": "view_02-02-2023_11:07:45",
      "id": "63db99816b332aad216c21e6"
    },
    "dataLocationId": "63db9a82a52c4511df13c8f5",
    "hyperParameters": {
      "maxNbClusters": 8,
      "numberOfClusters": 2,
      "seed": 1
    }
  }
}
  • Representation of hardClustering:
{
  "dataSpecification": {
    "keyword": "hardClustering",
    "valueType": {
      "datatype": "numericalData",
      "structureType": "Hierarchical Clustering"
    },
    "meaning": "hierarchical clustering with bisecting Kmeans",
    "view": {
      "name": "view_02-02-2023_11:07:45",
      "id": "63db99816b332aad216c21e6"
    },
    "dataLocationId": "63db9a83a52c4511df13c902",
    "hyperParameters": {
      "maxNbClusters": 8,
      "numberOfClusters": 2,
      "seed": 1
    }
  }
}
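The meaning field in both templates refers to hierarchical clustering with bisecting k-means, which works top-down: it starts from a single cluster and repeatedly splits a cluster in two with 2-means until the requested number of clusters is reached. The sketch below shows how numberOfClusters and seed could drive such a loop. It is an illustration under stated assumptions, not this step's implementation: the function names are hypothetical, it always bisects the largest cluster (implementations differ; some split the cluster with the largest within-cluster error, and Spark MLlib's BisectingKMeans bisects all divisible clusters on the bottom level together), and maxNbClusters is not used because its role isn't specified on this page.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def centroid(points: Sequence[Point], members: List[int]) -> List[float]:
    # Component-wise mean of the selected points.
    dim = len(points[members[0]])
    return [sum(points[m][d] for m in members) / len(members) for d in range(dim)]


def split_in_two(points: Sequence[Point], members: List[int],
                 rng: random.Random, iters: int) -> Tuple[List[int], List[int]]:
    # 2-means (Lloyd iterations) on one cluster, seeded with two random members.
    i, j = rng.sample(members, 2)
    c0, c1 = list(points[i]), list(points[j])
    left: List[int] = members
    right: List[int] = []
    for _ in range(iters):
        left = [m for m in members if math.dist(points[m], c0) <= math.dist(points[m], c1)]
        right = [m for m in members if math.dist(points[m], c0) > math.dist(points[m], c1)]
        if not left or not right:
            break
        c0, c1 = centroid(points, left), centroid(points, right)
    if not left or not right:
        # Degenerate case (e.g. duplicate points): peel off one member.
        left, right = members[:-1], members[-1:]
    return left, right


def bisecting_kmeans(points: Sequence[Point], number_of_clusters: int,
                     seed: int, iters: int = 20) -> List[int]:
    # Top-down: start from one cluster and keep bisecting the largest one.
    rng = random.Random(seed)
    clusters: List[List[int]] = [list(range(len(points)))]
    while len(clusters) < number_of_clusters:
        largest = max(range(len(clusters)), key=lambda c: len(clusters[c]))
        if len(clusters[largest]) < 2:
            break  # nothing left to split
        left, right = split_in_two(points, clusters.pop(largest), rng, iters)
        clusters += [left, right]
    # Hard clustering: one cluster label per point.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for m in members:
            labels[m] = label
    return labels


# Hyperparameters copied from the templates above; maxNbClusters is ignored here.
hyper_parameters = {"maxNbClusters": 8, "numberOfClusters": 2, "seed": 1}
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels = bisecting_kmeans(points, hyper_parameters["numberOfClusters"], hyper_parameters["seed"])
```

The label numbering depends on the seed, but with well-separated groups like these the resulting partition does not.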

Observations

  • Prototype observations are saved in MongoDB:
{
  "_id": "ObjectId(63db9a83a52c4511df13c8f9)",
  "clustersNumber": 3,
  "prototypes": [
    {
      "clusterId": 0,
      "prototype": [
        {
          "columnName": "cp_a_1_bar",
          "value": 5.632708722788574e-17
        },
        {
          "columnName": "cp_a_2_bar",
          "value": -3.2857467549600007e-16
        },
        {
          "columnName": "cp_r_bar",
          "value": 7.510278297051431e-17
        }
      ]
    }
  ],
  "representationId": "ObjectId(63db9a82a52c4511df13c8f2)"
}
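A prototypes observation can be assembled from the input rows and the hard cluster labels. This minimal sketch assumes each prototype is the per-column mean of the rows in its cluster (the usual k-means centroid), keeps ids as plain strings in the same ObjectId(...) form shown above, and leaves out _id because MongoDB adds one on insert. The function name prototypes_observation and the sample rows are hypothetical; only the column names and document keys come from the example.

```python
import json
from typing import Dict, List, Sequence


def prototypes_observation(rows: Sequence[Dict[str, float]], labels: Sequence[int],
                           columns: List[str], representation_id: str) -> dict:
    # Group row indices by cluster label.
    members: Dict[int, List[int]] = {}
    for i, label in enumerate(labels):
        members.setdefault(label, []).append(i)
    prototypes = []
    for cluster_id in sorted(members):
        idx = members[cluster_id]
        # Prototype = per-column mean of the rows assigned to this cluster.
        prototype = [
            {"columnName": col, "value": sum(rows[i][col] for i in idx) / len(idx)}
            for col in columns
        ]
        prototypes.append({"clusterId": cluster_id, "prototype": prototype})
    return {
        "clustersNumber": len(prototypes),
        "prototypes": prototypes,
        "representationId": representation_id,
    }


# Hypothetical input rows and labels, reusing the column names from the example.
rows = [
    {"cp_a_1_bar": 1.0, "cp_a_2_bar": 2.0, "cp_r_bar": 0.0},
    {"cp_a_1_bar": 3.0, "cp_a_2_bar": 4.0, "cp_r_bar": 1.0},
    {"cp_a_1_bar": -2.0, "cp_a_2_bar": 0.0, "cp_r_bar": 5.0},
]
doc = prototypes_observation(rows, [0, 0, 1], ["cp_a_1_bar", "cp_a_2_bar", "cp_r_bar"],
                             "ObjectId(63db9a82a52c4511df13c8f2)")
print(json.dumps(doc, indent=2))
```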
  • hardClustering observations are saved in Parquet format; see Payload exchange schema.