How to fill in the Jupyter doc

Minimal requirements for an ipynb doc file

Files have to be named as follows :

  • Documentation_AlgoXXX // Documentation_QualityIndexYYY // …
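
The naming rule can be checked mechanically. Below is a minimal sketch in Python, assuming the `Documentation_` prefix is followed by a category and an alphanumeric name, and that files carry the `.ipynb` extension. The function name `is_valid_doc_name` and the category list are illustrative only; the list holds just the two categories shown above and would need extending for the ones elided by `…`.

```python
import re

# Assumed categories, taken from the two examples above; extend as needed.
CATEGORIES = ("Algo", "QualityIndex")

# Documentation_<Category><Name>.ipynb, where <Name> is alphanumeric (assumption).
_PATTERN = re.compile(
    r"^Documentation_(?:%s)[A-Za-z0-9]+\.ipynb$" % "|".join(CATEGORIES)
)


def is_valid_doc_name(filename: str) -> bool:
    """Return True if filename follows the assumed naming convention."""
    return _PATTERN.match(filename) is not None
```

For instance, `is_valid_doc_name("Documentation_AlgoKMeans.ipynb")` is accepted, while a bare `KMeans.ipynb` is rejected.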

Every doc file should contain at least :

  • Title : Name of the feature being described. Ex : K-Means. // QUESTION : Should the title be space separated or match the lib class name ? Ex : Hard Clustering / HardClustering
  • Labels / Tags :
  • Principle : A description as concise and informative as possible of :
    • What the algorithm does (to specify).
    • What kind of data it takes as input.
    • What kind of model it returns.
  • Scalability.
    • Complexity.
      • Computation : in $O(n)$, $O(n^2)$, $O(n \cdot k)$, …
      • Memory (optional if not easily accessible).
  • Input : The type of input data.
  • Parameters.
    • 1 : parameter 1 + description
    • 2 : parameter 2 + description
    • …
  • Output.
    • Start by stating the output model type, for ex : KMeansModel / Clusters / HardClustering.
    • Describe its features and meaning.
  • Associated visualizations : List of visualizations associated with the pipeline output.
  • Practical strategies
  • Recommended association
  • Business case
  • Usage

Each bullet point must be in its own Jupyter notebook cell.
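
To start from a consistent layout, a skeleton notebook with one markdown cell per section can be generated. The sketch below uses only the standard library and writes the notebook JSON directly (format 4.4, which does not require cell ids). The helper names `build_skeleton` and `write_skeleton`, and the `##` heading level, are assumptions made for illustration.

```python
import json

# Section titles taken from the checklist above, one per notebook cell.
SECTIONS = [
    "Title", "Labels / Tags", "Principle", "Scalability", "Input",
    "Parameters", "Output", "Associated visualization",
    "Practical strategies", "Recommended association",
    "Business case", "Usage",
]


def build_skeleton(sections=SECTIONS) -> dict:
    """Return a notebook dict (nbformat 4.4) with one markdown cell per section."""
    cells = [
        {"cell_type": "markdown", "metadata": {}, "source": [f"## {name}\n"]}
        for name in sections
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def write_skeleton(path: str) -> None:
    """Write the skeleton notebook to path as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_skeleton(), f, indent=1)
```

Calling `write_skeleton("Documentation_AlgoXXX.ipynb")` would produce an empty doc file ready to be filled in, cell by cell.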

Predictor

  • Input
    • Predictor hyperparameters
    • Query types
      • Single query
      • Collection of queries
  • Output
    • Single query
    • Multiple queries

Clustering Algorithms

  • Predictor : the default predictor associated with this output model.
    • If and only if the output is not a Clusters or a Clustering (Hard/Soft).
    • It contains the set of hyperparameters of the predictor, if any.
    • State that it can be applied to a single data point or to a collection, i.e. for clustering it returns a cluster id or a distribution for a single data point, and a clustering (hard/soft) for a collection.
      • It HAS to include :
        • Single data point : ClusterId (Int) / XXXDistribution.
        • Collection of data : HardClustering / SoftClustering.
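
The single / collection distinction can be illustrated in the Usage cell with a small example. The sketch below is a hypothetical nearest-centroid predictor in Python, not the library's actual API: the class name, `predict_one`, and `predict_many` are invented for illustration, and a plain list of cluster ids stands in for a HardClustering.

```python
from math import dist
from typing import List, Sequence

Vector = Sequence[float]


class NearestCentroidPredictor:
    """Hypothetical predictor used for illustration; not the library's API."""

    def __init__(self, centroids: List[Vector]):
        # The centroids play the role of the output model (e.g. a KMeansModel).
        self.centroids = centroids

    def predict_one(self, point: Vector) -> int:
        """Single data point -> cluster id (Int): index of the closest centroid."""
        return min(
            range(len(self.centroids)),
            key=lambda i: dist(point, self.centroids[i]),
        )

    def predict_many(self, points: List[Vector]) -> List[int]:
        """Collection -> one cluster id per point (stand-in for a HardClustering)."""
        return [self.predict_one(p) for p in points]
```

A soft variant would return a distribution over clusters for a single point instead of a single id.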

Quality indices

Internals

Externals

Explain the concept of ground truth.
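
A short worked example can help here: an external index compares the labels produced by an algorithm with reference labels known in advance (the ground truth). The sketch below uses the Rand index, chosen only for illustration since no specific index is named above; it is the fraction of point pairs on which the two labelings agree (both place the pair in the same cluster, or both place it in different clusters). The function name `rand_index` is an assumption.

```python
from itertools import combinations
from typing import Sequence


def rand_index(truth: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of point pairs on which ground truth and prediction agree."""
    if len(truth) != len(predicted):
        raise ValueError("both labelings must cover the same points")
    pairs = list(combinations(range(len(truth)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    # A pair counts as an agreement when it is grouped together in both
    # labelings, or separated in both labelings.
    agreements = sum(
        (truth[i] == truth[j]) == (predicted[i] == predicted[j])
        for i, j in pairs
    )
    return agreements / len(pairs)
```

The score does not depend on how clusters are numbered: `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` gives 1.0 because both labelings describe the same partition.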

Dimension reduction

Vectorizator