datalocation template
DataLocation
The DataLocation class describes all the information needed to know on which system data is stored (currently Mongo, Parquet or GCS) and how to reach it, using a strategy that may involve a path, a database, or a collection.
It comes in storage-specific variants for Parquet, CSV, and Mongo data, which differ in the content of the value key.
Common mandatory keys:
- _id: MongoId.
- name: user-defined name for this DataLocation. If it is not specified, it is named after the key name of its associated representation followed by “_dataLocation” (e.g. “myRepresentation_dataLocation”).
- databaseEngine: database engine used.
- environment: enumerator describing the environment used. It takes one of the following values: dev, preprod, prod.
- representationId: MongoId of the associated representation.
- value: describes how to access the related data. Its content depends on the databaseEngine value.
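The rules above (allowed environments, engine-specific value keys, and the default name) can be checked before a DataLocation is stored. The following is a minimal sketch, not the project's actual validator: the `VALUE_KEYS` mapping is an assumption read off the three templates on this page, and `validate_data_location` and `default_name` are hypothetical helper names.

```python
from typing import Any, Dict, Optional

# Allowed values of the "environment" key, as listed above.
ENVIRONMENTS = {"dev", "preprod", "prod"}

# Assumption: keys expected in "value" for each databaseEngine,
# inferred from the JSON templates on this page.
VALUE_KEYS = {
    "parquet": {"location", "restrictedToColumns"},
    "gcs": {"location", "restrictedToColumns"},
    "mongo": {"database", "collection"},
}

# "name" is left out because it gets a default value when missing.
COMMON_KEYS = {"_id", "databaseEngine", "environment", "representationId", "value"}


def default_name(representation_key_name: str) -> str:
    """Name given to a DataLocation when the user does not provide one."""
    return f"{representation_key_name}_dataLocation"


def validate_data_location(
    doc: Dict[str, Any], representation_key_name: Optional[str] = None
) -> Dict[str, Any]:
    """Check a DataLocation dict; return a copy with "name" filled in if missing."""
    missing = COMMON_KEYS - set(doc)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if doc["environment"] not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {doc['environment']!r}")
    engine = doc["databaseEngine"]
    if engine not in VALUE_KEYS:
        raise ValueError(f"unknown databaseEngine: {engine!r}")
    missing_value = VALUE_KEYS[engine] - set(doc["value"])
    if missing_value:
        raise ValueError(f"value is missing keys for {engine}: {sorted(missing_value)}")
    result = dict(doc)  # shallow copy: the input dict is left untouched
    if "name" not in result:
        if representation_key_name is None:
            raise ValueError("no name given and no representation key name to derive one")
        result["name"] = default_name(representation_key_name)
    return result
```

Passing the representation's key name lets the helper apply the “_dataLocation” naming rule only when no explicit name is present.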
(datalocation_parquet)=
Parquet data storage
value key description:
- location: Parquet folder.
- restrictedToColumns: array of strings listing the names of the columns to extract. If all columns are taken, the array is empty.
DataLocation JSON template:
{
"_id": "62bc108b8c51f362811989c8",
"name": "processingData_TS",
"databaseEngine": "parquet",
"environment": "dev",
"representationId": "62bc108b8c51f362811989c6",
"value": {
"location": "gs://hephia-database-dev/parquet/industry/processingData",
"restrictedToColumns": [
"col1",
"col2"
]
}
}
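Because an empty restrictedToColumns array means "all columns", it has to be translated before being handed to a reader that expects an explicit column list. The sketch below assumes pandas is used to read the Parquet folder (`pandas.read_parquet` accepts a `columns` argument where `None` means every column); reading `gs://` paths with pandas additionally requires gcsfs. `parquet_columns` and `load_parquet` are hypothetical names, not part of the documented class.

```python
from typing import Any, Dict, List, Optional


def parquet_columns(value: Dict[str, Any]) -> Optional[List[str]]:
    """Map restrictedToColumns to a pandas columns argument.

    An empty array means every column is taken, which pandas expresses as None.
    """
    columns = value.get("restrictedToColumns", [])
    return list(columns) or None


def load_parquet(data_location: Dict[str, Any]):
    """Read the data described by a DataLocation whose databaseEngine is "parquet"."""
    if data_location["databaseEngine"] != "parquet":
        raise ValueError("expected a parquet DataLocation")
    # Imported lazily so the helper above works without pandas installed.
    # Assumption: a Parquet engine (pyarrow) and gcsfs are available for gs:// paths.
    import pandas as pd

    value = data_location["value"]
    return pd.read_parquet(value["location"], columns=parquet_columns(value))
```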
(datalocation_csv_gcs)=
CSV in GCS data storage
value key description:
- location: path of the CSV file in GCS.
- restrictedToColumns: array of strings listing the names of the columns to extract. If all columns are taken, the array is empty.
DataLocation JSON template for a CSV file:
{
"_id": "62bc108b8c51f362811989c8",
"name": "numericalData_TS",
"databaseEngine": "gcs",
"environment": "dev",
"representationId": "62bc108b8c51f362811989c6",
"value": {
"location": "gs://hephia-database-dev/parquet/clientName/customFile.csv",
"restrictedToColumns": [
"col1",
"col2"
]
}
}
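For a CSV file, the restrictedToColumns rule amounts to projecting each row onto the listed columns. The sketch below uses only Python's standard `csv` module and assumes the file content has already been fetched from GCS as text (for example through gcsfs); the download step is deliberately left out, and `read_restricted_csv` is a hypothetical helper name.

```python
import csv
from typing import Dict, Iterable, List


def read_restricted_csv(
    lines: Iterable[str], restricted_to_columns: List[str]
) -> List[Dict[str, str]]:
    """Parse CSV lines and keep only the columns listed in restrictedToColumns.

    An empty restrictedToColumns array means every column is kept.
    """
    reader = csv.DictReader(lines)
    if not restricted_to_columns:
        return [dict(row) for row in reader]
    header = reader.fieldnames or []  # reading fieldnames consumes the header row
    missing = [col for col in restricted_to_columns if col not in header]
    if missing:
        raise KeyError(f"columns not found in CSV header: {missing}")
    return [{col: row[col] for col in restricted_to_columns} for row in reader]
```

Failing loudly on an unknown column name, rather than silently returning fewer columns, makes a typo in the DataLocation easier to spot.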
(datalocation_mongo)=
Mongo data storage
value key description: a dictionary containing two keys:
- database: Mongo database in which the collection to search is stored. It is often the same as the environment key value.
- collection: name of the collection in MongoDB.
DataLocation JSON template:
{
"_id": "62bc108b8c51f362811989c8",
"databaseEngine": "mongo",
"environment": "dev",
"representationId": "62bc108b8c51f362811989c6",
"value": {
"database": "dev",
"collection": "representations_som_demo_7"
}
}
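The Mongo variant resolves to a database and collection pair. The sketch below shows one way to query it with pymongo (`MongoClient(uri)[database][collection].find(filter)`). The connection string is not stored in a DataLocation, so `mongo_uri` is a hypothetical parameter, as are the helper names `mongo_target` and `find_documents`.

```python
from typing import Any, Dict, Optional, Tuple


def mongo_target(data_location: Dict[str, Any]) -> Tuple[str, str]:
    """Return the (database, collection) pair of a mongo DataLocation."""
    if data_location["databaseEngine"] != "mongo":
        raise ValueError("expected a mongo DataLocation")
    value = data_location["value"]
    return value["database"], value["collection"]


def find_documents(
    data_location: Dict[str, Any], mongo_uri: str, query: Optional[dict] = None
) -> list:
    """Fetch the documents of the collection the DataLocation points to.

    mongo_uri is an assumption: the connection string must come from elsewhere.
    """
    from pymongo import MongoClient  # requires the pymongo package

    database, collection = mongo_target(data_location)
    client = MongoClient(mongo_uri)
    try:
        return list(client[database][collection].find(query or {}))
    finally:
        client.close()  # release the connection pool once results are read
```

With the template above, `mongo_target` returns the pair ("dev", "representations_som_demo_7").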