MTS LBM SOM

Principle

This pipeline works on multivariate time series (MTS) data, where an MTS value is a set of $d$ monovariate time series. It first vectorizes the raw input data with a Fourier transform into LBM_Input_DataType (to be defined, i.e. $[R^{d_2}]$, an array of size $d$, where $d_2$ is the number of Fourier coefficients kept). We call this vectorization Vec1.
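A minimal sketch of this Fourier vectorization, using NumPy. The function name `fourier_vectorize`, the choice of keeping the magnitudes of the first $d_2$ real-FFT coefficients, and the toy dataset are assumptions made for illustration; the pipeline itself may select or encode the coefficients differently.

```python
import numpy as np

def fourier_vectorize(mts, d2):
    """Map one MTS of shape (d, T) to an array of shape (d, d2).

    Assumption: each monovariate series is summarized by the magnitudes
    of its first d2 real-FFT coefficients.
    """
    mts = np.asarray(mts, dtype=float)
    coeffs = np.fft.rfft(mts, axis=1)  # complex, shape (d, T // 2 + 1)
    if d2 > coeffs.shape[1]:
        raise ValueError("d2 exceeds the number of available coefficients")
    return np.abs(coeffs[:, :d2])

# Toy dataset (hypothetical): n = 5 MTS values, d = 3 series, T = 32 steps.
rng = np.random.default_rng(0)
dataset = rng.normal(size=(5, 3, 32))
vec1 = np.stack([fourier_vectorize(x, d2=8) for x in dataset])
print(vec1.shape)  # (5, 3, 8)
```

Each MTS value becomes an array of $d$ vectors in $R^{d_2}$, which matches the $[R^{d_2}]$ shape described above.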

After that, the PCA dimension reduction algorithm is applied to each vectorized feature to form a second vectorization, which we call Vec2.
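One way to read "PCA on each vectorized feature" is to fit an independent PCA per feature across the dataset. The sketch below does this with an SVD in NumPy; the helper names and the `n_components` parameter are hypothetical, and the number of components actually retained by the pipeline is not stated here.

```python
import numpy as np

def pca_fit_transform(x, n_components):
    """Project the rows of x (n, p) onto their first n_components principal axes."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def pca_per_feature(vec1, n_components):
    """Apply an independent PCA to each of the d features of vec1 (n, d, d2)."""
    n, d, _ = vec1.shape
    out = np.empty((n, d, n_components))
    for j in range(d):
        out[:, j, :] = pca_fit_transform(vec1[:, j, :], n_components)
    return out

# Hypothetical Vec1: n = 20 values, d = 3 features, d2 = 8 coefficients.
rng = np.random.default_rng(0)
vec1 = rng.normal(size=(20, 3, 8))
vec2 = pca_per_feature(vec1, n_components=4)
print(vec2.shape)  # (20, 3, 4)
```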

Once this is done, we apply the LBM algorithm to Vec2, which returns an LBMModel.

We also generate a temporary vectorization, Vec3, by concatenating the Vec2 features ($[R^{d_2}]$) into new data values of type $R^{d_3}$, where $d_3 = d \times d_2$. A second round of PCA then reduces its dimensionality to $d_4 \lt d_3$, which gives the last vectorization, Vec4.
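The concatenation and the second PCA round can be sketched as follows. The array sizes and the value of $d_4$ are arbitrary illustrative choices, and the PCA is done with a plain SVD rather than whatever implementation the pipeline uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d2 = 30, 3, 4
vec2 = rng.normal(size=(n, d, d2))  # hypothetical Vec2

# Vec3: concatenate the d feature vectors of each value, so d3 = d * d2.
vec3 = vec2.reshape(n, d * d2)

# Second PCA round via SVD, keeping d4 < d3 components.
d4 = 5
centered = vec3 - vec3.mean(axis=0)
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
vec4 = centered @ vt[:d4].T

# Share of the total variance kept by the d4 components.
explained = (singular_values[:d4] ** 2).sum() / (singular_values ** 2).sum()
print(vec3.shape, vec4.shape)  # (30, 12) (30, 5)
```

The `explained` ratio is one possible criterion for choosing $d_4$.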

The SOM algorithm can then be applied to the Vec4 dataset to output a grid of prototypes in $R^{d_4}$, wrapped in a SOMModel. Staying in the reduced $R^{d_4}$ space, we then look for the $K$ nearest neighbors ($K$NN) of each SOMModel prototype within Vec4. Recall that the $K$NN are real data values, even though they have been transformed from their original MTS space to $R^{d_4}$, in contrast to the SOMModel prototypes, which are virtual. Once the $K$NN are obtained, their ids link them back to their original representation, which can be visualized.
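To make the prototype and $K$NN step concrete, here is a minimal online SOM followed by a brute-force nearest-neighbor search, both in NumPy. This is a sketch and not the SOMModel implementation: the function names, the linear decay schedules, the Gaussian neighborhood, and all hyperparameters are assumptions. Row indices of the data play the role of the ids.

```python
import numpy as np

def train_som(data, rows, cols, n_iter=500, lr0=0.5, sigma0=None, seed=0):
    """Train a minimal online SOM; returns prototypes of shape (rows * cols, dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    # Grid coordinates of each prototype (row-major order).
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize prototypes on randomly chosen data points.
    protos = data[rng.integers(0, len(data), size=rows * cols)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                 # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3    # linearly shrinking neighborhood
        x = data[rng.integers(0, len(data))]
        bmu = np.argmin(((protos - x) ** 2).sum(axis=1))  # best matching unit
        grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        protos += lr * h[:, None] * (x - protos)
    return protos

def prototype_knn(protos, data, k):
    """For each prototype, return the row indices (ids) of its k nearest data points."""
    dist2 = ((protos[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(dist2, axis=1)[:, :k]

# Hypothetical Vec4: n = 100 values in R^5.
rng = np.random.default_rng(1)
vec4 = rng.normal(size=(100, 5))
protos = train_som(vec4, rows=3, cols=3)
knn_ids = prototype_knn(protos, vec4, k=4)
print(protos.shape, knn_ids.shape)  # (9, 5) (9, 4)
```

The prototypes are points of $R^{d_4}$ that need not coincide with any data value, whereas `knn_ids` always refers to real rows of the dataset.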

It is then possible to look at a customized version of the standard SOM prototype grid. As in every SOM prototype grid visualization, there are as many grids as features in the input dataset; in each grid, cells represent the prototypes' values for the given feature. For a numerical vector dataset, a color gradient is used to represent the intensity of the value within the range of a given feature. Here, each grid directly visualizes a single MTS feature, i.e. one of its dimensions, of type monovariate time series. The specificity here is that rather than visualizing a color within a gradient, we look at a single feature of the $K$NN values, mapped back to their input dataset type, for a given virtual prototype. Each cell then shows the monovariate time series of the $K$NN of the associated virtual SOMModel prototype.
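The data behind such grids can be assembled as in the sketch below, which gathers, for every feature and every grid cell, the $K$ original monovariate series to draw. The function name `feature_grids`, the row-major mapping from prototype index to cell, and the random inputs are assumptions for illustration.

```python
import numpy as np

def feature_grids(dataset, knn_ids, rows, cols):
    """Return an array of shape (d, rows, cols, K, T).

    grids[j, r, c] holds feature j of the K original MTS values that are
    nearest to the prototype at grid cell (r, c), assuming prototypes are
    indexed in row-major order (index = r * cols + c).
    """
    dataset = np.asarray(dataset)
    n_protos, k = knn_ids.shape
    if n_protos != rows * cols:
        raise ValueError("knn_ids must have one row per grid cell")
    neighbors = dataset[knn_ids]                 # (rows * cols, K, d, T)
    neighbors = neighbors.transpose(2, 0, 1, 3)  # (d, rows * cols, K, T)
    return neighbors.reshape(dataset.shape[1], rows, cols, k, dataset.shape[2])

# Hypothetical inputs: 50 MTS values (d = 3, T = 16) and K = 4 ids per prototype.
rng = np.random.default_rng(0)
dataset = rng.normal(size=(50, 3, 16))
knn_ids = rng.integers(0, 50, size=(6, 4))
grids = feature_grids(dataset, knn_ids, rows=2, cols=3)
print(grids.shape)  # (3, 2, 3, 4, 16)
```

Each `grids[j, r, c]` is a small set of $K$ series of length $T$, which a plotting library could draw as overlaid curves in the cell at row `r` and column `c` of the grid for feature `j`.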