Talk: Data Fusion based on Coupled Matrix and Tensor Factorisations
When the goal is to discover the underlying patterns in a complex system such as the human metabolism or the human brain, the complexity of the problem necessitates the collection and analysis of data from multiple sources. Therefore, data fusion, i.e., knowledge extraction by jointly analyzing complementary data sets, is a topic of interest in many fields. For instance, in metabolomics, analytical platforms such as Liquid Chromatography-Mass Spectrometry (LC-MS) and Nuclear Magnetic Resonance (NMR) spectroscopy are used for chemical profiling of biological samples. Different platforms detect different chemical compounds with different levels of sensitivity, so fusing their measurements has the potential to provide a more complete picture of the metabolome related to a specific condition. However, data fusion remains challenging because there is a lack of data mining tools that can jointly analyze incomplete (i.e., with missing entries) and heterogeneous (i.e., in the form of higher-order tensors and matrices) data sets while capturing the underlying shared and unshared patterns.
We formulate data fusion as a coupled matrix and tensor factorization (CMTF) problem and discuss its extension to structure-revealing data fusion, i.e., fusion models that can identify shared and unshared factors in coupled data sets. To solve the coupled factorization problem, we use an all-at-once optimization approach, which optimizes all factor matrices simultaneously and easily extends to the coupled analysis of data sets with missing entries. Numerical experiments on simulated and experimental coupled data sets demonstrate that, while traditional methods based on matrix factorizations are limited in their ability to jointly analyze heterogeneous data sets, the structure-revealing CMTF model can successfully capture the underlying patterns by exploiting the low-rank structure of higher-order tensors. We will show the broad impact of CMTF-based fusion models through applications in metabolomics, neuroscience and recommender systems.
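To make the coupled formulation concrete, here is a minimal sketch of a plain CMTF fit, assuming a third-order tensor X (I x J x K) and a matrix Y (I x L) that share their first mode (e.g., samples), with a CP model for X and a bilinear model for Y, so that the loss is 0.5*||W_X * (X - [[A, B, C]])||^2 + 0.5*||W_Y * (Y - A V^T)||^2 with binary masks W_X, W_Y for missing entries. All factors are stacked into one vector and optimized together with SciPy's L-BFGS-B, which is one way to realize an all-at-once approach. The function names (`cp_full`, `cmtf`), the NaN convention for missing entries, and the multi-start scheme are illustrative assumptions, not the speakers' implementation, and the sketch omits the extra penalty terms that structure-revealing variants add to separate shared from unshared components.

```python
import numpy as np
from scipy.optimize import minimize


def cp_full(A, B, C):
    """Rebuild a third-order tensor from CP factors: T[i,j,k] = sum_r A[i,r] B[j,r] C[k,r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cmtf(X, Y, rank, n_starts=3, seed=0):
    """Hypothetical sketch: fit X ~ [[A, B, C]] and Y ~ A V^T with a shared factor A.

    Missing entries are marked as NaN and receive weight 0 in the loss.
    All factors are optimized at once with L-BFGS-B on one stacked vector.
    """
    WX, WY = ~np.isnan(X), ~np.isnan(Y)            # True = observed, False = missing
    X0, Y0 = np.where(WX, X, 0.0), np.where(WY, Y, 0.0)
    (I, J, K), L = X.shape, Y.shape[1]
    shapes = [(I, rank), (J, rank), (K, rank), (L, rank)]
    sizes = [r * c for r, c in shapes]
    splits = np.cumsum(sizes)[:-1]

    def unpack(x):
        # Split the stacked parameter vector back into A, B, C, V.
        return [p.reshape(s) for p, s in zip(np.split(x, splits), shapes)]

    def loss_and_grad(x):
        A, B, C, V = unpack(x)
        EX = WX * (cp_full(A, B, C) - X0)          # masked tensor residual
        EY = WY * (A @ V.T - Y0)                   # masked matrix residual
        f = 0.5 * np.sum(EX**2) + 0.5 * np.sum(EY**2)
        # The shared factor A collects gradient terms from both data sets.
        gA = np.einsum("ijk,jr,kr->ir", EX, B, C) + EY @ V
        gB = np.einsum("ijk,ir,kr->jr", EX, A, C)
        gC = np.einsum("ijk,ir,jr->kr", EX, A, B)
        gV = EY.T @ A
        return f, np.concatenate([g.ravel() for g in (gA, gB, gC, gV)])

    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):                      # keep the best of a few random starts
        x0 = rng.standard_normal(sum(sizes))
        res = minimize(loss_and_grad, x0, jac=True, method="L-BFGS-B")
        if best is None or res.fun < best.fun:
            best = res
    return unpack(best.x)
```

Because the masks simply zero out the residuals of missing entries, the same gradient-based loop handles incomplete data without any imputation step; an alternating scheme would instead update one factor at a time.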