The defense will take place at the OsloMet campus, Pilestredet 35, Ellen Gleditschs hus: PI254. Prior to the defense, at 10:00, there will be a trial lecture (title to be announced).
Main research findings
Data fusion is the task of jointly analyzing multiple interrelated data sets such that they can interact and inform each other. It is an indispensable data analysis approach in application areas such as medicine, chemometrics and remote sensing, where information about the same phenomenon is acquired from multiple modalities, e.g. multiple sensing technologies. While none of the modalities alone can provide a complete picture of the phenomenon, data from different modalities can complement each other. For instance, imaging techniques such as electroencephalography (EEG) and functional magnetic resonance imaging (fMRI) provide complementary temporal and spatial resolutions of brain activity.
Data can often be represented in the form of matrices and higher-order tensors, i.e. multiway arrays. EEG imaging data, for example, can be organized as a three-way tensor with modes subjects, time and electrodes. Coupled matrix and tensor factorizations, which model each data set as a sum of low-rank components, are an effective approach for the joint analysis of such data sets and can be used to extract interpretable latent patterns that give insight into the underlying processes generating the data.
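The idea of modeling coupled data sets as sums of low-rank components can be illustrated with a minimal NumPy sketch. It assumes a three-way tensor (subjects x time x electrodes) that follows a CP model and a matrix (subjects x features) that shares the subject-mode factor with it. The sizes, the rank, the names A, B, C and V and the helper coupled_loss are hypothetical choices for illustration only and are not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: a subjects x time x electrodes tensor
# and a subjects x features matrix.
n_subjects, n_time, n_electrodes, n_features = 6, 5, 4, 3
rank = 2

# Factor matrices. A (subjects mode) is shared by both data sets,
# which is what couples the two factorizations.
A = rng.standard_normal((n_subjects, rank))
B = rng.standard_normal((n_time, rank))
C = rng.standard_normal((n_electrodes, rank))
V = rng.standard_normal((n_features, rank))

# CP model: the tensor is a sum of `rank` rank-one terms a_r o b_r o c_r.
X = np.einsum("ir,jr,kr->ijk", A, B, C)

# Coupled matrix: a sum of `rank` rank-one terms a_r v_r^T.
Y = A @ V.T


def coupled_loss(X, Y, A, B, C, V):
    """Sum of squared residuals of both data sets under the shared factor A."""
    X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
    Y_hat = A @ V.T
    return np.sum((X - X_hat) ** 2) + np.sum((Y - Y_hat) ** 2)


loss_true = coupled_loss(X, Y, A, B, C, V)
loss_perturbed = coupled_loss(X, Y, A + 0.1, B, C, V)
```

In this sketch the loss is a plain sum of squared errors and the coupling is exact sharing of one factor matrix. The text that follows describes why real data often call for other loss functions, other coupling structures and additional constraints.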
However, data sets obtained from multiple sources are often heterogeneous, which poses many challenges in data fusion. For instance, the data sets can consist of different data types, can have different sizes, dimensions and noise characteristics, can be recorded with different sampling rates, or can be of both dynamic and static nature. Furthermore, data sets can have both shared and unshared components. To account for the different characteristics of the data sets, coupled matrix and tensor factorization models need to incorporate different tensor decomposition models, different loss functions and diverse types of coupling structures between data sets. In addition, various constraints and regularizations are often needed to promote identifiability and interpretability of the extracted patterns.
In the first part of this thesis, a coupled matrix and tensor factorization model that has the potential to automatically reveal shared and unshared components is applied to a multi-modal neuroimaging data set, and potential biomarkers of a psychiatric disorder are extracted. We present a systematic study of this model for biomarker discovery, demonstrating both its effectiveness and its limitations.
In the main part of the thesis, a flexible algorithmic framework for constrained linearly coupled matrix and tensor factorizations is proposed. The framework supports a wide range of important constraints, regularizations and loss functions as well as linear coupling relations in a seamless way. The framework facilitates the use of two different tensor decomposition models, namely the popular CANDECOMP/PARAFAC (CP) model as well as the PARAFAC2 model. Furthermore, we introduce a new algorithm for fitting PARAFAC2 models that makes it possible to flexibly impose various constraints on all modes of PARAFAC2.
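For readers unfamiliar with PARAFAC2, the following NumPy sketch builds synthetic data that follow the commonly used formulation of the model: each slice is X_k = A diag(c_k) B_k^T, where the factor B_k may change from slice to slice as long as B_k^T B_k stays the same for all k. This is only an illustration of the model structure under assumed sizes and variable names (A, C, Delta, P_k). It is not the fitting algorithm proposed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

n_rows, rank, n_slices = 5, 2, 3
# Hypothetical slice sizes: the evolving mode may differ in length per slice.
cols_per_slice = [4, 6, 7]

A = rng.standard_normal((n_rows, rank))    # shared across slices
C = rng.standard_normal((n_slices, rank))  # row k holds the weights c_k
Delta = rng.standard_normal((rank, rank))  # fixes the common cross-product

slices, B_list = [], []
for k, n_cols in enumerate(cols_per_slice):
    # P_k has orthonormal columns, so B_k = P_k @ Delta satisfies
    # B_k^T B_k = Delta^T Delta for every slice k.
    P_k, _ = np.linalg.qr(rng.standard_normal((n_cols, rank)))
    B_k = P_k @ Delta
    B_list.append(B_k)
    # Slice model: X_k = A diag(c_k) B_k^T
    slices.append(A @ np.diag(C[k]) @ B_k.T)

# The cross-products of the evolving factors are identical across slices.
grams = [B_k.T @ B_k for B_k in B_list]
```

Because only the cross-product B_k^T B_k is held fixed, the patterns in B_k themselves are free to evolve from slice to slice, which is the property the abstract refers to for dynamic data.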
We show through experiments on synthetic data that our proposed approach can accurately extract the true underlying components in a variety of settings, and that its computational efficiency is competitive with, and in some cases superior to, that of state-of-the-art methods. Furthermore, we demonstrate the promise of PARAFAC2-based coupled matrix and tensor factorization models for the joint analysis of dynamic and static data sets, building on the ability of the PARAFAC2 model to account for either evolving patterns or individual time profiles in dynamic data. Experiments on real data from chemometrics and remote sensing show the versatility and applicability of the proposed framework with various constraints and linear coupling structures.
- First opponent: Nikolaos Sidiropoulos, Professor, Ph.D., Electrical and Computer Engineering, University of Virginia, Virginia, USA.
- Second opponent: Borbála Hunyadi, Assistant Professor, Ph.D., Department of Microelectronics, Faculty of Electrical Engineering, Mathematics and Computer Science, TU Delft, Netherlands.
Chair of the committee
- Hugo Lewi Hammer, Professor, Ph.D., Department of Computer Science, Faculty of Technology, Art and Design, OsloMet, Oslo, Norway.
Leader of the public defense
- Anis Yazidi, Professor, Department of Computer Science, Innovation, Digital Transformation and Sustainability, Faculty of Technology, Art and Design, OsloMet, Oslo.
- Main supervisor: Evrim Acar Ataman, Head of Department, Chief Research Scientist, Research Professor, Simula.
- Co-supervisor: Jeremy E. Cohen, CNRS researcher at CREATIS, Department Myriad, Centre de recherche en acquisition et traitement de l’image pour la santé, Lyon.