Knowledge-guided machine learning for interpretable pattern discovery
The goal of the project is to develop unsupervised machine learning methods that will guide real data analysis with mechanistic models and reveal interpretable patterns to extract insights from complex data.
To understand complex systems such as the human metabolome (i.e., the small biochemical compounds in the body) or the human brain, different sensing technologies are used. These technologies generate complex data sets that are noisy, incomplete, multiway (i.e., with more than two axes of variation, such as a people by metabolites by time array) and multimodal (i.e., data from different modalities such as genetics, microbiome and metabolomics). Such data sets need to be analyzed by data mining methods that can turn the data into knowledge, for instance, by revealing unknown stratifications of people and improving the understanding of the underlying biochemical processes. While unsupervised methods have successfully revealed interpretable patterns from such complex data, they have so far been mainly data-driven and learn the patterns under specific structural assumptions, e.g., that the data follow a low-rank structure.
This project focuses on incorporating mechanistic models in unsupervised learning, so that the analysis of real data is guided by the prior scientific information encapsulated in those models. As an application, we will focus on longitudinal metabolomics data analysis: we will jointly analyze dynamic (time-resolved) metabolomics data from a COPSAC (Copenhagen Prospective Studies on Asthma in Childhood) cohort together with dynamic metabolomics data simulated using a human whole-body metabolic model.
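To make the terms "multiway" and "low-rank" concrete, the sketch below builds a small synthetic people by metabolites by time array as a sum of two rank-one components (a CP-type structure). All sizes, variable names and the random factors are illustrative assumptions, not properties of the COPSAC data or of the project's models.

```python
import numpy as np

def cp_tensor(A, B, C):
    """Build a three-way array with X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
n_subjects, n_metabolites, n_timepoints, rank = 20, 15, 8, 2

A = rng.random((n_subjects, rank))     # one score per subject and component
B = rng.random((n_metabolites, rank))  # one loading per metabolite and component
C = rng.random((n_timepoints, rank))   # one temporal profile per component

X = cp_tensor(A, B, C)                 # subjects x metabolites x time array

# Unfolding along the subject mode gives a subjects x (metabolites * time) matrix.
# By construction its rank cannot exceed the number of components.
X_subjects = X.reshape(n_subjects, -1)
unfolding_rank = np.linalg.matrix_rank(X_subjects)

print("array shape:", X.shape)
print("rank of subject-mode unfolding:", unfolding_rank)
```

In such a model each component couples a group of subjects, a group of metabolites and a temporal profile, which is what makes the extracted patterns interpretable. Real measurements would additionally contain noise and missing entries, which this sketch leaves out.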
Learning outcome
- Knowledge-guided Machine Learning (KGML): KGML is a rapidly emerging field and there have been significant efforts in this field under various names (e.g., informed ML, physics-informed deep learning, physics-guided neural networks) to incorporate prior information. This thesis will cover the relevant state-of-the-art.
- (Coupled) Tensor Factorizations: Tensor factorizations are effective at revealing insights from multiway data sets, and they have been extended to the joint analysis of data from multiple sources. The project will involve hands-on experience with algorithms and models based on (coupled) tensor factorizations.
- Interdisciplinary collaboration: You will learn to interact and communicate with domain experts.
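As a rough illustration of the coupled tensor factorizations mentioned in the learning outcomes, the sketch below fits CP models to two three-way arrays by alternating least squares, with the two models sharing their metabolite-mode factor. It is a minimal sketch under stated assumptions and not the method the project will develop: the choice of coupling through the metabolite mode, the function names (`coupled_cp_als`, `solve_factor`, `relative_error`), the sizes and the synthetic data are all hypothetical, and constraints and missing-data handling are omitted. Here `X` stands in for measured data and `Y` for data simulated from a mechanistic model; both are randomly generated.

```python
import numpy as np

def cp_tensor(A, B, C):
    """Three-way array with entries sum_r A[i, r] * B[j, r] * C[k, r]."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def solve_factor(mttkrp, gram):
    """Least-squares factor update given the MTTKRP and the Gram matrix."""
    return mttkrp @ np.linalg.pinv(gram)

def relative_error(X, Y, f):
    """Combined relative fit error over both arrays."""
    res = np.sum((X - cp_tensor(f["A"], f["B"], f["C"])) ** 2)
    res += np.sum((Y - cp_tensor(f["D"], f["B"], f["E"])) ** 2)
    return np.sqrt(res / (np.sum(X ** 2) + np.sum(Y ** 2)))

def coupled_cp_als(X, Y, rank, n_iter=200, seed=0):
    """Fit X ~ [[A, B, C]] and Y ~ [[D, B, E]] with a shared factor B."""
    rng = np.random.default_rng(seed)
    f = {
        "A": rng.random((X.shape[0], rank)),
        "B": rng.random((X.shape[1], rank)),  # shared metabolite factor
        "C": rng.random((X.shape[2], rank)),
        "D": rng.random((Y.shape[0], rank)),
        "E": rng.random((Y.shape[2], rank)),
    }
    history = []
    for _ in range(n_iter):
        A, B, C, D, E = (f[k] for k in "ABCDE")
        # Factors that belong to only one array use that array alone.
        A = solve_factor(np.einsum("ijk,jr,kr->ir", X, B, C), (B.T @ B) * (C.T @ C))
        C = solve_factor(np.einsum("ijk,ir,jr->kr", X, A, B), (A.T @ A) * (B.T @ B))
        D = solve_factor(np.einsum("ijk,jr,kr->ir", Y, B, E), (B.T @ B) * (E.T @ E))
        E = solve_factor(np.einsum("ijk,ir,jr->kr", Y, D, B), (D.T @ D) * (B.T @ B))
        # The shared factor pools the normal equations of both arrays.
        B = solve_factor(
            np.einsum("ijk,ir,kr->jr", X, A, C) + np.einsum("ijk,ir,kr->jr", Y, D, E),
            (A.T @ A) * (C.T @ C) + (D.T @ D) * (E.T @ E),
        )
        f = {"A": A, "B": B, "C": C, "D": D, "E": E}
        history.append(relative_error(X, Y, f))
    return f, history

# Synthetic example: two arrays generated with the same metabolite loadings.
rng = np.random.default_rng(1)
B_true = rng.standard_normal((15, 2))
X = cp_tensor(rng.standard_normal((20, 2)), B_true, rng.standard_normal((8, 2)))
Y = cp_tensor(rng.standard_normal((10, 2)), B_true, rng.standard_normal((8, 2)))
X = X + 0.01 * rng.standard_normal(X.shape)  # small noise on the "measured" array

factors, history = coupled_cp_als(X, Y, rank=2)
print(f"relative error after first sweep: {history[0]:.4f}")
print(f"relative error after last sweep:  {history[-1]:.4f}")
```

Each update solves its least-squares subproblem exactly, so the combined error cannot increase from one sweep to the next. The only difference from fitting the two arrays separately is the update of `B`, which uses information from both arrays at once.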
Qualifications
- Background in numerical linear algebra, statistics and machine learning is required.
- Experience in mathematical modelling and real data analysis is a plus.
Supervisors
- Evrim Acar
Collaboration partners
- University of Amsterdam
- COpenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Denmark
References
- L. Li, H. Hoefsloot, B. M. Bakker, D. Horner, M. A. Rasmussen, A. K. Smilde, and E. Acar, Longitudinal metabolomics data analysis informed by mechanistic models, bioRxiv, 2024