Extracting insights from multiple metabolomics data sets through data fusion
The project focuses on the joint analysis of nuclear magnetic resonance (NMR) spectroscopy measurements of plasma and urine samples, together with faecal metabolome data, using interpretable multimodal data mining.
More specifically, the project focuses on complex biological data sets: NMR measurements of plasma and urine samples and faecal metabolome data collected during a meal challenge test in the COPSAC2000 cohort. The study will use data from 299 generally healthy individuals. Analyzing these data sets is challenging because they are multimodal (i.e., they come from different sources) and higher-order (also referred to as multiway), and some are dynamic while others are static. The analysis becomes even more challenging when the aim is to extract interpretable patterns that reveal insights, for instance how individuals differ in their metabolic response to food, how these differences appear in the plasma, urine and faecal metabolomes, and whether the extracted patterns are related to metabolic dysfunction. Tensor factorizations have been used successfully to reveal underlying patterns in higher-order tensors, and they have been extended to the joint analysis of multimodal data through coupled matrix and tensor factorizations (CMTF). We will use CMTF-based approaches to jointly analyze these data sets and assess the performance of the methods in terms of pattern discovery.
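To make the idea of coupling concrete, the following is a minimal sketch of CMTF in Python/NumPy. It assumes a third-order tensor X (subjects × metabolites × time, e.g., dynamic plasma measurements) and a matrix Y (subjects × features, e.g., static measurements) that share the subject mode, and it minimises ||X − [[A, B, C]]||² + ||Y − A Vᵀ||² by alternating least squares, where [[A, B, C]] is a CP model of X and the subject factor A is shared by both terms. All function names are hypothetical, the data in the usage part are simulated, and the sketch ignores preprocessing, relative weighting of the two terms and missing values; the referenced studies may use different CMTF formulations and solvers.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product: column r equals kron(U[:, r], V[:, r])."""
    return np.einsum("ir,jr->ijr", U, V).reshape(-1, U.shape[1])


def unfold(X, mode):
    """Mode-n unfolding in C order, so unfold(X, 0) == A @ khatri_rao(B, C).T for a CP tensor."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def cp_reconstruct(A, B, C):
    """Rebuild a third-order tensor from its CP factor matrices."""
    return np.einsum("ir,jr,kr->ijk", A, B, C)


def cmtf_als(X, Y, rank, n_iter=500, seed=0):
    """Hypothetical CMTF-ALS: X (I x J x K) and Y (I x M) share the subject factor A.

    Minimises ||X - [[A, B, C]]||^2 + ||Y - A V^T||^2 with equal weights and
    complete data. Returns the factors and the loss after every sweep.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.standard_normal((n, rank)) for n in (I, J, K))
    V = rng.standard_normal((Y.shape[1], rank))
    losses = []
    for _ in range(n_iter):
        # Shared factor: A @ G = RHS, with G = (B'B * C'C) + V'V (* is the Hadamard product)
        G = (B.T @ B) * (C.T @ C) + V.T @ V
        rhs = unfold(X, 0) @ khatri_rao(B, C) + Y @ V
        A = np.linalg.solve(G, rhs.T).T
        # Tensor-only factors: the usual CP-ALS updates, since Y does not involve B or C
        B = np.linalg.solve((A.T @ A) * (C.T @ C), (unfold(X, 1) @ khatri_rao(A, C)).T).T
        C = np.linalg.solve((A.T @ A) * (B.T @ B), (unfold(X, 2) @ khatri_rao(A, B)).T).T
        # Matrix-only factor: ordinary least squares given A
        V = np.linalg.solve(A.T @ A, (Y.T @ A).T).T
        loss = np.sum((X - cp_reconstruct(A, B, C)) ** 2) + np.sum((Y - A @ V.T) ** 2)
        losses.append(loss)
    return A, B, C, V, losses


# Simulated example: 30 subjects, 10 metabolites, 8 time points, 5 static features, rank 2
rng = np.random.default_rng(1)
A0, B0, C0 = rng.standard_normal((30, 2)), rng.standard_normal((10, 2)), rng.standard_normal((8, 2))
V0 = rng.standard_normal((5, 2))
X = cp_reconstruct(A0, B0, C0)
Y = A0 @ V0.T
A, B, C, V, losses = cmtf_als(X, Y, rank=2, n_iter=1000)
rel_err = np.sqrt(losses[-1] / (np.sum(X ** 2) + np.sum(Y ** 2)))
print(f"relative error: {rel_err:.2e}")
```

Because each update solves its least-squares subproblem exactly, the loss cannot increase from one sweep to the next. Only A is informed by both data sets, which is what allows a subject-level pattern to be supported jointly by the dynamic and the static measurements.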
Expected outcomes: By the end of the thesis, you will have developed a strong foundation in data analysis, matrix/tensor factorization methods, and interdisciplinary research. In addition, you will gain experience in analyzing state-of-the-art biomedical data sets. The results of this project may contribute to the identification of new biomarkers that facilitate precision health, and could lead to publications or presentations at scientific conferences.
Goal
The goal of the project is to capture metabolic differences among individuals in response to a meal challenge test.
Learning outcome
- Matrix and Tensor Decomposition Methods: The thesis project will provide hands-on experience with matrix decompositions (e.g., PCA, NMF) and tensor decompositions (e.g., CP decomposition), which are essential tools for analyzing high-dimensional data. You will also become familiar with different models and the numerical optimization methods used to fit them.
- Data Fusion Techniques: You will learn how to analyze multimodal data using CMTF-based approaches.
- Interdisciplinary Collaboration: You will learn to interact with domain experts (e.g., clinicians) when communicating the results of your analysis.
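As a small illustration of the matrix side of these learning outcomes, the sketch below computes PCA of a subjects × variables matrix through the singular value decomposition of the column-centred data. It is a hedged example, not the project's pipeline: the function name `pca_svd` is hypothetical, the data are random, and no scaling is applied, although metabolomics data are often scaled before PCA. Applying PCA to a subjects × metabolites × time array would first require unfolding it into a matrix, whereas a CP decomposition models the three modes directly.

```python
import numpy as np


def pca_svd(X, n_components):
    """PCA of a samples x variables matrix via the SVD of the column-centred data.

    Returns scores (samples x k), loadings (variables x k) and the fraction of
    the total variance explained by each of the k components.
    """
    Xc = X - X.mean(axis=0)                          # centre each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # coordinates of the samples
    loadings = Vt[:n_components].T                   # weights of the variables
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained


# Hypothetical example: 299 subjects x 40 metabolite features (random, illustrative only)
rng = np.random.default_rng(0)
X = rng.standard_normal((299, 40))
scores, loadings, explained = pca_svd(X, n_components=3)
print("explained variance ratios:", np.round(explained, 3))

# Keeping all components reproduces the centred data exactly (up to rounding)
Xc = X - X.mean(axis=0)
full_scores, full_loadings, full_explained = pca_svd(X, n_components=40)
```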
Qualifications
Background in linear algebra and statistics is required. Fluency in MATLAB is a plus.
Supervisors
- Evrim Acar
- Balazs Erdos
- Carla Schenker
Collaboration partners
- COpenhagen Prospective Studies on Asthma in Childhood (COPSAC), Herlev and Gentofte Hospital, University of Copenhagen, Denmark
References
NMR measurements of plasma samples have previously been analyzed in the following publications:
- S. Yan, L. Li, D. Horner, P. Ebrahimi, B. Chawes, L. O. Dragsted, M. A. Rasmussen, A. K. Smilde, E. Acar. Characterizing human postprandial metabolic response using multiway data analysis. Metabolomics, 20:50, 2024.
- L. Li, S. Yan, D. Horner, M. A. Rasmussen, A. K. Smilde, E. Acar. Revealing static and dynamic biomarkers from postprandial metabolomics data through coupled matrix and tensor factorizations. Metabolomics, 20:86, 2024.
- L. Li, H. Hoefsloot, B. M. Bakker, D. Horner, M. A. Rasmussen, A. K. Smilde, E. Acar. Longitudinal metabolomics data analysis informed by mechanistic models. bioRxiv, 2024.