Project examples from the ComPh Modelling Week


Project 1: Disease risk prediction using polygenic risk scores

 Supervisor: Turid Frahnow

Evidence has been accruing that a considerable proportion of the phenotypic variation of complex traits can be explained by a set of genetic markers that do not reach significance individually. Polygenic risk scores (PRSs) have therefore recently been used to summarize genetic effects and to predict individual trait values and/or disease risks. Nevertheless, the challenges for PRSs are manifold, and so are the methods developed to address them.
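
A common way to build a PRS is a weighted sum over markers. For each individual, the count of risk alleles at marker j (0, 1 or 2) is multiplied by a weight w_j, typically an effect size estimated in an external association study, and the products are summed. The Python sketch below assumes that coding. The function name, marker weights and genotypes are hypothetical illustration values, not taken from the project data set.

```python
def polygenic_risk_score(genotypes: list[int], weights: list[float]) -> float:
    """Weighted PRS for one individual: sum over markers of w_j * g_j.

    Assumes genotypes are risk-allele counts (0, 1 or 2) and that the
    weights are per-marker effect sizes from an external study.
    """
    if len(genotypes) != len(weights):
        raise ValueError("need one weight per genetic marker")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be risk-allele counts 0, 1 or 2")
    return sum(w * g for w, g in zip(weights, genotypes))


# Three markers with small individual effects (hypothetical values).
weights = [0.12, 0.05, 0.08]
person_a = [2, 1, 0]
person_b = [0, 0, 1]
print(round(polygenic_risk_score(person_a, weights), 4))  # 0.29
print(round(polygenic_risk_score(person_b, weights), 4))  # 0.08
```

No single marker dominates here. The score separates the two individuals only because small effects add up across markers.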

In this project, we will study the different ways of predicting disease risk based on a toy data set. The data set contains metabolic and genetic information of fictitious study participants with or without a metabolic disease.

The students' task is to put the knowledge from the preceding lecture into practice by implementing a PRS for the given data set. In doing so, they will address the various challenges of PRSs, such as selecting the best subset of genetic and non-genetic predictors and comparing different models to improve predictive performance. Finally, the PRS will be validated on a second toy data set with unknown disease status.
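
One way to compare candidate predictor sets is to fit each model on one part of the data and score it on held-out individuals. The sketch below is illustrative only. It uses logistic regression as the model and the area under the ROC curve (AUC) as the comparison criterion. The simulated cohort (20 SNPs, one standardized metabolic covariate) and all function names are hypothetical stand-ins for the toy data set. Disease status is unknown in the second project data set, so this sketch instead holds out part of a labelled sample.

```python
import numpy as np

rng = np.random.default_rng(42)


def simulate_cohort(n, n_snps=20):
    """Hypothetical toy cohort: risk-allele counts, one metabolic covariate, disease status."""
    G = rng.binomial(2, 0.3, size=(n, n_snps)).astype(float)
    metab = rng.standard_normal(n)
    # Only the first 5 SNPs and the metabolic covariate affect risk.
    logit = -1.0 + 0.4 * (G[:, :5].sum(axis=1) - 3.0) + 1.0 * metab
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    return G, metab, y


def fit_logistic(X, y, lr=0.1, n_iter=3000, l2=1e-2):
    """L2-penalized logistic regression fitted by plain gradient descent."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        penalty = l2 * np.r_[0.0, beta[1:]]  # do not penalize the intercept
        beta -= lr * (A.T @ (p - y) / len(y) + penalty)
    return beta


def predict_risk(beta, X):
    return 1.0 / (1.0 + np.exp(-(beta[0] + X @ beta[1:])))


def auc(y, score):
    """ROC AUC: probability that a case outscores a control (ties count 1/2)."""
    pos, neg = score[y == 1], score[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))


G_tr, m_tr, y_tr = simulate_cohort(600)
G_va, m_va, y_va = simulate_cohort(300)

candidates = {
    "genetic only": (G_tr, G_va),
    "genetic + metabolic": (np.column_stack([G_tr, m_tr]), np.column_stack([G_va, m_va])),
}
results = {}
for name, (X_tr, X_va) in candidates.items():
    beta = fit_logistic(X_tr, y_tr)
    results[name] = auc(y_va, predict_risk(beta, X_va))
    print(f"{name}: validation AUC = {results[name]:.3f}")
```

The L2 penalty and the held-out evaluation are two simple guards against overfitting. A more thorough comparison would use cross-validation and a proper subset-selection or penalized method.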

Required skills: Basics in statistical modelling and basic understanding of biology (genetics, human physiology)

Recommended literature:

  1. Jake Lever, Martin Krzywinski & Naomi Altman. Model selection and overfitting. (2016) Nature Methods 13, 9: 703–704
  2. Simin Liu & Yiqing Song. Building Genetic Scores to Predict Risk of Complex Diseases in Humans: Is it Possible? (2010) Diabetes 59: 2729–2731
  3. Michael Laimighofer, Jan Krumsiek, Florian Büttner & Fabian J. Theis. Unbiased Prediction and Feature Selection in High-Dimensional Survival Regression. (2016) Journal of Computational Biology 23, 4: 279–290

Project 2: Drug target prediction using mechanistic models

 Supervisor: Jan Hasenauer

One of the most critical steps in drug development is the selection of the molecular target. Because this decision strongly influences success rates in the preclinical and clinical phases, many pharmaceutical companies now exploit sophisticated design principles. These principles often rely on mathematical models that integrate large amounts of literature data with data from in-house experiments.

In this project, we will study targets in the ErbB signaling pathway for the inhibition of downstream signaling via Akt and Erk. ErbB signaling is up-regulated in many cancers and several drugs have been designed to target different components of the pathway.

To identify potential targets, we will implement a mechanistic mathematical model of the biochemical reaction network. The model will be simulated and the sensitivity of the downstream components to different perturbations will be analyzed. Subsequently, we will implement different drug candidates in the model and assess the robustness of the design to model uncertainties. This analysis can be complemented by the study of drug synergies.
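
A much smaller system can illustrate the combination of simulation and sensitivity analysis. The sketch below is a hypothetical three-tier cascade, not the ErbB model from the recommended literature. Its species are an active receptor R, a kinase K and a downstream output O, each expressed as an active fraction between 0 and 1. The mass-action rate laws, the drug term 1/(1 + D/IC50) and all parameter values are assumptions. The code integrates the ODEs with a classical fourth-order Runge-Kutta scheme and computes normalized local sensitivities d ln O / d ln p by central finite differences.

```python
# Hypothetical toy cascade (receptor -> kinase -> output), NOT the published ErbB model.
PARAMS = {"k1": 1.0, "k2": 0.5, "k3": 2.0, "k4": 1.0, "k5": 3.0, "k6": 1.0,
          "L": 1.0, "D": 0.0, "IC50": 0.1}


def rhs(y, p):
    """Mass-action ODEs for the active fractions (R, K, O)."""
    R, K, O = y
    inhibition = 1.0 / (1.0 + p["D"] / p["IC50"])  # drug lowers receptor activation rate
    return [
        p["k1"] * p["L"] * inhibition * (1.0 - R) - p["k2"] * R,
        p["k3"] * R * (1.0 - K) - p["k4"] * K,
        p["k5"] * K * (1.0 - O) - p["k6"] * O,
    ]


def simulate(p, t_end=50.0, dt=0.02):
    """Classical RK4 from a fully inactive state; returns the state at t_end."""
    y = [0.0, 0.0, 0.0]
    for _ in range(int(round(t_end / dt))):
        s1 = rhs(y, p)
        s2 = rhs([a + 0.5 * dt * b for a, b in zip(y, s1)], p)
        s3 = rhs([a + 0.5 * dt * b for a, b in zip(y, s2)], p)
        s4 = rhs([a + dt * b for a, b in zip(y, s3)], p)
        y = [a + dt / 6.0 * (b1 + 2 * b2 + 2 * b3 + b4)
             for a, b1, b2, b3, b4 in zip(y, s1, s2, s3, s4)]
    return y


def sensitivity(p, name, rel_step=1e-3):
    """Normalized local sensitivity d ln(O) / d ln(p[name]) by central differences."""
    up, down = dict(p), dict(p)
    up[name] *= 1.0 + rel_step
    down[name] *= 1.0 - rel_step
    o_up, o_down, o_0 = simulate(up)[2], simulate(down)[2], simulate(p)[2]
    return (o_up - o_down) / (2.0 * rel_step * o_0)


baseline = simulate(PARAMS)[2]
treated = simulate({**PARAMS, "D": 0.1})[2]
print(f"output without drug: {baseline:.3f}, with drug: {treated:.3f}")
for name in ["k1", "k2", "k3", "k4", "k5", "k6", "L"]:
    print(f"S({name}) = {sensitivity(PARAMS, name):+.3f}")
```

With these parameters the steady-state output is 12/19 (about 0.632). Adding the drug at D = IC50 halves the receptor activation rate but lowers the output only to 0.6. This illustrates how saturation further downstream can buffer an upstream inhibition, which is the kind of effect a target-selection analysis needs to detect.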

Recommended literature:

  1. Schoeberl et al., Therapeutically targeting ErbB3: A key node in ligand-induced activation of the ErbB receptor-PI3K axis. Sci. Signal., 2(77): ra31, 2009.
  2. Fitzgerald et al., Systems biology and combination therapy in the quest for clinical efficacy, Nat. Chem. Biol., 2, 2006.

Project 3: RNA-seq differential gene expression analysis

 Supervisor: Lukas Simon

RNA-sequencing (RNA-seq) is an approach which profiles the transcriptome of cells and represents a major component of biological and biomedical research. This technology gives insight into the complex behavior of transcripts including gene expression and alternative splicing.

In this project, participants will analyze real RNA-seq gene expression data from the public gene expression data repository “ArrayExpress”. The data and the required bioinformatics software will be provided via cloud computing. Each participant will connect to a preconfigured virtual machine and perform the analysis ‘in the cloud’. Participants will complete the standard RNA-seq processing workflow, including read alignment and gene expression quantification, and will then perform differential gene expression analysis to identify significantly deregulated genes. The tutorial will be conducted in the R statistical software, so previous programming experience will be helpful but is not required. The overall goal of this project is to give a ‘hands-on’ introduction to the entire RNA-seq analysis, from raw reads to meaningful biological insight.
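
The Python sketch below illustrates two basic quantities in a differential expression analysis. It normalizes a raw count matrix to counts per million (CPM) and computes a log2 fold change per gene. The gene names and counts are made up. The sketch is not the project's workflow, which runs in R. Dedicated tools such as DESeq2 or edgeR model counts with a negative binomial distribution, use more robust normalization and provide the statistical tests needed to call genes deregulated.

```python
import math

# Hypothetical raw read counts: rows are genes, columns are samples
# (control 1, control 2, treated 1, treated 2).
genes = ["geneA", "geneB", "geneC"]
counts = [
    [100, 120, 400, 380],
    [500, 480, 510, 490],
    [50, 60, 10, 12],
]
CONTROL, TREATED = [0, 1], [2, 3]


def cpm(counts):
    """Counts per million: divide each column by its library size, times 1e6."""
    n_samples = len(counts[0])
    lib_sizes = [sum(row[s] for row in counts) for s in range(n_samples)]
    return [[row[s] / lib_sizes[s] * 1e6 for s in range(n_samples)] for row in counts]


def log2_fold_change(row, control_idx, treated_idx, pseudocount=1.0):
    """log2 of (mean treated + pseudocount) / (mean control + pseudocount)."""
    mean_c = sum(row[i] for i in control_idx) / len(control_idx)
    mean_t = sum(row[i] for i in treated_idx) / len(treated_idx)
    return math.log2((mean_t + pseudocount) / (mean_c + pseudocount))


for gene, row in zip(genes, cpm(counts)):
    print(f"{gene}: log2FC = {log2_fold_change(row, CONTROL, TREATED):+.2f}")
```

Gene B's raw counts barely change, yet it receives a log2 fold change of roughly -0.43. This is because plain library-size scaling is distorted when a few genes (here gene A) change strongly. It is one reason why more robust normalization methods are preferred in practice.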

Recommended literature:

  1. Brazma A et al (2003). ArrayExpress--a public repository for microarray gene expression data at the EBI. Nucleic Acids Res.
  2. R Core Team (2016). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Project 4: Inverse problems of the heart

 Supervisor: Valeriya Naumova

Characterization of cardiac physiology from routine clinical signals can be aided by modelling and data-driven analysis. In this project, students will estimate electrophysiological properties of the heart from vectorcardiograms using a combination of finite element modelling and machine learning techniques. Students will create a ventricular electrophysiology model that describes the anatomy (i.e. geometry and fiber architecture) and action potential propagation. They will then generate a database of electrophysiology simulations in which synthetic vectorcardiograms are associated with known model properties (e.g. stimulus location, conductivity, activation times).

The simulated vectorcardiogram signals will be used to train a machine learning algorithm to predict electrophysiological properties from signals not seen during training. Several prediction scenarios with varying levels of difficulty will be offered to the project team. Students will also use dimensionality reduction techniques to identify and extract specific features from the signals that could be predictive of a patient’s response to treatment options. Because the simulated signals are high-dimensional, rough and noisy, students will learn and apply fairly generic tools from the theory of inverse problems, learning theory, dimensionality reduction, and regularization theory.
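
A few lines of code can sketch the pipeline of simulation database, dimensionality reduction and regularized inverse map. In the Python example below, a deliberately simple, linear and hypothetical forward model replaces the finite element simulations. It uses two fixed Gaussian bumps, loosely standing in for QRS-like and T-wave-like components, plus noise. Principal component analysis (PCA) via the singular value decomposition reduces each signal to a handful of features. Tikhonov-regularized (ridge) least squares then maps those features back to a conductivity parameter. The signal shapes, parameter ranges, number of components and regularization weight are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
qrs_shape = np.exp(-((t - 0.2) ** 2) / (2 * 0.03 ** 2))  # hypothetical QRS-like bump
t_shape = np.exp(-((t - 0.6) ** 2) / (2 * 0.08 ** 2))    # hypothetical T-wave-like bump


def forward_model(conductivity, repol, noise=0.05):
    """Toy forward model: linear mix of two fixed shapes plus Gaussian noise."""
    clean = conductivity * qrs_shape + repol * t_shape
    return clean + noise * rng.standard_normal(t.size)


def make_database(n):
    """Simulated signals paired with their known conductivity values."""
    cond = rng.uniform(0.5, 1.5, n)
    repol = rng.uniform(0.5, 1.5, n)  # nuisance property, not predicted
    X = np.array([forward_model(c, r) for c, r in zip(cond, repol)])
    return X, cond


X_train, y_train = make_database(300)
X_test, y_test = make_database(50)

# PCA via SVD of the centred training matrix; keep the leading components.
mean = X_train.mean(axis=0)
_, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
n_comp = 5


def features(X):
    Z = (X - mean) @ Vt[:n_comp].T
    return np.column_stack([np.ones(len(Z)), Z])  # prepend intercept column


# Tikhonov-regularized least squares: (A^T A + lam * I) w = A^T y
A = features(X_train)
lam = 1e-2
reg = lam * np.eye(A.shape[1])
reg[0, 0] = 0.0  # leave the intercept unpenalized
w = np.linalg.solve(A.T @ A + reg, A.T @ y_train)

pred = features(X_test) @ w
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(f"test RMSE for conductivity: {rmse:.3f}")
```

Real vectorcardiograms depend nonlinearly on the model properties, so a linear regressor on PCA features should be treated only as a baseline. Nonlinear learners and more careful choices of the regularization weight would be natural next steps.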

This project is ideal for students interested in whole-heart electrophysiology, patient-specific modelling, machine learning, dimensionality reduction, and inverse problem solving.

Recommended literature:

  1. Naumova V. et al. Extrapolation in variable RKHSs with application to the blood glucose reading. Inverse Problems 27, 2011.

Project 5: Accurate biophysical simulations with machine learning techniques

 Supervisor: Kristian Valen-Sendstad

Cardiovascular diseases place a heavy burden on healthcare systems, and the costs are expected to rise in the years to come. Acute stroke alone is estimated to cost European countries an overwhelming 40 billion annually. Although systemic risk factors have been associated with a higher prevalence of cardiovascular diseases, the causes of stroke, i.e. atherosclerotic plaques and defective, balloon-shaped blood vessels in the brain (aneurysms), are focally distributed. This highlights the role of blood-flow-induced wall shear stress (WSS) in continuous vascular remodelling. Direct measurement of these stresses is difficult, and medical image-based computational fluid dynamics (CFD) has been used extensively to study 'patient-specific' local abnormal forces in search of a mechanistic biological link to disease initiation.
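
A closed-form case makes wall shear stress concrete. Poiseuille flow is steady, fully developed laminar flow of a Newtonian fluid in a rigid straight tube. In that case the WSS is tau_w = 4 mu Q / (pi R^3), where mu is the viscosity, Q the volumetric flow rate and R the tube radius. Real cerebral vessels violate these assumptions, which is why image-based CFD is used. The Python sketch below only illustrates the formula. The blood viscosity of 0.0035 Pa·s and the flow numbers are assumed illustration values.

```python
import math


def poiseuille_wss(flow_rate, radius, viscosity=0.0035):
    """Wall shear stress in Pa for Poiseuille flow: tau_w = 4 * mu * Q / (pi * R**3).

    flow_rate in m^3/s, radius in m, viscosity in Pa*s (the default is an
    assumed, commonly used value for blood).
    """
    return 4.0 * viscosity * flow_rate / (math.pi * radius ** 3)


# Illustrative numbers only: 4 mL/s through a vessel of 2 mm radius.
print(f"{poiseuille_wss(4e-6, 2e-3):.2f} Pa")  # 2.23 Pa
```

R enters the formula cubed, so a 10% error in the segmented radius shifts this estimate by between about 25% and 37%. This sensitivity to geometry is one reason accurate vessel segmentation matters.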

State-of-the-art biophysical models are based on patient-specific medical images, but a significant amount of manual labor is needed at the pre-processing stage, i.e. segmentation of the region of interest to which the models will be applied. In addition, the medical images contain considerable noise, which makes it difficult to distinguish arteries from the surrounding tissue, not to mention image artifacts from smaller vessels (cf. the attached screenshot).

The aim of the current project is to explore the opportunities that machine learning tools offer for denoising 2D/3D medical images, towards better segmentation of the blood vessels in the brain. The impact of the studied tools will be significant, as they will enable large cohort studies of patient-specific cerebral blood flow.
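
A classical, non-learned baseline illustrates the sparsity concept listed under the required skills. The idea is to represent a signal in a basis where it is sparse, shrink small coefficients towards zero, and transform back. The Python sketch below does this in 1D. It uses an orthonormal Haar wavelet transform and soft thresholding at the universal threshold sigma * sqrt(2 ln n) of Donoho and Johnstone. The piecewise-constant "vessel" profile, the noise level (assumed known) and the number of levels are hypothetical. Machine learning approaches, such as learned dictionaries or neural networks, would replace the fixed basis or the thresholding rule. Images in 2D/3D would need a multidimensional wavelet transform.

```python
import numpy as np


def haar_forward(x, levels):
    """Orthonormal multilevel 1D Haar transform; returns (approx, [details per level])."""
    approx = np.asarray(x, dtype=float)
    if approx.size % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    coeffs = []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        coeffs.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        approx = (even + odd) / np.sqrt(2.0)        # coarser approximation
    return approx, coeffs


def haar_inverse(approx, coeffs):
    """Invert haar_forward, starting from the coarsest level."""
    for detail in reversed(coeffs):
        out = np.empty(2 * approx.size)
        out[0::2] = (approx + detail) / np.sqrt(2.0)
        out[1::2] = (approx - detail) / np.sqrt(2.0)
        approx = out
    return approx


def soft_threshold(c, thr):
    """Shrink coefficients towards zero by thr; small ones become exactly zero."""
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)


def denoise(x, sigma, levels=4):
    approx, coeffs = haar_forward(x, levels)
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))  # universal threshold
    return haar_inverse(approx, [soft_threshold(d, thr) for d in coeffs])


rng = np.random.default_rng(1)
clean = np.zeros(256)
clean[64:96] = 1.0    # a wide hypothetical "vessel"
clean[160:168] = 1.0  # a narrow hypothetical "vessel"
noisy = clean + 0.2 * rng.standard_normal(clean.size)
restored = denoise(noisy, sigma=0.2)
print(f"MSE noisy: {np.mean((noisy - clean) ** 2):.4f}, "
      f"denoised: {np.mean((restored - clean) ** 2):.4f}")
```

Soft thresholding is the proximal operator of the l1 norm, which links this heuristic to l1-regularized sparse recovery. The narrow vessel also loses some contrast, which illustrates the trade-off between removing noise and preserving small structures.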

Required skills: Mathematical understanding of image processing, sparsity concept and machine learning.

Recommended literature:

  1. Elad M. Sparse and Redundant Representations. From Theory to Applications in Signal and Image Processing. Springer Verlag, 2010.