Synthetic Medical Tabular Data Generation using Deep Generative Models

In this project, students will focus on the generation of synthetic tabular medical data using deep learning methods. They will design and evaluate models such as Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Diffusion Models to create data resembling structured patient records (e.g., demographics, lab test values, and diagnoses).

This project is derived from the larger European Union project SEARCH (https://ihi-search.eu/), which focuses on developing synthetic data and AI tools for healthcare. Within SEARCH, one major goal is to generate realistic and privacy-preserving synthetic biomedical data to support research and innovation without compromising patient confidentiality.

Goals / learning outcomes

Students will:

Select or simulate a representative medical tabular dataset.
Implement at least two generative approaches for tabular data generation.
Apply and compare metrics for realism, utility, and privacy (e.g., Maximum Mean Discrepancy, Jensen-Shannon divergence, Distance to Closest Record).
Explore basic interpretability methods (e.g., latent space analysis, feature importance).
Reflect on the ethical implications of synthetic data generation based on provided frameworks and discussions with supervisors.
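
The three example metrics named above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the evaluation code used in SEARCH: it assumes all features are numeric and already standardized, and the function names (`rbf_mmd2`, `js_divergence`, `distance_to_closest_record`), the RBF kernel with `gamma=1.0`, the 10-bin histograms, and the random stand-in data are all assumptions made for the example.

```python
import numpy as np


def rbf_mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy (RBF kernel)."""
    def kernel(a, b):
        # Pairwise squared Euclidean distances, then a Gaussian kernel.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * d2)
    return kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()


def js_divergence(real_col, synth_col, bins=10):
    """Jensen-Shannon divergence (base 2, in [0, 1]) between histograms of one feature."""
    lo = min(real_col.min(), synth_col.min())
    hi = max(real_col.max(), synth_col.max())
    p, _ = np.histogram(real_col, bins=bins, range=(lo, hi))
    q, _ = np.histogram(synth_col, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def distance_to_closest_record(synth, real):
    """For each synthetic row, the Euclidean distance to its nearest real row."""
    d2 = ((synth[:, None, :] - real[None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(d2.min(axis=1))


# Hypothetical stand-ins for standardized real and generated records.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 3))
synth = rng.normal(size=(200, 3))

print("MMD^2:", rbf_mmd2(real, synth))
print("JS divergence (feature 0):", js_divergence(real[:, 0], synth[:, 0]))
print("Smallest DCR:", distance_to_closest_record(synth, real).min())
```

Lower MMD and Jensen-Shannon values suggest that the synthetic distribution is closer to the real one, while very small Distance to Closest Record values can indicate that synthetic rows are near-copies of real records. Categorical features would need a suitable encoding or a different distance before these functions apply.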

Students will gain practical experience with cutting-edge generative models and evaluation techniques in a real-world context tied to an EU-funded project. The project bridges mathematical modeling, data science, and healthcare innovation, providing a strong foundation for future work in AI, privacy, and biomedical research.

Qualifications

Fundamentals of machine learning and Python programming
Basic understanding of deep learning models and neural networks
Familiarity with tabular data analysis and statistics

Public or simulated datasets will be used for this project. Students are expected to follow good reproducibility and documentation practices, in line with the open science principles promoted in the SEARCH project.

Supervisors

Vajira Thambawita
Molly Maleckar
Pål Halvorsen

Collaboration partners

This project is part of SEARCH (Synthetic hEalthcare dAta goveRnanCe Hub), a multi-disciplinary initiative focused on creating synthetic healthcare data and facilitating secure data sharing across the biomedical ecosystem. Read more about SEARCH at https://ihi-search.eu/.
