Augmentation and Generation of Biomedical Signals for Privacy-Preserving AI
This project is a small, focused task derived from the broader EU-funded initiative SEARCH (https://ihi-search.eu/), which aims to enable secure and ethical AI development using synthetic healthcare data. Within this initiative, one of the key challenges is generating and augmenting 1D biomedical signals such as EEG (electroencephalogram), PPG (photoplethysmogram), and ECG (electrocardiogram) to support the training of robust AI models without compromising patient privacy.
The students will explore and implement data augmentation techniques and generative models for biomedical signals. These methods may include traditional signal augmentation (e.g., noise addition, scaling, time-warping), as well as deep learning–based generative techniques such as TimeGAN, 1D-VAEs, or simple CNN-based generative pipelines.
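As a rough illustration of the traditional augmentations named above (noise addition, scaling, time-warping), the following minimal NumPy sketch applies each one to a simulated 1D signal. The function names, parameters (`snr_db`, `factor`, `strength`, `n_knots`), and the sine-wave stand-in for a physiological signal are assumptions made for this example, not part of the project specification.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def scale(signal, factor):
    """Multiply the amplitude by a constant factor."""
    return signal * factor


def time_warp(signal, strength, rng, n_knots=4):
    """Smoothly distort the time axis, keeping length and endpoints.

    strength must be in [0, 1) so that the warp stays monotonic.
    """
    n = len(signal)
    # Random positive step sizes between knots define a monotonic warp.
    steps = rng.uniform(1.0 - strength, 1.0 + strength, size=n_knots + 1)
    knots_warped = np.concatenate(([0.0], np.cumsum(steps)))
    knots_warped = knots_warped / knots_warped[-1] * (n - 1)
    knots_orig = np.linspace(0, n - 1, n_knots + 2)
    # Map every output sample to a fractional index in the source signal.
    source_idx = np.interp(np.arange(n), knots_orig, knots_warped)
    return np.interp(source_idx, np.arange(n), signal)


# Simulated stand-in for a physiological signal (hypothetical example data).
rng = np.random.default_rng(0)
t = np.linspace(0, 4, 1000, endpoint=False)
clean = np.sin(2 * np.pi * 1.2 * t)

noisy = add_noise(clean, snr_db=20, rng=rng)
scaled = scale(clean, 1.1)
warped = time_warp(clean, strength=0.3, rng=rng)
```

In practice the parameter ranges would need to be chosen per modality, since a distortion that is harmless for PPG may destroy clinically relevant morphology in ECG or EEG.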
Goals / learning outcomes
Students will:
- Choose a target signal modality (e.g., EEG or PPG) using public or simulated data.
- Apply classical augmentation techniques and evaluate their effect on a downstream classification task.
- Implement and evaluate one generative model for producing synthetic signal data.
- Compare real and synthetic/augmented signals using statistical and signal similarity metrics (e.g., Dynamic Time Warping, FFT analysis, Maximum Mean Discrepancy).
- Reflect on the utility, realism, and privacy of the generated data, using state-of-the-art evaluation methods selected and defined together with the supervisors.
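The similarity metrics mentioned in the goals above could be sketched along the following lines. This is a minimal NumPy illustration, not a prescribed evaluation protocol: the function names, the absolute-difference cost in the DTW, the normalized-spectrum distance used as the "FFT analysis", the RBF kernel and its `gamma` value, and the hypothetical `make_batch` helper that simulates signals are all assumptions made for this example.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping with an absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]


def spectral_distance(a, b):
    """Euclidean distance between sum-normalized FFT magnitude spectra."""
    spec_a = np.abs(np.fft.rfft(a))
    spec_b = np.abs(np.fft.rfft(b))
    spec_a = spec_a / (spec_a.sum() + 1e-12)
    spec_b = spec_b / (spec_b.sum() + 1e-12)
    return np.linalg.norm(spec_a - spec_b)


def mmd_rbf(X, Y, gamma):
    """Biased estimate of squared MMD with kernel exp(-gamma * ||x - y||^2).

    X and Y are arrays of shape (n_signals, n_samples).
    """
    def kernel(A, B):
        sq = (np.sum(A ** 2, axis=1)[:, None]
              + np.sum(B ** 2, axis=1)[None, :]
              - 2.0 * A @ B.T)
        return np.exp(-gamma * np.maximum(sq, 0.0))

    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2.0 * kernel(X, Y).mean()


def make_batch(freq_hz, n_signals, rng, n_samples=200, noise_std=0.05):
    """Hypothetical helper: noisy sine waves with random phase."""
    t = np.linspace(0, 1, n_samples, endpoint=False)
    phases = rng.uniform(0, 2 * np.pi, size=(n_signals, 1))
    clean = np.sin(2 * np.pi * freq_hz * t[None, :] + phases)
    return clean + rng.normal(0.0, noise_std, size=clean.shape)


rng = np.random.default_rng(0)
real = make_batch(5.0, 50, rng)          # stands in for real recordings
synth_close = make_batch(5.0, 50, rng)   # "good" synthetic data
synth_far = make_batch(15.0, 50, rng)    # "poor" synthetic data

mmd_close = mmd_rbf(real, synth_close, gamma=1.0 / 200)
mmd_far = mmd_rbf(real, synth_far, gamma=1.0 / 200)
spec_close = spectral_distance(real[0], synth_close[0])
spec_far = spectral_distance(real[0], synth_far[0])
dtw_pair = dtw_distance(real[0], synth_close[0])
```

DTW and the spectral distance compare individual signal pairs, whereas MMD compares whole sets of signals, so the two kinds of metric answer different questions about realism. The pure-Python DTW loop is quadratic in signal length and would likely need a dedicated library for long recordings.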
This project contributes to the larger SEARCH initiative by addressing synthetic signal generation challenges and supporting the development of resilient biomedical AI pipelines.
Students will use open datasets (e.g., PhysioNet signals) or generate simulated signals to ensure data safety. All results should follow reproducible research standards, supporting the open science goals of the SEARCH project.
Qualifications
- Basic understanding of signal processing and digital signals
- Machine learning and Python programming knowledge
- Experience with neural networks is beneficial, but not mandatory
Supervisors
- Vajira Thambawita
- Molly Maleckar
- Pål Halvorsen
Collaboration partners
This project is part of SEARCH (Synthetic hEalthcare dAta goveRnanCe Hub), a multi-disciplinary initiative focused on creating synthetic healthcare data and facilitating secure data sharing across the biomedical ecosystem. Read more about SEARCH at https://ihi-search.eu/.