Launch of governance hub for synthetic healthcare data

The launch of the Synthetic hEalthcare dAta goveRnanCe Hub (SEARCH) marks a leap in healthcare research. The hub will enhance data security and drive AI analytics in healthcare innovation by combining synthetic data and federated learning.

One of the main challenges in adopting big data projects in healthcare is that diverse datasets cannot be combined and analysed without compromising patients' privacy rights. Current anonymisation techniques destroy most of the valuable information in the datasets, reducing their research value. In 2022, healthcare also surpassed the finance sector as the most breached industry, making the protection of health data even more important.

“If we want to take full advantage of the value of AI, machine learning, analytics, and bioinformatics we need to develop better methods to anonymize the data, and enable data collaborations between the public and private sectors”, says research scientist Vajira Thambawita. 

Learning from real data without accessing it 

Thambawita will lead the work from SimulaMet in the newly funded hub, alongside Molly Maleckar from Simula Research Laboratory. Together with the 26 other partners in the consortium, they will develop a new method of combining synthetic data and federated learning.

Federated learning is a method used in machine learning where the training process is distributed across multiple devices or servers holding local data samples, without exchanging the data themselves. This approach allows for the creation of a machine learning model that learns from data located at different sites while preserving data privacy.
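As a rough illustration of the idea, the sketch below runs federated averaging on a toy linear model in plain Python. It is a minimal example under stated assumptions, not the method SEARCH will use: the three "sites", their simulated data, and the helper names (make_site, local_update, federated_average) are all hypothetical. Each site trains on its own data, and only the model weights travel to the central server.

```python
import random

def local_update(weights, data, lr=0.05, epochs=20):
    """Run a few gradient-descent steps on one site's private data.

    Only the updated weights and the sample count leave the site.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n

def federated_average(updates):
    """Server step: average the site weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    w = sum(wt[0] * n for wt, n in updates) / total
    b = sum(wt[1] * n for wt, n in updates) / total
    return (w, b)

def make_site(seed, n):
    """Hypothetical private dataset: noisy samples of y = 2x + 1."""
    rng = random.Random(seed)
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.1))
            for x in (rng.uniform(-1, 1) for _ in range(n))]

# Three simulated sites; in a real deployment these lists never leave the site.
sites = [make_site(1, 50), make_site(2, 80), make_site(3, 30)]

global_weights = (0.0, 0.0)
for _ in range(30):  # communication rounds
    updates = [local_update(global_weights, data) for data in sites]
    global_weights = federated_average(updates)

print("learned slope and intercept:", global_weights)
```

The server only ever sees weight pairs and sample counts, which is the property the paragraph above describes; real systems typically add further safeguards on top of this basic exchange.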

But what does it mean to combine synthetic data generation and federated learning? Instead of learning from the real, sensitive data at each location, the algorithm learns to generate synthetic data that mimics the real data, without moving the real data from its original location. 
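A deliberately simplified sketch of that combination is shown below. It assumes each site fits a toy "generator" (just a mean and variance of one numeric measurement) instead of the deep generative models SEARCH describes, and shares only those parameters. The site data and all function names are hypothetical, chosen for illustration.

```python
import random
import statistics

def fit_local_generator(values):
    """Fit a toy generator (count, mean, variance) on one site's private values."""
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    return n, mean, var

def combine_generators(params):
    """Pool per-site parameters into one global generator.

    Global variance = within-site variance + between-site variance.
    """
    total = sum(n for n, _, _ in params)
    mean = sum(n * m for n, m, _ in params) / total
    var = sum(n * (v + (m - mean) ** 2) for n, m, v in params) / total
    return mean, var

def sample_synthetic(mean, var, size, seed=0):
    """Draw synthetic values from the global generator."""
    rng = random.Random(seed)
    sd = var ** 0.5
    return [rng.gauss(mean, sd) for _ in range(size)]

def make_private_values(seed, n, mu, sd):
    """Hypothetical stand-in for one hospital's private measurements."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sd) for _ in range(n)]

# Two simulated sites; the raw values stay where they are.
sites = [make_private_values(1, 200, 70.0, 8.0),
         make_private_values(2, 300, 75.0, 10.0)]

# Only the fitted parameters are shared with the coordinator.
shared = [fit_local_generator(values) for values in sites]
global_mean, global_var = combine_generators(shared)

synthetic = sample_synthetic(global_mean, global_var, size=1000)
print("synthetic mean:", statistics.fmean(synthetic))
```

The synthetic values follow the pooled distribution of both sites even though no individual record was transmitted. A real generator would be far richer than a single Gaussian, but the flow of information is the same.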

“This is ideal for privacy concerns because the real, sensitive data is never exposed or shared with others; only the synthetic data is”, says Thambawita.

Improving data security and AI analytics 

The goal of combining synthetic data and federated networks is to boost data security and drive AI data analytics in genomics and precision medicine. The combination addresses key challenges such as safeguarding sensitive information, fostering collaboration in research, and enabling advanced AI-driven analyses.

In their work, Thambawita and Maleckar will focus on two case studies, gastrointestinal cancers and cardiovascular diseases, to create the synthetic data.

“This allows us to validate how accurately it replicates real-world data and the clinical outcomes from AI analytics.”

The datasets will be used to develop AI-based tools that support diagnostics, personalised treatment, and predictive health outcomes, improving patient care while reducing privacy risks.

Developing data generation and sharing solutions

They will develop and validate tools for biomedical data generation and a data-sharing solution, along with generalisable methodologies for synthetic data generation and validation. These tools are intended to generate valuable biomedical insights and drive new collaborations across the private and public sectors.

A thorough benchmarking process will be conducted to assess the technical and scientific proficiency of these solutions. 

Project details: SEARCH 

Funded under the Innovative Health Initiative Joint Undertaking (IHI JU), SEARCH boasts an initial budget of over €15.2 million. SEARCH is a multi-disciplinary initiative focused on creating synthetic healthcare data and facilitating secure data sharing across the biomedical ecosystem. With a consortium of 26 partners from across Europe, SEARCH aims to accelerate healthcare innovation by generating FAIRified synthetic data for use in AI/ML models, enabling large-scale data collaborations while preserving privacy and compliance with regulatory standards.

Key Objectives and Innovations:

  • Next-Generation Synthetic Data: SEARCH leverages deep generative models to create realistic synthetic replicas of healthcare data (EHRs, genomics, medical signals, and radiological imaging), replicating the performance of real-world data while maintaining privacy.
  • Federated Learning for Privacy & Scale: By keeping patient data securely in its original location, SEARCH’s federated learning framework enhances collaboration across healthcare sectors while protecting sensitive information. This fosters AI model development and the wider adoption of new healthcare tools.
  • Accelerating AI Innovation: SEARCH will enable the development of cutting-edge AI-powered decision-support tools by providing gold-standard synthetic datasets for benchmarking biomedical AI solutions, fuelling faster diagnostic tools, and creating new personalised healthcare approaches.

For more information, visit the SEARCH website

Associated contacts

Vajira Thambawita

Senior Research Scientist

Molly Maleckar

Research Professor