Evaluation of the Level of Privacy in Generated Synthetic Data

For industries such as healthcare and finance, the ability to create high-quality synthetic data that is free of the privacy constraints attached to real data is extremely valuable. A key issue, however, is evaluating to what extent the synthetic data preserves privacy. For example, a generated data point that is identical to a real data point clearly violates privacy.
Master

In this project, we will use real data to train one or more generative models, which will then be used to generate synthetic data. We will also develop techniques for evaluating to what extent the synthetic data preserves privacy.
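As a starting point for such an evaluation, the case of a generated data point being identical to a training data point can be checked directly. The sketch below is only an illustration and not the method the project will develop. It assumes purely numeric features on comparable scales, uses Euclidean distance, and the function names `distance_to_closest_record` and `exact_copy_rate` are hypothetical. A distance of zero flags an exact copy, but a large distance is not by itself a privacy guarantee.

```python
import numpy as np


def distance_to_closest_record(synthetic: np.ndarray, real: np.ndarray) -> np.ndarray:
    """For each synthetic row, return the Euclidean distance to its nearest real row.

    Assumption: both arrays are 2-D, numeric, and share the same feature columns.
    """
    # Broadcasting gives all pairwise differences: (n_synthetic, n_real, n_features).
    diffs = synthetic[:, None, :] - real[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    # Keep only the distance to the closest real record for each synthetic row.
    return distances.min(axis=1)


def exact_copy_rate(synthetic: np.ndarray, real: np.ndarray, tol: float = 1e-12) -> float:
    """Fraction of synthetic rows that coincide with some real row (within tol)."""
    dcr = distance_to_closest_record(synthetic, real)
    return float(np.mean(dcr <= tol))
```

Calling `exact_copy_rate(synthetic, real)` on two arrays with the same columns returns the share of generated rows that reproduce a training row. The pairwise computation uses memory proportional to the product of the two row counts, so larger data sets would need a nearest-neighbour index or batching instead.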

Goal

Develop methods to evaluate to what extent privacy is preserved in generated synthetic data.

Learning outcome

Explain privacy issues with real-life data across different applications and industries. Train generative models on real-life data. Develop new machine learning methodology, in particular methods for measuring the level of privacy in generated synthetic data.

Qualifications

Competence in machine learning.

Supervisors

  • Michael Riegler
  • Hugo Hammer

Contact person