Evaluation of the Level of Privacy in Generated Synthetic Data
For industries like healthcare and finance, the capacity to create high-quality synthetic data that is free of the privacy constraints of real data is extremely valuable. However, a key challenge is to evaluate to what extent the synthetic data preserves privacy. For example, if a generated data point is identical to a real data point, privacy is obviously violated.
In this project we will use real data to train one or more generative models, which will then be used to generate synthetic data. We will further develop different techniques to evaluate to what extent the synthetic data preserves privacy.
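The identical-record check mentioned above can be made quantitative with a distance-to-closest-record style metric: for every synthetic row, compute the distance to its nearest real row, and flag zero (or very small) distances as potential privacy leaks. This is a minimal sketch of one possible such metric, not the project's final methodology; the function name and the Euclidean distance choice are illustrative assumptions.

```python
import numpy as np

def distance_to_closest_record(real, synthetic):
    """For each synthetic row, the Euclidean distance to its nearest real row.

    A distance of 0 means the synthetic row is an exact copy of a real
    record, which is an obvious privacy violation.
    """
    real = np.asarray(real, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)
    # Pairwise differences via broadcasting: shape (n_synthetic, n_real, n_features)
    diff = synthetic[:, None, :] - real[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    # Minimum over the real records gives the closest-record distance per synthetic row
    return dists.min(axis=1)

real = np.array([[0.0, 0.0], [1.0, 1.0]])
synthetic = np.array([[0.0, 0.0],   # exact copy of a real record
                      [2.0, 2.0]])  # nearest real record is (1, 1)
dcr = distance_to_closest_record(real, synthetic)
leaked = int((dcr == 0).sum())  # count of synthetic rows identical to a real one
```

In practice the threshold would be larger than zero (near-copies also leak information), and the distances of the synthetic set are often compared against a real-vs-real holdout baseline rather than judged in isolation.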
Develop methods to evaluate to what extent privacy is preserved in generated synthetic data.
Explain privacy issues in real-life data across different applications and industries. Train generative models on real-life data. Develop new machine learning methodology, in particular methods to measure the level of privacy in generated synthetic data.
Competence in machine learning.
- Michael Riegler
- Hugo Hammer
- Table evaluator
- DeepFake electrocardiograms using generative adversarial networks are the beginning of the end for privacy issues in medicine