Prediction of dry eye disease using metabolomics data
Development of methods to improve metabolomics data handling for predicting dry eye disease.
Using the TwinsUK database, we are examining a metabolomics dataset to predict dry eye disease from 901 metabolites in 1,500 participants. We have successfully predicted dry eye using machine learning. To enhance the predictions, we are refining the data pre-processing. Our improvements target:
1. Outliers: Assessing the impact of outliers and considering various handling methods.
2. Imputation: With 10.4% missing data, we are examining imputation techniques and considering why data might be missing.
3. Standardization/Normalization: Evaluating different methods and their effects on predictions.
Additional exploration areas:
4. Evaluation of Slope: Using "Slope" to identify significant metabolites in dry eye prediction.
5. Bootstrap Methods: Ensuring robust results and controlling cofactors.
6. Explainable AI Methods: Investigating interactions between metabolites and their significance to dry eye disease.
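
As a rough illustration of how the three pre-processing steps (outlier handling, imputation, standardization) could be chained and evaluated, here is a minimal sketch using scikit-learn. It is not the project's actual pipeline: the data are synthetic, `QuantileClipper` is a hypothetical helper written for this example, and the choices of quantile clipping, median imputation, z-scoring and logistic regression are assumptions made only to keep the sketch short.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class QuantileClipper(TransformerMixin, BaseEstimator):
    """Hypothetical helper: clip each column to quantiles learned during fit."""

    def __init__(self, lower=0.01, upper=0.99):
        self.lower = lower
        self.upper = upper

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # nanquantile ignores missing values, so clipping can run before imputation
        self.low_ = np.nanquantile(X, self.lower, axis=0)
        self.high_ = np.nanquantile(X, self.upper, axis=0)
        return self

    def transform(self, X):
        # np.clip leaves NaN entries as NaN; they are filled by the next step
        return np.clip(np.asarray(X, dtype=float), self.low_, self.high_)


# Synthetic stand-in for the metabolite matrix (assumed shapes, not TwinsUK data)
rng = np.random.default_rng(0)
n_samples, n_features = 300, 40
X = rng.lognormal(size=(n_samples, n_features))
signal = np.log(X[:, 0]) + np.log(X[:, 1])
y = (signal + rng.normal(scale=0.5, size=n_samples) > 0).astype(int)

# About 10.4% of entries missing completely at random, plus a few extreme values
X[rng.random(X.shape) < 0.104] = np.nan
X[rng.integers(0, n_samples, 5), rng.integers(0, n_features, 5)] = 1e4

pipeline = make_pipeline(
    QuantileClipper(),                 # 1. outlier handling
    SimpleImputer(strategy="median"),  # 2. imputation
    StandardScaler(),                  # 3. standardization
    LogisticRegression(max_iter=1000),
)

# All steps are refit inside each training fold, so no test-fold
# information leaks into the clipping limits, medians, or scaling.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")
print(f"Mean ROC AUC over {len(scores)} folds: {scores.mean():.3f}")
```

Wrapping the steps in a single pipeline means that swapping one method for another (for example a different imputer or scaler) changes one line, and each variant is scored under the same cross-validation splits.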
Goal
Refine methods for outlier handling, data imputation, and standardization.
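
One common way to compare imputation methods is to hide values that are actually known, impute them, and measure how far the imputed values are from the truth. The sketch below illustrates that idea under stated assumptions: the data are synthetic and correlated, the hidden entries are chosen completely at random (MCAR), `masked_rmse` is a hypothetical helper, and the two imputers are only examples of candidates.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

rng = np.random.default_rng(0)

# Synthetic, correlated stand-in for a complete metabolite matrix (not real data)
n_samples, n_features = 200, 20
latent = rng.normal(size=(n_samples, 3))
loadings = rng.normal(size=(3, n_features))
X_true = latent @ loadings + 0.3 * rng.normal(size=(n_samples, n_features))

# Hide about 10% of the entries completely at random and remember their positions
mask = rng.random(X_true.shape) < 0.10
X_missing = X_true.copy()
X_missing[mask] = np.nan


def masked_rmse(imputer):
    """Root mean squared error of the imputed values on the hidden entries only."""
    X_filled = imputer.fit_transform(X_missing)
    return float(np.sqrt(np.mean((X_filled[mask] - X_true[mask]) ** 2)))


results = {
    "median": masked_rmse(SimpleImputer(strategy="median")),
    "knn": masked_rmse(KNNImputer(n_neighbors=5)),
}
for name, rmse in results.items():
    print(f"{name}: RMSE on hidden entries = {rmse:.3f}")
```

Random masking is the easiest case. If values are missing for a reason, for instance because they fall below a detection limit, the masking step would need to mimic that mechanism for the comparison to be informative.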
Learning outcomes
- Machine Learning
- Outlier detection
- Data imputation
- Data standardization
- Medical applications
Qualifications
- Python programming
- Knowledge about machine learning is an advantage
Supervisors
- Hugo Hammer
- Michael Riegler
Collaboration partners
- Leif Hynnekleiv, OsloMet
References
- Associations between metabolomics and dry eye: https://www.aaojournal.org/article/S0161-6420(16)31057-0/fulltext
- Missing values types: https://stefvanbuuren.name/fimd/sec-MCAR.html
- RALPS method for metabolomics normalization: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9978579/
- Imputation in proteomics 2022: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8776850/
- Imputation in metabolomics 2018: https://www.nature.com/articles/s41598-017-19120-0