New tool and guide on evaluation metrics for medical applications of AI
There is increasing interest in machine learning as a medical tool to improve healthcare applications and support decision-making by medical professionals. At the same time, there is a lack of understanding of how to properly evaluate such models, so that the measured performance translates into clinical practice.
– As we start considering adoption of AI algorithms in medicine, there is a critical need for clinicians to understand what the performance metrics of those algorithms are and what they mean. This is similar to understanding the accuracy of diagnostic tests we use in clinical medicine. To address this need, Thambawita et al. provide a framework and a ready-to-use interpretable metric tool for anyone to use, says Sravanthi Parasa, a gastroenterologist involved in the project.
Helps researchers properly assess the quality of binary classification models
The newly launched open-source, web-based tool MediMetrics lets researchers and clinicians easily calculate, verify, understand and incorporate different metrics into their research. The tool was launched in conjunction with a study on how machine learning models may be inaccurately evaluated when performance is measured with only a subset of metrics.
Applying machine learning models in the real world is very different from a typical experimental setting: what works during testing may fail in a clinical setting. It is therefore essential to thoroughly evaluate every model with appropriate metrics to fully understand how it may perform in different environments and scenarios.
– Different metrics address different aspects of a model’s performance. It is therefore very important to understand exactly which question is being answered by the metric one is using, says postdoctoral researcher Inga Strümke.
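The point can be illustrated with a small sketch. The function below is illustrative only (it is not part of MediMetrics) and uses a hypothetical imbalanced test set: on such data, accuracy can look strong while sensitivity and Matthews correlation coefficient (MCC) reveal that the model misses many diseased patients.

```python
from math import sqrt

def metrics(tp, fp, tn, fn):
    """Common binary-classification metrics from a confusion matrix."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc

# Hypothetical imbalanced test set: 10 diseased, 90 healthy patients.
acc, sens, spec, mcc = metrics(tp=5, fp=5, tn=85, fn=5)
print(f"accuracy={acc:.2f} sensitivity={sens:.2f} "
      f"specificity={spec:.2f} mcc={mcc:.2f}")
# Accuracy is 0.90, yet half the diseased patients are missed
# (sensitivity 0.50) and MCC is only about 0.44.
```

Each metric answers a different question: accuracy asks how often the model is right overall, sensitivity asks how many diseased patients it finds, and MCC summarizes all four cells of the confusion matrix, which makes it more informative on imbalanced data.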
MediMetrics is available on medimetrics.no.
The code is available on GitHub.