On evaluation metrics for medical applications of artificial intelligence