Model Evaluation
Performance Evaluation
1 Reading List
"Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning" by Sebastian Raschka, 2020. (https://arxiv.org/abs/1811.12808)
"Towards a guideline for evaluation metrics in medical image segmentation", Müller et al. BMC Research Notes (2022) 15:210 (https://doi.org/10.1186/s13104-022-06096-y)
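2 Introduction to AI Model Performance Evaluation
Once we have built an AI model, the immediate next question is how well it performs. Because performance has many facets, we need a range of different measures to evaluate a model from every relevant angle. In this video series, we introduce methods for evaluating AI model performance from several perspectives. We begin with metrics for classification models and then use their properties to connect them to metrics for segmentation models. Finally, we cover common ways to assess a model's variability, generalizability, and reasonableness.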
Slides
AI 模型效能評估.pdf (AI Model Performance Evaluation) [XYZ01]
3 Performance Evaluation for Multi-class Classification
Reference
Mathematical formula basis
Explanation for micro-average & macro-average recall (sensitivity)
Galar, M., Fernández, A., Barrenechea, E., Bustince, H., & Herrera, F. (2011). An overview of ensemble methods for binary classifiers in multi-class problems: Experimental study on one-vs-one and one-vs-all schemes. Pattern Recognition, 44(8), 1761-1776.
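The micro- vs. macro-average distinction covered in the references above can be sketched in plain Python. This is a minimal illustration assuming single-label multi-class data; the function names are hypothetical (in practice, scikit-learn's `recall_score` with `average="micro"` or `average="macro"` is the usual route). Macro-averaging gives every class equal weight, while micro-averaging pools counts, so a large class dominates.

```python
from collections import Counter

def per_class_recall(y_true, y_pred, labels):
    """One-vs-rest recall (sensitivity) for each class that has samples."""
    tp = Counter()       # correctly predicted samples per true class
    support = Counter()  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            tp[t] += 1
    return {c: tp[c] / support[c] for c in labels if support[c] > 0}

def macro_recall(y_true, y_pred, labels):
    # Unweighted mean of per-class recalls: each class counts equally.
    recalls = per_class_recall(y_true, y_pred, labels)
    return sum(recalls.values()) / len(recalls)

def micro_recall(y_true, y_pred):
    # Pool TP and FN over all classes before dividing; for single-label
    # multi-class data this equals overall accuracy.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Imbalanced toy data: the minority class B is half missed.
y_true = ["A"] * 8 + ["B"] * 2
y_pred = ["A"] * 8 + ["A", "B"]
print(macro_recall(y_true, y_pred, ["A", "B"]))  # 0.75
print(micro_recall(y_true, y_pred))              # 0.9
```

The gap between 0.75 and 0.9 shows why macro-averaging is often preferred when minority classes matter clinically.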
4 Statistical Concepts
Confusion Matrix, Sensitivity, and Specificity
Machine Learning Fundamentals: The Confusion Matrix・StatQuest・[07:12]
Machine Learning Fundamentals: Sensitivity and Specificity・StatQuest・[11:46]
Sensitivity & Specificity Explained・Physiotutors・[07:57]
How to calculate Sensitivity and Specificity・Physiotutors・[05:44]
Sensitivity, Specificity, PPV, NPV・Dirty Medicine・[11:14]
The tradeoff between sensitivity and specificity・Rahul Patwari・[12:35]
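The confusion-matrix definitions in the videos above reduce to a few counts. Below is a minimal sketch, assuming binary labels with `1` as the positive class; the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count (TP, FP, TN, FN) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, truly positive
            else:
                fp += 1  # predicted positive, truly negative
        else:
            if t == positive:
                fn += 1  # predicted negative, truly positive
            else:
                tn += 1  # predicted negative, truly negative
    return tp, fp, tn, fn

def sensitivity(tp, fn):
    # TP / (TP + FN): share of truly positive cases the test detects.
    return tp / (tp + fn)

def specificity(tn, fp):
    # TN / (TN + FP): share of truly negative cases the test clears.
    return tn / (tn + fp)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn))  # 0.75
print(specificity(tn, fp))  # about 0.667
```

Lowering the decision threshold would turn some FN into TP (higher sensitivity) but also some TN into FP (lower specificity), which is the tradeoff discussed above.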
PPV, NPV, and Prevalence
How to Calculate Positive (PPV) and Negative Predictive Values (NPV)・Physiotutors・[05:17]
Likelihood Ratios Explained・Physiotutors・[08:18]
Sensitivity, Specificity, PPV, NPV・Dirty Medicine・[11:14]
Positive predictive value - the role of prevalence・Medmastery・[05:12]
Positive Predictive Value - The role of specificity・Medmastery・[02:53]
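The role of prevalence and specificity in PPV and NPV follows from Bayes' theorem. The sketch below assumes sensitivity, specificity, and prevalence are known proportions; the function names are illustrative only.

```python
def ppv(sens, spec, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sens * prevalence               # P(test+ and disease)
    false_pos = (1 - spec) * (1 - prevalence)  # P(test+ and no disease)
    return true_pos / (true_pos + false_pos)

def npv(sens, spec, prevalence):
    """Negative predictive value via Bayes' theorem."""
    true_neg = spec * (1 - prevalence)         # P(test- and no disease)
    false_neg = (1 - sens) * prevalence        # P(test- and disease)
    return true_neg / (true_neg + false_neg)

# The same test (sensitivity = specificity = 0.90) in populations
# with different prevalence.
for prev in (0.5, 0.1, 0.01):
    print(f"prevalence={prev:<5} PPV={ppv(0.9, 0.9, prev):.3f} "
          f"NPV={npv(0.9, 0.9, prev):.3f}")

# At low prevalence, raising specificity has a large effect on PPV.
print(f"{ppv(0.9, 0.99, 0.01):.3f}")
```

At 1% prevalence the PPV of this test is only about 0.08, even though sensitivity and specificity are both 0.90; raising specificity to 0.99 lifts it to roughly 0.48.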
Precision–Recall Curve (PRC) for imbalanced datasets
The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets・Takaya Saito and Marc Rehmsmeier・https://doi.org/10.1371%2Fjournal.pone.0118432
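A precision-recall curve is traced by lowering the decision threshold through the sorted scores. Below is a minimal sketch, assuming binary labels with at least one positive; tied scores are processed together, and average precision is taken as the step-wise sum of precision times recall increment (the same definition scikit-learn's `average_precision_score` uses). Function names are hypothetical.

```python
def precision_recall_points(y_true, scores):
    """(recall, precision) pairs, lowering the threshold score by score."""
    pairs = sorted(zip(scores, y_true), key=lambda x: -x[0])
    n_pos = sum(y_true)
    tp = fp = 0
    points = []
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume all samples tied at this score together.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / n_pos, tp / (tp + fp)))
    return points

def average_precision(points):
    # Step-wise AP: sum over thresholds of precision * increase in recall.
    ap, prev_recall = 0.0, 0.0
    for recall, precision in points:
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap

# Imbalanced toy data: 2 positives among 10 cases (prevalence 0.2).
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
points = precision_recall_points(y_true, scores)
print(average_precision(points))  # about 0.833
```

A random classifier's precision stays near the prevalence (0.2 here), so on imbalanced data the PR plot has a baseline that moves with prevalence, whereas the ROC baseline is always 0.5. For this same toy ranking the ROC AUC is 0.9375, noticeably more optimistic than the AP.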
Likelihood Ratios
Likelihood ratios in diagnostic testing・Wikipedia
Diagnostic tests 4: likelihood ratios・Deeks and Altman・BMJ. 2004・https://doi.org/10.1136/bmj.329.7458.168
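Likelihood ratios combine sensitivity and specificity, and convert a pre-test probability into a post-test probability through odds. A minimal sketch with hypothetical helper names:

```python
def likelihood_ratios(sens, spec):
    lr_pos = sens / (1 - spec)   # factor by which a positive result multiplies the odds
    lr_neg = (1 - sens) / spec   # factor by which a negative result multiplies the odds
    return lr_pos, lr_neg

def post_test_probability(pre_test_prob, lr):
    pre_odds = pre_test_prob / (1 - pre_test_prob)  # probability -> odds
    post_odds = pre_odds * lr                       # Bayes' theorem in odds form
    return post_odds / (1 + post_odds)              # odds -> probability

lr_pos, lr_neg = likelihood_ratios(0.9, 0.8)
print(round(lr_pos, 3), round(lr_neg, 3))              # 4.5 0.125
print(round(post_test_probability(0.2, lr_pos), 3))    # 0.529
```

With a pre-test probability of 0.2, the post-test probability after a positive result equals the PPV at 20% prevalence (0.18 / 0.34), which ties likelihood ratios back to the PPV/prevalence relationship above.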
Cross Validation
A Guide to Cross-Validation for Artificial Intelligence in Medical Imaging ・ Bradshaw, Huemann, Hu, and Rahmim ・ https://doi.org/10.1148/ryai.220232
A Guide to Cross-Validation for AI (with Dr. Tyler Bradshaw) ・ Arman Rahmim ・ [49:19]
Cross-validation Colab Tutorial with examples using MNIST ・ Zach Huemann
Cross Validation in Medical AI・蔡欣翰 ・ [47:11]
How to Train a Final Machine Learning Model・Jason Brownlee・February 9, 2017
Complete Guide to Cross Validation・Rob Mulla・[29:48]
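Stratified k-fold cross-validation keeps the class ratio similar in every fold, which matters for rare-disease datasets. The following is a minimal sketch, not scikit-learn's `StratifiedKFold` implementation; the function name, round-robin dealing, and fixed seed are assumptions made for illustration. It splits at the sample level only; in medical imaging it is generally advisable to split at the patient level so that images from one patient never appear in both the training and the test fold.

```python
import random
from collections import defaultdict

def stratified_kfold_indices(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) for k folds with roughly equal class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class's indices round-robin across the k folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

labels = [0] * 20 + [1] * 5   # 20% positives
for fold, (train, test) in enumerate(stratified_kfold_indices(labels, k=5, seed=42)):
    n_pos = sum(labels[i] for i in test)
    print(f"fold {fold}: train={len(train)} test={len(test)} positives in test={n_pos}")
```

The k scores estimate how well the modeling procedure generalizes; as Brownlee's article discusses, the model that is finally deployed is typically refit on all available data.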
SHapley Additive exPlanations (SHAP)
A method in machine learning that explains individual predictions by quantifying the contribution of each feature to a model's output.
An Introduction to SHAP Values and Machine Learning Interpretability・DataCamp・9 min read
SHapley Additive exPlanations (SHAP)・Conor O'Sullivan・Playlist [01:10:42]
SHAP values for beginners | What they mean and their applications・[7:07]
SHAP with Python (Code and Explanations)・[15:41]
The mathematics behind Shapley Values・[11:48]
Shapley Values for Machine Learning・[11:06]
4 Significant Limitations of SHAP・[6:35]
SHAP Violin and Heatmap Plots | Interpretations and New Insights・[5:26]
SHAP for Binary and Multiclass Target Variables | Code and Explanations for Classification Problems・[12:59]
Introduction to SHAP with Python・Conor O'Sullivan・10 min read
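SHAP values are approximations of Shapley values. For a handful of features, the Shapley values can be computed exactly by enumerating every coalition of features. The sketch below is not the `shap` library's algorithm; it assumes that a feature "absent" from a coalition is replaced by a single baseline value, whereas SHAP explainers typically average over a background dataset. The model and function names are hypothetical.

```python
from itertools import combinations
from math import factorial

def exact_shapley(model, x, baseline):
    """Exact Shapley values of each feature for one instance x.
    Features outside a coalition are set to their baseline value (an assumption)."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Toy model: two additive effects plus an interaction between features 0 and 2.
model = lambda z: 2 * z[0] + 3 * z[1] + z[0] * z[2]
x, baseline = [1, 2, 3], [0, 0, 0]
phi = exact_shapley(model, x, baseline)
print([round(v, 6) for v in phi])   # [3.5, 6.0, 1.5]
```

The interaction term (worth 3) is split equally between features 0 and 2, and the values sum to f(x) minus f(baseline) = 11. Enumeration costs grow as 2^n, which is why SHAP relies on approximations or model-specific algorithms.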
To Be Organized
Kappa Value Calculation | Reliability・Physiotutors・[03:28]
Reliability & Validity Explained・Physiotutors・[02:56]
Reliability (Reproducibility) Explained | Statistics in Healthcare・Physiotutors・[06:09]
Agreement Explained | Statistics in Healthcare・Physiotutors・[09:10]
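Cohen's kappa measures agreement between two raters beyond what chance alone would produce: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected from each rater's marginal frequencies. A minimal sketch with a hypothetical function name:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    # Observed agreement: fraction of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal proportions, summed over categories.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# 50 items: both "yes" 20, A yes/B no 5, A no/B yes 10, both "no" 15.
a = ["yes"] * 20 + ["yes"] * 5 + ["no"] * 10 + ["no"] * 15
b = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15
print(round(cohens_kappa(a, b), 3))  # 0.4
```

Here the raw agreement is 70%, but half of the agreement would be expected by chance, so kappa is only 0.4.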
X XYZ
Classification
Hypothesis Testing
Fisher's Exact Test [Slides, Video, Codes]
Mann-Whitney U Test [Slides, Video, Codes]
Paired DeLong's Method [Slides, Video, Codes]
P-Value [Slides, Video, Codes]
Confusion Matrix
Sensitivity [Slides, Video, Codes]
Specificity [Slides, Video, Codes]
Accuracy [Slides, Video, Codes]
Balanced Accuracy [Slides, Video, Codes]
ROC Curve
AUC [Slides, Video, Codes]
pROC [Slides, Video, Codes]
Descriptive Statistics
Mean [Slides, Video, Codes]
Median [Slides, Video, Codes]
IQR [Slides, Video, Codes]
Estimation
Confidence Interval [Slides, Video, Codes]
Likelihood
Likelihood ratio [Slides, Video, Codes]
Positive likelihood ratio [Slides, Video, Codes]
Negative likelihood ratio [Slides, Video, Codes]
Likelihood ratio test [Slides, Video, Codes]
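The AUC listed above has a direct link to the Mann-Whitney U statistic: AUC = U / (n_pos × n_neg), i.e. the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. A minimal O(n²) sketch for illustration only (R's pROC or scikit-learn's `roc_auc_score` are the usual tools; the function name here is hypothetical):

```python
def auc_mann_whitney(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    u = 0.0  # Mann-Whitney U statistic for the positive group
    for p in pos:
        for n in neg:
            if p > n:
                u += 1.0
            elif p == n:
                u += 0.5
    return u / (len(pos) * len(neg))

print(auc_mann_whitney([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Of the four positive/negative pairs, three are ranked correctly, giving 0.75.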
Segmentation
Hypothesis Testing
Fisher's Exact Test [Slides, Video, Codes]
Mann-Whitney U Test [Slides, Video, Codes]
P-Value [Slides, Video, Codes]
Confusion Matrix
Sensitivity [Slides, Video, Codes]
Specificity [Slides, Video, Codes]
Accuracy [Slides, Video, Codes]
Positive predictive value [Slides, Video, Codes]
Negative predictive value [Slides, Video, Codes]
True positive rate [Slides, Video, Codes]
False positive rate [Slides, Video, Codes]
ROC Curve
AUC [Slides, Video, Codes]
Descriptive Statistics
Mean [Slides, Video, Codes]
Median [Slides, Video, Codes]
IQR [Slides, Video, Codes]
Estimation
Confidence Interval [Slides, Video, Codes]
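The segmentation metrics above reuse the classification definitions by treating every pixel as one binary sample. Below is a minimal sketch, assuming two same-shaped binary masks given as nested lists of 0/1; the function name is hypothetical.

```python
def pixel_metrics(gt_mask, pred_mask):
    """Pixel-wise confusion-matrix metrics for two same-shaped binary masks."""
    tp = fp = tn = fn = 0
    for gt_row, pr_row in zip(gt_mask, pred_mask):
        for g, p in zip(gt_row, pr_row):
            if p and g:
                tp += 1
            elif p and not g:
                fp += 1
            elif g:
                fn += 1
            else:
                tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# 5x5 image: ground-truth lesion in rows 0-1, cols 0-1; prediction shifted one column right.
gt = [[1 if r < 2 and c < 2 else 0 for c in range(5)] for r in range(5)]
pred = [[1 if r < 2 and 1 <= c < 3 else 0 for c in range(5)] for r in range(5)]
for name, v in pixel_metrics(gt, pred).items():
    print(f"{name}: {v:.3f}")
```

Only half of the lesion is found (sensitivity 0.5), yet accuracy is 0.84 and specificity about 0.905 because background pixels dominate the true negatives. This is one reason accuracy and specificity can be misleading for segmentation.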
Detection
A
A1 [Slides, Video, Codes]
A2 [Slides, Video, Codes]
B
B1 [Slides, Video, Codes]
B2 [Slides, Video, Codes]
C
C1 [Slides, Video, Codes]
C2 [Slides, Video, Codes]
Registration
A
A1 [Slides, Video, Codes]
A2 [Slides, Video, Codes]
B
B1 [Slides, Video, Codes]
B2 [Slides, Video, Codes]
C
C1 [Slides, Video, Codes]
C2 [Slides, Video, Codes]
Prognosis
Confusion Matrix
Sensitivity [Slides, Video, Codes]
Specificity [Slides, Video, Codes]
Accuracy [Slides, Video, Codes]
Positive predictive value [Slides, Video, Codes]
Negative predictive value [Slides, Video, Codes]
True positive rate [Slides, Video, Codes]
False positive rate [Slides, Video, Codes]
Hypothesis Testing
Student t test [Slides, Video, Codes]
Log rank test [Slides, Video, Codes]
P-Value [Slides, Video, Codes]
Survival analysis
Kaplan-Meier analysis [Slides, Video, Codes]
Survival curve [Slides, Video, Codes]
C-index [Slides, Video, Codes]
Odds
Odds ratio [Slides, Video, Codes]
Estimation
Confidence interval [Slides, Video, Codes]
Descriptive Statistics
Median [Slides, Video, Codes]
ROC Curve
AUC [Slides, Video, Codes]
XGBoost
SHAP [Slides, Video, Codes]
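The survival-analysis items above can be illustrated with two small functions: the Kaplan-Meier product-limit estimator and Harrell's C-index. This is a minimal sketch with hypothetical names; it assumes right-censored data with `events[i] = 1` for an observed event and `0` for censoring, treats subjects censored at an event time as still at risk at that time, and (as a simplification) ignores pairs with tied times in the C-index. Libraries such as lifelines handle ties and variance estimation more carefully.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk  # conditional survival past time t
            curve.append((t, surv))
        n_at_risk -= deaths + censored      # remove events and censored subjects
    return curve

def harrell_c_index(times, events, risk):
    """Fraction of comparable pairs where the higher-risk subject fails first."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is an observed event
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 0, 1, 0, 1]
print(kaplan_meier(times, events))
print(harrell_c_index(times, events, [6, 5, 4, 3, 2, 1]))  # 1.0 (risk perfectly ordered)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, analogous to the AUC for binary outcomes.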
5 Case Study
Segmentation
Classification
Deep Learning to Distinguish Pancreatic Cancer Tissue From Non-cancerous Pancreatic Tissue: a Retrospective Study With Cross-racial External Validation
The Lancet Digital Health, Vol. 2, Iss. 6, pp. 303-313, 2020
https://doi.org/10.1016/S2589-7500(20)30078-9
Hypothesis Testing
Fisher's Exact Test
Mann-Whitney U Test
P-Value
Confusion Matrix
Sensitivity
Specificity
Accuracy
Balanced Accuracy
ROC Curve
AUC
Descriptive Statistics
Mean
Median
IQR
Estimation
Confidence Interval
Classification
Detection of Pancreatic Cancer With Two- and Three-dimensional Radiomic Analysis in a Nationwide Population-based Real-world Dataset
BMC Cancer, 23:58, 2023
https://doi.org/10.1186/s12885-023-10536-8
Likelihood
Likelihood ratio
Positive likelihood ratio
Negative likelihood ratio
Likelihood ratio test
Confusion Matrix
Sensitivity
Specificity
Accuracy
ROC Curve
AUC
pROC
Hypothesis Testing
Fisher's Exact Test
Mann-Whitney U Test
Paired DeLong's Method
Prognosis prediction
Radiomic Analysis of Magnetic Resonance Imaging Predicts Brain Metastases Velocity and Clinical Outcome After Upfront Radiosurgery
Neuro-Oncology Advances, vdaa100, 2020
https://doi.org/10.1093/noajnl/vdaa100
Confusion Matrix
Sensitivity
Specificity
Accuracy
Positive predictive value
Negative predictive value
True positive rate
False positive rate
Hypothesis Testing
Student t test
Log rank test
P-Value
Survival analysis
Kaplan-Meier analysis
Survival curve
C-index
Odds
Odds ratio
Estimation
Confidence interval
Descriptive Statistics
Median
ROC curve
XGBoost
SHAP
Classification
Radiomic Features Distinguish Pancreatic Cancer from Non-cancerous Pancreas.
Radiology: Imaging Cancer, Vol. 3, No. 4, 2021.
https://doi.org/10.1148/rycan.2021210010
Confusion Matrix
Sensitivity
Specificity
Accuracy
Descriptive Statistics
Median
IQR
Mean
Standard deviation
ROC Curve
AUC
Hypothesis Testing
DeLong Test
Fisher's Exact Test
Mann-Whitney U Test
P-value
Inflated type I error
Estimation
Confidence Interval
Probability Distribution
Binomial distribution
Student t distribution
XGBoost
Segmentation
Classification
Pancreatic cancer detection on CT scans with deep learning: a nationwide population-based study
Radiology, Vol. 306, No. 1, pp. 172-182, 2023.
https://pubs.rsna.org/doi/full/10.1148/radiol.220152
Confusion Matrix
Sensitivity
Specificity
Accuracy
Hypothesis Testing
McNemar Test
Fisher's Exact Test
Mann-Whitney U Test
P-value
ROC Curve
AUC
Estimation
Confidence interval
Likelihood
Likelihood ratio
Descriptive Statistics
IQR