Fig. 2 | Cardiovascular Diabetology

From: Machine learning in precision diabetes care and cardiovascular risk prediction

Discrimination, calibration, and net clinical benefit. Comprehensive evaluation of a predictive model requires simultaneous assessment of its discrimination, calibration, and incremental value beyond the current standard of care. A The area under the receiver operating characteristic curve (AUROC) reflects the trade-off between sensitivity (true positive rate) and specificity (1 − false positive rate) at different thresholds and provides a measure of separability, that is, the model's ability to distinguish between classes (0.5 = no separation, 1 = perfect separation). B Models with similar AUROC may behave differently as the prevalence of the label varies. The precision–recall curve demonstrates the trade-off between positive predictive value (precision) and sensitivity (recall), and illustrates how the area under this curve can fall substantially as the prevalence of the label of interest decreases from 50 to 5%. C Models with similar AUROC may also differ in their calibration. A well-calibrated model (blue line) makes probabilistic predictions that match real-world event rates, whereas the model shown in orange underestimates risk at lower predicted probabilities and overestimates it at higher ones. D Finally, models should be compared against established standards of care while incorporating clinical consequences, comparing the net clinical benefit across risk thresholds against established approaches or no risk stratification. Curves were generated using synthetic datasets for illustration purposes
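Two of the quantities in this caption can be sketched in a few lines of code: AUROC can be computed directly as the Mann–Whitney U statistic (the probability that a randomly chosen positive case receives a higher predicted risk than a randomly chosen negative case), and net benefit at a risk threshold pt is TP/n − FP/n × pt/(1 − pt). The snippet below is a minimal illustration on a small synthetic dataset; the labels and scores are invented for demonstration and are unrelated to the article's data.

```python
def auroc(y_true, y_score):
    """AUROC via the Mann-Whitney U statistic: the fraction of
    positive/negative pairs where the positive is ranked higher
    (ties count as half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def net_benefit(y_true, y_score, pt):
    """Net benefit at risk threshold pt: TP/n - FP/n * pt / (1 - pt).
    Patients with predicted risk >= pt are treated as 'positive'."""
    n = len(y_true)
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= pt)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= pt)
    return tp / n - fp / n * pt / (1 - pt)

# Synthetic example: 10 patients, 4 events, with predicted risks.
y_true  = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.2, 0.3, 0.5, 0.6, 0.4, 0.7, 0.8, 0.9]

print(round(auroc(y_true, y_score), 3))           # 0.917
print(round(net_benefit(y_true, y_score, 0.5), 3))  # 0.1
```

A decision curve (panel D) is obtained by evaluating `net_benefit` over a grid of thresholds and comparing against the "treat all" strategy (every patient counted as positive) and "treat none" (net benefit of zero).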
