
Table 2 Overview of six machine-learning models analyzed on all 345 features in binary classification

From: Machine-learning to stratify diabetic patients using novel cardiac biomarkers and integrative genomics

| Model | Training accuracy | Training (StDev) | Testing accuracy | Testing (StDev) | F1 score | Important features | Important feature bias | AUC |
|---|---|---|---|---|---|---|---|---|
| LR | 0.608 | 0.301 | 0.667 | 0.0 | 0.640 | Complex III, Complex I, CpG31, CpG28, CpG30, Complex IV, CpG8, CpG4, CpG12, Age | (−2.688), (−1.688), (1.648), (−1.163), (−1.016), (0.982), (0.945), (0.887), (0.882), (0.848) | NA |
| LDA | 0.567 | 0.203 | 0.556 | 0.0 | 0.400 | SNP16245, SNP16344, SNP151, SNP5463, SNP4295, SNP13722, SNP94, SNP15884, SNP9055, SNP477 | (−3.896E+15), (−3.896E+15), (−3.896E+15), (−3.896E+15), (−2.719E+15), (−2.719E+15), (3.398E+14), (3.398E+14), (3.398E+14), (0.266) | 0.700 |
| KNN | 0.642 | 0.239 | 0.444 | 0.0 | 0.430 | NA | NA | 0.600 |
| NB | 0.725 | 0.227 | 0.778 | 0.0 | 0.780 | Mito 5hmC, Methyltransferase | (1.000), (0.000) | 0.775 |
| SVM | 0.583 | 0.337 | 0.667 | 0.0 | 0.640 | Complex III, CpG31, Complex I, CpG28, CpG8, CpG22, CpG12, CpG29, CpG4, CpG35 | (−0.732), (0.488), (−0.443), (−0.372), (0.350), (−0.349), (0.322), (−0.260), (0.259), (0.257) | NA |
| CART | 0.790 | 0.209 | 0.711 | 0.1 | 0.714 | CpG24, CpG28, Nuc 5mC, CpG11, CpG23, CpG1, CpG4 | (0.587%), (0.213%), (0.040%), (0.040%), (0.040%), (0.040%), (0.040%) | 0.715 |

  1. Model analysis was conducted five times, and averages are reported for the resulting training accuracy, training standard deviation, testing accuracy, testing standard deviation, F1 score, and area under the curve (AUC). Important biomarker features for each trained model are listed in order of influence within the model, together with the influence value for each feature. For LR, LDA, and SVM, feature bias is an influence parameter whose magnitude dictates the feature's influence: a positive value indicates that the biomarker favors classification toward one label, a negative value indicates that it favors the opposite label, and the larger the magnitude, the more strongly that feature shifts classification. For NB, feature influence indicates the most important biomarker per class in the binary (0, 1) classification scheme. For CART, feature bias percentages indicate each feature's influence on the resulting classification tree; larger percentages indicate a feature that appears near the root of the tree, before subsequent branching. Influence is not provided for KNN because the model does not assign per-feature weights.
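The footnote reports each metric as an average over five runs together with a standard deviation. A minimal sketch of that summary step follows, assuming the per-run scores are available as plain lists. The helper name `summarize_runs` is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the table does not say which estimator was used. The example scores are placeholders, not data from the study.

```python
import statistics
from typing import Dict, List, Tuple


def summarize_runs(scores: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Return (mean, standard deviation) for each metric across repeated runs.

    `scores` maps a metric name (e.g. "training", "testing") to the value
    obtained in each run. Using the sample standard deviation is an
    assumption; the table does not state which estimator was used.
    """
    summary: Dict[str, Tuple[float, float]] = {}
    for metric, values in scores.items():
        if len(values) < 2:
            # statistics.stdev needs at least two data points.
            raise ValueError(f"need at least two runs for {metric!r}")
        summary[metric] = (statistics.mean(values), statistics.stdev(values))
    return summary


# Placeholder scores for five hypothetical runs (not data from the study).
example = summarize_runs({"testing": [0.6, 0.7, 0.6, 0.7, 0.6]})
print(example)
```

Swapping `statistics.stdev` for `statistics.pstdev` would give the population standard deviation instead; with only five runs the two can differ noticeably.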
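For the linear models (LR, LDA, SVM), the footnote reads influence as a signed value: the magnitude ranks the feature and the sign indicates which label it favors. The sketch below applies that reading to the LR values reported in the table. The helper `rank_by_influence` is hypothetical, and mapping a positive value to label 1 and a negative value to label 0 is an assumption, because the table does not state which class each sign favors.

```python
from typing import Dict, List, Optional, Tuple


def rank_by_influence(
    coefficients: Dict[str, float], top_k: int = 10
) -> List[Tuple[str, float, Optional[int]]]:
    """Rank features by |coefficient| and report the label each one favors.

    Assumption: a positive value pushes the prediction toward label 1 and a
    negative value toward label 0; a zero value favors neither (None).
    """
    ranked = sorted(coefficients.items(), key=lambda item: abs(item[1]), reverse=True)
    result: List[Tuple[str, float, Optional[int]]] = []
    for name, value in ranked[:top_k]:
        label = 1 if value > 0 else 0 if value < 0 else None
        result.append((name, value, label))
    return result


# LR influence values as reported in Table 2.
lr_coefficients = {
    "Complex III": -2.688,
    "Complex I": -1.688,
    "CpG31": 1.648,
    "CpG28": -1.163,
    "CpG30": -1.016,
    "Complex IV": 0.982,
    "CpG8": 0.945,
    "CpG4": 0.887,
    "CpG12": 0.882,
    "Age": 0.848,
}

for name, value, label in rank_by_influence(lr_coefficients, top_k=3):
    print(f"{name}: {value:+.3f} -> favors label {label}")
```

Sorting by absolute value reproduces the order in which the table lists the LR features, which is consistent with the footnote's statement that features are listed in order of influence.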
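The footnote links larger CART percentages to features that split near the root of the tree. The table does not say how CART importance was computed; a common definition (used, for example, by scikit-learn's `feature_importances_`) sums each feature's weighted Gini impurity decrease over all of its splits and normalizes the totals to 1. The seven CART values in the table sum to 1.000, which would be consistent with normalized importances of this kind. The sketch below assumes that definition; the `Split` record format, the helper names, and the feature names `feature_A` and `feature_B` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Split(NamedTuple):
    """One internal node of a fitted tree (hypothetical record format)."""
    feature: str
    class_counts: List[int]  # samples per class reaching this node
    left_counts: List[int]   # samples per class sent to the left child
    right_counts: List[int]  # samples per class sent to the right child


def gini(counts: List[int]) -> float:
    """Gini impurity 1 - sum(p_k^2) of a node with the given class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def impurity_importance(splits: List[Split]) -> Dict[str, float]:
    """Sum the weighted impurity decrease per feature and normalize to 1."""
    n_samples = sum(splits[0].class_counts)  # assumes splits[0] is the root
    raw: Dict[str, float] = defaultdict(float)
    for s in splits:
        n, n_l, n_r = sum(s.class_counts), sum(s.left_counts), sum(s.right_counts)
        decrease = (
            n * gini(s.class_counts)
            - n_l * gini(s.left_counts)
            - n_r * gini(s.right_counts)
        )
        raw[s.feature] += decrease / n_samples
    total = sum(raw.values())
    return {feature: value / total for feature, value in raw.items()}


# Toy tree: the root split on "feature_A" separates most samples, and a
# deeper split on "feature_B" cleans up the left child.
toy_tree = [
    Split("feature_A", [10, 10], [9, 1], [1, 9]),
    Split("feature_B", [9, 1], [9, 0], [0, 1]),
]
print(impurity_importance(toy_tree))
```

In this toy tree the root split removes far more impurity than the deeper split, so `feature_A` receives about 78% of the normalized importance and `feature_B` about 22%. This illustrates why, under this definition, features used near the root tend to dominate.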