Table 3 Overview of the six machine-learning model analyses on all 345 features in multiple classification

From: Machine-learning to stratify diabetic patients using novel cardiac biomarkers and integrative genomics

| Model | Training accuracy | Training StDev | Testing accuracy | Testing StDev | F1 score | Important features | Important feature bias |
| --- | --- | --- | --- | --- | --- | --- | --- |
| LR | 0.333 | 0.207 | 0.444 | 0.0 | 0.430 | Complex V, CpG35, BMI, CpG38, CpG18, CpG40, CpG19, CpG23, Complex IV, CpG25 | (−2.417), (−2.214), (1.942), (−1.541), (−1.313), (−0.994), (−0.881), (−0.824), (−0.812), (0.8071) |
| LDA | 0.433 | 0.178 | 0.333 | 0.0 | 0.170 | SNP11167, SNP10506, SNP16309, SNP16343, SNP2294, SNP14139, SNP16162, SNP3672, SNP8642, SNP143 | (−4.623E+14), (−4.623E+14), (−4.623E+14), (−4.623E+14), (−4.623E+14), (−4.623E+14), (−4.623E+14), (−4.623E+14), (−4.623E+14), (5.779E+13) |
| KNN | 0.358 | 0.239 | 0.444 | 0.0 | 0.450 | NA | NA |
| NB | 0.425 | 0.243 | 0.778 | 0.0 | 0.780 | Methyltransferase, Mito 5hmC, Nuc 5hmC | (0.000), (1.000), (2.000) |
| SVM | 0.442 | 0.163 | 0.556 | 0.0 | 0.520 | Complex V, BMI, Complex III, Complex I, Complex IV, CpG31, Age, CpG19, CpG22, CpG6 | (−0.943), (0.754), (0.561), (−0.383), (−0.344), (0.307), (−0.287), (−0.268), (−0.210), (0.198) |
| CART | 0.660 | 0.257 | 0.556 | 0.0 | 0.558 | CpG24, TFAM CpG, TFAM Non-CpG, BMI, SNP94, Complex IV, SNP8557, CpG7, SNP242, SNP13722, Complex III, Mito 5mC | (0.328%), (0.206%), (0.176%), (0.137%), (0.016%), (0.045%), (0.016%), (0.016%), (0.016%), (0.016%), (0.016%), (0.016%) |

  1. Each model analysis was run five times; the table reports the averages of the resulting training accuracy, training standard deviation, testing accuracy, testing standard deviation, and F1 score. The important biomarker features of each trained model are listed in order of influence, together with the influence value of each feature. For LR, LDA, and SVM, the feature bias is an influence parameter whose magnitude indicates how strongly the feature shifts classification: a positive value indicates that the biomarker favors classification towards one label, while a negative value favors the opposite label. For NB, the feature influence indicates the most important biomarker for each class in the multiple (0, 1, 2) classification scheme. For CART, the feature bias percentages indicate each feature's influence on the resulting classification tree; larger percentages indicate a feature that appears near the beginning of the tree, before subsequent branching. Influence is not provided for KNN due to model restrictions.
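
As an illustration of how the signed influence values for LR, LDA, and SVM can be read, the following minimal Python sketch ranks features by the magnitude of their influence value and reports the sign. The values are copied from the SVM row of the table; the helper names `rank_by_influence` and `direction` are hypothetical, and the table does not state which class a positive or negative sign corresponds to, so the labels printed here are placeholders rather than the authors' class assignments.

```python
# (feature, influence value) pairs copied from the SVM row of Table 3.
svm_influence = [
    ("Complex V", -0.943),
    ("BMI", 0.754),
    ("Complex III", 0.561),
    ("Complex I", -0.383),
    ("Complex IV", -0.344),
    ("CpG31", 0.307),
    ("Age", -0.287),
    ("CpG19", -0.268),
    ("CpG22", -0.210),
    ("CpG6", 0.198),
]


def rank_by_influence(pairs):
    """Sort (feature, value) pairs by descending absolute influence."""
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


def direction(value):
    """Placeholder reading of the sign (assumption: the source does not
    say which class each sign favors)."""
    return "favors one label" if value > 0 else "favors the opposite label"


ranked = rank_by_influence(svm_influence)
for name, value in ranked:
    print(f"{name:12s} {value:+.3f}  {direction(value)}")
```

Sorting by absolute value leaves the SVM row in the order shown in the table, which matches the statement that important features are listed in order of influence.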