An innovative model for predicting coronary heart disease using triglyceride-glucose index: a machine learning-based cohort study

Background Various predictive models have been developed for predicting the incidence of coronary heart disease (CHD), but none of them has had optimal predictive value. Although these models consider diabetes as an important CHD risk factor, they do not consider insulin resistance or triglyceride (TG). The unsatisfactory performance of these prediction models may be attributed to the ignoring of these factors despite their proven effects on CHD. We decided to modify standard CHD predictive models through machine learning to determine whether the triglyceride-glucose index (TyG-index, a logarithmized combination of fasting blood sugar (FBS) and TG that demonstrates insulin resistance) functions better than diabetes as a CHD predictor. Methods Two-thousand participants of a community-based Iranian population, aged 20–74 years, were investigated with a mean follow-up of 9.9 years (range: 7.6–12.2). The association between the TyG-index and CHD was investigated using multivariate Cox proportional hazard models. By selecting common components of previously validated CHD risk scores, we developed machine learning models for predicting CHD. The TyG-index was substituted for diabetes in CHD prediction models. All components of machine learning models were explained in terms of how they affect CHD prediction. CHD-predicting TyG-index cut-off points were calculated. Results The incidence of CHD was 14.5%. Compared to the lowest quartile of the TyG-index, the fourth quartile had a fully adjusted hazard ratio of 2.32 (confidence interval [CI] 1.16–4.68, p-trend 0.04). A TyG-index > 8.42 had the highest negative predictive value for CHD. The TyG-index-based support vector machine (SVM) performed significantly better than diabetes-based SVM for predicting CHD. The TyG-index was not only more important than diabetes in predicting CHD; it was the most important factor after age in machine learning models. Conclusion We recommend using the TyG-index in clinical practice and predictive models to identify individuals at risk of developing CHD and to aid in its prevention. Supplementary Information The online version contains supplementary material available at 10.1186/s12933-023-01939-9.


Introduction
CHD is a major public health challenge and contributes to the global disease burden.Despite improved prevention methods and treatment techniques [1,2], it is still the leading cause of morbidity and mortality worldwide, representing 32% of all deaths [3], and an enormous stress on the national health finances [4,5].Thus, CHD risk assessment is a global public health priority.
A better prediction of CHD may be possible by considering insulin resistance, which occurs years or even decades before diabetes [18].Previous Mendelian randomized analyses, systematic reviews, and meta-analyses have advocated the association between insulin resistance and CHD by altering vascular wall responses for insulin and promoting atherosclerosis [19][20][21].The hyperinsulinemic-euglycemic clamp test is the gold standard of insulin resistance measurement, but it is not applicable in clinical studies because of its invasive, complicated, and expensive protocol [22,23].Another validated index is the homeostasis model assessment of insulin resistance (HOMA-IR) calculated by dividing serum glucose by insulin concentrations.Circulating insulin concentration is not routinely measured in primary care.Moreover, it has limited value in subjects receiving subcutaneous insulin.Therefore, HOMA-IR is not a suitable index for primary prevention strategies [23].The TyG-index is a logarithmized product of FBS and TG.It has been shown to correlate highly with the hyperinsulinemiceuglycemic clamp and HOMA-IR [24].Moreover, it is a simple, low-cost protocol that can be used in all subjects regardless of their insulin treatment status [23].Additionally, it contains TG, another risk factor for CHD [25,26] as indicated by several studies; nonetheless, it has not been considered in previous models [6][7][8][9][10][11][12][13].Therefore, it seems sensible to modify these models with the TyGindex and then evaluate their effectiveness.
Machine learning algorithms have been demonstrated to be extremely useful in predicting cardiovascular disease [27].Their ability to capture complex interactions and nonlinear relationships between variables and outcomes makes them superior to standard statistical models [28].Several studies have shown that machine learning algorithms outperform traditional models [29][30][31].Despite this, no study has explored the impact of TyG-index on the prediction of CHD through machine learning.For these reasons, machine learning models should be chosen to fully assess how TyG-index and diabetes impact and interact with other variables when predicting CHD.
In view of the above, the primary objective of the current study was to investigate the association between the TyG-index and CHD in a 10-year prospective cohort study.The ultimate objective was to modify standard CHD predictive models through machine learning to determine whether the TyG-index functions better than diabetes as a CHD predictor.

Study population
This cohort study was conducted using data from Yazd Healthy Heart Project (YHHP) a population-based epidemiological study evaluating cardiovascular diseases and metabolic disorders [32].
In YHHP, 100 clusters and 20 families from each cluster were defined, and one adult (aged 20-74 years) from each family was randomly selected for participation and evaluation in the first phase conducted in 2005-2006 (n = 2000, men = 1000, women = 1000) [32].

Included participants
From the 2000 participants, 17 were omitted from the study due to loss during the second phase; from the 1983 individuals participating in the baseline examination, 62 were excluded due to diagnosis of CHD at baseline, 78 due to death during the study, and 308 due to missing data.The remaining 1552 participants (804 men, mean age 48.6 ± 14.7 years) were included in the present study (Fig. 1).

Ethical approval
The present study was approved by the Shahid Sadoughi University of Medical Sciences ethics committee (ethics code: IR.SSU.REC.1401.069)and conducted based on the Declaration of Helsinki on medical research [33].Informed consent was obtained from study participants during the initial and follow-up phases.The present research is reported based on strengthening the reporting of observational studies in the epidemiology (STROBE) statement [34].

Anthropometric and blood pressure measurements
Height was measured in both phases using a stadiometer fixed on a wall with no dents or bumps.While the participants were standing barefoot, their heels, hips, shoulders, and head touching the wall, and their head fixed horizontally to the nearest 0.5 cm.Participants were weighed to the nearest 0.1 kg in the first phase using a digital scale (Seca, Germany) with minimal clothing and in the second phase using another digital scale (Model BF511, Omron Co. Karada body scan, Osaka, Japan).The superior border of the iliac crest and widest part of the buttock were considered to measure waist and hip circumferences, respectively, to the nearest 0.1 cm using a non-stretchable tape.
An automatic digital blood pressure monitor (Omron, M6 comfort, Osaka, Japan) was used to measure blood pressure of the participants' right arms, while they were in the sitting position.Blood pressure measurements were taken by a trained nurse twice, with an interval of 5 min [32].

Data collection
Data including demographic features, education, physical activity, smoking habits, family history of premature CHD, and dietary habits were collected by completing questionnaires.
Trained interviewers completed questionnaires to assess physical activity, educational attainment, dietary habits and smoking status, in the first phase of the study.For educational attainment, participants were categorized as having a primary, high school, or academic education.Physical activity was assessed using the International Physical Activity Questionnaire (IPAQ) [35].Participants were categorized as having low, moderate, or vigorous level of activity if their activity was < 600, 600-1200, or > 1200 kilocalories/week, respectively.Participants were divided into groups of smokers or nonsmokers based on their current smoking status.CHD occurrence in either father or brother less than 45 years of age, or mother or sister less than 55 years of age was defined as a family history of premature CHD [32].A questionnaire was used to determine the use of fried foods, salt, removing poultry skin, eating out, meat consumption, and removing fat from meat.
CHD events were defined as occurrences of fatal or non-fatal CHD, myocardial infarction (MI), percutaneous coronary intervention (PCI), coronary artery bypass grafting (CABG), and new angina.The diagnosis of new angina was based on positive findings from the Rose angina questionnaire [36] in addition to positive electrocardiogram changes, elevated cardiac enzymes, and positive exercise tolerance test or coronary artery angiogram.
The time of outcome for fatal or non-fatal CHD, MI, CABG, positive exercise test, positive cardiac enzymes, and PCI was determined based on medical records.All Rose angina questionnaires [36] and electrocardiograms were investigated by an expert medical practitioner.

Statistical analysis
Statistical analyses were performed with SPSS version 24.0 (IBM Corp., Armonk, NY, USA), Python 3, and R version 4.2.2 (www.R-proje ct.org).Continuous variables were described as mean ± standard deviation (SD) and compared by independent T-test or ANOVA.Categorical variables were described as numbers (percentage) and compared using chi-square tests.
The TyG-index, the primary exposure variable of interest, was defined as: and analysed as quartiles based on sex-specific distributions and as continuous measures.Multivariable Cox proportional hazard models were used to estimate the risk of CHD development.Four models were evaluated: model I was adjusted for age and sex; model II was further adjusted for physical activity, education, family history of premature CHD, and smoking; model III was further adjusted for total cholesterol, HDL, body mass index (BMI), waist-to-hip ratio, blood pressure, SUA, and LDL; and, model IV was further adjusted for consuming fried foods, adding salt, removing poultry skin, using high fat dairy products, dining out, meat consumption, and removing fat from meat.Finally, medication use was adjusted in our models for investigating whether it could modify the association.
The "OptimalCutpoints" [37] R package was used to assess TyG-index cut-off points that can predict CHD.We stratified these cut-points based on sex and diabetes status.
In accordance with previous studies [31,38], we selected several machine-learning models to construct CHD-prediction models (logistic regression, decision tree, random forest, K nearest neighbor (KNN), and SVM).To simulate previous standard CHD predictor models, we investigated the literature and selected the common components between Framingham risk scores [6], SCORE CVD death risk score [7], QRISK risk calculator [12], Reynolds CVD risk score [8], ACC/AHA pooled cohort hard CVD risk calculator [9], JBS3 risk score [10], MESA risk score [11], and China-PAR risk predictor [13].As a result of these investigations, age, sex, blood pressure, total cholesterol, HDL, waist-tohip ratio, diabetes, smoking status, and family history of premature heart disease were considered in simulating a standard CHD prediction model.As part of the preprocessing of data, all missing values and evaluated outliers and highly correlated features were excluded.Because of imbalanced outcome data (14.5% incidence), we used SMOTE (over-sampling method) [39], which Tg mg/dL × fasting glucose mg/dL 2 has been proven reliable for CHD [38].After standardizing continuous variables and randomly splitting data into 70/30, we trained models on the larger part of the dataset and evaluated their performance on the smaller part.Afterward, we modified our dataset, by substituting the TyG-index for diabetes, and repeated the previous steps.For demonstrating the comparison of true positive, true negative, false negative, and false positive values of models, we used confusion matrices.We chose to use different color spectra to help illustrate the comparison, and make it easier to understand.To report model performance we calculated area under the curves (AUC), sensitivity, specificity, Cohen-kappa score Matthew's correlation coefficient, and F1-score.We used the generally accepted AUC index [31] and DeLong test [40] to compare the performance of these models.In order to make sense of machine learning models and counter the black box character of machine learning models, we used the "Dalex" library [41] to determine how much the performance of a model changes when a selected explanatory variable is removed.

Results
Additional file 1: Table S1 summarizes the baseline characteristics of the study participants according to the follow-up process.Participants lost to follow-up were significantly older and less frequently male than participants who completed the follow-up.
Additional file 1: Table S2 explains the baseline characteristics of the study participants based on their gender.
The baseline characteristics of participants according to TyG-index quartiles are presented in Table 1.Participants in the highest quartile of serum TyG-index levels (TyG-index > 9.32) were older and had higher total cholesterol, TG, SUA, and fasting blood glucose levels, higher diabetes rates, blood pressure and anthropometric indices, lower HDL levels, and less education.

TyG-index and incidence of CHD
The overall incidence of new-onset CHD in the second visit was 14.5%.The incidence of CHD was 6.4%, 11.1%, 14%, and 26% in quartiles 1 to 4, respectively.
When stratifying for gender, the association between TyG-index and risk of CHD in men was no longer significant after adjusting for laboratory markers and dietary patterns, yet it was still significantly associated with CHD in women: HR 4.65 (1.34-16.1)for Q4 compared to Q1. Diabetes medications confounded the association between TyG-index and CHD but dyslipidaemia treatment did not.A TyG-index higher than 9.07 in women and 8.92 in men had the highest sensitivity and specificity simultaneously for predicting CHD (Table 3).
Table 4 shows the statistical functions, as well as the confusion matrices for predicting models consisting of true positive, false positive, true negative, and false negative values.Random forest models had the highest sensitivity and specificity.A significant improvement was seen in the SVM model after modification with the TyG-index.Other models showed no significant changes.In Fig. 2, all the components of these models are compared in terms of their impact on prediction.Eliminating diabetes decreased AUC

Discussion
The results of this prospective cohort study in a community-based Iranian population followed for 9.9 years indicate that higher a TyG-index is associated with a higher risk of CHD.This association was more evident in females.Additionally, TyG-index outperformed diabetes in CHD prediction models.

CHD and TyG-index association
An association between the TyG-index and CHD was previously confirmed in both observational [23,[42][43][44][45][46][47][48][49] and meta-analyses studies [19,50,51], but the inconsistency in predictive values, the incompleteness of confounding factors (especially diet and medications), and the need to investigate the association in non-diabetic patients in observational studies and heterogeneity in meta-analyses prompted the current study [19].Previous studies have suggested TyG-index cutoff points of 9 and 9.323 for preventing CHD [52].The results of the current study will aid healthcare providers in our region to screen their patients for a TyG-index of ≥ 8.42, which our results showed as having the highest negative predictive value, and to consider pharmacological treatment for values of ≥ 9.28, which had the highest positive predictive value in the current study, and to control those under 8.99, which had the highest sensitivity and specificity simultaneously.

Mechanisms
FBS and TG are reflections of insulin resistance in the liver and adipocytes, respectively [53].A combination of these two factors, the TyG-index demonstrated 96.5% sensitivity and 85% specificity for detecting insulin resistance, a better performance than that of HOMA-IR [51].Resistance to insulin can trigger inflammatory processes, lipid metabolism deregulations, sympathetic nervous system over-activation, endothelial dysfunction, and eventually, thrombosis and CHD [43,45,46,51,[54][55][56][57].Therefore, the TyG-index can serve as a simple, practical, cost-effective, reproducible, and reliable surrogate marker for insulin resistance measurement in CHD prevention plans [54].

TyG-index and gender
Studies have shown that the TyG-index plays a significant role in CHD incidence in women [42,43,45,46,54,58,59].Nonetheless, one study reported a greater role in men [60], and another found no differences between genders [55].The current study found an association in both genders which persisted only in women after multivariable adjustment.This finding may be explained by the fact that nearly half of the female participants were over 50 years of age and susceptible to menopause at the baseline.Insulin resistance and higher CHD risks can occur after menopause because of decreasing estrogen levels [45,46,54,55,59].Furthermore, the TyG-index was an independent risk factor for CHD until model II in nondiabetic participants.The lack of association in diabetic participants may have been due to lifestyle changes and medication consumption during the 10 years of followup [61].Our analysis showed that diabetes treatment made the association non-significant.The first line of diabetes treatment is metformin which can decrease insulin resistance [62], confirming the insignificant association between the TyG-index and CHD in diabetic participants.

Prediction of CHD based on TyG-index
Previous studies have suggested that the TyG-index predicts cardiovascular events more accurately than hemoglobin A 1 c [23].In addition, several studies have implicated that adding the TyG-index to the Framingham risk score can increase its predictive power [48,49] Previous studies concluded that SVM and random forest were the most effective model for predicting CHD [38,63,64], the current study found that random forest achieved the highest AUC.In both random forest and SVM models, diabetes played no role, while the TyG-index was the second most influential component.The current study found that the use of the TyGindex instead of diabetes in machine learning models can significantly improve the predictive power of CHD predicting models.Machine learning models demonstrated that the TyG-index was not only more important than diabetes in predicting CHD, but it also was the most important factor after age.To the best of our knowledge, the TyG-index is not used in any clinical guideline [19], but the American Diabetes Association (ADA) suggested in 2022 that patients with elevated TG levels (≥ 150 mg/dL [1.7 mmol/L]) should implement enhanced lifestyle interventions and optimal glycemic control [65].Our findings advocate the inclusion of the TyG-index in future CHD prevention guidelines.

Strengths and limitations
The following strengths of the current study should be noted.This study is the first to evaluate the predictive power of TyG-index in CHD using machine-learning techniques.To the best of our knowledge, the optimal cut-off points had not previously been evaluated in the Iranian population.The community-based prospective nature of our study and definite outcome determination minimize the chance of reverse causation and recall bias.Including both old and young populations was another advantage the current study had over others, as most previous studies recruited middle-aged and older adults.Furthermore, the current study attempted to ameliorate the adjustment of confounders by adding family history of premature CHD, medication use, dietary habits, complete lipid profile components and all anthropometric features to our models.The long follow-up time in the present study acts as a doubleedged sword; indeed, it can reflect the lifetime risk of CHD, but on the other hand, our inability to evaluate and control voluntarily health check-ups or lifestyle changes during the ten-year study period may have affected our findings.Compared to previous studies, we had an identical method for defining of CHD by investigating ECGs, cardiac enzymes, using the Rose angina questionnaire, exercise tolerance test, and coronary artery angiogram.This study had several limitations.First, it was embedded in an observational setting, and despite a wide range of adjustments, we cannot rule out the possibility of unmeasured confounders.Single baseline TyG-index investigation may incline our results to intra-individual variation.Second, we may have observed gender-specific results due to the lack of data on menopausal status.Third, only Iranian subjects were included, so our findings might not be generalizable to other countries.

Conclusion
The TyG-index can be used in clinical practice and predictive models as a highly valuable index for predicting and preventing CHD, but further studies are needed to validate our findings.

Fig. 1
Fig. 1 Flow diagram of participants attending the 10-year follow-up study

Fig. 2
Fig. 2 Impact of different components of machine learning models on predicting CHD

Table 1
Baseline clinical characteristics and biological variables of the participants according to serum TyG-index a quartiles model, TyG-index removal decreased AUC from 1 to 22%.The current study showed that the TyG-index was much better than diabetes in predicting CHD; overall, it was the second most important factor after age.

Table 2
Risk of CHD a according to quartiles of TyG-index b , overall and stratified by gender Results are expressed as hazard ratio and (95% CI c ). Model I: adjusted for age and sex; Model II: Adjusted for age, sex, smoking, physical activity, education, and family history; Model III: model II plus SUA d , HDL e , total cholesterol, BMI f , Waist to hip ratio, SBP g , DBP h , LDL i ; Model IV: model III plus using fried food, adding salt, removing poultry skin, using high fat dairy products, dining out, meat consumption and removing its fat a Coronary heart disease b Triglyceride-glucose index c Confidence interval d Serum uric acid e High-density lipoprotein f Body mass index g Systolic blood pressure h Diastolic blood pressure i Low density lipoprotein

Table 3
TyG-index a cut-off points Negative diagnostic ratio value; a particular value for the Negative Diagnostic Likelihood Ratio a Triglyceride-glucose index * Positive diagnostic ratio; a particular value for the Positive Diagnostic Likelihood Ratio *

Table 4
Comparison of the primary and TyG-index a -modified* versions of CHD b predictive models using machine learning primary models were designed based on age, sex, blood pressure, smoking status, total cholesterol, HDL c , waist/hip ratio, family history of CHD and diabetes.In modified models diabetes was replaced by TyG-index b Coronary heart disease c High-density lipoprotein d Area under the curve e Support vector machine f K nearest neighbor*