Development of predictive risk models for major adverse cardiovascular events among patients with type 2 diabetes mellitus using health insurance claims data

Young, James B.; Gauthier-Loiselle, Marjolaine; Bailey, Robert A.; Manceur, Ameur M.; Lefebvre, Patrick; Greenberg, Morris; Lafeuille, Marie-Hélène; Duh, Mei Sheng; Bookhart, Brahim; Wysham, Carol H.

doi:10.1186/s12933-018-0759-z

Original investigation
Open access
Published: 24 August 2018

James B. Young¹,
Marjolaine Gauthier-Loiselle ORCID: orcid.org/0000-0002-9439-595X²,
Robert A. Bailey³,
Ameur M. Manceur²,
Patrick Lefebvre²,
Morris Greenberg⁴,
Marie-Hélène Lafeuille²,
Mei Sheng Duh⁴,
Brahim Bookhart³ &
Carol H. Wysham⁵

Cardiovascular Diabetology volume 17, Article number: 118 (2018)

Abstract

Background

There exist several predictive risk models for cardiovascular disease (CVD), including some developed specifically for patients with type 2 diabetes mellitus (T2DM). However, the models developed for a diabetic population are based on information derived from medical records or laboratory results, which are not typically available to entities like payers or quality of care organizations. The objective of this study is to develop and validate models predicting the risk of cardiovascular events in patients with T2DM based on medical insurance claims data.

Methods

Patients with T2DM aged 50 years or older were identified from the Optum™ Integrated Real World Evidence Electronic Health Records and Claims de-identified database (10/01/2006–09/30/2016). Risk factors were assessed over a 12-month baseline period and cardiovascular events were monitored from the end of the baseline period until end of data availability, continuous enrollment, or death. Risk models were developed using logistic regressions separately for patients with and without prior CVD, and for each outcome: (1) major adverse cardiovascular events (MACE; i.e., non-fatal myocardial infarction, non-fatal stroke, CVD-related death); (2) any MACE, hospitalization for unstable angina, or hospitalization for congestive heart failure; (3) CVD-related death. Models were developed and validated on 70% and 30% of the sample, respectively. Model performance was assessed using C-statistics.

Results

A total of 181,619 patients were identified, including 136,544 (75.2%) without prior CVD and 45,075 (24.8%) with a history of CVD. Age, diabetes-related hospitalizations, prior CVD diagnoses and chronic pulmonary disease were the most important predictors across all models. C-statistics ranged from 0.70 to 0.81, indicating that the models performed well. The additional inclusion of risk factors derived from pharmacy claims (e.g., use of antihypertensive and antihyperglycemic medications) or from medical records and laboratory measures (e.g., hemoglobin A1c, urine albumin to creatinine ratio) only marginally improved the performance of the models.

Conclusion

The claims-based models developed could reliably predict the risk of cardiovascular events in T2DM patients, without requiring pharmacy claims or laboratory measures. These models could be relevant for providers and payers and help implement approaches to prevent cardiovascular events in high-risk diabetic patients.

Background

Type 2 diabetes may cause complications of microvascular origin, including nephropathy, neuropathy, and retinopathy, or macrovascular origin, including peripheral artery disease and cardiovascular disease (CVD) [1, 2]. Although diabetes clinical practice guidelines are intended to reflect consensus and evidence-based best medical practices, different entities have some conflicting recommendations, and providing high-quality and detailed guidelines for specific patient subgroups remains challenging [3]. For example, relative to non-diabetic patients, patients with type 2 diabetes have a two- to threefold higher risk of suffering from a CVD event, including a higher risk of myocardial infarction (MI), stroke, unstable angina, and congestive heart failure [4,5,6,7], and a higher rate of CVD-related death [8]. Therefore, certain patients with type 2 diabetes could benefit from specialized care that both improves glycemic control and mitigates the risk of CVD.

Thus, having reliable tools that use readily available data to predict the risk of cardiovascular events among patients with type 2 diabetes may allow healthcare resources to be directed towards patients at high risk, and help healthcare providers meet new quality-of-care standards. In fact, in 2016, the National Committee for Quality Assurance (NCQA) implemented a new Healthcare Effectiveness Data and Information Set (HEDIS) performance measure based on the rates of hospitalization for potentially preventable complications [9]. More specifically, this measure, which is used by over 90% of health plans in the US [9], targets, among other complications, diabetes short- and long-term complications, including CVD events leading to hospitalization [10]. This means that higher rates of adverse cardiovascular events among patients with type 2 diabetes may negatively affect the NCQA ratings of healthcare providers. Moreover, given the high costs incurred by patients with both CVD and diabetes [11], using such a tool efficiently may translate into significant cost savings.

Several of the predictive CVD risk models that have been developed for the general population include diabetes as a risk factor, with models derived from the Framingham Heart Study being among the most well-known [12,13,14]. Scores based on the Framingham risk models assign weights to risk factors in order to predict cardiovascular events separately for men and women. Risk factors identified for CVD include older age, smoking status, treated and untreated systolic blood pressure, total cholesterol and high-density-lipoprotein cholesterol levels, and diabetes [12,13,14]. However, the Framingham risk models were not developed for patients with diabetes, and were shown to systematically underestimate CVD risk in this population [15]. In fact, the characteristics of patients enrolled in the Framingham study may differ from real-world populations with diabetes in several ways, including the proportion of minorities, socioeconomic determinants of health, and comorbidity burden [16]. Thus, other risk models have been developed for this population, but all of them rely on data from medical records [17,18,19,20,21,22,23]. For example, risk models derived from the United Kingdom Prospective Diabetes Study (UKPDS) identified several risk factors that cannot be used as quantitative predictors using health insurance claims, such as duration of type 2 diabetes, glycated hemoglobin (HbA1c) levels, systolic blood pressure, and cholesterol/high-density lipoprotein ratio [21, 23]. Similarly, the ADVANCE study identified age at diabetes diagnosis, known duration of diabetes, pulse pressure, treated hypertension, HbA1c, urinary albumin/creatinine ratio, and non-HDL cholesterol among risk factors for CVD events; these risk factors cannot be assessed using health insurance claims [22]. Consequently, these models cannot be used to predict CVD risk by entities, like payers, that do not have access to information derived from medical records or laboratory results.

As the face of healthcare provision changes and population management evolves, entities such as public and private payers are moving toward a capitated system of reimbursement, with payments made based on value rather than volume of care. It is thus important for both payers and providers to be able to assess the risks in a given population. Therefore, a CVD risk assessment tool based solely on accessible medical data such as health insurance claims would be relevant for payers to help identify patients with type 2 diabetes at high risk of CVD events. In fact, rationally allocating resources towards these patients by, for example, including CVD risk models in a tool made available to healthcare providers may reduce morbidity and mortality and generate cost savings. Thus, this study aimed to develop new predictive models and assess their performance in predicting the risk of cardiovascular events in patients with type 2 diabetes based solely on information available in medical health insurance claims. More specifically, models were developed for patients without prior CVD events (hereinafter referred to as the primary prevention population) and for patients with prior CVD events (hereinafter referred to as the secondary prevention population).

Methods

Study design

A retrospective observational study design was used to model the risk of CVD events in patients with type 2 diabetes (Additional file 1). The index date was defined as a randomly selected date among those with a diagnosis of type 2 diabetes (International Classification of Diseases, 9th Revision, Clinical Modification [ICD-9-CM]: 250.x0 and 250.x2, International Classification of Diseases, 10th Revision, Clinical Modification [ICD-10-CM]: E11.xxx) followed by ≥ 13 months of continuous healthcare plan enrollment. The random selection enabled us to capture a representative sample of patients from a real-world setting with varying disease durations. Risk factors for cardiovascular events were assessed during the baseline period, defined as the first 12 months following the index date. Cardiovascular events were monitored during the subsequent at-risk period, which was required to last ≥ 1 month and spanned from the end of the baseline period until the earliest among (i) end of data availability, (ii) end of continuous healthcare plan enrollment, or (iii) death. For each study outcome, the at-risk period was censored at the first occurrence of a given study outcome (see study outcomes section for more details).

Data source

The Optum™ Integrated Real-World Evidence Electronic Health Records and Claims database (Optum database), which combines de-identified electronic medical records and insurance claims, was used to develop and validate the risk models (October 1, 2006–September 30, 2016). This database comprises information on demographics, medical history, and diagnoses for all types of medical encounters (i.e., intensive care unit, emergency department [ED], ward, etc.), in-hospital procedures and medication administrations, prescriptions, laboratory results, and date of death. The database is de-identified and fully compliant with the patient confidentiality requirements of the Health Insurance Portability and Accountability Act (HIPAA).

Study population

Patients ≥ 50 years with ≥ 1 recorded diagnosis for type 2 diabetes (i.e., ICD-9-CM: 250.x0, and 250.x2; ICD-10-CM: E11.xxx) were included in the study. Patients were required to have ≥ 13 months of continuous eligibility in their healthcare plan after the index date. Patients were excluded if they had ≥ 1 recorded diagnosis for type 1 or gestational diabetes mellitus (i.e., ICD-9-CM: 250.x1, 250.x3, and 648.8x; ICD-10-CM: E10.xxx, O24.4xx, and O99.81x). Moreover, given the growing evidence suggesting that these medications may mitigate cardiovascular risk, to avoid potential confounding, patients were further excluded if they had ≥ 1 prescription fill for a sodium glucose co-transporter 2 (SGLT2) inhibitor or a glucagon-like peptide-1 (GLP-1) receptor agonist at any time during the study period [24,25,26,27].
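The diagnosis-code criteria above can be expressed as pattern matches. The following is a minimal sketch, not the study's actual code: the function name `is_eligible`, the pattern names, and the assumption that codes are stored as dotted strings are all illustrative. The continuous-enrollment requirement and the SGLT2 inhibitor and GLP-1 receptor agonist exclusions are not covered here.

```python
import re

# Type 2 diabetes: ICD-9-CM 250.x0 / 250.x2, ICD-10-CM E11.*
T2DM = re.compile(r"^(250\.\d[02]|E11(\.\w+)?)$")
# Exclusions: type 1 diabetes (250.x1, 250.x3, E10.*) and
# gestational diabetes (648.8x, O24.4*, O99.81*)
EXCLUDED = re.compile(r"^(250\.\d[13]|648\.8\d?|E10(\.\w+)?|O24\.4\w*|O99\.81\w*)$")


def is_eligible(age, codes):
    """Return True if a patient meets the age and diagnosis-code criteria.

    `codes` is an iterable of diagnosis codes as dotted strings
    (an assumed storage format, e.g. "250.00" or "E11.9").
    """
    codes = list(codes)
    has_t2dm = any(T2DM.match(c) for c in codes)
    has_exclusion = any(EXCLUDED.match(c) for c in codes)
    return age >= 50 and has_t2dm and not has_exclusion
```

For instance, `is_eligible(62, ["250.00", "401.9"])` is true, whereas adding `"250.01"` (a type 1 diabetes code) to the same patient makes it false.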

The study population was further stratified into the primary and secondary prevention populations based on whether patients had ≥ 1 diagnosis for any cardiovascular events of interest (see below) in any setting (i.e., inpatient [IP], ED, or outpatient) prior to the at-risk period.

Study outcomes

Study outcomes included (1) any major adverse cardiovascular event (MACE), which comprised non-fatal MI, non-fatal stroke, and CVD-related death (defined below), (2) any MACE, hospitalization for unstable angina, or hospitalization for congestive heart failure (hereinafter referred to as MACE-plus), and (3) CVD-related death, defined as a death occurring within 30 days after a diagnosis for MI, stroke, unstable angina, heart failure, sudden cardiac arrest, cardiogenic shock, other cerebrovascular events, or other cardiovascular events recorded in a medical claim in any setting (see Additional file 2 for ICD codes).
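As an illustration of the 30-day rule used to define CVD-related death, a minimal sketch follows. The function name and the choice to count a death on the same day as the diagnosis are assumptions, since the text does not specify how the boundaries of the window were handled.

```python
from datetime import date


def is_cvd_related_death(death_date, cvd_diagnosis_dates, window_days=30):
    """Return True if death occurred within `window_days` after any CVD diagnosis.

    Assumption: day 0 (death on the diagnosis date) and day 30 are both included.
    """
    if death_date is None:
        return False
    return any(0 <= (death_date - d).days <= window_days
               for d in cvd_diagnosis_dates)
```

A death on 20 March 2015 following a diagnosis on 1 March 2015 (19 days earlier) would qualify; a death on 1 May 2015 would not.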

Of note, because it was not possible to determine whether diagnoses for MI or stroke recorded in outpatient settings were actual cardiovascular events or follow-up visits for which the diagnosis was recorded for billing purposes, only diagnoses recorded in ED or IP settings were considered in the risk models; diagnoses could be recorded in any position.

Statistical analyses

Distinct predictive risk models were developed for the primary and secondary prevention populations for each of the three study outcomes. A split sample approach was used: The primary and secondary prevention populations were each randomly split into a training (70% of the sample) and a validation (30% of the sample) set. The training sets were used to develop the predictive models, and the validation sets were used to assess the predictive accuracy of the models.
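A random 70/30 split at the patient level can be sketched as follows. The function name, the fixed seed, and the use of Python's standard `random` module are illustrative assumptions; the study does not describe its implementation. In keeping with the text, the function would be applied once to each prevention population.

```python
import random


def split_sample(patient_ids, train_fraction=0.7, seed=0):
    """Randomly partition patient IDs into training and validation sets."""
    ids = list(patient_ids)
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]
```

Splitting by patient rather than by observation keeps all records for a given patient in the same set.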

For the prediction of study outcomes, potential risk factors were derived from the published literature and included age, gender, race, ethnicity, year, region, insurance type, prior cardiovascular events, time since first observed type 2 diabetes diagnosis, number of diabetes-related medical visits, Charlson comorbidity index (CCI) [28], adapted diabetes complications severity index (aDCSI) [29], and recorded diagnosis for selected comorbidities such as hypertension, hyperlipidemia, infections, mental disorders, chronic pulmonary disease, and obesity. Univariate associations between potential risk factors and outcomes were assessed; in order to develop more parsimonious models, risk factors were excluded if the standardized difference between patients with and without a given outcome was below 0.10, or if they were present in less than 0.5% of the sample.
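The two screening rules can be illustrated for a binary risk factor. This sketch assumes the commonly used standardized-difference formula for proportions (difference divided by the square root of the average of the two binomial variances); the text does not state which formula was applied, and the function names are hypothetical.

```python
from math import sqrt


def standardized_difference(p_event, p_no_event):
    """Absolute standardized difference for a binary risk factor.

    Both arguments are proportions in [0, 1]: the prevalence of the risk
    factor among patients with and without the outcome.
    """
    pooled_var = (p_event * (1 - p_event) + p_no_event * (1 - p_no_event)) / 2
    if pooled_var == 0:
        return 0.0  # the factor is constant in both groups
    return abs(p_event - p_no_event) / sqrt(pooled_var)


def keep_risk_factor(p_event, p_no_event, overall_prevalence,
                     min_std_diff=0.10, min_prevalence=0.005):
    """Apply both rules: standardized difference >= 0.10 and prevalence >= 0.5%."""
    return (standardized_difference(p_event, p_no_event) >= min_std_diff
            and overall_prevalence >= min_prevalence)
```

As a rough check with illustrative proportions of 22.7% versus 15.6%, the standardized difference is about 0.18, which is above the 0.10 cut-off.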

Pooled logistic regression models were developed to relate each candidate risk factor to outcomes at pre-specified time points during the at-risk period. A logistic regression model was selected because it can estimate the probability of an event occurring in an interval of time [30]. More specifically, for each patient, the at-risk period was stratified into windows of 6 months during which the outcomes were assessed. For example, the follow-up of a patient who had MACE 15 months after the beginning of the at-risk period was censored at the occurrence of this outcome and stratified in three windows in the regression model: (1) 0–6 months without MACE, (2) 6–12 months without MACE, and (3) 12–18 months with a MACE. For all windows, risk factors were evaluated at baseline, and indicator variables for each time interval were included in the regression models. The risk factors included in the final risk models were chosen using a stepwise variable selection approach based on Akaike’s Information Criterion, in conjunction with tenfold cross-validation methods within the training set. Further specifications of risk factors were tested and variance inflation factor analysis was used to assess the presence of multicollinearity between risk factors, which resulted in the final models.
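The stratification of follow-up into 6-month windows can be sketched as a person-period expansion. The function below is illustrative only: its name and output layout are assumptions, and it assumes that a partially observed final window is kept as a full row, which the text does not specify.

```python
import math


def expand_to_windows(followup_months, event, window_months=6):
    """Split one patient's at-risk period into consecutive windows.

    Returns a list of (window_index, start_month, end_month, event_in_window)
    tuples. When `event` is True, follow-up is assumed to end at the event,
    so the event is assigned to the last window.
    """
    n_windows = max(1, math.ceil(followup_months / window_months))
    rows = []
    for k in range(n_windows):
        is_last = k == n_windows - 1
        rows.append((k + 1,
                     k * window_months,
                     (k + 1) * window_months,
                     int(event and is_last)))
    return rows
```

For the example in the text, `expand_to_windows(15, True)` yields three rows: 0–6 months without an event, 6–12 months without an event, and 12–18 months with an event. Each row would then enter the logistic regression with the patient's baseline risk factors and an indicator for the window.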

The performance of the final risk models was evaluated based on discrimination (i.e., C-statistics) in the training and validation sets [31]. The C-statistic is a measure of the predictive accuracy of a logistic regression, which varies between 0.5 (random discrimination) and 1.0 (perfect discrimination). It corresponds to the area under the receiver operating characteristic (ROC) curve [32]. In order to provide a more comprehensive view of the performance of models based on information derived from medical claims, other models that included risk factors derived from medical claims, pharmacy claims, and medical records and laboratory results were developed.
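For a binary outcome, the C-statistic equals the proportion of event/non-event pairs in which the patient with the event received the higher predicted risk, with ties counted as one half. The sketch below computes it directly from that definition; the function name is hypothetical, and the quadratic loop is meant for illustration rather than for samples of the size used in the study.

```python
def c_statistic(predicted_risks, outcomes):
    """Concordance between predicted risks and binary outcomes (0 or 1)."""
    events = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0   # pair ranked correctly
            elif p_event == p_non_event:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))
```

A value of 0.5 means the model ranks pairs no better than chance, and 1.0 means every patient with an event was assigned a higher risk than every patient without one.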

Results

A total of 181,619 patients with type 2 diabetes were included in the study; 136,544 (75.2%) in the primary prevention population and 45,075 (24.8%) in the secondary prevention population (Fig. 1). Among patients in the training set and in the primary prevention population, the proportions of patients with MACE, MACE-plus, and CVD-related death during the at-risk period were 4.7%, 6.5%, and 1.8%, respectively (Additional file 3). In the secondary prevention population, the same proportions were 16.5%, 24.9%, and 8.2%, respectively (Additional file 3). The median duration of the at-risk period following the index date in the training set of the primary prevention population was 12 months (range 1–109 months), with 5.4% of patients having a follow-up longer than 60 months. The median duration of the at-risk period in the training set of the secondary prevention population was 11 months (range 1–108 months), with 3.9% of patients having a follow-up longer than 60 months.

Patients with a CVD event during the at-risk period were older and had higher aDCSI scores compared to patients without CVD events for both the primary and the secondary prevention populations (primary prevention population: mean age = 72.7 vs. 66.4 years, mean aDCSI = 1.9 vs. 1.1, respectively; secondary prevention population: mean age = 75.0 vs. 71.4 years, mean aDCSI = 4.1 vs. 3.2, respectively; Additional file 3). Most patients (> 75%) had a recorded diagnosis for hypertension and/or hyperlipidemia in both the primary and secondary prevention populations. Moreover, compared to patients without CVD events, patients with a CVD event during the at-risk period were more likely to have a recorded diagnosis for select baseline comorbidities, such as infections (primary prevention population: 53.8% vs. 48.8%; secondary prevention population: 69.1% vs. 61.5%, respectively), chronic pulmonary disease (primary prevention population: 22.7% vs. 15.6%; secondary prevention population: 44.5% vs. 31.4%, respectively), and peripheral vascular disorders (primary prevention population: 19.0% vs. 9.3%; secondary prevention population: 34.3% vs. 26.1%, respectively) (Additional file 3).

Risk models

For the primary prevention population, a total of 12–17 risk factors were included in the models, and most of them were significantly associated with the study outcomes (Table 1). Across all study outcomes, age was the risk factor with the largest impact on the risk of having an event (Table 1). Other risk factors consistently associated with a significantly higher risk of cardiovascular events were recorded diagnosis for other CVD-related conditions (i.e., conditions used to define CVD-related death), diabetes-related hospitalization, higher aDCSI score, recorded diagnosis for chronic pulmonary disease, cancer, fluid and electrolyte disorder, or coagulopathy, and having the baseline period prior to 2011 (Table 1). In addition, hypertension was associated with a higher risk of MACE-plus, while deficiency anemia and pulmonary circulation disorders were associated with a higher risk of CVD-related death (Table 1). Being commercially insured was associated with a lower risk of CVD events for all outcomes, being a female was associated with a lower risk of MACE and CVD-related death, and being Hispanic or Asian was associated with a lower risk of CVD-related death (Table 1).

Table 1 Risk models for MACE in the primary prevention population

For the secondary prevention population, 15–20 risk factors were included in the models, and most of them were significantly associated with the study outcomes (Table 2). As for the primary prevention population, older age was the risk factor with the largest impact on the risk of CVD (Table 2). Diabetes-related hospitalization, higher aDCSI score, recorded diagnosis for chronic pulmonary disease or fluid and electrolyte disorders, and having the baseline period prior to 2011 were consistently associated with a significantly higher risk of CVD events (Table 2). In addition, payer type, time since last recorded CVD diagnosis, prior recorded diagnosis for congestive heart failure or iron-deficiency anemia, and ethnicity were identified as predictors of CVD events for all outcomes (Table 2). Prior MI, stroke, and other CVD-related conditions were associated with a higher risk of MACE and MACE-plus, but not of CVD-related death (Table 2). Other risk factors identified for only certain outcomes included race, region, insurance type, recorded diagnosis for mental disorders, obesity, cancer, peripheral vascular disorders, erectile dysfunction, coagulopathy, and pulmonary circulation disorders (Table 2). Interestingly, while being a female was associated with a lower risk of MACE and CVD-related death in the primary prevention population, gender was not associated with an improved predictive accuracy in the secondary prevention population, and thus, was not included as a risk factor in these models (Table 2). Conversely, obesity was not selected as a risk factor in the primary prevention population, whereas it was associated with a lower risk of MACE and CVD-related death in the secondary prevention population.

Table 2 Risk models for MACE in the secondary prevention population

The risk models performed well in predicting MACE, MACE-plus, and CVD-related death with C-statistics ranging between 0.70 and 0.81 when considering both the training and validation sets (Tables 1 and 2, Fig. 2). Notably, the highest predictive accuracy was observed for models predicting CVD-related death (Tables 1 and 2; Fig. 2). In addition, the models were well calibrated, with differences between the median predicted risk and median observed risk that did not exceed 0.1% for each of the study outcomes in both the primary and secondary prevention populations (data not shown).

In addition, to further assess the potential impact of using information exclusively derived from medical claims data on performance, predictive models that also included risk factors obtained from pharmacy claims, as well as from medical records and laboratory results were developed. These models included up to 11 additional risk factors, but only showed limited improvements in terms of predictive accuracy, with C-statistics increasing by no more than 0.01 in the training and validation sets for both the primary and secondary prevention populations (data not shown).

Examples

Notably, the risk models can be used to assess CVD risk at different time windows separated by intervals of 6 months over a maximum of 5 years. For instance, the average patient in the primary prevention population (a 67-year-old female with an aDCSI score of 1 and recorded diagnosis for hypertension and hyperlipidemia) had a predicted risk of MACE of 1.4% after 1 year, 2.7% after 2 years, and 6.8% after 5 years. The predicted 5-year risks for MACE-plus and CVD-related death were 10.6% and 1.7%, respectively (Table 3: Case 1). For the secondary prevention population, the average patient was a 73-year-old male diagnosed with prior congestive heart failure ≥ 12 months ago, other CVD-related conditions, an aDCSI score of 3, recorded diagnosis for hypertension, hyperlipidemia, and infection within the last year. The predicted risk of MACE for that patient was 5.8% after 1 year, 10.5% after 2 years, and 21.8% after 5 years. The predicted 5-year risks for MACE-plus and CVD-related death were 35.2% and 9.9%, respectively (Table 3: Case 2).
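One way to see how window-level predictions could translate into 1-, 2-, and 5-year risks is sketched below. This is an assumption about the calculation, not a description of the study's code: it treats each 6-month prediction as the probability of an event in that window given no earlier event, which is the usual reading of a pooled logistic model, and combines windows by multiplying the event-free probabilities. The function names and the input values are hypothetical.

```python
from math import exp


def window_probability(linear_predictor):
    """Inverse logit: probability of an event within one window."""
    return 1.0 / (1.0 + exp(-linear_predictor))


def cumulative_risk(window_probabilities):
    """Risk of at least one event across consecutive windows.

    Assumes each probability is conditional on being event-free so far.
    """
    event_free = 1.0
    for p in window_probabilities:
        event_free *= 1.0 - p
    return 1.0 - event_free
```

With a hypothetical 1% probability in each of two consecutive windows, the 1-year cumulative risk is 1 − 0.99² ≈ 1.99%, slightly less than the simple sum of 2%.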

Table 3 Predicted risk for the average patient in primary and secondary prevention population

Discussion

This study developed and validated models that predict the risk of adverse cardiovascular events in patients with type 2 diabetes using exclusively information derived from health insurance claims. The main risk factors identified in the primary prevention population included age, diabetes-related hospitalizations, and recorded diagnosis for coagulopathy and chronic pulmonary disease. In the secondary prevention population, age, prior CVD diagnoses, diabetes-related hospitalizations, and recorded diagnosis for chronic pulmonary disease had the most important impact on the risk of having a CVD event. Overall, the models reliably predicted the cardiovascular events for the primary and secondary prevention populations, as illustrated by the C-statistics ranging between 0.70 and 0.81.

The finding that age was one of the most important risk factors in predicting cardiovascular events is consistent with findings in previous studies that primarily focused on a diabetes population, such as the UKPDS risk engine [21, 23] and studies that focused on a general population, such as the Framingham Heart Study [12]. However, a major difference between the models developed in the current study and previous ones is that the latter included risk factors derived from laboratory results and medical records [12, 17, 18, 20,21,22], which are often not available to national quality of care organizations and payers. In contrast, the current study used only information that is readily available from medical claims data.

Nonetheless, claims-based information can be used as a proxy for risk factors derived from laboratory results and medical records. For example, blood pressure measurements were not available in claims data, but hypertension, identified based on a recorded diagnosis in a medical claim, was included in the models. Similarly, recorded diagnosis for hyperlipidemia was used as a proxy for high-density lipoprotein cholesterol and low-density lipoprotein cholesterol levels, although it was not included in any models. Yet, certain risk factors identified in the Framingham and UKPDS models tend to be underreported in medical claims, and thus, may have limited predictive accuracy in claims-based models. For example, although diagnosis codes for smoking do exist, this condition is typically underreported in medical claims. Therefore, smoking was not included in any of the claims-based models. However, because our study was limited to risk factors available in insurance claims data, certain risk factors identified in other studies were not available for selection in the models. In particular, several studies pointed to a link between glycemic markers and CVD [33,34,35,36,37], but given that HbA1c measures are not available in insurance claims data, this potential risk factor could not be included in the models.

This study also found that obesity was associated with a lower risk of MACE and CVD-related death in the secondary prevention population. Several previous studies found obesity to be associated with better survival in patients with chronic or cardiac diseases, hence the term “obesity paradox” to describe this counterintuitive phenomenon [38]. Several explanations have been proposed, including the advantages of fat reserves during illness, biases or confounding in observational studies (e.g., more intensive management), or weight loss due to illness in the reference group [39]. However, due to the observational nature of the current study, no causal relationship can be inferred.

Regardless of the aforementioned differences in the risk factors identified in the current study versus previously published models, the models developed here performed well in predicting the risk of cardiovascular events in a population with two well-defined risk factors, namely type 2 diabetes and an age of 50 years or older. Overall, the predictive accuracies of the models presented in the current study are comparable to those of previously published models. For example, the Framingham risk score, which included diabetes as a predictor, yielded C-statistics of 0.76 and 0.79 for men and women in the general population, respectively [12]. However, when evaluated in an older diabetic cohort and in patients without prior CVD, the Framingham risk score had a C-statistic of 0.65 [19]. The performances of the claims-based models presented here were also comparable to those of previously developed risk models specific to the diabetic population, such as the UKPDS risk engine [21, 40]. Although C-statistics were not reported in the UKPDS original publications, subsequent validations in other diabetic cohorts yielded C-statistics ranging from 0.61 to 0.73 [19, 41]. The ADVANCE model, developed in a population of diabetic patients at risk of cardiovascular events similar to the secondary prevention population in this study, also presented comparable C-statistics of 0.69–0.70 [22]. Moreover, several other multivariate risk models were published and reported C-statistics ranging between 0.64 and 0.70 [17, 18, 20]. A comprehensive external validation study would be needed to evaluate the performance of the different models on the same cohort of patients [42].

The Framingham and UKPDS models were not developed and tested for patients with a prior history of CVD (i.e., the secondary prevention population), meaning that their predictive accuracy may be lower in this subpopulation [12, 21]. Therefore, another advantage of the models developed in the current study over several previous ones is their ability to predict CVD risk in patients with prior history of CVD, who represented almost a quarter of the sample population. More generally, the reliability of this claims-based approach is perhaps best illustrated by the limited incremental predictive accuracy conferred by the additional inclusion of variables derived from medical records or laboratory results.

In light of the HEDIS performance measure that targets hospitalization for potentially preventable complications, rationally allocating healthcare resources to patients with type 2 diabetes at higher risk of cardiovascular complications may help healthcare providers meet quality of care standards, reduce morbidity and mortality, and generate cost savings. With growing evidence suggesting that certain types of diabetes treatments, such as SGLT2 inhibitors or GLP-1 receptor agonists, may mitigate cardiovascular risk in addition to improving glycemic control, the potential dual purpose of these diabetes medications could be considered, despite their higher cost, to optimize treatment decisions in patients with type 2 diabetes at high risk of CVD [24,25,26,27]. Patients receiving these treatments were excluded from the present study due to the potential for indication bias: the use of SGLT2 inhibitors or GLP-1 receptor agonists could effectively reduce the risk of CVD, but these agents may appear as risk factors associated with a higher risk of CVD if they are preferentially prescribed to higher-risk patients. Such counterintuitive phenomena are common in observational studies. Another potential clinical application of the models developed here would be to identify patients at high risk of CVD events within a certain time window in order to provide preventive care. The threshold used for this high-risk group could be rationally determined using the risk that maximizes the sum of the model sensitivity and specificity. For example, using this method, the high-risk threshold in the primary prevention population would be 2.5%, 3.5%, and 1.0% for MACE, MACE-plus, and CVD-related death, respectively (sensitivity ranging from 67 to 73%, and specificity ranging from 67 to 76%). In the sample population used in the current study, applying these thresholds would result in approximately one out of three patients being classified as at high risk of having MACE or MACE-plus within a 1-year window, and one out of four patients as at high risk of CVD-related death. In the secondary prevention population, the same thresholds would be 12.5%, 18.0%, and 5.0% for MACE, MACE-plus, and CVD-related death, respectively (sensitivity ranging from 62 to 78%, and specificity ranging from 66 to 72%), resulting in approximately one out of three patients being classified as at high risk of having any cardiovascular event within a 1-year window.
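Choosing the threshold that maximizes the sum of sensitivity and specificity is equivalent to maximizing Youden's index. A minimal sketch follows; the function name, the convention that a predicted risk at or above the threshold counts as high risk, and the toy data in the usage note are assumptions for illustration, not values from the study.

```python
def youden_threshold(predicted_risks, outcomes):
    """Return (threshold, sensitivity, specificity) maximizing sens + spec.

    Patients with predicted risk >= threshold are classified as high risk.
    `outcomes` holds 0/1 indicators aligned with `predicted_risks`.
    """
    n_events = sum(outcomes)
    n_non_events = len(outcomes) - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("need at least one event and one non-event")
    pairs = list(zip(predicted_risks, outcomes))
    best = None
    # Every distinct predicted risk is a candidate threshold.
    for threshold in sorted(set(predicted_risks)):
        true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
        true_neg = sum(1 for p, y in pairs if p < threshold and y == 0)
        sens = true_pos / n_events
        spec = true_neg / n_non_events
        if best is None or sens + spec > best[1] + best[2]:
            best = (threshold, sens, spec)
    return best
```

With toy risks `[0.01, 0.02, 0.03, 0.04, 0.05, 0.06]` and outcomes `[0, 0, 0, 1, 0, 1]`, the selected threshold is 0.04, with a sensitivity of 1.0 and a specificity of 0.75.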

Limitations

The current study is subject to a few limitations. First, the identification of study outcomes was based on definitional algorithms using health insurance claims data that have not been fully validated, which could lead to the misclassification of outcomes. Second, patients may have experienced cardiovascular events prior to the start of data availability and may therefore have been misclassified into the primary prevention population. Third, a recorded diagnosis code on a medical claim is not an attestation that the patient has the condition, because the code may represent a rule-out diagnosis or may be recorded incorrectly. Fourth, risk predictions beyond 60 months post-index should be interpreted with caution, as a limited number of patients had an at-risk period of such duration. Moreover, risk predictions over longer periods may be confounded by changes in therapeutic strategies. Despite these limitations, healthcare claims are a valuable resource for developing such models. Indeed, the large sample size typically available in claims databases helps limit over-fitting of the models to a specific data set, which should support their external validity, as illustrated by the negligible decrease in predictive accuracy observed in the validation set compared to the training set. Future studies are needed to externally validate the models in a distinct population or database. Finally, it should be noted that the risk models were developed to identify patients at risk of CVD events; no causal inference can be drawn from these models, which are based on observational data.

Conclusions

In summary, this study developed risk models that could reliably identify patients with type 2 diabetes at risk of MACE, MACE-plus, and CVD-related death based on information available in health insurance claims. Ultimately, stakeholders—such as quality of care organizations and payers—may use these models to identify diabetic patients at high risk of cardiovascular events and potentially improve their clinical management, thereby preventing a significant part of the disease burden and associated costs.

Abbreviations

aDCSI: adapted diabetes complications severity index
CCI: Charlson comorbidity index
CVD: cardiovascular disease
ED: emergency department
GLP-1: glucagon-like peptide-1
HbA1c: glycated hemoglobin
HEDIS: Healthcare Effectiveness Data and Information Set
HIPAA: Health Insurance Portability and Accountability Act
ICD-9-CM: International Classification of Diseases, 9th Revision, Clinical Modification
ICD-10-CM: International Classification of Diseases, 10th Revision, Clinical Modification
IP: inpatient
MACE: major adverse cardiovascular event
MI: myocardial infarction
NCQA: National Committee for Quality Assurance
SGLT2: sodium glucose co-transporter 2
UKPDS: United Kingdom Prospective Diabetes Study

References

1. Center for Disease Control. National Diabetes Statistics Report 2017. Atlanta: Centers for Disease Control; 2017.
2. Stolar M. Glycemic control and complications in type 2 diabetes mellitus. Am J Med. 2010;123(3):S3–11.
3. Greenfield S, Kaplan SH. When clinical practice guidelines collide: finding a way forward. Ann Intern Med. 2017;167(9):677–8.
4. Kannel WB, McGee DL. Diabetes and cardiovascular disease: The Framingham study. JAMA. 1979;241(19):2035–8.
5. Almdal T, Scharling H, Jensen JS, Vestergaard H. The independent effect of type 2 diabetes mellitus on ischemic heart disease, stroke, and death: a population-based study of 13,000 men and women with 20 years of follow-up. Arch Intern Med. 2004;164(13):1422–6.
6. Emerging Risk Factors Collaboration. Association of cardiometabolic multimorbidity with mortality. JAMA. 2015;314(1):52–60.
7. The Emerging Risk Factors Collaboration. Diabetes mellitus, fasting blood glucose concentration, and risk of vascular disease: a collaborative meta-analysis of 102 prospective studies. Lancet. 2010;375:2215–22.
8. Laakso M. Cardiovascular disease in type 2 diabetes from population to man to mechanisms: the Kelly West Award Lecture 2008. Diabetes Care. 2010;33(2):442–9.
9. HEDIS® & Performance Measurement. http://www.ncqa.org/hedis-quality-measurement. Accessed 26 July 2016.
10. Hospitalization for potentially preventable complications: rate of discharges for ambulatory care sensitive conditions (ACSC) per 1,000 members and the risk-adjusted ratio of observed to expected discharges for ACSC by chronic and acute conditions, for members 67 years of age and older. https://www.qualitymeasures.ahrq.gov/summaries/summary/49840/hospitalization-for-potentially-preventable-complications-rate-of-discharges-for-ambulatory-care-sensitive-conditions-acsc-per-1000-members-and-the-riskadjusted-ratio-of-observed-to-expected-discharges-for-acsc-by-chronic-and-acute-conditions-for-members-67-ye. Accessed 26 July 2016.
11. Nichols GA, Brown JB. The impact of cardiovascular disease on medical care costs in subjects with and without type 2 diabetes. Diabetes Care. 2002;25(3):482–6.
12. D’Agostino RB Sr, Vasan RS, Pencina MJ, Wolf PA, Cobain M, Massaro JM, Kannel WB. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation. 2008;117(6):743–53.
13. Pencina MJ, D’Agostino RB Sr, Larson MG, Massaro JM, Vasan RS. Predicting the 30-year risk of cardiovascular disease: the Framingham Heart Study. Circulation. 2009;119(24):3078–84.
14. Wilson PWF, D’Agostino RB, Levy D, Belanger AM, Silbershatz S, Kannel WB. Prediction of coronary heart disease using risk factor categories. Circulation. 1998;97:1837–47.
15. Coleman RL, Stevens RJ, Retnakaran R, Holman RR. Framingham, SCORE, and DECODE risk equations do not provide reliable cardiovascular risk estimates in type 2 diabetes. Diabetes Care. 2007;30(5):1292–3.
16. Garrison LP Jr, Neumann PJ, Erickson P, Marshall D, Mullins CD. Using real-world data for coverage and payment decisions: The ISPOR Real-World Data Task Force report. Value Health. 2007;10(5):326–35.
17. Cederholm J, Eeg-Olofsson K, Eliasson B, Zethelius B, Nilsson PM, Gudbjornsdottir S, Swedish National Diabetes R. Risk prediction of cardiovascular disease in type 2 diabetes: a risk equation from the Swedish National Diabetes Register. Diabetes Care. 2008;31(10):2038–43.
18. Kaasenbrood L, Poulter NR, Sever PS, Colhoun HM, Livingstone SJ, Boekholdt SM, Pressel SL, Davis BR, van der Graaf Y, Visseren FL, et al. Development and validation of a model to predict absolute vascular risk reduction by moderate-intensity statin therapy in individual patients with type 2 diabetes mellitus: The Anglo Scandinavian Cardiac Outcomes Trial, Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial, and Collaborative Atorvastatin Diabetes Study. Circ Cardiovasc Qual Outcomes. 2016;9(3):213–21.
19. Kengne AP, Patel A, Marre M, Travert F, Lievre M, Zoungas S, Chalmers J, Colagiuri S, Grobbee DE, Hamet P, et al. The Framingham and UK Prospective Diabetes Study (UKPDS) risk equations do not reliably estimate the probability of cardiovascular events in a large ethnically diverse sample of patients with diabetes: the Action in Diabetes and Vascular Disease: Preterax and Diamicron-MR Controlled Evaluation (ADVANCE) Study. Diabetologia. 2010;53(5):821–31.
20. Robinson T, Elley R, Wells S, Robinson E, Kenealy T, Pylypchuk R, Bramley D, Arrol B, Crengle S, Riddell T, et al. New Zealand Diabetes Cohort Study cardiovascular risk score for people with Type 2 diabetes: validation in the PREDICT cohort. J Prim Healthc. 2012;4(3):181–8.
21. Stevens RJ, Kothari V, Adler AI, Stratton IM. The UKPDS risk engine: a model for the risk of coronary heart disease in Type II diabetes (UKPDS 56). Clin Sci. 2001;101(6):671–9.
22. Kengne AP, Patel A, Marre M, Travert F, Lievre M, Zoungas S, Chalmers J, Colagiuri S, Grobbee DE, Hamet P, et al. Contemporary model for cardiovascular risk prediction in people with type 2 diabetes. Eur J Cardiovasc Prev Rehabil. 2011;18(3):393–8.
23. Kothari V, Stevens RJ, Adler AI, Stratton IM, Manley SE, Neil HA, Holman RR. UKPDS 60: risk of stroke in type 2 diabetes estimated by the UK Prospective Diabetes Study risk engine. Stroke. 2002;33(7):1776–81.
24. Neal B, Perkovic V, Mahaffey KW, de Zeeuw D, Fulcher G, Erondu N, Shaw W, Law G, Desai M, Matthews DR, et al. Canagliflozin and cardiovascular and renal events in type 2 diabetes. N Engl J Med. 2017;377(7):644–57.
25. Zinman B, Wanner C, Lachin JM, Fitchett D, Bluhmki E, Hantel S, Mattheus M, Devins T, Johansen OE, Woerle HJ, et al. Empagliflozin, cardiovascular outcomes, and mortality in type 2 diabetes. N Engl J Med. 2015;373(22):2117–28.
26. Marso SP, Daniels GH, Brown-Frandsen K, Kristensen P, Mann JF, Nauck MA, Nissen SE, Pocock S, Poulter NR, Ravn LS, et al. Liraglutide and cardiovascular outcomes in type 2 diabetes. N Engl J Med. 2016;375(4):311–22.
27. Ferrannini E, DeFronzo RA. Impact of glucose-lowering drugs on cardiovascular disease in type 2 diabetes. Eur Heart J. 2015;36(34):2288–96.
28. Quan H, Li B, Couris CM, Fushimi K, Graham P, Hider P, Januel JM, Sundararajan V. Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. Am J Epidemiol. 2011;173(6):676–82.
29. Young BA, Lin E, Von Korff M, Simon G, Ciechanowski P, Ludman EJ, Everson-Stewart S, Kinder L, Oliver M, Boyko EJ, et al. Diabetes complications severity index and risk of mortality, hospitalization, and healthcare utilization. Am J Managed Care. 2008;14(1):15–23.
30. Cupples LA, D’Agostino RB, Anderson K, Kannel WB. Comparison of baseline and repeated measure covariate techniques in the Framingham heart study. Stat Med. 1988;7(1–2):205–18.
31. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology. 1982;143(1):29–36.
32. Harrell FEJ. Regression modelling strategies. New York: Springer Verlag Inc; 2010.
33. Bots SH, van der Graaf Y, Nathoe HM, de Borst GJ, Kappelle JL, Visseren FL, Westerink J, Group SS. The influence of baseline risk on the relation between HbA1c and risk for new cardiovascular events and mortality in patients with type 2 diabetes and symptomatic cardiovascular disease. Cardiovasc Diabetol. 2016;15(1):101.
34. Kennedy MW, Fabris E, Suryapranata H, Kedhi E. Is ischemia the only factor predicting cardiovascular outcomes in all diabetes mellitus patients? Cardiovasc Diabetol. 2017;16(1):51.
35. She J, Deng Y, Wu Y, Xia Y, Li H, Liang X, Shi R, Yuan Z. Hemoglobin A1c is associated with severity of coronary artery stenosis but not with long term clinical outcomes in diabetic and nondiabetic patients with acute myocardial infarction undergoing primary angioplasty. Cardiovasc Diabetol. 2017;16(1):97.
36. van Steen SC, Schrieks IC, Hoekstra JB, Lincoff AM, Tardif JC, Mellbin LG, Ryden L, Grobbee DE, DeVries JH, AleCardio study g. The haemoglobin glycation index as predictor of diabetes-related complications in the AleCardio trial. Eur J Prev Cardiol. 2017;24(8):858–66.
37. Yang ZK, Shen Y, Shen WF, Pu LJ, Meng H, Zhang RY, Zhang Q, Chen QJ, De Caterina R, Lu L. Elevated glycated albumin and reduced endogenous secretory receptor for advanced glycation endproducts levels in serum predict major adverse cardio-cerebral events in patients with type 2 diabetes and stable coronary artery disease. Int J Cardiol. 2015;197:241–7.
38. Niedziela J, Hudzik B, Niedziela N, Gasior M, Gierlotka M, Wasilewski J, Myrda K, Lekston A, Polonski L, Rozentryt P. The obesity paradox in acute coronary syndrome: a meta-analysis. Eur J Epidemiol. 2014;29(11):801–12.
39. Stokes A, Preston SH. Smoking and reverse causation create an obesity paradox in cardiovascular disease. Obesity. 2015;23(12):2485–90.
40. Stevens RJ, Coleman RL, Adler AI, Stratton IM, Matthews DR, Holman RR. Risk factors for myocardial infarction case fatality and stroke case fatality in type 2 diabetes: UKPDS 66. Diabetes Care. 2004;27(1):201–7.
41. van der Leeuw J, van Dieren S, Beulens JWJ, Boeing H, Spijkerman AMW, van der Graaf Y, Nöthlings U, Visseren FLJ, Rutten GEHM, et al. The validation of cardiovascular risk scores for patients with type 2 diabetes mellitus. Heart. 2015;101(3):222–9.
42. van Dieren S, Beulens JW, Kengne AP, Peelen LM, Rutten GE, Woodward M, van der Schouw YT, Moons KG. Prediction models for the risk of cardiovascular disease in patients with type 2 diabetes: a systematic review. Heart. 2012;98(5):360–9.
43. National Business Coalition on Health. 2010. Measuring Success: A Coalition Guide for Implementing a Diabetes Recognition Program Initiative. http://www.nbch.org/nbch/files/ccLibraryFiles/Filename/000000001823/DRP%20Implementation%20Techncial%20Guide.pdf. Accessed 27 July 2016.

Authors’ contributions

All authors participated in the conception and design of the study and in the analysis or interpretation of the data presented in this manuscript, and critically revised the intellectual content of this manuscript. MGL, RAB, AMM, PL, and ML participated in the acquisition/collection of data. All authors read and approved the final manuscript.

Acknowledgements

Medical writing assistance was provided by Samuel Rochette, an employee of Analysis Group, Inc.

Competing interests

JBY and CHW acted as consultants for Janssen Scientific Affairs, LLC. MGL, AMM, PL, MG, MHL and MSD are employees of Analysis Group, Inc., which has received consultancy fees from Janssen Scientific Affairs, LLC. RAB and BB are employees of Janssen Scientific Affairs, LLC and may own stock or stock options.

Availability of data and materials

The data that support the findings of this study are available from Optum (a division of UnitedHealth Group), but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. Researchers interested in obtaining the data used in this study can access the database through Optum under a license agreement, including the payment of an appropriate license fee.

Consent for publication

Patients’ data used were de-identified prior to their use in the current study and are, thus, fully compliant with the patient confidentiality requirements of the Health Insurance Portability and Accountability Act.

Ethics approval and consent to participate

Not applicable.

Funding

This study was funded by Janssen Scientific Affairs, LLC.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Authors and Affiliations

Cleveland Clinic Foundation Lerner College of Medicine of Case Western Reserve University, Cleveland, OH, USA
James B. Young
Analysis Group, Inc., 1000 De La Gauchetière Ouest, Suite 1200, Montreal, QC, H3B 4W5, Canada
Marjolaine Gauthier-Loiselle, Ameur M. Manceur, Patrick Lefebvre & Marie-Hélène Lafeuille
Janssen Scientific Affairs, LLC, Raritan, NJ, USA
Robert A. Bailey & Brahim Bookhart
Analysis Group Inc., Boston, MA, USA
Morris Greenberg & Mei Sheng Duh
Rockwood Clinic, Spokane, WA, USA
Carol H. Wysham

Corresponding author

Correspondence to Marjolaine Gauthier-Loiselle.

Additional files

Additional file 1.

Study design.

Additional file 2.

Definition of outcomes and risk factors.

Additional file 3.

Risk factors among patients with and without any major adverse cardiovascular events during the at-risk period (training set).

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Young, J.B., Gauthier-Loiselle, M., Bailey, R.A. et al. Development of predictive risk models for major adverse cardiovascular events among patients with type 2 diabetes mellitus using health insurance claims data. Cardiovasc Diabetol 17, 118 (2018). https://doi.org/10.1186/s12933-018-0759-z

Received: 25 May 2018
Accepted: 12 August 2018
Published: 24 August 2018
DOI: https://doi.org/10.1186/s12933-018-0759-z
