A molecular signature for the metabolic syndrome by urine metabolomics

Background: Metabolic syndrome (MetS) is a multimorbid long-term condition without a consensus medical definition, and its diagnosis is based on compatible symptomatology. Here we have investigated the molecular signature of MetS in urine.
Methods: We used NMR-based metabolomics to investigate a European cohort including urine samples from 11,754 individuals (18–75 years old, 41% females), designed to populate all the intermediate conditions in MetS, from subjects without any risk factor up to individuals with developed MetS (4–5%, depending on the definition). A set of quantified metabolites was integrated from the urine spectra to obtain metabolic models (one for each definition) to discriminate individuals with MetS.
Results: MetS progression produces a continuous and monotonic variation of the urine metabolome, characterized by up- or down-regulation of the pertinent metabolites (17 in total, including glucose, lipids, aromatic amino acids, salicyluric acid, maltitol, trimethylamine N-oxide, and p-cresol sulfate), with some of the metabolites associated with MetS for the first time. This metabolic signature, based solely on information extracted from the urine spectrum, adds a molecular dimension to the MetS definition, and it was used to generate models that can identify subjects with MetS (AUROC values between 0.83 and 0.87). This signature is particularly suitable to add meaning to the conditions that are at the interface between healthy subjects and MetS patients. Aging and non-alcoholic fatty liver disease are also risk factors that may enhance MetS probability, but they do not directly interfere with the metabolic discrimination of the syndrome.
Conclusions: Urine metabolomics, studied by NMR spectroscopy, unravelled a set of metabolites that concomitantly evolve with MetS progression, which were used to derive and validate a molecular definition of MetS and to discriminate the conditions that are at the interface between healthy individuals and the metabolic syndrome.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12933-021-01349-9.

MetS constitutes a first-order medical problem with a worldwide prevalence between 10 and 40%, depending on the country or region [2]. This prevalence is directly attributed to unhealthy lifestyle habits, leading to a growing number of people affected by obesity or diabetes, which are also associated with the development of MetS.
Despite its importance, there is no consensus definition for MetS, in line with the complex nature of the syndrome. The current diagnosis of MetS is mostly based on the coincident identification of at least three from a set of known risk factors (RF, Table 1). Several relevant health institutions like the World Health Organization (WHO), the International Diabetes Federation (IDF), the National Cholesterol Education Program-Third Adult Treatment Panel (NCEP:ATP III), the European Group for the Study of Insulin Resistance (EGIR), and the American Association for Clinical Endocrinology (AACE) differ on which RF contribute and/or are essential for diagnosing MetS (bold-highlighted RFs in Table 1) [3][4][5][6][7][8].
There is consensus on some RF contributing to MetS (altered glucose metabolism, obesity, dyslipidemia and high blood pressure [9]), but it is not clear how many of the contributing RF are required to diagnose MetS, nor what the relation is between a given combination of RF and the severity of the syndrome. In 2009, a seminal document attempted to unify some of the existing definitions for MetS and concluded that it emerges only when at least three of the abovementioned RF are present, with no single one being essential (Harmonized column in Table 1) [6]. Cut-off levels for each of the RF were also defined, but this strategy suffers from the inherent difficulty of establishing a causal relationship between a RF and the syndrome.
Another unresolved issue is the putative relationship between MetS and non-alcoholic fatty liver disease (NAFLD), which is commonly considered to be the hepatic manifestation of the metabolic syndrome [10], mostly due to their congruent RF. Yet, there is little experimental evidence linking both diseases, and whether NAFLD and MetS are different expressions of the same disease or related comorbidities remains an open question.
All these ambiguities underline the need for new, more objective and accurate signatures of MetS, ideally based on molecular and quantifiable descriptors. Metabolomics is a powerful tool to investigate MetS since all its contributing RF are expected to significantly alter metabolism [11]. Urine is metabolically very concentrated and not homeostatized, and the very large number of metabolites found in urine may properly account for all the RF contributing to MetS [12][13][14][15]. In turn, NMR is particularly adequate for the analysis of complex solutions such as plasma, serum and urine [16], and it has been applied to study MetS, although so far only in serum samples [17].
Here, we have investigated MetS by using a large cohort of individuals mostly from a Southern European population (two Spanish regions), analysing close to 12,000 urine samples by NMR spectroscopy. The cohort includes volunteers from the general population and MetS patients. An integrative analysis of this large spectra database allowed us to corroborate some of the already reported biomarkers, to report novel ones and, most importantly, to obtain a metabolic signature of MetS progression and to identify the relative contributing risk for each factor.

Sample cohorts from healthy individuals and patients
A large cohort (n ≈ 12,000) including individuals with different degrees of MetS was assembled for this study. This cohort consisted of four subcohorts (OSARTEN, OBENUTIC, PREDIMED and KIROLGETXO) recruited in one European country (Spain) and a fifth subcohort (NAFLD) recruited in different European regions. The relevant data for each subcohort are summarized in the Supplementary material (text and Additional file 1: Tables S1-S5). The procedures for sample collection and handling were the same for every subcohort under consideration and abided by standard operating procedures. Following the Declaration of Helsinki principles, all participants in the study provided informed consent to clinical investigations, with evaluation and approval from the corresponding ethics committee. All data were anonymized to protect the confidentiality of participants.

Sample preparation
Samples were stored at −80 °C and, on the day of the analysis, were thawed at room temperature for 30 min. Aliquots were centrifuged at 6000 rpm for 5 min at 4 °C and then 630 μL of the supernatant were transferred into a 1.5 mL tube. Subsequently, 70 μL of a phosphate buffer (1.5 M KH₂PO₄/K₂HPO₄, 2 mM NaN₃, 1% TSP in 70% D₂O, pH 7.4) were added to the same microcentrifuge tube to minimize pH variation. The mix of urine and buffer was briefly vortexed and 600 μL of the mixture were finally transferred into a 5 mm NMR tube.

NMR measurements
Experiments were performed as previously described [18,19]. In brief, two complementary experiments were recorded per sample: a one-dimensional (1D) ¹H spectrum with water presaturation for metabolite quantification and a two-dimensional (2D) J-resolved ¹H spectrum. For selected samples, a 2D ¹H,¹H-TOCSY (TOtal Correlation SpectroscopY) spectrum was also recorded to confirm metabolite identification. Metabolites were identified from the 1D ¹H NMR spectra using the Chenomx NMR software (version 8.6) and corroborated by experimental spiking when necessary.

Statistical analysis
A cohort composed of the OSARTEN, OBENUTIC, and PREDIMED subcohorts was used to analyse the 16 pathological conditions. A principal component analysis (PCA) was used to summarize and visualize (by PC1 and PC2) each condition, which was represented by its average profile. Each pathological condition was compared with the apparently healthy (0000) one. This comparison employed Wilcoxon nonparametric hypothesis testing for each bin to identify those with a statistically significant difference (p-value < 0.05), after adjustment by the False Discovery Rate (FDR) method to control for Type I errors due to multiple comparisons. Binary logarithms of fold-changes (log₂FC) were used to quantify the magnitude and direction of differences. Fold-changes were calculated as the average of a variable within the target condition divided by its average within the apparently healthy condition. Different conditions and bins were clustered and organized as dendrograms in heatmaps, using hierarchical clustering by the complete-linkage method and Euclidean distances. To quantify differences between average profiles of conditions, a multivariate Euclidean distance (with autoscale) was calculated between the apparently healthy and all other conditions. Resulting distances were scaled (range 0 to 1) and translated into a colour code for a graph connecting the different adjacent conditions, which was generated with igraph (R package version 1.2.6).
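As an illustration only, the fold-change and multiple-testing steps described above can be sketched in a few lines of Python. The sketch assumes the Benjamini-Hochberg procedure as the FDR adjustment (the text does not name a specific procedure), takes the per-bin raw p-values as given (in practice they would come from a rank-sum test such as `scipy.stats.mannwhitneyu`), and uses made-up intensities; the function names are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(target, healthy):
    """log2 of (mean in the target condition / mean in the apparently healthy condition)."""
    return math.log2(mean(target) / mean(healthy))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order (assumed FDR method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value downwards so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical intensities of one spectral bin (arbitrary units).
healthy_bin = [1.0, 1.2, 0.8, 1.0]
condition_bin = [2.1, 1.9, 2.0, 2.0]
fc = log2_fold_change(condition_bin, healthy_bin)  # positive value = up-regulated bin

# Hypothetical raw p-values for three bins of one condition versus 0000.
raw_p = [0.01, 0.04, 0.03]
adj_p = benjamini_hochberg(raw_p)
significant = [p < 0.05 for p in adj_p]
```

A positive log₂FC corresponds to a red (up-regulated) cell in the heatmap and a negative one to a blue (down-regulated) cell; only bins whose adjusted p-value stays below 0.05 would carry an asterisk.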

Classification models for MetS
For each available MetS definition a binary classification model was built, with heatmap-selected bins as input and MetS diagnosis (no/yes) as output. The data were randomly divided into training (75%) and testing (25%) sets. The performance was summarized in ROC curves for each MetS definition, including their AUCs with the corresponding 95% confidence intervals and the cut-off points that maximize the Youden index, with the associated specificity and sensitivity parameters.
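The evaluation metrics named above can be made concrete with a small, self-contained sketch. It is not the pipeline used in the study: the scores and diagnoses are invented, the function names are hypothetical, and confidence intervals are omitted. It only shows how an AUROC and a Youden-optimal cut-off (J = sensitivity + specificity − 1) follow from a list of model scores.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) using each distinct score as cut-off."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in sorted(set(scores)):
        # A sample is called MetS when its score is at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        points.append((threshold, tp / positives, tn / negatives))
    return points


def youden_cutoff(scores, labels):
    """Cut-off that maximizes the Youden index J = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)


def auc(scores, labels):
    """AUROC as the probability that a random MetS case outscores a random non-MetS case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)  # ties count one half
    return wins / (len(pos) * len(neg))


# Hypothetical test-set scores and diagnoses (1 = MetS, 0 = no MetS).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

threshold, sensitivity, specificity = youden_cutoff(scores, labels)
area = auc(scores, labels)
```

With these toy values the Youden-optimal cut-off is 0.7 (sensitivity 0.75, specificity 1.0) and the AUROC is 0.875.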

Microalbuminuria analysis
A semi-quantitative analysis using a test strip was performed on each urine sample for the detection of proteinuria. The results were considered negative/positive if the value of proteinuria (identified as microalbuminuria) was lower/higher than 10 mg/dL.

Setting the problem
To investigate the molecular signature of MetS, we first identified the RF that may contribute to the syndrome from the general characteristics of the donors. Four factors have a well-known association with the development of MetS and they have been included in this study (Table 1): alterations in glucose metabolism, obesity (determined from BMI since waist circumference was not available), dyslipidemia and hypertension. The WHO also considers microalbuminuria as a potential RF, but it is not routinely determined in all medical check-ups and we have evaluated its putative influence on MetS in an independent sub-study (vide infra).
Our study was designed not only to investigate the contribution of each of the RF to MetS independently, but also to evaluate all their possible combinations, a total of 16 (2⁴) different conditions. We used a nomenclature for the conditions where the digits represent the four risk factors (RF₁RF₂RF₃RF₄), binary coded by "1" or "0" to indicate that the given factor is present or absent in the condition (Table 2). According to this notation, a 0000 sample would originate from an apparently healthy subject while, for instance, a sample encoded as 1011 would belong to a patient that has diabetes, dyslipidemia and hypertension, but no obesity. A quantitative definition of the inclusion criteria for each of the RF is also listed in Table 2.
Additional file 1: Table S6 shows the number of samples allocated to each condition (including the OSARTEN, PREDIMED and OBENUTIC subcohorts), also stratified by sex. The apparently healthy condition is more prevalent than the rest of the conditions, due to the characteristics of the OSARTEN subcohort, formed of active population. Even though some conditions are less prevalent, the number of samples in each condition is enough to reach high statistical power. In the worst case (1110, with 62 samples), it is still possible to detect a Cohen's small-to-medium effect size with more than 80% power in comparisons with the apparently healthy condition.
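The four-digit nomenclature lends itself to a very short encoding sketch. The digit order (glucose metabolism/diabetes, obesity, dyslipidemia, hypertension) is inferred from the 1011 example in the text; the function and variable names are hypothetical.

```python
from itertools import product

# Digit order inferred from the 1011 example: RF1..RF4.
RISK_FACTORS = ("diabetes", "obesity", "dyslipidemia", "hypertension")


def encode_condition(diabetes, obesity, dyslipidemia, hypertension):
    """Build the four-digit condition code from the presence/absence of each RF."""
    flags = (diabetes, obesity, dyslipidemia, hypertension)
    return "".join("1" if flag else "0" for flag in flags)


def decode_condition(code):
    """List the risk factors that are present in a condition code."""
    return [name for name, digit in zip(RISK_FACTORS, code) if digit == "1"]


# All 2**4 = 16 possible conditions, from 0000 (apparently healthy) to 1111.
ALL_CONDITIONS = ["".join(bits) for bits in product("01", repeat=4)]

example = encode_condition(diabetes=True, obesity=False, dyslipidemia=True, hypertension=True)
```

Here `example` is the code 1011, and `decode_condition("1011")` returns the three risk factors named in the text.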

The urine ¹H NMR spectrum is sensitive to MetS
NMR-based metabolomics of urine allows the quantification of several hundreds of metabolites that include central metabolism, xenobiotics, metabolites from microbiota and nutrition derivatives among others [20] and, therefore, is an optimal source of information for the metabolic characterization of MetS. An unsupervised PCA of the urine NMR spectra of the different subcohorts (Additional file 1: Figure S1) reported no significant differences, validating their full inclusion in the study. From all classified spectra, an average spectrum was composed for each of the 16 conditions. A PCA of their mean profiles (Fig. 1A) shows that all conditions separate well in the 2D principal component space, highlighting a differential manifestation of RF in the urine spectrum. Interestingly, four well-differentiated clusters of conditions can be observed in the PCA plot, which always discriminate well between diabetes and hypertension (coloured ellipses in Fig. 1A), consistent with previous observations [15], while obesity and dyslipidemia are separated only within each cluster, indicating a lower level of modification of the urine metabolites induced by these two factors [21,22].
Based on these results, we then compared each condition to the apparently healthy one (samples from individuals with 0000). The heatmap in Fig. 1B shows the results obtained from the univariate analysis of the acquired urine samples, considering the intensity of the spectral bins as variables. The conditions (on the abscissa) and the bins/metabolites (on the ordinate) have been sorted according to unsupervised cluster analysis. The bins have been assigned to the contributing metabolites and up to 17 different metabolites (and one unassigned bin) contribute to the discrimination of the conditions (Table 3). For the metabolites that are present in more than one bin, the most significant bin was used for the metabolite quantification. For each condition, the p-value indicates the statistical significance of the variation with respect to apparently healthy individuals (see asterisks inside the squares), while the fold change is colour-coded according to the bar legend: a red/blue value in the heatmap indicates up/down regulation of the bin. In most cases, all the bins that correspond to a given metabolite produce consistent fold changes, while the small differences observed in the magnitude of the fold change can be attributed to the metabolic heterogeneity of certain bins. Yet, citric acid shows upregulation at the 2.66 ppm bin and downregulation at the 2.57 ppm bin (Fig. 1B). This is explained by the large sensitivity of citric acid to pH and osmolarity, which produces small changes in the chemical shift, so that the intensities of the (outer) bins vary accordingly (Additional file 1: Figure S2) [23].
Several important conclusions can be extracted from the heatmap: (i) MetS emerges as a complex metabolic scenario where some metabolites are upregulated and others are downregulated in urine; (ii) the (unsupervised) cluster analysis sorts the conditions in a way that naturally progresses towards the consensus definition of MetS (i.e., the conditions with more RF = 1 fall on the right side of the heatmap and vice versa); (iii) the metabolic variation is concomitant with the progression towards MetS, with close-to-linear variations of the metabolite concentrations as a function of the conditions; and (iv) most of the pertinent metabolites are related to the molecular pathophysiology of the RF under consideration (Table 3): aromatic amino acids and histidine have already been associated with MetS [24][25][26]; insulin resistance is obviously related to an increase in glucose [27] and/or to elevated urine levels of p-cresol sulfate [28]; hypertension is associated with low imidazole concentrations [29,30]; upregulation of steroid lipids is a hallmark of dyslipidemia and obesity [31][32][33][34]; and a set of the discovered metabolites are related to obesity [35] (salicyluric acid [36] and trimethylamine N-oxide (TMAO) [37,38]). In turn, we also associate here, for the first time, some other dysregulated metabolites with MetS: methylhippuric acid, maltitol, 4-hydroxyphenylpyruvic acid (4-HPPA), trigonelline, quinolinic acid and nicotinuric acid.
Bruzzone et al. Cardiovasc Diabetol (2021) 20:155

Towards a molecular discrimination of MetS
To further illustrate the relationship between the observed metabolic changes and MetS, in Fig. 1C we sketched a correlation map where adjacent conditions differing by only one RF are connected by lines and coloured by their Spearman's correlation distance to the apparently healthy condition (0000), as indicated. The graph shows once more that the variation of the urine metabolome (colours in Fig. 1C) agrees well with MetS progression (rising number of RF = 1). Furthermore, the graph also reveals that not all the factors equally contribute to MetS progression; instead, for a given number
We have also used the spectral database to create metabolic models of MetS (see Research Design and Methods for details) adapted to the different criteria used to define MetS. To that end, we first identified the number of cases with MetS, according to the different definitions, using the general characteristics of the curated pool of 10,792 subjects. Only three out of the five different definitions from Table 1 can be truly distinguished with the general characteristics available in our cohort (here called independent definitions). Specifically, the MetS definitions according to the WHO, EGIR and AACE are represented by the cluster of 1111, 1011, 1101 and 1110 conditions (squares and triangles in Fig. 1C); the MetS definitions from NCEP:ATPIII and Harmonized are represented by the former conditions plus 0111 (squares, triangles and rhombus in Fig. 1C); and the IDF MetS definition is represented by the 1111, 1101, 1110 and 0111 conditions (squares and rhombus in Fig. 1C). Using these classifications, we found 642 cases for the NCEP:ATPIII or Harmonized definitions, 552 cases for the IDF definition and 494 cases for the WHO, EGIR or AACE definitions.
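The mapping from condition codes to the three independent definitions, and the notion of adjacent conditions used for the graph, can be written down directly from the lists above. This is a minimal sketch with hypothetical names; it encodes only what the text states (which conditions each definition covers) and does not reproduce the distances or colours of Fig. 1C.

```python
from itertools import combinations, product

# Conditions counted as MetS under each independent definition, as listed in the text.
DEFINITIONS = {
    "WHO/EGIR/AACE": {"1111", "1011", "1101", "1110"},
    "NCEP:ATPIII/Harmonized": {"1111", "1011", "1101", "1110", "0111"},
    "IDF": {"1111", "1101", "1110", "0111"},
}


def has_mets(code, definition):
    """True if the condition code is classified as MetS under the given definition."""
    return code in DEFINITIONS[definition]


def adjacent(code_a, code_b):
    """Two conditions are adjacent when they differ in exactly one risk factor."""
    return sum(a != b for a, b in zip(code_a, code_b)) == 1


ALL_CONDITIONS = ["".join(bits) for bits in product("01", repeat=4)]

# Edges of the graph connecting adjacent conditions (a four-dimensional hypercube).
EDGES = [(a, b) for a, b in combinations(ALL_CONDITIONS, 2) if adjacent(a, b)]
```

For example, the non-diabetic 0111 condition counts as MetS under the NCEP:ATPIII/Harmonized and IDF definitions but not under WHO/EGIR/AACE, and the resulting graph has 32 edges.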
Subsequently, we used the spectral information collected from the urine samples to train and test three metabolic models that maximize the differences between the MetS and non-MetS conditions, one per independent definition, using 75% (8,094)/25% (2,698) of the samples as training/validation cohorts. Figure 2A-C shows the ROC curves for the three models under consideration. Moreover, we have scrutinized the cohort, calculating its probability of undergoing MetS for the three models/independent definitions (Fig. 2D-F). Specifically, after applying each model, samples were scored with a "MetS probability" between 0 and 1. The figure represents the distribution of these scores as a smoothed histogram (kernel densities). These plots show that people without MetS tend to cluster together in the region of low scores, while people with MetS tend to be spread mainly along high-score regions, also reflecting the heterogeneity of the syndrome. The results show that the models, based solely on the metabolomic analysis of urine samples, can identify MetS in excellent agreement with all three independent definitions, with AUROC values between 0.83 and 0.87. We believe that the discrepancies reflect the differences between our molecular signature and the standard definitions for MetS. Indeed, while all independent definitions are largely consistent with our derived MetS metabotype, those including insulin resistance as a mandatory criterion perform slightly better. This result is consistent with the statistical distance of the non-diabetic 0111 condition, which appears closer to the apparently healthy group than to full MetS (Fig. 1C). This condition is included in the NCEP:ATPIII, IDF and Harmonized definitions. Finally, the heatmaps segregated by sex (Additional file 1: Figure S3) render results equivalent to those obtained for the entire cohort (Fig. 1B), indicating that sex does not affect the metabolic characterization of MetS.
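For readers unfamiliar with kernel densities, the smoothed histograms of the "MetS probability" scores can be illustrated with a plain Gaussian kernel density estimate. The kernel, the fixed bandwidth of 0.1 and the scores below are assumptions made for this sketch; the text does not specify how the densities in Fig. 2D-F were computed.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` with Gaussian kernels."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of one Gaussian bump per sample, centred on that sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical model scores ("MetS probability") for the two groups.
non_mets_scores = [0.02, 0.05, 0.08, 0.10, 0.15]
mets_scores = [0.40, 0.60, 0.75, 0.90]

density_non_mets = gaussian_kde(non_mets_scores, bandwidth=0.1)
density_mets = gaussian_kde(mets_scores, bandwidth=0.1)

# Evaluate both curves on a grid of scores, as one would before plotting them.
grid = [i / 100 for i in range(101)]
curve_non_mets = [density_non_mets(x) for x in grid]
curve_mets = [density_mets(x) for x in grid]
```

In this toy example the non-MetS curve is concentrated at low scores while the MetS curve is broader and shifted towards high scores, which is the qualitative pattern described in the text.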
In turn, aging is a well-known risk factor for many diseases, including MetS [39]. The OSARTEN and OBENUTIC subcohorts are well-balanced in age while the PREDIMED cohort is older on average. A potential caveat is, therefore, that our metabolic model might partially monitor the aging process. To rule out this pitfall, we also analysed an independent cohort (KIROLGETXO) that was not used in deriving our metabolic model and that sampled a senior population (age between 60 and 85) with a healthy lifestyle including regular sport activities.
Not surprisingly, this cohort is enriched in people with none (n = 34) or only one MetS risk factor (n = 40) (Additional file 1: Table S4), and our metabolic model accordingly indicates only a very low probability of suffering from MetS (Fig. 3A).

The role of microalbuminuria and impaired renal function in MetS
As the WHO considers microalbuminuria as an RF for MetS, we also analysed the proteinuria values (> 10 mg/dL) for all the urine samples from the OSARTEN cohort. Since albumin is the main protein in urine, we equated microalbuminuria with proteinuria. The OSARTEN cohort is large enough to represent most of the MetS conditions with sufficient statistical significance, despite being strongly biased towards the apparently healthy and healthier conditions. Additional file 1: Figure S4 shows how the percentage of microalbuminuria increases as the condition approaches the full MetS condition (1111, at the right of the plot). This result suggests that microalbuminuria is related to MetS, as acknowledged by the WHO and consistent with previous reports relating hypertension and elevated proteinuria [40]. Yet, at worst (i.e., in condition 1111), only 10% of the samples show microalbuminuria, showing it to be only a secondary risk factor in the aetiology of MetS.
For the OSARTEN II and OBENUTIC cohorts, the estimated glomerular filtration rate (eGFR) was determined from the available serum creatinine concentrations using the Chronic Kidney Disease Epidemiology Collaboration equation [41]. The values were sorted according to the G1-G5 scale (Additional file 1: Figure S5): most individuals (75%) fall in the G1 category (normal or high GFR), 24.8% fall in the G2 category (mildly decreased) and a residual percentage of individuals fall in the G3a or G3b categories. None of the subjects has severely decreased GFR (G4) or shows kidney failure (G5). These results indicate that the observed metabolic changes are not biased by impaired renal function.
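For orientation, the eGFR calculation and its G-category assignment can be sketched as follows. The text cites the CKD-EPI equation [41] without stating the version; this sketch assumes the 2009 creatinine-based equation (serum creatinine in mg/dL, result in mL/min/1.73 m²) and the standard G1-G5 thresholds, and the function names are hypothetical.

```python
def ckd_epi_2009(serum_creatinine_mg_dl, age_years, female, black=False):
    """eGFR (mL/min/1.73 m2) from the 2009 CKD-EPI creatinine equation (assumed version)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = serum_creatinine_mg_dl / kappa
    egfr = 141.0 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993 ** age_years
    if female:
        egfr *= 1.018
    if black:
        egfr *= 1.159
    return egfr


def gfr_category(egfr):
    """Map an eGFR value to the G1-G5 scale (G3 split into G3a and G3b)."""
    if egfr >= 90:
        return "G1"   # normal or high
    if egfr >= 60:
        return "G2"   # mildly decreased
    if egfr >= 45:
        return "G3a"  # mildly to moderately decreased
    if egfr >= 30:
        return "G3b"  # moderately to severely decreased
    if egfr >= 15:
        return "G4"   # severely decreased
    return "G5"       # kidney failure


# Hypothetical subject: 50-year-old man with serum creatinine of 0.9 mg/dL.
example_egfr = ckd_epi_2009(0.9, 50, female=False)
example_category = gfr_category(example_egfr)
```

For this hypothetical subject the estimate is close to 99 mL/min/1.73 m², which falls in the G1 category.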

Metabolic relationship between NAFLD and MetS
We have also investigated the putative relationship between MetS and NAFLD, the latter without discriminating between non-alcoholic fatty liver (NAFL) and non-alcoholic steatohepatitis (NASH). Most of the RF defining MetS contribute to NAFLD progression, and whether NAFLD is indeed the hepatic manifestation of MetS, as previously suggested [10], remains an open question. We analysed a cohort of 234 urine samples from patients with NAFLD, diagnosed and staged by liver biopsy, the reference method for the characterization of the disease [42]. Based on the WHO, EGIR and AACE criteria, samples were classified in two subcohorts: NAFLD with MetS and NAFLD without MetS. We then used our metabolic model to predict the probability of MetS for the two subcohorts. Figure 3B shows the corresponding probability distributions for the general population (apparently healthy, 0000), the NAFLD without or with MetS subcohorts and the MetS population (with unknown status about NAFLD). As expected, the NAFLD without MetS subcohort indeed shows a low probability of having MetS on average, with a very similar distribution to the general population (also without MetS), implying that the NAFLD-associated metabotype differs from the one for MetS. This result is consistent with the lack of association between transaminase levels and MetS patients [24]. In contrast, the NAFLD with MetS subcohort shows a complex probability distribution, highlighting the fact that a simultaneous presence of NAFLD and MetS confounds the metabolic definition of the syndrome, and suggesting a partial overlap of the associated metabotypes in line with their common risk factors. Taken together, our results suggest that MetS and NAFLD may be comorbidities with distinct metabolic profiles, albeit with some overlapping features.

Discussion
Our goal was to investigate the molecular signature of MetS in a large European cohort covering a wide range of MetS-related phenotypes. Here, we provide an unprecedented study using NMR spectroscopy on a very large cohort of urine samples, specifically designed to populate all the possible intermediate conditions between healthy volunteers and MetS patients, the latter being characterized by the accumulation of RF and not biased by any specific definition of the syndrome. Remarkably, we always found a smooth but monotonic metabolic variation for a specific set of metabolites (Fig. 4 and Table 3), reflecting well the progressive deterioration of the metabolism due to the accumulation of RF towards MetS. In any case, not all these factors contribute equally to MetS progression, providing a molecular signature of the syndrome, as highlighted by the risk factors enclosed in the orange ellipse in Fig. 1C. This molecular definition of MetS (conditions 1001, 1011, 1101 and 1111) may be of particular interest in the discrimination of conditions that are at the interface between healthy individuals and MetS patients.
Our molecular signature of MetS considers problems with glucose metabolism as a compulsory risk factor for MetS, in line with the WHO definition. They include insulin resistance and are related to the pre-diabetic and diabetic states. These problems are well reflected in our analysis by the high levels of glucose found. Other related metabolites include p-cresol sulfate, a uremic toxin that originates from tyrosine metabolism by intestinal microbes and is also associated with insulin resistance [28]; 4-HPPA is also involved in this pathway. Finally, maltitol is a polyol used as a sugar substitute recommended for individuals at risk of type 2 diabetes (T2D) [43].
We also found hypertension to be a compulsory risk factor of MetS. Consistently, almost 80% of the patients affected by MetS present elevated blood pressure [44]. Lowered histidine and imidazole levels could be linked to an impairment in the concentration of the endogenous ligands of the imidazoline and α₂-adrenergic receptors, ultimately associated with hypertension episodes [29,30]. In turn, dyslipidemia, directly reflected in the elevated levels of lipids in urine [24,25], and obesity, monitored by abnormal levels of TMAO, trigonelline and salicyluric acid, contribute to MetS but they would not constitute essential risk factors according to our molecular signature of MetS.
Fig. 4 A molecular signature for MetS. All the risk factors that contribute to MetS have at least one metabolite in urine that is altered and contributes to the MetS metabotype. Such characteristic metabotype has been used to create a metabolic model to predict the probability of suffering MetS from the NMR analysis of a urine sample. Red and blue arrows correspond to up- and down-regulated metabolites in urine, respectively. Created with BioRender.com
The large number of samples in our study allowed us to derive consistent metabolic models for discriminating MetS, adapted to the current existing definitions and based only on a straightforward urine analysis by ¹H NMR spectroscopy (with no need to add characteristics of the individual). The aim of our models was to compare our molecular definition of MetS with the current diagnostics for the syndrome (Table 1), adding a molecular dimension to its definition. All existing definitions, based on slightly differing sets of risk factors, agree well with our derived metabolic profile, with high AUROC values for discrimination ranging between 0.85 and 0.92, performing better than a previously reported model [45].
Here, the WHO, EGIR, and AACE definitions, which include diabetes as a compulsory risk factor for MetS, agree best with our predictions from urine metabotyping, presumably owing to the important weight of urinary glucose in the metabolic model. Actually, the AUROC values would rise to 0.86-0.92 if hyperglycemia were defined as glucose higher than 110 mg/dL (instead of 100 mg/dL, Additional file 1: Figure S6). Finally, our results also show a significant propensity for albuminuria in individuals with MetS, again in agreement with the WHO definition.
Finally, we also compared our urinary metabolic model for MetS, obtained from a large and well-balanced sample cohort in which the vast majority of individuals showed normal transaminase values, with an independent subcohort of NAFLD patients diagnosed by biopsy. While the results show a certain overlap of metabolic profiles between MetS and NAFLD, in agreement with their shared symptomatology, our MetS model can distinguish the exclusive NAFLD condition without MetS comorbidity (Fig. 3B).

Limitations of the study
The study assumes that urine is sensitive to all the factors that contribute to MetS. Specifically, obesity and dyslipidemia induced smaller changes, which could also be related to their intrinsic metabolic variability. Even though we found metabolites associated with all the risk factors in MetS, the inclusion of metabolomic information from other matrices (e.g., serum) is desirable.

Conclusions
In summary, we have demonstrated that NMR-based metabolomics of urine samples can identify individuals with MetS condition. The relevant metabolites for discrimination are associated with all contributing risk factors, thus providing a holistic molecular signature for the metabolic syndrome. These results may improve clinical decision making and potentially guide early intervention in this important syndrome.