Integration of metabolomics methodologies for the development of predictive models for mortality risk in patients with severe COVID-19.

doi:10.21203/rs.3.rs-4418889/v1

2024 · doi:10.21203/rs.3.rs-4418889/v1

preprint OA: closed

Research Square · Research Article

Shanpeng Cui, Qiuyuan Han, Ran Zhang, Yue Li, Ming Li, Wenhua Liu, and 2 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4418889/v1

This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 02 Jan, 2025 in BMC Infectious Diseases. Version 1 posted 14

Abstract

Background
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has spread globally. This study used machine learning to integrate metabolomics and clinical data into a predictive model for early estimation of mortality risk in critical-type COVID-19 patients.

Methods
One hundred patients with severe COVID-19 infection, comprising 46 survivors and 53 non-survivors, were enrolled from the Second Hospital affiliated with Harbin Medical University. A predictive model was developed using blood metabolomics and clinical data obtained within 24 hours of admission. Differential metabolite analysis and other techniques were employed to identify relevant features. Model performance was evaluated by comparing the area under the receiver operating characteristic curve (AUROC). The final predictive model underwent external validation with a cohort of 50 critical COVID-19 patients from the First Hospital affiliated with Harbin Medical University.

Results
Blood metabolomics and laboratory parameters differed significantly between survivors and non-survivors. Two metabolites, Itaconic acid and 3-Oxalomalate, along with four laboratory tests (LYM, IL-6, PCT, and CRP), were selected as the six input variables for all four models. In the external validation set, the KNN model achieved the highest AUC of the four models, 0.935. At a 50% mortality-risk threshold, the validation set showed a sensitivity of 0.926 and a specificity of 0.934.
Conclusions
The prognostic outcome of COVID-19 patients is significantly associated with the levels of Itaconic acid, 3-Oxalomalate, LYM, IL-6, PCT, and CRP upon admission. These six indicators can be used to assess mortality risk in affected individuals.

Keywords: COVID-19, Metabolomics, Machine Learning, Risk of Mortality, Predictive Model

1. Introduction

Since the onset of the global COVID-19 pandemic in December 2019, the SARS-CoV-2 virus has continued to spread worldwide, resulting in more than 770 million infections and 7 million fatalities to date [1]. In response to this ongoing crisis, the Chinese government and health authorities have conducted comprehensive analyses and investigations into the emergence and progression of the novel coronavirus. The "Diagnosis and Treatment Protocol for Novel Coronavirus Infection (COVID-19)" in China categorizes SARS-CoV-2 infection into mild, moderate, severe, and critical types based on the range of symptoms and severity levels. In critical care medicine, the elderly show significantly higher rates of severe illness, mortality, and adverse outcomes than other age cohorts. While the elderly population in China has a relatively high vaccination rate, the decline in protective antibodies over time poses a potential risk of severe illness or mortality from future COVID-19 infections [2]. Thus, there is a critical need to prioritize the diagnosis and prevention of critical illness to enhance the elderly population's ability to manage COVID-19 infections and mitigate the associated mortality risk. The current clinical prognosis assessment for COVID-19 patients primarily depends on clinical manifestations, hematological parameters, and imaging data [3].
Nevertheless, these conventional approaches may not consistently forecast disease progression, particularly in the critical care phase, which is characterized by rapid changes in condition. Therefore, the identification of more precise and sensitive biomarkers to anticipate mortality risk in critical-type patients is imperative.

Metabolites, which play a direct role in numerous biochemical reactions within the body, can function as biomarkers that indicate an individual's health status and the progression of pathological processes. In recent years, metabolomics has been increasingly used to diagnose, predict outcomes, and monitor treatment responses in a variety of medical conditions [4, 5]. Following the onset of the COVID-19 pandemic, many studies have employed metabolomics to evaluate the severity of COVID-19 infection, demonstrating its significant potential as a tool for supporting clinical decision-making [6-11].

In this study, liquid chromatography-mass spectrometry (LC-MS) metabolomics analysis was conducted on serum samples from the patients, leading to the identification of two predictive biomarkers for mortality risk in COVID-19. Subsequently, a predictive model for mortality risk was developed by integrating the patients' laboratory tests and metabolomics findings.

The objective of this study is to develop a predictive model for mortality risk in critically ill COVID-19 patients by integrating metabolomics techniques. Through the analysis of metabolite spectrum data and clinical laboratory tests, we aim to identify key features associated with mortality risk and use this knowledge to create a robust predictive model.

2. Methods

This prospective, retrospective cohort study was carried out in the intensive care medicine departments of two provincial tertiary care hospitals located in Heilongjiang Province, China. The study included patients who were admitted between January 3, 2023, and March 30, 2023.
All patients included in the study satisfied the diagnostic criteria outlined as follows: positive reverse transcriptase polymerase chain reaction (RT-PCR) of SARS-CoV-2 ribonucleic acid (RNA) confirming the diagnosis of COVID-19 infection; and meeting the diagnostic criteria of critical type in the 10th edition of the Diagnostic and Treatment Program for Novel Coronavirus Infections ( Critical type is defined as one of the following: ① respiratory failure and the need for mechanical ventilation; ② the presence of shock; ③ combined with other organ failure requiring ICU supervision); age 65 years or older and length of stay in the ICU of at least seven days. A total of 181 critically ill patients diagnosed with COVID-19 were initially included in the study, with 31 patients subsequently excluded due to incomplete clinical data, ICU stays of fewer than seven days, or abandonment of treatment during hospitalization, thus impacting the accuracy of clinical outcome determination. Ultimately, the analysis included a total of 150 patients from two hospitals: 100 from the Second Hospital of Harbin Medical University and 50 from the First Hospital of Harbin Medical University. Metabolomic and clinical data were collected from a total of 150 patients, with 100 patients from the Second Hospital of Harbin Medical University utilized for the development of the machine learning model, and the remaining 50 patients from the First Hospital of Harbin Medical University employed for external validation of the model. The study protocol received approval from the Institutional Ethics Committee of the Second Hospital of Harbin Medical University, as well as from the partner hospital and its respective Institutional Ethics Committee. Given the unique circumstances of critically ill patients, obtaining informed consent directly from them was not feasible. 
Therefore, prior to collecting blood samples, the medical staff informed the patients' families about the purpose of the sampling and potential risks and obtained consent through the signing of an informed consent form.

2.2 Data and sample collection

Demographic information, clinical characteristics, initial symptoms, chest CT results, nucleic acid test results, and various clinical data were gathered from the electronic medical record by experienced clinicians. The documented results were independently reviewed by two clinicians. Within 24 hours of the patient's admission, 1.5 mL of arterial blood was collected for metabolomics testing. In addition to the metabolomics data obtained from blood samples, laboratory findings covered metrics indicative of inflammatory status, liver and kidney function, blood glucose levels, lipid profile, cardiac function, and coagulation. Prior to data collection, personally identifiable information such as names and identification numbers was anonymized, and a study ID was electronically assigned to each participant to safeguard patient confidentiality. A total of 150 samples were included in the study. Following sample collection, the whole blood samples were centrifuged by the hospital laboratory staff at 3000 rpm for 15 minutes. Subsequently, 200 μL of the upper plasma layer was pipetted into EP tubes, which were then numbered and stored at -80°C for future metabolomics testing. Prior to being received by the aforementioned personnel, all samples were appropriately anonymized.

2.3 Metabolite extraction

For the untargeted metabolomics study, 300 μL of methanol (Thermo Fisher) was added to 100 μL of serum. Then 5 μL of 4-Cl-phenylalanine (Sigma, 500 µg/mL) was added as an internal standard and the mixture was vortexed for 2 minutes. The mixture was then incubated on ice for 30 minutes to precipitate proteins. Subsequently, samples were centrifuged at 15,000 g for 10 minutes at 4°C.
Following centrifugation, 300 μL of the supernatant was transferred to a clean tube and centrifuged again at 15,000 g for 5 minutes at 4°C. Next, 250 μL of the supernatant was transferred and separated into two aliquots. Finally, metabolite extracts were dried in a vacuum concentrator system (Labconco) and stored at -80°C until analysis. Dried metabolite extracts were reconstituted with 120 μL of 60% acetonitrile for injection, following a previously described method [12].

2.4 Untargeted metabolite profiling

An equal volume of each resuspended sample was mixed to create a quality control (QC) sample. The QC sample was injected 6 times before the sample cohort was analyzed and was inserted after every 10 samples. The accuracy, precision, and linearity of the untargeted metabolomics profiling were assessed and recalibrated before metabolite annotation. Serum metabolites were analyzed using the X500B qTOF (AB Sciex) coupled with a high-performance LC system (Exion LC AD system). Data were acquired in both positive and negative ion modes over a mass/charge ratio (m/z) range of 60 to 1200 Da in the TOF MS scan, with a collision energy (CE) of 10 V, and from 50 to 1000 Da in the MS/MS scan, with a collision energy of 35 ± 15 V. The electrospray source conditions were set as follows: curtain gas, 35 psi; temperature, 550°C; CAD gas, 7 psi; spray voltage, 5.5 kV (positive) and 4.5 kV (negative). Metabolites were chromatographically separated on a UPLC HSS T3 column (ACQUITY 1.8 µm 150 × 2.1 mm, Waters) with the oven temperature maintained at 40°C. A 5 μL sample was injected at a flow rate of 0.3 mL/min, with mobile phase A (0.1% formic acid, Thermo Fisher) and mobile phase B (acetonitrile, Thermo Fisher). The following gradient was employed: 0.01 min 1% B, 1.5 min 1% B, 2 min 5% B, 3 min 70% B, 11 min 99% B, 15 min 99% B, 15.1 min 1% B. The raw data (.wiff) were converted into processed data (.mzXML and .mgf) using ProteoWizard 3.0 software for further analysis.
2.5 Metabolomics data processing and annotation

The MS1 peak table was processed using the R package xcms (version 1.46.0) with the following key parameters: method = "centWave"; ppm = 15; snthresh = 10; peakwidth = c(5, 40); minfrac = 0.5. The generated MS1 peak table includes the mass-to-charge ratio (m/z), retention time (RT), peak abundances, and other relevant information [13]. Metabolite annotation was performed on the modified peak table with MetDNA2, a web-based platform for untargeted annotation. Subsequently, the annotated metabolite data were analyzed with MetaboAnalyst 5.0 for further analyses such as PCA and ROC analysis [14, 15].

2.6 Statistical analysis and modeling methods

Clinical and metabolite data were selected based on statistical analysis of characteristic differences. The outcome was coded as a binary variable for the machine learning models: event (non-survival, 0) and non-event (survival, 1). In creating the model, the population was randomly divided into a training set (50%) and a test set (50%). The six selected features were then used to train four models in Python (3.7), and K-fold cross-validation was used to maximize the area under the curve (AUC) and to compare the performance of the KNN, Random Forest, Naive Bayes, and Decision Tree algorithms.

The K-Nearest Neighbor (KNN) algorithm is used in data mining and analysis to identify samples within the same category that are close together in feature space. The method determines the k nearest neighbors of each data point, on the premise that if most of these neighbors belong to the same category, the sample shares their characteristic attributes and is classified accordingly. The KNN algorithm relies exclusively on the proximity of the k nearest neighbors to make decisions, rather than on a pre-defined decision function.
This approach often yields improved classification and regression outcomes, particularly for datasets whose classes overlap or intersect substantially.

The Naive Bayes classification model is based on probabilistic statistics, specifically Bayes' theorem and the assumption of conditional independence of features. The algorithm first calculates the probability of each feature value given each category and then, assuming independence, combines these probabilities and assigns the sample to the category with the highest resulting probability. Naive Bayes is known for its fast training and prediction speeds, making it well-suited for real-time classification tasks.

The Decision Tree classification model is a commonly used and easily interpretable machine learning technique that creates a hierarchical structure by iteratively splitting data features to classify data. Beginning at the root node, the model selects the best feature value to partition the data until a leaf node is reached, resulting in a final classification. This model is easy to understand and explain, which makes it well-suited for the analysis of extensive datasets. Following the training process, the decision tree produces a discriminative model in the form of a tree structure, which determines the classification outcome based on the decisions made at each node.

The Random Forest algorithm is a supervised learning ensemble method that consists of multiple decision trees. This Bagging-type algorithm combines weak classifiers through voting or averaging to improve the accuracy and generalization performance of the model. The success of Random Forest can be attributed to its incorporation of randomness and ensemble techniques, which help prevent overfitting and enhance precision.

3. Results

3.1 Demographics and baseline characteristics of survivors and non-survivors

Following the application of the predetermined inclusion criteria, a cohort of 150 patients with critical COVID-19 illness was identified, comprising 46% survivors (n=69) and 54% non-survivors (n=81). Subsequently, clinical data from a subset of 100 patients were used for feature screening. Lymphocyte count was significantly lower, and interleukin-6 (IL-6), C-reactive protein (CRP), and procalcitonin (PCT) were significantly higher, in non-survivors than in survivors (all P < 0.01), and each of these four parameters had an area under the curve (AUC) above 0.84. Table 1 lists the statistically significant laboratory results, and complete clinical data are available in the Supplementary Information. Figure 1 shows the study flow chart.

Table 1 Overview of the patient's laboratory results.

| Characteristics | Survivors | Non-survivors | P | AUC |
|---|---|---|---|---|
| No. | 54 | 46 | | |
| Blood routine examination | | | | |
| WBC count, ×10^9/L | 8.3 (6.3, 10.1) | 11.4 (7.7, 13.3) | <0.001 | 0.6952 |
| Lymphocyte cell count, ×10^9/L | 17.1±8.6 | 7.1±4.5 | <0.001 | 0.8482 |
| Neutrophil cell count, ×10^9/L | 75.2±11.2 | 87.2±15.8 | <0.001 | 0.8456 |
| RBC count, ×10^9/L | 3.8 (3.4, 4.2) | 2.9 (2.3, 3.4) | <0.001 | 0.8360 |
| Platelet count, ×10^9/L | 251.2±106.5 | 164.3±90.5 | <0.001 | 0.7460 |
| Hemoglobin, g/L | 118.7±17.6 | 93.0±27.1 | <0.001 | 0.8329 |
| IL-6, pg/mL | 39.3 (29.9, 44.9) | 313.9 (104.1, 736.8) | <0.001 | 0.8698 |
| PCT, ng/mL | 0.12 (0, 0.3) | 1.1 (0.4, 7.2) | <0.001 | 0.8537 |
| CRP, mg/L | 33.0±25.1 | 94.7±50.9 | <0.001 | 0.8585 |
| PT, s | 11.5 (10.8, 12.3) | 13.0 (11.9, 15.7) | <0.001 | 0.7901 |
| APTT, s | 31.7±4.3 | 48.6±35.9 | <0.001 | 0.7333 |
| D-dimer, mg/L | 537.5 (210.2, 1544.2) | 1362.9 (673.0, 3478.7) | <0.001 | 0.7331 |
| INR | 1.0 (1.0, 1.2) | 1.3 (1.1, 1.4) | <0.001 | 0.8406 |
| TnI, ng/L | 0.000 (0.000, 0.006) | 0.043 (0.015, 0.114) | <0.001 | 0.8217 |
| AST, U/L | 18.0 (14.7, 27.2) | 34.0 (22.0, 57.2) | <0.001 | 0.7703 |
| Cr, μmol/L | 67.5 (57.7, 77.5) | 119.0 (77.7, 173.0) | <0.001 | 0.7987 |
| BUN, mmol/L | 7.79 (5.58, 10.67) | 11.94 (10.51, 14.05) | <0.001 | 0.8422 |
| NT-proBNP, pg/mL | 614.0 (206.0, 1038.0) | 2382.5 (607.0, 7882.7) | <0.001 | 0.7800 |

Note: Normally distributed continuous variables are displayed as mean ± standard deviation, while non-normally distributed continuous variables are represented as median (25th, 75th percentiles). P values were estimated using the Pearson chi-square test, and the Mann-Whitney U test or t-test for continuous variables.

Figure 1 The study flow chart.

3.2 Analysis of plasma differential metabolites in two groups of survivors and non-survivors during COVID-19 infection

A total of 150 plasma samples were analyzed for the presence of metabolites. To investigate the relationship between specific metabolites and patient mortality, 100 samples were selected for the construction of a predictive model. Following normalization of metabolite values, differences in metabolite expression were visualized in Figure 2A. Subsequently, the data obtained from the 100 samples underwent a differential metabolic analysis (Figure 2B), which identified 100 metabolites that were differentially expressed between survivors and non-survivors. Among these metabolites, 23 were upregulated and 77 were downregulated.
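The screening rule behind these counts can be illustrated with a minimal sketch, assuming the cut-offs reported for the volcano plot (ratio > 1.5 and P < 0.05, with the ratio taken as survival/mortality). The choice of a Mann-Whitney U test with a normal approximation is an assumption made here for illustration; the text does not state which test produced the volcano-plot P values. The function names and the sample intensities are hypothetical.

```python
import math
from statistics import NormalDist, mean


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def classify_metabolite(survivors, nonsurvivors, fc_cutoff=1.5, p_cutoff=0.05):
    """Label one metabolite 'up', 'down', or 'ns' relative to survivors."""
    log2_fc = math.log2(mean(survivors) / mean(nonsurvivors))
    p_value = mann_whitney_p(survivors, nonsurvivors)
    if p_value < p_cutoff and abs(log2_fc) > math.log2(fc_cutoff):
        return "up" if log2_fc > 0 else "down"
    return "ns"


# Hypothetical normalized intensities for a single metabolite.
surv = [10, 12, 11, 13, 12, 14, 11, 12]
dead = [4, 5, 6, 5, 4, 6, 5, 5]
print(classify_metabolite(surv, dead))
```

Applying such a rule to every annotated metabolite yields the kind of up/down tally reported above; in practice a multiple-testing correction would usually be considered as well.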
In Figure 2C, Principal Component Analysis (PCA) revealed two distinct clusters representing surviving and deceased COVID-19 patients, with a statistically significant disparity between the two groups. Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA) further demonstrated significant differences in the serum non-targeted metabolomic profiles of surviving and deceased patients in both positive and negative ion modes. The findings presented in Figure 2D indicate that the top 34 metabolites, ranked by variable importance in projection (VIP) scores, hold promise for distinguishing between survivors and non-survivors.

To investigate potential biomarker metabolites that may predict mortality in critical-type COVID-19 patients, we individually analyzed these 34 metabolites and reviewed the relevant literature. We excluded metabolites related to energy metabolism that could be influenced by the patient's daily activities, unidentified metabolites lacking relevant research, and metabolites that may be affected by medication use after hospital admission (some of which may be intermediates in drug metabolism). Consequently, two metabolites were chosen as indicators for the subsequent prediction model. The area under the receiver operating characteristic curve (AUROC) was computed to evaluate the discriminative ability of the candidate biomarker metabolites. 3-Oxalomalate exhibited AUROC values ranging from 0.926 to 0.991, while Itaconic acid showed values between 0.952 and 1. Consequently, these two metabolites were selected as the definitive discriminating markers.

Figure 2 Detected metabolites analysis of serum samples from survivors and non-survivors. (A) (B) Volcano plots show the −log10-transformed P-value against log2(survival/mortality). Gray dashed lines: cut-off values (ratio >1.5 and p-value <0.05). Orange dots: High expression of metabolites (23). Blue dots: Low expression of metabolites (77). The size of the points varies according to the absolute value of log2(survival/mortality). (C) Principal Component Analysis (PCA). The green and red points are survival and mortality. (D) Variable importance in projection (VIP). (E) and (F) show the AUC values of 3-Oxalomalate and Itaconic acid.

3.3 Machine Learning Model

Figure 3 (A) Violin plots: The median is denoted by a white dot, the interquartile range is represented by a thick grey bar, and the remaining distribution is depicted by thin grey lines.

3.3.1 Statistical Analysis

Violin plots and kernel density plots illustrate the distribution of clinical and metabolomic data upon admission for patients who died (0) and those who survived (1).

3.3.2 Development of the Predictive Model

Model development encompassed four primary stages: (I) Data Selection and Preprocessing; (II) Risk Scoring Function; (III) Model Evaluation; (IV) External Validation.

(I) Data Selection and Preprocessing

Following a thorough analysis of the dataset, clinical and metabolomic data from 150 patients were incorporated. Binary variables were coded such that "1" represented "yes" and "0" represented "no", while discrete variables were standardized. A comprehensive analysis was conducted on a dataset comprising 347 + 20 features, including metabolites and clinical indicators. Through a combination of differential screening and literature review, six indicators, consisting of two metabolites and four clinical indicators, were identified as being correlated with patient survival status. A predictive model was developed to differentiate between surviving and deceased patients. The Box-Tidwell test was employed to evaluate the linearity of the data, with a significance level of p<0.05 indicating linear independence.
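The standardization mentioned in stage (I) matters for distance-based models such as KNN, because features on large scales (for example IL-6 in pg/mL) would otherwise dominate the distance. The following is a minimal sketch, assuming z-score standardization fitted on the training rows only; the text does not specify the scaling method, and the helper names and feature values are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(rows):
    """Return (mean, std) per column; a zero std is replaced by 1.0."""
    columns = list(zip(*rows))
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def transform(rows, params):
    """Apply z-score scaling column by column using fitted parameters."""
    return [[(v - m) / s for v, (m, s) in zip(row, params)] for row in rows]


# Hypothetical training rows: [IL-6 (pg/mL), CRP (mg/L), PCT (ng/mL)]
train_rows = [
    [39.3, 33.0, 0.12],
    [313.9, 94.7, 1.10],
    [104.1, 50.9, 0.40],
    [736.8, 120.0, 7.20],
]
params = fit_scaler(train_rows)
scaled_train = transform(train_rows, params)

# New patients are scaled with the training parameters, never refitted.
scaled_new = transform([[200.0, 80.0, 0.9]], params)
```

Fitting the scaler on the training split alone keeps information from the test or external validation cohort out of the preprocessing step.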
(II) Risk Scoring Function

After an individual's data are entered into the algorithm, it evaluates the patient's two metabolite values and four clinical values. The death risk score is incremented by 1 for each abnormal indicator detected and remains unchanged for normal indicators. The final score is then used in a formula to determine the death risk as a percentage. A death probability in the range of 0 to 49% is categorized as survival, whereas a probability between 50 and 100% is classified as death. Figure 3B illustrates the relationship between the risk score and death probability, which follows an S-shaped curve. Figure 3C depicts the approximately normal distribution of risk scores for surviving and deceased patients. When the risk score exceeds 3, the probability of death exceeds 50%.

(III) Model Evaluation

We examined four traditional machine learning methods (KNN, Naive Bayes, Decision Tree, and Random Forest) to analyze the potential linear or nonlinear relationships between the predictive indicator variables and the predicted outcome, survival or non-survival, and evaluated their classification performance experimentally. The study employs the Python-based Sklearn machine learning library, using four classifiers from its naive_bayes, neighbors, tree, and ensemble modules for model training and category prediction.
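To make the evaluation concrete, the sketch below shows how a KNN classifier can produce a death probability and how sensitivity, specificity, and AUROC follow from it at the 50% threshold. It is a plain-Python illustration under stated assumptions, not the Sklearn pipeline used in the study: the value of k, the Euclidean distance, the toy feature vectors, and all function names are hypothetical, and labels follow the coding given in the Methods (0 = non-survival, 1 = survival).

```python
import math


def knn_death_probability(train_X, train_y, x, k=3):
    """Fraction of the k nearest training patients labeled 0 (non-survival)."""
    def distance(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    nearest = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], x))[:k]
    return sum(1 for _, label in nearest if label == 0) / k


def sensitivity_specificity(y_true, p_death, threshold=0.5):
    """Death (label 0) is treated as the positive class."""
    tp = fn = tn = fp = 0
    for label, p in zip(y_true, p_death):
        predicted_death = p >= threshold
        if label == 0:
            tp += predicted_death
            fn += not predicted_death
        else:
            fp += predicted_death
            tn += not predicted_death
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, p_death):
    """Probability that a non-survivor receives a higher score than a survivor."""
    pos = [p for label, p in zip(y_true, p_death) if label == 0]
    neg = [p for label, p in zip(y_true, p_death) if label == 1]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy, already-standardized two-feature data (hypothetical).
train_X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
train_y = [1, 1, 1, 1, 0, 0, 0, 0]
test_X = [[0.5, 0.5], [5.5, 5.5], [1, 2], [4, 5]]
test_y = [1, 0, 1, 0]

probs = [knn_death_probability(train_X, train_y, x) for x in test_X]
sens, spec = sensitivity_specificity(test_y, probs)
print(probs, sens, spec, auroc(test_y, probs))
```

With real data the same three metrics would be computed on held-out folds or on the external cohort rather than on a toy split like this one.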
The study also reports the performance metrics used to assess the algorithms and classification models, namely accuracy, specificity, sensitivity, and area under the receiver operating characteristic curve (AUROC), as shown in Figure 3D. The comparative experimental results for the four algorithms (Naive Bayes, KNN, Random Forest, and Decision Tree) are displayed in Table 2.

Table 2 Experimental results for the four algorithms.

| Metric | Naive Bayes | KNN | Random Forest | Decision Tree |
|---|---|---|---|---|
| Accuracy | 0.88 | 0.93 | 0.88 | 0.85 |
| Sensitivity/Recall | 0.857 | 0.926 | 0.755 | 0.755 |
| Specificity | 0.897 | 0.934 | 0.813 | 0.934 |
| AUROC | 0.889 | 0.935 | 0.88 | 0.861 |

After evaluating the performance of the four predictive models, the KNN model was selected as the final model based on its high AUROC value of 0.935 and its minimal deviation from the performance of the other models. The model's simplicity and high interpretability were also taken into consideration.

(IV) External Validation

A cohort of fifty laboratory-confirmed severe COVID-19 patients admitted to the First Affiliated Hospital of Harbin Medical University, comprising 27 survivors and 23 non-survivors, was used for external validation. The chosen predictive model was employed to predict the likelihood of mortality in these patients, and its efficacy was assessed through measures of accuracy, AUROC, sensitivity, and specificity.

4. Discussion

This study, conducted at two tertiary hospitals affiliated with Harbin Medical University, retrospectively analyzes the relationship between specific metabolites (3-Oxalomalate and Itaconic acid), lymphocyte count, interleukin-6 (IL-6), C-reactive protein (CRP), and procalcitonin (PCT) levels in the bloodstream and the mortality risk in individuals with COVID-19.
The findings suggest that these factors are important prognostic indicators for mortality in COVID-19 patients. The KNN algorithm showed notable effectiveness in forecasting outcomes from these six determinants.

Laboratory-based biomarkers offer a comprehensive and effective basis for predictive models that assess the mortality risk linked to COVID-19. This research has identified four biomarkers, namely lymphocyte count (LYM), IL-6, CRP, and PCT, as notable indicators of mortality. During the initial phases of the pandemic, Sun et al. found that older age upon hospital admission and decreased lymphocyte count were associated with increased mortality among COVID-19 patients, establishing these factors as independent risk factors for the elderly population [16]. The decline in humoral and cellular immune function with advancing age results in decreased immune system activation upon viral invasion, thereby elevating the risk of mortality in older individuals with diminished T-cell counts. The reduction in T-lymphocyte counts is particularly notable in individuals with severe COVID-19, who also show elevated levels of C-reactive protein (CRP), a protein synthesized in the liver [17]. The heightened levels of CRP following infection may be associated with the excessive production of inflammatory cytokines in patients with severe COVID-19, serving as an early prognostic marker for disease severity. In a cohort study involving 298 participants, the area under the receiver operating characteristic (ROC) curve for CRP was significantly higher than that of age, neutrophil count, and platelet count [18]. A retrospective analysis conducted by Tao Liu et al. on 69 patients with severe COVID-19 revealed a notable elevation in interleukin-6 (IL-6) levels upon admission in severe cases compared to non-severe cases.
The variations in IL-6 levels were found to be closely associated with the severity and prognosis of severe COVID-19, showing a strong positive correlation with CRP levels (r = 0.781, P < 0.001). Although there were slight variations in IL-2, IL-4, IL-10, tumor necrosis factor-alpha (TNF-α), and interferon-gamma (IFN-γ) levels before and after treatment, these levels remained within the normal range [19]. In contrast, the initial increase in procalcitonin (PCT) levels is more significant in bacterial infections than in viral infections or non-infectious systemic inflammatory response syndrome (SIRS) [20]. However, a comprehensive clinical study conducted by Zhang et al. involving 140 hospitalized COVID-19 patients demonstrated that individuals with severe disease presentations displayed notably higher levels of C-reactive protein and procalcitonin than those with less severe manifestations (P < 0.001) [21]. Additionally, a separate retrospective examination of 452 severe cases of COVID-19 provided further evidence that procalcitonin serves as a significant independent predictor of mortality [22].

The laboratory assays mentioned above have been validated in previous studies for their ability to assess disease severity and mortality risk in COVID-19 patients. This study goes beyond routine clinical testing by applying non-targeted metabolomic analysis to blood samples from ICU patients, leading to the identification of two metabolites that improve predictive accuracy for mortality risk. Itaconic acid was initially investigated extensively in the basic medical sciences, with a growing emphasis on its potent antibacterial properties and on its derivative 4-octyl itaconate (4-OI) [23–27], which has demonstrated the ability to inhibit the replication of SARS-CoV-2.
Through activation of Nrf2, an antioxidant and anti-inflammatory transcription factor, 4-OI induces an antiviral response capable of suppressing the replication of SARS-CoV-2 and other pathogenic viruses, such as herpes simplex virus 1 (HSV-1) and vaccinia virus[ 28 ]. The non-targeted metabolomic analysis performed in this study has determined that Itaconic acid demonstrates an AUC value of 0.981 in predicting COVID-19 mortality risk, further supporting previous research findings. Combining this metabolite with 3-Oxalomalate (AUC: 0.963) and the four established laboratory parameters significantly enhances the accuracy of predictive modeling. This study recognizes specific inherent constraints. The patient sample is limited to individuals from northern China, restricting the diversity of the dataset and the amount of data collected due to geographical limitations. Subsequent versions of this model could potentially improve through the inclusion of a broader dataset, leading to enhanced refinement and development. Furthermore, the non-targeted metabolomic results, which are relative, do not yield an absolute quantitative value. Although this method has demonstrated improved predictive accuracy using available clinical data, its implementation in real-world clinical environments presents significant challenges. During the COVID-19 pandemic, critical care medicine is essential for preserving human life. Utilizing precise predictive methods allows for the accurate assessment of patient mortality risk, aiding in the allocation of necessary resources. Ideally, this precision enables the implementation of personalized treatment plans for high-risk patients, potentially leading to significant improvements in patient outcomes. 
In clinical practice, some patients at high mortality risk may benefit more from palliative care and family support during the terminal phase of life, which is a further notable application of the predictive model.

5. Conclusion

In conclusion, the mortality risk prediction model for severe COVID-19 patients established in this study, based on two metabolites and four laboratory-derived clinical indicators, offers a precise prognosis for the likelihood of death. This model marks a notable advance over prior models that relied exclusively on laboratory testing. Applying the KNN algorithm in this context can help healthcare professionals formulate tailored treatment plans, ultimately enhancing the effectiveness of interventions and reducing mortality and other unfavorable outcomes.

Declarations

Ethics approval and consent to participate
The study protocol was approved by the Ethics Committee of the Second Affiliated Hospital of Harbin Medical University (approval number: KY2023-023), and signed informed consent forms were collected from all study subjects or their families. All studies were conducted in accordance with the Declaration of Helsinki as revised in 2013.

Consent for publication
Written informed consent was obtained from the patient for publication. A copy of the written consent is available for review by the Editor-in-Chief of this journal.

Conflict-of-interest statement
All authors declare that they have no conflicts of interest to disclose.

Availability of data and materials
Data sharing does not apply to this article as no datasets were generated or analyzed during the current study.

Funding
The Outstanding Youth Project of Heilongjiang Natural Science Foundation (No. JQ2021H002); The National Key Research and Development Program of China (No. 2021YFC2501800); Key R&D Plan Project in Heilongjiang Province (No.
GY2023ZB0075); Harbin Medical University Foundation Youth Project (No. PYQN2023-9).

Authors' contributions
CSP contributed to the manuscript in conceptualization, methodology, data collection, formal data analysis, drafting, visualization, and writing the manuscript, as well as critical revision and statistical analysis. This author takes responsibility for the content of this manuscript, including the data and analysis. HQY and ZR contributed to the manuscript in conceptualization, data collection, drafting, and writing the manuscript, as well as comprehensive critical revision. LY contributed to the manuscript in data collection, formal data analysis, drafting, and writing the manuscript, as well as critical revision and statistical analysis. LM contributed to the manuscript in conceptualization, methodology, critical revision, and statistical analysis. LWH contributed to the manuscript in data collection, formal data analysis, and writing the manuscript, as well as critical revision and statistical analysis. ZJB contributed to the manuscript in data collection, formal data analysis, drafting, and writing the manuscript, as well as critical revision. WHL contributed to the manuscript in formal data analysis and writing the manuscript, as well as critical revision. All authors have read and agreed to the published version of the manuscript.

Acknowledgments
We are grateful to all those who offered selfless advice, help, and support to our study.

References
1. COVID-19 cases | WHO COVID-19 dashboard. 2024.
2. Chen Y, Klein SL, Garibaldi BT, Li H, Wu C, Osevala NM, Li T, Margolick JB, Pawelec G, Leng SX. Aging in COVID-19: Vulnerability, immunity and intervention. Ageing Res Rev. 2021;65:101205.
3. Gao HB, Zhang J. [Analysis of prognostic factors in patients with COVID-19 infection]. Zhonghua Jie He He Hu Xi Za Zhi (Chinese Journal of Tuberculosis and Respiratory Diseases). 2024;47(3):296–300.
4. López-López Á, López-Gonzálvez Á, Barker-Tejeda TC, Barbas C. A review of validated biomarkers obtained through metabolomics. Expert Rev Mol Diagn. 2018;18(6):557–75.
5. Gonzalez-Covarrubias V, Martínez-Martínez E, Del Bosque-Plata L. The Potential of Metabolomics in Biomedical Applications. Metabolites. 2022;12(2).
6. Maeda R, Seki N, Uwamino Y, Wakui M, Nakagama Y, Kido Y, Sasai M, Taira S, Toriu N, Yamamoto M, et al. Amino acid catabolite markers for early prognostication of pneumonia in patients with COVID-19. Nat Commun. 2023;14(1):8469.
7. Chatelaine HAS, Chen Y, Braisted J, Chu SH, Chen Q, Stav M, Begum S, Diray-Arce J, Sanjak J, Huang M, et al. Nucleotide, Phospholipid, and Kynurenine Metabolites Are Robustly Associated with COVID-19 Severity and Time of Plasma Sample Collection in a Prospective Cohort Study. Int J Mol Sci. 2023;25(1).
8. Shi D, Yan R, Lv L, Jiang H, Lu Y, Sheng J, Xie J, Wu W, Xia J, Xu K, et al. The serum metabolome of COVID-19 patients is distinctive and predictive. Metab Clin Exp. 2021;118:154739.
9. Roberts I, Wright Muelas M, Taylor JM, Davison AS, Xu Y, Grixti JM, Gotts N, Sorokin A, Goodacre R, Kell DB. Untargeted metabolomics of COVID-19 patient serum reveals potential prognostic markers of both severity and outcome. Metabolomics. 2021;18(1):6.
10. Julkunen H, Cichońska A, Slagboom PE, Würtz P. Metabolic biomarker profiling for identification of susceptibility to severe pneumonia and COVID-19 in the general population. eLife. 2021;10.
11. Sindelar M, Stancliffe E, Schwaiger-Haber M, Anbukumar DS, Adkins-Travis K, Goss CW, O'Halloran JA, Mudd PA, Liu W, Albrecht RA, et al. Longitudinal metabolomics of human plasma reveals prognostic markers of COVID-19 disease severity. Cell Rep Med. 2021;2(8):100369.
12. Fang W, Zhu Y, Yang S, Tong X, Ye C. Reciprocal regulation of phosphatidylcholine synthesis and H3K36 methylation programs metabolic adaptation. Cell Rep. 2022;39(2):110672.
13. Smith CA, Want EJ, O'Maille G, Abagyan R, Siuzdak G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Anal Chem. 2006;78(3):779–87.
14. Zhou Z, Luo M, Zhang H, Yin Y, Cai Y, Zhu ZJ. Metabolite annotation from knowns to unknowns through knowledge-guided multi-layer metabolic networking. Nat Commun. 2022;13(1):6656.
15. Pang Z, Zhou G, Ewald J, Chang L, Hacariz O, Basu N, Xia J. Using MetaboAnalyst 5.0 for LC-HRMS spectra processing, multi-omics integration and covariate adjustment of global metabolomics data. Nat Protoc. 2022;17(8):1735–61.
16. Sun H, Ning R, Tao Y, Yu C, Deng X, Zhao C, Meng S, Tang F, Xu D. Risk Factors for Mortality in 244 Older Adults With COVID-19 in Wuhan, China: A Retrospective Study. J Am Geriatr Soc. 2020;68(6):E19–23.
17. Qin C, Zhou L, Hu Z, Zhang S, Yang S, Tao Y, Xie C, Ma K, Shang K, Wang W, et al. Dysregulation of Immune Response in Patients With Coronavirus 2019 (COVID-19) in Wuhan, China. Clin Infect Dis. 2020;71(15):762–8.
18. Luo X, Zhou W, Yan X, Guo T, Wang B, Xia H, Ye L, Xiong J, Jiang Z, Liu Y, et al. Prognostic Value of C-Reactive Protein in Patients With Coronavirus 2019. Clin Infect Dis. 2020;71(16):2174–9.
19. Liu T, Zhang J, Yang Y, Ma H, Li Z, Zhang J, Cheng J, Zhang X, Zhao Y, Xia Z, et al. The role of interleukin-6 in monitoring severe case of coronavirus disease 2019. EMBO Mol Med. 2020;12(7):e12421.
20. Paudel R, Dogra P, Montgomery-Yates AA, Coz YA. Procalcitonin: A promising tool or just another overhyped test? Int J Med Sci. 2020;17(3):332–7.
21. Zhang JJ, Dong X, Cao YY, Yuan YD, Yang YB, Yan YQ, Akdis CA, Gao YD. Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan, China. Allergy. 2020;75(7):1730–41.
22. Shang Y, Liu T, Wei Y, Li J, Shao L, Liu M, Zhang Y, Zhao Z, Xu H, Peng Z, et al. Scoring systems for predicting mortality for severe patients with COVID-19. EClinicalMedicine. 2020;24:100426.
23. Mills EL, Ryan DG, Prag HA, Dikovskaya D, Menon D, Zaslona Z, Jedrychowski MP, Costa A, Higgins M, Hams E, et al. Itaconate is an anti-inflammatory metabolite that activates Nrf2 via alkylation of KEAP1. Nature. 2018;556(7699):113–117.
24. Michelucci A, Cordes T, Ghelfi J, Pailot A, Reiling N, Goldmann O, Binz T, Wegner A, Tallam A, Rausell A, et al. Immune-responsive gene 1 protein links metabolism to immunity by catalyzing itaconic acid production. Proc Natl Acad Sci USA. 2013;110(19):7820–5.
25. O'Neill LAJ, Artyomov MN. Itaconate: the poster child of metabolic reprogramming in macrophage function. Nat Rev Immunol. 2019;19(5):273–81.
26. Chen M, Sun H, Boot M, Shao L, Chang S, Wang W, Lam TT, Lara-Tejero M, Rego EH, Galán JE. Itaconate is an effector of a Rab GTPase cell-autonomous host defense pathway against Salmonella. Science. 2020;369(6502):450–5.
27. Swain A, Bambouskova M, Kim H, Andhey PS, Duncan D, Auclair K, Chubukov V, Simons DM, Roddy TP, Stewart KM, et al. Comparative evaluation of itaconate and its derivatives reveals divergent inflammasome and type I interferon regulation in macrophages. Nat Metab. 2020;2(7):594–602.
28. Olagnier D, Farahani E, Thyrsted J, Blay-Cadanet J, Herengt A, Idorn M, Hait A, Hernaez B, Knudsen A, Iversen MB, et al. SARS-CoV2-mediated suppression of NRF2-signaling reveals potent antiviral and anti-inflammatory activity of 4-octyl-itaconate and dimethyl fumarate. Nat Commun. 2020;11(1):4938.

Additional Declarations
No competing interests reported.
Supplementary Files
supplementarymaterials.docx

Editorial decision: Revision requested 25 Oct, 2024
Reviews received at journal 25 Oct, 2024
Reviewers agreed at journal 25 Oct, 2024
Reviews received at journal 23 Sep, 2024
Reviews received at journal 21 Sep, 2024
Editor invited by journal 12 Sep, 2024
Reviewers agreed at journal 11 Sep, 2024
Reviewers agreed at journal 10 Sep, 2024
Reviews received at journal 10 Sep, 2024
Reviewers agreed at journal 10 Sep, 2024
Reviewers invited by journal 08 Sep, 2024
Editor assigned by journal 28 May, 2024
Submission checks completed at journal 17 May, 2024
First submitted to journal 14 May, 2024
{"props":{"pageProps":{"initialData":{"identity":"rs-4418889","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":307614818,"identity":"8959e8bf-49b3-4030-97ce-6fd709638a8d","order_by":0,"name":"Shanpeng Cui","email":"","orcid":"","institution":"Second Affiliated Hospital of Harbin Medical University","correspondingAuthor":false,"prefix":"","firstName":"Shanpeng","middleName":"","lastName":"Cui","suffix":""},{"id":307614819,"identity":"3488a39e-76d7-4cc2-b7f3-0d9b37b02b22","order_by":1,"name":"Qiuyuan Han","email":"","orcid":"","institution":"Second Affiliated Hospital of Harbin Medical University","correspondingAuthor":false,"prefix":"","firstName":"Qiuyuan","middleName":"","lastName":"Han","suffix":""},{"id":307614825,"identity":"c82cb518-0fe0-4fde-900d-62a074b7c439","order_by":2,"name":"Ran Zhang","email":"","orcid":"","institution":"Harbin University of Science and Technology","correspondingAuthor":false,"prefix":"","firstName":"Ran","middleName":"","lastName":"Zhang","suffix":""},{"id":307614826,"identity":"300a8c8f-0388-4c60-865b-d23a7e2b5133","order_by":3,"name":"Yue Li","email":"","orcid":"","institution":"Second Affiliated Hospital of Harbin Medical University","correspondingAuthor":false,"prefix":"","firstName":"Yue","middleName":"","lastName":"Li","suffix":""},{"id":307614829,"identity":"d4bc51a7-75b9-4676-8868-ee94c599048d","order_by":4,"name":"Ming Li","email":"","orcid":"","institution":"Second Affiliated Hospital of Harbin Medical
University","correspondingAuthor":false,"prefix":"","firstName":"Ming","middleName":"","lastName":"Li","suffix":""},{"id":307614830,"identity":"bd59e167-5fa5-4a14-9687-ca40ab5a6b12","order_by":5,"name":"Wenhua Liu","email":"","orcid":"","institution":"Second Affiliated Hospital of Harbin Medical University","correspondingAuthor":false,"prefix":"","firstName":"Wenhua","middleName":"","lastName":"Liu","suffix":""},{"id":307614831,"identity":"5ed8f35e-a8f6-46b6-9880-2a308c061f1c","order_by":6,"name":"Junbo Zheng","email":"","orcid":"","institution":"Second Affiliated Hospital of Harbin Medical University","correspondingAuthor":false,"prefix":"","firstName":"Junbo","middleName":"","lastName":"Zheng","suffix":""},{"id":307614832,"identity":"9847028f-8b9e-4b38-bd64-66698c2cde26","order_by":7,"name":"Hongliang Wang","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABCElEQVRIiWNgGAWjYBACNiidAMSMB0AkPzPz4QfEamEAa5FsZ0szIMY2iBYQYXCeR0ECn1I+9jNmEj931ObxS7dfOPCg5o7d5sM8DAYMNTbROB3Gk2Mm2XvmeLHknDMFBxKOPUvedpj3wAOGY2m5Dbi0SPCY3eBtO5a44UZOwoEEtsPJZof5EgwYGw7j1XLzL1zLv8PJxs08BhKEtNzmbasBakk/cCCx7bCdATMhLTxp5b9l2w4kzpyRw3Agse9wgsRhYCAn4PGLfPvhzYZv2+oS+yXSHz788e2wPX//4cMPPtTY4NQCBYeBmAccgYlglQn4lYNAHRCzPwCx7AkrHgWjYBSMgpEGAFFgZB6UsbuJAAAAAElFTkSuQmCC","orcid":"","institution":"Second Affiliated Hospital of Harbin Medical University","correspondingAuthor":true,"prefix":"","firstName":"Hongliang","middleName":"","lastName":"Wang","suffix":""}],"badges":[],"createdAt":"2024-05-14 
11:40:06","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4418889/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4418889/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12879-024-10402-3","type":"published","date":"2025-01-02T15:57:08+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":57708140,"identity":"d8c67125-7b28-4052-bd1c-f91b2454350c","added_by":"auto","created_at":"2024-06-04 15:15:47","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":533298,"visible":true,"origin":"","legend":"\u003cp\u003eThe study flow chart.\u003c/p\u003e","description":"","filename":"fig1.png","url":"https://assets-eu.researchsquare.com/files/rs-4418889/v1/a430d7f59b647741fc9f5852.png"},{"id":57708138,"identity":"c74028d6-8079-48b4-9fa4-c83b25ef3245","added_by":"auto","created_at":"2024-06-04 15:15:47","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":4893344,"visible":true,"origin":"","legend":"\u003cp\u003eDetected metabolites analysis of serum samples from survivors and non-survivors. (A) (B)Volcano plots show the change in transformed P-value (−log10) against the log2 (survival/mortality). Gray dashed lines: cut-off values (ratio \u0026gt;1.5 and p-value \u0026lt;0.05). Orange dots: High expression of metabolites (23). Blue dots: Low expression of metabolites (77). The size of the points varies according to the absolute value of [Log2(survival/mortality)]. (C) Principal Component Analysis (PCA). The green and red points are survival and mortality. (D) Variable important in projection (VIP). 
(E) and (F) show the AUC values of 3-Oxalomalate and Itaconic acid.\u003c/p\u003e","description":"","filename":"fig2.png","url":"https://assets-eu.researchsquare.com/files/rs-4418889/v1/8b2fc8797e0bd793daccc397.png"},{"id":57708916,"identity":"1ff2fe7c-f04a-408a-b3d8-fb4ac5089239","added_by":"auto","created_at":"2024-06-04 15:23:47","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":510203,"visible":true,"origin":"","legend":"\u003cp\u003e(A) Violin plots: The median is denoted by a white dot, the interquartile range is represented by a thick grey bar, and the remaining distribution is depicted by thin grey lines.\u003c/p\u003e","description":"","filename":"fig3.png","url":"https://assets-eu.researchsquare.com/files/rs-4418889/v1/35508b581077102d73da764a.png"},{"id":73093237,"identity":"f8c8cfe0-18d2-4b6b-bebf-5afd790872cf","added_by":"auto","created_at":"2025-01-06 16:11:36","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":7568133,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4418889/v1/85f1e55e-96ec-4482-826b-f3280a672416.pdf"},{"id":57708136,"identity":"339cd218-ba97-4f1a-ab1b-f2b331742c60","added_by":"auto","created_at":"2024-06-04 15:15:47","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":21608,"visible":true,"origin":"","legend":"","description":"","filename":"supplementarymaterials.docx","url":"https://assets-eu.researchsquare.com/files/rs-4418889/v1/acd97b9e4907f2ad39a62171.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Integration of metabolomics methodologies for the development of predictive models for mortality risk in patients with severe COVID-19.","fulltext":[{"header":"1. 
Introduction","content":"\u003cp\u003eSince the onset of the global COVID-19 pandemic in December 2019, the SARS-CoV-2 virus has persistently disseminated worldwide, resulting in more than 770 million infections and 7 million fatalities as of the present time[1]. In response to this ongoing crisis, the Chinese government and health authorities have conducted comprehensive analyses and investigations into the emergence and progression of the novel coronavirus. The \u0026quot;Diagnosis and Treatment Protocol for Novel Coronavirus Infection (COVID-19)\u0026quot; in China categorizes SARS-CoV-2 infection into mild, moderate, severe, and critical types based on the diverse range of symptoms and severity levels. Within the realm of critical care medicine, the elderly demographic exhibits significantly elevated rates of severe illness, mortality, and adverse outcomes in comparison to other age cohorts. While the elderly population in China exhibits a relatively high vaccination rate, the decline in protective antibodies over time poses a potential risk of severe illness or mortality from future COVID-19 infections\u0026nbsp;[2]. Thus, there is a critical need to prioritize the diagnosis and prevention of critical illness to enhance the elderly population\u0026apos;s ability to manage COVID-19 infections and mitigate the associated mortality risk.\u003c/p\u003e\n\u003cp\u003eThe current clinical prognosis assessment for COVID-19 patients primarily depends on clinical manifestations, hematological parameters, and imaging data[3]. Nevertheless, these conventional approaches may not consistently forecast disease progression, particularly in the critical care phase characterized by rapid changes in conditions. Therefore, the identification of more precise and sensitive biomarkers to anticipate mortality risk in critical-type patients is imperative. 
Metabolites, which play a direct role in numerous biochemical reactions within the body, have the capacity to function as biomarkers that can indicate an individual\u0026apos;s health status and the progression of pathological processes. In recent years, metabolomics has been increasingly utilized to diagnose, predict outcomes, and monitor treatment responses in a variety of medical conditions[4, 5]. Following the onset of the COVID-19 pandemic, a multitude of studies employing metabolomics to evaluate the severity of COVID-19 infection have surfaced, demonstrating its significant potential as a tool for supporting clinical decision-making\u0026nbsp;[6-11]. Liquid chromatography-mass spectrometry (LC-MS) metabolomics analysis was conducted on serum samples from the patients, leading to the identification of two predictive biomarkers for mortality risk in COVID-19. Subsequently, a predictive model for mortality risk in the disease was developed by integrating the patients\u0026apos; laboratory tests and metabolomics findings.\u003c/p\u003e\n\u003cp\u003eThe objective of this study is to develop a predictive model for mortality risk in critically ill COVID-19 patients by integrating metabolomics techniques. Through the analysis of metabolite spectrum data and clinical laboratory tests, we aim to identify key features associated with mortality risk and utilize this knowledge to create a robust predictive model.\u003c/p\u003e"},{"header":"2. Methods","content":"\u003cp\u003eThis prospective, retrospective cohort study was carried out in the intensive care medicine departments of two provincial tertiary care hospitals located in Heilongjiang Province, China. The study included patients who were admitted between January 3, 2023, and March 30, 2023. 
All patients included in the study satisfied the diagnostic criteria outlined as follows: positive reverse transcriptase polymerase chain reaction (RT-PCR) of SARS-CoV-2 ribonucleic acid (RNA) confirming the diagnosis of COVID-19 infection; and meeting the diagnostic criteria of critical type in the 10th edition of the Diagnostic and Treatment Program for Novel Coronavirus Infections ( Critical type is defined as one of the following:\u0026nbsp;\u003cstrong\u003e①\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003erespiratory failure and the need for mechanical ventilation;\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003e②\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003ethe presence of shock;\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003e③\u003c/strong\u003e\u003cstrong\u003e\u0026nbsp;\u003c/strong\u003ecombined with other organ failure requiring ICU supervision); age 65 years or older and length of stay in the ICU of at least seven days. A total of 181 critically ill patients diagnosed with COVID-19 were initially included in the study, with 31 patients subsequently excluded due to incomplete clinical data, ICU stays of fewer than seven days, or abandonment of treatment during hospitalization, thus impacting the accuracy of clinical outcome determination. Ultimately, the analysis included a total of 150 patients from two hospitals: 100 from the Second Hospital of Harbin Medical University and 50 from the First Hospital of Harbin Medical University. 
Metabolomic and clinical data were collected from a total of 150 patients, with 100 patients from the Second Hospital of Harbin Medical University utilized for the development of the machine learning model, and the remaining 50 patients from the First Hospital of Harbin Medical University employed for external validation of the model.\u003c/p\u003e\n\u003cp\u003eThe study protocol received approval from the Institutional Ethics Committee of the Second Hospital of Harbin Medical University, as well as from the partner hospital and its respective Institutional Ethics Committee. Given the unique circumstances of critically ill patients, obtaining informed consent directly from them was not feasible. Therefore, prior to collecting blood samples, the medical staff informed the patients\u0026apos; families about the purpose of the sampling and potential risks and obtained consent through the signing of an informed consent form.\u003c/p\u003e\n\u003cp\u003e2.2 Data and sample collection\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eDemographic information, clinical characteristics, initial symptoms, chest CT results, nucleic acid test results, and various clinical data were gathered from the electronic medical record by experienced clinicians. The documented results were independently reviewed by two clinicians. Within 24 hours of the patient\u0026apos;s admission, 1.5 ml of arterial blood was collected for metabolomics testing. In relation to laboratory findings, metabolomics data was obtained from blood samples along with various metrics indicative of inflammatory status, liver and kidney function, blood glucose levels, lipid profile, cardiac function, and coagulation. Prior to data collection, personally identifiable information such as names and identification numbers was anonymized, and a study ID was electronically assigned to each participant to safeguard patient confidentiality. A total of 150 samples were included in the study. 
Following sample collection, the whole blood samples were centrifuged by the hospital laboratory staff at 3000 rpm for 15 minutes. Subsequently, 200\u0026nbsp;\u0026mu;L of the upper plasma layer was pipetted into EP tubes, which were then numbered and stored at -80\u0026deg;C for future metabolomics testing. Prior to being received by the aforementioned personnel, all samples were appropriately anonymized.\u003c/p\u003e\n\u003cp\u003e2.3 Metabolite extraction\u003c/p\u003e\n\u003cp\u003e300\u0026mu;L of methanol (Thermo Fisher) was added to 100\u0026mu;L of serum for the untargeted metabolomics study. 5\u0026mu;L of 4-CL-phenylalanine (Sigma, 500 \u0026micro;g/mL ) was added as an internal standard and vortexed for 2 minutes. The mixture was then incubated on ice for 30 minutes to precipitate proteins. Subsequently, samples were centrifuged at 15,000 g for 10 minutes at 4℃. Following centrifugation, 300 \u0026mu;L of the supernatant was transferred to another clean tube and re-centrifuged again at 15,000 g for 5 minutes at 4\u0026deg;C. Next, 250 \u0026mu;L of the supernatant was transferred and separated into two aliquots. Finally, metabolite extracts were dried by a vacuum concentrator system (Labconco) and stored at -80℃ until analysis. Dried metabolite extracts were reconstituted with 120\u0026mu;L of 60% acetonitrile for injection based on the previous method[12]\u003c/p\u003e\n\u003cp\u003e2.4 Untargeted metabolite profiling\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAn equal volume of each resuspended sample was mixed to create a quality control (QC) sample. The QC sample was injected 6 times before analyzing the sample cohort and inserted after every 10 samples. 
The samples\u0026apos; accuracy, precision, and linearity in the untargeted metabolomics profiling were assessed and recalibrated before metabolite annotation.\u003c/p\u003e\n\u003cp\u003eSerum metabolites were analyzed using the X500B qTOF (AB sciex) coupled with a high-performance LC system (Exion LC AD system). Data were acquired within the mass/charge ratio (m/z) range of 60 to 1200 Da in the TOF MS scan, with collision energy 10V (CE) and from 50 to 1000 Da in the MS/MS scan, with collision energy 35 \u0026plusmn; 15 V in both positive and negative ion modes. The electrospray source conditions were set as follows: curtain gas, 35 psi; temperature, 550\u0026deg;C; CAD gas, seven psi; spray voltage, 5.5 kV (positive) and 4.5 kV (negative). Metabolites were chromatographically separated on a UPLC HSS T3 column (ACQUITY 1.8 \u0026micro;m 150 \u0026times; 2.1 mm, Waters) with the oven temperature maintained at 40\u0026deg;C. A 5-\u0026mu;L sample was injected at a 0.3 mL/min flow rate in mobile phase A (0.1% formic acid, Thermo Fisher) and mobile phase B (acetonitrile, Thermo Fisher). The following gradient was employed: 0.01 min 1% B, 1.5 min 1% B, 2 min 5% B, 3 min 70% B, 11 min 99% B, 15 min 99% B, 15.1 min 1% B. The raw data (.wiff) were converted into processed data (.mzXML and .mgf) using Proteo Wizard 3.0 software for further analysis.\u003c/p\u003e\n\u003cp\u003e2.5 Metabolomics data processing and annotation\u003c/p\u003e\n\u003cp\u003eThe MS1 peak table was processed using R package called xcms (version 1.46.0) with the following key parameters: method\u0026thinsp;=\u0026thinsp;\u0026ldquo;centWave\u0026rdquo;; ppm\u0026thinsp;=\u0026thinsp;15; snthr\u0026thinsp;=\u0026thinsp;10; peakwidth\u0026thinsp;=\u0026thinsp;c(5, 40); minifrac\u0026thinsp;=\u0026thinsp;0.5. The generated MS1 peak table includes the mass-to-charge ratio (m/z), retention time (RT), peak abundances, and other relevant information[13]. 
Metabolite annotation was performed with the modified peak table by MetDNA2, a web-based platform for untargeted annotation. Subsequently, the annotated metabolite data were analyzed through Metaboanalyst 5.0 for further analyses like PCA and ROC analysis[14, 15].\u003c/p\u003e\n\u003cp\u003e2.6 Statistical analysis and modeling methods\u003c/p\u003e\n\u003cp\u003eClinical and metabolite data were selected based on statistical analysis of characteristic differences. The parameters were subsequently categorized as binary by machine learning models for events (non-survival-0) and non-events (survival-1). In creating the model, the population was randomly divided into a training set (50%) and a test set (50%). The six metric parameters were then passed through four models using Python (3.7) to maximize the AUC (area under the curve) for K-fold cross-validation to determine the performance of the KNN, Random Forest, Plain Bayes, and Decision Tree algorithms.\u003c/p\u003e\n\u003cp\u003eThe K-Nearest Neighbor (KNN) algorithm is utilized in data mining and analysis to identify samples within the same category that are proximate in feature space. This method involves determining the k nearest neighbors for each data point in the sample, with the premise that if these neighbors all belong to the same category, the sample is classified accordingly, sharing similar characteristic attributes. The KNN algorithm relies exclusively on the proximity of the k nearest neighbors to make decisions, rather than utilizing a pre-defined decision function. This approach often yields improved classification and regression outcomes, particularly for datasets containing discriminative samples that exhibit significant overlap or intersection.\u003c/p\u003e\n\u003cp\u003eThe Naive Bayes classification model is based on probabilistic statistics, specifically utilizing Bayes\u0026apos; theorem and the assumption of conditional independence of features for classification. 
The algorithm initially calculates the probability of each feature variable belonging to a specific category, and then, assuming independence, computes the probability of the feature variables belonging to a particular category or the highest probability of any category. Naive Bayes is known for its fast training and prediction speeds, making it well-suited for real-time classification tasks.\u003c/p\u003e\n\u003cp\u003eThe Decision Tree classification model is a commonly used and easily interpretable machine learning technique that creates a hierarchical structure by iteratively splitting data features to classify data. Beginning at the root node, the model selects the best feature value to partition the data until a leaf node is reached, resulting in a final classification. This model is characterized by its ease of comprehension and explication, rendering it well-suited for the analysis of extensive datasets. Following the training process, the decision tree produces a discriminative model in the form of a tree structure, which determines the correct classification outcome based on the decisions made at each node.\u003c/p\u003e\n\u003cp\u003eThe Random Forest algorithm is a supervised learning ensemble method that consists of multiple decision trees. This Bagging-type algorithm combines weak classifiers through voting or averaging to improve the accuracy and generalization performance of the model. The success of Random Forest can be attributed to its incorporation of randomness and ensemble techniques, which help prevent overfitting and enhance precision.\u003c/p\u003e"},{"header":"3. Results","content":"\u003cp\u003e3.1 Demographics and baseline characteristics of survivors and non-survivors\u003c/p\u003e\n\u003cp\u003eFollowing the application of predetermined criteria for inclusion, a cohort of 150 patients with critical COVID-19 illness was identified, comprising 46% survivors (n=69) and 54% non-survivors (n=81). 
Subsequently, clinical data from a subset of 100 patients were used for feature screening for the model. Lymphocyte count, interleukin-6 (IL-6), C-reactive protein (CRP), and procalcitonin (PCT) differed significantly between deceased patients and survivors (all p-values below 0.01) and showed high area under the curve (AUC) values. Table 1 lists the values for statistically significant laboratory results, and complete clinical data are available in the Supplementary Information. Figure 1 shows the study flow chart.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 1 Overview of the patient\u0026rsquo;s laboratory results.\u003c/strong\u003e\u003c/p\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"564\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.01769911504425%\"\u003e\n \u003cp\u003eCharacteristics\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.008849557522122%\"\u003e\n \u003cp\u003eSurvivors\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.01769911504425%\"\u003e\n \u003cp\u003eNon-survivors\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.743362831858407%\"\u003e\n \u003cp\u003e\u003cem\u003eP\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.035398230088495%\"\u003e\n \u003cp\u003e\u003cem\u003eAUC\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"0.17699115044247787%\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.01769911504425%\"\u003e\n \u003cp\u003eNo.\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.008849557522122%\"\u003e\n \u003cp\u003e54\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.01769911504425%\"\u003e\n \u003cp\u003e46\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.743362831858407%\"\u003e\n 
\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.035398230088495%\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"0.17699115044247787%\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"100%\" colspan=\"6\"\u003e\n \u003cp\u003e\u003cstrong\u003eBlood routine examination\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.01769911504425%\"\u003e\n \u003cp\u003eWBC count, \u0026times;10\u003csup\u003e9\u003c/sup\u003e/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.008849557522122%\"\u003e\n \u003cp\u003e8.3（6.3,10.1）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.01769911504425%\"\u003e\n \u003cp\u003e11.4（7.7,13.3）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.743362831858407%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.035398230088495%\"\u003e\n \u003cp\u003e0.6952\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"0.17699115044247787%\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.01769911504425%\"\u003e\n \u003cp\u003eLymphocyte cell count,\u0026times;10\u003csup\u003e9\u003c/sup\u003e/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.008849557522122%\"\u003e\n \u003cp\u003e17.1\u0026plusmn;8.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.01769911504425%\"\u003e\n \u003cp\u003e7.1\u0026plusmn;4.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.743362831858407%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.035398230088495%\"\u003e\n \u003cp\u003e0.8482\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"0.17699115044247787%\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.01769911504425%\"\u003e\n 
\u003cp\u003eNeutrophil cell count, \u0026times;10\u003csup\u003e9\u003c/sup\u003e/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.008849557522122%\"\u003e\n \u003cp\u003e75.2\u0026plusmn;11.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.01769911504425%\"\u003e\n \u003cp\u003e87.2\u0026plusmn;15.8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.743362831858407%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.035398230088495%\"\u003e\n \u003cp\u003e0.8456\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"0.17699115044247787%\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.01769911504425%\"\u003e\n \u003cp\u003eRBC count, \u0026times;10\u003csup\u003e12\u003c/sup\u003e/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.008849557522122%\"\u003e\n \u003cp\u003e3.8（3.4,4.2）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.01769911504425%\"\u003e\n \u003cp\u003e2.9（2.3,3.4）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.743362831858407%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.035398230088495%\"\u003e\n \u003cp\u003e0.8360\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"0.17699115044247787%\"\u003e\n \u003cp\u003e\u0026nbsp;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"564\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003ePlatelet count, \u0026times;10\u003csup\u003e9\u003c/sup\u003e/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e251.2\u0026plusmn;106.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n 
\u003cp\u003e164.3\u0026plusmn;90.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.7460\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003eHemoglobin, g/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e118.7\u0026plusmn;17.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003e93.0\u0026plusmn;27.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.8329\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003eIL-6, pg/mL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e39.3（29.9,44.9）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003e313.9（104.1,736.8）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.8698\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003ePCT, ng/mL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e0.12（0,0.3）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003e1.1（0.4,7.2）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.8537\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n 
\u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003eCRP, mg/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e33.0\u0026plusmn;25.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003e94.7\u0026plusmn;50.9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.8585\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003ePT, s\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e11.5（10.8,12.3）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003e13.0（11.9,15.7）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.7901\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003eAPTT, s\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e31.7\u0026plusmn;4.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003e48.6\u0026plusmn;35.9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.7333\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003eD-dimer, mg/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e537.5（210.2,1544.2）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n 
\u003cp\u003e1362.9（673.0,3478.7）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.7331\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003eINR\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e1.0（1.0,1.2）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003e1.3（1.1,1.4）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.8406\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003eTnI, ng/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e0.000（0.000,0.006）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003e0.043（0.015,0.114）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.8217\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003eAST, U/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e18.0（14.7,27.2）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003e34.0（22.0,57.2）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.7703\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
width=\"26.06382978723404%\"\u003e\n \u003cp\u003eCr, \u0026mu;mol/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e67.5（57.7,77.5）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003e119.0（77.7,173.0）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.7987\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003eBUN, mmol/L\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e7.79（5.58,10.67）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003e11.94（10.51,14.05）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.8422\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003eNT-proBNP, pg/mL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"23.04964539007092%\"\u003e\n \u003cp\u003e614.0（206.0,1038.0）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"26.06382978723404%\"\u003e\n \u003cp\u003e2382.5（607.0,7882.7）\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.76595744680851%\"\u003e\n \u003cp\u003e＜0.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"12.056737588652481%\"\u003e\n \u003cp\u003e0.7800\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eNote: Continuous variables with a normal distribution are shown as mean \u0026plusmn; standard deviation, while continuous variables with a non-normal distribution are shown as median (25th, 75th percentiles)
.\u003c/p\u003e\n\u003cp\u003e\u003csup\u003ea\u003c/sup\u003e\u003cem\u003ep\u003c/em\u003e Values were estimated using the Pearson chi-square test for categorical variables and the Mann-Whitney U test or t-test for continuous variables.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure 1\u0026nbsp;\u003c/strong\u003eThe study flow chart.\u003c/p\u003e\n\u003cp\u003e3.2 Analysis of plasma differential metabolites in two groups of survivors and non-survivors during COVID-19 infection\u003c/p\u003e\n\u003cp\u003eA total of 150 plasma samples were analyzed for the presence of metabolites. To investigate the relationship between specific metabolites and patient mortality, 100 samples were selected for the construction of a predictive model. Following normalization of metabolite values, differences in metabolite expression were visualized in Figure 2A. Subsequently, in Figure 2B, the data obtained from 100 samples underwent a differential metabolic analysis, resulting in the identification of 100 metabolites that were differentially expressed between survivors and non-survivors. Among these metabolites, 23 were found to be upregulated while 77 were downregulated. In Figure 2C, Principal Component Analysis (PCA) revealed two distinct clusters representing surviving and deceased COVID-19 patients, with a statistically significant disparity between the two groups. Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA) further demonstrated significant differences in the serum non-targeted metabolomic profiles of surviving and deceased patients in both positive and negative ion modes. The findings presented in Figure 2D indicate that the top 34 metabolites, ranked by their variable importance in projection (VIP) scores, hold promise for distinguishing between surviving patients and non-survivors. 
To investigate potential biomarker metabolites that may predict mortality in critical-type COVID-19 patients, we individually analyzed 34 metabolites and reviewed relevant literature. We excluded metabolites related to energy metabolism that could be influenced by the patient\u0026apos;s daily activities, unidentified metabolites lacking relevant research, and metabolites that may be impacted by medication use post-hospital admission (some of which may serve as intermediates in drug metabolism). Consequently, two metabolites were chosen as indicators for the subsequent prediction model. The area under the receiver operating characteristic curve (AUROC) values were computed for the subjects to evaluate the identification precision of the candidate biomarker metabolites. The findings indicated that 3-Oxalomalate exhibited AUROC values ranging from 0.926 to 0.991, while Itaconic acid demonstrated values between 0.952 and 1. Consequently, these two metabolites were selected as the definitive discriminating markers.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFigure 2\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDetected metabolites analysis of serum samples from survivors and non-survivors. (A) (B) Volcano plots show the change in transformed P-value (\u0026minus;log10) against the log2 (survival/mortality). Gray dashed lines: cut-off values (ratio \u0026gt;1.5 and p-value \u0026lt;0.05). Orange dots: High expression of metabolites (23). Blue dots: Low expression of metabolites (77). The size of the points varies according to the absolute value of [Log2(survival/mortality)]. (C) Principal Component Analysis (PCA). The green and red points are survival and mortality. (D) Variable importance in projection (VIP). 
(E) and (F) show the AUC values of 3-Oxalomalate and Itaconic acid.\u003c/p\u003e\n\u003cp\u003e3.3 Machine Learning Model\u003c/p\u003e\n\u003cp\u003e\u003cimg src=\"https://myfiles.space/user_files/127393_c7e80a1c9bb65875/127393_custom_files/img1716979853.png\" alt=\"image\"\u003e\u003cstrong\u003eFigure 3\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Violin plots: The median is denoted by a white dot, the interquartile range is represented by a thick grey bar, and the remaining distribution is depicted by thin grey lines.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e3.3.1 Statistical Analysis\u003c/p\u003e\n\u003cp\u003eThe utilization of violin plots and kernel density plots effectively illustrates the distribution of clinical and metabolomic data upon admission for patients who succumbed to their condition (0) and those who survived (1).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e3.3.2 Development of the Predictive Model encompasses four primary stages:\u003c/p\u003e\n\u003cp\u003e(I) Data Selection and Preprocessing; (II) Risk Scoring Function; (III) Model Evaluation; (IV) External Validation.\u003c/p\u003e\n\u003cp\u003e(I) Data Selection and Preprocessing\u003c/p\u003e\n\u003cp\u003eFollowing a thorough analysis of the dataset, clinical and metabolomic data from 150 patients were incorporated. Binary variables were coded such that \u0026quot;1\u0026quot; represented \u0026quot;yes\u0026quot; and \u0026quot;0\u0026quot; represented \u0026quot;no\u0026quot;, while discrete variables were standardized.\u003c/p\u003e\n\u003cp\u003eA comprehensive analysis was conducted on a dataset comprising 347+20 features, including metabolites and clinical indicators. Through a combination of differential screening and literature review, six indicators, consisting of two metabolites and four clinical indicators, were identified as being correlated with patient survival status. A predictive model was developed to differentiate between surviving and deceased patients. 
The Box-Tidwell test was employed to evaluate the linearity of the data, with a significance level of p\u0026lt;0.05 indicating linear independence.\u003c/p\u003e\n\u003cp\u003e(II) Risk Scoring Function\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAfter an individual patient\u0026apos;s data are entered into the algorithm, the model evaluates the patient\u0026apos;s two metabolite values and four clinical values. The death risk score is increased by 1 for each abnormal indicator detected and remains unchanged for each normal indicator. The final score is then used in a formula to determine the death risk as a percentage. A death probability below 50% is categorized as survival, whereas a probability of 50% or higher is classified as death.\u003c/p\u003e\n\u003cp\u003eFigure 3B illustrates the relationship between the risk score and death probability, which follows an S-shaped curve. Figure 3C depicts the approximately normal distribution of risk scores for surviving and deceased patients. When the risk score exceeds 3, the probability of death exceeds 50%.\u003c/p\u003e\n\u003cp\u003e(III) Model Evaluation\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eWe examined four different machine learning methods to analyze the potential linear or nonlinear relationships between predictive indicator variables and predicted outcomes, specifically focusing on survival or non-survival. 
The models tested were KNN, Naive Bayes, Decision Tree, and Random Forest, with results evaluated in terms of model performance and attributes.\u003c/p\u003e\n\u003cp\u003eThis study uses these four traditional machine learning algorithms for classification testing and evaluates their classification performance experimentally.\u003c/p\u003e\n\u003cp\u003eThe study employs the Python-based scikit-learn (Sklearn) machine learning library, using four classifiers drawn from its Naive Bayes, Neighbors, Tree, and Ensemble modules for model training and category prediction. The algorithms and classification models were assessed with performance metrics including accuracy, specificity, sensitivity, and area under the receiver operating characteristic curve (AUROC), as shown in Figure 3D.\u003c/p\u003e\n\u003cp\u003eThe comparative experimental results for the four algorithms, namely Naive Bayes, KNN, Random Forest, and Decision Tree, are displayed in Table 2, with the best metrics highlighted in bold.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 2 Experimental results for the four algorithms.\u003c/strong\u003e\u003c/p\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"598\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eNaive Bayes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eKNN\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eRandom Forest\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eDecision Tree\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n 
\u003cp\u003eAccuracy:0.88\u003c/p\u003e\n \u003cp\u003eSensitivity/Recall:0.857\u003c/p\u003e\n \u003cp\u003eSpecificity:0.897\u003c/p\u003e\n \u003cp\u003eAUROC:0.889\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003e\u003cstrong\u003eAccuracy:0.93\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eSensitivity/Recall:0.926\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eSpecificity:0.934\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eAUROC:0.935\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eAccuracy:0.88\u003c/p\u003e\n \u003cp\u003eSensitivity/Recall:0.755\u003c/p\u003e\n \u003cp\u003eSpecificity:0.813\u003c/p\u003e\n \u003cp\u003eAUROC:0.88\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd width=\"25%\" valign=\"top\"\u003e\n \u003cp\u003eAccuracy:0.85\u003c/p\u003e\n \u003cp\u003eSensitivity/Recall:0.755\u003c/p\u003e\n \u003cp\u003eSpecificity:0.934\u003c/p\u003e\n \u003cp\u003eAUROC:0.861\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eAfter evaluating the performance of the four predictive models selected, the KNN model was ultimately selected as the final model based on its high AUROC value of 0.935 and its minimal deviation from the performance of the other models. Additionally, the model\u0026apos;s simplicity and high interpretability were taken into consideration.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e(IV) External Validation\u003c/p\u003e\n\u003cp\u003eA cohort of fifty laboratory-confirmed severe COVID-19 patients, comprising 27 survivors and 23 non-survivors, were admitted to the First Affiliated Hospital of Harbin Medical University for external validation. 
The chosen predictive model was employed to predict the likelihood of mortality in these patients, and its efficacy was assessed through measures of accuracy, AUROC, sensitivity, and specificity.\u003c/p\u003e"},{"header":"4. Discussion","content":"\u003cp\u003eThis study, conducted at two tertiary hospitals affiliated with Harbin Medical University, retrospectively analyzes the relationship between specific metabolites (3-Oxalomalate and Itaconic acid), lymphocyte count, interleukin-6 (IL-6), C-reactive protein (CRP), and procalcitonin (PCT) levels in the bloodstream and the mortality risk in individuals with COVID-19. The findings suggest that these factors play a crucial role as prognostic indicators for mortality in COVID-19 patients. The utilization of a KNN algorithm within the machine learning framework has shown notable effectiveness in forecasting outcomes using the specified six determinants.\u003c/p\u003e \u003cp\u003eThe utilization of laboratory-based biomarkers in the development of predictive models presents a comprehensive and effective approach for assessing mortality risks linked to COVID-19. This research has identified four biomarkers, namely lymphocyte count (LYM), IL-6, CRP, and PCT, as notable indicators of mortality. During the initial phases of the pandemic, Sun et al. found that older age upon hospital admission and decreased lymphocyte count were associated with increased mortality among COVID-19 patients, establishing these factors as independent risk factors for the elderly population[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. The decline in humoral and cellular immune function with advancing age results in decreased immune system activation upon viral invasion, thereby elevating the risk of mortality in older individuals with diminished T-cell counts. 
The reduction in lymphoid T-cell counts is particularly notable in individuals with severe COVID-19, as evidenced by elevated levels of C-reactive protein (CRP), a protein synthesized in the liver[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. The heightened levels of CRP following infection may be associated with the excessive production of inflammatory cytokines in patients with severe COVID-19, serving as an early prognostic marker for disease severity. In a cohort study involving 298 participants, the area under the receiver operating characteristic (ROC) curve for C-reactive protein (CRP) was found to be significantly higher than that of age, neutrophil count, and platelet count[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. A retrospective analysis conducted by Tao Liu et al. on 69 patients with severe COVID-19 revealed a notable elevation in interleukin-6 (IL-6) levels upon admission in severe cases compared to non-severe cases. The variations in IL-6 levels were found to be closely associated with the severity and prognosis of severe COVID-19, showing a strong positive correlation with CRP levels (r\u0026thinsp;=\u0026thinsp;0.781, P\u0026thinsp;\u0026lt;\u0026thinsp;0.001). Although there were slight variations in IL-2, IL-4, IL-10, tumor necrosis factor-alpha (TNF-α) levels, and interferon-gamma (IFN-γ) before and after treatment, these levels remained within the normal range[\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. In contrast, the initial increase in procalcitonin (PCT) levels is more significant in bacterial infections compared to viral infections or non-infectious systemic inflammatory response syndrome (SIRS)[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. However, a comprehensive clinical study conducted by Zhang et al. 
involving 140 hospitalized COVID-19 patients demonstrated that individuals with severe disease presentations displayed notably higher levels of C-reactive protein and procalcitonin in comparison to those with less severe manifestations (P\u0026thinsp;\u0026lt;\u0026thinsp;0.001)[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. Additionally, a separate retrospective examination of 452 severe cases of COVID-19 provided further evidence that procalcitonin serves as a significant independent predictor of mortality[\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe laboratory assays mentioned above have been validated in previous studies for their ability to assess disease severity and mortality risk in COVID-19 patients. This study has advanced beyond clinical constraints by utilizing non-targeted metabolomic analysis of blood samples from ICU patients, leading to the identification of two metabolites that improve predictive accuracy for mortality risk. Initially, itaconic acid underwent extensive investigation in the field of basic medical sciences, with a growing emphasis on its potent antibacterial … 4-octyl itaconate (4-OI)[\u003cspan additionalcitationids=\"CR24 CR25 CR26\" citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], which has demonstrated the ability to inhibit the replication of SARS-CoV-2. Through activation of Nrf2, an antioxidant and anti-inflammatory transcription factor, 4-OI induces an antiviral response capable of suppressing the replication of SARS-CoV-2 and other pathogenic viruses, such as herpes simplex virus 1 (HSV-1) and vaccinia virus[\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. 
The non-targeted metabolomic analysis performed in this study has determined that Itaconic acid demonstrates an AUC value of 0.981 in predicting COVID-19 mortality risk, further supporting previous research findings. Combining this metabolite with 3-Oxalomalate (AUC: 0.963) and the four established laboratory parameters significantly enhances the accuracy of predictive modeling.\u003c/p\u003e \u003cp\u003eThis study recognizes specific inherent constraints. The patient sample is limited to individuals from northern China, restricting the diversity of the dataset and the amount of data collected due to geographical limitations. Subsequent versions of this model could potentially improve through the inclusion of a broader dataset, leading to enhanced refinement and development. Furthermore, the non-targeted metabolomic results, which are relative, do not yield an absolute quantitative value. Although this method has demonstrated improved predictive accuracy using available clinical data, its implementation in real-world clinical environments presents significant challenges.\u003c/p\u003e \u003cp\u003eDuring the COVID-19 pandemic, critical care medicine is essential for preserving human life. Utilizing precise predictive methods allows for the accurate assessment of patient mortality risk, aiding in the allocation of necessary resources. Ideally, this precision enables the implementation of personalized treatment plans for high-risk patients, potentially leading to significant improvements in patient outcomes. In clinical practice, it is observed that certain patients with a high mortality risk may derive greater benefit from palliative care and familial support during the terminal phase of life, representing an additional notable aspect of the predictive model.\u003c/p\u003e"},{"header":"5. 
Conclusion","content":"\u003cp\u003eIn conclusion, the mortality risk prediction model for severe COVID-19 patients, established in this study based on two metabolites and four laboratory-derived clinical indicators, offers a precise prognosis for the likelihood of death. This model signifies a notable progression from prior models that exclusively utilized laboratory testing. The application of the KNN algorithm in this context aids healthcare professionals in formulating tailored treatment plans, ultimately enhancing the effectiveness of interventions and minimizing the occurrence of mortality and other unfavorable consequences.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe study protocol was approved by the Ethics Committee of the Second Affiliated Hospital of Harbin Medical University (approval number: KY2023-023), and signed informed consent forms were collected from all study subjects or their families. All studies were conducted in accordance with the Declaration of Helsinki as revised in 2013.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWritten informed consent was obtained from the patient for publication. 
A copy of the written consent is available for review by the Editor-in-Chief of this journal.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflict-of-interest statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll authors declare that they have no conflicts of interest to disclose.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eData sharing does not apply to this article as no datasets were generated or analyzed during the current study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe\u0026nbsp;Outstanding Youth Project of Heilongjiang Natural Science Foundation (No. JQ2021H002);\u003c/p\u003e\n\u003cp\u003eThe\u0026nbsp;National\u0026nbsp;Key\u0026nbsp;Research and\u0026nbsp;Development Program of China\u0026nbsp;(No. 2021YFC2501800);\u003c/p\u003e\n\u003cp\u003eKey R\u0026amp;D Plan Project in Heilongjiang Province (No. GY2023ZB0075);\u003c/p\u003e\n\u003cp\u003eHarbin Medical University\u0026nbsp;Foundation Youth Project (No.\u0026nbsp;PYQN2023-9).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eCSP\u0026nbsp;contributed to the manuscript in conceptualization, methodology, data collection, formal data analysis, drafting, visualization, and writing the manuscript as well as critical revision and statistical analysis. 
This author takes responsibility for the content of this manuscript, including the data and analysis.\u0026nbsp;HQY and ZR\u0026nbsp;contributed to the manuscript in conceptualization, data collection, drafting, and writing the manuscript as well as comprehensive critical revision.\u0026nbsp;LY\u0026nbsp;contributed to the manuscript in data collection, formal data analysis, drafting and writing the manuscript as well as critical revision and statistical analysis.\u0026nbsp;LM\u0026nbsp;contributed to the manuscript in conceptualization, methodology, critical revision, and statistical analysis.\u0026nbsp;LWH\u0026nbsp;contributed to the manuscript in data collection, formal data analysis, and writing the manuscript as well as critical revision and statistical analysis.\u0026nbsp;ZJB\u0026nbsp;contributed to the manuscript in data collection, formal data analysis, drafting and writing the manuscript as well as critical revision.\u0026nbsp;WHL\u0026nbsp;contributed to the manuscript in formal data analysis and writing the manuscript as well as critical revision. All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe are grateful to all those who offered selfless advice, help, and support for our study.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eCOVID-19 cases | WHO COVID-19 dashboard. In., vol. 2024: 19.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen Y, Klein SL, Garibaldi BT, Li H, Wu C, Osevala NM, Li T, Margolick JB, Pawelec G, Leng SX. Aging in COVID-19: Vulnerability, immunity and intervention. AGEING RES REV. 2021;65:101205.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGao HB, Zhang J. [Analysis of prognostic factors in patients with COVID-19 infection]. 
Zhonghua Jie He He Hu Xi Za Zhi (Chinese Journal of Tuberculosis and Respiratory Diseases). 2024;47(3):296\u0026ndash;300.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eL\u0026oacute;pez-L\u0026oacute;pez \u0026Aacute;, L\u0026oacute;pez-Gonz\u0026aacute;lvez \u0026Aacute;, Barker-Tejeda TC, Barbas C. A review of validated biomarkers obtained through metabolomics. EXPERT REV MOL DIAGN. 2018;18(6):557\u0026ndash;75.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGonzalez-Covarrubias V, Mart\u0026iacute;nez-Mart\u0026iacute;nez E, Del Bosque-Plata L. The Potential of Metabolomics in Biomedical Applications. Metabolites. 2022;12(2).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaeda R, Seki N, Uwamino Y, Wakui M, Nakagama Y, Kido Y, Sasai M, Taira S, Toriu N, Yamamoto M, et al. Amino acid catabolite markers for early prognostication of pneumonia in patients with COVID-19. NAT COMMUN. 2023;14(1):8469.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChatelaine HAS, Chen Y, Braisted J, Chu SH, Chen Q, Stav M, Begum S, Diray-Arce J, Sanjak J, Huang M, et al. Nucleotide, Phospholipid, and Kynurenine Metabolites Are Robustly Associated with COVID-19 Severity and Time of Plasma Sample Collection in a Prospective Cohort Study. INT J MOL SCI. 2023;25(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShi D, Yan R, Lv L, Jiang H, Lu Y, Sheng J, Xie J, Wu W, Xia J, Xu K, et al. The serum metabolome of COVID-19 patients is distinctive and predictive. Metab Clin Exp. 2021;118:154739.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRoberts I, Wright Muelas M, Taylor JM, Davison AS, Xu Y, Grixti JM, Gotts N, Sorokin A, Goodacre R, Kell DB. Untargeted metabolomics of COVID-19 patient serum reveals potential prognostic markers of both severity and outcome. Metabolomics: Official J Metabolomic Soc. 
2021;18(1):6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJulkunen H, Cichońska A, Slagboom PE, W\u0026uuml;rtz P. Metabolic biomarker profiling for identification of susceptibility to severe pneumonia and COVID-19 in the general population. ELIFE. 2021;10.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSindelar M, Stancliffe E, Schwaiger-Haber M, Anbukumar DS, Adkins-Travis K, Goss CW, O'Halloran JA, Mudd PA, Liu W, Albrecht RA, et al. Longitudinal metabolomics of human plasma reveals prognostic markers of COVID-19 disease severity. Cell Rep Med. 2021;2(8):100369.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFang W, Zhu Y, Yang S, Tong X, Ye C. Reciprocal regulation of phosphatidylcholine synthesis and H3K36 methylation programs metabolic adaptation. CELL REP. 2022;39(2):110672.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSmith CA, Want EJ, O'Maille G, Abagyan R, Siuzdak G. XCMS: processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. ANAL CHEM. 2006;78(3):779\u0026ndash;87.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhou Z, Luo M, Zhang H, Yin Y, Cai Y, Zhu ZJ. Metabolite annotation from knowns to unknowns through knowledge-guided multi-layer metabolic networking. NAT COMMUN. 2022;13(1):6656.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePang Z, Zhou G, Ewald J, Chang L, Hacariz O, Basu N, Xia J. Using MetaboAnalyst 5.0 for LC-HRMS spectra processing, multi-omics integration and covariate adjustment of global metabolomics data. NAT PROTOC. 2022;17(8):1735\u0026ndash;61.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSun H, Ning R, Tao Y, Yu C, Deng X, Zhao C, Meng S, Tang F, Xu D. Risk Factors for Mortality in 244 Older Adults With COVID-19 in Wuhan, China: A Retrospective Study. J AM GERIATR SOC. 
2020;68(6):E19\u0026ndash;23.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQin C, Zhou L, Hu Z, Zhang S, Yang S, Tao Y, Xie C, Ma K, Shang K, Wang W, et al. Dysregulation of Immune Response in Patients With Coronavirus 2019 (COVID-19) in Wuhan, China. CLIN INFECT DIS. 2020;71(15):762\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLuo X, Zhou W, Yan X, Guo T, Wang B, Xia H, Ye L, Xiong J, Jiang Z, Liu Y, et al. Prognostic Value of C-Reactive Protein in Patients With Coronavirus 2019. CLIN INFECT DIS. 2020;71(16):2174\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLiu T, Zhang J, Yang Y, Ma H, Li Z, Zhang J, Cheng J, Zhang X, Zhao Y, Xia Z, et al. The role of interleukin-6 in monitoring severe case of coronavirus disease 2019. EMBO MOL MED. 2020;12(7):e12421.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePaudel R, Dogra P, Montgomery-Yates AA, Coz YA. Procalcitonin: A promising tool or just another overhyped test? INT J MED SCI. 2020;17(3):332\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhang JJ, Dong X, Cao YY, Yuan YD, Yang YB, Yan YQ, Akdis CA, Gao YD. Clinical characteristics of 140 patients infected with SARS-CoV-2 in Wuhan, China. ALLERGY. 2020;75(7):1730\u0026ndash;41.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShang Y, Liu T, Wei Y, Li J, Shao L, Liu M, Zhang Y, Zhao Z, Xu H, Peng Z, et al. Scoring systems for predicting mortality for severe patients with COVID-19. EClinicalMedicine. 2020;24:100426.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMills EL, Ryan DG, Prag HA, Dikovskaya D, Menon D, Zaslona Z, Jedrychowski MP, Costa A, Higgins M, Hams E, et al. Itaconate is an anti-inflammatory metabolite that activates Nrf2 via alkylation of KEAP1. 
NATURE. 2018;556(7699):113\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMichelucci A, Cordes T, Ghelfi J, Pailot A, Reiling N, Goldmann O, Binz T, Wegner A, Tallam A, Rausell A, et al. Immune-responsive gene 1 protein links metabolism to immunity by catalyzing itaconic acid production. P NATL ACAD SCI USA. 2013;110(19):7820\u0026ndash;5.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eO'Neill LAJ, Artyomov MN. Itaconate: the poster child of metabolic reprogramming in macrophage function. Nat Rev Immunol. 2019;19(5):273\u0026ndash;81.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen M, Sun H, Boot M, Shao L, Chang S, Wang W, Lam TT, Lara-Tejero M, Rego EH, Gal\u0026aacute;n JE. Itaconate is an effector of a Rab GTPase cell-autonomous host defense pathway against Salmonella. Science. 2020;369(6502):450\u0026ndash;5.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSwain A, Bambouskova M, Kim H, Andhey PS, Duncan D, Auclair K, Chubukov V, Simons DM, Roddy TP, Stewart KM, et al. Comparative evaluation of itaconate and its derivatives reveals divergent inflammasome and type I interferon regulation in macrophages. Nat Metabolism. 2020;2(7):594\u0026ndash;602.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOlagnier D, Farahani E, Thyrsted J, Blay-Cadanet J, Herengt A, Idorn M, Hait A, Hernaez B, Knudsen A, Iversen MB, et al. SARS-CoV2-mediated suppression of NRF2-signaling reveals potent antiviral and anti-inflammatory activity of 4-octyl-itaconate and dimethyl fumarate. NAT COMMUN. 
2020;11(1):4938.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"keywords":"COVID-19, Metabolomics, Machine Learning, Risk of Mortality, Predictive Model","lastPublishedDoi":"10.21203/rs.3.rs-4418889/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4418889/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptTitle":"Integration of metabolomics methodologies for the development of predictive models for mortality risk in patients with severe COVID-19.","postedDate":"June 4th, 2024","status":"published-in-journal","versionOfRecord":{"articleIdentity":"rs-4418889","link":"https://doi.org/10.1186/s12879-024-10402-3","journal":{"identity":"bmc-infectious-diseases","isVorOnly":false,"title":"BMC Infectious Diseases"},"publishedOn":"2025-01-02 15:57:08","publishedOnDateReadable":"January 2nd, 2025"},"vorDoi":"10.1186/s12879-024-10402-3","vorDoiUrl":"https://doi.org/10.1186/s12879-024-10402-3"},"version":"v1","identity":"rs-4418889"}}}
