Combination of multiple omics and machine learning identifies diagnostic genes for ARDS and COVID-19

Chuanxi Tian, Yikun Guo, Huifang Guan, Kaile Ma, Rui Hao, Wei Zhu, and 2 more

This is a preprint (Version 1); it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-3892523/v1
This work is licensed under a CC BY 4.0 License

Abstract
BACKGROUND Acute respiratory distress syndrome (ARDS) is a common acute clinical syndrome of the respiratory system with a high mortality rate and poor prognosis. COVID-19 is a serious respiratory infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that led to a global pandemic. Some studies have suggested a possible association between COVID-19 and ARDS, but few have investigated the mechanisms linking the two.
METHODS Microarray data for ARDS (GSE32707 and GSE66890) and COVID-19 (GSE213313) were downloaded from the GEO database, and the differentially expressed genes shared by the two diseases were identified and subjected to enrichment analysis. WGCNA was used to identify co-expression modules and genes associated with ARDS and COVID-19. Random forest (RF) and LASSO were used to identify candidate genes, and XGBoost machine learning models were built to evaluate the diagnostic performance of the hub genes in ARDS and COVID-19. Immune cell infiltration in ARDS and COVID-19 samples was estimated with the CIBERSORT algorithm, and the relationship between hub genes and infiltrating immune cells was investigated. Peripheral blood mononuclear cell (PBMC) single-cell RNA sequencing (scRNA-seq) data from patients with sepsis complicated by ARDS and patients with sepsis alone were processed with the standard Seurat clustering workflow, and changes in per-cell pathway activity were visualized.
RESULTS Limma differential expression analysis identified 314 up-regulated genes and 241 down-regulated genes in ARDS and COVID-19. WGCNA identified the purple-red co-expression module as the core module of ARDS and COVID-19. Five candidate genes, HIST1H2BK, TCF4, OLFM4, KIF14 and HK1, were identified using two machine learning algorithms, RF and LASSO. Diagnostic models constructed with XGBoost showed that these hub genes had high diagnostic efficacy in ARDS and COVID-19. Single-cell sequencing revealed alterations in five cell populations (monocytes, B cells, T cells, NK cells and platelets); TCF4 and HK1, which are involved in oxidative reactions, showed high expression levels and high proportions of expressing cells.
Keywords: machine learning, multiple omics, ARDS, COVID-19, diagnostic genes

Introduction
Acute respiratory distress syndrome (ARDS) is a clinical syndrome characterized by progressive respiratory distress and refractory hypoxemia, with diffuse alveolar infiltrates on chest radiography (1). ARDS has a rapid onset and progression and is one of the leading causes of death in critically ill patients (2). Patients with ARDS account for 10.4% of ICU admissions, with mortality rates as high as 35%-45% (3). Because the pathogenesis of ARDS is complex and the syndrome is difficult to treat, more in-depth mechanistic studies are needed to identify new biomarkers for early diagnosis, treatment and prognosis. COVID-19 is an acute respiratory infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); it spread rapidly worldwide and seriously affected global economic and social development (4). The main manifestations of COVID-19 pneumonia are fever, dry cough and malaise; a minority of patients also have upper respiratory and gastrointestinal symptoms such as nasal congestion, runny nose and diarrhea, and the disease can affect multiple organ systems, including the cardiovascular system, gastrointestinal tract, liver and kidneys. The clinical burden of COVID-19 may extend well beyond the acute phase: long-term sequelae substantially impair quality of life and represent a major global health challenge (5).
Patients with severe or critical COVID-19 are prone to rapid progression to ARDS; the incidence and severity of COVID-19-related ARDS are positively correlated, and these patients have a worse prognosis than patients with ARDS from other causes (6). Late-stage COVID-19-related ARDS is difficult to control, with mortality ranging from 26% to 61.5%, and early detection, diagnosis and treatment are critical for improving the prognosis of patients with COVID-19-induced ARDS, who present with lung inflammation, thick airway mucus secretion, elevated proinflammatory cytokine levels, lung injury and microthrombosis (7). It is therefore important to explore the mechanisms by which COVID-19 progresses to ARDS, and identifying related diagnostic, therapeutic and prognostic biomarkers is urgent. Advances in computing and sequencing technologies have made it possible to analyze disease-related genes at the molecular level, and bioinformatics, which combines molecular biology and information technology, has been widely used in recent studies (8). In this study, we used multiple bioinformatics methods to screen and analyze the genes shared by ARDS and COVID-19, with the aim of providing new biomarkers and a theoretical basis for the diagnosis and treatment of both diseases.

Materials and Methods
2.1 Differential Gene Screening and Enrichment Analysis
We collected microarray data from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) (9) for ARDS (GSE32707 and GSE66890) and COVID-19 (GSE213313). The data were filtered, background-corrected, log2-transformed and normalized in R.
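The log2 transformation and normalization step can be sketched as follows. This is a minimal illustration assuming a genes × samples matrix of non-negative intensities, a pseudocount of 1 and quantile normalization; the text does not state which normalization method or R functions were used, and `log2_quantile_normalize` is a hypothetical helper.

```python
import numpy as np


def log2_quantile_normalize(expr: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """Log2-transform a genes x samples matrix, then quantile-normalize its columns.

    Hypothetical helper: the pseudocount and the choice of quantile normalization
    are illustrative assumptions. Tied values receive distinct ranks (by sort order).
    """
    logged = np.log2(expr + pseudocount)
    # Mean of the k-th smallest value across samples defines the reference distribution.
    rank_means = np.sort(logged, axis=0).mean(axis=1)
    # Within-column rank of every entry (0 = smallest).
    ranks = np.argsort(np.argsort(logged, axis=0), axis=0)
    # Replace each entry with the reference value for its rank.
    return rank_means[ranks]


if __name__ == "__main__":
    raw = np.array([[5.0, 4.0, 3.0],
                    [2.0, 1.0, 4.0],
                    [3.0, 4.0, 6.0],
                    [4.0, 2.0, 8.0]])
    normed = log2_quantile_normalize(raw)
    # After normalization every sample (column) has the same sorted values.
    print(np.sort(normed, axis=0))
```

Quantile normalization places every array on a common intensity distribution while preserving the ordering of genes within each sample, which is why it is a common choice before combining arrays.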
The "SVA" package was used to merge GSE32707 and GSE66890 and remove batch effects, and principal component analysis (PCA) was performed on the samples from the three datasets to examine how they clustered. The "limma" package was used to identify differentially expressed genes (DEGs) in ARDS and COVID-19, with |log2 FC| > 1 and p < 0.05 as the screening criteria. We then identified the up- and down-regulated genes shared by ARDS and COVID-19 using Venn diagrams and performed Gene Ontology (GO) enrichment analysis.

Weighted Gene Co-expression Network Analysis
We used the "WGCNA" package to construct co-expression gene modules. Genes with expression > 0.5 were retained, the optimal soft-thresholding power was selected, and the modules most highly correlated with ARDS and COVID-19, together with their member genes, were identified for further analysis (10).

Machine Learning Candidate Gene Screening
The random forest (RF) algorithm and least absolute shrinkage and selection operator (LASSO) logistic regression were used to screen key genes from the intersection of the DEGs and the WGCNA module genes (11, 12). RF is an ensemble prediction method that can handle a large number of input variables and evaluate their importance. LASSO is a penalized regression method that performs well on high-dimensional data. We first used RF to select diagnostic genes with importance scores greater than 0.5; LASSO was then applied to these genes to reduce the set further to the final diagnostic genes, and ROC curves were plotted for each. RF and LASSO were performed using the R packages "randomForest" and "glmnet" (13).
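The second stage of this screen, LASSO-penalized logistic regression on the RF-filtered genes, can be illustrated with a minimal sketch. glmnet fits such models by coordinate descent and typically chooses the penalty λ by cross-validation; this sketch instead uses proximal gradient descent (ISTA) with a hand-picked λ, assumes standardized expression values and 0/1 disease labels, and takes the RF-filtered genes as given. The function names and synthetic data are hypothetical.

```python
import numpy as np


def soft_threshold(x: np.ndarray, t: float) -> np.ndarray:
    """Proximal operator of t * ||x||_1: shrinks every entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def lasso_logistic(X: np.ndarray, y: np.ndarray, lam: float,
                   n_iter: int = 5000) -> tuple[np.ndarray, float]:
    """Minimize mean logistic loss + lam * ||beta||_1 by proximal gradient (ISTA).

    X: samples x genes (standardized); y: 0/1 labels. The intercept is not penalized.
    """
    n, p = X.shape
    beta, b0 = np.zeros(p), 0.0
    # Step 1/L, where L = (||X||_2^2 + n) / (4n) bounds the Lipschitz constant
    # of the loss gradient with respect to (beta, b0).
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2 + n)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X @ beta + b0)))
        resid = prob - y
        # Gradient step on the smooth loss, then L1 shrinkage of the gene coefficients.
        beta = soft_threshold(beta - step * (X.T @ resid) / n, step * lam)
        b0 -= step * resid.mean()
    return beta, b0


def selected_genes(names: list[str], beta: np.ndarray) -> list[str]:
    """Genes whose LASSO coefficient is exactly nonzero."""
    return [g for g, b in zip(names, beta) if b != 0.0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))          # 200 samples, 10 RF-filtered genes
    logits = 3.0 * X[:, 0] - 3.0 * X[:, 1]      # only genes 0 and 1 carry signal
    y = (rng.random(200) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
    beta, _ = lasso_logistic(X, y, lam=0.05)
    print(selected_genes([f"gene{i}" for i in range(10)], beta))
```

Because soft-thresholding sets small coefficients exactly to zero, the surviving genes form the reduced diagnostic set; larger λ values keep fewer genes.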
Hub gene diagnostic model construction
Extreme Gradient Boosting (XGBoost) is a widely used supervised ensemble learning algorithm with strong scalability and convenient tools for model visualization and optimization; we used the expression values of the hub genes as features to train XGBoost models (16). The ARDS (GSE32707) and COVID-19 (GSE213313) datasets were used as the training set, and the ARDS dataset GSE66890 was used for validation. The diagnostic performance of the model was evaluated with receiver operating characteristic (ROC) and precision-recall (PR) curves and the corresponding areas under the curve (AUC).

Analysis of immune infiltration
CIBERSORT is a deconvolution algorithm widely used to estimate the proportions of different immune cell types from bulk gene expression profiles (14). Using reference expression signatures of 22 immune cell types, including B cells, plasma cells, T cells, natural killer cells, monocytes, macrophages, dendritic cells, mast cells, eosinophils and neutrophils, the algorithm deconvolves each sample's expression profile; we ran it in R. We compared immune cell infiltration in peripheral blood mononuclear cell (PBMC) samples from the disease groups with that in normal samples.

Single-cell sequencing quality control and dimensionality reduction
We downloaded the single-cell RNA sequencing dataset (GSE151263) from the GEO database and analyzed it with the "Seurat" and "SingleR" packages. Cells with ≤ 10% mitochondrial gene content and ≤ 3% erythroid gene content were retained, and cells with ≤ 200 or ≥ 5000 detected genes (nFeature_RNA) were removed. We then selected 3000 highly variable genes and performed dimensionality reduction and clustering. Based on the elbow plot, we chose the number of principal components at which the curve flattened and used the top 10 dimensions for subsequent analysis; the reduced data were visualized with UMAP and t-SNE.
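The cell-level QC thresholds above can be expressed as a small filter. This is a sketch, not Seurat's implementation: it assumes raw per-cell counts keyed by human gene symbol, identifies mitochondrial genes by the `MT-` prefix and erythroid (hemoglobin) genes with an assumed regular expression, and uses the Methods thresholds (200 < genes < 5000, ≤ 10% mitochondrial, ≤ 3% hemoglobin). The helper names are hypothetical.

```python
import re

MT_PATTERN = re.compile(r"^MT-")                 # human mitochondrial genes, e.g. MT-CO1
HB_PATTERN = re.compile(r"^HB[ABDEGMQZ]\d?$")    # hemoglobin genes, e.g. HBB, HBA1 (assumed list)


def qc_metrics(counts: dict[str, int]) -> tuple[int, float, float]:
    """Return (genes detected, % mitochondrial counts, % hemoglobin counts) for one cell."""
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0, 0.0
    n_features = sum(1 for c in counts.values() if c > 0)   # analogous to nFeature_RNA
    mt = sum(c for g, c in counts.items() if MT_PATTERN.match(g))
    hb = sum(c for g, c in counts.items() if HB_PATTERN.match(g))
    return n_features, 100.0 * mt / total, 100.0 * hb / total


def keep_cell(counts: dict[str, int], min_features: int = 200, max_features: int = 5000,
              max_mt: float = 10.0, max_hb: float = 3.0) -> bool:
    """QC rule: min_features < genes < max_features, <= max_mt % MT, <= max_hb % HB."""
    n_features, pct_mt, pct_hb = qc_metrics(counts)
    return (min_features < n_features < max_features
            and pct_mt <= max_mt
            and pct_hb <= max_hb)


if __name__ == "__main__":
    healthy = {f"GENE{i}": 1 for i in range(300)}
    stressed = {**healthy, "MT-CO1": 100}          # 25% mitochondrial counts
    print(keep_cell(healthy), keep_cell(stressed))  # True False
```

To apply the 2000-gene upper bound reported in the Results instead, call `keep_cell(counts, max_features=2000)`.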
We then annotated cell types with immune cell reference labels using the "SingleR" package (15). Finally, we visualized the expression of hub genes in different immune cell types using violin plots.

Statistical Analysis
All statistical tests were performed using R software version 4.1.2. Differences between two groups were analyzed using the Wilcoxon test or Student's t-test. Correlations between variables were assessed using Pearson or Spearman correlation tests. Statistical significance was set at a two-tailed p < 0.05.

Results
Identification of differentially expressed genes between ARDS and COVID-19
Principal component analysis (PCA) was used to visualize the distribution of the samples before and after batch-effect correction (Fig. 1A, 1B). After correcting and normalizing the three datasets (GSE32707, GSE66890 and GSE213313), we identified 1114 DEGs in ARDS (575 up-regulated and 539 down-regulated) and 3587 DEGs in COVID-19 (1738 up-regulated and 1849 down-regulated). Venn diagrams showed 180 up-regulated and 61 down-regulated DEGs shared by ARDS and COVID-19 (Fig. 1C, 1D).

Enrichment analysis
To explore the biological functions and pathways of the common DEGs, we performed GO enrichment analysis (Fig. 1E, 1F). The up-regulated DEGs were mainly enriched in mitotic cell cycle, cytoplasmic vesicle lumen and protein kinase regulator activity, whereas the down-regulated DEGs were mainly enriched in cellular defense response, endocytic vesicle and T cell receptor binding.

Weighted gene co-expression network analysis of ARDS and COVID-19
We constructed co-expression networks to explore the correlation between clinical traits and genes.
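The soft-thresholding step at the core of WGCNA can be sketched briefly. In the unsigned network assumed here, the adjacency between genes i and j is a_ij = |cor(x_i, x_j)|^β, and β is chosen so that the connectivity distribution approximately follows a power law. This is an illustrative re-implementation modeled loosely on WGCNA's scale-free fit index (binning connectivity into 10 intervals and regressing log10 p(k) on log10 k), not the package's code; the 0.85 R² cutoff is a commonly used convention and an assumption here, and the data are synthetic.

```python
import numpy as np


def soft_threshold_adjacency(expr: np.ndarray, beta: float) -> np.ndarray:
    """Unsigned adjacency |cor|^beta between genes (columns of a samples x genes matrix)."""
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)                     # no self-connections
    return adj


def scale_free_fit(adj: np.ndarray, n_bins: int = 10) -> float:
    """Signed R^2 of log10 p(k) ~ log10 k; values near 1 indicate scale-free topology."""
    k = adj.sum(axis=1)                            # whole-network connectivity per gene
    edges = np.linspace(k.min(), k.max(), n_bins + 1)
    idx = np.digitize(k, edges[1:-1], right=True)  # bin index 0..n_bins-1 for each gene
    mids = (edges[:-1] + edges[1:]) / 2
    mean_k = np.array([k[idx == b].mean() if np.any(idx == b) else mids[b]
                       for b in range(n_bins)])
    p_k = np.bincount(idx, minlength=n_bins) / k.size
    x, y = np.log10(mean_k), np.log10(p_k + 1e-9)  # small offset avoids log10(0) for empty bins
    slope = np.polyfit(x, y, 1)[0]
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    return float(-np.sign(slope) * r2)             # negative slope gives a positive index


def choose_power(expr: np.ndarray, powers: list[int], r2_cut: float = 0.85) -> int:
    """Smallest power whose fit reaches r2_cut; otherwise the best-fitting power."""
    fits = [scale_free_fit(soft_threshold_adjacency(expr, b)) for b in powers]
    for b, fit in zip(powers, fits):
        if fit >= r2_cut:
            return b
    return powers[int(np.argmax(fits))]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    expr = rng.standard_normal((30, 50))           # 30 samples x 50 genes (synthetic)
    print(choose_power(expr, list(range(1, 11))))
```

Raising correlations to a power β suppresses weak correlations much more than strong ones, which is what pushes the network toward a scale-free structure as β increases.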
Sample clustering was performed with the "flashClust" function. With the cut height set to 75, 6 outlier samples were detected and removed, and 145 samples were retained (Fig. 2A, 2B). The "pickSoftThreshold" function of "WGCNA" was used to evaluate soft-thresholding powers from 1 to 30. A power of β = 5 was selected as the most appropriate soft threshold to ensure a scale-free network, and the results showed that the optimal soft power value was 10. Cluster dendrograms were constructed using dynamic tree cutting ("cutreeDynamic") and module eigengenes (Fig. 2C, 2D), yielding a total of 11 modules of genes with similar co-expression patterns. Heat maps of module-trait relationships were then plotted using Spearman correlation coefficients to evaluate the association of each module with the clinical features of disease (Fig. 2E). The purple-red module was positively correlated with both ARDS and COVID-19, although the correlations were weak (ARDS: r = 0.16, p = 0.05; COVID-19: r = 0.17, p = 0.04), and it contained genes positively associated with ARDS and genes positively associated with COVID-19 (Fig. 2F, 2G). GO analysis of the module genes showed that, in the biological process (BP) category, the co-expressed genes were mainly enriched in blood coagulation, wound healing and regulation of body fluid levels; cellular component (CC) terms were mainly concentrated in the actin cytoskeleton, platelet α-granules and cytoplasmic vesicle lumen; and molecular function (MF) terms were mainly associated with actin binding, adhesin binding and aminoglycans. KEGG analysis showed that the module genes of ARDS and COVID-19 were mainly associated with focal adhesion, the PI3K-Akt pathway and regulation of the actin cytoskeleton (Fig. 3A).

Machine Learning Identification of Intersecting Genes in ARDS and COVID-19
We used the RF algorithm to screen the intersecting genes of ARDS and COVID-19 and ranked them by importance.
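Ranking genes by importance can be illustrated with permutation importance: shuffle one gene's values and measure how much prediction accuracy drops. The randomForest package computes a related measure (mean decrease in accuracy) on each tree's out-of-bag samples; this model-agnostic sketch instead scores any fitted classifier on a given dataset, so it is an illustration of the idea rather than the package's procedure. The toy `predict` function and gene names are hypothetical.

```python
from typing import Callable

import numpy as np


def permutation_importance(predict: Callable[[np.ndarray], np.ndarray],
                           X: np.ndarray, y: np.ndarray,
                           n_repeats: int = 10, seed: int = 0) -> np.ndarray:
    """Mean drop in accuracy when each gene (column of X) is randomly permuted."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break gene j's link to y
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances


def top_genes(names: list[str], importances: np.ndarray, k: int = 30) -> list[str]:
    """Gene names sorted by decreasing importance, truncated to the top k."""
    order = np.argsort(importances)[::-1][:k]
    return [names[i] for i in order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 5))
    y = (X[:, 0] > 0).astype(int)

    def predict(M: np.ndarray) -> np.ndarray:
        # Toy classifier that only looks at gene 0.
        return (M[:, 0] > 0).astype(int)

    imp = permutation_importance(predict, X, y)
    print(top_genes([f"gene{i}" for i in range(5)], imp, k=3))
```

Genes the model ignores receive an importance of zero, while a gene the model relies on loses roughly half of its accuracy when shuffled in this balanced toy example.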
The importance scores of the top 30 genes were visualized (Fig. 3B, 3C). LASSO was then used for further dimensionality reduction, yielding the final five genes: HIST1H2BK, TCF4, OLFM4, KIF14 and HK1 (Fig. 3D, 3E). We constructed candidate gene diagnostic models in the GSE66890 training set using XGBoost and validated them in the GSE32707 dataset. In the ARDS dataset GSE66890, the ROC AUC was 0.952 and the PR AUC was 0.961; in the validation set GSE32707, the ROC AUC was 0.725 and the PR AUC was 0.543. Because the ROC AUCs of the ARDS models were all greater than 0.7, we considered the model to have good diagnostic value (Fig. 3F, 3G). To test whether the model could also identify COVID-19 patients, we applied it to the COVID-19 dataset GSE213313; both the ROC AUC and the PR AUC were 1, indicating high diagnostic performance in COVID-19 as well (Fig. 3H). We further evaluated the diagnostic value of each of the five hub genes: the ROC AUCs of HIST1H2BK (0.802), OLFM4 (0.716), KIF14 (0.812) and HK1 (0.740) all exceeded 0.7, indicating high diagnostic value, whereas TCF4 (AUC = 0.692) performed slightly worse (Fig. 4A-E).

Immune infiltration analysis
The pathogenesis of both ARDS and COVID-19 involves inflammatory responses driven by overstimulation of the immune system; we therefore performed immune infiltration analyses on the datasets of both diseases. We analyzed the correlation between disease status and 22 types of infiltrating immune cells using the CIBERSORT method (Fig. 4F, 4G).
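The idea behind CIBERSORT-style deconvolution is that each bulk expression profile m is modeled as a non-negative mixture of reference cell-type profiles, m ≈ S f, where the columns of S are cell-type signatures and f holds the cell fractions. CIBERSORT solves this with ν-support vector regression on its 22-cell-type signature matrix; the sketch below swaps in plain non-negative least squares, solved by projected gradient descent, and uses a small synthetic signature matrix, so it illustrates the principle rather than reproducing CIBERSORT's estimates.

```python
import numpy as np


def deconvolve_nnls(signature: np.ndarray, mixture: np.ndarray,
                    n_iter: int = 5000) -> np.ndarray:
    """Estimate cell-type fractions f >= 0 such that signature @ f ~= mixture.

    signature: genes x cell types reference profiles; mixture: one bulk sample (genes,).
    Minimizes 0.5 * ||S f - m||^2 subject to f >= 0, then rescales f to sum to 1.
    """
    S, m = signature, mixture
    step = 1.0 / np.linalg.norm(S, 2) ** 2      # 1 / Lipschitz constant of the gradient
    f = np.zeros(S.shape[1])
    for _ in range(n_iter):
        grad = S.T @ (S @ f - m)
        f = np.maximum(f - step * grad, 0.0)    # gradient step, then project onto f >= 0
    total = f.sum()
    return f / total if total > 0 else f


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    S = rng.uniform(0.1, 1.0, size=(50, 4))    # 50 signature genes x 4 cell types (synthetic)
    true_f = np.array([0.5, 0.3, 0.2, 0.0])
    print(np.round(deconvolve_nnls(S, S @ true_f), 3))
```

The non-negativity constraint is what makes the output interpretable as cell proportions; with noisy real data, CIBERSORT's regression formulation is designed to be more robust than this least-squares version.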
Violin plots showed that naïve B cells and regulatory T cells were increased in ARDS samples compared with control samples. In COVID-19 samples, activated mast cells, macrophages, resting NK cells and plasma cells tended to increase compared with normal samples, whereas memory B cells, CD8 T cells, resting memory CD4 T cells, activated NK cells and activated dendritic cells tended to decrease.

Single-cell sequencing of patients with sepsis with and without ARDS
We downloaded the single-cell RNA sequencing dataset (GSE168522) and selected one healthy subject and one AD patient from the dataset for analysis. First, we performed data quality control: we retained cells with less than 10% mitochondrial gene content and less than 3% erythrocyte gene content, and filtered out cells with more than 2000 or fewer than 200 detected genes (nFeature_RNA) (Fig. 5A-C). The samples were integrated and batch effects corrected with the "Harmony" package (Fig. 5D), after which cells from the different samples overlapped well. We then chose the number of PCs at which the elbow-plot curve flattened, used the top 10 dimensions for subsequent analysis, and visualized the reduced data with UMAP and t-SNE. We further clustered the cells using the FindClusters function; the proportions of monocytes, B cells, T cells, NK cells and platelets were increased in the sepsis-ARDS group (sepsis complicated by ARDS) relative to the sepsis-only group. Cell-proportion analysis likewise showed that T cells, monocytes and B cells were more abundant in patients with sepsis and ARDS, and that NK cells tended to be more numerous in these patients (Fig. 6A, 6B). We then examined the expression of the five genes previously identified by machine learning and diagnostic modeling in the sepsis-ARDS and sepsis-only single-cell data. TCF4, HK1 and HIST1H2BK were detected in all five cell clusters, whereas KIF14 was detected mainly in monocytes (Fig. 6C). TCF4-expressing cells were concentrated mainly in B cells and monocytes, HK1-expressing cells mainly in monocytes and T cells, and HIST1H2BK-expressing cells mainly in monocytes. TCF4 and HK1 were highly expressed in both groups, with higher expression in the sepsis-ARDS group than in the sepsis-only group; HIST1H2BK and KIF14 were expressed at low levels in both groups, and KIF14 was barely detectable (Fig. 6D). Owing to the sample size and the sequencing method, OLFM4 was not detected in this single-cell dataset. Next, we analyzed the cell proportions and expression of the remaining four genes. TCF4 expression was elevated in B cells, monocytes and platelets in the sepsis-ARDS group, and in B cells in the sepsis-only group, where the percentage of expressing cells was smaller than in the sepsis-ARDS group. HK1 expression was elevated in monocytes, NK cells and platelets in the sepsis-ARDS group, with a larger percentage of expressing cells than in the sepsis-only group (Fig. 7A). HIST1H2BK expression was elevated in platelets in the sepsis-ARDS group, with a greater percentage of expressing cells than in the sepsis-only group (Fig. 7B). Previous studies have shown that oxidative stress plays an important role in the pathogenesis of sepsis and ARDS, and that oxidative stress and the inflammatory response interact to promote disease progression. Single-sample gene set enrichment analysis (ssGSEA) of metabolic pathways showed that oxidation scores differed between the two groups and were higher in the sepsis-ARDS group than in the sepsis-only group (Fig. 7C), suggesting more intense oxidative stress and more severe disease in patients with ARDS. Because TCF4 and HK1 showed higher expression and larger percentages of expressing cells across the five cell clusters, they were selected for further analysis.
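The ssGSEA score used for the oxidation comparison can be sketched for a single sample or cell. Following the original ssGSEA formulation, genes are ranked by expression, a weighted running sum rewards genes in the set (weight = rank^α, here α = 0.25) and penalizes the others, and the score is the sum of the differences over all positions. The oxidation-related gene set used in the study is not specified, so the gene names below are hypothetical, and the across-sample normalization that some implementations apply is omitted.

```python
import numpy as np


def ssgsea_score(expr: np.ndarray, genes: list[str], gene_set: set[str],
                 alpha: float = 0.25) -> float:
    """Single-sample GSEA enrichment score of `gene_set` for one sample.

    expr: expression values aligned with `genes` (one cell or sample).
    """
    n = len(genes)
    ranks = np.empty(n)
    ranks[np.argsort(expr, kind="stable")] = np.arange(1, n + 1)  # n = highest expression
    order = np.argsort(-ranks)                                      # walk from top-ranked gene down
    in_set = np.array([genes[i] in gene_set for i in order])
    if in_set.all() or not in_set.any():
        raise ValueError("gene_set must be a non-empty proper subset of genes")
    weights = np.where(in_set, ranks[order] ** alpha, 0.0)
    p_hit = np.cumsum(weights) / weights.sum()      # weighted fraction of set genes seen so far
    p_miss = np.cumsum(~in_set) / np.sum(~in_set)   # fraction of other genes seen so far
    return float(np.sum(p_hit - p_miss))


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(10)]
    expr = np.arange(10.0)                                  # g9 is the most highly expressed
    print(ssgsea_score(expr, genes, {"g9", "g8", "g7"}))    # positive: set genes at the top
    print(ssgsea_score(expr, genes, {"g0", "g1", "g2"}))    # negative: set genes at the bottom
```

A higher score means the gene set is concentrated among the most highly expressed genes of that sample, which is how a pathway-level "oxidation score" can be compared between groups.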
TCF4 and HK1 expression was increased in monocytes and B cells, and the co-expression of the two genes overlapped more in monocytes and B cells of patients with sepsis and ARDS than in those of patients without ARDS (Fig. 7D). Correlation analysis showed a high correlation between TCF4 and the oxidation score in B cells of patients with ARDS and in monocytes of patients without ARDS (Fig. 7E).

Discussion
The SARS-CoV-2 pandemic has not only focused attention on COVID-19 but has also stimulated in-depth exploration of the complex relationship between COVID-19 and acute respiratory distress syndrome (ARDS) (16, 17). Our research aimed to uncover biological mechanisms that may be shared by these two diseases, particularly similarities in gene expression and immune responses. By combining machine learning algorithms and single-cell sequencing analyses, we identified a potential link between ARDS and COVID-19 and highlighted several key genes, such as TCF4 and HK1, that may play critical roles in the pathogenesis of both diseases. We identified differentially expressed genes (DEGs) that are co-up-regulated and co-down-regulated in ARDS and COVID-19, suggesting that they play important roles in the pathophysiology of these diseases. Our enrichment analysis revealed the major biological functions and pathways involving these genes, providing insight into both diseases. For example, KEGG pathway analysis revealed that these common DEGs are closely associated with key biological processes such as the inflammatory response and apoptosis, which play a central role in the severe pathophysiology of ARDS and COVID-19.
In addition, we used two machine learning techniques, random forest (RF) and the least absolute shrinkage and selection operator (LASSO), to refine the identification of potential hub genes. RF provided an initial screen of candidate diagnostic genes, and LASSO was then used for feature reduction to identify the key genes common to COVID-19 and ARDS, ultimately yielding five genes: HIST1H2BK, TCF4, OLFM4, KIF14 and HK1. These genes were highly significant in our model and have been reported to play key roles in processes such as the immune response, inflammation and cell death. In particular, TCF4 displays hub gene properties associated with immune regulation in the context of ARDS and COVID-19 and shows altered expression patterns under various inflammatory conditions. This is consistent with recent findings: one study showed that the combination of hCMSCs and liraglutide significantly improved therapeutic efficacy in ALI via the cAMP/PKAc/β-catenin signaling pathway, in which TCF4 plays a central role (18). In addition, gene expression analysis of SARS-CoV-2 infection suggested that TCF4 may be associated with the cardiovascular complications of COVID-19, particularly with regard to vascular function (19). Furthermore, a study of pediatric patients with COVID-19 identified autoantibodies against TCF4, highlighting the potential importance of TCF4 in regulating immune responses (20). Taken together, these findings point to a central role for TCF4 in the pathophysiology of ARDS and COVID-19 and support further exploration of its function and therapeutic potential. The HK1 gene is also of particular interest in the context of ARDS and COVID-19.
Notably, one study identified unique SARS-CoV-2 phylogenetic clusters, mostly associated with HK1, that coincided with the large COVID-19 outbreak in Hong Kong in July 2020 (21). In addition, a study using 18F-FDG PET revealed a critical role for HK1 in neutrophil activation and lung inflammation, suggesting that increased HK1 activity is associated with increased neutrophil glucose uptake and migration, a key factor in the development of ALI/ARDS (22). These findings emphasize the importance of HK1 in inflammatory and immune responses and highlight its potential value in molecular imaging and early diagnosis. OLFM4 may also play an important role in the pathophysiology of ARDS and COVID-19, particularly in the inflammatory response and immune regulation. This is supported by the literature: one study constructed a predictive model based on transcriptional biomarkers and clinical parameters and emphasized the importance of OLFM4 in predicting sepsis-induced ARDS (23), and a global gene expression and docking study confirmed the overexpression of OLFM4 in COVID-19 infection (24). Although KIF14 has been implicated in COVID-19, particularly in pathogenesis shared with digestive system cancers, its role in ARDS remains unclear (25). This gap highlights the complexity of gene expression across disease conditions and points to a new direction for future studies of the function of KIF14 in ARDS. To validate our findings and assess the diagnostic potential of these genes, we constructed and validated XGBoost-based diagnostic models. The key genes HIST1H2BK, OLFM4, KIF14 and HK1 performed well in these models, suggesting potential value in the diagnosis of ARDS and COVID-19.
The expression levels of these genes were strongly correlated with disease severity and patient outcomes, further emphasizing their importance for clinical application. We also investigated the role of immune cells in ARDS and COVID-19, in particular their infiltration patterns and functional status during the disease process. Using CIBERSORT, we observed changes in the distribution of immune cell subpopulations in disease samples that were consistent with the expression patterns of the key genes we identified. This reveals the complexity of the immune response and underscores the need for further studies of the immunopathology of ARDS and COVID-19. Our study has several limitations. First, owing to the lack of a comprehensive single-cell dataset, our analysis relied heavily on available public data and previously published studies. Second, we did not directly examine how specific genes affect intercellular communication or the tissue microenvironment, which may be an important direction for future research. Finally, although our model shows diagnostic potential, these findings need to be validated in a broader patient population. In conclusion, our study reveals a potential molecular link between ARDS and COVID-19, particularly in terms of gene expression and immune response. The key genes and pathways we identified provide insight into these diseases and may guide future therapeutic strategies and drug development. However, further research is needed to determine how these genes and pathways affect the disease process and to assess their potential for clinical application.

Declarations
Conflict of Interest
Chuanxi Tian, Yikun Guo, Huifang Guan, Kaile Ma, Rui Hao, Wei Zhu, Jinyue Zhao, and Min Li declare that they have no competing interests.
Author Contributions
Conceptualization: C.X.T., M.L., and W.Z.; Formal analysis and Data Curation: Y.K.G., C.X.T., and J.Y.Z.; Writing - Original Draft: C.X.T., Y.K.G., and H.F.G.; Writing - Review & Editing: C.X.T., Y.K.G., H.F.G., M.L., K.L.M., and R.H. All authors read and approved the final manuscript.
Funding
This work was supported by the "China-Austria Joint Laboratory Construction and Joint Research on Traditional Chinese Medicine for Prevention and Treatment of Major Infectious Diseases along the Belt and Road" project (Project No. 2020YFE0205100), under the National Key Program for Strategic International Science and Technology Innovation and Cooperation.
Acknowledgments
None.
Data availability statement
The datasets generated and/or analysed during the current study are available in the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/): the ARDS datasets (GSE32707 and GSE66890) and the COVID-19 dataset (GSE213313).
References
1. Thompson BT, Chambers RC, Liu KD. Acute Respiratory Distress Syndrome. N Engl J Med (2017) 377:562–572. doi: 10.1056/NEJMra1608077
2. Tang M, Li N. Pathophysiological mechanism of acute respiratory distress syndrome and research progress on diagnostic biomarkers of ARDS. China Journal of Modern Medicine (2022) 32:1–6.
3. Bellani G, Laffey JG, Pham T, Fan E, Brochard L, Esteban A, Gattinoni L, van Haren F, Larsson A, McAuley DF, et al. Epidemiology, Patterns of Care, and Mortality for Patients With Acute Respiratory Distress Syndrome in Intensive Care Units in 50 Countries. JAMA (2016) 315:788–800. doi: 10.1001/jama.2016.0291
4. Majumder J, Minko T. Recent Developments on Therapeutic and Diagnostic Approaches for COVID-19. AAPS J (2021) 23:14. doi: 10.1208/s12248-020-00532-2
5. Lippi G, Sanchis-Gomar F, Henry BM. COVID-19 and its long-term sequelae: what do we know in 2023? Pol Arch Intern Med (2023) 133:16402. doi: 10.20452/pamw.16402
6. Zheng J, Miao J, Guo R, Guo J, Fan Z, Kong X, Gao R, Yang L. Mechanism of COVID-19 Causing ARDS: Exploring the Possibility of Preventing and Treating SARS-CoV-2. Front Cell Infect Microbiol (2022) 12:931061. doi: 10.3389/fcimb.2022.931061
7. Quesada-Gomez JM, Entrenas-Castillo M, Bouillon R. Vitamin D receptor stimulation to reduce acute respiratory distress syndrome (ARDS) in patients with coronavirus SARS-CoV-2 infections: Revised Ms SBMB 2020_166. J Steroid Biochem Mol Biol (2020) 202:105719. doi: 10.1016/j.jsbmb.2020.105719
8. Shen Y, Liu J, Zhang L, Dong S, Zhang J, Liu Y, Zhou H, Dong W. Identification of Potential Biomarkers and Survival Analysis for Head and Neck Squamous Cell Carcinoma Using Bioinformatics Strategy: A Study Based on TCGA and GEO Datasets. Biomed Res Int (2019) 2019:7376034. doi: 10.1155/2019/7376034
9. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res (2013) 41:D991–995. doi: 10.1093/nar/gks1193
10. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics (2008) 9:559. doi: 10.1186/1471-2105-9-559
11. Breiman L. Random Forests. Machine Learning (2001) 45:5–32. doi: 10.1023/A:1010933404324
12. Tibshirani R. Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological) (1996) 58:267–288. doi: 10.1111/j.2517-6161.1996.tb02080.x
13. Engebretsen S, Bohlin J. Statistical predictions with glmnet. Clin Epigenetics (2019) 11:123. doi: 10.1186/s13148-019-0730-1
14. Chen B, Khodadoust MS, Liu CL, Newman AM, Alizadeh AA. Profiling Tumor Infiltrating Immune Cells with CIBERSORT. Methods Mol Biol (2018) 1711:243–259. doi: 10.1007/978-1-4939-7493-1_12
15. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, Hao Y, Stoeckius M, Smibert P, Satija R. Comprehensive Integration of Single-Cell Data. Cell (2019) 177:1888–1902.e21. doi: 10.1016/j.cell.2019.05.031
16. Meyer NJ, Gattinoni L, Calfee CS. Acute respiratory distress syndrome. Lancet (2021) 398:622–637. doi: 10.1016/S0140-6736(21)00439-6
17. Attaway AH, Scheraga RG, Bhimraj A, Biehl M, Hatipoğlu U. Severe covid-19 pneumonia: pathogenesis and clinical management. BMJ (2021) 372:n436. doi: 10.1136/bmj.n436
18. Feng Y, Wang L, Ma X, Yang X, Don O, Chen X, Qu J, Song Y. Effect of hCMSCs and liraglutide combination in ALI through cAMP/PKAc/β-catenin signaling pathway. Stem Cell Res Ther (2020) 11:2. doi: 10.1186/s13287-019-1492-6
19. Jha PK, Vijay A, Halu A, Uchida S, Aikawa M. Gene Expression Profiling Reveals the Shared and Distinct Transcriptional Signatures in Human Lung Epithelial Cells Infected With SARS-CoV-2, MERS-CoV, or SARS-CoV: Potential Implications in Cardiovascular Complications of COVID-19. Front Cardiovasc Med (2020) 7:623012. doi: 10.3389/fcvm.2020.623012
20. Bartley CM, Johns C, Ngo TT, Dandekar R, Loudermilk RL, Alvarenga BD, Hawes IA, Zamecnik CR, Zorn KC, Alexander JR, et al. Anti-SARS-CoV-2 and Autoantibody Profiles in the Cerebrospinal Fluid of 3 Teenaged Patients With COVID-19 and Subacute Neuropsychiatric Symptoms. JAMA Neurol (2021) 78:1503–1509. doi: 10.1001/jamaneurol.2021.3821
21. To KK-W, Chan W-M, Ip JD, Chu AW-H, Tam AR, Liu R, Wu AK-L, Lung K-C, Tsang OT-Y, Lau DP-L, et al. Unique Clusters of Severe Acute Respiratory Syndrome Coronavirus 2 Causing a Large Coronavirus Disease 2019 Outbreak in Hong Kong. Clin Infect Dis (2021) 73:137–142. doi: 10.1093/cid/ciaa1119
22. Rodrigues RS, Bozza FA, Hanrahan CJ, Wang L-M, Wu Q, Hoffman JM, Zimmerman GA, Morton KA. 18F-fluoro-2-deoxyglucose PET informs neutrophil accumulation and activation in lipopolysaccharide-induced acute lung injury. Nucl Med Biol (2017) 48:52–62. doi: 10.1016/j.nucmedbio.2017.01.005
23. Yao R-Q, Shen Z, Ma Q-M, Ling P, Wei C-R, Zheng L-Y, Duan Y, Li W, Zhu F, Sun Y, et al. Combination of transcriptional biomarkers and clinical parameters for early prediction of sepsis induced acute respiratory distress syndrome. Front Immunol (2022) 13:1084568. doi: 10.3389/fimmu.2022.1084568
24. A J, N A, K R. Global Gene Expression and Docking Profiling of COVID-19 Infection. Front Genet (2022) 13:870836. doi: 10.3389/fgene.2022.870836
25. Xiong Z, Yang Y, Li W, Lin Y, Huang W, Zhang S. Exploring Key Biomarkers and Common Pathogenesis of Seven Digestive System Cancers and Their Correlation with COVID-19. Curr Issues Mol Biol (2023) 45:5515–5533. doi: 10.3390/cimb45070349
Additional Declarations
No competing interests reported.
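The differential-expression screen in the Materials and Methods uses limma with |log2(FC)| > 1 and p < 0.05, followed by a Venn-diagram intersection of the up- and down-regulated genes of ARDS and COVID-19. Once the per-gene statistics exist, that screen reduces to a threshold-and-intersect step. The Python sketch below illustrates only that step. It is not the authors' R pipeline. It assumes limma has already produced a (log2FC, p-value) pair per gene, and the gene names and values are hypothetical toy data.

```python
# Minimal sketch of DEG thresholding plus a Venn-style intersection.
# Assumption: per-gene (log2 fold change, p-value) pairs were already computed
# (e.g. by limma); all gene names and numbers here are hypothetical.

def split_degs(results, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return (up, down) gene sets passing |log2FC| > lfc_cutoff and p < p_cutoff."""
    up, down = set(), set()
    for gene, (log2fc, p_value) in results.items():
        if p_value < p_cutoff and abs(log2fc) > lfc_cutoff:
            if log2fc > 0:
                up.add(gene)
            else:
                down.add(gene)
    return up, down


def shared_degs(results_a, results_b, **cutoffs):
    """Intersect the up-regulated and down-regulated sets of two comparisons separately."""
    up_a, down_a = split_degs(results_a, **cutoffs)
    up_b, down_b = split_degs(results_b, **cutoffs)
    return up_a & up_b, down_a & down_b


if __name__ == "__main__":
    ards = {"GENE_A": (1.8, 0.001), "GENE_B": (1.2, 0.01), "GENE_C": (-1.4, 0.02),
            "GENE_D": (0.4, 0.001), "GENE_E": (2.1, 0.30)}
    covid = {"GENE_A": (1.1, 0.03), "GENE_B": (0.9, 0.01), "GENE_C": (-2.0, 0.001),
             "GENE_D": (1.5, 0.01), "GENE_E": (1.7, 0.01)}
    common_up, common_down = shared_degs(ards, covid)
    print(sorted(common_up), sorted(common_down))  # ['GENE_A'] ['GENE_C']
```

Up- and down-regulated genes are intersected separately, matching the separate up/down Venn diagrams of Fig. 1C and 1D. As a result, a gene that moves in opposite directions in the two diseases is not counted as shared.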
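The XGBoost diagnostic models are judged by ROC AUC and PR AUC. The minimal Python sketch below shows what each number measures. ROC AUC is computed as the probability that a randomly chosen disease sample scores higher than a randomly chosen control, with ties counted as one half. PR AUC is computed as average precision, one common step-wise estimator. The sketch assumes binary labels (1 = disease) and model scores where higher means more likely disease, and the data are toy values. Other implementations may estimate PR AUC differently, for example by interpolating the curve.

```python
# Minimal sketch of ROC AUC and PR AUC (average precision).
# Assumptions: labels are 1 (disease) / 0 (control); higher score = more likely disease.

def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive; tied scores are ordered arbitrarily."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive sample")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` samples
    return total / n_pos


if __name__ == "__main__":
    y = [1, 1, 0, 1, 0, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
    print(round(roc_auc(y, s), 3), round(average_precision(y, s), 3))  # 0.889 0.917
```

Under this definition, an ROC AUC of 1, as reported for GSE213313, means that every COVID-19 sample received a higher score than every control sample in that dataset.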
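The single-cell quality-control rules can be expressed as a per-cell filter: at most 10% mitochondrial and 3% erythroid counts per cell, plus lower and upper bounds on the number of detected genes. The sketch below is a Python stand-in for what is normally done on Seurat metadata, under two stated assumptions. Mitochondrial genes are those prefixed "MT-". Erythroid genes are matched with the hemoglobin pattern ^HB[^P], a common convention that the study does not specify. The default upper gene bound follows the Methods (5000); the Results report 2000, which can be passed as max_features.

```python
import re

# Assumption: a common hemoglobin-gene pattern (matches HBA1, HBB, ... but not HBP1);
# the study only says "erythroid genes".
HB_PATTERN = re.compile(r"^HB[^P]")


def qc_metrics(counts):
    """Return (n_features, percent_mt, percent_hb) for one cell's gene -> count map."""
    total = sum(counts.values())
    n_features = sum(1 for c in counts.values() if c > 0)
    if total == 0:
        return n_features, 0.0, 0.0
    mt = sum(c for gene, c in counts.items() if gene.startswith("MT-"))
    hb = sum(c for gene, c in counts.items() if HB_PATTERN.match(gene))
    return n_features, 100.0 * mt / total, 100.0 * hb / total


def passes_qc(counts, min_features=200, max_features=5000, max_mt=10.0, max_hb=3.0):
    """Keep a cell if min_features < n_features < max_features,
    percent mitochondrial <= max_mt and percent hemoglobin <= max_hb."""
    n_features, pct_mt, pct_hb = qc_metrics(counts)
    return (min_features < n_features < max_features
            and pct_mt <= max_mt
            and pct_hb <= max_hb)


if __name__ == "__main__":
    # Hypothetical toy cells; feature bounds are relaxed because each has only 4 genes.
    cells = {
        "cell_ok": {"CD3E": 5, "CD14": 3, "MT-CO1": 1, "LYZ": 11},
        "cell_high_mt": {"CD3E": 2, "MT-CO1": 3, "MT-ND1": 1, "LYZ": 4},
        "cell_high_hb": {"HBB": 1, "CD14": 9, "LYZ": 10, "CD3E": 10},
    }
    kept = [name for name, c in cells.items()
            if passes_qc(c, min_features=2, max_features=10)]
    print(kept)  # ['cell_ok']
```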
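The Statistical Analysis subsection relies on Pearson or Spearman correlation, and the Results relate TCF4 expression to the oxidative score (Fig. 7E). Spearman's rho is the Pearson correlation of ranks, with tied values sharing their average rank. The minimal Python sketch below shows that construction. It computes only the coefficient, not a p-value (R's cor.test would supply one), and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-cell values, not data from the study.
    tcf4 = [0.1, 0.5, 0.9, 1.4]
    oxidative_score = [0.2, 0.3, 0.8, 0.7]
    print(round(spearman(tcf4, oxidative_score), 3))  # 0.8
```

Because only ranks enter the calculation, Spearman's rho captures any monotonic relationship. For example, y = x² on positive x gives rho = 1 even though the relationship is not linear.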
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-3892523","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":271050272,"identity":"95ccea90-1853-40f4-a15d-715b0d57b111","order_by":0,"name":"Chuanxi Tian","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABEklEQVRIie2RP0vDQBTA7wgkywM7XijETyBcCbSD2n6VHAGzOknHOwLn0upqv0WmolvCDVnyAVK6JIuumcQu1WuHosNF3BzuBw/e8H68fwhZLP8RRwfmhwyLJppfwZknRNv1GPBNSWlX3QT+QqUh6VPQSUGuv5IqpHUiB9CjzDzntdk9q/MJifkQXIdlm1YigqbBBTcN5k5Gy0qNXhYFDwFcttoy2dyiOBznxl3GQywVzkrBYyDAHrbsnhKUs7VR8d4PyixTmCughPFNIQn0KnDswnQXIZ4iGg5q/Iui4M5fyiTOqiJFXR7pIzN9ZGrexXss12QnL6+zOnn7YPtP/cqybbv5NDApJujfyi0Wi8Xyky/r0GIntyX0RwAAAABJRU5ErkJggg==","orcid":"","institution":"Guang’anmen Hospital, China Academy of Chinese Medical Sciences","correspondingAuthor":true,"prefix":"","firstName":"Chuanxi","middleName":"","lastName":"Tian","suffix":""},{"id":271050273,"identity":"c1e4b386-02a7-43d2-b62e-e8c7398defa6","order_by":1,"name":"Yikun Guo","email":"","orcid":"","institution":"Beijing University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Yikun","middleName":"","lastName":"Guo","suffix":""},{"id":271050274,"identity":"e45a095c-cf67-42de-aec6-44ae08bb7ded","order_by":2,"name":"Huifang Guan","email":"","orcid":"","institution":"Changchun University of Chinese 
Medicine","correspondingAuthor":false,"prefix":"","firstName":"Huifang","middleName":"","lastName":"Guan","suffix":""},{"id":271050275,"identity":"374f2f32-5f25-484d-9538-37e6ec91f188","order_by":3,"name":"Kaile Ma","email":"","orcid":"","institution":"China Academy of Chinese Medical Sciences","correspondingAuthor":false,"prefix":"","firstName":"Kaile","middleName":"","lastName":"Ma","suffix":""},{"id":271050276,"identity":"f03b11eb-aff2-46b4-80bd-17e128e11714","order_by":4,"name":"Rui Hao","email":"","orcid":"","institution":"China Academy of Chinese Medical Sciences","correspondingAuthor":false,"prefix":"","firstName":"Rui","middleName":"","lastName":"Hao","suffix":""},{"id":271050277,"identity":"db452612-8edd-4850-97dd-41a2896be40f","order_by":5,"name":"Wei Zhu","email":"","orcid":"","institution":"Department of Traditional Chinese Medicine,The 942th Hospital of Joint Logistics Support force of Chinese People's Liberation Army","correspondingAuthor":false,"prefix":"","firstName":"Wei","middleName":"","lastName":"Zhu","suffix":""},{"id":271050278,"identity":"105ce87e-b346-49d1-ad3c-302bc430b608","order_by":6,"name":"Jinyue Zhao","email":"","orcid":"","institution":"Changchun University of Chinese Medicine","correspondingAuthor":false,"prefix":"","firstName":"Jinyue","middleName":"","lastName":"Zhao","suffix":""},{"id":271050279,"identity":"e4cddc89-30f0-4fc9-b3cc-5a72a1660d25","order_by":7,"name":"Min Li","email":"","orcid":"","institution":"Guang’anmen Hospital, China Academy of Chinese Medical Sciences","correspondingAuthor":false,"prefix":"","firstName":"Min","middleName":"","lastName":"Li","suffix":""}],"badges":[],"createdAt":"2024-01-24 
01:44:12","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-3892523/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-3892523/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":50750381,"identity":"91139d2d-23e0-4f5c-bef3-41342723b417","added_by":"auto","created_at":"2024-02-06 17:30:11","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":133827,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eDifferential expression analysis and enrichment analysis of differentially expressed genes in ARDS and COVID-19 datasets. \u003c/strong\u003eA, B Principal component analysis plots. C Up-regulated differential gene intersection. D Down-regulated differential gene intersection. E Up-regulated differential gene GO analysis. F Down-regulated differential gene GO analysis\u003c/p\u003e","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-3892523/v1/1e1cecd9fae079380dacf6bb.png"},{"id":50750384,"identity":"403bdcdf-4252-4764-9de0-89dc70295624","added_by":"auto","created_at":"2024-02-06 17:30:11","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":176368,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eWeighted gene co-expression network analysis of ARDS and COVID-19. \u003c/strong\u003eA, B WGCNA co-expression network. C, D Gene function clustering map. E Heat map of module-trait relationships. F Distribution map of ARDS positively associated genes. 
G Distribution of COVID-19 positively associated genes\u003c/p\u003e","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-3892523/v1/ea1b0f72a8415658d8a8cf20.png"},{"id":50750386,"identity":"7d80b33c-a741-40c0-9a5b-b03c79f2a36e","added_by":"auto","created_at":"2024-02-06 17:30:11","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":243882,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eModule gene enrichment analysis of ARDS and COVID-19 and machine learning analysis of intersecting genes in ARDS and COVID-19.\u003c/strong\u003e A Bar graph of GO analysis. B Random forest analysis. C Gene importance visualization. D, E LASSO regression model analysis. F XGBoost diagnostic model for ARDS. G XGBoost diagnostic model for COVID-19\u003c/p\u003e","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-3892523/v1/c361296a76d501c0f8f8d685.png"},{"id":50750388,"identity":"26222c0d-e508-4be4-bbc1-e7196d0d6cea","added_by":"auto","created_at":"2024-02-06 17:30:11","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":311854,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAnalysis of the diagnostic value of hub genes and immune infiltration analysis of the ARDS and COVID-19 datasets. \u003c/strong\u003eA HIST1H2BK. B OLFM4. C KIF14. D HK1. E TCF4. F Immune infiltration analysis of sepsis samples and samples with sepsis combined with ARDS. 
G Immune infiltration analysis of normal and COVID-19 samples\u003c/p\u003e","description":"","filename":"Onlinefloatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-3892523/v1/13613c446530218a166f6bdf.png"},{"id":50750383,"identity":"2cb4de93-29e8-486f-8cfa-ed47a70ffe54","added_by":"auto","created_at":"2024-02-06 17:30:11","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":51725,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eQuality control and annotation of single-cell sequencing data. \u003c/strong\u003eA Single-cell sequencing cell content. B Genes (features), counts and mitochondrial gene percentages before quality control. C Genes (features), counts and mitochondrial gene percentages after quality control. D Sample merging, batch-effect correction and selection of PCs\u003c/p\u003e","description":"","filename":"Onlinefloatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-3892523/v1/3f541cb5495046b798e93d62.png"},{"id":50750385,"identity":"fb4d299d-61c6-4de2-bb1d-bd7175716307","added_by":"auto","created_at":"2024-02-06 17:30:11","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":63249,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSingle-cell sequencing analysis. \u003c/strong\u003eA Cell annotation and UMAP visualization of single-cell sequencing data; different colors indicate different clusters (blue: B cell; orange: monocyte; green: NK cell; purple: platelet; grey: T cell). B Proportions of each cell type. C Expression levels of HIST1H2BK, TCF4, KIF14, and HK1 in five cell clusters. 
D Expression levels of HIST1H2BK, TCF4, KIF14, and HK1 in patients with sepsis and sepsis combined with ARDS\u003c/p\u003e","description":"","filename":"Onlinefloatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-3892523/v1/717d6ffd75fd95d04bd6c40a.png"},{"id":50751454,"identity":"a9be9af3-a8af-416e-8ab0-9b1c8abcc7ba","added_by":"auto","created_at":"2024-02-06 17:38:11","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":44509,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSingle-cell sequencing analysis of hub genes.\u003c/strong\u003e A Proportion of HIST1H2BK, TCF4, KIF14, and HK1 in different sample cell clusters. B Expression levels of HIST1H2BK, TCF4, KIF14, and HK1 in different sample cell clusters. C Analysis of oxidative scores in patients with sepsis and sepsis combined with ARDS. D Expression levels of TCF4 and HK1 in different cell clusters from patients with sepsis and sepsis combined with ARDS. 
E Correlation analysis between TCF4 and oxidative score\u003c/p\u003e","description":"","filename":"Onlinefloatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-3892523/v1/19fa5d901cd856caaadfdd28.png"},{"id":67178163,"identity":"b119249a-d0f2-4b21-93eb-54f00433cd7f","added_by":"auto","created_at":"2024-10-22 05:25:32","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1981222,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3892523/v1/f3fd5e19-b9a4-4643-b177-3587e4893c15.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Combination of multiple omics and machine learning identifies diagnostic genes for ARDS and COVID-19","fulltext":[{"header":"Introduction","content":"\u003cp\u003eAcute respiratory distress syndrome (ARDS) is a clinical syndrome characterized by progressive respiratory distress and refractory hypoxemia, with diffuse alveolar infiltrates on chest radiography (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e). ARDS has a rapid onset and progression and is one of the leading causes of death in critically ill patients (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e). Patients with ARDS account for 10.4% of ICU admissions, with mortality rates as high as 35%-45% (\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e). 
The pathogenesis of ARDS is complex and the syndrome is difficult to treat, so more in-depth mechanistic studies are needed to identify new biomarkers for early diagnosis, treatment and prognosis.\u003c/p\u003e \u003cp\u003eCOVID-19 is an acute respiratory infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2); it has spread worldwide and seriously affected global economic and social development (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e). The main manifestations of COVID-19 are fever, dry cough and malaise. Some patients also have upper respiratory and gastrointestinal symptoms such as nasal congestion, runny nose and diarrhea, and the disease can affect multiple organs, including the cardiovascular system, gastrointestinal tract, liver and kidneys. The clinical burden of COVID-19 may extend well beyond the acute phase: long-term sequelae can substantially reduce quality of life and represent a major global health challenge (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e).\u003c/p\u003e \u003cp\u003ePatients with severe or critical COVID-19 can progress rapidly to ARDS. The incidence and severity of COVID-19-related ARDS are positively correlated, and these patients have a worse prognosis than patients with ARDS from other causes (\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e). Late-stage COVID-19-induced ARDS is difficult to control, with mortality rates ranging from 26% to 61.5%.
Early detection, diagnosis and treatment are therefore critical for improving the prognosis of patients with COVID-19-induced ARDS, who present with lung inflammation, thick airway mucus secretion, elevated proinflammatory cytokine levels, lung injury and microthrombosis (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e). It is thus important to explore the mechanisms by which COVID-19 progresses to ARDS, and identifying related diagnostic, therapeutic and prognostic biomarkers is an urgent need.\u003c/p\u003e \u003cp\u003eWith the continuous development of computing and sequencing technologies, it has become possible to analyze disease-related genes at the molecular level. Bioinformatics, which combines molecular biology and information technology, has been widely used in recent studies (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e). In this study, we used several bioinformatics methods to screen and analyze the genes shared by ARDS and COVID-19, aiming to provide new biomarkers and a theoretical basis for the diagnosis and treatment of both diseases.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Differential Gene Screening and Enrichment Analysis\u003c/h2\u003e \u003cp\u003eWe collected microarray data from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.ncbi.nlm.nih.gov/geo/\u003c/span\u003e\u003cspan address=\"https://www.ncbi.nlm.nih.gov/geo/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) (\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e) for the ARDS datasets (GSE32707 and GSE66890) and the COVID-19 dataset (GSE213313). 
The data were filtered, background-corrected, log2-transformed and normalized in R. The \"SVA\" package was used to merge GSE32707 and GSE66890 and remove the batch effect between them. Principal component analysis (PCA) was then performed on the samples of the three datasets to inspect how they clustered. The \"limma\" package was used to identify the differentially expressed genes (DEGs) of ARDS and COVID-19, with |log2(FC)| \u0026gt; 1 and p \u0026lt; 0.05 as the screening criteria. We then identified the up-regulated and down-regulated genes shared by ARDS and COVID-19 using Venn diagrams and performed Gene Ontology (GO) enrichment analysis on them.\u003c/p\u003e \u003cp\u003e \u003cb\u003eWeighted Gene Co-expression Network Analysis\u003c/b\u003e \u003c/p\u003e \u003cp\u003eWe used the \"WGCNA\" package to construct co-expression gene modules. Genes with expression \u0026gt; 0.5 were retained and an optimal soft threshold was selected. The modules most highly correlated with ARDS and COVID-19, together with their member genes, were then identified for further analysis (\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cb\u003eMachine Learning Candidate Gene Screening\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThe random forest (RF) algorithm and least absolute shrinkage and selection operator (LASSO) logistic regression were used to screen key genes from the intersection of the DEGs and the WGCNA module genes (\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e). RF is an ensemble prediction method that can handle a large number of input variables and estimate the importance of each variable. LASSO is a penalized regression method that performs well on high-dimensional data. 
We first used the RF algorithm to retain genes with importance scores greater than 0.5. LASSO was then applied to these genes to further reduce dimensionality and obtain the final diagnostic genes, and an ROC curve was plotted for each gene. RF analysis and LASSO regression were performed using the R packages \"randomForest\" and \"glmnet\" (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cb\u003eHub gene diagnostic model construction\u003c/b\u003e \u003c/p\u003e \u003cp\u003eExtreme Gradient Boosting (XGBoost) is a widely used supervised ensemble learning algorithm. It scales well and provides convenient tools for model visualization and optimization. The expression values of the hub genes were used as features to train the XGBoost models [16]. We first selected the ARDS (GSE32707) and COVID-19 (GSE213313) datasets as the training set, and used the ARDS (GSE66890) dataset for validation. The diagnostic performance of the model was evaluated with receiver operating characteristic (ROC) and precision-recall (PR) curves and the area under each curve (AUC).\u003c/p\u003e \u003cp\u003e \u003cb\u003eAnalysis of immune infiltration\u003c/b\u003e \u003c/p\u003e \u003cp\u003eCIBERSORT is a deconvolution algorithm widely used to estimate the proportions of different immune cell types from bulk gene expression profiles (\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e). It uses a reference signature matrix for 22 immune cell types, including B cells, plasma cells, T cells, natural killer cells, monocytes, macrophages, dendritic cells, mast cells, eosinophils and neutrophils, to estimate the relative abundance of each cell type in a sample. We ran CIBERSORT in R. 
We compared immune cell infiltration in peripheral blood mononuclear cell (PBMC) samples between the disease and normal groups.\u003c/p\u003e \u003cp\u003e \u003cb\u003eSingle-cell sequencing quality control and dimensionality reduction\u003c/b\u003e \u003c/p\u003e \u003cp\u003eWe downloaded the single-cell RNA sequencing (scRNA-seq) dataset GSE151263 from the GEO database and analyzed it with the \"Seurat\" and \"SingleR\" packages. Cells with ≤ 10% of counts from mitochondrial genes and ≤ 3% from erythroid genes were retained, and cells with ≤ 200 or ≥ 5000 detected genes (nFeature_RNA) were removed. We then selected 3000 highly variable genes and performed dimensionality reduction and clustering. Based on the elbow plot, we chose the point at which the curve flattened and used the top 10 principal components for subsequent analysis. The reduced data were visualized with UMAP and t-SNE. Cell types were then annotated with immune cell reference labels using the \"SingleR\" package (\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e). Finally, we visualized the expression of the hub genes in different immune cell types using violin plots.\u003c/p\u003e \u003cp\u003e \u003cb\u003eStatistical Analysis\u003c/b\u003e \u003c/p\u003e \u003cp\u003eAll statistical tests were performed in R version 4.1.2. Differences between two groups were analyzed using the Wilcoxon test or Student's t-test. Correlations between variables were assessed using Pearson or Spearman correlation tests. 
Statistical significance was set at a two-tailed p \u0026lt; 0.05.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cp\u003e \u003cb\u003eIdentification of differentially expressed genes between ARDS and COVID-19\u003c/b\u003e \u003c/p\u003e\u003cp\u003ePrincipal component analysis (PCA) was used to visualize the distribution of the samples before and after batch-effect correction (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eA, \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eB). After correcting and normalizing the three datasets (GSE32707, GSE66890 and GSE213313), we identified 1114 DEGs in ARDS (575 up-regulated and 539 down-regulated) and 3587 DEGs in COVID-19 (1738 up-regulated and 1849 down-regulated). A Venn diagram of the DEGs shared by ARDS and COVID-19 showed 180 overlapping up-regulated DEGs and 61 overlapping down-regulated DEGs (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eC, \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eD).\u003c/p\u003e\u003cp\u003e \u003cb\u003eEnrichment analysis\u003c/b\u003e \u003c/p\u003e\u003cp\u003eTo explore the biological functions and pathways of the common DEGs, we performed GO enrichment analysis (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eE, \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eF). Up-regulated DEGs were mainly enriched in mitotic cell cycle, cytoplasmic vesicle lumen and protein kinase regulator activity. Down-regulated DEGs were mainly enriched in cellular defense response, endocytic vesicle and T cell receptor binding.\u003c/p\u003e\u003cp\u003e \u003cb\u003eWeighted gene co-expression network analysis of ARDS and
COVID-19\u003c/b\u003e \u003c/p\u003e\u003cp\u003eWe constructed co-expression networks to explore the correlation between clinical traits and genes. Sample clustering was performed with the \"flashClust\" function. With a cut height of 75, 6 outlier samples were detected and removed, leaving 145 samples (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA, \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB).\u003c/p\u003e\u003cp\u003eThe \"pickSoftThreshold\" function of \"WGCNA\" was used to evaluate soft-thresholding powers from 1 to 30. A power of β = 5 was selected as the most appropriate soft threshold to ensure a scale-free network. The results showed that the optimal soft power value was 10, and a total of 11 modules were identified.\u003c/p\u003e\u003cp\u003eDynamic tree cutting and module eigengenes were used to construct clustering dendrograms (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC, \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eD), yielding 11 modules of genes with similar co-expression patterns. A heat map of module-trait relationships based on Spearman correlation coefficients was then plotted to relate each module to the disease traits (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eE). The purple-red module was positively correlated with both ARDS and COVID-19, although the correlations were weak (ARDS: r = 0.16, p = 0.05; COVID-19: r = 0.17, p = 0.04). 
The purple-red module contained genes positively associated with ARDS and genes positively associated with COVID-19 (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eF, \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eG). GO analysis of these module genes showed enrichment in three categories. For biological process (BP), the genes were mainly enriched in blood coagulation, wound healing and regulation of body fluid levels. For cellular component (CC), they were mainly enriched in the actin cytoskeleton, platelet α-granules and cytoplasmic vesicle lumen. For molecular function (MF), they were mainly associated with actin binding, adhesin binding and aminoglycans. KEGG analysis showed that they were mainly associated with focal adhesion, the PI3K-Akt signaling pathway and regulation of the actin cytoskeleton (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA).\u003c/p\u003e\u003cp\u003e \u003cb\u003eMachine Learning Identification of Intersecting Genes in ARDS and COVID-19\u003c/b\u003e \u003c/p\u003e\u003cp\u003eWe applied the RF algorithm to the intersecting genes of ARDS and COVID-19 and ranked them by importance.\u003c/p\u003e\u003cp\u003eThe importance of the top 30 genes was visualized (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB, \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC). LASSO was then used for further dimensionality reduction, yielding five final genes: HIST1H2BK, TCF4, OLFM4, KIF14 and HK1 (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eD, \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eE). We used XGBoost to construct diagnostic models from these candidate genes, with GSE66890 as the training set, and validated them in the GSE32707 dataset. 
In the ARDS training set GSE66890, the ROC AUC was 0.952 and the PR AUC was 0.961. In the validation set GSE32707, the ROC AUC was 0.725 and the PR AUC was 0.543. The ROC AUCs of the ARDS model therefore exceeded 0.7 in both datasets, indicating good diagnostic value, although the lower PR AUC in the validation set suggests more limited precision (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eF, \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eG). To test whether the model could also identify COVID-19 patients, we applied it to the COVID-19 dataset GSE213313. Both the ROC AUC and the PR AUC were 1, indicating high diagnostic performance for COVID-19 in this dataset (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eH).\u003c/p\u003e\u003cp\u003eWe further evaluated the diagnostic value of each of the five hub genes. HIST1H2BK (AUC = 0.802), OLFM4 (AUC = 0.716), KIF14 (AUC = 0.812) and HK1 (AUC = 0.740) all had ROC AUCs greater than 0.7, indicating high diagnostic value. TCF4 (AUC = 0.692) performed slightly worse (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA-E).\u003c/p\u003e\u003cp\u003e \u003cb\u003eImmune infiltration analysis\u003c/b\u003e \u003c/p\u003e\u003cp\u003eThe pathogenesis of both ARDS and COVID-19 involves inflammatory responses driven by overactivation of the immune system. We therefore performed immune infiltration analyses on the datasets of both diseases. Using CIBERSORT, we estimated the infiltration of 22 immune cell types and analyzed their association with disease (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eF-G). 
Violin plots showed that naïve B cells and regulatory T cells tended to increase in sepsis-combined ARDS samples compared with sepsis-only samples. In COVID-19 samples, activated mast cells, macrophages, resting NK cells and plasma cells tended to increase compared with normal samples. Memory B cells, CD8 T cells, resting memory CD4 T cells, activated NK cells and activated dendritic cells tended to decrease.\u003c/p\u003e\u003cp\u003e \u003cb\u003eSingle-cell sequencing in patients with combined ARDS and sepsis\u003c/b\u003e \u003c/p\u003e\u003cp\u003eWe downloaded the scRNA-seq dataset GSE168522 and selected one patient with sepsis alone and one patient with sepsis combined with ARDS for analysis. We first performed quality control. Cells with less than 10% mitochondrial gene counts and less than 3% erythrocyte gene counts were retained, and cells with more than 2000 or fewer than 200 detected genes (nFeature_RNA) were filtered out (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA-C). The samples were then merged and batch effects corrected with the \"Harmony\" package (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eD), after which the samples overlapped well. We selected the number of PCs at which the elbow curve flattened and used the top 10 dimensions for subsequent analysis. The result was visualized with UMAP and t-SNE.\u003c/p\u003e\u003cp\u003eWe then clustered the cells using the FindClusters function. The proportions of monocytes, B cells, T cells, NK cells and platelets were increased in the ARDS group. 
Cell-proportion analysis likewise showed that T cells, monocytes and B cells were more abundant, and NK cells tended to be more numerous, in patients with sepsis combined with ARDS (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA-B).\u003c/p\u003e\u003cp\u003eWe next examined which annotated cell clusters expressed the five genes previously identified by machine learning and diagnostic modeling, using the single-cell data from the combined ARDS group and the sepsis-only group. TCF4, HK1 and HIST1H2BK were detected in all five cell clusters, whereas KIF14 was detected mainly in monocytes (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eC). TCF4-expressing cells were concentrated mainly in B cells and monocytes, HK1-expressing cells in monocytes and T cells, and HIST1H2BK-expressing cells in monocytes. TCF4 and HK1 were highly expressed in both groups, with higher expression in the combined ARDS group than in the sepsis-only group, whereas HIST1H2BK and KIF14 were expressed at low levels in both groups, KIF14 being almost undetectable (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eD). 
OLFM4 was not detected in this single-cell dataset, probably because of the small number of samples and the sequencing method used.\u003c/p\u003e\u003cp\u003eNext, we analyzed the proportions of expressing cells and the expression levels of the remaining four genes. TCF4 showed elevated expression in B cells, monocytes and platelets in the combined ARDS group, and in B cells in the sepsis-only group, where the percentage of expressing cells was smaller than in the combined ARDS group. HK1 showed elevated expression in monocytes, NK cells and platelets in the combined ARDS group, with a larger percentage of expressing cells than in the sepsis-only group (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eA). HIST1H2BK showed elevated expression in platelets in the sepsis-only group, together with a larger percentage of expressing cells (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eB). Previous studies have shown that oxidative stress plays an important role in the pathogenesis of sepsis and ARDS, and that oxidative stress and the inflammatory response reinforce each other to promote disease progression. ssGSEA scoring of metabolic pathways showed that oxidative scores differed between the two groups and were higher in the combined ARDS group than in the sepsis-only group (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eC), suggesting more intense oxidative stress and potentially more severe disease in patients with combined ARDS.\u003c/p\u003e\u003cp\u003eBecause TCF4 and HK1 showed higher expression levels and larger proportions of expressing cells across the five cell clusters than the other genes, we analyzed them further. 
Expression of TCF4 and HK1 was increased in monocytes and B cells, and the co-expression overlap of the two genes in both monocytes and B cells was greater in patients with combined ARDS than in patients without ARDS (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eD). Correlation analysis showed that TCF4 expression was highly correlated with the oxidative score in B cells from patients with combined ARDS and in monocytes from patients without ARDS (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003eE).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe global SARS-CoV-2 pandemic has not only increased the focus on COVID-19 but has also prompted in-depth exploration of the complex relationship between COVID-19 and acute respiratory distress syndrome (ARDS) (\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e). Our research focuses on uncovering biological mechanisms that may be shared by these two diseases, particularly similarities in gene expression and immune responses. By combining machine learning algorithms with single-cell sequencing analyses, we identified a potential link between ARDS and COVID-19 and highlighted several key genes, such as TCF4 and HK1, that may play critical roles in the pathogenesis of both diseases.\u003c/p\u003e\u003cp\u003eIn our analysis, we identified differentially expressed genes (DEGs) that are commonly up- or down-regulated in ARDS and COVID-19, suggesting that they may play important roles in the pathophysiology of these diseases. 
Our enrichment analysis identified the main biological functions and pathways associated with these genes, providing insights for a deeper understanding of both diseases. For example, KEGG pathway analysis showed that the common DEGs are closely associated with key biological processes such as the inflammatory response and apoptosis, which are central to the severe pathophysiology of ARDS and COVID-19.\u003c/p\u003e\u003cp\u003eIn addition, we used two machine learning methods, Random Forest (RF) and the Least Absolute Shrinkage and Selection Operator (LASSO), to identify potential hub genes. RF was used for an initial screen of candidate diagnostic genes, and LASSO regression was then applied to narrow this set further to key genes shared by COVID-19 and ARDS, ultimately yielding five genes: HIST1H2BK, TCF4, OLFM4, KIF14, and HK1. These genes ranked highly in our model, and the literature indicates that they are involved in processes such as the immune response, inflammation, and cell death.\u003c/p\u003e\u003cp\u003eIn particular, TCF4 displays hub-gene properties associated with immune regulation in the context of ARDS and COVID-19 and shows altered expression patterns under various inflammatory conditions. This is consistent with recent findings; for example, one study showed that combining hCMSCs with liraglutide significantly improved therapeutic efficacy in ALI via the cAMP/PKAc/β-catenin signaling pathway, in which TCF4 plays a central role (\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e). 
In addition, gene expression analysis of SARS-CoV-2 infection suggested that TCF4 may be associated with cardiovascular complications of COVID-19, particularly vascular function (\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e). Moreover, a study of pediatric patients with COVID-19 identified autoantibodies against TCF4 in cerebrospinal fluid, highlighting the potential importance of TCF4 in regulating immune responses (\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e). Taken together, these findings suggest an important role for TCF4 in the pathophysiology of ARDS and COVID-19 and underline the value of further exploring its function and therapeutic potential.\u003c/p\u003e\u003cp\u003eThe HK1 gene is also of particular interest in the context of ARDS and COVID-19. One study identified unique SARS-CoV-2 phylogenetic clusters, mostly associated with HK1, that coincided with the large-scale COVID-19 outbreak in Hong Kong in July 2020 (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e). In addition, a study using 18F-FDG PET imaging revealed a critical role for HK1 in neutrophil activation and lung inflammation, suggesting that increased HK1 activity is associated with increased neutrophil glucose uptake and migration, a key factor in the development of ALI/ARDS (\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e). These findings not only emphasize the importance of HK1 in inflammatory and immune responses but also highlight its potential value in molecular imaging and early diagnosis.\u003c/p\u003e\u003cp\u003eOLFM4 may play an important role in the pathophysiology of ARDS and COVID-19, particularly in the inflammatory response and immune regulation. 
This is supported by the existing literature: one study constructed a predictive model based on transcriptional biomarkers and clinical parameters and emphasized the importance of OLFM4 in predicting sepsis-induced ARDS (\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e). In addition, a global gene expression and docking study reported overexpression of OLFM4 in COVID-19 infection (\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e). Although KIF14 has been implicated in the pathophysiology of COVID-19, especially in its shared pathogenesis with digestive system cancers (\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e), its role in ARDS remains unclear. This gap reflects the complexity of gene expression across disease conditions and points to a new direction for future studies of the function of KIF14 in ARDS.\u003c/p\u003e\u003cp\u003eTo validate our findings and assess the potential of these genes in disease diagnosis, we constructed and validated an XGBoost-based diagnostic model. Notably, key genes such as HIST1H2BK, OLFM4, KIF14, and HK1 performed well, each with an ROC AUC above 0.7, suggesting their potential value in the diagnosis of ARDS and COVID-19. The expression levels of these genes may also be associated with disease severity and patient outcome, which, if confirmed, would further support their clinical application.\u003c/p\u003e\u003cp\u003eIn addition, we investigated the role of immune cells in ARDS and COVID-19, in particular their infiltration patterns and functional status during the pathological process. Using the CIBERSORT algorithm, we observed changes in the distribution of immune cell subpopulations in disease samples that were consistent with the expression patterns of the key genes we identified. 
This not only reveals the complexity of the immune response but also highlights the need for further studies of the immunopathology of ARDS and COVID-19.\u003c/p\u003e\u003cp\u003eAlthough our study provides valuable insights, it has several limitations. First, owing to the lack of a comprehensive single-cell dataset, our analysis relied heavily on available public data and previously published studies. Second, our study did not directly examine how specific genes affect intercellular communication or the tissue microenvironment, which may be an important direction for future research. Finally, although our model shows diagnostic potential, these findings need to be validated in a broader patient population.\u003c/p\u003e\u003cp\u003eIn conclusion, our study reveals a potential molecular link between ARDS and COVID-19, particularly in terms of gene expression and immune response. The key genes and pathways we identified provide insight into these diseases and may guide future therapeutic strategies and drug development. However, more research is needed to understand in detail how these genes and pathways affect the disease process and to establish their potential for clinical application.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eConflict of Interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eChuanxi Tian, Yikun Guo, Huifang Guan, Kaile Ma, Rui Hao, Wei Zhu, Jinyue Zhao, and Min Li declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConceptualization: C.X.T., M.L., and W.Z.; Formal analysis and Data Curation: Y.K.G., C.X.T., and J.Y.Z.; Writing - Original Draft: C.X.T., Y.K.G., H.F.G.; Writing - Review \u0026amp; Editing: C.X.T., Y.K.G., H.F.G., M.L., K.L.M., and R.H. 
All authors read and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by the \u0026quot;China-Austria Joint Laboratory Construction and Joint Research on Traditional Chinese Medicine for Prevention and Treatment of Major Infectious Diseases along the Belt and Road\u0026quot; project (Project No. 2020YFE0205100), with the support of the National Key Program for Strategic International Science and Technology Innovation and Cooperation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNone.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets generated and/or analysed during the current study are available in the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database (https://www.ncbi.nlm.nih.gov/geo/) under accession numbers GSE32707 and GSE66890 (ARDS) and GSE213313 (COVID-19).\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eThompson BT, Chambers RC, Liu KD. Acute Respiratory Distress Syndrome. N Engl J Med (2017) 377:562\u0026ndash;572. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1056/NEJMra1608077\u003c/span\u003e\u003cspan address=\"10.1056/NEJMra1608077\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTang M, Li N. Pathophysiological mechanism of acute respiratory distress syndrome and research progress on diagnostic biomarkers of ARDS. China Journal of Modern Medicine (2022) 32:1\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBellani G, Laffey JG, Pham T, Fan E, Brochard L, Esteban A, Gattinoni L, van Haren F, Larsson A, McAuley DF, et al. 
Epidemiology, Patterns of Care, and Mortality for Patients With Acute Respiratory Distress Syndrome in Intensive Care Units in 50 Countries. JAMA (2016) 315:788\u0026ndash;800. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1001/jama.2016.0291\u003c/span\u003e\u003cspan address=\"10.1001/jama.2016.0291\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMajumder J, Minko T. Recent Developments on Therapeutic and Diagnostic Approaches for COVID-19. AAPS J (2021) 23:14. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1208/s12248-020-00532-2\u003c/span\u003e\u003cspan address=\"10.1208/s12248-020-00532-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLippi G, Sanchis-Gomar F, Henry BM. COVID-19 and its long-term sequelae: what do we know in 2023? Pol Arch Intern Med (2023) 133:16402. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.20452/pamw.16402\u003c/span\u003e\u003cspan address=\"10.20452/pamw.16402\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZheng J, Miao J, Guo R, Guo J, Fan Z, Kong X, Gao R, Yang L. Mechanism of COVID-19 Causing ARDS: Exploring the Possibility of Preventing and Treating SARS-CoV-2. Front Cell Infect Microbiol (2022) 12:931061. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fcimb.2022.931061\u003c/span\u003e\u003cspan address=\"10.3389/fcimb.2022.931061\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eQuesada-Gomez JM, Entrenas-Castillo M, Bouillon R. 
Vitamin D receptor stimulation to reduce acute respiratory distress syndrome (ARDS) in patients with coronavirus SARS-CoV-2 infections: Revised Ms SBMB 2020_166. J Steroid Biochem Mol Biol (2020) 202:105719. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.jsbmb.2020.105719\u003c/span\u003e\u003cspan address=\"10.1016/j.jsbmb.2020.105719\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShen Y, Liu J, Zhang L, Dong S, Zhang J, Liu Y, Zhou H, Dong W. Identification of Potential Biomarkers and Survival Analysis for Head and Neck Squamous Cell Carcinoma Using Bioinformatics Strategy: A Study Based on TCGA and GEO Datasets. Biomed Res Int (2019) 2019:7376034. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1155/2019/7376034\u003c/span\u003e\u003cspan address=\"10.1155/2019/7376034\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBarrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, et al. NCBI GEO: archive for functional genomics data sets\u0026ndash;update. Nucleic Acids Res (2013) 41:D991-995. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/nar/gks1193\u003c/span\u003e\u003cspan address=\"10.1093/nar/gks1193\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLangfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics (2008) 9:559. 
doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/1471-2105-9-559\u003c/span\u003e\u003cspan address=\"10.1186/1471-2105-9-559\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBreiman L. Random Forests. Machine Learning (2001) 45:5\u0026ndash;32. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1023/A:1010933404324\u003c/span\u003e\u003cspan address=\"10.1023/A:1010933404324\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTibshirani R. Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological) (1996) 58:267\u0026ndash;288. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1111/j.2517-6161.1996.tb02080.x\u003c/span\u003e\u003cspan address=\"10.1111/j.2517-6161.1996.tb02080.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEngebretsen S, Bohlin J. Statistical predictions with glmnet. Clin Epigenetics (2019) 11:123. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13148-019-0730-1\u003c/span\u003e\u003cspan address=\"10.1186/s13148-019-0730-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen B, Khodadoust MS, Liu CL, Newman AM, Alizadeh AA. Profiling Tumor Infiltrating Immune Cells with CIBERSORT. Methods Mol Biol (2018) 1711:243\u0026ndash;259. 
doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/978-1-4939-7493-1_12\u003c/span\u003e\u003cspan address=\"10.1007/978-1-4939-7493-1_12\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, Hao Y, Stoeckius M, Smibert P, Satija R. Comprehensive Integration of Single-Cell Data. Cell (2019) 177:1888\u0026ndash;1902.e21. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.cell.2019.05.031\u003c/span\u003e\u003cspan address=\"10.1016/j.cell.2019.05.031\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMeyer NJ, Gattinoni L, Calfee CS. Acute respiratory distress syndrome. Lancet (2021) 398:622\u0026ndash;637. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/S0140-6736(21)00439-6\u003c/span\u003e\u003cspan address=\"10.1016/S0140-6736(21)00439-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAttaway AH, Scheraga RG, Bhimraj A, Biehl M, Hatipoğlu U. Severe covid-19 pneumonia: pathogenesis and clinical management. BMJ (2021) 372:n436. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1136/bmj.n436\u003c/span\u003e\u003cspan address=\"10.1136/bmj.n436\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFeng Y, Wang L, Ma X, Yang X, Don O, Chen X, Qu J, Song Y. Effect of hCMSCs and liraglutide combination in ALI through cAMP/PKAc/β-catenin signaling pathway. Stem Cell Res Ther (2020) 11:2. 
doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s13287-019-1492-6\u003c/span\u003e\u003cspan address=\"10.1186/s13287-019-1492-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJha PK, Vijay A, Halu A, Uchida S, Aikawa M. Gene Expression Profiling Reveals the Shared and Distinct Transcriptional Signatures in Human Lung Epithelial Cells Infected With SARS-CoV-2, MERS-CoV, or SARS-CoV: Potential Implications in Cardiovascular Complications of COVID-19. Front Cardiovasc Med (2020) 7:623012. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fcvm.2020.623012\u003c/span\u003e\u003cspan address=\"10.3389/fcvm.2020.623012\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBartley CM, Johns C, Ngo TT, Dandekar R, Loudermilk RL, Alvarenga BD, Hawes IA, Zamecnik CR, Zorn KC, Alexander JR, et al. Anti-SARS-CoV-2 and Autoantibody Profiles in the Cerebrospinal Fluid of 3 Teenaged Patients With COVID-19 and Subacute Neuropsychiatric Symptoms. JAMA Neurol (2021) 78:1503\u0026ndash;1509. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1001/jamaneurol.2021.3821\u003c/span\u003e\u003cspan address=\"10.1001/jamaneurol.2021.3821\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTo KK-W, Chan W-M, Ip JD, Chu AW-H, Tam AR, Liu R, Wu AK-L, Lung K-C, Tsang OT-Y, Lau DP-L, et al. Unique Clusters of Severe Acute Respiratory Syndrome Coronavirus 2 Causing a Large Coronavirus Disease 2019 Outbreak in Hong Kong. Clin Infect Dis (2021) 73:137\u0026ndash;142. 
doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1093/cid/ciaa1119\u003c/span\u003e\u003cspan address=\"10.1093/cid/ciaa1119\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRodrigues RS, Bozza FA, Hanrahan CJ, Wang L-M, Wu Q, Hoffman JM, Zimmerman GA, Morton KA. 18F-fluoro-2-deoxyglucose PET informs neutrophil accumulation and activation in lipopolysaccharide-induced acute lung injury. Nucl Med Biol (2017) 48:52\u0026ndash;62. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.nucmedbio.2017.01.005\u003c/span\u003e\u003cspan address=\"10.1016/j.nucmedbio.2017.01.005\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYao R-Q, Shen Z, Ma Q-M, Ling P, Wei C-R, Zheng L-Y, Duan Y, Li W, Zhu F, Sun Y, et al. Combination of transcriptional biomarkers and clinical parameters for early prediction of sepsis indued acute respiratory distress syndrome. Front Immunol (2022) 13:1084568. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fimmu.2022.1084568\u003c/span\u003e\u003cspan address=\"10.3389/fimmu.2022.1084568\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eA J, N A, K R. Global Gene Expression and Docking Profiling of COVID-19 Infection. Front Genet (2022) 13:870836. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3389/fgene.2022.870836\u003c/span\u003e\u003cspan address=\"10.3389/fgene.2022.870836\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXiong Z, Yang Y, Li W, Lin Y, Huang W, Zhang S. 
Exploring Key Biomarkers and Common Pathogenesis of Seven Digestive System Cancers and Their Correlation with COVID-19. Curr Issues Mol Biol (2023) 45:5515\u0026ndash;5533. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.3390/cimb45070349\u003c/span\u003e\u003cspan address=\"10.3390/cimb45070349\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"machine learning, multiple omics, ARDS, COVID-19, diagnostic genes","lastPublishedDoi":"10.21203/rs.3.rs-3892523/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-3892523/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBACKGROUND\u003c/h2\u003e \u003cp\u003eAcute respiratory distress syndrome (ARDS) is a common acute clinical syndrome of the respiratory system with a high mortality rate and difficult prognosis.COVID-19 is a serious respiratory infectious disease caused by coronaviruses in a global pandemic. Some studies have suggested a possible association between COVID-19 and ARDS, but few studies have investigated the mechanism of interaction between them.\u003c/p\u003e\u003ch2\u003eMETHODS\u003c/h2\u003e \u003cp\u003eMicroarray data of ARDS (GSE32707 and GSE66890) and COVID-19 (GSE213313) were downloaded from the GEO database and searched for common differential genes for enrichment analysis.WGCNA was used to identify co-expression modules and genes associated with ARDS and COVID-19. RF and LASSO were performed for candidate gene identification. Machine learning XGBoost improved the diagnosis of hub genes in ARDS and COVID-19. The degree of immune cell infiltration in ARDS and COVID-19 samples was assessed using the CIBERSORT algorithm, and the relationship between hub genes and infiltrating immune cells was investigated. 
Changes in pathway activity per cell were visualized using Seurat standard flow down clustering (seurat) to visualize peripheral blood mononuclear cell (PBMC) single-cell RNA sequencing (scRNA-seq) data from patients with sepsis-combined ARDS and patients with sepsis alone.\u003c/p\u003e\u003ch2\u003eRESULTS\u003c/h2\u003e \u003cp\u003eLimma difference analysis identified 314 up-regulated genes and 241 down-regulated genes in ARDS and COVID-19.WGCNA identified the purple-red co-expression module as the core module of ARDS and COVID-19. Five candidate genes, namely HIST1H2BK, TCF4, OLFM4, KIF14 and HK1, were screened using two machine learning algorithms, RF and LASSO. XGBoost constructed diagnostic models to evaluate the hub genes with high diagnostic efficacy in ARDS and COVID-19. Single-cell sequencing revealed the presence of alterations in five immune subpopulations, including monocytes, B cells, T cells, NK cells and platelets, with high expression levels and cellular occupancy of TCF4 and HK1, which are involved in oxidative reactions.\u003c/p\u003e","manuscriptTitle":"Combination of multiple omics and machine learning identifies diagnostic genes for ARDS and COVID-19","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-02-06 17:30:06","doi":"10.21203/rs.3.rs-3892523/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"16251b88-8ae5-47fc-a9fc-130d20eb194b","owner":[],"postedDate":"February 6th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":28563601,"name":"Health sciences/Biomarkers"},{"id":28563602,"name":"Health sciences/Medical research"}],"tags":[],"updatedAt":"2024-10-22T05:08:20+00:00","versionOfRecord":[],"versionCreatedAt":"2024-02-06 17:30:06","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-3892523","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-3892523","identity":"rs-3892523","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}