An integrated machine learning model of transcriptomic genes in multi-center chronic obstructive pulmonary disease reveals the causal role of TIMP4 in airway epithelial cell | Research Square window.SnipcartSettings = { analytics: { enabled: false } }; (function() { var accessVector = localStorage.getItem('access_vector') || ''; window.dataLayer = window.dataLayer || []; if (accessVector) { window.dataLayer.push({ user: { profile: { profileInfo: { snid: accessVector } } } }); } })(); (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-K279D39R'); Browse Preprints In Review Journals COVID-19 Preprints AJE Video Bytes Research Tools Research Promotion AJE Professional Editing AJE Rubriq About Preprint Platform In Review Editorial Policies Our Team Advisory Board Help Center Sign In Submit a Preprint Cite Share Download PDF Research Article An integrated machine learning model of transcriptomic genes in multi-center chronic obstructive pulmonary disease reveals the causal role of TIMP4 in airway epithelial cell Erkang Yi, Haiqing Li, Yu Liu, Qingyang Li, Chengshu Xie, Ruining Sun, and 7 more This is a preprint; it has not been peer reviewed by a journal. https://doi.org/ 10.21203/rs.3.rs-5739006/v1 This work is licensed under a CC BY 4.0 License Status: Published Journal Publication published 23 Apr, 2025 Read the published version in Respiratory Research → Version 1 posted 6 You are reading this latest preprint version Abstract Background Chronic obstructive pulmonary disease (COPD) is a heterogeneous syndrome, resulting in inconsistent findings across studies. Identifying a core set of genes consistently involved in COPD pathogenesis, independent of patient variability, is essential. Methods We integrated lung tissue sequencing data from patients with COPD across two centers. We used weighted gene co-expression network analysis and machine learning to identify 13 potential pathogenic genes common to both centers. Additionally, a gene-based model was constructed to distinguish COPD at the molecular level and validated in independent cohorts. Gene expression in specific cell types was analyzed, and Mendelian randomization was used to confirm associations between candidate genes and lung function/COPD. Results Tissue inhibitor of metalloproteinase 4 (TIMP4) was identified as a key pathogenic gene and validated in COPD cohorts. Further analysis using single-cell sequencing from mice and patients with COPD revealed that TIMP4 is involved in ciliated cells. In primary human airway epithelial cells cultured at the air-liquid interface, TIMP4 overexpression reduced ciliated cell numbers. Conclusions We developed a 13-gene model for distinguishing COPD at the molecular level and identified TIMP4 as a potential hub pathogenic gene. This finding provides insights into shared disease mechanisms and positions TIMP4 as a promising therapeutic target for further investigation. Chronic obstructive pulmonary disease Machine learning Mendelian randomization Matrix metallopeptidase 4 Figures Figure 1 Figure 2 Figure 3 Figure 4 Figure 5 Figure 6 Figure 7 Figure 8 Introduction Chronic obstructive pulmonary disease (COPD) is a heterogeneous and complex respiratory condition that presents a significant social burden[ 1 ]. Current treatments primarily focus on symptom management and preventing acute exacerbations, yet they have notable limitations[ 2 ]. Common therapies, including bronchodilators, inhaled corticosteroids, and oxygen therapy, do not effectively halt disease progression or improve long-term outcomes[ 2 , 3 ]. Consequently, there is an urgent need to investigate COPD pathogenesis further to identify new therapeutic targets and develop more effective interventions. Numerous studies have performed genomic sequencing on lung tissue samples from patients with COPD to elucidate the molecular mechanisms associated with the condition[ 4 , 5 ]. However, these studies frequently produce inconsistent results across different research centers, similarly attributable to variations in study design, sample selection, sequencing technologies, and data analysis methods[ 6 ]. Furthermore, the differential genes identified from sequencing analyses of different cohorts vary. Even when common differential genes are identified, the correlation of their expression with clinical characteristics may also differ across cohorts[ 6 , 7 ]. In this study, we aimed to enhance robustness and reliability by integrating lung tissue sequencing data from patients with COPD across various centers. We employed machine learning techniques and model construction to identify key genes that effectively differentiate between patients with non-COPD and COPD. The model developed from these hub genes consistently distinguishes COPD from non-COPD across various datasets, potentially providing new insights into diagnostic markers and therapeutic targets for COPD. Further analysis using single-cell sequencing and Mendelian randomization (MR) studies of these hub genes revealed that tissue inhibitor of metalloproteinase 4 (TIMP4) is specifically expressed in ciliated cells and is significantly upregulated in patients with COPD. MR and clinical cohort data suggest a close relationship between TIMP4 expression, lung function, and computed tomography (CT) imaging findings. A comprehensive understanding of the early pathogenic events, derived from analyzing ciliated cells and their interactions with other cell types, could pave the way for innovative management strategies for complex, multifactorial chronic airway and pulmonary diseases. The TIMP family plays a crucial role in regulating extracellular matrix (ECM) degradation and remodeling in COPD by regulating matrix metalloproteinase (MMP) activity. This regulation balances tissue destruction and repair, influencing airway inflammation and fibrosis. We hypothesize that TIMP4 expression may be crucial in affecting ciliated cell function and airway clearance capacity, playing a key role in COPD progression. Materials and Methods Human bronchial brushing collection Primary human bronchial epithelial (HBE) cells were obtained from brushings of 5th–6th order bronchioles during fiberoptic bronchoscopy using an endoscopic cytobrush, as previously described[ 8 ]. The material was obtained from the Biobank of the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China. The COPD diagnosis was confirmed by post-bronchodilator forced expiratory volume in one second (FEV1)/forced vital capacity (FVC) < 70%. The exclusion criteria included asthma, bronchiectasis, pulmonary fibrosis, and active infection. This study followed the ethical guidelines outlined in the Declaration of Helsinki and was approved by the Ethics Committee of the First Affiliated Hospital of Guangzhou Medical University (approval number 2020-51). All participants provided written informed consent before enrollment. HBE cell air-liquid interface culture, lentiviral infections, and cigarette smoke extract treatment HBE cell air-liquid interface culture, lentiviral infections, and cigarette smoke extract treatment HBE cells were cultivated under air-liquid interface (ALI) conditions to form well-differentiated, pseudostratified cultures, following previously described methods[ 9 ]. Briefly, isolated HBE cells were maintained and expanded (one passage) in T75 flasks with bronchial epithelial cell expansion medium (AEGM, 05040, STEMCELL Technologies) at 37°C in a 5% carbon dioxide (CO 2 ) incubator. At 80% confluence, cells were detached with 0.05% trypsin-ethylenediaminetetracetic acid (EDTA; Gibco) and seeded on membrane supports (12 mm Transwell culture inserts, 0.4 µm pore size, Costar) coated with 0.05 mg collagen from calf skin (Sigma–Aldrich) in AEGM supplemented with 1% penicillin/streptomycin. HBE cells were cultured for two days until they reached complete confluence. The apical medium was removed, and the basal medium was replaced by an ALI culture medium (05001, STEMCELL Technologies). Cultures were maintained under ALI conditions by changing the medium in the basal filter chamber three times a week. For cigarette smoke extract (CSE) treatment, a 1.023 mg/mL stock solution was diluted to 0.02 mg/mL. Epithelial cells were cultured in a differentiation medium containing CSE at 37°C in a 5% CO 2 incubator from day 5 to day 14. For the rescue experiment, epithelial cells were cultured in a differentiation medium containing CSE at 37°C in a 5% CO 2 incubator from day 5 to day 14. The medium was replaced every 24 h before collection for analysis. Ribonucleic acid extraction, complementary deoxyribonucleic acid synthesis, and quantitative real-time polymerase chain reaction Ribonucleic acid (RNA) extraction from lung tissues and cells was performed using a commercially available RNA isolation kit, following the manufacturer's recommended protocol[ 10 ]. Complementary deoxyribonucleic acid synthesis was conducted using a reverse transcription kit designed for quantitative polymerase chain reaction (qPCR) applications, with 1,000 ng of total RNA as the starting material. Quantitative real-time PCR (qRT-PCR) was conducted using a SYBR green-based PCR master mix on an RT-PCR detection system. Gene expression levels were quantified using the comparative CT (2 –∆∆CT ) method, with glyceraldehyde-3-phosphate dehydrogenase (GAPDH) as the endogenous control. A complete list of primer sequences used in this study is provided in Supplementary Data 28. Western blotting (WB) WB analysis was conducted using previously established protocols[ 11 ]. Briefly, protein samples from cell lines and lung tissue were prepared using radioimmunoprecipitation assay lysis buffer (Catalog # 89901, Thermo, USA) with added protease inhibitors (Catalog # 78430, Thermo, USA) and incubated at 4°C for 20 min. Proteins were separated on a 10% sodium dodecyl sulfate-polyacrylamide gel electrophoresis and transferred to polyvinylidene difluoride membranes (BioRad, USA). Membranes were blocked and incubated overnight at 4°C with primary antibodies against TIMP4 (Catalog # 12326-1-AP, proteintech, China), MMP9 (Catalog # 13667, CST, USA), fibronectin-1 (FN1; Catalog # 26836, CST, USA), β-catenin (Catalog # 9562, CST, USA), non-p-β-catenin (Catalog # 4176, CST, USA), and GAPDH (Catalog # 60004-1-Ig, proteintech, China). After washing, membranes were incubated with a horseradish peroxidase-conjugated secondary antibody (Proteintech) and visualized using enhanced chemiluminescence on an Amersham Imager 680 (Thermo Fisher Scientific, USA). Multiple immunofluorescence assay of HBE cells ALI cultures were fixed in 4% paraformaldehyde overnight at 4°C, then incubated in a permeabilization solution (0.2% Triton X-100 in phosphate-buffered saline [PBS]) for 15 min. Subsequently, cultures were blocked with 10% goat serum, PBS, and 3% bovine serum albumin solutions for 1 h at room temperature (RT). Primary antibodies were applied and incubated overnight at 4°C, followed by washing and incubating with secondary antibodies for 1 h at RT. Cultures were then stained with 4',6-diamidino-2-phenylindole (DAPI) for 10 min at RT before being mounted for imaging. The following primary antibodies were used: Mouse anti-acetylated α-tubulin (1:1500, Sigma, T7451), mouse anti-TIMP4 (Catalog # 12326-1-AP, Proteintech, China), and rabbit anti-MUC5AC (1:200, Abcam, ab3649). Secondary antibodies included goat anti-mouse immunoglobulin G (IgG; H + L) cross-adsorbed secondary antibody, Alexa Fluor™ 488/568/647, and goat anti-rabbit IgG (H + L) cross-adsorbed secondary antibody, Alexa Fluor™ 488/568/647. Messenger RNA microarray chip datasets and bioinformatics Several microarray datasets, such as GSE47460[ 12 ], GSE76925[ 5 ], GSE103174[ 13 ], GSE239897[ 14 ], and GSE37147[ 15 ], were obtained from the Gene Expression Omnibus (GEO) repository. These datasets used various platforms: GPL14550 for GSE47460 (108 controls, 220 COPD samples), GPL10558 for GSE76925 (40 controls, 111 COPD samples), GPL13667 for GSE103174 (21 controls, 44 COPD samples), and GPL17303 for GSE239897 (40 controls, 111 COPD samples). For GSE37147, we excluded patients with a recent history of using inhaled medications, resulting in a final cohort of 136 controls and 63 patients with COPD. Data visualization and analysis were performed using R packages, such as "ggplot2" for volcano plots, "ggbiplot" for principal component analysis, and "corrplot" for gene correlation assessments. Differentially expressed genes (DEGs) were identified using the limma package from the R/Bioconductor. Significance criteria varied as follows: p 0.2 for GSE47460, while p 0.4 for GSE76925. Functional annotation of genes was performed using gene ontology enrichment analysis ( http://geneontology.org ) and Kyoto encyclopedia of genes and genomes pathway analysis ( http://kegg.jp ). Gene set enrichment analysis (GSEA) was performed using dedicated software[ 16 ], and multi-dataset integration was achieved using Metascape ( http://metascape.org/ ). Protein-protein interaction (PPI) networks were constructed using the STRING database[ 17 ] ( http://string-db.org ) and visualized with Cytoscape software (version 3.8.3). Weighted gene co-expression network analysis (WGCNA) was conducted on GSE47460 and GSE76925 datasets and RNA-seq using the 'WGCNA' R package[ 18 ]. Networks were correlated with COPD clinical status and various pulmonary function parameters, including FEV1 percentage of predicted value (FEV1%pre), FVC percentage of predicted value (FVC%pre), FEV1/FVC ratio, and low attenuation area percentage (%LAA950) value. Machine learning We implemented a multi-tiered machine learning approach to elucidate gene signatures associated with COPD. In the initial phase, four distinct algorithms were employed: Support vector machine recursive feature elimination (SVM-RFE), least absolute shrinkage and selection operator (LASSO) model, elastic net, and random forest model. Gene selection was guided by the optimal lambda value (λ) within one standard error of the minimum error; the λ value was determined through 10-fold cross-validation using the "glmnet" R packages[ 19 ]. We utilized an extensive array of machine-learning algorithms for model construction and validation. These included ensemble methods (Voting, GradientBoosting, Adaptive Boosting, Extra Tree, Random Forest, Bootstrap Aggregating), decision tree-based approaches, probabilistic models (Naïve Bayes), instance-based learning (K-Nearest Neighbors), SVM, gradient descent methods (Stochastic Gradient Descent), various regression techniques (Logistic Regression, BayesianRidge, ElasticNet, LASSO, Linear_Lasso, Ridge Regression with Cross-validation, Ridge_Regression, Linear_Regression), and neural network approaches (Artificial Neural Network). Model selection was performed by ranking TrainSet Accuracy and TestSet Accuracy values across distinct datasets. The definitive gene set represents consensus features identified through LASSO, RFE, Random Forest (RF), and Elastic Net algorithms across multiple cohorts. Following hub gene identification, all subsequent analyses were performed using R software (version 4.0.3). We employed several R packages, including "caret," "e1071," "glmnet," "tree," "randomForest," "adabag," "nnet," "xgboost," and "ggplot2," to implement the algorithms and visualize results. Single-cell RNA-seq analysis We obtained single-cell RNA-sequencing (scRNA-seq) data for mouse lung tissue from the GEO database (GSE168299[ 20 ]), comprising eight samples (Air = 4, Smoke = 4) with 41,099 cells and 20,832 detected genes. Human lung tissue scRNA-seq data were retrieved from GSE173869[ 21 ], including 12 samples (non-smoker = 3, COPD = 9) with 39,425 cells and 33,538 detected genes. Cell type-specific marker genes were previously established for both datasets. Analysis and visualization of scRNA-seq data were performed using R and Seurat Package ( https://satijalab.org/seurat/ ). The analytical pipeline followed established protocols for data normalization, dimensionality reduction, and clustering. DEGs from various cell subsets underwent Reactome enrichment analysis ( https://reactome.org/ ). Intercellular communication was analyzed using the "CellChat" R package (version 1.1.3), while pseudotime analysis was validated using the "Monocle 2" R package. Visualization was accomplished using the "ggplot2" R package. RNA-seq and bioinformatics RNA samples were sequenced at Wekomo (China) using the Illumina system (San Diego, USA). The resulting RNA-seq data were aligned to the Ensembl (version 105) transcript annotations. The "Limma" package in R software was employed to identify DEGs. The time series analysis of gene expression was performed with the Mfuzz software. CSE extraction and preparation CSE was produced by a commercial combustible cigarette (Hongmei, Hongta Group, China), as described previously [ 9 ]. Mainstream cigarette smoke (CS) was generated using a Cerulean CETI 8 MK3 smoking machine (CERULEAN, UK), following ISO 20778:2018 standards, with a 55 mL puff volume, 2 s duration, and a 30 s interval. The mainstream smoke was passed through two collection vessels containing 2 × 20 mL of Dulbecco's Modified Eagle Medium/Nutrient Mixture F-12 medium, then combined and shaken for 20 min to obtain an aqueous CSE. The extract was filtered twice through a 0.22 µm membrane, aliquoted, and stored in the dark at − 80°C. The nicotine concentration in the CSE was measured by gas chromatography-mass spectrometry by Shenzhen Fogcore Technology Co., Ltd. and determined to be 1.36 mg/mL. Cell culture Cell culture was performed according to the operating manuals for PneumaCult™-Ex Plus Medium (Catalog # 05040) and PneumaCult™-ALI Medium (Catalog # 05001) from STEMCELL Technologies as previously described[ 8 ]. The ALI cultures were established using Transwell plates (Catalog # 3460, Corning). Primary airway epithelial cells were cultured at 37°C with 5% CO 2 until 80% confluence of cell colonies was achieved, followed by dissociation with TrypLE™ Express (Gibco) and seeding at a density of 2.3 × 10 5 cells/cm 2 . Transwell plates were maintained at 37°C with 5% CO 2 , with medium changes every two days. Once cells reach 100% confluence (typically 4–6 days), the apical medium is removed, and the basal medium is replaced with a differentiation medium, marking day 0 of differentiation. Medium changes were performed every two days until the formation of visible ciliary beating on day 21. The CSE was added to the basolateral chamber at a 0.02 mg/mL concentration on day 6. Lentivirus infection The lentivirus used in this study was provided by Yunzhou Biotechnology Co., Ltd. (Guangzhou, China). The TIMP4 overexpression vector was constructed as pLV [Exp]-EF1A > hTIMP4NM_003256.4:3xGGGGS: mCherry (ns): P2A: Puro, while the control vector was pLV[Exp]-EF1A > mCherry(ns): P2A: Puro. Primary airway basal cells were used for lentiviral infection in the second to third passage. When the airway basal cells reached approximately 80% confluence, they were dissociated, and 500,000 cells were transferred to a T75 flask. The lentiviral solution was added at a multiplicity of infection of 10, and the cells were cultured in PneumaCult™-Ex Plus Medium supplemented with 5 µM Y-27632 for 16 h. After incubation, the viral-containing medium was removed, and the cells were washed twice with PBS, followed by continued culture in fresh PneumaCult™-Ex Plus medium with medium changes every two days. Once the cells reached 60–80% confluence, puromycin was added at a concentration of 2 µg/mL for selection. After four days of selection, puromycin was maintained at 1 µg/mL. When the airway basal cells reached 80–90% confluence, they were transferred to the ALI model for differentiation, with 1 µg/mL puromycin continuously added to the differentiation medium. TIMP4 measurement Peripheral blood samples from 185 non-COPD and 116 COPD from the early COPD (ECOPD) study (the Chinese Clinical Trial Registry, ChiCTR1900024643)[ 22 ] were randomly included in this analysis. The data of some of the participants have been previously published. Peripheral blood samples were obtained by trained staff in EDTA collection tubes and centrifuged at 3,000 rpm for 10 min at RT, and the supernatants were stored at − 80°C. Plasma TIMP4 was measured using the TIMP-4 ELISA kit (CSB-E04735h, CUSABIO, China). Genome-wide association data sources Our study analyzed genetic associations with COPD-related phenotypes using Integrative Epidemiology Unit (IEU) Open genome-wide association (GWAS) project data. We accessed single nucleotide polymorphisms (SNPs) linked to FEV1/FVC [ 23 ], FEV1 , FVC , FEV1/FVC < 0.7 [ 24 ] and COPD diagnosis from GWAS catalog entries[ 25 ]. Data for FEV1/FVC ratio (n = 321,047), FEV1 (n = 321,047), and FVC (n = 321,047) were obtained from separate GWAS. Additional GWAS data included FEV1/FVC < 0.7 (cases: 55,907, controls: 297,408) and COPD diagnosis (cases: 26,710, controls: 334,484). The expression quantitative trait loci (eQTL) analysis was conducted for selected genes using datasets from 515 individuals with lung tissue and 755 individuals with whole blood, sourced from the GTEx_v8 database ( https://www.gtexportal.org/home )[ 26 ]. Additionally, the eQTLs associated with the selected genes served as proxies for increased expression of these genes. All GWAS datasets were accessed through the IEU GWAS database ( https://gwas.mrcieu.ac.uk/ ). This approach was used to explore the genetic basis of COPD and related phenotypes using large-scale data. Summary-data-based MR analyses Our study employed summary-data-based MR (SMR) and heterogeneity in dependent instruments (HEIDI) tests within cis-regulatory regions using SMR software[ 27 ]. This approach uses a single-nucleotide variant at a primary xQTL as an instrumental variable, combined with summary-level eQTL and GWAS data, to explore potential causal or pleiotropic relationships between gene expression and traits of interest. We applied standard SMR software settings, including a p -value threshold of 5.0 × 10 –8 for top eQTL selection and a 1 Mb window around the probe center for cis-eQTL identification. All analyses were restricted to cis-regulatory regions. Statistical significance was determined by a p < 0.05 for SMR, while for HEIDI, a p < 0.05 suggested significant linkage. This methodology investigated genetic associations and potential causal relationships in COPD-related phenotypes, providing a nuanced interpretation of the data. Two-sample MR Our MR analysis primarily used the inverse-variance weighted (IVW) method, supplemented by MR-Egger, weighted median, simple mode, and weighted mode approach[ 28 , 29 ]. We assessed heterogeneity across individual causal effects using Cochran's Q statistic in MR-Egger and IVW methods, with p 0.05 suggested the absence of directional horizontal pleiotropy, while p > 0.05 in the MR-PRESSO global test indicated no evidence of horizontal pleiotropic outliers. We conducted leave-one-out sensitivity analyses to ensure robustness and employed Steiger filtering to verify causal directionality. All statistical analyses were performed in R software using the 'TwoSampleMR' package[ 25 ], with a significance threshold of p < 0.05. In cases of significant heterogeneity, we applied a random effects model for IVW estimates. Statistical analyses All statistical analyses were performed using GraphPad Prism (version 8.0.1) or Medcalc (version 23.0.1) software. The two-tailed paired Student's t-test, two-tailed Mann–Whitney test, one-way ANOVA, and two-tailed Pearson correlation were used to determine the significance between means. Linear regression models were implemented to assess associations between gene expression levels and clinical parameters. Receiver operating characteristic (ROC) curve analysis was performed to calculate area under the curve (AUC) values. P -values were represented as follows: ns (not significant), * p < 0.05, ** p < 0.01, *** p < 0.001, and ** p < 0.0001. Results Identification of common gene signature in lung tissue sequencing from two patients with COPD cohorts The analytical workflow of this study is schematically summarized in Fig. 1 . We analyzed lung tissue sequencing data from two patients with COPD cohorts: The GSE47460 (108 controls, 220 patients with COPD) and GSE76925 (37 smoker controls and 110 COPD patients with smoking history). The WGCNA was performed on both datasets (Figs. 2 A and S1- 2 ), yielding 24 and 16 gene modules for GSE47460 and GSE76925, respectively. We then correlated these modules with COPD status, lung function, and CT indicators. In GSE47460, modules " magenta ," " tan ," " midnightblue ," " skyblue ," " brown ," and " darkgreen " negatively correlated with COPD status and LAA950% , but positively with lung function including FEV1%pred and FVC%pred . Conversely, " lightgreen ," " black ," " white ," and " pink " modules indicated opposite correlations (Fig. 2 A and Supplementary Data 1). Analysis of GSE76925 revealed similar patterns: " brown " and " green " modules negatively correlated with COPD status and LAA950% but positively with lung function ( FEV1%pred and FEV1/FVC ). The " turquoise " module exhibited inverse correlations (Fig. 2 A and Supplementary Data 2). We performed DEG analysis (COPD versus control) on both datasets. The GSE47460 yielded 890 upregulated and 863 downregulated genes (Figures S3 A-B), while GSE76925 indicated 438 upregulated and 1,141 downregulated genes (Figures S3 C-D and Supplementary Data 3). Enrichment analysis of these gene sets revealed that negatively correlated genes were associated with regulating the Wnt signaling pathway , secretion , extracellular matrix organization , and cell-cell adhesion . Positively correlated genes were linked to the regulation of leukocyte migration , epithelial cell proliferation , positive cytokine production , and NABA CORE MATRISOME functions (Fig. 2 B). We then intersected these DEGs with the modules significantly correlated with COPD status, CT indicators, and lung function. In GSE47460, we identified 310 COPD-positively correlated and 611 COPD-negatively correlated overlapping genes (Fig. 2 C). GSE76925 yielded 94 COPD-positively correlated and 191 COPD-negatively correlated overlapping genes (Fig. 2 D and Supplementary Data 4). We performed LASSO analysis on the overlapping gene sets, stratified by COPD status, to further identify key gene clusters crucial in COPD. This yielded 21 and 17 potential hub genes, respectively (Fig. 2 E and Supplemental Data 5). These gene sets indicated no overlap, underscoring the heterogeneity in COPD sequencing results across different cohorts. We combined these LASSO gene sets into a 38-gene signature to identify hub genes functioning consistently across diverse COPD cohorts. We then applied various machine learning models (LASSO, Elasticnet, Random Forest, and SVM-RFE) to both datasets using this signature (Figs. 2 F and S4). By intersecting genes obtained from the same machine learning method across both datasets, we identified 5, 4, 11, and 6 overlapping genes, respectively (Supplemental Data 6). Integration of these results yielded a final set of 13 genes: ANGPTL1, DUSP26, FGG, GAS2, VEGFD, BHLHE22, SYNGR1, TIMP4, CXCL12, GEMIN5, SV2B, HTR2B, and TMEM117 . A model constructed with 13 genes that accurately identify COPD Subsequently, we divided the two lung tissue sequencing datasets into training and validation sets. We evaluated the performance of 20 different machine learning methods using the identified 13-gene signature. All 20 models demonstrated the ability to effectively distinguish between COPD and non-COPD populations in independent lung tissue sequencing datasets using the 13-gene signature (Figs. 3 A-B). However, only Extra Trees and Random Forest methods consistently indicated high accuracy in both training and test sets across both datasets, with test set accuracies exceeding 0.8 (Supplemental Data 7). Based on these results, we selected extra trees and random forest models for further analysis, as they demonstrated high and consistent performance in discriminating COPD status across different cohorts. We first generated nomogram scores for each gene in both models across the two datasets, revealing broadly similar gene scores between the datasets (Figures S5A-B). Extra Trees and Random Forest models demonstrated good discrimination among patients with COPD, indicating better performance in predicting COPD cases but higher error rates in predicting controls (Figures S5C-D). To validate the models' reliability, we applied the models constructed using GSE47460 to predict outcomes in GSE76925. Both extra trees and random forest models demonstrated high accuracy in the area under the curve (AUC ET = 0.871, AUC RF = 0.859). The reverse validation also yielded high accuracy (AUC ET = 0.789, AUC RF = 0.786), confirming the models' reliability (Fig. 3 C). To further validate our model's accuracy in diverse COPD populations, we incorporated lung tissue sequencing data from two external cohorts: GSE103174 (21 control and 44 COPD samples) and GSE239869 (43 control and 39 COPD samples). Both models, constructed using the 13-gene signature, demonstrated robust performance in these cohorts. For GSE103174, Extra Trees and Random Forest models achieved AUC values of 0.693 and 0.71, respectively. In GSE239869, the models indicated even stronger predictive power (AUC ET = 0.883, AUC RF = 0.862) (Fig. 4 D). We analyzed the expression of the 13 hub genes in GSE47460 and GSE76925 datasets. All genes, except VEGFD in GSE47460, demonstrated significant differences between control and COPD groups, with most exhibiting similar trends across datasets (Fig. 3 E). We assessed correlations between these genes, lung function, and CT indicators using 13 linear regression models. Across models and datasets, the genes were significantly associated with FEV1%pre and LA950% (Figs. 3 F and S6– 8 ). These results were used to validate the stability and specificity of our 13-gene model in patients with COPD and its significant correlation with clinical characteristics. The expression distribution of the 13 genes used to construct the model across various lung cell subtypes in both humans and mice To explore the expression patterns of the 13 selected genes across lung cell subpopulations, we analyzed single-cell sequencing data from patients with COPD lung tissue (GSE173896), comprising four non-smokers and nine patients with COPD. Using the original cell clustering strategy, we identified 22 cell subpopulations (Figs. 4 A and S9A and Supplementary Data 8). We calculated expression scores for each cell subpopulation based on the integrated expression of the 13 genes (Fig. 4 B). Results demonstrated expression in all subpopulations, with alveolar and adventitial fibroblasts scoring highest (Fig. 4 C). Expression scores were significantly increased in CD8T cells, CD4T cells, natural killer cells, macrophages, neutrophils, AT1, AT2, monocytes, ciliated cells, B cells, and pericytes/smooth muscle actin (SMA) cells. Conversely, scores were significantly lower in alveolar and adventitial fibroblasts, dendritic cells, plasma cells, and vascular endothelial (VE) cells (Figure S10B). Individual gene analysis indicated BHLHE22, CXCL12, VEGFD , and ANGPTL1 highly expressed in fibroblasts; GAS2 and TIMP4 in ciliated cells; TMEM117 in macrophages; SYANGR1 in immune cells; FGG in AT2 cells; and DUSP26 in plasma cells. The TMEM117, HTR2B , and SV2B revealed low expression across all cell types (Fig. 4 D). To validate our findings from human lung tissue, we analyzed single-cell sequencing data from a mouse COPD model (GSE168299), including four air-exposed and four CS-exposed mice. We used the original cell clustering strategy, identifying 27 cell subpopulations (Figs. 4 E and S10A and Supplementary Data 9); consistent with human data, all cell types indicated expression, with alveolar and adventitial fibroblasts exhibiting the highest scores, followed by endothelial cells (Fig. 4 F-G). However, significant score differences between groups were limited to alveolar fibroblasts, basal cells, B cells, ciliated cells, club cells, some endothelial cells, macrophages, and monocytes (Figure S10B). Several genes indicated similar primary expression patterns in mouse and human lung cell subpopulations. However, SV2B, ANGPTL1 , and GAS2 exhibited low expression across all mouse lung cell types, differing from human results. Additionally, CXCL12 indicated high expression in endothelial cells, and DUSP26 was expressed in mouse ciliated cells (Fig. 4 H). MR analysis was used to identify TIMP4 as a potential hub gene influencing COPD progression within the signature To identify potentially causal genes within our signature for COPD, we conducted MR analyses (Fig. 5 A). We extracted SNPs associated with the 13 genes from GTEx_v8 as instrumental variables for lung tissue and whole blood. Outcomes included FEV1/FVC < 0.7 , physician-diagnosed COPD , FEV1 , FVC , and FEV1/FVC from GWAS studies (Supplementary Data 10). We performed two-sample MR (TSMR) and SMR analyses, including HEIDI tests for summary-level data. ANGPTL1, BHLHE22, FGG, HTR2B , and TIMP4 lacked suitable instruments in peripheral blood. Furthermore, BHLHE22, GEMIN5, and TMEM117 did not have suitable instrumental variables for the SMR analysis of lung tissue. ANGPTL1 in lung tissue was positively associated with FVC and FEV1 and negatively with FEV1/FVC < 0.7 . FGG indicated significance only in TSMR, positively associating with lung function and negatively with COPD diagnosis . HRT2B in lung tissue is positively associated with FEV1 and FEV1/FVC , negatively with FEV1/FVC < 0.7 , and positively with COPD diagnosis. GEMIN5 in both lung and blood is negatively associated with lung function and COPD diagnosis in TSMR. SV2B in lung tissue is positively associated with FVC, FEV1, FEV1/FVC < 0.7 , and COPD diagnosis in TSMR. The TIMP4 in lung tissue is negatively associated with lung function and positively with COPD diagnosis in both TSMR and SMR. TMEM117 in lung tissue is negatively associated with lung function and positively with COPD diagnosis in TSMR and SMR. Notably, the above results indicated no heterogeneity, but only BHLHE22, FGG, SV2B, SYNGR1, TIMP4 , and TMEM117 displayed largely absent horizontal pleiotropy (Figs. 4 B-C and S11 and Supplementary Data 11–17). We constructed a receiver operating characteristic (ROC) curve model using the 13 genes, achieving AUCs of 0.883 and 0.966 in two datasets (Figures S12A). Considering the MR results and excluding genes with non-significant outcomes, we selected SVB2, TIMP4 , and ANGPTL1 for further investigation. These three genes maintained reasonable AUC values in both datasets. The ROC model using only these genes achieved AUCs of 0.767 and 0.881 (Figures S12B and Supplementary Data 18), highlighting their significance within the gene set. We then evaluated correlations between these three genes and both datasets' lung function and CT indicators. TIMP4 and SV2B negatively correlated with lung function and positively with %LAA950 in both datasets. ANGPTL1 positively correlated with lung function in both datasets and negatively with %LAA950 , though only significantly in GSE76925 (Figures S13A-B). Further analysis of their expression in primary cell types revealed that ANGPTL1 expression decreased in alveolar fibroblasts but remained unchanged in adventitial fibroblasts (Figure S13C). SV2B expression indicated non-significant changes in alveolar or adventitial fibroblasts (Figure S13D). TIMP4 expression increased in ciliated cells of the COPD group (Figure S13E). Given that SV2B demonstrated no differential expression in alveolar or adventitial fibroblasts, SV2B and ANGPTL1 exhibited weak expression across various cell subpopulations in mouse lung tissue, which hampers subsequent single-cell analysis. We selected TIMP4 as a potential key pathogenic gene for further study, which is highly expressed in ciliated cells in both human and mouse lung tissues. We observed that TIMP4 expression progressively increased with higher Global Initiative for Chronic Obstructive Lung Disease (GOLD) stages in patients with COPD (Fig. 5 D). However, there was a non-significant difference in expression between GOLD 3 and GOLD 4 (Fig. 5 E). Previous single-cell results indicated that TIMP4 is primarily expressed in ciliated cells. We analyzed bronchial brush samples from patients with COPD (COPD = 64, smokers = 137) to validate TIMP4 expression. Our analysis revealed that TIMP4 expression was also elevated in the brush samples from patients with COPD (Fig. 5 F). While smoking similarly increased TIMP4 expression (Fig. 5 G), the high expression of TIMP4 in patients with COPD was found to be independent of smoking status (Fig. 5 H). We randomly selected 301 samples from the ECOPD study cohort , which included 116 patients with COPD and 185 non-COPD controls, to investigate whether TIMP4 expression in peripheral blood reflects that observed in lung tissue and to validate the results from lung tissue and airway brush samples and to confirm the findings from MR analysis (Figure S14A). TIMP4 secretion levels were measured in peripheral plasma. However, the area under the curve (AUC) value derived from the receiver operating characteristic (ROC) analysis did not demonstrate a high discriminative capacity, suggesting potential limitations in its standalone predictive performance (Figure S14B). We employed two models to assess the relationship between TIMP4 and clinical features: correlation analysis ( Model 1 ) and regression analysis ( Model 2 ). In both models, TIMP4 levels were negatively correlated with several lung function measures, including pre-bronchodilator spirometry FEV1 (pre_FEV1), post-bronchodilator spirometry FEV1 (post_FEV1), maximum mid-expiratory flow (pre_MMEF), post_MMEF, pre-bronchodilator spirometry forced expiratory flow at 50% of FVC (pre_FEF50), post_FEF50, pre-bronchodilator spirometry forced expiratory flow at 75% of FVC (pre_FEF75), post_FEF75 , and post_FEV1/FVC (Figs. 4 J and S14C). However, non-significant correlation was found with FVC. Additionally, TIMP4 expression demonstrated significant negative correlations with CT metrics, including expiratory LAA-856 , inspiratory LAA-950 , and percent residual maximum flow at specific airway diameter ( PRMfSAD , Fig. 5 K). In contrast, TIMP4 expression indicated a non-significant association with questionnaire scores, including the COPD assessment test and clinical COPD questionnaire (Supplementary Data 19). These results suggest that TIMP4 is elevated across various samples in patients with COPD and is significantly associated with lung function and CT indicators, suggesting an essential role for TIMP4 in COPD progression. TIMP4-positive ciliated cells exhibit stronger effects from WNT and non-canonical WNT signaling pathways in both incoming and outgoing signals Using single-cell sequencing data from human and mouse lung tissues, we observed that TIMP4 is highly expressed in ciliated cells and indicates some expression in fibroblasts and pericytes or SMA cells (Figs. 6 A-B). We categorized ciliated cells into TIMP4-positive (TIMP4_Pos) and TIMP4-negative (TIMP4_Neg) subsets and analyzed the DEGs between them (Fig. 6 C). In human ciliated cells, TIMP4_Pos exhibited 1,267 significantly upregulated genes and 219 downregulated genes, while in mice, they exhibited 513 upregulated and 229 downregulated genes (Supplementary Data 20). An enrichment analysis of the upregulated genes in TIMP4_Pos ciliated cells revealed significant associations with cilium formation, ciliary structure, and ciliary movement in both datasets (Fig. 6 D and Supplementary Data 21). The PPI network highlighted cilium-related functions at the core (Figure S15A). Notably, 141 overlapping upregulated genes were identified between humans and mice (Figure S15B), suggesting similar functional characteristics despite species differences. Additionally, downregulated genes in TIMP4_Pos ciliated cells were primarily enriched in interferon signaling , VEGFA-VEGFR2 signaling , and L13a-mediated translational silencing of ceruloplasmin expression (Fig. 6 E), with 11 overlapping genes found across both datasets (Figure S15C and Supplementary Data 22). Subsequently, we analyzed intercellular communication between TIMP4_Pos and TIMP4_Neg ciliated cells and other cell subpopulations. Our findings revealed that in both human and mouse sequencing data, the signal intensity of WNT and non-canonical WNT pathways was significantly more vigorous in TIMP4_Pos ciliated cells than TIMP4_Neg ciliated cells, both in incoming and outgoing signaling patterns (Figs. 5 F and S16). Increasing evidence suggests that WNT and non-canonical WNT pathways play crucial roles in epithelial cell functions in patients with COPD[ 30 ], including tissue repair and epithelial-mesenchymal transition (EMT)[ 31 ]. We analyzed the receptor-ligand interactions of WNT and non-canonical WNT pathways in TIMP4_Pos and TIMP4_Neg ciliated cells. In the human sequencing data for the WNT pathway, alveolar fibroblasts, AT2 cells, basal or club cells, and B plasma cells revealed a stronger effect on TIMP4_Pos ciliated cells. Conversely, TIMP4_Pos ciliated cells significantly impacted AT1, AT2, basal or club cells, VE veins, and ciliated cells (Figs. 5 G and S17A). For the non-canonical WNT pathway , basal or club cells exerted a stronger effect on TIMP4_Pos ciliated cells, which also influenced AT1, AT2, basal or club cells, VE veins, and ciliated cells (Figs. 5 G and S17B and Supplementary Data 23). In the mouse sequencing data, the WNT pathway indicated that basal cells exhibited a significant effect on TIMP4_Pos ciliated cells, while the latter exhibited greater effects on fibroblasts, endothelial cells, and ciliated cells (Figures S18A-B). In the non-canonical WNT pathway , TIMP4-positive ciliated cells indicated a stronger effect on endothelial cells (Figures S17C-D and Supplementary Data 24). Moreover, we examined the expression differences of WNT and non-canonical WNT ligands and receptors in TIMP4_Pos and TIMP4_Neg ciliated cells. In human sequencing, WNT7B, WNT9A, FZD3, FZD6, LRP6, and WNT5B were observed to be highly expressed in TIMP4_Pos cells (Figures S19A-B). In mouse sequencing, WNT4, WNT7b, FZD6, FZD3, LRP6, WNT5a, and WNT11 were similarly elevated in TIMP4-positive cells (Figures S20A and B). The results indicate that TIMP4-positive ciliated cells may exert functional effects on ciliated cells through WNT or non-canonical WNT pathways . TIMP4 overexpression in airway epithelial cells leads to a reduction in the expression of ciliated genes Subsequently, we conducted immunohistochemical analysis and confirmed that TIMP4 expression is elevated in the lung tissue of patients with COPD, with its primary localization in epithelial cells (Figure S21A). Furthermore, RNA levels of TIMP4 are increased in bronchial brushings from patients with COPD (Figure S21B). We constructed a lentivirus carrying a TIMP4 overexpression vector further to investigate the role of TIMP4 in epithelial cells. We transduced primary airway epithelial cells obtained from bronchial brushings with the TIMP4 vector, followed by culturing the cells at the ALI in a Transwell system. On day 6 of differentiation, we stimulated the cells with CSE and collected them on day 28 (Fig. 7 A). Multiplex immunofluorescence (IF) confirmed the successful establishment of the ALI model (Figure S21C). Notably, we observed diminished cilia fluorescence in cells with high TIMP4 expression and validated the efficient overexpression of TIMP4 in epithelial cells at the ALI (Figures S21D-E). Subsequently, we performed transcriptome sequencing on four groups of epithelial cells at ALI: CON_PBS , TIMP4_PBS , CON_CSE , and TIMP4_CSE (Figure S22A). The expression of DEGs was evident in both PBS and CSE groups (Figures S22B-C). We performed GSEA on CON_PBS versus TIMP4_PBS and CON_CSE versus TIMP4_CSE groups. The TIMP4_PBS group is significantly associated with the FN matrix formation and interleukin (IL)-18 signaling pathways (Figure S22D). In the TIMP4_CSE group, significant positive correlations were observed with the IL-36 signaling, FN matrix formation, and LTR4/CysLTR-mediated IL-4 production pathways (Figure S22E). Notably, the latter has been reported to be closely associated with airway inflammation, particularly in asthma[ 32 ]. Subsequently, we conducted WGCNA and categorized all genes into 13 modules (Figures S22F-G). Correlation analysis based on group characteristics revealed that the orangered4 module was positively correlated with the TIMP4_CSE group, while the darkmagenta and mediumpurple3 modules indicated significant negative correlations with this group (Fig. 7 B). We performed enrichment analysis on the genes from these three modules. The darkmagenta module, containing the highest number of genes (Supplementary Data 25), was predominantly enriched in cilia-related processes, including cilium organization , ciliopathies , cilium assembly , and epithelial cilium movement (Fig. 7 C). The mediumpurple3 module was associated with positive regulation of endothelial cell migration and transport along microtubules (Fig. 7 D), while the orangered4 module was linked to hemostasis (Fig. 7 E). By utilizing the 235 genes identified by Anirudh Patir et al. as closely linked to ciliary function, we found that 160 of these genes intersected with the darkmagenta module (Figure S22H and Supplementary Data 26). Notably, these genes were significantly downregulated upon CSE stimulation, with a more pronounced downregulation observed in the TIMP4_CSE group compared to the CON_CSE group (Fig. 7 F). Besides, we analyzed the genes from the three identified modules concerning DEG sets of ciliated cells from single-cell RNA sequencing of human lung tissues. This analysis revealed 169 overlapping downregulated genes and 67 overlapping upregulated genes (Figure S23A). Enrichment analysis indicated that the downregulated genes were primarily associated with ciliary function. In contrast, the upregulated genes were enriched in processes related to negative regulation of cell proliferation , response to bacteria , and response to reactive oxygen species (Figure S23B). We identified eight distinct clusters using the Mfuzz package for clustering based on gene expression changes (Figure S23C and Supplementary Data 27). Among the genes in the three WGCNA modules, cluster 2 contained the most overlapping genes, comprising 58.08% of the WGCNA gene set (Figure S23D), and was significantly enriched in ciliary functions (Figure S23E). These findings suggest that gene expression changes induced by TIMP4 overexpression in airway epithelial cells cultured at ALI are primarily enriched in ciliary-related pathways, highlighting a solid connection between TIMP4 expression and ciliary structure and function. Overexpression of TIMP4 reduces cilia in airway epithelial cells cultured at ALI We conducted an IF analysis on primary epithelial cells cultured at ALI based on these results. The fluorescence intensity of the ciliary markers α-tubulin and muc5ac in TIMP4 and NC groups of the PBS cohort indicated non-significant changes. However, the TIMP4_CSE group exhibited a notable reduction in α-tubulin compared to the CON_CSE group, while the increase in muc5ac fluorescence was statistically non-significant (Fig. 8 A). These findings suggest a solid connection between TIMP4 and ciliary structure and function. Based on sequencing data and previous studies, we identified critical genes associated with ciliary function, including RFX3, TTLL6, HYDIN, SPAG16, SPAG17, DNAAF1, DNAH11, CFAP44, and CFAP65 . In sequencing data from airway brush samples, TIMP4 expression was negatively correlated with the genes above, indicating statistically significant correlations with TTLL6, HYDIN, DNAAF1, DNAH11, CFAP44 , and SPAG17 (Figure S24A). Further correlation analyses in COPD and control groups revealed that TIMP4 was significantly negatively correlated with SPAG17, HYDIN , and DNAH11 (Figure S24B). Furthermore, we validated the expression of these ciliary genes in cells cultured at ALI. The findings indicated that CSE stimulation significantly decreased the expression of most ciliary genes. Notably, RNA levels of RFX3, TTLL6, DNAAF1, HYDIN , and SPAG17 were markedly reduced in the TIMP4_CSE group compared to the CON_CSE group (Fig. 8 B), while expression levels of SPAG17 , DNAH11, CFAP65 , and CFAP44 did not differ significantly between the two groups (Figure S25A). As a TIMP family member, TIMP4 primarily interacts with MMPs to exert its effects. We explored the correlation between TIMP4 and MMPs in patients with COPD and found significant positive correlations with MMP1, MMP3, MMP8, MMP9, and MMP10 in the GSE47460 dataset (Figure S25B). In ALI sequencing results, MMP9 and MMP10 levels were notably elevated in the TIMP4_CSE group compared to the CON_CSE group (Figure S25C). Furthermore, the correlation between TIMP4 and MMP9 as well as MMP10 was more pronounced in the COPD group within the lung tissue sequencing data (Figure S25D). Prior research indicated that MMP9 and MMP10 are upregulated in patients with COPD and are often associated with epithelial cell EMT and the WNT/β-catenin pathway. In ALI cells, the levels of MMP9 and FN1, a marker of epithelial cell EMT, were significantly higher in the TIMP4_CSE group than in the CON_CSE group (Fig. 8 C). Furthermore, β-catenin and phosphorylated β-catenin expression levels were elevated in the TIMP4_CSE group (Fig. 8 D). These results suggest that the elevated expression of TIMP4 under CSE stimulation may impair ciliated cell function, potentially by regulating MMP9 and the β-catenin pathway. Discussion COPD is a heterogeneous lung disease characterized by chronic respiratory symptoms and persistent airflow obstruction[ 33 ]. However, classifying the diverse population of patients with COPD into distinct subtypes poses significant challenges[ 34 ]. COPD subtypes can be categorized based on various phenotypes, including those related to disease progression (rapid decline in lung function), exacerbation patterns (frequent acute exacerbators), and comorbidity-related phenotypes (COPD-asthma overlap and COPD-bronchiectasis overlap)[ 35 ]. Recently, there has been growing focus on the type 2 inflammation phenotype of COPD, characterized by eosinophilia, which underscores the need for a more nuanced understanding of this disease[ 36 ]. The substantial phenotypic divergence among COPD patients—even those meeting standardized diagnostic criteria—underscores the limitations of relying solely on clinical symptoms and spirometry for disease stratification[ 37 ]. This variability presents substantial challenges that complicate early intervention and treatment. Advancing in sequencing technologies and genomics drive research into the genetic factors underlying COPD, aiming to elucidate its pathogenesis[ 38 ] and enable classifying patients with COPD into subtypes for more precise and targeted therapeutic approaches[ 39 ]. Global sequencing initiatives in COPD lung tissue have yielded numerous putative pathogenic candidates, yet marked interstudy discordance persists among reported genes[ 40 ]. This variability stems from technical confounders (specimen procurement protocols, biopsy localization, platform variability) compounded by patient subphenotypic diversity.[ 40 ]. Emerging meta-analytic strategies—including cross-cohort DEG intersection [ 41 ] and rank-rank abundance normalization[ 7 ] — enhance reproducibility by prioritizing genes with conserved differential expression. However, such approaches risk excluding functionally pivotal genes exhibiting non-linear expression dynamics or failing to meet arbitrary fold-change thresholds[ 42 ]. Contemporary clinical research increasingly deploys ML-driven frameworks to identify prognostic biomarkers and construct predictive models. Hendrik Pott's COSYCONET model demonstrated IL-6 and CRP thresholdspredict 3-year COPD exacerbation risk [ 43 ]. Genomic integration strategies now synergize clinical parameters with computational analytics: Justin Sui's single-cell resolution analysis identified gelsolin as a key COPD mediator through neural network-driven feature selection[ 44 ], while Erkang Yi's multimodal integration revealed CLEC5A's role in early-stage COPD via gradient boosting classifiers[ 9 ]. Our multi-cohort machine learning framework integrating clinical and transcriptomic data from distinct COPD populations revealed striking molecular heterogeneity, evidenced by the complete lack of overlapping candidate genes between cohorts through LASSO regression analysis. To address this biological variability, we implemented a cross-cohort validation strategy combining feature selection models, ultimately identifying a 13-gene diagnostic signature containing both partially characterized (FGG, TIMP4, CXCL12, HTR2B) and novel COPD-associated targets (ANGPTL1, DUSP26, GAS2, VEGFD, BHLHE22, SYNGR1, GEMIN5, SV2B, TMEM117). Our results validate and extend key COPD-related findings: The elevated FGG expression corroborates Zhang et al.'s clinical observation[ 45 ], while our multi-omics feature selection reinforces Hao et al.'s reported association between TIMP4 levels and FEV1% decline/acute exacerbations[ 46 ]. CXCL12's inclusion suggests broader mechanistic implications beyond Roos et al.'s established IL-17A-mediated neutrophilic inflammation[ 47 ], potentially involving alternative pathological dimensions. HTR2B's independent selection not only supports Li et al.'s comorbidity hypothesis [ 48 ] but also reveals its potential role in non-neoplastic COPD progression. Currently, there have been no reported studies linking ANGPTL1, DUSP26, GAS2, VEGFD, BHLHE22, SYNGR1, GEMIN5, SV2B, and TMEM117 to COPD. The gene-derived models demonstrated significant correlations with pulmonary function decline and emphysema progression. Multi-tissue expression profiling revealed diagnostic score distribution across both structural (alveolar epithelial) and immune (macrophage/T-cell) compartments, substantiating COPD's multi-factorial pathomechanistic networks. While ensemble models showed strong COPD specificity (random forest: 82.3%; extra trees: 79.1%), reduced discrimination in non-COPD cohorts (specificity: 68.9%) suggests prevalent pre-symptomatic molecular signatures preceding functional impairment. These findings necessitate longitudinal validation to establish diagnostic thresholds for early-stage COPD/PRISM identification. Mendelian randomization (MR) capitalizes on genetic variants' random segregation during meiosis to mitigate confounding biases in causal inference[ 49 ]. Our application of multivariable MR (SMR/TSMR) revealed TIMP4—a ciliated cell-enriched protease—as a COPD-predisposing gene with dose-dependent effects on lung function decline. Despite limited instrument variable availability across tissues, rigorous colocalization analyses confirmed lung-specific causal effects distinct from blood-mediated pathways. Respiratory cilia, essential for maintaining airway sterility through coordinated mucociliary clearance, exhibit pathologically reduced beat frequency[ 50 ] and dyssynchrony in COPD, creating a vicious cycle of retained pathogens, chronic inflammation, and progressive tissue remodeling[ 35 ].In addition, Transforming growth factor-beta and MMPs are crucial in COPD pathogenesis[ 51 ], numerous studies report persistent activation of the WNT/β-catenin pathway in the airway epithelium of patients with COPD, promoting EMT of the airway epithelium. TIMP4, a member of the TIMP family, plays an essential role in regulating the homeostasis of the ECM [ 35 ] but also modulates processes such as fibrosis[ 35 ] and inflammation[ 36 ]. While TIMPs have been extensively studied in cancer and vascular diseases, research on TIMP-4 is limited[ 52 ]. Jenny Lutshumba et al. demonstrated that TIMP4 transcriptional activation can contribute to the development of abdominal aortic aneurysm[ 53 ]. Research on TIMP4 in pulmonary diseases, particularly COPD, is scarce. However, studies have indicated that TIMP-4 levels in exhaled breath condensate are elevated in patients with COPD compared to healthy controls and negatively correlated with predicted FEV1%[ 54 ]. Although we have identified TIMP4 as a potential pathogenic gene in COPD progression, our study did not explore its specific functional mechanisms, particularly in vivo . The observed elevation of MMPs in COPD patients may trigger compensatory increases in TIMPs to counteract protease hyperactivity[ 55 , 56 ], as evidenced by the strong positive correlation between TIMP4 and MMPs levels in our analyses. We hypothesize that TIMP4 may play a role in ciliated cells based on MR and cohort sample results, as well as single-cell sequencing analysis. To investigate this, we used human primary epithelial cells cultured at an ALI and exposed to prolonged CSE to simulate authentic airway epithelial conditions. Under CSE stimulation, overexpression of TIMP4 significantly reduced the expression of cilia-related genes, a phenomenon not observed in the PBS control group. This suggests that TIMP4 may exert its effects only in response to external stimuli. We propose that TIMP4, through its impairment of ciliary function, exacerbates COPD progression - a maladaptive compensatory response to MMP dysregulation that paradoxically accelerates airway remodeling[ 56 ]. Our findings suggest TIMP4 contributes to COPD pathogenesis by disrupting MMP/anti-protease balance, leading to abnormal collagen deposition, and may exacerbate airflow limitation through ECM-mediated impairment of airway smooth muscle elasticity and mucociliary clearance. These hypothesized mechanisms require experimental validation, underscoring the need for future studies using ciliated cell-specific TIMP4 knockout models to establish its precise role in disease progression. In summary, we developed a model constructed using various machine learning methods to specifically identify COPD in multi-center patient cohorts. We identified TIMP4 as a potential pathogenic gene by integrating single-cell analysis of human lung tissues and lung tissues from COPD mouse models and MR analysis of GWAS data related to COPD and lung function. Additionally, we preliminarily confirmed its impact on ciliated cells in human primary epithelial cells cultured at ALI. Future research should investigate the specific role of TIMP4 in ciliated cells during COPD progression and uncover its underlying mechanisms. Further efforts should be made to evaluate TIMP4’s potential as a biomarker and early therapeutic target for COPD. Declarations Data availability The datasets supporting the conclusions of this article are available in online repositories. Names of repositories/repositories and accession number(s) are provided in the article/Additional Material. The transcriptome sequencing data from this study have been deposited in the Genome Sequence Archive (GSA) under accession number OMIX009278. All other relevant materials are available from the corresponding author upon reasonable request. Funding This study received grants from the National Natural Science Foundation of China (Nos.: 82200045, 82270043, and 81970045), the Foundation of Guangzhou National Laboratory (Nos.: SRPG23-005, SRPG22-018 and SRPG22-016), the Youth Foundation of the National Key Laboratory of Respiratory Diseases (No.: SKLRD-Z-202326), and the Postdoctoral Startup Foundation of Guangzhou City (grants awarded to Q.Y.L). Human Ethics and Consent to Participate declarations This study was conducted in accordance with the ethical guidelines outlined in the Declaration of Helsinki and approved by the Ethics Committee of the First Affiliated Hospital of Guangzhou Medical University (approval number: 2020-51). The peripheral blood and airway brush cell samples from COPD patients and the control group were sourced from the Early Chronic Obstructive Pulmonary Disease (ECOPD) study (Trial registration: Chinese Clinical Trial Registry ChiCTR190002464. Registered on 19 July 2019). Written informed consent was obtained from all participants prior to specimen collection. Contributions Erkang Yi, Haiqing Li, and Pixin Ran conceived and designed the experiments. Erkang Yi, Haiqing Li, Yu Liu, Qingyang Li, Ruining Sun, and Hairong Wang performed the experiments. Erkang Yi and Chengshu Xie performed bioinformatics analysis. Erkang Yi, Fan Wu, Kunning Zhou, Zhishan Deng and Xinru Ran performed clinical correlation analysis. Erkang Yi and Haiqing Li wrote the manuscript and analyzed the data. Erkang Yi, Qingyang Li, Yumin Zhou, and Pixin Ran supervised the research and acquired the funding. Consent for publication Not applicable. Conflict of interest The authors declare no conflicts of interest. References Wang C, Xu J, Yang L, Xu Y, Zhang X, Bai C, Kang J, Ran P, Shen H, Wen F et al : Prevalence and risk factors of chronic obstructive pulmonary disease in China (the China Pulmonary Health [CPH] study): a national cross-sectional study . Lancet 2018, 391 (10131):1706-1717. Sin DD, Doiron D, Agusti A, Anzueto A, Barnes PJ, Celli BR, Criner GJ, Halpin D, Han MK, Martinez FJ et al : Air pollution and COPD: GOLD 2023 committee report . Eur Respir J 2023, 61 (5). Belz DC, Putcha N, Alupo P, Siddharthan T, Baugh A, Hopkinson N, Castaldi P, Papi A, Mannino D, Miravitlles M et al : Call to Action: How Can We Promote the Development of New Pharmacologic Treatments in COPD? Am J Respir Crit Care Med 2024. Ezzie ME, Crawford M, Cho JH, Orellana R, Zhang S, Gelinas R, Batte K, Yu L, Nuovo G, Galas D et al : Gene expression networks in COPD: microRNA and mRNA regulation . Thorax 2012, 67 (2):122-131. Morrow JD, Zhou X, Lao T, Jiang Z, DeMeo DL, Cho MH, Qiu W, Cloonan S, Pinto-Plata V, Celli B et al : Functional interactors of three genome-wide association study genes are differentially expressed in severe chronic obstructive pulmonary disease lung tissue . Sci Rep 2017, 7 :44232. Castaldi PJ, Benet M, Petersen H, Rafaels N, Finigan J, Paoletti M, Marike Boezen H, Vonk JM, Bowler R, Pistolesi M et al : Do COPD subtypes really exist? COPD heterogeneity and clustering in 10 independent cohorts . Thorax 2017, 72 (11):998-1006. Yi E, Cao W, Zhang J, Lin B, Wang Z, Wang X, Bai G, Mei X, Xie C, Jin J et al : Genetic screening of MMP1 as a potential pathogenic gene in chronic obstructive pulmonary disease . Life Sci 2022:121214. Cao W, Li J, Che L, Yang R, Wu Z, Hu G, Zou W, Zhao Z, Zhou Y, Jiang X et al : Single-cell transcriptomics reveals e-cigarette vapor-induced airway epithelial remodeling and injury . Respir Res 2024, 25 (1):353. Li Q, Liu Y, Wang X, Xie C, Mei X, Cao W, Guan W, Lin X, Xie X, Zhou C et al : The influence of CLEC5A on early macrophage-mediated inflammation in COPD progression . Cell Mol Life Sci 2024, 81 (1):330. Yi E, Zhang J, Zheng M, Zhang Y, Liang C, Hao B, Hong W, Lin B, Pu J, Lin Z et al : Long noncoding RNA IL6-AS1 is highly expressed in chronic obstructive pulmonary disease and is associated with interleukin 6 by targeting miR-149-5p and early B-cell factor 1 . Clin Transl Med 2021, 11 (7):e479. Yi E, Lin B, Zhang Y, Wang X, Zhang J, Liu Y, Jin J, Hong W, Lin Z, Cao W et al : Smad3-mediated lncRNA HSALR1 enhances the non-classic signalling pathway of TGF-beta1 in human bronchial fibroblasts by binding to HSP90AB1 . Clin Transl Med 2023, 13 (6):e1292. Kim S, Herazo-Maya JD, Kang DD, Juan-Guardela BM, Tedrow J, Martinez FJ, Sciurba FC, Tseng GC, Kaminski N: Integrative phenotyping framework (iPF): integrative clustering of multiple omics data identifies novel lung disease subphenotypes . BMC Genomics 2015, 16 :924. Cruz T, Lopez-Giraldo A, Noell G, Guirao A, Casas-Recasens S, Garcia T, Saco A, Sellares J, Agusti A, Faner R: Smoking Impairs the Immunomodulatory Capacity of Lung-Resident Mesenchymal Stem Cells in Chronic Obstructive Pulmonary Disease . Am J Respir Cell Mol Biol 2019, 61 (5):575-583. de Fays C, Geudens V, Gyselinck I, Kerckhof P, Vermaut A, Goos T, Vermant M, Beeckmans H, Kaes J, Van Slambrouck J et al : Mucosal immune alterations at the early onset of tissue destruction in chronic obstructive pulmonary disease . Front Immunol 2023, 14 :1275845. Steiling K, van den Berge M, Hijazi K, Florido R, Campbell J, Liu G, Xiao J, Zhang X, Duclos G, Drizik E et al : A dynamic bronchial airway gene expression signature of chronic obstructive pulmonary disease and lung function impairment . Am J Respir Crit Care Med 2013, 187 (9):933-942. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES et al : Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles . Proc Natl Acad Sci U S A 2005, 102 (43):15545-15550. Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, Gable AL, Fang T, Doncheva NT, Pyysalo S et al : The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest . Nucleic Acids Res 2023, 51 (D1):D638-D646. Langfelder P, Horvath S: WGCNA: an R package for weighted correlation network analysis . BMC Bioinformatics 2008, 9 :559. Engebretsen S, Bohlin J: Statistical predictions with glmnet . Clin Epigenetics 2019, 11 (1):123. Sauler M, McDonough JE, Adams TS, Kothapalli N, Barnthaler T, Werder RB, Schupp JC, Nouws J, Robertson MJ, Coarfa C et al : Characterization of the COPD alveolar niche using single-cell RNA sequencing . Nat Commun 2022, 13 (1):494. Watanabe N, Fujita Y, Nakayama J, Mori Y, Kadota T, Hayashi Y, Shimomura I, Ohtsuka T, Okamoto K, Araya J et al : Anomalous Epithelial Variations and Ectopic Inflammatory Response in Chronic Obstructive Pulmonary Disease . Am J Respir Cell Mol Biol 2022, 67 (6):708-719. Wu F, Zhou Y, Peng J, Deng Z, Wen X, Wang Z, Zheng Y, Tian H, Yang H, Huang P et al : Rationale and design of the Early Chronic Obstructive Pulmonary Disease (ECOPD) study in Guangdong, China: a prospective observational cohort study . J Thorac Dis 2021, 13 (12):6924-6935. Shrine N, Guyatt AL, Erzurumluoglu AM, Jackson VE, Hobbs BD, Melbourne CA, Batini C, Fawcett KA, Song K, Sakornsakolpat P et al : New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries . Nat Genet 2019, 51 (3):481-493. Higbee DH, Granell R, Sanderson E, Davey Smith G, Dodd JW: Lung function and cardiovascular disease: a two-sample Mendelian randomisation study . Eur Respir J 2021, 58 (3). Burgess S, Thompson SG: Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects . Am J Epidemiol 2015, 181 (4):251-260. Consortium GT: Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans . Science 2015, 348 (6235):648-660. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, Montgomery GW, Goddard ME, Wray NR, Visscher PM et al : Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets . Nat Genet 2016, 48 (5):481-487. Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, Laurin C, Burgess S, Bowden J, Langdon R et al : The MR-Base platform supports systematic causal inference across the human phenome . Elife 2018, 7 . Han Z, Tian R, Ren P, Zhou W, Wang P, Luo M, Jin S, Jiang Q: Parkinson's disease and Alzheimer's disease: a Mendelian randomization study . BMC Med Genet 2018, 19 (Suppl 1):215. Baarsma HA, Konigshoff M: 'WNT-er is coming': WNT signalling in chronic lung diseases . Thorax 2017, 72 (8):746-759. Heijink IH, de Bruin HG, Dennebos R, Jonker MR, Noordhoek JA, Brandsma CA, van den Berge M, Postma DS: Cigarette smoke-induced epithelial expression of WNT-5B: implications for COPD . Eur Respir J 2016, 48 (2):504-515. Gartner Y, Bitar L, Zipp F, Vogelaar CF: Interleukin-4 as a therapeutic target . Pharmacol Ther 2023, 242 :108348. Agusti A, Hogg JC: Update on the Pathogenesis of Chronic Obstructive Pulmonary Disease . N Engl J Med 2019, 381 (13):1248-1256. Corlateanu A, Mendez Y, Wang Y, Garnica RJA, Botnaru V, Siafakas N: "Chronic obstructive pulmonary disease and phenotypes: a state-of-the-art." . Pulmonology 2020, 26 (2):95-100. Han MK, Hanania NA, Martinez FJ: Confronting the Challenge of COPD: What Is New in the Approaches to Diagnosis, Treatment, and Patient Outcomes . Chest 2018, 154 (4):984-985. Bhatt SP, Rabe KF, Hanania NA, Vogelmeier CF, Bafadhel M, Christenson SA, Papi A, Singh D, Laws E, Patel N et al : Dupilumab for COPD with Blood Eosinophil Evidence of Type 2 Inflammation . N Engl J Med 2024, 390 (24):2274-2283. Segal LN, Martinez FJ: Chronic obstructive pulmonary disease subpopulations and phenotyping . J Allergy Clin Immunol 2018, 141 (6):1961-1971. Agusti A, Celli B, Faner R: What does endotyping mean for treatment in chronic obstructive pulmonary disease? Lancet 2017, 390 (10098):980-987. Leung JM, Obeidat M, Sadatsafavi M, Sin DD: Introduction to precision medicine in COPD . Eur Respir J 2019, 53 (4). Ragland MF, Benway CJ, Lutz SM, Bowler RP, Hecker J, Hokanson JE, Crapo JD, Castaldi PJ, DeMeo DL, Hersh CP et al : Genetic Advances in Chronic Obstructive Pulmonary Disease. Insights from COPDGene . Am J Respir Crit Care Med 2019, 200 (6):677-690. Zhu M, Ye M, Wang J, Ye L, Jin M: Construction of Potential miRNA-mRNA Regulatory Network in COPD Plasma by Bioinformatics Analysis . Int J Chron Obstruct Pulmon Dis 2020, 15 :2135-2145. Stockley RA, Halpin DMG, Celli BR, Singh D: Chronic Obstructive Pulmonary Disease Biomarkers and Their Interpretation . Am J Respir Crit Care Med 2019, 199 (10):1195-1204. Pott H, Weckler B, Gaffron S, Martin R, Maier D, Alter P, Biertz F, Speicher T, Bertrams W, Jung AL et al : Diffusion capacity and static hyperinflation as markers of disease progression predict 3-year mortality in COPD: Results from COSYCONET . Respirology 2024. Sui J, Xiao H, Mbaekwe U, Ting NC, Murday K, Hu Q, Gregory AD, Kapellos TS, Yildirim AO, Konigshoff M et al : Interpretable machine learning uncovers epithelial transcriptional rewiring and a role for Gelsolin in COPD . JCI Insight 2024. Zhang H, Li C, Song X, Cheng L, Liu Q, Zhang N, Wei L, Chung K, Adcock IM, Ling C et al : Integrated analysis reveals lung fibrinogen gamma chain as a biomarker for chronic obstructive pulmonary disease . Ann Transl Med 2021, 9 (24):1765. Hao W, Li M, Zhang Y, Zhang C, Wang P: Comparative Study of Cytokine Levels in Different Respiratory Samples in Mild-to-Moderate AECOPD Patients . Lung 2019, 197 (5):565-572. Roos AB, Sanden C, Mori M, Bjermer L, Stampfli MR, Erjefalt JS: IL-17A Is Elevated in End-Stage Chronic Obstructive Pulmonary Disease and Contributes to Cigarette Smoke-induced Lymphoid Neogenesis . Am J Respir Crit Care Med 2015, 191 (11):1232-1241. Li Y, Wang Y, Wu R, Li P, Cheng Z: HTR2B as a novel biomarker of chronic obstructive pulmonary disease with lung squamous cell carcinoma . Sci Rep 2024, 14 (1):13206. Walker VM, Davies NM, Hemani G, Zheng J, Haycock PC, Gaunt TR, Davey Smith G, Martin RM: Using the MR-Base platform to investigate risk factors and drug targets for thousands of phenotypes . Wellcome Open Res 2019, 4 :113. Loges NT, Marthin JK, Raidt J, Olbrich H, Hoben IM, Cindric S, Bracht D, Konig J, Rieck C, George S et al : A range of 30-62% of functioning multiciliated airway cells is sufficient to maintain ciliary airway clearance . Eur Respir J 2024, 64 (4). Churg A, Zhou S, Wright JL: Series "matrix metalloproteinases in lung health and disease": Matrix metalloproteinases in COPD . Eur Respir J 2012, 39 (1):197-209. Melendez-Zajgla J, Del Pozo L, Ceballos G, Maldonado V: Tissue inhibitor of metalloproteinases-4. The road less traveled . Mol Cancer 2008, 7 :85. Lutshumba J, Liu S, Zhong Y, Hou T, Daugherty A, Lu H, Guo Z, Gong MC: Deletion of BMAL1 in Smooth Muscle Cells Protects Mice From Abdominal Aortic Aneurysms . Arterioscler Thromb Vasc Biol 2018, 38 (5):1063-1075. Hao W, Li M, Zhang C, Zhang Y, Wang P: Inflammatory mediators in exhaled breath condensate and peripheral blood of healthy donors and stable COPD patients . Immunopharmacol Immunotoxicol 2019, 41 (2):224-230. Hao W, Li M, Zhang Y, Zhang C, Xue Y: Expressions of MMP-12, TIMP-4, and Neutrophil Elastase in PBMCs and Exhaled Breath Condensate in Patients with COPD and Their Relationships with Disease Severity and Acute Exacerbations . J Immunol Res 2019, 2019 :7142438. Sun J, Bao J, Shi Y, Zhang B, Yuan L, Li J, Zhang L, Sun M, Zhang L, Sun W: Effect of simvastatin on MMPs and TIMPs in cigarette smoke-induced rat COPD model . Int J Chron Obstruct Pulmon Dis 2017, 12 :717-724. Additional Declarations No competing interests reported. Supplementary Files Supplementaltables.xlsx SupplementalFigurescleanversionR1.docx WesternblotRawdata.pdf Cite Share Download PDF Status: Published Journal Publication published 23 Apr, 2025 Read the published version in Respiratory Research → Version 1 posted Editorial decision: Accepted 15 Apr, 2025 Reviews received at journal 13 Apr, 2025 Reviewers agreed at journal 13 Apr, 2025 Reviewers invited by journal 06 Apr, 2025 Submission checks completed at journal 04 Apr, 2025 First submitted to journal 02 Apr, 2025 You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-5739006","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":439210687,"identity":"8da7241f-0406-49bd-ae77-ce27576a0618","order_by":0,"name":"Erkang Yi","email":"","orcid":"","institution":"Guangzhou National Laboratory, Guangzhou International BioIsland","correspondingAuthor":false,"prefix":"","firstName":"Erkang","middleName":"","lastName":"Yi","suffix":""},{"id":439210688,"identity":"1af4ffa9-9ffb-4601-b64d-6c9c46bdaa25","order_by":1,"name":"Haiqing Li","email":"","orcid":"","institution":"the First Affiliated Hospital of Guangzhou Medical University","correspondingAuthor":false,"prefix":"","firstName":"Haiqing","middleName":"","lastName":"Li","suffix":""},{"id":439210689,"identity":"655e7954-e76c-4023-ac5d-ae27a426187f","order_by":2,"name":"Yu Liu","email":"","orcid":"","institution":"the First Affiliated Hospital of Guangzhou Medical University","correspondingAuthor":false,"prefix":"","firstName":"Yu","middleName":"","lastName":"Liu","suffix":""},{"id":439210690,"identity":"12b2a92d-1bbb-450d-9fc4-23910a698e05","order_by":3,"name":"Qingyang Li","email":"","orcid":"","institution":"the First Affiliated Hospital of Guangzhou Medical University","correspondingAuthor":false,"prefix":"","firstName":"Qingyang","middleName":"","lastName":"Li","suffix":""},{"id":439210691,"identity":"0c7c0e0e-7ec0-4637-a0e9-3130be06e516","order_by":4,"name":"Chengshu Xie","email":"","orcid":"","institution":"Guangzhou National Laboratory, Guangzhou International BioIsland","correspondingAuthor":false,"prefix":"","firstName":"Chengshu","middleName":"","lastName":"Xie","suffix":""},{"id":439210692,"identity":"7605a289-39c1-4911-800b-6492db56bdae","order_by":5,"name":"Ruining Sun","email":"","orcid":"","institution":"the First Affiliated Hospital of Guangzhou Medical University","correspondingAuthor":false,"prefix":"","firstName":"Ruining","middleName":"","lastName":"Sun","suffix":""},{"id":439210693,"identity":"873930c8-2e44-4c16-8424-3a760029dcb1","order_by":6,"name":"Fan Wu","email":"","orcid":"","institution":"the First Affiliated Hospital of Guangzhou Medical University","correspondingAuthor":false,"prefix":"","firstName":"Fan","middleName":"","lastName":"Wu","suffix":""},{"id":439210694,"identity":"ea03ab57-e60a-4770-925c-f4f8c87079fd","order_by":7,"name":"Zhishan Deng","email":"","orcid":"","institution":"the First Affiliated Hospital of Guangzhou Medical University","correspondingAuthor":false,"prefix":"","firstName":"Zhishan","middleName":"","lastName":"Deng","suffix":""},{"id":439210695,"identity":"fe184d77-3b51-484a-b6e7-6ed22f35c654","order_by":8,"name":"Kunning Zhou","email":"","orcid":"","institution":"the First Affiliated Hospital of Guangzhou Medical University","correspondingAuthor":false,"prefix":"","firstName":"Kunning","middleName":"","lastName":"Zhou","suffix":""},{"id":439210696,"identity":"0b67ed87-0a45-4221-84fd-98a13a0e21fc","order_by":9,"name":"Hairong Wang","email":"","orcid":"","institution":"Guangzhou National Laboratory, Guangzhou International BioIsland","correspondingAuthor":false,"prefix":"","firstName":"Hairong","middleName":"","lastName":"Wang","suffix":""},{"id":439210697,"identity":"0d8d9043-f507-4437-aa04-cc65845503f7","order_by":10,"name":"Xinru Ran","email":"","orcid":"","institution":"Guangzhou Medical University","correspondingAuthor":false,"prefix":"","firstName":"Xinru","middleName":"","lastName":"Ran","suffix":""},{"id":439210698,"identity":"9f7638ad-5c0c-4d58-84b1-3d7c3fb2f621","order_by":11,"name":"Yumin Zhou","email":"","orcid":"","institution":"the First Affiliated Hospital of Guangzhou Medical University","correspondingAuthor":false,"prefix":"","firstName":"Yumin","middleName":"","lastName":"Zhou","suffix":""},{"id":439210699,"identity":"b7f6a5fb-76f9-41bb-857f-0782c963e905","order_by":12,"name":"Pixin Ran","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAy0lEQVRIiWNgGAWjYNCCCgYGAzCDjWgtZ0jWwthGihZzidyHnwvn1dqbi50xYPhQdpiBf3YDfi2WPceNpWduO85sOTvHgHHGucMMEncO4NdicLyNQZp32zE2g9s5Bsy8bYcZDCQSCGg5zMb8m3fOMR6wlr9EaTnexibN21AjAdbCSIwWy55jbNY8xw4YGNxOKzjYcy6dR+IGAS3mEmnMt3lq6uwNbidvfPCjzFqOfwYhh0Gow2DyABDz4FeP0FJHUOEoGAWjYBSMYAAAGnk+OXKanxYAAAAASUVORK5CYII=","orcid":"","institution":"the First Affiliated Hospital of Guangzhou Medical University","correspondingAuthor":true,"prefix":"","firstName":"Pixin","middleName":"","lastName":"Ran","suffix":""}],"badges":[],"createdAt":"2024-12-31 04:08:15","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-5739006/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-5739006/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12931-025-03238-1","type":"published","date":"2025-04-23T15:57:43+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":80279722,"identity":"77b421db-820b-4959-9665-a4fc6b3e063b","added_by":"auto","created_at":"2025-04-10 05:37:18","extension":"jpg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":4448740,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAnalysis workflow of the study\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eHub genes for chronic obstructive pulmonary disease (COPD) recognition were identified through integrated analysis of multi-center lung tissue sequencing data from COPD patients, employing weighted gene co-expression network analysis (WGCNA) and multiple machine learning algorithms to establish a predictive gene model. Mendelian randomization analysis was subsequently applied to prioritize a central candidate gene. Functional exploration of this hub gene was conducted using single-cell RNA sequencing data derived from both human COPD specimens and murine experimental models. Clinical relevance was further validated by correlating its expression levels with disease severity metrics and spirometric parameters in primary COPD cohorts and independent validation datasets. Mechanistic investigations were completed through functional assays in an in vitro cellular model to evaluate its biological relevance in COPD pathogenesis.\u003c/p\u003e","description":"","filename":"Figure1.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-5739006/v1/fbb8f1bda603299fba2258c2.jpg"},{"id":80280667,"identity":"aaa19537-03a6-48e4-a041-65e0c97f9249","added_by":"auto","created_at":"2025-04-10 05:45:19","extension":"jpg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":5124367,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eIdentification of common gene signatures in lung tissue sequencing from two patients with COPD cohorts.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) The module-feature correlation heatmap depicts the correlation between modules and clinical parameters (COPD, FEV1% predicted, %LAA950, FVC% predicted or FEV/FVC) in GSE47460 and GSE76925. The number in the top left corner of each box represents the correlation coefficient, with red indicating a positive correlation and blue indicating a negative correlation. The number in the bottom right corner indicates the statistical significance, with darker colors representing greater statistical significance. (B) Gene enrichment analysis was conducted on significant modules identified by WGCNA in the GSE76925 and GSE47460 datasets. (C-D) The intersecting genes from COPD-positive modules and significantly upregulated DEGs, and those from COPD-negative modules and significantly downregulated DEGs, were included as candidate genes in a LASSO analysis using the GSE47460 dataset (C) or GSE76925 dataset (D). (E) After combining the genes selected by LASSO from GSE47460 and GSE76925, model construction and gene selection were performed separately in each dataset (GSE47460 and GSE76925). (F) The Venn diagram indicates the intersection of genes selected by four different machine learning methods (LASSO, Elasticnet, RFE-SVM, and Random Forest) in each of GSE47460 and GSE76925 datasets.\u003c/p\u003e","description":"","filename":"Figure2.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-5739006/v1/5694c13b461c284810cc7c0d.jpg"},{"id":80279736,"identity":"640647da-573f-42b9-8f23-6db5ca8e1b6d","added_by":"auto","created_at":"2025-04-10 05:37:19","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":4084641,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eA model constructed with 13 genes that accurately identify COPD.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A-B) ROC results demonstrate the performance of 20 different machine learning methods, based on the selected 13-gene model, in identifying COPD across GSE47460 (A) and GSE76925 (B). (C) ROC results indicated that the random forest and extra tree models constructed using GSE47460 and GSE76925 datasets were cross-validated against each other. (D) ROC results display AUC outcomes for validating the random forest and extra tree models using two external patients with COPD lung tissue sequencing data (GSE103174 and GSE239897). (E) The expression changes of the 13 genes (ANGPTL1, DUSP26, FGG, GAS2, VEGFD, BHLHE22, SYNGR1, TIMP4, CXCL12, GEMIN5, SV2B, HTR2B, and TMEM117) used to construct the model are indicated between control and COPD groups in GSE47460 and GSE76925 datasets. (F) Scatter plots revealing predicted versus observed FEV1% predicted values for each of the 13 regression models (AdaBoost, Decision Tree, ElasticNet, GLM, LASSO, Least.Angle, Linear, NeuralNet, RandomForest, Ridge, SGD, SVR, and voting) in GSE47460. Each point represents a sample in the test set. R-squared values and \u003cem\u003ep\u003c/em\u003e-values are displayed for each model.\u003c/p\u003e\n\u003cp\u003eData indicated mean ± SD. \u003cem\u003eP\u003c/em\u003e-values are indicated in charts determined by a two-tailed student test (E).\u003c/p\u003e","description":"","filename":"Figure3.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-5739006/v1/a6804576b1fc34e8897b04f5.jpg"},{"id":80279737,"identity":"dfbed804-82b1-47cc-8468-6c1a6cf370ab","added_by":"auto","created_at":"2025-04-10 05:37:19","extension":"jpg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":3554583,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe expression distribution of the 13 genes was used to construct the model across various lung cell subtypes in both humans and mice.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Uniform manifold approximation and projection (UMAP) plot visualizes single-cell transcriptomes of non-smokers and COPD in the GSE173896 dataset.\u003c/p\u003e\n\u003cp\u003e(B-C) UMAP plot (B) and dot-plot (C) visualize the expression scores of the 13 genes across different cell subtypes in the human lung from GSE173896.\u003c/p\u003e\n\u003cp\u003e(D) The enrichment of the 13 genes in different cell subpopulations of the human lung is illustrated by the bubble chart based on data from GSE173896.\u003c/p\u003e\n\u003cp\u003e(E) UMAP plot visualizes the single-cell transcriptomes of air-exposed mice and smoke-exposed mice from the GSE168299 dataset.\u003c/p\u003e\n\u003cp\u003e(F-G) UMAP plot (F) and dot-plot (G) visualize the expression scores of the 13 genes across different cell subtypes in the mouse lung from GSE168299.\u003c/p\u003e\n\u003cp\u003e(H) The enrichment of the 13 genes in different cell subpopulations of the mouse lung is illustrated by the bubble chart based on data from GSE168299.\u003c/p\u003e","description":"","filename":"Figure4.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-5739006/v1/0fd76477bc7ce159c3d1495d.jpg"},{"id":80279732,"identity":"1999a3e6-7a27-4f09-8270-cc725deade21","added_by":"auto","created_at":"2025-04-10 05:37:19","extension":"jpg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":1708379,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMR analysis identified TIMP4 as a potential hub gene influencing the progression of COPD within the signature.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A)\u003cstrong\u003e \u003c/strong\u003eDiagram and steps of MR analysis process between relative genes and COPD outcome (including COPD, FEV1/FVC \u0026lt; 0.7, FEV1, FVC, and FEV1/FVC). Assumption 1, Independent Variables (IVs) are associated with exposure; assumption 2, IVs are unaffected by confounders; assumption 3, IVs influence outcome solely through exposure, excluding alternative pathways.\u003c/p\u003e\n\u003cp\u003e(B)\u003cstrong\u003e \u003c/strong\u003eForest plot of TSMR analyses between the expression of relative genes in the lung with COPD (COPD diagnosis and FEV1/FVC \u0026lt; 0.7) and lung function (FEV1, FVC, and FEV1/FVC) outcomes. Only statistical results are visualized. Effect sizes (Beta, 95% confidence interval [CI]) are in standard deviation change in COPD outcome per standard deviation of the expression of relative genes in the lung. Points on the forest plot represent effect size estimates, while whiskers denote 95% CIs.\u003c/p\u003e\n\u003cp\u003e(C) The heatmap displays the results of TSMR and SMR for the association of relevant genes with COPD (defined as COPD diagnosis and FEV1/FVC \u0026lt; 0.7) and lung function outcomes (FEV1, FVC, and FEV1/FVC). Red indicates positive beta values, blue indicates negative beta values, and the white box without value indicates that the gene lacks sufficient SNPs for analysis as an instrumental variable.\u003c/p\u003e\n\u003cp\u003e(D-E)\u003cstrong\u003e \u003c/strong\u003eTIMP4 expression in patients with different GOLD stages from GSE47460 (D) and GSE76925 datasets (E).\u003c/p\u003e\n\u003cp\u003e(F-G)\u003cstrong\u003e \u003c/strong\u003eComparison of TIMP4 expression between control and patients with COPD in the GSE37147 dataset (F) and among patients in the former smokers and current smokers’ groups (G).\u003c/p\u003e\n\u003cp\u003e(H)\u003cstrong\u003e \u003c/strong\u003eExpression of TIMP4 in former smokers, current smokers, former smokers with COPD, and current smokers with COPD.\u003c/p\u003e\n\u003cp\u003e(I) Secretion levels of TIMP4 in peripheral blood of patients with COPD compared to the control group.\u003c/p\u003e\n\u003cp\u003e(J) Correlation analysis of TIMP4 secretion levels in peripheral blood with Pre_FEV1 and Post_FEV1.\u003c/p\u003e\n\u003cp\u003e(K)\u003cstrong\u003e \u003c/strong\u003eCorrelation analysis of TIMP4 secretion levels in peripheral blood with quantitative CT metrics, including %LAA950, %LAA856, and PRMfSAD.\u003c/p\u003e\n\u003cp\u003eData indicated mean ± SD. \u003cem\u003eP\u003c/em\u003e-values revealed in charts determined by two-tailed Student's t-test (D–I) and Pearson’s correlation two-tailed (J-K). *\u003cem\u003ep \u003c/em\u003e\u0026lt; 0.05, **\u003cem\u003ep\u003c/em\u003e\u0026lt; 0.01, ***\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001 and ****\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.0001.\u003c/p\u003e","description":"","filename":"Figure5.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-5739006/v1/5948afe01c48d96338bcea67.jpg"},{"id":80279733,"identity":"4bc1849e-7029-4f6f-b7a9-04718ba623ca","added_by":"auto","created_at":"2025-04-10 05:37:19","extension":"jpg","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":2006192,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTIMP4-positive ciliated cells exhibit stronger effects from\u003c/strong\u003e\u003cem\u003e\u003cstrong\u003e WNT\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e and \u003c/strong\u003e\u003cem\u003e\u003cstrong\u003enon-canonical WNT\u003c/strong\u003e\u003c/em\u003e\u003cstrong\u003e signaling pathways in both incoming and outgoing signals.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A-B) UMAP plot indicates the expression distribution of TIMP4 across cell subpopulations in human lung tissue (A) and lung tissue from a COPD mouse model (B), sourced from GSE173896 and GSE168299, respectively.\u003c/p\u003e\n\u003cp\u003e(C) Volcano plot representation of DEGs between TIMP4-positive and TIMP4-negative ciliated cells in humans (left) and mice (right). Each point represents a gene, with log\u003csup\u003e2\u003c/sup\u003e fold change on the x-axis and -log\u003csup\u003e10\u003c/sup\u003e (adjusted p-value) on the y-axis. Red and blue points indicate significantly upregulated and downregulated genes, respectively (\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.05 and |log2 fold change| \u0026gt; 0.2). Horizontal dashed lines represent the significance threshold, and vertical dashed lines indicate fold change cutoffs.\u003c/p\u003e\n\u003cp\u003e(D-E) Combined enrichment analysis of upregulated DEGs (A) or downregulated (B) in TIMP4-positive versus TIMP4-negative ciliated cells in patients with COPD and COPD mouse models using Metascape.\u003c/p\u003e\n\u003cp\u003e(F) Intercellular interactions among various cell types within TIMP4-positive and TIMP4-negative ciliated cells in patients with COPD from the GSE173896 dataset. Red text denotes TIMP4-positive and TIMP4-negative ciliated cells, while blue boxes indicate \u003cem\u003eWNT\u003c/em\u003e and \u003cem\u003encWNT\u003c/em\u003e pathways.\u003c/p\u003e\n\u003cp\u003e(G) Comparison of \u003cem\u003eWNT signaling pathway\u003c/em\u003e or \u003cem\u003encWNT signaling pathway\u003c/em\u003e network interactions between TIMP4-positive and TIMP4-negative ciliated cells and other cell subpopulations in GSE173896.\u003c/p\u003e","description":"","filename":"Figure6.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-5739006/v1/409beda0b3fc0cc56c7d109e.jpg"},{"id":80279725,"identity":"54fd7a81-84b5-401b-9f53-f2418c051fce","added_by":"auto","created_at":"2025-04-10 05:37:19","extension":"jpg","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":1746787,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTIMP4 overexpression in airway epithelial cells leads to a reduction in the expression of ciliated genes.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Flowchart illustrating the sampling of primary human epithelial cells, lentiviral infection, establishment of the ALI, and CSE stimulation.\u003c/p\u003e\n\u003cp\u003e(B) The module-feature correlation heatmap depicts the correlation between modules and groups (CON_PBS, CON_CSE, TIMP4_PBS, and TIMP4_CSE) in RNA-seq of airway epithelial cells cultured at ALI. The number in the top left corner of each box represents the correlation coefficient, with red indicating a positive correlation and green indicating a negative correlation. The number in the bottom right corner indicates the statistical significance, with darker colors representing greater statistical significance.\u003c/p\u003e\n\u003cp\u003e(C and E) Enrichment analyses using Metascape for genes in the WGCNA modules darkmagenta, mediumpurple3, and orangered4, respectively.\u003c/p\u003e\n\u003cp\u003e(F) The heatmap displays the expression of 160 intersecting ciliated genes in the sequencing results of airway epithelial cells cultured at ALI.\u003c/p\u003e","description":"","filename":"Figure7.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-5739006/v1/397d29cf9c4624999f3a2571.jpg"},{"id":80279741,"identity":"ccb2899f-45b1-41b7-85b5-11cb5eb7e244","added_by":"auto","created_at":"2025-04-10 05:37:19","extension":"jpg","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":6268523,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eTIMP4 overexpression reduces cilia in airway epithelial cells cultured at ALI and decreases ciliary gene expression.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e(A) Multiplex IF staining of human airway epithelial cells cultured at ALI. Green: α-tubulin; red: Muc5ac. Blue: DAPI (n = 5).\u003c/p\u003e\n\u003cp\u003e(B) qRT-PCR analysis of RNA levels of TTLL6, DNAFF1, RFX3, and HYDIN in primary human cells cultured at ALI across four groups (n = 5).\u003c/p\u003e\n\u003cp\u003e(C) WB analysis was conducted to assess the protein expression levels of FN1 and MMP9 across the four groups (n = 4).\u003c/p\u003e\n\u003cp\u003e(D) WB analysis was conducted to assess the protein expression levels of β-catenin and non-p-β-catenin across the four groups (n = 4).\u003c/p\u003e\n\u003cp\u003eData are expressed as mean ± SD. \u003cem\u003eP\u003c/em\u003e-values indicated in charts are determined by one-way analysis of variance (A–D). *\u003cem\u003ep \u003c/em\u003e\u0026lt; 0.05, **\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.01 and ***\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.001.\u003c/p\u003e","description":"","filename":"Figure8.tif.jpg","url":"https://assets-eu.researchsquare.com/files/rs-5739006/v1/4d7d7f77c5a1bfc06e84fb83.jpg"},{"id":81570887,"identity":"335bc028-146f-422f-8972-050ccbe8169b","added_by":"auto","created_at":"2025-04-28 16:14:14","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":32906546,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5739006/v1/704420b4-f005-43ec-89f4-b98fbede344e.pdf"},{"id":80280666,"identity":"0e27da39-b94f-42b6-a05c-6d865bd02a4e","added_by":"auto","created_at":"2025-04-10 05:45:18","extension":"xlsx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":2268282,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementaltables.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-5739006/v1/c887206b52401883a3c37362.xlsx"},{"id":80279764,"identity":"7af88c93-1131-4fd8-9c4f-f02573feb6af","added_by":"auto","created_at":"2025-04-10 05:37:21","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":50457983,"visible":true,"origin":"","legend":"","description":"","filename":"SupplementalFigurescleanversionR1.docx","url":"https://assets-eu.researchsquare.com/files/rs-5739006/v1/abf88dd592eedaa7fd05d97e.docx"},{"id":80279721,"identity":"4f5f2aa5-6a18-42b9-b6fe-bd8526672373","added_by":"auto","created_at":"2025-04-10 05:37:18","extension":"pdf","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":497861,"visible":true,"origin":"","legend":"","description":"","filename":"WesternblotRawdata.pdf","url":"https://assets-eu.researchsquare.com/files/rs-5739006/v1/9dcbf6d4a668bf3ee3387c5d.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"An integrated machine learning model of transcriptomic genes in multi-center chronic obstructive pulmonary disease reveals the causal role of TIMP4 in airway epithelial cell","fulltext":[{"header":"Introduction","content":"\u003cp\u003eChronic obstructive pulmonary disease (COPD) is a heterogeneous and complex respiratory condition that presents a significant social burden[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. Current treatments primarily focus on symptom management and preventing acute exacerbations, yet they have notable limitations[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. Common therapies, including bronchodilators, inhaled corticosteroids, and oxygen therapy, do not effectively halt disease progression or improve long-term outcomes[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Consequently, there is an urgent need to investigate COPD pathogenesis further to identify new therapeutic targets and develop more effective interventions.\u003c/p\u003e \u003cp\u003eNumerous studies have performed genomic sequencing on lung tissue samples from patients with COPD to elucidate the molecular mechanisms associated with the condition[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. However, these studies frequently produce inconsistent results across different research centers, similarly attributable to variations in study design, sample selection, sequencing technologies, and data analysis methods[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Furthermore, the differential genes identified from sequencing analyses of different cohorts vary. Even when common differential genes are identified, the correlation of their expression with clinical characteristics may also differ across cohorts[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn this study, we aimed to enhance robustness and reliability by integrating lung tissue sequencing data from patients with COPD across various centers. We employed machine learning techniques and model construction to identify key genes that effectively differentiate between patients with non-COPD and COPD. The model developed from these hub genes consistently distinguishes COPD from non-COPD across various datasets, potentially providing new insights into diagnostic markers and therapeutic targets for COPD.\u003c/p\u003e \u003cp\u003eFurther analysis using single-cell sequencing and Mendelian randomization (MR) studies of these hub genes revealed that tissue inhibitor of metalloproteinase 4 (TIMP4) is specifically expressed in ciliated cells and is significantly upregulated in patients with COPD. MR and clinical cohort data suggest a close relationship between TIMP4 expression, lung function, and computed tomography (CT) imaging findings. A comprehensive understanding of the early pathogenic events, derived from analyzing ciliated cells and their interactions with other cell types, could pave the way for innovative management strategies for complex, multifactorial chronic airway and pulmonary diseases.\u003c/p\u003e \u003cp\u003eThe TIMP family plays a crucial role in regulating extracellular matrix (ECM) degradation and remodeling in COPD by regulating matrix metalloproteinase (MMP) activity. This regulation balances tissue destruction and repair, influencing airway inflammation and fibrosis. We hypothesize that TIMP4 expression may be crucial in affecting ciliated cell function and airway clearance capacity, playing a key role in COPD progression.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eHuman bronchial brushing collection\u003c/h2\u003e \u003cp\u003ePrimary human bronchial epithelial (HBE) cells were obtained from brushings of 5th\u0026ndash;6th order bronchioles during fiberoptic bronchoscopy using an endoscopic cytobrush, as previously described[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. The material was obtained from the Biobank of the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China. The COPD diagnosis was confirmed by post-bronchodilator forced expiratory volume in one second (FEV1)/forced vital capacity (FVC)\u0026thinsp;\u0026lt;\u0026thinsp;70%. The exclusion criteria included asthma, bronchiectasis, pulmonary fibrosis, and active infection. This study followed the ethical guidelines outlined in the Declaration of Helsinki and was approved by the Ethics Committee of the First Affiliated Hospital of Guangzhou Medical University (approval number 2020-51). All participants provided written informed consent before enrollment.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eHBE cell air-liquid interface culture, lentiviral infections, and cigarette smoke extract treatment\u003c/h3\u003e\n\u003cdiv class=\"Heading\"\u003eHBE cell air-liquid interface culture, lentiviral infections, and cigarette smoke extract treatment\u003c/div\u003e \u003cp\u003eHBE cells were cultivated under air-liquid interface (ALI) conditions to form well-differentiated, pseudostratified cultures, following previously described methods[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Briefly, isolated HBE cells were maintained and expanded (one passage) in T75 flasks with bronchial epithelial cell expansion medium (AEGM, 05040, STEMCELL Technologies) at 37\u0026deg;C in a 5% carbon dioxide (CO\u003csub\u003e2\u003c/sub\u003e) incubator. At 80% confluence, cells were detached with 0.05% trypsin-ethylenediaminetetracetic acid (EDTA; Gibco) and seeded on membrane supports (12 mm Transwell culture inserts, 0.4 \u0026micro;m pore size, Costar) coated with 0.05 mg collagen from calf skin (Sigma\u0026ndash;Aldrich) in AEGM supplemented with 1% penicillin/streptomycin. HBE cells were cultured for two days until they reached complete confluence. The apical medium was removed, and the basal medium was replaced by an ALI culture medium (05001, STEMCELL Technologies). Cultures were maintained under ALI conditions by changing the medium in the basal filter chamber three times a week. For cigarette smoke extract (CSE) treatment, a 1.023 mg/mL stock solution was diluted to 0.02 mg/mL. Epithelial cells were cultured in a differentiation medium containing CSE at 37\u0026deg;C in a 5% CO\u003csub\u003e2\u003c/sub\u003e incubator from day 5 to day 14. For the rescue experiment, epithelial cells were cultured in a differentiation medium containing CSE at 37\u0026deg;C in a 5% CO\u003csub\u003e2\u003c/sub\u003e incubator from day 5 to day 14. The medium was replaced every 24 h before collection for analysis.\u003c/p\u003e\n\u003ch3\u003eRibonucleic acid extraction, complementary deoxyribonucleic acid synthesis, and quantitative real-time polymerase chain reaction\u003c/h3\u003e\n\u003cp\u003eRibonucleic acid (RNA) extraction from lung tissues and cells was performed using a commercially available RNA isolation kit, following the manufacturer's recommended protocol[\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Complementary deoxyribonucleic acid synthesis was conducted using a reverse transcription kit designed for quantitative polymerase chain reaction (qPCR) applications, with 1,000 ng of total RNA as the starting material. Quantitative real-time PCR (qRT-PCR) was conducted using a SYBR green-based PCR master mix on an RT-PCR detection system. Gene expression levels were quantified using the comparative CT (2\u003csup\u003e\u0026ndash;∆∆CT\u003c/sup\u003e) method, with glyceraldehyde-3-phosphate dehydrogenase (GAPDH) as the endogenous control. A complete list of primer sequences used in this study is provided in Supplementary Data 28.\u003c/p\u003e\n\u003ch3\u003eWestern blotting (WB)\u003c/h3\u003e\n\u003cp\u003eWB analysis was conducted using previously established protocols[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Briefly, protein samples from cell lines and lung tissue were prepared using radioimmunoprecipitation assay lysis buffer (Catalog # 89901, Thermo, USA) with added protease inhibitors (Catalog # 78430, Thermo, USA) and incubated at 4\u0026deg;C for 20 min. Proteins were separated on a 10% sodium dodecyl sulfate-polyacrylamide gel electrophoresis and transferred to polyvinylidene difluoride membranes (BioRad, USA). Membranes were blocked and incubated overnight at 4\u0026deg;C with primary antibodies against TIMP4 (Catalog # 12326-1-AP, proteintech, China), MMP9 (Catalog # 13667, CST, USA), fibronectin-1 (FN1; Catalog # 26836, CST, USA), β-catenin (Catalog # 9562, CST, USA), non-p-β-catenin (Catalog # 4176, CST, USA), and GAPDH (Catalog # 60004-1-Ig, proteintech, China). After washing, membranes were incubated with a horseradish peroxidase-conjugated secondary antibody (Proteintech) and visualized using enhanced chemiluminescence on an Amersham Imager 680 (Thermo Fisher Scientific, USA).\u003c/p\u003e\n\u003ch3\u003eMultiple immunofluorescence assay of HBE cells\u003c/h3\u003e\n\u003cp\u003eALI cultures were fixed in 4% paraformaldehyde overnight at 4\u0026deg;C, then incubated in a permeabilization solution (0.2% Triton X-100 in phosphate-buffered saline [PBS]) for 15 min. Subsequently, cultures were blocked with 10% goat serum, PBS, and 3% bovine serum albumin solutions for 1 h at room temperature (RT). Primary antibodies were applied and incubated overnight at 4\u0026deg;C, followed by washing and incubating with secondary antibodies for 1 h at RT. Cultures were then stained with 4',6-diamidino-2-phenylindole (DAPI) for 10 min at RT before being mounted for imaging. The following primary antibodies were used: Mouse anti-acetylated α-tubulin (1:1500, Sigma, T7451), mouse anti-TIMP4 (Catalog # 12326-1-AP, Proteintech, China), and rabbit anti-MUC5AC (1:200, Abcam, ab3649). Secondary antibodies included goat anti-mouse immunoglobulin G (IgG; H\u0026thinsp;+\u0026thinsp;L) cross-adsorbed secondary antibody, Alexa Fluor\u0026trade; 488/568/647, and goat anti-rabbit IgG (H\u0026thinsp;+\u0026thinsp;L) cross-adsorbed secondary antibody, Alexa Fluor\u0026trade; 488/568/647.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eMessenger RNA microarray chip datasets and bioinformatics\u003c/h2\u003e \u003cp\u003eSeveral microarray datasets, such as GSE47460[\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e], GSE76925[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e], GSE103174[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], GSE239897[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], and GSE37147[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e], were obtained from the Gene Expression Omnibus (GEO) repository. These datasets used various platforms: GPL14550 for GSE47460 (108 controls, 220 COPD samples), GPL10558 for GSE76925 (40 controls, 111 COPD samples), GPL13667 for GSE103174 (21 controls, 44 COPD samples), and GPL17303 for GSE239897 (40 controls, 111 COPD samples). For GSE37147, we excluded patients with a recent history of using inhaled medications, resulting in a final cohort of 136 controls and 63 patients with COPD.\u003c/p\u003e \u003cp\u003eData visualization and analysis were performed using R packages, such as \"ggplot2\" for volcano plots, \"ggbiplot\" for principal component analysis, and \"corrplot\" for gene correlation assessments. Differentially expressed genes (DEGs) were identified using the limma package from the R/Bioconductor. Significance criteria varied as follows: \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05 and fold changes\u0026thinsp;\u0026gt;\u0026thinsp;0.2 for GSE47460, while \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05 and fold changes\u0026thinsp;\u0026gt;\u0026thinsp;0.4 for GSE76925.\u003c/p\u003e \u003cp\u003eFunctional annotation of genes was performed using gene ontology enrichment analysis (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://geneontology.org\u003c/span\u003e\u003cspan address=\"http://geneontology.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) and Kyoto encyclopedia of genes and genomes pathway analysis (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://kegg.jp\u003c/span\u003e\u003cspan address=\"http://kegg.jp\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Gene set enrichment analysis (GSEA) was performed using dedicated software[\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], and multi-dataset integration was achieved using Metascape (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://metascape.org/\u003c/span\u003e\u003cspan address=\"http://metascape.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eProtein-protein interaction (PPI) networks were constructed using the STRING database[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://string-db.org\u003c/span\u003e\u003cspan address=\"http://string-db.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e) and visualized with Cytoscape software (version 3.8.3). Weighted gene co-expression network analysis (WGCNA) was conducted on GSE47460 and GSE76925 datasets and RNA-seq using the 'WGCNA' R package[\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. Networks were correlated with COPD clinical status and various pulmonary function parameters, including FEV1 percentage of predicted value (FEV1%pre), FVC percentage of predicted value (FVC%pre), FEV1/FVC ratio, and low attenuation area percentage (%LAA950) value.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eMachine learning\u003c/h3\u003e\n\u003cp\u003eWe implemented a multi-tiered machine learning approach to elucidate gene signatures associated with COPD. In the initial phase, four distinct algorithms were employed: Support vector machine recursive feature elimination (SVM-RFE), least absolute shrinkage and selection operator (LASSO) model, elastic net, and random forest model. Gene selection was guided by the optimal lambda value (λ) within one standard error of the minimum error; the λ value was determined through 10-fold cross-validation using the \"glmnet\" R packages[\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. We utilized an extensive array of machine-learning algorithms for model construction and validation. These included ensemble methods (Voting, GradientBoosting, Adaptive Boosting, Extra Tree, Random Forest, Bootstrap Aggregating), decision tree-based approaches, probabilistic models (Na\u0026iuml;ve Bayes), instance-based learning (K-Nearest Neighbors), SVM, gradient descent methods (Stochastic Gradient Descent), various regression techniques (Logistic Regression, BayesianRidge, ElasticNet, LASSO, Linear_Lasso, Ridge Regression with Cross-validation, Ridge_Regression, Linear_Regression), and neural network approaches (Artificial Neural Network). Model selection was performed by ranking TrainSet Accuracy and TestSet Accuracy values across distinct datasets. The definitive gene set represents consensus features identified through LASSO, RFE, Random Forest (RF), and Elastic Net algorithms across multiple cohorts.\u003c/p\u003e \u003cp\u003eFollowing hub gene identification, all subsequent analyses were performed using R software (version 4.0.3). We employed several R packages, including \"caret,\" \"e1071,\" \"glmnet,\" \"tree,\" \"randomForest,\" \"adabag,\" \"nnet,\" \"xgboost,\" and \"ggplot2,\" to implement the algorithms and visualize results.\u003c/p\u003e\n\u003ch3\u003eSingle-cell RNA-seq analysis\u003c/h3\u003e\n\u003cp\u003eWe obtained single-cell RNA-sequencing (scRNA-seq) data for mouse lung tissue from the GEO database (GSE168299[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]), comprising eight samples (Air\u0026thinsp;=\u0026thinsp;4, Smoke\u0026thinsp;=\u0026thinsp;4) with 41,099 cells and 20,832 detected genes. Human lung tissue scRNA-seq data were retrieved from GSE173869[\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e], including 12 samples (non-smoker\u0026thinsp;=\u0026thinsp;3, COPD\u0026thinsp;=\u0026thinsp;9) with 39,425 cells and 33,538 detected genes. Cell type-specific marker genes were previously established for both datasets. Analysis and visualization of scRNA-seq data were performed using R and Seurat Package (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://satijalab.org/seurat/\u003c/span\u003e\u003cspan address=\"https://satijalab.org/seurat/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). The analytical pipeline followed established protocols for data normalization, dimensionality reduction, and clustering. DEGs from various cell subsets underwent Reactome enrichment analysis (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://reactome.org/\u003c/span\u003e\u003cspan address=\"https://reactome.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). Intercellular communication was analyzed using the \"CellChat\" R package (version 1.1.3), while pseudotime analysis was validated using the \"Monocle 2\" R package. Visualization was accomplished using the \"ggplot2\" R package.\u003c/p\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eRNA-seq and bioinformatics\u003c/h2\u003e \u003cp\u003eRNA samples were sequenced at Wekomo (China) using the Illumina system (San Diego, USA). The resulting RNA-seq data were aligned to the Ensembl (version 105) transcript annotations. The \"Limma\" package in R software was employed to identify DEGs. The time series analysis of gene expression was performed with the \u003cem\u003eMfuzz\u003c/em\u003e software.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eCSE extraction and preparation\u003c/h2\u003e \u003cp\u003eCSE was produced by a commercial combustible cigarette (Hongmei, Hongta Group, China), as described previously [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Mainstream cigarette smoke (CS) was generated using a Cerulean CETI 8 MK3 smoking machine (CERULEAN, UK), following ISO 20778:2018 standards, with a 55 mL puff volume, 2 s duration, and a 30 s interval. The mainstream smoke was passed through two collection vessels containing 2 \u0026times; 20 mL of Dulbecco's Modified Eagle Medium/Nutrient Mixture F-12 medium, then combined and shaken for 20 min to obtain an aqueous CSE. The extract was filtered twice through a 0.22 \u0026micro;m membrane, aliquoted, and stored in the dark at \u0026minus;\u0026thinsp;80\u0026deg;C. The nicotine concentration in the CSE was measured by gas chromatography-mass spectrometry by Shenzhen Fogcore Technology Co., Ltd. and determined to be 1.36 mg/mL.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eCell culture\u003c/h2\u003e \u003cp\u003eCell culture was performed according to the operating manuals for PneumaCult\u0026trade;-Ex Plus Medium (Catalog # 05040) and PneumaCult\u0026trade;-ALI Medium (Catalog # 05001) from STEMCELL Technologies as previously described[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. The ALI cultures were established using Transwell plates (Catalog # 3460, Corning). Primary airway epithelial cells were cultured at 37\u0026deg;C with 5% CO\u003csub\u003e2\u003c/sub\u003e until 80% confluence of cell colonies was achieved, followed by dissociation with TrypLE\u0026trade; Express (Gibco) and seeding at a density of 2.3 \u0026times; 10\u003csup\u003e5\u003c/sup\u003e cells/cm\u003csup\u003e2\u003c/sup\u003e. Transwell plates were maintained at 37\u0026deg;C with 5% CO\u003csub\u003e2\u003c/sub\u003e, with medium changes every two days. Once cells reach 100% confluence (typically 4\u0026ndash;6 days), the apical medium is removed, and the basal medium is replaced with a differentiation medium, marking day 0 of differentiation. Medium changes were performed every two days until the formation of visible ciliary beating on day 21. The CSE was added to the basolateral chamber at a 0.02 mg/mL concentration on day 6.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eLentivirus infection\u003c/h2\u003e \u003cp\u003e The lentivirus used in this study was provided by Yunzhou Biotechnology Co., Ltd. (Guangzhou, China). The TIMP4 overexpression vector was constructed as pLV [Exp]-EF1A\u0026thinsp;\u0026gt;\u0026thinsp;hTIMP4NM_003256.4:3xGGGGS: mCherry (ns): P2A: Puro, while the control vector was pLV[Exp]-EF1A\u0026thinsp;\u0026gt;\u0026thinsp;mCherry(ns): P2A: Puro. Primary airway basal cells were used for lentiviral infection in the second to third passage. When the airway basal cells reached approximately 80% confluence, they were dissociated, and 500,000 cells were transferred to a T75 flask. The lentiviral solution was added at a multiplicity of infection of 10, and the cells were cultured in PneumaCult\u0026trade;-Ex Plus Medium supplemented with 5 \u0026micro;M Y-27632 for 16 h. After incubation, the viral-containing medium was removed, and the cells were washed twice with PBS, followed by continued culture in fresh PneumaCult\u0026trade;-Ex Plus medium with medium changes every two days. Once the cells reached 60\u0026ndash;80% confluence, puromycin was added at a concentration of 2 \u0026micro;g/mL for selection. After four days of selection, puromycin was maintained at 1 \u0026micro;g/mL. When the airway basal cells reached 80\u0026ndash;90% confluence, they were transferred to the ALI model for differentiation, with 1 \u0026micro;g/mL puromycin continuously added to the differentiation medium.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eTIMP4 measurement\u003c/h2\u003e \u003cp\u003ePeripheral blood samples from 185 non-COPD and 116 COPD from the early COPD (ECOPD) study (the Chinese Clinical Trial Registry, ChiCTR1900024643)[\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] were randomly included in this analysis. The data of some of the participants have been previously published. Peripheral blood samples were obtained by trained staff in EDTA collection tubes and centrifuged at 3,000 rpm for 10 min at RT, and the supernatants were stored at \u0026minus;\u0026thinsp;80\u0026deg;C. Plasma TIMP4 was measured using the TIMP-4 ELISA kit (CSB-E04735h, CUSABIO, China).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eGenome-wide association data sources\u003c/h2\u003e \u003cp\u003eOur study analyzed genetic associations with COPD-related phenotypes using Integrative Epidemiology Unit (IEU) Open genome-wide association (GWAS) project data. We accessed single nucleotide polymorphisms (SNPs) linked to \u003cem\u003eFEV1/FVC\u003c/em\u003e[\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e], \u003cem\u003eFEV1\u003c/em\u003e, \u003cem\u003eFVC\u003c/em\u003e, \u003cem\u003eFEV1/FVC\u0026thinsp;\u0026lt;\u0026thinsp;0.7\u003c/em\u003e[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e] and COPD diagnosis from GWAS catalog entries[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. Data for \u003cem\u003eFEV1/FVC ratio\u003c/em\u003e (n\u0026thinsp;=\u0026thinsp;321,047), FEV1 (n\u0026thinsp;=\u0026thinsp;321,047), and FVC (n\u0026thinsp;=\u0026thinsp;321,047) were obtained from separate GWAS. Additional GWAS data included \u003cem\u003eFEV1/FVC\u0026thinsp;\u0026lt;\u0026thinsp;0.7\u003c/em\u003e (cases: 55,907, controls: 297,408) and COPD diagnosis (cases: 26,710, controls: 334,484). The expression quantitative trait loci (eQTL) analysis was conducted for selected genes using datasets from 515 individuals with lung tissue and 755 individuals with whole blood, sourced from the GTEx_v8 database (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.gtexportal.org/home\u003c/span\u003e\u003cspan address=\"https://www.gtexportal.org/home\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e)[\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. Additionally, the eQTLs associated with the selected genes served as proxies for increased expression of these genes.\u003c/p\u003e \u003cp\u003eAll GWAS datasets were accessed through the IEU GWAS database (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://gwas.mrcieu.ac.uk/\u003c/span\u003e\u003cspan address=\"https://gwas.mrcieu.ac.uk/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e). This approach was used to explore the genetic basis of COPD and related phenotypes using large-scale data.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eSummary-data-based MR analyses\u003c/h2\u003e \u003cp\u003eOur study employed summary-data-based MR (SMR) and heterogeneity in dependent instruments (HEIDI) tests within cis-regulatory regions using SMR software[\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]. This approach uses a single-nucleotide variant at a primary xQTL as an instrumental variable, combined with summary-level eQTL and GWAS data, to explore potential causal or pleiotropic relationships between gene expression and traits of interest. We applied standard SMR software settings, including a \u003cem\u003ep\u003c/em\u003e-value threshold of 5.0 \u0026times; 10\u003csup\u003e\u0026ndash;8\u003c/sup\u003e for top eQTL selection and a 1 Mb window around the probe center for cis-eQTL identification. All analyses were restricted to cis-regulatory regions. Statistical significance was determined by a \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05 for SMR, while for HEIDI, a \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05 suggested significant linkage. This methodology investigated genetic associations and potential causal relationships in COPD-related phenotypes, providing a nuanced interpretation of the data.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eTwo-sample MR\u003c/h2\u003e \u003cp\u003eOur MR analysis primarily used the inverse-variance weighted (IVW) method, supplemented by MR-Egger, weighted median, simple mode, and weighted mode approach[\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. We assessed heterogeneity across individual causal effects using Cochran's Q statistic in MR-Egger and IVW methods, with \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05 indicating significant heterogeneity. Horizontal pleiotropy was evaluated using MR-Egger regression and MR-PRESSO. An MR-Egger intercept near zero with \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05 suggested the absence of directional horizontal pleiotropy, while \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026gt;\u0026thinsp;0.05 in the MR-PRESSO global test indicated no evidence of horizontal pleiotropic outliers.\u003c/p\u003e \u003cp\u003eWe conducted leave-one-out sensitivity analyses to ensure robustness and employed Steiger filtering to verify causal directionality. All statistical analyses were performed in R software using the 'TwoSampleMR' package[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e], with a significance threshold of \u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05. In cases of significant heterogeneity, we applied a random effects model for IVW estimates.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eStatistical analyses\u003c/h2\u003e \u003cp\u003eAll statistical analyses were performed using GraphPad Prism (version 8.0.1) or Medcalc (version 23.0.1) software. The two-tailed paired Student's t-test, two-tailed Mann\u0026ndash;Whitney test, one-way ANOVA, and two-tailed Pearson correlation were used to determine the significance between means. Linear regression models were implemented to assess associations between gene expression levels and clinical parameters. Receiver operating characteristic (ROC) curve analysis was performed to calculate area under the curve (AUC) values. \u003cem\u003eP\u003c/em\u003e-values were represented as follows: ns (not significant), *\u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05, **\u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.01, ***\u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.001, and **\u003cem\u003ep\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.0001.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003eIdentification of common gene signature in lung tissue sequencing from two patients with COPD cohorts\u003c/h2\u003e \u003cp\u003eThe analytical workflow of this study is schematically summarized in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. We analyzed lung tissue sequencing data from two patients with COPD cohorts: The GSE47460 (108 controls, 220 patients with COPD) and GSE76925 (37 smoker controls and 110 COPD patients with smoking history). The WGCNA was performed on both datasets (Figs.\u0026nbsp;\u0026lt;link rid=\"fig2\"\u0026gt;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u0026lt;/link\u0026gt;\u003c/span\u003eA and S1-\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e), yielding 24 and 16 gene modules for GSE47460 and GSE76925, respectively. We then correlated these modules with COPD status, lung function, and CT indicators. In GSE47460, modules \"\u003cem\u003emagenta\u003c/em\u003e,\" \"\u003cem\u003etan\u003c/em\u003e,\" \"\u003cem\u003emidnightblue\u003c/em\u003e,\" \"\u003cem\u003eskyblue\u003c/em\u003e,\" \"\u003cem\u003ebrown\u003c/em\u003e,\" and \"\u003cem\u003edarkgreen\u003c/em\u003e\" negatively correlated with COPD status and \u003cem\u003eLAA950%\u003c/em\u003e, but positively with lung function including \u003cem\u003eFEV1%pred\u003c/em\u003e and \u003cem\u003eFVC%pred\u003c/em\u003e. Conversely, \"\u003cem\u003elightgreen\u003c/em\u003e,\" \"\u003cem\u003eblack\u003c/em\u003e,\" \"\u003cem\u003ewhite\u003c/em\u003e,\" and \"\u003cem\u003epink\u003c/em\u003e\" modules indicated opposite correlations (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA and Supplementary Data 1). Analysis of GSE76925 revealed similar patterns: \"\u003cem\u003ebrown\u003c/em\u003e\" and \"\u003cem\u003egreen\u003c/em\u003e\" modules negatively correlated with COPD status and \u003cem\u003eLAA950%\u003c/em\u003e but positively with lung function (\u003cem\u003eFEV1%pred\u003c/em\u003e and \u003cem\u003eFEV1/FVC\u003c/em\u003e). The \"\u003cem\u003eturquoise\u003c/em\u003e\" module exhibited inverse correlations (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA and Supplementary Data 2).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWe performed DEG analysis (COPD versus control) on both datasets. The GSE47460 yielded 890 upregulated and 863 downregulated genes (Figures \u003cspan refid=\"MOESM3\" class=\"InternalRef\"\u003eS3\u003c/span\u003eA-B), while GSE76925 indicated 438 upregulated and 1,141 downregulated genes (Figures \u003cspan refid=\"MOESM3\" class=\"InternalRef\"\u003eS3\u003c/span\u003eC-D and Supplementary Data 3). Enrichment analysis of these gene sets revealed that negatively correlated genes were associated with regulating the \u003cem\u003eWnt signaling pathway\u003c/em\u003e, \u003cem\u003esecretion\u003c/em\u003e, \u003cem\u003eextracellular matrix organization\u003c/em\u003e, and \u003cem\u003ecell-cell adhesion\u003c/em\u003e. Positively correlated genes were linked to the \u003cem\u003eregulation of leukocyte migration\u003c/em\u003e, \u003cem\u003eepithelial cell proliferation\u003c/em\u003e, \u003cem\u003epositive cytokine production\u003c/em\u003e, and \u003cem\u003eNABA CORE MATRISOME functions\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB). We then intersected these DEGs with the modules significantly correlated with COPD status, CT indicators, and lung function. In GSE47460, we identified 310 COPD-positively correlated and 611 COPD-negatively correlated overlapping genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC). GSE76925 yielded 94 COPD-positively correlated and 191 COPD-negatively correlated overlapping genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eD and Supplementary Data 4).\u003c/p\u003e \u003cp\u003eWe performed LASSO analysis on the overlapping gene sets, stratified by COPD status, to further identify key gene clusters crucial in COPD. This yielded 21 and 17 potential hub genes, respectively (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eE and Supplemental Data 5). These gene sets indicated no overlap, underscoring the heterogeneity in COPD sequencing results across different cohorts. We combined these LASSO gene sets into a 38-gene signature to identify hub genes functioning consistently across diverse COPD cohorts. We then applied various machine learning models (LASSO, Elasticnet, Random Forest, and SVM-RFE) to both datasets using this signature (Figs.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eF and S4). By intersecting genes obtained from the same machine learning method across both datasets, we identified 5, 4, 11, and 6 overlapping genes, respectively (Supplemental Data 6). Integration of these results yielded a final set of 13 genes: \u003cem\u003eANGPTL1, DUSP26, FGG, GAS2, VEGFD, BHLHE22, SYNGR1, TIMP4, CXCL12, GEMIN5, SV2B, HTR2B, and TMEM117\u003c/em\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003eA model constructed with 13 genes that accurately identify COPD\u003c/h2\u003e \u003cp\u003eSubsequently, we divided the two lung tissue sequencing datasets into training and validation sets. We evaluated the performance of 20 different machine learning methods using the identified 13-gene signature. All 20 models demonstrated the ability to effectively distinguish between COPD and non-COPD populations in independent lung tissue sequencing datasets using the 13-gene signature (Figs.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA-B). However, only \u003cem\u003eExtra Trees\u003c/em\u003e and \u003cem\u003eRandom Forest\u003c/em\u003e methods consistently indicated high accuracy in both training and test sets across both datasets, with test set accuracies exceeding 0.8 (Supplemental Data 7). Based on these results, we selected extra trees and random forest models for further analysis, as they demonstrated high and consistent performance in discriminating COPD status across different cohorts.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWe first generated nomogram scores for each gene in both models across the two datasets, revealing broadly similar gene scores between the datasets (Figures S5A-B). \u003cem\u003eExtra Trees\u003c/em\u003e and \u003cem\u003eRandom Forest\u003c/em\u003e models demonstrated good discrimination among patients with COPD, indicating better performance in predicting COPD cases but higher error rates in predicting controls (Figures S5C-D). To validate the models' reliability, we applied the models constructed using GSE47460 to predict outcomes in GSE76925. Both extra trees and random forest models demonstrated high accuracy in the area under the curve (AUC\u003csub\u003eET\u003c/sub\u003e = 0.871, AUC\u003csub\u003eRF\u003c/sub\u003e = 0.859). The reverse validation also yielded high accuracy (AUC\u003csub\u003eET\u003c/sub\u003e = 0.789, AUC\u003csub\u003eRF\u003c/sub\u003e = 0.786), confirming the models' reliability (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC).\u003c/p\u003e \u003cp\u003eTo further validate our model's accuracy in diverse COPD populations, we incorporated lung tissue sequencing data from two external cohorts: GSE103174 (21 control and 44 COPD samples) and GSE239869 (43 control and 39 COPD samples). Both models, constructed using the 13-gene signature, demonstrated robust performance in these cohorts. For GSE103174, \u003cem\u003eExtra Trees\u003c/em\u003e and \u003cem\u003eRandom Forest\u003c/em\u003e models achieved AUC values of 0.693 and 0.71, respectively. In GSE239869, the models indicated even stronger predictive power (AUC\u003csub\u003eET\u003c/sub\u003e = 0.883, AUC\u003csub\u003eRF\u003c/sub\u003e = 0.862) (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eD).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWe analyzed the expression of the 13 hub genes in GSE47460 and GSE76925 datasets. All genes, except \u003cem\u003eVEGFD\u003c/em\u003e in GSE47460, demonstrated significant differences between control and COPD groups, with most exhibiting similar trends across datasets (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eE). We assessed correlations between these genes, lung function, and CT indicators using 13 linear regression models. Across models and datasets, the genes were significantly associated with FEV1%pre and LA950% (Figs.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eF and S6\u0026ndash;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e8\u003c/span\u003e). These results were used to validate the stability and specificity of our 13-gene model in patients with COPD and its significant correlation with clinical characteristics.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eThe expression distribution of the 13 genes used to construct the model across various lung cell subtypes in both humans and mice\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo explore the expression patterns of the 13 selected genes across lung cell subpopulations, we analyzed single-cell sequencing data from patients with COPD lung tissue (GSE173896), comprising four non-smokers and nine patients with COPD. Using the original cell clustering strategy, we identified 22 cell subpopulations (Figs.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA and S9A and Supplementary Data 8). We calculated expression scores for each cell subpopulation based on the integrated expression of the 13 genes (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB). Results demonstrated expression in all subpopulations, with alveolar and adventitial fibroblasts scoring highest (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eC). Expression scores were significantly increased in CD8T cells, CD4T cells, natural killer cells, macrophages, neutrophils, AT1, AT2, monocytes, ciliated cells, B cells, and pericytes/smooth muscle actin (SMA) cells. Conversely, scores were significantly lower in alveolar and adventitial fibroblasts, dendritic cells, plasma cells, and vascular endothelial (VE) cells (Figure S10B). Individual gene analysis indicated \u003cem\u003eBHLHE22, CXCL12, VEGFD\u003c/em\u003e, and \u003cem\u003eANGPTL1\u003c/em\u003e highly expressed in fibroblasts; GAS2 and TIMP4 in ciliated cells; TMEM117 in macrophages; \u003cem\u003eSYANGR1\u003c/em\u003e in immune cells; \u003cem\u003eFGG\u003c/em\u003e in AT2 cells; and \u003cem\u003eDUSP26\u003c/em\u003e in plasma cells. The \u003cem\u003eTMEM117, HTR2B\u003c/em\u003e, and \u003cem\u003eSV2B\u003c/em\u003e revealed low expression across all cell types (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eD).\u003c/p\u003e \u003cp\u003eTo validate our findings from human lung tissue, we analyzed single-cell sequencing data from a mouse COPD model (GSE168299), including four air-exposed and four CS-exposed mice. We used the original cell clustering strategy, identifying 27 cell subpopulations (Figs.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE and S10A and Supplementary Data 9); consistent with human data, all cell types indicated expression, with alveolar and adventitial fibroblasts exhibiting the highest scores, followed by endothelial cells (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eF-G). However, significant score differences between groups were limited to alveolar fibroblasts, basal cells, B cells, ciliated cells, club cells, some endothelial cells, macrophages, and monocytes (Figure S10B). Several genes indicated similar primary expression patterns in mouse and human lung cell subpopulations. However, \u003cem\u003eSV2B, ANGPTL1\u003c/em\u003e, and \u003cem\u003eGAS2\u003c/em\u003e exhibited low expression across all mouse lung cell types, differing from human results. Additionally, \u003cem\u003eCXCL12\u003c/em\u003e indicated high expression in endothelial cells, and DUSP26 was expressed in mouse ciliated cells (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eH).\u003c/p\u003e \u003cp\u003e \u003cb\u003eMR analysis was used to identify TIMP4 as a potential hub gene influencing COPD progression within the signature\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo identify potentially causal genes within our signature for COPD, we conducted MR analyses (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eA). We extracted SNPs associated with the 13 genes from GTEx_v8 as instrumental variables for lung tissue and whole blood. Outcomes included \u003cem\u003eFEV1/FVC\u0026thinsp;\u0026lt;\u0026thinsp;0.7\u003c/em\u003e, \u003cem\u003ephysician-diagnosed COPD\u003c/em\u003e, \u003cem\u003eFEV1\u003c/em\u003e, \u003cem\u003eFVC\u003c/em\u003e, and \u003cem\u003eFEV1/FVC\u003c/em\u003e from GWAS studies (Supplementary Data 10). We performed two-sample MR (TSMR) and SMR analyses, including HEIDI tests for summary-level data. \u003cem\u003eANGPTL1, BHLHE22, FGG, HTR2B\u003c/em\u003e, and \u003cem\u003eTIMP4\u003c/em\u003e lacked suitable instruments in peripheral blood. Furthermore, BHLHE22, GEMIN5, and \u003cem\u003eTMEM117\u003c/em\u003e did not have suitable instrumental variables for the SMR analysis of lung tissue.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003eANGPTL1\u003c/em\u003e in lung tissue was positively associated with \u003cem\u003eFVC\u003c/em\u003e and \u003cem\u003eFEV1\u003c/em\u003e and negatively with \u003cem\u003eFEV1/FVC\u0026thinsp;\u0026lt;\u0026thinsp;0.7\u003c/em\u003e. FGG indicated significance only in TSMR, positively associating with lung function and negatively with \u003cem\u003eCOPD diagnosis\u003c/em\u003e. \u003cem\u003eHRT2B\u003c/em\u003e in lung tissue is positively associated with \u003cem\u003eFEV1\u003c/em\u003e and \u003cem\u003eFEV1/FVC\u003c/em\u003e, negatively with \u003cem\u003eFEV1/FVC\u0026thinsp;\u0026lt;\u0026thinsp;0.7\u003c/em\u003e, and positively with COPD diagnosis. \u003cem\u003eGEMIN5\u003c/em\u003e in both lung and blood is negatively associated with lung function and COPD diagnosis in TSMR. \u003cem\u003eSV2B\u003c/em\u003e in lung tissue is positively associated with \u003cem\u003eFVC, FEV1, FEV1/FVC\u0026thinsp;\u0026lt;\u0026thinsp;0.7\u003c/em\u003e, and \u003cem\u003eCOPD diagnosis\u003c/em\u003e in TSMR. The \u003cem\u003eTIMP4\u003c/em\u003e in lung tissue is negatively associated with lung function and positively with \u003cem\u003eCOPD diagnosis\u003c/em\u003e in both TSMR and SMR. \u003cem\u003eTMEM117\u003c/em\u003e in lung tissue is negatively associated with lung function and positively with \u003cem\u003eCOPD diagnosis\u003c/em\u003e in TSMR and SMR. Notably, the above results indicated no heterogeneity, but only \u003cem\u003eBHLHE22, FGG, SV2B, SYNGR1, TIMP4\u003c/em\u003e, and \u003cem\u003eTMEM117\u003c/em\u003e displayed largely absent horizontal pleiotropy (Figs.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB-C and S11 and Supplementary Data 11\u0026ndash;17).\u003c/p\u003e \u003cp\u003eWe constructed a receiver operating characteristic (ROC) curve model using the 13 genes, achieving AUCs of 0.883 and 0.966 in two datasets (Figures S12A). Considering the MR results and excluding genes with non-significant outcomes, we selected \u003cem\u003eSVB2, TIMP4\u003c/em\u003e, and \u003cem\u003eANGPTL1\u003c/em\u003e for further investigation. These three genes maintained reasonable AUC values in both datasets. The ROC model using only these genes achieved AUCs of 0.767 and 0.881 (Figures S12B and Supplementary Data 18), highlighting their significance within the gene set.\u003c/p\u003e \u003cp\u003eWe then evaluated correlations between these three genes and both datasets' lung function and CT indicators. \u003cem\u003eTIMP4\u003c/em\u003e and \u003cem\u003eSV2B\u003c/em\u003e negatively correlated with lung function and positively with \u003cem\u003e%LAA950\u003c/em\u003e in both datasets. \u003cem\u003eANGPTL1\u003c/em\u003e positively correlated with lung function in both datasets and negatively with \u003cem\u003e%LAA950\u003c/em\u003e, though only significantly in GSE76925 (Figures S13A-B). Further analysis of their expression in primary cell types revealed that \u003cem\u003eANGPTL1\u003c/em\u003e expression decreased in alveolar fibroblasts but remained unchanged in adventitial fibroblasts (Figure S13C). SV2B expression indicated non-significant changes in alveolar or adventitial fibroblasts (Figure S13D). TIMP4 expression increased in ciliated cells of the COPD group (Figure S13E).\u003c/p\u003e \u003cp\u003eGiven that \u003cem\u003eSV2B\u003c/em\u003e demonstrated no differential expression in alveolar or adventitial fibroblasts, \u003cem\u003eSV2B\u003c/em\u003e and \u003cem\u003eANGPTL1\u003c/em\u003e exhibited weak expression across various cell subpopulations in mouse lung tissue, which hampers subsequent single-cell analysis. We selected \u003cem\u003eTIMP4\u003c/em\u003e as a potential key pathogenic gene for further study, which is highly expressed in ciliated cells in both human and mouse lung tissues. We observed that \u003cem\u003eTIMP4\u003c/em\u003e expression progressively increased with higher Global Initiative for Chronic Obstructive Lung Disease (GOLD) stages in patients with COPD (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eD). However, there was a non-significant difference in expression between GOLD 3 and GOLD 4 (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eE). Previous single-cell results indicated that TIMP4 is primarily expressed in ciliated cells. We analyzed bronchial brush samples from patients with COPD (COPD\u0026thinsp;=\u0026thinsp;64, smokers\u0026thinsp;=\u0026thinsp;137) to validate \u003cem\u003eTIMP4\u003c/em\u003e expression. Our analysis revealed that TIMP4 expression was also elevated in the brush samples from patients with COPD (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eF). While smoking similarly increased \u003cem\u003eTIMP4\u003c/em\u003e expression (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eG), the high expression of \u003cem\u003eTIMP4\u003c/em\u003e in patients with COPD was found to be independent of smoking status (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eH).\u003c/p\u003e \u003cp\u003eWe randomly selected 301 samples from the \u003cem\u003eECOPD study cohort\u003c/em\u003e, which included 116 patients with COPD and 185 non-COPD controls, to investigate whether \u003cem\u003eTIMP4\u003c/em\u003e expression in peripheral blood reflects that observed in lung tissue and to validate the results from lung tissue and airway brush samples and to confirm the findings from MR analysis (Figure S14A). \u003cem\u003eTIMP4\u003c/em\u003e secretion levels were measured in peripheral plasma. However, the area under the curve (AUC) value derived from the receiver operating characteristic (ROC) analysis did not demonstrate a high discriminative capacity, suggesting potential limitations in its standalone predictive performance (Figure S14B).\u003c/p\u003e \u003cp\u003eWe employed two models to assess the relationship between \u003cem\u003eTIMP4\u003c/em\u003e and clinical features: correlation analysis (\u003cem\u003eModel 1\u003c/em\u003e) and regression analysis (\u003cem\u003eModel 2\u003c/em\u003e). In both models, \u003cem\u003eTIMP4\u003c/em\u003e levels were negatively correlated with several lung function measures, including pre-bronchodilator spirometry \u003cem\u003eFEV1 (pre_FEV1), post-bronchodilator spirometry FEV1 (post_FEV1), maximum mid-expiratory flow (pre_MMEF), post_MMEF, pre-bronchodilator spirometry forced expiratory flow at 50% of FVC (pre_FEF50), post_FEF50, pre-bronchodilator spirometry forced expiratory flow at 75% of FVC (pre_FEF75), post_FEF75\u003c/em\u003e, and \u003cem\u003epost_FEV1/FVC\u003c/em\u003e (Figs.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eJ and S14C). However, non-significant correlation was found with FVC. Additionally, TIMP4 expression demonstrated significant negative correlations with CT metrics, including \u003cem\u003eexpiratory LAA-856\u003c/em\u003e, \u003cem\u003einspiratory LAA-950\u003c/em\u003e, and \u003cem\u003epercent residual maximum flow at specific airway diameter\u003c/em\u003e (\u003cem\u003ePRMfSAD\u003c/em\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eK). In contrast, TIMP4 expression indicated a non-significant association with questionnaire scores, including the \u003cem\u003eCOPD assessment test\u003c/em\u003e and \u003cem\u003eclinical COPD questionnaire\u003c/em\u003e (Supplementary Data 19). These results suggest that \u003cem\u003eTIMP4\u003c/em\u003e is elevated across various samples in patients with COPD and is significantly associated with lung function and CT indicators, suggesting an essential role for \u003cem\u003eTIMP4\u003c/em\u003e in COPD progression.\u003c/p\u003e \u003cp\u003e \u003cb\u003eTIMP4-positive ciliated cells exhibit stronger effects from\u003c/b\u003e \u003cb\u003eWNT\u003c/b\u003e \u003cb\u003eand\u003c/b\u003e \u003cb\u003enon-canonical WNT\u003c/b\u003e \u003cb\u003esignaling pathways in both incoming and outgoing signals\u003c/b\u003e\u003c/p\u003e \u003cp\u003eUsing single-cell sequencing data from human and mouse lung tissues, we observed that \u003cem\u003eTIMP4\u003c/em\u003e is highly expressed in ciliated cells and indicates some expression in fibroblasts and pericytes or SMA cells (Figs.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e6\u003c/span\u003eA-B). We categorized ciliated cells into TIMP4-positive (TIMP4_Pos) and TIMP4-negative (TIMP4_Neg) subsets and analyzed the DEGs between them (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e6\u003c/span\u003eC). In human ciliated cells, TIMP4_Pos exhibited 1,267 significantly upregulated genes and 219 downregulated genes, while in mice, they exhibited 513 upregulated and 229 downregulated genes (Supplementary Data 20). An enrichment analysis of the upregulated genes in TIMP4_Pos ciliated cells revealed significant associations with cilium formation, ciliary structure, and ciliary movement in both datasets (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e6\u003c/span\u003eD and Supplementary Data 21). The PPI network highlighted cilium-related functions at the core (Figure S15A). Notably, 141 overlapping upregulated genes were identified between humans and mice (Figure S15B), suggesting similar functional characteristics despite species differences. Additionally, downregulated genes in TIMP4_Pos ciliated cells were primarily \u003cem\u003eenriched in interferon signaling\u003c/em\u003e, \u003cem\u003eVEGFA-VEGFR2 signaling\u003c/em\u003e, and \u003cem\u003eL13a-mediated translational silencing of ceruloplasmin expression\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e6\u003c/span\u003eE), with 11 overlapping genes found across both datasets (Figure S15C and Supplementary Data 22).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eSubsequently, we analyzed intercellular communication between TIMP4_Pos and TIMP4_Neg ciliated cells and other cell subpopulations. Our findings revealed that in both human and mouse sequencing data, the signal intensity of \u003cem\u003eWNT\u003c/em\u003e and \u003cem\u003enon-canonical WNT pathways\u003c/em\u003e was significantly more vigorous in TIMP4_Pos ciliated cells than TIMP4_Neg ciliated cells, both in incoming and outgoing signaling patterns (Figs.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eF and S16).\u003c/p\u003e \u003cp\u003eIncreasing evidence suggests that \u003cem\u003eWNT\u003c/em\u003e and \u003cem\u003enon-canonical WNT pathways\u003c/em\u003e play crucial roles in epithelial cell functions in patients with COPD[\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e], including tissue repair and epithelial-mesenchymal transition (EMT)[\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. We analyzed the receptor-ligand interactions of \u003cem\u003eWNT\u003c/em\u003e and \u003cem\u003enon-canonical WNT pathways\u003c/em\u003e in TIMP4_Pos and TIMP4_Neg ciliated cells. In the human sequencing data for the WNT pathway, alveolar fibroblasts, AT2 cells, basal or club cells, and B plasma cells revealed a stronger effect on TIMP4_Pos ciliated cells. Conversely, TIMP4_Pos ciliated cells significantly impacted AT1, AT2, basal or club cells, VE veins, and ciliated cells (Figs.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eG and S17A). For the non-canonical \u003cem\u003eWNT pathway\u003c/em\u003e, basal or club cells exerted a stronger effect on TIMP4_Pos ciliated cells, which also influenced AT1, AT2, basal or club cells, VE veins, and ciliated cells (Figs.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eG and S17B and Supplementary Data 23). In the mouse sequencing data, the WNT pathway indicated that basal cells exhibited a significant effect on TIMP4_Pos ciliated cells, while the latter exhibited greater effects on fibroblasts, endothelial cells, and ciliated cells (Figures S18A-B). In the \u003cem\u003enon-canonical WNT pathway\u003c/em\u003e, TIMP4-positive ciliated cells indicated a stronger effect on endothelial cells (Figures S17C-D and Supplementary Data 24). Moreover, we examined the expression differences of WNT and non-canonical WNT ligands and receptors in TIMP4_Pos and TIMP4_Neg ciliated cells. In human sequencing, WNT7B, WNT9A, FZD3, FZD6, LRP6, and WNT5B were observed to be highly expressed in TIMP4_Pos cells (Figures S19A-B). In mouse sequencing, WNT4, WNT7b, FZD6, FZD3, LRP6, WNT5a, and WNT11 were similarly elevated in TIMP4-positive cells (Figures S20A and B). The results indicate that TIMP4-positive ciliated cells may exert functional effects on ciliated cells through \u003cem\u003eWNT\u003c/em\u003e or \u003cem\u003enon-canonical WNT pathways\u003c/em\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003eTIMP4 overexpression in airway epithelial cells leads to a reduction in the expression of ciliated genes\u003c/b\u003e \u003c/p\u003e \u003cp\u003eSubsequently, we conducted immunohistochemical analysis and confirmed that TIMP4 expression is elevated in the lung tissue of patients with COPD, with its primary localization in epithelial cells (Figure S21A). Furthermore, RNA levels of TIMP4 are increased in bronchial brushings from patients with COPD (Figure S21B). We constructed a lentivirus carrying a TIMP4 overexpression vector further to investigate the role of TIMP4 in epithelial cells. We transduced primary airway epithelial cells obtained from bronchial brushings with the TIMP4 vector, followed by culturing the cells at the ALI in a Transwell system. On day 6 of differentiation, we stimulated the cells with CSE and collected them on day 28 (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e7\u003c/span\u003eA). Multiplex immunofluorescence (IF) confirmed the successful establishment of the ALI model (Figure S21C). Notably, we observed diminished cilia fluorescence in cells with high TIMP4 expression and validated the efficient overexpression of TIMP4 in epithelial cells at the ALI (Figures S21D-E). Subsequently, we performed transcriptome sequencing on four groups of epithelial cells at ALI: \u003cem\u003eCON_PBS\u003c/em\u003e, \u003cem\u003eTIMP4_PBS\u003c/em\u003e, \u003cem\u003eCON_CSE\u003c/em\u003e, and \u003cem\u003eTIMP4_CSE\u003c/em\u003e (Figure S22A). The expression of DEGs was evident in both PBS and CSE groups (Figures S22B-C). We performed GSEA on CON_PBS versus TIMP4_PBS and CON_CSE versus TIMP4_CSE groups. The TIMP4_PBS group is significantly associated with the FN matrix formation and interleukin (IL)-18 signaling pathways (Figure S22D). In the TIMP4_CSE group, significant positive correlations were observed with the IL-36 signaling, FN matrix formation, and LTR4/CysLTR-mediated IL-4 production pathways (Figure S22E). Notably, the latter has been reported to be closely associated with airway inflammation, particularly in asthma[\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eSubsequently, we conducted WGCNA and categorized all genes into 13 modules (Figures S22F-G). Correlation analysis based on group characteristics revealed that the orangered4 module was positively correlated with the TIMP4_CSE group, while the \u003cem\u003edarkmagenta\u003c/em\u003e and \u003cem\u003emediumpurple3 modules\u003c/em\u003e indicated significant negative correlations with this group (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e7\u003c/span\u003eB). We performed enrichment analysis on the genes from these three modules. The darkmagenta module, containing the highest number of genes (Supplementary Data 25), was predominantly enriched in cilia-related processes, including \u003cem\u003ecilium organization\u003c/em\u003e, \u003cem\u003eciliopathies\u003c/em\u003e, \u003cem\u003ecilium assembly\u003c/em\u003e, and \u003cem\u003eepithelial cilium movement\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e7\u003c/span\u003eC). The mediumpurple3 module was associated with \u003cem\u003epositive regulation of endothelial cell migration\u003c/em\u003e and \u003cem\u003etransport along microtubules\u003c/em\u003e (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e7\u003c/span\u003eD), while the orangered4 module was linked to hemostasis (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e7\u003c/span\u003eE). By utilizing the 235 genes identified by Anirudh Patir et al. as closely linked to ciliary function, we found that 160 of these genes intersected with the darkmagenta module (Figure S22H and Supplementary Data 26). Notably, these genes were significantly downregulated upon CSE stimulation, with a more pronounced downregulation observed in the TIMP4_CSE group compared to the CON_CSE group (Fig.\u0026nbsp;\u003cspan refid=\"Fig8\" class=\"InternalRef\"\u003e7\u003c/span\u003eF).\u003c/p\u003e \u003cp\u003eBesides, we analyzed the genes from the three identified modules concerning DEG sets of ciliated cells from single-cell RNA sequencing of human lung tissues. This analysis revealed 169 overlapping downregulated genes and 67 overlapping upregulated genes (Figure S23A). Enrichment analysis indicated that the downregulated genes were primarily associated with ciliary function. In contrast, the upregulated genes were enriched in processes related to \u003cem\u003enegative regulation of cell proliferation\u003c/em\u003e, \u003cem\u003eresponse to bacteria\u003c/em\u003e, and \u003cem\u003eresponse to reactive oxygen species\u003c/em\u003e (Figure S23B). We identified eight distinct clusters using the Mfuzz package for clustering based on gene expression changes (Figure S23C and Supplementary Data 27). Among the genes in the three WGCNA modules, cluster 2 contained the most overlapping genes, comprising 58.08% of the WGCNA gene set (Figure S23D), and was significantly enriched in ciliary functions (Figure S23E).\u003c/p\u003e \u003cp\u003eThese findings suggest that gene expression changes induced by TIMP4 overexpression in airway epithelial cells cultured at ALI are primarily enriched in ciliary-related pathways, highlighting a solid connection between TIMP4 expression and ciliary structure and function.\u003c/p\u003e \u003cdiv id=\"Sec23\" class=\"Section3\"\u003e \u003ch2\u003eOverexpression of TIMP4 reduces cilia in airway epithelial cells cultured at ALI\u003c/h2\u003e \u003cp\u003eWe conducted an IF analysis on primary epithelial cells cultured at ALI based on these results. The fluorescence intensity of the ciliary markers α-tubulin and muc5ac in TIMP4 and NC groups of the PBS cohort indicated non-significant changes. However, the TIMP4_CSE group exhibited a notable reduction in α-tubulin compared to the CON_CSE group, while the increase in muc5ac fluorescence was statistically non-significant (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e8\u003c/span\u003eA). These findings suggest a solid connection between \u003cem\u003eTIMP4\u003c/em\u003e and ciliary structure and function. Based on sequencing data and previous studies, we identified critical genes associated with ciliary function, including \u003cem\u003eRFX3, TTLL6, HYDIN, SPAG16, SPAG17, DNAAF1, DNAH11, CFAP44, and CFAP65\u003c/em\u003e. In sequencing data from airway brush samples, TIMP4 expression was negatively correlated with the genes above, indicating statistically significant correlations with \u003cem\u003eTTLL6, HYDIN, DNAAF1, DNAH11, CFAP44\u003c/em\u003e, and \u003cem\u003eSPAG17\u003c/em\u003e (Figure S24A). Further correlation analyses in COPD and control groups revealed that \u003cem\u003eTIMP4\u003c/em\u003e was significantly negatively correlated with \u003cem\u003eSPAG17, HYDIN\u003c/em\u003e, and \u003cem\u003eDNAH11\u003c/em\u003e (Figure S24B). Furthermore, we validated the expression of these ciliary genes in cells cultured at ALI. The findings indicated that CSE stimulation significantly decreased the expression of most ciliary genes. Notably, RNA levels of \u003cem\u003eRFX3, TTLL6, DNAAF1, HYDIN\u003c/em\u003e, and \u003cem\u003eSPAG17\u003c/em\u003e were markedly reduced in the TIMP4_CSE group compared to the CON_CSE group (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e8\u003c/span\u003eB), while expression levels of \u003cem\u003eSPAG17\u003c/em\u003e, \u003cem\u003eDNAH11, CFAP65\u003c/em\u003e, and \u003cem\u003eCFAP44\u003c/em\u003e did not differ significantly between the two groups (Figure S25A).\u003c/p\u003e \u003cp\u003eAs a TIMP family member, TIMP4 primarily interacts with MMPs to exert its effects. We explored the correlation between TIMP4 and MMPs in patients with COPD and found significant positive correlations with MMP1, MMP3, MMP8, MMP9, and MMP10 in the GSE47460 dataset (Figure S25B). In ALI sequencing results, MMP9 and MMP10 levels were notably elevated in the TIMP4_CSE group compared to the CON_CSE group (Figure S25C). Furthermore, the correlation between TIMP4 and MMP9 as well as MMP10 was more pronounced in the COPD group within the lung tissue sequencing data (Figure S25D). Prior research indicated that MMP9 and MMP10 are upregulated in patients with COPD and are often associated with epithelial cell EMT and the \u003cem\u003eWNT/β-catenin\u003c/em\u003e pathway. In ALI cells, the levels of MMP9 and FN1, a marker of epithelial cell EMT, were significantly higher in the TIMP4_CSE group than in the CON_CSE group (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e8\u003c/span\u003eC). Furthermore, \u003cem\u003eβ-catenin\u003c/em\u003e and \u003cem\u003ephosphorylated β-catenin\u003c/em\u003e expression levels were elevated in the TIMP4_CSE group (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e8\u003c/span\u003eD). These results suggest that the elevated expression of TIMP4 under CSE stimulation may impair ciliated cell function, potentially by regulating MMP9 and the β-catenin pathway.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eCOPD is a heterogeneous lung disease characterized by chronic respiratory symptoms and persistent airflow obstruction[\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. However, classifying the diverse population of patients with COPD into distinct subtypes poses significant challenges[\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. COPD subtypes can be categorized based on various phenotypes, including those related to disease progression (rapid decline in lung function), exacerbation patterns (frequent acute exacerbators), and comorbidity-related phenotypes (COPD-asthma overlap and COPD-bronchiectasis overlap)[\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]. Recently, there has been growing focus on the type 2 inflammation phenotype of COPD, characterized by eosinophilia, which underscores the need for a more nuanced understanding of this disease[\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe substantial phenotypic divergence among COPD patients\u0026mdash;even those meeting standardized diagnostic criteria\u0026mdash;underscores the limitations of relying solely on clinical symptoms and spirometry for disease stratification[\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. This variability presents substantial challenges that complicate early intervention and treatment. Advancing in sequencing technologies and genomics drive research into the genetic factors underlying COPD, aiming to elucidate its pathogenesis[\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e] and enable classifying patients with COPD into subtypes for more precise and targeted therapeutic approaches[\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eGlobal sequencing initiatives in COPD lung tissue have yielded numerous putative pathogenic candidates, yet marked interstudy discordance persists among reported genes[\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. This variability stems from technical confounders (specimen procurement protocols, biopsy localization, platform variability) compounded by patient subphenotypic diversity.[\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e]. Emerging meta-analytic strategies\u0026mdash;including cross-cohort DEG intersection [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e] and rank-rank abundance normalization[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] \u0026mdash; enhance reproducibility by prioritizing genes with conserved differential expression. However, such approaches risk excluding functionally pivotal genes exhibiting non-linear expression dynamics or failing to meet arbitrary fold-change thresholds[\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eContemporary clinical research increasingly deploys ML-driven frameworks to identify prognostic biomarkers and construct predictive models. Hendrik Pott's COSYCONET model demonstrated IL-6 and CRP thresholdspredict 3-year COPD exacerbation risk [\u003cspan citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e]. Genomic integration strategies now synergize clinical parameters with computational analytics: Justin Sui's single-cell resolution analysis identified gelsolin as a key COPD mediator through neural network-driven feature selection[\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e], while Erkang Yi's multimodal integration revealed CLEC5A's role in early-stage COPD via gradient boosting classifiers[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eOur multi-cohort machine learning framework integrating clinical and transcriptomic data from distinct COPD populations revealed striking molecular heterogeneity, evidenced by the complete lack of overlapping candidate genes between cohorts through LASSO regression analysis. To address this biological variability, we implemented a cross-cohort validation strategy combining feature selection models, ultimately identifying a 13-gene diagnostic signature containing both partially characterized (FGG, TIMP4, CXCL12, HTR2B) and novel COPD-associated targets (ANGPTL1, DUSP26, GAS2, VEGFD, BHLHE22, SYNGR1, GEMIN5, SV2B, TMEM117). Our results validate and extend key COPD-related findings: The elevated FGG expression corroborates Zhang et al.'s clinical observation[\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e], while our multi-omics feature selection reinforces Hao et al.'s reported association between TIMP4 levels and FEV1% decline/acute exacerbations[\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e]. CXCL12's inclusion suggests broader mechanistic implications beyond Roos et al.'s established IL-17A-mediated neutrophilic inflammation[\u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e], potentially involving alternative pathological dimensions. HTR2B's independent selection not only supports Li et al.'s comorbidity hypothesis [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e] but also reveals its potential role in non-neoplastic COPD progression. Currently, there have been no reported studies linking ANGPTL1, DUSP26, GAS2, VEGFD, BHLHE22, SYNGR1, GEMIN5, SV2B, and TMEM117 to COPD.\u003c/p\u003e \u003cp\u003eThe gene-derived models demonstrated significant correlations with pulmonary function decline and emphysema progression. Multi-tissue expression profiling revealed diagnostic score distribution across both structural (alveolar epithelial) and immune (macrophage/T-cell) compartments, substantiating COPD's multi-factorial pathomechanistic networks. While ensemble models showed strong COPD specificity (random forest: 82.3%; extra trees: 79.1%), reduced discrimination in non-COPD cohorts (specificity: 68.9%) suggests prevalent pre-symptomatic molecular signatures preceding functional impairment. These findings necessitate longitudinal validation to establish diagnostic thresholds for early-stage COPD/PRISM identification. Mendelian randomization (MR) capitalizes on genetic variants' random segregation during meiosis to mitigate confounding biases in causal inference[\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e]. Our application of multivariable MR (SMR/TSMR) revealed TIMP4\u0026mdash;a ciliated cell-enriched protease\u0026mdash;as a COPD-predisposing gene with dose-dependent effects on lung function decline. Despite limited instrument variable availability across tissues, rigorous colocalization analyses confirmed lung-specific causal effects distinct from blood-mediated pathways.\u003c/p\u003e \u003cp\u003eRespiratory cilia, essential for maintaining airway sterility through coordinated mucociliary clearance, exhibit pathologically reduced beat frequency[\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e] and dyssynchrony in COPD, creating a vicious cycle of retained pathogens, chronic inflammation, and progressive tissue remodeling[\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e].In addition, Transforming growth factor-beta and MMPs are crucial in COPD pathogenesis[\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e], numerous studies report persistent activation of the \u003cem\u003eWNT/β-catenin pathway\u003c/em\u003e in the airway epithelium of patients with COPD, promoting EMT of the airway epithelium. TIMP4, a member of the TIMP family, plays an essential role in regulating the homeostasis of the ECM [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e] but also modulates processes such as fibrosis[\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e] and inflammation[\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e]. While TIMPs have been extensively studied in cancer and vascular diseases, research on TIMP-4 is limited[\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]. Jenny Lutshumba et al. demonstrated that TIMP4 transcriptional activation can contribute to the development of abdominal aortic aneurysm[\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e]. Research on TIMP4 in pulmonary diseases, particularly COPD, is scarce. However, studies have indicated that TIMP-4 levels in exhaled breath condensate are elevated in patients with COPD compared to healthy controls and negatively correlated with predicted FEV1%[\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eAlthough we have identified TIMP4 as a potential pathogenic gene in COPD progression, our study did not explore its specific functional mechanisms, particularly \u003cem\u003ein vivo\u003c/em\u003e. The observed elevation of MMPs in COPD patients may trigger compensatory increases in TIMPs to counteract protease hyperactivity[\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e, \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e], as evidenced by the strong positive correlation between TIMP4 and MMPs levels in our analyses. We hypothesize that TIMP4 may play a role in ciliated cells based on MR and cohort sample results, as well as single-cell sequencing analysis. To investigate this, we used human primary epithelial cells cultured at an ALI and exposed to prolonged CSE to simulate authentic airway epithelial conditions. Under CSE stimulation, overexpression of TIMP4 significantly reduced the expression of cilia-related genes, a phenomenon not observed in the PBS control group. This suggests that TIMP4 may exert its effects only in response to external stimuli. We propose that TIMP4, through its impairment of ciliary function, exacerbates COPD progression - a maladaptive compensatory response to MMP dysregulation that paradoxically accelerates airway remodeling[\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e]. Our findings suggest TIMP4 contributes to COPD pathogenesis by disrupting MMP/anti-protease balance, leading to abnormal collagen deposition, and may exacerbate airflow limitation through ECM-mediated impairment of airway smooth muscle elasticity and mucociliary clearance. These hypothesized mechanisms require experimental validation, underscoring the need for future studies using ciliated cell-specific TIMP4 knockout models to establish its precise role in disease progression.\u003c/p\u003e \u003cp\u003eIn summary, we developed a model constructed using various machine learning methods to specifically identify COPD in multi-center patient cohorts. We identified TIMP4 as a potential pathogenic gene by integrating single-cell analysis of human lung tissues and lung tissues from COPD mouse models and MR analysis of GWAS data related to COPD and lung function. Additionally, we preliminarily confirmed its impact on ciliated cells in human primary epithelial cells cultured at ALI. Future research should investigate the specific role of TIMP4 in ciliated cells during COPD progression and uncover its underlying mechanisms. Further efforts should be made to evaluate TIMP4\u0026rsquo;s potential as a biomarker and early therapeutic target for COPD.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData availability\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets supporting the conclusions of this article are available in online repositories. Names of repositories/repositories and accession number(s) are provided in the article/Additional Material. The transcriptome sequencing data from this study have been deposited in the Genome Sequence Archive (GSA) under accession number OMIX009278. All other relevant materials are available from the corresponding author upon reasonable request.\u003c/p\u003e\n\u003cp\u003eFunding\u003c/p\u003e\n\u003cp\u003eThis study received grants from the National Natural Science Foundation of China (Nos.: 82200045, 82270043, and 81970045), the Foundation of Guangzhou National Laboratory (Nos.: SRPG23-005,\u0026nbsp;SRPG22-018 and SRPG22-016), the Youth Foundation of the National Key Laboratory of Respiratory Diseases (No.: SKLRD-Z-202326), and\u0026nbsp;the Postdoctoral Startup Foundation of Guangzhou City (grants awarded to Q.Y.L).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHuman Ethics and Consent to Participate declarations\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was conducted in accordance with the ethical guidelines outlined in the Declaration of Helsinki and approved by the Ethics Committee of the First Affiliated Hospital of Guangzhou Medical University (approval number: 2020-51). The peripheral blood and airway brush cell samples\u0026nbsp;from COPD patients and the control group were sourced from the Early Chronic Obstructive Pulmonary Disease (ECOPD) study (Trial registration: Chinese Clinical Trial Registry ChiCTR190002464. Registered on 19 July 2019).\u0026nbsp;Written informed consent was obtained from all participants prior to specimen collection.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eContributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eErkang Yi, Haiqing Li, and Pixin Ran conceived and designed the experiments. Erkang Yi, Haiqing Li, Yu Liu, Qingyang Li, Ruining Sun, and Hairong Wang performed the experiments. Erkang Yi and Chengshu Xie performed bioinformatics analysis. Erkang Yi, Fan Wu, Kunning Zhou, Zhishan Deng and Xinru Ran performed clinical correlation analysis. Erkang Yi and Haiqing Li wrote the manuscript and analyzed the data. Erkang Yi, Qingyang Li, Yumin Zhou, and Pixin Ran supervised the research and acquired the funding.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflict of interest\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no conflicts of interest.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eWang C, Xu J, Yang L, Xu Y, Zhang X, Bai C, Kang J, Ran P, Shen H, Wen F\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003ePrevalence and risk factors of chronic obstructive pulmonary disease in China (the China Pulmonary Health [CPH] study): a national cross-sectional study\u003c/strong\u003e. \u003cem\u003eLancet \u003c/em\u003e2018, \u003cstrong\u003e391\u003c/strong\u003e(10131):1706-1717.\u003c/li\u003e\n\u003cli\u003eSin DD, Doiron D, Agusti A, Anzueto A, Barnes PJ, Celli BR, Criner GJ, Halpin D, Han MK, Martinez FJ\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eAir pollution and COPD: GOLD 2023 committee report\u003c/strong\u003e. \u003cem\u003eEur Respir J \u003c/em\u003e2023, \u003cstrong\u003e61\u003c/strong\u003e(5).\u003c/li\u003e\n\u003cli\u003eBelz DC, Putcha N, Alupo P, Siddharthan T, Baugh A, Hopkinson N, Castaldi P, Papi A, Mannino D, Miravitlles M\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eCall to Action: How Can We Promote the Development of New Pharmacologic Treatments in COPD?\u003c/strong\u003e \u003cem\u003eAm J Respir Crit Care Med \u003c/em\u003e2024.\u003c/li\u003e\n\u003cli\u003eEzzie ME, Crawford M, Cho JH, Orellana R, Zhang S, Gelinas R, Batte K, Yu L, Nuovo G, Galas D\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eGene expression networks in COPD: microRNA and mRNA regulation\u003c/strong\u003e. \u003cem\u003eThorax \u003c/em\u003e2012, \u003cstrong\u003e67\u003c/strong\u003e(2):122-131.\u003c/li\u003e\n\u003cli\u003eMorrow JD, Zhou X, Lao T, Jiang Z, DeMeo DL, Cho MH, Qiu W, Cloonan S, Pinto-Plata V, Celli B\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eFunctional interactors of three genome-wide association study genes are differentially expressed in severe chronic obstructive pulmonary disease lung tissue\u003c/strong\u003e. \u003cem\u003eSci Rep \u003c/em\u003e2017, \u003cstrong\u003e7\u003c/strong\u003e:44232.\u003c/li\u003e\n\u003cli\u003eCastaldi PJ, Benet M, Petersen H, Rafaels N, Finigan J, Paoletti M, Marike Boezen H, Vonk JM, Bowler R, Pistolesi M\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eDo COPD subtypes really exist? COPD heterogeneity and clustering in 10 independent cohorts\u003c/strong\u003e. \u003cem\u003eThorax \u003c/em\u003e2017, \u003cstrong\u003e72\u003c/strong\u003e(11):998-1006.\u003c/li\u003e\n\u003cli\u003eYi E, Cao W, Zhang J, Lin B, Wang Z, Wang X, Bai G, Mei X, Xie C, Jin J\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eGenetic screening of MMP1 as a potential pathogenic gene in chronic obstructive pulmonary disease\u003c/strong\u003e. \u003cem\u003eLife Sci \u003c/em\u003e2022:121214.\u003c/li\u003e\n\u003cli\u003eCao W, Li J, Che L, Yang R, Wu Z, Hu G, Zou W, Zhao Z, Zhou Y, Jiang X\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eSingle-cell transcriptomics reveals e-cigarette vapor-induced airway epithelial remodeling and injury\u003c/strong\u003e. \u003cem\u003eRespir Res \u003c/em\u003e2024, \u003cstrong\u003e25\u003c/strong\u003e(1):353.\u003c/li\u003e\n\u003cli\u003eLi Q, Liu Y, Wang X, Xie C, Mei X, Cao W, Guan W, Lin X, Xie X, Zhou C\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eThe influence of CLEC5A on early macrophage-mediated inflammation in COPD progression\u003c/strong\u003e. \u003cem\u003eCell Mol Life Sci \u003c/em\u003e2024, \u003cstrong\u003e81\u003c/strong\u003e(1):330.\u003c/li\u003e\n\u003cli\u003eYi E, Zhang J, Zheng M, Zhang Y, Liang C, Hao B, Hong W, Lin B, Pu J, Lin Z\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eLong noncoding RNA IL6-AS1 is highly expressed in chronic obstructive pulmonary disease and is associated with interleukin 6 by targeting miR-149-5p and early B-cell factor 1\u003c/strong\u003e. \u003cem\u003eClin Transl Med \u003c/em\u003e2021, \u003cstrong\u003e11\u003c/strong\u003e(7):e479.\u003c/li\u003e\n\u003cli\u003eYi E, Lin B, Zhang Y, Wang X, Zhang J, Liu Y, Jin J, Hong W, Lin Z, Cao W\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eSmad3-mediated lncRNA HSALR1 enhances the non-classic signalling pathway of TGF-beta1 in human bronchial fibroblasts by binding to HSP90AB1\u003c/strong\u003e. \u003cem\u003eClin Transl Med \u003c/em\u003e2023, \u003cstrong\u003e13\u003c/strong\u003e(6):e1292.\u003c/li\u003e\n\u003cli\u003eKim S, Herazo-Maya JD, Kang DD, Juan-Guardela BM, Tedrow J, Martinez FJ, Sciurba FC, Tseng GC, Kaminski N: \u003cstrong\u003eIntegrative phenotyping framework (iPF): integrative clustering of multiple omics data identifies novel lung disease subphenotypes\u003c/strong\u003e. \u003cem\u003eBMC Genomics \u003c/em\u003e2015, \u003cstrong\u003e16\u003c/strong\u003e:924.\u003c/li\u003e\n\u003cli\u003eCruz T, Lopez-Giraldo A, Noell G, Guirao A, Casas-Recasens S, Garcia T, Saco A, Sellares J, Agusti A, Faner R: \u003cstrong\u003eSmoking Impairs the Immunomodulatory Capacity of Lung-Resident Mesenchymal Stem Cells in Chronic Obstructive Pulmonary Disease\u003c/strong\u003e. \u003cem\u003eAm J Respir Cell Mol Biol \u003c/em\u003e2019, \u003cstrong\u003e61\u003c/strong\u003e(5):575-583.\u003c/li\u003e\n\u003cli\u003ede Fays C, Geudens V, Gyselinck I, Kerckhof P, Vermaut A, Goos T, Vermant M, Beeckmans H, Kaes J, Van Slambrouck J\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eMucosal immune alterations at the early onset of tissue destruction in chronic obstructive pulmonary disease\u003c/strong\u003e. \u003cem\u003eFront Immunol \u003c/em\u003e2023, \u003cstrong\u003e14\u003c/strong\u003e:1275845.\u003c/li\u003e\n\u003cli\u003eSteiling K, van den Berge M, Hijazi K, Florido R, Campbell J, Liu G, Xiao J, Zhang X, Duclos G, Drizik E\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eA dynamic bronchial airway gene expression signature of chronic obstructive pulmonary disease and lung function impairment\u003c/strong\u003e. \u003cem\u003eAm J Respir Crit Care Med \u003c/em\u003e2013, \u003cstrong\u003e187\u003c/strong\u003e(9):933-942.\u003c/li\u003e\n\u003cli\u003eSubramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eGene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles\u003c/strong\u003e. \u003cem\u003eProc Natl Acad Sci U S A \u003c/em\u003e2005, \u003cstrong\u003e102\u003c/strong\u003e(43):15545-15550.\u003c/li\u003e\n\u003cli\u003eSzklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, Gable AL, Fang T, Doncheva NT, Pyysalo S\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eThe STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest\u003c/strong\u003e. \u003cem\u003eNucleic Acids Res \u003c/em\u003e2023, \u003cstrong\u003e51\u003c/strong\u003e(D1):D638-D646.\u003c/li\u003e\n\u003cli\u003eLangfelder P, Horvath S: \u003cstrong\u003eWGCNA: an R package for weighted correlation network analysis\u003c/strong\u003e. \u003cem\u003eBMC Bioinformatics \u003c/em\u003e2008, \u003cstrong\u003e9\u003c/strong\u003e:559.\u003c/li\u003e\n\u003cli\u003eEngebretsen S, Bohlin J: \u003cstrong\u003eStatistical predictions with glmnet\u003c/strong\u003e. \u003cem\u003eClin Epigenetics \u003c/em\u003e2019, \u003cstrong\u003e11\u003c/strong\u003e(1):123.\u003c/li\u003e\n\u003cli\u003eSauler M, McDonough JE, Adams TS, Kothapalli N, Barnthaler T, Werder RB, Schupp JC, Nouws J, Robertson MJ, Coarfa C\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eCharacterization of the COPD alveolar niche using single-cell RNA sequencing\u003c/strong\u003e. \u003cem\u003eNat Commun \u003c/em\u003e2022, \u003cstrong\u003e13\u003c/strong\u003e(1):494.\u003c/li\u003e\n\u003cli\u003eWatanabe N, Fujita Y, Nakayama J, Mori Y, Kadota T, Hayashi Y, Shimomura I, Ohtsuka T, Okamoto K, Araya J\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eAnomalous Epithelial Variations and Ectopic Inflammatory Response in Chronic Obstructive Pulmonary Disease\u003c/strong\u003e. \u003cem\u003eAm J Respir Cell Mol Biol \u003c/em\u003e2022, \u003cstrong\u003e67\u003c/strong\u003e(6):708-719.\u003c/li\u003e\n\u003cli\u003eWu F, Zhou Y, Peng J, Deng Z, Wen X, Wang Z, Zheng Y, Tian H, Yang H, Huang P\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eRationale and design of the Early Chronic Obstructive Pulmonary Disease (ECOPD) study in Guangdong, China: a prospective observational cohort study\u003c/strong\u003e. \u003cem\u003eJ Thorac Dis \u003c/em\u003e2021, \u003cstrong\u003e13\u003c/strong\u003e(12):6924-6935.\u003c/li\u003e\n\u003cli\u003eShrine N, Guyatt AL, Erzurumluoglu AM, Jackson VE, Hobbs BD, Melbourne CA, Batini C, Fawcett KA, Song K, Sakornsakolpat P\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eNew genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries\u003c/strong\u003e. \u003cem\u003eNat Genet \u003c/em\u003e2019, \u003cstrong\u003e51\u003c/strong\u003e(3):481-493.\u003c/li\u003e\n\u003cli\u003eHigbee DH, Granell R, Sanderson E, Davey Smith G, Dodd JW: \u003cstrong\u003eLung function and cardiovascular disease: a two-sample Mendelian randomisation study\u003c/strong\u003e. \u003cem\u003eEur Respir J \u003c/em\u003e2021, \u003cstrong\u003e58\u003c/strong\u003e(3).\u003c/li\u003e\n\u003cli\u003eBurgess S, Thompson SG: \u003cstrong\u003eMultivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects\u003c/strong\u003e. \u003cem\u003eAm J Epidemiol \u003c/em\u003e2015, \u003cstrong\u003e181\u003c/strong\u003e(4):251-260.\u003c/li\u003e\n\u003cli\u003eConsortium GT: \u003cstrong\u003eHuman genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans\u003c/strong\u003e. \u003cem\u003eScience \u003c/em\u003e2015, \u003cstrong\u003e348\u003c/strong\u003e(6235):648-660.\u003c/li\u003e\n\u003cli\u003eZhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, Montgomery GW, Goddard ME, Wray NR, Visscher PM\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eIntegration of summary data from GWAS and eQTL studies predicts complex trait gene targets\u003c/strong\u003e. \u003cem\u003eNat Genet \u003c/em\u003e2016, \u003cstrong\u003e48\u003c/strong\u003e(5):481-487.\u003c/li\u003e\n\u003cli\u003eHemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, Laurin C, Burgess S, Bowden J, Langdon R\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eThe MR-Base platform supports systematic causal inference across the human phenome\u003c/strong\u003e. \u003cem\u003eElife \u003c/em\u003e2018, \u003cstrong\u003e7\u003c/strong\u003e.\u003c/li\u003e\n\u003cli\u003eHan Z, Tian R, Ren P, Zhou W, Wang P, Luo M, Jin S, Jiang Q: \u003cstrong\u003eParkinson\u0026apos;s disease and Alzheimer\u0026apos;s disease: a Mendelian randomization study\u003c/strong\u003e. \u003cem\u003eBMC Med Genet \u003c/em\u003e2018, \u003cstrong\u003e19\u003c/strong\u003e(Suppl 1):215.\u003c/li\u003e\n\u003cli\u003eBaarsma HA, Konigshoff M: \u003cstrong\u003e\u0026apos;WNT-er is coming\u0026apos;: WNT signalling in chronic lung diseases\u003c/strong\u003e. \u003cem\u003eThorax \u003c/em\u003e2017, \u003cstrong\u003e72\u003c/strong\u003e(8):746-759.\u003c/li\u003e\n\u003cli\u003eHeijink IH, de Bruin HG, Dennebos R, Jonker MR, Noordhoek JA, Brandsma CA, van den Berge M, Postma DS: \u003cstrong\u003eCigarette smoke-induced epithelial expression of WNT-5B: implications for COPD\u003c/strong\u003e. \u003cem\u003eEur Respir J \u003c/em\u003e2016, \u003cstrong\u003e48\u003c/strong\u003e(2):504-515.\u003c/li\u003e\n\u003cli\u003eGartner Y, Bitar L, Zipp F, Vogelaar CF: \u003cstrong\u003eInterleukin-4 as a therapeutic target\u003c/strong\u003e. \u003cem\u003ePharmacol Ther \u003c/em\u003e2023, \u003cstrong\u003e242\u003c/strong\u003e:108348.\u003c/li\u003e\n\u003cli\u003eAgusti A, Hogg JC: \u003cstrong\u003eUpdate on the Pathogenesis of Chronic Obstructive Pulmonary Disease\u003c/strong\u003e. \u003cem\u003eN Engl J Med \u003c/em\u003e2019, \u003cstrong\u003e381\u003c/strong\u003e(13):1248-1256.\u003c/li\u003e\n\u003cli\u003eCorlateanu A, Mendez Y, Wang Y, Garnica RJA, Botnaru V, Siafakas N: \u003cstrong\u003e\u0026quot;Chronic obstructive pulmonary disease and phenotypes: a state-of-the-art.\u0026quot;\u003c/strong\u003e. \u003cem\u003ePulmonology \u003c/em\u003e2020, \u003cstrong\u003e26\u003c/strong\u003e(2):95-100.\u003c/li\u003e\n\u003cli\u003eHan MK, Hanania NA, Martinez FJ: \u003cstrong\u003eConfronting the Challenge of COPD: What Is New in the Approaches to Diagnosis, Treatment, and Patient Outcomes\u003c/strong\u003e. \u003cem\u003eChest \u003c/em\u003e2018, \u003cstrong\u003e154\u003c/strong\u003e(4):984-985.\u003c/li\u003e\n\u003cli\u003eBhatt SP, Rabe KF, Hanania NA, Vogelmeier CF, Bafadhel M, Christenson SA, Papi A, Singh D, Laws E, Patel N\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eDupilumab for COPD with Blood Eosinophil Evidence of Type 2 Inflammation\u003c/strong\u003e. \u003cem\u003eN Engl J Med \u003c/em\u003e2024, \u003cstrong\u003e390\u003c/strong\u003e(24):2274-2283.\u003c/li\u003e\n\u003cli\u003eSegal LN, Martinez FJ: \u003cstrong\u003eChronic obstructive pulmonary disease subpopulations and phenotyping\u003c/strong\u003e. \u003cem\u003eJ Allergy Clin Immunol \u003c/em\u003e2018, \u003cstrong\u003e141\u003c/strong\u003e(6):1961-1971.\u003c/li\u003e\n\u003cli\u003eAgusti A, Celli B, Faner R: \u003cstrong\u003eWhat does endotyping mean for treatment in chronic obstructive pulmonary disease?\u003c/strong\u003e \u003cem\u003eLancet \u003c/em\u003e2017, \u003cstrong\u003e390\u003c/strong\u003e(10098):980-987.\u003c/li\u003e\n\u003cli\u003eLeung JM, Obeidat M, Sadatsafavi M, Sin DD: \u003cstrong\u003eIntroduction to precision medicine in COPD\u003c/strong\u003e. \u003cem\u003eEur Respir J \u003c/em\u003e2019, \u003cstrong\u003e53\u003c/strong\u003e(4).\u003c/li\u003e\n\u003cli\u003eRagland MF, Benway CJ, Lutz SM, Bowler RP, Hecker J, Hokanson JE, Crapo JD, Castaldi PJ, DeMeo DL, Hersh CP\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eGenetic Advances in Chronic Obstructive Pulmonary Disease. Insights from COPDGene\u003c/strong\u003e. \u003cem\u003eAm J Respir Crit Care Med \u003c/em\u003e2019, \u003cstrong\u003e200\u003c/strong\u003e(6):677-690.\u003c/li\u003e\n\u003cli\u003eZhu M, Ye M, Wang J, Ye L, Jin M: \u003cstrong\u003eConstruction of Potential miRNA-mRNA Regulatory Network in COPD Plasma by Bioinformatics Analysis\u003c/strong\u003e. \u003cem\u003eInt J Chron Obstruct Pulmon Dis \u003c/em\u003e2020, \u003cstrong\u003e15\u003c/strong\u003e:2135-2145.\u003c/li\u003e\n\u003cli\u003eStockley RA, Halpin DMG, Celli BR, Singh D: \u003cstrong\u003eChronic Obstructive Pulmonary Disease Biomarkers and Their Interpretation\u003c/strong\u003e. \u003cem\u003eAm J Respir Crit Care Med \u003c/em\u003e2019, \u003cstrong\u003e199\u003c/strong\u003e(10):1195-1204.\u003c/li\u003e\n\u003cli\u003ePott H, Weckler B, Gaffron S, Martin R, Maier D, Alter P, Biertz F, Speicher T, Bertrams W, Jung AL\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eDiffusion capacity and static hyperinflation as markers of disease progression predict 3-year mortality in COPD: Results from COSYCONET\u003c/strong\u003e. \u003cem\u003eRespirology \u003c/em\u003e2024.\u003c/li\u003e\n\u003cli\u003eSui J, Xiao H, Mbaekwe U, Ting NC, Murday K, Hu Q, Gregory AD, Kapellos TS, Yildirim AO, Konigshoff M\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eInterpretable machine learning uncovers epithelial transcriptional rewiring and a role for Gelsolin in COPD\u003c/strong\u003e. \u003cem\u003eJCI Insight \u003c/em\u003e2024.\u003c/li\u003e\n\u003cli\u003eZhang H, Li C, Song X, Cheng L, Liu Q, Zhang N, Wei L, Chung K, Adcock IM, Ling C\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eIntegrated analysis reveals lung fibrinogen gamma chain as a biomarker for chronic obstructive pulmonary disease\u003c/strong\u003e. \u003cem\u003eAnn Transl Med \u003c/em\u003e2021, \u003cstrong\u003e9\u003c/strong\u003e(24):1765.\u003c/li\u003e\n\u003cli\u003eHao W, Li M, Zhang Y, Zhang C, Wang P: \u003cstrong\u003eComparative Study of Cytokine Levels in Different Respiratory Samples in Mild-to-Moderate AECOPD Patients\u003c/strong\u003e. \u003cem\u003eLung \u003c/em\u003e2019, \u003cstrong\u003e197\u003c/strong\u003e(5):565-572.\u003c/li\u003e\n\u003cli\u003eRoos AB, Sanden C, Mori M, Bjermer L, Stampfli MR, Erjefalt JS: \u003cstrong\u003eIL-17A Is Elevated in End-Stage Chronic Obstructive Pulmonary Disease and Contributes to Cigarette Smoke-induced Lymphoid Neogenesis\u003c/strong\u003e. \u003cem\u003eAm J Respir Crit Care Med \u003c/em\u003e2015, \u003cstrong\u003e191\u003c/strong\u003e(11):1232-1241.\u003c/li\u003e\n\u003cli\u003eLi Y, Wang Y, Wu R, Li P, Cheng Z: \u003cstrong\u003eHTR2B as a novel biomarker of chronic obstructive pulmonary disease with lung squamous cell carcinoma\u003c/strong\u003e. \u003cem\u003eSci Rep \u003c/em\u003e2024, \u003cstrong\u003e14\u003c/strong\u003e(1):13206.\u003c/li\u003e\n\u003cli\u003eWalker VM, Davies NM, Hemani G, Zheng J, Haycock PC, Gaunt TR, Davey Smith G, Martin RM: \u003cstrong\u003eUsing the MR-Base platform to investigate risk factors and drug targets for thousands of phenotypes\u003c/strong\u003e. \u003cem\u003eWellcome Open Res \u003c/em\u003e2019, \u003cstrong\u003e4\u003c/strong\u003e:113.\u003c/li\u003e\n\u003cli\u003eLoges NT, Marthin JK, Raidt J, Olbrich H, Hoben IM, Cindric S, Bracht D, Konig J, Rieck C, George S\u003cem\u003e et al\u003c/em\u003e: \u003cstrong\u003eA range of 30-62% of functioning multiciliated airway cells is sufficient to maintain ciliary airway clearance\u003c/strong\u003e. \u003cem\u003eEur Respir J \u003c/em\u003e2024, \u003cstrong\u003e64\u003c/strong\u003e(4).\u003c/li\u003e\n\u003cli\u003eChurg A, Zhou S, Wright JL: \u003cstrong\u003eSeries \u0026quot;matrix metalloproteinases in lung health and disease\u0026quot;: Matrix metalloproteinases in COPD\u003c/strong\u003e. \u003cem\u003eEur Respir J \u003c/em\u003e2012, \u003cstrong\u003e39\u003c/strong\u003e(1):197-209.\u003c/li\u003e\n\u003cli\u003eMelendez-Zajgla J, Del Pozo L, Ceballos G, Maldonado V: \u003cstrong\u003eTissue inhibitor of metalloproteinases-4. The road less traveled\u003c/strong\u003e. \u003cem\u003eMol Cancer \u003c/em\u003e2008, \u003cstrong\u003e7\u003c/strong\u003e:85.\u003c/li\u003e\n\u003cli\u003eLutshumba J, Liu S, Zhong Y, Hou T, Daugherty A, Lu H, Guo Z, Gong MC: \u003cstrong\u003eDeletion of BMAL1 in Smooth Muscle Cells Protects Mice From Abdominal Aortic Aneurysms\u003c/strong\u003e. \u003cem\u003eArterioscler Thromb Vasc Biol \u003c/em\u003e2018, \u003cstrong\u003e38\u003c/strong\u003e(5):1063-1075.\u003c/li\u003e\n\u003cli\u003eHao W, Li M, Zhang C, Zhang Y, Wang P: \u003cstrong\u003eInflammatory mediators in exhaled breath condensate and peripheral blood of healthy donors and stable COPD patients\u003c/strong\u003e. \u003cem\u003eImmunopharmacol Immunotoxicol \u003c/em\u003e2019, \u003cstrong\u003e41\u003c/strong\u003e(2):224-230.\u003c/li\u003e\n\u003cli\u003eHao W, Li M, Zhang Y, Zhang C, Xue Y: \u003cstrong\u003eExpressions of MMP-12, TIMP-4, and Neutrophil Elastase in PBMCs and Exhaled Breath Condensate in Patients with COPD and Their Relationships with Disease Severity and Acute Exacerbations\u003c/strong\u003e. \u003cem\u003eJ Immunol Res \u003c/em\u003e2019, \u003cstrong\u003e2019\u003c/strong\u003e:7142438.\u003c/li\u003e\n\u003cli\u003eSun J, Bao J, Shi Y, Zhang B, Yuan L, Li J, Zhang L, Sun M, Zhang L, Sun W: \u003cstrong\u003eEffect of simvastatin on MMPs and TIMPs in cigarette smoke-induced rat COPD model\u003c/strong\u003e. \u003cem\u003eInt J Chron Obstruct Pulmon Dis \u003c/em\u003e2017, \u003cstrong\u003e12\u003c/strong\u003e:717-724.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"respiratory-research","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"rere","sideBox":"Learn more about [Respiratory Research](http://respiratory-research.biomedcentral.com/)","snPcode":"12931","submissionUrl":"https://submission.nature.com/new-submission/12931/3","title":"Respiratory Research","twitterHandle":"@RespiratoryBMC","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Chronic obstructive pulmonary disease, Machine learning, Mendelian randomization, Matrix metallopeptidase 4","lastPublishedDoi":"10.21203/rs.3.rs-5739006/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-5739006/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eChronic obstructive pulmonary disease (COPD) is a heterogeneous syndrome, resulting in inconsistent findings across studies. Identifying a core set of genes consistently involved in COPD pathogenesis, independent of patient variability, is essential.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e \u003cp\u003eWe integrated lung tissue sequencing data from patients with COPD across two centers. We used weighted gene co-expression network analysis and machine learning to identify 13 potential pathogenic genes common to both centers. Additionally, a gene-based model was constructed to distinguish COPD at the molecular level and validated in independent cohorts. Gene expression in specific cell types was analyzed, and Mendelian randomization was used to confirm associations between candidate genes and lung function/COPD.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eTissue inhibitor of metalloproteinase 4 (TIMP4) was identified as a key pathogenic gene and validated in COPD cohorts. Further analysis using single-cell sequencing from mice and patients with COPD revealed that TIMP4 is involved in ciliated cells. In primary human airway epithelial cells cultured at the air-liquid interface, TIMP4 overexpression reduced ciliated cell numbers.\u003c/p\u003e\u003ch2\u003eConclusions\u003c/h2\u003e \u003cp\u003eWe developed a 13-gene model for distinguishing COPD at the molecular level and identified TIMP4 as a potential hub pathogenic gene. This finding provides insights into shared disease mechanisms and positions TIMP4 as a promising therapeutic target for further investigation.\u003c/p\u003e","manuscriptTitle":"An integrated machine learning model of transcriptomic genes in multi-center chronic obstructive pulmonary disease reveals the causal role of TIMP4 in airway epithelial cell","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-04-10 05:37:13","doi":"10.21203/rs.3.rs-5739006/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Accepted","date":"2025-04-15T13:51:31+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-04-13T14:30:26+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"315005287149916953945468467920915481078","date":"2025-04-13T13:07:19+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-04-07T03:00:34+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-04-04T10:09:07+00:00","index":"","fulltext":""},{"type":"submitted","content":"Respiratory Research","date":"2025-04-02T09:12:57+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"respiratory-research","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"rere","sideBox":"Learn more about [Respiratory Research](http://respiratory-research.biomedcentral.com/)","snPcode":"12931","submissionUrl":"https://submission.nature.com/new-submission/12931/3","title":"Respiratory Research","twitterHandle":"@RespiratoryBMC","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"em","reportingPortfolio":"BMC/SO AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"7c253817-8090-4066-949b-1722abec9c80","owner":[],"postedDate":"April 10th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2025-04-28T16:13:04+00:00","versionOfRecord":{"articleIdentity":"rs-5739006","link":"https://doi.org/10.1186/s12931-025-03238-1","journal":{"identity":"respiratory-research","isVorOnly":false,"title":"Respiratory Research"},"publishedOn":"2025-04-23 15:57:43","publishedOnDateReadable":"April 23rd, 2025"},"versionCreatedAt":"2025-04-10 05:37:13","video":"","vorDoi":"10.1186/s12931-025-03238-1","vorDoiUrl":"https://doi.org/10.1186/s12931-025-03238-1","workflowStages":[]},"version":"v1","identity":"rs-5739006","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-5739006","identity":"rs-5739006","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
Text is read by the "Ask this paper" AI Q&A widget below.
Extraction quality varies by source — PMC NXML preserves structure
cleanly, OA-HTML may include some navigation residue, and OA-PDF can
have broken hyphenation. The publisher copy
(via DOI)
is the canonical version.