Metagenomic Analysis of Bronchoalveolar Lavage Fluid Enables Differential Diagnosis Between Lung Cancer and Pulmonary Infections

Yu Chen, Dongsheng Han, Fei Yu, Bin Yang, Yifei Shen, Dan Zhang, and 11 more

Research Square preprint, Version 1. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-3883914/v1
This work is licensed under a CC BY 4.0 License.

Abstract

Recent advances in unbiased metagenomic next-generation sequencing (mNGS) have enabled the simultaneous examination of both microbial and host genetic material in a single test. This study harnesses cost-effective bronchoalveolar lavage fluid (BALF) mNGS data from patients with lung cancer (n=123) and pulmonary infections (n=279).
We developed a machine learning-based diagnostic approach to differentiate between these two conditions, which are often mistaken for each other in clinical settings. To ensure independence between model construction and validation, we divided the cohorts based on the collection dates of the samples. The training cohort (lung cancer, n=87; pulmonary infection, n=197) revealed distinct differences in DNA/RNA microbial composition, bacteriophage abundances, and host responses, including gene expression, transposable element levels, immune cell composition, and tumor fraction determined by copy number variation (CNV). These features, selected without reference to the validation cohort, were integrated into a host/microbe metagenomics-driven machine learning model (Model VI). The model achieved an area under the curve (AUC) of 0.87 (95% CI = 0.857–0.883) in the training cohort and 0.831 (95% CI = 0.819–0.843) in the validation cohort for differentiating between patients with lung cancer and pulmonary infections. Applying a composite predictive model based on a rule-in and rule-out strategy significantly increased accuracy in distinguishing lung cancer from tuberculosis (ACC=0.913), fungal infection (ACC=0.955), and bacterial infection (ACC=0.836). These results underscore the potential of mNGS-based analysis as a valuable, cost-effective tool for the early differentiation of lung cancer from pulmonary infections, offering a comprehensive testing solution in a clinical context.

Subject areas: Biological sciences/Microbiology/Infectious-disease diagnostics; Biological sciences/Biological techniques/Sequencing/Next-generation sequencing
Keywords: metagenomic next-generation sequencing, mNGS, lung cancer, pulmonary infections

Introduction

Lung cancer and pulmonary infections pose significant global health challenges, with high incidence and mortality rates and substantial socioeconomic burdens 1, 2.
In the absence of rapid and accurate histopathological or microbiological test results, clinicians often find it challenging to distinguish between them based solely on clinical and radiological characteristics, leading to misdiagnosis and delays or errors in treatment 3, 4. Various pathogens causing pulmonary infections, such as bacteria (Pseudomonas, Streptococcus), mycobacteria (Mycobacterium tuberculosis, non-tuberculous mycobacteria), aerobic actinomycetes (Nocardia), fungi (Aspergillus, Mucor, Cryptococcus), and others, can mimic lung cancer, sharing indistinguishable clinical symptoms (e.g., dyspnea, fatigue, cough, and hemoptysis) and radiographic features (e.g., spiculated solid nodules or masses, cavities with nodular margins, and chest wall and mediastinal invasion) 3, 5. Consequently, clinicians often employ multiple testing methods to detect lung infections and cancer 6. An affordable diagnostic method that requires fewer samples and helps clinicians reach quicker and more accurate decisions would greatly benefit patient treatment and management. Metagenomic next-generation sequencing (mNGS) is a sequencing technology capable of identifying pathogens within 24 hours or less, even in specimens with very low microbial nucleic acid concentrations 7–9. In recent years, it has been widely employed in the diagnosis of various complex infectious diseases and has been confirmed to be a powerful tool with excellent diagnostic accuracy in detecting pneumonia-related pathogens 10–12. Recent studies have also confirmed that analyzing transcriptomic data derived from the human sequences in mNGS testing can help distinguish infectious diseases, such as sepsis, acute respiratory infections, and tuberculous meningitis, from non-infectious diseases 13–15. In addition, intelligent algorithms based on chromosomal instability and tumor-related copy number variations derived from mNGS data are useful for diagnosing malignant tumors 16–18.
These studies prompted us to consider whether mNGS data from respiratory tract samples could be used to establish an integrative genomic diagnostic method that combines the microbial and host response characteristics of patients. Such a method is expected to identify pulmonary infectious diseases that can be mistaken for lung cancer without increasing patient testing expenses, using minimal tests and samples, and within a relatively short timeframe. Here, we conducted mNGS testing on bronchoalveolar lavage fluid samples (BALF-mNGS) from 402 clinical patients with lung cancer or pulmonary infections. We then analyzed the microbial and host response information derived from the metagenomic sequencing data and, on this basis, established and validated an integrated host/microbe metagenomics-driven machine learning approach for the differential diagnosis of lung cancer and pulmonary infections.

Methods

Study design, patient collection and ethics statement. Patients with suspected lung cancer or pulmonary infections were enrolled at the First Affiliated Hospital, Zhejiang University School of Medicine (FAHZU), a 5000-bed tertiary university hospital with a State Key Laboratory for Diagnosis and Treatment of Infectious Diseases located in southeastern China. Patients aged ≥ 18 years who required bronchoalveolar lavage fluid (BALF) sampling within 72 hours of hospitalization to establish the etiology were enrolled between February 27, 2021, and May 27, 2023 (Fig. 1). Patients were excluded if they had underlying leukemia, had no definitive diagnosis after extensive follow-up, or lacked matching DNA and RNA mNGS data from BALF samples (Fig. 1). A total of 123 lung cancer cases, 279 pulmonary infection cases (including tuberculosis, fungal, and bacterial infections), and 32 negative control cases (e.g., immune pneumonitis, organizing pneumonia and drug-related pneumonia) were included.
The diagnosis of lung cancer relied on clinical suspicion and positive laboratory results from cytology, flow cytometry and/or tissue biopsy. Pathological information for all samples was determined from surgically resected tissue sections according to the 2015 WHO Histological Classification of Lung Cancer 19. The diagnosis of pulmonary infections was based on clinical suspicion and determination of the causative pathogen through standard microbiological diagnostics (e.g., cultures, antigen/antibody tests, PCR, sequencing). Archival material at FAHZU was retrospectively analyzed under no-patient-contact protocols approved by the FAHZU Institutional Review Board (IIT20220714A). Written consent given prior to the procedure used to obtain the sample covered the use of residual samples for research. We constructed the training and validation cohorts according to sample collection date. We ranked all lung cancer samples by collection time and separated them in a 7:3 proportion (Fig. 1, Extended Data Table S1). We likewise ranked all pulmonary infection samples by collection time and separated them into the training and validation cohorts. In other words, the training and validation cohorts can be regarded as two independent cohorts collected during different time periods. Feature selection was performed in the training cohort and was blinded to the validation cohort.

DNA/RNA extraction, library construction and sequencing. For metagenomic sequencing (DNA sequencing), 1 mL of BALF sample was subjected to depletion of host nucleic acid using 1 U benzonase (Sigma) and 0.5% Tween 20 (Sigma) with incubation at 37°C for 5 min. A total of 600 µL of the mixture was transferred to new tubes containing 500 µL of ceramic beads for bead beating using a Minilys Personal TGrinder H24 Homogenizer (catalogue number: OSE-TH-01, Tiangen, China).
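For illustration, the chronological 7:3 split described in the study-design paragraph can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' code: the record fields (`group`, `collected`), the function names, and the rounding rule for the cut point are all hypothetical, and the exact cut point used in the study is not stated in the text.

```python
from datetime import date

def chronological_split(samples, train_fraction=0.7):
    """Sort samples by collection date; the earliest fraction forms the training set."""
    ordered = sorted(samples, key=lambda s: s["collected"])
    n_train = round(len(ordered) * train_fraction)  # assumption: simple rounding
    return ordered[:n_train], ordered[n_train:]

def split_by_group(samples, train_fraction=0.7):
    """Apply the chronological split separately within each disease group."""
    train, validation = [], []
    for group in sorted({s["group"] for s in samples}):
        members = [s for s in samples if s["group"] == group]
        group_train, group_validation = chronological_split(members, train_fraction)
        train.extend(group_train)
        validation.extend(group_validation)
    return train, validation

# Toy data (hypothetical): 10 "cancer" and 10 "infection" samples in shuffled date order.
toy = [
    {"id": i, "group": "cancer" if i % 2 else "infection", "collected": date(2021, 3, 28 - i)}
    for i in range(20)
]
train, validation = split_by_group(toy)
```

Splitting within each group keeps the class ratio similar in both cohorts while ensuring that every validation sample was collected no earlier than the training samples of the same group.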
Then, the nucleic acid from 400 µL of the pretreated sample was extracted and eluted in 60 µL of elution buffer using a QIAamp UCP Pathogen Mini Kit (catalogue number: 50214, Qiagen, Germany). The extracted DNA was quantified using a Qubit dsDNA HS Assay Kit (catalogue number: Q32854, Invitrogen, USA) 9. For metatranscriptome sequencing (RNA sequencing), 1 mL of BALF sample was centrifuged at 12,000 rpm for 10 min. Then, 200 µL of the precipitate was lysed in TRIzol LS (Thermo Fisher Scientific, Carlsbad, CA, USA), followed by RNA extraction using a Direct-zol RNA Miniprep kit (Zymo Research, Irvine, CA, USA) according to the manufacturer's instructions 20. Following the manufacturers' instructions, 30 µL of DNA was used to generate libraries with the Nextera DNA Flex kit (Illumina, San Diego, CA, USA), and 10 µL of purified RNA was used for cDNA generation and library preparation with an Ovation Trio RNA-Seq Library Preparation Kit (NuGEN, CA, USA). A Qubit dsDNA HS Assay Kit was used to measure the library concentration. Library quality was assessed with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) and a High Sensitivity DNA kit. Libraries were sequenced on an Illumina NextSeq 550 sequencer with a 75-cycle single-end sequencing strategy 20, 21.

Microbial composition analysis and bacteriophage annotation. As previously described, we used a validated mNGS pipeline for microbial composition analysis 9. In brief, Trimmomatic was used to remove low-quality reads, duplicate reads, reads shorter than 50 bp, and adapter contamination 22. Kcomplexity was used to remove low-complexity reads with default parameters. Human sequences were excluded by mapping to the human reference genome (hg38) using SNAP v1.0beta 8. Kraken2 v.2.0.7 and Bracken v.2.5 were used to create taxonomic profiles with default settings and the default database (https://benlangmead.github.io/aws-indexes/k2) 9, 23.
Sequencing reads for detected microbes were normalized as reads per million (RPM) to correct for differences in sequencing depth. The BALF mNGS data from 32 non-infection, non-cancer cases were used as negative controls (NC, Extended Data Table S2). Microorganisms found in NC samples and their relative abundances in the different patient groups (lung cancer, infection and NC) are shown in Extended Data Fig. 1. As in our previous work 9, we calculated the mean and standard deviation (SD) of each species' relative abundance in the NC samples and set the positive cutoff at mean + 3 SD in NC. If a microbe in the mNGS data of a patient with lung cancer or infection had an RPM higher than its NC cutoff, we defined it as positive and included it in the microbial count table. For bacteriophage annotation, the cleaned reads were aligned against a curated phage database (CPD) containing 26,159 representative phage genomes using BLAST (word size: 18, e-value: 0.0005, culling limit: 1) 24. Phage counting in mNGS data relied on relative abundances 25.

Host gene expression, transposable elements expression, cell-type composition analysis. For the analysis of host gene expression, high-quality reads were aligned to the human genome (hg38) using HISAT2 with default parameters. Gene-level quantification was performed using featureCounts 26, 27, and gene counts were aggregated with the featureCounts program from the Subread package, release 2.0.0 (http://subread.sourceforge.net/) 21. Additionally, trimmed clean reads were mapped using STAR with previously defined parameters 28. TEtranscripts was used to estimate the abundances of transposable elements (TE) and to conduct differential expression analysis. The GTF file containing transposable element annotations was obtained from https://hammelllab.labsites.cshl.edu/software/#TEtranscripts .
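As an illustration of the negative-control filtering described earlier in this section (RPM normalization, then a per-species cutoff at mean + 3 SD across the NC samples), a minimal Python sketch might look like the following. The function names and the dictionary layout are hypothetical. Two choices are assumptions not stated in the text: the use of the sample standard deviation, and a cutoff of 0 for species never seen in any NC sample.

```python
from statistics import mean, stdev

def rpm(read_count, total_reads):
    """Reads per million: scale a taxon's read count by sequencing depth."""
    return read_count / total_reads * 1_000_000

def nc_cutoffs(nc_profiles):
    """Per-species cutoff = mean + 3 * SD of RPM across negative-control samples.

    nc_profiles: list of dicts mapping species name -> RPM, one dict per NC sample
    (at least two samples are needed for a standard deviation).
    """
    species = set().union(*nc_profiles)
    cutoffs = {}
    for sp in species:
        values = [profile.get(sp, 0.0) for profile in nc_profiles]  # absent = 0 RPM
        cutoffs[sp] = mean(values) + 3 * stdev(values)  # assumption: sample SD
    return cutoffs

def positive_species(sample_profile, cutoffs):
    """Species whose RPM in a patient sample exceeds the NC-derived cutoff."""
    return {
        sp for sp, value in sample_profile.items()
        if value > cutoffs.get(sp, 0.0)  # assumption: unseen in NC -> cutoff 0
    }

# Toy negative controls (hypothetical values).
nc = [{"S. oralis": 1.0}, {"S. oralis": 3.0}, {"S. oralis": 2.0}]
cutoffs = nc_cutoffs(nc)
calls = positive_species({"S. oralis": 6.0, "M. tuberculosis": 0.5}, cutoffs)
```

In this toy case the cutoff for "S. oralis" is 2 + 3 × 1 = 5 RPM, so a patient value of 6 RPM is called positive while 4 RPM would not be.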
Differentially expressed genes (DEGs) and TEs were identified in each group using the DESeq2 package, applying criteria of FDR ≤ 0.05 and fold-change ≥ 1.5 29. Gene set enrichment analysis (GSEA) for DEGs was carried out using the REACTOME, KEGG, and GO databases 30–32. Significantly enriched pathways or biological processes were determined based on Fisher's exact test (p-value < 0.05) following Benjamini-Hochberg adjustment. The list of immune-related genes (IRGs) was obtained from the ImmPort database (https://www.immport.org/home), while interferon-stimulated genes (ISGs) were retrieved from the referenced study 33. To estimate the relative proportions of infiltrating immune cell types, the CIBERSORT algorithm was applied with the original gene signature file LM22 and 1000 permutations 34. Latent variables were calculated with the PLIER R package 35.

Copy number-derived tumor fractions calling. The DNA metagenomic sequencing data were used in downstream analyses to identify CNVs and to generate ctDNA tumor fractions with the ichorCNA 36, CNVkit, and estimate software packages, as previously described and validated in tumor tissue and body fluid 37, 38. The ichorCNA ploidy parameter restart value was set to 2, and the maximum copy number was lowered to 3. The tumor fraction with the highest log-likelihood was retrieved and reported. The Wilcoxon rank-sum test was used to assess the difference between each group's probability value.

Host/microbe multi-dimension diagnosis modelling. LEfSe and DESeq2 were used to investigate the association between disease types and the microbial and host features quantified from RNA and DNA data separately 39.
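The adjusted-p-value and fold-change thresholds used for feature screening (FDR ≤ 0.05, fold-change ≥ 1.5) can be illustrated with a small, self-contained Python sketch of the Benjamini-Hochberg procedure. This is not the DESeq2 implementation, which fits its own statistical model before adjustment; the function names are hypothetical, and treating the fold-change criterion as symmetric (up- or down-regulated by at least 1.5-fold) is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_features(names, pvalues, fold_changes, fdr=0.05, min_fc=1.5):
    """Keep features passing both the FDR and the fold-change threshold."""
    padj = benjamini_hochberg(pvalues)
    return [
        name
        for name, q, fc in zip(names, padj, fold_changes)
        # assumption: fold-change criterion applied in both directions
        if q <= fdr and (fc >= min_fc or fc <= 1 / min_fc)
    ]

# Toy input (hypothetical feature names and values).
padj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
kept = select_features(["a", "b", "c", "d"],
                       [0.01, 0.04, 0.03, 0.005],
                       [2.0, 1.2, 0.5, 1.6])
```

Here feature "b" passes the FDR threshold but is dropped because its fold-change of 1.2 is below 1.5 in either direction.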
We first performed univariate screening tests to identify significant associations between disease types and, respectively, microbial DNA/RNA relative abundances, transcripts per million (TPM) of host gene expression, TPM of transposable elements, and tumor fraction/CNV of each chromosome. Within each data type, the adjusted P value cut-off was set to 0.05, and features with an adjusted P value below the cut-off were selected and integrated as a sub-community. The variables entered into the selection process included microbial-ecological indices and species whose relative abundances differed significantly between pulmonary diseases. The R package "mlr3" was used to build the machine learning models. Variables were then entered into logistic regression models, and statistically significant variables were subsequently used to construct a prediction model. Ten-fold cross-validation (CV) was used to estimate the 95% confidence interval of the AUC. Model performance was assessed using the AUC together with accuracy, sensitivity and specificity. The DeLong test was used to assess the significance of the difference between two ROC curves 40.

Statistics and reproducibility. The key features of the microbial composition, including the Shannon index, Simpson index, Chao1 index, and ACE index, were computed using the vegan package in R after sequence processing 41. Permutational multivariate ANOVA (PERMANOVA) was conducted using the "vegan" package to determine differences in sample β-diversity (measured by Bray-Curtis distance). Principal coordinates analysis (PCoA) and the envfit function were used to identify the species involved in microbial variation. Continuous nonparametric data were compared using the Mann-Whitney-Wilcoxon test. Categorical data were compared using the chi-square test or Fisher's exact test.
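To make the diversity measures named above concrete, the following Python sketch computes the Shannon index, the Simpson index, and the Bray-Curtis dissimilarity from raw count vectors. It is an illustrative re-implementation rather than the vegan code used in the study; the natural logarithm for Shannon and the 1 − Σp² form of Simpson are assumed conventions, and the function names are hypothetical.

```python
import math

def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson(counts):
    """Simpson diversity in the 1 - sum(p^2) form (assumed convention)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator

# Toy communities (hypothetical counts over the same two taxa).
even_community = [10, 10]
h = shannon(even_community)      # equals ln(2) for two equally abundant taxa
d = simpson(even_community)      # equals 0.5
distance = bray_curtis([1, 0], [0, 1])  # no shared taxa -> 1.0
```

α-diversity indices such as Shannon and Simpson summarize a single sample, whereas Bray-Curtis compares pairs of samples and is the input to the PERMANOVA and PCoA analyses.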
Correlation coefficients between species and clinical factors and between the microbiota and host genes were calculated by Spearman's rank correlation analysis using the Hmisc package. All data analyses were performed in RStudio running R version 4.1.0. P values less than 0.05 were considered statistically significant.

Results

Clinical features of study cohort

Based on the established criteria (Methods section, Fig. 1), we enrolled a total of 402 patients, consisting of 123 cancer patients (Cancer group) and 279 patients with pulmonary infections (Infection group). According to etiological findings, the infection group was further subdivided into three subgroups: pulmonary tuberculosis (TB group, n = 86), fungal infection (Fungal group, n = 79), and bacterial infection (Bacterial group, n = 114). Most patients, regardless of their subgroup, exhibited similar clinical and imaging characteristics, such as race (all were Chinese), underlying medical conditions, white blood cell (WBC) count, inflammatory indicators such as procalcitonin (PCT) and C-reactive protein (CRP), and chest computed tomography (CT) findings (e.g., patchy shadows and nodules, cavities, mediastinal lymphadenopathy) (Table 1). The median mNGS DNA data per patient was 21.9 million reads (IQR 18.0–27.6 M), with the vast majority of reads (> 95%) being human. The median mNGS RNA data per patient was 19.1 million reads (IQR 13.8–26.2 M).
Table 1 Demographic and clinical characteristics of the enrolled patients

| Characteristics | Overall | Lung Cancer | Bacterial Infection | Fungal Infection | Tuberculosis | p-value |
|---|---|---|---|---|---|---|
| Patient demographics | | | | | | |
| Total number, n | 402 | 123 | 114 | 79 | 86 | |
| Age (median [IQR]) | 59.50 [50.00, 67.50] | 58.00 [51.00, 69.50] | 57.00 [50.00, 66.00] | 57.00 [46.00, 69.00] | 57.50 [35.00, 67.75] | 0.114 |
| Sex = Male, n (%) | 255 (63.4) | 86 (69.9) | 60 (52.6) | 50 (63.3) | 59 (68.6) | 0.086 |
| Underlying conditions, n (%) | | | | | | |
| Cardiovascular disease | 69 (17.2) | 24 (19.5) | 18 (15.8) | 17 (21.5) | 10 (11.6) | 0.309 |
| Immunological disease | 22 (5.5) | 4 (3.3) | 7 (6.1) | 9 (11.4) | 2 (2.3) | 0.057 |
| Liver insufficiency | 34 (8.5) | 9 (7.3) | 9 (7.9) | 11 (13.9) | 5 (5.8) | 0.293 |
| Renal insufficiency | 63 (15.7) | 17 (13.8) | 18 (15.8) | 15 (19.0) | 13 (15.1) | 0.507 |
| COPD | 126 (31.3) | 33 (26.8) | 39 (34.2) | 27 (34.2) | 27 (31.4) | 0.941 |
| Central nervous system disorder | 21 (5.2) | 7 (5.7) | 8 (7.0) | 5 (6.3) | 1 (1.2) | 0.22 |
| HIV | 3 (0.7) | 0 (0.0) | 0 (0.0) | 2 (2.5) | 1 (1.2) | 0.064 |
| Hypertension | 99 (24.6) | 40 (32.5) | 27 (23.7) | 19 (24.1) | 13 (15.1) | 0.038 |
| Diabetes | 55 (13.7) | 17 (13.8) | 19 (16.7) | 8 (10.1) | 11 (12.8) | 0.643 |
| Laboratory testing, median [IQR] | | | | | | |
| WBC (10×10^9/L) | 6.50 [5.05, 9.33] | 6.42 [4.95, 9.23] | 7.35 [5.32, 10.31] | 6.58 [4.65, 9.20] | 6.08 [5.07, 7.86] | 0.102 |
| NEUT (%) | 70.30 [61.82, 80.25] | 70.80 [63.35, 81.40] | 71.30 [61.95, 80.67] | 74.50 [60.50, 86.65] | 67.50 [60.20, 73.05] | 0.013 |
| CRP (mg/L) | 17.03 [3.30, 56.53] | 22.27 [4.73, 71.02] | 16.44 [3.30, 59.38] | 10.30 [3.21, 52.62] | 11.20 [3.55, 41.72] | 0.136 |
| PCT (ng/mL) | 0.09 [0.04, 0.36] | 0.10 [0.04, 0.37] | 0.11 [0.05, 0.48] | 0.19 [0.04, 0.55] | 0.05 [0.05, 0.12] | 0.042 |
| Chest CT imaging features, n (%) | | | | | | |
| Pulmonary emphysema | 90 (22.4) | 39 (31.7) | 22 (19.3) | 17 (21.5) | 12 (14.0) | 0.018 |
| Pulmonary nodule | 136 (33.8) | 52 (42.3) | 22 (19.3) | 32 (40.5) | 30 (34.9) | 0.001 |
| Pulmonary cavity | 50 (12.4) | 10 (8.1) | 14 (12.3) | 8 (10.1) | 18 (20.9) | 0.055 |
| Ground-glass shadow | 58 (14.4) | 23 (18.7) | 11 (9.6) | 15 (19.0) | 9 (10.5) | 0.095 |
| Multiple patchy solid shadows | 284 (70.6) | 82 (66.7) | 80 (70.2) | 53 (67.1) | 69 (80.2) | 0.078 |
| Malignant pleural effusion | 144 (35.8) | 50 (40.7) | 37 (32.5) | 29 (36.7) | 28 (32.6) | 0.533 |
| Pleural thickening | 62 (15.4) | 28 (22.8) | 13 (11.4) | 11 (13.9) | 10 (11.6) | 0.069 |
| Mediastinal lymphadenopathy | 146 (36.3) | 58 (47.2) | 35 (30.7) | 21 (26.6) | 32 (37.2) | 0.012 |

We compared and screened the differential features within the mNGS data of the cancer and infection groups in the training cohort to establish a differential diagnosis approach for lung cancer and pulmonary infections. Subsequently, the cancer group was compared separately to the tuberculosis, fungal, and bacterial groups to develop a diagnostic method capable of rapidly distinguishing lung cancer from infections caused by different pathogens.

Microbial community structure of different pulmonary diseases

Microbial communities were assessed in a total of 284 samples within the training cohort, comprising 87 cases of lung cancer and 197 cases of pulmonary infection. Analysis of microbial community complexity, as gauged by Richness (observed species number), ACE, Chao1, Shannon diversity index, Simpson diversity index, and Evenness index, revealed no significant differences in α-diversity among the BALF samples. This held true for comparisons between the cancer and infection groups, as well as among the different infection subgroups (Extended Data Fig. 2A, C, Mann-Whitney-U test, p-value > 0.05). However, β-diversity analysis based on the Bray-Curtis distance indicated that the microbial composition of BALF samples from the cancer group was distinct from that of the infection group as a whole and of each infection subgroup (Fig. 2A, B, PERMANOVA, P < 0.01). To identify microorganisms specific to the different pulmonary diseases, we performed LEfSe analysis. The findings revealed a higher prevalence of S. oralis, S. mitis, V. parvula, P. gingivalis, and C. orthopsilosis, which are often regarded as oral or airway commensals, in lung cancer compared to pulmonary infection (Fig. 2C, LDA score > 2, adjusted p-value < 0.05).
Conversely, pathogenic microorganisms commonly linked with infections, such as M. tuberculosis, A. oryzae, A. fumigatus, and C. neoformans, were more frequently detected in pulmonary infection. These distinct microbial profiles could potentially serve as valuable indicators for diagnosing pulmonary diseases. For the RNA data, both α-diversity (Extended Data Fig. 2B, D, Mann-Whitney-U test, p-value < 0.05) and β-diversity (Extended Data Fig. 3A, B, PERMANOVA, p-value < 0.01) analyses of the microbiome supported distinct microbial community features in the lower airways among pulmonary diseases. Interestingly, we found that RNA relative abundances of some pathogenic microorganisms, such as M. tuberculosis, were significantly higher in pulmonary infection or its subgroups than in lung cancer (Extended Data Fig. 3C, Mann-Whitney-U test, p-value < 0.01).

Difference in host immune response, transposable elements, and immune cell abundance of different pulmonary diseases

To discern host immune responses between lung cancer and infection, we conducted BALF host gene expression analyses, which revealed substantial variation among groups, as depicted in volcano plots (Extended Data Fig. 4A-D). GSEA highlighted significant enrichment of differentially expressed genes (DEGs) in immune pathways such as T-cell receptor signaling and cytokine-cytokine receptor signaling (Fig. 3A). Applying PLIER to the training dataset, we delineated host transcriptomic profiles across 545 canonical pathways, identifying multiple differentially expressed latent variables (LVs) with distinct biological functions across groups (Fig. 3B, Mann-Whitney-U test, adjusted p-value < 0.05). Specifically, in the cancer group, lower airway transcriptomes exhibited upregulation of the cell cycle (LV102 and LV107), while LV165, annotated to cytokine-cytokine receptor interaction pathways, was upregulated, in contrast to LV86 from the same pathways (Fig.
3B). Furthermore, we observed upregulation of interferon signaling and the innate immune system in the infection groups (Fig. 3D, E), notably driven by pulmonary tuberculosis, which exhibited the well-established upregulation of interferon signaling. For further exploration, we selected differentially expressed immune-related genes (IRGs) from the ImmPort database and interferon-stimulated genes (ISGs) from prior research 33, 42. Notably, the TB-associated markers GBP1 and GBP5 were elevated in the TB group (Extended Data Fig. 5A, adjusted p-value < 0.01). Four genes were notably upregulated in the cancer group, and all four encode chemokines: C-C motif chemokine ligand 7 (CCL7), C-C motif chemokine ligand 8 (CCL8), C-C motif chemokine ligand 13 (CCL13) and pro-platelet basic protein (PPBP), also known as CXCL7 (Extended Data Fig. 5A, indicated by red triangles, adjusted p-value < 0.01). Studies suggest that CCL7, which is highly expressed in tumor tissues, recruits cDC1 cells, aiding antitumor immunity and checkpoint immunotherapy. Additionally, CCL7, CCL8, and CCL13 are linked to tumor-associated macrophages (M2) 43–45. We identified 27 transposable elements across the lung cancer and three infection groups (Extended Data Fig. 4E-H, Extended Data Fig. 5B), notably finding significantly higher LTR-ERV (LTR6A and HUERS-P3-int) levels in lung cancer (Extended Data Fig. 5B, adjusted p-value = 0.019). To investigate variations in immune cell abundance across groups, we estimated cell-type levels in host transcriptomes using computational quantification methods, including a deconvolution approach implemented in CIBERSORTx. M1 macrophages were significantly elevated in pulmonary tuberculosis (Fig. 4B, Mann-Whitney-U test, adjusted p-value < 0.05), whereas M2 macrophage levels were higher in fungal infection, pulmonary tuberculosis and lung cancer (Fig.
4C, Mann-Whitney-U test, adjusted p-value < 0.01). Neutrophils were enriched in bacterial infection compared with lung cancer (Fig. 4D, Mann-Whitney-U test, p-value < 0.01). Furthermore, we observed notably higher monocyte levels in fungal infection (Fig. 4E, Mann-Whitney-U test, adjusted p-value < 0.01).

Copy number variants and CNV-derived tumor fraction of different pulmonary diseases

To improve CNV and tumor fraction estimation in BALF mNGS data, we used three distinct software tools. CNVkit revealed slight increases in CNV counts on chromosome 11 (cancer group) and chromosome 3 (infection group) (Extended Data Fig. 6A, p-value < 0.05). Higher CNV percentages on chromosome 3 were noted in the infection group (Extended Data Fig. 6B, p-value 0.05). Subsequently, ichorCNA estimated tumor fractions at 5.96% (lung cancer, 95% CI 4.15%-7.77%) and 6.29% (pulmonary infection, 95% CI 0.54%-12.04%) (Extended Data Fig. 7A). Notably, no significant differences in tumor fractions were observed between lung cancer and the three infection subgroups (Extended Data Fig. 7B). Scores calculated with the 'estimate' software (Stromal, Immune, ESTIMATE, Tumor Purity) showed no differences between the cancer group and any of the infection groups (Extended Data Fig. 7A, B). This suggests that, unlike Cancer-Negative (Benign) samples 16, BALF samples from infection patients display levels of copy number variation comparable to those seen in cancer patients.

Host/microbe metagenomics-based modelling for lung cancer and pulmonary infection diagnosis

We initially used a range of combined metrics from the training set, including microbial and bacteriophage DNA/RNA abundances, host gene expression, immune cell composition, transposable element expression, and CNV-derived tumor fraction, to identify the most effective machine learning classifier among 10 options. The Random Forest classifier emerged as the optimal choice (Extended Data Fig. 8).
Subsequently, within this classifier, we built six preset diagnostic models (Model I-VI) using various biological features or their combinations from the mNGS dataset (Fig. 5A). These models were individually trained and evaluated to identify the most effective one for distinguishing lung cancer from infection in both general and subgroup comparisons (Fig. 5A). The results showed that Model VI, incorporating differential features from microbial and bacteriophage DNA/RNA abundances, host gene expression, immune cell composition, transposable elements, and CNV-derived tumor fraction, exhibited the highest discriminatory capability in both general and subgroup comparisons. Specifically, for the general comparison, Model VI demonstrated an AUC of 0.87 (95% CI = 0.857–0.883) with 73.8% sensitivity and 84.5% specificity in the training cohort. In the validation cohort, it achieved an AUC of 0.831 (95% CI = 0.819–0.843) with 67.1% sensitivity and 94.4% specificity, effectively distinguishing lung cancer from pulmonary infections (Fig. 5C, Extended Data Table S3). The highlighted host transcriptome features in Model VI included genes involved in the cell cycle and cytokine-cytokine receptor pathways, such as ULBP1, BG3GAT1, and CCL13 (Extended Data Table S4; Fig. 5G, H, and I, Mann-Whitney-U test, p-value < 0.05). Notably, CCL13, a downstream gene of EGFR, serves as a typical LUAD biomarker, while ULBP1 and BG3GAT1 are genes regulated by CCL13 for cDC modulation. In the subgroup comparisons, Model VI also performed well. For instance, in distinguishing lung cancer from bacterial infection, it attained an AUC of 0.849, with 67.6% sensitivity and 91.7% specificity in the validation cohort. Similarly, in discerning lung cancer from fungal infection, Model VI displayed an AUC of 0.811, sensitivity of 82.6%, and specificity of 77.7%.
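For readers less familiar with the metrics reported here, the sketch below shows how an AUC, sensitivity, and specificity can be computed from classifier scores in plain Python. It is a generic illustration, not the mlr3 code used in the study: the AUC is computed as the probability that a randomly chosen positive (cancer) sample scores higher than a randomly chosen negative (infection) sample, and the scores, threshold, and function names are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def sensitivity_specificity(scores_pos, scores_neg, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    true_pos = sum(s >= threshold for s in scores_pos)
    true_neg = sum(s < threshold for s in scores_neg)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)

# Toy scores (hypothetical): predicted probability of lung cancer.
cancer_scores = [0.9, 0.8, 0.4]
infection_scores = [0.7, 0.3, 0.2]
auc = roc_auc(cancer_scores, infection_scores)
sens, spec = sensitivity_specificity(cancer_scores, infection_scores, threshold=0.5)
```

The AUC is threshold-free, while sensitivity and specificity depend on the chosen cutoff, which is why the same model can show different sensitivity/specificity trade-offs in the training and validation cohorts.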
Furthermore, when differentiating lung cancer from pulmonary tuberculosis, Model VI achieved an AUC of 0.882, sensitivity of 84.0%, and specificity of 77.7% (Fig. 5B, D, E, and F, Extended Data Table 1). Noteworthy observations included higher levels of MAS1, which is associated with apoptosis and tissue injury, in bacterial infection, increased AP1AR levels, which correlate with TLR4, in pulmonary tuberculosis, and elevated ZPLD2P levels in fungal infection compared to lung cancer (Fig. 5K, J, and L, Mann-Whitney-U test, p-value < 0.01).

A composite predictive model for lung cancer and infection diagnosis

Using a rule-in and rule-out strategy, we developed a composite predictive model that combines the Model VI used for the general comparison with the Model VI used for one of the subgroup comparisons, aiming to enhance the diagnostic accuracy for lung cancer and infections. In this strategy, if both the general-comparison Model VI and the subgroup-comparison Model VI classified a patient as lung cancer, we assigned the patient to the rule-in band (i.e., positive for a cancer diagnosis). If both models classified a patient as infection, we assigned the patient to the rule-out band (i.e., positive for an infection diagnosis) (Fig. 5A). The validation cohort from each subgroup comparison was used to evaluate the performance of the composite predictive model. In the lung cancer versus bacterial infection comparison, a total of 61 patients were categorized, with 41 identified as rule-in and 20 as rule-out (Fig. 6A, Table 2). Similarly, in the lung cancer versus fungal infection comparison, 44 patients were classified, comprising 28 rule-in and 16 rule-out cases (Fig. 6B, Table 2). In the lung cancer versus pulmonary tuberculosis comparison, 46 patients were allocated, consisting of 30 rule-in and 16 rule-out cases (Fig. 6C, Table 2).

Table 2 Test statistics for combination strategy.

| Comparison | Band | Cancer | Infection | LR* | Specificity# | Sensitivity+ |
|---|---|---|---|---|---|---|
| Lung Cancer vs. Bacterial Infection | Rule-Out | 2 | 18 | 0.11 | 0.9 | - |
| Lung Cancer vs. Bacterial Infection | Rule-In | 33 | 8 | 4.13 | - | 0.805 |
| Lung Cancer vs. Fungal Infection | Rule-Out | 1 | 15 | 0.07 | 0.938 | - |
| Lung Cancer vs. Fungal Infection | Rule-In | 27 | 1 | 27 | - | 0.964 |
| Lung Cancer vs. Pulmonary Tuberculosis | Rule-Out | 0 | 16 | 0 | 1 | - |
| Lung Cancer vs. Pulmonary Tuberculosis | Rule-In | 26 | 4 | 6.5 | - | 0.867 |

LR*: Likelihood ratio, which serves as an indicator of cancer risk. A higher LR signifies a stronger correlation with lung cancer. For example, within the rule-out band for lung cancer versus bacterial infection, there were 18 patients with infection and 2 patients with cancer, giving LR = 2/18 = 0.11.
Specificity#: the accuracy of the rule-out band in correctly identifying infected patients. It is calculated as the number of infected patients in the rule-out band divided by the total number of patients in the rule-out band (infected patients plus cancer patients mistakenly assigned to it); for example, 18/(18 + 2) = 0.9.
Sensitivity+: the accuracy of the rule-in band in correctly identifying cancer patients. It is calculated as the number of cancer patients in the rule-in band divided by the total number of patients in the rule-in band (cancer patients plus infected patients mistakenly assigned to it); for example, 33/(33 + 8) = 0.805.

These results show that employing this strategy significantly enhanced the diagnostic accuracy (ACC) in distinguishing between lung cancer and bacterial infection, raising it from 0.800 (56/70) to 0.836 (51/61) (Fig. 6A). This enhancement was accompanied by a sensitivity of 80.5%, reflecting the rule-in band's accuracy in correctly identifying individuals with cancer, and a specificity of 90.0%, reflecting the rule-out band's accuracy in correctly identifying patients with an infection (Table 2).
Similarly, there was a significant enhancement in ACC, rising from 0.797 (47/59) to 0.955 (42/44), alongside a specificity of 93.8% and a sensitivity of 96.4%, for distinguishing lung cancer from fungal infection (Fig. 6 B, Table 2). Of note, this method yielded 100.0% specificity and 86.7% sensitivity in distinguishing lung cancer from pulmonary tuberculosis (ACC = 0.913, 42/46) (Fig. 6 C, Table 2). Accordingly, this integrated predictive approach provides a highly accurate strategy for using the complex data generated by mNGS to distinguish various pulmonary diseases in a clinically viable manner.

Discussion

In diagnostics, BALF-based mNGS testing has emerged as a rapid assay to pinpoint pulmonary infection pathogens 12, 46–48. Although over 90% of mNGS reads are of human origin and are often disregarded as "noise", recent research suggests that these sequences may harbor valuable biomarkers linked to the host's disease state 15, 49. Our study pioneers a comprehensive host/microbe metagenomics approach that uses BALF mNGS data for diagnosing lung cancer and pulmonary infections. This methodology distinguishes lung cancer from diverse pulmonary infections (including pulmonary tuberculosis, fungal infection, and bacterial infection) with high accuracy, extending the clinical applicability of BALF mNGS testing. While BALF samples are inherently more heterogeneous than whole blood or tissue specimens 18, 50, 51, our analytical model demonstrates significant robustness. Specifically tailored for distinguishing lung cancer from pulmonary infection, our Model VI achieved an AUC of 0.831 (95% CI = 0.819–0.843) in the validation cohort. This cohort encompassed a spectrum of complex pulmonary infections, including bacterial, fungal, and tuberculosis infections, each characterized by substantial variations in host immune responses, pathogen profiles, and microbiota compositions.
Impressively, our model's performance is comparable to that of the uniform multi-omics models used in other studies for different sample types. For instance, the IMX-BVN model achieved an AUC of 0.86 (95% CI 0.77–0.93) for differentiating acute bacterial infections from others and an AUC of 0.85 (95% CI 0.76–0.93) for distinguishing acute viral infections 52. Whole blood transcriptomics showed an AUC of 0.82 for discerning sepsis from non-sepsis states, while plasma cell-free RNA transcriptomics reached an AUC of 0.77 14. These studies indirectly suggest that our model has similar efficacy in managing complex pulmonary conditions. Moreover, in the validation cohort differentiating lung cancer from pulmonary tuberculosis, the AUC rose to 0.882 (95% CI = 0.875–0.891), showcasing the advantage of integrating multi-omics into Model VI.

Identifying patients with lung cancer or pulmonary infections remains a crucial clinical challenge in many medical settings. The decision to administer empirical antibiotics often relies on an educated guess. If, after first confirming an infection with Model VI, we could further refine the diagnosis to a specific infection subgroup (such as bacterial, fungal, or tuberculosis), this could help clinicians use antibiotic therapies more accurately. Hence, we further developed a more rigorous integrated predictive model based on predefined rule-in and rule-out strategies, enhancing the differentiation accuracy between lung cancer and infection subgroups. The result showed improved accuracy in distinguishing lung cancer from pulmonary tuberculosis (ACC = 0.913), fungal infection (ACC = 0.955), and bacterial infection (ACC = 0.836). Such diagnostic approaches promise more precise clinical diagnoses, thereby yielding greater benefits for patients.
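The rule-in and rule-out logic and the band statistics reported in Table 2 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function and class names (`assign_band`, `Band`, `band_stats`) and the "indeterminate" label for discordant calls are assumptions introduced here. The text does not name the discordant case, although the drop from 70 to 61 evaluated patients in the bacterial comparison suggests such patients were left unclassified. The counts are copied from Table 2.

```python
from dataclasses import dataclass

CANCER, INFECTION = "cancer", "infection"


def assign_band(general_call: str, subgroup_call: str) -> str:
    """Combine the general and subgroup Model VI calls into a band.

    'rule-in'       : both models call lung cancer
    'rule-out'      : both models call infection
    'indeterminate' : the calls disagree (hypothetical label, not from the paper)
    """
    if general_call == CANCER and subgroup_call == CANCER:
        return "rule-in"
    if general_call == INFECTION and subgroup_call == INFECTION:
        return "rule-out"
    return "indeterminate"


@dataclass
class Band:
    cancer: int     # patients in the band whose final diagnosis is lung cancer
    infection: int  # patients in the band whose final diagnosis is infection

    @property
    def total(self) -> int:
        return self.cancer + self.infection

    @property
    def likelihood_ratio(self) -> float:
        # LR as used in Table 2: cancer count divided by infection count.
        return self.cancer / self.infection if self.infection else float("inf")


def band_stats(rule_in: Band, rule_out: Band) -> dict:
    """Band-level statistics computed the way the Table 2 values work out."""
    return {
        # Share of rule-in patients who truly have cancer.
        "sensitivity": rule_in.cancer / rule_in.total,
        # Share of rule-out patients who truly have an infection.
        "specificity": rule_out.infection / rule_out.total,
        # Correctly banded patients over all banded patients.
        "accuracy": (rule_in.cancer + rule_out.infection)
        / (rule_in.total + rule_out.total),
    }


# Counts copied from Table 2, lung cancer vs. bacterial infection.
bacterial_in = Band(cancer=33, infection=8)
bacterial_out = Band(cancer=2, infection=18)
bacterial = band_stats(bacterial_in, bacterial_out)
```

With the bacterial-infection counts, this gives 33/41 for sensitivity, 18/20 for specificity, and 51/61 for accuracy, matching the reported 0.805, 0.9, and 0.836. Note that these band-level figures are computed within each band, so they behave like predictive values rather than conventional cohort-wide sensitivity and specificity.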
Our study tested an integrated host-microbe mNGS diagnostic approach, examining microbial (including bacteriophage) DNA/RNA abundance, host gene expression, transposable elements, immune cell composition, and copy-number variant (CNV)-derived tumor fraction. Prior research used only one or a few features independently to aid diagnosis, such as lung cancer microbiomes 53. Previous 16S rRNA sequencing revealed higher Firmicutes and TM7 presence in lung cancer versus healthy controls 54. Veillonella and Megasphaera showed promise as lung cancer biomarkers (AUC: 0.888), indicating distinctive bacterial profiles in lung cancer versus benign conditions 54. Our data detected subtle microbial differences between lung cancer and pulmonary infections and among infection subgroups. Veillonella parvula was notably increased in lung cancer compared to bacterial/fungal infection (Fig. 2 C, LDA score > 2, adjusted p-value < 0.05). Yet, the microbiome alone had limited predictive power for distinguishing lung cancer from pulmonary infection (AUC = 0.518 in the validation cohort). Extracting more distinctive biological information from sequencing data is therefore crucial for differentiating lung cancer from pulmonary infections.

We believe that host immune dysregulation disrupts the composition of respiratory microbiota. Previous literature has underscored significant changes in the dynamic equilibrium between host and microbiome in conditions such as lung cancer and infections 55, 56. In this study, we independently compared the contributions of microbial/bacteriophage relative abundances (Model I and Model II), host gene expression and immune cell composition (Model III), TE expression levels (Model IV), and CNV-derived tumor fraction (Model V) for distinguishing lung cancer from infections. The results indicate that the host immune response (Model III) reflects the most prominent differences in pulmonary disease status compared to the other categories (Figs. 5 D-F, DeLong's ROC test, p-value < 0.05).
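The idea of ranking feature sets by AUC, as in the comparison of Models I to V above, can be illustrated with a small pure-Python sketch. It computes AUC as the probability that a cancer case receives a higher score than an infection case (the Mann-Whitney U statistic scaled by the number of pairs). It is not the study's classifier and it does not implement DeLong's test. The scores and the feature-set names below are made up for illustration.

```python
from typing import Dict, Sequence, Tuple


def auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as half a win, which equals the Mann-Whitney U statistic
    divided by len(pos) * len(neg).
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up scores for illustration only; they are NOT data from the study.
# Each hypothetical feature set maps to (cancer scores, infection scores).
toy_scores: Dict[str, Tuple[Sequence[float], Sequence[float]]] = {
    "feature_set_a": ([0.6, 0.5, 0.4], [0.55, 0.45, 0.5]),
    "feature_set_b": ([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]),
}

# AUC per feature set, then the name of the best-separating one.
aucs = {name: auc(pos, neg) for name, (pos, neg) in toy_scores.items()}
best = max(aucs, key=aucs.get)
```

An AUC near 0.5, like the 0.518 reported for the microbiome alone, means the scores barely separate the two groups. Deciding whether two AUCs differ significantly needs a paired test such as DeLong's, which this sketch leaves out.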
Despite the limited cellular content in certain BALF samples, we successfully retrieved a robust human gene expression dataset. These data revealed distinct immune responses across various pulmonary diseases. Analysis using PLIER revealed significant differences in latent variables associated with cell cycle, interferon, and cytokine pathways among these diseases. Notably, our findings highlighted genes involved in cell cycle regulation that concurrently influence PI3K-Akt signaling, p53 signaling, and lung cancer pathways, under the regulation of EGFR 57. Additionally, we identified the GBP5 gene, known for its high diagnostic relevance in active tuberculosis 58, and observed elevated expression of interferon signaling pathways in the pulmonary tuberculosis group compared to the other groups (Figure S4A). This further supports the reliability of our findings regarding the host immune response.

Our top three classifier genes for lung cancer and pulmonary infection were B3GAT1, ULBP1, and CCL13. Interestingly, these genes have not previously been linked to host gene expression signatures in bodily fluids related to lung cancer. Specifically, ULBP1, a ligand for the NKG2D receptor, activates NK cells in lung cancer, fostering NK cell-mediated tumor surveillance and cytotoxicity 59. Expression of ULBP1-6, particularly in squamous-cell carcinoma, correlates with clinical outcomes in NSCLC patients, suggesting a predictive value for clinical prognosis 60. Conversely, CCL13, a ligand for CCR2, contributes to cancer-related processes such as metastasis and immunosuppression. CCR2 expression in M2 macrophages is integral to the bidirectional communication between these macrophages and cancer cells, driving lung cancer progression 61. Additionally, CCL13 derived from M2 tumor-associated macrophages promotes oral cancer metastasis by inducing inflammatory cytokines 45.
Finally, B3GAT1 (beta-1,3-glucuronyltransferase 1) is relevant to cancer, particularly in tumor cell motility and the biosynthesis of specific carbohydrate epitopes. Its role in canonical integrin signaling pathways influences tumor cell motility, while its involvement in HNK-1 carbohydrate epitope biosynthesis bears relevance to neurodevelopment and cancer-related processes 62.

In this study, we investigated for the first time the expression levels of transposable elements in BALF samples from pulmonary diseases. HERVK11D showed higher expression in lung cancer than in fungal infection and tuberculosis. Similarly, ERVK-MER11B was more highly expressed in bacterial infection than in tuberculosis (Figure S4, adjusted p-value < 0.05). The heightened expression of HERV-K, linked to basal-like and triple-negative breast cancer progression, illustrates altered gene expression driving cancer advancement. HERV-derived long non-coding RNAs also promote cancer progression, signaling significant gene profile shifts in these cancers 63, 64. Additionally, two ERV1 elements were notably higher in lung cancer than in all pulmonary infections (Figure S4, adjusted p-value < 0.05). These findings underscore the importance of repetitive sequences in human health, exemplified by severe COVID-19 pneumonia, which triggers intense inflammatory responses and HERV dysregulation in BALF samples. For example, HERV-FRD, notably upregulated in COVID-19 BALF, suggests HERVs as potential disease progression biomarkers linked to increased severity in aging 65. Surprisingly, we found for the first time that certain transposable elements were more highly expressed in BALF during pulmonary infection than in lung cancer. The GSAT satellite was notably higher in bacterial and fungal infections (Figure S4, adjusted p-value < 0.05). It is regulated by AP-1 and is relevant to various pulmonary diseases through its impact on gene expression and on the inflammatory cell activation that is crucial in pulmonary infections 66, 67.
Several studies have employed copy number variants (CNV) from bodily fluids to diagnose pulmonary malignancies 16, 18, 68. In these investigations, whole genome testing of metagenomic data demonstrated a heightened diagnostic accuracy for pulmonary malignancies in samples initially identified as negative via conventional testing. Intriguingly, our findings indicated no significant difference in the CNV-derived tumor fraction of BALF samples between lung cancer and pulmonary infections. This reiterates that relying solely on one-dimensional information acquired from conventional BALF mNGS, characterized by low-depth sequencing, is insufficient for diagnosing intricate or multifaceted diseases.

Despite the insights gained, our study has limitations. Firstly, our cohort lacked viral pneumonia cases due to reduced incidence during China's COVID-19 control measures. In fact, most viral pneumonia exhibits clinical and radiological differences from lung cancer, which lessens the need for complex differential diagnostics compared with infections caused by bacteria, fungi and mycobacteria. Secondly, our study focused on distinguishing infection from cancer; the model established therefore cannot differentiate among infection subgroups. We are conducting another study to develop diagnostic models for distinguishing between infection subgroups, and some progress has been made thus far.

In conclusion, we report that integrated host and microbe information from BAL nucleic acid enables accurate diagnosis of lung cancer and pulmonary infections. Future studies are needed to validate and test the clinical impact of this culture-independent diagnostic approach.

Declarations

Data Availability

Microbial reads from metagenomic and metatranscriptomic data were deposited in NCBI's Sequence Read Archive (SRA) database under project number PRJNA1056765.
Host gene expression profiles derived from metatranscriptomic data were deposited under accession GSE252118.

Ethical Approval

The study was approved by the Ethics Committee of the Institutional Review Board of FAHZU (study no. IIT20220714A). Written informed consent was waived because of the non-interventional study design.

Acknowledgements

We thank all clinicians who provided detailed diagnostic and treatment data of patients for our study, as well as all infectious disease (ID) physicians, clinical microbiologists and oncologists who received our clinical consultations.

Funding

This study was supported by the National Key R&D Program of China (2023YFC2308300), the "Leading Geese" Research and Development Plan of Zhejiang Province (No. 2024C03218), the Zhejiang Provincial Natural Science Foundation (grant number LY23H200001), and the Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholar (grant number LR23H200002).

Author contributions

Study design, D.H., S.Z., and Y.C.; Sample detection, F.Y., D.Z., M.X., L.Y., J.Z., J.W. and J.T.; Data collection, D.H., F.Y., B.L., H.T. and H.Z.; Data analysis, D.H., F.Y., B.Y., K.M., H.L. and Y.C.; Wrote the paper, D.H. and B.Y. All authors have read and approved the final version of the manuscript.

Competing interests

The authors declare no competing interests.

References

1. Kreier, F. Cancer will cost the world $25 trillion over next 30 years. Nature (2023).
2. Agusti, A., Vogelmeier, C. F. & Halpin, D. M. G. Tackling the global burden of lung disease through prevention and early diagnosis. The Lancet Respiratory Medicine. 10, 1013-1015 (2022).
3. McKelvy, B. J. et al. Infectious Diseases That May Mimic Lung Cancer. In: Moran, C. A., Truong, M. T. & de Groot, P. M., editors. The Thorax: Medical, Radiological, and Pathological Assessment. Cham: Springer International Publishing; 2023. pp. 827-851.
4. Newman-Toker, D. E. et al. Serious misdiagnosis-related harms in malpractice claims: The "Big Three" - vascular events, infections, and cancers. Diagnosis (Berlin, Germany). 6, 227 (2019).
5. Guimarães, M. D. et al. Fungal Infection Mimicking Pulmonary Malignancy: Clinical and Radiological Characteristics. Lung. 191, 655-662 (2013).
6. Fabre, V. et al. Principles of diagnostic stewardship: A practical guide from the Society for Healthcare Epidemiology of America Diagnostic Stewardship Task Force. Infection Control & Hospital Epidemiology. 44, 178-185 (2023).
7. Blauwkamp, T. A. et al. Analytical and clinical validation of a microbial cell-free DNA sequencing test for infectious disease. Nature Microbiology. 4, 663-674 (2019).
8. Miller, S. et al. Laboratory validation of a clinical metagenomic sequencing assay for pathogen detection in cerebrospinal fluid. Genome Res. 29, 831-842 (2019).
9. Diao, Z. et al. Validation of a Metagenomic Next-Generation Sequencing Assay for Lower Respiratory Pathogen Detection. Microbiology Spectrum. 11 (2023).
10. Chiu, C. Y. & Miller, S. A. Clinical metagenomics. Nat. Rev. Genet. 20, 341-355 (2019).
11. Diao, Z., Han, D., Zhang, R. & Li, J. Metagenomics next-generation sequencing tests take the stage in the diagnosis of lower respiratory tract infections. Journal of Advanced Research. 38, 201-212 (2022).
12. Edgeworth, J. D. Respiratory metagenomics: route to routine service. Curr. Opin. Infect. Dis. 36, 115-123 (2023).
13. Ramachandran, P. S. et al. Integrating central nervous system metagenomics and host response for diagnosis of tuberculosis meningitis and its mimics. Nat. Commun. 13 (2022).
14. Kalantar, K. L. et al. Integrated host-microbe plasma metagenomics for sepsis diagnosis in a prospective cohort of critically ill adults. Nature Microbiology. 7, 1805-1816 (2022).
15. Langelier, C. et al. Integrating host response and unbiased microbe detection for lower respiratory tract infection diagnosis in critically ill adults. Proceedings of the National Academy of Sciences. 115, E12353-E12362 (2018).
16. Gu, W. et al. Detection of cryptogenic malignancies from metagenomic whole genome sequencing of body fluids. Genome Med. 13 (2021).
17. Gu, W. et al. Detection of Neoplasms by Metagenomic Next-Generation Sequencing of Cerebrospinal Fluid. Jama Neurol. 78, 1355-1366 (2021).
18. Guo, Y. et al. Metagenomic next-generation sequencing to identify pathogens and cancer in lung biopsy tissue. Ebiomedicine. 73, 103639 (2021).
19. Travis, W. D. et al. The 2015 World Health Organization Classification of Lung Tumors: Impact of Genetic, Clinical and Radiologic Advances Since the 2004 Classification. J. Thorac. Oncol. 10, 1243-1260 (2015).
20. Sulaiman, I. et al. Microbial signatures in the lower airways of mechanically ventilated COVID-19 patients associated with poor clinical outcome. Nat Microbiol. 6, 1245-1258 (2021).
21. Zhou, Z. et al. Heightened Innate Immune Responses in the Respiratory Tract of COVID-19 Patients. Cell Host Microbe. 27, 883-890 (2020).
22. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 30, 2114-2120 (2014).
23. Ho, S., Wheeler, N. E., Millard, A. D. & van Schaik, W. Gauge your phage: benchmarking of bacteriophage identification tools in metagenomic sequencing data. Microbiome. 11, 84 (2023).
24. Haddock, N. L. et al. Phage diversity in cell-free DNA identifies bacterial pathogens in human sepsis cases. Nat Microbiol. 8, 1495-1507 (2023).
25. Haddock, N. L. et al. Phage diversity in cell-free DNA identifies bacterial pathogens in human sepsis cases. Nat Microbiol. 8, 1495-1507 (2023).
26. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907-915 (2019).
27. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 30, 923-930 (2014).
28. Jin, Y., Tam, O. H., Paniagua, E. & Hammell, M. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. Bioinformatics. 31, 3593-3599 (2015).
29. Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
30. Subramanian, A. et al. Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles. Proceedings of the National Academy of Sciences - Pnas. 102, 15545-15550 (2005).
31. Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 45, D353-D361 (2017).
32. Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 50, D687-D692 (2022).
33. Schoggins, J. W. et al. A diverse range of gene products are effectors of the type I interferon antiviral response. Nature. 472, 481-485 (2011).
34. Steen, C. B., Liu, C. L., Alizadeh, A. A. & Newman, A. M. Profiling Cell Type Abundance and Expression in Bulk Tissues with CIBERSORTx. Methods Mol Biol. 2117, 135-157 (2020).
35. Mao, W., Zaslavsky, E., Hartmann, B. M., Sealfon, S. C. & Chikina, M. Pathway-level information extractor (PLIER) for gene expression data. Nat. Methods. 16, 607-610 (2019).
36. Adalsteinsson, V. A. et al. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. Nat. Commun. 8, 1313-1324 (2017).
37. Talevich, E., Shain, A. H., Botton, T. & Bastian, B. C. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. Plos Comput Biol. 12, e1004873 (2016).
38. Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4, 2612 (2013).
39. Segata, N. et al. Metagenomic biomarker discovery and explanation. Genome Biol. 12, R60 (2011).
40. Mayhew, M. B. et al. A generalizable 29-mRNA neural-network classifier for acute bacterial and viral infections. Nat. Commun. 11, 1177 (2020).
41. Ren, L. et al. Dynamics of the Upper Respiratory Tract Microbiota and Its Association with Mortality in COVID-19. Am J Respir Crit Care Med. 204, 1379-1390 (2021).
42. Bhattacharya, S. et al. ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. Sci Data. 5, 180015 (2018).
43. Nakayama, T. et al. Inflammatory molecular endotypes of nasal polyps derived from White and Japanese populations. J. Allergy Clin. Immun. 149, 1296-1308 (2022).
44. Korbecki, J. et al. CC Chemokines in a Tumor: A Review of Pro-Cancer and Anti-Cancer Properties of the Ligands of Receptors CCR1, CCR2, CCR3, and CCR4. Int. J. Mol. Sci. 21, 8412 (2020).
45. Liu, Z. et al. Tumor-Associated Macrophages Promote Metastasis of Oral Squamous Cell Carcinoma via CCL13 Regulated by Stress Granule. Cancers (Basel). 14 (2022).
46. Diao, Z., Han, D., Zhang, R. & Li, J. Metagenomics next-generation sequencing tests take the stage in the diagnosis of lower respiratory tract infections. Journal of Advanced Research (2021).
47. Charalampous, T. et al. Evaluating the potential for respiratory metagenomics to improve treatment of secondary infection and detection of nosocomial transmission on expanded COVID-19 intensive care units. Genome Med. 13, 182 (2021).
48. Charalampous, T. et al. Nanopore metagenomics enables rapid clinical diagnosis of bacterial lower respiratory infection. Nat. Biotechnol. 37, 783-792 (2019).
49. Mick, E. et al. Integrated host/microbe metagenomics enables accurate lower respiratory tract infection diagnosis in critically ill children. J. Clin. Invest. 133 (2023).
50. Davidson, K. R., Ha, D. M., Schwarz, M. I. & Chan, E. D. Bronchoalveolar lavage as a diagnostic procedure: a review of known cellular and molecular findings in various lung diseases. J Thorac Dis. 12, 4991-5019 (2020).
51. Chellapandian, D. et al. Bronchoalveolar lavage and lung biopsy in patients with cancer and hematopoietic stem-cell transplantation recipients: a systematic review and meta-analysis. J. Clin. Oncol. 33, 501-509 (2015).
52. Mayhew, M. B. et al. A generalizable 29-mRNA neural-network classifier for acute bacterial and viral infections. Nat. Commun. 11 (2020).
53. Ran, Z. et al. Pulmonary Micro-Ecological Changes and Potential Microbial Markers in Lung Cancer Patients. Front Oncol. 10, 576855 (2020).
54. Lee, S. H. et al. Characterization of microbiome in bronchoalveolar lavage fluid of patients with lung cancer comparing with benign mass like lesions. Lung Cancer. 102, 89-95 (2016).
55. Dickson, R. P. & Huffnagle, G. B. The Lung Microbiome: New Principles for Respiratory Bacteriology in Health and Disease. Plos Pathog. 11, e1004923 (2015).
56. Man, W. H., de Steenhuijsen Piters, W. A. A. & Bogaert, D. The microbiota of the respiratory tract: gatekeeper to respiratory health. Nature Reviews. Microbiology. 15, 259-270 (2017).
57. Da, C. S. G., Shepherd, F. A. & Tsao, M. S. EGFR mutations and lung cancer. Annu Rev Pathol. 6, 49-69 (2011).
58. Sweeney, T. E., Braviak, L., Tato, C. M. & Khatri, P. Genome-wide expression for diagnosis of pulmonary tuberculosis: a multicohort analysis. The Lancet Respiratory Medicine. 4, 213-224 (2016).
59. Schmiedel, D. & Mandelboim, O. NKG2D Ligands-Critical Targets for Cancer Immune Escape and Therapy. Front Immunol. 9, 2040 (2018).
60. Gowen, B. G. et al. A forward genetic screen reveals novel independent regulators of ULBP1, an activating ligand for natural killer cells. Elife. 4 (2015).
61. Schmall, A. et al. Macrophage and cancer cell cross-talk via CCR2 and CX3CR1 is a fundamental mechanism driving lung cancer. Am J Respir Crit Care Med. 191, 437-447 (2015).
62. Jeffries, A. R. et al. beta-1,3-Glucuronyltransferase-1 gene implicated as a candidate for a schizophrenia-like psychosis through molecular analysis of a balanced translocation. Mol Psychiatry. 8, 654-663 (2003).
63. Lemaitre, C., Tsang, J., Bireau, C., Heidmann, T. & Dewannieux, M. A human endogenous retrovirus-derived gene that can contribute to oncogenesis by activating the ERK pathway and inducing migration and invasion. Plos Pathog. 13, e1006451 (2017).
64. Jin, X. et al. The endogenous retrovirus-derived long noncoding RNA TROJAN promotes triple-negative breast cancer progression via ZMYND8 degradation. Sci Adv. 5, eaat9820 (2019).
65. Kitsou, K. et al. Upregulation of Human Endogenous Retroviruses in Bronchoalveolar Lavage Fluid of COVID-19 Patients. Microbiol Spectr. 9, e126021 (2021).
66. Wang, A. et al. Transcription factor complex AP-1 mediates inflammation initiated by Chlamydia pneumoniae infection. Cell. Microbiol. 15, 779-794 (2013).
67. Arancio, W. & Coronnello, C. Repetitive Sequence Transcription in Breast Cancer. Cells (Basel, Switzerland). 11, 2522 (2022).
68. Lin, P. et al. A multicenter-retrospective cohort study of chromosome instability in lung cancer: clinical characteristics and prognosis of patients harboring chromosomal instability detected by metagenomic next-generation sequencing. J Thorac Dis. 15, 112-122 (2023).

Additional Declarations

There is NO Competing Interest.

Supplementary Files

Supplementarytable.xlsx: Dataset S1, Dataset S2, Dataset S3, Dataset S4
nrsoftwarepolicyNCOMMS2404205.pdf: Software Policy checklist
nrreportingsummaryNCOMMS2404205.pdf: Reporting Summary
SupplementaryInformation.docx
Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-3883914","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":271950761,"identity":"a36edb78-303b-48e1-a6a8-5b75974b7f8c","order_by":0,"name":"Yu Chen","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA4UlEQVRIiWNgGAWjYDACCSjNz8CQAKSYSdAi2UCyFoMDYIoILfyzm4895m2rS9x8/sAzCYYK68QG9rMH8Fty51i6MW8bW+K2AwfSJBjOpCc28OQl4NViIJFjJs3bxpO47WBDmgRj2+HEBgkeAwJa8r8BtUgkbm5mAGr5R5SWHDagFoPEDWwgLQ1EaJG4kWYmOedcgvGMMwzJFglAj7Xx5ODXwj8j+ZnEm7I62f7+M4k3PtRYy/azn8GvBQSYeBgYHBsYeBLAkclGUD0QMP5gYLBnYGA/QIziUTAKRsEoGIEAALUAQRX0cY8LAAAAAElFTkSuQmCC","orcid":"","institution":"Zhejiang University","correspondingAuthor":true,"prefix":"","firstName":"Yu","middleName":"","lastName":"Chen","suffix":""},{"id":271950762,"identity":"0698acca-908d-48b4-8085-24cd066ddb1d","order_by":1,"name":"Dongsheng Han","email":"","orcid":"https://orcid.org/0000-0002-1892-8603","institution":"The First Affiliated Hospital, Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Dongsheng","middleName":"","lastName":"Han","suffix":""},{"id":271950763,"identity":"469af1b5-6ae9-48a3-99cc-1626be1d3bff","order_by":2,"name":"Fei Yu","email":"","orcid":"","institution":"First Affiliated Hospital Zhejiang 
University","correspondingAuthor":false,"prefix":"","firstName":"Fei","middleName":"","lastName":"Yu","suffix":""},{"id":271950764,"identity":"6ad9a5f4-b7e3-4649-94af-991c24e34984","order_by":3,"name":"Bin Yang","email":"","orcid":"","institution":"Vision Medicals Co., Ltd,","correspondingAuthor":false,"prefix":"","firstName":"Bin","middleName":"","lastName":"Yang","suffix":""},{"id":271950765,"identity":"53e4ac77-ba62-473f-83b4-9e545bb8d993","order_by":4,"name":"Yifei Shen","email":"","orcid":"","institution":"Zhejiang University","correspondingAuthor":false,"prefix":"","firstName":"Yifei","middleName":"","lastName":"Shen","suffix":""},{"id":271950766,"identity":"a21a25b2-ae23-4d7f-bf62-a578cbfa5a90","order_by":5,"name":"Dan Zhang","email":"","orcid":"","institution":"Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Dan","middleName":"","lastName":"Zhang","suffix":""},{"id":271950767,"identity":"2d1a2ad1-50dc-402e-b467-ef412c205a76","order_by":6,"name":"Huifang Liu","email":"","orcid":"","institution":"Vision Medicals Co., Ltd","correspondingAuthor":false,"prefix":"","firstName":"Huifang","middleName":"","lastName":"Liu","suffix":""},{"id":271950768,"identity":"807c46b6-9315-4b48-a2ec-2eec3c52a847","order_by":7,"name":"Lou Bin","email":"","orcid":"","institution":"The First Affiliated Hospital, Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Lou","middleName":"","lastName":"Bin","suffix":""},{"id":271950769,"identity":"c2e52aa2-e442-4723-a5e9-45b723fc1a89","order_by":8,"name":"Bin Lou","email":"","orcid":"","institution":"The First Affiliated Hospital, Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Bin","middleName":"","lastName":"Lou","suffix":""},{"id":271950770,"identity":"13d29bb0-b7a5-4af2-9efb-850a724072bf","order_by":9,"name":"Jingchao Wang","email":"","orcid":"","institution":"The First Affiliated Hospital, Zhejiang 
University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Jingchao","middleName":"","lastName":"Wang","suffix":""},{"id":271950771,"identity":"3272f4d9-cc24-488c-8b09-037895da0c61","order_by":10,"name":"Kanagavel Murugesan","email":"","orcid":"","institution":"Stanford University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Kanagavel","middleName":"","lastName":"Murugesan","suffix":""},{"id":271950772,"identity":"714888a6-9601-499b-865e-38810baaed1b","order_by":11,"name":"Hui Tang","email":"","orcid":"","institution":"The First Affiliated Hospital, Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Hui","middleName":"","lastName":"Tang","suffix":""},{"id":271950773,"identity":"d4eeee00-44f9-4a1b-b9a9-a6d5440e7908","order_by":12,"name":"Hua Zhou","email":"","orcid":"https://orcid.org/0000-0001-6397-3203","institution":"The First Affiliated Hospital, Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Hua","middleName":"","lastName":"Zhou","suffix":""},{"id":271950774,"identity":"f1b62b66-095f-450c-bb5b-682358b8078e","order_by":13,"name":"Mengxiao Xie","email":"","orcid":"","institution":"The First Affiliated Hospital, Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Mengxiao","middleName":"","lastName":"Xie","suffix":""},{"id":271950775,"identity":"2d55a6c4-07d1-40a0-a165-aa876d759260","order_by":14,"name":"Lingjun Yuan","email":"","orcid":"","institution":"The First Affiliated Hospital, Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Lingjun","middleName":"","lastName":"Yuan","suffix":""},{"id":271950776,"identity":"e6c67e73-ab58-4cac-8752-81614dc439e9","order_by":15,"name":"Jieting Zhou","email":"","orcid":"","institution":"The First Affiliated Hospital, Zhejiang University School of 
Medicine","correspondingAuthor":false,"prefix":"","firstName":"Jieting","middleName":"","lastName":"Zhou","suffix":""},{"id":271950777,"identity":"3a4fe844-005f-41a6-a2d5-44ec13042a53","order_by":16,"name":"Shufa Zheng","email":"","orcid":"","institution":"Zhejiang University School of Medicine","correspondingAuthor":false,"prefix":"","firstName":"Shufa","middleName":"","lastName":"Zheng","suffix":""}],"badges":[],"createdAt":"2024-01-21 07:10:15","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-3883914/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-3883914/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":50935078,"identity":"06e81148-a590-48ef-ad97-a7c1633b4091","added_by":"auto","created_at":"2024-02-09 20:28:03","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":1184287,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eStudy overview and analysis workflow.\u003c/strong\u003e \u003cstrong\u003eA.\u003c/strong\u003e Enrollment flow diagram for the patients with suspected lung cancer or pneumonia that was studied. The patients were divided into lung cancer group and pulmonary infection group. The pulmonary infection group was further divided into tuberculosis group, fungal infection group and bacterial infection group. \u003cstrong\u003eB. \u003c/strong\u003eData from each infection group were compared with the lung cancer group. The patients were randomly divided into a training cohort and a validation cohort at a ratio of 7:3. C. 
Development and validation of a microbe/host mNGS diagnostic approach for the differential diagnosis of lung cancer and pulmonary infections.\u003c/p\u003e","description":"","filename":"OnlineFigure1.png","url":"https://assets-eu.researchsquare.com/files/rs-3883914/v1/95021383d8c82e8fa3077bc7.png"},{"id":50935079,"identity":"0b15a8cb-5098-4009-988e-3c38ce406708","added_by":"auto","created_at":"2024-02-09 20:28:03","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":2093070,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMicrobial and bacteriophage composition analyses in BALF DNA mNGS data.\u003c/strong\u003e \u003cstrong\u003eA.\u003c/strong\u003e PCoA based on the Bray–Curtis dissimilarity index of microbial and bacteriophage composition, comparing lung cancer and pulmonary infection (single-variable PERMANOVA, P value). PCoA1, principal coordinate 1; PCoA2, principal coordinate 2. \u003cstrong\u003eB.\u003c/strong\u003e PCoA based on the Bray–Curtis dissimilarity index of BALF mNGS data, comparing lung cancer and infection subgroups (single-variable PERMANOVA, P value). \u003cstrong\u003eC.\u003c/strong\u003e Bubble plot displaying LEfSe analysis results and the relative abundance of microorganisms consistently differentially enriched across various pulmonary disease groups. The size of each bubble corresponds to the median relative abundance of statistically significant findings. A red dashed line delineates positive (on the right) and negative (on the left) fold changes. Bubbles indicate statistical significance (adjusted p-value \u0026lt; 0.05). 
Distinct bubble colors denote different pulmonary diseases, matching the labels atop the bubble graph, illustrating the enrichment of specific species within each pulmonary disease group.\u003c/p\u003e","description":"","filename":"OnlineFigure2.png","url":"https://assets-eu.researchsquare.com/files/rs-3883914/v1/c671b96eb4676a4b5814f1ba.png"},{"id":50935084,"identity":"2431419e-a358-4478-8a2d-1c131047330d","added_by":"auto","created_at":"2024-02-09 20:28:03","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":3033049,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eHost immune profiling in different pulmonary diseases. A.\u003c/strong\u003e Normalized enrichment scores of selected KEGG terms that reached statistical significance (adjusted p-value \u0026lt; 0.05) in the gene set enrichment analysis (GSEA) using differentially expressed (DE) genes between Lung Cancer and pulmonary infectious groups. \u003cstrong\u003eB. \u003c/strong\u003eHeatmap of 39 significantly differential expressed latent variables (LVs) which had biological function. (Left) Heatmap of differential LVs on average of each group. C-F. GSEA of Cell Cycle, Cytokine-cytokine receptor interaction, Interferon Signaling and Innate Immune System. Each line representing one particular gene set with unique color, and up-regulated genes located in the left approaching the origin of the coordinates, by contrast the down-regulated lay on the right of x-axis. Only gene sets with NOM p-value \u0026lt; 0.05 and FDR q-value \u0026lt; 0.05 were considered significant. 
Only a few leading gene sets are displayed in the plot.\u003c/p\u003e","description":"","filename":"OnlineFigure3.png","url":"https://assets-eu.researchsquare.com/files/rs-3883914/v1/abcff6f9589cb5f1c3d840a7.png"},{"id":50935549,"identity":"65bce7e2-8122-4306-9409-6d4eca75e305","added_by":"auto","created_at":"2024-02-09 20:36:03","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1511036,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eHost immune cell profiling in different pulmonary diseases. A. \u003c/strong\u003eIn silico estimation of cell-type proportions in the bulk RNA-sequencing using single-cell signatures. Cell-type abundance quantification plots. Comparison of abundance of Macrophage M1 (\u003cstrong\u003eB\u003c/strong\u003e), Monocytes (\u003cstrong\u003eC\u003c/strong\u003e), Neutrophils (\u003cstrong\u003eD\u003c/strong\u003e) and Mast cells activated (\u003cstrong\u003eE\u003c/strong\u003e) among pulmonary disease groups in the BAL fluids. P-values were obtained using Wilcoxon rank-sum test (two-sided), *,**,*** represent significance between two groups and N.S. represents no significance between two groups.\u003c/p\u003e","description":"","filename":"OnlineFigure4.png","url":"https://assets-eu.researchsquare.com/files/rs-3883914/v1/8989f182e022cba5730eba88.png"},{"id":50935086,"identity":"9fa90df2-cf38-4ad8-a611-f69aa28fdbef","added_by":"auto","created_at":"2024-02-09 20:28:03","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":4570848,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eHost/microbe metagenomics-based modelling for lung cancer and pulmonary infection diagnosis. A. \u003c/strong\u003eGraphical scheme of modelling process. 
These comparisons encompassed six models: Model I: Microbial and Bacteriophage DNA relative abundances; Model II: Microbial and Bacteriophage RNA relative abundances; Model III: Host gene expression and composition of immune cell; Model IV: Transposable elements expressed levels; Model V: CNV-derived tumor fraction and Model VI: integrating all these features. \u003cstrong\u003eB.\u003c/strong\u003e The area under the curve (AUC), along with median values and 95% confidence intervals, was calculated for Receiver Operating Characteristic (ROC) curve analyses using various datasets. The training datasets were represented by black squares with error bars, while the validation datasets were denoted by red triangles. \u003cstrong\u003eC\u003c/strong\u003e-\u003cstrong\u003eF\u003c/strong\u003e. ROC of validation datasets for classifying lung cancer versus pulmonary infection. Delong test was used for Comparing Two ROC Curves-Paired Design. *,**,*** represent significance between two groups and N.S. represents no significance between two groups. \u003cstrong\u003eG-L\u003c/strong\u003e. Color distinctions represent various groups associated with pulmonary diseases. The median is visually depicted by black lines. The y-axis in each panel was trimmed at the maximum value among all groups of 1.5*IQR above the third quartile, where IQR is the interquartile range. For each host gene/transposable element (TE), we conducted formal comparisons among groups within the training cohort. 
Pairwise comparisons were performed with a Mann-Whitney test followed by Holm’s correction for multiple testing.\u003c/p\u003e","description":"","filename":"OnlineFigure5.png","url":"https://assets-eu.researchsquare.com/files/rs-3883914/v1/add97a99d088efb8071ebc26.png"},{"id":50935080,"identity":"1186e711-9f30-46ad-b986-76e115ff1877","added_by":"auto","created_at":"2024-02-09 20:28:03","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":331639,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe performance of composite predictive model using a rule-in and rule-out strategy. A. \u003c/strong\u003eIn the validation cohort comparing lung cancer and bacterial infections, only 2 cancer patients were incorrectly classified as bacterial infections (rule-out-band), while 8 bacterial infections were wrongly classified as cancer (rule-in-band). The overall identification accuracy was 83.6% (51/61). \u003cstrong\u003eB.\u003c/strong\u003e In the validation cohort comparing lung cancer and fungal infections, only 1 cancer patient was incorrectly classified as fungal infection (rule-out-band), and 1 fungal infection was wrongly classified as cancer (rule-in-band). The overall identification accuracy was 95% (42/44). 
\u003cstrong\u003eC.\u003c/strong\u003e In the validation cohort comparing lung cancer and pulmonary tuberculosis, only 4 tuberculosis patients were incorrectly classified as cancer (rule-in-band), with an overall identification accuracy of 91% (42/46).\u003c/p\u003e","description":"","filename":"OnlineFigure6.png","url":"https://assets-eu.researchsquare.com/files/rs-3883914/v1/61d686115f9cbcc15b7958eb.png"},{"id":59728501,"identity":"32aa4667-6af4-4e09-85b1-d7e40c9f00ef","added_by":"auto","created_at":"2024-07-05 11:34:17","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":4026876,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3883914/v1/e9751f49-f665-457d-9247-4932810a23c4.pdf"},{"id":50935076,"identity":"c3a7f26c-8028-4147-b0e2-897fd077c932","added_by":"auto","created_at":"2024-02-09 20:28:02","extension":"xlsx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":53079,"visible":true,"origin":"","legend":"Dataset S1,Dataset S2,Dataset S3,Dataset S4","description":"","filename":"Supplementarytable.xlsx","url":"https://assets-eu.researchsquare.com/files/rs-3883914/v1/02261ad14005163387e8dee4.xlsx"},{"id":50935077,"identity":"e58283dc-73d1-4d24-b479-0358518524ce","added_by":"auto","created_at":"2024-02-09 20:28:02","extension":"pdf","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":1316913,"visible":true,"origin":"","legend":"\u003cp\u003eSoftware Policy checklist\u003c/p\u003e","description":"","filename":"nrsoftwarepolicyNCOMMS2404205.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3883914/v1/f63c2fb5d73da8aee121a4b8.pdf"},{"id":50935081,"identity":"ce01c68f-a77f-493f-97bd-2a3a8fc581a3","added_by":"auto","created_at":"2024-02-09 
20:28:03","extension":"pdf","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":1665903,"visible":true,"origin":"","legend":"\u003cp\u003eReporting Summary\u003c/p\u003e","description":"","filename":"nrreportingsummaryNCOMMS2404205.pdf","url":"https://assets-eu.researchsquare.com/files/rs-3883914/v1/e470be0130a460e223d16301.pdf"},{"id":50935085,"identity":"72e61824-6adb-475f-af96-78b3f811da7e","added_by":"auto","created_at":"2024-02-09 20:28:03","extension":"docx","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":6623716,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cbr\u003e\u003c/p\u003e","description":"","filename":"SupplementaryInformation.docx","url":"https://assets-eu.researchsquare.com/files/rs-3883914/v1/c29d8d749a1fc6dcef5c7f18.docx"}],"financialInterests":"There is \u003cb\u003eNO\u003c/b\u003e Competing Interest.","formattedTitle":"Metagenomic Analysis of Bronchoalveolar Lavage Fluid Enables Differential Diagnosis Between Lung Cancer and Pulmonary Infections","fulltext":[{"header":"Introduction","content":"\u003cp\u003eLung cancer and pulmonary infections pose significant global health challenges with high incidence, mortality rates, and substantial socioeconomic burdens \u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e,\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e. 
In the absence of rapid and accurate histopathological or microbiological test results, clinicians often find it challenging to distinguish between them based solely on clinical and radiological characteristics, leading to misdiagnosis and delays or errors in treatment \u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e,\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eVarious pathogens causing pulmonary infections, such as bacteria (\u003cem\u003ePseudomonas\u003c/em\u003e, \u003cem\u003eStreptococcus\u003c/em\u003e), mycobacteria (\u003cem\u003eMycobacterium tuberculosis\u003c/em\u003e, \u003cem\u003eNon-tuberculous mycobacteria\u003c/em\u003e), aerobic actinomycetes (\u003cem\u003eNocardia\u003c/em\u003e), fungi (\u003cem\u003eAspergillus\u003c/em\u003e, \u003cem\u003eMucor\u003c/em\u003e, \u003cem\u003ecryptococcus\u003c/em\u003e), and others, can mimic lung cancer, sharing indistinguishable clinical symptoms (e.g., dyspnea, fatigue, cough, and hemoptysis) and radiographic features (e.g., spiculated solid nodules or masses, cavities with nodular margins, and chest wall and mediastinal invasion) \u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e,\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e. Consequently, clinicians often employ multiple testing methods to detect lung infections and cancer \u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e. 
An affordable diagnostic method requiring fewer samples, aiding clinicians in quicker and accurate decisions, would greatly benefit patient treatment and management.\u003c/p\u003e \u003cp\u003eMetagenomic Next-generation Sequencing (mNGS) is a sequencing technology capable of identifying pathogens in specimens with microbial nucleic acid concentrations beyond detection limits within 24 hours or even less \u003csup\u003e\u003cspan additionalcitationids=\"CR8\" citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. In recent years, it has been widely employed in the diagnosis of various complex infectious diseases and has been confirmed a powerful tool with an excellent diagnostic accuracy in detecting pneumonia-related pathogens\u003csup\u003e\u003cspan additionalcitationids=\"CR11\" citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e\u003c/sup\u003e. Excitingly, recent studies have confirmed that analyzing transcriptomic data derived from human sequences of mNGS testing can aid in distinguishing infectious diseases such as sepsis, acute respiratory infections, tuberculous meningitis, and non-infectious diseases \u003csup\u003e\u003cspan additionalcitationids=\"CR14\" citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e. Developing intelligent algorithms based on chromosomal instability and tumor-related copy number variations generated by mNGS data is useful to diagnose malignant tumors \u003csup\u003e\u003cspan additionalcitationids=\"CR17\" citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e. 
These studies prompt us to further contemplate whether it is possible to utilize mNGS data from respiratory tract samples to establish an integrative genomic diagnostic method that combines microbial and host response characteristics of the patients. This method is anticipated to identify pulmonary infectious diseases that can be mistaken for lung cancer without escalating patient testing expenses, utilizing minimal tests and samples, and within a relatively short timeframe.\u003c/p\u003e \u003cp\u003eHere, we conducted mNGS testing on bronchoalveolar lavage fluid samples (BALF-mNGS) from 402 clinical patients with lung cancer or pulmonary infections. Subsequently, we analyzed the microbial information and host response information derived from metagenomic sequencing data, and based on this, we established and validated an integrated host/microbe metagenomics-driven machine learning approach for the differential diagnosis of lung cancer and pulmonary infections.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003e \u003cb\u003eStudy design, patient collection and ethics statement.\u003c/b\u003e Patients with suspected lung cancer or pulmonary infections were enrolled at the First Affiliated Hospital, Zhejiang University School of Medicine (FAHZU), a 5000-bed tertiary university hospital with a State Key Laboratory for Diagnosis and Treatment of Infectious Diseases located in southeastern China. Enrollment occurred between February 27, 2021, and May 27, 2023, for patients aged\u0026thinsp;\u0026ge;\u0026thinsp;18, requiring bronchoalveolar lavage fluid (BALF) samples within 72 hours of hospitalization to establish the etiology (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
We excluded cases with underlying leukemia, cases without a definitive diagnosis after extensive follow-up, and cases lacking matched DNA and RNA mNGS data from BALF samples (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). A total of 123 lung cancer cases, 279 pulmonary infection cases (tuberculosis, fungal, and bacterial infections), and 32 negative control cases (e.g., immune pneumonitis, organizing pneumonia and drug-related pneumonia) were included. The diagnosis of lung cancer relied on clinical suspicion and positive results from cytology, flow cytometry and/or tissue biopsy. Pathological information of all samples was determined based on surgically resected tissue sections according to the 2015 WHO Histological Classification of Lung Cancer\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e\u003c/sup\u003e. The diagnosis of pulmonary infections was based on clinical suspicion and determination of the causative pathogen through standard microbiological diagnostics (e.g., cultures, antigen/antibody tests, PCR, sequencing). Archival material at FAHZU was retrospectively analyzed under no-patient contact protocols approved by the FAHZU Institutional Review Board (IIT20220714A). Written consent, given prior to the procedure used to obtain the sample, covered the use of residual samples for research.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWe constructed the training and validation cohorts according to sample collection date. We ranked all lung cancer samples by collection time and split them at a 7:3 ratio (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, Extended Data Table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003eS1\u003c/span\u003e). In other words, the training and validation cohorts can be regarded as two independent cohorts collected at different times. 
Likewise, we ranked all pulmonary infection samples by collection time and split them into training and validation cohorts. Feature selection was performed in the training cohort only, blinded to the validation cohort.\u003c/p\u003e \u003cp\u003e \u003cb\u003eDNA/RNA extraction, library construction and sequencing.\u003c/b\u003e For metagenomic sequencing (DNA sequencing), 1 mL of BALF sample was subjected to depletion of host nucleic acid using 1 U benzonase (Sigma) and 0.5% Tween 20 (Sigma), with incubation at 37\u0026deg;C for 5 min. A total of 600 \u0026micro;L of the mixture was transferred to new tubes containing 500 \u0026micro;L of ceramic beads for bead beating using a Minilys Personal TGrinder H24 Homogenizer (catalogue number: OSE-TH-01, Tiangen, China). Then, the nucleic acid from 400 \u0026micro;L of the pretreated sample was extracted and eluted in 60 \u0026micro;L elution buffer using a QIAamp UCP Pathogen Mini Kit (catalogue number: 50214, Qiagen, Germany). The extracted DNA was quantified using a Qubit dsDNA HS Assay Kit (catalogue number: Q32854, Invitrogen, USA)\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. For metatranscriptome sequencing (RNA sequencing), 1 mL of BALF sample was centrifuged at 12,000 rpm for 10 min. 
Then, 200 \u0026micro;L of the precipitate was lysed in TRIzol LS (Thermo Fisher Scientific, Carlsbad, CA, USA), followed by RNA extraction using a Direct-zol RNA Miniprep kit (Zymo Research, Irvine, CA, USA) according to the manufacturer's instructions\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eAccording to the manufacturer's instructions, 30 \u0026micro;L DNA was used to generate libraries with the Nextera DNA Flex kit (Illumina, San Diego, CA, USA), and 10 \u0026micro;L of purified RNA was used for cDNA generation and library preparation with an Ovation Trio RNA-Seq Library Preparation Kit (NuGEN, CA, USA). A Qubit dsDNA HS Assay Kit was used to measure the library concentration. The library quality was assessed with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) and a High Sensitivity DNA kit. The library was sequenced using an Illumina NextSeq 550 sequencer with a 75-cycle single-end sequencing strategy\u003csup\u003e\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e,\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003e\u003cb\u003eMicrobial composition analysis and bacteriophage annotation.\u003c/b\u003e As previously described, we used a validated mNGS pipeline for microbial composition analysis\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e. In brief, Trimmomatic was used to remove low-quality, duplicate, and \u0026lt;\u0026thinsp;50 bp reads, as well as adapter contamination\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e. Kcomplexity was used to remove low-complexity reads with default parameters. 
Human sequences were excluded by mapping to the human reference genome (hg38) using SNAP v1.0beta\u003csup\u003e\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e. Kraken2 v.2.0.7 and Bracken v.2.5 created taxonomic profiles using default settings and the default database (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://benlangmead.github.io/aws-indexes/k2\u003c/span\u003e\u003cspan address=\"https://benlangmead.github.io/aws-indexes/k2\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e)\u003csup\u003e9,23\u003c/sup\u003e. Sequencing reads for detected microbes were normalized as RPM (reads per million) to correct for differences in sequencing depth. The BALF mNGS data from 32 non-infection and non-cancer cases were used as negative controls (NC, Extended Data Table S2). Microorganisms found in NC samples and their relative abundance in different patient groups (lung cancer, infection and NCs) are shown in Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. As in our previous research \u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e, we calculated the mean and standard deviation of species relative abundances in the NC samples and set the positive cutoff value at the NC mean\u0026thinsp;+\u0026thinsp;3SD. A microbe in the mNGS data of a patient with lung cancer or infection was defined as positive, and included in our microbial count table, if its RPM exceeded the NC cutoff value.\u003c/p\u003e \u003cp\u003eFor bacteriophage annotation, the cleaned reads were aligned against a curated phage database (CPD) containing 26,159 phage representative genomes using BLAST (word size: 18, e-value: 0.0005, culling limit: 1)\u003csup\u003e\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e\u003c/sup\u003e. 
Phage counting in mNGS data relied on relative abundances\u003csup\u003e\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003eHost gene expression, transposable elements expression, cell-type composition analysis.\u003c/b\u003e For the analysis of host gene expression, high-quality data were aligned to the human genome hg38 using HISAT2 with default parameters. Gene-level quantification was performed using FeatureCounts \u003csup\u003e\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e,\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e\u003c/sup\u003e. The gene counts were aggregated using the featureCounts program from the Subread package release 2.0.0 (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://subread.sourceforge.net/\u003c/span\u003e\u003cspan address=\"http://subread.sourceforge.net/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e)\u003csup\u003e21\u003c/sup\u003e. Additionally, trimmed clean reads were mapped using STAR with previously defined parameters \u003csup\u003e\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e\u003c/sup\u003e. TEtranscripts software was utilized to estimate the abundances of Transposable Elements (TE) and to conduct differential expression analysis. The GTF file containing transposable element annotations was obtained from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://hammelllab.labsites.cshl.edu/software/#TEtranscripts\u003c/span\u003e\u003cspan address=\"https://hammelllab.labsites.cshl.edu/software/#TEtranscripts\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. 
Differentially expressed genes (DEGs) and TEs were identified in each group using the DESeq2 package, applying criteria of FDR\u0026thinsp;\u0026le;\u0026thinsp;0.05 and Fold-change\u0026thinsp;\u0026ge;\u0026thinsp;1.5\u003csup\u003e29\u003c/sup\u003e. Gene set enrichment analysis (GSEA) for DEGs was carried out using the REACTOME, KEGG, and GO databases \u003csup\u003e\u003cspan additionalcitationids=\"CR31\" citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e\u003c/sup\u003e. Significantly enriched pathways or biological processes were determined based on Fisher's exact test (p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05), following Benjamini and Hochberg's adjustment. For the identification of immune-related genes (IRGs), the list was obtained from the ImmPort database (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.immport.org/home\u003c/span\u003e\u003cspan address=\"https://www.immport.org/home\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e), while interferon-stimulated genes (ISGs) were retrieved from the referenced study \u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e\u003c/sup\u003e. To estimate the relative proportions of infiltrating immune cell types, the CIBERSORT algorithm was applied with the original gene signature file LM22 and 1000 permutations \u003csup\u003e\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e\u003c/sup\u003e. 
Latent variables were calculated using the PLIER R package\u003csup\u003e\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003eCopy number-derived tumor fractions calling.\u003c/b\u003e The DNA metagenomic sequencing data were used in downstream analyses to identify CNVs through the ichorCNA\u003csup\u003e\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e\u003c/sup\u003e, CNVkit and estimate software packages to generate ctDNA tumor fractions as previously described and validated in tumor tissue and body fluid\u003csup\u003e\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e,\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e\u003c/sup\u003e. The ichorCNA ploidy parameter restart value was set to 2 and the maximum copy number to use was lowered to 3. The tumor fraction with the highest log-likelihood was retrieved and reported. A Wilcoxon rank-sum test was used to assess the difference in probability values between groups.\u003c/p\u003e \u003cp\u003e \u003cb\u003eHost/microbe multi-dimension diagnosis modelling.\u003c/b\u003e LEfSe and DESeq2 were used to investigate the association between disease types and the microbial and host features quantified from RNA and DNA data separately\u003csup\u003e\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e\u003c/sup\u003e. We first performed a univariate screening test to identify significant associations between disease types and microbial DNA/RNA relative abundances, transcripts per million (TPM) of host gene expression, TPM of transposable elements and tumor fraction/CNV of each chromosome, respectively. Within each type of data, with the adjusted P value cut-off set to 0.05, the features with an adjusted P value less than the cut-off were selected and integrated as a sub-community. 
The variables entered into the selection process included microbial-ecological indices and species whose relative abundances differed significantly between pulmonary diseases. The R package \"mlr3\" was used to build the machine learning models. Variables were then entered into logistic regression models, and statistically significant variables were subsequently used to construct a prediction model. Ten-fold cross-validation was used to estimate the 95% confidence interval of the AUC. The prediction model accuracy, sensitivity and specificity were assessed using the AUC. The DeLong test was used to assess the significance of the difference between two ROC curves\u003csup\u003e\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003e \u003cb\u003eStatistics and reproducibility.\u003c/b\u003e The key features of the microbial composition, including the Shannon index, Simpson index, Chao1 index, and ACE index, were computed using the vegan package in R software after sequence processing\u003csup\u003e\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e\u003c/sup\u003e. Permutational multivariate ANOVA (PERMANOVA) was conducted using the \"vegan\" package to determine the difference in sample β-diversity (measured by Bray‒Curtis distance). Principal coordinates analysis (PCoA) and envfit functions were used to identify the species involved in microbial variation. Continuous nonparametric data were compared using the Mann‒Whitney-Wilcoxon test. Categorical data were compared using the chi-square test or Fisher's exact test. Correlation coefficients between species and clinical factors and between the microbiota and host genes were calculated by Spearman's rank correlation analysis using the Hmisc package. All data analyses were performed in RStudio running R version 4.1.0. 
P values less than 0.05 were considered statistically significant.\u003c/p\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eClinical features of study cohort\u003c/h2\u003e \u003cp\u003eBased on the established criteria (\u003cspan refid=\"Sec2\" class=\"InternalRef\"\u003eMethods\u003c/span\u003e section, Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), we enrolled a total of 402 patients, consisting of 123 cancer patients (Cancer group) and 279 patients with pulmonary infections (Infection group). According to etiological findings, the infection group was further subdivided into three subgroups: pulmonary tuberculosis (TB group, n\u0026thinsp;=\u0026thinsp;86), fungal infection (Fungal group, n\u0026thinsp;=\u0026thinsp;79), and bacterial infection (Bacterial group, n\u0026thinsp;=\u0026thinsp;114). Most patients, regardless of their subgroup, exhibited similar clinical and imaging characteristics, such as race (all were Chinese), underlying medical conditions, white blood cell (WBC) count, inflammatory indicators such as procalcitonin (PCT) and C-reactive protein (CRP), and results of chest computed tomography (CT) scans (e.g., patchy shadows and nodules, cavities, mediastinal lymphadenopathy) (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The median mNGS DNA yield per patient was 21.9\u0026nbsp;million reads (IQR 18.0\u0026ndash;27.6 M), with the vast majority of reads (\u0026gt;\u0026thinsp;95%) being human. 
The median mNGS RNA data per patient was 19.1\u0026nbsp;million reads (IQR 13.8\u0026ndash;26.2 M).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDemographic and clinical characteristics of the enrolled patients\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCharacteristics\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOverall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLung Cancer\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBacterial Infection\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eFungal Infection\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eTuberculosis\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003ep-value\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e 
\u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003ePatient demographics\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTotal number, n\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e402\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e123\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e114\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e79\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAge (median [IQR])\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e59.50 [50.00, 67.50]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e58.00 [51.00, 69.50]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e57.00 [50.00, 66.00]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e57.00 [46.00, 69.00]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e57.50 [35.00, 67.75]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.114\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSex\u0026thinsp;=\u0026thinsp;Male, n(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e255(63.4)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e86(69.9)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e60(52.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e50(63.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e59(68.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.086\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eUnderlying conditions\u003c/b\u003e, \u003cb\u003en(%)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCardiovascular disease\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e69 (17.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e24 (19.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e18 (15.8)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e17 (21.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e10 
(11.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.309\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eImmunological disease\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e22 (5.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4 (3.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e7 (6.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e9 (11.4)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e2 (2.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.057\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLiver insufficiency\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e34 (8.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e9 (7.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e9 (7.9)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e11 (13.9)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e5 (5.8)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.293\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRenal insufficiency\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e63 (15.7)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e17 (13.8)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e18 
(15.8)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e15 (19.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e13 (15.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.507\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCOPD\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e126 (31.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e33 (26.8)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e39 (34.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e27 (34.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e27 (31.4)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.941\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCentral nervous system disorder\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e21 (5.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e7 (5.7)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e8 (7.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e5 (6.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1 (1.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.22\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHIV\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3 (0.7)\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0 (0.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0 (0.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e2 (2.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1 (1.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.064\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHypertension\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e99 (24.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e40 (32.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e27 (23.7)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e19 (24.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e13 (15.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.038\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDiabetes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e55 (13.7)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e17 (13.8)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e19 (16.7)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e8 (10.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e11 (12.8)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.643\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLaboratory testing\u003c/b\u003e, \u003cb\u003emedian [IQR]\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eWBC (\u0026times;10\u003csup\u003e9\u003c/sup\u003e/L)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e6.50 [5.05, 9.33]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e6.42 [4.95, 9.23]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e7.35 [5.32, 10.31]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e6.58 [4.65, 9.20]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e6.08 [5.07, 7.86]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.102\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNEUT(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e70.30 [61.82, 80.25]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e70.80 [63.35, 81.40]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e71.30 [61.95, 80.67]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e74.50 [60.50, 86.65]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c6\"\u003e \u003cp\u003e67.50 [60.20, 73.05]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.013\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCRP (mg/L)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e17.03 [3.30, 56.53]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e22.27 [4.73, 71.02]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e16.44 [3.30, 59.38]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e10.30 [3.21, 52.62]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e11.20 [3.55, 41.72]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.136\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePCT (ng/mL)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.09 [0.04, 0.36]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.10 [0.04, 0.37]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.11 [0.05, 0.48]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.19 [0.04, 0.55]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.05 [0.05, 0.12]\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.042\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eChest CT imaging features\u003c/b\u003e, \u003cb\u003en(%)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePulmonary emphysema\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e90 (22.4)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e39 (31.7)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e22 (19.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e17 (21.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e12 (14.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.018\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePulmonary nodule\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e136 (33.8)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e52 (42.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e22 (19.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e32 (40.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e30 (34.9)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePulmonary cavity\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e50 (12.4)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e10 (8.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e14 (12.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e8 (10.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e18 (20.9)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.055\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGround-glass shadow\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e58 (14.4)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e23 (18.7)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e11 (9.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e15 (19.0)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e9 (10.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.095\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMultiple patchy solid shadows\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e284 (70.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e82 (66.7)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e80 (70.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e53 (67.1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e69 (80.2)\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.078\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMalignant pleural effusion\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e144 (35.8)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e50 (40.7)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e37 (32.5)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e29 (36.7)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e28 (32.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.533\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePleural thickening\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e62 (15.4)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e28 (22.8)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e13 (11.4)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e11 (13.9)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e10 (11.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.069\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMediastinal lymphadenopathy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e146 (36.3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e58 (47.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e35 (30.7)\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e21 (26.6)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e32 (37.2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.012\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eWe compared and screened the differential features within the mNGS data of the cancer and infection groups in the training cohort to establish a differential diagnosis approach for lung cancer and pulmonary infections. Subsequently, the cancer group was compared separately to the tuberculosis, fungal, and bacterial groups to develop a diagnostic method capable of rapidly distinguishing lung cancer from infections caused by different pathogens.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eMicrobial community structure of different pulmonary diseases\u003c/h2\u003e \u003cp\u003eMicrobial communities were assessed in a total of 284 samples within the training cohort, comprising 87 cases of lung cancer and 197 cases of pulmonary infection. Analysis of the microbial community complexity, as gauged by metrics such as Richness (Observed species number), ACE, Chao1, Shannon diversity index, Simpson diversity index, and Evenness index, revealed no significant differences in α-diversity among the BALF samples. This observation held true for comparisons between the cancer and infection groups, as well as within different infection subgroups (Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA, C, assessed via Mann-Whitney-U test, p-value\u0026thinsp;\u0026gt;\u0026thinsp;0.05). 
However, β-diversity analysis based on the Bray-Curtis distance indicated that the microbial composition of BALF samples from the cancer group was distinct from that of the infection group and of the infection subgroups (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA, B, PERMANOVA, P\u0026thinsp;\u0026lt;\u0026thinsp;0.01). To identify microorganisms specific to different pulmonary diseases, we performed LEfSe analysis. The findings revealed a higher prevalence of \u003cem\u003eS. oralis\u003c/em\u003e, \u003cem\u003eS. mitis\u003c/em\u003e, \u003cem\u003eV. parvula\u003c/em\u003e, \u003cem\u003eP. gingivalis\u003c/em\u003e, and \u003cem\u003eC. orthopsilosis\u003c/em\u003e, which are often regarded as oral or airway commensals, in lung cancer compared to pulmonary infection (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC, LDA score\u0026thinsp;\u0026gt;\u0026thinsp;2, adjusted p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05). Conversely, pathogenic microorganisms commonly linked with infections, such as \u003cem\u003eM. tuberculosis\u003c/em\u003e, \u003cem\u003eA. oryzae\u003c/em\u003e, \u003cem\u003eA. fumigatus\u003c/em\u003e, and \u003cem\u003eC. neoformans\u003c/em\u003e, were more frequently detected in the pulmonary infection group. These distinct microbial profiles could potentially serve as valuable indicators for diagnosing pulmonary diseases. For the RNA data, both α-diversity (Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB, D, Mann-Whitney-U test, p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05) and β-diversity (Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA, B, PERMANOVA, p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.01) analyses of the microbiome supported distinct microbial community features in the lower airways among pulmonary diseases. 
Interestingly, we found that RNA relative abundances of some pathogenic microorganisms, such as \u003cem\u003eM. tuberculosis\u003c/em\u003e, were significantly higher in the pulmonary infection group or infection subgroups than in lung cancer (Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC, Mann-Whitney-U test, p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.01).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eDifference in host immune response, transposable elements, and immune cell abundance of different pulmonary diseases\u003c/h2\u003e \u003cp\u003eTo discern host immune responses between lung cancer and infection, we conducted BALF host gene expression analyses, revealing substantial variations among the groups, as depicted in volcano plots (Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eA-D). GSEA enrichment analysis highlighted significant enrichment of differentially expressed genes (DEGs) in immune pathways such as T-cell receptor signaling and cytokine-cytokine receptor signaling (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA). Employing PLIER on training datasets, we delineated host transcriptomic profiles across 545 canonical pathways, identifying multiple differentially expressed latent variables (LVs) with distinct biological functions across different groups (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB, Mann-Whitney-U Test, adjusted p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05). 
Specifically, in the cancer group, lower airway transcriptomes exhibited upregulation of the cell cycle (LV102 and LV107), while LV165, annotated as cytokine-cytokine receptor interaction pathways, displayed upregulation, contrary to LV86 in the same pathways (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB). Furthermore, we observed upregulation of interferon signaling and the innate immune system in infection groups (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eD, E), notably driven by pulmonary tuberculosis, which exhibited the well-established upregulation of interferon signaling.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFor further exploration, we selected differentially expressed immune genes (IMG) from the ImmPort database and interferon-stimulated genes (ISG) from prior research\u003csup\u003e\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e,\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u003c/sup\u003e. Notably, TB-associated markers GBP1 and GBP5 were elevated in the TB group (Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA, adjusted p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.01). Four genes emerged as notably upregulated in the cancer group, and intriguingly, these genes were chemokines: C-C motif chemokine ligand 7 (CCL7), C-C motif chemokine ligand 8 (CCL8), C-C motif chemokine ligand 13 (CCL13) and pro-platelet basic protein (PPBP), also known as CXCL7 (Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA, indicated by red triangle, adjusted p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.01). Studies suggest that CCL7, highly expressed in tumor tissues, recruits cDC1 cells, aiding antitumor immunity and checkpoint immunotherapy. 
Additionally, CCL7, CCL8, and CCL13 are linked to tumor-associated macrophages (M2)\u003csup\u003e\u003cspan additionalcitationids=\"CR44\" citationid=\"CR43\" class=\"CitationRef\"\u003e43\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eWe identified 27 transposable elements among lung cancer and three infection groups (Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE-H, Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB), notably finding significantly higher LTR-ERV (LTR6A and HUERS-P3-int) levels in lung cancer (Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB, adjusted p-value\u0026thinsp;=\u0026thinsp;0.019).\u003c/p\u003e \u003cp\u003eTo investigate variations in immune cell abundance across different groups, we estimated cell-type levels in host transcriptomes using computational quantification methods, including a deconvolution approach implemented in CIBERSORTx. M1 macrophages were significantly elevated in pulmonary tuberculosis (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eB, Mann-Whitney-U Test, adjusted p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05), whereas M2 macrophage levels were higher in fungal infection, pulmonary tuberculosis and lung cancer (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eC, Mann-Whitney-U Test, adjusted p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.01). 
Neutrophils were enriched in bacterial infection compared with lung cancer (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eD, Mann-Whitney-U test, p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.01). Furthermore, we observed notably higher monocyte levels in fungal infection (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003eE, Mann-Whitney-U Test, adjusted p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.01).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003eCopy number variants and CNV-derived tumor fraction of different pulmonary diseases\u003c/h2\u003e \u003cp\u003eTo enhance CNV and tumor fraction estimations in BALF mNGS data, we used three distinct software tools. CNVkit revealed slight increases in CNV counts on chromosomes 11 (cancer group) and 3 (infection group) (Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA, p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05). Higher CNV percentages on chromosome 3 were noted in the infection group (Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eB, p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05). However, no significant CNV count or percentage differences emerged when comparing the cancer group with the three infection subgroups (Extended Data Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eC, D, p-value\u0026thinsp;\u0026gt;\u0026thinsp;0.05).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eSubsequently, ichorCNA estimated tumor fractions at 5.96% (lung cancer, 95% CI 4.15%-7.77%) and 6.29% (pulmonary infection, 95% CI 0.54%-12.04%) (Extended Data Fig.\u0026nbsp;7A). Notably, no significant differences in tumor fractions were observed between the lung cancer group and the three infection subgroups (Extended Data Fig.\u0026nbsp;7B). 
Calculated scores (Stromal, Immune, ESTIMATE, Tumor Purity) using 'estimate' software showed no differences between cancer and all the infection groups (Extended Data Fig.\u0026nbsp;7A, B). This suggests that, unlike Cancer-Negative (Benign) samples\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e\u003c/sup\u003e, BALF samples from infection patients display comparable levels of copy number variations seen in cancer patients.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eHost/microbe metagenomics-based modelling for lung cancer and pulmonary infection diagnosis\u003c/h2\u003e \u003cp\u003eWe initially utilized a range of combined metrics from the training set, including microbial and bacteriophage DNA/RNA abundances, host gene expression, immune cell composition, transposable element expression, and CNV-derived tumor fraction, to identify the most effective machine learning classifier among 10 options. The Random Forest classifier emerged as the optimal choice (Extended Data Fig.\u0026nbsp;8). Subsequently, within this classifier, we built six preset diagnostic models (Model I-VI) using various biological features or their combinations from the mNGS dataset (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA). These models were individually trained and evaluated to identify the most effective one for distinguishing lung cancer from infection in both general and subgroup comparisons (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA).\u003c/p\u003e \u003cp\u003eThe results unveiled that Model VI, incorporating differential features from microbial and bacteriophage DNA/RNA abundances, host gene expression, immune cell composition, transposable elements, and CNV-derived tumor fraction, exhibited the highest discriminatory capability in both general and subgroup comparisons. 
Specifically, for the general comparison, Model VI demonstrated an AUC of 0.87 (95% CI\u0026thinsp;=\u0026thinsp;0.857\u0026ndash;0.883) with 73.8% sensitivity and 84.5% specificity in the training cohort. In the validation cohort, it achieved an AUC of 0.831 (95% CI\u0026thinsp;=\u0026thinsp;0.819\u0026ndash;0.843) with 67.1% sensitivity and 94.4% specificity, effectively distinguishing lung cancer from pulmonary infections (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eC, Extended Data Table S3). The highlighted host transcriptome features in Model VI included genes involved in the cell cycle and cytokine-cytokine receptor pathways, such as ULBP1, B3GAT1, and CCL13 (Extended Data Table S4; Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eG, H, and I, Mann-Whitney-U test, p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05). Notably, CCL13, a downstream gene of EGFR, serves as a typical LUAD biomarker, while ULBP1 and B3GAT1 are genes regulated by CCL13 for cDC modulation.\u003c/p\u003e \u003cp\u003eIn the subgroup comparisons, Model VI showcased notable performance. For instance, in distinguishing lung cancer from bacterial infection, it attained an AUC of 0.849, with 67.6% sensitivity and 91.7% specificity in the validation cohort. Similarly, in discerning lung cancer from fungal infection, Model VI displayed an AUC of 0.811, sensitivity of 82.6%, and specificity of 77.7%. Furthermore, when differentiating lung cancer from pulmonary tuberculosis, Model VI achieved an AUC of 0.882, sensitivity of 84.0%, and specificity of 77.7% (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eB, D, E, and F, Extended Data Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
Noteworthy observations included higher levels of MAS1 associated with apoptosis and tissue injuries in Bacterial Infection, increased AP1AR levels correlated with TLR4 in Pulmonary Tuberculosis, and elevated ZPLD2P levels in Fungal Infection compared to lung cancer (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eK, J, and L, Mann-Whitney-U test, p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.01).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003eA composite predictive model for Lung cancer and infection diagnosis\u003c/h2\u003e \u003cp\u003eUsing a rule-in and rule-out strategy, we developed a composite predictive model that combines the Model VI built for the general comparison with the Model VI built for a subgroup comparison, aiming to enhance the diagnostic accuracy for lung cancer and infections. Under this strategy, if both the general-comparison Model VI and the subgroup-comparison Model VI classified a patient as lung cancer, the patient was assigned to the rule-in band (i.e., a positive cancer diagnosis). If both models classified a patient as infection, the patient was assigned to the rule-out band (i.e., a positive infection diagnosis) (Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eA).\u003c/p\u003e \u003cp\u003eThe validation cohort from each subgroup comparison was utilized to evaluate the performance of the composite predictive model. Within the lung cancer versus bacterial infection group, a total of 61 patients were categorized, with 41 identified as rule-in and 20 as rule-out (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). 
Similarly, within the comparison of lung cancer versus fungal infection, 44 patients were classified, comprising 28 rule-in and 16 rule-out cases (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eB, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Moreover, in the evaluation of lung cancer versus pulmonary tuberculosis, 46 patients were allocated, consisting of 30 rule-in and 16 rule-out instances (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eC, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eTest statistics for combination strategy.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTreated\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e 
\u003cp\u003eCancer\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eInfection\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eLR*\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eSpecificity#\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSensitivity+\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eLung Cancer vs.\u003c/p\u003e \u003cp\u003eBacterial Infection\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRule-Out\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRule-In\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e4.13\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.805\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eLung 
Cancer vs.\u003c/p\u003e \u003cp\u003eFungal Infection\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRule-Out\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.938\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRule-In\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.964\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eLung Cancer vs.\u003c/p\u003e \u003cp\u003ePulmonary Tuberculosis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRule-Out\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRule-In\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e6.5\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.867\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"7\"\u003eLR*: Likelihood Ratio, serves as an indicator of cancer risks. A higher LR signifies a stronger correlation with lung cancer. For example, within the Rule-out band for lung cancer versus bacterial infection, there were 18 patients classified as infection and 2 patients classified as having cancer. The LR calculation resulted in 2/18\u0026thinsp;=\u0026thinsp;0.11.\u003c/td\u003e\u003c/tr\u003e \u003ctr\u003e\u003ctd colspan=\"7\"\u003eSpecificity#: refers to the accuracy of the rule-out band in correctly identifying infected patients. It is calculated as the number of infected patients correctly identified by the rule-out band (true positives) divided by the sum of true positives and the number of infected patients mistakenly identified as having cancer by the rule-out band.\u003c/td\u003e\u003c/tr\u003e \u003ctr\u003e\u003ctd colspan=\"7\"\u003eSensitivity+: refers to the accuracy of the rule-in band in correctly identifying cancer patients. 
This is calculated by dividing the number of cancer patients correctly identified by the rule-in band (true positives) by the sum of true positives and the number of cancer patients mistakenly identified as having an infection by the rule-in band.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFrom the results, it is evident that employing this strategy significantly enhanced the diagnostic accuracy (ACC) in distinguishing between lung cancer and bacterial infection, elevating it from 0.800 (56/70) to 0.836 (51/61) (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eA). This enhancement was accompanied by a sensitivity of 80.5%, reflecting the rule-in band's accuracy in correctly identifying individuals with cancer, and a specificity of 90.0%, demonstrating the rule-out band's accuracy in correctly identifying patients with an infection (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Similarly, there was a significant enhancement in ACC, rising from 0.797 (47/59) to 0.955 (42/44) alongside a specificity of 93.8% and sensitivity of 96.4% for diagnosing Lung cancer and Fungal Infection (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eB, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Of note, this method yielded 100.0% specificity and 86.7% sensitivity in distinguishing Lung cancer and Pulmonary Tuberculosis (ACC\u0026thinsp;=\u0026thinsp;0.913, 42/46) (Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003eC, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). 
Accordingly, this integrated predictive approach indeed provides a highly accurate strategy to better utilize complex data generated by mNGS for distinguishing various pulmonary diseases in a clinically viable manner.\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eIn the realm of diagnostics, BALF-based mNGS testing has emerged as a rapid assay to pinpoint pulmonary infection pathogens \u003csup\u003e\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e,\u003cspan additionalcitationids=\"CR47\" citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e\u003c/sup\u003e. Despite over 90% of mNGS results being human-origin reads, often disregarded as \"noise\", recent research posits that these sequences may harbor valuable biomarkers linked to the host's disease state \u003csup\u003e\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e,\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e\u003c/sup\u003e. Our study pioneers a comprehensive host/microbe metagenomics approach, utilizing BALF mNGS data for diagnosing lung cancer and pulmonary infections. This innovative methodology exhibits exceptional accuracy in distinguishing between lung cancer and diverse pulmonary infections (including pulmonary tuberculosis, fungal infection, and bacterial infection), amplifying the clinical applicability of BALF mNGS testing.\u003c/p\u003e \u003cp\u003eWhile BALF samples exhibit inherent heterogeneity compared to whole blood or tissue specimens\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e,\u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e,\u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e\u003c/sup\u003e, our analytical model demonstrates significant robustness. 
Specifically tailored for distinguishing lung cancer from pulmonary infection, our Model VI achieved a notable AUC of 0.831 (95% CI\u0026thinsp;=\u0026thinsp;0.819\u0026ndash;0.843) within the validation cohort. This cohort encompassed a spectrum of complex pulmonary infections, including bacterial, fungal, and tuberculosis infections, each characterized by substantial variations in host immune responses, pathogen profiles, and microbiota compositions. Impressively, our model's performance is comparable to the uniform multi-omics models utilized in other studies for different sample types. For instance, the IMX-BVN model used to differentiate acute bacterial infections from others achieved an AUC of 0.86 (95% CI 0.77\u0026ndash;0.93), while distinguishing acute viral infections scored an AUC of 0.85 (95% CI 0.76\u0026ndash;0.93) \u003csup\u003e\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e\u003c/sup\u003e. The diagnostic capacity of whole blood transcriptomics in discerning sepsis from non-sepsis states showed an AUC of 0.82, while plasma cell-free RNA transcriptomics reached an AUC of 0.77\u003csup\u003e14\u003c/sup\u003e. These studies indirectly demonstrate similar remarkable efficacy of our model in managing complex pulmonary conditions. Moreover, in the validation cohort differentiating lung cancer from pulmonary tuberculosis, the AUC escalated to 0.882 (95% CI\u0026thinsp;=\u0026thinsp;0.875\u0026ndash;0.891), showcasing the advantage of integrating multi-omics into the Model VI.\u003c/p\u003e \u003cp\u003eIdentifying patients with lung cancer or pulmonary infections remains a crucial clinical challenge in many medical settings. The decision to administer empirical antibiotics often relies on an educated guess. 
If we could further refine our diagnosis of specific infection subgroups (such as bacteria, fungi, or tuberculosis) after first confirming an infection using our Model VI, it could assist clinicians in more accurately employing antibiotic therapies. Hence, we have further developed a more rigorous integrated predictive model based on predefined rule-in and rule-out strategies, enhancing the differentiation accuracy between lung cancer and infection subgroups. The result showed improved accuracy in distinguishing lung cancer from pulmonary tuberculosis (ACC\u0026thinsp;=\u0026thinsp;0.913), fungal infection (ACC\u0026thinsp;=\u0026thinsp;0.955), and bacterial infection (ACC\u0026thinsp;=\u0026thinsp;0.836). Such diagnostic approaches promise more precise clinical diagnoses, thereby yielding greater benefits for patients.\u003c/p\u003e \u003cp\u003eOur study tested an integrated host-microbe mNGS diagnostic approach, examining microbial (including bacteriophage) DNA/RNA abundance, host gene expression, transposable elements, immune cell composition, and copy-number variant (CNV)-derived tumor fraction. Prior research relied on only one or a few features independently to aid diagnosis, such as the lung cancer microbiome \u003csup\u003e\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e\u003c/sup\u003e. Previous 16S rRNA sequencing revealed higher Firmicutes and \u003cem\u003eTM7\u003c/em\u003e presence in lung cancer versus healthy controls \u003csup\u003e\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e. \u003cem\u003eVeillonella\u003c/em\u003e and \u003cem\u003eMegasphaera\u003c/em\u003e showed promise as lung cancer biomarkers (AUC: 0.888), indicating distinctive bacterial profiles in lung cancer versus benign conditions \u003csup\u003e\u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e\u003c/sup\u003e. 
Our data detected subtle microbial differences between lung cancer and pulmonary infections, including the infection subgroups. \u003cem\u003eVeillonella parvula\u003c/em\u003e was notably increased in lung cancer compared to bacterial/fungal infection (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003eC, LDA score\u0026thinsp;\u0026gt;\u0026thinsp;2, adjusted p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05). Yet, the microbiome had limited predictive power for distinguishing lung cancer from pulmonary infection (AUC\u0026thinsp;=\u0026thinsp;0.518 in validation cohort). Extracting more distinctive biological information from sequencing data is crucial for differentiating lung cancer from pulmonary infections.\u003c/p\u003e \u003cp\u003eWe believe that host immune dysregulation disrupts the composition of respiratory microbiota. Previous literature has underscored significant changes in the dynamic equilibrium between host and microbiome in conditions such as lung cancer and infections \u003csup\u003e\u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e,\u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e\u003c/sup\u003e. In this study, we independently compared the contributions of microbial/bacteriophage relative abundances (Model I and Model II), host gene expression and immune cell composition (Model III), TE expression levels (Model IV), and CNV-derived tumor fraction (Model V) for distinguishing lung cancer from infections. 
The results indicate that host immune response (Model III) reflects the most prominent differences in pulmonary disease status compared to other categories (Figs.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003eD-F, DeLong's ROC test, p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05).\u003c/p\u003e \u003cp\u003eIn spite of the limited cellular content in certain BALF samples from patients, we successfully retrieved a robust human gene expression dataset. These data unveiled distinct immune responses across various pulmonary diseases. Analysis using PLIER revealed significant differences in latent variables associated with cell cycle, interferon, and cytokine pathways among these diseases. Notably, our findings highlighted genes involved in cell cycle regulation concurrently influencing PI3K-Akt signaling, p53 signaling, and lung cancer pathways, under the regulation of EGFR\u003csup\u003e\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e\u003c/sup\u003e. Additionally, we identified the GBP5 gene, known for its high diagnostic relevance in active tuberculosis \u003csup\u003e\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e\u003c/sup\u003e, and observed elevated expression levels of interferon signaling pathways in the pulmonary tuberculosis group compared to the other groups (Figure S4A). This further underscores the reliability of our findings regarding the host immune response.\u003c/p\u003e \u003cp\u003eOur top three classifier genes for lung cancer and pulmonary infection were identified as B3GAT1, ULBP1, and CCL13. Interestingly, these genes have not been previously linked in host gene expression signatures in bodily fluids related to lung cancer. 
Specifically, ULBP1's role as a ligand for the NKG2D receptor activates NK cells in lung cancer, fostering NK cell-mediated tumor surveillance and cytotoxicity \u003csup\u003e\u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e\u003c/sup\u003e. Expression of ULBP1-6, particularly in squamous-cell carcinoma, correlates with clinical outcomes in NSCLC patients, suggesting a predictive value for clinical prognosis \u003csup\u003e\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e\u003c/sup\u003e. Conversely, CCL13, a ligand for CCR2, contributes to cancer-related processes such as metastasis and immunosuppression. CCR2 expression in M2 macrophages is integral in the bidirectional communication between these macrophages and cancer cells, driving lung cancer progression \u003csup\u003e\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e\u003c/sup\u003e. Additionally, CCL13, derived from M2 tumor-associated macrophages, promotes oral cancer metastasis by inducing inflammatory cytokines\u003csup\u003e\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e\u003c/sup\u003e. Finally, B3GAT1, or beta-1,3-glucuronyltransferase 1, holds significance in cancer, particularly concerning tumor cell motility and specific carbohydrate epitope biosynthesis. Its role in canonical integrin signaling pathways influences tumor cell motility, while its involvement in HNK-1 carbohydrate epitope biosynthesis bears relevance to neurodevelopment and cancer-related processes \u003csup\u003e\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eWe investigated for the first time the expression levels of transposable elements in BALF samples from pulmonary diseases in this study. HERVK11D showed higher expression in lung cancer compared to fungal infection and tuberculosis. 
Similarly, ERVK-MER11B was more expressed in bacterial infection than tuberculosis (Figure S4, adjusted p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05). The heightened expression of HERV-K, linked to basal-like and triple-negative breast cancer progression, illustrates altered gene expression driving cancer advancement. HERV-derived long non-coding RNAs also promote cancer progression, signaling significant gene profile shifts in these cancers\u003csup\u003e\u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e,\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e\u003c/sup\u003e. Additionally, two ERV1 were notably higher in lung cancer compared to all pulmonary infections (Figure S4, adjusted p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05). These findings underscore the importance of Repetitive Sequences in human health, exemplified by severe COVID-19 pneumonia triggering intense inflammatory responses and HERVs dysregulation in BALF samples. For example, HERV-FRD, notably upregulated in COVID-19 BALF, suggests HERVs as potential disease progression biomarkers linked to increased severity in aging \u003csup\u003e\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e\u003c/sup\u003e. Surprisingly, we first found that certain transposable elements were more expressed in BALF during pulmonary infection than in lung cancer. 
The GSAT satellite, which was notably higher in bacterial and fungal infections (Figure S4, adjusted p-value\u0026thinsp;\u0026lt;\u0026thinsp;0.05) and is regulated by AP-1, holds significance in various pulmonary diseases by impacting gene expression and inflammatory cell activation crucial in pulmonary infections \u003csup\u003e\u003cspan citationid=\"CR66\" class=\"CitationRef\"\u003e66\u003c/span\u003e,\u003cspan citationid=\"CR67\" class=\"CitationRef\"\u003e67\u003c/span\u003e\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eSeveral studies have employed copy number variants (CNV) from bodily fluids to diagnose pulmonary malignancies \u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e,\u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e68\u003c/span\u003e\u003c/sup\u003e. In these investigations, whole genome testing of metagenomic data demonstrated a heightened diagnostic accuracy for pulmonary malignancies in samples initially identified as negative via conventional testing. Intriguingly, our findings indicated no significant distinction in the CNV-derived tumor fraction of BALF samples between lung cancer and pulmonary infections. This reiterates that relying solely on one-dimensional information acquired from conventional BALF mNGS, characterized by low-depth sequencing, is insufficient for diagnosing intricate or multifaceted diseases.\u003c/p\u003e \u003cp\u003eDespite insights gained, our study has limitations. Firstly, our cohort lacked viral pneumonia cases due to reduced incidence during China's COVID-19 control measures. In fact, viral pneumonia often exhibits clinical and radiological differences from lung cancer, so it requires less complex differential diagnosis than infections associated with bacteria, fungi and mycobacteria. 
Secondly, our study focused on distinguishing infection from cancer, and therefore, the model established cannot address the differentiation effectiveness among various infection subgroups. We are conducting another study developing diagnostic models for distinguishing between infection subgroups, and some progress has been made thus far.\u003c/p\u003e \u003cp\u003eIn conclusion, we report that integrated host and microbe information from BAL nucleic acid enables accurate diagnosis of lung cancer and pulmonary infections. Future studies are needed to validate and test the clinical impact of this culture-independent diagnostic approach.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eData Availability.\u0026nbsp;\u003c/strong\u003eMicrobial reads from metagenomic and metatranscriptomic data were deposited in NCBI's Sequence Read Archive (SRA) database under project number PRJNA1056765. Host gene expression profile derived from metatranscriptomic were deposited in GSE252118.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthical Approval\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe study was approved by the Ethics Committee of the Institutional Review Board of FAHZU (study no. IIT20220714A). Written informed consent was waived because of the non-interventional study design.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe thank all clinicians who provided detailed diagnostic and treatment data of patients for our study, as well as all infectious disease (ID) physicians, clinical microbiologists and oncologists who received our clinical consultations.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was supported by the National Key R\u0026amp;D Program of China (2023YFC2308300),“Leading Geese” Research and Development Plan of Zhejiang Province (No. 
2024C03218), the Zhejiang Provincial Natural Science Foundation (grant number LY23H200001), and the Zhejiang Provincial Natural Science Foundation for Distinguished Young Scholar (grant number LR23H200002).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contributions\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eStudy design, D.H., S.Z., and Y.C.; Sample detection, F.Y., D.Z., M.X., L.Y., J.Z., J.W. and J.T.; Data collection, D.H., F.Y., B.L., H.T. and H.Z.; Data analysis, D.H., F.Y., B.Y., K.M., H.L. and Y.C.; Writing, D.H. and B.Y. All authors have read and approved the final version of the manuscript.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eKreier, F. Cancer will cost the world $25 trillion over next 30 years. \u003cem\u003eNature\u003c/em\u003e, (2023).\u003c/li\u003e\n\u003cli\u003eAgusti, A., Vogelmeier, C. F. \u0026amp; Halpin, D. M. G. Tackling the global burden of lung disease through prevention and early diagnosis. \u003cem\u003eThe Lancet Respiratory Medicine\u003c/em\u003e. \u003cstrong\u003e10\u003c/strong\u003e, 1013-1015 (2022).\u003c/li\u003e\n\u003cli\u003eMcKelvy, B. J. et al. Infectious Diseases That May Mimic Lung Cancer. In: Moran, C. A., Truong, M. T. \u0026amp; de Groot, P. M., editors. \u003cem\u003eThe Thorax: Medical, Radiological, and Pathological Assessment\u003c/em\u003e. Cham: Springer International Publishing; 2023. pp. 827-851.\u003c/li\u003e\n\u003cli\u003eNewman-Toker, D. E. et al. Serious misdiagnosis-related harms in malpractice claims: The \u0026quot;Big Three\u0026quot; - vascular events, infections, and cancers. \u003cem\u003eDiagnosis (Berlin, Germany)\u003c/em\u003e. \u003cstrong\u003e6\u003c/strong\u003e, 227 (2019).\u003c/li\u003e\n\u003cli\u003eGuimar\u0026atilde;es, M. D. et al. 
Fungal Infection Mimicking Pulmonary Malignancy: Clinical and Radiological Characteristics. \u003cem\u003eLung\u003c/em\u003e. \u003cstrong\u003e191\u003c/strong\u003e, 655-662 (2013).\u003c/li\u003e\n\u003cli\u003eFabre, V. et al. Principles of diagnostic stewardship: A practical guide from the Society for Healthcare Epidemiology of America Diagnostic Stewardship Task Force. \u003cem\u003eInfection Control \u0026amp; Hospital Epidemiology\u003c/em\u003e. \u003cstrong\u003e44\u003c/strong\u003e, 178-185 (2023).\u003c/li\u003e\n\u003cli\u003eBlauwkamp, T. A. et al. Analytical and clinical validation of a microbial cell-free DNA sequencing test for infectious disease. \u003cem\u003eNature Microbiology\u003c/em\u003e. \u003cstrong\u003e4\u003c/strong\u003e, 663-674 (2019).\u003c/li\u003e\n\u003cli\u003eMiller, S. et al. Laboratory validation of a clinical metagenomic sequencing assay for pathogen detection in cerebrospinal fluid. \u003cem\u003eGenome Res.\u003c/em\u003e \u003cstrong\u003e29\u003c/strong\u003e, 831-842 (2019).\u003c/li\u003e\n\u003cli\u003eDiao, Z. et al. Validation of a Metagenomic Next-Generation Sequencing Assay for Lower Respiratory Pathogen Detection. \u003cem\u003eMicrobiology Spectrum\u003c/em\u003e. \u003cstrong\u003e11\u003c/strong\u003e, (2023).\u003c/li\u003e\n\u003cli\u003eChiu, C. Y. \u0026amp; Miller, S. A. Clinical metagenomics. \u003cem\u003eNat. Rev. Genet.\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 341-355 (2019).\u003c/li\u003e\n\u003cli\u003eDiao, Z., Han, D., Zhang, R. \u0026amp; Li, J. Metagenomics next-generation sequencing tests take the stage in the diagnosis of lower respiratory tract infections. \u003cem\u003eJournal of Advanced Research\u003c/em\u003e. \u003cstrong\u003e38\u003c/strong\u003e, 201-212 (2022).\u003c/li\u003e\n\u003cli\u003eEdgeworth, J. D. Respiratory metagenomics: route to routine service. \u003cem\u003eCurr. Opin. Infect. 
Dis.\u003c/em\u003e \u003cstrong\u003e36\u003c/strong\u003e, 115-123 (2023).\u003c/li\u003e\n\u003cli\u003eRamachandran, P. S. et al. Integrating central nervous system metagenomics and host response for diagnosis of tuberculosis meningitis and its mimics. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, (2022).\u003c/li\u003e\n\u003cli\u003eKalantar, K. L. et al. Integrated host-microbe plasma metagenomics for sepsis diagnosis in a prospective cohort of critically ill adults. \u003cem\u003eNature Microbiology\u003c/em\u003e. \u003cstrong\u003e7\u003c/strong\u003e, 1805-1816 (2022).\u003c/li\u003e\n\u003cli\u003eLangelier, C. et al. Integrating host response and unbiased microbe detection for lower respiratory tract infection diagnosis in critically ill adults. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e. \u003cstrong\u003e115\u003c/strong\u003e, E12353-E12362 (2018).\u003c/li\u003e\n\u003cli\u003eGu, W. et al. Detection of cryptogenic malignancies from metagenomic whole genome sequencing of body fluids. \u003cem\u003eGenome Med.\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, (2021).\u003c/li\u003e\n\u003cli\u003eGu, W. et al. Detection of Neoplasms by Metagenomic Next-Generation Sequencing of Cerebrospinal Fluid. \u003cem\u003eJama Neurol.\u003c/em\u003e \u003cstrong\u003e78\u003c/strong\u003e, 1355-1366 (2021).\u003c/li\u003e\n\u003cli\u003eGuo, Y. et al. Metagenomic next-generation sequencing to identify pathogens and cancer in lung biopsy tissue. \u003cem\u003eEbiomedicine\u003c/em\u003e. \u003cstrong\u003e73\u003c/strong\u003e, 103639 (2021).\u003c/li\u003e\n\u003cli\u003eTravis, W. D. et al. The 2015 World Health Organization Classification of Lung Tumors: Impact of Genetic, Clinical and Radiologic Advances Since the 2004 Classification. \u003cem\u003eJ. Thorac. Oncol.\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 1243-1260 (2015).\u003c/li\u003e\n\u003cli\u003eSulaiman, I. 
et al. Microbial signatures in the lower airways of mechanically ventilated COVID-19 patients associated with poor clinical outcome. \u003cem\u003eNat Microbiol\u003c/em\u003e. \u003cstrong\u003e6\u003c/strong\u003e, 1245-1258 (2021).\u003c/li\u003e\n\u003cli\u003eZhou, Z. et al. Heightened Innate Immune Responses in the Respiratory Tract of COVID-19 Patients. \u003cem\u003eCell Host Microbe\u003c/em\u003e. \u003cstrong\u003e27\u003c/strong\u003e, 883-890 (2020).\u003c/li\u003e\n\u003cli\u003eBolger, A. M., Lohse, M. \u0026amp; Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. \u003cem\u003eBioinformatics\u003c/em\u003e. \u003cstrong\u003e30\u003c/strong\u003e, 2114-2120 (2014).\u003c/li\u003e\n\u003cli\u003eHo, S., Wheeler, N. E., Millard, A. D. \u0026amp; van Schaik, W. Gauge your phage: benchmarking of bacteriophage identification tools in metagenomic sequencing data. \u003cem\u003eMicrobiome\u003c/em\u003e. \u003cstrong\u003e11\u003c/strong\u003e, 84 (2023).\u003c/li\u003e\n\u003cli\u003eHaddock, N. L. et al. Phage diversity in cell-free DNA identifies bacterial pathogens in human sepsis cases. \u003cem\u003eNat Microbiol\u003c/em\u003e. \u003cstrong\u003e8\u003c/strong\u003e, 1495-1507 (2023).\u003c/li\u003e\n\u003cli\u003eHaddock, N. L. et al. Phage diversity in cell-free DNA identifies bacterial pathogens in human sepsis cases. \u003cem\u003eNat Microbiol\u003c/em\u003e. \u003cstrong\u003e8\u003c/strong\u003e, 1495-1507 (2023).\u003c/li\u003e\n\u003cli\u003eKim, D., Paggi, J. M., Park, C., Bennett, C. \u0026amp; Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. \u003cem\u003eNat. Biotechnol.\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 907-915 (2019).\u003c/li\u003e\n\u003cli\u003eLiao, Y., Smyth, G. K. \u0026amp; Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. \u003cem\u003eBioinformatics\u003c/em\u003e. 
\u003cstrong\u003e30\u003c/strong\u003e, 923-930 (2014).\u003c/li\u003e\n\u003cli\u003eJin, Y., Tam, O. H., Paniagua, E. \u0026amp; Hammell, M. TEtranscripts: a package for including transposable elements in differential expression analysis of RNA-seq datasets. \u003cem\u003eBioinformatics\u003c/em\u003e. \u003cstrong\u003e31\u003c/strong\u003e, 3593-3599 (2015).\u003c/li\u003e\n\u003cli\u003eLove, M. I., Huber, W. \u0026amp; Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. \u003cem\u003eGenome Biol\u003c/em\u003e. \u003cstrong\u003e15\u003c/strong\u003e, 550 (2014).\u003c/li\u003e\n\u003cli\u003eSubramanian, A. et al. Gene Set Enrichment Analysis: A Knowledge-Based Approach for Interpreting Genome-Wide Expression Profiles. \u003cem\u003eProceedings of the National Academy of Sciences - Pnas\u003c/em\u003e. \u003cstrong\u003e102\u003c/strong\u003e, 15545-15550 (2005).\u003c/li\u003e\n\u003cli\u003eKanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. \u0026amp; Morishima, K. KEGG: new perspectives on genomes, pathways, diseases and drugs. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e45\u003c/strong\u003e, D353-D361 (2017).\u003c/li\u003e\n\u003cli\u003eGillespie, M. et al. The reactome pathway knowledgebase 2022. \u003cem\u003eNucleic Acids Res.\u003c/em\u003e \u003cstrong\u003e50\u003c/strong\u003e, D687-D692 (2022).\u003c/li\u003e\n\u003cli\u003eSchoggins, J. W. et al. A diverse range of gene products are effectors of the type I interferon antiviral response. \u003cem\u003eNature\u003c/em\u003e. \u003cstrong\u003e472\u003c/strong\u003e, 481-485 (2011).\u003c/li\u003e\n\u003cli\u003eSteen, C. B., Liu, C. L., Alizadeh, A. A. \u0026amp; Newman, A. M. Profiling Cell Type Abundance and Expression in Bulk Tissues with CIBERSORTx. \u003cem\u003eMethods Mol Biol\u003c/em\u003e. \u003cstrong\u003e2117\u003c/strong\u003e, 135-157 (2020).\u003c/li\u003e\n\u003cli\u003eMao, W., Zaslavsky, E., Hartmann, B. 
M., Sealfon, S. C. \u0026amp; Chikina, M. Pathway-level information extractor (PLIER) for gene expression data. \u003cem\u003eNat. Methods\u003c/em\u003e. \u003cstrong\u003e16\u003c/strong\u003e, 607-610 (2019).\u003c/li\u003e\n\u003cli\u003eAdalsteinsson, V. A. et al. Scalable whole-exome sequencing of cell-free DNA reveals high concordance with metastatic tumors. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 1313-1324 (2017).\u003c/li\u003e\n\u003cli\u003eTalevich, E., Shain, A. H., Botton, T. \u0026amp; Bastian, B. C. CNVkit: Genome-Wide Copy Number Detection and Visualization from Targeted DNA Sequencing. \u003cem\u003ePlos Comput Biol\u003c/em\u003e. \u003cstrong\u003e12\u003c/strong\u003e, e1004873 (2016).\u003c/li\u003e\n\u003cli\u003eYoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e4\u003c/strong\u003e, 2612 (2013).\u003c/li\u003e\n\u003cli\u003eSegata, N. et al. Metagenomic biomarker discovery and explanation. \u003cem\u003eGenome Biol\u003c/em\u003e. \u003cstrong\u003e12\u003c/strong\u003e, R60 (2011).\u003c/li\u003e\n\u003cli\u003eMayhew, M. B. et al. A generalizable 29-mRNA neural-network classifier for acute bacterial and viral infections. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 1177 (2020).\u003c/li\u003e\n\u003cli\u003eRen, L. et al. Dynamics of the Upper Respiratory Tract Microbiota and Its Association with Mortality in COVID-19. \u003cem\u003eAm J Respir Crit Care Med\u003c/em\u003e. \u003cstrong\u003e204\u003c/strong\u003e, 1379-1390 (2021).\u003c/li\u003e\n\u003cli\u003eBhattacharya, S. et al. ImmPort, toward repurposing of open access immunological assay data for translational and clinical research. \u003cem\u003eSci Data\u003c/em\u003e. \u003cstrong\u003e5\u003c/strong\u003e, 180015 (2018).\u003c/li\u003e\n\u003cli\u003eNakayama, T. et al. 
Inflammatory molecular endotypes of nasal polyps derived from White and Japanese populations. \u003cem\u003eJ. Allergy Clin. Immun.\u003c/em\u003e \u003cstrong\u003e149\u003c/strong\u003e, 1296-1308 (2022).\u003c/li\u003e\n\u003cli\u003eKorbecki, J. et al. CC Chemokines in a Tumor: A Review of Pro-Cancer and Anti-Cancer Properties of the Ligands of Receptors CCR1, CCR2, CCR3, and CCR4. \u003cem\u003eInt. J. Mol. Sci.\u003c/em\u003e \u003cstrong\u003e21\u003c/strong\u003e, 8412 (2020).\u003c/li\u003e\n\u003cli\u003eLiu, Z. et al. Tumor-Associated Macrophages Promote Metastasis of Oral Squamous Cell Carcinoma via CCL13 Regulated by Stress Granule. \u003cem\u003eCancers (Basel)\u003c/em\u003e. \u003cstrong\u003e14\u003c/strong\u003e, (2022).\u003c/li\u003e\n\u003cli\u003eDiao, Z., Han, D., Zhang, R. \u0026amp; Li, J. Metagenomics next-generation sequencing tests take the stage in the diagnosis of lower respiratory tract infections. \u003cem\u003eJournal of Advanced Research\u003c/em\u003e, (2021).\u003c/li\u003e\n\u003cli\u003eCharalampous, T. et al. Evaluating the potential for respiratory metagenomics to improve treatment of secondary infection and detection of nosocomial transmission on expanded COVID-19 intensive care units. \u003cem\u003eGenome Med.\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 182 (2021).\u003c/li\u003e\n\u003cli\u003eCharalampous, T. et al. Nanopore metagenomics enables rapid clinical diagnosis of bacterial lower respiratory infection. \u003cem\u003eNat. Biotechnol.\u003c/em\u003e \u003cstrong\u003e37\u003c/strong\u003e, 783-792 (2019).\u003c/li\u003e\n\u003cli\u003eMick, E. et al. Integrated host/microbe metagenomics enables accurate lower respiratory tract infection diagnosis in critically ill children. \u003cem\u003eJ. Clin. Invest.\u003c/em\u003e \u003cstrong\u003e133\u003c/strong\u003e, (2023).\u003c/li\u003e\n\u003cli\u003eDavidson, K. R., Ha, D. M., Schwarz, M. I. \u0026amp; Chan, E. D. 
Bronchoalveolar lavage as a diagnostic procedure: a review of known cellular and molecular findings in various lung diseases. \u003cem\u003eJ Thorac Dis\u003c/em\u003e. \u003cstrong\u003e12\u003c/strong\u003e, 4991-5019 (2020).\u003c/li\u003e\n\u003cli\u003eChellapandian, D. et al. Bronchoalveolar lavage and lung biopsy in patients with cancer and hematopoietic stem-cell transplantation recipients: a systematic review and meta-analysis. \u003cem\u003eJ. Clin. Oncol.\u003c/em\u003e \u003cstrong\u003e33\u003c/strong\u003e, 501-509 (2015).\u003c/li\u003e\n\u003cli\u003eMayhew, M. B. et al. A generalizable 29-mRNA neural-network classifier for acute bacterial and viral infections. \u003cem\u003eNat. Commun.\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, (2020).\u003c/li\u003e\n\u003cli\u003eRan, Z. et al. Pulmonary Micro-Ecological Changes and Potential Microbial Markers in Lung Cancer Patients. \u003cem\u003eFront Oncol\u003c/em\u003e. \u003cstrong\u003e10\u003c/strong\u003e, 576855 (2020).\u003c/li\u003e\n\u003cli\u003eLee, S. H. et al. Characterization of microbiome in bronchoalveolar lavage fluid of patients with lung cancer comparing with benign mass like lesions. \u003cem\u003eLung Cancer\u003c/em\u003e. \u003cstrong\u003e102\u003c/strong\u003e, 89-95 (2016).\u003c/li\u003e\n\u003cli\u003eDickson, R. P. \u0026amp; Huffnagle, G. B. The Lung Microbiome: New Principles for Respiratory Bacteriology in Health and Disease. \u003cem\u003ePlos Pathog\u003c/em\u003e. \u003cstrong\u003e11\u003c/strong\u003e, e1004923 (2015).\u003c/li\u003e\n\u003cli\u003eMan, W. H., de Steenhuijsen Piters, W. A. A. \u0026amp; Bogaert, D. The microbiota of the respiratory tract: gatekeeper to respiratory health. \u003cem\u003eNature Reviews. Microbiology\u003c/em\u003e. \u003cstrong\u003e15\u003c/strong\u003e, 259-270 (2017).\u003c/li\u003e\n\u003cli\u003eDa, C. S. G., Shepherd, F. A. \u0026amp; Tsao, M. S. EGFR mutations and lung cancer. 
\u003cem\u003eAnnu Rev Pathol\u003c/em\u003e. \u003cstrong\u003e6\u003c/strong\u003e, 49-69 (2011).\u003c/li\u003e\n\u003cli\u003eSweeney, T. E., Braviak, L., Tato, C. M. \u0026amp; Khatri, P. Genome-wide expression for diagnosis of pulmonary tuberculosis: a multicohort analysis. \u003cem\u003eThe Lancet Respiratory Medicine\u003c/em\u003e. \u003cstrong\u003e4\u003c/strong\u003e, 213-224 (2016).\u003c/li\u003e\n\u003cli\u003eSchmiedel, D. \u0026amp; Mandelboim, O. NKG2D Ligands-Critical Targets for Cancer Immune Escape and Therapy. \u003cem\u003eFront Immunol\u003c/em\u003e. \u003cstrong\u003e9\u003c/strong\u003e, 2040 (2018).\u003c/li\u003e\n\u003cli\u003eGowen, B. G. et al. A forward genetic screen reveals novel independent regulators of ULBP1, an activating ligand for natural killer cells. \u003cem\u003eElife\u003c/em\u003e. \u003cstrong\u003e4\u003c/strong\u003e, (2015).\u003c/li\u003e\n\u003cli\u003eSchmall, A. et al. Macrophage and cancer cell cross-talk via CCR2 and CX3CR1 is a fundamental mechanism driving lung cancer. \u003cem\u003eAm J Respir Crit Care Med\u003c/em\u003e. \u003cstrong\u003e191\u003c/strong\u003e, 437-447 (2015).\u003c/li\u003e\n\u003cli\u003eJeffries, A. R. et al. beta-1,3-Glucuronyltransferase-1 gene implicated as a candidate for a schizophrenia-like psychosis through molecular analysis of a balanced translocation. \u003cem\u003eMol Psychiatry\u003c/em\u003e. \u003cstrong\u003e8\u003c/strong\u003e, 654-663 (2003).\u003c/li\u003e\n\u003cli\u003eLemaitre, C., Tsang, J., Bireau, C., Heidmann, T. \u0026amp; Dewannieux, M. A human endogenous retrovirus-derived gene that can contribute to oncogenesis by activating the ERK pathway and inducing migration and invasion. \u003cem\u003ePlos Pathog\u003c/em\u003e. \u003cstrong\u003e13\u003c/strong\u003e, e1006451 (2017).\u003c/li\u003e\n\u003cli\u003eJin, X. et al. The endogenous retrovirus-derived long noncoding RNA TROJAN promotes triple-negative breast cancer progression via ZMYND8 degradation. 
\u003cem\u003eSci Adv\u003c/em\u003e. \u003cstrong\u003e5\u003c/strong\u003e, eaat9820 (2019).\u003c/li\u003e\n\u003cli\u003eKitsou, K. et al. Upregulation of Human Endogenous Retroviruses in Bronchoalveolar Lavage Fluid of COVID-19 Patients. \u003cem\u003eMicrobiol Spectr\u003c/em\u003e. \u003cstrong\u003e9\u003c/strong\u003e, e126021 (2021).\u003c/li\u003e\n\u003cli\u003eWang, A. et al. Transcription factor complex AP-1 mediates inflammation initiated by Chlamydia pneumoniae infection. \u003cem\u003eCell. Microbiol.\u003c/em\u003e \u003cstrong\u003e15\u003c/strong\u003e, 779-794 (2013).\u003c/li\u003e\n\u003cli\u003eArancio, W. \u0026amp; Coronnello, C. Repetitive Sequence Transcription in Breast Cancer. \u003cem\u003eCells (Basel, Switzerland)\u003c/em\u003e. \u003cstrong\u003e11\u003c/strong\u003e, 2522 (2022).\u003c/li\u003e\n\u003cli\u003eLin, P. et al. A multicenter-retrospective cohort study of chromosome instability in lung cancer: clinical characteristics and prognosis of patients harboring chromosomal instability detected by metagenomic next-generation sequencing. \u003cem\u003eJ Thorac Dis\u003c/em\u003e. 
\u003cstrong\u003e15\u003c/strong\u003e, 112-122 (2023).\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"metagenomic next-generation sequencing, mNGS, lung cancer, pulmonary infections","lastPublishedDoi":"10.21203/rs.3.rs-3883914/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-3883914/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eRecent advances in unbiased metagenomic next-generation sequencing (mNGS) have enabled the simultaneous examination of both microbial and host genetic material in a single test. This study harnesses cost-effective bronchoalveolar lavage fluid (BALF) mNGS data from patients with lung cancer (n=123) and pulmonary infections (n=279). We developed a machine learning-based diagnostic approach to differentiate between these two conditions, which are often misdiagnosed in clinical settings. To ensure independence between model construction and validation, we divided the cohorts based on the collection dates of the samples. 
The training cohort (lung cancer, n=87; pulmonary infection, n=197) revealed distinct differences in DNA/RNA microbial composition, bacteriophage abundances, and host responses, including gene expression, transposable element levels, immune cell composition, and tumor fraction determined by copy number variation (CNV). These features, blinded to the validation cohort, were integrated into a host/microbe metagenomics-driven machine learning model (Model VI). The model demonstrated an Area Under the Curve (AUC) of 0.87 (95% CI = 0.857-0.883) in the training cohort and 0.831 (95% CI = 0.819-0.843) in the validation cohort for differentiating between patients with lung cancer and pulmonary infections. Applying a composite predictive model based on a rule-in and rule-out strategy significantly increased accuracy in distinguishing lung cancer from tuberculosis (ACC=0.913), fungal infection (ACC=0.955), and bacterial infection (ACC=0.836). These results underscore the potential of mNGS-based analysis as a valuable, cost-effective tool for the early differentiation of lung cancer from pulmonary infections, offering a comprehensive testing solution in a clinical context.\u003c/p\u003e","manuscriptTitle":"Metagenomic Analysis of Bronchoalveolar Lavage Fluid Enables Differential Diagnosis Between Lung Cancer and Pulmonary Infections","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-02-09 20:27:58","doi":"10.21203/rs.3.rs-3883914/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research 
Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"c4a416f2-469f-46f9-b10d-dffd6c86ebc3","owner":[],"postedDate":"February 9th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":28676096,"name":"Biological sciences/Microbiology/Infectious-disease diagnostics"},{"id":28676097,"name":"Biological sciences/Biological techniques/Sequencing/Next-generation sequencing"}],"tags":[],"updatedAt":"2024-07-05T11:26:04+00:00","versionOfRecord":[],"versionCreatedAt":"2024-02-09 20:27:58","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-3883914","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-3883914","identity":"rs-3883914","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
