Development and Validation of a Simplified Diagnostic Model for Lung Cancer Using Age, Smoking History, and Serum CEA

Research Square
Research Article
Shuling Wang, Guofu Lin, Jiefeng Huang, Lifang Fu, Yiming Chen, and 2 more
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9254199/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 17

Abstract
Background: Lung cancer is the leading cause of cancer-related death worldwide. Current diagnostic models frequently depend on imaging features or complex algorithms, restricting their applicability in settings with limited resources. This study aimed to develop and validate a simplified diagnostic model for lung cancer using readily available clinical variables.
Methods: This retrospective cohort study included 336 patients who underwent bronchoscopy at a tertiary hospital from January 2020 to June 2025. Patients were randomly assigned to training (n=235) and validation (n=101) sets. Candidate predictors included demographic characteristics, smoking history, comorbidities, and laboratory parameters. Variable selection and model development were performed using a combination of least absolute shrinkage and selection operator (LASSO) regression and multivariable logistic regression. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration plots, and decision curve analysis. Subgroup analyses, restricted cubic splines, and mediation analysis were conducted to explore underlying relationships.
Results: We identified three independent predictors: age ≥75 years (OR=6.03, 95% CI: 1.73-21.21), smoking history (OR=3.73, 95% CI: 1.87-7.41), and serum CEA (OR=1.27, 95% CI: 1.12-1.45). The model showed good discrimination in both the training set (AUC=0.84, 95% CI: 0.79-0.90) and validation set (AUC=0.85, 95% CI: 0.78-0.93), with well-calibrated predictions and positive net benefit across a range of threshold probabilities. At a cut-off of 4.57 ng/mL, serum CEA demonstrated high specificity (94%) but moderate sensitivity (52%). In contrast, carcinoembryonic antigen in bronchoalveolar lavage fluid (BALF-CEA) at 22.6 ng/mL showed higher sensitivity (71%) but lower specificity (57%). Mediation analysis revealed that serum CEA did not significantly mediate the relationship between smoking and lung cancer (indirect effect: 0.09, 95% CI: -0.02 to 0.22, P=0.118). The dose-response relationship between CEA and lung cancer risk was linear, with no evidence of significant threshold effects.
Conclusions: We developed a simplified diagnostic model based on age, smoking history, and serum CEA that accurately predicts lung cancer risk.
Its simplicity and use of readily available variables make it particularly suitable for primary care settings and large-scale screening programs, where it can serve as an effective triage tool to identify high-risk individuals for subsequent low-dose CT examination.

Keywords: lung cancer, prediction model, carcinoembryonic antigen, smoking, diagnosis, nomogram

1 Introduction
Lung cancer carries the highest burden of cancer-related morbidity and mortality worldwide, with a particularly substantial impact in China [1, 2], where approximately 1.06 million new cases and 733,300 deaths were recorded in 2022, accounting for 21.98% of all new cancer diagnoses and 28.49% of total cancer mortality, respectively [3]. Currently, a definitive diagnosis of lung cancer requires invasive tissue biopsy (eg, bronchoscopy or needle biopsy). The accuracy of these procedures depends on the operator's experience, specimen quality, and laboratory conditions [4]. Moreover, these procedures are not readily accessible in primary care settings. Although minimally invasive, they can cause patient discomfort and carry risks such as pneumothorax and bleeding [5]. Low-dose computed tomography (LDCT) enables early detection of pulmonary lesions, but its high false-positive rate often leads to unnecessary follow-up and over-investigation, wasting healthcare resources [6, 7]. Therefore, developing a non-invasive, low-cost, and easily accessible ancillary diagnostic tool is of significant practical importance for early lung cancer diagnosis. Clinical prediction models are statistical tools that use clinical data to improve the accuracy and efficiency of medical decision-making [8]. These models are now widely used for risk prediction and prognostic assessment in lung cancer [9], colorectal cancer [10], and other malignancies [11, 12].
Traditional diagnostic models for lung cancer, such as the Mayo model [ 13 ], Brock model [ 14 ], and Peking University (PKUPH) model [ 15 ], rely mainly on radiographic features (eg, spiculation, lobulation) for discrimination. However, identifying these features depends heavily on the radiologist's experience, which inevitably introduces inter-observer variability [ 16 ] and compromises model objectivity and consistency. Recent advances in artificial intelligence (AI) have enabled machine learning and deep learning to be widely applied for automated analysis and feature extraction from imaging data. These approaches have significantly improved the ability of models to differentiate benign from malignant pulmonary nodules and assess their invasiveness. For example, models based on deep convolutional neural networks (CNNs) can extract subtle features from high-resolution CT images that are imperceptible to the human eye, achieving excellent performance in distinguishing invasive from non-invasive ground-glass nodules (AUC up to 0.944) [ 17 ]. However, the high performance of these models often requires substantial computational resources and complex architectures, creating technical barriers and implementation costs that may limit their adoption in resource-constrained settings [ 18 ]. Moreover, most AI models do not systematically incorporate patients’ clinical characteristics and laboratory data, serving only as ancillary references rather than supporting comprehensive decision-making [ 19 ]. Carcinoembryonic antigen (CEA) is a classic broad-spectrum tumor marker. Since its association with lung cancer was first reported in 1981, it has been widely used for adjunctive diagnosis, treatment monitoring, and prognosis assessment in lung cancer [ 20 , 21 ]. As a simple and cost-effective serological marker, CEA offers unique advantages in clinical accessibility, making it particularly suitable for use in primary healthcare settings. 
To address these limitations, we aimed to develop a novel diagnostic prediction model for lung cancer using real-world clinical data. Our goal was to achieve robust discriminative performance while minimizing the model's complexity and computational requirements. This model is intended to serve as an efficient, convenient, and cost-effective tool for early ancillary lung cancer diagnosis in primary care settings and large-scale screening programs, thereby improving the clinical applicability and accessibility of early lung cancer detection.

2 Methods
2.1 Study Design
This single-center, retrospective cohort study was designed to develop and validate a clinical prediction model for the ancillary diagnosis of lung cancer. The study was conducted in accordance with the principles of the Declaration of Helsinki (revised 2013) [22] and was approved by the Ethics Committee of the First Affiliated Hospital of Fujian Medical University (Approval No. 2024-408). Due to the retrospective design and use of anonymized data, the requirement for individual patient informed consent was waived by the ethics committee.

2.2 Study Population
Figure 1 illustrates the patient selection process. Data were obtained from the electronic medical record system of the First Affiliated Hospital of Fujian Medical University. We initially enrolled 800 consecutive patients who underwent bronchoscopy between January 2020 and June 2025. The lung cancer group included patients with newly diagnosed primary lung cancer confirmed by histopathological examination (bronchoscopic biopsy, percutaneous lung biopsy, or surgical resection) who had not received any prior anti-tumor therapy. Diagnoses followed the 5th edition of the World Health Organization (WHO) Classification of Lung Tumors [23]. The control group consisted of patients with benign pulmonary diseases confirmed by pathology or clinical follow-up (≥12 months).
Exclusion criteria were: absolute contraindications to bronchoscopy (eg, active massive hemoptysis, severe coagulation disorders, severe or unstable cardiovascular disease); concomitant severe hepatic or renal insufficiency, severe pneumonia, or other conditions that could affect laboratory parameters; a history of other primary malignancies or pulmonary metastatic tumors; and missing critical clinical or laboratory data. Ultimately, 336 eligible patients were included in the final analysis. No data were missing for any of the variables included in the final analysis, as patients with missing critical clinical or laboratory data had been excluded.

2.3 Data Collection
Clinical data for all enrolled patients were retrospectively collected from the hospital's electronic medical record system. These data included: (1) demographic characteristics (age, sex, body mass index [BMI], marital status); (2) clinical features (smoking history, hypertension, diabetes, hyperlipidemia, cardiovascular disease); and (3) laboratory parameters (serum CEA, BALF CEA, complete blood count [white blood cell count, neutrophil count, lymphocyte count, platelet count, hemoglobin, red blood cell count], liver function tests [lactate dehydrogenase, gamma-glutamyl transferase, aspartate aminotransferase, alanine aminotransferase, total bilirubin, total protein, albumin], renal function tests [urea, creatinine, uric acid], lipid profile [high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, total cholesterol, triglycerides], and D-dimer). All laboratory tests were performed on specimens collected at the time of initial admission, strictly following standard operating procedures. After collection, blood samples were immediately centrifuged to separate serum, which was then stored at −80°C until analysis.
BALF specimens were placed on ice immediately after collection for transport and processed within 2 hours; the resulting supernatant was aliquoted and stored at −80°C until use.

2.4 Statistical Analysis
Statistical analyses were performed using R software (version 4.5.0) and the Zstats online platform (www.medsta.cn/software). Categorical variables are presented as frequencies (percentages) and were compared using the χ² test or Fisher's exact test. Normally distributed continuous variables are expressed as mean ± standard deviation and were compared using the t-test. Non-normally distributed variables are expressed as median (interquartile range) and were compared using the Mann-Whitney U test or Kruskal-Wallis test. All tests were two-sided, and P < 0.05 was considered statistically significant. Sample size was estimated based on the events-per-variable (EPV) rule. With up to 10 candidate predictors and an expected lung cancer prevalence of 43% in patients undergoing bronchoscopy, a minimum of 100 lung cancer cases was required to achieve an EPV of 10 [24]. The final cohort of 336 patients (including 145 lung cancer cases) satisfied this requirement. Patients were randomly divided into training and internal validation sets in a 7:3 ratio. In the training set, we performed initial variable selection using least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation. Variables selected by LASSO were then analyzed using univariable logistic regression. Those with P < 0.05 were entered into a multivariable logistic regression model to identify independent predictors of lung cancer. The final model was constructed based on the multivariable regression results and is presented as a nomogram. Model discrimination was assessed using the area under the receiver operating characteristic curve (AUC).
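The sample-size reasoning and the 7:3 split described above can be checked with a few lines of arithmetic. The following is a minimal Python sketch (the authors used R); the function names are hypothetical, and rounding the training-set size to the nearest integer is an assumption that happens to reproduce the reported 235/101 split.

```python
import math


def required_events(n_predictors: int, epv: int = 10) -> int:
    """Minimum number of outcome events under the events-per-variable rule."""
    return n_predictors * epv


def required_sample_size(n_predictors: int, prevalence: float, epv: int = 10) -> int:
    """Total patients needed so that the expected event count meets the EPV rule."""
    return math.ceil(required_events(n_predictors, epv) / prevalence)


# Values taken from the text: 10 candidate predictors, EPV of 10,
# expected prevalence of 43%, and a final cohort of 336 with 145 cases.
events_needed = required_events(10)
patients_needed = required_sample_size(10, 0.43)
achieved_epv = 145 / 10  # full-cohort EPV for 10 predictors

# 7:3 split of the 336 eligible patients (rounding is an assumption).
n_train = round(336 * 0.7)
n_valid = 336 - n_train

print(events_needed, patients_needed, achieved_epv, n_train, n_valid)
```

Note that the achieved EPV here refers to the full cohort; the number of lung cancer cases in the training set alone is not stated in this section.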
Model performance was evaluated in both the training and validation sets using ROC curves, calibration plots, the Hosmer-Lemeshow test, and decision curve analysis (DCA). We also developed a multinomial logistic regression model to evaluate discrimination among histological subtypes (adenocarcinoma, squamous cell carcinoma, and small cell lung cancer). For additional analyses, we used restricted cubic splines to examine the dose-response relationship between CEA and lung cancer risk and piecewise linear regression to test for potential threshold effects. Subgroup analyses were conducted to assess the robustness of the association between serum CEA and lung cancer. Mediation analysis with 1000 bootstrap resamples was performed to determine whether smoking indirectly increases lung cancer risk through its effect on serum CEA levels. Finally, we conducted a sensitivity analysis by re-developing the model after excluding a histological subtype with a very small sample size from the training set to test the robustness of the original model.

3 Results
3.1 Baseline Patient Characteristics
Figure 1 shows the patient selection flowchart. We initially enrolled 800 patients who underwent bronchoscopy at our institution between January 2020 and June 2025. After applying the exclusion criteria, 464 patients were excluded, leaving 336 eligible patients. These were randomly assigned to the training set (n=235) and validation set (n=101) in a 7:3 ratio. Supplementary Table 1 compares baseline characteristics between the training and validation sets. Except for the prevalence of cardiovascular disease (CVD) and high-density lipoprotein (HDL) levels (both P < 0.05), baseline characteristics did not differ significantly between the two sets (P > 0.05). Tables 1 and 2 detail the demographic, clinical, and laboratory characteristics of patients in the training set.
We observed significant differences between the lung cancer subtypes and the control group in sex, age, marital status, smoking status, serum CEA, BALF CEA, BALF-CEA/serum CEA ratio, lactate dehydrogenase (LDH) and D-dimer levels (all P < 0.05). These differences provide a rationale for developing subtype-specific diagnostic models. 3.2 Predictor Selection To identify the most parsimonious set of predictors and reduce the risk of overfitting, we used a two-step variable selection method. As shown in Figure 2, LASSO regression with 10-fold cross-validation initially selected nine predictors with non-zero coefficients from the 33 candidate variables: age, marital status, smoking status, serum CEA, BALF CEA, the BALF-CEA/serum CEA ratio, white blood cell count (WBC), albumin (ALB) and HDL. These nine variables were then analyzed using univariable logistic regression. Variables with P < 0.05 were entered into a multivariable logistic regression model for confirmation (Table 3). The multivariable analysis showed that age ≥75 years (OR=6.03, 95% CI: 1.73-21.21), smoking (OR=3.73, 95% CI: 1.87-7.41), and serum CEA (OR=1.27, 95% CI: 1.12-1.45) were independent predictors of lung cancer. To develop a model capable of distinguishing among histological subtypes, we performed multinomial logistic regression analysis. To ensure model stability, we excluded a subtype with a very small sample size (type 4, n=5) from this analysis. Subsequent sensitivity analysis confirmed that this exclusion did not affect overall model performance (Supplementary Table 7). To increase statistical power, we combined the age categories with small sample sizes ("<55 years" and "55-64 years") into a new reference group (<65 years). The final multinomial model revealed several subtype-specific predictors (Table 4). For adenocarcinoma, log-transformed serum CEA was the strongest predictor (OR=6.576, P <0.001). 
For small cell lung cancer (SCLC), age ≥75 years (OR=5.113, P =0.010), smoking (OR=4.367, P =0.011), and serum CEA (OR=5.001, P <0.001) were significant predictors. For squamous cell carcinoma, age ≥75 years (OR=3.885, P =0.027), smoking (OR=5.489, P <0.001), and serum CEA (OR=2.645, P =0.007) also showed independent associations. 3.3 Model Development and Evaluation Based on these findings, we constructed a diagnostic model using age, smoking status and serum CEA. The model is presented as a nomogram (Figure 3). The nomogram assigns points to each risk factor; the total score can be used to estimate an individual’s probability of having lung cancer, with higher scores indicating greater risk. The model showed excellent discrimination in both the training and validation sets. ROC curve analysis (Figure 4A, 4D) yielded an AUC of 0.84 (95% CI: 0.79-0.90) in the training set and 0.85 (95% CI: 0.78-0.93) in the validation set. Using the Youden index, we determined the optimal probability threshold to be 0.415 in the training set, which gave a sensitivity of 0.83 and specificity of 0.78. When this threshold was applied to the validation set, the model maintained a sensitivity of 0.79 and specificity of 0.76. Detailed performance metrics are presented in Table 5. Calibration plots showed good agreement between predicted probabilities and observed outcomes in both sets (Figure 4B, 4E). Decision curve analysis (DCA) showed that using the model for clinical decision-making provided a higher net benefit than "treat-all" or "treat-none" strategies across a wide range of threshold probabilities (Figure 4C, 4F), indicating favorable clinical utility. Notably, the combined model (AUC=0.84) outperformed models using single biomarkers, including serum CEA alone (AUC=0.78), BALF CEA alone (AUC=0.65), and their ratio (AUC=0.57) (Supplementary Table 2). The combined model increased sensitivity from 0.71 to 0.83 while maintaining high specificity. 
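The Youden index used above to choose the probability threshold is J = sensitivity + specificity − 1, maximized over candidate cut-offs. The sketch below, written in Python rather than the R used by the authors, illustrates the idea on a small made-up set of predicted probabilities; the function name `youden_threshold` and the toy data are hypothetical and are not taken from the study.

```python
def youden_threshold(probs, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    A case is called positive when its predicted probability is >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < t and y == 0)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Made-up predicted probabilities and true labels (1 = lung cancer).
toy_probs = [0.10, 0.20, 0.35, 0.40, 0.45, 0.55, 0.60, 0.80, 0.90]
toy_labels = [0, 0, 0, 1, 0, 0, 1, 1, 1]
thr, sens, spec = youden_threshold(toy_probs, toy_labels)

# Youden's J implied by the sensitivity and specificity reported in the text.
j_training = round(0.83 + 0.78 - 1, 2)
j_validation = round(0.79 + 0.76 - 1, 2)
print(thr, sens, spec, j_training, j_validation)
```

Applying the training-set threshold unchanged to the validation set, as the authors did, avoids re-optimizing the cut-off on the data used to assess it.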
The subtype-specific models also showed balanced and robust discrimination. The AUCs for distinguishing adenocarcinoma, squamous cell carcinoma, and SCLC from the control group were 0.84 (95% CI: 0.76-0.92), 0.84 (95% CI: 0.77-0.92), and 0.83 (95% CI: 0.72-0.93), respectively. The optimal subtype-specific thresholds (by Youden index) were 0.404 for adenocarcinoma, 0.415 for squamous cell carcinoma, and 0.442 for SCLC. Detailed performance metrics at these thresholds are shown in Table 6. The DeLong test showed no significant differences in AUCs among the subtype models (all P > 0.05, Table 7), indicating consistent performance across subtypes. 3.4 Diagnostic Performance of Serum and BALF CEA As shown in Supplementary Table 2, serum CEA and BALF CEA differed significantly in their diagnostic performance. At a cut-off of 4.57 ng/mL, serum CEA showed high specificity (94%) and positive predictive value (83%) but relatively low sensitivity (52%). In contrast, at a cut-off of 22.6 ng/mL, BALF CEA had higher sensitivity (71%) but lower specificity (57%) and positive predictive value (49%). The BALF-CEA/serum CEA ratio performed even more poorly. 3.5 Association Between CEA and Lung Cancer Risk To characterize the relationship between CEA and lung cancer risk, we first performed restricted cubic spline analysis. After logarithmic transformation, both serum CEA and BALF CEA showed a predominantly linear association with lung cancer probability, with no significant non-linear trends (Figure 5A, 5B). Models were adjusted for age, BMI, sex, marital status, smoking, hypertension, diabetes, hyperlipidemia, and CVD. Piecewise linear regression was further employed to explore potential threshold effects (Supplementary Tables 3 and 4). For log-transformed serum CEA, the likelihood ratio test showed no significant inflection point ( P =0.109), indicating a continuously linear association across the measurement range (Figure 6A). 
For BALF CEA, a potential inflection point was observed at 3.53, but the likelihood ratio test was not significant ( P =0.087, Figure 6B), providing insufficient evidence for a threshold effect. Subgroup analyses confirmed the robustness of the association between serum CEA and lung cancer risk (Figure 7). The association was consistent across subgroups defined by age, sex, BMI, smoking status, and comorbidities, with no significant interactions (all P for interaction > 0.05). Although the interactions were not significant, serum CEA showed a stronger effect in patients aged ≥65 years (OR 1.35 vs 1.25) and those with a smoking history (OR 1.40 vs 1.22). These findings support the broad utility of serum CEA as a lung cancer biomarker. 3.6 Mediation Analysis To determine whether smoking increases lung cancer risk indirectly by affecting serum CEA levels, we performed mediation analysis. All models were adjusted for age, BMI, sex, marital status, hypertension, diabetes, hyperlipidemia, and CVD. Path analysis (Figure 8) showed that smoking had a significant direct effect on lung cancer risk (β=0.99, P =0.024). There was a positive but non-significant association between smoking and serum CEA (β=17.30, P =0.133). Serum CEA was significantly associated with lung cancer risk (β=0.25, P <0.001). Effect decomposition (Table 8) showed that the total effect of smoking on lung cancer was borderline significant (coefficient: 0.12, 95% CI: -0.00 to 0.25, P =0.056). The indirect effect mediated by serum CEA was not significant (coefficient: 0.09, 95% CI: -0.02 to 0.22, P =0.118). The direct effect, independent of serum CEA, was smaller but significant (coefficient: 0.02, 95% CI: 0.00-0.05, P =0.024). The proportion mediated was 82.08% based on point estimates, but the 95% confidence interval was wide and included zero (-20.25% to 127.00%), consistent with the non-significant indirect effect. 
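The effect decomposition above follows the usual relation in which the proportion mediated is the indirect effect divided by the total effect, and an effect is treated as non-significant when its confidence interval includes zero. The Python sketch below recomputes this from the rounded coefficients reported in the text; the function names are hypothetical. With rounded inputs the ratio comes out at roughly 75% rather than the reported 82.08%, which was presumably calculated from unrounded estimates (an assumption, since the unrounded values are not given here).

```python
def proportion_mediated(indirect: float, total: float) -> float:
    """Share of the total effect that passes through the mediator."""
    if total == 0:
        raise ValueError("total effect is zero; proportion mediated is undefined")
    return indirect / total


def ci_excludes_zero(lower: float, upper: float) -> bool:
    """True when a confidence interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


# Rounded coefficients as reported in the text.
total_effect = 0.12
indirect_effect = 0.09

pm = proportion_mediated(indirect_effect, total_effect)
indirect_significant = ci_excludes_zero(-0.02, 0.22)
pm_significant = ci_excludes_zero(-20.25, 127.00)

print(f"proportion mediated from rounded values: {pm:.0%}")
print(indirect_significant, pm_significant)
```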
In summary, we found no evidence that serum CEA mediates the association between smoking and lung cancer in this cohort. 3.7 Sensitivity Analysis To assess model robustness and the potential impact of the small-sample subtype (Type 4, n=5), we performed a sensitivity analysis. We removed all Type 4 patients from the training set; baseline characteristics of the remaining 230 patients are shown in Supplementary Table 5. The validation set (n=101) was unchanged to provide a consistent benchmark. In the reduced training set, LASSO regression selected a slightly different set of candidate variables: sex, age, smoking status, serum CEA, diabetes, red blood cell count (RBC), alanine aminotransferase (ALT), LDL, and total cholesterol (TC) (Supplementary Figure 1). Importantly, age, smoking status, and serum CEA were consistently selected. After univariable and multivariable logistic regression (Supplementary Table 6), age, smoking history, and serum CEA remained significant independent predictors (all P < 0.05). The direction and magnitude of their coefficients were consistent with the original model. The model rebuilt on these core variables showed performance in the validation set that closely matched the original model (Supplementary Table 7). Both models had identical AUCs (0.85, 95% CI: 0.78-0.93) and accuracy (0.78). The sensitivity analysis model had slightly higher sensitivity (0.81 vs 0.79) and slightly lower specificity (0.73 vs 0.76), but its optimal cut-off was very close to the original (0.413 vs 0.415). Confidence intervals for positive and negative predictive values overlapped substantially. These results indicated that our model is robust and that its performance was not affected by the small-sample subtype, supporting the reliability of our conclusions. 4. Discussion In this study, we developed and validated a diagnostic prediction model for lung cancer that integrates age, smoking history, and serum CEA using data from a single-center cohort. 
The model showed good discrimination (AUC 0.84 in training, 0.85 in validation) and clinical utility, with consistent performance across histological subtypes, suggesting good generalizability. This internally validated parsimonious model addresses the need for accessible diagnostic tools in resource-limited settings, complementing existing imaging-based strategies. Smoking is the most important environmental risk factor for lung cancer [25], accounting for an estimated 85-90% of cases [26]. Smokers have a 15- to 30-fold higher risk of developing lung cancer than never-smokers, with a clear dose-response relationship [27, 28]. The carcinogenic mechanisms of smoking are complex and involve direct genetic damage, immune microenvironment remodeling, and epigenetic regulation. Specifically, tobacco smoke carcinogens such as Nicotine-derived Nitrosamine Ketone (NNK) and polycyclic aromatic hydrocarbons can directly or indirectly induce DNA mutations [29]. In parallel, smoke can induce PD-L1 expression on lung epithelial cells, promoting immune escape [30]. Smoking can also drive tumorigenesis through epigenetic alterations, including hypomethylation of specific genes [31]. In our study, smoking was the strongest predictor for squamous cell carcinoma (OR=5.489) and SCLC (OR=4.367) but was weaker for adenocarcinoma. This difference likely reflects the distinct anatomical origins and molecular mechanisms of these subtypes. Squamous cell carcinoma and SCLC typically arise in the central airways, which are directly exposed to tobacco smoke. Their development often follows a cumulative damage model, with continuous DNA damage leading to inactivation of tumor suppressor genes such as TP53 and RB1 [32, 33]. In contrast, adenocarcinomas often arise in the peripheral lung and are driven by specific oncogenic alterations such as EGFR mutations and ALK fusions [34, 35]. 
These molecular events are less strongly linked to tobacco exposure [36] and occur more often in never-smokers and women [37, 38]. Together, these factors weaken the predictive value of smoking for adenocarcinoma. Age is another critical risk factor for lung cancer, independent of smoking [39]. We found that age ≥75 years was significantly associated with lung cancer risk (OR=6.03). This finding aligns with the overarching epidemiological trend that over 95% of lung cancer cases are diagnosed after age 50 [40], with incidence rates continuously rising with age, and approximately half of patients are diagnosed after age 65. A recent study of disease burden in China reported increasing lung cancer incidence in the over-70 age group [41]. The biological basis involves multiple aging processes including cumulative carcinogen exposure, declining DNA repair capacity, and waning immune surveillance [42-44]. However, aging rates vary among individuals, and chronological age alone may not fully explain risk heterogeneity among same-aged individuals [45]. Biological age (BA), quantified using multiple biomarkers (eg, Klemera-Doubal method [46], PhenoAge [47], homeostatic dysregulation [48]), may better reflect an individual’s physiological state and risk for age-related diseases and mortality. Studies have demonstrated significant associations between these composite measures and lung cancer risk [49]. Future integration of these more precise aging measures into prediction models may enable finer risk stratification among individuals of the same chronological age. Our study also clarifies the distinct roles of serum and BALF CEA in lung cancer diagnosis. Serum CEA at 4.57 ng/mL showed high specificity (94%) and positive predictive value (83%) but limited sensitivity (52%). An elevated serum CEA level is a useful indicator of lung cancer, particularly adenocarcinoma (OR=6.576, P <0.001); however, a normal level is insufficient to exclude the disease. 
This finding aligns with recent reports by Lv et al. [50] and Kuo et al. [51]. In contrast, BALF CEA at 22.6 ng/mL had higher sensitivity (71%) but lower specificity (57%) and positive predictive value (49%). Sanguinetti et al. similarly reported that BALF CEA, while more sensitive than serum CEA, had lower specificity and was susceptible to interference from benign lung disease, limiting its clinical utility [52]. Moreover, BALF CEA measurement is invasive, susceptible to local inflammation [53], and lacks standardized protocols [54], making it of limited value as a standalone test. However, because BALF CEA directly reflects local tumor information, it may still offer complementary value in complex cases such as peripheral lung cancer [55]. Although some studies have suggested that combining serum and BALF CEA could improve diagnostic accuracy [56], the ratio of the two performed poorly in our study. A similar phenomenon was observed by Sanguinetti et al. [52], who found no significant difference in the BALF-CEA/serum CEA ratio between patients with lung cancer and those with benign lung disease. This phenomenon may be related to the inconsistency in the biological information reflected by these two markers and the confounding factors inherent in BALF CEA measurement, suggesting that this ratio should be interpreted with caution in clinical practice. The mediation analysis in this study did not find a significant mediating role for serum CEA in the association between smoking and lung cancer. Although smoking may influence CEA levels through pathways such as chronic inflammation [57, 58], the association between them did not reach statistical significance in our study ( P =0.133), which is inconsistent with some previous reports [59, 60]. This discrepancy may reflect our crude smoking assessment, limited sample size, and the high proportion of adenocarcinomas, in which CEA is less strongly linked to smoking [61]. 
This result suggests that the primary pathways of smoking-induced carcinogenesis may be more focused on its direct genotoxic effects and remodeling of the tumor microenvironment [29-31], whereas CEA elevation is more likely a concomitant phenomenon following tumor development rather than a key mediator of the smoking-carcinogenesis process. We also found a continuous linear relationship between serum CEA and lung cancer risk, with no clear threshold effect. This supports the use of serum CEA as a continuous variable in prediction models, thereby capturing its risk information and avoiding the information loss caused by dichotomization.

This study has several limitations. First, its single-center retrospective design and relatively small sample size may limit generalizability. Although internal validation and sensitivity analyses supported robustness, external validation in diverse populations is needed. Second, to maintain simplicity and accessibility, we did not include CT imaging features. In clinical practice, this model could serve as an initial screening tool to identify high-risk individuals for subsequent low-dose CT [62, 63], enabling a stratified, sequential screening approach.

In conclusion, we developed and validated a parsimonious diagnostic model for lung cancer using smoking history, age, and serum CEA. The model uses readily accessible variables and is easy to use, providing a practical and reliable new tool for early lung cancer diagnosis in primary care and resource-limited settings.
Abbreviations
ALB: Albumin; ALT: Alanine aminotransferase; AST: Aspartate aminotransferase; AUC: Area under the receiver operating characteristic curve; BA: Biological age; BALF: Bronchoalveolar lavage fluid; BMI: Body mass index; CEA: Carcinoembryonic antigen; CI: Confidence interval; CNN: Convolutional neural network; Crea: Creatinine; CVD: Cardiovascular disease; DCA: Decision curve analysis; GGT: Gamma-glutamyl transferase; Hb: Hemoglobin; HDL: High-density lipoprotein cholesterol; KDM: Klemera-Doubal method; LASSO: Least absolute shrinkage and selection operator; LDCT: Low-dose computed tomography; LDH: Lactate dehydrogenase; LDL: Low-density lipoprotein cholesterol; LYMPH: Lymphocyte count; M: Median; MCMC: Markov chain Monte Carlo; NEUT: Neutrophil count; NNK: Nicotine-derived nitrosamine ketone; NPV: Negative predictive value; OR: Odds ratio; PLT: Platelet count; PPV: Positive predictive value; Q₁: First quartile; Q₃: Third quartile; RBC: Red blood cell count; RCS: Restricted cubic spline; ROC: Receiver operating characteristic; S.E.: Standard error; SCLC: Small cell lung cancer; SD: Standard deviation; TBIL: Total bilirubin; TC: Total cholesterol; TG: Triglycerides; TP: Total protein; UA: Uric acid; WBC: White blood cell count; WHO: World Health Organization

Declarations

Ethics approval and consent to participate
This study was conducted in accordance with the Declaration of Helsinki (revised 2013) and was approved by the Ethics Committee of the First Affiliated Hospital of Fujian Medical University (Approval No. 2024-408). The same committee waived the need for individual patient informed consent because of the retrospective design and the use of anonymized data.

Consent for publication
Not applicable.

Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
Competing interests
The authors declare that they have no competing interests.

Funding
This work was supported by the High-level Talent Funding Project of the First Affiliated Hospital of Fujian Medical University (grant number: YJRC4187) and the Young and Middle-Aged Health Professionals Training Program of Fujian Province (grant number: 2025GGB07).

Authors' contributions
Wang SL designed the study and wrote the manuscript; Wang SL, Fu LF, Chen YM, and Zeng AM collected the clinical data and performed the statistical analyses; Lin GF, Huang JF, and Wang BY contributed to data interpretation and manuscript revision. All authors reviewed and approved the final version of the manuscript.

References
1. Sun, X.F., et al., Immune-cell signatures of persistent inflammation, immunosuppression, and catabolism syndrome after sepsis. Med, 2025. 6(5): p. 100569.
2. Zhang, X., et al., Economic Burden for Lung Cancer Survivors in Urban China. Int J Environ Res Public Health, 2017. 14(3).
3. Han, B., et al., Cancer incidence and mortality in China, 2022. J Natl Cancer Cent, 2024. 4(1): p. 47–53.
4. Nooreldeen, R. and H. Bach, Current and Future Development in Lung Cancer Diagnosis. Int J Mol Sci, 2021. 22(16).
5. Sheng, A., et al., Diagnostic Efficacy of CT Radiomic Features in Pulmonary Invasive Mucinous Adenocarcinoma. Scanning, 2022. 2022: p. 5314225.
6. Toyoda, Y., et al., Sensitivity and specificity of lung cancer screening using chest low-dose computed tomography. Br J Cancer, 2008. 98(10): p. 1602–7.
7. Xie, D., et al., Overdiagnosis of Lung Cancer Due to the Introduction of Low-Dose Computed Tomography in Average-Risk Populations in the People's Republic of China. J Thorac Oncol, 2025. 20(7): p. 884–896.
8. Binuya, M.A.E., et al., Methodological guidance for the evaluation and updating of clinical prediction models: a systematic review. BMC Med Res Methodol, 2022. 22(1): p. 316.
9. Park, B., et al., Risk-based prediction model for selecting eligible population for lung cancer screening among ever smokers in Korea. Transl Lung Cancer Res, 2021. 10(12): p. 4390–4402.
10. Cooper, J.A., et al., The use of electronic healthcare records for colorectal cancer screening referral decisions and risk prediction model development. BMC Gastroenterol, 2020. 20(1): p. 78.
11. Zhou, X., et al., Predicting Cancer-Specific Survival Among Patients With Prostate Cancer After Radical Prostatectomy Based on the Competing Risk Model: Population-Based Study. Front Surg, 2021. 8: p. 770169.
12. An, P., et al., Prognostic Predicting Model of Pancreatic Body Tail Carcinoma Using Clinical and CT Radiomic Data. Technol Cancer Res Treat, 2023. 22: p. 15330338231186739.
13. Swensen, S.J., et al., The probability of malignancy in solitary pulmonary nodules. Application to small radiologically indeterminate nodules. Arch Intern Med, 1997. 157(8): p. 849–55.
14. McWilliams, A., et al., Probability of cancer in pulmonary nodules detected on first screening CT. N Engl J Med, 2013. 369(10): p. 910–9.
15. Li, Y., et al., Establishment of a mathematical prediction model to evaluate the probability of malignancy or benign in patients with solitary pulmonary nodules. Beijing Da Xue Xue Bao Yi Xue Ban, 2011. 43(3): p. 450–4.
16. Nair, A., et al., Variable radiological lung nodule evaluation leads to divergent management recommendations. Eur Respir J, 2018. 52(6).
17. Ardila, D., et al., End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat Med, 2019. 25(6): p. 954–961.
18. Liu, Y., et al., RSDCNet: An efficient and lightweight deep learning model for benign and malignant pathology detection in breast cancer. Digit Health, 2025. 11: p. 20552076251336286.
19. Xiong, Y. and N.H. Lu, Research progress on risk factors and benign-malignant prediction models for pulmonary nodules. Zhongguo Yi Yao Ke Xue, 2022. 12(23): p. 35–38, 42.
20. Carcinoembryonic antigen: its role as a marker in the management of cancer. Summary of an NIH consensus statement. Br Med J (Clin Res Ed), 1981. 282(6261): p. 373–5.
21. Grunnet, M. and J.B. Sorensen, Carcinoembryonic antigen (CEA) as tumor marker in lung cancer. Lung Cancer, 2012. 76(2): p. 138–43.
22. World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects. JAMA, 2013. 310(20): p. 2191–4.
23. WHO Classification of Tumours Editorial Board, Thoracic Tumours. WHO classification of tumours series. Vol. 5. 2021, Lyon, France: International Agency for Research on Cancer.
24. Peduzzi, P., et al., A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol, 1996. 49(12): p. 1373–9.
25. Kehrle, K., M. Hetjens, and S. Hetjens, Risk Factors and Preventive Measures for Lung Cancer in the European Union. Epidemiologia (Basel), 2024. 5(3): p. 539–546.
26. Freedman, N.D., et al., Cigarette smoking and subsequent risk of lung cancer in men and women: analysis of a prospective cohort study. Lancet Oncol, 2008. 9(7): p. 649–56.
27. Peto, J., That the effects of smoking should be measured in pack-years: misconceptions 4. Br J Cancer, 2012. 107(3): p. 406–7.
28. National Center for Chronic Disease Prevention and Health Promotion (US) Office on Smoking and Health, Reports of the Surgeon General, in The Health Consequences of Smoking—50 Years of Progress: A Report of the Surgeon General. 2014, Centers for Disease Control and Prevention (US): Atlanta (GA).
29. Ewa, B. and M. Danuta, Polycyclic aromatic hydrocarbons and PAH-related DNA adducts. J Appl Genet, 2017. 58(3): p. 321–330.
30. Wang, G.Z., et al., The Aryl hydrocarbon receptor mediates tobacco-induced PD-L1 expression and is associated with response to immunotherapy. Nat Commun, 2019. 10(1): p. 1125.
31. Gao, X., et al., Tobacco smoking and methylation of genes related to lung cancer development. Oncotarget, 2016. 7(37): p. 59017–59028.
32. Toyooka, S., T. Tsuda, and A.F. Gazdar, The TP53 gene, tobacco exposure, and lung cancer. Hum Mutat, 2003. 21(3): p. 229–39.
33. Drapkin, B.J. and A.F. Farago, Unexpected Synergy Reveals New Therapeutic Strategy in SCLC. Trends Pharmacol Sci, 2019. 40(5): p. 295–297.
34. Paez, J.G., et al., EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science, 2004. 304(5676): p. 1497–500.
35. Soda, M., et al., Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature, 2007. 448(7153): p. 561–6.
36. Zhang, R., L. Dong, and J. Yu, Concomitant Pathogenic Mutations and Fusions of Driver Oncogenes in Tumors. Front Oncol, 2020. 10: p. 544579.
37. Dmitrieva, A.M., I.G. Kocak, and L. Meder, Aberrations in the glycosylation of receptor tyrosine kinases: A focus on lung adenocarcinoma. Cytojournal, 2025. 22: p. 62.
38. Chapman, A.M., et al., Lung cancer mutation profile of EGFR, ALK, and KRAS: Meta-analysis and comparison of never and ever smokers. Lung Cancer, 2016. 102: p. 122–134.
39. Krist, A.H., et al., Screening for Lung Cancer: US Preventive Services Task Force Recommendation Statement. JAMA, 2021. 325(10): p. 962–970.
40. Bray, F., et al., Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin, 2024. 74(3): p. 229–263.
41. Zhang, A.Q., H.Y. Li, and D.Y. Dai, Age-period-cohort analysis and prediction of lung cancer burden in China from 1992 to 2021. Lin Chuang Fei Ke Za Zhi, 2025. 30(5): p. 649–655.
42. Sharma, G., N.A. Hanania, and Y.M. Shim, The aging immune system and its relationship to the development of chronic obstructive pulmonary disease. Proc Am Thorac Soc, 2009. 6(7): p. 573–80.
43. Calcinotto, A., et al., Cellular Senescence: Aging, Cancer, and Injury. Physiol Rev, 2019. 99(2): p. 1047–1078.
44. Cho, S.J. and H.W. Stout-Delgado, Aging and Lung Disease. Annu Rev Physiol, 2020. 82: p. 433–459.
45. Shi, L., et al., ResnetAge: A Resnet-Based DNA Methylation Age Prediction Method. Bioengineering (Basel), 2023. 11(1).
46. Klemera, P. and S. Doubal, A new approach to the concept and computation of biological age. Mech Ageing Dev, 2006. 127(3): p. 240–8.
47. Levine, M.E., et al., An epigenetic biomarker of aging for lifespan and healthspan. Aging (Albany NY), 2018. 10(4): p. 573–591.
48. Cohen, A.A., et al., A novel statistical approach shows evidence for multi-system physiological dysregulation during aging. Mech Ageing Dev, 2013. 134(3-4): p. 110–7.
49. Michaud, D.S., et al., Epigenetic age and lung cancer risk in the CLUE II prospective cohort study. Aging (Albany NY), 2023. 15(3): p. 617–629.
50. Lv, H., et al., Diagnostic efficiency of peripheral tumor serum biomarkers in lung cancer and their correlation with clinicopathological features. Am J Transl Res, 2025. 17(4): p. 2712–2720.
51. Kuo, Y.S., et al., Prognostic and Monitoring Utility of Serum CEA in Lung Adenocarcinoma: Differential Roles in EGFR-TKI and Chemotherapy Treatments. Cancer Med, 2025. 14(17): p. e71170.
52. Sanguinetti, C.M., et al., Bronchoalveolar lavage fluid level of carcinoembryonic antigen in the diagnosis of peripheral lung cancer. Monaldi Arch Chest Dis, 1995. 50(3): p. 177–82.
53. Haeger, S., et al., The bronchoalveolar lavage dilution conundrum: an updated view on a long-standing problem. Am J Physiol Lung Cell Mol Physiol, 2024. 327(5): p. L807–L813.
54. Bollmann, B.A., et al., Cellular analysis in bronchoalveolar lavage: inherent limitations of current standard procedure. Eur Respir J, 2017. 49(6).
55. Jiang, K., G.G. Shao, and R.G. Tian, Diagnostic significance of radioimmunoassay for CEA and β2-microglobulin in serum and bronchoalveolar lavage fluid for lung cancer. Zhongguo Mian Yi Xue Za Zhi, 1994(3): p. 186–187.
56. Yang, Z.W. and R.X. Shao, Diagnostic value of CEA, CYFRA21-1, NSE, and SCC-Ag in pleural fluid and serum for lung cancer. Shi Yong Yi Xue Za Zhi, 2015. 31(20): p. 3334–3337.
57. Alexander, J.C., N.A. Silverman, and P.B. Chretien, Effect of age and cigarette smoking on carcinoembryonic antigen levels. JAMA, 1976. 235(18): p. 1975–9.
58. Lee, J., V. Taneja, and R. Vassallo, Cigarette smoking and inflammation: cellular and molecular mechanisms. J Dent Res, 2012. 91(2): p. 142–9.
59. Ghosh, I., et al., Diagnostic Role of Tumour Markers CEA, CA15-3, CA19-9 and CA125 in Lung Cancer. Indian J Clin Biochem, 2013. 28(1): p. 24–9.
60. Pothal, S., et al., Diagnostic efficacy of broncho-alveolar lavage carcino-embronic antigen in carcinoma of lung. J Family Med Prim Care, 2019. 8(5): p. 1725–1729.
61. Okada, M., et al., Effect of histologic type and smoking status on interpretation of serum carcinoembryonic antigen value in non-small cell lung carcinoma. Ann Thorac Surg, 2004. 78(3): p. 1004–9; discussion 1009–10.
62. National Health Commission of the People's Republic of China, Lung cancer screening and early diagnosis and treatment guidelines (2024 edition). Quan Ke Yi Xue Lin Chuang Yu Jiao Yu, 2024. 22(9): p. 772, 776.
63. Revel, M.P., et al., ESR Essentials: lung cancer screening with low-dose CT-practice recommendations by the European Society of Thoracic Imaging. Eur Radiol, 2025.

Tables
Tables 1 to 8 are available in the Supplementary Files section.

Additional Declarations
No competing interests reported.
Supplementary Files
SupplementaryMaterial.docx; tABLES.docx
Figures
Figure 1. Flowchart of Patient Enrollment and Study Design.
Figure 2. LASSO regression screening of prognostic model predictor variables. A) Selection of the most appropriate λ in the LASSO model; B) LASSO coefficient curves.
Figure 3. Nomogram for Predicting Individual Risk of Lung Cancer. The scores for each variable are summed to give a total score, which is used to assess the patient's risk of lung cancer.
Figure 4. Evaluation and Validation of the Predictive Model. Panels A and D show the ROC curves for the training and validation sets, panels B and E show the calibration curves for the training and validation sets, and panels C and F show the DCA curves.
Figure 5. Restricted Cubic Spline (RCS) Analysis of the Association Between Log-Transformed CEA Levels and Lung Cancer Risk. A) The association between log(serum CEA + 1) and the probability of lung cancer. B) The association between log(BALF CEA + 1) and the probability of lung cancer. The solid line represents the estimated odds ratio, and the shaded area represents the 95% confidence interval. The histograms show the distribution of the CEA values. The models were adjusted for age, BMI, gender, marital status, smoking status, hypertension, diabetes, hyperlipidemia, and CVD.
Figure 6. Threshold Effect Analysis of Log-Transformed CEA on Lung Cancer Risk Using Two-Piecewise Linear Regression. A) Analysis for serum CEA. B) Analysis for BALF CEA. In both panels, the solid line represents the odds ratio and the shaded area represents the 95% confidence interval. The models were adjusted for age, BMI, gender, marital status, smoking status, hypertension, diabetes, hyperlipidemia, and CVD.
Figure 7. Subgroup Analyses of the Association Between Serum CEA and Lung Cancer Risk. Forest plot showing odds ratios (squares) and 95% confidence intervals (horizontal lines) for the association between serum CEA (per 1-unit increase in log scale) and lung cancer across different patient subgroups. The diamond at the bottom represents the overall effect. P for interaction tests the heterogeneity of the association across subgroups. Abbreviations: BMI, body mass index; CVD, cardiovascular disease.
Figure 8. Path Diagram of the Mediation Analysis. Path diagram illustrating the mediation model with serum CEA as the mediator. Path coefficients are standardized estimates derived from Bayesian mediation analysis (MCMC iterations = 1000). The analysis was adjusted for age, BMI, gender, marital status, hypertension, diabetes, hyperlipidemia, and CVD.
19:33:07","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":59015,"visible":true,"origin":"","legend":"","description":"","filename":"tABLES.docx","url":"https://assets-eu.researchsquare.com/files/rs-9254199/v1/9fc78bf67b84b040ef139f17.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Development and Validation of a Simplified Diagnostic Model for Lung Cancer Using Age, Smoking History, and Serum CEA","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003eLung cancer carries the highest burden of cancer-related morbidity and mortality worldwide, with a particularly substantial impact in China [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e], where approximately 1.06\u0026nbsp;million new cases and 733,300 deaths were recorded in 2022, accounting for 21.98% of all new cancer diagnoses and 28.49% of total cancer mortality, respectively [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Currently, a definitive diagnosis of lung cancer requires invasive tissue biopsy (eg, bronchoscopy or needle biopsy). The accuracy of these procedures depends on the operator\u0026rsquo;s experience, specimen quality, and laboratory conditions [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. Moreover, these procedures are not readily accessible in primary care settings. Although minimally invasive, they can cause patient discomfort and carry risks such as pneumothorax and bleeding [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. 
Low-dose computed tomography (LDCT) enables early detection of pulmonary lesions, but its high false-positive rate often leads to unnecessary follow-up and over-investigation, wasting healthcare resources [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. Therefore, developing a non-invasive, low-cost, and easily accessible ancillary diagnostic tool is of significant practical importance for early lung cancer diagnosis.\u003c/p\u003e \u003cp\u003eClinical prediction models are statistical tools that use clinical data to improve the accuracy and efficiency of medical decision-making [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. These models are now widely used for risk prediction and prognostic assessment in lung cancer [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], colorectal cancer [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e], and other malignancies [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. Traditional diagnostic models for lung cancer, such as the Mayo model [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], Brock model [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], and Peking University (PKUPH) model [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e], rely mainly on radiographic features (eg, spiculation, lobulation) for discrimination. However, identifying these features depends heavily on the radiologist's experience, which inevitably introduces inter-observer variability [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e] and compromises model objectivity and consistency. 
Recent advances in artificial intelligence (AI) have enabled machine learning and deep learning to be widely applied for automated analysis and feature extraction from imaging data. These approaches have significantly improved the ability of models to differentiate benign from malignant pulmonary nodules and assess their invasiveness. For example, models based on deep convolutional neural networks (CNNs) can extract subtle features from high-resolution CT images that are imperceptible to the human eye, achieving excellent performance in distinguishing invasive from non-invasive ground-glass nodules (AUC up to 0.944) [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. However, the high performance of these models often requires substantial computational resources and complex architectures, creating technical barriers and implementation costs that may limit their adoption in resource-constrained settings [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e]. Moreover, most AI models do not systematically incorporate patients\u0026rsquo; clinical characteristics and laboratory data, serving only as ancillary references rather than supporting comprehensive decision-making [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eCarcinoembryonic antigen (CEA) is a classic broad-spectrum tumor marker. Since its association with lung cancer was first reported in 1981, it has been widely used for adjunctive diagnosis, treatment monitoring, and prognosis assessment in lung cancer [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e]. 
As a simple and cost-effective serological marker, CEA offers unique advantages in clinical accessibility, making it particularly suitable for use in primary healthcare settings.\u003c/p\u003e \u003cp\u003eTo address these limitations, we aimed to develop a novel diagnostic prediction model for lung cancer using real-world clinical data. Our goal was to achieve robust discriminative performance while minimizing the model's complexity and computational requirements. This model is intended to serve as an efficient, convenient, and cost-effective tool for early ancillary lung cancer diagnosis in primary care settings and large-scale screening programs, thereby improving the clinical applicability and accessibility of early lung cancer detection.\u003c/p\u003e"},{"header":"2 Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\n \u003ch2\u003e2.1 Study Design\u003c/h2\u003e\n \u003cp\u003eThis single-center, retrospective cohort study was designed to develop and validate a clinical prediction model for the ancillary diagnosis of lung cancer. The study was conducted in accordance with the principles of the Declaration of Helsinki (revised 2013) [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e] and was approved by the Ethics Committee of the First Affiliated Hospital of Fujian Medical University (Approval No. 2024\u0026thinsp;\u0026minus;\u0026thinsp;408). Due to the retrospective design and use of anonymized data, the requirement for individual patient informed consent was waived by the ethics committee.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\n \u003ch2\u003e2.2 Study Population\u003c/h2\u003e\n \u003cp\u003eFigure \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e1\u003c/span\u003e illustrates the patient selection process. Data were obtained from the electronic medical record system of the First Affiliated Hospital of Fujian Medical University. 
We initially enrolled 800 consecutive patients who underwent bronchoscopy between January 2020 and June 2025. The lung cancer group included patients with newly diagnosed primary lung cancer confirmed by histopathological examination (bronchoscopic biopsy, percutaneous lung biopsy, or surgical resection) who had not received any prior anti-tumor therapy. Diagnoses followed the 5th edition of the World Health Organization (WHO) Classification of Lung Tumors [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. The control group consisted of patients with benign pulmonary diseases confirmed by pathology or clinical follow-up (\u0026ge;\u0026thinsp;12 months). Exclusion criteria were: absolute contraindications to bronchoscopy (eg, active massive hemoptysis, severe coagulation disorders, severe or unstable cardiovascular disease); concomitant severe hepatic or renal insufficiency, severe pneumonia, or other conditions that could affect laboratory parameters; a history of other primary malignancies or pulmonary metastatic tumors; and missing critical clinical or laboratory data. Ultimately, 336 eligible patients were included in the final analysis. No data were missing for any of the variables included in the final analysis, as patients with missing critical clinical or laboratory data had been excluded.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\n \u003ch2\u003e2.3 Data Collection\u003c/h2\u003e\n \u003cp\u003eClinical data for all enrolled patients were retrospectively collected from the hospital\u0026apos;s electronic medical record system. 
These data included: (1) demographic characteristics (age, sex, body mass index [BMI], marital status); (2) clinical features (smoking history, hypertension, diabetes, hyperlipidemia, cardiovascular disease); and (3) laboratory parameters (serum CEA, BALF CEA, complete blood count (white blood cell count, neutrophil count, lymphocyte count, platelet count, hemoglobin, red blood cell count), liver function tests (lactate dehydrogenase, gamma-glutamyl transferase, aspartate aminotransferase, alanine aminotransferase, total bilirubin, total protein, albumin), renal function tests (urea, creatinine, uric acid), lipid profile (high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, total cholesterol, triglycerides), and D-dimer). All laboratory tests were performed on specimens collected at the time of initial admission, strictly following standard operating procedures. After collection, blood samples were immediately centrifuged to separate serum, which was then stored at \u0026minus;\u0026thinsp;80\u0026deg;C until analysis. BALF specimens were placed on ice immediately after collection for transport and processed within 2 hours; the resulting supernatant was aliquoted and stored at \u0026minus;\u0026thinsp;80\u0026deg;C until use.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\n \u003ch2\u003e2.4 Statistical Analysis\u003c/h2\u003e\n \u003cp\u003eStatistical analyses were performed using R software (version 4.5.0) and the Zstats online platform (\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ewww.medsta.cn/software\u003c/span\u003e\u003c/span\u003e). Categorical variables are presented as frequencies (percentages) and were compared using the \u0026chi;\u0026sup2; test or Fisher\u0026apos;s exact test. Normally distributed continuous variables are expressed as mean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard deviation and were compared using the t-test. 
Non-normally distributed variables are expressed as median (interquartile range) and were compared using the Mann-Whitney U test or Kruskal-Wallis test. All tests were two-sided, and \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05 was considered statistically significant.\u003c/p\u003e\n \u003cp\u003eSample size was estimated based on the events-per-variable (EPV) rule. With up to 10 candidate predictors and an expected lung cancer prevalence of 43% in patients undergoing bronchoscopy, a minimum of 100 lung cancer cases (10 predictors \u0026times; 10 events per predictor) was required to achieve an EPV of 10 [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. The final cohort of 336 patients (including 145 lung cancer cases) satisfied this requirement.\u003c/p\u003e\n \u003cp\u003ePatients were randomly divided into training and internal validation sets in a 7:3 ratio. In the training set, we performed initial variable selection using least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation. Variables selected by LASSO were then analyzed using univariable logistic regression. Those with \u003cem\u003eP\u003c/em\u003e\u0026thinsp;\u0026lt;\u0026thinsp;0.05 were entered into a multivariable logistic regression model to identify independent predictors of lung cancer.\u003c/p\u003e\n \u003cp\u003eThe final model was constructed based on the multivariable regression results and is presented as a nomogram. Model discrimination was assessed using the area under the receiver operating characteristic curve (AUC). Model performance was evaluated in both the training and validation sets using ROC curves, calibration plots, the Hosmer-Lemeshow test, and decision curve analysis (DCA). 
We also developed a multinomial logistic regression model to evaluate discrimination among histological subtypes (adenocarcinoma, squamous cell carcinoma and small cell lung cancer).\u003c/p\u003e\n \u003cp\u003eFor additional analyses, we used restricted cubic splines to examine the dose-response relationship between CEA and lung cancer risk and piecewise linear regression to test for potential threshold effects. Subgroup analyses were conducted to assess the robustness of the association between serum CEA and lung cancer. Mediation analysis with 1000 bootstrap resamples was performed to determine whether smoking indirectly increases lung cancer risk through its effect on serum CEA levels. Finally, we conducted a sensitivity analysis by re-developing the model after excluding a histological subtype with a very small sample size from the training set to test the robustness of the original model.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"3 Results","content":"\u003cp\u003e\u003cstrong\u003e3.1 Baseline Patient Characteristics\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFigure 1 shows the patient selection flowchart. We initially enrolled 800 patients who underwent bronchoscopy at our institution between January 2020 and June 2025. After applying the exclusion criteria, 464 patients were excluded, leaving 336 eligible patients. These were randomly assigned to the training set (n=235) and validation set (n=101) in a 7:3 ratio. Supplementary Table 1 compares baseline characteristics between the training and validation sets. Except for the prevalence of cardiovascular diseases (CVD) and high-density lipoprotein (HDL) levels (both \u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05), all other variables were well balanced between the two groups (all \u003cem\u003eP\u003c/em\u003e \u0026gt; 0.05).\u003c/p\u003e\n\u003cp\u003eTables 1 and 2 detail the demographic, clinical, and laboratory characteristics of patients in the training set. 
We observed significant differences between the lung cancer subtypes and the control group in sex, age, marital status, smoking status, serum CEA, BALF CEA, BALF-CEA/serum CEA ratio, lactate dehydrogenase (LDH) and D-dimer levels (all \u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05). These differences provide a rationale for developing subtype-specific diagnostic models.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.2 Predictor Selection\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo identify the most parsimonious set of predictors and reduce the risk of overfitting, we used a two-step variable selection method. As shown in Figure 2, LASSO regression with 10-fold cross-validation initially selected nine predictors with non-zero coefficients from the 33 candidate variables: age, marital status, smoking status, serum CEA, BALF CEA, the BALF-CEA/serum CEA ratio, white blood cell count (WBC), albumin (ALB) and HDL. These nine variables were then analyzed using univariable logistic regression. Variables with \u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05 were entered into a multivariable logistic regression model for confirmation (Table 3). The multivariable analysis showed that age \u0026ge;75 years (OR=6.03, 95% CI: 1.73-21.21), smoking (OR=3.73, 95% CI: 1.87-7.41), and serum CEA (OR=1.27, 95% CI: 1.12-1.45) were independent predictors of lung cancer.\u003c/p\u003e\n\u003cp\u003eTo develop a model capable of distinguishing among histological subtypes, we performed multinomial logistic regression analysis. To ensure model stability, we excluded a subtype with a very small sample size (type 4, n=5) from this analysis. 
Subsequent sensitivity analysis confirmed that this exclusion did not affect overall model performance (Supplementary Table 7).\u003c/p\u003e\n\u003cp\u003eTo increase statistical power, we combined the age categories with small sample sizes (\u0026quot;\u0026lt;55 years\u0026quot; and \u0026quot;55-64 years\u0026quot;) into a new reference group (\u0026lt;65 years). The final multinomial model revealed several subtype-specific predictors (Table 4). For adenocarcinoma, log-transformed serum CEA was the strongest predictor (OR=6.576, \u003cem\u003eP\u003c/em\u003e\u0026lt;0.001). For small cell lung cancer (SCLC), age \u0026ge;75 years (OR=5.113, \u003cem\u003eP\u003c/em\u003e=0.010), smoking (OR=4.367, \u003cem\u003eP\u003c/em\u003e=0.011), and serum CEA (OR=5.001, \u003cem\u003eP\u003c/em\u003e\u0026lt;0.001) were significant predictors. For squamous cell carcinoma, age \u0026ge;75 years (OR=3.885, \u003cem\u003eP\u003c/em\u003e=0.027), smoking (OR=5.489, \u003cem\u003eP\u003c/em\u003e\u0026lt;0.001), and serum CEA (OR=2.645, \u003cem\u003eP\u003c/em\u003e=0.007) also showed independent associations.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.3 Model Development and Evaluation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eBased on these findings, we constructed a diagnostic model using age, smoking status and serum CEA. The model is presented as a nomogram (Figure 3). The nomogram assigns points to each risk factor; the total score can be used to estimate an individual\u0026rsquo;s probability of having lung cancer, with higher scores indicating greater risk.\u003c/p\u003e\n\u003cp\u003eThe model showed excellent discrimination in both the training and validation sets. ROC curve analysis (Figure 4A, 4D) yielded an AUC of 0.84 (95% CI: 0.79-0.90) in the training set and 0.85 (95% CI: 0.78-0.93) in the validation set. 
Using the Youden index, we determined the optimal probability threshold to be 0.415 in the training set, which gave a sensitivity of 0.83 and specificity of 0.78. When this threshold was applied to the validation set, the model maintained a sensitivity of 0.79 and specificity of 0.76. Detailed performance metrics are presented in Table 5. Calibration plots showed good agreement between predicted probabilities and observed outcomes in both sets (Figure 4B, 4E). Decision curve analysis (DCA) showed that using the model for clinical decision-making provided a higher net benefit than \u0026quot;treat-all\u0026quot; or \u0026quot;treat-none\u0026quot; strategies across a wide range of threshold probabilities (Figure 4C, 4F), indicating favorable clinical utility.\u003c/p\u003e\n\u003cp\u003eNotably, the combined model (AUC=0.84) outperformed models using single biomarkers, including serum CEA alone (AUC=0.78), BALF CEA alone (AUC=0.65), and their ratio (AUC=0.57) (Supplementary Table 2). The combined model increased sensitivity from 0.71 to 0.83 while maintaining high specificity.\u003c/p\u003e\n\u003cp\u003eThe subtype-specific models also showed balanced and robust discrimination. The AUCs for distinguishing adenocarcinoma, squamous cell carcinoma, and SCLC from the control group were 0.84 (95% CI: 0.76-0.92), 0.84 (95% CI: 0.77-0.92), and 0.83 (95% CI: 0.72-0.93), respectively. The optimal subtype-specific thresholds (by Youden index) were 0.404 for adenocarcinoma, 0.415 for squamous cell carcinoma, and 0.442 for SCLC. Detailed performance metrics at these thresholds are shown in Table 6. 
The DeLong test showed no significant differences in AUCs among the subtype models (all \u003cem\u003eP\u003c/em\u003e \u0026gt; 0.05, Table 7), indicating consistent performance across subtypes.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.4 Diagnostic Performance of Serum and BALF CEA\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAs shown in Supplementary Table 2, serum CEA and BALF CEA differed significantly in their diagnostic performance. At a cut-off of 4.57 ng/mL, serum CEA showed high specificity (94%) and positive predictive value (83%) but relatively low sensitivity (52%). In contrast, at a cut-off of 22.6 ng/mL, BALF CEA had higher sensitivity (71%) but lower specificity (57%) and positive predictive value (49%). The BALF-CEA/serum CEA ratio performed even more poorly.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.5 Association Between CEA and Lung Cancer Risk\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo characterize the relationship between CEA and lung cancer risk, we first performed restricted cubic spline analysis. After logarithmic transformation, both serum CEA and BALF CEA showed a predominantly linear association with lung cancer probability, with no significant non-linear trends (Figure 5A, 5B). Models were adjusted for age, BMI, sex, marital status, smoking, hypertension, diabetes, hyperlipidemia, and CVD.\u003c/p\u003e\n\u003cp\u003ePiecewise linear regression was further employed to explore potential threshold effects (Supplementary Tables 3 and 4). For log-transformed serum CEA, the likelihood ratio test showed no significant inflection point (\u003cem\u003eP\u003c/em\u003e=0.109), indicating a continuously linear association across the measurement range (Figure 6A). 
For BALF CEA, a potential inflection point was observed at 3.53, but the likelihood ratio test was not significant (\u003cem\u003eP\u003c/em\u003e=0.087, Figure 6B), providing insufficient evidence for a threshold effect.\u003c/p\u003e\n\u003cp\u003eSubgroup analyses confirmed the robustness of the association between serum CEA and lung cancer risk (Figure 7). The association was consistent across subgroups defined by age, sex, BMI, smoking status, and comorbidities, with no significant interactions (all \u003cem\u003eP\u003c/em\u003e for interaction \u0026gt; 0.05). Although the interactions were not significant, serum CEA showed a stronger effect in patients aged \u0026ge;65 years (OR 1.35 vs 1.25) and those with a smoking history (OR 1.40 vs 1.22). These findings support the broad utility of serum CEA as a lung cancer biomarker.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.6 Mediation Analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo determine whether smoking increases lung cancer risk indirectly by affecting serum CEA levels, we performed mediation analysis. All models were adjusted for age, BMI, sex, marital status, hypertension, diabetes, hyperlipidemia, and CVD.\u003c/p\u003e\n\u003cp\u003ePath analysis (Figure 8) showed that smoking had a significant direct effect on lung cancer risk (\u0026beta;=0.99, \u003cem\u003eP\u003c/em\u003e=0.024). There was a positive but non-significant association between smoking and serum CEA (\u0026beta;=17.30, \u003cem\u003eP\u003c/em\u003e=0.133). Serum CEA was significantly associated with lung cancer risk (\u0026beta;=0.25, \u003cem\u003eP\u003c/em\u003e\u0026lt;0.001).\u003c/p\u003e\n\u003cp\u003eEffect decomposition (Table 8) showed that the total effect of smoking on lung cancer was borderline significant (coefficient: 0.12, 95% CI: -0.00 to 0.25, \u003cem\u003eP\u003c/em\u003e=0.056). 
The indirect effect mediated by serum CEA was not significant (coefficient: 0.09, 95% CI: -0.02 to 0.22, \u003cem\u003eP\u003c/em\u003e=0.118). The direct effect, independent of serum CEA, was smaller but significant (coefficient: 0.02, 95% CI: 0.00-0.05, \u003cem\u003eP\u003c/em\u003e=0.024). The proportion mediated was 82.08% based on point estimates, but the 95% confidence interval was wide and included zero (-20.25% to 127.00%), consistent with the non-significant indirect effect. In summary, we found no evidence that serum CEA mediates the association between smoking and lung cancer in this cohort.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e3.7 Sensitivity Analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eTo assess model robustness and the potential impact of the small-sample subtype (Type 4, n=5), we performed a sensitivity analysis. We removed all Type 4 patients from the training set; baseline characteristics of the remaining 230 patients are shown in Supplementary Table 5. The validation set (n=101) was unchanged to provide a consistent benchmark.\u003c/p\u003e\n\u003cp\u003eIn the reduced training set, LASSO regression selected a slightly different set of candidate variables: sex, age, smoking status, serum CEA, diabetes, red blood cell count (RBC), alanine aminotransferase (ALT), LDL, and total cholesterol (TC) (Supplementary Figure 1). Importantly, age, smoking status, and serum CEA were consistently selected. After univariable and multivariable logistic regression (Supplementary Table 6), age, smoking history, and serum CEA remained significant independent predictors (all \u003cem\u003eP\u003c/em\u003e \u0026lt; 0.05). The direction and magnitude of their coefficients were consistent with the original model.\u003c/p\u003e\n\u003cp\u003eThe model rebuilt on these core variables showed performance in the validation set that closely matched the original model (Supplementary Table 7). 
Both models had identical AUCs (0.85, 95% CI: 0.78-0.93) and accuracy (0.78).\u003c/p\u003e\n\u003cp\u003eThe sensitivity analysis model had slightly higher sensitivity (0.81 vs 0.79) and slightly lower specificity (0.73 vs 0.76), but its optimal cut-off was very close to the original (0.413 vs 0.415). Confidence intervals for positive and negative predictive values overlapped substantially. These results indicate that our model is robust and that its performance was not affected by the small-sample subtype, supporting the reliability of our conclusions.\u003c/p\u003e"},{"header":"4 Discussion","content":"\u003cp\u003eIn this study, we developed and validated a diagnostic prediction model for lung cancer that integrates age, smoking history, and serum CEA using data from a single-center cohort. The model showed good discrimination (AUC 0.84 in training, 0.85 in validation) and clinical utility, with consistent performance across histological subtypes, suggesting good generalizability. This internally validated parsimonious model addresses the need for accessible diagnostic tools in resource-limited settings, complementing existing imaging-based strategies.\u003c/p\u003e\n\u003cp\u003eSmoking is the most important environmental risk factor for lung cancer [25], accounting for an estimated 85-90% of cases [26]. Smokers have a 15- to 30-fold higher risk of developing lung cancer than never-smokers, with a clear dose-response relationship [27, 28]. The carcinogenic mechanisms of smoking are complex and involve direct genetic damage, immune microenvironment remodeling, and epigenetic regulation. Specifically, tobacco smoke carcinogens such as nicotine-derived nitrosamine ketone (NNK) and polycyclic aromatic hydrocarbons can directly or indirectly induce DNA mutations [29]. In parallel, smoke can induce PD-L1 expression on lung epithelial cells, promoting immune escape [30]. 
Smoking can also drive tumorigenesis through epigenetic alterations, including hypomethylation of specific genes [31]. In our study, smoking was a strong predictor of squamous cell carcinoma (OR=5.489) and SCLC (OR=4.367) but was weaker for adenocarcinoma. This difference likely reflects the distinct anatomical origins and molecular mechanisms of these subtypes. Squamous cell carcinoma and SCLC typically arise in the central airways, which are directly exposed to tobacco smoke. Their development often follows a cumulative damage model, with continuous DNA damage leading to inactivation of tumor suppressor genes such as TP53 and RB1 [32, 33]. In contrast, adenocarcinomas often arise in the peripheral lung and are driven by specific oncogenic alterations such as EGFR mutations and ALK fusions [34, 35]. These molecular events are less strongly linked to tobacco exposure [36] and occur more often in never-smokers and women [37, 38]. Together, these factors weaken the predictive value of smoking for adenocarcinoma.\u003c/p\u003e\n\u003cp\u003eAge is another critical risk factor for lung cancer, independent of smoking [39]. We found that age \u0026ge;75 years was significantly associated with lung cancer risk (OR=6.03). This finding aligns with the established epidemiological pattern: over 95% of lung cancer cases are diagnosed after age 50 [40], incidence rises continuously with age, and approximately half of patients are diagnosed after age 65. A recent study of disease burden in China reported increasing lung cancer incidence in the over-70 age group [41]. The biological basis involves multiple aging processes, including cumulative carcinogen exposure, declining DNA repair capacity, and waning immune surveillance [42-44]. However, aging rates vary among individuals, and chronological age alone may not fully explain risk heterogeneity among same-aged individuals [45]. 
Biological age (BA), quantified using multiple biomarkers (eg, Klemera-Doubal method [46], PhenoAge [47], homeostatic dysregulation [48]), may better reflect an individual\u0026rsquo;s physiological state and risk for age-related diseases and mortality. Studies have demonstrated significant associations between these composite measures and lung cancer risk [49].\u0026nbsp;Future integration of these more precise aging measures into prediction models may enable finer risk stratification among individuals of the same chronological age.\u003c/p\u003e\n\u003cp\u003eOur study also clarifies the distinct roles of serum and BALF CEA in lung cancer diagnosis. Serum CEA at 4.57 ng/mL showed high specificity (94%) and positive predictive value (83%) but limited sensitivity (52%). An elevated serum CEA level is a useful indicator of lung cancer, particularly adenocarcinoma (OR=6.576, \u003cem\u003eP\u003c/em\u003e\u0026lt;0.001); however, a normal level is insufficient to exclude the disease. This finding aligns with recent reports by Lv et al. [50] and Kuo et al. [51].\u0026nbsp;In contrast, BALF CEA at 22.6 ng/mL had higher sensitivity (71%) but lower specificity (57%) and positive predictive value (49%). Sanguinetti et al. similarly reported that BALF CEA, while more sensitive than serum CEA, had lower specificity and was susceptible to interference from benign lung disease, limiting its clinical utility [52].\u0026nbsp;Moreover, BALF CEA measurement is invasive, susceptible to local inflammation [53], and lacks standardized protocols [54], making it of limited value as a standalone test. However, because BALF CEA directly reflects local tumor information, it may still offer complementary value in complex cases such as peripheral lung cancer [55]. Although some studies have suggested that combining serum and BALF CEA could improve diagnostic accuracy [56], the ratio of the two performed poorly in our study. A similar phenomenon was observed by Sanguinetti et al. 
[52], who found no significant difference in the BALF-CEA/serum CEA ratio between patients with lung cancer and those with benign lung disease. The poor performance of the ratio may reflect differences in the biological information carried by the two markers and the confounding factors inherent in BALF CEA measurement, suggesting that this ratio should be interpreted with caution in clinical practice.\u003c/p\u003e\n\u003cp\u003eThe mediation analysis in this study did not find a significant mediating role for serum CEA in the association between smoking and lung cancer. Although smoking may influence CEA levels through pathways such as chronic inflammation [57, 58], the association between them did not reach statistical significance in our study (\u003cem\u003eP\u003c/em\u003e=0.133), which is inconsistent with some previous reports [59, 60]. This discrepancy may reflect our crude smoking assessment, limited sample size, and the high proportion of adenocarcinomas, in which CEA is less strongly linked to smoking [61]. This result suggests that smoking-induced carcinogenesis may act mainly through direct genotoxic effects and remodeling of the tumor microenvironment [29-31], whereas CEA elevation is more likely a concomitant phenomenon following tumor development than a key mediator of the smoking-carcinogenesis process. We also found a continuous linear relationship between log-transformed serum CEA and lung cancer risk, with no clear threshold effect. This supports the use of serum CEA as a continuous variable in prediction models, which preserves its risk information and avoids the loss caused by dichotomization.\u003c/p\u003e\n\u003cp\u003eThis study has several limitations. First, its single-center retrospective design and relatively small sample size may limit generalizability. Although internal validation and sensitivity analyses supported robustness, external validation in diverse populations is needed. 
Second, to maintain simplicity and accessibility, we did not include CT imaging features. In clinical practice, this model could serve as an initial screening tool to identify high-risk individuals for subsequent low-dose CT [62, 63], enabling a stratified, sequential screening approach.\u003c/p\u003e\n\u003cp\u003eIn conclusion, we developed and validated a parsimonious diagnostic model for lung cancer using smoking history, age, and serum CEA. The model uses readily accessible variables and is easy to apply, providing a practical and reliable new tool for early lung cancer diagnosis in primary care and resource-limited settings.\u003c/p\u003e"},{"header":"Abbreviations","content":"\u003cp\u003eALB: Albumin\u003c/p\u003e\n\u003cp\u003eALT: Alanine aminotransferase\u003c/p\u003e\n\u003cp\u003eAST: Aspartate aminotransferase\u003c/p\u003e\n\u003cp\u003eAUC: Area under the receiver operating characteristic curve\u003c/p\u003e\n\u003cp\u003eBA: Biological age\u003c/p\u003e\n\u003cp\u003eBALF: Bronchoalveolar lavage fluid\u003c/p\u003e\n\u003cp\u003eBMI: Body mass index\u003c/p\u003e\n\u003cp\u003eCEA: Carcinoembryonic antigen\u003c/p\u003e\n\u003cp\u003eCI: Confidence interval\u003c/p\u003e\n\u003cp\u003eCNN: Convolutional neural network\u003c/p\u003e\n\u003cp\u003eCrea: Creatinine\u003c/p\u003e\n\u003cp\u003eCVD: Cardiovascular disease\u003c/p\u003e\n\u003cp\u003eDCA: Decision curve analysis\u003c/p\u003e\n\u003cp\u003eGGT: Gamma-glutamyl transferase\u003c/p\u003e\n\u003cp\u003eHb: Hemoglobin\u003c/p\u003e\n\u003cp\u003eHDL: High-density lipoprotein cholesterol\u003c/p\u003e\n\u003cp\u003eKDM: Klemera-Doubal method\u003c/p\u003e\n\u003cp\u003eLASSO: Least absolute shrinkage and selection operator\u003c/p\u003e\n\u003cp\u003eLDCT: Low-dose computed tomography\u003c/p\u003e\n\u003cp\u003eLDH: Lactate dehydrogenase\u003c/p\u003e\n\u003cp\u003eLDL: Low-density lipoprotein cholesterol\u003c/p\u003e\n\u003cp\u003eLYMPH: Lymphocyte 
count\u003c/p\u003e\n\u003cp\u003eM: Median\u003c/p\u003e\n\u003cp\u003eMCMC: Markov chain Monte Carlo\u003c/p\u003e\n\u003cp\u003eNEUT: Neutrophil count\u003c/p\u003e\n\u003cp\u003eNNK: Nicotine-derived Nitrosamine Ketone\u003c/p\u003e\n\u003cp\u003eNPV: Negative predictive value\u003c/p\u003e\n\u003cp\u003eOR: Odds ratio\u003c/p\u003e\n\u003cp\u003ePLT: Platelet count\u003c/p\u003e\n\u003cp\u003ePPV: Positive predictive value\u003c/p\u003e\n\u003cp\u003eQ₁: First quartile\u003c/p\u003e\n\u003cp\u003eQ₃: Third quartile\u003c/p\u003e\n\u003cp\u003eRBC: Red blood cell count\u003c/p\u003e\n\u003cp\u003eRCS: Restricted cubic spline\u003c/p\u003e\n\u003cp\u003eROC: Receiver operating characteristic\u003c/p\u003e\n\u003cp\u003eS.E.: Standard error\u003c/p\u003e\n\u003cp\u003eSCLC: Small cell lung cancer\u003c/p\u003e\n\u003cp\u003eSD: Standard deviation\u003c/p\u003e\n\u003cp\u003eTBIL: Total bilirubin\u003c/p\u003e\n\u003cp\u003eTC: Total cholesterol\u003c/p\u003e\n\u003cp\u003eTG: Triglycerides\u003c/p\u003e\n\u003cp\u003eTP: Total protein\u003c/p\u003e\n\u003cp\u003eUA: Uric acid\u003c/p\u003e\n\u003cp\u003eWBC: White blood cell count\u003c/p\u003e\n\u003cp\u003eWHO: World Health Organization\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was conducted in accordance with the Declaration of Helsinki (revised 2013) and was approved by the Ethics Committee of the First Affiliated Hospital of Fujian Medical University (Approval No. 2024-408).\u0026nbsp;The need for individual patient informed consent was waived by the Ethics Committee of the First Affiliated Hospital of Fujian Medical University (Approval No. 
2024-408) due to the retrospective design and use of anonymized data.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis work was supported by the High-level Talent Funding Project of The First Affiliated Hospital of Fujian Medical University (Grant number: YJRC4187) and the Young and Middle-Aged Health Professionals Training Program of Fujian Province (Grant number: 2025GGB07).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWang SL designed the study and wrote the manuscript; Wang SL, Fu LF, Chen YM, and Zeng AM collected the clinical data and performed the statistical analyses; Lin GF, Huang JF, and Wang BY contributed to data interpretation and manuscript revision. All authors reviewed and approved the final version of the manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eSun, X.F., et al., \u003cem\u003eImmune-cell signatures of persistent inflammation, immunosuppression, and catabolism syndrome after sepsis.\u003c/em\u003e Med, 2025. \u003cstrong\u003e6\u003c/strong\u003e(5): p. 100569.\u003c/li\u003e\n\u003cli\u003eZhang, X., et al., \u003cem\u003eEconomic Burden for Lung Cancer Survivors in Urban China.\u003c/em\u003e Int J Environ Res Public Health, 2017. 
\u003cstrong\u003e14\u003c/strong\u003e(3).\u003c/li\u003e\n\u003cli\u003eHan, B., et al., \u003cem\u003eCancer incidence and mortality in China, 2022.\u003c/em\u003e J Natl Cancer Cent, 2024. \u003cstrong\u003e4\u003c/strong\u003e(1): p. 47\u0026ndash;53.\u003c/li\u003e\n\u003cli\u003eNooreldeen, R. and H. Bach, \u003cem\u003eCurrent and Future Development in Lung Cancer Diagnosis.\u003c/em\u003e Int J Mol Sci, 2021. \u003cstrong\u003e22\u003c/strong\u003e(16).\u003c/li\u003e\n\u003cli\u003eSheng, A., et al., \u003cem\u003eDiagnostic Efficacy of CT Radiomic Features in Pulmonary Invasive Mucinous Adenocarcinoma.\u003c/em\u003e Scanning, 2022. \u003cstrong\u003e2022\u003c/strong\u003e: p. 5314225.\u003c/li\u003e\n\u003cli\u003eToyoda, Y., et al., \u003cem\u003eSensitivity and specificity of lung cancer screening using chest low-dose computed tomography.\u003c/em\u003e Br J Cancer, 2008. \u003cstrong\u003e98\u003c/strong\u003e(10): p. 1602\u0026ndash;7.\u003c/li\u003e\n\u003cli\u003eXie, D., et al., \u003cem\u003eOverdiagnosis of Lung Cancer Due to the Introduction of Low-Dose Computed Tomography in Average-Risk Populations in the People\u0026apos;s Republic of China.\u003c/em\u003e J Thorac Oncol, 2025. \u003cstrong\u003e20\u003c/strong\u003e(7): p. 884\u0026ndash;896.\u003c/li\u003e\n\u003cli\u003eBinuya, M.A.E., et al., \u003cem\u003eMethodological guidance for the evaluation and updating of clinical prediction models: a systematic review.\u003c/em\u003e BMC Med Res Methodol, 2022. \u003cstrong\u003e22\u003c/strong\u003e(1): p. 316.\u003c/li\u003e\n\u003cli\u003ePark, B., et al., \u003cem\u003eRisk-based prediction model for selecting eligible population for lung cancer screening among ever smokers in Korea.\u003c/em\u003e Transl Lung Cancer Res, 2021. \u003cstrong\u003e10\u003c/strong\u003e(12): p. 
4390\u0026ndash;4402.\u003c/li\u003e\n\u003cli\u003eCooper, J.A., et al., \u003cem\u003eThe use of electronic healthcare records for colorectal cancer screening referral decisions and risk prediction model development.\u003c/em\u003e BMC Gastroenterol, 2020. \u003cstrong\u003e20\u003c/strong\u003e(1): p. 78.\u003c/li\u003e\n\u003cli\u003eZhou, X., et al., \u003cem\u003ePredicting Cancer-Specific Survival Among Patients With Prostate Cancer After Radical Prostatectomy Based on the Competing Risk Model: Population-Based Study.\u003c/em\u003e Front Surg, 2021. \u003cstrong\u003e8\u003c/strong\u003e: p. 770169.\u003c/li\u003e\n\u003cli\u003eAn, P., et al., \u003cem\u003ePrognostic Predicting Model of Pancreatic Body Tail Carcinoma Using Clinical and CT Radiomic Data.\u003c/em\u003e Technol Cancer Res Treat, 2023. \u003cstrong\u003e22\u003c/strong\u003e: p. 15330338231186739.\u003c/li\u003e\n\u003cli\u003eSwensen, S.J., et al., \u003cem\u003eThe probability of malignancy in solitary pulmonary nodules. Application to small radiologically indeterminate nodules.\u003c/em\u003e Arch Intern Med, 1997. \u003cstrong\u003e157\u003c/strong\u003e(8): p. 849\u0026ndash;55.\u003c/li\u003e\n\u003cli\u003eMcWilliams, A., et al., \u003cem\u003eProbability of cancer in pulmonary nodules detected on first screening CT.\u003c/em\u003e N Engl J Med, 2013. \u003cstrong\u003e369\u003c/strong\u003e(10): p. 910\u0026ndash;9.\u003c/li\u003e\n\u003cli\u003eLi, Y., et al., \u003cem\u003eEstablishment of a mathematical prediction model to evaluate the probability of malignancy or benign in patients with solitary pulmonary nodules.\u003c/em\u003e Beijing Da Xue Xue Bao Yi Xue Ban, 2011. \u003cstrong\u003e43\u003c/strong\u003e(3): p. 450\u0026ndash;4.\u003c/li\u003e\n\u003cli\u003eNair, A., et al., \u003cem\u003eVariable radiological lung nodule evaluation leads to divergent management recommendations.\u003c/em\u003e Eur Respir J, 2018. 
\u003cstrong\u003e52\u003c/strong\u003e(6).\u003c/li\u003e\n\u003cli\u003eArdila, D., et al., \u003cem\u003eEnd-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography.\u003c/em\u003e Nat Med, 2019. \u003cstrong\u003e25\u003c/strong\u003e(6): p. 954\u0026ndash;961.\u003c/li\u003e\n\u003cli\u003eLiu, Y., et al., \u003cem\u003eRSDCNet: An efficient and lightweight deep learning model for benign and malignant pathology detection in breast cancer.\u003c/em\u003e Digit Health, 2025. \u003cstrong\u003e11\u003c/strong\u003e: p. 20552076251336286.\u003c/li\u003e\n\u003cli\u003eXiong Y, Lu NH. \u003cem\u003eResearch progress on risk factors and benign-malignant prediction models for pulmonary nodules\u003c/em\u003e. Zhongguo Yi Yao Ke Xue. 2022;\u003cstrong\u003e12\u003c/strong\u003e(23):35-38,42.\u003c/li\u003e\n\u003cli\u003e\u003cem\u003eCarcinoembryonic antigen: its role as a marker in the management of cancer. Summary of an NIH consensus statement.\u003c/em\u003e Br Med J (Clin Res Ed), 1981. \u003cstrong\u003e282\u003c/strong\u003e(6261): p. 373\u0026ndash;5.\u003c/li\u003e\n\u003cli\u003eGrunnet, M. and J.B. Sorensen, \u003cem\u003eCarcinoembryonic antigen (CEA) as tumor marker in lung cancer.\u003c/em\u003e Lung Cancer, 2012. \u003cstrong\u003e76\u003c/strong\u003e(2): p. 138\u0026ndash;43.\u003c/li\u003e\n\u003cli\u003e\u003cem\u003eWorld Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects.\u003c/em\u003e JAMA, 2013. \u003cstrong\u003e310\u003c/strong\u003e(20): p. 2191\u0026ndash;4.\u003c/li\u003e\n\u003cli\u003eWHO Classification of Tumours Editorial Board, \u003cem\u003eThoracic Tumours\u003c/em\u003e. WHO classification of tumours series. Vol. 5. 
2021, Lyon, France: International Agency for Research on Cancer.\u003c/li\u003e\n\u003cli\u003ePeduzzi, P., et al., \u003cem\u003eA simulation study of the number of events per variable in logistic regression analysis.\u003c/em\u003e J Clin Epidemiol, 1996. \u003cstrong\u003e49\u003c/strong\u003e(12): p. 1373\u0026ndash;9.\u003c/li\u003e\n\u003cli\u003eKehrle, K., M. Hetjens, and S. Hetjens, \u003cem\u003eRisk Factors and Preventive Measures for Lung Cancer in the European Union.\u003c/em\u003e Epidemiologia (Basel), 2024. \u003cstrong\u003e5\u003c/strong\u003e(3): p. 539\u0026ndash;546.\u003c/li\u003e\n\u003cli\u003eFreedman, N.D., et al., \u003cem\u003eCigarette smoking and subsequent risk of lung cancer in men and women: analysis of a prospective cohort study.\u003c/em\u003e Lancet Oncol, 2008. \u003cstrong\u003e9\u003c/strong\u003e(7): p. 649\u0026ndash;56.\u003c/li\u003e\n\u003cli\u003ePeto, J., \u003cem\u003eThat the effects of smoking should be measured in pack-years: misconceptions 4.\u003c/em\u003e Br J Cancer, 2012. \u003cstrong\u003e107\u003c/strong\u003e(3): p. 406\u0026ndash;7.\u003c/li\u003e\n\u003cli\u003eNational Center for Chronic Disease Prevention and Health Promotion (US) Office on Smoking and Health, \u003cem\u003eReports of the Surgeon General\u003c/em\u003e, in \u003cem\u003eThe Health Consequences of Smoking\u0026mdash;50 Years of Progress: A Report of the Surgeon General\u003c/em\u003e. 2014, Centers for Disease Control and Prevention (US): Atlanta (GA).\u003c/li\u003e\n\u003cli\u003eEwa, B. and M. Danuta, \u003cem\u003ePolycyclic aromatic hydrocarbons and PAH-related DNA adducts.\u003c/em\u003e J Appl Genet, 2017. \u003cstrong\u003e58\u003c/strong\u003e(3): p. 321\u0026ndash;330.\u003c/li\u003e\n\u003cli\u003eWang, G.Z., et al., \u003cem\u003eThe Aryl hydrocarbon receptor mediates tobacco-induced PD-L1 expression and is associated with response to immunotherapy.\u003c/em\u003e Nat Commun, 2019. \u003cstrong\u003e10\u003c/strong\u003e(1): p. 
1125.\u003c/li\u003e\n\u003cli\u003eGao, X., et al., \u003cem\u003eTobacco smoking and methylation of genes related to lung cancer development.\u003c/em\u003e Oncotarget, 2016. \u003cstrong\u003e7\u003c/strong\u003e(37): p. 59017\u0026ndash;59028.\u003c/li\u003e\n\u003cli\u003eToyooka, S., T. Tsuda, and A.F. Gazdar, \u003cem\u003eThe TP53 gene, tobacco exposure, and lung cancer.\u003c/em\u003e Hum Mutat, 2003. \u003cstrong\u003e21\u003c/strong\u003e(3): p. 229\u0026ndash;39.\u003c/li\u003e\n\u003cli\u003eDrapkin, B.J. and A.F. Farago, \u003cem\u003eUnexpected Synergy Reveals New Therapeutic Strategy in SCLC.\u003c/em\u003e Trends Pharmacol Sci, 2019. \u003cstrong\u003e40\u003c/strong\u003e(5): p. 295\u0026ndash;297.\u003c/li\u003e\n\u003cli\u003ePaez, J.G., et al., \u003cem\u003eEGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy.\u003c/em\u003e Science, 2004. \u003cstrong\u003e304\u003c/strong\u003e(5676): p. 1497\u0026ndash;500.\u003c/li\u003e\n\u003cli\u003eSoda, M., et al., \u003cem\u003eIdentification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer.\u003c/em\u003e Nature, 2007. \u003cstrong\u003e448\u003c/strong\u003e(7153): p. 561\u0026ndash;6.\u003c/li\u003e\n\u003cli\u003eZhang, R., L. Dong, and J. Yu, \u003cem\u003eConcomitant Pathogenic Mutations and Fusions of Driver Oncogenes in Tumors.\u003c/em\u003e Front Oncol, 2020. \u003cstrong\u003e10\u003c/strong\u003e: p. 544579.\u003c/li\u003e\n\u003cli\u003eDmitrieva, A.M., I.G. Kocak, and L. Meder, \u003cem\u003eAberrations in the glycosylation of receptor tyrosine kinases: A focus on lung adenocarcinoma.\u003c/em\u003e Cytojournal, 2025. \u003cstrong\u003e22\u003c/strong\u003e: p. 62.\u003c/li\u003e\n\u003cli\u003eChapman, A.M., et al., \u003cem\u003eLung cancer mutation profile of EGFR, ALK, and KRAS: Meta-analysis and comparison of never and ever smokers.\u003c/em\u003e Lung Cancer, 2016. \u003cstrong\u003e102\u003c/strong\u003e: p. 
122\u0026ndash;134.\u003c/li\u003e\n\u003cli\u003eKrist, A.H., et al., \u003cem\u003eScreening for Lung Cancer: US Preventive Services Task Force Recommendation Statement.\u003c/em\u003e JAMA, 2021. \u003cstrong\u003e325\u003c/strong\u003e(10): p. 962\u0026ndash;970.\u003c/li\u003e\n\u003cli\u003eBray, F., et al., \u003cem\u003eGlobal cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries.\u003c/em\u003e CA Cancer J Clin, 2024. \u003cstrong\u003e74\u003c/strong\u003e(3): p. 229\u0026ndash;263.\u003c/li\u003e\n\u003cli\u003eZhang AQ, Li HY, Dai DY. \u003cem\u003eAge-period-cohort analysis and prediction of lung cancer burden in China from 1992 to 2021\u003c/em\u003e. Lin Chuang Fei Ke Za Zhi. 2025;\u003cstrong\u003e30\u003c/strong\u003e(5):649-655.\u003c/li\u003e\n\u003cli\u003eSharma, G., N.A. Hanania, and Y.M. Shim, \u003cem\u003eThe aging immune system and its relationship to the development of chronic obstructive pulmonary disease.\u003c/em\u003e Proc Am Thorac Soc, 2009. \u003cstrong\u003e6\u003c/strong\u003e(7): p. 573\u0026ndash;80.\u003c/li\u003e\n\u003cli\u003eCalcinotto, A., et al., \u003cem\u003eCellular Senescence: Aging, Cancer, and Injury.\u003c/em\u003e Physiol Rev, 2019. \u003cstrong\u003e99\u003c/strong\u003e(2): p. 1047\u0026ndash;1078.\u003c/li\u003e\n\u003cli\u003eCho, S.J. and H.W. Stout-Delgado, \u003cem\u003eAging and Lung Disease.\u003c/em\u003e Annu Rev Physiol, 2020. \u003cstrong\u003e82\u003c/strong\u003e: p. 433\u0026ndash;459.\u003c/li\u003e\n\u003cli\u003eShi, L., et al., \u003cem\u003eResnetAge: A Resnet-Based DNA Methylation Age Prediction Method.\u003c/em\u003e Bioengineering (Basel), 2023. \u003cstrong\u003e11\u003c/strong\u003e(1).\u003c/li\u003e\n\u003cli\u003eKlemera, P. and S. Doubal, \u003cem\u003eA new approach to the concept and computation of biological age.\u003c/em\u003e Mech Ageing Dev, 2006. \u003cstrong\u003e127\u003c/strong\u003e(3): p. 
240\u0026ndash;8.\u003c/li\u003e\n\u003cli\u003eLevine, M.E., et al., \u003cem\u003eAn epigenetic biomarker of aging for lifespan and healthspan.\u003c/em\u003e Aging (Albany NY), 2018. \u003cstrong\u003e10\u003c/strong\u003e(4): p. 573\u0026ndash;591.\u003c/li\u003e\n\u003cli\u003eCohen, A.A., et al., \u003cem\u003eA novel statistical approach shows evidence for multi-system physiological dysregulation during aging.\u003c/em\u003e Mech Ageing Dev, 2013. \u003cstrong\u003e134\u003c/strong\u003e(3-4): p. 110\u0026ndash;7.\u003c/li\u003e\n\u003cli\u003eMichaud, D.S., et al., \u003cem\u003eEpigenetic age and lung cancer risk in the CLUE II prospective cohort study.\u003c/em\u003e Aging (Albany NY), 2023. \u003cstrong\u003e15\u003c/strong\u003e(3): p. 617\u0026ndash;629.\u003c/li\u003e\n\u003cli\u003eLv, H., et al., \u003cem\u003eDiagnostic efficiency of peripheral tumor serum biomarkers in lung cancer and their correlation with clinicopathological features.\u003c/em\u003e Am J Transl Res, 2025. \u003cstrong\u003e17\u003c/strong\u003e(4): p. 2712\u0026ndash;2720.\u003c/li\u003e\n\u003cli\u003eKuo, Y.S., et al., \u003cem\u003ePrognostic and Monitoring Utility of Serum CEA in Lung Adenocarcinoma: Differential Roles in EGFR-TKI and Chemotherapy Treatments.\u003c/em\u003e Cancer Med, 2025. \u003cstrong\u003e14\u003c/strong\u003e(17): p. e71170.\u003c/li\u003e\n\u003cli\u003eSanguinetti, C.M., et al., \u003cem\u003eBronchoalveolar lavage fluid level of carcinoembryonic antigen in the diagnosis of peripheral lung cancer.\u003c/em\u003e Monaldi Arch Chest Dis, 1995. \u003cstrong\u003e50\u003c/strong\u003e(3): p. 177\u0026ndash;82.\u003c/li\u003e\n\u003cli\u003eHaeger, S., et al., \u003cem\u003eThe bronchoalveolar lavage dilution conundrum: an updated view on a long-standing problem.\u003c/em\u003e Am J Physiol Lung Cell Mol Physiol, 2024. \u003cstrong\u003e327\u003c/strong\u003e(5): p. 
L807\u0026ndash;L813.\u003c/li\u003e\n\u003cli\u003eBollmann, B.A., et al., \u003cem\u003eCellular analysis in bronchoalveolar lavage: inherent limitations of current standard procedure.\u003c/em\u003e Eur Respir J, 2017. \u003cstrong\u003e49\u003c/strong\u003e(6).\u003c/li\u003e\n\u003cli\u003eJiang K, Shao GG, Tian RG. \u003cem\u003eDiagnostic significance of radioimmunoassay for CEA and \u0026beta;2-microglobulin in serum and bronchoalveolar lavage fluid for lung cancer\u003c/em\u003e. Zhongguo Mian Yi Xue Za Zhi. 1994(3):186-187.\u003c/li\u003e\n\u003cli\u003eYang ZW, Shao RX. \u003cem\u003eDiagnostic value of CEA, CYFRA21-1, NSE, and SCC-Ag in pleural fluid and serum for lung cancer.\u003c/em\u003e Shi Yong Yi Xue Za Zhi. 2015;\u003cstrong\u003e31\u003c/strong\u003e(20):3334-3337.\u003c/li\u003e\n\u003cli\u003eAlexander, J.C., N.A. Silverman, and P.B. Chretien, \u003cem\u003eEffect of age and cigarette smoking on carcinoembryonic antigen levels.\u003c/em\u003e JAMA, 1976. \u003cstrong\u003e235\u003c/strong\u003e(18): p. 1975\u0026ndash;9.\u003c/li\u003e\n\u003cli\u003eLee, J., V. Taneja, and R. Vassallo, \u003cem\u003eCigarette smoking and inflammation: cellular and molecular mechanisms.\u003c/em\u003e J Dent Res, 2012. \u003cstrong\u003e91\u003c/strong\u003e(2): p. 142\u0026ndash;9.\u003c/li\u003e\n\u003cli\u003eGhosh, I., et al., \u003cem\u003eDiagnostic Role of Tumour Markers CEA, CA15-3, CA19-9 and CA125 in Lung Cancer.\u003c/em\u003e Indian J Clin Biochem, 2013. \u003cstrong\u003e28\u003c/strong\u003e(1): p. 24\u0026ndash;9.\u003c/li\u003e\n\u003cli\u003ePothal, S., et al., \u003cem\u003eDiagnostic efficacy of broncho-alveolar lavage carcino-embronic antigen in carcinoma of lung.\u003c/em\u003e J Family Med Prim Care, 2019. \u003cstrong\u003e8\u003c/strong\u003e(5): p. 
1725\u0026ndash;1729.\u003c/li\u003e\n\u003cli\u003eOkada, M., et al., \u003cem\u003eEffect of histologic type and smoking status on interpretation of serum carcinoembryonic antigen value in non-small cell lung carcinoma.\u003c/em\u003e Ann Thorac Surg, 2004. \u003cstrong\u003e78\u003c/strong\u003e(3): p. 1004\u0026ndash;9; discussion 1009\u0026ndash;10.\u003c/li\u003e\n\u003cli\u003eNational Health Commission of the People\u0026apos;s Republic of China. \u003cem\u003eLung cancer screening and early diagnosis and treatment guidelines (2024 edition)\u003c/em\u003e. Quan Ke Yi Xue Lin Chuang Yu Jiao Yu. 2024;\u003cstrong\u003e22\u003c/strong\u003e(9):772,776.\u003c/li\u003e\n\u003cli\u003eRevel, M.P., et al., \u003cem\u003eESR Essentials: lung cancer screening with low-dose CT-practice recommendations by the European Society of Thoracic Imaging.\u003c/em\u003e Eur Radiol, 2025.\u003c/li\u003e\n\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTables 1 to 8 are available in the Supplementary Files section.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"bmc-cancer","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bcan","sideBox":"Learn more about [BMC Cancer](http://bmccancer.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bcan/default.aspx","title":"BMC Cancer","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC 
Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Lung cancer, prediction model, carcinoembryonic antigen, smoking, diagnosis, nomogram","lastPublishedDoi":"10.21203/rs.3.rs-9254199/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9254199/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eBackground\u003c/strong\u003e: Lung cancer is the leading cause of cancer-related death worldwide. Current diagnostic models frequently depend on imaging features or complex algorithms, restricting their applicability in settings with limited resources. This study aimed to develop and validate a simplified diagnostic model for lung cancer using readily available clinical variables.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMethods\u003c/strong\u003e: This retrospective cohort study included 336 patients who underwent bronchoscopy at a tertiary hospital from January 2020 to June 2025. Patients were randomly assigned to training (n=235) and validation (n=101) sets. Candidate predictors included demographic characteristics, smoking history, comorbidities, and laboratory parameters. Variable selection and model development were performed using a combination of least absolute shrinkage and selection operator (LASSO) regression and multivariable logistic regression. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), calibration plots, and decision curve analysis. Subgroup analyses, restricted cubic splines, and mediation analysis were conducted to explore underlying relationships.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResults\u003c/strong\u003e: We identified three independent predictors: age ≥75 years (OR=6.03, 95% CI: 1.73-21.21), smoking history (OR=3.73, 95% CI: 1.87-7.41), and serum CEA (OR=1.27, 95% CI: 1.12-1.45). 
The model showed good discrimination in both the training set (AUC=0.84, 95% CI: 0.79-0.90) and validation set (AUC=0.85, 95% CI: 0.78-0.93), with well-calibrated predictions and positive net benefit across a range of threshold probabilities. At a cut-off of 4.57 ng/mL, serum CEA demonstrated high specificity (94%) but moderate sensitivity (52%). In contrast, carcinoembryonic antigen in bronchoalveolar lavage fluid (BALF-CEA) at 22.6 ng/mL showed higher sensitivity (71%) but lower specificity (57%). Mediation analysis revealed that serum CEA did not significantly mediate the relationship between smoking and lung cancer (indirect effect: 0.09, 95% CI: -0.02-0.22, \u003cem\u003eP\u003c/em\u003e=0.118). The dose-response relationship between CEA and lung cancer risk was linear, with no evidence of significant threshold effects.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConclusions\u003c/strong\u003e: We developed a simplified diagnostic model based on age, smoking history, and serum CEA that accurately predicts lung cancer risk. 
Its simplicity and use of readily available variables make it particularly suitable for primary care settings and large-scale screening programs, where it can serve as an effective triage tool to identify high-risk individuals for subsequent low-dose CT examination.\u003c/p\u003e","manuscriptTitle":"Development and Validation of a Simplified Diagnostic Model for Lung Cancer Using Age, Smoking History, and Serum CEA","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-07 19:33:02","doi":"10.21203/rs.3.rs-9254199/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"editorInvitedReview","content":"","date":"2026-05-13T09:29:35+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-08T01:43:56+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"107796793135456015699272126102639466628","date":"2026-05-07T06:54:21+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-06T00:34:53+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"258567652001530477735563906700268674011","date":"2026-05-01T09:18:17+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-04-30T11:04:42+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"253929102238553631777506720267629813911","date":"2026-04-30T05:22:43+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"112535925878800316775520379777748188983","date":"2026-04-29T21:09:20+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"185000728922921838834408347985859457644","date":"2026-04-29T15:33:45+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"301089985385968338156620102630167722966","date":"2026-04-29T11:51:17+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-04-29T09:26:43+00:00","index":"hide","fulltext":""},{"type":"
reviewerAgreed","content":"334266253106392329113526138987753116974","date":"2026-04-29T09:23:07+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-04-29T08:46:37+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-04-26T19:32:51+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2026-04-06T05:22:33+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-04-03T12:24:44+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Cancer","date":"2026-04-03T12:19:34+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"bmc-cancer","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bcan","sideBox":"Learn more about [BMC Cancer](http://bmccancer.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bcan/default.aspx","title":"BMC Cancer","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"ea4eb275-f69b-424c-8c84-46a765ee72e8","owner":[],"postedDate":"May 7th, 
2026","published":true,"recentEditorialEvents":[{"type":"editorInvitedReview","content":"","date":"2026-05-13T09:29:35+00:00","index":52,"fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-08T01:43:56+00:00","index":51,"fulltext":""},{"type":"reviewerAgreed","content":"107796793135456015699272126102639466628","date":"2026-05-07T06:54:21+00:00","index":50,"fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-06T00:34:53+00:00","index":49,"fulltext":""},{"type":"reviewerAgreed","content":"258567652001530477735563906700268674011","date":"2026-05-01T09:18:17+00:00","index":48,"fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-04-30T11:04:42+00:00","index":47,"fulltext":""},{"type":"reviewerAgreed","content":"253929102238553631777506720267629813911","date":"2026-04-30T05:22:43+00:00","index":46,"fulltext":""},{"type":"reviewerAgreed","content":"112535925878800316775520379777748188983","date":"2026-04-29T21:09:20+00:00","index":45,"fulltext":""},{"type":"reviewerAgreed","content":"185000728922921838834408347985859457644","date":"2026-04-29T15:33:45+00:00","index":44,"fulltext":""}],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-05-07T19:33:02+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-07 19:33:02","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9254199","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9254199","identity":"rs-9254199","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

