The predictive value of neural network models and random forest models for the classification of cervical intraepithelial lesions based on gene methylation and HPV infection genotype

doi:10.21203/rs.3.rs-9365221/v1


2026 · doi:10.21203/rs.3.rs-9365221/v1

preprint OA: closed


Research Article

Dilibaier Anwaier, Guzailinuer Maimaitituersun, Ayituersun Yasen, and 6 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9365221/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted 12

Abstract

Objective
This study aims to evaluate the value of neural network (NN) and random forest (RF) models that integrate gene methylation markers and HPV infection typing data for predicting the grade of cervical intraepithelial neoplasia (CIN), with the goal of providing a new tool for precise clinical cervical cancer screening.
Methods
Clinical data of 138 patients with cervical lesions treated from September 2024 to September 2025 were retrospectively collected: 36 cases (26.1%) in the control group, 63 cases (45.6%) with low-grade squamous intraepithelial lesion (LSIL), and 39 cases (28.3%) with high-grade squamous intraepithelial lesion (HSIL). Methylation of six genes (ASTN1, DLX1, ITGA4, RXFP3, SOX17, ZNF671) was detected. NN and RF models were constructed, and the AUC, sensitivity, specificity, and accuracy of the two models for each type of cervical intraepithelial lesion were compared and validated.

Results
General clinical data, laboratory indicators, and pathological indicators were compared among three groups: controls negative for intraepithelial lesion or malignancy (NILM), LSIL, and HSIL. CD4⁺ T cells, IL-2, HPV infection typing, and gene methylation differed significantly among the three groups (P < 0.05). In the validation set, the sensitivity, specificity, accuracy, and AUC of the random forest model in predicting the different types of cervical intraepithelial lesions were equal to or higher than those of the neural network model.

Conclusion
The random forest model integrating gene methylation and HPV infection typing performed better than the neural network model in predicting CIN grade, with high precision and clinical interpretability. It can serve as an efficient tool for the triage of HPV-positive populations and contribute to the optimization of cervical cancer screening strategies.
Keywords: Cervical intraepithelial neoplasia; Gene methylation; HPV infection typing; Neural network; Random forest; Prediction model; Machine learning

Introduction

Cervical cancer, one of the most common cancers and the fourth leading cause of cancer-related death, is associated with numerous risk factors, including human papillomavirus (HPV) infection, early onset of sexual activity, multiple sexual partners, use of oral contraceptives, and smoking. Early screening and effective prevention therefore play vital roles [1]. Cervical intraepithelial neoplasia (CIN) is a precancerous stage in the development of cervical cancer and provides an opportunity for early detection and intervention [2]. Although persistent HPV infection is recognized as the principal etiological factor for cervical cancer, only a minority of HPV-infected women develop malignant progression, and most infections are cleared by host immunity. This low progression rate results in a limited positive predictive value of HPV testing for high-grade cervical lesions, which restricts its clinical utility [3, 4]. Clearance occurs because mucosal immunity can eliminate local HPV infections [5]. Effective HPV elimination requires mucosal and systemic immunity to work in concert: CD8⁺ cytotoxic T lymphocytes (CTLs) eliminate infected cells within the cervical epithelium, while CD4⁺ T cells provide the necessary help through cytokine networks [6]. In the local vaginal environment, activated CD4⁺ T cells differentiate into Th1 subsets and secrete IL-2, IL-12, and IFN-γ to amplify CTL responses and shape antiviral immunity [7]. Among these cytokines, IL-2, a key regulatory factor, has been shown to have anti-tumor functions [8].
Recent studies have revealed dynamic immunological changes during the course of cervical intraepithelial neoplasia. In early-stage lesions characterized by low-grade CIN, compensatory immune activation occurs in the cervical region, evidenced by increased numbers of both CD4⁺ and CD8⁺ T cells in the cervical epithelium and elevated local IL-2 levels [9]. However, this activation correlates negatively with the severity of the cervical intraepithelial lesions [10]. As the disease progresses, both local mucosal and systemic immunity are depleted, reflected by declining IL-2 concentrations and reduced CD4⁺ T cell counts in peripheral blood [6, 11]. Peripheral blood biomarkers, particularly serum IL-2 and CD4⁺ T cell counts, may therefore reflect the systemic impact of sustained viral attack and offer a way to assess disease burden.

Many screening methods for cervical cancer are currently available. In addition to HPV DNA testing, liquid-based cytology (LBC) and the HPV-cytology co-testing strategy have been validated to enhance diagnostic sensitivity and specificity, enabling safe extension of screening intervals [12]. Emerging approaches, including next-generation sequencing (NGS), p16 immunohistochemistry (IHC) for surrogate detection of transforming HPV infection, multiplex molecular biomarker panels, and artificial intelligence-assisted cervical cancer screening systems (AI-CCS), show great potential in the early screening and risk stratification of cervical intraepithelial neoplasia (CIN) [13]. Studies have shown that HPV infection of the host cell causes abnormal methylation of the promoter regions of host tumor suppressor genes, leading to their transcriptional silencing and facilitating cervical carcinogenesis.
The methylation level increases with the duration of HPV infection and the severity of cervical intraepithelial neoplasia, peaking on progression to cervical cancer [14]. Detecting the degree of methylation of the host cell genome therefore makes it possible to predict the grade of cervical intraepithelial neoplasia and to assess the risk of cervical cancer. For instance, the commercially available GynTect® assay has been validated in many studies: this DNA methylation detection panel, targeting the ASTN1, DLX1, ITGA4, RXFP3, SOX17, and ZNF671 genes, shows high clinical accuracy in identifying cervical precancerous lesions and stratifying the risk of progression [15, 16]; among these genes, ZNF671 methylation shows the highest diagnostic specificity [17]. This test is particularly applicable to high-risk groups with negative HPV status, filling gaps in existing screening strategies and providing a molecular basis for risk-stratified management. Given the complex nonlinear interactions among these biomarkers (such as methylation levels and HPV types), traditional statistical models cannot fully capture their combined predictive efficacy [18]. Machine learning algorithms that can handle interactions among high-dimensional features are therefore needed to construct a more accurate risk stratification model.

Machine learning techniques can effectively uncover nonlinear correlations in multi-dimensional data. Backpropagation neural networks (BPNN), built on the multi-layer perceptron (MLP) architecture, iteratively optimize predictions by minimizing the deviation between output and target, and have been applied predominantly in materials science [18, 19]; their biomedical applications remain sparse. Pioneering efforts include cervical cancer classification from colposcopic images (Wutsqa et al. [20]), breast cancer histology categorization (Kaymak et al.
[21]), and automated CIN diagnosis using acetic acid test features, where BPNN reportedly outperformed KNN and RF (Du et al. [22]). Conversely, Random Forest (RF) ensembles, which aggregate predictions across multiple decision trees [23, 24], have gained broader biomedical traction because of their inherent robustness with heterogeneous clinical data, consistent with the performance of our CIN risk stratification model (AUC: 0.999 vs. 0.996 for BPNN). For instance, Shi et al. identified differentially expressed genes (DEGs) through high-throughput sequencing and constructed an RF prediction model, which exhibited robust performance in the prognostic assessment of cervical cancer [12]. Notably, in a comparative analysis of RF and SVM models for predicting regression stages of cervical intraepithelial neoplasia, RF significantly outperformed SVM [25], underscoring the potential superiority of ensemble learning algorithms in forecasting disease progression trajectories.

Current cervical screening models rely predominantly on histopathology, colposcopy, or routine blood biomarkers [25–29]; they omit the integration of host-genome methylation, peripheral immune markers (IL-2/CD4⁺ T cells), and HPV genotyping, which represents a significant knowledge gap. Furthermore, few studies have compared Random Forest (RF) and Backpropagation Neural Network (BPNN) performance using identical multimodal features for CIN stratification [22]. To address these two deficiencies, this study presents: (1) the first combined methylation-immune-HPV predictive framework; and (2) a head-to-head algorithmic benchmark of RF versus BPNN under matched feature conditions, ultimately identifying the better-performing model (RF: AUC 0.999) for clinical deployment as an automated, evidence-based triage tool.

1. Materials and Methods

1.1 General information

This study was approved by the Ethics Committee of our hospital.
A total of 138 patients with cervical lesions diagnosed and treated in our hospital from September 2024 to September 2025 were selected as the study group, including 36 controls, 63 cases in the low-grade squamous intraepithelial lesion (LSIL) group, and 39 cases in the high-grade squamous intraepithelial lesion (HSIL) group. The inclusion criteria were as follows: (1) age ≥ 18 years, with a history of sexual intercourse and an intact cervix; (2) no severe immunodeficiency, such as HIV infection, a history of organ transplantation, or treatment with immunosuppressants; (3) willingness to undergo routine cervical cancer screening and methylation testing. The exclusion criteria were known malignant tumors of the female genital tract or other malignant tumors still under treatment. All study subjects participated voluntarily and signed informed consent forms. A total of 5840 women who underwent opportunistic cervical screening were included in the study. Among them, 884 underwent colposcopy and cervical histopathology, and 138 had relatively complete clinical data.

1.2 Grouping of research subjects: Taking the histopathological result of colposcopic biopsy as the gold standard, the research subjects were divided into three groups: a negative (control) group of patients with negative histopathological results or cervical inflammation, and two positive groups of patients with histopathological LSIL and HSIL, respectively.

1.3 Collection of HPV and TCT specimens: The collection method was the same for both specimens. With the patient in the lithotomy position, a vaginal speculum was used to fully expose the cervix. A cervical brush was placed at the transformation zone of the cervix with the brush head inside the cervical canal, and the brush was rotated in one direction to collect exfoliated cervical cells. The samples were sent to the pathology department of our hospital for HPV and TCT testing.
For HPV typing, detection of any of types 81, 68, 66, 58, 59, 56, 53, 52, 51, 45, 39, 35, 33, 31, 16, or 18 was judged as high-risk HPV-positive. A TCT result of ≥ ASCUS was judged as TCT-positive.

1.4 DNA methylation marker analysis

Exfoliated cervical cells were tested using the Gong An Li multi-gene methylation detection kit, which covers 6 markers: ASTN1, DLX1, ITGA4, RXFP3, SOX17, and ZNF671. Detection proceeded in three steps: cell lysis, bisulfite treatment, and real-time fluorescence polymerase chain reaction (PCR). After cell lysis, the DNA was treated with bisulfite to fix its methylation state. The template DNA was then analyzed in 6 independent methylation-specific real-time PCR reactions to selectively amplify the methylated DNA regions, while real-time fluorescent dyes were used to detect the cervical cancer methylation markers and the quality-control markers. Each marker was scored according to its Ct value, Tm value, and ΔCt reference range. A comprehensive score of ≥ 0.5 points was judged as methylation-positive.

1.5 Model Construction and Evaluation

1.5.1 Neural Network (NN): A 3-layer fully connected network is adopted: the input layer takes 8 features, followed by two hidden layers (64 and 32 nodes, ReLU activation) and an output layer for 4-class classification (NILM/LSIL/HSIL/CC). The Adam optimizer is used, and the dropout rate is set to 0.3 to prevent overfitting.

1.5.2 Random Forest (RF): The number of decision trees is set to 100, the maximum depth to 10, the minimum number of samples required to split an internal node to 4, and the number of randomly selected features to 3. Single-feature models (methylation only or HPV typing only) and a traditional logistic regression model are constructed as controls.
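As an illustration only, the random forest settings listed in Section 1.5.2 map onto scikit-learn's `RandomForestClassifier` roughly as follows. This is a minimal sketch, not the authors' code: the data are synthetic stand-ins, and the three-class label coding (0 = NILM, 1 = LSIL, 2 = HSIL) and the random seed are assumptions introduced for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): 138 samples, 8 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(138, 8))
# Hypothetical label coding: 0 = NILM, 1 = LSIL, 2 = HSIL.
y = rng.integers(0, 3, size=138)

# Hyperparameters as stated in Section 1.5.2.
rf = RandomForestClassifier(
    n_estimators=100,     # number of decision trees
    max_depth=10,         # maximum depth of each tree
    min_samples_split=4,  # minimum samples required to split an internal node
    max_features=3,       # features considered at each split
    random_state=0,       # fixed seed so the sketch is reproducible
)
rf.fit(X, y)

# One probability column per class; each row sums to 1.
proba = rf.predict_proba(X)
```

Because the labels here are random, the fitted model carries no clinical meaning; the sketch only shows where each stated setting would go.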
1.5.3 Evaluation Metrics: Accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the AUC of the ROC curve are used to evaluate model performance. Feature importance and interaction effects are analyzed with SHAP (SHapley Additive exPlanations).

1.6 Statistics

Python 3.9 with the Scikit-learn and TensorFlow frameworks is used for model building, and SPSS 26.0 is used for statistical analysis. Measurement data are expressed as mean ± standard deviation and compared between groups with the t-test; count data are expressed as rates and compared with the χ² test. A P value < 0.05 is considered statistically significant.

2. Results

2.1 Patient clinicopathological characteristics

Table 1 compares general clinical data, laboratory indicators, and pathological indicators among the three groups in the training set: negative for intraepithelial lesion or malignancy (NILM), low-grade squamous intraepithelial lesion (LSIL), and high-grade squamous intraepithelial lesion (HSIL).
Table 1 Comparison of general clinical data, laboratory indicators and pathological indicators among the three groups of patients

| Characteristics | NILM group (n = 36) | LSIL group (n = 63) | HSIL group (n = 39) | F / χ² | P |
|---|---|---|---|---|---|
| Age (years) | 45.31 ± 4.64 | 45.68 ± 4.53 | 44.18 ± 4.45 | 0.435 | 0.576 |
| BMI (kg/m²) | 22.83 ± 2.25 | 23.01 ± 2.16 | 22.18 ± 2.12 | 0.654 | 0.467 |
| Diabetes mellitus, n (%) | 8 (22.2%) | 10 (15.9%) | 11 (28.2%) | 2.250 | 0.325 |
| Age at first sexual experience < 18, n (%) | 2 (5.6%) | 4 (6.3%) | 7 (17.9%) | 4.651 | 0.098 |
| Age at first pregnancy < 20, n (%) | 3 (8.3%) | 5 (7.9%) | 8 (20.5%) | 4.222 | 0.121 |
| CD4⁺ T (%) | 36.40 ± 3.86 | 34.42 ± 2.41 | 33.92 ± 2.81 | 7.533 | 0.001 |
| CD8⁺ T (%) | 16.25 ± 4.65 | 14.88 ± 4.18 | 15.15 ± 4.11 | 1.848 | 0.162 |
| CD4⁺ T/CD8⁺ T | 2.34 ± 0.43 | 2.44 ± 0.51 | 2.42 ± 0.45 | 0.494 | 0.611 |
| IL-2 (pg/mL) | 9.00 ± 1.89 | 8.76 ± 1.60 | 7.87 ± 1.47 | 5.184 | 0.007 |
| IL-6 (pg/mL) | 6.70 ± 0.66 | 6.71 ± 0.64 | 6.51 ± 0.61 | 1.341 | 0.265 |
| TNF-α (pg/mL) | 9.82 ± 0.97 | 9.89 ± 0.93 | 9.54 ± 0.91 | 1.783 | 0.172 |
| IFN-λ (pg/mL) | 11.24 ± 1.16 | 11.27 ± 1.14 | 10.97 ± 1.07 | 0.862 | 0.424 |
| HPV infection typing (no/low/high risk), n | 29/7/0 | 50/13/0 | 0/4/35 | 124.56 | < 0.001 |
| DNA methylation positive, n (%) | 2 (5.6%) | 9 (14.3%) | 35 (89.7%) | 78.632 | < 0.001 |

2.2 Construction of the BPNN Model

A back-propagation neural network (BPNN) model was constructed with four input neurons corresponding to CD4⁺ T cells, IL-2 levels, HPV infection subtypes, and gene methylation status. Hyperparameters were optimized by grid search with 5-fold cross-validation. The optimal parameters were: network topology 4-6-3 (hidden = c(6)), maximum iterations 10^7 (stepmax = 10^7), and learning rate 0.1. After 311,741 weight-update iterations, the loss function converged to a minimum value of 11.252035, indicating that model fitting had converged (Fig. 1).

2.3 Construction of the Random Forest Model

A random forest model was constructed based on four characteristic variables: CD4⁺ T, IL-2, HPV infection typing, and gene methylation.
During modeling, approximately one-third of the training set was left out of each bootstrap sample as out-of-bag (OOB) data. A total of 1000 decision trees were generated (ntree = 1000, mtry = 2, nodesize = 8). The OOB prediction error rates for NILM, LSIL, HSIL, and overall stabilized at 38.89%, 19.04%, 10.26%, and 21.74%, respectively. All four characteristic variables contributed to predicting the classification of cervical intraepithelial lesions (Fig. 2 and Fig. 3).

2.4 External validation of the prediction efficacy of neural network models and random forest models

In the validation set, the sensitivity, specificity, accuracy, and AUC of the random forest model in predicting the different types of cervical intraepithelial lesions were equal to or higher than those of the neural network model (Fig. 4, Fig. 5, Fig. 6, and Table 2).

Table 2 AUC, sensitivity, specificity, and accuracy of the validation set for neural network (NN) models and random forest (RF) models in different types of cervical intraepithelial lesions

| Test | NN: NILM (n = 22) | NN: LSIL (n = 45) | NN: HSIL (n = 28) | RF: NILM (n = 22) | RF: LSIL (n = 45) | RF: HSIL (n = 28) |
|---|---|---|---|---|---|---|
| AUC (95% CI) | 0.921 (0.863–0.979) | 0.913 (0.858–0.969) | 0.996 (0.988–1.000) | 0.954 (0.915–0.992) | 0.957 (0.922–0.993) | 0.999 (0.996–1.000) |
| Sensitivity (%) | 86.4 (19/22) | 71.1 (32/45) | 89.3 (25/28) | 86.4 (19/22) | 82.2 (37/45) | 89.3 (25/28) |
| Specificity (%) | 82.2 (60/73) | 84.0 (42/50) | 100.0 (67/67) | 89.0 (65/73) | 88.0 (44/50) | 100.0 (67/67) |
| Accuracy (%) | 83.2 (79/95) | 77.9 (74/95) | 96.8 (92/95) | 88.4 (84/95) | 85.3 (81/95) | 96.8 (92/95) |

3. Discussion

In this study, significant differences in CD4⁺ T cell counts, IL-2 levels, gene methylation status, and HPV genotyping were observed among the three patient groups (Table 1). Consequently, we developed Backpropagation Neural Network (BPNN) and Random Forest (RF) models to evaluate their predictive performance for cervical squamous intraepithelial lesions.
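The per-class percentages in Table 2 follow from one-vs-rest counts: for a given class, sensitivity is TP/(TP + FN), specificity is TN/(TN + FP), and accuracy is (TP + TN)/N. The short sketch below recomputes the LSIL column for both models from the fractions printed in Table 2; the function name `one_vs_rest_metrics` is a hypothetical helper introduced for this illustration.

```python
def one_vs_rest_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return sensitivity, specificity and accuracy as percentages (1 decimal)."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": round(100 * tp / (tp + fn), 1),  # true positives among class members
        "specificity": round(100 * tn / (tn + fp), 1),  # true negatives among non-members
        "accuracy": round(100 * (tp + tn) / total, 1),  # all correct calls among all samples
    }

# LSIL counts read from Table 2 (validation set, N = 95).
# RF: sensitivity 37/45, specificity 44/50.
rf_lsil = one_vs_rest_metrics(tp=37, fn=8, tn=44, fp=6)
# NN: sensitivity 32/45, specificity 42/50.
nn_lsil = one_vs_rest_metrics(tp=32, fn=13, tn=42, fp=8)

print(rf_lsil)  # {'sensitivity': 82.2, 'specificity': 88.0, 'accuracy': 85.3}
print(nn_lsil)  # {'sensitivity': 71.1, 'specificity': 84.0, 'accuracy': 77.9}
```

The recomputed values match the LSIL entries of Table 2 for both models.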
Feature importance analysis in the RF model revealed that HPV genotyping contributed most, followed by gene methylation and CD4⁺ T cells. Although IL-2 had a lower predictive contribution, it still helped to optimize decision tree structures (Fig. 3), supporting the rationale for incorporating these four variables. External validation via ROC curves showed that the RF model outperformed BPNN in specificity, sensitivity, AUC, and accuracy for predicting low-grade lesions (LSIL), and matched or slightly exceeded it for high-grade lesions (HSIL) (Figs. 4–6, Table 2). The RF-based model achieved an AUC of 0.957 (95% CI: 0.922–0.993) in predicting LSIL and 0.999 (95% CI: 0.996–1.000) in predicting HSIL, demonstrating robust diagnostic potential.

Current HPV testing has some predictive value for high-grade squamous intraepithelial lesions of the cervix, but its specificity is limited, which may lead infected women to undergo unnecessary examinations such as colposcopy and biopsy [30, 31]. Since some high-grade squamous intraepithelial lesions of the cervix regress naturally, the lack of an effective triage strategy may bring additional health risks to women of childbearing age [32]. Studies indicate that DNA methylation testing in HPV-positive patients not only optimizes triage but also predicts disease progression [33]. For instance, the GynTect® methylation panel (ASTN1, DLX1, ITGA4, RXFP3, SOX17, ZNF671) detects methylation markers in specific gene regions and exhibits superior specificity over conventional cytology (TCT) in predicting cervical lesion severity among hrHPV-infected individuals [34]. Recent research using a novel six-gene methylation panel (FAM19A4/PHACTR3/SST/ZIC1/PAX1/ZNF671) achieved better predictive performance (sensitivity 89.6%, specificity 95.0%, AUC = 0.969) than the GynTect® panel [35].
This superiority may be attributed to the inclusion of the key genes ZNF671 and PAX1: both show comparable sensitivity for CIN3⁺ detection, but ZNF671 demonstrates superior specificity [17]. In further support, a methylation-sensitive restriction enzyme qPCR (MSRE-qPCR) study identified a tri-gene panel (PAX1/ZNF671/ASCL1), suggesting that combined detection of PAX1 and ZNF671 may improve cervical lesion prediction [36].

We selected the Random Forest (RF) algorithm as our final predictive model because of its strong predictive performance in classification tasks and its natural suitability for the heterogeneous feature structure of our clinical dataset. As an ensemble classifier that aggregates predictions from multiple decision trees through majority voting, RF inherently accommodates mixed data types, including both categorical and continuous variables, while remaining resilient to outliers. Although the predictive power of an individual decision tree is limited, RF mitigates this limitation through a dual randomization strategy: (i) bootstrap aggregation (bagging), which generates diverse training subsets by sampling with replacement from the original dataset, preserving the overall data distribution while enhancing tree diversity; and (ii) random feature selection at each node, where only a subset of predictors is considered when determining the optimal split, which reduces inter-tree correlation and improves generalization. These mechanisms substantially reduce overfitting risk and raise predictive accuracy beyond that of single-tree models [24, 37]. Critically, our RF-based model addresses three persistent challenges in conventional cervical cancer screening: the subjectivity of cytological interpretation, limited accessibility, and reliance on specialist professionals. This makes it especially viable for resource-constrained settings where advanced imaging modalities or invasive diagnostics are not accessible [38].
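The bagging step described above is also what produces the "approximately one-third" out-of-bag share mentioned in Section 2.3: when n samples are drawn with replacement from n, each sample is missed with probability (1 - 1/n)^n, which approaches 1/e (about 0.368) as n grows. The sketch below checks this by simulation for n = 138; the number of simulated trees, the seed, and the helper name `oob_fraction` are assumptions made for the illustration.

```python
import random

def oob_fraction(n_samples: int, rng: random.Random) -> float:
    """Fraction of samples never drawn in one bootstrap sample of size n_samples."""
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}  # unique indices drawn
    return 1 - len(drawn) / n_samples

rng = random.Random(0)  # fixed seed (assumption) for reproducibility
n = 138                 # training-set size reported in the paper

# Simulate the bootstrap sample behind each of 1000 hypothetical trees.
fractions = [oob_fraction(n, rng) for _ in range(1000)]
mean_oob = sum(fractions) / len(fractions)

# Closed-form probability that a given sample is left out of one bootstrap draw.
expected = (1 - 1 / n) ** n

print(f"simulated mean OOB fraction: {mean_oob:.3f}")
print(f"theoretical OOB fraction:    {expected:.3f}")
```

The out-of-bag samples of each tree act as a built-in hold-out set, which is how the OOB error rates in Section 2.3 can be estimated without a separate validation split.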
In this study, the RF model performed well in grading cervical squamous intraepithelial lesions, with an area under the curve (AUC) of 0.957 (95% CI: 0.922–0.993) for low-grade squamous intraepithelial lesions (LSIL) and 0.999 (95% CI: 0.996–1.000) for high-grade squamous intraepithelial lesions (HSIL), higher than the AUCs reported for existing machine learning models. For instance, Yuan et al. developed a CIN classification model using colposcopy images, reporting an AUC of 0.93 [26]; He et al. established an RF risk prediction model combining clinical parameters and pathological images, achieving an AUC of 0.866 [25]; Li et al. predicted HSIL misclassified as LSIL using clinical parameters (including HPV genotyping, cytology, and colposcopy results), with an AUC of 0.936 [27]; and Farzaneh et al. constructed a CIN severity prediction model based on clinical and demographic data, reporting an AUC of 0.944 [28]. The superior performance of our model likely stems from the following factors.

(1) Variable selection: We prioritized biologically and clinically validated predictors, including HPV genotype, host cell gene methylation levels, peripheral blood IL-2 concentration, and CD4⁺ T-cell count, which capture multifactorial pathogenic mechanisms more comprehensively than models relying on narrower data sources.

(2) Algorithmic advantages: As an ensemble learning technique, RF enhances stability and reduces overfitting risk by aggregating multiple decision trees, and it excels in structured data analysis, where it consistently outperforms deep learning approaches when image features are absent [23, 39].
(3) Data and model consistency: While backpropagation neural networks (BPNNs) demonstrate superiority in processing high-dimensional unstructured inputs (such as colposcopy image classification) [22], our study focuses on structured variables, for which our RF model showed better predictive performance than BPNN and the other machine learning models mentioned above [20].

(4) Sample size limitation: Although our validation cohort comprised 95 samples (training/validation ratio = 138/95), its relatively modest size may limit the precision of comparative AUC estimates, so further validation with larger cohorts is needed.

This study advances CIN risk stratification by integrating novel biomarker categories, including DNA methylation signatures, HPV genotyping profiles, and immunological indicators (IL-2/CD4⁺ T cells), into a comparative machine learning framework (RF vs. BPNN). Our feature selection strategy specifically targets biological pathways driving CIN progression [40], a critical distinction from prior models reliant on peripheral blood parameters alone. For instance, models based solely on hematological indices (e.g., creatinine, RBC counts) achieved limited discrimination (AUC ≤ 0.75) despite their clinical accessibility [29]. Similarly, frameworks using inflammatory and coagulation markers attained modest performance (AUC ~ 0.7) for cervical neoplasia outcomes [41]. The marked superiority of our approach (AUC: 0.957–0.999) underscores the need to incorporate direct carcinogenesis biomarkers, particularly methylation and localized immune response metrics, for precision screening.

However, as a single-center retrospective study from Kashgar with a limited sample size (n = 233), this work is subject to potential selection bias, and the small sample may widen confidence intervals for sensitivity and specificity estimates and reduce the power to discriminate between triage strategies.
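To give a sense of how wide such intervals can be at this sample size, the sketch below computes a 95% confidence interval for the RF model's LSIL sensitivity (37/45, Table 2). The Wilson score interval is used here as one common choice; it is an assumption of this illustration, since the paper does not state how intervals for sensitivity or specificity would be derived, and `wilson_interval` is a hypothetical helper name.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# RF sensitivity for LSIL in the validation set: 37 of 45 detected (Table 2).
low, high = wilson_interval(37, 45)
print(f"sensitivity = {37 / 45:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

With only 45 LSIL cases, the interval spans roughly 0.69 to 0.91, which illustrates why larger cohorts are needed before the two models' sensitivities can be separated with confidence.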
The absence of an independent test set, a consequence of scarce clinical samples in this understudied population, precludes definitive assessment of real-world generalizability. Our training/validation split (138:95) further limited the depth of hyperparameter optimization. Although feature importance analyses (MDA and Gini index; Fig. 3) provide mechanistic insight into predictor contributions, they cannot compensate for the risk of performance overestimation inherent in single-dataset validation [23]. Future multi-center cohorts should verify these findings. Because of the relatively high incidence of cervical cancer in this region, the proportion of HSIL in our patient samples is relatively high (29% vs. a national average of 10%), which may bias the calculated HSIL risk and referral rate and limits the applicability of our results to a broader screening population. It does, however, allow the relative performance of the two models to be evaluated effectively [42–44]. Moreover, our high-performance models rely on specialized equipment, expert interpretation, and substantial costs, which limits their scalability in resource-constrained environments [45]. Furthermore, the absence of longitudinal data precludes assessment of long-term predictive value, and the omission of other biomarker-based models hinders definitive conclusions on the optimal framework. Despite these constraints, our RF model, used as a triage strategy integrating methylation and HPV genotyping, could reduce unnecessary colposcopy referrals, alleviating cervical trauma and financial burden. Future work should validate these findings in large prospective cohorts and explore highly specific novel biomarkers (e.g., serum proteins replacing methylation assays) to streamline cost-effective screening protocols.

4. Conclusion

In general, this study shows that the random forest model built on gene methylation, HPV infection typing, peripheral blood IL-2, and CD4⁺ T cell content has a higher predictive value for the typing of cervical intraepithelial neoplasia than the BPNN model, and it is expected to become a powerful tool for early clinical screening of cervical cancer. To ensure the reliability of the findings, validation in larger and more diverse sample populations is needed.

Declarations

Acknowledgements
Not applicable.

Author Contributions
Dilibaier Anwaier designed the experiment for this study and interpreted the data; Guzailinuer Maimaitituersun carried out the experiment and interpreted the data; Ayituersun Yasen wrote and edited the manuscript; Guo Yu Sha, Renagu Aishan, Guzhanur Awuti, Xu Yan and others collected and analyzed the patient data; Hanikezi Tuersun, Feng Qin, etc. guided this work. All authors participated in the discussion of the results, contributed to the final manuscript, and approved it.

Corresponding authors
Correspondence to Hanikezi Tuersun or QIN Feng.

Ethics Statement
This study was approved by the Medical Ethics Committee for Scientific Research Projects of the Second People's Hospital in Kashgar Region (Approval Number: [2023]20).

Funding
This study is partially funded by the State Key Laboratory Pathogenesis, Prevention and Treatment of High Incidence Disease in Central Asia, Xinjiang Medical University, Nos. SKL-HIDCA-2023-KE 6.

Conflicts of Interest
The authors declare no conflicts of interest.

Data Availability Statement
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

References

1. Wang J, Yu Y, Tan Y, Wan H, Zheng N, He Z, Mao L, Ren W, Chen K, Lin Z, et al. Artificial intelligence enables precision diagnosis of cervical cytology grades and cervical cancer. Nat Commun. 2024;15(1):4369.
2. Özgürlük I, Erdem HB, Alay MT, Oktar O, Sahin-Duran F, Çevik-Demir E, Tezcan AY, Keskin HL. Unveiling mitochondrial DNA copy number alterations: Insights into progression from cervical intraepithelial neoplasia to cervical cancer. Oncol Lett. 2026;31(3).
3. Coquillard G, Palao B, Patterson BK. Quantification of intracellular HPV E6/E7 mRNA expression increases the specificity and positive predictive value of cervical cancer screening compared to HPV DNA. Gynecol Oncol. 2011;120(1):89–93.
4. de Sanjosé S, Brotons M, Pavón MA. The natural history of human papillomavirus infection. Best Pract Res Clin Obstet Gynecol. 2018;47:2–13.
5. Sasagawa T, Takagi H, Makinoda S. Immune responses against human papillomavirus (HPV) infection and evasion of host defense in cervical cancer. J Infect Chemother. 2012;18(6):807–15.
6. Brito MJ, Sequeira P, Silva I, Quintas A, Martins C, Félix A. CD4+ and CD8+ cell populations in HIV-positive women with cervical squamous intra-epithelial lesions and squamous cell carcinoma. Int J Infect Dis. 2021;103:370–7.
7. Xu HM. Th1 cytokine-based immunotherapy for cancer. Hepatobiliary Pancreat Dis Int. 2014;13(5):482–94.
8. Jiang T, Zhou CC, Ren SX. Role of IL-2 in cancer immunotherapy. Oncoimmunology. 2016;5(6).
9. Zhu RX, Yang AM, Wang WH, Zhao WH, Wang W, Wang ZL, Wang JT, Hou YL, Su XQ, Zhang LL, et al. A population-based study of interactions between high-risk human papillomavirus infection and vaginal local cytokines CD4 CD8 IL-10 with cervical intraepithelial neoplasia. Front Oncol. 2025;15.
10. Zhu RX, Wang WH, Yang AM, Zhao WH, Wang W, Wang ZL, Wang JT, Hou YL, Su XQ, Zhang LL, et al. Interactions between vaginal local cytokine IL-2 and high-risk human papillomavirus infection with cervical intraepithelial neoplasia in a Chinese population-based study. Front Cell Infect Microbiol. 2023;13.
11. Daniilidis A, Koutsos J, Oikonomou Z, Nasioutziki M, Hatziparadisi K, Tantanasis T. Cytokines of Cervical Mucosa and Human Papilloma Virus Infection of the Cervix: A Descriptive Study. Acta Cytol. 2016;60(1):58–64.
12. Shi R, Chang L, Shi L, Zhang Z, Zhang L, Li X. Development and validation of a prognostic model for cervical cancer by combination of machine learning and high-throughput sequencing. Eur J Surg Oncol. 2024;50(4):108241.
13. Raju K, Kukkamgai HM, Kamarthi PP. Evolution of Cervical Cancer Screening Techniques: A Concept of Translation Medicine. Indian J Surg Oncol. 2025.
14. Peng SX, Zhang XW, Wu YJ. Potential applications of DNA methylation testing technology in female tumors and screening methods. Biochim Et Biophys Acta-Reviews Cancer. 2023;1878(5).
15. Vieira-Baptista P, Costa M, Hippe J, Sousa C, Schmitz M, Silva AR, Hansel A, Preti M. Evaluation of Host Gene Methylation as a Triage Test for HPV-Positive Women-A Cohort Study. J Lower Genit Tract Dis. 2024;28(4):326–31.
16. Lin CY, Zhu CY, Xie MH, Yang H. Analysis of the triage value of multigene methylation testing for CIN2+ in hrHPV-positive patients. Infect Agents Cancer. 2025;20(1).
17. Zhu P, Xiong J, Yuan D, Li X, Luo LL, Huang J, Wang BB, Nie QF, Wang SL, Dang LY, et al. ZNF671 methylation test in cervical scrapings for cervical intraepithelial neoplasia grade 3 and cervical cancer detection. Cell Rep Med. 2023;4(8).
18. Jordan MI, Mitchell TM. Machine learning: Trends, perspectives, and prospects. Science. 2015;349(6245):255–60.
19. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323(6088):533–6.
20. Wutsqa DU, Marwah M. Median Filter Noise Reduction of Image and Backpropagation Neural Network Model for Cervical Cancer Classification. In: 1st International Conference on Mathematics - Education, Theory and Application (ICMETA): Dec 06–07 2017 2016; Univ Sebelas Maret, Surakarta, Indonesia; 2016.
21. Kaymak S, Helwan A, Uzun D. Breast cancer image classification using artificial neural networks. In: 9th International Conference on Theory and Application of Soft Computing, Computing with Words and Perception (ICSCCW): Aug 22–25 2017; Budapest, Hungary; 2017: 126–131.
22. Du HW, Liu J, Lu H. A Cervical Intraepithelial Neoplasia Classification Method Using Feature Extraction and Back Propagation Neural Network. In: IEEE 4th Information Technology and Mechatronics Engineering Conference (ITOEC): Dec 14–16 2018; Chongqing, Peoples R China; 2018: 794–798.
23. Binson VA, Thomas S, Subramoniam M, Arun J, Naveen S, Madhu S. A Review of Machine Learning Algorithms for Biomedical Applications. Ann Biomed Eng. 2024;52(5):1159–83.
24. Becker T, Rousseau AJ, Geubbelmans M, Burzykowski T, Valkenborg D. Decision trees and random forests. Am J Orthod Dentofac Orthop. 2023;164(6):894–7.
25. He SM, Zhu GM, Zhou Y, Yang BR, Wang JP, Wang ZX, Wang T. Predictive models for personalized precision medical intervention in spontaneous regression stages of cervical precancerous lesions. J Translational Med. 2024;22(1).
26. Yuan C, Yao Y, Cheng B, Cheng Y, Li Y, Li Y, Liu X, Cheng X, Xie X, Wu J, et al. The application of deep learning based diagnostic system to cervical squamous intraepithelial lesions recognition in colposcopy images. Sci Rep. 2020;10(1):11639.
27. Li D, Wang Z, Liu Y, Zhou M, Xia B, Zhang L, Chen K, Zeng Y. Assessing the risk of high-grade squamous intraepithelial lesions (HSIL+) in women with LSIL biopsies: a machine learning-based study. Infect Agent Cancer. 2024;19(1):61.
28. Farzaneh F, Soltani A, Dastyar F, Mohammadi M, Hosseini MS. AI-Driven predictive modeling of cervical intraepithelial neoplasia severity: a comprehensive analysis with clinical adoption frameworks. BMC Cancer. 2025;25(1).
29. Yue CB, Liu SC, Wang WH, Zhao Y, Zhang XF, Zhao GH. Machine learning in early screening for high-grade cervical intraepithelial neoplasia using blood testing. BMC Med Inf Decis Mak. 2025;26(1).
Péfoyo AJK, Wang L, Gao JL, Kupets R. Are Women Who Exit Colposcopy Without Treatment at Elevated Risk for Cervical Cancer? J Lower Genit Tract Dis. 2017;21(1):47–54. Leeman A, del Pino M, Marimon L, Torné A, Ordi J, ter Harmsel B, Meijer C, Jenkins D, Van Kemenade FJ, Quint WGV. Reliable identification of women with CIN3 + using hrHPV genotyping and methylation markers in a cytology-screened referral population. Int J Cancer. 2019;144(1):160–8. Moscicki AB, Ma YF, Wibbelsman C, Darragh TM, Powers A, Farhat S, Shiboski S. Rate of and Risks for Regression of Cervical Intraepithelial Neoplasia 2 in Adolescents and Young Women. Obstet Gynecol. 2010;116(6):1373–80. Wang YG, Ma XP, Zhang T, Wang LF, Liu YL, Zhou GH, Wu HP. Research progress of DNA methylation in the screening of cervical cancer and precancerous lesions. Interdisciplinary Med 2025, 3(1). Ren YF, Qin FJ, Shen L, Li LF, Wu QR, Yi P. Triage of women with a positive HPV DNA test: evaluating a DNA methylation panel for detecting cervical intraepithelial neoplasia grade 3 and cervical cancer in cervical cytology samples. BMC Cancer 2025, 25(1). Ding H, Ke ZH, Xiao X, Xin BB, Xiong H, Lu W. A Predictive Model Using Six Genes DNA Methylation Markers to Identify Individuals With High Risks of High-Grade Squamous Intraepithelial Lesions and Cervical Cancer. Int J Womens Health. 2025;17:739–49. Chen SM, Dai HS, Hu SY, Gao TJ, Chen M, Zhou XH, Dai LZ, Zhao XL, Zhao FH. Novel triple-target panels utilizing methylation-sensitive restriction enzyme-based quantitative PCR for detecting advanced cervical precancers and cancers among high-risk HPV-positive women. J Translational Med 2025, 23(1). Yang WY, Gou X, Xu TQ, Yi XP, Jiang MH, Assoc Comp M. Cervical Cancer Risk Prediction Model and Analysis of Risk Factors based on Machine Learning. In: 11th International Conference on Bioinformatics and Biomedical Technology (ICBBT): May 29–31 2019 2019; Stockholm, SWEDEN ; 2019: 50–54. 
Arbyn M, Sankaranarayanan R, Muwonge R, Keita N, Dolo A, Mbalawa CG, Nouhou H, Sakande B, Wesley R, Somanathan T, et al. Pooled analysis of the accuracy of five cervical cancer screening tests assessed in eleven studies in Africa and India. Int J Cancer. 2008;123(1):153–60. Saini SK, Sharma DN, Chauhan S, Srivastava S, Gopishankar N, Subramani. Precision prediction of cervical cancer outcomes: A machine learning approach to recurrence and survival analysis. J Cancer Res Ther. 2025;21(3):538–46. Castle PE, Sideri M, Jeronimo J, Solomon D, Schiffman M. Risk assessment to guide the prevention of cervical cancer. Am J Obstet Gynecol. 2007;197(4):e356351–356. Bai GG, Chen FH, Qiu JJ, Hua KQ. Machine learning-based prediction of clinical outcomes in cervical cancer using routine hematological indices: development and web implementation. Front Oncol 2025, 15. Rezhake R, Chen F, Hu SY, Zhao XL, Zhang X, Cao J, Qiao YL, Zhao FH, Arbyn M. Triage options to manage high-risk human papillomavirus-positive women: A population-based cross-sectional study from rural China. Int J Cancer. 2020;147(8):2053–64. Wang Y, Cai YB, James W, Zhou JL, Rezhake R, Zhang Q. Human papillomavirus distribution and cervical cancer epidemiological characteristics in rural population of Xinjiang, China. Chin Med J (Engl). 2021;134(15):1838–44. Li N, He L, Zhang HB, Turson G, Lou JQ, Cheng HF. Human papillomavirus prevalence, genotype distribution, risk factors, and cervical pathology association in women aged 50 years and older: a retrospective cross-sectional study in Xinjiang, China. Front Oncol 2026, 15. Petersen Z, Jaca A, Ginindza TG, Maseko G, Takatshana S, Ndlovu P, Zondi N, Zungu N, Varghese C, Hunting G, et al. Barriers to uptake of cervical cancer screening services in low-and-middle-income countries: a systematic review. BMC Womens Health. 2022;22(1):486. Additional Declarations No competing interests reported. 
Reviews received at journal 22 May, 2026
Reviews received at journal 22 May, 2026
Reviewers agreed at journal 16 May, 2026
Reviewers agreed at journal 15 May, 2026
Reviews received at journal 14 May, 2026
Reviewers agreed at journal 14 May, 2026
Reviewers agreed at journal 06 May, 2026
Reviewers invited by journal 06 May, 2026
Editor invited by journal 10 Apr, 2026
Editor assigned by journal 09 Apr, 2026
Submission checks completed at journal 09 Apr, 2026
First submitted to journal 09 Apr, 2026
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-9365221","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":640102443,"identity":"fe784f8d-abc3-449f-894c-475651ddaaf4","order_by":0,"name":"Dilibaier Anwaier","email":"","orcid":"","institution":"Kashi Prefecture Second People’s Hospital","correspondingAuthor":false,"prefix":"","firstName":"Dilibaier","middleName":"","lastName":"Anwaier","suffix":""},{"id":640102444,"identity":"638eecf1-b675-4059-9e4b-fba2f47290e2","order_by":1,"name":"Guzailinuer Maimaitituersun","email":"","orcid":"","institution":"Kashi Prefecture Second People’s Hospital","correspondingAuthor":false,"prefix":"","firstName":"Guzailinuer","middleName":"","lastName":"Maimaitituersun","suffix":""},{"id":640102445,"identity":"6b11e470-c54a-4944-810f-b8235440c14c","order_by":2,"name":"Ayituersun Yasen","email":"","orcid":"","institution":"Shanghai Jiao Tong University","correspondingAuthor":false,"prefix":"","firstName":"Ayituersun","middleName":"","lastName":"Yasen","suffix":""},{"id":640102446,"identity":"6f8c1cc7-14a2-4d63-b417-06986ca5dd6d","order_by":3,"name":"Guo Yuxia","email":"","orcid":"","institution":"People's Hospital of Zepu County, Kashgar Prefecture, Xinjiang Uygur Autonomous Region","correspondingAuthor":false,"prefix":"","firstName":"Guo","middleName":"","lastName":"Yuxia","suffix":""},{"id":640102447,"identity":"cf7bcb9c-49eb-407e-acbf-0b468488f7d9","order_by":4,"name":"Renagu Aishan","email":"","orcid":"","institution":"Kashi Prefecture Second People’s Hospital","correspondingAuthor":false,"prefix":"","firstName":"Renagu","middleName":"","lastName":"Aishan","suffix":""},{"id":640102448,"identity":"b5996fc3-e9e9-4f67-90d1-23b2f2cfde28","order_by":5,"name":"Guzhanur Awuti","email":"","orcid":"","institution":"Kashi Prefecture Second People’s Hospital","correspondingAuthor":false,"prefix":"","firstName":"Guzhanur","middleName":"","lastName":"Awuti","suffix":""},{"id":640102449,"identity":"250ede61-b735-406b-b078-0df0c3265eee","order_by":6,"name":"Xu Yan","email":"","orcid":"","institution":"Kashi Prefecture Second People’s Hospital","correspondingAuthor":false,"prefix":"","firstName":"Xu","middleName":"","lastName":"Yan","suffix":""},{"id":640102450,"identity":"58bb532b-173f-44a1-8d59-0975ec216f6c","order_by":7,"name":"Hanikezi Tuersun","email":"","orcid":"","institution":"The First Affiliated Hospital of Xinjiang Medical University","correspondingAuthor":false,"prefix":"","firstName":"Hanikezi","middleName":"","lastName":"Tuersun","suffix":""},{"id":640102451,"identity":"c2e2707e-860f-4109-b40d-08d269f70e87","order_by":8,"name":"Qin Feng","email":"","orcid":"","institution":"People's Hospital of Zepu County, Kashgar Prefecture, Xinjiang Uygur Autonomous Region","correspondingAuthor":true,"prefix":"","firstName":"Qin","middleName":"","lastName":"Feng","suffix":""}],"badges":[],"createdAt":"2026-04-09
08:09:31","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-9365221/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9365221/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":109435029,"identity":"ae703a80-db27-49ba-a0ae-a2bbc8ec8a87","added_by":"auto","created_at":"2026-05-18 05:56:06","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":130877,"visible":true,"origin":"","legend":"\u003cp\u003eBPNN model\u003c/p\u003e","description":"","filename":"floatimage1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9365221/v1/048dd81076789b7fcc6ce7ed.jpeg"},{"id":109435040,"identity":"8ee93750-2236-452c-9ca9-6a68ba64e82c","added_by":"auto","created_at":"2026-05-18 05:56:10","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":86682,"visible":true,"origin":"","legend":"\u003cp\u003eRandom Forest Model Structure Diagram\u003c/p\u003e","description":"","filename":"floatimage2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9365221/v1/9985bafe5225aa3e45df43a1.jpeg"},{"id":109435030,"identity":"f92e3324-ee86-4363-826a-685e5f64cde2","added_by":"auto","created_at":"2026-05-18 05:56:06","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":47575,"visible":true,"origin":"","legend":"\u003cp\u003eThe average accuracy reduction index and the average Gini index importance scores graph\u003c/p\u003e","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9365221/v1/4609a12a918c4ad4467c502a.jpeg"},{"id":109435038,"identity":"94a118bf-7b8c-4abf-bd03-13f6d7dd6a26","added_by":"auto","created_at":"2026-05-18 05:56:09","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":33124,"visible":true,"origin":"","legend":"\u003cp\u003eROC curves of NILM predicted by the 
neural network model and the random forest model on the validation set\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-9365221/v1/2711b0d9fc9b435cbecb7e94.png"},{"id":109435035,"identity":"8edf3458-df2a-42b9-8157-952e94f8ee28","added_by":"auto","created_at":"2026-05-18 05:56:08","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":33154,"visible":true,"origin":"","legend":"\u003cp\u003eROC curves of the validation set for predicting LSIL by neural network models and random forest models\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-9365221/v1/704c54336cfb8ae81d6327ef.png"},{"id":109435033,"identity":"9377790e-1bd9-4426-a6ba-ef8cd932806d","added_by":"auto","created_at":"2026-05-18 05:56:07","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":32216,"visible":true,"origin":"","legend":"\u003cp\u003eROC curves of the validation set for predicting HSIL by neural network models and random forest models\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-9365221/v1/520f6c11ab3be46d9539b791.png"},{"id":109760745,"identity":"fe1ba937-e8a7-458f-bb38-8595c01860f8","added_by":"auto","created_at":"2026-05-22 07:29:04","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":578960,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9365221/v1/7a0ceafc-df06-4fc7-b027-99fc9f50598e.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"\u003cp\u003eThe predictive value of neural network models and random forest models for the classification of cervical intraepithelial lesions based on gene methylation and HPV infection 
genotype\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003eCervical cancer, one of the most common cancers and the fourth leading cause of cancer-related death among women, is associated with numerous risk factors, including human papillomavirus (HPV) infection, early onset of sexual activity, multiple sexual partners, the use of oral contraceptives, and smoking. Early screening and effective prevention therefore play vital roles [1]. Cervical intraepithelial neoplasia (CIN) is a precancerous stage in the development of cervical cancer and provides an opportunity for early detection and intervention [2]. Although persistent HPV infection is recognized as the principal etiological factor for cervical cancer, only a minority of HPV-infected women progress to malignancy, and most infections are cleared by host immunity. This low progression rate limits the positive predictive value of HPV testing for high-grade cervical lesions and restricts its clinical utility [3, 4]. Clearance occurs because mucosal immunity can eliminate local HPV infections [5]. Effective HPV elimination requires mucosal and systemic immunity to work in concert: CD8⁺ cytotoxic T lymphocytes (CTLs) eliminate infected cells within the cervical epithelium, while CD4⁺ T cells provide necessary help through cytokine networks [6]. Locally in the vagina, activated CD4⁺ T cells differentiate into Th1 subsets and secrete IL-2, IL-12, and IFN-γ to amplify CTL responses and shape antiviral immunity [7]. 
Among them, IL-2, a key regulatory factor, has been shown to have anti-tumor functions [8].\u003c/p\u003e\n\u003cp\u003eRecent studies have revealed dynamic immunological changes during the progression of cervical intraepithelial neoplasia. In early-stage lesions characterized by low-grade CIN, compensatory immune activation occurs in the cervical region, evidenced by increased numbers of both CD4⁺ and CD8⁺ T cells in the cervical epithelium and elevated local IL-2 levels [9]. However, this activation is negatively correlated with the severity of the cervical intraepithelial lesions [10]. As the disease progresses, both local mucosal and systemic immunity are depleted, as reflected by declining IL-2 concentrations and reduced CD4⁺ T cell counts in peripheral blood [6, 11]. Peripheral blood biomarkers, particularly serum IL-2 and CD4⁺ T cell counts, may therefore reflect the systemic impact of persistent viral infection and offer a means of assessing disease burden.\u003c/p\u003e\n\u003cp\u003eAt present, there are many screening methods for cervical cancer. In addition to HPV DNA testing, liquid-based cytology (LBC) and the HPV–cytology co-testing strategy have been validated to enhance diagnostic sensitivity and specificity, enabling safe extension of screening intervals [12]. Some emerging approaches, including next-generation sequencing (NGS), p16 immunohistochemistry (IHC) for surrogate detection of transforming HPV infection, multiplex molecular biomarker panels, and artificial intelligence–assisted cervical cancer screening systems (AI-CCS), show great potential in the early screening and risk stratification of CIN [13]. 
Studies have shown that HPV infection of the host cell causes abnormal methylation of the promoter regions of host tumor suppressor genes, leading to their transcriptional silencing and facilitating cervical carcinogenesis. The methylation level increases with the duration of HPV infection and the severity of cervical intraepithelial neoplasia, reaching its peak upon progression to cervical cancer [14]. Therefore, by detecting the degree of methylation of the host cell genome, it is possible to predict the grade of cervical intraepithelial neoplasia and assess the risk of cervical cancer. For instance, the commercially available GynTect® assay has been validated in many studies. The DNA methylation detection panel targeting the ASTN1, DLX1, ITGA4, RXFP3, SOX17, and ZNF671 genes shows high clinical accuracy in identifying cervical precancerous lesions and stratifying the risk of progression [15, 16]; among these genes, ZNF671 methylation shows the highest diagnostic specificity [17]. This test is particularly applicable to high-risk groups with negative HPV status, filling gaps in existing screening strategies and providing a molecular basis for risk stratification management.\u003c/p\u003e\n\u003cp\u003eGiven the complex nonlinear interactions among these biomarkers (such as methylation levels and HPV types), traditional statistical models are unable to fully analyze their combined predictive efficacy [18]. 
Therefore, machine learning algorithms that can handle interactions among high-dimensional features are needed to construct a more accurate risk stratification model.\u003c/p\u003e\n\u003cp\u003eMachine learning techniques can effectively uncover nonlinear correlations in multi-dimensional data.\u003c/p\u003e\n\u003cp\u003eBackpropagation Neural Networks (BPNN), built on the Multi-Layer Perceptron (MLP) architecture, iteratively optimize predictions by minimizing output-target deviations and have been applied predominantly in materials science [18, 19]; their biomedical applications remain sparse. Pioneering efforts include cervical cancer classification from colposcopic images (Wutsqa et al. [20]), breast cancer histology categorization (Kaymak et al. [21]), and automated CIN diagnosis using acetic acid test features, where BPNN reportedly outperformed KNN and RF (Du et al. [22]). Conversely, Random Forest (RF) ensembles, which aggregate predictions across multiple decision trees [23, 24], have gained broader biomedical traction owing to their inherent robustness with heterogeneous clinical data, as also observed in the present CIN risk stratification model (AUC: 0.999 for RF vs 0.996 for BPNN). For instance, Shi et al. identified differentially expressed genes (DEGs) through high-throughput sequencing and constructed an RF prediction model, which exhibited robust performance in prognostic assessment of cervical cancer [12]. 
Notably, in a comparative analysis of RF and support vector machine (SVM) models for predicting regression stages of cervical intraepithelial neoplasia, RF significantly outperformed SVM [25], underscoring the potential superiority of ensemble learning algorithms in forecasting disease progression trajectories.\u003c/p\u003e\n\u003cp\u003eCurrent cervical screening models rely predominantly on histopathology, colposcopy, or routine blood biomarkers [25-29]; the lack of models integrating host-genome methylation, peripheral immune markers (IL-2/CD4⁺ T cells), and HPV genotyping represents a significant knowledge gap. Furthermore, only a few studies have compared Random Forest (RF) and Backpropagation Neural Network (BPNN) performance using identical multimodal features for CIN stratification [22]. To address these two deficiencies, this study provides: (1) a combined methylation-immune-HPV predictive framework, to our knowledge the first of its kind; and (2) a head-to-head algorithmic benchmark of RF versus BPNN under matched feature conditions, ultimately identifying the better-performing model (RF: AUC 0.999) for clinical deployment as an automated, evidence-based triage tool.\u003c/p\u003e"},{"header":"1. Materials and Methods","content":"\u003cp\u003e1.1 \u003cstrong\u003eGeneral information\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was approved by the Ethics Committee of our hospital. 
A total of 138 patients evaluated for cervical lesions in our hospital from September 2024 to September 2025 were selected as the study population, including 36 cases in the negative for intraepithelial lesion or malignancy (NILM) group, 63 cases in the low-grade squamous intraepithelial lesion (LSIL) group, and 39 cases in the high-grade squamous intraepithelial lesion (HSIL) group.\u003c/p\u003e\n\u003cp\u003eThe inclusion criteria were as follows: (1) age ≥ 18 years, with a history of sexual intercourse and an intact cervix; (2) no severe immunodeficiency, such as HIV infection, a history of organ transplantation, or treatment with immunosuppressants; (3) willingness to undergo routine cervical cancer screening and methylation testing.\u003c/p\u003e\n\u003cp\u003eThe exclusion criteria were known malignant tumors of the female genital tract or other malignant tumors still under treatment. All study subjects participated voluntarily and signed informed consent forms. A total of 5840 women who underwent opportunistic cervical screening were initially included. Among them, 884 underwent colposcopy and cervical histopathology, and the 138 cases with relatively complete clinical data were analyzed.\u003c/p\u003e\n\u003cp\u003e1.2 \u003cstrong\u003eGrouping of research subjects\u003c/strong\u003e:\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eTaking the histopathological results of colposcopic biopsy as the gold standard, the research subjects were divided into three groups: the negative (NILM) group, which included patients with negative histopathological results or cervical inflammation, and the LSIL and HSIL groups, which together formed the positive group.\u003c/p\u003e\n\u003cp\u003e1.3 \u003cstrong\u003eCollection of HPV and TCT specimens:\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe specimen collection methods for both tests were the same. With the patient in the lithotomy position, a vaginal speculum was used to fully expose the cervix. 
A cervical brush was placed at the transformation zone of the cervix, with the brush head inside the cervical canal, and rotated in one direction to collect exfoliated cervical cells. The samples were sent to the pathology department of our hospital for HPV and TCT testing. A sample was judged high-risk HPV-positive if any of types 16, 18, 31, 33, 35, 39, 45, 51, 52, 53, 56, 58, 59, 66, 68, or 81 was detected, and TCT-positive if the TCT result was ≥ ASCUS.\u003c/p\u003e\n\u003cp\u003e1.4 \u003cstrong\u003eDNA methylation marker analysis\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eExfoliated cervical cells were tested using the Gong An Li multi-gene methylation detection kit, which includes 6 markers: ASTN1, DLX1, ITGA4, RXFP3, SOX17, and ZNF671. The detection process comprised three steps: cell lysis, bisulfite treatment, and real-time fluorescence polymerase chain reaction (PCR). The DNA released by cell lysis was treated with bisulfite to fix the DNA methylation state. The template DNA was then analyzed in 6 independent methylation-specific real-time PCR reactions that selectively amplify the methylated DNA regions, while real-time fluorescent dyes were used to detect the cervical cancer methylation markers and quality-control markers. Each marker was scored according to its Ct value, Tm value, and ΔCt reference range. A comprehensive score ≥ 0.5 points was judged methylation-positive.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e1.5 Model Construction and Evaluation\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e1.5.1 Neural Network (NN):\u003c/strong\u003e A 3-layer fully connected network is adopted. 
The input layer has 8 features, there are 2 hidden layers (with 64 and 32 nodes, respectively), the activation function is ReLU, the output layer performs 4-class classification (NILM/LSIL/HSIL/CC), the Adam optimizer is used, and the dropout rate is set to 0.3 to prevent overfitting.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e1.5.2 Random Forest (RF):\u003c/strong\u003e The number of decision trees is set to 100, the maximum depth is 10, the minimum number of samples required to split an internal node is 4, and the number of randomly selected features per split is 3. In addition, single-feature models (methylation only / HPV typing only) and a traditional logistic regression model are constructed as controls.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e1.5.3 Evaluation Metrics:\u003c/strong\u003e Accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and the area under the receiver operating characteristic curve (AUC-ROC) are used to evaluate model performance. Feature importance and interaction effects are analyzed with SHAP (SHapley Additive exPlanations).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e1.6 Statistics\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003ePython 3.9 with the scikit-learn and TensorFlow frameworks is used for model building, and SPSS 26.0 is used for statistical analysis. Measurement data are expressed as mean ± standard deviation, and the t-test is used for comparison between groups; count data are expressed as rates, and the χ² test is used. A P value \u0026lt; 0.05 is considered statistically significant.\u003c/p\u003e"},{"header":"2. 
Results","content":"\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e\u003cb\u003e2.1 Patient clinicopathological characteristics\u003c/b\u003e\u003c/h2\u003e \u003cp\u003eComparison of general clinical data, laboratory indicators and pathological indicators among the three groups of patients without intraepithelial neoplasia (NILM), low-grade squamous intraepithelial lesion (LSIL), and high-grade squamous intraepithelial lesion (HSIL) in the training set, in Table \u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComparison of general clinical data, laboratory indicators and pathological indicators among the three groups of patients\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003echaracteristics\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNILM group\u003c/p\u003e \u003cp\u003e(n\u0026thinsp;=\u0026thinsp;36)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLSIL group\u003c/p\u003e 
\u003cp\u003e(n\u0026thinsp;=\u0026thinsp;63)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eHSIL group\u003c/p\u003e \u003cp\u003e(n\u0026thinsp;=\u0026thinsp;39)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cem\u003eF\u003c/em\u003e/\u003cem\u003eχ\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cem\u003eP\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAge(years)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e45.31\u0026thinsp;\u0026plusmn;\u0026thinsp;4.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e45.68\u0026thinsp;\u0026plusmn;\u0026thinsp;4.53\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e44.18\u0026thinsp;\u0026plusmn;\u0026thinsp;4.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.435\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.576\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBMI(kg/m\u003csup\u003e2\u003c/sup\u003e)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e22.83\u0026thinsp;\u0026plusmn;\u0026thinsp;2.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e23.01\u0026thinsp;\u0026plusmn;\u0026thinsp;2.16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e22.18\u0026thinsp;\u0026plusmn;\u0026thinsp;2.12\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.654\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.467\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDiabetes mellitus (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e8 (22.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e10(15.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e11(28.2%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e2.250\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.325\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAge at first sexual experience\u0026lt;18 (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2(5.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e4(6.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e7 (17.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e4.651\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.098\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAge at first pregnancy\u0026lt;20(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e3 (8.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e5 (7.9%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e8 (20.5%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e4.222\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.121\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCD4\u0026thinsp;+\u0026thinsp;T(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e36.40\u0026thinsp;\u0026plusmn;\u0026thinsp;3.86\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e34.42\u0026thinsp;\u0026plusmn;\u0026thinsp;2.41\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e33.92\u0026thinsp;\u0026plusmn;\u0026thinsp;2.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e7.533\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCD8\u0026thinsp;+\u0026thinsp;T(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e16.25\u0026thinsp;\u0026plusmn;\u0026thinsp;4.65\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e14.88\u0026thinsp;\u0026plusmn;\u0026thinsp;4.18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e15.15\u0026thinsp;\u0026plusmn;\u0026thinsp;4.11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.848\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.162\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCD4\u0026thinsp;+\u0026thinsp;T/CD8\u0026thinsp;+\u0026thinsp;T\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2.34\u0026thinsp;\u0026plusmn;\u0026thinsp;0.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c3\"\u003e \u003cp\u003e2.44\u0026thinsp;\u0026plusmn;\u0026thinsp;0.51\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e2.42\u0026thinsp;\u0026plusmn;\u0026thinsp;0.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.494\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.611\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eIL-2(pg/mL)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e9.00\u0026thinsp;\u0026plusmn;\u0026thinsp;1.89\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e8.76\u0026thinsp;\u0026plusmn;\u0026thinsp;1.60\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e7.87\u0026thinsp;\u0026plusmn;\u0026thinsp;1.47\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e5.184\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.007\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eIL-6(pg/mL)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e6.70\u0026thinsp;\u0026plusmn;\u0026thinsp;0.66\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e6.71\u0026thinsp;\u0026plusmn;\u0026thinsp;0.64\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e6.51\u0026thinsp;\u0026plusmn;\u0026thinsp;0.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.341\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.265\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTNF-α(pg/mL)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e9.82\u0026thinsp;\u0026plusmn;\u0026thinsp;0.97\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e9.89\u0026thinsp;\u0026plusmn;\u0026thinsp;0.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e9.54\u0026thinsp;\u0026plusmn;\u0026thinsp;0.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.783\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.172\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eIFN-λ(pg/mL)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e11.24\u0026thinsp;\u0026plusmn;\u0026thinsp;1.16\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e11.27\u0026thinsp;\u0026plusmn;\u0026thinsp;1.14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e10.97\u0026thinsp;\u0026plusmn;\u0026thinsp;1.07\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.862\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.424\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHPV Infection Typing\u003c/p\u003e \u003cp\u003e(No/Low/High) risk\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e29/7/0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e50/13/0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0/4/35\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e 
\u003cp\u003e124.56\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u0026lt;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDNA Methylation (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e2(5.6%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e9(14.3%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e35(89.7%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e78.632\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u0026lt;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Construction of the BPNN Model\u003c/h2\u003e \u003cp\u003eA back-propagation neural network (BPNN) model was constructed with four input neurons corresponding to CD4\u0026thinsp;+\u0026thinsp;T cells, IL-2 levels, HPV infection subtypes, and gene methylation status. Hyperparameter optimization was performed via 5-fold cross-validation and grid search. The optimal parameters were determined as follows:\u003c/p\u003e \u003cp\u003enetwork topology: 4-6-3 (hidden\u0026thinsp;=\u0026thinsp;c(6)), maximum iterations: 10\u003csup\u003e7\u003c/sup\u003e (stepmax\u0026thinsp;=\u0026thinsp;10\u003csup\u003e7\u003c/sup\u003e), learning rate: 0.1. 
After 311,741 weight-update iterations, the loss function converged to a minimum value of 11.252035, indicating that training had converged (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Construction of the Random Forest Model\u003c/h2\u003e \u003cp\u003eA random forest model was constructed based on four characteristic variables: CD4\u0026thinsp;+\u0026thinsp;T, IL-2, HPV infection typing, and gene methylation. A total of 1000 decision trees were grown (ntree\u0026thinsp;=\u0026thinsp;1000, mtry\u0026thinsp;=\u0026thinsp;2, nodesize\u0026thinsp;=\u0026thinsp;8). For each tree, approximately one-third of the training samples were left out of the bootstrap sample and served as out-of-bag (OOB) data. The OOB prediction error rates for NILM, LSIL, HSIL, and the overall model stabilized at 38.89%, 19.04%, 10.26%, and 21.74%, respectively. All four characteristic variables contributed to predicting the classification of cervical intraepithelial lesions. See Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e and Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e2.4 External validation of the prediction efficacy of neural network models and random forest models\u003c/h2\u003e \u003cp\u003eIn the validation set, the random forest model had a higher AUC than the neural network model for all three lesion categories, and its sensitivity, specificity, and accuracy were equal to or higher than those of the neural network model. 
See Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, Fig.\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, and Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAUC, sensitivity, specificity, and accuracy of the validation set for neural network models and random forest models in different types of cervical intraepithelial lesions\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003etest\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c4\" namest=\"c2\"\u003e \u003cp\u003eneural network models\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"3\" nameend=\"c7\" namest=\"c5\"\u003e 
\u003cp\u003erandom forest models\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNILM (n\u0026thinsp;=\u0026thinsp;22)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLSIL (n\u0026thinsp;=\u0026thinsp;45)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eHSIL (n\u0026thinsp;=\u0026thinsp;28)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eNILM (n\u0026thinsp;=\u0026thinsp;22)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eLSIL (n\u0026thinsp;=\u0026thinsp;45)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eHSIL (n\u0026thinsp;=\u0026thinsp;28)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAUC\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e0.921 (95% CI: 0.863\u0026ndash;0.979)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e0.913 (95% CI: 0.858\u0026ndash;0.969)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e0.996 (95% CI: 0.988\u0026ndash;1.000)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.954 (95% CI: 0.915\u0026ndash;0.992)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.957 (95% CI: 0.922\u0026ndash;0.993)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e0.999 (95% CI: 0.996\u0026ndash;1.000)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003esensitivity \u003c/p\u003e \u003cp\u003e(%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e86.4(19/22)\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003e71.1(32/45)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e89.3(25/28)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e86.4(19/22)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e82.2(37/45)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e89.3(25/28)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003especificity (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e82.2(60/73)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e84.0(42/50)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e100.0(67/67)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e89.0(65/73)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e88.0(44/50)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e100.0(67/67)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eaccuracy (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83.2(79/95)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e77.9(74/95)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e96.8(92/95)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e88.4(84/95)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e85.3(81/95)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e96.8(92/95)\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e 
\u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"3. Discussion","content":"\u003cp\u003eIn this study, significant differences in CD4⁺T cell counts, IL-2 levels, gene methylation status, and HPV genotyping were observed among the three patient groups (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Consequently, we developed Backpropagation Neural Network (BPNN) and Random Forest (RF) models to evaluate their predictive performance for cervical squamous intraepithelial lesions. Feature importance analysis in the RF model revealed that HPV genotyping contributed most significantly, followed by gene methylation and CD4⁺T cells. Although IL-2 had lower predictive contribution, it played an indispensable role in optimizing decision tree structures (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e), supporting the rationale for incorporating these four variables. 
Validation with ROC curves showed that the RF model outperformed BPNN in sensitivity, specificity, AUC, and accuracy for predicting low-grade lesions (LSIL), and matched or slightly exceeded it for high-grade lesions (HSIL) (Figs.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e\u0026ndash;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e, Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). The RF-based model achieved an AUC of 0.957 (95% CI: 0.922\u0026ndash;0.993) in predicting LSIL and 0.999 (95% CI: 0.996\u0026ndash;1.000) in predicting HSIL, indicating strong diagnostic potential.\u003c/p\u003e \u003cp\u003eCurrent HPV testing has some predictive value for high-grade squamous intraepithelial lesions of the cervix, but its specificity is limited, which may cause infected women to undergo unnecessary examinations such as colposcopy and biopsy[\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. Since some cases of high-grade squamous intraepithelial lesions of the cervix may regress naturally, the lack of an effective triage strategy may bring additional health risks to women of childbearing age[\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. Studies indicate that DNA methylation testing in HPV-positive patients not only optimizes triage but also predicts disease progression[\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. 
For instance, the GynTect\u0026reg; methylation panel (ASTN1, DLX1, ITGA4, RXFP3, SOX17, ZNF671) detects methylation markers in specific gene regions and exhibits superior specificity over conventional cytology (TCT) in predicting cervical lesion severity among hrHPV-infected individuals[\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eRecent research utilizing a novel six-gene methylation panel (FAM19A4/PHACTR3/SST/ZIC1/PAX1/ZNF671) achieved better predictive performance (sensitivity 89.6%, specificity 95.0%, AUC\u0026thinsp;=\u0026thinsp;0.969) than the GynTect\u0026reg; panel[\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]. This superiority may be attributed to the inclusion of the key genes ZNF671 and PAX1: both show comparable sensitivity for CIN3⁺ detection, but ZNF671 demonstrates superior specificity[\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. Further supporting this, a methylation-sensitive restriction enzyme qPCR (MSRE-qPCR) study identified a tri-gene panel (PAX1/ZNF671/ASCL1), suggesting that combined detection of PAX1 and ZNF671 may improve cervical lesion prediction[\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eWe selected the Random Forest (RF) algorithm as our final predictive model because of its strong predictive performance in classification tasks and its natural suitability for the heterogeneous feature structure of our clinical dataset. As an ensemble classifier that aggregates predictions from multiple decision trees through majority voting, RF inherently accommodates mixed data types (both categorical and continuous variables) while exhibiting strong resilience to outliers. 
Although the predictive power of individual decision trees is limited, RF mitigates this limitation through a dual randomization strategy: (i) bootstrap aggregation (bagging), which generates diverse training subsets by sampling with replacement from the original dataset, thereby preserving overall data distribution while enhancing tree diversity; and (ii) random feature selection at each node, where only a subset of predictors is considered for optimal split determination, which reduces inter-tree correlation and improves generalization. These mechanisms substantially reduce overfitting risk and elevate predictive accuracy beyond that of single-tree models[\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. Critically, our RF-based model addresses three persistent challenges in conventional cervical cancer screening: the subjectivity of cytological interpretation, limited accessibility, and reliance on professionals. This makes it especially viable for resource-constrained settings where advanced imaging modalities or invasive diagnostics are not accessible[\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eIn this study, the RF model performed well in grading cervical squamous intraepithelial lesions, with an area under the curve (AUC) of 0.957 (95% CI: 0.922\u0026ndash;0.993) for low-grade squamous intraepithelial lesions (LSIL) and 0.999 (95% CI: 0.996\u0026ndash;1.000) for high-grade squamous intraepithelial lesions (HSIL), exceeding the AUCs reported for existing machine learning models. For instance: Yuan et al. developed a CIN classification model using colposcopy images, reporting an AUC of 0.93[\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]; He et al. 
established an RF risk prediction model combining clinical parameters and pathological images, achieving an AUC of 0.866[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]; Li et al. predicted HSIL misclassified as LSIL using clinical parameters (including HPV genotyping, cytology, and colposcopy results), with an AUC of 0.936[\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e]; and Farzaneh et al. constructed a CIN severity prediction model based on clinical and demographic data, reporting an AUC of 0.944[\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe superior performance of our model likely stems from: (1) Variable selection: We prioritized biologically and clinically validated predictors (HPV genotype, host cell gene methylation levels, peripheral blood IL-2 concentration, and CD4\u0026thinsp;+\u0026thinsp;T-cell count), capturing multifactorial pathogenic mechanisms more comprehensively than models relying on narrower data sources. (2) Algorithmic advantages: As an ensemble learning technique, RF enhances stability and reduces overfitting risk by aggregating multiple decision trees, and it excels in structured data analysis, where it has been reported to outperform deep learning approaches when image features are absent[\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. 
(3) Data and model consistency: Backpropagation neural networks (BPNNs) are strongest when processing high-dimensional unstructured inputs (such as colposcopy image classification) [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e], whereas our study focuses on a small set of structured variables, a setting in which our RF model showed better predictive performance than the BPNN and the other machine learning models mentioned above[\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. (4) Sample size limitation: Our validation cohort comprised only 95 samples (training/validation ratio\u0026thinsp;=\u0026thinsp;138/95); this relatively modest size may limit the precision of the comparative AUC estimates, so further validation in larger cohorts is needed.\u003c/p\u003e \u003cp\u003eThis study advances CIN risk stratification by integrating novel biomarker categories (DNA methylation signatures, HPV genotyping profiles, and the immunological indicators IL-2 and CD4⁺ T cells) into a comparative machine learning framework (RF vs. BPNN). Our feature selection strategy specifically targets biological pathways driving CIN progression[\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e], a critical distinction from prior models reliant on peripheral blood parameters alone. 
For instance, models based solely on hematological indices (e.g., creatinine, RBC counts) achieved limited discrimination (AUC\u0026thinsp;\u0026le;\u0026thinsp;0.75) despite their clinical accessibility [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. Similarly, frameworks using inflammatory/coagulation markers attained modest performance (AUC\u0026thinsp;~\u0026thinsp;0.7) for cervical neoplasia outcomes[\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe higher AUCs of our approach (0.957\u0026ndash;0.999) underscore the value of incorporating direct carcinogenesis biomarkers, particularly methylation and immune response metrics, for precision screening.\u003c/p\u003e \u003cp\u003eHowever, this was a single-center retrospective study from Kashgar with a limited sample size (n\u0026thinsp;=\u0026thinsp;233): selection bias is possible, and the small sample widens the confidence intervals of the sensitivity and specificity estimates and reduces the power to discriminate between triage strategies. The absence of an independent test set, a consequence of scarce clinical samples in this understudied population, precludes definitive assessment of real-world generalizability. Our training/validation split (138:95) further limited the depth of hyperparameter optimization. Although feature importance analyses (mean decrease in accuracy (MDA) and Gini index; Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e) provide insight into predictor contributions, they cannot compensate for the risk of performance overestimation inherent in single-dataset validation[\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. Future multi-center cohorts should verify these findings. Due to the relatively high incidence of cervical cancer in this region, the proportion of HSIL in the patient samples we used is relatively high (29% vs. 
national average 10%), which may bias the calculated HSIL risk and referral rate. Consequently, this limits the applicability of our results to a broader screening population, although it still allows the relative performance of the two models to be evaluated effectively [\u003cspan additionalcitationids=\"CR43\" citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e]. In addition, our high-performance models rely on inputs that require specialized equipment, expert interpretation, and substantial costs, which limits their scalability in resource-constrained environments[\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e]. Furthermore, the absence of longitudinal data precludes assessment of long-term predictive value, while the omission of other biomarker-based models hinders definitive conclusions on optimal frameworks.\u003c/p\u003e \u003cp\u003eDespite these constraints, our RF model, used as a triage strategy integrating methylation and HPV genotyping, could reduce unnecessary colposcopy referrals, alleviating cervical trauma and financial burden. Future work should validate these findings in large prospective cohorts and explore highly specific novel biomarkers (e.g., serum proteins replacing methylation assays) to streamline cost-effective screening protocols.\u003c/p\u003e"},{"header":"4. Conclusion","content":"\u003cp\u003eIn general, this study shows that the random forest model based on gene methylation, HPV infection typing, peripheral blood IL-2, and CD4\u0026thinsp;+\u0026thinsp;T cell content has a higher predictive value for the grading of cervical intraepithelial neoplasia than the BPNN model, and is expected to become a useful tool for early clinical screening of cervical cancer. 
To ensure the reliability of the findings, larger and more diverse sample populations are needed for validation in the future.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eDilibaier Anwaier designed the experiment for this study and interpreted the data; Guzailinuer Maimaitituersun completed the experiment for this study and interpreted the data; Ayituersun Yasen wrote and edited the manuscript; Guo Yu Sha, Renagu Aishan,\u0026nbsp;Guzhanur Awuti, Xu Yan and others collected and analyzed the patient data; Hanikezi Tuersun, Feng Qin, etc. guided this work. All authors participated in the discussion of the results and made contributions to the final manuscript. All authors approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCorresponding authors\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eCorrespondence to Hanikezi Tuersun or Feng Qin\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study was approved by the Medical Ethics Committee for Scientific Research Projects of the Second People's Hospital in Kashgar Region (Approval Number: [2023]20).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study is partially funded by the\u0026nbsp;State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Disease in Central Asia, Xinjiang Medical University, No. 
SKL-HIDCA-2023-KE 6\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflicts of Interest\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors declare no conflicts of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData Availability Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eWang J, Yu Y, Tan Y, Wan H, Zheng N, He Z, Mao L, Ren W, Chen K, Lin Z, et al. Artificial intelligence enables precision diagnosis of cervical cytology grades and cervical cancer. Nat Commun. 2024;15(1):4369.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e\u0026Ouml;zg\u0026uuml;rl\u0026uuml;k I, Erdem HB, Alay MT, Oktar O, Sahin-Duran F, \u0026Ccedil;evik-Demir E, Tezcan AY, Keskin HL. Unveiling mitochondrial DNA copy number alterations: Insights into progression from cervical intraepithelial neoplasia to cervical cancer. Oncol Lett 2026, 31(3).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCoquillard G, Palao B, Patterson BK. Quantification of intracellular HPV E6/E7 mRNA expression increases the specificity and positive predictive value of cervical cancer screening compared to HPV DNA. Gynecol Oncol. 2011;120(1):89\u0026ndash;93.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ede Sanjos\u0026eacute; S, Brotons M, Pav\u0026oacute;n MA. The natural history of human papillomavirus infection. Best Pract Res Clin Obstet Gynecol. 2018;47:2\u0026ndash;13.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSasagawa T, Takagi H, Makinoda S. Immune responses against human papillomavirus (HPV) infection and evasion of host defense in cervical cancer. J Infect Chemother. 
2012;18(6):807\u0026ndash;15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBrito MJ, Sequeira P, Silva I, Quintas A, Martins C, F\u0026eacute;lix A. CD4\u0026thinsp;+\u0026thinsp;and CD8\u0026thinsp;+\u0026thinsp;cell populations in HIV-positive women with cervical squamous intra-epithelial lesions and squamous cell carcinoma. Int J Infect Dis. 2021;103:370\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXu HM. Th1 cytokine-based immunotherapy for cancer. Hepatobiliary Pancreat Dis Int. 2014;13(5):482\u0026ndash;94.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiang T, Zhou CC, Ren SX. Role of IL-2 in cancer immunotherapy. Oncoimmunology 2016, 5(6).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu RX, Yang AM, Wang WH, Zhao WH, Wang W, Wang ZL, Wang JT, Hou YL, Su XQ, Zhang LL et al. A population-based study of interactions between high-risk human papillomavirus infection and vaginal local cytokines CD4 CD8 IL-10 with cervical intraepithelial neoplasia. Front Oncol 2025, 15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu RX, Wang WH, Yang AM, Zhao WH, Wang W, Wang ZL, Wang JT, Hou YL, Su XQ, Zhang LL et al. Interactions between vaginal local cytokine IL-2 and high-risk human papillomavirus infection with cervical intraepithelial neoplasia in a Chinese population-based study. Front Cell Infect Microbiol 2023, 13.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDaniilidis A, Koutsos J, Oikonomou Z, Nasioutziki M, Hatziparadisi K, Tantanasis T. Cytokines of Cervical Mucosa and Human Papilloma Virus Infection of the Cervix: A Descriptive Study. Acta Cytol. 2016;60(1):58\u0026ndash;64.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShi R, Chang L, Shi L, Zhang Z, Zhang L, Li X. Development and validation of a prognostic model for cervical cancer by combination of machine learning and high-throughput sequencing. Eur J Surg Oncol. 
2024;50(4):108241.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRaju K, Kukkamgai HM, Kamarthi PP. Evolution of Cervical Cancer Screening Techniques: A Concept of Translation Medicine. Indian J Surg Oncol 2025.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePeng SX, Zhang XW, Wu YJ. Potential applications of DNA methylation testing technology in female tumors and screening methods. Biochim Et Biophys Acta-Reviews Cancer 2023, 1878(5).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVieira-Baptista P, Costa M, Hippe J, Sousa C, Schmitz M, Silva AR, Hansel A, Preti M. Evaluation of Host Gene Methylation as a Triage Test for HPV-Positive Women-A Cohort Study. J Lower Genit Tract Dis. 2024;28(4):326\u0026ndash;31.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin CY, Zhu CY, Xie MH, Yang H. Analysis of the triage value of multigene methylation testing for CIN2\u0026thinsp;+\u0026thinsp;in hrHPV-positive patients. Infect Agents Cancer 2025, 20(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhu P, Xiong J, Yuan D, Li X, Luo LL, Huang J, Wang BB, Nie QF, Wang SL, Dang LY et al. ZNF671 methylation test in cervical scrapings for cervical intraepithelial neoplasia grade 3 and cervical cancer detection. Cell Rep Med 2023, 4(8).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJordan MI, Mitchell TM. Machine learning: Trends, perspectives, and prospects. Science. 2015;349(6245):255\u0026ndash;60.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRumelhart DE, Hinton GE, Williams RJ. LEARNING REPRESENTATIONS BY BACK-PROPAGATING ERRORS. Nature. 1986;323(6088):533\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWutsqa DU, Marwah M, Iop. Median Filter Noise Reduction of Image and Backpropagation Neural Network Model for Cervical Cancer Classification. 
In: \u003cem\u003e1st International Conference on Mathematics - Education, Theory and Application (ICMETA): Dec 06\u0026ndash;07\u003c/em\u003e 2017 2016; Univ Sebelas Maret, Surakarta, INDONESIA; 2016.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKaymak S, Helwan A, Uzun D. Breast cancer image classification using artificial neural networks. In: \u003cem\u003e9th International Conference on Theory and Application of Soft Computing, Computing with Words and Perception (ICSCCW): Aug 22\u0026ndash;25\u003c/em\u003e 2017 2017; Budapest, HUNGARY; 2017: 126\u0026ndash;131.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDu HW, Liu J, Lu H. A Cervical Intraepithelial Neoplasia Classification Method Using Feature Extraction and Back Propagation Neural Network. In: \u003cem\u003eIEEE 4th Information Technology and Mechatronics Engineering Conference (ITOEC): Dec 14\u0026ndash;16 2018 2018; Chongqing, PEOPLES R CHINA\u003c/em\u003e; 2018: 794\u0026ndash;798.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBinson VA, Thomas S, Subramoniam M, Arun J, Naveen S, Madhu S. A Review of Machine Learning Algorithms for Biomedical Applications. Ann Biomed Eng. 2024;52(5):1159\u0026ndash;83.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBecker T, Rousseau AJ, Geubbelmans M, Burzykowski T, Valkenborg D. Decision trees and random forests. Am J Orthod Dentofac Orthop. 2023;164(6):894\u0026ndash;7.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHe SM, Zhu GM, Zhou Y, Yang BR, Wang JP, Wang ZX, Wang T. Predictive models for personalized precision medical intervention in spontaneous regression stages of cervical precancerous lesions. J Translational Med 2024, 22(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYuan C, Yao Y, Cheng B, Cheng Y, Li Y, Li Y, Liu X, Cheng X, Xie X, Wu J, et al. 
The application of deep learning based diagnostic system to cervical squamous intraepithelial lesions recognition in colposcopy images. Sci Rep. 2020;10(1):11639.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi D, Wang Z, Liu Y, Zhou M, Xia B, Zhang L, Chen K, Zeng Y. Assessing the risk of high-grade squamous intraepithelial lesions (HSIL+) in women with LSIL biopsies: a machine learning-based study. Infect Agent Cancer. 2024;19(1):61.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFarzaneh F, Soltani A, Dastyar F, Mohammadi M, Hosseini MS. AI-Driven predictive modeling of cervical intraepithelial neoplasia severity: a comprehensive analysis with clinical adoption frameworks. BMC Cancer 2025, 25(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYue CB, Liu SC, Wang WH, Zhao Y, Zhang XF, Zhao GH. Machine learning in early screening for high-grade cervical intraepithelial neoplasia using blood testing. BMC Med Inf Decis Mak 2025, 26(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eP\u0026eacute;foyo AJK, Wang L, Gao JL, Kupets R. Are Women Who Exit Colposcopy Without Treatment at Elevated Risk for Cervical Cancer? J Lower Genit Tract Dis. 2017;21(1):47\u0026ndash;54.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLeeman A, del Pino M, Marimon L, Torn\u0026eacute; A, Ordi J, ter Harmsel B, Meijer C, Jenkins D, Van Kemenade FJ, Quint WGV. Reliable identification of women with CIN3\u0026thinsp;+\u0026thinsp;using hrHPV genotyping and methylation markers in a cytology-screened referral population. Int J Cancer. 2019;144(1):160\u0026ndash;8.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMoscicki AB, Ma YF, Wibbelsman C, Darragh TM, Powers A, Farhat S, Shiboski S. Rate of and Risks for Regression of Cervical Intraepithelial Neoplasia 2 in Adolescents and Young Women. Obstet Gynecol. 
2010;116(6):1373\u0026ndash;80.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang YG, Ma XP, Zhang T, Wang LF, Liu YL, Zhou GH, Wu HP. Research progress of DNA methylation in the screening of cervical cancer and precancerous lesions. Interdisciplinary Med 2025, 3(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRen YF, Qin FJ, Shen L, Li LF, Wu QR, Yi P. Triage of women with a positive HPV DNA test: evaluating a DNA methylation panel for detecting cervical intraepithelial neoplasia grade 3 and cervical cancer in cervical cytology samples. BMC Cancer 2025, 25(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDing H, Ke ZH, Xiao X, Xin BB, Xiong H, Lu W. A Predictive Model Using Six Genes DNA Methylation Markers to Identify Individuals With High Risks of High-Grade Squamous Intraepithelial Lesions and Cervical Cancer. Int J Womens Health. 2025;17:739\u0026ndash;49.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen SM, Dai HS, Hu SY, Gao TJ, Chen M, Zhou XH, Dai LZ, Zhao XL, Zhao FH. Novel triple-target panels utilizing methylation-sensitive restriction enzyme-based quantitative PCR for detecting advanced cervical precancers and cancers among high-risk HPV-positive women. J Translational Med 2025, 23(1).\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang WY, Gou X, Xu TQ, Yi XP, Jiang MH, Assoc Comp M. Cervical Cancer Risk Prediction Model and Analysis of Risk Factors based on Machine Learning. In: \u003cem\u003e11th International Conference on Bioinformatics and Biomedical Technology (ICBBT): May 29\u0026ndash;31 2019 2019; Stockholm, SWEDEN\u003c/em\u003e; 2019: 50\u0026ndash;54.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eArbyn M, Sankaranarayanan R, Muwonge R, Keita N, Dolo A, Mbalawa CG, Nouhou H, Sakande B, Wesley R, Somanathan T, et al. Pooled analysis of the accuracy of five cervical cancer screening tests assessed in eleven studies in Africa and India. 
Int J Cancer. 2008;123(1):153\u0026ndash;60.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSaini SK, Sharma DN, Chauhan S, Srivastava S, Gopishankar N, Subramani. Precision prediction of cervical cancer outcomes: A machine learning approach to recurrence and survival analysis. J Cancer Res Ther. 2025;21(3):538\u0026ndash;46.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCastle PE, Sideri M, Jeronimo J, Solomon D, Schiffman M. Risk assessment to guide the prevention of cervical cancer. Am J Obstet Gynecol. 2007;197(4):e356351\u0026ndash;356.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBai GG, Chen FH, Qiu JJ, Hua KQ. Machine learning-based prediction of clinical outcomes in cervical cancer using routine hematological indices: development and web implementation. Front Oncol 2025, 15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRezhake R, Chen F, Hu SY, Zhao XL, Zhang X, Cao J, Qiao YL, Zhao FH, Arbyn M. Triage options to manage high-risk human papillomavirus-positive women: A population-based cross-sectional study from rural China. Int J Cancer. 2020;147(8):2053\u0026ndash;64.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang Y, Cai YB, James W, Zhou JL, Rezhake R, Zhang Q. Human papillomavirus distribution and cervical cancer epidemiological characteristics in rural population of Xinjiang, China. Chin Med J (Engl). 2021;134(15):1838\u0026ndash;44.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi N, He L, Zhang HB, Turson G, Lou JQ, Cheng HF. Human papillomavirus prevalence, genotype distribution, risk factors, and cervical pathology association in women aged 50 years and older: a retrospective cross-sectional study in Xinjiang, China. Front Oncol 2026, 15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePetersen Z, Jaca A, Ginindza TG, Maseko G, Takatshana S, Ndlovu P, Zondi N, Zungu N, Varghese C, Hunting G, et al. 
Barriers to uptake of cervical cancer screening services in low-and-middle-income countries: a systematic review. BMC Womens Health. 2022;22(1):486.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"bmc-cancer","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bcan","sideBox":"Learn more about [BMC Cancer](http://bmccancer.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bcan/default.aspx","title":"BMC Cancer","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Cervical intraepithelial neoplasia, Gene methylation, HPV infection typing, Neural network, Random forest, Prediction model, Machine learning","lastPublishedDoi":"10.21203/rs.3.rs-9365221/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9365221/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eObjective\u003c/h2\u003e \u003cp\u003eThis study aims to evaluate the application value of neural network (NN) and random forest (RF) models integrating gene methylation markers and HPV infection typing data in the prediction of cervical intraepithelial neoplasia (CIN) grading, providing new tools for the precise screening of clinical cervical cancer.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e \u003cp\u003eClinical data of 138 
patients with cervical lesions who were treated from September 2024 to September 2025 were retrospectively collected. Among them, there were 36 cases (26.1%) in the control group, 63 cases (45.6%) with low-grade squamous intraepithelial lesion (LSIL), and 39 cases (28.3%) with high-grade squamous intraepithelial lesion (HSIL). Methylation of six genes (ASTN1, DLX1, ITGA4, RXFP3, SOX17, ZNF671) was detected. The NN and RF models were constructed. The AUC, sensitivity, specificity, and accuracy of the two models in different types of cervical intraepithelial lesions were compared and verified.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eThere were significant differences in general clinical data, laboratory indicators, and pathological indicators among the three groups of patients: negative for intraepithelial lesion or malignancy (NILM) in the control group, LSIL, and HSIL. There were statistically significant differences in CD4\u0026thinsp;+\u0026thinsp;T, IL\u0026thinsp;\u0026minus;\u0026thinsp;2, HPV infection typing, and gene methylation among the three groups of patients (P\u0026thinsp;\u0026lt;\u0026thinsp;0.05). In the validation set, the sensitivity, specificity, accuracy, and AUC of the random forest model in predicting different types of cervical intraepithelial lesions were higher than those of the neural network model.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e \u003cp\u003eThe random forest model integrating gene methylation and HPV infection typing performs better than the neural network model in the prediction of CIN grading, with high precision and clinical interpretability. 
It can be used as an efficient tool for the triage of HPV - positive populations and contribute to the optimization of cervical cancer screening strategies.\u003c/p\u003e","manuscriptTitle":"The predictive value of neural network models and random forest models for the classification of cervical intraepithelial lesions based on gene methylation and HPV infection genotype","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-18 05:55:40","doi":"10.21203/rs.3.rs-9365221/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"editorInvitedReview","content":"","date":"2026-05-22T08:35:55+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-22T04:36:56+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"71672076883622767267134473019320179990","date":"2026-05-16T11:56:10+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"202267220384931535017156703905015851273","date":"2026-05-15T09:37:49+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-15T02:45:09+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"308614905500133344892012599808348209449","date":"2026-05-14T06:22:14+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"59861519579246682718939974617989688999","date":"2026-05-07T00:37:10+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-05-06T14:09:38+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2026-04-10T20:45:37+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-04-10T01:00:34+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-04-10T00:59:40+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC 
Cancer","date":"2026-04-09T07:55:06+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"bmc-cancer","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bcan","sideBox":"Learn more about [BMC Cancer](http://bmccancer.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bcan/default.aspx","title":"BMC Cancer","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"c5ead83b-f68c-456b-8be1-5fccc6332f81","owner":[],"postedDate":"May 18th, 2026","published":true,"recentEditorialEvents":[{"type":"editorInvitedReview","content":"","date":"2026-05-22T08:35:55+00:00","index":75,"fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-22T04:36:56+00:00","index":74,"fulltext":""},{"type":"reviewerAgreed","content":"71672076883622767267134473019320179990","date":"2026-05-16T11:56:10+00:00","index":71,"fulltext":""},{"type":"reviewerAgreed","content":"202267220384931535017156703905015851273","date":"2026-05-15T09:37:49+00:00","index":70,"fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-15T02:45:09+00:00","index":69,"fulltext":""},{"type":"reviewerAgreed","content":"308614905500133344892012599808348209449","date":"2026-05-14T06:22:14+00:00","index":68,"fulltext":""},{"type":"reviewerAgreed","content":"59861519579246682718939974617989688999","date":"2026-05-07T00:37:10+00:00","index":46,"fulltext":""},{"type":"reviewersInvited","content":"30","date":"2026-05-06T14:09:38+00:00","index":"","fulltext":""}],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-05-18T05:55:40+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-18 
05:55:40","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9365221","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9365221","identity":"rs-9365221","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

