IMATX: An Integrated Multi-Context Pyramidal Framework for Explainable and Interpretable AI Predictions for Real time Clinical Validation in Cervical Cancer Detection
Research Article, Research Square preprint
Tathagat Banerjee
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6232663/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract
Detecting cervical cancer in histopathological images remains difficult. Cellular features are complex and staining practices vary, which limits the effectiveness of current methods in clinical settings. Automated classification systems must capture both salient spatial features and contextual features to produce accurate and dependable diagnoses.
Existing methods struggle to model complex associations between image elements, which limits their scalability and broad applicability. This paper introduces IMATX Net, a deep learning architecture that combines Integrated Multi-context Attention (IMA) modules with T-blocks to improve feature selection and classification. By refining both spatial and contextual features, the network discriminates lesions effectively while remaining interpretable. The system operates in multiple stages: attention channelling, feature refinement and classification. Attention maps from the IMA layer improve explainability by highlighting diagnostically important regions. An ablation study evaluates the contribution of each key network component to classification performance. IMATX Net outperforms current machine learning (ML) and deep learning (DL) systems, achieving the highest sensitivity, specificity, accuracy, precision and F1-score among the compared methods. Model reliability is assessed with a confusion matrix (CM) and ROC-AUC curves, while training and validation curves show that learning remains stable with minimal overfitting. IMATX Net reaches a sensitivity of 0.97, exceeding all compared state-of-the-art techniques. The experimental findings show that IMATX Net performs effectively on cervical cancer detection in histopathological imaging. By integrating multi-scale attention features with feature refinement, the proposed model delivers robust, interpretable classification suitable for clinical-scale use. The results support attention-based feature refinement as a key element of medical image analysis and point to future improvements in automated cervical cancer screening.
Keywords: Cervical cancer detection; Deep learning; IMATX Net; Attention mechanism; T-block; Medical image classification; Explainable AI

1 Introduction
Cervical cancer is a major health issue that disproportionately affects developing nations. Its main cause is persistent infection with high-risk human papillomavirus (HPV) strains, most prominently HPV-16 and HPV-18. The disease progresses slowly and remains asymptomatic in its early stages. As it advances, symptoms such as abnormal vaginal bleeding, pelvic pain and persistent discharge appear. Early detection is therefore key to successful treatment and improved survival. Traditional screening, which combines Pap smears with HPV testing, has been a primary factor in lowering cervical cancer incidence. However, these methods have well-known limitations. Pap smear analysis depends on expert interpretation by pathologists, and such judgements are vulnerable to misdiagnosis. The result is a persistent problem of incorrect test outcomes and diagnostic delays. In many low-resource regions, screening also faces accessibility and affordability challenges that restrict its reach. Diagnosis at the cellular level requires pathologists to distinguish dyskeratotic, koilocytotic, metaplastic, parabasal and superficial-intermediate cells. Manual cytological examination is time-consuming and subject to variation in interpretation between analysts.
These errors and delays in manual interpretation negatively affect patient outcomes. Automated diagnostic systems based on deep learning have therefore attracted growing interest. Three main families of models have shown promising results for cervical cancer classification: convolutional neural networks (CNNs), vision transformers (ViTs) and hybrid models. Bringing them into medical practice still requires addressing variable image quality and subtle visual irregularities, and it requires clinical validation. AI-based approaches can help clinicians detect cervical cancer earlier, improving patient survival and reducing global cervical cancer mortality.

1.1 Key Contributions
- A hybrid deep learning framework that combines an Integrated Multi-context Attention (IMA) module with T-blocks for pyramidal, multi-dilation convolutional feature extraction.
- An evaluation of the impact of each architectural component, illustrating its significance in the automated cervical cancer detection model.
- Attention visualization through the IMA layer for improved interpretability in medical diagnosis.
- A comparison of IMATX Net with state-of-the-art ML and DL approaches.

1.2 Organization of the Research
The rest of this paper is organised as follows. The Literature Review (Section 2) surveys related studies, analysing machine learning and deep learning methods for medical image classification. The Methodology (Section 3) describes the dataset and the proposed IMATX Net architecture. The Results and Discussion (Section 4) presents the results and the ablation study.
The Conclusion (Section 5) summarises the major findings and the use of the IMATX architecture.

2 Literature Review
Pacal and Kılıcarslan [1] introduced a robust cervical cancer diagnostic method that combines CNNs and ViTs and evaluated it on the SIPaKMeD Pap smear dataset. With data augmentation and ensemble learning, ViTs outperformed CNNs in diagnostic accuracy, supporting their clinical use for early cancer detection. Kalbhor and Shinde [2] applied deep learning to cervical cancer diagnosis by combining pre-trained models, used for feature extraction, with machine learning classifiers. They reached 92.03% accuracy with ResNet-50 and improved this to 96.01% by fine-tuning GoogleNet. Kumari et al. [3] introduced an automated cervical cancer classification framework based on a deep neural network (DNN) for early-stage prediction. The system preprocesses the data, removes outliers and reduces dimensionality with PCA before classifying normal and abnormal cervical cells. Youneszade et al. [4] used CNNs to examine how class granularity affects cervical cancer detection from colposcopic images. Their model attained 99% training accuracy but generalized poorly, with test accuracy falling to 43.11%. Cheng et al. [5] described a complete cervical cancer image processing pipeline covering acquisition, preprocessing, feature extraction and target detection. They also discussed CNN-based designs, GANs and autoencoders for improved feature representation. Talpur et al. [6] proposed DeepCervixNet, a deep learning model for cervical cancer diagnosis from Pap smear images.
The model attaches squeeze-and-excitation blocks to ResNet101 and DenseNet169 networks and uses ensemble techniques to reach a classification accuracy of 99.89%. Bueno-Crespo et al. [7] introduced an interpretable CNN-based deep learning model that uses Grad-CAM to provide explainability in cervical cancer diagnosis. Devaraj et al. [8] investigated the pre-trained CNNs ResNet50V2, InceptionV3 and Xception for cervical cancer detection in Pap smear images. ResNet50V2 performed best under cross-validation on accuracy, precision, recall and F1-score, indicating that deep learning models can support early, non-invasive cervical cancer detection. Mathivanan et al. [9] studied the pre-trained networks AlexNet, InceptionV3, ResNet-101 and ResNet-152 for cervical cancer detection. In a hybrid approach that feeds the extracted features to machine learning classifiers, ResNet-152 was the best model, reaching 98.08% accuracy on the SIPaKMeD dataset. Tan et al. [10] built deep learning models that detect cervical cancer automatically, removing the need for region segmentation and manual feature engineering. Tripathi et al. [11] applied ResNet-152 to cervical cancer classification on the SIPaKMeD Pap smear dataset. The model achieved 94.89% accuracy on cervical cytology samples, supporting its effectiveness for cytology analysis. Deo et al. [12] designed CerviFormer, a cross-attention-based Transformer for cervical cancer classification from Pap smear images.
The architecture handled large datasets efficiently and performed competitively on two public datasets, making it a promising tool for early diagnosis and clinical decision-making. Jeyshri and Kowsigan [13] developed an attention-based hybrid model to segment and classify cervical cancer in biomedical images. Their approach uses Multiscale ResUNet++ with fuzzy C-means clustering for segmentation, followed by serial cascaded residual attention with long short-term memory (LSTM) networks for classification.

3 Methodology
3.1 Data Description
The dataset comprises 12,147 microscopic cervical cell images in five classes: Dyskeratotic, Koilocytotic, Metaplastic, Parabasal and Superficial-Intermediate. Dyskeratotic cells (2,439 images) are characterized by dense cytoplasm and irregular nuclei, often indicating high-grade squamous intraepithelial lesions (HSIL) and potential malignancy. Koilocytotic cells (2,475 images) exhibit perinuclear halos and nuclear atypia and are typically associated with HPV infection and early precancerous changes. Metaplastic cells (2,379 images) represent transitional epithelial changes commonly found in the non-malignant cervical transformation zone. Parabasal cells (2,361 images) are immature squamous cells with a high nuclear-to-cytoplasmic ratio, frequently observed in atrophic smears and inflammatory conditions. Superficial-Intermediate cells (2,493 images) are mature epithelial cells present in normal and benign cervical samples. Collected as part of the IEEE International Conference on Image Processing (ICIP) 2018, held in Athens, Greece, on 7–10 October, the dataset is a key resource for developing automated deep learning-based cervical cancer detection models that address the limitations of traditional screening and improve early diagnosis.
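A few lines of Python can check the class counts above. This sketch uses only the numbers stated in this section, and the function names are illustrative. It confirms that the five counts add up to the stated total and shows how balanced the classes are.

```python
# Per-class image counts as stated in Section 3.1.
CLASS_COUNTS = {
    "Dyskeratotic": 2439,
    "Koilocytotic": 2475,
    "Metaplastic": 2379,
    "Parabasal": 2361,
    "Superficial-Intermediate": 2493,
}


def class_distribution(counts):
    """Return the total image count and each class's share of that total."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return total, shares


def imbalance_ratio(counts):
    """Largest class size divided by smallest class size (1.0 means perfectly balanced)."""
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    total, shares = class_distribution(CLASS_COUNTS)
    print(f"total images: {total}")
    for name, share in shares.items():
        print(f"{name:<26} {share:6.2%}")
    print(f"imbalance ratio: {imbalance_ratio(CLASS_COUNTS):.3f}")
```

With these counts the total is 12,147, matching the text. The largest class (Superficial-Intermediate) is only about 1.06 times the size of the smallest (Parabasal), so the dataset is close to balanced.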
3.2 IMATX Net
The proposed model combines a DenseNet169 backbone with an attention mechanism, T-block modules and an Integrated Multi-context Attention (IMA) module for automated cervical cancer classification. The architecture is designed to strengthen feature extraction and improve classification accuracy on heterogeneous cervical cell images. DenseNet169, pre-trained on ImageNet, generates hierarchical feature representations at modest computational cost. The backbone is frozen during training to preserve the pre-trained feature maps and keep feature extraction stable. This helps generalization and reduces overfitting on medical imaging data. Fig. 1 illustrates the proposed IMATX architecture. DenseNet169 serves as the feature extractor, and its initial layers are frozen to protect the learned weights. The attention mechanism uses 1×1 convolutions and a sigmoid-activated attention mask to weight the features, highlighting important regions and suppressing irrelevant information. Within each T-block, standard and dilated convolutions run in parallel to capture both local and wider-range context. The number of filters grows from 64 to 128 to 256 across the blocks, and batch normalization with ReLU activation keeps learning stable. The IMA module combines global average and max pooling with multi-scale 1×1, 3×3 and 5×5 convolutions at different dilation rates, strengthening hierarchical feature learning.
Features are retained by multiplying the T-block outputs with the attention-weighted feature maps produced by the masking and enhancement step. Finally, classification is performed by dense layers with dropout regularization (64, 128, 512 neurons). Together, multi-scale feature extraction, attention mechanisms and the chosen optimization setup enable the model to classify cervical cell images precisely.

3.2.1 Attention Channelling
The architecture uses batch normalization and attention mechanisms together. Batch normalization stabilises training by normalizing intermediate features, which leads to steady convergence. The attention module refines the feature maps by emphasizing the important regions of the extracted features. The attention branch has three convolutional layers that reduce the feature dimensions and generate a spatial attention mask. The mask is multiplied with the feature maps to suppress uncorrelated regions, improving the model's ability to focus on diagnostically significant cellular structures. The attention channelling block consists of:
- 1×1 convolutions that reduce the feature dimensions.
- A sigmoid-activated convolution that generates the attention mask.
- Multiplication of the attention mask with the extracted features, enhancing relevant regions while suppressing redundant information.

3.2.2 Integrated Multi-context Attention (IMA)
The Integrated Multi-context Attention (IMA) module further improves the feature representation. IMA (Fig. 2) applies global average pooling (GAP) and global max pooling (GMP), and a dense neural network (ANN) refines the pooled vectors to produce semantic attention. The reweighted features then pass through three convolutional streams with different kernel sizes to collect context at different scales. This lets the model evaluate several spatial scales and improves its discriminative ability. IMA thus combines semantic attention with pyramidal, CNN-based context extraction. Its key components are:
- Global average and max pooling applied to the extracted features.
- Dense layers applied to the pooled features to generate a semantic attention mask.
- Multiplication of the attention mask with the extracted features for selective enhancement.
- Pyramidal convolutions (1×1; 3×3 with dilation 2; 5×5 with dilation 3) whose outputs are concatenated to capture contextual variations.

3.2.3 T-Block
The T-block modules use dilated convolutions to acquire contextual information. Each T-block applies a standard convolution and a dilated convolution and merges their outputs by concatenation. Dilation enlarges the receptive field without adding parameters, so the block retains both local and global context at little extra cost. Stacking T-blocks (Fig. 3) lets the model learn increasingly complex features, improving its detection of complex cervical cancer patterns. Each T-block comprises:
- A 3×3 convolutional layer with ReLU activation.
- A dilated 3×3 convolutional layer with a specified dilation rate to capture contextual features.
- Feature concatenation followed by batch normalization and ReLU activation.
Three T-blocks are applied sequentially with increasing numbers of filters (64, 128 and 256) to progressively refine the feature representation.
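A NumPy sketch can make the tensor shapes and dilation arithmetic of the IMA and T-block descriptions above concrete. It is a minimal, framework-free illustration under stated assumptions, not the author's implementation:

- The shared two-layer dense network that turns the GAP and GMP vectors into a channel mask follows a CBAM-style design. The text only says the pooled vectors are "refined through ANN".
- The T-block's two convolutions are treated as parallel branches, and its dilation rate of 2 is a placeholder for the unspecified rate.
- BatchNorm is replaced by a per-channel standardisation, and all sizes and weights are toy random placeholders.

The helper `effective_kernel` uses the standard relation k + (k − 1)(d − 1). Under this relation the 3×3 kernel with dilation 2 spans 5×5 pixels, and the 5×5 kernel with dilation 3 spans 13×13 pixels.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def effective_kernel(k, dilation):
    """Spatial extent (in pixels) covered by a k x k kernel at the given dilation."""
    return k + (k - 1) * (dilation - 1)


def conv2d_same(x, w, dilation=1):
    """Naive stride-1, zero-padded 'same' convolution (cross-correlation, no bias).

    x: (H, W, C_in) feature map; w: (k, k, C_in, C_out) kernel with odd k.
    """
    k = w.shape[0]
    pad = dilation * (k - 1) // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    h, wd, _ = x.shape
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(k):
        for j in range(k):
            # Shifted view of the padded input seen by kernel tap (i, j).
            patch = xp[i * dilation:i * dilation + h, j * dilation:j * dilation + wd, :]
            out += patch @ w[i, j]
    return out


def ima_block(x, w1, w2, pyramid):
    """IMA sketch: GAP + GMP -> shared dense layers -> sigmoid channel mask -> conv pyramid.

    w1: (C, C // r), w2: (C // r, C) are shared dense weights (assumed CBAM-style design).
    pyramid: list of (kernel, dilation) pairs whose outputs are concatenated.
    """
    gap = x.mean(axis=(0, 1))  # (C,)
    gmp = x.max(axis=(0, 1))   # (C,)
    mask = sigmoid(relu(gap @ w1) @ w2 + relu(gmp @ w1) @ w2)  # (C,)
    refined = x * mask  # channel mask broadcast over H and W
    branches = [conv2d_same(refined, w, d) for w, d in pyramid]
    return np.concatenate(branches, axis=-1), mask


def t_block(x, w_std, w_dil, dilation=2, eps=1e-3):
    """T-block sketch: standard and dilated 3x3 convs, concatenated, normalised, ReLU."""
    a = relu(conv2d_same(x, w_std))
    b = relu(conv2d_same(x, w_dil, dilation))
    y = np.concatenate([a, b], axis=-1)
    # Per-channel standardisation stands in for BatchNorm (no learned scale/shift).
    y = (y - y.mean(axis=(0, 1))) / np.sqrt(y.var(axis=(0, 1)) + eps)
    return relu(y)


rng = np.random.default_rng(0)
c, r, f = 8, 2, 4  # toy sizes: input channels, reduction ratio, filters per branch
x = rng.standard_normal((16, 16, c))
w1 = rng.standard_normal((c, c // r)) * 0.1
w2 = rng.standard_normal((c // r, c)) * 0.1
pyramid = [(rng.standard_normal((k, k, c, f)) * 0.1, d) for k, d in [(1, 1), (3, 2), (5, 3)]]

ima_out, mask = ima_block(x, w1, w2, pyramid)
t_out = t_block(x, rng.standard_normal((3, 3, c, f)) * 0.1, rng.standard_normal((3, 3, c, f)) * 0.1)
print(ima_out.shape, t_out.shape)  # (16, 16, 12) (16, 16, 8)
print([effective_kernel(k, d) for k, d in [(1, 1), (3, 2), (5, 3)]])  # [1, 5, 13]
```

Zero padding of dilation × (k − 1) / 2 keeps every branch at the input's spatial size, which is what makes channel-wise concatenation of the pyramid outputs possible.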
3.2.4 Feature Refinement and Classification
This module aggregates global average pooling (GAP) and global max pooling (GMP) features, followed by dense layers that refine semantic attention. Three parallel convolutional filters with different kernel sizes and dilation rates then process the refined feature maps to capture hierarchical contextual information. This lets the model consider diverse spatial scales and strengthens its discriminative capacity. The features are refined as follows:
- The refined features from the T-blocks are multiplied with the earlier masked features from the attention mechanism.
- Global average pooling is applied to both the final features and the attention mask.
- A rescaled GAP operation normalizes the features.
The final classification head consists of:
- A dropout layer (rate 0.5) to prevent overfitting.
- A dense layer with 128 units and ELU activation.
- A second dropout layer (rate 0.25) for regularization.
- A final softmax layer predicting the five cervical cell classes.
In short, the attention-weighted feature maps pass through GAP and dropout provides regularization. The ELU-activated dense layer adds non-linearity, and the softmax output produces the class probabilities. The model is trained with the Adam optimizer and categorical cross-entropy loss, a standard combination for multi-class classification. Together, these components help the model cope with variations in image quality and the subtle differences between cervical cell types.
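The refinement and classification steps above can be sketched at inference time in NumPy. This is a hedged illustration, not the author's code, and it rests on several assumptions:

- The two-layer 1×1-convolution mask generator, the toy tensor sizes and the random weights are placeholders.
- The T-block output is assumed to have the same shape as the masked features.
- The "rescaled GAP" is interpreted here as the GAP of the masked features divided by the GAP of the mask, i.e. an attention-weighted average. The text does not spell this out.

Dropout is omitted because it is inactive at inference. The 128-unit ELU layer and the five-way softmax follow the head described above. Adam training is not shown; only the categorical cross-entropy for a single sample is computed.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def elu(x, alpha=1.0):
    # Clamp before expm1 so large positive inputs cannot overflow in the unused branch.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def attention_mask(x, w_reduce, w_mask):
    """Spatial mask from two 1x1 convolutions (a 1x1 conv is a per-pixel matmul).

    x: (H, W, C), w_reduce: (C, R), w_mask: (R, 1) -> mask of shape (H, W, 1) in (0, 1).
    """
    return sigmoid(relu(x @ w_reduce) @ w_mask)


def rescaled_gap(masked_features, mask, eps=1e-7):
    """GAP of the masked features divided by GAP of the mask (attention-weighted mean)."""
    return masked_features.mean(axis=(0, 1)) / (mask.mean(axis=(0, 1)) + eps)


def classify(vec, w_hidden, b_hidden, w_out, b_out):
    """Inference-time head; the dropout layers (0.5, 0.25) are the identity at test time."""
    hidden = elu(vec @ w_hidden + b_hidden)  # dense layer, 128 units, ELU
    return softmax(hidden @ w_out + b_out)   # five-way softmax


def categorical_cross_entropy(probs, label, eps=1e-12):
    return -np.log(probs[label] + eps)


rng = np.random.default_rng(0)
backbone = rng.standard_normal((7, 7, 32))    # stand-in for backbone features
t_features = rng.standard_normal((7, 7, 32))  # stand-in for the T-block output
mask = attention_mask(backbone, rng.standard_normal((32, 8)) * 0.1,
                      rng.standard_normal((8, 1)) * 0.1)
masked = backbone * mask                      # attention channelling
refined = t_features * masked                 # T-block output x masked features
vec = rescaled_gap(refined, mask)             # (32,)
probs = classify(vec, rng.standard_normal((32, 128)) * 0.05, np.zeros(128),
                 rng.standard_normal((128, 5)) * 0.05, np.zeros(5))
loss = categorical_cross_entropy(probs, label=2)
print(probs.round(3), float(loss))
```

Dividing by the pooled mask means that a uniform mask leaves plain GAP unchanged. Only the relative attention across locations changes the pooled vector.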
4 Results and Discussion
4.1 Cervical Cancer Detection using IMATX Network
IMATX Net is evaluated through an ablation study, an attention explainability analysis, a state-of-the-art comparison and standard performance metrics. The ablation study shows that the IMA attention module and the T-blocks are critical to feature extraction and classification, because the model becomes less accurate when these modules are removed. The IMA layer also supports medical image interpretation by visualizing where the model directs its attention among diagnostically important areas. Fig. 4 shows the training and validation accuracy and loss curves for IMATX Net.

4.1.1 Ablation Study
Ablation studies systematically remove or modify components of a model to evaluate their individual contributions to overall performance. Comparing configurations shows which architectural components improve accuracy, precision, recall and F1-score.

Table 1 Ablation study on IMATX Net
| Model | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|
| DenseNet169 | 0.9475 | 0.9359 | 0.9413 | 0.9389 |
| DenseNet201 | 0.9621 | 0.9544 | 0.9579 | 0.9545 |
| Inception + ResNet V2 | 0.9639 | 0.9623 | 0.9629 | 0.9617 |
| With IMA module | 0.9639 | 0.9616 | 0.9627 | 0.9687 |
| Parallel T Block | 0.8612 | 0.8069 | 0.8226 | 0.8137 |
| IMATX Net (DenseNet169 + Pyramidal T Block + IMA module) | 0.9764 | 0.97 | 0.973 | 0.972 |

Table 1 shows that DenseNet169 alone achieves an accuracy of 93.89% with a macro-averaged F1-score of 0.9413. Using DenseNet201 as the backbone improves performance to an accuracy of 95.45% and an F1-score of 0.9579.
Inception + ResNet V2 improves results further (F1-score 0.9629, accuracy 96.17%), suggesting a benefit from combining these architectures. Adding the IMA module gives a modest gain in accuracy (96.87%) with an essentially unchanged F1-score (0.9627), consistent with attention improving the feature representation. Performance drops sharply in the Parallel T Block configuration (F1-score 0.8226, accuracy 81.37%), indicating that this module on its own is insufficient for accurate classification. The full IMATX Net (DenseNet169 + Pyramidal T Blocks + IMA module) performs best (F1-score 0.9730, accuracy 97.20%). This indicates the most robust feature extraction and classification among the tested configurations.

4.1.2 Attention: IMA Layer Explainability
The IMA (Integrated Multi-context Attention) layer selects target features and suppresses nonessential background information. Visualizing its attention shows which regions the model targets, which supports clinical interpretation and diagnostic assistance. The activation maps and heatmaps indicate that IMATX Net concentrates its analysis on regions of diagnostic importance.

4.1.3 Evaluation Parameters
Fig. 6 shows the IMATX Net confusion matrix for the five cell classes: Dyskeratotic, Koilocytotic, Metaplastic, Parabasal and Superficial-Intermediate. Correct classifications lie on the main diagonal and misclassifications in the off-diagonal cells. The Dyskeratotic class had 216 correctly classified instances. Its reported misclassifications were 2 as Koilocytotic, 3 as Metaplastic and 2 as Superficial-Intermediate.
The Koilocytotic class had 231 correct predictions and 7 misclassifications, mostly as Dyskeratotic (4), with the remainder spread over other classes. The Metaplastic class had 267 correct classifications and only 4 misclassified samples. The Parabasal class had 105 correct classifications and 3 misclassified samples, mainly as Superficial-Intermediate. The Superficial-Intermediate class had 123 correct predictions and only 3 misclassified samples. The large diagonal counts and small off-diagonal counts indicate strong precision and recall for each class. The ROC-AUC curves in Fig. 7 likewise suggest that the model performs very well, with minimal misclassification. The remaining errors may stem from feature similarities between classes, especially the Metaplastic and Superficial-Intermediate categories, where slight misclassifications were observed.

4.1.4 State-of-the-Art Comparison
Table 2 shows that the proposed IMATX Net outperforms the existing approaches on all reported metrics. It reaches a sensitivity of 0.97, so it misses fewer positive cases (false negatives) than the alternatives, notably Habtemariam et al. [18] (0.804) and Tanimu et al. [19] (0.825). Its specificity (0.971) is also well above theirs (0.713 for both). Fei et al. [21] report 85.26% accuracy and an 84.16% F1-score but do not report specificity. IMATX Net's accuracy of 0.972 also exceeds that of Pacal and Kılıcarslan [1] (0.893). Its high precision (0.976) and F1-score (0.973) indicate balanced, reliable classification compared with previous models.
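The multi-class sensitivity, specificity, precision and F1-score in Table 2 are commonly computed one-vs-rest from a confusion matrix and then macro-averaged. The paper does not state its averaging scheme, so that choice is an assumption here. The Python sketch below shows the computation on a hypothetical 3×3 matrix. It is not the Fig. 6 data, because the full matrix is not given in the text. The sketch also includes a rough consistency check: the harmonic mean of the reported precision (0.976) and sensitivity (0.97) is about 0.973, which matches the reported F1-score. A macro-averaged F1 need not equal this value exactly.

```python
def one_vs_rest_metrics(cm):
    """Macro-averaged one-vs-rest metrics from a square confusion matrix.

    cm[i][j] counts samples whose true class is i and predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class k predicted as something else
        fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as k
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        per_class.append((sens, spec, prec, f1))
    names = ("sensitivity", "specificity", "precision", "f1")
    macro = {name: sum(m[i] for m in per_class) / n for i, name in enumerate(names)}
    macro["accuracy"] = sum(cm[k][k] for k in range(n)) / total
    return macro


def harmonic_f1(precision, recall):
    """F1 as the harmonic mean of a single precision/recall pair."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical 3-class matrix for illustration only (not the Fig. 6 data).
toy_cm = [[8, 1, 1],
          [0, 9, 1],
          [1, 0, 9]]
print(one_vs_rest_metrics(toy_cm))

# Consistency check against Table 2: precision 0.976, sensitivity 0.97.
print(round(harmonic_f1(0.976, 0.97), 3))  # 0.973
```

For a specific class, specificity counts every sample of every other class as a negative. In a five-class problem, specificity values therefore tend to be high even when per-class recall is more modest.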
Table 2 State-of-the-art model comparison
| Metric | IMATX Net (proposed) | Abdalla Ibrahim Abdalla Musa [20] | Pacal and Kılıcarslan [1] | Habtemariam et al. [18] | Fei et al. [21] | Tanimu et al. [19] |
|---|---|---|---|---|---|---|
| Sensitivity | 0.97 | 0.934 | 0.89 | 0.804 | 0.8348 | 0.825 |
| Specificity | 0.971 | 0.905 | 0.827 | 0.713 | NA | 0.713 |
| Accuracy | 0.972 | 0.902 | 0.893 | 0.821 | 0.8526 | 0.821 |
| Precision | 0.976 | 0.904 | 0.858 | 0.825 | 0.8314 | 0.825 |
| F1 Score | 0.973 | 0.923 | 0.89 | 0.804 | 0.8416 | 0.804 |

5 Conclusion
IMATX Net outperforms the compared methods on all key metrics: sensitivity, specificity, accuracy, precision and F1-score. It achieves a strong balance between false positives and false negatives, which is essential for reliable medical image classification. The ablation study assessed the contribution of the main components of IMATX Net, the IMA (Integrated Multi-context Attention) module and the T-blocks. Performance degraded when these components were removed, confirming their role in feature extraction and classification accuracy. The T-blocks use dilated convolutions to enhance spatial awareness. The IMA module fuses multiple attention mechanisms, including channel attention, to improve feature representation. The pyramidal feature extraction within IMATX Net produces accurate and broadly applicable predictions. Its attention-based design enables better contextual assessment, allowing IMATX Net to surpass CNN-based architectures such as ResNet and UNet variants.
Additionally, attention visualizations indicate that IMATX Net focuses on diagnostically significant regions, supporting its interpretability and clinical relevance. This visual evidence suggests that the model relies on critical cellular features rather than irrelevant background information, which is essential for reliable medical imaging applications.
Declarations
Author Contribution
T.B. wrote the main manuscript text and prepared the figures. All authors reviewed the manuscript.
References
[1] Pacal, I., Kılıcarslan, S.: Deep learning-based approaches for robust classification of cervical cancer. Neural Comput. Appl. 35 (25), 18813–18828 (2023). https://doi.org/10.1007/s00521-023-08757-w
[2] Kalbhor, M.M., Shinde, S.V.: Cervical cancer diagnosis using convolution neural network: Feature learning and transfer learning approaches. Soft Comput. 1–11 (2023). https://doi.org/10.1007/s00500-023-08969-1
[3] Kumari, C.M., Bhavani, R., Padmashree, S., Priya, R.: Automated cervical cancer classification using deep neural network classifier. Int. J. Model. Simul. Sci. Comput. 15 (1), 2450008 (2024). https://doi.org/10.1142/S1793962324500089
[4] Youneszade, N., Marjani, M., Shafiq, D.A.: Exploring the impact of increasing the number of classes on the performance of cervical cancer detection models using deep learning and colposcopy. J. Eng. Sci. Technol. 19 (2), 629–647 (2024)
[5] Cheng, C., Yang, Y., Qu, Y.: Exploration of cervical cancer image processing technology based on deep learning. In: International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2024). Proc. SPIE 13180, 255–263 (2024). https://doi.org/10.1117/12.3033802
[6] Talpur, D.B., Raza, A., Khowaja, A., Shah, A.: DeepCervixNet: An advanced deep learning approach for cervical cancer classification in pap smear images. VAWKUM Trans. Comput. Sci. 12 (1), 136–148 (2024). https://doi.org/10.21015/vtcs.v12i1.1812
[7] Bueno-Crespo, A., Martínez-España, R., Morales-García, J., Ortíz-González, A., Imbernón, B., Martínez-Más, J., Rosique-Egea, D., Álvarez, M.A.: Diagnosis of cervical cancer using a deep learning explainable fusion model. In: International Work-Conference on the Interplay Between Natural and Artificial Computation, 451–460. Springer, Cham (2024). https://doi.org/10.1007/978-3-031-61137-7_42
[8] Devaraj, S., Madian, N., Menagadevi, M., Remya, R.: Deep learning approaches for analysing pap smear images to detect cervical cancer. Wireless Pers. Commun. 1–18 (2024). https://doi.org/10.1007/s11277-024-10986-8
[9] Mathivanan, S.K., Francis, D., Srinivasan, S., Khatavkar, V., Shah, P.K., M.A: Enhancing cervical cancer detection and robust classification through a fusion of deep learning models. Sci. Rep. 14 (1), 10812 (2024). https://doi.org/10.1038/s41598-024-61063-w
[10] Tan, S.L., Selvachandran, G., Ding, W., Paramesran, R., Kotecha, K.: Cervical cancer classification from pap smear images using deep convolutional neural network models. Interdiscip. Sci. Comput. Life Sci. 16 (1), 16–38 (2024). https://doi.org/10.1007/s12539-023-00589-5
[11] Tripathi, A., Arora, A., Bhan, A.: Classification of cervical cancer using deep learning algorithm. In: 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 1210–1218 (2021). https://doi.org/10.1109/ICICCS51141.2021.9432382
[12] Deo, B.S., Pal, M., Panigrahi, P.K., Pradhan, A.: CerviFormer: A pap smear-based cervical cancer classification method using cross-attention and latent transformer. Int. J. Imaging Syst. Technol. 34 (2), e23043 (2024). https://doi.org/10.1002/ima.23043
[13] Jeyshri, J., Kowsigan, M.: Multi-stage attention-based long short-term memory networks for cervical cancer segmentation and severity classification. Iran. J. Sci. Technol. Trans. Electr. Eng. 48 (1), 445–470 (2024). https://doi.org/10.1007/s40998-023-00664-z
[14] Ganguly, T., Singh, R.P., Kumar, P.: Self-attention based ResNet model for cervical cancer detection. In: 2023 Second International Conference on Informatics (ICI), Noida, India, 1–6 (2023). https://doi.org/10.1109/ICI60088.2023.10421309
[15] Xia, M., Zhang, G., Mu, C., Guan, B., Wang, M.: Cervical cancer cell detection based on deep convolutional neural network. In: 2020 39th Chinese Control Conference (CCC), Shenyang, China, 6527–6532 (2020). https://doi.org/10.23919/CCC50068.2020.9188454
[16] Ghoneim, A., Muhammad, G., Hossain, M.S.: Cervical cancer classification using convolutional neural networks and extreme learning machines. Future Gener. Comput. Syst. 102, 643–649 (2020). https://doi.org/10.1016/j.future.2019.09.015
[17] Fan, Z.Z., Wu, X.C., Li, C.Z., Chen, H.Y., Liu, W.L., Zheng, Y.C., Chen, J., Li, X.Y., Sun, H.Z., Jiang, T., Grzegorzek, M., Li, C.: CAM-VT: A weakly supervised cervical cancer nest image identification approach using conjugated attention mechanism and visual transformer. Comput. Biol. Med. 162, 107070 (2023). https://doi.org/10.1016/j.compbiomed.2023.107070
[18] Habtemariam, L.W., Zewde, E.T., Simegn, G.L.: Cervix type and cervical cancer classification system using deep learning techniques. Med. Devices: Evid. Res. 15, 163–176 (2022). https://doi.org/10.2147/mder.s366303
[19] Yi, J.X., Liu, X.L., Cheng, S.H., Chen, L., Zeng, S.Q.: Multi-scale window transformer for cervical cytopathology image recognition. Comput. Struct. Biotechnol. J. 24, 314–321 (2024). https://doi.org/10.1016/j.csbj.2024.04.028
[20] Musa, A.I.A., Adam, M.M.S.: Attention-Guided Hybrid Network for Cervical Cancer Classification. Ingénierie des Systèmes d'Information 30 (1), 191–202 (2025). https://doi.org/10.18280/isi.300116
[21] Fei, M., Zhang, X., Chen, D., Song, Z., Wang, Q., Zhang, L.: Whole slide cervical cancer classification via graph attention networks and contrastive learning. Neurocomputing 613, 128787 (2025).
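The Integrated Multi-context Attention (IMA) module of Section 3.2.2 is described only in prose. The sketch below illustrates its two stated ideas using NumPy. The first is a semantic attention mask built from global average pooling (GAP) and global max pooling (GMP). The second is the effective span of the pyramidal convolutions (1x1; 3x3 with dilation 2; 5x5 with dilation 3). This is not the author's implementation. Several details are assumptions borrowed from CBAM-style channel attention: the single two-layer dense network shared by both pooled vectors, the ReLU between its layers, the reduction ratio, and the summation before the sigmoid. The names `ima_semantic_attention` and `effective_kernel` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def ima_semantic_attention(features, w1, w2):
    """Hypothetical CBAM-style semantic attention over an (H, W, C) map.

    Assumption: the GAP and GMP vectors share one two-layer dense network
    (w1: C x C/r, w2: C/r x C) and are summed before the sigmoid.
    """
    gap = features.mean(axis=(0, 1))   # (C,) global average pooling
    gmp = features.max(axis=(0, 1))    # (C,) global max pooling

    def dense(v):
        # Dense -> ReLU -> dense, with weights shared by both pooled branches.
        return np.maximum(v @ w1, 0.0) @ w2

    mask = sigmoid(dense(gap) + dense(gmp))  # (C,) per-channel weights
    # Broadcasting multiplies every spatial position by the channel mask.
    return features * mask, mask


def effective_kernel(k, d):
    """Span covered by a k x k kernel with dilation d: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (d - 1)


# Pyramid branches named in Section 3.2.2 as (kernel size, dilation rate).
PYRAMID = [(1, 1), (3, 2), (5, 3)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((8, 8, 16))   # toy feature map with C = 16
    w1 = rng.standard_normal((16, 4))          # reduction ratio r = 4 (assumed)
    w2 = rng.standard_normal((4, 16))
    refined, mask = ima_semantic_attention(feats, w1, w2)
    print(refined.shape, mask.shape)           # (8, 8, 16) (16,)
    print([effective_kernel(k, d) for k, d in PYRAMID])  # [1, 5, 13]
```

With these settings the three pyramid branches span 1, 5, and 13 pixels. Dilation widens the context each branch sees, while the number of weights per branch stays at 1, 9, and 25. This fits the pyramid's stated role of combining fine cellular detail with wider context.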
Additional Declarations
No competing interests reported.
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-6232663","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":434554507,"identity":"862665a0-3da3-448b-bd80-2fd74c7d2420","order_by":0,"name":"Tathagat 
Banerjee","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABlElEQVRIie2RMWvbQBTHnzg4La9oPaE2+grvEMgphPirnBG4i1IMWQoB+4KhWZR49ZavkI7ZFATNktZrhlIcBJk8XCgUp6i0Z8VKE3+CQvQbHn897nf3HgJoafkP6dYV68ofWu5kDoZsYOzSGBLrk9QoUq8VBFYrAvAKnOlg5fK+MyUhNhTK1+GfIlJgaGz0MGa46jxHHn25KD+83tnqurPP8+Wv3SHAXmGQvm+FY4jLncH2sAPsxsDgW7MLvk/kFfYjxMSVxyeJvfNrXwjaj6iAd1FqB3ureSSAbh93SWNfY9HLIOHiVcYEODoGItU7Y5AHK8UOH9sVimaX2aJzr/HPKPNK7v/ORgIYREaRGp2OHR1s14r786lyncaOxlyhSHiAy0IARxI5KQUF4wHUCj59RU4XkR0skdl1GQdv9KX94H1fk5JnBWcyI+Gfj3FfqEel66Xy7jDbDd1J79ZfVAcehGXxo6pUGE5md/NlNfQ67tEnY6pGqXGyJny0l+SwATFb1EZz2YRq8/iz/97S0tLyQvkLMX1/jvQ/AkYAAAAASUVORK5CYII=","orcid":"","institution":"Indian Institute of Technology Patna","correspondingAuthor":true,"prefix":"","firstName":"Tathagat","middleName":"","lastName":"Banerjee","suffix":""}],"badges":[],"createdAt":"2025-03-15 11:53:17","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-6232663/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6232663/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":79665918,"identity":"b19309b8-7e35-4b89-8246-3e8cb4400c60","added_by":"auto","created_at":"2025-04-01 10:10:46","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":128853,"visible":true,"origin":"","legend":"\u003cp\u003eIMATX Proposed illustration\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-6232663/v1/12c5502dfd9025f766e3c9fc.png"},{"id":79666357,"identity":"b00cb8a6-bd0c-4765-af84-8fa3a3da1ff4","added_by":"auto","created_at":"2025-04-01 10:18:53","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":739158,"visible":true,"origin":"","legend":"\u003cp\u003eIMA Block for IMATX 
Network\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-6232663/v1/162f6c4530b580e6310d4c47.png"},{"id":79665930,"identity":"d2e4f372-bfd7-48ec-8aa8-78adca177e7f","added_by":"auto","created_at":"2025-04-01 10:10:47","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":678216,"visible":true,"origin":"","legend":"\u003cp\u003eT Block for IMATX Network\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-6232663/v1/747da5e67391daac864311d1.png"},{"id":79666353,"identity":"56dbe81b-54ce-4e1c-a2b2-0c19c6d4b7fd","added_by":"auto","created_at":"2025-04-01 10:18:47","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":91338,"visible":true,"origin":"","legend":"\u003cp\u003eTraining and Validation Loss, Accuracy\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-6232663/v1/54d3c70162f04d26e269b05e.png"},{"id":79666350,"identity":"d8168d7c-92b0-45c2-96b5-bf5c9f79e8b8","added_by":"auto","created_at":"2025-04-01 10:18:47","extension":"jpeg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":1244542,"visible":true,"origin":"","legend":"\u003cp\u003eAttention Player Visualization for general Attention, T block and T-Block enhanced with IMA\u003c/p\u003e","description":"","filename":"floatimage5.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-6232663/v1/9b603554880e34828021e51b.jpeg"},{"id":79666347,"identity":"7a0b5f1b-b88c-4726-ac2b-470f372b9269","added_by":"auto","created_at":"2025-04-01 10:18:47","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":44826,"visible":true,"origin":"","legend":"\u003cp\u003eIMAX Network Confusion 
Matrix\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-6232663/v1/ce65b6cdaead407f30eff077.png"},{"id":79667144,"identity":"fa2ae77f-a7aa-463b-b6d3-014a56c63577","added_by":"auto","created_at":"2025-04-01 10:26:47","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":70438,"visible":true,"origin":"","legend":"\u003cp\u003eIMAX Network ROC – AUC curve\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-6232663/v1/b50cb0074c0be8b9281ae811.png"},{"id":98375506,"identity":"2ad826b3-8db1-4221-8533-ae36c4bc98ca","added_by":"auto","created_at":"2025-12-17 06:54:53","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2193360,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-6232663/v1/5327f1aa-306c-4345-a663-9656ad230a64.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"IMATX: An Integrated Multi-Context Pyramidal Framework for Explainable and Interpretable AI Predictions for Real time Clinical Validation in Cervical Cancer Detection","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003eCervical cancer is a major health issue that disproportionately affects developing nations. It is caused mainly by persistent infection with high-risk human papillomavirus (HPV) strains, most prominently HPV-16 and HPV-18. The disease progresses slowly and remains asymptomatic in its early stages. As it advances, symptoms such as abnormal vaginal bleeding, pelvic pain, and persistent discharge appear. 
Early detection enables successful treatment and improves patient survival.\u003c/p\u003e \u003cp\u003eTraditional screening, which combines Pap smears with HPV testing, has been a primary factor in lowering cervical cancer incidence. These methods nevertheless have well-known limitations. Pap smear analysis depends on expert interpretation by pathologists, and such judgements are vulnerable to misdiagnosis, leading to both incorrect test outcomes and diagnostic delays. Screening is also hampered by limited accessibility and affordability in many low-resource regions, which restricts its usefulness at scale. Diagnosis requires pathologists to distinguish dyskeratotic, koilocytotic, metaplastic, parabasal, and superficial-intermediate cells. Manual cytological review lengthens testing and exposes results to inter-observer variability, increasing diagnostic errors and delays that ultimately harm patient outcomes. Deep learning-based automated diagnostic systems have therefore attracted growing interest. Convolutional neural networks (CNNs), vision transformers (ViTs), and hybrid models have all shown promising results for cervical cancer classification. Bringing these models into clinical practice still requires addressing variable image quality, subtle visual irregularities, and the need for clinically validated applications. 
AI-based approaches can help clinicians detect cervical cancer at an early stage, which can improve patient survival and reduce global cervical cancer mortality.\u003c/p\u003e \u003cdiv id=\"Sec2\" class=\"Section2\"\u003e \u003ch2\u003e1.1 Key Contributions\u003c/h2\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eThe study proposes a hybrid deep learning framework that combines an Integrated Multi-context Attention (IMA) module with T-blocks for pyramidal, multi-dilated convolutional feature extraction.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eThe study evaluates the impact of the individual architectural components, showing the contribution of each to the automated cervical cancer detection model.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eThe study visualizes attention through the IMA layer to improve interpretability in medical diagnosis.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eThe study compares IMATX with various state-of-the-art ML and DL approaches.\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e1.2 Organization of the research\u003c/h2\u003e \u003cp\u003eThe remainder of this work is organized as follows. The Literature Review surveys related studies, analysing machine learning and deep learning methods for medical image classification. The Methodology describes the dataset and the proposed IMATX Net architecture. The Results and Discussion section presents the results and the ablation study. 
The Conclusion summarizes the major findings and applications of the IMATX architecture.\u003c/p\u003e \u003c/div\u003e"},{"header":"2 Literature Review","content":"\u003cp\u003ePacal and Kılıcarslan [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e] introduced a robust cervical cancer diagnostic method that combines CNNs and ViTs and applied it to the SIPaKMeD Pap smear dataset. Using data augmentation and ensemble learning, they found that ViTs outperformed CNNs in diagnostic accuracy, supporting their clinical use for early cancer detection. Kalbhor and Shinde [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e] applied deep learning to cervical cancer diagnosis by using pre-trained models as feature extractors combined with machine learning classifiers. They reached 92.03% accuracy with ResNet-50 and improved this to 96.01% by fine-tuning GoogLeNet.\u003c/p\u003e \u003cp\u003eKumari et al. [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e] introduced an automated Deep Neural Network (DNN) framework for early-stage cervical cancer prediction. The system preprocesses the data, removes outliers, and reduces dimensionality with PCA before classifying normal and abnormal cervical cells. Youneszade et al. [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e] used CNNs to examine how increasing the number of classes affects cervical cancer detection from colposcopic images. Their model attained 99% training accuracy but generalized poorly, with test accuracy falling to 43.11%.\u003c/p\u003e \u003cp\u003eCheng et al. 
[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] described a complete cervical cancer image processing pipeline, from image acquisition through preprocessing and feature extraction to target detection. They also discussed CNN-based designs, GANs, and autoencoders for better feature representation. Talpur et al. [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e] proposed DeepCervixNet, an advanced deep learning model for cervical cancer diagnosis from Pap smear images. The model integrates squeeze-and-excitation blocks with ResNet101 and DenseNet169 and uses ensemble techniques to reach a classification accuracy of 99.89%.\u003c/p\u003e \u003cp\u003eBueno-Crespo et al. [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e] introduced an interpretable CNN-based deep learning model that uses Grad-CAM to provide explainability in cervical cancer diagnosis.\u003c/p\u003e \u003cp\u003eDevaraj et al. [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] investigated the pre-trained CNNs ResNet50V2, InceptionV3, and Xception for cervical cancer detection in Pap smear images. In cross-validation, ResNet50V2 performed best among the tested models in accuracy, precision, recall, and F1-score, indicating that deep learning models can support early, non-invasive identification of cervical cancer.\u003c/p\u003e \u003cp\u003eMathivanan et al. [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] studied the pre-trained deep neural networks AlexNet, InceptionV3, ResNet-101, and ResNet-152 for cervical cancer detection. 
In a hybrid approach that combined extracted features with machine learning classifiers, ResNet-152 emerged as the best model, achieving 98.08% accuracy on the SIPaKMeD dataset.\u003c/p\u003e \u003cp\u003eTan et al. [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] developed deep learning models that detect cervical cancer automatically, removing the need for both region segmentation and manual feature engineering. Tripathi et al. [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] applied ResNet-152 to cervical cancer classification on the SIPaKMeD Pap smear dataset, achieving 94.89% accuracy on cervical cytology samples. Deo et al. [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] designed CerviFormer, a cross-attention-based Transformer model specifically for cervical cancer classification from Pap smear images. The architecture handled large datasets effectively and performed competitively on two public datasets, making it a promising tool for early diagnosis and clinical decision-making.\u003c/p\u003e \u003cp\u003eJeyshri and Kowsigan [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] developed an attention-based hybrid model to segment and classify cervical cancer in biomedical images. 
Their approach used a Multiscale ResUNet++ with Fuzzy C-means clustering for segmentation, followed by Serial Cascaded Residual Attention with Long Short-Term Memory (LSTM) for the classification stage.\u003c/p\u003e"},{"header":"3 Methodology","content":"\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\n \u003ch2\u003e3.1 Data Description\u003c/h2\u003e\n \u003cp\u003eThe dataset comprises 12,147 microscopic cervical cell images categorized into five distinct classes: Dyskeratotic, Koilocytotic, Metaplastic, Parabasal, and Superficial-Intermediate. Dyskeratotic cells (2,439 images) are characterized by dense cytoplasm and irregular nuclei, often indicating high-grade squamous intraepithelial lesions (HSIL) and potential malignancy. Koilocytotic cells (2,475 images) exhibit perinuclear halos and nuclear atypia, typically associated with HPV infections and early precancerous changes. Metaplastic cells (2,379 images) represent transitional epithelial changes commonly found in non-malignant cervical transformation zones. Parabasal cells (2,361 images) are immature squamous cells with a high nuclear-to-cytoplasmic ratio, frequently observed in atrophic smears and inflammatory conditions. Superficial-Intermediate cells (2,493 images) are mature epithelial cells present in normal and benign cervical samples. 
Collected as part of the IEEE International Conference on Image Processing (ICIP) 2018 in Athens, Greece, from October 7\u0026ndash;10, this dataset serves as a crucial resource for developing automated deep learning-based cervical cancer detection models, addressing the limitations of traditional screening methods and enhancing early diagnosis accuracy.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\n \u003ch2\u003e3.2 IMATX Net\u003c/h2\u003e\n \u003cp\u003eThe study presents a deep learning model for automated cervical cancer classification that combines a DenseNet169 backbone with an attention mechanism, T-block modules, and an Integrated Multi-context Attention (IMA) module. The architecture is designed to strengthen feature extraction and improve classification accuracy on variable cervical cell image data. The DenseNet169 backbone, pre-trained on ImageNet, provides hierarchical feature representations at a modest computational cost. The backbone is kept frozen during training, which preserves the learned feature maps and keeps feature extraction stable; in medical imaging applications this helps improve generalization and reduce overfitting.\u003c/p\u003e\n \u003cp\u003eAs illustrated in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e (IMATX Proposed illustration), the system uses DenseNet169 as its feature extractor to generate deep hierarchical features, with its initial layers frozen to protect the learned weights. 
The attention mechanism applies 1\u0026times;1 convolutions and a sigmoid-activated attention mask to weight the features, highlighting important regions and suppressing unnecessary information.\u003c/p\u003e\n \u003cp\u003eEach T-block runs two types of convolution in parallel to capture both local and wide-range context. The blocks increase in filter count from 64 to 128 to 256, and batch normalization with ReLU activation keeps learning stable. The Integrated Multi-Context Attention (IMA) module combines global average and max pooling with multi-scale 1\u0026times;1, 3\u0026times;3, and 5\u0026times;5 dilated convolutions, which strengthens hierarchical feature learning.\u003c/p\u003e\n \u003cp\u003eFeatures are retained by multiplying the T-block output with the attention-weighted (masked) feature maps. Finally, classification is performed by layers with dropout regularization (64, 128, 512 neurons). Together, multi-scale feature extraction, attention mechanisms, and optimization techniques improve the model\u0026rsquo;s ability to categorize cervical cell images accurately.\u003c/p\u003e\n \u003cdiv id=\"Sec8\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.1 Attention Channelling\u003c/h2\u003e\n \u003cp\u003eThis architecture uses batch normalization and attention mechanisms in concert. Batch normalization stabilises training by normalizing intermediate features, which leads to steady convergence. The attention module refines the feature maps with a channel-wise attention mechanism that enhances important regions of the extracted features. The pyramidal framework uses three convolutional layers that reduce the feature dimensions and generate a spatial attention mask. 
This mask is then multiplied with the feature maps to suppress uncorrelated regions, improving the model\u0026rsquo;s ability to focus on diagnostically significant cellular structures. The attention channelling block consists of:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eA series of 1x1 convolutions that reduce the feature dimensions.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eA sigmoid-activated convolution that generates the attention mask.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eMultiplication of the attention mask with the extracted features, enhancing relevant regions while suppressing redundant information.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec9\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.2 Integrated Multi-context Attention (IMA)\u003c/h2\u003e\n \u003cp\u003eThe Integrated Multi-context Attention (IMA) module further improves the feature representation. As shown in Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e, IMA applies global average pooling (GAP) and global max pooling (GMP), whose outputs are refined by a dense neural network to produce semantic attention.\u003c/p\u003e\n \u003cp\u003eThe attention-enhanced features then pass through three parallel convolutional streams with different kernel sizes, which collect contextual information at different levels. This lets the model examine multiple spatial scales and improves its discriminative ability. IMA thus combines semantic attention with pyramidal, CNN-based context extraction. 
Key components include:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eGlobal average and max pooling applied to the extracted features.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eA dense neural network applied to the pooled features to generate a semantic attention mask.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eMultiplication of the attention mask with the extracted features for selective enhancement.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003ePyramidal convolutions (1x1; 3x3 with dilation\u0026thinsp;=\u0026thinsp;2; 5x5 with dilation\u0026thinsp;=\u0026thinsp;3) whose outputs are concatenated to capture contextual variations.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec10\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.3 T Block\u003c/h2\u003e\n \u003cp\u003eThe T-block modules developed in this study apply dilated convolutions to acquire contextual information. Each T-block consists of two convolutional layers, one standard and one dilated, whose outputs are merged by concatenation.\u003c/p\u003e\n \u003cp\u003eDilation enlarges the receptive field without adding parameters, allowing the network to retain both global and local context. By stacking T-blocks (shown in Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e), the model can learn increasingly complex features, which improves its ability to detect complex cervical cancer patterns. The T-block module incorporates:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eA 3x3 convolutional layer with ReLU activation.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eA dilated 3x3 convolutional layer with a specified dilation rate to capture contextual features.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eFeature concatenation followed by batch normalization and ReLU activation. 
Three T-blocks are applied sequentially with increasing filter counts (64, 128, and 256) to progressively refine the feature representation.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec11\" class=\"Section3\"\u003e\n \u003ch2\u003e3.2.4 Feature Refinement and Classification\u003c/h2\u003e\n \u003cp\u003eThis module aggregates global average pooling (GAP) and global max pooling (GMP) features and passes them through dense layers to refine the semantic attention. The refined feature maps are then processed by three parallel convolutional filters with different kernel sizes and dilation rates to capture hierarchical contextual information, allowing the model to consider diverse spatial scales and strengthening its discriminative capacity. Features are refined as follows:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eThe refined features from the T-blocks are multiplied with the earlier masked features from the attention mechanism.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eGlobal average pooling is applied to both the final features and the attention mask.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eA rescaled GAP operation normalizes the features.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eThe final classification head consists of:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eA dropout layer (0.5) to prevent overfitting.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eA dense layer with 128 units and ELU activation.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eAnother dropout layer (0.25) for regularization.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eA final softmax layer predicting the five cervical cell categories.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003e\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eAt the last stage of classification, global average pooling, dropout layers, and dense layers combine 
to produce the predictions. The attention-weighted feature maps pass through GAP layers, and additional dropout layers provide regularization. The last fully connected layer uses ELU activation to add non-linearity before the softmax output produces class probabilities. The model is trained with the Adam optimizer and categorical cross-entropy loss, a standard choice for multi-class classification. Together, these components help the model cope with the challenges of medical imaging, including variations in image quality and subtle differences between cervical cell types.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e"},{"header":"4 Results and Discussion","content":"\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n \u003ch2\u003e4.1 Cervical Cancer Detection using IMATX Network\u003c/h2\u003e\n \u003cp\u003eIMATX Net is evaluated through an ablation study, attention-based explainability, comparisons with state-of-the-art methods, and standard performance metrics. According to the ablation study, the IMA attention module and the T-blocks are critical for feature extraction and classification, and the model becomes less accurate when these modules are removed. The IMA layer also aids interpretation by visualizing where the model directs its attention, highlighting diagnostically important areas.\u003c/p\u003e\n \u003cp\u003eFig. 
\u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e (Training and Validation Loss, Accuracy) shows the training and validation accuracy and loss curves for IMATX Net.\u003c/p\u003e\n \u003cdiv id=\"Sec14\" class=\"Section3\"\u003e\n \u003ch2\u003e4.1.1 Ablation Study\u003c/h2\u003e\n \u003cp\u003eAblation studies systematically remove or modify components of a model to evaluate their individual contributions to overall performance. By comparing different configurations, researchers can determine which architectural components enhance accuracy, precision, recall, and F1-score.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eAblation Study on IMATX Net\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eModel\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003ePrecision\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRecall\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eF1-Score\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eAccuracy\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eDenseNet169\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9475\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9359\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9413\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9389\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003eDenseNet201\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9621\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9544\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9579\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9545\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eInception\u0026thinsp;+\u0026thinsp;ResNet V2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9639\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9623\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9629\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9617\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eWith IMA module\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9639\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9616\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9627\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9687\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eParallel T Block\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.8612\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.8069\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.8226\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.8137\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eIMAX Network (DenseNet169\u0026thinsp;+\u0026thinsp;Pyramidal T 
Block\u0026thinsp;+\u0026thinsp;IMA module)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9764\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9700\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9730\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.9720\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eTable \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e shows that DenseNet169 alone achieves an accuracy of 93.89% with a macro-average F1-score of 0.9413. Using DenseNet201 as the backbone improves performance to an accuracy of 95.45% and an F1-score of 0.9579. Inception\u0026thinsp;+\u0026thinsp;ResNet V2 improves the results further (F1-score: 0.9629, accuracy: 96.17%), reflecting the benefit of combining the two architectures. Adding the IMA module yields a modest gain in accuracy (96.87%) while leaving the F1-score essentially unchanged (0.9627), consistent with attention refining the feature representation. The Parallel T Block configuration performs far worse (F1-score: 0.8226, accuracy: 81.37%), which indicates that this module on its own is insufficient for accurate classification. 
The full IMATX Net (DenseNet169\u0026thinsp;+\u0026thinsp;Pyramidal T Blocks\u0026thinsp;+\u0026thinsp;IMA module) performs best (F1-score: 0.9730, accuracy: 97.20%), indicating that the combined components give the most robust feature extraction and classification.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec15\" class=\"Section3\"\u003e\n \u003ch2\u003e4.1.2 Attention \u0026ndash; IMA Layer Explainability\u003c/h2\u003e\n \u003cp\u003eThe IMA (Integrated Multi-Attention) layer selects target features and suppresses non-essential background information. Attention visualization shows which regions the IMA layer targets, which aids clinical interpretation and supports diagnosis. The activation maps and heatmaps indicate that IMATX Net concentrates its analysis on diagnostically important regions.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec16\" class=\"Section3\"\u003e\n \u003ch2\u003e4.1.3 Evaluation Parameters\u003c/h2\u003e\n \u003cp\u003eFig. \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e shows the confusion matrix of IMATX Net for the five cell classes: Dyskeratotic, Koilocytotic, Metaplastic, Parabasal, and Superficial-Intermediate. 
The diagonal of the matrix holds the correctly classified samples of each class, and the off-diagonal cells hold the misclassifications.\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eThe Dyskeratotic class had 216 correctly classified instances; its misclassified samples were predicted as Koilocytotic (2), Metaplastic (3), and Superficial-Intermediate (2).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eThe Koilocytotic class was well classified, with 231 correct predictions; 7 instances were misclassified, mainly as Dyskeratotic (4), with the rest spread over other classes.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eThe Metaplastic class achieved 267 correct classifications, with only 4 misclassified samples.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eThe Parabasal class had 105 correct classifications and 3 misclassified samples, mainly predicted as Superficial-Intermediate.\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eThe Superficial-Intermediate class had 123 correct predictions, with only 3 misclassified samples.\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n \u003cp\u003eThe ROC\u0026ndash;AUC curves in Fig. \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e further suggest that the model separates the classes very well. In the confusion matrix, the high counts along the diagonal and the low off-diagonal counts indicate strong precision and recall for each class. 
The remaining errors may stem from feature similarities between some classes, especially the Metaplastic and Superficial-Intermediate categories, where slight misclassifications were observed.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec17\" class=\"Section3\"\u003e\n \u003ch2\u003e4.1.4 State of the Art Comparison\u003c/h2\u003e\n \u003cp\u003eTable\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e (\u003cstrong\u003eState of the Art Model Comparison\u003c/strong\u003e) shows that the proposed IMATX Net outperforms existing approaches on every reported metric. Its sensitivity of 0.97, which reflects few false-negative predictions, exceeds that of the alternatives, notably Habtemariam et al. (0.804) and Tanimu et al. (0.825), and its specificity of 0.971 is well above the 0.713 reported for both of those methods. Fei et al. [\u003cspan class=\"CitationRef\"\u003e21\u003c/span\u003e] report 85.26% accuracy and an 84.16% F1-score but do not report specificity. The accuracy of IMATX Net (0.972) is also higher than that of Pacal and Kılıcarslan (0.893). 
The high precision (0.976) and F1-score (0.973) of IMATX Net show that it delivers balanced results, indicating more accurate and reliable classification than previous models.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003ctable id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eState of the Art Model Comparison\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMetric\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eIMATX Net (Proposed Method)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eMusa and Adam [\u003cspan class=\"CitationRef\"\u003e20\u003c/span\u003e]\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003ePacal and Kılıcarslan [\u003cspan class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eHabtemariam et al. [\u003cspan class=\"CitationRef\"\u003e18\u003c/span\u003e]\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eFei et al. [\u003cspan class=\"CitationRef\"\u003e21\u003c/span\u003e]\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTanimu et al. 
[\u003cspan class=\"CitationRef\"\u003e19\u003c/span\u003e]\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSensitivity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.97\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.934\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.89\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.804\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.8348\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.825\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSpecificity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.971\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.905\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.827\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.713\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNA\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.713\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAccuracy\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.972\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.902\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.893\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.821\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.8526\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n 
\u003cp\u003e0.821\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePrecision\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.976\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.904\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.858\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.825\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.8314\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.825\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eF1 Score\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.973\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.923\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.89\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.804\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e0.8416\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.804\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003c/div\u003e\n\u003c/div\u003e"},{"header":"5 Conclusion","content":"\u003cp\u003eIMATX Net outperforms conventional methods on every key metric: sensitivity, specificity, accuracy, precision, and F1-score. It strikes a good balance between false-positive and false-negative detections, which makes its medical image classifications highly reliable. 
The ablation study assessed the individual parts of IMATX Net by removing its main components, the IMA (Interleaved Multi-Attention) module and the T-Blocks.\u003c/p\u003e \u003cp\u003ePerformance degraded whenever an essential component was removed, which confirms its role in feature extraction and classification accuracy. The T-Block uses dilated convolutions and channel attention to enhance spatial awareness, while the IMA module fuses multiple attention mechanisms to improve the feature representation. IMATX Net performs autonomous pyramidal feature extraction, which yields accurate predictions that apply across varied situations. Because its attention-based design enables better contextual assessment, IMATX Net surpasses CNN-based architectures as well as ResNet and UNet variants in classification performance.\u003c/p\u003e \u003cp\u003eAdditionally, attention visualization techniques reveal that IMATX Net focuses efficiently on diagnostically significant regions, demonstrating higher interpretability and clinical relevance. This visual evidence indicates that the model relies on critical features rather than irrelevant background information for classification, which supports its reliability in medical imaging applications.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eT.B. wrote the main manuscript text and prepared figures. All authors reviewed the manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003ePacal, I., Kılıcarslan, S.: Deep learning-based approaches for robust classification of cervical cancer. Neural Comput. Appl. 
\u003cb\u003e35\u003c/b\u003e(25), 18813\u0026ndash;18828 (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s00521-023-08757-w\u003c/span\u003e\u003cspan address=\"10.1007/s00521-023-08757-w\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKalbhor, M.M., Shinde, S.V.: Cervical cancer diagnosis using convolution neural network: Feature learning and transfer learning approaches. Soft. Comput. 1\u0026ndash;11 (2023). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s00500-023-08969-1\u003c/span\u003e\u003cspan address=\"10.1007/s00500-023-08969-1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKumari, C.M., Bhavani, R., Padmashree, S., Priya, R.: Automated cervical cancer classification using deep neural network classifier. Int. J. Model. Simul. Sci. Comput. \u003cb\u003e15\u003c/b\u003e(1), 2450008 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1142/S1793962324500089\u003c/span\u003e\u003cspan address=\"10.1142/S1793962324500089\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYouneszade, N., Marjani, M., Shafiq, D.A.: Exploring the impact of increasing the number of classes on the performance of cervical cancer detection models using deep learning and colposcopy. J. Eng. Sci. Technol. \u003cb\u003e19\u003c/b\u003e(2), 629\u0026ndash;647 (2024)\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCheng, C., Yang, Y., Qu, Y.: Exploration of cervical cancer image processing technology based on deep learning. International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2024). 
SPIE, 13180, 255\u0026ndash;263. (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1117/12.3033802\u003c/span\u003e\u003cspan address=\"10.1117/12.3033802\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTalpur, D.B., Raza, A., Khowaja, A., Shah, A.: DeepCervixNet: An advanced deep learning approach for cervical cancer classification in pap smear images. VAWKUM Trans. Comput. Sci. \u003cb\u003e12\u003c/b\u003e(1), 136\u0026ndash;148 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.21015/vtcs.v12i1.1812\u003c/span\u003e\u003cspan address=\"10.21015/vtcs.v12i1.1812\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBueno-Crespo, A., Mart\u0026iacute;nez-Espa\u0026ntilde;a, R., Morales-Garc\u0026iacute;a, J., Ort\u0026iacute;z-Gonz\u0026aacute;lez, A., Imbern\u0026oacute;n, B., Mart\u0026iacute;nez-M\u0026aacute;s, J., Rosique-Egea, D., \u0026Aacute;lvarez, M.A.: Diagnosis of cervical cancer using a deep learning explainable fusion model. International Work-Conference on the Interplay Between Natural and Artificial Computation. Springer: Cham, 451\u0026ndash;460. (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-3-031-61137-7_42\u003c/span\u003e\u003cspan address=\"10.1007/978-3-031-61137-7_42\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDevaraj, S., Madian, N., Menagadevi, M., Remya, R.: Deep learning approaches for analysing pap smear images to detect cervical cancer. Wireless Pers. Commun. 1\u0026ndash;18 (2024). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11277-024-10986-8\u003c/span\u003e\u003cspan address=\"10.1007/s11277-024-10986-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMathivanan, S.K., Francis, D., Srinivasan, S., Khatavkar, V., Shah, P.K., M.A: Enhancing cervical cancer detection and robust classification through a fusion of deep learning models. Sci. Rep. \u003cb\u003e14\u003c/b\u003e(1), 10812 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-024-61063-w\u003c/span\u003e\u003cspan address=\"10.1038/s41598-024-61063-w\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTan, S.L., Selvachandran, G., Ding, W., Paramesran, R., Kotecha, K.: Cervical cancer classification from pap smear images using deep convolutional neural network models. Interdisciplinary Sciences: Comput. Life Sci. \u003cb\u003e16\u003c/b\u003e(1), 16\u0026ndash;38 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s12539-023-00589-5\u003c/span\u003e\u003cspan address=\"10.1007/s12539-023-00589-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTripathi, A., Arora, A., Bhan, A. Classification of cervical cancer using deep learning algorithm. 2021 5th International Conference on Intelligent Computing and, Systems, C.: (ICICCS), Madurai, India, 1210\u0026ndash;1218. (2021). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICICCS51141.2021.9432382\u003c/span\u003e\u003cspan address=\"10.1109/ICICCS51141.2021.9432382\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDeo, B.S., Pal, M., Panigrahi, P.K., Pradhan, A.: CerviFormer: A pap smear-based cervical cancer classification method using cross-attention and latent transformer. Int. J. Imaging Syst. Technol. \u003cb\u003e34\u003c/b\u003e(2), e23043 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/ima.23043\u003c/span\u003e\u003cspan address=\"10.1002/ima.23043\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJeyshri, J., Kowsigan, M.: Multi-stage attention-based long short-term memory networks for cervical cancer segmentation and severity classification. Iran. J. Sci. Technol. Trans. Electr. Eng. \u003cb\u003e48\u003c/b\u003e(1), 445\u0026ndash;470 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s40998-023-00664-z\u003c/span\u003e\u003cspan address=\"10.1007/s40998-023-00664-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGanguly, T., Singh, R.P., Kumar, P.: Self-attention based ResNet model for cervical cancer detection. 2023 Second International Conference on Informatics (ICI), Noida, India, 1\u0026ndash;6. (2023). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/ICI60088.2023.10421309\u003c/span\u003e\u003cspan address=\"10.1109/ICI60088.2023.10421309\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eXia, M., Zhang, G., Mu, C., Guan, B., Wang, M.: Cervical cancer cell detection based on deep convolutional neural network. 2020 39th Chinese Control Conference (CCC), Shenyang, China, 6527\u0026ndash;6532. (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.23919/CCC50068.2020.9188454\u003c/span\u003e\u003cspan address=\"10.23919/CCC50068.2020.9188454\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGhoneim, A., Muhammad, G., Hossain, M.S.: Cervical cancer classification using convolutional neural networks and extreme learning machines. Future Generation Comput. Syst. \u003cb\u003e102\u003c/b\u003e, 643\u0026ndash;649 (2020). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.future.2019.09.015\u003c/span\u003e\u003cspan address=\"10.1016/j.future.2019.09.015\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFan, Z.Z., Wu, X.C., Li, C.Z., Chen, H.Y., Liu, W.L., Zheng, Y.C., Chen, J., Li, X.Y., Sun, H.Z., Jiang, T., Grzegorzek, M., Li, C.: CAM-VT: A weakly supervised cervical cancer nest image identification approach using conjugated attention mechanism and visual transformer. Comput. Biol. Med. \u003cb\u003e162\u003c/b\u003e, 107070 (2023). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.compbiomed.2023.107070\u003c/span\u003e\u003cspan address=\"10.1016/j.compbiomed.2023.107070\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHabtemariam, L.W., Zewde, E.T., Simegn, G.L.: Cervix type and cervical cancer classification system using deep learning techniques. Med. Devices: Evid. Res. \u003cb\u003e15\u003c/b\u003e, 163\u0026ndash;176 (2022). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2147/mder.s366303\u003c/span\u003e\u003cspan address=\"10.2147/mder.s366303\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYi, J.X., Liu, X.L., Cheng, S.H., Chen, L., Zeng, S.Q.: Multi-scale window transformer for cervical cytopathology image recognition. Comput. Struct. Biotechnol. J. \u003cb\u003e24\u003c/b\u003e, 314\u0026ndash;321 (2024). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.csbj.2024.04.028\u003c/span\u003e\u003cspan address=\"10.1016/j.csbj.2024.04.028\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMusa, A.I.A., Adam, M.M.S.: Attention-Guided Hybrid Network for Cervical Cancer Classification. Ing\u0026eacute;nierie des. syst\u0026egrave;mes d Inform. \u003cb\u003e30\u003c/b\u003e(1), 191\u0026ndash;202 (2025). 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.18280/isi.300116\u003c/span\u003e\u003cspan address=\"10.18280/isi.300116\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFei, M., Zhang, X., Chen, D., Song, Z., Wang, Q., Zhang, L.: Whole slide cervical cancer classification via graph attention networks and contrastive learning. Neurocomputing, \u003cb\u003e613\u003c/b\u003e, (2025). 128787.]\u003c/span\u003e\u003c/li\u003e \u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Cervical cancer detection, Deep learning, IMATX Net, Attention mechanism, T-block, Medical image classification, Explainable AI","lastPublishedDoi":"10.21203/rs.3.rs-6232663/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6232663/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe detection of cervical cancer through histopathological images remains difficult due to complex cellular features combined with diverse staining practices which make current methods ineffective in clinical settings. Automated classification systems need to identify significant spatial features along with contextual features because this leads to more accurate and dependable diagnoses. Present methods show limitations when identifying complex associations between elements thereby reducing their escalating characteristics and broad applicability. This paper introduces the deep learning architecture IMATX Net which combines IMA attention modules with T-blocks for improved feature selection and classification accomplishments. Effective lesion discrimination alongside interpretability emerges from the proposed network because it refines both spatial and contextual elements. The system operates through a multiple-stage procedure which integrates attention channelling together with feature refinement along with classification steps. 
Through the IMA layer monitoring of attention the model creates better explainability by marking down crucial diagnostic regions.\u003c/p\u003e \u003cp\u003eThe ablation study evaluates all vital network components to show their effects on classification results. IMATX Net produces higher performance than current machine learning (ML) and deep learning (DL) systems while delivering maximum sensitivity and specificity and accuracy and precision and F1-score. The reliability of the model gets measured through confusion matrix (CM) along with ROC-AUC curves yet training and validation curves prove the learning stays stable even with minimal overfitting. IMATX Net demonstrates a sensitivity value of 0.97 which exceeds all other state-of-the-art techniques.\u003c/p\u003e \u003cp\u003eThe experimental findings show that IMATX Net demonstrates effective performance in addressing cervical cancer detection problems in histopathological imaging. The proposed model delivers robust interpretable clinical-scale classification through its integration of multi-scale attention features with refinement methods. The research verifies feature refinement techniques utilizing attention mechanisms as crucial elements for medical image analysis while allowing future improvements in automatic cervical cancer screening methods.\u003c/p\u003e","manuscriptTitle":"IMATX: An Integrated Multi-Context Pyramidal Framework for Explainable and Interpretable AI Predictions for Real time Clinical Validation in Cervical Cancer Detection","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-04-01 10:10:42","doi":"10.21203/rs.3.rs-6232663/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"d33ff96c-2b67-4ce2-8b77-44abc1d88350","owner":[],"postedDate":"April 1st, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2025-12-17T06:53:32+00:00","versionOfRecord":[],"versionCreatedAt":"2025-04-01 10:10:42","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6232663","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6232663","identity":"rs-6232663","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
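The classification head described at the end of Section 3 can be sketched in plain Python. It applies global average pooling to the attention-weighted feature maps, then dropout, a fully connected ELU layer, and a softmax output trained with categorical cross-entropy. This is a minimal illustration under stated assumptions, not the IMATX Net implementation. The layer order, the single hidden layer, the dropout rate and all weights are hypothetical, and the Adam optimizer is not shown.

```python
import math
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def global_average_pool(feature_maps: List[Matrix]) -> Vector:
    """Reduce each H x W channel map to its mean, giving one value per channel."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


def dense(x: Vector, weights: Matrix, bias: Vector) -> Vector:
    """Fully connected layer; weights[j] holds the input weights of output unit j."""
    return [sum(w * xi for w, xi in zip(w_row, x)) + b
            for w_row, b in zip(weights, bias)]


def elu(x: Vector, alpha: float = 1.0) -> Vector:
    """ELU: identity for positive inputs, alpha * (exp(v) - 1) otherwise."""
    return [v if v > 0 else alpha * (math.exp(v) - 1.0) for v in x]


def dropout(x: Vector, rate: float, training: bool,
            rng: Optional[random.Random] = None) -> Vector:
    """Inverted dropout: zero units and rescale during training, identity at inference."""
    if not training or rate == 0.0:
        return list(x)
    rng = rng or random.Random(0)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax (the max logit is subtracted first)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(probs: Vector, one_hot: Vector,
                              eps: float = 1e-12) -> float:
    """Loss = -sum_k t_k * log(p_k); eps guards against log(0)."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, one_hot))


def head_forward(feature_maps: List[Matrix], w1: Matrix, b1: Vector,
                 w2: Matrix, b2: Vector, rate: float = 0.5,
                 training: bool = False) -> Vector:
    """Hypothetical head: GAP -> dense + ELU -> dropout -> dense + softmax."""
    pooled = global_average_pool(feature_maps)
    hidden = dropout(elu(dense(pooled, w1, b1)), rate, training)
    return softmax(dense(hidden, w2, b2))


if __name__ == "__main__":
    # Hypothetical 2-channel, 2x2 feature maps and toy weights (3 output classes).
    maps = [[[1.0, 3.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, -4.0]]]
    w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
    w2, b2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0]
    probs = head_forward(maps, w1, b1, w2, b2)
    print(probs, categorical_cross_entropy(probs, [1.0, 0.0, 0.0]))
```

With `training=False`, dropout is the identity, so inference is deterministic. Subtracting the maximum logit inside `softmax` avoids overflow in `math.exp` without changing the resulting probabilities.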
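The per-class results in Section 4.1.3 and the metrics in Tables 1 and 2 can all be derived from a multi-class confusion matrix. Each class is treated in turn as the positive class (one-vs-rest). The sketch below assumes macro averaging (the unweighted mean over classes), which is one common convention; the paper does not state which averaging it used. The 3-class matrix is a hypothetical example, not the Fig. 6 counts.

```python
from typing import Dict, List

Confusion = List[List[int]]  # rows = true class, columns = predicted class


def one_vs_rest_counts(cm: Confusion, k: int) -> Dict[str, int]:
    """TP/FN/FP/TN for class k, treating every other class as negative."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                       # true k, predicted as another class
    fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly another class
    return {"tp": tp, "fn": fn, "fp": fp, "tn": total - tp - fn - fp}


def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 for an empty denominator."""
    return num / den if den else 0.0


def per_class_metrics(cm: Confusion) -> List[Dict[str, float]]:
    out = []
    for k in range(len(cm)):
        c = one_vs_rest_counts(cm, k)
        sens = safe_div(c["tp"], c["tp"] + c["fn"])  # sensitivity (recall)
        spec = safe_div(c["tn"], c["tn"] + c["fp"])  # specificity
        prec = safe_div(c["tp"], c["tp"] + c["fp"])  # precision
        f1 = safe_div(2 * prec * sens, prec + sens)  # harmonic mean of prec and sens
        out.append({"sensitivity": sens, "specificity": spec,
                    "precision": prec, "f1": f1})
    return out


def macro(metrics: List[Dict[str, float]], key: str) -> float:
    """Unweighted mean of one metric over all classes (macro average)."""
    return sum(m[key] for m in metrics) / len(metrics)


def accuracy(cm: Confusion) -> float:
    """Fraction of all samples that lie on the diagonal."""
    return safe_div(sum(cm[k][k] for k in range(len(cm))),
                    sum(sum(row) for row in cm))


if __name__ == "__main__":
    # Hypothetical 3-class matrix for illustration only.
    cm = [[8, 1, 1],
          [0, 9, 1],
          [1, 0, 9]]
    metrics = per_class_metrics(cm)
    for key in ("sensitivity", "specificity", "precision", "f1"):
        print(key, round(macro(metrics, key), 4))
    print("accuracy", round(accuracy(cm), 4))
```

The macro-averaged F1-score (the mean of the per-class F1 scores, as computed here) generally differs from an F1-score computed from macro-averaged precision and recall. The rows of a comparison table should therefore use one convention consistently.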