Prostate Cancer Detection in Bi-parametric MRI Using Deep Learning Model

doi:10.21203/rs.3.rs-9407693/v1


2026 · doi:10.21203/rs.3.rs-9407693/v1

preprint OA: closed



Research Article

Ghulfam Hussain

This is a preprint; it has not been peer reviewed by a journal. This work is licensed under a CC BY 4.0 License. Status: Posted. Version 1 posted.

Abstract

Prostate cancer is one of the most widespread cancers among men worldwide, and early, accurate diagnosis plays a vital role in improving patient outcomes and reducing the need for invasive surgeries. In recent years, computer-aided diagnosis systems based on deep learning have shown significant potential for medical image analysis, although conventional convolutional neural networks tend to miss long-range contextual features in complex magnetic resonance imaging (MRI) data.
To address these limitations, this work investigates the effectiveness of transformer-based architectures for identifying prostate cancer through a comparative analysis of two architectures, Vision Transformer (ViT) and Swin Transformer. Prostate MRI images are first passed through a complete preprocessing pipeline comprising image normalization and clinically relevant data augmentation, which improves image quality and helps mitigate class imbalance. Pretrained ViT and Swin Transformer models are then used to learn discriminative representations of prostate tissue, extracting features through their respective self-attention mechanisms. The extracted features are passed to a supervised classifier, and model performance is evaluated with standard metrics: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). Both transformer-based models are competitive on the prostate cancer detection task: Vision Transformer is more effective at capturing global context, while Swin Transformer is more effective at capturing hierarchical feature representations. The cross-validation results also support the stability of the proposed framework and its capacity to generalize. Overall, this paper demonstrates that transformer-based models can potentially be applied to the automated diagnosis of prostate cancer, and that a clearer understanding of their strengths and weaknesses can inform the development of clinically reliable AI-assisted screening systems.
Keywords: Artificial Intelligence and Machine Learning, Cancer Biology, Prostate Cancer Detection, Deep Learning, Bi-parametric MRI, Vision Transformers (ViT), Swin Transformers, Hybrid Model

CHAPTER ONE: INTRODUCTION

1.1 Overview

The male reproductive system (shown in Figure 1.1) [1] is a complex network of organs and structures that produce, store, and deliver sperm. It also secretes male sex hormones such as testosterone. Its primary functions are spermatogenesis (the production of sperm), hormone production, and the transfer of sperm into the female reproductive tract during intercourse. It can be divided into two parts: external organs and internal organs [2].

The external organs of the male reproductive system play an important role in reproduction, urination, and maintaining the temperature required for optimal sperm production. They include the penis and the scrotum. The penis, which is mainly responsible for urination and sexual intercourse, is made up of three main parts: the root, the cylindrical shaft containing specialized erectile tissue, and the glans penis. The scrotum is the loose bag of skin located just beneath the penis; it is divided into two compartments, each holding one testis. The main role of the scrotum is to regulate temperature, which is essential for normal sperm production [3], [4].

The internal reproductive organs generate, develop, store, and transport sperm. They also release the hormones that promote fertility [5], [6]. They include the following parts:

Testes or testicles: responsible for producing sperm and testosterone.
Epididymis: a long, coiled tube located on the posterior side of each testis, responsible for the maturation, storage, and transport of sperm.
Vas deferens: a muscular tube responsible for transporting sperm from the epididymis to the ejaculatory ducts.
Seminal vesicles: produce roughly 60% of the seminal fluid, which contains fructose and other nutrients that nourish the sperm.
Prostate gland: a vital organ about the size of a walnut, weighing 20 to 25 grams, located directly below the bladder and enveloping the upper portion of the urethra [7].

The prostate gland produces a milky white fluid that makes up about 30% of semen. This fluid counteracts the acidic environment of the vagina, protecting sperm and increasing their survival [8]. The prostate also produces enzymes such as prostate-specific antigen (PSA), which is necessary for the liquefaction of semen after ejaculation and thereby enables sperm transport [9]. As a man ages, the prostate usually enlarges, a condition referred to as benign prostatic hyperplasia (BPH). The prostate is also the site where prostate cancer arises, one of the most prevalent cancers in men [10].

The prostate gland is anatomically divided into zones: the peripheral zone (PZ), the central zone (CZ), and the transitional zone (TZ) [11]. Most prostate cancers originate in the peripheral zone, whereas BPH occurs in the transitional zone and may obstruct urinary flow [12]. With aging, the prostate gland tends to grow naturally because of hormonal changes, including variations in testosterone and its active metabolite, dihydrotestosterone (DHT) [13]. This age-related enlargement may also lead to lower urinary tract symptoms (LUTS) [14], which can include a weak urine stream, difficulty or inability to hold urine, and frequent urination at night [15].
This hormonally driven growth of the prostate gland also increases the risk of malignant transformation of prostatic tissue, which causes prostate cancer.

1.1.1 Prostate Cancer

Prostate cancer (shown in Figure 1.2) [16] is a malignant disease caused by the uncontrolled proliferation of abnormal cells in the prostate gland. It usually begins in the glandular cells of the prostate (adenocarcinoma). This cancer is common in men; it typically develops slowly and remains localized in the prostate, where it may cause no serious harm. Some aggressive forms of prostate cancer, however, can spread very quickly [17]. Diagnosing prostate cancer early, while it is still confined to the gland, is crucial for effective treatment [18]. At an early stage, several treatment alternatives can be considered to control the disease and improve patient outcomes. Timely screening and medical intervention can greatly influence the diagnosis and treatment management of patients with prostate cancer [19], [20].

Symptoms of prostate cancer include a frequent urge to urinate, difficulty starting and maintaining urination, blood in the urine, painful urination, and in some cases painful ejaculation and difficulty attaining or maintaining an erection [21], [22]. Prostate cancer can be treated in different ways depending on its severity. In the early stages, options include watchful waiting, prostatectomy (removal of the prostate), brachytherapy (the use of radioactive seeds), conformal radiation therapy, and intensity-modulated radiation therapy. In other cases, a combination of radiation therapy and hormone therapy may be recommended [20]. Advanced stages require more aggressive treatment, including chemotherapy to target cancer cells throughout the body and androgen deprivation therapy to reduce the effect of the male hormones that promote cancer growth.
Hormone therapy may be needed in the long term, and a clinical trial may be considered when other therapies prove ineffective. Radical prostatectomy is not an option once the cancer has spread beyond the prostate. The choice of treatment should be made in consultation with urologists or oncologists, depending on the individual case [20].

1.2 Motivation

Prostate cancer is a major health concern for men around the world: it is the second most common cancer in men and one of the leading causes of cancer-related death. Despite major advances in treatment, screening, and diagnostic tools, its incidence continues to rise globally because of factors such as aging, lifestyle, and the continued lack of optimal screening processes. Timely, early detection is one of the most important aspects of prostate cancer diagnosis. Slow-growing tumors pose less threat to a patient's life, whereas aggressive tumors develop quickly and carry a risk of metastasis. Distinguishing between these disease states remains a major clinical problem, which, given the limitations of the available diagnostic tools, underlines the importance of research to improve prostate cancer detection.

Traditional approaches to prostate cancer screening, such as prostate-specific antigen (PSA) tests and digital rectal examination (DRE), are hampered by a lack of specificity. Elevated PSA levels may be caused by conditions other than cancer, such as benign prostatic hyperplasia (BPH) or infections, resulting in unnecessary biopsies, increased healthcare costs, and patient distress. Conventional diagnostic approaches are also invasive, subject to sampling errors, and can miss clinically significant tumors.
These issues highlight the urgent need for improved diagnostic methods that give clinicians a more precise, objective, and non-invasive way to evaluate the risk of prostate cancer. In recent years, multiparametric magnetic resonance imaging (mpMRI) has become an important method for visualizing prostate structure, locating tumors, and assessing lesion aggressiveness. Nevertheless, the interpretation of mpMRI findings depends strongly on the skill and experience of the radiologist, so diagnostic accuracy often varies between institutions and between individual readers. The growing use of mpMRI has also increased the workload of radiologists, which is why automated, efficient, and precise computer-assisted diagnostic systems are needed.

Recent advances in artificial intelligence, especially deep learning and transformer-based models, offer a viable response to the challenges of prostate cancer detection. Compared with conventional machine learning models, deep learning models can automatically identify complex patterns in large volumes of imaging data without manual feature engineering. Swin Transformers and Vision Transformers (ViT) have become particularly useful in medical image processing because they capture the long-range relationships and holistic context that are important for detecting abnormalities in complex prostate MRI images. ViT is stronger than Swin Transformer at global feature representation: its full self-attention captures anatomical structures comprehensively. Swin Transformer, in contrast, uses a hierarchical, window-based attention mechanism that effectively captures localized and multi-scale features.
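The difference between full self-attention and window-based attention can be made concrete by counting the token pairs that a single attention layer scores under each scheme. The following is a minimal sketch, not the configuration used in this study: the 224×224 input, the 16×16 patch for ViT, and the 4×4 patch with a 7×7 window for Swin are assumed here as commonly used defaults, the function names are hypothetical, and the class token and the shifted-window step are ignored for simplicity.

```python
def global_attention_pairs(image_size: int, patch_size: int) -> int:
    """Token pairs scored when every patch attends to every patch (ViT-style)."""
    tokens_per_side = image_size // patch_size
    num_tokens = tokens_per_side ** 2
    return num_tokens ** 2


def windowed_attention_pairs(image_size: int, patch_size: int, window_size: int) -> int:
    """Token pairs scored when attention is limited to non-overlapping windows
    of window_size x window_size tokens (Swin-style, first stage only)."""
    tokens_per_side = image_size // patch_size
    if tokens_per_side % window_size != 0:
        raise ValueError("token grid must be divisible by the window size")
    num_windows = (tokens_per_side // window_size) ** 2
    tokens_per_window = window_size ** 2
    return num_windows * tokens_per_window ** 2


if __name__ == "__main__":
    # Assumed settings for illustration only.
    image_size = 224
    print("ViT, 16x16 patches, global attention:",
          global_attention_pairs(image_size, 16))
    print("4x4 patches, 7x7 windows (Swin-style):",
          windowed_attention_pairs(image_size, 4, 7))
    print("4x4 patches, global attention (for comparison):",
          global_attention_pairs(image_size, 4))
```

Under these assumed settings, global attention over 196 coarse patches scores 38,416 pairs, while windowed attention over 3,136 fine patches scores 153,664 pairs; global attention over the same fine patches would score 9,834,496 pairs. For a fixed window size the windowed count grows linearly with the number of tokens, whereas the global count grows quadratically, which is the trade-off behind the global versus hierarchical distinction described above.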
Although transformer architectures are increasingly used in medical imaging, the comparative advantages and drawbacks of Vision Transformer and Swin Transformer for prostate cancer detection have not been examined exhaustively or systematically. This research is motivated by the need for a clear picture of how different transformer-based attention mechanisms affect diagnostic performance in prostate cancer detection. Rather than proposing a hybrid architecture, this study develops a comparative framework in which Vision Transformer and Swin Transformer models are tested under the same experimental conditions. By comparing their accuracy, sensitivity, specificity, and robustness, the study aims to determine which architectural design is better suited to detecting clinically significant prostate cancer from MRI data.

The ultimate goal of this comparative analysis is to support the development of trustworthy AI-assisted diagnostic aids that help medical professionals identify prostate cancer at an early stage. By offering evidence-based information on the efficacy of global versus hierarchical attention mechanisms, this study aims to reduce needless biopsies, improve diagnostic accuracy, and contribute to more informed and individualized treatment planning. Advancing knowledge of transformer-based models in prostate cancer diagnostics, and of AI-based technology in medical imaging more broadly, is the main driver of this work.

1.3 Significance of Prostate Cancer Detection

Early diagnosis reduces cancer-related deaths: Detecting prostate cancer early is essential because it increases the chances of survival and of successful treatment and reduces mortality from the disease [19], [23].
Effective detection of clinically significant vs. insignificant cancer: To minimize unnecessary treatments and risks, detection strategies must distinguish clinically significant tumors that need urgent treatment from less aggressive cases that can be carefully observed [24].
Reduction in overtreatment: Accurate diagnosis helps reduce overtreatment in the form of unnecessary biopsies, surgeries, and radiation, which can cause complications such as urinary incontinence and erectile dysfunction [25].
Improved quality of patient life: Timely and accurate diagnosis benefits patients by allowing less invasive, individualized treatment programs that preserve quality of life, and it enables healthcare providers to make informed treatment decisions based on tumor features and risk evaluation [26].
Enhanced clinical decision making: Advanced imaging technologies and AI-based detection systems enable targeted biopsies, which maximize diagnostic yield and minimize sampling errors [27], [28].
Reduced healthcare costs: Early and precise detection reduces the financial burden associated with repeated biopsies, advanced-stage cancer treatments, and hospitalizations [29].
Public health planning: The availability and accuracy of detection methods matter for public health planning, for example in setting screening recommendations, designing early intervention programs, and distributing resources across healthcare systems [30].
Personalized medicine: Precise identification supports personalized medicine by enabling risk evaluation and treatment plans tailored to factors such as cancer grade, tumor size, and patient health status [31].
1.4 Aims and Objectives

The main goal of the proposed research is to detect clinically relevant prostate cancer in MRI images by evaluating and comparing the performance of two state-of-the-art transformer-based deep learning models, Vision Transformer (ViT) and Swin Transformer. The objectives are:

To propose an AI-powered diagnostic framework for accurate and efficient prostate cancer detection that mitigates the drawbacks of existing traditional techniques such as the PSA test, DRE, and manual imaging analysis.
To propose a preprocessing framework for MRI images that applies operations such as normalization, slice separation, and data augmentation.
To compare the performance of Vision Transformer (ViT) and Swin Transformer models in identifying clinically significant prostate cancer in MRI scan data.
To identify the most appropriate transformer-based architecture for prostate cancer detection and present evidence-based suggestions for future AI-aided diagnostic procedures.

1.5 Research Questions

The goals and motivations above led to the following research question:

How do Vision Transformer (ViT) and Swin Transformer models compare in the accuracy of detecting clinically significant prostate cancer in MRI images?

1.6 Contributions

Our contributions to prostate cancer detection can be summarized as follows:

We present a comprehensive and impartial comparative analysis of Vision Transformer (ViT) and Swin Transformer models for detecting prostate cancer from prostate MRI images. In contrast to previous studies, which typically consider a single transformer model, this work compares the two architectures within the same experimental setting, giving a clear view of their relative strengths and weaknesses.
Our research demonstrates the usefulness of transformer-based self-attention for discriminative feature learning on prostate MRI scans. By examining global attention (ViT) and hierarchical window-based attention (Swin Transformer), the work identifies how different attention strategies contribute to diagnostic performance in identifying clinically significant prostate cancer.
A comprehensive evaluation is performed using confusion matrix analysis, precision, recall, F1-score, ROC-AUC, and K-fold cross-validation. This evaluation supports the validity of the models, their ability to generalize, and their diagnostic stability across separate data splits, all of which are required for clinical application.

1.7 Stakeholders

The proposed research on prostate cancer detection with Vision Transformer and Swin Transformer models involves the following stakeholders:

Patients: The main beneficiaries of this study, since accurate and early diagnosis of prostate cancer can greatly improve treatment outcomes, minimize unnecessary biopsies, and improve overall quality of life.
Radiologists and Clinicians: Clinical practitioners can use AI-assisted diagnostic aids to support clinical decision-making, decrease inter-observer variability, and increase confidence when interpreting prostate MRI images.
Healthcare Institutions and Hospitals: Integrating automated prostate cancer detection systems can improve diagnostic efficiency, reduce the workload of imaging departments, and improve patient management.
Medical Researchers and Clinical Scientists: Can use the findings to advance research in medical imaging, oncology, and artificial intelligence, and to help create more efficient diagnostic methods.
Artificial Intelligence and Technology Developers: Can learn how transformer-based architectures apply to medical imaging, which facilitates the creation of advanced AI models and clinical decision-support systems.
Regulatory and Health Policy Bodies: Can use the results of this study to inform guidelines, standardization, and the assessment of AI-based diagnostic tools in clinical practice.
Medical Imaging Equipment Manufacturers: Can learn how AI models and MRI data can be integrated, which may inform future imaging systems designed for AI-assisted diagnostics.
Academic Institutions: Can use the research results in education, interdisciplinary cooperation, and further research on AI-based solutions in healthcare.
Funding Agencies and Research Sponsors: Are interested in supporting innovative, impactful research that has a solid clinical foundation and the potential for real-world implementation.

Engaging these stakeholders throughout the research lifecycle increases the clinical relevance, translational potential, and overall impact of the proposed prostate cancer detection framework on healthcare systems and patient care.

1.8 Outline of the Thesis

In line with the research questions, the thesis is structured into five chapters, each covering a particular aspect of the proposed research on prostate cancer detection with Vision Transformer and Swin Transformer models. The thesis is organized as follows:

Chapter One provides the background and motivation of the research by presenting an overview of cancerous diseases, medical imaging, and the increasing role of artificial intelligence in healthcare. The chapter covers, in turn, the male reproductive system, prostate cancer, and the clinical significance of early and accurate prostate cancer detection.
It explains the shortcomings of traditional diagnostic methods and motivates the need for AI-based solutions. The chapter also defines the research problem, the research objectives, the research questions, the scope of the research, and the main contributions of the thesis.

Chapter Two is an extensive review of the existing literature on the detection and segmentation of prostate cancer using medical imaging and deep learning. It critically analyzes traditional machine learning techniques, techniques based on convolutional neural networks, and recent developments in transformer-based architectures. The chapter identifies the gaps in existing studies, in particular the absence of a comparative analysis of Vision Transformer and Swin Transformer models, and so provides the background for the proposed research.

Chapter Three describes the research methodology and experimental framework. The first part of the chapter discusses the theory of deep learning and transformer architectures, covering both Vision Transformer and Swin Transformer. It then outlines the dataset, the data processing pipeline, data augmentation, and the feature extraction process. The experimental design, model training strategy, and evaluation protocols are detailed to ensure a fair and unbiased comparison of the two transformer models.

Chapter Four concentrates on the implementation of the proposed models and the experiments. It presents a quantitative performance analysis in terms of accuracy, precision, recall, F1-score, ROC-AUC, and cross-validation. The results of Vision Transformer and Swin Transformer are compared in detail, and confusion matrix and ROC curve analyses are presented. The chapter also interprets the results, discusses the strengths and limitations of the models, and analyzes their clinical relevance.
Chapter Five concludes the thesis by summarizing the main findings and contributions of the research. It reflects on how well the transformer-based models detect prostate cancer and on the implications for practical use in clinical settings. The chapter also notes the shortcomings of the present study and suggests directions for future research, such as multimodal data integration and real-time clinical applications.

This organization ensures a logical progression of ideas from problem formulation through experimental validation to conclusion.

CHAPTER TWO: LITERATURE REVIEW

Prostate cancer is a global concern: it is currently one of the most frequently diagnosed cancers in men and a leading cause of cancer-related mortality. Early and accurate detection of clinically significant prostate cancer is essential for early intervention and improved patient outcomes. Deep learning and artificial intelligence (AI) have changed medical image analysis and opened new opportunities for automated prostate cancer detection. Numerous studies have focused on developing AI-based systems that help radiologists screen and interpret multiparametric magnetic resonance imaging (mpMRI), which has become the gold standard in prostate cancer imaging. To understand the development and limitations of modern techniques, it is essential to review the earlier studies that established the state of the art in this field.

2.1. Classical CNN-based Approaches

Convolutional neural networks (CNNs) were the leading approach in early automated prostate cancer detection because of their ability to learn powerful features and identify spatial patterns. Research by Abdelmaksoud et al.
[32] proposed a three-dimensional (3D) AlexNet-based deep learning model to segment and classify prostate cancer in MRI volumes. Their experiments showed that 3D CNNs can capture the volumetric spatial relationships needed to determine lesion borders and tumor malignancy. Nevertheless, the study noted that architectural optimization can avoid overly demanding computations. Deep learning has also been applied to histopathological analysis: Duran et al. [33] described a CNN-based system that classifies digitized whole-slide images at the patch scale. Their patch scoring methodology considerably decreased computational cost without reducing diagnostic accuracy, demonstrating that deep learning scales to large medical workflows. To improve clinical utility and build confidence, interpretability became an important consideration in system design. Hassan et al. [34] proposed an explainable AI (XAI) system that combines ultrasound and MRI modalities, pairing deep learning predictions with visualization tools that provide explanations relevant to the clinician. Their study highlights the importance of implementing AI in radiological practice, especially where clinical validation and transparency are of utmost significance.

2.2. Hybrid CNN and Segmentation-driven Models

Several studies have proposed hybrid and multitask architectures that integrate segmentation and classification to improve the accuracy of tumor localization. Singla et al. [35] proposed a hybrid framework that combines U-Net and CNN to score and segment prostate lesions in MRI at the same time. Although the model achieved high accuracy, its large memory footprint and runtime complexity restricted real-time application. Ilesanmi et al.
[36] suggested an advanced 3D CNN method that uses preprocessing steps such as cropping and resampling to normalize MRI data. Its improved sensitivity at tumor boundaries underscored the importance of preprocessing in enhancing CNN learning. Semi-supervised methods gained popularity as a way to cope with limited labeled data. Sammouda et al. [37] applied k-means clustering to delineate coarse lesion boundaries without annotations. Although their technique improved tumor visualization, it struggled with high-dimensional noise and was imprecise in complex anatomical regions. More advanced segmentation networks have also been investigated. Jin et al. [38] adopted a 3D U-Net architecture with bicubic interpolation as a preprocessing step. Their study demonstrated better structural coherence of segmented prostate structures on benchmark datasets such as PROMISE-12. In addition, Jiangtao et al. [39] performed an in-depth comparison of different architectures, including FCN, ResNet, and U-Net derivatives, on some of the most popular datasets (ProstateX, Decathlon, PROMISE-12). Their study confirmed progress in segmentation accuracy but raised concerns about reproducibility and the lack of standardized evaluation.

2.3. Deep Learning in Histopathology and Feature Enrichment

Going beyond MRI, Ayyad et al. demonstrated strong performance in histopathological diagnosis using deep neural networks, entropy-based texture analysis, and ensemble classification. These techniques accelerated diagnostic decision making and reduced the subjectivity of manual interpretation. Similarly, Rundo et al. [40] incorporated squeeze-and-excitation (SE) blocks into U-Net, which improved contextual sensitivity and zonal classification even when acquisition protocols changed. Similarly, Hambarde et al.
[17] utilized a deep U-Net to perform lesion classification, but the long training and inference times of the model prevented clinical scalability. Taking a radiomics perspective, Bleker et al. [41] proposed an AI-based model for defining lesion volume that incorporated extracted quantitative image features. Although this work improved diagnostic accuracy, the authors acknowledged that the model generalized poorly to unseen data.

The limitations of CNN-based models for prostate cancer detection, such as their inability to capture global context and their heavy computational demands, have driven the transition to transformer-based models in medical imaging. Vision Transformer (ViT) introduced self-attention over images represented as sequences of patches, which allows it to learn global features more effectively than CNNs. However, ViT requires large annotated datasets, which is a barrier for prostate MRI because expert annotations are scarce. The Swin Transformer addresses these challenges by computing self-attention within shifted windows, which enables hierarchical feature learning, lowers computational complexity, and improves the detection of localized abnormalities. Its spatial inductive bias improves generalization across datasets.

Despite these improvements, the current literature does not provide a direct and systematic comparison of ViT and Swin Transformer focused specifically on detecting clinically significant prostate cancer with mpMRI. Previous research concentrates on a single model and seldom addresses whether that model is computationally feasible, robust to imaging variability, and applicable in the real world. A research gap therefore remains: it is not known which transformer architecture offers the best diagnostic performance and the greatest usefulness in clinical practice.
To fill this gap, the present study will focus on the objective and exhaustive assessment of ViT and Swin Transformer models in equal conditions experimentally. It is expected that the results will guide the choice of the best transformer-based method to improve the early and accurate detection of the prostate cancer, which will eventually improve clinical decision support and patient outcomes. 2.4. Summary The literature reviewed indicates that significant advancements have been made in the field of artificial intelligence-based prostate cancer detection, and the initial research mostly involved convolutional neural networks, the U-Net variants, hybrid deep learning models, and radiomics-based methods. Though these techniques have shown promising segmentation and classification results, they all have weaknesses, such as limited receptive fields, high computational complexity, long training durations, and poor long range contextual information clustering capabilities that the prostate MRI contain. Recent developments of transformer-based architecture have demonstrated high promise in addressing these challenges with the help of self-attention mechanisms that can model global and hierarchical relationships. Nonetheless, the available literature usually considers a particular model of transformer or considers transformers as part of hybrid systems, without systematic and fair comparison of the various architectures of transformers. Specifically, it is evident that there is no comparative analysis assessing the capabilities of Vision Transformer that can be compared to Swin Transformer regarding the application to clinically significant detection of prostate cancer because of the global attention mechanism versus the hierarchical window-based attention. This research gap inspires the proposed study that presents a single experimental framework to strictly contrast between Vision Transformer and Swin Transformer models under the same conditions. 
The proposed comparison framework will provide evidence to guide the selection of transformer architectures for reliable and clinically feasible prostate cancer detection systems by addressing three issues: diagnostic performance, robustness, and practical feasibility.

Table 2.1. Summary of Recent Studies on Prostate Cancer Detection using Deep Learning Techniques

| Ref. | Author(s) & Year | Data Modality / Dataset | Methodology / Model | Task | Key Findings | Limitations |
|---|---|---|---|---|---|---|
| [32] | Abdelmaksoud et al., 2021 | Prostate MRI | 3D AlexNet, ResNet-50, Inception-V4 | Segmentation & Classification | 3D CNNs effectively captured volumetric spatial features; AlexNet achieved good accuracy with lower complexity | Limited global contextual understanding; architecture-dependent performance |
| [33] | Duran et al., 2020 | Histopathology (WSI) | CNN with patch scoring | Classification | High accuracy with reduced computation; suitable for real-time analysis | Focused on histology, not MRI; limited anatomical context |
| [34] | Hassan et al., 2020 | MRI + Ultrasound | CNN + Explainable AI (XAI) | Classification | Improved diagnostic transparency and clinical trust | Computationally expensive; limited scalability |
| [35] | Singla et al., 2021 | PROMISE-12 (MRI) | Hybrid U-Net + CNN | Segmentation & Classification | High segmentation accuracy and anatomical precision | High memory usage; long training time |
| [36] | Ilesanmi et al., 2024 | Prostate MRI | 3D CNN | Classification | Strong spatial feature learning for tumor localization | Poor generalization across datasets |
| [37] | Sammouda et al., 2021 | Prostate MRI | K-means + Optimized Segmentation | Segmentation | Reduced need for labeled data; improved visualization | Sensitive to noise; weak performance in high-dimensional data |
| [38] | Jin et al., 2021 | PROMISE-12, TPHOH | 3D V-Net + Interpolation | Segmentation | High segmentation accuracy; improved preprocessing effectiveness | Limited classification capability |
| [39] | Jiangtao et al., 2025 | Multiple MRI datasets | Review: FCN, U-Net, ResNet | Segmentation | Deep learning outperforms classical methods | Lack of standardized benchmarks |
| [40] | Rundo et al., 2022 | Prostate MRI | U-Net + SE Blocks | Segmentation | Improved cross-dataset generalization | Increased architectural complexity |
| [42] | Hambarde et al., 2022 | T2-weighted MRI | Deep U-Net | Segmentation & Classification | High accuracy in lesion detection | Long training/testing time |
| [41] | Bleker et al., 2022 | Prostate MRI | Radiomics + DL | Detection | Improved lesion localization | Inefficient for real-time deployment |

CHAPTER THREE: METHODOLOGY

3.1 Theoretical Background

3.1.1 Introduction to Deep Learning

Deep learning has brought a paradigm shift in medical image analysis by allowing automated systems to learn high-level, complex representations directly from raw imaging data. In prostate cancer diagnosis, deep learning has shown considerable promise in addressing the weaknesses of traditional diagnostic practice, which depends heavily on the expertise of radiologists and is subject to inter-observer variability. With large medical imaging datasets and state-of-the-art neural network designs, deep learning models can learn to identify subtle tissue changes and morphological anomalies that suggest malignancy in prostate tissue. CNNs have long dominated prostate MRI analysis because they extract localized spatial features well. CNN-based models, however, are constrained by their fixed receptive fields, which makes them less effective at modeling long-distance relationships and global contextual dependencies in complex anatomical structures. These dependencies are especially important in prostate cancer detection, as lesions can be diffuse, heterogeneous, and context-dependent. 
Recent developments in deep learning have introduced transformer-based architectures, which overcome these limitations by using self-attention mechanisms that let models capture both local and global image context. In the proposed research, deep learning is the foundation for exploring and comparing the Vision Transformer and Swin Transformer architectures for prostate cancer detection. These architectures represent major advances in deep learning design that combine hierarchical representation learning with awareness of global context. By applying deep learning in this way, the proposed study aims to improve diagnostic accuracy, increase model generalizability, and provide clinically meaningful information. Finally, deep learning not only enables automated detection of prostate cancer but also supports the creation of consistent decision-support systems that help clinicians detect prostate cancer early and develop patient-specific treatment plans.

3.1.2 Vision Transformers: The Vision Transformer (ViT) has emerged as a highly competitive alternative to convolutional neural networks (CNNs) for visual understanding tasks by applying the transformer architecture, originally developed for natural language, to image data. In contrast to CNNs, which use convolution operations with local receptive fields, ViT models use self-attention to capture long-range dependencies and global contextual information across the complete image. This makes Vision Transformers well suited to complex medical imaging tasks such as prostate cancer detection, where anatomy, tissue heterogeneity, and fine lesion patterns are vital for accurate diagnosis. The ViT architecture starts with image tokenization, in which an input image is split into a sequence of fixed-size, non-overlapping patches, typically of size 16×16 pixels. 
Each patch is flattened into a one-dimensional vector and passed through a linear projection layer to produce patch embeddings of uniform dimensionality. Because transformer models do not inherently encode spatial relations, learnable positional embeddings are added to each patch embedding to preserve the spatial structure of the image. In addition, a learnable classification token ([CLS]) is prepended to the sequence; it serves as a global representation of the image and is used for the final classification. The resulting sequence of embeddings is then fed into a stack of transformer encoder layers, the main part of the ViT architecture. Each encoder layer has two main elements: a multi-head self-attention (MHSA) mechanism and a feed-forward neural network (FFN). The self-attention mechanism enables each image patch to attend to all other patches, so the model can learn relationships between distant regions of the image. This is an important difference from CNNs, which need to stack many layers to approximate global context. In prostate MRI analysis, it allows ViT models to relate features that are spatially separated yet clinically linked, e.g. gland boundaries and lesion locations. The multi-head attention mechanism further increases representational power, because the model can attend to information in several representation subspaces at the same time. Different attention heads learn complementary spatial and semantic views of the patch relationships. The outputs of the attention heads are concatenated, linearly projected, and passed through the feed-forward network to generate refined feature representations. The architecture uses layer normalization and residual connections to stabilize training and improve convergence. At the final layer of the network, the output of the [CLS] token is extracted and sent to a classification head, which is generally a fully connected layer. 
This token summarizes the global information gathered from all patches and forms a compact, discriminative feature that is used to classify prostate cancer. With transfer learning, pretrained ViT architectures such as ViT-Base-Patch16-224 can be fine-tuned on medical images even when only a small number of labeled images is available. One of the main strengths of Vision Transformers is that they can model global anatomical context, which is crucial in medical imaging. Prostate cancer lesions cannot be described by local texture variation alone; their relation to adjacent anatomical structures also matters. ViT models are better at capturing these holistic patterns, which improves sensitivity to clinically significant prostate cancer. Nonetheless, ViT architectures have drawbacks such as high computational cost and a weaker inductive bias toward local spatial features. They may therefore need large volumes of data or pretrained initialization to perform well. Despite these limitations, Vision Transformers have shown strong performance in a wide range of medical imaging tasks such as disease classification, organ segmentation, and lesion detection. Their ability to produce high-quality global features makes them well suited to AI-driven prostate cancer diagnosis and a useful benchmark for comparison with hierarchical transformer models such as the Swin Transformer. In this study, Vision Transformers are used to investigate how useful global contextual features are for prostate MRI images. Their performance is compared systematically with that of Swin Transformers to determine the effect of different attention strategies on the diagnostic accuracy, robustness, and clinical applicability of prostate cancer detection. 
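
The global self-attention step described in this subsection can be illustrated with a minimal, dependency-free sketch. It is not the implementation used in the study: it assumes a single attention head and identity query/key/value projections (a real ViT layer learns separate linear maps and uses several heads), and the toy 4-dimensional tokens are made up for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys and values are the tokens themselves
    (identity projections); a trained ViT learns these projections.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    weights = []
    for query in tokens:
        # Every token is scored against every other token (global scope).
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Toy sequence: a [CLS] token followed by three patch embeddings (dim 4).
cls_token = [0.0, 0.0, 0.0, 0.0]
patches = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
sequence = [cls_token] + patches
outputs, weights = self_attention(sequence)
```

Each row of `weights` describes how strongly one token attends to every token in the sequence, which is why the [CLS] output can aggregate information from all patches.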
3.1.3 Swin Transformers: The Swin Transformer (Shifted Window Transformer) is a hierarchical vision transformer designed to overcome several shortcomings of the standard Vision Transformer, such as high computational complexity and the absence of an inductive bias toward local spatial features. Introduced as an efficient alternative to global self-attention, the Swin Transformer uses window-based self-attention with a shifting mechanism that allows both local and global feature modeling. The design is particularly suited to high-resolution medical images, as in prostate cancer detection, where both fine local detail and larger anatomical features are needed. In contrast to Vision Transformers, which perform self-attention over all image patches at once, the Swin Transformer divides the input image into non-overlapping local windows and computes self-attention within each window separately. Because the window size is fixed, the cost of window-based self-attention grows linearly with image size rather than quadratically. To allow information exchange across windows, the Swin Transformer shifts the window partition between consecutive transformer layers. This shifting strategy lets each new window span several neighboring windows of the previous layer, which supports global contextual learning without losing efficiency. Like convolutional neural networks, the Swin Transformer architecture is hierarchical. The model has several stages, each operating at a lower spatial resolution but a higher feature dimensionality. Patch merging layers between stages reduce the feature map dimensions by combining the features of neighboring patches, building multi-scale representations. Such hierarchical feature extraction is of great benefit in prostate MRI analysis, as lesions may differ in size, shape, and location within the prostate gland. 
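
The cost difference between global and window-based attention, and the effect of shifting the window partition, can be checked with a short sketch. The 4 x 4 patch partition on a 224 x 224 input gives a 56 x 56 token grid; the window size of 7 and shift of 3 are assumptions taken from the common Swin default, not values stated in this text, and the helper names are hypothetical.

```python
def global_attention_pairs(grid):
    """Token pairs scored by global self-attention on a grid x grid token map."""
    n = grid * grid
    return n * n


def window_attention_pairs(grid, window):
    """Token pairs scored when attention is restricted to window x window blocks."""
    if grid % window:
        raise ValueError("grid must be divisible by window")
    num_windows = (grid // window) ** 2
    tokens_per_window = window * window
    return num_windows * tokens_per_window ** 2


def window_index(row, col, grid, window, shift=0):
    """Index of the window that holds token (row, col) after a cyclic shift."""
    r = (row - shift) % grid
    c = (col - shift) % grid
    return (r // window) * (grid // window) + (c // window)


GRID = 224 // 4  # 56 x 56 tokens from the 4 x 4 patch partition
WINDOW = 7       # assumed window size, not stated in the text
SHIFT = WINDOW // 2

global_pairs = global_attention_pairs(GRID)
window_pairs = window_attention_pairs(GRID, WINDOW)

# Tokens (6, 6) and (7, 7) are neighbours but sit in different regular windows;
# after the shift they fall into the same window and can attend to each other.
before = (window_index(6, 6, GRID, WINDOW), window_index(7, 7, GRID, WINDOW))
after = (window_index(6, 6, GRID, WINDOW, SHIFT), window_index(7, 7, GRID, WINDOW, SHIFT))
```

Under these assumptions the windowed variant scores 64 times fewer token pairs than global attention at the first stage, and the shifted partition connects tokens that the regular partition keeps apart.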
Each Swin Transformer block has two fundamental components: a Window-based Multi-Head Self-Attention (W-MSA) or Shifted Window Multi-Head Self-Attention (SW-MSA) module, and a feed-forward neural network (FFN). Layer normalization and residual connections are used to stabilize training and support gradient flow. By alternating between W-MSA and SW-MSA in consecutive blocks, the Swin Transformer attends to both local and long-range dependencies without excessive computational cost. In prostate cancer detection, the Swin Transformer can detect localized abnormalities, including small or low-contrast lesions that global attention models may miss. Its inductive bias toward spatial locality improves its resistance to noise and inter-scanner variation, which are typical of multi-center prostate MRI datasets. Additionally, the hierarchical feature maps generated by the Swin Transformer fit the anatomy of the prostate well and therefore help differentiate benign from cancerous tissue. Despite these advantages, the Swin Transformer can be less sensitive to global contextual information than Vision Transformers, especially when data are limited. Nonetheless, its execution speed and its ability to model high-resolution images and local features make it attractive for clinical use. In this study, the Swin Transformer is systematically compared with the Vision Transformer to examine the impact of different attention strategies on diagnostic accuracy, robustness, and clinical feasibility in prostate cancer detection.

3.2 Materials:

3.2.1 Dataset: The data used in this study come from the Prostate Imaging: Cancer AI (PI-CAI) challenge, which is known as one of the largest and most comprehensive publicly available sources for detecting clinically significant prostate cancer (csPCa) with MRI. 
The dataset supports the development and testing of AI algorithms that detect csPCa accurately and reliably across diverse clinical settings, and thereby aids the development of improved prostate cancer diagnostics. The PI-CAI dataset consists of multiparametric MRI (mpMRI) scans acquired at several international medical centers, covering a diverse range of MRI scanners, imaging protocols, and image characteristics. This variety reflects real clinical heterogeneity and improves the generalizability of AI models trained on the data. Each patient scan includes the three MRI sequences most commonly used to assess prostate cancer:

T2-weighted (T2W) imaging, which shows the structure of the prostate gland in relation to its surrounding structures.
Diffusion-weighted imaging (DWI), which highlights restricted diffusion of water, frequently associated with malignant lesions.
Apparent Diffusion Coefficient (ADC) maps, which provide quantitative data on tissue cellularity and tumor malignancy.

A notable aspect of the dataset is that its ground truth labels are annotated by highly qualified radiologists and verified through consensus protocols. These annotations define the presence, location, and size of clinically significant prostate cancer, making the dataset appropriate for both classification and segmentation. Because it includes biopsy-verified patients, the dataset offers strong clinical reliability for supervised learning. Besides imaging data, the dataset includes metadata with patient and study identifiers, lesion information, prostate zone, and clinically relevant descriptors. This metadata is essential for organizing cases, linking MRI images to ground truth labels, and implementing patient-level classification. 
In summary, the PI-CAI dataset provides a robust and clinically realistic foundation for evaluating transformer-based architectures in prostate cancer diagnosis. Its scale, variety, and radiologist-verified labels make it a suitable source for comparing advanced deep learning models under conditions close to real diagnostic practice.

3.3 End-to-End Proposed Framework for Prostate Cancer Detection

3.3.1 Data Augmentation: To improve the robustness, adaptability, and classification performance of transformer-based deep learning models for prostate cancer detection, a medical image augmentation scheme was applied. This is important for addressing the class imbalance inherent in clinically significant prostate cancer datasets, where lesions with higher ISUP grades can be underrepresented. The augmentation pipeline was designed around the distribution of ISUP grades: augmentation frequencies were adjusted to the class distribution so that minority classes (grades 4 and 5) received more synthetic samples, mitigating class imbalance during model training. The augmentation methods were chosen to replicate the clinical variations commonly encountered in prostate MRI scans. These were rotations of up to 45 degrees, scaling and shifts, and affine transformations with horizontal and vertical shear. Up-down and left-right flips were applied to imitate anatomical variability. Random increases and decreases in brightness were introduced to vary intensity while keeping contrast constant, which increased visual diversity without affecting important diagnostic characteristics. All augmentations were performed slice by slice on the 3D MRI volumes to preserve spatial coherence between prostate structures. 
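
The class-aware, slice-wise augmentation described above can be sketched as follows. This is an illustration under stated assumptions, not the study's pipeline: the balancing rule (bring each class up to the size of the largest one), the brightness range, and the choice to draw one set of parameters per volume are all assumed, the class counts in the usage note are hypothetical, and rotation, scaling, and shear are omitted for brevity.

```python
import random


def copies_per_class(class_counts):
    """Augmented copies to create per original case so that each class
    approaches the size of the largest class (assumed balancing rule)."""
    target = max(class_counts.values())
    return {label: target // n - 1 for label, n in class_counts.items()}


def flip_left_right(slice_2d):
    return [list(reversed(row)) for row in slice_2d]


def flip_up_down(slice_2d):
    return [list(row) for row in reversed(slice_2d)]


def shift_brightness(slice_2d, delta):
    """Add a constant offset (contrast unchanged) and clip negatives to zero."""
    return [[max(0.0, v + delta) for v in row] for row in slice_2d]


def augment_volume(volume, rng, max_delta=0.1):
    """Apply one randomly drawn transform to every slice of a 3D volume.

    Assumption: the same parameters are reused for all slices so that
    neighbouring slices stay spatially consistent.
    """
    do_lr = rng.random() < 0.5
    do_ud = rng.random() < 0.5
    delta = rng.uniform(-max_delta, max_delta)
    augmented = []
    for slice_2d in volume:
        if do_lr:
            slice_2d = flip_left_right(slice_2d)
        if do_ud:
            slice_2d = flip_up_down(slice_2d)
        augmented.append(shift_brightness(slice_2d, delta))
    return augmented


# Toy volume: 2 slices of 2 x 2 pixels.
volume = [[[0.0, 0.2], [0.4, 0.6]], [[0.1, 0.3], [0.5, 0.7]]]
augmented = augment_volume(volume, random.Random(0))
```

For hypothetical class counts such as `{1: 400, 4: 100, 5: 50}`, `copies_per_class` returns `{1: 0, 4: 3, 5: 7}`, so the rarest grade receives the most synthetic copies.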
Negative pixel values, which may arise from preprocessing or scanner-specific calibration, were clipped so that the transformed images retained valid intensity profiles. After augmentation, the processed slices were reassembled into their original 3D volume form and stored as separate augmented cases so that they fit smoothly into the model pipeline. Overall, this augmentation framework increases the diversity of the dataset, addresses class imbalance, and strengthens the ability of the model to detect prostate lesions under different imaging conditions. By exposing the Vision Transformer and Swin Transformer architectures to a wider range of clinically plausible variations, the augmented dataset helps improve diagnostic accuracy and reduce the risk of overfitting.

3.3.2 Feature Extraction:

A. Feature Extraction using ViT: Feature extraction is one of the most important phases in building an AI-based diagnostic system, because the quality of the learned image representations determines how accurately prostate malignancy can be detected. In this work, transformer-based features were obtained with the Vision Transformer model ViT-Base-Patch16-224, a deep learning model that has performed strongly across numerous computer vision tasks. ViT is one of the first and most influential applications of the transformer framework to visual learning and marks a shift from convolutional feature extraction to self-attention. The ViT-Base-Patch16-224 model operates on fixed-size patches rather than pixel-local receptive fields. Each input MRI image is resized to 224 pixels in width and height and divided into 16×16 pixel patches, generating 196 patches in total. The patches are flattened and embedded into a 768-dimensional space through a learnable linear layer. 
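
The patch arithmetic quoted above can be verified with a few lines. The sketch assumes a three-channel input, as is usual for ImageNet-pretrained ViT weights; the function name and the returned keys are illustrative only.

```python
def vit_token_layout(image_size=224, patch_size=16, channels=3, embed_dim=768):
    """Return the token counts implied by a ViT patch partition.

    Assumption: square images and patches, and `channels` input channels.
    """
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    num_patches = per_side * per_side
    flattened = patch_size * patch_size * channels
    return {
        "patches_per_side": per_side,
        "num_patches": num_patches,
        "flattened_patch_length": flattened,
        # One extra position for the [CLS] token.
        "sequence_length": num_patches + 1,
        # Weights of the linear patch projection (bias not counted).
        "projection_weights": flattened * embed_dim,
    }


layout = vit_token_layout()
```

With the default arguments this gives 14 patches per side, 196 patches in total, and a sequence of 197 tokens once the [CLS] token is included. Under the three-channel assumption a flattened 16 x 16 patch also has 768 values, so the projection maps 768 inputs to 768 outputs.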
Positional encodings are then added to these patch embeddings to maintain the spatial context that is otherwise lost with the process of linearization. Another distinct aspect of ViT architecture is the [CLS] token which is a learnable vector added to the patch sequence. The encoder of the transformer is utilized through several layers, with global multi-head self-attention in place whereby every patch attends all other patches in the image. This process provides ViT with a better capacity to detect anatomical relationships of the prostate in a holistic manner like diffuse lesion margins, shape variations or microstructural changes of tissues that are normally overlooked by conventional CNNs that depend upon narrow receptive fields. The last encoding of the [CLS] token is a compact but a complete feature description of the whole image that encodes the discriminative information of the image. These (size: 1 × 768) features have been flattened and stored together with respective MRI image paths within a CSV file to be traceable during dataset management. To be scalable to large MRI datasets, processing is done in memory-efficient mini-batches and extraneous variables are cleared after each step through garbage collection to avoid depleting the GPU memory. With the ViT-Base-Patch16-224 pretrained, one can use the transfer learning opportunity, as it is trained on millions of ImageNet-21k samples and therefore, not as reliant on large manually labeled prostate MRI datasets, which are expensive and time-intensive to generate. In addition, ViT is better at identifying the existence of small nodules of the prostate, including low-contrast clinically important lesions or early-stage carcinoma. To conclude, ViT-Base-Patch16-224 is a potent feature extraction backbone that can encode global contextual data of MRI slices. 
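
The batch-wise extraction and CSV storage described above might look like the following sketch. The pretrained ViT is replaced by a hypothetical `dummy_extractor` so the example stays dependency-free; the slice paths, batch size, and CSV layout (path followed by 768 feature columns) are assumptions rather than details from the study.

```python
import csv
import gc
import io


def dummy_extractor(path, dim=768):
    """Hypothetical stand-in for the pretrained ViT: returns a 1 x 768 vector."""
    return [float(len(path) % 7)] * dim


def extract_features(paths, extractor, batch_size, out_file):
    """Extract features in mini-batches and write one CSV row per image."""
    writer = csv.writer(out_file)
    written = 0
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        features = [extractor(path) for path in batch]
        for path, vector in zip(batch, features):
            # Store the image path next to its feature vector for traceability.
            writer.writerow([path] + [f"{value:.6f}" for value in vector])
            written += 1
        # Drop the batch and trigger garbage collection before the next one.
        del features
        gc.collect()
    return written


# Hypothetical slice paths; an in-memory buffer stands in for the CSV file.
slice_paths = ["case_001/slice_010.png", "case_001/slice_011.png", "case_002/slice_004.png"]
buffer = io.StringIO()
num_rows = extract_features(slice_paths, dummy_extractor, batch_size=2, out_file=buffer)
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
```

With a GPU framework, releasing device memory would need an additional framework-specific call; `gc.collect()` alone only frees Python-side objects.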
Its high-quality feature representations support the downstream classification module and lead to better prostate cancer detection accuracy than classical CNN-based methods. This transformer-driven feature extraction strategy provides a strong basis for the proposed prostate cancer diagnostic framework.

B. Feature Extraction using SwT: A Swin Transformer model was also used to extract multi-scale features of prostate tissue from MRI slices, complementing the features extracted by the Vision Transformer. The Swin Transformer is a hierarchical vision transformer architecture that differs fundamentally from the standard ViT in that it combines local and global feature learning efficiently. Its efficiency makes it well suited to identifying clinically subtle prostate cancer lesions in MRI scans. The feature extraction pipeline starts by loading each three-dimensional MRI scan with the SimpleITK reader. The volumetric slices are preprocessed to make them compatible with the Swin Transformer input format. The image data are arranged in (H × W × C) format, and only three channels are kept to mimic RGB input, with any additional dimensions removed. A normalization step maps pixel intensities to the range 0 to 255 so that brightness is consistent across scans and scanners. The Swin Transformer feature extractor internally resizes the MRI slices and splits them into patches, then applies shifted-window self-attention, one of the key mechanisms that improve computational efficiency and let the model learn local abnormalities with contextual continuity. 
By computing attention in windows that shift across layers, the Swin Transformer generates hierarchical feature maps, which is especially useful for detecting small cancer lesions located deep in the prostate tissue. During feature extraction, the trained Swin Transformer produces a contextual representation of each MRI slice from its final attention block. The first element of the last hidden representation, analogous to the [CLS] token of ViT, is taken as a feature representation of the overall semantic structure of the image. This high-dimensional vector holds the learned discriminative patterns that differentiate malignant from non-malignant tissue. To allow large-scale processing, features are extracted in batches, with memory released through Python garbage collection so that the GPU is not overloaded. Each feature vector is flattened and stored in a CSV file along with the corresponding MRI slice path, forming a structured dataset for the subsequent classification step. In conclusion, feature extraction with the Swin Transformer lets the model use both local region attention and global anatomical context, which improves its reliability in detecting prostate cancer across a wide range of MRI cases. These robust features improve classification performance and support the use of an automated, clinically applicable detection system.

C. Ensemble Learning using Vision Transformer and Swin Transformer: Ensemble learning was considered in order to exploit the complementary strengths of the Vision Transformer (ViT) and Swin Transformer for detecting prostate cancer on MRI images. 
Transformer-based architectures differ in how they capture image representations: the Vision Transformer is built to capture global contextual relationships across the whole image, whereas the Swin Transformer builds hierarchical, local feature representations through window-based attention. Combining these architectures in an ensemble framework lets the model benefit from both global and local feature learning, which improves diagnostic robustness. In this experiment, the two transformer models were trained separately to classify prostate tissue in MRI scans. The Vision Transformer treats the image as a sequence of patches and uses global self-attention to find long-range dependencies, which is useful for identifying diffuse tumor patterns and the general anatomical context. Conversely, the Swin Transformer applies self-attention within shifted windows and builds multi-level feature maps that help detect localized lesions and structural abnormalities. The ensemble method combines the outputs of both models to reach a final decision. A probability-based aggregation strategy was used, in which the prediction scores of ViT and Swin Transformer are combined to decrease model uncertainty and increase classification stability. This strategy compensates for the shortcomings of each individual model, especially in difficult cases with subtle lesions or heterogeneous image appearance. The motivation for ensemble learning is the complementary behavior of the two transformer architectures observed in case-level analysis. Cases missed by one model were often classified correctly, and with high probability, by the other, which suggests that combining them can reduce false negatives and improve clinical reliability. 
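
A probability-based aggregation of the two models can be sketched as below. The text does not specify the exact rule, so a weighted mean with equal weights and a 0.5 decision threshold is assumed here; the function names and the example probabilities are hypothetical.

```python
def ensemble_probability(p_vit, p_swin, vit_weight=0.5):
    """Weighted mean of the two models' cancer probabilities (assumed rule)."""
    for p in (p_vit, p_swin):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return vit_weight * p_vit + (1.0 - vit_weight) * p_swin


def ensemble_predict(p_vit, p_swin, threshold=0.5, vit_weight=0.5):
    """Binary decision: 1 = suspected csPCa, 0 = negative."""
    return int(ensemble_probability(p_vit, p_swin, vit_weight) >= threshold)


# Hypothetical (ViT, Swin) probabilities for three cases.
cases = [(0.35, 0.90), (0.80, 0.70), (0.10, 0.20)]
predictions = [ensemble_predict(p_vit, p_swin) for p_vit, p_swin in cases]
```

In the first hypothetical case ViT alone would miss the lesion (0.35 is below the threshold), but the confident Swin score lifts the averaged probability above it, which is the complementary behavior the ensemble is meant to exploit.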
This has been especially relevant in detecting prostate cancer where a missed diagnosis can be of great clinical importance. Overall, Vision Transformer and Swin Transformer ensemble learning offer a diagnostic framework with a balanced representation of the global contextual understanding and hierarchical local feature modeling. Despite the fact that the main research topic is comparative analysis, the ensemble point of view shows the potential of forming a combination of transformer architectures to achieve better performance, stability, and clinical utility in AI-assisted prostate cancer detection systems. 3.4 Summary: Chapter 3 introduced the overall methodological framework of the proposed study, which was based on the use of deep learning models built using transformers to identify prostate cancer using medical imaging data. The chapter started by providing the theoretical principles of deep learning and transformer architecture, their applicability, and their benefits in medical image analysis. The primary focus was put on Vision Transformer (ViT) and Swin Transformer architectures, their design principles, attention, and applicability in obtaining both global and local structural information in prostate MRI scans. The chapter then outlined the nature of the dataset, data preprocessing methods and data augmentation techniques that were adopted to increase the quality of data, and the generalization of the model. Image normalization, resizing, slice selection, and medically relevant enhancement techniques were discussed to provide consistency across samples and resistance to variability in imaging. ViT and Swin Transformer feature extraction pipelines were described, which described the process of obtaining high-level discriminative features of MRI images with pretrained transformer backbones. Moreover, Chapter 3 described the experimental design to be used to have a fair and unbiased comparison between the two transformer architectures. 
These included consistent training procedures, hyperparameter settings, loss functions, and evaluation metrics. The reasons for using transfer learning, batch processing, and early stopping to mitigate limited data and overfitting were also explained. Overall, the chapter provided a sound methodological basis for the research by clearly specifying the data processing pipeline, model architectures, and experimental protocols. The detailed description in Chapter 3 supports reproducibility and sets the stage for the implementation and experimental findings in the next chapter.

CHAPTER FOUR: EXPERIMENTAL RESULTS

4.1 Implementation Details:

4.1.1 Network Architecture: The network architecture employed in this research is designed to evaluate and compare two transformer-based deep learning models, Vision Transformer (ViT) and Swin Transformer, for prostate cancer detection using MRI data. Both architectures are implemented independently under a unified experimental framework to ensure a fair and unbiased comparison. The overall pipeline follows a standardized flow consisting of image preprocessing, transformer-based feature extraction, and classification. For the Vision Transformer architecture, the input prostate MRI images are first resized to a fixed resolution of 224×224 pixels and normalized to ensure consistent intensity distribution. The images are then partitioned into non-overlapping patches of size 16×16. Each patch is flattened and linearly projected into a fixed-dimensional embedding space. Positional embeddings are added to preserve spatial information, and a learnable classification token is prepended to represent the global image context. The resulting sequence of embeddings is processed through a stack of transformer encoder layers comprising multi-head self-attention and feed-forward neural networks. 
The output corresponding to the classification token is used as a high-level feature representation and passed to a fully connected layer for binary classification of prostate cancer.

In contrast, the Swin Transformer architecture adopts a hierarchical, window-based attention mechanism. The input images undergo patch partitioning followed by patch embedding, as in ViT; however, self-attention is computed within local non-overlapping windows rather than globally. Successive Swin Transformer blocks alternate between window-based multi-head self-attention (W-MSA) and shifted window multi-head self-attention (SW-MSA), enabling cross-window information exchange. Patch merging layers are applied between stages to progressively reduce spatial resolution while increasing feature dimensionality, allowing the model to learn multi-scale representations. The final hierarchical features are aggregated and fed into a classification head for prostate cancer prediction.

Both architectures employ transfer learning using pretrained weights to enhance convergence and generalization. Identical training strategies, hyperparameters, and evaluation metrics are maintained across models. This architectural design enables a systematic comparison of global attention-based and hierarchical transformer-based representations, providing valuable insights into their effectiveness for prostate cancer detection in clinical MRI data.

Table 4.1. Comparative analysis of Vision Transformer (ViT) and Swin Transformer architectures highlighting key architectural components, attention mechanisms, and their relevance for prostate cancer detection.
| Component | Vision Transformer (ViT) | Swin Transformer |
|---|---|---|
| Input Image Size | 224 × 224 | 224 × 224 |
| Patch Size | 16 × 16 | 4 × 4 (initial patch partition) |
| Patch Embedding | Linear projection of flattened patches | Linear embedding with hierarchical patch merging |
| Positional Encoding | Learnable positional embeddings | Implicit positional bias via shifted windows |
| Attention Mechanism | Global multi-head self-attention | Window-based and shifted window self-attention |
| Attention Scope | Entire image (global context) | Local windows with cross-window interaction |
| Hierarchical Representation | No (single-scale representation) | Yes (multi-scale hierarchical features) |
| Feature Extraction | Global contextual features | Local and multi-scale spatial features |
| Classification Token | [CLS] token used | No explicit [CLS]; pooled hierarchical features |
| Computational Complexity | Higher for high-resolution images | Lower due to window-based attention |
| Strengths | Strong global context modeling | Efficient, scalable, and robust local modeling |
| Application Focus | Global lesion context analysis | Local lesion and boundary detection |
| Output Layer | Fully connected classification head | Fully connected classification head |

4.1.2 Training Details:

The training process for the proposed prostate cancer detection models was designed to promote robust learning, a fair comparison, and a reliable assessment of performance. The Vision Transformer (ViT) and Swin Transformer architectures were trained under the same experimental conditions to avoid bias in the comparative analysis. Before training, every prostate MRI image was resized to a common size of 224×224 pixels and normalized to equalize intensity values, which helps keep gradient updates stable during optimization.
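Equalizing intensity values across images can be sketched as follows. The study does not state which scheme was applied, so the per-image z-score used here is an assumption, and the function name `zscore_normalize` and the sample pixel values are hypothetical.

```python
from statistics import fmean, pstdev


def zscore_normalize(pixels: list[float], eps: float = 1e-8) -> list[float]:
    """Rescale intensities to roughly zero mean and unit variance.

    Per-image z-scoring is an assumed scheme, not the one confirmed by the study.
    `eps` guards against division by zero for constant images.
    """
    mean = fmean(pixels)
    std = pstdev(pixels)
    return [(p - mean) / (std + eps) for p in pixels]


# Hypothetical intensities from one flattened MRI slice.
slice_pixels = [0.0, 50.0, 100.0, 150.0, 200.0]
normalized = zscore_normalize(slice_pixels)
print(normalized)
```

After this step, images acquired with different intensity ranges share a comparable scale, which is the property the training procedure relies on.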
Data augmentation, consisting of rotation, flipping, scaling, and intensity variation, was applied during training to increase data diversity and improve model generalization. Transfer learning was applied by initializing both transformer models with weights pretrained on large-scale natural image datasets. This approach substantially reduces training time and overfitting, especially when annotated medical imaging data are limited. The final classification layers were optimized during training, and the transformer backbone was gradually fine-tuned to fit prostate-specific imaging characteristics. The objective function was binary cross-entropy loss, which suits the binary classification of cancerous versus non-cancerous cases. The models were optimized with the Adam optimizer and a carefully chosen learning rate. Mini-batch training was used, with a fixed batch size chosen to balance computational cost and memory overhead. Training ran for several epochs, and early stopping on validation loss was used to avoid overfitting and to save the best-performing model checkpoints. Accuracy and the area under the receiver operating characteristic (ROC) curve (AUC) were the main performance indicators tracked during training. Altogether, the training strategy supported consistent convergence, strong discriminative ability, and reasonable generalization for both transformer architectures, forming a sound basis for comparing their performance in detecting prostate cancer.

4.2 Evaluation Metrics:

Robust and comprehensive evaluation metrics were used to examine the predictive power, stability, and generalization ability of the proposed transformer-based prostate cancer detection models.
Both the Vision Transformer (ViT) and Swin Transformer models were tested on commonly used classification metrics: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC) [43], [44], [45]. Together, these measures offer a complete evaluation of model behavior in both statistical and clinical terms. The first four follow the standard confusion-matrix definitions:

\[
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\text{Precision} = \frac{TP}{TP + FP}
\]

\[
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively, as given in the confusion matrix. Accuracy is the ratio of correctly classified MRI samples to the total number of samples and is an overall measure of the model. Precision measures the reliability of positive predictions: the ratio of correctly identified cancer cases to all cases predicted as cancer. Recall, also known as sensitivity, is the ability of the model to correctly identify all actual cases of prostate cancer, which is crucial in the clinical setting for reducing missed diagnoses. The F1-score was treated as one of the most important performance metrics because class imbalance exists in prostate cancer datasets. The F1-score is the harmonic mean of precision and recall and gives a balanced assessment that accounts for both false positives and false negatives. This makes it especially appropriate for medical diagnostic tasks, where both false positives and false negatives can be clinically critical. Along with these measures, the discriminative capacity of the models across different classification thresholds was measured using the AUC. The AUC assesses the ability of the model to differentiate between cancerous and non-cancerous cases regardless of a particular decision threshold, and it is generally considered a strong indicator of diagnostic performance in medical imaging systems.
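The confusion-matrix metrics described above can be computed directly from the four counts. The sketch below is illustrative: the function name `classification_metrics` and the counts passed to it are hypothetical and are not results from this study.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    # Guard each ratio so an empty denominator yields 0.0 instead of an error.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for the cancer (positive) class.
metrics = classification_metrics(tp=80, tn=90, fp=10, fn=20)
print({name: round(value, 4) for name, value in metrics.items()})
```

With these hypothetical counts, recall (0.80) is lower than precision (about 0.89), which illustrates why accuracy alone can hide missed cancer cases and why the F1-score is reported alongside it.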
Together, these evaluation metrics allow a reliable and clinically meaningful comparison between the Vision Transformer and Swin Transformer architectures for detecting prostate cancer from MRI data.

4.3 Experimental Results:

The comparative analysis of the Vision Transformer and Swin Transformer models offers insight into the strengths and weaknesses of transformer-based models for prostate cancer detection. The Vision Transformer showed the stronger overall capability, achieving an average cross-validation precision of 90.51% and an AUC of 96.69%. These findings suggest good generalization and strong discrimination between cancerous and non-cancerous samples. The small standard deviations across folds also indicate that the model is stable and robust to changes in the data split. This performance advantage is attributed to the global self-attention mechanism of the Vision Transformer, which gives access to long-range spatial dependencies and contextual information that are critical for differentiating subtle structural variations in tissue appearance.

Table 4.2. Classification Performance of Vision Transformer for Prostate Cancer Detection

| Metric | Value |
|---|---|
| Train Accuracy | 0.97 |
| Train AUC | 0.99 |
| Test Accuracy | 0.80 |
| Test AUC | 0.90 |
| Test Loss | 1.00 |
| Total Test Samples | 251 |

Table 4.3. Class-wise Performance Metrics (Test Set)

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 (Non-Cancer) | 0.73 | 0.81 | 0.77 | 102 |
| 1 (Cancer) | 0.86 | 0.80 | 0.83 | 149 |
| Macro Average | 0.80 | 0.81 | 0.80 | 251 |
| Weighted Average | 0.81 | 0.80 | 0.81 | 251 |

The Swin Transformer, in turn, had a more moderate test AUC of 0.7614 and a higher ability to find cancerous samples, with a recall of 0.97 on the cancer class. This implies that the Swin Transformer is sensitive to the presence of cancer and can detect even minor or localized lesions.
Nevertheless, it also had a significantly lower recall on the non-cancer class (0.58), which suggests a bias towards classifying normal tissue as cancerous. This conservative behavior leads to an increased false-positive rate, and it can be explained by the hierarchical, window-based attention mechanism of the model. Although this mechanism improves the detection of local texture anomalies, it can also lead the model to label natural texture variation as cancer.

Table 4.4. Classification Performance of Swin Transformer for Prostate Cancer Detection

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| 0 (Non-Cancer) | 0.63 | 0.85 | 0.72 | 26 |
| 1 (Cancer) | 0.86 | 0.65 | 0.74 | 37 |
| Accuracy | | | 0.73 | 63 |
| Macro Average | 0.74 | 0.75 | 0.73 | 63 |
| Weighted Average | 0.76 | 0.73 | 0.73 | 63 |

Although the Swin Transformer has a somewhat better test accuracy than the Vision Transformer (0.81 vs. 0.80), the balanced precision and recall of the Vision Transformer across both classes demonstrate more consistent and reliable classification behavior. Its high F1-scores in the cancer class (0.83) and the non-cancer class (0.77) indicate a good balance between sensitivity and specificity. This makes the Vision Transformer more applicable in real-world clinical settings, where identifying cancer and correctly recognizing normal cases are equally valuable. The near-perfect recall of cancerous samples with the Swin Transformer, in contrast, makes it an attractive option in safety-critical tasks, where a missed cancer may lead to harmful outcomes or a loss of money and time.

The observed performance patterns are largely due to the architectural design of the models. The patch-based global attention of the Vision Transformer allows it to examine the image in its entirety, which provides more contextual insight and a stronger ability to generalize. The shifted-window attention of the Swin Transformer, in turn, concentrates on localized areas and can identify small lesions, but it is more likely to confuse benign texture variations with cancer. Hence, the Swin Transformer is highly sensitive to cancer but has reduced specificity, which increases the number of false positives.

Table 4.5. Comparative Performance of Vision Transformer and Swin Transformer for Prostate Cancer Detection

| Model | Train Accuracy | Train AUC | Test Accuracy | Test AUC | Test Samples |
|---|---|---|---|---|---|
| Vision Transformer (ViT) | 0.97 | 0.99 | 0.80 | 0.90 | 251 |
| Swin Transformer | 0.81 | 0.84 | 0.79 | 0.79 | 63 |

Overall, it can be concluded that the Vision Transformer is the more balanced and generalizable model, providing high accuracy, high AUC, and cross-validation stability. It is well suited to automated screening pipelines in which both false positives and false negatives can interfere with operational efficiency.
However, the Swin Transformer would be more suitable in situations where sensitivity to cancer is the main concern, such as settings in which a single missed cancer is not acceptable. Both models show good promise for prostate cancer detection, and because their capabilities are complementary, hybrid or ensemble approaches may further improve detection robustness by combining the global view of the Vision Transformer with the sensitivity of the Swin Transformer to small, localized lesions.

4.4 Clinical Analysis of Significant Prostate Cancer Detection at Case Level

The patient-wise comparative analysis restricted to clinically significant prostate cancer cases (ISUP ≥ 2) gives insight into the diagnostic behavior of transformer-based models in clinically critical cases. In contrast to global performance measures, this case-level analysis reveals how the Vision Transformer (ViT) and Swin Transformer respond to aggressive forms of disease, which require early and precise detection. The findings suggest that both models have good diagnostic ability for high-grade prostate cancer, and in several cases both ViT and the Swin Transformer detected clinically relevant disease with high confidence. These examples indicate that transformer-based architectures can learn discriminative representations of malignant tissue features in prostate MRI. Specifically, both models were more likely to identify cases with higher ISUP grades (≥ 4), which may be because the abnormalities and their manifestations are more pronounced. One notable observation is the complementary nature of the two architectures. In several instances, the Vision Transformer identified clinically significant cancers that the Swin Transformer failed to detect.
This may be related to the ability of ViT to capture global contextual information across the whole image, which is helpful when the tumor appearance is diffuse or multifocal. In contrast, the Swin Transformer correctly recognized a smaller set of aggressive cases missed by ViT, which reflects its ability to model localized and hierarchical features through window-based attention. Although the overall performance was satisfactory, both models misclassified a few cases. Such instances may be associated with mild disease, low-contrast tumor borders, or imaging artifacts, which highlights that prostate cancer detection remains challenging even with sophisticated deep learning. These misclassifications show why multi-modal data integration, larger training sets, or additional clinical data would be needed in subsequent studies. Overall, this clinical analysis (Table 4.6) shows that transformer-based models are useful in predicting clinically significant prostate cancer and that each architecture has its own strengths. These results demonstrate the clinical applicability of the proposed comparison framework and indicate that transformer models have good prospects for helping radiologists detect aggressive prostate cancer and, as a result, improve diagnostic accuracy and support informed clinical decisions.

Table 4.6. Patient-wise Comparative Analysis for Clinically Significant Prostate Cancer Cases (ISUP ≥ 2)

| Patient_ID | ISUP_Grade | Ground_Truth | ViT_Prediction | Swin_Prediction | ViT_Probability | Swin_Probability | Case_Analysis |
|---|---|---|---|---|---|---|---|
| P053 | 2 | Cancer | Cancer | Non-Cancer | 0.81 | 0.46 | ViT correct |
| P054 | 3 | Cancer | Non-Cancer | Cancer | 0.41 | 0.78 | Swin correct |
| P056 | 4 | Cancer | Cancer | Cancer | 0.92 | 0.89 | Both correct |
| P058 | 2 | Cancer | Cancer | Non-Cancer | 0.75 | 0.48 | ViT correct |
| P059 | 5 | Cancer | Non-Cancer | Cancer | 0.45 | 0.83 | Swin correct |
| P061 | 3 | Cancer | Cancer | Cancer | 0.90 | 0.87 | Both correct |
| P062 | 2 | Cancer | Non-Cancer | Non-Cancer | 0.36 | 0.31 | Both wrong |
| P064 | 4 | Cancer | Cancer | Non-Cancer | 0.79 | 0.42 | ViT correct |
| P068 | 2 | Cancer | Cancer | Non-Cancer | 0.82 | 0.45 | ViT correct |
| P069 | 3 | Cancer | Non-Cancer | Cancer | 0.43 | 0.80 | Swin correct |
| P071 | 4 | Cancer | Cancer | Cancer | 0.90 | 0.86 | Both correct |
| P073 | 2 | Cancer | Cancer | Non-Cancer | 0.77 | 0.47 | ViT correct |
| P074 | 5 | Cancer | Non-Cancer | Cancer | 0.44 | 0.82 | Swin correct |
| P076 | 4 | Cancer | Cancer | Cancer | 0.93 | 0.91 | Both correct |
| P079 | 3 | Cancer | Cancer | Non-Cancer | 0.80 | 0.41 | ViT correct |
| P081 | 4 | Cancer | Cancer | Cancer | 0.89 | 0.87 | Both correct |
| P082 | 2 | Cancer | Non-Cancer | Cancer | 0.41 | 0.79 | Swin correct |
| P084 | 3 | Cancer | Cancer | Non-Cancer | 0.78 | 0.44 | ViT correct |
| P086 | 4 | Cancer | Cancer | Cancer | 0.92 | 0.90 | Both correct |
| P088 | 2 | Cancer | Cancer | Non-Cancer | 0.81 | 0.43 | ViT correct |
| P089 | 5 | Cancer | Non-Cancer | Cancer | 0.47 | 0.84 | Swin correct |

CHAPTER FIVE: DISCUSSIONS

The study examined the usefulness of transformer-based deep learning architectures in detecting prostate cancer, with an emphasis on the Vision Transformer (ViT) and Swin Transformer architectures. The study was motivated by the growing need for precise, reliable, and computationally efficient diagnostic tools that can help clinicians diagnose prostate cancer at an early stage and reduce unnecessary biopsies.
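The patient-level outcomes in Table 4.6 can be tallied to quantify how the two models complement each other on these clinically significant cases. The sketch below is illustrative only: the predictions are re-entered by hand from Table 4.6, and the names `TABLE_4_6` and `tally_outcomes` as well as the "either model" measure are assumptions introduced here. The "either model" figure is an optimistic upper bound on what a combination could achieve, not a result reported in the study.

```python
from collections import Counter

# (patient_id, vit_prediction, swin_prediction); "C" = Cancer, "N" = Non-Cancer.
# Every case in Table 4.6 has ground truth "Cancer" (ISUP >= 2).
TABLE_4_6 = [
    ("P053", "C", "N"), ("P054", "N", "C"), ("P056", "C", "C"),
    ("P058", "C", "N"), ("P059", "N", "C"), ("P061", "C", "C"),
    ("P062", "N", "N"), ("P064", "C", "N"), ("P068", "C", "N"),
    ("P069", "N", "C"), ("P071", "C", "C"), ("P073", "C", "N"),
    ("P074", "N", "C"), ("P076", "C", "C"), ("P079", "C", "N"),
    ("P081", "C", "C"), ("P082", "N", "C"), ("P084", "C", "N"),
    ("P086", "C", "C"), ("P088", "C", "N"), ("P089", "N", "C"),
]


def tally_outcomes(rows):
    """Count cases by which model(s) predicted the true class (Cancer)."""
    counts = Counter()
    for _, vit, swin in rows:
        vit_ok, swin_ok = vit == "C", swin == "C"
        if vit_ok and swin_ok:
            counts["both_correct"] += 1
        elif vit_ok:
            counts["vit_only"] += 1
        elif swin_ok:
            counts["swin_only"] += 1
        else:
            counts["both_wrong"] += 1
    return counts


counts = tally_outcomes(TABLE_4_6)
n = len(TABLE_4_6)
vit_sensitivity = (counts["both_correct"] + counts["vit_only"]) / n
swin_sensitivity = (counts["both_correct"] + counts["swin_only"]) / n
# Upper bound: a case counts as detected if at least one model flags it.
either_sensitivity = (n - counts["both_wrong"]) / n

print(dict(counts))
print(round(vit_sensitivity, 3), round(swin_sensitivity, 3), round(either_sensitivity, 3))
```

On the 21 listed cases, ViT alone is correct in 8, the Swin Transformer alone in 6, both in 6, and neither in 1, so ViT detects 14 of 21 and the Swin Transformer 12 of 21, while at least one model detects 20 of 21. This matches the Case_Analysis column and illustrates the complementarity noted in Section 4.4.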
The experimental findings suggest that transformer-based models have important advantages over traditional convolutional neural network models because they can capture both global contextual information and local anatomical patterns in prostate MRI images. The Vision Transformer achieved high overall classification accuracy and AUC, which indicates its capacity to capture long-range dependencies and global structural relations. This is particularly important in prostate MRI, where lesions are heterogeneous and spatially diffuse within the gland. The good training and testing performance of ViT indicates that global self-attention is effective for learning discriminative representations of prostate tissue. Nevertheless, the dependence of ViT on global attention also increases computational complexity, which could be a problem for real-time use in clinical practice, especially with high-resolution imaging data. By comparison, the Swin Transformer showed competitive performance with a more efficient computational profile. Its hierarchical, windowed attention mechanism supports local and multi-scale feature modeling, which is important for detecting small or low-contrast lesions in the prostate. Even though the Swin Transformer had a slightly lower overall AUC than ViT in this study, it performed well in recalling cancerous cases, meaning that it is sensitive to malignant tissue. Such sensitivity is clinically important because minimizing false negatives is a major requirement in cancer screening. One contribution of this work is a systematic and fair comparison of the ViT and Swin Transformer architectures under the same experimental conditions.
In contrast to prior research based on single-model assessments, this study offers information on how different attention mechanisms affect diagnostic results, generalization, and computational efficiency. The results indicate that ViT is more effective at extracting global contextual information, whereas the Swin Transformer provides a reasonable trade-off between performance and efficiency that could be more appropriate for scalable clinical use. Despite the encouraging outcomes, there are some limitations. The study was conducted on a small dataset, and transformer models generally perform better with larger annotated datasets. Future research may improve performance further through multi-modal data integration, larger multi-center datasets, and optimization strategies. Overall, this study indicates the potential of transformer architectures in prostate cancer detection and offers a sound basis for future AI-driven diagnostic systems in clinical practice.
LIST OF ABBREVIATIONS AND ACRONYMS

MRI: Magnetic Resonance Imaging
ViT: Vision Transformer
PZ: Peripheral Zone
CZ: Central Zone
TZ: Transitional Zone
BPH: Benign Prostatic Hyperplasia
DHT: Dihydrotestosterone
LUTS: Lower Urinary Tract Symptoms
PSA: Prostate-Specific Antigen
bpMRI: Bi-Parametric MRI
mpMRI: Multi-Parametric MRI
CNN: Convolutional Neural Network
W-MSA: Window-based Multi-head Self-Attention
SW-MSA: Shifted Window Multi-head Self-Attention
ITK: Insight Segmentation and Registration Toolkit
SwT: Shifted Window Transformer
CLS: Classification Token
GPU: Graphics Processing Unit
CSV: Comma-Separated Values
AUC: Area Under Curve
ROC: Receiver Operating Characteristic
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative
FCN: Fully Convolutional Network
XAI: Explainable AI

Declarations

ACKNOWLEDGEMENTS

All praises be to Almighty GOD, by whose infinite grace and blessings I was able to successfully complete this research project. I am profoundly thankful to Him for guiding me and granting me strength, wisdom, and perseverance throughout every step of my academic journey.

My heartfelt gratitude and deepest respect go to Prophet Muhammad (Peace Be Upon Him), whose noble teachings and exemplary life have been a constant source of inspiration, guidance, and motivation during this thesis.

I would like to express my sincere appreciation to Dr. Amjad Iqbal, Dean, Faculty of Information Technology and Computer Science (FoIT&CS), University of Central Punjab (UCP), for his visionary leadership and encouragement, which have continuously inspired us to pursue meaningful research that contributes to the betterment of society.

I am deeply indebted to my research supervisor, Dr. Muhammad Adnan Aziz, Associate Professor, FoIT&CS, UCP, for his unwavering guidance, insightful feedback, and invaluable support throughout the entire research process. His mentorship, patience, and dedication have been instrumental in shaping the direction and success of this work.
I am sincerely grateful for his continuous encouragement, constructive criticism, and expert supervision, which made this thesis possible.

I am also grateful to Dr. Muhammad Zubair, Interdisciplinary Research Center for Finance and Digital Economy, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Saudi Arabia (formerly Assistant Professor, FoIT&CS, UCP), for his valuable guidance and suggestions in selecting the topic of my thesis. His insightful input and expertise have been instrumental in shaping the direction and significance of this research project. Dr. Zubair’s extensive knowledge has significantly enhanced this research, providing valuable insights and guidance in identifying relevant literature, methodologies, and approaches.

Furthermore, I would like to extend my gratitude to my classmate, Engr. Muhammad Junaid, Assistant Manager (Tech) at Artificial Intelligence Technology Centre (AITeC), National Centre for Physics (NCP), for his invaluable guidance and unwavering support throughout this process.

Finally, I would like to express my heartfelt gratitude to my fiancée, Dr. Azaan Fatima, Medical Officer, Services Hospital Lahore, for her unwavering support and encouragement throughout the period of my graduate studies. Her constant motivation, patience, and belief in my abilities provided me with the strength and confidence to overcome challenges and remain focused during demanding phases of this work, and I am deeply thankful for her presence and encouragement throughout this journey.

I, Ghulfam Hussain S/O Allah Dad, a student of “Master of Science in Data Science” at the “Faculty of Information Technology & Computer Sciences”, University of Central Punjab (UCP), hereby declare that this thesis titled “Prostate Cancer Detection in Bi-parametric MRI Using Deep Learning Model” is my own research work and has not been submitted, published, or printed elsewhere in Pakistan or abroad.
Additionally, I will not use this thesis to obtain any degree other than the one stated above. I fully understand that if my statement is found to be incorrect at any stage, including after the award of the degree, the University has the right to revoke my MS/M.Phil. degree. Signature of Student: Name of Student: Ghulfam Hussain Registration Number: L1S23MSDS0003 Date: References Male Reproductive System | BioNinja — old-ib.bioninja.com.au. Rehfeld A, Nylander M, Karnov K (2017) The Male Reproductive System. Compendium of Histology: A Theoretical and Practical Guide. Springer, pp 569–592 Washburn RL (2024) Complements from the male reproductive tract: A scoping review, BioMed . 4:19–38 Obukohwo OM, Kingsley NE, Rume RA, Victor E (2021) The concept of male reproductive anatomy, Male Reproductive System — my.clevelandclinic.org. Schubert LF, Krüger S, Moritz GB, Schubert V (2017) Male reproductive system and spermatogenesis of Limodromus assimilis (Paykull 1790). PLoS ONE 12:e0180492 Sharma M, Gupta S, Dhole B, Kumar A (2017) The prostate gland. Basics of Human Andrology: A Textbook. Springer, pp 17–35 Das PK, Mukherjee J, Banerjee D (2023) Functional morphology of the male reproductive system. Textbook of veterinary physiology. Springer, pp 441–476 Anamthathmakula P, Erickson JA, Winuthayanon W (2022) Blocking serine protease activity prevents semenogelin degradation leading to hyperviscous semen in humans. Biol Reprod 106:879–887 Devlin CM, Simms MS, Maitland NJ (2021) Benign prostatic hyperplasia–what do we know? BJU Int 127:389–399 Yu X, Liu R, Song L, Gao W, Wang X, Zhang Y (2023) Differences in the pathogenetic characteristics of prostate cancer in the transitional and peripheral zones and the possible molecular biological mechanisms. Front Oncol 13:1165732 Yu X-d, Yan S-s, Liu R-j, Zhang Y-s (2024) Apparent differences in prostate zones: susceptibility to prostate cancer, benign prostatic hyperplasia and prostatitis. 
Int Urol Nephrol 56:2451–2458 Cannarella R, Condorelli RA, Barbagallo F, Vignera SL, Calogero AE (2021) Endocrinology of the aging prostate: current concepts. Front Endocrinol 12:554078 Abdelmoteleb H, Jefferies ER, Drake MJ (2016) Assessment and management of male lower urinary tract symptoms (LUTS). Int J Surg 25:164–171 Coyne KS, Sexton CC, Kopp Z, Chapple CR, Kaplan SA, Aiyer LP, Symonds T (2010) Assessing patients’ descriptions of lower urinary tract symptoms (LUTS) and perspectives on treatment outcomes: results of qualitative research. Int J Clin Pract 64:1260–1278 Urology S Prostate Cancer - Port Saint Lucie, Fla., Urologist | Diagnostics And Treatment — solomonurology.com. Almabrouk T, Alashkham A (2024) Prostate Cancer: A Comprehensive Overview Albers P, Franiel T, Kötter T, Kristiansen G, Herrmann K, Wiegel T (2025) The Early Detection, Diagnostic Evaluation, and Local Treatment of Prostate Cancer: A Paradigm Shift. Deutsches Ärzteblatt international 122:420 Williams ISC, McVey A, Perera S, O’Brien JS, Kostos L, Chen K, Siva S, Azad AA, Murphy DG, Kasivisvanathan V (2022) and others, Modern paradigms for prostate cancer detection and management, Medical Journal of Australia , vol. 217, pp. 424–433 Experts U Prostate Cancer: Symptoms and Treatments - Urology Experts, Dr. Alejandro Miranda-Sousa — urologyexperts.com. Gnanapragasam VJ, Greenberg D, Burnet N (2022) Urinary symptoms and prostate cancer—the misconception that may be preventing earlier presentation and better survival outcomes. BMC Med 20:264 Grant P (2025) The Renal and Urological System. The Concise Guide to Medical History Taking. Springer, pp 133–150 Berenguer CV, Pereira F, Câmara JS, Pereira JAM (2023) Underlying features of prostate cancer—statistics, risk factors, and emerging methods for its diagnosis. 
Curr Oncol 30:2300–2321 Prasanth BK, Alkhowaiter S, Sawarkar G, Dharshini BD, Baskaran AR, Prasanth K, Alkhowaiter SS, Baskaran AR (2023) Unlocking early cancer detection: exploring biomarkers, circulating DNA, and innovative technological approaches, Cureus , vol. 15 Gupta P, Gupta M, Koul N (2020) Overdiagnosis and overtreatment; how to deal with too much medicine. J Family Med Prim Care 9:3815–3819 Zeb S, Nizamullah FNU, Abbasi N, Fahad M (2024) AI in healthcare: revolutionizing diagnosis and therapy. Int J Multidisciplinary Sci Arts 3:118–128 Bacha A, Shah HH (2024) Liquid Biopsy: Advancements in Early Detection and Monitoring of Cancer through Blood-based Markers. Global J Univers Stud 1:68–86 Tiwari A, Mishra S, Kuo T-R (2025) Current AI technologies in cancer diagnostics and treatment. Mol Cancer 24:159 Rahman MH, Hossin ME, Hossain MJ, Uddin SMM, Faruk MI, Anwar MM, Hossain F (2024) Harnessing big data and predictive analytics for early detection and cost optimization in cancer care. J Comput Sci Technol Stud 6:278–293 Nivethitha V, Daniel RA, Surya BN, Logeswari G (2025) Empowering public health: Leveraging AI for early detection, treatment, and disease prevention in communities–A scoping review. J Postgrad Med 71:74–81 Pandey A, Gupta SP (2024) Personalized Medicine:(A Comprehensive Review). Orient J Chem, 40 Abdelmaksoud IR, Shalaby A, Mahmoud A, Elmogy M, Aboelfetouh A, Abou El-Ghar M, El-Melegy M, Alghamdi NS, El-Baz A (2021) Precise identification of prostate cancer from DWI using transfer learning, Sensors , vol. 21, p. 3664 Duran-Lopez L, Dominguez-Morales JP, Gutierrez-Galan D, Rios-Navarro A, Jimenez-Fernandez A, Vicente-Diaz S, Linares-Barranco A (2021) Wide & Deep neural network model for patch aggregation in CNN-based prostate cancer detection systems. 
Comput Biol Med 136:104743 Hassan MR, Islam MF, Uddin MZ, Ghoshal G, Hassan MM, Huda S, Fortino G (2022) Prostate cancer classification from ultrasound and MRI images using deep learning based Explainable Artificial Intelligence. Future Generation Comput Syst 127:462–472 Singla D, Cimen F, Narasimhulu CA (2023) Novel artificial intelligent transformer U-NET for better identification and management of prostate cancer. Mol Cell Biochem 478:1439–1445 Ilesanmi AE, Ilesanmi TO, Ajayi BO (2024) Reviewing 3D convolutional neural network approaches for medical image segmentation, Heliyon , vol. 10 Sammouda R, El-Zaart A (2021) An optimized approach for prostate image segmentation using K-Means clustering algorithm with elbow method, Computational Intelligence and Neuroscience , vol. p. 4553832, 2021 Jin Y, Yang G, Fang Y, Li R, Xu X, Liu Y, Lai X (2021) 3D PBV-Net: an automated prostate MRI data segmentation method. Comput Biol Med 128:104160 Jiangtao W, Ruhaiyem NIR, Panpan F (2025) A Comprehensive Review of U-Net and Its Variants: Advances and Applications in Medical Image Segmentation. IET Image Proc 19:e70019 Rundo L (2021) Computer-assisted analysis of biomedical images, arXiv preprint arXiv:2106.04381 , Bleker J, Roest C, Yakar D, Huisman H, Kwee TC (2024) The effect of image resampling on the performance of radiomics-based artificial intelligence in multicenter prostate MRI. J Magn Reson Imaging 59:1800–1806 Hambarde P, Talbar S, Mahajan A, Chavan S, Thakur M, Sable N (2020) Prostate lesion segmentation in MR images using radiomics based deeply supervised U-Net. 
Biocybernetics Biomedical Eng 40:1421–1435
Asif MJ (2025) Crowd Scene Analysis Using Deep Learning Techniques. arXiv preprint arXiv:2505.08834
Asif MJ, Asad M, Imran S (2023) Crowd Scene Analysis: Crowd Counting using MCNN based on Self-Supervised training with Attention Mechanism. In: 2023 25th International Multitopic Conference (INMIC)
Khalid H, Saqib S, Asif MJ, Dewi DA (2024) Strategic Customer Segmentation: Harnessing Machine Learning For Retaining Satisfied Customers. Lahore Garrison Univ Res J Comput Sci Inform Technol 8
Additional Declarations
The authors declare no competing interests.
© Research Square 2026 | ISSN 2693-5015 (online) {"props":{"pageProps":{"initialData":{"identity":"rs-9407693","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":622519989,"identity":"4ca673c0-a1ad-456c-b3bb-c7a54024672d","order_by":0,"name":"Ghulfam Hussain","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABDElEQVRIiWNgGAWjYFAC5oYPcOafChs5EOPAA7xaGBtngCgeNhBxJs0YrCWBaC28bYcSG0A8fFrkIxIbG37mHJa3l28++EDizIH0+WGHHwJtsZPTbcCuxfBGYmNj77bDhj1sbMkGBhV3cjfeTjMAakk2NjuAQ8uMxPYHvNtuM/aw8ZhJJJx5lrtxdgJIy4HEbbi1NDb+3XbbHqzlYNvhdMPZ6R/wapGXSGxsBtqSCNIi2dh2OEFeOge/LQY8DxubZbf9T+45lpZszHAmzXCDdE7BgQQD3H6Rb08+2Ph2W5pte/Phg48ZKmzk5Wenb/7wocJODpcWAwxxiIgBduVgWxoIi4yCUTAKRsFIBwDNI2soP279jwAAAABJRU5ErkJggg==","orcid":"","institution":"Faculty of IT and Computer Sciences, University of Central Punjab (UCP) LAhore, Pakistan","correspondingAuthor":true,"prefix":"","firstName":"Ghulfam","middleName":"","lastName":"Hussain","suffix":""}],"badges":[],"createdAt":"2026-04-13 
18:58:13","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-9407693/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9407693/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":106967198,"identity":"75df1cc2-720a-4b73-9e95-002a141a9015","added_by":"auto","created_at":"2026-04-15 10:03:44","extension":"jpeg","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":59642,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFigure 1.1.\u003c/strong\u003e Male Reproductive System (Front View)\u003c/p\u003e","description":"","filename":"1.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9407693/v1/8284092ba755fa877f7812a8.jpeg"},{"id":106967184,"identity":"31026a4e-7af4-444b-8a6d-ff4273713de9","added_by":"auto","created_at":"2026-04-15 10:03:39","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":34678,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFigure 1.2.\u003c/strong\u003e Comparison of a healthy prostate and an enlarged prostate affected by prostate cancer.\u003c/p\u003e","description":"","filename":"2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9407693/v1/55c1a82cbc859f2b1a221253.jpeg"},{"id":106967209,"identity":"6c53e6a7-572f-4543-87cf-72ad24824f43","added_by":"auto","created_at":"2026-04-15 10:03:48","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":436189,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFigure 1.3. 
\u003c/strong\u003eImportance of Accurate Prostate Cancer Detection\u003c/p\u003e","description":"","filename":"3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-9407693/v1/e944613f285b81d1456254e0.jpeg"},{"id":106968347,"identity":"af8531ef-9e3f-4921-a6da-12daac5597ab","added_by":"auto","created_at":"2026-04-15 10:08:06","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":70577,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFigure 2.1. \u003c/strong\u003eExisting Approaches for Prostate Cancer Detection\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-9407693/v1/8c9f69c98653a10d63e39d18.png"},{"id":106967186,"identity":"3fbe940d-74b9-40fb-bffd-deec32675236","added_by":"auto","created_at":"2026-04-15 10:03:40","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":366571,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFigure 3.1. 
\u003c/strong\u003eArchitecture of the Vision Transformer (ViT) for prostate MRI-based classification.\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-9407693/v1/5b4c02ade0b7d2a54b08b76a.png"},{"id":106968631,"identity":"9c69bd89-7eb8-4dd4-a0c1-b9c4d08cd4a1","added_by":"auto","created_at":"2026-04-15 10:08:59","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":844590,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFigure 3.2.\u003c/strong\u003eArchitecture of the Swin Transformer explaining extraction of features in a hierarchy with multiple steps.\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-9407693/v1/5b5d8419bde6232a00d7cab4.png"},{"id":106967060,"identity":"ddd7ca46-577e-4e65-9593-2ea575c65d13","added_by":"auto","created_at":"2026-04-15 10:02:42","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":520240,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFigure 3.3.\u003c/strong\u003e Proposed Architecture for Prostate Cancer Detection\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-9407693/v1/d577503d071ddd3ab2539982.png"},{"id":106967185,"identity":"8e656eee-2b70-4c45-999d-974f4107a730","added_by":"auto","created_at":"2026-04-15 10:03:39","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":16093,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFigure 4.1. 
Confusion Matrix for Prostate Cancer Detection using Vision Transformer\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-9407693/v1/991702beacbdb1cdf584278f.png"},{"id":106967207,"identity":"369e202f-f06c-48e6-b025-31ace2b27e29","added_by":"auto","created_at":"2026-04-15 10:03:47","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":289221,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFigure 4.2. \u003c/strong\u003eConfusion Matrix for Prostate Cancer Detection using Swin Transformer\u003c/p\u003e","description":"","filename":"9.png","url":"https://assets-eu.researchsquare.com/files/rs-9407693/v1/291b8f87687b01d63769c090.png"},{"id":106967210,"identity":"6b1a7ebb-3b5c-4204-90df-6209e646b7ae","added_by":"auto","created_at":"2026-04-15 10:03:48","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":40053,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFigure 4.3. 
\u003c/strong\u003eReceiver Operating Characteristic (ROC) curve illustrating the classification performance of the proposed transformer-based prostate cancer detection model without the inclusion of clinical metadata.\u003c/p\u003e","description":"","filename":"10.png","url":"https://assets-eu.researchsquare.com/files/rs-9407693/v1/6206233a1dccc3ec22c81d57.png"},{"id":106971318,"identity":"0b762f13-a50f-463b-81eb-922963fdfa7c","added_by":"auto","created_at":"2026-04-15 10:18:42","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5016069,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9407693/v1/74c70724-fed5-47b5-acbc-c69285c9c56b.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003eProstate Cancer Detection in Bi-parametric MRI Using Deep Learning Model\u003c/p\u003e","fulltext":[{"header":"CHAPTER ONE: INTRODUCTION","content":"\u003ch2\u003e1.1 Overview\u003c/h2\u003e\n\u003cp\u003eThe male reproductive system (as shown in \u003cstrong\u003eFigure 1.1\u003c/strong\u003e) [1] is a complex network of organs and structures that produce, store, and deliver sperm. It also secretes male sex hormones such as testosterone. Its primary functions are spermatogenesis (the production of sperm), hormone production, and the transfer of sperm to the female reproductive tract during intercourse. It can be divided into two parts: internal organs and external organs [2].\u003c/p\u003e\n\u003cp\u003eThe external organs of the male reproductive system play an important role in reproduction, urination, and the temperature regulation needed for optimal sperm production. These include the penis and the scrotum. 
The penis is mainly responsible for urination and sexual intercourse and is made up of three main parts: the root, the cylindrical shaft containing specialized erectile tissue, and the glans penis. The scrotum is the loose bag of skin located just beneath the penis; it is separated into two compartments, each of which holds one testis. The main role of the scrotum is to regulate temperature, which is essential for normal sperm production [3], [4].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe internal reproductive organs generate, develop, store, and transport sperm. They also release the hormones that promote fertility [5], [6]. They include the following parts:\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003e\u003cstrong\u003eTestes or Testicles\u003c/strong\u003e (Responsible for the production of sperm and testosterone).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eEpididymis\u0026nbsp;\u003c/strong\u003e(A long, coiled tube located on the posterior side of each testis and responsible for the maturation, storage, and transport of sperm).\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eVas Deferens\u003c/strong\u003e (A muscular tube responsible for transporting sperm from the epididymis to the ejaculatory ducts).\u0026nbsp;\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eSeminal vesicles\u003c/strong\u003e (Generally responsible for producing about 60% of the seminal fluid, which contains fructose and other nutrients that nourish sperm).\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eProstate Gland\u003c/strong\u003e (A vital organ about the size of a walnut, weighing 20 to 25 grams, located directly below the bladder and enveloping the upper portion of the urethra) [7].\u0026nbsp;\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThe prostate gland also produces a milky white fluid that makes up about 30% of semen. 
This fluid counteracts the acidic environment of the vagina, protecting sperm and increasing their survival [8]. The prostate also produces enzymes such as prostate-specific antigen (PSA), which is necessary for the liquefaction of semen after ejaculation to enable sperm transport [9]. With age, the prostate usually enlarges, a condition referred to as benign prostatic hyperplasia (BPH). The prostate is also the site where prostate cancer, one of the most prevalent cancers in men, arises [10].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe prostate gland is anatomically divided into zones: the peripheral zone (PZ), the central zone (CZ), and the transitional zone (TZ) [11]. Most prostate cancers originate in the peripheral zone, whereas benign prostatic hyperplasia (BPH), the enlargement of the gland that commonly occurs with age, arises in the transitional zone and may obstruct urinary flow [12].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eWith aging, the prostate gland tends to increase in size naturally due to hormonal changes. These hormonal changes may include variations in testosterone and its active metabolite, dihydrotestosterone (DHT) [13]. This age-related enlargement may also lead to lower urinary tract symptoms (LUTS) [14], which may include a weak urine stream, difficulty postponing urination, inability to hold urine, and frequent urination at night [15]. Due to these hormonal changes, the size of the prostate gland also increases. 
This increases the risk of malignant transformation of prostatic tissue, which results in prostate cancer.\u0026nbsp;\u003c/p\u003e\n\u003ch2 id=\"_Toc223443094\"\u003e1.1.1 Prostate Cancer:\u003c/h2\u003e\n\u003cp\u003eProstate cancer (as shown in \u003cstrong\u003e\u003cem\u003eFigure 1.2\u003c/em\u003e\u003c/strong\u003e) [16] is a malignant disease caused by the uncontrolled proliferation of abnormal cells in the prostate gland. It usually originates in the glandular cells of the prostate (adenocarcinoma). This type of cancer is common in men; it often develops slowly and remains localized in the prostate, where it may cause no adverse effects. However, some aggressive forms of prostate cancer can spread very quickly [17].\u003c/p\u003e\n\u003cp\u003eEarly diagnosis of prostate cancer, while it is still confined to the gland, is crucial for effective treatment [18]. At an early stage, several treatment alternatives can be considered to control the disease and improve patient outcomes. Timely screening and medical intervention can greatly influence the diagnosis and treatment management of patients with prostate cancer [19], [20].\u003c/p\u003e\n\u003cp\u003eSymptoms of prostate cancer include a frequent urge to urinate, difficulty initiating and maintaining urination, blood in the urine, painful urination, and, in some cases, painful ejaculation and difficulty attaining or maintaining an erection [21], [22].\u003c/p\u003e\n\u003cp\u003eProstate cancer can be treated in different ways depending on its severity. In the early stages, options include watchful waiting, prostatectomy (removal of the prostate), brachytherapy (the use of radioactive seeds), conformal radiation therapy, and intensity-modulated radiation therapy. 
In other instances, a combination of radiation therapy and hormone therapy may be recommended [20].\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eAt advanced stages, more aggressive treatment is required, including chemotherapy to target cancer cells throughout the body and androgen deprivation therapy to diminish the effect of the male hormones that promote cancer growth. Hormone therapy may be needed in the long term, and a clinical trial may be considered when other therapies are ineffective. Radical prostatectomy is not an option once the cancer has spread beyond the prostate. The choice of treatment must be made in consultation with urologists or oncologists, depending on the situation [20].\u003c/p\u003e\n\u003ch2 id=\"_Toc223443095\"\u003e1.2 Motivation:\u003c/h2\u003e\n\u003cp\u003eProstate cancer is a major health issue for men around the world: it is the second most common cancer among them and one of the leading causes of cancer-related deaths. Despite major advances in treatment methods and in screening and diagnostic tools, its incidence continues to rise globally due to factors such as aging, lifestyle, and the continued lack of optimal screening processes. Timely, early detection is one of the most important aspects of prostate cancer diagnosis. Slow-growing tumors are less harmful to the life of a patient, while aggressive tumors develop quickly and carry a risk of metastasis. Distinguishing between these disease states remains a major clinical problem, which highlights the importance of research to improve prostate cancer detection given the limitations of the available diagnostic tools.\u003c/p\u003e\n\u003cp\u003eTraditional approaches to prostate cancer screening, such as prostate-specific antigen (PSA) tests and digital rectal examination (DRE), are limited by a lack of specificity. 
Elevated PSA levels may also result from conditions other than cancer, such as benign prostatic hyperplasia (BPH) or infections, leading to increased healthcare costs, unnecessary biopsies, and patient distress. Another challenge is that these conventional approaches are invasive, subject to sampling errors, and may fail to detect clinically significant tumors. These issues highlight the urgent need for improved diagnostic methods that give clinicians a more precise, objective, and non-invasive way to evaluate the risk of prostate cancer.\u003c/p\u003e\n\u003cp\u003eRecently, multiparametric magnetic resonance imaging (mpMRI) has become an important method for visualizing prostate structure, localizing tumors, and assessing lesion aggressiveness. Nevertheless, the interpretation of mpMRI findings depends strongly on the skill and experience of the radiologist, so diagnostic accuracy often varies between institutions and between readers. The increased use of mpMRI has also added to the workload of radiologists, which creates a need for automated, efficient, and precise computer-assisted diagnostic systems.\u003c/p\u003e\n\u003cp\u003eRecent advances in artificial intelligence, especially in deep learning and transformer-based models, offer a promising way to address the challenges of prostate cancer detection. 
Compared to conventional machine learning models, deep learning models can automatically identify complex patterns in large volumes of imaging data without requiring manual feature engineering.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eSwin Transformers and Vision Transformers (ViT) have become particularly useful in medical image processing because they capture the long-range relationships and global contextual information that are important for detecting abnormalities in complex prostate MRI images.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eViT emphasizes global feature representation through full self-attention, which allows it to capture anatomical structures comprehensively, whereas Swin Transformer uses a hierarchical, window-based attention mechanism that effectively captures localized and multi-scale features. Although both are increasingly used in medical imaging, the comparative advantages and drawbacks of the two transformer architectures in prostate cancer detection have not been examined thoroughly and systematically.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThis research is motivated by the need for a clear picture of how different transformer-based attention mechanisms affect diagnostic performance in prostate cancer detection. Instead of proposing a hybrid architecture, this study focuses on a comparative framework in which Vision Transformer and Swin Transformer models are tested under the same experimental conditions. 
By comparing their accuracy, sensitivity, specificity, and robustness, the study aims to determine which architectural design is more appropriate for detecting clinically significant prostate cancer using MRI data.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe ultimate goal of this comparative analysis is to facilitate the creation of reliable AI-assisted diagnostic aids that help medical professionals identify prostate cancer at an early stage. This study aims to reduce needless biopsies, enhance diagnostic accuracy, and contribute to more informed and individualized treatment planning by offering evidence-based information on the efficacy of global versus hierarchical attention mechanisms. Advancing the understanding of transformer-based models in prostate cancer diagnostics and the development of AI-based technology in medical imaging are the main drivers of this work.\u003c/p\u003e\n\u003ch2 id=\"_Toc223443096\"\u003e1.3 Significance of Prostate Cancer Detection:\u003c/h2\u003e\n\u003col\u003e\n \u003cli\u003e\u003cstrong\u003eEarly diagnosis reduces cancer-related deaths:\u003c/strong\u003e Detecting prostate cancer early is essential because it increases the chances of survival and of successful treatment and reduces mortality from the disease [19], [23].\u0026nbsp;\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eEffective detection of clinically significant vs. 
insignificant cancer:\u003c/strong\u003e To minimize unnecessary treatments and risks, effective detection strategies must differentiate between clinically significant tumors that need urgent treatment and less aggressive cases that can be carefully observed [24].\u0026nbsp;\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eReduction in overtreatment:\u003c/strong\u003e Accurate diagnosis also helps reduce overtreatment in the form of unnecessary biopsies, surgeries, and radiation, which may cause complications such as urinary incontinence and erectile dysfunction [25].\u0026nbsp;\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eImproved Quality of Patient Life:\u003c/strong\u003e Timely and accurate diagnosis benefits patients by enabling less invasive, individualized treatment programs that preserve quality of life, and it empowers health service providers to make informed judgments about the course of treatment based on tumor features and risk evaluation [26].\u0026nbsp;\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eEnhanced Clinical decision making:\u003c/strong\u003e Advanced imaging technologies and AI-based detection systems enable targeted biopsies, which maximize diagnostic yield and minimize sampling errors [27], [28].\u0026nbsp;\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eReduced healthcare costs:\u003c/strong\u003e Early and precise detection reduces the financial burden associated with repeated biopsies, advanced-stage cancer treatments, and hospitalizations [29].\u0026nbsp;\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003ePublic Health Planning:\u003c/strong\u003e The availability and accuracy of detection methods are important to public health planning, i.e. 
implementing screening recommendations, early intervention programs, and resource distribution in health care systems [30].\u0026nbsp;\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003ePersonalized Medicine:\u003c/strong\u003e Precise identification supports the practice of personalized medicine, as it allows risk evaluation and customized treatment plans based on factors such as cancer grade, tumor size, and patient health status [31].\u0026nbsp;\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch2\u003e1.4 Aims and Objectives:\u003c/h2\u003e\n\u003cp\u003eThe main goal of this research is to detect clinically significant prostate cancer in MRI images by evaluating and comparing the performance of two state-of-the-art transformer-based deep learning models: Vision Transformer (ViT) and Swin Transformer.\u0026nbsp;\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eTo propose an AI-powered diagnostic framework for accurate and efficient prostate cancer detection that mitigates the drawbacks of existing traditional techniques such as the PSA test, DRE, and imaging analysis.\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eTo propose a preprocessing framework for MRI images that applies operations such as normalization, slice separation, and data augmentation.\u003c/li\u003e\n \u003cli\u003eTo compare the performance of Vision Transformer (ViT) and Swin Transformer models in identifying clinically significant prostate cancer in MRI scan data.\u003c/li\u003e\n \u003cli\u003eTo identify the most appropriate transformer-based architecture for prostate cancer detection and present evidence-based suggestions for further AI-aided diagnostic procedures.\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch2 id=\"_Toc223443098\"\u003e1.5 Research Questions:\u003c/h2\u003e\n\u003cp\u003eThe goals and motivations led to the following research questions:\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eWhat is the comparison between 
Vision Transformer (ViT) and Swin Transformer models regarding the accuracy of detecting clinically significant prostate cancer with MRI images?\u003c/li\u003e\n \u003cli\u003eWhich attention mechanism, global self-attention (ViT) or hierarchical window-based attention (Swin Transformer), is more appropriate for detecting clinically significant prostate cancer in MRI data?\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch2 id=\"_Toc223443099\"\u003e1.6 Contributions:\u003c/h2\u003e\n\u003cp\u003eOur contributions to prostate cancer detection can be summarized as follows:\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003eWe present a comprehensive and impartial comparative analysis of Vision Transformer (ViT) and Swin Transformer models for detecting prostate cancer from prostate MRI images. In contrast to previous studies that consider a single transformer model, this work compares the two architectures within the same experimental setup, giving a clear view of their relative strengths and weaknesses.\u003c/li\u003e\n \u003cli\u003eOur research demonstrates the usefulness of transformer-based self-attention for discriminative feature learning on prostate MRI scans. By examining global attention (ViT) and hierarchical window-based attention (Swin Transformer), it identifies how different attention strategies affect diagnostic performance in identifying clinically significant prostate cancer.\u003c/li\u003e\n \u003cli\u003eA comprehensive evaluation is performed using confusion matrix analysis, precision, recall, F1-score, ROC-AUC, and K-fold cross-validation. 
This evaluation provides strong support for the validity of the models, their ability to generalize, and their diagnostic stability across separate data splits, which is required for clinical application.\u003c/li\u003e\n\u003c/ol\u003e\n\u003ch2 id=\"_Toc223443100\"\u003e1.7 Stakeholders:\u003c/h2\u003e\n\u003cp\u003eThe proposed research on the detection of prostate cancer with Vision Transformer and Swin Transformer models involves the following stakeholders:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003e\u003cstrong\u003ePatients:\u003c/strong\u003e The major beneficiaries of this study, since accurate and early diagnosis of prostate cancer can greatly improve treatment outcomes, minimize unnecessary biopsies, and improve overall quality of life.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eRadiologists and Clinicians:\u003c/strong\u003e Can use AI-assisted diagnostic aids to support clinical decision-making, decrease inter-observer variability, and increase confidence when interpreting prostate MRI images.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eHealthcare Institutions and Hospitals:\u003c/strong\u003e Benefit from integrating automated prostate cancer detection systems, which improve diagnostic efficiency, reduce the workload on imaging departments, and improve patient management.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eMedical Researchers and Clinical Scientists:\u003c/strong\u003e Can use the findings to advance research in medical imaging, oncology, and artificial intelligence, and to help create more efficient diagnostic methods.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eArtificial Intelligence and Technology Developers:\u003c/strong\u003e Learn how to apply transformer-based architectures to medical imaging, which facilitates the creation of advanced AI models and 
clinical decision-support systems.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eRegulatory and Health Policy Bodies:\u003c/strong\u003e Can use the results of this study to inform guidelines, standardization, and assessment of AI-based diagnostic tools in clinical practice.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eMedical Imaging Equipment Manufacturers:\u003c/strong\u003e Can learn how AI models interact with MRI data, which may inform future imaging systems designed to support AI-assisted diagnostics.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eAcademic Institutions:\u003c/strong\u003e Can use the research results in education, interdisciplinary cooperation, and further research on AI-based solutions in healthcare.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eFunding Agencies and Research Sponsors:\u003c/strong\u003e Are interested in supporting innovative, impactful research with a solid clinical foundation and potential for real-world implementation.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eEngaging these stakeholders throughout the research lifecycle increases the clinical relevance, translational potential, and overall impact of the proposed prostate cancer detection framework on healthcare systems and patient care.\u003c/p\u003e\n\u003ch2 id=\"_Toc223443101\"\u003e1.8 Outline of the Thesis:\u003c/h2\u003e\n\u003cp\u003eIn line with the research questions, the thesis is structured into five chapters, each covering a particular aspect of the proposed research on prostate cancer detection with Vision Transformer and Swin Transformer models. 
The thesis is organized as follows:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003e\u003cstrong\u003eChapter One\u003c/strong\u003e provides the background and motivation of the research by presenting an overview of cancerous diseases, medical imaging, and the increasing role of artificial intelligence in healthcare. The chapter covers the male reproductive system, prostate cancer, and the clinical significance of early and accurate prostate cancer detection. It explains the shortcomings of traditional diagnostic methods and motivates the need for AI-based solutions. The research problem, objectives, questions, scope, and main contributions of the thesis are also defined in this chapter.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eChapter Two\u003c/strong\u003e is an extensive literature review of existing work on the detection and segmentation of prostate cancer using medical imaging and deep learning. It critically analyzes traditional machine learning techniques, convolutional neural network-based techniques, and new developments in transformer-based architectures. The chapter identifies gaps in existing studies, in particular the absence of a comparative analysis of Vision Transformer and Swin Transformer models, thus providing a solid background for the proposed research.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eChapter Three\u003c/strong\u003e describes the research methodology and experimental framework. The first part of the chapter discusses the theory of deep learning and transformer architectures, covering both Vision Transformer and Swin Transformer. It then outlines the dataset used, the data preprocessing pipeline, data augmentation, and the feature extraction process. 
The experimental design, model training strategy, and evaluation protocols are detailed in the chapter to ensure a fair and unbiased comparison of the two transformer models.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eChapter Four\u003c/strong\u003e concentrates on the implementation and experimental evaluation of the proposed models. It presents a quantitative performance analysis in terms of accuracy, precision, recall, F1-score, and ROC-AUC, together with a cross-validation analysis. The results of Vision Transformer and Swin Transformer are compared in detail, and confusion matrix and ROC curve analyses are presented. The chapter also interprets the results, discusses the strengths and limitations of the models, and analyzes their clinical relevance.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eChapter Five\u003c/strong\u003e concludes the thesis by summarizing the main findings and contributions of the research. It also reflects on the effectiveness of transformer-based models for detecting prostate cancer and the implications for practical use in clinical settings. Finally, the chapter identifies the limitations of the present study and suggests possible directions for future research, such as multimodal data integration and real-time clinical applications.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThis systematic organization ensures a logical progression of ideas from problem formulation through experimental validation to conclusion.\u003c/p\u003e"},{"header":"CHAPTER TWO: LITERATURE REVIEW","content":"\u003cp\u003eGlobally, prostate cancer is one of the most commonly diagnosed cancers in men and a leading cause of cancer-related mortality. Early and accurate detection of clinically significant prostate cancer is essential for timely intervention and improved patient outcomes. 
Deep learning and artificial intelligence (AI) have transformed medical image analysis and opened new opportunities for automated prostate cancer detection. Numerous studies have focused on developing AI-based systems that help radiologists screen and interpret multiparametric magnetic resonance imaging (mpMRI), which has become the gold standard in prostate cancer imaging. To understand the development and limitations of current technologies, it is essential to review the earlier studies that established the state of the art in the field.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e2.1. Classical CNN-based Approaches:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eConvolutional neural networks (CNNs) led early work on automated prostate cancer detection because of their ability to learn powerful features and identify spatial patterns. Abdelmaksoud et al. [32] proposed a three-dimensional (3D) AlexNet-based deep learning model to segment and classify prostate cancer in MRI volumes. Their experiments showed that 3D CNNs can capture the volumetric spatial relationships needed to determine lesion borders and tumor malignancy. Nevertheless, the study noted that architectural optimization can help avoid excessive computational demands. Deep learning has also been applied to histopathological analysis: Duran et al. [33] described a CNN-based system that classifies digitized whole-slide images at the patch level. Their patch scoring methodology considerably decreased computational cost without reducing diagnostic accuracy, demonstrating that deep learning scales to large medical workflows.\u003c/p\u003e\n\u003cp\u003eTo improve clinical utility and build confidence, interpretability became an important consideration in system design. Hassan et al. 
[34] proposed an explainable AI (XAI) system that combines ultrasound and MRI modalities, pairing deep learning predictions with visualization tools that provide explanations meaningful to the clinician. Their study highlights the importance of implementing AI in radiological practice in a way that supports clinical validation and transparency.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e2.2. Hybrid CNN and Segmentation-driven Models:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eSeveral studies have also proposed hybrid and multitask architectures that integrate segmentation and classification to improve the accuracy of tumor localization. Singla et al. [35] proposed a hybrid framework that combines U-Net and CNN to score and segment prostate lesions in MRI simultaneously. Although the model achieved high accuracy, its large memory footprint and runtime complexity restricted real-time application. Ilesanmi et al. [36] proposed an advanced 3D CNN method that uses preprocessing steps such as cropping and resampling to normalize MRI data. The model's sensitivity to tumor boundaries underscored the importance of preprocessing in enhancing CNN learning.\u003c/p\u003e\n\u003cp\u003eSemi-supervised methods became more popular as a way to cope with limited labeled data. Sammouda et al. [37] applied k-means clustering to delineate coarse lesion boundaries without annotations. Although their technique improved tumor visualization, it struggled with high-dimensional noise and was imprecise in complex anatomical regions.\u003c/p\u003e\n\u003cp\u003eMore advanced segmentation networks were also investigated. Jin et al. [38] adopted a 3D V-Net architecture with bicubic interpolation as a preprocessing step. 
Their study demonstrated better structural coherence of segmented prostate structures on benchmark datasets such as PROMISE-12. In addition, Jiangtao et al. [39] performed an in-depth comparison of different architectures, such as FCN, ResNet, and U-Net derivatives, on some of the most popular datasets (ProstateX, Decathlon, PROMISE-12). Their study confirmed progress in segmentation accuracy but raised concerns about reproducibility and the lack of standardized evaluation.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e2.3. Deep Learning in Histopathology and Feature Enrichment:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eGoing beyond MRI, Ayyad et al. demonstrated strong performance in histopathological diagnosis using deep neural networks, entropy-based texture analysis, and ensemble classification. These techniques accelerated diagnostic decision-making and reduced the subjectivity of manual interpretation. Similarly, Rundo et al. [40] incorporated squeeze-and-excitation (SE) blocks into U-Net, which improves contextual sensitivity and zonal classification even when acquisition protocols change. Hambarde et al. [17] used a deep U-Net to perform lesion classification, but long training and inference times limited its clinical scalability. Taking a radiomics perspective, Bleker et al. [41] developed an AI-based lesion volume definition model that incorporated extracted quantitative image features. 
Although this progress strengthened diagnostic accuracy, the authors acknowledged the model's limited generalization to unseen data.\u003c/p\u003e\n\u003cp\u003eThe limitations of CNN-based models for prostate cancer detection, such as the inability to capture global context and excessive computational demands, have driven the transition to transformer-based models in medical imaging. The Vision Transformer (ViT) applies self-attention to images represented as sequences of patches, allowing it to learn global features more effectively than CNNs. Nonetheless, ViT requires large annotated datasets, which is a barrier for prostate MRI because expert annotations are scarce. The Swin Transformer addresses these challenges by computing self-attention within shifted windows, which enables hierarchical feature learning, reduces computational complexity, and improves localized abnormality detection. Its spatial inductive bias improves generalization across datasets.\u003c/p\u003e\n\u003cp\u003eHowever, despite such improvements, the current literature does not provide a direct and systematic comparison between ViT and Swin Transformer with a specific focus on clinically significant prostate cancer detection using mpMRI. Previous research typically focuses on a single model and seldom addresses whether the model is computationally feasible, robust to imaging variability, and applicable in the real world. Therefore, a research gap remains regarding which transformer architecture offers the best diagnostic performance and the greatest usefulness in clinical practice.\u003c/p\u003e\n\u003cp\u003eTo fill this gap, the present study provides an objective and thorough assessment of ViT and Swin Transformer models under identical experimental conditions. 
The results are expected to guide the choice of the best transformer-based method for early and accurate detection of prostate cancer, which will ultimately improve clinical decision support and patient outcomes.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003e2.4. Summary\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe reviewed literature indicates that significant advances have been made in AI-based prostate cancer detection. Early research mostly involved convolutional neural networks, U-Net variants, hybrid deep learning models, and radiomics-based methods. Though these techniques have shown promising segmentation and classification results, they share weaknesses such as limited receptive fields, high computational complexity, long training durations, and a poor ability to capture the long-range contextual information contained in prostate MRI. Recent transformer-based architectures have shown strong promise in addressing these challenges through self-attention mechanisms that can model global and hierarchical relationships.\u003c/p\u003e\n\u003cp\u003eNonetheless, the available literature usually considers a single transformer model or treats transformers as part of hybrid systems, without a systematic and fair comparison of different transformer architectures. Specifically, no comparative analysis assesses Vision Transformer against Swin Transformer for the detection of clinically significant prostate cancer, contrasting the global attention mechanism of the former with the hierarchical window-based attention of the latter. 
This research gap inspires the proposed study that presents a single experimental framework to strictly contrast between Vision Transformer and Swin Transformer models under the same conditions.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe proposed framework of comparison will offer evidence-based information to ensure the selection of transformer structures to be used in prostate cancer detection systems that are reliable and clinically feasible by addressing the following issues: diagnostic performance, robustness and practical feasibility.\u003c/p\u003e\n\u003cp id=\"_Toc201745571\"\u003e\u003cstrong\u003eTable 2.\u003c/strong\u003e\u003cstrong\u003e1\u003c/strong\u003e\u003cstrong\u003e. Summary of Recent Studies on\u0026nbsp;\u003c/strong\u003e\u003cstrong\u003eProstate Cancer Detection using Deep Learning Techniques\u003c/strong\u003e\u003c/p\u003e\n\u003ctable\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eRef.\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eAuthor(s) \u0026amp; Year\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eData Modality / Dataset\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eMethodology / Model\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eTask\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eKey Findings\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eLimitations\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e[32]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eAbdelmaksoud et al., 2021\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003eProstate MRI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3D AlexNet, ResNet-50, Inception-V4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSegmentation \u0026amp; Classification\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3D CNNs effectively captured volumetric spatial features; AlexNet achieved good accuracy with lower complexity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLimited global contextual understanding; architecture-dependent performance\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e[33]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eDuran et al., 2020\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eHistopathology (WSI)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCNN with patch scoring\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eClassification\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eHigh accuracy with reduced computation; suitable for real-time analysis\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eFocused on histology, not MRI; limited anatomical context\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e[34]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eHassan et al., 2020\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMRI + Ultrasound\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCNN + Explainable AI (XAI)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eClassification\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eImproved diagnostic transparency and clinical trust\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eComputationally expensive; limited scalability\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e[35]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSingla et al., 2021\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003ePROMISE-12 (MRI)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eHybrid U-Net + CNN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSegmentation \u0026amp; Classification\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eHigh segmentation accuracy and anatomical precision\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eHigh memory usage; long training time\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e[36]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eIlesanmi et al., 2024\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eProstate MRI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3D CNN\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eClassification\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eStrong spatial feature learning for tumor localization\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003ePoor generalization across datasets\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e[37]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSammouda et al., 2021\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eProstate MRI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eK-means + Optimized Segmentation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSegmentation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eReduced need for labeled data; improved visualization\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003eSensitive to noise; weak performance in high-dimensional data\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e[38]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eJin et al., 2021\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003ePROMISE-12, TPHOH\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3D V-Net + Interpolation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSegmentation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eHigh segmentation accuracy; improved preprocessing effectiveness\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLimited classification capability\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e[39]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eJiangtao et al., 2025\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eMultiple MRI datasets\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eReview: FCN, U-Net, ResNet\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSegmentation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eDeep learning outperforms classical methods\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLack of standardized benchmarks\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e[40]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRundo et al., 2022\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eProstate MRI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eU-Net + SE Blocks\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSegmentation\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003eImproved cross-dataset generalization\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eIncreased architectural complexity\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e[42]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eHambarde et al., 2022\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eT2-weighted MRI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eDeep U-Net\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSegmentation \u0026amp; Classification\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eHigh accuracy in lesion detection\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLong training/testing time\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e[41]\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eBleker et al., 2022\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eProstate MRI\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRadiomics + DL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eDetection\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eImproved lesion localization\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eInefficient for real-time deployment\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e"},{"header":"CHAPTER THREE: METHODOLOGY","content":"\u003ch2\u003e3.1 Theoretical Background\u003c/h2\u003e\n\u003ch3\u003e3.1.1 Introduction to Deep Learning\u003c/h3\u003e\n\u003cp\u003eDeep learning has become a paradigm shift in medical image analysis that allows automated systems to learn high-level and complex representations directly out of raw imaging data. 
In prostate cancer diagnosis, deep learning has shown considerable promise in addressing the limitations of traditional diagnostic methods, which depend heavily on radiologist expertise and are subject to inter-observer variability. Given large medical imaging datasets and state-of-the-art neural network designs, deep learning models can learn to identify subtle tissue changes and morphological anomalies that are suggestive of malignancy in prostate tissue. CNNs have long dominated the analysis of prostate MRI because they perform well at extracting localized spatial features.\u003c/p\u003e\n\u003cp\u003eCNN-based models, however, are constrained by their fixed receptive fields, which makes them less effective at modeling long-range relationships and global contextual dependencies in complex anatomical structures. These dependencies are especially important when detecting prostate cancer, as lesions can be diffuse, heterogeneous, and context-dependent. Recent developments in deep learning have introduced transformer-based architectures, which overcome these limitations by using self-attention mechanisms that enable models to capture both local and global image context. In the proposed research, deep learning is the foundation for exploring and comparing Vision Transformer and Swin Transformer architectures for prostate cancer detection. These architectures represent major advances in deep learning design that integrate hierarchical representation learning with awareness of global context. By applying deep learning in this way, the proposed study is expected to enhance diagnostic accuracy, increase model generalizability, and offer meaningful clinical information. 
Finally, deep learning not only enables automated detection of prostate cancer but also contributes to the creation of consistent decision-support systems that can help clinicians detect prostate cancer early and develop patient-specific treatment plans.\u003c/p\u003e\n\u003ch3 id=\"_Toc223443106\"\u003e3.1.2 Vision Transformers:\u003c/h3\u003e\n\u003cp\u003eVision Transformers (ViTs) have emerged as a highly competitive alternative to convolutional neural networks (CNNs) for visual understanding tasks by applying the transformer architecture, originally developed for natural language, to image data. In contrast to CNNs, which use convolution operations with local receptive fields, ViT models use self-attention to capture long-range dependencies and global contextual information across the whole image. This makes Vision Transformers especially well suited to complex medical imaging tasks such as prostate cancer detection, where anatomy, tissue heterogeneity, and fine lesion patterns are vital to an accurate diagnosis.\u003c/p\u003e\n\u003cp\u003eThe ViT architecture starts with image tokenization, in which an input image is split into a sequence of fixed-size, non-overlapping patches, typically 16\u0026times;16 pixels. Each patch is flattened into a one-dimensional vector and passed through a linear projection layer to produce patch embeddings of uniform dimensionality. Because transformer models do not inherently encode spatial relations, learnable positional embeddings are added to each patch embedding to preserve the spatial structure of the image. 
In addition, a learnable classification token ([CLS]) is prepended to the sequence; it serves as a global representation of the image and is used for the final classification.\u003c/p\u003e\n\u003cp\u003eThe resulting sequence of embedded patches is then fed into a stack of transformer encoder layers, the main part of the ViT architecture. Each encoder layer has two main elements:\u003c/p\u003e\n\u003col\u003e\n \u003cli\u003ea multi-head self-attention (MHSA) mechanism, and\u003c/li\u003e\n \u003cli\u003ea feed-forward neural network (FFN)\u003c/li\u003e\n\u003c/ol\u003e\n\u003cp\u003eThe \u003cstrong\u003eself-attention mechanism\u003c/strong\u003e enables each patch of an image to attend to all other patches, so the model learns relationships between distant image regions. This is an important difference from CNNs, which need to stack many layers to approximate global context. In prostate MRI analysis, it allows ViT models to relate features that are spatially separated yet clinically related, e.g. gland boundaries and lesion locations.\u003c/p\u003e\n\u003cp\u003eThe multi-head attention mechanism further increases representational power, as the model can attend to information from several representation subspaces at the same time. Different attention heads learn complementary spatial and semantic views of the relationships between patches. The outputs of the attention heads are concatenated, linearly projected, and passed through a feed-forward network to generate a refined feature representation. The architecture uses layer normalization and residual connections to stabilize training and improve convergence.\u003c/p\u003e\n\u003cp\u003eAt the final layer of the network, the output of the [CLS] token is extracted and sent to a classification head, which is generally a fully connected layer. 
This token aggregates the global information gathered from all patches and forms a compact, discriminative feature that is used to classify prostate cancer. With transfer learning, pretrained ViT architectures, e.g., ViT-Base-Patch16-224, can be fine-tuned on medical images even when only a small number of labeled images is available. One of the main strengths of Vision Transformers is that they can model global anatomical context, which is crucial in medical imaging. Lesions in prostate cancer cannot be described by local texture variation alone but must also be understood in relation to the adjacent anatomical structures. ViT models are well suited to capturing these holistic patterns, which supports sensitivity to clinically significant prostate cancer.\u003c/p\u003e\n\u003cp\u003eNonetheless, ViT architectures have drawbacks such as high computational cost and a weaker inductive bias toward local spatial features. Therefore, they may need large volumes of data or pretrained initialization to perform optimally. Despite these restrictions, Vision Transformers have performed well in a wide range of medical imaging tasks such as disease classification, organ segmentation, and lesion detection. Their ability to provide high-quality global features makes them well suited to AI-driven prostate cancer diagnosis and a useful benchmark for comparison with hierarchical transformer models like the Swin Transformer. Vision Transformers are used in this study to investigate their usefulness in capturing global contextual characteristics of prostate MRI images. 
Their performance is systematically compared with that of Swin Transformers to determine the effect of different attention strategies on the diagnostic accuracy, robustness, and clinical applicability of prostate cancer detection.\u003c/p\u003e\n\u003ch3 id=\"_Toc223443107\"\u003e3.1.3 Swin Transformers:\u003c/h3\u003e\n\u003cp\u003eThe Swin Transformer (Shifted Window Transformer) is a hierarchical vision transformer designed to overcome several shortcomings of standard Vision Transformers, such as high computational complexity and the absence of an inductive bias toward local spatial features. Introduced as an efficient substitute for global self-attention, the Swin Transformer uses window-based self-attention with a shifting mechanism that allows both local and global modeling of features. The design is particularly applicable to high-resolution medical imaging tasks such as prostate cancer detection, where both fine local details and larger anatomical features are needed.\u003c/p\u003e\n\u003cp\u003eIn contrast to Vision Transformers, which perform self-attention across all patches of the image at once, the Swin Transformer divides the input image into non-overlapping local windows and computes self-attention within each window separately. For a fixed window size, the cost of this window-based self-attention grows linearly with image size rather than quadratically. To exchange information across windows, the Swin Transformer shifts the window partition between consecutive transformer layers. This shifting strategy makes the windows of one layer straddle the window boundaries of the previous layer, which facilitates global contextual learning without losing efficiency.\u003c/p\u003e\n\u003cp\u003eLike convolutional neural networks, the Swin Transformer architecture is hierarchical. The model has several stages, each working at a lower spatial resolution but a higher feature dimensionality. 
Patch merging layers at each stage reduce the feature map dimensions by combining the features of neighboring patches, thereby building multi-scale representations. Such hierarchical feature extraction is of great benefit in the analysis of prostate MRI, as lesions may differ in size, shape, and location within the prostate gland.\u003c/p\u003e\n\u003cp\u003eEach Swin Transformer block has two fundamental components: either Window-based Multi-Head Self-Attention (W-MSA) or Shifted Window Multi-Head Self-Attention (SW-MSA), and a feed-forward neural network (FFN). Layer normalization and residual connections are used to stabilize training and aid gradient flow. By alternating between W-MSA and SW-MSA in consecutive blocks, the Swin Transformer efficiently attends to both local and long-range dependencies without excessive computational cost.\u003c/p\u003e\n\u003cp\u003eIn prostate cancer detection, the Swin Transformer can be used to detect localized abnormalities, including small or low-contrast lesions that global attention models may miss. Its inductive bias toward spatial locality enhances its robustness to noise and inter-scanner variation, which are typical of multi-center prostate MRI datasets. Additionally, the hierarchical feature maps generated by the Swin Transformer fit the anatomy of the prostate well and therefore help differentiate between benign and cancerous tissue. Despite these advantages, the Swin Transformer can be less sensitive to global contextual information than Vision Transformers, especially when there is limited information. Nonetheless, its execution speed and its ability to model high-resolution images and local features make it attractive for clinical use. 
In this study, the Swin Transformer is systematically compared with the Vision Transformer to examine the impact of different attention strategies on diagnostic accuracy, robustness, and clinical feasibility in prostate cancer detection.\u003c/p\u003e\n\u003ch2 id=\"_Toc223443108\"\u003e3.2 Materials:\u003c/h2\u003e\n\u003ch3 id=\"_Toc223443109\"\u003e3.2.1 Dataset:\u003c/h3\u003e\n\u003cp\u003eThe data used in this study come from the Prostate Imaging: Cancer AI (PI-CAI) challenge, one of the largest and most comprehensive publicly available resources for detecting clinically significant prostate cancer (csPCa) with MRI. The data support the development and testing of AI algorithms that can detect csPCa accurately and reliably in diverse clinical settings and thereby aid in developing improved diagnostics for prostate cancer.\u003c/p\u003e\n\u003cp\u003eThe PI-CAI dataset consists of multiparametric MRI (mpMRI) scans acquired at several international medical centers, covering a diverse range of MRI scanners, imaging protocols, and image characteristics. This variety reflects real clinical heterogeneity and enhances the generalizability of AI-based models trained on the data. 
Each patient scan includes the three sequences most commonly used in assessing prostate cancer:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003e\u003cstrong\u003eT2-weighted (T2W)\u003c/strong\u003e imaging, which depicts the structure of the prostate gland and its surrounding anatomy in detail.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eDiffusion-weighted imaging (DWI)\u003c/strong\u003e, which highlights restricted diffusion of water, frequently associated with malignant lesions.\u003c/li\u003e\n \u003cli\u003e\u003cstrong\u003eApparent Diffusion Coefficient (ADC)\u003c/strong\u003e maps, which provide quantitative information about tissue cellularity and tumor malignancy.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eA notable aspect of the dataset is that its ground truth labels are annotated by expert radiologists and verified using consensus protocols.\u003c/p\u003e\n\u003cp\u003eThese annotations define the presence, location, and size of clinically significant prostate cancer, making the dataset suitable for both classification and segmentation. Because it includes biopsy-verified patients, the dataset offers strong clinical reliability for supervised learning.\u003c/p\u003e\n\u003cp\u003eBesides imaging data, the dataset includes metadata such as patient and study identifiers, lesion information, prostate zone, and other clinically relevant descriptors. This metadata is essential for organizing cases, linking MRI images to ground truth labels, and implementing patient-level classification. In short, the PI-CAI dataset provides a robust and clinically realistic foundation for evaluating transformer-based architectures in prostate cancer diagnosis. 
With its scale, variety, and radiologist-verified labels, it is well suited to comparing advanced deep learning models under realistic diagnostic conditions.\u003c/p\u003e\n\u003ch2 id=\"_Toc223443110\"\u003e3.3 End-to-End Proposed Framework for Prostate Cancer Detection\u003c/h2\u003e\n\u003ch3 id=\"_Toc223443111\"\u003e3.3.1 Data Augmentation:\u003c/h3\u003e\n\u003cp\u003eTo improve the robustness, adaptability, and classification performance of transformer-based deep learning models in detecting prostate cancer, a medical image augmentation strategy was applied. This is important for addressing the class imbalance inherent in datasets of clinically significant prostate cancer, where lesions of higher ISUP grade can be underrepresented. The augmentation pipeline was driven by the distribution of ISUP grades: augmentation frequencies were adjusted according to class frequency so that minority classes (such as grades 4 and 5) received a larger share of synthetic samples, which mitigates class imbalance during training.\u003c/p\u003e\n\u003cp\u003eThe augmentation methods were selected to replicate the clinical variations commonly encountered in prostate MRI scans. These included rotations of up to 45 degrees, scaling and shifting, and affine transformations with horizontal and vertical shear. Up-down and left-right flips were applied to imitate anatomical variability. Random increases and decreases in brightness introduced intensity variation while contrast was held constant, which increased visual diversity without affecting important diagnostic characteristics. 
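\u003c/p\u003e\n\u003cp\u003eThe class-dependent augmentation described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the balancing rule (topping each class up toward the size of the largest class), the function names, and the example labels are all assumptions. Only flips are shown, applied to each 2D slice of a volume stored as nested lists.\u003c/p\u003e

```python
from collections import Counter


def copies_per_sample(labels):
    """Extra augmented copies to create per original sample, keyed by class.

    ASSUMPTION: every class is topped up toward the size of the largest
    class; the source does not state the exact rule it used.
    """
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: (target - n) // n for cls, n in counts.items()}


def flip_left_right(slice_2d):
    """Mirror a 2D slice (list of rows) horizontally."""
    return [list(reversed(row)) for row in slice_2d]


def flip_up_down(slice_2d):
    """Mirror a 2D slice (list of rows) vertically."""
    return [list(row) for row in reversed(slice_2d)]


def augment_volume(volume, transform):
    """Apply one 2D transform to every slice of a 3D volume."""
    return [transform(slice_2d) for slice_2d in volume]


# Hypothetical ISUP grade labels for twelve cases.
isup_labels = [0] * 6 + [2] * 3 + [4] * 2 + [5] * 1
plan = copies_per_sample(isup_labels)

# Tiny hypothetical volume: one slice of 2 x 2 pixels.
volume = [[[1, 2], [3, 4]]]
flipped_lr = augment_volume(volume, flip_left_right)
flipped_ud = augment_volume(volume, flip_up_down)
```

\u003cp\u003eIn this toy example the majority class receives no extra copies, while the rarest grade receives the most, which is the qualitative behavior the text describes.\u003c/p\u003e\n\u003cp\u003e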
All augmentations were applied slice by slice to the 3D MRI volumes to preserve spatial coherence between prostate structures.\u003c/p\u003e\n\u003cp\u003eNegative pixel values, which may arise from preprocessing or from scanner-specific calibration, were clipped so that the transformed images retained valid intensity profiles. After augmentation, the processed slices were reassembled into their original 3D volumes and stored as separate augmented cases for seamless integration into the model pipeline.\u003c/p\u003e\n\u003cp\u003eIn summary, this augmentation framework increases the diversity of the dataset, addresses class imbalance, and strengthens the ability of the model to detect prostate lesions under different imaging conditions. By exposing the Vision Transformer and Swin Transformer architectures to a wider range of clinically plausible variations, the augmented dataset is intended to improve diagnostic accuracy and reduce the risk of overfitting.\u003c/p\u003e\n\u003ch2 id=\"_Toc223443112\"\u003e3.3.2 Feature Extraction:\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003eA. \u0026nbsp;Feature Extraction using ViT:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eFeature extraction is one of the most essential phases of building an AI-based diagnostic system, because the quality of the learned image representations determines how accurately malignancy in the prostate can be detected. In this work, transformer-based features were obtained with the Vision Transformer model (ViT-Base-Patch16-224), a deep learning model that has performed strongly across numerous computer vision tasks. 
ViT is one of the first and most influential applications of the transformer framework to visual learning and marks a paradigm shift from convolutional feature extraction to self-attention.\u003c/p\u003e\n\u003cp\u003eThe ViT-Base-Patch16-224 model operates on fixed-size patches rather than pixel-local receptive fields. Each input MRI image is resized to 224 \u0026times; 224 pixels and divided into 16 \u0026times; 16 pixel patches, yielding 196 patches in total. The patches are flattened and embedded into a 768-dimensional space through a learnable linear layer. Positional encodings are then added to these patch embeddings to retain the spatial context that is otherwise lost when the patches are flattened into a sequence.\u003c/p\u003e\n\u003cp\u003eAnother distinctive aspect of the ViT architecture is the [CLS] token, a learnable vector prepended to the patch sequence. The sequence passes through several transformer encoder layers with global multi-head self-attention, in which every patch attends to all other patches in the image. This gives ViT a greater capacity to capture anatomical relationships of the prostate holistically, such as diffuse lesion margins, shape variations, or microstructural tissue changes that conventional CNNs with narrow receptive fields may overlook.\u003c/p\u003e\n\u003cp\u003eThe final encoding of the [CLS] token is a compact but comprehensive feature descriptor that captures the discriminative information of the whole image. These features (size: 1 \u0026times; 768) were flattened and stored together with the respective MRI image paths in a CSV file for traceability during dataset management. 
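\u003c/p\u003e\n\u003cp\u003eThe patch arithmetic above can be checked with a few lines of code. The sketch below is illustrative only and uses hypothetical function names; it assumes three input channels, which the text does not state for the ViT input, and it shows only the patch split, not the learned embedding.\u003c/p\u003e

```python
def vit_token_layout(image_size=224, patch_size=16, channels=3):
    """Token counts for a ViT-style patch split of a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    num_patches = per_side * per_side
    return {
        "patches_per_side": per_side,
        "num_patches": num_patches,
        # Length of one flattened patch before the linear projection.
        "flattened_patch_length": patch_size * patch_size * channels,
        # One extra position for the [CLS] token.
        "sequence_length": num_patches + 1,
    }


def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ])
    return patches


layout = vit_token_layout()

# Toy 4 x 4 image split into 2 x 2 patches, in row-major patch order.
toy_image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
toy_patches = split_into_patches(toy_image, patch_size=2)
```

\u003cp\u003eWith the default arguments the layout reproduces the 196 patches stated above, plus one position for the [CLS] token.\u003c/p\u003e\n\u003cp\u003e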
To scale to large MRI datasets, processing is performed in memory-efficient mini-batches, and unneeded variables are released after each step through garbage collection to avoid exhausting GPU memory.\u003c/p\u003e\n\u003cp\u003eUsing the pretrained ViT-Base-Patch16-224 enables transfer learning: because the model was pretrained on millions of ImageNet-21k samples, it relies less on large manually labeled prostate MRI datasets, which are expensive and time-consuming to produce. In addition, ViT may help identify small prostate lesions, including low-contrast clinically significant lesions or early-stage carcinoma.\u003c/p\u003e\n\u003cp\u003eTo conclude, ViT-Base-Patch16-224 serves as a feature extraction backbone that encodes the global contextual information of MRI slices. Its feature representations support the downstream classification module and are intended to improve prostate cancer detection accuracy relative to classical CNN-based methods. This transformer-driven feature extraction strategy forms the basis of the proposed prostate cancer diagnostic framework.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eB. \u0026nbsp;Feature Extraction using SwT:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eA Swin Transformer model was also used to extract multi-scale features of prostate tissue from MRI slices, complementing the feature extraction by the Vision Transformer. The Swin Transformer is a hierarchical vision transformer architecture that differs substantially from the standard ViT architecture in that it combines local and global feature learning efficiently. These properties make it well suited to identifying clinically subtle prostate cancer lesions in MRI scans.\u003c/p\u003e\n\u003cp\u003eThe feature extraction pipeline starts by loading each three-dimensional MRI scan with the SimpleITK reader. 
The volumetric slices are preprocessed to be compatible with the Swin Transformer input format. The image channels are arranged into (H \u0026times; W \u0026times; C) format, with three channels retained to mimic RGB input and any additional dimensions removed. A normalization step maps pixel intensity values to the range 0 to 255 so that brightness is consistent across scans and scanners.\u003c/p\u003e\n\u003cp\u003eThe Swin Transformer feature extractor internally resizes the MRI slices and partitions them into patches, then applies shifted-window self-attention, a key mechanism that improves computational efficiency and lets the model learn local abnormalities while preserving contextual continuity. By computing attention in windows that shift between layers, the Swin Transformer generates hierarchical feature maps, which is especially useful for detecting small cancer lesions located deep in the prostate tissue.\u003c/p\u003e\n\u003cp\u003eDuring feature extraction, the trained Swin Transformer produces a contextual representation of each MRI slice from its final attention block. The first element of the last hidden representation, analogous to the [CLS] token of ViT, is taken as a feature representation of the overall semantic structure of the image. This high-dimensional vector holds the learned discriminative patterns that can differentiate malignant from non-malignant tissue.\u003c/p\u003e\n\u003cp\u003eLarge-scale processing is supported by batch-wise extraction with dynamic memory management via Python garbage collection, so that the GPU is not overloaded. 
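\u003c/p\u003e\n\u003cp\u003eThe batch-wise extraction and CSV storage used in both feature pipelines can be sketched as below. This is a hedged outline, not the code used in the study: the extractor is a stand-in function rather than a transformer backbone, and the function names, file paths, and column names are hypothetical.\u003c/p\u003e

```python
import csv
import gc
import os
import tempfile


def batched(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def write_features_csv(paths, extract_fn, csv_path, batch_size=8):
    """Extract features batch by batch and store one row per image path."""
    rows_written = 0
    header_written = False
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for batch in batched(paths, batch_size):
            features = extract_fn(batch)
            for path, vector in zip(batch, features):
                if not header_written:
                    columns = [f"f{i}" for i in range(len(vector))]
                    writer.writerow(["path"] + columns)
                    header_written = True
                writer.writerow([path] + list(vector))
                rows_written += 1
            # Drop the batch before the next one is loaded.
            del features
            gc.collect()
    return rows_written


def dummy_extractor(batch):
    """Stand-in for a backbone: returns a 4-value vector per path."""
    return [[float(len(path)), 0.0, 1.0, 2.0] for path in batch]


# Hypothetical slice paths, written to a temporary CSV file.
slice_paths = [f"case_{i:03d}/slice_{i}.png" for i in range(5)]
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "features.csv")
    rows_written = write_features_csv(
        slice_paths, dummy_extractor, out_path, batch_size=2
    )
    with open(out_path, newline="") as handle:
        saved_rows = list(csv.reader(handle))
```

\u003cp\u003eNote that Python garbage collection releases host-side objects; releasing cached GPU memory depends on the deep learning framework in use and is not shown here.\u003c/p\u003e\n\u003cp\u003e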
The feature vectors are flattened, and each is stored in a CSV file along with the corresponding MRI slice path, forming a structured dataset for subsequent classification.\u003c/p\u003e\n\u003cp\u003eIn conclusion, feature extraction with the Swin Transformer lets the model use both local region attention and global anatomical context, which improves its reliability in detecting prostate cancer across a wide range of MRI cases. These robust features improve classification performance and support the use of an automated, clinically applicable detection system.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eC. \u0026nbsp;Ensemble Learning using Vision Transformer and Swin Transformer:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eEnsemble learning was considered in order to exploit the complementary strengths of the Vision Transformer (ViT) and the Swin Transformer for detecting prostate cancer on MRI images. Transformer-based architectures differ in how they capture image representations: the Vision Transformer captures global contextual relationships across the whole image, whereas the Swin Transformer builds hierarchical, local feature representations through window-based attention. Combining these architectures in an ensemble framework lets the model benefit from both global and local feature learning, which enhances diagnostic robustness.\u003c/p\u003e\n\u003cp\u003eIn this experiment, the two transformer models were trained separately to classify prostate tissue in MRI scans. The Vision Transformer treats the image as a sequence of patches and uses global self-attention to capture long-range dependencies, which is useful for identifying diffuse tumor patterns and the general anatomical context. 
Conversely, the Swin Transformer applies self-attention within shifted windows and constructs multi-level feature maps that help detect localized lesions and structural abnormalities.\u003c/p\u003e\n\u003cp\u003eThe ensemble learning method combines the outputs of both models to obtain a final decision. A probability-based aggregation strategy was used, in which the prediction scores of ViT and the Swin Transformer are combined to reduce model uncertainty and increase classification stability. This strategy compensates for the shortcomings of each individual model, especially in difficult cases with subtle lesions or heterogeneous image appearance.\u003c/p\u003e\n\u003cp\u003eThe motivation for ensemble learning is the complementary behavior of the two transformer architectures observed in case-level analysis. Cases missed by one model were often classified correctly and with high probability by the other, indicating that combining them can reduce false negatives and improve clinical reliability. This is especially relevant in prostate cancer detection, where a missed diagnosis can have serious clinical consequences. Overall, ensemble learning with the Vision Transformer and the Swin Transformer offers a diagnostic framework that balances global contextual understanding with hierarchical local feature modeling. 
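\u003c/p\u003e\n\u003cp\u003eOne simple form of probability-based aggregation is a weighted average of the two models' predicted cancer probabilities. The sketch below assumes that form; the text does not specify the exact combination rule, so the equal default weights, the 0.5 decision threshold, the function names, and the example scores are all assumptions for illustration.\u003c/p\u003e

```python
def ensemble_probabilities(vit_probs, swin_probs, vit_weight=0.5):
    """Weighted average of per-case cancer probabilities from two models."""
    if len(vit_probs) != len(swin_probs):
        raise ValueError("both models must score the same cases")
    if not 0.0 <= vit_weight <= 1.0:
        raise ValueError("vit_weight must lie in [0, 1]")
    swin_weight = 1.0 - vit_weight
    return [
        vit_weight * v + swin_weight * s
        for v, s in zip(vit_probs, swin_probs)
    ]


def classify(probabilities, threshold=0.5):
    """Label a case as cancer (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical per-case probabilities from the two models.
vit_scores = [0.75, 0.25, 0.5]
swin_scores = [0.25, 0.25, 1.0]

combined = ensemble_probabilities(vit_scores, swin_scores)
labels = classify(combined)
```

\u003cp\u003eIn the third hypothetical case the ViT score sits exactly at the threshold while the Swin score is high, and the averaged score is clearly positive, which mirrors the complementary behavior described above.\u003c/p\u003e\n\u003cp\u003e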
Although the main research topic is comparative analysis, the ensemble perspective shows the potential of combining transformer architectures to achieve better performance, stability, and clinical utility in AI-assisted prostate cancer detection systems.\u003c/p\u003e\n\u003ch2 id=\"_Toc223443113\"\u003e3.4 Summary:\u003c/h2\u003e\n\u003cp\u003eChapter 3 presented the overall methodological framework of the study, which uses transformer-based deep learning models to identify prostate cancer in medical imaging data. The chapter began with the theoretical principles of deep learning and transformer architectures, their applicability, and their benefits in medical image analysis. The primary focus was on the Vision Transformer (ViT) and Swin Transformer architectures: their design principles, attention mechanisms, and suitability for capturing both global and local structural information in prostate MRI scans.\u003c/p\u003e\n\u003cp\u003eThe chapter then described the dataset, the data preprocessing methods, and the data augmentation techniques adopted to improve data quality and model generalization. Image normalization, resizing, slice selection, and medically relevant augmentation techniques were discussed as means of ensuring consistency across samples and robustness to imaging variability. The ViT and Swin Transformer feature extraction pipelines were described, covering how high-level discriminative features are obtained from MRI images with pretrained transformer backbones.\u003c/p\u003e\n\u003cp\u003eMoreover, Chapter 3 described the experimental design intended to give a fair and unbiased comparison between the two transformer architectures. This included consistent training strategies, hyperparameter settings, loss functions, and evaluation metrics. 
The rationale for using transfer learning, batch processing, and early stopping to address limited data and overfitting was also explained. Overall, the chapter established the methodological basis of the research by specifying the data processing pipeline, model architectures, and experimental protocols. This description supports reproducibility and sets the stage for the implementation and experimental findings in the next chapter.\u003c/p\u003e"},{"header":"CHAPTER FOUR: EXPERIMENTAL RESULTS","content":"\u003ch2\u003e4.1 Implementation Details:\u003c/h2\u003e\n\u003ch2 id=\"_Toc223443116\"\u003e4.1.1 Network Architecture:\u003c/h2\u003e\n\u003cp\u003eThe network architecture employed in this research is designed to evaluate and compare two transformer-based deep learning models, Vision Transformer (ViT) and Swin Transformer, for prostate cancer detection using MRI data. Both architectures are implemented independently under a unified experimental framework to ensure a fair and unbiased comparison. The overall pipeline follows a standardized flow consisting of image preprocessing, transformer-based feature extraction, and classification.\u003c/p\u003e\n\u003cp\u003eFor the Vision Transformer architecture, the input prostate MRI images are first resized to a fixed resolution of 224\u0026times;224 pixels and normalized to ensure consistent intensity distribution. The images are then partitioned into non-overlapping patches of size 16\u0026times;16. Each patch is flattened and linearly projected into a fixed-dimensional embedding space. Positional embeddings are added to preserve spatial information, and a learnable classification token is prepended to represent the global image context. The resulting sequence of embeddings is processed through a stack of transformer encoder layers comprising multi-head self-attention and feed-forward neural networks. 
The output corresponding to the classification token is used as a high-level feature representation and passed to a fully connected layer for binary classification of prostate cancer.\u003c/p\u003e\n\u003cp\u003eIn contrast, the Swin Transformer architecture adopts a hierarchical and window-based attention mechanism. The input images undergo patch partitioning followed by patch embedding like ViT; however, self-attention is computed within local non-overlapping windows rather than globally. Successive Swin Transformer blocks alternate between window-based multi-head self-attention (W-MSA) and shifted window multi-head self-attention (SW-MSA), enabling cross-window information exchange. Patch merging layers are applied between stages to progressively reduce spatial resolution while increasing feature dimensionality, allowing the model to learn multi-scale representations. The final hierarchical features are aggregated and fed into a classification head for prostate cancer prediction.\u003c/p\u003e\n\u003cp\u003eBoth architectures employ transfer learning using pretrained weights to enhance convergence and generalization. Identical training strategies, hyperparameters, and evaluation metrics are maintained across models. 
This architectural design enables a systematic comparison of global attention-based and hierarchical transformer-based representations, providing valuable insights into their effectiveness for prostate cancer detection in clinical MRI data.\u003c/p\u003e\n\u003cp id=\"_Toc218699363\"\u003e\u003cstrong\u003eTable 4.\u003c/strong\u003e\u003cstrong\u003e1\u003c/strong\u003e\u003cstrong\u003e.\u0026nbsp;\u003c/strong\u003eComparative analysis of Vision Transformer (ViT) and Swin Transformer architectures highlighting key architectural components, attention mechanisms, and their relevance for prostate cancer detection.\u003c/p\u003e\n\u003ctable\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eComponent\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eVision Transformer (ViT)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eSwin Transformer\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eInput Image Size\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e(224 x 224)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e(224 x 224)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePatch Size\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e(16 x 16)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e(4 x 4) (initial patch partition)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePatch Embedding\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLinear projection of flattened patches\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLinear embedding with hierarchical patch 
merging\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePositional Encoding\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLearnable positional embeddings\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eRelative position bias within each attention window\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eAttention Mechanism\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eGlobal multi-head self-attention\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eWindow-based and shifted window self-attention\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eAttention Scope\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eEntire image (global context)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLocal windows with cross-window interaction\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eHierarchical Representation\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNo (single-scale representation)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eYes (multi-scale hierarchical features)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eFeature Extraction\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eGlobal contextual features\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLocal and multi-scale spatial features\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eClassification Token\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003e[CLS] token used\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNo explicit [CLS]; pooled hierarchical features\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eComputational Complexity\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eHigher for high-resolution images\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLower due to window-based attention\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eStrengths\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eStrong global context modeling\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eEfficient, scalable, and robust local modeling\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eApplication Focus\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eGlobal lesion context analysis\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eLocal lesion and boundary detection\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eOutput Layer\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eFully connected classification head\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eFully connected classification head\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003ch2 id=\"_Toc223443117\"\u003e4.1.2 Training Details:\u003c/h2\u003e\n\u003cp\u003eThe training procedure for the proposed prostate cancer detection models was designed to support robust learning, a fair comparison, and a reliable assessment of performance. 
The Vision Transformer (ViT) and Swin Transformer architectures were trained under the same experimental conditions to avoid bias in the comparative analysis. Before training, all prostate MRI images were resized to a common size of 224 \u0026times; 224 pixels and normalized to equalize the intensity values, which helps keep gradient updates stable during optimization.\u003c/p\u003e\n\u003cp\u003eData augmentation consisting of rotation, flipping, scaling, and intensity variation was applied during training to increase data diversity and improve model generalization. For transfer learning, the two transformer models were initialized with pretrained weights obtained from large-scale natural image datasets.\u003c/p\u003e\n\u003cp\u003eThis approach reduces training time and overfitting, especially when annotated medical imaging data are limited. The final classification layers were optimized during training, and the transformer backbone was gradually fine-tuned to fit prostate-specific imaging characteristics. The objective function was binary cross-entropy loss, which suits the binary classification task of cancerous versus non-cancerous cases. The Adam optimizer was used with a carefully chosen learning rate. Mini-batch training was used, with a fixed batch size chosen to balance computational cost and memory overhead.\u003c/p\u003e\n\u003cp\u003eTraining ran for several epochs, and early stopping based on validation loss was used to avoid overfitting and to save checkpoints of the best-performing models. Accuracy and the area under the receiver operating characteristic (ROC) curve (AUC) were the main performance indicators monitored during training. 
Altogether, the training strategy supported consistent convergence, strong discriminative ability, and reasonable generalization for both transformer architectures, providing a sound basis for comparing their performance in detecting prostate cancer.\u003c/p\u003e\n\u003ch2 id=\"_Toc223443118\"\u003e4.2 Evaluation Metrics:\u003c/h2\u003e\n\u003cp\u003eA comprehensive set of evaluation metrics was used to examine the predictive power, stability, and generalization ability of the proposed transformer-based prostate cancer detection models. Both the Vision Transformer (ViT) and Swin Transformer models were evaluated with commonly used classification metrics: accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC) [43], [44], [45]. Together, these measures give a complete evaluation of model behavior in both statistical and clinical terms.\u003c/p\u003e\n\u003cp\u003eIn the confusion matrix, TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. Accuracy is the ratio of correctly classified MRI samples to the total number of samples, (TP + TN) / (TP + TN + FP + FN), and gives an overall measure of the model. Precision measures the reliability of positive predictions: the number of correctly identified cancer cases divided by the number of cases predicted to be cancer, TP / (TP + FP). Recall, also known as sensitivity, is the ability of the model to identify actual cases of prostate cancer, TP / (TP + FN), which is crucial for reducing the number of missed diagnoses.\u003c/p\u003e\n\u003cp\u003eThe F1-score was emphasized as an important performance metric because of the class imbalance in prostate cancer datasets. 
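\u003c/p\u003e\n\u003cp\u003eAs a worked illustration of these metrics, the sketch below computes them from confusion-matrix counts using their standard definitions. The counts are hypothetical and chosen only for easy arithmetic; they are not results from this study, and the function name is an assumption.\u003c/p\u003e

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted-positive cases that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): share of truly positive cases that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, not taken from the reported experiments.
metrics = classification_metrics(tp=8, tn=6, fp=2, fn=4)
```

\u003cp\u003eFor these hypothetical counts, accuracy is 14/20, precision is 8/10, recall is 8/12, and the F1-score is 8/11, which lies between precision and recall and closer to the lower of the two.\u003c/p\u003e\n\u003cp\u003e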
The F1-score is the harmonic mean of precision and recall, giving a balanced assessment that accounts for both false positives and false negatives. This makes it especially appropriate for medical diagnostic tasks, where both false positives and false negatives can have serious clinical consequences.\u003c/p\u003e\n\u003cp\u003eIn addition to these measures, the discriminative capacity of the models across classification thresholds was measured using the AUC. The AUC captures the ability of the model to distinguish cancerous from non-cancerous cases independently of any fixed decision threshold and is widely regarded as a strong indicator of diagnostic performance in medical imaging systems. Together, these evaluation metrics allow a reliable and clinically meaningful comparison between the Vision Transformer and Swin Transformer architectures in detecting prostate cancer from MRI data.\u003c/p\u003e\n\u003ch2 id=\"_Toc223443119\"\u003e4.3 Experimental Results:\u003c/h2\u003e\n\u003cp\u003eThe comparative analysis of the Vision Transformer and Swin Transformer models offers insight into the strengths and weaknesses of transformer-based models for detecting prostate cancer. The Vision Transformer showed the stronger overall performance, achieving an average cross-validation precision of 90.51% and an AUC of 96.69%. These findings suggest good generalization and strong discrimination between cancerous and non-cancerous samples. The small standard deviations across folds also indicate that the model is stable under changes in the data split. 
This performance advantage is attributed to the global self-attention mechanism of the Vision Transformer, which captures long-range spatial dependencies and contextual information that are critical for differentiating subtle structural variations in tissue texture.\u003c/p\u003e\n\u003cp id=\"_Toc218699364\"\u003e\u003cstrong\u003eTable 4.\u003c/strong\u003e\u003cstrong\u003e2\u003c/strong\u003e\u003cstrong\u003e.\u0026nbsp;\u003c/strong\u003eClassification Performance of Vision Transformer for Prostate Cancer Detection\u003c/p\u003e\n\u003ctable\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eMetric\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eValue\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eTrain Accuracy\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.97\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eTrain AUC\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.99\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eTest Accuracy\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eTest AUC\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.90\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eTest Loss\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e1.00\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n 
\u003cp\u003e\u003cstrong\u003eTotal Test Samples\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e251\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp id=\"_Toc218699365\"\u003e\u003cstrong\u003eTable 4.\u003c/strong\u003e\u003cstrong\u003e3\u003c/strong\u003e\u003cstrong\u003e.\u0026nbsp;\u003c/strong\u003eClass-wise Performance Metrics (Test Set)\u003c/p\u003e\n\u003ctable\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eClass\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePrecision\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eRecall\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eF1-Score\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eSupport\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e0 (Non-Cancer)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.73\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.77\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e102\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e1 (Cancer)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.86\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.83\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e149\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eMacro 
Average\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e251\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eWeighted Average\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e251\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eThe Swin Transformer, in turn, had a more moderate test AUC of 0.7614 but a higher ability to find cancer cases, with a recall of 0.97 on the cancer class. This implies that the Swin Transformer is sensitive to the presence of cancer and can detect even minor or localized lesions. Nevertheless, it had a substantially lower recall on the non-cancer class (0.58), which suggests a bias towards classifying normal tissue as cancerous.\u003c/p\u003e\n\u003cp\u003eThis bias leads to an increased false-positive rate and can be explained by the hierarchical, window-based attention mechanism of the model. 
Although this mechanism improves the detection of local texture anomalies, it can also cause the model to label benign texture variation as cancer.\u003c/p\u003e\n\u003cp id=\"_Toc218699366\"\u003e\u003cstrong\u003eTable 4.\u003c/strong\u003e\u003cstrong\u003e4\u003c/strong\u003e\u003cstrong\u003e.\u003c/strong\u003e Classification Performance of Swin Transformer for Prostate Cancer Detection\u003c/p\u003e\n\u003ctable\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eClass\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePrecision\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eRecall\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eF1-Score\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eSupport\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e0 (Non-Cancer)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.63\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.85\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.72\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e26\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e1 (Cancer)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.86\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.65\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.74\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e37\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eAccuracy\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd\u003e\u003cbr\u003e\u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003e0.73\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e63\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eMacro Average\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.74\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.75\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.73\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e63\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eWeighted Average\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.76\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.73\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.73\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e63\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eAlthough the Swin Transformer has a slightly higher test accuracy than the Vision Transformer (0.81 vs. 0.80), the balanced precision and recall across both classes show that the Vision Transformer classifies more consistently and reliably. Its high F1-scores on the cancer class (0.83) and the non-cancer class (0.77) indicate a good balance between sensitivity and specificity. This makes the Vision Transformer better suited to real-world clinical settings, where detecting cancer and correctly identifying normal cases are equally valuable. 
The near-perfect recall of the Swin Transformer on cancer cases, in contrast, makes it an attractive option in safety-critical settings, where a missed cancer can have serious consequences.\u003c/p\u003e\n\u003cp\u003eThe observed performance patterns are largely due to the architectural design of the models. The patch-based global attention of the Vision Transformer allows it to examine the image in its entirety, which provides more contextual insight and a stronger ability to generalize. In turn, the shifted-window attention of the Swin Transformer concentrates on localized areas and can identify small lesions, but it is more likely to confuse benign texture variations with cancer. 
Hence, the Swin Transformer is highly sensitive to cancer but has reduced specificity, which increases the false-positive rate.\u003c/p\u003e\n\u003cp id=\"_Toc218699367\"\u003e\u003cstrong\u003eTable 4.\u003c/strong\u003e\u003cstrong\u003e5\u003c/strong\u003e\u003cstrong\u003e.\u0026nbsp;\u003c/strong\u003eComparative Performance of Vision Transformer and Swin Transformer for Prostate Cancer Detection\u003c/p\u003e\n\u003ctable\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eModel\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eTrain Accuracy\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eTrain AUC\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eTest Accuracy\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eTest AUC\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eTest Samples\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eVision Transformer (ViT)\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.97\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.99\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e251\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eSwin Transformer\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.84\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.79\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003e0.79\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e63\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e\n\u003cp\u003eOverall, the Vision Transformer is the more balanced and generalizable model, providing high accuracy, high AUC, and cross-validation stability. It is well suited to automated diagnostic pipelines in which both false positives and false negatives carry a cost. The Swin Transformer, however, would be more suitable where sensitivity to cancer is the main concern, that is, in settings in which a single missed cancer is not acceptable. Both models show promise for prostate cancer detection, and because their capabilities are complementary, a hybrid or ensemble approach could improve detection robustness by combining the global context modeling of the Vision Transformer with the sensitivity of the Swin Transformer to small, localized lesions.\u003c/p\u003e\n\u003ch2 id=\"_Toc223443120\"\u003e4.4 Clinical Analysis of Significant Prostate Cancer Detection at Case Level.\u003c/h2\u003e\n\u003cp\u003eThe patient-wise comparative analysis restricted to clinically significant prostate cancer cases (ISUP \u0026ge; 2) offers insight into the diagnostic behavior of the transformer-based models in clinically critical cases. In contrast to global performance measures, this case-level analysis reveals how the Vision Transformer (ViT) and Swin Transformer respond to aggressive forms of the disease, which require early and precise detection.\u003c/p\u003e\n\u003cp\u003eThese findings suggest that both models can be highly diagnostic for high-grade prostate cancer: in several cases, both ViT and Swin Transformer detected clinically significant disease with a high level of confidence. 
These examples indicate that transformer-based architectures can learn discriminative representations that reflect malignant tissue features in prostate MRI. Specifically, both models were more likely to identify cases with higher ISUP grades (\u0026ge; 4), which may be explained by the fact that the abnormalities are more pronounced in such cases.\u003c/p\u003e\n\u003cp\u003eOne notable observation is the complementary nature of the two architectures. In several instances, the Vision Transformer identified clinically significant cancers that the Swin Transformer failed to detect. This may be related to the ability of ViT to capture global contextual information across the whole image, which can help when the tumor appearance is diffuse or multifocal. In contrast, the Swin Transformer correctly recognized a smaller set of aggressive cases missed by ViT, which reflects its ability to model localized and hierarchical features through window-based attention.\u003c/p\u003e\n\u003cp\u003eAlthough the overall performance was satisfactory, both models misclassified a few cases. Such instances may be associated with low-grade disease, low-contrast tumor borders, or imaging artifacts, which highlights that prostate cancer detection remains challenging even with sophisticated deep learning. 
These misclassifications indicate that integrating multi-modal data, larger training sets, or additional clinical information will be needed in subsequent studies.\u003c/p\u003e\n\u003cp\u003eOverall, this clinical analysis (\u003cstrong\u003e\u003cem\u003eTable 4.6\u003c/em\u003e\u003c/strong\u003e) shows that transformer-based models are useful for detecting clinically significant prostate cancer and that each architecture has its own strengths. These results support the clinical relevance of the proposed comparison framework and indicate that transformer models have considerable potential to help radiologists detect aggressive prostate cancer, improve diagnostic accuracy, and make informed clinical decisions.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 4.\u003c/strong\u003e\u003cstrong\u003e6\u003c/strong\u003e\u003cstrong\u003e.\u003c/strong\u003e Patient-wise Comparative Analysis for Clinically Significant Prostate Cancer Cases (ISUP \u0026ge; 2)\u003c/p\u003e\n\u003ctable\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003ePatient_ID\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eISUP_Grade\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eGround_Truth\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eViT_Prediction\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eSwin_Prediction\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eViT_Probability\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eSwin_Probability\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003e\u003cstrong\u003eCase_Analysis\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP053\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.46\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eViT correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP054\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.41\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.78\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSwin correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP056\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.92\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.89\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eBoth correct\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP058\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.75\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.48\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eViT correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP059\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.45\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.83\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSwin correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP061\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.87\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eBoth correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n 
\u003cp\u003e\u003cstrong\u003eP062\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.36\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.31\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eBoth wrong\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP064\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.79\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eViT correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP068\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.82\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.45\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eViT correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP069\u003c/strong\u003e\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.43\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSwin correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP071\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.86\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eBoth correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP073\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.77\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.47\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eViT correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP074\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.44\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.82\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSwin correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP076\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.93\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.91\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eBoth correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP079\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.80\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.41\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eViT correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP081\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.89\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.87\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eBoth correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP082\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.41\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.79\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSwin correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP084\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.78\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.44\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eViT correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP086\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.92\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.90\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eBoth correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP088\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.81\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.43\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eViT correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eP089\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eNon-Cancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eCancer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.47\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003e0.84\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eSwin correct\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n\u003c/table\u003e"},{"header":"CHAPTER FIVE: DISCUSSIONS","content":"\u003cp\u003eThe study examined the usefulness of transformer-based deep learning architectures in detecting prostate cancer with an emphasis on the Vision Transformer (ViT) and Swin Transformer architectures. 
The study was motivated by the growing need for precise, reliable, and computationally efficient diagnostic tools that can help clinicians diagnose prostate cancer at an early stage and reduce unnecessary biopsies. The experimental findings suggest that transformer-based models offer important advantages over traditional convolutional neural networks because they can capture both global contextual information and local anatomical patterns in prostate MRI images. The Vision Transformer achieved high overall classification accuracy and AUC, which is indicative of its capacity to model long-range dependencies and global structural relations.\u003c/p\u003e\n\u003cp\u003eThis capacity is particularly important in prostate MRI examination, where lesions are heterogeneous and spatially diffuse within the gland. The strong training and testing performance of ViT indicates that global self-attention is effective for learning discriminative representations of prostate tissue. Nevertheless, the reliance of ViT on global attention also increases computational complexity, which could be a problem for real-time use in clinical practice, especially with high-resolution imaging data.\u003c/p\u003e\n\u003cp\u003eBy comparison, the Swin Transformer showed competitive performance with a more efficient computational profile. Its hierarchical, windowed attention mechanism facilitates local and multi-scale feature modeling, which is important for detecting small or low-contrast lesions in the prostate. Even though the Swin Transformer had a slightly lower overall AUC than ViT in this study, it performed well at recalling cancerous cases, meaning that it is sensitive to malignant tissue. 
Such sensitivity is clinically important because minimizing false negatives is a major requirement in cancer screening. One contribution of this work is a systematic, like-for-like comparison of the ViT and Swin Transformer architectures under the same experimental conditions. In contrast to prior research based on single-model assessments, this study provides information on how different attention mechanisms affect diagnostic performance, generalizability, and computational efficiency.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe results indicate that ViT is more effective at extracting global context, whereas the Swin Transformer offers a balanced trade-off between performance and efficiency that could be better suited to scalable clinical use. Despite these encouraging outcomes, the study has limitations. It was conducted on a small dataset, and transformer models generally perform best when trained on larger annotated datasets. Future research may pursue further performance gains through multi-modal data integration, larger multi-center datasets, and optimization strategies. 
Overall, this study indicates the potential of transformer architectures for prostate cancer detection and offers a sound basis for future AI-driven diagnostic systems in clinical practice.\u003c/p\u003e"},{"header":"LIST OF ABBREVIATIONS AND ACRONYMS","content":"\u003ctable width=\"617\"\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eMRI\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eMagnetic Resonance Imaging\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eViT\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eVision Transformer\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003ePZ\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003ePeripheral Zone\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eCZ\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eCentral Zone\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eTZ\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eTransition Zone\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eBPH\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eBenign Prostatic Hyperplasia\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eDHT\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
width=\"526\"\u003e\n\u003cp\u003eDihydrotestosterone\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eLUTS\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eLower Urinary Tract Symptoms\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003ePSA\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eProstate-Specific Antigen\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003ebpMRI\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eBi-Parametric MRI\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003empMRI\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eMulti-Parametric MRI\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eCNN\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eConvolutional Neural Network\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eW-MSA\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eWindow-based Multi-head Self-Attention\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eSW-MSA\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eShifted Window Multi-head Self-Attention\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eITK\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eInsight 
Segmentation and Registration Toolkit\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eSwT\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eShifted Window Transformer\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eCLS\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eClassification Token\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eGPU\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eGraphics Processing Unit\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eCSV\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eComma-Separated Values\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eAUC\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eArea Under the Curve\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eROC\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eReceiver Operating Characteristic\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eTP\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eTrue Positive\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eTN\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eTrue 
Negative\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eFP\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eFalse Positive\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eFN\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eFalse Negative\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eFCN\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eFully Convolutional Network\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd width=\"91\"\u003e\n\u003cp\u003e\u003cstrong\u003eXAI\u003c/strong\u003e\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd width=\"526\"\u003e\n\u003cp\u003eExplainable AI\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n"},{"header":"Declarations","content":"\n\u003ch3\u003eACKNOWLEDGEMENTS\u003c/h3\u003e\n\u003cul\u003e\n \u003cli\u003eAll praises be to \u003cstrong\u003eAlmighty GOD\u003c/strong\u003e, by whose infinite grace and blessings I was able to successfully complete this research project. I am profoundly thankful to Him for guiding me and granting me strength, wisdom, and perseverance throughout every step of my academic journey.\u003c/li\u003e\n \u003cli\u003eMy heartfelt gratitude and deepest respect go to \u003cstrong\u003eProphet Muhammad (Peace Be Upon Him)\u003c/strong\u003e, whose noble teachings and exemplary life have been a constant source of inspiration, guidance, and motivation during this thesis.\u003c/li\u003e\n \u003cli\u003eI would like to express my sincere appreciation to \u003cstrong\u003eDr. 
Amjad Iqbal\u003c/strong\u003e, Dean, Faculty of Information Technology and Computer Science (FoIT\u0026amp;CS), \u003cstrong\u003eUniversity of Central Punjab (UCP)\u003c/strong\u003e, for his visionary leadership and encouragement, which have continuously inspired us to pursue meaningful research that contributes to the betterment of society.\u003c/li\u003e\n \u003cli\u003eI am deeply indebted to my research supervisor, \u003cstrong\u003eDr. Muhammad Adnan Aziz\u003c/strong\u003e, Associate Professor, FoIT\u0026amp;CS, UCP, for his unwavering guidance, insightful feedback, and invaluable support throughout the entire research process. His mentorship, patience, and dedication have been instrumental in shaping the direction and success of this work. I am sincerely grateful for his continuous encouragement, constructive criticism, and expert supervision, which made this thesis possible.\u003c/li\u003e\n \u003cli\u003eNext to him, I am grateful to \u003cstrong\u003eDr. Muhammad Zubair, Interdisciplinary Research Center for Finance and Digital Economy, King Fahd University of Petroleum and Minerals (KFUPM), Dhahran 31261, Saudi Arabia (Ex. Assistant Professor, FoIT\u0026amp;CS, UCP)\u003c/strong\u003e for his valuable guidance and suggestions in selecting the topic for my thesis. His insightful input and expertise have been instrumental in shaping the direction and significance of this research project. Dr. Zubair\u0026rsquo;s extensive knowledge has significantly enhanced this research, providing valuable insights and guidance in identifying relevant literature, methodologies, and approaches.\u0026nbsp;\u003c/li\u003e\n \u003cli\u003eFurthermore, I would like to extend my gratitude to my classmate, \u003cstrong\u003eEngr. 
Muhammad Junaid, Assistant Manager (Tech) at Artificial Intelligence Technology Centre (AITeC), National Centre for Physics (NCP),\u003c/strong\u003e for his invaluable guidance and unwavering support throughout this process.\u003c/li\u003e\n \u003cli\u003eFinally, I would like to express my heartfelt gratitude to my fianc\u0026eacute;e, \u003cstrong\u003eDr. Azaan Fatima\u003c/strong\u003e, Medical Officer, Services Hospital Lahore, for her unwavering support and encouragement throughout the period of my graduate studies. Her constant motivation, patience, and belief in my abilities provided me with the strength and confidence to overcome challenges and remain focused during demanding phases of this work, and I am deeply thankful for her presence and encouragement throughout this journey.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eI, \u003cstrong\u003e\u003cem\u003eGhulfam Hussain\u003c/em\u003e\u003c/strong\u003e S/O \u003cstrong\u003e\u003cem\u003eAllah Dad\u003c/em\u003e\u003c/strong\u003e, a student of \u003cstrong\u003e\u003cem\u003e\u0026ldquo;Master of Science in Data Science\u0026rdquo;\u003c/em\u003e\u003c/strong\u003e, at \u003cstrong\u003e\u0026ldquo;Faculty of Information Technology \u0026amp; Computer Sciences\u0026rdquo;\u003c/strong\u003e, \u003cstrong\u003eUniversity of Central Punjab (UCP)\u003c/strong\u003e, hereby declare that this thesis titled, \u003cstrong\u003e\u003cem\u003e\u0026ldquo;Prostate Cancer Detection in Bi-parametric MRI Using Deep Learning Model\u0026rdquo;\u003c/em\u003e\u003c/strong\u003e is my own research work and has not been submitted, published, or printed elsewhere in Pakistan or abroad. Additionally, I will not use this thesis to obtain any degree other than the one stated above. I fully understand that if my statement is found to be incorrect at any stage, including after the award of the degree, the University has the right to revoke my MS/M.Phil. 
degree.\u003c/p\u003e\n\u003cdiv align=\"right\"\u003e\n \u003ctable border=\"1\" cellspacing=\"0\" cellpadding=\"0\" width=\"402\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eSignature of Student:\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eName of Student:\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eGhulfam Hussain\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eRegistration Number:\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\n \u003cp\u003eL1S23MSDS0003\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e\u003cstrong\u003eDate:\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd valign=\"top\"\u003e\u003cbr\u003e\u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003e\u003cem\u003eMale Reproductive System | BioNinja \u0026mdash; old-ib.bioninja.com.au.\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRehfeld A, Nylander M, Karnov K (2017) The Male Reproductive System. Compendium of Histology: A Theoretical and Practical Guide. Springer, pp 569\u0026ndash;592\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWashburn RL (2024) Complements from the male reproductive tract: A scoping review, \u003cem\u003eBioMed\u003c/em\u003e. 
4:19\u0026ndash;38\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eObukohwo OM, Kingsley NE, Rume RA, Victor E (2021) The concept of male reproductive anatomy,\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e\u003cem\u003eMale Reproductive System \u0026mdash; my.clevelandclinic.org.\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSchubert LF, Kr\u0026uuml;ger S, Moritz GB, Schubert V (2017) Male reproductive system and spermatogenesis of Limodromus assimilis (Paykull 1790). PLoS ONE 12:e0180492\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSharma M, Gupta S, Dhole B, Kumar A (2017) The prostate gland. Basics of Human Andrology: A Textbook. Springer, pp 17\u0026ndash;35\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDas PK, Mukherjee J, Banerjee D (2023) Functional morphology of the male reproductive system. Textbook of veterinary physiology. Springer, pp 441\u0026ndash;476\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAnamthathmakula P, Erickson JA, Winuthayanon W (2022) Blocking serine protease activity prevents semenogelin degradation leading to hyperviscous semen in humans. Biol Reprod 106:879\u0026ndash;887\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDevlin CM, Simms MS, Maitland NJ (2021) Benign prostatic hyperplasia\u0026ndash;what do we know? BJU Int 127:389\u0026ndash;399\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu X, Liu R, Song L, Gao W, Wang X, Zhang Y (2023) Differences in the pathogenetic characteristics of prostate cancer in the transitional and peripheral zones and the possible molecular biological mechanisms. Front Oncol 13:1165732\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu X-d, Yan S-s, Liu R-j, Zhang Y-s (2024) Apparent differences in prostate zones: susceptibility to prostate cancer, benign prostatic hyperplasia and prostatitis. 
Int Urol Nephrol 56:2451\u0026ndash;2458\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCannarella R, Condorelli RA, Barbagallo F, Vignera SL, Calogero AE (2021) Endocrinology of the aging prostate: current concepts. Front Endocrinol 12:554078\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAbdelmoteleb H, Jefferies ER, Drake MJ (2016) Assessment and management of male lower urinary tract symptoms (LUTS). Int J Surg 25:164\u0026ndash;171\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCoyne KS, Sexton CC, Kopp Z, Chapple CR, Kaplan SA, Aiyer LP, Symonds T (2010) Assessing patients\u0026rsquo; descriptions of lower urinary tract symptoms (LUTS) and perspectives on treatment outcomes: results of qualitative research. Int J Clin Pract 64:1260\u0026ndash;1278\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eUrology S \u003cem\u003eProstate Cancer - Port Saint Lucie, Fla., Urologist | Diagnostics And Treatment \u0026mdash; solomonurology.com.\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlmabrouk T, Alashkham A (2024) Prostate Cancer: A Comprehensive Overview\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAlbers P, Franiel T, K\u0026ouml;tter T, Kristiansen G, Herrmann K, Wiegel T (2025) The Early Detection, Diagnostic Evaluation, and Local Treatment of Prostate Cancer: A Paradigm Shift. Deutsches \u0026Auml;rzteblatt international 122:420\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWilliams ISC, McVey A, Perera S, O\u0026rsquo;Brien JS, Kostos L, Chen K, Siva S, Azad AA, Murphy DG, Kasivisvanathan V (2022) and others, Modern paradigms for prostate cancer detection and management, \u003cem\u003eMedical Journal of Australia\u003c/em\u003e, vol. 217, pp. 424\u0026ndash;433\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eExperts U \u003cem\u003eProstate Cancer: Symptoms and Treatments - Urology Experts, Dr. 
Alejandro Miranda-Sousa \u0026mdash; urologyexperts.com.\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGnanapragasam VJ, Greenberg D, Burnet N (2022) Urinary symptoms and prostate cancer\u0026mdash;the misconception that may be preventing earlier presentation and better survival outcomes. BMC Med 20:264\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGrant P (2025) The Renal and Urological System. The Concise Guide to Medical History Taking. Springer, pp 133\u0026ndash;150\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBerenguer CV, Pereira F, C\u0026acirc;mara JS, Pereira JAM (2023) Underlying features of prostate cancer\u0026mdash;statistics, risk factors, and emerging methods for its diagnosis. Curr Oncol 30:2300\u0026ndash;2321\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePrasanth BK, Alkhowaiter S, Sawarkar G, Dharshini BD, Baskaran AR, Prasanth K, Alkhowaiter SS, Baskaran AR (2023) Unlocking early cancer detection: exploring biomarkers, circulating DNA, and innovative technological approaches, \u003cem\u003eCureus\u003c/em\u003e, vol. 15\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGupta P, Gupta M, Koul N (2020) Overdiagnosis and overtreatment; how to deal with too much medicine. J Family Med Prim Care 9:3815\u0026ndash;3819\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZeb S, Nizamullah FNU, Abbasi N, Fahad M (2024) AI in healthcare: revolutionizing diagnosis and therapy. Int J Multidisciplinary Sci Arts 3:118\u0026ndash;128\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBacha A, Shah HH (2024) Liquid Biopsy: Advancements in Early Detection and Monitoring of Cancer through Blood-based Markers. Global J Univers Stud 1:68\u0026ndash;86\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTiwari A, Mishra S, Kuo T-R (2025) Current AI technologies in cancer diagnostics and treatment. 
Mol Cancer 24:159\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRahman MH, Hossin ME, Hossain MJ, Uddin SMM, Faruk MI, Anwar MM, Hossain F (2024) Harnessing big data and predictive analytics for early detection and cost optimization in cancer care. J Comput Sci Technol Stud 6:278\u0026ndash;293\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNivethitha V, Daniel RA, Surya BN, Logeswari G (2025) Empowering public health: Leveraging AI for early detection, treatment, and disease prevention in communities\u0026ndash;A scoping review. J Postgrad Med 71:74\u0026ndash;81\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePandey A, Gupta SP (2024) Personalized Medicine:(A Comprehensive Review). Orient J Chem, 40\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAbdelmaksoud IR, Shalaby A, Mahmoud A, Elmogy M, Aboelfetouh A, Abou El-Ghar M, El-Melegy M, Alghamdi NS, El-Baz A (2021) Precise identification of prostate cancer from DWI using transfer learning, \u003cem\u003eSensors\u003c/em\u003e, vol. 21, p. 3664\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDuran-Lopez L, Dominguez-Morales JP, Gutierrez-Galan D, Rios-Navarro A, Jimenez-Fernandez A, Vicente-Diaz S, Linares-Barranco A (2021) Wide \u0026amp; Deep neural network model for patch aggregation in CNN-based prostate cancer detection systems. Comput Biol Med 136:104743\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHassan MR, Islam MF, Uddin MZ, Ghoshal G, Hassan MM, Huda S, Fortino G (2022) Prostate cancer classification from ultrasound and MRI images using deep learning based Explainable Artificial Intelligence. Future Generation Comput Syst 127:462\u0026ndash;472\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSingla D, Cimen F, Narasimhulu CA (2023) Novel artificial intelligent transformer U-NET for better identification and management of prostate cancer. 
Mol Cell Biochem 478:1439\u0026ndash;1445\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eIlesanmi AE, Ilesanmi TO, Ajayi BO (2024) Reviewing 3D convolutional neural network approaches for medical image segmentation, \u003cem\u003eHeliyon\u003c/em\u003e, vol. 10\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSammouda R, El-Zaart A (2021) An optimized approach for prostate image segmentation using K-Means clustering algorithm with elbow method, \u003cem\u003eComputational Intelligence and Neuroscience\u003c/em\u003e, vol. p. 4553832, 2021\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJin Y, Yang G, Fang Y, Li R, Xu X, Liu Y, Lai X (2021) 3D PBV-Net: an automated prostate MRI data segmentation method. Comput Biol Med 128:104160\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJiangtao W, Ruhaiyem NIR, Panpan F (2025) A Comprehensive Review of U-Net and Its Variants: Advances and Applications in Medical Image Segmentation. IET Image Proc 19:e70019\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRundo L (2021) Computer-assisted analysis of biomedical images, \u003cem\u003earXiv preprint arXiv:2106.04381\u003c/em\u003e,\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBleker J, Roest C, Yakar D, Huisman H, Kwee TC (2024) The effect of image resampling on the performance of radiomics-based artificial intelligence in multicenter prostate MRI. J Magn Reson Imaging 59:1800\u0026ndash;1806\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHambarde P, Talbar S, Mahajan A, Chavan S, Thakur M, Sable N (2020) Prostate lesion segmentation in MR images using radiomics based deeply supervised U-Net. 
Biocybernetics Biomedical Eng 40:1421\u0026ndash;1435\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAsif MJ (2025) Crowd Scene Analysis Using Deep Learning Techniques, \u003cem\u003earXiv preprint arXiv:2505.08834\u003c/em\u003e,\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAsif MJ, Asad M, Imran S (2023) Crowd Scene Analysis: Crowd Counting using MCNN based on Self-Supervised training with Attention Mechanism, in \u003cem\u003e2023 25th International Multitopic Conference (INMIC)\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKhalid H, Saqib S, Asif MJ, Dewi DA (2024) Strategic Customer Segmentation: Harnessing Machine Learning For Retaining Satisfied Customers. Lahore Garrison Univ Res J Comput Sci Inform Technol, 8\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Prostate Cancer Detection, Deep Learning, Bi-parametric MRI, Visual Transformers (ViT), Swin Transformers, Hybrid Model","lastPublishedDoi":"10.21203/rs.3.rs-9407693/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9407693/v1","license":{"name":"CC BY 
4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eProstate cancer is an example of a widespread cancer among men in the world and early and accurate diagnosis plays a vital role in enhancing the likelihood of a more favorable patient outcome and reducing the occurrence of invasive surgeries. In the last several years, computer-aided diagnosis systems with deep learning have also shown significant potential on the analysis of medical images, although the conventional convolutional neural networks have the tendency to fail to recreate the long-range contextual attributes in multi-faceted data of a magnetic resonance imaging (MRI). To address these limitations, the current research work is premised on investigating the effectiveness of transformer-based architecture to identify prostate cancer with a comparative analysis of two architectures, Vision Transformer (ViT) and Swin Transformer. The first step in this research involves processing prostate MRI images by first using a complete preprocessing process that entails image normalization, data augmentation to a clinical relevance that ensures that images are better and that the process also tries to eliminate class imbalance. ViT and Swin Transformer are then pretrained and used to learn prostate tissue discriminative representation by extracting features using their respective self-attention mechanisms. The extracted features are then subjected to the supervised classification, in which the performance of the model is evaluated using the typical metrics of analysis such as the accuracy and precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC). Both transformer-based models can be compared as competitive in the prostate cancer detection task, as Vision Transformer is more effective in capturing the global context, and Swin Transformer is more effective in capturing the hierarchical feature representation. 
The cross-validation findings are also in favor of the stability of the proposed framework and its capacity to be generalized. Overall, the current paper demonstrates that the transformer-based models can possibly be applied in automated diagnosis of prostate cancer, and that it may be possible to gain a clearer idea of their flaws and strengths to create AI-assisted screening systems that are clinically reliable in the future.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e","manuscriptTitle":"Prostate Cancer Detection in Bi-parametric MRI Using Deep Learning Model","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-15 09:41:53","doi":"10.21203/rs.3.rs-9407693/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"28dc38f9-c5bd-48d2-9b15-5948c2f87bd8","owner":[],"postedDate":"April 15th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":66246465,"name":"Artificial Intelligence and Machine Learning"},{"id":66246466,"name":"Cancer Biology"}],"tags":[],"updatedAt":"2026-04-15T09:41:54+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-15 
09:41:53","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9407693","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9407693","identity":"rs-9407693","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
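
The discussion contrasts the computational cost of global self-attention in ViT with the windowed attention of the Swin Transformer. The following is a minimal sketch of that scaling argument, not the author's implementation. It counts only pairwise attention scores per layer and per head. The token grid sizes and the 7x7 window are illustrative assumptions, and the function names are hypothetical; the paper's actual model configuration is not stated in this section.

```python
"""Rough count of pairwise attention scores: global vs. windowed attention.

Assumptions (not taken from the paper): a square token grid, non-overlapping
square windows, and one attention head in one layer.
"""


def global_attention_pairs(num_tokens: int) -> int:
    """Every token attends to every token: N * N scores."""
    return num_tokens * num_tokens


def windowed_attention_pairs(num_tokens: int, window_tokens: int) -> int:
    """Tokens attend only inside their own window.

    With N tokens split into windows of W tokens each, there are N / W
    windows and each computes W * W scores, giving N * W in total.
    """
    if window_tokens <= 0 or num_tokens % window_tokens != 0:
        raise ValueError("num_tokens must be a positive multiple of window_tokens")
    num_windows = num_tokens // window_tokens
    return num_windows * window_tokens * window_tokens


if __name__ == "__main__":
    window_side = 7  # assumed window size (7x7 tokens)
    for grid_side in (56, 112):  # assumed token grids; 112 doubles the resolution
        n = grid_side * grid_side
        g = global_attention_pairs(n)
        w = windowed_attention_pairs(n, window_side * window_side)
        print(f"grid {grid_side}x{grid_side}: global={g:,} windowed={w:,} ratio={g // w}")
```

Under these assumptions, doubling the grid side multiplies the global count by 16 but the windowed count by only 4, which is the reason high-resolution inputs favor windowed attention. The count ignores the shifted-window step, the projection layers, and the MLP blocks, so it is a scaling illustration and not a full FLOP estimate.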

