Hierarchical CXR-Net: A Two-Stage Interpretable Framework for Efficient and Interpretable Chest X-Ray Diagnosis

Research Article

Ssempeebwa Phillip, Ayebale Allen, Irene Phoebe Akitwi, Vicent Mabirizi

This is a preprint posted on Research Square (Version 1); it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8373899/v1
This work is licensed under a CC BY 4.0 License.

Abstract

The increasing volume of daily chest X-ray examinations places a significant burden on clinical workflows, as most scans are normal but still require expert review, delaying the diagnosis of critical conditions. Many existing deep learning models are either computationally heavy and unsuitable for triage or lack transparency. This study aimed to develop an efficient, interpretable, and reproducible hierarchical model aligned with real clinical practice.
We proposed Hierarchical chest X-ray Network (Hierarchical CXR-Net), a two-stage framework built entirely on a public dataset. Stage 1 utilised a lightweight EfficientNet-B0 model, selected through a rigorous comparative experiment, to rapidly triage and prioritise potentially abnormal cases. Stage 2 employed a more powerful EfficientNet-B2 model, also empirically validated, to perform 14-class multi-label classification on the prioritised images. The Stage 1 screener achieved a test area under the receiver operating characteristic curve (AUROC) of 0.831, demonstrating efficient and imbalance-robust screening performance. The Stage 2 expert model achieved a mean AUROC of 0.814 across 14 pathologies, providing strong diagnostic capabilities. Hierarchical CXR-Net enhances workflow efficiency while improving transparency and reproducibility compared to traditional single-stage models. Its two-step, workflow-oriented architecture offers a practical, interpretable solution suitable for integration into real-world clinical settings.

Keywords: Artificial Intelligence and Machine Learning; Chest X-Ray Diagnosis; Hierarchical Deep Learning; Medical Image Triage; EfficientNet; Interpretable AI

1. Introduction

As the primary imaging modality for a wide range of cardiothoracic disorders [1], chest radiography continues to be a fundamental component of diagnostic medicine [2]. However, the vast number of these examinations poses a serious and growing problem for clinical workflows around the globe. Radiologist workload has increased dramatically due to both rising imaging volumes and technological advancements, with prior work reporting that a radiologist had to interpret one image every 3–4 seconds during an 8-hour workday to meet clinical demands [3]. Every day, radiologists must interpret an increasing number of studies, many of which are eventually found to be normal [4].
The high volume of routine cases burdens the critical task of identifying pathological findings, creating a classic “signal-to-noise” problem that could result in longer turnaround times for urgent diagnoses and contribute to practitioner fatigue [5].

The use of deep learning in medical imaging has demonstrated significant potential in addressing this challenge. Foundational studies, including the CheXNet and CheXNeXt algorithms [6], [7], have shown that Convolutional Neural Networks (CNNs) can attain radiologist-level proficiency in the intricate task of multi-label pathology classification using extensive public datasets such as the NIH ChestX-ray14 collection [8]. These studies provided a compelling proof of concept for artificial intelligence in radiology.

However, the evolution of the field has revealed critical discrepancies between these academic benchmarks and the requirements for a functional, practical, and clinically integrated system. A predominant limitation is the prevailing single-model (“monolithic”) approach, in which a single, computationally intensive model is applied uniformly to every image. This design is misaligned with the efficient, triage-based workflows practiced in clinical settings. Moreover, many of the most sophisticated systems capable of generating fluent narrative reports have been developed on private, institutional datasets of paired images and text [9], [10], making their methods difficult to reproduce.

By shifting the emphasis from isolated model optimization to an intelligent, workflow-oriented system, this paper introduces Hierarchical CXR-Net, a two-stage framework aligned with prior studies that use cascaded or multi-stage architectures for chest X-ray screening and diagnosis [11], [12], [13].
In this framework, the first stage uses a lightweight screener to rapidly distinguish “Likely Normal” from “Potentially Abnormal” scans, effectively simulating clinical triage, while the second stage employs a high-capacity model for comprehensive 14-class multi-label classification of the flagged cases. This hierarchical division mirrors real radiology decision pathways and supports efficient allocation of computational resources. We justify each architectural decision with comparative experiments, validating every selected model against strong alternatives. The entire pipeline is constructed using publicly available datasets, integrates Grad-CAM [14] for localized interpretability, and is intended to be a transparent, repeatable “glass box” system for the scientific community.

2. Related Literature

Chest X-ray imaging remains one of the most widely used diagnostic tools in clinical practice [15], [16], particularly in resource-limited settings where access to advanced imaging modalities such as CT or MRI is restricted [17]. Consequently, recent years have seen substantial interest in AI-driven CXR interpretation systems designed to augment radiologists’ workflow, reduce diagnostic delays, and improve accuracy. Across the literature, three major themes emerge: (i) advances in deep convolutional networks for multi-label CXR classification (e.g., dual-weighted loss models and transformer-based classification) [18], [19], (ii) integration of hierarchical or multi-stage diagnostic pipelines to better reflect real-world diagnostic triage and resource allocation (e.g., RadAI hybrid classification models) [20], and (iii) growing emphasis on interpretability and clinical trust through anatomically guided attention mechanisms and explicit attention-based heatmaps (domain-knowledge attention networks; transformer-label correlation models) [21].
2.1 Advances in Deep Learning for CXR Classification

Large-scale CXR datasets such as NIH ChestX-ray14 [22], CheXpert [23], MIMIC-CXR [24], and PadChest [25] have enabled the development of high-capacity convolutional neural networks capable of detecting multiple thoracic diseases simultaneously. Early landmark models such as CheXNet, built on DenseNet-121, demonstrated near-radiologist performance on pneumonia detection and laid the foundation for multi-pathology classification architectures [6]. Subsequent studies improved upon these baselines by introducing attention mechanisms, new classification loss functions, and transformer-based architectures, achieving more robust multi-label predictions across diverse clinical populations [26].

Despite these advances, high-capacity architectures remain computationally expensive, making them less suitable for real-time triage, especially in low-resource hospitals. Such models often process every image with equal computational intensity, even though the majority of CXR scans in clinical practice may be normal. This inefficiency has motivated research into lightweight classification modules, efficient networks (e.g., MobileNet) [27], [28], [29], and early-exit architectures that reduce inference time without major performance losses [30].

2.2 Hierarchical and Multi-Stage Diagnostic Pipeline

In real-world radiology workflows, diagnosis is not a single-step process [5]. Radiologists typically perform an initial screening to determine whether an image appears normal before conducting a more thorough, pathology-specific investigation. Several AI studies have attempted to mimic this workflow through two-stage or hierarchical classification systems. For example, some works propose using a lightweight classifier to distinguish normal from abnormal images, followed by a deeper model to analyse abnormal scans in detail [26].
This approach not only reflects clinically meaningful triage processes but also provides computational savings by applying complex models only when necessary. However, many existing hierarchical frameworks rely on static thresholds or lack interoperability mechanisms to justify transitions between stages, limiting their clinical reliability [31]. The proposed Hierarchical CXR-Net aligns with this line of work but addresses key gaps: (i) ensuring the first-stage screener is extremely lightweight yet reliable, (ii) incorporating a high-capacity model for 14-class multi-label classification, and (iii) emphasizing interpretability using attention or heatmap-based visual explanations [14]. By doing so, the framework provides both efficiency and clinical relevance while maintaining diagnostic precision.

2.3 Interpretability and Clinically Aligned AI Decision-Making

Interpretability remains a central concern in medical AI because clinicians require transparency to trust automated systems. Techniques such as Grad-CAM, heatmaps, attention visualisation, and saliency maps have been widely adopted to highlight regions that drive model predictions [32]. Recent studies have shown that interpretable models not only improve clinician acceptance but may also help identify biases or incorrect patterns learned from noisy labels [33], [34]. However, many state-of-the-art multi-label classification models still function as “black boxes”, offering limited explanatory value for their predictions [35]. A growing movement towards interpretable deep learning emphasizes building models that can justify each decision step, especially in hierarchical systems where misclassification at Stage 1 affects all downstream outcomes. The proposed Hierarchical CXR-Net contributes to this need by integrating interpretability into the pipeline, reinforcing trustworthiness and providing radiologists with explicit visual evidence of suspected abnormalities.

3. Methodology

The design of Hierarchical CXR-Net is based on a workflow-centered approach that separates the high-volume screening task from the high-complexity diagnostic task. This section describes the dataset, the structure of the two-stage framework, and the experimental design used to select and evaluate each component. All models were developed using the PyTorch deep learning library [36].

3.1 Dataset and Preprocessing

This study utilized the NIH ChestX-ray14 dataset, a large-scale, publicly available collection of 112,120 frontal-view radiographs from 30,805 unique patients [8]. We strictly adhered to the official patient-level splits to ensure a robust and comparable evaluation, partitioning the data into a training set (86,524 images) and a held-out test set (25,596 images). The original multi-label annotations were processed to create two distinct targets for our framework's stages. For the Stage 1 triage task, the 14 pathology labels were consolidated into a single binary target, Is_Abnormal, where a value of 1 indicated the presence of at least one pathology and 0 corresponded to the "No Finding" label. For the Stage 2 diagnostic task, the original 14 labels were maintained in a multi-hot encoded vector format.

All images were resized to a consistent input dimension (224×224 or 288×288 pixels) and normalized using the ImageNet dataset's mean and standard deviation. To enhance model generalization and mitigate overfitting, standard data augmentation techniques, including random horizontal flipping and random rotations (±10 degrees), were applied exclusively during the training phase.

3.2 The Hierarchical CXR-Net Framework

Our proposed system is composed of two sequential deep learning models, each with a specialized function, as illustrated in Fig. 1.

Stage 1: The Efficient Triage Screener: The primary function of this stage is the rapid binary classification of all incoming studies.
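The binary target that this screener predicts, together with the multi-hot target used later by Stage 2, follows directly from the label processing described in Section 3.1. The snippet below is a minimal sketch of that processing, not the authors' code. It assumes each image's labels arrive as a pipe-separated string such as "Cardiomegaly|Effusion" (the convention of the dataset's public metadata file), and the function name `encode_labels` and the alphabetical class order are illustrative choices.

```python
from typing import List, Tuple

# The 14 pathology classes of NIH ChestX-ray14.
# The alphabetical order is an assumption; any fixed order works.
PATHOLOGIES = [
    "Atelectasis", "Cardiomegaly", "Consolidation", "Edema", "Effusion",
    "Emphysema", "Fibrosis", "Hernia", "Infiltration", "Mass",
    "Nodule", "Pleural_Thickening", "Pneumonia", "Pneumothorax",
]


def encode_labels(finding_labels: str) -> Tuple[int, List[int]]:
    """Return (is_abnormal, multi_hot) for one image's label string.

    `finding_labels` is assumed to be pipe-separated, e.g.
    "Cardiomegaly|Effusion", or exactly "No Finding" for normal studies.
    """
    findings = {name.strip() for name in finding_labels.split("|")}
    # Stage 2 target: one slot per pathology, 1 where that finding is present.
    multi_hot = [1 if pathology in findings else 0 for pathology in PATHOLOGIES]
    # Stage 1 target: 1 if at least one pathology is present, else 0.
    is_abnormal = 1 if any(multi_hot) else 0
    return is_abnormal, multi_hot


if __name__ == "__main__":
    print(encode_labels("No Finding"))
    print(encode_labels("Cardiomegaly|Effusion"))
```

Deriving `is_abnormal` from the multi-hot vector, rather than by string comparison with "No Finding", keeps the two targets consistent by construction.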
To address the inherent class imbalance of the dataset, where “No Finding” constitutes the majority class, we implemented a robust training strategy. A WeightedRandomSampler was employed so that training mini-batches contained a balanced representation of normal and abnormal cases. This was complemented by a weighted binary cross-entropy loss function, which applies a higher penalty (pos_weight) for the misclassification of the minority “Abnormal” class, thereby compelling the model to learn its features more effectively.

Stage 2: The Expert Multi-Label Diagnostician: Studies flagged as “Potentially Abnormal” by the screener are subsequently processed by this more powerful second-stage model. Its objective is to perform a detailed, multi-label classification to identify which of the 14 specific pathologies are present. This model leverages a larger, higher-capacity architecture to discern the complex and often overlapping visual patterns of different thoracic diseases.

3.3 Experimental Design for Model Selection and Validation

A core principle of our methodology is the data-driven justification of architectural choices. To this end, we conducted comparative experiments for both stages.

Stage 1: Model Selection: We evaluated three lightweight architectures known for their computational efficiency: EfficientNet-B0 [37], MobileNetV3-Small [27], and ResNet-18 [39]. These models were trained on the binary classification task for five epochs, and their performance was compared based on test set Area Under the ROC Curve (AUROC), model size (parameters), and total training time to identify the best balance of accuracy and efficiency.

Stage 2: Model Selection: For the more demanding diagnostic task, we evaluated three high-performance architectures: EfficientNet-B2 [37], DenseNet-121 [38], and ResNet-50 [39]. These models were trained for ten epochs on the multi-label task.
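Both comparisons rank candidate models by AUROC, and for the multi-label task the metric is averaged over classes. As a hedged illustration of what that computation involves, the sketch below implements AUROC through its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half) and takes the unweighted mean across classes. The function names are hypothetical and the pairwise loop is written for clarity, not speed; it is not the evaluation code used in the study, which would more typically rely on a library routine.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC for one binary label via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC is undefined unless both classes are present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))


def macro_auroc(
    y_true: Sequence[Sequence[int]],
    y_score: Sequence[Sequence[float]],
) -> float:
    """Unweighted mean of per-class AUROCs (rows: images, columns: classes)."""
    n_classes = len(y_true[0])
    per_class = [
        auroc([row[c] for row in y_true], [row[c] for row in y_score])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


if __name__ == "__main__":
    # Toy example with two classes and four images.
    labels = [[1, 0], [0, 1], [1, 0], [0, 1]]
    scores = [[0.9, 0.2], [0.1, 0.8], [0.4, 0.6], [0.3, 0.5]]
    print(macro_auroc(labels, scores))
```

Because the macro average weights every class equally, rare findings such as Hernia influence the summary score as much as common ones, which is worth keeping in mind when comparing mean AUROC values that differ only in the third decimal place.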
The primary selection criterion was the macro-averaged mean AUROC across all 14 classes on the test set.

Implementation and Evaluation Details: All models were fine-tuned using the Adam optimizer [40] with an initial learning rate of 1e-4. Model convergence was monitored by plotting the test set AUROC after each training epoch. The final screener's performance was further characterized at its optimal operating point, determined by Youden's J statistic, using a full classification report and confusion matrix.

3.4 Interpretability and Reporting

To ensure the system is not a “black box,” we integrated an interpretability layer. We apply Grad-CAM, a post-hoc visualization technique, to the final convolutional layer of the Stage 2 expert model. This generates a heatmap that provides a visual explanation for a given prediction by highlighting the most salient image regions. This visual evidence is then combined with the model's probabilistic outputs in a deterministic, template-based reporting module to produce a safe and structured final report.

4. Results

We evaluated our hierarchical framework by first conducting model selection experiments for each stage and then demonstrating the performance of the integrated pipeline. All performance metrics reported are calculated on the held-out test set.

4.1 Stage 1: Screener Performance and Model Selection

To identify the best architecture for the rapid triage task, we compared three lightweight CNNs. Training used the imbalance-robust strategy described in Section 3.2. The performance, model size, and total training time for each candidate are summarized in Table 1. EfficientNet-B0 was selected as the Stage 1 screener, as it achieved the highest test AUROC of 0.831. While MobileNetV3-Small was more parameter-efficient, its diagnostic performance was substantially lower.
Conversely, ResNet-18 offered no performance benefit over EfficientNet-B0 despite having nearly three times the number of parameters. Table 1 presents comparative results for Stage 1 screener model selection.

Table 1. Comparative results for Stage 1 screener model selection

| Model Architecture | Test AUROC | Parameters (M) |
| --- | --- | --- |
| EfficientNet-B0 | 0.831 | 4.0 |
| ResNet-18 | 0.796 | 11.2 |
| MobileNetV3-Small | 0.783 | 1.5 |

The final performance of the selected EfficientNet-B0 screener is detailed by its Receiver Operating Characteristic (ROC) curve, shown in Fig. 2. At the optimal operating point determined by Youden's J statistic (threshold = 0.59), the model achieved a sensitivity of 73% and a specificity of 78%. This indicates that the screener correctly identifies 73% of all abnormal studies while correctly clearing 78% of all normal studies, supporting its role as a clinical triage tool.

4.2 Stage 2: Expert Diagnostician Performance and Model Selection

For the high-complexity multi-label diagnostic task, three high-performance architectures were compared over the training epochs. The primary selection criterion was the macro-averaged mean AUROC across all 14 pathologies. The results are presented in Table 2. The experiment revealed that all three models are highly capable, achieving strong performance with remarkably similar peak AUROC scores (0.813 to 0.814). EfficientNet-B2 was selected as the Stage 2 expert, as it achieved the highest mean AUROC of 0.814. While DenseNet-121 was slightly smaller, the marginal performance gain of the more modern EfficientNet-B2 architecture made it the preferred choice for a task where diagnostic accuracy is the primary objective. Table 2 presents comparative results for Stage 2 expert model selection.
Table 2. Comparative results for Stage 2 expert model selection

| Model Architecture | Best Mean AUROC | Parameters (M) |
| --- | --- | --- |
| EfficientNet-B2 | 0.814 | 7.7 |
| ResNet-50 | 0.813 | 23.5 |
| DenseNet-121 | 0.813 | 7.0 |

The per-pathology performance of the selected EfficientNet-B2 model is shown in Fig. 3. The model demonstrated excellent performance on pathologies with distinct visual signatures, such as Emphysema (AUROC 0.92), Hernia (AUROC 0.90) and Cardiomegaly (AUROC 0.88), and was most challenged by diffuse and often ambiguous findings like Pneumonia (AUROC 0.71) and Infiltration (AUROC 0.70). The convergence analysis for all models confirmed that peak performance was achieved within 5–8 epochs, supporting our training duration of 10 epochs.

4.3 Integrated Pipeline Case Study

To demonstrate the functionality of the complete hierarchical system, we present a case study from the test set in Fig. 4. The ground truth for this image was "No Finding." The Stage 1 screener processed the image and returned an abnormality probability of 86.07%, which exceeded the operating threshold of 0.59, so the image was escalated for expert review. The Stage 2 expert subsequently analysed the image and identified “Cardiomegaly” as the most likely finding with 54.0% confidence. The accompanying Grad-CAM visualization indicates that the expert model's reasoning was anatomically sound, with its attention focused on the cardiac silhouette. This case exemplifies the system's ability to flag and provide a specific, interpretable hypothesis for borderline or subtle cases that may deviate from a strict "normal" appearance.

5. Discussion

This study presents the development and validation of Hierarchical CXR-Net, a novel two-stage framework designed for the automated interpretation of chest radiographs. The findings indicate that dividing the clinical workflow into a rapid triage screening phase and an expert diagnostic phase enables the development of a system that is both diagnostically effective and methodologically sound.
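The division into a triage phase and a diagnostic phase amounts to a simple routing rule, which the following sketch makes explicit. It is an illustration under stated assumptions, not the authors' implementation: `screener` and `expert` are hypothetical callables standing in for the two trained networks, the default threshold of 0.59 is the Youden-optimal operating point reported in Section 4.1, and treating a score equal to the threshold as abnormal is an assumption.

```python
from typing import Any, Callable, Dict


def triage_and_diagnose(
    image: Any,
    screener: Callable[[Any], float],
    expert: Callable[[Any], Dict[str, float]],
    threshold: float = 0.59,
) -> Dict[str, Any]:
    """Route one study through the two-stage pipeline.

    `screener` returns the probability that the study is abnormal;
    `expert` returns per-pathology probabilities. Both are placeholders
    for trained models (hypothetical interfaces).
    """
    p_abnormal = screener(image)
    if p_abnormal < threshold:
        # Cleared by Stage 1: the expensive Stage 2 model is never run.
        return {"triage": "Likely Normal", "p_abnormal": p_abnormal, "findings": None}
    # Escalated: only now is the high-capacity expert model invoked.
    findings = expert(image)
    return {"triage": "Potentially Abnormal", "p_abnormal": p_abnormal, "findings": findings}


if __name__ == "__main__":
    # Stand-ins that reproduce the numbers of the case study in Section 4.3.
    result = triage_and_diagnose(
        image="case_study_image",
        screener=lambda _img: 0.8607,
        expert=lambda _img: {"Cardiomegaly": 0.54},
    )
    top_finding = max(result["findings"], key=result["findings"].get)
    print(result["triage"], top_finding)
```

With the case-study values, the screener score of 0.8607 exceeds 0.59, so the study is escalated and the expert's top finding is Cardiomegaly. The computational saving comes from the early return: studies scored below the threshold never reach the larger model.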
Our primary contribution is the design of the hierarchical workflow itself. The initial triage, conducted by a lightweight EfficientNet-B0, attained an AUROC of 0.831 for the binary task of separating normal from abnormal studies. This result, achieved after a thorough model selection process and with explicit handling of the dataset's inherent class imbalance, supports the viability of a rapid “first-pass” filter. The clinical utility of this stage is significant: at its optimal threshold, the screener can correctly clear 78% of all normal studies, substantially reducing the workload for the more computationally intensive expert model and thereby addressing a key limitation of monolithic systems.

The Stage 2 expert model, identified as EfficientNet-B2 through a data-driven comparative analysis, achieved a mean AUROC of 0.814 across 14 pathologies. The per-pathology performance analysis yielded essential insights regarding the model's capabilities. The identification of findings with distinct visual signatures, including Emphysema (AUROC 0.92) and Cardiomegaly (AUROC 0.88), underscores its capability in discerning clear anatomical and pathological patterns. In contrast, the model’s relative difficulty with diffuse and ambiguous conditions such as Infiltration (AUROC 0.70) should not be viewed as a failure. Instead, it highlights the inherent diagnostic challenges and label ambiguity present in the weakly supervised NIH dataset.

The case study presented underscores the practical value and the nuances of our integrated system. The screener false positive in Fig. 4, where a “No Finding” image was flagged and subsequently labelled “Cardiomegaly” by the expert, is particularly insightful. The Grad-CAM visualization, which showed anatomically consistent localization on the cardiac silhouette, suggests that the system is not failing randomly.
Instead, it is identifying borderline or subtle cases that may exist in a gray area even for human interpreters. This highlights the system's potential to act as a sensitive second reader, drawing attention to cases that merit closer inspection. At the same time, our analysis of screener misses (false negatives) serves as a crucial reminder that no automated system is infallible and underscores the necessity of keeping a human radiologist in the loop.

Unlike systems from existing literature that require private datasets of paired reports [10], [27], our entire framework is built and validated on publicly available data, making it a fully reproducible “glass box” benchmark. By grounding our architectural choices in empirical evidence, we move beyond precedent-based design and provide a clear justification for the models used.

However, we must acknowledge the limitations of this study. The analysis was conducted on a retrospective dataset from a single source, and prospective, multi-institutional validation is required to confirm the generalizability of our findings. The ground truth labels, while the standard for this dataset, are derived from NLP and contain inherent noise. Furthermore, our template-based reporting module, while safe and predictable, lacks the linguistic nuance of generative models.

Despite these limitations, Hierarchical CXR-Net provides a blueprint for the design of practical and effective clinical AI tools. By prioritizing a workflow-centric design, methodological rigor, and interpretability, our work contributes a robust and transparent framework to the field of automated medical image analysis.

6. Conclusion

In this study, we introduced Hierarchical CXR-Net, a two-stage deep learning framework designed to address key challenges in automated chest radiograph interpretation.
By explicitly modeling the clinical workflow, separating rapid triage from detailed expert-level diagnosis, the proposed system demonstrates strong diagnostic performance while maintaining computational efficiency, interpretability, and methodological transparency. Our data-driven model selection strategy grounded the architecture in empirical comparison, identifying EfficientNet-B0 as a suitable lightweight triage model (AUROC 0.831) and EfficientNet-B2 as a high-performing specialist classifier (mean AUROC 0.814). The resulting pipeline constitutes a fully reproducible, end-to-end “glass-box” architecture developed exclusively from publicly available datasets, thereby eliminating barriers related to proprietary data or opaque modeling processes.

In conclusion, the proposed Hierarchical CXR-Net provides a robust and extensible foundation for advancing workflow-aligned medical AI systems. Future research can build on this framework to integrate richer clinical context, support multi-task diagnostic pathways, and evaluate real-world deployment in diverse healthcare settings.

Declarations

Ethics approval and consent to participate
The study received ethical clearance from the Kabale University Research Ethics Committee (KAB-REC). The requirement for informed consent was waived because no primary data were collected from human participants. The study exclusively utilised publicly available secondary data from the NIH ChestX-ray14 dataset, accessed through the National Institutes of Health chest X-ray dataset page on Kaggle, and all analyses adhered strictly to the original patient-level data splits to ensure integrity and comparability.

Consent for publication
Not applicable.

Availability of data and material
The dataset used in this study, the NIH ChestX-ray14 collection, is publicly available and can be accessed through the National Institutes of Health account on Kaggle: https://www.kaggle.com/datasets/nih-chest-xrays/data/.
Competing interests The authors declare that they have no competing interests Funding This research received no specific grant from any funding agency. Authors’ contributions SP pre-processed the datasets, developed the methodology and implemented the software, and was. AA and IPA contributed to data curation, and framework validation. MV contributed to conceptualisation, provided supervision and project administration, and approved the final manuscript. All authors contributed to writing, review and editing. Acknowledgement We would like to thank the Deep Learning Indaba X Uganda AI Research Lab for providing the necessary computational resources and collaborative environment for this research. We also extend our appreciation to the Department of Computer Science at Kabale University for its academic support and institutional guidance throughout the project. References J. Broder, “Imaging the chest: the chest radiograph,” Diagnostic imaging Emerg. physician , pp. 185–296, 2011, doi: 10.1016/B978-1-4160-6113-7.10005-5. R. G. Dreyer, C. M. Van der Merwe, M. A. Nicolaou, and G. A. Richards, “Assessing and comparing chest radiograph interpretation in the Department of Internal Medicine at the University of the Witwatersrand medical school, according to seniority,” African J. Thorac. Crit. care Med. , vol. 29, no. 1, pp. 12–17, 2023, doi: 10.7196/AJTCCM.2023.v29i1.265. R. J. McDonald et al. , “The effects of changes in utilization and technological advancements of cross-sectional imaging on radiologist workload,” Acad. Radiol. , vol. 22, no. 9, pp. 1191–1198, 2015, doi: 10.1016/j.acra.2015.05.007. L. Berlin, “Defending the ‘missed’ radiographic diagnosis,” Am. J. Roentgenol. , vol. 176, no. 2, pp. 317–322, 2001, doi: 10.2214/ajr.176.2.1760317. A. P. Brady, “Error and discrepancy in radiology: inevitable or avoidable?,” Insights Imaging , vol. 8, no. 1, pp. 171–182, 2017, doi: 10.1007/s13244-016-0534-1. P. Rajpurkar et al. 
, “Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning,” arXiv Prepr. arXiv1711.05225 , 2017, doi: 10.48550/arXiv.1711.05225. P. Rajpurkar et al. , “Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists,” PLoS Med. , vol. 15, no. 11, p. e1002686, 2018, doi: 10.1371/journal.pmed.1002. X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, “Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2017, pp. 2097–2106. X. Wang, Y. Peng, L. Lu, Z. Lu, and R. M. Summers, “Tienet: Text-image embedding network for common thorax disease classification and reporting in chest x-rays,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2018, pp. 9049–9058. S. Zhang et al. , “Automated Radiological Report Generation For Chest X-Rays With Weakly-Supervised End-to-End Deep Learning,” arXiv Prepr. arXiv2006.10347 , 2020, doi: 10.48550/arXiv.2006.10347. S. Sajed, H. Rostami, J. E. Garcia, A. Keshavarz, and A. Teixeira, “A Hybrid Deep Learning Approach for Enhanced Classification of Lung Pathologies From Chest X‐Ray,” Int. J. Imaging Syst. Technol. , vol. 35, no. 6, p. e70227, 2025, doi: 10.1002/ima.70227. M. E. Karar, E. E.-D. Hemdan, and M. A. Shouman, “Cascaded deep learning classifiers for computer-aided diagnosis of COVID-19 and pneumonia diseases in X-ray scans,” Complex Intell. Syst. , vol. 7, no. 1, pp. 235–247, 2021, doi: 10.1007/s40747-020-00199-4. S. Kawuma et al. , “Diagnosis and Classification of Tuberculosis Chest X-ray Images of Children Less Than 15 years at Mbarara Regional Referral Hospital Using Deep Learning,” J. Artif. Intell. Data Min. , vol. 12, no. 2, pp. 315–324, 2024, doi: 10.22044/JADM.2024.14270.2530. R. R. Selvaraju, M. Cogswell, A. Das, R. 
Vedantam, D. Parikh, and D. Batra, “Grad-cam: Why did you say that? visual explanations from deep networks via gradient-based localization,” Grad-CAM Vis. Explan. from Deep Networks via Gradient-based Localization , vol. 17, pp. 331–336, 2016, [Online]. Available: http://arxiv.org/abs/1610.02391 V. E. Georgakopoulou, D. A. Spandidos, and A. Corlateanu, “Diagnostic tools in respiratory medicine,” Biomed. Reports , vol. 23, no. 1, p. 112, 2025, doi: 10.3892/br.2025.1990. V. C. Nwaiwu and S. K. Das, “Emerging multifaceted application of artificial intelligence in chest radiography: a narrative review,” J. Med. Artif. Intell. , vol. 7, 2024, doi: 10.21037/jmai-24-67. M. Vicent, W. Willian, and K. Simon, “A Multimodal Convolutional Neural Network Based Approach for DICOM Files Classification,” J. Eng. , vol. 2025, no. 1, p. e70107, 2025, doi: 10.1049/tje2.70107. Y. Jin, H. Lu, W. Zhu, and W. Huo, “Deep learning based classification of multi-label chest X-ray images via dual-weighted metric loss,” Comput. Biol. Med. , vol. 157, p. 106683, 2023, doi: 10.1016/j.compbiomed.2023.106683. T. Tsuji et al. , “Classification of chest X-ray images by incorporation of medical domain knowledge into operation branch networks,” BMC Med. Imaging , vol. 23, no. 1, p. 62, 2023, doi: 10.1186/s12880-023-01019-0. H. Aljuaid et al. , “RADAI: A Deep Learning-Based Classification of Lung Abnormalities in Chest X-Rays,” Diagnostics , vol. 15, no. 13, p. 1728, 2025, doi: 10.3390/diagnostics15131728. Z. Sun, L. Qu, J. Luo, Z. Song, and M. Wang, “Label correlation transformer for automated chest X-ray diagnosis with reliable interpretability,” Radiol. Med. , vol. 128, no. 6, pp. 726–733, 2023, doi: 10.1007/s11547-023-01647-0. R. W. Filice et al. , “Crowdsourcing pneumothorax annotations using machine learning annotations on the NIH chest X-ray dataset,” J. Digit. Imaging , vol. 33, no. 2, pp. 490–496, 2020, doi: 10.1007/s10278-019-00299-9. J. Irvin et al. 
, “Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison,” in Proceedings of the AAAI conference on artificial intelligence , 2019, pp. 590–597. doi: 10.1609/aaai.v33i01.3301590. A. E. W. Johnson et al. , “MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports,” Sci. data , vol. 6, no. 1, p. 317, 2019, doi: 10.6084/m9.figshare.10303823. K. Bouzid et al. , “PadChest-GR: A Bilingual Chest X-Ray Dataset for Grounded Radiology Report Generation,” NEJM AI , vol. 2, no. 7, 2025, doi: 10.1056/AIdbp2401120. B. Chen, J. Li, G. Lu, H. Yu, and D. Zhang, “Label co-occurrence learning with graph convolutional networks for multi-label chest x-ray image classification,” IEEE J. Biomed. Heal. informatics , vol. 24, no. 8, pp. 2292–2302, 2020, doi: 10.1109/JBHI.2020.2967084. A. Howard et al. , “Searching for mobilenetv3,” in Proceedings of the IEEE/CVF international conference on computer vision , 2019, pp. 1314–1324. X. Zhang, X. Zhou, M. Lin, and J. Sun, “Shufflenet: An extremely efficient convolutional neural network for mobile devices,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2018, pp. 6848–6856. M. G. Hluchyj and M. J. Karol, “Shuffle net: An application of generalized perfect shuffles to multihop lightwave networks,” J. Light. Technol. , vol. 9, no. 10, pp. 1386–1397, 2002, doi: 10.1109/50.90937. S. Teerapittayanon, B. McDanel, and H.-T. Kung, “Branchynet: Fast inference via early exiting from deep neural networks,” in 2016 23rd international conference on pattern recognition (ICPR) , IEEE, 2016, pp. 2464–2469. doi: 10.1109/ICPR.2016.7900006. T. Fuchs, L. Kaiser, D. Müller, L. Papp, R. Fischer, and J. Tran-Gia, “Enhancing interoperability and harmonisation of nuclear medicine image data and associated clinical data,” Nuklearmedizin-NuclearMedicine , vol. 62, no. 06, pp. 389–398, 2023, doi: 10.1055/a-2187-5701. K. Borys et al. 
, “Explainable AI in medical imaging: An overview for clinical practitioners–Beyond saliency-based XAI approaches,” Eur. J. Radiol. , vol. 162, p. 110786, 2023, doi: 10.1016/j.ejrad.2023.110786. Y. Brima and M. Atemkeng, “Saliency-driven explainable deep learning in medical imaging: bridging visual explainability and statistical quantitative analysis,” BioData Min. , vol. 17, no. 1, p. 18, 2024, doi: 10.1186/s13040-024-00370-4. M. Vicent, W. William, and K. Simon, “A Multimodal Convolutional Neural Network Based Approach for DICOM Files Classification,” vol. 1, no. 1, pp. 1–10, 2025, doi: 10.1049/tje2.70107. E. H. Houssein, A. M. Gamal, E. M. G. Younis, and E. Mohamed, “Explainable artificial intelligence for medical imaging systems using deep learning: a comprehensive review,” Cluster Comput. , vol. 28, no. 7, p. 469, 2025, doi: 10.1007/s10586-025-05281-5. A. Paszke et al. , “Pytorch: An imperative style, high-performance deep learning library,” Adv. Neural Inf. Process. Syst. , vol. 32, 2019. M. Tan and Q. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” in International conference on machine learning , PMLR, 2019, pp. 6105–6114. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2017, pp. 4700–4708. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition , 2016, pp. 770–778. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization (Jan 2017),” 2017. Additional Declarations The authors declare no competing interests.
Author Affiliations Ssempeebwa Phillip (corresponding author), Ayebale Allen, and Irene Phoebe Akitwi: Deep Learning Indaba X Uganda AI Research Lab, Kampala, Uganda. Vicent Mabirizi: Kabale University (ORCID: https://orcid.org/0000-0001-8990-4003).
Figure Legends Figure 1: Hierarchical CXR-Net funnel triage. Figure 2: The ROC curve for the winning Stage 1 screener, with the optimal threshold marked. Figure 3: Per-Pathology AUROC of the Stage 2 Expert Model (EfficientNet-B2). Figure 4: Grad-CAM localization and the original CXR, highlighting regions contributing to the model’s predictions.
09:56:12","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2058926,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8373899/v1/440801e8-85f4-46bb-87d3-45183fa9e0a7.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003eHierarchical CXR-Net: A Two-Stage Interpretable Framework for Efficient and Interpretable Chest X-Ray Diagnosis\u003c/p\u003e","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eAs the primary imaging modality for a wide range of cardiothoracic disorders [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e], chest radiography continues to be a fundamental component of diagnostic medicine [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. However, the vast number of these test poses a serious and expanding problem for clinical workflows around the globe. Radiologist workload has increased dramatically due to both rising imaging volumes and technological advancements, with studies showing in the previous years, a radiologist had to interpret one image every 3\u0026ndash;4 seconds during an 8-hour workday to meet clinical demands [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Everyday, radiologists must interpret an increasing number of studies, many of which are eventually found to be normal [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. 
The high volumes of routine cases burden the critical task of identifying pathological finding, creating a classic \u0026ldquo;signal-to-noise\u0026rdquo; problem that could result in longer turnaround times for urgent diagnoses and contribute to practitioner fatigue [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe utilization of deep learning in medical imaging has demonstrated significant potential in addressing this challenge. Foundation studies, including the CheXNet and CheXNeXt algorithms [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e], have effectively illustrated that Convolutional Neural Networks (CNN\u0026rsquo;s) can attain radiologist-level proficiency in the intricate task of multi-label pathology classification utilizing extensive public datasets such as the NIH Chest x-ray14 collection [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. These studies provided a compelling proof-of-concept for artificial intelligence in radiology. However, the evolution of the field has revealed critical discrepancies between these academic standards and the requirements for a functional, practical, and clinically integrated system.\u003c/p\u003e \u003cp\u003eA predominant limitation is the prevailing single-model (\u0026ldquo;monolithic\u0026rdquo;) approach, where a single, computationally intensive model is applied uniformly to every image. This design is misaligned with the efficient, triage-based workflows practiced in clinical settings. 
Moreover, many of the most sophisticated systems capable of generating fluent narrative reports have been developed on private, institutional datasets of paired images and text [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e], [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e], rendering their methods scientifically non-reproducible.\u003c/p\u003e \u003cp\u003eBy shifting the emphasis from isolated model optimization to an intelligent, workflow-oriented system, this paper introduces Hierarchical CXR-Net, a two-stage framework aligned with prior studies that use cascaded or multi-stage architectures for chest X-ray screening and diagnosis [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e], [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e], [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. In this framework, the second stage employs a high-capacity model for comprehensive 14-class multi-label classification on flagged cases, while the first stage utilizes a lightweight screener to rapidly distinguish \u0026ldquo;Likely Normal\u0026rdquo; from \u0026ldquo;Potentially Abnormal\u0026rdquo; scans, effectively simulating clinical triage. This hierarchical division mirrors real radiology decision pathways and ensures optimal allocation of computational resources.\u003c/p\u003e \u003cp\u003eWe make sure that every model is empirically validated against the most advanced alternatives by thoroughly defending our architectural decisions with comparative experiments. The entire pipeline is constructed using publicly available datasets, integrates Grad-CAM [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] for localized interpretability, and is intended to be a transparent, repeatable \u0026ldquo;glass box\u0026rdquo; system for the scientific community.\u003c/p\u003e"},{"header":"2. 
Related Literature","content":"\u003cp\u003eChest X-ray imaging remains one of the most widely used diagnostics tools in clinical practice [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e], [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], particularly in resource-limited settings where access to advanced imaging modalities such as CT or MRI is restricted [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. Consequently, recent years have seen substantial interest in AI-driven CXR interpretation systems designed to augment radiologists\u0026rsquo; workflow, reduce diagnostic delays, and improve accuracy. Across the literature, three major themes emerge: (i) advances in deep learning convolutional networks for multi-label CXR classification (e.g., dual-weighted loss models and transformer-based classification) [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e], [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e], (ii) integration of hierarchical or multi-stage diagnostic pipelines to better reflect real-world diagnostic triage and resource allocation (e.g., RadAI hybrid classification models) [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e], and (iii) growing emphasis on interpretability and clinical trust through anatomically guided attention mechanisms and explicit attention-based heatmaps (domain-knowledge attention networks; transformer-label correlation models) [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e].\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Advances in Deep Learning for CXR Classification\u003c/h2\u003e \u003cp\u003eLarge-scale CXR datasets such as NIH ChestX-ray14 [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e], CheXpert [\u003cspan citationid=\"CR23\" 
class=\"CitationRef\"\u003e23\u003c/span\u003e], MIMIC-CXR [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e], and PadChest [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] have enabled the development of high-capacity convolutional neural networks capable of detecting multiple thoracic diseases simultaneously. Early landmark models such as CheXNet built on DenseNet-121, demonstrated near-radiologist performance on pneumonia detection and laid the foundation for multi-pathology classification architectures [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Subsequent studies improved upon these baselines by introducing attention mechanisms, classification loss functions, and transformer-based architectures, achieving more robust multi-label predictions across diverse clinical populations [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eDespite these advances, high-capacity architecture remains computationally expensive, making them less suitable for real-time triage, especially in low-resource hospital. The models often process every image with equal computational intensity, even though the majority of CXR scans in clinical practice may be normal. 
This inefficiency has motivated research into lightweight classification modules, efficient networks (e.g., MobileNet) [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e], [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e], and early-exit architectures that reduce inference time without major performance losses [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Hierarchical and Multi-Stage Diagnostic Pipeline\u003c/h2\u003e \u003cp\u003eIn real-world radiology workflow, diagnosis is not a single-step process [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]. Radiologists typically perform an initial screening to determine whether an image appears normal before conducting a more thorough, pathology-specific investigation. Several AI studies have attempted to mimic this workflow through two-stage or hierarchical classification systems. For example, some works propose using a lightweight classifier to distinguish normal from abnormal images, followed by a deeper model to analyse abnormal scans in detail [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. This approach not only reflects clinically meaningful triage processes but also provides computational savings by applying complex models only when necessary. 
However, many existing hierarchical frameworks rely on static thresholds or lack interoperability mechanisms to justify transitions between stages, limiting their clinical reliability [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe proposed Hierarchical CXR-Net aligns with this line of work but addresses key gaps: (i) ensuring the first-stage screener is extremely lightweight yet reliable, (ii) incorporating a high-capacity model for 14-class multi-label classification, and (iii) emphasizing interpretability using attention or heatmap-based visual expiations [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. By doing so, the framework provides both efficiency and clinical relevance while maintaining diagnostic precision.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Interpretability and Clinically aligned AI Decision-Making\u003c/h2\u003e \u003cp\u003eInterpretability remains a central concern in medical AI because clinicians require transparency to trust automated systems. Techniques such as Grad-CAM, heatmaps, attention visualisation, and saliency maps have been widely adopted to highlight regions that drive model predictions [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. Recent studies have shown that interpretable models not only improve clinician acceptance but may also help identify biases or incorrect patterns learned from noisy labels [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e], [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eHowever, many state-of-the-art multi-label classification models still function as \u0026ldquo;black boxes\u0026rdquo; offering limited explanatory value for their predictions [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e]. 
A growing movement towards interpretable deep learning emphasizes building models that can justify each decision step, especially in hierarchical systems where misclassification at stage 1 affects all downstream outcomes. The proposed Hierarchical CXR-Net contributes to this need by integrating interpretability at both stages, reinforcing trustworthiness and providing radiologists with explicit visual evidence of suspected abnormalities.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Methodology","content":"\u003cp\u003eThe design of Hierarchical CXR-Net is based on a workflow-centered approach that splits the high-volume screening task from the high-complexity diagnostic task. In this part, we talk about the dataset that was used, the structure of the two-stage framework, and the strict experimental design that was used to choose and test each part. All models were developed using the PyTorch deep learning library [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e].\u003c/p\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Dataset and Preprocessing\u003c/h2\u003e \u003cp\u003eThis study utilized the NIH ChestX-ray14 dataset, a large-scale, publicly available collection of 112,120 frontal-view radiographs from 30,805 unique patients [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e]. We strictly adhered to the official patient-level splits to ensure a robust and comparable evaluation, partitioning the data into a training set (86,524 images) and a held-out test set (25,596 images).\u003c/p\u003e \u003cp\u003eThe original multi-class labels were processed to create two distinct targets for our framework's stages. For the Stage 1 triage task, the 14 pathology labels were consolidated into a single binary target, Is_Abnormal, where a value of 1 indicated the presence of at least one pathology and 0 corresponded to the \"No Finding\" label. 
For the Stage 2 diagnostic task, the original 14 labels were maintained in a multi-hot encoded vector format.\u003c/p\u003e \u003cp\u003eAll images were resized to a consistent input dimension (224\u0026times;224 or 288\u0026times;288 pixels) and normalized based on the ImageNet dataset's mean and standard deviation. To enhance model generalization and mitigate overfitting, standard data augmentation techniques, including random horizontal flipping and random rotations (\u0026plusmn;\u0026thinsp;10 degrees), were applied exclusively during the training phase.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e3.2 The Hierarchical CXR-Net Framework\u003c/h2\u003e \u003cp\u003eOur proposed system is composed of two sequential deep learning models, each with a specialized function as illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eStage 1: The Efficient Triage Screener\u003c/b\u003e: The primary function of this stage is the rapid binary classification of all incoming studies. To address the inherent class imbalance of the dataset, where \u0026ldquo;No Finding\u0026rdquo; constitutes the majority class, we implemented a robust training strategy. A \u003cem\u003eWeightedRandomSampler\u003c/em\u003e was employed to ensure that each training mini-batch contained a balanced representation of normal and abnormal cases. 
This was complemented by a weighted binary cross-entropy loss function, which applies a higher penalty (pos_weight) for the misclassification of the minority \u0026ldquo;Abnormal\u0026rdquo; class, thereby compelling the model to learn its features more effectively.\u003c/p\u003e \u003cp\u003e \u003cb\u003eStage 2: The Expert Multi-Label Diagnostician\u003c/b\u003e: Studies flagged as \u0026ldquo;Potentially Abnormal\u0026rdquo; by the screener are subsequently processed by this more powerful second-stage model. Its objective is to perform a detailed, multi-label classification to identify which of the 14 specific pathologies are present. This model leverages a larger, higher-capacity architecture to discern the complex and often overlapping visual patterns of different thoracic diseases.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Experimental Design for Model Selection and Validation\u003c/h2\u003e \u003cp\u003eA core principle of our methodology is the data-driven justification of architectural choices. To this end, we conducted comprehensive comparative experiments for both stages.\u003c/p\u003e \u003cp\u003e \u003cb\u003eStage 1: Model Selection\u003c/b\u003e: We evaluated three lightweight architectures renowned for their computational efficiency: EfficientNet-B0 [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e], MobileNetV3-Small [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], and ResNet-18 [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. 
These models were trained on the binary classification task for five epochs, and their performance was compared based on Test Set Area Under the ROC Curve (AUROC), model size (parameters), and total training time to identify the optimal balance of accuracy and efficiency.\u003c/p\u003e \u003cp\u003e \u003cb\u003eStage 2: Model Selection\u003c/b\u003e: For the more demanding diagnostic task, we evaluated three high-performance architectures: EfficientNet-B2 [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e], DenseNet-121 [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e], and ResNet-50 [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e]. These models were trained for ten epochs on the multi-label task. The primary selection criterion was the macro-averaged Mean AUROC across all 14 classes on the test set.\u003c/p\u003e \u003cp\u003e \u003cb\u003eStage 3: Implementation and Evaluation Details\u003c/b\u003e: All models were fine-tuned using the Adam optimizer [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e] with an initial learning rate of 1e-4. Model convergence was monitored by plotting the test set AUROC after each training epoch. The final screener's performance was further characterized at its optimal operating point, determined by Youden's J statistic, using a full classification report and confusion matrix.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Interpretability and Reporting\u003c/h2\u003e \u003cp\u003eTo ensure the system is not a \u0026ldquo;black box,\u0026rdquo; we integrated an interpretability layer. We employ Grad-CAM, a post-hoc visualization technique, on the final convolutional layer of the Stage 2 expert model. This generates a heatmap that provides a visual explanation for a given prediction by highlighting the most salient image regions. 
This visual evidence is then combined with the model's probabilistic outputs in a deterministic, template-based reporting module to produce a safe and structured final report.\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Results","content":"\u003cp\u003eWe evaluated our hierarchical framework by first conducting model selection experiments for each stage and then demonstrating the performance of the integrated pipeline. All reported performance metrics were calculated on the held-out test set.\u003c/p\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Stage 1: Screener Performance and Model Selection\u003c/h2\u003e \u003cp\u003eTo identify the optimal architecture for the rapid triage task, we compared three lightweight CNNs. Training followed the imbalance-robust strategy presented earlier. The test AUROC and parameter count for each candidate are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eEfficientNet-B0 was selected as the definitive Stage 1 screener, as it achieved the highest Test AUROC of 0.831. While MobileNetV3-Small was more parameter-efficient (1.5M versus 4.0M parameters), its diagnostic performance was notably lower (AUROC 0.783). Conversely, ResNet-18 offered no performance benefit over EfficientNet-B0 (AUROC 0.796) despite having nearly three times the number of parameters (11.2M). 
Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e presents comparative results for stage 1 screener model selection.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComparative results for stage 1 screener model selection\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel Architecture\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTest AUROC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eParameters (M)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientNet-B0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.831\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e4.0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResNet-18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.796\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e11.2\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colname=\"c1\"\u003e \u003cp\u003eMobileNetV3-Small\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.783\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe final performance of the selected EfficientNet-B0 screener is detailed by its Receiver Operating Characteristic (ROC) curve, shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e. At the optimal operating point determined by Youden's J statistic (threshold\u0026thinsp;=\u0026thinsp;0.59), the model achieved a sensitivity of 73% and a specificity of 78%. This indicates that the screener correctly identifies 73% of all abnormal studies while correctly clearing 78% of all normal studies, effectively fulfilling its role as a clinical triage tool.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Stage 2: Expert Diagnostician Performance and Model Selection\u003c/h2\u003e \u003cp\u003eFor the high-complexity multi-label diagnostic task, three high-performance architectures were compared over the training epochs. The primary selection criterion was the macro-averaged Mean AUROC across all 14 pathologies. The results are presented in Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eThe experiment revealed that all three models are highly capable, achieving state-of-the-art performance with remarkably similar peak AUROC scores. EfficientNet-B2 was selected as the definitive Stage 2 expert, as it achieved the highest Mean AUROC of 0.814. 
While DenseNet-121 was slightly smaller, the marginal performance gain of the more modern EfficientNet-B2 architecture made it the superior choice for a task where diagnostic accuracy is the primary objective. Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e presents comparative results for stage 2 expert model selection.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eComparative results for stage 2 expert model selection\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel Architecture\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBest Mean AUROC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eParameters (M)\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficientNet-B2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.814\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e7.7\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eResNet-50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.813\u003c/p\u003e 
\u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e23.5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDenseNet-121\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.813\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e7.0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe per-pathology performance of the selected EfficientNet-B2 model is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e. The model demonstrated excellent performance on pathologies with distinct visual signatures, such as Emphysema (AUROC 0.92), Hernia (AUROC 0.90), and Cardiomegaly (AUROC 0.88), and was most challenged by diffuse and often ambiguous findings like Pneumonia (AUROC 0.71) and Infiltration (AUROC 0.70). The convergence analysis for all models confirmed that peak performance was achieved within 5\u0026ndash;8 epochs, validating our training duration of 10 epochs.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e4.3 Integrated Pipeline Case Study\u003c/h2\u003e \u003cp\u003eTo demonstrate the functionality of the complete hierarchical system, we present a case study from the test set in Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e. The ground truth for this image was \u0026ldquo;No Finding.\u0026rdquo; The Stage 1 screener processed the image and returned an abnormality probability of 86.07%, which exceeded our optimal threshold of 0.59 and escalated the image for expert review; relative to the ground-truth label, this is a screener false positive. The Stage 2 expert subsequently analysed the image and identified \u0026ldquo;Cardiomegaly\u0026rdquo; as the most likely finding with a predicted probability of 54.0%. 
The accompanying Grad-CAM visualization confirms that the expert model's reasoning was anatomically sound, with its attention focused exclusively on the cardiac silhouette. This case exemplifies the system's ability to flag and provide a specific, interpretable hypothesis for borderline or subtle cases that may deviate from a strict \"normal\" appearance.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"5. Discussion","content":"\u003cp\u003eThis study presents the development and validation of Hierarchical CXR-Net, a novel two-stage framework designed for the automated interpretation of chest radiographs. The findings indicate that dividing the clinical workflow into a rapid triage screening phase and an expert diagnostic phase enables the development of a system that is both diagnostically effective and methodologically sound.\u003c/p\u003e \u003cp\u003eOur primary contribution is the design of the hierarchical workflow itself. The initial triage, conducted by a lightweight EfficientNet-B0, attained a strong AUROC of 0.831 for the binary task of separating normal from abnormal studies. This result, achieved after a thorough model selection process and with robust handling of the dataset's inherent class imbalance, confirms the viability of a rapid \u0026ldquo;first-pass\u0026rdquo; filter. The clinical utility of this stage is significant: at its optimal threshold, the screener can correctly clear 78% of all normal studies, substantially reducing the workload for the more computationally intensive expert model, thereby addressing a key limitation of monolithic systems.\u003c/p\u003e \u003cp\u003eThe Stage 2 expert model, identified as EfficientNet-B2 through a data-driven comparative analysis, achieved a Mean AUROC of 0.814 across 14 pathologies. The per-pathology performance analysis yielded essential insights regarding the model's capabilities. 
The identification of findings with distinct visual signatures, including Emphysema (AUROC 0.92) and Cardiomegaly (AUROC 0.88), underscores its capability in discerning clear anatomical and pathological patterns. In contrast, the model\u0026rsquo;s relative difficulty with diffuse and ambiguous conditions such as Infiltration (AUROC 0.70) should not be viewed as a failure. Instead, it highlights the inherent diagnostic challenges and label ambiguity present in the weakly-supervised NIH dataset.\u003c/p\u003e \u003cp\u003eThe case study presented underscores the practical value and the nuances of our integrated system. The screener false positive (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e), where a \u0026ldquo;No Finding\u0026rdquo; image was flagged and subsequently labelled \u0026ldquo;Cardiomegaly\u0026rdquo; by the expert, is particularly insightful. The Grad-CAM visualization, which localized to the cardiac silhouette, suggests that the system is not failing randomly. Instead, it may be identifying borderline or subtle cases that exist in a gray area even for human interpreters. This highlights the system's potential to act as a sensitive second reader, drawing attention to cases that merit closer inspection. At the same time, the screener's misses (false negatives, 27% of abnormal studies at the chosen threshold) serve as a crucial reminder that no automated system is infallible and underscore the necessity of keeping a human radiologist in the loop.\u003c/p\u003e \u003cp\u003eUnlike systems from existing literature that require private datasets of paired reports [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e], [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], our entire framework is built and validated on publicly available data, making it a fully reproducible \u0026ldquo;glass box\u0026rdquo; benchmark. 
By grounding our architectural choices in empirical evidence, we move beyond precedent-based design and provide a clear justification for the models used.\u003c/p\u003e \u003cp\u003eHowever, we must acknowledge the limitations of this study. The analysis was conducted on a retrospective dataset from a single source, and prospective, multi-institutional validation is required to confirm the generalizability of our findings. The ground truth labels, while the standard for this dataset, are derived from radiology reports by natural language processing (NLP) and contain inherent noise. Furthermore, our template-based reporting module, while safe and predictable, lacks the linguistic nuance of generative models.\u003c/p\u003e \u003cp\u003eDespite these limitations, Hierarchical CXR-Net provides a compelling blueprint for the design of practical and effective clinical AI tools. By prioritizing a workflow-centric design, methodological rigor, and interpretability, our work contributes a robust and transparent framework to the field of automated medical image analysis.\u003c/p\u003e"},{"header":"6. Conclusion","content":"\u003cp\u003eIn this study, we introduced Hierarchical CXR-Net, a two-stage deep learning framework designed to address key challenges in automated chest radiograph interpretation. By explicitly modeling the clinical workflow, separating rapid triage from detailed expert-level diagnosis, the proposed system demonstrates strong diagnostic performance while maintaining computational efficiency, interpretability, and methodological transparency.\u003c/p\u003e \u003cp\u003eOur data-driven model selection strategy grounded each architectural choice in a head-to-head comparison, identifying EfficientNet-B0 as an optimal lightweight triage model (AUROC 0.831) and EfficientNet-B2 as a high-performing specialist classifier (mean AUROC 0.814). 
The resulting pipeline constitutes a fully reproducible, end-to-end \u0026ldquo;glass-box\u0026rdquo; architecture developed exclusively from publicly available data, thereby eliminating barriers related to proprietary data or opaque modeling processes.\u003c/p\u003e \u003cp\u003eIn conclusion, the proposed Hierarchical CXR-Net provides a robust and extensible foundation for advancing workflow-aligned medical AI systems. Future research can build on this framework to integrate richer clinical context, support multi-task diagnostic pathways, and evaluate real-world deployment in diverse healthcare settings.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eEthics approval and consent to participate\u003c/h2\u003e\n\u003cp\u003eThe study received ethical clearance from the Kabale University Research Ethics Committee (KAB-REC). The requirement for informed consent was waived because no primary data were collected from human participants. The study exclusively utilised publicly available secondary data from the NIH ChestX-ray14 dataset, accessed through the National Institutes of Health account on Kaggle, and all analyses adhered strictly to the original patient-level data splits to ensure integrity and comparability.\u003c/p\u003e\n\u003ch2\u003eConsent for publication\u003c/h2\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003ch2\u003eAvailability of data and material\u003c/h2\u003e\n\u003cp\u003eThe dataset used in this study, the NIH ChestX-ray14 collection, is publicly available from the National Institutes of Health account on Kaggle: https://www.kaggle.com/datasets/nih-chest-xrays/data/.\u003c/p\u003e\n\u003ch2\u003eCompeting interests\u003c/h2\u003e\n\u003cp\u003eThe authors declare that they have no competing interests.\u003c/p\u003e\n\u003ch2\u003eFunding\u003c/h2\u003e\n\u003cp\u003eThis research received no specific grant from any funding 
agency.\u0026nbsp;\u003c/p\u003e\n\u003ch2\u003eAuthors\u0026rsquo; contributions\u003c/h2\u003e\n\u003cp\u003eSP pre-processed the datasets, developed the methodology and implemented the software, and was. AA and IPA contributed to data curation, and framework validation. MV contributed to conceptualisation, provided supervision and project administration, and approved the final manuscript. All authors contributed to writing, review and editing.\u003c/p\u003e\n\u003ch2\u003eAcknowledgement\u003c/h2\u003e\n\u003cp\u003eWe would like to thank the Deep Learning Indaba X Uganda AI Research Lab for providing the necessary computational resources and collaborative environment for this research. We also extend our appreciation to the Department of Computer Science at Kabale University for its academic support and institutional guidance throughout the project.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eJ. Broder, \u0026ldquo;Imaging the chest: the chest radiograph,\u0026rdquo; \u003cem\u003eDiagnostic imaging Emerg. physician\u003c/em\u003e, pp. 185\u0026ndash;296, 2011, doi: 10.1016/B978-1-4160-6113-7.10005-5.\u003c/li\u003e\n\u003cli\u003eR. G. Dreyer, C. M. Van der Merwe, M. A. Nicolaou, and G. A. Richards, \u0026ldquo;Assessing and comparing chest radiograph interpretation in the Department of Internal Medicine at the University of the Witwatersrand medical school, according to seniority,\u0026rdquo; \u003cem\u003eAfrican J. Thorac. Crit. care Med.\u003c/em\u003e, vol. 29, no. 1, pp. 12\u0026ndash;17, 2023, doi: 10.7196/AJTCCM.2023.v29i1.265.\u003c/li\u003e\n\u003cli\u003eR. J. McDonald \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;The effects of changes in utilization and technological advancements of cross-sectional imaging on radiologist workload,\u0026rdquo; \u003cem\u003eAcad. Radiol.\u003c/em\u003e, vol. 22, no. 9, pp. 1191\u0026ndash;1198, 2015, doi: 10.1016/j.acra.2015.05.007.\u003c/li\u003e\n\u003cli\u003eL. 
Berlin, \u0026ldquo;Defending the \u0026lsquo;missed\u0026rsquo; radiographic diagnosis,\u0026rdquo; \u003cem\u003eAm. J. Roentgenol.\u003c/em\u003e, vol. 176, no. 2, pp. 317\u0026ndash;322, 2001, doi: 10.2214/ajr.176.2.1760317.\u003c/li\u003e\n\u003cli\u003eA. P. Brady, \u0026ldquo;Error and discrepancy in radiology: inevitable or avoidable?,\u0026rdquo; \u003cem\u003eInsights Imaging\u003c/em\u003e, vol. 8, no. 1, pp. 171\u0026ndash;182, 2017, doi: 10.1007/s13244-016-0534-1.\u003c/li\u003e\n\u003cli\u003eP. Rajpurkar \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Chexnet: Radiologist-level pneumonia detection on chest x-rays with deep learning,\u0026rdquo; \u003cem\u003earXiv Prepr. arXiv1711.05225\u003c/em\u003e, 2017, doi: 10.48550/arXiv.1711.05225.\u003c/li\u003e\n\u003cli\u003eP. Rajpurkar \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists,\u0026rdquo; \u003cem\u003ePLoS Med.\u003c/em\u003e, vol. 15, no. 11, p. e1002686, 2018, doi: 10.1371/journal.pmed.1002.\u003c/li\u003e\n\u003cli\u003eX. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, \u0026ldquo;Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases,\u0026rdquo; in \u003cem\u003eProceedings of the IEEE conference on computer vision and pattern recognition\u003c/em\u003e, 2017, pp. 2097\u0026ndash;2106.\u003c/li\u003e\n\u003cli\u003eX. Wang, Y. Peng, L. Lu, Z. Lu, and R. M. Summers, \u0026ldquo;Tienet: Text-image embedding network for common thorax disease classification and reporting in chest x-rays,\u0026rdquo; in \u003cem\u003eProceedings of the IEEE conference on computer vision and pattern recognition\u003c/em\u003e, 2018, pp. 9049\u0026ndash;9058.\u003c/li\u003e\n\u003cli\u003eS. 
Zhang \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Automated Radiological Report Generation For Chest X-Rays With Weakly-Supervised End-to-End Deep Learning,\u0026rdquo; \u003cem\u003earXiv Prepr. arXiv2006.10347\u003c/em\u003e, 2020, doi: 10.48550/arXiv.2006.10347.\u003c/li\u003e\n\u003cli\u003eS. Sajed, H. Rostami, J. E. Garcia, A. Keshavarz, and A. Teixeira, \u0026ldquo;A Hybrid Deep Learning Approach for Enhanced Classification of Lung Pathologies From Chest X‐Ray,\u0026rdquo; \u003cem\u003eInt. J. Imaging Syst. Technol.\u003c/em\u003e, vol. 35, no. 6, p. e70227, 2025, doi: 10.1002/ima.70227.\u003c/li\u003e\n\u003cli\u003eM. E. Karar, E. E.-D. Hemdan, and M. A. Shouman, \u0026ldquo;Cascaded deep learning classifiers for computer-aided diagnosis of COVID-19 and pneumonia diseases in X-ray scans,\u0026rdquo; \u003cem\u003eComplex Intell. Syst.\u003c/em\u003e, vol. 7, no. 1, pp. 235\u0026ndash;247, 2021, doi: 10.1007/s40747-020-00199-4.\u003c/li\u003e\n\u003cli\u003eS. Kawuma \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Diagnosis and Classification of Tuberculosis Chest X-ray Images of Children Less Than 15 years at Mbarara Regional Referral Hospital Using Deep Learning,\u0026rdquo; \u003cem\u003eJ. Artif. Intell. Data Min.\u003c/em\u003e, vol. 12, no. 2, pp. 315\u0026ndash;324, 2024, doi: 10.22044/JADM.2024.14270.2530.\u003c/li\u003e\n\u003cli\u003eR. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, \u0026ldquo;Grad-cam: Why did you say that? visual explanations from deep networks via gradient-based localization,\u0026rdquo; \u003cem\u003eGrad-CAM Vis. Explan. from Deep Networks via Gradient-based Localization\u003c/em\u003e, vol. 17, pp. 331\u0026ndash;336, 2016, [Online]. Available: http://arxiv.org/abs/1610.02391\u003c/li\u003e\n\u003cli\u003eV. E. Georgakopoulou, D. A. Spandidos, and A. Corlateanu, \u0026ldquo;Diagnostic tools in respiratory medicine,\u0026rdquo; \u003cem\u003eBiomed. Reports\u003c/em\u003e, vol. 23, no. 1, p. 
112, 2025, doi: 10.3892/br.2025.1990.\u003c/li\u003e\n\u003cli\u003eV. C. Nwaiwu and S. K. Das, \u0026ldquo;Emerging multifaceted application of artificial intelligence in chest radiography: a narrative review,\u0026rdquo; \u003cem\u003eJ. Med. Artif. Intell.\u003c/em\u003e, vol. 7, 2024, doi: 10.21037/jmai-24-67.\u003c/li\u003e\n\u003cli\u003eM. Vicent, W. Willian, and K. Simon, \u0026ldquo;A Multimodal Convolutional Neural Network Based Approach for DICOM Files Classification,\u0026rdquo; \u003cem\u003eJ. Eng.\u003c/em\u003e, vol. 2025, no. 1, p. e70107, 2025, doi: 10.1049/tje2.70107.\u003c/li\u003e\n\u003cli\u003eY. Jin, H. Lu, W. Zhu, and W. Huo, \u0026ldquo;Deep learning based classification of multi-label chest X-ray images via dual-weighted metric loss,\u0026rdquo; \u003cem\u003eComput. Biol. Med.\u003c/em\u003e, vol. 157, p. 106683, 2023, doi: 10.1016/j.compbiomed.2023.106683.\u003c/li\u003e\n\u003cli\u003eT. Tsuji \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Classification of chest X-ray images by incorporation of medical domain knowledge into operation branch networks,\u0026rdquo; \u003cem\u003eBMC Med. Imaging\u003c/em\u003e, vol. 23, no. 1, p. 62, 2023, doi: 10.1186/s12880-023-01019-0.\u003c/li\u003e\n\u003cli\u003eH. Aljuaid \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;RADAI: A Deep Learning-Based Classification of Lung Abnormalities in Chest X-Rays,\u0026rdquo; \u003cem\u003eDiagnostics\u003c/em\u003e, vol. 15, no. 13, p. 1728, 2025, doi: 10.3390/diagnostics15131728.\u003c/li\u003e\n\u003cli\u003eZ. Sun, L. Qu, J. Luo, Z. Song, and M. Wang, \u0026ldquo;Label correlation transformer for automated chest X-ray diagnosis with reliable interpretability,\u0026rdquo; \u003cem\u003eRadiol. Med.\u003c/em\u003e, vol. 128, no. 6, pp. 726\u0026ndash;733, 2023, doi: 10.1007/s11547-023-01647-0.\u003c/li\u003e\n\u003cli\u003eR. W. 
Filice \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Crowdsourcing pneumothorax annotations using machine learning annotations on the NIH chest X-ray dataset,\u0026rdquo; \u003cem\u003eJ. Digit. Imaging\u003c/em\u003e, vol. 33, no. 2, pp. 490\u0026ndash;496, 2020, doi: 10.1007/s10278-019-00299-9.\u003c/li\u003e\n\u003cli\u003eJ. Irvin \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison,\u0026rdquo; in \u003cem\u003eProceedings of the AAAI conference on artificial intelligence\u003c/em\u003e, 2019, pp. 590\u0026ndash;597. doi: 10.1609/aaai.v33i01.3301590.\u003c/li\u003e\n\u003cli\u003eA. E. W. Johnson \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports,\u0026rdquo; \u003cem\u003eSci. data\u003c/em\u003e, vol. 6, no. 1, p. 317, 2019, doi: 10.6084/m9.figshare.10303823.\u003c/li\u003e\n\u003cli\u003eK. Bouzid \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;PadChest-GR: A Bilingual Chest X-Ray Dataset for Grounded Radiology Report Generation,\u0026rdquo; \u003cem\u003eNEJM AI\u003c/em\u003e, vol. 2, no. 7, 2025, doi: 10.1056/AIdbp2401120.\u003c/li\u003e\n\u003cli\u003eB. Chen, J. Li, G. Lu, H. Yu, and D. Zhang, \u0026ldquo;Label co-occurrence learning with graph convolutional networks for multi-label chest x-ray image classification,\u0026rdquo; \u003cem\u003eIEEE J. Biomed. Heal. informatics\u003c/em\u003e, vol. 24, no. 8, pp. 2292\u0026ndash;2302, 2020, doi: 10.1109/JBHI.2020.2967084.\u003c/li\u003e\n\u003cli\u003eA. Howard \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Searching for mobilenetv3,\u0026rdquo; in \u003cem\u003eProceedings of the IEEE/CVF international conference on computer vision\u003c/em\u003e, 2019, pp. 1314\u0026ndash;1324.\u003c/li\u003e\n\u003cli\u003eX. Zhang, X. Zhou, M. Lin, and J. 
Sun, \u0026ldquo;Shufflenet: An extremely efficient convolutional neural network for mobile devices,\u0026rdquo; in \u003cem\u003eProceedings of the IEEE conference on computer vision and pattern recognition\u003c/em\u003e, 2018, pp. 6848\u0026ndash;6856.\u003c/li\u003e\n\u003cli\u003eM. G. Hluchyj and M. J. Karol, \u0026ldquo;Shuffle net: An application of generalized perfect shuffles to multihop lightwave networks,\u0026rdquo; \u003cem\u003eJ. Light. Technol.\u003c/em\u003e, vol. 9, no. 10, pp. 1386\u0026ndash;1397, 2002, doi: 10.1109/50.90937.\u003c/li\u003e\n\u003cli\u003eS. Teerapittayanon, B. McDanel, and H.-T. Kung, \u0026ldquo;Branchynet: Fast inference via early exiting from deep neural networks,\u0026rdquo; in \u003cem\u003e2016 23rd international conference on pattern recognition (ICPR)\u003c/em\u003e, IEEE, 2016, pp. 2464\u0026ndash;2469. doi: 10.1109/ICPR.2016.7900006.\u003c/li\u003e\n\u003cli\u003eT. Fuchs, L. Kaiser, D. M\u0026uuml;ller, L. Papp, R. Fischer, and J. Tran-Gia, \u0026ldquo;Enhancing interoperability and harmonisation of nuclear medicine image data and associated clinical data,\u0026rdquo; \u003cem\u003eNuklearmedizin-NuclearMedicine\u003c/em\u003e, vol. 62, no. 06, pp. 389\u0026ndash;398, 2023, doi: 10.1055/a-2187-5701.\u003c/li\u003e\n\u003cli\u003eK. Borys \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Explainable AI in medical imaging: An overview for clinical practitioners\u0026ndash;Beyond saliency-based XAI approaches,\u0026rdquo; \u003cem\u003eEur. J. Radiol.\u003c/em\u003e, vol. 162, p. 110786, 2023, doi: 10.1016/j.ejrad.2023.110786.\u003c/li\u003e\n\u003cli\u003eY. Brima and M. Atemkeng, \u0026ldquo;Saliency-driven explainable deep learning in medical imaging: bridging visual explainability and statistical quantitative analysis,\u0026rdquo; \u003cem\u003eBioData Min.\u003c/em\u003e, vol. 17, no. 1, p. 18, 2024, doi: 10.1186/s13040-024-00370-4.\u003c/li\u003e\n\u003cli\u003eM. Vicent, W. William, and K. 
Simon, \u0026ldquo;A Multimodal Convolutional Neural Network Based Approach for DICOM Files Classification,\u0026rdquo; vol. 1, no. 1, pp. 1\u0026ndash;10, 2025, doi: 10.1049/tje2.70107.\u003c/li\u003e\n\u003cli\u003eE. H. Houssein, A. M. Gamal, E. M. G. Younis, and E. Mohamed, \u0026ldquo;Explainable artificial intelligence for medical imaging systems using deep learning: a comprehensive review,\u0026rdquo; \u003cem\u003eCluster Comput.\u003c/em\u003e, vol. 28, no. 7, p. 469, 2025, doi: 10.1007/s10586-025-05281-5.\u003c/li\u003e\n\u003cli\u003eA. Paszke \u003cem\u003eet al.\u003c/em\u003e, \u0026ldquo;Pytorch: An imperative style, high-performance deep learning library,\u0026rdquo; \u003cem\u003eAdv. Neural Inf. Process. Syst.\u003c/em\u003e, vol. 32, 2019.\u003c/li\u003e\n\u003cli\u003eM. Tan and Q. Le, \u0026ldquo;Efficientnet: Rethinking model scaling for convolutional neural networks,\u0026rdquo; in \u003cem\u003eInternational conference on machine learning\u003c/em\u003e, PMLR, 2019, pp. 6105\u0026ndash;6114.\u003c/li\u003e\n\u003cli\u003eG. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, \u0026ldquo;Densely connected convolutional networks,\u0026rdquo; in \u003cem\u003eProceedings of the IEEE conference on computer vision and pattern recognition\u003c/em\u003e, 2017, pp. 4700\u0026ndash;4708.\u003c/li\u003e\n\u003cli\u003eK. He, X. Zhang, S. Ren, and J. Sun, \u0026ldquo;Deep residual learning for image recognition,\u0026rdquo; in \u003cem\u003eProceedings of the IEEE conference on computer vision and pattern recognition\u003c/em\u003e, 2016, pp. 770\u0026ndash;778.\u003c/li\u003e\n\u003cli\u003eD. P. Kingma and J. Ba, \u0026ldquo;Adam: A method for stochastic optimization (Jan 2017),\u0026rdquo; 2017. 
\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Kabale University","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Chest X-Ray Diagnosis, Hierarchical Deep Learning, Medical Image Triage, EfficientNet, Interpretable AI","lastPublishedDoi":"10.21203/rs.3.rs-8373899/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8373899/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe increasing volume of daily chest X-ray examinations places a significant burden on clinical workflows, as most scans are normal but still require expert review, delaying the diagnosis of critical conditions. Many existing deep learning models are either computationally heavy and unsuitable for triage or lack transparency. This study aimed to develop an efficient, interpretable, and reproducible hierarchical model aligned with real clinical practice. We proposed Hierarchical chest X-ray Network, a two-stage framework built entirely on a public dataset. Stage 1 utilised a lightweight EfficientNet-B0 model, selected through a rigorous comparative experiment, to rapidly triage and prioritise potentially abnormal cases. Stage 2 employed a more powerful EfficientNet-B2 model, also empirically validated, to perform 14-class multi-label classification on the prioritised images. The Stage 1 screener achieved a test area under the receiver operating characteristics curve of 0.831, demonstrating efficient and imbalance-robust screening performance. The Stage 2 expert model achieved a mean area under the receiver operating characteristics curve of 0.814 across 14 pathologies, providing strong diagnostic capabilities. 
Hierarchical chest X-ray Network enhances workflow efficiency while improving transparency and reproducibility compared to traditional single-stage models. Its two-step, workflow-oriented architecture offers a practical, interpretable solution suitable for integration into real-world clinical settings.\u003c/p\u003e","manuscriptTitle":"Hierarchical CXR-Net: A Two-Stage Interpretable Framework for Efficient and Interpretable Chest X-Ray Diagnosis","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-12-22 09:53:54","doi":"10.21203/rs.3.rs-8373899/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"85338f0f-73a9-480b-8090-fd824409136e","owner":[],"postedDate":"December 22nd, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":59859841,"name":"Artificial Intelligence and Machine Learning"}],"tags":[],"updatedAt":"2025-12-22T09:53:55+00:00","versionOfRecord":[],"versionCreatedAt":"2025-12-22 09:53:54","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8373899","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8373899","identity":"rs-8373899","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}