Convolutional Automatic Identification of B-lines and Interstitial Syndrome in Lung Ultrasound Images Using Pre-Trained Neural Networks with Feature Fusion

Research Article, Research Square

Khalid Moafa, Maria Antico, Damjan Vukovic, Christopher Edwards, and 8 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4487345/v1. This work is licensed under a CC BY 4.0 License. Status: Posted. Version 1.

Abstract

Background Interstitial/Alveolar Syndrome (IS) is a condition detectable on lung ultrasound (LUS) that indicates underlying pulmonary or cardiac diseases associated with significant morbidity and increased mortality rates. The diagnosis of IS using LUS can be challenging and time-consuming, and it requires clinical expertise.
Methods In this study, multiple Convolutional Neural Network (CNN) deep learning (DL) models were trained, acting as binary classifiers, to accurately screen for IS from LUS frames by differentiating between IS-present and healthy cases. The CNN DL models were initially pre-trained using a generic image dataset to learn general visual features (ImageNet), and then fine-tuned on our specific dataset of 108 LUS clips from 54 patients (27 healthy and 27 with IS), with two clips per patient, to perform a binary classification task. Each frame within a clip was assessed to determine the presence of IS features or to confirm a healthy lung status. The dataset was split into training (70%), validation (15%), and testing (15%) sets. Following the process of fine-tuning, we successfully extracted features from pre-trained DL models. These extracted features were utilised to train multiple machine learning (ML) classifiers, hence the trained ML classifiers yielded significantly improved accuracy in IS classification. Advanced visual interpretation techniques, such as heatmaps based on Gradient-weighted Class Activation Mapping (Grad-CAM) and Local Interpretable Model-Agnostic explanations (LIME), were implemented to further analyse the outcomes. Results The best-trained ML model achieved a test accuracy of 98.2%, with specificity, recall, precision, and F1-score values all above 97.9%. Our study demonstrates, for the first time, the feasibility of using a pre-trained CNN with the feature extraction and fusion technique as a diagnostic tool for IS screening on LUS frames, providing a time-efficient and practical approach to clinical decision-making. Conclusion This study confirms the practicality of using pre-trained CNN models, with the feature extraction and fusion technique, for screening IS through LUS frames. This represents a noteworthy advancement in improving the efficiency of diagnosis. 
In the next steps, validation on larger datasets will assess the applicability and robustness of these CNN models in more complex clinical settings.

Keywords: Interstitial Syndrome, Lung ultrasound, Deep Learning, Transfer Learning, Features

Introduction

Lung ultrasound (LUS) has gained clinical acceptance for diagnosing and managing lung diseases due to its advantages over conventional tests such as computed tomography (CT), including accessibility, absence of radiation risk, and portability (1). These benefits make it ideal for emergency and intensive care settings. However, LUS is operator-dependent, and training can be costly and time-consuming, often restricted to clinicians who have access to LUS training (2). Deep learning (DL) algorithms have been developed to enable computer-automated diagnosis of pleural effusion and consolidation (2–4). Recent advances in DL and convolutional neural networks (CNNs) have been achieved by using the expertise of LUS-trained clinicians as a reference for DL algorithms in the analysis and recognition of LUS patterns (5, 6). This technological advancement helps reduce the risk of operator-related oversights or misdiagnoses and potentially provides untrained clinicians with a reasonably accurate diagnostic ultrasound (US) tool. High-resolution computed tomography (HRCT) remains the gold-standard diagnostic tool for interstitial/alveolar syndrome (IS) (7). However, limited access, the risks of patient transportation, and exposure to ionising radiation make CT less desirable in critical care.
LUS has been demonstrated to be superior to chest X-ray in assessing lung pathologies such as pulmonary oedema, pleural effusion, pneumonia and interstitial lung disease (8). It is particularly valuable in expediting diagnosis, thereby enabling timely initiation of treatment (9). The interpretation of the images largely relies on artefact analysis, which has been shown to correlate with CT findings (7). B-lines are reverberation artefacts that appear as vertical, laser-like, mobile lines and indicate interference resulting from interstitial fluid, inflammation or fibrosis (10). The diagnosis of IS is appropriate when three or more B-lines are present within a single intercostal space in non-dependent parts of the lungs; however, the significance varies with the clinical context of the presentation. Bilateral IS can be caused by cardiogenic pulmonary oedema, interstitial lung diseases such as pulmonary fibrosis, or viral pneumonitis, including COVID-19 (11). Conversely, localised IS may indicate an early stage of pneumonia. Evidence has shown that identifying and quantifying B-lines not only aids in diagnosing cardiogenic pulmonary oedema but also guides treatment, allows the response to be monitored by repeated scanning, and may provide prognostic information (12). This study demonstrates the development and training of DL models, specifically CNNs, to automate the detection of B-lines on US images in patients with IS. Currently, DL approaches, particularly those using CNNs, have been demonstrated to be effective for a wide range of pathologies in LUS (13, 14). CNNs can automatically and robustly learn specific characteristics of the images, allowing them to reliably detect (15), segment (3), and classify (2, 4) multiple LUS pathologies. It is well known that DL models require large amounts of labelled data for training (16, 17).
Transfer learning (TL) is a possible approach proposed to deal with "data starvation" problems, as it can compensate for a lack of data in a target domain by inheriting or maintaining the knowledge learnt in a data-rich source domain ( 18 ). According to the literature, using pre-trained CNNs, such as ImageNet models, as feature extractors or fine-tuning pre-trained CNNs can improve performance for various medical image analysis tasks compared to a DL model that is built without pre-existing features ( 19 , 20 ). Addressing the pressing need for automated LUS analysis tools that accurately and timely detect IS, thereby significantly reducing diagnostic subjectivity, facilitating early disease identification, and potentially leading to improved patient outcomes, forms the core motivation for this work. This work's novelty includes applying DL pre-trained models, namely Xception and InceptionResnetV2, which were initially trained on the ImageNet dataset, to a unique IS dataset and training these models on different data filtering techniques. In addition, we implemented a feature fusion technique to further improve the performance of DL models by combining features derived from those models. The combined set of features was further utilised to train multiple classifiers with the aim of achieving high diagnostic accuracy ( 21 ). We also interpreted the complexity of the "black box" of the DL models used by utilising visualisation and interpretation techniques such as Gradient-weighted Class Activation Mapping (Grad-CAM) and Local Interpretable Model-Agnostic explanations (LIME)( 22 – 24 ). Materials and methods Dataset The LUS datasets used were fully anonymised and were collected at the Royal Melbourne Hospital. The study was approved by the Melbourne Health Human Research Ethics Committee (HREC/18/MH/269)( 25 ). The US dataset comprises 125 patients for a total of 1034 LUS clips. At least six unique lung scanning zones were evaluated and labelled (Fig. 
1 ) by our clinical experts (DC and XC) following the protocol shown in Fig. 2 . From the initial dataset, clips from 54 unidentified patients were included, 27 healthy and 27 with IS labelled as "non-healthy". In total, the LUS clips included are 108, comprising 16962 LUS frames (8481 frames each for healthy and IS) (Fig. 3 ). Two LUS examples of IS and healthy frames are demonstrated in Fig. 4 A and B, respectively. Filtering techniques The filtering techniques applied in this study are as follows: Scenario 1 involves the thorough inclusion of all LUS frames from all clips in the training datasets (Dataset 1). In contrast, Scenarios 2 and 3 utilise a selective filtering technique to refine the training dataset (Dataset 2) by excluding LUS frames that do not include the main features characteristic of IS (i.e., absence of B-lines), thereby prioritising clinically relevant features. The two datasets are illustrated in Fig. 3 . All LUS clips were labelled as healthy or non-healthy (IS cases) based on predefined clinical criteria (Fig. 5 ). These criteria were adapted from international evidence-based recommendations for point-of-care ultrasound ( 26 ). It is crucial to highlight that in the clip-based labelling method used in this study, a clip classified as IS may include individual frames that do not show IS features and could be deemed as healthy. This observation highlights the natural variation and complexity of LUS, emphasising that not all frames from IS clips will consistently show the exact features associated with IS. For each LUS clip, a detailed assessment of LUS scans for both healthy and IS cases was undertaken by LUS experts and clinicians from the University of Melbourne (DC and XC). In particular, they examined each lung zone, identifying features associated with either healthy lung or IS based on the criteria outlined in Fig. 5 . 
Implementation of the models

The LUS dataset was divided into three subsets for model development and performance assessment: training (70% ≈ 76 LUS clips), validation (15% ≈ 16 LUS clips), and test (15% ≈ 16 LUS clips). Three different approaches were followed, referred to as Scenarios 1, 2, and 3, with distinct pre-processing techniques and pre-trained models. The pre-trained models, Xception and InceptionResnetV2, were selected for their proficiency in medical image classification, gained through training on the ImageNet dataset of over 14 million natural images across more than 20,000 classes (27, 28). In Scenario 1, the Xception pre-trained model was employed (28). All LUS frames were included in the training and validation process, incorporating the total number of healthy and non-healthy frames from all clips (14,902 frames). Scenario 2 used the Xception and InceptionResnetV2 pre-trained models with a more selective strategy (10,546 frames): non-healthy frames were re-evaluated, and any frames lacking the main features that characterise IS (i.e., frames without B-lines) were excluded. This selection criterion was applied to ensure that only relevant features were included in the dataset. In Scenario 3, a baseline model (Xception) without pre-existing features was trained. It learns features and weights exclusively from the LUS data, without any TL from general image knowledge. As in Scenario 2, data filtering was applied to the baseline model: only non-healthy frames were re-evaluated, and frames lacking the main features that characterise IS (i.e., frames without B-lines) were excluded. To adapt the DL models to our two-class IS detection task (IS and healthy), the architectures of all models (Xception, InceptionResnetV2 and baseline) were customised.
The top layer of the models, known as the classifier, which was originally designed to classify 1000 different classes (such as animals or household items), was replaced with a two-class classifier. Additionally, the LUS images were downsized from 720×920 pixels to 299×299 pixels to align with the input dimensions specified by the models' architectures. The Xception model has about 170 layers and 22.9 million trainable parameters. It uses depth-wise separable convolutions across 14 modules to improve its ability to extract features. The InceptionResnetV2 model, with a more complex structure, includes 843 layers and 55.8 million trainable parameters, combining the Inception and ResNet architectures. As the baseline model, we employed a modified version of the Xception model that was devoid of its pre-trained weights. The MATLAB software (Version R2023b) was used to run the models and monitor the training and testing process. The models were trained on a Graphics Processing Unit (GPU) with an NVIDIA TITAN RTX and 25 GB RAM, running Ubuntu 20.04.6 LTS. The Adam optimiser was used during training. After completion of training, the models were tested on the test subset, and their performance was evaluated using multiple performance metrics. Figure 6 illustrates the workflow followed to train and test the models, which includes data processing, model customisation, and model performance evaluation tools. Additionally, Table 1 outlines the model hyperparameters.

Table 1. DL parameters used in the training and validation process of the models

Parameter        Value
Batch size       10
Epochs           10
Shuffle          Every epoch
Learning rate    10⁻⁴
Optimizer        Adam
Image size       299×299

Explainability and interpretability of DL models

The Grad-CAM visualisation technique was used in our model evaluation process to enhance the explainability of the model prediction (24).
Grad-CAM provides a visual explanation in the form of a heatmap overlay on the image, highlighting the Region of Interest (ROI) in the output image ( 24 ), which refers to a specific area within an LUS image used to identify particular pathologies or diseases. For instance, in cases of non-healthy frames (IS) (Fig. 7 . Grad-CAM), the ROI may be defined as an area containing B-lines. This technique generates heatmaps, overlaying the original image to highlight areas influencing the model's decision, aiding in identifying related features. Additionally, the LIME was used to provide explanations for predictions by estimating the decision boundary in a specific input image, focusing on the intended ROI in the LUS image, and generating a heatmap scale (Fig. 7 . Lime). LIME highlights influential regions contributing to a specific prediction, aiding in the understanding of the model's decision-making process ( 22 ). It approximates the boundary that defines the ROI by creating a new scale for a heatmap. This scale highlights the regions that have the most impact on the model's prediction. LIME divides the image into identifiable portions and evaluates the impact of each part on the LUS image. In Fig. 7 , the LIME visualisations show the most important features which are represented by scores and colours. Higher scores, indicated by more intense colours (from blue to red), correspond to features that contributed more significantly to the model's decision. This tool addresses the "black box" nature of DL models and makes their decisions more interpretable. Along with Grad-CAM and LIME plots, the confidence score was used. The confidence value, or the probability score, quantifies the model's level of confidence in its predictions ( 23 ). Higher probabilities or confidence values generally indicate higher confidence by the model, while lower probabilities suggest lower confidence by the model ( 23 ). 
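As an illustration of the confidence value described above, the following Python sketch assumes that the confidence is the softmax probability of the predicted class in a two-class output. The source does not spell out how the score is computed (the models were run in MATLAB), so the function names, the class order and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(logits, class_names=("healthy", "IS")):
    """Return the predicted class and its probability (the 'confidence').

    Assumption: confidence = softmax probability of the winning class.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


# Hypothetical logits for one frame: a strong IS response
label, confidence = predict_with_confidence([-2.0, 3.0])
print(label, round(confidence, 3))
```

With equal logits for both classes, the function returns a confidence of 0.5, the lowest value a two-class softmax output can give for the predicted class.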
A confidence score of 50% means the model regards the two classes as equally likely, whereas a confidence score of 100% indicates that the model has absolute certainty in its prediction. All developed models were tested using the unseen test dataset to generate Grad-CAM and LIME plots along with the confidence score (Fig. 7). Nevertheless, a high level of confidence does not guarantee the accuracy of the prediction. The score solely reflects the model's confidence level derived from the knowledge acquired during the training process. The model's confidence may be very high, yet it can still produce an inaccurate clinical diagnosis, particularly if it has been trained on biased data or if it encounters data that deviates significantly from its training set (29).

Feature fusion technique

The feature fusion process in artificial intelligence (AI) combines information from multiple AI models trained on the same dataset using different ML classifiers (30). It is a powerful technique for enhancing overall performance by incorporating features from different DL models. Its objective is to acquire and merge additional knowledge from multiple models in order to improve the representation of the features extracted from them (31). During the learning stage, the initial layers of each DL model acquire low-level features, such as colours, edges, and forms, while the last layers acquire the high-level features of the object. Consequently, the model's final output features result from this hierarchical learning process, in which complex high-level features are built upon the more fundamental ones. Features are extracted from the bottleneck layers, which are the layers prior to the output layer. These layers are rich in complex features that have been processed through the network and are considered highly informative for the classification task (32).
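To make the fusion idea concrete, the Python sketch below scales the per-frame feature vectors from two models to a comparable range and concatenates them into one fused vector per frame. It is only a sketch under stated assumptions: the source used MATLAB, z-score scaling is an assumed choice of normalisation, and the function names and toy feature values are hypothetical.

```python
from statistics import fmean, pstdev


def zscore_columns(rows, eps=1e-8):
    """Scale each feature (column) to zero mean and unit variance.

    Assumption: z-score scaling stands in for the normalisation step.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(x - m) / (s + eps) for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


def fuse_features(*feature_sets):
    """Normalise each model's features, then concatenate them per frame."""
    if len({len(fs) for fs in feature_sets}) != 1:
        raise ValueError("every feature set must cover the same frames")
    scaled = [zscore_columns(fs) for fs in feature_sets]
    # zip(*scaled) groups the vectors that belong to the same frame
    return [[x for part in parts for x in part] for parts in zip(*scaled)]


# Toy bottleneck features for two frames (hypothetical values and sizes)
features_model_a = [[1.0, 10.0], [3.0, 30.0]]   # 2 features per frame
features_model_b = [[100.0], [300.0]]           # 1 feature per frame
fused = fuse_features(features_model_a, features_model_b)
print(len(fused), len(fused[0]))
```

Scaling before concatenation keeps a model whose activations happen to be numerically larger from dominating the fused representation that the downstream classifier sees.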
Feature fusion is then utilised, where features learned from different models are combined or "fused". After the feature extraction phase, the extracted features undergo a process of normalisation to ensure they are on a comparable scale, followed by concatenation to fuse them into a unified feature space. This combination offers an improved depiction of the features and enables a more thorough representation of the underlying patterns and features in the data. ML classifiers are then trained using the fused features. This method enables ML classifiers to leverage the capabilities and distinctive attributes of each DL model, leading to a better comprehension of the target tasks ( 33 ). The integration of features from different models provides numerous benefits for ML classifiers. The built-in Classification Learner in MATLAB 2023b was utilised to develop ML classifiers, which include linear discriminant analysis, neural networks, coarse KNN, cubic SVM, the boosted tree, and the coarse tree to determine the most efficient classifier for this detection task. The study utilised multiple models for this task, beginning with a comprehensive fusion (F1) involving all models mentioned in scenarios 2 and 3, namely Xception, InceptionResnetV2, and the baseline model. Additionally, a feature fusion (F2) was performed with the two best models identified in scenario 2. Furthermore, two separate feature fusion processes were performed: one between the baseline model and Xception (F3) and another between the baseline model and InceptionResnetV2 (F4), each done individually. Each feature fusion process was followed by training those fusion features by ML classifiers, as shown in Fig. 8 , which are named C1, C2, C3 and C4. Comparison of AI models with medical experts To assess the robustness of our developed models in IS detection, a comparative analysis was conducted against the diagnostic expertise of clinicians, incorporating the whole clip assessment. 
A set of 16 video clips representing our dataset was assessed. These clips were originally evaluated and labelled by our clinical experts (DC and XC) as healthy or with the presence of IS. A blinded review of these video clips was then conducted by two further clinical experts, expert 1 (MS) and expert 2 (CE), using labels 0 and 1 for healthy and IS, respectively. Both experts are part of the senior staff at the Queensland University of Technology, with approximately 15 years of experience, and their assessments provide a comparison between the performance of our DL models and human expert level. The clinical experts' diagnoses served as a reference point for evaluating the performance of the proposed pre-trained models in the three scenarios. Analysing whole LUS clips mimics how LUS clips are usually viewed and assessed, which is considered the current gold standard, and therefore allows a more accurate simulation of clinical diagnostic practice by experts. The conversion to video clip analysis adopts a Simple Majority Voting scheme (SVE) to aggregate individual frame predictions into a single diagnosis for each video clip (34). The class with the highest number of frame predictions is taken as the output prediction for the whole clip. To qualify as the clip diagnosis, a class (healthy or IS) must be predicted in more than 50% of the frames in the video, ensuring that it represents the majority of the video frames. Each clip within our test set was assigned a distinct numerical identifier.
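The majority-voting step described above can be sketched in a few lines of Python. The labels 0 (healthy) and 1 (IS) follow the text; the function name and the handling of an exact tie (returning `None`) are assumptions, since the source does not say what happens when no class exceeds 50%.

```python
from collections import Counter


def clip_diagnosis(frame_predictions):
    """Aggregate per-frame labels (0 = healthy, 1 = IS) into one clip label.

    A class wins only if it is predicted in more than 50% of the frames.
    Assumption: an exact tie returns None, as the source gives no tie rule.
    """
    if not frame_predictions:
        raise ValueError("a clip needs at least one frame prediction")
    label, votes = Counter(frame_predictions).most_common(1)[0]
    if votes * 2 <= len(frame_predictions):
        return None
    return label


# Frame counts reported in the Results for one 120-frame IS clip (case 2-2)
scenario_2_xception = [1] * 112 + [0] * 8
scenario_1_xception = [1] * 35 + [0] * 85
print(clip_diagnosis(scenario_2_xception))  # clip labelled IS (1)
print(clip_diagnosis(scenario_1_xception))  # clip labelled healthy (0)
```

The second call shows how a model that gets most frames wrong flips the clip-level diagnosis even though some frames were classified correctly.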
These identifiers, along with the corresponding ground-truth (GT) labels, were documented in an Excel spreadsheet for recording outputs and analysis. To ensure the integrity of our diagnostic assessment, each expert conducted their evaluations independently. This was facilitated by scheduling their assessments on separate days and within different workspaces, thereby mitigating any potential for bias or undue influence from one another. Upon completion of these assessments, the diagnostic results from each expert were cross-referenced with the GT labels. This comparative analysis enabled us to determine the accuracy of each expert's predictions by identifying correct and false predictions. All developed DL models were evaluated and compared to our clinical experts, using accuracy, sensitivity, and specificity as performance metrics. Additionally, a Receiver Operating Characteristic (ROC) curve, which is created by plotting the true positive rate (TPR) against the false positive rate (FPR), was used in the comparative analysis to discern the strengths and limitations of DL models in IS detection in comparison to our medical experts.

Results

Model performance metrics

The performance of the developed models in the various scenarios was evaluated in terms of accuracy, precision, recall, and F1-score. The GT used to assess the DL models is the label of the entire LUS clip: each clip is assigned a single label representing its overall classification, and the model's predictions for individual frames are evaluated against that label. As shown in Table 2, the Xception model in Scenario 2 outperformed the Scenario 1 model, achieving an accuracy of 95.9%, a precision of 97.1%, a recall of 95.3%, and an F1-score of 96.2%.
The InceptionResnetV2 model in Scenario 2 achieved an accuracy of 95.8%, a specificity of 95.5%, a precision of 96.1%, a recall of 95.8%, and an F1-score of 96.0%. Lastly, the baseline model achieved a specificity of 88.2%, a precision of 89.7%, a recall of 92.5%, and an F1-score of 91.1%. Overall, both Scenario 2 models outperformed the models in Scenarios 1 and 3 in terms of accuracy, precision, recall, and F1-score. However, the baseline model adopted in Scenario 3 outperformed the model in Scenario 1.

Table 2. Performance metrics of all models.

Scenario          Model               Accuracy  Specificity  Precision  Recall  F1-score
Scenario 1 (S1)   Xception            84.6%     85.4%        88.1%      84.2%   86.1%
Scenario 2 (S2)   Xception            95.9%     96.6%        97.1%      95.3%   96.2%
Scenario 2 (S2)   InceptionResnetV2   95.8%     95.5%        96.1%      95.8%   96.0%
Scenario 3 (S3)   Baseline model      90.5%     88.2%        89.7%      92.5%   91.1%

In S1, all 14,902 frames were included, and no filtering was applied. In S2, a filtering technique was applied: frames from IS clips without B-lines were excluded, leaving 10,546 frames. In S3, the S2 filtering criteria were applied, and the model was built from scratch.

Confusion matrix analysis

For the unseen test set, the results of the models are presented in Fig. 9. The Xception model in Scenario 1 made almost four times as many false predictions as the same model in Scenario 2: 315 frames (highlighted in dark orange) versus 84 frames (highlighted in light blue). These results indicate that the Xception model in Scenario 1 had a substantially higher rate of false predictions than the same model in Scenario 2.

Detailed evaluation results

Figures 10, 11, 12, and 13 show a comprehensive overview of the models' performance on the test set of 16 LUS clips, which represents 15% of the LUS dataset used in this study.
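For reference, the metrics in Table 2 follow the standard confusion-matrix definitions. The Python sketch below states them explicitly and cross-checks one reported F1-score against its precision and recall. The confusion counts in the example are hypothetical, because the source does not list the per-model counts here, and the function names are illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts.

    No zero-division guard: every denominator is assumed to be non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def f1_from(precision, recall):
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only
print(binary_metrics(tp=90, fp=10, tn=80, fn=20))

# Cross-check with Table 2: Xception in Scenario 2 (precision 97.1%, recall 95.3%)
print(round(f1_from(0.971, 0.953), 3))
```

The cross-check evaluates to about 0.962, in line with the 96.2% F1-score that Table 2 lists for the Xception model in Scenario 2.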
The figures show model prediction accuracy for each clip, including the count of LUS frames and the true and false predictions for healthy and non-healthy frames. The rationale is to understand how each model performs on individual clips. For example, in case number 2–2 from Fig. 10-b, which is labelled as IS, the Xception model in Scenario 1 correctly identified only 35 out of 120 frames as non-healthy (IS) while incorrectly predicting 85 frames as healthy. In contrast, for the same case, the Xception model in Scenario 2 (Fig. 11-b) performed better, correctly predicting 112 out of 120 frames as non-healthy (IS) and making only 8 incorrect predictions of healthy frames. Furthermore, according to Fig. 12-b, the InceptionResnetV2 model in Scenario 2 correctly predicted 119 out of 120 frames as non-healthy (IS) and made only 1 incorrect prediction in the same case. Additionally, as shown in Fig. 13-b, the baseline model in Scenario 3 correctly identified 55 out of 120 frames as non-healthy (IS) and made 65 false predictions in the same case. This comparison highlights the stronger predictive capability of the pre-trained models in Scenario 2 in distinguishing IS from healthy frames; in this case, the baseline model in Scenario 3 (55 of 120 frames) also outperformed the Scenario 1 model (35 of 120 frames).

Explainability and interpretability

Grad-CAM (Visualisation)

In Fig. 14-A, the heat maps show how the Xception model in Scenario 1 misclassified the IS test sample as healthy, with a confidence value of 78.8%, focusing on areas outside the intended ROI, which is marked with the red box on the input image. This indicates a failure to capture key ROI features. In contrast (Fig. 14-A), both the InceptionResnetV2 and Xception models in Scenario 2 correctly identified the sample with a high confidence score (100%); however, only the Xception model was able to properly detect the correct ROI.
The baseline model also precisely identified the sample as healthy, with a confidence value of 100% (Fig. 14 -A), and accurately detected the correct ROI. All the models correctly identified the test samples with IS according to Fig. 14 B and C; however, the models in Scenario 2 and the baseline model exhibited a higher confidence score compared to the model in Scenario 1. Overall, the Xception and InceptionrestnetV2 models in Scenario 2 and the baseline model in Scenario 3 effectively identified the IS samples with a high level of confidence; however, only the Xception model in Scenario 2 and the baseline model in Scenario 3 focused on the ROI. In Fig. 15 -A, B and C, the heat maps show that all models correctly predicted the test healthy LUS frames. Nonetheless, in Scenario 2, the models precisely recognised healthy samples with a greater confidence score. Specifically, the Xception model achieved 100%, and the InceptionResnetV2 achieved a range of 99.9–100%, surpassing the performance of the Xception model in Scenario 1 (75%-77%) and the baseline model (99.5% − 99.9%). In the most effective Scenario 2, the Grad-CAM for the Xception model displayed a blue-green area (highlighted by a green box), which did not align with the ROI marked by a red box. Conversely, InceptionResnetV2's Grad-CAM accurately focused on the ROI, aligning perfectly with the area marked by the red box. The Grad-CAM visualisation in Fig. 14 reveals that the Xception model in Scenario 2 accurately identifies ROI with a high degree of confidence, precisely matching the ROI in the input images, which is nearly matched by the Grad-CAM visualisation produced by the baseline model. On the other hand, the Grad-CAM visualisation in Fig. 15 highlights the InceptionResnetV2 model's capabilities in Scenario 2, where it not only predicts with accuracy and high confidence but also focuses on the ROI, mirroring the area observed in the ROI in the input images. 
LIME (Interpretability)

Figure 16 presents the LIME comparison of all models in the scenarios, using two examples of true positives. All models correctly identified the test frames as true positives. In Scenario 1, the model predicted the class correctly with a reasonable confidence value (87.5%), and the region with the highest intensity lies within the ROI; however, the LIME visualisation shows a more diffuse pattern. In contrast, the Scenario 2 models demonstrated a marked improvement in confidence, with a more targeted and refined LIME focus on the ROI. These models predict the input frame with a confidence score of 100%. The accompanying LIME visualisation supports the Xception model's confidence by correctly identifying the ROI, as evidenced by the assignment of the maximum intensity value to the appropriate area. However, for InceptionResnetV2 in Scenario 2, the LIME visualisation shows the strongest signal at the top area of the image, as shown in Fig. 16-B. If this area is indeed outside the expected ROI, this would suggest that the model is attributing high importance to features that are not diagnostically relevant for IS. In summary, both the Xception and InceptionResnetV2 models in Scenario 2 predict with a confidence score of 100%, but their explanations differ: the LIME visualisation for the Xception model aligns with the ROI, whereas for the InceptionResnetV2 model the higher-intensity signals in the heatmap are not confined to the ROI and extend to areas outside it. Furthermore, the baseline model in Scenario 3 also predicts the input frame with a confidence score of 100%.
However, the LIME visualisation shows the strongest signal at the top area of the image, as shown in Fig. 16-A.

Figure 17 shows a LIME comparison of all scenarios using two examples of true negatives. All models exhibit high confidence values. The model in Scenario 1 achieved confidence values of 78% and 99.7% in the respective examples, while the Xception model in Scenario 2 achieved a confidence value of 100%. The InceptionResnetV2 model in Scenario 2 achieved confidence values of 99.9% and 100%, respectively, whereas the baseline model in Scenario 3 achieved confidence values of 99.5% and 97.5%, respectively. Despite their closely matched overall confidence values, the models differ in how they localise the ROI and in the intensity values they assign to it. The model in Scenario 1 prioritised features representing healthy attributes with low-intensity values (depicted in blue and light green). In contrast, the models in Scenario 2 captured the crucial 'healthy' features with high-intensity values (shown in red and dark green), so that the high-intensity areas aligned precisely with the ROI.

False predictions

All the false predictions of the Xception model in Scenario 2 were re-evaluated, because this model achieved the highest accuracy of the four models evaluated in S1, S2 and S3. This evaluation aimed not only to capture the model's errors but also to assess rigorously whether the flagged predictions were truly false, thus improving our understanding of the model's diagnostic dependability. A total of 84 frames were reviewed, comprising 33 false positives and 51 false negatives identified by the model. For the 33 false-positive frames, where the Xception model in Scenario 2 predicted "healthy", the clinical expert re-evaluated every frame and categorised them into three classes.
In the first class, 22 frames were classified by our clinicians as healthy, owing to either the absence of B-lines or limited visibility, which matches the prediction of the Xception model in Scenario 2. In the second class, only one frame exhibited potential B-lines. In the third class, 10 frames were classified as non-diagnostic or marked as having limited visibility due to the shadowing caused by ribs. Figure 18 shows examples of the three classes and the confidence values predicted by both models.

In the case of false negatives, where the model predicted "IS" for frames labelled as healthy, the expert re-evaluated 51 frames and divided them into two classes. In the first class, 11 of the 51 frames were determined to be IS by the clinicians, indicating that the model's predictions for these frames were in fact correct. The remaining 40 frames were considered healthy, as there was no evidence of B-lines. Figure 19 shows examples of each of the two classes.

Upon re-evaluation of the 84 false-prediction frames of the Xception model in Scenario 2, the expert agreed with the model on most of the false positives (22 out of 33) but on only 11 of the 51 false negatives. This implies that a single LUS clip can contain both healthy and non-healthy frames and should be evaluated accordingly.

Confidence value assessment

Figure 20 shows examples of IS frames extracted from a sample LUS clip. The model in Scenario 1 correctly predicted four frames as IS, with confidence values of 85%, 92.7%, 99.5% and 98.8%, but mispredicted two frames as healthy. In contrast, both models in Scenario 2 predicted all of the same frames correctly with a confidence value of 100%. Furthermore, the baseline model in Scenario 3 correctly predicted the same frames as IS, with confidence values ranging from 77.3% to 100%.

Fig 20. A comparison of confidence values in Scenarios 1, 2, and 3.
The models in Scenario 2 and Scenario 3 performed better than the model in Scenario 1, which misclassified two frames as healthy. Both models in Scenario 2 achieved the highest accuracy, correctly predicting all frames with high confidence (100%). The model in Scenario 3 also correctly predicted all frames but with lower confidence values (77.3–100%). The two red rectangles mark the frames mislabelled by the model in Scenario 1.

Evaluation of fusion ML classifiers

Multiple ML classifiers were trained using features extracted from the three models (from the 2nd and 3rd scenarios) described previously. Table 3 shows the best 10 ML classifiers in terms of accuracy for each fusion process.

Table 3 The best 10 ML classifiers in terms of accuracy for each fusion process.

| F1 model type | F1 acc | F2 model type | F2 acc | F3 model type | F3 acc | F4 model type | F4 acc |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Binary GLM Logistic Regression | 98.1% | Neural Network | 98.1% | KNN | 96.7% | Neural Network | 95.8% |
| KNN | 97.9% | KNN | 98.1% | KNN | 96.7% | Efficient Logistic Regression | 95.7% |
| Discriminant | 97.9% | KNN | 98.0% | Ensemble | 96.5% | Binary GLM Logistic Regression | 95.6% |
| Ensemble | 97.8% | KNN | 98.0% | Discriminant | 96.3% | Neural Network | 95.3% |
| Neural Network | 97.8% | Neural Network | 98.0% | KNN | 96.3% | Neural Network | 95.2% |
| Neural Network | 97.8% | SVM | 98.0% | SVM | 96.2% | SVM | 95.2% |
| Kernel | 97.7% | KNN | 98.0% | KNN | 96.2% | KNN | 95.2% |
| KNN | 97.6% | Binary GLM Logistic Regression | 97.91% | Binary GLM Logistic Regression | 95.6% | SVM | 95.1% |
| Neural Network | 97.6% | Neural Network | 97.9% | SVM | 95.1% | SVM | 95.1% |
| Efficient Logistic Regression | 97.5% | Neural Network | 97.9% | KNN | 95.1% | KNN | 95.1% |

*F1 = Xception + InceptionResnetV2 + Baseline
*F2 = Xception + InceptionResnetV2
*F3 = Xception + Baseline
*F4 = InceptionResnetV2 + Baseline

Table 4 Summary performance metrics of developed models on the test dataset (20260 frames).
| Fusion process | ML classifier | Accuracy | Specificity | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- | --- |
| F1 | Binary GLM Logistic Regression | 98.2% | 97.0% | 97.3% | 99.2% | 98.2% |
| F2 | Neural Network | 98.2% | 97.9% | 98.2% | 98.4% | 98.3% |
| F3 | KNN | 95.8% | 94.1% | 94.9% | 97.3% | 96.1% |
| F4 | Neural Network | 97.0% | 96.7% | 97.2% | 97.0% | 97.1% |

F1 is a feature fusion of all models mentioned in Scenarios 2 and 3 (Xception, InceptionResnetV2, and the baseline model). F2 is a feature fusion of the two best models from Scenario 2. F3 and F4 are feature fusions of the baseline model with Xception in S2 and of the baseline model with InceptionResnetV2 in S2, respectively.

ML classifiers' performance metrics

The performance of the best ML classifier in each fusion process was evaluated in terms of accuracy, precision, recall, and F1-score. As shown in Table 4, both the Binary GLM Logistic Regression (F1) and the Neural Network (F2) correctly predicted 98.16% of the LUS frames. This gives these models an F1-score, precision and recall of 98.1% each. The KNN model in F3 achieved an accuracy of 95.8% and a precision, recall and F1-score of 95.8%. Lastly, the Neural Network model in F4 achieved an accuracy of 96.8% and a specificity, precision, recall and F1-score of 96.8%. Overall, the ML classifiers in F1 and F2 outperformed those in F3 and F4 in terms of accuracy, precision, recall, and F1-score.

ML classifiers' confusion matrix

For the unseen test set, the results of the ML classifiers are presented in Fig. 21. F1 and F2 had the same number of false predictions, with 38 frames each (highlighted in light green). In contrast, F3 and F4 had many more false predictions, with 86 and 65 frames, respectively (highlighted in light green). These results indicate that F3 and F4 were less accurate than F1 and F2 in classifying the test data.
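For reference, accuracy, specificity, precision, recall and F1-score are all derived from the four cells of a binary confusion matrix such as those in Fig. 21. The sketch below is a minimal illustration of the standard definitions; the function name and the example counts are hypothetical and are not taken from this study, and the choice of which class counts as "positive" is left to the caller.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Guard against empty denominators (e.g. no positive predictions).
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts, chosen only to exercise the formulas.
example = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in example.items():
    print(f"{name}: {value:.1%}")
```

Because the F1-score is the harmonic mean of precision and recall, it can differ from accuracy whenever false positives and false negatives are unbalanced.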
Detailed evaluation of results

Figures 22, 23, 24, and 25 give a comprehensive overview of the ML classifiers' performance on the test set of 16 LUS clips, which represents 15% of the LUS dataset used (the test subset). The figures show the accuracy of model prediction for each clip, including the count of LUS frames and the true and false predictions for healthy and non-healthy frames. For example, in case number 2–2 from Fig. 22-b, which is labelled as IS, the first classifier (C1) correctly identified 109 out of 120 frames as non-healthy (IS) while incorrectly predicting 11 frames as healthy. For the same case number, the second classifier (C2), as indicated in Fig. 23-b, performed better by correctly predicting all frames as non-healthy (IS). According to Fig. 24-b, the third classifier (C3) correctly identified 104 out of 120 frames as non-healthy (IS) and made 16 incorrect predictions in the same case number. Finally, as shown in Fig. 25-b, the fourth classifier (C4) correctly identified only 99 out of 120 frames as non-healthy (IS) and made 21 false predictions in the same case number. This comparison highlights the stronger predictive capability of the fused pre-trained models in the second fusion process (F2), which incorporates features from both Xception and InceptionResnetV2 in Scenario 2, for differentiating between IS and healthy frames.

The experts compared to our models

All developed AI and ML models were evaluated in terms of true and false positives and negatives, as shown in Fig. 26. The GT label used to assess the performance of the DL models and of our experts is the label assigned to the entire LUS clip; every LUS clip is given one label that identifies its classification. Our 1st expert (MS) correctly predicted 75% (12 clips) of the labelled cases within the sample test subset (16 clips) and falsely predicted 25% (4 clips).
In contrast, our 2nd expert (CE) correctly predicted 87.5% (14 clips) of the labelled cases within the sample test subset (16 clips) and falsely predicted 12.5% (2 clips). Among all developed models, the S2 models and the fusion models F1, F2, F3, and F4 achieved the best accuracy: they correctly predicted 100% (16 clips) of the labelled cases within the sample test subset, with no false predictions. This gives these models an F1-score of 100%, a precision of 100% and a recall of 100%. The DL models in S1 and S3 correctly predicted 87.5% (14 clips) of the labelled cases within the sample test subset (16 clips), with 2 false predictions.

Figure 27 shows a spreadsheet capturing the evaluation process and the outcomes of the LUS clip predictions made by our experts and all developed models on the set of 16 clips. The GT labels are noted, and the assessments of each expert and model are compared against them. Each column under the experts and models represents their predictions for the clips, with '1' indicating an IS prediction and '0' representing a healthy clip. The cells highlighted in pink mark the instances where a false prediction was recorded by the corresponding expert or developed model. Overall, the ML classifiers (F1 and F2) and the models in S2 outperformed our experts and the other developed AI and ML models in terms of accuracy, precision, recall, and F1-score. A more detailed display of the performance of the AI and ML models is provided in Table 5. Furthermore, Fig. 28 illustrates the ROC curve, which shows the performance of our experts and the developed models via the true positive rate (TPR) and false positive rate (FPR). The ROC curve shows that the fused models (F1, F2, F3 and F4) and the S2 models significantly outperform our experts and the other developed models.

Table 5. The performance metrics of developed models on 16 LUS clips (8 Healthy and 8 IS).
| Group | Model | Accuracy | Specificity | Precision | Recall | F1-score |
| --- | --- | --- | --- | --- | --- | --- |
| Experts | Expert 1 | 75% | 75% | 75% | 75% | 75% |
| Experts | Expert 2 | 87.5% | 100% | 100% | 75% | 85.71% |
| Scenario 1 | Xception | 87.5% | 87.5% | 87.5% | 87.5% | 87.5% |
| Scenario 2 | Xception | 100% | 100% | 100% | 100% | 100% |
| Scenario 2 | InceptionResnetV2 | 100% | 100% | 100% | 100% | 100% |
| Scenario 3 | Baseline Model | 87.5% | 87.5% | 87.5% | 87.5% | 87.5% |
| F1 | Binary GLM Logistic Regression | 100% | 100% | 100% | 100% | 100% |
| F2 | Neural Network | 100% | 100% | 100% | 100% | 100% |
| F3 | KNN | 100% | 100% | 100% | 100% | 100% |
| F4 | Neural Network | 100% | 100% | 100% | 100% | 100% |

* The metrics highlighted in bold indicate the highest performance achieved among the ML and DL models.

Discussion

The fused models in the F1 and F2 processes, which combined either the models in both S2 and S3 or only the models in S2, show substantial improvements across a range of performance metrics compared with all other developed models. This improvement can be attributed to the filtering techniques applied during the training phase in S2 and to the fusion technique used. Several observations from the preceding results illustrate the influence of the training modifications and the fusion technique.

Enhancement in accuracy

The fused models (F1 and F2) demonstrate a notable increase in accuracy (98.2%) compared with the other models. This improvement indicates that the applied filtering techniques have led to more precise overall predictions. In addition, the fusion models exceed the accuracy of the best individual models in S2 (95.9% and 95.8%).

Diminished false predictions

Importantly, the fused models (F1 and F2) showed a notable reduction in false predictions relative to the individual DL models in S2. This reduction indicates that the filtering techniques, together with the combination of features into fused models, have mitigated the models' tendency to misclassify healthy instances as IS and vice versa.
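Feature fusion, as discussed here, combines the features extracted from several fine-tuned networks before a conventional ML classifier is trained on them. The sketch below illustrates one common way to do this, assuming fusion by simple concatenation of per-frame feature vectors followed by a k-nearest-neighbour vote. The exact fusion operation and classifier settings used in this study are not restated here, so the function names, feature dimensions and values are hypothetical toy choices.

```python
import math
from collections import Counter


def fuse(*feature_sets):
    """Concatenate per-frame feature vectors from several extractors.

    Each argument is a list with one feature vector per frame, in the same
    frame order. Concatenation is an assumption made for this sketch.
    """
    if len({len(vectors) for vectors in feature_sets}) != 1:
        raise ValueError("every extractor must provide one vector per frame")
    fused = []
    for per_frame in zip(*feature_sets):
        row = []
        for vector in per_frame:
            row.extend(vector)
        fused.append(row)
    return fused


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors closest to the query."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features for four training frames from two extractors.
extractor_a = [[0.1, 0.2], [0.0, 0.3], [0.9, 0.8], [1.0, 0.7]]
extractor_b = [[0.2], [0.1], [0.9], [0.8]]
labels = ["healthy", "healthy", "IS", "IS"]

fused = fuse(extractor_a, extractor_b)

# A new frame is fused in the same way before classification.
query = fuse([[0.95, 0.75]], [[0.85]])[0]
print(knn_predict(fused, labels, query, k=3))
```

Concatenation keeps every extractor's features intact and lets the downstream classifier weigh them, at the cost of a longer feature vector; other fusion schemes exist and may behave differently.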
The comparison between the fused models (F1 and F2) and the individual models in S2 revealed a notable decrease in false predictions (both false negatives and false positives) for F1 and F2. The F1 and F2 models each mislabelled only 38 frames, a marked improvement over the 84 frames mislabelled by the Xception model in S2.

Additionally, across the individual models in S1, S2 and S3, the Grad-CAM and LIME visualisations supported our method of excluding healthy frames from the IS frames when training the Xception model in S2, even with the small dataset used, compared with the Xception model in S1. This method had a significant impact on the observed performance differences between the two models. LIME and Grad-CAM visualisations align closely with the clinical decision-making process by emphasising the critical areas that lead to accurate predictions. The comparison of Xception and InceptionResnetV2 in S2 with the baseline model in S3, using LIME and Grad-CAM, highlighted the efficacy of our suggested TL technique: with the same training dataset, both models in S2 performed better than the baseline model trained from scratch.

The decision to exclude healthy frames from the non-healthy class had a discernible impact on model performance in S2 and S3. By eliminating the potential confusion caused by healthy frames, the models became better at distinguishing between healthy and IS frames, resulting in higher accuracy and precision. This modification reduced the number of false positives among the non-healthy cases and so contributed to the improved accuracy, precision and overall effectiveness of the DL models in S2 and S3 for IS-aided diagnosis.
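The frame-exclusion step discussed above can be pictured as a filter applied when the training set is assembled. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it assumes each frame carries both the label of its clip and a frame-level review label, and the field names (`frame_id`, `clip_label`, `frame_label`) and toy records are hypothetical.

```python
def select_training_frames(frames, exclude_healthy_from_is=True):
    """Return (frame_id, label) pairs to train on.

    Without filtering, every frame inherits the label of its clip. With
    filtering, frames from IS clips that were reviewed as healthy are
    dropped instead of being trained on as IS.
    """
    selected = []
    for frame in frames:
        from_is_clip = frame["clip_label"] == "IS"
        looks_healthy = frame["frame_label"] == "healthy"
        if exclude_healthy_from_is and from_is_clip and looks_healthy:
            continue  # healthy-looking frame inside an IS clip: leave it out
        selected.append((frame["frame_id"], frame["clip_label"]))
    return selected


# Hypothetical frame records: one IS clip (three frames) and one healthy clip.
toy_frames = [
    {"frame_id": "c1_f1", "clip_label": "IS", "frame_label": "IS"},
    {"frame_id": "c1_f2", "clip_label": "IS", "frame_label": "healthy"},
    {"frame_id": "c1_f3", "clip_label": "IS", "frame_label": "IS"},
    {"frame_id": "c2_f1", "clip_label": "healthy", "frame_label": "healthy"},
]

unfiltered = select_training_frames(toy_frames, exclude_healthy_from_is=False)
filtered = select_training_frames(toy_frames)
print(len(unfiltered), len(filtered))
```

Dropping the ambiguous frames, rather than relabelling them as healthy, keeps the healthy class limited to frames from clips labelled healthy; relabelling would be a different design choice.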
In S1, the Xception model, which did not involve any frame exclusion, reached an accuracy of 84.6%. In contrast, the Xception model in S2, which applied filtering techniques, reached an overall accuracy of 95.9%. This underlines the substantial benefit of filtering the training dataset rather than using all frames in the training process.

Furthermore, on the evaluation subset of 16 clips, the fused models (F1, F2, F3 and F4) and the S2 models outperformed our experts in classifying healthy and IS LUS clips. This suggests that the developed AI and ML models achieve high accuracy, precision, recall, and F1-score in detecting healthy and IS cases from LUS frames. Expert 1 (MS) had a high rate of false predictions (25%, i.e. 4 clips), which indicates that distinguishing between healthy and IS LUS clips in a video-level assessment poses some challenges. By comparison, Expert 2 (CE) showed higher diagnostic accuracy, with false predictions for only 12.5% of the clips (2 out of 16).

In our assessment of IS detection, a case of diagnostic disagreement emerged, highlighting the challenges in LUS interpretation. Our clinical experts from Melbourne (DC and XC) initially classified each LUS clip as indicative of a healthy or an IS lung condition. This initial assessment was based on the IS criteria from the international evidence-based recommendations (26) and on their clinical expertise, reflecting their view at the time of acquisition and drawing on multiple zones of the patient's lung. When Expert 1 (MS) and Expert 2 (CE) re-assessed the LUS clips using only one clip from a single zone within the lung, a significant discrepancy was observed.
In the instances where Expert 2 (CE) incorrectly predicted clips, Expert 1 (MS) made the same errors relative to the initial labels assigned by our experts DC and XC, which served as the GT. These false predictions involved two clips, each belonging to a pair of clips from a different patient; across the resulting four videos, each pair included one clip that was correctly identified and one that was not. This pattern of mislabelling by both experts, involving the same clip for each patient, highlights the difficulty of consistent video interpretation and the complexity inherent in diagnosing from LUS clips with a limited field of view or poor image quality and without prior knowledge of the patient history.

This disagreement among experts shows the subjectivity and variability that characterise the interpretation of LUS clips, particularly with artefacts related to B-lines. Such variability arises from differing expert interpretations of the origin of B-lines (whether or not they emanate from the pleural line) and of how to quantify them. A constrained field of view and poor-quality LUS clips contribute further to these discrepancies. In contrast, most of the developed AI and ML models correctly predicted all test clips.

Given the results shown by the fused models, it is relevant to situate our study within the ongoing research effort between Melbourne University and QUT researchers on the evaluation of LUS pathology using AI tools. Our collaborative research team has recently proposed a fully automated LUS evaluation for lung pathologies, including pleural effusion, atelectasis (collapse), consolidation, interstitial syndrome, and pneumothorax. As part of this collaboration, a study by Tsai et al.
(4) achieved 92% accuracy in classifying pleural effusion using a DL model consisting of a Regularised Spatial Transformer Network (Reg-STN). A follow-up study by Durrani et al. (2) demonstrated DL's potential for diagnosing pulmonary consolidation or collapse with 89% accuracy. Our current work has produced a best-trained ML model that achieved a test accuracy of 98.2%. This is a significant milestone in our work, demonstrating for the first time the use of pre-trained CNNs with a feature extraction and fusion method to develop a diagnostic tool for IS screening in LUS frames. Importantly, the model's proficiency extends to the analysis of LUS video clips, which strengthens not only its diagnostic accuracy on LUS frames but also its utility in clinical settings.

A notable limitation of this study is that we used only a small test set (16 clips) to compare our experts' performance with that of the developed models. This set may not be representative of the general population or of the different settings where LUS imaging is performed. Furthermore, only two experts evaluated these test clips. Further studies are therefore needed to compare expert and model performance on larger and more diverse LUS datasets, with more LUS experts involved.

Another limitation arises from using DL models that were initially pre-trained for general-purpose image classification on natural images (an out-of-domain dataset). We then retrained these models on our specific LUS dataset (the target dataset). Because of the inherent differences between the original dataset (natural images) and our target dataset (LUS images), the DL models produced some inaccuracies, specifically in the Grad-CAM visualisations of healthy examples with the Xception model in S2 (Fig. 14).
In future work, we plan to enhance the model's performance by pre-training it on a source dataset from the same domain (US images), incorporating a larger number of unlabelled US images. Following this, we will fine-tune the model on a small, representative, labelled LUS dataset categorised into multiple classes, in order to identify multiple LUS pathologies.

Another area for improvement is that the labels of the LUS clips used in this study are based only on US medical reports, without the ability to correlate them with CT findings. In future work, we will aim to access a larger dataset that includes both LUS and CT scans as well as the patients' medical reports, because a CT scan is considered the gold standard for diagnosing IS. Comparing the model predictions with the medical reports and the scan findings from both imaging modalities will help give clinical experts the reassurance needed to trust this technology.

Conclusion and future work

The pre-trained models utilised in this study function effectively as an automated tool for recognising B-lines and identifying IS in LUS video frames. In addition, the fusion models built from features extracted from those models outperformed the individual DL models in terms of accuracy. Future studies in this area could further enhance the applicability and reliability of the proposed CNN models with the TL and feature-fusion approach for other lung diseases. Firstly, expanding the dataset size and diversity could help validate the model's generalisability across different patient populations, diseases, and imaging conditions. Additionally, investigating the model's performance in distinguishing between different types of IS could provide valuable insights into its potential clinical utility for other LUS disorders.
In conclusion, the current study shows promising results in IS screening from LUS frames using pre-trained models, and Grad-CAM and LIME can explain where the model is looking when it makes its decisions; nevertheless, future research should focus on expanding the dataset and performing rigorous validation across diverse LUS datasets. This progressive approach will contribute to the establishment of an accurate, reliable, and clinically valuable tool for the diagnosis and management of LUS disorders.

Declarations

Acknowledgements
Not applicable.

Author contributions
KM (the main coauthor) wrote the main manuscript text and performed the training, validation, and testing. MA, DF, DV and JD provided critical revisions and approved the final version. MS and CE performed clinical image validation. DC, XC, AR, CR and KH contributed to data collection and clinical discussions.

Funding
The present work has no funding.

Availability of data and materials
The datasets used during this study are not publicly available but can be obtained from the corresponding author(s) on reasonable request.

Ethics approval and consent to participate
Written informed consent was obtained from all patients who participated in this study. The study was approved by the Melbourne Health Human Research Ethics Committee (HREC/18/MH/269) and registered with (ANZCTR) (25).

Consent for publication
Not applicable.

Competing interests
The authors declare no competing interests.

References

Wang J, Yang X, Zhou B, Sohn JJ, Zhou J, Jacob JT, et al. Review of Machine Learning in Lung Ultrasound in COVID-19 Pandemic. J Imaging. 2022;8(3). Durrani N, Vukovic D, van der Burgt J, Antico M, van Sloun RJG, Canty D, et al. Automatic deep learning-based consolidation/collapse classification in lung ultrasound images for COVID-19 induced pneumonia. Sci Rep. 2022;12(1):17581–17581. Vukovic D, Wang A, Antico M, Steffens M, Ruvinov I, van Sloun RJ, et al. Automatic deep learning-based pleural effusion segmentation in lung ultrasound images.
BMC Med Inform Decis Mak [Internet]. 2023 Nov 29 [cited 2023 Dec 9];23(1):274. Available from: https://doi.org/10.1186/s12911-023-02362-6 Tsai CH, Van Der Burgt J, Vukovic D, Kaur N, Demi L, Canty D, et al. Automatic deep learning-based pleural effusion classification in lung ultrasound images for respiratory pathology diagnosis. Phys Med. 2021;83:38–45. Barros B, Lacerda P, Albuquerque C, Conci A. Pulmonary COVID-19: Learning Spatiotemporal Features Combining CNN and LSTM Networks for Lung Ultrasound Video Classification. Sensors [Internet]. 2021 Aug 14 [cited 2023 Mar 2];21(16):5486. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8401701/ Yu R, Tian Y, Gao J, Liu Z, Wei X, Jiang H, et al. Feature discretization-based deep clustering for thyroid ultrasound image feature extraction. Comput Biol Med. 2022 Jul 1;146:105600. Yan JH, Pan L, Gao YB, Cui GH, Wang YH. Utility of lung ultrasound to identify interstitial lung disease: An observational study based on the STROBE guidelines. Medicine (Baltimore). 2021;100(12):e25217. Camacho J, Muñoz M, Genovés V, Herraiz JL, Ortega I, Belarra A, et al. Artificial Intelligence and Democratization of the Use of Lung Ultrasound in COVID-19: On the Feasibility of Automatic Calculation of Lung Ultrasound Score. Int J Transl Med. 2022;2(1):17–25. Volpicelli G, Fraccalini T, Cardinale L. Lung ultrasound: are we diagnosing too much? Ultrasound J [Internet]. 2023 Mar 29 [cited 2023 Apr 17];15(1):17. Available from: https://doi.org/10.1186/s13089-023-00313-w Smargiassi A, Zanforlin A, Perrone T, Buonsenso D, Torri E, Limoli G, et al. Vertical Artifacts as Lung Ultrasound Signs: Trick or Trap? Part 2- An Accademia di Ecografia Toracica Position Paper on B-Lines and Sonographic Interstitial Syndrome. J Ultrasound Med [Internet]. 2023 [cited 2023 Sep 6];42(2):279–92. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1002/jum.16116 Zanza C, Saglietti F, Tesauro M, Longhitano Y, Savioli G, Balzanelli MG, et al. 
Cardiogenic Pulmonary Edema in Emergency Medicine. Adv Respir Med [Internet]. 2023 Oct [cited 2024 May 24];91(5):445–63. Available from: https://www.mdpi.com/2543-6031/91/5/34 Wang Y, Gargani L, Barskova T, Furst DE, Cerinic MM. Usefulness of lung ultrasound B-lines in connective tissue disease-associated interstitial lung disease: A literature review. Arthritis Res Ther. 2017;19(1):206–206. Baloescu C, Toporek G, Kim S, McNamara K, Liu R, Shaw MM, et al. Automated Lung Ultrasound B-Line Assessment Using a Deep Learning Algorithm. IEEE Trans Ultrason Ferroelectr Freq Control [Internet]. 2020 Nov [cited 2024 May 20];67(11):2312–20. Available from: https://ieeexplore.ieee.org/document/9116812 Baloescu C, Rucki AA, Chen A, Zahiri M, Ghoshal G, Wang J, et al. Machine Learning Algorithm Detection of Confluent B-Lines. Ultrasound Med Biol. 2023;49(9):2095–102. Born J, Wiedemann N, Cossio M, Buhre C, Brändle G, Leidermann K, et al. Accelerating Detection of Lung Pathologies with Explainable Ultrasound Image Analysis. Appl Sci [Internet]. 2021 Jan [cited 2023 Mar 2];11(2):672. Available from: https://www.mdpi.com/2076-3417/11/2/672 Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, et al. Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. J Big Data. 2021;8(1):53–53. Alzubaidi L, Duan Y, Al-Dujaili A, Ibraheem IK, Alkenani AH, Santamaría J, et al. Deepening into the suitability of using pre-trained models of ImageNet against a lightweight convolutional neural network in medical imaging: an experimental study. PeerJ Comput Sci [Internet]. 2021 Sep 28 [cited 2023 Aug 28];7:e715. Available from: https://peerj.com/articles/cs-715 Tajbakhsh N, Shin JY, Gurudu SR, Hurst RT, Kendall CB, Gotway MB, et al. Convolutional Neural Networks for Medical Image Analysis: Full Training or Fine Tuning? IEEE Trans Med Imaging. 2016;35(5):1299–312. Alzubaidi L, Al-Amidie M, Al-Asadi A, Humaidi AJ, Al-Shamma O, Fadhel MA, et al. 
Novel transfer learning approach for medical imaging with limited labeled data. Cancers. 2021;13(7):1590-. Alammar Z, Alzubaidi L, Zhang J, Li Y, Lafta W, Gu Y. Deep Transfer Learning with Enhanced Feature Fusion for Detection of Abnormalities in X-ray Images. Cancers [Internet]. 2023 Jan [cited 2023 Aug 8];15(15):4007. Available from: https://www.mdpi.com/2072-6694/15/15/4007 Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going Deeper with Convolutions. arXiv.org. 2014; Ribeiro MT, Singh S, Guestrin C. ‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier [Internet]. arXiv; 2016 [cited 2024 Jan 31]. Available from: http://arxiv.org/abs/1602.04938 Zhang Y, Liao QV, Bellamy RKE. Effect of Confidence and Explanation on Accuracy and Trust Calibration in AI-Assisted Decision Making. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency [Internet]. 2020 [cited 2024 Feb 5]. p. 295–305. Available from: http://arxiv.org/abs/2001.02114 Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Int J Comput Vis [Internet]. 2020 Feb [cited 2024 Jan 31];128(2):336–59. Available from: http://arxiv.org/abs/1610.02391 Effect of a Multiorgan Focused Clinical Ultrasonography on Length of Stay in Patients Admitted With a Cardiopulmonary Diagnosis: A Randomized Clinical Trial | Pulmonary Medicine | JAMA Network Open | JAMA Network [Internet]. [cited 2024 May 27]. Available from: https://jamanetwork.com/journals/jamanetworkopen/fullarticle/2787284 Volpicelli G, Elbarbary M, Blaivas M, Lichtenstein DA, Mathis G, Kirkpatrick AW, et al. International evidence-based recommendations for point-of-care lung ultrasound. Intensive Care Med. 2012 Apr;38(4):577–91. Szegedy C, Ioffe S, Vanhoucke V, Alemi A. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning [Internet]. arXiv; 2016 [cited 2024 Jan 31]. 
Available from: http://arxiv.org/abs/1602.07261 Chollet F. Xception: Deep Learning with Depthwise Separable Convolutions [Internet]. arXiv; 2017 [cited 2024 Jan 31]. Available from: http://arxiv.org/abs/1610.02357 Rechkemmer A, Yin M. When Confidence Meets Accuracy: Exploring the Effects of Multiple Performance Indicators on Trust in Machine Learning Models. In: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems [Internet]. New York, NY, USA: Association for Computing Machinery; 2022 [cited 2024 Feb 4]. p. 1–14. (CHI ’22). Available from: https://dl.acm.org/doi/10.1145/3491102.3501967 Mungoli N. Adaptive Feature Fusion: Enhancing Generalization in Deep Learning Models [Internet]. arXiv; 2023 [cited 2024 Feb 7]. Available from: http://arxiv.org/abs/2304.03290 Alzubaidi L, Fadhel MA, Albahri AS, Salhi A, Gupta A, Gu Y. Domain Adaptation and Feature Fusion for the Detection of Abnormalities in X-Ray Forearm Images. In: 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) [Internet]. Sydney, Australia: IEEE; 2023 [cited 2024 Mar 12]. p. 1–5. Available from: https://ieeexplore.ieee.org/document/10340309/ Elharrouss O, Akbari Y, Almaadeed N, Al-Maadeed S. Backbones-Review: Feature Extraction Networks for Deep Learning and Deep Reinforcement Learning Approaches [Internet]. arXiv; 2022 [cited 2024 Mar 14]. Available from: http://arxiv.org/abs/2206.08016 Classification - MATLAB & Simulink - MathWorks Australia [Internet]. [cited 2024 Feb 6]. Available from: https://au.mathworks.com/help/stats/classification.html Khan U, Smargiassi A, Inchingolo R, Demi L. A Novel Weighted Majority Voting-Based Ensemble Framework for Lung Ultrasound Pattern Classification in Pneumonia Patients. In: 2023 IEEE International Ultrasonics Symposium (IUS) [Internet]. 2023 [cited 2024 Mar 27]. p. 1–4. 
Available from: https://ieeexplore.ieee.org/abstract/document/10308194
Additional Declarations No competing interests reported.
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-4487345","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":309059597,"identity":"4054783b-7b40-486c-94b5-26958cc04289","order_by":0,"name":"Khalid
Moafa","email":"","orcid":"","institution":"School of Clinical Sciences, Queensland University of Technology, Gardens Point Campus, Brisbane, QLD 4000","correspondingAuthor":true,"prefix":"","firstName":"Khalid","middleName":"","lastName":"Moafa","suffix":""},{"id":309059598,"identity":"3fc9ba3a-3be1-453d-9fea-38bc6974616e","order_by":1,"name":"Maria Antico","email":"","orcid":"","institution":"Australian e-Health Research Centre, The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Brisbane","correspondingAuthor":false,"prefix":"","firstName":"Maria","middleName":"","lastName":"Antico","suffix":""},{"id":309059599,"identity":"75cdec42-ccfc-4190-87c2-9bb0338ea758","order_by":2,"name":"Damjan Vukovic","email":"","orcid":"","institution":"School of Clinical Sciences, Queensland University of Technology, Gardens Point Campus, Brisbane, QLD 4000","correspondingAuthor":false,"prefix":"","firstName":"Damjan","middleName":"","lastName":"Vukovic","suffix":""},{"id":309059600,"identity":"63040216-1965-45fb-b67c-5513a4481235","order_by":3,"name":"Christopher Edwards","email":"","orcid":"","institution":"School of Clinical Sciences, Queensland University of Technology, Gardens Point Campus, Brisbane, QLD
4000","correspondingAuthor":false,"prefix":"","firstName":"Christopher","middleName":"","lastName":"Edwards","suffix":""},{"id":309059601,"identity":"c01fb0af-dd74-40ee-9ecd-6dc06dd86861","order_by":4,"name":"David Canty","email":"","orcid":"","institution":"Department of Surgery (Royal Melbourne Hospital), University of Melbourne, Royal Parade, Parkville, VIC 3050","correspondingAuthor":false,"prefix":"","firstName":"David","middleName":"","lastName":"Canty","suffix":""},{"id":309059602,"identity":"bbbf8469-5422-47a6-b687-1cf19c289c00","order_by":5,"name":"Ximena Cid Serra","email":"","orcid":"","institution":"Department of Surgery (Royal Melbourne Hospital), University of Melbourne, Royal Parade, Parkville, VIC 3050","correspondingAuthor":false,"prefix":"","firstName":"Ximena","middleName":"Cid","lastName":"Serra","suffix":""},{"id":309059603,"identity":"d70c6a66-d403-4de7-bf92-56ace79af056","order_by":6,"name":"Alistair Royse","email":"","orcid":"","institution":"Department of Surgery (Royal Melbourne Hospital), University of Melbourne, Royal Parade, Parkville, VIC 3050","correspondingAuthor":false,"prefix":"","firstName":"Alistair","middleName":"","lastName":"Royse","suffix":""},{"id":309059604,"identity":"703c08ec-6a98-4541-bb44-1f3982313b3c","order_by":7,"name":"Colin Royse","email":"","orcid":"","institution":"Department of Surgery (Royal Melbourne Hospital), University of Melbourne, Royal Parade, Parkville, VIC 3050","correspondingAuthor":false,"prefix":"","firstName":"Colin","middleName":"","lastName":"Royse","suffix":""},{"id":309059605,"identity":"6a065e91-701e-4119-8252-c5b263379354","order_by":8,"name":"Kavi Haji","email":"","orcid":"","institution":"Department of Surgery (Royal Melbourne Hospital), University of Melbourne, Royal Parade, Parkville, VIC 
3050","correspondingAuthor":false,"prefix":"","firstName":"Kavi","middleName":"","lastName":"Haji","suffix":""},{"id":309059606,"identity":"90867b39-733d-4a1c-920a-c3721faba7c2","order_by":9,"name":"Jason Dowling","email":"","orcid":"","institution":"Australian e-Health Research Centre, The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Brisbane","correspondingAuthor":false,"prefix":"","firstName":"Jason","middleName":"","lastName":"Dowling","suffix":""},{"id":309059608,"identity":"a1cecf1a-9ada-43e6-ae2f-ca6467125180","order_by":10,"name":"Marian Steffens","email":"","orcid":"","institution":"School of Clinical Sciences, Queensland University of Technology, Gardens Point Campus, Brisbane, QLD 4000","correspondingAuthor":false,"prefix":"","firstName":"Marian","middleName":"","lastName":"Steffens","suffix":""},{"id":309059615,"identity":"44ab2d1f-e21d-4bbf-b6d7-88ab79d930b0","order_by":11,"name":"Davide Fontanarosa","email":"","orcid":"","institution":"School of Clinical Sciences, Queensland University of Technology, Gardens Point Campus, Brisbane, QLD 4000","correspondingAuthor":false,"prefix":"","firstName":"Davide","middleName":"","lastName":"Fontanarosa","suffix":""}],"badges":[],"createdAt":"2024-05-28 01:00:14","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-4487345/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-4487345/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":58595378,"identity":"968454b4-82aa-4b68-bb74-1a57f367ca26","added_by":"auto","created_at":"2024-06-18 16:38:14","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":290273,"visible":true,"origin":"","legend":"\u003cp\u003eThe Six lung scanning zones; (A) Displaying Right Anterior (RANT) and Left Anterior (LANT) views; (B) Encompassing Left Posterior Upper (LPU), Left Posterior Lower (LPL), Right Posterior Upper (RPU), and Right Posterior Lower (RPL) views; 
(C) Providing a posterior view for LPU, RPU, RPL, and LPL.\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/6d8cf38faca06420ebb92ad8.png"},{"id":58595397,"identity":"f86ab9a7-8dc1-4475-a36d-64e60fe1b14e","added_by":"auto","created_at":"2024-06-18 16:38:16","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":46573,"visible":true,"origin":"","legend":"\u003cp\u003eExample of data in the medical report, indicating 6 LUS scanning regions corresponding with each LUS pathology. In this example, regions marked with IS. The right side of the image corresponds to the left side of patient (LANT and LPL), while the left side of the image corresponds to the right side of patient (RANT and RPL).\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/8157346db1c94d55835e76c0.png"},{"id":58596331,"identity":"1e44925d-49fd-48bf-9c18-585faad23e15","added_by":"auto","created_at":"2024-06-18 16:46:14","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":168469,"visible":true,"origin":"","legend":"\u003cp\u003eDistribution of patient numbers, number of clips, and number of frames, along with the process of reviewing, exclusion, and finalizing training, validation, and testing sets for both Dataset 1 and Dataset 2.\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/23267313ac7f8bbf6bf1da80.png"},{"id":58595386,"identity":"f4159828-f0c0-46ef-8ab5-e05b67d8093d","added_by":"auto","created_at":"2024-06-18 16:38:15","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":46283,"visible":true,"origin":"","legend":"\u003cp\u003eThe displayed images are LUS examples of healthy and IS frames. 
On the right is a non-healthy frame with B-lines indicating non-healthy lungs (vertical Lines). On the left is a healthy frame showing A-lines (Horizontal Lines).\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/4600ce9a55ed3630844a8447.png"},{"id":58595382,"identity":"fd9765ba-81f5-4c87-9772-a15ab2cae495","added_by":"auto","created_at":"2024-06-18 16:38:15","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":148099,"visible":true,"origin":"","legend":"\u003cp\u003eSummary of recommendations for diagnosing IS and healthy LUS clips. \u003cstrong\u003eA\u003c/strong\u003e, IS (Yellow) is characterized by the presence of three B-lines, which are associated with four main features: B-lines move with lung sliding and lung pulse, reach the bottom of the screen without fading, and exhibit a laser-like appearance. Conversely, \u003cstrong\u003eB\u003c/strong\u003e, \u0026nbsp;a healthy condition (Green) is characterized by the presence of A-lines, with a clear linear pleural line and lung sliding during inspiration.\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/a60089951c99b512bf05d1ed.png"},{"id":58595379,"identity":"32350329-9cc0-4df0-a1c1-1a6855786467","added_by":"auto","created_at":"2024-06-18 16:38:14","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":109426,"visible":true,"origin":"","legend":"\u003cp\u003eThe flowchart demonstrating the process of model customization, training, and evaluation used in the study.\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/34c408cfaef5021df0adc51c.png"},{"id":58595383,"identity":"8fafeb55-8016-4bfc-b647-ba273d57c240","added_by":"auto","created_at":"2024-06-18 16:38:15","extension":"png","order_by":7,"title":"Figure 
7","display":"","copyAsset":false,"role":"figure","size":90943,"visible":true,"origin":"","legend":"\u003cp\u003eExamples of Grad-CAM and LIME plots. These plots show clearly how the model focuses on the ROI and what features and regions are important for the model’s prediction. In the case of Grad-CAM, the heatmap highlights the regions that most strongly influenced the prediction, with red zones representing the elements the model focused on most. For LIME, features with higher scores, and thus more intense colours, are the ones that the model considered more important in making its decision. For example, green and red coloured areas (3-10\u003csup\u003e-5\u003c/sup\u003e) indicate the most important areas that support the decision made by the model. Conversely, blue areas (0-3) signify the less important features of the model's decision. In addition, at the top of the visualisation plots, the prediction comes with a confidence score (100%) which indicates the model’s certainty about its prediction.\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/74147a69ee57d48f80032847.png"},{"id":58595395,"identity":"2c1deb3a-403a-4fd3-bfa7-d0cc97900c54","added_by":"auto","created_at":"2024-06-18 16:38:16","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":128063,"visible":true,"origin":"","legend":"\u003cp\u003eFour fusion processes (F1, F2, F3, F4) that combine different models’ features from scenario 2 and scenario 3 are shown: F1 (Feature fusion of Xception, InceptionResnetV2, and baseline model), F2 (Feature fusion of Xception and InceptionResnetV2), F3 (fusion of baseline model and Xception), and F4 (fusion of baseline model and InceptionResnetV2).
Features extracted from these models (F1, F2, F3, F4) are pooled together and fed into various machine learning (ML) classifiers such as linear discriminant analysis, neural networks, KNN, cubic SVM, the boosted tree, and the coarse tree. These classifiers then make final predictions (C1, C2, C3, C4).\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/c85eaca906b2c103e51f6c40.png"},{"id":58595393,"identity":"d61f8595-710e-4f7f-a060-a0cd898509b0","added_by":"auto","created_at":"2024-06-18 16:38:16","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":47223,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion matrices of the developed models on the test set with two different training techniques: from left, Scenario 1 with 315 false predictions; middle, Scenario 2, Xception model with 84 false predictions and InceptionResnetV2 model with 88 false predictions; and Scenario 3, baseline model with 195 false predictions.\u003c/p\u003e","description":"","filename":"9.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/74f67b5770676b581e84d434.png"},{"id":58596332,"identity":"31c3820c-8819-4489-a691-2ce26c1a5191","added_by":"auto","created_at":"2024-06-18 16:46:15","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":70333,"visible":true,"origin":"","legend":"\u003cp\u003eTesting results for Scenario 1 (Xception model), involving 16 LUS clips for both healthy and IS cases, with true and false predictions of LUS frames and their percentage within LUS frames.
The test subset consisted of 4 cases classified as healthy (each case containing 2 LUS clips) and 4 cases with IS (each case containing 2 LUS clips).\u003c/p\u003e","description":"","filename":"10.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/f4839c69763bb739f6e8d7ee.png"},{"id":58595394,"identity":"fdcf1c39-0673-4572-9117-83e8611d6209","added_by":"auto","created_at":"2024-06-18 16:38:16","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":71887,"visible":true,"origin":"","legend":"\u003cp\u003eTesting results for Scenario 2 (Xception model), involving 16 LUS clips for both healthy and IS cases, with true and false predictions of LUS frames and their percentage within LUS frames. The test subset consisted of 4 cases classified as healthy (each case containing 2 LUS clips) and 4 cases with IS (each case containing 2 LUS clips).\u003c/p\u003e","description":"","filename":"11.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/6ecf62bdad0aedca17e606a2.png"},{"id":58595389,"identity":"d36a2b4f-fd6b-4c5b-912f-bcf4822f38df","added_by":"auto","created_at":"2024-06-18 16:38:15","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":98020,"visible":true,"origin":"","legend":"\u003cp\u003eTesting results for Scenario 2 (InceptionResnetV2 model), involving 16 LUS clips for both healthy and IS cases, with true and false predictions of LUS frames and their percentage within LUS frames. 
The test subset consisted of 4 cases classified as healthy (each case containing 2 LUS clips) and 4 cases with IS (each case containing 2 LUS clips).\u003c/p\u003e","description":"","filename":"12.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/e7c7c59083d1e8cdc6a3df34.png"},{"id":58596336,"identity":"c5237639-631c-4679-9832-89fb0fcdb64f","added_by":"auto","created_at":"2024-06-18 16:46:16","extension":"png","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":75126,"visible":true,"origin":"","legend":"\u003cp\u003eTesting results for Scenario 3 (Baseline Model), involving 16 LUS clips for both healthy and IS cases, with true and false predictions of LUS frames and their percentage within LUS frames. The test subset consisted of 4 cases classified as healthy (each case containing 2 LUS clips) and 4 cases with IS (each case containing 2 LUS clips).\u003c/p\u003e","description":"","filename":"13.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/01203cb9fee53f69767c5c32.png"},{"id":58595398,"identity":"2f68c84a-ece6-4e5a-8381-64813f96132f","added_by":"auto","created_at":"2024-06-18 16:38:16","extension":"png","order_by":14,"title":"Figure 14","display":"","copyAsset":false,"role":"figure","size":352867,"visible":true,"origin":"","legend":"\u003cp\u003eVisualisation of Grad-CAM and confidence values for the LUS frames predicted as true positives for all models. The red box in the input image indicates the intended Region of Interest (ROI). In each scenario, the images display the Grad-CAM with a red box highlighting the ROI.
A red cross is shown if the Grad-CAM does not align with the ROI, and a green checkmark is shown if it does.\u003c/p\u003e","description":"","filename":"14.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/547412c79738c73e3d704fca.png"},{"id":58595387,"identity":"63c7dd6f-8fe4-472b-aac5-8beb2bb78ba9","added_by":"auto","created_at":"2024-06-18 16:38:15","extension":"png","order_by":15,"title":"Figure 15","display":"","copyAsset":false,"role":"figure","size":357623,"visible":true,"origin":"","legend":"\u003cp\u003eVisualisation of Grad-CAM and confidence values for the LUS frames predicted as true negatives for all models. The red box within the input image represents the intended Region of Interest (ROI), and the green box represents the area on which the Xception model in Scenario 2 focuses. In each scenario, the images display the Grad-CAM with a red box highlighting the ROI. A red cross is shown if the Grad-CAM does not align with the ROI, and a green checkmark is shown if it does.\u003c/p\u003e","description":"","filename":"15.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/003a2fb07cc8bfc040fc92b9.png"},{"id":58595388,"identity":"0e82debe-f22a-4ed4-877f-5f569d4c27de","added_by":"auto","created_at":"2024-06-18 16:38:15","extension":"png","order_by":16,"title":"Figure 16","display":"","copyAsset":false,"role":"figure","size":288543,"visible":true,"origin":"","legend":"\u003cp\u003eVisualization of LIME for the LUS frames predicted as true positives shows where the models focus on different areas of test LUS frames; A, Scenario 1 focuses on multiple areas, marked with green and red colours, while Scenario 2 focuses on IS features (yellow arrows) within the intended ROI (marked with the red box on the input image); B, shows the focus on the intended ROI by both models, but Scenario 2 focuses more on the yellow arrow, which matches the
ROI.\u003c/p\u003e","description":"","filename":"16.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/59d17c29fa8f7d52a39d5f2c.png"},{"id":58595385,"identity":"7bbc229c-6863-4c76-a9a8-3cc9c4771d14","added_by":"auto","created_at":"2024-06-18 16:38:15","extension":"png","order_by":17,"title":"Figure 17","display":"","copyAsset":false,"role":"figure","size":241488,"visible":true,"origin":"","legend":"\u003cp\u003eVisualization of LIME for the LUS frames predicted as true positives shows where the models focus on different areas of test LUS frames; A, Scenario 1 focuses on multiple areas, marked with green and red colours, while Scenario 2 focuses on IS features (yellow arrows) within the intended ROI (marked with the red box on the input image); B, shows the focus on the intended ROI by both models, but Scenario 2 focuses more on the yellow arrow, which matches the ROI.\u003c/p\u003e","description":"","filename":"17.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/0a38f8937f1332aa80f1ebd5.png"},{"id":58596335,"identity":"3e532282-0014-4532-bdc8-39ecc2a8c719","added_by":"auto","created_at":"2024-06-18 16:46:16","extension":"png","order_by":18,"title":"Figure 18","display":"","copyAsset":false,"role":"figure","size":279555,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of models in the three scenarios, where the Xception and InceptionResnetV2 models in Scenario 2 are more confident in the prediction of healthy frames for those mislabelled as SIS, with higher confidence values in the InceptionResnetV2 model (93%–99.8%).\u003c/p\u003e","description":"","filename":"18.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/2dcc242aa57591dbbc41fd09.png"},{"id":58596338,"identity":"855cf2ab-feb1-43d8-9170-32a3f64069db","added_by":"auto","created_at":"2024-06-18 16:46:16","extension":"png","order_by":19,"title":"Figure
19","display":"","copyAsset":false,"role":"figure","size":232949,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of Scenario 1, 2, and 3, where the Xception model in Scenario 2 is mislabelled in the upper example (1\u003csup\u003est\u003c/sup\u003e class) with low confidence value and correctly predicted in the lower example (2\u003csup\u003end\u003c/sup\u003e class).\u003c/p\u003e","description":"","filename":"19.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/68704e034e66e76a07baeca7.png"},{"id":58595392,"identity":"c1de2daa-6225-4949-9e97-502ec25b6ef8","added_by":"auto","created_at":"2024-06-18 16:38:16","extension":"png","order_by":20,"title":"Figure 20","display":"","copyAsset":false,"role":"figure","size":188677,"visible":true,"origin":"","legend":"\u003cp\u003eA comparison of confidence values in Scenarios 1, 2, and 3. The models in Scenario 2 and Scenario 3 performed better than the model in Scenario 1, which misclassified two frames as healthy. Both models in Scenario 2 achieved the highest accuracy, correctly predicting all frames with high confidence (100%). The model in Scenario 3 also correctly predicted all frames but with lower confidence values (77.3%–100%). 
The two red rectangles mark the frames mislabelled by the model in Scenario 1.\u003c/p\u003e","description":"","filename":"20.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/3c8be240eb8947c2e3de599e.png"},{"id":58595405,"identity":"61e0b3a3-568b-4864-91e5-d9903fe51264","added_by":"auto","created_at":"2024-06-18 16:38:17","extension":"png","order_by":21,"title":"Figure 21","display":"","copyAsset":false,"role":"figure","size":35807,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion matrices of the fusion models on the test set for the four fusion processes: left, F1 and F2, both with 38 false predictions; right, F3 with 86 false predictions and F4 with 88 false predictions.\u003c/p\u003e","description":"","filename":"21.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/0f71630ac748ad449d31b48a.png"},{"id":58595402,"identity":"9e2837b2-3eaf-4dc6-a4d7-231eddaf7d2a","added_by":"auto","created_at":"2024-06-18 16:38:16","extension":"png","order_by":22,"title":"Figure 22","display":"","copyAsset":false,"role":"figure","size":69284,"visible":true,"origin":"","legend":"\u003cp\u003eTesting results for the first classifier (C1), involving 16 LUS clips for both healthy and IS cases, with true and false predictions of LUS frames and their percentage within LUS frames.
The test subset consisted of 4 cases classified as healthy (each case containing 2 LUS clips) and 4 cases with IS (each case containing 2 LUS clips).\u003c/p\u003e","description":"","filename":"22.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/d1401bde5f5185ad99ae6e2a.png"},{"id":58595401,"identity":"12a3b97a-8bcf-4ab4-8eff-c72ef5fc9413","added_by":"auto","created_at":"2024-06-18 16:38:16","extension":"png","order_by":23,"title":"Figure 23","display":"","copyAsset":false,"role":"figure","size":73825,"visible":true,"origin":"","legend":"\u003cp\u003eTesting results for the second classifier (C2), involving 16 LUS clips for both healthy and IS cases, with true and false predictions of LUS frames and their percentage within LUS frames. The test subset consisted of 4 cases classified as healthy (each case containing 2 LUS clips) and 4 cases with IS (each case containing 2 LUS clips).\u003c/p\u003e","description":"","filename":"23.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/fa2f05694e61eb6b4e27df2e.png"},{"id":58595404,"identity":"30cd33e1-3101-4f72-8784-e78e8e20cf11","added_by":"auto","created_at":"2024-06-18 16:38:17","extension":"png","order_by":24,"title":"Figure 24","display":"","copyAsset":false,"role":"figure","size":72174,"visible":true,"origin":"","legend":"\u003cp\u003eTesting results for the third classifier (C3), involving 16 LUS clips for both healthy and IS cases, with true and false predictions of LUS frames and their percentage within LUS frames. 
The test subset consisted of 4 cases classified as healthy (each case containing 2 LUS clips) and 4 cases with IS (each case containing 2 LUS clips).\u003c/p\u003e","description":"","filename":"24.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/4797efa328c961ca15fee51f.png"},{"id":58596334,"identity":"b228ae5a-cf49-4377-8f48-fd3de7a82d7c","added_by":"auto","created_at":"2024-06-18 16:46:15","extension":"png","order_by":25,"title":"Figure 25","display":"","copyAsset":false,"role":"figure","size":75281,"visible":true,"origin":"","legend":"\u003cp\u003eTesting results for the fourth classifier (C4), involving 16 LUS clips for both healthy and IS cases, with true and false predictions of LUS frames and their percentage within LUS frames. The test subset consisted of 4 cases classified as healthy (each case containing 2 LUS clips) and 4 cases with IS (each case containing 2 LUS clips).\u003c/p\u003e","description":"","filename":"25.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/014140fef086e11c45cb994c.png"},{"id":58595406,"identity":"f478a9e8-05fe-4ebb-b623-c780e085e720","added_by":"auto","created_at":"2024-06-18 16:38:18","extension":"png","order_by":26,"title":"Figure 26","display":"","copyAsset":false,"role":"figure","size":74121,"visible":true,"origin":"","legend":"\u003cp\u003eConfusion matrix (CM) of our experts and developed models on 16 LUS clips (8 Healthy and 8 IS). \u003cstrong\u003e(a)\u003c/strong\u003e CM of our 1\u003csup\u003est\u003c/sup\u003e expert with 4 false predictions, while our 2\u003csup\u003end\u003c/sup\u003e expert shows only 2 false predictions. (b) CM for all developed models in scenarios (1, 2, and 3). In S2, both Xception and InceptionResNetV2 models demonstrate perfect classification for all clips. In S3, the baseline model shows 2 false predictions.
\u003cstrong\u003e(c)\u003c/strong\u003e CM for all fused models; F1 and F2 achieve perfect classification with no false predictions.\u003c/p\u003e","description":"","filename":"26.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/18d8e65a7757f6b1b2e764d4.png"},{"id":58596333,"identity":"5bd0541a-769d-4544-9401-810a6e94f490","added_by":"auto","created_at":"2024-06-18 16:46:15","extension":"png","order_by":27,"title":"Figure 27","display":"","copyAsset":false,"role":"figure","size":104406,"visible":true,"origin":"","legend":"\u003cp\u003eThis spreadsheet shows the evaluation output for LUS clip predictions as determined by our experts alongside developed models. It shows the comparative analysis of predictions across 16 clips, with the ground truth (GT) labels for IS and healthy states as the benchmark. Predictive outcomes are marked '1' for IS and '0' for healthy clips, with false predictions highlighted in pink.\u003c/p\u003e","description":"","filename":"27.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/ee40109fef467225b47f8054.png"},{"id":58595403,"identity":"de0561bc-bdf2-421e-a221-8459e73bb8c6","added_by":"auto","created_at":"2024-06-18 16:38:16","extension":"png","order_by":28,"title":"Figure 28","display":"","copyAsset":false,"role":"figure","size":33732,"visible":true,"origin":"","legend":"\u003cp\u003ePerformance of developed models and our clinical experts on the test set (n = 16). Dotted lines represent clinical experts' performance (blue for Expert 1, AUC=0.75; black for Expert 2, AUC=0.88). Triangular markers with solid lines represent deep learning (DL) models' performance in scenarios S1 and S3 (AUC=0.88).
The pink solid line shows the superior performance of the developed models with AUC=1.00, including both scenario S2 models (Xception and InceptionResnetV2) and fusion models F1-F4.\u003c/p\u003e","description":"","filename":"28.png","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/46cc8d4c5cc21177c5789568.png"},{"id":66404487,"identity":"2836c586-bd08-4cbb-a1f8-19a49c2136ba","added_by":"auto","created_at":"2024-10-11 12:17:35","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":5147293,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4487345/v1/6c55ffe7-4b79-4e63-8ae8-776f9c8076b7.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Convolutional Automatic Identification of B-lines and Interstitial Syndrome in Lung Ultrasound Images Using Pre-Trained Neural Networks with Feature Fusion","fulltext":[{"header":"Introduction","content":"\u003cp\u003eLung ultrasound (LUS) has gained clinical acceptance for diagnosing and managing lung diseases due to its advantages over conventional tests such as computed tomography (CT), including accessibility, absence of radiation risk, and portability (\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e). These benefits make it ideal for emergency and intensive care settings. However, LUS is operator-dependent, and training can be costly and time-consuming, often restricted to clinicians who have access to LUS training (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e). Deep learning (DL) algorithms have been developed to enable computer-automated diagnosis of pleural effusion and consolidation(\u003cspan additionalcitationids=\"CR3\" citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e). 
Recent advances in DL and convolutional neural networks (CNNs) have been achieved by using the expertise of LUS-trained clinicians as a reference for DL algorithms in the analysis and recognition of LUS patterns (\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e). This technological advancement helps reduce the risk of operator-related oversights or misdiagnoses and potentially provides untrained clinicians with a diagnostic ultrasound (US) tool that is reasonably accurate.\u003c/p\u003e \u003cp\u003eHigh-resolution computed tomography (HRCT) remains the gold-standard diagnostic tool for interstitial/alveolar syndrome (IS) (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e). However, limited access, the risks of patient transport, and exposure to ionising radiation make CT less desirable in critical care. LUS has been demonstrated to be superior to chest X-ray in assessing lung pathologies such as pulmonary oedema, pleural effusion, pneumonia and interstitial lung disease (\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e). It is particularly valuable in expediting diagnosis, thereby enabling timely treatment initiation (\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe interpretation of the images largely relies on artefact analysis, which has been shown to correlate with CT findings (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e). B-lines are reverberation artefacts in the form of vertical laser-like mobile lines which indicate interferences resulting from interstitial fluid, inflammation or fibrosis (\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e).
The diagnosis of IS is appropriate when 3 or more B-lines are present within a single intercostal space and in non-dependent parts of the lungs; however, the significance varies based on the clinical context of the presentation. Bilateral IS can be caused by cardiogenic pulmonary oedema, interstitial lung diseases such as pulmonary fibrosis, or viral pneumonitis, including COVID-19 (\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e). Conversely, localised IS may indicate an early stage of pneumonia. Evidence has shown that identifying and quantifying B-lines not only aids in diagnosing cardiogenic pulmonary oedema but also guides treatment, allows the response to be monitored through repeated scanning, and may provide prognostic information (\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThis study demonstrates the development and training of DL models, specifically CNNs, to automate the detection of B-lines on US images in patients with IS.\u003c/p\u003e \u003cp\u003eCurrently, DL approaches, particularly those involving CNNs, have been demonstrated to be effective for a wide range of pathologies in LUS (\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e). 
CNNs are able to automatically and robustly learn specific characteristics of the images, allowing them to reliably detect (\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e), segment (\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e), and classify (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e) multiple LUS pathologies.\u003c/p\u003e \u003cp\u003eIt is well known that DL models require large amounts of labelled data for training (\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e). Transfer learning (TL) is one approach proposed to deal with \"data starvation\" problems, as it can compensate for a lack of data in a target domain by inheriting or maintaining the knowledge learnt in a data-rich source domain (\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e). According to the literature, using pre-trained CNNs, such as ImageNet models, as feature extractors or fine-tuning pre-trained CNNs can improve performance for various medical image analysis tasks compared to a DL model that is built without pre-existing features (\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThe core motivation for this work is the pressing need for automated LUS analysis tools that detect IS accurately and promptly, thereby reducing diagnostic subjectivity, facilitating early disease identification, and potentially improving patient outcomes. 
This work's novelty includes applying DL pre-trained models, namely Xception and InceptionResnetV2, which were initially trained on the ImageNet dataset, to a unique IS dataset and training these models on different data filtering techniques.\u003c/p\u003e \u003cp\u003eIn addition, we implemented a feature fusion technique to further improve the performance of DL models by combining features derived from those models. The combined set of features was further utilised to train multiple classifiers with the aim of achieving high diagnostic accuracy (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e). We also interpreted the complexity of the \"black box\" of the DL models used by utilising visualisation and interpretation techniques such as Gradient-weighted Class Activation Mapping (Grad-CAM) and Local Interpretable Model-Agnostic explanations (LIME)(\u003cspan additionalcitationids=\"CR23\" citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e).\u003c/p\u003e"},{"header":"Materials and methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\n \u003ch2\u003eDataset\u003c/h2\u003e\n \u003cp\u003eThe LUS datasets used were fully anonymised and were collected at the Royal Melbourne Hospital. The study was approved by the Melbourne Health Human Research Ethics Committee (HREC/18/MH/269)(\u003cspan class=\"CitationRef\"\u003e25\u003c/span\u003e). The US dataset comprises 125 patients for a total of 1034 LUS clips. At least six unique lung scanning zones were evaluated and labelled (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e) by our clinical experts (DC and XC) following the protocol shown in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e. 
From the initial dataset, clips from 54 de-identified patients were included, 27 healthy and 27 with IS labelled as \u0026quot;non-healthy\u0026quot;. In total, 108 LUS clips were included, comprising 16,962 LUS frames (8,481 frames each for healthy and IS) (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e). Two LUS examples of IS and healthy frames are shown in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003eA and B, respectively.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\n \u003ch2\u003eFiltering techniques\u003c/h2\u003e\n \u003cp\u003eThe filtering techniques applied in this study are as follows: Scenario 1 includes all LUS frames from all clips in the training dataset (Dataset 1). In contrast, Scenarios 2 and 3 use a selective filtering technique to refine the training dataset (Dataset 2) by excluding LUS frames that do not show the main features characteristic of IS (i.e., frames without B-lines), thereby prioritising clinically relevant features. The two datasets are illustrated in Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e\n \u003cp\u003eAll LUS clips were labelled as healthy or non-healthy (IS cases) based on predefined clinical criteria (Fig. \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e). These criteria were adapted from international evidence-based recommendations for point-of-care ultrasound (\u003cspan class=\"CitationRef\"\u003e26\u003c/span\u003e). It is crucial to highlight that in the clip-based labelling method used in this study, a clip classified as IS may include individual frames that do not show IS features and could be deemed healthy. 
This observation highlights the natural variation and complexity of LUS, emphasising that not all frames from IS clips will consistently show the exact features associated with IS.\u003c/p\u003e\n \u003cp\u003eFor each LUS clip, a detailed assessment of LUS scans for both healthy and IS cases was undertaken by LUS experts and clinicians from the University of Melbourne (DC and XC). In particular, they examined each lung zone, identifying features associated with either healthy lung or IS based on the criteria outlined in Fig. \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\n \u003ch2\u003eImplementation of the models\u003c/h2\u003e\n \u003cp\u003eThe LUS dataset was divided into three subsets for effective model development and performance assessment: training (70% \u0026asymp; 76 LUS clips), validation (15% \u0026asymp; 16 LUS clips), and test sets (15% \u0026asymp; 16 LUS clips). Three different approaches were followed, referred to as Scenarios 1, 2, and 3, with distinct pre-processing techniques and pre-trained models. The pre-trained models, Xception and InceptionResNetV2, were selected for their proficiency in medical image classification gained through training on the ImageNet dataset with over 14\u0026nbsp;million natural images across more than 20,000 classes (\u003cspan class=\"CitationRef\"\u003e27\u003c/span\u003e, \u003cspan class=\"CitationRef\"\u003e28\u003c/span\u003e). In Scenario 1, the Xception pre-trained model was employed (\u003cspan class=\"CitationRef\"\u003e28\u003c/span\u003e). All LUS frames were included in the training and validation process, incorporating the total number of healthy and non-healthy frames from all clips (14,902 frames). The second scenario entailed the utilisation of Xception and Inceptionresnetv2 pre-trained models. 
A more selective strategy was adopted in Scenario 2 (10,546 frames), where non-healthy frames were re-evaluated, and any frames lacking the main features characterising IS (i.e., frames without B-lines) were excluded. This selection criterion was applied to ensure that only relevant features were included in the dataset. In the third scenario, a baseline model (Xception) without pre-existing features was trained. It learns features and weights exclusively from the LUS data without any TL from general image knowledge. Furthermore, as in Scenario 2, data filtering was applied to the baseline model: only non-healthy frames were re-evaluated, and any frames lacking the main features characterising IS (i.e., frames without B-lines) were excluded.\u003c/p\u003e\n \u003cp\u003eTo adapt the DL models to our two-class IS detection task (IS and healthy), the architectures of all models (Xception, InceptionResnetV2 and baseline) were customised. The top layer of the models, known as the classifier, which was originally designed to classify 1000 different classes (such as animals or household items), was replaced with a two-class classifier. Additionally, the LUS images were downsized from 720x920 pixels to 299x299 pixels to align with the input dimensions specified by the models\u0026apos; architectures.\u003c/p\u003e\n \u003cp\u003eThe Xception model has about 170 layers and 22.9 million trainable parameters. It uses depth-wise separable convolutions across 14 modules to improve its ability to extract features. The InceptionResnetV2 model, with a more complex structure, includes 843 layers and 55.8 million trainable parameters, combining the Inception and ResNet architectures. As the baseline model, we employed a modified version of the Xception model that was devoid of its pre-trained weights. 
The MATLAB software (Version R2023b) was used to run the models and monitor the training and testing process. The models were trained on a Graphics Processing Unit (GPU) with an NVIDIA TITAN RTX and 25 GB RAM, running Ubuntu 20.04.6 LTS. The Adam optimiser was used during training. After completion of training, the models were tested on the test subset, and their performance was evaluated using multiple performance metrics. Figure \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e illustrates the workflow followed to train and test the models, which includes data processing, model customisation, and model performance evaluation tools. Additionally, Table \u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e outlines the model hyperparameters.\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\u0026nbsp;\u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eDL parameters used in the training and validation process of the models\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"2\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eParameter\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eValue\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBatch Size\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eEpochs\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eShuffle\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003eEvery epoch\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eLearning Rate\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e10\u003csup\u003e\u0026minus;\u0026thinsp;4\u003c/sup\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eOptimizer\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAdam\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eImage size\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e299*299\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\n \u003ch2\u003eExplainability and interpretability of DL models\u003c/h2\u003e\n \u003cp\u003eThe Grad-CAM visualisation technique was used in our model evaluation process to enhance the explainability of the model prediction (\u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e). Grad-CAM provides a visual explanation in the form of a heatmap overlay on the image, highlighting the Region of Interest (ROI) in the output image (\u003cspan class=\"CitationRef\"\u003e24\u003c/span\u003e), which refers to a specific area within an LUS image used to identify particular pathologies or diseases. For instance, in cases of non-healthy frames (IS) (Fig. \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e. Grad-CAM), the ROI may be defined as an area containing B-lines. This technique generates heatmaps, overlaying the original image to highlight areas influencing the model\u0026apos;s decision, aiding in identifying related features. 
Additionally, the LIME was used to provide explanations for predictions by estimating the decision boundary in a specific input image, focusing on the intended ROI in the LUS image, and generating a heatmap scale (Fig. \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e. Lime).\u003c/p\u003e\n \u003cp\u003eLIME highlights influential regions contributing to a specific prediction, aiding in the understanding of the model\u0026apos;s decision-making process (\u003cspan class=\"CitationRef\"\u003e22\u003c/span\u003e). It approximates the boundary that defines the ROI by creating a new scale for a heatmap. This scale highlights the regions that have the most impact on the model\u0026apos;s prediction. LIME divides the image into identifiable portions and evaluates the impact of each part on the LUS image. In Fig. \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e, the LIME visualisations show the most important features which are represented by scores and colours. Higher scores, indicated by more intense colours (from blue to red), correspond to features that contributed more significantly to the model\u0026apos;s decision. This tool addresses the \u0026quot;black box\u0026quot; nature of DL models and makes their decisions more interpretable.\u003c/p\u003e\n \u003cp\u003eAlong with Grad-CAM and LIME plots, the confidence score was used. The confidence value, or the probability score, quantifies the model\u0026apos;s level of confidence in its predictions (\u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e). Higher probabilities or confidence values generally indicate higher confidence by the model, while lower probabilities suggest lower confidence by the model (\u003cspan class=\"CitationRef\"\u003e23\u003c/span\u003e). A confidence score of 50% means the model is equally likely to be correct or incorrect. 
All developed models were tested using the unseen test dataset to generate Grad-CAM and LIME plots along with the confidence score (Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e). A confidence score of 100% indicates that the model has absolute certainty in its prediction. Nevertheless, a high level of confidence does not guarantee the accuracy of the prediction. The score solely reflects the model\u0026apos;s confidence level derived from the knowledge acquired during the training process. The model\u0026apos;s confidence may be very high, yet it can still produce an inaccurate clinical diagnosis, particularly if it has been trained on biased data or if it encounters data that significantly deviates from its training set (\u003cspan class=\"CitationRef\"\u003e29\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\n \u003ch2\u003eFeature fusion technique\u003c/h2\u003e\n \u003cp\u003eThe feature fusion process in artificial intelligence (AI) combines information from multiple AI models trained on the same dataset using different ML classifiers (\u003cspan class=\"CitationRef\"\u003e30\u003c/span\u003e). This strategy is a powerful technique employed to enhance overall performance by incorporating features from different DL models. Its objective is to acquire and merge additional knowledge from multiple models in order to improve the representation of the features extracted from those models (\u003cspan class=\"CitationRef\"\u003e31\u003c/span\u003e). During the learning stage, the initial layers of each DL model acquire low-level features, such as colours, edges, and forms, while the last layers acquire the high-level features of the object. Consequently, the model\u0026apos;s final output features result from this hierarchical learning process, where complex high-level features are built upon the more fundamental ones. 
Features are extracted from the bottleneck layers, which are the layers prior to the output layer. These layers are rich in complex features that have been analysed through the network and are considered highly informative for the classification task (\u003cspan class=\"CitationRef\"\u003e32\u003c/span\u003e). Feature fusion is then utilised, where features learned from different models are combined or \u0026quot;fused\u0026quot;.\u003c/p\u003e\n \u003cp\u003eAfter the feature extraction phase, the extracted features undergo a process of normalisation to ensure they are on a comparable scale, followed by concatenation to fuse them into a unified feature space. This combination offers an improved depiction of the features and enables a more thorough representation of the underlying patterns and features in the data. ML classifiers are then trained using the fused features. This method enables ML classifiers to leverage the capabilities and distinctive attributes of each DL model, leading to a better comprehension of the target tasks (\u003cspan class=\"CitationRef\"\u003e33\u003c/span\u003e). The integration of features from different models provides numerous benefits for ML classifiers. The built-in Classification Learner in MATLAB 2023b was utilised to develop ML classifiers, which include linear discriminant analysis, neural networks, coarse KNN, cubic SVM, the boosted tree, and the coarse tree to determine the most efficient classifier for this detection task.\u003c/p\u003e\n \u003cp\u003eThe study utilised multiple models for this task, beginning with a comprehensive fusion (F1) involving all models mentioned in scenarios 2 and 3, namely Xception, InceptionResnetV2, and the baseline model. Additionally, a feature fusion (F2) was performed with the two best models identified in scenario 2. 
Furthermore, two separate feature fusion processes were performed: one between the baseline model and Xception (F3) and another between the baseline model and InceptionResnetV2 (F4). Each feature fusion process was followed by training ML classifiers on the fused features (named C1, C2, C3 and C4), as shown in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n \u003ch2\u003eComparison of AI models with medical experts\u003c/h2\u003e\n \u003cp\u003eTo assess the robustness of our developed models in IS detection, a comparative analysis was conducted against the diagnostic expertise of clinicians, based on whole-clip assessment. A set of 16 video clips representing our dataset was assessed. These clips were originally evaluated and labelled by our clinical experts (DC and XC) as healthy or with the presence of IS. A blinded review of these video clips was conducted by two clinical experts, expert 1 (MS) and expert 2 (CE), with labels 0 and 1 for healthy and IS, respectively. Both clinical experts (MS and CE), who are senior staff at the Queensland University of Technology with approximately 15 years of experience, blindly evaluated these clips, providing a comparison between our DL model performance and a human expert level. The clinical experts\u0026apos; diagnoses served as a reference point for evaluating the performance of our proposed models in the three scenarios.\u003c/p\u003e\n \u003cp\u003eThe analysis of whole LUS clips mirrors how LUS clips are usually viewed and assessed in practice, which is considered the current gold standard, allowing for a more accurate simulation of clinical diagnostic practices by experts. 
The conversion to video clip analysis adopts a Simple Majority Voting scheme (SVE) to aggregate individual frame predictions into a single diagnosis for each video clip (\u003cspan class=\"CitationRef\"\u003e34\u003c/span\u003e). The class predicted for the largest number of frames is taken as the output prediction for the whole clip. For a clip to receive a given diagnosis, the model must assign that class (healthy or IS) to more than 50% of the frames in the clip, ensuring that the diagnosis reflects the majority of the video frames. Both experts (MS and CE) labelled all clips blindly, providing diagnoses for entire videos (16 clips in total).\u003c/p\u003e\n \u003cp\u003eEach clip within our test set was assigned a distinct numerical identifier. These identifiers, along with the corresponding ground-truth (GT) labels, were documented in an Excel spreadsheet for recording outputs and analysis. To ensure the integrity of our diagnostic assessment, each expert conducted their evaluations independently. This was facilitated by scheduling their assessments on separate days and within different workspaces, thereby mitigating any potential for bias or undue influence from one another. Upon completion of these assessments, the diagnostic results from each expert were cross-referenced with the GT labels. This comparative analysis enabled us to determine the accuracy of each expert\u0026apos;s predictions by identifying correct and false predictions.\u003c/p\u003e\n \u003cp\u003eAll developed DL models were evaluated and compared to our clinical experts, using accuracy, sensitivity, and specificity as performance metrics. 
Additionally, a Receiver Operating Characteristic (ROC) curve, which is created by plotting the true positive rate (TPR) against the false positive rate (FPR), was used in the comparative analysis to discern the strengths and limitations of DL models in IS detection in comparison to our medical experts.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003eModel performance metrics\u003c/h2\u003e \u003cp\u003eThe performance of the developed models in the various scenarios was evaluated in terms of accuracy, precision, recall, and F1-score. The GT label used to assess the performance of the DL models is the label of the entire LUS clip: each clip is assigned a single label representing its overall classification, against which the model's predictions for individual frames are evaluated. As shown in \u003cb\u003eTable\u0026nbsp;2\u003c/b\u003e, the Xception model in Scenario 2 achieved higher accuracy (95.9%) than Scenario 1's model, as well as higher precision and recall (95.8%) and a higher F1-score (96.0%). On the other hand, the InceptionResnetV2 model in Scenario 2 achieved an accuracy of 95.73% and a specificity, precision, recall and F1-score of 95.7%. Lastly, the Baseline model achieved a specificity and precision of 90.6%, a recall of 90.4%, and an F1-score of 90.5%. Overall, both Scenario 2 models outperformed the Scenario 1 and Scenario 3 models in terms of accuracy, precision, recall, and F1-score. 
However, the baseline model adopted in Scenario 3 outperformed the model in Scenario 1.\u003c/p\u003e \u003cp\u003e \u003cb\u003eTable\u0026nbsp;2.\u003c/b\u003e The performance metrics of both models are summarised in the table below.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"5\" nameend=\"c7\" namest=\"c3\"\u003e \u003cp\u003ePerformance metrics\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e 
\u003cp\u003eF1-score\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eScenario 1 (S1)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eXception\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e84.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e85.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e88.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e84.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e86.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eScenario 2 (S2)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eXception\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e96.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e97.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e95.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e96.2%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eInceptionResnetV2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e95.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e96.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003e95.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e96.0%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eScenario 3 (S3)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBaseline Model\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e90.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e88.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e89.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e92.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e91.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"7\" nameend=\"c7\" namest=\"c1\"\u003e \u003cp\u003eIn S1, all 14,902 frames were included, and no filtering was applied. However, in S2, only 10,546 frames were included, a filtering technique was applied, and non-healthy frames (without B-lines) from IS were excluded. In S3, the S2 filtering criteria were applied, and the model was built from scratch.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eConfusion matrix analysis\u003c/h2\u003e \u003cp\u003eFor the unseen test set, the results of both models are presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig9\" class=\"InternalRef\"\u003e9\u003c/span\u003e. The number of false predictions made by the Xception model in Scenario 1 was more than triple that of the same model in Scenario 2, with 315 frames (highlighted in dark orange) and 84 frames (highlighted in light blue), respectively. 
These results suggest that the Xception model in Scenario 1 had a significantly higher rate of false predictions compared to the same model in Scenario 2.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eDetailed evaluation results\u003c/h2\u003e \u003cp\u003eFigures \u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e10\u003c/span\u003e, \u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e11\u003c/span\u003e, \u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e12\u003c/span\u003e, and \u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e13\u003c/span\u003e show a comprehensive overview of the models' performance on the test set of 16 LUS clips, which represents 15% of the LUS dataset used in this study. The figures show model prediction accuracy for each clip, including the count of LUS frames and true and false predictions for healthy and non-healthy frames. The rationale is to understand how each model performs on individual clips. For example, in case number 2\u0026ndash;2 from Fig.\u0026nbsp;\u003cspan refid=\"Fig11\" class=\"InternalRef\"\u003e10\u003c/span\u003e-b, which is labelled as IS, the Xception model in Scenario 1 correctly identified only 35 out of 120 frames as non-healthy (IS) while incorrectly predicting 85 frames as healthy. In contrast, for the same case, the Xception model in Scenario 2, as indicated in Fig.\u0026nbsp;\u003cspan refid=\"Fig10\" class=\"InternalRef\"\u003e11\u003c/span\u003e-b, performed better by correctly predicting 112 out of 120 frames as non-healthy (IS) and making only 8 incorrect predictions of healthy frames. Furthermore, according to Fig.\u0026nbsp;\u003cspan refid=\"Fig13\" class=\"InternalRef\"\u003e12\u003c/span\u003e-b, the InceptionResnetV2 model in Scenario 2 correctly predicted 119 out of 120 frames as non-healthy (IS) and made only 1 incorrect prediction for the same case. 
Additionally, as evident in Fig.\u0026nbsp;\u003cspan refid=\"Fig12\" class=\"InternalRef\"\u003e13\u003c/span\u003e-b, the baseline model in Scenario 3 correctly identified 55 out of 120 frames as non-healthy (IS) and made 65 false predictions in the same case number. This comparison highlights the advanced predictive capability of both the pre-trained models in Scenario 2 and the baseline model in Scenario 3 in distinguishing IS and healthy frames.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eExplainability and interpretability\u003c/h2\u003e \u003cp\u003eGrad-CAM (Visualisation)\u003c/p\u003e \u003cp\u003eIn Fig.\u0026nbsp;\u003cspan refid=\"Fig15\" class=\"InternalRef\"\u003e14\u003c/span\u003e-A, the heat maps show how the Xception model in Scenario 1 incorrectly detected the test sample of IS as healthy, with a confidence value of 78.8%, focusing on areas outside the intended ROI, marked with the red box on the input image. This indicates a failure to accurately capture key ROI features. In contrast (Fig.\u0026nbsp;\u003cspan refid=\"Fig15\" class=\"InternalRef\"\u003e14\u003c/span\u003e-A), both the InceptionResnetV2 and Xception models in Scenario 2 correctly identified the sample with a high confidence score (100%); however, only the Xception model properly detected the correct ROI. The baseline model also precisely identified the sample as healthy, with a confidence value of 100% (Fig.\u0026nbsp;\u003cspan refid=\"Fig15\" class=\"InternalRef\"\u003e14\u003c/span\u003e-A), and accurately detected the correct ROI. 
All the models correctly identified the test samples with IS according to Fig.\u0026nbsp;\u003cspan refid=\"Fig15\" class=\"InternalRef\"\u003e14\u003c/span\u003e-B and C; however, the models in Scenario 2 and the baseline model exhibited a higher confidence score compared to the model in Scenario 1. Overall, the Xception and InceptionResnetV2 models in Scenario 2 and the baseline model in Scenario 3 effectively identified the IS samples with a high level of confidence; however, only the Xception model in Scenario 2 and the baseline model in Scenario 3 focused on the ROI.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eIn Fig.\u0026nbsp;\u003cspan refid=\"Fig14\" class=\"InternalRef\"\u003e15\u003c/span\u003e-A, B and C, the heat maps show that all models correctly predicted the test healthy LUS frames. Nonetheless, in Scenario 2, the models recognised healthy samples with a greater confidence score. Specifically, the Xception model achieved 100%, and the InceptionResnetV2 model achieved a range of 99.9\u0026ndash;100%, surpassing the Xception model in Scenario 1 (75\u0026ndash;77%) and the baseline model (99.5\u0026ndash;99.9%). In Scenario 2, the most effective scenario, the Grad-CAM for the Xception model displayed a blue-green area (highlighted by a green box), which did not align with the ROI marked by a red box. Conversely, InceptionResnetV2's Grad-CAM accurately focused on the ROI, aligning perfectly with the area marked by the red box.\u003c/p\u003e \u003cp\u003eThe Grad-CAM visualisation in Fig.\u0026nbsp;\u003cspan refid=\"Fig15\" class=\"InternalRef\"\u003e14\u003c/span\u003e reveals that the Xception model in Scenario 2 accurately identifies the ROI with a high degree of confidence, precisely matching the ROI in the input images, which is nearly matched by the Grad-CAM visualisation produced by the baseline model. 
On the other hand, the Grad-CAM visualisation in Fig.\u0026nbsp;\u003cspan refid=\"Fig14\" class=\"InternalRef\"\u003e15\u003c/span\u003e highlights the InceptionResnetV2 model's capabilities in Scenario 2, where it not only predicts accurately and with high confidence but also focuses on the ROI, mirroring the ROI in the input images.\u003c/p\u003e \u003cp\u003eLIME (Interpretability)\u003c/p\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e16\u003c/span\u003e presents the LIME comparison of all models in the scenarios, using two examples of true positives. All the models correctly identified the test frames as true positives. In Scenario 1, the model predicted the class correctly with a reasonable confidence value (87.5%), and the region of highest intensity in the LIME visualisation lies within the ROI; however, the overall pattern is more diffuse. In contrast, Scenario 2's models demonstrated a marked improvement in confidence, with a targeted and refined focus of LIME on the ROI. Both models predicted the input frame with a confidence score of 100%. The accompanying LIME visualisation supports the Xception model's prediction by accurately identifying the ROI, as evidenced by the assignment of the maximum intensity value to the appropriate area. However, for InceptionResnetV2 in Scenario 2, the LIME visualisation shows the strongest signal at the top area of the image, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e16\u003c/span\u003e-B. If this area is indeed outside the expected ROI, this would suggest that the model is attributing high importance to features that are not diagnostically relevant for IS. 
The Xception and InceptionResnetV2 models in Scenario 2 both predicted with a confidence score of 100%. The LIME visualisation for the Xception model aligns perfectly with the ROI. In contrast, for the InceptionResnetV2 model, the higher-intensity signals in the heatmap are not confined to the ROI but extend to areas outside it. Furthermore, the baseline model in Scenario 3 also predicts the input frame with a confidence score of 100%. However, the LIME visualisation shows the strongest signal at the top area of the image, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig16\" class=\"InternalRef\"\u003e16\u003c/span\u003e-A.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure\u0026nbsp;\u003cspan refid=\"Fig18\" class=\"InternalRef\"\u003e17\u003c/span\u003e shows a LIME comparison of all scenarios using two examples of true negatives. All models exhibit high confidence values. Scenario 1's model achieved confidence values of 78% and 99.7% in the respective examples, while the Xception model in Scenario 2 achieved a confidence value of 100%. The InceptionResnetV2 model in Scenario 2 achieved confidence values of 99.9% and 100%, respectively, whereas the baseline model in Scenario 3 achieved confidence values of 99.5% and 97.5%, respectively. Despite their closely matched confidence values, the models differ in how they localise the ROI and in the intensity values they assign. Scenario 1's model prioritised features representing healthy attributes with low-intensity values (depicted in blue and light green). By contrast, Scenario 2's models effectively captured crucial 'healthy' features characterised by high-intensity values (shown in red and dark green). 
Notably, Scenario 2's models successfully highlighted these essential healthy features, ensuring a precise alignment of high-intensity areas with the ROI.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eFalse predictions\u003c/h2\u003e \u003cp\u003eAll the false predictions of the Xception model in Scenario 2 were re-evaluated because it achieved the highest accuracy among the four models evaluated in S1, S2 and S3. This evaluation aimed not simply to capture errors by the model but also to assess rigorously whether the identified predictions were truly false, thus improving our understanding of the model's diagnostic dependability. A total of 84 frames were reviewed, including 33 false positives and 51 false negatives identified by the model.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFor the false-positive frames (\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e), where the Xception model in Scenario 2 predicted \"healthy\", the expert re-evaluated all 33 frames. The clinical expert categorised these frames into three classes. In the first class, our clinicians classified 22 frames as healthy owing to either the absence of B-lines or limited visibility, which matches the prediction of the Xception model in Scenario 2. In the second class, only one frame exhibited potential B-lines. In the third class, 10 frames were classified as non-diagnostic or marked with limited visibility due to the shadowing caused by ribs. Figure\u0026nbsp;\u003cspan refid=\"Fig17\" class=\"InternalRef\"\u003e18\u003c/span\u003e shows examples of the three classes and the confidence values predicted by both models.\u003c/p\u003e \u003cp\u003eIn the case of false negatives, where the model incorrectly identified frames as \"IS\", the expert re-evaluated 51 frames. 
For the first class, 11 of the 51 frames were determined to be IS by clinicians, indicating that the model's predictions for these frames were in fact correct. The remaining 40 frames were considered healthy, as there was no evidence of B-lines. Figure\u0026nbsp;\u003cspan refid=\"Fig19\" class=\"InternalRef\"\u003e19\u003c/span\u003e shows examples of each of the two classes.\u003c/p\u003e \u003cp\u003eUpon re-evaluation of the 84 false-prediction frames of the Xception model in Scenario 2, the expert review agreed with the model for most false positives (22 out of 33) but for only 11 out of 51 false negatives. This implies that a single LUS clip can contain both healthy and non-healthy frames and should be evaluated accordingly.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003eConfidence value assessment\u003c/h2\u003e \u003cp\u003e \u003cb\u003eFigure\u0026nbsp;20\u003c/b\u003e shows examples of IS frames extracted from a sample LUS clip. The model in Scenario 1 correctly predicted four frames as IS, with confidence values of 85%, 92.7%, 99.5% and 98.8%; however, it mispredicted two frames as healthy. In contrast, both models in Scenario 2 predicted all of the same frames correctly, each with a confidence value of 100%. Furthermore, the baseline model in Scenario 3 correctly predicted the same frames as IS with confidence values ranging from 77.3\u0026ndash;100%.\u003c/p\u003e \u003cp\u003e \u003cb\u003eFig 20.\u003c/b\u003e A comparison of confidence values in Scenarios 1, 2, and 3. The models in Scenario 2 and Scenario 3 performed better than the model in Scenario 1, which misclassified two frames as healthy. Both models in Scenario 2 achieved the highest accuracy, correctly predicting all frames with high confidence (100%). The model in Scenario 3 also correctly predicted all frames but with lower confidence values (77.3\u0026ndash;100%). 
The two red rectangles illustrate the mislabelled frames by the model in Scenario 1.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003eEvaluation of fusion ML classifiers\u003c/h2\u003e \u003cp\u003eMultiple ML classifiers were trained using features extracted from three models (2nd and 3rd scenarios) previously mentioned. Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows the best 10 ML classifiers in terms of accuracy for each fusion process.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe best 10 ML classifiers in terms of accuracy for each fusion process.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e \u003cp\u003eF2\u003c/p\u003e \u003c/th\u003e \u003cth 
align=\"left\" colspan=\"2\" nameend=\"c6\" namest=\"c5\"\u003e \u003cp\u003eF3\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c8\" namest=\"c7\"\u003e \u003cp\u003eF4\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel Type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eacc\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eModel Type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eacc\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eModel Type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003eacc\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eModel Type\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003eacc\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eBinary GLM Logistic R\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e98.1%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003eNeural Network\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e98.1%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003eKNN\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e96.7%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003eNeural Network\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e95.8%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e97.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e98.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e96.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eEfficient L R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e95.7%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDiscriminant\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e97.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e98.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eEnsemble\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e96.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eBinary GLM L R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e95.6%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEnsemble\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e97.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e98.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eDiscriminant\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e96.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eNeural Network\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e95.3%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNeural Network\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e97.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNeural Network\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e98.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e96.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eNeural Network\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e95.2%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNeural Network\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e97.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e98.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e96.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c8\"\u003e \u003cp\u003e95.2%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eKernel\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e97.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e98.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e96.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e95.2%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e97.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eBinary GLM LR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e97.91%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eBinary GLM LR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e95.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e95.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNeural Network\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e97.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNeural Network\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003e97.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e95.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e95.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEfficient L R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e97.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNeural Network\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e97.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e95.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e95.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e \u003cp\u003e*F1\u0026thinsp;=\u0026thinsp;Xception\u0026thinsp;+\u0026thinsp;InceptionResnetV2 + Baseline\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e \u003cp\u003e*F2\u0026thinsp;=\u0026thinsp;Xception\u0026thinsp;+\u0026thinsp;InceptionResnetV2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c6\" namest=\"c5\"\u003e \u003cp\u003e*F3= Xception\u0026thinsp;+\u0026thinsp;Baseline\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c8\" namest=\"c7\"\u003e \u003cp\u003e*F4= 
InceptionResnetV2\u0026thinsp;+\u0026thinsp;Baseline\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSummary performance metrics of developed models on the test dataset (20260 frames).\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eML Classifiers\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"5\" nameend=\"c7\" namest=\"c3\"\u003e \u003cp\u003ePerformance metrics\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e 
\u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eF1-score\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eBinary GLM Logistic R\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e98.2%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e97.0%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e97.3%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e99.2%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e98.2%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNeural Network\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e98.2%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e97.9%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e98.2%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e98.4%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e98.3%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eKNN\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e95.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e94.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e94.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e97.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e96.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNeural Network\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e97.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e96.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e97.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e97.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e97.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colspan=\"7\" nameend=\"c7\" namest=\"c1\"\u003e \u003cp\u003eF1 is a feature fusion of all models mentioned in scenarios 2 and 3 (Xception, InceptionResnetV2, and the baseline model). F2 is a feature fusion between the two best models from scenario 2. 
F3 and F4 are feature fusions between the baseline model and Xception in S2, and the baseline model and InceptionResnetV2 in S2, respectively.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003eML classifiers' performance metrics\u003c/h2\u003e \u003cp\u003eThe performance of the best ML classifier in each fusion process was evaluated in terms of accuracy, precision, recall, and F1-score. As shown in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e4\u003c/span\u003e, both the Binary GLM Logistic Regression (F1) and Neural Network (F2) correctly classified 98.16% of the LUS frames. Both models had an F1-score, precision and recall of 98.1%. The KNN model in F3 achieved an accuracy of 95.8% and a precision, recall and F1-score of 95.8%. Lastly, the Neural Network model in F4 achieved an accuracy of 96.8% and a specificity, precision, recall and F1-score of 96.8%. Overall, the ML classifiers in F1 and F2 outperformed the ML classifiers in F3 and F4 in terms of accuracy, precision, recall, and F1-score.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003eML classifiers\u0026rsquo; confusion matrix\u003c/h2\u003e \u003cp\u003eFor the unseen test set, the results of the ML classifiers are presented in Fig.\u0026nbsp;\u003cspan refid=\"Fig20\" class=\"InternalRef\"\u003e21\u003c/span\u003e. F1 and F2 had the same number of false predictions, with 38 frames each (highlighted in light green). In contrast, F3 and F4 had considerably more false predictions, with 86 and 65 frames, respectively (highlighted in light green). 
These results indicate that F3 and F4 were less accurate than F1 and F2 in classifying the test data.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003eDetailed evaluation of results\u003c/h2\u003e \u003cp\u003eFigures \u003cspan refid=\"Fig21\" class=\"InternalRef\"\u003e22\u003c/span\u003e, \u003cspan refid=\"Fig22\" class=\"InternalRef\"\u003e23\u003c/span\u003e, \u003cspan refid=\"Fig23\" class=\"InternalRef\"\u003e24\u003c/span\u003e, and \u003cspan refid=\"Fig24\" class=\"InternalRef\"\u003e25\u003c/span\u003e show a comprehensive overview of the ML classifiers' performance on the test set of 16 LUS clips, which represents 15% of the LUS dataset used (the test subset). The figures show the accuracy of model prediction for each clip, including the count of LUS frames and true and false predictions for healthy and non-healthy frames. For example, in case number 2\u0026ndash;2 from Fig.\u0026nbsp;\u003cspan refid=\"Fig21\" class=\"InternalRef\"\u003e22\u003c/span\u003e-b, which is labelled as IS, the first classifier (C1) correctly identified 109 out of 120 frames as non-healthy (IS) while incorrectly predicting 11 frames as healthy. In contrast, for the same case number, the second classifier (C2), as indicated in Fig.\u0026nbsp;\u003cspan refid=\"Fig22\" class=\"InternalRef\"\u003e23\u003c/span\u003e-b, performed better by correctly predicting all frames as non-healthy (IS). Furthermore, according to Fig.\u0026nbsp;\u003cspan refid=\"Fig23\" class=\"InternalRef\"\u003e24\u003c/span\u003e-b, the third classifier (C3) correctly predicted 104 out of 120 frames as non-healthy (IS) and made 16 incorrect predictions in the same case number. Additionally, as evident in Fig.\u0026nbsp;\u003cspan refid=\"Fig24\" class=\"InternalRef\"\u003e25\u003c/span\u003e-b, the fourth classifier (C4) correctly identified only 99 out of 120 frames as non-healthy (IS) and made 21 false predictions in the same case number. 
This comparison highlights the strong predictive capability of the fused pre-trained models in the second fusion process (F2), which incorporates features from both Xception and InceptionResnetV2 in Scenario 2, for accurately differentiating between IS and healthy frames.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003eThe experts compared to our models\u003c/h2\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eAll developed AI and ML models were evaluated in terms of true and false positives and negatives, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig25\" class=\"InternalRef\"\u003e26\u003c/span\u003e. The GT label used to compare the performance of the DL models with that of our experts is the clip-level label: every LUS clip is given a single label that identifies its classification. Our first expert (MS) classified 12 of the 16 clips in the test subset correctly (75%) and 4 clips incorrectly (25%). In contrast, our second expert (CE) classified 14 of the 16 clips correctly (87.5%) and 2 clips incorrectly (12.5%). Among all developed models, the S2 models and the fusion models F1, F2, F3, and F4 achieved the best clip-level performance: they classified all 16 clips in the test subset correctly, with no false predictions. This gives these models an F1-score, precision, and recall of 100%. The DL models in S1 and S3 classified 14 of the 16 clips correctly (87.5%), with 2 false predictions. 
Figure\u0026nbsp;\u003cspan refid=\"Fig27\" class=\"InternalRef\"\u003e27\u003c/span\u003e shows a spreadsheet capturing the evaluation process and outcomes of LUS clip predictions performed by our experts and all developed models on a set of 16 clips. The GT labels are noted, against which the assessments of each expert and model are compared. Each column under the experts and models represents their predictions for the clips, with '1' indicating an IS prediction and '0' representing a healthy clip. The cells highlighted in pink mark the instances where a false prediction was recorded by the corresponding expert or developed model. Overall, both ML classifiers (F1 and F2) and the models in S2 outperformed our experts and other developed AI and ML models in terms of accuracy, precision, recall, and F1-score. A more detailed display of the performance of the AI and ML models is provided in \u003cb\u003eTable\u0026nbsp;5\u003c/b\u003e. Furthermore, Fig.\u0026nbsp;\u003cspan refid=\"Fig26\" class=\"InternalRef\"\u003e28\u003c/span\u003e illustrates the ROC curve, which shows the performance of our experts and the developed models in terms of true positive rate (TPR) and false positive rate (FPR). 
The ROC curve shows that the fused models, F1, F2, F3 and F4, and S2 models significantly outperform our experts and other developed models.\u003c/p\u003e \u003cp\u003e \u003cb\u003eTable\u0026nbsp;5.\u003c/b\u003e The performance metrics of developed models on 16 LUS clips (8 Healthy and 8 IS).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"No\" id=\"Tabb\" border=\"1\"\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c2\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"5\" nameend=\"c7\" namest=\"c3\"\u003e \u003cp\u003ePerformance Metrics\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth 
align=\"left\" colname=\"c7\"\u003e \u003cp\u003eF1-score\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eExpert 1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e75%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e75%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e75%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e75%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e75%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eExpert 2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e87.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e100%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e75%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e85.71%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eScenario 1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eXception\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e87.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e87.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e87.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c6\"\u003e \u003cp\u003e87.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e87.5%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003eScenario 2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eXception\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eInceptionResnetV2\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eScenario 3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e 
\u003cp\u003eBaseline Model\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e87.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e87.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e87.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e87.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e87.5%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eBinary GLM Logistic R\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eNeural Network\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eKNN\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eF4\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eNeural Network\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e100%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" 
colspan=\"7\" nameend=\"c7\" namest=\"c1\"\u003e \u003cp\u003e* The metrics highlighted in bold indicate the highest performance achieved among the ML and DL models.\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThe fused models in the F1 and F2 processes, which combined either the models in S2 and S3 or only the models in S2, improved on all other developed models across the reported performance metrics. This improvement can be attributed to the filtering techniques applied during the training phase in S2 and to the fusion technique used. Several observations from the results above illustrate the influence of the training modifications and the fusion technique applied.\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eEnhancement in accuracy\u003c/strong\u003e \u003c/p\u003e\u003cp\u003eThe fused models (F1 and F2) reached a higher accuracy (98.2%) than the other models. This gain indicates that the applied filtering techniques led to more precise overall predictions. Additionally, the fusion models exceed the accuracy of the best individual models in S2 (95.9% and 95.8%).\u003c/p\u003e \u003cp\u003e\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eDiminished false predictions\u003c/strong\u003e \u003c/p\u003e\u003cp\u003eImportantly, the fused models (F1 and F2) showed a notable reduction in false predictions relative to the individual DL models in S2. 
This reduction serves as tangible evidence that the incorporated filtering techniques and combined features into fused models have effectively mitigated the model's tendency to misclassify healthy instances as IS and vice versa.\u003c/p\u003e \u003cp\u003e\u003c/p\u003e \u003cp\u003eThe comparison between the fused models, F1 and F2, and individual models in S2 revealed a notable decrease in false predictions (including false negatives and positives) within both models F1 and F2. F1 and F2 models mislabelled only 38 frames, a significant improvement compared to the 84 frames mislabelled by the Xception model in S2.\u003c/p\u003e \u003cp\u003eAdditionally, within individual models in S1, S2 and S3, the Grad-CAM and LIME visualisation supported our method of excluding healthy frames from IS frames during training of the Xception model in S2, even with the small dataset used, compared to the Xception model in S1. This method had a significant impact on the observed performance differences between the two models. LIME and Grad-CAM visualisation align closely with the clinical decision-making process by emphasising the critical areas that lead to accurate predictions. The comparison of Xception and InceptionResnetV2 in S2 and the baseline model in S3, using LIME and Grad-CAM, highlighted the efficacy of our suggested TL technique. With the same dataset used for the training, both models in S2 performed better than the baseline models trained from scratch.\u003c/p\u003e \u003cp\u003eThe decision to exclude healthy frames from the non-healthy class had a discernible impact on the model's performance in S2 and S3. By eliminating the potential confusion caused by healthy frames, the models became more adept at distinguishing between different frames, resulting in higher accuracy and precision. This modification effectively reduced the cases of false positives in the non-healthy cases, ultimately contributing to the improved accuracy observed in models in S2 and S3. 
The exclusion of healthy frames from the non-healthy LUS clips played a crucial role in enhancing the ability of the models in S2 and S3 to differentiate between healthy and IS frames. This decision led to a substantial improvement in overall accuracy, precision, and, ultimately, the DL models' effectiveness in IS-aided diagnosis. In S1, the Xception model, which did not involve any frame exclusion, reached an accuracy of 84.6%. In contrast, the Xception model in S2, which applied filtering techniques, reached an overall accuracy of 95.9%. This underlines the benefit of filtering the training dataset as opposed to using all frames in the training process.\u003c/p\u003e \u003cp\u003eFurthermore, on the subset of 16 test clips, the fused models (F1, F2, F3 and F4) and the S2 models outperformed our experts in classifying healthy and IS LUS clips. This suggests that the developed AI and ML models achieve a high level of accuracy, precision, recall, and F1-score in distinguishing healthy from IS cases in LUS frames. Expert 1 (MS) had a high rate of false predictions (25%, 4 clips), which indicates that distinguishing between healthy and IS LUS clips in a video-level assessment is challenging. By comparison, Expert 2 (CE) showed a higher diagnostic accuracy, with false predictions on only 12.5% of the clips (2 out of 16).\u003c/p\u003e \u003cp\u003eIn our assessment of IS detection, a case of diagnostic disagreement emerged, highlighting the challenges in LUS interpretation. Our clinical experts from Melbourne (DC and XC) initially classified each LUS clip as indicative of a healthy or IS lung condition. 
This initial assessment was based on IS criteria from the international evidence-based recommendations (\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e) and their clinical expertise, reflecting a view at the time of acquisition and based on multiple zones of the patient's lung.\u003c/p\u003e \u003cp\u003eDuring the re-assessment of the LUS clips by Expert 1 (MS) and Expert 2 (CE), which used only one clip from a single zone within the lung, a significant discrepancy was observed. In the instances where Expert 2 (CE) misclassified a clip, Expert 1 (MS) made the same error relative to the initial labels assigned by our experts DC and XC, which served as the GT. These shared false predictions involved two clips from two different patients; each patient contributed a pair of clips, four videos in total, and in each pair one clip was classified correctly and the other was not. This pattern of mislabelling by both experts, involving the same clip for each patient, highlights the challenges of consistent video interpretation and shows the complexity inherent in diagnosing through LUS clips with a limited field of view or poor-quality images and no prior knowledge of patient history.\u003c/p\u003e \u003cp\u003eThis disagreement among experts shows the subjectivity and variability that characterise the interpretation of LUS clips, particularly with artefacts related to B-lines. Such variability arises from the diverse interpretations by experts regarding the origin of B-lines (whether they emanate from the pleural line or not) and the quantification of these B-lines. Contributing to these discrepancies are factors such as a constrained field of view and the challenges posed by poor-quality LUS clips. These elements highlight the complexities of LUS analysis. 
In contrast, most of the developed AI and ML models classified all test clips correctly.\u003c/p\u003e \u003cp\u003eGiven the results shown by the fused models, it is relevant to situate our study within the ongoing research effort between Melbourne University and QUT researchers in the field of LUS pathology evaluation using AI tools. Our collaborative research team has recently proposed a fully automated LUS evaluation for lung pathologies, including pleural effusion, atelectasis (collapse), consolidation, interstitial syndrome, and pneumothorax. As part of this collaboration, a study by Tsai et al. (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e) achieved 92% accuracy in classifying pleural effusion using a DL model consisting of a Regularised Spatial Transformer Network (Reg-STN). A follow-up study by Durrani et al. (\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e) demonstrated DL's potential for diagnosing pulmonary consolidation or collapse with 89% accuracy. In our current work, the best-trained ML model achieved a test accuracy of 98.2%. This achievement represents a significant milestone in our work, demonstrating for the first time the use of pre-trained CNNs with a feature extraction and fusion method to develop a diagnostic tool for IS screening in LUS frames. Importantly, the model's proficiency extends to the analysis of LUS video clips, which enhances not only the model's diagnostic accuracy within LUS frames but also its utility in clinical settings.\u003c/p\u003e \u003cp\u003eA notable limitation of this study is that we used only a small test set (16 clips) to compare our experts' performance with that of the developed models. This may not be representative of the general population or the different settings where LUS imaging is performed. Furthermore, only two experts evaluated these test clips. 
Therefore, further studies are needed to assess our experts' performance against the models on larger and more diverse LUS datasets, with more LUS experts involved.\u003c/p\u003e \u003cp\u003eAnother limitation of this study arises from using DL models that were initially pre-trained on general-purpose image classification using natural images (an out-of-domain dataset). We then retrained these models using our specific LUS dataset (the target dataset). As a result of the inherent differences between the original dataset (comprising natural images) and our target dataset (comprising LUS images), the DL models produced some inaccurate predictions, specifically with Grad-CAM in healthy examples with the Xception model in S2 (Fig.\u0026nbsp;\u003cspan refid=\"Fig15\" class=\"InternalRef\"\u003e14\u003c/span\u003e). In our future work, we plan to enhance the model's performance by retraining it using a source dataset from the same domain (US images) and incorporating a larger number of unlabelled US images. Following this, we will fine-tune the model by introducing a small, representative, and labelled LUS dataset and categorising it into multiple classes to identify multiple LUS pathologies. Another area for improvement is that the labelled LUS clips used in this study are based only on US medical reports, without the ability to correlate them with CT findings. For future work, we will aim to access a larger dataset that includes both LUS and CT scans as well as patients' medical reports. This is because a CT scan is considered the gold standard for diagnosing IS. Comparing the model predictions with medical reports and scan findings from both imaging modalities will give clinical experts greater reassurance to trust this technology.\u003c/p\u003e "},{"header":"Conclusion and future work","content":"\u003cp\u003eThe pre-trained models utilised in this study function effectively as an automated tool for recognising B-lines and identifying IS in LUS video frames. 
Beyond these results, the fusion models built from features extracted from those models outperformed the individual DL models in terms of accuracy.\u003c/p\u003e\u003cp\u003eFuture studies in this area could further enhance the applicability and reliability of the proposed CNN models with TL and the feature-fusion approach for other lung diseases. Firstly, expanding the dataset size and diversity could help validate the model's generalizability across different patient populations, diseases, and imaging conditions. Additionally, investigating the model's performance in distinguishing between different types of IS could provide valuable insights into its potential clinical utility for other lung disorders assessed with LUS. In conclusion, while the current study shows promising results in IS screening using pre-trained models on LUS frames, and Grad-CAM and LIME can indicate where the model is looking when making decisions, future research should focus on expanding the dataset and performing rigorous validation across diverse LUS datasets. This progressive approach will contribute to the establishment of an accurate, reliable, and clinically valuable tool for the diagnosis and management of lung disorders using LUS.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eAcknowledgements\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003eAuthor contributions\u003c/p\u003e\n\u003cp\u003eKM, the main coauthor, wrote the main manuscript text and performed the training, validation, and testing. MA, DF, DV and JD provided critical revisions and approved the final version. MS and CE \u0026ndash; clinical image validation. 
DC, XC, AR, CR and KH \u0026ndash; data collection and clinical discussions.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eFunding\u003c/p\u003e\n\u003cp\u003eThe present work received no funding.\u003c/p\u003e\n\u003cp\u003eAvailability of data and materials\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eThe datasets used during this study are not publicly available but can be obtained from the corresponding author(s) on reasonable request.\u003c/p\u003e\n\u003cp\u003eEthics approval and consent to participate\u003c/p\u003e\n\u003cp\u003eWritten informed consent was obtained from all patients who participated in this study. The study was approved by the Melbourne Health Human Research Ethics Committee (HREC/18/MH/269) and registered with ANZCTR (25).\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eConsent for publication\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003eCompeting interests\u003c/p\u003e\n\u003cp\u003eThe authors declare no competing interests.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eWang J, Yang X, Zhou B, Sohn JJ, Zhou J, Jacob JT, et al. Review of Machine Learning in Lung Ultrasound in COVID-19 Pandemic. J Imaging. 2022;8(3). \u003c/li\u003e\n\u003cli\u003eDurrani N, Vukovic D, van der Burgt J, Antico M, van Sloun RJG, Canty D, et al. Automatic deep learning-based consolidation/collapse classification in lung ultrasound images for COVID-19 induced pneumonia. Sci Rep. 2022;12(1):17581\u0026ndash;17581. \u003c/li\u003e\n\u003cli\u003eVukovic D, Wang A, Antico M, Steffens M, Ruvinov I, van Sloun RJ, et al. Automatic deep learning-based pleural effusion segmentation in lung ultrasound images. BMC Med Inform Decis Mak [Internet]. 2023 Nov 29 [cited 2023 Dec 9];23(1):274. Available from: https://doi.org/10.1186/s12911-023-02362-6\u003c/li\u003e\n\u003cli\u003eTsai CH, Van Der Burgt J, Vukovic D, Kaur N, Demi L, Canty D, et al. 
Automatic deep learning-based pleural effusion classification in lung ultrasound images for respiratory pathology diagnosis. Phys Med. 2021;83:38\u0026ndash;45. \u003c/li\u003e\n\u003cli\u003eBarros B, Lacerda P, Albuquerque C, Conci A. Pulmonary COVID-19: Learning Spatiotemporal Features Combining CNN and LSTM Networks for Lung Ultrasound Video Classification. Sensors [Internet]. 2021 Aug 14 [cited 2023 Mar 2];21(16):5486. Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8401701/\u003c/li\u003e\n\u003cli\u003eYu R, Tian Y, Gao J, Liu Z, Wei X, Jiang H, et al. Feature discretization-based deep clustering for thyroid ultrasound image feature extraction. Comput Biol Med. 2022 Jul 1;146:105600. \u003c/li\u003e\n\u003cli\u003eYan JH, Pan L, Gao YB, Cui GH, Wang YH. Utility of lung ultrasound to identify interstitial lung disease: An observational study based on the STROBE guidelines. Medicine (Baltimore). 2021;100(12):e25217. \u003c/li\u003e\n\u003cli\u003eCamacho J, Mu\u0026ntilde;oz M, Genov\u0026eacute;s V, Herraiz JL, Ortega I, Belarra A, et al. Artificial Intelligence and Democratization of the Use of Lung Ultrasound in COVID-19: On the Feasibility of Automatic Calculation of Lung Ultrasound Score. Int J Transl Med. 2022;2(1):17\u0026ndash;25. \u003c/li\u003e\n\u003cli\u003eVolpicelli G, Fraccalini T, Cardinale L. Lung ultrasound: are we diagnosing too much? Ultrasound J [Internet]. 2023 Mar 29 [cited 2023 Apr 17];15(1):17. Available from: https://doi.org/10.1186/s13089-023-00313-w\u003c/li\u003e\n\u003cli\u003eSmargiassi A, Zanforlin A, Perrone T, Buonsenso D, Torri E, Limoli G, et al. Vertical Artifacts as Lung Ultrasound Signs: Trick or Trap? Part 2- An Accademia di Ecografia Toracica Position Paper on B-Lines and Sonographic Interstitial Syndrome. J Ultrasound Med [Internet]. 2023 [cited 2023 Sep 6];42(2):279\u0026ndash;92. 