Fusion of automatically learned rhythm and morphology features matches diagnostic criteria and enhances AI explainability

Research Square, Research Article. Alexander Hammer, Marc Goettling, Hagen Malberg, Axel Linke, Sergio Richter, and 2 more

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-4655592/v1. This work is licensed under a CC BY 4.0 License. Status: Published. The journal version was published on 27 Aug, 2025 in npj Artificial Intelligence. Version 1 posted 11 (latest preprint version).

Abstract

Deep learning (DL) has demonstrated high accuracy in ECG analysis but lacks explainability. Although explanations can be estimated using explainable artificial intelligence, their causality has not yet been sufficiently investigated.
We present a generalizable method for extensively validating the causality of DL explanations by relating them to clinically relevant ECG characteristics. We applied xECGArch, which combines a long-term and a short-term model, to atrial fibrillation (AF) detection in 1,521 single-lead ECGs, achieving an accuracy of 96.3%. The explanations match the diagnostic criteria of AF regarding rhythm and morphology. While the short-term model emphasizes morphology features such as P and fibrillatory waves, the long-term model focuses on QRS complexes. Moreover, the long-term model explanations strongly correlate with rhythm (\(p<0.001\)). For improved clinical interpretability, we introduce a fused representation (xFuseMap) that highlights relevant explanations for rhythm and morphology. We thus demonstrate an explainable and interpretable DL application with potential for providing diagnostic support.

1 Introduction

Cardiovascular diseases (CVDs) are the leading cause of premature death worldwide 1, being responsible for an estimated 20.5 million deaths in 2021 according to the World Heart Federation 2, and a major cause of disability 3. Many CVDs are age-related 1,3. Therefore, demographic change and increasing life expectancy are expected to lead to a higher incidence of disease in many parts of the world 3. If CVDs are detected early, interventions can be initiated and critical courses, including premature death, can be prevented 3,4. Electrocardiogram (ECG) analysis can non-invasively reveal anomalies in cardiac excitation that are predictors of CVDs 5. The 12-lead ECG analyzed by at least one cardiologist is considered the clinical standard 6.
However, manual ECG analysis is very time-consuming and strongly dependent on the physician's expertise, experience, and routine, as well as on factors such as stress or fatigue 6. Deep learning (DL) is a high-performance method for automatically detecting CVDs in ECGs, which can support medical diagnostics and reduce dependency on personnel 6. DL refers to deep neural networks (DNNs), which are complex mathematical models from the field of artificial intelligence (AI). DNNs acquire knowledge from training data based on machine learning (ML) methods and apply it to unseen data 7. During training, DNNs can learn non-linear representations of the raw input data that relate to the model output through interconnected neurons in hidden layers 7,8. Therefore, in contrast to shallow ML methods, DNNs are able to learn representations from raw data themselves and do not require prior feature extraction, which is error-prone and requires a priori knowledge 7. However, DNNs are often referred to as black boxes because their complexity makes their decision-making process unexplainable 9. Nevertheless, decision support requires trustworthiness, which is achieved through the explainability of the machine's decision-making process and the interpretability of the features used 9,10. Explainable AI (xAI) methods can partially illuminate the black box of DNNs by providing post-hoc explanations that approximate the relevance of input values for the model decision 11. Initial research has therefore recently addressed the use of xAI in ECG analysis, e.g., for atrial fibrillation (AF) detection 12–15, long QT syndrome detection and classification 16, myocardial infarction detection and classification 17–19, or multiclass cardiac anomaly detection and classification 13,14,20–22. However, making DL usable for decision support requires an understanding of the causality of the learned representations 10.
In most studies 12,17–19,21,22, the correspondence between signal components that are relevant according to xAI and diagnostic criteria is demonstrated only through exemplary saliency maps (also called heat maps), which illustrate the relevance of single data points for classification. However, actually understanding the origin of the learned representations requires systematic investigations, such as those carried out in the few papers 12–14,16,20,23 summarized in Table 1. To achieve clinical relevance, the model explanations must be as close as possible to the cardiological reading of the ECG, which considers rhythm and beat morphology for decision-making. In previous studies, QRS complexes are often identified as relevant 12,13,15,20,23, although xAI methods do not report whether the relevance is based on rhythmic or morphologic characteristics, both, or neither. Singh & Sharma classified individual beats and thus narrowed the explanations down to morphological characteristics 20. Ivaturi et al. estimated the relevance of rhythmic information by measuring the change in relevance of the QRS complexes when eliminating RR interval variability by stretching or compressing the RR intervals 13. However, it is unclear whether the deformation of the QRS complexes caused by stretching or compression influences the changes in relevance. Our recently introduced explainable architecture for ECG analysis (xECGArch) is designed to learn rhythmic and morphological information in two separate convolutional neural networks (CNNs) for joint classification 15. The long-term model considers an entire 10-second ECG at once, enabling rhythm analysis, while the short-term model considers only 0.6-second windows at once and thus obtains hardly any rhythm information but mainly morphology information. However, the relationship between automatically learned features and rhythm and morphology information has not yet been systematically investigated.
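To make the two input scales concrete, the following Python sketch clips a recording to its middle 10 seconds, as described under Data material, and cuts the result into 0.6-second windows of the kind the short-term model sees. It assumes a sampling rate of 500 Hz (`FS`, hypothetical; not stated in this section) and non-overlapping windows. How xECGArch actually strides or aggregates its short-term windows is not specified here, so `hop_s` is an illustrative parameter.

```python
from typing import Optional

import numpy as np

FS = 500  # hypothetical sampling rate in Hz (assumption, not taken from the paper)


def center_clip(ecg: np.ndarray, fs: int = FS, duration_s: float = 10.0) -> Optional[np.ndarray]:
    """Return the middle `duration_s` seconds of a 1-D signal, or None if it is too short."""
    n = int(round(duration_s * fs))
    if ecg.shape[0] < n:
        return None  # shorter ECGs are excluded from the dataset
    start = (ecg.shape[0] - n) // 2
    return ecg[start:start + n]


def short_term_windows(ecg: np.ndarray, fs: int = FS, win_s: float = 0.6,
                       hop_s: float = 0.6) -> np.ndarray:
    """Cut a 1-D signal into fixed-length windows; hop_s == win_s gives non-overlapping windows."""
    win = int(round(win_s * fs))
    hop = int(round(hop_s * fs))
    starts = range(0, ecg.shape[0] - win + 1, hop)
    return np.stack([ecg[s:s + win] for s in starts])


if __name__ == "__main__":
    raw = np.random.default_rng(0).standard_normal(12 * FS)  # synthetic 12-s "ECG"
    long_term_input = center_clip(raw)                       # whole 10-s context
    short_term_input = short_term_windows(long_term_input)   # stack of 0.6-s windows
    print(long_term_input.shape, short_term_input.shape)
```

With the assumed 500 Hz, 16 non-overlapping windows cover 9.6 s of the 10-s signal; covering the remaining 0.4 s would require padding or a smaller hop.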
In summary, to the best of our knowledge, there is currently no method for the generalizable quantitative analysis of model explanations that examines the relevance of specific ECG characteristics in ECG analysis. However, such a method is fundamental for bringing xAI into clinical use. Aside from the choice of the quantitative analysis method, the explanatory power of the results depends on a1) the model's classification quality, a2) the dataset characteristics, and a3) the choice of the xAI method 15: a1) Some of the models listed in Table 1 had a sensitivity of less than 90% 13,14,16, which is below the threshold for good classification 24 and raises the question of whether the ECG characteristics identified as relevant by xAI are suitable for detecting the corresponding class. a2) The generalizability of the results increases, e.g., with the size and diversity of the dataset. Therefore, the comparison groups for a target class should be diverse in terms of age, sex, and the classes included for comparison. The latter was considered in most papers summarized in Table 1. a3) In a previous study 15, we systematically perturbed ECGs and showed that the reliability of xAI methods varies greatly. Therefore, to obtain meaningful explanations, the appropriate xAI method must be carefully chosen for each scenario. We present a novel approach to trace DL explanations back to the relevance of rhythmic and morphological information. To do this, we use the high-performing xECGArch 15, which processes long- and short-term information in separate CNNs for a joint classification, as depicted in Fig. 1. We develop a generalizable method for quantitatively and systematically analyzing the relevance of clinically relevant ECG segments for classification. Based on this, we examine the relationship between model explanations and morphological beat changes or rhythmic information.
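The joint classification can be pictured as a weighted combination of the two CNNs' outputs. The sketch below assumes that xECGArch averages the class probabilities (softmax outputs) of both models per ECG, with the long-term model weighted 1.669:1 as reported in the Results. The probability values are invented for illustration, and the actual combination rule of xECGArch may differ.

```python
import numpy as np


def fuse_probabilities(p_long, p_short, w_long=1.669, w_short=1.0):
    """Weighted average of two models' class probabilities, shape [n_ecgs, n_classes]."""
    p_long = np.asarray(p_long, dtype=float)
    p_short = np.asarray(p_short, dtype=float)
    return (w_long * p_long + w_short * p_short) / (w_long + w_short)


# Invented softmax outputs for three ECGs; columns are [n-AF, AF].
p_long = np.array([[0.20, 0.80], [0.65, 0.35], [0.90, 0.10]])
p_short = np.array([[0.50, 0.50], [0.30, 0.70], [0.80, 0.20]])

pred_weighted = fuse_probabilities(p_long, p_short).argmax(axis=1)         # 1 = AF
pred_equal = fuse_probabilities(p_long, p_short, 1.0, 1.0).argmax(axis=1)  # 1:1 weights
print(pred_weighted, pred_equal)  # prints [1 0 0] [1 1 0]
```

In this toy case, the second ECG flips from AF to n-AF when the long-term model receives the larger weight, which illustrates how the weighting ratio can change individual decisions.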
Furthermore, we present xFuseMap, which combines the explanations of both the long- and short-term models in a fused saliency map to provide, for the first time, a highly interpretable representation of relevant rhythm and morphology features in ECG analysis. To achieve this, we construct a two-dimensional color space by assigning each model a distinctive color. We use AF as the target class, as it is particularly suitable for our study due to b1) its clinical relevance and b2) its pathophysiology, and non-AF (n-AF) as the reference class to support the generalizability of our results: b1) AF is the most common clinically relevant cardiac arrhythmia globally 25, with a lifetime risk of 22–36% 26. If untreated, AF is associated with an up to 5-fold increase in morbidity, especially stroke, and an up to 2-fold increase in mortality 27. b2) AF is characterized by pathophysiological rhythmic and morphological changes in the ECG. It is defined by a disordered spread of excitation in the atria and an irregular transmission to the ventricles. Therefore, the P waves, which reflect atrial depolarization, are replaced by fibrillatory (F) waves that superimpose on the beat morphology 28. The irregular transmission to the ventricles is reflected in an absolute arrhythmia 29. By conducting, for the first time, a systematic and quantitative analysis of the causal relationship between DL explanations and clinically relevant ECG features on a large and diverse dataset, we enhance the comprehensibility of xAI by combining meaningful explanations with their interpretability. Furthermore, our approach makes it possible, for the first time, to account for morphological characteristics that are irregularly distributed across the ECG, such as F waves, which are of great clinical importance. Finally, xFuseMap offers, for the first time, the potential to present explanations originating from rhythm and morphology in a combined and interpretable manner that is oriented towards the clinical reading of ECGs.
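One simple way to realize such a two-dimensional color space is additive blending, sketched below in Python: each sample's color is its long-term rR times a blue base color plus its short-term rR times an orange base color. With the RGB values assumed here (not taken from the paper), samples relevant to neither model stay black and samples relevant to both turn pink, mirroring the color coding described for Fig. 2. The rR values are assumed to be normalized to [0, 1].

```python
import numpy as np

BLUE = np.array([0.0, 0.0, 1.0])    # assumed base color for the long-term model
ORANGE = np.array([1.0, 0.5, 0.0])  # assumed base color for the short-term model


def fuse_colors(rr_long, rr_short):
    """Map per-sample relative relevances (clipped to [0, 1]) to RGB colors, shape [n, 3]."""
    rr_long = np.clip(np.asarray(rr_long, dtype=float), 0.0, 1.0)
    rr_short = np.clip(np.asarray(rr_short, dtype=float), 0.0, 1.0)
    rgb = rr_long[:, None] * BLUE + rr_short[:, None] * ORANGE
    return np.clip(rgb, 0.0, 1.0)


# Four samples: relevant to neither, long-term only, short-term only, both.
colors = fuse_colors([0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0])
print(colors)  # rows: black, blue, orange, pink-magenta
```

To draw a fused map, the RGB array can be used as per-segment colors, for example via the `colors` argument of a matplotlib `LineCollection` built from consecutive sample pairs of the ECG.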
Our methods therefore shed light on the black box of DL-based approaches and pave the way for their implementation as diagnostic support in clinical practice.

2 Results

We applied xECGArch to 1,521 unseen, publicly available 10-second single-lead ECGs to classify AF vs. n-AF. The fusion of the decision explanations of the long- and short-term models of xECGArch in xFuseMap provides dissociable information about the relevance of ECG segments on two temporal levels. We demonstrate the relationship between model explanations and signal morphology or rhythm. For this purpose, we split the n-AF class into normal sinus rhythm (NSR) and other anomalies (O).

xECGArch classification performance

On an unseen dataset containing \(n\) = 1,521 ECGs, xECGArch reached an overall accuracy of 96.3% and an F1-score of 94.5% for classifying AF vs. n-AF. To achieve this, the long- and short-term models were weighted in a ratio of 1.669:1, improving the accuracy by 0.46 and the F1-score by 0.65 percentage points compared to xECGArch without optimized weights (see Table 2). xECGArch discriminated AF slightly better from NSR (F1 = 96.4%) than from O (F1 = 95.4%), each of which makes up 50% of the n-AF class. The long-term model outperformed the short-term model with F1 = 94.3% vs. F1 = 92.1% in classifying AF vs. n-AF. Combining the long-term and the short-term model in xECGArch with optimized weights improved the F1-score by 0.16 and 1.87 percentage points compared to the long-term and the short-term model alone, respectively, for AF vs. n-AF. The optimization did not change the sensitivity but increased the precision from 92.9% to 94.1%. ECGs with AF were detected equally well, but fewer n-AF ECGs were misclassified (details in Appendix A). This applies to discriminating AF from both NSR and O.

Table 2 Classification metrics for the long-term and the short-term model as well as xECGArch with and without optimized weights of both models that are combined in xECGArch.
The original classification task was atrial fibrillation (AF) vs. non-atrial fibrillation (n-AF). For more detailed insights, the classification metrics are also provided for n-AF split into normal sinus rhythm (NSR) and other anomalies (O), each containing 50% of the n-AF cases.

| Classifier | Classification task | Precision | Sensitivity | Specificity | Accuracy | F1 |
|---|---|---|---|---|---|---|
| Long-term model | AF vs. n-AF | 94.1% | 94.5% | 97.0% | 96.2% | 94.3% |
| | AF vs. NSR | 97.8% | 94.5% | 97.8% | 96.2% | 96.1% |
| | AF vs. O | 96.2% | 94.5% | 96.3% | 95.4% | 95.3% |
| Short-term model | AF vs. n-AF | 90.0% | 94.3% | 94.8% | 94.6% | 92.1% |
| | AF vs. NSR | 95.2% | 94.3% | 95.3% | 94.8% | 94.8% |
| | AF vs. O | 94.3% | 94.3% | 94.3% | 94.3% | 94.3% |
| xECGArch (1:1 weights) | AF vs. n-AF | 92.9% | 94.9% | 96.4% | 95.9% | 93.9% |
| | AF vs. NSR | 97.2% | 94.9% | 97.2% | 96.1% | 96.0% |
| | AF vs. O | 95.4% | 94.9% | 95.5% | 95.2% | 95.2% |
| xECGArch (optimized weights) | AF vs. n-AF | 94.1% | 94.9% | 97.0% | 96.3% | 94.5% |
| | AF vs. NSR | 98.0% | 94.9% | 98.0% | 96.5% | 96.4% |
| | AF vs. O | 96.0% | 94.9% | 96.1% | 95.5% | 95.4% |

Fusion of long- and short-term model explanations in a combined saliency map (xFuseMap)

We extracted the explanations of both xECGArch models in terms of the relative relevance (rR) of individual samples for the classification decision using deep Taylor decomposition (DTD) and visualized them using xFuseMap. Figure 2 shows the xFuseMaps for correctly classified a) AF, b) O (classified as n-AF; sinus rhythm with ventricular ectopics), and c) NSR (classified as n-AF). The colors represent the samples' rR according to DTD. Blue sections were particularly relevant for the long-term model, orange sections for the short-term model, and pink sections for both models. Black sections were not relevant for either model. The decision reliability is given for each model separately. In all three examples, xFuseMap indicates that QRS complexes, especially their right flanks, are primarily relevant to the long-term model, while the sequences in between are primarily relevant to the short-term model. An exception is the example for O, which is shown in Fig. 2 b).
The ventricular ectopics at 6.2 s and 9.3 s, including the immediately preceding beats, are primarily relevant to the short-term model and barely relevant to the long-term model. Furthermore, the remaining R peaks, which are in sinus rhythm, are colored purple because the left flanks up to the R peaks have an increased rR for the short-term model. While the QRS complexes in NSR are uniformly colored blue (Fig. 2 c)) or purple (Fig. 2 b)), the blue tone of the QRS complexes in AF (Fig. 2 a)) fluctuates irregularly between dark and light blue. This means that while the rR of the irregularly occurring QRS complexes in AF varies, the QRS complexes in NSR are all approximately equally relevant. In the case of n-AF, the flanks of the P wave (primarily the right flanks) are particularly relevant to the short-term model. The right flank of the T wave, the area between the T wave and the subsequent P wave, and the left flanks of ventricular ectopics are also partially colored orange and are therefore relevant for distinguishing n-AF from AF. Figure 2 a) shows that F waves, in this case especially their right flanks, are relevant for the short-term model for the detection of AF. This is best visible between 5.5 s and 6 s and between 7.8 s and 8.5 s.

Class-specific relation between model explanations and morphology

To systematically investigate the model explanations, we analyzed the rR distribution across 9 diagnostically relevant beat segments and F waves. To this end, we determined the mean rR of each beat segment type as well as its variability in terms of standard deviation (SD) within a recording. The descriptive statistics are presented as box plots in Fig. 3 a) and b) for the mean rR and in Fig. 3 c) and d) for the SD. Figure 3 a) and c) show the results of the long-term model, while Fig. 3 b) and d) show the results of the short-term model.
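The per-recording statistics can be sketched as follows in Python: for every annotated instance of a segment type (labels such as P, PQ, Q, R, S, ST, T, QT, and TQ, plus F waves), take the mean rR over its samples, then summarize the instance means of each type by their mean and SD. The annotation format `(label, start, stop)` and the use of the sample SD (`ddof=1`) are assumptions, and the delineation of the segments themselves is not covered here.

```python
from collections import defaultdict

import numpy as np


def segment_relevance_stats(rr, segments):
    """
    rr: 1-D array with the relative relevance (rR) of every sample of one recording.
    segments: iterable of (label, start, stop) sample indices, stop exclusive,
              e.g. ("R", 1010, 1030). Overlapping segments (e.g. QT) are allowed.
    Returns {label: (mean of instance means, SD of instance means)}.
    """
    rr = np.asarray(rr, dtype=float)
    instance_means = defaultdict(list)
    for label, start, stop in segments:
        if stop > start:  # skip empty annotations
            instance_means[label].append(rr[start:stop].mean())

    stats = {}
    for label, means in instance_means.items():
        values = np.asarray(means)
        sd = values.std(ddof=1) if values.size > 1 else float("nan")
        stats[label] = (float(values.mean()), float(sd))
    return stats


rr = np.array([0.0, 0.0, 1.0, 1.0, 0.5, 0.5, 0.0, 0.0])
segs = [("P", 0, 2), ("R", 2, 4), ("R", 4, 6)]
print(segment_relevance_stats(rr, segs))
```

Collecting these per-recording values for all recordings of a class yields the kind of distributions summarized as box plots in Fig. 3.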
Two-factor analyses of variance (ANOVAs), performed separately on the mean rR and the SD of the rR of each model, showed significant differences (\(p<0.001\)) between the classes, between the segments, and for the interaction of both factors. The residuals of the ANOVAs were not normally distributed (\(p<0.001\)). For clarity, Fig. 3 shows only the Tukey-Kramer-corrected significance levels for the class differences within the same segment type. An overview of all results of the post-hoc analyses can be found in Appendix B. Across all classes, the R peaks were the most important segments for the long-term model, with median mean rR values of 0.609, 0.754, and 0.760 for AF, O, and NSR, respectively. R was followed by the S and Q segments, with the median of the mean rR ranging from 0.341 for the Q segment in AF to 0.521 for the S segment in NSR. The mean rR in Q, R, and S for the long-term model was significantly lower (\(p<0.001\)) in AF than in O or NSR. However, for AF, the SD of the mean rR per segment over a recording was significantly (\(p<0.001\)) increased, by 59.9% (in R) to 67.4% (in S) compared to O and by 69.2% (in Q) to 80.2% (in S) compared to NSR. The median mean rR of the remaining segments for the long-term model was significantly (\(p<0.001\)) lower, ranging from 0.085 to 0.180, with slightly increased rR values in QT (for all classes) as well as in P and PQ (for O and NSR). There were no significant differences between O and NSR in the mean rR per segment except for the PQ segment, for which the rR in NSR significantly (\(p<0.001\)) exceeded that in O. The same applies to the SD of the mean rR per segment, except for Q, R, and S, where the variability in O significantly exceeded (\(p<0.001\)) that in NSR. Differences in rR between individual segments were more subtle in the short-term model than in the long-term model.
The most important segments for the short-term model were the P waves, with median mean rR values of 0.449 and 0.424 for O and NSR, respectively. The P wave was followed by the R peak (all classes), the TQ segment (especially in O and NSR), and the F waves (AF only), with the median mean rR ranging from 0.317 for F waves in AF to 0.376 for TQ segments in O. However, for AF, the SD was significantly increased (\(p<0.001\)), by 90.7% (in S) to 213.4% (in TQ) compared to O and by 137.8% (in S) to 332.9% (in T) compared to NSR.

Relation between model explanations and rhythm

To investigate whether rhythm information is reflected in the model explanations, we conducted correlation analyses (see Table 3). Positive and negative correlations were tested between absolute changes in consecutive RR intervals \(\left|\delta RR\right|\) and absolute changes of segment-specific rR in consecutive beats \(\left|\delta rR\right|\), at both the beat-to-beat level and the recording level. The classification of correlation strengths is based on Cohen 30. At the beat-to-beat level, we found no correlations for the short-term model, but significant weak correlations between \(\left|\delta RR\right|\) and \(\left|\delta rR\right|\) in ECGs with AF and O for the long-term model in the Q, R, and S segments, with \(r=0.120\) to \(r=0.160\) (\(p<0.001\)), peaking in AF. Furthermore, weak correlations were found for the ST segment in AF (\(r=0.109\), \(p<0.001\)) and for the P and PQ segments in O (\(r=0.127\) and \(r=0.129\), \(p<0.001\)). At the recording level, we found significant (\(p<0.001\)) weak to strong correlations between the mean absolute differences in consecutive RR intervals \(\overline{\left|\delta RR\right|}\) and the mean absolute changes of segment-specific rR in consecutive beats \(\overline{\left|\delta rR\right|}\) across all classes.
The strongest correlations, up to \(r=0.635\) (\(p<0.001\); class: O, model: long-term, segment: Q), were found for QRS complexes regardless of class. However, these correlations were prominent only in the long-term model, whereas in the short-term model, correlations with the \(\overline{\left|\delta rR\right|}\) of the P wave and the TQ segment were most prominent. Furthermore, the correlation between \(\overline{\left|\delta RR\right|}\) and \(\overline{\left|\delta rR\right|}\) differed between classes. In AF, \(\overline{\left|\delta RR\right|}\) and \(\overline{\left|\delta rR\right|}\) were moderately correlated for Q, R, and S in the long-term model, with \(r=0.336\) to \(r=0.415\), and weakly correlated for ST and QT, with \(r=0.178\) and \(r=0.243\) (\(p<0.001\)). In the short-term model, we found a weak negative correlation for TQ (\(r=-0.133\), \(p<0.01\)). For class O, we found moderate to strong correlations across all segments and models. In the long-term model, correlations with the \(\overline{\left|\delta rR\right|}\) of QRS complexes, P waves, and PQ segments stood out, with \(r=0.591\) to \(r=0.635\) (\(p<0.001\)). In the short-term model, however, we found the strongest correlations for the P wave (\(r=0.590\)) and the TQ segment (\(r=0.596\)), but only moderate correlations for the QRS complex (\(r=0.405\) to \(r=0.439\), \(p<0.001\)). In NSR ECGs, a similar pattern emerged, with correlations of \(r=0.338\) and \(r=0.349\) (\(p<0.001\)) for the P wave and the TQ segment, respectively, in the short-term model versus \(r=0.009\) (\(p\ge 0.05\)) to \(r=0.183\) (\(p<0.001\)) for the remaining segments. In the long-term model, correlations were weak across segments, with the maximum for the QRS complex (\(r=0.267\) to \(r=0.293\), \(p<0.001\)).

3 Discussion

We have proposed an approach for the systematic analysis of DL model explanations in ECG analysis.
Compared to previous approaches that use static segmentation of beats 16 or RR intervals 13,20, our approach allows for the quantitative analysis of the relevance of diagnostically relevant segments, which can be both beat-based and, as exemplified by F waves, irregularly distributed across the signal. Compared to pseudo-quantitative template-based approaches 14,15, we obtain rhythm information in addition to information on the variability of morphology and relevance. Moreover, our approach enables quantitative investigations of group differences using statistical methods. To ensure the trustworthiness and interpretability of DL explanations, it is crucial to verify the consistency of the features used by DL models, in addition to their agreement with clinically used ECG characteristics and with the diagnostic criteria of the classification problem. Our approach verifies all three requirements. To investigate the relationship between automatically learned features and diagnostic criteria, we used AF as an example. The main characteristics of AF are the replacement of P waves by superimposed F waves due to uncoordinated atrial excitation and absolute tachyarrhythmia due to irregular conduction to the ventricles 29,31. For both models of xECGArch, long- and short-term, we analyzed the mean relevance within ECG segments and its variability over a recording. Based on the hypothesis that the long-term model primarily uses rhythm information while the short-term model analyzes signal morphology, we expected the following: c1) the long-term model focuses on QRS complexes, which are particularly suitable for rhythm analysis, as observed in previous studies 14,15,23, and c2) the short-term model focuses on segments that differ in morphology between AF and n-AF. c1) We assumed irregular, closely spaced QRS complexes to be relevant to the long-term model for AF detection, while regular QRS complexes are relevant for n-AF detection.
Regardless of class, Q, R, and S are the most relevant segments for the long-term model (see Fig. 3). The mean rR in all three segments is lower for AF than for n-AF, but the SD is significantly higher. This shows that the relevance of the QRS complexes fluctuates for the detection of AF, whereas it is more evenly distributed for the detection of n-AF, which is supported by the exemplary ECGs in Fig. 2. c2) We expected the P waves to be particularly relevant to the short-term model for n-AF detection and the F waves for AF detection. F waves vary in prominence depending on the ECG lead and the subject and therefore cannot always be observed. However, they superimpose on the remaining signal and lead to irregular deformation of characteristic segments 28, e.g., the TQ segment. As anticipated, according to Fig. 2 and Fig. 3, the TQ segment and F waves are particularly relevant for the short-term model for AF detection, and P waves for n-AF. The enhanced relevance of the TQ segments for n-AF detection can be explained by the position of the P wave within the segment. The R wave has an increased relevance for both classes, which can be attributed to either its rhythmic or its morphological information content. Moreover, the distance to the subsequent QRS complex is relevant for distinguishing P waves from F waves. The increased SD of the rR across a recording in AF compared to n-AF supports our hypothesis that F waves lead to irregular morphology and, consequently, to irregular relevance in all segments. To assess the ability of the employed models to learn rhythm information, we examined the relationship between changes in RR intervals \(\left|\delta RR\right|\) and beat-to-beat changes of segment-specific rR in consecutive beats \(\left|\delta rR\right|\).
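The two correlation levels can be sketched as follows: within each recording, take absolute successive differences of the RR intervals and of one segment's per-beat rR, pool all beat pairs of a class for the beat-to-beat correlation, and correlate the per-recording means for the recording-level correlation. The sketch assumes Pearson's r (computed with `numpy.corrcoef`), one rR value per RR interval so that both difference series align, and Cohen's conventional thresholds of 0.1, 0.3, and 0.5 for weak, moderate, and strong; it omits the significance tests reported in the Results.

```python
import numpy as np


def abs_successive_diffs(x):
    """|x[i+1] - x[i]| for a 1-D sequence."""
    return np.abs(np.diff(np.asarray(x, dtype=float)))


def cohen_label(r):
    """Label a correlation coefficient with Cohen's thresholds 0.1 / 0.3 / 0.5."""
    a = abs(r)
    if a >= 0.5:
        return "strong"
    if a >= 0.3:
        return "moderate"
    if a >= 0.1:
        return "weak"
    return "negligible"


def rhythm_relevance_correlations(recordings):
    """
    recordings: list of (rr_intervals, seg_rel) pairs of equal length per recording;
                seg_rel[i] is the (assumed) segment rR of the beat ending interval i.
    Returns (beat-to-beat r, recording-level r), both Pearson coefficients.
    """
    beat_x, beat_y, rec_x, rec_y = [], [], [], []
    for rr_intervals, seg_rel in recordings:
        d_rr = abs_successive_diffs(rr_intervals)  # |delta RR|
        d_rel = abs_successive_diffs(seg_rel)      # |delta rR|
        beat_x.extend(d_rr)
        beat_y.extend(d_rel)
        rec_x.append(d_rr.mean())                  # mean |delta RR| of the recording
        rec_y.append(d_rel.mean())                 # mean |delta rR| of the recording
    r_beat = np.corrcoef(beat_x, beat_y)[0, 1]
    r_rec = np.corrcoef(rec_x, rec_y)[0, 1]
    return float(r_beat), float(r_rec)


# Synthetic example: segment rR loosely tracks the RR intervals.
rng = np.random.default_rng(1)
recs = []
for _ in range(20):
    rr_int = 0.8 + 0.2 * rng.standard_normal(12)
    recs.append((rr_int, 0.5 * rr_int + 0.05 * rng.standard_normal(12)))
r_beat, r_rec = rhythm_relevance_correlations(recs)
print(r_beat, cohen_label(r_beat), r_rec, cohen_label(r_rec))
```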
Correlation analyses at the beat-to-beat level revealed weak correlations (\(r>0.1\), \(p<0.001\)) between \(\left|\delta RR\right|\) and \(\left|\delta rR\right|\) in the Q, R, and S segments in AF and O, but not in NSR, for the long-term model only. This supports our hypothesis that, in contrast to the short-term model, the long-term model learns rhythm and uses this information for classification, with large \(\left|\delta rR\right|\) indicating AF and small \(\left|\delta rR\right|\) indicating the absence of AF. This is further supported by the color-coding of equidistant QRS complexes as highly relevant for the classification of n-AF in Fig. 2 b), while ventricular ectopics were marked as irrelevant to the long-term model but relevant to the short-term model. One potential explanation for the absence of stronger correlations at the beat-to-beat level is a possibly complex relationship between rhythm and model explanations, which results from the model complexity and cannot be captured at this level. A second correlation analysis at the recording level revealed weak to strong correlations between \(\overline{\left|\delta RR\right|}\) and \(\overline{\left|\delta rR\right|}\), depending on the class. Consistently, the strongest correlations were found for the long-term model, peaking in the QRS complexes for AF and O. Therefore, rhythm does not appear to be a significant factor for the short-term model in the classification of AF. For class O, all correlations were moderate to strong regardless of model and segment. This can be explained by the fact that rhythm changes relevant for the long-term model, e.g., due to extrasystoles, are mostly accompanied by morphological beat changes, which in turn are relevant for the short-term model.
In accordance with the hypothesis that a regular rhythm is indicative of n-AF, we observed weak correlations between \(\overline{\left|\delta RR\right|}\) and \(\overline{\left|\delta rR\right|}\) of the long-term model for NSR, which aligns with the continuous coloring of the equidistant QRS complexes in Fig. 2 b)-c). For the TQ segments and, except in AF, the P waves within them, we observed an increased correlation with \(\overline{\left|\delta RR\right|}\) for the short-term model across all classes; for AF, this correlation is negative. With increasing RR variability within a recording, the rR across these segments therefore becomes more homogeneous for AF and more heterogeneous for n-AF. Correlation analyses do not provide any information about causal relationships. However, possible explanations are, for AF, the poorer detectability of F waves in highly tachycardic, pseudo-regularized rhythms and, for n-AF, the autonomic coupling of jointly occurring changes in atrial depolarization (P wave) and the RR interval 32, or pathophysiological correlations. In summary, both correlation analyses indicate that rhythmicity is reflected in the relevance of the QRS complex for the long-term model, but not for the short-term model. To provide diagnostic support, automated ECG analysis must be trustworthy, i.e., the classifier's decisions must be accurate and comprehensible. Comprehensibility combines the provision of meaningful explanations of the decision-making process with their interpretability for physicians. xAI approaches aim at explaining DL-based decisions. However, their reliability has to be verified for each scenario in a systematic comparison of xAI methods, which is usually not carried out 15. We applied xECGArch to a large, diverse dataset derived from 4 public databases.
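The metrics compared here and in Table 2 follow the standard confusion-matrix definitions for a binary task with AF as the positive class. The Python sketch below computes them from hypothetical counts, not from the study's data; note that F1 is the harmonic mean of precision and sensitivity and therefore equals \(2TP/(2TP+FP+FN)\).

```python
def binary_metrics(tp, fp, tn, fn):
    """Precision, sensitivity, specificity, accuracy and F1 with AF as the positive class."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # recall, true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "accuracy": accuracy, "f1": f1}


# Hypothetical counts: 100 AF and 190 n-AF ECGs.
m = binary_metrics(tp=90, fp=10, tn=180, fn=10)
print({k: round(100 * v, 1) for k, v in m.items()})
```

Sensitivity depends only on the AF ECGs, so a change that only reduces false positives, as reported for the optimized weighting of xECGArch, raises precision and specificity while leaving sensitivity unchanged.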
Overall, with an accuracy of 96.3%, compared to 95.3% 12 to 99.6% 33, and an F1-score of 94.5%, compared to 80.7% 13 to 93.1% 12, xECGArch is in the upper range of state-of-the-art AF detection algorithms and therefore fulfills the need for accurate classification. Given this high classification accuracy, the DTD explanations, which were the most reliable explanations for xECGArch in a systematic comparison 15, can be assumed to be meaningful. Interpretability is ensured by displaying the explanations using xFuseMap and by systematically validating their agreement with clinically relevant ECG characteristics. With xFuseMap, we present for the first time an approach to visualize the explanations of two models with different focus in a combined saliency map as dissociable information. We show that the explanations are consistent and, owing to the design of xECGArch, which follows the clinical reading of ECGs, divided into rhythm and morphology. In this way, we not only provide meaningful explanations for the model decision but also separate them, in line with the usual reading of biosignals, according to their rhythmic or morphological relevance in order to support diagnostics as effectively as possible. The principle of xECGArch, the combined representation of both model explanations using xFuseMap, and the validation of the explanations by extracting the relevance values within diagnostically relevant segments can be transferred to other diseases or biosignals. This, however, requires adapting xECGArch and choosing segments for validation that match the classification problem. xFuseMap is applicable to the model explanations of any classification problem based on continuous signals that is solved by two classifiers operating in different domains.
The combined representation enables, for the first time, the assignment of relevance information to a specific domain, in this case rhythm or morphology, which has a significant impact on the interpretability of explanations. This is useful not only for diagnostic support, but also opens up new possibilities for applications in research and teaching.
4 Methods
We used the pre-trained xECGArch 15 to classify 1,521 unseen ECGs into AF and n-AF. Subsequently, model explanations were extracted as each sample's relevance for the classification using DTD for both the long-term and the short-term model. The model explanations were combined in xFuseMap to present the long- and short-term relevance information, illustrating the impact of rhythmic and morphological characteristics on the classification decision of xECGArch. To validate the model-dependent explanations, we developed a generalizable method to quantify the relevance of diagnostically important ECG segments. Subsequently, we statistically examined the relationship of the model explanations with morphological characteristics and rhythm. The classification with xECGArch and the subsequent extraction of the model explanations were conducted in Python 3.9.19 and TensorFlow 2.12. All other operations were performed in Matlab R2021b (MathWorks Inc., Natick, MA, USA).
Data material
We used the unseen training dataset from xECGArch 15 and increased the number of NSR (\({n}_{NSR}\)) and O (\({n}_{O}\)) ECGs using additional unseen data from the same databases to match the number of AF (\({n}_{AF}\)) ECGs (\({n}_{AF}={n}_{NSR}={n}_{O}=507\)), thus creating a balanced dataset. This enabled a statistical examination of differences between AF, NSR, and O in the distribution of relevance values. Although the classes NSR and O were combined rather than considered separately during training, differences in the explanations for both classes could indicate the methodology's transferability.
In total, we used 1,521 ECGs from 4 public databases, accessible via PhysioNet 34. These included the China Physiological Signal Challenge 2018 (CPSC2018) database 35, the Chapman-Shaoxing (ChapShao) database 36,37, the Georgia 12-lead ECG Challenge (Georgia) database 38,39, and the XL database from the Physikalisch-Technische Bundesanstalt (PTB-XL) 5,40. As the majority of the recordings in these databases are 10 seconds long, only ECGs of at least this length were used; shorter ECGs were discarded, and longer ECGs were clipped to their middle 10 seconds. The distribution of age and sex among the databases and classes is given in Table 4. To ensure transferability to future applications in wearables and mobile devices, as used for example in the TIMELY project 41,42, we restricted our analyses to single-lead ECGs. F waves, one of the main characteristics of AF, are best seen in leads II and V1, since the positions and axes of these leads are most suitable for measuring the spatial excitation of both atria 43,44. Since leads from mobile devices are usually based on the Einthoven limb leads, we chose lead II for our analyses.
Table 4 Data composition in terms of age and sex, divided according to source database and class. AF, atrial fibrillation; CPSC, China Physiological Signal Challenge; f, female; m, male; NSR, normal sinus rhythm; O, others; PTB, Physikalisch-Technische Bundesanstalt; SD, standard deviation.
| Criteria | n | Age, mean ± SD (years) | Sex, m / f (%) |
|---|---|---|---|
| Overall | 1,521 | 62.3 ± 17.0 | 54 / 46 |
| Source databases | | | |
| Chapman/Shaoxing 36 | 412 | 63.9 ± 15.7 | 53 / 47 |
| CPSC2018 35 | 360 | 62.7 ± 19.2 | 58 / 42 |
| Georgia 38,39 | 370 | 62.4 ± 15.2 | 57 / 43 |
| PTB-XL 5,40 | 379 | 60.0 ± 17.7 | 50 / 50 |
| Classes | | | |
| AF | 507 | 71.6 ± 12.3 | 58 / 42 |
| NSR | 507 | 54.2 ± 16.2 | 45 / 55 |
| O | 507 | 61.0 ± 17.3 | 57 / 43 |

Model parameterization
xECGArch 15 is an architecture for automated ECG analysis comprising two parallel 1D CNNs and a combined decision-making process based on the clinical reading of ECGs. Both CNNs share the same architecture; however, the long-term model considers the entire signal duration at once, making it more sensitive to rhythmicity, whereas the short-term model has a short observation period of 0.6 seconds, making it more sensitive to morphological changes within beats. The long-term and the short-term CNN each analyze and classify an ECG individually. For the combined decision, the decision reliability values of both models for and against AF are combined by a weighted average, and the class with the highest averaged value is taken as the classification result. This accounts for the fact that, depending on the specific CVD, rhythm and morphology may be affected differently. The weights were determined by maximizing the combined F1 score. Various xAI methods have previously been applied to ECG classification (see Table 1). However, our recent findings indicate that the suitability of xAI methods may be case-dependent and that the xAI method must therefore be carefully selected to obtain reliable explanations 15. For xECGArch, we investigated the reliability of 13 different xAI methods by gradually perturbing and reclassifying the signals. DTD exhibited the steepest decline in classification performance and was therefore selected as the most reliable xAI method for this scenario.
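The combined decision described above, a weighted average of both models' class scores with weights chosen to maximize the F1 score, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes one weight per model that sums to one, scores ordered as [n-AF, AF], a plain grid search, and the F1 score of the AF class as the optimization target. The function names are hypothetical.

```python
import numpy as np


def fuse_decisions(p_long, p_short, w_long):
    """Weighted average of both models' class scores; returns 0 (n-AF) or 1 (AF).

    p_long, p_short: arrays of shape (n_ecgs, 2) with scores ordered [n-AF, AF].
    w_long: weight of the long-term model; the short-term model gets 1 - w_long.
    """
    fused = w_long * np.asarray(p_long) + (1.0 - w_long) * np.asarray(p_short)
    return np.argmax(fused, axis=1)  # class with the highest averaged score


def f1_af(y_true, y_pred):
    """F1 score with AF (label 1) as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


def select_weight(p_long, p_short, y_true, grid=None):
    """Grid search for the long-term weight that maximizes the fused F1 score."""
    grid = np.linspace(0.0, 1.0, 101) if grid is None else grid
    scores = [f1_af(y_true, fuse_decisions(p_long, p_short, w)) for w in grid]
    best = int(np.argmax(scores))  # first weight that reaches the maximum
    return float(grid[best]), scores[best]
```

In this toy setting, neither model alone need be optimal; an intermediate weight can correct the errors of both, which is the motivation for a combined decision.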
The concept underlying DTD is the redistribution of the model output to the input components via relevance propagation rules. The relevance of a neuron is obtained by summing the relevance shares it receives from all neurons of the following layer to which it is connected. This procedure is repeated layer by layer for each neuron, down to the input layer. In contrast to some other methods, DTD only indicates the relevance of individual samples for, and not against, the assignment to a class. The DTD explanations, i.e., the sample-wise relevance for the classification as AF or n-AF, were calculated using the iNNvestigate 2.0 toolbox 45 and subsequently normalized to the range between 0 and 1.
Model explanation fusion (xFuseMap)
Since both rhythm and morphology are pertinent to ECG analysis and the detection of CVDs such as AF, a trustworthy algorithm should use information from both domains and present relevant areas in a dissociable manner. With xFuseMap, we integrate the relevance information from both domains into a combined saliency map. To ensure interpretability, we color-code the origin of the relevance information in terms of rhythm or morphology by assigning a color to each model. A two-dimensional color space is created with the rR of each model on one axis (see Fig. 4). To achieve a high color contrast, we chose blue for the long-term model and orange for the short-term model. Areas that are relevant to neither or to both models are colored black or pink, respectively. In the areas in between, the colors blend smoothly. A dark gray background improves the visual recognition of relevant areas compared to a light background. Other color combinations are conceivable. By multiplying the rR values by the models' decision certainties, we account for the reduced significance of uncertain decisions in the combined representation.
In addition, the line thickness of each sample in the saliency map is scaled with the maximum relevance value of both models for that sample. Fusing the model explanations in xFuseMap therefore requires the following steps: (i) applying xECGArch or two other models with dissociable foci, (ii) extracting and normalizing the relevance values of both models for the majority class, and (iii) visualizing the combined rR of both models by sample-wise color-coding according to xFuseMap and by adjusting the line thickness according to the merged rR. Figure 4 shows an example saliency map for the combined explanation of both models in xECGArch for the classification of an AF ECG using xFuseMap. The peaks of the second and fifth QRS complexes (rectangular marking) are highly relevant for the long-term model and hardly relevant for the short-term model; they are therefore located at the bottom right of the colormap and highlighted in clear blue with a thick line in the saliency map. The remaining QRS complexes (triangular marking) show a lower relevance for the long-term model and still hardly any for the short-term model. Their relevance values are shifted to the left in the colormap, and the corresponding points in the saliency map are colored dark blue. Due to the lower rR, the line for these QRS complexes is thinner. F waves, which partially overlap the ST segment or the T wave (circular marking), are clearly recognizable in the ECG. They are colored orange and highlighted by a thick line, which means that the F waves are highly relevant for the decision, especially for the short-term model and less so for the long-term model; their rR values are therefore positioned at the top left of the colormap.
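The color coding and line-width scaling of xFuseMap can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the four corner colors (black, blue, orange, pink) are given hypothetical RGB values, the smooth blending is assumed to be bilinear, each model's relevance is assumed to be min-max normalized, decision certainties are assumed to lie in [0, 1], and the line-width range (`lw_min`, `lw_max`) is a hypothetical choice.

```python
import numpy as np

# Hypothetical RGB corner colors in [0, 1]; the text names the hues, not exact values.
BLACK = np.array([0.0, 0.0, 0.0])    # relevant to neither model
BLUE = np.array([0.0, 0.45, 1.0])    # long-term model only (rhythm)
ORANGE = np.array([1.0, 0.55, 0.0])  # short-term model only (morphology)
PINK = np.array([1.0, 0.4, 0.8])     # relevant to both models


def minmax(values):
    """Scale relevance values to [0, 1]; a constant input maps to zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    return (values - values.min()) / span if span > 0 else np.zeros_like(values)


def fuse_map(rel_long, rel_short, cert_long, cert_short, lw_min=0.5, lw_max=3.0):
    """Per-sample RGB colors and line widths for a fused saliency map.

    rel_long, rel_short: sample-wise relevance of each model (same length).
    cert_long, cert_short: decision certainties in [0, 1] that down-weight
    uncertain decisions.
    """
    x = minmax(rel_long) * cert_long    # horizontal axis: long-term rR
    y = minmax(rel_short) * cert_short  # vertical axis: short-term rR
    x, y = x[:, None], y[:, None]       # column vectors for broadcasting over RGB
    # Bilinear blend of the four corner colors of the 2-D color space.
    colors = ((1 - x) * (1 - y) * BLACK + x * (1 - y) * BLUE
              + (1 - x) * y * ORANGE + x * y * PINK)
    # Line width grows with the larger of the two certainty-weighted rR values.
    widths = lw_min + (lw_max - lw_min) * np.maximum(x, y).ravel()
    return colors, widths
```

The returned per-sample colors and widths could then be drawn as colored line segments over the ECG on a dark gray background; plotting is omitted to keep the sketch self-contained.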
Investigating the class-specific relation between model-dependent relevance and diagnostic criteria
We quantified the relevance of specific ECG segments using a template-based approach to investigate the class-specific relationship between model-dependent relevance and diagnostic criteria, such as class-specific rhythm or morphology.
A generalizable method for quantifying the relevance of specific ECG segments
To quantitatively evaluate the explanations, we determined for each model the mean rR of all samples within the ECG segments shown in Fig. 5, as well as its variability in terms of the standard deviation (SD) of the segment-averaged rR within a recording. Each ECG was processed separately. First, the ECGs were band-pass filtered between 0.3 and 120 Hz to remove artifacts and high-frequency noise, and notch filtered at 50 or 60 Hz, depending on the source database, to remove power-line interference 46,47. Second, to robustly detect the fiducial points of each beat, we applied iterative two-dimensional signal warping (i2DSW) 48,49, which transfers the fiducial points of a template beat to every single beat by iteratively fitting the template beat to each individual beat. For this purpose, QRS complexes were automatically detected 50,51, beats were extracted 50, and template beats were generated by averaging the beats 48,49. Fiducial points were detected using the ECGdeli toolbox 46, checked by an expert, and corrected manually if necessary. Subsequently, 9 segments were defined, within which the relevance information was averaged beat-wise, as shown in Fig. 5. The Q and S segments were defined as the time between the R peak and the Q or S peak, respectively. The TQ segment was defined as the time between the offset of the T wave, \({T}_{Off}\), of beat \(i\) and the Q peak of the following beat \(i+1\). It could not be calculated for the last beat, as there is no following beat.
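The beat-wise averaging of relevance within segments can be sketched as follows. The sketch assumes that fiducial points are already available as sample indices per beat (the dictionary keys `Q`, `R`, `S`, and `Toff` are hypothetical) and that segment boundaries are inclusive. Only the Q, S, and TQ segments defined above are shown; the remaining segments of Fig. 5 would follow the same pattern. Using the sample SD (`ddof=1`) is also an assumption.

```python
import numpy as np


def segment_bounds(fid, name):
    """Per-beat (start, stop) sample indices, inclusive, for one segment type.

    fid: dict of equally long integer arrays with one entry per beat; the keys
    "Q", "R", "S" (peaks) and "Toff" (T-wave offset) are hypothetical.
    """
    if name == "Q":   # between Q peak and R peak
        return list(zip(fid["Q"], fid["R"]))
    if name == "S":   # between R peak and S peak
        return list(zip(fid["R"], fid["S"]))
    if name == "TQ":  # T offset of beat i to Q peak of beat i+1; none for the last beat
        return list(zip(fid["Toff"][:-1], fid["Q"][1:]))
    raise ValueError(f"unknown segment type: {name!r}")


def segment_relevance(rel, bounds):
    """Beat-wise mean rR per segment, mean rR of all segment samples, and SD.

    rel: sample-wise relative relevance of one model for one recording.
    Returns (beat_means, mean_all_samples, sd_of_beat_means).
    """
    rel = np.asarray(rel, dtype=float)
    parts = [rel[a:b + 1] for a, b in bounds if b >= a]  # skip invalid bounds
    beat_means = np.array([p.mean() for p in parts])
    mean_all = float(np.concatenate(parts).mean())
    sd = float(beat_means.std(ddof=1)) if len(beat_means) > 1 else float("nan")
    return beat_means, mean_all, sd
```

Running these functions per recording, model, and segment type yields the two metrics (mean rR and SD of the segment-averaged rR) that enter the statistical analyses.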
We proceeded identically for F waves, which we additionally detected by applying the Matlab function findpeaks to beat-corrected and filtered ECGs 52. Beat correction was achieved by subtracting the beat-adjusted i2DSW templates from the signal. Subsequently, the beat-corrected signal was band-pass filtered between 5 and 10 Hz.
Statistical analysis of the class-specific relation between model explanations and morphology
The objective of our study was to demonstrate that xECGArch uses information that aligns with clinical knowledge for the detection of AF, and that the short-term model, in contrast to the long-term model, primarily uses morphology features. We therefore considered class- and model-dependent differences in the mean rR as well as in the SD of the segment-averaged rR and tested for statistical significance using a two-factor ANOVA for each metric and each model separately, with segment and class as independent factors and relevance as the dependent variable. Only correctly classified ECGs were included in the ANOVA. Furthermore, outliers were identified and removed using the Matlab function isoutlier, which defines outliers as values more than three scaled median absolute deviations away from the median. The residuals were tested for normality using the Kolmogorov-Smirnov test; however, the execution of the ANOVA did not depend on the result, as the ANOVA is considered robust against violations of normality 53. Subsequently, multiple Student's t-tests were applied as post-hoc analysis to identify group-specific differences, with Tukey-Kramer correction for multiple testing to address alpha error accumulation.
Statistical analysis of the relation between model explanations and rhythm
Two analyses were conducted to examine the models' ability to use rhythm information for classification.
First, at the beat-to-beat level, we examined whether the altered rhythm observed in AF compared to O and NSR directly translates into a fluctuating rR within segments of the same type. Second, at the recording level, we examined whether there is a more complex relation between RR variability and the variability of rR within similar segments that cannot be captured at the beat-to-beat level but only at a more abstract level. We hypothesized that a change in RR intervals results in a change in the rR of specific segments. Consequently, we considered the absolute difference between consecutive RR intervals (\(\left|\delta RR\right|\)) and the absolute difference of the segment-specific rR in consecutive beats (\(\left|\delta rR\right|\)) as measures of spontaneous changes in rhythm and rR, respectively. Note that we did not exclude abnormal beats. Therefore, \(\left|\delta RR\right|\) as a measure of rhythm changes should not be confused with heart rate variability features, which measure the variability of the intervals between normal beats. Furthermore, F waves were not included in these analyses, as they are not directly related to the excitation of the ventricles and therefore cannot be assigned to individual beats in the ECG. For the beat-to-beat analysis, we computed the correlation between the Euclidean-normalized \(\left|\delta RR\right|\) and \(\left|\delta rR\right|\) over all recordings of each class, separately for the long-term and the short-term model. We used Spearman's rank correlation because the weighted interconnections between neurons enable NNs to learn complex, non-linear relationships 54; a rank-based measure captures such relationships as long as they are monotonic. The \(\left|\delta RR\right|\) time series were normalized recording-wise using the Euclidean norm to eliminate the influence of the absolute RR interval length, allowing a relative consideration of RR changes.
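The beat-to-beat analysis can be sketched in Python, assuming that the RR intervals and the segment-averaged rR values of one segment type have already been aligned beat by beat, and that the changes of all recordings in a class are pooled before the correlation is computed. Spearman's rank correlation is computed here as the Pearson correlation of average ranks (ties share the mean rank), which is its standard definition. The function names are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """Rank values starting at 1; tied values receive the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(a, b):
    """Spearman's rank correlation as the Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def beat_level_changes(rr_intervals, seg_rel):
    """Return (|dRR| normalized by its Euclidean norm, |drR|) for one recording.

    rr_intervals and seg_rel must be aligned beat by beat (assumption).
    """
    rr_intervals = np.asarray(rr_intervals, dtype=float)
    seg_rel = np.asarray(seg_rel, dtype=float)
    if rr_intervals.shape != seg_rel.shape:
        raise ValueError("RR intervals and segment rR must be aligned")
    d_rr = np.abs(np.diff(rr_intervals))
    norm = np.linalg.norm(d_rr)
    if norm > 0:
        d_rr = d_rr / norm  # removes the influence of the absolute RR length
    d_rel = np.abs(np.diff(seg_rel))
    return d_rr, d_rel


def class_spearman(recordings):
    """Pool beat-level changes of all (rr_intervals, seg_rel) pairs of a class."""
    changes = [beat_level_changes(rr, rel) for rr, rel in recordings]
    d_rr = np.concatenate([c[0] for c in changes])
    d_rel = np.concatenate([c[1] for c in changes])
    return spearman_rho(d_rr, d_rel)
```

Because only ranks enter the correlation, any monotonic (even strongly non-linear) dependence of \(\left|\delta rR\right|\) on \(\left|\delta RR\right|\) yields a coefficient close to ±1.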
For the analysis at the recording level, we calculated the correlation between the mean \(\left|\delta RR\right|\) (\(\overline{\left|\delta RR\right|}\)) and the mean \(\left|\delta rR\right|\) (\(\overline{\left|\delta rR\right|}\)) across all recordings of a class, separately for each segment type. Given the significance of absolute arrhythmia in AF, we postulated that an increased variability of the RR interval duration would be reflected in the model explanations, particularly in the long-term model, and would be expressed in an increased variability of the relevance of certain segments, particularly the QRS complex. Consequently, we expected \(\overline{\left|\delta RR\right|}\) to be linearly reflected in \(\overline{\left|\delta rR\right|}\) and thus employed Pearson's correlation analysis.
Declarations
Data Availability
For maximum comparability and reproducibility, only public databases were used for this study. The saliency maps for the entire dataset and all statistics can be found in Appendices A–C. The associated relevance data and the segment boundaries will be made available upon publication.
Code Availability
The Matlab scripts used for the statistical analysis, based on the relevance data and segment boundaries, will be made publicly available upon publication.
Acknowledgements
This study was partly supported by grants from the European Union's Horizon 2020 research and innovation program (TIMELY, No. 101017424).
Author contributions
A.H. and M.S. designed the study concept. A.H., M.G., and M.S. made substantial contributions to the data preparation. M.G. implemented xECGArch and extracted model explanations. A.H. designed and implemented xFuseMap. A.H. performed the segmentation of model explanations. A.H. and M.S. conducted statistical analyses. A.H., H.M., A.L., S.R., N.M., and M.S. made substantial contributions to the interpretation of the results. A.H. is the first author. M.S. supervised this work.
All authors contributed to the manuscript preparation and critical revisions and approved the final version of the manuscript.
Competing interests
A.H., M.G., H.M., and M.S. are the inventors of the patent application DE 10 2023 118 246.3, which covers the principle of xFuseMap presented in this paper and the architecture of xECGArch. TU Dresden is the patent applicant. A.L. and S.R. have received grants from Novartis and Edwards Lifesciences; personal fees from Abbott, Abiomed, AstraZeneca, Bayer, Boehringer Ingelheim, Boston Scientific, Edwards Lifesciences, Medtronic, Novartis, Sanofi Genzyme, and Pfizer; and other fees from Picardia, Filterlex, and Transverse Medical, outside the submitted work. N.M. reports personal fees from Edwards Lifesciences, Medtronic, Biotronik, Novartis, Sanofi Genzyme, AstraZeneca, Pfizer, Bayer, Abbott, Abiomed, B. Braun, and Boston Scientific, outside the submitted work.
References
1. Vos, T. et al. Global burden of 369 diseases and injuries in 204 countries and territories, 1990–2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet 396, 1204–1222 (2020).
2. Di Cesare, M. et al. World Heart Report 2023: Confronting the World's Number One Killer. (2023).
3. Roth, G. A. et al. Global Burden of Cardiovascular Diseases and Risk Factors, 1990–2019. J Am Coll Cardiol 76, 2982–3021 (2020).
4. Rosiek, A. & Leksowski, K. The risk factors and prevention of cardiovascular disease: the importance of electrocardiogram in the diagnosis and treatment of acute coronary syndrome. Ther Clin Risk Manag 12, 1223–1229 (2016).
5. Wagner, P. et al. PTB-XL, a large publicly available electrocardiography dataset. Sci Data 7, 154 (2020).
6. Stracina, T., Ronzhina, M., Redina, R. & Novakova, M. Golden Standard or Obsolete Method? Review of ECG Applications in Clinical and Experimental Context. Front Physiol 13, 867033 (2022).
7. Janiesch, C., Zschech, P. & Heinrich, K. Machine learning and deep learning. Electron Mark 31, 685–695 (2021).
8. Bishop, C. M. Pattern Recognition and Machine Learning. (Springer, New York, NY, 2006). doi:10.1007/978-0-387-45528-0.
9. Holzinger, A., Langs, G., Denk, H., Zatloukal, K. & Müller, H. Causability and Explainability of Artificial Intelligence in Medicine. WIREs Data Mining Knowl Discov 9, e1312 (2019).
10. Gershman, S. J., Horvitz, E. J. & Tenenbaum, J. B. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science 349, 273–278 (2015).
11. Vale, D., El-Sharif, A. & Ali, M. Explainable artificial intelligence (XAI) post-hoc explainability methods: risks and limitations in non-discrimination law. AI Ethics 2, 815–826 (2022).
12. Taniguchi, H. et al. Explainable Artificial Intelligence Model for Diagnosis of Atrial Fibrillation Using Holter Electrocardiogram Waveforms. Int Heart J 62, 534–539 (2021).
13. Ivaturi, P. et al. A Comprehensive Explanation Framework for Biomedical Time Series Classification. IEEE J Biomed Health Inform 25, 2398–2408 (2021).
14. Bender, T. et al. Analysis of a Deep Learning Model for 12-Lead ECG Classification Reveals Learned Features Similar to Diagnostic Criteria. IEEE J Biomed Health Inform 28, 1–12 (2023).
15. Goettling, M., Hammer, A., Malberg, H. & Schmidt, M. xECGArch: a trustworthy deep learning architecture for interpretable ECG analysis considering short-term and long-term features. Sci Rep 14, 13122 (2024).
16. Aufiero, S. et al. A deep learning approach identifies new ECG features in congenital long QT syndrome. BMC Med 20, 162 (2022).
17. Cao, Y. et al. Detection and Localization of Myocardial Infarction Based on Multi-Scale ResNet and Attention Mechanism. Front Physiol 13, 783184 (2022).
18. Jahmunah, V., Ng, E. Y. K., Tan, R.-S., Oh, S. L. & Acharya, U. R. Explainable detection of myocardial infarction using deep learning models with Grad-CAM technique on ECG signals. Comput Biol Med 146, 105550 (2022).
19. Prabhakararao, E. & Dandapat, S. Myocardial Infarction Severity Stages Classification From ECG Signals Using Attentional Recurrent Neural Network. IEEE Sens J 20, 8711–8720 (2020).
20. Singh, P. & Sharma, A. Interpretation and Classification of Arrhythmia Using Deep Convolutional Network. IEEE Trans Instrum Meas 71, 1–12 (2022).
21. Zhang, D., Yang, S., Yuan, X. & Zhang, P. Interpretable deep learning for automatic diagnosis of 12-lead electrocardiogram. iScience 24, 102373 (2021).
22. Reddy, L., Talwar, V., Alle, S., Bapi, Raju. S. & Priyakumar, U. D. IMLE-Net: An Interpretable Multi-level Multi-channel Model for ECG Classification. in 2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC) 1068–1074 (Melbourne, Australia, 2021). doi:10.1109/SMC52423.2021.9658706.
23. Honarvar, H. et al. Enhancing convolutional neural network predictions of electrocardiograms with left ventricular dysfunction using a novel sub-waveform representation. Cardiovasc Digit Health J 3, 220–231 (2022).
24. Plante, E. & Vance, R. Selection of preschool language tests: a data-based approach. Lang Speech Hear Serv Sch 25, 15–24 (1994).
25. Chugh, S. S. et al. Worldwide Epidemiology of Atrial Fibrillation: a Global Burden of Disease 2010 Study. Circulation 129, 837–847 (2014).
26. Mou, L. et al. Lifetime Risk of Atrial Fibrillation by Race and Socioeconomic Status: ARIC Study (Atherosclerosis Risk in Communities). Circ Arrhythm Electrophysiol 11, e006350 (2018).
27. Odutayo, A. et al. Atrial fibrillation and risks of cardiovascular disease, renal disease, and death: systematic review and meta-analysis. BMJ 354, i4482 (2016).
28. Hammer, A., Malberg, H. & Schmidt, M. Towards the Prediction of Atrial Fibrillation Using Interpretable ECG Features. in Computing in Cardiology 2022 vol. 49 1–4 (Tampere, Finland, 2022).
29. Lip, G. Y. H. et al. Atrial fibrillation. Nat Rev Dis Primers 2, 16016 (2016).
30. Cohen, J. Statistical Power Analysis for the Behavioral Sciences. (Erlbaum, Hillsdale, NJ, 1988).
31. Brundel, B. J. J. M. et al. Atrial fibrillation. Nat Rev Dis Primers 8, 21 (2022).
32. Dilaveris, P. E., Färbom, P., Batchvarov, V., Ghuran, A. & Malik, M. Circadian behavior of P-wave duration, P-wave area, and PR interval in healthy subjects. Ann Noninvasive Electrocardiol 6, 92–97 (2001).
33. Ribeiro, A. H. et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun 11, 1760 (2020).
34. Goldberger, A. et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. Circulation 101, e215–e220 (2000).
35. Liu, F. F. et al. An open access database for evaluating the algorithms of ECG rhythm and morphology abnormal detection. J Med Imaging Health Inform 8, 1368–1373 (2018).
36. Zheng, J. et al. Optimal Multi-Stage Arrhythmia Classification Approach. Sci Rep 10, 2898 (2020).
37. Zheng, J., Guo, H. & Chu. A large scale 12-lead electrocardiogram database for arrhythmia study. PhysioNet https://doi.org/10.13026/wgex-er52 (2022).
38. Perez Alday, E. A. et al. Classification of 12-lead ECGs: The PhysioNet/Computing in Cardiology Challenge 2020. Physiol Meas 41, 124003 (2021).
39. Perez Alday, E. A. et al. Classification of 12-lead ECGs: The PhysioNet/Computing in Cardiology Challenge 2020. PhysioNet https://doi.org/10.13026/dvyd-kd57.
40. Wagner, P., Strodthoff, N., Bousseljot, R.-D., Samek, W. & Schaeffter, T. PTB-XL, a large publicly available electrocardiography dataset. PhysioNet https://doi.org/10.13026/kfzx-aw45 (2022).
41. Schmitz, B. et al. Patient-centered cardiac rehabilitation by AI-powered lifestyle intervention – the timely approach. Atherosclerosis 355, 251 (2022).
42. Hammer, A., Goettling, M., Malberg, H., Linke, A. & Schmidt, M. An explainable AI for trustworthy detection of atrial fibrillation on reduced lead ECGs in mobile applications. Eur Heart J 45 (accepted).
43. Nault, I. et al. Clinical value of fibrillatory wave amplitude on surface ECG in patients with persistent atrial fibrillation. J Interv Card Electrophysiol 26, 11–19 (2009).
44. Park, J. et al. Early differentiation of long-standing persistent atrial fibrillation using the characteristics of fibrillatory waves in surface ECG multi-leads. Sci Rep 9, 2746 (2019).
45. Alber, M. et al. iNNvestigate Neural Networks! J Mach Learn Res 20, 1–8 (2019).
46. Pilia, N. et al. ECGdeli - An open source ECG delineation toolbox for MATLAB. SoftwareX 13, 100639 (2021).
47. Hammer, A., Malberg, H. & Schmidt, M. Cardiovascular Reflections of Sympathovagal Imbalance Precede the Onset of Atrial Fibrillation. in Computing in Cardiology 2023 vol. 50 1–4 (Atlanta (GA), USA, 2023).
48. Schmidt, M., Baumert, M., Porta, A., Malberg, H. & Zaunseder, S. Two-Dimensional Warping for One-Dimensional Signals—Conceptual Framework and Application to ECG Processing. IEEE Trans Signal Process 62, 5577–5588 (2014).
49. Schmidt, M., Baumert, M., Malberg, H. & Zaunseder, S. Iterative two-dimensional signal warping—Towards a generalized approach for adaption of one-dimensional signals. Biomed Signal Process Control 43, 311–319 (2018).
50. Hammer, A. et al. Automatic Classification of Full- And Reduced-Lead Electrocardiograms Using Morphological Feature Extraction. in Computing in Cardiology 2021 vol. 48 1–4 (Brno, Czech Republic, 2021).
51. Johnson, A. E., Behar, J., Andreotti, F., Clifford, G. D. & Oster, J. R-Peak Estimation Using Multimodal Lead Switching. in Computing in Cardiology 2014 vol. 41 281–284 (2014).
52. Hammer, A., Malberg, H. & Schmidt, M. Morphology Features Self-Learned by Explainable Deep Learning for Atrial Fibrillation Detection Correspond to Fibrillatory Waves. in Computing in Cardiology 2024 vol. 51 1–4 (Karlsruhe, Germany, accepted).
53. Blanca, M. J., Alarcón, R., Arnau, J., Bono, R. & Bendayan, R. Non-normal data: Is ANOVA still a valid option? Psicothema 29, 552–557 (2017).
54. Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J. & Müller, K.-R. Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications. Proc IEEE 109, 247–278 (2021).
55. Selvaraju, R. R. et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. in 2017 IEEE International Conference on Computer Vision 618–626 (Venezia, Italy, 2017). doi:10.1109/ICCV.2017.74.
56. Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. in Proceedings of the 34th International Conference on Machine Learning - Volume 70 3145–3153 (JMLR.org, Sydney, NSW, Australia, 2017).
57. Clifford, G. D. et al. AF Classification from a Short Single Lead ECG Recording: the PhysioNet/Computing in Cardiology Challenge 2017. Comput Cardiol (2010) 44 (2017).
58. Moody, G. B. & Mark, R. G. The impact of the MIT-BIH Arrhythmia Database. IEEE Eng Med Biol Mag 20, 45–50 (2001).
Tables 1 and 3 are available in the Supplementary Files section.
Supplementary Files: Tables1and3.docx, AppendixA.pdf, AppendixB.pdf, AppendixC.pdf
1","display":"","copyAsset":false,"role":"figure","size":507850,"visible":true,"origin":"","legend":"\u003cp\u003eApproach for automatically learning and visualizing clinically relevant long-term (rhythmic) and short-term (morphological) characteristics for the detection of atrial fibrillation (AF) in the electrocardiogram (ECG). It combines the explainable ECG architecture (xECGArch)\u003csup\u003e15\u003c/sup\u003e and its two models (the long-term and the short-term model) with the fused representation of their explanations in a combined saliency map (xFuseMap). The model explanations were validated for agreement with ECG characteristics established in clinical knowledge. AI, artificial intelligence; xAI, explainable artificial intelligence.\u003c/p\u003e","description":"","filename":"floatimage42.png","url":"https://assets-eu.researchsquare.com/files/rs-4655592/v1/502e670747dfb0701d749bfa.png"},{"id":63137644,"identity":"42cbab08-a7cd-487a-a0f8-32eef97fee2d","added_by":"auto","created_at":"2024-08-23 14:47:44","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":642908,"visible":true,"origin":"","legend":"\u003cp\u003eModel explanations fused in a combined saliency map (xFuseMap). Preprocessed and colorized ECGs showing the relative relevance (rR) of ECG segments for the decision-making of the long-term model (blue), the short-term model (orange), or both models (magenta) for ECGs that contain a) atrial fibrillation (AF), b) another anomaly (O), and c) normal sinus rhythm (NSR). Above the colored ECGs, the classification result of both models for AF or non-AF (n-AF) is shown with the respective decision certainty. 
White-bordered circles show zoomed-in signal parts.\u003c/p\u003e","description":"","filename":"floatimage51.png","url":"https://assets-eu.researchsquare.com/files/rs-4655592/v1/eee6275199fd0ac2d972a7b8.png"},{"id":63136633,"identity":"afabc46b-07fb-48bb-9e72-31b49647cf46","added_by":"auto","created_at":"2024-08-23 14:31:44","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":694568,"visible":true,"origin":"","legend":"\u003cp\u003eClass-dependent distribution of the relative relevance (rR) of electrocardiogram (ECG) segments for classification. Panels a) and b) display the mean rR of all segments of a type, and panels c) and d) display the standard deviation (SD) of the mean rR per segment type, each computed over one recording per segment, class, and model in arbitrary units (a.u.). Significant Tukey-Kramer-corrected inter-class comparisons from post-hoc analyses following an analysis of variance (ANOVA) are marked. The box limits represent the upper and lower quartiles, with the white dot in between representing the median. The whiskers extend to at most 1.5 times the interquartile range. Outliers are shown as colored dots.\u003c/p\u003e","description":"","filename":"floatimage6.png","url":"https://assets-eu.researchsquare.com/files/rs-4655592/v1/86edf9c04b6784bccfe5c093.png"},{"id":63137133,"identity":"d62bbdac-0013-4b12-838a-7bdf9cdfaea3","added_by":"auto","created_at":"2024-08-23 14:39:44","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":266599,"visible":true,"origin":"","legend":"\u003cp\u003eReading the combined representation of model explanations in a fused saliency map (xFuseMap), illustrated with an exemplary electrocardiogram (ECG) with atrial fibrillation. Data points are color-coded according to their relative relevance (rR) in arbitrary units (a.u.) 
for classification by the long-term and short-term models included in the explainable ECG analysis architecture (xECGArch)\u003csup\u003e15\u003c/sup\u003e, as indicated by the color map on the right. Relevant ECG characteristics are marked in the saliency map and the color map.\u003c/p\u003e","description":"","filename":"floatimage7.png","url":"https://assets-eu.researchsquare.com/files/rs-4655592/v1/821ab7324ad73f37cf9fdc44.png"},{"id":63137134,"identity":"c27effa7-cd0e-4362-ac43-e1cf69552d5b","added_by":"auto","created_at":"2024-08-23 14:39:44","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":99280,"visible":true,"origin":"","legend":"\u003cp\u003eElectrocardiogram (ECG) segments extracted from each beat i to calculate the relevance of beat segments. The TQ segment was defined from the offset of the T wave of beat i to Q of beat i + 1.\u003c/p\u003e","description":"","filename":"floatimage8.png","url":"https://assets-eu.researchsquare.com/files/rs-4655592/v1/6f40742c0caa29335bd0c21d.png"},{"id":95948856,"identity":"c67e8f38-5b4c-47d9-8a36-f2017b6e1c21","added_by":"auto","created_at":"2025-11-14 18:40:11","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":2962136,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4655592/v1/0ce9d7a8-fbba-440b-ba8b-15c34575a8a3.pdf"},{"id":63137131,"identity":"cebdbeb5-3426-442e-8228-eb4afdfaaafc","added_by":"auto","created_at":"2024-08-23 
14:39:44","extension":"docx","order_by":1,"title":"","display":"","copyAsset":false,"role":"supplement","size":78808,"visible":true,"origin":"","legend":"","description":"","filename":"Tables1and3.docx","url":"https://assets-eu.researchsquare.com/files/rs-4655592/v1/feda9c8214f22474b6f328b0.docx"},{"id":63136639,"identity":"70705dde-a30f-44f6-9a4a-fa2aabbbdb91","added_by":"auto","created_at":"2024-08-23 14:31:44","extension":"pdf","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":138888,"visible":true,"origin":"","legend":"","description":"","filename":"AppendixA.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4655592/v1/1367346772cf0543f1d83851.pdf"},{"id":63136637,"identity":"eeff1b4c-eb34-4ac5-9e5c-f36ae6cf1986","added_by":"auto","created_at":"2024-08-23 14:31:44","extension":"pdf","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":1084997,"visible":true,"origin":"","legend":"","description":"","filename":"AppendixB.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4655592/v1/d99ad80e17d37eac4b5fef6c.pdf"},{"id":63136652,"identity":"cd26104d-f471-4e33-b332-67cbbd588920","added_by":"auto","created_at":"2024-08-23 14:32:02","extension":"pdf","order_by":4,"title":"","display":"","copyAsset":false,"role":"supplement","size":265838693,"visible":true,"origin":"","legend":"","description":"","filename":"AppendixC.pdf","url":"https://assets-eu.researchsquare.com/files/rs-4655592/v1/2e6fc1caf4cbdaafdeb166a3.pdf"}],"financialInterests":"Competing interest reported. A.H., M.G., H.M., and M.S. are the inventors of the patent application DE 10 2023 118 246.3, which covers the principle of xFuseMap presented in this paper and the architecture of xECGArch. The TU Dresden is the patent applicant.\nA.L. and S.R. 
have received grants from Novartis and Edwards Lifesciences; personal fees from Abbott, Abiomed, AstraZeneca, Bayer, Boehringer Ingelheim, Boston Scientific, Edwards Lifesciences, Medtronic, Novartis, Sanofi Genzyme, and Pfizer; and other fees from Picardia, Filterlex, and Transverse Medical outside the submitted work.\nN.M. reports personal fees from Edwards Lifesciences, Medtronic, Biotronik, Novartis, Sanofi Genzyme, AstraZeneca, Pfizer, Bayer, Abbott, Abiomed, B. Braun, and Boston Scientific, outside the submitted work.","formattedTitle":"Fusion of automatically learned rhythm and morphology features matches diagnostic criteria and enhances AI explainability","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003eCardiovascular diseases (CVDs) are the leading cause of premature death worldwide\u003csup\u003e1\u003c/sup\u003e, being responsible for an estimated 20.5\u0026nbsp;million deaths in 2021 according to the World Heart Federation\u003csup\u003e2\u003c/sup\u003e, and a major cause of disability\u003csup\u003e3\u003c/sup\u003e. Many CVDs are age-related\u003csup\u003e1,3\u003c/sup\u003e. Therefore, demographic changes and increasing life expectancy are expected to lead to a higher incidence of disease in many parts of the world\u003csup\u003e3\u003c/sup\u003e. If CVDs are detected early, interventions can be initiated and critical courses, including premature death, can be prevented\u003csup\u003e3,4\u003c/sup\u003e. Electrocardiogram (ECG) analyses can non-invasively reveal anomalies in cardiac excitation that are predictors of CVDs\u003csup\u003e5\u003c/sup\u003e. The 12-lead ECG, analyzed by at least one cardiologist, is considered the clinical standard\u003csup\u003e6\u003c/sup\u003e. However, manual ECG analysis is very time-consuming and depends strongly on the physician\u0026rsquo;s expertise, experience, and routine, as well as on factors such as stress or fatigue\u003csup\u003e6\u003c/sup\u003e. 
Deep learning (DL) is a high-performance method for automatically detecting CVDs in ECGs, which can support medical diagnostics and reduce dependence on personnel\u003csup\u003e6\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eDL refers to deep neural networks (DNNs), which are complex mathematical models from the field of artificial intelligence (AI). DNNs extract knowledge from training data using machine learning (ML) methods and apply it to unseen data\u003csup\u003e7\u003c/sup\u003e. During training, DNNs learn non-linear representations that relate the raw input data to the model output through interconnected neurons in hidden layers\u003csup\u003e7,8\u003c/sup\u003e. Therefore, unlike shallow ML methods, DNNs learn representations directly from raw data and do not require prior feature extraction, which is error-prone and requires \u003cem\u003ea priori\u003c/em\u003e knowledge\u003csup\u003e7\u003c/sup\u003e. However, due to their complexity, DNNs are often referred to as black boxes whose decision-making process cannot be explained\u003csup\u003e9\u003c/sup\u003e. Yet decision support requires trustworthiness, which is achieved through the explainability of the machine\u0026rsquo;s decision-making process and the interpretability of the features used\u003csup\u003e9,10\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eExplainable AI (xAI) methods can partially illuminate the black box of DNNs by providing \u003cem\u003epost-hoc\u003c/em\u003e explanations that approximate the relevance of input values for the model decision\u003csup\u003e11\u003c/sup\u003e. 
Recent research has therefore begun to explore the use of xAI in ECG analysis, e.g., for atrial fibrillation (AF) detection\u003csup\u003e12\u0026ndash;15\u003c/sup\u003e, long QT syndrome detection and classification\u003csup\u003e16\u003c/sup\u003e, myocardial infarction detection and classification\u003csup\u003e17\u0026ndash;19\u003c/sup\u003e, or multiclass cardiac anomaly detection and classification\u003csup\u003e13,14,20\u0026ndash;22\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eHowever, making DL usable for decision support requires understanding the causality of the learned representations\u003csup\u003e10\u003c/sup\u003e. In most studies\u003csup\u003e12,17\u0026ndash;19,21,22\u003c/sup\u003e, the correspondence between the signal components identified as relevant by xAI and diagnostic criteria is only demonstrated with exemplary saliency maps (also called heat maps), which illustrate the relevance of single data points for classification. However, truly understanding the origin of the learned representations requires systematic investigations, such as those carried out in the few papers\u003csup\u003e12\u0026ndash;14,16,20,23\u003c/sup\u003e summarized in Table\u0026nbsp;1.\u003c/p\u003e \u003cp\u003eTo achieve clinical relevance, the model explanations must be as close as possible to the cardiological reading of the ECG, which considers both rhythm and beat morphology for decision-making. In previous studies, QRS complexes were often identified as relevant\u003csup\u003e12,13,15,20,23\u003c/sup\u003e, although xAI methods do not reveal whether this relevance is based on rhythmic characteristics, morphological characteristics, both, or neither. Singh \u0026amp; Sharma classified individual beats and thus restricted the explanations to morphological characteristics\u003csup\u003e20\u003c/sup\u003e. 
Ivaturi \u003cem\u003eet al.\u003c/em\u003e estimated the relevance of rhythmic information by measuring how the relevance of the QRS complexes changed when RR interval variability was eliminated by stretching or compressing the RR intervals\u003csup\u003e13\u003c/sup\u003e. However, it remains unclear whether the resulting deformation of the QRS complexes itself influences the changes in relevance. Our recently introduced explainable architecture for ECG analysis (xECGArch) is designed to learn rhythmic and morphological information in two separate convolutional neural networks (CNNs) for joint classification\u003csup\u003e15\u003c/sup\u003e. The long-term model considers an entire 10-second ECG at once, enabling rhythm analysis, while the short-term model considers only 0.6-second windows and thus receives morphology information but hardly any rhythm information. However, the relationship between the automatically learned features and rhythm and morphology information has not yet been systematically investigated.\u003c/p\u003e \u003cp\u003eIn summary, to the best of our knowledge, there is currently no generalizable method for quantitatively analyzing model explanations in ECG analysis by examining the relevance of specific ECG characteristics. Yet such an analysis is fundamental to bringing xAI into clinical use. 
Aside from the choice of the quantitative analysis method, the explanatory power of the results depends on a1) the model\u0026rsquo;s classification quality, a2) the dataset characteristics, and a3) the choice of the xAI method\u003csup\u003e15\u003c/sup\u003e:\u003c/p\u003e \u003cp\u003ea1) Some of the models listed in Table 1 had a sensitivity of less than 90%\u003csup\u003e13,14,16\u003c/sup\u003e, which is below the threshold for good classification\u003csup\u003e24\u003c/sup\u003e and raises the question of whether the ECG characteristics identified as relevant by xAI are suitable for detecting the corresponding class.\u003c/p\u003e\n\u003cp\u003ea2) The generalizability of the results increases with, e.g., the size and diversity of the dataset. Therefore, the comparison groups for a target class should be diverse in terms of age, sex, and the classes included for comparison. The latter was considered in most papers summarized in Table 1.\u003c/p\u003e\n\u003cp\u003ea3) In a previous study\u003csup\u003e15\u003c/sup\u003e, we showed by systematically perturbing ECGs that the reliability of xAI methods varies greatly. Therefore, the xAI method must be carefully chosen for each scenario to obtain meaningful explanations.\u003c/p\u003e \u003cp\u003eWe present a novel approach for tracing DL explanations back to the relevance of rhythmic and morphological information. To do this, we use the high-performing xECGArch\u003csup\u003e15\u003c/sup\u003e, which processes long- and short-term information in separate CNNs for joint classification, as depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. We develop a generalizable method for quantitatively and systematically analyzing the relevance of clinically relevant ECG segments for classification. Based on this, we examine the relationship between model explanations and morphological beat changes or rhythmic information. 
Furthermore, we present xFuseMap, which combines the explanations of the long- and short-term models in a fused saliency map and thus, for the first time, provides a highly interpretable representation of relevant rhythm and morphology features in ECG analysis. To achieve this, we construct a two-dimensional color space by assigning each model a distinct color.\u003c/p\u003e \u003cp\u003eWe use AF as the target class and non-AF (n-AF) as the reference class, the latter favoring the generalizability of our results. AF is particularly suitable for our study due to b1) its clinical relevance and b2) its pathophysiology:\u003c/p\u003e \u003cp\u003eb1) AF is the most common clinically relevant cardiac arrhythmia globally\u003csup\u003e25\u003c/sup\u003e, with a lifetime risk of 22\u0026ndash;36%\u003csup\u003e26\u003c/sup\u003e. If untreated, AF is associated with up to 5-fold increased morbidity, especially from stroke, and up to 2-fold increased mortality\u003csup\u003e27\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eb2) AF is characterized by pathophysiological rhythmic and morphological changes in the ECG. It is defined by a disordered spread of excitation in the atria and an irregular transmission to the ventricles. Therefore, the P waves, which reflect atrial depolarization, are replaced by fibrillatory (F) waves that superimpose on the beat morphology\u003csup\u003e28\u003c/sup\u003e. The irregular transmission to the ventricles manifests as an absolute arrhythmia\u003csup\u003e29\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eBy conducting, for the first time, a systematic and quantitative analysis of the causal relationship between DL explanations and clinically relevant ECG features on a large and diverse dataset, we enhance the comprehensibility of xAI by combining meaningful explanations with their interpretability. 
Furthermore, our approach is the first to account for morphological characteristics that are irregularly distributed across the ECG, such as F waves, which are of great clinical importance. xFuseMap also offers, for the first time, the potential to present explanations originating from rhythm and morphology in a combined and interpretable manner that is oriented towards the clinical reading of ECGs. Our methods therefore shed light on the black box of DL-based approaches and pave the way for their use as diagnostic support in clinical practice.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"2 Results","content":"\u003cp\u003eWe applied xECGArch to 1,521 unseen, publicly available 10-second single-lead ECGs to classify AF vs. n-AF. The fusion of the decision explanations of the long- and short-term models of xECGArch in xFuseMap provides distinguishable information about the relevance of ECG segments at two temporal scales. We demonstrate the relationship between model explanations and signal morphology or rhythm. For this purpose, we split the n-AF class into normal sinus rhythm (NSR) and other anomalies (O).\u003c/p\u003e \u003cp\u003e \u003cb\u003exECGArch classification performance\u003c/b\u003e \u003c/p\u003e \u003cp\u003eOn an unseen dataset containing \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(n\\)\u003c/span\u003e\u003c/span\u003e = 1,521 ECGs, xECGArch reached an overall accuracy of 96.3% and an F1-score of 94.5% for classifying AF vs. n-AF. To achieve this, the long- and short-term models were weighted in a ratio of 1.669:1, which improved the accuracy by 0.46 and the F1-score by 0.65 percentage points compared to xECGArch without optimized weights (see Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e2\u003c/span\u003e). 
xECGArch discriminated AF slightly better from NSR (F1\u0026thinsp;=\u0026thinsp;96.4%) than from O (F1\u0026thinsp;=\u0026thinsp;95.4%), each of which makes up 50% of the n-AF class. In classifying AF vs. n-AF, the long-term model outperformed the short-term model (F1\u0026thinsp;=\u0026thinsp;94.3% vs. F1\u0026thinsp;=\u0026thinsp;92.1%).\u003c/p\u003e \u003cp\u003eCombining the long-term and short-term models in xECGArch with optimized weights improved the F1-score for AF vs. n-AF by 0.16 and 1.87 percentage points, respectively, compared to the two single models. However, the optimization did not change the sensitivity; instead, it increased the precision from 92.9% to 94.1%. ECGs with AF were detected equally well, but fewer n-AF ECGs were misclassified (details in Appendix A). This applies to discriminating AF from both NSR and O.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eClassification metrics for the long-term and short-term models as well as for xECGArch, which combines both models, with and without optimized model weights. The original classification task was atrial fibrillation (AF) vs. non-atrial fibrillation (n-AF). 
For more detailed insights, the classification metrics are also provided with n-AF split into normal sinus rhythm (NSR) and other anomalies (O), each comprising 50% of the n-AF cases.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eClassifier\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eClassification task\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSensitivity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eSpecificity\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eF1\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eLong-term model\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF vs. 
n-AF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e94.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e97.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e96.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e94.3%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF vs. NSR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e97.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e97.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e96.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e96.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF vs. 
O\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e96.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e96.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e95.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.3%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003eShort-term model\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF vs. n-AF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e90.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e94.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e94.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e92.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF vs. 
NSR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e95.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e95.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e94.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e94.8%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF vs. O\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e94.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e94.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e94.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e94.3%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003exECGArch\u003c/p\u003e \u003cp\u003e\u003cem\u003e(1:1 weights)\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF vs. 
n-AF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e92.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e96.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e95.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e93.9%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF vs. NSR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e97.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e97.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e96.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e96.0%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF vs. 
O\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e95.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e95.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e95.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.2%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003exECGArch\u003c/p\u003e \u003cp\u003e\u003cem\u003e(optimized weights)\u003c/em\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF vs. n-AF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e94.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e97.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e96.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e94.5%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF vs. 
NSR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e98.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e98.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e96.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e96.4%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF vs. O\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e96.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e94.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e96.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e95.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e95.4%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eFusion of long- and short-term model explanations in a combined saliency map (xFuseMap)\u003c/b\u003e \u003c/p\u003e \u003cp\u003eWe extracted the explanations of both xECGArch models in terms of the relative relevance (rR) of individual samples for the classification decision using deep Taylor decomposition (DTD) and visualized them using xFuseMap. Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows the xFuseMaps for correctly classified a) AF, b) O (classified as n-AF; sinus rhythm with ventricular ectopics), and c) NSR (classified as n-AF). The colors represent the samples' rR according to DTD. 
Blue sections were particularly relevant to the long-term model, orange sections to the short-term model, and magenta sections to both models. Black sections were not relevant to either model. The decision certainty is given for each model separately.\u003c/p\u003e \u003cp\u003eIn all three examples, xFuseMap shows that the QRS complexes, especially their right flanks, are primarily relevant to the long-term model, while the sequences in between are primarily relevant to the short-term model. An exception is the example for O shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb). The ventricular ectopic beats at 6.2\u0026nbsp;s and 9.3\u0026nbsp;s, including the immediately preceding beats, are primarily relevant to the short-term model and barely relevant to the long-term model. Furthermore, the remaining R peaks, which are in sinus rhythm, are colored magenta because the left flanks up to the R peaks have increased rR for the short-term model.\u003c/p\u003e \u003cp\u003eWhile the QRS complexes in sinus rhythm are uniformly colored blue (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ec)) or magenta (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb)), the blue tone of the QRS complexes in AF (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea)) fluctuates irregularly between dark and light blue. Thus, the rR of the irregularly occurring QRS complexes in AF varies, whereas the QRS complexes in sinus rhythm are all approximately equally relevant.\u003c/p\u003e \u003cp\u003eIn the case of n-AF, the flanks of the P wave (primarily the right flanks) are particularly relevant to the short-term model. The right flank of the T wave, the area between the T wave and the subsequent P wave, and the left flanks of ventricular ectopic beats are also partially colored orange and are therefore relevant for distinguishing n-AF from AF. 
Figure\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003ea) shows that F waves, and in this case especially their right flanks, are relevant to the short-term model for detecting AF. This is most visible between 5.5 s and 6 s and between 7.8 s and 8.5 s.\u003c/p\u003e \u003cp\u003e \u003cb\u003eClass-specific relation between model explanations and morphology\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo systematically investigate the model explanations, we analyzed the rR distribution across 9 diagnostically relevant beat segments and F waves. To this end, we determined the mean rR of each segment type within a recording as well as its variability in terms of the standard deviation (SD). The descriptive statistics are presented as box plots in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea) and b) for the mean rR and in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ec) and d) for the SD. Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003ea) and c) show the results of the long-term model, while Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eb) and d) show the results of the short-term model.\u003c/p\u003e \u003cp\u003eTwo-factor analyses of variance (ANOVAs), performed separately for each model on the mean rR and on the SD of the rR, showed significant differences (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e) between the classes and between the segments, as well as a significant interaction between both factors. The residuals of the ANOVAs were not normally distributed (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e). 
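\u003c/p\u003e \u003cp\u003eThe per-recording statistics entering these ANOVAs, i.e., the mean rR of each segment type and its SD within a recording, can be sketched as follows. This is a hypothetical Python illustration, not the authors' Matlab code: it assumes one rR value per sample and segment annotations given as (type, start, stop) sample indices from some ECG delineator. It averages rR within each segment instance and then takes the mean and the sample SD of these instance means per segment type.\u003c/p\u003e

```python
# Hypothetical sketch of per-recording segment statistics (not the authors' Matlab code).
# Assumptions: rr is a list with one relative relevance (rR) value per sample of one recording;
# segments is a list of (segment_type, start, stop) tuples with stop exclusive.
import statistics
from collections import defaultdict


def segment_statistics(rr, segments):
    # Returns {segment_type: (mean_rR, sd_rR)} for one recording.
    per_instance = defaultdict(list)
    for seg_type, start, stop in segments:
        window = rr[start:stop]
        if len(window) > 0:  # skip empty or out-of-range annotations
            per_instance[seg_type].append(statistics.fmean(window))
    stats = {}
    for seg_type, means in per_instance.items():
        # Sample SD (n - 1) across segment instances; undefined (NaN) for a single instance.
        sd = statistics.stdev(means) if len(means) >= 2 else float('nan')
        stats[seg_type] = (statistics.fmean(means), sd)
    return stats


rr = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(segment_statistics(rr, [('R', 0, 2), ('R', 2, 4), ('P', 4, 6)]))
```

\u003cp\u003e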
For clarity, Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e only shows the Tukey-Kramer-corrected significance levels for class differences within the same segment type. An overview of all results of the \u003cem\u003epost-hoc\u003c/em\u003e analyses can be found in Appendix B. Across all classes, the R peaks were the most important segments for the long-term model, with median mean rR values of 0.609, 0.754, and 0.760 for AF, O, and NSR, respectively. R was followed by the S and Q segments, with median mean rR values ranging from 0.341 for the Q segment in AF to 0.521 for the S segment in NSR. For the long-term model, the mean rR in Q, R, and S was significantly lower (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e) in AF than in O or NSR. However, in AF, the SD of the mean rR per segment within a recording was significantly (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e) higher, by 59.9% (in R) to 67.4% (in S) compared to O and by 69.2% (in Q) to 80.2% (in S) compared to NSR. For the long-term model, the median mean rR of the remaining segments was significantly (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e) lower, ranging from 0.085 to 0.180, with slightly increased rR values in QT (for all classes) as well as in P and PQ (for O and NSR). Except for the PQ segment, where the rR in NSR significantly (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e) exceeded that in O, there were no significant differences in the mean rR per segment between O and NSR. 
The same applies to the SD of the mean rR per segment, except for Q, R, and S, where the variability in O significantly exceeded (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e) that in NSR.\u003c/p\u003e \u003cp\u003eDifferences in rR between individual segments were more subtle in the short-term model than in the long-term model. The most important segments for the short-term model were the P waves, with median mean rR values of 0.449 and 0.424 for O and NSR, respectively. The P wave was followed by the R peak (all classes), the TQ segment (especially in O and NSR), and the F waves (AF only), with median mean rR values ranging from 0.317 for F waves in AF to 0.376 for TQ segments in O. However, in AF, the SD was significantly higher (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e), by 90.7% (in S) to 213.4% (in TQ) compared to O and by 137.8% (in S) to 332.9% (in T) compared to NSR.\u003c/p\u003e \u003cp\u003e \u003cb\u003eRelation between model explanations and rhythm\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo investigate whether rhythm information is reflected in the model explanations, we conducted correlation analyses (see Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Both positive and negative correlations were tested between the absolute changes in consecutive RR intervals \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta RR\\right|\\)\u003c/span\u003e\u003c/span\u003e and the absolute changes of segment-specific rR in consecutive beats \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta rR\\right|\\)\u003c/span\u003e\u003c/span\u003e at both the beat-to-beat level and the recording level. 
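\u003c/p\u003e \u003cp\u003eA minimal Python sketch of both correlation levels follows. Several details are assumptions not stated in the text: Pearson's r is used, the per-beat rR values of one segment type are aligned with the RR intervals, beat-to-beat differences are pooled over all recordings of a class without crossing recording boundaries, and p-values are omitted. The strength labels follow Cohen's conventional thresholds for |r| (0.1 weak, 0.3 moderate, 0.5 strong).\u003c/p\u003e

```python
# Hypothetical sketch of the rhythm/relevance correlation analyses (not the authors' code).
# Assumptions: Pearson's r; per-beat rR of one segment type aligned with the RR intervals;
# each recording is a (rr_intervals, segment_rr) pair of equally long lists.
import statistics


def abs_diff(values):
    # Absolute changes between consecutive values, i.e. |dRR| or |drR|.
    return [abs(b - a) for a, b in zip(values, values[1:])]


def pearson_r(x, y):
    # Pearson correlation; raises ZeroDivisionError if either series is constant.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cohen_label(r):
    # Cohen's conventional thresholds applied to |r|.
    magnitude = abs(r)
    if magnitude >= 0.5:
        return 'strong'
    if magnitude >= 0.3:
        return 'moderate'
    if magnitude >= 0.1:
        return 'weak'
    return 'negligible'


def beat_level_r(recordings):
    # Pool |dRR| and |drR| over all beats; differences never cross recording boundaries.
    x, y = [], []
    for rr_intervals, segment_rr in recordings:
        x.extend(abs_diff(rr_intervals))
        y.extend(abs_diff(segment_rr))
    return pearson_r(x, y)


def recording_level_r(recordings):
    # Correlate the per-recording means of |dRR| and |drR| across recordings.
    mean_drr = [statistics.fmean(abs_diff(rr)) for rr, _ in recordings]
    mean_drel = [statistics.fmean(abs_diff(rel)) for _, rel in recordings]
    return pearson_r(mean_drr, mean_drel)
```

\u003cp\u003e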
The classification of correlation strengths is based on Cohen\u003csup\u003e30\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eAt the beat-to-beat level, we found no correlations for the short-term model. For the long-term model, however, we found significant weak correlations between \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta RR\\right|\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta rR\\right|\\)\u003c/span\u003e\u003c/span\u003e in the Q, R, and S segments of ECGs with AF and O, with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.120\\)\u003c/span\u003e\u003c/span\u003e to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.160\\)\u003c/span\u003e\u003c/span\u003e (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e), peaking in AF. 
Furthermore, weak correlations were found for the ST segment in AF (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.109\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e) and for the P and PQ segments in O (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.127\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.129\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eAt the recording level, we found significant (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e) weak to strong correlations across all classes between the mean absolute differences in consecutive RR intervals \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta RR\\right|}\\)\u003c/span\u003e\u003c/span\u003e and the mean absolute changes of segment-specific rR in consecutive beats \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta rR\\right|}\\)\u003c/span\u003e\u003c/span\u003e. The strongest correlations, up to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.635\\)\u003c/span\u003e\u003c/span\u003e (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e, class: O, model: long-term, segment: Q), were found for the QRS complexes regardless of class. 
However, these correlations were prominent only in the long-term model, whereas in the short-term model, the correlations with the \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta rR\\right|}\\)\u003c/span\u003e\u003c/span\u003e of the P wave and the TQ segment were most prominent. Furthermore, the correlation between \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta RR\\right|}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta rR\\right|}\\)\u003c/span\u003e\u003c/span\u003e differed between classes. In AF, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta RR\\right|}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta rR\\right|}\\)\u003c/span\u003e\u003c/span\u003e were moderately correlated for Q, R, and S in the long-term model, with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.336\\)\u003c/span\u003e\u003c/span\u003e to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.415\\)\u003c/span\u003e\u003c/span\u003e, and weakly correlated for ST and QT, with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.178\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.243\\)\u003c/span\u003e\u003c/span\u003e (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e). 
In the short-term model, we found a weak negative correlation for TQ (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=-0.133\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.01\\)\u003c/span\u003e\u003c/span\u003e). For class O, we found moderate to strong correlations across all segments and models. In the long-term model, correlations with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta rR\\right|}\\)\u003c/span\u003e\u003c/span\u003e of QRS complexes, P waves, and PQ segments stood out with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.591\\)\u003c/span\u003e\u003c/span\u003e to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.635\\)\u003c/span\u003e\u003c/span\u003e (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e). In the short-term model, however, we found the strongest correlations for the P wave (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.590\\)\u003c/span\u003e\u003c/span\u003e) and the TQ segment (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.596\\)\u003c/span\u003e\u003c/span\u003e), but only moderate correlations for the QRS complex (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.405\\)\u003c/span\u003e\u003c/span\u003e to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.439\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e). 
In NSR ECGs, a similar pattern emerged, with correlations of \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.338\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.349\\)\u003c/span\u003e\u003c/span\u003e (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e) for the P wave and the TQ segment, respectively, in the short-term model versus \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.009\\)\u003c/span\u003e\u003c/span\u003e (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\\ge 0.05\\)\u003c/span\u003e\u003c/span\u003e) to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.183\\)\u003c/span\u003e\u003c/span\u003e (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e) for the remaining segments. In the long-term model, correlations were weak across segments, with the maximum for the QRS complex (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.267\\)\u003c/span\u003e\u003c/span\u003e to \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r=0.293\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e).\u003c/p\u003e"},{"header":"3 Discussion","content":"\u003cp\u003eWe have proposed an approach for the systematic analysis of DL model explanations in ECG analysis. 
Compared to previous approaches that utilize static segmentation of beats\u003csup\u003e16\u003c/sup\u003e or RR intervals\u003csup\u003e13,20\u003c/sup\u003e, our approach allows the quantitative analysis of the relevance of diagnostically relevant segments, which can be either beat-based or, as exemplified by F waves, irregularly distributed across the signal. Compared to pseudo-quantitative template-based approaches\u003csup\u003e14,15\u003c/sup\u003e, we obtain rhythm information in addition to information on the variability of morphology and relevance. Moreover, our approach enables quantitative investigations of group differences using statistical methods. To ensure the trustworthiness and interpretability of DL explanations, it is crucial to verify that DL models use features consistently and that these features agree with clinically used ECG characteristics and with the diagnostic criteria of the classification problem. Our approach verifies all three requirements.\u003c/p\u003e \u003cp\u003eTo investigate the relationship between automatically learned features and diagnostic criteria, we used AF as an example. The main characteristics of AF are the replacement of P waves by superimposed F waves due to uncoordinated atrial excitation and absolute tachyarrhythmia due to irregular conduction to the ventricles\u003csup\u003e29,31\u003c/sup\u003e. For both xECGArch models, long-term and short-term, we analyzed the mean relevance within ECG segments and its variability over a recording. 
Based on the hypothesis that the long-term model primarily uses rhythm information while the short-term model analyzes signal morphology, we expected that c1) the long-term model focuses on QRS complexes, which are particularly suitable for rhythm analysis, as observed in previous studies\u003csup\u003e14,15,23\u003c/sup\u003e, and c2) the short-term model focuses on segments whose morphology differs between AF and n-AF.\u003c/p\u003e \u003cp\u003ec1) We assumed irregular, closely spaced QRS complexes to be relevant to the long-term model for AF detection and regular QRS complexes to be relevant for n-AF detection. Regardless of class, Q, R, and S are the most relevant segments for the long-term model (see Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). The mean rR in all three segments is lower for AF than for n-AF, whereas the SD is significantly higher. This shows that the relevance of the QRS complexes fluctuates in AF and is more evenly distributed in n-AF, which is supported by the example ECGs in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e \u003cp\u003ec2) We expected the P waves to be particularly relevant to the short-term model for n-AF detection and the F waves for AF detection. The prominence of F waves varies with the ECG lead and the subject, so they are not always visible. However, they are superimposed on the remaining signal and irregularly deform characteristic segments\u003csup\u003e28\u003c/sup\u003e, e.g., the TQ segment. As anticipated, Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e and Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e show that the TQ segment and F waves are particularly relevant to the short-term model for AF detection and the P waves for n-AF detection. 
The increased relevance of the TQ segments for n-AF detection can be explained by the position of the P wave within this segment. The R wave has an increased relevance for both classes, which can be attributed to either its rhythmic or its morphological information content. Moreover, the R wave may serve as a reference for distinguishing P waves from F waves based on their distance to the subsequent QRS complex. The increased SD of the rR across a recording in AF compared to n-AF supports our hypothesis that F waves cause irregular morphology and, consequently, irregular relevance in all segments.\u003c/p\u003e \u003cp\u003eTo assess whether the employed models learn rhythm information, we examined the relationship between changes in consecutive RR intervals \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta RR\\right|\\)\u003c/span\u003e\u003c/span\u003e and changes of segment-specific rR in consecutive beats \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta rR\\right|\\)\u003c/span\u003e\u003c/span\u003e. Correlation analyses at the beat-to-beat level revealed weak correlations (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(r\u0026gt;0.1\\)\u003c/span\u003e\u003c/span\u003e, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(p\u0026lt;0.001\\)\u003c/span\u003e\u003c/span\u003e) between \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta RR\\right|\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta rR\\right|\\)\u003c/span\u003e\u003c/span\u003e in the Q, R, and S segments in AF and O, but not in NSR, and only for the long-term model. 
This supports our hypothesis that, in contrast to the short-term model, the long-term model learns rhythm and uses this information for classification, with large \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta rR\\right|\\)\u003c/span\u003e\u003c/span\u003e indicating AF and small \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta rR\\right|\\)\u003c/span\u003e\u003c/span\u003e indicating the absence of AF. This is further supported by Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb), in which equidistant QRS complexes are color-coded as highly relevant for the classification of n-AF, while ventricular ectopics are marked as irrelevant to the long-term model but relevant to the short-term model.\u003c/p\u003e \u003cp\u003eOne potential explanation for the absence of stronger correlations at the beat-to-beat level is a complex relationship between rhythm and model explanations, which may result from the model complexity and cannot be captured at this level. A second correlation analysis at the recording level revealed weak to strong correlations between \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta RR\\right|}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta rR\\right|}\\)\u003c/span\u003e\u003c/span\u003e, depending on the class. Consistently, the strongest correlations were found for the long-term model, peaking in the QRS complexes for AF and O. Therefore, rhythm does not appear to be a significant factor for the short-term model in the classification of AF. For class O, all correlations were moderate to strong regardless of model and segment. 
This can be explained by the fact that rhythm changes relevant to the long-term model, e.g., due to extrasystoles, are mostly accompanied by morphological beat changes, which in turn are relevant to the short-term model. In accordance with the hypothesis that strong rhythm changes are indicative of n-AF, we observed weak correlations between \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta RR\\right|}\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta rR\\right|}\\)\u003c/span\u003e\u003c/span\u003e of the long-term model for NSR, which aligns with the uniform coloring of the equidistant QRS complexes in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eb)-c). For the short-term model, the TQ segments and, except in AF, the P waves they contain showed increased correlations with \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\stackrel{-}{\\left|\\delta RR\\right|}\\)\u003c/span\u003e\u003c/span\u003e across all classes; for AF, this correlation was negative. Hence, with increasing RR variability within a recording, the rR across these segments becomes more homogeneous in AF and more heterogeneous in n-AF. Correlation analyses do not provide any information about causal relationships. However, possible explanations are, for AF, the poorer detectability of F waves in highly tachycardic, pseudo-regularized rhythms and, for n-AF, the autonomic coupling of jointly occurring changes in atrial depolarization (P wave) and the RR\u003csup\u003e32\u003c/sup\u003e, or pathophysiological correlations. 
In summary, both correlation analyses indicate that rhythmicity is reflected in the relevance of the QRS complex for the long-term model but not for the short-term model.\u003c/p\u003e \u003cp\u003eTo provide diagnostic support, automated ECG analysis must be trustworthy, i.e., the classifier\u0026rsquo;s decisions must be accurate and comprehensible. Comprehensibility requires both meaningful explanations of the decision-making process and their interpretability for physicians. xAI approaches aim to explain DL-based decisions. However, their reliability has to be verified for each scenario in a systematic comparison of xAI methods, which is usually not carried out\u003csup\u003e15\u003c/sup\u003e. We applied xECGArch to a large, diverse dataset derived from 4 public databases. Overall, with an accuracy of 96.3% (compared to 95.3%\u003csup\u003e12\u003c/sup\u003e \u0026ndash; 99.6%\u003csup\u003e33\u003c/sup\u003e) and an F1 score of 94.5% (compared to 80.7%\u003csup\u003e13\u003c/sup\u003e \u0026ndash; 93.1%\u003csup\u003e12\u003c/sup\u003e), xECGArch is in the upper range of state-of-the-art AF detection algorithms and therefore fulfills the need for accurate classification. Given this high classification accuracy, the DTD explanations, which provided the most reliable explanations for xECGArch in a systematic comparison\u003csup\u003e15\u003c/sup\u003e, can be assumed to be meaningful. Interpretability is ensured by displaying the explanations using xFuseMap and by systematically validating their agreement with clinically relevant ECG characteristics.\u003c/p\u003e \u003cp\u003eWith xFuseMap, we present for the first time an approach to visualize the explanations of two models with different focuses in a combined saliency map while keeping them dissociable. We show that the explanations are consistent and, owing to the design of xECGArch, which is based on the clinical reading of ECGs, divided into rhythm and morphology. 
In this way, we not only provide meaningful explanations for the model decision but also separate them, in line with the usual reading of biosignals, according to their rhythmic or morphological relevance in order to support diagnostics as effectively as possible.\u003c/p\u003e \u003cp\u003eThe principle of xECGArch, the combined representation of both model explanations using xFuseMap, and the validation of the explanations by extracting relevance values within diagnostically relevant segments can be transferred to other diseases or biosignals. This, however, requires adapting xECGArch and choosing validation segments that match the classification problem. xFuseMap is applicable to the model explanations of any classification problem based on continuous signals that is solved by two classifiers focusing on different domains. For the first time, the combined representation enables relevance information to be assigned to a specific domain, in this case rhythm or morphology, which has a substantial impact on the interpretability of explanations. This is useful not only for diagnostic support but also opens up new possibilities for applications in research and teaching.\u003c/p\u003e"},{"header":"4 Methods","content":"\u003cp\u003eWe used the pre-trained xECGArch\u003csup\u003e15\u003c/sup\u003e to classify 1,521 unseen ECGs into AF and n-AF. 
Subsequently, model explanations were extracted for both the long-term and the short-term model in terms of each sample\u0026rsquo;s relevance for classification using DTD.\u003c/p\u003e \u003cp\u003eThe model explanations were combined in xFuseMap to present the long- and short-term relevance information jointly, illustrating the impact of rhythmic and morphological characteristics on the classification decisions of xECGArch.\u003c/p\u003e \u003cp\u003eTo validate the model-dependent explanations, we developed a generalizable method to quantify the relevance of diagnostically important ECG segments. We then statistically examined the relationship of the model explanations with both morphological characteristics and rhythm.\u003c/p\u003e \u003cp\u003eThe classification with xECGArch and the subsequent extraction of the model explanations were conducted in Python 3.9.19 and TensorFlow 2.12. All other operations were performed in Matlab R2021b (MathWorks Inc., Natick, MA, USA).\u003c/p\u003e \u003cp\u003e \u003cb\u003eData material\u003c/b\u003e \u003c/p\u003e \u003cp\u003eWe used the unseen training dataset from xECGArch\u003csup\u003e15\u003c/sup\u003e and increased the numbers of NSR (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({n}_{NSR}\\)\u003c/span\u003e\u003c/span\u003e) and O (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({n}_{O}\\)\u003c/span\u003e\u003c/span\u003e) ECGs with additional unseen data from the same databases to match the number of AF ECGs (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({n}_{AF}\\)\u003c/span\u003e\u003c/span\u003e), thus creating a balanced dataset (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\({n}_{AF}={n}_{NSR}={n}_{O}=507\\)\u003c/span\u003e\u003c/span\u003e). This enabled the statistical examination of differences in the distribution of relevance values between AF, NSR, and O. 
Although NSR and O were combined into one class rather than considered separately during training, differences between the explanations for both classes could indicate the methodology\u0026rsquo;s transferability.\u003c/p\u003e \u003cp\u003eIn total, we used 1,521 ECGs from 4 public databases accessible via PhysioNet\u003csup\u003e34\u003c/sup\u003e: the China Physiological Signal Challenge 2018 (CPSC2018) database\u003csup\u003e35\u003c/sup\u003e, the Chapman-Shaoxing (ChapShao) database\u003csup\u003e36,37\u003c/sup\u003e, the Georgia 12-lead ECG Challenge (Georgia) database\u003csup\u003e38,39\u003c/sup\u003e, and the XL database from the Physikalisch-Technische Bundesanstalt (PTB-XL)\u003csup\u003e5,40\u003c/sup\u003e. As most recordings in these databases are 10 s long, only ECGs of at least this length were used; longer ECGs were clipped to their middle 10 s. The distribution of age and sex across databases and classes is given in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e4\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eTo ensure transferability to future applications in wearables and mobile devices, as used, for example, in the TIMELY project\u003csup\u003e41,42\u003c/sup\u003e, we restricted our analyses to single-lead ECGs. F waves, one of the main characteristics of AF, are best seen in leads II and V1, since the positions and axes of these leads are most suitable for capturing the spatial excitation of both atria\u003csup\u003e43,44\u003c/sup\u003e. 
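\u003c/p\u003e \u003cp\u003eThe length criterion described above (excluding ECGs shorter than 10 s and clipping longer ones to their middle 10 s) can be sketched as follows. The function name and the sampling-frequency parameter are illustrative assumptions; any resampling or lead extraction performed before this step is not shown.\u003c/p\u003e

```python
# Hypothetical sketch of the record-length criterion (not the authors' code).
# Assumption: signal is a single-lead ECG as a list or array sampled at fs Hz.


def clip_middle(signal, fs, duration_s=10.0):
    # Return the central duration_s seconds of signal, or None if it is too short.
    target = int(round(duration_s * fs))
    n = len(signal)
    if target > n:
        return None  # shorter ECGs are excluded
    # With an odd surplus of samples, the extra sample is dropped at the end.
    start = (n - target) // 2
    return signal[start:start + target]


# A 12 s recording at 500 Hz keeps samples 500 to 5499 (the middle 10 s).
window = clip_middle([0.0] * 6000, fs=500)
print(len(window))
```

\u003cp\u003e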
Since leads from mobile devices are usually based on the Einthoven limb leads, we chose lead II for our analyses.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eData composition in terms of age and sex, divided by source database and class. AF, atrial fibrillation; CPSC, China Physiological Signal Challenge; f, female; m, male; NSR, normal sinus rhythm; O, others; PTB, Physikalisch-Technische Bundesanstalt; SD, standard deviation.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"6\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colspan=\"2\" morerows=\"1\" nameend=\"c2\" namest=\"c1\" rowspan=\"2\"\u003e \u003cp\u003eCriteria\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\" morerows=\"1\" rowspan=\"2\"\u003e \u003cp\u003e\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\varvec{n}\\)\u003c/span\u003e\u003c/span\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eAge\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\" morerows=\"1\" rowspan=\"2\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e 
\u003cp\u003eSex\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003eMean\u0026thinsp;\u0026plusmn;\u0026thinsp;SD (years)\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cem\u003em / f (%)\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eOverall\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1,521\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e62.3\u0026thinsp;\u0026plusmn;\u0026thinsp;17.0\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e54 / 46\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"3\" rowspan=\"4\"\u003e \u003cp\u003e\u003cb\u003eSource databases\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eChapman/Shaoxing\u003csup\u003e36\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e412\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e63.9\u0026thinsp;\u0026plusmn;\u0026thinsp;15.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e53 / 47\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCPSC2018\u003csup\u003e35\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e360\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003e62.7\u0026thinsp;\u0026plusmn;\u0026thinsp;19.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e58 / 42\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eGeorgia\u003csup\u003e38,39\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e370\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e62.4\u0026thinsp;\u0026plusmn;\u0026thinsp;15.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e57 / 43\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePTB-XL\u003csup\u003e5,40\u003c/sup\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e379\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e60.0\u0026thinsp;\u0026plusmn;\u0026thinsp;17.7\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e50 / 50\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\" morerows=\"2\" rowspan=\"3\"\u003e \u003cp\u003e\u003cb\u003eClasses\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAF\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e507\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e71.6\u0026thinsp;\u0026plusmn;\u0026thinsp;12.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c6\"\u003e \u003cp\u003e58 / 42\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNSR\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e507\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e54.2\u0026thinsp;\u0026plusmn;\u0026thinsp;16.2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e45 / 55\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eO\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e507\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e61.0\u0026thinsp;\u0026plusmn;\u0026thinsp;17.3\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e57 / 43\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eModel parameterization\u003c/b\u003e \u003c/p\u003e \u003cp\u003exECGArch\u003csup\u003e15\u003c/sup\u003e is an architecture for automated ECG analysis that comprises two parallel 1D CNNs and a combined decision-making process modeled on the clinical reading of ECGs. Both CNNs share the same architecture. However, the long-term model considers the entire signal duration at once, which makes it more sensitive to rhythm. The short-term model has an observation window of only 0.6 seconds, which makes it more sensitive to morphological changes within beats.\u003c/p\u003e \u003cp\u003eThe long-term and the short-term CNNs each analyze and classify an ECG independently. 
For the combined decision, the decision certainties of both models for and against AF are combined by a weighted average, and the class with the highest averaged value is taken as the classification result. This accounts for the fact that, depending on the specific CVD, rhythm and morphology may be affected to different degrees. The weights were determined by maximizing the F1 score of the combined decision. Various xAI methods have been applied to ECG classification previously (see Table\u0026nbsp;1). However, our recent findings indicate that the suitability of xAI methods may be case-dependent and that the xAI method must therefore be selected carefully to obtain reliable explanations\u003csup\u003e15\u003c/sup\u003e. For xECGArch, we investigated the reliability of 13 different xAI methods by gradually perturbing and reclassifying the signals. DTD led to the largest decline in classification performance and was therefore selected as the most reliable xAI method for this scenario.\u003c/p\u003e \u003cp\u003eDTD is based on redistributing the model output to the input components via relevance propagation rules. The relevance of a neuron in a given layer is obtained by summing the shares of relevance it receives from all connected neurons of the subsequent layer, which is the layer closer to the output. This procedure is repeated layer by layer for every neuron until the input layer is reached. Unlike some other xAI methods, DTD only indicates the relevance of individual samples for, and not against, the assignment to a class. 
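\u003c/p\u003e \u003cp\u003eThe following minimal Python sketch illustrates the relevance redistribution described above. It applies the z\u003csup\u003e+\u003c/sup\u003e rule, which is the DTD propagation rule for ReLU layers, to a small bias-free ReLU network with random weights. The network, its weights, and all function names are hypothetical and unrelated to xECGArch or to the iNNvestigate implementation. For simplicity, the same rule is also applied at the input layer, which assumes non-negative inputs. DTD uses dedicated rules, such as the w\u003csup\u003e2\u003c/sup\u003e or z\u003csup\u003eB\u003c/sup\u003e rule, for real-valued inputs such as ECG samples. Because each neuron passes its relevance down in proportion to the positive contributions it received, the total relevance is conserved from layer to layer and all input relevances are non-negative. In other words, they only speak in favor of the explained class.\u003c/p\u003e

```python
import numpy as np

def zplus_backward(a, W, R_upper, eps=1e-12):
    # a: non-negative activations of the lower layer, shape (J,)
    # W: weights from the lower to the upper layer, shape (J, K)
    # R_upper: relevance of the upper-layer neurons, shape (K,)
    Wp = np.maximum(W, 0.0)           # z+ rule: only positive weights contribute
    z = a @ Wp + eps                  # positive input received by each upper neuron
    s = R_upper / z                   # relevance per unit of positive input
    return a * (Wp @ s)               # R_j = sum_k a_j * w_jk+ / z_k * R_k

def forward(x, weights):
    # hypothetical bias-free ReLU network; returns the activations of all layers
    acts = [np.asarray(x, dtype=float)]
    for W in weights:
        acts.append(np.maximum(acts[-1] @ W, 0.0))
    return acts

def explain(x, weights, target):
    # start from the output score of the target class and propagate it to the input
    acts = forward(x, weights)
    R = np.zeros_like(acts[-1])
    R[target] = acts[-1][target]
    for a, W in zip(reversed(acts[:-1]), reversed(weights)):
        R = zplus_backward(a, W, R)
    return R

def minmax01(r):
    # min-max scaling to [0, 1]; the exact normalization used in the study is an assumption
    r = np.asarray(r, dtype=float)
    span = r.max() - r.min()
    return np.zeros_like(r) if span == 0 else (r - r.min()) / span

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 6)), rng.normal(size=(6, 4)), rng.normal(size=(4, 2))]
x = rng.uniform(0.0, 1.0, size=8)     # non-negative toy input
scores = forward(x, weights)[-1]
relevance = explain(x, weights, target=int(np.argmax(scores)))
print(relevance.sum(), scores.max())  # equal up to rounding: relevance is conserved
```

\u003cp\u003eThe helper minmax01 rescales a relevance vector to the range between 0 and 1. It assumes min-max scaling as the form of normalization and returns zeros for a constant input.\u003c/p\u003e \u003cp\u003e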
The DTD explanations, that is, the sample-wise relevance for the classification as AF or n-AF, were calculated using the iNNvestigate 2.0 toolbox\u003csup\u003e45\u003c/sup\u003e and subsequently normalized to the range between 0 and 1.\u003c/p\u003e \u003cp\u003e \u003cb\u003eModel explanation fusion (xFuseMap)\u003c/b\u003e \u003c/p\u003e \u003cp\u003eBoth rhythm and morphology are pertinent to ECG analysis and to the detection of CVDs such as AF. A trustworthy algorithm should therefore use information from both domains and present the relevant areas of each domain distinguishably. With xFuseMap, we integrate the relevance information from both domains into a combined saliency map. To ensure interpretability, we color-code the origin of the relevance information (rhythm or morphology) by assigning one color to each model. A two-dimensional color space is created with the rR of each model on one axis (see Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e). To achieve high color contrast, we chose blue for the long-term model and orange for the short-term model. Areas that are relevant to neither or to both models are colored black or pink, respectively. Between these extremes, the colors blend smoothly. A dark gray background makes relevant areas easier to recognize than a light background would. Other color combinations are conceivable. By multiplying the rR values by the models\u0026rsquo; decision certainties, we account for the reduced significance of uncertain decisions in the combined representation. 
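\u003c/p\u003e \u003cp\u003eThe color fusion can be sketched as a bilinear blend of four corner colors in the two-dimensional color space. The certainty-weighted rR of the long-term model lies on the horizontal axis and that of the short-term model on the vertical axis. The RGB values, the bilinear interpolation, the line-width range, and the function names are illustrative assumptions rather than the exact xFuseMap implementation. The sketch also scales each sample's line width with the larger of the two certainty-weighted relevance values.\u003c/p\u003e

```python
import numpy as np

# Illustrative RGB corner colours in [0, 1]; the exact xFuseMap values are assumptions.
BLACK = np.array([0.0, 0.0, 0.0])    # relevant for neither model
BLUE = np.array([0.0, 0.45, 1.0])    # relevant for the long-term model only
ORANGE = np.array([1.0, 0.55, 0.0])  # relevant for the short-term model only
PINK = np.array([1.0, 0.4, 0.8])     # relevant for both models

def _weighted(rr, certainty):
    # scale the normalized relevance by the model's decision certainty and clip to [0, 1]
    return np.clip(np.atleast_1d(np.asarray(rr, dtype=float)) * certainty, 0.0, 1.0)

def fuse_colors(rr_long, rr_short, cert_long=1.0, cert_short=1.0):
    x = _weighted(rr_long, cert_long)[:, None]    # horizontal axis: long-term model
    y = _weighted(rr_short, cert_short)[:, None]  # vertical axis: short-term model
    # bilinear blend of the four corner colours; returns one RGB triple per sample
    return ((1 - x) * (1 - y) * BLACK + x * (1 - y) * BLUE
            + (1 - x) * y * ORANGE + x * y * PINK)

def line_widths(rr_long, rr_short, cert_long=1.0, cert_short=1.0, w_min=0.5, w_max=3.0):
    # hypothetical width range; width grows with the larger of the two weighted relevances
    m = np.maximum(_weighted(rr_long, cert_long), _weighted(rr_short, cert_short))
    return w_min + (w_max - w_min) * m

colors = fuse_colors([1.0, 0.0, 0.2], [0.0, 1.0, 0.1], cert_long=0.9, cert_short=0.8)
widths = line_widths([1.0, 0.0, 0.2], [0.0, 1.0, 0.1], cert_long=0.9, cert_short=0.8)
```

\u003cp\u003eYou can draw each ECG sample with its entry in colors and widths, for example as a matplotlib LineCollection on a dark gray background. This yields a saliency map of the kind shown in Fig. 4.\u003c/p\u003e \u003cp\u003e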
In addition, the line thickness of each sample in the saliency map scales with the maximum of the two models\u0026rsquo; relevance values at that sample.\u003c/p\u003e \u003cp\u003eFusing the model explanations in xFuseMap therefore requires the following steps:\u003c/p\u003e\u003cp\u003e\n\u003col\u003e\n \u003cli\u003eapplying xECGArch or two other models with dissociable focus,\u003c/li\u003e\n \u003cli\u003eextracting and normalizing the relevance values of both models for the majority class, and\u003c/li\u003e\n \u003cli\u003evisualizing the combined rR of both models according to xFuseMap, by\u003col style=\"list-style-type: lower-alpha;\"\u003e\n \u003cli\u003esample-wise color-coding according to xFuseMap and\u003c/li\u003e\n \u003cli\u003eadjusting the line thickness according to the merged rR.\u003c/li\u003e\n \u003c/ol\u003e\n \u003c/li\u003e\n\u003c/ol\u003e\u003c/p\u003e\u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e shows an example saliency map that uses xFuseMap to combine the explanations of both xECGArch models for the classification of an AF ECG. The peaks of the second and fifth QRS complexes (rectangular marking) are highly relevant for the long-term model and hardly relevant for the short-term model. They are therefore located at the bottom right of the colormap and highlighted in clear blue with a thick line in the saliency map. The remaining QRS complexes (triangular marking) are less relevant for the long-term model and still hardly relevant for the short-term model. Their relevance values are therefore shifted to the left in the colormap, and the corresponding points in the saliency map are colored dark blue. Because of the lower rR, the line for these QRS complexes is thinner.\u003c/p\u003e \u003cp\u003eF waves, which partially overlap the ST segment or the T wave (circular marking), are clearly recognizable in the ECG. 
They are colored orange and highlighted by a thick line. This indicates that the F waves are highly relevant for the decision of the short-term model and less so for the long-term model. Accordingly, their rR values are positioned at the top left of the colormap.\u003c/p\u003e \u003cp\u003e \u003cb\u003eInvestigating the class-specific relation between model-dependent relevance and diagnostic criteria\u003c/b\u003e \u003c/p\u003e \u003cp\u003eWe quantified the relevance of specific ECG segments using a template-based approach to investigate the class-specific relationship between the model-dependent relevance and diagnostic criteria, such as class-specific rhythm or morphology.\u003c/p\u003e \u003cp\u003e \u003cb\u003eA generalizable method for quantifying the relevance of specific ECG segments\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTo evaluate the explanations quantitatively, we determined, for each model, the mean rR of all samples within the ECG segments shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e. We also determined its variability, expressed as the standard deviation (SD) of the segment-averaged rR within a recording.\u003c/p\u003e \u003cp\u003eEach ECG was processed separately. First, the ECGs were band-pass filtered between 0.3 and 120 Hz to remove artifacts and high-frequency noise. They were also notch filtered at 50 or 60 Hz, depending on the database source, to remove power-line interference\u003csup\u003e46,47\u003c/sup\u003e. Second, to robustly detect the fiducial points of each beat, we applied iterative two-dimensional signal warping (i2DSW)\u003csup\u003e48,49\u003c/sup\u003e. This method transfers the fiducial points of a template beat to every individual beat by iteratively fitting the template beat to each beat. 
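\u003c/p\u003e \u003cp\u003eThe band-pass and notch filtering described above can be sketched with SciPy as follows. The zero-phase Butterworth design, the filter order of 4, and the notch quality factor of 30 are assumptions chosen for illustration, not the settings of the cited preprocessing. Note that a 120 Hz upper cut-off requires a sampling rate above 240 Hz.\u003c/p\u003e

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

def preprocess_ecg(ecg, fs, mains_hz):
    # ecg: 1D signal; fs: sampling rate in Hz; mains_hz: 50 or 60 depending on the database
    ecg = np.asarray(ecg, dtype=float)
    # zero-phase band-pass, 0.3-120 Hz (4th-order Butterworth is an assumption)
    sos = butter(4, [0.3, 120.0], btype='bandpass', fs=fs, output='sos')
    filtered = sosfiltfilt(sos, ecg)
    # zero-phase notch at the mains frequency (quality factor 30 is an assumption)
    b, a = iirnotch(mains_hz, 30.0, fs=fs)
    return filtfilt(b, a, filtered)

# synthetic example: 10 s at 500 Hz with a 10 Hz component, 50 Hz interference and an offset
fs = 500
t = np.arange(0, 10, 1 / fs)
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t) + 1.0
clean = preprocess_ecg(raw, fs, mains_hz=50)
```

\u003cp\u003e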
For i2DSW, QRS complexes were automatically detected\u003csup\u003e50,51\u003c/sup\u003e, beats were extracted\u003csup\u003e50\u003c/sup\u003e, and template beats were then generated by averaging the beats\u003csup\u003e48,49\u003c/sup\u003e. Fiducial points were detected using the ECGdeli toolbox\u003csup\u003e46\u003c/sup\u003e, checked by an expert, and corrected manually if necessary.\u003c/p\u003e \u003cp\u003eSubsequently, nine segments were defined, within which the relevance information was averaged beat-wise, as shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e. The Q and S segments were defined as the time between the R peak and the Q or S peak, respectively. The TQ segment was defined as the time between the offset of the T wave, T\u003csub\u003eOff\u003c/sub\u003e, of beat \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(i\\)\u003c/span\u003e\u003c/span\u003e and the Q peak of the following beat \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(i+1\\)\u003c/span\u003e\u003c/span\u003e. It could not be calculated for the last beat, as there was no following beat.\u003c/p\u003e \u003cp\u003eWe proceeded identically for F waves. We detected these additionally by applying the MATLAB function \u003cem\u003efindpeaks\u003c/em\u003e to beat-corrected and filtered ECGs\u003csup\u003e52\u003c/sup\u003e. Beat correction was achieved by subtracting the i2DSW-adjusted template beats from the signal. 
Subsequently, the beat-corrected signal was band-pass filtered between 5 and 10 Hz.\u003c/p\u003e \u003cp\u003e \u003cb\u003eStatistical analysis of the class-specific relation between model explanations and morphology\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThe objective of our study was to demonstrate that xECGArch uses information consistent with clinical knowledge to detect AF. We also aimed to show that the short-term model, in contrast to the long-term model, primarily uses morphology features. We therefore considered class- and model-dependent differences in the mean rR as well as in the SD of the segment-averaged rR. We tested these for statistical significance using a two-factor ANOVA for each metric and each model separately, with segment and class as independent factors and relevance as the dependent variable. Only correctly classified ECGs were included in the ANOVA. Furthermore, outliers were identified and removed using the MATLAB function \u003cem\u003eisoutlier\u003c/em\u003e. Outliers were defined as values more than three scaled median absolute deviations away from the median. The residuals were tested for normality using the Kolmogorov-Smirnov test. However, the ANOVA was performed regardless of the result, as it is considered robust against violations of normality\u003csup\u003e53\u003c/sup\u003e. Subsequently, multiple Student\u0026rsquo;s \u003cem\u003et\u003c/em\u003e-tests were applied as a \u003cem\u003epost-hoc\u003c/em\u003e analysis to identify group-specific differences. The Tukey-Kramer correction for multiple comparisons was applied to control the accumulation of alpha errors.\u003c/p\u003e \u003cp\u003e \u003cb\u003eStatistical analysis of the relation between model explanations and rhythm\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTwo analyses were conducted to examine the models\u0026rsquo; ability to use rhythm information for classification. 
First, at the beat-to-beat level, we examined whether the altered rhythm in AF, compared to O and NSR, directly translates into fluctuating rR within segments of the same type. Second, at the recording level, we examined whether there is a more complex relation between RR variability and the variability of rR within segments of the same type. Such a relation cannot be captured at the beat-to-beat level, only at this more abstract level. We hypothesized that changes in RR intervals result in changes in the rR of specific segments. Consequently, we used two measures of spontaneous change. The first was the absolute difference between consecutive RR intervals (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta RR\\right|\\)\u003c/span\u003e\u003c/span\u003e), which captures changes in rhythm. The second was the absolute difference of segment-specific rR in consecutive beats (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta rR\\right|\\)\u003c/span\u003e\u003c/span\u003e), which captures changes in rR. Note that we did not exclude abnormal beats. Therefore, \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta RR\\right|\\)\u003c/span\u003e\u003c/span\u003e as a measure of rhythm changes should not be confused with heart rate variability features, which measure the variability of the intervals between normal beats. 
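\u003c/p\u003e \u003cp\u003eThe following minimal sketch covers these rhythm measures and the two correlation analyses described in this section. It assumes R-peak times in seconds and one segment-averaged rR value per beat for a given segment type. The function names are illustrative assumptions. So is the pairing of each |δRR| with the |δrR| of the same two beats, using the RR interval that ends at each beat. At the beat-to-beat level, |δRR| is normalized per recording by its Euclidean norm. It is then pooled with |δrR| over all recordings of a class, and Spearman\u0026rsquo;s rank correlation is computed. At the recording level, the per-recording means are correlated with Pearson\u0026rsquo;s correlation.\u003c/p\u003e

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def abs_delta_rr(r_peaks_s):
    # |dRR|: absolute difference between consecutive RR intervals of one recording
    rr = np.diff(np.asarray(r_peaks_s, dtype=float))  # rr[i] ends at beat i + 1
    return np.abs(np.diff(rr))

def abs_delta_rel(segment_rel):
    # |drR|: absolute difference of a segment's rR in consecutive beats;
    # beat 0 is dropped because no RR interval ends at it (alignment assumption)
    rel = np.asarray(segment_rel, dtype=float)[1:]
    return np.abs(np.diff(rel))

def euclidean_normalize(v):
    # divide by the L2 norm so that only relative RR changes remain
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def beat_level_spearman(recordings):
    # recordings: list of (r_peak_times, segment_rel) pairs for one class and one model
    d_rr = np.concatenate([euclidean_normalize(abs_delta_rr(p)) for p, _ in recordings])
    d_rel = np.concatenate([abs_delta_rel(r) for _, r in recordings])
    rho, p_value = spearmanr(d_rr, d_rel)
    return rho, p_value

def recording_level_pearson(recordings):
    # correlate the per-recording means of |dRR| and |drR| across recordings of a class
    mean_rr = [abs_delta_rr(p).mean() for p, _ in recordings]
    mean_rel = [abs_delta_rel(r).mean() for _, r in recordings]
    r, p_value = pearsonr(mean_rr, mean_rel)
    return r, p_value
```

\u003cp\u003e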
F waves were not included in these analyses, as they are not directly related to ventricular excitation and therefore cannot be assigned to individual beats in the ECG.\u003c/p\u003e \u003cp\u003eFor the beat-to-beat analysis, we computed the correlation between the Euclidean-normalized \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta RR\\right|\\)\u003c/span\u003e\u003c/span\u003e and \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta rR\\right|\\)\u003c/span\u003e\u003c/span\u003e over all recordings of each class, separately for the long-term and the short-term models. We used Spearman\u0026rsquo;s rank correlation because the weighted interconnections between neurons enable NNs to learn complex, non-linear relationships\u003csup\u003e54\u003c/sup\u003e. A monotonic but not necessarily linear relation was therefore expected. The \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta RR\\right|\\)\u003c/span\u003e\u003c/span\u003e time series were normalized recording-wise by their Euclidean norm to eliminate the influence of the absolute RR interval length, so that RR changes are considered in relative terms.\u003c/p\u003e \u003cp\u003eFor the recording-level analysis, we calculated the correlation between the mean \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta RR\\right|\\)\u003c/span\u003e\u003c/span\u003e (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\overline{\\left|\\delta RR\\right|}\\)\u003c/span\u003e\u003c/span\u003e) and the mean \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\left|\\delta rR\\right|\\)\u003c/span\u003e\u003c/span\u003e (\u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\overline{\\left|\\delta rR\\right|}\\)\u003c/span\u003e\u003c/span\u003e) across all recordings 
of a class, separately for each segment type. Given the diagnostic significance of absolute arrhythmia in AF, we postulated that increased variability of the RR interval duration would be reflected in the model explanations, particularly those of the long-term model. We expected this to appear as increased variability of the relevance of certain segments, particularly the QRS complex. Consequently, we expected \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\overline{\\left|\\delta RR\\right|}\\)\u003c/span\u003e\u003c/span\u003e to be linearly reflected in \u003cspan class=\"InlineEquation\"\u003e\u003cspan class=\"mathinline\"\u003e\\(\\overline{\\left|\\delta rR\\right|}\\)\u003c/span\u003e\u003c/span\u003e and therefore employed Pearson\u0026rsquo;s correlation analysis.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch1\u003eData Availability\u003c/h1\u003e\n\u003cp\u003eFor maximum comparability and reproducibility, only public databases were used in this study. The saliency maps for the entire dataset and all statistics can be found in Appendices A-C. The associated relevance data and segment boundaries will be made available upon publication.\u003c/p\u003e\n\u003ch1\u003eCode Availability\u003c/h1\u003e\n\u003cp\u003eThe MATLAB scripts used for the statistical analysis, which are based on the relevance data and segment boundaries, will be made publicly available upon publication.\u003c/p\u003e\n\u003ch1\u003eAcknowledgements\u003c/h1\u003e\n\u003cp\u003eThis study was partly supported by grants from the European Union\u0026rsquo;s Horizon 2020 research and innovation program (TIMELY, No. 101017424).\u003c/p\u003e\n\u003ch1\u003eAuthor contributions\u003c/h1\u003e\n\u003cp\u003eA.H. and M.S. designed the study concept. A.H., M.G., and M.S. made substantial contributions to the data preparation. M.G. implemented xECGArch and extracted the model explanations. A.H. designed and implemented xFuseMap. A.H. 
performed the segmentation of the model explanations. A.H. and M.S. conducted the statistical analyses. A.H., H.M., A.L., S.R., N.M., and M.S. made substantial contributions to the interpretation of the results. A.H. is the first author. M.S. supervised this work. All authors contributed to manuscript preparation and critical revisions and approved the final version of the manuscript.\u003c/p\u003e\n\u003ch1\u003eCompeting interests\u003c/h1\u003e\n\u003cp\u003eA.H., M.G., H.M., and M.S. are the inventors of the patent application DE 10 2023 118 246.3, which covers the principle of xFuseMap presented in this paper and the architecture of xECGArch. TU Dresden is the patent applicant.\u003c/p\u003e\n\u003cp\u003eA.L. and S.R. have received grants from Novartis and Edwards Lifesciences; personal fees from Abbott, Abiomed, AstraZeneca, Bayer, Boehringer Ingelheim, Boston Scientific, Edwards Lifesciences, Medtronic, Novartis, Sanofi Genzyme, and Pfizer; and other fees from Picardia, Filterlex, and Transverse Medical, outside the submitted work.\u003c/p\u003e\n\u003cp\u003eN.M. reports personal fees from Edwards Lifesciences, Medtronic, Biotronik, Novartis, Sanofi Genzyme, AstraZeneca, Pfizer, Bayer, Abbott, Abiomed, B. Braun, and Boston Scientific, outside the submitted work.\u003c/p\u003e"},{"header":"References","content":"\u003cp\u003e1. Vos, T. \u003cem\u003eet al.\u003c/em\u003e Global burden of 369 diseases and injuries in 204 countries and territories, 1990\u0026ndash;2019: a systematic analysis for the Global Burden of Disease Study 2019. \u003cem\u003eLancet\u003c/em\u003e \u003cstrong\u003e396\u003c/strong\u003e, 1204\u0026ndash;1222 (2020).\u003c/p\u003e\n\u003cp\u003e2. Di Cesare, M. 
\u003cem\u003eet al.\u003c/em\u003e \u003cem\u003eWorld Heart Report 2023: Confronting the World\u0026rsquo;s Number One Killer\u003c/em\u003e. (2023).\u003c/p\u003e\n\u003cp\u003e3. Roth, G. A. \u003cem\u003eet al.\u003c/em\u003e Global Burden of Cardiovascular Diseases and Risk Factors, 1990\u0026ndash;2019. \u003cem\u003eJ Am Coll Cardiol\u003c/em\u003e \u003cstrong\u003e76\u003c/strong\u003e, 2982\u0026ndash;3021 (2020).\u003c/p\u003e\n\u003cp\u003e4. Rosiek, A. \u0026amp; Leksowski, K. The risk factors and prevention of cardiovascular disease: the importance of electrocardiogram in the diagnosis and treatment of acute coronary syndrome. \u003cem\u003eTher Clin Risk Manag\u003c/em\u003e \u003cstrong\u003e12\u003c/strong\u003e, 1223\u0026ndash;1229 (2016).\u003c/p\u003e\n\u003cp\u003e5. Wagner, P. \u003cem\u003eet al.\u003c/em\u003e PTB-XL, a large publicly available electrocardiography dataset. \u003cem\u003eSci Data\u003c/em\u003e \u003cstrong\u003e7\u003c/strong\u003e, 154 (2020).\u003c/p\u003e\n\u003cp\u003e6. Stracina, T., Ronzhina, M., Redina, R. \u0026amp; Novakova, M. Golden Standard or Obsolete Method? Review of ECG Applications in Clinical and Experimental Context. \u003cem\u003eFront Physiol\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 867033 (2022).\u003c/p\u003e\n\u003cp\u003e7. Janiesch, C., Zschech, P. \u0026amp; Heinrich, K. Machine learning and deep learning. 
\u003cem\u003eElectron Mark\u003c/em\u003e \u003cstrong\u003e31\u003c/strong\u003e, 685\u0026ndash;695 (2021).\u003c/p\u003e\n\u003cp\u003e8. Bishop, C. M. \u003cem\u003ePattern Recognition and Machine Learning\u003c/em\u003e. (Springer, New York, NY, 2006). doi:10.1007/978-0-387-45528-0.\u003c/p\u003e\n\u003cp\u003e9. Holzinger, A., Langs, G., Denk, H., Zatloukal, K. \u0026amp; M\u0026uuml;ller, H. Causability and Explainability of Artificial Intelligence in Medicine. \u003cem\u003eWIREs Data Mining Knowl Discov\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, e1312 (2019).\u003c/p\u003e\n\u003cp\u003e10. Gershman, S. J., Horvitz, E. J. \u0026amp; Tenenbaum, J. B. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. \u003cem\u003eScience\u003c/em\u003e \u003cstrong\u003e349\u003c/strong\u003e, 273\u0026ndash;278 (2015).\u003c/p\u003e\n\u003cp\u003e11. Vale, D., El-Sharif, A. \u0026amp; Ali, M. Explainable artificial intelligence (XAI) post-hoc explainability methods: risks and limitations in non-discrimination law. \u003cem\u003eAI Ethics\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, 815\u0026ndash;826 (2022).\u003c/p\u003e\n\u003cp\u003e12. Taniguchi, H. \u003cem\u003eet al.\u003c/em\u003e Explainable Artificial Intelligence Model for Diagnosis of Atrial Fibrillation Using Holter Electrocardiogram Waveforms. \u003cem\u003eInt Heart J\u003c/em\u003e \u003cstrong\u003e62\u003c/strong\u003e, 534\u0026ndash;539 (2021).\u003c/p\u003e\n\u003cp\u003e13. Ivaturi, P. 
\u003cem\u003eet al.\u003c/em\u003e A Comprehensive Explanation Framework for Biomedical Time Series Classification. \u003cem\u003eIEEE J Biomed Health Inform\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 2398\u0026ndash;2408 (2021).\u003c/p\u003e\n\u003cp\u003e14. Bender, T. \u003cem\u003eet al.\u003c/em\u003e Analysis of a Deep Learning Model for 12-Lead ECG Classification Reveals Learned Features Similar to Diagnostic Criteria. \u003cem\u003eIEEE J Biomed Health Inform\u003c/em\u003e \u003cstrong\u003e28\u003c/strong\u003e, 1\u0026ndash;12 (2023).\u003c/p\u003e\n\u003cp\u003e15. Goettling, M., Hammer, A., Malberg, H. \u0026amp; Schmidt, M. xECGArch: a trustworthy deep learning architecture for interpretable ECG analysis considering short-term and long-term features. \u003cem\u003eSci Rep\u003c/em\u003e \u003cstrong\u003e14\u003c/strong\u003e, 13122 (2024).\u003c/p\u003e\n\u003cp\u003e16. Aufiero, S. \u003cem\u003eet al.\u003c/em\u003e A deep learning approach identifies new ECG features in congenital long QT syndrome. \u003cem\u003eBMC Med\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 162 (2022).\u003c/p\u003e\n\u003cp\u003e17. Cao, Y. \u003cem\u003eet al.\u003c/em\u003e Detection and Localization of Myocardial Infarction Based on Multi-Scale ResNet and Attention Mechanism. \u003cem\u003eFront Physiol\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 783184 (2022).\u003c/p\u003e\n\u003cp\u003e18. Jahmunah, V., Ng, E. Y. K., Tan, R.-S., Oh, S. L. \u0026amp; Acharya, U. R. Explainable detection of myocardial infarction using deep learning models with Grad-CAM technique on ECG signals. 
\u003cem\u003eComput Biol Med\u003c/em\u003e \u003cstrong\u003e146\u003c/strong\u003e, 105550 (2022).\u003c/p\u003e\n\u003cp\u003e19. Prabhakararao, E. \u0026amp; Dandapat, S. Myocardial Infarction Severity Stages Classification From ECG Signals Using Attentional Recurrent Neural Network. \u003cem\u003eIEEE Sens J\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 8711\u0026ndash;8720 (2020).\u003c/p\u003e\n\u003cp\u003e20. Singh, P. \u0026amp; Sharma, A. Interpretation and Classification of Arrhythmia Using Deep Convolutional Network. \u003cem\u003eIEEE Trans Instrum Meas\u003c/em\u003e \u003cstrong\u003e71\u003c/strong\u003e, 1\u0026ndash;12 (2022).\u003c/p\u003e\n\u003cp\u003e21. Zhang, D., Yang, S., Yuan, X. \u0026amp; Zhang, P. Interpretable deep learning for automatic diagnosis of 12-lead electrocardiogram. \u003cem\u003eiScience\u003c/em\u003e \u003cstrong\u003e24\u003c/strong\u003e, 102373 (2021).\u003c/p\u003e\n\u003cp\u003e22. Reddy, L., Talwar, V., Alle, S., Bapi, R. S. \u0026amp; Priyakumar, U. D. IMLE-Net: An Interpretable Multi-level Multi-channel Model for ECG Classification. in \u003cem\u003e2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC)\u003c/em\u003e 1068\u0026ndash;1074 (Melbourne, Australia, 2021). doi:10.1109/SMC52423.2021.9658706.\u003c/p\u003e\n\u003cp\u003e23. Honarvar, H. \u003cem\u003eet al.\u003c/em\u003e Enhancing convolutional neural network predictions of electrocardiograms with left ventricular dysfunction using a novel sub-waveform representation. 
\u003cem\u003eCardiovasc Digit Health J\u003c/em\u003e \u003cstrong\u003e3\u003c/strong\u003e, 220\u0026ndash;231 (2022).\u003c/p\u003e\n\u003cp\u003e24. Plante, E. \u0026amp; Vance, R. Selection of preschool language tests: a data-based approach. \u003cem\u003eLang Speech Hear Serv Sch\u003c/em\u003e \u003cstrong\u003e25\u003c/strong\u003e, 15\u0026ndash;24 (1994).\u003c/p\u003e\n\u003cp\u003e25. Chugh, S. S. \u003cem\u003eet al.\u003c/em\u003e Worldwide Epidemiology of Atrial Fibrillation: a Global Burden of Disease 2010 Study. \u003cem\u003eCirculation\u003c/em\u003e \u003cstrong\u003e129\u003c/strong\u003e, 837\u0026ndash;847 (2014).\u003c/p\u003e\n\u003cp\u003e26. Mou, L. \u003cem\u003eet al.\u003c/em\u003e Lifetime Risk of Atrial Fibrillation by Race and Socioeconomic Status: ARIC Study (Atherosclerosis Risk in Communities). \u003cem\u003eCirc Arrhythm Electrophysiol\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, e006350 (2018).\u003c/p\u003e\n\u003cp\u003e27. Odutayo, A. \u003cem\u003eet al.\u003c/em\u003e Atrial fibrillation and risks of cardiovascular disease, renal disease, and death: systematic review and meta-analysis. \u003cem\u003eBMJ\u003c/em\u003e \u003cstrong\u003e354\u003c/strong\u003e, i4482 (2016).\u003c/p\u003e\n\u003cp\u003e28. Hammer, A., Malberg, H. \u0026amp; Schmidt, M. Towards the Prediction of Atrial Fibrillation Using Interpretable ECG Features. in \u003cem\u003eComputing in Cardiology 2022\u003c/em\u003e vol. 49 1\u0026ndash;4 (Tampere, Finland, 2022).\u003c/p\u003e\n\u003cp\u003e29. Lip, G. Y. H. 
\u003cem\u003eet al.\u003c/em\u003e Atrial fibrillation. \u003cem\u003eNat Rev Dis Primers\u003c/em\u003e \u003cstrong\u003e2\u003c/strong\u003e, 16016 (2016).\u003c/p\u003e\n\u003cp\u003e30.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Cohen, J. \u003cem\u003eStatistical Power Analysis for the Behavioral Sciences\u003c/em\u003e. (Erlbaum, Hillsdale, NJ, 1988).\u003c/p\u003e\n\u003cp\u003e31.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Brundel, B. J. J. M. \u003cem\u003eet al.\u003c/em\u003e Atrial fibrillation. \u003cem\u003eNat Rev Dis Primers\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 21 (2022).\u003c/p\u003e\n\u003cp\u003e32.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Dilaveris, P. E., F\u0026auml;rbom, P., Batchvarov, V., Ghuran, A. \u0026amp; Malik, M. Circadian behavior of P-wave duration, P-wave area, and PR interval in healthy subjects. \u003cem\u003eAnn Noninvasive Electrocardiol\u003c/em\u003e \u003cstrong\u003e6\u003c/strong\u003e, 92\u0026ndash;97 (2001).\u003c/p\u003e\n\u003cp\u003e33.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Ribeiro, A. H. \u003cem\u003eet al.\u003c/em\u003e Automatic diagnosis of the 12-lead ECG using a deep neural network. \u003cem\u003eNat Commun\u003c/em\u003e \u003cstrong\u003e11\u003c/strong\u003e, 1760 (2020).\u003c/p\u003e\n\u003cp\u003e34.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Goldberger, A. \u003cem\u003eet al.\u003c/em\u003e PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals. \u003cem\u003eCirculation\u003c/em\u003e \u003cstrong\u003e101\u003c/strong\u003e, e215\u0026ndash;e220 (2000).\u003c/p\u003e\n\u003cp\u003e35.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Liu, F. F. 
\u003cem\u003eet al.\u003c/em\u003e An open access database for evaluating the algorithms of ECG rhythm and morphology abnormal detection. \u003cem\u003eJ Med Imaging Health Infor\u003c/em\u003e \u003cstrong\u003e8\u003c/strong\u003e, 1368\u0026ndash;1373 (2018).\u003c/p\u003e\n\u003cp\u003e36.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Zheng, J. \u003cem\u003eet al.\u003c/em\u003e Optimal Multi-Stage Arrhythmia Classification Approach. \u003cem\u003eSci Rep\u003c/em\u003e \u003cstrong\u003e10\u003c/strong\u003e, 2898 (2020).\u003c/p\u003e\n\u003cp\u003e37.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Zheng, J., Guo, H. \u0026amp; Chu. A large scale 12-lead electrocardiogram database for arrhythmia study. PhysioNet https://doi.org/10.13026/wgex-er52 (2022).\u003c/p\u003e\n\u003cp\u003e38.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Perez Alday, E. A. \u003cem\u003eet al.\u003c/em\u003e Classification of 12-lead ECGs: The PhysioNet/Computing in Cardiology Challenge 2020. \u003cem\u003ePhysiol. Meas.\u003c/em\u003e \u003cstrong\u003e41\u003c/strong\u003e, 124003 (2021).\u003c/p\u003e\n\u003cp\u003e39.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Perez Alday, E. A. \u003cem\u003eet al.\u003c/em\u003e Classification of 12-lead ECGs: The PhysioNet/Computing in Cardiology Challenge 2020. PhysioNet https://doi.org/10.13026/dvyd-kd57.\u003c/p\u003e\n\u003cp\u003e40.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Wagner, P., Strodthoff, N., Bousseljot, R.-D., Samek, W. \u0026amp; Schaeffter, T. PTB-XL, a large publicly available electrocardiography dataset. PhysioNet https://doi.org/10.13026/kfzx-aw45 (2022).\u003c/p\u003e\n\u003cp\u003e41.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Schmitz, B. 
\u003cem\u003eet al.\u003c/em\u003e Patient-centered cardiac rehabilitation by AI-powered lifestyle intervention \u0026ndash; the timely approach. \u003cem\u003eAtherosclerosis\u003c/em\u003e \u003cstrong\u003e355\u003c/strong\u003e, 251 (2022).\u003c/p\u003e\n\u003cp\u003e42.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Hammer, A., Goettling, M., Malberg, H., Linke, A. \u0026amp; Schmidt, M. An explainable AI for trustworthy detection of atrial fibrillation on reduced lead ECGs in mobile applications. \u003cem\u003eEur Heart J\u003c/em\u003e \u003cstrong\u003e45\u003c/strong\u003e, (accepted).\u003c/p\u003e\n\u003cp\u003e43.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Nault, I. \u003cem\u003eet al.\u003c/em\u003e Clinical value of fibrillatory wave amplitude on surface ECG in patients with persistent atrial fibrillation. \u003cem\u003eJ Interv Card Electrophysiol\u003c/em\u003e \u003cstrong\u003e26\u003c/strong\u003e, 11\u0026ndash;19 (2009).\u003c/p\u003e\n\u003cp\u003e44.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Park, J. \u003cem\u003eet al.\u003c/em\u003e Early differentiation of long-standing persistent atrial fibrillation using the characteristics of fibrillatory waves in surface ECG multi-leads. \u003cem\u003eSci Rep\u003c/em\u003e \u003cstrong\u003e9\u003c/strong\u003e, 2746 (2019).\u003c/p\u003e\n\u003cp\u003e45.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Alber, M. \u003cem\u003eet al.\u003c/em\u003e iNNvestigate Neural Networks! \u003cem\u003eJ Mach Learn Res\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 1\u0026ndash;8 (2019).\u003c/p\u003e\n\u003cp\u003e46.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Pilia, N. \u003cem\u003eet al.\u003c/em\u003e ECGdeli - An open source ECG delineation toolbox for MATLAB. 
\u003cem\u003eSoftwareX\u003c/em\u003e \u003cstrong\u003e13\u003c/strong\u003e, 100639 (2021).\u003c/p\u003e\n\u003cp\u003e47.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Hammer, A., Malberg, H. \u0026amp; Schmidt, M. Cardiovascular Reflections of Sympathovagal Imbalance Precede the Onset of Atrial Fibrillation. in \u003cem\u003eComputing in Cardiology 2023\u003c/em\u003e vol. 50 1\u0026ndash;4 (Atlanta (GA), USA, 2023).\u003c/p\u003e\n\u003cp\u003e48.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Schmidt, M., Baumert, M., Porta, A., Malberg, H. \u0026amp; Zaunseder, S. Two-Dimensional Warping for One-Dimensional Signals\u0026mdash;Conceptual Framework and Application to ECG Processing. \u003cem\u003eIEEE Trans Signal Process\u003c/em\u003e \u003cstrong\u003e62\u003c/strong\u003e, 5577\u0026ndash;5588 (2014).\u003c/p\u003e\n\u003cp\u003e49.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Schmidt, M., Baumert, M., Malberg, H. \u0026amp; Zaunseder, S. Iterative two-dimensional signal warping\u0026mdash;Towards a generalized approach for adaption of one-dimensional signals. \u003cem\u003eBiomed Signal Process Control\u003c/em\u003e \u003cstrong\u003e43\u003c/strong\u003e, 311\u0026ndash;319 (2018).\u003c/p\u003e\n\u003cp\u003e50.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Hammer, A. \u003cem\u003eet al.\u003c/em\u003e Automatic Classification of Full- And Reduced-Lead Electrocardiograms Using Morphological Feature Extraction. in \u003cem\u003eComputing in Cardiology 2021\u003c/em\u003e vol. 48 1\u0026ndash;4 (Brno, Czech Republic, 2021).\u003c/p\u003e\n\u003cp\u003e51.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Johnson, A. E., Behar, J., Andreotti, F., Clifford, G. D. \u0026amp; Oster, J. R-Peak Estimation Using Multimodal Lead Switching. in \u003cem\u003eComputing in Cardiology 2014\u003c/em\u003e vol. 
41 281\u0026ndash;284 (2014).\u003c/p\u003e\n\u003cp\u003e52.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Hammer, A., Malberg, H. \u0026amp; Schmidt, M. Morphology Features Self-Learned by Explainable Deep Learning for Atrial Fibrillation Detection Correspond to Fibrillatory Waves. in \u003cem\u003eComputing in Cardiology 2024\u003c/em\u003e vol. 51 1\u0026ndash;4 (Karlsruhe, Germany, accepted).\u003c/p\u003e\n\u003cp\u003e53.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Blanca, M. J., Alarc\u0026oacute;n, R., Arnau, J., Bono, R. \u0026amp; Bendayan, R. Non-normal data: Is ANOVA still a valid option? \u003cem\u003ePsicothema\u003c/em\u003e \u003cstrong\u003e29\u003c/strong\u003e, 552\u0026ndash;557 (2017).\u003c/p\u003e\n\u003cp\u003e54.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J. \u0026amp; M\u0026uuml;ller, K.-R. Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications. \u003cem\u003eProc IEEE\u003c/em\u003e \u003cstrong\u003e109\u003c/strong\u003e, 247\u0026ndash;278 (2021).\u003c/p\u003e\n\u003cp\u003e55.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Selvaraju, R. R. \u003cem\u003eet al.\u003c/em\u003e Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. in \u003cem\u003e2017 IEEE International Conference on Computer Vision\u003c/em\u003e 618\u0026ndash;626 (Venezia, Italy, 2017). doi:10.1109/ICCV.2017.74.\u003c/p\u003e\n\u003cp\u003e56.\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;\u0026nbsp;Shrikumar, A., Greenside, P. \u0026amp; Kundaje, A. Learning important features through propagating activation differences. in \u003cem\u003eProceedings of the 34th International Conference on Machine Learning - Volume 70\u003c/em\u003e 3145\u0026ndash;3153 (JMLR.org, Sydney, NSW, Australia, 2017).\u003c/p\u003e\n\u003cp\u003e57. 
\u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;Clifford, G. D. \u003cem\u003eet al.\u003c/em\u003e AF Classification from a Short Single Lead ECG Recording: the PhysioNet/Computing in Cardiology Challenge 2017. \u003cem\u003eComput Cardiol (2010)\u003c/em\u003e \u003cstrong\u003e44\u003c/strong\u003e, (2017).\u003c/p\u003e\n\u003cp\u003e58. \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp; \u0026nbsp;Moody, G. B. \u0026amp; Mark, R. G. The impact of the MIT-BIH Arrhythmia Database. \u003cem\u003eIEEE Eng Med Biol Mag\u003c/em\u003e \u003cstrong\u003e20\u003c/strong\u003e, 45\u0026ndash;50 (2001).\u003c/p\u003e"},{"header":"Tables 1 and 3","content":"\u003cp\u003eTables 1 and 3 are available in the Supplementary Files section.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
Published version: npj Artificial Intelligence, https://doi.org/10.1038/s44387-025-00022-w