A Lightweight, CPU-Deployable, and Interpretable ECG Arrhythmia Classification Pipeline Using the MIT-BIH Arrhythmia Database

Weihao Cheng (Hangzhou Dianzi University; corresponding author; ORCID: https://orcid.org/0009-0009-1891-4760)

Research Article, Research Square. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8415145/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Electrocardiogram (ECG) arrhythmia classification is a foundational task in medical artificial intelligence, yet many high-performing deep learning approaches require GPU resources and may be difficult to interpret in clinical settings. This study presents a lightweight, CPU-deployable, and interpretable pipeline for heartbeat-level arrhythmia classification using the MIT-BIH Arrhythmia Database. ECG beats were segmented around annotated R-peaks and mapped into five AAMI-style aggregated classes (N, S, V, F, Q).
We extracted compact time-domain statistics and frequency-domain energy features, then trained and compared three classical machine learning models: Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM). Using a record-wise split to reduce data leakage risk, RF achieved the highest accuracy (0.864), while LR provided the strongest one-vs-rest ROC-AUC (0.806) and the highest Macro-F1 (0.365). Class-wise ROC curves and feature-importance analysis suggested that spectral energy and amplitude-related statistics contributed substantially to discrimination. Overall, the results suggest that an interpretable, resource-efficient ECG classification baseline can be built without deep networks, supporting practical deployment in CPU-only environments and rapid prototyping of medical AI systems.

Subject: Biotechnology and Bioengineering
Keywords: ECG arrhythmia classification, MIT-BIH, interpretable machine learning, feature engineering, CPU deployment

Figures

Figure 1. Class-wise ROC curves for five-class ECG arrhythmia classification (one-vs-rest)
Figure 2. Normalized confusion matrix revealing dominant error modes across five arrhythmia categories
Figure 3. Top-10 most influential handcrafted features in Logistic Regression (absolute coefficient magnitude)
Figure 4. Representative spectral-energy signatures across arrhythmia classes using coarse FFT band summaries

1. Introduction

Arrhythmias are a major contributor to morbidity and mortality worldwide, and ECG remains the most accessible noninvasive signal for screening and diagnosis. Automated ECG interpretation has therefore become a central topic in medical AI, with applications ranging from bedside monitoring to ambulatory Holter analysis. Public physiological signal repositories have played an essential role in enabling reproducible research; among them, PhysioNet provides datasets and tooling that have shaped decades of signal-processing and machine-learning development in biomedical informatics [1].

The MIT-BIH Arrhythmia Database is one of the most influential standardized ECG benchmarks and has historically served as a common evaluation substrate for arrhythmia detection algorithms [2, 3].
Despite the recent dominance of deep learning, classical machine learning pipelines remain attractive in scenarios where computational budgets are limited or interpretability is prioritized, e.g., embedded devices, CPU-only hospital systems, or rapid clinical decision support prototypes.

This work focuses on a “small but complete” engineering contribution: a fully reproducible, CPU-deployable pipeline that (i) segments beats from annotated R-peaks, (ii) uses lightweight handcrafted features, (iii) evaluates multiple classical models under a record-wise split, and (iv) provides interpretable feature-importance analysis. The goal is not to claim state-of-the-art performance, but to demonstrate that a practical medical AI pipeline can be built quickly, transparently, and reproducibly with a modest feature set. Such a pipeline can serve as a strong baseline for subsequent deep-learning expansion.

2. Related Work

2.1 Public ECG resources and benchmarking

PhysioNet and its associated toolkits were introduced to support research on complex physiologic signals, making datasets like MIT-BIH widely accessible for benchmarking [1]. The MIT-BIH Arrhythmia Database, in particular, has had enduring impact on algorithm evaluation and comparative studies of arrhythmia detectors [2, 3].

2.2 AAMI-style beat grouping for arrhythmia classification

Many ECG heartbeat-classification studies group beat annotations into higher-level categories aligned with AAMI testing and reporting guidance, commonly using the N/S/V/F/Q-style aggregation for algorithm evaluation [4]. This reduces label granularity while maintaining clinically meaningful groupings for supraventricular and ventricular ectopy assessment.

2.3 Classical ML vs. deep learning

Feature-based methods have historically delivered strong performance on arrhythmia classification and remain valuable for interpretability and efficiency.
For example, mixture-of-features approaches have been used to classify AAMI-recommended classes in MIT-BIH-like settings [5]. Deep learning methods can further improve performance by learning representations directly from raw waveforms; for instance, convolutional architectures have been reported for heartbeat classification under AAMI-style grouping [6]. However, deep models may require more computation and careful tuning and can be harder to interpret, which motivates lightweight baselines such as the pipeline proposed here.

3. Methods

3.1 Dataset and access

We used the MIT-BIH Arrhythmia Database hosted on PhysioNet [3]. Signals and annotations were programmatically accessed using the WFDB Python package, which provides reading and processing utilities for physiological waveforms and annotation files [7].

Dataset rationale: MIT-BIH has long been used as a reference benchmark for arrhythmia detection and evaluation, making it suitable for a reproducible baseline study [2, 3].

3.2 Beat segmentation

For each recording, we used annotated R-peak positions from the standard atr annotation stream and extracted a fixed-length beat-centered window. Each beat segment covered 0.2 s pre-R and 0.4 s post-R (total 0.6 s, i.e., 216 samples at the database's native 360 Hz sampling rate). This window gives a compact representation of local beat morphology while remaining computationally efficient; it carries little rhythm context, a limitation revisited in Section 4.2.

3.3 Label mapping (five-class aggregation)

Each beat annotation symbol was mapped into one of five aggregated categories (N, S, V, F, Q), consistent with widely used AAMI-style evaluation conventions [4]. Briefly:

- N: normal and bundle-branch block type beats
- S: supraventricular ectopic beats
- V: ventricular ectopic beats
- F: fusion beats
- Q: paced/unknown/other beats (as applicable under aggregation)

3.4 Feature extraction

We extracted a lightweight feature set designed for CPU efficiency and interpretability:

Time-domain statistics: mean, standard deviation, min, max, peak-to-peak amplitude (ptp), RMS, and mean absolute value.
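The segmentation, label mapping, and time-domain statistics described so far can be sketched roughly as follows. This is a minimal illustration rather than the author's code: the function names `segment_beats` and `time_features` are hypothetical, the symbol-to-class table is the grouping commonly used in the literature (the paper does not list its exact table), and the 360 Hz rate assumes the signals were not resampled. The demo uses a random signal so that it runs without downloading the database.

```python
import numpy as np

FS = 360                   # MIT-BIH sampling rate in Hz (assumes no resampling)
PRE_S, POST_S = 0.2, 0.4   # window around the R-peak, as in Section 3.2

# Commonly used AAMI-style grouping; an assumption, not the paper's stated table.
AAMI_MAP = {
    **dict.fromkeys(["N", "L", "R", "e", "j"], "N"),
    **dict.fromkeys(["A", "a", "J", "S"], "S"),
    **dict.fromkeys(["V", "E"], "V"),
    "F": "F",
    **dict.fromkeys(["/", "f", "Q"], "Q"),
}


def segment_beats(signal, r_peaks, symbols, fs=FS):
    """Cut a fixed window around each annotated R-peak and map its label."""
    pre, post = int(round(PRE_S * fs)), int(round(POST_S * fs))
    beats, labels = [], []
    for r, sym in zip(r_peaks, symbols):
        cls = AAMI_MAP.get(sym)
        # Skip non-beat annotations and windows that run past the signal edges.
        if cls is None or pre > r or r + post > len(signal):
            continue
        beats.append(signal[r - pre : r + post])
        labels.append(cls)
    return np.asarray(beats), np.asarray(labels)


def time_features(beat):
    """Seven time-domain statistics listed in Section 3.4."""
    return np.array([
        beat.mean(), beat.std(), beat.min(), beat.max(),
        np.ptp(beat),                  # peak-to-peak amplitude
        np.sqrt(np.mean(beat ** 2)),   # RMS
        np.mean(np.abs(beat)),         # mean absolute value
    ])


# Demo on a synthetic signal with made-up annotation positions.
rng = np.random.default_rng(0)
signal = rng.normal(size=2000)
r_peaks = [30, 500, 900, 1300, 1950]
symbols = ["N", "V", "+", "A", "N"]   # "+" is a rhythm-change marker, not a beat
beats, labels = segment_beats(signal, r_peaks, symbols)
print(beats.shape, list(labels))
```

With real data, `signal`, `r_peaks`, and `symbols` would come from the WFDB record and annotation readers instead of the synthetic arrays used here.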
Frequency-domain energy: we computed a real FFT for each beat (after normalization) and summarized spectral energy in coarse bands (five bins) plus total energy.

These features are intentionally compact so that (i) training and inference remain fast on CPU, and (ii) feature importance can be directly interpreted and discussed.

3.5 Models and training protocol

We trained and compared three classical models:

- Logistic Regression (LR) with standardized features and class balancing
- Random Forest (RF) with balanced subsampling
- Support Vector Machine (SVM) with RBF kernel, standardized features, and probability estimates

Record-wise split: To reduce overly optimistic performance caused by correlated beats from the same record appearing in both train and test, we used a record-level grouping split (GroupShuffleSplit). This is an important engineering step for fairer evaluation in beat-level tasks.

3.6 Evaluation metrics

Given the strong class imbalance typical of MIT-BIH-derived beat datasets, we reported:

- Accuracy
- Macro-F1 (emphasizes minority-class performance)
- One-vs-rest ROC-AUC (OVR AUC) for multi-class discrimination assessment

4. Experiments

4.1 Dataset Summary, Class Distribution, and Experimental Setup

A total of 109,460 heartbeat segments were extracted from the MIT-BIH Arrhythmia Database using beat-centered windows (0.2 s pre-R and 0.4 s post-R). The resulting dataset exhibited a highly imbalanced class distribution, which is typical for long-term ambulatory ECG recordings: N constituted the majority class, whereas F and S were minor classes. This imbalance is clinically realistic but introduces an evaluation challenge: a model may achieve high accuracy by predominantly predicting the majority class, while still performing poorly for rare yet clinically important arrhythmias.

To mitigate overly optimistic evaluation caused by leakage across beats from the same record, we adopted a record-wise split (GroupShuffleSplit).
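The spectral features, the three model configurations, and the record-wise split described above might be wired together roughly as in the sketch below, using scikit-learn. Several details are assumptions not stated in the paper: z-score normalization per beat, equal-width frequency bands, `class_weight="balanced_subsample"` as the reading of "balanced subsampling", a 20% test fraction, and all hyperparameters. The helper name `band_energy_features` is hypothetical, and the data are random placeholders, so the printed scores carry no meaning.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupShuffleSplit
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def band_energy_features(beat, n_bands=5):
    """Energy in n_bands equal-width FFT bands, plus total energy."""
    z = (beat - beat.mean()) / (beat.std() + 1e-8)   # assumed normalization
    power = np.abs(np.fft.rfft(z)) ** 2
    bands = [chunk.sum() for chunk in np.array_split(power, n_bands)]
    return np.array(bands + [power.sum()])


# Placeholder data: 10 "records" of 40 beats each, 216 samples per beat.
rng = np.random.default_rng(0)
n_records, beats_per_record, n_samples = 10, 40, 216
beats = rng.normal(size=(n_records * beats_per_record, n_samples))
y = rng.integers(0, 3, size=len(beats))                  # random labels
groups = np.repeat(np.arange(n_records), beats_per_record)
X = np.vstack([band_energy_features(b) for b in beats])

# Record-wise split: every beat of a record lands on one side only.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

models = {
    "LR": make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    ),
    "RF": RandomForestClassifier(
        n_estimators=50, class_weight="balanced_subsample", random_state=0
    ),
    "SVM": make_pipeline(
        StandardScaler(),
        SVC(kernel="rbf", probability=True, random_state=0),
    ),
}
for name, model in models.items():
    model.fit(X[train_idx], y[train_idx])
    print(name, "test accuracy:", model.score(X[test_idx], y[test_idx]))
```

Passing the record identifier as `groups` is what prevents beats from one recording from appearing in both partitions; a plain shuffled split would not give that guarantee.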
Under this protocol, beats from the same record are assigned exclusively to either training or test sets, which better approximates generalization to unseen recordings.

We report three complementary metrics: Accuracy, Macro-F1, and one-vs-rest ROC-AUC (OVR AUC). Accuracy reflects overall correctness; Macro-F1 emphasizes balanced performance across classes by averaging per-class F1 equally; OVR AUC captures threshold-independent separability for each class against the rest and is especially informative in imbalanced multi-class settings.

Table 1. Model comparison on MIT-BIH (record-wise split)

Model                 Accuracy   Macro-F1   OVR AUC
Logistic Regression   0.424      0.365      0.806
Random Forest         0.864      0.332      0.709
SVM                   0.604      0.313      0.704

The results show a clear trade-off:

Random Forest (RF) achieved the highest accuracy (0.864), suggesting strong performance on the dominant class and robust rule-based partitioning of feature space. However, its Macro-F1 (0.332) is lower than LR’s, implying that performance on minor classes is not proportionally improved.

Logistic Regression (LR) achieved the best OVR AUC (0.806) and highest Macro-F1 (0.365) among the three. This indicates better overall class separability and a more balanced performance distribution than RF, despite much lower accuracy (0.424).

SVM achieved intermediate accuracy (0.604) but the lowest Macro-F1 (0.313) and OVR AUC (0.704). In this pipeline, SVM may be limited by feature simplicity and class overlap, especially between N and S.

4.2 ROC Analysis: Class-wise Discrimination Performance

In multi-class arrhythmia classification, ROC curves provide a class-wise view of how well the model separates each category from the rest across all thresholds. Figure 1 presents one-vs-rest ROC curves for the five aggregated classes. Two patterns are especially noteworthy:

(1) Strong separability for Q and V classes. The ROC curves indicate that Q and V are more easily separable under the current feature representation.
This may arise because paced/unknown-like beats and ventricular ectopy often introduce distinct morphological or spectral energy characteristics. In practical terms, a high AUC implies that even if the final decision threshold is adjusted (e.g., to reduce false positives), the model can maintain relatively strong sensitivity-specificity trade-offs.

(2) Lower separability for S class. The S class has the lowest ROC performance among the five, which is consistent with the clinical and algorithmic difficulty of distinguishing supraventricular ectopy from normal beats using short-window morphology alone. Supraventricular beats may differ only subtly from normal beats, and the discriminatory signal may require rhythm context (e.g., RR intervals, local variability) rather than purely local morphology. This observation points to a concrete future-work direction: adding RR-based features or longer context windows while remaining CPU-friendly.

4.3 Confusion matrix (error patterns)

The normalized confusion matrix (Fig. 2) reveals the structure of misclassifications and provides insight beyond scalar metrics.

(1) Diagonal dominance and class stability. Several classes show strong diagonal values (e.g., Q and V), indicating stable recognition patterns. This supports the ROC findings and suggests that the chosen feature set contains adequate discriminative cues for these classes.

(2) N–S confusions: the central bottleneck. A prominent error mode is confusion between N and S, which is common in heartbeat-level classification. Two explanations are plausible:

Engineering explanation: the feature set emphasizes amplitude statistics and coarse spectral energy; these may not fully capture subtle supraventricular morphological differences.

Physiological explanation: supraventricular beats can appear similar to normal beats in short segments, and the discriminating cues often emerge in timing (prematurity) or broader context rather than morphology alone.
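As a side note on how such a matrix is read, the sketch below builds a row-normalized confusion matrix and computes accuracy and Macro-F1 by hand for a small invented example. The labels are illustrative only and are not the paper's data; the helper names `confusion_matrix` and `macro_f1` are hypothetical (scikit-learn offers equivalents). The point is only to show how a classifier that absorbs a minority class into N can keep high accuracy while Macro-F1 drops.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, classes):
    """Counts with rows = true class, columns = predicted class."""
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[idx[t], idx[p]] += 1
    return cm


def macro_f1(cm):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = 2 * tp + fp + fn
    f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    return f1.mean()


classes = ["N", "S", "V"]
# Invented toy labels: 16 N beats, 2 S beats, 2 V beats.
y_true = ["N"] * 16 + ["S"] * 2 + ["V"] * 2
# Every S beat is absorbed into N; one of the two V beats is missed.
y_pred = ["N"] * 16 + ["N"] * 2 + ["V", "N"]

cm = confusion_matrix(y_true, y_pred, classes)
cm_norm = cm / cm.sum(axis=1, keepdims=True)   # each row sums to 1
accuracy = np.trace(cm) / cm.sum()
macro = macro_f1(cm)

print(cm_norm)
print(f"accuracy = {accuracy:.3f}, macro-F1 = {macro:.3f}")
# accuracy = 0.850, macro-F1 = 0.527
```

In this toy case the S row of the normalized matrix places all of its mass in the N column, which is the same kind of off-diagonal pattern that the N–S discussion above refers to.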
(3) Minor classes and imbalance sensitivity. For minor classes such as F, the matrix often shows dispersion into major classes. This is expected under imbalance and limited sample size. It is also the reason Macro-F1 is relatively low even when accuracy is high.

(4) Why this matters clinically. From a clinical screening perspective, false negatives in arrhythmia classes (e.g., predicting N for an arrhythmia beat) are typically more concerning than false positives. The confusion matrix therefore motivates threshold calibration (e.g., increasing sensitivity for S and V) and cost-sensitive training where the deployment scenario demands it.

4.4 Feature importance (interpretability)

Figure 3 shows the top-10 features ranked by absolute LR coefficients. Among them, amplitude-related statistics (e.g., mean absolute value, RMS, standard deviation) and coarse spectral energy features (FFT bands) were consistently influential. This supports the intuition that both waveform magnitude and frequency characteristics contribute to class separation in beat-level arrhythmia classification.

4.5 Representative spectral patterns (qualitative visualization)

Figure 4 provides a qualitative visualization of representative spectral-energy profiles for each class. Although the curves share a common shape (expected due to normalization and the inherent structure of ECG signals), subtle differences are visible across bands. This plot supports two points:

Coarse spectral summarization is informative. Even with only five energy bands, the model benefits from frequency-domain information.

Classes differ in nuanced ways that may not be fully separable using only coarse bands, especially for borderline categories such as N vs S. This further motivates augmenting the feature set with rhythm context or more refined spectral descriptors if higher sensitivity is needed.

5. Discussion

This study highlights a practical and reproducible approach to ECG arrhythmia classification that is computationally efficient and interpretable. Importantly, the results illuminate the trade-offs between accuracy-driven performance and balanced multi-class recognition under severe imbalance.

5.1 Why classical ML remains valuable in medical AI

While deep learning can be highly accurate, real-world clinical and engineering settings often require rapid prototyping, transparency, and deployment feasibility on limited hardware. The proposed pipeline addresses these constraints by using compact features and CPU-friendly models, making it attractive for:

- embedded or bedside deployments,
- research baselines and ablation studies,
- educational settings and rapid iteration cycles.

The interpretability of Logistic Regression is particularly valuable: coefficients provide a direct mapping from features to decisions, allowing error analysis and feature refinement without resorting to opaque latent representations.

5.2 Interpreting the metric trade-offs (Accuracy vs Macro-F1 vs AUC)

The results show an instructive pattern:

RF has high accuracy but relatively low Macro-F1. Under imbalance, accuracy can be dominated by the majority class (N). RF may learn highly effective rules for N, improving accuracy substantially, but still struggle with minority classes.

LR has the best AUC and Macro-F1 among the three. This indicates better overall separability and a more balanced classification profile, even if the default thresholding yields lower accuracy.

5.3 Error analysis: why N and S are hard, and what to do next

The confusion matrix suggests that N–S discrimination is the main bottleneck. This is expected when using short beat windows and morphology-focused features. Two low-cost improvements are strongly justified:

RR interval features: pre-RR, post-RR, RR ratio, local mean RR, etc. These capture prematurity and rhythm changes that are crucial for supraventricular ectopy.
Morphological descriptors: approximate QRS width, slopes, or wavelet energies; these can be computed efficiently without GPUs.

Taken together, these observations suggest that improving the features is likely more effective than simply changing the classifier.

5.4 Clinical translation considerations

From a deployment perspective, misclassifications have asymmetric cost. In many screening contexts, sensitivity for arrhythmias (S/V) may be prioritized over specificity, suggesting:

- probability calibration,
- class-specific thresholds,
- cost-sensitive training.

The pipeline is well suited to such adjustments because LR and SVM (with probability estimates enabled) output class probability estimates, so decision thresholds can be tuned without retraining the model. These estimates should still be checked for calibration before clinical use.

6. Conclusion and Future Work

6.1 Conclusion

We developed a lightweight, interpretable, and CPU-deployable pipeline for heartbeat-level arrhythmia classification using beat-centered segmentation and compact handcrafted features. Across three classical models, Random Forest achieved the highest accuracy (0.864), while Logistic Regression achieved the best discrimination ability measured by OVR ROC-AUC (0.806) and the highest Macro-F1 among compared models (0.365). The ROC and confusion-matrix analyses indicate strong separability for Q and V classes and identify N–S confusion as the primary challenge, consistent with limited rhythm context in short beat windows. Feature-importance analysis further indicates that amplitude-related statistics and coarse spectral energy features drive model decisions, supporting transparent interpretation and efficient deployment.

6.2 Future Work

To strengthen both performance and clinical relevance while preserving CPU efficiency, future work will pursue:

6.2.1 Rhythm-aware feature augmentation: Add pre-/post-RR intervals, RR ratios, and local HRV descriptors to improve discrimination between N and S, which is the dominant error mode observed.
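As a rough indication of how cheap such rhythm features are, the sketch below derives pre-RR, post-RR, their ratio, and a local-mean-normalized pre-RR from R-peak sample indices. This is not part of the reported pipeline: the function name `rr_features`, the 10-interval local window, and the particular ratios are illustrative assumptions, and the example R-peak positions are invented.

```python
import numpy as np


def rr_features(r_peaks, fs=360, local_window=10):
    """Per-beat RR features for every beat that has both neighbours.

    Returns one row per interior beat:
    [pre_rr, post_rr, pre_rr / post_rr, pre_rr / local_mean_rr] (seconds or unitless).
    """
    t = np.asarray(r_peaks, dtype=float) / fs   # R-peak times in seconds
    rr = np.diff(t)                             # rr[i] = t[i + 1] - t[i]
    feats = []
    for i in range(1, len(t) - 1):
        pre_rr, post_rr = rr[i - 1], rr[i]
        lo = max(0, i - local_window)
        local_mean = rr[lo:i].mean()            # up to `local_window` preceding intervals
        feats.append([pre_rr, post_rr, pre_rr / post_rr, pre_rr / local_mean])
    return np.array(feats)


# Invented example: the fourth beat arrives early (0.6 s after the third)
# and is followed by a longer pause of 1.4 s.
r_peaks = [0, 360, 720, 936, 1440, 1800]
feats = rr_features(r_peaks)
print(np.round(feats, 3))
```

For the early beat, the pre-RR is short and the post-RR is long, so the pre/post ratio falls well below 1. This prematurity cue is invisible to features computed from a single 0.6 s morphology window.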
6.2.2 Feature refinement with minimal overhead: Explore wavelet energy features, QRS width approximations, and slope-based morphology descriptors. These remain computationally light but can capture subtler morphological cues.

6.2.3 Imbalance-aware learning and thresholding: Investigate cost-sensitive optimization, re-sampling strategies, and class-specific decision thresholds. This is especially relevant for minority classes such as F.

6.2.4 External validation and robustness testing: Validate the pipeline on additional ECG datasets and evaluate robustness to noise, baseline wander, and lead variations to better approximate real-world clinical conditions.

6.2.5 Deployment-oriented evaluation: Benchmark inference latency and memory footprint on CPU-only devices and evaluate calibrated probabilities for safe integration into clinical decision support.

References

[1] Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE (2000) PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101(23):e215–e220. https://doi.org/10.1161/01.CIR.101.23.e215

[2] Moody GB, Mark RG (2001) The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol Mag 20(3):45–50. https://doi.org/10.1109/51.932724

[3] PhysioNet (n.d.) MIT-BIH Arrhythmia Database (mitdb), v1.0.0. Retrieved from https://physionet.org/content/mitdb/1.0.0/

[4] Association for the Advancement of Medical Instrumentation (2012) ANSI/AAMI EC57:2012 (R2020): Testing and reporting performance results of cardiac rhythm and ST segment measurement algorithms (preview). Retrieved from https://webstore.ansi.org/preview-pages/AAMI/preview_ANSI%2BAAMI%2BEC57-2012%2B%28R2020%29.pdf

[5] Das MK, Ari S (2014) ECG beats classification using mixture of features. Comput Math Methods Med, Article ID 178414. https://pmc.ncbi.nlm.nih.gov/articles/PMC4897569/

[6] Kachuee M, Fazeli S, Sarrafzadeh M (2018) ECG heartbeat classification: A deep transferable representation. arXiv preprint, arXiv:1805.00794. https://arxiv.org/abs/1805.00794

[7] PhysioNet (2023) Waveform Database Software Package (WFDB) for Python. Retrieved from https://physionet.org/content/wfdb-python/

Additional Declarations

The authors declare no competing interests.
Deep learning methods can further improve performance by learning representations directly from raw waveforms; for instance, convolutional architectures have been reported for heartbeat classification under AAMI-style grouping [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. However, deep models may require more computation and careful tuning, and they can be harder to interpret, which motivates lightweight baselines such as the pipeline proposed here.\u003c/p\u003e \u003c/div\u003e"},{"header":"3. Methods","content":"\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Dataset and access\u003c/h2\u003e \u003cp\u003eWe used the MIT-BIH Arrhythmia Database hosted on PhysioNet [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Signals and annotations were programmatically accessed using the WFDB Python package, which provides reading and processing utilities for physiological waveforms and annotation files [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eDataset rationale\u003c/strong\u003e \u003cp\u003eMIT-BIH has long been used as a reference benchmark for arrhythmia detection and evaluation, making it suitable for a reproducible baseline study [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e][\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e].\u003c/p\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Beat segmentation\u003c/h2\u003e \u003cp\u003eFor each recording, we used annotated R-peak positions from the standard atr annotation stream and extracted a fixed-length beat-centered window. Each beat segment covered 0.2 s pre-R and 0.4 s post-R (total 0.6 s). 
This window provides a compact representation of local morphology and rhythm context while remaining computationally efficient.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Label mapping (five-class aggregation)\u003c/h2\u003e \u003cp\u003eEach beat annotation symbol was mapped into one of five aggregated categories (N, S, V, F, Q), consistent with widely used AAMI-style evaluation conventions [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]. Briefly:\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eN\u003c/strong\u003e \u003cp\u003enormal and bundle-branch block\u0026ndash;type beats\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eS\u003c/strong\u003e \u003cp\u003esupraventricular ectopic beats\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eV\u003c/strong\u003e \u003cp\u003eventricular ectopic beats\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eF\u003c/strong\u003e \u003cp\u003efusion beats\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eQ\u003c/strong\u003e \u003cp\u003epaced/unknown/other beats (as applicable under aggregation)\u003c/p\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Feature extraction\u003c/h2\u003e \u003cp\u003eWe extracted a lightweight feature set designed for CPU efficiency and interpretability:\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eTime-domain statistics\u003c/strong\u003e \u003cp\u003emean, standard deviation, min, max, peak-to-peak amplitude (ptp), RMS, and mean absolute value.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eFrequency-domain energy\u003c/strong\u003e \u003cp\u003ewe computed a real FFT for each beat (after normalization) and summarized spectral energy in coarse bands (five bins) plus total energy.\u003c/p\u003e \u003c/p\u003e \u003cp\u003eThese features are intentionally compact so that (i) training and inference 
remain fast on CPU, and (ii) feature importance can be directly interpreted and discussed.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e3.5 Models and training protocol\u003c/h2\u003e \u003cp\u003eWe trained and compared three classical models:\u003c/p\u003e \u003cp\u003e \u003cb\u003eLogistic Regression (LR)\u003c/b\u003e with standardized features and class balancing\u003c/p\u003e \u003cp\u003e \u003cb\u003eRandom Forest (RF)\u003c/b\u003e with balanced subsampling\u003c/p\u003e \u003cp\u003e \u003cb\u003eSupport Vector Machine (SVM)\u003c/b\u003e with RBF kernel, standardized features, and probability estimates\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eRecord-wise split\u003c/strong\u003e \u003cp\u003eTo reduce overly optimistic performance caused by correlated beats from the same record appearing in both train and test, we used a record-level grouping split (GroupShuffleSplit). This is an important engineering step for fairer evaluation in beat-level tasks.\u003c/p\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e3.6 Evaluation metrics\u003c/h2\u003e \u003cp\u003eGiven the strong class imbalance typical of MIT-BIH-derived beat datasets, we reported:\u003c/p\u003e \u003cp\u003e \u003cb\u003eAccuracy\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eMacro-F1 (emphasizes minority-class performance)\u003c/b\u003e \u003c/p\u003e \u003cp\u003e \u003cb\u003eOne-vs-rest ROC-AUC (OVR AUC) for multi-class discrimination assessment\u003c/b\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4. Experiments","content":"\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Dataset Summary, Class Distribution, and Experimental Setup\u003c/h2\u003e \u003cp\u003eA total of 109,460 heartbeat segments were extracted from the MIT-BIH Arrhythmia Database using beat-centered windows (0.2 s pre-R and 0.4 s post-R). 
The resulting dataset exhibited a highly imbalanced class distribution, which is typical for long-term ambulatory ECG recordings: N constituted the majority class, whereas F and S were minor classes. This imbalance is clinically realistic but introduces an evaluation challenge: a model may achieve high accuracy by predominantly predicting the majority class, while still performing poorly for rare yet clinically important arrhythmias.\u003c/p\u003e \u003cp\u003eTo mitigate overly optimistic evaluation caused by leakage across beats from the same record, we adopted a record-wise split (GroupShuffleSplit). Under this protocol, beats from the same record are assigned exclusively to either training or test sets, which better approximates generalization to unseen recordings. We report three complementary metrics: Accuracy, Macro-F1, and one-vs-rest ROC-AUC (OVR AUC). Accuracy reflects overall correctness; Macro-F1 emphasizes balanced performance across classes by averaging per-class F1 equally; OVR AUC captures threshold-independent separability for each class against the rest and is especially informative in imbalanced multi-class settings.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eModel comparison on MIT-BIH (record-wise split)\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e 
\u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccuracy\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMacro-F1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eOVR AUC\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLogistic Regression\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e0.424\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.365\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.806\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e0.864\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.332\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.709\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSVM\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003e0.604\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e0.313\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.704\u003c/b\u003e\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe results show a clear trade-off:\u003c/p\u003e \u003cp\u003eRandom Forest (RF) achieved the highest accuracy (0.864), suggesting strong performance on the dominant class and robust rule-based partitioning of feature space. However, its Macro-F1 (0.332) is lower than LR\u0026rsquo;s, implying that performance on minor classes is not proportionally improved.\u003c/p\u003e \u003cp\u003eLogistic Regression (LR) achieved the best OVR AUC (0.806) and highest Macro-F1 (0.365) among the three. This indicates better overall class separability and a more balanced performance distribution than RF, despite lower accuracy.\u003c/p\u003e \u003cp\u003eSVM achieved moderate accuracy (0.604) and Macro-F1 (0.313). In this pipeline, SVM may be limited by feature simplicity and class overlap, especially between N and S.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e4.2 ROC Analysis: Class-wise Discrimination Performance\u003c/h2\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eIn multi-class arrhythmia classification, ROC curves provide a class-wise view of how well the model separates each category from the rest across all thresholds. Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e presents one-vs-rest ROC curves for the five aggregated classes. Two patterns are especially noteworthy:\u003c/p\u003e \u003cp\u003e(1) Strong separability for Q and V classes.\u003c/p\u003e \u003cp\u003eThe ROC curves indicate that Q and V are more easily separable under the current feature representation. This may arise because paced/unknown-like beats and ventricular ectopy often introduce distinct morphological or spectral energy characteristics. 
In practical terms, a high AUC implies that even if the final decision threshold is adjusted (e.g., to reduce false positives), the model can maintain relatively strong sensitivity-specificity trade-offs.\u003c/p\u003e \u003cp\u003e(2) Lower separability for S class.\u003c/p\u003e \u003cp\u003eThe S class has the lowest ROC performance among the five, which is consistent with the clinical and algorithmic difficulty of distinguishing supraventricular ectopy from normal beats using short-window morphology alone. Supraventricular beats may differ subtly, and the discriminatory signal may require rhythm context (e.g., RR intervals, local variability) rather than purely local morphology. This observation sets up a concrete future-work direction: add RR-based features or longer context windows while still remaining CPU-friendly.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e4.3 Confusion matrix (error patterns)\u003c/h2\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe normalized confusion matrix (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e) reveals the structure of misclassifications and provides insight beyond scalar metrics.\u003c/p\u003e \u003cp\u003e(1) Diagonal dominance and class stability\u003c/p\u003e \u003cp\u003eSeveral classes show strong diagonal values (e.g., Q and V), indicating stable recognition patterns. This supports the ROC findings and suggests that the chosen feature set contains adequate discriminative cues for these classes.\u003c/p\u003e \u003cp\u003e(2) N\u0026ndash;S confusions: the central bottleneck\u003c/p\u003e \u003cp\u003eA prominent error mode is confusion between N and S, which is common in heartbeat-level classification. 
Two complementary explanations are plausible:\u003c/p\u003e \u003cp\u003e \u003cstrong\u003eEngineering explanation\u003c/strong\u003e \u003cp\u003ethe feature set emphasizes amplitude statistics and coarse spectral energy; these may not fully capture subtle supraventricular morphological differences.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003ePhysiological explanation\u003c/strong\u003e \u003cp\u003esupraventricular beats can appear similar to normal beats in short segments, and the discriminating cues often emerge in timing (prematurity) or broader context rather than morphology alone.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e(3) Minor classes and imbalance sensitivity\u003c/p\u003e \u003cp\u003eFor minor classes such as F, the matrix shows dispersion into major classes. This is expected under imbalance and limited sample size. It is also the reason Macro-F1 is relatively low even when accuracy is high.\u003c/p\u003e \u003cp\u003e(4) Why this matters clinically\u003c/p\u003e \u003cp\u003eFrom a clinical screening perspective, false negatives in arrhythmia classes (e.g., predicting N for an arrhythmia beat) are typically more concerning than false positives. The confusion matrix therefore motivates future threshold calibration (e.g., increasing sensitivity for S/V) and cost-sensitive training if the deployment scenario demands it.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e4.4 Feature importance (interpretability)\u003c/h2\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e shows the top-10 features ranked by absolute LR coefficients. Among them, amplitude-related statistics (e.g., mean absolute value, RMS, standard deviation) and coarse spectral energy features (FFT bands) were consistently influential. 
This supports the intuition that both waveform magnitude and frequency characteristics contribute to class separation in beat-level arrhythmia classification.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e4.5 Representative spectral patterns (qualitative visualization)\u003c/h2\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFigure \u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e provides a qualitative visualization of representative spectral-energy profiles for each class. Although the curves share a common shape (expected due to normalization and the inherent structure of ECG signals), subtle differences are visible across bands.\u003c/p\u003e \u003cp\u003eThis plot supports two important points:\u003c/p\u003e \u003cp\u003eCoarse spectral summarization is informative. Even with only five energy bands, the model benefits from frequency-domain information.\u003c/p\u003e \u003cp\u003eClasses differ in nuanced ways, which may not be fully separable using only coarse bands, especially for borderline categories such as N vs S. This further motivates augmenting the feature set with rhythm context or more refined spectral descriptors if higher sensitivity is needed.\u003c/p\u003e \u003c/div\u003e"},{"header":"5. Discussion","content":"\u003cp\u003eThis study highlights a practical and reproducible approach to ECG arrhythmia classification that is computationally efficient and interpretable. Importantly, the results illuminate the trade-offs between accuracy-driven performance and balanced multi-class recognition under severe imbalance.\u003c/p\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003e5.1 Why classical ML remains valuable in medical AI\u003c/h2\u003e \u003cp\u003eWhile deep learning can be highly accurate, real-world clinical and engineering settings often require rapid prototyping, transparency, and deployment feasibility on limited hardware. 
The proposed pipeline addresses these constraints by using compact features and CPU-friendly models, making it attractive for:\u003c/p\u003e \u003cp\u003eembedded or bedside deployments,\u003c/p\u003e \u003cp\u003eresearch baselines and ablation studies,\u003c/p\u003e \u003cp\u003eeducational settings and rapid iteration cycles.\u003c/p\u003e \u003cp\u003eThe interpretability of Logistic Regression is particularly valuable: coefficients provide a direct mapping from features to decisions, allowing error analysis and feature refinement without resorting to opaque latent representations.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003e5.2 Interpreting the metric trade-offs (Accuracy vs Macro-F1 vs AUC)\u003c/h2\u003e \u003cp\u003eThe results show an instructive pattern:\u003c/p\u003e \u003cp\u003eRF has high accuracy but relatively low Macro-F1. Under imbalance, accuracy can be dominated by the majority class (N). RF may learn highly effective rules for N, improving accuracy substantially, but still struggle with minority classes.\u003c/p\u003e \u003cp\u003eLR has the best AUC and Macro-F1 among the three. This indicates better overall separability and a more balanced classification profile, even if the default thresholding yields lower accuracy.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec22\" class=\"Section2\"\u003e \u003ch2\u003e5.3 Error analysis: why N and S are hard, and what to do next\u003c/h2\u003e \u003cp\u003eThe confusion matrix suggests that N\u0026ndash;S discrimination is the main bottleneck. This is expected when using short beat windows and morphology-focused features. Two low-cost improvements are strongly justified:\u003c/p\u003e \u003cp\u003eRR interval features: pre-RR, post-RR, RR ratio, local mean RR, etc. 
These capture prematurity and rhythm changes that are crucial for supraventricular ectopy.\u003c/p\u003e \u003cp\u003eMorphological descriptors: approximate QRS width, slopes, or wavelet energies; these can be computed efficiently without GPUs.\u003c/p\u003e \u003cp\u003eThis points to a targeted future direction: improving the features is likely to be more effective than simply changing the classifier.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec23\" class=\"Section2\"\u003e \u003ch2\u003e5.4 Clinical translation considerations\u003c/h2\u003e \u003cp\u003eFrom a deployment perspective, misclassifications have asymmetric cost. In many screening contexts, sensitivity for arrhythmias (S/V) may be prioritized over specificity, suggesting:\u003c/p\u003e \u003cp\u003eprobability calibration,\u003c/p\u003e \u003cp\u003eclass-specific thresholds,\u003c/p\u003e \u003cp\u003ecost-sensitive training.\u003c/p\u003e \u003cp\u003eThe pipeline is well suited to such adjustments because it outputs probability estimates (for LR/SVM) and its decision thresholds can be tuned without extensive retraining overhead.\u003c/p\u003e \u003c/div\u003e"},{"header":"6. Conclusion and Future Work","content":"\u003cdiv id=\"Sec25\" class=\"Section2\"\u003e \u003ch2\u003e6.1 Conclusion\u003c/h2\u003e \u003cp\u003eWe developed a lightweight, interpretable, and CPU-deployable pipeline for heartbeat-level arrhythmia classification using beat-centered segmentation and compact handcrafted features. Across three classical models, Random Forest achieved the highest accuracy (0.864), while Logistic Regression achieved the best discrimination ability measured by OVR ROC-AUC (0.806) and the highest Macro-F1 among compared models (0.365). The ROC and confusion-matrix analyses indicate strong separability for Q and V classes and identify N\u0026ndash;S confusion as the primary challenge, consistent with limited rhythm context in short beat windows. 
Feature-importance analysis further demonstrates that amplitude-related statistics and coarse spectral energy features drive model decisions, supporting transparent interpretation and efficient deployment.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec26\" class=\"Section2\"\u003e \u003ch2\u003e6.2 Future Work\u003c/h2\u003e \u003cp\u003eTo strengthen both performance and clinical relevance while preserving CPU efficiency, future work will pursue:\u003c/p\u003e \u003cdiv id=\"Sec27\" class=\"Section3\"\u003e \u003ch2\u003e6.2.1 Rhythm-aware feature augmentation:\u003c/h2\u003e \u003cp\u003eAdd pre-/post-RR intervals, RR ratios, and local HRV descriptors to improve discrimination between N and S, which is the dominant error mode observed.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec28\" class=\"Section3\"\u003e \u003ch2\u003e6.2.2 Feature refinement with minimal overhead:\u003c/h2\u003e \u003cp\u003eExplore wavelet energy features, QRS width approximations, and slope-based morphology descriptors. These remain computationally light but can capture subtler morphological cues.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec29\" class=\"Section3\"\u003e \u003ch2\u003e6.2.3 Imbalance-aware learning and thresholding:\u003c/h2\u003e \u003cp\u003eInvestigate cost-sensitive optimization, re-sampling strategies, and class-specific decision thresholds. 
This is especially relevant for minority classes such as F.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec30\" class=\"Section3\"\u003e \u003ch2\u003e6.2.4 External validation and robustness testing:\u003c/h2\u003e \u003cp\u003eValidate the pipeline on additional ECG datasets and evaluate robustness to noise, baseline wander, and lead variations to better approximate real-world clinical conditions.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec31\" class=\"Section3\"\u003e \u003ch2\u003e6.2.5 Deployment-oriented evaluation:\u003c/h2\u003e \u003cp\u003eBenchmark inference latency and memory footprint on CPU-only devices and evaluate calibrated probabilities for safe integration into clinical decision support.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eGoldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE (2000) PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation 101(23):e215\u0026ndash;e220. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1161/01.CIR.101.23.e215\u003c/span\u003e\u003cspan address=\"10.1161/01.CIR.101.23.e215\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMoody GB, Mark RG (2001) The impact of the MIT-BIH arrhythmia database. IEEE Eng Med Biol Mag 20(3):45\u0026ndash;50. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/51.932724\u003c/span\u003e\u003cspan address=\"10.1109/51.932724\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePhysioNet (n.d.). MIT-BIH Arrhythmia Database (mitdb), v1.0.0. 
Retrieved from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://physionet.org/content/mitdb/1.0.0/\u003c/span\u003e\u003cspan address=\"https://physionet.org/content/mitdb/1.0.0/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAssociation for the Advancement of Medical Instrumentation (2012) Testing and reporting performance results of cardiac rhythm and ST segment measurement algorithms (ANSI/AAMI EC57:2012 (R2020)), preview. Retrieved from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://webstore.ansi.org/preview-pages/AAMI/preview_ANSI%2BAAMI%2BEC57-2012%2B%28R2020%29.pdf\u003c/span\u003e\u003cspan address=\"https://webstore.ansi.org/preview-pages/AAMI/preview_ANSI%2BAAMI%2BEC57-2012%2B%28R2020%29.pdf\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDas MK, Ari S (2014) ECG beats classification using mixture of features. Comput Math Methods Med, Article ID 178414. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://pmc.ncbi.nlm.nih.gov/articles/PMC4897569/\u003c/span\u003e\u003cspan address=\"https://pmc.ncbi.nlm.nih.gov/articles/PMC4897569/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKachuee M, Fazeli S, Sarrafzadeh M (2018) ECG heartbeat classification: A deep transferable representation. arXiv preprint, arXiv:1805.00794. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/1805.00794\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/1805.00794\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePhysioNet (2023) Waveform Database Software Package (WFDB) for Python. Retrieved from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://physionet.org/content/wfdb-python/\u003c/span\u003e\u003cspan address=\"https://physionet.org/content/wfdb-python/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Hangzhou Dianzi University","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"ECG, arrhythmia classification, MIT-BIH, interpretable machine learning, feature engineering, CPU deployment","lastPublishedDoi":"10.21203/rs.3.rs-8415145/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8415145/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eElectrocardiogram (ECG) arrhythmia classification is a foundational task in medical artificial intelligence, yet many high-performing deep learning approaches require GPU resources and may be difficult to interpret in clinical settings. This study presents a lightweight, CPU-deployable, and interpretable pipeline for heartbeat-level arrhythmia classification using the MIT-BIH Arrhythmia Database. ECG beats were segmented around annotated R-peaks and mapped into five AAMI-style aggregated classes (N, S, V, F, Q). We extracted compact time-domain statistics and frequency-domain energy features, then trained and compared three classical machine learning models: Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM). Using a record-wise split to reduce data leakage risk, RF achieved the highest accuracy (0.864), while LR provided the strongest one-vs-rest ROC-AUC (0.806). Class-wise ROC curves and feature-importance analysis suggested that spectral energy and amplitude-related statistics contributed substantially to discrimination. 
Overall, the results demonstrate that interpretable, resource-efficient ECG classification remains feasible without deep networks, supporting practical deployment in CPU-only environments and rapid prototyping of medical AI systems.\u003c/p\u003e","manuscriptTitle":"A Lightweight, CPU-Deployable, and Interpretable ECG Arrhythmia Classification Pipeline Using the MIT-BIH Arrhythmia Database","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-12-23 05:56:04","doi":"10.21203/rs.3.rs-8415145/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"1b9af617-d6a5-46fd-9200-1ffa5abfc3cf","owner":[],"postedDate":"December 23rd, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":60011855,"name":"Biotechnology and Bioengineering"}],"tags":[],"updatedAt":"2025-12-23T05:56:04+00:00","versionOfRecord":[],"versionCreatedAt":"2025-12-23 05:56:04","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8415145","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8415145","identity":"rs-8415145","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}