A Resource-Efficient Hybrid ViT–CNN Framework with CutMix Regularization for Cardiac MRI Image Classification

doi:10.21203/rs.3.rs-8735303/v1

2026 · doi:10.21203/rs.3.rs-8735303/v1

preprint OA: closed


Research Article

Amirreza Khayyat assadi, Babak Nouri-Moghaddam, Abbas Mirzaei

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-8735303/v1

This work is licensed under a CC BY 4.0 License

Status: Posted (Version 1)

Abstract

Hybrid deep learning architectures that combine Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) have emerged as an effective paradigm in medical image computing by enabling simultaneous modeling of local textures and global contextual dependencies. Nevertheless, achieving high classification accuracy while maintaining computational efficiency remains a significant challenge, particularly in resource-constrained computing environments and data-limited medical imaging scenarios.
In this study, we propose a resource-efficient Hybrid ViT–CNN framework for cardiac magnetic resonance imaging (MRI) classification, explicitly designed to optimize architectural inductive bias through structured feature fusion and CutMix regularization. The proposed model employs a shallow convolutional stem to encode localized texture information and inject domain-specific inductive bias, followed by a lightweight Transformer encoder to capture long-range global dependencies. To enhance generalization and training stability on limited datasets, CutMix stochastic augmentation is incorporated, while a dynamic resource-adaptive batching strategy is utilized to optimize memory usage and computational throughput during training on CPU-only hardware.

The framework is evaluated on the CAD Cardiac MRI dataset using stratified five-fold cross-validation. Experimental results demonstrate an average classification accuracy of 96.71% ± 1.2%, an F1-score of 0.9671, and an Area Under the ROC Curve (AUC) of 0.9960, consistently outperforming standalone CNN- and ViT-based baselines. Importantly, the proposed model converges within 231 seconds on CPU-only hardware and achieves real-time inference performance of approximately 12 ms per image, highlighting its practical feasibility for deployment in constrained computing environments. Ablation studies further confirm that the hybrid architectural design yields an intrinsic performance gain of approximately 4.5%, with CutMix providing additional robustness.

These findings demonstrate that high-accuracy cardiac MRI classification can be achieved without reliance on high-end GPU resources, underscoring the potential of hybrid, resource-aware deep learning architectures for scalable and efficient medical image computing applications.
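The abstract names CutMix as the regularizer but gives no implementation details. As a rough illustration only, the standard CutMix recipe (paste a random rectangle from another image in the batch and mix the labels in proportion to the pasted area) can be sketched in NumPy as follows. The function names, the `alpha=1.0` default, the two-class setup, and the 64×64 single-channel input are assumptions for this example and are not taken from the paper.

```python
import numpy as np


def rand_bbox(height, width, lam, rng):
    """Sample a box covering roughly (1 - lam) of the image area."""
    cut_ratio = np.sqrt(1.0 - lam)
    cut_h, cut_w = int(height * cut_ratio), int(width * cut_ratio)
    cy, cx = rng.integers(height), rng.integers(width)  # random box centre
    # Clip the box to the image borders, so it may end up smaller than requested.
    top, bottom = np.clip([cy - cut_h // 2, cy + cut_h // 2], 0, height)
    left, right = np.clip([cx - cut_w // 2, cx + cut_w // 2], 0, width)
    return int(top), int(bottom), int(left), int(right)


def cutmix_batch(images, labels, num_classes, alpha=1.0, rng=None):
    """Apply CutMix to one batch (hypothetical helper, not the authors' code).

    images: float array of shape (N, C, H, W); labels: int array of shape (N,).
    Returns the mixed images, soft labels of shape (N, num_classes), and the
    effective mixing coefficient lam.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, _, h, w = images.shape
    lam = rng.beta(alpha, alpha)   # initial mixing coefficient
    perm = rng.permutation(n)      # partner image for each sample
    top, bottom, left, right = rand_bbox(h, w, lam, rng)

    mixed = images.copy()
    mixed[:, :, top:bottom, left:right] = images[perm][:, :, top:bottom, left:right]

    # Recompute lam from the area actually pasted, since the box may be clipped.
    lam = 1.0 - (bottom - top) * (right - left) / (h * w)

    one_hot = np.eye(num_classes, dtype=images.dtype)[labels]
    soft_labels = lam * one_hot + (1.0 - lam) * one_hot[perm]
    return mixed, soft_labels, lam


# Usage on synthetic data (assumed shapes: 8 grayscale 64x64 slices, 2 classes).
rng = np.random.default_rng(0)
images = rng.random((8, 1, 64, 64), dtype=np.float32)
labels = rng.integers(0, 2, size=8)
mixed, soft_labels, lam = cutmix_batch(images, labels, num_classes=2, rng=rng)
```

The soft labels would then be fed to a cross-entropy loss that accepts class probabilities. One box is shared across the whole batch here for simplicity; sampling a separate box per image is an equally valid variant.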
Keywords: Hybrid Deep Learning Architecture, Vision Transformer and CNN Integration, Resource-Efficient Computing, Medical Image Classification, CutMix Regularization, Inductive Bias Optimization, CPU-Based Inference

Additional Declarations: No competing interests reported.

Affiliation (all authors): Ard.C, Islamic Azad University. Corresponding author: Abbas Mirzaei.

Posted: February 25th, 2026 (version 1). License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).

