An Explainable AI Framework Integrating Machine and Deep Learning Models for Multi-Species DNA Functional Group Classification

preprint OA: closed CC-BY-4.0

Pratik Chakraborty, Shanthi P. B. (Manipal Academy of Higher Education; corresponding author: Shanthi P. B.)

Posted on Research Square. This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-7979065/v1
License: CC BY 4.0
Status: Under Review · Version 1, posted 24 Nov, 2025

Abstract

DNA functional group classification across species plays a crucial role in understanding genetic diversity and biological function. The increasing availability of genomic data has led to the use of machine learning and deep learning methods for identifying functional patterns within DNA sequences. However, the limited interpretability of these models makes it difficult to validate their biological relevance.

This study presents an explainable AI framework that integrates machine learning and deep learning models for multi-species DNA functional group classification. DNA functional groups are classified on Human, Chimpanzee, and Dog datasets, as well as on a custom combined dataset integrating the three species. Before training, the DNA sequences were transformed into k-mers to capture local compositional patterns. After extensive hyperparameter tuning, the Multinomial Naive Bayes model achieved the highest accuracy on all datasets, outperforming the other models in the study as well as previously reported results on the same datasets. While the deep learning architectures captured longer motif dependencies, the classical models generalized better across species. Explainable AI techniques, including Feature Importance, Saliency maps, Integrated Gradients, GradientSHAP, and Attention heatmaps, were applied to identify consensus motifs that align with known genomic and regulatory regions such as CpG-rich promoters and transmembrane domain signatures. The results demonstrate that an explainable framework can enhance biological insight and reliability in multi-species genomic analysis.

Subject areas: Biological sciences/Computational biology and bioinformatics; Biological sciences/Genetics
Keywords: Attention heatmaps, DNA functional group classification, Deep learning, Explainable AI Framework, k-mers, Machine learning

Additional Declarations: No competing interests reported.
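
The k-mer step and the Multinomial Naive Bayes classifier described in the abstract can be illustrated with a minimal, dependency-free sketch. It assumes overlapping k-mers taken with a step of 1, an illustrative default of k = 6, and additive (Laplace) smoothing with alpha = 1; none of these settings are given in the abstract, and the names `kmers`, `KmerMultinomialNB`, and `top_kmers` are hypothetical. `top_kmers` ranks k-mers by the log-likelihood ratio between two classes, which is only one simple proxy for the Feature Importance analysis the abstract mentions, not necessarily the authors' method. The toy sequences and labels are invented.

```python
from collections import Counter
import math


def kmers(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (sliding window, step 1)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


class KmerMultinomialNB:
    """Multinomial Naive Bayes over k-mer counts with additive (Laplace) smoothing.

    k=6 and alpha=1.0 are illustrative defaults (assumptions), not values from the paper.
    """

    def __init__(self, k: int = 6, alpha: float = 1.0):
        self.k = k
        self.alpha = alpha

    def fit(self, seqs, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        counts = {c: Counter() for c in self.classes}
        for seq, y in zip(seqs, labels):
            toks = kmers(seq, self.k)
            counts[y].update(toks)
            self.vocab.update(toks)
        class_freq = Counter(labels)
        n_docs, v = len(labels), len(self.vocab)
        # log P(class), estimated from class frequencies
        self.log_prior = {c: math.log(class_freq[c] / n_docs) for c in self.classes}
        # log P(k-mer | class) = log((count + alpha) / (total + alpha * |V|))
        self.log_lik = {}
        for c in self.classes:
            denom = sum(counts[c].values()) + self.alpha * v
            self.log_lik[c] = {w: math.log((counts[c][w] + self.alpha) / denom) for w in self.vocab}
        return self

    def predict(self, seq):
        # k-mers never seen during training are ignored
        toks = Counter(t for t in kmers(seq, self.k) if t in self.vocab)
        scores = {
            c: self.log_prior[c] + sum(n * self.log_lik[c][w] for w, n in toks.items())
            for c in self.classes
        }
        return max(scores, key=scores.get)

    def top_kmers(self, cls, other, n=3):
        """Rank k-mers by log P(w|cls) - log P(w|other), a simple feature-importance proxy."""
        ratio = {w: self.log_lik[cls][w] - self.log_lik[other][w] for w in self.vocab}
        return sorted(ratio, key=ratio.get, reverse=True)[:n]


if __name__ == "__main__":
    # Toy, invented sequences; k=3 keeps the example small.
    seqs = ["ACGCGCGA", "CGCGCGTA", "ATATTTAA", "TTATAATA"]
    labels = ["gc", "gc", "at", "at"]
    model = KmerMultinomialNB(k=3).fit(seqs, labels)
    print(model.predict("GCGCGC"), model.predict("TATATA"))  # gc at
    print(model.top_kmers("gc", "at", n=2))
```

With this toy data, CGC and GCG come out as the strongest markers of the GC-rich class. A production pipeline would more likely feed scikit-learn's `CountVectorizer` (with an analyzer that emits k-mers) into `MultinomialNB`, which implements the same smoothed multinomial model.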
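
Integrated Gradients, one of the attribution methods listed in the abstract, attributes a prediction to input features by averaging the model's gradient along a straight path from a baseline input to the actual input and scaling by the input difference. The sketch below applies it to one-hot-encoded DNA, assuming a zero (all channels off) baseline and a toy logistic-regression model with hypothetical random weights in place of the paper's deep networks, whose architectures and baselines the abstract does not specify. The path integral is approximated with a midpoint Riemann sum, and all helper names are hypothetical.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Flatten a DNA sequence into a 4*L one-hot vector (channels A, C, G, T per position)."""
    vec = []
    for nt in seq.upper():
        vec.extend(1.0 if nt == b else 0.0 for b in BASES)
    return vec


def model(x, w, b):
    """Toy stand-in classifier (assumption): logistic regression, sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def grad(x, w, b):
    """Analytic gradient of the sigmoid output with respect to each input feature."""
    s = model(x, w, b)
    return [s * (1.0 - s) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Integrated Gradients along the straight path baseline -> x, midpoint Riemann sum."""
    total = [0.0] * len(x)
    for j in range(steps):
        alpha = (j + 0.5) / steps
        point = [bl + alpha * (xi - bl) for xi, bl in zip(x, baseline)]
        for i, g in enumerate(grad(point, w, b)):
            total[i] += g
    # IG_i = (x_i - x'_i) * average gradient along the path
    return [(xi - bl) * t / steps for xi, bl, t in zip(x, baseline, total)]


def per_position(attr):
    """Sum the four channel attributions at each sequence position."""
    return [sum(attr[i:i + 4]) for i in range(0, len(attr), 4)]


if __name__ == "__main__":
    seq = "ACGTCG"
    random.seed(0)
    # Hypothetical random weights; a trained network would supply its own gradients.
    w = [random.uniform(-1.0, 1.0) for _ in range(4 * len(seq))]
    x, baseline = one_hot(seq), [0.0] * (4 * len(seq))
    attr = integrated_gradients(x, baseline, w, 0.0)
    # Completeness check: attributions should sum to f(x) - f(baseline).
    print(round(sum(attr), 4), round(model(x, w, 0.0) - model(baseline, w, 0.0), 4))
    print([round(a, 3) for a in per_position(attr)])
```

The first printed pair checks the completeness property of Integrated Gradients (attributions sum to f(x) minus f(baseline)); the per-position scores are the kind of values an attribution heatmap over a sequence would display. For PyTorch models, the Captum library provides `IntegratedGradients` and `GradientShap` implementations.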

Editorial timeline (submitted to Scientific Reports)
01 Dec, 2025: Editorial decision (Revision requested)
20 Nov, 2025: Reviews received at journal
20 Nov, 2025: Reviewers agreed at journal
16 Nov, 2025: Reviews received at journal
12 Nov, 2025: Reviewers agreed at journal
12 Nov, 2025: Reviewers invited by journal
11 Nov, 2025: Editor assigned by journal
07 Nov, 2025: Editor invited by journal
31 Oct, 2025: Submission checks completed at journal
31 Oct, 2025: First submitted to journal
