DAMRN: Deep attention mechanism residual network for deepfake audio detection using cepstral coefficient features

doi:10.21203/rs.3.rs-4446190/v1

2024 · doi:10.21203/rs.3.rs-4446190/v1

preprint OA: closed


Research Article

Feng Zhou, Haitao Yang (corresponding author), Xiai Yan, Yingzhuo Xiong
Hunan Police Academy

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4446190/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1, posted June 3rd, 2024.

Abstract

To further curb the misuse of Deepfake audio technology, we propose a deep attention mechanism residual network (DAMRN) that can effectively detect forged audio. The network operates stably, has a low risk of vanishing and exploding gradients, and achieves high detection accuracy.

The proposed network involves three main elements. First, a data balancing strategy is adopted at the front end of the network so that the ratio of positive to negative samples stays balanced, which improves network performance and reduces overfitting; the experiments in this article confirm the effectiveness of this strategy. Second, we compare the accuracy of network models of different depths for Deepfake audio detection (DFAD) and select the depth best suited to this work. Finally, we introduce an attention mechanism into the network structure to increase the network's sensitivity to artifact information in forged speech. By capturing the artifact information of Deepfake audio, the model learns more frequency features of falsification that effectively distinguish spoofed from bonafide audio; compared with the baseline system, accuracy improves to 99.81% and the EER falls to 0.69%.

Experiments are conducted using three acoustic features (MFCC, LFCC, GFCC) extracted from two mainstream datasets (ASVspoof2019LA, ASVspoof2021DF), and the results show that the best EER of the proposed method is 0.32%, which outperforms other mainstream models.

Keywords: Deepfake audio, Acoustics physical, Computer modeling, Multimedia, Data balancing

Additional Declarations: No competing interests reported.
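The abstract reports its results as an equal error rate (EER), the operating point at which the false acceptance rate (spoofed audio accepted as bonafide) equals the false rejection rate (bonafide audio rejected). The following is a minimal sketch of how an EER can be estimated from detector scores. It is not the authors' evaluation code: the function name `compute_eer`, the convention that a higher score means "more likely bonafide", and the toy scores are all assumptions made for illustration.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate from two lists of detector scores.

    Assumption (not from the paper): a higher score means the detector
    considers the utterance more likely to be bonafide.
    Returns (eer, threshold) at the threshold where the false acceptance
    rate and false rejection rate are closest.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))

    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    for t in thresholds:
        # False rejection: a bonafide utterance scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: a spoofed utterance scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)

    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical toy scores, chosen only to exercise the function.
    bonafide = [0.9, 0.8, 0.4, 0.6]
    spoof = [0.1, 0.5, 0.3, 0.2]
    eer, threshold = compute_eer(bonafide, spoof)
    print(f"EER = {eer:.2%} at threshold {threshold}")  # EER = 25.00% at threshold 0.5
```

With a finite score set the two error rates rarely cross exactly, so this sketch averages them at the closest threshold; evaluation toolkits often interpolate along the error curve instead, which can give slightly different values.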


Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00