A Multi-Scale Feature Fusion Hybrid Convolution Attention Model for Birdsong Recognition

Wei Li, Danju Lv, Yueyun Yu, Yan Zhang, Lianglian Gu, Ziqian Wang, and Zhicheng Zhu (Southwest Forestry University; corresponding author: Danju Lv)

Research Square preprint, version 1, posted October 4th, 2024. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4976065/v1
This work is licensed under a CC BY 4.0 License.

Abstract
Birdsong is a valuable indicator of biodiversity and carries considerable ecological significance. Although feature-extraction-based methods have performed well in birdsong classification, single-scale feature extraction may not fully capture the complexity of birdsong, which can lead to suboptimal classification. Integrating multi-scale feature extraction and fusion lets a model cope better with scale variation, making it more adaptable across different scales.
To address the limitations of single-scale feature extraction, we propose a Multi-Scale Hybrid Convolutional Attention Mechanism Model (MUSCA). The model combines depthwise separable convolution with standard convolution for feature extraction and incorporates self-attention and spatial attention mechanisms to refine spatial and channel features, improving the effectiveness of multi-scale feature extraction. To further strengthen multi-scale feature fusion, we developed a layer-by-layer aligned feature fusion method that establishes deeper correlations among the multi-scale features, improving classification accuracy and robustness. We studied the songs of 20 bird species, extracting wavelet spectrogram, log-Mel spectrogram and log-spectrogram features; the proposed method achieved classification accuracies of 93.79%, 96.97% and 95.44% on these respective features. These results indicate that the proposed birdsong recognition method outperforms recent and state-of-the-art methods.

Subject areas: Biological sciences/Computational biology and bioinformatics; Biological sciences/Ecology; Earth and environmental sciences/Ecology
Keywords: Birdsong recognition, hybrid convolution attention mechanism, multi-scale feature extraction, deep learning

Additional Declarations
No competing interests reported.
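The abstract names log-spectrogram and log-Mel spectrogram features but does not give the extraction parameters. As a rough illustration, the sketch below computes one column of each for a single audio frame in pure Python. The Hann window, the HTK-style Mel formula (2595·log10(1 + f/700)), the 40 Mel bands, the natural log and the `eps` floor are all assumptions, not settings reported for MUSCA, and the wavelet spectrogram is not covered. A real pipeline would slide this over overlapping frames and use an FFT library instead of the naive O(N²) DFT used here to stay self-contained.

```python
import cmath
import math


def hz_to_mel(f_hz):
    # Assumption: HTK-style Mel scale; the abstract does not say which variant was used.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """One-sided power spectrum of a real frame (Hann window + naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / n)) for i, x in enumerate(frame)]
    return [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def mel_filterbank(n_mels, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    f_max = sample_rate / 2.0 if f_max is None else f_max
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 edge points: filter j rises from edge j-1 to edge j, falls to edge j+1
    edges = [mel_to_hz(lo + (hi - lo) * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_mels + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if left <= f <= center:
                row.append((f - left) / (center - left))
            elif center < f <= right:
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def log_spectrogram_frame(frame, eps=1e-10):
    """Log-spectrogram column: natural log of the power in each linear-frequency bin."""
    return [math.log(p + eps) for p in power_spectrum(frame)]


def log_mel_frame(frame, sample_rate, n_mels=40, eps=1e-10):
    """Log-Mel column: power pooled by the Mel filterbank, then logged."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_mels, len(frame), sample_rate)
    return [math.log(sum(w * p for w, p in zip(row, spec)) + eps) for row in bank]
```

For a 1 kHz tone sampled at 16 kHz with a 256-sample frame, the bin spacing is 62.5 Hz, so the largest log-spectrogram value falls in bin 16. The log-Mel column has only `n_mels` entries: it trades linear frequency resolution for a perceptually warped, more compact representation.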
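MUSCA is described as mixing depthwise separable and standard convolution. One common reason for such a mix is parameter cost, which the helper functions below make concrete. `hybrid_params` is a hypothetical channel-split design (some output channels from each branch, concatenated), chosen only for illustration; the abstract does not specify how the paper combines the two convolution types.

```python
def standard_conv_params(k, c_in, c_out, bias=True):
    """k x k standard convolution: every output channel mixes all input channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k, c_in, c_out, bias=True):
    """k x k depthwise conv (one filter per input channel) + 1 x 1 pointwise conv."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def hybrid_params(k, c_in, c_out, split=0.5, bias=True):
    """HYPOTHETICAL hybrid layer, for illustration only: round(split * c_out) output
    channels come from a standard conv branch, the rest from a depthwise separable
    branch, and the two outputs are concatenated along the channel axis."""
    c_std = int(round(c_out * split))
    c_sep = c_out - c_std
    total = 0
    if c_std:
        total += standard_conv_params(k, c_in, c_std, bias)
    if c_sep:
        total += depthwise_separable_params(k, c_in, c_sep, bias)
    return total


if __name__ == "__main__":
    for name, fn in [("standard", standard_conv_params),
                     ("depthwise separable", depthwise_separable_params),
                     ("hybrid (50/50)", hybrid_params)]:
        print(f"{name}: {fn(3, 64, 128, bias=False):,} weights")
```

With k = 3, 64 input and 128 output channels and no bias, a standard layer has 73,728 weights and a depthwise separable one 8,768, a ratio of exactly 1/C_out + 1/k² ≈ 0.119. The hypothetical half-and-half hybrid sits in between at 41,536, keeping some full cross-channel filters while paying less than a fully standard layer.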
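The abstract does not define its spatial attention module. A widely used formulation is CBAM-style spatial attention: pool the feature map across channels by mean and by max, convolve the two pooled maps, apply a sigmoid, and rescale every channel by the resulting mask. The pure-Python sketch below assumes this formulation, a `[C][H][W]` nested-list feature map, and caller-supplied (untrained) kernel weights; it is not the paper's implementation, and the self-attention branch is not shown.

```python
import math


def spatial_attention(x, kernel, bias=0.0):
    """CBAM-style spatial attention on a feature map x of shape [C][H][W].

    Assumption: this formulation is illustrative; MUSCA's exact module is not
    described in the abstract. kernel has shape [2][k][k] (k odd) and weights the
    channel-mean and channel-max maps. Returns (x * mask, mask), mask is [H][W].
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(kernel[0])
    pad = k // 2
    # 1) squeeze the channel axis two ways: mean and max at every (i, j)
    avg = [[sum(x[ch][i][j] for ch in range(c)) / c for j in range(w)] for i in range(h)]
    mx = [[max(x[ch][i][j] for ch in range(c)) for j in range(w)] for i in range(h)]
    pooled = (avg, mx)
    # 2) k x k cross-correlation (as in deep-learning "conv"), zero padding, stride 1,
    #    over the 2-channel pooled map, followed by a sigmoid
    mask = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = bias
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < h and 0 <= jj < w:
                            s += kernel[p][di][dj] * pooled[p][ii][jj]
            mask[i][j] = 1.0 / (1.0 + math.exp(-s))
    # 3) reweight every channel with the same spatial mask
    out = [[[x[ch][i][j] * mask[i][j] for j in range(w)] for i in range(h)] for ch in range(c)]
    return out, mask
```

Because the same H×W mask multiplies every channel, this module can only emphasize regions of the time-frequency plane, not individual channels; channel reweighting requires a separate mechanism.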