Real-Time Tiny Object Detection in UAV Aerial Images with Multi-Scale Attention Fusion

Research Article

Junming Gao, Yanshan Zhang, Yuanzhang Fan, Bao Tian

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7384955/v1
This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 27 Mar, 2026; the published version appears in Journal of Real-Time Image Processing.

Abstract

With the rapid advancement of unmanned aerial vehicle (UAV) technology, the demand for efficient and accurate object detection algorithms has become increasingly urgent. However, UAV aerial images present numerous challenges, including irregular target shapes, frequent occlusion, and stringent real-time requirements. These factors limit the performance of existing detection algorithms in practical applications.
To address these issues, this paper proposes MCFA-Net, a multi-scale contextual feature aggregation network that integrates Transformer and Convolutional Neural Network (CNN) techniques. Specifically, YOLOv8 serves as the backbone, enhanced with an Attention-based Intrascale Feature Interaction (AIFI) module that leverages self-attention mechanisms to improve small object recognition across different scales. In the neck, a lightweight multi-resolution feature pyramid network (MRFPN) is designed to strengthen multi-scale feature fusion, while the Dynamic Detection Head (DyHead) incorporates adaptive attention to enhance robustness in dense and small-object scenarios. Comprehensive experiments conducted on the VisDrone2019 dataset, including ablation studies, comparative analyses, and interpretability evaluations, demonstrate the effectiveness of the proposed method. MCFA-Net achieves notable improvements, raising mAP@0.5 and mAP@0.5:0.95 by 21% and 23.4%, respectively, while also increasing the inference speed from 105 FPS to 118 FPS. Furthermore, validation on the AI-TOD dataset confirms the robustness and generalization capability of the model.

Keywords: tiny object detection; UAV imagery; multi-scale feature fusion; MCFA-Net

Additional Declarations: No competing interests reported.
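As a rough aid to reading the figures quoted in the abstract, the sketch below converts the two reported throughput values (105 FPS and 118 FPS) into average per-frame latency and lists the IoU thresholds that the COCO-style mAP@0.5:0.95 metric averages over. The helper names (`latency_ms`, `relative_change`, `iou_thresholds`) are hypothetical and chosen for illustration; nothing here reproduces the authors' evaluation code, and it assumes the common convention of a 0.05 step between thresholds.

```python
# Illustrative arithmetic only. The two FPS values come from the abstract;
# every function name below is a hypothetical helper, not the authors' code.


def latency_ms(fps: float) -> float:
    """Average per-frame latency in milliseconds for a given throughput."""
    return 1000.0 / fps


def relative_change(old: float, new: float) -> float:
    """Change from old to new, expressed as a fraction of old."""
    return (new - old) / old


def iou_thresholds(start: float = 0.50, stop: float = 0.95, step: float = 0.05) -> list:
    """IoU thresholds averaged over in a COCO-style mAP@0.5:0.95 (assumed 0.05 step)."""
    count = round((stop - start) / step) + 1
    return [round(start + i * step, 2) for i in range(count)]


if __name__ == "__main__":
    fps_before, fps_after = 105.0, 118.0  # values quoted in the abstract
    print(f"latency before:  {latency_ms(fps_before):.2f} ms")
    print(f"latency after:   {latency_ms(fps_after):.2f} ms")
    print(f"throughput gain: {relative_change(fps_before, fps_after):.1%}")
    print(f"IoU thresholds:  {iou_thresholds()}")
```

Under these assumptions the reported speed-up corresponds to roughly 9.5 ms versus 8.5 ms per frame, a throughput gain of about 12%. The abstract does not say whether the 21% and 23.4% mAP gains are relative or absolute, so the sketch deliberately leaves them alone.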
Version 1 editorial history at journal (most recent first):
Editorial decision: Revision requested, 08 Jan, 2026
Reviews received at journal, 07 Jan, 2026
Reviewers agreed at journal, 19 Dec, 2025
Reviews received at journal, 14 Oct, 2025
Reviewers agreed at journal, 21 Sep, 2025
Reviewers invited by journal, 02 Sep, 2025
Editor assigned by journal, 01 Sep, 2025
Submission checks completed at journal, 01 Sep, 2025
First submitted to journal, 16 Aug, 2025
Author affiliations: Junming Gao, Yanshan Zhang (corresponding author), Yuanzhang Fan, and Bao Tian are all affiliated with Zhengzhou University of Aeronautics.
Preprint version 1 posted: September 8th, 2025.
Version of record: Journal of Real-Time Image Processing, published March 27th, 2026, https://doi.org/10.1007/s11554-026-01873-5
License: CC BY 4.0, https://creativecommons.org/licenses/by/4.0/
© Research Square 2026 | ISSN 2693-5015 (online)