Maritime Ship Target Detection Based on Visible and Infrared Modal Image Fusion

Runbang Liu, Zhiyu Zhu, Huilin Ge, Jing Wang, Yongdong Shu, Qingshan Ji

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9091613/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review
Version 1 (latest preprint version)

Abstract

Deep-learning-based maritime ship target detection is a key technology in fields such as ship navigation, water surface security, and military early warning. To address the inherent limitations of single-modality maritime vessel detection, a novel YOLO model based on the fusion of visible and infrared modality images (VIMF-YOLO) is built. VIMF-YOLO is an improved version of YOLO v8 that can effectively extract and aggregate the features of ship target images from different modalities.
Additionally, it employs a dual-modal fusion module (DMFM) to adaptively weight and fuse the features of visible and infrared vessel images, thereby fully leveraging the complementary strengths of the two modalities. To better capture the channel and positional information of the different modal features, efficient multi-scale attention (EMA) is introduced into the DMFM and the VIMF-YOLO network to improve the representational ability of those features. In addition, a paired dataset of visible and infrared maritime ship images is built, and a large number of detection experiments for VIMF-YOLO are conducted on it. The experimental results show that, compared with current SOTA ship target detection algorithms, the dual-modal fusion detection algorithm VIMF-YOLO achieves superior detection accuracy.

Subject areas: Physical sciences/Engineering; Physical sciences/Mathematics and computing
Keywords: Ship Detection, Multimodal Fusion, Deep Learning, Attention Mechanism

Additional Declarations
No competing interests reported.

Version 1 editorial history
Reviews received at journal: 19 May, 2026
Reviewers agreed at journal: 06 May, 2026
Reviewers invited by journal: 18 Mar, 2026
Editor invited by journal: 16 Mar, 2026
Editor assigned by journal: 12 Mar, 2026
Submission checks completed at journal: 12 Mar, 2026
First submitted to journal: 11 Mar, 2026
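The abstract says the DMFM adaptively weights and fuses visible and infrared features, but this page does not give the module's internals. The following is therefore only a generic illustration of adaptive weighted fusion, not the authors' DMFM: it assumes per-channel weights obtained by a softmax over globally averaged channel responses, with no learned parameters. The function names (`channel_mean`, `modality_weights`, `adaptive_fuse`) and the toy feature maps are hypothetical.

```python
import math

def channel_mean(channel):
    """Global average pooling over one H x W channel (a list of rows)."""
    values = [v for row in channel for v in row]
    return sum(values) / len(values)

def modality_weights(vis, ir):
    """Return one (w_vis, w_ir) pair per channel; each pair sums to 1.

    Assumption: weights come from a softmax over the two channel means.
    A trained fusion module would normally learn this mapping instead.
    """
    weights = []
    for vis_ch, ir_ch in zip(vis, ir):
        a, b = channel_mean(vis_ch), channel_mean(ir_ch)
        m = max(a, b)  # subtract the max for a numerically stable softmax
        ea, eb = math.exp(a - m), math.exp(b - m)
        weights.append((ea / (ea + eb), eb / (ea + eb)))
    return weights

def adaptive_fuse(vis, ir):
    """Fuse two C x H x W feature maps as a per-channel convex combination."""
    if len(vis) != len(ir):
        raise ValueError("feature maps must have the same number of channels")
    fused = []
    for (w_vis, w_ir), vis_ch, ir_ch in zip(modality_weights(vis, ir), vis, ir):
        fused.append([
            [w_vis * v + w_ir * r for v, r in zip(vis_row, ir_row)]
            for vis_row, ir_row in zip(vis_ch, ir_ch)
        ])
    return fused

# Toy 2-channel, 2 x 2 feature maps (made-up numbers, not data from the paper)
vis = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
ir = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
fused = adaptive_fuse(vis, ir)
print(fused)
```

In this toy input the first channel is identical in both modalities, so it is fused with equal weights, while the second channel leans toward the infrared map because its average response is stronger. That is the sense of "adaptive" used in this sketch; how the paper's module computes its weights may differ.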
Author affiliations
Runbang Liu (corresponding author), Zhiyu Zhu, Huilin Ge: Jiangsu University of Science and Technology
Jing Wang: University of PLA
Yongdong Shu, Qingshan Ji: Nanjing High Accurate Marine Equipment Co, Ltd

Posted: March 20th, 2026
Submitted to: Scientific Reports
© Research Square 2026 | ISSN 2693-5015 (online)