Enhancing Infrared-Visible Image Fusion via Text-Guided Adaptive Feature Integration

Research Article

Jundong Zhang, Yanan Guo, Kangjian He, Dan Xu, SongHan Zheng, and WenCheng Mei

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6809147/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 04 Dec, 2025 in Multimedia Systems.
Version 1 posted 23 Jul, 2025.

Abstract

Image fusion techniques aim to integrate complementary information from multiple modalities, such as infrared and visible images, to generate enhanced images that preserve both texture details and salient targets. Traditional methods often overemphasize low-level visual features, neglecting high-level semantic information, which limits their performance in downstream applications.
This paper proposes a text-guided adaptive fusion network that incorporates language-based textual descriptions during feature extraction to capture semantic information effectively. An Adaptive Attention Fusion module dynamically integrates critical features from both modalities, while a simplified ResFormer module enhances the network’s ability to perceive local details and global structures. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in both subjective visual quality and objective metrics, achieving significant improvements in high-level vision tasks such as semantic segmentation and object detection (e.g., an 8% increase in mIoU for semantic segmentation on the MSRS dataset). Our findings underscore the potential of text-guided fusion networks in advancing image fusion technology. The code and datasets are available at https://github.com/VCMHE/TGAF.

Keywords: Image fusion, text-guided, adaptive, semantic information

Additional Declarations: No competing interests reported.

Editorial decision: Revision requested (29 Aug, 2025)
Reviews received at journal (05 Aug, 2025)
Reviews received at journal (01 Aug, 2025)
Reviewers agreed at journal (27 Jul, 2025)
Reviews received at journal (24 Jul, 2025)
Reviewers agreed at journal (24 Jul, 2025)
Reviews received at journal (22 Jul, 2025)
Reviewers agreed at journal (22 Jul, 2025)
Reviewers agreed at journal (21 Jul, 2025)
Reviewers invited by journal (21 Jul, 2025)
Editor assigned by journal (14 Jul, 2025)
Submission checks completed at journal (05 Jun, 2025)
First submitted to journal (03 Jun, 2025)
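The mIoU figure quoted in the abstract refers to mean intersection-over-union, the usual semantic segmentation score. As a minimal sketch of that metric (not the authors' evaluation code), the snippet below computes it for two flattened label maps. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both maps are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean intersection-over-union across classes.

    Assumption: a class that appears in neither map is skipped rather
    than counted as IoU 0 or 1. Evaluation toolkits differ on this.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    # Accumulate per-class pixel counts in a single pass.
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    if not ious:
        raise ValueError("no class is present in either label map")
    return sum(ious) / len(ious)


# Toy example: six pixels, three classes (hypothetical labels).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(f"{mean_iou(pred, target, num_classes=3):.3f}")  # 0.500
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, which average to 0.5.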
Author affiliations
Jundong Zhang, Yunnan University
Yanan Guo, Beijing Information Science & Technology University
Kangjian He, Yunnan University (corresponding author)
Dan Xu, Yunnan University
SongHan Zheng, Yunnan University
WenCheng Mei, Yunnan University

Version of record: https://doi.org/10.1007/s00530-025-02069-w (Multimedia Systems, published December 4th, 2025)
Preprint: https://doi.org/10.21203/rs.3.rs-6809147/v1 (version 1, posted July 23rd, 2025)
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Manuscript PDF (TGAFMS.pdf): https://assets-eu.researchsquare.com/files/rs-6809147/v1_covered_ae451593-c19c-4a36-917e-2cc255a8d336.pdf