CrossLoc: Attention-Enhanced Cross-Modal Place Recognition

Authorea, 17 March 2026, V1 (latest version)

This is a preprint and has not been peer reviewed. Data may be preliminary.

Authors: Andre Williams (ORCID 0009-0000-9889-9808), Shanice Brown, and Kevin Thompson

https://doi.org/10.22541/au.177376150.00142207/v1

105 views, 53 downloads

Abstract

Cross-modal place recognition, specifically Image-to-PointCloud (I2P) localization, is fundamental for robust self-localization and navigation in various autonomous systems. However, it faces significant challenges including the inherent semantic gap between modalities, severe environmental variations, viewpoint differences, and stringent real-time computational demands. This paper introduces CrossLoc, a novel attention-enhanced framework meticulously designed for efficient and robust I2P place recognition.
Our method begins with comprehensive data preprocessing, including FoV alignment and the generation of high-quality dense depth maps from sparse LiDAR point clouds. A dual-stream feature encoder, built on lightweight, partially weight-shared EfficientNet B0 variants, extracts local features from both RGB images and dense depth maps. A core contribution is our Transformer-based Cross-Modal Attention Fusion Module, which dynamically learns to integrate visual and geometric information by enabling RGB features to query geometric context, thereby generating highly discriminative fused representations. These fused features are then aggregated into compact global descriptors using an Adaptive Generalized Mean (GeM) Pooling layer. Trained end-to-end using a Triplet Loss on the KITTI dataset and validated on the diverse HAOMO dataset, CrossLoc achieves leading performance and remarkable runtime efficiency, significantly outperforming prior art. Ablation studies confirm the critical contributions of our attention fusion and adaptive pooling mechanisms, while detailed analyses highlight superior feature discriminability and robustness to challenging environmental conditions. CrossLoc's blend of high accuracy, robustness, and real-time capability positions it as a practical and impactful solution for real-world autonomous applications.
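
As a rough illustration of the last three stages named in the abstract (RGB features querying geometric features, GeM pooling, and a triplet loss), the following pure-Python sketch may help. It is not the authors' implementation: the function names, the single attention head without learned projections, the residual addition, the fixed exponent `p` (an adaptive variant would presumably learn it), the L2 normalization step, and the margin of 0.3 are all assumptions made for illustration.

```python
import math


def cross_attention(rgb_tokens, depth_tokens):
    """Single-head scaled dot-product attention where RGB tokens query depth tokens.

    Assumption: learned query/key/value projections are omitted for brevity.
    """
    d = len(rgb_tokens[0])
    fused = []
    for q in rgb_tokens:
        # Similarity of this RGB query to every depth key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in depth_tokens]
        # Numerically stable softmax over the depth tokens.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Attention-weighted sum of depth values (the geometric context).
        context = [sum(w * v[i] for w, v in zip(weights, depth_tokens)) for i in range(d)]
        # Residual connection: keep the visual feature and add the context.
        fused.append([qi + ci for qi, ci in zip(q, context)])
    return fused


def gem_pool(tokens, p=3.0, eps=1e-6):
    """Generalized mean per channel: (mean(x ** p)) ** (1 / p).

    Assumption: p is a plain argument here rather than a learned parameter.
    Values are clamped to eps so that fractional powers stay well defined.
    """
    n = len(tokens)
    d = len(tokens[0])
    return [
        (sum(max(t[i], eps) ** p for t in tokens) / n) ** (1.0 / p)
        for i in range(d)
    ]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def describe(rgb_tokens, depth_tokens, p=3.0):
    """Fuse the two modalities, then pool into one unit-length global descriptor."""
    return l2_normalize(gem_pool(cross_attention(rgb_tokens, depth_tokens), p=p))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on Euclidean distances: pull the positive closer than the negative.

    Assumption: the margin value is hypothetical, not taken from the paper.
    """
    d_ap = math.dist(anchor, positive)
    d_an = math.dist(anchor, negative)
    return max(0.0, d_ap - d_an + margin)
```

With `p = 1` the pooling reduces to average pooling, and as `p` grows it approaches max pooling, which is one reason a tunable exponent is a common choice when building global descriptors for retrieval.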
Supplementary Material

File: crossloc.pdf (1.55 MB)

Information & Authors

Version history: Version 1 (V1), 17 March 2026

Copyright: This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Keywords: attention mechanism autonomous systems cross-modal feature fusion i2p localization

Authors and affiliations:
Andre Williams (ORCID 0009-0000-9889-9808), Northern Caribbean University
Shanice Brown, Northern Caribbean University
Kevin Thompson, Northern Caribbean University

Citation

Andre Williams, Shanice Brown, Kevin Thompson. CrossLoc: Attention-Enhanced Cross-Modal Place Recognition. Authorea. 17 March 2026. DOI: https://doi.org/10.22541/au.177376150.00142207/v1