Multi-Modal LLMs and Multi-Camera Tracking across an Edge-Fog-Cloud Architecture: Toward Inclusive Smart Urban Mobility | Research Square
Research Article
Mourad Raif (Hassania School of Public Works, corresponding author), Abdessamad El Rharras (Hassania School of Public Works), Rachid Saadane (Hassania School of Public Works), Abdellah Chehri (Royal Military College of Canada)
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8952135/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review at Human-Centric Intelligent Systems
Version 1 posted 10
You are reading this latest preprint version
Abstract
The integration of Generative Artificial Intelligence (Gen-AI) with edge computing techniques opens a wide spectrum of opportunities in intelligent transportation systems (ITS), where emerging mobility infrastructures can significantly benefit vulnerable road users, specifically visually impaired people (VIPs).
In this paper, we propose a human-aware, three-tier Edge-Fog-Cloud architecture that enables mutual recognition between vehicles and visually impaired individuals through distributed multimodal perception, reasoning, and communication.
In contrast to conventional centralized frameworks, this framework integrates multi-camera tracking, multimodal large language models (MLLMs), and Generative AI into a coherent, latency-aware process. At the Edge, perception detectors are combined with privacy-preserving embeddings and lightweight bidirectional communication protocols between pedestrians and vehicles. The results are passed to the Fog level, which processes cross-camera data using graph-association and occlusion-aware transformer-based algorithms. The Cloud hosts the model hub, knowledge base, model governance, and updates aligned with the ISO/IEC 42001, IEEE 7000, and NIST AI RMF standards. The MLLMs provide grounded scene narration and instruction-based queries for inclusive interactions.
The insights gained and methods showcased in this research highlight the potential of our architecture to extend ITS capabilities from perception to understanding by combining generative recovery and re-identification in challenging environments, within a trustworthy AI governance framework.
Our contribution defines the reference framework and design principles for future Assistive Vehicular Intelligence.
Keywords: Assistive vehicular intelligence, Edge-Fog-Cloud architecture, Edge computing, Intelligent Transportation Systems, Multi-Modal Large Language Models, Multi-Camera Multi-Object Tracking (MC-MOT), Mutual recognition, Occlusion-Aware Tracking, Smart Cities, Vehicle-to-pedestrian (V2P) communication, Visually impaired pedestrians (VIPs)
Additional Declarations: No competing interests reported.
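The abstract names graph association at the Fog tier but does not give an algorithm. As an illustration only, the following minimal sketch shows one simple way tracklets from two cameras could be linked using Edge-produced appearance embeddings: a greedy one-to-one matching over a bipartite cosine-similarity graph. The `Tracklet` type, the `associate` function, the embedding format, and the 0.8 threshold are all hypothetical; this is a stand-in for the idea and not the authors' method, which additionally involves occlusion-aware transformers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tracklet:
    """Hypothetical per-camera track summary sent from the Edge to the Fog."""
    camera_id: str
    local_id: int
    embedding: tuple[float, ...]  # appearance embedding (assumed format)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def associate(cam_a, cam_b, threshold=0.8):
    """Greedily match tracklets across two cameras, best similarity first.

    Each tracklet is matched at most once; pairs whose similarity falls
    below `threshold` (an assumed value) are left unmatched.
    """
    edges = sorted(
        ((cosine(a.embedding, b.embedding), i, j)
         for i, a in enumerate(cam_a)
         for j, b in enumerate(cam_b)),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), []
    for sim, i, j in edges:
        if sim < threshold:
            break  # remaining edges are weaker still
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        matches.append((cam_a[i].local_id, cam_b[j].local_id))
    return matches


if __name__ == "__main__":
    cam_a = [Tracklet("A", 1, (1.0, 0.0)), Tracklet("A", 2, (0.0, 1.0))]
    cam_b = [Tracklet("B", 7, (0.1, 0.9)), Tracklet("B", 8, (0.9, 0.1)),
             Tracklet("B", 9, (-1.0, 0.0))]
    print(sorted(associate(cam_a, cam_b)))  # [(1, 8), (2, 7)]
```

Only embeddings and local track identifiers cross the tier boundary in this sketch, which is consistent with the privacy-preserving intent stated in the abstract. A real deployment would likely replace the greedy step with an optimal assignment and add spatio-temporal constraints.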
Editorial history (Version 1):
Editorial decision: Revision requested, 24 Mar, 2026
Reviews received at journal, 19 Mar, 2026
Reviewers agreed at journal, 17 Mar, 2026
Reviewers agreed at journal, 07 Mar, 2026
Reviews received at journal, 06 Mar, 2026
Reviewers agreed at journal, 04 Mar, 2026
Reviewers invited by journal, 04 Mar, 2026
Editor assigned by journal, 26 Feb, 2026
Submission checks completed at journal, 25 Feb, 2026
First submitted to journal, 23 Feb, 2026