Deterministic Inference on Distributed AI Accelerators Interconnected by TSN
Research Article
Abdulmajeed Alhumaidi, Yosab Bebawy, Roman Obermaisser
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9246074/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review
Version 1 posted 21 Apr, 2026

Abstract
Deep neural networks (DNNs) are increasingly critical to embedded and cyber–physical systems that demand strict real-time guarantees. However, the computational intensity of modern DNNs often exceeds the capacity of individual resource-constrained nodes. While distributing inference across multiple nodes offers a scalable alternative, achieving deterministic end-to-end latency remains difficult due to heterogeneous hardware, complex layer dependencies, and non-deterministic communication delays.
This paper introduces an execution time (ET)-aware two-level scheduling framework designed to provide predictable, distributed DNN inference on FPGA-based accelerators interconnected via time-triggered networking.
The proposed approach bridges the gap between hardware-level execution and system-level coordination. At the lower level, we employ cycle-accurate instruction analysis to derive tight Worst-Case Execution Time (WCET) bounds for individual neural network layers.
At the higher level, a static scheduling algorithm maps tasks to heterogeneous processors and allocates communication slots, ensuring that all data dependencies and deterministic network constraints are met. The framework supports both layer-level and block-level scheduling, enabling flexible task decomposition to exploit inter-layer and intra-layer parallelism.
To validate the framework, we developed a cycle-accurate simulator of the Time Triggered-Versatile Tensor Accelerator (TT-VTA) using SystemC and implemented time-sensitive networking in OMNeT++. Evaluations using representative CNN models demonstrate that the proposed framework enables predictable, periodic inference with reduced end-to-end latency compared to single-node execution, consistently approaching the theoretical lower bound. Furthermore, block-level scheduling improves load balancing, increases processor utilization, and eliminates the anomalous scheduling behavior observed under coarse-grained partitioning.
These results indicate that ET-aware distributed scheduling provides the assured performance necessary for safety-critical real-time applications.
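The higher-level step described in the abstract (mapping tasks to heterogeneous processors and reserving communication slots under data dependencies) can be illustrated with a small sketch. This is not the authors' algorithm: it is a generic greedy earliest-finish-time list scheduler, and the task names, WCET values, transfer times, and the single shared time-triggered link are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    task: str
    proc: str
    start: int
    finish: int


def static_schedule(wcet, deps, comm, order):
    """Greedy earliest-finish-time mapping (illustrative, not the paper's method).

    wcet[task][proc]   : assumed WCET of task on proc, in abstract time units
    deps[task]         : list of predecessor tasks
    comm[(pred, task)] : transfer time if pred and task run on different procs
    order              : tasks in topological order
    """
    procs = sorted({p for costs in wcet.values() for p in costs})
    proc_free = {p: 0 for p in procs}
    link_free = 0  # assumption: one shared link, one transfer slot at a time
    placed, slots = {}, []
    for task in order:
        best = None
        for proc in sorted(wcet[task]):
            ready, trial_link, trial_slots = 0, link_free, []
            for pred in deps.get(task, []):
                src = placed[pred]
                arrival = src.finish
                if src.proc != proc:
                    # Reserve the next free slot on the link after pred finishes.
                    slot_start = max(src.finish, trial_link)
                    arrival = slot_start + comm[(pred, task)]
                    trial_link = arrival
                    trial_slots.append((pred, task, slot_start, arrival))
                ready = max(ready, arrival)
            start = max(ready, proc_free[proc])
            cand = Placement(task, proc, start, start + wcet[task][proc])
            if best is None or cand.finish < best[0].finish:
                best = (cand, trial_link, trial_slots)
        placement, link_free, new_slots = best
        placed[task] = placement
        proc_free[placement.proc] = placement.finish
        slots.extend(new_slots)
    return placed, slots


# Hypothetical four-task DAG on two heterogeneous accelerators A and B.
wcet = {
    "conv1": {"A": 4, "B": 6},
    "conv2a": {"A": 3, "B": 3},
    "conv2b": {"A": 3, "B": 3},
    "fc": {"A": 2, "B": 2},
}
deps = {"conv2a": ["conv1"], "conv2b": ["conv1"], "fc": ["conv2a", "conv2b"]}
comm = {edge: 1 for edge in [("conv1", "conv2a"), ("conv1", "conv2b"),
                             ("conv2a", "fc"), ("conv2b", "fc")]}
placed, slots = static_schedule(wcet, deps, comm, ["conv1", "conv2a", "conv2b", "fc"])
makespan = max(p.finish for p in placed.values())
```

With these assumed numbers the schedule finishes at time 10, compared with 12 when every task runs on accelerator A alone, because the two parallel branches are split across nodes and the two transfers occupy non-overlapping link slots.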
Keywords: Real-time systems; Worst-case execution time (WCET); Deterministic scheduling; Distributed deep neural network inference; Heterogeneous multiprocessor scheduling; Directed acyclic graph (DAG) scheduling; Time-Sensitive Networking (TSN); FPGA-based acceleration

Additional Declarations
No competing interests reported.

Editorial timeline
Reviews received at journal: 13 May, 2026
Reviews received at journal: 11 May, 2026
Reviews received at journal: 10 May, 2026
Reviewers agreed at journal: 22 Apr, 2026
Reviewers agreed at journal: 19 Apr, 2026
Reviewers agreed at journal: 17 Apr, 2026
Reviewers invited by journal: 15 Apr, 2026
Editor assigned by journal: 07 Apr, 2026
Submission checks completed at journal: 30 Mar, 2026
First submitted to journal: 27 Mar, 2026
Author affiliation: University of Siegen (all authors). Corresponding author: Abdulmajeed Alhumaidi.
Submitted to: Real-Time Systems (http://link.springer.com/journal/11241)