Build on Priors: Vision-Language-Guided Neuro-Symbolic Imitation Learning for Data-Efficient Real-World Robot Manipulation

Research Article (Research Square preprint)

Pierrick Lorang¹*, Johannes Huemer¹, Timothy Duggan², Kai Goebel¹, Patrik Zips¹, and Matthias Scheutz²
¹ Austrian Institute of Technology GmbH (AIT); ² Tufts University; * corresponding author

This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-9254688/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review (Autonomous Robots). Version 1 posted April 7th, 2026.

Abstract
Enabling robots to learn long-horizon manipulation tasks from a handful of demonstrations remains a central challenge in robotics. Existing neuro-symbolic approaches often rely on hand-crafted symbolic abstractions, semantically labeled trajectories, or large demonstration datasets, which limits their scalability and real-world applicability.
We present a scalable neuro-symbolic framework that autonomously constructs symbolic planning domains and data-efficient control policies from only one to thirty unannotated skill demonstrations, without manual domain engineering. Our method segments demonstrations into skills and uses a Vision-Language Model (VLM) to classify the skills and identify equivalent high-level states, enabling automatic construction of a state-transition graph. An Answer Set Programming (ASP) solver processes this graph to synthesize a PDDL planning domain, which an oracle function then uses to isolate the minimal, task-relevant, target-relative observation and action spaces for each skill policy. Policies are learned at the control-reference level rather than at the raw actuator-signal level, yielding a smoother, less noisy learning target. Known controllers can be leveraged for real-world data augmentation by projecting a single demonstration onto other objects in the scene, enriching both the graph construction and the imitation-learning dataset. We validate our framework primarily on a real industrial forklift in statistically rigorous manipulation trials, and demonstrate cross-platform generality on a Kinova Gen3 robotic arm on two standard benchmarks. Our results show that integrating control learning, VLM-driven abstraction, and automated planning synthesis into a unified pipeline is a practical path toward scalable, data-efficient, expert-free, and interpretable neuro-symbolic robotics.

Keywords: Neuro-symbolic, Imitation Learning, Task and Motion Planning, Symbolic Planning, Skill Learning, Human-Robot Interaction, Real-World Robotics

Additional Declarations: No competing interests reported.
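The graph-construction step summarized in the abstract (skill segments plus judgements about which observed states are the same high-level state) can be illustrated with a small sketch. This is a hypothetical example under stated assumptions, not the authors' implementation: states are opaque string IDs, skill labels are assumed to be already classified, the `same_high_level_state` callback is a stand-in for the VLM query, and `build_transition_graph`, `plan_skills`, and `same_suffix` are illustrative names. Equivalent states are merged with a union-find structure, and a plain breadth-first search stands in for the PDDL planner; the ASP-based domain synthesis and the learned policies are not modeled.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Set, Tuple

State = str  # assumption: each observed state is identified by an opaque string ID
Segment = Tuple[State, str, State]  # (state before skill, skill label, state after skill)


class UnionFind:
    """Disjoint-set structure used to merge states judged equivalent."""

    def __init__(self) -> None:
        self.parent: Dict[State, State] = {}

    def find(self, x: State) -> State:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: State, b: State) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def build_transition_graph(
    demos: List[List[Segment]],
    same_high_level_state: Callable[[State, State], bool],  # hypothetical VLM stand-in
) -> Tuple[Dict[State, Set[Tuple[str, State]]], UnionFind]:
    """Merge equivalent states, then collect skill-labelled edges between merged states."""
    uf = UnionFind()
    states = sorted({s for demo in demos for pre, _, post in demo for s in (pre, post)})
    # Pairwise equivalence queries (O(n^2)); a real system would likely batch these.
    for i, a in enumerate(states):
        uf.find(a)
        for b in states[i + 1:]:
            if same_high_level_state(a, b):
                uf.union(a, b)
    graph: Dict[State, Set[Tuple[str, State]]] = {}
    for demo in demos:
        for pre, skill, post in demo:
            graph.setdefault(uf.find(pre), set()).add((skill, uf.find(post)))
    return graph, uf


def plan_skills(graph: Dict[State, Set[Tuple[str, State]]], start: State, goal: State) -> Optional[List[str]]:
    """Breadth-first search for the shortest skill sequence from start to goal."""
    frontier = deque([start])
    came_from: Dict[State, Optional[Tuple[State, str]]] = {start: None}
    while frontier:
        node = frontier.popleft()
        if node == goal:
            plan: List[str] = []
            while came_from[node] is not None:
                prev, skill = came_from[node]
                plan.append(skill)
                node = prev
            return plan[::-1]
        for skill, nxt in sorted(graph.get(node, ())):
            if nxt not in came_from:
                came_from[nxt] = (node, skill)
                frontier.append(nxt)
    return None  # goal not reachable with the demonstrated skills


# Hypothetical toy data: states "dX_sK" with the same K count as one high-level state.
demos = [
    [("d1_s0", "approach", "d1_s1"), ("d1_s1", "lift", "d1_s2")],
    [("d2_s0", "approach", "d2_s1"), ("d2_s1", "lift", "d2_s2"), ("d2_s2", "place", "d2_s3")],
]


def same_suffix(a: State, b: State) -> bool:
    return a.split("_")[1] == b.split("_")[1]


graph, uf = build_transition_graph(demos, same_suffix)
print(plan_skills(graph, uf.find("d1_s0"), uf.find("d2_s3")))  # ['approach', 'lift', 'place']
```

Merging equivalent states is what lets skills observed in different demonstrations be chained: the first toy demonstration never shows `place`, yet after merging, a plan from its start state reaches the goal seen only in the second demonstration.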
Supplementary Files
NeuroSymbolic1DemoUnloadTruckspeedx4.mp4
NeuroSymbolic30DemosStatisticalAnalysisForklift.m4v

Editorial Timeline
Reviewers invited by journal: 27 Apr, 2026
Editor assigned by journal: 31 Mar, 2026
Submission checks completed at journal: 31 Mar, 2026
First submitted to journal: 28 Mar, 2026