A Survey on Evaluation of Embodied AI

This is a preprint and has not been peer reviewed. Data may be preliminary.

15 January 2026, Version 1 (latest version)

Authors: Liyu Hou (ORCID 0009-0007-2007-6714), Linyuan Gao, Yuan Wu, and Yi Chang

https://doi.org/10.22541/au.176851544.45077723/v1

Abstract

The rapid progress of Multimodal Large Language Models and Vision-Language-Action models has substantially advanced Embodied AI. Nevertheless, comprehensively evaluating these systems remains challenging, primarily because of the representational gaps between semantic understanding and physical grounding and the inherent limitations of specific modules in current agents. Existing evaluations reveal that agents show significant deficits not only in individual capabilities such as perception and planning but also in the dynamic system integration required for reliable real-world deployment.
To address these challenges, this review establishes a systematic evaluation framework structured around the complete Perception-Cognition-Planning-Action loop. First, regarding evaluation targets, we dissect four core capabilities ranging from spatial perception to action execution, and further analyze the system's trustworthiness across the dimensions of Safety, Robustness, and Generalization. Second, regarding evaluation platforms, we systematically summarize representative simulators, datasets, and benchmarks, highlighting the technological transition from rigid physics engines to scalable generative environments, to help researchers select appropriate testbeds. Third, regarding evaluation methodologies, we examine the critical shift from outcome-oriented metrics to multidimensional assessments that emphasize process quality and physical safety. Finally, we identify grand challenges and advocate for a closed-loop Evaluation-Diagnosis-Enhancement paradigm. This work aims to help bridge the gap between semantic understanding and physical grounding, providing a rigorous reference for standardizing the evaluation of General Embodied Intelligence.

Supplementary Material

File: a survey on evaluation of embodied ai.pdf (2.22 MB)

Version history

Version 1, 15 January 2026

Copyright

This work is licensed under a Non Exclusive No Reuse License.

Keywords

agent, benchmark, embodied ai, multimodal large language model, vision-language-action model

Citation

Liyu Hou, Linyuan Gao, Yuan Wu, et al. A Survey on Evaluation of Embodied AI. Authorea. 15 January 2026. DOI: https://doi.org/10.22541/au.176851544.45077723/v1