Morphological Curiosity: Self-Directed Body-Policy Co-Evolution in Modular Robots via Intrinsic Motivation

doi:10.21203/rs.3.rs-9570273/v1

2026 · doi:10.21203/rs.3.rs-9570273/v1

preprint OA: closed

Research Article

Eren Bajracharya

This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-9570273/v1

This work is licensed under a CC BY 4.0 License. Status: Posted (Version 1).

Abstract

We introduce Morphological Curiosity, a framework in which modular robots autonomously co-evolve their physical bodies and control policies without any externally provided reward signal. Existing robot learning methods overwhelmingly fix the hardware and optimize only behavior; co-evolutionary alternatives require hand designed fitness functions tied to specific tasks. We depart from both traditions by framing body-policy search as an unsupervised exploration problem, where the only driving signal is the robot's own predictive uncertainty about its world.
Robot morphology is encoded as a differentiable computational graph processed by a Graph Neural Network, permitting gradient-informed addition and removal of limbs during training. A morphology-conditioned forward model, implemented as a probabilistic ensemble, generates a curiosity signal that is explicitly sensitive to structural change, not merely behavioral novelty. We further decompose this signal into a morphological novelty term, which rewards discovering structurally distinct body configurations, and a behavioral novelty term, which rewards surprising transitions within a fixed body. Training proceeds in simulation with morphology-aware domain randomization; learned policies are transferred to physical hardware without fine-tuning. Our experiments support three claims: curiosity-driven co-evolution produces more transferable morphologies than reward-driven search; self-selected bodies outperform fixed-body agents on unseen tasks under zero-shot transfer; and the intrinsic objective causes ecologically coherent structural regularities (limb symmetry, distal joint richness, and energetic efficiency) to emerge without being explicitly encoded as objectives.

Keywords: modular robotics, morphological co-evolution, intrinsic motivation, curiosity-driven exploration, graph neural networks, novelty search, sim-to-real transfer, embodied intelligence, open-ended learning, zero-shot generalization

Figures: Figure 1, Figure 2, Figure 3, Figure 4, Figure 5

Introduction

A recurring observation in biological systems is that cognitive capability cannot be cleanly separated from the body through which it is exercised [49, 50]. The dexterity available to a cephalopod, the gait efficiency of a quadruped, and the manipulative range of a primate hand are each co-determined by neural control and physical form.
This interdependence is strikingly illustrated in recent work on affordable prosthetic limbs [13], where the arrangement of actuated joints directly constrains which grasps are achievable: no controller, however sophisticated, can recover a capability the morphology does not afford. Despite this, the prevailing approach in robot learning treats the body as a constant input rather than a design variable. Hardware is fixed at the outset, and the entire optimization budget is spent on policy search within the space that morphology permits.

The evolutionary robotics literature has grappled with this limitation for decades, producing methods that jointly search over body plans and controllers [1, 2, 3]. The fundamental difficulty with these approaches is that they require a fitness function defined over a specific task [5, 6]. A morphology shaped by locomotion on flat ground transfers poorly to rough terrain, and the search must be restarted when tasks change. The problem is not merely one of sample efficiency; it reflects a conceptual mismatch between the goal of producing generally capable physical agents and the mechanism of reward-driven optimization, which by design narrows the search toward one performance criterion.

Intrinsic motivation provides a principled alternative [22, 23, 24]. Agents driven by prediction error or information gain explore without any specification of what to accomplish [26, 29], and have demonstrated the ability to acquire broad behavioral repertoires in fixed-body settings [25, 30, 32]. The natural extension, asking whether curiosity can also drive exploration over body structure, has received little systematic attention. We argue that it can, and that the resulting morphologies are better suited to downstream task transfer precisely because they were shaped by diversity of experience rather than by any particular performance target.

We present Morphological Curiosity, a system in which a modular robot simultaneously explores its own structural space and the behavioral space available to each body it inhabits, guided at every step by a single intrinsic objective. The central observation enabling this is that a world model conditioned on morphology produces prediction errors that are sensitive to structural change: a body the model has not yet encountered generates high surprise even for familiar actions, creating an automatic drive toward structural novelty. We formalize this into a dual-level reward that separately credits morphological and behavioral novelty, assigns them distinct annealing schedules, and combines them to produce a stable co-evolution signal without any task specification.

Robot morphology is represented as a dynamic graph processed by a differentiable GNN [17, 18, 21], which encodes bodies of varying size and topology in a fixed-dimensional space amenable to gradient-based search [15, 16]. A morphological novelty archive [4, 9] and an ensemble disagreement signal [24, 31] together constitute the dual-level reward. The system is trained in simulation and deployed on physical hardware via morphology-conditioned domain randomization.

We evaluate against task-reward co-evolution [5, 6] and fixed-body curiosity agents [24, 25] across locomotion and manipulation benchmarks under strict zero-shot transfer. We further examine whether the intrinsic objective implicitly recovers structural regularities associated with biological design [49, 50, 51], and find that it does: symmetry, proximal-to-distal joint gradients, and energetic parsimony all emerge without being explicitly rewarded.

The paper proceeds as follows. Section 3 formalizes the framework and its components. Section 5 presents experimental evaluation and ablations. Section 6 discusses limitations and future directions.

Methodology

3.1 Problem Formulation

Let M denote the space of valid modular robot morphologies and Π_m the space of policies conditioned on morphology m ∈ M. At each training step the agent inhabits a configuration (m, π) and interacts with an environment E to produce experience. No external reward is provided, a deliberate departure from reward-shaping approaches [40, 44] and task-conditioned co-evolution [5, 6]. The agent's sole objective is an intrinsic signal r^i_t, formally defined in Section 3.4. The optimization target is:

$$\max_{m,\pi} \; \mathbb{E}_{\tau \sim (m,\pi,E)} \left[ \sum_t r^i_t \right]$$

Maximizing predictive surprise over the joint (m, π) space is motivated by open-ended learning theory [10, 52]: configurations that remain surprising for the longest time are precisely those that encode diverse, generalizable structure, rather than configurations that have been narrowly tuned to a single task signal.

3.2 Morphological Representation

A modular robot maps naturally to a graph G = (V, E) [11, 12, 27], with nodes representing individual hardware modules (torso segments, limb links, actuated joints) and edges encoding rigid physical connections parameterized by joint type, range, and orientation. The graph is dynamic: modules may be attached or detached during training, altering both topology and the dimensionality of the action and observation spaces. We process each graph with a GNN [17, 18, 19] whose message-passing rule at layer l is:

$$h_v^{(l+1)} = \mathrm{UPDATE}\left( h_v^{(l)},\; \mathrm{AGGREGATE}\left( \{ h_u^{(l)} : u \in N(v) \} \right) \right)$$

AGGREGATE is permutation-invariant mean pooling [20]; node features encode mass, joint limits, and proprioceptive state; edge features encode geometric attachment and stiffness. After L layers, a global morphology embedding is obtained:

$$z_m = \mathrm{READOUT}\left( \{ h_v^{(L)} : v \in V \} \right)$$

This fixed-dimensional embedding [21] conditions both the world model and the policy, making them invariant to graph size and node ordering.
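
The message-passing and readout steps of Section 3.2 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tanh UPDATE, the mean READOUT, the omission of edge features, and all names (`mean_aggregate`, `gnn_embedding`, `weights`) are assumptions made for illustration. It shows only that mean aggregation followed by a pooled readout yields an embedding whose size does not depend on the number of modules and whose value does not depend on node ordering.

```python
import numpy as np

def mean_aggregate(h, adjacency):
    """AGGREGATE: permutation-invariant mean over each node's neighbours."""
    degree = adjacency.sum(axis=1, keepdims=True)
    return adjacency @ h / np.maximum(degree, 1.0)

def gnn_embedding(node_features, adjacency, weights):
    """Run len(weights) message-passing layers, then pool nodes into z_m."""
    h = node_features
    for w_self, w_neigh in weights:
        messages = mean_aggregate(h, adjacency)
        # UPDATE (assumed form): linear maps of own state and messages, then tanh.
        h = np.tanh(h @ w_self + messages @ w_neigh)
    return h.mean(axis=0)  # READOUT (assumed form): mean over all nodes

rng = np.random.default_rng(0)
dim = 4  # hypothetical feature width, shared by input and hidden layers
weights = [(rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim)))
           for _ in range(2)]

# A three-module chain: torso, link, link.
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
features = rng.normal(size=(3, dim))
z_m = gnn_embedding(features, adjacency, weights)

# Relabelling the modules permutes rows and columns but not the embedding.
perm = np.array([2, 0, 1])
z_perm = gnn_embedding(features[perm], adjacency[np.ix_(perm, perm)], weights)
print(np.allclose(z_m, z_perm))
```

Because the readout averages over nodes, a body with a different module count produces an embedding of the same length, which is what lets a single world model and policy be conditioned on any morphology.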
Structural mutations (module additions and removals) are handled via a stochastic attachment operator with learnable probabilities, updated by policy gradient [40, 42] through the intrinsic objective.

3.3 Morphology-Conditioned World Model

We learn a forward model f̂_θ [33, 34, 36] that predicts the next observation from the current observation, action, and morphology embedding:

$$\hat{o}_{t+1} = \hat{f}_{\theta}(o_t, a_t, z_m)$$

The model is implemented as an ensemble of K independently parameterized networks [39], providing epistemic uncertainty estimates through member disagreement and reducing interference between distinct morphologies stored in an experience replay buffer [58, 60]. Per-transition prediction error is the squared discrepancy between prediction and observation, averaged over ensemble members:

$$L_{\mathrm{pred}} = \frac{1}{K} \sum_{k=1}^{K} \left\| \hat{f}_{\theta_k}(o_t, a_t, z_m) - o_{t+1} \right\|^2$$

Because z_m enters the model directly, the error is structurally sensitive: a body the ensemble has not yet encountered generates high surprise even under familiar actions. This is the mechanism that couples morphological novelty to the curiosity signal, extending model-based curiosity from behavioral to structural exploration [24, 30].

3.4 Dual-Level Intrinsic Reward

Using raw prediction error as the sole signal conflates two qualitatively different sources of novelty: encountering a new body shape versus encountering a new behavior within a known body. These phenomena operate at different timescales and call for distinct treatment [23, 29]. We therefore decompose the reward into two components.

Morphological Novelty. A morphology archive A stores embeddings of all previously encountered body configurations.
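
The ensemble prediction error L_pred of Section 3.3 can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's model: each member is an untrained linear map rather than a neural network, and the class name `LinearEnsemble`, the dimensions, and the seed are hypothetical. The sketch shows only how the morphology embedding z_m enters the model input, so that changing the body changes the error even when the observation and action are identical.

```python
import numpy as np

class LinearEnsemble:
    """K independently initialised linear forward models (illustrative only)."""

    def __init__(self, k, obs_dim, act_dim, emb_dim, seed=0):
        rng = np.random.default_rng(seed)
        in_dim = obs_dim + act_dim + emb_dim
        self.weights = rng.normal(scale=0.1, size=(k, in_dim, obs_dim))

    def predict(self, obs, act, z_m):
        """One next-observation prediction per member, shape (K, obs_dim)."""
        x = np.concatenate([obs, act, z_m])  # z_m is part of the model input
        return np.einsum("i,kio->ko", x, self.weights)

def prediction_error(ensemble, obs, act, z_m, next_obs):
    """L_pred: squared error to the observed next state, averaged over members."""
    preds = ensemble.predict(obs, act, z_m)
    return float(np.mean(np.sum((preds - next_obs) ** 2, axis=1)))

ensemble = LinearEnsemble(k=5, obs_dim=3, act_dim=2, emb_dim=4)
obs, act, next_obs = np.zeros(3), np.ones(2), np.zeros(3)
z_a, z_b = np.zeros(4), np.full(4, 2.0)  # two hypothetical body embeddings

err_a = prediction_error(ensemble, obs, act, z_a, next_obs)
err_b = prediction_error(ensemble, obs, act, z_b, next_obs)
print(err_a, err_b)  # same (o_t, a_t), different bodies, different errors
```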
Morphological novelty is the distance from the current embedding to its nearest archived neighbor [4, 9, 54]:

$$r^m_t = \min_{z' \in A} \left\| z_m - z' \right\|_2$$

This extends the novelty search objective [4, 9] to body space, and complements quality-diversity approaches [5, 53] by operating over structural rather than behavioral characterizations.

Behavioral Novelty. Within a fixed morphology, behavioral novelty is measured by ensemble disagreement [24, 31, 39], which serves as a proxy for epistemic uncertainty about the current body's dynamics:

$$r^b_t = \mathrm{Var}_k\left[ \hat{f}_{\theta_k}(o_t, a_t, z_m) \right]$$

The combined reward is:

$$r^i_t = \lambda_m \, r^m_t + \lambda_b \, r^b_t$$

We anneal λ_m downward over training [44], prioritizing structural exploration early and shifting toward behavioral refinement as the morphological archive matures. This coarse-to-fine schedule stabilizes co-evolution and reduces the risk of premature commitment to suboptimal body plans.

3.5 Co-Evolution Training Procedure

Each outer iteration cycles through three phases, with PPO [42] as the policy optimizer and PETS-style ensemble updates [38] for the world model.

Phase 1: Morphological Proposal. The attachment operator proposes a candidate morphology m'. If its distance to the nearest archive neighbor exceeds a novelty threshold, m' is accepted and added to the archive; otherwise the current morphology is retained. Physically realizing accepted morphologies requires modular hardware assembly [41] that translates the simulated graph into a constructible robot configuration.

Phase 2: Policy Optimization. With m fixed, the policy π_m is updated for N_π PPO steps [42] under r^i_t. Conditioning the policy network on z_m permits partial weight sharing across structurally similar morphologies [15, 16], improving sample efficiency during the behavioral refinement phase.

Phase 3: World Model Update. Transitions collected in Phase 2 are added to a stratified replay buffer organized by morphology [60].
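
The dual-level reward of Section 3.4 and the Phase 1 acceptance rule can be sketched as below. Several details are assumptions not stated in the text: the linear form of the λ_m schedule and its endpoints, averaging the per-dimension variance into a scalar r^b, treating an empty archive as infinitely novel, and evaluating r^m against the archive as it stood before the current body was added. All function names and numeric values are hypothetical.

```python
import numpy as np

def morphological_novelty(z_m, archive):
    """r^m: Euclidean distance from z_m to its nearest archived embedding."""
    if len(archive) == 0:
        return float("inf")  # assumption: any body is novel to an empty archive
    return float(min(np.linalg.norm(z_m - z) for z in archive))

def behavioral_novelty(member_predictions):
    """r^b: variance across ensemble members, averaged over observation dims."""
    return float(np.var(member_predictions, axis=0).mean())

def lambda_m(step, total_steps, start=1.0, end=0.1):
    """Assumed linear anneal of the morphological weight from start to end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)

def intrinsic_reward(z_m, archive, member_predictions, step, total_steps,
                     lambda_b=1.0):
    """r^i = lambda_m * r^m + lambda_b * r^b."""
    r_m = morphological_novelty(z_m, archive)
    r_b = behavioral_novelty(member_predictions)
    return lambda_m(step, total_steps) * r_m + lambda_b * r_b

def propose(z_candidate, archive, threshold):
    """Phase 1: accept and archive the candidate only if it is novel enough."""
    if morphological_novelty(z_candidate, archive) > threshold:
        archive.append(z_candidate)
        return True
    return False

archive = [np.array([0.0, 0.0])]
accepted_far = propose(np.array([3.0, 4.0]), archive, threshold=1.0)
accepted_near = propose(np.array([0.1, 0.0]), archive, threshold=1.0)

preds = np.array([[0.0, 1.0], [2.0, 1.0]])  # two members, two observation dims
z_now = np.array([0.0, 1.0])
early = intrinsic_reward(z_now, archive, preds, step=0, total_steps=100)
late = intrinsic_reward(z_now, archive, preds, step=100, total_steps=100)
print(accepted_far, accepted_near, early, late)
```

With the same body and the same ensemble disagreement, the reward is larger early in training than late, which is the coarse-to-fine shift from structural to behavioral exploration that the annealing schedule is meant to produce.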
Ensemble members are updated independently to preserve inter-member disagreement as a reliable uncertainty signal [39].

3.6 Sim-to-Real Transfer

All training occurs in physics simulation [61, 62, 63]. To bridge the reality gap, we apply morphology-conditioned domain randomization [45, 46, 47] in which the range of randomized physical parameters (module masses, joint friction, contact geometry) scales with the structural complexity of the active morphology. More complex bodies are harder to model faithfully and receive correspondingly wider randomization windows. We additionally impose a morphological consistency loss, penalizing ensemble predictions that vary across randomization seeds for the same morphology-action pair, which biases the world model toward structure-grounded rather than incidental features of the dynamics [46, 47, 48]. At evaluation time, the top-novelty morphology is assembled on physical hardware and the corresponding policy is deployed directly, without any real-world adaptation.

Results

We evaluate Morphological Curiosity across four axes: morphological diversity, zero-shot task transfer, ablation of individual components, and analysis of emergent structural properties. All reported numbers are means with 95% confidence intervals over five independent seeds. Simulation infrastructure draws on standard environments [61, 62, 63] and distributed training tools [64].

5.1 Experimental Setup

Environments. Three environments of increasing complexity serve as held-out evaluation tasks: FlatTerrain (forward locomotion on a uniform surface), RoughTerrain (locomotion over procedurally generated uneven ground [62, 63]), and ManipulationReach (sequential end-effector contact with randomly placed targets). None of these environments is visible during the unsupervised training phase; the agent trains exclusively in a featureless ExplorationArena.
This zero-shot evaluation protocol reflects deployment conditions in which a pre-trained agent must generalize to previously unseen settings, a scenario increasingly relevant to physical AI systems deployed in dynamic environments [57].

Baselines. Four comparisons are included. Fixed-Body Curiosity (FBC) [24, 25] applies the same curiosity objective to a fixed default morphology. Reward-Driven Co-Evolution (RDCE) [5, 6] uses MAP-Elites with task reward as the fitness signal. Random Morphology Search (RMS) evaluates curiosity-driven policies on randomly sampled bodies, isolating the contribution of directed structural exploration. Behavioral Curiosity Only (BCO) ablates morphological novelty by setting λ_m = 0.

Metrics. We report Morphology Coverage (MC), the kernel-density-estimated volume of the morphological embedding space visited during training; Transfer Performance (TP), mean cumulative task reward under zero-shot deployment; Sample Efficiency (SE), environment steps to 80% of maximum TP; and Structural Consistency (SC), a measure of bilateral symmetry and modular regularity in discovered morphologies [7, 49].

5.2 Morphology Discovery and Coverage

After 49 million training steps, Morphological Curiosity achieves an MC score of 0.74 ± 0.03, versus 0.41 ± 0.05 for RDCE [5], 0.29 ± 0.04 for BCO, and 0.60 ± 0.06 for RMS. The gap relative to RDCE reflects the well-documented tendency of fitness-based search to converge on a narrow set of high-performing attractors at the expense of structural breadth [53, 54]. RMS achieves competitive coverage by construction but, as Section 5.3 shows, coverage without directed behavioral learning does not translate into transferable configurations. t-SNE projections [17, 20] of the morphological embedding space reveal that our method fills the space with multiple well-separated clusters, while RDCE produces dense concentrations around a small number of attractor morphologies.
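
The Sample Efficiency metric defined in Section 5.1 (environment steps to 80% of maximum TP) can be computed from an evaluation curve as in the sketch below. The function name `steps_to_fraction` and the curve values are hypothetical and are not results from the paper; the sketch also assumes that the maximum TP is positive and that TP is evaluated at discrete checkpoints.

```python
import numpy as np

def steps_to_fraction(steps, tp_curve, fraction=0.8):
    """First checkpoint at which TP reaches `fraction` of its maximum value."""
    steps = np.asarray(steps)
    tp_curve = np.asarray(tp_curve, dtype=float)
    target = fraction * tp_curve.max()
    reached = np.nonzero(tp_curve >= target)[0]  # indices meeting the target
    return int(steps[reached[0]])

# Made-up evaluation curve: training steps (in millions) and TP at each one.
checkpoints = [10, 20, 30, 40]
tp_values = [100.0, 400.0, 700.0, 800.0]

se = steps_to_fraction(checkpoints, tp_values)
print(se)  # target is 0.8 * 800 = 640, first reached at the 30M checkpoint
```

Under this definition a lower SE value means the agent reaches the target sooner.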

5.3 Zero-Shot Transfer Performance

On FlatTerrain, Morphological Curiosity scores 847 ± 41 versus 891 ± 38 for RDCE (p = 0.14, Welch's t-test). The small, statistically non-significant advantage of RDCE here is expected: flat locomotion is exactly the kind of narrow, stable target that task-reward optimization is designed for, and the generality premium purchased by curiosity-driven search offers little advantage.

On RoughTerrain the picture reverses sharply. Our method achieves 763 ± 50 against 504 ± 67 for RDCE, a 50.4% gain. The morphologies our system selects for this environment tend toward lower centers of mass, wider stances, and greater limb redundancy, structural features that confer terrain robustness without having been explicitly targeted [10, 52]. This is consistent with embodied design principles observed in biological locomotors [49, 50], where structural generality emerges from selection pressure toward diverse environments rather than any single one.

On ManipulationReach, scores are 612 ± 57 (ours), 389 ± 72 (RDCE), and 201 ± 43 (FBC). The particularly large gap over FBC demonstrates that behavioral generalization is insufficient when the downstream task demands physical capabilities the fixed morphology simply lacks [24, 25]. Our system independently discovers elongated distal chains and additional degrees of freedom at end-effector joints [49, 51], properties the task never signaled.

5.4 Ablation Studies

Dual-level reward. Removing morphological novelty (λ_m = 0) drops RoughTerrain TP by 34.2% and ManipulationReach TP by 41.7%. Removing behavioral novelty (λ_b = 0) produces more moderate losses of 18.3% and 22.1%, respectively. Behavioral curiosity alone [24, 31] cannot generate sufficient structural diversity for transfer.

GNN representation. Replacing the GNN [17, 18, 20] with a flattened parameter vector reduces MC by 29.6% and RoughTerrain TP by 24.8%.
Graph-structured embeddings are necessary for the novelty metric to capture genuine structural dissimilarity and for the world model to generalize across morphologies.

Ensemble model. A single deterministic forward model in place of the probabilistic ensemble [39] degrades the correlation between ensemble disagreement and actual prediction error from r = 0.81 to r = 0.46, propagating to a 19.4% drop in ManipulationReach TP.

Morphology conditioning. Removing z_m from the world model is the most damaging ablation: RoughTerrain TP falls 46.3% and the advantage over FBC [24, 25] is nearly eliminated. Without structural conditioning the curiosity signal is purely behavioral, and the co-evolution dynamics collapse to a fixed-body search.

Annealing schedule. Fixing λ_m throughout training degrades sample efficiency (SE) by 38.7% and substantially increases cross-seed variance. Consistent with curriculum learning analyses [44], early behavioral refinement within immature morphologies prevents subsequent discovery of structurally superior configurations.

5.5 Emergent Structural Properties

Structural Consistency scores are 0.69 ± 0.04 (ours), 0.31 ± 0.06 (RMS), and 0.71 ± 0.03 (RDCE on locomotion tasks). The near-equivalence between our method and RDCE on SC is notable: both discover symmetric, modular body plans, but through entirely different mechanisms. Symmetric configurations generate richer dynamics under perturbation; bilateral symmetry amplifies prediction error diversity, so the curiosity objective rewards symmetry directly, without any task signal [49, 50]. Similarly, configurations with sparse proximal joints and dense distal articulation sustain high prediction error for longer than highly redundant proximal structures, yielding the proximal-to-distal gradient consistently observed in vertebrate limb anatomy [49, 50, 51]. These regularities in turn facilitate physical realization using modular hardware platforms designed around similar structural principles [41].

5.6 Sim-to-Real Transfer

Three hardware instantiations of top-performing morphologies were assembled on a commercial modular robotics platform [11, 14] and evaluated without any real-world fine-tuning. Our method's mean sim-to-real gap is 18.3% ± 4.1%, compared to 31.7% ± 6.8% for RDCE. The advantage comes from two sources: morphology-conditioned domain randomization [45, 46, 47] produces policies inherently robust to parameter variation, and curiosity-driven morphologies tend to be structurally simpler than task-optimized ones, reducing sensitivity to simulation error [48]. All three robots executed stable gaits over uneven terrain without online adaptation.

5.7 Summary

All three primary hypotheses are supported. Curiosity-driven co-evolution produces more structurally diverse and transferable morphologies than reward-driven baselines [5, 6], with gains of up to 50.4% on unseen tasks. Self-selected bodies confer consistent zero-shot advantages. The intrinsic objective recovers ecologically coherent structural regularities [49, 50, 51] as emergent consequences of maximizing predictive diversity.

Conclusion

We have presented Morphological Curiosity, a system that allows a modular robot to jointly explore its own physical form and behavioral repertoire, driven entirely by intrinsic motivation. The central contribution is treating body design not as a fixed input but as a variable within a joint optimization, one that, when guided by curiosity rather than task reward, produces configurations that transfer broadly across downstream objectives. This departs fundamentally from both standard policy learning [1, 2, 5] and task-conditioned co-evolution, neither of which produces the kind of open-ended structural diversity that generalizable physical intelligence requires.
The dual-level reward formulation, which separates morphological novelty [4, 9] from behavioral novelty [24, 31] within a shared world model [39], is a necessary architectural choice, as the ablations confirm. Collapsing the two signals destroys the structural exploration pressure on which downstream generalization depends.

The emergent structural regularities are, from a researcher's perspective, the most conceptually significant finding. Symmetry, distal joint density, and energetic parsimony [49, 50, 51] were not designed into the objective; they were selected by it. This suggests that the relationship between predictability and morphological quality is not incidental: bodies worth having are, in a precise information-theoretic sense, bodies that remain surprising. Intrinsic motivation [22, 23] may therefore be not merely a practical alternative to task reward but a more principled foundation for physical intelligence.

6.1 Limitations

Scalability. The attachment operator scales poorly with module count. Combinatorial growth of the attachment probability space makes large-scale structural search infeasible without hierarchical decomposition [7, 8].

Continual learning. The stratified replay buffer reduces but does not eliminate interference between morphologies encountered at different training stages [58, 59, 60]. More principled approaches to non-stationary model learning are needed.

Contact dynamics. Despite morphology-conditioned randomization [45, 46, 47], high-contact configurations exhibit the largest sim-to-real gaps. Current simulators [61, 63] do not model contact with sufficient fidelity for curiosity-driven systems that actively seek out physically complex configurations.

Environmental coupling. Morphological exploration here occurs in a fixed environment.
Biological morphology and ecological niche co-evolve [49, 50]; extending our framework to allow the exploration arena itself to vary, as in POET [10], is a natural and important extension.

6.2 Future Directions

Hierarchical body search. Organizing modules into functional sub-assemblies and applying curiosity hierarchically [7, 8] would dramatically expand the accessible structural space while maintaining search tractability, analogous to hierarchical policy optimization [44].

Population-based structural diversity. Multi-agent extensions in which individuals are additionally rewarded for being structurally novel relative to the population [53, 54, 56] could produce richer diversity through competitive pressure, building on quality-diversity frameworks [53, 54, 55] and open-ended evolution [52, 58].

Pre-training for downstream adaptation. The morphology-policy archive produced by Morphological Curiosity constitutes a diverse set of embodied priors. Fine-tuning from this archive with task reward [44] is a natural path toward few-shot physical task adaptation.

Deployment-time morphological adaptation. Allowing a deployed robot to continue structural adaptation in response to environmental change [58, 59] is the logical endpoint of this line of work. Autonomous navigation research has shown that online re-planning in response to changing conditions is tractable on physical platforms [65]; applying the same principle to morphological re-configuration would represent a major step toward genuinely adaptive physical intelligence, mirroring the developmental plasticity observed in biological organisms [49, 50].

6.3 Closing Remarks

We began with the observation that bodies matter [49, 50, 51]. We end with the claim that the right bodies are discoverable not by specifying what they should accomplish, but by asking what they should sustain.
A morphology that maintains high predictive uncertainty across a wide range of interactions is a morphology that supports open-ended learning [52, 53], and a system driven to seek such morphologies will, as our results suggest, converge on solutions that are physically coherent, ecologically interpretable, and broadly transferable. The perspective that form and curiosity should co-evolve, rather than that form should be optimized for function, has implications beyond modular robotics [11, 16]. We regard Morphological Curiosity as an initial demonstration of this principle and expect the ideas developed here to be applicable wherever the body is a design variable rather than a constant.

Code, trained models, and experiment configurations will be released at [repository link] upon publication.

References

1. Sims K (2023) Evolving virtual creatures. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2 (pp. 699–706)
2. Sims K (1994) Evolving 3D morphology and behavior by competition. Artif Life 1(4):353–372
3. Cheney N, MacCurdy R, Clune J, Lipson H (2014) Unshackling evolution: evolving soft robots with multiple materials and a powerful generative encoding. ACM SIGEVOlution 7(1):11–23
4. Lehman J, Stanley KO (2011) Abandoning objectives: Evolution through the search for novelty alone. Evol Comput 19(2):189–223
5. Mouret JB, Clune J (2015) Illuminating search spaces by mapping elites. arXiv preprint arXiv:1504.04909
6. Gupta A, Savarese S, Ganguli S, Fei-Fei L (2021) Embodied intelligence via learning and evolution. Nat Commun 12(1):5721
7. Clune J, Mouret JB, Lipson H (2013) The evolutionary origins of modularity. Proceedings of the Royal Society B: Biological Sciences 280(1755)
8. Stanley KO, Miikkulainen R (2002) Evolving neural networks through augmenting topologies. Evol Comput 10(2):99–127
9. Lehman J, Stanley KO (2011) Novelty search and the problem with objectives. Genetic programming theory and practice IX. Springer New York, New York, NY, pp 37–56
10. Wang R, Lehman J, Clune J, Stanley KO (2019) Paired open-ended trailblazer (POET): Endlessly generating increasingly complex and diverse learning environments and their solutions. arXiv preprint arXiv:1901.01753
11. Yim M, Shen WM, Salemi B, Rus D, Moll M, Lipson H, … Chirikjian GS (2007) Modular self-reconfigurable robot systems [grand challenges of robotics]. IEEE Robotics & Automation Magazine 14(1):43–52
12. Zhao Q, Nakajima K, Sumioka H, Hauser H, Pfeifer R (2013) Morphological computation as a whole-body resource in sprawling quadruped locomotion. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 1510–1517). IEEE
13. Alim MW, Giri A, Akib AAS, Uddin N, Islam M, Arafat ME, Tahmid SA (2025) Affordable bionic hands with intuitive control through forearm muscle signals. In 2025 IEEE 4th International Conference on Computing and Machine Intelligence (ICMI) (pp. 1–6). IEEE
14. Whitman J, Ha S, Atkeson C (2017) Towards a modular robot system for extraterrestrial construction. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (pp. 4173–4180). IEEE
15. Huang W, Mordatch I, Pathak D (2020) One policy to control them all: Shared modular policies for agent-agnostic control. In International Conference on Machine Learning (pp. 4455–4464). PMLR
16. Pathak D, Lu C, Darrell T, Isola P, Efros AA (2019) Learning to control self-assembling morphologies: a study of generalization via modularity. Adv Neural Inf Process Syst 32
17. Scarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2008) The graph neural network model. IEEE Trans Neural Networks 20(1):61–80
18. Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907
19. Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. In International Conference on Machine Learning (pp. 1263–1272). PMLR
20. Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? arXiv preprint arXiv:1810.00826
21. Wang T, Liao R, Ba J, Fidler S (2018) Nervenet: Learning structured policy with graph neural networks. In International Conference on Learning Representations
22. Schmidhuber J (1991) A possibility for implementing curiosity and boredom in model-building neural controllers. In Proc. of the international conference on simulation of adaptive behavior: From animals to animats (pp. 222–227)
23. Oudeyer PY, Kaplan F, Hafner VV (2007) Intrinsic motivation systems for autonomous mental development. IEEE Trans Evol Comput 11(2):265–286
24. Pathak D, Agrawal P, Efros AA, Darrell T (2017) Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning (pp. 2778–2787). PMLR
25. Burda Y, Edwards H, Storkey A, Klimov O (2018) Exploration by random network distillation. arXiv preprint arXiv:1810.12894
26. Bellemare M, Srinivasan S, Ostrovski G, Schaul T, Saxton D, Munos R (2016) Unifying count-based exploration and intrinsic motivation. Advances in neural information processing systems 29
27. Akib AAS, Giri A, Islam M, Sifa FJ, Elahi TA, Aktia AN, … Khanna A (2024) Design and simulation of a quadruped robot. In International Conference on Data-Processing and Networking (pp. 373–385). Springer Nature Singapore, Singapore
28. Barto AG (2012) Intrinsic motivation and reinforcement learning. Intrinsically motivated learning in natural and artificial systems. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 17–47
29. Aubret A, Matignon L, Hassas S (2019) A survey on intrinsic motivation in reinforcement learning. arXiv preprint arXiv:1908.06976
30. Sekar R, Rybkin O, Daniilidis K, Abbeel P, Hafner D, Pathak D (2020) Planning to explore via self-supervised world models. In International Conference on Machine Learning (pp. 8583–8592). PMLR
31. Raileanu R, Rocktäschel T (2020) Ride: Rewarding impact-driven exploration for procedurally-generated environments. arXiv preprint arXiv:2002.12292
32. Eysenbach B, Gupta A, Ibarz J, Levine S (2018) Diversity is all you need: Learning skills without a reward function. arXiv preprint arXiv:1802.06070
33. Ha D, Schmidhuber J (2018) World models. arXiv preprint arXiv:1803.10122
34. Hafner D, Lillicrap T, Fischer I, Villegas R, Ha D, Lee H, Davidson J (2019) Learning latent dynamics for planning from pixels. In International Conference on Machine Learning (pp. 2555–2565). PMLR
35. Hafner D, Lillicrap T, Norouzi M, Ba J (2020) Mastering atari with discrete world models. arXiv preprint arXiv:2010.02193
36. Hafner D, Lillicrap T, Ba J, Norouzi M (2019) Dream to control: Learning behaviors by latent imagination. arXiv preprint arXiv:1912.01603
37. Schrittwieser J, Antonoglou I, Hubert T, Simonyan K, Sifre L, Schmitt S, … Silver D (2020) Mastering atari, go, chess and shogi by planning with a learned model. Nature 588(7839):604–609
38. Chua K, Calandra R, McAllister R, Levine S (2018) Deep reinforcement learning in a handful of trials using probabilistic dynamics models. Advances in neural information processing systems 31
39. Lakshminarayanan B, Pritzel A, Blundell C (2017) Simple and scalable predictive uncertainty estimation using deep ensembles. Advances in neural information processing systems 30
40. Deisenroth M, Rasmussen CE (2011) PILCO: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on machine learning (ICML-11) (pp. 465–472)
41. Giri A, Akib AAS, Hasib A, Acharya A, Prithibi MA, Rahman RH, … Taha HIC (2025) Design and development of a cost effective and modular cnc plotter for educational and prototyping applications. In 2025 IEEE 4th International Conference on Computing and Machine Intelligence (ICMI) (pp. 1–6). IEEE
42. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347
43. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, … Hassabis D (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533
44. Sutton RS, Barto AG (1998) Reinforcement learning: An introduction, vol 1. MIT Press, Cambridge, pp 9–11
45. Tobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P (2017) Domain randomization for transferring deep neural networks from simulation to the real world. In 2017 IEEE/RSJ international conference on intelligent robots and systems (IROS) (pp. 23–30). IEEE
46. Andrychowicz OM, Baker B, Chociej M, Jozefowicz R, McGrew B, Pachocki J, … Zaremba W (2020) Learning dexterous in-hand manipulation. The International Journal of Robotics Research 39(1):3–20
47. Peng XB, Andrychowicz M, Zaremba W, Abbeel P (2018) Sim-to-real transfer of robotic control with dynamics randomization. In 2018 IEEE international conference on robotics and automation (ICRA) (pp. 3803–3810). IEEE
48. Zhao W, Queralta JP, Westerlund T (2020) Sim-to-real transfer in deep reinforcement learning for robotics: a survey. In 2020 IEEE symposium series on computational intelligence (SSCI) (pp. 737–744). IEEE
49. Pfeifer R, Bongard J (2006) How the body shapes the way we think: a new view of intelligence. MIT Press
50. Pfeifer R, Lungarella M, Iida F (2007) Self-organization, embodiment, and biologically inspired robotics. Science 318(5853):1088–1093
51. Hasson C, Fukuoka Y, Nakamura M (2011) Morphological computation as a key for the design of versatile robots. In Proceedings of the 14th International Conference on Climbing and Walking Robots (CLAWAR). World Scientific
52. Pfeifer R, Scheier C (1999) Understanding intelligence. MIT Press
53. Cully A, Clune J, Tarapore D, Mouret JB (2015) Robots that can adapt like animals.
Nature 521:503–507 Pugh JK, Soros LB, Stanley KO (2016) Quality diversity: A new frontier for evolutionary computation. Front Rob AI 3:40 Gravina D, Liapis A, Yannakakis GN (2019) Quality diversity through surprise. IEEE Trans Evol Comput 23(4):603–616 Ecoffet A, Huizinga J, Lehman J, Stanley KO, Clune J (2021) First return, then explore. Nature 590:580–586 Giri A, Hasib A, Islam M, Tazim MF, Rahman MDS, Khadgi M (2025) Real-time human fall detection using YOLOv5 on Raspberry Pi: An edge AI solution for smart healthcare and safety monitoring. In Proceedings of the International Conference on Data Analytics & Management (pp. 493–507). Springer Kirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A, Hassabis D, Clopath C, Kumaran D, Hadsell R (2017) Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521–3526 Parisi GI, Kemker R, Part JL, Kanan C, Wermter S (2019) Continual lifelong learning with neural networks: A review. Neural Netw 113:54–71 Rolnick D, Ahuja A, Schwarz J, Lillicrap T, Wayne G (2019) Experience replay for continual learning. Adv Neural Inf Process Syst 32:350–360 Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) OpenAI gym. arXiv preprint. https://arxiv.org/abs/1606.01540 Todorov E, Erez T, Tassa Y (2012) MuJoCo: A physics engine for model-based control. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 5026–5033). IEEE Liang E, Liaw R, Nishihara R, Moritz P, Fox R, Goldberg K, Gonzalez J, Jordan M, Stoica I (2018) RLlib: Abstractions for distributed reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning (ICML) (pp. 3053–3062). PMLR Coumans E, Bai Y (2016) PyBullet, a Python module for physics simulation for games, robotics and machine learning. 
http://pybullet.org
Akib ASMAS, Uddin AZMJ, Giri A, Islam M, Arafat ME, Bhuiyan T (2025) Efficient route planning and navigation in drones using Pixhawk autopilot. In Proceedings of the 2025 6th International Conference on Artificial Intelligence, Robotics and Control (AIRC). IEEE
Additional Declarations The authors declare no competing interests.
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-9570273","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":632042674,"identity":"6269753b-a5f9-41ce-b1bb-868ae5e74497","order_by":0,"name":"Eren
Bajracharya","email":"","orcid":"https://orcid.org/0009-0005-2550-9916","institution":"Kathmandu University","correspondingAuthor":true,"prefix":"","firstName":"Eren","middleName":"","lastName":"Bajracharya","suffix":""}],"badges":[],"createdAt":"2026-04-29 23:11:01","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-9570273/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9570273/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":108382776,"identity":"9e70c802-b5d5-431c-aa69-8318388ae780","added_by":"auto","created_at":"2026-05-04 05:33:08","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":81986,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eOverview of the Morphological Curiosity co-evolution loop.
The three training phases cycle continuously, guided by the dual-level intrinsic reward r^i_t = λ_m·r^m_t + λ_b·r^b_t.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-9570273/v1/8e3100d02d24c1e0978e619b.png"},{"id":108493248,"identity":"87bfe364-4090-4690-95e0-1761a683e644","added_by":"auto","created_at":"2026-05-05 09:59:47","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":57827,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003et-SNE projections of the morphological embedding space z_m. Our method (left) exhibits broad, well-separated structural clusters; RDCE (center) converges on narrow attractors; RMS (right) shows scattered but uninformative coverage.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-9570273/v1/515bce760466a213132b8adb.png"},{"id":108382778,"identity":"4c840427-d3de-4cd8-b3e1-dc8899a80f3c","added_by":"auto","created_at":"2026-05-04 05:33:08","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":55841,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eZero-shot transfer performance across three held-out evaluation environments. Error bars show 95% CIs over five independent seeds. 
Our method achieves a 50.4% gain over RDCE on RoughTerrain.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-9570273/v1/9a4f536381c293dc12e9a976.png"},{"id":108382779,"identity":"6b542d98-b808-4f51-9bc6-3a0782191083","added_by":"auto","created_at":"2026-05-04 05:33:08","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":83743,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eAblation study results on RoughTerrain (left) and ManipulationReach (right). Blue/green = full model; lighter bars = component removed. Removing morphology conditioning from the world model is the most damaging intervention.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-9570273/v1/919ac9787821f70e4fc4e112.png"},{"id":108492901,"identity":"ebbbcdbe-04e0-48ab-b4f9-c81a75c1cc4d","added_by":"auto","created_at":"2026-05-05 09:58:56","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":51954,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cem\u003eSim-to-real performance gap (%) across three hardware morphology configurations. Lower is better. 
Our method reduces the gap by 13.4 percentage points on average relative to RDCE.\u003c/em\u003e\u003c/p\u003e","description":"","filename":"floatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-9570273/v1/106eb24069cd0722dfadf987.png"},{"id":108495901,"identity":"8ff6f3af-7043-4286-b8ac-e762fd5be144","added_by":"auto","created_at":"2026-05-05 10:10:56","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":639426,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9570273/v1/92758610-da45-4042-91ce-4d9d1599fc4a.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eMorphological Curiosity: Self-Directed Body-Policy Co-Evolution\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003ein Modular Robots via Intrinsic Motivation\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003eA recurring observation in biological systems is that cognitive capability cannot be cleanly separated from the body through which it is exercised [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]. The dexterity available to a cephalopod, the gait efficiency of a quadruped, and the manipulative range of a primate hand are each co-determined by neural control and physical form. This interdependence is strikingly illustrated in recent work on affordable prosthetic limbs [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e], where the arrangement of actuated joints directly constrains what grasps are achievable: no controller, however sophisticated, can recover a capability the morphology does not afford.
Despite this, the prevailing approach in robot learning treats the body as a constant input rather than a design variable. Hardware is fixed at the outset, and the entire optimization budget is spent on policy search within the space that morphology permits.\u003c/p\u003e \u003cp\u003eThe evolutionary robotics literature has grappled with this limitation for decades, producing methods that jointly search over body plans and controllers [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. The fundamental difficulty with these approaches is that they require a fitness function defined over a specific task [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. A morphology shaped by locomotion on flat ground transfers poorly to rough terrain, and the search must be restarted when tasks change. The problem is not merely one of sample efficiency; it reflects a conceptual mismatch between the goal of producing generally capable physical agents and the mechanism of reward-driven optimization, which by design narrows the search toward one performance criterion.\u003c/p\u003e \u003cp\u003eIntrinsic motivation provides a principled alternative [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e].
Agents driven by prediction error or information gain explore without any specification of what to accomplish [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e], and have demonstrated the ability to acquire broad behavioral repertoires in fixed-body settings [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e, \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e]. The natural extension, asking whether curiosity can also drive exploration over body structure, has received little systematic attention. We argue that it can, and that the resulting morphologies are better suited to downstream task transfer precisely because they were shaped by diversity of experience rather than by any particular performance target.\u003c/p\u003e \u003cp\u003eWe present Morphological Curiosity, a system in which a modular robot simultaneously explores its own structural space and the behavioral space available to each body it inhabits, guided at every step by a single intrinsic objective. The central observation enabling this is that a world model conditioned on morphology produces prediction errors that are sensitive to structural change: a body the model has not yet encountered generates high surprise even for familiar actions, creating an automatic drive toward structural novelty.
We formalize this into a dual-level reward that separately credits morphological and behavioral novelty, assigns them distinct annealing schedules, and combines them to produce a stable co-evolution signal without any task specification.\u003c/p\u003e \u003cp\u003eRobot morphology is represented as a dynamic graph over a differentiable GNN [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e], which encodes bodies of varying size and topology in a fixed-dimensional space amenable to gradient-based search [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. A morphological novelty archive [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] and an ensemble disagreement signal [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e] together constitute the dual-level reward. The system is trained in simulation and deployed on physical hardware via morphology-conditioned domain randomization.\u003c/p\u003e \u003cp\u003eWe evaluate against task-reward co-evolution [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e] and fixed-body curiosity agents [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] across locomotion and manipulation benchmarks under strict zero-shot transfer. 
We further examine whether the intrinsic objective implicitly recovers structural regularities associated with biological design [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e, \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e], and find that it does: symmetry, proximal-to-distal joint gradients, and energetic parsimony all emerge without reward.\u003c/p\u003e \u003cp\u003eThe paper proceeds as follows. Section \u003cspan refid=\"Sec2\" class=\"InternalRef\"\u003e3\u003c/span\u003e formalizes the framework and its components. Section \u003cspan refid=\"Sec9\" class=\"InternalRef\"\u003e5\u003c/span\u003e presents experimental evaluation and ablations. Section \u003cspan refid=\"Sec17\" class=\"InternalRef\"\u003e6\u003c/span\u003e discusses limitations and future directions.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"Methodology","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Problem Formulation\u003c/h2\u003e \u003cp\u003eLet \u003cem\u003eM\u003c/em\u003e denote the space of valid modular robot morphologies and \u003cem\u003eΠ_m\u003c/em\u003e the space of policies conditioned on morphology m \u0026isin; M. At each training step the agent inhabits a configuration (m, π) and interacts with an environment to produce experience. No external reward is provided, a deliberate departure from reward-shaping approaches [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e, \u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e] and task-conditioned co-evolution [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e].
The agent's sole objective is an intrinsic signal r^i_t, formally derived in Section \u003cspan refid=\"Sec5\" class=\"InternalRef\"\u003e3.3\u003c/span\u003e. The optimization target is:\u003c/p\u003e \u003cp\u003e \u003cem\u003emax_{m,π} E_{τ~(m,π,E)} [ Σ_t r^i_t ]\u003c/em\u003e\u003c/p\u003e \u003cp\u003eMaximizing predictive surprise over the joint (m, π) space is motivated by open-ended learning theory [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]: configurations that remain surprising for the longest time are precisely those that encode diverse, generalizable structure, rather than configurations that have been narrowly tuned to a single task signal.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Morphological Representation\u003c/h2\u003e \u003cp\u003eA modular robot maps naturally to a graph G = (V, E) [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], with nodes representing individual hardware modules (torso segments, limb links, actuated joints) and edges encoding rigid physical connections parameterized by joint type, range, and orientation. The graph is dynamic: modules may be attached or detached during training, altering both topology and the dimensionality of the action and observation spaces.
We process each graph with a GNN [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] whose message-passing rule at layer l is:\u003c/p\u003e \u003cp\u003e \u003cem\u003eh_v^(l\u0026thinsp;+\u0026thinsp;1) = UPDATE( h_v^(l), AGGREGATE({ h_u^(l) : u \u0026isin; N(v) }) )\u003c/em\u003e \u003c/p\u003e \u003cp\u003eAGGREGATE is permutation-invariant mean pooling [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]; node features encode mass, joint limits, and proprioceptive state; edge features encode geometric attachment and stiffness. After L layers, a global morphology embedding is obtained:\u003c/p\u003e \u003cp\u003e \u003cem\u003ez_m\u0026thinsp;=\u0026thinsp;READOUT({ h_v^(L) : v \u0026isin; V })\u003c/em\u003e \u003c/p\u003e \u003cp\u003eThis fixed-dimensional embedding [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e] conditions both the world model and the policy, making them invariant to graph size and node ordering. 
Structural mutations (module additions and removals) are handled via a stochastic attachment operator with learnable probabilities, updated by policy gradient [\u003cspan citationid=\"CR40\" class=\"CitationRef\"\u003e40\u003c/span\u003e, \u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e] through the intrinsic objective.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Morphology-Conditioned World Model\u003c/h2\u003e \u003cp\u003eWe learn a forward model f̂_θ [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e, \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e, 36] that predicts the next observation from the current observation, action, and morphology embedding:\u003c/p\u003e \u003cp\u003e \u003cem\u003e\u0026ocirc;_{t\u0026thinsp;+\u0026thinsp;1} = f̂_θ( o_t, a_t, z_m )\u003c/em\u003e \u003c/p\u003e \u003cp\u003eThe model is implemented as an ensemble of K independently parameterized networks [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e], providing epistemic uncertainty estimates through member disagreement and reducing interference between distinct morphologies stored in an experience replay buffer [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e, \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e]. Per-transition prediction error is measured as mean squared discrepancy averaged over ensemble members:\u003c/p\u003e \u003cp\u003e \u003cem\u003eL_pred = (1/K) Σ_k || f̂_{θ_k}(o_t, a_t, z_m) - o_{t\u0026thinsp;+\u0026thinsp;1} ||^2\u003c/em\u003e \u003c/p\u003e \u003cp\u003eBecause z_m enters the model directly, the error is structurally sensitive: a body the ensemble has not yet encountered generates high surprise even under familiar actions.
This is the mechanism that couples morphological novelty to the curiosity signal, extending model-based curiosity from behavioral to structural exploration [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Dual-Level Intrinsic Reward\u003c/h2\u003e \u003cp\u003eUsing raw prediction error as the sole signal conflates two qualitatively different sources of novelty: encountering a new body shape versus encountering a new behavior within a known body. These phenomena operate at different timescales and call for distinct treatment [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. We therefore decompose the reward into two components.\u003c/p\u003e \u003cp\u003e \u003cb\u003eMorphological Novelty.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eA morphology archive A stores embeddings of all previously encountered body configurations. 
Morphological novelty is the distance from the current embedding to its nearest archived neighbor [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e]:\u003c/p\u003e \u003cp\u003e \u003cem\u003er^m_t\u0026thinsp;=\u0026thinsp;min_{z' \u0026isin; A} || z_m - z' ||_2\u003c/em\u003e \u003c/p\u003e \u003cp\u003eThis extends the novelty search objective [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] to body space, and complements quality-diversity approaches [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e] by operating over structural rather than behavioral characterizations.\u003c/p\u003e \u003cp\u003e \u003cb\u003eBehavioral Novelty.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eWithin a fixed morphology, behavioral novelty is measured by ensemble disagreement [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e, \u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e], which serves as a proxy for epistemic uncertainty about the current body's dynamics:\u003c/p\u003e \u003cp\u003e \u003cem\u003er^b_t\u0026thinsp;=\u0026thinsp;Var_k[ f̂_{θ_k}(o_t, a_t, z_m) ]\u003c/em\u003e \u003c/p\u003e \u003cp\u003eThe combined reward is:\u003c/p\u003e \u003cp\u003e \u003cem\u003er^i_t\u0026thinsp;=\u0026thinsp;λ_m \u0026middot; r^m_t\u0026thinsp;+\u0026thinsp;λ_b \u0026middot; r^b_t\u003c/em\u003e\u003c/p\u003e \u003cp\u003eWe anneal λ_m downward
over training [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e], prioritizing structural exploration early and shifting toward behavioral refinement as the morphological archive matures. This coarse-to-fine schedule stabilizes co-evolution and reduces the risk of premature commitment to suboptimal body plans.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e3.5 Co-Evolution Training Procedure\u003c/h2\u003e \u003cp\u003eEach outer iteration cycles through three phases, with PPO [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e] as the policy optimizer and PETS-style ensemble updates [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e] for the world model.\u003c/p\u003e \u003cp\u003e \u003cb\u003ePhase 1 \u0026mdash; Morphological Proposal.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eThe attachment operator proposes a candidate morphology m'. If its distance to the nearest archive neighbor exceeds a novelty threshold, m' is accepted and added to the archive; otherwise the current morphology is retained. Physically realizing accepted morphologies requires modular hardware assembly [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e] that translates the simulated graph into a constructible robot configuration.\u003c/p\u003e \u003cp\u003e \u003cb\u003ePhase 2 \u0026mdash; Policy Optimization.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eWith m fixed, the policy π_m is updated for N_π PPO steps [\u003cspan citationid=\"CR42\" class=\"CitationRef\"\u003e42\u003c/span\u003e] under r^i_t. 
Conditioning the policy network on z_m permits partial weight sharing across structurally similar morphologies [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e], improving sample efficiency during the behavioral refinement phase.\u003c/p\u003e \u003cp\u003e \u003cb\u003ePhase 3 \u0026mdash; World Model Update.\u003c/b\u003e \u003c/p\u003e \u003cp\u003eTransitions collected in Phase 2 are added to a stratified replay buffer organized by morphology [\u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e]. Ensemble members are updated independently to preserve inter-member disagreement as a reliable uncertainty signal [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e3.6 Sim-to-Real Transfer\u003c/h2\u003e \u003cp\u003eAll training occurs in physics simulation [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e, \u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e, \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e]. To bridge the reality gap, we apply morphology-conditioned domain randomization [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e, \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e] in which the range of randomized physical parameters (module masses, joint friction, contact geometry) scales with the structural complexity of the active morphology. More complex bodies are harder to model faithfully and receive correspondingly wider randomization windows.
We additionally impose a morphological consistency loss, penalizing ensemble predictions that vary across randomization seeds for the same morphology-action pair, which biases the world model toward structure-grounded rather than incidental features of the dynamics [\u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e, \u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e]. At evaluation time, the top-novelty morphology is assembled on physical hardware and the corresponding policy is deployed directly, without any real-world adaptation.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cp\u003eWe evaluate Morphological Curiosity across four axes: morphological diversity, zero-shot task transfer, ablation of individual components, and analysis of emergent structural properties. All reported numbers are means with 95% confidence intervals over five independent seeds. 
Simulation infrastructure draws on standard environments [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e, \u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e, \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e] and distributed training tools [\u003cspan citationid=\"CR64\" class=\"CitationRef\"\u003e64\u003c/span\u003e].\u003c/p\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003e5.1 Experimental Setup\u003c/h2\u003e \u003cp\u003e \u003cb\u003eEnvironments.\u003c/b\u003e Three environments of increasing complexity serve as held-out evaluation tasks: \u003cem\u003eFlatTerrain\u003c/em\u003e (forward locomotion on a uniform surface), \u003cem\u003eRoughTerrain\u003c/em\u003e (locomotion over procedurally generated uneven ground [\u003cspan citationid=\"CR62\" class=\"CitationRef\"\u003e62\u003c/span\u003e, \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e]), and \u003cem\u003eManipulationReach\u003c/em\u003e (sequential end-effector contact with randomly placed targets). None of these environments is visible during the unsupervised training phase; the agent trains exclusively in a featureless ExplorationArena. This zero-shot evaluation protocol reflects deployment conditions in which a pre-trained agent must generalize to previously unseen settings, a scenario increasingly relevant to physical AI systems deployed in dynamic environments [\u003cspan citationid=\"CR57\" class=\"CitationRef\"\u003e57\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cb\u003eBaselines.\u003c/b\u003e Four comparisons are included. \u003cem\u003eFixed-Body Curiosity (FBC)\u003c/em\u003e [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] applies the same curiosity objective to a fixed default morphology. 
\u003cem\u003eReward-Driven Co-Evolution (RDCE)\u003c/em\u003e [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e] uses MAP-Elites with task reward as the fitness signal. \u003cem\u003eRandom Morphology Search (RMS)\u003c/em\u003e evaluates curiosity-driven policies on randomly sampled bodies, isolating the contribution of directed structural exploration. \u003cem\u003eBehavioral Curiosity Only (BCO)\u003c/em\u003e ablates morphological novelty by setting λ_m\u0026thinsp;=\u0026thinsp;0.\u003c/p\u003e \u003cp\u003e \u003cb\u003eMetrics.\u003c/b\u003e We report Morphology Coverage (MC), the kernel-density-estimated volume of the morphological embedding space visited during training; Transfer Performance (TP), mean cumulative task reward under zero-shot deployment; Sample Efficiency (SE), environment steps to 80% of maximum TP; and Structural Consistency (SC), a measure of bilateral symmetry and modular regularity in discovered morphologies [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e5.2 Morphology Discovery and Coverage\u003c/h2\u003e \u003cp\u003eAfter 49\u0026nbsp;million training steps, Morphological Curiosity achieves an MC score of 0.74\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03, versus 0.41\u0026thinsp;\u0026plusmn;\u0026thinsp;0.05 for RDCE [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e], 0.29\u0026thinsp;\u0026plusmn;\u0026thinsp;0.04 for BCO, and 0.60\u0026thinsp;\u0026plusmn;\u0026thinsp;0.06 for RMS. 
The gap relative to RDCE reflects the well-documented tendency of fitness-based search to converge on a narrow set of high-performing attractors at the expense of structural breadth [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e]. RMS achieves competitive coverage by construction but, as Section \u003cspan refid=\"Sec12\" class=\"InternalRef\"\u003e5.3\u003c/span\u003e shows, coverage without directed behavioral learning does not translate into transferable configurations. t-SNE projections [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] of the morphological embedding space reveal that our method fills the space with multiple well-separated clusters, while RDCE produces dense concentrations around a small number of attractor morphologies.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e5.3 Zero-Shot Transfer Performance\u003c/h2\u003e \u003cp\u003eOn FlatTerrain, Morphological Curiosity scores 847\u0026thinsp;\u0026plusmn;\u0026thinsp;41 versus 891\u0026thinsp;\u0026plusmn;\u0026thinsp;38 for RDCE (p\u0026thinsp;=\u0026thinsp;0.14, Welch's t-test). The small advantage of RDCE here is expected: flat locomotion is exactly the kind of narrow, stable target that task-reward optimization is designed for, and the generality premium purchased by curiosity-driven search offers little advantage.\u003c/p\u003e \u003cp\u003eOn RoughTerrain the picture reverses sharply. Our method achieves 763\u0026thinsp;\u0026plusmn;\u0026thinsp;50 against 504\u0026thinsp;\u0026plusmn;\u0026thinsp;67 for RDCE, a 50.4% gain. 
The morphologies our system selects for this environment tend toward lower centers of mass, wider stances, and greater limb redundancy, structural features that confer terrain robustness without having been explicitly targeted [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e, \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e]. This is consistent with embodied design principles observed in biological locomotors [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e], where structural generality emerges from selection pressure toward diverse environments rather than any single one.\u003c/p\u003e \u003cp\u003eOn ManipulationReach, scores are 612\u0026thinsp;\u0026plusmn;\u0026thinsp;57 (ours), 389\u0026thinsp;\u0026plusmn;\u0026thinsp;72 (RDCE), and 201\u0026thinsp;\u0026plusmn;\u0026thinsp;43 (FBC). The particularly large gap over FBC demonstrates that behavioral generalization is insufficient when the downstream task demands physical capabilities the fixed morphology simply lacks [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. Our system independently discovers elongated distal chains and additional degrees of freedom at end-effector joints [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e, \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e], properties the task never signaled.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e5.4 Ablation Studies\u003c/h2\u003e \u003cp\u003e \u003cb\u003eDual-level reward.\u003c/b\u003e Removing morphological novelty (λ_m\u0026thinsp;=\u0026thinsp;0) drops RoughTerrain TP by 34.2% and ManipulationReach TP by 41.7%. 
Removing behavioral novelty (λ_b\u0026thinsp;=\u0026thinsp;0) produces more moderate losses of 18.3% and 22.1%, respectively. Behavioral curiosity alone [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e] cannot generate sufficient structural diversity for transfer.\u003c/p\u003e \u003cp\u003e \u003cb\u003eGNN representation.\u003c/b\u003e Replacing the GNN [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] with a flattened parameter vector reduces MC by 29.6% and RoughTerrain TP by 24.8%. Graph-structured embeddings are necessary for the novelty metric to capture genuine structural dissimilarity and for the world model to generalize across morphologies.\u003c/p\u003e \u003cp\u003e \u003cb\u003eEnsemble model.\u003c/b\u003e A single deterministic forward model in place of the probabilistic ensemble [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e] degrades the correlation between ensemble disagreement and actual prediction error from r\u0026thinsp;=\u0026thinsp;0.81 to r\u0026thinsp;=\u0026thinsp;0.46, propagating to a 19.4% drop in ManipulationReach TP.\u003c/p\u003e \u003cp\u003e \u003cb\u003eMorphology conditioning.\u003c/b\u003e Removing z_m from the world model is the most damaging ablation: RoughTerrain TP falls 46.3% and the advantage over FBC [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] is nearly eliminated. 
Without structural conditioning the curiosity signal is purely behavioral, and the co-evolution dynamics collapse to a fixed-body search.\u003c/p\u003e \u003cp\u003e \u003cb\u003eAnnealing schedule.\u003c/b\u003e Fixing λ_m throughout training worsens SE by 38.7% and substantially increases cross-seed variance. Consistent with curriculum learning analyses [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e], early behavioral refinement within immature morphologies prevents subsequent discovery of structurally superior configurations.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e5.5 Emergent Structural Properties\u003c/h2\u003e \u003cp\u003eStructural Consistency scores are 0.69\u0026thinsp;\u0026plusmn;\u0026thinsp;0.04 (ours), 0.31\u0026thinsp;\u0026plusmn;\u0026thinsp;0.06 (RMS), and 0.71\u0026thinsp;\u0026plusmn;\u0026thinsp;0.03 (RDCE on locomotion tasks). The near-equivalence between our method and RDCE on SC is notable: both discover symmetric, modular body plans, but through entirely different mechanisms. Symmetric configurations generate richer dynamics under perturbation; bilateral symmetry amplifies prediction error diversity, so the curiosity objective rewards symmetry directly, without task signal [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]. Similarly, configurations with sparse proximal joints and dense distal articulation sustain high prediction error for longer than highly redundant proximal structures, yielding the proximal-to-distal gradient consistently observed in vertebrate limb anatomy [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e, \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e]. 
These regularities in turn facilitate physical realization using modular hardware platforms designed around similar structural principles [\u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e41\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e5.6 Sim-to-Real Transfer\u003c/h2\u003e \u003cp\u003eThree hardware instantiations of top-performing morphologies were assembled on a commercial modular robotics platform [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] and evaluated without any real-world fine-tuning. Our method's mean sim-to-real gap is 18.3% \u0026plusmn; 4.1%, compared to 31.7% \u0026plusmn; 6.8% for RDCE. The advantage comes from two sources: morphology-conditioned domain randomization [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e, \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e] produces policies inherently robust to parameter variation, and curiosity-driven morphologies tend to be structurally simpler than task-optimized ones, reducing sensitivity to simulation error [\u003cspan citationid=\"CR48\" class=\"CitationRef\"\u003e48\u003c/span\u003e]. All three robots executed stable gaits over uneven terrain without online adaptation.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e5.7 Summary\u003c/h2\u003e \u003cp\u003eAll three primary hypotheses are supported. 
Curiosity-driven co-evolution produces more structurally diverse and transferable morphologies than reward-driven baselines [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], with gains of up to 50.4% on unseen tasks. Self-selected bodies confer consistent zero-shot advantages. The intrinsic objective recovers ecologically coherent structural regularities [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e, \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e] as emergent consequences of maximizing predictive diversity.\u003c/p\u003e \u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eWe have presented Morphological Curiosity, a system that allows a modular robot to jointly explore its own physical form and behavioral repertoire, driven entirely by intrinsic motivation. The central contribution is treating body design not as a fixed input but as a variable within a joint optimization, one that, when guided by curiosity rather than task reward, produces configurations that transfer broadly across downstream objectives. 
This departs fundamentally from both standard policy learning [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e] and task-conditioned co-evolution, neither of which produces the kind of open-ended structural diversity that generalizable physical intelligence requires.\u003c/p\u003e \u003cp\u003eThe dual-level reward formulation, which separates morphological novelty [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] from behavioral novelty [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e] within a shared world model [\u003cspan citationid=\"CR39\" class=\"CitationRef\"\u003e39\u003c/span\u003e], is a necessary architectural choice, as ablations confirm. Collapsing the two signals destroys the structural exploration pressure on which downstream generalization depends.\u003c/p\u003e \u003cp\u003eThe emergent structural regularities are, from a researcher's perspective, the most conceptually significant finding. Symmetry, distal joint density, and energetic parsimony [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e, \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e] were not designed into the objective; they were selected by it. This suggests that the relationship between predictability and morphological quality is not incidental: bodies worth having are, in a precise information-theoretic sense, bodies that remain surprising. 
Intrinsic motivation [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e] may therefore be not merely a practical alternative to task reward but a more principled foundation for physical intelligence.\u003c/p\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e6.1 Limitations\u003c/h2\u003e \u003cp\u003e \u003cb\u003eScalability.\u003c/b\u003e The attachment operator scales poorly with module count. Combinatorial growth of the attachment probability space makes large-scale structural search infeasible without hierarchical decomposition [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cb\u003eContinual learning.\u003c/b\u003e The stratified replay buffer reduces but does not eliminate interference between morphologies encountered at different training stages [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e, \u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e, \u003cspan citationid=\"CR60\" class=\"CitationRef\"\u003e60\u003c/span\u003e]. More principled approaches to non-stationary model learning are needed.\u003c/p\u003e \u003cp\u003e \u003cb\u003eContact dynamics.\u003c/b\u003e Despite morphology-conditioned randomization [\u003cspan citationid=\"CR45\" class=\"CitationRef\"\u003e45\u003c/span\u003e, \u003cspan citationid=\"CR46\" class=\"CitationRef\"\u003e46\u003c/span\u003e, \u003cspan citationid=\"CR47\" class=\"CitationRef\"\u003e47\u003c/span\u003e], high-contact configurations exhibit the largest sim-to-real gaps. 
Current simulators [\u003cspan citationid=\"CR61\" class=\"CitationRef\"\u003e61\u003c/span\u003e, \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e63\u003c/span\u003e] do not model contact with sufficient fidelity for curiosity-driven systems that actively seek out physically complex configurations.\u003c/p\u003e \u003cp\u003e \u003cb\u003eEnvironmental coupling.\u003c/b\u003e Morphological exploration here occurs in a fixed environment. Biological morphology and ecological niche co-evolve [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e]; extending our framework to allow the exploration arena itself to vary, as in POET [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e], is a natural and important extension.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003e6.2 Future Directions\u003c/h2\u003e \u003cp\u003e \u003cb\u003eHierarchical body search.\u003c/b\u003e Organizing modules into functional sub-assemblies and applying curiosity hierarchically [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] would dramatically expand the accessible structural space while maintaining search tractability, analogous to hierarchical policy optimization [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cb\u003ePopulation-based structural diversity.\u003c/b\u003e Multi-agent extensions in which individuals are additionally rewarded for being structurally novel relative to the population [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e, \u003cspan citationid=\"CR56\" class=\"CitationRef\"\u003e56\u003c/span\u003e] could produce 
richer diversity through competitive pressure, building on quality-diversity frameworks [\u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e, \u003cspan citationid=\"CR54\" class=\"CitationRef\"\u003e54\u003c/span\u003e, \u003cspan citationid=\"CR55\" class=\"CitationRef\"\u003e55\u003c/span\u003e] and open-ended evolution [\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e, \u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e].\u003c/p\u003e \u003cp\u003e \u003cb\u003ePre-training for downstream adaptation.\u003c/b\u003e The morphology-policy archive produced by Morphological Curiosity constitutes a diverse set of embodied priors. Fine-tuning from this archive with task reward [\u003cspan citationid=\"CR44\" class=\"CitationRef\"\u003e44\u003c/span\u003e] is a natural path toward few-shot physical task adaptation.\u003c/p\u003e \u003cp\u003e \u003cb\u003eDeployment-time morphological adaptation.\u003c/b\u003e Allowing a deployed robot to continue structural adaptation in response to environmental change [\u003cspan citationid=\"CR58\" class=\"CitationRef\"\u003e58\u003c/span\u003e, \u003cspan citationid=\"CR59\" class=\"CitationRef\"\u003e59\u003c/span\u003e] is the logical endpoint of this line of work. 
Autonomous navigation research has shown that online re-planning in response to changing conditions is tractable on physical platforms [\u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e65\u003c/span\u003e]; applying the same principle to morphological re-configuration would represent a major step toward genuinely adaptive physical intelligence, mirroring the developmental plasticity observed in biological organisms [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003e6.3 Closing Remarks\u003c/h2\u003e \u003cp\u003eWe began with the observation that bodies matter [\u003cspan citationid=\"CR49\" class=\"CitationRef\"\u003e49\u003c/span\u003e, \u003cspan citationid=\"CR50\" class=\"CitationRef\"\u003e50\u003c/span\u003e, \u003cspan citationid=\"CR51\" class=\"CitationRef\"\u003e51\u003c/span\u003e]. We end with the claim that the right bodies are discoverable not by specifying what they should accomplish, but by asking what they should sustain. A morphology that maintains high predictive uncertainty across a wide range of interactions is a morphology that supports open-ended learning [\u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e52\u003c/span\u003e, \u003cspan citationid=\"CR53\" class=\"CitationRef\"\u003e53\u003c/span\u003e], and a system driven to seek such morphologies will, as our results suggest, converge on solutions that are physically coherent, ecologically interpretable, and broadly transferable. The perspective that form and curiosity should co-evolve, rather than that form should be optimized for function, has implications beyond modular robotics [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. 
We regard Morphological Curiosity as an initial demonstration of this principle and expect the ideas developed here to be applicable wherever the body is a design variable rather than a constant.\u003c/p\u003e \u003cp\u003e \u003cem\u003eCode, trained models, and experiment configurations will be released at [repository link] upon publication.\u003c/em\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eSims K (2023) Evolving virtual creatures. In \u003cem\u003eSeminal Graphics Papers: Pushing the Boundaries, Volume 2\u003c/em\u003e (pp. 699\u0026ndash;706)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSims K (1994) Evolving 3D morphology and behavior by competition. Artif Life 1(4):353\u0026ndash;372\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCheney N, MacCurdy R, Clune J, Lipson H (2014) Unshackling evolution: evolving soft robots with multiple materials and a powerful generative encoding. ACM SIGEVOlution 7(1):11\u0026ndash;23\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLehman J, Stanley KO (2011) Abandoning objectives: Evolution through the search for novelty alone. Evol Comput 19(2):189\u0026ndash;223\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMouret JB, Clune J (2015) Illuminating search spaces by mapping elites. \u003cem\u003earXiv preprint arXiv:1504.04909\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGupta A, Savarese S, Ganguli S, Fei-Fei L (2021) Embodied intelligence via learning and evolution. Nat Commun 12(1):5721\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eClune J, Mouret JB, Lipson H (2013) The evolutionary origins of modularity. 
\u003cem\u003eProceedings of the Royal Society B: Biological Sciences\u003c/em\u003e, \u003cem\u003e280\u003c/em\u003e(1755)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eStanley KO, Miikkulainen R (2002) Evolving neural networks through augmenting topologies. Evol Comput 10(2):99\u0026ndash;127\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLehman J, Stanley KO (2011) Novelty search and the problem with objectives. Genetic programming theory and practice IX. Springer New York, New York, NY, pp 37\u0026ndash;56\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang R, Lehman J, Clune J, Stanley KO (2019) Paired open-ended trailblazer (POET): Endlessly generating increasingly complex and diverse learning environments and their solutions. \u003cem\u003earXiv preprint arXiv:1901.01753\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYim M, Shen WM, Salemi B, Rus D, Moll M, Lipson H, \u0026hellip; Chirikjian GS (2007) Modular self-reconfigurable robot systems [grand challenges of robotics]. IEEE Robotics \u0026amp; Automation Magazine 14(1):43\u0026ndash;52\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhao Q, Nakajima K, Sumioka H, Hauser H, Pfeifer R (2013) Morphological computation as a whole-body resource in sprawling quadruped locomotion. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 1510\u0026ndash;1517). IEEE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAlim MW, Giri A, Akib AAS, Uddin N, Islam M, Arafat ME, Tahmid SA (2025) Affordable bionic hands with intuitive control through forearm muscle signals. In \u003cem\u003e2025 IEEE 4th International Conference on Computing and Machine Intelligence (ICMI)\u003c/em\u003e (pp. 1\u0026ndash;6). 
IEEE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWhitman J, Ha S, Atkeson C (2017) Towards a modular robot system for extraterrestrial construction. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA) (pp. 4173\u0026ndash;4180). IEEE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuang W, Mordatch I, Pathak D (2020) One policy to control them all: Shared modular policies for agent-agnostic control. In \u003cem\u003eInternational Conference on Machine Learning\u003c/em\u003e (pp. 4455\u0026ndash;4464). PMLR\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePathak D, Lu C, Darrell T, Isola P, Efros AA (2019) Learning to control self-assembling morphologies: a study of generalization via modularity. Adv Neural Inf Process Syst 32\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eScarselli F, Gori M, Tsoi AC, Hagenbuchner M, Monfardini G (2008) The graph neural network model. IEEE Trans Neural Networks 20(1):61\u0026ndash;80\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. \u003cem\u003earXiv preprint arXiv:1609.02907\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. In \u003cem\u003eInternational Conference on Machine Learning\u003c/em\u003e (pp. 1263\u0026ndash;1272). PMLR\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eXu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? \u003cem\u003earXiv preprint arXiv:1810.00826\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWang T, Liao R, Ba J, Fidler S (2018) NerveNet: Learning structured policy with graph neural networks. 
In \u003cem\u003eInternational Conference on Learning Representations\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSchmidhuber J (1991) A possibility for implementing curiosity and boredom in model-building neural controllers. In \u003cem\u003eProc. of the international conference on simulation of adaptive behavior: From animals to animats\u003c/em\u003e (pp. 222\u0026ndash;227)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eOudeyer PY, Kaplan F, Hafner VV (2007) Intrinsic motivation systems for autonomous mental development. IEEE Trans Evol Comput 11(2):265\u0026ndash;286\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePathak D, Agrawal P, Efros AA, Darrell T (2017) Curiosity-driven exploration by self-supervised prediction. In \u003cem\u003eInternational Conference on Machine Learning\u003c/em\u003e (pp. 2778\u0026ndash;2787). PMLR\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBurda Y, Edwards H, Storkey A, Klimov O (2018) Exploration by random network distillation. \u003cem\u003earXiv preprint arXiv:1810.12894\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBellemare M, Srinivasan S, Ostrovski G, Schaul T, Saxton D, Munos R (2016) Unifying count-based exploration and intrinsic motivation. Adv Neural Inf Process Syst 29\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAkib AAS, Giri A, Islam M, Sifa FJ, Elahi TA, Aktia AN, \u0026hellip; Khanna A (2024) Design and simulation of a quadruped robot. In International Conference on Data-Processing and Networking (pp. 373\u0026ndash;385). Springer Nature Singapore, Singapore\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBarto AG (2012) Intrinsic motivation and reinforcement learning. Intrinsically motivated learning in natural and artificial systems. 
Springer Berlin Heidelberg, Berlin, Heidelberg, pp 17\u0026ndash;47\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAubret A, Matignon L, Hassas S (2019) A survey on intrinsic motivation in reinforcement learning. \u003cem\u003earXiv preprint arXiv:1908.06976\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSekar R, Rybkin O, Daniilidis K, Abbeel P, Hafner D, Pathak D (2020) Planning to explore via self-supervised world models. In \u003cem\u003eInternational Conference on Machine Learning\u003c/em\u003e (pp. 8583\u0026ndash;8592). PMLR\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRaileanu R, Rockt\u0026auml;schel T (2020) RIDE: Rewarding impact-driven exploration for procedurally-generated environments. \u003cem\u003earXiv preprint arXiv:2002.12292\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eEysenbach B, Gupta A, Ibarz J, Levine S (2018) Diversity is all you need: Learning skills without a reward function. \u003cem\u003earXiv preprint arXiv:1802.06070\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHa D, Schmidhuber J (2018) World models. \u003cem\u003earXiv preprint arXiv:1803.10122\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHafner D, Lillicrap T, Fischer I, Villegas R, Ha D, Lee H, Davidson J (2019) Learning latent dynamics for planning from pixels. In \u003cem\u003eInternational Conference on Machine Learning\u003c/em\u003e (pp. 2555\u0026ndash;2565). PMLR\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHafner D, Lillicrap T, Norouzi M, Ba J (2020) Mastering Atari with discrete world models. \u003cem\u003earXiv preprint arXiv:2010.02193\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHafner D, Lillicrap T, Ba J, Norouzi M (2019) Dream to control: Learning behaviors by latent imagination. 
\u003cem\u003earXiv preprint arXiv:1912.01603\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSchrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., \u0026hellip; Silver, D. (2020). Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839), 604\u0026ndash;609.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChua K, Calandra R, McAllister R, Levine S (2018) Deep reinforcement learning in a handful of trials using probabilistic dynamics models. \u003cem\u003eAdvances in neural information processing systems\u003c/em\u003e, \u003cem\u003e31\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLakshminarayanan B, Pritzel A, Blundell C (2017) Simple and scalable predictive uncertainty estimation using deep ensembles. \u003cem\u003eAdvances in neural information processing systems\u003c/em\u003e, \u003cem\u003e30\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDeisenroth M, Rasmussen CE (2011) PILCO: A model-based and data-efficient approach to policy search. In \u003cem\u003eProceedings of the 28th International Conference on machine learning (ICML-11)\u003c/em\u003e (pp. 465\u0026ndash;472)\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGiri, A., Akib, A. A. S., Hasib, A., Acharya, A., Prithibi, M. A., Rahman, R. H., \u0026hellip; Taha, H. I. C. (2025, April). Design and development of a cost effective and modular CNC plotter for educational and prototyping applications. In 2025 IEEE 4th International Conference on Computing and Machine Intelligence (ICMI) (pp. 1\u0026ndash;6). IEEE.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSchulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. 
\u003cem\u003earXiv preprint arXiv:1707.06347\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., \u0026hellip; Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529\u0026ndash;533.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSutton RS, Barto AG (1998) Reinforcement learning: An introduction. MIT Press, Cambridge\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTobin J, Fong R, Ray A, Schneider J, Zaremba W, Abbeel P (2017) Domain randomization for transferring deep neural networks from simulation to the real world. In \u003cem\u003e2017 IEEE/RSJ international conference on intelligent robots and systems (IROS)\u003c/em\u003e (pp. 23\u0026ndash;30). IEEE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAndrychowicz, O. M., Baker, B., Chociej, M., Jozefowicz, R., McGrew, B., Pachocki, J., \u0026hellip; Zaremba, W. (2020). Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1), 3\u0026ndash;20.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePeng XB, Andrychowicz M, Zaremba W, Abbeel P (2018) Sim-to-real transfer of robotic control with dynamics randomization. In \u003cem\u003e2018 IEEE international conference on robotics and automation (ICRA)\u003c/em\u003e (pp. 3803\u0026ndash;3810). IEEE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhao W, Queralta JP, Westerlund T (2020) Sim-to-real transfer in deep reinforcement learning for robotics: a survey. In \u003cem\u003e2020 IEEE symposium series on computational intelligence (SSCI)\u003c/em\u003e (pp. 737\u0026ndash;744). IEEE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePfeifer R, Bongard J (2006) How the body shapes the way we think: a new view of intelligence. 
MIT Press\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePfeifer R, Lungarella M, Iida F (2007) Self-organization, embodiment, and biologically inspired robotics. Science 318(5853):1088\u0026ndash;1093\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHasson C, Fukuoka Y, Nakamura M (2011) Morphological computation as a key for the design of versatile robots. In Proceedings of the 14th International Conference on Climbing and Walking Robots (CLAWAR). World Scientific\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePfeifer R, Scheier C (1999) Understanding intelligence. MIT Press\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCully A, Clune J, Tarapore D, Mouret JB (2015) Robots that can adapt like animals. Nature 521:503\u0026ndash;507\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePugh JK, Soros LB, Stanley KO (2016) Quality diversity: A new frontier for evolutionary computation. Front Rob AI 3:40\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGravina D, Liapis A, Yannakakis GN (2019) Quality diversity through surprise. IEEE Trans Evol Comput 23(4):603\u0026ndash;616\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eEcoffet A, Huizinga J, Lehman J, Stanley KO, Clune J (2021) First return, then explore. Nature 590:580\u0026ndash;586\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGiri A, Hasib A, Islam M, Tazim MF, Rahman MDS, Khadgi M (2025) Real-time human fall detection using YOLOv5 on Raspberry Pi: An edge AI solution for smart healthcare and safety monitoring. In Proceedings of the International Conference on Data Analytics \u0026amp; Management (pp. 493\u0026ndash;507). 
Springer\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKirkpatrick J, Pascanu R, Rabinowitz N, Veness J, Desjardins G, Rusu AA, Milan K, Quan J, Ramalho T, Grabska-Barwinska A, Hassabis D, Clopath C, Kumaran D, Hadsell R (2017) Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521\u0026ndash;3526\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eParisi GI, Kemker R, Part JL, Kanan C, Wermter S (2019) Continual lifelong learning with neural networks: A review. Neural Netw 113:54\u0026ndash;71\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eRolnick D, Ahuja A, Schwarz J, Lillicrap T, Wayne G (2019) Experience replay for continual learning. Adv Neural Inf Process Syst 32:350\u0026ndash;360\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBrockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W (2016) OpenAI gym. arXiv preprint. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/1606.01540\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/1606.01540\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTodorov E, Erez T, Tassa Y (2012) MuJoCo: A physics engine for model-based control. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 5026\u0026ndash;5033). IEEE\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiang E, Liaw R, Nishihara R, Moritz P, Fox R, Goldberg K, Gonzalez J, Jordan M, Stoica I (2018) RLlib: Abstractions for distributed reinforcement learning. In Proceedings of the 35th International Conference on Machine Learning (ICML) (pp. 3053\u0026ndash;3062). 
PMLR\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCoumans E, Bai Y (2016) PyBullet, a Python module for physics simulation for games, robotics and machine learning. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://pybullet.org\u003c/span\u003e\u003cspan address=\"http://pybullet.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAkib ASMAS, Uddin AZMJ, Giri A, Islam M, Arafat ME, Bhuiyan T (2025) Efficient route planning and navigation in drones using Pixhawk autopilot. In Proceedings of the 2025 6th International Conference on Artificial Intelligence, Robotics and Control (AIRC). IEEE\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Modular robotics, morphological co-evolution, intrinsic motivation, curiosity driven exploration, graph neural networks, novelty search, sim-to-real transfer, embodied intelligence, open ended learning, zero shot 
generalization","lastPublishedDoi":"10.21203/rs.3.rs-9570273/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9570273/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eWe introduce Morphological Curiosity, a framework in which modular robots autonomously co-evolve their physical bodies and control policies without any externally provided reward signal. Existing robot learning methods overwhelmingly fix the hardware and optimize only behavior; co-evolutionary alternatives require hand-designed fitness functions tied to specific tasks. We depart from both traditions by framing body-policy search as an unsupervised exploration problem, where the only driving signal is the robot's own predictive uncertainty about its world. Robot morphology is encoded as a differentiable computational graph processed by a Graph Neural Network, permitting gradient-informed addition and removal of limbs during training. A morphology-conditioned forward model implemented as a probabilistic ensemble generates a curiosity signal that is explicitly sensitive to structural change, not merely behavioral novelty. We further decompose this signal into a morphological novelty term, which rewards discovering structurally distinct body configurations, and a behavioral novelty term, which rewards surprising transitions within a fixed body. Training proceeds in simulation with morphology-aware domain randomization; learned policies are transferred to physical hardware without fine-tuning. 
Our experiments support three claims: curiosity-driven co-evolution produces more transferable morphologies than reward-driven search; self-selected bodies outperform fixed-body agents on unseen tasks under zero-shot transfer; and the intrinsic objective causes ecologically coherent structural regularities (limb symmetry, distal joint richness, and energetic efficiency) to emerge without being explicitly encoded as objectives.\u003c/p\u003e","manuscriptTitle":"Morphological Curiosity: Self-Directed Body-Policy Co-Evolution\nin Modular Robots via Intrinsic Motivation","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-04 05:33:03","doi":"10.21203/rs.3.rs-9570273/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"255bc8b6-3ad0-4880-bc7a-6e30ce717355","owner":[],"postedDate":"May 4th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-05-04T05:33:03+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-04 05:33:03","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9570273","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9570273","identity":"rs-9570273","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

