Predicting Ecosystem Resilience Using Multi-Agent Reinforcement Learning

doi:10.1101/2025.06.07.658424

2025 · doi:10.1101/2025.06.07.658424

preprint OA: closed CC-BY-4.0

Abstract

Twin models of natural ecosystems hold great promise for informing real-world decisions on sustainable land use and biodiversity conservation. However, existing simulations of animal behavior often rely on manually crafted rules, limiting their scalability and practical utility. Here, we present a flexible and scalable agent-based modeling approach that uses reinforcement learning, instead of hand-coded rules, to simulate animal behavior. We validate this approach across ten alpine ecosystems featuring wolves, chamois, and vegetation. By comparing model outputs with empirical data, we show that the simulations reproduce realistic ecological and behavioral patterns, including population dynamics, life history traits, and social interactions. We then use the model to assess ecosystem resilience under scenarios of habitat degradation, game hunting, and heat stress. Our framework paves the way for realistic simulations advancing our ability to predict ecosystem responses to disturbance and tipping points leading to biodiversity loss, in order to support conservation planning and guide the sustainable use of natural resources.

bioRxiv preprint doi: https://doi.org/10.1101/2025.06.07.658424; this version posted June 9, 2025. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Ecosystems and the species they contain provide us with multiple goods and services, such as food, clean water, and material for buildings, clothes, and medicine [1]. However, more than a million species are estimated to be threatened with extinction, which means that without radical policy changes, the contribution of nature to people may soon be disrupted, with profound consequences for biodiversity and human communities throughout the world [2].
The importance and urgency of tackling the ongoing biodiversity crisis have been recognized in landmark international agreements calling for major improvements in the protection of nature worldwide [3]. Part of our ability to find solutions for this crisis is linked to the understanding of ecosystems and our capacity to predict biodiversity dynamics. Recent research has shown the potential of ecosystem models (sometimes referred to as “digital twins”) to guide the development of science-based policies that protect biodiversity while ensuring long-term human prosperity [4, 5]. Ecosystem models have also been used to study ecosystem dynamics [6] and for exploring the consequences of human activities such as recreation, farming, logging, and urban development, right from the planning stage [7], thus providing decision-makers with input for planning interventions with the least environmental impact. Additionally, ecosystem models have been employed to predict patterns of ecological interactions, such as predator-prey fluctuations and the effects of ecosystem alterations, driven by factors such as climate change and anthropogenic impact [6].

Analytic models. Ecosystem dynamics have traditionally been described through analytic models, such as the Lotka-Volterra [8] and Arditi–Ginzburg equations [9]. These models typically describe predator-prey dynamics using systems of ordinary differential equations that are appreciated for their ability to represent population interactions over time. Their strength lies in modeling species interactions and biomass fluctuations over multiple generations, contributing to our understanding of energy and mass flow within ecosystems [10]. However, it remains difficult to model analytically the behavior of organisms in complex and more realistic scenarios, for instance, where multiple functional groups (e.g.
carnivores, herbivores, and primary producers) interact, and where spatial features and environmental variables affect the dynamics of the ecosystem [11].

Simulation models. Another approach to modeling ecosystems is based on computer simulations [12] that help solve problems for which analytic solutions are impractical or unavailable, such as ecosystems with multiple functional groups, going beyond a simple prey-predator dynamic, or that account explicitly for spatial features. Several simulation models have been developed to generate realistic representations of biological systems, including population-based models, which represent organisms at an aggregated level [13, 14, 4, 15], and agent-based models, which represent organisms individually [16, 17, 18, 19]. In agent-based models, each agent typically has a mechanism for decision-making, which governs its actions, such as moving and feeding, in a spatially explicit environment. Thanks to these characteristics, agent-based modeling has proven useful in wildlife management [20], in fishery ecology [21], and even in evolutionary contexts [15].

Reinforcement learning. Reinforcement learning [22] is a paradigm in artificial intelligence that enables an agent to interact with an environment and learn a behavior through trial and error. The agent can interact with the environment to various degrees by making observations and performing actions.
It receives feedback from the environment in the form of a reward signal, consisting of a number that, at each time step, quantifies to what extent the interaction between the agent and the environment has a positive or negative effect on the agent. Reinforcement learning algorithms are used to optimize models that encode how agents behave in their environment to maximize their reward. These models are called policies, mapping observations to actions. A common way of representing policies is to use policy networks: artificial neural networks that take observations as input and return actions as output [23]. Reinforcement learning has been used in domains such as robotics, autonomous driving, finance, natural language processing, and healthcare [22]. It has also been used for developing programs that can play video games like Pac-Man and Space Invaders [24], strategic games like chess and Go [25], and most recently, Minecraft [26]. In environmental sciences, reinforcement learning has been used to select optimal areas for biological conservation, where the agent is a policy maker establishing protected areas in an environment with multiple species and the reward is determined by preventing biodiversity loss [4, 27]. Additional applications of reinforcement learning within the field of ecology have been discussed [28] but have not yet led to applications for modeling real ecosystems. Multi-agent reinforcement learning is a form of reinforcement learning used to develop behavioral models for multiple agents interacting within a shared environment [29]. For instance, researchers have used it to train models that play StarCraft II at a professional level [30] and to simulate predator-prey interactions in virtual maze-like environments [31, 32].
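The observation, action, reward, and policy loop described above can be illustrated with a deliberately tiny example. The sketch below is not the training setup used in this work: it assumes a hypothetical one-dimensional world with food in the last cell, and it uses tabular Q-learning in place of a policy network. All names (`N_CELLS`, `step`, `train`, `greedy_policy`) and all parameter values are illustrative assumptions.

```python
import random

N_CELLS = 6          # hypothetical toy world: positions 0..5, food at position 5
ACTIONS = (-1, +1)   # move one cell left / one cell right


def step(pos, action):
    """Toy environment dynamics: returns (new_pos, reward, done)."""
    new_pos = min(max(pos + action, 0), N_CELLS - 1)
    reached_food = new_pos == N_CELLS - 1
    return new_pos, (1.0 if reached_food else 0.0), reached_food


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.5, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration (illustrative values)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]  # q[observation][action index]
    for _ in range(episodes):
        pos = rng.randrange(N_CELLS - 1)      # random non-terminal start cell
        for _ in range(50):                   # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))                       # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[pos][i])  # exploit
            new_pos, reward, done = step(pos, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[new_pos]))
            q[pos][a] += alpha * (target - q[pos][a])
            pos = new_pos
            if done:
                break
    return q


def greedy_policy(q):
    """The learned policy: maps each non-terminal observation to an action."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]
            for s in range(N_CELLS - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

The agent is never told to walk toward the food. It is only rewarded on arrival, and the preference for moving right follows from maximizing discounted reward, which is the same principle, at a much smaller scale, that the learned animal behavior relies on.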
Challenges in agent-based ecosystem modeling. In spite of the broad flexibility of simulation approaches, examples of agent-based ecosystem models remain relatively few and their potential has not been fully exploited [33]. Efficient and realistic ecosystem modeling is challenged by our limited ability to model the behavior of its organisms under a wide range of conditions. Indeed, a key component of agent-based ecosystem modeling is the encoding of animal behavior, which may include moving, feeding, and resting as a response to environmental inputs (e.g. presence or absence of predators in the vicinity) and physiological inputs (e.g. hydration and energy levels) [17]. A common approach is to hand-code animal behavior, typically by defining a set of “if-then” rules formulated based on our understanding of behavioral ecology [34]. Yet, hand-coding behavioral models is highly challenging, as the response of organisms to multiple inputs (visual, olfactory, tactile, physiological) is poorly understood in biological systems and virtually impossible to capture in an exhaustive set of rules linking all possible combinations of inputs with resulting actions [35]. One of the challenges is achieving stable ecosystem models, such that under beneficial circumstances, their functional groups can coexist for long periods of time in a dynamic equilibrium, without any of them dying out [36]. Another challenge in hand-coding behavioral models is obtaining real-world behavioral data, specifying in detail how individual animals act in different situations that they may encounter in their natural environments. This is also a limiting factor for developing behavioral models based on supervised learning [37].
Yet another challenge related to agent-based ecosystem modeling is obtaining population data for all functional groups, which might be used to define starting points of simulations and to validate the models. We propose to tackle one of the main challenges of agent-based ecosystem modeling by using reinforcement learning rather than hand-coding for constructing models of animal behavior. This is a special case of a more general idea, namely replacing rule-based models by machine learning models, which has been highly successful in other scientific areas, such as predicting protein structure [38] and, most recently, weather forecasting [39].
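For reference, the Lotka-Volterra model cited above as the classical analytic description of predator-prey dynamics is usually written as the pair of ordinary differential equations below. The symbols follow common textbook notation and are not taken from this paper: x is the prey density, y the predator density, α the prey growth rate, β the predation rate, δ the rate at which consumed prey is converted into predators, and γ the predator death rate.

```latex
\begin{aligned}
\frac{dx}{dt} &= \alpha x - \beta x y, \\
\frac{dy}{dt} &= \delta x y - \gamma y .
\end{aligned}
```

For positive parameters the solutions oscillate: a rise in prey is followed by a rise in predators, which drives the prey down and, after a delay, the predators as well.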

Results

We developed a new general strategy for constructing agent-based ecosystem models powered by multi-agent reinforcement learning and used it to create models of real ecosystems and explore to what extent these models were able to reproduce realistic ecosystem dynamics and animal behavior. Our ecosystem models were based on ten alpine areas, located in the Rhaetian Alps around the Stelvio National Park, in Italy (Fig. 1(A)). An overview of the flora and fauna of the Rhaetian Alps can be found in [40, 41, 42]. Our ecosystem models included: (1) primary producers represented by three types of vegetation, each with its own growth rate; (2) a large herbivore, hereafter referred to as the chamois (Rupicapra rupicapra), one of the mammalian herbivores in the region; and (3) an apex predator, hereafter referred to as the wolf (Canis lupus), one of the top mammalian predators in the area [42]. The primary producers were modeled collectively, while the chamois and wolves were modeled individually as agents.

FIGURE 1. (A): Aerial photograph from 2020 of an area in the Rhaetian Alps. (B): A spatial model of the same area with 100 × 100 cells. Blue represents water, while the three shades of green represent areas with three types of vegetation, with darker green representing vegetation that grows relatively fast. The white and gray dots represent chamois of different age, while the red dots represent wolves. Brown indicates vegetation cells that have recently been grazed by chamois. (C) and (D): Two of the many thousand synthetic spatial models with 50 × 50 cells that were used for training the behavioral model.
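A minimal sketch of how vegetation modeled collectively on a grid might be represented is given below. It is only an illustration under stated assumptions: the growth-rate values, the biomass scale from 0 to 1, the grazing amount, and all names (`GROWTH_RATE`, `make_grid`, `graze`, `regrow`) are hypothetical and not taken from the model described here.

```python
import random

# Hypothetical growth rates: biomass fraction regained per time step.
GROWTH_RATE = {"slow": 0.01, "medium": 0.03, "fast": 0.06}


def make_grid(size=100, seed=0):
    """Create a size x size grid of cells with a random vegetation type.

    Each cell starts fully grown (biomass 1.0). A real spatial model would
    assign vegetation types from land-cover data rather than at random.
    """
    rng = random.Random(seed)
    types = list(GROWTH_RATE)
    return [[{"veg": rng.choice(types), "biomass": 1.0} for _ in range(size)]
            for _ in range(size)]


def graze(grid, x, y, amount=0.5):
    """Remove up to `amount` biomass from cell (x, y); return what was eaten."""
    cell = grid[y][x]
    eaten = min(amount, cell["biomass"])
    cell["biomass"] -= eaten
    return eaten


def regrow(grid):
    """Advance vegetation by one time step, capped at full biomass."""
    for row in grid:
        for cell in row:
            rate = GROWTH_RATE[cell["veg"]]
            cell["biomass"] = min(1.0, cell["biomass"] + rate)


if __name__ == "__main__":
    grid = make_grid(size=100)
    print("eaten:", graze(grid, 3, 7))
    regrow(grid)
    print("biomass after one step:", grid[7][3]["biomass"])
```

With per-type growth rates like these, a grazed fast-growing cell returns to full biomass in fewer steps than a slow-growing one, which is the property that distinguishes the vegetation types in the spatial models.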
In the Alpine area, chamois and wolves have been able to coexist for hundreds of years. After a period when wolves were hunted to extinction, migrations and human reintroduction repopulated the area [43, 42]. We used geographical information to construct spatial models of the ten areas, discretized into 100 × 100 cells and populated with vegetation and agents representing individual chamois and wolves (Fig. 1(B)). After defining a dynamic model, which specifies the rules for reproduction, death, and the consequences of the agents’ actions, we used reinforcement learning to train a behavioral model, which computes the action of any agent at any location in any spatial model. The behavioral model was trained by using thousands of randomly populated “synthetic” spatial models, generated using Perlin noise [44] (Fig. 1(C)–(D)). The purpose of using multiple spatial models was to produce versatile behavioral models of chamois and wolves that can cope with a range of different environments. We then used our trained behavioral models to conduct experiments, allowing us to highlight different properties of our ecological models across several areas.

FIGURE 2. Maps of ten areas in the Rhaetian Alps based on multi-band Sentinel-2 images with 100 × 100 pixels. Light blue represents water, gray is used for rocks and bare soil, lilac stands for snow, yellow for scarcely vegetated areas, light green for pastures and grasslands, dark green for shrubs and tree-vegetated areas, while red is used for buildings and artificial settlements.

Coexistence experiments.
The purpose of these experiments was to see to what extent the chamois and wolf agents (hereafter, the chamois and wolves for simplicity) were able to coexist in spatial models of the ten areas of the Rhaetian Alps (Fig. 2). Ten simulations were run on each of the ten spatial models, starting from randomly spawned populations. Thus, there was a total of 100 simulations. Each simulation was run for up to 4,000 steps, corresponding to eight times the maximum lifetime of a wolf (which was set to 500 steps), but it was stopped as soon as an extinction event occurred (either the chamois or the wolves went extinct). In 95 of these 100 simulations, neither the chamois nor the wolves died out. In the remaining five simulations, the wolves died out (Supplementary Fig. 3). These simulations showed that the agents with behavior optimized through reinforcement learning had developed the ability to coexist in spatial models of the ten natural environments.

Behavioral experiment. We then ran an experiment to evaluate the ability of the framework to reproduce and maintain long-term realistic population dynamics and animal behavior, in the absence of external disruptive factors. Using one of the ten spatial models (Fig. 1(B)), we ran a simulation for 10,000 steps, which corresponds to 20 times the maximum lifetime of a wolf. The population dynamics observed in this experiment show that the chamois and wolves were indeed able to coexist for multiple generations in the same area. The number of agents alive at the same time during the simulation ranged between 64 and 307 for the chamois and between 7 and 33 for wolves.
These numbers reflect the expected imbalance between the abundances of prey and predators [45, 46]. Moreover, Lotka-Volterra dynamics arose as an emergent property of the model. Our simulation revealed a clear and consistent pattern where a chamois population surge is followed by a wolf surge, which leads to a chamois decline, followed by a wolf decline, before the cycle starts over again (Fig. 3(A)). This is remarkable because such dynamics had not been manually encoded in the behavior of the agents, but emerged from the simulation. The location of the chamois and wolf agents was monitored during the simulation so that heat maps of their spatial distribution could be constructed. The heat maps indicate that the chamois were relatively spread throughout the area (Fig. 3(B)), while the wolves were found mainly around the big lake (Fig. 3(C)). The statistics collected show that many chamois died at a young age and only a few reached the maximum age (Fig. 4(A)). Young wolves had high death rates, but those that survived past early life tended to live relatively long lives (Fig. 4(B)). A comparison of these statistics (Fig. 4) against available data about natural populations of chamois and wolves in the literature showed that our simulations led to realistic patterns [43]. The animal agents in the simulation appear to behave in a way similar to their natural counterparts (Supplementary Video 1). Chamois tend to move between vegetation and water sources; wolves tend to hunt chamois, and chamois tend to run away from wolves; both chamois and wolves tend to move along relatively straight trajectories, while avoiding obstacles such as lakes; and both chamois and wolves have a tendency to form groups.

Resilience experiments. The purpose of these experiments was to evaluate the resilience of the ten ecosystems (Fig. 1) to various degrees of environmental pressure in the form of habitat degradation, game hunting, and heat stress.
In particular, we wanted to see if we could locate tipping points, where the pressure would likely cause a decrease in biodiversity.

FIGURE 3. Population dynamics from the behavioral experiment. (A): Rescaled amounts of vegetation (green), chamois (blue), and wolves (red) during the simulation run. For each of these functional groups, the amount was normalized to the interval [0,1]. The max level 1.0 represents 35 wolves, 267 chamois, and fully grown vegetation across the map. (B) and (C): Heat maps for chamois and wolves, reflecting their spatial distribution during the simulation (position frequency per cell, plotted over X-position and Y-position).

Habitat degradation. Habitat degradation, potentially caused by agriculture, forestry, mining, infrastructure projects, or urban development, was simulated by replacing areas of fast-growing natural vegetation with slow-growing vegetation. This effectively reflects a transition from a naturally productive pasture to one that takes substantially longer to recover its full biomass after grazing. Thus, habitat degradation led to more limited resources for the chamois. We explored the effect of different levels of habitat degradation, ranging from 0% to 100% of the area covered with vegetation.
Each level of habitat degradation was tested on ten different spatial models and each model was simulated ten times using randomized initial animal populations. Thus, we ran 1,100 simulations in total. The simulations indicate a tipping point at ∼40% of habitat degradation (Fig. 5(A)).

FIGURE 4. Life history statistics from the behavioral experiment. Longevity distribution (age at death) of (A) chamois and (B) wolves. Offspring distribution (number of offspring) of the chamois (C) and wolves (D). The rightmost bar in plot (C) represents ten offspring or more.

Game hunting. To model game hunting, we randomly removed animals, at nine levels of hunting pressure (10% to 90% of the animals killed every 100 time steps). Wolf hunting and chamois hunting were studied separately. We ran ten simulations at each level of hunting, thus generating 90 data points for the chamois and 90 for the wolves. The simulations indicate that there are tipping points for hunting at, respectively, ∼50% for chamois and ∼40% for wolves (Fig. 5(C) and 5(D)), above which an extinction event occurred.
In all simulations where extinction events occurred, the wolves died out first, even when hunting targeted the chamois only. This reveals a higher sensitivity of the predator species in our system to increased mortality.

Heat stress. We explored the effect of prolonged heat stress, for instance resulting from extreme weather patterns linked to climate change. Increased temperature puts pressure on many animal species by causing their energy consumption (metabolic rate) to increase. We studied the effect of heat stress at eight different levels of increased energy consumption (in the range 0–35%) for both chamois and wolves. Each level of stress was simulated ten times, starting from the same initial environment in all cases. Thus, a total of 80 simulations were performed. In the simulations, there was virtually no effect of increased temperature up to a tipping point at ∼20%, after which the extinction risk increased sharply (Fig. 5(B)). In all simulations with extinction events, the wolves died out first.

FIGURE 5. Results from our resilience experiments. Each colored dot represents a simulation and its y-coordinate (episode length) indicates how long the simulation lasted. Each simulation (episode) was run for up to 4,000 steps, but it was stopped as soon as an extinction event occurred. Our experiments studied the episode length for different levels of (A) habitat degradation (fraction from 0.0 to 1.0); (B) metabolic energy consumption (energy factor from 1.00 to 1.35); (C) chamois hunting; and (D) wolf hunting (for (C) and (D), the fraction of animals killed every 100 steps, from 0.10 to 0.90).
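The design of the resilience experiments (a sweep over pressure levels, repeated simulations per level, episodes capped at 4,000 steps and stopped at extinction) can be sketched as follows. The simulator here is a stand-in, not the ecosystem model: `toy_episode_length` and its hazard parameters are hypothetical and exist only to make the sketch runnable, and reading a tipping point as the lowest level whose median episode ends before the cap is one possible convention, not necessarily the one used for Fig. 5.

```python
import random
from statistics import median

MAX_STEPS = 4000  # episode cap used in the resilience experiments


def toy_episode_length(level, rng, threshold=0.4, scale=0.01):
    """Stand-in for one ecosystem simulation (NOT the model of this paper).

    Below the hypothetical threshold no extinction occurs; above it, the
    time to extinction is drawn from an exponential distribution.
    """
    hazard = max(0.0, level - threshold) * scale
    if hazard == 0.0:
        return MAX_STEPS
    return min(MAX_STEPS, int(rng.expovariate(hazard)) + 1)


def run_sweep(levels, n_models, n_reps, simulate=toy_episode_length, seed=0):
    """Run n_models * n_reps episodes per level; return {level: [lengths]}."""
    rng = random.Random(seed)
    return {level: [simulate(level, rng) for _ in range(n_models * n_reps)]
            for level in levels}


def tipping_point(results, max_steps=MAX_STEPS):
    """Lowest level whose median episode ends before the cap, else None."""
    for level in sorted(results):
        if median(results[level]) < max_steps:
            return level
    return None


if __name__ == "__main__":
    # Habitat degradation: 11 levels (0% to 100%) x 10 spatial models x 10 runs.
    levels = [i / 10 for i in range(11)]
    results = run_sweep(levels, n_models=10, n_reps=10)
    print("simulations:", sum(len(v) for v in results.values()))  # 1100
    print("toy tipping point:", tipping_point(results))
```

The same harness covers the other experiments by changing the arguments: eight energy-factor levels with ten runs each give 80 simulations, and nine hunting levels with ten runs each give 90 data points per species.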

Discussion

We have constructed a highly flexible strategy for ecosystem modeling that captures animal behavior and population dynamics without the need for hand-coding behavioral rules. This constitutes a first step towards robust exploration of the impact of individual interventions (such as habitat degradation, selective hunting or ecological restoration) on ecosystem health, integrity and biodiversity.

Realism. Real ecosystems may contain thousands of species, millions of individuals, and countless known and unknown mechanisms that govern animal behavior and ecosystem dynamics. How ecosystems function and respond to changes depends strongly on phenomena that are not fully understood and commonly modeled as stochastic processes, for example reproduction, death, migration, genetic mutations, and climate and landscape changes [47]. Because of this complexity, ecosystem models must inevitably rely on simplifying assumptions, for example when defining food webs. The simplifying assumptions must be chosen with the intended use of the ecosystem model in mind, whether it is to explore ecological patterns, analyze different management interventions, or predict resilience and ecological tipping points. The simplifying assumptions that were used in our models include a food web focused on a few representative species within trophic groups, imposed area boundaries that do not account for migration, and a reproduction model that does not fully capture biological behaviors.
Yet, our results showed that the models were able to reproduce realistic ecological and behavioral patterns including population dynamics, life history statistics, foraging, and social behavior.

Scope. The behavioral and dynamic models used in our experiments can be trained to model different ecosystems in any geographical area. This gives the model considerable flexibility in its biogeographic scope and the biological species it can include. Our modeling framework can be extended with the addition of functional groups (for instance scavengers or omnivores) that can make the food web more complex and realistic. Additional land cover classes (e.g., more vegetation types) can also be included in the ecosystem model. After defining the properties of the different species and vegetation types, the ecosystem models can be trained through reinforcement learning following the procedure utilized in our experiments. There are limits to the number of agents that can be included in agent-based models, depending on the time available for training and inference, and the available computational resources. We used hundreds of agents in our experiments, while millions of agents have been used in other studies based on reinforcement learning [31], leaving substantial room for scaling up the scope of our ecosystem models.

Perception. Our animal agents receive input signals that can be divided into internal signals (interoception) and external signals (exteroception). The internal signals inform the agents about their status, in this case with regard to energy and hydration, while the external signals inform the agents about the quantity and location of food sources, water sources, competitors, predators, and obstacles in their surrounding environment. In reality, such information can originate from a combination of multiple senses, possibly together with memories of earlier observations. The external signals can be divided into short-range and long-range signals.
The short-range signals provide relatively precise quantitative information about each cell in the closest neighborhood surrounding the agent. The long-range signals, which we call “smells” for simplicity, provide the agents with less precise, aggregated information about cells outside their closest neighborhood. By perceiving and comparing the smell intensity in the nine cells surrounding it, the agent obtains approximate information about the amounts and direction of the sources of the smell. Thus, smell might help agents locate food, water bodies, competitors, and predators.

Emergent behavior. During training, the animal agents were directly rewarded for eating and drinking. Indirectly, they were also rewarded for surviving, since surviving enables them to eat and drink more in the future and thus receive a greater total reward. Thus, the agents were trained to eat, drink, and survive in a broad range of environments, just as real animals do. Our results from the coexistence experiment indicate that the trained agents are indeed able to survive in a range of spatial models based on real geographical data. This is not an obvious outcome, since the agents were never trained on empirical landscape models, but only on synthetic environments. For the chamois, surviving means being able to obtain food and water, while efficiently navigating in the presence of obstacles and actively escaping predation from wolves. For the wolves, it means being able to locate, chase and hunt chamois.
These behaviors, which the agents share with their natural counterparts, were also observed in our behavioral experiment (Supplementary Video 1). Just like real animals, our animal models were able to coexist for multiple generations in a range of different environments. This means that they had developed an ability to survive and reproduce by tackling the stream of challenges that their environment poses to their cognitive abilities. The behavioral strategies that emerged during training, along with the biologically realistic population dynamics, show that this approach based on reinforcement learning is a promising way to help overcome the challenges of modeling complex social-ecological systems.

Conclusions. Here we have shown how to tackle a main difficulty in agent-based ecosystem modeling by using machine learning rather than hand-coding for modeling animal behavior. We used this strategy to construct ecosystem models of ten different areas in the Rhaetian Alps. In these ecosystem models: (i) the chamois and wolves were able to coexist for long periods of time without any of them dying out; (ii) several characteristic patterns of animal behavior and population dynamics emerged; and (iii) ecological tipping points could be identified when the models were exposed to increasing levels of habitat degradation, game hunting, and heat stress. Our results indicate that it is possible to build agent-based ecosystem models that leverage machine learning and that the resulting ecosystem models are useful for analyzing and comparing different management policies from the perspective of ecology and sustainability.
Next steps in this research might include building and validating more detailed models of ecosystems where high-resolution time-series data about animal behavior and population dynamics exist, and introducing additional modeling dimensions such as sexual and seasonal reproduction, altitude, and temperature. As ecosystem twins increase in popularity, we hope that behavioral models powered by machine learning will pave the way for more realistic, scalable, and useful ecosystem models for policymakers and scientists alike.

Methods

Modeling strategy. We used a general strategy for constructing agent-based ecosystem models that combined three key ideas: (i) to model animals as reinforcement learning agents that make decisions based on their perception of the environment and their internal homeostatic signals; (ii) to reward agents for living long lives and thus, indirectly, for eating and drinking, avoiding predators and other lethal dangers, and navigating efficiently in the terrain; and (iii) to train the agents in many different environments so that they become flexible enough to survive in models of many different geographical areas with varying ecological conditions. Before describing in detail how our ecosystem models were developed, we outline our general strategy for ecosystem modeling.

Conceptual model. Given a set of ecosystems and ecological phenomena that one wishes to model, we first defined the building blocks, or the ontology, of our ecosystem models. A conceptual model consists of (i) a set of functional groups, which is divided into decision-making and non-decision-making functional groups, where decision-making functional groups (e.g., wolves and chamois) are modeled individually as agents, and non-decision-making functional groups (e.g., different kinds of vegetation) are modeled collectively; (ii) a set of agent properties, for example age, weight, max speed, max age, energy level, hydration level, position, observation space, and action space; (iii) a set of cell properties, for example, biological properties such as the abundance of each functional group, physical properties such as altitude and temperature, and landscape properties such as land-cover class (with values like rock, field, sand, water). The cell properties might also include smell intensity of different functional groups and (organisms associated with) land-cover classes.
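To make the ontology above more concrete, the following is a minimal Python sketch of how functional groups, agent properties, and cell properties might be represented. It is an illustration only and is not taken from the authors' code: the class names, the field selection, the 0-to-1 scaling of energy and hydration, and the group name "grass" are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class FunctionalGroup:
    """A functional group in the conceptual model (hypothetical representation)."""
    name: str
    decision_making: bool  # True: modeled individually as agents; False: modeled collectively


@dataclass
class Agent:
    """One individual of a decision-making functional group (subset of the listed properties)."""
    group: FunctionalGroup
    position: Tuple[float, float]
    age: int = 0
    energy: float = 1.0      # assumed scale: fraction of the maximum energy level
    hydration: float = 1.0   # assumed scale: fraction of the maximum hydration level
    max_age: int = 500       # the maximum age reported for the alpine model

    def __post_init__(self):
        # Only decision-making groups are modeled as individual agents.
        if not self.group.decision_making:
            raise ValueError("only decision-making groups are modeled as agents")


@dataclass
class Cell:
    """One grid cell with its cell properties."""
    land_cover: str                                            # e.g. "rock", "field", "sand", "water"
    abundance: Dict[str, float] = field(default_factory=dict)  # per functional group
    smell: Dict[str, float] = field(default_factory=dict)      # per smell source


wolf = FunctionalGroup("wolf", decision_making=True)
grass = FunctionalGroup("grass", decision_making=False)  # hypothetical vegetation group
agent = Agent(group=wolf, position=(2.5, 7.0))
cell = Cell(land_cover="field", abundance={"grass": 0.5})
```

Keeping non-decision-making groups as per-cell abundances, rather than as objects of their own, mirrors the statement that such groups are modeled collectively.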
Spatial model. We use spatial models to represent ecosystems at a given moment in time. A spatial model consists of a grid of cells, where each cell has its own cell properties, and a set of agents, each with its own agent properties (Fig. 1(B)).

Behavioral model. A behavioral model consists of the following components, for each decision-making functional group: (i) an observation space; (ii) an action space; and (iii) a policy network computing a function from the observation space to the action space. We use one policy network per decision-making functional group for computing the actions of each agent, based on the properties of the cells surrounding it and its own internal state (Fig. 6(A)).

Dynamic model. A dynamic model is a mechanism that takes as input a spatial model and the decisions (actions) of all agents of that spatial model, and outputs an updated spatial model. Dynamic models are typically defined as a set of update rules specifying how the primary producers grow, the consequences of the agents’ locomotion and feeding actions, and how the physical properties of the cells develop over time.

Training process. Given a conceptual model C and a dynamic model D, we use reinforcement learning to construct a behavioral model B. More precisely, we fix an architecture for the policy network of each decision-making functional group and train their parameters (weights and biases) with reinforcement learning. To this end, a reward signal is used, which provides an assessment of the impact of the agent’s actions.
For instance, actions leading to improved energy levels are typically assigned a positive reward, while actions leading to the death of an agent are assigned a negative reward. The goal of the training process is to optimize the parameters of the policy network toward maximizing the cumulative reward in the long run. The policy networks of all decision-making functional groups can be constructed simultaneously as follows: (1) Define a class of spatial models 𝒮 that are suitable for training; (2) Initialize the behavioral model B by using randomization; (3) Repeat the following until B is good enough: For every spatial model S ∈ 𝒮, use B and D to run a simulation starting from S, for a fixed number of steps, then update B using reinforcement learning. The reason for training the behavioral model B on a class of spatial models 𝒮 rather than a single spatial model is to achieve generalization properties [48]. When defining 𝒮, one may consider using synthetic data, which have been used successfully in supervised learning as well as reinforcement learning, e.g., when data are scarce [49].

Generic model. Finally, we put the pieces together and define a generic model, comprising three elements: (1) a conceptual model, (2) a dynamic model, and (3) a behavioral model. With a generic model, one may start with any spatial model based on the same conceptual model and then apply the behavioral and dynamic models repeatedly, to generate a sequence of spatial models that simulate how ecosystems develop over time.

Alpine model. Next we show how we used the general strategy described above to construct a generic model for alpine ecosystems.
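Before the alpine details, the general procedure described above (training on a class of spatial models, then simulating by alternating the behavioral and dynamic models) can be sketched in a few lines of Python. This is a schematic under stated assumptions rather than the authors' implementation: the function names `rollout` and `train` are hypothetical, the behavioral and dynamic models are plain callables, `update` stands in for whatever reinforcement learning update is used, and a fixed number of rounds replaces the "until B is good enough" criterion.

```python
def rollout(spatial_model, behavioral_model, dynamic_model, n_steps):
    """Apply B and D repeatedly; return the resulting sequence of spatial models."""
    trajectory = [spatial_model]
    for _ in range(n_steps):
        current = trajectory[-1]
        actions = behavioral_model(current)                 # B: decisions of all agents
        trajectory.append(dynamic_model(current, actions))  # D: update rules
    return trajectory


def train(training_models, behavioral_model, dynamic_model, update, n_steps, n_rounds):
    """Steps (1)-(3): simulate from every training model, then update B."""
    for _ in range(n_rounds):          # assumption: fixed rounds instead of a quality check
        for start in training_models:  # every spatial model S in the training class
            trajectory = rollout(start, behavioral_model, dynamic_model, n_steps)
            behavioral_model = update(behavioral_model, trajectory)  # placeholder RL update
    return behavioral_model


# Toy usage: a "spatial model" is just a number, and every action adds 1 to it.
toy_trajectory = rollout(0, lambda s: 1, lambda s, a: s + a, n_steps=3)
```

The same `rollout` function serves both purposes named in the text: inside `train` it produces the experience used to update B, and on its own it generates the sequence of spatial models that constitutes a simulation.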
Conceptual model. Our conceptual model included two decision-making functional groups (chamois and wolf) and three non-decision-making functional groups (three types of vegetation). We only used two land-cover types: land and water. The cell properties included biomass and smell of the functional groups, and smell of water. For both chamois and wolf agents, the observation space consisted of two internal variables (energy and hydration) and the cell properties of a 3 × 3 cell neighborhood, while the action space consisted of triplets of numbers encoding direction and speed.

Behavioral model. For simplicity we let all agents belonging to the same functional group share the same behavioral model. Thus, the decisions of all chamois are taken by the same policy network and similarly for the wolves. Note that different agents will usually perceive the environment differently, based on their current location, so the actions computed by the policy network will often differ among them at any given time. Moreover, to simplify the behavioral model further, we used a single policy network for both chamois and wolves. To that end, we added an input to the policy network for encoding the functional group (chamois or wolf). The input to the policy network also included the agent’s other properties and the properties of the 3 × 3 cells surrounding it, 144 inputs in all. The output of the policy network was a triplet of real numbers (∆x, ∆y, v), specifying movement in the direction (∆x, ∆y) at speed v. The speed v is a number in [0, 1] that specifies a fraction of the agent’s maximum speed. The network had a fully connected architecture with two hidden layers with 64 nodes each and with tanh as the activation function (Fig. 6(A)).

Dynamic model. We now describe how the set of agents, the agent properties, and the cell properties were updated at each time step.

Primary production.
When a chamois was on a cell, the biomass of the vegetation of that cell was set to 0, modeling grazing. Then the vegetation grew back gradually, reaching a predefined maximum biomass after 20, 100, or 200 steps, depending on the vegetation type.

[Figure 6 image omitted. Panel titles: (A) Input-output interface of a policy network; (B) Homeostatic reward. Panel (B) axes: Energy and Hydration, each from 0.0 to 1.0; color scale: Reward, from 0 to 1.]

FIGURE 6. (A) Interface of a shared policy network between chamois and wolf. The agent observes its own internal state, including its own functional group (left) and the cell properties of the 3 × 3 cells surrounding it (middle). Thus, the agent can perceive its own state, along with the presence of resources, obstacles, predators, and competitors in its near environment. Given this input, the agent uses a policy network to compute its movement, given by a direction and a speed (right). (B) The homeostatic reward used for training policy networks, quantifying the “well-being” of the agent in terms of its energy and hydration levels. An agent with both energy and hydration levels above the saturation threshold of 0.8 will not get additional reward from increasing its energy or hydration levels further.

Metabolism. A chamois in a cell containing vegetation would graze automatically. Thus, the energy level of the chamois was increased by up to 3% of its maximum energy level, while the vegetation disappeared and started to regrow at a speed that depended on its type. Similarly, a wolf in the same cell as a chamois would kill the chamois automatically.
Thus, the energy level of the wolf was increased by up to 18% of its maximum energy level. Moreover, both chamois and wolves drank automatically every time they were in a cell adjacent to water, thus reaching a hydration level of 100%.

Movement. The agents could use their actions to move, rest, feed, and drink. In fact, resting corresponds to moving at speed 0, while eating and drinking happened automatically, as mentioned. Although the space was divided into a discrete grid of cells, we kept track of the exact position of each agent in continuous space, by including the coordinates (x, y) ∈ ℝ² of its position among the agent properties. The agents could move in any direction and at different speeds, as determined by the output of the policy network. The maximum speed varied with the age of the agent and was set to increase linearly until an age of 100 time units, after which it plateaued until 250 units, whereupon it declined linearly until 500 units (i.e., the maximum age). We set the maximum speed of the wolf to slightly exceed that of the chamois. The rule for updating the position of an agent was as follows: If the position of an agent at time t is (x_t, y_t) and the output of the policy network of the agent is (∆x, ∆y, v), then its position (x_{t+1}, y_{t+1}) at time t + 1 will be (x_t + ∆x · v, y_t + ∆y · v).

Reproduction. Both chamois and wolves reproduced with a probability that was set to 0.1, given that their energy level was above a threshold, here set to 95% and 50% of their maximal energy level for wolves and chamois, respectively.

Death.
Individual chamois and wolves would die from starvation if they ran out of energy, from thirst if they ran out of hydration, from drowning if they entered a water cell, and from old age if they reached their predefined maximum age, which was set to 500 steps. In addition, the chamois would die from predation if they were caught by a wolf.

Smells. Once the other cell properties had been updated, we computed the smells. We used a basic concept of smell. For example, to compute the smell of wolf or water in a certain cell, we simply took the Euclidean distance to the closest wolf or closest water cell.

Training process. Reinforcement learning was used to train the policy networks of the chamois and the wolf. We used a homeostatic reward signal for each agent, which measures its “well-being”, defined as a quadratic function of its energy and hydration levels (Fig. 6(B)). Thus, the policy network was optimized to keep the energy and hydration levels of each agent above a threshold that was set to 80% of the maximum energy and hydration level, respectively. The policy network was trained with reinforcement learning using the PPO algorithm from Stable-Baselines3 [50]. For training, we created thousands of synthetic spatial models by using Perlin noise [44], which is widely used in the movie and gaming industries for generating artificial landscapes. More precisely, the spatial models were created by first defining a landscape with 50 × 50 cells using Perlin noise and then randomly spawning 40 chamois and 10 wolves on that grid (Fig. 1(E)–(F)).
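The smell rule stated earlier in this section (the Euclidean distance to the closest wolf or water cell) is simple enough to sketch directly. The Python below is an illustrative brute-force version, not the authors' code: the function name `smell_map`, the use of integer cell indices for both cells and sources, and the choice of infinity when no source exists are assumptions.

```python
import math


def smell_map(width, height, sources):
    """Return grid[row][col] = Euclidean distance to the closest source cell.

    `sources` is an iterable of (col, row) cell indices, e.g. the cells that
    contain a wolf or water. Cells get math.inf when there is no source at all
    (an assumption; the text does not cover that case).
    """
    sources = list(sources)
    return [
        [
            min((math.hypot(col - s_col, row - s_row) for s_col, s_row in sources),
                default=math.inf)
            for col in range(width)
        ]
        for row in range(height)
    ]


# Hypothetical 4 x 3 grid with water in the top-left and bottom-right cells.
water_smell = smell_map(4, 3, sources=[(0, 0), (3, 2)])
```

Under this rule a smaller value means a closer source, so an agent comparing the values across its 3 × 3 neighborhood can infer the approximate direction of the source from where the value decreases.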
At the start of each training episode, a new spatial model (Perlin world) was constructed using randomization. By training on many Perlin worlds, the agents were encouraged to learn general behavioral rules that could be useful in many spatial models, instead of learning the specifics of a single spatial model. For the sake of efficiency, the number of agents was kept constant during training by automatically adding a new individual of the same functional group whenever a death occurred. We used intermediate checkpoints of the model obtained during training to monitor their ability to maintain stability, meaning that none of the functional groups goes extinct. During validation, episodes were run for 4,000 time steps, but they were aborted as soon as one of the functional groups (specifically the chamois or wolf) went extinct. More information about the training process, including the choice of algorithm, reward signal, and hyperparameters, can be found in the Supplementary Information.

Spatial models. As the basis for our spatial models, we used land-cover maps of ten non-overlapping 4 × 4 km areas in the Rhaetian Alps, all of which contain water bodies (Fig. 2). The maps were obtained by classifying multiband satellite images retrieved in 2023 by the Copernicus Sentinel-2 fleet. To create spatial models from these maps, we populated them with the three types of vegetation, which were distributed following the land-cover types and initially set to 50% of their maximum biomass in each cell. Moreover, 100 chamois and 25 wolves were randomly spawned on the land cells of the map. The agents were spawned randomly since, as far as we are aware, no high-resolution population data exist for the ten areas considered.

DATA AVAILABILITY All data supporting this research are available in the main text and the Supplementary Information. Original data can be obtained from the corresponding authors upon request.
CODE AVAILABILITY The code of the ecosystem models is implemented in Python v.3 and is available in the repository https://gitlab.com/ecotwin/abm_paper_2024.git.
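The sketch below is not taken from that repository. It only illustrates, in dependency-free Python, the shape of the shared policy network described in Methods (144 inputs, two fully connected hidden layers of 64 tanh units, three outputs) together with the stated position update rule. Several details are assumptions made for the sake of a runnable example: the weight initialization, the logistic squashing that keeps v in [0, 1], and all function names. The actual networks were trained with PPO from Stable-Baselines3, which is not reproduced here.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_OUTPUTS = 144, 64, 3  # sizes stated in the text


def init_layer(n_in, n_out, rng):
    """Random weights, zero biases (the initialization scheme is an assumption)."""
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, inputs, activation=None):
    """Fully connected layer: activation(W @ inputs + b)."""
    weights, biases = layer
    outputs = [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]
    return [activation(o) for o in outputs] if activation else outputs


def make_policy(seed=0):
    """Build an untrained 144-64-64-3 network as a list of three layers."""
    rng = random.Random(seed)
    sizes = [N_INPUTS, N_HIDDEN, N_HIDDEN, N_OUTPUTS]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def act(policy, observation):
    """Map a 144-value observation to (dx, dy, v) with v in [0, 1]."""
    hidden = dense(policy[0], observation, math.tanh)
    hidden = dense(policy[1], hidden, math.tanh)
    dx, dy, raw_v = dense(policy[2], hidden)
    v = 1.0 / (1.0 + math.exp(-raw_v))  # assumption: logistic squashing of the speed output
    return dx, dy, v


def step_position(x, y, dx, dy, v):
    """Position update as stated in the text: (x + dx * v, y + dy * v)."""
    return x + dx * v, y + dy * v


policy = make_policy()
dx, dy, v = act(policy, [0.0] * N_INPUTS)
new_position = step_position(10.0, 5.0, dx, dy, v)
```

Because one network serves both species, the functional group has to be part of the observation vector, which is why the input size is fixed at 144 regardless of whether the agent is a chamois or a wolf.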

References

[1] Gretchen C. Daily and Pamela A. Matson. Ecosystem services: from theory to implementation. Proceedings of the National Academy of Sciences, 105(28):9455–9456, 2008.
[2] Alexandre Antonelli. The Hidden Universe: Adventures in Biodiversity. University of Chicago Press, 2022.
[3] Alice C. Hughes and Edward R. Grumbine. The Kunming-Montreal Global Biodiversity Framework: what it does and does not do, and how to improve it. Frontiers in Environmental Science, 11:1281536, 2023.
[4] Daniele Silvestro, Stefano Goria, Thomas Sterner, and Alexandre Antonelli. Improving biodiversity protection through artificial intelligence. Nature Sustainability, 5(5):415–424, 2022.
[5] S. R. Weiskopf, B. J. E. Myers, M. I. Arce-Plata, J. L. Blanchard, S. Ferrier, E. A. Fulton, M. Harfoot, F. Isbell, J. A. Johnson, A. S. Mori, E. Weng, Z. V. Harmáčková, M. C. Londoño-Murcia, B. W. Miller, L. M. Pereira, and I. M. D. Rosa. A conceptual framework to integrate biodiversity, ecosystem function, and ecosystem service models. BioScience, 72(11):1062–1073, 2022.
[6] William L. Geary, Michael Bode, Tim S. Doherty, Elizabeth A. Fulton, Dale G. Nimmo, Ayesha I.T. Tulloch, Vivitskaia J.D. Tulloch, and Euan G. Ritchie. A guide to ecosystem models and their environmental applications. Nature Ecology & Evolution, 4(11):1459–1471, 2020.
[7] Claes Strannegård, Niklas Engsner, Rasmus Lindgren, Simon Olsson, and John Endler. AI tool for exploring how economic activities impact local ecosystems. In Kohei Arai, editor, Intelligent Systems and Applications, pages 690–709, Cham, 2024. Springer Nature Switzerland.
[8] Alfred James Lotka. Elements of Physical Biology.
Williams & Wilkins, 1925.
[9] Roger Arditi and Lev R. Ginzburg. Coupling in predator-prey dynamics: ratio-dependence. Journal of Theoretical Biology, 139(3):311–326, 1989.
[10] Gordon L. Swartzman. Simulation modeling of material and energy flow through an ecosystem: methods and documentation. Ecological Modelling, 7(1):55–81, 1979.
[11] Brad DeYoung, Mike Heath, Francisco Werner, Fei Chai, Bernard Megrey, and Patrick Monfray. Challenges of modeling ocean basin ecosystems. Science, 304(5676):1463–1466, 2004.
[12] Karline Soetaert and Peter MJ Herman. A practical guide to ecological modelling: using R as a simulation platform. Springer, 2009.
[13] Tomo Royama. Analytical population dynamics, volume 10. Springer Science & Business Media, 2012.
[14] Mathieu Colléter, Audrey Valls, Jérôme Guitton, Morissette Lyne, Francisco Arreguín-Sánchez, Villy Christensen, Didier D Gascuel, and Daniel Pauly. EcoBase: a repository solution to gather and communicate information from EwE models. PhD thesis, Fisheries Centre, University of British Columbia, Canada, 2013.
[15] Oskar Hagen, Benjamin Flück, Fabian Fopp, Juliano S. Cabral, Florian Hartig, Mikael Pontarp, Thiago F. Rangel, and Loïc Pellissier. gen3sis: A general engine for eco-evolutionary simulations of the processes that shape Earth’s biodiversity. PLoS Biology, 19(7):e3001340, 2021.
[16] Uri Wilensky and William Rand. An Introduction to Agent-Based Modeling: Modeling Natural, Social, and Engineered Complex Systems with NetLogo. The MIT Press, 2015.
[17] Donald L. DeAngelis and Volker Grimm. Individual-based models in ecology after four decades. F1000Prime reports, 6, 2014.
[18] Andrew Crooks, Nick Malleson, Ed Manley, and Alison Heppenstall. Agent-Based Modelling and Geographical Information Systems: a practical primer. SAGE Publications, 2019.
[19] Nathan D. Rollins, C. Michael Barton, Sean Bergin, Marco A. Janssen, and Allen Lee. A computational model library for publishing model documentation and code. Environmental Modelling & Software, 61:59–64, 2014.
[20] Adam J. McLane, Christina Semeniuk, Gregory J. McDermid, and Danielle J. Marceau. The role of agent-based models in wildlife ecology and management. Ecological modelling, 222(8):1544–1556, 2011.
[21] Emilie Lindkvist, Nanda Wijermans, Tim M. Daw, Blanca Gonzalez-Mon, Alfredo Giron-Nava, Andrew F. Johnson, Ingrid van Putten, Xavier Basurto, and Maja Schlüter. Navigating complexities: agent-based modeling to support research, governance, and management in small-scale fisheries. Frontiers in Marine Science, 6:733, 2020.
[22] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction. MIT press, 2018.
[23] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
[24] Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo Sprechmann, Alex Vitvitskyi, Zhaohan Daniel Guo, and Charles Blundell. Agent57: Outperforming the Atari Human Benchmark. In International Conference on Machine Learning, pages 507–517. PMLR, 2020.
[25] David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, et al. A general reinforcement learning algorithm that masters chess, shogi, and go through self-play.
Science, 362(6419):1140–1144, 2018.
[26] Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, and Timothy Lillicrap. Mastering diverse control tasks through world models. Nature, pages 1–7, 2025.
[27] Daniele Silvestro, Stefano Goria, Ben Groom, Piotr Jacobsson, Thomas Sterner, and Alexandre Antonelli. Using artificial intelligence to optimize ecological restoration for climate and biodiversity. bioRxiv, pages 2025–01, 2025.
[28] Wei Zhang, Andrea Valencia, and Ni-Bin Chang. Synergistic integration between machine learning and agent-based modeling: A multidisciplinary review. IEEE Transactions on Neural Networks and Learning Systems, 2021.
[29] Lorenzo Canese, Gian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino, Marco Re, and Sergio Spanò. Multi-agent reinforcement learning: A review of challenges and applications. Applied Sciences, 11(11):4948, 2021.
[30] Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, Andrew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782):350–354, 2019.
[31] Jun Yamada, John Shawe-Taylor, and Zafeirios Fountas. Evolution of a complex predator-prey ecosystem on large-scale multi-agent deep reinforcement learning. In 2020 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2020.
[32] Peter Sunehag, Guy Lever, Siqi Liu, Josh Merel, Nicolas Heess, Joel Z Leibo, Edward Hughes, Tom Eccles, and Thore Graepel. Reinforcement learning agents acquire flocking and symbiotic behaviour in simulated ecosystems.
In Artificial life conference proceedings, pages 103–110. MIT Press, 2019.
[33] Ivo Siekmann and James M. Osborne. Editorial: Do individuals matter? - Individual-based versus population-based models applied to biology and health. Frontiers in Applied Mathematics and Statistics, 9, 2023.
[34] Donald L DeAngelis and Stephanie G Diaz. Decision-making in agent-based modeling: A current review and future prospectus. Frontiers in Ecology and Evolution, 6:237, 2019.
[35] Himanshu Kaul and Yiannis Ventikos. Investigating biocomplexity through the agent-based paradigm. Briefings in Bioinformatics, 16(1):137–152, 2015.
[36] Carl Walters and Villy Christensen. Adding realism to foraging arena predictions of trophic flow rates in ecosim ecosystem models: shared foraging arenas and bout feeding. Ecological Modelling, 209(2-4):342–350, 2007.
[37] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
[38] John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873):583–589, 2021.
[39] Ilan Price, Alvaro Sanchez-Gonzalez, Ferran Alet, Tom R Andersson, Andrew El-Kadi, Dominic Masters, Timo Ewalds, Jacklynn Stott, Shakir Mohamed, Peter Battaglia, et al. Probabilistic weather forecasting with machine learning. Nature, 637(8044):84–90, 2025.
[40] Jonathan Lenoir, Jens-Christian Svenning, Stefan Dullinger, Harald Pauli, Wolfgang Willner, Antoine Guisan, Pascal Vittoz, Thomas Wohlgemuth, Niklaus Zimmermann, and Jean-Claude Gégout. The Alps Vegetation Database—a geo-referenced community-level archive of all terrestrial plants occurring in the Alps. Biodivers. Ecol, 4:331–332, 2012.
[41] Rodolfo Gentili, Stefano Armiraglio, Graziano Rossi, Sergio Sgorbati, and Carlo Baroni.
Floristic patterns, ecological gradients and biodiversity in the composite channels (Central Alps, Italy). Flora-Morphology, Distribution, Functional Ecology of Plants, 205(6):388–398, 2010.
[42] A. Gazzola, E. Avanzinelli, I. Bertelli, A. Tolosano, P. Bertotto, R. Musso, and M. Apollonio. The role of the wolf in shaping a multi-species ungulate community in the Italian western Alps. Italian Journal of Zoology, 74(3):297–307, 2007.
[43] Wolf Alpine Group. The wolf Alpine population in 2020-2024 over 7 countries. Technical report for LIFE WolfAlps EU project LIFE18 NAT/IT/000972, Action C4. Technical report, Life Wolfalps EU project team, https://www.lifewolfalps.eu/en/archivi/technical-reports-en/, 2024.
[44] Ken Perlin. An image synthesizer. ACM Siggraph Computer Graphics, 19(3):287–296, 1985.
[45] Håkan Sand, John A Vucetich, Barbara Zimmermann, Petter Wabakken, Camilla Wikenros, Hans C. Pedersen, Rolf O. Peterson, and Olof Liberg. Assessing the influence of prey–predator ratio, prey age structure and packs size on wolf kill rates. Oikos, 121(9):1454–1463, 2012.
[46] Ivan Palmegiani, Andrea Gazzola, and Marco Apollonio. Wolf diet and its impact on the ungulates community in a new recolonized area of Western Alps: Gran Paradiso National Park. Folia Zoologica, 62(1):59–66, 2013.
[47] Russell Lande, Steinar Engen, and Bernt-Erik Saether. Stochastic Population Dynamics in Ecology and Conservation. Oxford University Press, USA, 2003.
[48] Karl Cobbe, Oleg Klimov, Chris Hesse, Taehoon Kim, and John Schulman. Quantifying generalization in reinforcement learning. In International Conference on Machine Learning, pages 1282–1289.
PMLR, 2019.
[49] Chunli Liu, Carmine Ventre, and Maria Polukarov. Synthetic data augmentation for deep reinforcement learning in financial trading. In Proceedings of the third ACM international conference on AI in finance, pages 343–351, 2022.
[50] Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-baselines3: Reliable reinforcement learning implementations. Journal of Machine Learning Research, 22(268):1–8, 2021.

ACKNOWLEDGMENTS CS acknowledges funding from the Sten A Olsson Foundation for Research and Culture, Stiftelsen Sävstaholm, and Erik och Lily Philipsons Stiftelse. DS received funding from ETH Zurich. AA and DS acknowledge financial support from the Swedish Research Council (2019-05191; 2024-04303) and the Swedish Foundation for Strategic Environmental Research MISTRA within the framework of the research programme BIOPATH (F 2022/1448). AA also acknowledges funding from RBG Kew Development. The computations were enabled by resources provided by the National Academic Infrastructure for Supercomputing in Sweden (NAISS), partially funded by the Swedish Research Council through grant agreement no. 2022-06725. We thank Rhian Smith for feedback on this manuscript.



License: CC-BY-4.0