LLM-Based Generative Agents Simulate Mental-Health Trajectories Under Adverse Socio-Environmental Conditions

Joseph Kambeitz

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9291631/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1.

Abstract

Mental disorders are largely shaped by socio-environmental adversity, yet causal experiments on major stressors are typically infeasible in humans. Here, we propose a simulation framework using LLM-based generative agents to model the behaviour and mental health of individuals. A total of 25 agents were simulated for 5 days, with 6 agents exposed to personalised adverse events, while mental health was continuously monitored with established clinical instruments.
Adversity produced acute increases in depression and stress-related symptoms relative to controls, with elevated scores persisting thereafter. Moreover, the impact of adverse events was moderated by vulnerability factors: agents with high neuroticism showed stronger increases than agents with low neuroticism. Analysis of agents’ conversations revealed that exposed agents disclosed personal difficulties and sought support at rates that increased with neuroticism level, providing converging behavioural evidence. Overall, the present results support the feasibility of LLM-based generative agents for controlled, counterfactual experiments on socio-environmental drivers of mental health.

Biological sciences/Psychology; Social science/Psychology; Health sciences/Risk factors

Introduction

Mental disorders such as major depression constitute a substantial and growing share of the global burden of disease (GBD 2021 Diseases and Injuries Collaborators, 2024), with prevalence rising sharply among young people in recent years (McGorry et al., 2024). Social and environmental factors such as job loss, financial hardship, or social isolation are among the strongest determinants of mental health (Alon et al., 2024; Arango et al., 2021). According to the stress-vulnerability model, such environmental stressors interact with pre-existing individual vulnerabilities to precipitate mental disorders (Monroe and Simons, 1991; Zubin and Spring, 1977). In particular, personality traits such as neuroticism, characterised by heightened emotional reactivity and proneness to negative affect, are strong predictors of mental health (Kotov et al., 2010) and moderate the depressogenic impact of stressful life events (Kendler et al., 2004; Ormel et al., 2001). Most importantly, many socio-environmental influences are potentially modifiable and thus represent significant opportunities for intervention and prevention (Dragioti et al., 2022).
Yet our ability to establish how specific adversities cause, sustain, or exacerbate symptoms remains fundamentally limited. Ethical constraints prohibit experimental manipulation of major life stressors in humans; observational studies are beset by confounding, reverse causality, and limited capacity to capture dynamic processes; and statistical methods for causal inference from observational data rely on assumptions that are often untestable (Marinescu et al., 2018; Öngür and Paulus, 2025). The result is that interventions targeting socio-environmental risk factors remain insufficiently grounded in causal knowledge (Scheffer et al., 2024a, 2024b).

Agent-based modelling offers a complementary strategy: by simulating individual humans as autonomous agents that act and interact within virtual environments, researchers gain complete experimental control over the factors under investigation (Bonabeau, 2002). Early applications in public health have modelled health-related behaviours including smoking, physical activity, and transportation use (Tracy et al., 2018; Yang et al., 2020). However, for studying mental health specifically, classical agent-based modelling faces important limitations: environments are typically highly simplified, agent behaviour is restricted to predefined action sets, and outcomes focus on observable behaviour rather than the subjective experiences and internal states central to psychopathology (Kambeitz and Meyer-Lindenberg, 2025).

Large language models (LLMs) have the potential to overcome these constraints. Trained on vast corpora of human-generated text encoding diverse aspects of thought, emotion, and social interaction, LLMs can generate contextually appropriate behaviour and natural conversations (Chen et al., 2024; Shanahan et al., 2023).
Several reviews have argued that generative AI may fundamentally reshape how the social and behavioural sciences study human populations, for example by enabling large-scale simulations, automated content analysis, and novel forms of experimental research (Bail, 2024; Sadée et al., 2025). Crucially for psychiatric applications, LLMs also encode structured knowledge of psychopathology (Kambeitz et al., 2025), and inducing adverse psychological contexts produces contextually appropriate changes on clinical instruments (Ben-Zion et al., 2025; Coda-Forno et al., 2023).

Building on these capabilities, LLMs have the potential to augment agent-based modelling and to create generative agents that exhibit a range of plausible human behaviour, such as navigating complex social environments or forming relationships (Li et al., 2024; Park et al., 2023). Recent advances have demonstrated the scalability of this paradigm: generative agent simulations of up to 1,000 individuals replicate participants' survey responses with accuracy approaching human test–retest reliability (Park et al., 2024). Moreover, large-scale platforms now integrate LLMs with agent-based frameworks for social network modelling, simulating up to one million agents (Ferraro et al., 2024; Yang et al., 2024). These developments suggest that generative agents are maturing from proof-of-concept demonstrations toward tools capable of addressing substantive scientific questions. We recently proposed that such agents could be embedded in virtual environments and assessed using standardised mental health instruments to investigate socio-environmental determinants of mental health under full experimental control (Kambeitz and Meyer-Lindenberg, 2025).
In the present study, we provide the first empirical implementation of this framework, testing whether LLM-based generative agents can capture core features of the stress–vulnerability model of mental disorders under controlled experimental conditions.

Results

A total of 25 generative agents were simulated interacting in a virtual environment over five days (Fig. 1). During a three-day baseline period, agents were assessed twice daily with respect to depressive (PHQ-9) and stress-related symptoms (K10). On day four, six agents were exposed to individually tailored adverse life events of varying severity, including a family medical emergency, financial threat, and workplace conflict, while 19 agents served as unexposed controls. To test personality moderation, the simulation was branched at the end of day three into three parallel runs in which the six exposed agents were assigned low, average, or high neuroticism profiles before experiencing identical adverse events.

Both PHQ-9 and K10 showed marked increases in symptom scores in exposed agents compared to non-exposed controls (Fig. 2A), with elevated scores persisting across all subsequent assessment timepoints (all p < 0.05). Analysis of the trajectory of individual items of the PHQ-9 and the K10 indicated symptom score increases following exposure to adverse events in most items (Supplementary Figs. 1 & 2). Interestingly, the increase in symptom scores varied both across questionnaire items and across types of adverse event (Fig. 2B and 2C), suggesting that the simulations exhibit the kind of symptom variability across events and agents that would be expected in empirical investigations.

For depression symptoms (PHQ-9), a mixed-effects model adjusting for baseline scores revealed a significant main effect of neuroticism (χ²(2) = 32.13, p < 0.001), indicating that the increase in depression symptoms was moderated by personality style (Fig. 3A).
The fixed effects explained 19.5% of the variance in post-intervention PHQ-9 scores (marginal R² = 0.195). Post-hoc Tukey-corrected comparisons showed significantly lower PHQ-9 scores in the low-neuroticism condition compared to the average condition (p = 0.003, SMD = -2.55) and compared to the high-neuroticism condition (p < 0.001, SMD = -3.05). The difference between average and high neuroticism did not reach statistical significance (p = 0.676, SMD = -0.5).

For stress-related symptoms (K10), the same modelling approach showed a significant main effect of neuroticism (χ²(2) = 52.5, p < 0.001), indicating that the increase in stress symptoms was also moderated by personality style (Fig. 3B). Neuroticism and baseline distress together explained 36.7% of the variance in post-intervention K10 scores (marginal R² = 0.367). Post-hoc analyses indicated significantly lower K10 scores in the low-neuroticism condition compared to the average condition (p = 0.002, SMD = -2.78) and compared to the high-neuroticism condition (p < 0.001, SMD = -4.1). The contrast between average and high neuroticism was not statistically significant (p = 0.105, SMD = -1.32).

Looking at each individual item of the PHQ-9 and the K10, we detected moderation by neuroticism after FDR correction for each item except for suicidal ideation (Fig. 3C and 3F). The strongest neuroticism moderation effects were observed for anxiety-related symptoms in the K10 such as nervousness and restlessness (Fig. 3D and 3E). This pattern is consistent with the conceptual overlap between neuroticism (defined partly by anxiety proneness and emotional reactivity) and anxiety-related symptom content (Kotov et al., 2010). The fact that the simulation reproduces this differential sensitivity rather than uniformly inflating all items provides evidence that manipulating neuroticism leads to some degree of symptom specificity rather than acting as a nonspecific amplifier of distress scores.
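To make the item-wise FDR correction concrete, the following is a minimal sketch rather than the authors' code. It assumes the Benjamini-Hochberg procedure (the text specifies only "FDR"), and both the function name `bh_adjust` and the p-values are hypothetical placeholders, not results from this study.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the tests, sorted by ascending raw p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for five questionnaire items (NOT study data).
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = bh_adjust(raw)
# An item counts as moderated if its adjusted p-value is at most alpha = 0.05.
significant = [p <= 0.05 for p in adj]
```

In R, where the analyses were run, the equivalent adjustment under this assumption would be `p.adjust(p, method = "BH")`.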
Qualitative analysis indicated that agents store adverse experiences as memories and build secondary memories by reflecting on their current situation (e.g. John is "...feeling anxious about the upcoming disciplinary meeting"). Agents also adapt their behaviour following adverse events. For example, after hearing that her café is in financial trouble, Isabella Rodriguez is "reviewing the letter regarding the insolvency review" and "contacting the bank to address financial concerns". Likewise, after hearing that her scholarship might not be extended, Ayesha Khan spends time across multiple actions such as "preparing documentation for the appeal meeting", "gathering necessary documents", "writing a summary", "reviewing the appeal requirements" and "finalizing the documentation".

Moreover, agents disclose their adverse experiences to other agents. As an example, Maria Lopez discloses to 3 agents that her mother suffered an aneurysm ("I'm still feeling a bit overwhelmed thinking about everything going on with my mom", "It's been a tough time with my mom in the hospital, so knowing I can talk to you makes things a bit easier") and other agents respond in a supportive manner (e.g. Hailey responds "I'm so sorry to hear about your mom, Maria. That must be really tough for you. I completely understand if you need to take a step back...", Fig. 3). Lastly, after hearing about others' adverse experiences, agents build memories of these events, indicating that adverse events spread socially through the agent community, a pattern consistent with evidence that emotional states propagate through social networks in human populations (Fowler and Christakis, 2008).

To quantify these conversational patterns, we used GPT-4.1 to classify each post-event conversation for five behavioural features and compared rates between exposed and control agents (Fig. 4, Table 1).
Across all neuroticism conditions, exposed agents showed significantly higher rates of personal difficulty disclosure, negative emotion expression, and support seeking compared to controls (Table 1). Notably, these conversational features showed a dose-response pattern with neuroticism: high-neuroticism agents disclosed personal difficulties in 27.3% of conversations versus 14.8% for low-neuroticism agents, and expressed negative emotions in 28.8% versus 13.1% of conversations. Conversation partners of exposed agents also showed elevated empathy and practical help responses, with effect sizes increasing alongside the exposed agent's neuroticism level (Table 1). This convergence between standardised symptom measures and naturalistic conversational behaviour suggests that the neuroticism manipulation propagates beyond questionnaire responses into social interaction patterns. Practical help offered by conversation partners was the only feature that did not differ significantly between exposed and control agents in the average-neuroticism condition (p = 0.306), suggesting that the conversational classification captures meaningful variation rather than uniformly inflating all social features in exposed agents.

Table 1. Conversational behaviour features in post-adverse-event conversations: exposed vs. control agents across neuroticism conditions. Exposed and control rates represent proportions of conversations containing each feature. χ² = Pearson chi-squared statistic; φ = phi coefficient (effect size); OR = odds ratio from Fisher’s exact test. n/a indicates undefined OR (zero events in control group). All tests are two-sided.

| Condition | Feature | Exposed | Control | χ² | p | φ | OR | Fisher p |
|---|---|---|---|---|---|---|---|---|
| Low neuroticism | Negative Emotion Expressed | 13.1% | 1.7% | 15.0 | < 0.001 | 0.21 | 8.4 | < 0.001 |
| Low neuroticism | Partner Empathy Shown | 18.0% | 1.7% | 26.7 | < 0.001 | 0.28 | 12.2 | < 0.001 |
| Low neuroticism | Partner Practical Help | 27.9% | 7.7% | 18.5 | < 0.001 | 0.23 | 4.6 | < 0.001 |
| Low neuroticism | Personal Difficulty Mentioned | 14.8% | 2.1% | 16.5 | < 0.001 | 0.22 | 8.0 | < 0.001 |
| Low neuroticism | Support Seeking | 14.8% | 0.7% | 27.9 | < 0.001 | 0.28 | 24.2 | < 0.001 |
| Average | Negative Emotion Expressed | 17.1% | 0.7% | 34.7 | < 0.001 | 0.32 | 27.5 | < 0.001 |
| Average | Partner Empathy Shown | 21.1% | 1.8% | 35.4 | < 0.001 | 0.32 | 14.1 | < 0.001 |
| Average | Partner Practical Help | 15.8% | 10.7% | 1.0 | 0.306 | 0.05 | 1.6 | 0.230 |
| Average | Personal Difficulty Mentioned | 23.7% | 0.7% | 53.6 | < 0.001 | 0.39 | 41.3 | < 0.001 |
| Average | Support Seeking | 14.5% | 0.0% | 36.1 | < 0.001 | 0.32 | n/a | < 0.001 |
| High neuroticism | Negative Emotion Expressed | 28.8% | 0.3% | 75.7 | < 0.001 | 0.46 | 113.0 | < 0.001 |
| High neuroticism | Partner Empathy Shown | 31.8% | 2.1% | 62.8 | < 0.001 | 0.42 | 21.5 | < 0.001 |
| High neuroticism | Partner Practical Help | 31.8% | 7.0% | 29.7 | < 0.001 | 0.29 | 6.2 | < 0.001 |
| High neuroticism | Personal Difficulty Mentioned | 27.3% | 0.7% | 65.8 | < 0.001 | 0.43 | 52.4 | < 0.001 |
| High neuroticism | Support Seeking | 22.7% | 0.0% | 62.4 | < 0.001 | 0.42 | n/a | < 0.001 |

Discussion

This study demonstrates that LLM-based generative agents can exhibit structured, longitudinal symptom changes on standardised instruments following experimentally introduced adversity, with moderation by neuroticism consistent with the stress–vulnerability model (Monroe and Simons, 1991; Zubin and Spring, 1977). These findings support the feasibility of using generative agents as controllable experimental substrates in which exposures, vulnerability factors, and outcomes can be manipulated without the ethical and practical constraints of human experimentation.

A key contribution is the experimental control afforded by simulation. Observational studies robustly link adversity to mental health outcomes, but causal interpretation is limited by confounding and reverse causality.
In contrast, the present design introduces adversity exogenously within a closed computational world and tracks within-agent symptom changes over time. This makes it possible to ask counterfactual questions, for example "What would this agent's trajectory have been without the event?" or "How would the same event unfold under altered vulnerability?", that are typically inaccessible in clinical cohorts. By implementing counterfactual branches from the same pre-event world state, we reduce between-run variability and enable direct comparison of post-event trajectories.

From a psychiatric modelling standpoint, the observed moderation by neuroticism indicates that personality manipulations can propagate through the agent's memory, reflection, and action-generation loops into measurable symptom changes. This aligns with psychometric evidence that personality traits in LLMs can be reliably measured and shaped along desired dimensions (Serapio-García et al., 2025). Overall, this suggests a practical route for formalising stress–vulnerability hypotheses computationally: specify vulnerability dimensions, introduce controlled stressors, and quantify resulting symptom readouts.

Importantly, the neuroticism effect was not confined to questionnaire scores. Conversational analysis revealed a dose-response pattern in which high-neuroticism agents disclosed personal difficulties, expressed negative emotions, and sought support at substantially higher rates than low-neuroticism agents (Table 1). This convergence between standardised instruments and naturalistic social behaviour provides initial evidence for construct validity: the personality manipulation altered not only self-reported symptoms but also observable interaction patterns.

A related question is whether such agents can achieve fidelity not only to theoretical patterns but also to individual human responses.
Encouragingly, recent work demonstrates that generative agents initialised from detailed personal interviews can replicate participants' own responses on attitudinal surveys at levels approaching human test–retest reliability (Park et al., 2024), and that LLMs can predict personality traits from brief open-ended narratives with accuracy exceeding traditional natural language processing methods (Wright et al., 2026). The present study did not incorporate individual-level calibration, as agents were initialised from fictional biographical profiles rather than real participant data, and achieving such grounding remains an important target for future work (Sadée et al., 2025).

Nevertheless, the current work does not establish that simulated moderation reflects the same latent causal pathways observed in humans. A plausible alternative is that "neuroticism" acts primarily as a prompt-level prior that shifts the linguistic style and valence of self-report, which then mechanically maps onto higher symptom scores (Coda-Forno et al., 2023). Several features of the present results argue against a purely stylistic account: the neuroticism manipulation produced differential effects across symptom domains rather than a uniform score increase, and it altered social behaviours that are not direct components of questionnaire scoring. However, distinguishing these interpretations fully will require negative-control manipulations (personality dimensions that should not affect depressive symptoms) and cross-model replication.

More broadly, interpreting LLM agents as surrogates for human psychological processes requires careful qualification. In the synthetic participant literature, silicon-sample approaches have been shown to be sensitive to analytic choices (including model selection, prompting strategy, temperature, and persona specification), such that small configuration differences can substantially alter correspondence with human data (Cummins, 2025).
These concerns are compounded by findings that autonomous LLM-based respondents can evade standard data-quality checks (Westwood, 2025), and that the integration of LLMs may exacerbate rather than alleviate long-standing challenges of validating agent-based models, given their black-box structure, cultural biases, and stochastic outputs (Larooij and Törnberg, 2026). The present simulation-based approach differs from direct participant replacement in that agents interact within a structured environment rather than simply producing questionnaire responses. Even so, these methodological risks underscore the need for rigorous validation, negative-control experiments, and cross-model replication before simulation outputs can be interpreted as evidence for specific psychological mechanisms.

Several limitations should be noted. The sample comprised only six exposed agents, limiting statistical power and generalisability. All simulations used a single LLM (GPT-3.5-turbo), and results may differ across model providers or parameter settings. The simulation spanned five days, leaving open whether longer trajectories would show recovery, chronicity, or other clinically relevant patterns. Finally, LLMs operate within the domain of language and cannot capture non-verbal, physiological, or neurobiological aspects of mental health.

Looking forward, several priorities emerge. First, robustness should be quantified across model providers and prompt perturbations, and simulation outputs should be benchmarked against human reference datasets. Second, scaling to larger populations will enable investigation of emergent social phenomena such as stress contagion, social buffering, and network-mediated spread of psychological distress, dynamics that have been theorised in the psychiatric literature (Fowler and Christakis, 2008; Scheffer et al., 2024b) but are difficult to study with conventional longitudinal designs.
Third, generative agent simulations offer a promising tool for addressing open questions that are difficult to tackle with existing methods, such as why adverse experiences during specific developmental windows confer disproportionate risk for psychopathology (McGorry et al., 2024), how loneliness propagates through communities to affect population-level mental health (Fowler and Christakis, 2008), and whether specific intervention components can buffer symptom trajectories following adversity (Dragioti et al., 2022). By enabling controlled counterfactual experiments on these questions, generative agent models could serve as hypothesis-generating tools for prevention science, complementing traditional randomised trials and observational studies (Bail, 2024; Kambeitz and Meyer-Lindenberg, 2025).

In summary, LLM-based generative agents offer a promising experimental complement to observational mental health research by enabling controlled, counterfactual tests of socio-environmental adversity and vulnerability. The present results demonstrate feasibility, theoretically aligned symptom patterns, and convergent evidence across self-report and behavioural measures. To convert this promise into reliable scientific evidence, future work must prioritise rigorous validation against human data, robustness to analytic choices, and transparent reporting of the full simulation pipeline.

Methods

Generative Agent Framework

Our simulation was built on the generative agent framework introduced by Park et al. (2023), which employs large language models (LLMs) to create autonomous agents capable of human-like behaviour within a virtual environment. The environment consists of a virtual village comprising houses, a café, a pharmacy, a college dormitory, and public spaces where agents can navigate and interact.
Each of the 25 agents was initialised with a unique biographical profile specifying their name, age, occupation, daily routines, relationships with other agents, and personality traits. These profiles were stored as structured text files and loaded into each agent’s memory at the start of the simulation.

The cognitive architecture underlying each agent comprises four modules that jointly determine behaviour (Park et al., 2023). First, a memory stream records all observations, actions, and reflections as timestamped entries, each tagged with an embedding vector and an importance score (rated 1–10 by the LLM). Second, a retrieval mechanism selects relevant memories for each new situation by computing a weighted combination of three factors: recency (exponential decay with a factor of 0.995), importance (the stored score), and relevance (cosine similarity between the memory’s embedding and the current query). Third, a reflection module periodically synthesises higher-level insights from accumulated memories once their cumulative importance exceeds a threshold of 150 points; these insights are themselves stored back into the memory stream. Fourth, a planning module generates hierarchical daily schedules that are recursively decomposed into actions at approximately 5–15 minute granularity.

All language generation and scoring calls were routed through OpenAI’s gpt-3.5-turbo (ChatGPT) API, consistent with the original framework (Park et al., 2023). Default API parameters were used throughout (temperature = 1.0). Text embeddings for memory retrieval were generated using OpenAI’s embedding endpoint.

Simulation Design and Timeline

The simulation advanced in discrete time steps, with each step corresponding to 10 seconds of simulated time (360 steps per hour). In total, the simulation spanned five simulated days, divided into consecutive 12-hour segments to facilitate checkpointing and branching. The overall design comprised three phases:

Baseline phase (Days 1–3).
All 25 agents lived their daily routines in Smallville without any experimental manipulation. This 72-hour period allowed agents to accumulate naturalistic memories, form social interactions, and establish stable behavioural patterns. Mental health assessments (PHQ-9 and K10) were administered at the end of each 12-hour segment (i.e., twice per simulated day), yielding six baseline measurement points per agent.

Adverse event phase (Day 4). On the fourth simulated day, six of the 25 agents (n = 6; hereafter “target agents”) were exposed to individually tailored adverse life events during the second 12-hour segment (i.e., approximately 18 hours into Day 4). The remaining 19 agents continued their routines without any scheduled events and served as non-exposed controls. Both PHQ-9 and K10 assessments were administered at the end of each 12-hour segment of Day 4.

Follow-up phase (Day 5). All agents continued in the simulation for an additional 24 hours without further experimental events. PHQ-9 and K10 assessments were again administered at the end of each 12-hour segment, providing two follow-up measurements.

In sum, each agent completed 10 mental health assessments (two per day × five days) for each instrument, enabling the analysis of symptom trajectories across baseline, acute exposure, and follow-up.

Adverse Life Events

Six adverse life events were designed to be individually tailored to the biographical context of each target agent (Table 2). Events were crafted to span different life domains (family crisis, financial threat, workplace conflict, medical emergency, housing instability, and academic/financial jeopardy) and to represent realistic stressors of moderate to high severity. Each event description was written as a narrative vignette that the simulation engine injected directly into the agent’s memory stream at the scheduled time step.

Technically, adverse events were processed as follows.
(1) At the designated time step, the event description was added to the agent’s memory stream via the standard memory-insertion pathway.
(2) The LLM assigned an importance (poignancy) score to each event by evaluating the event description in the context of the agent’s persona, ensuring that emotional impact was assessed in a character-specific manner.
(3) An embedding vector was computed for the event description to enable subsequent retrieval.
(4) Events were tagged with keywords (e.g., “inadequacy,” “self worth,” “cognitive distortion,” “rejection”) to facilitate retrieval during future interactions and assessments.
(5) A forced reflection was triggered approximately 40 seconds (4 time steps) after each adverse event, prompting the agent to generate an introspective entry about their thoughts and feelings.

The six events were administered sequentially with a 10-minute stagger between agents (60 time steps apart).

Table 2. Adverse life events administered to target agents on Day 4.

| Agent | Event Domain | Event Description |
|---|---|---|
| Maria Lopez | Family crisis | Mother suffered a ruptured brain aneurysm; emergency surgery; prognosis uncertain |
| Isabella Rodriguez | Financial threat | Bank warns of insolvency review for café business; supplier filed for unpaid invoices |
| John Lin | Workplace conflict | Dispensing-procedure error triggered client complaint; disciplinary investigation initiated |
| Jane Moreno | Medical emergency | Father had a stroke; hospitalised in intensive care; asked to review treatment consent |
| Klaus Mueller | Housing instability | Dorm building scheduled for urgent renovation; forced relocation within 30 days |
| Ayesha Khan | Academic/financial | Merit scholarship placed on hold; tuition balance due within 10 days |

Mental Health Assessment

Agent mental health was assessed using two standardised self-report instruments: the Patient Health Questionnaire-9 (PHQ-9) (Kroenke et al., 2001), a 9-item measure of depressive symptom severity, and the Kessler Psychological Distress Scale (K10) (Kessler et al., 2002), a
10-item measure of non-specific psychological distress. Both instruments were chosen because they are among the most widely used screening tools in psychiatric research and clinical practice and produce ordinal sum scores suitable for longitudinal comparison.

To administer these instruments, prompts were constructed that included the agent’s current persona description along with contextually relevant memories retrieved via the standard retrieval mechanism (combining recency, importance, and relevance scores). The agent’s personality traits were included in the prompt to ensure character-consistent responding. The LLM was then asked to respond to each questionnaire item on the standard response scale. Responses were parsed and summed to produce total scores for each instrument.

Assessments were scheduled at the end of each 12-hour simulation segment, yielding two assessment time points per simulated day (approximately at 12:00 and 24:00 in simulated time). Over the full five-day simulation, this produced 10 assessment time points per agent per instrument.

Neuroticism Moderation Analysis

To examine whether personality traits moderate the impact of adverse life events on mental health outcomes, we conducted a branching experiment manipulating neuroticism levels. Starting from the shared baseline state at the end of Day 3, the simulation was forked into three parallel branches for the six target agents:

(1) Default neuroticism. The original personality profiles were retained as defined in the agents’ biographical files (the main analysis described above).

(2) High neuroticism. The personality descriptions of the six target agents were modified to include elevated neuroticism-related traits (e.g., heightened emotional reactivity, tendency toward worry and rumination, sensitivity to negative events). These modified descriptions were loaded into a separate branch of the simulation that otherwise replicated the identical adverse event schedule.
(3) Low neuroticism: The personality descriptions were modified to include low neuroticism-related traits (e.g., emotional stability, resilience under stress, capacity to maintain equanimity). This branch also received the identical adverse event schedule.

Importantly, all three branches inherited the complete memory state from the shared three-day baseline, ensuring that any observed differences in post-event mental health scores could be attributed to the personality manipulation rather than divergent experiential histories. Each branch ran for two additional simulated days (Days 4–5), with the same assessment schedule as described above. The same six target agents (n = 6) were assessed across all three conditions, yielding a within-agent, between-condition comparison.

Statistical Analysis

All statistical analyses were conducted in R (R Core Team, 2024). To test the primary hypothesis that adverse life events increase psychological distress, we compared post-event PHQ-9 and K10 scores between exposed (n = 6) and non-exposed (n = 19) agents using independent-samples Welch’s t-tests at each post-event assessment time point.

To model symptom trajectories over time, we fitted linear mixed-effects models using the lme4 package with PHQ-9 and K10 total scores as dependent variables. Fixed effects included time (assessment time point), group (exposed vs. non-exposed), and their interaction. Agent identity was included as a random intercept to account for repeated measurements within individuals. A significant time × group interaction was interpreted as evidence that adverse events altered the trajectory of mental health scores.

For the neuroticism moderation analysis, we fitted linear mixed-effects models with post-event total scores as the dependent variable, neuroticism condition (low, average, high) and baseline score as fixed effects, and agent identity as a random intercept.
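The branching design described above, in which three neuroticism branches are forked from one shared baseline, can be illustrated with a minimal Python sketch. This is not the psyagent implementation: the `AgentState` class, the `fork_branches` function, the trait wording, and the placeholder persona text are all hypothetical and serve only to show why copying the baseline state gives each branch an identical but independent memory history.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical container: one agent's persona text and memory stream."""
    name: str
    persona: str
    memories: list[str] = field(default_factory=list)


# Hypothetical trait wording, loosely following the examples given in the text.
TRAIT_TEXT = {
    "low": "emotionally stable and resilient under stress",
    "average": None,  # default branch: persona left unchanged
    "high": "emotionally reactive and prone to worry and rumination",
}


def fork_branches(baseline, targets):
    """Return one independent deep copy of the baseline state per condition."""
    branches = {}
    for condition, trait in TRAIT_TEXT.items():
        branch = copy.deepcopy(baseline)  # memories are copied, not shared
        if trait is not None:
            for name in targets:
                branch[name].persona += f" {name} is {trait}."
        branches[condition] = branch
    return branches


# Toy baseline with one target agent and one control agent (placeholder text).
baseline = {
    "Maria Lopez": AgentState("Maria Lopez", "Placeholder persona.", ["memory from Days 1-3"]),
    "Control Agent": AgentState("Control Agent", "Placeholder persona.", ["memory from Days 1-3"]),
}
branches = fork_branches(baseline, targets={"Maria Lopez"})

# Writing a Day 4 memory into one branch leaves the other branches untouched.
branches["high"]["Maria Lopez"].memories.append("adverse event on Day 4")
print({c: len(b["Maria Lopez"].memories) for c, b in branches.items()})
```

In this sketch only the persona text of the target agents differs between branches at the fork, which mirrors the stated rationale that post-event differences should reflect the personality manipulation rather than divergent histories.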
Post-hoc pairwise comparisons were conducted using estimated marginal means with Tukey correction for multiple comparisons. Effect sizes are reported as standardised mean differences (SMD) derived from the emmeans package.

To examine whether neuroticism differentially moderates specific symptoms, we fitted separate mixed-effects models for each questionnaire item, with item score as the dependent variable, neuroticism condition and baseline item score as fixed effects, and agent as a random intercept. P-values were corrected for multiple comparisons using the false discovery rate (FDR) method. Effect sizes for the high-versus-low neuroticism contrast were computed as standardised mean differences from estimated marginal means.

Given the exploratory nature of this study and the relatively small sample of agents, we report both p-values and effect sizes (standardised mean differences for pairwise comparisons). The significance threshold was set at α = 0.05 for all tests.

Conversational Behaviour Analysis

To assess whether adverse events and personality traits influence naturalistic social behaviour, we analysed all post-event conversations involving target agents. Each conversation was classified by GPT-4.1 for five behavioural features: (1) personal difficulty mentioned, (2) negative emotion expressed, (3) support seeking, (4) partner empathy shown, and (5) partner practical help offered. Classification was performed at the conversation level using a structured prompt that presented the full conversation text and requested binary (present/absent) ratings for each feature. Rates of each feature were compared between exposed and non-exposed agents within each neuroticism condition using Pearson chi-squared tests and Fisher's exact tests, with phi coefficients and odds ratios as effect size measures.
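As an illustration of the 2×2 comparison described above, the following Python sketch computes the Pearson chi-squared statistic, the phi coefficient, the odds ratio, and a two-sided Fisher exact p-value for a single behavioural feature. The study's analyses were run in R; this standard-library version is only a sketch. The function name `two_by_two_stats`, the omission of a continuity correction, and the example counts are assumptions made for illustration, and the counts are not study data.

```python
import math


def two_by_two_stats(a, b, c, d):
    """Statistics for the 2x2 table [[a, b], [c, d]] (all margins assumed non-zero).

    Rows: exposed / non-exposed conversations; columns: feature present / absent.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    margin_product = row1 * row2 * col1 * col2

    # Pearson chi-squared statistic without continuity correction (1 df).
    chi2 = n * (a * d - b * c) ** 2 / margin_product
    # Survival function of the chi-squared distribution with 1 df.
    p_chi2 = math.erfc(math.sqrt(chi2 / 2))
    # Signed phi coefficient; phi squared equals chi2 / n.
    phi = (a * d - b * c) / math.sqrt(margin_product)
    # Sample odds ratio; reported as infinite when an off-diagonal cell is zero.
    odds_ratio = (a * d) / (b * c) if b * c else math.inf

    # Two-sided Fisher exact test: with the margins fixed, the top-left cell is
    # hypergeometric; sum the probabilities of all tables that are no more
    # likely than the observed one.
    def prob(x):
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_fisher = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

    return {
        "chi2": chi2,
        "p_chi2": p_chi2,
        "phi": phi,
        "odds_ratio": odds_ratio,
        "p_fisher": min(1.0, p_fisher),
    }


# Hypothetical counts: 3 of 4 exposed vs 1 of 4 non-exposed conversations show the feature.
stats = two_by_two_stats(3, 1, 1, 3)
print(stats)
```

With cell counts as small as those in this example, the chi-squared approximation is unreliable, which is presumably why an exact test is reported alongside it.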
Data and Code Availability

The simulation and analysis pipeline is implemented as an open-source toolbox (psyagent) and is available together with all agent configuration files, prompt templates, and simulation logs at https://github.com/kambeitzlab/psyagent.

Generative AI

During the preparation of this manuscript, the author used ChatGPT 5.2 (OpenAI) and Claude Opus 4.6 (Anthropic) for language editing and manuscript revision. The author reviewed and edited all AI-generated output and takes full responsibility for the content of the publication.

Declarations

Author Contribution

J.K. designed the analysis, wrote the code and analysis, and wrote the manuscript. AI (ChatGPT 5.3 and Claude Opus 4.6) was used during the design of this analysis, as part of the simulation code, during the analysis, and during the writing of the manuscript. All text and code were reviewed by the author.

Data Availability

The simulation and analysis pipeline is implemented as an open-source toolbox (psyagent) and is available together with all agent configuration files, prompt templates, and simulation logs at https://github.com/kambeitzlab/psyagent.

References

Alon, N., Macrynikola, N., Jester, D.J., Keshavan, M., Reynolds, C.F., 3rd, Saxena, S., Thomas, M.L., Torous, J., Jeste, D.V., 2024. Social determinants of mental health in major depressive disorder: Umbrella review of 26 meta-analyses and systematic reviews. Psychiatry Res. 335, 115854.

Arango, C., Dragioti, E., Solmi, M., Cortese, S., Domschke, K., Murray, R.M., Jones, P.B., Uher, R., Carvalho, A.F., Reichenberg, A., Shin, J.I., Andreassen, O.A., Correll, C.U., Fusar-Poli, P., 2021. Risk and protective factors for mental disorders beyond genetics: an evidence-based atlas. World Psychiatry 20, 417–436.

Bail, C.A., 2024. Can Generative AI improve social science? Proc. Natl. Acad. Sci. U. S. A. 121, e2314021121.
Ben-Zion, Z., Witte, K., Jagadish, A.K., Duek, O., Harpaz-Rotem, I., Khorsandian, M.-C., Burrer, A., Seifritz, E., Homan, P., Schulz, E., Spiller, T.R., 2025. Assessing and alleviating state anxiety in large language models. NPJ Digit. Med. 8, 132.

Bonabeau, E., 2002. Agent-based modeling: methods and techniques for simulating human systems. Proc. Natl. Acad. Sci. U. S. A. 99 Suppl 3, 7280–7287.

Chen, J., Wang, X., Xu, R., Yuan, S., Zhang, Y., Shi, W., Xie, J., Li, S., Yang, R., Zhu, T., Chen, A., Li, N., Chen, L., Hu, C., Wu, S., Ren, S., Fu, Z., Xiao, Y., 2024. From persona to personalization: A survey on role-playing language agents. arXiv [cs.CL].

Coda-Forno, J., Witte, K., Jagadish, A.K., Binz, M., Akata, Z., Schulz, E., 2023. Inducing anxiety in large language models increases exploration and bias. arXiv [cs.CL].

Cummins, J., 2025. The threat of analytic flexibility in using large language models to simulate human data: A call to attention. arXiv [cs.CY]. https://doi.org/10.48550/arXiv.2509.13397

Dragioti, E., Radua, J., Solmi, M., Arango, C., Oliver, D., Cortese, S., Jones, P.B., Shin, J.I., Correll, C.U., Fusar-Poli, P., 2022. Global population attributable fraction of potentially modifiable risk factors for mental disorders: a meta-umbrella systematic review. Mol. Psychiatry 27, 3510–3519.

Ferraro, A., Galli, A., La Gatta, V., Postiglione, M., Orlando, G.M., Russo, D., Riccio, G., Romano, A., Moscato, V., 2024. Agent-Based Modelling meets generative AI in social network simulations. arXiv [cs.SI]. https://doi.org/10.48550/arXiv.2411.16031

Fowler, J.H., Christakis, N.A., 2008. Dynamic spread of happiness in a large social network: longitudinal analysis over 20 years in the Framingham Heart Study. BMJ 337, a2338.

GBD 2021 Diseases and Injuries Collaborators, 2024. Global incidence, prevalence, years lived with disability (YLDs), disability-adjusted life-years (DALYs), and healthy life expectancy (HALE) for 371 diseases and injuries in 204 countries and territories and 811 subnational locations, 1990–2021: a systematic analysis for the Global Burden of Disease Study 2021. Lancet 403, 2133–2161.

Kambeitz, J., Meyer-Lindenberg, A., 2025. Modelling the impact of environmental and social determinants on mental health using generative agents. NPJ Digit. Med. 8. https://doi.org/10.1038/s41746-024-01422-z

Kambeitz, J., Schiffman, J., Kambeitz-Ilankovic, L., Mittal, V.A., Ettinger, U., Vogeley, K., 2025. The empirical structure of psychopathology is represented in large language models. Nat. Ment. Health 1–11.

Kendler, K.S., Kuhn, J., Prescott, C.A., 2004. The interrelationship of neuroticism, sex, and stressful life events in the prediction of episodes of major depression. Am. J. Psychiatry 161, 631–636.

Kessler, R.C., Andrews, G., Colpe, L.J., Hiripi, E., Mroczek, D.K., Normand, S.L.T., Walters, E.E., Zaslavsky, A.M., 2002. Short screening scales to monitor population prevalences and trends in non-specific psychological distress. Psychol. Med. 32, 959–976.

Kotov, R., Gamez, W., Schmidt, F., Watson, D., 2010. Linking “big” personality traits to anxiety, depressive, and substance use disorders: a meta-analysis. Psychol. Bull. 136, 768–821.

Kroenke, K., Spitzer, R.L., Williams, J.B., 2001. The PHQ-9: validity of a brief depression severity measure. J. Gen. Intern. Med. 16, 606–613.

Li, J., Wang, S., Zhang, M., Li, W., Lai, Y., Kang, X., Ma, W., Liu, Y., 2024. Agent Hospital: A simulacrum of hospital with evolvable medical agents. arXiv [cs.AI].

Marinescu, I.E., Lawlor, P.N., Kording, K.P., 2018. Quasi-experimental causality in neuroscience and behavioural research. Nat. Hum. Behav. 2, 891–898.
McGorry, P.D., Mei, C., Dalal, N., Alvarez-Jimenez, M., Blakemore, S.-J., Browne, V., Dooley, B., Hickie, I.B., Jones, P.B., McDaid, D., Mihalopoulos, C., Wood, S.J., El Azzouzi, F.A., Fazio, J., Gow, E., Hanjabam, S., Hayes, A., Morris, A., Pang, E., Paramasivam, K., Quagliato Nogueira, I., Tan, J., Adelsheim, S., Broome, M.R., Cannon, M., Chanen, A.M., Chen, E.Y.H., Danese, A., Davis, M., Ford, T., Gonsalves, P.P., Hamilton, M.P., Henderson, J., John, A., Kay-Lambkin, F., Le, L.K.-D., Kieling, C., Mac Dhonnagáin, N., Malla, A., Nieman, D.H., Rickwood, D., Robinson, J., Shah, J.L., Singh, S., Soosay, I., Tee, K., Twenge, J., Valmaggia, L., van Amelsvoort, T., Verma, S., Wilson, J., Yung, A., Iyer, S.N., Killackey, E., 2024. The Lancet Psychiatry Commission on youth mental health. Lancet Psychiatry 11, 731–774.

Monroe, S.M., Simons, A.D., 1991. Diathesis-stress theories in the context of life stress research: implications for the depressive disorders. Psychol. Bull. 110, 406–425.

Öngür, D., Paulus, M.P., 2025. Embracing complexity in psychiatry - from reductionistic to systems approaches. Lancet Psychiatry 12, 220–227.

Ormel, J., Oldehinkel, A.J., Brilman, E.I., 2001. The interplay and etiological continuity of neuroticism, difficulties, and life events in the etiology of major and subsyndromal, first and recurrent depressive episodes in later life. Am. J. Psychiatry 158, 885–891.

Park, J.S., O’Brien, J.C., Cai, C.J., Morris, M.R., Liang, P., Bernstein, M.S., 2023. Generative agents: Interactive simulacra of human behavior. arXiv [cs.HC].

Park, J.S., Zou, C.Q., Shaw, A., Hill, B.M., Cai, C., Morris, M.R., Willer, R., Liang, P., Bernstein, M.S., 2024. Generative agent simulations of 1,000 people. arXiv [cs.AI].

R Core Team, 2024. R: A Language and Environment for Statistical Computing.

Sadée, C., Testa, S., Barba, T., Hartmann, K., Schuessler, M., Thieme, A., Church, G.M., Okoye, I., Hernandez-Boussard, T., Hood, L., Shmulevich, I., Kuhl, E., Gevaert, O., 2025. Medical digital twins: enabling precision medicine and medical artificial intelligence. Lancet Digit. Health 0, 100864.

Scheffer, M., Bockting, C.L., Borsboom, D., Cools, R., Delecroix, C., Hartmann, J.A., Kendler, K.S., van de Leemput, I., van der Maas, H.L.J., van Nes, E., Mattson, M., McGorry, P.D., Nelson, B., 2024a. A dynamical systems view of psychiatric disorders - practical implications: A review. JAMA Psychiatry 81, 624–630.

Scheffer, M., Bockting, C.L., Borsboom, D., Cools, R., Delecroix, C., Hartmann, J.A., Kendler, K.S., van de Leemput, I., van der Maas, H.L.J., van Nes, E., Mattson, M., McGorry, P.D., Nelson, B., 2024b. A dynamical systems view of psychiatric disorders - theory: A review. JAMA Psychiatry 81, 618–623.

Serapio-García, G., Safdari, M., Crepy, C., Sun, L., Fitz, S., Romero, P., Abdulhai, M., Faust, A., Matarić, M., 2025. A psychometric framework for evaluating and shaping personality traits in large language models. Nat. Mach. Intell. 7, 1954–1968.

Shanahan, M., McDonell, K., Reynolds, L., 2023. Role play with large language models. Nature 623, 493–498.

Tracy, M., Cerdá, M., Keyes, K.M., 2018. Agent-based modeling in public health: Current applications and future directions. Annu. Rev. Public Health 39, 77–94.

Wright, A.G.C., Ringwald, W.R., Vize, C.E., Eichstaedt, J.C., Angstadt, M., Taxali, A., Sripada, C., 2026. Assessing personality using zero-shot generative AI scoring of brief open-ended text. Nat. Hum. Behav. 1–15.

Yang, Y., Langellier, B.A., Stankov, I., Purtle, J., Nelson, K.L., Reinhard, E., Van Lenthe, F.J., Diez Roux, A.V., 2020. Public transit and depression among older adults: using agent-based models to examine plausible impacts of a free bus policy. J. Epidemiol. Community Health 74, 875–881.
Yang, Z., Zhang, Z., Zheng, Z., Jiang, Y., Gan, Z., Wang, Z., Ling, Z., Chen, J., Ma, M., Dong, B., Gupta, P., Hu, S., Yin, Z., Li, G., Jia, X., Wang, L., Ghanem, B., Lu, H., Lu, C., Ouyang, W., Qiao, Y., Torr, P., Shao, J., 2024. OASIS: Open Agent Social Interaction Simulations with One Million Agents. arXiv [cs.CL]. https://doi.org/10.48550/arXiv.2411.11581

Zubin, J., Spring, B., 1977. Vulnerability–a new view of schizophrenia. J. Abnorm. Psychol. 86, 103–126.

Additional Declarations

No competing interests reported.

Supplementary Files

KambeitzLLMbasedagentssupplement.docx

Status: Under Review

Reviewers agreed at journal: 14 May, 2026
Reviews received at journal: 01 May, 2026
Reviewers agreed at journal: 20 Apr, 2026
Reviewers invited by journal: 19 Apr, 2026
Editor assigned by journal: 05 Apr, 2026
Submission checks completed at journal: 04 Apr, 2026
First submitted to journal: 01 Apr, 2026
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-9291631","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":628670369,"identity":"8c266eab-42b0-4176-a0e2-ec505b8b3319","order_by":0,"name":"Joseph Kambeitz","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABJ0lEQVRIie2Rv0vDQBTHXzlIl8SuDwrmX7gQiA75Y14ImCWpFBcHKTeli+AmyeLfkNGxpeB0pX9AlhahrpEuAUG8pLqddnW4z3Dhfnz4vvcCYDD8QwbiuHAAG4AAztWXqbNL6LenFQL/W8FflR/vqKhHkTilsPl89zp9Bn/0uF5ud+0sqeSC8eYWJzCMF9qAe+n7pYQA60nMiaysWgsWFRJvwN5rYwZFao2dHEKo0wCJ7KzajA4rJ8dIYMr1SvL20SlunV60RJjwDbCV89kp141eoYApJeAqRU2ME1eFxY7oU/TDUr2Myxx9T/WCdEVeqdr3iheMcnuvLcxTEztM89B7qrPlexvO3DNJDJu7MHoYxlutIqD7Dai5srRlAbjQKwaDwWD4gy/vBV9++3oGDgAAAABJRU5ErkJggg==","orcid":"","institution":"University Hospital Cologne","correspondingAuthor":true,"prefix":"","firstName":"Joseph","middleName":"","lastName":"Kambeitz","suffix":""}],"badges":[],"createdAt":"2026-04-01 12:08:35","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-9291631/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9291631/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":107947879,"identity":"1c94f2a1-b290-4d47-b68f-33033c870c0b","added_by":"auto","created_at":"2026-04-28 00:05:20","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":299114,"visible":true,"origin":"","legend":"\u003cp\u003eExperimental design to investigate the effect of adverse events on mental health in simulations of generative 
agents. A total set of n=25 interacting agents was modeled. After 3 days of simulated time, 6 agents were exposed to individual adverse events of different severity. The simulation was continued for another total duration of 2 days. In a parallel simulation, 6 agents were duplicated and their personality traits were altered to simulate persons with high neuroticism and subsequently exposed to the same adverse events. Mental health was assessed bi-daily using K10 and PHQ-9.\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-9291631/v1/a5d4120995fea88693770042.png"},{"id":107947880,"identity":"f8a9591e-3f3e-4fc3-8847-b32ef3fe5bdc","added_by":"auto","created_at":"2026-04-28 00:05:20","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":294293,"visible":true,"origin":"","legend":"\u003cp\u003e(A) Effect of adverse events on stress (K10) and depression scores (PHQ-9).\u003cstrong\u003e \u003c/strong\u003eP-values indicate significant differences in symptom scores between exposed (n=6) and non-exposed (n=19) agents derived from independent sample t-test. Item-level symptom changes by adverse event type on the PHQ-9 (B) and K10 (C). Each cell shows the pre-to-post score change for a given item and agent. Positive values (red) indicate symptom increases; negative values (blue) indicate decrease.\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-9291631/v1/13f490e4e17138883f1beb49.png"},{"id":107947881,"identity":"77657d22-5630-4af7-8d47-649dec34bedf","added_by":"auto","created_at":"2026-04-28 00:05:20","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":260056,"visible":true,"origin":"","legend":"\u003cp\u003eEffect of neuroticism on baseline-adjusted symptom severity following an adverse event. 
Estimated marginal means (points) and 95% confidence intervals (error bars) for (A) depressive symptoms (PHQ-9) and (B) psychological distress (K10) are shown for low, average, and high neuroticism conditions. Values are adjusted for baseline symptom levels using mixed-effects models with agent as a random intercept. Brackets indicate significant Tukey-corrected pairwise post-hoc comparisons; p-values are capped at p \u0026lt; 0.001 and reported with three decimal places. Effect size (Cohen's d) of neuroticism moderation for individual items of the PHQ-9 (D) and K10 (E). Values represent the baseline-adjusted standardised mean difference between high and low neuroticism conditions. Higher values indicate stronger increases in item scores for high relative to low neuroticism. P-values are FDR-corrected: * p\u0026lt;.05, ** p\u0026lt;.01, *** p\u0026lt;.001. Post-event item scores on the PHQ-9 (C) and K10 (F) by neuroticism level. Values represent mean item scores after the adverse event. Significance stars indicate the effect of neuroticism from item-level mixed-effects models adjusting for baseline (pre-event) scores with agent-level random intercepts. 
P-values are FDR-corrected: * p\u0026lt;.05, ** p\u0026lt;.01, *** p\u0026lt;.001.\u003c/p\u003e","description":"","filename":"floatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-9291631/v1/d5a15aecb12a09d17cf221c2.png"},{"id":107947882,"identity":"ffaefba3-1580-488c-8823-e65ae8ca9b67","added_by":"auto","created_at":"2026-04-28 00:05:20","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":425228,"visible":true,"origin":"","legend":"\u003cp\u003eAnalysis of agent conversations after exposure to adverse events compared to control.\u003c/p\u003e","description":"","filename":"floatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-9291631/v1/d6be4dd14b23f7d9760f3864.png"},{"id":108006773,"identity":"2c0ac976-a163-424c-ab3d-3e49f6f1b642","added_by":"auto","created_at":"2026-04-28 12:57:01","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1633027,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9291631/v1/9be219a7-8136-4bfa-97ab-f71d90a27e6f.pdf"},{"id":107947883,"identity":"34fab035-1d6c-401f-b18f-dd25dd33977b","added_by":"auto","created_at":"2026-04-28 00:05:20","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":405895,"visible":true,"origin":"","legend":"","description":"","filename":"KambeitzLLMbasedagentssupplement.docx","url":"https://assets-eu.researchsquare.com/files/rs-9291631/v1/9e8f3ce0bde9b501c75996f7.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"\u003cp\u003eLLM-Based Generative Agents Simulate Mental-Health Trajectories Under Adverse Socio-Environmental Conditions\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003eMental disorders such as major depression constitute a substantial and growing share of the global burden of 
disease (GBD 2021 Diseases and Injuries Collaborators, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), with prevalence rising sharply among young people in recent years (McGorry et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Social and environmental factors such as job loss, financial hardship, or social isolation are among the strongest determinants of mental health (Alon et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Arango et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). Following the stress-vulnerability model, such environmental stressors interact with pre-existing individual vulnerabilities to facilitate mental disorders (Monroe and Simons, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e1991\u003c/span\u003e; Zubin and Spring, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e1977\u003c/span\u003e). Especially, personality traits such as neuroticism - characterised by heightened emotional reactivity and proneness to negative affect - are strong predictors of mental health (Kotov et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2010\u003c/span\u003e) and moderate the depressogenic impact of stressful life events (Kendler et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2004\u003c/span\u003e; Ormel et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2001\u003c/span\u003e). Most importantly, many socio-environmental influences are potentially modifiable and thus represent significant opportunities for intervention and prevention (Dragioti et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eYet our ability to establish how specific adversities cause, sustain, or exacerbate symptoms remains fundamentally limited. 
Ethical constraints prohibit experimental manipulation of major life stressors in humans; observational studies are beset by confounding, reverse causality, and limited capacity to capture dynamic processes; and statistical methods for causal inference from observational data rely on assumptions that are often untestable (Marinescu et al., \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; \u0026Ouml;ng\u0026uuml;r and Paulus, \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). The result is that interventions targeting socio-environmental risk factors remain insufficiently grounded in causal knowledge (Scheffer et al., 2024a, 2024b).\u003c/p\u003e \u003cp\u003eAgent-based modelling offers a complementary strategy: by simulating individual humans as autonomous agents that act and interact within virtual environments, researchers gain complete experimental control over the factors under investigation (Bonabeau, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2002\u003c/span\u003e). Early applications in public health have modelled health-related behaviours including smoking, physical activity, and transportation use (Tracy et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Yang et al., \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). However, for studying mental health specifically, classical agent-based modelling faces important limitations: environments are typically highly simplified, agent behaviour is restricted to predefined action sets, and outcomes focus on observable behaviour rather than the subjective experiences and internal states central to psychopathology (Kambeitz and Meyer-Lindenberg, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eLarge language models (LLMs) have the potential to overcome these constraints. 
Trained on vast corpora of human-generated text encoding diverse aspects of thought, emotion, and social interaction, LLMs can generate contextually appropriate behaviour and natural conversations (Chen et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Shanahan et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Several reviews have argued that generative AI may fundamentally reshape how the social and behavioural sciences study human populations \u0026mdash; for example by enabling large-scale simulations, automated content analysis, and novel forms of experimental research (Bail, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Sad\u0026eacute;e et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Crucially for psychiatric applications, LLMs also encode structured knowledge of psychopathology (Kambeitz et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2025\u003c/span\u003e), and inducing adverse psychological contexts produces contextually appropriate changes on clinical instruments (Ben-Zion et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Coda-Forno et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eBuilding on these capabilities, LLMs have the potential to augment agent-based modelling and to create generative agents that exhibit a range of plausible human behaviour such as navigation of complex social environments or formation of relationships (Li et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Park et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). 
Recent advances have demonstrated the scalability of this paradigm: generative agent simulations of up to 1,000 individuals replicate participants' survey responses with accuracy approaching human test\u0026ndash;retest reliability (Park et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Moreover, large-scale platforms now integrate LLMs with agent-based frameworks for social network modelling with simulation up to one million agents (Ferraro et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Yang et al., \u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). These developments suggest that generative agents are maturing from proof-of-concept demonstrations toward tools capable of addressing substantive scientific questions. We recently proposed that such agents could be embedded in virtual environments and assessed using standardised mental health instruments to investigate socio-environmental determinants of mental health under full experimental control (Kambeitz and Meyer-Lindenberg, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). In the present study, we provide the first empirical implementation of this framework, testing whether LLM-based generative agents can capture core features of the stress\u0026ndash;vulnerability model of mental disorders under controlled experimental conditions.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003e25 generative agents were simulated interacting in a virtual environment over five days (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). During a three-day baseline period, agents were assessed twice daily with respect to depressive (PHQ-9) and stress-related symptoms (K10). 
On day three, six agents were exposed to individually tailored adverse life events of varying severity\u0026mdash;including a family medical emergency, financial threat, and workplace conflict\u0026mdash;while 19 agents served as unexposed controls. To test personality moderation, the simulation was branched at day three into three parallel runs in which the six exposed agents were assigned low, average, or high neuroticism profiles before experiencing identical adverse events.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eBoth PHQ-9 and K10 showed marked increases in symptom scores in exposed agents compared to non-exposed controls (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eA), with elevated scores persisting across all subsequent assessment timepoints (all p\u0026thinsp;\u0026lt;\u0026thinsp;0.05). Analysis of the trajectory of individual items of the PHQ-9 and the K10 indicated symptom score increases following exposure to adverse events in most items (Supplementary Figs.\u0026nbsp;1 \u0026amp; 2). 
Interestingly, the increase in symptom scores varied depending on which questionnaire item but also depending on the type of adverse event (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eB and \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003eC), suggesting that the present simulations exhibit variability in symptoms depending on the type of adverse events and the type of agent as would be expected in empirical investigations.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFor depression symptoms (PHQ-9), a mixed-effects model adjusting for baseline scores revealed a significant main effect of neuroticism (χ\u0026sup2;(2)\u0026thinsp;=\u0026thinsp;32.13, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001) indicating that the increase of depression symptoms was moderated by personality style (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eA). The fixed effects explained 19.5% of the variance in post-intervention PHQ-9 scores (marginal R\u0026sup2; = 0.195). Post-hoc Tukey-corrected comparisons showed significantly lower PHQ-9 scores in the low-neuroticism condition compared to the average condition (p\u0026thinsp;=\u0026thinsp;0.003, SMD = -2.55) and compared to the high-neuroticism condition (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001, SMD = -3.05). The difference between average and high neuroticism did not reach statistical significance (p\u0026thinsp;=\u0026thinsp;0.676, SMD = -0.5).\u003c/p\u003e \u003cp\u003eFor stress-related symptoms (K10), the same modeling approach showed a significant main effect of neuroticism (χ\u0026sup2;(2)\u0026thinsp;=\u0026thinsp;52.5, p\u0026thinsp;\u0026lt;\u0026thinsp;0.001) indicating that also the increase of stress symptoms was moderated by personality style (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eB). 
Neuroticism and baseline distress together explained 36.7% of the variance in post-intervention K10 scores (marginal R\u0026sup2; = 0.367). Post-hoc analyses indicated significantly lower K10 scores in the low-neuroticism condition compared to the average condition (p\u0026thinsp;=\u0026thinsp;0.002, SMD = -2.78) and compared to the high-neuroticism condition (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001, SMD = -4.1). The contrast between average and high neuroticism was not statistically significant (p\u0026thinsp;=\u0026thinsp;0.105, SMD = -1.32).\u003c/p\u003e \u003cp\u003eAt the level of individual PHQ-9 and K10 items, we detected moderation by neuroticism after FDR correction for every item except suicidal ideation (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eC and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eF). The strongest neuroticism moderation effects were observed for anxiety-related symptoms in the K10 such as nervousness and restlessness (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eD and \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003eE). This pattern is consistent with the conceptual overlap between neuroticism (defined partly by anxiety proneness and emotional reactivity) and anxiety-related symptom content (Kotov et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2010\u003c/span\u003e). That the simulation reproduces this differential sensitivity, rather than uniformly inflating all items, provides evidence that manipulating neuroticism leads to some degree of symptom specificity rather than acting as a nonspecific amplifier of distress scores.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThe qualitative analysis indicated that agents store adverse experiences as memories and build secondary memories by reflecting on their current situation (e.g. 
John is \u0026ldquo;...feeling anxious about the upcoming disciplinary meeting\u0026rdquo;). Agents also adapt their behaviour following adverse events. For example, after hearing that her caf\u0026eacute; is in financial trouble, Isabella Rodriguez is \"reviewing the letter regarding the insolvency review\" and \"contacting the bank to address financial concerns\". Similarly, after hearing that her scholarship might not be extended, Ayesha Khan spreads her time across multiple actions such as \"preparing documentation for the appeal meeting\", \"gathering necessary documents\", \"writing a summary\", \"reviewing the appeal requirements\" and \"finalizing the documentation\". Moreover, agents disclose their adverse experiences to other agents. For example, Maria Lopez discloses to three agents that her mother suffered an aneurysm (\"I'm still feeling a bit overwhelmed thinking about everything going on with my mom\", \"It's been a tough time with my mom in the hospital, so knowing I can talk to you makes things a bit easier\") and other agents respond in a supportive manner (e.g. Hailey responds \"I'm so sorry to hear about your mom, Maria. That must be really tough for you. I completely understand if you need to take a step back...\", Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). 
Lastly, after hearing about others' adverse experiences, agents build memories of these events, indicating that adverse events socially spread through the agent community - a pattern consistent with evidence that emotional states propagate through social networks in human populations (Fowler and Christakis, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2008\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eTo quantify these conversational patterns, we used GPT-4.1 to classify each post-event conversation for five behavioral features and compared rates between exposed and control agents (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e, Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Across all neuroticism conditions, exposed agents showed significantly higher rates of personal difficulty disclosure, negative emotion expression, and support seeking compared to controls (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). Notably, these conversational features showed a dose-response pattern with neuroticism: high-neuroticism agents disclosed personal difficulties in 27.3% of conversations versus 14.8% for low-neuroticism agents, and expressed negative emotions in 28.8% versus 13.1% of conversations. Conversation partners of exposed agents also showed elevated empathy and practical help responses, with effect sizes increasing alongside the exposed agent's neuroticism level (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). This convergence between standardised symptom measures and naturalistic conversational behaviour suggests that the neuroticism manipulation propagates beyond questionnaire responses into social interaction patterns. 
Notably, practical help offered by conversation partners was the only feature that did not differ significantly between exposed and control agents in the average-neuroticism condition (p\u0026thinsp;=\u0026thinsp;0.306), suggesting that the conversational classification captures meaningful variation rather than uniformly inflating all social features in exposed agents.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cb\u003eConversational behaviour features in post-adverse-event conversations: exposed vs. control agents across neuroticism conditions.\u003c/b\u003e Exposed and control rates represent proportions of conversations containing each feature. χ\u0026sup2; = Pearson chi-squared statistic; φ\u0026thinsp;=\u0026thinsp;phi coefficient (effect size); OR\u0026thinsp;=\u0026thinsp;odds ratio from Fisher\u0026rsquo;s exact test. \u0026mdash; indicates undefined OR (zero events in control group). 
All tests are two-sided.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"9\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCondition\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFeature\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eExposed\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eControl\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cem\u003eχ\u0026sup2;\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u003cem\u003ep\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003e\u003cem\u003eφ\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003eOR\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003e\u003cem\u003eFisher p\u003c/em\u003e\u003c/p\u003e 
\u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLow neuroticism\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eNegative Emotion Expressed\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e13.1%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e1.7%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e15.0\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.21\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e8.4\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003ePartner Empathy Shown\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e18.0%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e1.7%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e26.7\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.28\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e12.2\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e 
\u003cp\u003e\u003cb\u003eLow neuroticism\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003ePartner Practical Help\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e27.9%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e7.7%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e18.5\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.23\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e4.6\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003ePersonal Difficulty Mentioned\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e14.8%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e2.1%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e16.5\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" 
char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.22\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e8.0\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eLow neuroticism\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eSupport Seeking\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e14.8%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.7%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e27.9\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.28\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e24.2\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eAverage\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eNegative Emotion Expressed\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e 
\u003cp\u003e\u003cb\u003e17.1%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.7%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e34.7\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.32\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e27.5\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003ePartner Empathy Shown\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e21.1%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e1.8%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e35.4\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.32\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e14.1\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e 
\u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eAverage\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003ePartner Practical Help\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e15.8%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e10.7%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e1.0\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e0.306\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.05\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e1.6\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e0.230\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003ePersonal Difficulty Mentioned\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e23.7%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.7%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e53.6\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e 
\u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.39\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e41.3\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eAverage\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eSupport Seeking\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e14.5%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.0%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e36.1\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.32\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e\u0026mdash;\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eHigh neuroticism\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eNegative Emotion 
Expressed\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e28.8%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.3%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e75.7\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.46\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e113.0\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003ePartner Empathy Shown\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e31.8%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e2.1%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e62.8\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.42\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e 
\u003cp\u003e\u003cb\u003e21.5\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eHigh neuroticism\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003ePartner Practical Help\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e31.8%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e7.0%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e29.7\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.29\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e6.2\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003ePersonal Difficulty Mentioned\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e27.3%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.7%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e 
\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e65.8\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.43\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e52.4\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eHigh neuroticism\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cb\u003eSupport Seeking\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e\u003cb\u003e22.7%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e\u003cb\u003e0.0%\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e\u003cb\u003e62.4\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e\u003cb\u003e0.42\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u003cb\u003e\u0026mdash;\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c9\"\u003e \u003cp\u003e\u003cb\u003e\u0026lt;\u0026thinsp;0.001\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e 
\u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study demonstrates that LLM-based generative agents can exhibit structured, longitudinal symptom changes on standardised instruments following experimentally introduced adversity, with moderation by neuroticism consistent with the stress\u0026ndash;vulnerability model (Monroe and Simons, \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e1991\u003c/span\u003e; Zubin and Spring, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e1977\u003c/span\u003e). These findings support the feasibility of using generative agents as controllable experimental substrates in which exposures, vulnerability factors, and outcomes can be manipulated without the ethical and practical constraints of human experimentation.\u003c/p\u003e \u003cp\u003eA key contribution is the experimental control afforded by simulation. Observational studies robustly link adversity to mental health outcomes, but causal interpretation is limited by confounding and reverse causality. In contrast, the present design introduces adversity exogenously within a closed computational world and tracks within-agent symptom changes over time. This makes it possible to ask counterfactual questions\u0026mdash;\"What would this agent's trajectory have been without the event?\" or \"How would the same event unfold under altered vulnerability?\"\u0026mdash;that are typically inaccessible in clinical cohorts. By implementing counterfactual branches from the same pre-event world state, we reduce between-run variability and enable direct comparison of post-event trajectories.\u003c/p\u003e \u003cp\u003eFrom a psychiatric modelling standpoint, the observed moderation by neuroticism indicates that personality manipulations can propagate through the agent's memory, reflection, and action-generation loops into measurable symptom changes. 
This aligns with psychometric evidence that personality traits in LLMs can be reliably measured and shaped along desired dimensions (Serapio-Garc\u0026iacute;a et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Overall, this suggests a practical route for formalising stress\u0026ndash;vulnerability hypotheses computationally: specify vulnerability dimensions, introduce controlled stressors, and quantify resulting symptom readouts. Importantly, the neuroticism effect was not confined to questionnaire scores. Conversational analysis revealed a dose-response pattern in which high-neuroticism agents disclosed personal difficulties, expressed negative emotions, and sought support at substantially higher rates than low-neuroticism agents (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). This convergence between standardised instruments and naturalistic social behaviour provides initial evidence for construct validity\u0026mdash;the personality manipulation altered not only self-reported symptoms but also observable interaction patterns.\u003c/p\u003e \u003cp\u003eA related question is whether such agents can achieve fidelity not only to theoretical patterns but also to individual human responses. Encouragingly, recent work demonstrates that generative agents initialised from detailed personal interviews can replicate participants' own responses on attitudinal surveys at levels approaching human test\u0026ndash;retest reliability (Park et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), and that LLMs can predict personality traits from brief open-ended narratives with accuracy exceeding traditional natural language processing methods (Wright et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2026\u003c/span\u003e). 
The present study did not incorporate individual-level calibration, as agents were initialised from fictional biographical profiles rather than real participant data, and achieving such grounding remains an important target for future work (Sad\u0026eacute;e et al., \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eNevertheless, the current work does not establish that simulated moderation reflects the same latent causal pathways observed in humans. A plausible alternative is that \"neuroticism\" acts primarily as a prompt-level prior that shifts the linguistic style and valence of self-report, which then mechanically maps onto higher symptom scores (Coda-Forno et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Several features of the present results argue against a purely stylistic account \u0026mdash; the neuroticism manipulation produced differential effects across symptom domains rather than a uniform score increase, and it altered social behaviours that are not direct components of questionnaire scoring \u0026mdash; but distinguishing these interpretations fully will require negative-control manipulations (personality dimensions that should not affect depressive symptoms) and cross-model replication.\u003c/p\u003e \u003cp\u003eMore broadly, interpreting LLM agents as surrogates for human psychological processes requires careful qualification. In the synthetic participant literature, silicon-sample approaches have been shown to be sensitive to analytic choices \u0026mdash; including model selection, prompting strategy, temperature, and persona specification \u0026mdash; such that small configuration differences can substantially alter correspondence with human data (Cummins, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). 
These concerns are compounded by findings that autonomous LLM-based respondents can evade standard data-quality checks (Westwood, 2025), and that the integration of LLMs may exacerbate rather than alleviate long-standing challenges of validating agent-based models, given their black-box structure, cultural biases, and stochastic outputs (Larooij and T\u0026ouml;rnberg, 2026). While the present simulation-based approach differs from direct participant replacement \u0026mdash; agents interact within a structured environment rather than simply producing questionnaire responses \u0026mdash; these methodological risks underscore the need for rigorous validation, negative-control experiments, and cross-model replication before simulation outputs can be interpreted as evidence for specific psychological mechanisms.\u003c/p\u003e \u003cp\u003eSeveral limitations should be noted. The sample comprised only six exposed agents, limiting statistical power and generalisability. All simulations used a single LLM (GPT-3.5-turbo), and results may differ across model providers or parameter settings. The simulation spanned five days, leaving open whether longer trajectories would show recovery, chronicity, or other clinically relevant patterns. Finally, LLMs operate within the domain of language and cannot capture non-verbal, physiological, or neurobiological aspects of mental health.\u003c/p\u003e \u003cp\u003eLooking forward, several priorities emerge. First, robustness should be quantified across model providers and prompt perturbations, and simulation outputs should be benchmarked against human reference datasets. 
Second, scaling to larger populations will enable investigation of emergent social phenomena such as stress contagion, social buffering, and network-mediated spread of psychological distress \u0026mdash; dynamics that have been theorised in the psychiatric literature (Fowler and Christakis, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2008\u003c/span\u003e; Scheffer et al., 2024b) but are difficult to study with conventional longitudinal designs. Third, generative agent simulations offer a promising tool for addressing open questions that are difficult to tackle with existing methods \u0026mdash; such as why adverse experiences during specific developmental windows confer disproportionate risk for psychopathology (McGorry et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), how loneliness propagates through communities to affect population-level mental health (Fowler and Christakis, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2008\u003c/span\u003e), and whether specific intervention components can buffer symptom trajectories following adversity (Dragioti et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). By enabling controlled counterfactual experiments on these questions, generative agent models could serve as hypothesis-generating tools for prevention science, complementing traditional randomised trials and observational studies (Bail, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Kambeitz and Meyer-Lindenberg, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eIn summary, LLM-based generative agents offer a promising experimental complement to observational mental health research by enabling controlled, counterfactual tests of socio-environmental adversity and vulnerability.
The present results demonstrate feasibility, theoretically aligned symptom patterns, and convergent evidence across self-report and behavioural measures. To convert this promise into reliable scientific evidence, future work must prioritise rigorous validation against human data, robustness to analytic choices, and transparent reporting of the full simulation pipeline.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eGenerative Agent Framework\u003c/h2\u003e \u003cp\u003eOur simulation was built on the generative agent framework introduced by Park et al. (\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), which employs large language models (LLMs) to create autonomous agents capable of human-like behaviour within a virtual environment. The environment consists of a virtual village comprising houses, a caf\u0026eacute;, a pharmacy, a college dormitory, and public spaces where agents can navigate and interact. Each of the 25 agents was initialised with a unique biographical profile specifying their name, age, occupation, daily routines, relationships with other agents, and personality traits. These profiles were stored as structured text files and loaded into each agent\u0026rsquo;s memory at the start of the simulation.\u003c/p\u003e \u003cp\u003eThe cognitive architecture underlying each agent comprises four modules that jointly determine behaviour (Park et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). \u003cem\u003eFirst\u003c/em\u003e, a \u003cem\u003ememory stream\u003c/em\u003e records all observations, actions, and reflections as timestamped entries, each tagged with an embedding vector and an importance score (rated 1\u0026ndash;10 by the LLM). 
\u003cem\u003eSecond\u003c/em\u003e, a \u003cem\u003eretrieval mechanism\u003c/em\u003e selects relevant memories for each new situation by computing a weighted combination of three factors: recency (exponential decay with a factor of 0.995), importance (the stored score), and relevance (cosine similarity between the memory\u0026rsquo;s embedding and the current query). \u003cem\u003eThird\u003c/em\u003e, a \u003cem\u003ereflection module\u003c/em\u003e periodically synthesises higher-level insights from accumulated memories once their cumulative importance exceeds a threshold of 150 points; these insights are themselves stored back into the memory stream. \u003cem\u003eFourth\u003c/em\u003e, a \u003cem\u003eplanning module\u003c/em\u003e generates hierarchical daily schedules that are recursively decomposed into actions at approximately 5\u0026ndash;15 minute granularity.\u003c/p\u003e \u003cp\u003eAll language generation and scoring calls were routed through OpenAI\u0026rsquo;s \u003cem\u003egpt-3.5-turbo (ChatGPT)\u003c/em\u003e API, consistent with the original framework (Park et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Default API parameters were used throughout (temperature\u0026thinsp;=\u0026thinsp;1.0). Text embeddings for memory retrieval were generated using OpenAI\u0026rsquo;s embedding endpoint.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eSimulation Design and Timeline\u003c/h3\u003e\n\u003cp\u003eThe simulation advanced in discrete time steps, with each step corresponding to 10 seconds of simulated time (360 steps per hour). In total, the simulation spanned five simulated days, divided into consecutive 12-hour segments to facilitate checkpointing and branching. The overall design comprised three phases:\u003c/p\u003e \u003cp\u003e \u003cb\u003eBaseline phase (Days 1\u0026ndash;3).\u003c/b\u003e All 25 agents lived their daily routines in Smallville without any experimental manipulation. 
This 72-hour period allowed agents to accumulate naturalistic memories, form social interactions, and establish stable behavioural patterns. Mental health assessments (PHQ-9 and K10) were administered at the end of each 12-hour segment (i.e., twice per simulated day), yielding six baseline measurement points per agent.\u003c/p\u003e \u003cp\u003e \u003cb\u003eAdverse event phase (Day 4).\u003c/b\u003e On the fourth simulated day, six of the 25 agents (n\u0026thinsp;=\u0026thinsp;6; hereafter \u0026ldquo;target agents\u0026rdquo;) were exposed to individually tailored adverse life events during the second 12-hour segment (i.e., approximately 18 hours into Day 4). The remaining 19 agents continued their routines without any scheduled events and served as non-exposed controls. Both PHQ-9 and K10 assessments were administered at the end of each 12-hour segment of Day 4.\u003c/p\u003e \u003cp\u003e \u003cb\u003eFollow-up phase (Day 5).\u003c/b\u003e All agents continued in the simulation for an additional 24 hours without further experimental events. PHQ-9 and K10 assessments were again administered at the end of each 12-hour segment, providing two follow-up measurements.\u003c/p\u003e \u003cp\u003eIn sum, each agent completed 10 mental health assessments (two per day \u0026times; five days) for each instrument, enabling the analysis of symptom trajectories across baseline, acute exposure, and follow-up.\u003c/p\u003e\n\u003ch3\u003eAdverse Life Events\u003c/h3\u003e\n\u003cp\u003eSix adverse life events were designed to be individually tailored to the biographical context of each target agent (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Events were crafted to span different life domains (family crisis, financial threat, workplace conflict, medical emergency, housing instability, and academic/financial jeopardy) and to represent realistic stressors of moderate to high severity. 
Each event description was written as a narrative vignette that the simulation engine injected directly into the agent\u0026rsquo;s memory stream at the scheduled time step.\u003c/p\u003e \u003cp\u003eTechnically, adverse events were processed as follows. At the designated time step, the event description was added to the agent\u0026rsquo;s memory stream via the standard memory-insertion pathway. The LLM assigned an importance (poignancy) score to each event by evaluating the event description in the context of the agent\u0026rsquo;s persona, ensuring that emotional impact was assessed in a character-specific manner. An embedding vector was computed for the event description to enable subsequent retrieval. Events were tagged with keywords (e.g., \u0026ldquo;inadequacy,\u0026rdquo; \u0026ldquo;self worth,\u0026rdquo; \u0026ldquo;cognitive distortion,\u0026rdquo; \u0026ldquo;rejection\u0026rdquo;) to facilitate retrieval during future interactions and assessments. A forced reflection was triggered approximately 40 seconds (4 time steps) after each adverse event, prompting the agent to generate an introspective entry about their thoughts and feelings.\u003c/p\u003e \u003cp\u003eThe six events were administered sequentially with a 10-minute stagger between agents (60 time steps apart).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAdverse life events administered to target agents on Day 4.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e 
\u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAgent\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEvent Domain\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEvent Description\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMaria Lopez\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFamily crisis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMother suffered a ruptured brain aneurysm; emergency surgery; prognosis uncertain\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eIsabella Rodriguez\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eFinancial threat\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eBank warns of insolvency review for caf\u0026eacute; business; supplier filed for unpaid invoices\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJohn Lin\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eWorkplace conflict\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDispensing-procedure error triggered client complaint; disciplinary investigation initiated\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJane Moreno\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eMedical emergency\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eFather had a stroke; hospitalised in intensive care; asked to review treatment 
consent\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eKlaus Mueller\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHousing instability\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDorm building scheduled for urgent renovation; forced relocation within 30 days\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAyesha Khan\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAcademic/financial\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMerit scholarship placed on hold; tuition balance due within 10 days\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eMental Health Assessment\u003c/h2\u003e \u003cp\u003eAgent mental health was assessed using two standardised self-report instruments: the Patient Health Questionnaire-9 (PHQ-9) (Kroenke et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2001\u003c/span\u003e), a 9-item measure of depressive symptom severity, and the Kessler Psychological Distress Scale (K10) (Kessler et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2002\u003c/span\u003e), a 10-item measure of non-specific psychological distress.
Both instruments were chosen because they are among the most widely used screening tools in psychiatric research and clinical practice and produce ordinal sum scores suitable for longitudinal comparison.\u003c/p\u003e \u003cp\u003eTo administer these instruments, prompts were constructed that included the agent\u0026rsquo;s current persona description along with contextually relevant memories retrieved via the standard retrieval mechanism (combining recency, importance, and relevance scores). The agent\u0026rsquo;s personality traits were included in the prompt to ensure character-consistent responding. The LLM was then asked to respond to each questionnaire item on the standard response scale. Responses were parsed and summed to produce total scores for each instrument. Assessments were scheduled at the end of each 12-hour simulation segment, yielding two assessment time points per simulated day (approximately at 12:00 and 24:00 in simulated time). Over the full five-day simulation, this produced 10 assessment time points per agent per instrument.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eNeuroticism Moderation Analysis\u003c/h3\u003e\n\u003cp\u003eTo examine whether personality traits moderate the impact of adverse life events on mental health outcomes, we conducted a branching experiment manipulating neuroticism levels. 
Starting from the shared baseline state at the end of Day 3, the simulation was forked into three parallel branches for the six target agents:\u003c/p\u003e \u003cp\u003e \u003cstrong\u003e(1) Default neuroticism\u003c/strong\u003e \u003cp\u003eThe original personality profiles were retained as defined in the agents\u0026rsquo; biographical files (the main analysis described above).\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003e(2) High neuroticism\u003c/strong\u003e \u003cp\u003eThe personality descriptions of the six target agents were modified to include elevated neuroticism-related traits (e.g., heightened emotional reactivity, tendency toward worry and rumination, sensitivity to negative events). These modified descriptions were loaded into a separate branch of the simulation that otherwise replicated the identical adverse event schedule.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003e(3) Low neuroticism\u003c/strong\u003e \u003cp\u003eThe personality descriptions were modified to include low neuroticism-related traits (e.g., emotional stability, resilience under stress, capacity to maintain equanimity). This branch also received the identical adverse event schedule.\u003c/p\u003e \u003c/p\u003e \u003cp\u003eImportantly, all three branches inherited the complete memory state from the shared three-day baseline, ensuring that any observed differences in post-event mental health scores could be attributed to the personality manipulation rather than divergent experiential histories. Each branch ran for two additional simulated days (Days 4\u0026ndash;5), with the same assessment schedule as described above. 
The same six target agents (n\u0026thinsp;=\u0026thinsp;6) were assessed across all three conditions, yielding a within-agent, between-condition comparison.\u003c/p\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003eStatistical Analysis\u003c/h2\u003e \u003cp\u003eAll statistical analyses were conducted in R (R Core Team, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). To test the primary hypothesis that adverse life events increase psychological distress, we compared post-event PHQ-9 and K10 scores between exposed (n\u0026thinsp;=\u0026thinsp;6) and non-exposed (n\u0026thinsp;=\u0026thinsp;19) agents using independent-samples Welch\u0026rsquo;s t-tests at each post-event assessment time point.\u003c/p\u003e \u003cp\u003eTo model symptom trajectories over time, we fitted linear mixed-effects models using the lme4 package with PHQ-9 and K10 total scores as dependent variables. Fixed effects included time (assessment time point), group (exposed vs. non-exposed), and their interaction. Agent identity was included as a random intercept to account for repeated measurements within individuals. A significant time \u0026times; group interaction was interpreted as evidence that adverse events altered the trajectory of mental health scores.\u003c/p\u003e \u003cp\u003eFor the neuroticism moderation analysis, we fitted linear mixed-effects models with post-event total scores as the dependent variable, neuroticism condition (low, default, high) and baseline score as fixed effects, and agent identity as a random intercept. Post-hoc pairwise comparisons were conducted using estimated marginal means with Tukey correction for multiple comparisons.
Effect sizes are reported as standardised mean differences (SMD) derived from the emmeans package.\u003c/p\u003e \u003cp\u003eTo examine whether neuroticism differentially moderates specific symptoms, we fitted separate mixed-effects models for each questionnaire item, with item score as the dependent variable, neuroticism condition and baseline item score as fixed effects, and agent as a random intercept. P-values were corrected for multiple comparisons using the false discovery rate (FDR) method. Effect sizes for the high-versus-low neuroticism contrast were computed as standardised mean differences from estimated marginal means.\u003c/p\u003e \u003cp\u003eGiven the exploratory nature of this study and the relatively small sample of agents, we report both p-values and effect sizes (standardised mean differences for pairwise comparisons). The significance threshold was set at α\u0026thinsp;=\u0026thinsp;0.05 for all tests.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eConversational Behaviour Analysis\u003c/h2\u003e \u003cp\u003eTo assess whether adverse events and personality traits influence naturalistic social behaviour, we analysed all post-event conversations involving target agents. Each conversation was classified by GPT-4.1 for five behavioural features: (1) personal difficulty mentioned, (2) negative emotion expressed, (3) support seeking, (4) partner empathy shown, and (5) partner practical help offered. Classification was performed at the conversation level using a structured prompt that presented the full conversation text and requested binary (present/absent) ratings for each feature.
Rates of each feature were compared between exposed and non-exposed agents within each neuroticism condition using Pearson chi-squared tests and Fisher's exact tests, with phi coefficients and odds ratios as effect size measures.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eData and Code Availability\u003c/h2\u003e \u003cp\u003eThe simulation and analysis pipeline is implemented as an open-source toolbox (psyagent) and is available together with all agent configuration files, prompt templates, and simulation logs at \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://github.com/kambeitzlab/psyagent\u003c/span\u003e\u003cspan address=\"https://github.com/kambeitzlab/psyagent\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eGenerative AI\u003c/h2\u003e \u003cp\u003eDuring the preparation of this manuscript, the author used ChatGPT 5.2 (OpenAI) and Claude Opus 4.6 (Anthropic) for language editing and manuscript revision. The author reviewed and edited all AI-generated output and takes full responsibility for the content of the publication.\u003c/p\u003e \u003c/div\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eJ.K. designed the analysis, wrote the code and analysis, and wrote the manuscript. AI (ChatGPT 5.3 and Claude Opus 4.6) was used in designing the analysis, as part of the simulation code, during the analysis, and during the writing of the manuscript.
All text and code were reviewed by the author.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe simulation and analysis pipeline is implemented as an open-source toolbox (psyagent) and is available together with all agent configuration files, prompt templates, and simulation logs at https://github.com/kambeitzlab/psyagent.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAlon, N., Macrynikola, N., Jester, D.J., Keshavan, M., Reynolds, C.F., 3rd, Saxena, S., Thomas, M.L., Torous, J., Jeste, D.V., 2024. Social determinants of mental health in major depressive disorder: Umbrella review of 26 meta-analyses and systematic reviews. Psychiatry Res. 335, 115854.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eArango, C., Dragioti, E., Solmi, M., Cortese, S., Domschke, K., Murray, R.M., Jones, P.B., Uher, R., Carvalho, A.F., Reichenberg, A., Shin, J.I., Andreassen, O.A., Correll, C.U., Fusar-Poli, P., 2021. Risk and protective factors for mental disorders beyond genetics: an evidence-based atlas. World Psychiatry 20, 417\u0026ndash;436.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBail, C.A., 2024. Can Generative AI improve social science? Proc. Natl. Acad. Sci. U. S. A. 121, e2314021121.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBen-Zion, Z., Witte, K., Jagadish, A.K., Duek, O., Harpaz-Rotem, I., Khorsandian, M.-C., Burrer, A., Seifritz, E., Homan, P., Schulz, E., Spiller, T.R., 2025. Assessing and alleviating state anxiety in large language models. NPJ Digit. Med. 8, 132.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBonabeau, E., 2002. Agent-based modeling: methods and techniques for simulating human systems. Proc. Natl. Acad. Sci. U. S. A.
99 Suppl 3, 7280\u0026ndash;7287.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChen, J., Wang, X., Xu, R., Yuan, S., Zhang, Y., Shi, W., Xie, J., Li, S., Yang, R., Zhu, T., Chen, A., Li, N., Chen, L., Hu, C., Wu, S., Ren, S., Fu, Z., Xiao, Y., 2024. From persona to personalization: A survey on role-Playing Language Agents. arXiv [cs.CL].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCoda-Forno, J., Witte, K., Jagadish, A.K., Binz, M., Akata, Z., Schulz, E., 2023. Inducing anxiety in large language models increases exploration and bias. arXiv [cs.CL].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCummins, J., 2025. The threat of analytic flexibility in using large language models to simulate human data: A call to attention. arXiv [cs.CY]. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2509.13397\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2509.13397\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDragioti, E., Radua, J., Solmi, M., Arango, C., Oliver, D., Cortese, S., Jones, P.B., Il Shin, J., Correll, C.U., Fusar-Poli, P., 2022. Global population attributable fraction of potentially modifiable risk factors for mental disorders: a meta-umbrella systematic review. Mol. Psychiatry 27, 3510\u0026ndash;3519.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFerraro, A., Galli, A., La Gatta, V., Postiglione, M., Orlando, G.M., Russo, D., Riccio, G., Romano, A., Moscato, V., 2024. Agent-Based Modelling meets generative AI in social network simulations. arXiv [cs.SI]. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2411.16031\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2411.16031\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFowler, J.H., Christakis, N.A., 2008. Dynamic spread of happiness in a large social network: longitudinal analysis over 20 years in the Framingham Heart Study. BMJ 337, a2338.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGBD 2021 Diseases and Injuries Collaborators, 2024. Global incidence, prevalence, years lived with disability (YLDs), disability-adjusted life-years (DALYs), and healthy life expectancy (HALE) for 371 diseases and injuries in 204 countries and territories and 811 subnational locations, 1990\u0026ndash;2021: a systematic analysis for the Global Burden of Disease Study 2021. Lancet 403, 2133\u0026ndash;2161.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKambeitz, J., Meyer-Lindenberg, A., 2025. Modelling the impact of environmental and social determinants on mental health using generative agents. NPJ Digit. Med. 8. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41746-024-01422-z\u003c/span\u003e\u003cspan address=\"10.1038/s41746-024-01422-z\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKambeitz, J., Schiffman, J., Kambeitz-Ilankovic, L., Mittal, V.A., Ettinger, U., Vogeley, K., 2025. The empirical structure of psychopathology is represented in large language models. Nat. Ment. Health 1\u0026ndash;11.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKendler, K.S., Kuhn, J., Prescott, C.A., 2004. The interrelationship of neuroticism, sex, and stressful life events in the prediction of episodes of major depression. Am. J. 
Psychiatry 161, 631\u0026ndash;636.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKessler, R.C., Andrews, G., Colpe, L.J., Hiripi, E., Mroczek, D.K., Normand, S.L.T., Walters, E.E., Zaslavsky, A.M., 2002. Short screening scales to monitor population prevalences and trends in non-specific psychological distress. Psychol. Med. 32, 959\u0026ndash;976.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKotov, R., Gamez, W., Schmidt, F., Watson, D., 2010. Linking \u0026ldquo;big\u0026rdquo; personality traits to anxiety, depressive, and substance use disorders: a meta-analysis. Psychol. Bull. 136, 768\u0026ndash;821.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKroenke, K., Spitzer, R.L., Williams, J.B., 2001. The PHQ-9: validity of a brief depression severity measure. J. Gen. Intern. Med. 16, 606\u0026ndash;613.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLi, J., Wang, S., Zhang, M., Li, W., Lai, Y., Kang, X., Ma, W., Liu, Y., 2024. Agent Hospital: A simulacrum of hospital with evolvable medical agents. arXiv [cs.AI].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMarinescu, I.E., Lawlor, P.N., Kording, K.P., 2018. Quasi-experimental causality in neuroscience and behavioural research. Nat. Hum. Behav. 
2, 891\u0026ndash;898.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMcGorry, P.D., Mei, C., Dalal, N., Alvarez-Jimenez, M., Blakemore, S.-J., Browne, V., Dooley, B., Hickie, I.B., Jones, P.B., McDaid, D., Mihalopoulos, C., Wood, S.J., El Azzouzi, F.A., Fazio, J., Gow, E., Hanjabam, S., Hayes, A., Morris, A., Pang, E., Paramasivam, K., Quagliato Nogueira, I., Tan, J., Adelsheim, S., Broome, M.R., Cannon, M., Chanen, A.M., Chen, E.Y.H., Danese, A., Davis, M., Ford, T., Gonsalves, P.P., Hamilton, M.P., Henderson, J., John, A., Kay-Lambkin, F., Le, L.K.-D., Kieling, C., Mac Dhonnag\u0026aacute;in, N., Malla, A., Nieman, D.H., Rickwood, D., Robinson, J., Shah, J.L., Singh, S., Soosay, I., Tee, K., Twenge, J., Valmaggia, L., van Amelsvoort, T., Verma, S., Wilson, J., Yung, A., Iyer, S.N., Killackey, E., 2024. The Lancet Psychiatry Commission on youth mental health. Lancet Psychiatry 11, 731\u0026ndash;774.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMonroe, S.M., Simons, A.D., 1991. Diathesis-stress theories in the context of life stress research: implications for the depressive disorders. Psychol. Bull. 110, 406\u0026ndash;425.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003e\u0026Ouml;ng\u0026uuml;r, D., Paulus, M.P., 2025. Embracing complexity in psychiatry-from reductionistic to systems approaches. Lancet Psychiatry 12, 220\u0026ndash;227.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOrmel, J., Oldehinkel, A.J., Brilman, E.I., 2001. The interplay and etiological continuity of neuroticism, difficulties, and life events in the etiology of major and subsyndromal, first and recurrent depressive episodes in later life. Am. J. Psychiatry 158, 885\u0026ndash;891.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePark, J.S., O\u0026rsquo;Brien, J.C., Cai, C.J., Morris, M.R., Liang, P., Bernstein, M.S., 2023. Generative agents: Interactive simulacra of human behavior. 
arXiv [cs.HC].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePark, J.S., Zou, C.Q., Shaw, A., Hill, B.M., Cai, C., Morris, M.R., Willer, R., Liang, P., Bernstein, M.S., 2024. Generative agent simulations of 1,000 people. arXiv [cs.AI].\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eR Core Team, 2024. R: A Language and Environment for Statistical Computing.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSad\u0026eacute;e, C., Testa, S., Barba, T., Hartmann, K., Schuessler, M., Thieme, A., Church, G.M., Okoye, I., Hernandez-Boussard, T., Hood, L., Shmulevich, I., Kuhl, E., Gevaert, O., 2025. Medical digital twins: enabling precision medicine and medical artificial intelligence. Lancet Digit. Health 0, 100864.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eScheffer, M., Bockting, C.L., Borsboom, D., Cools, R., Delecroix, C., Hartmann, J.A., Kendler, K.S., van de Leemput, I., van der Maas, H.L.J., van Nes, E., Mattson, M., McGorry, P.D., Nelson, B., 2024a. A dynamical systems view of psychiatric disorders-practical implications: A review. JAMA Psychiatry 81, 624\u0026ndash;630.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eScheffer, M., Bockting, C.L., Borsboom, D., Cools, R., Delecroix, C., Hartmann, J.A., Kendler, K.S., van de Leemput, I., van der Maas, H.L.J., van Nes, E., Mattson, M., McGorry, P.D., Nelson, B., 2024b. A dynamical systems view of psychiatric disorders-theory: A review. JAMA Psychiatry 81, 618\u0026ndash;623.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSerapio-Garc\u0026iacute;a, G., Safdari, M., Crepy, C., Sun, L., Fitz, S., Romero, P., Abdulhai, M., Faust, A., Matarić, M., 2025. A psychometric framework for evaluating and shaping personality traits in large language models. Nat. Mach. Intell. 7, 1954\u0026ndash;1968.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShanahan, M., McDonell, K., Reynolds, L., 2023.
Role play with large language models. Nature 623, 493\u0026ndash;498.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTracy, M., Cerd\u0026aacute;, M., Keyes, K.M., 2018. Agent-based modeling in public health: Current applications and future directions. Annu. Rev. Public Health 39, 77\u0026ndash;94.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWright, A.G.C., Ringwald, W.R., Vize, C.E., Eichstaedt, J.C., Angstadt, M., Taxali, A., Sripada, C., 2026. Assessing personality using zero-shot generative AI scoring of brief open-ended text. Nat. Hum. Behav. 1\u0026ndash;15.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang, Y., Langellier, B.A., Stankov, I., Purtle, J., Nelson, K.L., Reinhard, E., Van Lenthe, F.J., Diez Roux, A.V., 2020. Public transit and depression among older adults: using agent-based models to examine plausible impacts of a free bus policy. J. Epidemiol. Community Health 74, 875\u0026ndash;881.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYang, Z., Zhang, Z., Zheng, Z., Jiang, Y., Gan, Z., Wang, Z., Ling, Z., Chen, J., Ma, M., Dong, B., Gupta, P., Hu, S., Yin, Z., Li, G., Jia, X., Wang, L., Ghanem, B., Lu, H., Lu, C., Ouyang, W., Qiao, Y., Torr, P., Shao, J., 2024. OASIS: Open Agent Social Interaction Simulations with One Million Agents. arXiv [cs.CL]. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2411.11581\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2411.11581\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZubin, J., Spring, B., 1977. Vulnerability\u0026ndash;a new view of schizophrenia. J. Abnorm. Psychol. 
86, 103\u0026ndash;126.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"npj-digital-medicine","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"npjdigitalmed","sideBox":"Learn more about [npj Digital Medicine](http://www.nature.com/npjdigitalmed/)","snPcode":"41746","submissionUrl":"https://submission.springernature.com/new-submission/41746/3","title":"npj Digital Medicine","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"NPJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-9291631/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9291631/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eMental disorders are largely shaped by socio-environmental adversity, yet causal experiments on major stressors are typically infeasible in humans. Here, we propose a simulation framework using LLM-based generative agents to model the behaviour and mental health of individuals. A total of 25 agents were simulated for 5 days, with 6 agents being exposed to personalised adverse events while continuously monitoring mental health with established clinical instruments. Adversity produced acute increases in depression and stress-related symptoms relative to controls, with elevated scores persisting thereafter. 
Moreover, the impact of adverse events was moderated by vulnerability factors: agents with high neuroticism showed stronger increases relative to agents with low neuroticism. Analysis of agents\u0026rsquo; conversations revealed that exposed agents disclosed personal difficulties and sought support at rates that increased with neuroticism level, providing converging behavioural evidence. Overall, the present results support the feasibility of LLM-based generative agents for controlled, counterfactual experiments on socio-environmental drivers of mental health.\u003c/p\u003e","manuscriptTitle":"LLM-Based Generative Agents Simulate Mental-Health Trajectories Under Adverse Socio-Environmental Conditions","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-28 00:05:15","doi":"10.21203/rs.3.rs-9291631/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"reviewerAgreed","content":"229882404449790306270305458492863222722","date":"2026-05-14T16:50:09+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-02T02:51:30+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"97497857085051732912735575769825808627","date":"2026-04-20T14:17:37+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-04-20T01:47:36+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-04-05T23:23:19+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-04-04T23:53:16+00:00","index":"","fulltext":""},{"type":"submitted","content":"npj Digital Medicine","date":"2026-04-01T11:56:33+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"npj-digital-medicine","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"npjdigitalmed","sideBox":"Learn more about [npj Digital 
Medicine](http://www.nature.com/npjdigitalmed/)","snPcode":"41746","submissionUrl":"https://submission.springernature.com/new-submission/41746/3","title":"npj Digital Medicine","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"NPJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"90a5d5f9-4c53-4d01-9f3d-efc73acf8d37","owner":[],"postedDate":"April 28th, 2026","published":true,"recentEditorialEvents":[{"type":"reviewerAgreed","content":"229882404449790306270305458492863222722","date":"2026-05-14T16:50:09+00:00","index":22,"fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-02T02:51:30+00:00","index":16,"fulltext":""}],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":67121492,"name":"Biological sciences/Psychology"},{"id":67121493,"name":"Social science/Psychology"},{"id":67121494,"name":"Health sciences/Risk factors"}],"tags":[],"updatedAt":"2026-04-28T00:05:15+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-28 00:05:15","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9291631","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9291631","identity":"rs-9291631","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

