Collaborative quality assessment of simulation-based interactive learning environments for climate education

Research Article (Research Square preprint, Version 1)

Jefferson K. Rajah, Andreas Nicolaidis Lindqvist, Theresia B. Putranti, and 2 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8843292/v1
This work is licensed under a CC BY 4.0 License.

Abstract

Simulation-based interactive learning environments (ILEs) are widely used in climate education to foster systems understanding and experiential learning. Yet, most evaluations emphasize predefined learning outcomes, offering limited insight into how learners experience these tools and what quality means to them. This study introduces an adapted Collaborative Quality Assessment (Co-QA) framework as a participatory, developmental evaluation approach for simulation-based ILEs.
This approach invites users to co-define quality criteria and assess tools against those criteria, enabling a context-sensitive appraisal of instructional design decisions. We applied Co-QA to En-ROADS, a climate policy simulator, across eight workshops with 104 learners. Our mixed-methods design combined qualitative and quantitative assessments structured around five quality principles: salience, accessibility, credibility, legitimacy, and effectiveness for systems understanding. By foregrounding learner perspectives, Co-QA reveals process-based dimensions of quality that conventional outcome metrics may overlook. Our findings indicate that the ILE supports broad exploration of climate policies and awareness of the systemic nature of climate action, but that opaque causal mechanisms and limited social contextualization impede deeper systems understanding and actionability. We position Co-QA as a formative design cycle that generates actionable implications for design. Specifically, we recommend (1) micro-explanations and visualizations to enhance transparency; (2) narratives and impact visuals to improve representational relatability; (3) broadened policy space to support personally relevant experimentation; and (4) implementation framing and pathways to bolster feasibility and actionability of climate solutions. We contend that Co-QA helps advance transparent, participatory, and evidence-based standards for evaluating and designing simulation-based ILEs for climate education.

Keywords: Environmental Policy, simulation-based learning, interactive learning environments, climate education, knowledge quality assessment, collaborative formative evaluation, developmental research

Introduction

Simulation-based interactive learning environments (ILEs) are often deployed as an educational technology to expose learners to the model-based insights from a formal simulation model.
Beyond the model itself, these tools typically comprise a human-computer interaction interface and a gaming functionality (Maier & Größler, 2000; Rouwette et al., 2004). They enable learners to actively engage with the model by making decisions to manipulate inputs and observe real-time simulated outcomes, all within a game-like setting that includes decision timing, user competition, and a contextual narrative. Like most ILEs, simulation-based ones are grounded in experiential learning theory (Kolb, 2015), which emphasizes learning through doing, reflection, and iterative feedback. Simulation-based ILEs further emphasize learning objectives related to “declarative knowledge (knowing that) as well as procedural knowledge (knowing how) and structural knowledge (knowing why)” (Maier & Größler, 2000, p. 139). For learners, this means developing a deeper understanding of the modelled system by actively experimenting with decisions, receiving feedback, and refining their mental models of the system’s underlying causal logic (Deegan et al., 2014; Kopainsky & Alessi, 2015; Kopainsky & Sawicka, 2011). In this way, the instructional goal of such ILEs tends to be directed towards fostering systems understanding and informed decision-making through simulation-based experimentation. In the domain of climate education, simulation-based ILEs have gained prominence through tools like Climate Rapid Overview and Decision Support (C-ROADS; Sterman et al., 2012, 2013, 2015) and Energy Rapid Overview and Decision Support (En-ROADS; Kapmeier et al., 2021; Rooney-Varga et al., 2020). While C-ROADS focuses on testing country-level and regional emissions pledges, En-ROADS explores the global cross-sector climate impacts of various climate mitigation options, including carbon pricing, renewable energy adoption, and land-use changes (Sterman et al., 2013).
En-ROADS provides an interactive environment for learners to test climate-relevant solutions, supported by facilitated learning formats: the En-ROADS Climate Workshop (ECW) and the Climate Action Simulation (CAS) role-playing game (see Climate Interactive, 2025). In both formats, learners work with the educational technology to identify and test strategies for limiting global temperature rise to well below 2℃ by 2100. The role-playing game has the added complexity of climate negotiations, where learners represent diverse and often conflicting interest groups. Today, En-ROADS is considered a state-of-the-art simulation-based ILE, with a cumulative reach of over 350,000 learners across 165 countries (Climate Interactive, 2025). It has been used in classroom settings, policy workshops, and corporate training to foster systems thinking and collaborative problem-solving. Its educational impact has also been studied, showing significant increases in learners’ understanding of climate change causes, impacts, and solutions as well as personal and emotional engagement with climate issues (Rooney-Varga et al., 2020). A follow-up study suggests that longer-term outcomes on understanding, affective engagement, intent to act, and real-world action persist over time (Rooney-Varga et al., 2025). While evaluations of En-ROADS have demonstrated its effectiveness in promoting climate literacy, they have primarily focused on predefined learning outcomes such as knowledge gains and behavioural intentions. Such evaluation frameworks are valuable for understanding the ILE’s broad educational impacts. However, predefined outcomes-based metrics may overlook the experiential, process-based dimensions that shape user engagement and learning. In other words, they offer limited insight into how learners experience the ILE, particularly in terms of usability and alignment with a plurality of learning goals and criteria for evaluating its quality.
This, in turn, limits our understanding of how such ILEs might be better tailored to diverse learner needs. To enrich this perspective, we propose a complementary collaborative evaluation framework that invites learners to co-define quality criteria based on their own goals and interactions with the ILE. This approach supports a more context-sensitive and responsive assessment of simulation-based ILEs. In developing this framework, we turned to scholarship on knowledge quality assessment (KQA) in contexts where science is used to educate and inform action. Cash et al. (2003) introduced the foundational triad of quality criteria for evaluating a knowledge system’s fitness for addressing societal challenges: salience (relevance to user needs), credibility (scientific rigour), and legitimacy (trust in the knowledge production process). Since then, scholars have expanded these criteria to include dimensions such as usability (accessibility and applicability to users) and effectiveness (contribution to positive change) (Belcher et al., 2016; Lemos & Morehouse, 2005). Bremer et al. (2022, p. 2) caution against uncritical applications of such “a priori principles of quality,” arguing that they may obscure the nuanced and contingent ways in which quality is understood in particular user contexts. To address this concern, Bremer et al. (2021) built on earlier work on the science-policy interface and knowledge co-production (Bremer et al., 2019) to develop the Collaborative Quality Assessment (Co-QA) framework. As a KQA technology, Co-QA supports systematic, critical analysis of uncertainties, assumptions and dissent relative to science’s fitness for function in public decision-making (van der Sluijs et al., 2008). It was designed for co-production processes in climate services, enabling ‘users’ and ‘producers’ of climate information to collaboratively define and assess quality criteria that are meaningful within their specific use contexts.
In doing so, it offers “a way of bridging knowledge quality expectations across all actors in a knowledge system” (Bremer et al., 2021, p. 4). This study draws on Co-QA to co-evaluate a simulation-based ILE – particularly En-ROADS – for assessing its quality for climate education. Here, we must first note some caveats. First, Co-QA was developed in the context of climate services, where ‘user groups’ in some contexts have quite clearly defined decision-making needs and expectations regarding climate information products. This presupposition presents a challenge when applied to simulation-based ILEs, which are broadly designed for exploratory and educational engagement. Here, learners may not enter with predefined needs or expectations that can be meaningfully elicited without prior experience with the tool. Second, the Co-QA framework was proposed to be completed through the direct participation of actors engaged in co-production, who identify and write down their own quality criteria; this demands that a small group meet and interact over an extended period. Yet, the ILE studied here engages a relatively large group of learners in sessions lasting a few hours. This demanded a design for eliciting criteria from a large group in a limited time. Our co-evaluation framework therefore began with a series of workshops with a sample of learners to collaboratively define quality criteria. These criteria were then mobilized in subsequent group interviews and surveys to iteratively evaluate the ILE through critical dialogue with learners. This adaptation facilitates reflection on how learners perceive the ILE’s quality in relation to their educational goals and needs. In the next section, we detail our adapted Co-QA framework for co-evaluating En-ROADS as a case study.
As alluded to, our approach has two distinct phases: (1) workshops to collaboratively define learners’ quality criteria, which are then used to develop evaluation instruments, and (2) workshops to collaboratively assess the quality of En-ROADS using the developed instruments. We then present the results of our Co-QA across five quality principles abstracted from the co-defined quality criteria. While En-ROADS serves as the context of this study, our aim is to demonstrate how a Co-QA approach can generate additional insights into the design and relevance of simulation-based ILEs more broadly. We conclude by discussing these insights in relation to the strengths and limitations of our approach, and by reflecting on opportunities for advancing Co-QA of simulation-based ILEs.

Materials and Methods

This study employs a mixed-methods, developmental evaluation approach, integrating both qualitative and quantitative data in the collaborative quality assessment. In this section, we outline the methodological process undertaken for the collaborative quality assessment, as illustrated in Fig. 1, including the strategies used for data collection and analysis in each phase of the process. This evaluation approach is conducted within workshop settings where participants engage with En-ROADS, either in the ECW or the CAS format. We selected En-ROADS due to its status as a state-of-the-art simulation-based ILE, and because two authors are trained En-ROADS Climate Ambassadors, making it well-suited for use in facilitated workshops and data collection. The evaluation comprises two phases: (1) co-creation of learner-defined quality criteria via thematic analysis of post-workshop group interviews; and (2) structured co-evaluation using a survey within facilitated group interviews to collect descriptive quantitative evidence and qualitative explanations. Participation in the assessment was entirely voluntary, and participants could decline to answer any question at any time.
This design is intentionally formative. That is, the data collection and analysis serve to demonstrate our Co-QA approach and generate evidence-linked design guidance consistent with developmental research priorities in instructional technology (Richey et al., 2004).

Co-creation of quality criteria

To elicit the quality criteria relevant to learners, we conducted three workshops with graduate students at two different universities (see Table 1). After playing through En-ROADS, learners co-evaluated the tool in facilitated discussion. Here, we used semi-structured group interviews to elicit learners’ perceptions of the fitness for purpose, trust in the information provided, impression of the learning experience, and suggested improvements to better fulfil their learning goals (see Appendix A). Interview transcripts were analysed to identify emergent quality criteria within the learning context, capturing the process-based dimensions of user engagement.

Table 1. Sources of data collection for the co-creation of quality criteria. University names have been removed for double-blind peer review.

Date | Location | Format | N | Participant Profile
28 Nov 2023 | University in Norway | CAS | 20 | Master’s students in system dynamics
11 Jan 2024 | University in Germany | ECW | 10 | Master’s students in sustainability and digitalization
21 Feb 2024 | University in Norway | ECW | 21 | Master’s students in sustainability

For the data analysis, we uploaded the transcripts to NVivo 14, where we employed an inductive and iterative thematic analysis at the latent level (Braun & Clarke, 2006) to interpret the ways participants expressed judgement and discussed quality. We selected this method for its flexibility and suitability in uncovering themes that emerge from learners’ “underlying ideas, assumptions, and conceptualizations” (Braun & Clarke, 2006, p. 84) – i.e., the implicit quality criteria employed in their evaluations.
Latent themes and interview talk were then coded relative to ‘quality criteria’, and these criteria were ordered in relation to each other. Finally, emergent criteria codes were viewed in concert with established quality principles from the literature. This deductive step aligns learners’ implicit quality criteria to established theory while remaining sensitive to context (Bremer et al., 2022). We finished with a set of quality criteria that emerged at this interface where ‘bottom-up’ (empirical) meets ‘top-down’ (theoretical) (see also Appendix B).

Assessment interviews and surveys

Drawing on the quality criteria co-defined in the first exploratory phase, we developed a survey to collect quantitative data and, importantly, structure the critical dialogue for evaluating En-ROADS. Quality was operationalized as five dimensions: salience, accessibility, credibility, legitimacy, and effectiveness. The quality dimensions are treated as latent constructs that are not directly observable; instead, they are inferred through observable indicators (i.e., criteria elicited from learners) that describe what increasing levels of the quality dimension look like. Indicators are then operationalized into Likert-type survey items, which are the specific questions or statements posed to learners during the evaluation. Responses (on 4-point or 5-point scales) reflect learners’ appraisal of En-ROADS relative to each quality criterion and serve as evidence for inferring the latent quality of the ILE experience. We deployed the survey with a total of 104 learners after they had used En-ROADS: seven groups of graduate students and one group of high school students (see Table 2). The evaluation was conducted after the facilitated learning workshop and structured by the survey, using Mentimeter (an interactive presentation tool). Learners first responded to the survey questions individually and then collectively reflected on the scoring after each response.
The group interview was facilitated by two guiding questions: (1) What are some important features or aspects of En-ROADS that influenced your rating? (2) Can you think of additional features or changes you would make to enhance the performance of En-ROADS for this dimension? Through this approach, we capture both quantitative data in terms of learners’ ratings of the ILE relative to quality criteria and qualitative insights on the underlying reasoning for these scores.

Table 2. Sources of data collection for the assessment. University names have been removed for double-blind peer review.

Date | Location | Format | N | Participant Profile
16 Sep 2024 | University in Norway | ECW | 11 | Master’s students in geography
20 Sep 2024 | University in Switzerland | ECW | 8 | Master’s students in agriculture
03 Oct 2024 | University in Iceland | ECW | 8 | Master’s students in coastal communities and regional development
08 Nov 2024 | High School in Switzerland | CAS | 32 | Adult high school students
25 Nov 2024 | University in Norway | CAS | 17 | Master’s students in system dynamics
22 Aug 2025 | University in Norway | CAS | 18 | Master’s students in system dynamics
19 Sep 2025 | University in Switzerland | ECW | 2 | Master’s students in agriculture
03 Oct 2025 | University in Norway | ECW | 8 | Master’s students in geography

Of the 104 learners who participated in the workshops, 88 responded to more than three survey items and were included in this study. 64% of the respondents belonged to the 18–25 age group, 19% to the 26–35 group, 3% were older than 35, and 14% did not answer. 67% of the respondents were European, 8% North American, 7% West and East Asian, 2% North African, and 16% did not answer. As for education level, 49% held a bachelor’s degree, 11% held a master’s degree, 26% were non-degree holders, and 14% did not answer. During the data collection period, we iterated on the survey items based on feedback and preliminary analysis of the scores.
In some co-evaluation workshops, the facilitator also had to skip a few questions given time constraints in the classroom setting. As a result, items were systematically excluded for some groups in the final dataset. The quantitative data were analysed in R (version 4.5.1) to summarize the descriptive statistics of the responses. Specifically, we calculated the percentage frequency of each response category and visualized the distribution of the raw scores for each item (i.e., quality criterion). Given the formative purpose of the evaluation, we report descriptive statistics (frequency distributions) rather than inferential statistics. The qualitative data (transcripts of the co-evaluation) were analysed in NVivo 14 using a deductive thematic analysis approach (Braun & Clarke, 2006). Given that the evaluation was structured by the survey, the co-defined quality criteria (used as child codes) and the broader quality principles (used as parent codes) formed the coding framework (see Appendix C).

Results

Through our exploratory co-creation phase, we identified 17 quality criteria under five broader constructs or principles: salience, accessibility, credibility, legitimacy, and effectiveness for building systems understanding competency (see Fig. 2). For each construct, we juxtapose the distribution of survey responses with qualitative insights from the explanations groups offered for their ratings during the critical reflections.

Salience

Salience broadly refers to the relevance of the knowledge or service provided in relation to the key priorities and concerns of its users (Cash et al., 2003). In the context of En-ROADS, salience refers to how relevant, relatable, and practical information and insights are for learners. Learners deemed the ILE to be salient when it provides practical insights on climate change and mitigation, offers opportunities to challenge their mental models or preconceptions, and motivates climate action in their everyday lives.
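As a rough illustration of the descriptive analysis described in the methods, the following Python sketch (the study itself used R) applies the inclusion rule of more than three answered items and then computes the percentage frequency of each response category over valid responses for one item. The item names, scale labels, and data are hypothetical placeholders; they are not the study's instrument or results.

```python
from collections import Counter

# Hypothetical 4-point scale labels (assumption, not the actual survey wording).
SCALE = ["none", "a few", "fair amount", "a lot"]


def is_included(answers, min_items=3):
    """Return True if the respondent answered more than `min_items` items."""
    answered = sum(1 for value in answers.values() if value is not None)
    return answered > min_items


def percentage_frequencies(responses, scale=SCALE):
    """Percentage of valid (non-missing) responses falling in each category."""
    valid = [r for r in responses if r is not None]
    if not valid:
        return {category: 0.0 for category in scale}
    counts = Counter(valid)  # categories never chosen count as 0
    return {
        category: round(100 * counts[category] / len(valid), 1)
        for category in scale
    }


# Made-up responses; None marks a skipped or unanswered item.
respondents = [
    {"practical_insights": "a lot", "experimentation": "fair amount",
     "inspire_action": "a few", "ease_of_use": "a lot"},
    {"practical_insights": "fair amount", "experimentation": "fair amount",
     "inspire_action": "none", "ease_of_use": "fair amount"},
    {"practical_insights": "fair amount", "experimentation": "a few",
     "inspire_action": "a few", "ease_of_use": "a lot"},
    {"practical_insights": None, "experimentation": "a lot",
     "inspire_action": "a few", "ease_of_use": "a lot",
     "trust_in_inputs": "fair amount"},
    # Only two answered items, so this respondent is excluded.
    {"practical_insights": "a lot", "experimentation": None,
     "inspire_action": None, "ease_of_use": "a few"},
]

included = [r for r in respondents if is_included(r)]
summary = percentage_frequencies(
    [r.get("practical_insights") for r in included]
)
print(len(included), summary)
```

Computing percentages over valid responses only means that skipped items shrink the denominator for that item rather than counting as a response category, which is one plausible reading of "valid responses" in the results.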
Figure 3 depicts the distribution of valid responses for the Salience construct across the three co-defined criteria: Practical Insights, Experimentation, and Inspire Action. Practical Insights. At least two-thirds of the respondents rated the ILE positively for providing practical insights (36.2% fair amount; 31.2% a lot). Learners valued the wide scope of climate solutions spanning multiple sectors. Several noted that experimenting with policy options raised their awareness that addressing climate change requires a combination of strategies rather than a single “silver bullet.” As for the remaining third, their ratings stemmed largely from an inability to personally relate to the information provided. Learners frequently noted that policy effects were presented in terms of systemic variables (e.g. temperature rise) rather than societal impacts on livelihoods. They wanted to know what their everyday lives would look like under certain policy scenarios, including the trade-offs involved. One group even proposed “flipping the script,” where the primary goal is designing future sustainable ways of living, with climate outcomes viewed as consequences of those scenarios. Experimentation. Most respondents (57.5%) indicated that they were able to test a fair amount of their preconceived ideas using the ILE. Learners generally perceived En-ROADS as offering a good coverage of policies found in mainstream discourse. They also noted that the simulations confirmed their understanding of major drivers of climate change, yet they were at times surprised by the effects of certain policies, which prompted a desire to explore the underlying mechanisms further. Conversely, 30% of respondents reported being able to test few or none of their ideas. These negative ratings were primarily attributed to the level of aggregation in En-ROADS as a global model. 
Specifically, several learners mentioned that they could not test policies related to individual lifestyle choices or more granular sector-specific interventions. Others expressed interest in testing “radical solutions” beyond mainstream discourse, such as degrowth policies, that were not offered in the ILE. Inspire Action. The En-ROADS ILE was less effective in motivating climate action among learners. Nearly half of the respondents indicated the ILE only inspired broad notions of climate action that were not actionable in their everyday lives. Learners mentioned that the impactful policies appeared actionable only for “big key players” beyond their locus of control, “unless you want become like a full-time activist.” As a result, some learners reported feeling demoralized, perceiving climate change as inevitable, while others rationalized inaction, believing that individual action was pointless. About one-third of respondents indicated that the ILE motivated climate action that they would at least consider. Here, they emphasized that they became aware that climate mitigation is a collective action problem, and that this awareness could influence attitudes and encourage coordinated efforts to create pressure in areas of the system with the greatest impact.

Accessibility

Accessibility has been discussed, by Lemos and Morehouse (2005) for example, as ‘usability’: the extent to which knowledge or a service/product is usable, understandable, and inclusive for a diverse range of users. In our context, this category refers to the usability of the ILE product itself. The ILE is deemed accessible when it can be easily navigated (Ease of Use), its elements can be easily interpreted and related to the real-world system (Relatable Representations), and its inner workings or causal logic can be easily perceived and understood by learners (Transparent Inner Workings). Figure 4 presents the results of the survey across these three co-defined criteria. Ease of Use.
Most learners found the En-ROADS ILE to be easily navigable (57%), with some requiring minor clarifications from the facilitator (38%). Across all workshops, there was a consensus that the design of the sliders in the tool was straightforward to use. Although the facilitator reminded learners that the ellipses button provided more in-depth information, one learner suggested that the functionality was easily discoverable either way. An interesting note from one of the groups was that the time pressure prevented them from reading the information provided too thoroughly, leading them to work with the concepts represented by sliders more abstractly. Relatable Representations. In response to the statement that the concepts in the ILE were concretely relatable to learners in terms of their knowledge of the real-world system, about 14% strongly agreed, 49% agreed, 23% were neutral, 12% disagreed and less than 2% strongly disagreed. Here, learners appreciated that the tool provided supplementary contextual information for each slider. However, several learners mentioned that the sliders in the ILE (e.g., taxes/subsidies) failed to provide them with a sense of scale since they did not have a reference value for ascertaining if their input was large or small. Importantly, learners wanted more visualization of climate impacts beyond quantitative figures and pointed to the sea-level induced flooding map of countries as an exemplar. They further requested narratives to contextualize the figures to lived realities. Transparent Inner Workings. About half of the respondents agreed or strongly agreed that they could understand why their inputs result in the observed outputs. About 27% remained neutral while about 20% disagreed or strongly disagreed. A common theme in the discussion was that the En-ROADS model appeared to be a ‘black box’ since the inner workings of the model are not accessible to users.
As a result, learners were able to perceive how a change in input resulted in changes in the outputs but could not explain any counterintuitive results. Here, they mentioned that they relied on explanations from the facilitator, who was privy to the model. While some learners thought it was acceptable given the nature of the user interface, others wanted the ability to inspect the model structure themselves. Learners specifically desired more transparency in relation to policy interactions; for instance, without the facilitator’s explanation, they would not have known that the impact of electrification could be nullified by prior policies on renewables.

Credibility

Credibility relates to user perceptions of the scientific accuracy or adequacy of the arguments and evidence presented by the knowledge system (Cash et al., 2003). In our context, learners perceive the ILE to be credible when they consider it to have adequately represented the real-world system (External Consistency), accurately captured the dynamics of the system in terms of causal logic (Internal Consistency), and provided realistic and feasible insights for climate mitigation (Policy Feasibility). The quantitative assessment of the ILE along this construct is depicted in Fig. 5. External Consistency. Most respondents (52%) found the tool to have adequately represented the real-world system. As previously mentioned, learners perceived the scope of the climate solutions presented in the tool to be fairly comprehensive, especially given its level of aggregation at the global scale and focus on mitigation. The other half of respondents, who remained neutral or disagreed, desired more detail complexity. For instance, there was an impression that the model was biased towards the Global North; consequently, learners wanted more disaggregation between regions to capture inequalities in the world.
Another major theme was behavioural change – that is, they wanted more specificity in the social system to reflect changing individual preferences and consumption patterns. Internal Consistency. Interestingly, most respondents (56%) were neutral to the statement that the tool accurately translated inputs into outputs. During the discussion, it was clear that learners were unable to determine the accuracy given that the causal logic of the model was not accessible to them. They either took the outputs for granted, amounting to “a kind of blind trust in the model,” or inferred the causal logic based on their background knowledge and facilitator explanation. Policy Feasibility. Responses were split for assessing the feasibility of climate solutions presented in the ILE: about 48% indicated that a fair amount were feasible, whereas 42% thought only a few were. In general, there was a sentiment that the tool was suitable for assessing the effectiveness of hypothetical policy measures, but not the feasibility of actual implementation. To better assess feasibility, learners suggested that the tool should describe how the policy measures could be implemented in the real world and consider the costs associated with each policy – both in terms of monetary cost and other social costs and externalities.

Legitimacy

Legitimacy relates to user trust in the sources and production of information, and the presentation of information in the knowledge system, leading to its acceptance (Cash et al., 2003). In our context, learners deemed the ILE to be legitimate when users trust it as an important source of information based on the veracity of its inputs and outputs, beyond simply appealing to the authority and expertise of the facilitator(s). Figure 6 shows the assessment results for the three dimensions of Legitimacy: Appeal to Authority, Trust in Inputs, and Trust in Outputs. Appeal to Authority.
Most respondents (76%) indicated that the facilitator’s expertise enhanced their trust in the information provided in the ILE. Only about 15% of respondents indicated that their trust was independent of the facilitator, whereas less than 10% were wholly dependent on the facilitator’s expertise. These results suggest that legitimacy in the ILE is partially anchored in social cues of expertise, particularly because learners did not think they would have been able to fully understand the outputs they observed. As one learner explained, “…it required your explanation as to why things [happen]…you could see the result, but you didn’t understand why it was counterintuitive.” This reliance could undermine the tool’s legitimacy as a standalone resource. Trust in Inputs and Outputs. Generally, learners seemed to trust the inputs used to build the ILE, with 48% indicating almost all were justifiable and 44% indicating a fair amount. However, trust in outputs was more cautious: most respondents (63%) reported they would use the tool primarily as a supplementary source of climate information rather than a fundamental source (17%). During the discussion, it became clear that learners’ trust in inputs was influenced by an appeal to authority; they relied on the reputation of the modellers from a highly prestigious institution (MIT) despite lacking visibility into the model structure, data sources, or underlying assumptions within the ILE. Consequently, learners expressed greater confidence in the directionality and relative magnitude of changes in outputs rather than their precision. For that purpose, they indicated that they would cross-reference the results with other tools. Several learners also reiterated that their trust would improve if they could “open up the black box” and understand the causal logic driving the model.

Effectiveness

Simulation-based ILEs are generally intended to foster systems understanding (Kopainsky & Sawicka, 2011).
In our study, learners engaged with En-ROADS within a systems thinking context and naturally evaluated the tool against its ability to illuminate interrelationships within the system. As one learner put it: “I don’t actually get an actual systems understanding of the drivers of climate change. I only get to see that this is the policy we’re going to put and I’m going to see the consequences.” We mapped this to the quality principle of effectiveness (Belcher et al., 2016), which, for our purpose, refers to the actual or potential contribution of the ILE to building systems understanding competency among users. Recognizing that systems understanding is a latent construct, we drew on existing operationalizations for assessing different levels of this understanding (e.g., Stave & Hopper, 2007). Accordingly, the ILE is deemed effective for building systems understanding when it helps learners identify the causes and consequences of climate change; make the causal connections between climate change and climate impacts; perceive the interconnected feedback nature of the causal connections; identify how to intervene in the system to mitigate climate change; and evaluate the effectiveness and unintended consequences of policy options.

Figure 7 reports the results for systems understanding competency across the five subdimensions. Since the systems understanding statements were presented and evaluated together in a single slide during the session, we report the findings collectively rather than criterion by criterion. Overall, learners perceived the ILE’s effectiveness in fostering systems understanding as mixed. Positive ratings were highest for identifying causes and consequences (64%), followed by understanding causal pathways (52%). Importantly, positive ratings declined to less than half for the more complex dimensions of systems understanding: 46% for leverage points, 42% for unintended consequences, and 33% for feedback loops.
Across all dimensions, neutral responses were substantial, which could indicate uncertainty rather than outright disagreement. The first two results are unsurprising given the tool’s focus on climate solutions that address the drivers or causes of climate change in order to mitigate unfavourable climate consequences. Here, experimentation with the policy sliders allowed learners to infer how changes to the drivers could result in changes in climate outcomes. However, during the discussion, learners frequently focused on the poor visibility of the interconnections between variables. They reflected that feedback relationships could only be inferred through the most effective policies, from which they had to work backwards to understand how they were connected. They further mentioned that such inferences could be challenging for those without sufficient knowledge about the relationships between policies and drivers. Given the uncertainty over the feedback loops and their interplay, learners were not confident in their understanding of the leverage points or unintended consequences despite identifying those in the model outputs.

Discussion and Conclusions

In this study, we have demonstrated a participatory, developmental evaluation approach for evaluating a state-of-the-art simulation-based ILE for climate education, grounded in the Co-QA framework. Rather than focusing solely on predefined learning outcomes, we elicited learner-defined quality criteria and evaluated how specific design decisions shape perceived salience, accessibility, credibility, legitimacy, and effectiveness for systems understanding. Learners deemed En-ROADS to be salient for providing system-level insights and enabling experimentation with prior beliefs but found it less salient for motivating concrete personal climate action.
Accessibility was high for ease of use yet constrained by limited transparency of model mechanisms and by representations that were not always relatable in scale or social meaning. Credibility hinged on learners’ perceived adequacy of the coverage of climate solutions and intersectoral representations of the system but was hampered by uncertainty about internal causal logic between inputs and outputs as well as limited discussion on the feasibility of implementing potential solutions. Legitimacy rested strongly on the authority of the facilitator and/or institutional affiliation of the tool given limited accessibility of the inner workings. As for effectiveness, learners were able to identify connections between causes and consequences but struggled to perceive feedback relationships or explain observed leverage points and unintended consequences. These findings suggest that while the ILE affords broad exploration of climate policies that supports awareness of the systemic nature of climate action, it also obscures the causal structure of the simulation model and the social implications of the simulations, which prevents deeper systems understanding and actionability.

Prior evaluations of simulation-based ILEs emphasize predefined outcomes such as knowledge gains and engagement (e.g., Rooney-Varga et al., 2020, 2025). The Co-QA lens, adapted in our approach, complements these by foregrounding how learners experience the ILE along multiple co-defined quality criteria, including the inputs to the ILE and the process of engaging with the ILE. By looking ‘upstream’ from the learning outcomes alone, and widening the criteria considered, this framework reveals how design choices shape the quality of the tool and, importantly, the process-based bottlenecks (e.g., “black-box” opacity that dampens accessibility, credibility, and legitimacy) that conventional outcome measures may miss.
Such findings could then provide design recommendations to improve the fitness of function of the ILE. For instance, some key design implications from our findings are summarized in Table 3.

Table 3 Translating formative evaluation findings into instructional design: design decisions aligned with learner-defined quality criteria and their intended instructional effects.

| Design decision | Targeted quality criteria | Intended instructional effect |
| --- | --- | --- |
| Add micro-explanations and/or simplified visualization of model structure and assumptions, causal logic, and feedback interactions | Accessibility: Transparent Inner Workings; Credibility: Internal Consistency; Legitimacy: Trust in Inputs and Outputs; Effectiveness: Feedback Loops | Enhance transparency; enable identification and evaluation of feedback interrelationships; strengthen systemic causal reasoning |
| Embed narratives and visuals that link abstract variables (inputs or outputs) to lived realities | Salience: Practical Insights; Accessibility: Relatable Representations; Effectiveness: Unintended Consequences | Improve relatability of policy options as well as the simulated outcomes of those policies; support reasoning about unintended consequences of simulated outcomes |
| Broaden policy space for non-mainstream or lifestyle-oriented intervention options | Salience: Experimentation and Inspire Action; Effectiveness: Leverage Points | Increase relevance of policy experimentation; support perceived individual agency for climate action; reveal additional leverage points |
| Discuss policy implementation and costs as well as pathways for individual and collective action | Credibility: Policy Feasibility; Salience: Inspire Action | Improve judgements of feasibility and actionability of policy options; support contextualized decision-making for climate action |

Table 4 Deductive coding framework used to analyse the qualitative data during the Co-QA phase, including sample coded excerpts from the transcripts.
| Parent Code | Child Codes | Examples |
| --- | --- | --- |
| Salience | Practical Insights | “The tool reminds you of the complexity. And I do believe that someone who doesn't think about climate change that much can learn a lot here… I found that interesting. Normally you only look at one thing at a time. For example, vegan nutrition. But to see how everything is connected, what the whole thing looks like, that was exciting. To see the complexity.” |
| | | “I think to have a little bit more [practical insight], like, the model is just based on reducing the CO2 levels. And it doesn't reflect what would happen in society if that were to be implemented, for example. Or that you see the consequences for humans.” |
| | Experimentation | “In our group, we discussed what we are already doing for the climate. We came up with a few examples. But we then found it very difficult to find these actions in the tool. For example, there is no responsible production and consumption section. We would have liked to know how much fair trade brings. Or less flying. Or the consequences of fast fashion. But you can't see that in the tool.” |
| | | “It didn't have like radical solutions, or quote unquote radical solutions, or like things that are becoming to be part of mainstream debates, like Degrowth, for example. You couldn't go negative economic growth. Alright, it's a debate, I think you should be able to play around with it. And then also it doesn't have the implications of what that means for any other industries.” |
| | Inspire Action | “It did not inspire any climate action because I think that to the end; to just get to the degrees, we made a lot of policies that just seemed completely unrealistic and that would never happen, and it was only to get to the degrees. So, if anything, this experience just made me believe that we are all doomed and there is no hope… So, if I previously did something like saving energy or anything else that I believe is good for the environment. But after this, I couldn't care less. If anything, yeah, I would say I care less about the environment than I did before because I didn't really see any reason, or I don't believe there was hope in coordinated action that would actually change anything.” |
| | | “I don't know, an awareness that the real change is going to come from pressure points in the system and to not sort of like individualized, atomized actions, which can be depending on how you look at it like either super discouraging or super galvanizing.” |
| Accessibility | Ease of Use | “…everything was explained very clearly. So yes, like you said at the beginning, oh you should use the three dots, but otherwise I think someone could have even like discovered that himself...Everything was super clear.” |
| | | “The only thing we did lack maybe sometimes was time, but that's okay. For example, we would modify parameters without really knowing what it meant concretely because we did not have the time to just click on the concept and read more thoroughly what it was really about. But as a part of this, I think, quite straightforward.” |
| | Relatable Representations | “I had a pretty low score for this one… because, well, I mentioned the kind of like the coal subsidy, but I can't determine whether or not $5 a ton or $5 per cubic foot is small or large. I googled what my carbon tax is at home, and it's 80 Canadian dollars, so it was 60 US dollars per ton. So, I went off that, so that was one kind of sense of scale that I had, but it is the scale that I didn't have an easy time with.” |
| | | “I think I also found myself thinking about when you're in a gamified environment. It's kind of easy for it to feel like, oh, 0.1° and what? What does that really mean? Visualization tools are just informational tools that really shows how much of a different world even 0.1° of warming generates…Like how much human suffering that hurts? How much ecosystem impacts it has an effect on? I think that would help.” |
| | Transparent Inner Workings | “…sometimes nothing happened. I think that was one of our applications: electrification. It went all the way up and nothing happened. I'm like, OK, I still don't know what it does.” |
| | | “I think just like physically, like moving the slider you see the thing change so that's why I kind of gave this a 3. But I think I didn't really understand the kind of magnitude of the change before I made. It was basically trial and error. But I think the actual like interface like the user experience design worked pretty well for that.” |
| Credibility | External Consistency | “The only thing that I really thought was kind of glaringly missing, that I also acknowledge would be really hard to put on a sliding scale, is behavioural change, particularly related to consumption. Maybe that's influenced by the fact that it's something that we've been talking about a lot collectively, but I think that was, yeah, that, because it feels like it has a bearing on so much of the rest of the factors, particularly the ones in the, yeah, like transport building industry growth. So, some kind of metric for behavioural change, particularly related to consumption, felt missing, but also understand how difficult that would be to incorporate.” |
| | | “I was going to say that for the second question, I was more on the high end because I think it's adequate for me for what it's trying to do, like it's not trying to get into all of the details of how politically or financially difficult any of these issues would be, or like the mentioned equity impacts as something that it's not specifically. It can't do because it's just doing one global model. So, for me it is adequate. But of course, it could be better.” |
| | Internal Consistency | “I don't know more about the data, the assumptions, and how the model works. How can I understand, really, how something relates; how an input creates an output?” |
| | | “I think there were definitely things that I toggled, that my brain was like that shouldn't have that effect or like that should have a bigger effect than that…but then my second thought was sort of like okay there's probably some delay in the system, or some inertia, or some like unforeseen unclear reason why this is the case, which I don't know if it's like assumptions based on assuming that systems are sticky or just kind of blind trust in the model, right? But like, even if I did have a moment where I was like, huh, that's not what I expected. I did kind of assume that there was a reason for it that's hidden somewhere in the complexity.” |
| | Policy Feasibility | “I would say we need a bit more of an approach as to how it could be integrated and actually implemented. So far, we’ve only looked at theoretical measures and not how feasible they are…And maybe also a diagram or something with a degree of difficulty of implementation, because it all looks much easier than it is” |
| | | “I think to have a little bit more, like, the models just based on reducing the CO2 levels. And it doesn't reflect what would happen in society if that were to be implemented, for example. And I think that is a very important thing that needs to make it realistic. Because we can shoot, like, carbon tax way high and reduce it really easily. But then you look at the consequences, and you find out why that hasn't happened yet. Yeah. So, I think that's where it could be improved.” |
| Legitimacy | Appeal to Authority | “I think that's because it required your explanation as to why things, like when something was counterintuitive, maybe you could see the result, but you didn't understand why it was counterintuitive.” |
| | | “Well, we kind of had you with us. From that point of view, it was okay. But if you're supposed to do it alone, then it's not so understandable. When someone does it alone.” |
| | Trust in Inputs | “I mean, I trust them, but the only reason why I put I think I trust all of them is because it's backed by the MIT, a very prestigious institution. It's basically because I know I'm not an expert in this and I rely on all of my trust towards the knowledge of these procedures in the university. It's basically a program-related trust.” |
| | | “I feel like this only gives a rough relationship and we cannot depend on it. I mean, there are several factors which are influencing one thing. So, I don't believe in this model. It just gives a rough estimate and maybe helps us understand relationship. That's it.” |
| | Trust in Outputs | “I've never seen such a comprehensive tool. a very hands-on tool where I think it was really helpful. So, like, as it was mentioned, like play with the sliders, go left and right and see what the impact is. So, I would say like for me, this is like the best climate information source I've ever interacted with. So that's why it's a fundamental source.” |
| | | “…it's also, you know, you don't need to be able to build a car to drive a car. But you need to understand how cars work if you want to fix them…And you need to understand a little how they work because otherwise driving can become really dangerous. And that's probably similar to modeling. You don't need to understand the model in detail to trust it. But if you don't understand anything at all, it's not a big surprise that people don't act upon your recommendations.” |
| Effectiveness | Systems Understanding | “It does help me to understand what are the important drivers. And what are its impacts on the consequences and to decreasing the climate, but it does not help me to understand the causal relationship.” |
| | | “I didn't see a single, like any kind of like feedback loop kind of situation. It would just be me trying to base off of previous knowledge.” |

Our facilitated co-defined evaluation approach, therefore, doubles as a design-thinking cycle within formative developmental research (Richey et al., 2004). That is, the ILE prototype is iteratively assessed after each cycle of instructional design improvements to identify priority areas for enhancement. We do not assume that any single ILE should satisfy all user needs or use contexts. Rather, the evaluation findings reveal instructional design trade-offs that must be explicitly negotiated in subsequent iterations. By making these trade-offs transparent, the framework moves beyond summative evaluations to inform adaptive design (Richey et al., 2004) – ultimately complementing outcomes-based learning assessments and strengthening their explanatory power and effect sizes.

By mapping learner-elicited criteria to established quality principles and operationalizing them as measurable indicators, this study also advances the co-creation of a transparent theory of how ILE quality manifests to learners based on their subjective experiences and expectations. However, the framework presented here remains contingent since the sample was limited to students, predominantly from European contexts. Future work could therefore extend this Co-QA to other user groups such as educators, policy practitioners, and community organizers across diverse cultural contexts to elicit additional quality criteria. This would provide a more textured picture of what ‘quality’ means for actors working with ILEs, and which criteria are shared or transferable across groups.

A logical next step is to formalize the measurement instrument. The Construct Mapping approach (Wilson, 2023) offers a systematic method to enforce hierarchical progressions within each quality dimension, while Rasch modelling (Andrich & Marais, 2019) can be used to statistically validate the scale.
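For readers unfamiliar with Rasch modelling, the sketch below illustrates only its simplest, dichotomous form, in which the probability of endorsing an item depends on the difference between a respondent's latent level and the item's difficulty. It is an orientation aid under stated assumptions, not the instrument proposed here: the function names and the three item difficulties are hypothetical, and ordered Likert-type responses such as those collected in this study would call for a polytomous extension (e.g., a rating scale or partial credit model) estimated with dedicated software.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Dichotomous Rasch model: P(endorse) = 1 / (1 + exp(-(ability - difficulty)))."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_item_scores(ability: float, difficulties: dict[str, float]) -> dict[str, float]:
    """Endorsement probability for every item at one latent ability level."""
    return {item: rasch_probability(ability, b) for item, b in difficulties.items()}


# Hypothetical item difficulties in logits, for illustration only.
# They are NOT estimates derived from the study's data.
ILLUSTRATIVE_DIFFICULTIES = {
    "easier_item": -1.0,
    "middle_item": 0.0,
    "harder_item": 1.0,
}

if __name__ == "__main__":
    # Print endorsement probabilities for three latent levels.
    for theta in (-1.0, 0.0, 1.0):
        scores = expected_item_scores(theta, ILLUSTRATIVE_DIFFICULTIES)
        print(theta, {item: round(p, 2) for item, p in scores.items()})
```

At any fixed latent level, items with higher difficulty are less likely to be endorsed, which is the ordering property a construct map is meant to make explicit before the scale is tested against data.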
In this step, the completed Co-QA tables shape the item hierarchy, ensuring that the measurement reflects learner experiences rather than expert assumptions alone. Once validated through Rasch analysis, the instrument can be deployed to rigorously measure and benchmark the quality of multiple ILEs. Ultimately, we contend that our Co-QA approach, adapted for educational technology, lays the foundation for advancing transparent, participatory, and evidence-based standards for evaluating and designing simulation-based ILEs for climate education.

Declarations

Ethics Approval: All procedures adhered to the ethical standards set by the Norwegian National Research Ethics Committees and received formal approval from the University of Bergen’s System for Risk and Compliance (RETTE), under reference number F3071.

Consent: Informed consent was obtained from all individual participants included in the study. Participation in this study was voluntary, and participants could decline to answer any question at any time.

Data availability statement

The data and code for reproducing the figures can be retrieved from https://doi.org/10.5281/zenodo.17805467.

References

Andrich D, Marais I (2019) A Course in Rasch Measurement Theory: Measuring in the Educational, Social and Health Sciences. Springer Nature Singapore. https://doi.org/10.1007/978-981-13-7496-8

Belcher BM, Rasmussen KE, Kemshaw MR, Zornes DA (2016) Defining and assessing research quality in a transdisciplinary context. Res Evaluation 25(1):1–17. https://doi.org/10.1093/reseval/rvv025

Braun V, Clarke V (2006) Using thematic analysis in psychology. Qualitative Res Psychol 3(2):77–101. https://doi.org/10.1191/1478088706qp063oa

Bremer S, Wardekker A, Baldissera Pacchetti M, Soares BM, Van Der Sluijs J (2022) Editorial: High-Quality Knowledge for Climate Adaptation: Revisiting Criteria of Credibility, Legitimacy, Salience, and Usability. Front Clim 4:905786. https://doi.org/10.3389/fclim.2022.905786

Bremer S, Wardekker A, Dessai S, Sobolowski S, Slaattelid R, Van Der Sluijs J (2019) Toward a multi-faceted conception of co-production of climate services. Clim Serv 13:42–50. https://doi.org/10.1016/j.cliser.2019.01.003

Bremer S, Wardekker A, Jensen ES, Van Der Sluijs JP (2021) Quality Assessment in Co-developing Climate Services in Norway and the Netherlands. Front Clim 3:627665. https://doi.org/10.3389/fclim.2021.627665

Cash DW, Clark WC, Alcock F, Dickson NM, Eckley N, Guston DH, Jäger J, Mitchell RB (2003) Knowledge systems for sustainable development. Proceedings of the National Academy of Sciences 100(14):8086–8091. https://doi.org/10.1073/pnas.1231332100

Climate Interactive (2025) The En-ROADS Climate Solutions Simulator. Climate Interactive: Tools for a Thriving Future. https://www.climateinteractive.org/the-en-roads-climate-workshop/learn-to-lead-the-workshop/#workshop-materials

Deegan M, Stave K, MacDonald R, Andersen D, Ku M, Rich E (2014) Simulation-Based Learning Environments to Teach Complexity: The Missing Link in Teaching Sustainable Public Management. Systems 2(2):217–236. https://doi.org/10.3390/systems2020217

Kapmeier F, Greenspan AS, Jones AP, Sterman JD (2021) Science-based analysis for climate action: How HSBC Bank uses the En‐ROADS climate policy simulation. Syst Dynamics Rev 37(4):333–352. https://doi.org/10.1002/sdr.1697

Kolb DA (2015) Experiential Learning: Experience as the Source of Learning and Development (Second edition). Pearson Education, Inc

Kopainsky B, Alessi S (2015) Effects of Structural Transparency in System Dynamics Simulators on Performance and Understanding. Systems 3(4):152–176. https://doi.org/10.3390/systems3040152

Kopainsky B, Sawicka A (2011) Simulator-supported descriptions of complex dynamic problems: Experimental results on task performance and system understanding. Syst Dynamics Rev 27(2):142–172. https://doi.org/10.1002/sdr.445

Lemos MC, Morehouse BJ (2005) The co-production of science and policy in integrated climate assessments. Glob Environ Change 15(1):57–68. https://doi.org/10.1016/j.gloenvcha.2004.09.004

Maier FH, Größler A (2000) What are we talking about? A taxonomy of computer simulations to support learning. Syst Dynamics Rev 16(2):135–148. https://doi.org/10.1002/1099-1727(200022)16:2%3C135::AID-SDR193%3E3.0.CO;2-P

Richey RC, Klein JD, Nelson WA (2004) Developmental Research: Studies of Instructional Design and Development. In: Handbook of Research on Educational Communications and Technology, 2nd edn. Lawrence Erlbaum Associates, pp 1099–1130

Rooney-Varga JN, Coleman RL, Jones AP, Kapmeier F, Newsome P, Noiseux K, Patten B, Rath K, Sterman JD (2025) Interactive role-play with climate policy simulation can motivate evidence-based climate action. Commun Earth Environ 6(1):769. https://doi.org/10.1038/s43247-025-02744-w

Rooney-Varga JN, Kapmeier F, Sterman JD, Jones AP, Putko M, Rath K (2020) The Climate Action Simulation. Simul Gaming 51(2):114–140. https://doi.org/10.1177/1046878119890643

Rouwette EAJA, Größler A, Vennix JAM (2004) Exploring influencing factors on rationality: A literature review of dynamic decision-making studies in system dynamics. Syst Res Behav Sci 21(4):351–370. https://doi.org/10.1002/sres.647

Stave K, Hopper M (2007, August 29) What Constitutes Systems Thinking? A Proposed Taxonomy. Proceedings of the 2007 International System Dynamics Conference

Sterman J, Fiddaman T, Franck T, Jones A, McCauley S, Rice P, Sawin E, Siegel L (2012) Climate interactive: The C-ROADS climate policy model. Syst Dynamics Rev 28(3):295–305. https://doi.org/10.1002/sdr.1474

Sterman J, Fiddaman T, Franck T, Jones A, McCauley S, Rice P, Sawin E, Siegel L (2013) Management flight simulators to support climate negotiations. Environ Model Softw 44:122–135. https://doi.org/10.1016/j.envsoft.2012.06.004

Sterman J, Franck T, Fiddaman T, Jones A, McCauley S, Rice P, Sawin E, Siegel L, Rooney-Varga JN (2015) WORLD CLIMATE: A Role-Play Simulation of Climate Negotiations. Simul Gaming 46(3–4):348–382. https://doi.org/10.1177/1046878113514935

van der Sluijs JP, Petersen AC, Janssen PHM, Risbey JS, Ravetz JR (2008) Exploring the quality of evidence for complex and contested policy decisions. Environ Res Lett 3(2):024008. https://doi.org/10.1088/1748-9326/3/2/024008

Wilson M (2023) Constructing Measures: An Item Response Modeling Approach (2nd ed.). Routledge. https://doi.org/10.4324/9781003286929

Additional Declarations

The authors declare no competing interests.

Supplementary Files

AppendixA.docx
© Research Square 2026 | ISSN 2693-5015 (online)
En-ROADS provides an interactive environment for learners to test climate-relevant solutions, supported by facilitated learning formats: the En-ROADS Climate Workshop (ECW) and the Climate Action Simulation (CAS) role-playing game (see Climate Interactive, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). In both formats, learners work with the educational technology to identify and test strategies for limiting global temperature rise to well below 2℃ by 2100. The role-playing game has the added complexity of climate negotiations, where learners represent diverse and often conflicting interest groups.\u003c/p\u003e \u003cp\u003eToday, En-ROADS is considered a state-of-the-art simulation-based ILE, with a cumulative reach of over 350,000 learners across 165 countries (Climate Interactive, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). It has been used in classroom settings, policy workshops, and corporate training to foster systems thinking and collaborative problem-solving. Its educational impact has also been studied, showing significant increases in learners\u0026rsquo; understanding of climate change causes, impacts, and solutions as well as personal and emotional engagement with climate issues (Rooney-Varga et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). A follow-up study suggests that longer-term outcomes on understanding, affective engagement, intent to act, and real-world action persist over time (Rooney-Varga et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eWhile evaluations of En-ROADS have demonstrated its effectiveness in promoting climate literacy, they have primarily focused on predefined learning outcomes such as knowledge gains and behavioural intentions. Such evaluation frameworks are valuable for understanding the ILE\u0026rsquo;s broad educational impacts. 
However, predefined outcomes-based metrics may overlook the experiential, process-based dimensions that shape user engagement and learning. In other words, they offer limited insight into how learners experience the ILE, particularly in terms of usability and alignment with a plurality of learning goals and criteria for evaluating its quality. This, in turn, limits our understanding of how such ILEs might be better tailored to diverse learner needs. To enrich this perspective, we propose a complementary collaborative evaluation framework that invites learners to co-define quality criteria based on their own goals and interactions with the ILE. This approach supports a more context-sensitive and responsive assessment of simulation-based ILEs.\u003c/p\u003e \u003cp\u003eIn developing this framework, we turned to scholarship on knowledge quality assessment (KQA) in contexts where science is used to educate and inform action. Cash et al. (\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2003\u003c/span\u003e) introduced the foundational triad of quality criteria for evaluating a knowledge system\u0026rsquo;s fitness for addressing societal challenges: \u003cem\u003esalience\u003c/em\u003e (relevance to user needs), \u003cem\u003ecredibility\u003c/em\u003e (scientific rigour), and \u003cem\u003elegitimacy\u003c/em\u003e (trust in the knowledge production process). Since then, scholars have expanded these criteria to include dimensions such as \u003cem\u003eusability\u003c/em\u003e (accessibility and applicability to users) and \u003cem\u003eeffectiveness\u003c/em\u003e (contribution to positive change) (Belcher et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Lemos \u0026amp; Morehouse, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2005\u003c/span\u003e). Bremer et al. (\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2022\u003c/span\u003e, p. 
2) caution against uncritical applications of such \u0026ldquo;\u003cem\u003ea priori\u003c/em\u003e principles of quality,\u0026rdquo; arguing that they may obscure the nuanced and contingent ways in which quality is understood in particular user contexts.\u003c/p\u003e \u003cp\u003eTo address this concern, Bremer et al. (\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2021\u003c/span\u003e) built on earlier work on the science-policy interface and knowledge co-production (Bremer et al., \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) to develop the Collaborative Quality Assessment (Co-QA) framework. As a KQA technology, Co-QA supports systematic, critical analysis of uncertainties, assumptions and dissent relative to science\u0026rsquo;s fitness for function in public decision-making (van der Sluijs et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2008\u003c/span\u003e). It was designed for co-production processes in climate services, enabling \u0026lsquo;users\u0026rsquo; and \u0026lsquo;producers\u0026rsquo; of climate information to collaboratively define and assess quality criteria that are meaningful within their specific use contexts. In doing so, it offers \u0026ldquo;a way of bridging knowledge quality expectations across all actors in a knowledge system\u0026rdquo; (Bremer et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2021\u003c/span\u003e, p. 4).\u003c/p\u003e \u003cp\u003eThis study draws on Co-QA to co-evaluate a simulation-based ILE, specifically En-ROADS, and to assess its quality for climate education. Here, we must first note two caveats. First, Co-QA was developed in the context of climate services, where \u0026lsquo;user groups\u0026rsquo; in some settings have quite clearly defined decision-making needs and expectations regarding climate information products. 
This presupposition presents a challenge when applied to simulation-based ILEs, which are broadly designed for exploratory and educational engagement. Here, learners may not enter with predefined needs or expectations that can be meaningfully elicited without prior experience with the tool. Second, the Co-QA framework was originally meant to be populated through the direct participation of actors engaged in co-production, who identify and write down their own quality criteria; this requires a small group to meet and interact over an extended period. Yet the ILE studied here engages a relatively large group of learners in sessions lasting a few hours. This demanded a design for eliciting criteria from a large group in a limited time. Our co-evaluation framework therefore began with a series of workshops with a sample of learners to collaboratively define quality criteria. These criteria were then mobilized in subsequent group interviews and surveys to iteratively evaluate the ILE through critical dialogue with learners. This adaptation facilitates reflection on how learners perceive the ILE\u0026rsquo;s quality in relation to their educational goals and needs.\u003c/p\u003e \u003cp\u003eIn the next section, we detail our adapted Co-QA framework for co-evaluating En-ROADS as a case study. As noted above, our approach has two distinct phases: (1) workshops to collaboratively define learners\u0026rsquo; quality criteria, which are then used to develop evaluation instruments, and (2) workshops to collaboratively assess the quality of En-ROADS using the developed instruments. We then present the results of our Co-QA across five quality principles abstracted from the co-defined quality criteria. While En-ROADS serves as the context of this study, our aim is to demonstrate how a Co-QA approach can generate additional insights into the design and relevance of simulation-based ILEs more broadly. 
We conclude by discussing these insights in relation to the strengths and limitations of our approach, and by reflecting on opportunities for advancing Co-QA of simulation-based ILEs.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cp\u003eThis study employs a mixed-methods, developmental evaluation approach, integrating both qualitative and quantitative data in the collaborative quality assessment. In this section, we outline the methodological process undertaken for the collaborative quality assessment, as illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, including the strategies used for data collection and analysis in each phase of the process. This evaluation approach is conducted within workshop settings where participants engage with En-ROADS, either in the ECW or the CAS format. We selected En-ROADS due to its status as a state-of-the-art simulation-based ILE, and because two authors are trained En-ROADS Climate Ambassadors, making it well-suited for use in facilitated workshops and data collection. The evaluation comprises two phases: (1) co-creation of learner-defined quality criteria via thematic analysis of post-workshop group interviews; and (2) structured co-evaluation using a survey within facilitated group interviews to collect descriptive quantitative evidence and qualitative explanations. Participation in the assessment was entirely voluntary, and participants could decline to answer any question at any time. This design is intentionally formative. 
That is, the data collection and analysis serve to demonstrate our Co-QA approach and generate evidence-linked design guidance consistent with developmental research priorities in instructional technology (Richey et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2004\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eCo-creation of quality criteria\u003c/h2\u003e \u003cp\u003eTo elicit the quality criteria relevant to learners, we conducted three workshops with graduate students at two different universities (see Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). After playing through En-ROADS, learners co-evaluated the tool in a facilitated discussion. Here, we used semi-structured group interviews to elicit learners\u0026rsquo; perceptions of the tool\u0026rsquo;s fitness for purpose, their trust in the information provided, their impression of the learning experience, and their suggested improvements to better fulfil their learning goals (see Appendix A). Interview transcripts were analysed to identify emergent quality criteria within the learning context, capturing the process-based dimensions of user engagement.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSources of data collection for the co-creation of quality criteria. 
University names have been removed for double-blind peer review.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDate\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLocation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eFormat\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eN\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eParticipant Profile\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e28 Nov 2023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUniversity in Norway\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCAS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e20\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMaster\u0026rsquo;s students in system dynamics\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e11 Jan 2024\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUniversity in Germany\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003eECW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMaster\u0026rsquo;s students in sustainability and digitalization\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e21 Feb 2024\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUniversity in Norway\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eECW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e21\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMaster\u0026rsquo;s students in sustainability\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFor the data analysis, we uploaded the transcripts to NVivo 14, where we employed an inductive and iterative thematic analysis at the latent level (Braun \u0026amp; Clarke, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2006\u003c/span\u003e) to interpret the ways participants expressed judgement and discussed quality. We selected this method for its flexibility and suitability in uncovering themes that emerge from learners\u0026rsquo; \u0026ldquo;\u003cem\u003eunderlying\u003c/em\u003e ideas, assumptions, and conceptualizations\u0026rdquo; (Braun \u0026amp; Clarke, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2006\u003c/span\u003e, p. 84) \u0026ndash; i.e., the implicit quality criteria employed in their evaluations. Latent themes and interview talk were then coded relative to \u0026lsquo;quality criteria\u0026rsquo;, and these criteria were ordered in relation to each other. 
Finally, emergent criteria codes were viewed in concert with established quality principles from the literature. This deductive step aligns learners\u0026rsquo; implicit quality criteria with established theory while remaining sensitive to context (Bremer et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). We finished with a set of quality criteria that emerged at the interface where \u0026lsquo;bottom-up\u0026rsquo; (empirical) meets \u0026lsquo;top-down\u0026rsquo; (theoretical) (see also Appendix B).\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eAssessment interviews and surveys\u003c/h3\u003e\n\u003cp\u003eDrawing on the co-defined quality criteria from the first exploratory phase, we developed a survey to collect quantitative data and, importantly, structure the critical dialogue for evaluating En-ROADS. Quality was operationalized as five dimensions: salience, accessibility, credibility, legitimacy, and effectiveness. The quality dimensions are treated as latent constructs that are not directly observable; instead, they are inferred through observable indicators (i.e., criteria elicited from learners) that describe what increasing levels of the quality dimension look like. Indicators are then operationalized into Likert-type survey items, which are the specific questions or statements posed to learners during the evaluation. Responses (on 4-point or 5-point scales) reflect learners\u0026rsquo; appraisal of En-ROADS relative to each quality criterion and serve as evidence for inferring the latent quality of the ILE experience.\u003c/p\u003e \u003cp\u003eWe deployed the survey with a total of 104 learners after they had used En-ROADS: seven groups of graduate students and one group of high school students (see Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). 
The evaluation was conducted after the facilitated learning workshop and structured by the survey, using Mentimeter (an interactive presentation tool). Learners first responded to the survey questions individually and then collectively reflected on the scoring after each response. The group interview was facilitated by two guiding questions: (1) \u003cem\u003eWhat are some important features or aspects of En-ROADS that influenced your rating?\u003c/em\u003e (2) \u003cem\u003eCan you think of additional features or changes you would make to enhance the performance of En-ROADS for this dimension?\u003c/em\u003e Through this approach, we capture both quantitative data in terms of learners\u0026rsquo; ratings of the ILE relative to quality criteria and qualitative insights on the underlying reasoning for these scores.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eSources of data collection for the assessment. 
University names have been removed for double-blind peer review.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDate\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eLocation\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eFormat\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eN\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eParticipant Profile\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e16 Sep 2024\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUniversity in Norway\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eECW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMaster\u0026rsquo;s students in geography\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e20 Sep 2024\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUniversity in Switzerland\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"left\" colname=\"c3\"\u003e \u003cp\u003eECW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMaster\u0026rsquo;s students in agriculture\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e03 Oct 2024\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUniversity in Iceland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eECW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMaster\u0026rsquo;s students in coastal communities and regional development\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e08 Nov 2024\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eHigh School in Switzerland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCAS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAdult high school students\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e25 Nov 2024\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUniversity in Norway\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCAS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMaster\u0026rsquo;s students in 
system dynamics\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e22 Aug 2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUniversity in Norway\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCAS\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMaster\u0026rsquo;s students in system dynamics\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e19 Sep 2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUniversity in Switzerland\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eECW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMaster\u0026rsquo;s students in agriculture\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e03 Oct 2025\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eUniversity in Norway\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eECW\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMaster\u0026rsquo;s students in geography\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eOf the 104 learners who participated in the workshop, 88 responded to more than three survey items and were included in this 
study. 64% of the respondents belonged to the 18\u0026ndash;25 age group, 19% to the 26\u0026ndash;35 group, 3% were older than 35, and 14% did not answer. 67% of the respondents were European, 8% North American, 7% West and East Asian, 2% North African, and 16% did not answer. As for education level, 49% held a bachelor\u0026rsquo;s degree, 11% held a master\u0026rsquo;s degree, 26% were non-degree holders, and 14% did not answer. During the data collection period, we iterated on the survey items based on feedback and preliminary analysis of the scores. In some co-evaluation workshops, the facilitator also had to skip a few questions given time constraints in the classroom setting. As a result, items were systematically excluded for some groups in the final dataset.\u003c/p\u003e \u003cp\u003eThe quantitative data were analysed in R (version 4.5.1) to summarize the descriptive statistics of the responses. Specifically, we calculated the percentage frequency of each response category and visualized the distribution of the raw scores for each item (i.e., quality criteria). Given the formative purpose of the evaluation, we report descriptive statistics (frequency distributions) rather than inferential statistics. The qualitative data (transcripts of the co-evaluation) were analysed in NVivo 14 using the deductive thematic analysis approach (Braun \u0026amp; Clarke, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2006\u003c/span\u003e). 
Given that the evaluation was structured by the survey, the co-defined quality criteria (used as child codes) and the broader quality principles (used as parent codes) formed the coding framework (see Appendix C).\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eThrough our exploratory co-creation phase, we identified 17 quality criteria under five broader constructs or principles: salience, accessibility, credibility, legitimacy, and effectiveness for building systems understanding competency (see Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). For each construct, we juxtapose the distribution of survey responses with qualitative insights from the explanations groups offered for their ratings during the critical reflections.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e\n\u003ch3\u003eSalience\u003c/h3\u003e\n\u003cp\u003eSalience broadly refers to the relevance of the knowledge or service provided in relation to the key priorities and concerns of its users (Cash et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2003\u003c/span\u003e). In the context of En-ROADS, salience refers to how relevant, relatable, and practical the information and insights are for learners. Learners deemed the ILE to be salient when it provides practical insights on climate change and mitigation, offers opportunities to challenge their mental models or preconceptions, and motivates climate action in their everyday lives. Figure\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e depicts the distribution of valid responses for the Salience construct across the three co-defined criteria: Practical Insights, Experimentation, and Inspire Action.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003ePractical Insights.\u003c/em\u003e About two-thirds of the respondents rated the ILE positively for providing practical insights (36.2% fair amount; 31.2% a lot). 
Learners valued the wide scope of climate solutions spanning multiple sectors. Several noted that experimenting with policy options raised their awareness that addressing climate change requires a combination of strategies rather than a single \u0026ldquo;silver bullet.\u0026rdquo; As for the remaining third, their ratings stemmed largely from an inability to personally relate to the information provided. Learners frequently noted that policy effects were presented in terms of systemic variables (e.g. temperature rise) rather than societal impacts on livelihoods. They wanted to know what their everyday lives would look like under certain policy scenarios, including the trade-offs involved. One group even proposed \u0026ldquo;flipping the script,\u0026rdquo; where the primary goal is designing future sustainable ways of living, with climate outcomes viewed as consequences of those scenarios.\u003c/p\u003e \u003cp\u003e \u003cem\u003eExperimentation.\u003c/em\u003e Most respondents (57.5%) indicated that they were able to test a fair amount of their preconceived ideas using the ILE. Learners generally perceived En-ROADS as offering a good coverage of policies found in mainstream discourse. They also noted that the simulations confirmed their understanding of major drivers of climate change, yet they were at times surprised by the effects of certain policies, which prompted a desire to explore the underlying mechanisms further. Conversely, 30% of respondents reported being able to test few or none of their ideas. These negative ratings were primarily attributed to the level of aggregation in En-ROADS as a global model. Specifically, several learners mentioned that they could not test policies related to individual lifestyle choices or more granular sector-specific interventions. 
Others expressed interest in testing \u0026ldquo;radical solutions\u0026rdquo; beyond mainstream discourse, such as degrowth policies, that were not offered in the ILE.\u003c/p\u003e \u003cp\u003e \u003cem\u003eInspire Action.\u003c/em\u003e The En-ROADS ILE was less effective in motivating climate action among learners. Nearly half of the respondents indicated the ILE only inspired broad notions of climate action that were not actionable in their everyday lives. Learners mentioned that the impactful policies appeared actionable only for \u0026ldquo;big key players\u0026rdquo; beyond their locus of control, \u0026ldquo;unless you want become like a full-time activist.\u0026rdquo; As a result, some learners reported feeling demoralized, perceiving climate change as inevitable, while others rationalized inaction, believing that individual action was pointless. About one-third of respondents indicated that the ILE motivated climate action that they would at least consider. Here, they emphasized that they became aware that climate mitigation is a collective action problem, and that this awareness could influence attitudes and encourage coordinated efforts to create pressure in areas of the system with the greatest impact.\u003c/p\u003e\n\u003ch3\u003eAccessibility\u003c/h3\u003e\n\u003cp\u003eAccessibility has been discussed, by Lemos and Morehouse (\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2005\u003c/span\u003e) for example, as \u0026lsquo;usability\u0026rsquo;: the extent to which knowledge or a service/product is usable, understandable, and inclusive for a diverse range of users. In our context, this category refers to the usability of the ILE product itself.
The ILE is deemed accessible when it can be easily navigated (\u003cem\u003eEase of Use\u003c/em\u003e), its elements can be easily interpreted and related to the real-world system (\u003cem\u003eRelatable Representations\u003c/em\u003e), and its inner workings or causal logic can be easily perceived and understood by learners (\u003cem\u003eTransparent Inner Workings\u003c/em\u003e). Figure\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e presents the results of the survey across these three co-defined criteria.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003eEase of Use.\u003c/em\u003e Most learners found the En-ROADS ILE to be easily navigable (57%), with some requiring minor clarifications from the facilitator (38%). Across all workshops, there was a consensus that the design of the sliders in the tool was straightforward to use. Although the facilitator reminded learners that the ellipses button provided more in-depth information, one learner suggested that the functionality was easily discoverable either way. An interesting note from one of the groups was that the time pressure prevented them from reading the information provided too thoroughly, leading them to work with the concepts represented by sliders more abstractly.\u003c/p\u003e \u003cp\u003e \u003cem\u003eRelatable Representations\u003c/em\u003e. In response to the statement that the concepts in the ILE were concretely relatable to learners in terms of their knowledge of the real-world system, about 14% strongly agreed, 49% agreed, 23% were neutral, 12% disagreed and less than 2% strongly disagreed. Here, learners appreciated that the tool provided supplementary contextual information for each slider. However, several learners mentioned that the sliders in the ILE (e.g., taxes/subsidies) failed to provide them with a sense of scale since they did not have a reference value for ascertaining if their input was large or small. 
Importantly, learners wanted more visualization of climate impacts beyond quantitative figures and pointed to the sea-level induced flooding map of countries as an exemplar. They further requested narratives to contextualize the figures to lived realities.\u003c/p\u003e \u003cp\u003e \u003cem\u003eTransparent Inner Workings\u003c/em\u003e. About half of the respondents agreed or strongly agreed that they could understand why their inputs resulted in the observed outputs. About 27% remained neutral while about 20% disagreed or strongly disagreed. A common theme in the discussion was that the En-ROADS model appeared to be a \u0026lsquo;black box\u0026rsquo; since the inner workings of the model are not accessible to users. As a result, learners were able to perceive how a change in input resulted in changes in the outputs but could not explain any counterintuitive results. Here, they mentioned that they relied on explanations from the facilitator, who was privy to the model. While some learners thought it was acceptable given the nature of the user interface, others wanted the ability to inspect the model structure themselves. Learners specifically desired more transparency in relation to policy interactions; for instance, without the facilitator\u0026rsquo;s explanation, they would not have known that the impact of electrification could be nullified by prior policies on renewables.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eCredibility\u003c/h2\u003e \u003cp\u003eCredibility relates to user perceptions of the scientific accuracy or adequacy of the arguments and evidence presented by the knowledge system (Cash et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2003\u003c/span\u003e).
In our context, learners perceive the ILE to be credible when they consider it to have adequately represented the real-world system (\u003cem\u003eExternal Consistency\u003c/em\u003e), accurately captured the dynamics of the system in terms of causal logic (\u003cem\u003eInternal Consistency\u003c/em\u003e), and provided realistic and feasible insights for climate mitigation (\u003cem\u003ePolicy Feasibility\u003c/em\u003e). The quantitative assessment of the ILE along this construct is depicted in Fig.\u0026nbsp;\u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e5\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003eExternal Consistency\u003c/em\u003e. Most respondents (52%) found the tool to have adequately represented the real-world system. As previously mentioned, learners perceived the scope of the climate solutions presented in the tool to be fairly comprehensive, especially given its level of aggregation at the global scale and focus on mitigation. The other half of respondents, who remained neutral or disagreed, desired more detail complexity. For instance, there was an impression that the model was biased towards the Global North; consequently, learners wanted more disaggregation between regions to capture inequalities in the world. Another major theme was behavioural change \u0026ndash; that is, they wanted more specificity in the social system to reflect changing individual preferences and consumption patterns.\u003c/p\u003e \u003cp\u003e \u003cem\u003eInternal Consistency\u003c/em\u003e. Interestingly, most respondents (56%) were neutral to the statement that the tool accurately translated inputs into outputs. During the discussion, it was clear that learners were unable to determine the accuracy given that the causal logic of the model was not accessible to them.
They either took the outputs for granted, amounting to \u0026ldquo;a kind of blind trust in the model\u0026rdquo; or inferred the causal logic based on their background knowledge and facilitator explanation.\u003c/p\u003e \u003cp\u003e \u003cem\u003ePolicy Feasibility\u003c/em\u003e. Responses were split on the feasibility of climate solutions presented in the ILE: about 48% indicated that a fair amount were feasible, whereas 42% thought only a few were. In general, there was a sentiment that the tool was suitable for assessing the effectiveness of hypothetical policy measures, but not the feasibility of actual implementation. To better assess feasibility, learners suggested that the tool should describe how the policy measures could be implemented in the real world and consider the costs associated with each policy \u0026ndash; both in terms of monetary cost and other social costs and externalities.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eLegitimacy\u003c/h3\u003e\n\u003cp\u003eLegitimacy relates to user trust in the sources and production of information, and the presentation of information in the knowledge system, leading to its acceptance (Cash et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2003\u003c/span\u003e). In our context, learners deemed the ILE to be legitimate when users trust it as an important source of information based on the veracity of its inputs and outputs, beyond simply appealing to the authority and expertise of the facilitator(s). Figure\u0026nbsp;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e6\u003c/span\u003e shows the assessment results for the three dimensions of Legitimacy: Appeal to Authority, Trust in Inputs, and Trust in Outputs.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cem\u003eAppeal to Authority\u003c/em\u003e. Most respondents (76%) indicated that the facilitator\u0026rsquo;s expertise enhanced their trust in the information provided in the ILE.
Only about 15% of respondents indicated that their trust was independent of the facilitator, whereas fewer than 10% were wholly dependent on the facilitator\u0026rsquo;s expertise. These results suggest that legitimacy in the ILE is partially anchored in social cues of expertise, particularly because learners did not think they would have been able to fully understand the outputs they observed. As one learner explained, \u0026ldquo;\u0026hellip;it required your explanation as to why things [happen]\u0026hellip;you could see the result, but you didn\u0026rsquo;t understand why it was counterintuitive.\u0026rdquo; This reliance could undermine the tool\u0026rsquo;s legitimacy as a standalone resource.\u003c/p\u003e \u003cp\u003e \u003cem\u003eTrust in Inputs and Outputs\u003c/em\u003e. Generally, learners seemed to trust the inputs used to build the ILE, with 48% indicating almost all were justifiable and 44% indicating a fair amount. However, trust in outputs was more cautious: most respondents (63%) reported they would use the tool primarily as a supplementary source of climate information rather than a fundamental source (17%). During the discussion, it became clear that learners\u0026rsquo; trust in inputs was influenced by an appeal to authority; they relied on the reputation of the modellers from a highly prestigious institution (MIT) despite lacking visibility into the model structure, data sources, or underlying assumptions within the ILE. Consequently, learners expressed greater confidence in the directionality and relative magnitude of changes in outputs rather than their precision. For that purpose, they indicated that they would cross-reference the results with other tools.
Several learners also reiterated that their trust would improve if they could \u0026ldquo;open up the black box\u0026rdquo; and understand the causal logic driving the model.\u003c/p\u003e\n\u003ch3\u003eEffectiveness\u003c/h3\u003e\n\u003cp\u003eSimulation-based ILEs are generally intended to foster systems understanding (Kopainsky \u0026amp; Sawicka, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2011\u003c/span\u003e). In our study, learners engaged with En-ROADS within a systems thinking context and naturally evaluated the tool against its ability to illuminate interrelationships within the system. As one learner put it: \u0026ldquo;I don\u0026rsquo;t actually get an actual systems understanding of the drivers of climate change. I only get to see that this is the policy we\u0026rsquo;re going to put and I\u0026rsquo;m going to see the consequences.\u0026rdquo; We mapped this to the quality principle of effectiveness (Belcher et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2016\u003c/span\u003e), which, for our purpose, refers to the actual or potential contribution of the ILE to building systems understanding competency among users.\u003c/p\u003e \u003cp\u003eRecognizing that systems understanding is a latent construct, we drew on existing operationalizations for assessing different levels of this understanding (e.g., Stave \u0026amp; Hopper, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). Accordingly, the ILE is deemed effective for building systems understanding when it helps learners identify the causes and consequences of climate change; make the causal connections between climate change and climate impacts; perceive the interconnected feedback nature of the causal connections; identify how to intervene in the system to mitigate climate change; and evaluate the effectiveness and unintended consequences of policy options. 
Figure\u0026nbsp;\u003cspan refid=\"Fig7\" class=\"InternalRef\"\u003e7\u003c/span\u003e reports the results for systems understanding competency across the five subdimensions.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eSince the systems understanding statements were presented and evaluated together in a single slide during the session, we report the findings collectively rather than criterion by criterion. Overall, learners perceived a mixed effectiveness of the ILE in fostering systems understanding. Positive ratings were highest for identifying causes and consequences (64%) followed by understanding causal pathways (52%). Importantly, positive ratings declined to less than half for perceiving the more complex dimensions of systems understanding: 46% for leverage points, 42% for unintended consequences, and 33% for feedback loops. Across all dimensions, neutral responses were substantial, which could indicate uncertainty rather than outright disagreement.\u003c/p\u003e \u003cp\u003eThe first two results are unsurprising given the tool\u0026rsquo;s focus on climate solutions that address the drivers or causes of climate change in order to mitigate unfavourable climate consequences. Here, experimentation with the policy sliders allowed learners to infer how impacting the drivers could result in changes in climate outcomes. However, during the discussion, learners frequently focused on the poor visibility of the interconnections between variables. They reflected that feedback relationships could only be inferred through the most effective policies, from which they had to work backwards to understand how they were connected. They further mentioned that such inferences could be challenging for those without sufficient knowledge about the relationships between policies and drivers. 
Given the uncertainty over the feedback loops and their interplay, learners were not confident in their understanding of the leverage points or unintended consequences despite identifying those in the model outputs.\u003c/p\u003e "},{"header":"Discussion and Conclusions","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003cp\u003eIn this study, we have demonstrated a participatory, developmental evaluation approach for evaluating a state-of-the-art simulation-based ILE for climate education, grounded in the Co-QA framework. Rather than focusing solely on predefined learning outcomes, we elicited learner-defined quality criteria and evaluated how specific design decisions shape perceived salience, accessibility, credibility, legitimacy, and effectiveness for systems understanding.\u003c/p\u003e \u003cp\u003eLearners deemed En-ROADS to be salient for providing system-level insights and enabling experimentation with prior beliefs but found it less salient for motivating concrete personal climate action. Accessibility was high for ease of use yet constrained by limited transparency of model mechanisms and by representations that were not always relatable in scale or social meaning. Credibility hinged on learners\u0026rsquo; perceived adequacy of the coverage of climate solutions and intersectoral representations of the system but was hampered by uncertainty about internal causal logic between inputs and outputs as well as limited discussion on the feasibility of implementing potential solutions. Legitimacy rested strongly on the authority of the facilitator and/or institutional affiliation of the tool given limited accessibility of the inner workings. As for effectiveness, learners were able to identify connections between causes and consequences but struggled to perceive feedback relationships or explain observed leverage points and unintended consequences.
These findings suggest that while the ILE affords broad exploration of climate policies that supports awareness of the systemic nature of climate action, it also obscures the causal structure of the simulation model and the social implications of the simulations, which impedes deeper systems understanding and actionability.\u003c/p\u003e \u003cp\u003ePrior evaluations of simulation-based ILEs emphasize predefined outcomes such as knowledge gains and engagement (e.g., Rooney-Varga et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2020\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). The Co-QA lens, adapted in our approach, complements these by foregrounding how learners experience the ILE along multiple co-defined quality criteria, including the inputs to the ILE and the process of engaging with the ILE. By looking \u0026lsquo;upstream\u0026rsquo; from the learning outcomes alone, and widening the criteria considered, this framework reveals how design choices shape the quality of the tool and, importantly, the process-based bottlenecks (e.g., \u0026ldquo;black-box\u0026rdquo; opacity that dampens accessibility, credibility, and legitimacy) that conventional outcome measures may miss. Such findings could then provide design recommendations to improve the fitness for function of the ILE.
For instance, some key design implications from our findings are summarized in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eTranslating formative evaluation findings into instructional design: design decisions aligned with learner-defined quality criteria and their intended instructional effects.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDesign decision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTargeted quality criteria\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eIntended instructional effect\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAdd micro-explanations and/or simplified visualization of model structure and assumptions, causal logic, and feedback interactions\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAccessibility: Transparent Inner Workings; Credibility: Internal Consistency; Legitimacy: Trust in Inputs and Outputs; Effectiveness: Feedback Loops\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEnhance transparency; enable identification and evaluation of feedback 
interrelationships; strengthen systemic causal reasoning\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEmbed narratives and visuals that link abstract variables (inputs or outputs) to lived realities\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSalience: Practical Insights; Accessibility: Relatable Representations; Effectiveness: Unintended Consequences\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eImprove relatability of policy options as well as the simulated outcomes of those policies; support reasoning about unintended consequences of simulated outcomes\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eBroaden policy space for non-mainstream or lifestyle-oriented intervention options\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSalience: Experimentation and Inspire Action; Effectiveness: Leverage Points\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eIncrease relevance of policy experimentation; support perceived individual agency for climate action; reveal additional leverage points\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDiscuss policy implementation and costs as well as pathways for individual and collective action\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCredibility: Policy Feasibility; Salience: Inspire Action\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eImprove judgements of feasibility and actionability of policy options; support contextualized decision-making for climate action\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e 
\u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDeductive coding framework used to analyse the qualitative data during the Co-QA phase, including sample coded excerpts from the transcripts.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eParent Code\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eChild Codes\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eExamples\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSalience\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePractical Insights\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;The tool reminds you of the complexity. And I do believe that someone who doesn't think about climate change that much can learn a lot here\u0026hellip; I found that interesting. Normally you only look at one thing at a time. For example, vegan nutrition. But to see how everything is connected, what the whole thing looks like, that was exciting. 
To see the complexity.\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;I think to have a little bit more [practical insight], like, the model is just based on reducing the CO2 levels. And it doesn't reflect what would happen in society if that were to be implemented, for example. Or that you see the consequences for humans.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eExperimentation\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;In our group, we discussed what we are already doing for the climate. We came up with a few examples. But we then found it very difficult to find these actions in the tool. For example, there is no responsible production and consumption section. We would have liked to know how much fair trade brings. Or less flying. Or the consequences of fast fashion. But you can't see that in the tool.\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;It didn't have like radical solutions, or quote unquote radical solutions, or like things that are becoming to be part of mainstream debates, like Degrowth, for example. You couldn't go negative economic growth. Alright, it's a debate, I think you should be able to play around with it. 
And then also it doesn't have the implications of what that means for any other industries.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eInspire Action\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;It did not inspire any climate action because I think that to the end; to just get to the degrees, we made a lot of policies that just seemed completely unrealistic and that would never happen, and it was only to get to the degrees. So, if anything, this experience just made me believe that we are all doomed and there is no hope\u0026hellip; So, if I previously did something like saving energy or anything else that I believe is good for the environment. But after this, I couldn't care less. If anything, yeah, I would say I care less about the environment than I did before because I didn't really see any reason, or I don't believe there was hope in coordinated action that would actually change anything.\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;I don't know, an awareness that the real change is going to come from pressure points in the system and to not sort of like individualized, atomized actions, which can be depending on how you look at it like either super discouraging or super galvanizing.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAccessibility\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEase of Use\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;\u0026hellip;everything was explained very clearly. 
So yes, like you said at the beginning, oh you should use the three dots, but otherwise I think someone could have even like discovered that himself...Everything was super clear.\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;The only thing we did lack maybe sometimes was time, but that's okay. For example, we would modify parameters without really knowing what it meant concretely because we did not have the time to just click on the concept and read more thoroughly what it was really about. But as a part of this, I think, quite straightforward.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eRelatable \u003c/p\u003e \u003cp\u003eRepresentations\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;I had a pretty low score for this one\u0026hellip; because, well, I mentioned the kind of like the coal subsidy, but I can't determine whether or not \u003cspan\u003e$\u003c/span\u003e5 a ton or \u003cspan\u003e$\u003c/span\u003e5 per cubic foot is small or large. I googled what my carbon tax is at home, and it's 80 Canadian dollars, so it was 60 US dollars per ton. So, I went off that, so that was one kind of sense of scale that I had, but it is the scale that I didn't have an easy time with.\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;I think I also found myself thinking about when you're in a gamified environment. It's kind of easy for it to feel like, oh, 0.1\u0026deg; and what? What does that really mean? Visualization tools are just informational tools that really shows how much of a different world even 0.1\u0026deg; of warming generates\u0026hellip;Like how much human suffering that hurts? How much ecosystem impacts it has an effect on? 
I think that would help.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTransparent Inner Workings\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;\u0026hellip;sometimes nothing happened. I think that was one of our applications: electrification. It went all the way up and nothing happened. I'm like, OK, I still don't know what it does.\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;I think just like physically, like moving the slider you see the thing change so that's why I kind of gave this a 3. But I think I didn't really understand the kind of magnitude of the change before I made. It was basically trial and error. But I think the actual like interface like the user experience design worked pretty well for that.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCredibility\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eExternal Consistency\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;The only thing that I really thought was kind of glaringly missing, that I also acknowledge would be really hard to put on a sliding scale, is behavioural change, particularly related to consumption. Maybe that's influenced by the fact that it's something that we've been talking about a lot collectively, but I think that was, yeah, that, because it feels like it has a bearing on so much of the rest of the factors, particularly the ones in the, yeah, like transport building industry growth. 
So, some kind of metric for behavioural change, particularly related to consumption, felt missing, but also understand how difficult that would be to incorporate.\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;I was going to say that for the second question, I was more on the high end because I think it's adequate for me for what it's trying to do, like it's not trying to get into all of the details of how politically or financially difficult any of these issues would be, or like the mentioned equity impacts as something that it's not specifically. It can't do because it's just doing one global model. So, for me it is adequate. But of course, it could be better.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eInternal Consistency\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;I don't know more about the data, the assumptions, and how the model works. How can I understand, really, how something relates; how an input creates an output?\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;I think there were definitely things that I toggled, that my brain was like that shouldn't have that effect or like that should have a bigger effect than that\u0026hellip;but then my second thought was sort of like okay there's probably some delay in the system, or some inertia, or some like unforeseen unclear reason why this is the case, which I don't know if it's like assumptions based on assuming that systems are sticky or just kind of blind trust in the model, right? But like, even if I did have a moment where I was like, huh, that's not what I expected. 
I did kind of assume that there was a reason for it that's hidden somewhere in the complexity.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003ePolicy Feasibility\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;I would say we need a bit more of an approach as to how it could be integrated and actually implemented. So far, we\u0026rsquo;ve only looked at theoretical measures and not how feasible they are\u0026hellip;And maybe also a diagram or something with a degree of difficulty of implementation, because it all looks much easier than it is\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;I think to have a little bit more, like, the models just based on reducing the CO2 levels. And it doesn't reflect what would happen in society if that were to be implemented, for example. And I think that is a very important thing that needs to make it realistic. Because we can shoot, like, carbon tax way high and reduce it really easily. But then you look at the consequences, and you find out why that hasn't happened yet. Yeah. So, I think that's where it could be improved.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLegitimacy\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAppeal to Authority\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;I think that's because it required your explanation as to why things, like when something was counterintuitive, maybe you could see the result, but you didn't understand why it was counterintuitive.\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;Well, we kind of had you with us. From that point of view, it was okay. 
But if you're supposed to do it alone, then it's not so understandable. When someone does it alone.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTrust in Inputs\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;I mean, I trust them, but the only reason why I put I think I trust all of them is because it's backed by the MIT, a very prestigious institution. It's basically because I know I'm not an expert in this and I rely on all of my trust towards the knowledge of these procedures in the university. It's basically a program-related trust.\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;I feel like this only gives a rough relationship and we cannot depend on it. I mean, there are several factors which are influencing one thing. So, I don't believe in this model. It just gives a rough estimate and maybe helps us understand relationship. That's it.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eTrust in Outputs\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;I've never seen such a comprehensive tool. a very hands-on tool where I think it was really helpful. So, like, as it was mentioned, like play with the sliders, go left and right and see what the impact is. So, I would say like for me, this is like the best climate information source I've ever interacted with. So that's why it's a fundamental source.\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;\u0026hellip;it's also, you know, you don't need to be able to build a car to drive a car. 
But you need to understand how cars work if you want to fix them\u0026hellip;And you need to understand a little how they work because otherwise driving can become really dangerous. And that's probably similar to modeling. You don't need to understand the model in detail to trust it. But if you don't understand anything at all, it's not a big surprise that people don't act upon your recommendations.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEffectiveness\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eSystems \u003c/p\u003e \u003cp\u003eUnderstanding\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026ldquo;It does help me to understand what are the important drivers. And what are its impacts on the consequences and to decreasing the climate, but it does not help me to understand the causal relationship.\u0026rdquo;\u003c/p\u003e \u003cp\u003e\u0026ldquo;I didn't see a single, like any kind of like feedback loop kind of situation. It would just be me trying to base off of previous knowledge.\u0026rdquo;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eOur facilitated co-defined evaluation approach, therefore, doubles as a design-thinking cycle within formative developmental research (Richey et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2004\u003c/span\u003e). That is, the ILE prototype is iteratively assessed after each cycle of instructional design improvements to identify priority areas for enhancement. We do not assume that any single ILE should satisfy all user needs or use contexts. Rather, the evaluation findings reveal instructional design trade-offs that must be explicitly negotiated in subsequent iterations. 
By making these trade-offs transparent, the framework moves beyond summative evaluations to inform adaptive design (Richey et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2004\u003c/span\u003e) \u0026ndash; ultimately complementing outcomes-based learning assessments and strengthening their explanatory power and effect sizes.\u003c/p\u003e \u003cp\u003eBy mapping learner-elicited criteria to established quality principles and operationalizing them as measurable indicators, this study also advances the co-creation of a transparent theory of how ILE quality manifests to learners based on their subjective experiences and expectations. However, the framework presented here remains contingent since the sample was limited to students, predominantly from European contexts. Future work could therefore extend the Co-QA approach to other user groups such as educators, policy practitioners, and community organizers across diverse cultural contexts to elicit additional quality criteria. This would provide a more textured picture of what \u0026lsquo;quality\u0026rsquo; means for actors working with ILEs, and which criteria are shared or transferable across groups.\u003c/p\u003e \u003cp\u003eA logical next step is to formalize the measurement instrument. The Construct Mapping approach (Wilson, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) offers a systematic method to enforce hierarchical progressions within each quality dimension, while Rasch modelling (Andrich \u0026amp; Marais, \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2019\u003c/span\u003e) can be used to statistically validate the scale. Here, the completed Co-QA tables shape the item hierarchy, ensuring that the measurement reflects learner experiences rather than expert assumptions alone. Once validated through Rasch analysis, the instrument can be deployed to rigorously measure and benchmark the quality of multiple ILEs. 
Ultimately, we contend that our Co-QA approach, adapted for educational technology, lays the foundation for advancing transparent, participatory, and evidence-based standards for evaluating and designing simulation-based ILEs for climate education.\u003c/p\u003e \u003c/div\u003e"},{"header":"Declarations","content":" \u003cp\u003eEthics Approval:\u003c/p\u003e\n\u003cp\u003eAll procedures adhered to the ethical standards set by the Norwegian National Research Ethics Committees and received formal approval from the University of Bergen’s System for Risk and Compliance (RETTE), under reference number F3071.\u003c/p\u003e\n\u003cp\u003eConsent:\u003c/p\u003e\n\u003cp\u003eInformed consent was obtained from all individual participants included in the study. Participation in this study was voluntary, and participants could decline to answer any question at any time.\u003c/p\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003eData availability statement\u003c/h2\u003e \u003cp\u003eThe data and codes for reproducing the figures can be retrieved from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.5281/zenodo.17805467\u003c/span\u003e\u003cspan address=\"10.5281/zenodo.17805467\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/p\u003e \u003c/div\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAndrich D, Marais I (2019) A Course in Rasch Measurement Theory: Measuring in the Educational, Social and Health Sciences. Springer Nature Singapore. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-981-13-7496-8\u003c/span\u003e\u003cspan address=\"10.1007/978-981-13-7496-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBelcher BM, Rasmussen KE, Kemshaw MR, Zornes DA (2016) Defining and assessing research quality in a transdisciplinary context. Res Evaluation 25(1):1\u0026ndash;17. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/reseval/rvv025\u003c/span\u003e\u003cspan address=\"10.1093/reseval/rvv025\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBraun V, Clarke V (2006) Using thematic analysis in psychology. Qualitative Res Psychol 3(2):77\u0026ndash;101. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1191/1478088706qp063oa\u003c/span\u003e\u003cspan address=\"10.1191/1478088706qp063oa\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBremer S, Wardekker A, Baldissera Pacchetti M, Soares B, M., Van Der Sluijs J (2022) Editorial: High-Quality Knowledge for Climate Adaptation: Revisiting Criteria of Credibility, Legitimacy, Salience, and Usability. Front Clim 4:905786. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fclim.2022.905786\u003c/span\u003e\u003cspan address=\"10.3389/fclim.2022.905786\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBremer S, Wardekker A, Dessai S, Sobolowski S, Slaattelid R, Van Der Sluijs J (2019) Toward a multi-faceted conception of co-production of climate services. Clim Serv 13:42\u0026ndash;50. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.cliser.2019.01.003\u003c/span\u003e\u003cspan address=\"10.1016/j.cliser.2019.01.003\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBremer S, Wardekker A, Jensen ES, Van Der Sluijs JP (2021) Quality Assessment in Co-developing Climate Services in Norway and the Netherlands. Front Clim 3:627665. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fclim.2021.627665\u003c/span\u003e\u003cspan address=\"10.3389/fclim.2021.627665\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCash DW, Clark WC, Alcock F, Dickson NM, Eckley N, Guston DH, J\u0026auml;ger J, Mitchell RB (2003) Knowledge systems for sustainable development. \u003cem\u003eProceedings of the National Academy of Sciences\u003c/em\u003e, \u003cem\u003e100\u003c/em\u003e(14), 8086\u0026ndash;8091. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1073/pnas.1231332100\u003c/span\u003e\u003cspan address=\"10.1073/pnas.1231332100\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eClimate Interactive (2025) \u003cem\u003eThe En-ROADS Climate Solutions Simulator\u003c/em\u003e. Climate Interactive: Tools for a Thriving Future. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.climateinteractive.org/the-en-roads-climate-workshop/learn-to-lead-the-workshop/#workshop-materials\u003c/span\u003e\u003cspan address=\"https://www.climateinteractive.org/the-en-roads-climate-workshop/learn-to-lead-the-workshop/#workshop-materials\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDeegan M, Stave K, MacDonald R, Andersen D, Ku M, Rich E (2014) Simulation-Based Learning Environments to Teach Complexity: The Missing Link in Teaching Sustainable Public Management. Systems 2(2):217\u0026ndash;236. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/systems2020217\u003c/span\u003e\u003cspan address=\"10.3390/systems2020217\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKapmeier F, Greenspan AS, Jones AP, Sterman JD (2021) Science-based analysis for climate action: How HSBC Bank uses the En‐ROADS climate policy simulation. Syst Dynamics Rev 37(4):333\u0026ndash;352. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/sdr.1697\u003c/span\u003e\u003cspan address=\"10.1002/sdr.1697\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKolb DA (2015) \u003cem\u003eExperiential Learning: Experience as the Source of Learning and Development\u003c/em\u003e (Second edition). Pearson Education, Inc\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKopainsky B, Alessi S (2015) Effects of Structural Transparency in System Dynamics Simulators on Performance and Understanding. Systems 3(4):152\u0026ndash;176. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/systems3040152\u003c/span\u003e\u003cspan address=\"10.3390/systems3040152\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKopainsky B, Sawicka A (2011) Simulator-supported descriptions of complex dynamic problems: Experimental results on task performance and system understanding. Syst Dynamics Rev 27(2):142\u0026ndash;172. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/sdr.445\u003c/span\u003e\u003cspan address=\"10.1002/sdr.445\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLemos MC, Morehouse BJ (2005) The co-production of science and policy in integrated climate assessments. Glob Environ Change 15(1):57\u0026ndash;68. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.gloenvcha.2004.09.004\u003c/span\u003e\u003cspan address=\"10.1016/j.gloenvcha.2004.09.004\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMaier FH, Gr\u0026ouml;\u0026szlig;ler A (2000) What are we talking about? A taxonomy of computer simulations to support learning. Syst Dynamics Rev 16(2):135\u0026ndash;148. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/1099-1727(200022)16\u003c/span\u003e\u003cspan address=\"10.1002/1099-1727(200022)16\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e:2%253C135::AID-SDR193%253E3.0.CO;2-P\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRichey RC, Klein JD, Nelson WA (2004) Developmental Research: Studies of Instructional Design and Development. 
Handbook of research on educational communications and technology, 2nd edn. Lawrence Erlbaum Associates, pp 1099\u0026ndash;1130\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRooney-Varga JN, Coleman RL, Jones AP, Kapmeier F, Newsome P, Noiseux K, Patten B, Rath K, Sterman JD (2025) Interactive role-play with climate policy simulation can motivate evidence-based climate action. Commun Earth Environ 6(1):769. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s43247-025-02744-w\u003c/span\u003e\u003cspan address=\"10.1038/s43247-025-02744-w\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRooney-Varga JN, Kapmeier F, Sterman JD, Jones AP, Putko M, Rath K (2020) The Climate Action Simulation. Simul Gaming 51(2):114\u0026ndash;140. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1177/1046878119890643\u003c/span\u003e\u003cspan address=\"10.1177/1046878119890643\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRouwette EAJA, Gr\u0026ouml;\u0026szlig;ler A, Vennix JAM (2004) Exploring influencing factors on rationality: A literature review of dynamic decision-making studies in system dynamics. Syst Res Behav Sci 21(4):351\u0026ndash;370. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/sres.647\u003c/span\u003e\u003cspan address=\"10.1002/sres.647\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStave K, Hopper M (2007, August 29) What Constitutes Systems Thinking? A Proposed Taxonomy. 
\u003cem\u003eProceedings of the 2007 International System Dynamics Conference\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSterman J, Fiddaman T, Franck T, Jones A, McCauley S, Rice P, Sawin E, Siegel L (2012) Climate interactive: The C-ROADS climate policy model. Syst Dynamics Rev 28(3):295\u0026ndash;305. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/sdr.1474\u003c/span\u003e\u003cspan address=\"10.1002/sdr.1474\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSterman J, Fiddaman T, Franck T, Jones A, McCauley S, Rice P, Sawin E, Siegel L (2013) Management flight simulators to support climate negotiations. Environ Model Softw 44:122\u0026ndash;135. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.envsoft.2012.06.004\u003c/span\u003e\u003cspan address=\"10.1016/j.envsoft.2012.06.004\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSterman J, Franck T, Fiddaman T, Jones A, McCauley S, Rice P, Sawin E, Siegel L, Rooney-Varga JN (2015) WORLD CLIMATE: A Role-Play Simulation of Climate Negotiations. Simul Gaming 46(3\u0026ndash;4):348\u0026ndash;382. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1177/1046878113514935\u003c/span\u003e\u003cspan address=\"10.1177/1046878113514935\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003evan der Sluijs JP, Petersen AC, Janssen PHM, Risbey JS, Ravetz JR (2008) Exploring the quality of evidence for complex and contested policy decisions. Environ Res Lett 3(2):024008. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1088/1748-9326/3/2/024008\u003c/span\u003e\u003cspan address=\"10.1088/1748-9326/3/2/024008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWilson M (2023) \u003cem\u003eConstructing Measures: An Item Response Modeling Approach\u003c/em\u003e (2nd ed.). Routledge. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.4324/9781003286929\u003c/span\u003e\u003cspan address=\"10.4324/9781003286929\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[{"identity":"6f647cfc-2079-4d0b-be87-2cf4f9bea67b","identifier":"10.13039/501100000780","name":"European Commission","awardNumber":"101081661","order_by":0}],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"University of Bergen","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"simulation-based learning, interactive learning environments, climate education, knowledge quality assessment, collaborative formative evaluation, developmental research.","lastPublishedDoi":"10.21203/rs.3.rs-8843292/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8843292/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eSimulation-based interactive learning environments (ILEs) are widely used in climate education to foster systems understanding and experiential learning. Yet, most evaluations emphasize predefined learning outcomes, offering limited insight into how learners experience these tools and what quality means to them. This study introduces an adapted Collaborative Quality Assessment (Co-QA) framework as a participatory, developmental evaluation approach for simulation-based ILEs. This approach invites users to co-define quality criteria and assess tools against those criteria, enabling a context-sensitive appraisal of instructional design decisions. We applied Co-QA to En-ROADS, a climate policy simulator, across eight workshops with 104 learners. Our mixed-methods design combined qualitative and quantitative assessments structured around five quality principles: salience, accessibility, credibility, legitimacy, and effectiveness for systems understanding. By foregrounding learner perspectives, Co-QA reveals process-based dimensions of quality that conventional outcome metrics may overlook. 
Our findings indicate that the ILE supports broad exploration of climate policies and awareness of the systemic nature of climate action, but that opaque causal mechanisms and limited social contextualization impede deeper systems understanding and actionability. We position Co-QA as a formative design cycle that generates actionable implications for design. Specifically, we recommend (1) micro-explanations and visualizations to enhance transparency; (2) narratives and impact visuals to improve representational relatability; (3) broadened policy space to support personally relevant experimentation; and (4) implementation framing and pathways to bolster feasibility and actionability of climate solutions. We contend that Co-QA helps advance transparent, participatory, and evidence-based standards for evaluating and designing simulation-based ILEs for climate education.\u003c/p\u003e","manuscriptTitle":"Collaborative quality assessment of simulation-based interactive learning environments for climate education","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-16 09:13:42","doi":"10.21203/rs.3.rs-8843292/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"17421348-77f9-46c8-b9d5-55e258d20b55","owner":[],"postedDate":"February 16th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":62683979,"name":"Environmental Policy"}],"tags":[],"updatedAt":"2026-02-16T09:13:43+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-16 09:13:42","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8843292","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8843292","identity":"rs-8843292","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}