Redistributing Epistemic Labor: Prior Knowledge Shapes How Effectively Students Use Large Language Models

Matthias Stadler, Raisa Kirikaidou, Michael Sailer, Maria Bannert, and Constanze Richters

Research Article, Research Square. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9084455/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1 posted.

Abstract

Medical education prepares students to reason under uncertainty, critically evaluate evidence, and make accountable clinical judgements: epistemic demands that define professional competence. Large language models (LLMs) are now entering this field, shifting the responsibility for seeking and synthesizing information from learners to tools. We argue that whether this redistribution of epistemic labor supports or undermines the development of clinical reasoning depends on what learners already know.
To test this theory, we conducted an experiment in which medical and social science students were asked to complete a critical reasoning task on an unsettled medical issue, the safety of nanoparticle-based sunscreen, using either an LLM (ChatGPT-4) or a traditional search engine (Google). Across both groups, LLM users reported significantly lower cognitive load, confirming that AI assistance reduces perceived effort regardless of expertise. However, the effect on justification quality was moderated by domain knowledge: medical students produced stronger justifications when using the LLM, whereas social science students produced stronger justifications when using the search engine. These findings suggest that LLMs do not uniformly support or hinder epistemic performance; their effect depends on the prior knowledge that learners bring to the task. For medical education, this has direct implications: integrating LLMs into curricula requires not only technical access but also the deliberate preparation of students to exercise epistemic responsibility, that is, to evaluate, challenge, and take ownership of AI-generated information rather than defer to it.

Figures: Figure 1

Introduction

Medical education makes specific epistemological demands on learners. Students must acquire factual knowledge and learn to evaluate evidence in situations of uncertainty, weigh competing claims, and base their clinical judgements on scientific reasoning. These capacities, which are sometimes referred to as epistemic competence, are central to medical training and to the way clinicians reason. The way in which students develop these capacities depends, in part, on the cognitive and epistemic work they are required to undertake during their studies. The tools that mediate this work therefore matter, as they do not merely support learning; they also shape the type of learning that is possible. Large language models (LLMs) are now entering this mediating role (Rainie et al., 2019).
Unlike search engines, which return a list of external documents requiring learners to actively search, filter, and synthesize information, LLMs generate synthesized responses directly. The cognitive and epistemic work involved in inquiry (locating sources, evaluating their credibility, and reconciling conflicting evidence) is handled at least in part by the system (Ayoub et al., 2024). The learner's task then becomes one of judging plausibility, calibrating trust, and deciding whether to accept, challenge, or extend the AI's output (Fischer et al., 2014; McFadden, 2025). This redistribution of labor raises an urgent question for medical education: does it promote or hinder the development of the epistemic competencies required for professional practice?

Early evidence suggests that the answer depends on what learners already know. According to cognitive load theory (Sweller, 2011), learning involves three types of cognitive load: intrinsic load, which arises from content complexity; extraneous load, which arises from inefficient information presentation; and germane load, which reflects the effort invested in schema construction and meaning-making (Klepsch et al., 2017). Since LLMs provide information in a concise, conversational format, they likely reduce the extraneous burden of navigating multiple sources and the intrinsic burden of understanding complex materials (Ahmed, 2024; Bai et al., 2023). Stadler et al. (2024) confirmed this experimentally. Social science students who used LLMs to research an unsettled medical question reported significantly lower cognitive load across all three dimensions compared to students who used traditional search engines. Other studies corroborate this pattern of reduced mental effort with AI assistance (Firat & Kuleli, 2024). However, reduced effort did not translate into better performance. Students in the LLM condition produced justifications of lower quality and depth and engaged less actively with diverse sources.
Patac and Patac Jr. (2025) observed a similar pattern: LLMs increase perceived efficiency but encourage more superficial engagement. By contrast, traditional search demands the kind of active elaboration (contrasting, synthesizing, and inferring) that underlies robust understanding (Chi & Wylie, 2014; Fan et al., 2025).

These findings suggest a potential tension in medical education. If LLMs reduce the cognitive effort associated with seeking evidence, and if that effort is precisely what builds the epistemic skills that clinicians need, then unreflectively integrating LLMs into medical curricula could undermine, rather than support, professional formation. Bauer et al. (2025) referred to this effect as an inversion effect of LLMs. However, this concern may not apply uniformly. Students with substantial prior knowledge may be better positioned to critically engage with LLM outputs, recognizing omissions and implausibilities and integrating AI-generated information into coherent reasoning rather than simply accepting it (Binhammad et al., 2024). In terms of cognitive load, this corresponds to the expertise reversal effect: instructional supports that hinder novices may be neutral or beneficial for advanced learners (Kalyuga, 2007; Paas & van Merriënboer, 2020; Tetzlaff et al., 2025). Thus, whether LLMs function as genuine epistemic support or as shortcuts that displace necessary reasoning depends on the knowledge learners bring to the task.

The present study tests this directly. Building on Stadler et al. (2024), who excluded medical students to avoid the effects of prior knowledge, we now include them as the theoretically critical case, not as an afterthought. Medical students have stronger domain knowledge and professional epistemic norms that emphasize evaluating evidence, acknowledging uncertainty, and providing accountable justifications. Thus, they represent a population for whom LLMs might function differently than for novices.
By assigning medical and social science students to research the same unsettled medical question using either ChatGPT-4 or Google, we examine whether the "cognitive ease at a cost" finding generalizes, reverses, or differentiates across levels of domain expertise. The outcome has direct implications for whether, and how cautiously, LLMs should be integrated into health professions education.

Methods

Design and Participants

This study used a between-subjects experimental design, with the primary factor being the research tool assigned: LLM or Search Engine. The target population was advanced students in medicine, given the study’s focus on a medically relevant research task. We recruited 30 medical students from the BLINDED FOR REVIEW. Participants were in the later years of their program (third year or higher), ensuring they had sufficient scientific background and were familiar with the norms of evidence-based reasoning in medicine. The sample was 60% female and had a mean age of 27.8 years (SD = 4.95). We obtained informed consent from all students for participation.

In addition to the medical student sample, we made use of data from a comparison sample of 91 students in social sciences from a previous study (Stadler et al., 2024). Those students had been randomly assigned to the same two tool conditions (LLM or Search Engine) and had completed the same task (described below) under comparable experimental settings. We included this external dataset to enable between-group comparisons (medical vs. social sciences students), thereby testing the generalizability of tool effects across learners with different domain knowledge. Notably, the social science students had been screened to ensure none were from medicine or related disciplines (Stadler et al., 2024).
Ethical approval for using the previously published dataset was covered under the original study’s review; for our new data collection with medical students, the research was approved by the institutional review board of BLINDED FOR REVIEW in accordance with the Declaration of Helsinki.

An a priori power analysis was conducted using G*Power 3.1 (Faul et al., 2007) to estimate the required medical student sample size for detecting a difference between the ChatGPT and Google conditions, assuming an effect size in the range observed in the original study (d = 0.6–0.7; Stadler et al., 2024), α = .05, and power (1–β) = .80. Accordingly, the present medical student sample (N = 30) was not designed to test tool-use differences within medical students alone, but to examine whether the previously observed tool effects generalize or reverse under conditions of substantially higher domain knowledge when compared with the larger social sciences sample.

Task and Procedure

Participants first completed demographic questions and were then introduced to the research scenario. Adapting a task design from Kammerer et al. (2021), students were asked to assist a fictional friend, Paul, in deciding whether to continue using sunscreen containing mineral nanoparticles, specifically zinc oxide and titanium dioxide. Paul highlighted three perceived advantages: these particles reflect UV light rather than relying on chemical UV filters that may trigger allergies or hormonal side effects; no harmful effects of the particles are currently known; and nanoparticle-based sunscreens can achieve high SPF values offering strong skin protection. Despite these benefits, Paul was uncertain due to concerns about potential health risks associated with nanoparticles. Students were tasked with investigating whether these concerns were scientifically justified or unfounded. Students were randomly assigned to one of two groups using different tools of information search.
The first group was assigned the “web search” condition (using the Google search engine; google.com) and the second group the “LLM” condition (using the ChatGPT chatbot; GPT-4o). They were given 20 minutes to conduct their research, after which they wrote a recommendation including a justification, relying solely on memory without access to any notes, websites, or ChatGPT outputs. To prevent cognitive load responses from being influenced by the writing task, the cognitive load questionnaire was administered immediately after the research phase and before students composed their recommendations. Lastly, participants completed a brief prior knowledge assessment on nanotechnology.

Measures

Cognitive Load

To assess participants’ experienced cognitive load, we used the 7-item Cognitive Load Scale developed by Klepsch, Schmitz, and Seufert (2017), which provides separate subscales for intrinsic load (e.g., “The topic was very complex”), extraneous load (e.g., “The way information was presented was confusing”), and germane load (e.g., “I put a lot of effort into understanding the content”). Each subscale consisted of several Likert-type items (rated 1 = strongly disagree to 7 = strongly agree). Following standard practice, we computed a mean score for each type of load. The internal consistency (Cronbach’s α) was satisfactory for all three subscales (α = 0.78–0.85).

Prior Knowledge

To assess participants’ understanding of nanotechnology, we employed a modified version of the Public Knowledge of Nanotechnology Test (PKNT) originally developed by Lin et al. (2013). This instrument consists of eight multiple-choice items, each offering four answer options, and covers foundational topics such as the scale and size of nanoparticles, material structure, and real-world applications of nanomaterials.
Because both the prior knowledge measure and the justification quality score function as index scores rather than scales, internal consistency was not expected (see Stadler et al., 2021).

Justification Quality

In line with Kammerer et al. (2021) and drawing on prior work in multiple document comprehension (Bråten et al., 2018), we evaluated the quality of students’ justifications by analyzing the content of their written recommendations. Specifically, we coded whether participants included any of the following arguments: (a) that risks are minimal or outweighed by benefits, (b) that applying a coating to nanoparticles can reduce potential harm, (c) that risks primarily arise when the skin is damaged, (d) that sprays may be hazardous due to the possibility of inhalation, (e) that risks are high or uncertain, (f) that the benefit is purely cosmetic, and (g) that non-nanoparticle mineral sunscreens could serve as alternatives. The coding framework was developed through expert consultation with nanotechnology specialists at the Leibniz Institute for New Materials in Saarbrücken, Germany, as well as through inductive analysis of the student justifications. The number of these relevant aspects identified in each justification was used as the dependent variable. Two independent coders rated all responses, achieving strong inter-rater reliability (κ = 0.92). Discrepancies were resolved through discussion. Participants were not informed of the coding categories in advance.

Recommendation direction

To examine whether using an LLM leads to a convergence of conclusions, which is a potential concern because AI tools may bias users toward a particular answer, we analyzed the recommendation decision in each essay. Specifically, we noted whether each student recommended using nanoparticle sunscreens, recommended avoiding them, or was ambivalent. Inter-rater reliability was very high for recommendation direction (κ = 0.95).
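The inter-rater reliabilities reported above are Cohen's κ values, which correct the raw agreement between two coders for the agreement expected by chance. The following is a minimal sketch of that computation for a single categorical code such as recommendation direction. The function name and the ten example labels are hypothetical illustrations, not the study's data, and the text does not state how κ was aggregated across the seven justification categories.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one categorical label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of both raters' marginal shares.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_chance == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so this sketch treats the case as perfect agreement.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical recommendation-direction codes for ten essays (NOT study data).
coder_1 = ["use", "use", "avoid", "ambivalent", "use",
           "avoid", "avoid", "use", "ambivalent", "use"]
coder_2 = ["use", "use", "avoid", "ambivalent", "use",
           "avoid", "use", "use", "ambivalent", "use"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.3f}")  # 9/10 observed, 0.40 expected by chance -> 0.833
```

In this toy example the coders disagree on one essay out of ten, yet κ is noticeably below the raw 90% agreement because three-category codes with unequal base rates already agree 40% of the time by chance.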
Data Analysis

All statistical analyses were performed using jamovi version 2.3 (The jamovi project, 2025). To examine the hypothesized differences in prior knowledge between the two student groups, we conducted an independent samples t-test. For the primary hypotheses, we ran analyses of variance (ANOVA) with Research Tool and Student Group as factors and cognitive load and justification quality as dependent variables. To compare the homogeneity in recommendations across conditions, we used a chi-squared analysis. All analyses adhered to an alpha threshold of .05.

Results

Table 1 provides the descriptive statistics for all scales by Student Group, along with the scale correlations. As expected, the most notable descriptive difference between the student groups was in prior knowledge. This difference also proved statistically significant (t(119) = 3.66; p < .001; d = 0.77).

Table 1
Descriptive statistics and correlations for all scales

                            Medical        Social Sciences
                            M      SD      M      SD        1      2      3      4
1. Prior knowledge          4.17   1.72    2.96   1.52      -
2. ICL                      3.43   1.41    3.81   1.65      −.18   -
3. ECL                      3.09   1.28    3.49   1.67      −.15   .49*   -
4. GCL                      3.58   1.31    3.99   1.59      −.10   .59*   .53*   -
5. Justification Quality    1.73   0.87    1.55   1.00      .22*   .20*   .21*   .29*

Note. ICL = Intrinsic cognitive load, ECL = Extraneous cognitive load, GCL = Germane cognitive load, M = Mean, SD = Standard deviation, * p < .05

Table 2 summarizes the ANOVA results comparing the mean values of the three cognitive load scales across the Student Group and Research Tool factors. There was no statistically significant effect of Student Group on any of the scales, but there was a significant effect of Research Tool on all three, with no interaction between the two factors. These results support the findings of Stadler et al. (2024) that researching with ChatGPT results in lower cognitive load than researching with Google, a finding that holds true for both medical and social sciences students.
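As a plausibility check on the prior knowledge comparison reported with Table 1, the test statistic and effect size can be approximately recovered from the rounded group means and standard deviations alone. The sketch below assumes an equal-variance (Student's) independent samples t-test and a pooled-SD Cohen's d, which is consistent with the reported 119 degrees of freedom (30 + 91 − 2). The function names are illustrative, and because the inputs are rounded table values, the results can differ slightly from those computed on the raw data.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def student_t(m1, sd1, n1, m2, sd2, n2):
    """Equal-variance t statistic and its degrees of freedom."""
    standard_error = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return (m1 - m2) / standard_error, n1 + n2 - 2


# Prior knowledge, rounded values from Table 1:
# medical students (n = 30) vs. social sciences students (n = 91).
medical = (4.17, 1.72, 30)
social = (2.96, 1.52, 91)

d = cohens_d(*medical, *social)
t, df = student_t(*medical, *social)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")  # t(119) = 3.66, d = 0.77
```

The recovered values match the reported t(119) = 3.66 and d = 0.77, which suggests that the descriptive statistics in Table 1 and the reported test are internally consistent.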
Table 2
ANOVA results for cognitive load and justification quality

Dependent variable       Effect           F(1,117)   p         η²
ICL                      Student group     1.40      .239      .01
                         Research Tool     5.92      .016      .05
                         Interaction       2.33      .129      .02
ECL                      Student group     1.70      .194      .01
                         Research Tool     5.15      .025      .04
                         Interaction       0.46      .502      .00
GCL                      Student group     2.64      .107      .02
                         Research Tool    25.17      < .001    .17
                         Interaction       0.80      .389      .00
Justification Quality    Student group     1.21      .273      .01
                         Research Tool     0.71      .401      .01
                         Interaction       6.45      .012      .05

Note. ICL = Intrinsic cognitive load, ECL = Extraneous cognitive load, GCL = Germane cognitive load

For justification quality, there was no statistically significant main effect of Student Group or Research Tool, but there was a statistically significant interaction (moderation) effect. Closer inspection revealed a disordinal interaction: medical students produced higher-quality justifications in the LLM condition (M = 1.92; SE = 0.26) than in the Search Engine condition (M = 1.59; SE = 0.23), whereas social sciences students produced higher-quality justifications in the Google condition (M = 1.87; SE = 0.14) than in the ChatGPT condition (M = 1.20; SE = 0.14). This is visualized in Fig. 1. There were no statistically significant differences in recommendation homogeneity by Student Group or by Research Tool (χ²(2) = 4.42; p = .110).

Discussion

This study builds on previous research into cognitive load and LLM-supported learning by examining whether the epistemic effects of AI-assisted research differ as a function of learners’ prior knowledge. Building on the findings of Stadler et al. (2024), who showed that social sciences students using LLMs experienced reduced cognitive load but produced weaker justifications than those using search engines, the present study extends this work by introducing a sample of medical students.
Medical students are trained within a professional context that emphasizes evaluating the quality of evidence, considering uncertainty, and grounding recommendations in scientific justification. By testing the same task in a population characterized by higher domain expertise and such professional epistemic norms, we examined whether the previously observed trade-off between cognitive ease and justification quality generalizes to learners who are expected to reason under conditions of accountability and evidence-based decision making.

The results confirm and expand upon the earlier findings. Across both groups, LLM users reported significantly lower levels of cognitive load (intrinsic, extraneous, and germane), replicating the 'ease' effect identified by Stadler et al. (2024). However, the quality of justifications revealed a more nuanced picture: whereas social sciences students produced better arguments using search engines, medical students showed the opposite pattern, generating stronger justifications with the LLM. This interaction effect suggests that prior knowledge moderates whether the cognitive ease afforded by an LLM comes at the expense of epistemic performance or enables more efficient reasoning without compromising depth.

From a theoretical perspective, these findings contribute to ongoing discussions in cognitive load theory and multiple document comprehension. Cognitive load theory (Sweller, 2011) has long emphasized reducing extraneous load while preserving the generative aspects of germane load. However, our data suggest that, although ChatGPT reduces overall effort, it may also shift the nature of epistemic labor from an effortful search for information to judgment, interpretation, and calibration of trust. Therefore, germane load may not correspond directly to observable effort, especially when AI tools alter the learning task itself.
These findings complement research in multiple document comprehension (Bråten et al., 2018; Kammerer et al., 2021), which has shown that deep reasoning requires integrating conflicting or uncertain information. This demand may be masked when using synthesized LLM outputs, unless learners possess the knowledge and disposition to critically interrogate them.

The notion of epistemic responsibility is also central to interpreting these results. Relying on AI-generated responses can create a false sense of fluency and completeness in learners, which can lead to overtrust and underverification (Zhai et al., 2024). However, our findings show that this risk is not uniform. Medical students, likely due to their stronger prior knowledge and exposure to evidence-based reasoning norms, were able to navigate the LLM outputs without sacrificing the quality of their justifications. In contrast, social sciences students, who have less relevant domain expertise, appeared to benefit from the transparency and traceability of traditional search methods. These results underscore the importance of epistemic self-regulation, which involves not only knowing how to find information but also recognizing when and how to question the credibility and completeness of AI-generated content.

From an educational standpoint, integrating LLMs into medical and health sciences curricula requires more than providing access or technical instruction. When students are assigned open-ended tasks that allow unrestricted LLM use, the effectiveness of such activities depends critically on how much relevant prior knowledge learners bring to the task. With limited domain knowledge, students are more likely to outsource core epistemic work to the system, relying on fluent outputs as substitutes for understanding.
In contrast, when students possess sufficient prior knowledge, LLMs can support more productive forms of engagement, such as formulating meaningful questions, evaluating plausibility, and integrating information into coherent justifications. In conclusion, whether LLMs produce inversion effects on learning (see Bauer et al., 2025) depends on the prior knowledge of the learners. Instructional strategies should therefore be sensitive to learners’ knowledge levels and explicitly address how epistemic labor is redistributed in AI-supported inquiry. This is particularly important in clinical domains, where accountability, transparency, and evidence-based judgment are central to professional identity formation.

Finally, the study contributes to a broader shift in how we conceptualize digital learning tools. Instead of focusing solely on cognitive efficiency or usability, we suggest viewing LLMs as epistemic interfaces that alter the relationship between learners and tools. The educational implications of this shift depend on both tool design and learner characteristics, including prior knowledge, disciplinary norms, and epistemic maturity. Future research should explore how these dynamics unfold across tasks, domains, and instructional settings and how curricula can support students in engaging responsibly with AI in high-stakes environments.

Limitations

Several limitations should be acknowledged. First, the sample size was relatively small, particularly within the medical student subgroup. While this sample size was sufficient to detect interaction effects of theoretical interest, the results should be interpreted with caution and replicated in larger, more diverse samples to assess generalizability. Second, although the justification quality metric was carefully developed and is rooted in prior research (e.g., Kammerer et al., 2021), it only captures surface-level indicators of reasoning and does not fully represent students’ underlying epistemic processes.
Future studies could complement such coding schemes with think-aloud protocols, source evaluation tracking, or interviews to gain deeper insight into how students reason with AI-generated content. Third, the task focused on a single topic, nanoparticle-based sunscreen, which, while scientifically unsettled and appropriate for interdisciplinary inquiry, may limit the transferability of the findings to other domains or clinical decision-making scenarios. Finally, the artificial time constraint and instruction to write without notes may have increased cognitive load and prevented students from using verification strategies they would use in authentic learning contexts. Despite these constraints, the study provides valuable insights into how learners interact with LLMs in situations that resemble high-stakes, time-sensitive academic research. Future research should examine how these dynamics unfold over longer timeframes and within curricular settings that allow for iterative reflection and source triangulation.

Conclusion

This study shows that the educational impact of LLMs varies depending on learners’ prior knowledge and professional context. Although the LLM reduced cognitive load for all students, only those with greater domain expertise, such as medical students, were able to maintain or improve the quality of their justifications. Conversely, less knowledgeable students benefited more from traditional search tools that reveal the structure and credibility of information sources. These findings suggest reframing LLMs as epistemic tools that redistribute cognitive and interpretive labor, not just as learning aids. To use LLMs effectively, learners must possess the knowledge and judgment needed to assess plausibility, recognize uncertainty, and take responsibility for their conclusions. Therefore, integrating LLMs into health professions education requires not only technical training but also explicit attention to epistemic responsibility.
As generative AI becomes a persistent presence in academic and clinical environments, it will be essential to prepare students to reason with, through, and beyond these tools in order to safeguard the integrity of evidence-based practice.

Declarations

Funding Declaration
The contributions by Matthias Stadler and Constanze Richters were supported by a grant from the Deutsche Forschungsgemeinschaft (DFG; Collaborative Research Centre SHARP, TRR419). There is no conflict of interest.

Author Contribution
All authors contributed to each part of the manuscript.

Data Availability
All data supporting the findings of this study will be available within the paper and its Supplementary Information.

References

Ahmed, R. (2024). Exploring ChatGPT usage in higher education: Patterns, perceptions, and ethical implications among university students. Journal of Digital Learning and Distance Education, 3(6), 1122–1131.
Ayoub, N. F., Lee, Y., Grimm, D., & Divi, V. (2024). Head-to-head comparison of ChatGPT versus Google Search for medical knowledge acquisition. Otolaryngology–Head and Neck Surgery, 179(6), 1484–1491.
Bai, L., Liu, X., & Su, J. (2023). ChatGPT: The cognitive effects on learning and memory. Brain-X, 1(3), Article e30.
Bauer, E., Greiff, S., Graesser, A. C., Scheiter, K., & Sailer, M. (2025). Looking beyond the hype: Understanding the effects of AI on learning. Educational Psychology Review, 37(2), 45. https://doi.org/10.1007/s10648-025-10020-8
Binhammad, M. H. Y., Othman, A., Abuljadayel, L., Mheiri, A., Alkaabi, H., M., & Almarri, M. (2024). Investigating how generative AI can create personalized learning materials tailored to individual student needs. Creative Education, 15(7), 1499–1523. https://doi.org/10.4236/ce.2024.157091
Bråten, I., McCrudden, M. T., Lund, S., Brante, E., E. W., & Strømsø, H. I. (2018).
Task-oriented learning with multiple documents: Effects of topic familiarity, author expertise, and content relevance on document selection, processing, and use. Reading Research Quarterly, 53(3), 345–365. https://doi.org/10.1002/rrq.197
Chi, M. T., & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4), 219–243. https://doi.org/10.1080/00461520.2014.965823
Fan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., & Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. British Journal of Educational Technology, 56(2), 489–530. https://doi.org/10.1111/bjet.13544
Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. https://doi.org/10.3758/BF03193146
Firat, M., & Kuleli, S. (2024). GPT vs. Google: A comparative study of self-code learning in ODL students. Journal of Educational Technology & Online Learning, 7(3), 308–320.
Fischer, F., Kollar, I., Ufer, S., Sodian, B., Hussmann, H., Pekrun, R., & Eberle, J. (2014). Scientific reasoning and argumentation: Advancing an interdisciplinary research agenda in education. Frontline Learning Research, 2(3), 28–45.
Kalyuga, S. (2007). Expertise reversal effect and its implications for learner-tailored instruction. Educational Psychology Review, 19(4), 509–539. https://doi.org/10.1007/s10648-007-9054-3
Kammerer, Y., Gottschling, S., & Bråten, I. (2021). The role of internet-specific justification beliefs in source evaluation and corroboration during web search on an unsettled socio-scientific issue. Journal of Educational Computing Research, 59(2), 342–378. https://doi.org/10.1177/0735633120952731
Klepsch, M., Schmitz, F., & Seufert, T. (2017).
Development and validation of two instruments measuring intrinsic, extraneous, and germane cognitive load. Frontiers in Psychology, 8, Article 1997.
Lin, S. F., Lin, H. S., & Wu, Y. Y. (2013). Validation and exploration of instruments for assessing public knowledge of and attitudes toward nanotechnology. Journal of Science Education and Technology, 22(4), 548–559.
McFadden, M. (2025, August 5). The epistemic stakes of large language models. Institute for Experiential AI, Northeastern University. Retrieved from https://ai.northeastern.edu/news/the-stakes-of-llms
Paas, F., & van Merriënboer, J. J. G. (2020). Cognitive-load theory: Methods to manage working memory load in the learning of complex tasks. Current Directions in Psychological Science, 29(4), 394–398.
Patac, L. P., & Patac Jr., A. V. (2025). Using ChatGPT for academic support: Managing cognitive load and enhancing learning efficiency – A phenomenological approach. Social Sciences & Humanities Open, 11, 101301.
Rainie, L., Funk, C., & Anderson, M. (2019). How Americans approach facts and information. Pew Research Center.
Stadler, M., Sailer, M., & Fischer, F. (2021). Knowledge as a formative construct: A good alpha is not always better. New Ideas in Psychology, 60, 100832. https://doi.org/10.1016/j.newideapsych.2020.100832
Stadler, M., Bannert, M., & Sailer, M. (2024). Cognitive ease at a cost: LLMs reduce mental effort but compromise depth in student scientific inquiry. Computers in Human Behavior, 160, 108386.
Sweller, J. (2011). Cognitive load theory. Psychology of Learning and Motivation, 55, 37–76. https://doi.org/10.1016/B978-0-12-387691-1.00002-8
Tetzlaff, L., Simonsmeier, B., Peters, T., & Brod, G. (2025). A cornerstone of adaptivity–A meta-analysis of the expertise reversal effect. Learning and Instruction, 98, 102142. https://doi.org/10.1016/j.learninstruc.2025.102142
The jamovi project (2025). jamovi (Version 2.6) [Computer software].
Retrieved from https://www.jamovi.org Zhai, C., Wibowo, S., & Li, L. D. (2024). The effects of over-reliance on AI dialogue systems on students' cognitive abilities: a systematic review. Smart Learning Environments , 11 (1), 28. ttps://doi.org/10.1186/s40561-024-00316-7 Additional Declarations No competing interests reported. Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-9084455","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":605026051,"identity":"9fc905c1-e953-4bb5-a5cd-e22cb8ffded8","order_by":0,"name":"Matthias 
Stadler","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABX0lEQVRIie2RwWfDUBzHXzxyepPbvMps/8KvwmvDNP9K4pFTaSgzVhpK/obS/hHtZefUo7nUepvQHjqhpx1KmdGx/ZJtpDrbdYd8iHxffD++jxBSUfEPgZgQSgCTgYkF+MbM8WFfjbiolRQ7ptqg+MIl1jDUwj8Up1ByOP1U8l1eapwoIlkmWRC0HMKN2e4ZrjvWo5qlWrS+IOeLehb01pcNQqebkrKQ2mAI0gsNRWtj8O37lS+bWrRlxGxb1nC+texQ75ZmRIwKA+oSIonJQIFYtQXXIsUcDOZZqLxJzETpqrDMcqXvoEIPDN7BGnVeCgVXGgdU+qg0XktKWqwoLSRSx5UYwGzr34qgqLiAK+X/kmb1EYPEi7gU9hgk8JVvNd2HXPG7Jpur+kTp3aOLeZs9e7tzDGOWpc+3LTBG8ind3SiHmHK6Zz11BclguiOn6Ecn9+hEf+hXVFRUVPzGB8bqekKbikkzAAAAAElFTkSuQmCC","orcid":"","institution":"LMU Klinikum","correspondingAuthor":true,"prefix":"","firstName":"Matthias","middleName":"","lastName":"Stadler","suffix":""},{"id":605026052,"identity":"49643ac8-5a44-44a1-991a-a7aa8c4da51d","order_by":1,"name":"Raisa Kirikaidou","email":"","orcid":"","institution":"LMU Klinikum","correspondingAuthor":false,"prefix":"","firstName":"Raisa","middleName":"","lastName":"Kirikaidou","suffix":""},{"id":605026053,"identity":"b80d7acc-1d0f-428d-9430-d5ddb933516a","order_by":2,"name":"Michael Sailer","email":"","orcid":"","institution":"University of Augsburg","correspondingAuthor":false,"prefix":"","firstName":"Michael","middleName":"","lastName":"Sailer","suffix":""},{"id":605026054,"identity":"40ceb323-65e6-4009-b6a7-c2a64b01248b","order_by":3,"name":"Maria Bannert","email":"","orcid":"","institution":"Technical University of Munich","correspondingAuthor":false,"prefix":"","firstName":"Maria","middleName":"","lastName":"Bannert","suffix":""},{"id":605026055,"identity":"64f2470c-834e-4f6d-8d3a-1a3306fa55f2","order_by":4,"name":"Constanze Richters","email":"","orcid":"","institution":"LMU Klinikum","correspondingAuthor":false,"prefix":"","firstName":"Constanze","middleName":"","lastName":"Richters","suffix":""}],"badges":[],"createdAt":"2026-03-10 
13:11:37","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-9084455/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9084455/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":104867664,"identity":"e12ff761-d1fb-4b3a-94bf-0e379b8eb2f6","added_by":"auto","created_at":"2026-03-18 07:13:38","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":20645,"visible":true,"origin":"","legend":"\u003cp\u003eStudent group by Research Tool interaction effect on Justification Quality\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-9084455/v1/1a9fce926cab3901ee9b0b2d.png"},{"id":108914473,"identity":"f948b5b7-a821-4aa5-8909-9fb46243e631","added_by":"auto","created_at":"2026-05-10 14:55:50","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":296038,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9084455/v1/c6a0fca0-8863-45c7-9ed8-de167809c2e1.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Redistributing Epistemic Labor: Prior Knowledge Shapes How Effectively Students Use Large Language Models","fulltext":[{"header":"Introduction","content":"\u003cp\u003eMedical education makes specific epistemological demands on learners. Students must acquire factual knowledge and learn to evaluate evidence in situations of uncertainty, weigh competing claims and base their clinical judgements on scientific reasoning. These capacities, which are sometimes referred to as epistemic competence, are central to medical training and to the way clinicians reason. The way in which students develop these capacities depends, in part, on the cognitive and epistemic work they are required to undertake during their studies. 
The tools that mediate this work therefore matter, as they do not merely support learning; they also shape the type of learning that is possible.\u003c/p\u003e \u003cp\u003eLarge language models (LLMs) are now entering this mediating role (Rainie et al., \u003cspan class=\"CitationRef\"\u003e2019\u003c/span\u003e). Unlike search engines, which return a list of external documents requiring learners to actively search, filter and synthesize information, LLMs generate synthesized responses directly. The cognitive and epistemic work involved in inquiry, locating sources, evaluating their credibility and reconciling conflicting evidence, is handled at least in part by the system (Ayoub et al., \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e). The learner's task then becomes one of judging plausibility, calibrating trust, and deciding whether to accept, challenge, or extend the AI's output (Fischer et al., \u003cspan class=\"CitationRef\"\u003e2014\u003c/span\u003e, McFadden, 2025). This redistribution of labor raises an urgent question for medical education: does it promote or hinder the development of the epistemic competencies required for professional practice?\u003c/p\u003e \u003cp\u003eEarly evidence suggests that the answer depends on what learners already know. According to cognitive load theory (Sweller, \u003cspan class=\"CitationRef\"\u003e2011\u003c/span\u003e), learning involves three types of cognitive load: intrinsic load, which arises from content complexity; extraneous load, which arises from inefficient information presentation; and germane load, which reflects the effort invested in schema construction and meaning-making (Klepsch et al., \u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e). 
Since LLMs provide information in a concise, conversational format, they likely reduce the extraneous burden of navigating multiple sources and the intrinsic burden of understanding complex materials (Ahmed, \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e; Bai, et al., \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e). Stadler et al. (\u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e) confirmed this experimentally. Social science students who used LLMs to research an unsettled medical question reported significantly lower cognitive load across all three dimensions compared to students who used traditional search engines. Other studies corroborate this pattern of reduced mental effort with AI assistance (Firat \u0026amp; Kuleli, \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e). However, reduced effort did not translate into better performance. Students in the LLM condition produced justifications of lower quality and depth and engaged less actively with diverse sources. Patac and Patac Jr. (\u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e) observed a similar pattern. LLMs increase perceived efficiency, but they encourage more superficial engagement. By contrast, traditional search demands the kind of active elaboration, contrasting, synthesizing, and inferring, that underlies robust understanding (Chi \u0026amp; Wylie, \u003cspan class=\"CitationRef\"\u003e2014\u003c/span\u003e; Fan et al., \u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eThese findings suggest a potential tension in medical education. If LLMs reduce the cognitive effort associated with seeking evidence, and if that effort is precisely what builds the epistemic skills that clinicians need, then unreflectively integrating LLMs into medical curricula could undermine, rather than support, professional formation. Bauer et al. 
(\u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e) referred to this effect as an inversion effect of LLMs. However, this concern may not apply uniformly. Students with substantial prior knowledge may be better positioned to critically engage with LLM outputs, recognizing omissions and implausibilities and integrating AI-generated information into coherent reasoning rather than simply accepting it (Binhammad et al., \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e). In terms of cognitive load, this corresponds to the expertise reversal effect: instructional supports that hinder novices may be neutral or beneficial for advanced learners (Kalyuga, \u003cspan class=\"CitationRef\"\u003e2007\u003c/span\u003e; Paas \u0026amp; van Merriënboer, \u003cspan class=\"CitationRef\"\u003e2020\u003c/span\u003e; Tetzlaff et al., \u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e). Thus, whether LLMs function as genuine epistemic support or as shortcuts that displace necessary reasoning depends on the knowledge learners bring to the task.\u003c/p\u003e \u003cp\u003eThe present study tests this directly. Building on Stadler et al. (\u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e), who excluded medical students to avoid the effects of prior knowledge, we now include them as the theoretically critical case, not as an afterthought. Medical students have stronger domain knowledge and professional epistemic norms that emphasize evaluating evidence, acknowledging uncertainty, and providing accountable justifications. Thus, they represent a population for whom LLMs might function differently than for novices. By assigning medical and social science students to research the same unsettled medical question using either ChatGPT-4 or Google, we examine whether the \"cognitive ease at a cost\" finding generalizes, reverses, or differentiates across levels of domain expertise. 
The outcome will have direct implications for the integration of LLMs into health professions education, or for their cautious introduction.\u003c/p\u003e "},{"header":"Methods","content":"\u003cp\u003eDesign and Participants\u003c/p\u003e\u003cp\u003eThis study used a between-subjects experimental design, with the primary factor being the research tool assigned: LLM or Search Engine. The target population was advanced students in medicine, given the study’s focus on a medically relevant research task. We recruited 30 medical students from the BLINDED FOR REVIEW. Participants were in the later years of their program (third year or higher), ensuring they had sufficient scientific background and were familiar with the norms of evidence-based reasoning in medicine. The sample was 60% female, had a mean age of 27.8 years (\u003cem\u003eSD\u003c/em\u003e = 4.95). We obtained informed consent from all students for participation.\u003c/p\u003e\u003cp\u003eIn addition to the medical student sample, we made use of data from a comparison sample of 91 students in social sciences from a previous study (Stadler et al., \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e). Those students had been randomly assigned to the same two tool conditions (LLM or Search Engine) and had completed the same task (described below) under comparable experimental settings. We included this external dataset to enable between-group comparisons (medical vs. social sciences students), thereby testing the generalizability of tool effects across learners with different domain knowledge. Notably, the social science students had been screened to ensure none were from medicine or related disciplines (Stadler et al., \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e). 
Ethical approval for using the previously published dataset was covered under the original study’s review; for our new data collection with medical students, the research was approved by the institutional review board of BLINDED FOR REVIEW in accordance with the Declaration of Helsinki.\u003c/p\u003e\u003cp\u003eAn a priori power analysis was conducted using G*Power 3.1 (Faul et al., \u003cspan class=\"CitationRef\"\u003e2007\u003c/span\u003e) to estimate the required medical student sample size for detecting a difference between the ChatGPT and Google conditions, assuming an effect size in the range observed in the original study (d = 0.6–0.7; Stadler et al., \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e), α = .05, and power (1–β) = .80. Accordingly, the present medical student sample (N = 30) was not designed to test tool-use differences within medical students alone, but to examine whether the previously observed tool effects generalize or reverse under conditions of substantially higher domain knowledge when compared with the larger social sciences sample.\u003c/p\u003e\u003cp\u003eTask and Procedure\u003c/p\u003e\u003cp\u003eParticipants first completed demographic questions and were then introduced to the research scenario. Adapting a task design from Kammerer et al. (\u003cspan class=\"CitationRef\"\u003e2021\u003c/span\u003e), students were asked to assist a fictional friend, Paul, in deciding whether to continue using sunscreen containing mineral nanoparticles- specifically zinc oxide and titanium dioxide. Paul highlighted three perceived advantages: these particles reflect UV light rather than relying on chemical UV filters that may trigger allergies or hormonal side effects; no harmful effects of the particles are currently known; and nanoparticle-based sunscreens can achieve high SPF values offering strong skin protection. Despite these benefits, Paul was uncertain due to concerns about potential health risks associated with nanoparticles. 
Students were tasked with investigating whether these concerns were scientifically justified or unfounded. Students were randomly assigned to one of two groups using different tools of information search. The first group was assigned the “web search” condition (using the Google search engine; google.com)and the second group the “LLM” condition (sing the ChatGPT chatbot; GPT 4o).\u003c/p\u003e\u003cp\u003eThey were given 20 minutes to conduct their research, after which they would write a recommendation including a justification, relying solely on memory without access to any notes, websites, or ChatGPT outputs. To prevent cognitive load responses from being influenced by the writing task, the cognitive load questionnaire was administered immediately after the research phase, and before students composed their recommendations. Lastly, participants completed a brief prior knowledge assessment on nanotechnology.\u003c/p\u003e\u003cp\u003eMeasures\u003c/p\u003e\u003cp\u003e \u003cstrong\u003eCognitive Load\u003c/strong\u003e \u003c/p\u003e\u003cp\u003eTo assess participants’ experienced cognitive load, we used the 7 item Cognitive Load Scale developed by Klepsch, Schmitz, and Seufert (\u003cspan class=\"CitationRef\"\u003e2017\u003c/span\u003e), which provides separate subscales for intrinsic load (e.g. “The topic was very complex”), extraneous load (e.g. “The way information was presented was confusing”), and germane load (e.g. “I put a lot of effort into understanding the content”). Each subscale consisted of several Likert-type items (rated 1 = strongly disagree to 7 = strongly agree). Following standard practice, we computed a mean score for each type of load. 
The internal consistency (Cronbach’s α) was satisfactory for all three subscales (α = 0.78–0.85).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e \u003cstrong\u003ePrior Knowledge\u003c/strong\u003e \u003c/p\u003e\u003cp\u003eTo assess participants’ understanding of nanotechnology, we employed a modified version of the Public Knowledge of Nanotechnology Test (PKNT) originally developed by Lin et al. (\u003cspan class=\"CitationRef\"\u003e2013\u003c/span\u003e). This instrument consists of eight multiple-choice items, each offering four answer options, and covers foundational topics such as the scale and size of nanoparticles, material structure, and real-world applications of nanomaterials. Because both the prior knowledge measure and the justification quality score function as index scores rather than scales, internal consistency was not expected (see Stadler et al., \u003cspan class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e \u003cb\u003eJustification Quality\u003c/b\u003e: In line with Kammerer et al. (\u003cspan class=\"CitationRef\"\u003e2021\u003c/span\u003e) and drawing on prior work in multiple document comprehension (Bråten et al., \u003cspan class=\"CitationRef\"\u003e2018\u003c/span\u003e), we evaluated the quality of students’ justifications by analyzing the content of their written recommendations. Specifically, we coded whether participants included any of the following arguments: (a) that risks are minimal or outweighed by benefits, (b) that applying a coating to nanoparticles can reduce potential harm, (c) that risks primarily arise when the skin is damaged, (d) that sprays may be hazardous due to the possibility of inhalation, (e) that risks are high or uncertain, (f) that the benefit is purely cosmetic, and (g) that non-nanoparticle mineral sunscreens could serve as alternatives. 
The coding framework was developed through expert consultation with nanotechnology specialists at the Leibniz Institute for New Materials in Saarbrücken, Germany, as well as through inductive analysis of the student justifications. The number of these relevant aspects identified in each justification was used as the dependent variable. Two independent coders rated all responses, achieving strong inter-rater reliability (κ = 0.92). Discrepancies were resolved through discussion. Participants were not informed of the coding categories in advance.\u003c/p\u003e\u003cp\u003e \u003cstrong\u003eRecommendation direction\u003c/strong\u003e \u003c/p\u003e\u003cp\u003eTo examine whether using an LLM leads to a convergence of conclusions, which is a potential concern because AI tools may bias users toward a particular answer, we analyzed the recommendation decision in each essay. Specifically, we noted whether each student recommended using, avoiding, or was being ambivalent about nanoparticle sunscreens. Inter-rater reliability was very high for recommendation direction (κ = 0.95).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003ch2\u003eData Analysis\u003c/h2\u003e\u003cp\u003eAll statistical analyses were performed using jamovi version 2.3 (The jamovi project, \u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e). To examine the hypothesized differences in prior knowledge between the two student groups, we conducted an independent samples t-test. For the primary hypotheses, we ran analyses of variance (ANOVA), comparing group means for the Research Tool condition and the Student Groups with cognitive load and justification quality as dependent variables. To compare the homogeneity in recommendations across conditions, we used a chi-squared analysis. 
All analyses adhered to an alpha threshold of .05.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e provides the descriptive statistics for all scales used by Student Group and scale correlations. The most notable descriptive difference between the student groups can be seen for prior knowledge, as expected. This difference also proved statistically significant (\u003cem\u003et\u003c/em\u003e(119)\u0026thinsp;=\u0026thinsp;3.66; \u003cem\u003ep\u003c/em\u003e \u0026lt; .001; \u003cem\u003ed\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.77).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDescriptive statistics and correlations for all scales\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"9\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c9\" colnum=\"9\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" 
colname=\"c1\" morerows=\"1\" rowspan=\"2\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c3\" namest=\"c2\"\u003e \u003cp\u003eMedical\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"2\" nameend=\"c5\" namest=\"c4\"\u003e \u003cp\u003eSocial Sciences\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e\u0026nbsp;\u003c/th\u003e \u003c/tr\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003e\u003cem\u003eM\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eSD\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003eM\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u003cem\u003eSD\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003e2\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c8\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c9\"\u003e \u003cp\u003e4\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e1. 
Prior knowledge\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e4.17\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2.96\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e2. ICL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e3.43\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.41\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e3.81\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.65\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u0026minus;\u0026thinsp;.18\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e3. 
ECL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e3.09\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.28\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e3.49\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.67\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u0026minus;\u0026thinsp;.15\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e.49*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e\u0026nbsp;\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e4. GCL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e3.58\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e1.31\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e3.99\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.59\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e\u0026minus;\u0026thinsp;.10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e.59*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e.53*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e-\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e5. 
Justification Quality\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e1.73\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.87\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e1.55\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e1.00\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e.22*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e.20*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e.21*\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c9\"\u003e \u003cp\u003e.29*\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"9\"\u003eNote. ICL\u0026thinsp;=\u0026thinsp;Intrinsic cognitive load, ECL\u0026thinsp;=\u0026thinsp;Extraneous cognitive load, GCL\u0026thinsp;=\u0026thinsp;Germane cognitive load, M\u0026thinsp;=\u0026thinsp;Mean, SD\u0026thinsp;=\u0026thinsp;Standard deviation, * p \u0026lt; .05\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e summarizes the ANOVA results comparing the mean values of the three cognitive load scales across the Student Group and Research Tool factors. There was no statistically significant effect of Student Group for all scales, but there was for Research Tool, with no interaction between the two factors. These results support the findings of Stadler et al. 
(\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) that researching with ChatGPT results in lower cognitive load than researching with Google, a finding that holds true for both medical and social sciences students.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eANOVA results for cognitive load and justification quality\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDependent variable\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eEffect\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u003cem\u003eF(\u003c/em\u003e1,117)\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003e\u003cem\u003ep\u003c/em\u003e\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eη\u003csup\u003e2\u003c/sup\u003e\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eICL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStudent group\u003c/p\u003e \u003cp\u003eResearch 
Tool\u003c/p\u003e \u003cp\u003eInteraction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.40\u003c/p\u003e \u003cp\u003e5.92\u003c/p\u003e \u003cp\u003e2.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e.239\u003c/p\u003e \u003cp\u003e.016\u003c/p\u003e \u003cp\u003e.129\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e.01\u003c/p\u003e \u003cp\u003e.05\u003c/p\u003e \u003cp\u003e.02\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eECL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStudent group\u003c/p\u003e \u003cp\u003eResearch Tool\u003c/p\u003e \u003cp\u003eInteraction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.70\u003c/p\u003e \u003cp\u003e5.15\u003c/p\u003e \u003cp\u003e0.46\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e.194\u003c/p\u003e \u003cp\u003e.025\u003c/p\u003e \u003cp\u003e.502\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e.01\u003c/p\u003e \u003cp\u003e.04\u003c/p\u003e \u003cp\u003e.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGCL\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStudent group\u003c/p\u003e \u003cp\u003eResearch Tool\u003c/p\u003e \u003cp\u003eInteraction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e2.64\u003c/p\u003e \u003cp\u003e25.17\u003c/p\u003e \u003cp\u003e0.80\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e.107\u003c/p\u003e \u003cp\u003e\u0026lt;\u0026thinsp;.001\u003c/p\u003e \u003cp\u003e.389\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c5\"\u003e \u003cp\u003e.02\u003c/p\u003e \u003cp\u003e.17\u003c/p\u003e \u003cp\u003e.00\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eJustification Quality\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eStudent group\u003c/p\u003e \u003cp\u003eResearch Tool\u003c/p\u003e \u003cp\u003eInteraction\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1.21\u003c/p\u003e \u003cp\u003e0.71\u003c/p\u003e \u003cp\u003e6.45\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e.273\u003c/p\u003e \u003cp\u003e.401\u003c/p\u003e \u003cp\u003e.012\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e.01\u003c/p\u003e \u003cp\u003e.01\u003c/p\u003e \u003cp\u003e.05\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"5\"\u003eNote. ICL\u0026thinsp;=\u0026thinsp;Intrinsic cognitive load, ECL\u0026thinsp;=\u0026thinsp;Extraneous cognitive load, GCL\u0026thinsp;=\u0026thinsp;Germane cognitive load\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eRegarding the justification quality, there was no statistically significant effect of Student Group or Research Tool but a statistically significant moderation effect. 
Closer inspection revealed a disordinal interaction: medical students produced higher-quality justifications in the LLM condition (\u003cem\u003eM\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1.92; \u003cem\u003eSE\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.26) than in the search engine condition (\u003cem\u003eM\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1.59; \u003cem\u003eSE\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.23), whereas social sciences students produced higher-quality justifications in the search engine condition (\u003cem\u003eM\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1.87; \u003cem\u003eSE\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.14) than in the LLM condition (\u003cem\u003eM\u003c/em\u003e\u0026thinsp;=\u0026thinsp;1.20; \u003cem\u003eSE\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.14). This interaction is visualized in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eThere were no statistically significant differences in recommendation homogeneity by Student Group or by Research Tool (\u003cem\u003e\u03c7\u003c/em\u003e\u003csup\u003e2\u003c/sup\u003e(2)\u0026thinsp;=\u0026thinsp;4.42; \u003cem\u003ep\u003c/em\u003e\u0026thinsp;=\u0026thinsp;.110).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study builds on previous research into cognitive load and LLM-supported learning by examining whether the epistemic effects of AI-assisted research differ as a function of learners\u0026rsquo; prior knowledge. Stadler et al. (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2024\u003c/span\u003e) showed that social sciences students using LLMs experienced reduced cognitive load but produced weaker justifications than those using search engines; the present study extends this work by adding a sample of medical students. Medical students are trained within a professional context that emphasizes evaluating the quality of evidence, considering uncertainty, and grounding recommendations in scientific justification.
By testing the same task in a population characterized by higher domain expertise and such professional epistemic norms, we examined whether the previously observed trade-off between cognitive ease and justification quality generalizes to learners who are expected to reason under conditions of accountability and evidence-based decision making.\u003c/p\u003e \u003cp\u003eThe results confirm and expand upon the earlier findings. Across both groups, LLM users reported significantly lower levels of cognitive load (intrinsic, extraneous and germane), replicating the 'ease' effect identified by Stadler et al. (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). However, the quality of justifications revealed a more nuanced picture: whereas social sciences students produced better arguments using search engines, medical students showed the opposite pattern, generating stronger justifications with the LLM. This interaction effect suggests that prior knowledge moderates whether the cognitive ease afforded by an LLM comes at the expense of epistemic performance or enables more efficient reasoning without compromising depth.\u003c/p\u003e \u003cp\u003eFrom a theoretical perspective, these findings contribute to ongoing discussions in cognitive load theory and multiple document comprehension. Cognitive load theory (Sweller et al., 2011) has long emphasized reducing extraneous load while preserving the generative aspects of germane load. However, our data suggest that, although ChatGPT reduces overall effort, it may also shift the nature of epistemic labor from an effortful search for information to judgment, interpretation, and calibration of trust. Therefore, germane load may not correspond directly to observable effort, especially when AI tools alter the learning task itself. 
These findings complement research in multiple document comprehension (Br\u0026aring;ten et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Kammerer et al., \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), which has shown that deep reasoning requires integrating conflicting or uncertain information. This demand may be masked when using synthesized LLM outputs, unless learners possess the knowledge and disposition to critically interrogate them.\u003c/p\u003e \u003cp\u003eThe notion of epistemic responsibility is also central to interpreting these results. Relying on AI-generated responses can create a false sense of fluency and completeness in learners, which can lead to overtrust and underverification (Zhai et al. \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). However, our findings show that this risk is not uniform. Medical students, likely due to their stronger prior knowledge and exposure to evidence-based reasoning norms, were able to navigate the LLM outputs without sacrificing the quality of their justifications. In contrast, social sciences students, who have less relevant domain expertise, appeared to benefit from the transparency and traceability of traditional search methods. These results underscore the importance of epistemic self-regulation, which involves not only knowing how to find information but also recognizing when and how to question the credibility and completeness of AI-generated content.\u003c/p\u003e \u003cp\u003eFrom an educational standpoint, integrating LLMs into medical and health sciences curricula requires more than providing access or technical instruction. When students are assigned open-ended tasks that allow unrestricted LLM use, the effectiveness of such activities depends critically on how much relevant prior knowledge learners bring to the task. 
With limited domain knowledge, students are more likely to outsource core epistemic work to the system, relying on fluent outputs as substitutes for understanding. In contrast, when students possess sufficient prior knowledge, LLMs can support more productive forms of engagement, such as formulating meaningful questions, evaluating plausibility, and integrating information into coherent justifications. In short, whether LLM use produces inversion effects on learning (see Bauer et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2025\u003c/span\u003e) depends on learners\u0026rsquo; prior knowledge. Instructional strategies should therefore be sensitive to learners\u0026rsquo; knowledge levels and explicitly address how epistemic labor is redistributed in AI-supported inquiry. This is particularly important in clinical domains, where accountability, transparency, and evidence-based judgment are central to professional identity formation.\u003c/p\u003e \u003cp\u003eFinally, the study contributes to a broader shift in how we conceptualize digital learning tools. Instead of focusing solely on cognitive efficiency or usability, we suggest viewing LLMs as epistemic interfaces that alter the relationship between learners and tools. The educational implications of this shift depend on both tool design and learner characteristics, including prior knowledge, disciplinary norms, and epistemic maturity. Future research should explore how these dynamics unfold across tasks, domains, and instructional settings, and how curricula can support students in engaging responsibly with AI in high-stakes environments.\u003c/p\u003e"},{"header":"Limitations","content":"\u003cp\u003eSeveral limitations should be acknowledged. First, the sample size was relatively small, particularly within the medical student subgroup.
While this sample size was sufficient to detect interaction effects of theoretical interest, the results should be interpreted with caution and replicated in larger, more diverse samples to assess generalizability.\u003c/p\u003e \u003cp\u003eSecond, although the justification quality metric was carefully developed and is rooted in prior research (e.g., Kammerer et al., \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), it only captures surface-level indicators of reasoning and does not fully represent students\u0026rsquo; underlying epistemic processes. Future studies could complement such coding schemes with think-aloud protocols, source evaluation tracking, or interviews to gain deeper insight into how students reason with AI-generated content. Third, the task focused on a single topic, nanoparticle-based sunscreen, which, while scientifically unsettled and appropriate for interdisciplinary inquiry, may limit the transferability of the findings to other domains or clinical decision-making scenarios.\u003c/p\u003e \u003cp\u003eFinally, the artificial time constraint and instruction to write without notes may have increased cognitive load and prevented students from using verification strategies they would use in authentic learning contexts. Despite these constraints, the study provides valuable insights into how learners interact with LLMs in situations that resemble high-stakes, time-sensitive academic research. Future research should examine how these dynamics unfold over longer timeframes and within curricular settings that allow for iterative reflection and source triangulation.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThis study shows that the educational impact of LLMs varies depending on learners\u0026rsquo; prior knowledge and professional context. 
Although the LLM reduced cognitive load for all students, only those with greater domain expertise, such as medical students, were able to maintain or improve the quality of their justifications. Conversely, less knowledgeable students benefited more from traditional search tools that reveal the structure and credibility of information sources. These findings suggest reframing LLMs as epistemic tools that redistribute cognitive and interpretive labor, rather than as mere learning aids. To use LLMs effectively, learners must possess the knowledge and judgment needed to assess plausibility, recognize uncertainty, and take responsibility for their conclusions. Therefore, integrating LLMs into health professions education requires both technical training and explicit attention to epistemic responsibility. As generative AI becomes a persistent presence in academic and clinical environments, it will be essential to prepare students to reason with, through, and beyond these tools in order to safeguard the integrity of evidence-based practice.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eFunding Declaration\u003c/h2\u003e\n\u003cp\u003eThe contributions by Matthias Stadler and Constanze Richters were supported by a grant from the Deutsche Forschungsgemeinschaft (DFG), Collaborative Research Centre SHARP (TRR419). There is no conflict of interest.\u003c/p\u003e\n\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\n\u003cp\u003eAll authors contributed to each part of the manuscript.\u003c/p\u003e\n\u003ch2\u003eData Availability\u003c/h2\u003e\n\u003cp\u003eAll data supporting the findings of this study will be available within the paper and its Supplementary Information.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eAhmed, R. (2024). Exploring ChatGPT usage in higher education: Patterns, perceptions, and ethical implications among university students.
\u003cem\u003eJournal of Digital Learning and Distance Education\u003c/em\u003e, \u003cem\u003e3\u003c/em\u003e(6), 1122\u0026ndash;1131.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAyoub, N. F., Lee, Y., Grimm, D., \u0026amp; Divi, V. (2024). Head-to-head comparison of ChatGPT versus Google Search for medical knowledge acquisition. \u003cem\u003eOtolaryngology\u0026ndash;Head and Neck Surgery\u003c/em\u003e, \u003cem\u003e179\u003c/em\u003e(6), 1484\u0026ndash;1491.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBai, L., Liu, X., \u0026amp; Su, J. (2023). ChatGPT: The cognitive effects on learning and memory. \u003cem\u003eBrain-X\u003c/em\u003e, \u003cem\u003e1\u003c/em\u003e(3), Article e30.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBauer, E., Greiff, S., Graesser, A. C., Scheiter, K., \u0026amp; Sailer, M. (2025). Looking beyond the hype: Understanding the effects of AI on learning. \u003cem\u003eEducational Psychology Review\u003c/em\u003e, \u003cem\u003e37\u003c/em\u003e(2), 45. https://doi.org/10.1007/s10648-025-10020-8\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBinhammad, M. H. Y., Othman, A., Abuljadayel, L., Mheiri, A., Alkaabi, H., M., \u0026amp; Almarri, M. (2024). Investigating how generative AI can create personalized learning materials tailored to individual student needs. \u003cem\u003eCreative Education\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(7), 1499\u0026ndash;1523. https://doi.org/10.4236/ce.2024.157091\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBr\u0026aring;ten, I., McCrudden, M. T., Lund, S., Brante, E., E. W., \u0026amp; Str\u0026oslash;ms\u0026oslash;, H. I. (2018). Task-oriented learning with multiple documents: Effects of topic familiarity, author expertise, and content relevance on document selection, processing, and use. \u003cem\u003eReading Research Quarterly\u003c/em\u003e, \u003cem\u003e53\u003c/em\u003e(3), 345\u0026ndash;365.
https://doi.org/10.1002/rrq.197\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChi, M. T., \u0026amp; Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. \u003cem\u003eEducational Psychologist\u003c/em\u003e, \u003cem\u003e49\u003c/em\u003e(4), 219\u0026ndash;243. https://doi.org/10.1080/00461520.2014.965823\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFan, Y., Tang, L., Le, H., Shen, K., Tan, S., Zhao, Y., \u0026amp; Gašević, D. (2025). Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance. \u003cem\u003eBritish Journal of Educational Technology\u003c/em\u003e, \u003cem\u003e56\u003c/em\u003e(2), 489\u0026ndash;530. https://doi.org/10.1111/bjet.13544\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFaul, F., Erdfelder, E., Lang, A. G., \u0026amp; Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. \u003cem\u003eBehavior Research Methods\u003c/em\u003e, \u003cem\u003e39\u003c/em\u003e(2), 175\u0026ndash;191. https://doi.org/10.3758/BF03193146\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFirat, M., \u0026amp; Kuleli, S. (2024). GPT vs. Google: A comparative study of self-code learning in ODL students. \u003cem\u003eJournal of Educational Technology \u0026amp; Online Learning\u003c/em\u003e, \u003cem\u003e7\u003c/em\u003e(3), 308\u0026ndash;320.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFischer, F., Kollar, I., Ufer, S., Sodian, B., Hussmann, H., Pekrun, R., \u0026amp; Eberle, J. (2014). Scientific reasoning and argumentation: Advancing an interdisciplinary research agenda in education. \u003cem\u003eFrontline Learning Research\u003c/em\u003e, \u003cem\u003e2\u003c/em\u003e(3), 28\u0026ndash;45.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKalyuga, S. (2007).
Expertise reversal effect and its implications for learner-tailored instruction. \u003cem\u003eEducational Psychology Review\u003c/em\u003e, \u003cem\u003e19\u003c/em\u003e(4), 509\u0026ndash;539. https://doi.org/10.1007/s10648-007-9054-3\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKammerer, Y., Gottschling, S., \u0026amp; Br\u0026aring;ten, I. (2021). The role of internet-specific justification beliefs in source evaluation and corroboration during web search on an unsettled socio-scientific issue. \u003cem\u003eJournal of Educational Computing Research\u003c/em\u003e, \u003cem\u003e59\u003c/em\u003e(2), 342\u0026ndash;378. https://doi.org/10.1177/0735633120952731\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKlepsch, M., Schmitz, F., \u0026amp; Seufert, T. (2017). Development and validation of two instruments measuring intrinsic, extraneous, and germane cognitive load. \u003cem\u003eFrontiers in Psychology, 8\u003c/em\u003e, Article 1997.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLin, S. F., Lin, H. S., \u0026amp; Wu, Y. Y. (2013). Validation and exploration of instruments for assessing public knowledge of and attitudes toward nanotechnology. \u003cem\u003eJournal of Science Education and Technology\u003c/em\u003e, \u003cem\u003e22\u003c/em\u003e(4), 548\u0026ndash;559.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMcFadden, M. (2025, August 5). \u003cem\u003eThe epistemic stakes of large language models\u003c/em\u003e. Institute for Experiential AI, Northeastern University. Retrieved from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://ai.northeastern.edu/news/the-stakes-of-llms\u003c/span\u003e\u003cspan address=\"https://ai.northeastern.edu/news/the-stakes-of-llms\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePaas, F., \u0026amp; van Merri\u0026euml;nboer, J. J. G.
(2020). Cognitive-load theory: Methods to manage working memory load in the learning of complex tasks. \u003cem\u003eCurrent Directions in Psychological Science\u003c/em\u003e, \u003cem\u003e29\u003c/em\u003e(4), 394\u0026ndash;398.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePatac, L. P., \u0026amp; Patac, A. V., Jr. (2025). Using ChatGPT for academic support: Managing cognitive load and enhancing learning efficiency \u0026ndash; A phenomenological approach. \u003cem\u003eSocial Sciences \u0026amp; Humanities Open\u003c/em\u003e, \u003cem\u003e11\u003c/em\u003e, 101301.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRainie, L., Funk, C., \u0026amp; Anderson, M. (2019). \u003cem\u003eHow Americans approach facts and information\u003c/em\u003e. Pew Research Center.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStadler, M., Sailer, M., \u0026amp; Fischer, F. (2021). Knowledge as a formative construct: A good alpha is not always better. \u003cem\u003eNew Ideas in Psychology\u003c/em\u003e, \u003cem\u003e60\u003c/em\u003e, 100832. https://doi.org/10.1016/j.newideapsych.2020.100832\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eStadler, M., Bannert, M., \u0026amp; Sailer, M. (2024). Cognitive ease at a cost: LLMs reduce mental effort but compromise depth in student scientific inquiry. \u003cem\u003eComputers in Human Behavior\u003c/em\u003e, \u003cem\u003e160\u003c/em\u003e, 108386.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSweller, J. (2011). Cognitive load theory. \u003cem\u003ePsychology of Learning and Motivation\u003c/em\u003e, \u003cem\u003e55\u003c/em\u003e, 37\u0026ndash;76. https://doi.org/10.1016/B978-0-12-387691-1.00002-8\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTetzlaff, L., Simonsmeier, B., Peters, T., \u0026amp; Brod, G. (2025). A cornerstone of adaptivity\u0026ndash;A meta-analysis of the expertise reversal effect.
\u003cem\u003eLearning and Instruction\u003c/em\u003e, \u003cem\u003e98\u003c/em\u003e, 102142. https://doi.org/10.1016/j.learninstruc.2025.102142\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThe jamovi project (2025). \u003cem\u003ejamovi\u003c/em\u003e (Version 2.6) [Computer software]. Retrieved from \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.jamovi.org\u003c/span\u003e\u003cspan address=\"https://www.jamovi.org\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhai, C., Wibowo, S., \u0026amp; Li, L. D. (2024). The effects of over-reliance on AI dialogue systems on students' cognitive abilities: a systematic review. \u003cem\u003eSmart Learning Environments\u003c/em\u003e, \u003cem\u003e11\u003c/em\u003e(1), 28. https://doi.org/10.1186/s40561-024-00316-7\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-9084455/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9084455/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"Medical education prepares students to reason under uncertainty, critically evaluate evidence, and make accountable clinical judgements, epistemic demands that define professional competence. Large language models (LLMs) are now entering this field, shifting the responsibility for seeking and synthesizing information from learners to tools. Whether this redistribution of epistemic labor supports or undermines the development of clinical reasoning depends on what learners already know, we argue. To test this theory, we conducted an experiment in which medical and social science students were asked to complete a critical reasoning task on an unsettled medical issue, the safety of nanoparticle-based sunscreen, using either an LLM (ChatGPT-4) or a traditional search engine (Google). Across both groups, LLM users reported significantly lower cognitive load, confirming that AI assistance reduces perceived effort regardless of expertise. However, the effect on justification quality was moderated by domain knowledge: medical students produced stronger justifications when using the LLM, whereas social science students produced stronger justifications when using the search engine. These findings suggest that LLMs do not uniformly support or hinder epistemic performance, their effect depends on the prior knowledge that learners bring to the task. 
For medical education, this has direct implications: integrating LLMs into curricula requires not only technical access, but also the deliberate preparation of students to exercise epistemic responsibility, evaluating, challenging and taking ownership of AI-generated information, rather than deferring to it.","manuscriptTitle":"Redistributing Epistemic Labor: Prior Knowledge Shapes How Effectively Students Use Large Language Models","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-03-18 07:13:28","doi":"10.21203/rs.3.rs-9084455/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"0ba7bd34-f93e-4cd6-a927-ff9c330aea8b","owner":[],"postedDate":"March 18th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2026-05-10T14:54:58+00:00","versionOfRecord":[],"versionCreatedAt":"2026-03-18 07:13:28","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9084455","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9084455","identity":"rs-9084455","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}