A Mixed-Method Meta-Analysis on the Dual Impact of Generative AI on Undergraduates’ Critical Thinking in Higher Education

doi:10.21203/rs.3.rs-9461812/v1

2026 · doi:10.21203/rs.3.rs-9461812/v1

preprint OA: closed

Research Square | Systematic Review

Jecha Jecha, Yu Liang, Jining Han, Hayfa Nassor

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-9461812/v1

This work is licensed under a CC BY 4.0 License.

Status: Posted. Version 1 posted.

Abstract

This meta-analysis examined whether generative artificial intelligence (GenAI) tools enhance or undermine critical thinking (CT) in university-level education. Through a systematic review of 43 empirical studies (2022–2025) identified from IEEE Xplore, Web of Science, and ERIC following the PRISMA 2020 guidelines, this study synthesised 21 between-group comparisons, nine within-group pre–post designs, and 13 correlational/structural equation models.
Random-effects meta-analyses revealed significant positive effects of GenAI-supported instruction on CT outcomes in controlled comparisons (Hedges’ g = 0.88, 95% CI [0.65, 1.11], p < .001) and within AI-enhanced courses (g = 0.97, 95% CI [0.84, 1.11], p < .001); however, correlational evidence showed negligible and heterogeneous associations between AI use frequency and CT (g = 0.24, 95% CI [−0.03, 0.51], p = .076). Meta-regression analyses demonstrated that high teacher guidance and explicit ethical framing significantly predicted stronger CT gains, whereas low-guidance implementation yielded diminished or adverse effects. The qualitative synthesis of all 49 studies confirmed this dual pattern: structured, scaffolded, and ethically framed GenAI integration supported deep analytical engagement, argumentation, and reflection, whereas unstructured access fostered overreliance, cognitive passivity, and integrity concerns. Findings were consistent across disciplines, indicating that pedagogical design, rather than discipline or tool type, determines GenAI impact on CT. This study offers evidence-based recommendations for curriculum designers, policymakers, and educators seeking to harness GenAI as a scaffold for critical inquiry rather than as a cognitive crutch.

Keywords: generative artificial intelligence, critical thinking, cognitive offloading, academic dishonesty, pedagogical design, meta-analysis

1. Introduction

The widespread availability of generative AI (GenAI) tools, such as ChatGPT, represents a significant inflection point for higher education. As these systems increasingly exhibit human-like language understanding, reasoning, and content creation abilities, universities face pressing questions regarding their integration into teaching and learning (Grassini, 2023).
Proponents argue that GenAI may present opportunities to scaffold higher-order cognitive skills and promote deep engagement through interactive dialogues and feedback (Chan & Hu, 2023). Conversely, there is genuine concern that freely available tools might substitute for student effort, promote academic dishonesty, and ultimately undermine the CT aptitudes fundamental to university education (Tlili et al., 2023). This aligns with the distinction between performed and demonstrated CT in the GenAI era, as discussed by Zhou et al. (2024). New pedagogical models propose that GenAI can act as a “competent outsider,” leading students to critique AI-generated content and consequently improve their evaluative judgements (Mollick & Mollick, 2023; Pardos & Bhandari, 2023). The problem is that students find it easy to produce plausible answers, raising the spectre of cognitive offloading, in which learners bypass the cognitive effort required for well-evidenced conclusions (Lodge et al., 2023). Recent research emphasizes that this duality is mediated by instructional design, disciplinary context, and the nature of tasks undertaken by students (Bearman et al., 2024; Hon, 2025).

Despite growing empirical work on GenAI in education, evidence regarding its concrete effects on CT remains fragmented. Several reviews have examined the general impact of AI on academic honesty and the student experience (Crompton & Burke, 2023; Lo, 2023); however, there is a notable absence of studies that systematically quantify the impact on CT outcomes. The available literature is frequently based on single-institutional case reports or self-reported data, and it remains unclear whether “enhancement” or “erosion” holds across various educational environments. A specific focus on the dual impact and its quantification across diverse pedagogical contexts, with attention to pedagogical and ethical moderators, remains underexplored.
While several recent systematic reviews have explored aspects of GenAI in education (Tiruneh et al., 2025; Zhang & Al Shammari, 2025; Qu et al., 2025; Sardi et al., 2025), none has combined quantitative effect size analysis with qualitative thematic synthesis to investigate the dual impact on CT with detailed moderator analyses across disciplines and specific pedagogical interventions. This study aims to address that gap. Closing these gaps is essential for evidence-based guidelines that move beyond plagiarism policies to dynamic pedagogical strategies. Unlike broader reviews, this study uniquely combines quantitative effect size analysis with qualitative thematic synthesis to explore the nuanced conditions under which GenAI affects CT.

1.1 Research Objectives

To address these limitations, this study aimed to answer two research questions:

RQ1. Does the use of GenAI in university-level education enhance or undermine undergraduate CT skills?

RQ2. Which disciplines and pedagogical models moderate the impact of GenAI on CT outcomes?

2. Literature Review

2.1 Theoretical Frameworks

To examine the relationship between GenAI and CT in higher education, this study employs a multi-theoretical lens comprising cognitive load theory (CLT), scaffolding theory, and the AI-TPACK framework. These frameworks were selected to justify the study’s dual focus on the cognitive risks (erosion) and pedagogical opportunities (enhancement) of GenAI.

2.1.1 Cognitive Load Theory and Cognitive Offloading

CLT (Sweller et al., 2019) addresses how instructional design influences learning through its effects on working memory. According to CLT, effective learning requires minimizing extraneous load, controlling intrinsic load, and facilitating germane load (effortful processing for schema construction). GenAI tools, such as ChatGPT, can reduce extraneous cognitive load by automating mundane tasks (e.g., formatting and information retrieval), potentially freeing working memory for analysis and evaluation (Swiecki et al., 2022).
Conversely, the ease of replicating complete answers may inadvertently lower germane load, resulting in shallow engagement and cognitive reliance (Kim et al., 2024). Karapantelakis et al. (2024) confirmed that reliance on GenAI is associated with self-reported reductions in cognitive effort. The related concept of cognitive offloading, the inclination to exploit external tools to alleviate cognitive burden (Risko & Gilbert, 2016), is particularly relevant. While adaptive in moderation, overuse of external aids may cause skill degradation and loss of cognitive autonomy (Grinschgl & Neubauer, 2022). GenAI tools represent a concentrated form of cognitive offloading, capable of producing synthesized arguments, structured essays, and creative content (Clark, 2025). The “extended mind” hypothesis (Clark & Chalmers, 1998) suggests that GenAI can become a thinking partner that extends human cognition, provided it is used critically. However, when used passively, the tool shifts from a cognitive partner to a cognitive crutch, impairing independent thinking (Lodge et al., 2023). This study uses CLT to analyze whether GenAI-supported pedagogies retain adequate germane load for critical reflection or instead encourage passive consumption.

2.1.2 Scaffolding Theory

Scaffolding theory, rooted in Vygotsky’s (1978) work, provides a framework for viewing temporary supportive structures as necessary for student development. Effective scaffolding in higher education fosters CT by helping students engage in deeper cognitive processing, including analysis, synthesis, and evaluation (van de Pol et al., 2010). GenAI tools have been proposed as interactive digital scaffolds offering learners just-in-time support (Holmes & Porayska-Pomsta, 2022; Mollick & Mollick, 2023).
For instance, AI-enabled writing assistants can provide real-time feedback on argument structure, and as Socratic interlocutors, GenAI tools can promote CT by requiring students to justify their reasoning (Chan & Hu, 2023). Recent applications in STEM and language education have shown that AI-supported debate, case analysis, and reflective writing can increase evaluative judgement when scaffolding is embedded in task design (Biagini et al., 2025; Hwang et al., 2023). However, scaffolding theory also warns against over-scaffolding, in which excessive support prevents productive struggle (Koedinger & Aleven, 2007). In the GenAI context, this occurs when students rely on AI-generated responses rather than building knowledge incrementally. The effectiveness of AI as a scaffold depends on task structure, teacher guidance, and students’ metacognitive awareness (Biagini et al., 2025). This study applies scaffolding theory to explore how various pedagogical models utilize GenAI as a scaffold for CT without encouraging cognitive dependency.

2.1.3 The AI-TPACK Framework

Adapted from Mishra and Koehler’s (2006) TPACK model, the AI-TPACK framework (Ning et al., 2024) emphasizes that sound GenAI integration requires not only technical skills but also consideration of what AI can and cannot do for domain-specific goals. Teachers’ AI-TPACK capabilities, including AI-enhanced task design, scaffolding students to critically evaluate AI outputs, and considering ethical implications, are strong predictors of successful AI integration (Celik et al., 2022). When educators have high AI-TPACK, they develop activities that position GenAI as something to critique rather than as a source of answers, fostering CT and AI literacy simultaneously (Holmes & Porayska-Pomsta, 2022). The AI-TPACK framework is used in this study to explain the role of teacher enactment, ethical framing, and disciplinary context in shaping GenAI’s impact on CT.
Together, these three theoretical lenses provide a foundation for examining the interaction between GenAI and CT. CLT informs how GenAI can facilitate or hinder learning at the cognitive level, scaffolding theory addresses how AI-supported pedagogy should be designed to develop higher-order thinking, and AI-TPACK underscores the centrality of teacher knowledge and disciplinary context.

2.2 Critical Thinking in the Era of GenAI

CT remains one of the most desired yet contentious concepts in higher education. Facione’s (1990) model posits six essential CT skills: interpretation, analysis, evaluation, inference, explanation, and self-regulation. Ennis (1993) complemented this with a dispositional perspective, defining CT as “reasonable reflective thinking focused on deciding what to believe or do.” More recent work advocates domain-specific models to account for how disciplinary epistemologies shape CT (Davies, 2015; Moore, 2013). This study defines CT as a collection of cognitive processes (analysis, evaluation, inference, and reflection) and dispositions (curiosity, skepticism, and intellectual humility) that enable learners to thoughtfully engage with complex problems.

The relationship between GenAI and CT varies across disciplines, reflecting differences in epistemological norms and task structures (Moore, 2013; Hon, 2025). In the humanities and social sciences, GenAI has been used to scaffold close reading, source evaluation, and essay writing, although ethical concerns about writing as evidence of thinking are prominent (Chan & Hu, 2023; Bearman et al., 2024). In STEM fields, AI-infused inquiry can enhance analytical engagement when students evaluate AI-generated solutions or debug AI code, although the risks of cognitive offloading are notable in programming instruction (Hwang et al., 2023; Kim et al., 2024).
In professional disciplines (health, business, and education), AI-assisted case-based learning can enhance diagnostic reasoning alongside expert feedback, although concerns about professional judgement remain (Hon, 2025; Lodge et al., 2023). While this disciplinary variability is recognized, prior meta-analyses that specifically dissect these differences are scarce, further underscoring the contribution of this study.

2.3 Pedagogical Models and the Dual Effects of GenAI

The impact of GenAI on learning depends almost entirely on the pedagogical model in which it is embedded. Three predominant integration models have emerged: (a) GenAI as a “competent outsider” or object of critique, where students evaluate, challenge, and improve AI-generated content against standards of accuracy and relevance (Mollick & Mollick, 2023; Holmes & Porayska-Pomsta, 2022); (b) GenAI as a Socratic dialogue partner, interacting with students through iterative questioning and reflection (Chan & Hu, 2023; Pardos & Bhandari, 2023); and (c) GenAI for inquiry- or problem-based learning, where AI-enabled tools serve as resources for hypothesis development and information synthesis (Holmes & Porayska-Pomsta, 2022; Hon, 2025). Additionally, “AI-resilient assessment” redesigns evaluation to focus on processes, justifications, and metacognition rather than products easily generated by AI (Lodge et al., 2023; Bearman et al., 2024).

Empirical evidence regarding the impact of GenAI on CT is divided. On the enhancement side, experimental and quasi-experimental studies report increased CT among students in AI-supported activities compared to traditional instruction, particularly when GenAI scaffolds structured dialogues, argumentation, and reflective inquiry (Hwang et al., 2023; Pardos & Bhandari, 2023).
On the erosion side, students show lower writing quality when relying heavily on AI-generated text, weakened retention when AI serves as an “answer machine,” and lowered self-efficacy in independent thinking (Kim et al., 2024; Lodge et al., 2023; Perkins et al., 2024). The causes of erosion operate at multiple levels: cognitive (bypassing deliberate practice), motivational (discouraging engagement with challenging tasks), and metacognitive (inducing false comprehension through AI explanations) (Risko & Gilbert, 2016; Sweller et al., 2019).

Crucially, teacher guidance and ethical framing play central roles. Research comparing high- and low-guidance implementations indicates that structured, teacher-facilitated use significantly outperforms unstructured AI engagement in terms of CT gains (Hon, 2025; Hwang et al., 2023). Explicit ethical framing, including discussions of AI bias, hallucinations, and academic integrity, has been shown to encourage deeper and more responsible learning (Bearman et al., 2024; Lodge et al., 2023).

2.4 Summary and Research Gaps

While recent meta-analyses have broadly explored the impact of AI on education (Seufert & Rohwer, 2024; Solyst et al., 2025; Qu et al., 2025), this study provides a unique quantification of GenAI’s bidirectional impact on CT by integrating effect size analysis with thematic synthesis to identify critical moderators. This study addresses the following key gaps: (a) lack of systematic quantification of CT effects across multiple experimental studies; (b) insufficient disciplinary granularity in existing reviews; and (c) limited empirical evidence about specific, actionable pedagogical models for GenAI integration, moving beyond the general recommendations found in broader reviews (Li et al., 2025; Sardi et al., 2025).

3. Methods

3.1 Research Design

This study was conducted as a mixed-method systematic review and meta-analysis following the PRISMA 2020 guidelines (Page et al., 2021).
The review incorporated a quantitative meta-analysis of effect sizes and a qualitative thematic synthesis of how GenAI has been reported to impact CT. A PRISMA 2020 flow diagram documented how studies were identified and screened.

3.2 Eligibility Criteria

Studies were included if they (a) involved higher education or tertiary-level participants; (b) examined the use of GenAI tools (defined as systems driven by large language models that generate novel textual or multimodal content, for example, ChatGPT, GPT-based writing assistants, and multi-agent tutoring environments) for teaching, learning, assessment, or academic support; (c) reported at least one CT-related outcome (standardized CT measure, researcher-developed test of CT skills/dispositions, subscale scores in broader higher-order thinking measures, or constructs such as evaluative judgement or argumentation quality); and (d) employed a design yielding empirical evidence of impact (between-group comparisons, within-group pre-post designs, or correlational/model-based analyses). Studies were excluded if they were conceptual or opinion-only, did not involve GenAI, involved non-higher education participants, or failed to report CT-relevant outcomes. These criteria are consistent with recent meta-analyses on GenAI in higher education (Giannakos, 2025; Zhang & Al Shammari, 2025). The full criteria are detailed in Table 1.

Table 1. Inclusion and Exclusion Criteria

Inclusion | Exclusion
Studies from 2022–2025 using GenAI in higher education | No actual AI integration
Empirical, mixed-methods, quasi-experimental, case studies | No clear CT metrics
Must report CT-related outcomes | Exclusively primary/secondary education
English language | Non-English language

3.3 Search Strategy and Study Selection

The search retrieved 653 records from IEEE Xplore, Web of Science, and ERIC.
These three databases were selected because they provide complementary coverage of the target literature: IEEE Xplore captures technology-focused educational research (conference proceedings and journals in computing and engineering education), Web of Science provides broad multidisciplinary coverage of high-impact journals, and ERIC is the premier database for educational research and policy. Together, they ensure representation across both the technology and education dimensions of GenAI research while maintaining quality thresholds (all are indexed, peer-reviewed sources). Records were exported in the RIS format and managed through Rayyan for blinded screening, conflict resolution, and documentation of exclusion reasons (Ouzzani et al., 2016). The search string was: (“generative AI” OR “ChatGPT” OR “large language model”) AND (“critical thinking” OR “self-regulation” OR “cognitive skills”) AND (“university” OR “higher education”). Backward citation chasing through included articles supplemented the database searches.

The study selection proceeded in two phases. In Phase 1, two reviewers independently screened all titles and abstracts in Rayyan. In Phase 2, both reviewers independently read all candidate full texts and recorded the reasons for exclusion. This process yielded 43 studies that met all inclusion criteria.

3.4 Inter-Rater Reliability

Both reviewers screened all 653 records, achieving Cohen’s kappa κ = 1.00 (109 included, 544 excluded by both), surpassing the “almost perfect” agreement threshold (McHugh, 2012). All unclear cases were resolved through discussion, and no records remained discordant after consensus, as recommended by the methodological guidance (Calderon et al., 2025; Hanegraaf et al., 2024).

3.5 Data Extraction and Coding

Data were extracted into three interlinked spreadsheets (StudyInfo_CT, EffectSizes_CT, and Qualitative-CT_Themes) following established guidance for mixed-methods syntheses (Marshall et al., 2021).
StudyInfo_CT captured bibliographic and contextual information for each study, along with moderators including AI use case, pedagogical model, teacher guidance level, ethical guidelines presence, and overall risk-of-bias judgement (Cochrane Bias Methods Group, 2024; Shafer, 2025). EffectSizes_CT contained quantitative data for computing standardized effect sizes by study design. Qualitative-CT_Themes compiled narrative summaries with structured theme coding on ordinal scales (0–3), supporting the thematic synthesis.

3.6 Study Quality Appraisal

Four risk-of-bias items were coded for all 43 studies: randomization, baseline equivalence, validated CT measure, and attrition reporting (Cochrane Bias Methods Group, 2024; Shafer, 2025). The overall risk of bias was rated on a three-tier scale (low, some concerns, high). Risk-of-bias ratings informed the sensitivity analyses but did not serve as exclusion criteria (Calderon et al., 2025).

3.7 Quantitative Synthesis

Three families of CT outcomes were analyzed: (1) between-group comparisons (AI-supported vs. non-AI instruction), (2) within-group pre-post changes, and (3) correlational/model-based associations. All models were fitted in JASP (Version 0.95.4) with the Classical Meta-Analysis module. Hedges’ g was computed for between-group and within-group effects; correlational/SEM estimates were standardized to z-values. Random-effects models with REML estimation were employed throughout. Heterogeneity was assessed using Cochran’s Q and I² statistics. Publication bias was evaluated through funnel plot inspection, trim-and-fill, and fail-safe N statistics. Moderator analyses focused on between-group effects using subgroup analyses and meta-regression with Knapp–Hartung corrections.

3.8 Qualitative Synthesis

The qualitative synthesis examined how GenAI was used in practice and whether it supported or hindered CT.
Structured coding on ordinal scales (0–3) captured overreliance on AI, plagiarism or integrity concerns, cognitive offloading or passivity, and deep CT engagement. Descriptive statistics and iterative thematic synthesis identified patterns of AI use, scaffolding, and risk profiles (Avsheniuk et al., 2024; Wiredu et al., 2024).

3.9 Mixed-Method Integration

Quantitative and qualitative results were integrated at two levels: study-level (matching individual effects to contextual and qualitative codes) and pattern-level (interpreting meta-regression results through qualitative themes). Joint displays and narrative weaving were used to explain heterogeneity and identify conditions under which GenAI acts as a CT scaffold versus a cognitive crutch (Miranda et al., 2025).

4. Results

4.1 Study Characteristics

A total of 43 studies were included. Approximately 43% employed between-group designs, 18% used within-group pre–post designs, 27% used correlational/SEM designs, and 12% reported descriptive outcomes not amenable to quantitative synthesis. Studies spanned educational technology/digital learning (29%), education/psychology (27%), STEM/computer science/engineering (18%), language/EFL education (16%), health/nursing (2%), and other disciplines (8%). China was the most represented country of origin (29%), followed by Spain, Indonesia, Taiwan, and Malaysia (Table 2).
Table 2. Characteristics of Critical-Thinking–Relevant Studies (n = 43)

Characteristic | Category | n | % | 95% CI
Discipline | EdTech / Digital learning | 14 | 28.6 | 15.9–41.2
Discipline | Education/Psychology | 13 | 26.5 | 14.2–38.9
Discipline | STEM/CS/Engineering | 9 | 18.4 | 7.5–29.2
Discipline | Language / EFL | 8 | 16.3 | 6.0–26.7
Discipline | Health/Nursing | 1 | 2.0 | 0.0–6.0
Discipline | Other disciplines | 4 | 8.2 | 0.5–15.8
Country | China | 14 | 28.6 | 15.9–41.2
Country | Spain | 4 | 8.2 | 0.5–15.8
Country | Indonesia | 3 | 6.1 | 0.0–12.8
Country | China/Taiwan | 3 | 6.1 | 0.0–12.8
Country | Malaysia | 3 | 6.1 | 0.0–12.8
Country | Other (1–2 each)* | 22 | 44.9 | 31.0–58.8

*e.g., South Korea, United States, Ghana, Iran, Norway, Singapore

4.1.1 Study Quality and Risk of Bias

Of the 43 studies, 33 showed low overall risk of bias and eight were assessed as moderate (some concerns) (Fig. 2). Low-risk studies generally employed randomised or strong quasi-experimental designs with baseline equivalence checks, used validated CT instruments (e.g., CCTST, Watson-Glaser, CTDI), and reported sufficient outcome data. Moderate-risk studies tended to use non-random allocation without confirmed baseline equivalence or relied on self-report measures with limited validation. Sensitivity analysis re-estimated the between-group CT model after removing moderate-risk studies; the pooled effect remained significant and positive, falling within the confidence interval of the original model.

4.2 Quantitative Meta-Analytic Findings

4.2.1 Between-Group Comparisons

Twenty-one between-group effects were analysed. The pooled effect was large and positive (Hedges’ g = 0.88, 95% CI [0.65, 1.11], p < .001), indicating that students in AI-supported conditions scored nearly one standard deviation higher on CT outcomes than comparison groups (Table 3). Heterogeneity was moderate (Q(20) = 81.18, p < .001; I² = 76%; τ² = 0.16). The 95% prediction interval (0.02 to 1.74) suggests that true effects range from near zero to very large across implementations. Funnel plot inspection, trim-and-fill, and fail-safe N diagnostics indicated no substantial publication bias (Figs. 3a–3c).
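The heterogeneity figures reported for the between-group model can be sanity-checked by hand. The Python sketch below is not the authors' JASP workflow; it recomputes I² from the reported Q and its degrees of freedom, and approximates the 95% prediction interval from the reported pooled effect, confidence interval, and τ². It assumes the confidence interval was built as g ± 1.96·SE and that the prediction interval uses a t critical value with k − 2 = 19 degrees of freedom (2.093, taken from standard tables). Both are assumptions, and the function names are hypothetical.

```python
import math


def i_squared(q: float, df: int) -> float:
    """Higgins' I-squared (percent) from Cochran's Q and its degrees of freedom."""
    return max(0.0, (q - df) / q) * 100.0


def prediction_interval(g: float, ci_low: float, ci_high: float,
                        tau2: float, t_crit: float) -> tuple[float, float]:
    """Approximate 95% prediction interval for a random-effects pooled estimate.

    Assumption: the reported CI is g +/- 1.96 * SE, so SE can be recovered
    from the CI width. A Knapp-Hartung CI would need a t quantile instead.
    """
    se = (ci_high - ci_low) / (2 * 1.96)
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return g - half_width, g + half_width


# Values reported for the between-group model (k = 21).
i2 = i_squared(q=81.18, df=20)
low, high = prediction_interval(g=0.88, ci_low=0.65, ci_high=1.11,
                                tau2=0.16, t_crit=2.093)  # t(0.975, df = 19)

print(f"I2 = {i2:.1f}%")
print(f"95% PI = [{low:.2f}, {high:.2f}]")
```

Under these assumptions the Q-based I² comes out at roughly 75% and the interval at roughly 0.01 to 1.75, close to the reported 76% and 0.02 to 1.74. Small gaps are expected because the inputs are rounded and because software may derive I² from the τ² estimate rather than from Q.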
Table 3. Random-Effects Meta-Analytic Models for Critical Thinking Outcomes

Model | Effect type | k | Pooled effect | 95% CI | p | I² (%)
Between-group | Post-test Hedges’ g | 21 | 0.88 | [0.65, 1.11] | < .001 | 76
Within-group | Pre-post Hedges’ g | 9 | 0.97 | [0.84, 1.11] | < .001 | 57
Correlational/SEM | Generic effect (ES + SE) | 13 | 0.24 | [−0.03, 0.51] | .076 | ≈ 99

4.2.2 Within-Group Pre–Post Changes

Nine pre–post effects yielded a large pooled effect (g = 0.97, 95% CI [0.84, 1.11], p < .001) with moderate heterogeneity (Q(8) = 18.46, p = .018; I² = 57%) (Figs. 4a–4b). Trim-and-fill suggested two potentially missing studies, but the adjusted effect remained large (g = 0.92, 95% CI [0.81, 1.04]). These gains cannot be attributed solely to GenAI, as they partly reflect ordinary course learning.

4.2.3 Correlational and Model-Based Evidence

Thirteen correlational/SEM effects produced a small, positive, but non-significant pooled effect (0.24, 95% CI [−0.03, 0.51], p = .076) with very high heterogeneity (I² ≈ 99%) (Figs. 5a–5b). Individual studies ranged from no association to strong positive association. The simple frequency of AI use does not appear to reliably predict CT outcomes; the relationship is contingent on how AI is scaffolded within courses.

4.3 Moderator and Meta-Regression Analyses

Subgroup analyses by discipline and AI use case were limited by small cell sizes, though positive pooled effects occurred within each discipline. Meta-regression revealed more informative patterns (Table 4). When teacher guidance level was entered as a predictor, the overall moderator test was significant (F(3, 17) = 3.55, p = .037). High guidance, the reference category, predicted a significant positive CT effect (intercept ĝ ≈ 0.92), whereas low guidance was associated with a substantially lower, adverse predicted effect (coefficient relative to high guidance = −2.12, 95% CI [−3.59, −0.65], p = .007). Moderate and moderate-to-high guidance did not differ significantly from the high-guidance baseline.
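Because the guidance coefficients are expressed relative to the high-guidance baseline, the predicted effect for a category is the intercept plus that category's coefficient, not the coefficient alone. The short Python sketch below illustrates this arithmetic with the two values quoted in the text. It assumes standard dummy (treatment) coding with high guidance as the reference level, and the function name is hypothetical.

```python
def predicted_effects(intercept: float,
                      coefficients: dict[str, float],
                      reference: str) -> dict[str, float]:
    """Predicted effect per category under dummy coding.

    The reference category is predicted by the intercept alone; every other
    category is predicted by intercept + its coefficient.
    """
    predictions = {reference: intercept}
    for level, coefficient in coefficients.items():
        predictions[level] = intercept + coefficient
    return predictions


# Values quoted in the text: intercept (high guidance) and the low-guidance coefficient.
effects = predicted_effects(intercept=0.92,
                            coefficients={"low": -2.12},
                            reference="high")

for level, g_hat in effects.items():
    print(f"{level} guidance: predicted g = {g_hat:+.2f}")
```

Read this way, the low-guidance coefficient of −2.12 corresponds to a predicted effect of about −1.20, which is why the text describes the low-guidance condition as adverse rather than merely smaller.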
The presence of explicit ethical/academic integrity guidelines also significantly predicted CT outcomes (F(1, 19) = 10.87, p = .004). Conditions without guidelines yielded a negative but imprecise predicted effect (−1.20, 95% CI [−2.53, 0.14], p = .076), while the ethical-guidelines coefficient was +2.12 (95% CI [0.78, 3.47], p = .004).

Table 4. Meta-Regression of CT Effects on Teacher Guidance and Ethical Guidelines

Panel A. Omnibus tests

Predictor | F | df₁ | df₂ | p
Teacher guidance level | 3.554 | 3 | 17.00 | .037
Ethical guidelines present | 10.87 | 1 | 19.00 | .004

Panel B. Coefficients: Teacher guidance

Term | Estimate | SE | 95% CI | t | df | p
Intercept (High) | 0.923 | 0.245 | [0.407, 1.439] | 3.773 | 17.00 | .002
Low | −2.121 | 0.695 | [−3.588, −0.654] | −3.051 | 17.00 | .007
Moderate | −0.009 | 0.266 | [−0.571, 0.553] | −0.033 | 17.00 | .974
Moderate to High | 0.197 | 0.486 | [−0.827, 1.222] | 0.406 | 17.00 | .690

Panel C. Coefficients: Ethical guidelines

Term | Estimate | SE | 95% CI | t | df | p
Intercept (No guidelines) | −1.198 | 0.638 | [−2.533, 0.136] | −1.879 | 19.00 | .076
Guidelines present (Yes) | 2.123 | 0.644 | [0.775, 3.472] | 3.297 | 19.00 | .004

4.4 Qualitative Synthesis of Mechanisms and Risks

The qualitative synthesis of all 43 studies revealed that every study reported at least one CT-supportive outcome and at least one CT-related risk, underscoring the structurally dual nature of GenAI (Table 5). Deep CT engagement emerged when AI was presented as an object for critique, comparison, or structured discussion. Studies reported increases in analytical reasoning, argumentation, reflective judgement, and higher-order questioning when students challenged AI outputs or revised work based on AI feedback (Nusivera et al., 2025; Chen et al., 2025; Toscano et al., 2024). Simultaneously, nearly all studies recorded risks of overdependence and cognitive offloading: students trusting AI outputs without checking, relying on AI for phrasing and proofreading, and displaying reduced reflection effort (Essel et al., 2024; Gerlich, 2025; Huang et al., 2025).
Overreliance was more pronounced among students with low baseline CT or AI literacy (Kulal, 2025; Hou et al., 2025). Academic integrity concerns were also common, with students paraphrasing AI content without adequate transformation or citation (Zou et al., 2024; Gao et al., 2025). The qualitative findings confirmed the meta-regression results: studies with the highest CT gains combined GenAI with explicit scaffolding, multi-phase inquiry, guided AI–human comparison, reflexive debriefing, and explicit teaching about AI limitations (Martin-Gomez & Gonzalez Ruiz, 2025). Conversely, studies deploying AI as a generic tool with minimal guidance more often described surface-level engagement and dependence (Al-Kumaim et al., 2025; Song et al., 2025).

Table 5. Frequencies of CT-Support and CT-Risk Codes (n = 43)

Code/Theme | n | % | Example
Deep CT ≥ 2 (moderate to high) | 43 | 87.8 | “ChatGPT helped us organise and challenge arguments effectively.”
Deep CT = 3 (high) | 24 | 49.0 | “AI debates trained me to defend ideas more logically.”
Overreliance ≥ 2 | 42 | 85.7 | “Sometimes I just accepted what ChatGPT said without verifying.”
Offloading ≥ 2 | 39 | 79.6 | “AI convenience often replaces cognitive effort.”
Plagiarism ≥ 1 | 36 | 73.5 | “I found myself copying AI suggestions too easily.”
CT positive flag | 43 | 100.0 | “ChatDOC helped me understand difficult papers and think critically.”
CT negative flag | 43 | 100.0 | “AI analysis sometimes made me skip my own reasoning steps.”

4.5 Summary of Integrated Findings

The mixed-method synthesis supports a consistent but nuanced conclusion. GenAI-augmented teaching is associated with significant average CT gains in controlled and within-group studies. However, these gains are highly uneven and do not reflect a uniform positive relationship between AI use frequency and CT.
Both moderator analyses and qualitative data converge in demonstrating that pedagogical design, guidance, and ethical framing determine whether GenAI serves as a scaffold for deeper thinking or as an engine of overreliance and cognitive offloading. GenAI amplifies the underlying instructional design: embedded within structured, critically oriented tasks with explicit guidance, it enhances CT; offered as an unstructured convenience tool, it promotes shortcut strategies that may compromise thoughtful consideration.

4.6 Answering the Research Questions

RQ1. Does the use of GenAI in university-level education enhance or undermine students’ CT?

The evidence indicates that GenAI has the potential to significantly improve students’ CT while introducing risks that can be detrimental when deployment is poorly designed. The between-group meta-analysis showed a large positive effect (Hedges’ g = 0.88), and within-group analysis confirmed substantial gains (g = 0.97). However, correlational evidence showed only a negligible, non-significant association between AI use frequency and CT (g = 0.24, p = .076) with extreme heterogeneity. Both quantitative and qualitative analyses confirmed the dual pattern: when used in structured, guided, and ethical ways, GenAI typically improves CT; when applied with minimal guidance as an “answer machine,” it can have detrimental effects.

RQ2. Which disciplines and pedagogical models moderate the impact of GenAI on CT outcomes?

Positive CT effects were observed across all disciplinary clusters. No area was identified in which GenAI consistently impaired CT, though cell sizes were too small for strong statistical claims about disciplinary differences. The most conservative conclusion is that GenAI can support CT across disciplines when linked to sound instructional design.
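
For readers less familiar with the effect-size metric quoted in the RQ1 answer, the sketch below shows how a single between-group Hedges’ g is commonly computed from group summary statistics. It is an illustration under stated assumptions: the input means, standard deviations, and sample sizes are hypothetical, the variance formula is the usual large-sample approximation, and the review does not state which exact variant its software applied. The function name `hedges_g` is illustrative.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Return (g, variance of g) for two independent groups.

    g is Cohen's d (mean difference over the pooled SD) multiplied by the
    small-sample correction factor J = 1 - 3 / (4 * df - 1), df = n_t + n_c - 2.
    """
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    j = 1 - 3 / (4 * df - 1)
    # Large-sample approximation to the variance of d, then scaled by J squared.
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return j * d, j ** 2 * var_d


if __name__ == "__main__":
    # Hypothetical study: GenAI group vs. control group on a CT test.
    g, var_g = hedges_g(mean_t=75.0, mean_c=70.0, sd_t=10.0, sd_c=10.0,
                        n_t=30, n_c=30)
    se = math.sqrt(var_g)
    print(f"g = {g:.3f}, SE = {se:.3f}, "
          f"approx. 95% CI = [{g - 1.96 * se:.3f}, {g + 1.96 * se:.3f}]")
```

With the hypothetical inputs above, the uncorrected d is 0.5 and the correction shrinks it slightly, to roughly 0.49. Pooled estimates such as g = 0.88 are weighted averages of study-level values of this kind.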
Pedagogical implementation features, however, showed clear effects: high teacher guidance predicted the largest CT gains, while low guidance predicted diminished or adverse effects. Explicit ethical and academic integrity guidelines also significantly predicted stronger CT outcomes. Discipline alone does not determine whether GenAI enhances or hinders CT; rather, the pedagogical model, scaffolding level, and ethical framing are the determining factors.

5. Discussion

This mixed-method meta-analysis explored whether GenAI-assisted learning in higher education serves as an aid or barrier to students’ CT. The quantitative synthesis revealed that GenAI-embedded instruction is linked to significant CT gains compared to non-AI learning, as well as significant pre-post gains within AI-integrated courses, while correlational evidence was inconsistent and weak. The qualitative synthesis revealed widespread risks of overdependence, cognitive offloading, and academic integrity issues (Chen et al., 2025; Bukar et al., 2024). These findings suggest that GenAI is neither inherently beneficial nor harmful for CT, but rather amplifies the pedagogical, ethical, and regulatory environment in which it operates.

5.1 The Role of Guidance, Scaffolding, and Ethical Framing

The strongest theme from this review was the influence of teacher facilitation and ethical context on CT outcomes. Meta-regression showed that high teacher guidance predicted significant positive CT effects, while low guidance predicted diminished or adverse effects. Similarly, studies contextualising GenAI within explicit ethical instruction had considerably higher CT gains. These quantitative trends were echoed in qualitative results: studies with strong CT improvements described well-structured AI applications including argument mapping, scaffolded reading and inquiry, and AI-supported writing with explicit prompts for explanation and revision (Nusivera et al., 2025; Sanchez-Lopez et al., 2025; Zhang et al., 2025).
Teachers in these studies overtly positioned AI as an imperfect artefact requiring verification and critique (Oliva-Cordova & Jimenez, 2025). In contrast, studies implementing GenAI as an open-ended assistant with limited scaffolding reported more passive use and uncritical copying (Dobre & Popescu, 2024; Gerlich, 2025).

These findings are directly interpretable through the study’s theoretical framework. From the CLT perspective, high-guidance implementations maintained adequate germane cognitive load by requiring students to analyse, evaluate, and justify AI outputs, whereas low-guidance implementations reduced germane load by allowing passive consumption. Scaffolding theory predicts precisely this pattern: effective scaffolds are gradually faded as competence develops, whereas constant, unstructured AI availability functions as over-scaffolding that prevents productive struggle (Koedinger & Aleven, 2007). The AI-TPACK framework explains why teacher knowledge matters: educators with higher AI-TPACK competence designed tasks that positioned GenAI as an object for critical evaluation rather than an answer source (Celik et al., 2022; Ning et al., 2024).

5.2 GenAI as a Double-Edged Tool for Critical Thinking

The evidence collectively demonstrates that GenAI functions as a double-edged instrument for CT. On the enhancement side, structured arguments, reading and writing support, and inquiry tasks requiring students to interrogate AI outputs provide rich occasions for analysis, evaluation, and reflective judgement (Zhang et al., 2025; Chen et al., 2025). Students in these contexts reported that AI enabled them to examine alternative viewpoints, structure arguments, identify weaknesses, and think more critically about evidence. On the erosion side, students in multiple studies reported trusting AI outputs uncritically, parroting machine-generated text, or relying on AI to “get the answer” without engaging their own reasoning (Gao et al., 2025; Song et al., 2025).
CT risks included AI-dependent idea generation, cognitive offloading and passivity, and blurred authorship boundaries (Hou et al., 2025; Gerlich, 2025; Rastogi & Ashraf Ali Hassan Al Lawati, 2024). The fact that every CT-related study reported both positive outcomes and risks underscores the paradox: the same tool can elicit deeper processing while also encouraging cognitive shortcuts. From the cognitive offloading perspective (Risko & Gilbert, 2016), this duality is predictable: GenAI reduces the cognitive cost of producing outputs, which can be beneficial when the freed capacity is redirected toward higher-order thinking, or harmful when it replaces that thinking entirely.

5.3 Implications for Educational Practice

The findings indicate that adopting GenAI is not a neutral or merely technical decision. The evidence shows that design and governance choices determine outcomes:

First, courses should position humans at the centre of reasoning. When AI is presented as something to critique, compare, or debate rather than as a source of correct solutions, CT gains are higher. Tasks should require students to justify AI suggestions, assess alternatives, and reflect on when to trust or reject AI output.

Second, institutions should construct explicit scaffolds rather than simply granting AI access. Effective implementations include multistep task designs, structured prompts, and feedback loops. Simply permitting students to “use AI if they want” without scaffolding risks producing superficial engagement and dependency.

Third, ethics and AI literacy should be treated as foundational content, not a compliance afterthought. Instructors who framed discussions around AI bias, hallucinations, and academic honesty expectations were more likely to report responsible and critically attentive student engagement.

Fourth, assessment should reward reasoning rather than AI fluency.
Because fluent but superficial text is easily generated, assessment should focus on argument structures, explanations, and decision rationales, and may include in-class, oral, or stepwise components that resist wholesale cognitive offloading.

Additionally, this review suggests several concrete directions for future research:

What is the optimal “fading schedule” for GenAI scaffolding, i.e., how should AI support be gradually reduced as students develop independent CT skills?

How do different prompt engineering strategies (e.g., Socratic prompting vs. direct instruction prompts) differentially affect CT development across disciplines?

To what extent does students’ prior AI literacy moderate the relationship between GenAI use and CT, and can targeted AI literacy interventions mitigate cognitive offloading?

What are the long-term (semester- or year-long) effects of GenAI integration on CT development, beyond the short-term interventions studied to date?

How does GenAI’s impact on CT differ for students from underrepresented groups or those with varying levels of prior academic achievement, and what equity-oriented design principles can mitigate disparities?

5.4 Limitations

This meta-analysis acknowledges four limitations. First, although trim-and-fill and fail-safe N detected no substantial publication bias, these methods have documented limitations when heterogeneity is high (I² = 76% for between-group effects). Studies reporting favourable effects are systematically more likely to be published, and reliance on published databases excludes grey literature and dissertations, potentially inflating observed effects. Second, the 2022–2025 timeframe creates boundary constraints. Studies from 2022–2023 predominantly examined GPT-3.5, whereas current models (GPT-4 and beyond) have substantially different capabilities. Pooling effects across model versions creates a “moving target” problem limiting temporal generalisability.
Third, restricting the scope to university-level education means findings may not generalise to K–12 settings, where developmental differences in cognitive maturity, metacognitive awareness, and self-regulation likely alter the dynamics of cognitive offloading and scaffolding. Fourth, measurement heterogeneity complicates precise estimation. While many studies employed validated CT instruments, others relied on researcher-developed measures with unclear psychometric properties. The corpus also underrepresents certain regions (Global South) and disciplines (professional practice programs), limiting cross-cultural transferability.

6. Conclusion

This study addressed the question of whether GenAI enhances or undermines CT in higher education. The answer is conditional: it does both, depending on how it is implemented. The meta-analytic evidence demonstrated that GenAI-supported instruction can produce large positive effects on CT when embedded within structured, guided, and ethically framed pedagogical designs (Hedges’ g = 0.88 for between-group comparisons; g = 0.97 for within-group pre–post designs). However, the correlational evidence revealed that mere frequency of AI use does not predict CT gains, and the qualitative synthesis documented pervasive risks of overreliance, cognitive offloading, and academic integrity erosion across all 43 included studies.

The two critical moderators identified, teacher guidance level and explicit ethical framing, point to a practical conclusion: GenAI is a pedagogical amplifier. It magnifies whatever instructional design surrounds it. High-guidance, ethically framed implementations produced the strongest CT gains; low-guidance, unframed implementations produced the weakest and sometimes adverse effects.

These findings carry direct implications for higher education stakeholders.
For curriculum designers, the priority should be integrating GenAI within tasks that demand evaluation, justification, and metacognitive reflection rather than passive consumption. For policymakers, institutional AI policies should move beyond prohibitive or permissive stances toward prescriptive pedagogical guidance. For educators, professional development targeting AI-TPACK competencies, particularly in designing scaffolded AI tasks and framing AI ethics, represents the most promising lever for ensuring that GenAI strengthens rather than weakens students’ capacity for independent, critical thought.

The field is still young, and the methodological heterogeneity of the current evidence base warrants caution. However, the convergence of quantitative, qualitative, and moderator evidence in this review provides a robust empirical foundation for an actionable principle: GenAI supports CT when students are required to think with and about AI, not simply through it.

Declarations

Author Contribution

Jecha S Jecha: conceptualization; investigation; writing – original draft; methodology. Jining Han: investigation; writing – original draft; methodology. Yu Liang: review and editing. Hayfa Nassor: investigation; writing – original draft; methodology.

Data Availability

Data supporting the findings of this study are available from the corresponding author upon reasonable request.

References

Ahmed, H. (2024). Institutional Integration of Artificial Intelligence in Higher Education: The Moderation Effect of Ethical Consideration. International Journal of Educational Reform. https://doi.org/10.1177/10567879241247551

*Al-Kumaim, N. H., Hasnah Hassan, S., Mohammed, F., & Saleh, A. Y. (2025). Navigating GenAI in Malaysian Universities: Use, Problems, and Challenges. 2025 5th International Conference on Emerging Smart Technologies and Applications (eSmarTA), 1–7. https://doi.org/10.1109/eSmarTA66764.2025.11132252

Alshehri, Y., AlZahrani, A., & AlQahtani, M. (2025).
Challenging Cognitive Load Theory: The role of educational neuroscience and artificial intelligence in redefining learning efficacy. Brain Sciences, 15(2), 127. https://doi.org/10.3390/brainsci15020127

Avsheniuk, N., Lutsenko, O., Seminikhyna, N., & Svyrydiuk, T. (2024). Empowering Language Learners’ Critical Thinking: Evaluating ChatGPT’s Role in English Course Implementation. Arab World English Journal, 1(1), 210–224. https://doi.org/10.24093/awej/chatgpt.14

Ayanwale, M. A., Adelana, O. P., Bamiro, N. B., Olatunbosun, S. O., Idowu, K. O., & Adewale, K. A. (2025). Large language models and GenAI in education: Insights from Nigerian in-service teachers through a hybrid ANN-PLS-SEM approach. F1000Research, 14, 258. https://doi.org/10.12688/f1000research.161637.1

Ballance, O. J. (2024). Sampling and randomisation in experimental and quasi-experimental CALL studies: Issues and recommendations for design, reporting, review, and interpretation. ReCALL, 36(1), 58–71. https://doi.org/10.1017/S0958344023000162

Bancoro, J. C. M. (2024). The relationship between artificial intelligence (AI) usage and academic performance of business administration students. International Journal of Asian Business and Management, 3(1), 27–48. https://doi.org/10.55927/ijabm.v3i1.7876

Bearman, M., Ajjawi, R., & Luckin, R. (2024). Developing evaluative judgement for a time of generative artificial intelligence. Assessment & Evaluation in Higher Education. Advance online publication. https://doi.org/10.1080/02602938.2024.2335321

Bezanilla, M. J., Fernández-Nogueira, D., Poblete, M., & Galindo-Domínguez, H. (2023). Conceptualizations and instructional strategies on critical thinking in higher education: A systematic review of systematic reviews. Frontiers in Education, 8, Article 1141686. https://doi.org/10.3389/feduc.2023.1141686

Biagini, M., Chen, Y., & Davidson, P. (2025).
Mapping the scaffolding of metacognition and learning by AI tools in STEM classrooms: A bibliometric–systematic review approach (2005–2025). Data, 13(11), 148. https://doi.org/10.3390/data13110148

Bio-Protocol. (2025). Data extraction. Bio-Protocol. https://bio-protocol.org

Bukar, U. A., Sayeed, M. S., Abdul Razak, S. F., Yogarayan, N., Yahya, N., & Mohamad, N. N. S. (2024). Effectiveness of ChatGPT in improving critical thinking and problem-solving skills of engineering students. IEEE Access, 12, 95368–95389. https://doi.org/10.1109/ACCESS.2024.3425172

Buselić, V., & Rajković, I. (2024). Teaching generic skills with ChatGPT: Debate as a critical and creative thinking teaching tool in higher education. In 2024 47th International Convention on Information, Communication and Electronic Technology (MIPRO), Opatija, Croatia (pp. 707–712). https://doi.org/10.1109/MIPRO60963.2024.10569431

Calderon Martinez, E., Ghattas Hasbun, P. E., Salolin Vargas, V. P., García-González, O. Y., Fermin Madera, M. D., Rueda Capistrán, D. E., Campos Carmona, T., Sanchez Cruz, C., & Teran Hooper, C. (2025). A comprehensive guide to conduct a systematic review and meta-analysis in medical research. Medicine, 104(33), e41868. https://doi.org/10.1097/MD.0000000000041868

Celik, I., Dindar, M., Muukkonen, H., & Järvelä, S. (2022). The promises and challenges of artificial intelligence for teachers: A systematic review of research. TechTrends, 66(4), 616–630. https://doi.org/10.1007/s11528-022-00715-y

Chan, C. K. Y., & Hu, W. (2023). Students’ voices on generative AI: Perceptions, benefits, and challenges in higher education. International Journal of Educational Technology in Higher Education, 20, Article 43. https://doi.org/10.1186/s41239-023-00411-8

*Chang, L.-C., Hung, L.-L., Liu, T.-W., Huang, C.-H., Lin, H.-L., & Liao, L.-L. (2025). Relationships between ChatGPT use with self-directed learning and critical thinking among school and university nurses in Taiwan.
BMC Nursing, 24, 1426. https://doi.org/10.1186/s12912-025-04069-7

*Chen, X., Jia, B., Peng, X., Zhao, H., Yao, J., Wang, Z., & Zhu, S. (2025). Effects of ChatGPT and argument map (AM)-supported online argumentation on college students’ critical thinking skills and perceptions. Education and Information Technologies, 30(12), 17623–17658. https://doi.org/10.1007/s10639-025-13471-2

*Chiu, M. C., & Hwang, G. J. (2025). Enhancing student creative and critical thinking in generative AI-empowered creation: A mind-mapping approach. Interactive Learning Environments, 1–22. https://doi.org/10.1080/10494820.2025.2511244

Clark, A. (2025). Extending Minds with Generative AI. Nature Communications, 16(1), 4627. https://doi.org/10.1038/s41467-025-59906-9

Clark, A., & Chalmers, D. (1998). The extended mind. Analysis, 58(1), 7–19. https://doi.org/10.1093/analys/58.1.7

Cochrane Bias Methods Group. (2024). RoB 2: A revised Cochrane risk-of-bias tool for randomized trials. https://methods.cochrane.org/bias/resources/rob-2

Crompton, H., & Burke, D. (2023). Artificial intelligence in higher education: The state of the field. International Journal of Educational Technology in Higher Education, 20(1), Article 22. https://doi.org/10.1186/s41239-023-00392-8

*Damiano, A. D., Lauría, E. J. M., Sarmiento, C., & Zhao, N. (2024). Early perceptions of teaching and learning using generative AI in higher education. Journal of Educational Technology Systems, 52(3), 346–375. https://doi.org/10.1177/00472395241233290

Davies, M. (2015). A model of critical thinking in higher education. In M. B. Paulsen (Ed.), Higher education: Handbook of theory and research (Vol. 30, pp. 41–92). Springer. https://doi.org/10.1007/978-3-319-12835-1_2

*de la Puente Pacheco, M. A., Torres, J., Blanco Troncoso, A. L., Guzmán Murillo, H. J., & Carrascal, J. X. M. (2025).
Enhancing Critical Thinking and Argumentation Skills in Colombian Undergraduate Diplomacy Students: ChatGPT-Assisted and Traditional Debate Methods. Journal of Political Science Education, 21(4), 728–738. https://doi.org/10.1080/15512169.2025.2449936

*Dobre, S.-C., & Popescu, E. (2024). Exploring Students’ Perception and Experience with ChatGPT and Critical Thinking in a Higher Education Context. 2024 21st International Conference on Information Technology Based Higher Education and Training (ITHET), 1–6. https://doi.org/10.1109/ITHET61869.2024.10837650

Dwivedi, Y. K., Kshetri, N., Hughes, L., Slade, E. L., Jeyaraj, A., Kar, A. K., Baabdullah, A. M., Koohang, A., Raghavan, V., Ahuja, M., Albanna, H., Albashrawi, M. A., Al-Busaidi, A. S., Balakrishnan, J., Barlette, Y., Basu, S., Bose, I., Brooks, L., Buhalis, D., ... Wright, R. (2023). “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. International Journal of Information Management, 71, Article 102642. https://doi.org/10.1016/j.ijinfomgt.2023.102642

Ennis, R. H. (1993). Critical thinking assessment. Theory Into Practice, 32(3), 179–186. https://doi.org/10.1080/00405849309543594

*Essel, H. B., Vlachopoulos, D., Essuman, A. B., & Amankwa, J. O. (2024). ChatGPT effects on cognitive skills of undergraduate students: Receiving instant responses from AI-based conversational large language models (LLMs). Computers and Education: Artificial Intelligence, 6, 100198. https://doi.org/10.1016/j.caeai.2023.100198

EvalAcademy. (2025). Interpreting themes from qualitative data: Thematic analysis. https://www.evalacademy.com/articles/interpreting-themes-from-qualitative-data-thematic-analysis

Facione, P. A. (1990). Critical thinking: A statement of expert consensus for purposes of educational assessment and instruction. The California Academic Press.

Fajar, J. (2024).
Approaches for identifying and managing publication bias in meta-analysis. Deka in Medicine, 1(1), e865. https://doi.org/10.69863/dim.v1i1.1

*Fakour, H., & Imani, M. (2025). Socratic wisdom in the age of AI: A comparative study of ChatGPT and human tutors in enhancing critical thinking skills. Frontiers in Education, 10, 1528603. https://doi.org/10.3389/feduc.2025.1528603

*Gao, J., Zhang, J., & Li, Y. (2025). Do AI chatbot-integrated writing tasks influence writing self-efficacy and critical thinking ability? An exploratory study. Computers and Education: Artificial Intelligence, 9, 100472. https://doi.org/10.1016/j.caeai.2025.100472

*Gerlich, M. (2025). AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking. Societies, 15(1), 6. https://doi.org/10.3390/soc15010006

Giannakos, M. (2025). The promise and challenges of generative AI in education. Behaviour & Information Technology. Advance online publication. https://doi.org/10.1080/0144929X.2024.2394886

Grassini, S. (2023). Shaping the future of education: Exploring the potential and consequences of AI and ChatGPT in educational settings. Education Sciences, 13(7), Article 692. https://doi.org/10.3390/educsci13070692

Grinschgl, S., & Neubauer, A. C. (2022). Supporting Cognition With Modern Technology: Distributed Cognition Today and in an AI-Enhanced Future. Frontiers in Artificial Intelligence, 5, 908261. https://doi.org/10.3389/frai.2022.908261

*Guo, Y., & Lee, D. (2023). Leveraging ChatGPT for Enhancing Critical Thinking Skills. Journal of Chemical Education, 100(12), 4876–4883. https://doi.org/10.1021/acs.jchemed.3c00505

Hanegraaf, P., Mosselman, J.-J., van Zuuren, F., van Valkenhoef, G., Delaney, B. C., & Dagnelie, P. C. (2024). Inter-reviewer reliability of human literature reviewing and implications for the introduction of machine-assisted systematic reviews: A mixed-methods review. BMJ Open, 14(3), e076912.
https://doi.org/10.1136/bmjopen-2023-076912

Holmes, W., & Porayska-Pomsta, K. (Eds.). (2022). The ethics of artificial intelligence in education: Practices, challenges, and debates. Routledge.

Hon, K. L. (2025). Generative AI in higher education: A systematic review of its effects on learning outcomes and academic performance. Journal of Educational Technology Systems, 0(0). Advance online publication. https://doi.org/10.1177/00472395251400089

Hopewell, S., Chan, A.-W., Collins, G. S., Osterhoff, G., & Moher, D. (2025). CONSORT 2025 explanation and elaboration: Updated guideline for reporting randomised trials. BMJ, 389, Article e081124. https://doi.org/10.1136/bmj-2024-081124

*Hou, C., Zhu, G., & Sudarshan, V. (2025). The role of critical thinking on undergraduates’ reliance behaviours on generative AI in problem‐solving. British Journal of Educational Technology, 56(5), 1919–1941. https://doi.org/10.1111/bjet.13613

*Huang, Y.-M., Chen, P.-H., Lee, H.-Y., Sandnes, F. E., & Wu, T.-T. (2025). ChatGPT-enhanced mobile instant messaging in online learning: Effects on student outcomes and perceptions. Computers in Human Behavior, 168, 108659. https://doi.org/10.1016/j.chb.2025.108659

Hwang, G. J., Xie, H., Wah, B. W., & Gašević, D. (2023). Vision, challenges, roles and research issues of Artificial Intelligence in Education. Computers and Education: Artificial Intelligence, 1, Article 100001. https://doi.org/10.1016/j.caeai.2020.100001

Hwang, S. (2022). Examining the effects of artificial intelligence on elementary students’ mathematics achievement: A meta-analysis. Sustainability, 14(20), 13185. https://doi.org/10.3390/su142013185

Karapantelakis, A., Nikou, A., Kattepur, A., Martins, J., Mokrushin, L., Mohalik, S. K., Orlic, M., & Feljan, A. V. (2024). A Survey on the Integration of Generative AI for Critical Thinking in Mobile Networks. arXiv. https://arxiv.org/abs/2404.06946

*Khampusaen, D. (2025).
The Impact of ChatGPT on Academic Writing Skills and Knowledge: An Investigation of Its Use in Argumentative Essays. LEARN Journal: Language Education and Acquisition Research Network, 18(1), 963–988. https://doi.org/10.70730/PGCQ9242

Kim, J., Lee, H., & Cho, Y. H. (2024). Does using artificial intelligence assistance accelerate skill decay and hinder skill development without performers' awareness? Cognitive Research: Principles and Implications, 9(1), Article 46. https://doi.org/10.1186/s41235-024-00572-8

Koedinger, K. R., & Aleven, V. (2007). Exploring the assistance dilemma in experiments with cognitive tutors. Educational Psychology Review, 19(3), 239–264. https://doi.org/10.1007/s10648-007-9049-0

*Kulal, A. (2025). Cognitive Risks of AI: Literacy, Trust, and Critical Thinking. Journal of Computer Information Systems, 1–13. https://doi.org/10.1080/08874417.2025.2582050

*Lee, H.-P. (Hank), Sarkar, A., Tankelevitch, L., Drosos, I., Rintel, S., Banks, R., & Wilson, N. (2025). The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects from a Survey of Knowledge Workers. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, 1–22. https://doi.org/10.1145/3706598.3713778

*Lee, Y.-F., Hwang, G.-J., & Cheng, L.-C. (2025). Impacts of a ChatGPT-supported concept mapping approach on students’ database programming achievement and their problem-solving and critical thinking awareness. Interactive Learning Environments, 1–20. https://doi.org/10.1080/10494820.2025.2523395

Li, F., Yan, X., Su, H., Shen, R., & Mao, G. (2025). An assessment of human–AI interaction capability in the generative AI era: The influence of critical thinking. Journal of Intelligence, 13(6), Article 62. https://doi.org/10.3390/jintelligence13060062

*Li, K. C., Chong, G. H. L., Wong, B. T. M., & Wu, M. M. F. (2025).
A TAM-Based Analysis of Hong Kong Undergraduate Students’ Attitudes Toward Generative AI in Higher Education and Employment. Education Sciences, 15(7), 798. https://doi.org/10.3390/educsci15070798

Lintangesukmanjaya, R., Putra, D., & Rahmawati, N. (2025). Measuring learners’ critical thinking skills using argument-based assessment in higher education. Journal of Educational Assessment and Evaluation, 9(2), 45–63. https://doi.org/10.5555/jeae.2025.97

*Liu, H., Zhou, F., & Li, J. (2025). Empower or Disempower: The Impact of Generative Artificial Intelligence on College Students’ Creativity. 2025 7th International Conference on Computer Science and Technologies in Education (CSTE), 661–665. https://doi.org/10.1109/CSTE64638.2025.11091881

Lo, C. K. (2023). What is the impact of ChatGPT on education? A rapid review of the literature. Education Sciences, 13(4), 410. https://doi.org/10.3390/educsci13040410

Lodge, J. M., Howard, S., Bearman, M., & Dawson, P. (2023). Assessment reform for the age of artificial intelligence. Tertiary Education Quality and Standards Agency (TEQSA). https://www.teqsa.gov.au/guides-resources/resources/corporate-publications/assessment-reform-age-artificial-intelligence

Lunny, C., Higgins, J. P. T., Welton, N. J., Caldwell, D. M., Dias, S., Eldridge, S., Ferroni, E., Furukawa, T. A., Gallo, V., Ioannidis, J. P. A., Jansen, J. P., Johnson, K. R., Jørgensen, L., Page, M. J., Rutter, H., Salanti, G., Schünemann, H. J., Sutton, A. J., Thorlund, K., ... Egger, M. (2025). Risk of Bias in Network Meta-Analysis (RoB NMA) tool. BMJ, 388, e079839. https://doi.org/10.1136/bmj-2024-079839

Marshall, I. J., Nye, B., Kuiper, J., Noel-Storr, A., Marshall, R., Zelko, D., Thomas, J., & Wallace, B. C. (2021). Data extraction methods for systematic review (semi)automation: Update of a living systematic review. F1000Research, 10, 401. https://doi.org/10.12688/f1000research.51117.2

*Martín-Gómez, S., & González Ruiz, C. J. (2025).
AI in Higher Education: Initial Teacher Training in the Critical and Didactic Use of Artificial Intelligence. IEEE Revista Iberoamericana de Tecnologias Del Aprendizaje, 20, 302–309. https://doi.org/10.1109/RITA.2025.3616509

McHugh, M. L. (2012). Interrater reliability: The kappa statistic. Biochemia Medica, 22(3), 276–282. https://doi.org/10.11613/bm.2012.031

*Miah, A. S. M., Tusher, M. M. R., Hossain, Md. M., Hossain, M. M., Rahim, M. A., Hamid, M. E., Islam, Md. S., & Shin, J. (2025). ChatGPT in Research and Education: A SWOT Analysis of Its Academic Impact. Computer Modeling in Engineering & Sciences, 143(3), 2573–2614. https://doi.org/10.32604/cmes.2025.064168

Michel-Villarreal, R., Vilalta-Perdomo, E., Salinas-Navarro, D. E., Thierry-Aguilera, R., & Gerardou, F. S. (2023). Challenges and opportunities of generative AI for higher education as explained by ChatGPT. Education Sciences, 13(9), Article 856. https://doi.org/10.3390/educsci13090856

Miranda, J. P. P., Cruz, M. A. D., Fernandez, A. B., Balahadia, F. F., Aviles, J. S., Caro, C. A., Liwanag, I. G., & Gaña, E. P. (2025). Erosion of critical academic skills due to AI dependency among tertiary students: A path analysis. In M. B. Garcia, J. Rosak-Szyrocka, & A. Bozkurt (Eds.), Pitfalls of AI integration in education: Skill obsolescence, misuse, and bias (pp. 25–48). IGI Global. https://doi.org/10.4018/979-8-3373-0122-8.ch002

Mishra, P., & Koehler, M. J. (2006). Technological pedagogical content knowledge: A framework for teacher knowledge. Teachers College Record, 108(6), 1017–1054. https://doi.org/10.1111/j.1467-9620.2006.00684.x

Mollick, E. R., & Mollick, L. (2023). Assigning AI: Seven approaches for students, with prompts. SSRN. https://dx.doi.org/10.2139/ssrn.4475995

Moore, T. (2013). Critical thinking: Seven definitions in search of a concept. Studies in Higher Education, 38(4), 506–522. https://doi.org/10.1080/03075079.2011.586995

*Mun, C. (2024).
EFL Learners’ English Writing Feedback and Their Perception of Using ChatGPT. STEM Journal, 25(2), 26–39. https://doi.org/10.16875/stem.2024.25.2.26

*Nasr, N. R., Tu, C.-H., Werner, J., Bauer, T., Yen, C.-J., & Sujo-Montes, L. (2025). Exploring the Impact of Generative AI ChatGPT on Critical Thinking in Higher Education: Passive AI-Directed Use or Human–AI Supported Collaboration? Education Sciences, 15(9), 1198. https://doi.org/10.3390/educsci15091198

Ng, D. T. K., Leung, J. K. L., Chu, S. K. W., & Qiao, M. S. (2021). Conceptualizing AI literacy: An exploratory review. Computers and Education: Artificial Intelligence, 2, Article 100041. https://doi.org/10.1016/j.caeai.2021.100041

Ning, Y., Zhang, C., Xu, B., Zhou, Y., & Wijaya, T. T. (2024). Teachers’ AI-TPACK: Exploring the Relationship between Knowledge Elements. Sustainability, 16(3), 978. https://doi.org/10.3390/su16030978

Nordström, T., Kalmendal, A., & Batinović, L. (2023). Risk of bias and open science practices in systematic reviews of educational effectiveness: A meta-review. Review of Education, 11(3), e3443. https://doi.org/10.1002/rev3.3443

Nowell, L. S., Norris, J. M., White, D. E., & Moules, N. J. (2017). Thematic analysis: Striving to meet the trustworthiness criteria. International Journal of Qualitative Methods, 16, Article 1609406917733847. https://doi.org/10.1177/1609406917733847

*Nusivera, E., Hikmat, A., & Ghani, A. R. A. (2025). Integration of Chat-GPT Usage in Language Learning Model to Improve Argumentation Skills, Complex Comprehension Skills, and Critical Thinking Skills. International Journal of Learning, Teaching and Educational Research, 24(2), 375–390. https://doi.org/10.26803/ijlter.24.2.19

*Oliva-Córdova, L. M., Álvarez-Icaza, I., & George-Reyes, C. E. (2025). Evaluation of Generative AI Use to Foster Critical Thinking in Higher Education. IEEE Revista Iberoamericana de Tecnologias Del Aprendizaje, 20, 237–243.
https://doi.org/10.1109/RITA.2025.3597848 Ouzzani, M., Hammady, H., Fedorowicz, Z., & Elmagarmid, A. (2016). Rayyan—a web and mobile app for systematic reviews. Systematic Reviews , 5 (1), 210. https://doi.org/10.1186/s13643-016-0384-4 Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hróbjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., . . . Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ , 372 , Article n71. https://doi.org/10.1136/bmj.n71 Pardos, Z. A., & Bhandari, S. (2024). ChatGPT-generated help produces learning gains equivalent to human tutor-authored help on mathematics skills. PLOS ONE , 19 (5), Article e0304013. https://doi.org/10.1371/journal.pone.0304013 Pellas, N. (2025). The role of students’ higher-order thinking skills in the relationship between academic achievements and machine learning using generative AI chatbots. Research and Practice in Technology Enhanced Learning , 20 , 036. https://doi.org/10.58459/rptel.2025.20036 Perkins, M., Furze, L., Roe, J., & MacVaugh, J. (2024). The Artificial Intelligence Assessment Scale (AIAS): A framework for ethical integration of generative AI in educational assessment. Journal of University Teaching & Learning Practice , 21 (6), Article 01. https://doi.org/10.53761/q3azde36 Premkumar, P. P., Yatigammana, M. R. K. N., & Kannangara, S. (2024). Impact of generative AI on critical thinking skills in undergraduates: A systematic review. Journal of Desk Research Review and Analysis , 2 (2), 215–232. https://doi.org/10.4038/jdrra.v2i2.52 Qu, X., Sherwood, J., Liu, P., & Aleisa, N. (2025). Generative AI tools in higher education: A meta-analysis of cognitive impact. In Extended abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25) (pp. 1–9). 
Association for Computing Machinery. https://doi.org/10.1145/3706599.3719841 *Rastogi, A., & Ashraf Ali Hassan Al Lawati, A. G. (2024). Understanding the acceptance of ChatGPT by HEI’s students for knowledge enhancement. 2024 1st International Conference on Innovative Engineering Sciences and Technological Research (ICIESTR) , 1–8. https://doi.org/10.1109/ICIESTR60916.2024.10798141 Risko, E. F., & Gilbert, S. J. (2016). Cognitive offloading. Trends in Cognitive Sciences , 20 (9), 676–688. https://doi.org/10.1016/j.tics.2016.07.002 *Ruiz-Rojas, L. I., Salvador-Ullauri, L., & Acosta-Vargas, P. (2024). Collaborative Working and Critical Thinking: Adoption of Generative Artificial Intelligence Tools in Higher Education. Sustainability , 16 (13), 5367. https://doi.org/10.3390/su16135367 Seufert, S., & Rohwer, K. (2024). Developing a framework for analysing and assessing critical thinking skills for the responsible use of generative AI in higher education. In ICERI2024 Proceedings (pp. 8579–8585). IATED Academy. https://doi.org/10.21125/iceri.2024.2139 *Sanchez-Lopez, A. L., Jimenez-Perez, M. I., Perfecto-Avalos, Y., Navarro-Lopez, D. E., Esparza-Sanchez, J., & Mena, E. R. L. (2025). Integration of Artificial Intelligence as a Tool to Enhance Critical Thinking Skills and Foster Learning in Bioengineering Education. 2025 IEEE Global Engineering Education Conference (EDUCON) , 1–5. https://doi.org/10.1109/EDUCON62633.2025.11016586 Sardi, J., Yuliana, D. F., Yanto, D. T. P., Eliza, F., Candra, O., Habibullah, H., & Darmansyah, D. (2025). How Generative AI Influences Students’ Self-Regulated Learning and Critical Thinking Skills? A Systematic Review. International Journal of Engineering Pedagogy (iJEP), 15(1), 94–108. https://doi.org/10.3991/ijep.v15i1.53379 Shafer, D. (2025). A critical thinking thematic framework and observation tool for improved theory and developing secondary teachers’ instructional practice: Proof of concept. 
Thinking Skills and Creativity , 56 , 101787. https://doi.org/10.1016/j.tsc.2025.101787 *Shi, H., Chai, C. S., Zhou, S., & Aubrey, S. (2025). Comparing the effects of ChatGPT and automated writing evaluation on students’ writing and ideal L2 writing self. Computer Assisted Language Learning , 1–28. https://doi.org/10.1080/09588221.2025.2454541 Sobkowiak, P. (2016). Critical thinking in the intercultural context: Investigating EFL textbooks. Studies in Second Language Learning and Teaching , 6 (4), 697–718. https://doi.org/10.14746/ssllt.2016.6.4.7 Solyst, J., Pan, M. Y., Andam, A., Poblete, I. P., Eslami, M., Hammer, J., Ogan, A., & Stewart, A. E. (2025). Critical AI literacy through exploring generative AI limitations. In A. Rajala, A. Cortez, R. Hofmann, A. Jornet, H. Lotz-Sisitka, & L. Markauskaite (Eds.), Proceedings of the 19th International Conference of the Learning Sciences - ICLS 2025 (pp. 2061–2065). International Society of the Learning Sciences. https://repository.isls.org/handle/1/11423 *Song, D., Zhang, P., Zhu, Y., Qi, S., Yang, Y., Gong, L., & Zhou, L. (2025). Effects of generative artificial intelligence on higher-order thinking skills and artificial intelligence literacy in nursing undergraduates: A quasi-experimental study. Nurse Education in Practice , 88 , 104549. https://doi.org/10.1016/j.nepr.2025.104549 Southworth, J., Migliaccio, K., Glover, J., Glover, J., Reed, D., McCarty, C., Brendemuhl, J., & Thomas, A. (2022). Developing a model for AI Across the curriculum: Transforming the higher education landscape via innovation in AI literacy. Computers and Education: Artificial Intelligence , 4 , 100127. https://doi.org/10.1016/j.caeai.2023.100127 *Styve, A., Virkki, O. T., & Naeem, U. (2024). Developing Critical Thinking Practices Interwoven with Generative AI Usage in an Introductory Programming Course. 2024 IEEE Global Engineering Education Conference (EDUCON) , 01–08. 
https://doi.org/10.1109/EDUCON60312.2024.10578746 Sweller, J., van Merriënboer, J. J. G., & Paas, F. (2019). Cognitive architecture and instructional design: 20 years later. Educational Psychology Review , 31 (2), 261–292. https://doi.org/10.1007/s10648-019-09465-5 Swiecki, Z., Khosravi, H., Chen, G., Martinez-Maldonado, R., Lodge, J. M., Milligan, S., Selwyn, N., & Gašević, D. (2022). Assessment in the age of artificial intelligence. Computers and Education: Artificial Intelligence , 3 , Article 100075. https://doi.org/10.1016/j.caeai.2022.100075 Tian, J., & Zhang, R. (2025). Learners' AI dependence and critical thinking: The psychological mechanism of fatigue and the social buffering role of AI literacy. Acta Psychologica , 260 , 105725. https://doi.org/10.1016/j.actpsy.2025.105725 Tiruneh, D. T., Verburgh, A., & Elen, J. (2014). Effectiveness of critical thinking instruction in higher education: A systematic review of intervention studies. Higher Education Studies , 4 (1), 1–17. https://doi.org/10.5539/hes.v4n1p1 Tlili, A., Shehata, B., Adarkwah, M. A., Bozkurt, A., Hickey, D. T., Huang, R., & Agyemang, B. (2023). What if the devil is my guardian angel: ChatGPT as a case study of using chatbots in education. Smart Learning Environments , 10 (1), Article 15. https://doi.org/10.1186/s40561-023-00237-x *Toscano, R., Guerra, M. A., Durán-Ballén, S., & Valarezo, B. M. (2024). WIP - Development of critical thinking in AEC students aided by artificial intelligence. In 2024 IEEE Frontiers in Education Conference (FIE) (pp. 1–6). IEEE. https://doi.org/10.1109/FIE61694.2024.10893092 *Tressyalina, T., Ghaluh, B. M., Wulandari, E., Arief, E., & Noveria, E. (2025). Enhancing students' critical thinking in criminal case solving: An AI-based pragmatic application for analyzing authentic Indonesian texts and videos. Interactive Learning Environments , 1–33. Advance online publication. https://doi.org/10.1080/10494820.2025.2504062 *Trikoili, A., Georgiou, D., Pappa, C. 
I., & Pittich, D. (2025). Critical Thinking Assessment in Higher Education: A Mixed-Methods Comparative Analysis of AI and Human Evaluator. International Journal of Human–Computer Interaction , 1–14. https://doi.org/10.1080/10447318.2025.2499164 van de Pol, J., Volman, M., & Beishuizen, J. (2010). Scaffolding in teacher–student interaction: A decade of research. Educational Psychology Review , 22 (3), 271–296. https://doi.org/10.1007/s10648-010-9127-6 Vuogan, A. & Li, S. (2024). A systematic review of meta-analyses in second language research: current practices, issues, and recommendations. Applied Linguistics Review , 15 (4), 1621-1644. https://doi.org/10.1515/applirev-2022-0192 Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes . Harvard University Press. *Wahba, F., Ajlouni, A. O., & Abumosa, M. A. (2024). The impact of ChatGPT-based learning statistics on undergraduates’ statistical reasoning and attitudes toward statistics. Eurasia Journal of Mathematics, Science and Technology Education , 20 (7), em2468. https://doi.org/10.29333/ejmste/14726 *Waziana, W., Andewi, W., Wibisono, D., Hastomo, T. and Muslihudin, M. (2025). Exploring ChatGPT’s Impact on Critical, Creative, and Reflective Thinking Skills: A Mixed-Methods Study in an Indonesian EFL Classroom. Applied Research on English Language , 14 (4), 77-114. http://doi.org/10.22108/are.2025.145896.2564 Wiredu, J. K., Zakaria, H., & Abuba, N. S. (2024). Impact of Generative AI in Academic Integrity and Learning Outcomes: A Case Study in the Upper East Region. Asian Journal of Research in Computer Science, 17(8), 70–88. https://doi.org/10.9734/ajrcos/2024/v17i7491 Winkler, R., & Sörensen, J. F. L. (2024). Artificial intelligence alone will not democratise education: On educational inequality, techno-solutionism and inclusive tools . Sustainability, 16 (2), 781. https://doi.org/10.3390/su16020781 Wood, D., Bruner, J. S., & Ross, G. (1976). 
The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry , 17 (2), 89–100. https://doi.org/10.1111/j.1469-7610.1976.tb00381.x *Wu, B., He, Y.-N., Song, Y., & Li, H.-H. (2025). Fostering critical thinking in higher education: An intelligent dialogue-based approach empowered by conversational AI. Interactive Learning Environments , 1–18. https://doi.org/10.1080/10494820.2025.2538750 Xin, Y., Hao, G., Zhu, H., Shen, J., Yang, Y., & Ghanbari-H., A. (2025). Poor reporting quality and high proportion of missing data in economic evaluations alongside pragmatic trials: A cross-sectional survey. BMC Medical Research Methodology , 25 , 61. https://doi.org/10.1186/s12874-025-02519- Xu, F., Gage, N., Zeng, S., Zhang, M., Iun, A., O’Riordan, M., & Kim, E. (2024). The Use of Digital Interventions for Children and Adolescents with Autism Spectrum Disorder—A Meta-Analysis. Journal of Autism and Developmental Disorders. https://doi.org/10.1007/s10803-024-06563-4 Zhang, L., & Al Shammari, H. (2025). Systematic literature review on critical thinking in higher education: Trends, measures and interventions. Learning Gate Journal of Educational Research, 12(1), 1–22. https://learning-gate.com/index.php/2576-8484/article/view/7377 *Zhang, Q., Siraj, S. B., & Abdul Razak, R. B. (2025). Effects of AI chatbots on EFL students’ critical thinking skills and intrinsic motivation in argumentative writing. Innovation in Language Learning and Teaching , 1–29. https://doi.org/10.1080/17501229.2025.2515111 *Zhang, Y., Lai, X., Yi, S., & Lu, Y. (2025). Does ChatGPT-based reading platform impact foreign language paper reading? Evidence from a quasi-experimental study on Chinese undergraduate students. Education and Information Technologies , 30 (7), 9737–9754. https://doi.org/10.1007/s10639-024-13190-0 *Zhou, X., Teng, D., & Al-Samarraie, H. (2024). The Mediating Role of Generative AI Self-Regulation on Students’ Critical Thinking and Problem-Solving. 
Education Sciences, 14(12), 1302. https://doi.org/10.3390/educsci14121302
*Zou, D., Zhang, H., Zhao, Y., & Xu, P. (2025). Unleashing the potential: How ChatGPT improves gisting skills in student interpreters. The Interpreter and Translator Trainer, 19(1), 1–25. https://doi.org/10.1080/1750399X.2025.2507540
*Zou, X., Su, P., Li, L., & Fu, P. (2024). AI-generated content tools and students’ critical thinking: Insights from a Chinese university. IFLA Journal, 50(2), 228–241. https://doi.org/10.1177/03400352231214963
Additional Declarations
No competing interests reported.
Explicit ethical framing, including discussions of AI bias, hallucinations, and academic integrity, has been shown to encourage deeper and more responsible learning (Bearman et al., \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e; Lodge et al., \u003cspan class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e\u003ch2\u003e2.4 Summary and Research Gaps\u003c/h2\u003e\u003cp\u003eWhile recent meta-analyses have broadly explored the impact of AI on education (Seufert \u0026amp; Rohwer, \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e; Solyst et al., \u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e; Qu et al., \u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e), this study provides a unique quantification of GenAI’s bidirectional impact on CT by integrating effect size analysis with thematic synthesis to identify critical moderators. This study addresses the following key gaps: (a) lack of systematic quantification of CT effects across multiple experimental studies; (b) insufficient disciplinary granularity in existing reviews; and (c) limited empirical evidence about specific, actionable pedagogical models for GenAI integration, moving beyond the general recommendations found in broader reviews (Li et al., \u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e; Sardi et al., \u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e"},{"header":"3. Methods","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Research Design\u003c/h2\u003e \u003cp\u003eThis study was conducted as a mixed-method systematic review and meta-analysis following the PRISMA 2020 guidelines (Page et al., \u003cspan citationid=\"CR81\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). The review incorporated a quantitative meta-analysis of effect sizes and a qualitative thematic synthesis of how GenAI has been reported to impact CT. 
A PRISMA 2020 flow diagram documented how studies were identified and screened.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Eligibility Criteria\u003c/h2\u003e \u003cp\u003eStudies were included if they (a) involved higher education or tertiary-level participants; (b) examined the use of GenAI tools (defined as systems driven by large language models that generate novel textual or multimodal content, for example, ChatGPT, GPT-based writing assistants, and multi-agent tutoring environments) for teaching, learning, assessment, or academic support; (c) reported at least one CT-related outcome (standardized CT measure, researcher-developed test of CT skills/dispositions, subscale scores in broader higher-order thinking measures, or constructs such as evaluative judgement or argumentation quality); and (d) employed a design yielding empirical evidence of impact (between-group comparisons, within-group pre-post designs, or correlational/model-based analyses). Studies were excluded if they were conceptual or opinion-only, did not involve GenAI, involved non-higher education participants, or failed to report CT-relevant outcomes. These criteria are consistent with recent meta-analyses on GenAI in higher education (Giannakos, \u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Zhang \u0026amp; Al Shammari, \u003cspan citationid=\"CR119\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). 
The full criteria are detailed in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eInclusion and Exclusion Criteria\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"2\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eInclusion\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eExclusion\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eStudies from 2022\u0026ndash;2025 using GenAI in higher education\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo actual AI integration\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEmpirical, mixed-methods, quasi-experimental, case studies\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eNo clear CT metrics\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMust report CT-related outcomes\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eExclusively primary/secondary education\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eEnglish language\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003eNon-English language\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Search Strategy and Study Selection\u003c/h2\u003e \u003cp\u003eThe search retrieved 653 records from IEEE Xplore, Web of Science, and ERIC. These three databases were selected because they provide complementary coverage of the target literature: IEEE Xplore captures technology-focused educational research (conference proceedings and journals in computing and engineering education), Web of Science provides broad multidisciplinary coverage of high-impact journals, and ERIC is the premier database for educational research and policy. Together, they ensure representation across both the technology and education dimensions of GenAI research while maintaining quality thresholds (all are indexed, peer-reviewed sources). Records were exported in the RIS format and managed through Rayyan for blinded screening, conflict resolution, and documentation of exclusion reasons (Ouzzani et al., \u003cspan citationid=\"CR80\" class=\"CitationRef\"\u003e2016\u003c/span\u003e). The search string was: \u003cem\u003e(\u0026ldquo;generative AI\u0026rdquo; OR \u0026ldquo;ChatGPT\u0026rdquo; OR \u0026ldquo;large language model\u0026rdquo;) AND (\u0026ldquo;critical thinking\u0026rdquo; OR \u0026ldquo;self-regulation\u0026rdquo; OR \u0026ldquo;cognitive skills\u0026rdquo;) AND (\u0026ldquo;university\u0026rdquo; OR \u0026ldquo;higher education\u0026rdquo;)\u003c/em\u003e. Backward citation chasing through included articles supplemented the database searches.\u003c/p\u003e \u003cp\u003eThe study selection proceeded in two phases. In Phase 1, two reviewers independently screened all titles and abstracts in Rayyan. In Phase 2, both reviewers independently read all candidate full texts and recorded the reasons for exclusion. 
This process yielded 43 studies that met all inclusion criteria.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Inter-Rater Reliability\u003c/h2\u003e \u003cp\u003eBoth reviewers screened all 653 records, achieving Cohen\u0026rsquo;s kappa κ\u0026thinsp;=\u0026thinsp;1.00 (109 included, 544 excluded by both), surpassing the \u0026ldquo;almost perfect\u0026rdquo; agreement threshold (McHugh, \u003cspan citationid=\"CR65\" class=\"CitationRef\"\u003e2012\u003c/span\u003e). All unclear cases were resolved through discussion, and no records remained discordant after consensus, as recommended by the methodological guidance (Calderon et al., 2025; Hanegraaf et al., \u003cspan citationid=\"CR41\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e3.5 Data Extraction and Coding\u003c/h2\u003e \u003cp\u003eData were extracted into three interlinked spreadsheets (StudyInfo_CT, EffectSizes_CT, and Qualitative-CT_Themes) following established guidance for mixed-methods syntheses (Marshall et al., \u003cspan citationid=\"CR63\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). StudyInfo_CT captured bibliographic and contextual information for each study, along with moderators including AI use case, pedagogical model, teacher guidance level, ethical guidelines presence, and overall risk-of-bias judgement (Cochrane Bias Methods Group, 2024; Shafer, \u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). EffectSizes_CT contained quantitative data for computing standardized effect sizes by study design. 
Qualitative-CT_Themes compiled narrative summaries with structured theme coding on ordinal scales (0\u0026ndash;3), supporting the thematic synthesis.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e3.6 Study Quality Appraisal\u003c/h2\u003e \u003cp\u003eFour risk-of-bias items were coded for all 43 studies: randomization, baseline equivalence, validated CT measure, and attrition reporting (Cochrane Bias Methods Group, 2024; Shafer, \u003cspan citationid=\"CR93\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). The overall risk of bias was rated on a three-tier scale (low, some concerns, high). Risk-of-bias ratings informed the sensitivity analyses but did not serve as exclusion criteria (Calderon et al., 2025).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e3.7 Quantitative Synthesis\u003c/h2\u003e \u003cp\u003eThree families of CT outcomes were analyzed: (1) between-group comparisons (AI-supported vs. non-AI instruction), (2) within-group pre-post changes, and (3) correlational/model-based associations. All models used JASP (Version 0.95.4) with the Classical Meta-Analysis module. Hedges\u0026rsquo; \u003cem\u003eg\u003c/em\u003e was computed for between-group and within-group effects; correlational/SEM estimates were standardized to z-values. Random-effects models with REML estimation were employed throughout. Heterogeneity was assessed using Cochran\u0026rsquo;s Q and I\u0026sup2; statistics. Publication bias was evaluated through funnel plot inspection, trim-and-fill, and fail-safe N statistics. 
Moderator analyses focused on between-group effects using subgroup analyses and meta-regression with Knapp\u0026ndash;Hartung corrections.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e3.8 Qualitative Synthesis\u003c/h2\u003e \u003cp\u003eThe qualitative synthesis examined how GenAI was used in practice and whether it supported or hindered CT. Structured coding on ordinal scales (0\u0026ndash;3) captured overreliance on AI, plagiarism or integrity concerns, cognitive offloading or passivity, and deep CT engagement. Descriptive statistics and iterative thematic synthesis identified patterns of AI use, scaffolding, and risk profiles (Avsheniuk et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Wiredu et al., \u003cspan citationid=\"CR113\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003e3.9 Mixed-Method Integration\u003c/h2\u003e \u003cp\u003eQuantitative and qualitative results were integrated at two levels: study-level (matching individual effects to contextual and qualitative codes) and pattern-level (interpreting meta-regression results through qualitative themes). Joint displays and narrative weaving were used to explain heterogeneity and identify conditions under which GenAI acts as a CT scaffold versus a cognitive crutch (Miranda et al., \u003cspan citationid=\"CR68\" class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e"},{"header":"4. Results","content":"\u003cdiv id=\"Sec21\" class=\"Section2\"\u003e\n \u003ch2\u003e4.1 Study Characteristics\u003c/h2\u003e\n \u003cp\u003eA total of 43 studies were included. Approximately 43% employed between-group designs, 18% used within-group pre\u0026ndash;post designs, 27% used correlational/SEM designs, and 12% reported descriptive outcomes not amenable to quantitative synthesis. 
Studies spanned educational technology/digital learning (29%), education/psychology (27%), STEM/computer science/engineering (18%), language/EFL education (16%), health/nursing (2%), and other disciplines (8%). China was the most represented country of origin (29%), followed by Spain, Indonesia, Taiwan, and Malaysia (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\u0026nbsp;\u0026nbsp;\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eCharacteristics of Critical-Thinking\u0026ndash;Relevant Studies (n\u0026thinsp;=\u0026thinsp;43)\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eCharacteristic\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eCategory\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003en\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e%\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c5\"\u003e\n \u003cp\u003e95% CI\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eDiscipline\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eEdTech / Digital learning\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e14\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e28.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n 
\u003cp\u003e15.9\u0026ndash;41.2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eEducation/Psychology\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e26.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e14.2\u0026ndash;38.9\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eSTEM/CS/Engineering\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e18.4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e7.5\u0026ndash;29.2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eLanguage / EFL\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e16.3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e6.0\u0026ndash;26.7\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eHealth/Nursing\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" 
char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e2.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e0.0\u0026ndash;6.0\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eOther disciplines\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e8.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e0.5\u0026ndash;15.8\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eCountry\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eChina\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e14\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e28.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e15.9\u0026ndash;41.2\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eSpain\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e8.2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e0.5\u0026ndash;15.8\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eIndonesia\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e6.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e0.0\u0026ndash;12.8\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eChina/Taiwan\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e6.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e0.0\u0026ndash;12.8\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eMalaysia\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e6.1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e0.0\u0026ndash;12.8\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eOther (1\u0026ndash;2 each) *\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n 
\u003cp\u003e22\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e44.9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e31.0\u0026ndash;58.8\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003ctfoot\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"5\"\u003e*e.g., South Korea, United States, Ghana, Iran, Norway, Singapore\u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tfoot\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec22\" class=\"Section3\"\u003e\n \u003ch2\u003e4.1.1 Study Quality and Risk of Bias\u003c/h2\u003e\n \u003cp\u003eOf the 43 studies, 33 showed a low overall risk of bias and eight were assessed as moderate (some concerns) (Fig. \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Low-risk studies generally employed randomised or strong quasi-experimental designs with baseline equivalence checks, used validated CT instruments (e.g., CCTST, Watson-Glaser, CTDI), and reported sufficient outcome data. Moderate-risk studies tended to use non-random allocation without confirmed baseline equivalence or relied on self-report measures with limited validation.\u003c/p\u003e\n \u003cp\u003eA sensitivity analysis re-estimated the between-group CT model after removing the moderate-risk studies; the pooled effect remained significant and positive and fell within the confidence interval of the original model.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec23\" class=\"Section2\"\u003e\n \u003ch2\u003e4.2 Quantitative Meta-Analytic Findings\u003c/h2\u003e\n \u003cdiv id=\"Sec24\" class=\"Section3\"\u003e\n \u003ch2\u003e4.2.1 Between-Group Comparisons\u003c/h2\u003e\n \u003cp\u003eTwenty-one between-group effects were analysed. 
The pooled effect was large and positive (Hedges\u0026rsquo; \u003cem\u003eg\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.88, 95% CI [0.65, 1.11], \u003cem\u003ep\u003c/em\u003e \u0026lt; .001), indicating that students in AI-supported conditions scored nearly one standard deviation higher on CT outcomes than comparison groups (Table \u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e). Heterogeneity was substantial (Q(20)\u0026thinsp;=\u0026thinsp;81.18, \u003cem\u003ep\u003c/em\u003e \u0026lt; .001; I\u0026sup2; = 76%; \u0026tau;\u0026sup2; = 0.16). The 95% prediction interval (0.02 to 1.74) suggests that true effects in comparable implementations may range from near zero to very large. Funnel plot inspection, trim-and-fill, and fail-safe N diagnostics indicated no substantial publication bias (Figs.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e3\u003c/span\u003ea\u0026ndash;3c).\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\u0026nbsp;\u0026nbsp;\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eRandom-Effects Meta-Analytic Models for Critical Thinking Outcomes\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eModel\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eEffect type\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003ek\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003ePooled effect\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c5\"\u003e\n \u003cp\u003e95% CI\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c6\"\u003e\n \u003cp\u003ep\u003c/p\u003e\n \u003c/th\u003e\n 
\u003cth align=\"left\" colname=\"c7\"\u003e\n \u003cp\u003eI\u0026sup2; (%)\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eBetween-group\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003ePost-test Hedges\u0026rsquo; \u003cem\u003eg\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e21\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.88\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e[0.65, 1.11]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c6\"\u003e\n \u003cp\u003e\u0026lt;\u0026thinsp;.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c7\"\u003e\n \u003cp\u003e76\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eWithin-group\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003ePre-post Hedges\u0026rsquo; \u003cem\u003eg\u003c/em\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e9\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.97\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e[0.84, 1.11]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c6\"\u003e\n \u003cp\u003e\u0026lt;\u0026thinsp;.001\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c7\"\u003e\n \u003cp\u003e57\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eCorrelational/SEM\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eGeneric effect (ES\u0026thinsp;+\u0026thinsp;SE)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e13\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e0.24\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e[\u0026minus;\u0026thinsp;0.03, 0.51]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c6\"\u003e\n \u003cp\u003e.076\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c7\"\u003e\n \u003cp\u003e\u0026asymp;\u0026thinsp;99\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec25\" class=\"Section3\"\u003e\n \u003ch2\u003e4.2.2 Within-Group Pre\u0026ndash;Post Changes\u003c/h2\u003e\n \u003cp\u003eNine pre\u0026ndash;post effects yielded a large pooled effect (\u003cem\u003eg\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.97, 95% CI [0.84, 1.11], \u003cem\u003ep\u003c/em\u003e \u0026lt; .001) with moderate heterogeneity (Q(8)\u0026thinsp;=\u0026thinsp;18.46, \u003cem\u003ep\u003c/em\u003e = .018; I\u0026sup2; = 57%) (Figs. \u003cspan refid=\"Fig5\" class=\"InternalRef\"\u003e4\u003c/span\u003ea\u0026ndash;4b). Trim-and-fill suggested two potentially missing studies, but the adjusted effect remained large (\u003cem\u003eg\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.92, 95% CI [0.81, 1.04]). 
These gains cannot be attributed solely to GenAI: without a comparison group, they partly reflect ordinary course learning.\u003c/p\u003e\n \u003c/div\u003e\n \u003cdiv id=\"Sec26\" class=\"Section3\"\u003e\n \u003ch2\u003e4.2.3 Correlational and Model-Based Evidence\u003c/h2\u003e\n \u003cp\u003eThirteen correlational/SEM effects produced a small, positive, but non-significant pooled effect (0.24, 95% CI [\u0026minus;\u0026thinsp;0.03, 0.51], \u003cem\u003ep\u003c/em\u003e = .076) with very high heterogeneity (I\u0026sup2; \u0026asymp; 99%) (Figs. \u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003ea\u0026ndash;\u003cspan refid=\"Fig6\" class=\"InternalRef\"\u003e5\u003c/span\u003eb). Individual studies ranged from no association to a strong positive association. Frequency of AI use alone does not appear to predict CT outcomes reliably; the relationship appears contingent on how AI is scaffolded within courses.\u003c/p\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec27\" class=\"Section2\"\u003e\n \u003ch2\u003e4.3 Moderator and Meta-Regression Analyses\u003c/h2\u003e\n \u003cp\u003eSubgroup analyses by discipline and AI use case were limited by small cell sizes, though positive pooled effects occurred within each discipline. Meta-regression revealed more informative patterns (Table\u0026nbsp;4).\u003c/p\u003e\n \u003cp\u003eWhen teacher guidance level was entered as a predictor, the overall moderator test was significant (F(3, 17)\u0026thinsp;=\u0026thinsp;3.55, \u003cem\u003ep\u003c/em\u003e = .037). High guidance predicted a significant positive CT effect (intercept ĝ \u0026asymp; 0.92), whereas low guidance was associated with a substantially lower, negative predicted effect (coefficient\u0026thinsp;=\u0026thinsp;\u0026minus;\u0026thinsp;2.12, 95% CI [\u0026minus;\u0026thinsp;3.59, \u0026minus;\u0026thinsp;0.65], \u003cem\u003ep\u003c/em\u003e = .007). 
Moderate and moderate-to-high guidance did not differ significantly from the high-guidance baseline.\u003c/p\u003e\n \u003cp\u003eThe presence of explicit ethical/academic integrity guidelines also significantly predicted CT outcomes (F(1, 19)\u0026thinsp;=\u0026thinsp;10.87, \u003cem\u003ep\u003c/em\u003e = .004). Conditions without guidelines yielded a negative but imprecise predicted effect (\u0026minus;\u0026thinsp;1.20, 95% CI [\u0026minus;\u0026thinsp;2.53, 0.14], \u003cem\u003ep\u003c/em\u003e = .076), while the ethical-guidelines coefficient was +\u0026thinsp;2.12 (95% CI [0.78, 3.47], \u003cem\u003ep\u003c/em\u003e = .004), implying a predicted effect of approximately 0.92 when guidelines were present (\u0026minus;\u0026thinsp;1.20\u0026thinsp;+\u0026thinsp;2.12).\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eTable 4\u0026nbsp;\u003c/strong\u003eMeta-Regression of CT Effects on Teacher Guidance and Ethical Guidelines\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003ePanel A.\u003c/strong\u003e Omnibus tests\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\u0026nbsp;\u003ctable float=\"No\" id=\"Taba\" border=\"1\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003ePredictor\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eF\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003edf₁\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003edf₂\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c5\"\u003e\n \u003cp\u003ep\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eTeacher guidance level\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e3.554\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" 
colname=\"c4\"\u003e\n \u003cp\u003e17.00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c5\"\u003e\n \u003cp\u003e.037\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eEthical guidelines present\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e10.87\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e19.00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c5\"\u003e\n \u003cp\u003e.004\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003ctfoot\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"5\"\u003e\u003cstrong\u003ePanel B.\u003c/strong\u003e Coefficients: Teacher guidance\u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tfoot\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cdiv class=\"gridtable\"\u003e\n \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003cbr\u003e\u003c/div\u003e\u0026nbsp;\u003cbr\u003e\n \u003ctable float=\"No\" id=\"Tabb\" border=\"1\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eTerm\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003eEstimate\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eSE\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e95% CI\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c5\"\u003e\n \u003cp\u003et\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c6\"\u003e\n \u003cp\u003edf\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c7\"\u003e\n \u003cp\u003ep\u003c/p\u003e\n \u003c/th\u003e\n 
\u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eIntercept (High)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0.923\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.245\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e[0.407, 1.439]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e3.773\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\n \u003cp\u003e17.00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c7\"\u003e\n \u003cp\u003e.002\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eLow\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e\u0026minus;2.121\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.695\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e[\u0026minus;\u0026thinsp;3.588, \u0026minus;\u0026thinsp;0.654]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e\u0026minus;3.051\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\n \u003cp\u003e17.00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c7\"\u003e\n \u003cp\u003e.007\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eModerate\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e\u0026minus;0.009\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.266\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e[\u0026minus;\u0026thinsp;0.571, 0.553]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e\u0026minus;0.033\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\n \u003cp\u003e17.00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c7\"\u003e\n \u003cp\u003e.974\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eModerate to High\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e0.197\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.486\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\n \u003cp\u003e[\u0026minus;\u0026thinsp;0.827, 1.222]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e0.406\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\n \u003cp\u003e17.00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c7\"\u003e\n \u003cp\u003e.690\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003ctfoot\u003e\n \u003ctr\u003e\n \u003ctd colspan=\"7\"\u003e\u003cstrong\u003ePanel C.\u003c/strong\u003e Coefficients: Ethical guidelines\u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tfoot\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cdiv class=\"gridtable\"\u003e\u0026nbsp;\u003ctable float=\"No\" id=\"Tabc\" border=\"1\"\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eTerm\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n 
\u003cp\u003eEstimate\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003eSE\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e95% CI\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c5\"\u003e\n \u003cp\u003et\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c6\"\u003e\n \u003cp\u003edf\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c7\"\u003e\n \u003cp\u003ep\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eIntercept (No guidelines)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e\u0026minus;1.198\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.638\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e[\u0026minus;\u0026thinsp;2.533, 0.136]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n \u003cp\u003e\u0026minus;1.879\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\n \u003cp\u003e19.00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c7\"\u003e\n \u003cp\u003e.076\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eGuidelines present (Yes)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e2.123\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e0.644\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e[0.775, 3.472]\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\n 
\u003cp\u003e3.297\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\n \u003cp\u003e19.00\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c7\"\u003e\n \u003cp\u003e.004\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec28\" class=\"Section2\"\u003e\n \u003ch2\u003e4.4 Qualitative Synthesis of Mechanisms and Risks\u003c/h2\u003e\n \u003cp\u003eThe qualitative synthesis of all 43 studies revealed that every study reported at least one CT-supportive outcome and at least one CT-related risk, underscoring the structurally dual nature of GenAI (Table \u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e5\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003eDeep CT engagement emerged when AI was presented as an object for critique, comparison, or structured discussion. Studies reported increases in analytical reasoning, argumentation, reflective judgement, and higher-order questioning when students challenged AI outputs or revised work based on AI feedback (Nusivera et al., 2025; Chen et al., 2025; Toscano et al., 2024). Simultaneously, nearly all studies recorded risks of overdependence and cognitive offloading: students trusting AI outputs without checking, relying on AI for phrasing and proofreading, and displaying reduced reflection effort (Essel et al., 2024; Gerlich, 2025; Huang et al., 2025). Overreliance was more pronounced among students with low baseline CT or AI literacy (Kulal, 2025; Hou et al., 2025). 
Academic integrity concerns were also common, with students paraphrasing AI content without adequate transformation or citation (Zou et al., 2024; Gao et al., 2025).\u003c/p\u003e\n \u003cp\u003eThe qualitative findings corroborated the meta-regression results: studies with the highest CT gains combined GenAI with explicit scaffolding, multi-phase inquiry, guided AI\u0026ndash;human comparison, reflexive debriefing, and explicit teaching about AI limitations (Martin-Gomez \u0026amp; Gonzalez Ruiz, 2025). Conversely, studies deploying AI as a generic tool with minimal guidance more often described surface-level engagement and dependence (Al-Kumaim et al., 2025; Song et al., 2025).\u003c/p\u003e\n \u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eFrequencies of CT-Support and CT-Risk Codes (n\u0026thinsp;=\u0026thinsp;43)\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eCode/Theme\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c2\"\u003e\n \u003cp\u003en\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c3\"\u003e\n \u003cp\u003e%\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003eExample\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eDeep CT\u0026thinsp;\u0026ge;\u0026thinsp;2 (moderate to high)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e43\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e87.8\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n 
\u003cp\u003e\u0026ldquo;ChatGPT helped us organise and challenge arguments effectively.\u0026rdquo;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eDeep CT\u0026thinsp;=\u0026thinsp;3 (high)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e24\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e49.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e\u0026ldquo;AI debates trained me to defend ideas more logically.\u0026rdquo;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eOverreliance\u0026thinsp;\u0026ge;\u0026thinsp;2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e42\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e85.7\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e\u0026ldquo;Sometimes I just accepted what ChatGPT said without verifying.\u0026rdquo;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eOffloading\u0026thinsp;\u0026ge;\u0026thinsp;2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e39\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e79.6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e\u0026ldquo;AI convenience often replaces cognitive effort.\u0026rdquo;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003ePlagiarism\u0026thinsp;\u0026ge;\u0026thinsp;1\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e36\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e73.5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e\u0026ldquo;I found myself copying AI suggestions too easily.\u0026rdquo;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eCT positive flag\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e43\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e100.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e\u0026ldquo;ChatDOC helped me understand difficult papers and think critically.\u0026rdquo;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\" colname=\"c1\"\u003e\n \u003cp\u003eCT negative flag\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\n \u003cp\u003e43\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\n \u003cp\u003e100.0\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\" colname=\"c4\"\u003e\n \u003cp\u003e\u0026ldquo;AI analysis sometimes made me skip my own reasoning steps.\u0026rdquo;\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003cp\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec29\" class=\"Section2\"\u003e\n \u003ch2\u003e4.5 Summary of Integrated Findings\u003c/h2\u003e\n \u003cp\u003eThe mixed-method synthesis supports a consistent but nuanced conclusion. GenAI-augmented teaching is associated with significant average CT gains in controlled and within-group studies. 
However, these gains are highly uneven and do not reflect a uniform positive relationship between AI use frequency and CT. Both moderator analyses and qualitative data converge in demonstrating that pedagogical design, guidance, and ethical framing determine whether GenAI serves as a scaffold for deeper thinking or as an engine of overreliance and cognitive offloading. GenAI amplifies the underlying instructional design: embedded within structured, critically oriented tasks with explicit guidance, it enhances CT; offered as an unstructured convenience tool, it promotes shortcut strategies that may compromise thoughtful consideration.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec30\" class=\"Section2\"\u003e\n \u003ch2\u003e4.6 Answering the Research Questions\u003c/h2\u003e\n \u003cp\u003e\u003cstrong\u003eRQ1. Does the use of GenAI in university-level education enhance or undermine students\u0026rsquo; CT?\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003eThe evidence indicates that GenAI has the potential to significantly improve students\u0026rsquo; CT while introducing risks that can be detrimental when deployment is poorly designed. The between-group meta-analysis showed a large positive effect (Hedges\u0026rsquo; \u003cem\u003eg\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.88), and within-group analysis confirmed substantial gains (\u003cem\u003eg\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.97). However, correlational evidence showed only a small, non-significant association between AI use frequency and CT (pooled effect\u0026thinsp;=\u0026thinsp;0.24, \u003cem\u003ep\u003c/em\u003e = .076) with extreme heterogeneity. Both quantitative and qualitative analyses confirmed the dual pattern: when used in structured, guided, and ethical ways, GenAI typically improves CT; when applied with minimal guidance as an \u0026ldquo;answer machine,\u0026rdquo; it can have detrimental effects.\u003c/p\u003e\n \u003cp\u003e\u003cstrong\u003eRQ2. 
Which disciplines and pedagogical models moderate the impact of GenAI on CT outcomes?\u003c/strong\u003e\u003c/p\u003e\n \u003cp\u003ePositive CT effects were observed across all disciplinary clusters. No area was identified in which GenAI consistently impaired CT, though cell sizes were too small for strong statistical claims about disciplinary differences. The most conservative conclusion is that GenAI can support CT across disciplines when linked to sound instructional design. Pedagogical implementation features, however, showed clear effects: high teacher guidance predicted the largest CT gains, while low guidance predicted diminished or adverse effects. Explicit ethical and academic integrity guidelines also significantly predicted stronger CT outcomes. Discipline alone does not determine whether GenAI enhances or hinders CT; rather, the pedagogical model, scaffolding level, and ethical framing are the determining factors.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"5. Discussion","content":"\u003cp\u003eThis mixed-method meta-analysis explored whether GenAI-assisted learning in higher education serves as an aid or barrier to students\u0026rsquo; CT. The quantitative synthesis revealed that GenAI-embedded instruction is linked to significant CT gains compared to non-AI learning, as well as significant pre-post gains within AI-integrated courses, while correlational evidence was inconsistent and weak. The qualitative synthesis revealed widespread risks of overdependence, cognitive offloading, and academic integrity issues (Chen et al., 2025; Bukar et al., \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). 
These findings suggest that GenAI is neither inherently beneficial nor harmful for CT, but rather amplifies the pedagogical, ethical, and regulatory environment in which it operates.\u003c/p\u003e \u003cdiv id=\"Sec32\" class=\"Section2\"\u003e \u003ch2\u003e5.1 The Role of Guidance, Scaffolding, and Ethical Framing\u003c/h2\u003e \u003cp\u003eThe strongest theme from this review was the influence of teacher facilitation and ethical context on CT outcomes. Meta-regression showed that high teacher guidance predicted significant positive CT effects, while low guidance predicted diminished or adverse effects. Similarly, studies contextualising GenAI within explicit ethical instruction had considerably higher CT gains. These quantitative trends were echoed in qualitative results: studies with strong CT improvements described well-structured AI applications including argument mapping, scaffolded reading and inquiry, and AI-supported writing with explicit prompts for explanation and revision (Nusivera et al., 2025; Sanchez-Lopez et al., 2025; Zhang et al., 2025). Teachers in these studies overtly positioned AI as an imperfect artefact requiring verification and critique (Oliva-Cordova \u0026amp; Jimenez, 2025). In contrast, studies implementing GenAI as an open-ended assistant with limited scaffolding reported more passive use and uncritical copying (Dobre \u0026amp; Popescu, 2024; Gerlich, 2025).\u003c/p\u003e \u003cp\u003eThese findings are directly interpretable through the study\u0026rsquo;s theoretical framework. From the CLT perspective, high-guidance implementations maintained adequate germane cognitive load by requiring students to analyse, evaluate, and justify AI outputs, whereas low-guidance implementations reduced germane load by allowing passive consumption. 
Scaffolding theory predicts precisely this pattern: effective scaffolds are gradually faded as competence develops, whereas constant, unstructured AI availability functions as over-scaffolding that prevents productive struggle (Koedinger \u0026amp; Aleven, \u003cspan citationid=\"CR52\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). The AI-TPACK framework explains why teacher knowledge matters: educators with higher AI-TPACK competence designed tasks that positioned GenAI as an object for critical evaluation rather than an answer source (Celik et al., \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Ning et al., \u003cspan citationid=\"CR75\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec33\" class=\"Section2\"\u003e \u003ch2\u003e5.2 GenAI as a Double-Edged Tool for Critical Thinking\u003c/h2\u003e \u003cp\u003eThe evidence collectively demonstrates that GenAI functions as a double-edged instrument for CT. On the enhancement side, structured arguments, reading and writing support, and inquiry tasks requiring students to interrogate AI outputs provide rich occasions for analysis, evaluation, and reflective judgement (Zhang et al., 2025; Chen et al., 2025). Students in these contexts reported that AI enabled them to examine alternative viewpoints, structure arguments, identify weaknesses, and think more critically about evidence.\u003c/p\u003e \u003cp\u003eOn the erosion side, students in multiple studies reported trusting AI outputs uncritically, parroting machine-generated text, or relying on AI to \u0026ldquo;get the answer\u0026rdquo; without engaging their own reasoning (Gao et al., 2025; Song et al., 2025). CT risks included AI-dependent idea generation, cognitive offloading and passivity, and blurred authorship boundaries (Hou et al., 2025; Gerlich, 2025; Rastogi \u0026amp; Ashraf Ali Hassan Al Lawati, 2024). 
The fact that every CT-related study reported both positive outcomes and risks underscores the paradox: the same tool can elicit deeper processing while also encouraging cognitive shortcuts. From the cognitive offloading perspective (Risko \u0026amp; Gilbert, \u003cspan citationid=\"CR88\" class=\"CitationRef\"\u003e2016\u003c/span\u003e), this duality is predictable: GenAI reduces the cognitive cost of producing outputs, which can be beneficial when the freed capacity is redirected toward higher-order thinking, or harmful when it replaces that thinking entirely.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec34\" class=\"Section2\"\u003e \u003ch2\u003e5.3 Implications for Educational Practice\u003c/h2\u003e \u003cp\u003eThe findings indicate that adopting GenAI is not a neutral or merely technical decision. The evidence shows that design and governance choices determine outcomes:\u003c/p\u003e \u003cp\u003eFirst, courses should position humans at the centre of reasoning. When AI is presented as something to critique, compare, or debate rather than as a source of correct solutions, CT gains are higher. Tasks should require students to justify AI suggestions, assess alternatives, and reflect on when to trust or reject AI output.\u003c/p\u003e \u003cp\u003eSecond, institutions should construct explicit scaffolds rather than simply granting AI access. Effective implementations include multistep task designs, structured prompts, and feedback loops. Simply permitting students to \u0026ldquo;use AI if they want\u0026rdquo; without scaffolding risks producing superficial engagement and dependency.\u003c/p\u003e \u003cp\u003eThird, ethics and AI literacy should be treated as foundational content, not a compliance afterthought. 
Instructors who framed discussions around AI bias, hallucinations, and academic honesty expectations were more likely to report responsible and critically attentive student engagement.\u003c/p\u003e \u003cp\u003eFourth, assessment should reward reasoning rather than AI fluency. Because fluent but superficial text is easily generated, assessment should focus on argument structures, explanations, and decision rationales, and may include in-class, oral, or stepwise components that resist wholesale cognitive offloading.\u003c/p\u003e \u003cp\u003eAdditionally, this review suggests several concrete directions for future research:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eWhat is the optimal \u0026ldquo;fading schedule\u0026rdquo; for GenAI scaffolding, i.e., how should AI support be gradually reduced as students develop independent CT skills?\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eHow do different prompt engineering strategies (e.g., Socratic prompting vs. 
direct instruction prompts) differentially affect CT development across disciplines?\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eTo what extent does students\u0026rsquo; prior AI literacy moderate the relationship between GenAI use and CT, and can targeted AI literacy interventions mitigate cognitive offloading?\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eWhat are the long-term (semester- or year-long) effects of GenAI integration on CT development, beyond the short-term interventions studied to date?\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eHow does GenAI\u0026rsquo;s impact on CT differ for students from underrepresented groups or those with varying levels of prior academic achievement, and what equity-oriented design principles can mitigate disparities?\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec35\" class=\"Section2\"\u003e \u003ch2\u003e5.4 Limitations\u003c/h2\u003e \u003cp\u003eThis meta-analysis acknowledges four limitations. First, although trim-and-fill and fail-safe N detected no substantial publication bias, these methods have documented limitations when heterogeneity is high (I\u0026sup2; = 76% for between-group effects). Studies reporting favourable effects are systematically more likely to be published, and reliance on published databases excludes grey literature and dissertations, potentially inflating observed effects.\u003c/p\u003e \u003cp\u003eSecond, the 2022\u0026ndash;2025 timeframe creates boundary constraints. Studies from 2022\u0026ndash;2023 predominantly examined GPT-3.5, whereas current models (GPT-4 and beyond) have substantially different capabilities. 
Pooling effects across model versions creates a \u0026ldquo;moving target\u0026rdquo; problem limiting temporal generalisability.\u003c/p\u003e \u003cp\u003eThird, restricting the scope to university-level education means findings may not generalise to K\u0026ndash;12 settings, where developmental differences in cognitive maturity, metacognitive awareness, and self-regulation likely alter the dynamics of cognitive offloading and scaffolding.\u003c/p\u003e \u003cp\u003eFourth, measurement heterogeneity complicates precise estimation. While many studies employed validated CT instruments, others relied on researcher-developed measures with unclear psychometric properties. The corpus also underrepresents certain regions (Global South) and disciplines (professional practice programs), limiting cross-cultural transferability.\u003c/p\u003e \u003c/div\u003e"},{"header":"6. Conclusion","content":"\u003cp\u003eThis study addressed the question of whether GenAI enhances or undermines CT in higher education. The answer is conditional: it does both, depending on how it is implemented.\u003c/p\u003e \u003cp\u003eThe meta-analytic evidence demonstrated that GenAI-supported instruction can produce large positive effects on CT when embedded within structured, guided, and ethically framed pedagogical designs (Hedges\u0026rsquo; \u003cem\u003eg\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.88 for between-group comparisons; \u003cem\u003eg\u003c/em\u003e\u0026thinsp;=\u0026thinsp;0.97 for within-group pre\u0026ndash;post designs). However, the correlational evidence revealed that mere frequency of AI use does not predict CT gains, and the qualitative synthesis documented pervasive risks of overreliance, cognitive offloading, and academic integrity erosion across all 43 included studies.\u003c/p\u003e \u003cp\u003eThe two critical moderators identified, teacher guidance level and explicit ethical framing, point to a practical conclusion: GenAI is a pedagogical amplifier. 
It magnifies whatever instructional design surrounds it. High-guidance, ethically framed implementations produced the strongest CT gains; low-guidance, unframed implementations produced the weakest and sometimes adverse effects.\u003c/p\u003e \u003cp\u003eThese findings carry direct implications for higher education stakeholders. For curriculum designers, the priority should be integrating GenAI within tasks that demand evaluation, justification, and metacognitive reflection rather than passive consumption. For policymakers, institutional AI policies should move beyond prohibitive or permissive stances toward prescriptive pedagogical guidance. For educators, professional development targeting AI-TPACK competencies, particularly in designing scaffolded AI tasks and framing AI ethics, represents the most promising lever for ensuring that GenAI strengthens rather than weakens students\u0026rsquo; capacity for independent, critical thought.\u003c/p\u003e \u003cp\u003eThe field is still young, and the methodological heterogeneity of the current evidence base warrants caution. 
However, the convergence of quantitative, qualitative, and moderator evidence in this review provides a robust empirical foundation for an actionable principle: GenAI supports CT when students are required to think with and about AI, not simply through it.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eJecha S Jecha: conceptualization; investigation; methodology; writing \u0026ndash; original draft. Jining Han: investigation; methodology; writing \u0026ndash; original draft. Yu Liang: review and editing. Hayfa Nassor: investigation; methodology; writing \u0026ndash; original draft.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eData supporting the findings of this study are available from the corresponding author upon reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAhmed, H. (2024). Institutional Integration of Artificial Intelligence in Higher Education: The Moderation Effect of Ethical Consideration. \u003cem\u003eInternational Journal of Educational Reform.\u003c/em\u003e https://doi.org/10.1177/10567879241247551\u003c/li\u003e\n\u003cli\u003e*Al-kumaim, N. H., Hasnah Hassan, S., Mohammed, F., \u0026amp; Saleh, A. Y. (2025). Navigating GenAI in Malaysian Universities: Use, Problems, and Challenges. \u003cem\u003e2025 5th International Conference on Emerging Smart Technologies and Applications (eSmarTA)\u003c/em\u003e, 1\u0026ndash;7. https://doi.org/10.1109/eSmarTA66764.2025.11132252\u003c/li\u003e\n\u003cli\u003eAlshehri, Y., AlZahrani, A., \u0026amp; AlQahtani, M. (2025). Challenging Cognitive Load Theory: The role of educational neuroscience and artificial intelligence in redefining learning efficacy. \u003cem\u003eBrain Sciences\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(2), 127. 
https://doi.org/10.3390/brainsci15020127\u003c/li\u003e\n\u003cli\u003eAvsheniuk, N., Lutsenko, O., Seminikhyna, N., \u0026amp; Svyrydiuk, T. (2024). Empowering Language Learners\u0026rsquo; Critical Thinking: Evaluating ChatGPT\u0026rsquo;s Role in English Course Implementation. \u003cem\u003eArab World English Journal,\u003c/em\u003e 1(1), 210\u0026ndash;224. https://doi.org/10.24093/awej/chatgpt.14\u003c/li\u003e\n\u003cli\u003eAyanwale, M. A., Adelana, O. P., Bamiro, N. B., Olatunbosun, S. O., Idowu, K. O., \u0026amp; Adewale, K. A. (2025). Large language models and GenAI in education: Insights from Nigerian in-service teachers through a hybrid ANN-PLS-SEM approach. \u003cem\u003eF1000Research\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e, 258. https://doi.org/10.12688/f1000research.161637.1\u003c/li\u003e\n\u003cli\u003eBancoro, J. C. M. (2024). The relationship between artificial intelligence (AI) usage and academic performance of business administration students. \u003cem\u003eInternational Journal of Asian Business and Management, 3\u003c/em\u003e(1), 27\u0026ndash;48. https://doi.org/10.55927/ijabm.v3i1.7876\u003c/li\u003e\n\u003cli\u003eBallance, O. J. (2024). Sampling and randomisation in experimental and quasi-experimental CALL studies: Issues and recommendations for design, reporting, review, and interpretation. \u003cem\u003eReCALL\u003c/em\u003e, \u003cem\u003e36\u003c/em\u003e(1), 58\u0026ndash;71. https://doi.org/10.1017/S0958344023000162\u003c/li\u003e\n\u003cli\u003eBearman, M., Ajjawi, R., \u0026amp; Luckin, R. (2024). Developing evaluative judgement for a time of generative artificial intelligence. \u003cem\u003eAssessment \u0026amp; Evaluation in Higher Education\u003c/em\u003e. Advance online publication. https://doi.org/10.1080/02602938.2024.2335321\u003c/li\u003e\n\u003cli\u003eBezanilla, M. J., Fern\u0026aacute;ndez-Nogueira, D., Poblete, M., \u0026amp; Galindo-Dom\u0026iacute;nguez, H. (2023). 
Conceptualizations and instructional strategies on critical thinking in higher education: A systematic review of systematic reviews. \u003cem\u003eFrontiers in Education, 8\u003c/em\u003e, Article 1141686. https://doi.org/10.3389/feduc.2023.1141686\u003c/li\u003e\n\u003cli\u003eBiagini, M., Chen, Y., \u0026amp; Davidson, P. (2025). Mapping the scaffolding of metacognition and learning by AI tools in STEM classrooms: A bibliometric\u0026ndash;systematic review approach (2005\u0026ndash;2025). \u003cem\u003eData, 13\u003c/em\u003e(11), 148. https://doi.org/10.3390/data13110148\u003c/li\u003e\n\u003cli\u003eBio-Protocol. (2025). Data extraction. \u003cem\u003eBio-Protocol.\u003c/em\u003e https://bio-protocol.org\u003c/li\u003e\n\u003cli\u003eBukar, U. A., Sayeed, M. S., Abdul Razak, S. F., Yogarayan, N., Yahya, N., \u0026amp; Mohamad, N. N. S. (2024). Effectiveness of ChatGPT in improving critical thinking and problem-solving skills of engineering students. \u003cem\u003eIEEE Access, 12\u003c/em\u003e, 95368\u0026ndash;95389. https://doi.org/10.1109/ACCESS.2024.3425172\u003c/li\u003e\n\u003cli\u003eBuselić, V., \u0026amp; Rajković, I. (2024). Teaching generic skills with ChatGPT: Debate as a critical and creative thinking teaching tool in higher education. In \u003cem\u003e2024 47th International Convention on Information, Communication and Electronic Technology (MIPRO)\u003c/em\u003e, Opatija, Croatia (pp. 707\u0026ndash;712). https://doi.org/10.1109/MIPRO60963.2024.10569431\u003c/li\u003e\n\u003cli\u003eCalderon Martinez, E., Ghattas Hasbun, P. E., Salolin Vargas, V. P., Garc\u0026iacute;a-Gonz\u0026aacute;lez, O. Y., Fermin Madera, M. D., Rueda Capistr\u0026aacute;n, D. E., Campos Carmona, T., Sanchez Cruz, C., \u0026amp; Teran Hooper, C. (2025). A comprehensive guide to conduct a systematic review and meta-analysis in medical research. \u003cem\u003eMedicine, 104\u003c/em\u003e(33), e41868. 
https://doi.org/10.1097/MD.0000000000041868 \u003c/li\u003e\n\u003cli\u003eCelik, I., Dindar, M., Muukkonen, H., \u0026amp; J\u0026auml;rvel\u0026auml;, S. (2022). The promises and challenges of artificial intelligence for teachers: A systematic review of research. \u003cem\u003eTechTrends, 66\u003c/em\u003e(4), 616\u0026ndash;630. https://doi.org/10.1007/s11528-022-00715-y\u003c/li\u003e\n\u003cli\u003eChan, C. K. Y., \u0026amp; Hu, W. (2023). Students\u0026rsquo; voices on generative AI: Perceptions, benefits, and challenges in higher education. \u003cem\u003eInternational Journal of Educational Technology in Higher Education, 20\u003c/em\u003e, Article 43. https://doi.org/10.1186/s41239-023-00411-8\u003c/li\u003e\n\u003cli\u003e*Chang, L.-C., Hung, L.-L., Liu, T.-W., Huang, C.-H., Lin, H.-L., \u0026amp; Liao, L.-L. (2025). Relationships between ChatGPT use with self-directed learning and critical thinking among school and university nurses in Taiwan. \u003cem\u003eBMC Nursing, 24\u003c/em\u003e, 1426. https://doi.org/10.1186/s12912-025-04069-7\u003c/li\u003e\n\u003cli\u003e*Chen, X., Jia, B., Peng, X., Zhao, H., Yao, J., Wang, Z., \u0026amp; Zhu, S. (2025). Effects of ChatGPT and argument map (AM)-supported online argumentation on college students\u0026rsquo; critical thinking skills and perceptions. \u003cem\u003eEducation and Information Technologies, 30\u003c/em\u003e(12), 17623\u0026ndash;17658. https://doi.org/10.1007/s10639-025-13471-2\u003c/li\u003e\n\u003cli\u003e*Chiu, M. C., \u0026amp; Hwang, G. J. (2025). Enhancing student creative and critical thinking in generative AI-empowered creation: a mind-mapping approach. \u003cem\u003eInteractive Learning Environments\u003c/em\u003e, 1\u0026ndash;22. https://doi.org/10.1080/10494820.2025.2511244 \u003c/li\u003e\n\u003cli\u003eClark, A. (2025). Extending Minds with Generative AI. \u003cem\u003eNature Communications\u003c/em\u003e, \u003cem\u003e16\u003c/em\u003e(1), 4627. 
https://doi.org/10.1038/s41467-025-59906-9 \u003c/li\u003e\n\u003cli\u003eClark, A., \u0026amp; Chalmers, D. (1998). The extended mind. \u003cem\u003eAnalysis\u003c/em\u003e, \u003cem\u003e58\u003c/em\u003e(1), 7\u0026ndash;19. https://doi.org/10.1093/analys/58.1.7\u003c/li\u003e\n\u003cli\u003eCochrane Bias Methods Group. (2024). \u003cem\u003eRoB 2: A revised Cochrane risk-of-bias tool for randomized trials\u003c/em\u003e. https://methods.cochrane.org/bias/resources/rob-2\u003c/li\u003e\n\u003cli\u003eCrompton, H., \u0026amp; Burke, D. (2023). Artificial intelligence in higher education: The state of the field. \u003cem\u003eInternational Journal of Educational Technology in Higher Education\u003c/em\u003e, \u003cem\u003e20\u003c/em\u003e(1), Article 22. https://doi.org/10.1186/s41239-023-00392-8\u003c/li\u003e\n\u003cli\u003e*Damiano, A. D., Laur\u0026iacute;a, E. J. M., Sarmiento, C., \u0026amp; Zhao, N. (2024). Early perceptions of teaching and learning using generative AI in higher education. \u003cem\u003eJournal of Educational Technology Systems\u003c/em\u003e, \u003cem\u003e52\u003c/em\u003e(3), 346\u0026ndash;375. https://doi.org/10.1177/00472395241233290\u003c/li\u003e\n\u003cli\u003eDavies, M. (2015). A model of critical thinking in higher education. In M. B. Paulsen (Ed.), \u003cem\u003eHigher education: Handbook of theory and research\u003c/em\u003e (Vol. 30, pp. 41\u0026ndash;92). Springer. https://doi.org/10.1007/978-3-319-12835-1_2\u003c/li\u003e\n\u003cli\u003e*de la Puente Pacheco, M. A., Torres, J., Blanco Troncoso, A. L., Guzm\u0026aacute;n Murillo, H. J., \u0026amp; Carrascal, J. X. M. (2025). Enhancing Critical Thinking and Argumentation Skills in Colombian Undergraduate Diplomacy Students: ChatGPT-Assisted and Traditional Debate Methods. \u003cem\u003eJournal of Political Science Education\u003c/em\u003e, \u003cem\u003e21\u003c/em\u003e(4), 728\u0026ndash;738. 
https://doi.org/10.1080/15512169.2025.2449936 \u003c/li\u003e\n\u003cli\u003e*Dobre, S.-C., \u0026amp; Popescu, E. (2024). Exploring Students\u0026rsquo; Perception and Experience with ChatGPT and Critical Thinking in a Higher Education Context. \u003cem\u003e2024 21st International Conference on Information Technology Based Higher Education and Training (ITHET)\u003c/em\u003e, 1\u0026ndash;6. https://doi.org/10.1109/ITHET61869.2024.10837650\u003c/li\u003e\n\u003cli\u003eDwivedi, Y. K., Kshetri, N., Hughes, L., Slade, E. L., Jeyaraj, A., Kar, A. K., Baabdullah, A. M., Koohang, A., Raghavan, V., Ahuja, M., Albanna, H., Albashrawi, M. A., Al-Busaidi, A. S., Balakrishnan, J., Barlette, Y., Basu, S., Bose, I., Brooks, L., Buhalis, D., Wright, R. (2023). \u0026ldquo;So what if ChatGPT wrote it?\u0026rdquo; Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. \u003cem\u003eInternational Journal of Information Management\u003c/em\u003e, \u003cem\u003e71\u003c/em\u003e, Article 102642. https://doi.org/10.1016/j.ijinfomgt.2023.102642\u003c/li\u003e\n\u003cli\u003eEnnis, R. H. (1993). Critical thinking assessment. \u003cem\u003eTheory Into Practice\u003c/em\u003e, \u003cem\u003e32\u003c/em\u003e(3), 179\u0026ndash;186. https://doi.org/10.1080/00405849309543594\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003e*\u003c/strong\u003eEssel, H. B., Vlachopoulos, D., Essuman, A. B., \u0026amp; Amankwa, J. O. (2024). ChatGPT effects on cognitive skills of undergraduate students: Receiving instant responses from AI-based conversational large language models (LLMs). \u003cem\u003eComputers and Education: Artificial Intelligence\u003c/em\u003e, \u003cem\u003e6\u003c/em\u003e, 100198. https://doi.org/10.1016/j.caeai.2023.100198\u003c/li\u003e\n\u003cli\u003eEvalAcademy. (2025). Interpreting themes from qualitative data: Thematic analysis. 
https://www.evalacademy.com/articles/interpreting-themes-from-qualitative-data-thematic-analysis\u003c/li\u003e\n\u003cli\u003eFacione, P. A. (1990). Critical thinking: A statement of expert consensus for purposes of educational assessment and instruction. \u003cem\u003eThe California Academic Press\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003eFajar, J. (2024). Approaches for identifying and managing publication bias in meta-analysis. \u003cem\u003eDeka in Medicine,\u003c/em\u003e 1(1), e865. https://doi.org/10.69863/dim.v1i1.1\u003c/li\u003e\n\u003cli\u003e*Fakour, H., \u0026amp; Imani, M. (2025). Socratic wisdom in the age of AI: A comparative study of ChatGPT and human tutors in enhancing critical thinking skills. \u003cem\u003eFrontiers in Education\u003c/em\u003e, \u003cem\u003e10\u003c/em\u003e, 1528603. https://doi.org/10.3389/feduc.2025.1528603\u003c/li\u003e\n\u003cli\u003e*Gao, J., Zhang, J., \u0026amp; Li, Y. (2025). Do AI chatbot-integrated writing tasks influence writing self-efficacy and critical thinking ability? An exploratory study. \u003cem\u003eComputers and Education: Artificial Intelligence\u003c/em\u003e, \u003cem\u003e9\u003c/em\u003e, 100472. https://doi.org/10.1016/j.caeai.2025.100472\u003c/li\u003e\n\u003cli\u003e\u003cstrong\u003e*\u003c/strong\u003eGerlich, M. (2025). AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking. \u003cem\u003eSocieties\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(1), 6. https://doi.org/10.3390/soc15010006\u003c/li\u003e\n\u003cli\u003eGiannakos, M. (2025). The promise and challenges of generative AI in education. \u003cem\u003eBehaviour \u0026amp; Information Technology\u003c/em\u003e. Advance online publication. https://doi.org/10.1080/0144929X.2024.2394886\u003c/li\u003e\n\u003cli\u003eGrassini, S. (2023). Shaping the future of education: Exploring the potential and consequences of AI and ChatGPT in educational settings. 
\u003cem\u003eEducation Sciences\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e(7), Article 692. https://doi.org/10.3390/educsci13070692\u003c/li\u003e\n\u003cli\u003eGrinschgl, S., \u0026amp; Neubauer, A. C. (2022). Supporting Cognition With Modern Technology: Distributed Cognition Today and in an AI-Enhanced Future. \u003cem\u003eFrontiers in Artificial Intelligence\u003c/em\u003e, \u003cem\u003e5\u003c/em\u003e, 908261. https://doi.org/10.3389/frai.2022.908261 \u003c/li\u003e\n\u003cli\u003e*Guo, Y., \u0026amp; Lee, D. (2023). Leveraging ChatGPT for Enhancing Critical Thinking Skills. \u003cem\u003eJournal of Chemical Education\u003c/em\u003e, \u003cem\u003e100\u003c/em\u003e(12), 4876\u0026ndash;4883. https://doi.org/10.1021/acs.jchemed.3c00505\u003c/li\u003e\n\u003cli\u003eHanegraaf, P., Mosselman, J.-J., van Zuuren, F., van Valkenhoef, G., Delaney, B. C., \u0026amp; Dagnelie, P. C. (2024). Inter-reviewer reliability of human literature reviewing and implications for the introduction of machine-assisted systematic reviews: A mixed-methods review. \u003cem\u003eBMJ Open\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(3), e076912. https://doi.org/10.1136/bmjopen-2023-076912\u003c/li\u003e\n\u003cli\u003eHolmes, W., \u0026amp; Porayska-Pomsta, K. (Eds.). (2022). \u003cem\u003eThe ethics of artificial intelligence in education: Practices, challenges, and debates\u003c/em\u003e. Routledge.\u003c/li\u003e\n\u003cli\u003eHon, K. L. (2025). Generative AI in higher education: A systematic review of its effects on learning outcomes and academic performance. \u003cem\u003eJournal of Educational Technology Systems, 0\u003c/em\u003e(0). Advance online publication. https://doi.org/10.1177/00472395251400089\u003c/li\u003e\n\u003cli\u003eHopewell, S., Chan, A.-W., Collins, G. S., Osterhoff, G., \u0026amp; Moher, D. (2025). CONSORT 2025 explanation and elaboration: Updated guideline for reporting randomised trials. 
\u003cem\u003eBMJ\u003c/em\u003e, \u003cem\u003e389\u003c/em\u003e, Article e081124. https://doi.org/10.1136/bmj-2024-081124\u003c/li\u003e\n\u003cli\u003e*Hou, C., Zhu, G., \u0026amp; Sudarshan, V. (2025). The role of critical thinking on undergraduates\u0026rsquo; reliance behaviours on generative AI in problem‐solving. \u003cem\u003eBritish Journal of Educational Technology\u003c/em\u003e, \u003cem\u003e56\u003c/em\u003e(5), 1919\u0026ndash;1941. https://doi.org/10.1111/bjet.13613\u003c/li\u003e\n\u003cli\u003e*Huang, Y.-M., Chen, P.-H., Lee, H.-Y., Sandnes, F. E., \u0026amp; Wu, T.-T. (2025). ChatGPT-enhanced mobile instant messaging in online learning: Effects on student outcomes and perceptions. \u003cem\u003eComputers in Human Behavior\u003c/em\u003e, \u003cem\u003e168\u003c/em\u003e, 108659. https://doi.org/10.1016/j.chb.2025.108659\u003c/li\u003e\n\u003cli\u003eHwang, G. J., Xie, H., Wah, B. W., \u0026amp; Ga\u0026scaron;ević, D. (2023). Vision, challenges, roles and research issues of Artificial Intelligence in Education. \u003cem\u003eComputers and Education: Artificial Intelligence\u003c/em\u003e, \u003cem\u003e1\u003c/em\u003e, Article 100001. https://doi.org/10.1016/j.caeai.2020.100001\u003c/li\u003e\n\u003cli\u003eHwang, S. (2022). Examining the effects of artificial intelligence on elementary students\u0026rsquo; mathematics achievement: A meta-analysis. \u003cem\u003eSustainability\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(20), 13185. https://doi.org/10.3390/su142013185\u003c/li\u003e\n\u003cli\u003eKarapantelakis, A., Nikou, A., Kattepur, A., Martins, J., Mokrushin, L., Mohalik, S. K., Orlic, M., \u0026amp; Feljan, A. V. (2024). A Survey on the Integration of Generative AI for Critical Thinking in Mobile Networks. \u003cem\u003eArXiv\u003c/em\u003e. https://arxiv.org/abs/2404.06946 \u003c/li\u003e\n\u003cli\u003e*Khampusaen, D. (2025). 
The Impact of ChatGPT on Academic Writing Skills and Knowledge: An Investigation of Its Use in Argumentative Essays. \u003cem\u003eLEARN Journal: Language Education and Acquisition Research Network\u003c/em\u003e, \u003cem\u003e18\u003c/em\u003e(1), 963\u0026ndash;988. https://doi.org/10.70730/PGCQ9242\u003c/li\u003e\n\u003cli\u003eKim, J., Lee, H., \u0026amp; Cho, Y. H. (2024). Does using artificial intelligence assistance accelerate skill decay and hinder skill development without performers\u0026apos; awareness? \u003cem\u003eCognitive Research: Principles and Implications\u003c/em\u003e, \u003cem\u003e9\u003c/em\u003e(1), Article 46. https://doi.org/10.1186/s41235-024-00572-8\u003c/li\u003e\n\u003cli\u003eKoedinger, K. R., \u0026amp; Aleven, V. (2007). Exploring the assistance dilemma in experiments with cognitive tutors. \u003cem\u003eEducational Psychology Review\u003c/em\u003e, \u003cem\u003e19\u003c/em\u003e(3), 239\u0026ndash;264. https://doi.org/10.1007/s10648-007-9049-0\u003c/li\u003e\n\u003cli\u003e*Kulal, A. (2025). Cognitive Risks of AI: Literacy, Trust, and Critical Thinking. \u003cem\u003eJournal of Computer Information Systems\u003c/em\u003e, 1\u0026ndash;13. https://doi.org/10.1080/08874417.2025.2582050\u003c/li\u003e\n\u003cli\u003e*Lee, H.-P. (Hank), Sarkar, A., Tankelevitch, L., Drosos, I., Rintel, S., Banks, R., \u0026amp; Wilson, N. (2025). The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects from a Survey of Knowledge Workers. \u003cem\u003eProceedings of the 2025 CHI Conference on Human Factors in Computing Systems\u003c/em\u003e, 1\u0026ndash;22. https://doi.org/10.1145/3706598.3713778\u003c/li\u003e\n\u003cli\u003e*Lee, Y.-F., Hwang, G.-J., \u0026amp; Cheng, L.-C. (2025). Impacts of a ChatGPT-supported concept mapping approach on students\u0026rsquo; database programming achievement and their problem-solving and critical thinking awareness. 
\u003cem\u003eInteractive Learning Environments\u003c/em\u003e, 1\u0026ndash;20. https://doi.org/10.1080/10494820.2025.2523395 \u003c/li\u003e\n\u003cli\u003eLi, F., Yan, X., Su, H., Shen, R., \u0026amp; Mao, G. (2025). An assessment of human\u0026ndash;AI interaction capability in the generative AI era: The influence of critical thinking. \u003cem\u003eJournal of Intelligence\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e(6), Article 62. https://doi.org/10.3390/jintelligence13060062\u003c/li\u003e\n\u003cli\u003e*Li, K. C., Chong, G. H. L., Wong, B. T. M., \u0026amp; Wu, M. M. F. (2025). A TAM-Based Analysis of Hong Kong Undergraduate Students\u0026rsquo; Attitudes Toward Generative AI in Higher Education and Employment. \u003cem\u003eEducation Sciences\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(7), 798. https://doi.org/10.3390/educsci15070798\u003c/li\u003e\n\u003cli\u003eLintangesukmanjaya, R., Putra, D., \u0026amp; Rahmawati, N. (2025). Measuring learners\u0026rsquo; critical thinking skills using argument-based assessment in higher education. \u003cem\u003eJournal of Educational Assessment and Evaluation,\u003c/em\u003e 9(2), 45\u0026ndash;63. https://doi.org/10.5555/jeae.2025.97\u003c/li\u003e\n\u003cli\u003e*Liu, H., Zhou, F., \u0026amp; Li, J. (2025). Empower or Disempower: The Impact of Generative Artificial Intelligence on College Students\u0026rsquo; Creativity. \u003cem\u003e2025 7th International Conference on Computer Science and Technologies in Education (CSTE)\u003c/em\u003e, 661\u0026ndash;665. https://doi.org/10.1109/CSTE64638.2025.11091881\u003c/li\u003e\n\u003cli\u003eLo, C. K. (2023). What is the impact of ChatGPT on education? A rapid review of the literature. \u003cem\u003eEducation Sciences\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e(4), 410. https://doi.org/10.3390/educsci13040410\u003c/li\u003e\n\u003cli\u003eLodge, J. M., Howard, S., Bearman, M., \u0026amp; Dawson, P. (2023). 
\u003cem\u003eAssessment reform for the age of artificial intelligence\u003c/em\u003e. Tertiary Education Quality and Standards Agency (TEQSA). https://www.teqsa.gov.au/guides-resources/resources/corporate-publications/assessment-reform-age-artificial-intelligence\u003c/li\u003e\n\u003cli\u003eLunny, C., Higgins, J. P. T., Welton, N. J., Caldwell, D. M., Dias, S., Eldridge, S., Ferroni, E., Furukawa, T. A., Gallo, V., Ioannidis, J. P. A., Jansen, J. P., Johnson, K. R., J\u0026oslash;rgensen, L., Page, M. J., Rutter, H., Salanti, G., Sch\u0026uuml;nemann, H. J., Sutton, A. J., Thorlund, K., ... Egger, M. (2025). Risk of Bias in Network Meta-Analysis (RoB NMA) tool. \u003cem\u003eBMJ\u003c/em\u003e, \u003cem\u003e388\u003c/em\u003e, e079839. https://doi.org/10.1136/bmj-2024-079839\u003c/li\u003e\n\u003cli\u003eMarshall, I. J., Nye, B., Kuiper, J., Noel-Storr, A., Marshall, R., Zelko, D., Thomas, J., \u0026amp; Wallace, B. C. (2021). Data extraction methods for systematic review (semi)automation: Update of a living systematic review. \u003cem\u003eF1000Research\u003c/em\u003e, \u003cem\u003e10\u003c/em\u003e, 401. https://doi.org/10.12688/f1000research.51117.2\u003c/li\u003e\n\u003cli\u003e*Mart\u0026iacute;n-G\u0026oacute;mez, S., \u0026amp; Gonz\u0026aacute;lez Ruiz, C. J. (2025). AI in Higher Education: Initial Teacher Training in the Critical and Didactic Use of Artificial Intelligence. \u003cem\u003eIEEE Revista Iberoamericana de Tecnologias Del Aprendizaje\u003c/em\u003e, \u003cem\u003e20\u003c/em\u003e, 302\u0026ndash;309. https://doi.org/10.1109/RITA.2025.3616509\u003c/li\u003e\n\u003cli\u003eMcHugh, M. L. (2012). Interrater reliability: The kappa statistic. \u003cem\u003eBiochemia Medica\u003c/em\u003e, \u003cem\u003e22\u003c/em\u003e(3), 276\u0026ndash;282. https://doi.org/10.11613/bm.2012.031\u003c/li\u003e\n\u003cli\u003e*Miah, A. S. M., Tusher, M. M. R., Hossain, Md. M., Hossain, M. M., Rahim, M. A., Hamid, M. E., Islam, Md. S., \u0026amp; Shin, J. 
(2025). ChatGPT in Research and Education: A SWOT Analysis of Its Academic Impact. \u003cem\u003eComputer Modeling in Engineering \u0026amp; Sciences\u003c/em\u003e, \u003cem\u003e143\u003c/em\u003e(3), 2573\u0026ndash;2614. https://doi.org/10.32604/cmes.2025.064168 \u003c/li\u003e\n\u003cli\u003eMichel-Villarreal, R., Vilalta-Perdomo, E., Salinas-Navarro, D. E., Thierry-Aguilera, R., \u0026amp; Gerardou, F. S. (2023). Challenges and opportunities of generative AI for higher education as explained by ChatGPT. \u003cem\u003eEducation Sciences\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e(9), Article 856. https://doi.org/10.3390/educsci13090856\u003c/li\u003e\n\u003cli\u003eMiranda, J. P. P., Cruz, M. A. D., Fernandez, A. B., Balahadia, F. F., Aviles, J. S., Caro, C. A., Liwanag, I. G., \u0026amp; Ga\u0026ntilde;a, E. P. (2025). Erosion of critical academic skills due to AI dependency among tertiary students: A path analysis. In M. B. Garcia, J. Rosak-Szyrocka, \u0026amp; A. Bozkurt (Eds.), \u003cem\u003ePitfalls of AI integration in education: Skill obsolescence, misuse, and bias\u003c/em\u003e (pp. 25\u0026ndash;48). IGI Global. https://doi.org/10.4018/979-8-3373-0122-8.ch002\u003c/li\u003e\n\u003cli\u003eMishra, P., \u0026amp; Koehler, M. J. (2006). Technological pedagogical content knowledge: A framework for teacher knowledge. \u003cem\u003eTeachers College Record\u003c/em\u003e, \u003cem\u003e108\u003c/em\u003e(6), 1017\u0026ndash;1054. https://doi.org/10.1111/j.1467-9620.2006.00684.x\u003c/li\u003e\n\u003cli\u003eMollick, E. R., \u0026amp; Mollick, L. (2023). \u003cem\u003eAssigning AI: Seven approaches for students, with prompts\u003c/em\u003e. SSRN. https://dx.doi.org/10.2139/ssrn.4475995\u003c/li\u003e\n\u003cli\u003eMoore, T. (2013). Critical thinking: Seven definitions in search of a concept. \u003cem\u003eStudies in Higher Education\u003c/em\u003e, \u003cem\u003e38\u003c/em\u003e(4), 506\u0026ndash;522. 
https://doi.org/10.1080/03075079.2011.586995\u003c/li\u003e\n\u003cli\u003e*Mun, C. (2024). EFL Learners\u0026rsquo; English Writing Feedback and Their Perception of Using ChatGPT. \u003cem\u003eSTEM Journal\u003c/em\u003e, \u003cem\u003e25\u003c/em\u003e(2), 26\u0026ndash;39. https://doi.org/10.16875/stem.2024.25.2.26\u003c/li\u003e\n\u003cli\u003e*Nasr, N. R., Tu, C.-H., Werner, J., Bauer, T., Yen, C.-J., \u0026amp; Sujo-Montes, L. (2025). Exploring the Impact of Generative AI ChatGPT on Critical Thinking in Higher Education: Passive AI-Directed Use or Human\u0026ndash;AI Supported Collaboration? \u003cem\u003eEducation Sciences\u003c/em\u003e, \u003cem\u003e15\u003c/em\u003e(9), 1198. https://doi.org/10.3390/educsci15091198 \u003c/li\u003e\n\u003cli\u003eNg, D. T. K., Leung, J. K. L., Chu, S. K. W., \u0026amp; Qiao, M. S. (2021). Conceptualizing AI literacy: An exploratory review. \u003cem\u003eComputers and Education: Artificial Intelligence\u003c/em\u003e, \u003cem\u003e2\u003c/em\u003e, Article 100041. https://doi.org/10.1016/j.caeai.2021.100041\u003c/li\u003e\n\u003cli\u003eNing, Y., Zhang, C., Xu, B., Zhou, Y., \u0026amp; Wijaya, T. T. (2024). Teachers\u0026rsquo; AI-TPACK: Exploring the Relationship between Knowledge Elements. \u003cem\u003eSustainability\u003c/em\u003e, \u003cem\u003e16\u003c/em\u003e(3), 978. https://doi.org/10.3390/su16030978 \u003c/li\u003e\n\u003cli\u003eNordstr\u0026ouml;m, T., Kalmendal, A., \u0026amp; Batinović, L. (2023). Risk of bias and open science practices in systematic reviews of educational effectiveness: A meta-review. \u003cem\u003eReview of Education, 11\u003c/em\u003e(3), e3443. https://doi.org/10.1002/rev3.3443\u003c/li\u003e\n\u003cli\u003eNowell, L. S., Norris, J. M., White, D. E., \u0026amp; Moules, N. J. (2017). Thematic analysis: Striving to meet the trustworthiness criteria. \u003cem\u003eInternational Journal of Qualitative Methods\u003c/em\u003e, \u003cem\u003e16\u003c/em\u003e, Article 1609406917733847. 
https://doi.org/10.1177/1609406917733847\u003c/li\u003e\n\u003cli\u003e*Nusivera, E., Hikmat, A., \u0026amp; Ghani, A. R. A. (2025). Integration of Chat-GPT Usage in Language Learning Model to Improve Argumentation Skills, Complex Comprehension Skills, and Critical Thinking Skills. \u003cem\u003eInternational Journal of Learning, Teaching and Educational Research\u003c/em\u003e, \u003cem\u003e24\u003c/em\u003e(2), 375\u0026ndash;390. https://doi.org/10.26803/ijlter.24.2.19\u003c/li\u003e\n\u003cli\u003e*Oliva-C\u0026oacute;rdova, L. M., \u0026Aacute;lvarez-Icaza, I., \u0026amp; George-Reyes, C. E. (2025). Evaluation of Generative AI Use to Foster Critical Thinking in Higher Education. \u003cem\u003eIEEE Revista Iberoamericana de Tecnologias Del Aprendizaje\u003c/em\u003e, \u003cem\u003e20\u003c/em\u003e, 237\u0026ndash;243. https://doi.org/10.1109/RITA.2025.3597848\u003c/li\u003e\n\u003cli\u003eOuzzani, M., Hammady, H., Fedorowicz, Z., \u0026amp; Elmagarmid, A. (2016). Rayyan\u0026mdash;a web and mobile app for systematic reviews. \u003cem\u003eSystematic Reviews\u003c/em\u003e, \u003cem\u003e5\u003c/em\u003e(1), 210. https://doi.org/10.1186/s13643-016-0384-4\u003c/li\u003e\n\u003cli\u003ePage, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., Glanville, J., Grimshaw, J. M., Hr\u0026oacute;bjartsson, A., Lalu, M. M., Li, T., Loder, E. W., Mayo-Wilson, E., McDonald, S., . . . Moher, D. (2021). The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. \u003cem\u003eBMJ\u003c/em\u003e, \u003cem\u003e372\u003c/em\u003e, Article n71. https://doi.org/10.1136/bmj.n71\u003c/li\u003e\n\u003cli\u003ePardos, Z. A., \u0026amp; Bhandari, S. (2024). ChatGPT-generated help produces learning gains equivalent to human tutor-authored help on mathematics skills. \u003cem\u003ePLOS ONE\u003c/em\u003e, \u003cem\u003e19\u003c/em\u003e(5), Article e0304013. 
https://doi.org/10.1371/journal.pone.0304013
Pellas, N. (2025). The role of students’ higher-order thinking skills in the relationship between academic achievements and machine learning using generative AI chatbots. Research and Practice in Technology Enhanced Learning, 20, 036. https://doi.org/10.58459/rptel.2025.20036
Perkins, M., Furze, L., Roe, J., & MacVaugh, J. (2024). The Artificial Intelligence Assessment Scale (AIAS): A framework for ethical integration of generative AI in educational assessment. Journal of University Teaching & Learning Practice, 21(6), Article 01. https://doi.org/10.53761/q3azde36
Premkumar, P. P., Yatigammana, M. R. K. N., & Kannangara, S. (2024). Impact of generative AI on critical thinking skills in undergraduates: A systematic review. Journal of Desk Research Review and Analysis, 2(2), 215–232. https://doi.org/10.4038/jdrra.v2i2.52
Qu, X., Sherwood, J., Liu, P., & Aleisa, N. (2025). Generative AI tools in higher education: A meta-analysis of cognitive impact. In Extended abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA ’25) (pp. 1–9). Association for Computing Machinery. https://doi.org/10.1145/3706599.3719841
*Rastogi, A., & Ashraf Ali Hassan Al Lawati, A. G. (2024). Understanding the acceptance of ChatGPT by HEI’s students for knowledge enhancement. 2024 1st International Conference on Innovative Engineering Sciences and Technological Research (ICIESTR), 1–8. https://doi.org/10.1109/ICIESTR60916.2024.10798141
Risko, E. F., & Gilbert, S. J. (2016). Cognitive offloading. Trends in Cognitive Sciences, 20(9), 676–688. https://doi.org/10.1016/j.tics.2016.07.002
*Ruiz-Rojas, L. I., Salvador-Ullauri, L., & Acosta-Vargas, P. (2024). Collaborative Working and Critical Thinking: Adoption of Generative Artificial Intelligence Tools in Higher Education. Sustainability, 16(13), 5367. https://doi.org/10.3390/su16135367
Seufert, S., & Rohwer, K. (2024). Developing a framework for analysing and assessing critical thinking skills for the responsible use of generative AI in higher education. In ICERI2024 Proceedings (pp. 8579–8585). IATED Academy. https://doi.org/10.21125/iceri.2024.2139
*Sanchez-Lopez, A. L., Jimenez-Perez, M. I., Perfecto-Avalos, Y., Navarro-Lopez, D. E., Esparza-Sanchez, J., & Mena, E. R. L. (2025). Integration of Artificial Intelligence as a Tool to Enhance Critical Thinking Skills and Foster Learning in Bioengineering Education. 2025 IEEE Global Engineering Education Conference (EDUCON), 1–5. https://doi.org/10.1109/EDUCON62633.2025.11016586
Sardi, J., Yuliana, D. F., Yanto, D. T. P., Eliza, F., Candra, O., Habibullah, H., & Darmansyah, D. (2025). How Generative AI Influences Students’ Self-Regulated Learning and Critical Thinking Skills? A Systematic Review. International Journal of Engineering Pedagogy (iJEP), 15(1), 94–108. https://doi.org/10.3991/ijep.v15i1.53379
Shafer, D. (2025). A critical thinking thematic framework and observation tool for improved theory and developing secondary teachers’ instructional practice: Proof of concept. Thinking Skills and Creativity, 56, 101787. https://doi.org/10.1016/j.tsc.2025.101787
*Shi, H., Chai, C. S., Zhou, S., & Aubrey, S. (2025). Comparing the effects of ChatGPT and automated writing evaluation on students’ writing and ideal L2 writing self. Computer Assisted Language Learning, 1–28. https://doi.org/10.1080/09588221.2025.2454541
Sobkowiak, P. (2016). Critical thinking in the intercultural context: Investigating EFL textbooks. Studies in Second Language Learning and Teaching, 6(4), 697–718. https://doi.org/10.14746/ssllt.2016.6.4.7
Solyst, J., Pan, M. Y., Andam, A., Poblete, I. P., Eslami, M., Hammer, J., Ogan, A., & Stewart, A. E. (2025). Critical AI literacy through exploring generative AI limitations. In A. Rajala, A. Cortez, R. Hofmann, A. Jornet, H. Lotz-Sisitka, & L. Markauskaite (Eds.), Proceedings of the 19th International Conference of the Learning Sciences - ICLS 2025 (pp. 2061–2065). International Society of the Learning Sciences. https://repository.isls.org/handle/1/11423
*Song, D., Zhang, P., Zhu, Y., Qi, S., Yang, Y., Gong, L., & Zhou, L. (2025). Effects of generative artificial intelligence on higher-order thinking skills and artificial intelligence literacy in nursing undergraduates: A quasi-experimental study. Nurse Education in Practice, 88, 104549. https://doi.org/10.1016/j.nepr.2025.104549
Southworth, J., Migliaccio, K., Glover, J., Glover, J., Reed, D., McCarty, C., Brendemuhl, J., & Thomas, A. (2022). Developing a model for AI Across the curriculum: Transforming the higher education landscape via innovation in AI literacy. Computers and Education: Artificial Intelligence, 4, 100127. https://doi.org/10.1016/j.caeai.2023.100127
*Styve, A., Virkki, O. T., & Naeem, U. (2024). Developing Critical Thinking Practices Interwoven with Generative AI Usage in an Introductory Programming Course. 2024 IEEE Global Engineering Education Conference (EDUCON), 01–08. https://doi.org/10.1109/EDUCON60312.2024.10578746
Sweller, J., van Merriënboer, J. J. G., & Paas, F. (2019). Cognitive architecture and instructional design: 20 years later. Educational Psychology Review, 31(2), 261–292. https://doi.org/10.1007/s10648-019-09465-5
Swiecki, Z., Khosravi, H., Chen, G., Martinez-Maldonado, R., Lodge, J. M., Milligan, S., Selwyn, N., & Gašević, D. (2022). Assessment in the age of artificial intelligence. Computers and Education: Artificial Intelligence, 3, Article 100075. https://doi.org/10.1016/j.caeai.2022.100075
Tian, J., & Zhang, R. (2025). Learners’ AI dependence and critical thinking: The psychological mechanism of fatigue and the social buffering role of AI literacy. Acta Psychologica, 260, 105725. https://doi.org/10.1016/j.actpsy.2025.105725
Tiruneh, D. T., Verburgh, A., & Elen, J. (2014). Effectiveness of critical thinking instruction in higher education: A systematic review of intervention studies. Higher Education Studies, 4(1), 1–17. https://doi.org/10.5539/hes.v4n1p1
Tlili, A., Shehata, B., Adarkwah, M. A., Bozkurt, A., Hickey, D. T., Huang, R., & Agyemang, B. (2023). What if the devil is my guardian angel: ChatGPT as a case study of using chatbots in education. Smart Learning Environments, 10(1), Article 15. https://doi.org/10.1186/s40561-023-00237-x
*Toscano, R., Guerra, M. A., Durán-Ballén, S., & Valarezo, B. M. (2024). WIP - Development of critical thinking in AEC students aided by artificial intelligence. In 2024 IEEE Frontiers in Education Conference (FIE) (pp. 1–6). IEEE. https://doi.org/10.1109/FIE61694.2024.10893092
*Tressyalina, T., Ghaluh, B. M., Wulandari, E., Arief, E., & Noveria, E. (2025). Enhancing students’ critical thinking in criminal case solving: An AI-based pragmatic application for analyzing authentic Indonesian texts and videos. Interactive Learning Environments, 1–33. Advance online publication. https://doi.org/10.1080/10494820.2025.2504062
*Trikoili, A., Georgiou, D., Pappa, C. I., & Pittich, D. (2025). Critical Thinking Assessment in Higher Education: A Mixed-Methods Comparative Analysis of AI and Human Evaluator. International Journal of Human–Computer Interaction, 1–14. https://doi.org/10.1080/10447318.2025.2499164
van de Pol, J., Volman, M., & Beishuizen, J. (2010). Scaffolding in teacher–student interaction: A decade of research. Educational Psychology Review, 22(3), 271–296. https://doi.org/10.1007/s10648-010-9127-6
Vuogan, A., & Li, S. (2024). A systematic review of meta-analyses in second language research: current practices, issues, and recommendations. Applied Linguistics Review, 15(4), 1621–1644. https://doi.org/10.1515/applirev-2022-0192
Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.
*Wahba, F., Ajlouni, A. O., & Abumosa, M. A. (2024). The impact of ChatGPT-based learning statistics on undergraduates’ statistical reasoning and attitudes toward statistics. Eurasia Journal of Mathematics, Science and Technology Education, 20(7), em2468. https://doi.org/10.29333/ejmste/14726
*Waziana, W., Andewi, W., Wibisono, D., Hastomo, T., & Muslihudin, M. (2025). Exploring ChatGPT’s Impact on Critical, Creative, and Reflective Thinking Skills: A Mixed-Methods Study in an Indonesian EFL Classroom. Applied Research on English Language, 14(4), 77–114. http://doi.org/10.22108/are.2025.145896.2564
Wiredu, J. K., Zakaria, H., & Abuba, N. S. (2024). Impact of Generative AI in Academic Integrity and Learning Outcomes: A Case Study in the Upper East Region. Asian Journal of Research in Computer Science, 17(8), 70–88. https://doi.org/10.9734/ajrcos/2024/v17i7491
Winkler, R., & Sörensen, J. F. L. (2024). Artificial intelligence alone will not democratise education: On educational inequality, techno-solutionism and inclusive tools. Sustainability, 16(2), 781. https://doi.org/10.3390/su16020781
Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17(2), 89–100. https://doi.org/10.1111/j.1469-7610.1976.tb00381.x
*Wu, B., He, Y.-N., Song, Y., & Li, H.-H. (2025). Fostering critical thinking in higher education: An intelligent dialogue-based approach empowered by conversational AI. Interactive Learning Environments, 1–18. https://doi.org/10.1080/10494820.2025.2538750
Xin, Y., Hao, G., Zhu, H., Shen, J., Yang, Y., & Ghanbari-H., A. (2025). Poor reporting quality and high proportion of missing data in economic evaluations alongside pragmatic trials: A cross-sectional survey. BMC Medical Research Methodology, 25, 61. https://doi.org/10.1186/s12874-025-02519-
Xu, F., Gage, N., Zeng, S., Zhang, M., Iun, A., O’Riordan, M., & Kim, E. (2024). The Use of Digital Interventions for Children and Adolescents with Autism Spectrum Disorder—A Meta-Analysis. Journal of Autism and Developmental Disorders. https://doi.org/10.1007/s10803-024-06563-4
Zhang, L., & Al Shammari, H. (2025). Systematic literature review on critical thinking in higher education: Trends, measures and interventions. Learning Gate Journal of Educational Research, 12(1), 1–22. https://learning-gate.com/index.php/2576-8484/article/view/7377
*Zhang, Q., Siraj, S. B., & Abdul Razak, R. B. (2025). Effects of AI chatbots on EFL students’ critical thinking skills and intrinsic motivation in argumentative writing. Innovation in Language Learning and Teaching, 1–29. https://doi.org/10.1080/17501229.2025.2515111
*Zhang, Y., Lai, X., Yi, S., & Lu, Y. (2025). Does ChatGPT-based reading platform impact foreign language paper reading? Evidence from a quasi-experimental study on Chinese undergraduate students. Education and Information Technologies, 30(7), 9737–9754. https://doi.org/10.1007/s10639-024-13190-0
*Zhou, X., Teng, D., & Al-Samarraie, H. (2024). The Mediating Role of Generative AI Self-Regulation on Students’ Critical Thinking and Problem-Solving. Education Sciences, 14(12), 1302. https://doi.org/10.3390/educsci14121302
*Zou, D., Zhang, H., Zhao, Y., & Xu, P. (2025). Unleashing the potential: How ChatGPT improves gisting skills in student interpreters. The Interpreter and Translator Trainer, 19(1), 1–25. https://doi.org/10.1080/1750399X.2025.2507540
*Zou, X., Su, P., Li, L., & Fu, P. (2024). AI-generated content tools and students’ critical thinking: Insights from a Chinese university. IFLA Journal, 50(2), 228–241. https://doi.org/10.1177/03400352231214963

