Sequential Human-AI Collaboration Impairs Narrative Creativity in University Students: A Randomized Controlled Trial | Research Square
Haotian Liang, Dawei Zhang, Xianli An, Junjun Xu
This is a preprint; it has not been peer reviewed by a journal. https://doi.org/10.21203/rs.3.rs-8774138/v1 This work is licensed under a CC BY 4.0 License. Status: Under Review. Version 1 posted 4
Abstract The rise of generative AI (GenAI) has sparked interest in whether human-AI collaboration can enhance creativity, yet empirical findings remain mixed. This study systematically compared 4 collaboration models (i.e., AI-first, AI-follow, AI-parallel, and human-only) using a randomized controlled experiment involving 112 college students. Participants completed a two-stage narrative story-writing task. 
Creativity was assessed by both trained human raters (N = 10) and a validated AI model, while creative self-efficacy was measured before and after the task. A residualized linear model was used to control baseline differences and covariates. Results showed that sequential collaboration models (i.e., AI-first and AI-follow) significantly impaired narrative creativity relative to human-only creation, and AI-parallel collaboration did not improve creativity. None of the models improved creative self-efficacy. These findings suggest that GenAI may even disrupt key cognitive mechanisms underlying narrative creativity. Using a rigorous research design, we examine how different models of human-AI collaboration shape creativity. Implications for human-AI co-creation in education and design are discussed.
Subjects: Physical sciences/Mathematics and computing; Biological sciences/Neuroscience; Biological sciences/Psychology; Social science/Psychology
Keywords: Generative AI; Human-AI collaboration; sequential collaboration; synchronous collaboration; narrative creativity; creative self-efficacy
1 Introduction
Creativity is operationally defined as the production of outcomes that are both novel and contextually appropriate, stemming from the integration of domain-relevant skills, creativity-relevant processes, and intrinsic task motivation (Amabile, 1983). The growth of Generative AI (GenAI) has challenged the anthropocentric view of creativity (Seli et al., 2025), with Large Language Models (LLMs) demonstrating capabilities in pattern recombination and fluency that rival human performance (Heigl, 2025; Zhou & Lee, 2024). The creative potential of GenAI prompts growing interest in whether specific human-AI collaboration models can be designed to enhance creative outcomes. 
Human–AI collaboration refers to an interactive process in which humans and AI pursue shared goals through coordinated and evolving roles (Berretta et al., 2023 ; Zhao et al., 2025 ). Theoretically, GenAI can function as a potent creativity support tool. According to the Dual-Pathway Creativity Model (DPCM), creative ideation arises from the interaction of two cognitive pathways: cognitive flexibility (the breadth of categories and set-shifting) and cognitive persistence (the depth of exploration within a given category) (Nijstad et al., 2010 ). GenAI can increase associative variety and reduce the mechanical demands of elaboration, potentially supporting both flexibility and persistence (Park, 2026 ). However, empirical evidence regarding the efficacy of Human-AI collaboration is mixed. For example, Holzner et al. ( 2025 ) conducted a meta-analysis and reported modest improvements in overall creative performance when individuals collaborated with GenAI compared to working alone (Hedges’ g = 0.27) but also identified a substantial decrease in idea diversity (Hedges’ g = -0.86), suggesting a homogenizing effect. Although GenAI-generated ideas were rated as roughly equivalent to human-generated ones overall (Hedges’ g = -0.05, ns), effects varied depending on participant expertise, task type, and AI model, indicating that creative outcomes are highly context-dependent. One key differentiator among collaboration models is the temporal and structural relationship between human and AI contributions. These models differ primarily in whether human and AI contributions occur sequentially or in parallel. Human-AI sequential collaboration involves humans and AI contributing in a predetermined order, forming a serial process in which one party’s output becomes the other’s input (Zhao et al., 2025 ; Zou et al., 2025 ). In contrast, AI-parallel collaboration features real-time, reciprocal interaction between the human and AI. 
Sequential collaboration further divides into two widely studied subtypes: AI-first assistance, where users receive initial input from AI and then refine the output independently, suited for efficiency-driven or low-risk tasks; and AI-follow assistance, in which users first generate content independently and then integrate AI-generated enhancements, typically used in high-risk or originality-focused contexts where human agency is prioritized (Gomez et al., 2025 ). Despite growing interest in these models, empirical research comparing them remains inconsistent. Hosanagar and Ahn ( 2025 ), in a controlled creative writing experiment, found that human-led ideation (AI-follow) resulted in higher story quality and participant satisfaction than AI-led generation (AI-first), which also reduced content diversity through semantic convergence. Similarly, Lee et al. ( 2025 ) showed that when users retained decision-making authority after receiving AI input (i.e., AI-follow), creative breadth was preserved, whereas full delegation (i.e., AI-first) narrowed the search space. Wu et al. ( 2025 ), across four large-scale experiments, observed that although GenAI enhanced immediate task performance, it reduced intrinsic motivation and increased boredom in subsequent solo tasks—particularly after AI-first assistance. In contrast, Baltà-Salvador et al. ( 2026 ), using a crossover design in educational ideation tasks, reported that AI-first assistance improved flexibility in follow-up unaided tasks, suggesting a scaffolding effect under certain conditions. While these studies offer valuable insights, most have not directly compared multiple collaboration models within a unified experimental framework, and few have focused on narrative creativity, an open-ended task that provides a more ecologically valid and sensitive assessment of individual creative expression (Niloy et al., 2024 ). 
The current study addresses both gaps by systematically comparing AI-first assistance, AI-follow assistance, AI-parallel collaboration, and human-only creation within a multi-stage narrative creativity task. Beyond creative performance, collaboration structure may also shape Creative Self-Efficacy (CSE), individuals’ belief in their capacity to produce creative outcomes (Bandura, 1978 ). Although GenAI tools can lower the threshold for ideation and enhance user confidence (Li et al., 2024 ), these benefits depend on users maintaining a sense of authorship and control (Wang et al., 2024 ). Here, too, the evidence is mixed. Fu et al. ( 2025 ) found that AI-initiated music composition enhanced self-efficacy through perceived competence, whereas McGuire et al. ( 2024 ) reported that AI-initiated poetry writing significantly reduced CSE compared to human-initiated workflows. These findings suggest that when individuals are relegated to the role of “auxiliary editors” in the creative process, their sense of agency and creative confidence may be undermined. To clarify these inconsistencies, further research is needed that directly compares collaboration structures using ecologically valid tasks while assessing both creative output and self-belief. Together, the present study adopts an exploratory approach to investigate how distinct human-AI collaboration models affect both narrative creativity and creative self-efficacy. Using an ecologically valid, multi-stage story-writing task, we aim to provide a more nuanced understanding of how collaboration structure shapes creative outcomes. 2 Methods 2.1 Participants The study protocol was reviewed and approved by the Ethics Committee of Yangzhou University (Approval No. JKY-2024121611). All participants were recruited in December 2024 and were undergraduate students from Yangzhou University, representing a range of academic disciplines including the humanities, social sciences, STEM, agriculture, and medicine. 
Eligibility criteria required that participants had no reported language or writing impairments and had experience using GenAI tools. Participants were informed of their right to withdraw from the study at any point without penalty. To encourage engagement, participants received performance-based compensation ranging from 25 to 45 CNY. Informed consent was obtained from all participants. Following data screening for attrition and non-compliance, 112 valid cases remained, resulting in a total of 224 creative writing samples (first and final drafts combined). Participants were randomly assigned to one of four experimental conditions: (1) AI-first - Human-AI collaboration on the first draft followed by independent refinement (n = 28, 16 male); (2) AI-follow - Independent first draft followed by human-AI collaboration during refinement (n = 25, 17 male); (3) AI-parallel - Human-AI collaboration throughout both stages (n = 30, 17 male); (4) Human-only - Independent creation across both stages (n = 29, 14 male). A post hoc power analysis using G*Power 3.1 (α = .05, two-tailed) with N = 112, four groups, two covariates, and 3 numerator degrees of freedom yielded a power of .844. 2.2 Measures 2.2.1 Creativity Task To evaluate participants’ narrative creativity in an ecologically valid context, this study employed an open-ended story-writing task based on the prompt: “Imagine and depict the future world” (Niloy et al., 2024 ). This task was selected for several reasons. First, it closely reflects real-world scenarios where human-AI collaboration is primarily verbal, making it well-suited for assessing co-creative dynamics with generative language models. Second, it engages both divergent and convergent thinking, core components of the Dual Pathway to Creativity Model (Nijstad et al., 2010 ), by requiring the generation of novel ideas and the construction of coherent narratives. 
Third, its open-ended structure minimizes the risk of content homogenization often associated with AI-generated responses, allowing for greater individual expression and variability in creative style (Niloy et al., 2024 ). Participants in AI-involved conditions received the following standardized instructions: “In this stage, you need to create with the help of GenAI. You may use any tool such as ChatGPT or DeepSeek and are free to ask questions and incorporate AI-generated content. The task topic is ‘Imagine and depict the future world.’ Your writing should demonstrate novelty, idea richness, diverse perspectives, detailed description, logical coherence, and structural completeness. You have 15 minutes. The higher your work is rated, the more compensation you will receive.” Participants in the Human-only control condition received identical instructions, with the exception that they were explicitly asked to complete the task independently without using any AI tools. 2.2.2 Creative Self-Efficacy Creative self-efficacy was measured using the Student Creative Self-Efficacy Scale developed by Hong and Lin ( 2004 ). The scale comprises 17 items across three dimensions: beliefs in creative strategies, beliefs in producing creative products, and beliefs in coping with negative evaluations. Items were rated on a 4-point Likert scale, with higher scores indicating stronger creative self-efficacy. In the present study, internal consistency was acceptable, with Cronbach’s α = .76 at pre-test and .83 at post-test. 2.2.3 Creative Personality Creative personality was assessed using the Williams Creativity Aptitude Test (WCAT), originally developed by F. E. Williams and revised for Chinese populations by Lin and Wang ( 1994 ). It consists of 50 items across four dimensions: curiosity, imagination, challenge, and adventure. Items are rated on a 3-point Likert scale, with higher scores reflecting greater creative disposition. 
In the present study, the scale showed excellent internal consistency (Cronbach’s α = .87). 2.2.4 AI Literacy AI literacy was measured using the Meta AI Literacy Scale developed by Carolus et al. ( 2023 ). The scale consists of 34 items assessing four dimensions: general AI literacy, AI creation, AI self-efficacy, and AI self-competence. Responses were recorded on an 11-point Likert scale (0 = strongly disagree to 10 = strongly agree), with higher total scores indicating greater overall AI literacy. In the present study, the scale showed excellent internal consistency (Cronbach’s α = .93). 2.3 Narrative Creativity Evaluation Participants’ written narratives were evaluated along four core dimensions of creativity commonly used in verbal and story-based tasks: originality (novelty and uniqueness of ideas), fluency (quantity of ideas), flexibility (diversity of thematic categories), and elaboration (level of descriptive detail and logical structure). Each dimension was rated on a 9-point scale (1 = very poor, 9 = very good), with higher scores indicating stronger narrative creativity. 2.3.1 Human evaluation Ten undergraduate students (4 male, 6 female) from diverse academic backgrounds served as human raters. All had completed coursework in innovation or entrepreneurship and had received university-level awards for creative achievement. Each narrative was independently rated by all ten raters. To minimize bias, all narratives were anonymized and presented in randomized order. Raters were blind to experimental conditions and scored narratives using a standardized rubric. Prior to formal evaluation, raters completed a calibration session, which included a review of creativity constructs and operational definitions, refinement of scoring criteria, and practice rating of 15 randomly selected narratives to establish baseline agreement. 
Interrater reliability was assessed using the intraclass correlation coefficient (ICC), calculated via a two-way mixed-effects model with absolute agreement and average measures – appropriate for continuous ratings and multi-rater designs. Pre-evaluation reliability was high (ICC = 0.873, 95% CI [0.74, 0.95]), and consistency remained acceptable during formal evaluation (ICC = 0.722, 95% CI [0.48, 0.88]). 2.3.2 AI Evaluation To complement human scoring and assess cross-method reliability, narrative creativity was also evaluated using GenAI tools. Prior research has shown that large language models can produce creativity assessments consistent with expert human ratings across domains including educational writing (Zhang et al., 2024 , 2025 ), artistic generation (Chen et al., 2024 ), and decision-making tasks (Doshi et al., 2025 ). Seven AI models were initially tested for their ability to follow the narrative scoring protocol. Only DeepSeek-V3 and ChatGPT-4o demonstrated reliable adherence to the scoring instructions. Other models exhibited issues such as incomplete outputs, patterned or biased scoring, irrelevant rationales, or formatting inconsistencies. To assess scoring stability, DeepSeek-V3 and ChatGPT-4o re-scored all narratives after a one-week interval using identical prompts. DeepSeek-V3 demonstrated acceptable test-retest reliability (r = .547, p < .001), while ChatGPT-4o produced inconsistent results (r = − .034, p = .610). DeepSeek’s scores also correlated moderately with human evaluations - r = .342 (p < .001) for first drafts and r = .501 (p < .001) for final drafts. Based on these findings, DeepSeek-V3 was selected as the primary AI-based evaluator in this study. 2.4 Procedure This study followed a two-stage creative writing design adapted from Gomez et al. ( 2025 ) and Zhang and Gosline ( 2023 ). Each participant completed a first draft and a final draft, with each stage limited to 15 minutes. 
Participants were randomly assigned to one of four creation modes: (1) AI-first assistance, in which the first draft was completed through human-AI collaboration and the final draft independently; (2) AI-follow assistance, involving independent creation of the first draft followed by human-AI collaboration in the final draft; (3) AI-parallel assistance, where both drafts were developed collaboratively with AI; and (4) Human-only, where both drafts were written independently without AI (serving as the control condition). During AI-assisted stages, participants were allowed to choose their preferred AI tools (e.g., ChatGPT, DeepSeek), send prompts freely, and selectively adopt or reject AI-generated content. This open format was intended to approximate real-world, self-directed co-creation. In the final draft stage, participants were permitted to reference their first draft but were not allowed to modify it directly. Upon completion, both drafts were independently evaluated for creativity by trained human raters and by a validated AI model, with all evaluators blinded to experimental conditions. 2.5 Statistical Analysis Chi-square tests and ANOVA were used to test for differences in demographic variables, AI literacy, and creative tendencies among the four groups to determine whether potential confounding variables might interfere with the results. Additionally, an ANOVA was performed to compare first-draft creativity scores. Despite random group assignment, significant differences were found at baseline. In non-randomized or observational designs, baseline imbalance in outcome measures is commonly addressed using difference-score models, as conditioning on baseline may introduce bias when baseline values are systematically related to group membership. However, in the present randomized controlled trial, the observed baseline imbalance is most plausibly attributable to sampling variability rather than confounding. 
Accordingly, a residual (ANCOVA) model was adopted, with post-intervention outcomes analyzed while adjusting for baseline values. This decision was further supported by strong evidence of regression to the mean (arising from measurement error in the creativity evaluations), indicated by a significant negative association between baseline scores and change scores. Adjusting for baseline (i.e., a residual model) in this context mitigates regression to the mean and provides more accurate estimates of group effects. Prior to hypothesis testing, the dataset was screened for missing data and input accuracy. Five participants (4.5%) had missing values. A missing-at-random (MAR) assumption was supported by diagnostic tests. To retain statistical power and maintain balanced group sizes, missing data were addressed using Multiple Imputation by Chained Equations (MICE). Five imputed datasets were generated, yielding a final sample of N = 112 for all analyses. Assumptions of the General Linear Model were then evaluated. Multicollinearity diagnostics showed that all Variance Inflation Factors (VIF) were below 2, indicating no collinearity concerns. Residual normality was verified using Q-Q plots, while homoscedasticity was tested with the Breusch-Pagan test. Sensitivity analyses using Cook’s Distance identified and excluded influential outliers (D > 4/n) on a model-by-model basis to ensure robustness. The residual model was specified as follows:

$$Y_i = \beta_0 + \beta_1\,\mathrm{Group1}_i + \beta_2\,\mathrm{Group2}_i + \beta_3\,\mathrm{Group3}_i + \beta_4\,\mathrm{PreTest}_i + \beta_k\,\mathrm{Covariates}_i + \varepsilon_i$$

In this model, $Y_i$ represents the outcome variable (i.e., the human-rated or AI-rated creativity score) for participant $i$. The categorical variable Group was dummy-coded, with the Human-only condition serving as the reference category. Coefficients $\beta_1$, $\beta_2$, and $\beta_3$ reflect the estimated differences in post-test outcomes between each AI-assisted condition and the control group, controlling for pre-test scores ($\beta_4$) and covariates ($\beta_k$: age, gender, AI literacy, and creative personality). 
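To make the structure of the residual model concrete, the following is a minimal sketch of how such a dummy-coded ANCOVA could be fitted by ordinary least squares. It is not the authors' analysis code: the function names (`design_matrix`, `fit_residual_model`, `simulate`), the simulated data, and every coefficient value are illustrative assumptions, and the sketch omits the multiple imputation, robust estimation, and Cook's Distance screening described above.

```python
import numpy as np

# Dummy-coded conditions; Human-only is the omitted reference category.
AI_GROUPS = ["AI-first", "AI-follow", "AI-parallel"]


def design_matrix(group, pretest, covariates):
    """Columns: intercept, three group dummies, pre-test score, covariates."""
    group = np.asarray(group)
    dummies = np.column_stack([(group == g).astype(float) for g in AI_GROUPS])
    return np.column_stack([np.ones(len(group)), dummies, pretest, covariates])


def fit_residual_model(group, pretest, covariates, posttest):
    """Ordinary least squares fit; returns [b0, b1, b2, b3, b4, covariate betas]."""
    X = design_matrix(group, pretest, covariates)
    y = np.asarray(posttest, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def simulate(n=112, noise_sd=0.5, seed=0):
    """Hypothetical data generator; all coefficients below are made up."""
    rng = np.random.default_rng(seed)
    group = rng.choice(AI_GROUPS + ["Human-only"], size=n)
    pretest = rng.normal(5.0, 1.0, size=n)
    # Stand-ins for age, gender, AI literacy, and creative personality.
    covariates = rng.normal(size=(n, 4))
    true_beta = np.array([2.0, -0.8, -0.9, -0.1, 0.5, 0.1, 0.0, 0.2, 0.05])
    X = design_matrix(group, pretest, covariates)
    posttest = X @ true_beta + rng.normal(0.0, noise_sd, size=n)
    return group, pretest, covariates, posttest, true_beta


group, pretest, covariates, posttest, true_beta = simulate()
beta = fit_residual_model(group, pretest, covariates, posttest)
for name, b in zip(AI_GROUPS, beta[1:4]):
    print(f"{name} vs Human-only (baseline-adjusted difference): {b:+.2f}")
```

In this layout, the second through fourth entries of `beta` play the role of β₁ to β₃: each is the post-test difference between one AI-assisted condition and Human-only after adjusting for the pre-test score and covariates.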
3 Results
3.1 Demographic Variables
Participants had a mean age of 19.47 years (SD = 1.20), a mean creative personality score of 2.26 (SD = 0.19), and a mean AI literacy score of 5.44 (SD = 1.28). A chi-square test indicated no significant difference in gender distribution across the four groups, χ²(3, N = 112) = 2.14, p = .544. One-way ANOVAs revealed no significant group differences in age, F(3, 108) = 1.41, p = .244; creative personality, F(3, 108) = 0.08, p = .972; or AI literacy, F(3, 108) = 0.73, p = .537. These results indicate that the groups were demographically comparable (Table 1).

Table 1 Descriptive Statistics of Demographic Variables
Group         Proportion Male   Age     Creative Personality   AI Literacy
AI-first      0.57              19.25   2.25                   5.22
AI-follow     0.68              19.72   2.27                   5.54
AI-parallel   0.57              19.70   2.27                   5.68
Human-only    0.48              19.24   2.26                   5.30
Note. Values represent group means unless otherwise indicated. Proportion male refers to the proportion of male participants in each condition.

3.2 Narrative Creativity: Human-Rated Evaluation
We first examined group differences in overall creativity as assessed by human raters. Initial diagnostics identified seven influential cases based on Cook’s Distance; these were excluded. A robust ANCOVA revealed a significant main effect of group, F(3, 96) = 9.62, p < .001, η²ₚ = .38. Planned contrasts indicated that the Human-only group achieved a significantly higher adjusted mean score (M = 5.65, SE = 0.10) than both the AI-first group (M = 4.86, SE = 0.10) and the AI-follow group (M = 4.81, SE = 0.10). The effect sizes were large, with Human-only outperforming AI-first by 0.95 standard deviations (p = .002, d = -0.95) and AI-follow by 1.41 standard deviations (p < .001, d = -1.41). The difference between Human-only and AI-parallel (M = 5.17, SE = 0.10) was not statistically significant (p = .262, d = -0.33). 
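As a reading aid for the effect sizes reported with each omnibus test, partial eta squared can be related to an F statistic and its degrees of freedom. Because F = (SS_effect / df_effect) / (SS_error / df_error), it follows that η²ₚ = F·df_effect / (F·df_effect + df_error). The sketch below implements that textbook identity; the function name is an assumption, and the identity applies to a conventional fixed-effects F-test, so values obtained this way need not coincide with effect sizes produced by robust estimation procedures.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Convert F(df_effect, df_error) into partial eta squared.

    Uses eta_p^2 = SS_effect / (SS_effect + SS_error), rewritten in terms
    of F via SS_effect / SS_error = F * df_effect / df_error.
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    scaled = f_value * df_effect
    return scaled / (scaled + df_error)


# Example call using the degrees of freedom and F value of an omnibus test.
print(round(partial_eta_squared(9.62, 3, 96), 3))
```

The conversion only needs the three numbers reported in a standard F(df1, df2) statement, which makes it a convenient cross-check when raw sums of squares are unavailable.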
3.3 Robustness Check: AI-Rated Evaluation To assess the robustness of the findings, we repeated the analysis using creativity scores generated by an AI rater. Diagnostics identified eight influential cases, which were excluded. The ANCOVA again showed a significant main effect of group, F(3, 95) = 6.12, p < .001, η²ₚ = .16. Consistent with human ratings, the Human-only group (M = 6.40, SE = 0.20) scored significantly higher than both the AI-first group (M = 5.69, SE = 0.18; p = .005, d = -0.78) and the AI-follow group (M = 5.61, SE = 0.20; p = .017, d = -0.70). No significant difference was observed between Human-only and AI-parallel (M = 6.47, SE = 0.20; p = .515, d = 0.18). 3.4 Subdomain Analysis of Narrative Creativity To explore the cognitive mechanisms underlying the observed group differences, we conducted separate ANCOVAs for each creativity subdomain (originality, fluency, flexibility, elaboration). Influential cases were removed for each analysis. Omnibus p-values were adjusted using the Holm-Bonferroni method. The largest group effect was found in originality, F = 7.60, p < .001. The Human-only group significantly outperformed all other groups, with large differences compared to AI-first (p = .006, d = -0.87) and AI-follow (p < .001, d = -1.36), and a medium-to-large difference compared to AI-parallel (p = .036, d = -0.62). Significant effects were also observed for fluency (F = 5.66, p = .003) and elaboration (F = 7.10, p < .001). In both domains, the Human-only group scored significantly higher than AI-first (Fluency: d = -0.61; Elaboration: d = -0.67) and AI-follow (Fluency: d = -1.00; Elaboration: d = -1.24). No significant differences were found between Human-only and AI-parallel in either fluency (p = .867) or elaboration (p = .428). Although the omnibus test for flexibility was significant (F = 3.80, p = .013), effect sizes were smaller. 
The Human-only group performed marginally better than AI-follow (p = .085, d = -0.50), but differences with AI-first (p = .581) and AI-parallel (p = .111) were not statistically significant.
3.5 Creative Self-Efficacy
To assess whether changes in objective performance were mirrored by changes in students’ creative confidence, we analyzed post-test creative self-efficacy scores. After excluding 10 influential cases, the final sample was N = 102. The ANCOVA revealed no significant main effect of group, F(3, 93) = 2.29, p = .083, η²ₚ = .16. While pre-test self-efficacy strongly predicted post-test scores (p < .001), the experimental conditions did not significantly influence changes in creative self-efficacy. Given the non-significant omnibus result, no further pairwise comparisons were conducted.
4 Discussion
This study examined how different human-AI collaboration models influence narrative creativity and creative self-efficacy in a two-stage story-writing task. Across both human and AI evaluations, the findings consistently revealed that sequential collaboration, whether in the form of AI-first or AI-follow assistance, undermined creativity relative to human-only creation, while AI-parallel collaboration (real-time co-creation) offered no significant advantage. Furthermore, none of the collaboration models led to improvements in creative self-efficacy. The DPCM, which posits that creativity arises from the interplay of cognitive persistence and cognitive flexibility, offers an explanation for the creative deficits observed in sequential collaboration. In this study, both human and AI evaluations consistently showed that sequential collaboration significantly reduced narrative creativity relative to human-only creation. 
This pattern aligns with prior studies across multiple domains, which show that AI-first workflows tend to reduce creators’ sense of ownership and narrative control (Wu et al., 2025 ), impair content quality and diversity (Hosanagar & Ahn, 2025 ), and narrow the scope of exploration by over-relying on early AI-generated ideas (Lee et al., 2025 ). From a DPCM perspective, sequential collaboration disrupts cognitive persistence by forcing process handovers and mode switching, increasing cognitive load, fragmenting attention, and undermining intrinsic motivation (Siddiqui et al., 2025 ). In parallel, it impairs flexibility through mechanisms such as cognitive anchoring in AI-first conditions (Chen & Chan, 2024 ), stylistic homogenization in AI-follow outputs (Radwan et al., 2024 ), and algorithm aversion or confirmation bias that cause users to disregard divergent AI suggestions (Hwang et al., 2024 ). Together, these disruptions help explain the consistent creativity decrements observed under sequential collaboration. Although real-time AI-parallel collaboration is often presumed to enhance creativity through dynamic feedback and fluid interaction, the present study found no significant improvement in narrative creativity compared to human-only creation. This finding aligns with a growing body of research suggesting that human-AI collaboration, even when synchronous, does not reliably outperform solo human effort in open-ended tasks. Meta-analytic evidence shows that collaboration offers, at best, marginal gains with no consistent advantage over the better-performing party (Vaccaro et al., 2024 ), while experimental work by Luan et al. ( 2025 ) revealed that only human-only conditions yielded cumulative increases in creative quality across multiple rounds, with AI-assisted groups showing stagnant performance. These outcomes suggest that greater AI involvement does not inherently translate into greater creative value. 
The absence of benefit in the AI-parallel condition may reflect added cognitive load: although AI-parallel workflows theoretically promote cognitive flexibility through ongoing idea generation and associative prompts, they also impose continuous demands on users to manage the interaction, for example, formulating prompts, interpreting outputs, and deciding which content to incorporate. These demands can disrupt focus, fragment the creative flow, and diminish cognitive persistence. Moreover, in the absence of structured guidance, users may default to low-effort engagement patterns, such as prompt-copy cycles, resulting in shallow rather than synergistic co-creation. While consistent with prior findings, this study offers several methodological advances. Unlike earlier work that focused on isolated modes, we systematically compared AI-first, AI-follow, AI-parallel, and human-only creation within a unified experimental design. Creativity was evaluated using both trained human raters and a validated AI model, ensuring cross-method validation in a domain where assessment is often subjective. Additionally, the use of an open-ended, multi-stage story-writing task provided higher ecological validity than common divergent thinking or short-form tasks, reflecting the complexity of real-world creative expression. These design improvements strengthen the reliability and generalizability of the findings. These methodological improvements (i.e., direct model comparison, a more ecologically valid narrative creativity task, and dual evaluation) also help explain inconsistencies in previous findings. Many past studies differed in task structure, evaluation focus, or collaboration timing, factors that may account for conflicting results. For example, creativity gains observed in structured or short-form tasks (e.g., Guo et al., 2025; Huang et al., 2026) may reflect the suitability of AI as an optimization tool, rather than a true co-creative partner. 
Other studies separated human and AI input phases artificially or failed to control for interaction quality and user expertise (e.g., Chen & Chan, 2024 ; Zhang & Gosline, 2023 ), limiting their ecological validity. By comparing all major collaboration modes under naturalistic, open-ended conditions and using multi-source evaluation, the present study offers a more concrete account of when and why AI fails to enhance or even hinders creativity. This study found that neither sequential nor parallel human-AI collaboration significantly improved creative self-efficacy compared to human-only creation. While GenAI may reduce entry barriers and offer immediate feedback, these advantages do not necessarily translate into increased confidence in one’s creative ability. According to Bandura’s ( 1978 ) self-efficacy theory, perceived competence is primarily shaped through successes attributed to one’s own efforts. In collaborative settings where AI generates or refines content, creators may perceive themselves as passive contributors rather than agents of their own success. This diluted sense of ownership likely limits the formation of internal attributions necessary to strengthen self-efficacy. The findings offer practical implications for GenAI integration in writing instruction. In educational settings, AI may be better suited as a feedback or refinement tool rather than a co-creator. Sequential collaboration, where GenAI intervenes early or late as a creator, was found to impair creativity in this study, suggesting that reliance on GenAI may hinder student agency and originality. Educators should consider structuring GenAI use to support rather than direct the creative process, preserving students’ ownership and engagement. This study has several limitations. First, the sample was restricted to college students, and age has been shown to moderate the effects of GenAI on creativity (Ma et al., 2025 ). Future research could involve more diverse samples. 
Second, the study focused on a single unstructured narrative creativity task. Future work could compare outcomes across creative types (e.g., artistic vs. problem-solving) and components (e.g., divergent vs. convergent thinking). Third, the current design did not track real-time cognitive and affective changes during collaboration. Future research should explore how GenAI influences users’ mental states, including motivation and engagement. Lastly, while the study permitted flexible user-AI interactions, it did not control or categorize the specific modes of interaction. As recent work suggests that interaction patterns significantly shape collaboration outcomes (Zhang et al., 2025), future research should examine how different styles of human-AI engagement moderate creative performance.

This study provides systematic evidence that the structure of human-AI collaboration significantly shapes creative outcomes. In a multi-stage narrative writing task, sequential collaboration, whether AI-first or AI-follow, consistently impaired creativity relative to human-only creation, while real-time AI-parallel collaboration offered no significant advantage. Moreover, none of the collaboration models improved creative self-efficacy. These findings challenge the assumption that AI assistance reliably enhances creativity across tasks and underscore the importance of collaboration design. By employing a fully crossed experimental framework, an ecologically valid task, and multi-source evaluation, this study clarifies when and why GenAI may fail to enhance, or even hinder, creativity. Future work should continue to refine human-AI workflows to ensure that technological support enhances rather than replaces the human creative process.

Declarations

Competing Interests
The authors declare no competing interests relevant to the content of this article.

Ethics Approval
This study was approved by the relevant Institutional Review Board.
All procedures were conducted in accordance with the Declaration of Helsinki. All identifying details have been anonymized for blind review.

Author Contribution
H.L. was responsible for conceptualizing the study, designing the research framework, conducting the experiments, performing data analysis, and drafting the manuscript. D.Z. was responsible for designing the experiments, performing data analysis, reviewing and editing the manuscript, and supervising the project. X.A. was responsible for conceptualizing the study and supervising the project. J.X. was responsible for conducting experiments, data curation, visualization, and reviewing and editing the manuscript. All the authors have read and agreed to the published version of the manuscript.

Data Availability
The datasets and materials supporting this study will be made available in the supplementary files accompanying the manuscript. Additional data can be obtained from the corresponding author upon reasonable request.

References
Amabile TM (1983) The social psychology of creativity: A componential conceptualization. J Personal Soc Psychol 45(2):357–376. https://doi.org/10.1037/0022-3514.45.2.357
Baltà-Salvador R, Brasó-Vives E, Peña M (2026) Evaluating AI-assisted creative ideation: A crossover study in higher education. Think Skills Creativity 59:101958. https://doi.org/10.1016/j.tsc.2025.101958
Bandura A (1978) Self-efficacy: Toward a unifying theory of behavioral change. Advances in Behaviour Research and Therapy, Perceived Self-Efficacy: Analyses of Bandura’s Theory of Behavioural Change, 1(4), 139–161. https://doi.org/10.1016/0146-6402(78)90002-4
Berretta S, Tausch A, Ontrup G, Gilles B, Peifer C, Kluge A (2023) Defining human-AI teaming the human-centered way: A scoping review and network analysis. Frontiers in Artificial Intelligence, 6. https://doi.org/10.3389/frai.2023.1250725
Carolus A, Koch MJ, Straka S, Latoschik ME, Wienrich C (2023) MAILS - meta AI literacy scale: Development and testing of an AI literacy questionnaire based on well-founded competency models and psychological change- and meta-competencies. Computers Hum Behavior: Artif Hum 1(2):100014. https://doi.org/10.1016/j.chbah.2023.100014
Chen J, An J, Lyu H, Kanan C, Luo J (2024) Learning to Evaluate the Artness of AI-Generated Images. IEEE Trans Multimedia 26:10731–10740. https://doi.org/10.1109/TMM.2024.3410672
Chen Z, Chan J (2024) Large language model in creative work: The role of collaboration modality and user expertise. Manage Sci 70(12):9101–9117. https://doi.org/10.1287/mnsc.2023.03014
Doshi AR, Bell JJ, Mirzayev E, Vanneste BS (2025) Generative artificial intelligence and evaluating strategic decisions. Strateg Manag J 46(3):583–610. https://doi.org/10.1002/smj.3677
Fu Y, Newman M, Going L, Feng Q, Lee JH (2025) Exploring the collaborative Co-creation process with AI: A case study in novice music production. Proceedings of the 2025 ACM Designing Interactive Systems Conference, 1298–1312. DIS ’25: Designing Interactive Systems Conference. https://doi.org/10.1145/3715336.3735829
Gomez C, Cho SM, Ke S, Huang C-M, Unberath M (2025) Human-AI collaboration is not very collaborative yet: A taxonomy of interaction patterns in AI-assisted decision making from a systematic review. Frontiers in Computer Science, 6. https://doi.org/10.3389/fcomp.2024.1521066
Guo A, Sathyanarayanan S, Wang L, Heer J, Zhang A (2025) From pen to prompt: How creative writers integrate AI into their writing practice. Proceedings of the 2025 Conference on Creativity and Cognition, 527–545. https://doi.org/10.1145/3698061.3726910
Heigl R (2025) Generative artificial intelligence in creative contexts: A systematic review and future research agenda. Manage Rev Q. https://doi.org/10.1007/s11301-025-00494-9
Holzner N, Maier S, Feuerriegel S (2025) Generative AI and creativity: A systematic literature review and meta-analysis (arXiv:2505.17241). arXiv. https://doi.org/10.48550/arXiv.2505.17241
Hong SP, Lin SR (2004) Whatever you say, I can do it: Development of the Student Creative Self-Efficacy Scale. In Proceedings of Taiwan’s 2nd Symposium on Innovation and Creativity. Taiwan
Hosanagar K, Ahn D (2025) Designing human and generative AI collaboration (arXiv:2412.14199). arXiv. https://doi.org/10.48550/arXiv.2412.14199
Huang S, Long L, Zhu Y, Zhu JNY (2026) Human–GenAI collaboration across creative phases: Cognitive mechanisms shaping novelty and usefulness. Int J Inf Manag 86:102986. https://doi.org/10.1016/j.ijinfomgt.2025.102986
Hwang AH-C, Liao QV, Blodgett SL, Olteanu A, Trischler A (2024) ‘It was 80% me, 20% AI’: Seeking authenticity in Co-writing with large language models (arXiv:2411.13032). arXiv. https://doi.org/10.48550/arXiv.2411.13032
Lee J, Kim JS, Shin S, You S (2025) The impact of AI-human collaboration models on creativity: A study on search processes and decision-making dynamics. SAIS 2025 Proceedings. https://aisel.aisnet.org/sais2025/20
Li H, Wang Y, Qu H (2024) Where are we so far? Understanding data storytelling tools from the perspective of human-AI collaboration. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, CHI ’24, 1–19. https://doi.org/10.1145/3613904.3642726
Lin HT, Wang MR (1994) Creativity Assessment Packet (CAP): Revised Chinese version. Psychology Publishing Company, Ltd., Taipei, Taiwan
Luan L, Kim YJ, Zhou J (2025) Augmented learning for joint creativity in human-GenAI co-creation. https://doi.org/10.17863/CAM.122297
Ma H, Zhang Y, Shan X, Hu X (2025) Exploring the impact of artificial intelligence on the creativity perception of music practitioners. J Intell 13(4):47. https://doi.org/10.3390/jintelligence13040047
McGuire J, De Cremer D, Van de Cruys T (2024) Establishing the importance of co-creation and self-efficacy in creative collaboration with artificial intelligence. Sci Rep 14(1):18525. https://doi.org/10.1038/s41598-024-69423-2
Nijstad BA, De Dreu CKW, Rietzschel EF, Baas M (2010) The dual pathway to creativity model: Creative ideation as a function of flexibility and persistence. European Review of Social Psychology. https://doi.org/10.1080/10463281003765323
Niloy AC, Akter S, Sultana N, Sultana J, Rahman SIU (2024) Is ChatGPT a menace for creative writing ability? An experiment. J Comput Assist Learn 40(2):919–930. https://doi.org/10.1111/jcal.12929
Park MJ (2026) AI as a cognitive collaborator: Assimilation and accommodation in human–machine teaming for innovation. J Innov Knowl 12:100892. https://doi.org/10.1016/j.jik.2025.100892
Radwan AY, Alasmari KM, Abdulbagi OA, Alghamdi EA (2024) SARD: A human-AI collaborative story generation (arXiv:2403.01575). arXiv. https://doi.org/10.48550/arXiv.2403.01575
Seli P, Ragnhildstveit A, Orwig W, Bellaiche L, Spooner S, Barr N (2025) Beyond the brush: Human versus artificial intelligence creativity in the realm of generative art. Psychol Aesthet Creativity Arts. https://doi.org/10.1037/aca0000743
Siddiqui MN, Pea RD, Subramonyam H (2025) Script&shift: A layered interface paradigm for integrating content development and rhetorical strategy with LLM writing assistants. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, CHI ’25, 1–19. https://doi.org/10.1145/3706598.3714119
Vaccaro M, Almaatouq A, Malone T (2024) When combinations of humans and AI are useful: A systematic review and meta-analysis. Nat Hum Behav 1–11. https://doi.org/10.1038/s41562-024-02024-1
Wang S, Wang F, Zhu Z, Wang J, Tran T, Du Z (2024) Artificial intelligence in education: A systematic literature review. Expert Syst Appl 252:124167. https://doi.org/10.1016/j.eswa.2024.124167
Wu S, Liu Y, Ruan M, Chen S, Xie X-Y (2025) Human-generative AI collaboration enhances task performance but undermines human’s intrinsic motivation. Sci Rep 15(1):15105. https://doi.org/10.1038/s41598-025-98385-2
Zhang D-W, Boey M, Tan YY, Jia AHS (2024) Evaluating large language models for criterion-based grading from agreement to consistency. Npj Sci Learn 9(1):79. https://doi.org/10.1038/s41539-024-00291-1
Zhang D-W, Hong X, Qi Y (2025) When and how does LLM-generated feedback surpass traditional automated writing evaluation? A learning trajectory analysis of writing improvement (Fkx3v_v1). PsyArXiv. https://doi.org/10.31234/osf.io/fkx3v_v1
Zhang Y, Gosline R (2023) Human favoritism, not AI aversion: People’s perceptions (and bias) toward generative AI, human experts, and human–GAI collaboration in persuasive content generation. Judgm Decis Mak 18:e41. https://doi.org/10.1017/jdm.2023.37
Zhao M, Simmons R, Admoni H (2025) The role of adaptation in collective human–AI teaming. Top Cogn Sci 17(2):291–323. https://doi.org/10.1111/tops.12633
Zhou E, Lee D (2024) Generative artificial intelligence, human creativity, and art. PNAS Nexus 3(3):pgae052. https://doi.org/10.1093/pnasnexus/pgae052
Zou HP, Huang W-C, Wu Y, Chen Y, Miao C, Nguyen H, Zhou Y, Zhang W, Fang L, He L, Li Y, Li D, Jiang R, Liu X, Yu PS (2025) LLM-based human-agent collaboration and interaction systems: A survey (arXiv:2505.00753). arXiv. https://doi.org/10.48550/arXiv.2505.00753

Additional Declarations
No competing interests reported.
Supplementary Files
DataSPSS.sav
researchtools.pdf

Status: Under Review
Editorial decision: Revision requested, 10 Feb, 2026
Editor assigned by journal, 10 Feb, 2026
Submission checks completed at journal, 07 Feb, 2026
First submitted to journal, 03 Feb, 2026

Author affiliations
Haotian Liang, Xianli An, Junjun Xu: Yangzhou University
Dawei Zhang (corresponding author): Monash University Malaysia

Figure legends
Figure 1. Comparison of mean creativity scores between first and final drafts across four experimental groups (Human evaluation). The four groups are: AI-first, AI-follow, AI-parallel, and Human-only.
Figure 2. Comparison of mean creativity scores between first and final drafts across four experimental groups (AI evaluation). Creativity was evaluated using the DeepSeek-V3 model. Groups are as defined in Fig 1.
Figure 3. Human-evaluated creativity subscale scores: (a) Originality, (b) Fluency, (c) Flexibility, (d) Elaboration.
Figure 4. AI-evaluated creativity subscale scores: (a) Originality, (b) Fluency, (c) Flexibility, (d) Elaboration.
Figure 5. Changes in creative self-efficacy scores from pre-test to post-test across the four experimental groups.
Based on these findings, DeepSeek-V3 was selected as the primary AI-based evaluator in this study.\u003c/p\u003e \u003c/div\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e2.4 Procedure\u003c/h2\u003e \u003cp\u003eThis study followed a two-stage creative writing design adapted from Gomez et al. (\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2025\u003c/span\u003e) and Zhang and Gosline (\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Each participant completed a first draft and a final draft, with each stage limited to 15 minutes. Participants were randomly assigned to one of four creation modes: (1) AI-first assistance, in which the first draft was completed through human-AI collaboration and the final draft independently; (2) AI-follow assistance, involving independent creation of the first draft followed by human-AI collaboration in the final draft; (3) AI-parallel assistance, where both drafts were developed collaboratively with AI; and (4) Human-only, where both drafts were written independently without AI (serving as the control condition).\u003c/p\u003e \u003cp\u003eDuring AI-assisted stages, participants were allowed to choose their preferred AI tools (e.g., ChatGPT, DeepSeek), send prompts freely, and selectively adopt or reject AI-generated content. This open format was intended to approximate real-world, self-directed co-creation. In the final draft stage, participants were permitted to reference their first draft but were not allowed to modify it directly. 
Upon completion, both drafts were independently evaluated for creativity by trained human raters and by a validated AI model, with all evaluators blinded to experimental conditions.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e2.5 Statistical Analysis\u003c/h2\u003e \u003cp\u003eChi-square tests and ANOVAs were used to check whether the four groups differed in demographic variables, AI literacy, or creative personality, any of which could confound the results.\u003c/p\u003e \u003cp\u003eAdditionally, an ANOVA was performed to compare first-draft creativity scores. Despite random group assignment, significant differences were found at baseline. In non-randomized or observational designs, baseline imbalance in outcome measures is commonly addressed using difference-score models, as conditioning on baseline may introduce bias when baseline values are systematically related to group membership. However, in the present randomized controlled trial, the observed baseline imbalance is most plausibly attributable to sampling variability rather than confounding. Accordingly, a residual (ANCOVA) model was adopted, in which post-intervention outcomes were analyzed while adjusting for baseline values. This decision was further supported by strong evidence of regression to the mean (a consequence of measurement error in the creativity evaluations), indicated by a significant negative association between baseline scores and change scores. Adjusting for baseline in this context mitigates regression to the mean and provides more accurate estimates of group effects.\u003c/p\u003e \u003cp\u003ePrior to hypothesis testing, the dataset was screened for missing data and input accuracy. Five participants (4.5%) had missing values. A missing-at-random (MAR) assumption was supported by diagnostic tests. 
To retain statistical power and maintain balanced group sizes, missing data were addressed using Multiple Imputation by Chained Equations (MICE). Five imputed datasets were generated, yielding a final sample of N\u0026thinsp;=\u0026thinsp;112 for all analyses.\u003c/p\u003e \u003cp\u003eAssumptions of the General Linear Model were then evaluated. Multicollinearity diagnostics showed that all Variance Inflation Factors (VIFs) were below 2, indicating no collinearity concerns. Residual normality was verified using Q-Q plots, while homoscedasticity was tested with the Breusch-Pagan test. To ensure robustness, influential cases were identified using Cook\u0026rsquo;s Distance (D\u0026thinsp;\u0026gt;\u0026thinsp;4/n) and excluded on a model-by-model basis.\u003c/p\u003e \u003cp\u003eOur residual model was specified as follows:\u003c/p\u003e \u003cp\u003eY\u003csub\u003ei\u003c/sub\u003e\u0026thinsp;=\u0026thinsp;β₀ + β₁Group1\u003csub\u003ei\u003c/sub\u003e + β₂Group2\u003csub\u003ei\u003c/sub\u003e + β₃Group3\u003csub\u003ei\u003c/sub\u003e + β₄PreTest\u003csub\u003ei\u003c/sub\u003e + βₖCovariates\u003csub\u003ei\u003c/sub\u003e + ε\u003csub\u003ei\u003c/sub\u003e\u003c/p\u003e \u003cp\u003eIn this model, Y\u003csub\u003ei\u003c/sub\u003e represents the outcome variable (i.e., the human-rated or AI-rated creativity score). The categorical variable Group was dummy-coded, with the Human-only condition serving as the reference category. 
Coefficients β₁, β₂, and β₃ reflect the estimated differences in post-test outcomes between each AI-assisted condition and the control group, controlling for pre-test scores (β₄) and covariates (age, gender, AI literacy, and creative personality).\u003c/p\u003e \u003c/div\u003e"},{"header":"3 Results","content":"\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Demographic Variables\u003c/h2\u003e \u003cp\u003eParticipants had a mean age of 19.47 years (SD\u0026thinsp;=\u0026thinsp;1.20), a mean creative personality score of 2.26 (SD\u0026thinsp;=\u0026thinsp;0.19), and a mean AI literacy score of 5.44 (SD\u0026thinsp;=\u0026thinsp;1.28). A chi-square test indicated no significant difference in gender distribution across the four groups, χ\u0026sup2;(3, N\u0026thinsp;=\u0026thinsp;112)\u0026thinsp;=\u0026thinsp;2.14, p = .544. One-way ANOVAs revealed no significant group differences in age, F(3, 108)\u0026thinsp;=\u0026thinsp;1.41, p = .244; creative personality, F(3, 108)\u0026thinsp;=\u0026thinsp;0.08, p = .972; or AI literacy, F(3, 108)\u0026thinsp;=\u0026thinsp;0.73, p = .537. 
These results indicate that the groups were comparable on gender, age, creative personality, and AI literacy (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eDescriptive Statistics of Demographic Variables\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"5\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eGroup\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eProportion Male\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAge\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eCreative Personality\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eAI Literacy\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAI-first\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e19.25\u003c/p\u003e \u003c/td\u003e \u003ctd 
align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2.25\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e5.22\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAI-follow\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.68\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e19.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2.27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e5.54\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eAI-parallel\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e19.70\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2.27\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e5.68\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eHuman-only\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.48\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e19.24\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e2.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e5.30\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"5\"\u003eNote. 
Values represent group means unless otherwise indicated. Proportion male refers to the percentage of male participants in each condition.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Narrative Creativity: Human-Rated Evaluation\u003c/h2\u003e \u003cp\u003eWe first examined group differences in overall creativity as assessed by human raters. Initial diagnostics identified seven influential cases based on Cook\u0026rsquo;s Distance; these were excluded.\u003c/p\u003e \u003cp\u003eA robust ANCOVA revealed a significant main effect of group, F(3, 96)\u0026thinsp;=\u0026thinsp;9.62, p \u0026lt; .001, η\u0026sup2;ₚ = .38. Planned contrasts indicated that the Human-only group achieved a significantly higher adjusted mean score (M\u0026thinsp;=\u0026thinsp;5.65, SE\u0026thinsp;=\u0026thinsp;0.10) than both the AI-first group (M\u0026thinsp;=\u0026thinsp;4.86, SE\u0026thinsp;=\u0026thinsp;0.10) and the AI-follow group (M\u0026thinsp;=\u0026thinsp;4.81, SE\u0026thinsp;=\u0026thinsp;0.10). The effect sizes were large, with Human-only outperforming AI-first by 0.95 standard deviations (p = .002, d = -0.95) and AI-follow by 1.41 standard deviations (p \u0026lt; .001, d = -1.41). The difference between Human-only and AI-parallel (M\u0026thinsp;=\u0026thinsp;5.17, SE\u0026thinsp;=\u0026thinsp;0.10) was not statistically significant (p = .262, d = -0.33).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Robustness Check: AI-Rated Evaluation\u003c/h2\u003e \u003cp\u003eTo assess the robustness of the findings, we repeated the analysis using creativity scores generated by an AI rater. 
Diagnostics identified eight influential cases, which were excluded.\u003c/p\u003e \u003cp\u003eThe ANCOVA again showed a significant main effect of group, F(3, 95)\u0026thinsp;=\u0026thinsp;6.12, p \u0026lt; .001, η\u0026sup2;ₚ = .16. Consistent with human ratings, the Human-only group (M\u0026thinsp;=\u0026thinsp;6.40, SE\u0026thinsp;=\u0026thinsp;0.20) scored significantly higher than both the AI-first group (M\u0026thinsp;=\u0026thinsp;5.69, SE\u0026thinsp;=\u0026thinsp;0.18; p = .005, d = -0.78) and the AI-follow group (M\u0026thinsp;=\u0026thinsp;5.61, SE\u0026thinsp;=\u0026thinsp;0.20; p = .017, d = -0.70). No significant difference was observed between Human-only and AI-parallel (M\u0026thinsp;=\u0026thinsp;6.47, SE\u0026thinsp;=\u0026thinsp;0.20; p = .515, d\u0026thinsp;=\u0026thinsp;0.18).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e3.4 Subdomain Analysis of Narrative Creativity\u003c/h2\u003e \u003cp\u003eTo explore the cognitive mechanisms underlying the observed group differences, we conducted separate ANCOVAs for each creativity subdomain (originality, fluency, flexibility, elaboration). Influential cases were removed for each analysis. Omnibus p-values were adjusted using the Holm-Bonferroni method.\u003c/p\u003e \u003cp\u003eThe largest group effect was found in originality, F\u0026thinsp;=\u0026thinsp;7.60, p \u0026lt; .001. The Human-only group significantly outperformed all other groups, with large differences compared to AI-first (p = .006, d = -0.87) and AI-follow (p \u0026lt; .001, d = -1.36), and a medium-to-large difference compared to AI-parallel (p = .036, d = -0.62).\u003c/p\u003e \u003cp\u003eSignificant effects were also observed for fluency (F\u0026thinsp;=\u0026thinsp;5.66, p = .003) and elaboration (F\u0026thinsp;=\u0026thinsp;7.10, p \u0026lt; .001). 
In both domains, the Human-only group scored significantly higher than AI-first (Fluency: d = -0.61; Elaboration: d = -0.67) and AI-follow (Fluency: d = -1.00; Elaboration: d = -1.24). No significant differences were found between Human-only and AI-parallel in either fluency (p = .867) or elaboration (p = .428).\u003c/p\u003e \u003cp\u003eAlthough the omnibus test for flexibility was significant (F\u0026thinsp;=\u0026thinsp;3.80, p = .013), effect sizes were smaller. The Human-only group performed marginally better than AI-follow (p = .085, d = -0.50), but differences with AI-first (p = .581) and AI-parallel (p = .111) were not statistically significant.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec19\" class=\"Section2\"\u003e \u003ch2\u003e3.5 Creative Self-Efficacy\u003c/h2\u003e \u003cp\u003eTo assess whether changes in objective performance were mirrored by changes in students\u0026rsquo; creative confidence, we analyzed post-test creative self-efficacy scores. After excluding 10 influential cases, the final sample was N\u0026thinsp;=\u0026thinsp;102.\u003c/p\u003e \u003cp\u003eThe ANCOVA revealed no significant main effect of group, F(3, 93)\u0026thinsp;=\u0026thinsp;2.29, p = .083, η\u0026sup2;ₚ = .16. While pre-test self-efficacy strongly predicted post-test scores (p \u0026lt; .001), the experimental conditions did not significantly influence changes in creative self-efficacy. Given the non-significant omnibus result, no further pairwise comparisons were conducted.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"4 Discussion","content":"\u003cp\u003eThis study examined how different human-AI collaboration models influence narrative creativity and creative self-efficacy in a two-stage story-writing task. 
Across both human and AI evaluations, the findings consistently revealed that sequential collaboration, whether in the form of AI-first or AI-follow assistance, undermined creativity relative to human-only creation, while AI-parallel collaboration (real-time co-creation) offered no significant advantage. Furthermore, none of the collaboration models led to improvements in creative self-efficacy.\u003c/p\u003e \u003cp\u003eThe Dual Pathway to Creativity Model (DPCM), which posits that creativity arises from the interplay of cognitive persistence and cognitive flexibility, offers an explanation for the creative deficits observed in sequential collaboration. In this study, both human and AI evaluations consistently showed that sequential collaboration significantly reduced narrative creativity relative to human-only creation. This pattern aligns with prior studies across multiple domains, which show that AI-first workflows tend to reduce creators\u0026rsquo; sense of ownership and narrative control (Wu et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2025\u003c/span\u003e), impair content quality and diversity (Hosanagar \u0026amp; Ahn, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e2025\u003c/span\u003e), and narrow the scope of exploration through over-reliance on early AI-generated ideas (Lee et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). From a DPCM perspective, sequential collaboration disrupts cognitive persistence by forcing process handovers and mode switching, increasing cognitive load, fragmenting attention, and undermining intrinsic motivation (Siddiqui et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). 
In parallel, it impairs flexibility through mechanisms such as cognitive anchoring in AI-first conditions (Chen \u0026amp; Chan, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), stylistic homogenization in AI-follow outputs (Radwan et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), and algorithm aversion or confirmation bias that cause users to disregard divergent AI suggestions (Hwang et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Together, these disruptions help explain the consistent creativity decrements observed under sequential collaboration.\u003c/p\u003e \u003cp\u003eAlthough real-time AI-parallel collaboration is often presumed to enhance creativity through dynamic feedback and fluid interaction, the present study found no significant improvement in narrative creativity compared to human-only creation. This finding aligns with a growing body of research suggesting that human-AI collaboration, even when synchronous, does not reliably outperform solo human effort in open-ended tasks. Meta-analytic evidence shows that collaboration offers, at best, marginal gains with no consistent advantage over the better-performing party (Vaccaro et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), while experimental work by Luan et al. (\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2025\u003c/span\u003e) revealed that only human-only conditions yielded cumulative increases in creative quality across multiple rounds, with AI-assisted groups showing stagnant performance. These outcomes suggest that greater AI involvement does not inherently translate into greater creative value. 
The absence of benefit in the AI-parallel condition may reflect added cognitive load: although AI-parallel workflows theoretically promote cognitive flexibility through ongoing idea generation and associative prompts, they also impose continuous demands on users to manage the interaction, for example, formulating prompts, interpreting outputs, and deciding which content to incorporate. These demands can disrupt focus, fragment the creative flow, and diminish cognitive persistence. Moreover, in the absence of structured guidance, users may default to low-effort engagement patterns, such as prompt-copy cycles, resulting in shallow rather than synergistic co-creation.\u003c/p\u003e \u003cp\u003eWhile consistent with prior findings, this study offers several methodological advances. Unlike earlier work that focused on isolated modes, we systematically compared AI-first, AI-follow, and AI-parallel collaboration with human-only creation within a unified experimental design. Creativity was evaluated using both trained human raters and a validated AI model, ensuring cross-method validation in a domain where assessment is often subjective. Additionally, the use of an open-ended, multi-stage story-writing task provided higher ecological validity than common divergent thinking or short-form tasks, reflecting the complexity of real-world creative expression. These design improvements strengthen the reliability and generalizability of the findings.\u003c/p\u003e \u003cp\u003eThese methodological improvements (i.e., direct model comparison, a more ecologically valid narrative creativity task, and dual evaluation) help explain inconsistencies in previous findings. Many past studies differed in task structure, evaluation focus, or collaboration timing \u0026ndash; factors that may explain conflicting results. 
For example, creativity gains observed in structured or short-form tasks (e.g., Guo et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2025\u003c/span\u003e; Huang et al., \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2026\u003c/span\u003e) may reflect the suitability of AI as an optimization tool, rather than a true co-creative partner. Other studies separated human and AI input phases artificially or failed to control for interaction quality and user expertise (e.g., Chen \u0026amp; Chan, \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Zhang \u0026amp; Gosline, \u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e2023\u003c/span\u003e), limiting their ecological validity. By comparing all major collaboration modes under naturalistic, open-ended conditions and using multi-source evaluation, the present study offers a more concrete account of when and why AI fails to enhance or even hinders creativity.\u003c/p\u003e \u003cp\u003eThis study found that neither sequential nor parallel human-AI collaboration significantly improved creative self-efficacy compared to human-only creation. While GenAI may reduce entry barriers and offer immediate feedback, these advantages do not necessarily translate into increased confidence in one\u0026rsquo;s creative ability. According to Bandura\u0026rsquo;s (\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e1978\u003c/span\u003e) self-efficacy theory, perceived competence is primarily shaped through successes attributed to one\u0026rsquo;s own efforts. In collaborative settings where AI generates or refines content, creators may perceive themselves as passive contributors rather than agents of their own success. This diluted sense of ownership likely limits the formation of internal attributions necessary to strengthen self-efficacy.\u003c/p\u003e \u003cp\u003eThe findings offer practical implications for GenAI integration in writing instruction. 
In educational settings, AI may be better suited as a feedback or refinement tool rather than a co-creator. Sequential collaboration, where GenAI intervenes early or late as a creator, was found to impair creativity in this study, suggesting that reliance on GenAI may hinder student agency and originality. Educators should consider structuring GenAI use to support rather than direct the creative process, preserving students\u0026rsquo; ownership and engagement.\u003c/p\u003e \u003cp\u003eThis study has several limitations. First, the sample was restricted to college students, and age has been shown to moderate the effects of GenAI on creativity (Ma et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2025\u003c/span\u003e). Future research could involve more diverse samples. Second, the study focused on a single unstructured narrative creativity task. Future work could compare outcomes across creative types (e.g., artistic vs. problem-solving) and components (e.g., divergent vs. convergent thinking). Third, the current design did not track real-time cognitive and affective changes during collaboration. Future research should explore how GenAI influences users\u0026rsquo; mental states, including motivation and engagement. Lastly, while the study permitted flexible user-AI interactions, it did not control or categorize the specific modes of interaction. As recent work suggests that interaction patterns significantly shape collaboration outcomes (Zhang et al., \u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e2025\u003c/span\u003e), future research should examine how different styles of human-AI engagement moderate creative performance.\u003c/p\u003e \u003cp\u003eThis study provides systematic evidence that the structure of human-AI collaboration significantly shapes creative outcomes. 
In a multi-stage narrative writing task, sequential collaboration \u0026ndash; whether AI-first or AI-follow \u0026ndash; consistently impaired creativity relative to human-only creation, while real-time AI-parallel collaboration offered no significant advantage. Moreover, none of the collaboration models improved creative self-efficacy. These findings challenge the assumption that AI assistance reliably enhances creativity across tasks and underscore the importance of collaboration design. By employing a fully crossed experimental framework, an ecologically valid task, and multi-source evaluation, this study clarifies when and why GenAI may fail to enhance, or may even hinder, creativity. Future work should continue to refine human-AI workflows to ensure that technological support enhances rather than replaces the human creative process.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e \u003ch2\u003eCompeting Interests\u003c/h2\u003e \u003cp\u003eThe authors declare no competing interests relevant to the content of this article.\u003c/p\u003e \u003c/p\u003e \u003cp\u003e \u003cstrong\u003eEthics Approval\u003c/strong\u003e \u003cp\u003e This study was approved by the relevant Institutional Review Board. All procedures were conducted in accordance with the Declaration of Helsinki. All identifying details have been anonymized for blind review.\u003c/p\u003e \u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eH.L. was responsible for conceptualizing the study, designing the research framework, conducting the experiments, performing data analysis, and drafting the manuscript. D.Z.\u0026nbsp;was responsible for designing the experiments, performing data analysis, reviewing and editing the manuscript, and supervising the project. X.A.\u0026nbsp;was responsible for conceptualizing the study and supervising the project. J.X. was responsible for conducting experiments, data curation, visualization, and reviewing and editing the manuscript. 
All the authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets and materials supporting this study will be made available in the supplementary files accompanying the manuscript. Additional data can be obtained from the corresponding author upon reasonable request.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":false,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"humanities-and-social-sciences-communications","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"palcomms","sideBox":"Learn more about [Humanities \u0026 Social Sciences Communications](http://www.nature.com/palcomms/)","snPcode":"41599","submissionUrl":"https://submission.springernature.com/new-submission/41599/3","title":"Humanities and Social Sciences Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Nature AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":false},"keywords":"Generative AI, Human-AI collaboration, sequential collaboration, synchronous collaboration, narrative creativity, creative self-efficacy","lastPublishedDoi":"10.21203/rs.3.rs-8774138/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-8774138/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe rise of generative AI (GenAI) has sparked interest in whether human-AI collaboration can enhance creativity, yet empirical findings remain mixed. This study systematically compared 4 collaboration models (i.e., AI-first, AI-follow, AI-parallel, and human-only) using a randomized controlled experiment involving 112 college students. Participants completed a two-stage narrative story-writing task. Creativity was assessed by both trained human raters (N\u0026thinsp;=\u0026thinsp;10) and a validated AI model, while creative self-efficacy was measured before and after the task. A residualized linear model was used to control baseline differences and covariates. Results showed that sequential collaboration models (i.e., AI-first and AI-follow) significantly impaired narrative creativity relative to human-only creation, and AI-parallel collaboration did not improve creativity. None of the models improved creative self-efficacy. 
These findings suggest that GenAI may even disrupt key cognitive mechanisms underlying narrative creativity. Using a rigorous research design, we examine how different models of human-AI collaboration shape creativity. Implications for human-AI co-creation in education and design are discussed.\u003c/p\u003e","manuscriptTitle":"Sequential Human-AI Collaboration Impairs Narrative Creativity in University Students: A Randomized Controlled Trial","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-02-05 09:20:21","doi":"10.21203/rs.3.rs-8774138/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-02-10T11:34:19+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-02-10T11:22:11+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-02-07T11:20:07+00:00","index":"","fulltext":""},{"type":"submitted","content":"Humanities and Social Sciences Communications","date":"2026-02-03T09:13:49+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"humanities-and-social-sciences-communications","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"palcomms","sideBox":"Learn more about [Humanities \u0026 Social Sciences Communications](http://www.nature.com/palcomms/)","snPcode":"41599","submissionUrl":"https://submission.springernature.com/new-submission/41599/3","title":"Humanities and Social Sciences Communications","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Nature AJ","inReviewEnabled":true,"inReviewRevisionsEnabled":false}}],"origin":"","ownerIdentity":"ae819a1f-c90e-4185-93e6-c8c8b7ee2aaa","owner":[],"postedDate":"February 5th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":62228102,"name":"Physical sciences/Mathematics and computing"},{"id":62228103,"name":"Biological sciences/Neuroscience"},{"id":62228104,"name":"Biological sciences/Psychology"},{"id":62228105,"name":"Social science/Psychology"}],"tags":[],"updatedAt":"2026-02-12T01:08:29+00:00","versionOfRecord":[],"versionCreatedAt":"2026-02-05 09:20:21","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-8774138","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-8774138","identity":"rs-8774138","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}