AI-Driven Conversion of Textbooks into Long-Form Educational Videos

doi:10.21203/rs.3.rs-6379057/v1

2025 · doi:10.21203/rs.3.rs-6379057/v1

preprint OA: closed


Research Article

Martin Yanev, Jurgen Kola

This is a preprint; it has not been peer reviewed by a journal.

This work is licensed under a CC BY 4.0 License.

Status: Posted (Version 1)

Abstract

The integration of AI into educational content creation offers transformative opportunities for dynamic and interactive learning. VidAIQ is an innovative AI application that converts static book content into long-form educational videos using OpenAI's APIs for audio, text, and multimedia processing. The system leverages Python libraries such as MoviePy for video editing while using AI Assistants to structure text into video scripts, generate visuals with DALL-E and Vision AI, and create live coding demonstrations. These elements are compiled into structured, engaging tutorials that enhance knowledge retention.
Our analysis showed that VidAIQ outperformed traditional human-made content, reducing production time from 24 hours to 3.75 hours per four-hour video tutorial, a sixfold efficiency boost. AI-generated content cost just $5.04 per video, compared to $192 for human-produced tutorials. Engagement metrics collected over 30 days showed that AI-driven videos maintained similar viewership but with higher watch times, indicating slightly higher engagement. User preferences leaned towards AI-generated content, with a higher like-to-view ratio of 5.61%, outperforming both the 4.85% for human videos and the industry average of 2.28%. These results demonstrate AI’s ability to create compelling educational experiences with similar or higher overall performance compared to human-generated video content. Future research is needed to explore performance trends over extended timeframes. This study demonstrates how AI-driven technologies can enhance the speed and scale of producing high-quality tutorials without compromising engagement or educational effectiveness. It achieves a remarkable balance of cost-effectiveness, allowing educational resources to be scaled to meet diverse learning needs.

Keywords: Artificial Intelligence and Machine Learning, Software Engineering, AI video production, AI-driven learning, Python, MoviePy, AI Assistants, DALL-E, Vision AI, educational video automation, EdTech

Introduction

The impact of Artificial Intelligence on reshaping education in the digital age continues to grow with every new technological advancement. Traditional educational paradigms, long reliant on static modes of information dissemination, are being transformed by technological innovations that promise to make learning more interactive, personalized, and widely accessible (Rana AlShaikh, 2024).
Turning textual material into dynamic videos that combine AI-generated voice, visuals, and interactive elements is part of this shift. This not only aligns with the need for varied learning tools that cater to diverse cognitive styles but also taps into the vast potential of AI to synthesize complex information efficiently (Pellas, 2023). The growing prevalence of AI-enhanced video content production is grounded in the principles of the Cognitive Theory of Multimedia Learning (CTML), which underscores the benefits of integrating multiple forms of media in promoting better learning. As this theory suggests, learners can derive greater meaning when presented with a combination of text and imagery, a methodology that is inherently supportive of video-based learning formats (Rana AlShaikh, 2024).

Despite these advantages, there remains a significant gap in harnessing the full potential of AI to convert static educational resources into engaging video content. This is particularly pertinent given the underutilization of existing online educational videos, despite their widespread availability (Rana AlShaikh, 2024). Addressing this gap involves advancing AI technologies to not only automate video generation processes but also ensure they are contextually relevant and pedagogically effective. Moreover, the deployment of technologies such as Large Language Models (LLMs) and AI-based content transformation systems has illuminated new possibilities for educational innovation. These technologies can automate the meticulous processes of content scanning and transformation, letting educators and learners focus on engagement and comprehension rather than mere content consumption (Pengyuan Zhou, 2024).

This study leverages publicly available OpenAI APIs to develop an innovative application, called VidAIQ, capable of generating AI-driven educational video tutorials.
The application is designed to produce long-form tutorials exceeding four hours in duration, aligning with the structure of traditional online courses. Implemented in Python, the system integrates Large Language Models (LLMs), vision models, and text-to-speech (TTS) technology, utilizing established libraries such as MoviePy to transform written content into engaging educational videos. The generated videos incorporate a structured three-component design, combining slides, graphical illustrations, and real-time code demonstrations. Additionally, we ensure dynamic transitions and variations in visual elements to maintain viewer engagement.

To evaluate the effectiveness of AI-generated educational videos, we conducted a comparative analysis against human-produced long-format courses. Key engagement metrics, including view count, watch time, and revenue generation, were analyzed to assess the viability of AI-generated content in comparison to traditional instructional videos. This assessment provides insights into the potential of AI-powered educational content and its ability to compete with manually produced learning materials.

Related work

Educational content delivery has been significantly impacted by the integration of artificial intelligence, particularly in the creation of video materials. One study highlights the shift towards using AI-generated content in learning environments, finding that AI tools can produce video content that rivals human-made videos in terms of educational outcomes. This research also emphasizes the cost-effectiveness and rapid production capabilities of AI systems, suggesting a trend towards their increased adoption in digital education settings (Torbjørn Netland, 2024). Another critical evaluation compared the use of AI-generated versus paper-based educational materials. It focused on the respective effects of these media on learning outcomes.
The findings underscore that while learners responded positively to video content for engagement and knowledge retention, there was no significant variance in comprehension when compared to traditional materials. This reveals how AI-generated content, while novel and engaging, might require further refinement to fully replace established educational materials (Yidi Zhang, 2024). Moreover, studies point out that AI can synthesize various forms of media—such as text, visuals, and audio—into cohesive learning materials that cater to diverse learning preferences. This approach not only supports improved learning experiences but also democratizes access to high-quality educational resources. Advanced AI systems can personalize these experiences, tailoring content to meet individual learning needs more effectively than traditional methods can (Torbjørn Netland, 2024). The effectiveness of AI in educational content delivery has also been studied extensively in the context of recommendation systems, particularly in platforms like YouTube, which use AI to tailor educational videos based on user engagement and behaviors. This personalized approach not only enhances the user's content experience but also aligns educational delivery with learners’ unique needs, thereby supporting diverse learning journeys across global platforms (Faycal Farhi, 2022). This reflects a broader trend in educational technology, wherein AI acts as a catalyst for personalized learning, offering tailored educational experiences that were previously unattainable with traditional static learning modes (Faycal Farhi, 2022). Studies indicate that AI-based systems can improve educational experiences by adapting to unique student needs, boosting engagement and effectiveness. Moreover, these adaptive systems leverage AI algorithms to create learning pathways that enhance the pedagogical process, a trend that is increasingly mirrored across various educational platforms (Aymane Ezzaim, 2022). 
AI-driven technologies like machine learning and deep learning can improve performance by incorporating multi-layered data for video pattern recognition and information classification over time (A. Jayanthiladevi, 2020).

The “teacherbot” project at the University of Edinburgh demonstrated how AI can effectively mediate learning in massive open online courses (MOOCs). It indicated a paradigm where automation is utilized not simply for substituting teacher presence but for enhancing pedagogical processes through interactive, algorithm-driven instruction. Designed as automated agents within MOOCs, teacherbots embody the fusion of human and technological interactions, serving as co-tutors and prompting profound critical discourse among students about the pedagogical value of AI interventions in education (Bayne, 2015).

Deep learning approaches, such as Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), and Recurrent Neural Networks (RNNs), have become pivotal in addressing challenges like human action recognition, object detection, and behavior analysis (Vijeta Sharma, 2019). Moreover, Artificial Intelligence in Education (AIEd) provides personalized pathways through the integration of adaptive systems, big data analytics, and intelligent algorithms. AIEd leverages dynamic modeling to create individualized learning profiles, enabling the delivery of tailored educational materials that cater to the strengths, weaknesses, and preferences of each learner. Intelligent tutoring systems and virtual environments further enhance this personalized approach by providing real-time feedback and engaging, immersive experiences (Olga Tapalova, 2022).

AI technologies have revolutionized content creation in the educational domain by enabling the automated generation of high-quality and contextually relevant materials.
For instance, OpenAI’s suite of APIs, including DALL-E and ChatGPT, offers powerful tools for creating visually rich and interactive educational content. DALL-E specializes in generating detailed images based on text prompts, making it particularly valuable for designing visual aids in presentations and instructional materials. By integrating these APIs with frameworks like Python's PPTX library, developers can automate the creation of customized educational resources, such as slideshows and video assets, tailored to specific learning outcomes. These advancements simplify the production process and enhance the accessibility and scalability of personalized learning experiences (Yanev, 2024).

METHODOLOGY & IMPLEMENTATION

This section outlines the methodology used in the development and implementation of VidAIQ. It details the processes involved in converting static educational texts into AI-generated videos, focusing on the application design, information extraction, and video generation techniques employed to enhance the learning experience.

A. Application Design

VidAIQ is conceptualized as a transformative tool that converts static educational text into dynamic, AI-generated videos, enhancing the learning experience through interactive and modern teaching methodologies. The application is structured in a way that integrates AI and multimedia processing to produce comprehensive video tutorials.

Fig. 1 presents the user interface of the developed system. The primary functionality allows users to browse their database for a book and specify the desired number of videos to be generated. The processing time of the AI engine increases with the number of selected videos. Each generated video has a default duration of approximately 10 minutes. Two supplementary parameters control the video's structure and presentation style. The video type parameter enables users to choose between generating a single, continuous tutorial or a series of separate course modules.
The total duration of the generated tutorial is directly proportional to the number of selected videos. For example, selecting 50 videos results in approximately 500 minutes (over eight hours) of AI-generated instructional content. The teaching style parameter adjusts the AI-generated narration, allowing users to select between a formal, lecture-style delivery similar to a professor addressing students or a more casual, YouTube-style tutorial format. Upon selecting the "Generate Video" button, the system processes the input book and parameters through its backend components. The main architecture of the application is illustrated in Fig. 2. The book, along with user-defined parameters, is first handled by the Book Reader module. Given the computational limitations of AI models in processing large documents in a structured manner, the system segments the book into smaller sections based on the number of chapters and headings. Our experimental results indicate that breaking the document into smaller units enhances the AI’s ability to extract and organize relevant information, which is critical for generating coherent educational content. Additionally, the Figure Extraction Module identifies and extracts all figures from the book, associates them with their respective chapter numbers and pages, and stores them in a designated directory. Once the book has been segmented, these sections are sequentially processed by AI modules. The Script Extractor Module converts the text into a structured video script using an AI Assistant, ensuring the content aligns with an instructional format. For technical books, particularly those related to programming and software engineering, the Code Extractor Module retrieves relevant code snippets and reformats them using a large language model (LLM) to ensure clarity and correctness. All extracted components are then organized into a structured Python dictionary, facilitating efficient retrieval and processing. 
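The chapter-based segmentation performed by the Book Reader module can be illustrated with a minimal sketch. It assumes the book has already been converted to plain text and that each chapter heading starts a line with the word "Chapter" followed by a number. The regular expression and the function name `segment_by_chapter` are illustrative assumptions, not VidAIQ's actual implementation.

```python
import re

# Assumed heading format: a line beginning with "Chapter <number>".
CHAPTER_RE = re.compile(r"^(Chapter\s+\d+.*)$", re.MULTILINE)


def segment_by_chapter(text):
    """Split plain book text into sections, one per detected chapter heading."""
    matches = list(CHAPTER_RE.finditer(text))
    sections = []
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append({
            "title": match.group(1).strip(),
            "text": text[match.end():end].strip(),
        })
    return sections


sample = "Preface\nChapter 1: Intro\nHello.\nChapter 2: ML\nWorld."
sections = segment_by_chapter(sample)
```

Text that precedes the first detected heading (the "Preface" line in the sample) is dropped in this sketch; a real pipeline would need to decide how to treat front matter.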
Each video segment is generated iteratively by passing the extracted data to the Video Components Builder modules. These modules process the text and figures to generate AI-driven voiceovers, slides, code demonstrations, and figure presentations. Integrated AI APIs, including LLMs and Vision models such as OpenAI's gpt-4o-mini, facilitate the generation of these elements. Once all components for a given video are prepared, they are assembled using the Video Assembler Module, which compiles the final educational video. The process is repeated iteratively for each video in the series: components are generated, compiled into a video, then cleared before proceeding to the next segment, optimizing resource management and execution efficiency.

B. Book Segmentation

In the early stages of the project, we explored the approach of using the entire book as input and instructing the AI models to generate video titles based on the user-specified number of videos. The objective was to assign each generated title to a specific range of pages within the book. However, this approach proved inefficient, as the AI struggled to maintain a coherent sequence of pages for each assigned title. In educational video production, it is essential to ensure that the content is organized in a logical progression, from foundational concepts to more advanced topics. If we had followed this initial strategy, the resulting videos could have drawn content from disparate sections of the book or potentially overlapped, undermining the continuity and structure required for effective learning.

The inefficiencies observed in our initial approach stemmed from intrinsic limitations in how the AI model processes and retrieves information, particularly when interfaced with a vector store. As depicted in Fig. 3, book data is transformed into a complex web of vector embeddings stored within a vector database.
This transformation prioritizes semantic similarity over sequential accuracy, which inadvertently complicates the alignment of video titles with specific book sections. The vector store retrieves information based on proximity in the semantic space rather than the physical location within the text, leading to potential overlaps or disjointed sequencing in content allocation. Such issues highlight a critical mismatch between the AI’s semantic retrieval abilities and the need for a linear, pedagogical structure in educational video creation. The figure illustrates this process, emphasizing the gap between the need for maintaining educational flow and the semantic-driven design of vector storage, which is not inherently equipped to handle sequential integrity. This necessitates a more refined strategy that incorporates pre-structuring and hierarchical retrieval to ensure that the developed educational content maintains coherence and educational value. Given the limitations of the initial approach, we opted to pre-segment the book into logical sections prior to processing it with the AI system. To maintain both topic consistency and coherence, and to avoid splitting content in the middle of a topic, we implemented a compromise by dividing the book based on its chapters and titles, rather than segmenting it purely by page count for each video. The book was thus organized into discrete sections, with a function designed to evenly allocate the required number of videos across these sections. This approach minimized potential coherence errors and ensured the integrity of the topics within each video, facilitating a more structured and consistent educational experience. 
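The even allocation of videos across sections described above can be sketched as a round-robin count followed by a flattening step that preserves book order. The function name `allocate_videos` and its signature are assumptions for illustration; the source does not show the actual allocation function.

```python
def allocate_videos(sections, num_videos):
    """Return one section title per video, in book order.

    Videos are handed out one per section in turn; the cycle repeats
    until every requested video has been assigned.
    """
    if not sections:
        raise ValueError("at least one book section is required")
    counts = [0] * len(sections)
    for i in range(num_videos):
        counts[i % len(sections)] += 1
    # Flatten the counts so that videos follow the order of the book.
    return [title for title, n in zip(sections, counts) for _ in range(n)]


chapters = [
    "Chapter 1: Introduction to AI",
    "Chapter 2: Machine Learning Basics",
    "Chapter 3: Neural Networks",
    "Chapter 4: Deep Learning for Computer Vision",
]
plan = allocate_videos(chapters, 6)
```

With six videos and four sections, the first two chapters receive two videos each and the last two receive one each, so earlier sections absorb any remainder.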
Table 1: Distribution of Generated Videos Across Book Segments

Video   Book Segment (Chapter/Title)
1       Chapter 1: Introduction to AI
2       Chapter 1: Introduction to AI
3       Chapter 2: Machine Learning Basics
4       Chapter 2: Machine Learning Basics
5       Chapter 3: Neural Networks
6       Chapter 4: Deep Learning for Computer Vision

To illustrate the relationship between the generated videos and the book segments, we present an example of how the videos are distributed across different sections of the book. As shown in Table 1, given that the total number of videos (6) exceeds the number of sections (4), the videos are distributed so that the sections are represented as evenly as possible. Initially, one video is assigned to every section. The process is then repeated, with each section receiving a second video until the complete list of videos is consumed; in this example, Chapters 1 and 2 receive two videos each and Chapters 3 and 4 receive one each. This distribution strategy ensures balanced coverage of all sections while preserving a logical flow throughout the content.

C. The AI Assistant

The AI assistant serves as the core component of our application, enabling efficient extraction of structured information from books. Unlike conventional request-response interactions with AI models, AI assistants are advanced software-driven systems capable of querying text documents while providing precise references to extracted information. In our implementation, we utilize OpenAI’s gpt-4o-mini assistant to process book chapters and extract essential components for video generation. The assistant operates on the OpenAI cloud, allowing multiple queries throughout the video production pipeline. To facilitate seamless interaction, we encapsulated the assistant's functionality within a dedicated function, read_book(), which dynamically receives specific instructions to extract relevant data. This modular design ensures flexibility, allowing the assistant to adapt to different extraction tasks based on user-defined input parameters.
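One way a task-specific instruction string could be assembled before it is handed to read_book() is sketched below. The function name `build_script_instruction`, its parameters, the `num_scenes` setting, and the prompt wording are all hypothetical; only the idea of passing instructions that depend on the video title and the chosen teaching style comes from the text.

```python
# Hypothetical mapping from the user's teaching-style choice to a prompt hint.
STYLE_HINTS = {
    "lecture": "a formal, lecture-style delivery, like a professor addressing students",
    "youtube": "a casual, YouTube-style tutorial tone",
}


def build_script_instruction(video_title, teaching_style="lecture", num_scenes=5):
    """Build the text instruction for extracting one video script."""
    if teaching_style not in STYLE_HINTS:
        raise ValueError(f"unknown teaching style: {teaching_style!r}")
    return (
        f"Using only the uploaded book segment, write a video script for a "
        f"tutorial titled '{video_title}'. Split it into {num_scenes} numbered "
        f"scenes and use {STYLE_HINTS[teaching_style]}."
    )


instruction = build_script_instruction("Neural Networks", teaching_style="youtube")
```

Keeping the prompt construction in a plain function like this makes each extraction task testable without calling the assistant.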
The process of creating and utilizing the AI Assistant involves the following steps:

1. Assistant Creation: If an assistant is not yet created, it is instantiated with the instruction: "You are an expert at extracting information from books."
2. Book Upload and Vectorization: The book is uploaded, and a vector store is generated to enable efficient querying.
3. Query Execution: The assistant is provided with specific instructions to extract the required information from the vector storage.
4. Assistant Deletion: Once the required information is extracted and recorded, the assistant for the book segment is deleted.

To enhance adaptability, we introduced a novel Instruction Building Function methodology, which dynamically incorporates custom variables into the assistant’s queries. These functions generate instructions based on specific video generation requirements, ensuring that the assistant retrieves content relevant to each video segment. For instance, as demonstrated in Fig. 4, an instruction-building function can take a video title as input, construct a structured query for generating a corresponding video script, and subsequently send it to the assistant. This approach enables precise content extraction for the needs of AI-generated educational videos, improving the relevance and coherence of the output.

D. Data Structures

As illustrated in Fig. 2, VidAIQ leverages AI assistants to extract relevant information from book segments and structure it into a Python dictionary. Fig. 5 presents the data structure used to organize all video-related information. Each video is assigned a unique title based on its topic and includes a reference to the corresponding book segment from which the content is derived. Since each user-requested video consists of multiple scenes, the dictionary is structured to reflect this hierarchy. Every video entry contains a transcript dictionary, where scenes are identified by their respective scene numbers.
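A minimal sketch of how one scene entry in such a dictionary might be filled from an assistant response, with narration separated from fenced code. The key names mirror the fields this section describes (script, code, language); the helper name `split_script_and_code`, the sample values, and the regular expression are assumptions. Recovery of code blocks with missing backticks is not attempted here.

```python
import re

# Build the triple-backtick marker programmatically to keep this listing readable.
FENCE = "`" * 3
BLOCK_RE = re.compile(FENCE + r"(\w+)?\n(.*?)" + FENCE, re.DOTALL)


def split_script_and_code(response):
    """Split a response into parallel lists of narration and code snippets."""
    scripts, codes, language = [], [], None
    pos = 0
    for match in BLOCK_RE.finditer(response):
        # Narration preceding the block is paired with the block at the same index.
        scripts.append(response[pos:match.start()].strip())
        codes.append(match.group(2).strip())
        language = language or match.group(1)
        pos = match.end()
    tail = response[pos:].strip()
    if tail:
        scripts.append(tail)
        codes.append("")  # trailing narration has no associated code
    return {"script": scripts, "code": codes, "language": language}


response = "Intro text.\n" + FENCE + "python\nprint(1)\n" + FENCE + "\nOutro."
video = {
    "title": "Introduction to AI",
    "segment": "Chapter 1: Introduction to AI",
    "transcript": {1: split_script_and_code(response)},
}
```

Storing script and code as index-aligned lists keeps each explanation next to the snippet it introduces, which is the pairing the narration and code display steps rely on.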
Each scene includes a script field, which serves as the basis for audio narration during video generation. To further enhance customization, when generating the script, we pass the user’s preferred teaching style into the request prompt. Additionally, to accommodate educational content that includes programming examples, we incorporate a code field that captures relevant snippets, if present. The language field specifies the programming language associated with the code. Both script and code are stored as lists to maintain the correct association between explanatory text and its corresponding code snippet. This structured format ensures a seamless synthesis of narration and code presentation within the generated video content.

When working with AI-generated content, it is important to distinguish between explanatory text and code snippets. To achieve this, we use a structured approach to extract and format both elements separately. The method relies on detecting code blocks enclosed within triple backticks (```), ensuring that code is properly separated from textual explanations. The process involves scanning the response using a pattern that identifies these backticks and extracting the enclosed code. The remaining text is then split accordingly to ensure that explanatory sections remain distinct from the code. Additionally, if any code blocks are missing backticks, the method attempts to reconstruct the missing sections by analyzing the surrounding text. This structured extraction ensures that explanatory text and code in AI-generated content can be reliably separated.

E. Video Elements Generation

Once all relevant information is stored in our structured dictionary, we utilize it to generate the necessary video components, which are subsequently assembled into a complete video.

1) Audio Generation

To transform the generated video script into high-quality narration, we employ OpenAI’s Text-to-Speech (TTS) model, which converts text into an MP3 audio file.
The model provides a selection of natural-sounding voices, allowing us to specify a preferred voice during the request. Audio files are generated iteratively for each video scene, ensuring synchronization with the scene’s content. To maintain organization and facilitate seamless assembly, the MP3 filenames include both the scene number and the video title, enabling precise identification of each audio segment.

2) Slides Generation

To enhance engagement and maintain visual diversity, we designed a collection of over 30 high-resolution slide templates for our educational videos. Each template follows a unique design to ensure variety throughout the video. To populate these slides with scene-specific content, we developed a Python function that leverages AI to generate key slide elements, including titles, bullet points, and structured text. The scene script serves as input, and AI generates the appropriate slide components. To facilitate automated extraction of these elements from the AI-generated response, we implemented a structured format where each element is prefixed with a special symbol (e.g., “~”), as illustrated in Fig. 6.

In addition, we wanted to ensure that our slides included relevant images to enhance the visual appeal and clarity of the videos. To achieve this, we employed two primary approaches: AI-generated images using DALL-E and an icon search tool called Freepik. DALL-E is an advanced AI model developed by OpenAI that generates high-quality images based on textual descriptions. Freepik, on the other hand, is a widely used platform that provides access to a vast collection of high-quality icons, vector graphics, and stock images. Through its API, we can search for and integrate relevant icons and illustrations that complement the generated images and enhance the slides. To determine the most appropriate image for each scene, we utilized an LLM to generate relevant search terms based on the scene script.
This allowed us to automate the process of selecting appropriate keywords for both AI-generated images and icon retrieval. As an output, we received direct links to the most relevant images and icons, ensuring that every visual element in the slides aligns with the content of the video scene.

After extracting the slide metadata using the methodologies described above, we employed the Python library Pillow to render the content onto the predefined slide templates, as illustrated in Fig. 7. Given that each slide template contained distinct text and graphical elements, we developed a dedicated Python function to dynamically position these elements based on the template structure. While the selection of slide templates was randomized to introduce variety, we implemented a tracking mechanism to ensure that no template was repeated within the same video, maintaining visual diversity and coherence.

3) Figures Selection

To enhance the visual component of our application, we incorporate figures and images using two primary approaches:

- Local images extracted from the book segment
- Web images retrieved using the Google Search API

We prioritize local images from the book segment, as they are directly relevant to the content. However, in cases where a book contains few or no images, we supplement the visuals with web-sourced images. To achieve this, we utilize the Google Search API to retrieve images based on a keyword that is dynamically generated using AI. This AI-driven keyword selection ensures that the retrieved images align with the specific scene’s topic. Once an image is selected, it is resized and formatted to fit within the video.

Fig. 8 demonstrates how we leverage the OpenAI Vision API to refine the spoken script accompanying displayed figures. Initially, the scene’s script is generated based solely on the topic and book content, without accounting for specific figures.
This can lead to a disconnect between the narration and the visual elements, as viewers expect a direct explanation of any displayed figures. To address this, we integrate the OpenAI Vision API, which processes the selected figure and generates an enhanced script tailored to both the scene’s topic and the figure’s content. This approach ensures that when a figure is shown, the narration dynamically adapts to describe and contextualize it within the broader educational content. Our findings indicate that this significantly improves the coherence and engagement of the video, making the instructional material more informative and visually aligned.

4) Code Display

Many educational books include programming code, making it essential to incorporate code demonstrations in video content. Rather than displaying static code snippets, we chose to dynamically type the code on the screen, mimicking the way an instructor would type during a live programming tutorial. This approach enhances engagement and maintains viewer interest. Fig. 9 depicts the pseudocode design used to build our code display.

To optimize video performance while preserving the typing effect, we analyzed the ideal frame rate required for smooth visual representation. We determined that 12 frames per second (fps) was the optimal rate, ensuring a realistic typing experience without excessive computational overhead. To maintain readability and visual coherence, our approach involves automatic text wrapping, font size adjustments, and structured text placement. The text wrapping functionality ensures that no line exceeds 80 characters, preserving readability across different screen sizes. Additionally, we dynamically adjust font size based on the length of the displayed code, ensuring that shorter code blocks appear larger for better visibility while longer code segments remain within readable constraints. Fig. 10 depicts the code writing display.
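The wrapping, font sizing, and typing-effect logic described above can be sketched with the standard library alone. The 80-character limit and the 12 fps rate come from the text; the function names, the font-size bounds, the usable screen height, and the number of characters revealed per frame are illustrative assumptions, and actual rendering of frames to images is left out.

```python
import textwrap

FPS = 12        # frame rate reported in the text
MAX_COLS = 80   # maximum line length reported in the text


def wrap_code(code, width=MAX_COLS):
    """Wrap over-long lines so that no displayed line exceeds `width` characters."""
    wrapped = []
    for line in code.splitlines():
        if len(line) <= width:
            wrapped.append(line)
        else:
            # Continuation lines are indented to signal that they are wrapped.
            wrapped.extend(textwrap.wrap(line, width=width, subsequent_indent="    "))
    return wrapped


def pick_font_size(num_lines, usable_height=900, min_size=18, max_size=48):
    """Shorter blocks get a larger font; all numeric bounds here are assumed."""
    size = usable_height // max(num_lines, 1)
    return max(min_size, min(max_size, size))


def typing_frames(code, chars_per_frame=2):
    """Yield the text visible on each successive frame of the typing effect."""
    for end in range(chars_per_frame, len(code) + chars_per_frame, chars_per_frame):
        yield code[:end]


lines = wrap_code("x = " + "a" * 100)
frames = list(typing_frames("print(1)"))
typing_seconds = len(frames) / FPS
```

At 12 fps and two characters per frame, the assumed settings type 24 characters per second; lowering `chars_per_frame` slows the effect at the cost of more frames to render.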
In each code display, we extract the programming language directly from the video transcript dictionary and display it prominently at the top of the screen. The code is then typed line by line, with individual characters appearing sequentially. Line numbers are displayed alongside the code to facilitate reference, and a vertical separator is included to visually distinguish the line numbers from the code. To reinforce realism, a subtle typing sound effect is synchronized with the text appearance, beginning after the introductory delay and ending before the final silent display.

F. Video Rendering and Compilation

Once all video components, including slides, audio, code displays, and figures, are generated and organized, they are assembled into a complete video. The assembly is performed using MoviePy, a widely adopted Python library for video processing, enabling efficient handling of media files and merging of components. Each scene’s visual and audio elements are carefully aligned based on predefined metadata, ensuring accurate timing and smooth transitions throughout the video. To maintain consistency in organization, each visual and audio component is labeled using a structured naming convention that includes the scene number and video title. This approach facilitates automated retrieval and systematic sequencing of all resources.

The rendering process begins with an introductory image, which serves as an initial frame, followed by the sequential integration of scene-specific elements. As shown in Fig. 11, for each scene the corresponding visual content (a static slide, a dynamically generated code display, or a pre-rendered animation) is matched with the associated audio narration. If the visual component is a static image, it is processed as a still frame with a duration equivalent to the scene's audio length.
In cases where the visual component is an animated segment, such as a code display with a typing effect, it is directly merged with the corresponding audio track. The compilation process leverages multi-threaded processing to enhance performance and minimize rendering time. Video segments are concatenated using a frame rate of 12 frames per second (fps), which was determined to be optimal for preserving smooth visual transitions while maintaining computational efficiency. The final video is encoded using the H.264 codec for high-quality compression, ensuring compatibility across various playback platforms. Upon completion of the rendering process, temporary video components are deleted to optimize storage usage, and the workflow proceeds to the generation of the next video.

After the videos are created, the system automatically generates a text file containing the video description, title, and timestamps. The AI-driven process produces a description and title that optimize the content's discoverability. Timestamps are added to help viewers easily navigate to specific sections of the video. The text file with these details is then saved for easy use when uploading the video to platforms like YouTube.

G. User Interface

The user interface of the application is developed using Flask, a lightweight Python web framework, which facilitates the integration of back-end logic and front-end elements, as illustrated in Fig. 1. The main entry point for users is an index page, which presents a simple form. This form allows users to upload files, select a teaching style, choose a video type, and specify the number of videos to generate from the uploaded content. Upon successful file upload, the process route handles the form submission and invokes the necessary back-end logic. The uploaded file is saved in the designated folder, and user selections such as the number of videos, teaching style, and video type are retrieved from the form.
These inputs are then passed to the back-end, which processes the content based on the user's preferences.

RESULTS AND DISCUSSION

To assess the effectiveness of AI-generated video tutorials, we conducted a comparative analysis against human-produced educational content. The evaluation considered key performance metrics, including engagement, view count, watch time, and revenue generated from viewership. Additionally, we examined the content generation process in terms of production time for a four-hour tutorial, video quality, slide variability, and the total AI resource cost associated with generating a tutorial.

A. Efficiency in Content Generation

To quantify the efficiency gains of AI-driven video creation, we performed an experiment in which an experienced Python instructor was tasked with developing a four-hour tutorial. This process involved creating slides, recording the video content, editing, and rendering the final tutorial. The total time required for the instructor to complete the tutorial was approximately 24 hours. In contrast, VidAIQ, our AI-powered software, generated a comparable tutorial, including similar coding examples, slides, and figures, in only 3.75 hours. This result indicates a significant improvement, reducing content production time by a factor of about 6.4 (24 / 3.75), as illustrated in Fig. 12. To further validate these findings, Fig. 13 depicts an additional 11 video tutorials we generated, each ranging from 3.6 to 4.4 hours in duration. The average processing time across all tutorials was approximately 3.64 hours, closely aligning with the results obtained in the initial experiment. This consistency demonstrates the reliability of the AI-driven approach in significantly reducing the time required for educational content creation while maintaining structured and comprehensive video output.

B. Cost Analysis of AI-Generated Tutorials

In addition to efficiency improvements, we analyzed the cost associated with generating AI-powered tutorials.
Since VidAIQ integrates multiple AI models, we assessed the individual cost contributions of each component, as shown in Fig. 14. Among these, the text-to-speech model incurred the highest cost, as it was responsible for generating the audio narration. In contrast, the vision model, which was used to create textual descriptions for figures, had a significantly lower cost. The GPT-4o-mini model, which facilitated content extraction from books and text transformations, also accounted for a relatively low cost. The average cost breakdown for a four-hour tutorial was as follows: the vision model averaged $0.22, indicating that image analysis is computationally inexpensive; the GPT-4o-mini model incurred an average cost of $0.53; and the text-to-speech model had the highest cost at $4.29. This resulted in a total AI resource cost of $5.04 per four-hour tutorial. For comparison, based on salary data (ZipRecruiter, 2025), the minimum wage for an inexperienced video editor is approximately $8 per hour. Given that human content creation required 24 hours, the labor cost for an equivalent tutorial would be at least $192, making the AI-generated alternative approximately 38 times more cost-efficient. This substantial reduction in cost highlights the economic viability of AI-driven educational content creation, making high-quality tutorial production more accessible and scalable.

C. Video Engagement

To evaluate the engagement levels of AI-generated video tutorials in comparison to human-produced tutorials, we conducted an empirical study on YouTube, a widely used video-sharing platform. The primary objective was to assess whether AI-generated content could achieve similar or superior engagement levels relative to human-created tutorials. To this end, we uploaded full-length video tutorials over the course of one month and systematically analyzed key engagement metrics, including view count, watch time, revenue generation, and audience feedback.
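As a quick check, the time and cost figures reported in Sections A and B can be reproduced with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Production time (hours) for a four-hour tutorial, as reported in Section A.
human_hours = 24.0
ai_hours = 3.75
speedup = human_hours / ai_hours

# Average AI cost (USD) per four-hour tutorial, as reported in Section B.
costs = {"vision": 0.22, "gpt-4o-mini": 0.53, "text-to-speech": 4.29}
ai_cost = round(sum(costs.values()), 2)

# Lower bound on human labor cost at the quoted minimum hourly wage.
hourly_wage = 8.0
human_cost = hourly_wage * human_hours

cost_ratio = human_cost / ai_cost
tts_share = costs["text-to-speech"] / ai_cost

print(f"speed-up: {speedup:.1f}x")
print(f"AI cost: ${ai_cost:.2f}, human cost: ${human_cost:.2f}")
print(f"cost ratio: {cost_ratio:.1f}x, text-to-speech share: {tts_share:.0%}")
```

The breakdown also shows that text-to-speech dominates the AI bill, so any reduction in narration cost would have the largest effect on the total.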
Our analysis focused on the first 10 days after publication to establish an initial comparison between AI and human-generated content. Fig. 15 presents the comparative view count trends for two AI-generated and two human-generated tutorials. While a noticeable spike in viewership was observed for human-generated videos on the third day, the overall trend remained similar for both categories. The data indicates no statistically significant difference in the number of views between AI-generated and human-produced tutorials, suggesting that the audience was equally receptive to both content types. This finding implies that AI-generated educational videos do not suffer from an inherent disadvantage in terms of attracting viewers when compared to human-created tutorials. Watch time is a critical metric that reflects user engagement and retention on a given video. As illustrated in Fig. 16, the difference in average watch time between AI-generated and human-generated tutorials was marginal. Interestingly, the AI-generated tutorials exhibited a slightly higher average watch time than their human-generated counterparts. While the difference was not statistically significant, it suggests that AI-generated content maintains comparable, if not superior, engagement levels. A potential explanation for this result is the structured and concise nature of AI-generated content. Unlike human-created tutorials, which may exhibit variations in pacing, audio quality, and instructional clarity, AI-generated videos maintain a consistent level of quality throughout. The automated production process ensures uniform slide formatting, refined explanations, and a standardized audio narration, which may contribute to sustained viewer engagement despite the absence of a human presenter. The revenue generated through YouTube AdSense was also analyzed as an indirect measure of content performance. As shown in Fig. 
17, AI-generated tutorials consistently outperformed human-generated videos in terms of revenue generation during the first 10 days after publication. On average, AI-generated videos earned $1.75, nearly four times the $0.45 generated by human-created tutorials. This result was unexpected, given that the viewership levels were similar between the two categories. One possible explanation is that factors such as watch time, audience retention, and ad placement may have contributed to the revenue disparity. Further research is required to isolate the precise factors influencing the observed increase in revenue from AI-generated tutorials.

To assess user preference, we analyzed the like-to-view ratio, a widely recognized engagement metric that measures the proportion of viewers who actively liked a video. According to industry data, the average like-to-view ratio on YouTube across 116 million videos is 2.28% (Marketing Charts, 2020). For human-generated tutorials on our test channel, the average like-to-view ratio was 4.85%, whereas AI-generated tutorials achieved a higher ratio of 5.61%. This finding suggests that AI-generated tutorials were not only well-received but also exhibited a higher user engagement rate than both the channel's human-created tutorials and the general YouTube benchmark. The increased like-to-view ratio further reinforces the hypothesis that AI-generated content maintains a comparable level of audience approval relative to traditional human-generated tutorials.

Conclusion

In this study, we have demonstrated the potential of AI-driven educational content creation through our pioneering tool, VidAIQ. Our findings highlight the impressive efficiency and cost-effectiveness of AI-based solutions for generating educational video tutorials, capable of surpassing traditional human-driven methods.
VidAIQ's integration of AI technologies, such as text-to-speech, large language models, and vision models, enables a substantial reduction in content production time (from 24 hours of manual effort to just 3.75 hours) while maintaining high-quality outputs. The evaluation of AI-generated tutorials, compared to human-produced content, reveals that AI can achieve comparable levels of viewer engagement and revenue generation. Contrary to expectations, AI-produced content reduced production costs roughly 38-fold while maintaining, if not exceeding, engagement metrics, including view counts and watch times. Furthermore, the higher like-to-view ratios observed in AI-generated videos suggest positive audience reception, indicating that AI-driven content delivery can hold its ground against conventional instructional formats.

Looking ahead, there is a wealth of opportunities to refine these technologies. Future research could explore more sophisticated AI models, enabling even greater contextual adaptation and personalized learning experiences. Integration with real-time analytics to dynamically adapt content according to viewer engagement patterns represents another promising direction. Such advancements can further democratize education by providing accessible, tailored learning materials to diverse audiences worldwide. This work sets an important precedent, allowing educators to focus on course design and student interaction while leaving the labor-intensive content creation to sophisticated AI tools such as VidAIQ.

Declarations

Author Contributions

M.Y. wrote the main manuscript text and designed the software application. J.K. prepared the Slides Generation software module. All authors reviewed the manuscript.

Funding

The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.

References

A.Jayanthiladevi, A. G. (2020). AI in Video Analysis, Production and Streaming. Journal of Physics.
doi:10.1088/1742-6596/1712/1/012014
al., C. C. (2022). How Deepfakes and Artificial Intelligence Could Reshape the Advertising Industry. Journal of Advertising Research. doi:10.2501/JAR-2022-017
Aymane Ezzaim, A. D. (2022). AI-Based Adaptive Learning: A Systematic Mapping of the Literature. The Journal of Universal Computer Science. doi:10.1016/j.caeai.2021.100017
Balti, M., & al., G. S. (2023). AI Based Video and Image Analytics. IEEE. doi:10.1109/INISTA59065.2023.10310403
Bayne, S. (2015). Teacherbot: interventions in automated teaching. Teaching in Higher Education. doi:10.1080/13562517.2015.1020783
C.V, A. (2018). A Survey on Collaborative Learning Approach for Speech and Speaker Recognition. 3rd National Conference on Image Processing, Computing, Communication, Networking and Data Analytics. doi:10.21467/proceedings.1.34
Christopher Collins, D. D. (2021). Artificial intelligence in information systems research: A systematic literature review and research agenda. International Journal of Information Management. doi:10.1016/j.ijinfomgt.2021.102383
Cihan Orak, Z. T. (2024). Using Artificial Intelligence In Digital Video Production: A Systematic Review Study. Journal of Educational Technology and Online Learning. doi:10.31681/jetol.1459434
Cox, T. (2018). Digital Video as a Personalized Learning Assignment: A Qualitative Study of Student Authored Video using the ICSDR Model. Journal of Scholarship of Teaching and Learning. doi:10.14434/josotl.v18i1.21027
Faycal Farhi, R. J. (2022). How do Students Perceive Artificial Intelligence in YouTube Educational Videos Selection? A Case Study of Al Ain City. International Journal of Emerging Technologies in Learning. doi:10.3991/ijet.v17i22.33447
Gao, H. (2022). Online AI-Guided Video Extraction for Distance Education with Applications. Mechanical Problems in Engineering. doi:10.1155/2022/5028726
Marie-Luce Bourguet, Y. J.-A. (2017). Social Robots that can Sense and Improve Student Engagement. IEEE. doi:10.1109/TALE48869.2020.9368438
Marketing Charts. (2020). YouTube Influencer Engagement Rate Benchmarks: What Are Good Rates? Retrieved from https://www.marketingcharts.com/digital/video-112775.
Olga Tapalova, N. Z. (2022). Artificial Intelligence in Education: AIEd for Personalised Learning Pathways. EJEL.
Pellas, N. (2023). The influence of sociodemographic factors on students' attitudes toward AI-generated video content creation. Smart Learning Environments. doi:10.1186/s40561-023-00276-4
Pengyuan Zhou, L. W. (2024). A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming. doi:10.48550/arXiv.2404.16038
Pluzhnikova, N. N. (2024). Technologies of Artificial Intelligence in Educational Management. IEEE. doi:10.1109/EMCTECH49634.2020.9261561
Rana AlShaikh, N. A.-M. (2024). The implementation of the cognitive theory of multimedia learning in the design and evaluation of an AI educational video assistant utilizing large language models. Heliyon. doi:10.1016/j.heliyon.2024.e25361
Sahil Kumar, V. G. (2021). Role of Artificial Intelligence in Generating Video. IEEE. doi:10.1109/ICACCS54159.2022.9785336
Tony Belpaeme, J. K. (2018). Social robots for education: A review. Science Robotics. doi:10.1126/scirobotics.aat5954
Torbjørn Netland, O. v. (2024). Comparing human-made and AI-generated teaching videos: An experimental study on learning effects. Computers & Education. doi:10.1016/j.compedu.2024.105164
Vijeta Sharma, M. G. (2019). Video Processing Using Deep Learning Techniques: A Systematic Literature Review. IEEE. doi:10.1109/ACCESS.2021.3118541
Yanev, M. (2024). Building AI Applications with OpenAI APIs. Packt Publishing.
Yidi Zhang, M. L.-h. (2024). The effect of student acceptance on learning outcomes: AI-generated short videos versus paper materials. Computers and Education: Artificial Intelligence. doi:10.1016/j.caeai.2024.100286
Yin Wang, P. L. (2022). Development and Strategy Analysis of Short Video News Dissemination under the Background of Artificial Intelligence. Mobile Information Systems. doi:10.1155/2022/2750925
Yueliang Wu, A. Y. (2023). Artificial intelligence for video game visualization, advancements, benefits and challenges. Mathematical Biosciences and Engineering. doi:10.3934/mbe.2023686
ZipRecruiter. (2025). Video Maker Salary. Retrieved from https://www.ziprecruiter.com/Salaries/Video-Maker-Salary.

Additional Declarations

The authors declare no competing interests.
Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-6379057","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":438655790,"identity":"048cb405-60b7-4f14-a62a-0d8343c24b1b","order_by":0,"name":"Martin Yanev","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABCElEQVRIiWNgGAWjYPACG342BgY2ZmQhxgb8WtIk29hgWtggqglpOSzZQLQW3RnJzx58bDsvwSff/OxxYds9u/nzm58/+MBgI7vhAHYtZjfSzA1ntt2WYGNjMzee2VacvOEYm2HjDIY0Y9xaEsykec7crmNj42GT5m1LSDZgYzBs5mE4nIhbS/o36T9nzknAtci3sX9s/sPwH4+WHDNphooDcC12DMd4DJsZGA7g1nLmTZlkT0UyUEuamfSMcwkJBsdyCmf2GCQbz8Sl5Xj6NokfBnYS8s2Hn0kXlCXYyzcf3/DhR4WdbB8OLQwCCaj8xAYwZYBDOQjwo5llj0ftKBgFo2AUjFAAACJZW07v0uDJAAAAAElFTkSuQmCC","orcid":"https://orcid.org/0009-0005-5304-1783","institution":"Fitchburg State University","correspondingAuthor":true,"prefix":"","firstName":"Martin","middleName":"","lastName":"Yanev","suffix":""},{"id":438655791,"identity":"a6bac8fe-29ee-4af0-a801-15c91b54c821","order_by":1,"name":"Jurgen Kola","email":"","orcid":"","institution":"Fitchburg State University","correspondingAuthor":false,"prefix":"","firstName":"Jurgen","middleName":"","lastName":"Kola","suffix":""}],"badges":[],"createdAt":"2025-04-04 
23:26:36","currentVersionCode":1,"declarations":{"humanSubjects":false,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":false,"humanSubjectConsent":false,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-6379057/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-6379057/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":80132892,"identity":"78a51caa-407c-434c-8e21-e4fe82e9323c","added_by":"auto","created_at":"2025-04-08 09:27:06","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":135176,"visible":true,"origin":"","legend":"\u003cp\u003eVidAIQ User Interface\u003c/p\u003e","description":"","filename":"f1b.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/2690ae060b794a162adb8ab8.png"},{"id":80135143,"identity":"57b254a0-56be-4d1d-881a-483a18060aa0","added_by":"auto","created_at":"2025-04-08 09:59:06","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":483786,"visible":true,"origin":"","legend":"\u003cp\u003eArchitectural Overview of Application Components\u003c/p\u003e","description":"","filename":"1a.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/f2ab770a2634d6a93053e0ed.png"},{"id":80133959,"identity":"85ed3e74-a230-4963-9786-ab400e739344","added_by":"auto","created_at":"2025-04-08 09:43:06","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":160082,"visible":true,"origin":"","legend":"\u003cp\u003eVector Store Retrieval Workflow\u003c/p\u003e","description":"","filename":"2b.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/257988a70f4e6b32fc4b177e.png"},{"id":80132894,"identity":"5c80999d-b5cb-4216-99d0-c3a3c35bc841","added_by":"auto","created_at":"2025-04-08 
09:27:06","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":56329,"visible":true,"origin":"","legend":"\u003cp\u003eInstruction Building Function\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/f95e1a8ce458cedc907101fa.png"},{"id":80132902,"identity":"4556b296-b633-4cc1-9bad-aafc8f4ca58f","added_by":"auto","created_at":"2025-04-08 09:27:06","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":187650,"visible":true,"origin":"","legend":"\u003cp\u003eVideo Data Dictionary\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/06674da5bfa82e5936cdf9d9.png"},{"id":80135145,"identity":"c9d3b4b4-c85f-40b5-b107-b082fc641266","added_by":"auto","created_at":"2025-04-08 09:59:06","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":14993,"visible":true,"origin":"","legend":"\u003cp\u003eSlide Data\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/7381b9ae71d46df02edde90a.png"},{"id":80133045,"identity":"106a1a94-b764-4898-9398-f746ca07168a","added_by":"auto","created_at":"2025-04-08 09:35:06","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":151473,"visible":true,"origin":"","legend":"\u003cp\u003eSlide Generation Process\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/cf38520fb7f3a871a2a7ba5b.png"},{"id":80133049,"identity":"1efd288d-5a5b-4f80-8be9-cb5281bee9e2","added_by":"auto","created_at":"2025-04-08 09:35:06","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":127581,"visible":true,"origin":"","legend":"\u003cp\u003eUsing OpenAI Vision API to Describe 
Figures\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/2a1d2e6e21147ac4d2e97426.png"},{"id":80132907,"identity":"5844862f-9e43-4804-bb7e-5cb2a7af6150","added_by":"auto","created_at":"2025-04-08 09:27:06","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":86124,"visible":true,"origin":"","legend":"\u003cp\u003eVideo Code Display Pseudocode\u003c/p\u003e","description":"","filename":"9.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/ab51654e3d9cbac4654600f8.png"},{"id":80132915,"identity":"c614081d-7244-4f59-b02a-d4ea2a820485","added_by":"auto","created_at":"2025-04-08 09:27:06","extension":"png","order_by":10,"title":"Figure 10","display":"","copyAsset":false,"role":"figure","size":160560,"visible":true,"origin":"","legend":"\u003cp\u003eCode Writing Component Display\u003c/p\u003e","description":"","filename":"10.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/33ba536cd67bb51755d43d14.png"},{"id":80132914,"identity":"8e924087-2cd2-43b3-92ca-d11aa70eb406","added_by":"auto","created_at":"2025-04-08 09:27:06","extension":"png","order_by":11,"title":"Figure 11","display":"","copyAsset":false,"role":"figure","size":444328,"visible":true,"origin":"","legend":"\u003cp\u003eAssembling Video Components\u003c/p\u003e","description":"","filename":"11.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/537bd02568bec236a43db531.png"},{"id":80133050,"identity":"cb351e14-fd99-43f2-992e-8c6fb56d1961","added_by":"auto","created_at":"2025-04-08 09:35:06","extension":"png","order_by":12,"title":"Figure 12","display":"","copyAsset":false,"role":"figure","size":15363,"visible":true,"origin":"","legend":"\u003cp\u003eComparison of AI and Human Time to Create a 4-Hour 
Course\u003c/p\u003e","description":"","filename":"12.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/368441aeac822429bde5235d.png"},{"id":80133055,"identity":"c7f3a5ac-c9bd-4ec6-973e-bba6ff143fd0","added_by":"auto","created_at":"2025-04-08 09:35:07","extension":"png","order_by":13,"title":"Figure 13","display":"","copyAsset":false,"role":"figure","size":189628,"visible":true,"origin":"","legend":"\u003cp\u003eComparison Between Video Duration and Process Time for 11 Video Tutorials\u003c/p\u003e","description":"","filename":"13.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/cb292f3cac4cf3c722317e93.png"},{"id":80133051,"identity":"fd5329c0-7ca0-4acb-9659-2c4d9d4484b4","added_by":"auto","created_at":"2025-04-08 09:35:06","extension":"png","order_by":14,"title":"Figure 14","display":"","copyAsset":false,"role":"figure","size":106721,"visible":true,"origin":"","legend":"\u003cp\u003eAI Models Cost by Type\u003c/p\u003e","description":"","filename":"14.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/6da768261b4cfa0d0bf02617.png"},{"id":80132919,"identity":"1993a037-bb1b-4fb6-8435-7392f9768dcd","added_by":"auto","created_at":"2025-04-08 09:27:07","extension":"png","order_by":15,"title":"Figure 15","display":"","copyAsset":false,"role":"figure","size":132119,"visible":true,"origin":"","legend":"\u003cp\u003eNumber of Views of AI and Non-AI Videos\u003c/p\u003e","description":"","filename":"15.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/9e41c82fdec3924b8895bdde.png"},{"id":80132923,"identity":"1ffc9e9f-8b0f-477b-9b79-f80114459223","added_by":"auto","created_at":"2025-04-08 09:27:07","extension":"png","order_by":16,"title":"Figure 16","display":"","copyAsset":false,"role":"figure","size":164532,"visible":true,"origin":"","legend":"\u003cp\u003eWatch Times of AI and Non-AI 
Videos\u003c/p\u003e","description":"","filename":"16.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/f73bb45367033b4a5077564b.png"},{"id":80133961,"identity":"d05a1aaa-9a42-4521-8891-eb5a254ddffd","added_by":"auto","created_at":"2025-04-08 09:43:07","extension":"png","order_by":17,"title":"Figure 17","display":"","copyAsset":false,"role":"figure","size":94872,"visible":true,"origin":"","legend":"\u003cp\u003eRevenue of AI and Non-AI Videos\u003c/p\u003e","description":"","filename":"17.png","url":"https://assets-eu.researchsquare.com/files/rs-6379057/v1/e1eabec513653d23bae8b69b.png"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003eAI-Driven Conversion of Textbooks into Long-Form Educational Videos\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003eThe impact of Artificial Intelligence on reshaping education in the digital age continues to grow with every new technological advancement. Traditional educational paradigms, long reliant on static modes of information dissemination, are being transformed by technological innovations that promise to make learning more interactive, personalized, and widely accessible (Rana AlShaikh, 2024). Turning textual material into dynamic videos that combine AI-generated voice, visuals, and interactive elements is part of this shift. This not only aligns with the need for varied learning tools that cater to diverse cognitive styles but also taps into the vast potential of AI to synthesize complex information efficiently (Pellas, 2023).\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;The growing prevalence of AI-enhanced video content production is grounded in the principles of the Cognitive Theory of Multimedia Learning (CTML), which underscores the benefits of integrating multiple forms of media in promoting better learning. 
As this theory suggests, learners can derive greater meaning when presented with a combination of text and imagery\u0026mdash;a methodology that is inherently supportive of video-based learning formats outcomes (Rana AlShaikh, 2024).\u003c/p\u003e\n\u003cp\u003eDespite these advantages, there remains a significant gap in harnessing the full potential of AI to convert static educational resources into engaging video content. This is particularly pertinent given the underutilization of existing online educational videos, despite their widespread availability outcomes (Rana AlShaikh, 2024). Addressing this gap involves advancing AI technologies to not only automate video generation processes but also ensure they are contextually relevant and pedagogically effective.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003eMoreover, the deployment of technologies such as Large Language Models (LLMs) and AI-based content transformation systems has illuminated new possibilities for educational innovation. These technologies can automate the meticulous processes of content scanning and transformation, letting educators and learners focus on engagement and comprehension rather than mere content consumption (Pengyuan Zhou, 2024).\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;This study leverages publicly available OpenAI APIs to develop an innovative application, called \u003cem\u003evidAIQ\u003c/em\u003e, capable of generating AI-driven educational video tutorials. The application is designed to produce long-form tutorials exceeding four hours in duration, aligning with the structure of traditional online courses. Implemented in Python, the system integrates Large Language Models (LLMs), vision models, and text-to-speech (TTS) technology, utilizing established libraries such as \u003cem\u003eMoviePy\u003c/em\u003e to transform written content into engaging educational videos. 
The generated videos incorporate a structured three-component design, combining slides, graphical illustrations, and real-time code demonstrations. Additionally, we ensure dynamic transitions and variations in visual elements to maintain viewer engagement.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;To evaluate the effectiveness of AI-generated educational videos, we conducted comparative analysis against human-produced long-format courses. Key engagement metrics, including view count, watch time, and revenue generation, were analyzed to assess the viability of AI-generated content in comparison to traditional instructional videos. This assessment provides insights into the potential of AI-powered educational content and its ability to compete with manually produced learning materials.\u003c/p\u003e"},{"header":"Related work","content":"\u003cp\u003eEducational content delivery has been significantly impacted by the integration of artificial intelligence, particularly in the creation of video materials. One study highlights the shift towards using AI-generated content in learning environments, finding that AI tools can produce video content that rivals human-made videos in terms of educational outcomes. This research also emphasizes the cost-effectiveness and rapid production capabilities of AI systems, suggesting a trend towards their increased adoption in digital education settings (Torbjørn Netland, 2024).\u003c/p\u003e\n\u003cp\u003eAnother critical evaluation compared the use of AI-generated versus paper-based educational materials. It focused on the respective effects of these media on learning outcomes. The findings underscore that while learners responded positively to video content for engagement and knowledge retention, there was no significant variance in comprehension when compared to traditional materials. 
This reveals how AI-generated content, while novel and engaging, might require further refinement to fully replace established educational materials (Yidi Zhang, 2024).\u003c/p\u003e\n\u003cp\u003eMoreover, studies point out that AI can synthesize various forms of media—such as text, visuals, and audio—into cohesive learning materials that cater to diverse learning preferences. This approach not only supports improved learning experiences but also democratizes access to high-quality educational resources. Advanced AI systems can personalize these experiences, tailoring content to meet individual learning needs more effectively than traditional methods can (Torbjørn Netland, 2024).\u003c/p\u003e\n\u003cp\u003eThe effectiveness of AI in educational content delivery has also been studied extensively in the context of recommendation systems, particularly in platforms like YouTube, which use AI to tailor educational videos based on user engagement and behaviors. This personalized approach not only enhances the user's content experience but also aligns educational delivery with learners’ unique needs, thereby supporting diverse learning journeys across global platforms (Faycal Farhi, 2022). This reflects a broader trend in educational technology, wherein AI acts as a catalyst for personalized learning, offering tailored educational experiences that were previously unattainable with traditional static learning modes (Faycal Farhi, 2022).\u003c/p\u003e\n\u003cp\u003eStudies indicate that AI-based systems can improve educational experiences by adapting to unique student needs, boosting engagement and effectiveness. Moreover, these adaptive systems leverage AI algorithms to create learning pathways that enhance the pedagogical process, a trend that is increasingly mirrored across various educational platforms (Aymane Ezzaim, 2022). 
AI-driven technologies like machine learning and deep learning can improve performance by incorporating multi-layered data for video pattern recognition and information classification over time (A. Jayanthiladevi, 2020).\u003c/p\u003e\n\u003cp\u003eThe “teacherbot” project at the University of Edinburgh demonstrated how AI can effectively mediate learning in massive open online courses (MOOCs). It indicated a paradigm where automation is utilized not simply for substituting teacher presence but for enhancing pedagogical processes through interactive, algorithm-driven instruction. Designed as automated agents within MOOCs, teacherbots embody the fusion of human and technological interactions, serving as co-tutors and prompting profound critical discourse among students about the pedagogical value of AI interventions in education (Bayne, 2015).\u003c/p\u003e\n\u003cp\u003eDeep learning approaches, such as Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), and Recurrent Neural Networks (RNNs), have become pivotal in addressing challenges like human action recognition, object detection, and behavior analysis (Vijeta Sharma, 2019). Moreover, Artificial Intelligence in Education (AIEd) provides personalized pathways through the integration of adaptive systems, big data analytics, and intelligent algorithms. AIEd leverages dynamic modeling to create individualized learning profiles, enabling the delivery of tailored educational materials that cater to the strengths, weaknesses, and preferences of each learner. Intelligent tutoring systems and virtual environments further enhance this personalized approach by providing real-time feedback and engaging, immersive experiences (Olga Tapalova, 2022).\u003c/p\u003e\n\u003cp\u003eAI technologies have revolutionized content creation in the educational domain by enabling the automated generation of high-quality and contextually relevant materials. 
For instance, OpenAI’s suite of APIs, including DALL-E and ChatGPT, offers powerful tools for creating visually rich and interactive educational content. DALL-E specializes in generating detailed images based on text prompts, making it particularly valuable for designing visual aids in presentations and instructional materials. By integrating these APIs with frameworks like Python's PPTX library, developers can automate the creation of customized educational resources, such as slideshows and video assets, tailored to specific learning outcomes. These advancements simplify the production process and enhance the accessibility and scalability of personalized learning experiences (Yanev, 2024).\u003c/p\u003e"},{"header":"METHODOLOGY \u0026 IMPLEMENTATION","content":"\u003cp\u003eThis section outlines the methodology used in the development and implementation of VidAIQ. It details the processes involved in converting static educational texts into AI-generated videos, focusing on the application design, information extraction, and video generation techniques employed to enhance the learning experience.\u003c/p\u003e\n\u003ch2\u003eA. Application Design\u003c/h2\u003e\n\u003cp\u003e\u003cem\u003eVidAIQ\u003c/em\u003e is conceptualized as a transformative tool that converts static educational text into dynamic, AI-generated videos, enhancing the learning experience through interactive and modern teaching methodologies. The application is structured in a way that integrates AI and multimedia processing to produce comprehensive video tutorials.\u003c/p\u003e\n\u003cp\u003eFig. 1 presents the user interface of the developed system. The primary functionality allows users to browse their database for a book and specify the desired number of videos to be generated. The processing time of the AI engine increases with the number of selected videos. 
Each generated video has a default duration of approximately 10 minutes.\u003c/p\u003e\n\u003cp\u003eTwo supplementary parameters control the video's structure and presentation style. The video type parameter enables users to choose between generating a single, continuous tutorial or a series of separate course modules. The total duration of the generated tutorial is directly proportional to the number of selected videos. For example, selecting 50 videos results in approximately 500 minutes (over eight hours) of AI-generated instructional content. The teaching style parameter adjusts the AI-generated narration, allowing users to select between a formal, lecture-style delivery similar to a professor addressing students or a more casual, YouTube-style tutorial format. Upon selecting the \"Generate Video\" button, the system processes the input book and parameters through its backend components.\u003c/p\u003e\n\u003cp\u003eThe main architecture of the application is illustrated in Fig. 2. The book, along with user-defined parameters, is first handled by the \u003cem\u003eBook Reader\u003c/em\u003e module. Given the computational limitations of AI models in processing large documents in a structured manner, the system segments the book into smaller sections based on the number of chapters and headings. Our experimental results indicate that breaking the document into smaller units enhances the AI’s ability to extract and organize relevant information, which is critical for generating coherent educational content. Additionally, the \u003cem\u003eFigure Extraction Module\u003c/em\u003e identifies and extracts all figures from the book, associates them with their respective chapter numbers and pages, and stores them in a designated directory.\u003c/p\u003e\n\u003cp\u003eOnce the book has been segmented, these sections are sequentially processed by AI modules. 
The \u003cem\u003eScript Extractor Module\u003c/em\u003e converts the text into a structured video script using an AI Assistant, ensuring the content aligns with an instructional format. For technical books, particularly those related to programming and software engineering, the \u003cem\u003eCode Extractor Module\u003c/em\u003e retrieves relevant code snippets and reformats them using a large language model (LLM) to ensure clarity and correctness. All extracted components are then organized into a structured Python dictionary, facilitating efficient retrieval and processing.\u003c/p\u003e\n\u003cp\u003eEach video segment is generated iteratively by passing the extracted data to the \u003cem\u003eVideo Components Builder\u003c/em\u003e modules. These modules process the text and figures to generate AI-driven voiceovers, slides, code demonstrations, and figure presentations. Integrated AI APIs, including LLMs and Vision models, such as OpenAI gpt-4o-mini, facilitate the generation of these elements. Once all components for a given video are prepared, they are assembled using the \u003cem\u003eVideo Assembler Module\u003c/em\u003e, which compiles the final educational video. The process is repeated iteratively for each video in the series: components are generated, compiled into a video, then cleared before proceeding to the next segment, optimizing resource management and execution efficiency.\u003c/p\u003e\n\u003ch2\u003eB. Book Segmentation\u003c/h2\u003e\n\u003cp\u003eIn the early stages of the project, we explored using the entire book as input and instructing the AI models to generate video titles based on the user-specified number of videos. The objective was to assign each generated title to a specific range of pages within the book. However, this approach proved inefficient, as the AI struggled to maintain a coherent sequence of pages for each assigned title. 
In educational video production, it is essential to ensure that the content is organized in a logical progression, from foundational concepts to more advanced topics. If we had followed this initial strategy, the resulting videos could have drawn content from disparate sections of the book or potentially overlapped, undermining the continuity and structure required for effective learning.\u003c/p\u003e\n\u003cp\u003eThe inefficiencies observed in our initial approach stemmed from intrinsic limitations in how the AI model processes and retrieves information, particularly when interfaced with a vector store. As depicted in Fig. 3, book data is transformed into a complex web of vector embeddings stored within a vector database. This transformation prioritizes semantic similarity over sequential accuracy, which inadvertently complicates the alignment of video titles with specific book sections. The vector store retrieves information based on proximity in the semantic space rather than the physical location within the text, leading to potential overlaps or disjointed sequencing in content allocation. Such issues highlight a critical mismatch between the AI’s semantic retrieval abilities and the need for a linear, pedagogical structure in educational video creation. The figure illustrates this process, emphasizing the gap between the need for maintaining educational flow and the semantic-driven design of vector storage, which is not inherently equipped to handle sequential integrity. This necessitates a more refined strategy that incorporates pre-structuring and hierarchical retrieval to ensure that the developed educational content maintains coherence and educational value.\u003c/p\u003e\n\u003cp\u003eGiven the limitations of the initial approach, we opted to pre-segment the book into logical sections prior to processing it with the AI system. 
To maintain both topic consistency and coherence, and to avoid splitting content in the middle of a topic, we implemented a compromise by dividing the book based on its chapters and titles, rather than segmenting it purely by page count for each video. The book was thus organized into discrete sections, with a function designed to evenly allocate the required number of videos across these sections. This approach minimized potential coherence errors and ensured the integrity of the topics within each video, facilitating a more structured and consistent educational experience.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 1\u003c/strong\u003e: Distribution of Generated Videos Across Book Segments\u003c/p\u003e\n\u003cdiv\u003e\n \u003ctable border=\"0\" cellspacing=\"0\" cellpadding=\"0\" width=\"439\"\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003eVideo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eBook Segment (Chapter/Title)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e1\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eChapter 1: Introduction to AI\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e2\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eChapter 1: Introduction to AI\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e3\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eChapter 2: Machine Learning Basics\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e4\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eChapter 2: Machine Learning Basics\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e5\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n 
\u003cp\u003eChapter 3: Neural Networks\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd\u003e\n \u003cp\u003e6\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd\u003e\n \u003cp\u003eChapter 4: Deep Learning for Computer Vision\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003eTo illustrate the relationship between the generated videos and the book segments, we present an example of how the videos are distributed across different sections of the book. As shown in Table 1, when the total number of videos (6) exceeds the number of sections (4), the videos are distributed so that each section is represented as evenly as possible. Initially, one video is assigned to every section. The process is then repeated, with sections receiving a second video in order until all requested videos have been allocated; in this example, Chapters 1 and 2 each receive two videos, while Chapters 3 and 4 each receive one. This distribution strategy ensures balanced coverage of all sections, while preserving a logical flow throughout the content.\u003c/p\u003e\n\u003ch2\u003eC. The AI Assistant\u003c/h2\u003e\n\u003cp\u003eThe AI assistant serves as the core component of our application, enabling efficient extraction of structured information from books. Unlike conventional request-response interactions with AI models, AI assistants are advanced software-driven systems capable of querying text documents while providing precise references to extracted information. In our implementation, we utilize OpenAI’s gpt-4o-mini assistant to process book chapters and extract essential components for video generation. The assistant operates on the OpenAI cloud, allowing multiple queries throughout the video production pipeline. To facilitate seamless interaction, we encapsulated the assistant's functionality within a dedicated function, \u003cem\u003eread_book()\u003c/em\u003e, which dynamically receives specific instructions to extract relevant data. 
This modular design ensures flexibility, allowing the assistant to adapt to different extraction tasks based on user-defined input parameters.\u003c/p\u003e\n\u003cp\u003eThe process of creating and utilizing the AI Assistant involves the following steps:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eAssistant Creation: If an assistant has not yet been created, it is instantiated with the instruction: \"You are an expert at extracting information from books.\"\u003c/li\u003e\n \u003cli\u003eBook Upload and Vectorization: The book is uploaded, and a vector store is generated to enable efficient querying.\u003c/li\u003e\n \u003cli\u003eQuery Execution: The assistant is provided with specific instructions to extract the required information from the vector store.\u003c/li\u003e\n \u003cli\u003eAssistant Deletion: Once the required information is extracted and recorded, the assistant for the book segment is deleted.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eTo enhance adaptability, we introduced a novel \u003cstrong\u003eInstruction Building Function\u003c/strong\u003e methodology, which dynamically incorporates custom variables into the assistant’s queries. These functions generate instructions based on specific video generation requirements, ensuring that the assistant retrieves content relevant to each video segment. For instance, as demonstrated in Fig. 4, an instruction-building function can take a video title as input, construct a structured query for generating a corresponding video script, and subsequently send it to the assistant.\u003c/p\u003e\n\u003cp\u003eThis approach enables precise content extraction for the needs of AI-generated educational videos, improving the relevance and coherence of the output.\u003c/p\u003e\n\u003ch2\u003eD. Data Structures\u003c/h2\u003e\n\u003cp\u003eAs illustrated in Fig. 
2, \u003cstrong\u003eVidAIQ\u003c/strong\u003e leverages AI assistants to extract relevant information from book segments and structure it into a Python dictionary.\u003c/p\u003e\n\u003cp\u003eFig. 5 presents the data structure used to organize all video-related information. Each video is assigned a unique title based on its topic and includes a reference to the corresponding book segment from which the content is derived.\u003c/p\u003e\n\u003cp\u003eSince each user-requested video consists of multiple scenes, the dictionary is structured to reflect this hierarchy. Every video entry contains a \u003cem\u003etranscript\u003c/em\u003e dictionary, where scenes are identified by their respective scene numbers. Each scene includes a \u003cem\u003escript\u003c/em\u003e field, which serves as the basis for audio narration during video generation. To further enhance customization, when generating the \u003cem\u003escript\u003c/em\u003e, we pass the user’s preferred teaching style into the request prompt.\u003c/p\u003e\n\u003cp\u003eAdditionally, to accommodate educational content that includes programming examples, we incorporate a \u003cem\u003ecode\u003c/em\u003e field that captures relevant snippets, if present. The \u003cem\u003elanguage\u003c/em\u003e field specifies the programming language associated with the code. Both \u003cem\u003escript\u003c/em\u003e and \u003cem\u003ecode\u003c/em\u003e are stored as lists to maintain the correct association between explanatory text and its corresponding code snippet. This structured format ensures a seamless synthesis of narration and code presentation within the generated video content.\u003c/p\u003e\n\u003cp\u003eWhen working with AI-generated content, it is important to distinguish between explanatory text and code snippets. To achieve this, we use a structured approach to extract and format both elements separately. 
The method relies on detecting code blocks enclosed within triple backticks (```), ensuring that code is properly separated from textual explanations.\u003c/p\u003e\n\u003cp\u003eThe process involves scanning the response using a pattern that identifies these backticks and extracting the enclosed code. The remaining text is then split accordingly to ensure that explanatory sections remain distinct from the code. Additionally, if any code blocks are missing backticks, the method attempts to reconstruct the missing sections by analyzing the surrounding text. By implementing this structured extraction method, we ensure that explanatory text and code in AI-generated content are reliably separated.\u003c/p\u003e\n\u003ch2\u003eE. Video Elements Generation\u003c/h2\u003e\n\u003cp\u003eOnce all relevant information is stored in our structured dictionary, we utilize it to generate the necessary video components, which are subsequently assembled into a complete video.\u003c/p\u003e\n\u003ch3\u003e1) Audio Generation\u003c/h3\u003e\n\u003cp\u003eTo transform the generated video script into high-quality narration, we employ OpenAI’s Text-to-Speech (TTS) model, which converts text into an MP3 audio file. The model provides a selection of natural-sounding voices, allowing us to specify a preferred voice during the request. Audio files are generated iteratively for each video scene, ensuring synchronization with the scene’s content. To maintain organization and facilitate seamless assembly, the MP3 filenames include both the scene number and the video title, enabling precise identification of each audio segment.\u003c/p\u003e\n\u003ch3\u003e2) Slides Generation\u003c/h3\u003e\n\u003cp\u003eTo enhance engagement and maintain visual diversity, we designed a collection of over 30 high-resolution slide templates for our educational videos. Each template follows a unique design to ensure variety throughout the video. 
To populate these slides with scene-specific content, we developed a Python function that leverages AI to generate key slide elements, including titles, bullet points, and structured text. The scene script serves as input, and AI generates the appropriate slide components.\u003c/p\u003e\n\u003cp\u003eTo facilitate automated extraction of these elements from the AI-generated response, we implemented a structured format where each element is prefixed with a special symbol (e.g., “~”), as illustrated in Fig. 6.\u003c/p\u003e\n\u003cp\u003eIn addition, we wanted to ensure that our slides included relevant images to enhance the visual appeal and clarity of the videos. To achieve this, we employed two primary approaches: AI-generated images using DALL-E and an icon search tool called \u003cstrong\u003eFreepik\u003c/strong\u003e. DALL-E is an advanced AI model developed by OpenAI that generates high-quality images based on textual descriptions. Freepik, on the other hand, is a widely used platform that provides access to a vast collection of high-quality icons, vector graphics, and stock images. Through its API, we can search for and integrate relevant icons and illustrations that complement the generated images and enhance the slides.\u003c/p\u003e\n\u003cp\u003eTo determine the most appropriate image for each scene, we utilized an LLM to generate relevant search terms based on the scene script. This allowed us to automate the process of selecting appropriate keywords for both AI-generated images and icon retrieval. As output, we received direct links to the most relevant images and icons, ensuring that every visual element in the slides aligns with the content of the video scene.\u003c/p\u003e\n\u003cp\u003eAfter extracting the slide metadata using the methodologies described above, we employed the Python library Pillow to render the content onto the predefined slide templates, as illustrated in Fig. 7. 
Given that each slide template contained distinct text and graphical elements, we developed a dedicated Python function to dynamically position these elements based on the template structure. While the selection of slide templates was randomized to introduce variety, we implemented a tracking mechanism to ensure that no template was repeated within the same video, maintaining visual diversity and coherence.\u003c/p\u003e\n\u003ch3\u003e3) Figures Selection\u003c/h3\u003e\n\u003cp\u003eTo enhance the visual component of our application, we incorporate figures and images using two primary approaches:\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eLocal images extracted from the book segment\u003c/li\u003e\n \u003cli\u003eWeb images retrieved using the Google Search API\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eWe prioritize local images from the book segment, as they are directly relevant to the content. However, in cases where a book contains few or no images, we supplement the visuals with web-sourced images. To achieve this, we use the Google Search API to retrieve images based on a keyword that is dynamically generated using AI. This AI-driven keyword selection helps ensure that the retrieved images align with the specific scene’s topic. Once an image is selected, it is resized and formatted to fit within the video.\u003c/p\u003e\n\u003cp\u003eFig. 8 demonstrates how we leverage the OpenAI Vision API to refine the spoken script accompanying displayed figures. Initially, the scene’s script is generated based solely on the topic and book content, without accounting for specific figures. This can lead to a disconnect between the narration and the visual elements, as viewers expect a direct explanation of any displayed figures.\u003c/p\u003e\n\u003cp\u003eTo address this, we integrate the OpenAI Vision API, which processes the selected figure and generates an enhanced script tailored to both the scene’s topic and the figure’s content. 
This approach ensures that when a figure is shown, the narration dynamically adapts to describe and contextualize it within the broader educational content. Our findings indicate that this significantly improves the coherence and engagement of the video, making the instructional material more informative and visually aligned.\u003c/p\u003e\n\u003ch3\u003e4) Code Display\u003c/h3\u003e\n\u003cp\u003eMany educational books include programming code, making it essential to incorporate code demonstrations in video content. Rather than displaying static code snippets, we chose to dynamically type the code on the screen, mimicking the way an instructor would type during a live programming tutorial. This approach enhances engagement and maintains viewer interest.\u003c/p\u003e\n\u003cp\u003eFig. 9 depicts the pseudocode design used to build our code display. To optimize video performance while preserving the typing effect, we analyzed the ideal frame rate required for smooth visual representation. We determined that 12 frames per second (fps) was the optimal rate, ensuring a realistic typing experience without excessive computational overhead.\u003c/p\u003e\n\u003cp\u003eTo maintain readability and visual coherence, our approach involves automatic text wrapping, font size adjustments, and structured text placement. The text wrapping functionality ensures that no line exceeds 80 characters, preserving readability across different screen sizes. Additionally, we dynamically adjust font size based on the length of the displayed code, ensuring that shorter code blocks appear larger for better visibility while longer code segments remain within readable constraints.\u003c/p\u003e\n\u003cp\u003eFig. 10 depicts the code writing display. In each code display we extract the programming language directly from the video transcript dictionary and display it prominently at the top of the screen. 
The code is then typed line by line, with individual characters appearing sequentially. Line numbers are displayed alongside the code to facilitate reference, and a vertical separator is included to visually distinguish the line numbers from the code. To reinforce realism, a subtle typing sound effect is synchronized with the text appearance, beginning after the introductory delay and ending before the final silent display.\u003c/p\u003e\n\u003ch2\u003eF. Video Rendering and Compilation\u003c/h2\u003e\n\u003cp\u003eOnce all video components, including slides, audio, code displays, and figures, are generated and organized, they are assembled into a complete video. The assembly is performed using MoviePy, a widely adopted Python library for video processing, enabling efficient handling of media files and merging of components. Each scene’s visual and audio elements are carefully aligned based on predefined metadata, ensuring accurate timing and smooth transitions throughout the video.\u003c/p\u003e\n\u003cp\u003eTo maintain consistency in organization, each visual and audio component is labeled using a structured naming convention that includes the scene number and video title. This approach facilitates automated retrieval and systematic sequencing of all resources. The rendering process begins with an introductory image, which serves as an initial frame, followed by the sequential integration of scene-specific elements. As shown in Fig. 11, for each scene, the corresponding visual content (a static slide, a dynamically generated code display, or a pre-rendered animation) is matched with the associated audio narration. If the visual component is a static image, it is processed as a still frame with a duration equivalent to the scene's audio length. 
In cases where the visual component is an animated segment, such as a code display with a typing effect, it is directly merged with the corresponding audio track.\u003c/p\u003e\n\u003cp\u003eThe compilation process leverages multi-threaded processing to enhance performance and minimize rendering time. Video segments are concatenated using a frame rate of 12 frames per second (fps), which was determined to be optimal for preserving smooth visual transitions while maintaining computational efficiency. The final video is encoded using the H.264 codec for high-quality compression, ensuring compatibility across various playback platforms. Upon completion of the rendering process, temporary video components are deleted to optimize storage usage, and the workflow proceeds to the generation of the next video.\u003c/p\u003e\n\u003cp\u003eAfter the videos are created, the system automatically generates a text file containing the video description, title, and timestamps. The AI-driven process produces a description and title that optimize the content's discoverability. Timestamps are added to help viewers easily navigate to specific sections of the video. The text file with these details is then saved for easy use when uploading the video to platforms like YouTube.\u003c/p\u003e\n\u003ch2\u003eG. User Interface\u003c/h2\u003e\n\u003cp\u003eThe user interface of the application is developed using Flask, a lightweight Python web framework, which facilitates the integration of back-end logic and front-end elements, as illustrated in Fig. 1. The main entry point for users is an index page, which presents a simple form. This form allows users to upload files, select a teaching style, choose a video type, and specify the number of videos to generate from the uploaded content.\u003c/p\u003e\n\u003cp\u003eUpon successful file upload, the process route handles the form submission and invokes the necessary back-end logic. 
The uploaded file is saved in the designated folder, and user selections such as the number of videos, teaching style, and video type are retrieved from the form. These inputs are then passed to the back-end, which processes the content based on the user's preferences.\u003c/p\u003e"},{"header":"RESULTS AND DISCUSSION","content":"\u003cp\u003eTo assess the effectiveness of AI-generated video tutorials, we conducted a comparative analysis against human-produced educational content. The evaluation considered key performance metrics, including engagement, view count, watch time, and revenue generated from viewership. Additionally, we examined the content generation process in terms of production time for a four-hour tutorial, video quality, slide variability, and the total AI resource cost associated with generating a tutorial.\u003c/p\u003e\n\u003ch2\u003eA. Efficiency in Content Generation\u003c/h2\u003e\n\u003cp\u003eTo quantify the efficiency gains of AI-driven video creation, we performed an experiment in which an experienced Python instructor was tasked with developing a four-hour tutorial. This process involved creating slides, recording the video content, editing, and rendering the final tutorial. The total time required for the instructor to complete the tutorial was approximately 24 hours. In contrast, VidAIQ, our AI-powered software, generated a comparable tutorial, including similar coding examples, slides, and figures, in only 3.75 hours. This result indicates a significant improvement, reducing content production time by a factor of roughly six (24 / 3.75 = 6.4), as illustrated in Fig. 12.\u003c/p\u003e\n\u003cp\u003eTo further validate these findings, Fig. 13 depicts an additional 11 video tutorials we generated, each ranging from 3.6 to 4.4 hours in duration. The average processing time across all tutorials was approximately 3.64 hours, closely aligning with the results obtained in the initial experiment. 
This consistency demonstrates the reliability of the AI-driven approach in significantly reducing the time required for educational content creation while maintaining structured and comprehensive video output.\u003c/p\u003e\n\u003ch2\u003eB. Cost Analysis of AI-Generated Tutorials\u003c/h2\u003e\n\u003cp\u003eIn addition to efficiency improvements, we analyzed the cost associated with generating AI-powered tutorials. Since VidAIQ integrates multiple AI models, we assessed the individual cost contributions of each component, as shown in Fig. 14. Among these, the text-to-speech model incurred the highest cost, as it was responsible for generating the audio narration. In contrast, the vision model, which was used to create textual descriptions for figures, had a significantly lower cost. The GPT-4o-mini model, which facilitated content extraction from books and text transformations, also accounted for a relatively low cost.\u003c/p\u003e\n\u003cp\u003eThe average cost breakdown for a four-hour tutorial was as follows: the vision model averaged $0.22, indicating that image analysis is computationally inexpensive. The GPT-4o-mini model incurred an average cost of $0.53, while the text-to-speech model had the highest cost at $4.29. This resulted in a total AI resource cost of $5.04 per four-hour tutorial.\u003c/p\u003e\n\u003cp\u003eFor comparison, based on salary data (ZipRecruiter, 2025), the minimum wage for an inexperienced video editor is approximately $8 per hour in 2025. Given that human content creation required 24 hours, the labor cost for an equivalent tutorial would be at least $192 (24 hours at $8 per hour), making the AI-generated alternative approximately 38 times more cost-efficient ($192 / $5.04). This substantial reduction in cost highlights the economic viability of AI-driven educational content creation, making high-quality tutorial production more accessible and scalable.\u003c/p\u003e\n\u003ch2\u003eC. 
Video Engagement\u003c/h2\u003e\n\u003cp\u003eTo evaluate the engagement levels of AI-generated video tutorials in comparison to human-produced tutorials, we conducted an empirical study on YouTube, a widely used video-sharing platform. The primary objective was to assess whether AI-generated content could achieve similar or superior engagement levels relative to human-created tutorials. To this end, we uploaded full-length video tutorials over the course of one month and systematically analyzed key engagement metrics, including view count, watch time, revenue generation, and audience feedback. Our analysis focused on the first 10 days after publication to establish an initial comparison between AI and human-generated content.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;Fig. 15 presents the comparative view count trends for two AI-generated and two human-generated tutorials. While a noticeable spike in viewership was observed for human-generated videos on the third day, the overall trend remained similar for both categories. The data indicates no statistically significant difference in the number of views between AI-generated and human-produced tutorials, suggesting that the audience was equally receptive to both content types. This finding implies that AI-generated educational videos do not suffer from an inherent disadvantage in terms of attracting viewers when compared to human-created tutorials.\u003c/p\u003e\n\u003cp\u003eWatch time is a critical metric that reflects user engagement and retention on a given video. As illustrated in Fig. 16, the difference in average watch time between AI-generated and human-generated tutorials was marginal. Interestingly, the AI-generated tutorials exhibited a slightly higher average watch time than their human-generated counterparts. 
Because the difference was not statistically significant, it is best read as evidence that AI-generated content maintains comparable engagement levels.\u003c/p\u003e\n\u003cp\u003eA potential explanation for this result is the structured and concise nature of AI-generated content. Unlike human-created tutorials, which may vary in pacing, audio quality, and instructional clarity, AI-generated videos maintain a consistent level of quality throughout. The automated production process ensures uniform slide formatting, refined explanations, and standardized audio narration, which may contribute to sustained viewer engagement despite the absence of a human presenter.\u003c/p\u003e\n\u003cp\u003eThe revenue generated through YouTube AdSense was also analyzed as an indirect measure of content performance. As shown in Fig. 17, AI-generated tutorials consistently outperformed human-generated videos in terms of revenue generation during the first 10 days after publication. On average, AI-generated videos earned $1.75, roughly 3.9 times the $0.45 generated by human-created tutorials.\u003c/p\u003e\n\u003cp\u003eThis result was unexpected, given that the viewership levels were similar between the two categories. Differences in watch time, audience retention, and ad placement may have contributed to the revenue disparity. Further research is required to isolate the precise factors behind the higher revenue from AI-generated tutorials.\u003c/p\u003e\n\u003cp\u003eTo assess user preference, we analyzed the like-to-view ratio, a widely recognized engagement metric that measures the proportion of viewers who actively liked a video. According to industry data, the average like-to-view ratio on YouTube across 116 million videos is 2.28% (Marketing Charts, 2020). 
For human-generated tutorials on our test channel, the average like-to-view ratio was 4.85%, whereas AI-generated tutorials achieved a higher ratio of 5.61%.\u003c/p\u003e\n\u003cp\u003eThis finding suggests that AI-generated tutorials were not only well received but also exhibited a higher user engagement rate than both the channel's human-created tutorials and the general YouTube benchmark. The increased like-to-view ratio further reinforces the hypothesis that AI-generated content maintains a comparable level of audience approval relative to traditional human-generated tutorials.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn this study, we have demonstrated the potential of AI-driven educational content creation through our tool, VidAIQ. Our findings highlight the efficiency and cost-effectiveness of AI-based solutions for generating educational video tutorials, which can surpass traditional human-driven methods on both measures. VidAIQ\u0026apos;s integration of AI technologies, such as text-to-speech, large language models, and vision models, enables a substantial reduction in content production time (from 24 hours of manual effort to 3.75 hours) while maintaining high-quality outputs.\u003c/p\u003e\n\u003cp\u003eThe evaluation of AI-generated tutorials, compared to human-produced content, reveals that AI can achieve comparable levels of viewer engagement and revenue generation. AI-produced content cut production costs roughly 38-fold while matching, if not exceeding, human-made content on engagement metrics such as view counts and watch times. Furthermore, the higher like-to-view ratios observed in AI-generated videos suggest positive audience reception, indicating that AI-driven content delivery can hold its ground against conventional instructional formats.\u003c/p\u003e\n\u003cp\u003eLooking ahead, there is a wealth of opportunities to refine these technologies. 
Future research could explore more sophisticated AI models, enabling even greater contextual adaptation and personalized learning experiences. Integration with real-time analytics to dynamically adapt content according to viewer engagement patterns represents another promising direction. Such advancements can further democratize education by providing accessible, tailored learning materials to diverse audiences worldwide. This work sets an important precedent, allowing educators to focus on course design and student interaction while leaving labor-intensive content creation to AI tools such as VidAIQ.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAuthor Contributions\u003c/strong\u003e M.Y. wrote the main manuscript text and designed the software application. J.K. prepared the Slides Generation software module. All authors reviewed the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.\u003c/p\u003e\n"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eA. Jayanthiladevi, A. G. (2020). AI in Video Analysis, Production and Streaming. \u003cem\u003eJournal of Physics: Conference Series\u003c/em\u003e. doi:10.1088/1742-6596/1712/1/012014\u003c/li\u003e\n\u003cli\u003eal., C. C. (2022). How Deepfakes and Artificial Intelligence Could Reshape the Advertising Industry. \u003cem\u003eJournal of Advertising Research\u003c/em\u003e. doi:10.2501/JAR-2022-017\u003c/li\u003e\n\u003cli\u003eAymane Ezzaim, A. D. (2022). AI-Based Adaptive Learning: A Systematic Mapping of the Literature. \u003cem\u003eThe Journal of Universal Computer Science\u003c/em\u003e. doi:10.1016/j.caeai.2021.100017\u003c/li\u003e\n\u003cli\u003eBalti, M., \u0026amp; al., G. S. (2023). AI Based Video and Image Analytics. \u003cem\u003eIEEE\u003c/em\u003e. 
doi:10.1109/INISTA59065.2023.10310403\u003c/li\u003e\n\u003cli\u003eBayne, S. (2015). Teacherbot: interventions in automated teaching. \u003cem\u003eTeaching in Higher Education\u003c/em\u003e. doi:10.1080/13562517.2015.1020783\u003c/li\u003e\n\u003cli\u003eC.V, A. (2018). A Survey on Collaborative Learning Approach for Speech and Speaker Recognition. \u003cem\u003e3rd National Conference on Image Processing, Computing, Communication, Networking and Data Analytics.\u003c/em\u003e doi:10.21467/proceedings.1.34\u003c/li\u003e\n\u003cli\u003eChristopher Collins, D. D. (2021). Artificial intelligence in information systems research: A systematic literature review and research agenda. \u003cem\u003eInternational Journal of Information Management\u003c/em\u003e. doi:10.1016/j.ijinfomgt.2021.102383\u003c/li\u003e\n\u003cli\u003eCihan Orak, Z. T. (2024). Using Artificial Intelligence In Digital Video Production: A Systematic Review Study. \u003cem\u003eJournal of Educational Technology and Online Learning\u003c/em\u003e. doi:10.31681/jetol.1459434\u003c/li\u003e\n\u003cli\u003eCox, T. (2018). Digital Video as a Personalized Learning Assignment: A Qualitative Study of Student Authored Video using the ICSDR Model. \u003cem\u003eJournal of Scholarship of Teaching and Learning\u003c/em\u003e. doi:10.14434/josotl.v18i1.21027\u003c/li\u003e\n\u003cli\u003eFaycal Farhi, R. J. (2022). How do Students Perceive Artificial Intelligence in YouTube Educational Videos Selection? A Case Study of Al Ain City. \u003cem\u003eInternational Journal of Emerging Technologies in Learning\u003c/em\u003e. doi:10.3991/ijet.v17i22.33447\u003c/li\u003e\n\u003cli\u003eGao, H. (2022). Online AI-Guided Video Extraction for Distance Education with Applications. \u003cem\u003eMathematical Problems in Engineering\u003c/em\u003e. doi:10.1155/2022/5028726\u003c/li\u003e\n\u003cli\u003eMarie-Luce Bourguet, Y. J.-A. (2017). 
Social Robots that can Sense and Improve Student Engagement. \u003cem\u003eIEEE\u003c/em\u003e. doi:10.1109/TALE48869.2020.9368438\u003c/li\u003e\n\u003cli\u003eMarketing Charts. (2020). \u003cem\u003eYouTube Influencer Engagement Rate Benchmarks: What Are Good Rates?\u003c/em\u003e Retrieved from https://www.marketingcharts.com/digital/video-112775.\u003c/li\u003e\n\u003cli\u003eOlga Tapalova, N. Z. (2022). Artificial Intelligence in Education: AIEd for Personalised Learning Pathways. \u003cem\u003eEJEL\u003c/em\u003e.\u003c/li\u003e\n\u003cli\u003ePellas, N. (2023). The influence of sociodemographic factors on students\u0026apos; attitudes toward AI-generated video content creation. \u003cem\u003eSmart Learning Environments\u003c/em\u003e. doi:10.1186/s40561-023-00276-4\u003c/li\u003e\n\u003cli\u003ePengyuan Zhou, L. W. (2024). A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming. doi:10.48550/arXiv.2404.16038\u003c/li\u003e\n\u003cli\u003ePluzhnikova, N. N. (2024). Technologies of Artificial Intelligence in Educational Management. \u003cem\u003eIEEE\u003c/em\u003e. doi:10.1109/EMCTECH49634.2020.9261561\u003c/li\u003e\n\u003cli\u003eRana AlShaikh, N. A.-M. (2024). The implementation of the cognitive theory of multimedia learning in the design and evaluation of an AI educational video assistant utilizing large language models. \u003cem\u003eHeliyon\u003c/em\u003e. doi:10.1016/j.heliyon.2024.e25361\u003c/li\u003e\n\u003cli\u003eSahil Kumar, V. G. (2021). Role of Artificial Intelligence in Generating Video. \u003cem\u003eIEEE\u003c/em\u003e. doi:10.1109/ICACCS54159.2022.9785336\u003c/li\u003e\n\u003cli\u003eTony Belpaeme, J. K. (2018). Social robots for education: A review. \u003cem\u003eScience Robotics\u003c/em\u003e. doi:10.1126/scirobotics.aat5954\u003c/li\u003e\n\u003cli\u003eTorbj\u0026oslash;rn Netland, O. v. (2024). 
Comparing human-made and AI-generated teaching videos: An experimental study on learning effects. \u003cem\u003eComputers \u0026amp; Education\u003c/em\u003e. doi:10.1016/j.compedu.2024.105164\u003c/li\u003e\n\u003cli\u003eVijeta Sharma, M. G. (2019). Video Processing Using Deep Learning Techniques: A Systematic Literature Review. \u003cem\u003eIEEE Access\u003c/em\u003e. doi:10.1109/ACCESS.2021.3118541\u003c/li\u003e\n\u003cli\u003eYanev, M. (2024). \u003cem\u003eBuilding AI Applications with OpenAI APIs.\u003c/em\u003e Packt Publishing.\u003c/li\u003e\n\u003cli\u003eYidi Zhang, M. L.-h. (2024). The effect of student acceptance on learning outcomes: AI-generated short videos versus paper materials. \u003cem\u003eComputers and Education: Artificial Intelligence\u003c/em\u003e. doi:10.1016/j.caeai.2024.100286\u003c/li\u003e\n\u003cli\u003eYin Wang, P. L. (2022). Development and Strategy Analysis of Short Video News Dissemination under the Background of Artificial Intelligence. \u003cem\u003eMobile Information Systems\u003c/em\u003e. doi:10.1155/2022/2750925\u003c/li\u003e\n\u003cli\u003eYueliang Wu, A. Y. (2023). Artificial intelligence for video game visualization, advancements, benefits and challenges. \u003cem\u003eMathematical Biosciences and Engineering\u003c/em\u003e. doi:10.3934/mbe.2023686\u003c/li\u003e\n\u003cli\u003eZipRecruiter. (2025). \u003cem\u003eVideo Maker Salary\u003c/em\u003e. 
Retrieved from https://www.ziprecruiter.com/Salaries/Video-Maker-Salary.\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Fitchburg State University","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":false,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"AI video production, AI-driven learning, Python, MoviePy, AI Assistants, DALL-E, Vision AI, educational video automation, EdTech","lastPublishedDoi":"10.21203/rs.3.rs-6379057/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6379057/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eThe integration of AI into educational content creation offers transformative opportunities for dynamic and interactive learning. VidAIQ, an innovative AI application, that converts static book content into long-form educational videos using OpenAI's APIs for audio, text, and multimedia processing. The system leverages Python libraries such as MoviePy for video editing while using AI Assistants to structure text into video scripts, generate visuals with DALL-E and Vision AI, and create live coding demonstrations. 
These elements are compiled into structured, engaging tutorials that enhance knowledge retention.\u003c/p\u003e \u003cp\u003eOur analysis showed that VidAIQ outperformed traditional human-made content, reducing production time from 24 hours to 3.75 hours per four-hour video tutorial\u0026mdash;a sixfold efficiency boost. AI-generated content cost just \u003cspan\u003e$\u003c/span\u003e5.04 per video, compared to \u003cspan\u003e$\u003c/span\u003e192 for human-produced tutorials. Engagement metrics collected over 30 days showed that AI-driven videos maintained similar viewership but with higher watch times, indicating slightly higher engagement. User preferences were leaning towards AI-generated content, with a higher like-to-view ratio of 5.61%, outperforming both the 4.85% for human videos and the industry average of 2.28%. These results demonstrate AI\u0026rsquo;s ability to create compelling educational experiences with similar or higher overall performance compared to human-generated video content. Future research is needed to explore performance trends over extended timeframes.\u003c/p\u003e \u003cp\u003eThis study demonstrates how AI-driven technologies can enhance the speed and scale of producing high-quality tutorials without compromising engagement or educational effectiveness. 
It achieves a remarkable balance of cost-effectiveness, allowing educational resources to be scaled to meet diverse learning needs.\u003c/p\u003e","manuscriptTitle":"AI-Driven Conversion of Textbooks into Long-Form Educational Videos","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-04-08 09:27:01","doi":"10.21203/rs.3.rs-6379057/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"33e9f35f-158a-476b-99a6-b11aa39ade37","owner":[],"postedDate":"April 8th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":46713150,"name":"Artificial Intelligence and Machine Learning"},{"id":46713151,"name":"Software Engineering"}],"tags":[],"updatedAt":"2025-04-08T09:27:02+00:00","versionOfRecord":[],"versionCreatedAt":"2025-04-08 09:27:01","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6379057","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6379057","identity":"rs-6379057","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
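
The headline ratios quoted in the abstract and in the Results section can be rechecked from the reported figures alone. The following is a minimal sketch of that arithmetic, not part of VidAIQ; the constant and function names (`HUMAN_HOURS`, `AI_COSTS_USD`, `summarize`) are hypothetical and chosen only for illustration, while the numeric inputs are taken from the text.

```python
from decimal import Decimal

# Figures as reported in the paper for one four-hour tutorial.
# All names below are illustrative assumptions, not VidAIQ identifiers.
HUMAN_HOURS = Decimal("24")
AI_HOURS = Decimal("3.75")
AI_COSTS_USD = {
    "vision": Decimal("0.22"),
    "gpt-4o-mini": Decimal("0.53"),
    "text-to-speech": Decimal("4.29"),
}
EDITOR_WAGE_USD_PER_HOUR = Decimal("8")  # ZipRecruiter (2025) figure cited in the text
AI_REVENUE_USD = Decimal("1.75")         # average over the first 10 days
HUMAN_REVENUE_USD = Decimal("0.45")
AI_LIKE_RATIO_PCT = Decimal("5.61")
BENCHMARK_LIKE_RATIO_PCT = Decimal("2.28")


def summarize() -> dict:
    """Derive the comparison ratios from the reported raw figures."""
    ai_total = sum(AI_COSTS_USD.values(), Decimal("0"))
    human_cost = HUMAN_HOURS * EDITOR_WAGE_USD_PER_HOUR
    return {
        "speedup": HUMAN_HOURS / AI_HOURS,
        "ai_total_usd": ai_total,
        "human_cost_usd": human_cost,
        "cost_ratio": human_cost / ai_total,
        "revenue_ratio": AI_REVENUE_USD / HUMAN_REVENUE_USD,
        "like_ratio_vs_benchmark": AI_LIKE_RATIO_PCT / BENCHMARK_LIKE_RATIO_PCT,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

`Decimal` is used instead of `float` so that the dollar amounts add up exactly. The sketch reproduces only the arithmetic; it says nothing about the statistical significance of the engagement comparisons.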
