Exploratory Data Analysis (EDA) on Undergraduate Data Science Students Through R Programming

Research Article | Research Square preprint, Version 1

Ashish Katyal (St. Andrew's Institute of Technology and Management; corresponding author), Pankaj Kumar Sharma (Birla Institute of Technology and Science), Manoj Kannan (Plaksha University)

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-7422204/v1

This work is licensed under a CC BY 4.0 License.

Abstract

This study explores the use of exploratory data analysis (EDA) as a tool for experiential learning in the third-year AIML course "Introduction to R Programming" (PCC-CSE-354G). Conducted with undergraduate data science students, the research aimed to provide hands-on experience in data collection, manipulation, and visualization using R programming.
The dataset, comprising age, gender, height, weight, and physical activity status, was collected by the students themselves, working in three randomly assigned groups (Alpha, Beta, and Gamma) under instructor supervision. Height and weight were measured with measuring tapes and digital weighing machines to obtain precise values. The study used the R packages ggplot2, dplyr, and tidyr to perform EDA, focusing on descriptive and comparative analyses of team-based and gender-based patterns. The analysis examined relationships among age, physical characteristics, and activity status, revealing trends such as lower body weight among physically active participants and team-specific differences in gender composition. Correlation analysis and statistical testing were used to deepen the analysis and revealed weak but notable relationships between age and physical activity. Beyond these findings, the hands-on approach engaged students with real-world data and fostered teamwork, critical thinking, and technical proficiency in R programming. The findings demonstrate the effectiveness of integrating EDA into active learning frameworks and provide a blueprint for similar initiatives in data science curricula.

Keywords: Artificial Intelligence and Machine Learning; Exploratory Data Analysis (EDA); Problem-Based Learning (PBL); Experiential-Based Learning (EBL); Machine Learning; Artificial Intelligence; Integrated Course Design (ICD)

Figures

Figure 1. The nuanced alignments among these educational approaches and principles [Integrated Course Design (ICD), National Education Policy (NEP), Experiential-Based Learning (EBL), and Problem-Based Learning (PBL)]
Figure 2. The workflow for performing Exploratory Data Analysis (EDA) in the Introduction to R Programming (PCC-CSE-354G) course
Figure 3. Gender composition by team
Figure 4. Age distribution by team
Figure 5. Weight distribution by physical activity
Figure 6. Weight trends by team and activity status
Figure 7. Correlation heatmap for numeric variables
Figure 8. Height distribution by team
Figure 9. Physical activity participation by team

Introduction and Review of Literature

The evolving landscape of education demands innovative approaches that align pedagogy, policy, and practice to equip learners with the skills and knowledge required for the 21st century.
Four key frameworks, namely Integrated Course Design (ICD), India's National Education Policy (NEP), Experiential-Based Learning (EBL), and Problem-Based Learning (PBL), offer complementary perspectives that, when aligned, can create a transformative educational paradigm. ICD provides a structured foundation, NEP sets a policy-driven vision, EBL emphasizes experiential engagement, and PBL cultivates problem-solving abilities. Together, they form an integrated framework for holistic education that nurtures academic excellence, employability, innovation, and societal impact (Fig. 1).

Integrated Course Design (ICD) provides a structured approach to course development, emphasizing the alignment of learning objectives, activities, and assessments to create significant learning experiences (Fink, 2013; Katyal et al., 2024). This systematic design fosters deep, meaningful engagement by ensuring that all course elements work cohesively toward the intended educational outcomes.

India's National Education Policy (NEP), introduced in 2020, advocates an education system that is flexible, multidisciplinary, and rooted in experiential and inquiry-based learning. NEP emphasizes skill development, holistic growth, and inclusivity to prepare students for global challenges (Das & Das, 2024; Kulal et al., 2024; Mhavan et al., 2022).

Experiential-Based Learning (EBL) prioritizes hands-on, reflective learning through real-world experiences, encouraging learners to actively construct knowledge (Almeida & França, 2022; Bethell & Morgan, 2011; Rosenkranz, 2022). Grounded in Kolb’s experiential learning theory, EBL fosters practical skills, adaptability, and deeper understanding through active engagement (Kolb, 1984).

Problem-Based Learning (PBL) centers on solving complex, real-world problems, promoting collaboration, inquiry, and critical thinking.
By situating learning in authentic contexts, PBL prepares students for the challenges of the modern world (Hmelo-Silver, 2004; Katyal & Kannan, 2022; Paul et al., 2023).

In today’s data-driven world, proficiency in data analysis is a critical skill across industries and disciplines (Konkolova & Paralic, 2018). As organizations increasingly rely on data to make informed decisions, educational institutions have recognized the need to equip students with robust analytical skills (Hicks & Irizarry, 2018). This has led to the integration of data science courses into undergraduate curricula, giving students tools and techniques for handling, analyzing, and interpreting data effectively (Donohoo, 2017).

Exploratory Data Analysis (EDA) plays a foundational role in data science by enabling students to understand the structure, trends, and patterns within datasets (Beyer, 1981). It involves summarizing datasets both statistically and visually to uncover relationships, anomalies, and initial insights that inform further analysis (Wright et al., 2023). For students, EDA serves as an entry point into the practical application of theoretical concepts, bridging the gap between learning and real-world problem-solving (Tsai, 2024).

Moreover, the growing emphasis on experiential learning and active engagement in higher education aligns closely with the methods used in EDA (Allen, 2021; Barman et al., 2022). Hands-on activities, such as collecting and analyzing real-world data, allow students to take ownership of the learning process and encourage creativity, collaboration, and critical thinking (Forrester et al., 2022; Henrique Berssanette & Carlos De Francisco, 2021). Such experiential approaches not only deepen students’ understanding of course material but also prepare them for challenges they may encounter in their professional careers (Calderon et al., 2023; Masegosa et al., 2024; Tucker et al., 2023).
The inclusion of Exploratory Data Analysis (EDA) as a key learning outcome in this course lets students engage directly with real-world data, fostering an environment in which they can develop technical proficiency and critical thinking skills (Kesler et al., 2022). By involving students in the entire data analysis process, from data collection and cleaning to visualization and interpretation, the course bridges theoretical knowledge and practical application (Sakamaki et al., 2022). This aligns with the educational objectives of frameworks such as Fink’s Integrated Course Design (ICD), which emphasizes creating significant learning experiences through active engagement (Fink, 2013; Katyal et al., 2024).

Methodology

This research was conducted as part of the third-year AIML course "Introduction to R Programming" (PCC-CSE-354G) to provide students with experiential learning opportunities in exploratory data analysis (EDA). The methodology comprised the following steps.

A. Context and Participants

The study was carried out in a classroom setting, with students divided into three randomly assigned groups: Alpha, Beta, and Gamma. The participants were third-year undergraduate data science students enrolled in the course.

B. Data Collection Process

Data collection was designed to involve students actively in generating and handling their own datasets. The data were collected during a hands-on learning exercise in the Introduction to R Programming (PCC-CSE-354G) course, offered in the third year of the Artificial Intelligence and Machine Learning (AIML) program. The primary goal of this activity was to engage students in the complete process of data analysis, from raw data collection to advanced visualization and interpretation, in line with the principles of experiential and problem-based learning.

1. Group Formation and Data Gathering

To ensure diversity in the data collection process, students were divided into three randomly formed groups, each representing a "team." This team-based approach encouraged collaboration, coordination, and the development of interpersonal skills, as students worked together to gather data from their peers (Katyal et al., 2025). The instructor monitored and guided the entire process to ensure the accuracy and reliability of the collected data.

2. Instruments and Tools

The students used physical measurement tools, namely measuring tapes and digital weighing machines, to obtain precise values for height and weight. These tools were chosen for their simplicity, accessibility, and accuracy. Additional demographic data, including age, gender, and physical activity status, were collected through direct interviews conducted by the students themselves. Physical activity status was recorded as a binary (Yes/No) response to whether the student engaged in regular physical activity.

The data collection activity was designed as a practical experience that required students to address real-world challenges such as ensuring consistent measurements, handling variability among participants, and maintaining data integrity. By performing these tasks independently under supervision, students were exposed to the nuances of primary data collection, an essential skill in data science and research.

3. Ethical Considerations and Data Authenticity

To ensure ethical compliance and maintain a respectful learning environment, all participants were informed about the purpose of the data collection activity, and their consent was obtained before participation. The activity was conducted within the classroom, fostering a supportive atmosphere in which students could engage comfortably in the process.
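The variables and data-integrity concerns described under Instruments and Tools can be made concrete with a short R sketch. This is a minimal, hypothetical illustration, not the authors' code: it assumes one row per participant, codes the binary Yes/No activity response as a two-level factor, and flags rows whose measurements fall outside assumed plausible ranges so they can be re-measured. The column names (height_cm, weight_kg, active), the function names, the range limits, and all example values are assumptions invented for illustration; they are not the study's data. Base R is used so the sketch runs without the ggplot2, dplyr, and tidyr packages the study reports using.

```r
# Minimal sketch (hypothetical, not the authors' code): one row per participant.
# All values below are invented for illustration; they are NOT the study's data.
records <- data.frame(
  team      = c("Alpha", "Beta", "Gamma", "Alpha", "Gamma"),
  age       = c(20, 22, 21, 23, 20),
  gender    = c("Male", "Female", "Female", "Male", "Female"),
  height_cm = c(172.5, 160.0, 158.2, 181.0, 16.5),  # row 5: implausible entry
  weight_kg = c(68.4, 55.1, 50.3, 77.9, 52.0),
  active    = c("Yes", "No", "Yes", "no ", "Yes"),  # row 4: inconsistent coding
  stringsAsFactors = FALSE
)

# Normalize free-text Yes/No answers into a two-level factor;
# anything that is not "yes" or "no" after cleaning becomes NA.
normalize_yes_no <- function(x) {
  x <- tolower(trimws(x))
  out <- ifelse(x == "yes", "Yes", ifelse(x == "no", "No", NA_character_))
  factor(out, levels = c("No", "Yes"))
}

# Return the row indices whose measurements fall outside assumed plausible
# ranges, or that have a missing measurement or activity status.
flag_implausible <- function(df,
                             height_range = c(140, 210),  # assumed limits (cm)
                             weight_range = c(35, 150),   # assumed limits (kg)
                             age_range    = c(17, 30)) {  # assumed limits (years)
  out_of_range <-
    df$height_cm < height_range[1] | df$height_cm > height_range[2] |
    df$weight_kg < weight_range[1] | df$weight_kg > weight_range[2] |
    df$age < age_range[1] | df$age > age_range[2]
  # is.na() catches rows where a comparison was NA because a value was missing
  which(is.na(out_of_range) | out_of_range | is.na(df$active))
}

records$team   <- factor(records$team, levels = c("Alpha", "Beta", "Gamma"))
records$gender <- factor(records$gender)
records$active <- normalize_yes_no(records$active)

flagged <- flag_implausible(records)
print(records[flagged, ])  # rows to re-measure before analysis
```

In this toy data, the fourth row's inconsistently typed "no " is normalized to No, and the fifth row's implausible height of 16.5 cm (the kind of error a slipped decimal point produces) is flagged. Running such checks right after data entry, rather than during analysis, keeps recording errors from propagating into the summaries and plots.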
The authenticity of the data was further reinforced by having students measure and record data from their classmates in real time. This approach helped ensure the accuracy of the measurements and helped students appreciate the importance of firsthand data acquisition, as well as the errors that can arise from improper handling or interpretation.

C. Alignment with Learning Objectives

This data collection exercise was closely aligned with the learning objectives of the course, giving students experiential exposure to data gathering, cleaning, and preprocessing. The process also demonstrated how primary data acquisition forms the backbone of Exploratory Data Analysis (EDA). The activity laid the foundation for students to apply R programming techniques to analyze and visualize the data they had collected themselves, offering an authentic, contextually relevant experience that bridged theoretical knowledge and practical application.

D. Data Analysis Workflow

The primary aim of this study was to give third-year undergraduate students in the Artificial Intelligence and Machine Learning (AIML) program a comprehensive understanding of Exploratory Data Analysis (EDA) through a hands-on, experiential learning approach. By engaging students in the full cycle of data collection, cleaning, analysis, and visualization, the study sought to align with modern pedagogical methods that emphasize practical skill development and critical thinking (Fig. 2).

E. Active Learning Approach

The methodology emphasized hands-on learning and active student participation. Students gained practical experience by collecting and analyzing data to which they could relate personally. Using R for EDA allowed them to strengthen their technical skills and to interpret data-driven insights. Collaboration within groups fostered teamwork, critical thinking, and real-world problem-solving.

F. Data Analysis

This study encouraged students to explore relationships within the collected dataset. Analyzing age, gender, height, weight, and physical activity, students were tasked with:

1. Identifying patterns and correlations among variables, such as the relationship between physical activity and weight or height.
2. Investigating team-specific trends and gender-based differences in the data.
3. Visualizing data to communicate findings clearly and effectively.

Results and Discussion

The main findings of the exploratory data analysis (EDA) conducted on the student-collected data are summarized below.

A. Demographic Insights

The dataset consisted of 15 participants from three groups (Alpha, Beta, and Gamma) with varying age, gender, and physical activity status. Team Alpha showed a slight male predominance, while Team Gamma had the highest proportion of female participants (Fig. 3). Ages ranged from 20 to 23 years, with the mean age varying slightly across teams: Team Beta had the highest average age, while Teams Alpha and Gamma showed similar distributions (Fig. 4). The varied composition of the teams reflects the random assignment process; random assignment reduces the risk of systematic differences between teams, although with only 15 participants it cannot rule them out.

B. Physical Activity and Anthropometric Relationships

Participants who engaged in regular physical activity generally had lower weights, irrespective of their height (Fig. 5). Team Gamma, which had the highest percentage of physically active members, also had the lowest average weight, whereas Team Alpha showed comparatively high variability in weight among its inactive participants (Fig. 6). These trends are consistent with literature suggesting that physical activity supports weight management, although individual lifestyle factors and measurement conditions may also contribute to the observed variation (Bradley et al., 2022; Jakicic, 2009).

C. Correlation Analysis

Correlations between age and height or weight were weak (Fig. 7), which is expected given the narrow age range of the sample (20 to 23 years): with so little variation in age, there is little room for age to co-vary with body measurements. Moderate negative correlations were found between physical activity and weight, particularly among males, suggesting possible gender-specific effects. Although the small sample limits statistical generalizability, the observed trends are consistent with the importance of physical activity in maintaining healthy anthropometric parameters.

D. Team-Based Observations

Team Alpha displayed the highest variability in height, potentially linked to its mixed-gender composition (Fig. 8). Team Beta showed the lowest participation in physical activity, which may have contributed to its higher average weight (Fig. 9).

E. Educational Implications

The value of this study extends beyond the data insights: it illustrates the pedagogical benefit of integrating experiential and problem-based learning in classroom settings. Students gained hands-on experience in data collection and analysis, strengthening their practical skills. The activity also fostered collaboration and critical thinking, as teams navigated challenges related to data inconsistencies and ethical considerations.

Conclusion and Future Directions

This study presents an exploratory data analysis (EDA) of data collected by undergraduate students as part of an experiential learning activity in the course "Introduction to R Programming." By integrating problem-based learning (PBL) with real-world data collection and analysis, the study not only uncovered meaningful insights but also demonstrated the pedagogical value of hands-on exercises in data science education. The analysis yielded several key findings:

1. Demographic Trends: Participants from the three randomly assigned teams had diverse demographic characteristics, with differences in age, gender composition, and physical activity levels.
2. Activity and Anthropometrics: A moderate negative correlation was observed between physical activity and weight, consistent with the role of regular exercise in maintaining healthy body parameters.
3. Team-Specific Insights: Team-level comparisons suggested that group composition influenced height, weight, and activity patterns, underscoring the importance of accounting for team composition when interpreting the data.
4. Correlation Analysis: Weak correlations between age and anthropometric variables are consistent with the narrow age range of the participants, while physical activity appeared to have gender-specific associations with weight.

These activities took students through a complete data science workflow, from data collection and cleaning to visualization and interpretation. The use of physical instruments for data gathering reinforced the importance of accuracy and reliability in real-world data collection, and the collaborative nature of the exercise fostered critical thinking and problem-solving skills.

This study contributes to data science education by demonstrating the efficacy of active learning strategies, such as experiential learning and PBL, in equipping students with technical and analytical competencies. It also aligns with modern pedagogical frameworks, including Fink’s Integrated Course Design (ICD) and the principles of India’s National Education Policy (NEP) 2020, which emphasize holistic, hands-on, and multidisciplinary learning.

While the findings provide valuable insights, the small sample size and manual data collection methods limit the generalizability of the results. Future work could expand the dataset, incorporate additional variables, and explore digital tools to improve measurement accuracy. Nonetheless, this study underscores the potential of experiential learning to bridge the gap between theoretical knowledge and practical application, preparing students to tackle real-world challenges with confidence.
This exercise offers a model for integrating data analysis, pedagogy, and collaborative research in academic settings, paving the way for innovative approaches to teaching and learning in data science.

Several limitations must be acknowledged. The sample size (n = 15) limits the generalizability of the results, and measurement variability arising from manual tools (e.g., tape measures) may introduce bias. Future studies could build on this research by increasing the sample size for statistical robustness, expanding the scope to include additional variables such as dietary habits or sleep patterns, and using automated or digital tools for more precise measurements.

References

Allen, G. I. (2021). Experiential Learning in Data Science: Developing an Interdisciplinary, Client-Sponsored Capstone Program. Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, 516–522. https://doi.org/10.1145/3408877.3432536

Almeida, C., & França, C. (2022). Improving the PBL method with experiential learning theory in software engineering teaching. Proceedings of the 4th International Workshop on Software Engineering Education for the Next Generation, 28–35. https://doi.org/10.1145/3528231.3536382

Barman, A., Chen, S., Chang, A., & Allen, G. (2022). Experiential Learning in Data Science Through a Novel Client-Facing Consulting Course. 2022 IEEE Frontiers in Education Conference (FIE), 1–9. https://doi.org/10.1109/FIE56618.2022.9962532

Bethell, S., & Morgan, K. (2011). Problem-based and experiential learning: Engaging students in an undergraduate physical education module. The Journal of Hospitality Leisure Sport and Tourism, 10(1), 128–134. https://doi.org/10.3794/johlste.101.365

Beyer, H. (1981). Tukey, John W.: Exploratory Data Analysis. Addison-Wesley Publishing Company, Reading, Mass., Menlo Park, Cal., London, Amsterdam, Don Mills, Ontario, Sydney 1977, XVI, 688 S. Biometrical Journal, 23(4), 413–414. https://doi.org/10.1002/bimj.4710230408

Bradley, T., Campbell, E., Dray, J., Bartlem, K., Wye, P., Hanly, G., Gibson, L., Fehily, C., Bailey, J., Wynne, O., Colyvas, K., & Bowman, J. (2022). Systematic review of lifestyle interventions to improve weight, physical activity and diet among people with a mental health condition. Systematic Reviews, 11(1), 198. https://doi.org/10.1186/s13643-022-02067-3

Calderon, I., Silva, W., & Feitosa, E. (2023). Active Learning Methodologies for Teaching Programming in Undergraduate Courses: A Systematic Mapping Study. Informatics in Education. https://doi.org/10.15388/infedu.2024.11

Das, P., & Das, G. (2024). National Education Policy-2020: Research and Innovations for Transforming Higher Education. https://doi.org/10.5281/ZENODO.10845051

Donohoo, J. (2017). Collective efficacy: How educators’ beliefs impact student learning. Corwin.

Fink, L. D. (2013). Creating Significant Learning Experiences: An Integrated Approach to Designing College Courses. John Wiley & Sons.

Forrester, C., Schwikert, S., Foster, J., & Corwin, L. (2022). Undergraduate R Programming Anxiety in Ecology: Persistent Gender Gaps and Coping Strategies. CBE Life Sciences Education, 21(2), ar29. https://doi.org/10.1187/cbe.21-05-0133

Henrique Berssanette, J., & Carlos De Francisco, A. (2021). Active Learning in the Context of the Teaching/Learning of Computer Programming: A Systematic Review. Journal of Information Technology Education: Research, 20, 201–220. https://doi.org/10.28945/4767

Hicks, S. C., & Irizarry, R. A. (2018). A Guide to Teaching Data Science. The American Statistician, 72(4), 382–391. https://doi.org/10.1080/00031305.2017.1356747

Hmelo-Silver, C. E. (2004). Problem-Based Learning: What and How Do Students Learn? Educational Psychology Review, 16(3), 235–266. https://doi.org/10.1023/B:EDPR.0000034022.16470.f3

Jakicic, J. M. (2009). The Effect of Physical Activity on Body Weight. Obesity, 17(S3). https://doi.org/10.1038/oby.2009.386

Katyal, A., Chopra, Y., Sunita, Rajput, R., Bansal, A., & Bhatnagar, A. (2025). Sentiment Analysis of Student’s Subjective Feedback Data Using Natural Language Processing. 2025 Seventh International Conference on Computational Intelligence and Communication Technologies (CCICT), 628–631. https://doi.org/10.1109/CCICT65753.2025.00100

Katyal, A., Chowdhury, S., Sharma, P. K., & Kannan, M. (2024). Fink’s Integrated Course Design and Taxonomy: The Impact of Their Use in an Undergraduate Introductory Course on Bioinformatics. Journal of Science Education and Technology. https://doi.org/10.1007/s10956-024-10100-4

Katyal, A., & Kannan, M. (2022). Employing Collaborative Problem-Based Learning for an Immersive Online Experience in an Undergraduate Bioinformatics Course (SSRN Scholarly Paper 4026348). https://doi.org/10.2139/ssrn.4026348

Kesler, A., Shamir-Inbal, T., & Blau, I. (2022). Active Learning by Visual Programming: Pedagogical Perspectives of Instructivist and Constructivist Code Teachers and Their Implications on Actual Teaching Strategies and Students’ Programming Artifacts. Journal of Educational Computing Research, 60(1), 28–55. https://doi.org/10.1177/07356331211017793

Kolb, D. A. (1984). Experiential learning: Experience as the source of learning and development. Prentice-Hall.

Konkolova, V., & Paralic, J. (2018). Active Learning in Data Science Education. 2018 16th International Conference on Emerging eLearning Technologies and Applications (ICETA), 285–290. https://doi.org/10.1109/ICETA.2018.8572219

Kulal, A., N., A., Dinesh, S., Bhat, D. C., & Girish, A. (2024). Evaluating the Promise and Pitfalls of India’s National Education Policy 2020: Insights from the Perspectives of Students, Teachers, and Experts. Sage Open, 14(4), 21582440241279367. https://doi.org/10.1177/21582440241279367

Masegosa, A. R., Cabañas, R., Maldonado, A. D., & Morales, M. (2024). Learning Styles Impact Students’ Perceptions on Active Learning Methodologies: A Case Study on the Use of Live Coding and Short Programming Exercises. Education Sciences, 14(3), 250. https://doi.org/10.3390/educsci14030250

Mhavan, N., Nair, D., & Gudipudi, A. B. (2022). National Education Policy-2020. In M. S. Manna, K. Sood, B. Balusamy, N. Chilamkurti, & I. Rajathi George, Edutech Enabled Teaching (1st ed., pp. 185–200). Chapman and Hall/CRC. https://doi.org/10.1201/9781003254942-12

Paul, R. M., Jazayeri, Y., Behjat, L., & Potter, M. (2023). Design of an Integrated Project-Based Learning Curriculum: Analysis Through Fink’s Taxonomy of Significant Learning. IEEE Transactions on Education, 66(5), 457–467. https://doi.org/10.1109/TE.2023.3307974

Rosenkranz, N. (2022). The Best of Both Worlds: Experiential Problem-Based Learning Approaches in Hospitality Education. Journal of Hospitality & Tourism Education, 34(2), 111–123. https://doi.org/10.1080/10963758.2021.1963739

Sakamaki, K., Taguri, M., Nishiuchi, H., Akimoto, Y., & Koizumi, K. (2022). Experience of distance education for project-based learning in data science. Japanese Journal of Statistics and Data Science, 5(2), 757–767. https://doi.org/10.1007/s42081-022-00154-2

Tsai, Y.-C. (2024). Empowering students through active learning in educational big data analytics. Smart Learning Environments, 11(1), 14. https://doi.org/10.1186/s40561-024-00300-1

Tucker, M. C., Shaw, S. T., Son, J. Y., & Stigler, J. W. (2023). Teaching Statistics and Data Analysis with R. Journal of Statistics and Data Science Education, 31(1), 18–32. https://doi.org/10.1080/26939169.2022.2089410

Wright, C., Meng, Q., Breshock, M. R., Atta, L., Taub, M. A., Jager, L. R., Muschelli, J., & Hicks, S. C. (2023). Open Case Studies: Statistics and Data Science Education through Real-World Applications (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2301.05298

Additional Declarations

The authors declare no competing interests.
Cite Share Download PDF Status: Posted Version 1 posted You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7422204","acceptedTermsAndConditions":true,"allowDirectSubmit":true,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":503422440,"identity":"c5d12bd4-3214-4a78-94e6-fc7367863c4d","order_by":0,"name":"Ashish Katyal","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABBElEQVRIiWNgGAWjYHACNoYEhgMGYOaDCiDBzNxApBY2IDPhDEgLIxFaGGBaEttAAgS08Lcff/bgwZ87xvzzeww/JM6rjeZvB2r5UbENpxaJMznmBoltz8wkjvEYSyRuO5474zBjA2PPmdu4rTmQwyaR2HDYhuEYjwFQy7HcBqAWZsY23Frkzz9/JpHw57CNPNCWH4lzjuXOJ6TF4EaCmUQC22Ezg2M8ZkDranI3ENJieOMNUGXbYWPDY2llFgnHDuRuBGo5iM8vcufTn0n++HPYcN7hw5tvfKipy513/vDBBz8q8HgfAThAKeAwmHmAGPVAwP4ASNQRqXgUjIJRMApGEgAAnMVjUbDA32AAAAAASUVORK5CYII=","orcid":"https://orcid.org/0000-0003-3469-3717","institution":"St. 
Andrew's Institute of Technology and Management","correspondingAuthor":true,"prefix":"","firstName":"Ashish","middleName":"","lastName":"Katyal","suffix":""},{"id":503438806,"identity":"a4277113-7fbf-4742-a952-6377ced82993","order_by":1,"name":"Pankaj Kumar Sharma","email":"","orcid":"https://orcid.org/0000-0002-4901-4628","institution":"Birla Institute of Technology and Science","correspondingAuthor":false,"prefix":"","firstName":"Pankaj","middleName":"Kumar","lastName":"Sharma","suffix":""},{"id":503438807,"identity":"2cc6e506-141f-4ae5-8ee2-e95d0657a33e","order_by":2,"name":"Manoj Kannan","email":"","orcid":"https://orcid.org/0000-0001-7099-2373","institution":"Plaksha University","correspondingAuthor":false,"prefix":"","firstName":"Manoj","middleName":"","lastName":"Kannan","suffix":""}],"badges":[],"createdAt":"2025-08-21 04:46:27","currentVersionCode":1,"declarations":{"humanSubjects":true,"vertebrateSubjects":false,"conflictsOfInterestStatement":false,"humanSubjectEthicalGuidelines":true,"humanSubjectConsent":true,"humanSubjectClinicalTrial":false,"humanSubjectCaseReport":false,"vertebrateSubjectEthicalGuidelines":false},"doi":"10.21203/rs.3.rs-7422204/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7422204/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":89991589,"identity":"82a12b86-9c43-4dfb-94e6-a934a9549a9e","added_by":"auto","created_at":"2025-08-27 07:23:17","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":39188,"visible":true,"origin":"","legend":"\u003cp\u003eThe nuanced alignments among these educational approaches and principles [Integrated Course Design (ICD), National Education Policy (NEP), Experiential-Based Learning (EBL), and Problem-Based Learning 
(PBL)]\u003c/p\u003e","description":"","filename":"1.png","url":"https://assets-eu.researchsquare.com/files/rs-7422204/v1/13f0df8605747ee300c68c66.png"},{"id":89991567,"identity":"5aba95ad-9606-4a39-8b9f-aad75d1b122c","added_by":"auto","created_at":"2025-08-27 07:23:16","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":38071,"visible":true,"origin":"","legend":"\u003cp\u003eThe workflow for performing Exploratory Data Analysis (EDA) in the Introduction to R Programming (PCC-CSE-354G) course\u003c/p\u003e","description":"","filename":"2.png","url":"https://assets-eu.researchsquare.com/files/rs-7422204/v1/428b973af3e838e46e13e3da.png"},{"id":89991583,"identity":"ae3bd400-9647-4422-a08f-ae1ca882a1ea","added_by":"auto","created_at":"2025-08-27 07:23:17","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":28225,"visible":true,"origin":"","legend":"\u003cp\u003eGender Composition by team\u003c/p\u003e","description":"","filename":"3.png","url":"https://assets-eu.researchsquare.com/files/rs-7422204/v1/23384e958b9f28a0b7561f7f.png"},{"id":89991574,"identity":"ac175a67-44a5-4172-9946-fc1986115440","added_by":"auto","created_at":"2025-08-27 07:23:16","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":23925,"visible":true,"origin":"","legend":"\u003cp\u003eAge Distribution by Team\u003c/p\u003e","description":"","filename":"4.png","url":"https://assets-eu.researchsquare.com/files/rs-7422204/v1/268af6e3d9c3ddd55d4d42ad.png"},{"id":89991599,"identity":"ccc7c308-c87b-4e5c-8409-e8c9e05c5441","added_by":"auto","created_at":"2025-08-27 07:23:18","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":34501,"visible":true,"origin":"","legend":"\u003cp\u003eWeight Distribution by Physical 
Activity\u003c/p\u003e","description":"","filename":"5.png","url":"https://assets-eu.researchsquare.com/files/rs-7422204/v1/635a02435e4e7e43a47c566c.png"},{"id":89992284,"identity":"0095e00d-248d-440d-921c-7f9bf7db9843","added_by":"auto","created_at":"2025-08-27 07:31:16","extension":"png","order_by":6,"title":"Figure 6","display":"","copyAsset":false,"role":"figure","size":40193,"visible":true,"origin":"","legend":"\u003cp\u003eWeight Trends by Team and Activity Status\u003c/p\u003e","description":"","filename":"6.png","url":"https://assets-eu.researchsquare.com/files/rs-7422204/v1/055de5fa44700eee6ed01cfe.png"},{"id":89992287,"identity":"44acb8a7-dbca-424a-b6be-11d6d9f557a0","added_by":"auto","created_at":"2025-08-27 07:31:19","extension":"png","order_by":7,"title":"Figure 7","display":"","copyAsset":false,"role":"figure","size":39408,"visible":true,"origin":"","legend":"\u003cp\u003eCorrelation Heatmap for Numeric Variables\u003c/p\u003e","description":"","filename":"7.png","url":"https://assets-eu.researchsquare.com/files/rs-7422204/v1/e537788fdfd5abde55ee92a0.png"},{"id":89992286,"identity":"4784332d-1ba2-40b0-b258-09589650aadb","added_by":"auto","created_at":"2025-08-27 07:31:17","extension":"png","order_by":8,"title":"Figure 8","display":"","copyAsset":false,"role":"figure","size":26123,"visible":true,"origin":"","legend":"\u003cp\u003eHeight Distribution by Team\u003c/p\u003e","description":"","filename":"8.png","url":"https://assets-eu.researchsquare.com/files/rs-7422204/v1/1b0a46ab7ebf5401c0d5ac85.png"},{"id":89991570,"identity":"0ce21195-3f3c-4db3-90a5-bf56f37ef481","added_by":"auto","created_at":"2025-08-27 07:23:16","extension":"png","order_by":9,"title":"Figure 9","display":"","copyAsset":false,"role":"figure","size":31407,"visible":true,"origin":"","legend":"\u003cp\u003ePhysical Activity Participation by 
Team\u003c/p\u003e","description":"","filename":"9.png","url":"https://assets-eu.researchsquare.com/files/rs-7422204/v1/7638b10c823ac22a0085006c.png"},{"id":89992288,"identity":"df601006-195c-41ad-98bb-26b2bd2963f6","added_by":"auto","created_at":"2025-08-27 07:31:24","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":758854,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7422204/v1/d85fa732-58ff-46ca-baea-f161256d6af1.pdf"}],"financialInterests":"The authors declare no competing interests.","formattedTitle":"\u003cp\u003e\u003cstrong\u003eExploratory Data Analysis (EDA) on Undergraduate Data Science Students Through R Programming\u003c/strong\u003e\u003c/p\u003e","fulltext":[{"header":"Introduction and Review of Literature","content":"\u003cp\u003eThe evolving landscape of education demands innovative approaches that align pedagogy, policy, and practice to equip learners with the skills and knowledge required for the 21st century. Four key frameworks, namely Integrated Course Design (ICD), India\u0026rsquo;s National Education Policy (NEP), Experiential-Based Learning (EBL), and Problem-Based Learning (PBL), offer complementary perspectives that, when aligned, can create a transformative educational paradigm. ICD provides a structured foundation, NEP sets a policy-driven vision, EBL emphasizes experiential engagement, and PBL cultivates problem-solving abilities. Together, they form an integrated framework for holistic education that nurtures academic excellence, employability, innovation, and societal impact 
(Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eIntegrated Course Design (ICD) provides a structured approach to course development, emphasizing the alignment of learning objectives, activities, and assessments to create significant learning experiences (Fink, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Katyal et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). This systematic design fosters deep, meaningful engagement by ensuring that all course elements work cohesively to achieve educational outcomes. India\u0026rsquo;s National Education Policy (NEP), introduced in 2020, advocates an education system that is flexible, multidisciplinary, and rooted in experiential and inquiry-based learning. NEP emphasizes skill development, holistic growth, and inclusivity to prepare students for global challenges (Das \u0026amp; Das, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Kulal et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Mhavan et al., \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Experiential-Based Learning (EBL) prioritizes hands-on, reflective learning through real-world experiences, encouraging learners to actively construct knowledge (Almeida \u0026amp; Fran\u0026ccedil;a, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Bethell \u0026amp; Morgan, \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Rosenkranz, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Grounded in Kolb\u0026rsquo;s experiential learning theory, EBL fosters practical skills, adaptability, and deeper understanding through active engagement (Kolb, \u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e1984\u003c/span\u003e). 
Problem-Based Learning (PBL) centers on solving complex, real-world problems, promoting collaboration, inquiry, and critical thinking. By situating learning in authentic contexts, PBL prepares students for the challenges of the modern world (Hmelo-Silver, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2004\u003c/span\u003e; Katyal \u0026amp; Kannan, \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Paul et al., \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eIn today\u0026rsquo;s data-driven world, proficiency in data analysis is a critical skill across industries and disciplines (Konkolova \u0026amp; Paralic, \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). As organizations increasingly rely on data to make informed decisions, educational institutions have recognized the need to equip students with robust analytical skills (Hicks \u0026amp; Irizarry, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). This has led to the integration of data science courses into undergraduate curricula, providing students with tools and techniques for handling, analyzing, and interpreting data effectively (Donohoo, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2017\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eExploratory Data Analysis (EDA) plays a foundational role in data science by enabling students to understand the structure, trends, and patterns within datasets (Beyer, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e1981\u003c/span\u003e). It involves summarizing data sets both statistically and visually to uncover relationships, anomalies, and initial insights that inform further analysis (Wright et al., \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). 
For students, engaging in EDA serves as an entry point into the practical application of theoretical concepts, bridging the gap between learning and real-world problem-solving (Tsai, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eThe increasing emphasis on experiential learning and active engagement in higher education aligns closely with the methodologies used in EDA (Allen, \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Barman et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Hands-on activities, such as collecting and analyzing real-world data, allow students to take ownership of the learning process, encouraging creativity, collaboration, and critical thinking (Forrester et al., \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Henrique Berssanette \u0026amp; Carlos De Francisco, 2021). Such experiential approaches not only deepen students\u0026rsquo; understanding of course material but also prepare them for challenges they may encounter in their professional careers (Calderon et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Masegosa et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; Tucker et al., \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2023\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eThe inclusion of Exploratory Data Analysis (EDA) as a key learning outcome in this course allows students to engage directly with real-world data, fostering an environment where they can develop technical proficiency and critical thinking skills (Kesler et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). 
By engaging students in the entire data analysis process\u0026mdash;from data collection and cleaning to visualization and interpretation\u0026mdash;the course bridges theoretical knowledge and practical application (Sakamaki et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). This aligns with the educational objectives outlined in frameworks like Fink\u0026rsquo;s Integrated Course Design (ICD), which emphasizes creating significant learning experiences through active engagement (Fink, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2013\u003c/span\u003e; Katyal et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e"},{"header":"Methodology","content":"\u003cp\u003eThis research was conducted as part of the third-year AIML course \u0026quot;Introduction to R Programming\u0026quot; (PCC-CSE-354G) to provide students with experiential learning opportunities in exploratory data analysis (EDA). The methodology comprised the following steps:\u003c/p\u003e\n\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\n \u003ch2\u003eA. Context and Participants\u003c/h2\u003e\n \u003cp\u003eThe study was carried out in a classroom setting, with students divided into three randomly assigned groups: Alpha, Beta, and Gamma. The participants were third-year undergraduate data science students enrolled in the course.\u003c/p\u003e\n\u003c/div\u003e\n\u003ch3\u003eB. Data Collection Process\u003c/h3\u003e\n\u003cp\u003eData collection was designed to involve students actively in generating and handling their datasets. The data for this study were collected as part of a hands-on learning exercise during the Introduction to R Programming (PCC-CSE-354G) course offered in the third-year Artificial Intelligence and Machine Learning (AIML) program. 
The primary goal of this activity was to engage students in the complete process of data analysis, from raw data collection to visualization and interpretation, in line with the principles of experiential and problem-based learning.\u003c/p\u003e\n\u003cp\u003e\u003cspan\u003e\u003c/span\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e1. Group Formation and Data Gathering\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003c/p\u003e\n\u003cp\u003eTo promote diversity within teams, students were randomly divided into three groups, each representing a \u0026quot;team.\u0026quot; This team-based approach encouraged collaboration, coordination, and the development of interpersonal skills, as students worked together to gather data from their peers (Katyal et al., \u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e). The instructor monitored and guided the entire process to help ensure the accuracy and reliability of the collected data.\u003c/p\u003e\n\u003cp\u003e\u003cspan\u003e\u003c/span\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e2. Instruments and Tools\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003c/p\u003e\n\u003cp\u003eThe students used physical measurement tools, namely measuring tapes and digital weighing machines, to measure height and weight. These tools were chosen for their simplicity, accessibility, and ability to provide accurate measurements. Additional demographic data, including age, gender, and physical activity status, were collected through direct interviews conducted by the students themselves. 
Physical activity status was recorded as a binary (Yes/No) response to whether the student engaged in regular physical activity.\u003c/p\u003e\n\u003cp\u003eThe data collection activity was designed as a practical experience that required students to address real-world challenges, such as ensuring consistency in measurements, handling variability among participants, and maintaining data integrity. By performing these tasks themselves under instructor supervision, students were exposed to the nuances of primary data collection, an essential skill in data science and research.\u003c/p\u003e\n\u003cp\u003e\u003cspan\u003e\u003c/span\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e3. Ethical Considerations and Data Authenticity\u003c/em\u003e\u003c/p\u003e\n\u003cp\u003e\u003c/p\u003e\n\u003cp\u003eTo ensure ethical compliance and maintain a respectful learning environment, all participants were informed about the purpose of the data collection activity, and their consent was obtained prior to participation. The activity was conducted within the classroom, fostering a supportive atmosphere in which students could comfortably engage in the process.\u003c/p\u003e\n\u003cp\u003eThe authenticity of the data was further reinforced by having students measure and record data from their classmates in real time. This approach was intended to support measurement accuracy, and it also helped students appreciate the importance of firsthand data acquisition and the errors that can arise from improper handling or interpretation.\u003c/p\u003e\n\u003ch3\u003eC. Alignment with Learning Objectives\u003c/h3\u003e\n\u003cp\u003eThis data collection exercise was closely aligned with the learning objectives of the course, providing students with experiential exposure to data gathering, cleaning, and preprocessing. The process also demonstrated how primary data acquisition underpins Exploratory Data Analysis (EDA). 
This activity laid the foundation for students to apply R programming techniques to analyze and visualize their own collected data, offering an authentic and contextually relevant experience that bridged theoretical knowledge with practical application.\u003c/p\u003e\n\u003ch3\u003eD. Data Analysis Workflow\u003c/h3\u003e\n\u003cp\u003eThe primary aim of this study was to provide third-year undergraduate students in the Artificial Intelligence and Machine Learning (AIML) program with a comprehensive understanding of Exploratory Data Analysis (EDA) through a hands-on, experiential learning approach. By engaging in the full cycle of data collection, cleaning, analysis, and visualization, the study sought to align with modern pedagogical methods that emphasize practical skill development and critical thinking (Fig. \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e).\u003c/p\u003e\n\u003ch3\u003eE. Active Learning Approach\u003c/h3\u003e\n\u003cp\u003eThe methodology emphasized hands-on learning and active student participation. Students gained practical experience by collecting and analyzing data they could relate to personally. Using R programming for EDA allowed them to enhance their technical skills and understand data-driven insights. Collaborative efforts within groups fostered teamwork, critical thinking, and real-world problem-solving abilities.\u003c/p\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n \u003ch2\u003eF. Data Analysis\u003c/h2\u003e\n \u003cp\u003eThis study sought to encourage students to explore relationships within the collected dataset. By analyzing variables such as age, gender, height, weight, and physical activity, students were tasked with:\u003c/p\u003e\u003cspan\u003e\n \u003cp\u003e1. Identifying patterns and correlations among variables, such as the relationship between physical activity and weight or height.\u003c/p\u003e\n \u003c/span\u003e\u003cspan\u003e\n \u003cp\u003e2. 
Investigating team-specific trends and gender-based differences in the data.\u003c/p\u003e\n \u003c/span\u003e\u003cspan\u003e\n \u003cp\u003e3. Visualizing data effectively to communicate findings in a clear and impactful manner.\u003c/p\u003e\n \u003c/span\u003e\n\u003c/div\u003e"},{"header":"Results and Discussion","content":"\u003cp\u003eThe main findings of the exploratory data analysis (EDA) of the student-collected data are summarized below.\u003c/p\u003e\n\u003ch3\u003eA. Demographic Insights\u003c/h3\u003e\n\u003cp\u003eThe dataset consisted of 15 participants from three groups (Alpha, Beta, Gamma) with varying age, gender, and physical activity status. A slight male predominance was observed in Team Alpha, while Team Gamma had the highest proportion of female participants (Fig. \u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e). Ages ranged from 20 to 23 years, with the mean age varying slightly across teams: Team Beta had the highest average age, while Teams Alpha and Gamma showed similar distributions (Fig. \u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e\n\u003cp\u003eThe varied composition of the teams reflects the random assignment process; with only 15 participants, however, random assignment cannot guarantee balanced teams, so between-team comparisons are descriptive rather than inferential.\u003c/p\u003e\n\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\n \u003ch2\u003eB. Physical Activity and Anthropometric Relationships\u003c/h2\u003e\n \u003cp\u003eParticipants who reported regular physical activity generally had lower body weights than inactive participants, irrespective of height (Fig. \u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003eTeam Gamma, which had the highest percentage of physically active members, had the lowest average weight, whereas Team Alpha showed greater weight variability among its inactive participants (Fig. 
\u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003eThese trends are consistent with existing literature suggesting that physical activity supports weight management, although individual lifestyle factors and measurement conditions may also contribute to the observed variation (Bradley et al., \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e; Jakicic, \u003cspan class=\"CitationRef\"\u003e2009\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n \u003ch2\u003eC. Correlation Analysis\u003c/h2\u003e\n \u003cp\u003eThe correlation analysis revealed weak correlations between age and height or weight, which likely reflects the narrow age range (20 to 23 years) of the sample (Fig. \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e). Moderate negative correlations were identified between physical activity (recorded as a binary Yes/No variable) and weight, particularly among males, suggesting a possible gender-specific effect.\u003c/p\u003e\n \u003cp\u003eAlthough the small sample size limits statistical generalizability, the observed trends are consistent with the role of physical activity in maintaining healthy anthropometric parameters.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n \u003ch2\u003eD. Team-Based Observations\u003c/h2\u003e\n \u003cp\u003eTeam Alpha displayed the highest variability in height, possibly linked to its mixed-gender composition (Fig. \u003cspan class=\"InternalRef\"\u003e8\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003eTeam Beta showed the lowest participation in physical activity, which may have contributed to its higher average weight (Fig. \u003cspan class=\"InternalRef\"\u003e9\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\n \u003ch2\u003eE. 
Educational Implications\u003c/h2\u003e\n \u003cp\u003eBeyond the data insights, the results illustrate the pedagogical value of integrating experiential and problem-based learning in classroom settings. Students gained hands-on experience in data collection and analysis, and the activity fostered collaboration and critical thinking as teams navigated challenges related to data inconsistencies and ethical considerations.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"Conclusion and Future Aspects","content":"\u003cp\u003eThis study presents an exploratory data analysis (EDA) of data collected by undergraduate students as part of an experiential learning activity in the course \u0026quot;Introduction to R Programming.\u0026quot; By integrating problem-based learning (PBL) with real-world data collection and analysis, the study uncovered descriptive insights and illustrated the pedagogical value of hands-on exercises in data science education.\u003c/p\u003e\n\u003cp\u003eThe analysis revealed several key findings:\u003c/p\u003e\n\u003cp\u003e1. Demographic Trends: Participants from three randomly assigned teams varied in age, gender composition, and physical activity levels, although the age range was narrow (20 to 23 years).\u003c/p\u003e\n\u003cp\u003e2. Activity and Anthropometrics: A moderate negative correlation was observed between physical activity and weight, consistent with the role of regular exercise in maintaining healthy body parameters, although this observational association in 15 participants cannot establish causation.\u003c/p\u003e\n\u003cp\u003e3. Team-Specific Insights: Team-level comparisons suggested that group composition may shape observed height, weight, and activity patterns, underscoring the need to account for team composition when interpreting team-level summaries.\u003c/p\u003e\n\u003cp\u003e4. 
Correlation Analysis: Weak correlations between age and the anthropometric variables are consistent with the narrow age range of participants, while the association between physical activity and weight appeared to differ by gender.\u003c/p\u003e\n\u003cp\u003eThese activities provided students with a complete data science workflow, from data collection and cleaning to visualization and interpretation. The use of physical instruments for data gathering reinforced the importance of accuracy and reliability in real-world data collection. Additionally, the collaborative nature of the exercise fostered critical thinking and problem-solving skills.\u003c/p\u003e\n\u003cp\u003eThis study contributes to data science education by illustrating how active learning strategies, such as experiential learning and PBL, can equip students with technical and analytical competencies. It also aligns with modern pedagogical frameworks, including Fink\u0026rsquo;s Integrated Course Design (ICD) and the principles outlined in India\u0026rsquo;s National Education Policy (NEP) 2020, which emphasize holistic, hands-on, and multidisciplinary learning.\u003c/p\u003e\n\u003cp\u003eSeveral limitations must be acknowledged. The small sample size (n = 15) limits the generalizability of the results, and measurement variability arising from manual tools (e.g., tape measures) may introduce bias. Future studies could increase the sample size for statistical robustness, include additional variables such as dietary habits or sleep patterns, and use automated or digital tools for more precise measurements. Nonetheless, this study underscores the potential of experiential learning to bridge the gap between theoretical knowledge and practical application, preparing students to tackle real-world challenges with confidence.\u003c/p\u003e\n\u003cp\u003eThis exercise can serve as a model for integrating data analysis, pedagogy, and collaborative research in academic settings, paving the way for innovative approaches to teaching and learning in data science.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n \u003cli\u003eAllen, G. I. (2021). Experiential Learning in Data Science: Developing an Interdisciplinary, Client-Sponsored Capstone Program. \u003cem\u003eProceedings of the 52nd ACM Technical Symposium on Computer Science Education\u003c/em\u003e, 516\u0026ndash;522. https://doi.org/10.1145/3408877.3432536\u003c/li\u003e\n \u003cli\u003eAlmeida, C., \u0026amp; Fran\u0026ccedil;a, C. (2022). Improving the PBL method with experiential learning theory in software engineering teaching. \u003cem\u003eProceedings of the 4th International Workshop on Software Engineering Education for the Next Generation\u003c/em\u003e, 28\u0026ndash;35. https://doi.org/10.1145/3528231.3536382\u003c/li\u003e\n \u003cli\u003eBarman, A., Chen, S., Chang, A., \u0026amp; Allen, G. (2022). Experiential Learning in Data Science Through a Novel Client-Facing Consulting Course. 
\u003cem\u003e2022 IEEE Frontiers in Education Conference (FIE)\u003c/em\u003e, 1\u0026ndash;9. https://doi.org/10.1109/FIE56618.2022.9962532\u003c/li\u003e\n \u003cli\u003eBethell, S., \u0026amp; Morgan, K. (2011). Problem-based and experiential learning: Engaging students in an undergraduate physical education module. \u003cem\u003eThe Journal of Hospitality Leisure Sport and Tourism\u003c/em\u003e, \u003cem\u003e10\u003c/em\u003e(1), 128\u0026ndash;134. https://doi.org/10.3794/johlste.101.365\u003c/li\u003e\n \u003cli\u003eBeyer, H. (1981). Tukey, John W.: Exploratory Data Analysis. Addison‐Wesley Publishing Company Reading, Mass. \u0026mdash; Menlo Park, Cal., London, Amsterdam, Don Mills, Ontario, Sydney 1977, XVI, 688 S. \u003cem\u003eBiometrical Journal\u003c/em\u003e, \u003cem\u003e23\u003c/em\u003e(4), 413\u0026ndash;414. https://doi.org/10.1002/bimj.4710230408\u003c/li\u003e\n \u003cli\u003eBradley, T., Campbell, E., Dray, J., Bartlem, K., Wye, P., Hanly, G., Gibson, L., Fehily, C., Bailey, J., Wynne, O., Colyvas, K., \u0026amp; Bowman, J. (2022). Systematic review of lifestyle interventions to improve weight, physical activity and diet among people with a mental health condition. \u003cem\u003eSystematic Reviews\u003c/em\u003e, \u003cem\u003e11\u003c/em\u003e(1), 198. https://doi.org/10.1186/s13643-022-02067-3\u003c/li\u003e\n \u003cli\u003eCalderon, I., Silva, W., \u0026amp; Feitosa, E. (2023). Active Learning Methodologies for Teaching Programming in Undergraduate Courses: A Systematic Mapping Study. \u003cem\u003eInformatics in Education\u003c/em\u003e. https://doi.org/10.15388/infedu.2024.11\u003c/li\u003e\n \u003cli\u003eDas, P., \u0026amp; Das, G. (2024). \u003cem\u003eNational Education Policy-2020: Research and Innovations for Transforming Higher Education\u003c/em\u003e. https://doi.org/10.5281/ZENODO.10845051\u003c/li\u003e\n \u003cli\u003eDonohoo, J. (2017). 
\u003cem\u003eCollective efficacy: How educators\u0026rsquo; beliefs impact student learning\u003c/em\u003e. Corwin.\u003c/li\u003e\n \u003cli\u003eFink, L. D. (2013). \u003cem\u003eCreating Significant Learning Experiences: An Integrated Approach to Designing College Courses\u003c/em\u003e. John Wiley \u0026amp; Sons.\u003c/li\u003e\n \u003cli\u003eForrester, C., Schwikert, S., Foster, J., \u0026amp; Corwin, L. (2022). Undergraduate R Programming Anxiety in Ecology: Persistent Gender Gaps and Coping Strategies. \u003cem\u003eCBE\u0026mdash;Life Sciences Education\u003c/em\u003e, \u003cem\u003e21\u003c/em\u003e(2), ar29. https://doi.org/10.1187/cbe.21-05-0133\u003c/li\u003e\n \u003cli\u003eHenrique Berssanette, J., \u0026amp; Carlos De Francisco, A. (2021). Active Learning in the Context of the Teaching/Learning of Computer Programming: A Systematic Review. \u003cem\u003eJournal of Information Technology Education: Research\u003c/em\u003e, \u003cem\u003e20\u003c/em\u003e, 201\u0026ndash;220. https://doi.org/10.28945/4767\u003c/li\u003e\n \u003cli\u003eHicks, S. C., \u0026amp; Irizarry, R. A. (2018). A Guide to Teaching Data Science. \u003cem\u003eThe American Statistician\u003c/em\u003e, \u003cem\u003e72\u003c/em\u003e(4), 382\u0026ndash;391. https://doi.org/10.1080/00031305.2017.1356747\u003c/li\u003e\n \u003cli\u003eHmelo-Silver, C. E. (2004). Problem-Based Learning: What and How Do Students Learn? \u003cem\u003eEducational Psychology Review\u003c/em\u003e, \u003cem\u003e16\u003c/em\u003e(3), 235\u0026ndash;266. https://doi.org/10.1023/B:EDPR.0000034022.16470.f3\u003c/li\u003e\n \u003cli\u003eJakicic, J. M. (2009). The Effect of Physical Activity on Body Weight. \u003cem\u003eObesity\u003c/em\u003e, \u003cem\u003e17\u003c/em\u003e(S3). https://doi.org/10.1038/oby.2009.386\u003c/li\u003e\n \u003cli\u003eKatyal, A., Chopra, Y., Sunita, Rajput, R., Bansal, A., \u0026amp; Bhatnagar, A. (2025). 
Sentiment Analysis of Student\u0026rsquo;s Subjective Feedback Data Using Natural Language Processing. \u003cem\u003e2025 Seventh International Conference on Computational Intelligence and Communication Technologies (CCICT)\u003c/em\u003e, 628\u0026ndash;631. https://doi.org/10.1109/CCICT65753.2025.00100\u003c/li\u003e\n \u003cli\u003eKatyal, A., Chowdhury, S., Sharma, P. K., \u0026amp; Kannan, M. (2024). Fink\u0026rsquo;s Integrated Course Design and Taxonomy: The Impact of Their Use in an Undergraduate Introductory Course on Bioinformatics. \u003cem\u003eJournal of Science Education and Technology\u003c/em\u003e. https://doi.org/10.1007/s10956-024-10100-4\u003c/li\u003e\n \u003cli\u003eKatyal, A., \u0026amp; Kannan, M. (2022). \u003cem\u003eEmploying Collaborative Problem-Based Learning for an Immersive Online Experience in an Undergraduate Bioinformatics Course\u003c/em\u003e (SSRN Scholarly Paper 4026348). https://doi.org/10.2139/ssrn.4026348\u003c/li\u003e\n \u003cli\u003eKesler, A., Shamir-Inbal, T., \u0026amp; Blau, I. (2022). Active Learning by Visual Programming: Pedagogical Perspectives of Instructivist and Constructivist Code Teachers and Their Implications on Actual Teaching Strategies and Students\u0026rsquo; Programming Artifacts. \u003cem\u003eJournal of Educational Computing Research\u003c/em\u003e, \u003cem\u003e60\u003c/em\u003e(1), 28\u0026ndash;55. https://doi.org/10.1177/07356331211017793\u003c/li\u003e\n \u003cli\u003eKolb, D. A. (1984). \u003cem\u003eExperiential learning: Experience as the source of learning and development\u003c/em\u003e. Prentice-Hall.\u003c/li\u003e\n \u003cli\u003eKonkolova, V., \u0026amp; Paralic, J. (2018). Active Learning in Data Science Education. \u003cem\u003e2018 16th International Conference on Emerging eLearning Technologies and Applications (ICETA)\u003c/em\u003e, 285\u0026ndash;290. https://doi.org/10.1109/ICETA.2018.8572219\u003c/li\u003e\n \u003cli\u003eKulal, A., N., A., Dinesh, S., Bhat, D. 
C., \u0026amp; Girish, A. (2024). Evaluating the Promise and Pitfalls of India\u0026rsquo;s National Education Policy 2020: Insights from the Perspectives of Students, Teachers, and Experts. \u003cem\u003eSage Open\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(4), 21582440241279367. https://doi.org/10.1177/21582440241279367\u003c/li\u003e\n \u003cli\u003eMasegosa, A. R., Caba\u0026ntilde;as, R., Maldonado, A. D., \u0026amp; Morales, M. (2024). Learning Styles Impact Students\u0026rsquo; Perceptions on Active Learning Methodologies: A Case Study on the Use of Live Coding and Short Programming Exercises. \u003cem\u003eEducation Sciences\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(3), 250. https://doi.org/10.3390/educsci14030250\u003c/li\u003e\n \u003cli\u003eMhavan, N., Nair, D., \u0026amp; Gudipudi, A. B. (2022). National Education Policy-2020. In M. S. Manna, K. Sood, B. Balusamy, N. Chilamkurti, \u0026amp; I. Rajathi George, \u003cem\u003eEdutech Enabled Teaching\u003c/em\u003e (1st ed., pp. 185\u0026ndash;200). Chapman and Hall/CRC. https://doi.org/10.1201/9781003254942-12\u003c/li\u003e\n \u003cli\u003ePaul, R. M., Jazayeri, Y., Behjat, L., \u0026amp; Potter, M. (2023). Design of an Integrated Project-Based Learning Curriculum: Analysis Through Fink\u0026rsquo;s Taxonomy of Significant Learning. \u003cem\u003eIEEE Transactions on Education\u003c/em\u003e, \u003cem\u003e66\u003c/em\u003e(5), 457\u0026ndash;467. https://doi.org/10.1109/TE.2023.3307974\u003c/li\u003e\n \u003cli\u003eRosenkranz, N. (2022). The Best of Both Worlds: Experiential Problem-Based Learning Approaches in Hospitality Education. \u003cem\u003eJournal of Hospitality \u0026amp; Tourism Education\u003c/em\u003e, \u003cem\u003e34\u003c/em\u003e(2), 111\u0026ndash;123. https://doi.org/10.1080/10963758.2021.1963739\u003c/li\u003e\n \u003cli\u003eSakamaki, K., Taguri, M., Nishiuchi, H., Akimoto, Y., \u0026amp; Koizumi, K. (2022). 
Experience of distance education for project-based learning in data science. \u003cem\u003eJapanese Journal of Statistics and Data Science\u003c/em\u003e, \u003cem\u003e5\u003c/em\u003e(2), 757\u0026ndash;767. https://doi.org/10.1007/s42081-022-00154-2\u003c/li\u003e\n \u003cli\u003eTsai, Y.-C. (2024). Empowering students through active learning in educational big data analytics. \u003cem\u003eSmart Learning Environments\u003c/em\u003e, \u003cem\u003e11\u003c/em\u003e(1), 14. https://doi.org/10.1186/s40561-024-00300-1\u003c/li\u003e\n \u003cli\u003eTucker, M. C., Shaw, S. T., Son, J. Y., \u0026amp; Stigler, J. W. (2023). Teaching Statistics and Data Analysis with R. \u003cem\u003eJournal of Statistics and Data Science Education\u003c/em\u003e, \u003cem\u003e31\u003c/em\u003e(1), 18\u0026ndash;32. https://doi.org/10.1080/26939169.2022.2089410\u003c/li\u003e\n \u003cli\u003eWright, C., Meng, Q., Breshock, M. R., Atta, L., Taub, M. A., Jager, L. R., Muschelli, J., \u0026amp; Hicks, S. C. (2023). \u003cem\u003eOpen Case Studies: Statistics and Data Science Education through Real-World Applications\u003c/em\u003e (Version 1). arXiv. https://doi.org/10.48550/ARXIV.2301.05298\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"St. Andrew's Institute of Technology and Management","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"