Common Misconceptions in Interpreting ROC Curves Among Medical Graduate Students: A Conceptual Diagnostic Study with a Simulation-Based Educational Intervention

Research Article | Research Square

Yulong Wang, Fangxue Yang, Yu Zhang, Juxiong Xiao, Liping Zhu, Luqing Zhao, Yitao Mao

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9132102/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review (Version 1)

Abstract

Background: Receiver Operating Characteristic (ROC) curve analysis is widely used in diagnostic research and machine-learning model evaluation.
However, medical graduate students frequently misinterpret key statistical concepts underlying ROC analysis, including the inferential meaning of the Area Under the Curve (AUC), the rationale for confidence intervals (CIs), and appropriate comparison of ROC curves. Despite the central role of ROC methodology in modern medical research, systematic investigation of these conceptual misconceptions remains limited.

Methods: A total of 41 medical graduate students participated in this quasi-experimental study. A 2.5-hour structured educational intervention was delivered, addressing three conceptual domains: AUC interpretation, CI interpretation, and ROC curve comparison. Conceptual understanding was assessed using a 15-item multiple-choice questionnaire administered immediately before and after the intervention.

Results: At baseline, misconceptions were highly prevalent across all domains (92.7%–97.6%). Following the intervention, significant improvements were observed in all domains (all P < 0.001). Mean scores increased from 48.8% (SD 26.9%) to 80.0% (SD 21.4%) for AUC interpretation, from 40.9% (SD 26.8%) to 74.1% (SD 22.9%) for CI interpretation, and from 34.6% (SD 26.8%) to 68.2% (SD 25.2%) for ROC curve comparison. Effect sizes were large across all domains (Cohen's d > 2.0).

Conclusions: Conceptual misunderstandings in ROC analysis are systematic among medical graduate students. A targeted, simulation-based educational intervention significantly improved students' statistical reasoning. These findings highlight the importance of emphasizing conceptual understanding in graduate-level statistics education.

Keywords: ROC curve; Area under the curve; Statistical misconceptions; Medical education; Graduate students; Simulation-based learning

Introduction

Receiver Operating Characteristic (ROC) curve analysis is a fundamental statistical tool in diagnostic and predictive research and is widely applied across radiology, laboratory medicine, pathology, oncology, and biomedical engineering [1–3]. The Area Under the Curve (AUC) serves as a global measure of discriminative performance and is routinely reported in clinical research, biomarker evaluation, and machine-learning model assessment [4]. Consequently, the ability to correctly interpret ROC curves and AUC statistics has become an essential competency in graduate-level medical research training.

Despite its widespread use, educators frequently observe that graduate students encounter persistent conceptual difficulties when interpreting ROC-related statistics. In particular, misunderstandings often arise regarding (1) why the AUC should be treated as a statistical estimate rather than a fixed geometric quantity, (2) how CIs for the AUC are derived and interpreted, and (3) why comparison of ROC curves requires paired statistical testing rather than visual inspection of CI overlap [6, 7]. These misunderstandings are not merely computational mistakes; rather, they reflect deeper confusion about sampling variability, statistical estimation, and correlated data structures [8].

Research in statistics education has long distinguished between procedural knowledge (knowing how to perform calculations) and conceptual understanding (knowing what statistical measures represent and how they should be interpreted) [9, 10]. Numerous studies have demonstrated that students may successfully generate statistical outputs using software while simultaneously misinterpreting inferential concepts such as CIs and hypothesis testing [11–13].
Although misconceptions related to p-values and CIs have been widely documented in medical education and psychology literature [11, 12, 14], conceptual misunderstandings specific to ROC analysis remain underexplored. Given the central role of ROC methodology in modern diagnostic research and artificial intelligence model evaluation, this gap warrants systematic investigation.

During routine graduate teaching in radiology and pathology programs, we observed recurring reasoning patterns suggesting reproducible misconceptions in ROC interpretation. These patterns appeared consistent across student cohorts and disciplinary backgrounds, indicating that the issue may reflect structural features of how ROC analysis is commonly taught rather than isolated individual misunderstandings.

The present study therefore pursued two primary aims. First, we sought to systematically characterize common conceptual misconceptions related to AUC estimation, CI interpretation, and ROC curve comparison among medical graduate students. Second, we aimed to evaluate whether a structured, conceptually targeted educational intervention integrating visual explanation and simulation-based demonstration could improve students’ statistical reasoning regarding ROC analysis [18, 19]. By focusing on reasoning processes rather than procedural instruction alone, this study contributes to ongoing efforts to strengthen statistical literacy in medical research education [20].

Methods

Study Design and Participants

This study employed a mixed-methods educational research design combining observational identification of conceptual misconceptions with a quasi-experimental pre-post intervention evaluation. Participants were graduate students enrolled in diagnostic research training programs within radiology and pathology at a tertiary medical university in China. A total of 41 graduate students voluntarily participated in the study.
All participants had prior exposure to ROC curve analysis through coursework or research activities but had not received formal conceptual training specifically addressing statistical reasoning related to AUC estimation, CIs, or ROC curve comparison. Participation was voluntary, and all data were collected anonymously for educational research purposes.

Identification of Conceptual Misconceptions

Common misconceptions were initially identified through routine teaching observations over multiple academic cohorts. During graduate research training sessions, instructors documented recurring reasoning difficulties encountered when students interpreted ROC analyses in their thesis projects and journal discussions. These observations were further refined through a structured baseline assessment consisting of multiple-choice conceptual questions designed to evaluate understanding in three domains: interpretation of AUC as a statistical estimate; the conceptual basis of AUC CIs; and the principles of paired ROC curve comparison. Student responses were analyzed qualitatively to identify recurring patterns of misunderstanding.

Educational Intervention

The educational intervention was delivered as a structured 2.5-hour classroom session designed to address the three conceptual misconception domains identified during baseline teaching observations: (1) interpretation of AUC as a statistical estimate, (2) conceptual meaning of CIs, and (3) statistical comparison of ROC curves. The session was conducted in a face-to-face format and facilitated by two instructors with experience in graduate research training in diagnostic medicine. The intervention consisted of three sequential modules corresponding to the three misconception domains.

Module 1: AUC as a statistical estimate (approximately 45 minutes)

This module focused on correcting the misconception that AUC represents a fixed geometric quantity rather than a statistical estimate derived from sample data.
The instructor first introduced the concept of sampling variability using simplified visual diagrams illustrating the difference between population parameters and sample estimates. Simulation-based demonstrations were then used to show how ROC curves and AUC values vary across repeated samples generated from the same underlying population distribution. Students were guided to interpret how sample size and variability influence the estimated AUC.

Module 2: CIs for AUC (approximately 45 minutes)

This module addressed common misunderstandings related to the probabilistic meaning of CIs [15–17]. Using graphical demonstrations and simulated datasets, the instructor illustrated how bootstrap resampling [19] generates a distribution of AUC estimates. Students observed how repeated resampling produces variability in AUC values and how the 95% CI reflects the uncertainty of statistical estimation rather than fluctuations in the ROC curve itself.

Module 3: Comparison of ROC curves (approximately 45 minutes)

The final module focused on the correct statistical logic for comparing ROC curves. Students were presented with examples of two ROC curves with overlapping CIs and were asked to predict whether the curves differed significantly. The instructor then introduced the concept of paired ROC comparison and explained the DeLong test [7]. Guided reasoning exercises were used to help students distinguish between visual comparison of ROC curves and formal statistical testing.

At the end of the instructional session, key concepts from all three modules were summarized, followed by the administration of the post-intervention assessment (approximately 15 minutes). A schematic overview of the instructional framework is shown in Fig. 1. A detailed outline of the 2.5-hour instructional session, including the structure of each module and the teaching materials used, is provided in Supplementary Appendix 1.
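
The three modules lend themselves to a compact illustration. The following is a minimal sketch in Python (standard library only), not the teaching material from Supplementary Appendix 1, which is not reproduced here. The population (healthy scores drawn from N(0, 1), diseased scores from N(1, 1)), the sample sizes, the seed, and all function names are assumptions made for illustration. For the third module, a paired bootstrap of the AUC difference is used as a simple stand-in that preserves the pairing; it is not the DeLong test itself.

```python
import random
import statistics


def auc(pos, neg):
    """Empirical AUC: share of (diseased, healthy) pairs ranked correctly; ties count 1/2."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def draw_sample(rng, n_per_group, shift=1.0):
    """Assumed population: healthy ~ N(0, 1), diseased ~ N(shift, 1)."""
    neg = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    pos = [rng.gauss(shift, 1.0) for _ in range(n_per_group)]
    return pos, neg


def percentile_ci(values, level=0.95):
    """Rough percentile interval taken from a list of resampled estimates."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int((1 - level) / 2 * last)], ordered[int((1 + level) / 2 * last)]


def bootstrap_auc_ci(rng, pos, neg, n_boot=400):
    """Resample cases with replacement within each group and recompute the AUC."""
    estimates = []
    for _ in range(n_boot):
        b_pos = rng.choices(pos, k=len(pos))
        b_neg = rng.choices(neg, k=len(neg))
        estimates.append(auc(b_pos, b_neg))
    return percentile_ci(estimates)


def paired_bootstrap_auc_diff(rng, pos_a, neg_a, pos_b, neg_b, n_boot=400):
    """Paired comparison: the same resampled patients feed both markers."""
    diffs = []
    for _ in range(n_boot):
        i_pos = rng.choices(range(len(pos_a)), k=len(pos_a))
        i_neg = rng.choices(range(len(neg_a)), k=len(neg_a))
        auc_a = auc([pos_a[i] for i in i_pos], [neg_a[i] for i in i_neg])
        auc_b = auc([pos_b[i] for i in i_pos], [neg_b[i] for i in i_neg])
        diffs.append(auc_a - auc_b)
    return percentile_ci(diffs)


rng = random.Random(2024)

# Module 1: the AUC is an estimate; it varies from sample to sample,
# and varies less when each sample is larger.
aucs_small = [auc(*draw_sample(rng, 20)) for _ in range(150)]
aucs_large = [auc(*draw_sample(rng, 80)) for _ in range(150)]
sd_small = statistics.stdev(aucs_small)
sd_large = statistics.stdev(aucs_large)
print("SD of AUC across samples, n=20 per group:", round(sd_small, 3))
print("SD of AUC across samples, n=80 per group:", round(sd_large, 3))

# Module 2: a bootstrap CI describes uncertainty in the estimate from one sample.
pos_a, neg_a = draw_sample(rng, 40)
ci_lo, ci_hi = bootstrap_auc_ci(rng, pos_a, neg_a)
print("AUC:", round(auc(pos_a, neg_a), 3), "95% CI:", round(ci_lo, 3), round(ci_hi, 3))

# Module 3: marker B is a noisier reading of marker A on the same patients,
# so the two AUCs are correlated and must be compared as a pair.
pos_b = [x + rng.gauss(0.0, 0.5) for x in pos_a]
neg_b = [x + rng.gauss(0.0, 0.5) for x in neg_a]
diff_lo, diff_hi = paired_bootstrap_auc_diff(rng, pos_a, neg_a, pos_b, neg_b)
print("95% CI for AUC(A) - AUC(B):", round(diff_lo, 3), round(diff_hi, 3))
```

The point of the last step is that inference targets the difference between the two AUCs computed on the same resampled patients, which is the quantity that visual inspection of two separate CIs does not address. For a formal test, an established implementation of the DeLong method [7] would be the natural choice.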

[Figure 1]

Outcome Assessment

Conceptual understanding of ROC analysis was evaluated using a structured questionnaire administered immediately before and after the intervention. The same instrument was used for both the pre- and post-intervention assessments. The questionnaire consisted of 15 multiple-choice items, organized into three conceptual domains: (1) AUC interpretation (5 items), (2) CI interpretation (5 items), and (3) ROC curve comparison (5 items). Each item assessed a specific conceptual misunderstanding commonly encountered in graduate research practice.

For each domain, students’ scores were calculated as the percentage of correct responses:

$$\text{Domain score (\%)} = \frac{\text{number of correct responses}}{5} \times 100$$

Thus, possible scores for each domain were 0, 20, 40, 60, 80, or 100. Mean ± standard deviation (SD) domain scores were then calculated across all participants. Misconception prevalence at baseline and post-intervention was defined as the proportion of students who answered at least one item incorrectly within each conceptual domain. The full assessment questionnaire is provided in Supplementary Appendix 2.

Statistical Analysis

Descriptive statistics were used to summarize baseline misconception patterns. Pre- and post-intervention scores were compared using paired t-tests to evaluate improvements in conceptual understanding. All statistical analyses were performed using R software (version 4.2.0). A two-sided P value < 0.05 was considered statistically significant.

Ethical Considerations

This study was conducted as an educational evaluation within routine graduate teaching activities. Participation was voluntary and anonymous, and no identifiable personal data were collected. According to institutional guidelines for educational research, formal ethical approval was not required. Written informed consent was obtained from all participants.
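
The scoring rule and the pre-post comparison described under Outcome Assessment and Statistical Analysis can be sketched as follows. This is an illustrative Python sketch using only the standard library; the study's own analyses were run in R. The four students and their item responses are invented for the example, the function names are hypothetical, and the standardized effect size shown (mean difference divided by the SD of the differences) is only one common paired variant, since the text does not state which formula for Cohen's d was used.

```python
import math
import statistics


def domain_score(item_correct):
    """Percentage of correct responses within one domain (5 items in the study)."""
    return 100.0 * sum(item_correct) / len(item_correct)


def has_misconception(item_correct):
    """Study definition: at least one incorrect item within the domain."""
    return not all(item_correct)


def paired_t(pre, post):
    """Paired t statistic, degrees of freedom, and mean and SD of post minus pre."""
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return t_stat, n - 1, mean_diff, sd_diff


# Invented item-level responses (1 = correct, 0 = incorrect) for one domain.
pre_items = [[1, 1, 0, 0, 0], [1, 0, 0, 0, 0], [1, 1, 1, 0, 0], [1, 1, 0, 0, 0]]
post_items = [[1, 1, 1, 1, 0], [1, 1, 1, 0, 0], [1, 1, 1, 1, 0], [1, 1, 1, 1, 1]]

pre = [domain_score(student) for student in pre_items]
post = [domain_score(student) for student in post_items]

prevalence_pre = sum(has_misconception(s) for s in pre_items) / len(pre_items)
prevalence_post = sum(has_misconception(s) for s in post_items) / len(post_items)

t_stat, df, mean_diff, sd_diff = paired_t(pre, post)
d_z = mean_diff / sd_diff  # assumed paired effect-size variant, see lead-in

print("Pre scores:", pre, "Post scores:", post)
print("Misconception prevalence pre/post:", prevalence_pre, prevalence_post)
print("Paired t:", round(t_stat, 3), "df:", df, "d_z:", round(d_z, 3))
```

The sketch stops at the t statistic because the standard library has no t distribution; a routine such as `scipy.stats.ttest_rel` would also return the two-sided P value. Note that a student scoring 80% still counts toward misconception prevalence, which is why prevalence can remain high even when mean scores rise substantially.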

Results

Baseline and Post-intervention Conceptual Misconceptions

At baseline, conceptual misconceptions were highly prevalent across all three domains of ROC interpretation (Fig. 2). In the AUC interpretation domain, 92.7% of students answered at least one item incorrectly. Similarly, misconception prevalence was 95.1% for CI interpretation and 97.6% for ROC curve comparison, indicating widespread misunderstanding of the statistical principles underlying ROC analysis.

Following the educational intervention, the prevalence of misconceptions decreased across all domains (Fig. 2). The proportion of students demonstrating misconceptions declined to 58.5% in the AUC interpretation domain, 70.7% in CI interpretation, and 78.0% in ROC curve comparison. The largest reduction was observed in AUC interpretation, suggesting that the visual and simulation-based explanations were particularly effective in clarifying the conceptual meaning of AUC as a statistical estimate. Overall, these findings indicate that the structured instructional intervention substantially reduced conceptual misunderstandings related to ROC analysis among graduate students.

[Figure 2]

Quantitative Improvement Following Intervention

Following the structured conceptual intervention, students demonstrated significant improvements across all domains of assessment. The improvement in conceptual accuracy across domains is visually illustrated in Fig. 3. Mean scores for AUC interpretation increased from 48.8% (SD 26.9%) to 80.0% (SD 21.4%) (P < 0.001), CI conceptual understanding increased from 40.9% (SD 26.8%) to 74.1% (SD 22.9%) (P < 0.001), and ROC comparison logic increased from 34.6% (SD 26.8%) to 68.2% (SD 25.2%) (P < 0.001). The overall conceptual accuracy increased from 41.4% (SD 26.8%) to 74.1% (SD 23.1%) (P < 0.001). Effect sizes were large across all domains (Cohen's d > 2.0). Among the three domains, the largest improvement in mean score was observed in ROC curve comparison, which represented the most conceptually challenging topic at baseline.
These results indicate that the structured instructional intervention substantially improved students' statistical understanding of ROC analysis.

[Figure 3]

Discussion

This study identified systematic patterns of conceptual misunderstanding among medical graduate students regarding ROC curve analysis based on questionnaire responses. Misconceptions were particularly concentrated in three domains: interpreting AUC as a deterministic geometric quantity rather than a statistical estimate, misinterpreting CIs as intrinsic fluctuations of the curve rather than reflections of sampling variability, and relying on visual overlap of CIs instead of appropriate paired statistical testing when comparing ROC curves. These findings suggest that difficulties in ROC interpretation reflect deeper inferential reasoning challenges rather than isolated computational errors.

Our findings align with prior research in statistics education demonstrating that students often possess procedural competence without fully developed conceptual understanding [9, 10]. Learners may successfully compute statistics using software while lacking an accurate mental model of what those statistics represent. Misinterpretations of CIs, in particular, have been widely documented across disciplines [11, 15–17]. Hoekstra et al. showed that even researchers frequently misinterpret CIs as reflecting the probability that a parameter lies within the interval [11]. Similar misunderstandings observed in the present study suggest that ROC-related confusion may represent a specific manifestation of broader inferential reasoning challenges in statistical education [15].

The misconception that AUC represents a fixed “area” analogous to a geometric measurement highlights the influence of intuitive reasoning. Cognitive psychology research suggests that learners often rely on surface analogies when interpreting abstract statistical constructs [23].
Because AUC is visually presented as an area under a curve, students may default to deterministic geometric interpretations, overlooking the inferential nature of statistical estimation. Without explicit instruction addressing sampling variability and estimation theory, such intuitive interpretations may persist.

The difficulty in comparing ROC curves further illustrates challenges related to correlated data structures. Prior work in medical statistics education has emphasized that paired and independent comparisons require fundamentally different inferential frameworks [24]. However, this distinction is rarely made explicit in ROC instruction, where visual comparison of curves may inadvertently encourage heuristic reasoning. The present study suggests that explicit conceptual emphasis on paired comparisons and formal statistical testing (e.g., the DeLong method [7]) can substantially improve interpretive accuracy.

Importantly, our intervention emphasized conceptual visualization and simulation-based reasoning rather than formula derivation alone. Simulation-based approaches have been shown to enhance statistical understanding by making sampling variability more tangible [18, 19, 25]. By illustrating how AUC estimates vary across repeated resampling [19], the instructional modules aimed to clarify the relationship between sampling variability and CI estimation. This pedagogical strategy may help bridge the gap between procedural statistical output and conceptual interpretation [17].

The implications of these findings extend beyond ROC interpretation. In the era of artificial intelligence and predictive modeling, ROC curves and AUC values are frequently reported in clinical machine-learning studies [26]. Misinterpretation of these metrics may lead to overestimation of model performance or inappropriate comparison of diagnostic tools.
Strengthening conceptual understanding of ROC analysis is therefore directly relevant to improving research literacy and responsible interpretation of AI-driven diagnostic systems.

Several limitations should be acknowledged. First, the study was conducted within a single institution with a relatively small sample size, which may limit generalizability. Second, the intervention effect was measured immediately after instruction, and long-term retention of conceptual understanding was not evaluated. Third, the assessment relied on multiple-choice items rather than open-ended reasoning tasks, which may not fully capture the depth of students’ statistical reasoning processes. Future studies could incorporate longitudinal follow-up and multi-institutional validation to further evaluate the generalizability and durability of the educational intervention.

In conclusion, this study demonstrates that misconceptions in ROC interpretation among medical graduate students are systematic and theoretically interpretable within broader frameworks of statistical reasoning. A targeted conceptual intervention integrating visualization and simulation was associated with substantial improvements in students’ understanding of ROC-related statistical concepts. Emphasizing conceptual reasoning rather than procedural computation alone may represent an effective strategy for strengthening statistical literacy in medical research training.

Declarations

Ethics approval and consent to participate: This study was performed in accordance with the Declaration of Helsinki. It was conducted as an educational evaluation within routine graduate teaching activities at Xiangya Hospital, Central South University. According to the institutional guidelines for educational research, formal ethical approval was waived by the Medical Ethics Committee of Xiangya Hospital. Participation was voluntary and anonymous, and written informed consent was obtained from all participants.

Consent for publication: Not applicable.

Availability of data and materials: The raw questionnaire response datasets generated and analysed during the current study are not publicly available due to privacy protections for educational research participants, but are available from the corresponding author on reasonable request.

Competing interests: The authors of this manuscript declare no conflict of interest.

Funding: This study was funded in part by the 2022 Central South University Graduate Course Ideological and Political Construction Project (2022YJSKS018), 2023 Central South University Graduate Education Teaching Reform Research Project (2023JGB118), 2023 Central South University Education and Teaching Reform Research Project (2023jy153), and 2023 Hunan Province Ordinary Higher Education Teaching Reform Research Project (HNJG-20230120).

Authors' contributions: Yitao Mao and Luqing Zhao designed the study and developed the research questions. Yulong Wang drafted the initial manuscript. Yitao Mao reviewed and revised the manuscript for intellectual content. Fangxue Yang, Yu Zhang, Juxiong Xiao, Liping Zhu, and Luqing Zhao supervised the entire study process. Luqing Zhao obtained funding for the study. All authors read and approved the final manuscript.

Acknowledgements: Not applicable.

References

1. Fawcett T. An introduction to ROC analysis. Pattern Recogn Lett. 2006;27(8):861–74. https://doi.org/10.1016/j.patrec.2005.10.010
2. Pepe MS. Receiver operating characteristic methodology. J Am Stat Assoc. 2000;95(449):308–11. https://doi.org/10.1080/01621459.2000.10473992
3. Mandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. 2010;5(9):1315–6. https://doi.org/10.1097/JTO.0b013e3181e84089
4. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Casp J Intern Med. 2013;4(2):627–35. https://doi.org/10.12925/cjim.2013.4.2.627
5. Hand DJ. Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach Learn. 2009;77(1):103–23. https://doi.org/10.1007/s10994-009-5119-5
6. Pepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford: Oxford University Press; 2003.
7. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–45. https://doi.org/10.2307/2531595
8. He Z, Zhang Q, Song M, Tan X, Wang W. Four overlooked errors in ROC analysis: how to prevent and avoid. BMJ Evid Based Med. 2025;30(3):208–11. https://doi.org/10.1136/bmjebm-2024-113078
9. Garfield J, Ben-Zvi D. Developing Students' Statistical Reasoning: Connecting Research and Teaching Practice. Dordrecht: Springer; 2008. https://doi.org/10.1007/978-1-4020-8308-4
10. delMas R, Garfield J, Ooms A, Chance B. Assessing students' conceptual understanding after a first course in statistics. Stat Educ Res J. 2007;6(2):28–58. https://doi.org/10.5204/srej.302
11. Hoekstra R, Morey RD, Rouder JN, Wagenmakers EJ. Robust misinterpretation of confidence intervals. Psychon Bull Rev. 2014;21(5):1157–64. https://doi.org/10.3758/s13423-013-0572-3
12. Gigerenzer G, Gaissmaier W, Kurz-Milcke E, Schwartz LM, Woloshin S. Helping doctors and patients make sense of health statistics. Psychol Sci Public Interest. 2007;8(2):53–96. https://doi.org/10.1111/j.1529-1006.2007.00038.x
13. Anderson BL, Williams S, Schubert JR. Medical students' statistical literacy and attitudes toward statistics. Adv Health Sci Educ Theory Pract. 2023;28(4):1087–105. https://doi.org/10.1007/s10459-023-10215-4
14. López-Martín MDM, Álvarez-Arroyo R. Exploring misconceptions related to sampling distribution, confidence intervals, and hypothesis testing: a perspective from econometrics. Advances in Quantitative Methods for Economics and Business. Cham: Springer; 2025. pp. 147–57. https://doi.org/10.1007/978-3-031-57565-9_8
15. Wang X, Reich NG, Horton NJ. Enriching students' conceptual understanding of confidence intervals: an interactive trivia-based classroom activity. arXiv Preprint. 2017;arXiv:1701.08452. https://arxiv.org/abs/1701.08452
16. Thiesmeier R, Orsini N. Rolling the DICE (Design, Interpret, Compute, Estimate): interactive learning of biostatistics with simulations. JMIR Med Educ. 2024;10(1):e52679. https://doi.org/10.2196/52679
17. Fidler F, Cumming G. Teaching confidence intervals: problems and potential solutions. In: Rossman A, Chance B, eds. Proceedings of the Seventh International Conference on Teaching Statistics (ICOTS-7). Voorburg: International Statistical Institute; 2006:1–6.
18. Orsini N, Thiesmeier R, Båge K. A simulation-based approach to teach interaction effects in postgraduate biostatistics courses. J Stat Data Sci Educ. 2024;32(4):395–404. https://doi.org/10.1080/26939169.2024.2330937
19. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Boca Raton: CRC; 1994. https://doi.org/10.1201/9781315144649
20. American Statistical Association. GAISE College Report: Guidelines for Assessment and Instruction in Statistics Education. Alexandria: American Statistical Association; 2016. https://www.amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf
21. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, Müller M. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12(1):77. https://doi.org/10.1186/1471-2105-12-77
22. Kahneman D. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux; 2011.
23. Bland JM. An Introduction to Medical Statistics. 4th ed. Oxford: Oxford University Press; 2015. https://doi.org/10.1093/med/9780198741676.001.0001
24. Chance B, Ben-Zvi D, Garfield J, Medina E. The role of technology in improving student learning of statistics. Technol Innov Stat Educ. 2007;1(1):1–20. https://doi.org/10.5070/T511004085
25. Cobb G. Mere renovation is too little too late: We need to rethink our undergraduate curriculum from the ground up. Am Stat. 2015;69(4):266–82. https://doi.org/10.1080/00031305.2015.1052433
26. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56. https://doi.org/10.1038/s41591-018-0300-7

Supplementary Files

SupplementaryAppendix1.docx
SupplementaryAppendix2.docx

Editorial timeline:
Reviewers invited by journal: 17 Apr, 2026
Editor assigned by journal: 13 Apr, 2026
Editor invited by journal: 20 Mar, 2026
Submission checks completed at journal: 19 Mar, 2026
First submitted to journal: 19 Mar, 2026

Author affiliations

Yulong Wang, Fangxue Yang, Juxiong Xiao, Liping Zhu, Luqing Zhao, Yitao Mao (corresponding author): Central South University
Yu Zhang: Oklahoma State University

Figure legends

Figure 1. Overview of the instructional framework used in the 2.5-hour educational intervention. The intervention addressed three common misconception domains in ROC analysis (AUC interpretation, confidence interval interpretation, and ROC curve comparison) through conceptual explanation, simulation-based demonstration, and guided reasoning exercises.

Figure 2. Pre- and post-intervention prevalence of conceptual misconceptions in ROC interpretation among medical graduate students. Bars represent the proportion of students who answered at least one item incorrectly within each conceptual domain before and after the educational intervention.

Figure 3. Pre- and post-intervention conceptual accuracy across the three domains of ROC interpretation. Bars represent mean percentage scores, and error bars indicate standard deviations.
These misunderstandings are not merely computational mistakes; rather, they reflect deeper confusion about sampling variability, statistical estimation, and correlated data structures [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eResearch in statistics education has long distinguished between procedural knowledge (knowing how to perform calculations) and conceptual understanding (knowing what statistical measures represent and how they should be interpreted) [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Numerous studies have demonstrated that students may successfully generate statistical outputs using software while simultaneously misinterpreting inferential concepts such as CIs and hypothesis testing [\u003cspan additionalcitationids=\"CR12\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. Although misconceptions related to p-values and CIs have been widely documented in medical education and psychology literature [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e], conceptual misunderstandings specific to ROC analysis remain underexplored. Given the central role of ROC methodology in modern diagnostic research and artificial intelligence model evaluation, this gap warrants systematic investigation.\u003c/p\u003e \u003cp\u003eDuring routine graduate teaching in radiology and pathology programs, we observed recurring reasoning patterns suggesting reproducible misconceptions in ROC interpretation. 
These patterns appeared consistent across student cohorts and disciplinary backgrounds, indicating that the issue may reflect structural features of how ROC analysis is commonly taught rather than isolated individual misunderstandings.\u003c/p\u003e \u003cp\u003eThe present study therefore pursued two primary aims. First, we sought to systematically characterize common conceptual misconceptions related to AUC estimation, CI interpretation, and ROC curve comparison among medical graduate students. Second, we aimed to evaluate whether a structured, conceptually targeted educational intervention integrating visual explanation and simulation-based demonstration could improve students\u0026rsquo; statistical reasoning regarding ROC analysis [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e]. By focusing on reasoning processes rather than procedural instruction alone, this study contributes to ongoing efforts to strengthen statistical literacy in medical research education [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e].\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eStudy Design and Participants\u003c/h2\u003e \u003cp\u003eThis study employed a mixed-methods educational research design combining observational identification of conceptual misconceptions with a quasi-experimental pre-post intervention evaluation. Participants were graduate students enrolled in diagnostic research training programs within radiology and pathology at a tertiary medical university in China.\u003c/p\u003e \u003cp\u003eA total of 41 graduate students voluntarily participated in the study. 
All participants had prior exposure to ROC curve analysis through coursework or research activities but had not received formal conceptual training specifically addressing statistical reasoning related to AUC estimation, CIs, or ROC curve comparison.\u003c/p\u003e \u003cp\u003eParticipation was voluntary, and all data were collected anonymously for educational research purposes.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eIdentification of Conceptual Misconceptions\u003c/h3\u003e\n\u003cp\u003eCommon misconceptions were initially identified through routine teaching observations over multiple academic cohorts. During graduate research training sessions, instructors documented recurring reasoning difficulties encountered when students interpreted ROC analyses in their thesis projects and journal discussions.\u003c/p\u003e \u003cp\u003eThese observations were further refined through a structured baseline assessment consisting of multiple-choice conceptual questions designed to evaluate understanding in three domains:\u003c/p\u003e \u003cp\u003e \u003col\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eInterpretation of AUC as a statistical estimate\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003eConceptual basis of AUC CIs\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003cspan\u003e \u003cli\u003e \u003cp\u003ePrinciples of paired ROC curve comparison\u003c/p\u003e \u003c/li\u003e \u003c/span\u003e \u003c/ol\u003e \u003c/p\u003e \u003cp\u003eStudent responses were analyzed qualitatively to identify recurring patterns of misunderstanding.\u003c/p\u003e\n\u003ch3\u003eEducational Intervention\u003c/h3\u003e\n\u003cp\u003eThe educational intervention was delivered as a structured 2.5-hour classroom session designed to address the three conceptual misconception domains identified during baseline teaching observations: (1) interpretation of AUC as a statistical estimate, (2) conceptual meaning of CIs, and (3) statistical 
comparison of ROC curves.\u003c/p\u003e \u003cp\u003eThe session was conducted in a face-to-face format and facilitated by two instructors with experience in graduate research training in diagnostic medicine. The intervention consisted of three sequential modules corresponding to the three misconception domains.\u003c/p\u003e\n\u003ch3\u003eModule 1: AUC as a statistical estimate (approximately 45 minutes)\u003c/h3\u003e\n\u003cp\u003eThis module focused on correcting the misconception that AUC represents a fixed geometric quantity rather than a statistical estimate derived from sample data. The instructor first introduced the concept of sampling variability using simplified visual diagrams illustrating the difference between population parameters and sample estimates. Simulation-based demonstrations were then used to show how ROC curves and AUC values vary across repeated samples generated from the same underlying population distribution. Students were guided to interpret how sample size and variability influence the estimated AUC.\u003c/p\u003e\n\u003ch3\u003eModule 2: CIs for AUC (approximately 45 minutes)\u003c/h3\u003e\n\u003cp\u003eThis module addressed common misunderstandings related to the probabilistic meaning of CIs [\u003cspan additionalcitationids=\"CR16\" citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. Using graphical demonstrations and simulated datasets, the instructor illustrated how bootstrap resampling [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] generates a distribution of AUC estimates. 
Students observed how repeated resampling produces variability in AUC values and how the 95% CI reflects the uncertainty of statistical estimation rather than fluctuations in the ROC curve itself.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eModule 3: Comparison of ROC curves (approximately 45 minutes)\u003c/h2\u003e \u003cp\u003eThe final module focused on the correct statistical logic for comparing ROC curves. Students were presented with examples of two ROC curves with overlapping CIs and were asked to predict whether the curves differed significantly. The instructor then introduced the concept of paired ROC comparison and explained the DeLong test [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. Guided reasoning exercises were used to help students distinguish between visual comparison of ROC curves and formal statistical testing.\u003c/p\u003e \u003cp\u003eAt the end of the instructional session, key concepts from all three modules were summarized, followed by the administration of the post-intervention assessment (\u003cb\u003eapproximately 15 minutes\u003c/b\u003e). A schematic overview of the instructional framework is shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. A detailed outline of the 2.5-hour instructional session, including the structure of each module and the teaching materials used, is provided in \u003cb\u003eSupplementary Appendix 1.\u003c/b\u003e\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e[Figure \u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e]\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eOutcome Assessment\u003c/h3\u003e\n\u003cp\u003eConceptual understanding of ROC analysis was evaluated using a structured questionnaire administered immediately before and after the intervention. The same instrument was used for both the pre- and post-intervention assessments. 
The questionnaire consisted of 15 multiple-choice items, organized into three conceptual domains: (1) AUC interpretation (5 items), (2) CI interpretation (5 items), and (3) ROC curve comparison (5 items).\u003c/p\u003e \u003cp\u003eEach item assessed a specific conceptual misunderstanding commonly encountered in graduate research practice. For each domain, students\u0026rsquo; scores were calculated as the percentage of correct responses:\u003c/p\u003e\u003cp\u003eDomain score (%) = (number of correct responses / 5) \u0026times; 100\u003c/p\u003e\u003cp\u003eThus, possible scores for each domain were 0, 20, 40, 60, 80, or 100. Mean\u0026plusmn;standard deviation (SD) domain scores were then calculated across all participants. 
Misconception prevalence at baseline and post-intervention was defined as the proportion of students who answered at least one item incorrectly within each conceptual domain. The full assessment questionnaire is provided in \u003cb\u003eSupplementary Appendix 2\u003c/b\u003e.\u003c/p\u003e \u003cdiv id=\"Sec10\" class=\"Section2\"\u003e \u003ch2\u003eStatistical Analysis\u003c/h2\u003e \u003cp\u003eDescriptive statistics were used to summarize baseline misconception patterns. Pre- and post-intervention scores were compared using paired \u003cem\u003et\u003c/em\u003e-tests to evaluate improvements in conceptual understanding. All statistical analyses were performed using R software (version 4.2.0). A two-sided P value\u0026thinsp;\u0026lt;\u0026thinsp;0.05 was considered statistically significant.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003eEthical Considerations\u003c/h2\u003e \u003cp\u003eThis study was conducted as an educational evaluation within routine graduate teaching activities. Participation was voluntary and anonymous, and no identifiable personal data were collected. According to institutional guidelines for educational research, formal ethical approval was not required. Written informed consent was obtained from all participants.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003eBaseline and Post-intervention Conceptual Misconceptions\u003c/h2\u003e \u003cp\u003eAt baseline, conceptual misconceptions were highly prevalent across all three domains of ROC interpretation (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). In the AUC interpretation domain, 92.7% of students answered at least one item incorrectly. 
Similarly, misconception prevalence was 95.1% for CI interpretation and 97.6% for ROC curve comparison, indicating widespread misunderstanding of the statistical principles underlying ROC analysis.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eFollowing the educational intervention, the prevalence of misconceptions decreased across all domains (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). The proportion of students demonstrating misconceptions declined to 58.5% in the AUC interpretation domain, 70.7% in CI interpretation, and 78.0% in ROC curve comparison. The largest reduction was observed in AUC interpretation, suggesting that the visual and simulation-based explanations were particularly effective in clarifying the conceptual meaning of AUC as a statistical estimate.\u003c/p\u003e \u003cp\u003eOverall, these findings indicate that the structured instructional intervention substantially reduced conceptual misunderstandings related to ROC analysis among graduate students.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e[Figure \u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e]\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003eQuantitative Improvement Following Intervention\u003c/h2\u003e \u003cp\u003eFollowing the structured conceptual intervention, students demonstrated significant improvements across all domains of assessment. 
The improvement in conceptual accuracy across domains is visually illustrated in Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eMean scores for AUC interpretation increased from 48.8% (SD 26.9%) to 80.0% (SD 21.4%) (P\u0026thinsp;\u0026lt;\u0026thinsp;0.001), CI conceptual understanding increased from 40.9% (SD 26.8%) to 74.1% (SD 22.9%) (P\u0026thinsp;\u0026lt;\u0026thinsp;0.001), and ROC comparison logic increased from 34.6% (SD 26.8%) to 68.2% (SD 25.2%) (P\u0026thinsp;\u0026lt;\u0026thinsp;0.001). The overall conceptual accuracy increased from 41.4% (SD 26.8%) to 74.1% (SD 23.1%) (P\u0026thinsp;\u0026lt;\u0026thinsp;0.001). Effect sizes (Cohen\u0026rsquo;s d) were large across all domains (all d\u0026thinsp;\u0026gt;\u0026thinsp;2.0).\u003c/p\u003e \u003cp\u003eAmong the three domains, the largest absolute improvement in mean score was observed in ROC curve comparison (33.6 percentage points, compared with 33.2 for CI interpretation and 31.2 for AUC interpretation), which was also the lowest-scoring domain at baseline. These results indicate that the structured instructional intervention substantially improved students\u0026rsquo; statistical understanding of ROC analysis.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e[Figure \u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e]\u003c/p\u003e \u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eBased on questionnaire responses, this study identified systematic patterns of conceptual misunderstanding of ROC curve analysis among medical graduate students. Misconceptions were concentrated in three domains: interpreting AUC as a deterministic geometric quantity rather than a statistical estimate, misinterpreting CIs as intrinsic fluctuations of the curve rather than reflections of sampling variability, and relying on visual overlap of CIs instead of appropriate paired statistical testing when comparing ROC curves. 
These findings suggest that difficulties in ROC interpretation reflect deeper inferential reasoning challenges rather than isolated computational errors.\u003c/p\u003e \u003cp\u003eOur findings align with prior research in statistics education demonstrating that students often possess procedural competence without fully developed conceptual understanding [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Learners may successfully compute statistics using software while lacking an accurate mental model of what those statistics represent. Misinterpretations of CIs, in particular, have been widely documented across disciplines [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan additionalcitationids=\"CR16\" citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. Hoekstra et al. showed that even researchers frequently misinterpret CIs as reflecting the probability that a parameter lies within the interval [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Similar misunderstandings observed in the present study suggest that ROC-related confusion may represent a specific manifestation of broader inferential reasoning challenges in statistical education [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe misconception that AUC represents a fixed \u0026ldquo;area\u0026rdquo; analogous to a geometric measurement highlights the influence of intuitive reasoning. Cognitive psychology research suggests that learners often rely on surface analogies when interpreting abstract statistical constructs [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. 
Because AUC is visually presented as an area under a curve, students may default to deterministic geometric interpretations, overlooking the inferential nature of statistical estimation. Without explicit instruction addressing sampling variability and estimation theory, such intuitive interpretations may persist.\u003c/p\u003e \u003cp\u003eThe difficulty in comparing ROC curves further illustrates challenges related to correlated data structures. Prior work in medical statistics education has emphasized that paired and independent comparisons require fundamentally different inferential frameworks [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e]. However, this distinction is rarely made explicit in ROC instruction, where visual comparison of curves may inadvertently encourage heuristic reasoning. The present study suggests that explicit conceptual emphasis on paired comparisons and formal statistical testing (e.g., the DeLong method [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]) can substantially improve interpretive accuracy.\u003c/p\u003e \u003cp\u003eImportantly, our intervention emphasized conceptual visualization and simulation-based reasoning rather than formula derivation alone. Simulation-based approaches have been shown to enhance statistical understanding by making sampling variability more tangible [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e, \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e, \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. By illustrating how AUC estimates vary across repeated resampling [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e], the instructional modules aimed to clarify the relationship between sampling variability and CI estimation. 
This pedagogical strategy may help bridge the gap between procedural statistical output and conceptual interpretation [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eThe implications of these findings extend beyond ROC interpretation. In the era of artificial intelligence and predictive modeling, ROC curves and AUC values are frequently reported in clinical machine-learning studies [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. Misinterpretation of these metrics may lead to overestimation of model performance or inappropriate comparison of diagnostic tools. Strengthening conceptual understanding of ROC analysis is therefore directly relevant to improving research literacy and responsible interpretation of AI-driven diagnostic systems.\u003c/p\u003e \u003cp\u003eSeveral limitations should be acknowledged. First, the study was conducted within a single institution with a relatively small sample size, which may limit generalizability. Second, the intervention effect was measured immediately after instruction, and long-term retention of conceptual understanding was not evaluated. Third, the assessment relied on multiple-choice items rather than open-ended reasoning tasks, which may not fully capture the depth of students\u0026rsquo; statistical reasoning processes. Future studies could incorporate longitudinal follow-up and multi-institutional validation to further evaluate the generalizability and durability of the educational intervention.\u003c/p\u003e \u003cp\u003eIn conclusion, this study demonstrates that misconceptions in ROC interpretation among medical graduate students are systematic and theoretically interpretable within broader frameworks of statistical reasoning. A targeted conceptual intervention integrating visualization and simulation was associated with substantial improvements in students\u0026rsquo; understanding of ROC-related statistical concepts. 
Emphasizing conceptual reasoning rather than procedural computation alone may represent an effective strategy for strengthening statistical literacy in medical research training.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eEthics approval and consent to participate:\u0026nbsp;\u003c/strong\u003eThis study was performed in accordance with the Declaration of Helsinki. It was conducted as an educational evaluation within routine graduate teaching activities at Xiangya Hospital, Central South University. According to the institutional guidelines for educational research, formal ethical approval was waived by the Medical Ethics Committee of Xiangya Hospital. Participation was voluntary and anonymous, and written informed consent was obtained from all participants.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication:\u0026nbsp;\u003c/strong\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials:\u003c/strong\u003e The raw questionnaire response datasets generated and analysed during the current study are not publicly available due to privacy protections for educational research participants, but are available from the corresponding author on reasonable request.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests:\u003c/strong\u003e The authors of this manuscript declare no conflict of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding:\u0026nbsp;\u003c/strong\u003eThis study was funded in part by the 2022 Central South University Graduate Course Ideological and Political Construction Project (2022YJSKS018), 2023 Central South University Graduate Education Teaching Reform Research Project (2023JGB118), 2023 Central South University Education and Teaching Reform Research Project (2023jy153), and 2023 Hunan Province Ordinary Higher Education Teaching Reform Research Project 
(HNJG-20230120).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026apos; contributions:\u0026nbsp;\u003c/strong\u003eYitao Mao and Luqing Zhao designed the study and developed the research questions. Yulong Wang drafted the initial manuscript. Yitao Mao reviewed and revised the manuscript for intellectual content. Fangxue Yang, Yu Zhang, Juxiong Xiao, Liping Zhu, and Luqing Zhao supervised the entire study process. Luqing Zhao obtained funding for the study. All authors read and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements:\u003c/strong\u003e Not applicable.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eFawcett T. An introduction to ROC analysis. Pattern Recogn Lett. 2006;27(8):861\u0026ndash;74. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.patrec.2005.10.010\u003c/span\u003e\u003cspan address=\"10.1016/j.patrec.2005.10.010\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePepe MS. Receiver operating characteristic methodology. J Am Stat Assoc. 2000;95(449):308\u0026ndash;11. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/01621459.2000.10473992\u003c/span\u003e\u003cspan address=\"10.1080/01621459.2000.10473992\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMandrekar JN. Receiver operating characteristic curve in diagnostic test assessment. J Thorac Oncol. 2010;5(9):1315\u0026ndash;6. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1097/JTO.0b013e3181e84089\u003c/span\u003e\u003cspan address=\"10.1097/JTO.0b013e3181e84089\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Casp J Intern Med. 2013;4(2):627\u0026ndash;35. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.12925/cjim.2013.4.2.627\u003c/span\u003e\u003cspan address=\"10.12925/cjim.2013.4.2.627\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHand DJ. Measuring classifier performance: a coherent alternative to the area under the ROC curve. Mach Learn. 2009;77(1):103\u0026ndash;23. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10994-009-5119-5\u003c/span\u003e\u003cspan address=\"10.1007/s10994-009-5119-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePepe MS. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford: Oxford University Press; 2003.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837\u0026ndash;45. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2307/2531595\u003c/span\u003e\u003cspan address=\"10.2307/2531595\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHe Z, Zhang Q, Song M, Tan X, Wang W. Four overlooked errors in ROC analysis: how to prevent and avoid. BMJ Evid Based Med. 2025;30(3):208\u0026ndash;11. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1136/bmjebm-2024-113078\u003c/span\u003e\u003cspan address=\"10.1136/bmjebm-2024-113078\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGarfield J, Ben-Zvi D. Developing Students' Statistical Reasoning: Connecting Research and Teaching Practice. Dordrecht: Springer; 2008. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-1-4020-8308-4\u003c/span\u003e\u003cspan address=\"10.1007/978-1-4020-8308-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003edelMas R, Garfield J, Ooms A, Chance B. Assessing students' conceptual understanding after a first course in statistics. Stat Educ Res J. 2007;6(2):28\u0026ndash;58. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.5204/srej.302\u003c/span\u003e\u003cspan address=\"10.5204/srej.302\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHoekstra R, Morey RD, Rouder JN, Wagenmakers EJ. Robust misinterpretation of confidence intervals. Psychon Bull Rev. 2014;21(5):1157\u0026ndash;64. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3758/s13423-013-0572-3\u003c/span\u003e\u003cspan address=\"10.3758/s13423-013-0572-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGigerenzer G, Gaissmaier W, Kurz-Milcke E, Schwartz LM, Woloshin S. Helping doctors and patients make sense of health statistics. Psychol Sci Public Interest. 2007;8(2):53\u0026ndash;96. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/j.1529-1006.2007.00038.x\u003c/span\u003e\u003cspan address=\"10.1111/j.1529-1006.2007.00038.x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAnderson BL, Williams S, Schubert JR. Medical students' statistical literacy and attitudes toward statistics. Adv Health Sci Educ Theory Pract. 2023;28(4):1087\u0026ndash;105. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s10459-023-10215-4\u003c/span\u003e\u003cspan address=\"10.1007/s10459-023-10215-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eL\u0026oacute;pez-Mart\u0026iacute;n MDM, \u0026Aacute;lvarez-Arroyo R. Exploring misconceptions related to sampling distribution, confidence intervals, and hypothesis testing: a perspective from econometrics. Advances in Quantitative Methods for Economics and Business. Cham: Springer; 2025. pp. 147\u0026ndash;57. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/978-3-031-57565-9_8\u003c/span\u003e\u003cspan address=\"10.1007/978-3-031-57565-9_8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWang X, Reich NG, Horton NJ. Enriching students' conceptual understanding of confidence intervals: an interactive trivia-based classroom activity. arXiv Preprint. 2017;arXiv:1701.08452. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://arxiv.org/abs/1701.08452\u003c/span\u003e\u003cspan address=\"https://arxiv.org/abs/1701.08452\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eThiesmeier R, Orsini N. Rolling the DICE (Design, Interpret, Compute, Estimate): interactive learning of biostatistics with simulations. JMIR Med Educ. 2024;10(1):e52679. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/52679\u003c/span\u003e\u003cspan address=\"10.2196/52679\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eFidler F, Cumming G. Teaching confidence intervals: problems and potential solutions. In: Rossman A, Chance B, eds. Proceedings of the Seventh International Conference on Teaching Statistics (ICOTS-7). Voorburg: International Statistical Institute; 2006:1\u0026ndash;6.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOrsini N, Thiesmeier R, B\u0026aring;ge K. A simulation-based approach to teach interaction effects in postgraduate biostatistics courses. J Stat Data Sci Educ. 2024;32(4):395\u0026ndash;404. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/26939169.2024.2330937\u003c/span\u003e\u003cspan address=\"10.1080/26939169.2024.2330937\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEfron B, Tibshirani RJ. An Introduction to the Bootstrap. Boca Raton: CRC; 1994. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1201/9781315144649\u003c/span\u003e\u003cspan address=\"10.1201/9781315144649\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAmerican Statistical Association. GAISE College Report: Guidelines for Assessment and Instruction in Statistics Education. Alexandria: American Statistical Association; 2016. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://www.amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf\u003c/span\u003e\u003cspan address=\"https://www.amstat.org/asa/files/pdfs/GAISE/GaiseCollege_Full.pdf\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRobin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, M\u0026uuml;ller M. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12(1):77. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/1471-2105-12-77\u003c/span\u003e\u003cspan address=\"10.1186/1471-2105-12-77\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKahneman D. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux; 2011.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBland JM.
An Introduction to Medical Statistics. 4th ed. Oxford: Oxford University Press; 2015. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/med/9780198741676.001.0001\u003c/span\u003e\u003cspan address=\"10.1093/med/9780198741676.001.0001\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChance B, Ben-Zvi D, Garfield J, Medina E. The role of technology in improving student learning of statistics. Technol Innov Stat Educ. 2007;1(1):1\u0026ndash;20. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.5070/T511004085\u003c/span\u003e\u003cspan address=\"10.5070/T511004085\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCobb G. Mere renovation is too little too late: We need to rethink our undergraduate curriculum from the ground up. Am Stat. 2015;69(4):266\u0026ndash;82. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/00031305.2015.1052433\u003c/span\u003e\u003cspan address=\"10.1080/00031305.2015.1052433\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTopol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44\u0026ndash;56. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41591-018-0300-7\u003c/span\u003e\u003cspan address=\"10.1038/s41591-018-0300-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"
[email protected]","identity":"bmc-medical-education","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"meed","sideBox":"Learn more about [BMC Medical Education](http://bmcmededuc.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/meed/default.aspx","title":"BMC Medical Education","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"ROC curve, Area under the curve, Statistical misconceptions, Medical education, Graduate students, Simulation-based learning","lastPublishedDoi":"10.21203/rs.3.rs-9132102/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9132102/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground:\u003c/h2\u003e \u003cp\u003eReceiver Operating Characteristic (ROC) curve analysis is widely used in diagnostic research and machine-learning model evaluation. However, medical graduate students frequently misinterpret key statistical concepts underlying ROC analysis, including the inferential meaning of the Area Under the Curve (AUC), the rationale for confidence intervals (CIs), and appropriate comparison of ROC curves. Despite the central role of ROC methodology in modern medical research, systematic investigation of these conceptual misconceptions remains limited.\u003c/p\u003e\u003ch2\u003eMethods:\u003c/h2\u003e \u003cp\u003eA total of 41 medical graduate students participated in this quasi-experimental study. A 2.5-hour structured educational intervention was delivered, addressing three conceptual domains: AUC interpretation, CI interpretation, and ROC curve comparison. 
Conceptual understanding was assessed using a 15-item multiple-choice questionnaire administered immediately before and after the intervention.\u003c/p\u003e\u003ch2\u003eResults:\u003c/h2\u003e \u003cp\u003eAt baseline, misconceptions were highly prevalent across all domains (92.7%-97.6%). Following the intervention, significant improvements were observed in all domains (all P\u0026thinsp;\u0026lt;\u0026thinsp;0.001). Mean scores increased from 48.8% (SD 26.9%) to 80.0% (SD 21.4%) for AUC interpretation, from 40.9% (SD 26.8%) to 74.1% (SD 22.9%) for CI interpretation, and from 34.6% (SD 26.8%) to 68.2% (SD 25.2%) for ROC curve comparison. Effect sizes were large across all domains (Cohen's d\u0026thinsp;\u0026gt;\u0026thinsp;2.0).\u003c/p\u003e\u003ch2\u003eConclusions:\u003c/h2\u003e \u003cp\u003eConceptual misunderstandings in ROC analysis are systematic among medical graduate students. A targeted, simulation-based educational intervention significantly improved students' statistical reasoning. 
These findings highlight the importance of emphasizing conceptual understanding in graduate-level statistics education.\u003c/p\u003e","manuscriptTitle":"Common Misconceptions in Interpreting ROC Curves Among Medical Graduate Students: A Conceptual Diagnostic Study with a Simulation-Based Educational Intervention","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-04-24 13:41:05","doi":"10.21203/rs.3.rs-9132102/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"reviewersInvited","content":"","date":"2026-04-17T06:47:02+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-04-13T08:52:09+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2026-03-20T13:14:45+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-03-19T16:53:25+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Medical Education","date":"2026-03-19T15:52:11+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"
[email protected]","identity":"bmc-medical-education","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"meed","sideBox":"Learn more about [BMC Medical Education](http://bmcmededuc.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/meed/default.aspx","title":"BMC Medical Education","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"aef0b4f8-567c-45dc-adba-832d8b25c007","owner":[],"postedDate":"April 24th, 2026","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-04-24T13:41:05+00:00","versionOfRecord":[],"versionCreatedAt":"2026-04-24 13:41:05","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9132102","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9132102","identity":"rs-9132102","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}