ChatGPT's Performance in Supporting Physician Decision-Making in Nephrology Multiple-Choice Questions

doi:10.21203/rs.3.rs-4947755/v1

2024 · doi:10.21203/rs.3.rs-4947755/v1

Research Article

Ryunosuke Noda, Kenichiro Tanabe, Daisuke Ichikawa, Yugo Shibagaki

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4947755/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 01 May, 2025, in Scientific Reports. Version 1 posted; this is the latest preprint version.

Abstract

Background
ChatGPT is a versatile conversational AI capable of performing various tasks, and its potential use in medicine has garnered attention. However, whether ChatGPT can support physicians' decision-making remains unclear. This study evaluated ChatGPT's performance in supporting physicians with answers to nephrology written examinations.

Methods
We extracted 45 single-answer multiple-choice questions from the Core Curriculum in Nephrology articles published in the American Journal of Kidney Diseases from October 2021 to June 2023. Eight junior physicians without board certification and ten senior physicians with board certification, as well as the ChatGPT GPT-4 model, answered these questions. The physicians answered twice: first without ChatGPT's support and then with the opportunity to revise their answers based on ChatGPT's output. We statistically compared the proportion of correct answers before and after using ChatGPT.

Results
ChatGPT answered 77.8% of the questions correctly. The mean proportion of correct answers from physicians before using ChatGPT was 50.8% (standard deviation [SD] 7.5) for junior physicians and 65.3% (SD 5.9) for senior physicians. After using ChatGPT, the proportion of correct answers significantly increased to 72.2% (SD 4.6) for junior physicians and 77.1% (SD 4.2) for senior physicians (junior physicians: p < 0.001, senior physicians: p < 0.001). The improvement in the proportion of correct answers was significantly greater for junior physicians than for senior physicians (p = 0.015). In both groups, the proportion of correct answers decreased in one of the eight clinical categories.

Conclusions
ChatGPT significantly improved the accuracy of physicians' answers in nephrology, especially for less experienced physicians, although the results also suggested a potential negative impact in a specific subfield. Careful consideration is required regarding using ChatGPT to support physicians' decision-making.

Keywords: Artificial intelligence, ChatGPT, GPT-4, Large language models, Nephrology, Clinical decision-making

INTRODUCTION

Rapid technological advancements and breakthroughs in artificial intelligence (AI), particularly large language models (LLMs), have raised great expectations for their application in various fields of medicine, including nephrology [1–3]. LLMs are capable of generating human-like text and have demonstrated efficacy in a wide range of medical tasks, such as medical question answering [4], diagnostic dialogue [5], clinical text summarization [6], and medical image interpretation [7]. The score in medical written examinations, which is easily quantifiable, has been primarily used to evaluate the performance of LLMs in medical fields [8–13].

ChatGPT, developed by OpenAI, is one of the most prominent applications based on LLMs [14]. Among its various models, GPT-4, despite being a general-purpose system not designed for medicine, has performed remarkably in general medical examinations, meeting the passing standards of the United States medical licensing examinations [8] and the Japanese national medical examinations [9]. Recently, GPT-4 has been tested in more advanced medical subspecialties, such as nephrology. GPT-4 has shown favorable results in nephrology, achieving passing standards in the Polish nephrology specialty examinations [10] and partially meeting the renewal requirements for the Japanese nephrology specialty certification [11], indicating its potential for future clinical applications.

Although ChatGPT has demonstrated excellent performance in various medical fields, including nephrology, challenges remain for its implementation in clinical practice.
Technical issues, such as hallucinations, where fabricated or incorrect information is generated, and the risk of providing responses that do not align with human ethics and values, could negatively impact medical practice [9, 15]. To mitigate these risks, it is essential to recognize that clinical decision-making should not be entrusted solely to ChatGPT without the involvement of physicians [16, 17]. Instead, its role should be to support physicians by providing an additional perspective that can be compared and contrasted with the physician's clinical judgment. This approach may allow physicians to maintain responsibility for the final decision while leveraging the potential benefits of ChatGPT in enhancing decision-making accuracy and efficiency.

Despite many studies on ChatGPT's performance in medical tasks, research on its ability to support physicians in decision-making remains limited. Consequently, this study investigated whether scores on nephrology multiple-choice questions improve after physicians use ChatGPT, examining the utility and concerns of ChatGPT in supporting physicians' decision-making.

MATERIALS AND METHODS

Nephrology Multiple-Choice Questions

For this study, we utilized multiple-choice questions from the Core Curriculum in Nephrology articles published in the American Journal of Kidney Diseases. These questions were selected because they aim to provide core knowledge necessary for nephrology practice and are widely accepted by nephrologists worldwide. Each question was formatted as a clinical scenario requiring selection of the correct answer from multiple choices, with each article containing questions from a specific nephrology category. We selected eight articles published after October 2021 [18–25] to avoid data leakage, since the ChatGPT GPT-4 model used in this study was trained on internet data up to September 2021.
We excluded two questions requiring multiple answers, resulting in 45 single-answer multiple-choice questions for this study (Fig. 1). The question categories included nutrition (5 questions), lung and kidney disorders (9 questions), intoxication (6 questions), diuretics (6 questions), immunosuppression (3 questions), metabolic alkalosis (5 questions), peritoneal dialysis (5 questions), and plasma exchange (6 questions) (Table 1). The correct answers were those provided in the articles. No questions involved images or charts. We created two PDF files containing only the questions, one in the original English and the other translated into Japanese for the participating physicians.

Table 1. Categories and articles of the questions

| Title | Article | Category | Number of questions |
|---|---|---|---|
| Nutrition in Kidney Disease: Core Curriculum 2022 | Am J Kidney Dis. 2021;79(3):437–449 | Nutrition | 5 |
| Concomitant Lung and Kidney Disorders in Critically Ill Patients: Core Curriculum 2022 | Am J Kidney Dis. 2021;79(4):601–612 | Lung and kidney disorders | 9 |
| The Role of the Nephrologist in Management of Poisoning and Intoxication: Core Curriculum 2022 | Am J Kidney Dis. 2021;79(6):877–889 | Intoxication | 6 |
| Diuretics in States of Volume Overload: Core Curriculum 2022 | Am J Kidney Dis. 2022;80(2):264–276 | Diuretics | 6 |
| Principles of Immunosuppression in the Management of Kidney Disease: Core Curriculum 2022 | Am J Kidney Dis. 2022;80(3):393–405 | Immunosuppression | 3 |
| Metabolic Alkalosis Pathogenesis, Diagnosis, and Treatment: Core Curriculum 2022 | Am J Kidney Dis. 2022;80(4):536–551 | Metabolic alkalosis | 5 |
| Peritoneal Dialysis Prescription and Adequacy in Clinical Practice: Core Curriculum 2023 | Am J Kidney Dis. 2022;81(1):100–109 | Peritoneal dialysis | 5 |
| Therapeutic Plasma Exchange: Core Curriculum 2023 | Am J Kidney Dis. 2023;81(4):475–492 | Plasma exchange | 6 |

ChatGPT

We used the ChatGPT GPT-4 model (May 24 version). All prompts were entered in English on June 23–24, 2023.
The initial prompt stated, “I am now going to give you questions about nephrology. Please answer the questions in the following format. Answer: Explanation:” followed by one question. We refreshed the chat session for each question, repeating this process for all 45 questions. We confirmed that the output followed the answer and explanation format and calculated ChatGPT's scores based on the provided answers. For the participating physicians, we compiled ChatGPT's outputs into two PDF files, one in the original English and the other translated into Japanese.

Participating Physicians

Eighteen physicians in the Division of Nephrology and Hypertension, Department of Internal Medicine, at St. Marianna University Hospital volunteered to participate in this study. All physicians were Japanese, resided in Japan, and were non-native English speakers. The participants were divided into two groups: eight junior physicians without board certification in nephrology and ten senior physicians with board certification. The participants were initially provided with two types of PDF files, one with the original English questions and the other with the Japanese translations, and they answered the questions twice. During the first attempt, they answered independently. In the second attempt, they referred to two additional PDF files containing ChatGPT's output in both English and Japanese and revised their answers if necessary. The source of the questions was not disclosed, and participants were prohibited from using the internet or literature to research the questions and from consulting with others while answering.

Data Analysis

We calculated the proportion of correct answers for all the questions and each question category for ChatGPT, the junior physician group, and the senior physician group. Assuming the proportion of correct answers followed a normal distribution, we described them using means and standard deviations.
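
As a rough illustration of the descriptive step above, the following sketch scores each physician's answer sheet against an answer key and summarizes a group as mean ± sample standard deviation. The answer key, the answer sheets, and the function names are hypothetical placeholders, not study data. The use of the sample standard deviation (n − 1 denominator, as R's `sd` does) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def proportion_correct(answers, key):
    """Return the percentage of questions answered correctly."""
    if len(answers) != len(key):
        raise ValueError("answer sheet and key must have the same length")
    correct = sum(a == k for a, k in zip(answers, key))
    return 100.0 * correct / len(key)


def summarize_group(sheets, key):
    """Return (mean, sample SD) of the per-physician percentages."""
    scores = [proportion_correct(sheet, key) for sheet in sheets]
    return mean(scores), stdev(scores)


# Hypothetical 5-question answer key and three hypothetical physicians.
key = ["A", "C", "B", "D", "A"]
sheets = [
    ["A", "C", "B", "A", "A"],  # 4 of 5 correct
    ["A", "B", "B", "D", "C"],  # 3 of 5 correct
    ["B", "C", "D", "D", "A"],  # 3 of 5 correct
]

group_mean, group_sd = summarize_group(sheets, key)
print(f"{group_mean:.1f} ± {group_sd:.1f}%")
```

Scoring each physician first and then averaging keeps the physician as the unit of analysis, which is what a mean ± SD across a group of 8 or 10 participants implies.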
We statistically compared the differences in the proportion of correct answers. Differences between groups were analyzed using independent two-sample t-tests, while differences between the first and second attempts were analyzed using paired t-tests. Statistical significance was set at p < 0.05. Statistical analyses were performed using R Version 4.2.3.

RESULTS

No errors occurred in ChatGPT's output, and all outputs adhered to the specified answer and explanation format. ChatGPT's proportion of correct answers for all the questions was 77.8% (35/45). By question category, the proportion of correct answers was highest for nutrition and peritoneal dialysis, both at 100% (5/5), and lowest for immunosuppression at 33.3% (1/3).

Before using ChatGPT, the proportion of correct answers for physicians was 50.8 ± 7.5% for the junior physician group and 65.3 ± 5.9% for the senior physician group. ChatGPT's proportion of correct answers was significantly higher than that of the junior physician group (p < 0.001) and the senior physician group (p < 0.001). After using ChatGPT, the proportion of correct answers increased significantly for both groups, to 72.2 ± 4.6% for the junior physician group and 77.1 ± 4.2% for the senior physician group (p < 0.001 for both) (Fig. 2). Notably, the improvement in the proportion of correct answers was significantly greater for the junior physician group than for the senior physician group (p = 0.015). ChatGPT usage improved the proportion of correct answers for 17 out of the 18 participating physicians, with one senior physician experiencing a decrease (Fig. 3).

By question category, significant improvements in the proportion of correct answers were observed in the junior physician group for all categories except immunosuppression.
In contrast, in the senior physician group, significant improvements were observed in four categories, with no significant improvement in immunosuppression, intoxication, metabolic alkalosis, or peritoneal dialysis (Table 2). In both groups, the proportion of correct answers decreased only in the immunosuppression category and increased in all other categories (Fig. 4).

Table 2. The proportion of correct answers of ChatGPT, junior physicians, and senior physicians

| Question category | ChatGPT | Junior physicians (n = 8), 1st attempt | Junior physicians, 2nd attempt | Junior physicians, p-value (1st vs 2nd) | Senior physicians (n = 10), 1st attempt | Senior physicians, 2nd attempt | Senior physicians, p-value (1st vs 2nd) |
|---|---|---|---|---|---|---|---|
| Overall | 77.8% | 50.8 ± 7.5% | 72.2 ± 4.6% | < 0.001 | 65.3 ± 5.9% | 77.1 ± 4.2% | < 0.001 |
| Nutrition | 100.0% | 55.0 ± 17.7% | 85.0 ± 20.7% | 0.014 | 70.0 ± 19.4% | 88.0 ± 16.9% | 0.004 |
| Lung and kidney disorders | 55.6% | 38.9 ± 10.3% | 56.9 ± 12.5% | 0.002 | 58.9 ± 13.9% | 74.4 ± 10.5% | 0.025 |
| Intoxication | 83.3% | 50.0 ± 21.8% | 81.3 ± 5.9% | 0.006 | 71.7 ± 19.3% | 80.0 ± 7.0% | 0.138 |
| Diuretics | 83.3% | 50.0 ± 15.4% | 70.8 ± 11.8% | 0.005 | 58.3 ± 11.8% | 75.0 ± 14.2% | 0.008 |
| Immunosuppression | 33.3% | 45.8 ± 17.3% | 37.5 ± 11.8% | 0.351 | 60.0 ± 21.1% | 43.3 ± 16.1% | 0.052 |
| Metabolic alkalosis | 80.0% | 52.5 ± 23.8% | 75.0 ± 9.3% | 0.015 | 68.0 ± 25.3% | 82.0 ± 6.3% | 0.089 |
| Peritoneal dialysis | 100.0% | 72.5 ± 14.9% | 90.0 ± 15.1% | 0.021 | 74.0 ± 13.5% | 78.0 ± 17.5% | 0.168 |
| Plasma exchange | 83.3% | 50.0 ± 26.7% | 77.1 ± 8.6% | 0.014 | 65.0 ± 14.6% | 83.3 ± 7.9% | 0.003 |

Assuming the proportion of correct answers followed a normal distribution, we described them using means ± standard deviations.

DISCUSSION

We evaluated the performance of the ChatGPT GPT-4 model in assisting physicians with nephrology multiple-choice questions. ChatGPT significantly improved the proportion of correct answers for junior and senior physicians, with a significantly greater effect on junior physicians. However, in a question category where ChatGPT itself had a low proportion of correct answers, ChatGPT decreased physicians' proportion of correct answers.
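
The comparison between ChatGPT's overall score and each physician group reported in the Results can be roughly sanity-checked from the summary statistics in Table 2. The sketch below computes a one-sample t statistic that treats ChatGPT's 77.8% as a fixed reference value. This is an assumption made for illustration, since the text does not spell out exactly how that comparison was run, and the function name `one_sample_t` is hypothetical. Converting the statistic into a p-value would additionally need a t distribution, which is left out here to keep the example free of third-party dependencies.

```python
import math


def one_sample_t(group_mean, group_sd, n, reference):
    """Return (t statistic, degrees of freedom) for H0: population mean == reference."""
    standard_error = group_sd / math.sqrt(n)
    return (group_mean - reference) / standard_error, n - 1


CHATGPT_SCORE = 77.8  # overall proportion of correct answers (35/45), in percent

# First-attempt summary statistics from Table 2: (mean %, SD %, n).
groups = {
    "junior": (50.8, 7.5, 8),
    "senior": (65.3, 5.9, 10),
}

results = {}
for name, (m, sd, n) in groups.items():
    t, df = one_sample_t(m, sd, n, CHATGPT_SCORE)
    results[name] = (t, df)
    print(f"{name}: t = {t:.2f}, df = {df}")
```

Both statistics are large in magnitude for their degrees of freedom, which is in line with the reported p < 0.001 for each group, although the authors' exact procedure in R may have differed.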

To the best of our knowledge, this is the first study to assess the performance of ChatGPT in supporting physician decision-making in nephrology. Previous studies have evaluated ChatGPT's performance on nephrology-related exams. In the Polish national nephrology specialty exam, GPT-3.5 consistently failed, while GPT-4 passed 11 out of 13 attempts [10]. Similarly, in Japanese nephrology self-assessment questions, GPT-3.5 and Google Bard never passed, but GPT-4 did so in 3 out of 5 attempts [11]. GPT-4 also achieved the highest score (73.3%) in the American Society of Nephrology questions among seven large language models [12]. These results indicate GPT-4's strong performance in nephrology exams. However, prior research has not explored the implications of physicians using GPT-4. To better assess these technologies in clinical settings, it is crucial to consider scenarios where physicians utilize them [16, 17]. Thus, our study evaluated the effectiveness of the ChatGPT GPT-4 model in assisting physicians with their answers.

ChatGPT showed a significantly higher proportion of correct answers than junior and senior physicians. While previous studies found GPT-4's performance comparable to or slightly better than that of junior physicians or residents [11, 13, 26, 27], it rarely outperformed board-certified specialists. For instance, in Israel's official medical board residency examinations, GPT-4 performed comparably to residents, meeting the passing standards in four specialties [26]. In the American Society of Nephrology's questions, GPT-4's score was below the passing threshold and the mean score of nephrology examinees [13]. Additionally, in Japanese nephrology exams, GPT-4 scored lower than fourth-year residents [11]. These findings suggest that GPT-4's higher performance than that of specialists in this study is debatable. The questions from the U.S.
academic journal may have favored GPT-4 because it was trained mainly on English-language datasets, potentially disadvantaging Japanese physicians. Therefore, the observed performance differences may not reflect clinical knowledge or skills and require careful interpretation.

ChatGPT significantly improved the proportion of correct answers for physicians, especially less experienced junior physicians. Improvement in physician performance with AI assistance has been reported in other medical subfields. AI-assisted radiological reading has enhanced radiologists' performance in detecting lung cancer on chest X-rays [28]. In the imaging diagnosis of lung adenocarcinoma on CT scans, the diagnostic accuracy of less experienced radiologists improved when using AI with 3D-CNN [29]. A systematic review demonstrated that AI assistance improved physicians' diagnostic performance in skin cancer, benefiting non-dermatologists, such as primary care providers, more than dermatologists [30]. There is also a report that ChatGPT's advice improved the accuracy of physicians' decision-making in chest pain evaluation scenarios [31]. Our finding that ChatGPT effectively supported nephrology physicians aligns with previous studies, highlighting its potential not just as a clinical tool but also as an educational aid. For junior physicians, who may benefit from additional guidance, ChatGPT could serve as a valuable tool, helping to bridge the gap between theoretical knowledge and practical application in clinical settings.

However, in the category of immunosuppression, ChatGPT had a low proportion of correct answers and, although the change was not statistically significant, decreased physicians' proportion of correct answers. This indicates that using ChatGPT could have severe negative impacts on clinical practice. Recent studies have reported variability in ChatGPT's proportion of correct answers depending on the clinical category [11–13].
GPT-3.5 performed poorly in electrolyte and acid-base disorders, glomerular diseases, and kidney-related bone and stone disorders, and GPT-4 had the lowest proportion of correct answers in electrolyte and acid-base disorders [12, 13]. The low score of ChatGPT in immunosuppression in this study might be due to a lack of information on immunosuppressive therapy in the model's training data. However, the details of the training data and model from OpenAI are not publicly available, making strict evaluation difficult. The low proportion of correct answers for immunosuppression questions among physicians compared to other categories suggests that these questions might have been more difficult. Since there is no reliable method to identify the specific categories in which ChatGPT may provide inaccurate information, physicians may struggle to discern which output can be trusted and which should be treated with caution. In any case, the decrease in physicians' proportion of correct answers with ChatGPT usage implies that physicians cannot fully distinguish between accurate and inaccurate information output by ChatGPT. This means that reliance on ChatGPT's information could lead to incorrect decision-making. It has already been reported that physicians' performance can decline when AI tools provide incorrect recommendations [32, 33]. In a study using 20 clinical cases, GPT-4 was reported to show incorrect clinical reasoning more frequently than residents [34]. Considering the potential negative impact of ChatGPT's inaccuracies on clinical decision-making, the practical application of ChatGPT in clinical settings requires careful consideration, adhering to the fundamental medical principle of "do no harm" to patients.

This study had three limitations. First, ChatGPT's output was generated with a single simple prompt, and the reproducibility of its performance is not fully guaranteed.
As a language model, ChatGPT's performance can fluctuate depending on the prompt, and it may generate different outputs each time. Second, the questions used in this study covered only some categories of nephrology. Nephrology includes many clinical categories not covered by the questions used in this study, such as acute kidney injury, chronic kidney disease, and renal replacement therapy. Therefore, this study could not comprehensively evaluate the performance of ChatGPT and physicians in nephrology. Third, the external validity of the physicians' proportion of correct answers is uncertain. The results were obtained from a single institution with a small number of participants, and the proportion of correct answers may not represent nephrologists as a whole and could include various biases. Considering these limitations, future studies should evaluate ChatGPT's decision-support performance for physicians using a diverse set of questions covering all areas of nephrology in large-scale, multi-institutional settings.

CONCLUSION

In this study, we evaluated ChatGPT's performance in supporting physicians in nephrology. ChatGPT significantly improved the proportion of correct answers in written nephrology exams for physicians, particularly for less experienced junior physicians, although it decreased the proportion of correct answers in a specific category. Further large-scale investigations are necessary to assess the effectiveness of large language models in clinical practice.

Declarations

ACKNOWLEDGEMENTS
We would like to express our gratitude to the members of the Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine for their cooperation in answering questions related to this study.

AUTHORS’ CONTRIBUTIONS
R.N. designed the research plan. R.N. and K.T. analyzed the data. R.N., K.T., D.I., and Y.S. participated in writing the paper and approved the final manuscript.

FUNDING
We received no financial support for this study.

DATA AVAILABILITY
The datasets analyzed during this study are available from the corresponding author on reasonable request.

ETHICS APPROVAL
We consulted with the Representative of the Ethics Committee Members at St. Marianna University Hospital. After careful review, it was determined that the study did not involve patients and was based on the voluntary participation of our medical colleagues, and it was concluded that Institutional Review Board approval was not indicated or required for this study.

CONFLICT OF INTEREST STATEMENT
All authors declare no conflict of interest.

CONSENT FOR PUBLICATION
All participants involved in this study provided informed consent for participation and publication.

References

1. Thirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 2023;29:1930–40.
2. Miao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia OA, Qureshi F, Cheungpasitporn W. Innovating Personalized Nephrology Care: Exploring the Potential Utilization of ChatGPT. J Pers Med. 2023;13:1681.
3. Miao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia OA, Cheungpasitporn W. Integrating Retrieval-Augmented Generation with Large Language Models in Nephrology: Advancing Practical Applications. Medicina (Kaunas). 2024;60:445.
4. Singhal K, Tu T, Gottweis J, Sayres R, Wulczyn E, Hou L, et al. Towards Expert-Level Medical Question Answering with Large Language Models. arXiv e-prints. 2023. https://doi.org/10.48550/arXiv.2305.09617
5. Tu T, Palepu A, Schaekermann M, Saab K, Freyberg J, Tanno R, et al. Towards Conversational Diagnostic AI. arXiv e-prints. 2024. https://doi.org/10.48550/arXiv.2401.05654
6. Van Veen D, Van Uden C, Blankemeier L, Delbrouck J-B, Aali A, Bluethgen C, et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat Med. 2024;30:1134–42.
7. Tu T, Azizi S, Driess D, Schaekermann M, Amin M, Chang P-C, et al. Towards Generalist Biomedical AI. NEJM AI. 2024;1:AIoa2300138.
8. Nori H, King N, McKinney SM, Carignan D, Horvitz E. Capabilities of GPT-4 on Medical Challenge Problems. arXiv e-prints. 2023. https://doi.org/10.48550/arXiv.2303.13375
9. Kasai J, Kasai Y, Sakaguchi K, Yamada Y, Radev D. Evaluating GPT-4 and ChatGPT on Japanese Medical Licensing Examinations. arXiv e-prints. 2023. https://doi.org/10.48550/arXiv.2303.18027
10. Nicikowski J, Szczepański M, Miedziaszczyk M, Kudliński B. The potential of ChatGPT in medicine: an example analysis of nephrology specialty exams in Poland. Clin Kidney J. 2024;17:sfae193.
11. Noda R, Izaki Y, Kitano F, Komatsu J, Ichikawa D, Shibagaki Y. Performance of ChatGPT and Bard in Self-Assessment Questions for Nephrology Board Renewal. Clin Exp Nephrol. 2024;28:465–9.
12. Wu S, Koo M, Blum L, Black A, Kao L, Fei Z, et al. Benchmarking Open-Source Large Language Models, GPT-4 and Claude 2 on Multiple-Choice Questions in Nephrology. NEJM AI. 2024;0:AIdbp2300092.
13. Miao J, Thongprayoon C, Garcia Valencia OA, Krisanapan P, Sheikh MS, Davis PW, et al. Performance of ChatGPT on Nephrology Test Questions. Clin J Am Soc Nephrol. 2023;19:35–43.
14. ChatGPT. https://openai.com/chatgpt/. Accessed 9 Jul 2024.
15. Huang L, Yu W, Ma W, Zhong W, Feng Z, Wang H, et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv e-prints. 2023. https://doi.org/10.48550/arXiv.2311.05232
16. Lee P, Bubeck S, Petro J. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. N Engl J Med. 2023;388:1233–9.
17. Yu K-H, Healey E, Leong T-Y, Kohane IS, Manrai AK. Medical Artificial Intelligence and Human Values. N Engl J Med. 2024;390:1895–904.
18. MacLaughlin HL, Friedman AN, Ikizler TA. Nutrition in Kidney Disease: Core Curriculum 2022. Am J Kidney Dis. 2022;79:437–49.
19. Sanghavi SF, Freidin N, Swenson ER. Concomitant Lung and Kidney Disorders in Critically Ill Patients: Core Curriculum 2022. Am J Kidney Dis. 2022;79:601–12.
20. Mullins ME, Kraut JA. The Role of the Nephrologist in Management of Poisoning and Intoxication: Core Curriculum 2022. Am J Kidney Dis. 2022;79:877–89.
21. Novak JE, Ellison DH. Diuretics in States of Volume Overload: Core Curriculum 2022. Am J Kidney Dis. 2022;80:264–76.
22. Kant S, Kronbichler A, Geetha D. Principles of Immunosuppression in the Management of Kidney Disease: Core Curriculum 2022. Am J Kidney Dis. 2022;80:393–405.
23. Do C, Vasquez PC, Soleimani M. Metabolic Alkalosis Pathogenesis, Diagnosis, and Treatment: Core Curriculum 2022. Am J Kidney Dis. 2022;80:536–51.
24. Auguste BL, Bargman JM. Peritoneal Dialysis Prescription and Adequacy in Clinical Practice: Core Curriculum 2023. Am J Kidney Dis. 2023;81:100–9.
25. Cervantes CE, Bloch EM, Sperati CJ. Therapeutic Plasma Exchange: Core Curriculum 2023. Am J Kidney Dis. 2023;81:475–92.
26. Katz U, Cohen E, Shachar E, Somer J, Fink A, Morse E, et al. GPT versus Resident Physicians: A Benchmark Based on Official Board Scores. NEJM AI. 2024;1:AIdbp2300192.
27. Miao J, Thongprayoon C, Cheungpasitporn W, Cornell LD. Performance of GPT-4 Vision on kidney pathology exam questions. Am J Clin Pathol. 2024;Apr 3:aqae030. Epub ahead of print.
28. Lee JH, Hong H, Nam G, Hwang EJ, Park CM. Effect of Human-AI Interaction on Detection of Malignant Lung Nodules on Chest Radiographs. Radiology. 2023;307:e222976.
29. Yanagawa M, Niioka H, Kusumoto M, Awai K, Tsubamoto M, Satoh Y, et al. Diagnostic performance for pulmonary adenocarcinoma on CT: comparison of radiologists with and without three-dimensional convolutional neural network. Eur Radiol. 2021;31:1978–86.
30. Krakowski I, Kim J, Cai ZR, Daneshjou R, Lapins J, Eriksson H, et al. Human-AI interaction in skin cancer diagnosis: a systematic review and meta-analysis. npj Digit Med. 2024;7:1–10.
31. Goh E, Bunning B, Khoong E, Gallo R, Milstein A, Centola D, et al. ChatGPT Influence on Medical Decision-Making, Bias, and Equity: A Randomized Study of Clinicians Evaluating Clinical Vignettes. medRxiv e-prints. 2023. https://doi.org/10.1101/2023.11.24.23298844
32. Tschandl P, Rinner C, Apalla Z, Argenziano G, Codella N, Halpern A, et al. Human–computer collaboration for skin cancer recognition. Nat Med. 2020;26:1229–34.
33. Han SS, Kim YJ, Moon IJ, Jung JM, Lee MY, Lee WJ, et al. Evaluation of Artificial Intelligence–Assisted Diagnosis of Skin Neoplasms: A Single-Center, Paralleled, Unmasked, Randomized Controlled Trial. J Invest Dermatol. 2022;142:2353–2362.e2.
34. Cabral S, Restrepo D, Kanjee Z, Wilson P, Crowe B, Abdulnour R-E, et al. Clinical Reasoning of a Generative Artificial Intelligence Model Compared With Physicians. JAMA Intern Med. 2024;184:581.

Additional Declarations
No competing interests reported.

Author affiliations
Ryunosuke Noda (corresponding author), Kenichiro Tanabe, Daisuke Ichikawa, Yugo Shibagaki: St. Marianna University School of Medicine, Miyamae-Ku

Published version: https://doi.org/10.1038/s41598-025-99774-3

Figure legends
Figure 1. Flow diagram of nephrology multiple-choice question selection.
Figure 2. The proportion of correct answers of junior and senior physicians before and after using ChatGPT. Assuming the proportion of correct answers followed a normal distribution, we described them using means ± standard deviations.
Figure 3. The improvement of individual physicians' proportion of correct answers after using ChatGPT.
Figure 4. The proportion of correct answers of physicians before and after ChatGPT assistance by question category.
Questions","fulltext":[{"header":"INTRODUCTION","content":"\u003cp\u003eRapid technological advancements and breakthroughs in artificial intelligence (AI), particularly large language models (LLMs), have raised great expectations for their application in various fields of medicine, including nephrology [\u003cspan additionalcitationids=\"CR2\" citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. LLMs are capable of generating human-like text and have demonstrated efficacy in a wide range of medical tasks, such as medical question answering [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e], diagnostic dialogue [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e], clinical text summarization [\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e], and medical image interpretation [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. The score in medical written examinations, which is easily quantifiable, has been primarily used to evaluate the performance of LLMs in medical fields [\u003cspan additionalcitationids=\"CR9 CR10 CR11 CR12\" citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eChatGPT, developed by OpenAI, is one of the most prominent applications based on LLMs [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. 
Among its various models, GPT-4, despite being a general-purpose system not designed for medicine, has performed remarkably in general medical examinations, meeting the passing standards of the United States medical licensing examinations [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] and the Japanese national medical examinations [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Recently, GPT-4 has been tested in more advanced medical subspecialties, such as nephrology. GPT-4 has shown favorable results in nephrology, achieving passing standards in the Polish nephrology specialty examinations [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] and partially meeting the renewal requirements for the Japanese nephrology specialty certification [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e], indicating its potential for future clinical applications.\u003c/p\u003e \u003cp\u003eAlthough ChatGPT has demonstrated excellent performance in various medical fields, including nephrology, challenges remain for its implementation in clinical practice. Technical issues, such as hallucinations, where fabricated or incorrect information is generated, and the risk of providing responses that do not align with human ethics and values, could negatively impact medical practice [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e, \u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]. To mitigate these risks, it is essential to recognize that clinical decision-making should not be entrusted solely to ChatGPT without the involvement of physicians [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. 
Instead, its role should be to support physicians by providing an additional perspective that can be compared and contrasted with the physician's clinical judgment. This approach may allow physicians to maintain responsibility for the final decision while leveraging the potential benefits of ChatGPT in enhancing decision-making accuracy and efficiency. Despite many studies on ChatGPT's performance in medical tasks, research on its ability to support physicians in decision-making remains limited. Consequently, this study investigated whether physicians' scores on nephrology multiple-choice questions improve after they consult ChatGPT's output, and examined the utility of, and concerns about, ChatGPT in supporting physicians' decision-making.\u003c/p\u003e"},{"header":"MATERIALS AND METHODS","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eNephrology Multiple-Choice Questions\u003c/h2\u003e \u003cp\u003eFor this study, we used multiple-choice questions from the Core Curriculum in Nephrology articles published in the American Journal of Kidney Diseases. These questions were selected because they aim to provide core knowledge necessary for nephrology practice and are widely accepted by nephrologists worldwide. Each question was formatted as a clinical scenario requiring selection of the correct answer from multiple options, with each article containing questions from a specific nephrology category. We selected eight articles published after October 2021 to avoid data leakage, because the ChatGPT GPT-4 model used in this study was trained on internet data up to September 2021 [\u003cspan additionalcitationids=\"CR19 CR20 CR21 CR22 CR23 CR24\" citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e]. 
We excluded two questions requiring multiple answers, resulting in 45 single-answer multiple-choice questions for this study (Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The question categories included nutrition (5 questions), lung and kidney disorders (9 questions), intoxication (6 questions), diuretics (6 questions), immunosuppression (3 questions), metabolic alkalosis (5 questions), peritoneal dialysis (5 questions), and plasma exchange (6 questions) (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The correct answers were those provided in the articles\u0026mdash;no questions involved images or charts. We created two PDF files containing only the questions, one in the original English and the other translated into Japanese for the participating physicians.\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eCategories and articles of the questions\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTitle\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eArticle\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCategory\u003c/p\u003e \u003c/th\u003e \u003cth 
align=\"left\" colname=\"c4\"\u003e \u003cp\u003eNumber of questions\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNutrition in Kidney Disease: Core Curriculum 2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAm J Kidney Dis. 2021; 79(3):437\u0026ndash;449.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eNutrition\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eConcomitant Lung and Kidney Disorders in Critically Ill Patients: Core Curriculum 2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAm J Kidney Dis. 2021; 79(4):601\u0026ndash;612.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eLung and kidney disorders\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eThe Role of the Nephrologist in Management of Poisoning and Intoxication: Core Curriculum 2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAm J Kidney Dis. 
2021; 79(6):877\u0026ndash;889.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eIntoxication\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDiuretics in States of Volume Overload: Core Curriculum 2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAm J Kidney Dis. 2022; 80(2):264\u0026ndash;276.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eDiuretics\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePrinciples of Immunosuppression in the Management of Kidney Disease: Core Curriculum 2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAm J Kidney Dis. 2022; 80(3):393\u0026ndash;405.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eImmunosuppression\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMetabolic Alkalosis Pathogenesis, Diagnosis, and Treatment: Core Curriculum 2022\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAm J Kidney Dis. 
2022; 80(4):536\u0026ndash;551.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eMetabolic alkalosis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePeritoneal Dialysis Prescription and Adequacy in Clinical Practice: Core Curriculum 2023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAm J Kidney Dis. 2022; 81(1):100\u0026ndash;109.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePeritoneal dialysis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e5\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eTherapeutic Plasma Exchange: Core Curriculum 2023\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAm J Kidney Dis. 2023; 81(4):475\u0026ndash;492.\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003ePlasma exchange\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003eChatGPT\u003c/h2\u003e \u003cp\u003eWe used the ChatGPT GPT-4 model (May 24 version). All prompts were entered in English on June 23\u0026ndash;24, 2023. The initial prompt stated, \u0026ldquo;I am now going to give you questions about nephrology. Please answer the questions in the following format. Answer: Explanation:\u0026rdquo; followed by one question. We refreshed the chat session for each question, repeating this process for all 45 questions. 
We confirmed that the output followed the answer and explanation format and calculated ChatGPT's scores based on the provided answers. For the participating physicians, we compiled ChatGPT's outputs into two PDF files, one in the original English and the other translated into Japanese.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eParticipating Physicians\u003c/h2\u003e \u003cp\u003e Eighteen physicians in the Division of Nephrology and Hypertension, Department of Internal Medicine, at St. Marianna University Hospital, volunteered to participate in this study. All physicians were Japanese, resided in Japan, and were non-native English speakers. The participants were divided into two groups: eight junior physicians without board certification in nephrology and ten senior physicians with board certification. The participants were initially provided with two PDF files, one with the original English questions and the other with the Japanese translations, and they answered the questions twice. During the first attempt, they answered independently. In the second attempt, they referred to two additional PDF files containing ChatGPT's output in both English and Japanese and revised their answers if necessary. The source of the questions was not disclosed, and participants were prohibited from using the internet or literature to research the questions and from consulting with others while answering.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec6\" class=\"Section2\"\u003e \u003ch2\u003eData Analysis\u003c/h2\u003e \u003cp\u003eWe calculated the proportion of correct answers for all the questions and each question category for ChatGPT, the junior physician group, and the senior physician group. Assuming the proportion of correct answers followed a normal distribution, we described them using means and standard deviations. We statistically compared the differences in the proportion of correct answers. 
Differences in the proportion of correct answers between groups were analyzed using independent two-sample t-tests, while differences between the first and second attempts were analyzed using paired t-tests. Statistical significance was set at p\u0026thinsp;\u0026lt;\u0026thinsp;0.05. Statistical analyses were performed using R Version 4.2.3.\u003c/p\u003e \u003c/div\u003e"},{"header":"RESULTS","content":"\u003cp\u003eNo errors occurred in ChatGPT's output, and all outputs adhered to the specified answer and explanation format. ChatGPT's proportion of correct answers for all the questions was 77.8% (35/45). By question category, the proportion of correct answers was highest for nutrition and peritoneal dialysis, both at 100% (5/5), and lowest for immunosuppression at 33.3% (1/3).\u003c/p\u003e \u003cp\u003eBefore using ChatGPT, the proportion of correct answers for physicians was 50.8\u0026thinsp;\u0026plusmn;\u0026thinsp;7.5% for the junior physician group and 65.3\u0026thinsp;\u0026plusmn;\u0026thinsp;5.9% for the senior physician group. ChatGPT's proportion of correct answers was significantly higher than that of the junior physician group (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001) and the senior physician group (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). After using ChatGPT, the proportion of correct answers increased significantly for both groups to 72.2\u0026thinsp;\u0026plusmn;\u0026thinsp;4.6% for the junior physician group and 77.1\u0026thinsp;\u0026plusmn;\u0026thinsp;4.2% for the senior physician group (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001 for both) (Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). Notably, the improvement in the proportion of correct answers was significantly larger in the junior physician group than in the senior physician group (p\u0026thinsp;=\u0026thinsp;0.015). 
ChatGPT usage improved the proportion of correct answers for 17 of the 18 participating physicians, with one senior physician experiencing a decrease (Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e \u003cp\u003eBy question category, significant improvements in the proportion of correct answers were observed in the junior physician group for all categories except immunosuppression. In contrast, in the senior physician group, significant improvements were observed in four categories (nutrition, lung and kidney disorders, diuretics, and plasma exchange) but not in immunosuppression, intoxication, metabolic alkalosis, or peritoneal dialysis (Table\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e). In both groups, the proportion of correct answers decreased only in the immunosuppression category and increased in all other categories (Fig.\u0026nbsp;\u003cspan refid=\"Fig4\" class=\"InternalRef\"\u003e4\u003c/span\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eThe proportion of correct answers of ChatGPT, junior physicians, and senior physicians\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"8\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" 
colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c8\" colnum=\"8\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eQuestion category\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colspan=\"7\" nameend=\"c8\" namest=\"c2\"\u003e \u003cp\u003eProportion of correct answers\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003eChatGPT\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c4\" namest=\"c3\"\u003e \u003cp\u003eJunior physicians (n\u0026thinsp;=\u0026thinsp;8)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003ep-value\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colspan=\"2\" nameend=\"c7\" namest=\"c6\"\u003e \u003cp\u003eSenior physicians (n\u0026thinsp;=\u0026thinsp;10)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003ep-value\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e1st attempt\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e2nd attempt\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e1st vs 2nd\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e1st attempt\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e2nd attempt\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e 
\u003cp\u003e1st vs 2nd\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eOverall\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e77.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e50.8\u0026thinsp;\u0026plusmn;\u0026thinsp;7.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e72.2\u0026thinsp;\u0026plusmn;\u0026thinsp;4.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e65.3\u0026thinsp;\u0026plusmn;\u0026thinsp;5.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e77.1\u0026thinsp;\u0026plusmn;\u0026thinsp;4.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e\u0026lt;\u0026thinsp;0.001\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eNutrition\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e100.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e55.0\u0026thinsp;\u0026plusmn;\u0026thinsp;17.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e85.0\u0026thinsp;\u0026plusmn;\u0026thinsp;20.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.014\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e70.0\u0026thinsp;\u0026plusmn;\u0026thinsp;19.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e88.0\u0026thinsp;\u0026plusmn;\u0026thinsp;16.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.004\u003c/p\u003e 
\u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLung and kidney disorders\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e55.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e38.9\u0026thinsp;\u0026plusmn;\u0026thinsp;10.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e56.9\u0026thinsp;\u0026plusmn;\u0026thinsp;12.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.002\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e58.9\u0026thinsp;\u0026plusmn;\u0026thinsp;13.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e74.4\u0026thinsp;\u0026plusmn;\u0026thinsp;10.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.025\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eIntoxication\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e50.0\u0026thinsp;\u0026plusmn;\u0026thinsp;21.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e81.3\u0026thinsp;\u0026plusmn;\u0026thinsp;5.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.006\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e71.7\u0026thinsp;\u0026plusmn;\u0026thinsp;19.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e80.0\u0026thinsp;\u0026plusmn;\u0026thinsp;7.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.138\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd 
align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDiuretics\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e83.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e50.0\u0026thinsp;\u0026plusmn;\u0026thinsp;15.4%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e70.8\u0026thinsp;\u0026plusmn;\u0026thinsp;11.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.005\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e58.3\u0026thinsp;\u0026plusmn;\u0026thinsp;11.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e75.0\u0026thinsp;\u0026plusmn;\u0026thinsp;14.2%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.008\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eImmunosuppression\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e33.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e45.8\u0026thinsp;\u0026plusmn;\u0026thinsp;17.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e37.5\u0026thinsp;\u0026plusmn;\u0026thinsp;11.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.351\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e60.0\u0026thinsp;\u0026plusmn;\u0026thinsp;21.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e43.3\u0026thinsp;\u0026plusmn;\u0026thinsp;16.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.052\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMetabolic 
alkalosis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e80.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e52.5\u0026thinsp;\u0026plusmn;\u0026thinsp;23.8%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e75.0\u0026thinsp;\u0026plusmn;\u0026thinsp;9.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.015\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e68.0\u0026thinsp;\u0026plusmn;\u0026thinsp;25.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e82.0\u0026thinsp;\u0026plusmn;\u0026thinsp;6.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.089\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePeritoneal dialysis\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c2\"\u003e \u003cp\u003e100.0%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e72.5\u0026thinsp;\u0026plusmn;\u0026thinsp;14.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e90.0\u0026thinsp;\u0026plusmn;\u0026thinsp;15.1%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.021\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e74.0\u0026thinsp;\u0026plusmn;\u0026thinsp;13.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e78.0\u0026thinsp;\u0026plusmn;\u0026thinsp;17.5%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.168\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePlasma exchange\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" 
colname=\"c2\"\u003e \u003cp\u003e83.3%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e50.0\u0026thinsp;\u0026plusmn;\u0026thinsp;26.7%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003e77.1\u0026thinsp;\u0026plusmn;\u0026thinsp;8.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c5\"\u003e \u003cp\u003e0.014\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e65.0\u0026thinsp;\u0026plusmn;\u0026thinsp;14.6%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c7\"\u003e \u003cp\u003e83.3\u0026thinsp;\u0026plusmn;\u0026thinsp;7.9%\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c8\"\u003e \u003cp\u003e0.003\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e \u003ctr\u003e\u003ctd colspan=\"8\"\u003eAssuming the proportion of correct answers followed a normal distribution, we described them using means\u0026thinsp;\u0026plusmn;\u0026thinsp;standard deviations.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003e \u003c/p\u003e"},{"header":"DISCUSSION","content":"\u003cp\u003eWe evaluated the performance of the ChatGPT GPT-4 model in assisting physicians with nephrology multiple-choice questions. ChatGPT significantly improved the proportion of correct answers for both junior and senior physicians, with a significantly larger improvement among junior physicians. However, in immunosuppression, the category in which ChatGPT's own proportion of correct answers was lowest, physicians' proportion of correct answers decreased after ChatGPT assistance. To the best of our knowledge, this is the first study to assess the performance of ChatGPT in supporting physician decision-making in nephrology.\u003c/p\u003e \u003cp\u003ePrevious studies have evaluated ChatGPT's performance on nephrology-related exams. 
In the Polish national nephrology specialty exam, GPT-3.5 consistently failed, while GPT-4 passed 11 out of 13 attempts [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e]. Similarly, in Japanese nephrology self-assessment questions, GPT-3.5 and Google Bard never passed, but GPT-4 did so in 3 out of 5 attempts [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. GPT-4 also achieved the highest score (73.3%) in the American Society of Nephrology questions among seven large language models [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. These results indicate GPT-4's strong performance in nephrology exams. However, prior research has not explored the implications of physicians using GPT-4. To better assess these technologies in clinical settings, it is crucial to consider scenarios where physicians utilize them [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e, \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e]. Thus, our study evaluated ChatGPT GPT-4 model's effectiveness in assisting physicians with their answers.\u003c/p\u003e \u003cp\u003eChatGPT showed a significantly higher proportion of correct answers than junior and senior physicians. While previous studies found GPT-4's performance comparable to or slightly better than junior physicians or residents [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e, \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], it rarely outperformed board-certified specialists. 
For instance, in Israel's official medical board residency examinations, GPT-4 performed comparably to residents, meeting the passing standards in four specialties [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e]. In the American Society of Nephrology's questions, GPT-4's score was below both the passing threshold and the mean score of nephrology examinees [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. Additionally, in Japanese nephrology exams, GPT-4 scored lower than fourth-year residents [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]. Against this background, GPT-4's higher performance than specialists in this study is open to debate. The questions, taken from a U.S. academic journal, may have favored GPT-4 because it was trained mainly on English-language datasets, potentially disadvantaging Japanese physicians. Therefore, the observed performance differences may not reflect clinical knowledge or skills and require careful interpretation.\u003c/p\u003e \u003cp\u003eChatGPT significantly improved the proportion of correct answers for physicians, especially less experienced junior physicians. Improvement in physician performance with AI assistance has been reported in other medical subfields. AI-assisted radiological reading has enhanced radiologists' performance in detecting lung cancer on chest X-rays [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e]. In the imaging diagnosis of lung adenocarcinoma on CT scans, the diagnostic accuracy of less experienced radiologists improved when they used AI based on a three-dimensional convolutional neural network (3D-CNN) [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. 
A systematic review demonstrated that AI assistance improved physicians' diagnostic performance in skin cancer, benefiting non-dermatologists, such as primary care providers, more than dermatologists [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e]. ChatGPT's advice has also been reported to improve the accuracy of physicians' decision-making in chest pain evaluation scenarios [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e]. Our finding that ChatGPT effectively supported nephrology physicians aligns with previous studies, highlighting its potential not just as a clinical tool, but also as an educational aid. For junior physicians, who may benefit from additional guidance, ChatGPT could serve as a valuable tool, helping to bridge the gap between theoretical knowledge and practical application in clinical settings.\u003c/p\u003e \u003cp\u003eHowever, in the category of immunosuppression, ChatGPT had a low proportion of correct answers and decreased physicians' proportion of correct answers, although the decrease was not statistically significant. This indicates that using ChatGPT could have severe negative impacts on clinical practice. Recent studies have reported variability in ChatGPT's proportion of correct answers depending on the clinical category [\u003cspan additionalcitationids=\"CR12\" citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. GPT-3.5 performed poorly in electrolyte and acid-base disorders, glomerular diseases, and kidney-related bone and stone disorders, and GPT-4 had the lowest proportion of correct answers in electrolyte and acid-base disorders [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]. 
The low score of ChatGPT in immunosuppression in this study might be due to a lack of information on immunosuppressive therapy in the model's training data. However, the details of the training data and model from OpenAI are not publicly available, making strict evaluation difficult. The low proportion of correct answers for immunosuppression questions among physicians compared to other categories suggests that these questions may have been more difficult. Since there is no reliable method to identify the specific categories in which ChatGPT may provide inaccurate information, physicians may struggle to discern which output can be trusted or should be treated with caution. In any case, the decrease in physicians' proportion of correct answers with ChatGPT usage implies that physicians cannot fully distinguish between accurate and inaccurate information output by ChatGPT. This means that reliance on ChatGPT's information could lead to incorrect decision-making. It has already been reported that physicians' performance can decline when AI tools provide incorrect recommendations [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e, \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e]. In a study using 20 clinical cases, GPT-4 was reported to show incorrect clinical reasoning more frequently than residents [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e]. Considering the potential negative impact of ChatGPT's inaccuracies on clinical decision-making, the practical application of ChatGPT in clinical settings requires careful consideration, adhering to the fundamental medical principle of \"do no harm\" to patients.\u003c/p\u003e \u003cp\u003eThis study had three limitations. First, ChatGPT's output was generated with a single simple prompt, and the reproducibility of its performance is not guaranteed. 
As a language model, ChatGPT's performance can fluctuate depending on the prompt and may generate different outputs each time. Second, the questions used in this study covered only some categories of nephrology. Nephrology includes many clinical categories not covered by the questions used in this study, such as acute kidney injury, chronic kidney disease, and renal replacement therapy. Therefore, this study could not comprehensively evaluate the performance of ChatGPT and physicians in nephrology. Third, the external validity of the physicians' proportion of correct answers is uncertain. The results were obtained from a single institution with a small number of participants, and the proportion of correct answers may not represent nephrologists as a whole and could include various biases. Considering these limitations, future studies should evaluate ChatGPT's decision-support performance for physicians using a diverse set of questions covering all areas of nephrology in large-scale, multi-institutional settings.\u003c/p\u003e"},{"header":"CONCLUSION","content":"\u003cp\u003eIn this study, we evaluated ChatGPT's performance in supporting physicians in nephrology. ChatGPT significantly improved the proportion of correct answers in written nephrology exams for physicians, particularly for less experienced junior physicians, although it decreased the proportion of correct answers in a specific category. Further large-scale investigations are necessary to assess the effectiveness of large language models in clinical practice.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eACKNOWLEDGEMENTS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe would like to express our gratitude to the members of the Division of Nephrology and Hypertension, Department of Internal Medicine, St. 
Marianna University School of Medicine for their cooperation in answering questions related to this study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAUTHORS\u0026rsquo; CONTRIBUTIONS\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eR.N. designed the research plan. R.N. and K.T. analyzed the data. R.N., K.T., D.I. and Y.S. participated in writing the paper and approved the final manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFUNDING\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe received no financial support for this study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDATA AVAILABILITY\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe datasets analyzed during this study are available from the corresponding author on reasonable request.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eETHICS APPROVAL\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe consulted with the Representative of the Ethics Committee Members at St. Marianna University Hospital. After careful review, it was determined that the study did not involve patients and was based on the voluntary participation of our medical colleagues, and it was concluded that Institutional Review Board approval was neither indicated nor required for this study.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCONFLICT OF INTEREST STATEMENT\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll authors declare no conflict of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCONSENT FOR PUBLICATION\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll participants involved in this study provided informed consent for participation and publication.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eThirunavukarasu AJ, Ting DSJ, Elangovan K, Gutierrez L, Tan TF, Ting DSW. Large language models in medicine. Nat Med. 
2023;29:1930\u0026ndash;40.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia OA, Qureshi F, Cheungpasitporn W. Innovating Personalized Nephrology Care: Exploring the Potential Utilization of ChatGPT. J Pers Med. 2023;13:1681.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiao J, Thongprayoon C, Suppadungsuk S, Garcia Valencia OA, Cheungpasitporn W. Integrating Retrieval-Augmented Generation with Large Language Models in Nephrology: Advancing Practical Applications. Med (Kaunas). 2024;60:445.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSinghal K, Tu T, Gottweis J, Sayres R, Wulczyn E, Hou L, et al. Towards Expert-Level Medical Question Answering with Large Language Models. arXiv e-prints. 2023. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2305.09617\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2305.09617\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTu T, Palepu A, Schaekermann M, Saab K, Freyberg J, Tanno R, et al. Towards Conversational Diagnostic AI. arXiv e-prints. 2024. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2401.05654\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2401.05654\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eVan Veen D, Van Uden C, Blankemeier L, Delbrouck J-B, Aali A, Bluethgen C, et al. Adapted large language models can outperform medical experts in clinical text summarization. Nat Med. 2024;30:1134\u0026ndash;42.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTu T, Azizi S, Driess D, Schaekermann M, Amin M, Chang P-C, et al. Towards Generalist Biomedical AI. NEJM AI. 
2024;1:AIoa2300138.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNori H, King N, McKinney SM, Carignan D, Horvitz E. Capabilities of GPT-4 on Medical Challenge Problems. arXiv e-prints. 2023. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2303.13375\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2303.13375\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKasai J, Kasai Y, Sakaguchi K, Yamada Y, Radev D. Evaluating GPT-4 and ChatGPT on Japanese Medical Licensing Examinations. arXiv e-prints. 2023. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2303.18027\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2303.18027\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNicikowski J, Szczepański M, Miedziaszczyk M, Kudliński B. The potential of ChatGPT in medicine: an example analysis of nephrology specialty exams in Poland. Clin Kidney J. 2024;17:sfae193.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNoda R, Izaki Y, Kitano F, Komatsu J, Ichikawa D, Shibagaki Y. Performance of ChatGPT and Bard in Self-Assessment Questions for Nephrology Board Renewal. Clin Exp Nephrol. 2024;28:465\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWu S, Koo M, Blum L, Black A, Kao L, Fei Z, et al. Benchmarking Open-Source Large Language Models, GPT-4 and Claude 2 on Multiple-Choice Questions in Nephrology. NEJM AI. 2024;0:AIdbp2300092.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiao J, Thongprayoon C, Garcia Valencia OA, Krisanapan P, Sheikh MS, Davis PW, et al. Performance of ChatGPT on Nephrology Test Questions. Clin J Am Soc Nephrol. 
2023;19:35\u0026ndash;43.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eChatGPT. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://openai.com/chatgpt/\u003c/span\u003e\u003cspan address=\"https://openai.com/chatgpt/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 9 Jul 2024.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHuang L, Yu W, Ma W, Zhong W, Feng Z, Wang H et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. arXiv e-prints. 2023. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.48550/arXiv.2311.05232\u003c/span\u003e\u003cspan address=\"10.48550/arXiv.2311.05232\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee P, Bubeck S, Petro J. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. N Engl J Med. 2023;388:1233\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu K-H, Healey E, Leong T-Y, Kohane IS, Manrai AK. Medical Artificial Intelligence and Human Values. N Engl J Med. 2024;390:1895\u0026ndash;904.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMacLaughlin HL, Friedman AN, Ikizler TA. Nutrition in Kidney Disease: Core Curriculum 2022. Am J Kidney Dis. 2022;79:437\u0026ndash;49.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSanghavi SF, Freidin N, Swenson ER. Concomitant Lung and Kidney Disorders in Critically Ill Patients: Core Curriculum 2022. Am J Kidney Dis. 2022;79:601\u0026ndash;12.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMullins ME, Kraut JA. The Role of the Nephrologist in Management of Poisoning and Intoxication: Core Curriculum 2022. Am J Kidney Dis. 
2022;79:877\u0026ndash;89.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNovak JE, Ellison DH. Diuretics in States of Volume Overload: Core Curriculum 2022. Am J Kidney Dis. 2022;80:264\u0026ndash;76.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKant S, Kronbichler A, Geetha D. Principles of Immunosuppression in the Management of Kidney Disease: Core Curriculum 2022. Am J Kidney Dis. 2022;80:393\u0026ndash;405.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDo C, Vasquez PC, Soleimani M. Metabolic Alkalosis Pathogenesis, Diagnosis, and Treatment: Core Curriculum 2022. Am J Kidney Dis. 2022;80:536\u0026ndash;51.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAuguste BL, Bargman JM. Peritoneal Dialysis Prescription and Adequacy in Clinical Practice: Core Curriculum 2023. Am J Kidney Dis. 2023;81:100\u0026ndash;9.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCervantes CE, Bloch EM, Sperati CJ. Therapeutic Plasma Exchange: Core Curriculum 2023. Am J Kidney Dis. 2023;81:475\u0026ndash;92.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKatz U, Cohen E, Shachar E, Somer J, Fink A, Morse E, et al. GPT versus Resident Physicians \u0026mdash; A Benchmark Based on Official Board Scores. NEJM AI. 2024;1:AIdbp2300192.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiao J, Thongprayoon C, Cheungpasitporn W, Cornell LD. Performance of GPT-4 Vision on kidney pathology exam questions. Am J Clin Pathol 2024;Apr 3:aqae030. Epub ahead of print.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLee JH, Hong H, Nam G, Hwang EJ, Park CM. Effect of Human-AI Interaction on Detection of Malignant Lung Nodules on Chest Radiographs. Radiology. 2023;307:e222976.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYanagawa M, Niioka H, Kusumoto M, Awai K, Tsubamoto M, Satoh Y, et al. 
Diagnostic performance for pulmonary adenocarcinoma on CT: comparison of radiologists with and without three-dimensional convolutional neural network. Eur Radiol. 2021;31:1978\u0026ndash;86.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKrakowski I, Kim J, Cai ZR, Daneshjou R, Lapins J, Eriksson H, et al. Human-AI interaction in skin cancer diagnosis: a systematic review and meta-analysis. npj Digit Med. 2024;7:1\u0026ndash;10.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGoh E, Bunning B, Khoong E, Gallo R, Milstein A, Centola D et al. ChatGPT Influence on Medical Decision-Making, Bias, and Equity: A Randomized Study of Clinicians Evaluating Clinical Vignettes. medRxiv e-prints. 2023. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1101/2023.11.24.23298844\u003c/span\u003e\u003cspan address=\"10.1101/2023.11.24.23298844\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTschandl P, Rinner C, Apalla Z, Argenziano G, Codella N, Halpern A, et al. Human\u0026ndash;computer collaboration for skin cancer recognition. Nat Med. 2020;26:1229\u0026ndash;34.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHan SS, Kim YJ, Moon IJ, Jung JM, Lee MY, Lee WJ, et al. Evaluation of Artificial Intelligence\u0026ndash;Assisted Diagnosis of Skin Neoplasms: A Single-Center, Paralleled, Unmasked, Randomized Controlled Trial. J Invest Dermatol. 2022;142:2353\u0026ndash;e23622.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCabral S, Restrepo D, Kanjee Z, Wilson P, Crowe B, Abdulnour R-E, et al. Clinical Reasoning of a Generative Artificial Intelligence Model Compared With Physicians. JAMA Intern Med. 
2024;184:581.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Artificial intelligence, ChatGPT, GPT-4, Large language models, Nephrology, Clinical decision-making","lastPublishedDoi":"10.21203/rs.3.rs-4947755/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4947755/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e \u003cp\u003eChatGPT is a versatile conversational AI capable of performing various tasks, and its potential use in medicine has garnered attention. However, whether ChatGPT can support physicians' decision-making remains unclear. This study evaluated ChatGPT's performance in supporting physicians with answers to nephrology written examinations.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e \u003cp\u003eWe extracted 45 single-answer multiple-choice questions from the Core Curriculum in Nephrology articles published in the American Journal of Kidney Diseases from October 2021 to June 2023. 
Eight junior physicians without board certification and ten senior physicians with board certification, as well as the ChatGPT GPT-4 model, answered these questions. The physicians answered twice: first without ChatGPT's support and then with the opportunity to revise their answers based on ChatGPT's output. We statistically compared the proportion of correct answers before and after using ChatGPT.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e \u003cp\u003eChatGPT had a proportion of correct answers of 77.8%. The mean proportion of correct answers from physicians before using ChatGPT was 50.8% (standard deviation [SD] 7.5) for junior physicians and 65.3% (SD 5.9) for senior physicians. After using ChatGPT, the proportion of correct answers significantly increased to 72.2% (SD 4.6) for junior physicians and 77.1% (SD 4.2) for senior physicians (junior physicians: p\u0026thinsp;\u0026lt;\u0026thinsp;0.001, senior physicians: p\u0026thinsp;\u0026lt;\u0026thinsp;0.001). The improvement of the proportion of correct answers was significantly higher for junior physicians than senior physicians (p\u0026thinsp;=\u0026thinsp;0.015). Both groups decreased the proportion of correct answers in one of the seven clinical categories.\u003c/p\u003e\u003ch2\u003eConclusions\u003c/h2\u003e \u003cp\u003eChatGPT significantly improved the accuracy of physicians' answers in nephrology, especially for less experienced physicians, although it also suggested potential negative impacts in a specific subfield. 
Careful consideration is required regarding using ChatGPT to support physicians' decision-making.\u003c/p\u003e","manuscriptTitle":"ChatGPT's Performance in Supporting Physician Decision-Making in Nephrology Multiple-Choice Questions","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-09-13 16:10:21","doi":"10.21203/rs.3.rs-4947755/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"15d87857-22e8-47a6-91a1-4569a966565f","owner":[],"postedDate":"September 13th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2025-09-11T19:04:52+00:00","versionOfRecord":{"articleIdentity":"rs-4947755","link":"https://doi.org/10.1038/s41598-025-99774-3","journal":{"identity":"scientific-reports","isVorOnly":false,"title":"Scientific Reports"},"publishedOn":"2025-05-02 00:00:00","publishedOnDateReadable":"May 2nd, 2025"},"versionCreatedAt":"2024-09-13 16:10:21","video":"","vorDoi":"10.1038/s41598-025-99774-3","vorDoiUrl":"https://doi.org/10.1038/s41598-025-99774-3","workflowStages":[]},"version":"v1","identity":"rs-4947755","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4947755","identity":"rs-4947755","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
