Performance of Chat Gpt on a Turkish Board of Orthopaedic Surgery Examination

doi:10.21203/rs.3.rs-4637339/v1

2024 · doi:10.21203/rs.3.rs-4637339/v1

preprint OA: closed CC-BY-4.0

Research Article

Süleyman Kaan Öner, Bilgehan Ocak, Yavuz Şahbat, Recep Yasin Kurnaz, Emre Çilingir

This is a preprint; it has not been peer reviewed by a journal. Status: posted (version 1).
https://doi.org/10.21203/rs.3.rs-4637339/v1
This work is licensed under a CC BY 4.0 License.

Abstract

Background
This study aimed to evaluate the performance of ChatGPT on the Turkish Board of Orthopedic Surgery Examination.

Methods
From the written exam questions prepared by TOTEK between 2021 and 2023, questions requiring visual information and canceled questions were excluded, as in comparable studies in the literature; all other questions were included. The questions were divided into 19 categories according to topic.
The questions were also divided into 3 categories according to the type of knowledge assessed: direct recall of information, ability to comment, and ability to use information correctly. The questions were posed separately to ChatGPT 3.5 and ChatGPT 4.0, and all answers were evaluated according to this grouping. Visual questions were not posed to ChatGPT because it could not process images. Only questions that the application answered with the correct choice and explanation were accepted as correct. Questions that ChatGPT answered incorrectly were considered incorrect.

Results
We excluded 300 visual questions in total and posed the remaining 265 multiple-choice questions to ChatGPT. Of the 265 questions, 95 (35%) were answered correctly and 169 (63%) were answered incorrectly; ChatGPT could not answer 1 question. ChatGPT's success rate was particularly high for the infection questions (67%). Table 3 presents the descriptive findings: both artificial intelligence models performed at different levels across topics, but GPT-4 generally performed better.

Conclusion
Our study showed that although ChatGPT could not reach the passing level of the Turkish Orthopedics and Traumatology Proficiency Exam, it achieved a certain level of accuracy. Software such as ChatGPT needs to be developed and studied further before it can be useful to orthopedics and traumatology physicians, for whom the evaluation of radiological images and physical examination is very important.

Keywords: ChatGPT, orthopedics, traumatology, exam, artificial intelligence

Introduction

ChatGPT (Generative Pre-trained Transformer) is an advanced chatbot released by OpenAI (San Francisco, CA) in November 2022 that understands human language and can hold a two-way conversation by means of a large language model.
[1, 2] It can understand most existing languages and respond in the same language. It can provide answers, solve test questions, answer complex questions, and even understand people's emotions and interact with them. [1]

When medical proficiency exam questions were posed to ChatGPT, the application drew on the literature available on the internet and was able to answer the questions correctly. [1, 3] In our study, we aimed to determine the success rate of ChatGPT in a language other than English. We also aimed to compare the success rate of this recently popular artificial intelligence application on the Turkish orthopedics proficiency exam with that of the candidates who took the exam, and to contribute to the literature by determining which types of questions ChatGPT handles more successfully in terms of its interpretive ability.

We posed proficiency exam questions prepared by the Turkish Orthopedics and Traumatology Training Council (TOTEK) to ChatGPT versions 3.5 and 4.0 and compared the results with those of the candidates who took the exam. In addition, we evaluated the application's answers by parameters such as question type and subject to see how questions could be posed to ChatGPT more productively. The aim of our study was to compare the success of the surgeons who took the exam with that of ChatGPT and to understand how we can benefit more from ChatGPT.

Materials and Methods

From the written exam questions prepared by TOTEK between 2021 and 2023, questions requiring visual information and canceled questions were excluded, as in comparable studies in the literature; all other questions were included. The questions were divided into 19 categories according to topic.
The categories were: Lower Extremity Surgery Foot/Ankle (nontrauma); Lower Extremity Surgery Knee/Hip (nontrauma); Infections; Pediatric Orthopedics; Reconstructive Surgery; Sports Injuries/Arthroscopic Surgery; Basic Sciences (TOTEK); Trauma Child Lower Extremity; Trauma Child General; Trauma Child Pelvis/Vertebra; Trauma Child Upper Extremity; Trauma Adult Lower Extremity; Trauma Adult General/Open Fracture; Trauma Adult Pelvis/Acetabulum/Vertebra; Trauma Adult Upper Extremity; Tumors; Upper Extremity Surgery Brachial Plexus/Shoulder/Elbow (nontrauma); Upper Extremity Surgery Hand/Wrist/Forearm (nontrauma); Vertebra Surgery (nontrauma).

The questions were also divided into 3 categories according to the type of knowledge assessed: direct recall of information, ability to comment, and ability to use information correctly. The questions were posed separately to ChatGPT 3.5 and ChatGPT 4.0, and all answers were evaluated according to this grouping.

Visual questions were not posed to ChatGPT because it could not process images. Only questions that the application answered with the correct choice and explanation were accepted as correct. Questions that ChatGPT answered incorrectly were considered incorrect.

The distribution of TOTEK questions between 2021 and 2023 varies considerably in topic and difficulty. The topics with the most questions were Pediatric Orthopedics (14.45%), Basic Sciences (10.27%) and Reconstructive Surgery and Lower Extremity Surgery Knee/Hip (Nontrauma) (6.84%). This distribution reflects the breadth and diversity of the exam's scope. The topics with the highest proportion of knowledge-level questions were "Trauma Adult Upper Extremity" (80.00%), "Infections" (77.80%) and "Trauma Adult Lower Extremity" (54.50%), which shows the importance given to basic knowledge in these areas.
However, the topics in which problem-solving skills were at the forefront were Pediatric Orthopedics (60.50%), Lower Extremity Surgery Knee/Hip (Nontrauma) (66.70%) and Reconstructive Surgery (50.00%), which means that the exam also attaches importance to applied and analytical skills. Of the questions classified by knowledge, comment and problem-solving level, 36.1% (95 questions) were at the knowledge level, 54.8% (144 questions) were at the comment level, and 9.1% (24 questions) were at the problem-solving level. This distribution shows that while TOTEK attaches importance to basic knowledge, it also measures candidates' skills in interpreting their knowledge and solving applied problems. The higher percentage of comment-level questions indicates that the exam places great weight on candidates' understanding and analytical thinking. The findings reflect TOTEK's aim of assessing competencies at various cognitive levels in orthopedics and traumatology training.

Statistical analysis

A one-sample t test was used to examine the difference between the overall success rate of the candidates who took the TOTEK exams and the success rates of the ChatGPT models. The one-sample t test compares the mean of the obtained measurements with a known population mean. An independent-samples t test was used to compare the performance of the two GPT models. A significance level (alpha) of 0.05 was used for the hypothesis tests. All analyses were performed with SPSS v.26.

Results

We excluded 300 visual questions in total and posed the remaining 265 multiple-choice questions to ChatGPT. Of the 265 questions, 95 (35%) were answered correctly and 169 (63%) were answered incorrectly; ChatGPT could not answer 1 question.
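The tallies reported above can be cross-checked with a few lines of Python. This is an illustrative sketch only, not the authors' SPSS workflow: the counts are the ones reported in the text, while the helper names `share` and `one_sample_t` and the `demo_rates` values are hypothetical and introduced here purely for illustration. The second helper shows how a one-sample t statistic against a known population mean is formed.

```python
import math
import statistics

# Counts as reported in the Results paragraph above.
TOTAL = 265
tally = {"correct": 95, "incorrect": 169, "unanswered": 1}


def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean(sample) == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# The three outcomes should account for every question that was asked.
assert sum(tally.values()) == TOTAL

for label, count in tally.items():
    print(f"{label}: {count}/{TOTAL} = {share(count, TOTAL)}%")

# HYPOTHETICAL success rates (%), NOT data from the study, compared with
# the 56% candidate success rate used here as the known population mean.
demo_rates = [30.0, 40.0, 50.0, 60.0]
t_stat, dof = one_sample_t(demo_rates, mu0=56.0)
print(f"t = {t_stat:.3f}, df = {dof}")
```

The exact shares come out as 35.8%, 63.8% and 0.4%, which suggests that the 35% and 63% quoted in the text were truncated rather than rounded. Turning the t statistic into a p value requires the t distribution, which a statistics package such as SPSS provides.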
The candidates who took the exam answered 56% of the questions correctly, whereas GPT-3.5 and GPT-4 fell behind them with correct-answer rates of 37% and 45%, respectively (Table 1).

When we classified the TOTEK exam questions according to the type of knowledge assessed, GPT-4 had a higher percentage of correct answers than GPT-3.5 in every category, although the difference was not statistically significant. The gap widened for comment questions, where ChatGPT 4.0 answered 43% correctly and ChatGPT 3.5 answered 33% (Table 2).

ChatGPT's success rate was higher on some topics than on others, most notably infections (67%). Table 3 presents the descriptive findings: both artificial intelligence models performed at different levels across topics, but GPT-4 generally performed better.

Discussion

When we compared the correct-answer rates of ChatGPT 3.5, ChatGPT 4.0 and the orthopedic specialists who took the TOTEK proficiency exam, the orthopedic specialists were more successful than both ChatGPT versions. The artificial intelligence application not only answered the exam questions but also supplied a supporting literature review and presented its answers logically, together with an explanation. However, these answers are not always accurate or up to date. GPT-4, a more comprehensive version than GPT-3.5, was more successful than GPT-3.5, as expected.

In our study, the percentages of correct answers to the 265 questions were 56%, 45% and 37% for the exam candidates, GPT-4 and GPT-3.5, respectively. In the literature, ChatGPT has been reported to achieve a near-passing score on the United States Medical Licensing Examination (USMLE).
[3] It has also been reported to achieve near-perfect results on the American university admissions exam (SAT) and to pass the graduation exams of various university departments. [4, 5] In the study by Lum [6], which compared orthopedic residents with ChatGPT on orthopedics and traumatology training exam questions, ChatGPT answered 47% of 193 questions correctly; it was more successful on questions testing memory and direct knowledge, and its success decreased on more complex questions requiring comparison, interpretation and application of information. According to the literature, ChatGPT has a lower success rate on exams whose questions are not in English. Similarly, Kaneda et al. [7] reported that ChatGPT's percentage of correct answers decreased by 10% in exams conducted in languages other than English.

The development of more modern and effective diagnostic and treatment methods in medicine, together with the search for fast and effective responses to worldwide diseases such as the COVID-19 pandemic, has pushed physicians to work with artificial intelligence. [2] Deep learning methods can perform advanced medical evaluations, such as interpreting radiological images and making diagnoses from photographs. [8, 9] ChatGPT is a chatbot application that has recently become popular because of its deep learning capabilities. In our study, we wanted to evaluate the knowledge and analytical ability of artificial intelligence in the field of orthopedics using a proficiency exam prepared by TOTEK.
Because ChatGPT draws on web information that is not subject to any control, the application may fall short in distinguishing real, correct and up-to-date information, in acting in accordance with ethical and moral rules, and in guiding people correctly. [11] Similar studies in medical fields other than orthopedics and traumatology have shown that ChatGPT can contribute to the training and exam success of physicians, although the validity of its answers has been reported to be questionable. [11, 12] From an orthopedic perspective, ChatGPT is not a source that can be fully trusted for accurate information. However, with respect to medical diagnosis and treatment, ChatGPT can use the patient history and clinical information to inform patients and physicians about the treatment process. [13] The literature also includes articles with suggestions for how researchers and physicians can use ChatGPT more effectively. [14] Unlike our study and the other studies mentioned above, Klang et al. [15] used ChatGPT to generate questions for medical qualification exams and then evaluated those questions. They concluded that ChatGPT can be used to prepare medical qualification exam questions, provided that the questions are checked by specialist physicians. Taken together, these studies suggest that in the near future exams could be prepared entirely by artificial intelligence and physicians' competencies could be evaluated by it.

Limitations

Because ChatGPT could not recognize or interpret images or integrate them with the question text, questions containing radiological and histological images were excluded from the evaluation, similar to studies in the literature.
When the questions were separated by topic, some topics contained few questions because of the high number of topics, so the statistical evaluation for those topics was based on small samples. Although the desired statistical results were obtained, there is a risk of type II error in the results.

Conclusion

Our study showed that although ChatGPT could not reach the passing level of the Turkish Orthopedics and Traumatology Proficiency Exam, it achieved a certain level of accuracy. This may be explained by the lack of sufficient and accurate clinical data in the web environment that ChatGPT draws on and by problems in question formulation. Software such as ChatGPT needs to be developed and studied further before it can be useful to orthopedics and traumatology physicians, for whom the evaluation of radiological images and physical examination is very important. Nevertheless, physicians' awareness of artificial intelligence, which will be an inevitable part of the coming years, should increase, and following developments in artificial intelligence will become a necessity for physicians.

Declarations

Ethics approval and consent to participate: Not applicable.
Consent for publication: Not applicable.
Availability of data and material: All data generated or analyzed during this study are included in this published article.
Competing interests: Not applicable.
Funding: Not applicable.
Authors' contributions: Süleyman Kaan Öner planned the study. Bilgehan Ocak posed the questions to ChatGPT and was a major contributor in writing the manuscript. Yavuz Şahbat performed the statistical analysis. Recep Yasin Kurnaz and Emre Çilingir examined the data and drew conclusions.
Acknowledgements: Not applicable.

References

1. Brandtzaeg PB, Følstad A. Why people use chatbots. Paper presented at: Internet Science: 4th International Conference, INSCI 2017, Thessaloniki, Proceedings 4; 2017.
2. Howard J. Artificial intelligence: implications for the future of work. Am J Ind Med. 2019;62:917–26.
3. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, Madriaga M, Aggabao R, Diaz-Candido G, Maningo J, Tseng V. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198. doi:10.1371/journal.pdig.0000198. PMID: 36812645; PMCID: PMC9931230.
4. Massey PA, Montgomery C, Zhang AS. Comparison of the performance of the ChatGPT-3.5, ChatGPT-4, and the Orthopedic Resident on the Orthopedic Assessment Examinations. J Am Acad Orthop Surg. 2023;31(23):1173–9. doi:10.5435/JAAOS-D-23-00396. Epub 2023 Sep 4. PMID: 37671415; PMCID: PMC10627532.
5. Terwiesch C. Would chat GPT get a Wharton MBA? New white paper by Christian Terwiesch.
6. Lum ZC. Can Artificial Intelligence Pass the American Board of Orthopedic Surgery Examination? Orthopedic Residents Versus ChatGPT. Clin Orthop Relat Res. 2023;481(8):1623–30. doi:10.1097/CORR.0000000000002704. Epub 2023 May 23. PMID: 37220190; PMCID: PMC10344569.
7. Kaneda Y, Tanimoto T, Ozaki A, Sato T, Takahashi K. Can ChatGPT Pass the 2023 Japanese National Medical Licensing Examination? Preprints. 2023. https://doi.org/10.20944/preprints202303.0191.v1
8. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–118. doi:10.1038/nature21056. Epub 2017 Jan 25. Erratum in: Nature. 2017;546(7660):686. PMID: 28117445; PMCID: PMC8382232.
9. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–118. doi:10.1038/nature21056. Epub 2017 Jan 25. Erratum in: Nature. 2017;546(7660):686. PMID: 28117445; PMCID: PMC8382232.
10. Evolution GPT. https://medium.com/the-techlife/evolution-of-openais-gpt-models-8148e6214ee7 . Available online; accessed March 2023.
11. Revercomb L, Patel AM, Choudhry HS, Filimonov A. Performance of ChatGPT in Otolaryngology knowledge assessment. Am J Otolaryngol. 2023;45(1):104082. doi:10.1016/j.amjoto.2023.104082. Epub ahead of print. PMID: 37862879.
12. Suchman K, Garg S, Trindade AJ. Chat Generative Pretrained Transformer Fails the Multiple-Choice American College of Gastroenterology Self-Assessment Test. Am J Gastroenterol. 2023;118(12):2280–2. doi:10.14309/ajg.0000000000002320. Epub 2023 May 22. PMID: 37212584.
13. Garcia-Vidal C, Sanjuan G, Puerta-Alcalde P, Moreno-García E, Soriano A. Artificial intelligence to support clinical decision-making processes. EBioMedicine. 2019;46:27–9. doi:10.1016/j.ebiom.2019.07.019. Epub 2019 Jul 11. PMID: 31303500; PMCID: PMC6710912.
14. Non LR. All aboard the ChatGPT steamroller: Top 10 ways to make artificial intelligence work for healthcare professionals. Antimicrob Steward Healthc Epidemiol. 2023;3(1):e243. Published 18 Dec 2023. doi:10.1017/ash.2023.512.
15. Klang E, et al. Advantages and pitfalls in utilizing artificial intelligence for crafting medical examinations: a medical education pilot study with GPT-4. BMC Med Educ. 2023;23(1):772. Published 17 Oct 2023. doi:10.1186/s12909-023-04752-w.

Tables

Tables 1 to 3 are available in the Supplementary Files section.

Additional Declarations

No competing interests reported.

Supplementary Files

table1.png
table2.png
table3.png
Author information

Süleyman Kaan Öner: Ağrı Patnos State Hospital, Department of Orthopaedics and Traumatology
Bilgehan Ocak (corresponding author): Kütahya Sağlık Bilimleri Üniversitesi
Yavuz Şahbat: Erzurum City Hospital, Department of Orthopaedics and Traumatology
Recep Yasin Kurnaz: Eskisehir Acıbadem Hospital, Department of Orthopaedics and Traumatology
Emre Çilingir: Kütahya Sağlık Bilimleri Üniversitesi
14:33:58","extension":"png","order_by":2,"title":"","display":"","copyAsset":false,"role":"supplement","size":106049,"visible":true,"origin":"","legend":"","description":"","filename":"table2.png","url":"https://assets-eu.researchsquare.com/files/rs-4637339/v1/df364107f8d9ebec3d3ef4d3.png"},{"id":61877387,"identity":"dd16cf48-2e23-4849-9be6-4d42ff40c26c","added_by":"auto","created_at":"2024-08-06 14:33:59","extension":"png","order_by":3,"title":"","display":"","copyAsset":false,"role":"supplement","size":267630,"visible":true,"origin":"","legend":"","description":"","filename":"table3.png","url":"https://assets-eu.researchsquare.com/files/rs-4637339/v1/bb8fb5a2d826087d4aaa1f3f.png"}],"financialInterests":"No competing interests reported.","formattedTitle":"\u003cp\u003ePerformance of Chat Gpt on a Turkish Board of Orthopaedi̇c Surgery Examination\u003c/p\u003e","fulltext":[{"header":"Introduction","content":"\u003cp\u003eChat GPT (Generative Pre-Trained Transformer) is an advanced chatbot software released by OpenAI (San Francisco, CA) in November 2022 that understands human language and can conduct mutual conversation through the Large Language Model. \u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]\u003c/sup\u003e It can understand most existing languages and make sense from the same language. It can provide answers, solve test questions, answer complex questions, and even understand people's emotions and interact with people\u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]\u003c/sup\u003e.\u003c/p\u003e \u003cp\u003eMedical proficiency exam questions were asked to the Chat GPT application, and the application scanned the bibliography and literature on the internet and was able to answer the questions correctly. 
\u003csup\u003e[\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/sup\u003e In our study, we aimed to determine the success rate of the Chat GPT in languages other than English. We also aimed to compare the success rate of this artificial intelligence application, which has recently become very popular, in the proficiency exam of Turkish Orthopedics compared to those who took the exam. We aimed to contribute to the literature by determining which types of questions the Chat GPT is more successful at evaluating in terms of its interpretation power.\u003c/p\u003e \u003cp\u003eIn our study, we evaluated the answers given to the ChatGPT 3.5 and 4.0 versions by asking proficiency exam questions prepared by the Turkish Orthopedics and Traumatology Training Council (TOTEK) and compared the results with those of the candidates who took the exam. In addition, for the answers given by the application, we evaluated how we could benefit more if we asked questions to the ChatGPT by evaluating them according to parameters such as question type and subject. The aim of our study was to compare the success of surgeons who perform the exam with that of surgeons who perform the Chat GPT and to understand how we can benefit more from the Chat GPT.\u003c/p\u003e"},{"header":"Materials and Methods","content":"\u003cp\u003eAmong the written exam questions prepared by TOTEK between 2021 and 2023, questions asking visual information similar to that in the literature and canceled questions were not included, and all other questions were taken into consideration. The questions were divided into 19 categories according to topic. 
(Lower extremity surgery Foot Ankle (nontrauma); Lower Extremity knee/hip (nontrauma); Infections; Pediatric Orthopedics; Reconstructive Surgery; Sports Injuries Arthroscopic Surgery; Basic Sciences (Totek); Trauma Child Lower Extremity; Trauma Child General; Trauma Child Pelvis Vertebra; Trauma Child Upper Extremity; Trauma Lower Extremity Adult; Trauma Adult General/Open Fracture; Trauma Adult Pelvis/Acetabulum/Vertebra; Trauma Adult Upper Extremity; Tumors; Upper Extremity Surgery Brachial Plexus/Shoulder/Elbow (nontrauma); Upper Extremity Surgery Hand/Wrist/Forearm (nontrauma); Vertebra Surgery (nontrauma).\u003c/p\u003e \u003cp\u003eThe questions were divided into 3 categories according to the methods of evaluating information: direct recall of information, ability to comment and ability to use information correctly. Questions were asked separately about the Chat GPT 3.5 and 4.0 artificial intelligence applications. All answers given were evaluated appropriately according to this grouping.\u003c/p\u003e \u003cp\u003eVisual questions were not asked to the Chat GPT due to its inability to perceive visual questions. Only questions answered by the application with the correct choice and explanation were accepted as correct answers. Questions that were answered incorrectly by the Chat GPT were considered incorrect.\u003c/p\u003e \u003cp\u003eThe TOTEK question distribution between 2021 and 2023 varies significantly in terms of topics and difficulty levels. The three topics with the most questions were Pediatric Orthopedics (14.45%), Basic Sciences (10.27%) and Reconstructive Surgery and Lower Extremity Surgery Knee/Hip (Nontrauma) (6.84%). This distribution reflects the breadth of the exam's scope. and reflects its diversity. 
The topics with the most intense questions focusing on knowledge level are listed as \u0026ldquo;Trauma Adult-Upper Extremity\u0026rdquo; (80.00%), \u0026ldquo;Infections\u0026rdquo; (77.80%) and \u0026ldquo;Trauma Adult Lower Extremity\u0026rdquo; (54.50%), which shows the importance given to basic knowledge in these areas. However, the participants in which problem-solving skills were at the forefront were Pediatric Orthopedics (60.50%), Lower Extremity Surgery Knee/Hip (Nontrauma) (66.70%) and Reconstructive Surgery (50.00%), which means that the exam also attaches importance to applied and analytical skills. Among the questions evaluated in total in terms of knowledge, comment and problem-solving levels, 36.1% (95 questions) were questions at the knowledge level, 54.8% (144 questions) were questions at the comment level, and 9.1% (24 questions) were questions at the problem-solving level. This distribution shows that while TOTEK attaches importance to basic knowledge, it also focuses on measuring students\u0026rsquo; skills in interpreting their knowledge and solving applied problems. The higher percentage of questions at the comment level emphasizes that the exam attaches great importance to students' understanding and analytical thinking skills. The findings reflect TOTEK's aim of assessing competencies at various cognitive levels in orthopedics and traumatology training.\u003c/p\u003e \u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eStatistical analysis\u003c/h2\u003e \u003cp\u003eOne-sample t test analysis was used to examine the differences between the general success rate obtained from the answers given by the students participating in the TOTEK exams and the success rates obtained by the Chat GPT models. One-sample t tests are variance analysis techniques used to compare the average of the measurements obtained with the known population average. 
In this study, an independent-samples t test was used to compare the performance of the two GPT models. A significance level (alpha) of 0.05 was used for the hypothesis tests. All analyses were performed with SPSS v.26.\u003c/p\u003e \u003c/div\u003e"},{"header":"Results","content":"\u003cp\u003eWe excluded 300 visual questions in total and posed the remaining 265 multiple-choice questions to ChatGPT. A total of 95 (35%) of the 265 questions were answered correctly, and 169 (63%) were answered incorrectly. ChatGPT was unable to answer 1 question.\u003c/p\u003e\n\u003cp\u003eWhile the correct answer rate of the candidates who took the exam was 56%, GPT 3.5 and GPT 4.0 fell behind the candidates, with correct answer rates of 37% and 45%, respectively (Table \u003cspan\u003e1\u003c/span\u003e).\u003c/p\u003e\n\u003cp\u003eWhen we classified the TOTEK exam questions according to knowledge evaluation method, GPT 4 had a greater percentage of correct answers than GPT 3.5 for all cognitive evaluation methods, although the difference was not statistically significant. The gap was wider for comment questions (ChatGPT 4.0, 43%; ChatGPT 3.5, 33%) (Table \u003cspan\u003e2\u003c/span\u003e).\u003c/p\u003e\n\u003cp\u003eChatGPT's success rate was particularly high for some topics, especially infections (67%). 
The descriptive findings are shown in Table \u003cspan\u003e3\u003c/span\u003e; both artificial intelligence models were effective to different degrees across topics, but GPT 4 predominantly performed better.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eWhen we compared the correct answer rates of ChatGPT 3.5, ChatGPT 4.0 and the orthopedic specialists who took the TOTEK proficiency exam, the orthopedic specialists were more successful than both ChatGPT versions. The artificial intelligence application not only answered the exam questions but also accompanied each answer with an explanation that drew on the literature and was presented in a logical way. However, these answers were not always accurate or up-to-date. GPT-4, which is a more comprehensive version than GPT 3.5, was more successful than GPT 3.5, as expected.\u003c/p\u003e \u003cp\u003eIn our study, the percentages of correct answers to the 265 questions were 56%, 45% and 37% for the candidates, GPT 4 and GPT 3.5, respectively. In the literature, ChatGPT has been reported to achieve a near-passing score on the United States Medical Licensing Examination (USMLE). \u003csup\u003e[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]\u003c/sup\u003e It has also been reported to achieve near-perfect success on the American university admissions exam (SAT) and to be successful in the graduation exams of various university departments. 
\u003csup\u003e[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e, \u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e]\u003c/sup\u003e In the study by Lum\u003csup\u003e[\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]\u003c/sup\u003e, which evaluated the success of ChatGPT on examinations used in orthopedics and traumatology training, ChatGPT correctly answered 47% of the 193 questions posed to it; it was more successful on questions that tested memory and direct knowledge, and its success decreased on more complex questions requiring comparison, interpretation and use of information.\u003c/p\u003e \u003cp\u003eAccording to the literature, ChatGPT has a lower success rate on exams in which the question language is not English. For example, Kaneda et al.\u003csup\u003e[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]\u003c/sup\u003e reported that the percentage of questions answered correctly by ChatGPT decreased by 10% in exams conducted in languages other than English.\u003c/p\u003e \u003cp\u003eThe development of more modern and effective diagnostic and treatment methods in medicine, as well as the search for fast and effective responses to worldwide diseases such as the COVID-19 pandemic, has pushed physicians to work with artificial intelligence. \u003csup\u003e[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]\u003c/sup\u003e Deep learning methods can perform advanced medical evaluations, such as the assessment of radiological images and diagnosis from photographs. 
\u003csup\u003e[\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e, \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]\u003c/sup\u003e ChatGPT is a chatbot application built on deep learning that has recently become popular. In our study, we aimed to evaluate the knowledge and analytical ability of artificial intelligence in orthopedics using the proficiency exam prepared by TOTEK.\u003c/p\u003e \u003cp\u003eBecause ChatGPT draws on information from the web that is not subject to any control, the application may fall short in distinguishing real, correct and up-to-date information; in acting in accordance with ethical and moral rules; and in guiding people correctly. \u003csup\u003e[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e]\u003c/sup\u003e Similar studies in medical fields other than orthopedics and traumatology have shown that ChatGPT can contribute to the training and exam success of physicians, but the validity of its answers has been reported to be controversial. \u003csup\u003e[\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e, \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]\u003c/sup\u003e From an orthopedic perspective, ChatGPT is not a source that can be completely trusted for accurate information. However, given a patient's history and clinical information, ChatGPT can provide patients and physicians with information about diagnosis and the treatment process. \u003csup\u003e[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e]\u003c/sup\u003e The literature also includes articles with suggestions to help researchers and physicians use ChatGPT more effectively. 
\u003csup\u003e[\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]\u003c/sup\u003e\u003c/p\u003e \u003cp\u003eUnlike the studies mentioned above and our own, Klang et al.\u003csup\u003e[\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e]\u003c/sup\u003e used ChatGPT to generate questions for medical qualification exams and then evaluated those questions. They concluded that ChatGPT can be used to prepare medical qualification exam questions, provided that the questions are checked by specialist physicians. Taken together, these studies suggest that in the near future exams could be prepared entirely by artificial intelligence and that the competencies of physicians could be evaluated by artificial intelligence.\u003c/p\u003e"},{"header":"Limitations","content":"\u003cp\u003eBecause ChatGPT could not recognize or interpret images or integrate them with the question text, questions containing radiological and histological images were excluded from the evaluation, as in similar studies in the literature. When the questions were separated by topic, the statistical evaluation for some topics was based on few questions because of the large number of topics. Although statistical results were obtained, the small number of questions per topic raises concern about Type II errors.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eOur study showed that although ChatGPT could not reach the passing level of the Turkish Orthopedics and Traumatology Proficiency Exam, it reached a certain level of accuracy. This result may be explained by the lack of sufficient and accurate clinical data in the web content available to ChatGPT and by problems in question formulation. 
Software such as ChatGPT needs to be developed and studied further before it can be useful to orthopedics and traumatology physicians, for whom the evaluation of radiological images and physical examination are very important. Nevertheless, physicians' awareness of artificial intelligence, which will be an inevitable part of the coming years, should increase, and following developments in artificial intelligence will become a necessity for physicians.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eETHICS APPROVAL AND CONSENT TO PARTICIPATE\u003c/p\u003e\n\u003cp\u003e\u0026apos;Not applicable\u0026apos;\u003c/p\u003e\n\u003cp\u003eCONSENT FOR PUBLICATION\u003c/p\u003e\n\u003cp\u003e\u0026apos;Not applicable\u0026apos;\u003c/p\u003e\n\u003cp\u003eAVAILABILITY OF DATA AND MATERIAL\u003c/p\u003e\n\u003cp\u003eAll data generated or analyzed during this study are included in this published article.\u003c/p\u003e\n\u003cp\u003eCOMPETING INTERESTS\u003c/p\u003e\n\u003cp\u003e\u0026apos;Not applicable\u0026apos;\u003c/p\u003e\n\u003cp\u003eFUNDING\u003c/p\u003e\n\u003cp\u003e\u0026apos;Not applicable\u0026apos;\u003c/p\u003e\n\u003cp\u003eAUTHORS\u0026rsquo; CONTRIBUTIONS\u003c/p\u003e\n\u003cp\u003eS\u0026uuml;leyman Kaan \u0026Ouml;ner planned the study. Bilgehan Ocak submitted the questions to ChatGPT and was a major contributor in writing the manuscript. Yavuz Şahbat performed the statistical analysis, and Yasin Kurnaz and Emre Cilingir examined the data and drew conclusions.\u003c/p\u003e\n\u003cp\u003eACKNOWLEDGEMENTS\u003c/p\u003e\n\u003cp\u003e\u0026apos;Not applicable\u0026apos;\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eBrandtzaeg PB, F\u0026oslash;lstad A. (2017) Why people use chatbots. 
Paper presented at: Internet Science: 4th International Conference, INSCI 2017, Thessaloniki, Proceedings 4; 2017.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHoward J. Artificial intelligence: implications for the future of work. Am J Ind Med. 2019;62:917\u0026ndash;26.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepa\u0026ntilde;o C, Madriaga M, Aggabao R, Diaz-Candido G, Maningo J, Tseng V. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1371/journal.pdig.0000198\u003c/span\u003e\u003cspan address=\"10.1371/journal.pdig.0000198\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. PMID: 36812645; PMCID: PMC9931230.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMassey PA, Montgomery C, Zhang AS. Comparison of the performance of the ChatGPT-3.5, ChatGPT-4, and the Orthopedic Resident on the Orthopedic Assessment Examinations. J Am Acad Orthop Surg. 2023;31(23):1173\u0026ndash;9. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.5435/JAAOS-D-23-00396\u003c/span\u003e\u003cspan address=\"10.5435/JAAOS-D-23-00396\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Epub 2023 Sep 4. PMID: 37671415; PMCID: PMC10627532.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eTerwiesch C. Would chat GPT get a Wharton MBA? New white paper by Christian Terwiesch.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eLum ZC. Can Artificial Intelligence Pass the American Board of Orthopedic Surgery Examination? Orthopedic Residents Versus ChatGPT. Clin Orthop Relat Res. 2023;481(8):1623\u0026ndash;30. doi: 10.1097/CORR.0000000000002704. Epub 2023 May 23. 
PMID: 37220190; PMCID: PMC10344569.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKaneda Y, Tanimoto T, Ozaki A, Sato T, Takahashi K. 2023. \u0026lsquo;Can ChatGPT Pass the 2023 Japanese National Medical Licensing Examination?\u0026rsquo; Preprints. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.20944/preprints202303.0191.v1\u003c/span\u003e\u003cspan address=\"10.20944/preprints202303.0191.v1\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEsteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115\u0026ndash;118. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/nature21056\u003c/span\u003e\u003cspan address=\"10.1038/nature21056\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Epub 2017 Jan 25. Erratum in: Nature. 2017;546(7660):686. PMID: 28117445; PMCID: PMC8382232.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEsteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115\u0026ndash;118. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1038/nature21056\u003c/span\u003e\u003cspan address=\"10.1038/nature21056\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Epub 2017 Jan 25. Erratum in: Nature. 2017;546(7660):686. PMID: 28117445; PMCID: PMC8382232.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEvolution of OpenAI's GPT models. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://medium.com/the-techlife/evolution-of-openais-gpt-models-8148e6214ee7\u003c/span\u003e\u003cspan address=\"https://medium.com/the-techlife/evolution-of-openais-gpt-models-8148e6214ee7\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e, Available Online, Accessed on March, 2023.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRevercomb L, Patel AM, Choudhry HS, Filimonov A. Performance of ChatGPT in Otolaryngology knowledge assessment. Am J Otolaryngol. 2023;45(1):104082. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.amjoto.2023.104082\u003c/span\u003e\u003cspan address=\"10.1016/j.amjoto.2023.104082\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Epub ahead of print. PMID: 37862879.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSuchman K, Garg S, Trindade AJ. Chat Generative Pretrained Transformer Fails the Multiple-Choice American College of Gastroenterology Self-Assessment Test. Am J Gastroenterol. 2023;118(12):2280\u0026ndash;2. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.14309/ajg.0000000000002320\u003c/span\u003e\u003cspan address=\"10.14309/ajg.0000000000002320\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Epub 2023 May 22. PMID: 37212584.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGarcia-Vidal C, Sanjuan G, Puerta-Alcalde P, Moreno-Garc\u0026iacute;a E, Soriano A. Artificial intelligence to support clinical decision-making processes. EBioMedicine. 2019;46:27\u0026ndash;9. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.ebiom.2019.07.019\u003c/span\u003e\u003cspan address=\"10.1016/j.ebiom.2019.07.019\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Epub 2019 Jul 11. 
PMID: 31303500; PMCID: PMC6710912.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eNon LR. All aboard the ChatGPT steamroller: Top 10 ways to make artificial intelligence work for healthcare professionals. Antimicrob Steward Healthc Epidemiol. 2023;3(1):e243. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1017/ash.2023.512\u003c/span\u003e\u003cspan address=\"10.1017/ash.2023.512\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKlang E, et al. Advantages and pitfalls in utilizing artificial intelligence for crafting medical examinations: a medical education pilot study with GPT-4. \u003cem\u003eBMC Med Educ\u003c/em\u003e. 2023;23(1):772. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1186/s12909-023-04752-w\u003c/span\u003e\u003cspan address=\"10.1186/s12909-023-04752-w\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"},{"header":"Tables","content":"\u003cp\u003eTables 1 to 3 are available in the Supplementary Files section.\u003c/p\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":true,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research 
Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"chatgpt, orthopedics, traumatology, exam, artificial intelligence","lastPublishedDoi":"10.21203/rs.3.rs-4637339/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-4637339/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003e\u003cstrong\u003eBackground\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis study aimed to evaluate the success of ChatGPT on the Turkish Board of Orthopedic Surgery Examination.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eMethods\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAmong the written exam questions prepared by TOTEK between 2021 and 2023, questions requiring visual information and canceled questions were excluded, as in similar studies in the literature, and all other questions were included. The questions were divided into 19 categories according to topic. The questions were also divided into 3 categories according to the method of evaluating information: direct recall of information, ability to comment and ability to use information correctly. Each question was posed separately to ChatGPT 3.5 and ChatGPT 4.0, and all answers were evaluated according to this grouping.\u003c/p\u003e\n\u003cp\u003eQuestions containing images were not posed to ChatGPT because it could not process visual content. Only questions answered by the application with the correct choice and explanation were accepted as correct answers. 
Questions answered incorrectly by ChatGPT were scored as incorrect.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eResults\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe excluded 300 visual questions in total and posed the remaining 265 multiple-choice questions to ChatGPT. A total of 95 (35%) of the 265 questions were answered correctly, and 169 (63%) were answered incorrectly. ChatGPT was unable to answer 1 question. ChatGPT's success rate was particularly high for some topics, especially the infection questions (67%). The descriptive findings are shown in Table 3; both artificial intelligence models were effective to different degrees across topics, but GPT 4 predominantly performed better.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConclusion\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eOur study showed that although ChatGPT could not reach the passing level of the Turkish Orthopedics and Traumatology Proficiency Exam, it reached a certain level of accuracy. 
Software such as the Chat GPT needs to be developed and studied further to be useful for orthopedics and traumatology physicians, where the evaluation of radiological images and physical examination are very important.\u003c/p\u003e","manuscriptTitle":"Performance of Chat Gpt on a Turkish Board of Orthopaedi̇c Surgery Examination","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2024-08-06 14:33:54","doi":"10.21203/rs.3.rs-4637339/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"b1e7bb3c-b522-423f-a6bc-ee21c4515a0a","owner":[],"postedDate":"August 6th, 2024","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[],"tags":[],"updatedAt":"2024-11-15T15:54:05+00:00","versionOfRecord":[],"versionCreatedAt":"2024-08-06 14:33:54","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-4637339","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-4637339","identity":"rs-4637339","version":["v1"]},"buildId":"qtupq5eGEP_6zYnWcrvyt","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
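The arithmetic behind the reported shares, and the one-sample t statistic named in the Statistical analysis section, can be sketched in Python. This is an illustrative sketch only, not the authors' SPSS v.26 procedure. The function names `percent` and `one_sample_t` are hypothetical, the counts are those reported in the text, and the `toy_rates` values in the usage section are made-up numbers rather than study data. The cognitive-level counts as reported sum to 263, whereas the Results report 265 questions; the sketch recomputes the figures as given and does not resolve that difference.

```python
import math
import statistics


def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100.0 * count / total, 1)


def one_sample_t(sample: list[float], population_mean: float) -> float:
    """t statistic for H0: the sample mean equals population_mean.

    t = (sample mean - population mean) / (sample SD / sqrt(n))
    """
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - population_mean) / standard_error


# Counts reported in the Results: correct, incorrect, unanswered.
answer_counts = {"correct": 95, "incorrect": 169, "unanswered": 1}
total_asked = sum(answer_counts.values())

# Counts reported for the three cognitive levels.
level_counts = {"knowledge": 95, "comment": 144, "problem_solving": 24}
level_total = sum(level_counts.values())
level_percentages = {
    level: percent(count, level_total) for level, count in level_counts.items()
}

if __name__ == "__main__":
    print("questions asked:", total_asked)
    print("cognitive-level total:", level_total)
    print("cognitive-level shares (%):", level_percentages)

    # Hypothetical illustration only: made-up success rates compared with
    # the 56% candidate success rate reported in the text.
    toy_rates = [0.40, 0.45, 0.50]
    print("toy t statistic:", one_sample_t(toy_rates, 0.56))
```

A one-sample test of this kind needs several measurements per model (for example, one success rate per exam year), which the text does not report. For that reason only the formula is shown here, and no p value is derived.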


License: CC-BY-4.0