Evaluating the performance of artificial intelligence in answering frequently asked patient questions on epistaxis

This is a preprint and has not been peer reviewed. Data may be preliminary.
Authorea, 24 June 2025, Version 1
Authors: Shireen Gohari (ORCID 0000-0001-9166-3579), Chiugo Ike, Valentin Weber, Kennth Lai, and Ryan Cheong
https://doi.org/10.22541/au.175074855.53963407/v1

Abstract
Objective: Epistaxis is a common otolaryngological presentation encountered in the clinical setting. This study compares ChatGPT’s accuracy, readability, and conciseness in addressing patient questions with those of the American Academy of Otolaryngology-Head and Neck Surgery (AAO-HNS) Clinical Practice Guidelines [1].
Methods: Frequently asked questions (FAQs) for patients with nosebleed were extrapolated from the AAO-HNS Clinical Practice Guidelines [1].
ChatGPT-generated responses were compared to AAO-HNS model answers and graded by fifteen otolaryngologists for accuracy, readability, and conciseness.
Results: 86.7-100% of otolaryngology consultants agreed with the accuracy of ChatGPT responses, compared to 66.7-80% agreement with AAO-HNS responses. 66.7-93.3% of consultants agreed that all ChatGPT and AAO-HNS responses were easy to understand. ChatGPT responses were less concise, with 6.7-40% consultant agreement, compared to 80-100% agreement with AAO-HNS responses.
Conclusion: ChatGPT provided guideline-consistent epistaxis advice and appropriately encouraged specialist care but lacked conciseness.
ChatGPT may serve as a useful adjunct for patients and clinicians.

Keywords: ChatGPT, Large Language Model, Artificial Intelligence, Otolaryngology, Nosebleed, Epistaxis, Patient Education

INTRODUCTION
Chat Generative Pre-trained Transformer (ChatGPT) is a natural language processing model that has recently received broad exposure in the medical field [2]. There is growing evidence that it is reliable in providing patient information on various medical conditions [3-4] and clinical procedures [5].

Epistaxis is one of the most common otolaryngology presentations encountered by physicians of various specialities in the clinical setting [6]. Up to 60% of the population has experienced epistaxis at some point, with 6% seeking medical attention for it [7]. For the non-specialist, answering patients’ questions and providing accurate advice can prove challenging, leading to potential misinformation, complications, and delays in care [8]. Providing accurate, understandable information and advice for discharge is an essential aspect of patient care in all settings, and can reduce re-presentations to hospital [9].

Current literature shows evidence of ChatGPT suitably answering general otolaryngology questions [10]; however, there were limitations in the accuracy of the generated responses, posing potential risk to the public [11]. This is the first study to examine the accuracy, readability, and conciseness of ChatGPT in addressing patient questions specifically related to epistaxis.

MATERIALS AND METHODS
2.1 Survey of otolaryngologists
Six frequently asked questions (FAQs; Appendix 1) related to general patient education and epistaxis prevention were extrapolated from the American Academy of Otolaryngology-Head and Neck Surgery (AAO-HNS) Clinical Practice Guideline on nosebleed [1]. Questions were entered into ChatGPT 4.0 by two independent researchers at two separate times, on September 1, 2024 and September 12, 2024, generating the same responses.
ChatGPT-generated responses (Appendix 2) and model responses from AAO-HNS (Appendix 3) were subsequently blinded and distributed to twenty-nine qualified otolaryngology consultants at two major London teaching hospitals for review from September to November 2024. Two follow-up emails were sent to non-responders. Consultant otolaryngologists were asked to assess the accuracy, conciseness, and readability of the responses to FAQs using a 5-point Likert scale (1 = strongly disagree, 2 = somewhat disagree, 3 = neither agree nor disagree, 4 = somewhat agree, and 5 = strongly agree). Survey responses were acquired anonymously via Google Forms, with surveys limited to one response per person.

2.2 Readability
Following the approach used in previous studies evaluating patient information and education materials [5, 12-13], three validated readability tools accessible through online software were used to assess the reading level of each response: the Flesch Reading Ease Score (FRES) [14], Flesch-Kincaid Grade Level (FKGL), and Gunning Fog Index (GFI). Each tool assesses factors including the number and length of words, syllables, and sentences to create a score reflective of the readability of the text. Table I provides the interpretation and target scores used for this study.

RESULTS AND ANALYSIS
3.1 Survey of otolaryngologists
Of the twenty-nine surveys sent, 15 completed responses were received from consultant otolaryngologists (response rate 51.7%). No incomplete questionnaires were returned. Response distributions are displayed in Figures 1-3. The majority of otolaryngology consultants found ChatGPT responses accurate for all six FAQs [see Figure 1], with 86.7-100% of consultants agreeing or strongly agreeing with the responses.
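As an arithmetic aside, the percentages in this section are consistent with whole-number counts out of the 15 respondents, and the response rate with 15 of the 29 surveys sent. The following is a minimal sketch of that conversion and is not part of the study's analysis. The helper names `pct` and `count_from_pct` are hypothetical, and the sketch assumes every respondent rated every item, in line with the statement that no incomplete questionnaires were returned.

```python
N_SENT = 29         # surveys distributed (from the text)
N_RESPONDENTS = 15  # completed responses (from the text)


def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * count / total, 1)


def count_from_pct(percent: float, total: int = N_RESPONDENTS) -> int:
    """Nearest whole-number count of respondents matching a reported percentage.

    Assumption: every reported percentage has the full respondent group
    as its denominator.
    """
    return round(percent * total / 100)


response_rate = pct(N_RESPONDENTS, N_SENT)

# Reported bounds for agreement with the accuracy of ChatGPT responses.
low_count = count_from_pct(86.7)
high_count = count_from_pct(100.0)

print(f"response rate: {response_rate}%")
print(f"86.7% of 15 corresponds to {low_count} consultants; 100% to {high_count}")
```

Read this way, the lower bound of 86.7% corresponds to 13 of 15 consultants, which shows how coarse each percentage step is with a panel of this size.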
By comparison, 66.7-80% of consultants agreed or strongly agreed that AAO-HNS responses were accurate for all questions except Question 3 (Can I use any over-the-counter medications to help if my nose is bleeding?), for which 20% disagreed or strongly disagreed, 33.3% remained neutral, and 46.7% agreed or strongly agreed.

ChatGPT responses were generally perceived as less concise than AAO-HNS responses [see Figure 2], with only 6.7-40% of consultants agreeing or strongly agreeing that the responses to Questions 1-3 and 5-6 were concise, and 73.3% agreeing or strongly agreeing that the response to Question 4 was concise. By comparison, 80-100% of consultants agreed that AAO-HNS responses were concise for all six FAQs.

In terms of readability [see Figure 3], 66.7-93.3% of consultants agreed or strongly agreed that ChatGPT responses to all six questions were easy to understand. Similarly, 66.7-93.3% of consultants agreed or strongly agreed that all six AAO-HNS responses were easy to understand.

3.2 Readability
The FRES of AAO-HNS model responses ranged from 32 (difficult) to 75 (fairly easy), with a mean of 56.2 (SD = 18.8; 95% CI 36.4-75.9), which was considered “fairly difficult to read.” The FKGL of AAO-HNS model responses ranged from 7 to 14, with a mean of 9.83 (SD = 3.19; 95% CI 6.5-13.2), which corresponds to the literacy expected of a 15-year-old teenager. The GFI of AAO-HNS model responses ranged from 10 to 18, with a mean of 12.7 (SD = 3.3; 95% CI 9.2-16.2), which is considered the literacy level of a final-year school student approaching graduation.

On the other hand, the more effusive ChatGPT responses resulted in a mean FRES of 48.9 (SD = 12.8; 95% CI 35.6-62.3), which was considered “difficult to read,” with a range of 38.1 to 72.6. Similarly, the FKGL of ChatGPT responses ranged from 7.1 to 13.5, with a mean of 10.7 (SD = 2.5; 95% CI 8.1-13.3).
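For orientation, the three scores reported in this subsection are computed from simple text counts. The sketch below uses the standard published formulas for FRES, FKGL, and GFI together with the FRES bands listed in Table I. The study used online software that is not named, so its exact counting rules (for example, how syllables or complex words are identified) are unknown and may differ. The function names and the counts in the example passage are hypothetical.

```python
def fres(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease Score: a higher score means easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def fkgl(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: approximate school grade required."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gfi(words: int, sentences: int, complex_words: int) -> float:
    """Gunning Fog Index; complex words are usually those with 3+ syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


# FRES bands as listed in Table I: (lower bound, label), highest band first.
FRES_BANDS = [
    (90, "Very easy"), (80, "Easy"), (70, "Fairly easy"), (60, "Standard"),
    (50, "Fairly difficult"), (30, "Difficult"), (0, "Very difficult"),
]


def fres_band(score: float) -> str:
    """Map a FRES value to its Table I label.

    Table I's ranges share boundary values (e.g. 80 appears in two bands);
    this sketch assigns a boundary value to the easier band, an assumption.
    """
    for lower, label in FRES_BANDS:
        if score >= lower:
            return label
    return "Very difficult"


# Hypothetical passage, for illustration only:
# 100 words, 5 sentences, 150 syllables, 10 complex words.
example = (
    round(fres(100, 5, 150), 1),
    round(fkgl(100, 5, 150), 1),
    round(gfi(100, 5, 10), 1),
)
print(example)

# Mean FRES values reported in the text for AAO-HNS and ChatGPT responses.
print(fres_band(56.2), "|", fres_band(48.9))
```

Because FRES rises with ease of reading while FKGL and GFI rise with difficulty, a lower FRES and a higher FKGL both point in the same direction.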
By contrast, the GFI, which considers the complexity of individual words, was lower for ChatGPT than for AAO-HNS responses, with ChatGPT responses scoring a mean of 9.6 (SD = 2.05; 95% CI 7.5-11.8) and ranging from 6.8 to 13.1. Results of readability of responses from AAO-HNS and ChatGPT are summarised in Table II.

DISCUSSION
A large proportion of people seek health information online [15], and with the increasing popularity of AI platforms such as ChatGPT, the number of people consulting these platforms for health advice is likely to increase. Our study compares the patient advice generated by ChatGPT to advice that is widely accessible online from one of the largest professional associations for otolaryngologists: AAO-HNS.

The findings of our study suggest overall expert agreement that AAO-HNS responses are accurate, concise, and easy to understand. By comparison, ChatGPT responses, although generally regarded as less concise, attracted expert agreement in terms of accuracy and readability. The 86.7-100% expert agreement with the accuracy of ChatGPT-generated responses suggests the accuracy of AI-generated advice may already be comparable to professional guidelines. This is in keeping with previous studies [10] assessing information provided by ChatGPT for education, patient advice, and understandability. However, the authors of the same study reported concerns over the accuracy of generated responses and potential instances when ChatGPT presents false information as fact, otherwise known as “hallucinations” [10].

In terms of conciseness, the lower level of expert agreement with ChatGPT responses (6.7-40%) highlights a common limitation of AI-generated content: extensive responses, which may be of limited use in time-sensitive or acute settings.
By comparison, the AAO-HNS model responses, generated by clinicians, were generally viewed as concise (80-100% agreement), suggesting more direct guidance that could be of more use in acute settings for providing quick and clear patient advice.

Despite being regarded as less concise by otolaryngologists, ChatGPT responses were perceived as having similar levels of readability to AAO-HNS responses. However, formal readability assessment with three validated scoring instruments showed that ChatGPT-generated answers corresponded to a college-level education. This is beyond the recommended reading level for patient information, which is grade 7 and below [16]. Compared with AAO-HNS model responses, ChatGPT responses scored as harder to read on FRES (lower mean score) and FKGL (higher mean grade level), readability tools primarily focused on the number and length of words, syllables, and sentences; however, the GFI, which focuses on the complexity of individual words, was higher for AAO-HNS model responses than for ChatGPT. This could be accounted for by AAO-HNS responses that include terms such as “digital trauma,” which could be scored as more complex, while the subsequent clarification “(nose picking)” would not be taken into account when calculating a GFI score.

Considering the disparities in health literacy in the wider population, it is essential to provide accurate information without overwhelming patients. There is evidence that many individuals with limited health literacy are unable to understand and weigh up health information, potentially negatively affecting the shared decision-making process [17]. Interventions that improve health literacy have been shown to reduce disparities in disease morbidity and mortality [17]. Despite the lack of conciseness in ChatGPT responses evidenced by our study, previous studies have demonstrated that large language models such as ChatGPT can be used to improve the readability of patient information [12].
In addition, ChatGPT can be prompted to generate responses tailored to various levels of health literacy [5]. To this end, ChatGPT may serve as a useful platform to aid clinicians in providing or improving patient advice that remains accurate as well as easy to understand by the wider population.

Limitations
This study assessed the use of ChatGPT to provide patient advice at one point in time; its performance is likely to change in subsequent iterations as AI language models are improved. ChatGPT 4.0 was used for this study; however, further models may be released as the technology undergoes constant updates and improvements. A relatively small sample was used for this study: 29 otolaryngology consultants at two major teaching hospitals were surveyed, with a relatively modest response rate of 15 responses (51.7%). The perspectives gathered, though limited in number, reflect the opinions of highly specialised and experienced clinicians with direct involvement in current clinical practice. Nevertheless, as is inherent to a survey-based study, it is important to acknowledge the potential for response bias, as otolaryngologists responding to the survey may hold a particular interest in rhinology or AI. While the high level of expertise of our respondents indicates that the data remain informative in guiding discussions regarding the applicability of AI-generated responses in clinical practice, future studies with larger cohorts should be conducted to validate these findings.

Conclusion
Our findings indicate that ChatGPT responses to common FAQs related to epistaxis are comparable to clinical guideline responses in terms of accuracy and readability, although they lack conciseness. Given the potential for hallucinations within AI-generated responses, clinicians should view ChatGPT as a useful adjunct rather than a sole data source for medical information.
Authors’ contributions
Study design/planning: SG/VW
Study conduct: SG/CI
Data analysis: SG
Writing paper: SG/CI/VW/KL/RC

Statements and Declarations
Financial Support: This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Competing Interests: No conflicts of interest to disclose. No authors reported any financial disclosures.

REFERENCES
1. Tunkel DE, Anne S, Payne SC, Ishman SL, Rosenfeld RM, Abramson PJ, et al. Clinical Practice Guideline: Nosebleed (Epistaxis). Otolaryngol Head Neck Surg 2020 Jan;162(Suppl 1):S1-S38.
2. Dave T, Athaluri SA, Singh S. ChatGPT in medicine: an overview of its applications, advantages, limitations, future prospects, and ethical considerations. Front Artif Intell 2023 May 4;6:1169595.
3. Kuşcu O, Pamuk AE, Sütay Süslü N, Hosal S. Is ChatGPT accurate and reliable in answering questions regarding head and neck cancer? Front Oncol 2023 Dec 1;13:1256459.
4. Seth I, Cox A, Xie Y, Bulloch G, Hunter-Smith DJ, Rozen WM, et al. Evaluating Chatbot Efficacy for Answering Frequently Asked Questions in Plastic Surgery: A ChatGPT Case Study Focused on Breast Augmentation. Aesthet Surg J 2023 Sep 14;43:1126-1135.
5. Mootz AA, Carvalho B, Sultan P, Nguyen TP, Reale SC. The Accuracy of ChatGPT-Generated Responses in Answering Commonly Asked Patient Questions About Labor Epidurals: A Survey-Based Study. Anesth Analg 2024 May 1;138:1142-1144.
6. Viehweg TL, Roberson JB, Hudson JW. Epistaxis: diagnosis and treatment. J Oral Maxillofac Surg 2006 Mar;64:511-8.
7. Womack JP, Kropa J, Jimenez Stabile M. Epistaxis: Outpatient management. Am Fam Physician 2018 Aug 15;98:240-5.
8. Tassone P, Georgalas C, Appleby E, Kotecha B. Management of patients with epistaxis by general practitioners: impact of otolaryngology experience on their practice. Eur Arch Otorhinolaryngol 2006 Dec;263:1109-14.
9. Eze N. Advice given to patients with epistaxis by A&E doctors. Emerg Med J 2005;22:724-725.
10. Lechien JR, Rameau A. Applications of ChatGPT in Otolaryngology-Head Neck Surgery: A State of the Art Review. Otolaryngol Head Neck Surg 2024 Sep;171:667-677.
11. Nielsen JPS, von Buchwald C, Grønhøj C. Validity of the large language model ChatGPT (GPT4) as a patient information source in otolaryngology by a variety of doctors in a tertiary otorhinolaryngology department. Acta Otolaryngol 2023 Sep;143:779-782.
12. Baldwin AJ. An artificial intelligence language model improves readability of burns first aid information. Burns 2024 Jun;50:1122-1127.
13. Narwani V, Nalamada K, Lee M, Kothari P, Lakhani R. Readability and quality assessment of internet-based patient education materials related to laryngeal cancer. Head Neck 2016 Apr;38:601-5.
14. Flesch R. A new readability yardstick. J Appl Psychol 1948;32:221-233.
15. Jia X, Pang Y, Liu LS. Online Health Information Seeking Behavior: A Systematic Review. Healthcare (Basel) 2021 Dec 16;9:1740.
16. Hutchinson N, Baird GL, Garg M. Examining the Reading Level of Internet Medical Information for Common Internal Medicine Diagnoses. Am J Med 2016 Jun;129:637-9.
17. Coughlin SS, Vernon M, Hatzigeorgiou C, George V. Health Literacy, Social Determinants of Health, and Disease Prevention and Control. J Environ Health Sci 2020;6:3061.

Table I. Interpretation of Readability Scoring Tools
Flesch Reading Ease Score (target score: ≥70)
    90-100: Very easy (Grade 5)
    80-90: Easy (Grade 6)
    70-80: Fairly easy (Grade 7)
    60-70: Standard (Grade 8-9)
    50-60: Fairly difficult (Grade 10-12)
    30-50: Difficult (College)
    0-30: Very difficult (College graduate)
FKGL (target score: ≤ 6.9)
    Years of education required to understand most of the text
GFI (target score: ≤ 6.9)
    Years of formal education that a person requires in order to easily understand the text on the first reading
FKGL = Flesch-Kincaid Grade Level, GFI = Gunning Fog Index

Table II. Mean (SD) readability scores of AAO-HNS and ChatGPT Responses
Readability Tool    AAO-HNS Responses    ChatGPT Responses
FRES                56.2 (18.8)          48.9 (12.8)
FKGL                9.83 (3.2)           10.72 (2.5)
GFI                 12.7 (3.3)           9.6 (2.1)

Information & Authors
Version history: Version 1, 24 June 2025
Copyright: This work is licensed under a Non Exclusive No Reuse License.
Affiliations:
Shireen Gohari (ORCID 0000-0001-9166-3579): St George's University Hospitals NHS Foundation Trust, Department of ENT Surgery
Chiugo Ike: St George's University Hospitals NHS Foundation Trust, Department of ENT Surgery
Valentin Weber: Whittington Health NHS Trust
Kennth Lai: Guy's and St Thomas' NHS Foundation Trust, Ear Nose and Throat Head and Neck Service
Ryan Cheong: Royal National ENT and Eastman Dental Hospitals
Citation: Shireen Gohari, Chiugo Ike, Valentin Weber, et al. Evaluating the performance of artificial intelligence in answering frequently asked patient questions on epistaxis. Authorea. 24 June 2025. DOI: https://doi.org/10.22541/au.175074855.53963407/v1
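As a cross-check on Table II and Section 3.2, the reported 95% confidence intervals can be approximately reproduced from the means and SDs. The source does not state how its intervals were calculated, so the sketch below assumes a two-sided t-based interval over the six responses per source (n = 6, giving 5 degrees of freedom and a critical value of about 2.571). The helper name `t_interval` is hypothetical. Small differences from the reported bounds are expected because the published means and SDs are already rounded.

```python
import math

# Assumption: n = 6 responses per source (one per FAQ) and a t-based interval;
# the paper does not describe its confidence interval method.
T_CRIT_95_DF5 = 2.571  # two-sided 95% critical value of Student's t, 5 d.f.


def t_interval(mean: float, sd: float, n: int = 6,
               t_crit: float = T_CRIT_95_DF5) -> tuple[float, float]:
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# (mean, SD, reported 95% CI) as given in Section 3.2.
reported = {
    "AAO-HNS FRES": (56.2, 18.8, (36.4, 75.9)),
    "AAO-HNS FKGL": (9.83, 3.19, (6.5, 13.2)),
    "AAO-HNS GFI": (12.7, 3.3, (9.2, 16.2)),
    "ChatGPT FRES": (48.9, 12.8, (35.6, 62.3)),
    "ChatGPT FKGL": (10.7, 2.5, (8.1, 13.3)),
    "ChatGPT GFI": (9.6, 2.05, (7.5, 11.8)),
}

for name, (mean, sd, ci) in reported.items():
    lower, upper = t_interval(mean, sd)
    print(f"{name}: computed {lower:.1f}-{upper:.1f}, reported {ci[0]}-{ci[1]}")
```

Under these assumptions each computed bound falls within 0.2 of the reported value, which suggests, but does not confirm, that the intervals were calculated across the six responses per source.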