Readability, Quality, Understandability, and Actionability of ChatGPT Generated GI Patient Education vs AGA Patient Center

doi:10.21203/rs.3.rs-8940111/v1

2026 · doi:10.21203/rs.3.rs-8940111/v1

preprint OA: closed


Research Square

Research Article

Shivam Chandra, Vineet Kumar, Robert Kwei-Nsoro, Anas Almoghrabi

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-8940111/v1

This work is licensed under a CC BY 4.0 License.

Status: Under Review
Version 1 posted 11

Abstract

Background and Aims: Patients increasingly use the internet and artificial intelligence chatbots to obtain health information, yet the readability, quality, understandability, and actionability of AI-generated gastrointestinal patient education remain unclear. This study compared gastrointestinal patient education from a professional society website with content generated by ChatGPT using validated health literacy instruments.
Methods: In this cross-sectional comparative study, 50 gastrointestinal patient education topics from the American Gastroenterological Association patient information website were paired with ChatGPT-generated responses using standardized prompts. Readability was assessed using the Flesch-Kincaid Grade Level. Quality of treatment information was evaluated using the DISCERN instrument. Understandability and actionability were assessed using the Patient Education Materials Assessment Tool. Paired t tests were used to compare mean scores between sources.

Results: Fifty paired topics were analyzed. The mean Flesch-Kincaid Grade Level was higher for ChatGPT than professional society materials (10.33 vs 8.72; mean difference, 1.61; 95% CI, 0.89–2.32; P = .00012). Differences in DISCERN scores (63.52 vs 64.30; mean difference, −0.78; 95% CI, −3.10 to 1.53; P = .49), PEMAT understandability (87.91% vs 86.52%; mean difference, 1.39%; 95% CI, −1.48% to 4.26%; P = .33), and PEMAT actionability (78.57% vs 77.93%; mean difference, 0.63%; 95% CI, −3.14% to 4.40%; P = .73) were not statistically significant.

Conclusion: ChatGPT-generated gastrointestinal patient education demonstrated similar quality, understandability, and actionability compared with professional society materials but was written at a significantly higher reading level. Improving readability may enhance accessibility and support the safe integration of AI-generated patient education.

Keywords: artificial intelligence, patient education, health literacy, gastroenterology, readability

Introduction

Patient education materials are a major source of health information and strongly influence clinical outcomes, adherence, and shared decision making.[1] Major organizations, including the American Medical Association and the National Institutes of Health, recommend that patient-facing materials be written at a sixth- to eighth-grade reading level to ensure accessibility.
[2] However, prior studies have consistently shown that online health information often exceeds recommended readability thresholds, even as patients increasingly turn to the internet for medical information.[3]

Large language models such as ChatGPT are rapidly emerging as potential new sources of health information, offering immediate access, conversational responses, and personalized explanations.[4] Despite these advantages, concerns remain regarding the quality, reliability, readability, and actionability of AI-generated content, highlighting the need for rigorous evaluation using validated tools.[5] Prior research has assessed the readability and quality of online health information and has begun examining AI-generated medical responses using validated measures such as the Flesch-Kincaid Grade Level, DISCERN, and the Patient Education Materials Assessment Tool.[6] However, gastrointestinal patient education has not been specifically studied in this context, and no prior study has directly compared AI-generated gastrointestinal patient education with professional society materials using validated health literacy tools.

This study compares the readability, quality, understandability, and actionability of gastrointestinal patient education from the American Gastroenterological Association patient information website and ChatGPT using validated instruments in a paired design. We hypothesized that AI-generated content would demonstrate comparable quality and usability but poorer readability compared with professional society materials.

Methods

Study Design and Data Sources

This cross-sectional comparative study evaluated gastrointestinal patient education materials obtained from the American Gastroenterological Association patient education website and from ChatGPT (OpenAI).[7,8] Fifty gastrointestinal patient education topics were identified from the AGA website.
Each topic title was entered as a standardized prompt to generate a corresponding ChatGPT response, creating paired content for analysis. ChatGPT responses were generated during a single study period using a new chat session for each topic to minimize carryover context. Prompts were written in plain language to reflect typical patient questions. No follow-up prompts or iterative refinement were used, and responses were copied verbatim without editing prior to analysis. The AGA webpage content and the corresponding ChatGPT response for each topic were then independently evaluated using the study instruments, and each topic pair served as the unit of analysis.

Outcomes and Measurement Instruments

Readability

Readability was assessed using the Flesch-Kincaid Grade Level formula, which estimates the US school grade level required to understand written content. Lower scores indicate easier readability and improved accessibility for patients. Readability metrics were calculated in R using the quanteda text analysis package.

Quality of Treatment Information

Quality and reliability were evaluated using the DISCERN instrument, a validated 16-item tool designed to assess consumer health information about treatment choices. Each item is scored on a 5-point scale, and item scores were summed to generate a total score ranging from 16 to 80, with higher scores indicating higher quality and reliability.

Understandability and Actionability

Understandability and actionability were evaluated using the Patient Education Materials Assessment Tool. PEMAT understandability measures how easily readers can process and explain the information, while PEMAT actionability evaluates whether the material provides clear steps for readers to take. Scores were calculated as percentages according to published PEMAT scoring instructions, with higher scores indicating better performance.
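
The three scoring rules described above can be illustrated with a minimal Python sketch. This is not the study's code: the authors computed readability in R with quanteda, and the syllable counter below is a rough vowel-group heuristic assumed only for illustration, so its grade levels will differ from quanteda's. The function names (`flesch_kincaid_grade`, `discern_total`, `pemat_percent`) and the rating encoding (1 = agree, 0 = disagree, `None` = not applicable) are likewise hypothetical.

```python
import re
from typing import Optional, Sequence


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not quanteda's method):
    count vowel groups and drop one for a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (len(words) / len(sentences)) + 11.8 * (syllables / len(words)) - 15.59


def discern_total(item_scores: Sequence[int]) -> int:
    """Sum 16 DISCERN items, each rated 1 to 5, giving a total of 16 to 80."""
    if len(item_scores) != 16:
        raise ValueError("DISCERN has 16 items")
    if any(not 1 <= score <= 5 for score in item_scores):
        raise ValueError("each DISCERN item is rated from 1 to 5")
    return sum(item_scores)


def pemat_percent(ratings: Sequence[Optional[int]]) -> float:
    """PEMAT percentage: points earned divided by applicable items, times 100.
    Items rated None (not applicable) are excluded from the denominator."""
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("at least one item must be applicable")
    return 100.0 * sum(applicable) / len(applicable)


if __name__ == "__main__":
    sample = "Take this medicine with food. Call your doctor if you feel sick."
    print(round(flesch_kincaid_grade(sample), 2))
    print(discern_total([4] * 16))
    print(round(pemat_percent([1, 1, 0, None, 1]), 1))
```

Excluding not-applicable items from the PEMAT denominator follows the published scoring instructions; the same function can be applied separately to the understandability items and the actionability items.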
Scoring Procedures

All materials were independently reviewed and scored by a single reviewer using the Flesch-Kincaid Grade Level, DISCERN, and PEMAT instruments.

Statistical Analysis

Because each gastrointestinal topic had matched content from both sources, paired analyses were performed. Paired t tests were used to compare mean scores between the professional society website and ChatGPT for Flesch-Kincaid Grade Level, DISCERN total score, PEMAT understandability, and PEMAT actionability. Mean differences and 95% confidence intervals were calculated. Statistical significance was defined as a 2-sided P value less than .05. Statistical analyses were performed using R statistical software (R Foundation for Statistical Computing).

Ethics Statement

This study used publicly available information and did not involve human participants. Institutional review board approval was not required.

Results

A total of 50 paired gastrointestinal patient education topics were analyzed (Table 1). The mean Flesch-Kincaid Grade Level was 8.72 for the professional society materials and 10.33 for ChatGPT-generated responses (mean difference, 1.61; 95% CI, 0.89 to 2.32; P = .00012).

Table 1. Comparison of Readability, Quality, Understandability, and Actionability of ChatGPT vs American Gastroenterological Association Patient Education Materials

Metric                        n pairs   GI mean   ChatGPT mean   Mean diff   95% CI of diff    P value
Flesch-Kincaid Grade Level    50         8.72     10.33          +1.61        0.89 to 2.32     0.00012
DISCERN total (16 to 80)      50        64.30     63.52          −0.78       −3.10 to 1.53     0.490
PEMAT Understandability (%)   50        86.52     87.91          +1.39       −1.48 to 4.26     0.325
PEMAT Actionability (%)       50        77.93     78.57          +0.63       −3.14 to 4.40     0.732

Note: Abbreviations: CI, confidence interval; DISCERN, Quality Criteria for Consumer Health Information; GI, gastrointestinal; PEMAT, Patient Education Materials Assessment Tool. Values represent paired comparisons (ChatGPT minus GI website). Paired t tests were used.
Statistical significance was defined as a 2-sided P value < .05.

The mean DISCERN total score was 64.30 for the professional society materials and 63.52 for ChatGPT-generated responses (mean difference, −0.78; 95% CI, −3.10 to 1.53; P = .49). Mean PEMAT understandability scores were 86.52% for the professional society materials and 87.91% for ChatGPT-generated responses (mean difference, 1.39%; 95% CI, −1.48% to 4.26%; P = .33). Mean PEMAT actionability scores were 77.93% for the professional society materials and 78.57% for ChatGPT-generated responses (mean difference, 0.63%; 95% CI, −3.14% to 4.40%; P = .73).

Discussion

Previous studies have evaluated AI-generated patient education for individual gastrointestinal procedures and conditions; however, a specialty-wide comparison of AI-generated gastrointestinal patient education with professional society materials using validated health literacy instruments has not been performed.[9] In this study, patient education materials from a professional gastrointestinal society website were compared with content generated by ChatGPT using validated measures of readability, quality, understandability, and actionability.[7,8] ChatGPT responses were written at a higher reading level than website materials, whereas DISCERN quality and PEMAT understandability and actionability scores were similar between sources. These findings suggest that AI-generated patient education may provide information comparable in quality and usability to traditional sources; however, readability remains an important limitation.

Using the Flesch-Kincaid Grade Level, ChatGPT responses across 50 gastrointestinal topics had a mean reading level of 10.33 compared with 8.72 for materials from the American Gastroenterological Association patient education website, a statistically significant difference.
[7] Approximately 85% of the general public reads at or below a sixth- to eighth-grade level, and the American Medical Association recommends that patient education materials be written within this range.[10-15] The professional society materials more closely approached this target. Large language models are trained on large-scale internet text corpora and therefore reflect the linguistic characteristics of their training data. Because online health information often exceeds recommended literacy levels, AI-generated responses may reproduce this complexity and be written at higher reading levels.[16,17] From this perspective, the higher reading level observed in AI-generated content underscores the need to improve readability in AI-generated patient education.

DISCERN is a validated instrument designed to evaluate the reliability and quality of written health information about treatment choices.[18] The tool assesses transparency of sources, balance of information, and clarity in the description of treatment options, risks, and benefits, ultimately providing an overall quality rating relevant to patient decision-making. In this study, mean DISCERN scores for both ChatGPT-generated responses (63.52) and American Gastroenterological Association materials (64.30) were in the low 60s, at the boundary between the “good” and “excellent” ranges defined by the DISCERN handbook (excellent, 63 to 75; good, 51 to 62; fair, 39 to 50; poor, 27 to 38; very poor, 15 to 26).[19,20] Because these bands span 15 to 75 whereas totals in this study could range from 16 to 80, the classification should be read as approximate. These findings suggest that, although not perfect, AI-generated health information can meet validated quality benchmarks similar to those achieved by national medical organizations, highlighting the emerging role of AI in patient education.

The Patient Education Materials Assessment Tool evaluates whether health information is understandable and actionable for patients.
[21] Understandability reflects how easily readers from diverse backgrounds can process the content, whereas actionability measures whether clear and specific steps are provided for patients to follow. Scores at or above the commonly used 70% threshold indicate acceptable performance.[22] In this analysis, no statistically significant differences in PEMAT scores were observed between ChatGPT-generated content and professional society materials, suggesting similar usability for patient audiences. Together, these findings support the concept that AI-generated health information may serve as a supplemental resource for patient education.[23] This is particularly relevant as digital health tools continue to evolve, including emerging platforms designed to securely connect medical records and wellness applications to help ground health conversations in individualized patient information.

This study has several notable strengths. It directly compared AI-generated gastrointestinal patient education with materials from a major professional society using a paired design, which minimized topic variability and enabled a rigorous head-to-head evaluation using identical prompts. Multiple validated and widely accepted health literacy instruments were used, including the Flesch-Kincaid Grade Level for readability, the DISCERN instrument for quality and reliability, and the Patient Education Materials Assessment Tool for understandability and actionability, providing a multidimensional assessment rather than reliance on a single metric. The focus on gastrointestinal patient education addresses a gap in the literature, as no prior study to our knowledge has conducted a direct comparison between AI-generated content and professional society resources in this specialty using validated tools.
The paired statistical approach with prespecified outcomes, confidence intervals, and hypothesis testing strengthens the analytic rigor and allows interpretation of both statistical significance and effect size. The findings are also clinically relevant, given the central role of patient education in health literacy, shared decision making, and adherence, and the increasing integration of digital health tools into how patients seek information. Finally, the study provides a reproducible framework that can be applied to other specialties, organizations, and AI systems, supporting future research in this rapidly evolving area.

Several considerations warrant discussion. The analysis was limited to a single professional society website and one large language model, which may limit generalizability to other organizations, online resources, and AI systems. Because AI models evolve rapidly, these findings reflect performance at a single time point. The study evaluated a limited sample of gastrointestinal topics, and although the paired design improves comparability, the sample may not represent the full range of patient education materials available online. Although validated instruments were used, these tools have inherent constraints. The Flesch-Kincaid Grade Level assesses sentence length and word complexity but does not fully capture comprehension or cultural appropriateness. DISCERN evaluates reliability and treatment information quality but does not directly assess factual accuracy, and PEMAT measures understandability and actionability without determining whether patients would correctly apply the information in real-world settings. The study did not evaluate patient comprehension, trust, behavioral change, or clinical outcomes, and did not directly assess accuracy, safety, or potential harm. AI responses were generated in a controlled setting using standardized prompts, whereas real-world patient interactions are more variable and iterative.
Finally, scoring was performed by a single reviewer, which may introduce subjectivity and precludes assessment of interrater reliability; future studies should incorporate multiple independent reviewers and interrater reliability analysis.

Conclusion

In this cross-sectional comparative study, gastrointestinal patient education generated by ChatGPT demonstrated similar quality, understandability, and actionability compared with materials from a professional gastrointestinal society, while its reading level remained higher than recommended patient literacy targets. These findings suggest that AI-generated patient education may serve as a supplemental source of health information, but improvements in readability remain necessary to optimize accessibility and health equity. As patients increasingly use conversational AI tools to seek medical information, ensuring that generated content aligns with established health literacy standards will be important for safe and effective integration into patient education.

Future research should expand evaluation of AI-generated patient education across additional medical specialties, professional organizations, and large language models to assess generalizability. Studies incorporating assessments of factual accuracy, clinical safety, and potential harm are also needed. Importantly, patient-centered outcomes, including comprehension, trust, decision making, and behavioral change, should be evaluated in real-world settings. Further work should focus on strategies to improve readability and accessibility of AI-generated health information, including prompt design, model fine-tuning, and integration of health literacy guidelines into AI development.

Declarations

Author Contributions

S.C. conceived the study, designed the methodology, collected the data, performed the analysis, and drafted the manuscript. V.K. assisted with study design, data collection, and manuscript editing. R.K.N.
provided subject matter expertise, supervised the project, and contributed to manuscript revision. A.A. provided senior oversight, contributed to study design and interpretation of results, and critically revised the manuscript. All authors reviewed and approved the final manuscript.

Data Availability

The datasets generated and analyzed during the current study are available from the corresponding author on reasonable request. The study used publicly available patient education materials from the American Gastroenterological Association website and ChatGPT-generated responses created using standardized prompts.

References

1. Giguère A, Zomahoun HTV, Carmichael PH, et al. Printed educational materials: effects on professional practice and healthcare outcomes. Cochrane Database Syst Rev. 2020;8(8):CD004398. Published 2020 Jul 31. doi:10.1002/14651858.CD004398.pub4
2. Eltorai AE, Ghanian S, Adams CA Jr, Born CT, Daniels AH. Readability of patient education materials on the American Association for Surgery of Trauma website. Arch Trauma Res. 2014;3(2):e18161. doi:10.5812/atr.18161. PMID: 25147778; PMCID: PMC4139691.
3. Will J, Gupta M, Zaretsky J, Dowlath A, Testa P, Feldman J. Enhancing the Readability of Online Patient Education Materials Using Large Language Models: Cross-Sectional Study. J Med Internet Res. 2025;27:e69955. doi:10.2196/69955. PMID: 40465378; PMCID: PMC12177420.
4. Javaid M, Haleem A, Singh RP. ChatGPT for Healthcare Services: An Emerging Stage for an Innovative Perspective. BenchCouncil Transactions on Benchmarks, Standards and Evaluations. 2023;3:100105. doi:10.1016/j.tbench.2023.100105
5. Gibson D, Jackson S, Shanmugasundaram R, et al. Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment. J Med Internet Res. 2024;26:e55939. Published 2024 Aug 14. doi:10.2196/55939
6. Shafau F, Wahl C. Evaluating the Readability of AI-Generated Patient Information on Chronic Diseases. Chronic Dis Transl Med. 2025;11(4):316–317. Published 2025 Sep 1. doi:10.1002/cdt3.70020
7. American Gastroenterological Association. AGA GI Patient Center. Accessed February 17, 2026. https://patient.gastro.org/
8. OpenAI. ChatGPT. Version GPT-5.2. OpenAI; 2026. Accessed February 18, 2026. https://chat.openai.com/
9. Miller M, Mistry R, Hixson W, Behers B, Hamad K, Ramos Ortiz G, Jerez Diaz D. S2961 Assessing the Quality and Readability of AI Chatbot-Generated Patient Education Materials for Endoscopic Retrograde Cholangiopancreatography. Am J Gastroenterol. 2025;120(10S2):S637. doi:10.14309/01.ajg.0001139304.44763.5e
10. Patel AJ, Kloosterboer A, Yannuzzi NA, Venkateswaran N, Sridhar J. Evaluation of the Content, Quality, and Readability of Patient Accessible Online Resources Regarding Cataracts. Semin Ophthalmol. 2021;36(5–6):384–391. doi:10.1080/08820538.2021.1893758. Epub 2021 Feb 26. PMID: 33634726; PMCID: PMC8328867.
11. About readability. Accessed February 17, 2026. https://readable.com/readability/
12. Weiss BD. Health literacy: A manual for clinicians. Chicago, IL: American Medical Association Foundation and American Medical Association; 2003.
13. Weiss BD, Coyne C. Communicating with patients who cannot read. N Engl J Med. 1997;337(4):272–4. doi:10.1056/NEJM199707243370411
14. Cotugna N, Vickery CE, Carpenter-Haefele KM. Evaluation of literacy level of patient education pages in health-related journals. J Community Health. 2005;30(3):213–9. doi:10.1007/s10900-004-1959-x
15. Doak LG, Doak CC, Meade CD. Strategies to improve cancer education materials. Oncol Nurs Forum. 1996;23(8):1305–12.
16. Rahimli Ocakoglu S, Coskun B. The emerging role of AI in patient education: a comparative analysis of LLM accuracy for pelvic organ prolapse. Med Princ Pract. 2024;33(4):330–337. doi:10.1159/000538538
17. Pal A, Wangmo T, Bharadia T, Ahmed-Richards M, Bhanderi MB, Kachhadiya R, Allemann SS, Elger BS. Generative AI/LLMs for Plain Language Medical Information for Patients, Caregivers and General Public: Opportunities, Risks and Ethics. Patient Prefer Adherence. 2025;19:2227–2249. doi:10.2147/PPA.S527922. PMID: 40771655; PMCID: PMC12325106.
18. Charnock D, Shepperd S, Needham G, Gann R. DISCERN: an instrument for judging the quality of written consumer health information on treatment choices. J Epidemiol Community Health. 1999;53(2):105–111. doi:10.1136/jech.53.2.105
19. Boyer C, Selby M, Scherrer JR, Appel RD. The Health On the Net Code of Conduct for medical and health Websites. Comput Biol Med. 1998;28(5):603–10. doi:10.1016/s0010-4825(98)00037-7. PMID: 9861515.
20. Ozduran E, Büyükçoban S. Evaluating the readability, quality and reliability of online patient education materials on post-covid pain. PeerJ. 2022;10:e13686. doi:10.7717/peerj.13686. PMID: 35880220; PMCID: PMC9308460.
21. Furukawa E, Okuhara T, Liu M, Okada H, Kiuchi T. Evaluating Online and Offline Health Information With the Patient Education Materials Assessment Tool: Protocol for a Systematic Review. JMIR Res Protoc. 2025;14:e63489. doi:10.2196/63489. PMID: 39813665; PMCID: PMC11780281.
22. Shoemaker SJ, Wolf MS, Brach C. Development of the Patient Education Materials Assessment Tool (PEMAT): a new measure of understandability and actionability for print and audiovisual patient information. Patient Educ Couns. 2014;96(3):395–403. doi:10.1016/j.pec.2014.05.027. Epub 2014 Jun 12. PMID: 24973195; PMCID: PMC5085258.
23. OpenAI. Introducing ChatGPT Health. Published January 7, 2026. Accessed February 17, 2026. https://openai.com/index/introducing-chatgpt-health/

Additional Declarations

No competing interests reported.
Reviews received at journal 06 May, 2026
Reviews received at journal 20 Apr, 2026
Reviewers agreed at journal 13 Apr, 2026
Reviewers agreed at journal 13 Apr, 2026
Reviewers agreed at journal 11 Apr, 2026
Reviews received at journal 09 Apr, 2026
Reviewers agreed at journal 09 Apr, 2026
Reviewers invited by journal 09 Apr, 2026
Editor assigned by journal 24 Feb, 2026
Submission checks completed at journal 24 Feb, 2026
First submitted to journal 22 Feb, 2026
© Research Square 2026 | ISSN 2693-5015 (online)
{"props":{"pageProps":{"initialData":{"identity":"rs-8940111","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":622459358,"identity":"8d579eb3-e80a-4dc7-b723-d7c7edc970d5","order_by":0,"name":"Shivam Chandra","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA00lEQVRIie3RsQrCMBSF4Ugg0wXXhmL7CoFAp+KzNAS6ujp20s2uLfgQQqHzxYIugmsHl1JwcmhxVzs4CZK6OeTbAvnJgRBiWX+JIvaPEKbv4yQxJ0w1GYs9ntDRCUgJbC8Fjk38NQY8A1TF8VzdgYSzHRoScYpip3Muqjxp5gKJpTkh0YFn4qpK1IwCqZQx8dNm5UJUqSJt6TDsaU5IrakErKRwNBmGoTkR9XXS5EnsOXUb8K3QMjcPW3TYJ8NXpqrtbsv5bGMc9vHob9cty7KsL15XWkhzHMEo6QAAAABJRU5ErkJggg==","orcid":"","institution":"A.T. Still University","correspondingAuthor":true,"prefix":"","firstName":"Shivam","middleName":"","lastName":"Chandra","suffix":""},{"id":622459361,"identity":"8c630e0a-af19-4828-9df3-990d86673b10","order_by":1,"name":"Vineet Kumar","email":"","orcid":"","institution":"Michigan State University","correspondingAuthor":false,"prefix":"","firstName":"Vineet","middleName":"","lastName":"Kumar","suffix":""},{"id":622459365,"identity":"8c4fed37-cca9-4b45-8a4a-876baae2e677","order_by":2,"name":"Robert Kwei-Nsoro","email":"","orcid":"","institution":"John H. Stroger, Jr. Hospital of Cook County","correspondingAuthor":false,"prefix":"","firstName":"Robert","middleName":"","lastName":"Kwei-Nsoro","suffix":""},{"id":622459367,"identity":"e750d658-0b02-4e11-a2ac-5efb6938e843","order_by":3,"name":"Anas Almoghrabi","email":"","orcid":"","institution":"John H. Stroger, Jr. 
Hospital of Cook County","correspondingAuthor":false,"prefix":"","firstName":"Anas","middleName":"","lastName":"Almoghrabi","suffix":""}],"badges":[],"createdAt":"2026-02-22 15:38:15","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-8940111/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-8940111/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":107705125,"identity":"1607438f-4d43-45e4-8fab-9e763e67b5fd","added_by":"auto","created_at":"2026-04-24 09:08:25","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":186224,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-8940111/v1/a37bea01-2d22-44df-b2b2-34465a30ff8d.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"Readability, Quality, Understandability, and Actionability of ChatGPT Generated GI Patient Education vs AGA Patient Center","fulltext":[{"header":"Introduction","content":"\u003cp\u003ePatient education materials are a major source of health information and strongly influence clinical outcomes, adherence, and shared decision making.\u003csup\u003e\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e\u003c/sup\u003e Major organizations, including the American Medical Association and the National Institutes of Health, recommend that patient-facing materials be written at a sixth- to eighth-grade reading level to ensure accessibility.\u003csup\u003e\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e\u003c/sup\u003e However, prior studies have consistently shown that online health information often exceeds recommended readability thresholds, even as patients increasingly turn to the internet for medical information.\u003csup\u003e\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e\u003c/sup\u003e\u003c/p\u003e 
\u003cp\u003eLarge language models such as ChatGPT are rapidly emerging and potential new sources of health information, offering immediate access, conversational responses, and personalized explanations.\u003csup\u003e\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e\u003c/sup\u003e Despite these advantages, concerns remain regarding the quality, reliability, readability, and actionability of AI-generated content, highlighting the need for rigorous evaluation using validated tools.\u003csup\u003e\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e\u003c/sup\u003e Prior research has assessed the readability and quality of online health information and has begun examining AI-generated medical responses using validated measures such as the Flesch-Kincaid Grade Level, DISCERN, and the Patient Education Materials Assessment Tool.\u003csup\u003e\u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e\u003c/sup\u003e However, gastrointestinal patient education has not been specifically studied in this context, and no prior study has directly compared AI-generated gastrointestinal patient education with professional society materials using validated health literacy tools.\u003c/p\u003e \u003cp\u003eThis study compares the readability, quality, understandability, and actionability of gastrointestinal patient education from the American Gastroenterological Association patient information website and ChatGPT using validated instruments in a paired design. 
We hypothesized that AI-generated content would demonstrate comparable quality and usability but poorer readability compared with professional society materials.\u003c/p\u003e"},{"header":"Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003eStudy Design and Data Sources\u003c/h2\u003e \u003cp\u003eThis cross sectional comparative study evaluated gastrointestinal patient education materials obtained from the American Gastroenterological Association patient education website and from ChatGPT (OpenAI).\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e,\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e\u003c/sup\u003e Fifty gastrointestinal patient education topics were identified from the AGA website. Each topic title was entered as a standardized prompt to generate a corresponding ChatGPT response, creating paired content for analysis. ChatGPT responses were generated during a single study period using a new chat session for each topic to minimize carryover context. Prompts were written in plain language to reflect typical patient questions. No follow-up prompts or iterative refinement were used, and responses were copied verbatim without editing prior to analysis. The AGA webpage content and the corresponding ChatGPT response for each topic were then independently evaluated using the study instruments, and each topic pair served as the unit of analysis.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eOutcomes and Measurement Instruments\u003c/h3\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003eReadability\u003c/h2\u003e \u003cp\u003eReadability was assessed using the Flesch Kincaid Grade Level formula, which estimates the US school grade level required to understand written content. Lower scores indicate easier readability and improved accessibility for patients. 
Readability metrics were calculated in R using the quanteda text analysis package.\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eQuality of Treatment Information\u003c/h3\u003e\n\u003cp\u003eQuality and reliability were evaluated using the DISCERN instrument, a validated 16-item tool designed to assess consumer health information about treatment choices. Each item was scored on a 5-point scale, and item scores were summed to generate a total score ranging from 16 to 80, with higher scores indicating higher quality and reliability.\u003c/p\u003e\n\u003ch3\u003eUnderstandability and Actionability\u003c/h3\u003e\n\u003cp\u003eUnderstandability and actionability were evaluated using the Patient Education Materials Assessment Tool (PEMAT). PEMAT understandability measures how easily readers can process and explain the information, while PEMAT actionability evaluates whether the material provides clear steps for readers to take. Scores were calculated as percentages according to published PEMAT scoring instructions, with higher scores indicating better performance.\u003c/p\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003eScoring Procedures\u003c/h2\u003e \u003cp\u003eAll materials were independently reviewed and scored by a single reviewer using the Flesch-Kincaid Grade Level, DISCERN, and PEMAT instruments.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003eStatistical Analysis\u003c/h2\u003e \u003cp\u003eBecause each gastrointestinal topic had matched content from both sources, paired analyses were performed. Paired t tests were used to compare mean scores between the professional society website and ChatGPT for Flesch-Kincaid Grade Level, DISCERN total score, PEMAT understandability, and PEMAT actionability. Mean differences (ChatGPT minus professional society website) and 95% confidence intervals were calculated. Statistical significance was defined as a 2-sided P value less than .05. 
Statistical analyses were performed using R statistical software (R Foundation for Statistical Computing).\u003c/p\u003e \u003c/div\u003e\n\u003ch3\u003eEthics Statement\u003c/h3\u003e\n\u003cp\u003eThis study used publicly available information and did not involve human participants. Institutional review board approval was not required.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eA total of 50 paired gastrointestinal patient education topics were analyzed (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). The mean Flesch Kincaid Grade Level was 8.72 for the professional society materials and 10.33 for ChatGPT generated responses (mean difference, 1.61; 95% CI, 0.89 to 2.32; \u003cb\u003eP = .00012\u003c/b\u003e).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003e\u003cb\u003eComparison of Readability, Quality, Understandability, and Actionability of ChatGPT vs American Gastroenterological Association Patient Education Materials\u003c/b\u003e\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" 
colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMetric\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003en pairs\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eGI mean\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eChatGPT mean\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003eMean difference\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003e95% CI\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eP value\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eFlesch-Kincaid Grade Level\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e8.72\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e10.33\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e+\u0026thinsp;1.61\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e0.89 to 2.32\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.00012\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eDISCERN total (16 to 80)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e64.30\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e 
\u003cp\u003e63.52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e-0.78\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-3.10 to 1.53\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.490\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePEMAT Understandability (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e86.52\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e87.91\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e+\u0026thinsp;1.39\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-1.48 to 4.26\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.325\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003ePEMAT Actionability (%)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e50\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e77.93\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e78.57\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e+\u0026thinsp;0.63\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c6\"\u003e \u003cp\u003e-3.14 to 4.40\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e0.732\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003ctfoot\u003e 
\u003ctr\u003e\u003ctd colspan=\"7\"\u003eAbbreviations: CI, confidence interval; DISCERN, Quality Criteria for Consumer Health Information; GI, gastrointestinal; PEMAT, Patient Education Materials Assessment Tool.\u003c/td\u003e\u003c/tr\u003e \u003ctr\u003e\u003ctd colspan=\"7\"\u003eMean differences are ChatGPT minus GI website. Paired t tests were used. Statistical significance was defined as a 2-sided P value \u0026lt; .05.\u003c/td\u003e\u003c/tr\u003e \u003c/tfoot\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eThe mean DISCERN total score was 64.30 for the professional society materials and 63.52 for ChatGPT generated responses (mean difference, \u0026minus;0.78; 95% CI, \u0026minus;3.10 to 1.53; P = .49).\u003c/p\u003e \u003cp\u003eMean PEMAT understandability scores were 86.52% for the professional society materials and 87.91% for ChatGPT generated responses (mean difference, 1.39%; 95% CI, \u0026minus;1.48% to 4.26%; P = .33).\u003c/p\u003e \u003cp\u003eMean PEMAT actionability scores were 77.93% for the professional society materials and 78.57% for ChatGPT generated responses (mean difference, 0.63%; 95% CI, \u0026minus;3.14% to 4.40%; P = .73).\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003ePrevious studies have evaluated AI-generated patient education for individual gastrointestinal procedures and conditions; however, a specialty-wide comparison of AI-generated gastrointestinal patient education with professional society materials using validated health literacy instruments has not been performed.\u003csup\u003e\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e\u003c/sup\u003e In this study, patient education materials from a professional gastrointestinal society website were compared with content generated by ChatGPT using validated measures of readability, quality, understandability, and actionability. 
\u003csup\u003e7,8\u003c/sup\u003e ChatGPT responses were written at a higher reading level than website materials, whereas DISCERN quality and PEMAT understandability and actionability scores were similar between sources. These findings suggest that AI-generated patient education may provide information comparable in quality and usability to traditional sources; however, readability remains an important limitation.\u003c/p\u003e \u003cp\u003eUsing the Flesch-Kincaid Grade Level, ChatGPT responses across 50 gastrointestinal conditions had a mean reading level of 10.33 compared with 8.72 for materials from the American Gastroenterological Association patient education website, a statistically significant difference.\u003csup\u003e\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e\u003c/sup\u003e Approximately 85% of the general public reads at or below a sixth to eighth grade level, and the American Medical Association recommends that patient education materials be written within this range.\u003csup\u003e\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e,\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e,\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e,\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e,\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e,\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u003c/sup\u003e The professional society materials more closely approached this target. Large language models are trained on large scale internet text corpora and therefore reflect the linguistic characteristics of their training data. 
Because online health information often exceeds recommended literacy levels, AI generated responses may reproduce this complexity and be written at higher reading levels.\u003csup\u003e\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e,\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e\u003c/sup\u003e From this perspective, the higher reading level observed in AI generated content underscores the need to improve readability in AI generated patient education.\u003c/p\u003e \u003cp\u003eDISCERN is a validated instrument designed to evaluate the reliability and quality of written health information about treatment choices.\u003csup\u003e\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e\u003c/sup\u003e The tool assesses transparency of sources, balance of information, and clarity in the description of treatment options, risks, and benefits, ultimately providing an overall quality rating relevant to patient decision-making. 
In this study, mean DISCERN scores were 63.52 for ChatGPT-generated responses and 64.30 for American Gastroenterological Association materials, both at the lower end of the \u0026ldquo;excellent\u0026rdquo; range defined by the DISCERN handbook (excellent, 63 to 75; good, 51 to 62; fair, 39 to 50; poor, 27 to 38; very poor, 15 to 26).\u003csup\u003e\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e,\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e\u003c/sup\u003e Because these bands span 15 to 75 whereas total scores in this study could range from 16 to 80, this classification is approximate. These findings suggest that, although not perfect, AI-generated health information can meet validated quality benchmarks similar to those achieved by national medical organizations, highlighting the emerging role of AI in patient education.\u003c/p\u003e \u003cp\u003eThe Patient Education Materials Assessment Tool evaluates whether health information is understandable and actionable for patients.\u003csup\u003e\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u003c/sup\u003e Understandability reflects how easily readers from diverse backgrounds can process the content, whereas actionability measures whether clear and specific steps are provided for patients to follow. Scores at or above the commonly used 70% threshold indicate acceptable performance.\u003csup\u003e\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e\u003c/sup\u003e In this analysis, mean scores from both sources exceeded this threshold for understandability (86.52% and 87.91%) and actionability (77.93% and 78.57%), and no statistically significant differences in PEMAT scores were observed between ChatGPT-generated content and professional society materials, suggesting similar usability for patient audiences. 
Together, these findings support the concept that AI-generated health information may serve as a supplemental resource for patient education.\u003csup\u003e\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e\u003c/sup\u003e This is particularly relevant as digital health tools continue to evolve, including emerging platforms designed to securely connect medical records and wellness applications to help ground health conversations in individualized patient information.\u003c/p\u003e \u003cp\u003eThis study has several notable strengths. It directly compared AI generated gastrointestinal patient education with materials from a major professional society using a paired design, which minimized topic variability and enabled a rigorous head to head evaluation using identical prompts. Multiple validated and widely accepted health literacy instruments were used, including the Flesch Kincaid Grade Level for readability, the DISCERN instrument for quality and reliability, and the Patient Education Materials Assessment Tool for understandability and actionability, providing a multidimensional assessment rather than reliance on a single metric. The focus on gastrointestinal patient education addresses a gap in the literature, as no prior study to our knowledge has conducted a direct comparison between AI generated content and professional society resources in this specialty using validated tools. The paired statistical approach with prespecified outcomes, confidence intervals, and hypothesis testing strengthens the analytic rigor and allows interpretation of both statistical significance and effect size. The findings are also clinically relevant, given the central role of patient education in health literacy, shared decision making, and adherence, and the increasing integration of digital health tools into how patients seek information. 
Finally, the study provides a reproducible framework that can be applied to other specialties, organizations, and AI systems, supporting future research in this rapidly evolving area.\u003c/p\u003e \u003cp\u003eSeveral considerations warrant discussion. The analysis was limited to a single professional society website and one large language model, which may limit generalizability to other organizations, online resources, and AI systems. Because AI models evolve rapidly, these findings reflect performance at a single time point. The study evaluated a limited sample of gastrointestinal topics, and although the paired design improves comparability, the sample may not represent the full range of patient education materials available online. Although validated instruments were used, these tools have inherent constraints. The Flesch Kincaid Grade Level assesses sentence length and word complexity but does not fully capture comprehension or cultural appropriateness. DISCERN evaluates reliability and treatment information quality but does not directly assess factual accuracy, and PEMAT measures understandability and actionability without determining whether patients would correctly apply the information in real world settings. The study did not evaluate patient comprehension, trust, behavioral change, or clinical outcomes, and did not directly assess accuracy, safety, or potential harm. AI responses were generated in a controlled setting using standardized prompts, whereas real world patient interactions are more variable and iterative. 
Finally, scoring was performed by a single reviewer, which may introduce subjectivity and precludes assessment of interrater reliability; future studies should incorporate multiple independent reviewers and interrater reliability analysis.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn this cross-sectional comparative study, gastrointestinal patient education generated by ChatGPT demonstrated similar quality, understandability, and actionability compared with materials from a professional gastrointestinal society, while its reading grade level remained above recommended patient literacy targets. These findings suggest that AI-generated patient education may serve as a supplemental source of health information, but improvements in readability remain necessary to optimize accessibility and health equity. As patients increasingly use conversational AI tools to seek medical information, ensuring that generated content aligns with established health literacy standards will be important for safe and effective integration into patient education.\u003c/p\u003e \u003cp\u003eFuture research should expand evaluation of AI-generated patient education across additional medical specialties, professional organizations, and large language models to assess generalizability. Studies incorporating assessments of factual accuracy, clinical safety, and potential harm are also needed. Importantly, patient-centered outcomes, including comprehension, trust, decision making, and behavioral change, should be evaluated in real-world settings. Further work should focus on strategies to improve readability and accessibility of AI-generated health information, including prompt design, model fine-tuning, and integration of health literacy guidelines into AI development.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eS.C. 
conceived the study, designed the methodology, collected the data, performed the analysis, and drafted the manuscript. V.K. assisted with study design, data collection, and manuscript editing. R.K.N. provided subject matter expertise, supervised the project, and contributed to manuscript revision. A.A. provided senior oversight, contributed to study design and interpretation of results, and critically revised the manuscript. All authors reviewed and approved the final manuscript.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets generated and analyzed during the current study are available from the corresponding author on reasonable request. The study used publicly available patient education materials from the American Gastroenterological Association website and ChatGPT-generated responses created using standardized prompts.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eGigu\u0026egrave;re A, Zomahoun HTV, Carmichael PH, et al. Printed educational materials: effects on professional practice and healthcare outcomes. Cochrane Database Syst Rev. 2020;8(8):CD004398. Published 2020 Jul 31. doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/14651858.CD004398.pub4\u003c/span\u003e\u003cspan address=\"10.1002/14651858.CD004398.pub4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eEltorai AE, Ghanian S, Adams CA Jr, Born CT, Daniels AH. Readability of patient education materials on the american association for surgery of trauma website. Arch Trauma Res. 2014;3(2):e18161. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.5812/atr.18161\u003c/span\u003e\u003cspan address=\"10.5812/atr.18161\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. 
PMID: 25147778; PMCID: PMC4139691.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWill J, Gupta M, Zaretsky J, Dowlath A, Testa P, Feldman J. Enhancing the Readability of Online Patient Education Materials Using Large Language Models: Cross-Sectional Study. J Med Internet Res. 2025;27:e69955. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2196/69955\u003c/span\u003e\u003cspan address=\"10.2196/69955\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. PMID: 40465378; PMCID: PMC12177420.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eJavaid M, Haleem A, Singh RP. ChatGPT for healthcare services: an emerging stage for an innovative perspective. BenchCouncil Transactions on Benchmarks, Standards and Evaluations. 2023;3:100105. doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/j.tbench.2023.100105\u003c/span\u003e\u003cspan address=\"10.1016/j.tbench.2023.100105\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGibson D, Jackson S, Shanmugasundaram R, et al. Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment. J Med Internet Res. 2024;26:e55939. Published 2024 Aug 14. doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.2196/55939\u003c/span\u003e\u003cspan address=\"10.2196/55939\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eShafau F, Wahl C. Evaluating the Readability of AI-Generated Patient Information on Chronic Diseases. Chronic Dis Transl Med. 2025;11(4):316\u0026ndash;317. Published 2025 Sep 1. 
doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1002/cdt3.70020\u003c/span\u003e\u003cspan address=\"10.1002/cdt3.70020\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAmerican Gastroenterological Association. \u003cem\u003eAGA GI Patient Center.\u003c/em\u003e Accessed February 17, 2026. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://patient.gastro.org/\u003c/span\u003e\u003cspan address=\"https://patient.gastro.org/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOpenAI. ChatGPT. Version GPT-5.2. OpenAI; 2026. Accessed February 18, 2026. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://chat.openai.com/\u003c/span\u003e\u003cspan address=\"https://chat.openai.com/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMiller M, Mistry R, Hixson W, Behers B, Hamad K, Ramos Ortiz G, Jerez Diaz D. S2961 Assessing the Quality and Readability of AI Chatbot-Generated Patient Education Materials for Endoscopic Retrograde Cholangiopancreatography. Am J Gastroenterol. 2025;120(10S2):S637. doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.14309/01.ajg.0001139304.44763.5e\u003c/span\u003e\u003cspan address=\"10.14309/01.ajg.0001139304.44763.5e\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePatel AJ, Kloosterboer A, Yannuzzi NA, Venkateswaran N, Sridhar J. Evaluation of the Content, Quality, and Readability of Patient Accessible Online Resources Regarding Cataracts. 
Semin Ophthalmol. 2021;36(5\u0026ndash;6):384\u0026ndash;391. doi: 10.1080/08820538.2021.1893758. Epub 2021 Feb 26. PMID: 33634726; PMCID: PMC8328867.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eAbout readability. Accessed February 17, 2026. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://readable.com/readability/\u003c/span\u003e\u003cspan address=\"https://readable.com/readability/\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWeiss BD. Health literacy: A manual for clinicians. Chicago, IL: American Medical Association Foundation and American Medical Association; 2003.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eWeiss BD, Coyne C. Communicating with patients who cannot read. N Engl J Med. 1997;337(4):272\u0026ndash;4. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1056/NEJM199707243370411\u003c/span\u003e\u003cspan address=\"10.1056/NEJM199707243370411\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCotugna N, Vickery CE, Carpenter-Haefele KM. Evaluation of literacy level of patient education pages in health-related journals. J Community Health. 2005;30(3):213\u0026ndash;9. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1007/s10900-004-1959-x\u003c/span\u003e\u003cspan address=\"10.1007/s10900-004-1959-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eDoak LG, Doak CC, Meade CD. Strategies to improve cancer education materials. Oncol Nurs Forum. 1996;23(8):1305\u0026ndash;12.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eRahimli Ocakoglu S, Coskun B. 
The emerging role of AI in patient education: a comparative analysis of LLM accuracy for pelvic organ prolapse. \u003cem\u003eMed Princ Pract\u003c/em\u003e. 2024;33(4):330\u0026ndash;337. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1159/000538538\u003c/span\u003e\u003cspan address=\"10.1159/000538538\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003ePal A, Wangmo T, Bharadia T, Ahmed-Richards M, Bhanderi MB, Kachhadiya R, Allemann SS, Elger BS. Generative AI/LLMs for Plain Language Medical Information for Patients, Caregivers and General Public: Opportunities, Risks and Ethics. Patient Prefer Adherence. 2025;19:2227\u0026ndash;2249. doi: 10.2147/PPA.S527922. PMID: 40771655; PMCID: PMC12325106.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eCharnock D, Shepperd S, Needham G, Gann R. DISCERN: an instrument for judging the quality of written consumer health information on treatment choices. J Epidemiol Community Health. 1999;53(2):105\u0026ndash;111. doi:\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1136/jech.53.2.105\u003c/span\u003e\u003cspan address=\"10.1136/jech.53.2.105\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eBoyer C, Selby M, Scherrer JR, Appel RD. The Health On the Net Code of Conduct for medical and health Websites. Comput Biol Med. 1998;28(5):603\u0026thinsp;\u0026ndash;\u0026thinsp;10. doi: \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/s0010-4825(98)00037-7\u003c/span\u003e\u003cspan address=\"10.1016/s0010-4825(98)00037-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. PMID: 9861515.\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOzduran E, B\u0026uuml;y\u0026uuml;k\u0026ccedil;oban S. 
Evaluating the readability, quality and reliability of online patient education materials on post-covid pain. PeerJ. 2022;10:e13686. doi: 10.7717/peerj.13686. PMID: 35880220; PMCID: PMC9308460.

Furukawa E, Okuhara T, Liu M, Okada H, Kiuchi T. Evaluating Online and Offline Health Information With the Patient Education Materials Assessment Tool: Protocol for a Systematic Review. JMIR Res Protoc. 2025;14:e63489. doi: 10.2196/63489. PMID: 39813665; PMCID: PMC11780281.

Shoemaker SJ, Wolf MS, Brach C. Development of the Patient Education Materials Assessment Tool (PEMAT): a new measure of understandability and actionability for print and audiovisual patient information. Patient Educ Couns. 2014;96(3):395–403. doi: 10.1016/j.pec.2014.05.027. Epub 2014 Jun 12. PMID: 24973195; PMCID: PMC5085258.

OpenAI. Introducing ChatGPT Health. Published January 7, 2026. Accessed February 17, 2026. https://openai.com/index/introducing-chatgpt-health/

Keywords: artificial intelligence, patient education, health literacy, gastroenterology, readability

Submitted to Digestive Diseases and Sciences on February 22, 2026. Version 1 posted April 17, 2026 (doi: 10.21203/rs.3.rs-8940111/v1). Status: under review. License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/).

