Large Language Models for Individualized Psychoeducational Tools for Psychosis: A cross-sectional study

This is a preprint and has not been peer reviewed. Data may be preliminary.
20 February 2025, Version 1
Authors: Musa Yilanli (ORCID 0000-0001-5007-5041), Ian McKay, Daniel I. Jackson, and Emre Sezgin
https://doi.org/10.22541/au.174002762.28205569/v1

Abstract

Objective
The study aims to evaluate whether a large language model's responses to psychosis-related questions provide accurate, clear, and clinically relevant individualized information for patients and caregivers.

Design
This cross-sectional study uses a qualitative analysis design. The researchers used a question-answering system (GPT-4 via ChatGPT) to generate responses to common questions about psychosis. Experts in the field then evaluated these responses to assess their quality for use in a clinical setting.

Primary Outcome
Researchers presented ChatGPT with 20 questions frequently asked by patients’ caregivers and relatives. Two experts in psychosis then assessed the quality of the responses using six criteria: accuracy (1-3), clarity (1-3), inclusivity (1-3), completeness (0-1), clinical utility (1-5), and an overall score (1-4).

Results
The evaluation yielded positive results overall. Responses were rated as accurate (M±SD = 2.88±0.22) and clear (2.93±0.18). Inclusivity scored lower (2.30±0.41), suggesting a need to incorporate more diverse perspectives. Completeness received high ratings (0.93±0.18), indicating that responses generally addressed all aspects of the questions. Most importantly, the responses were deemed clinically useful (4.35±0.52).

Conclusions
This study underscores the promise of ChatGPT as a psychoeducational tool for patients with psychosis, their relatives, and their caregivers. The experts’ ratings indicate that the information delivered by ChatGPT is accurate and clinically relevant, and that it is conveyed conversationally, which enhances its accessibility and usability. The initial performance of ChatGPT as a psychoeducational tool in the context of psychosis education is positive.

Introduction:
Psychosis, characterized by a detachment from reality, presents a significant challenge for individuals and healthcare systems alike [1]. Effective education empowers patients to understand their condition, manage symptoms, and navigate the complexities of treatment. However, traditional resources often fall short in accessibility, engagement, and personalization [2]. This study investigates the potential of GPT-4 (via the ChatGPT UI), a Large Language Model (LLM), to improve patient education for psychosis.

LLMs, trained on massive datasets of text and code, offer a promising avenue for developing interactive and adaptable learning tools. However, concerns about the accuracy and reliability of LLM-generated information remain [3]. The quality of training data significantly influences LLM output, and the vast amount of information available online makes factual accuracy difficult to ensure [4,5]. To address these concerns, this research evaluates a specific LLM’s performance through expert assessment in the context of psychosis education. We analyze the LLM’s ability to answer commonly encountered clinical questions related to the condition. Our assessment focuses on five key aspects: factual accuracy, clarity of communication, inclusivity of language, completeness of information, and overall clinical utility of the LLM-generated responses. This evaluation provides insight into the use of LLM-based chatbots for psychosis education.

Methods
We used GPT-4 (via the ChatGPT UI) as the LLM for this study. We curated 20 questions about psychotic disorders based on anecdotal evidence, drawn from questions frequently asked and documented at the pediatric behavioral health clinics (see the Appendix for the question list). ChatGPT was then prompted with each question in a new, single session. We checked response consistency by re-prompting each of the first three questions three times and comparing the responses. Two board-certified clinicians from psychiatry and psychology (co-authors MY and IM) assessed the responses. We used a six-category rating rubric covering accuracy, clarity, inclusivity, completeness, clinical utility, and overall score (see Table 1 for categories, scales, and descriptions). The rubric categories were informed by the literature [CITE related articles here to each of those categories].

Table 1. Rating rubric: categories, descriptions, and scales.

ACCURACY
Question: Is the answer an accurate reply to the question?
Description: This criterion assesses whether the response directly and correctly addresses the question asked. An "Accurate" response fully answers the question with correct information. "Inaccurate" indicates the response is incorrect or irrelevant. "Partially accurate" suggests the response is on the right track but either contains some inaccuracies or doesn’t fully address all aspects of the question.
Scale: Accurate, Partially Accurate, Inaccurate (1-3)

CLARITY
Question: Is the message clearly conveyed?
Description: This measures how easily understandable the response is. "Yes" means the response is well-structured, easy to follow, and free of jargon or ambiguity. "No" implies the explanation is confusing, poorly structured, or uses overly complex language. "Partially" indicates the response has some clarity but may be improved in certain areas for better understanding.
Scale: Yes, Partially, No (1-3)

INCLUSIVITY
Question: Is the message inclusive for a diverse range of recipients regardless of race/ethnicity, culture, etc.?
Description: This criterion evaluates whether the response is culturally sensitive and appropriate, and considers a diverse audience. A response marked as "Yes" demonstrates awareness and respect for different cultural, racial, and ethnic backgrounds, ensuring the content is relevant and appropriate for a wide range of recipients. "No" indicates that the response may be too specific to one group and might not be suitable or sensitive to the needs and perspectives of others. "Partially" suggests the response makes some effort towards inclusivity but could be improved to better address a broader audience.
Scale: Yes, Partially, No (1-3)

COMPLETENESS
Question: Does the response completely answer the question?
Description: This evaluates whether the response fully addresses all elements of the question. "Complete" indicates that the response covers all aspects of the question comprehensively. "Incomplete" means the response misses one or more critical elements of the question.
Scale: Complete, Incomplete (0-1)

CLINICAL UTILITY
Statement: The response is practical in clinical context.
Description: This assesses how useful the response is in a practical, clinical context. It measures whether the information provided is likely to be used in consultations with patients and their families from a provider perspective. The scale ranges from "Strongly Disagree" (not useful at all) to "Strongly Agree" (extremely useful).
Scale: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree (1-5)

OVERALL SCORE
Question: What is the overall score?
Description: This rating reflects the confidence in the response’s accuracy, clarity, inclusivity, completeness, and utility.
Very Low: This rating suggests the effectiveness and appropriateness of the response are likely significantly different from what is perceived. It indicates major issues in accuracy, clarity, inclusivity, completeness, or utility.
Low: This rating implies that there is a considerable possibility the response’s actual effectiveness and appropriateness might differ significantly from what is perceived. While not entirely unreliable, the response has notable deficiencies.
Moderate: This rating is used when it is believed that the response is probably close to what is perceived in terms of accuracy, clarity, inclusivity, completeness, and utility. The response is reliable but might have minor areas for improvement.
High: This rating indicates a strong level of confidence in the response. It suggests that the true effectiveness and appropriateness of the response are very similar to what is perceived, showing high levels of accuracy, clarity, inclusivity, completeness, and utility.
Scale: High, Moderate, Low, Very Low (1-4)

Results:
The assessment results indicate high accuracy (Mean = 2.88, SD = 0.22) and strong clarity (Mean = 2.93, SD = 0.18) in addressing the questions. However, inclusivity received a moderate average score (Mean = 2.30, SD = 0.41), suggesting room for improvement in catering to diverse viewpoints.
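
The rubric scales and the per-category means and standard deviations reported in the Results can be illustrated with a minimal Python sketch. It assumes that each question's score is the average of the two raters' scores, that the SD is the sample standard deviation, and that the IQR uses the "inclusive" quartile method; the study does not state any of these, and the ratings below are hypothetical, not the study data. The names `SCALES`, `question_score`, and `summarize` are illustrative only.

```python
from statistics import mean, median, quantiles, stdev

# Scale bounds per rubric category, as listed in the rating rubric.
SCALES = {
    "accuracy": (1, 3),
    "clarity": (1, 3),
    "inclusivity": (1, 3),
    "completeness": (0, 1),
    "clinical_utility": (1, 5),
    "overall": (1, 4),
}


def question_score(category, rater_a, rater_b):
    """Average two raters' scores after checking them against the scale.

    Averaging the two raters is an assumption made for this sketch.
    """
    low, high = SCALES[category]
    for score in (rater_a, rater_b):
        if not low <= score <= high:
            raise ValueError(f"{category} score {score} outside {low}-{high}")
    return (rater_a + rater_b) / 2


def summarize(scores):
    """Return mean, sample SD, median, IQR, and range for one category."""
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return {
        "mean": mean(scores),
        "sd": stdev(scores),
        "median": median(scores),
        "iqr": q3 - q1,
        "min": min(scores),
        "max": max(scores),
    }


# Hypothetical accuracy ratings for four questions (NOT the study data).
rater_a = [3, 3, 2, 3]
rater_b = [3, 2, 3, 3]
accuracy = [question_score("accuracy", a, b) for a, b in zip(rater_a, rater_b)]
summary = summarize(accuracy)
print(accuracy)
print(summary)
```

Validating each score against its scale before averaging catches data-entry errors early, which matters when categories use different ranges (for example 0-1 for completeness and 1-5 for clinical utility).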

Completeness, by contrast, was highly rated (Mean = 0.93, SD = 0.18), indicating a consensus that the responses generally addressed all aspects of each question. Clinical utility also received a very positive rating (Mean = 4.35, SD = 0.52), showing favorable perceptions of the responses’ practical and clinical relevance. The overall average rating was positive (Mean = 3.55, SD = 0.46), indicating a high level of confidence among the raters across categories. Furthermore, Spearman’s rank correlation test showed a strong positive relationship between clarity and completeness (Spearman’s rho = 0.61, p = 0.004), as well as between clarity and clinical utility (Spearman’s rho = 0.54, p = 0.013). These findings suggest that perceived clarity is associated with both the completeness and the clinical relevance of the responses.

| Statistic | Accuracy | Clarity | Inclusivity | Completeness | Clinical utility | Overall score |
|---|---|---|---|---|---|---|
| Mean (SD) | 2.875 (0.2221) | 2.925 (0.1832) | 2.300 (0.4104) | 0.925 (0.1832) | 4.350 (0.5155) | 3.550 (0.4560) |
| Median | 3.000 | 3.000 | 2.000 | 1.000 | 4.500 | 3.500 |
| IQR | 0.4 | 0.0 | 0.5 | 0.0 | 0.9 | 0.9 |
| Min-max | 2.5-3.0 | 2.5-3.0 | 2.0-3.0 | 0.5-1.0 | 3.5-5.0 | 2.5-4.0 |
| Scale range | 1-3 | 1-3 | 1-3 | 0-1 | 1-5 | 1-4 |

Discussion:
This study presents evidence that ChatGPT could serve as a valuable psychoeducational tool for people with psychosis-related questions, including patients and parents. The results were promising, with the LLM demonstrating high accuracy and clear communication in answering common clinical questions. However, inclusivity of diverse perspectives needs improvement. ChatGPT may offer a more approachable and engaging experience than conventional resources.

Conclusion:
Leveraging large language models (LLMs) like ChatGPT-4 as a psychoeducational tool in psychosis holds promise for both patients and caregivers. Our results highlight the promise of ChatGPT-4 for delivering engaging and accessible educational resources for patients with psychosis. While traditional resources may struggle with these aspects, ChatGPT offers a potentially more approachable format.
Continued research and development of AI technologies for mental health education are crucial to further optimize their effectiveness and ensure inclusivity.

References:
1. Perrotta, G. (2020). Psychotic spectrum disorders: definitions, classifications, neural correlates and clinical profiles. Annals of Psychiatry and Treatment, 4(1), 070-084.
2. Spallek, S., Birrell, L., Kershaw, S., Devine, E. K., & Thornton, L. (2023). Can we use ChatGPT for Mental Health and Substance Use Education? Examining Its Quality and Potential Harms. JMIR Medical Education, 9(1), e51243.
3. Lundin, R. M., Berk, M., & Østergaard, S. D. (2023). ChatGPT on ECT: Can Large Language Models Support Psychoeducation? The Journal of ECT, 39(3), 130-133.
4. Lee, E. E., Torous, J., De Choudhury, M., Depp, C. A., Graham, S. A., Kim, H. C., … & Jeste, D. V. (2021). Artificial intelligence for mental health care: clinical applications, barriers, facilitators, and artificial wisdom. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 6(9), 856-864.
5. Fakhoury, M. (2019). Artificial Intelligence in Psychiatry. Advances in Experimental Medicine and Biology, 1192, 119-125. https://doi.org/10.1007/978-981-32-9721-0_6

Information
Version history: Version 1, 20 February 2025
Copyright: This work is licensed under a Non Exclusive No Reuse License.
Keywords: artificial intelligence, chat gpt, large language model, psychoeducation, psychosis

Author affiliations
Musa Yilanli (ORCID 0000-0001-5007-5041), Nationwide Children's Hospital
Ian McKay, Nationwide Children's Hospital
Daniel I. Jackson, Abigail Wexner Research Institute at Nationwide Children's Hospital
Emre Sezgin, Nationwide Children's Hospital
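
As a supplementary illustration of the rank-correlation analysis reported in the Results, the following Python sketch computes Spearman's rho with average ranks for ties, along with the t-statistic commonly used to test it. It is a sketch under stated assumptions, not the authors' analysis code: the function names are hypothetical, the example ratings are invented, and n = 20 is assumed from the number of questions.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def t_statistic(rho, n):
    """t-statistic for rho with n - 2 degrees of freedom (requires |rho| < 1)."""
    return rho * sqrt((n - 2) / (1 - rho ** 2))


# Hypothetical per-question scores (NOT the study data).
clarity = [3.0, 2.5, 3.0, 2.5, 3.0, 3.0]
completeness = [1.0, 0.5, 1.0, 1.0, 1.0, 0.5]
print(spearman_rho(clarity, completeness))

# t-statistic for the reported rho = 0.61, assuming n = 20 questions.
print(t_statistic(0.61, 20))
```

Average ranks matter here because rubric scores take only a few distinct values, so ties are frequent. Converting the t-statistic to a p-value requires the t distribution's CDF, which the Python standard library does not provide; a statistics package would be needed for that step.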