High patient and surgeon satisfaction with ChatGPT-generated responses to real patient questions regarding total knee arthroplasty.

doi:10.1186/s13018-025-06451-2

2025 · doi:10.1186/s13018-025-06451-2 · PMID:41361785 · PMC12683780

OA: gold CC-BY-NC-ND-4.0

Results

To choose typical questions from patients before TKA, 220 questions were collected from 44 patients and categorized into 52 distinct questions (Supplementary Table 1). After analyzing the frequency of these questions, we identified the 10 most frequently asked questions (Table 1). In addition to the high-frequency questions, patients raised various concerns related to health insurance reimbursement, surgery timing, surgeon selection, postoperative mobility, care after surgery, medication use, anesthesia methods, preoperative fasting, postoperative diet, hospital stay duration, and risk of deep vein thrombosis. These issues span the spectrum of perioperative patient concerns, including the procedure's safety, recovery, quality of life, and financial implications.

We then used OpenAI’s ChatGPT-3.5 to answer the 10 most frequently asked questions. Each question and its corresponding answer are detailed in Supplementary Table 2. The patient questions and the ChatGPT responses were compiled into a survey and sent to surgeons at different hospitals, who rated the accuracy and professionalism of the responses. A total of 81 questionnaires were collected from orthopedic surgeons to assess their satisfaction with the responses generated by ChatGPT in communicating with TKA patients. The mean satisfaction score for each answered question is provided in Fig. 1, revealing no significant disparity in satisfaction across different questions. Surgeons rated ChatGPT with an average satisfaction score of 4.72 out of 5; 75% of the surgeons were satisfied (score of 5), 23% were relatively satisfied (score of 4), and 2% were generally satisfied (score of 3). No surgeon reported being "unsatisfied" with the ChatGPT responses (score of 1), underscoring the value of ChatGPT in medical communication.

Fig. 1 Ratings of surgeons’ and patients' satisfaction with ChatGPT responses. The highest possible satisfaction score is 5 (5-satisfied, 4-relatively satisfied, 3-generally satisfied, 2-relatively unsatisfied, 1-unsatisfied). Questions 1–10 are the same as those in Table 1.

In addition, the orthopedic surgeons evaluated the ChatGPT responses in terms of TKA patient communication and assigned the responses an average rating of 4.41 out of 5 for professionalism and 4.36 out of 5 for accuracy, suggesting that the responses provided generally reliable medical knowledge and professional advice. We also asked surgeons how willing they were to substitute ChatGPT responses for their own. Most surgeons (55.6%) were willing to provide ChatGPT responses to answer almost all (80%-100%) of their patients' questions. A total of 29.6% were willing to provide a majority (60%-80%) of such responses to their patients, and 13.6% were willing to provide a moderate number (40%-60%) of ChatGPT responses. Only one surgeon was willing to provide just a small percentage (20%) of ChatGPT responses to patients.

To explore patient satisfaction with the ChatGPT responses, 53 patients (34 women and 19 men) were surveyed, and the mean patient satisfaction rating was 4.99/5. The mean satisfaction score for each answered question is provided in Fig. 1. The average education level of respondents was 4.8th grade. These findings demonstrated that almost all the patients were satisfied with the responses provided by ChatGPT despite a low level of education.

To assess the readability of the ChatGPT responses, we analyzed the FKGL scores. The average FKGL for all the responses was 9.45 (95% CI, 0.76), suggesting a reading level of 9th grade or higher. After ChatGPT was instructed to provide answers in an easy-to-understand manner, the mean FKGL decreased to 8.75 (95% CI, 0.62). Table 2 shows the FKGL scores for the responses to the 10 questions. An independent samples t test revealed a significant improvement in readability following the instruction to provide easier-to-understand answers (p < 0.05). The second iteration of the ChatGPT responses is provided in Supplementary Table 3. The revised responses from ChatGPT were fit for a reading level of 8th grade or above.

Table 2 The readability analysis FKGL scores
FKGL | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10
V1 | 9.95 | 8.3 | 9.03 | 10.3 | 8.98 | 9.74 | 8.94 | 10.77 | 9.64 | 8.83
V2 | 8.27 | 7.97 | 8.92 | 8.9 | 7.95 | 8.42 | 9.57 | 9.57 | 9.58 | 8.36
FKGL, Flesch-Kincaid Grade Level; Q1–Q10 correspond to the ten questions in Table 1; V1 represents the score for the first answer to each question; V2 represents the score when ChatGPT was instructed to provide an easily understandable answer.

To further investigate the factors that affect surgeons' overall satisfaction and their evaluations of the accuracy and professionalism of the ChatGPT responses, we analyzed the level of the hospital at which the surgeon worked, the surgeon's professional title, and their surgical experience. In this survey, 81 completed questionnaires were collected: 50 from third-level Grade A hospitals, 8 from second-level Grade A hospitals, and 23 from third-level Grade B hospitals. Two questionnaires were collected from medical students, 15 from resident surgeons, 31 from attending surgeons, 26 from associate surgeons, and 7 from chief surgeons. A total of 27 questionnaires were obtained from surgeons performing 0–50 surgeries per year, 5 from surgeons performing 50–100 surgeries per year, 2 from surgeons performing 100–200 surgeries per year, 4 from surgeons performing 200–500 surgeries per year, and 41 from surgeons who were only assistants during TKA surgery. The results revealed that as the hospital level increased, the overall satisfaction (r = −0.236; P < 0.001), accuracy rating (r = −0.317; P < 0.001), and professionalism rating (r = −0.429; P < 0.001) gradually decreased (Fig. 2 a–c). Surgeon title showed no significant correlation with satisfaction or professionalism ratings (Table 3). Interestingly, the professionalism ratings increased with increasing surgical experience (r = 0.093, P < 0.01) (Fig. 2 d). Finally, the quantity of surgical participation (Table 3) was significantly correlated with the surgeons' level of satisfaction (r = 0.119, P < 0.01) and accuracy rating (r = −0.092, P < 0.01) for the ChatGPT responses. The data in Fig. 2 e suggest that experienced surgeons might evaluate ChatGPT responses from a user experience perspective in addition to professional standards. However, the accuracy ratings of the ChatGPT responses improved with more advanced surgeon title (r = 0.12, P < 0.01) (Fig. 2 f).

Fig. 2 Correlations between satisfaction, accuracy, and professionalism ratings and hospital level, surgeon level, and surgical experience. a The line graph shows the correlation between overall satisfaction ratings (out of 5) and different hospital levels; as the hospital level increases, the surgeons' satisfaction ratings (out of 5) regarding the ChatGPT responses decrease. b The line graph shows the correlation between hospital levels and accuracy ratings (out of 5). As hospital levels increased, surgeons' ratings of the accuracy of the ChatGPT responses showed a decreasing trend. c The line graph shows the correlation between hospital levels and professionalism ratings (out of 5). As the hospital level increased, the surgeons' ratings of the professionalism (out of 5) of the ChatGPT responses showed a decreasing trend. d The line graph shows the correlation between the number of surgeries performed annually and professionalism ratings (out of 5). As the number of bone and joint replacement surgeries performed by surgeons on patients under their primary care increased, the surgeons' ratings of the professionalism of the ChatGPT responses tended to decrease. e The line graph shows the correlation between the number of surgeons participating in a surgery and surgeons' overall satisfaction with the ChatGPT responses (out of 5). As the number of surgeons participating in bone and joint replacement surgeries increased, the surgeons' satisfaction with the ChatGPT responses tended to decrease. f The line graph shows the correlation between the title of the surgeons and the surgeons' rating of the accuracy of the ChatGPT responses (out of 5). As the title of the surgeon became more advanced, the surgeons' rating of the accuracy of the ChatGPT responses tended to decrease.

Table 3 Correlation analysis among satisfaction, accuracy, and professionalism
Rank correlation analysis | Satisfaction | Accuracy | Professionalism
Hospital level | Rs = -0.236*** | Rs = -0.317*** | Rs = -0.429***
Medical position | Rs = -0.035 | Rs = 0.12** | Rs = 0.026
Number of surgeries | Rs = -0.013 | Rs = 0.023 | Rs = 0.093**
Surgical participation quantity | Rs = 0.119** | Rs = -0.092** | Rs = -0.033
Rs are rank correlation coefficients, with * representing P ≤ 0.05, ** representing P ≤ 0.01, and *** representing P ≤ 0.001.
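
As a rough check of the readability comparison, the sketch below recomputes the mean FKGL of each version from the Table 2 values and the t statistic for two independent samples. It assumes the equal-variance (pooled) form of Student's t test; the paper does not state which variant SPSS reported, and the function name `independent_t` is illustrative only. The threshold 2.101 is the standard two-sided 5% critical value for 18 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance

# FKGL scores transcribed from Table 2 (Q1..Q10).
v1 = [9.95, 8.3, 9.03, 10.3, 8.98, 9.74, 8.94, 10.77, 9.64, 8.83]  # first answers
v2 = [8.27, 7.97, 8.92, 8.9, 7.95, 8.42, 9.57, 9.57, 9.58, 8.36]   # "easy-to-understand" answers


def independent_t(a, b):
    """Student's independent-samples t statistic with pooled variance.

    Assumption: equal variances in both groups (hypothetical choice,
    not confirmed by the paper). Returns (t, degrees of freedom).
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


t_stat, dof = independent_t(v1, v2)

# Two-sided 5% critical value of Student's t for 18 degrees of freedom.
T_CRIT_18 = 2.101

print("mean V1:", round(mean(v1), 2))
print("mean V2:", round(mean(v2), 2))
print("t:", round(t_stat, 2), "df:", dof, "significant at 5%:", abs(t_stat) > T_CRIT_18)
```

Because both versions answer the same 10 questions, a paired test would also be a defensible design; the independent-samples form is used here only because it is the test the paper names.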

Materials

Perioperative questions were collected from 44 consecutive patients who were scheduled for TKA at our institution between January and March 2024. All 44 patients had knee osteoarthritis, consistently grade 4. Our study was originally designed to survey 50 patients. However, 6 patients were excluded for various reasons, and 44 patients ultimately participated in the questionnaire survey. This sample size was determined on the basis of feasibility constraints while aiming to capture diverse clinical concerns. While a formal power calculation was not performed for this exploratory analysis, a sample of 44 patients (providing 220 questions) was deemed sufficient to achieve thematic saturation for identifying the most frequent patient concerns, which was the primary goal of this phase. The mean age was 68.5 ± 8.2 years; 65% were female, and the mean education level was 4.8th grade. Each patient was asked to provide the five questions that concerned them most before and after TKA surgery, encompassing topics such as preoperative preparation, surgical risks, postoperative protocols, and rehabilitation. A total of 220 questions were collected for further analysis.

In this study, we used OpenAI's ChatGPT-3.5 to answer the 10 most frequently asked questions (Table 1). ChatGPT was accessed via a web browser on April 7, 2024 ( https://chat.openai.com/chat ). The 10 questions posed to ChatGPT were essentially the same as the 10 questions most frequently asked by patients, with no major differences. We initiated the queries with the prompt, "You are now an arthroplasty surgeon, and the patient is preparing for knee arthroplasty surgery. Please answer the following patients’ questions." To assess the adaptability of ChatGPT in responding to patient queries under different instructional contexts, we used two types of prompts: one using the original patient questions and another with an added instruction requesting “an easy-to-understand answer” to evaluate the model’s ability to simplify its responses for better patient comprehension. This approach was intended to simulate real-world scenarios where clinicians might refine patient inquiries to improve the clarity or educational value of AI-generated responses. All ratings of overall satisfaction, accuracy, and professionalism were based on the responses to the original patient questions, so that the core assessment of ChatGPT's performance reflected authentic patient expressions; the rephrased versions were used only for the readability comparison. The consistency of the responses was ensured by asking the questions in a controlled, single-session format.

Table 1 The top 10 questions asked by TKA patients
Number | Question | Frequency
1 | What is the cost of the operation and when do I have to pay for it? | 27
2 | Will there be any pain after the operation? | 25
3 | What is the procedure like and what implants are used to put it in? | 19
4 | How long will I be discharged from the hospital after the operation and when will I be fully recovered? | 16
5 | When will I be able to walk on the floor after the operation? | 15
6 | Will other illnesses such as high blood pressure, diabetes, uremia, or heart disease affect the operation? | 15
7 | What should I eat and drink after the operation, and should I take any supplements or other medication? | 12
8 | What does the surgical consent form contain and when should I sign it? | 11
9 | What should I do to prepare for the operation? | 10
10 | How soon will I be able to have a bowel movement after the operation? | 10

Surgeons evaluated the ChatGPT responses to assess their clinical validity, complementing the patient satisfaction data. To investigate orthopedic surgeons’ satisfaction with ChatGPT-generated responses, we designed questionnaires comprising the patient questions and the corresponding AI-generated answers on a website ( https://www.wjx.cn/ ). These questionnaires were sent to surgeons at hospitals of different levels in China (hospitals in China are ranked by size, technology, service, and research into three levels, I, II, and III, each with subgrades A, B, and C). Surgeon respondents were stratified by hospital level (III-A, II-A, III-B), title (student to chief), and annual TKA volume (0–50, 50–200, > 200). Surgeon evaluation was essential for assessing medical accuracy and professionalism, as orthopedic surgeons are ultimately responsible for patient communication. The questionnaire assessed surgeons' satisfaction with each ChatGPT response on a 5-point scale. Surgeons also rated the accuracy and professionalism of the responses on a structured 5-point Likert scale: accuracy referred to the medical correctness of the information, whereas professionalism assessed whether the response was appropriately structured, tone-appropriate, and reflective of standard medical communication practices (Supplementary Table 4).

To investigate patient satisfaction with the answers generated by ChatGPT, a questionnaire with a maximum satisfaction score of 5 was produced and given to 53 patients undergoing TKA surgery in our orthopedic department at that time. None of these patients were among the 44 patients who had provided the original questions. Patient evaluation is critical for determining real-world comprehensibility and satisfaction among end-users.

We rated the readability of both sets of ChatGPT responses on a readability website ( https://datayze.com/readability-analyser ) and used the Flesch-Kincaid Grade Level (FKGL) as the readability measure. FKGL was chosen because it is an effective method for evaluating the readability of content and is widely used in readability studies.

We used the Statistical Package for the Social Sciences 26 (SPSS 26) for data processing and analysis. Descriptive statistics, including the mean, standard deviation, and frequency distribution, were employed to summarize the basic characteristics of the dataset. Correlation analysis was conducted to investigate the factors that affect surgeons’ satisfaction with the responses and their ratings of accuracy and professionalism, such as the surgeons' professional title, the level of the hospital at which they work, and the number of TKA surgeries they perform annually. Spearman’s rank correlation was used for ordinal variables (hospital level, title). The correlation strengths were interpreted as follows: Rs = 0.1–0.3 (weak), 0.3–0.5 (moderate), and > 0.5 (strong). Independent t tests were used to compare the readability scores. Significance was established at p < 0.05.
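
For readers unfamiliar with the metric, the FKGL combines average sentence length and average syllables per word: FKGL = 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below is a minimal illustration of that formula. The helper names and the vowel-group syllable counter are assumptions introduced here, not the algorithm of the readability website used in the study, so scores may differ from the reported values.

```python
import re


def count_syllables(word):
    """Crude syllable estimate (assumption): count vowel groups and
    drop one for a trailing silent 'e'. Real tools use better rules."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def fkgl_from_counts(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * words / sentences + 11.8 * syllables / words - 15.59


def fkgl(text):
    """Estimate the FKGL of a text using the heuristic counters above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return fkgl_from_counts(len(words), len(sentences), syllables)


# Example: one of the patient questions from Table 1.
score = fkgl("Will there be any pain after the operation?")
print(round(score, 2))
```

Longer sentences and longer words both raise the grade level, which is why an instruction to answer "in an easy-to-understand manner" can lower the score without changing the medical content.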

Discussion

The present study evaluated the utility of ChatGPT in addressing the perioperative concerns of TKA patients. We found that both surgeons and patients highly rated the AI-generated responses in terms of satisfaction, accuracy, and professionalism. In our study, patients' common concerns about perioperative TKA focused on the cost of surgery, postoperative pain, surgical procedure, recovery time, mobility, dietary advice, presurgical preparation, and postoperative continence problems. These concerns reflected patients' real worries about surgical safety, the recovery process, quality of life, and financial burden. ChatGPT demonstrated a consistent ability to provide informative and comprehensible responses to these frequently raised concerns. Moreover, the surgeon group gave high ratings to ChatGPT's responses, with an average satisfaction score of 4.72/5.00. Professionalism and accuracy were rated at 4.41/5.00 and 4.36/5.00, respectively. These ratings demonstrated the value of ChatGPT in medical communication. Moreover, most surgeons indicated that they would be willing to use ChatGPT to answer 80%-100% of their patients' questions. These findings suggest that surgeons perceive ChatGPT to be reliable in conveying medical knowledge and providing professional advice, although it may not always meet the highest standards of professional medical communication. This study revealed that patients were highly satisfied with the responses provided by ChatGPT (mean score 4.99), indicating its considerable potential to improve patients' access to information. In contrast, surgeon satisfaction with ChatGPT responses was rated at 4.72, which was lower than patient satisfaction. The samples of doctors and patients included in this study are sufficient to represent these two groups. There is no difference in satisfaction between the two groups. ChatGPT, as a cutting-edge AI technology, has shown great potential in the field of medical communication [ 40 ]. 
In the field of epilepsy, Wu et al. reported that ChatGPT offered “accurate and thorough” responses to more than 50% of inquiries [ 41 ]. A dentistry study reported that ChatGPT can significantly increase the efficiency and quality of dental telemedicine [ 42 ]. Similarly, Bahar et al. reported that ChatGPT accurately and satisfactorily answered more than 90% of endometriosis questions [ 43 ]. In orthopedics, numerous studies have examined ChatGPT. Aleksander et al. reported that AI responses to THA patients' questions were satisfactory and required only minimal clarification [ 35 ]. In the specialized field of arthroplasty, ChatGPT has been shown to provide precise and detailed responses, which help alleviate patients' preoperative anxiety and enhance their understanding of the surgical procedure [ 27 , 36 ]. However, these studies did not collect questions reflecting the real needs of TKA patients, did not obtain response scores from orthopedic surgeons with different levels of experience, and did not provide evidence regarding surgeons' opinions about the use of ChatGPT to answer patient questions. Therefore, we conducted this study to investigate the effectiveness of ChatGPT in patient communication. In contrast to previous studies, we gathered many real questions from patients and sent questionnaires to surgeons at various levels in different hospitals to obtain their ratings of the ChatGPT responses. We also used a readability score to analyze the readability of these responses. At the end of the questionnaire, surgeons' recommendations regarding the ChatGPT responses were collected and analyzed. Most surgeons reported that the responses were satisfactory and accurate, but some mentioned the need for more detailed information, a review of information sources, and personalized medicine. 
Moreover, analysis of the open-ended feedback from surgeons indicated that a portion of respondents (∼15%) perceived the ChatGPT responses as sometimes being too general or lacking specific details tailored to individual patient circumstances, such as precise timelines for recovery or information on institution-specific protocols. This feedback indicated that there are still shortcomings in ChatGPT responses and areas for further improvement. Furthermore, our correlation analysis revealed that surgeons from higher-level hospitals and those with greater surgical experience rated ChatGPT’s responses lower in terms of satisfaction, accuracy, and professionalism. This may reflect their higher expectations and more critical assessment of the information provided, particularly regarding the lack of personalized medical advice and the failure to cite the most up-to-date clinical protocols. These limitations became apparent when surgeons evaluated responses to complex or context-specific questions, suggesting that while ChatGPT performs well in general patient education, it may fall short in addressing nuanced or highly individualized clinical scenarios. These findings emphasize that although AI shows promise in healthcare, its effectiveness depends on integration with healthcare professionals' knowledge and experience [ 44 ]. This finding may reflect higher expectations rather than lower intrinsic quality. Although ChatGPT's information is generally valid, accurate, and professional, it can be affected by a variety of factors. Our study revealed that the level of the hospital and the experience of the surgeon are significant factors affecting the surgeons' assessment of the ChatGPT responses. The evaluation criteria for accuracy and professionalism were explicitly defined in the survey instrument. 
Nonetheless, the subjective nature of such ratings must be acknowledged, as surgeons from different hospital tiers and with varying surgical volumes may hold divergent expectations and benchmarks for what constitutes “accurate” or “professional” content. We believe that this variability is due to differences in expertise, medical experience, background, and the application of medical technology among surgeons. Surgeons at top hospitals often have greater expertise, more experience, more advanced education, advanced equipment, extensive case databases, and multidisciplinary support. This variability highlights the importance of contextual and experiential factors in shaping clinicians’ acceptance of AI-generated information. In Mohammad Hosseini's study, significant differences in acceptance and willingness to use ChatGPT were observed between trainees (e.g., students, residents, and fellows) and faculty (e.g., professors and senior fellows). Varying levels of knowledge about ChatGPT and other LLMs may lead to different levels of acceptance of these technologies. An individual's academic background and professional training also influence his or her acceptance of ChatGPT [ 45 ]. Given the importance of the hospital and surgeon levels, future research should assess the attitudes of patients and surgeons towards AI tools and explore how these tools can be effectively integrated into existing surgeon–patient communication processes [ 43 ]. A readability analysis was conducted, and the results indicated that most of the responses were suitable for individuals with a reading level of 9th grade or higher. After we instructed ChatGPT to provide answers in an easy-to-understand manner, the reading difficulty significantly decreased. ChatGPT responses were also found to be tailorable according to the educational background of patients, providing answers suitable for their comprehension. 
Additionally, ChatGPT can provide multiple response options for a single question, thereby addressing different aspects of patients' needs. This flexibility in generating diverse responses is a notable feature that has been overlooked by most researchers. This study has several limitations. First, the participating surgeons were all from Grade 2A or higher hospitals, which are the only facilities that are qualified to perform TKA in China. Consequently, the perspectives of surgeons in primary care settings remain unexplored. The single-center design, small surgeon sample size (n = 81), and exclusion of low-literacy patients may limit the applicability of the findings. Second, the patient cohort generally had low education levels, as most patients undergoing TKA are older and grew up in an era with limited access to compulsory education. This may have influenced their understanding and evaluation of the responses provided by ChatGPT. Third, only the top 10 most frequently asked questions were analyzed, leaving less common but potentially critical concerns unaddressed. Fourth, the study utilized ChatGPT-3.5 rather than the more advanced GPT-4 architecture, potentially limiting response quality. Fifth, cultural and healthcare system specificity may affect generalizability, as the study was conducted exclusively within China's healthcare framework. Finally, temporal limitations exist: AI models evolve rapidly, and responses generated in April 2024 may not reflect current capabilities. Longitudinal studies are needed to assess performance consistency across model updates. Despite these constraints, the questions analyzed are believed to represent primary perioperative concerns. This work has practical applications for AI in orthopedic practice: 1. ChatGPT can provide standardized responses to common patient questions, reducing the repetitive workload of clinicians; 2. 
The integration of AI tools can save approximately 12–15 min per patient consultation, allowing surgeons to focus on complex decision-making and personalized care; 3. AI responses ensure consistent baseline information delivery across healthcare settings, which is particularly valuable in regions with variable specialist access. Implementation could involve embedding AI response systems in patient portals with physician oversight mechanisms to maintain quality control while enhancing communication efficiency.

Conclusions

Overall, collecting questions through questionnaires identified the clinical needs of patients. The application of ChatGPT in TKA patient communication has yielded positive initial results. Additionally, many surgeons are willing to use ChatGPT to answer most of their patients' questions. Although satisfaction and accuracy ratings differ among surgeons working in hospitals of different levels and holding different titles, surgeons are, overall, highly satisfied with the accuracy of ChatGPT-generated responses to patient questions. Patient satisfaction with the responses was greater, likely because of the comprehensive and understandable nature of the information provided by ChatGPT. ChatGPT is expected to become an important tool for medical communication and to improve the quality of patient education and communication.

Introduction

Total knee arthroplasty (TKA) is a common surgery with a growing demand in countries such as China and the U.S. [ 1 – 7 ]. Many patients have numerous questions and concerns about the surgical and recovery processes [ 8 , 9 ]. Effective communication reduces anxiety and improves surgical outcomes [ 10 – 12 ]. While healthcare professionals offer expert advice, time constraints can hinder the provision of comprehensive guidance. Therefore, patients need other ways to acquire information about the surgery. Although patients increasingly seek health information online [ 13 , 14 ], internet sources often provide inaccurate information [ 15 , 16 ], and specialized medical content often requires high reading proficiency[ 17 , 18 ]. In addition to traditional websites, social media and video platforms are becoming channels for accessing medical information, but most of the content comes from nonmedical professionals [ 19 – 21 ]. The diverse sources result in variable content quality, which makes it difficult for patients to distinguish between inaccurate and expert information [ 22 ]. In addition, the average American reads at the 6th–9th grade level, whereas most patient education materials are written at the 9th–11th grade level [ 23 ]. Many patients have poor health literacy skills and face difficulties with reading, writing, arithmetic, communication, and increasingly electronic technology, which impedes their access to and comprehension of healthcare information. Therefore, it is important to introduce new tools to facilitate patient communication. Recently, artificial intelligence (AI) has become a key resource for medical information[ 24 , 25 ]. Search engines integrate AI such as Chat-Generative Pretrained Transformer (ChatGPT) and other large language models (LLMs) into browsers, focusing on AI-generated summaries [ 25 – 28 ]. The emergence of LLMs such as ChatGPT offers potential solutions. 
ChatGPT has demonstrated capabilities in medical consultation and patient education [ 29 – 31 ], with promising diagnostic accuracy in orthopedics [ 32 – 34 ]. However, existing studies have critical limitations: most evaluate ChatGPT-generated responses without assessing real patient questions [ 35 – 39 ], stakeholder acceptance, or the readability of the responses, which is a crucial factor for patient comprehension. Therefore, the aims of this study were as follows: 1. to generate responses to authentic patient concerns regarding TKA; 2. to evaluate surgeon acceptance of the ChatGPT responses and the determinants of this acceptance; and 3. to assess patient satisfaction with readability-adjusted responses. Our study uniquely evaluates ChatGPT’s utility by incorporating real patient questions, surgeon feedback across hospital tiers, and readability metrics, gaps that were previously unaddressed in the literature.

Supplementary Material

Below is the link to the electronic supplementary material. Supplementary Material 1; Supplementary Material 2; Supplementary Material 3; Supplementary Material 4
