Comparing ChatGPT and physicians’ answers to endometriosis questions on Reddit: A blind expert evaluation

article OA: hybrid CC0
AI-generated summary by claude@2026-06, 2026-06-06

Blinded experts rated ChatGPT's answers to endometriosis questions higher than physicians' answers for clarity, empathy, and adherence to clinical recommendations, but responses from both sources were sometimes judged potentially dangerous, so supervised use is still needed.

One-sentence paraphrase of the abstract; not a substitute for reading it. No clinical advice.

Abstract

• ChatGPT outperformed Reddit physicians on multiple expert-rated criteria, particularly in terms of clarity, empathy, and adherence to clinical recommendations.
• A non-negligible proportion of responses considered potentially dangerous by experts underscores the need for cautious use and appropriate supervision of such technologies.
• The study supports physician-led evaluation of AI tools in clinical care.

To compare the perceived quality, safety, and relevance of ChatGPT responses to those provided by verified physicians on Reddit, a large online discussion platform, in response to questions related to endometriosis.

We selected 30 endometriosis-related questions posted on Reddit’s r/AskDocs forum, each answered by a verified physician. Using the same question prompts, ChatGPT (GPT-3.5) generated matched-length responses. Responses were anonymized, randomized (A/B format), and assessed blindly by three university-affiliated physicians using an 11-item Likert-scale questionnaire covering medical accuracy, safety, clarity, empathy, and alignment with best practices. Evaluators also indicated which response they considered most pertinent and whether they suspected AI authorship.

ChatGPT responses were rated significantly higher than physicians’ responses on most items, including medical coherence (mean 3.89 ± 0.89 vs. 3.08 ± 0.92), clarity (3.93 ± 0.95 vs. 3.04 ± 0.99), and empathy (3.91 ± 0.93 vs. 2.76 ± 1.09), all with p-values < 0.001. Experts selected ChatGPT as the most pertinent response in 63.3 % of cases. A substantial proportion of responses from both sources were considered potentially dangerous by at least one expert: 26.7 % for ChatGPT and 60.0 % for physicians (p = 0.019).

ChatGPT outperformed Reddit physicians on multiple expert-rated criteria, particularly in terms of clarity, empathy, and adherence to clinical recommendations. However, the non-negligible proportion of responses considered potentially dangerous by experts underscores the need for cautious use and appropriate supervision of such technologies.
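The safety comparison in the abstract (26.7 % vs. 60.0 %, p = 0.019) can be checked with a short calculation. With 30 questions per source, those percentages correspond to 8 and 18 flagged responses. The abstract does not name the statistical test, so the sketch below assumes a chi-square test with Yates continuity correction on the resulting 2x2 table; the function name `yates_chi2_p` and the variable names are illustrative, not taken from the paper. Under that assumption the result lands close to the reported p-value.

```python
import math


def yates_chi2_p(a, b, c, d):
    """Yates-corrected chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (chi2, p) for one degree of freedom.
    ASSUMPTION: the paper does not state which test produced p = 0.019.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Continuity correction: shrink |ad - bc| by n/2, floored at zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts implied by the abstract: 26.7 % and 60.0 % of 30 questions.
chatgpt_flagged, physician_flagged, n_questions = 8, 18, 30

chi2, p = yates_chi2_p(
    chatgpt_flagged, n_questions - chatgpt_flagged,
    physician_flagged, n_questions - physician_flagged,
)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

Because both sources answered the same 30 questions, a paired test could also be argued for; that would require the per-question ratings, which the abstract does not give.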

Condition tags

endometriosis

MeSH descriptors

Endometriosis
Physicians

Source provenance

europepmc
last seen: 2026-06-12T06:13:51.797165+00:00
openalex
last seen: 2026-06-04T00:00:01.174412+00:00
pubmed
last seen: 2026-05-20T00:31:13.175075+00:00
License: CC0 · commercial use OK