Letter to the Editor: Accuracy and reproducibility of ChatGPT's free version answers about endometriosis
letter
OA: closed
CC0
Abstract
We recently read the published article, "Accuracy and reproducibility of ChatGPT's free version answers about endometriosis" by Ozgor et al.1 We would like to thank the authors for their article and share some thoughts on the study. The study involved an extensive internet search for frequently asked questions (FAQs) about endometriosis. The European Society of Human Reproduction and Embryology (ESHRE) guideline on endometriosis served as the basis for formulating the questions. An experienced gynecologist assessed the ChatGPT responses using a point system. Each question was asked twice, and the scores of the two responses were compared to assess reproducibility. Overall, ChatGPT provided comprehensive and precise answers to 91.4% of the FAQs. The symptom and diagnosis category showed the highest accuracy, with 94.1% of questions answered correctly; the therapy category showed the lowest, with only 81.3% answered correctly. Among the study's limitations are that it does not report the total number of questions examined and does not give precise information on the criteria used to score the responses. More detail on the analysis methods used to determine the repeatability and reproducibility rates would also have been beneficial. Additionally, because the AI system is based on the data available to it, the analysis is in effect retrospective. The availability of up-to-date data may affect accuracy, and it may also affect reproducibility when the same prompt is submitted to the AI system again. Future research could increase the sample size and enlist more specialists to evaluate the responses given by ChatGPT.
It would also be helpful to see how well ChatGPT performs when tested with a wider variety of queries, and how well it can deliver correct, tailored information for specific situations. Furthermore, as the study found that ChatGPT's responses were least accurate in the therapy category, efforts could be made to strengthen this area. Ultimately, the choice to follow a fair and moral code rests with each user of an AI system.2 The authors declare no conflicts of interest.
References (2)
- W4378639096 via openalex
- W4389900324 via openalex
License: CC0
· commercial use OK