Invisible Text Injection: The Trojan Horse of AI-Assisted Medical Peer Review

doi:10.1101/2025.07.24.25332148

Invisible Text Injection: The Trojan Horse of AI-Assisted Medical Peer Review

2025 · doi:10.1101/2025.07.24.25332148

preprint OA: closed

📄 Open PDF Full text JSON View at publisher

Full text 4,479 characters · extracted from oa-doi-fallback · 4 sections · click to expand

Abstract

Question Are large language models robust against adversarial attacks in medical peer review? Findings In this factorial experimental study, invisible text injection attacks significantly increased review scores and raised manuscript acceptance rates from 0% to nearly 100%, while also significantly impairing the ability of large language models to detect scientific flaws. Meaning Enhanced safeguards and human oversight are essential prerequisite for using large language models in medical peer review. Importance Large language models (LLMs) are increasingly considered for medical peer review. However, their vulnerability to adversarial attacks and ability to detect scientific flaws remain poorly understood.

Objective

Evaluate LLMs’ ability to identify scientific flaws in peer review and their robustness against invisible text injection (ITI). Design, Setting, and Participants This factorial experimental study was conducted in May 2025 using a 3 LLMs × 3 prompt strategies × 4 manuscript variants x 2 with/without ITI design. We used three commercial LLMs (Anthropic, Google, OpenAI). The four manuscript variants either contained no flaws (control) or included scientific flaws in the methodology, results, or discussion section, respectively. Three prompt strategies were evaluated: neutral peer review, strict guidelines emphasizing objectivity, and explicit rejection. Interventions ITI involved inserting concealed instructions using white text on white background, directing LLMs to review with positive evaluations and “accept without revision” recommendations. Main Outcomes and Measures Primary outcomes were review scores (1-5 scale) and acceptance rates under neutral prompts. Secondary outcomes were review scores, acceptance rates under strict and explicit reject prompts. We investigated flaw detection capability using liberal (detect any flaw) and stringent (detect all flaw) criteria. We calculated mean score differences by models and prompt types and used t-test and Fisher’s exact test for calculating P-value.

Results

ITI caused significant score inflation under neutral prompts. Score differences for Anthropic, Google and OpenAI were 1.0 (P<.001), 2.5 (P<.001) and 1.7 (P<.001). Acceptance rates increased from 0% to 99.2%-100% across all providers (P<.001). Score differences were still statistically significant under strict prompting. Score differences were not significant under explicit rejection prompting, but flaw detection rate was still impaired. Using liberal detection criteria, results section flaw detection rate was significantly compromised with ITI, particularly in Google (88.9% to 47.8%, P<.001). Stringent criteria revealed methodology detection falling from 56.3% to 25.6% (P<.001) and overall detection dropping from 18.9% to 8.5% (P<.001).

Conclusions

and Relevance ITI can significantly alter the evaluation of medical studies by LLMs, and mitigation at the prompt level is insufficient. Enhanced safeguards and human oversight are essential prerequisites for the application of LLMs in medical publishing. Competing Interest Statement The authors have declared no competing interest. Funding Statement This study did not receive any funding Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes Data Availability All data produced in the present study are available upon reasonable request to the authors

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

⚙ Ask this paper AI returns verbatim quotes from the full text · source: oa-doi-fallback ⓘ

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2025) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00