How to Train Your Chatbot: Information-Theoretic Foundations of Diagnostic Questioning in Inborn Errors of Immunity.

preprint OA: closed
This is a preprint and has not been peer reviewed. Data may be preliminary.
2 February 2026, V1 (latest version)
Authors: Saúl Lugo Reyes (ORCID 0000-0002-3730-4150), Estefanía Vásquez Echeverri, Juan Carlos Bustamante Ogando, Lina Maria Castaño-Jaramillo, Natalia Vélez Tirado, Edna Venegas Montoya, Alejandro Tarango García, Héctor Gómez Tello, Alejandro Palma, Oleastro Matías, Scheffler-Mendoza Selma Cecilia, Sara Espinosa-Padilla, Eduardo Guaní Guerra, Hanadys Ale, Marco Yamazaki-Nakashimada (ORCID 0000-0002-7609-3923), Kathleen E. Sullivan, and Chiharu Murata
https://doi.org/10.22541/au.177004817.70167375/v1
156 views, 86 downloads
Abstract
Background: Navigating the more than 600 inborn errors of immunity (IEI) requires efficient diagnostic reasoning.
Information theory suggests questions should be prioritized by their capacity to reduce diagnostic uncertainty (entropy), yet whether experts or large language models (LLMs) optimize for information gain remains unquantified.
Objective: We compared expert clinician and LLM diagnostic prioritization strategies using an information-theoretic framework.
Methods: Fifteen immunologists and six LLMs (ChatGPT, Claude, Gemini, Grok, DeepSeek, Llama) ranked 35 diagnostic questions by efficiency. Shannon’s entropy was used to estimate expected information gain (EIG) for each question. Agreement was assessed via Spearman correlations, consensus ranking, and principal components analysis (PCA).
Results: Clinician consensus rankings strongly correlated with estimated information gain (Spearman ρ = −0.71, p < 0.001). “Age at onset?”, ranked first by clinicians, provided the highest information gain (2.29 bits), reducing diagnostic uncertainty by 80%. Clinicians and LLMs showed strong agreement on top-tier discriminators (Spearman ρ = 0.73, p < 0.001). However, PCA revealed a distinct LLM cluster; clinicians prioritized bedside/history questions, whereas LLMs favored syndromic and laboratory features. Optimal questioning reached diagnostic confidence in 4–5 steps, approaching the theoretical minimum.
Conclusions: Expert clinicians implicitly approximate information-theoretic optimization in IEI diagnostics. While LLMs share a core heuristic for high-yield questions, divergence in mid-sequence reasoning suggests a shift from experiential heuristics to probabilistic data-matching. This framework provides a principled basis for training AI-assisted tools that mirror expert diagnostic logic.
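The abstract does not give the formula or data used to estimate expected information gain, so the following is only a minimal sketch of the standard definition: the prior entropy over diagnoses minus the expected posterior entropy after hearing the answer to a question. The four diagnosis labels, the answer labels, and every probability below are invented for illustration and are not taken from the paper; the function names `entropy` and `expected_information_gain` are likewise hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(prior, likelihood):
    """Expected reduction in diagnostic entropy from asking one question.

    prior:      dict mapping diagnosis -> P(diagnosis)
    likelihood: dict mapping diagnosis -> {answer: P(answer | diagnosis)}
    """
    answers = {a for dist in likelihood.values() for a in dist}
    expected_posterior_entropy = 0.0
    for answer in answers:
        # Joint probability of each diagnosis together with this answer.
        joint = [prior[d] * likelihood[d].get(answer, 0.0) for d in prior]
        p_answer = sum(joint)
        if p_answer == 0:
            continue
        # Bayes' rule gives the posterior over diagnoses for this answer.
        posterior = [j / p_answer for j in joint]
        expected_posterior_entropy += p_answer * entropy(posterior)
    return entropy(prior.values()) - expected_posterior_entropy


# Toy data (assumed, not from the paper): four equally likely diagnoses.
prior = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}

# A question whose answer cleanly separates {A, B} from {C, D}.
splitting_question = {
    "A": {"early": 1.0},
    "B": {"early": 1.0},
    "C": {"late": 1.0},
    "D": {"late": 1.0},
}

# A question whose answer is unrelated to the diagnosis.
uninformative_question = {d: {"yes": 0.5, "no": 0.5} for d in prior}

print(expected_information_gain(prior, splitting_question))
print(expected_information_gain(prior, uninformative_question))
```

In this toy setting the prior entropy is 2 bits, the splitting question recovers 1 bit, and the uninformative question recovers none. Ranking candidate questions by this quantity and asking the highest-scoring one first is the greedy strategy that an entropy-based prioritization implies.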
Supplementary Material
File (how to train your chatbot trimmed.docx), 179.02 KB
File (iei_questions_tables and figures.docx), 2.41 MB
Information & Authors
Version history: Version 1, 02 February 2026
Copyright: This work is licensed under a Non Exclusive No Reuse License.
Keywords: allergy diagnosis bioinformatics immune deficiencies
Authors and affiliations:
Saúl Lugo Reyes (ORCID 0000-0002-3730-4150), Instituto Nacional de Pediatria
Estefanía Vásquez Echeverri, Unidad Médica Quirúrgica Otorrinolaringología-UNIMEQ-ORL
Juan Carlos Bustamante Ogando, Instituto Nacional de Pediatria
Lina Maria Castaño-Jaramillo, Fundacion HOMI Hospital de la Misericordia
Natalia Vélez Tirado, Fundacion HOMI Hospital de la Misericordia
Edna Venegas Montoya, Hospital General Regional
Alejandro Tarango García, Instituto Nacional de Pediatria
Héctor Gómez Tello, Hospital del Nino Poblano
Alejandro Palma, IWK Health Centre
Oleastro Matías, Hospital de Pediatria Prof Dr Juan P Garrahan
Scheffler-Mendoza Selma Cecilia, Instituto Nacional de Pediatria
Sara Espinosa-Padilla, Instituto Nacional de Pediatria
Eduardo Guaní Guerra, Universidad de Guanajuato Facultad de Medicina
Hanadys Ale, Joe DiMaggio Children's Hospital
Marco Yamazaki-Nakashimada (ORCID 0000-0002-7609-3923), Instituto Nacional de Pediatria
Kathleen E. Sullivan, University of Pennsylvania Perelman School of Medicine
Chiharu Murata, Instituto Nacional de Pediatria
Citation: Saúl Lugo Reyes, Estefanía Vásquez Echeverri, Juan Carlos Bustamante Ogando, et al. How to Train Your Chatbot: Information-Theoretic Foundations of Diagnostic Questioning in Inborn Errors of Immunity. Authorea. 02 February 2026. DOI: https://doi.org/10.22541/au.177004817.70167375/v1



Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00
unpaywall
last seen: 2026-06-21T16:06:39.831647+00:00