Beyond the Black Box: A Calibrated, Gene-Centric Pipeline for High-Precision Ciprofloxacin Resistance Prediction in Salmonella enterica
Research Article
Hevar N. Barznji
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9230875/v1
This work is licensed under a CC BY 4.0 License
Status: Posted (Version 1)
Abstract
The transition from traditional methods to genomic-based antimicrobial susceptibility testing (AST) for Salmonella requires models that are biologically interpretable, clinically calibrated, and discriminative.
This study analyzed 8,759 Salmonella isolates from the NCBI Pathogen Detection database to predict ciprofloxacin resistance using two distinct computational architectures: an alignment-free, agnostic k-mer hashing model and a feature-engineered gene-based model (Logistic Regression, Random Forest, and XGBoost). Methodological auditing revealed a theoretical collision probability of 100% when 10 million unique k-mers are mapped into a 2^24 (about 16.7 million) feature space via MurmurHash3, leading to a complete loss of biological signal. In comparison, the gene-based models demonstrated superior performance, with XGBoost achieving a Receiver Operating Characteristic Area Under the Curve (ROC AUC) of 0.989. Furthermore, to ensure clinical reliability, post-hoc Isotonic Regression was applied, refining probability estimates to an average Brier Score of 0.0063, significantly outperforming current clinical benchmarks. Explainability analysis using SHAP identified the gyrA_D87Y mutation as the primary indicator of resistance, with an importance of 0.88, while an inverse relationship with aph(3'')-Ib suggested potential collateral sensitivity trade-offs. However, LOSO validation revealed a significant performance decay in clonal, human-restricted lineages such as S. Typhi, highlighting Phylogenetic Leakage as the primary barrier to universal generalization. These findings demonstrate that while gene-based models provide high-fidelity AMR prediction, future frameworks must integrate efflux regulation and lineage-robust audits to move beyond static genomic anchors toward real-time, personalized antimicrobial stewardship.
Subject areas: Bioinformatics, General Microbiology
Keywords: Salmonella enterica, Antimicrobial Resistance (AMR), Machine Learning, Genomic Epidemiology, Model Calibration, SHAP (SHapley Additive exPlanations), Phylogenetic Leakage, Ciprofloxacin
Additional Declarations
The authors declare no competing interests.
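The collision figure quoted in the abstract can be sanity-checked with a standard birthday-bound calculation. The sketch below is not the author's audit code: it assumes an idealized hash that spreads items uniformly over the buckets (MurmurHash3 only approximates this), and the function name `collision_stats` and its dictionary keys are hypothetical. It reports three related quantities: the probability that at least one collision occurs, the probability that a given k-mer shares its bucket with another, and the expected number of occupied buckets.

```python
import math


def collision_stats(n_items: int, n_buckets: int) -> dict:
    """Birthday-style estimates for hashing n_items uniformly into n_buckets.

    Assumes an ideal uniform hash; real hash functions only approximate this.
    """
    # P(no collision at all) ~ exp(-n(n-1) / (2m)), the usual birthday approximation.
    log_p_no_collision = -n_items * (n_items - 1) / (2 * n_buckets)
    p_any_collision = -math.expm1(log_p_no_collision)

    # P(a given item shares its bucket) = 1 - (1 - 1/m)^(n-1).
    log_miss = math.log1p(-1.0 / n_buckets)
    p_item_shared = -math.expm1((n_items - 1) * log_miss)

    # E[occupied buckets] = m * (1 - (1 - 1/m)^n).
    expected_occupied = n_buckets * -math.expm1(n_items * log_miss)

    return {
        "load_factor": n_items / n_buckets,
        "p_any_collision": p_any_collision,
        "p_item_shared": p_item_shared,
        "expected_occupied": expected_occupied,
    }


if __name__ == "__main__":
    # Figures taken from the abstract: 10 million k-mers, 2^24 buckets.
    stats = collision_stats(10_000_000, 2 ** 24)
    for key, value in stats.items():
        print(f"{key}: {value:.6g}")
```

Under these assumptions, the probability of at least one collision is indistinguishable from 1, which matches the 100% figure in the abstract. The per-k-mer sharing probability is a separate quantity and is worth reporting alongside it when judging how much signal a hashed feature space can retain.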
Posted: March 27th, 2026
Author affiliation: Hevar N. Barznji (corresponding author), Komar University of Science and Technology. ORCID: https://orcid.org/0009-0001-3783-5998