OOD Detectors Are Best Used Runtime Verifiers, Not Semantic Shift Classifiers

Research Article, Research Square

Birk Torpmann-Hagen (UiT: The Arctic University of Norway; corresponding author), Pål Halvorsen (SimulaMet), Michael Alexander Riegler (SimulaMet), and Dag Johansen (UiT: The Arctic University of Norway)

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-7696794/v1

This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1, posted September 25th, 2025.

Abstract

Out-of-distribution (OOD) detection is a widely studied problem in machine learning that involves identifying inputs drawn from a different distribution than the one a trained network is intended to model. Conventionally, OOD detectors are evaluated on their ability to detect inputs that are semantically distinct from the training data, such as when a network trained on numeric digits encounters letters.
In this position paper, we contend that this problem setting significantly undersells the true potential of OOD detectors, namely as runtime verifiers that detect subtle, semantics-preserving shifts in the covariates of the data that nevertheless degrade network accuracy. We base this argument on the fact that OOD detectors effectively measure the degree to which a datum has support in the training distribution, and that such support is a necessary condition for a neural network to reliably make correct predictions. We support our position empirically through a cost-benefit analysis in a polyp segmentation case study, comparing the expected lifetime cost per patient of a system that uses OOD detectors as runtime verifiers with that of a conventionally implemented system. Our results show that implementing OOD detectors as runtime verifiers reduces the expected cost per patient by upwards of 40%. Overall, we position OOD detection as a promising means of endowing deep learning systems with the resilience necessary for responsible deployment in high-stakes applications, and encourage a corresponding shift in the focus of OOD detection research.

Subject area: Artificial Intelligence and Machine Learning
Keywords: Out of distribution, Deep Learning, Runtime Verification, Distributional Shift

Additional Declarations
The authors declare no competing interests.
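The abstract frames an OOD detector as a runtime verifier: a component that scores how well an input is supported by the training distribution and withholds the network's prediction when that support is weak. The following is a minimal sketch of that idea under stated assumptions, not the authors' method. It assumes inputs have already been mapped to feature vectors (for example, a network's penultimate-layer embeddings) and uses the distance to the k-th nearest training feature as the support score, a common OOD scoring choice swapped in here for illustration. The names `KNNRuntimeVerifier` and `knn_distance`, the parameters `k` and `quantile`, and the 95th-percentile calibration rule are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def knn_distance(x: Vector, reference: Sequence[Vector], k: int) -> float:
    """Euclidean distance from x to its k-th nearest neighbour in `reference`."""
    distances = sorted(math.dist(x, r) for r in reference)
    return distances[k - 1]


@dataclass
class KNNRuntimeVerifier:
    """Hypothetical runtime verifier: rejects inputs with weak training support.

    Assumes `train_features` are feature vectors (e.g. penultimate-layer
    embeddings) of in-distribution training data.
    """

    train_features: List[Vector]
    k: int = 5
    quantile: float = 0.95  # fraction of training points treated as "supported"
    threshold: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        if not 1 <= self.k < len(self.train_features):
            raise ValueError("k must be in [1, len(train_features) - 1]")
        # Calibrate on leave-one-out k-NN distances of the training set, so each
        # training point is scored against the others rather than itself.
        scores = sorted(
            knn_distance(x, self.train_features[:i] + self.train_features[i + 1:], self.k)
            for i, x in enumerate(self.train_features)
        )
        idx = math.ceil(self.quantile * len(scores)) - 1
        self.threshold = scores[min(len(scores) - 1, max(0, idx))]

    def score(self, x: Vector) -> float:
        """Higher score means further from the training data, i.e. weaker support."""
        return knn_distance(x, self.train_features, self.k)

    def verify(self, x: Vector) -> Tuple[bool, float]:
        """Return (accepted, score); rejected inputs should be deferred."""
        s = self.score(x)
        return s <= self.threshold, s


# Usage on toy 2-D "features": training points spread around the unit circle.
train = [(math.cos(t / 10), math.sin(t / 10)) for t in range(63)]
verifier = KNNRuntimeVerifier(train, k=3)

near_ok, near_score = verifier.verify((math.cos(0.25), math.sin(0.25)))
far_ok, far_score = verifier.verify((3.0, 3.0))
print(f"threshold={verifier.threshold:.3f}")
print(f"near: accepted={near_ok}, score={near_score:.3f}")
print(f"far:  accepted={far_ok}, score={far_score:.3f}")
```

Inputs rejected by `verify` would be deferred (for instance, to a clinician) rather than passed through silently. Where the threshold sits trades the cost of extra referrals against the cost of unflagged errors, which is the kind of trade-off a cost-benefit analysis like the one described in the abstract would weigh.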