Per-Second, Explainable Obstructive Sleep Apnea Detection from Multimodal Time-Series using Vision Transformer

Hyung-sin Kim, Joopyo Hong, Kunmin Jang, Hyojin Lee, Hyun Keun Ahn, and Hyun-Woo Shin

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7300086/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted September 8th, 2025.

Abstract

Manual, second-by-second scoring of polysomnography (PSG) is the gold standard for diagnosing obstructive sleep apnea (OSA), yet it is time- and labor-intensive and prone to inter-scorer variability. Existing automated approaches analyze only ≤3 channels and skip second-level annotation, reporting instead the coarse Apnea-Hypopnea Index (AHI) and sacrificing clinical detail and transparency.
We present VOSA, a Vision-Transformer (ViT)-based model that reproduces the technologist’s visual workflow: it ingests standardized PSG images containing all 21 biosignals, labels every second as normal, hypopnea, or apnea, computes AHI, and assigns four-level OSA severity while supplying attention heatmaps and calibrated confidence scores. Trained and evaluated on KISS, a PSG image dataset from 7,745 patients across four centers, VOSA achieved a per-second Macro F1 score of 82.6% and a severity Macro F1 score of 73.5%, placing 99.2% of patients in the correct or adjacent severity class. Testing on the public SHHS-2 dataset confirmed robust performance. Attention visualizations demonstrated VOSA’s alignment with AASM guidelines. Coupled with image-based sleep staging, VOSA marks the first attempt at fully automated generation of PSG reports and endotypic metrics, delivering an interpretable, scalable solution for precision sleep-medicine workflows.

Subject areas: Health sciences/Health care/Medical imaging; Health sciences/Biomarkers/Diagnostic markers; Health sciences/Diseases/Neurological disorders/Sleep disorders; Biological sciences/Computational biology and bioinformatics; Biological sciences/Systems biology

Additional Declarations

Yes, there is a potential competing interest. Hyun-Woo Shin is an inventor on patent applications submitted by Seoul National University related to an image-based polysomnography dataset and its application. Hyun-Woo Shin is a founder of OUaR LaB, Inc., serves on the Board of Directors and as a chief executive officer for OUaR LaB, Inc., and owns OUaR LaB Stock, which are subject to certain restrictions under university policy.
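The abstract describes a pipeline that goes from per-second labels to an AHI and then to a four-level severity class. The sketch below illustrates one way that post-processing could look; it is not the authors' implementation. It assumes that consecutive non-normal seconds form one event, that an event must last at least 10 seconds to count, that the whole recording is sleep time unless a sleep duration is given, and that severity follows the conventional adult cutoffs of 5, 15, and 30 events per hour. The names `count_events`, `ahi`, and `severity` are hypothetical.

```python
from itertools import groupby

# Hypothetical integer encoding of the three per-second classes.
NORMAL, HYPOPNEA, APNEA = 0, 1, 2


def count_events(labels, min_duration_s=10):
    """Count runs of consecutive non-normal seconds of at least min_duration_s.

    Assumption: adjacent apnea and hypopnea seconds are merged into one event.
    """
    events = 0
    for is_event, run in groupby(labels, key=lambda lab: lab != NORMAL):
        if is_event and sum(1 for _ in run) >= min_duration_s:
            events += 1
    return events


def ahi(labels, sleep_seconds=None, min_duration_s=10):
    """Events per hour of sleep; defaults to treating every second as sleep."""
    if sleep_seconds is None:
        sleep_seconds = len(labels)
    if sleep_seconds <= 0:
        raise ValueError("sleep_seconds must be positive")
    return count_events(labels, min_duration_s) / (sleep_seconds / 3600.0)


def severity(ahi_value):
    """Map AHI to four levels using the conventional 5/15/30 cutoffs (assumed)."""
    if ahi_value < 5:
        return "normal"
    if ahi_value < 15:
        return "mild"
    if ahi_value < 30:
        return "moderate"
    return "severe"


# Toy one-hour recording: six 12-second apneas and one 5-second hypopnea.
labels = [NORMAL] * 3600
for i in range(6):
    start = 100 + i * 500
    labels[start:start + 12] = [APNEA] * 12
labels[3000:3005] = [HYPOPNEA] * 5  # too short to count under the 10 s rule

value = ahi(labels)
print(value, severity(value))
```

In this toy recording the six apneas count and the short hypopnea does not, so the hour yields an AHI of 6 and a "mild" class. In practice the denominator would come from sleep staging rather than total recording time, which is why `sleep_seconds` is left as a parameter.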
Supplementary Files

supplementary.pdf (Supplementary Materials)

Author affiliations

Hyung-sin Kim (corresponding author), Joopyo Hong, Kunmin Jang, Hyojin Lee: Seoul National University
Hyun Keun Ahn, Hyun-Woo Shin: Seoul National University College of Medicine

Journal listed for version 1: Communications Medicine (Nature Portfolio)

Research Square | ISSN 2693-5015 (online)