A sensitive and accurate framework for population-scale structural variant discovery and genotyping across sequence types

Xin Wang, Guangbao Luo, Li Xiao, Zhangjun Fei

This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-8773548/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 (latest preprint version).

Abstract
Structural variation underlies much of phenotypic and evolutionary diversity. However, accurate discovery and genotyping of structural variants (SVs) at population scale remain challenging because of the differing characteristics of sequencing technologies and the complexity of genome architectures.
Here, we introduce PSVGT, a unified framework that integrates short- and long-read data, de novo assembled contigs, and chromosome-level assemblies to enable comprehensive SV detection and genotyping across diploid and polyploid genomes. PSVGT employs an integrated signaling module to extract precise insertion and deletion breakpoints, coupled with the ploidy-aware KLOOK clustering algorithm and local depth-adaptive filtering, to resolve multi-allelic events and accommodate the uneven coverage characteristic of complex genomic regions. Benchmarking on simulated and real datasets shows that PSVGT consistently outperforms state-of-the-art tools across sequence types, with particular advantages in complex genomes and on low-coverage long-read data. PSVGT fills a critical gap in scalable SV analysis by leveraging underutilized short-read data and by enabling robust characterization of SVs across diverse genome architectures, from diploids to polyploids, thereby facilitating population-scale analyses and pan-genome research.

Subject areas: Biological sciences/Computational biology and bioinformatics; Biological sciences/Genetics

Additional Declarations: There is NO competing interest.

Supplementary Files
SupplymentaryTablefinal.xlsx: Supplementary Tables
SupplementaryFigfinal.pdf: Supplementary Figures
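The abstract names ploidy-aware clustering and local depth-adaptive filtering without describing how they work. The Python sketch below illustrates one generic way such steps can be combined; it is NOT PSVGT's implementation and NOT the KLOOK algorithm, whose details are not given in this text. It assumes insertion and deletion signals have already been extracted as (chromosome, position, type, length) records. It groups them greedily by position and length, so that two alleles of clearly different size at the same site stay in separate clusters (a simple stand-in for multi-allelic resolution). It then keeps a cluster only if its read support reaches a fixed fraction of the depth expected from a single allele copy (local depth divided by ploidy) and estimates allele dosage from the support-to-depth ratio. All names (SVSignal, cluster_signals, dosage_call) and thresholds (max_pos_diff, max_len_diff_frac, min_frac_of_one_copy) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SVSignal:
    """A single insertion or deletion signal from one read (hypothetical record)."""
    chrom: str
    pos: int      # breakpoint position on the reference (bp)
    svtype: str   # "INS" or "DEL"
    length: int   # event length (bp)


def cluster_signals(signals: List[SVSignal],
                    max_pos_diff: int = 50,
                    max_len_diff_frac: float = 0.3) -> List[List[SVSignal]]:
    """Greedily group same-type signals at nearby positions with similar lengths.

    A generic stand-in for illustration, NOT the KLOOK algorithm. Signals at the
    same site but with clearly different lengths seed separate clusters, so two
    alleles of different size are kept apart.
    """
    clusters: List[List[SVSignal]] = []
    for sig in sorted(signals, key=lambda s: (s.chrom, s.svtype, s.pos)):
        for cl in clusters:
            last, seed = cl[-1], cl[0]
            same_site = (last.chrom == sig.chrom
                         and last.svtype == sig.svtype
                         and sig.pos - last.pos <= max_pos_diff)
            similar_len = (abs(sig.length - seed.length)
                           <= max_len_diff_frac * max(sig.length, seed.length))
            if same_site and similar_len:
                cl.append(sig)
                break
        else:
            # No compatible cluster was found: this signal seeds a new one.
            clusters.append([sig])
    return clusters


def dosage_call(support: int, local_depth: float, ploidy: int,
                min_frac_of_one_copy: float = 0.5) -> Optional[int]:
    """Return an estimated allele dosage in 1..ploidy, or None if filtered out.

    The threshold adapts to local depth: a cluster must carry at least
    min_frac_of_one_copy of the reads expected from one allele copy
    (local_depth / ploidy).
    """
    if local_depth <= 0 or ploidy <= 0:
        return None
    expected_per_copy = local_depth / ploidy
    if support < min_frac_of_one_copy * expected_per_copy:
        return None
    dosage = round(ploidy * support / local_depth)
    # Clamp to a valid dosage for a variant that passed the filter.
    return max(1, min(ploidy, dosage))


if __name__ == "__main__":
    signals = [
        SVSignal("chr1", 1000, "DEL", 300),
        SVSignal("chr1", 1020, "DEL", 310),
        SVSignal("chr1", 1010, "DEL", 1000),  # different-length allele at the same site
        SVSignal("chr1", 1005, "INS", 300),
    ]
    for cl in cluster_signals(signals):
        call = dosage_call(support=len(cl), local_depth=8, ploidy=4)
        print(cl[0].chrom, cl[0].svtype, cl[0].pos, len(cl), call)
```

Because the support threshold scales with local depth, a cluster of 10 reads passes in a tetraploid region at 40x (about 10 reads expected per allele copy) but is filtered out where local depth is 200x (about 50 reads per copy). This is one plausible reading of "depth-adaptive", offered only as an illustration.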
Author information
Xin Wang (corresponding author): National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops; Hubei Hongshan Laboratory; College of Horticulture and Forestry Sciences, Huazhong Agricultural University
Guangbao Luo: Huazhong Agricultural University
Li Xiao: National Key Laboratory for Germplasm Innovation & Utilization of Horticultural Crops; Hubei Hongshan Laboratory; College of Horticulture and Forestry Sciences, Huazhong Agricultural University
Zhangjun Fei: Boyce Thompson Institute, Cornell University (ORCID: https://orcid.org/0000-0001-9684-1450)

Posted on Research Square: February 18th, 2026 (Version 1)
Submitted to: Nature Communications (status: under review)
