HitSV: Maximizing discovery of structural variants across sequencing technologies

preprint OA: closed
Authors: Yadong Wang (Harbin Institute of Technology; corresponding author; ORCID https://orcid.org/0000-0002-0673-8503), Gaoyang Li (Harbin Institute of Technology), Yadong Liu (Harbin Institute of Technology), Bo Liu (Harbin Institute of Technology), Long Qian (Peking University; ORCID https://orcid.org/0000-0003-3728-7917)

This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-8913458/v1
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Status: Under Review. Version 1, posted February 20th, 2026 on Research Square (ISSN 2693-5015, online). The page metadata lists Nature Biotechnology (Nature Portfolio) as the associated journal.

Subject areas: Biological sciences/Genetics/Genome/Genetic variation/Structural variation; Biological sciences/Computational biology and bioinformatics/Software

Additional Declarations: There is NO Competing Interest.

Files: HitSVmanuscript.pdf (Article File); HitSVSupplementary.pdf (Supplementary Figures and Notes)

Abstract

Structural variants (SVs) are a major source of genomic diversity, yet their discovery remains challenging due to repetitive genomic contexts, alignment ambiguity, and the trade-off between sequencing cost and read length. Here we introduce HitSV, which substantially improves SV discovery by implementing repetitiveness- and signature-density-aware breakpoint recognition coupled with precise haplotype-resolved local assembly, thereby enabling base-resolution SV reconstruction and genotyping across various sequencing technologies. Across different coverages, HitSV is 12-68% (long-read), 3-36% (short-read), and 13% (hybrid-sequencing) more accurate than state-of-the-art SV callers. Applying HitSV to the 1KGP Phase 4 cohort, we identified 31.5% more SVs, substantially reshaping allele-frequency landscapes. Notably, analysis of a large Chinese long-read cohort uncovers tandem repeat–mobile element composite arrays as a prevalent and multi-allelic class of complex SVs, highlighting composite repeat architectures as a fundamental hallmark of human genomes.
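The abstract mentions signature-density-aware breakpoint recognition but gives no algorithmic detail. As a rough illustration of the general idea only, the sketch below groups SV signature positions (for example, split-read or clipped-alignment coordinates) into clusters and keeps those with enough supporting signatures. It is not HitSV's method: the names `Cluster`, `cluster_signatures`, and `candidate_breakpoints`, and the parameters `max_gap` and `min_support`, are all hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    """A run of nearby SV signatures (hypothetical structure, not from HitSV)."""
    start: int
    end: int
    support: int

    @property
    def density(self) -> float:
        # Signatures per base over the cluster span (span is at least 1 bp).
        return self.support / max(1, self.end - self.start + 1)


def cluster_signatures(positions: List[int], max_gap: int = 50) -> List[Cluster]:
    """Greedily merge sorted positions that lie within max_gap of the previous one."""
    clusters: List[Cluster] = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1].end <= max_gap:
            clusters[-1].end = pos
            clusters[-1].support += 1
        else:
            clusters.append(Cluster(pos, pos, 1))
    return clusters


def candidate_breakpoints(
    positions: List[int], max_gap: int = 50, min_support: int = 3
) -> List[Cluster]:
    """Keep only clusters supported by at least min_support signatures."""
    return [c for c in cluster_signatures(positions, max_gap) if c.support >= min_support]


# Toy input: two dense groups of signatures and one isolated signature.
positions = [1000, 1004, 1010, 1012, 5000, 9000, 9020, 9030]
candidates = candidate_breakpoints(positions)
for c in candidates:
    print(c.start, c.end, c.support, round(c.density, 3))
```

In this toy input the isolated signature at 5000 is discarded, while the groups near 1000 and 9000 survive as candidate regions. A real caller would also need to account for local repetitiveness, sequencing coverage, and signature type, none of which this sketch attempts.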


Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00