Enhancing variant detection in complex genomes: leveraging linked reads for robust SNP, Indel, and structural variant analysis

Research Article · Research Square

Can Luo, Yichen Liu, Han Liu, Zhenmiao Zhang, Lu Zhang, Brock Peters, Xin Maizie Zhou

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8408441/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Background: Accurate detection of genetic variants, including single nucleotide polymorphisms (SNPs), small insertions and deletions (INDELs), and structural variants (SVs), is critical for comprehensive genomic analysis. While traditional short-read sequencing performs well for SNP and INDEL detection, it struggles to resolve SVs, especially in complex genomic regions, because of inherent read-length limitations.
Linked-read sequencing technologies, such as single-tube Long Fragment Read sequencing (stLFR), overcome these challenges by employing molecular barcodes that provide crucial long-range information.

Methods: This study investigates traditional paired-end linked reads and a conceptual extension of linked-read technology: barcoded single-end reads of 500 bp (SE500 stLFR) and 1000 bp (SE1000 stLFR), generated using the stLFR platform. Unlike conventional paired-end linked reads (PE100 stLFR), these longer single-end reads could offer improved resolution for variant detection by leveraging extended read lengths per barcode. To explore the potential of stLFR reads, we developed stLFR-sim, a Python-based simulator that reproduces the stLFR linked-read sequencing workflow to enable realistic simulation and benchmarking of linked-read sequencing data. Using stLFR-sim, we simulated a diverse set of datasets for the HG002 sample, based on realistic T2T-based genome simulation. Variant detection performance was then systematically assessed across three stLFR configurations: standard PE100 stLFR, SE500 stLFR, and SE1000 stLFR.

Results: Benchmarking against the Genome in a Bottle (GIAB) gold standard reveals distinct strengths of each configuration. Extended single-end reads (SE500 stLFR and SE1000 stLFR) significantly enhance SV detection, with SE1000 stLFR providing the best balance between precision and recall. In contrast, the shorter PE100 stLFR reads exhibit higher precision for SNP and INDEL calling, particularly within high-confidence regions, though with reduced performance in low-mappability contexts. To explore optimization strategies, we constructed hybrid libraries combining paired-end and single-end barcoded reads. These hybrid approaches integrate the complementary advantages of the different read types, consistently outperforming single libraries across small variant types and genomic contexts.
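The precision and recall trade-off reported in the Results can be made concrete with a small sketch. The Python below is only an illustration, not the authors' benchmarking pipeline: the `CallCounts` container, the function names, and the counts are all hypothetical, and a real comparison against the GIAB truth set would take its true-positive, false-positive, and false-negative counts from a dedicated benchmarking tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    """Counts from comparing a call set to a truth set (hypothetical container)."""
    true_positives: int   # calls that match a truth variant
    false_positives: int  # calls with no matching truth variant
    false_negatives: int  # truth variants that were not called


def precision(c: CallCounts) -> float:
    """Fraction of reported calls that are correct."""
    called = c.true_positives + c.false_positives
    return c.true_positives / called if called else 0.0


def recall(c: CallCounts) -> float:
    """Fraction of truth variants that were recovered."""
    truth = c.true_positives + c.false_negatives
    return c.true_positives / truth if truth else 0.0


def f1_score(c: CallCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up counts for illustration only; these are not results from the paper.
example = CallCounts(true_positives=90, false_positives=10, false_negatives=30)
print(
    f"precision={precision(example):.3f} "
    f"recall={recall(example):.3f} "
    f"F1={f1_score(example):.3f}"
)
```

F1, the harmonic mean of the two, is one common way to summarize a balance between precision and recall in a single number; the abstract does not state which summary metric the preprint itself uses.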
Conclusion: Collectively, our findings offer a robust comparative framework for evaluating stLFR sequencing strategies, highlight the promise of barcoded single-end reads for improving SV detection, and provide practical guidance for tailoring sequencing designs to the complexities of the genome.

Keywords: Linked-read sequencing, Sequencing simulation, SNP call, INDEL call, Structural variants call, stLFR

Additional Declarations: No competing interests reported.

Supplementary Files: stLFRsimulationpaper20250920revison4.pdf
Author affiliations
Can Luo, Vanderbilt University
Yichen Liu, Vanderbilt University
Han Liu, Vanderbilt University
Zhenmiao Zhang, University of California, San Diego
Lu Zhang, Hong Kong Baptist University
Brock Peters, Complete Genomics (United States)
Xin Maizie Zhou, Vanderbilt University (corresponding author)

Posted: January 12th, 2026
© Research Square 2026 | ISSN 2693-5015 (online)

