META-DIFF: a k-mer-based pipeline that detects differentially abundant sequences in metagenomics whole genome sequencing | Research Square

Louis-Maël Guéguen, Alban Mathieu, Simon Pelletier (Centre hospitalier de l'Université Laval); Anthony Woo, Namita Misra, Magali Moreau, Olivier Perin (L'Oréal); Arnaud Droit (Centre hospitalier de l'Université Laval; corresponding author)

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8663341/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted January 29th, 2026.

Abstract
Traditional case-control metagenomic studies are constrained by their dependence on taxonomic and functional databases. Because annotation occurs before differential analysis, these studies are limited to known elements and analyze function and taxonomy separately. Although binning strategies have emerged to reconstruct genomes and mitigate this issue, they still require an assembly step, which prevents the use of all available sequencing data.
Here, we introduce META-DIFF, a pipeline that identifies differentially abundant k-mers independently of any prior annotation. From those k-mers, it reconstructs longer sequences, provides their biological context, and identifies the best set of unitigs for discriminating between conditions. In both taxonomy-centric and function-centric benchmarks, it showed high precision and robust reproducibility, and it behaved more conservatively than common univariate methods. META-DIFF was further validated on a real-world colorectal cancer dataset, where it both confirmed previously published findings and produced novel results. The pipeline can exploit all reads and identify differentially abundant elements, including unknown DNA, before annotation. Together with the guidelines provided, META-DIFF gives users substantial exploratory power to unravel microbiome changes.

Subject areas: Biological sciences/Computational biology and bioinformatics; Biological sciences/Microbiology
Keywords: metagenomics, shotgun, differential abundance, k-mer, unitigs
Declarations: No competing interests reported.
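The abstract describes the core idea only at a high level: count k-mers per sample, keep those whose abundance differs between conditions, and merge them into longer unitigs. The Python sketch below illustrates that idea under stated assumptions and is not META-DIFF's implementation. The differential step is a plain log2 fold-change threshold on mean counts-per-million (CPM), a hypothetical stand-in for whatever statistical test the pipeline actually uses; k-mers are counted on the forward strand only (no reverse-complement merging); and all function names (`count_kmers`, `log2_fold_changes`, `compact_unitigs`) are illustrative. Unitigs are built as maximal non-branching paths in the de Bruijn graph induced by the selected k-mers.

```python
import math
from collections import Counter

BASES = "ACGT"


def count_kmers(reads, k):
    """Count every k-mer across one sample's reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def log2_fold_changes(cases, controls, k, pseudocount=1.0):
    """Map each k-mer to log2(mean case CPM + pc) - log2(mean control CPM + pc).

    Each sample is a list of reads; CPM = counts per million k-mers in that sample.
    A fold-change threshold is a hypothetical stand-in for a real statistical test.
    """
    def cpm_profiles(samples):
        profiles = []
        for reads in samples:
            counts = count_kmers(reads, k)
            total = sum(counts.values()) or 1
            profiles.append({km: n / total * 1e6 for km, n in counts.items()})
        return profiles

    case_p, ctrl_p = cpm_profiles(cases), cpm_profiles(controls)
    all_kmers = set().union(*case_p, *ctrl_p)

    def group_mean(profiles, km):
        return sum(p.get(km, 0.0) for p in profiles) / len(profiles)

    return {
        km: math.log2(group_mean(case_p, km) + pseudocount)
        - math.log2(group_mean(ctrl_p, km) + pseudocount)
        for km in all_kmers
    }


def successors(km, kmers):
    return [km[1:] + b for b in BASES if km[1:] + b in kmers]


def predecessors(km, kmers):
    return [b + km[:-1] for b in BASES if b + km[:-1] in kmers]


def compact_unitigs(kmers):
    """Merge k-mers into unitigs: maximal non-branching paths of the de Bruijn graph."""
    kmers, visited, unitigs = set(kmers), set(), []

    def is_start(km):
        # A path starts where the in-degree is not 1, or the single predecessor branches.
        preds = predecessors(km, kmers)
        return len(preds) != 1 or len(successors(preds[0], kmers)) != 1

    def spell(path):
        # Overlapping k-mers share k-1 bases, so each extra k-mer adds one base.
        return path[0] + "".join(km[-1] for km in path[1:])

    for km in sorted(kmers):
        if km in visited or not is_start(km):
            continue
        path, cur = [km], km
        visited.add(km)
        while True:
            nxt = successors(cur, kmers)
            if len(nxt) != 1 or len(predecessors(nxt[0], kmers)) != 1 or nxt[0] in visited:
                break
            cur = nxt[0]
            path.append(cur)
            visited.add(cur)
        unitigs.append(spell(path))

    # Any k-mer still unvisited lies on an isolated cycle; break it at its smallest k-mer.
    for km in sorted(kmers - visited):
        if km in visited:
            continue
        path, cur = [km], km
        visited.add(km)
        while successors(cur, kmers)[0] not in visited:
            cur = successors(cur, kmers)[0]
            path.append(cur)
            visited.add(cur)
        unitigs.append(spell(path))
    return unitigs


if __name__ == "__main__":
    cases = [["ACGTTGCA", "ACGTTGCA"], ["ACGTTGCA"]]
    controls = [["GGGGGGGG"], ["GGGGGGGG", "GGGGGGGG"]]
    lfc = log2_fold_changes(cases, controls, k=3)
    case_enriched = {km for km, v in lfc.items() if v >= 1.0}
    print(compact_unitigs(case_enriched))  # ['ACGTTGCA']
```

In this toy example, the six case-only 3-mers collapse back into the single sequence they came from, while the control-enriched `GGG` is excluded by the sign of its fold change. Real analyses would use much longer k-mers, canonical (strand-independent) counting, and a proper statistical test with multiple-testing correction.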
Supplementary Files: ScientificReportsArticleMETADIFFsuppdata.docx

Editorial timeline (Scientific Reports):
Editorial decision: Revision requested, 05 May, 2026
Reviews received at journal, 04 May, 2026
Reviews received at journal, 20 Apr, 2026
Reviewers agreed at journal, 13 Apr, 2026
Reviewers agreed at journal, 12 Apr, 2026
Reviewers agreed at journal, 12 Apr, 2026
Reviewers agreed at journal, 04 Mar, 2026
Reviewers agreed at journal, 04 Mar, 2026
Reviewers invited by journal, 27 Jan, 2026
Editor invited by journal, 27 Jan, 2026
Editor assigned by journal, 22 Jan, 2026
Submission checks completed at journal, 22 Jan, 2026
First submitted to journal, 21 Jan, 2026
© Research Square 2026 | ISSN 2693-5015 (online)