scCobra: Contrastive cell embedding learning with domain-adaptation for single-cell data integration and harmonization

Jun Ding, Bowen Zhao, Dong-qing Wei, Yi Xiong

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4410408/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 13 Feb, 2025; read the published version in Communications Biology. Version 1 posted; this is the latest preprint version.

Abstract

The rapid development of single-cell technologies has underscored the need for more effective methods in the integration and harmonization of single-cell sequencing data. The prevalent challenge of batch effects, resulting from technical and biological variations across studies, demands accurate and reliable solutions for data integration.
Traditional tools are often limited both by their reliance on assumptions about gene expression distributions and by over-correction, which is particularly common in methods based on anchor alignments. Here we introduce scCobra, a deep neural network tool designed specifically to address these challenges. By leveraging a deep generative model that combines a contrastive neural network with domain adaptation, scCobra effectively mitigates batch effects and minimizes over-correction without depending on gene expression distribution assumptions. Additionally, scCobra enables online label transfer across datasets with batch effects, facilitating the continuous integration of new data without retraining, and offers features for batch effect simulation and advanced multi-omic batch integration. These capabilities make scCobra a versatile data integration and harmonization tool for achieving accurate and insightful biological interpretations from complex datasets.

Subject areas: Biological sciences/Computational biology and bioinformatics/Computational models; Biological sciences/Computational biology and bioinformatics/Machine learning; Biological sciences/Computational biology and bioinformatics/Data integration

Keywords: Single-cell, Data integration, Batch correction, Contrastive learning, Domain adaptation, Generative adversarial, Generative neural network, Label transfer, Batch simulation

Additional Declarations
There is NO Competing Interest.

Supplementary Files
scCobraSupplementary.pdf
Author information
Jun Ding (corresponding author), McGill University, https://orcid.org/0000-0001-5183-6885
Bowen Zhao, McGill University
Dong-qing Wei, Shanghai Jiao Tong University, https://orcid.org/0000-0003-4200-7502
Yi Xiong, Shanghai Jiao Tong University, https://orcid.org/0000-0003-2910-6725

Preprint posted: August 22nd, 2024
Version of record: https://doi.org/10.1038/s42003-025-07692-x (Communications Biology, published February 13th, 2025)