A machine learning framework to identify complex physicochemical features of B cell epitopes

Simranjit Grewal, Uwa Iyamu, Daniel Vinals, Catherine Mitran, Nidhi Hegde, and Stephanie Yanow

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6255613/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 02 Oct, 2025 in npj Systems Biology and Applications.

Abstract

During infection with Plasmodium falciparum in pregnancy, parasites express a unique virulence factor, VAR2CSA, that mediates binding of infected red blood cells to the placenta. A major goal in designing vaccines to protect pregnant women from malaria is to elicit antibodies to VAR2CSA. The challenge is that VAR2CSA is highly polymorphic, and identifying conserved epitopes is essential to elicit strain-transcending immunity.
Unexpectedly, a mouse monoclonal antibody, 3D10, raised against the unrelated Duffy binding protein from P. vivax (DBPII) cross-reacts with diverse alleles of VAR2CSA in vitro. To identify these potentially conserved epitopes in VAR2CSA, we designed a machine learning framework to analyse 3D10 reactivity to peptides derived from two alleles of VAR2CSA, DBPII, and PvEBP2 (negative control). We used decision trees and a panel of 430 features to extract features correlated with 3D10 binding. We analysed patterns of these features in the dataset and designed mutant peptides to test complex sequence motifs. Features associated with 3D10 reactivity were mapped onto predicted 3D structures of Plasmodium proteins and validated based on 3D10 reactivity to the recombinant antigens. While the array data identified certain linear epitopes, the framework predicted other epitopes that are conformational. With this approach, peptide array data can be mined to extract physicochemical properties of epitopes recognized by polyreactive antibodies.

Subject areas: Biological sciences/Biological techniques; Biological sciences/Computational biology and bioinformatics; Biological sciences/Immunology; Biological sciences/Microbiology; Health sciences/Medical research

Additional Declarations: No competing interests reported.
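As a rough illustration of the idea summarized in the abstract (scoring peptide features by how well a tree-style split separates reactive from non-reactive peptides), the sketch below ranks three hand-made features using a single threshold split chosen by Gini impurity, i.e. a depth-1 decision tree. It is not the authors' pipeline and does not reproduce the contents of REMMI.py: the peptides, labels, feature names, and residue groupings are all invented for illustration, and the three features merely stand in for the panel of 430 mentioned above.

```python
# Toy sketch only: peptides, labels, and features are hypothetical and are
# not taken from the preprint or its supplementary files.

HYDROPHOBIC = set("AVILMFWC")  # assumed grouping, for illustration
POSITIVE = set("KR")
NEGATIVE = set("DE")
AROMATIC = set("FWY")


def featurize(peptide):
    """Compute a few simple physicochemical descriptors for one peptide."""
    n = len(peptide)
    positives = sum(aa in POSITIVE for aa in peptide)
    negatives = sum(aa in NEGATIVE for aa in peptide)
    return {
        "hydrophobic_fraction": sum(aa in HYDROPHOBIC for aa in peptide) / n,
        "net_charge": positives - negatives,
        "aromatic_count": sum(aa in AROMATIC for aa in peptide),
    }


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, impurity_decrease) for the best 'value <= threshold' split."""
    parent = gini(labels)
    best = (None, 0.0)
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2.0
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        gain = parent - weighted
        if gain > best[1]:
            best = (threshold, gain)
    return best


def rank_features(peptides, labels):
    """Rank features by the impurity decrease of their best single split."""
    rows = [featurize(p) for p in peptides]
    ranking = []
    for name in rows[0]:
        values = [row[name] for row in rows]
        threshold, gain = best_split(values, labels)
        ranking.append((name, threshold, gain))
    return sorted(ranking, key=lambda item: item[2], reverse=True)


# Hypothetical toy data: 1 = reactive, 0 = non-reactive.
TOY_PEPTIDES = ["KKRWKFKR", "KRKSGKRT", "RKWKKYRK", "DEFWDEAV", "EDSTDEGS", "SDEEGDTS"]
TOY_LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    for name, threshold, gain in rank_features(TOY_PEPTIDES, TOY_LABELS):
        print(name, threshold, round(gain, 3))
```

On this toy set the labels were constructed so that charge separates the classes cleanly while the other two features do not, which is why a split-based ranking can single it out. A full decision tree repeats this split search recursively on each side of the chosen threshold.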
Supplementary Files
MicroarrayDataMousemAb3D10PEP20205061690PcDBP.xlsx
REMMI.py
MicroarrayDataMouseIgG3D1010ugDBPIIPVEBP2VAR2CSANF54andFCR3.xlsx
supplementarydata.docx
MicroarrayDataMouseIgG3D1010ug.xlsx

Editorial decision: Revision requested 06 May, 2025
Reviews received at journal 05 May, 2025
Reviewers agreed at journal 22 Apr, 2025
Reviews received at journal 20 Apr, 2025
Version 1 posted 18 Apr, 2025
Reviewers agreed at journal 07 Apr, 2025
Reviewers invited by journal 30 Mar, 2025
Editor assigned by journal 21 Mar, 2025
Submission checks completed at journal 20 Mar, 2025
First submitted to journal 18 Mar, 2025
Author affiliations: all six authors (Simranjit Grewal, Uwa Iyamu, Daniel Vinals, Catherine Mitran, Nidhi Hegde, Stephanie Yanow) are at the University of Alberta; Stephanie Yanow is the corresponding author.
Preprint: https://doi.org/10.21203/rs.3.rs-6255613/v1 (version 1, posted 18 April 2025; CC BY 4.0, https://creativecommons.org/licenses/by/4.0/).
Version of record: https://doi.org/10.1038/s41540-025-00583-1 (npj Systems Biology and Applications, published 2 October 2025).