Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model

preprint OA: closed
AI-generated deep summary by claude@2026-07, 2026-07-03 · read from full text

The preprint presents RegionAMP, a unified deep learning framework for residue-level annotation of antimicrobial peptide (AMP) precursor proteins. It targets a limitation of sequence-level classifiers: they cannot locate the functional AMP domain within a full-length open reading frame product that also contains other regions, such as signal peptides. Starting from the pre-trained ESM-2 protein language model, the authors apply a two-stage fine-tuning strategy: the first stage learns sequence patterns from isolated fragments (signal, antimicrobial, neutral functions), and the second adds a Conditional Random Field (CRF) decoding layer to capture contextual dependencies and boundary transitions across full-length proteins. RegionAMP is reported to reach an MCC of 0.92 and a Recall_M of 0.93 on an imbalanced independent test set. On a second, extremely imbalanced independent dataset (2,296 antimicrobial residues among 46,442,400 total residues), it is reported to recover 2,127 of the antimicrobial residues with an average IoU of 0.9528. The paper is a research preprint and explicitly notes that it has not been peer reviewed by a journal. The paper does not explicitly discuss endometriosis or adenomyosis; it was included in the corpus via a keyword match related to antimicrobial peptides.
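The role of the CRF decoding layer can be illustrated with a minimal Viterbi sketch. This is not the authors' implementation: the label names follow the summary above, while the emission scores, the fixed switch penalty, and the function name `viterbi` are hypothetical stand-ins for quantities the model would learn. The point is only that a transition term lets the decoder overrule an isolated noisy residue inside an otherwise contiguous region.

```python
# Toy Viterbi decoder over three residue labels (assumed names).
LABELS = ["signal", "amp", "neutral"]


def viterbi(emissions, transitions, labels=LABELS):
    """Return the highest-scoring label path.

    emissions:   list of dicts, one per residue, label -> score
    transitions: dict, (prev_label, next_label) -> score
    """
    if not emissions:
        return []
    # best[t][y] is the best score of any path ending in label y at residue t.
    best = [{y: emissions[0][y] for y in labels}]
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, y)])
            scores[y] = best[-1][prev] + transitions[(prev, y)] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Hypothetical scores: staying in a region is free, switching costs 1.5.
toy_transitions = {(p, y): (0.0 if p == y else -1.5) for p in LABELS for y in LABELS}
toy_emissions = [
    {"signal": 2.0, "amp": 0.0, "neutral": 0.0},
    {"signal": 2.0, "amp": 0.0, "neutral": 0.0},
    {"signal": 0.0, "amp": 2.0, "neutral": 0.0},
    {"signal": 0.0, "amp": 0.5, "neutral": 1.0},  # noisy residue inside the AMP region
    {"signal": 0.0, "amp": 2.0, "neutral": 0.0},
]

independent = [max(LABELS, key=lambda y: e[y]) for e in toy_emissions]
decoded = viterbi(toy_emissions, toy_transitions)
print(independent)  # per-residue argmax breaks the AMP region at the noisy residue
print(decoded)      # joint decoding keeps the AMP region contiguous
```

In this toy case the per-residue argmax labels the fourth residue as neutral, while the joint decoding keeps it inside the AMP region because two extra label switches cost more than the small emission gain.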

Read from the paper's body, not the abstract. Not a substitute for reading the paper. No clinical advice.

Full text 25,800 characters · extracted from preprint-html
Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model | Research Square
Research Article
Peilin Xie, Xingchen Liu, Lantian Yao, Zhihao Zhao, Anming Yang, and 8 more
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8208819/v1
This work is licensed under a CC BY 4.0 License
Status: Posted. Version 1 posted.
Abstract
The escalating crisis of antimicrobial resistance (AMR) urgently demands novel therapeutic agents, positioning antimicrobial peptides (AMPs), key effectors of innate immunity, as highly promising candidates.
While direct genomic mining provides a powerful route for discovery, conventional sequence-level classifiers face a fundamental methodological bottleneck: they are inadequate for analyzing full Open Reading Frame (ORF) translation products (precursor proteins) because they fail to identify and precisely locate the functional AMP domain within sequences that also contain other regions like signal peptides. To overcome this limitation and enable fine-grained locating, we developed RegionAMP, a unified deep learning framework for accurate residue-level annotation of AMP precursors. RegionAMP leverages the pre-trained ESM-2 protein language model, adapting it through a meticulously designed two-stage fine-tuning strategy. The initial stage learns the intrinsic sequence patterns of isolated functional fragments (signal, antimicrobial, neutral functions). Crucially, the second stage integrates a Conditional Random Field (CRF) decoding layer, enabling the model to learn contextual dependencies and inter-region transitions within full-length proteins, thereby achieving robust boundary delineation. The final architecture (PLM-CRF) is highly effective for this sequence labeling task. RegionAMP exhibits exceptional performance on a challenging, imbalanced independent test set, achieving an MCC of 0.92, indicating strong discriminative performance. The recall for the critical antimicrobial peptide sites (\(Recall_M\)) also reached 0.93. Feature space analysis using t-SNE confirms the model’s effective differentiation of AMP, signal peptide, and neutral sites into distinct clusters. Most compellingly, on an independent and extremely imbalanced test dataset containing only 2,296 antimicrobial residues within 46,442,400 total residues, RegionAMP successfully recovered 2,127 true antimicrobial residues, achieving an impressive average Intersection over Union (IoU) of 0.9528.
This high IoU definitively validates the model’s capacity for precise locating and boundary detection of the complete AMP domain. This work successfully demonstrates robust, region-specific AMP identification directly from precursor protein sequences.
Keywords: Antimicrobial Peptides, Genomic Mining, Protein Language Model, Residue-level annotation
Additional Declarations: No competing interests reported.
Supplementary Files: SUPPLEMENTARYMATERIALS.docx
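The counts quoted in the abstract can be turned into a short worked example. The sketch below is illustrative only: the helper names and the example interval are hypothetical, and the interval-based IoU is an assumption about how a per-region IoU might be computed, since the text available here does not spell out the preprint's exact definition. The reported average IoU of 0.9528 cannot be derived from the residue counts alone and is not reproduced here.

```python
def recall(true_positive, actual_positive):
    """Fraction of actual positives that were recovered."""
    return true_positive / actual_positive


def interval_iou(pred, true):
    """IoU of two half-open residue intervals given as (start, end)."""
    inter = max(0, min(pred[1], true[1]) - max(pred[0], true[0]))
    union = (pred[1] - pred[0]) + (true[1] - true[0]) - inter
    return inter / union if union else 0.0


# Counts reported in the abstract.
amp_residues = 2_296
total_residues = 46_442_400
recovered = 2_127

# Residue-level recall implied by the reported counts (roughly 0.926).
residue_recall = recall(recovered, amp_residues)

# Share of residues that are antimicrobial (roughly 5 in 100,000), which is
# why plain accuracy would be uninformative on this dataset.
positive_fraction = amp_residues / total_residues

# Hypothetical single region: true AMP domain at [25, 60), predicted [27, 60).
example_iou = interval_iou((27, 60), (25, 60))

print(f"residue recall:    {residue_recall:.4f}")
print(f"positive fraction: {positive_fraction:.2e}")
print(f"example IoU:       {example_iou:.4f}")
```

With positives this rare, a model that labels every residue as non-antimicrobial would still score above 99.99% accuracy, which is presumably why the abstract reports recall, MCC, and IoU.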
Article metadata
Identity: rs-8208819 (version 1)
Article type: Research Article
Posted: December 11th, 2025
DOI: https://doi.org/10.21203/rs.3.rs-8208819/v1
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Manuscript file: RegionAMP.pdf
Supplement: SUPPLEMENTARYMATERIALS.docx
Authors and affiliations:
Peilin Xie, Chinese University of Hong Kong, Shenzhen
Xingchen Liu, The Chinese University of Hong Kong, Shenzhen
Lantian Yao, Xiamen University
Zhihao Zhao, Chinese University of Hong Kong, Shenzhen
Anming Yang, Peking University
Jiahui Guan, The University of Hong Kong
Zijun Jiao, Peking University
Zhihong Liu, Shenzhen Bay Laboratory
Junwen Wang, The University of Hong Kong
Tzong-Yi Lee, National Yang Ming Chiao Tung University
Zigang Li, Peking University
Bingyu Cui, Chinese University of Hong Kong, Shenzhen
Ying-Chih Chiang, Chinese University of Hong Kong, Shenzhen (corresponding author)

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

Ask this paper AI returns verbatim quotes from the full text · source: preprint-html

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so.

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2025) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc
last seen: 2026-05-20T01:45:00.602351+00:00