Unlocking the genomic landscape for antimicrobial domain discovery with a two-stage progressive residue-level annotation model | Research Square

Research Article

Peilin Xie, Xingchen Liu, Lantian Yao, Zhihao Zhao, Anming Yang, Jiahui Guan, Zijun Jiao, Zhihong Liu, Junwen Wang, Tzong-Yi Lee, Zigang Li, Bingyu Cui, and Ying-Chih Chiang (corresponding author)

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8208819/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1, posted December 11th, 2025.

Abstract

The escalating crisis of antimicrobial resistance (AMR) urgently demands novel therapeutic agents, positioning antimicrobial peptides (AMPs), key effectors of innate immunity, as highly promising candidates.
While direct genomic mining provides a powerful route for discovery, conventional sequence-level classifiers face a fundamental bottleneck: they are ill-suited to full Open Reading Frame (ORF) translation products (precursor proteins), because they cannot identify and precisely locate the functional AMP domain within sequences that also contain other regions such as signal peptides. To overcome this limitation and enable fine-grained localization, we developed RegionAMP, a unified deep learning framework for accurate residue-level annotation of AMP precursors. RegionAMP adapts the pre-trained ESM-2 protein language model through a two-stage fine-tuning strategy. The first stage learns the intrinsic sequence patterns of isolated functional fragments (signal, antimicrobial, and neutral). The second stage adds a Conditional Random Field (CRF) decoding layer, enabling the model to learn contextual dependencies and inter-region transitions within full-length proteins and thereby delineate region boundaries robustly. The resulting architecture (PLM-CRF) is well suited to this sequence labeling task. On a challenging, imbalanced independent test set, RegionAMP achieves an MCC of 0.92, indicating strong discriminative performance, and a recall of 0.93 for the critical antimicrobial peptide sites (\(Recall_M\)). Feature space analysis using t-SNE shows that the model separates AMP, signal peptide, and neutral sites into distinct clusters. Most compellingly, on an independent and extremely imbalanced test dataset containing only 2,296 antimicrobial residues among 46,442,400 total residues, RegionAMP recovered 2,127 true antimicrobial residues and achieved an average Intersection over Union (IoU) of 0.9528.
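As a rough illustration of the two quantities quoted here, the following minimal Python sketch computes residue-level recall for one class and the IoU between a predicted and a true region. It is not the authors' evaluation code: the single-letter tags ("S" signal, "M" antimicrobial, "N" neutral), the half-open interval convention, and both function names are assumptions made for this example.

```python
from typing import List, Tuple


def span_iou(pred: Tuple[int, int], true: Tuple[int, int]) -> float:
    """IoU of two half-open residue intervals [start, end)."""
    # Overlap length is zero when the intervals do not intersect.
    inter = max(0, min(pred[1], true[1]) - max(pred[0], true[0]))
    union = (pred[1] - pred[0]) + (true[1] - true[0]) - inter
    return inter / union if union > 0 else 0.0


def residue_recall(pred_labels: List[str], true_labels: List[str],
                   target: str = "M") -> float:
    """Fraction of true `target` residues that were also predicted as `target`."""
    hits = sum(1 for p, t in zip(pred_labels, true_labels)
               if t == target and p == target)
    total = sum(1 for t in true_labels if t == target)
    return hits / total if total else 0.0


# Hypothetical precursor: true AMP domain at residues 22..59,
# predicted domain starting two residues early.
example_iou = span_iou((20, 60), (22, 60))

# Arithmetic on the counts reported in the abstract.
reported_recall = 2127 / 2296
```

With the counts reported in the abstract, 2,127 of 2,296 residues corresponds to a residue-level recall of about 0.926, which agrees with the rounded value of 0.93.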
This high IoU supports the model's capacity for precise localization and boundary detection of the complete AMP domain. This work demonstrates robust, region-specific AMP identification directly from precursor protein sequences.

Keywords: Antimicrobial Peptides, Genomic Mining, Protein Language Model, Residue-level annotation

Additional Declarations: No competing interests reported.

Supplementary Files: SUPPLEMENTARYMATERIALS.docx
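The role of the CRF decoding layer described in the abstract can be pictured with a small Viterbi decoder over three residue tags. This is a generic linear-chain CRF sketch, not RegionAMP's implementation: the tag names, the emission scores (which in a PLM-CRF would come from the fine-tuned language model head), and the transition penalties are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence

LABELS = ["S", "M", "N"]  # hypothetical tags: signal, antimicrobial, neutral


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring tag path.

    emissions[t][j]   : score of tag j at residue t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_labels = len(transitions)
    score = list(emissions[0])
    backptrs = []
    for emit in emissions[1:]:
        new_score, ptrs = [], []
        for j in range(n_labels):
            # Best previous tag for ending in tag j at this residue.
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            ptrs.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backptrs.append(ptrs)
    # Trace the best path backwards from the best final tag.
    path = [max(range(n_labels), key=lambda i: score[i])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]


# Toy emissions: residue 3 weakly prefers "N" inside an "M" stretch.
emissions = [[2, 0, 0], [2, 0, 0], [0, 2, 0],
             [0, 0.9, 1.0], [0, 2, 0], [0, 0, 2]]
# Staying in a tag is free; switching tags costs 1.5 (assumed values).
transitions = [[0.0 if i == j else -1.5 for j in range(3)] for i in range(3)]

decoded = [LABELS[k] for k in viterbi_decode(emissions, transitions)]
greedy = [LABELS[max(range(3), key=lambda j: e[j])] for e in emissions]
```

In this toy case the per-residue argmax (`greedy`) breaks the antimicrobial stretch at residue 3, while the decoded path keeps it contiguous. This is the kind of boundary smoothing that transition scores provide.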