KinForm: Kinetics-Informed Feature Optimised Representation Models for Enzyme kcat and KM Prediction

Saleh Alwer, Ronan M T Fleming

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7990856/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 28 Mar, 2026. Read the published version in npj Systems Biology and Applications.
Version 1. You are reading this latest preprint version.

Abstract

Kinetic parameters such as the turnover number (kcat) and Michaelis constant (KM) are essential for modelling enzymatic activity, but experimental data remains limited in scale and diversity. Previous methods for predicting enzyme kinetics typically use mean-pooled residue embeddings from a single protein language model to represent the protein.
We present KinForm, a machine learning framework designed to improve predictive accuracy and generalisation for kinetic parameters by optimising protein feature representations. KinForm combines several residue-level embeddings (Evolutionary Scale Modeling Cambrian, Evolutionary Scale Modeling 2, and ProtT5-XL-UniRef50), taken from empirically selected intermediate transformer layers, and applies weighted pooling based on per-residue binding-site probability. To counter the resulting high dimensionality, we apply principal component analysis (PCA) to the concatenated protein features, and we rebalance the training data via a similarity-based oversampling strategy. KinForm outperforms baseline methods on two benchmark datasets. Improvements are most pronounced in low sequence similarity bins. We observe improvements from binding-site probability pooling, intermediate-layer selection, PCA, and oversampling of low-identity proteins. We also find that removing sequence overlap between folds provides a more realistic evaluation of generalisation and should be the standard over random splitting when benchmarking kinetic prediction models.

Subject areas: Biological sciences/Computational biology and bioinformatics; Physical sciences/Mathematics and computing

Additional Declarations: No competing interests reported.
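The pooling and reduction steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the exact weighting formula, so the normalised weighted average, the fallback to mean pooling, the function names `weighted_pool` and `pca_reduce`, and the toy dimensions are all assumptions made for illustration.

```python
import numpy as np


def weighted_pool(residue_emb, site_prob, eps=1e-8):
    """Collapse (L, D) residue embeddings to a (D,) vector.

    Each residue is weighted by its binding-site probability; normalising
    the weights to sum to 1 is an assumption of this sketch.
    """
    emb = np.asarray(residue_emb, dtype=float)
    w = np.asarray(site_prob, dtype=float)
    if emb.ndim != 2 or w.shape != (emb.shape[0],):
        raise ValueError("expected (L, D) embeddings and (L,) probabilities")
    total = w.sum()
    if total < eps:
        # No residue carries weight: fall back to plain mean pooling.
        return emb.mean(axis=0)
    return (w / total) @ emb


def pca_reduce(features, n_components):
    """Project (N, D) features onto their top principal components via SVD."""
    x = np.asarray(features, dtype=float)
    centred = x - x.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


# Toy data: 6 proteins, two hypothetical embedding models (dims 8 and 5).
rng = np.random.default_rng(0)
proteins = []
for _ in range(6):
    length = int(rng.integers(20, 40))
    prob = rng.random(length)              # stand-in binding-site probabilities
    emb_a = rng.normal(size=(length, 8))   # stand-in for one language model
    emb_b = rng.normal(size=(length, 5))   # stand-in for another
    pooled = np.concatenate([weighted_pool(emb_a, prob),
                             weighted_pool(emb_b, prob)])
    proteins.append(pooled)

reduced = pca_reduce(np.stack(proteins), n_components=3)
print(reduced.shape)
```

In a real evaluation the PCA projection would normally be fitted on the training folds only and then applied to held-out proteins, so that no information leaks from test sequences into the features.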
Supplementary Files: supplementarymaterials.pdf

Version 1
Editorial decision: Revision requested 12 Dec, 2025
Reviews received at journal 23 Nov, 2025
Reviews received at journal 16 Nov, 2025
Reviewers agreed at journal 12 Nov, 2025
Reviewers agreed at journal 12 Nov, 2025
Reviewers agreed at journal 10 Nov, 2025
Reviewers agreed at journal 07 Nov, 2025
Reviewers invited by journal 07 Nov, 2025
Editor assigned by journal 07 Nov, 2025
Submission checks completed at journal 07 Nov, 2025
First submitted to journal 30 Oct, 2025
Author affiliations: Saleh Alwer, University of Galway; Ronan M T Fleming (corresponding author), University of Galway
Version 1 posted: November 18th, 2025
Version of record: https://doi.org/10.1038/s41540-026-00692-5 (npj Systems Biology and Applications, published March 28th, 2026)