Deep learning for enzyme kcat prediction: what works, what doesn't, and why?

Liangzhen Zheng

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9154046/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review. Version 1 posted.

Abstract

Deep learning models for turnover number (kcat) prediction have been widely reported in recent years, yet their practical utility in enzyme engineering remains unclear. We benchmark five state-of-the-art models using multi-dimensional tests that go beyond standard regression metrics. Training data are severely biased toward oxidoreductases and ATP-like substrates, with sparse coverage of sequence and chemical space. Across diverse independent test sets, including temporal hold-out data, novel enzymes, deep mutational scans, and enzyme-inhibitor pairs, all models failed to predict absolute kcat accurately.
Moreover, their ranking capability collapsed for sequences dissimilar to the training data. Crucially, the models exhibit a striking asymmetry: they respond to active-site disruptions but ignore substrate chemistry, and cannot distinguish substrates from products or inhibitors. Experimental validation on 98 PjxA xylanase mutants proposed by these models confirms low predictive utility (global correlations <0.3, positive rate ≤10%). These findings indicate that current models behave as pattern-recognition-like predictors rather than mechanism-aware ones: current kcat predictors lack the chemical awareness required for reliable industrial application, which highlights a critical gap in the field.

Subject areas: Biological sciences/Biochemistry/Biocatalysis; Biological sciences/Biochemistry/Enzymes; Biological sciences/Computational biology and bioinformatics/Machine learning

Keywords: Enzyme catalysis, deep learning, turnover number, enzyme kinetics

Additional Declarations: There is NO Competing Interest.
Corresponding author: Liangzhen Zheng, Shanghai Zelixir Biotech Co Ltd.
Posted: April 9th, 2026 (Version 1). In review at: Nature Catalysis.
© Research Square 2026 | ISSN 2693-5015 (online)