Enhancing Materials Discovery with Valence Constrained Design in Generative Modeling

Mingda Li, Mouyang Cheng, Weiliang Luo, Hao Tang, Bowen Yu, Yongqiang Cheng, and 3 more

Research Square preprint. This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-7228011/v1
License: CC BY 4.0
Status: Under Review (Version 1)

Abstract

Diffusion-based deep generative models have emerged as powerful tools for inverse materials design. Yet many existing approaches overlook essential chemical constraints such as oxidation state balance, which can lead to chemically invalid structures. Here we introduce CrysVCD (Crystal generator with Valence-Constrained Design), a modular framework that integrates chemical rules directly into the generative process. CrysVCD first employs a transformer-based elemental language model to generate valence-balanced compositions, then uses a diffusion model to generate crystal structures. The valence constraint makes chemical valence checking orders of magnitude more efficient than purely data-driven approaches that rely on post-screening. When fine-tuned on stability metrics, CrysVCD achieves 85% thermodynamic stability and 68% phonon stability. Moreover, CrysVCD supports conditional generation of functional materials, enabling discovery of candidates such as high-thermal-conductivity semiconductors and high-κ dielectric compounds. Designed as a general-purpose plugin, CrysVCD can be integrated into diverse generative pipelines to promote chemical validity, offering a reliable, scientifically grounded path for materials discovery.

Subject areas
Physical sciences/Materials science/Theory and computation/Computational methods
Physical sciences/Materials science/Materials for energy and catalysis

Additional Declarations
There is a potential competing interest. The authors declare that a patent application has been filed relating to the material described in this manuscript.

Supplementary Files
CrysVCDSI.pdf (Supplementary Information)
Authors and affiliations (from the page metadata)
Mingda Li (corresponding author), MIT, ORCID 0000-0002-7055-6368
Mouyang Cheng, MIT, ORCID 0009-0001-7014-2464
Weiliang Luo, MIT, ORCID 0009-0005-6150-2797
Hao Tang, Massachusetts Institute of Technology, ORCID 0000-0002-6877-0226
Bowen Yu, MIT
Yongqiang Cheng, Oak Ridge National Lab, ORCID 0000-0002-3263-4812
Weiwei Xie, Michigan State University
Ju Li, Massachusetts Institute of Technology, ORCID 0000-0002-7841-8058
Heather Kulik, Massachusetts Institute of Technology, ORCID 0000-0001-9342-0191

Posted: September 3rd, 2025
Journal (from the page metadata): Nature Computational Science (Nature Portfolio)
Files: CrysVCDmain.pdf (Article File), CrysVCDSI.pdf (Supplementary Information)
Platform: Research Square, ISSN 2693-5015 (online)
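The oxidation state balance mentioned in the abstract can be made concrete with a small example. The Python sketch below is not the CrysVCD method (the preprint builds the constraint into composition generation); it only illustrates the kind of charge-neutrality check that a post-screening step would run on a candidate composition. The oxidation-state table and the function names `balanced_assignments` and `is_valence_balanced` are hypothetical, chosen for illustration, and the table is deliberately incomplete.

```python
from itertools import product

# Illustrative, deliberately incomplete table of common oxidation states.
# This is an assumption made for the example, not data from the preprint.
COMMON_OXIDATION_STATES = {
    "Li": (1,),
    "Na": (1,),
    "Mg": (2,),
    "Al": (3,),
    "Ti": (2, 3, 4),
    "Fe": (2, 3),
    "O": (-2,),
    "S": (-2, 4, 6),
    "Cl": (-1,),
}


def balanced_assignments(composition):
    """Return every choice of one oxidation state per element with zero net charge.

    `composition` maps element symbols to integer atom counts,
    for example {"Ti": 1, "O": 2} for TiO2.
    """
    elements = sorted(composition)
    choices = [COMMON_OXIDATION_STATES[el] for el in elements]
    results = []
    # Brute-force enumeration: try each combination of allowed states.
    for states in product(*choices):
        total = sum(q * composition[el] for el, q in zip(elements, states))
        if total == 0:
            results.append(dict(zip(elements, states)))
    return results


def is_valence_balanced(composition):
    """True if at least one charge-neutral assignment exists."""
    return bool(balanced_assignments(composition))


if __name__ == "__main__":
    print(balanced_assignments({"Ti": 1, "O": 2}))
    print(is_valence_balanced({"Fe": 2, "O": 3}))
    print(is_valence_balanced({"Na": 1, "Cl": 2}))
```

This simplified check assigns a single oxidation state to each element, so it would reject mixed-valence compounds such as Fe3O4; a realistic screen would need to allow several states per element and a fuller table.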