AOGMMNC: Adaptive and Robust General-Purpose Clustering for Data Partition

Shuping Sun, Yizhuo Zhang, Guangyu Liu, Shengmei Mo, Jinbo Chen, and Yaonan Tong

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7809379/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review (Version 1)

Abstract

Gaussian Mixture Model (GMM) combined with Spectral Clustering (SC) is an innovative clustering methodology that has been successfully applied to various clustering problems, particularly those involving complex shapes and nonlinear structures. However, several challenges arise in determining the optimal GMM and relating it to SC: (1) determining the optimal GMM can be costly, as it requires evaluating all models across a wide range of parameters; (2) the random initialization of the Expectation-Maximization (EM) algorithm may produce unstable results; (3) the efficiency of SC relies heavily on the adjacency matrix generated by the GMM; and (4) clustering becomes particularly difficult when the optimal GMM consists of only one mixture component. To tackle these challenges, we first implement a modified incremental GMM combined with the EM algorithm to determine the optimal number of mixture components, allowing the model to fit the dataset adaptively. Next, we propose a novel initialization method for the EM algorithm, called KGMC, which optimizes the GMM based on entropy-penalized maximum likelihood. Furthermore, we introduce a revised adjacency matrix (Α) and combine it with the fast algorithm for solving the normalized cut (FCD) to merge the components of the optimal GMM for data partitioning. Additionally, we propose a probability partition-based multi-cluster concept to handle clustering tasks in which the optimal GMM has only one mixture component. Rigorous comparisons with general and specialized clustering methods on simulated and real-world datasets consistently demonstrate the high performance of our clustering algorithm across all tested datasets.

Subject areas: Scientific community and society/Scientific community/Research data/Databases/Protein databases; Biological sciences/Neuroscience/Visual system/Pattern vision
Keywords: GMM, SC, EM, KGMC, A, FCD

Additional Declarations
There is NO Competing Interest.
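The abstract's idea of merging GMM components through a graph cut can be illustrated with a minimal sketch. It is not the authors' method: the affinity below is the Bhattacharyya coefficient between 1-D Gaussian components (an assumption standing in for the revised adjacency matrix, whose definition is not given here), and the two-way cut uses the standard eigenvector relaxation of the normalized cut rather than the FCD algorithm. All function names and the toy parameters are hypothetical.

```python
import numpy as np

def bhattacharyya_affinity(means, variances):
    """Pairwise Bhattacharyya coefficients between 1-D Gaussian components.

    Assumed stand-in for an adjacency matrix over mixture components.
    """
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    var_sum = v[:, None] + v[None, :]
    scale = np.sqrt(2.0 * np.sqrt(v[:, None] * v[None, :]) / var_sum)
    affinity = scale * np.exp(-((m[:, None] - m[None, :]) ** 2) / (4.0 * var_sum))
    np.fill_diagonal(affinity, 0.0)  # no self-loops in the component graph
    return affinity

def normalized_cut_bipartition(affinity):
    """Two-way normalized cut via the second eigenvector of the normalized Laplacian."""
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    laplacian = np.eye(len(degree)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    fiedler = d_inv_sqrt * eigvecs[:, 1]    # map back to the generalized eigenvector
    return (fiedler > 0).astype(int)        # sign pattern gives the two groups

def assign_points(x, means, variances, weights, component_groups):
    """Label each point with the group of its most probable mixture component."""
    x = np.asarray(x, dtype=float)[:, None]
    m = np.asarray(means, dtype=float)[None, :]
    v = np.asarray(variances, dtype=float)[None, :]
    w = np.asarray(weights, dtype=float)[None, :]
    log_density = np.log(w) - 0.5 * np.log(2.0 * np.pi * v) - (x - m) ** 2 / (2.0 * v)
    return component_groups[np.argmax(log_density, axis=1)]

# Toy mixture (made-up parameters): two pairs of strongly overlapping components.
means = [0.0, 1.0, 10.0, 11.0]
variances = [1.0, 1.0, 1.0, 1.0]
weights = [0.25, 0.25, 0.25, 0.25]

groups = normalized_cut_bipartition(bhattacharyya_affinity(means, variances))
labels = assign_points([-0.5, 0.8, 9.7, 11.2], means, variances, weights, groups)
```

In this toy case the components centered at 0 and 1 end up in one group and those at 10 and 11 in the other, so the four sample points split into two clusters even though the mixture has four components. Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector.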
Author affiliations
Shuping Sun (corresponding author), Hunan Institute of Science and Technology
Yizhuo Zhang, Hunan Institute of Science and Technology
Guangyu Liu, Hunan Institute of Science and Technology
Shengmei Mo, Hunan Institute of Science and Technology
Jinbo Chen, Nanyang Institute of Technology
Yaonan Tong, Hunan Institute of Science and Technology

Posted: October 22nd, 2025
Journal: Nature Communications (Nature Portfolio)