Cardinality estimation based on QDSPN for embedded databases under dynamic workload

Research Article | Research Square

Xiaoou Ding, Hongbin Su, Ziming Shen, Yiming Guan, Hongzhi Wang

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-3901354/v1

This work is licensed under a CC BY 4.0 License.

Status: Posted. Version 1 posted.

Abstract

Cardinality estimation has been a pivotal and enduring research focus within database query optimization. While significant advancements have been made in estimating cardinalities for both individual tables and complex multi-table joins, there remains a notable gap in research pertaining to embedded database scenarios. Embedded databases are typically characterized by limited resources and a preponderance of dense, short-term hotspot queries. As a result, cardinality estimation within the constraints of embedded databases poses additional complexities and challenges.
In this paper, we introduce a novel Query-driven Sum-Product Network (QDSPN), which leverages the capabilities of sum-product networks (SPNs) to learn from historical data and adapt to dynamic workload variations. This approach effectively mitigates the inherent challenges of SPNs, such as false cluster collisions and independence assumption errors, particularly under conditions of strongly correlated data.

Furthermore, we propose a two-stage query clustering framework tailored for dynamic workload environments. This framework serves to guide the structural configuration of the sum-product network, enhancing its adaptability and efficiency. We conduct extensive experiments to validate the performance of QDSPN under dynamic workloads. The experimental results demonstrate the evident advantages of the proposed QDSPN, and highlight its potential for widespread adoption in embedded database systems.

Keywords: Cardinality estimation, Embedded database, Sum-product network, Dynamic workload

Additional Declarations: No competing interests reported.
Author affiliation: Harbin Institute of Technology (all authors). Corresponding author: Hongzhi Wang.

Posted: February 7th, 2024.