Constructing Small Sample Datasets with Game Mixed Sampling and Improved Genetic Algorithm

Research Article

Bailin Zhu, Hongliang Wang, Mi Fan

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4127738/v1
This work is licensed under a CC BY 4.0 License
Status: Under Review
Version 1 posted 21 Mar, 2024

Abstract

In imbalanced data classification, datasets are usually balanced by adding samples to the minority classes, which enlarges the dataset and can lower data quality. Therefore, this paper proposes a hybrid method that combines game-based mixed sampling with an improved genetic algorithm to address imbalanced data classification with a small dataset. First, the framework uses game theory to establish balanced sampling methods and ratios that address the data imbalance.
Subsequently, it employs the SelectKBest technique to optimize feature selection. Finally, the improved genetic algorithm refines the sampling size and sample selection. In the feature encoding stage of the genetic algorithm, an ensemble learning method is adopted: K-nearest neighbor (KNN), decision tree (DT), support vector machine (SVM), random forest (RF), and AdaBoost classifiers are measured with the precision, F1 score, and Matthews correlation coefficient (MCC) performance indicators. This prevents premature convergence and optimizes over the entire solution space, thus enhancing data sampling quality. The minimum stable population size is determined with a sliding standard deviation approach. Empirical findings corroborate the efficacy of this approach in tackling challenges associated with imbalanced data classification, refining the sample space, and improving sample quality. The methodology demonstrates significant practical utility in improving classifier performance on imbalanced datasets.

Keywords: Imbalanced dataset, Game-based method, Mixed sampling, Genetic algorithm, Performance metric coding

Additional Declarations: No competing interests reported.
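The abstract does not spell out how the sliding standard deviation is applied, so the following is only a minimal sketch of one plausible reading: compute the standard deviation of a performance score over a sliding window of candidate sizes, and take the first size from which the score stays stable. The window length, the tolerance `tol`, the helper names, and the score values are all hypothetical and are not taken from the paper.

```python
from statistics import pstdev


def sliding_std(values, window):
    """Population standard deviation of each consecutive window of `values`."""
    if window < 2 or window > len(values):
        raise ValueError("window must be between 2 and len(values)")
    return [pstdev(values[i:i + window]) for i in range(len(values) - window + 1)]


def first_stable_index(values, window, tol):
    """Start index of the first window from which every later window has std <= tol.

    Returns None if the series never settles within the tolerance.
    """
    stable_from = None
    for i, s in enumerate(sliding_std(values, window)):
        if s <= tol:
            if stable_from is None:
                stable_from = i
        else:
            stable_from = None  # stability was broken, start over
    return stable_from


# Hypothetical data: one classifier score per candidate size (not from the paper).
sizes = [50, 100, 150, 200, 250, 300, 350, 400]
scores = [0.61, 0.70, 0.78, 0.80, 0.81, 0.80, 0.81, 0.80]

idx = first_stable_index(scores, window=3, tol=0.01)
min_stable_size = sizes[idx] if idx is not None else None
print(min_stable_size)  # prints 200 for this made-up series
```

In this sketch a smaller `tol` or a longer window makes the stability criterion stricter and pushes the selected size upward; the appropriate values would depend on how noisy the evaluation metric is.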
Editorial timeline
Editorial decision: Revision requested, 14 Apr, 2024
Reviews received at journal: 11 Apr, 2024 (3 entries)
Reviews received at journal: 31 Mar, 2024
Reviewers agreed at journal: 23 Mar, 2024
Reviewers agreed at journal: 22 Mar, 2024 (5 entries)
Reviewers invited by journal: 22 Mar, 2024
Editor assigned by journal: 19 Mar, 2024
Submission checks completed at journal: 19 Mar, 2024
First submitted to journal: 19 Mar, 2024
Author affiliation (all authors): Liaoning Petrochemical University
Corresponding author: Hongliang Wang
Submitted to: The Journal of Supercomputing (https://www.springer.com/journal/11227)