Objective over Architecture: Fraud Detection Under Extreme Imbalance in Bank Account Opening

doi:10.21203/rs.3.rs-8303897/v1

2025 · doi:10.21203/rs.3.rs-8303897/v1

preprint OA: closed

Research Article · Research Square

Wenxi Sun, Qiannan Shen, Yijun Gao, Qinkai Mao, Tongsong Qi, Shuo Xu

This is a preprint; it has not been peer reviewed by a journal.

This work is licensed under a CC BY 4.0 License.

Status: Posted. Version 1 posted.

Abstract

Fraud in financial services, especially account opening fraud, poses major operational and reputational risks. Static rules struggle to adapt to evolving tactics, missing novel patterns and generating excessive false positives. Machine learning promises adaptive detection, but deployment faces severe class imbalance: in the NeurIPS 2022 BAF Base benchmark used here, fraud prevalence is 1.10%. Standard metrics (accuracy, f1_weighted) can look strong while doing little for the minority class.
We compare logistic regression, SVM (RBF), Random Forest, LightGBM, and a GRU model on N=1,000,000 accounts under a unified preprocessing pipeline. All models are trained to minimize their loss function, while configurations are selected on a stratified development set using validation f1_weighted. For the four classical models, class weighting in the loss (class_weight in {None, 'balanced'}) is treated as a hyperparameter and tuned. Similarly, the GRU is trained with a fixed class-weighted cross-entropy loss that up-weights fraud cases. This ensures that both model families leverage weighted training objectives, while their final hyperparameters are consistently selected by the f1_weighted metric. Despite similar AUCs and aligned feature importance across families, the classical models converge to high-precision, low-recall solutions (1-6% fraud recall), whereas the GRU recovers 78% recall at 5% precision (AUC = 0.8800). Under extreme imbalance, objective choice and operating point matter at least as much as architecture.

Keywords: fraud detection, bank account opening fraud, imbalanced classification, precision-recall trade-off, gated recurrent unit

Additional Declarations

The authors declare no competing interests.
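The abstract's claim that accuracy and f1_weighted can look strong while doing little for the minority class can be checked with a short calculation. The sketch below is illustrative only and is not the authors' code. It assumes the reported rates are applied to a population of 1,000,000 accounts at 1.10% prevalence (the abstract does not state the size of the evaluation split), and it takes weighted F1 to be the support-weighted mean of the per-class F1 scores. The 78% recall at 5% precision point comes from the abstract; the "low_recall" point (3% recall, 60% precision) is a hypothetical stand-in for the classical models, since the abstract gives only a 1-6% recall range and no precision value. The function names are also hypothetical.

```python
def counts_from_rates(n, prevalence, recall, precision):
    """Convert prevalence, recall and precision into confusion-matrix counts."""
    frauds = n * prevalence
    tp = recall * frauds
    fn = frauds - tp
    # Flagged cases = tp / precision; with no true positives we assume nothing is flagged.
    fp = tp / precision - tp if tp > 0 else 0.0
    tn = (n - frauds) - fp
    return tp, fp, fn, tn


def f1(tp, fp, fn):
    """F1 for one class, given that class's true/false positives and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(tp, fp, fn, tn):
    """Accuracy, fraud-class F1 and support-weighted F1 from confusion counts."""
    n = tp + fp + fn + tn
    f1_fraud = f1(tp, fp, fn)
    f1_legit = f1(tn, fn, fp)  # same formula with "legitimate" as the positive class
    f1_weighted = ((tp + fn) * f1_fraud + (tn + fp) * f1_legit) / n
    return {
        "accuracy": (tp + tn) / n,
        "f1_fraud": f1_fraud,
        "f1_weighted": f1_weighted,
    }


N, PREVALENCE = 1_000_000, 0.011  # from the abstract; using all N accounts is an assumption

# (recall, precision) per scenario
scenarios = {
    "all_legit": (0.00, 1.00),    # never flags fraud; precision value is unused
    "low_recall": (0.03, 0.60),   # HYPOTHETICAL classical-style operating point
    "high_recall": (0.78, 0.05),  # GRU operating point reported in the abstract
}

results = {
    name: summarize(*counts_from_rates(N, PREVALENCE, recall, precision))
    for name, (recall, precision) in scenarios.items()
}

for name, metrics in results.items():
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

Under these assumptions, both the predictor that never flags fraud and the low-recall operating point score higher on accuracy and f1_weighted than the high-recall operating point, even though the latter catches far more fraud. This is consistent with the abstract's argument that selecting configurations by f1_weighted can favor high-precision, low-recall solutions.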
Author affiliations

Wenxi Sun, Johns Hopkins University
Qiannan Shen, Boston University (ORCID: https://orcid.org/0009-0007-9492-1773)
Yijun Gao, Johns Hopkins University
Qinkai Mao, Baruch College
Tongsong Qi, Stevens Institute Of Technology
Shuo Xu, University of California San Diego (corresponding author)

Posted: December 9th, 2025 · Research Square · ISSN 2693-5015 (online)

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00