UIFraudNet a Hybrid Ensemble and Deep Learning Framework for Detecting Unemployment Insurance Fraud Using Multi-Signal DOL ETA Claims Data

doi:10.21203/rs.3.rs-9273630/v1

UIFraudNet a Hybrid Ensemble and Deep Learning Framework for Detecting Unemployment Insurance Fraud Using Multi-Signal DOL ETA Claims Data

2026 · doi:10.21203/rs.3.rs-9273630/v1

preprint OA: closed

Full text JSON View at publisher

Full text 114,384 characters · extracted from preprint-html · click to expand

UIFraudNet a Hybrid Ensemble and Deep Learning Framework for Detecting Unemployment Insurance Fraud Using Multi-Signal DOL ETA Claims Data | Research Square window.SnipcartSettings = { analytics: { enabled: false } }; (function() { var accessVector = localStorage.getItem('access_vector') || ''; window.dataLayer = window.dataLayer || []; if (accessVector) { window.dataLayer.push({ user: { profile: { profileInfo: { snid: accessVector } } } }); } })(); (function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src='https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);})(window,document,'script','dataLayer','GTM-K279D39R'); Browse Preprints In Review Journals COVID-19 Preprints AJE Video Bytes Research Tools Research Promotion AJE Professional Editing AJE Rubriq About Preprint Platform In Review Editorial Policies Our Team Advisory Board Help Center Sign In Submit a Preprint Cite Share Download PDF Research Article UIFraudNet a Hybrid Ensemble and Deep Learning Framework for Detecting Unemployment Insurance Fraud Using Multi-Signal DOL ETA Claims Data Rahul Raj This is a preprint; it has not been peer reviewed by a journal. https://doi.org/ 10.21203/rs.3.rs-9273630/v1 This work is licensed under a CC BY 4.0 License Status: Under Review Version 1 posted 12 You are reading this latest preprint version Abstract Unemployment insurance (UI) fraud represents one of the most costly forms of public benefit exploitation in the United States, with the Department of Labor (DOL) estimating improper payments exceeding $ 163 billion between fiscal years 2020 and 2023. The surge in fraudulent claims during the COVID-19 pandemic exposed systemic weaknesses in legacy rule-based detection systems, motivating a shift toward data-driven approaches. This paper presents UIFraudNet, a hybrid fraud-detection framework that combines gradient-boosted ensemble models with a bidirectional long short-term memory (BiLSTM) deep learning classifier, trained on synthetic claim records derived from published DOL Employment and Training Administration (ETA) statistical patterns, including the ETA 5159 report series. We construct a rich feature space of 52 engineered variables spanning claimant behavioral signals, geospatial anomalies, employer verification discrepancies, and temporal claim-filing sequences. On a held-out test partition comprising 418,732 claim records with a 6.4% fraud prevalence, UIFraudNet achieves an area under the receiver operating characteristic curve (AUROC) of 0.974, a precision of 0.913, a recall of 0.891, and an F1-score of 0.902, outperforming standalone XGBoost, LightGBM, and vanilla neural network baselines by margins of 3.1–9.4 percentage points in AUROC. Critically, our model reduces false-negative rates to 4.7%, a meaningful improvement over the 14.2% observed in current operational rule-based benchmarks. These results demonstrate the viability of hybrid ML–DL architectures for high-stakes public-sector fraud detection and offer a reproducible modeling pipeline adaptable to state-level workforce agency deployments. Unemployment insurance fraud detection Ensemble machine learning Bidirectional LSTM Gradient boosting Anomaly detection DOL/ETA claims data Imbalanced classification Figures Figure 1 1 Introduction Unemployment insurance programs serve as a critical economic stabilizer for displaced workers, providing temporary income replacement while beneficiaries seek re-employment. In the United States, the UI system is administered collaboratively by the federal Department of Labor and individual state workforce agencies, with benefit disbursement governed under Title III of the Social Security Act [ 1 ]. During periods of economic normalcy, fraud in UI systems—though non-trivial—remains at manageable levels, typically characterized by individuals misrepresenting earnings, employment status, or eligibility duration. The COVID-19 pandemic, however, fundamentally disrupted this equilibrium. The rapid legislative expansion of UI benefits under the CARES Act of 2020, including the Pandemic Unemployment Assistance (PUA) program, which for the first time extended benefits to gig workers, independent contractors, and the self-employed, created a vast attack surface for organized fraud networks [ 2 ]. Sophisticated criminal rings, including state-sponsored actors and domestic fraud syndicates, exploited weaknesses in identity verification pipelines, filing tens of thousands of claims using stolen personally identifiable information (PII) harvested from prior data breaches [ 3 ]. The DOL Office of Inspector General subsequently estimated that at least $ 45.6 billion of the more than $ 163 billion in total improper payments across fiscal years 2020–2023 were attributable to fraud, with actual losses likely substantially higher due to underreporting [ 4 ]. Traditional fraud detection in UI systems has relied on rule-based expert systems—deterministic threshold logic applied to a limited set of manually identified risk indicators. These systems, while interpretable and operationally straightforward, suffer from well-documented limitations: they are brittle against novel fraud patterns, cannot generalize beyond pre-enumerated rules, require constant manual recalibration as adversaries adapt, and produce unacceptably high false-negative rates when fraud schemes evolve rapidly [ 5 , 6 ]. Machine learning offers a fundamentally different paradigm: by learning discriminative patterns from large labeled datasets, ML models can identify complex, nonlinear fraud signals that no single rule can capture. This paper makes the following primary contributions: We introduce UIFraudNet, a hybrid fraud-detection architecture that fuses gradient-boosted ensemble models (XGBoost and LightGBM) with a BiLSTM sequence classifier to jointly leverage tabular claim features and temporal claim-filing behavior. We define and evaluate a curated feature set of 52 variables derived from DOL/ETA ETA 5159 statistical reporting patterns, incorporating behavioral, geospatial, identity, and temporal dimensions that have not been jointly studied in prior UI fraud literature. We conduct extensive benchmarking against five baselines—logistic regression, random forest, support vector machines (SVM), XGBoost standalone, and a multilayer perceptron (MLP)—demonstrating statistically significant improvements across all primary metrics. We present an ablation study quantifying the marginal contribution of each feature category and each model component, providing actionable guidance for practitioners implementing fraud detection within resource-constrained state workforce agencies. We release a full reproducible modeling pipeline, including synthetic data generation scripts grounded in published DOL/ETA aggregate statistics, to facilitate adoption and comparative benchmarking by the research community. The remainder of this paper is organized as follows. Section 2 surveys related work in financial fraud detection and public-benefit fraud. Section 3 describes the dataset construction and preprocessing pipeline. Section 4 presents the UIFraudNet architecture and training methodology. Section 5 details experimental results and statistical comparisons. Section 6 discusses practical implications and deployment considerations. Section 7 describes limitations and future research directions. Section 8 concludes. 2 Related Work 2.1 Fraud Detection in Financial Systems The machine learning literature on fraud detection has been shaped primarily by the banking and payments sector, where labeled transaction datasets are large and relatively accessible [ 7 ]. Seminal work by Dal Pozzolo et al. [ 8 ] established the effectiveness of random forests and gradient boosting on credit card transaction data under severe class imbalance, highlighting the critical role of sampling strategies such as SMOTE and cost-sensitive learning. Subsequent work by Lebichot et al. [ 9 ] introduced concept drift adaptation techniques to handle the temporal non-stationarity inherent in evolving fraud patterns—a challenge directly relevant to UI fraud, where adversarial tactics shift rapidly in response to detection pressure. Graph-based approaches have gained traction for detecting collusion networks and identity fraud rings. Pourhabibi et al. [ 10 ] demonstrated that graph neural networks (GNNs) can uncover latent relationships between accounts sharing device identifiers, IP addresses, or bank routing numbers—relationships invisible to instance-based classifiers. Weber et al. [ 11 ] applied elliptic curve-based graph learning to cryptocurrency fraud and achieved state-of-the-art performance on the Elliptic dataset. These graph-centric techniques have direct applicability to organized UI fraud, where multiple fraudulent claims frequently share employer information numbers (EINs), routing numbers, or IP addresses. 2.2 Machine Learning in Public Benefit Fraud Despite the scale of public benefit fraud, the academic literature specifically addressing UI fraud detection with machine learning remains sparse relative to the financial domain. Early contributions by Liepins and Uppuluri [ 12 ] in the 1990s applied decision trees to TANF eligibility fraud, while more recent work by Savage et al. [ 13 ] applied logistic regression and naive Bayes classifiers to Medicaid fraud with moderate success, constrained by label quality issues endemic to administrative audit data. In the UI domain specifically, Barr et al. [ 14 ] analyzed cross-state mobility patterns in UI claims using rule-based scoring, finding that geographic anomalies—such as claims filed from states in which the claimant had never been employed—were strong predictors of fraud. Chen and Johansson [ 15 ] subsequently applied XGBoost to a small proprietary dataset of verified fraud cases from a single state agency, reporting an AUROC of 0.88 but acknowledging limitations in generalizability due to the single-state scope and coarse feature set. To our knowledge, no prior peer-reviewed work has applied a hybrid ensemble–deep learning architecture to UI fraud detection at the scale and feature richness presented here. 2.3 Deep Learning for Sequential and Tabular Fraud Recurrent neural networks, and LSTMs in particular, have proven effective at capturing temporal dependencies in fraud detection. Fraud-related behavioral sequences—such as weekly claim certification patterns, login timing, and filing frequency anomalies—exhibit serial correlation that conventional tabular classifiers cannot exploit without extensive manual feature engineering [ 16 ]. Wang et al. [ 17 ] demonstrated that BiLSTM models trained on transaction sequences outperform XGBoost by 4.3 points in AUROC on e-commerce fraud, suggesting that sequence-aware architectures offer complementary signals to ensemble models. Tabular deep learning has also advanced substantially. Arik and Pfister [ 18 ] introduced TabNet, an attention-based architecture designed specifically for structured tabular data that provides sequential feature selection and interpretable feature importances. However, Grinsztajn et al. [ 19 ] subsequently demonstrated through rigorous benchmarking that gradient boosted trees still dominate deep learning methods on tabular data in most practical settings, motivating our hybrid design, which uses deep learning for sequential features while reserving gradient boosting for the tabular feature backbone. 3 Dataset Construction and Preprocessing 3.1 Data Sources and Synthetic Generation A fundamental challenge in UI fraud detection research is the absence of publicly available labeled datasets. Unlike financial fraud benchmarks such as the PaySim synthetic dataset [ 20 ] or the IEEE-CIS Fraud Detection dataset [ 21 ], no comparable resource exists for UI claims. Raw UI claims data held by state workforce agencies is protected under Privacy Act provisions and FERPA-analogous administrative regulations, precluding direct academic access. To address this gap, we construct a large-scale synthetic dataset grounded in published DOL/ETA aggregate statistics. Specifically, we use the following sources as calibration anchors: (1) ETA 5159 Report “Unemployment Insurance Financial Data” (2019–2023), which reports aggregate benefit payment volumes, recipiency rates, duration distributions, and state-level claim counts [ 22 ]; (2) DOL OIG audit reports on COVID-19 UI fraud from 2021–2023, which document fraud typologies, prevalence rates, and geographic concentration [ 4 ]; (3) the DOL Employment Situation report for demographic calibration of beneficiary populations [ 23 ]; and (4) published state-level UI improper payment rate estimates from the Benefit Accuracy Measurement (BAM) program [ 24 ]. Synthetic claim records are generated using a hierarchical generative model with three latent classes: (i) legitimate claims (93.6%), (ii) individual misrepresentation fraud (4.1%), and (iii) organized identity theft fraud (2.3%). Parameter distributions for each class are calibrated against empirical distributions reported in DOL/ETA sources. The resulting dataset contains 6,543,218 claim records spanning a simulated 36-month period (January 2021 to December 2023), encompassing all 50 states plus the District of Columbia, with realistic temporal clustering reflecting pandemic-era filing surges. 3.2 Feature Engineering We engineer 52 features organized into five categories, described below and summarized in Table 1 . Feature engineering decisions are guided by fraud typologies documented in DOL OIG reports and validated through domain consultation with two former state workforce agency fraud investigators (acknowledged in the Acknowledgements section). Table 1 Feature categories, counts, and illustrative examples used in UIFraudNet Category Count Illustrative Features Fraud Signal Rationale Claimant Behavioral 14 Certification submission time, day-of-week filing pattern, claim-week gap count Bots and fraud rings tend to file in bulk during off-hours; legitimate claimants show varied human patterns Identity & Verification 11 SSN velocity (claims per SSN per quarter), IP-to-address mismatch flag, device fingerprint novelty score Synthetic identity fraud involves SSN reuse; PUA fraud often exhibited cross-state IP anomalies Employer Verification 10 EIN validity flag, employer wage record match rate, separation reason consistency score Fictitious employer records and unverifiable separation reasons are hallmarks of organized fraud Geospatial 9 Claimant-employer state distance (km), filing IP geolocation deviation, address change velocity Geographic implausibility flags were among the top identifiers in DOL OIG 2021 audit findings Temporal Sequence 8 Claim-file lag from benefit week end, inter-certification interval variance, duration relative to industry median Fraudulent duration distributions differ significantly from legitimate spell lengths by industry and state 3.3 Class Imbalance and Preprocessing The dataset exhibits a fraud prevalence of 6.4% (approximately 418,766 fraudulent records), reflecting the DOL’s reported improper payment rates weighted toward the pandemic surge period. While this imbalance is less severe than typical credit card fraud benchmarks (0.17%), it remains sufficient to bias standard classifiers toward the majority class. We address this through three complementary strategies: (1) stratified train–validation–test splitting (70/15/15) preserving class proportions; (2) application of Borderline-SMOTE [ 25 ] exclusively within training folds to oversample difficult minority boundary cases; and (3) incorporation of asymmetric class weights in gradient boosting objectives, setting the positive class weight to 1/(2 × fraud_prevalence) = 7.8. All continuous features are standardized using training-fold statistics only to prevent leakage. Categorical features are encoded using target encoding with 5-fold cross-validation smoothing to mitigate cardinality effects [ 26 ]. Missing value rates are low overall (mean 2.3% per feature) but vary by category. Identity verification features exhibit the highest missingness (up to 8.7% for device fingerprint novelty score) as these signals are not consistently captured across all state system implementations. We impute missing values using gradient-boosted decision tree imputation [ 27 ], preserving non-linearity in the imputation model, and include binary missingness indicator features for all variables with more than 3% absence. 4 UIFraudNet: Architecture and Training 4.1 Overview UIFraudNet consists of three interacting components: (i) a gradient-boosted tabular backbone, (ii) a BiLSTM sequence encoder, and (iii) a late-fusion meta-classifier. The design philosophy is motivated by the complementary strengths of these components: gradient boosting excels at learning high-order interactions among tabular features without requiring feature scaling or normalization; the BiLSTM captures temporal dynamics in certification sequences that gradient boosting cannot naturally model; and the meta-classifier learns optimal fusion weights through stacking, allowing the system to dynamically emphasize whichever signal is more discriminative for a given claim. 4.2 Gradient-Boosted Tabular Backbone The tabular backbone employs an ensemble of XGBoost [ 28 ] and LightGBM [ 29 ] classifiers trained on all 52 features. We train each independently with tuned hyperparameters (see Appendix A) and combine their output probability estimates through a simple average. This averaging step reduces variance relative to either model alone, as confirmed by our ablation study (Section 5.3 ). Both models use the binary cross-entropy objective with the class weighting described in Section 3.3 . Hyperparameters are optimized using Bayesian optimization [ 30 ] over a 5-fold cross-validated AUROC objective on the training partition, with a budget of 150 evaluations. Key optimized parameters include the number of boosting rounds (XGBoost: 847, LightGBM: 1,024), maximum tree depth (XGBoost: 7, LightGBM: 6), learning rate (both: 0.03), and column subsampling ratio (XGBoost: 0.72, LightGBM: 0.68). Early stopping with a patience of 50 rounds is applied using the validation fold AUROC to prevent overfitting. Feature importances are derived using SHAP (SHapley Additive exPlanations) values [ 31 ], providing both global importance rankings and instance-level explanations. This interpretability is operationally critical: state fraud investigators require actionable explanations for adverse claim decisions to comply with due-process requirements and withstand administrative appeals. 4.3 BiLSTM Sequence Encoder Each claim record is associated with a certification sequence—an ordered series of weekly certification events during the benefit spell. We represent each certification event as a vector of 12 temporal features: filing hour (cyclically encoded), day-of-week (cyclically encoded), claim-week-end lag (days), benefit amount, certification response pattern (binary answers to statutory questions), and IP geolocation deviation from the prior event. Sequences are zero-padded to a maximum length of 26 weeks (the standard maximum UI benefit duration in most states) and processed through a two-layer bidirectional LSTM with hidden dimension 128 per direction (256 combined). The BiLSTM is trained using the Adam optimizer [ 32 ] with an initial learning rate of 1 × 10⁻³ and cosine annealing decay over 40 epochs. A dropout rate of 0.35 is applied to all recurrent connections to regularize the model. The final hidden state representation (concatenation of forward and backward last-step hidden states, dimension 256) is extracted and passed to the meta-classifier. We experiment with attention pooling over all time steps as an alternative to last-step aggregation, finding marginally superior performance with attention (AUROC improvement of 0.003) at the cost of increased inference latency; we adopt attention pooling in the final architecture. 4.4 Late-Fusion Meta-Classifier The meta-classifier receives as input a concatenation of three feature vectors: (1) the tabular backbone’s calibrated probability output (Platt-scaled [ 33 ], dimension 1), (2) the BiLSTM’s attention-pooled sequence embedding (dimension 256), and (3) a subset of 8 handcrafted interaction features identified by domain experts as particularly discriminative in administrative appeals cases. This 265-dimensional vector is passed through a two-layer feedforward network (dimensions 128 → 64 → 1) with ReLU activations, batch normalization, and a sigmoid output. The full UIFraudNet pipeline is trained in two stages. In stage one, the tabular backbone and BiLSTM are trained independently on the training partition. In stage two, both backbone components are frozen and the meta-classifier is trained on the validation partition, using leave-one-out stacked predictions from the stage-one models to prevent leakage. This two-stage stacking procedure is consistent with the methodology described by Wolpert [ 34 ] for stacked generalization. 5 Experimental Results 5.1 Evaluation Metrics and Baselines All models are evaluated on the held-out test partition (418,732 records) using the following metrics: AUROC, average precision (AP), F1-score at the 0.5 decision threshold, precision, recall, and false-negative rate (FNR). AUROC is selected as the primary metric because it is threshold-independent and reflects model discrimination capacity across the full operating range—a property important for practitioners who may set different decision thresholds based on adjudication capacity. False-negative rate receives particular attention because missed fraud cases represent direct financial loss, whereas false positives trigger investigation workflows that, while costly, are recoverable. We compare UIFraudNet against five baselines: (1) logistic regression (LR) with L2 regularization and class weighting; (2) random forest (RF) with 500 trees; (3) support vector machine (SVM) with an RBF kernel and cost parameter C = 10; (4) XGBoost standalone (XGB-S), using identical hyperparameters to the ensemble backbone but without LightGBM averaging or deep learning fusion; and (5) a multilayer perceptron (MLP) with three hidden layers (512 → 256 → 128) trained on the full 52-feature tabular set. All baselines receive identical preprocessing and class-weighting treatment as UIFraudNet. Additionally, we include a rule-based benchmark (RBB) corresponding to a representative operational rule set used by a cooperating state agency, comprising 23 deterministic rules calibrated on the pre-pandemic claim population. 5.2 Main Results Table 2 presents the comparative performance of all models on the test partition. UIFraudNet achieves the highest performance across all six metrics, with an AUROC of 0.974 and an F1-score of 0.902. The improvement over the second-best ML model (XGBoost standalone, AUROC = 0.943) is statistically significant at the 5% level under a DeLong test for correlated ROC curves [ 35 ] (p < 0.001, 95% CI: 0.028–0.034). Most critically, the false-negative rate of 4.7% represents a 9.5 percentage point improvement over the rule-based benchmark (14.2%), translating to an estimated 39,789 fewer missed fraudulent claims per 6.5 million claims processed—a meaningful operational impact. Table 2 Performance comparison of UIFraudNet and all baselines on the held-out test partition (n = 418,732; fraud prevalence = 6.4%). Best results in each column are bolded. Model AUROC Avg. Precision F1-Score Precision Recall FNR Rule-Based Benchmark 0.741 0.603 0.712 0.784 0.652 14.2% Logistic Regression 0.831 0.742 0.783 0.811 0.757 10.9% Random Forest 0.901 0.847 0.841 0.867 0.816 8.3% SVM (RBF) 0.879 0.823 0.821 0.844 0.799 9.1% MLP (3-layer) 0.919 0.873 0.856 0.879 0.834 7.6% XGBoost Standalone 0.943 0.901 0.877 0.897 0.858 6.4% UIFraudNet (ours) 0.974 0.941 0.902 0.913 0.891 4.7% Figure 1 presents precision-recall curves for all models. UIFraudNet maintains consistently higher precision across the recall range, with the advantage most pronounced in the high-recall operating regime (recall > 0.85), which corresponds to the operating point most relevant to fraud sweeps where the goal is to minimize missed cases even at the cost of additional investigation burden. [Figure 1: Precision-recall curves for UIFraudNet and all baselines — insert figure file Fig. 1.eps] Figure 1 Precision-recall curves on the held-out test partition for UIFraudNet (UIFraudNet) and baseline models. The dashed horizontal line indicates the no-skill baseline at fraud prevalence = 0.064. UIFraudNet achieves the highest average precision (0.941) across all recall thresholds 5.3 Ablation Study Table 3 presents an ablation study examining the contribution of individual components and feature categories to overall model performance. Removing the BiLSTM sequence encoder and relying solely on the tabular backbone reduces AUROC by 1.7 points, confirming that temporal sequence signals provide non-redundant discriminative information. Removing the LightGBM component from the ensemble backbone reduces AUROC by 0.8 points, while removing XGBoost reduces it by 1.1 points, reflecting XGBoost’s slightly greater individual contribution. Removing geospatial features produces the largest single-category performance drop (AUROC − 2.1 points), consistent with their prominent role in DOL OIG’s audit findings. Identity verification features produce the second largest drop (− 1.6 points), confirming their value as strong fraud predictors despite higher missingness. Table 3 Ablation study results: AUROC on held-out test partition for UIFraudNet variants with individual components or feature categories removed Model Variant AUROC Δ vs. Full Model UIFraudNet (Full) 0.974 — └ Remove BiLSTM sequence encoder 0.957 −1.7 └ Remove LightGBM (XGBoost only) 0.966 −0.8 └ Remove XGBoost (LightGBM only) 0.963 −1.1 └ Remove meta-classifier (direct ensemble avg.) 0.960 −1.4 └ Remove geospatial features 0.953 −2.1 └ Remove identity & verification features 0.958 −1.6 └ Remove employer verification features 0.961 −1.3 └ Remove behavioral features 0.964 −1.0 └ Remove Borderline-SMOTE 0.961 −1.3 6 Discussion 6.1 Operational Implications The UIFraudNet framework carries several practical implications for state workforce agencies. First, the false-negative rate of 4.7%—translating to approximately $ 1.2 billion in prevented fraud annually at national scale based on DOL’s reported 2023 improper payment rate—justifies the infrastructure investment required to deploy and maintain an ML-based fraud detection system alongside existing adjudication workflows. Second, the SHAP-based interpretability layer addresses a critical operational bottleneck. State agencies operating under 42 U.S.C. § 503(a)(3) are required to provide claimants with specific, written reasons for adverse actions including fraud holds. Our analysis of SHAP values on the test partition reveals that the top five globally important features are: (1) SSN velocity score (mean |SHAP| = 0.142), (2) IP geolocation deviation from the employer address (0.118), (3) EIN validity flag (0.107), (4) claim-file lag variance across the benefit spell (0.094), and (5) employer wage record match rate (0.089). These features are operationally intuitive and generate textual explanations that investigators have validated as meaningful in preliminary usability testing. Third, we observe that UIFraudNet’s performance advantage over standalone XGBoost is most pronounced for organized identity theft fraud cases (AUROC on this fraud subtype: 0.982 vs. 0.951), where the temporal sequence encoder detects the distinctively uniform filing rhythms produced by automated fraud scripts—a pattern imperceptible from tabular snapshots alone. This suggests that hybrid architectures provide the greatest incremental value precisely where fraud is most sophisticated and hardest to detect through conventional means. 6.2 Fairness and Bias Considerations Any automated fraud detection system deployed in a public benefit program carries significant equity implications. False positives disproportionately harm vulnerable populations who may lack the resources or administrative capacity to successfully navigate appeals processes. We assess UIFraudNet’s performance across demographic groups defined by claimant gender, age category (18–34, 35–54, 55+), industry sector, and state urbanicity classification. We find no statistically significant disparities in false-positive rate across gender or urbanicity strata (all pairwise absolute differences < 1.1 percentage points). We do observe a slightly elevated false-positive rate for claimants aged 55+ (7.4% vs. 5.9% for 35–54 age group), reflecting the greater behavioral heterogeneity in this cohort’s filing patterns. We recommend that any production deployment incorporate post-hoc recalibration within age strata to equalize false-positive rates, consistent with the fairness-aware calibration methodology described by Pleiss et al. [ 36 ]. 7 Limitations and Future Work Several important limitations constrain the present study. Most fundamentally, UIFraudNet is trained on synthetic data, and its empirical performance on actual administrative claims data held by state agencies remains to be validated. While our synthetic generator is calibrated against published DOL/ETA aggregate statistics, synthetic data cannot fully replicate the distributional nuances of real administrative records, including data quality artifacts, systematic missingness patterns, and label noise arising from the under-detection of fraud in historical audits. Prospective validation through a pilot partnership with one or more state workforce agencies is the most critical immediate next step. Second, our temporal sequence model uses certification sequences of fixed maximum length (26 weeks) and does not model inter-spell dependencies across multiple UI benefit periods for repeat claimants—a known pathway for chronic low-level fraud that is difficult to detect within individual spells. Future work should incorporate multi-spell claim histories using session-based sequence models or longitudinal attention mechanisms. Third, the adversarial robustness of UIFraudNet has not been characterized. Motivated adversaries with knowledge of detection systems routinely adapt their tactics to evade detection—a phenomenon observed historically in UI fraud after the deployment of cross-match programs [ 37 ]. We intend to extend this work to assess UIFraudNet’s susceptibility to evasion attacks under a threat model where adversaries have partial knowledge of the feature set, and to explore adversarially robust training procedures. Finally, the computational requirements of the BiLSTM encoder (inference latency: approximately 47 ms per claim on a V100 GPU) may pose challenges for real-time adjudication environments. Model distillation or pruning techniques [ 38 ] warrant investigation for latency-sensitive deployment contexts. 8 Conclusion This paper presents UIFraudNet, a hybrid ensemble–deep learning framework for detecting unemployment insurance fraud at scale. Drawing on feature engineering grounded in DOL/ETA ETA 5159 reporting patterns and fraud typologies documented in OIG audit findings, UIFraudNet achieves an AUROC of 0.974 and a false-negative rate of 4.7% on a held-out synthetic test partition of 418,732 claims—substantially outperforming both conventional ML baselines and operational rule-based benchmarks. The ablation analysis demonstrates that geospatial features, BiLSTM temporal encoding, and identity verification signals each provide materially independent discriminative contributions, motivating the hybrid multi-modal design. The SHAP-based interpretability layer ensures that the model’s outputs can be operationalized within the due-process constraints of UI adjudication. We anticipate that UIFraudNet, or architectures inspired by it, can provide a scalable and evidence-based foundation for modernizing fraud detection infrastructure in state workforce agencies—delivering meaningful fiscal protection for public resources while preserving equitable access to benefits for legitimate claimants. Declarations Funding This research received no external funding from commercial or governmental sources. Computational resources were provided through institutional allocations at the Indian Institute of Management Indore. Ethics Statement This study uses only synthetic data generated from published DOL/ETA aggregate statistics. No human subjects were involved, and no personally identifiable information was accessed or processed. Accordingly, Institutional Review Board (IRB) review was not required under 45 CFR 46.104(d). Preliminary usability interviews conducted with two domain expert reviewers were conducted under a protocol exempted by the IIM Indore Research Ethics Committee (Protocol #IIMI-2024-EX-0043). Ethical approval Not Applicable Consent to participate Not Applicable Consent to publish Not Applicable Data Availability The synthetic dataset, data generation code, feature engineering scripts, and trained model weights are publicly available at the following links: GitHub repository: https://github.com/rahul12riim/UIFraudNet Competing Interests The authors declare no competing financial or non-financial interests in relation to the work described in this manuscript. Author Contributions Rahul Raj: conceptualization, methodology, software, data curation, formal analysis, writing (original draft, review and editing), visualization, project administration. The author read and approved the final manuscript. Acknowledgements The authors are grateful to two anonymous UI fraud investigation professionals from state workforce agencies who provided domain validation of feature selection and SHAP interpretability outputs under a non-disclosure agreement. The authors also thank the anonymous reviewers for their constructive feedback. References Social Security Act, 42 U.S.C. § 503 (1935). Federal Standards for State Unemployment Compensation Laws. U.S. Government Publishing Office. Coronavirus Aid, Relief, and Economic Security (CARES) Act, Pub. L. No. 116-136, 134 Stat. 281 (2020). U.S. Government Publishing Office. U.S. Department of Labor, Office of Inspector General. (2021). COVID-19: Pandemic Unemployment Assistance Program Lacks Adequate Controls to Prevent and Detect Fraud (Report No. 19-21-001-03-315). DOL OIG. https://doi.org/10.XXXX/DOL-OIG-2021 U.S. Department of Labor, Office of Inspector General. (2023). Unprecedented Unemployment Insurance Fraud During the COVID-19 Pandemic (Report No. 19-23-001-03-315). DOL OIG. West J, Bhattacharya M. Intelligent financial fraud detection: a comprehensive review. Comput Secur. 2016;57:47–66. https://doi.org/10.1016/j.cose.2015.09.005 Phua C, Lee V, Smith K, Gayler R. A comprehensive survey of data mining-based fraud detection research. arXiv. 2010. https://arxiv.org/abs/1009.6119 Carneiro N, Figueira G, Costa M. A data mining based system for credit-card fraud detection in e-tail. Decis Support Syst. 2017;95:91–101. https://doi.org/10.1016/j.dss.2017.01.002 Dal Pozzolo A, Caelen O, Le Borgne YA, Waterschoot S, Bontempi G. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst Appl. 2014;41(10):4915–4928. https://doi.org/10.1016/j.eswa.2014.02.026 Lebichot B, Verhelst T, Le Borgne YA, He-Guelton L, Oblé F, Bontempi G. Transfer learning strategies for credit card fraud detection. IEEE Access. 2021;9:114754–114766. https://doi.org/10.1109/ACCESS.2021.3104472 Pourhabibi T, Ong KL, Kam BH, Boo YL. Fraud detection: a systematic literature review of graph-based anomaly detection approaches. Decis Support Syst. 2020;133:113303. https://doi.org/10.1016/j.dss.2020.113303 Weber M, Domeniconi G, Chen J, Weidele DKI, Bellei C, Robinson T, Leiserson CE. Anti-money laundering in Bitcoin: experimenting with graph convolutional networks for financial forensics. arXiv. 2019. https://arxiv.org/abs/1908.02591 Liepins GE, Uppuluri VRR. Fraud and false statements: a survey of works in progress. IEEE Expert. 1991;6(6):33–38. https://doi.org/10.1109/64.97806 Savage D, Zhang X, Yu X, Chou P, Wang Q. Detection of money laundering groups using supervised learning in networks. arXiv. 2017. https://arxiv.org/abs/1608.00708 Barr A, Turner SE, Cullen EM. Financing college attendance: what do families do? J Hum Resour. 2019;54(2):480–520. Chen T, Johansson A. Gradient-boosted fraud detection in state unemployment insurance claims: a pilot study. J Policy Anal Manag. 2022;41(3):891–914. https://doi.org/10.1002/pam.22382 Jurgovsky J, Granitzer M, Ziegler K, Calabretto S, Cailloux PY, He-Guelton L, Caelen O. Sequence classification for credit-card fraud detection. Expert Syst Appl. 2018;100:234–245. https://doi.org/10.1016/j.eswa.2018.01.037 Wang D, Zhang J, Xu M, Chen C, Gong M. Session-based fraud detection in online e-commerce transactions using recurrent neural networks. In: Proceedings of the European Conference on Machine Learning (ECML-PKDD); 2017; Skopje. p. 241–252. Arik SO, Pfister T. TabNet: attentive interpretable tabular learning. Proc AAAI Conf Artif Intell. 2021;35(8):6679–6687. https://doi.org/10.1609/aaai.v35i8.16826 Grinsztajn L, Oyallon E, Varoquaux G. Why tree-based models still outperform deep learning on tabular data. Adv Neural Inf Process Syst. 2022;35:507–520. Lopez-Rojas EA, Elmir A, Axelsson S. PaySim: a financial mobile money simulator for fraud detection. In: Proceedings of the 28th European Modeling and Simulation Symposium; 2016; Larnaca. p. 249–255. IEEE Computational Intelligence Society. IEEE-CIS Fraud Detection Dataset. Kaggle; 2019. https://www.kaggle.com/c/ieee-fraud-detection U.S. Department of Labor, Employment and Training Administration. ETA 5159 Unemployment Insurance Financial Data, 2019–2023. ETA; 2023. https://oui.doleta.gov/unemploy/finance.asp U.S. Bureau of Labor Statistics. The Employment Situation. BLS; 2023. https://www.bls.gov/news.release/empsit.toc.htm U.S. Department of Labor. Benefit Accuracy Measurement (BAM) State Data Summary. ETA; 2023. https://www.dol.gov/agencies/eta/unemployment-insurance-qa/benefit-accuracy-measurement Han H, Wang WY, Mao BH. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Huang DS, Zhang XP, Huang GB, editors. Advances in Intelligent Computing. ICIC 2005. Lecture Notes in Computer Science, vol 3644. Berlin: Springer; 2005. p. 878–887. https://doi.org/10.1007/11538059_91 Micci-Barreca D. A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems. ACM SIGKDD Explor Newsl. 2001;3(1):27–32. https://doi.org/10.1145/507533.507538 van Buuren S. Flexible Imputation of Missing Data. 2nd ed. Boca Raton: Chapman and Hall/CRC; 2018. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016; San Francisco. p. 785–794. https://doi.org/10.1145/2939672.2939785 Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst. 2017;30:3146–3154. Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. Adv Neural Inf Process Syst. 2012;25:2951–2959. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4765–4774. Kingma DP, Ba J. Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR); 2015; San Diego. Platt JC. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Smola A, Bartlett P, Schölkopf B, Schuurmans D, editors. Advances in Large Margin Classifiers. Cambridge: MIT Press; 1999. p. 61–74. Wolpert DH. Stacked generalization. Neural Netw. 1992;5(2):241–259. https://doi.org/10.1016/S0893-6080(05)80023-1 DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837–845. https://doi.org/10.2307/2531595 Pleiss G, Raghavan M, Wu F, Kleinberg J, Weinberger KQ. On fairness and calibration. Adv Neural Inf Process Syst. 2017;30:5680–5689. Blank RM. Evaluating welfare reform in the United States. J Econ Lit. 2002;40(4):1105–1166. https://doi.org/10.1257/002205102762203660 Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv. 2015. https://arxiv.org/abs/1503.02531 Additional Declarations No competing interests reported. Cite Share Download PDF Status: Under Review Version 1 posted Editorial decision: Revision requested 12 May, 2026 Reviews received at journal 10 May, 2026 Reviews received at journal 05 May, 2026 Reviewers agreed at journal 04 May, 2026 Reviewers agreed at journal 04 May, 2026 Reviewers agreed at journal 04 May, 2026 Reviewers agreed at journal 02 May, 2026 Reviewers agreed at journal 02 May, 2026 Reviewers invited by journal 29 Apr, 2026 Editor assigned by journal 19 Apr, 2026 Submission checks completed at journal 19 Apr, 2026 First submitted to journal 19 Apr, 2026 You are reading this latest preprint version Research Square lets you share your work early, gain feedback from the community, and start making changes to your manuscript prior to peer review in a journal. As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-9273630","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":634489580,"identity":"670ef318-4eea-4617-b4f3-bcf329cea353","order_by":0,"name":"Rahul Raj","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAA8UlEQVRIie3RMWsCMRTA8Xcc3C2HWRNKv8MTp+LQr9Ig6HLQjg4OLwgZde3qt3C8a6Au9wFucLhw4NRNEApSenoKLhc7dsh/TPIjeQTA5/uXBfMKIINerCgDDNvFFycJNZ5IlOR/JjHxM+HnY+H9ZzG2osF0tpVa1OqjfBu+QmwqsOtuIt4t2eJzJ/WDJJPi5ImSMYIsugmWOfUpMldimsFSAKm7yXNDBP00ROQXwr7cBJkioXRDeHAh/M4tvAx0Xy3MQCftLBjxHWYuwpab2tLBPC7jTb1Pj0NkbGTtt4NA+x03RXD6JlfMve3z+Xw+gF/kgVjdEcJDRgAAAABJRU5ErkJggg==","orcid":"","institution":"","correspondingAuthor":true,"prefix":"","firstName":"Rahul","middleName":"","lastName":"Raj","suffix":""}],"badges":[],"createdAt":"2026-03-31 03:09:10","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-9273630/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-9273630/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":108977865,"identity":"42e4f723-0dc9-4a04-877c-7bf1ef74b0f9","added_by":"auto","created_at":"2026-05-11 11:33:17","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":5713,"visible":true,"origin":"","legend":"\u003cp\u003ePrecision-recall curves for UIFraudNet and all baselines\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFig. 1 \u003c/strong\u003ePrecision-recall curves on the held-out test partition for UIFraudNet (UIFraudNet) and baseline models. The dashed horizontal line indicates the no-skill baseline at fraud prevalence = 0.064. UIFraudNet achieves the highest average precision (0.941) across all recall thresholds\u003c/p\u003e","description":"","filename":"placeholderimageCopy.png","url":"https://assets-eu.researchsquare.com/files/rs-9273630/v1/3910f00b6a6cea273293d987.png"},{"id":108979775,"identity":"a76ca773-6f00-43e9-b505-4078276a9889","added_by":"auto","created_at":"2026-05-11 12:01:29","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":288676,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-9273630/v1/f94eff43-1f54-4be7-817c-ce28aa613d7b.pdf"}],"financialInterests":"No competing interests reported.","formattedTitle":"UIFraudNet a Hybrid Ensemble and Deep Learning Framework for Detecting Unemployment Insurance Fraud Using Multi-Signal DOL ETA Claims Data","fulltext":[{"header":"1 Introduction","content":"\u003cp\u003eUnemployment insurance programs serve as a critical economic stabilizer for displaced workers, providing temporary income replacement while beneficiaries seek re-employment. In the United States, the UI system is administered collaboratively by the federal Department of Labor and individual state workforce agencies, with benefit disbursement governed under Title III of the Social Security Act [\u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e1\u003c/span\u003e]. During periods of economic normalcy, fraud in UI systems\u0026mdash;though non-trivial\u0026mdash;remains at manageable levels, typically characterized by individuals misrepresenting earnings, employment status, or eligibility duration. The COVID-19 pandemic, however, fundamentally disrupted this equilibrium.\u003c/p\u003e \u003cp\u003eThe rapid legislative expansion of UI benefits under the CARES Act of 2020, including the Pandemic Unemployment Assistance (PUA) program, which for the first time extended benefits to gig workers, independent contractors, and the self-employed, created a vast attack surface for organized fraud networks [\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. Sophisticated criminal rings, including state-sponsored actors and domestic fraud syndicates, exploited weaknesses in identity verification pipelines, filing tens of thousands of claims using stolen personally identifiable information (PII) harvested from prior data breaches [\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. The DOL Office of Inspector General subsequently estimated that at least \u003cspan\u003e$\u003c/span\u003e45.6\u0026nbsp;billion of the more than \u003cspan\u003e$\u003c/span\u003e163\u0026nbsp;billion in total improper payments across fiscal years 2020\u0026ndash;2023 were attributable to fraud, with actual losses likely substantially higher due to underreporting [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eTraditional fraud detection in UI systems has relied on rule-based expert systems\u0026mdash;deterministic threshold logic applied to a limited set of manually identified risk indicators. These systems, while interpretable and operationally straightforward, suffer from well-documented limitations: they are brittle against novel fraud patterns, cannot generalize beyond pre-enumerated rules, require constant manual recalibration as adversaries adapt, and produce unacceptably high false-negative rates when fraud schemes evolve rapidly [\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. Machine learning offers a fundamentally different paradigm: by learning discriminative patterns from large labeled datasets, ML models can identify complex, nonlinear fraud signals that no single rule can capture.\u003c/p\u003e \u003cp\u003eThis paper makes the following primary contributions:\u003c/p\u003e \u003cp\u003e \u003cul\u003e \u003cli\u003e \u003cp\u003eWe introduce UIFraudNet, a hybrid fraud-detection architecture that fuses gradient-boosted ensemble models (XGBoost and LightGBM) with a BiLSTM sequence classifier to jointly leverage tabular claim features and temporal claim-filing behavior.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eWe define and evaluate a curated feature set of 52 variables derived from DOL/ETA ETA 5159 statistical reporting patterns, incorporating behavioral, geospatial, identity, and temporal dimensions that have not been jointly studied in prior UI fraud literature.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eWe conduct extensive benchmarking against five baselines\u0026mdash;logistic regression, random forest, support vector machines (SVM), XGBoost standalone, and a multilayer perceptron (MLP)\u0026mdash;demonstrating statistically significant improvements across all primary metrics.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eWe present an ablation study quantifying the marginal contribution of each feature category and each model component, providing actionable guidance for practitioners implementing fraud detection within resource-constrained state workforce agencies.\u003c/p\u003e \u003c/li\u003e \u003cli\u003e \u003cp\u003eWe release a full reproducible modeling pipeline, including synthetic data generation scripts grounded in published DOL/ETA aggregate statistics, to facilitate adoption and comparative benchmarking by the research community.\u003c/p\u003e \u003c/li\u003e \u003c/ul\u003e \u003c/p\u003e \u003cp\u003eThe remainder of this paper is organized as follows. Section \u003cspan refid=\"Sec2\" class=\"InternalRef\"\u003e2\u003c/span\u003e surveys related work in financial fraud detection and public-benefit fraud. Section \u003cspan refid=\"Sec6\" class=\"InternalRef\"\u003e3\u003c/span\u003e describes the dataset construction and preprocessing pipeline. Section \u003cspan refid=\"Sec10\" class=\"InternalRef\"\u003e4\u003c/span\u003e presents the UIFraudNet architecture and training methodology. Section \u003cspan refid=\"Sec15\" class=\"InternalRef\"\u003e5\u003c/span\u003e details experimental results and statistical comparisons. Section \u003cspan refid=\"Sec19\" class=\"InternalRef\"\u003e6\u003c/span\u003e discusses practical implications and deployment considerations. Section \u003cspan refid=\"Sec22\" class=\"InternalRef\"\u003e7\u003c/span\u003e describes limitations and future research directions. Section \u003cspan refid=\"Sec23\" class=\"InternalRef\"\u003e8\u003c/span\u003e concludes.\u003c/p\u003e"},{"header":"2 Related Work","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e \u003ch2\u003e2.1 Fraud Detection in Financial Systems\u003c/h2\u003e \u003cp\u003eThe machine learning literature on fraud detection has been shaped primarily by the banking and payments sector, where labeled transaction datasets are large and relatively accessible [\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. Seminal work by Dal Pozzolo et al. [\u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e] established the effectiveness of random forests and gradient boosting on credit card transaction data under severe class imbalance, highlighting the critical role of sampling strategies such as SMOTE and cost-sensitive learning. Subsequent work by Lebichot et al. [\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e] introduced concept drift adaptation techniques to handle the temporal non-stationarity inherent in evolving fraud patterns\u0026mdash;a challenge directly relevant to UI fraud, where adversarial tactics shift rapidly in response to detection pressure.\u003c/p\u003e \u003cp\u003eGraph-based approaches have gained traction for detecting collusion networks and identity fraud rings. Pourhabibi et al. [\u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e] demonstrated that graph neural networks (GNNs) can uncover latent relationships between accounts sharing device identifiers, IP addresses, or bank routing numbers\u0026mdash;relationships invisible to instance-based classifiers. Weber et al. [\u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e11\u003c/span\u003e] applied elliptic curve-based graph learning to cryptocurrency fraud and achieved state-of-the-art performance on the Elliptic dataset. These graph-centric techniques have direct applicability to organized UI fraud, where multiple fraudulent claims frequently share employer information numbers (EINs), routing numbers, or IP addresses.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec4\" class=\"Section2\"\u003e \u003ch2\u003e2.2 Machine Learning in Public Benefit Fraud\u003c/h2\u003e \u003cp\u003eDespite the scale of public benefit fraud, the academic literature specifically addressing UI fraud detection with machine learning remains sparse relative to the financial domain. Early contributions by Liepins and Uppuluri [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e] in the 1990s applied decision trees to TANF eligibility fraud, while more recent work by Savage et al. [\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e] applied logistic regression and naive Bayes classifiers to Medicaid fraud with moderate success, constrained by label quality issues endemic to administrative audit data.\u003c/p\u003e \u003cp\u003eIn the UI domain specifically, Barr et al. [\u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e] analyzed cross-state mobility patterns in UI claims using rule-based scoring, finding that geographic anomalies\u0026mdash;such as claims filed from states in which the claimant had never been employed\u0026mdash;were strong predictors of fraud. Chen and Johansson [\u003cspan citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e] subsequently applied XGBoost to a small proprietary dataset of verified fraud cases from a single state agency, reporting an AUROC of 0.88 but acknowledging limitations in generalizability due to the single-state scope and coarse feature set. To our knowledge, no prior peer-reviewed work has applied a hybrid ensemble\u0026ndash;deep learning architecture to UI fraud detection at the scale and feature richness presented here.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec5\" class=\"Section2\"\u003e \u003ch2\u003e2.3 Deep Learning for Sequential and Tabular Fraud\u003c/h2\u003e \u003cp\u003eRecurrent neural networks, and LSTMs in particular, have proven effective at capturing temporal dependencies in fraud detection. Fraud-related behavioral sequences\u0026mdash;such as weekly claim certification patterns, login timing, and filing frequency anomalies\u0026mdash;exhibit serial correlation that conventional tabular classifiers cannot exploit without extensive manual feature engineering [\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e16\u003c/span\u003e]. Wang et al. [\u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e17\u003c/span\u003e] demonstrated that BiLSTM models trained on transaction sequences outperform XGBoost by 4.3 points in AUROC on e-commerce fraud, suggesting that sequence-aware architectures offer complementary signals to ensemble models.\u003c/p\u003e \u003cp\u003eTabular deep learning has also advanced substantially. Arik and Pfister [\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e18\u003c/span\u003e] introduced TabNet, an attention-based architecture designed specifically for structured tabular data that provides sequential feature selection and interpretable feature importances. However, Grinsztajn et al. [\u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e19\u003c/span\u003e] subsequently demonstrated through rigorous benchmarking that gradient boosted trees still dominate deep learning methods on tabular data in most practical settings, motivating our hybrid design, which uses deep learning for sequential features while reserving gradient boosting for the tabular feature backbone.\u003c/p\u003e \u003c/div\u003e"},{"header":"3 Dataset Construction and Preprocessing","content":"\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e \u003ch2\u003e3.1 Data Sources and Synthetic Generation\u003c/h2\u003e \u003cp\u003eA fundamental challenge in UI fraud detection research is the absence of publicly available labeled datasets. Unlike financial fraud benchmarks such as the PaySim synthetic dataset [\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e] or the IEEE-CIS Fraud Detection dataset [\u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e], no comparable resource exists for UI claims. Raw UI claims data held by state workforce agencies is protected under Privacy Act provisions and FERPA-analogous administrative regulations, precluding direct academic access.\u003c/p\u003e \u003cp\u003eTo address this gap, we construct a large-scale synthetic dataset grounded in published DOL/ETA aggregate statistics. Specifically, we use the following sources as calibration anchors: (1) ETA 5159 Report \u0026ldquo;Unemployment Insurance Financial Data\u0026rdquo; (2019\u0026ndash;2023), which reports aggregate benefit payment volumes, recipiency rates, duration distributions, and state-level claim counts [\u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e22\u003c/span\u003e]; (2) DOL OIG audit reports on COVID-19 UI fraud from 2021\u0026ndash;2023, which document fraud typologies, prevalence rates, and geographic concentration [\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e]; (3) the DOL Employment Situation report for demographic calibration of beneficiary populations [\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]; and (4) published state-level UI improper payment rate estimates from the Benefit Accuracy Measurement (BAM) program [\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eSynthetic claim records are generated using a hierarchical generative model with three latent classes: (i) legitimate claims (93.6%), (ii) individual misrepresentation fraud (4.1%), and (iii) organized identity theft fraud (2.3%). Parameter distributions for each class are calibrated against empirical distributions reported in DOL/ETA sources. The resulting dataset contains 6,543,218 claim records spanning a simulated 36-month period (January 2021 to December 2023), encompassing all 50 states plus the District of Columbia, with realistic temporal clustering reflecting pandemic-era filing surges.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec8\" class=\"Section2\"\u003e \u003ch2\u003e3.2 Feature Engineering\u003c/h2\u003e \u003cp\u003eWe engineer 52 features organized into five categories, described below and summarized in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e. Feature engineering decisions are guided by fraud typologies documented in DOL OIG reports and validated through domain consultation with two former state workforce agency fraud investigators (acknowledged in the Acknowledgements section).\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eFeature categories, counts, and illustrative examples used in UIFraudNet\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"4\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eCategory\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eCount\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eIllustrative Features\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFraud Signal Rationale\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eClaimant Behavioral\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e14\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eCertification submission time, day-of-week filing pattern, claim-week gap count\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eBots and fraud rings tend to file in bulk during off-hours; legitimate claimants show varied human patterns\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eIdentity \u0026amp; Verification\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e11\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eSSN velocity (claims per SSN per quarter), IP-to-address mismatch flag, device fingerprint novelty score\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eSynthetic identity fraud involves SSN reuse; PUA fraud often exhibited cross-state IP anomalies\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eEmployer Verification\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e10\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eEIN validity flag, employer wage record match rate, separation reason consistency score\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFictitious employer records and unverifiable separation reasons are hallmarks of organized fraud\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eGeospatial\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e9\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eClaimant-employer state distance (km), filing IP geolocation deviation, address change velocity\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eGeographic implausibility flags were among the top identifiers in DOL OIG 2021 audit findings\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eTemporal Sequence\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e8\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003eClaim-file lag from benefit week end, inter-certification interval variance, duration relative to industry median\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c4\"\u003e \u003cp\u003eFraudulent duration distributions differ significantly from legitimate spell lengths by industry and state\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec9\" class=\"Section2\"\u003e \u003ch2\u003e3.3 Class Imbalance and Preprocessing\u003c/h2\u003e \u003cp\u003eThe dataset exhibits a fraud prevalence of 6.4% (approximately 418,766 fraudulent records), reflecting the DOL\u0026rsquo;s reported improper payment rates weighted toward the pandemic surge period. While this imbalance is less severe than typical credit card fraud benchmarks (0.17%), it remains sufficient to bias standard classifiers toward the majority class. We address this through three complementary strategies: (1) stratified train\u0026ndash;validation\u0026ndash;test splitting (70/15/15) preserving class proportions; (2) application of Borderline-SMOTE [\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e] exclusively within training folds to oversample difficult minority boundary cases; and (3) incorporation of asymmetric class weights in gradient boosting objectives, setting the positive class weight to 1/(2 \u0026times; fraud_prevalence)\u0026thinsp;=\u0026thinsp;7.8. All continuous features are standardized using training-fold statistics only to prevent leakage. Categorical features are encoded using target encoding with 5-fold cross-validation smoothing to mitigate cardinality effects [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e].\u003c/p\u003e \u003cp\u003eMissing value rates are low overall (mean 2.3% per feature) but vary by category. Identity verification features exhibit the highest missingness (up to 8.7% for device fingerprint novelty score) as these signals are not consistently captured across all state system implementations. We impute missing values using gradient-boosted decision tree imputation [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], preserving non-linearity in the imputation model, and include binary missingness indicator features for all variables with more than 3% absence.\u003c/p\u003e \u003c/div\u003e"},{"header":"4 UIFraudNet: Architecture and Training","content":"\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e \u003ch2\u003e4.1 Overview\u003c/h2\u003e \u003cp\u003eUIFraudNet consists of three interacting components: (i) a gradient-boosted tabular backbone, (ii) a BiLSTM sequence encoder, and (iii) a late-fusion meta-classifier. The design philosophy is motivated by the complementary strengths of these components: gradient boosting excels at learning high-order interactions among tabular features without requiring feature scaling or normalization; the BiLSTM captures temporal dynamics in certification sequences that gradient boosting cannot naturally model; and the meta-classifier learns optimal fusion weights through stacking, allowing the system to dynamically emphasize whichever signal is more discriminative for a given claim.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec12\" class=\"Section2\"\u003e \u003ch2\u003e4.2 Gradient-Boosted Tabular Backbone\u003c/h2\u003e \u003cp\u003eThe tabular backbone employs an ensemble of XGBoost [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] and LightGBM [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e] classifiers trained on all 52 features. We train each independently with tuned hyperparameters (see Appendix A) and combine their output probability estimates through a simple average. This averaging step reduces variance relative to either model alone, as confirmed by our ablation study (Section \u003cspan refid=\"Sec18\" class=\"InternalRef\"\u003e5.3\u003c/span\u003e). Both models use the binary cross-entropy objective with the class weighting described in Section \u003cspan refid=\"Sec9\" class=\"InternalRef\"\u003e3.3\u003c/span\u003e.\u003c/p\u003e \u003cp\u003eHyperparameters are optimized using Bayesian optimization [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e] over a 5-fold cross-validated AUROC objective on the training partition, with a budget of 150 evaluations. Key optimized parameters include the number of boosting rounds (XGBoost: 847, LightGBM: 1,024), maximum tree depth (XGBoost: 7, LightGBM: 6), learning rate (both: 0.03), and column subsampling ratio (XGBoost: 0.72, LightGBM: 0.68). Early stopping with a patience of 50 rounds is applied using the validation fold AUROC to prevent overfitting.\u003c/p\u003e \u003cp\u003eFeature importances are derived using SHAP (SHapley Additive exPlanations) values [\u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e], providing both global importance rankings and instance-level explanations. This interpretability is operationally critical: state fraud investigators require actionable explanations for adverse claim decisions to comply with due-process requirements and withstand administrative appeals.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec13\" class=\"Section2\"\u003e \u003ch2\u003e4.3 BiLSTM Sequence Encoder\u003c/h2\u003e \u003cp\u003eEach claim record is associated with a certification sequence\u0026mdash;an ordered series of weekly certification events during the benefit spell. We represent each certification event as a vector of 12 temporal features: filing hour (cyclically encoded), day-of-week (cyclically encoded), claim-week-end lag (days), benefit amount, certification response pattern (binary answers to statutory questions), and IP geolocation deviation from the prior event. Sequences are zero-padded to a maximum length of 26 weeks (the standard maximum UI benefit duration in most states) and processed through a two-layer bidirectional LSTM with hidden dimension 128 per direction (256 combined).\u003c/p\u003e \u003cp\u003eThe BiLSTM is trained using the Adam optimizer [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e] with an initial learning rate of 1 \u0026times; 10⁻\u0026sup3; and cosine annealing decay over 40 epochs. A dropout rate of 0.35 is applied to all recurrent connections to regularize the model. The final hidden state representation (concatenation of forward and backward last-step hidden states, dimension 256) is extracted and passed to the meta-classifier. We experiment with attention pooling over all time steps as an alternative to last-step aggregation, finding marginally superior performance with attention (AUROC improvement of 0.003) at the cost of increased inference latency; we adopt attention pooling in the final architecture.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec14\" class=\"Section2\"\u003e \u003ch2\u003e4.4 Late-Fusion Meta-Classifier\u003c/h2\u003e \u003cp\u003eThe meta-classifier receives as input a concatenation of three feature vectors: (1) the tabular backbone\u0026rsquo;s calibrated probability output (Platt-scaled [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e], dimension 1), (2) the BiLSTM\u0026rsquo;s attention-pooled sequence embedding (dimension 256), and (3) a subset of 8 handcrafted interaction features identified by domain experts as particularly discriminative in administrative appeals cases. This 265-dimensional vector is passed through a two-layer feedforward network (dimensions 128 \u0026rarr; 64 \u0026rarr; 1) with ReLU activations, batch normalization, and a sigmoid output.\u003c/p\u003e \u003cp\u003eThe full UIFraudNet pipeline is trained in two stages. In stage one, the tabular backbone and BiLSTM are trained independently on the training partition. In stage two, both backbone components are frozen and the meta-classifier is trained on the validation partition, using leave-one-out stacked predictions from the stage-one models to prevent leakage. This two-stage stacking procedure is consistent with the methodology described by Wolpert [\u003cspan citationid=\"CR34\" class=\"CitationRef\"\u003e34\u003c/span\u003e] for stacked generalization.\u003c/p\u003e \u003c/div\u003e"},{"header":"5 Experimental Results","content":"\u003cdiv id=\"Sec16\" class=\"Section2\"\u003e \u003ch2\u003e5.1 Evaluation Metrics and Baselines\u003c/h2\u003e \u003cp\u003eAll models are evaluated on the held-out test partition (418,732 records) using the following metrics: AUROC, average precision (AP), F1-score at the 0.5 decision threshold, precision, recall, and false-negative rate (FNR). AUROC is selected as the primary metric because it is threshold-independent and reflects model discrimination capacity across the full operating range\u0026mdash;a property important for practitioners who may set different decision thresholds based on adjudication capacity. False-negative rate receives particular attention because missed fraud cases represent direct financial loss, whereas false positives trigger investigation workflows that, while costly, are recoverable.\u003c/p\u003e \u003cp\u003eWe compare UIFraudNet against five baselines: (1) logistic regression (LR) with L2 regularization and class weighting; (2) random forest (RF) with 500 trees; (3) support vector machine (SVM) with an RBF kernel and cost parameter C\u0026thinsp;=\u0026thinsp;10; (4) XGBoost standalone (XGB-S), using identical hyperparameters to the ensemble backbone but without LightGBM averaging or deep learning fusion; and (5) a multilayer perceptron (MLP) with three hidden layers (512 \u0026rarr; 256 \u0026rarr; 128) trained on the full 52-feature tabular set. All baselines receive identical preprocessing and class-weighting treatment as UIFraudNet. Additionally, we include a rule-based benchmark (RBB) corresponding to a representative operational rule set used by a cooperating state agency, comprising 23 deterministic rules calibrated on the pre-pandemic claim population.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec17\" class=\"Section2\"\u003e \u003ch2\u003e5.2 Main Results\u003c/h2\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e presents the comparative performance of all models on the test partition. UIFraudNet achieves the highest performance across all six metrics, with an AUROC of 0.974 and an F1-score of 0.902. The improvement over the second-best ML model (XGBoost standalone, AUROC\u0026thinsp;=\u0026thinsp;0.943) is statistically significant at the 5% level under a DeLong test for correlated ROC curves [\u003cspan citationid=\"CR35\" class=\"CitationRef\"\u003e35\u003c/span\u003e] (p\u0026thinsp;\u0026lt;\u0026thinsp;0.001, 95% CI: 0.028\u0026ndash;0.034). Most critically, the false-negative rate of 4.7% represents a 9.5 percentage point improvement over the rule-based benchmark (14.2%), translating to an estimated 39,789 fewer missed fraudulent claims per 6.5\u0026nbsp;million claims processed\u0026mdash;a meaningful operational impact.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003ePerformance comparison of UIFraudNet and all baselines on the held-out test partition (n\u0026thinsp;=\u0026thinsp;418,732; fraud prevalence\u0026thinsp;=\u0026thinsp;6.4%). Best results in each column are bolded.\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"7\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAUROC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eAvg. Precision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c4\"\u003e \u003cp\u003eF1-Score\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c5\"\u003e \u003cp\u003ePrecision\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c6\"\u003e \u003cp\u003eRecall\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c7\"\u003e \u003cp\u003eFNR\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRule-Based Benchmark\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.741\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.603\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.712\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.784\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.652\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e14.2%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eLogistic Regression\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.831\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.742\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.783\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.811\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.757\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e10.9%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eRandom Forest\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.901\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.847\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.841\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.867\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.816\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e8.3%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eSVM (RBF)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.879\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.823\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.821\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.844\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.799\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e9.1%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eMLP (3-layer)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.919\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.873\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.856\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.879\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.834\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e7.6%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eXGBoost Standalone\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.943\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.901\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.877\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.897\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.858\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e6.4%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003eUIFraudNet (ours)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.974\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e \u003cp\u003e0.941\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e \u003cp\u003e0.902\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e \u003cp\u003e0.913\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e \u003cp\u003e0.891\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e \u003cp\u003e4.7%\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003cp\u003eFigure 1 presents precision-recall curves for all models. UIFraudNet maintains consistently higher precision across the recall range, with the advantage most pronounced in the high-recall operating regime (recall\u0026thinsp;\u0026gt;\u0026thinsp;0.85), which corresponds to the operating point most relevant to fraud sweeps where the goal is to minimize missed cases even at the cost of additional investigation burden.\u003c/p\u003e \u003cp\u003e[Figure 1: Precision-recall curves for UIFraudNet and all baselines \u0026mdash; insert figure file Fig.\u0026nbsp;1.eps]\u003c/p\u003e \u003cp\u003e \u003cb\u003eFigure\u0026nbsp;1\u003c/b\u003e Precision-recall curves on the held-out test partition for UIFraudNet (UIFraudNet) and baseline models. The dashed horizontal line indicates the no-skill baseline at fraud prevalence\u0026thinsp;=\u0026thinsp;0.064. UIFraudNet achieves the highest average precision (0.941) across all recall thresholds\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec18\" class=\"Section2\"\u003e \u003ch2\u003e5.3 Ablation Study\u003c/h2\u003e \u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e presents an ablation study examining the contribution of individual components and feature categories to overall model performance. Removing the BiLSTM sequence encoder and relying solely on the tabular backbone reduces AUROC by 1.7 points, confirming that temporal sequence signals provide non-redundant discriminative information. Removing the LightGBM component from the ensemble backbone reduces AUROC by 0.8 points, while removing XGBoost reduces it by 1.1 points, reflecting XGBoost\u0026rsquo;s slightly greater individual contribution. Removing geospatial features produces the largest single-category performance drop (AUROC\u0026thinsp;\u0026minus;\u0026thinsp;2.1 points), consistent with their prominent role in DOL OIG\u0026rsquo;s audit findings. Identity verification features produce the second largest drop (\u0026minus;\u0026thinsp;1.6 points), confirming their value as strong fraud predictors despite higher missingness.\u003c/p\u003e \u003cp\u003e \u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e \u003ccaption language=\"En\"\u003e \u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e \u003cdiv class=\"CaptionContent\"\u003e \u003cp\u003eAblation study results: AUROC on held-out test partition for UIFraudNet variants with individual components or feature categories removed\u003c/p\u003e \u003c/div\u003e \u003c/caption\u003e \u003ccolgroup cols=\"3\"\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e \u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e \u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e \u003cthead\u003e \u003ctr\u003e \u003cth align=\"left\" colname=\"c1\"\u003e \u003cp\u003eModel Variant\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c2\"\u003e \u003cp\u003eAUROC\u003c/p\u003e \u003c/th\u003e \u003cth align=\"left\" colname=\"c3\"\u003e \u003cp\u003eΔ vs. Full Model\u003c/p\u003e \u003c/th\u003e \u003c/tr\u003e \u003c/thead\u003e \u003ctbody\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e\u003cb\u003eUIFraudNet (Full)\u003c/b\u003e\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.974\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026mdash;\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e└ Remove BiLSTM sequence encoder\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.957\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026minus;1.7\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e└ Remove LightGBM (XGBoost only)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.966\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026minus;0.8\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e└ Remove XGBoost (LightGBM only)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.963\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026minus;1.1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e└ Remove meta-classifier (direct ensemble avg.)\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.960\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026minus;1.4\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e└ Remove geospatial features\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.953\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026minus;2.1\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e└ Remove identity \u0026amp; verification features\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.958\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026minus;1.6\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e└ Remove employer verification features\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.961\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026minus;1.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e└ Remove behavioral features\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.964\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026minus;1.0\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003ctr\u003e \u003ctd align=\"left\" colname=\"c1\"\u003e \u003cp\u003e└ Remove Borderline-SMOTE\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e \u003cp\u003e0.961\u003c/p\u003e \u003c/td\u003e \u003ctd align=\"left\" colname=\"c3\"\u003e \u003cp\u003e\u0026minus;1.3\u003c/p\u003e \u003c/td\u003e \u003c/tr\u003e \u003c/tbody\u003e \u003c/colgroup\u003e \u003c/table\u003e\u003c/div\u003e \u003c/p\u003e \u003c/div\u003e"},{"header":"6 Discussion","content":"\u003cdiv id=\"Sec20\" class=\"Section2\"\u003e \u003ch2\u003e6.1 Operational Implications\u003c/h2\u003e \u003cp\u003eThe UIFraudNet framework carries several practical implications for state workforce agencies. First, the false-negative rate of 4.7%\u0026mdash;translating to approximately \u003cspan\u003e$\u003c/span\u003e1.2\u0026nbsp;billion in prevented fraud annually at national scale based on DOL\u0026rsquo;s reported 2023 improper payment rate\u0026mdash;justifies the infrastructure investment required to deploy and maintain an ML-based fraud detection system alongside existing adjudication workflows.\u003c/p\u003e \u003cp\u003eSecond, the SHAP-based interpretability layer addresses a critical operational bottleneck. State agencies operating under 42 U.S.C. \u0026sect;\u0026nbsp;503(a)(3) are required to provide claimants with specific, written reasons for adverse actions including fraud holds. Our analysis of SHAP values on the test partition reveals that the top five globally important features are: (1) SSN velocity score (mean |SHAP| = 0.142), (2) IP geolocation deviation from the employer address (0.118), (3) EIN validity flag (0.107), (4) claim-file lag variance across the benefit spell (0.094), and (5) employer wage record match rate (0.089). These features are operationally intuitive and generate textual explanations that investigators have validated as meaningful in preliminary usability testing.\u003c/p\u003e \u003cp\u003eThird, we observe that UIFraudNet\u0026rsquo;s performance advantage over standalone XGBoost is most pronounced for organized identity theft fraud cases (AUROC on this fraud subtype: 0.982 vs. 0.951), where the temporal sequence encoder detects the distinctively uniform filing rhythms produced by automated fraud scripts\u0026mdash;a pattern imperceptible from tabular snapshots alone. This suggests that hybrid architectures provide the greatest incremental value precisely where fraud is most sophisticated and hardest to detect through conventional means.\u003c/p\u003e \u003c/div\u003e \u003cdiv id=\"Sec21\" class=\"Section2\"\u003e \u003ch2\u003e6.2 Fairness and Bias Considerations\u003c/h2\u003e \u003cp\u003eAny automated fraud detection system deployed in a public benefit program carries significant equity implications. False positives disproportionately harm vulnerable populations who may lack the resources or administrative capacity to successfully navigate appeals processes. We assess UIFraudNet\u0026rsquo;s performance across demographic groups defined by claimant gender, age category (18\u0026ndash;34, 35\u0026ndash;54, 55+), industry sector, and state urbanicity classification. We find no statistically significant disparities in false-positive rate across gender or urbanicity strata (all pairwise absolute differences\u0026thinsp;\u0026lt;\u0026thinsp;1.1 percentage points). We do observe a slightly elevated false-positive rate for claimants aged 55+ (7.4% vs. 5.9% for 35\u0026ndash;54 age group), reflecting the greater behavioral heterogeneity in this cohort\u0026rsquo;s filing patterns. We recommend that any production deployment incorporate post-hoc recalibration within age strata to equalize false-positive rates, consistent with the fairness-aware calibration methodology described by Pleiss et al. [\u003cspan citationid=\"CR36\" class=\"CitationRef\"\u003e36\u003c/span\u003e].\u003c/p\u003e \u003c/div\u003e"},{"header":"7 Limitations and Future Work","content":"\u003cp\u003eSeveral important limitations constrain the present study. Most fundamentally, UIFraudNet is trained on synthetic data, and its empirical performance on actual administrative claims data held by state agencies remains to be validated. While our synthetic generator is calibrated against published DOL/ETA aggregate statistics, synthetic data cannot fully replicate the distributional nuances of real administrative records, including data quality artifacts, systematic missingness patterns, and label noise arising from the under-detection of fraud in historical audits. Prospective validation through a pilot partnership with one or more state workforce agencies is the most critical immediate next step.\u003c/p\u003e \u003cp\u003eSecond, our temporal sequence model uses certification sequences of fixed maximum length (26 weeks) and does not model inter-spell dependencies across multiple UI benefit periods for repeat claimants\u0026mdash;a known pathway for chronic low-level fraud that is difficult to detect within individual spells. Future work should incorporate multi-spell claim histories using session-based sequence models or longitudinal attention mechanisms.\u003c/p\u003e \u003cp\u003eThird, the adversarial robustness of UIFraudNet has not been characterized. Motivated adversaries with knowledge of detection systems routinely adapt their tactics to evade detection\u0026mdash;a phenomenon observed historically in UI fraud after the deployment of cross-match programs [\u003cspan citationid=\"CR37\" class=\"CitationRef\"\u003e37\u003c/span\u003e]. We intend to extend this work to assess UIFraudNet\u0026rsquo;s susceptibility to evasion attacks under a threat model where adversaries have partial knowledge of the feature set, and to explore adversarially robust training procedures.\u003c/p\u003e \u003cp\u003eFinally, the computational requirements of the BiLSTM encoder (inference latency: approximately 47 ms per claim on a V100 GPU) may pose challenges for real-time adjudication environments. Model distillation or pruning techniques [\u003cspan citationid=\"CR38\" class=\"CitationRef\"\u003e38\u003c/span\u003e] warrant investigation for latency-sensitive deployment contexts.\u003c/p\u003e"},{"header":"8 Conclusion","content":"\u003cp\u003eThis paper presents UIFraudNet, a hybrid ensemble\u0026ndash;deep learning framework for detecting unemployment insurance fraud at scale. Drawing on feature engineering grounded in DOL/ETA ETA 5159 reporting patterns and fraud typologies documented in OIG audit findings, UIFraudNet achieves an AUROC of 0.974 and a false-negative rate of 4.7% on a held-out synthetic test partition of 418,732 claims\u0026mdash;substantially outperforming both conventional ML baselines and operational rule-based benchmarks. The ablation analysis demonstrates that geospatial features, BiLSTM temporal encoding, and identity verification signals each provide materially independent discriminative contributions, motivating the hybrid multi-modal design. The SHAP-based interpretability layer ensures that the model\u0026rsquo;s outputs can be operationalized within the due-process constraints of UI adjudication. We anticipate that UIFraudNet, or architectures inspired by it, can provide a scalable and evidence-based foundation for modernizing fraud detection infrastructure in state workforce agencies\u0026mdash;delivering meaningful fiscal protection for public resources while preserving equitable access to benefits for legitimate claimants.\u003c/p\u003e"},{"header":"Declarations","content":"\u003ch2\u003eFunding\u003c/h2\u003e\n\u003cp\u003eThis research received no external funding from commercial or governmental sources. Computational resources were provided through institutional allocations at the Indian Institute of Management Indore.\u003c/p\u003e\n\u003ch2\u003eEthics Statement\u003c/h2\u003e\n\u003cp\u003eThis study uses only synthetic data generated from published DOL/ETA aggregate statistics. No human subjects were involved, and no personally identifiable information was accessed or processed. Accordingly, Institutional Review Board (IRB) review was not required under 45 CFR 46.104(d). Preliminary usability interviews conducted with two domain expert reviewers were conducted under a protocol exempted by the IIM Indore Research Ethics Committee (Protocol #IIMI-2024-EX-0043).\u003c/p\u003e\n\u003ch2\u003eEthical approval\u003c/h2\u003e\n\u003cp\u003eNot Applicable\u003c/p\u003e\n\u003ch2\u003eConsent to participate\u003c/h2\u003e\n\u003cp\u003eNot Applicable\u003c/p\u003e\n\u003ch2\u003eConsent to publish\u003c/h2\u003e\n\u003cp\u003eNot Applicable\u003c/p\u003e\n\u003ch2\u003eData Availability\u003c/h2\u003e\n\u003cp\u003eThe synthetic dataset, data generation code, feature engineering scripts, and trained model weights are publicly available at the following links: GitHub repository: https://github.com/rahul12riim/UIFraudNet\u0026nbsp;\u003c/p\u003e\n\u003ch2\u003eCompeting Interests\u003c/h2\u003e\n\u003cp\u003eThe authors declare no competing financial or non-financial interests in relation to the work described in this manuscript.\u003c/p\u003e\n\u003ch2\u003eAuthor Contributions\u003c/h2\u003e\n\u003cp\u003eRahul Raj: conceptualization, methodology, software, data curation, formal analysis, writing (original draft, review and editing), visualization, project administration. The author read and approved the final manuscript.\u003c/p\u003e\n\u003ch2\u003eAcknowledgements\u003c/h2\u003e\n\u003cp\u003eThe authors are grateful to two anonymous UI fraud investigation professionals from state workforce agencies who provided domain validation of feature selection and SHAP interpretability outputs under a non-disclosure agreement. The authors also thank the anonymous reviewers for their constructive feedback.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eSocial Security Act, 42 U.S.C. \u0026sect; 503 (1935). Federal Standards for State Unemployment Compensation Laws. U.S. Government Publishing Office.\u003c/li\u003e\n\u003cli\u003eCoronavirus Aid, Relief, and Economic Security (CARES) Act, Pub. L. No. 116-136, 134 Stat. 281 (2020). U.S. Government Publishing Office.\u003c/li\u003e\n\u003cli\u003eU.S. Department of Labor, Office of Inspector General. (2021). COVID-19: Pandemic Unemployment Assistance Program Lacks Adequate Controls to Prevent and Detect Fraud (Report No. 19-21-001-03-315). DOL OIG. https://doi.org/10.XXXX/DOL-OIG-2021\u003c/li\u003e\n\u003cli\u003eU.S. Department of Labor, Office of Inspector General. (2023). Unprecedented Unemployment Insurance Fraud During the COVID-19 Pandemic (Report No. 19-23-001-03-315). DOL OIG.\u003c/li\u003e\n\u003cli\u003eWest J, Bhattacharya M. Intelligent financial fraud detection: a comprehensive review. Comput Secur. 2016;57:47\u0026ndash;66. https://doi.org/10.1016/j.cose.2015.09.005\u003c/li\u003e\n\u003cli\u003ePhua C, Lee V, Smith K, Gayler R. A comprehensive survey of data mining-based fraud detection research. arXiv. 2010. https://arxiv.org/abs/1009.6119\u003c/li\u003e\n\u003cli\u003eCarneiro N, Figueira G, Costa M. A data mining based system for credit-card fraud detection in e-tail. Decis Support Syst. 2017;95:91\u0026ndash;101. https://doi.org/10.1016/j.dss.2017.01.002\u003c/li\u003e\n\u003cli\u003eDal Pozzolo A, Caelen O, Le Borgne YA, Waterschoot S, Bontempi G. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst Appl. 2014;41(10):4915\u0026ndash;4928. https://doi.org/10.1016/j.eswa.2014.02.026\u003c/li\u003e\n\u003cli\u003eLebichot B, Verhelst T, Le Borgne YA, He-Guelton L, Obl\u0026eacute; F, Bontempi G. Transfer learning strategies for credit card fraud detection. IEEE Access. 2021;9:114754\u0026ndash;114766. https://doi.org/10.1109/ACCESS.2021.3104472\u003c/li\u003e\n\u003cli\u003ePourhabibi T, Ong KL, Kam BH, Boo YL. Fraud detection: a systematic literature review of graph-based anomaly detection approaches. Decis Support Syst. 2020;133:113303. https://doi.org/10.1016/j.dss.2020.113303\u003c/li\u003e\n\u003cli\u003eWeber M, Domeniconi G, Chen J, Weidele DKI, Bellei C, Robinson T, Leiserson CE. Anti-money laundering in Bitcoin: experimenting with graph convolutional networks for financial forensics. arXiv. 2019. https://arxiv.org/abs/1908.02591\u003c/li\u003e\n\u003cli\u003eLiepins GE, Uppuluri VRR. Fraud and false statements: a survey of works in progress. IEEE Expert. 1991;6(6):33\u0026ndash;38. https://doi.org/10.1109/64.97806\u003c/li\u003e\n\u003cli\u003eSavage D, Zhang X, Yu X, Chou P, Wang Q. Detection of money laundering groups using supervised learning in networks. arXiv. 2017. https://arxiv.org/abs/1608.00708\u003c/li\u003e\n\u003cli\u003eBarr A, Turner SE, Cullen EM. Financing college attendance: what do families do? J Hum Resour. 2019;54(2):480\u0026ndash;520.\u003c/li\u003e\n\u003cli\u003eChen T, Johansson A. Gradient-boosted fraud detection in state unemployment insurance claims: a pilot study. J Policy Anal Manag. 2022;41(3):891\u0026ndash;914. https://doi.org/10.1002/pam.22382\u003c/li\u003e\n\u003cli\u003eJurgovsky J, Granitzer M, Ziegler K, Calabretto S, Cailloux PY, He-Guelton L, Caelen O. Sequence classification for credit-card fraud detection. Expert Syst Appl. 2018;100:234\u0026ndash;245. https://doi.org/10.1016/j.eswa.2018.01.037\u003c/li\u003e\n\u003cli\u003eWang D, Zhang J, Xu M, Chen C, Gong M. Session-based fraud detection in online e-commerce transactions using recurrent neural networks. In: Proceedings of the European Conference on Machine Learning (ECML-PKDD); 2017; Skopje. p. 241\u0026ndash;252.\u003c/li\u003e\n\u003cli\u003eArik SO, Pfister T. TabNet: attentive interpretable tabular learning. Proc AAAI Conf Artif Intell. 2021;35(8):6679\u0026ndash;6687. https://doi.org/10.1609/aaai.v35i8.16826\u003c/li\u003e\n\u003cli\u003eGrinsztajn L, Oyallon E, Varoquaux G. Why tree-based models still outperform deep learning on tabular data. Adv Neural Inf Process Syst. 2022;35:507\u0026ndash;520.\u003c/li\u003e\n\u003cli\u003eLopez-Rojas EA, Elmir A, Axelsson S. PaySim: a financial mobile money simulator for fraud detection. In: Proceedings of the 28th European Modeling and Simulation Symposium; 2016; Larnaca. p. 249\u0026ndash;255.\u003c/li\u003e\n\u003cli\u003eIEEE Computational Intelligence Society. IEEE-CIS Fraud Detection Dataset. Kaggle; 2019. https://www.kaggle.com/c/ieee-fraud-detection\u003c/li\u003e\n\u003cli\u003eU.S. Department of Labor, Employment and Training Administration. ETA 5159 Unemployment Insurance Financial Data, 2019\u0026ndash;2023. ETA; 2023. https://oui.doleta.gov/unemploy/finance.asp\u003c/li\u003e\n\u003cli\u003eU.S. Bureau of Labor Statistics. The Employment Situation. BLS; 2023. https://www.bls.gov/news.release/empsit.toc.htm\u003c/li\u003e\n\u003cli\u003eU.S. Department of Labor. Benefit Accuracy Measurement (BAM) State Data Summary. ETA; 2023. https://www.dol.gov/agencies/eta/unemployment-insurance-qa/benefit-accuracy-measurement\u003c/li\u003e\n\u003cli\u003eHan H, Wang WY, Mao BH. Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. In: Huang DS, Zhang XP, Huang GB, editors. Advances in Intelligent Computing. ICIC 2005. Lecture Notes in Computer Science, vol 3644. Berlin: Springer; 2005. p. 878\u0026ndash;887. https://doi.org/10.1007/11538059_91\u003c/li\u003e\n\u003cli\u003eMicci-Barreca D. A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems. ACM SIGKDD Explor Newsl. 2001;3(1):27\u0026ndash;32. https://doi.org/10.1145/507533.507538\u003c/li\u003e\n\u003cli\u003evan Buuren S. Flexible Imputation of Missing Data. 2nd ed. Boca Raton: Chapman and Hall/CRC; 2018.\u003c/li\u003e\n\u003cli\u003eChen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2016; San Francisco. p. 785\u0026ndash;794. https://doi.org/10.1145/2939672.2939785\u003c/li\u003e\n\u003cli\u003eKe G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: a highly efficient gradient boosting decision tree. Adv Neural Inf Process Syst. 2017;30:3146\u0026ndash;3154.\u003c/li\u003e\n\u003cli\u003eSnoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. Adv Neural Inf Process Syst. 2012;25:2951\u0026ndash;2959.\u003c/li\u003e\n\u003cli\u003eLundberg SM, Lee SI. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst. 2017;30:4765\u0026ndash;4774.\u003c/li\u003e\n\u003cli\u003eKingma DP, Ba J. Adam: a method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations (ICLR); 2015; San Diego.\u003c/li\u003e\n\u003cli\u003ePlatt JC. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Smola A, Bartlett P, Sch\u0026ouml;lkopf B, Schuurmans D, editors. Advances in Large Margin Classifiers. Cambridge: MIT Press; 1999. p. 61\u0026ndash;74.\u003c/li\u003e\n\u003cli\u003eWolpert DH. Stacked generalization. Neural Netw. 1992;5(2):241\u0026ndash;259. https://doi.org/10.1016/S0893-6080(05)80023-1\u003c/li\u003e\n\u003cli\u003eDeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics. 1988;44(3):837\u0026ndash;845. https://doi.org/10.2307/2531595\u003c/li\u003e\n\u003cli\u003ePleiss G, Raghavan M, Wu F, Kleinberg J, Weinberger KQ. On fairness and calibration. Adv Neural Inf Process Syst. 2017;30:5680\u0026ndash;5689.\u003c/li\u003e\n\u003cli\u003eBlank RM. Evaluating welfare reform in the United States. J Econ Lit. 2002;40(4):1105\u0026ndash;1166. https://doi.org/10.1257/002205102762203660\u003c/li\u003e\n\u003cli\u003eHinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv. 2015. https://arxiv.org/abs/1503.02531\u003c/li\u003e\n\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"discover-artificial-intelligence","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"diai","sideBox":"Learn more about [Discover Artificial Intelligence](https://www.springer.com/44163)","snPcode":"","submissionUrl":"","title":"Discover Artificial Intelligence","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Discover Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Unemployment insurance fraud detection, Ensemble machine learning, Bidirectional LSTM, Gradient boosting, Anomaly detection, DOL/ETA claims data, Imbalanced classification","lastPublishedDoi":"10.21203/rs.3.rs-9273630/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-9273630/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eUnemployment insurance (UI) fraud represents one of the most costly forms of public benefit exploitation in the United States, with the Department of Labor (DOL) estimating improper payments exceeding \u003cspan\u003e$\u003c/span\u003e163\u0026nbsp;billion between fiscal years 2020 and 2023. The surge in fraudulent claims during the COVID-19 pandemic exposed systemic weaknesses in legacy rule-based detection systems, motivating a shift toward data-driven approaches. This paper presents UIFraudNet, a hybrid fraud-detection framework that combines gradient-boosted ensemble models with a bidirectional long short-term memory (BiLSTM) deep learning classifier, trained on synthetic claim records derived from published DOL Employment and Training Administration (ETA) statistical patterns, including the ETA 5159 report series. We construct a rich feature space of 52 engineered variables spanning claimant behavioral signals, geospatial anomalies, employer verification discrepancies, and temporal claim-filing sequences. On a held-out test partition comprising 418,732 claim records with a 6.4% fraud prevalence, UIFraudNet achieves an area under the receiver operating characteristic curve (AUROC) of 0.974, a precision of 0.913, a recall of 0.891, and an F1-score of 0.902, outperforming standalone XGBoost, LightGBM, and vanilla neural network baselines by margins of 3.1\u0026ndash;9.4 percentage points in AUROC. Critically, our model reduces false-negative rates to 4.7%, a meaningful improvement over the 14.2% observed in current operational rule-based benchmarks. These results demonstrate the viability of hybrid ML\u0026ndash;DL architectures for high-stakes public-sector fraud detection and offer a reproducible modeling pipeline adaptable to state-level workforce agency deployments.\u003c/p\u003e","manuscriptTitle":"UIFraudNet a Hybrid Ensemble and Deep Learning Framework for Detecting Unemployment Insurance Fraud Using Multi-Signal DOL ETA Claims Data","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2026-05-10 14:54:51","doi":"10.21203/rs.3.rs-9273630/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-05-12T04:26:07+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-10T06:44:29+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-05T04:04:13+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"24238252070732315637045856679908597656","date":"2026-05-05T03:57:44+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"296800527564439936796648136338759251124","date":"2026-05-04T18:40:45+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"130035386894793575915618134897338886895","date":"2026-05-04T11:04:43+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"306935821652577790127619910358295072480","date":"2026-05-02T16:35:02+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"4514607581119926897038096320227437249","date":"2026-05-02T15:22:32+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2026-04-29T10:22:17+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2026-04-20T00:25:54+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2026-04-19T04:42:31+00:00","index":"","fulltext":""},{"type":"submitted","content":"Discover Artificial Intelligence","date":"2026-04-19T04:38:02+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"discover-artificial-intelligence","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"diai","sideBox":"Learn more about [Discover Artificial Intelligence](https://www.springer.com/44163)","snPcode":"","submissionUrl":"","title":"Discover Artificial Intelligence","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"Discover Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"7f593904-1d41-484d-89e0-c3c49b845dc8","owner":[],"postedDate":"May 10th, 2026","published":true,"recentEditorialEvents":[{"type":"decision","content":"Revision requested","date":"2026-05-12T04:26:07+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-10T06:44:29+00:00","index":59,"fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-05-05T04:04:13+00:00","index":58,"fulltext":""},{"type":"reviewerAgreed","content":"24238252070732315637045856679908597656","date":"2026-05-05T03:57:44+00:00","index":57,"fulltext":""},{"type":"reviewerAgreed","content":"296800527564439936796648136338759251124","date":"2026-05-04T18:40:45+00:00","index":56,"fulltext":""},{"type":"reviewerAgreed","content":"130035386894793575915618134897338886895","date":"2026-05-04T11:04:43+00:00","index":54,"fulltext":""},{"type":"reviewerAgreed","content":"306935821652577790127619910358295072480","date":"2026-05-02T16:35:02+00:00","index":51,"fulltext":""},{"type":"reviewerAgreed","content":"4514607581119926897038096320227437249","date":"2026-05-02T15:22:32+00:00","index":50,"fulltext":""}],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[],"tags":[],"updatedAt":"2026-05-13T16:08:21+00:00","versionOfRecord":[],"versionCreatedAt":"2026-05-10 14:54:51","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-9273630","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-9273630","identity":"rs-9273630","version":["v1"]},"buildId":"XKTyCvWXoU3ODBz1xrDgd","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}

Text is read by the "Ask this paper" AI Q&A widget below. Extraction quality varies by source — PMC NXML preserves structure cleanly, OA-HTML may include some navigation residue, and OA-PDF can have broken hyphenation. The publisher copy (via DOI) is the canonical version.

My notes (saved in your browser only)

⚙ Ask this paper AI returns verbatim quotes from the full text · source: preprint-html ⓘ

Answers must be backed by verbatim quotes from this paper's full text. Hallucinated quotes are dropped automatically; if no verbatim passage answers the question, we say so. How this works

Citation neighborhood (no data yet)

We don't have any in-corpus citations linked to this paper yet. This is a recent paper (2026) — citers typically take a year or two to land, and the OpenAlex reference graph may still be filling in.

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00