Explainable Hybrid Machine Learning Framework for Detecting Fraudulent Job Listings Using SHAP-Based Interpretability

Research Article

Manish Rathi (corresponding author)
Noida Institute of Engineering and Technology

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-9693806/v1
This work is licensed under a CC BY 4.0 License.
Research Square, ISSN 2693-5015 (online). Status: Posted. Version 1 posted May 14th, 2026.

Abstract

Job fraud is a growing threat for freshers and job seekers who, in their desperation to find employment, unknowingly share sensitive personal data with fraudulent recruiters. This paper addresses this problem by proposing a machine learning framework that automatically detects fraudulent job listings on online recruitment platforms. The framework uses TF-IDF text features, metadata completeness indicators, class-weighted Logistic Regression, decision threshold tuning, and SHAP-based post-hoc explanations. The system is evaluated on the EMSCAD dataset containing 17,880 job postings, of which 4.84% are fraudulent, using a stratified 80/20 train-test split. Results show that the proposed framework achieves a fraudulent-class precision of 0.95 and recall of 0.64 at a decision threshold of 0.85, with an F1-score of 0.76. An ablation study confirms that class weighting is the most impactful component, improving recall from 0.48 to 0.90. SHAP explanations further enhance transparency by identifying missing company profiles and suspicious income-related terms as key fraud indicators, making the framework suitable for real-world moderation systems.

Keywords: Online Recruitment Fraud, Fraudulent Job Listings, Explainable Artificial Intelligence, SHAP, TF-IDF, Class Imbalance, Logistic Regression, Natural Language Processing

Additional Declarations

The authors declare no competing interests.
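The abstract reports fraudulent-class precision, recall, and F1 at a tuned decision threshold. The following is a minimal sketch of how such threshold-dependent metrics can be computed, not the author's implementation. The labels and scores are made-up toy values, and the function names `confusion_counts`, `f1_from`, and `precision_recall_f1` are hypothetical. The last lines only check that the reported precision (0.95) and recall (0.64) are arithmetically consistent with the reported F1-score (0.76).

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, FN for the positive (fraudulent) class at a threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    return tp, fp, fn


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, scores, threshold):
    """Precision, recall, and F1 for the positive class at a threshold."""
    tp, fp, fn = confusion_counts(y_true, scores, threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


# Toy data, invented for illustration only (1 = fraudulent, 0 = legitimate).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.95, 0.90, 0.60, 0.70, 0.20, 0.10, 0.40, 0.30]

# Raising the threshold trades recall for precision on this toy data.
for threshold in (0.50, 0.85):
    p, r, f = precision_recall_f1(y_true, scores, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f} f1={f:.2f}")

# Consistency check on the figures quoted in the abstract.
print(round(f1_from(0.95, 0.64), 2))  # 0.76
```

On the toy data, moving the threshold from 0.50 to 0.85 removes the one false positive but also drops one true positive, which mirrors the high-precision, moderate-recall operating point described in the abstract.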

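For a linear model such as Logistic Regression, SHAP values in log-odds space have a simple closed form when features are treated as independent: the contribution of feature j is its weight times the difference between the feature value and its background mean. The sketch below illustrates that formula in plain Python. It is an assumption-laden illustration, not the paper's code: the abstract does not state which SHAP explainer was used, and the feature names, weights, bias, and background means here are all invented.

```python
def log_odds(weights, bias, x):
    """Linear score (log-odds of fraud) for one feature vector."""
    return bias + sum(weights[name] * x[name] for name in weights)


def linear_shap_values(weights, x, background_mean):
    """Closed-form SHAP values for a linear model with independent features:
    phi_j = w_j * (x_j - E[x_j])."""
    return {
        name: weights[name] * (x[name] - background_mean[name])
        for name in weights
    }


# Hypothetical model parameters and feature names, for illustration only.
weights = {
    "missing_company_profile": 1.8,  # metadata completeness indicator
    "tfidf_earn": 2.5,               # income-related term
    "tfidf_experience": -0.9,        # term typical of legitimate postings
}
bias = -3.0

# Hypothetical average feature values over a background (training) sample.
background_mean = {
    "missing_company_profile": 0.20,
    "tfidf_earn": 0.05,
    "tfidf_experience": 0.30,
}

# One hypothetical posting to explain.
posting = {
    "missing_company_profile": 1.0,
    "tfidf_earn": 0.40,
    "tfidf_experience": 0.0,
}

phi = linear_shap_values(weights, posting, background_mean)
base_value = log_odds(weights, bias, background_mean)
prediction = log_odds(weights, bias, posting)

# Additivity: base value plus all contributions recovers the model output.
gap = abs(base_value + sum(phi.values()) - prediction)
print(f"additivity gap: {gap:.2e}")

# Rank features by the size of their contribution to this prediction.
for name, value in sorted(phi.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name}: {value:+.3f}")
```

In this toy case the missing company profile and the income-related term push the score toward the fraudulent class, which is the kind of per-posting explanation the abstract describes for moderation use.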