ContrastSkill: Task-Oriented Contrastive Pre-Training for Enhanced Skill Extraction in Job Data

doi:10.21203/rs.3.rs-7312457/v1

2025 · doi:10.21203/rs.3.rs-7312457/v1

preprint OA: closed

Research Article · Research Square preprint

Authors: Aleksander Bielinski (corresponding author), David Brazier, Alistair Lawson, Dimitra Gkatzia · Edinburgh Napier University

This is a preprint; it has not been peer reviewed by a journal.

Version 1 · posted September 29th, 2025 · https://doi.org/10.21203/rs.3.rs-7312457/v1 · licensed under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)

Abstract

The continuously evolving nature of labour markets demands a more flexible and adaptable workforce. To meet these demands, job seekers must understand competence requirements not just for specific jobs but across entire sectors and industries. While prior work has explored Named Entity Recognition (NER) and rule-based methods for skill extraction, these approaches often struggle with generalisation across diverse datasets and industries and are susceptible to errors concerning rarer competencies. This gap poses challenges for policymakers, career researchers and HR professionals who rely on accurate, large-scale skill extraction to analyse workforce trends and inform policy decisions.

In this work, we introduce ContrastSkill, a contrastive learning-based framework for pre-training language models to enhance skill extraction. By adding a supervised contrastive pre-training step utilising domain data, we improve generalisation and robustness in standard transfer learning NER pipelines. In cross-dataset experiments, ContrastSkill achieves up to 2.32 percentage points span-F1 improvement over standard fine-tuning and delivers significant gains on two of three evaluated datasets, with comparable performance on the third. We also compare ContrastSkill with baseline methods, conduct a comprehensive study across different models, and perform an extensive ablation to reveal interpretability insights and optimal architectural choices for job-related skill extraction. We release the code and supplementary materials to foster reproducibility.

Keywords: Information Extraction, Natural Language Processing, Contrastive Learning, Skill Extraction

Additional Declarations: No competing interests reported.
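The abstract names a "supervised contrastive pre-training step" but does not spell out the objective. As a rough illustration only, the sketch below shows a generic supervised contrastive loss of the kind commonly used in the literature: embeddings that share a label are pulled together and all others are pushed apart. It is not taken from the paper. The function names, the temperature value, the toy two-dimensional embeddings, and the binary skill / non-skill labels are all assumptions made for this example, and the paper's actual anchors, sampling, and encoder may differ.

```python
import math


def normalize(vector):
    """Scale a vector to unit length (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over one batch.

    For each anchor i, every other item with the same label is a positive.
    The loss is the mean negative log-probability of the positives under a
    softmax over temperature-scaled cosine similarities to all other items.
    Anchors without any positive are skipped.
    """
    z = [normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other item.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with max subtraction for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Hypothetical toy batch: 1 = skill span, 0 = non-skill span.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_labels = [1, 1, 0, 0]
print(supervised_contrastive_loss(toy_embeddings, toy_labels))
```

In a pipeline like the one the abstract outlines, a loss of this kind would be minimised on labelled domain data before the usual NER fine-tuning; here the embeddings are fixed toy vectors, so the function only evaluates the loss.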

Source provenance

europepmc: last seen: 2026-05-20T01:45:00.602351+00:00