Leveraging Large Language Models to Build a Cutting-Edge French Word Sense Disambiguation Corpus

Research Article

Mouheb Mehdoui (University of Monastir, corresponding author), Amel Fraisse (University of Lille), Mounir Zrigui (University of Monastir)

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-5393717/v1

This work is licensed under a CC BY 4.0 License.

Status: Posted. Version 1, posted November 27th, 2024, on Research Square.

Abstract

With the increasing amount of data circulating on the Web, there is a growing need to develop and deploy tools that can unravel semantic nuances in text. Extracting precise meanings is difficult because natural language is complex and words often have multiple interpretations depending on context. Word Sense Disambiguation (WSD) addresses exactly this challenge of interpreting a word precisely within a given context. It is a long-standing task in Natural Language Processing that aims to determine the meaning a word carries in a particular context, thereby improving the accuracy of language-processing applications. Numerous linguistic resources are accessible online, including WordNet [1], thesauri [2], and dictionaries, which make it possible to explore diverse contextual meanings. However, several limitations persist: the scarcity of resources for certain languages, the limited number of examples within corpora, and the difficulty of accurately detecting the topic or context covered by a text, which significantly affects word sense disambiguation. This paper discusses the different approaches to WSD and reviews the corpora available for this task. We contrast these approaches and highlight their limitations, which in turn informs our construction of a French corpus targeted at WSD.

Additional Declarations

No competing interests reported.

