Assessing the potential of LLM-assisted annotation for corpus pragmatics: the case of humor

Research Article

Antonio Bianco (University of Bergamo), Nicola Brocca (Universität Innsbruck; corresponding author), Davide Garassino (Zurich University of Applied Sciences)

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8559781/v1
This work is licensed under a CC BY 4.0 License.

Status: Published. Version 1 posted 14 January 2026. Journal publication published 21 March 2026 in Corpus Pragmatics: https://doi.org/10.1007/s41701-026-00235-7

Abstract

Corpus pragmatics faces ongoing challenges in quantitatively studying context-dependent categories like humor, given their subjectivity and the need for costly interrater reliability checks. Recent advances in LLMs offer a potential way to streamline these processes for pragmatic annotation tasks. This paper investigates that potential through an analysis of Italian political discourse on X, focusing on humorous tweets and their discursive functions (Attardo, 2020). We compare the performance of GPT-4o, LLaMA-3.3-70B-Instruct, and a novice annotator against that of an expert annotator. For the detection of humor, both models reached high agreement with the expert annotator (in particular, GPT-4o: Cohen’s k = 0.75; AC1 = 0.87). By contrast, agreement dropped for the classification of humor functions (GPT-4o: Cohen’s k = 0.37; AC1 = 0.70). Qualitative results suggest that the models rely heavily on lexical cues rather than demonstrating deeper pragmatic competence. These findings indicate that while LLMs can provide useful assistance in the initial stages of large-scale annotation, they remain limited in capturing the nuanced and context-dependent nature of pragmatic functions.

Keywords: Humor, LLMs, political communication, LLaMA, GPT, automatic annotation

Additional Declarations

No competing interests reported.
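
The abstract reports two agreement statistics side by side, Cohen's k and AC1 (commonly known as Gwet's AC1), and the two can diverge sharply when one category dominates the data. The following is a minimal sketch of how both coefficients can be computed for two annotators using their standard textbook definitions; it is not the authors' code. The function names, the `categories` parameter, and the toy labels ("H" for humorous, "N" for non-humorous) are assumptions made purely for illustration and are not taken from the paper's data.

```python
from collections import Counter


def _check(a, b):
    # Both raters must have labelled the same, non-empty set of items.
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")


def cohen_kappa(a, b):
    """Cohen's kappa: chance agreement from the product of each rater's marginals."""
    _check(a, b)
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in set(a) | set(b))
    if p_e == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_o - p_e) / (1 - p_e)


def gwet_ac1(a, b, categories=None):
    """Gwet's AC1: chance agreement from the averaged marginals pi_k,
    p_e = sum(pi_k * (1 - pi_k)) / (q - 1), with q the number of categories."""
    _check(a, b)
    n = len(a)
    # Assumption: if the full label scheme is not passed in, fall back to
    # the categories actually observed in the two label sequences.
    cats = list(categories) if categories is not None else sorted(set(a) | set(b))
    q = len(cats)
    if q < 2:
        raise ValueError("AC1 needs at least two categories")
    p_o = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    pi = [(count_a[c] + count_b[c]) / (2 * n) for c in cats]
    p_e = sum(p * (1 - p) for p in pi) / (q - 1)
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (not from the paper): 20 items, humor is rare.
expert = ["H", "H"] + ["N"] * 18
model = ["H", "N", "H"] + ["N"] * 17

kappa = cohen_kappa(expert, model)
ac1 = gwet_ac1(expert, model)
print(f"Cohen's kappa = {kappa:.2f}")  # 0.44
print(f"Gwet's AC1    = {ac1:.2f}")    # 0.88
```

In this toy example the two annotators agree on 18 of 20 items, yet kappa is only about 0.44 while AC1 is about 0.88. Kappa's chance-agreement term grows large when one label dominates, which is one plausible reason the two coefficients in the abstract differ, although the paper itself should be consulted for the authors' interpretation.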
