SCoPE: Shift-Aware Speaker-Conditioned Priors for Emotion Recognition in Conversations

Burak Can Kaplan (corresponding author), Stefan Wermter
Universität Hamburg

Research Article. This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-9065619/v1
License: CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
Status: Version 1, posted April 7th, 2026 on Research Square

Abstract

In conversations, human emotions are transient; however, they tend to persist across multiple utterances. For example, we rarely switch instantly between contrasting emotions such as happiness and anger. Instead, emotions tend to evolve smoothly, and these patterns are often speaker-specific: some people escalate, while others gradually cool down over time. Furthermore, when emotions do change during a conversation, the change is often driven by contextual factors, such as newly received information or unexpected events. Although progress has been made in Emotion Recognition in Conversations (ERC), most existing approaches still rely heavily on overt evidence and do not sufficiently model these non-apparent factors. Especially in multimodal settings, this makes the models fragile when the signals are noisy (e.g., occluded faces, slang expressions, or microphone noise). To address these limitations, we make three contributions. First, we introduce Speaker-Conditioned Priors over Emotions (SCoPE), a lightweight module that uses the emotional history of each speaker and explicitly models their priors for use in subsequent emotion classification. Second, we incorporate emotion shift prediction, a well-established concept in ERC, to guide the model in balancing the priors from SCoPE against the multimodal evidence. Finally, we propose a shift-aware fusion mechanism that performs precision-weighted logit integration between the multimodal evidence and the speaker prior, forming a Bayesian-inspired product-of-experts formulation. This dynamic fusion allows the model to rely on historical priors when emotions persist and to prioritize multimodal evidence when shifts are likely. Experimental results show that our model outperforms recent state-of-the-art models on the IEMOCAP dataset in multimodal settings.

Keywords: affective computing, emotion recognition, transformer-based architectures, neural networks

Additional Declarations: No competing interests reported.
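
The shift-aware fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, which this page does not give; it only shows one way a product-of-experts combination in logit space might behave. The function names, the smoothed frequency prior, and the choice of `1 - shift_prob` as the prior weight are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def speaker_prior_logits(history, num_classes, smoothing=1.0):
    """Log-probabilities of a smoothed frequency prior.

    Assumption: the prior is a simple count over the speaker's past
    emotion labels (integer class ids). The paper's prior is learned.
    """
    counts = [smoothing] * num_classes
    for label in history:
        counts[label] += 1.0
    total = sum(counts)
    return [math.log(c / total) for c in counts]


def shift_aware_fusion(evidence_logits, prior_logits, shift_prob,
                       evidence_weight=1.0):
    """Weighted sum of expert logits, i.e. a product of experts.

    Assumption: the prior's weight is (1 - shift_prob), so a likely
    emotion shift suppresses the prior and leaves the evidence in charge.
    """
    if not 0.0 <= shift_prob <= 1.0:
        raise ValueError("shift_prob must lie in [0, 1]")
    prior_weight = 1.0 - shift_prob
    fused = [evidence_weight * e + prior_weight * p
             for e, p in zip(evidence_logits, prior_logits)]
    return softmax(fused)


if __name__ == "__main__":
    # Hypothetical 3-class setup: the speaker was mostly in class 0.
    prior = speaker_prior_logits([0, 0, 0, 1], num_classes=3)
    noisy_evidence = [0.2, 0.3, 0.0]  # weak, ambiguous multimodal logits
    print(shift_aware_fusion(noisy_evidence, prior, shift_prob=0.1))
    print(shift_aware_fusion(noisy_evidence, prior, shift_prob=1.0))
```

With a low shift probability, the speaker's history dominates the weak evidence; with a shift probability of 1, the prior drops out and the output equals the softmax of the evidence alone.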
