FedEmoNet: Privacy-Preserving Federated Learning with TCN-Transformer Fusion for Cross-Corpus Speech Emotion Recognition

MOHAMMED TAWFIK, Razan Ali Obeidat, Saddam Kamel, Njood Anwer Aljarrah, and 2 more

Research Square | Article
This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-8726974/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted. Version 1.

Abstract

Cross-corpus speech emotion recognition faces significant challenges due to domain shifts and privacy concerns, with existing systems showing 20–40% performance degradation across datasets while requiring centralized data collection. This paper presents a privacy-preserving federated learning framework integrating FedProx-based distributed training with a hybrid TCN-Transformer architecture, PSO-optimized feature selection, and formal differential privacy guarantees. The federated protocol enables collaborative model training across five distributed clients under non-IID data distribution (Dirichlet α = 0.5) without sharing raw speech data. Within each client, the local model employs multi-scale phase space reconstruction at micro (25 ms), meso (250 ms), and macro (2.5 s) temporal scales, combined with spectral and handcrafted features processed through a TCN-Transformer fusion architecture. Formal (ε = 1.0, δ = 10⁻⁵)-differential privacy is achieved via gradient clipping and calibrated noise injection. Experiments follow a consistent 80/20 train-test split with subject-independent validation. The framework achieves 99.07% ± 0.35% accuracy on EmoDB and 98.96% ± 0.42% on RAVDESS, with cross-corpus evaluation on CREMA-D achieving 68.15% ± 1.23% without fine-tuning. Ablation studies quantify component contributions: PSO feature selection (+2.80%), Transformer blocks (+2.10%), and FedProx protocol (+2.62%). Privacy analysis demonstrates membership inference attack resistance with AUC reduced to 0.52 while maintaining 98.5% accuracy under differential privacy constraints.

Subject areas: Biological sciences/Computational biology and bioinformatics; Physical sciences/Engineering; Physical sciences/Mathematics and computing

Keywords: Speech emotion recognition, Cross-corpus generalization, Temporal convolutional networks, Transformer architecture, Particle swarm optimization, Federated learning, Privacy-preserving machine learning, FedProx

Additional Declarations: No competing interests reported.
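The non-IID setup named in the abstract (five clients, Dirichlet α = 0.5) is commonly realized by drawing, for each emotion class, a Dirichlet-distributed share per client and splitting that class's samples accordingly. The following is a minimal sketch of that common recipe, not the authors' implementation; the function name `dirichlet_partition`, the seed, and the toy label array (7 classes, 100 utterances each) are assumptions made for illustration.

```python
import numpy as np


def dirichlet_partition(labels, num_clients=5, alpha=0.5, seed=0):
    """Split sample indices across clients with label skew drawn from Dir(alpha).

    Hypothetical helper: a common label-skew recipe, assumed here rather than
    taken from the paper.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Fraction of this class that each client receives.
        shares = rng.dirichlet(alpha * np.ones(num_clients))
        # Turn cumulative fractions into split points inside idx.
        cuts = (np.cumsum(shares)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(sorted(ix), dtype=int) for ix in client_indices]


# Toy labels (assumption): 7 emotion classes, 100 utterances each.
toy_labels = np.repeat(np.arange(7), 100)
parts = dirichlet_partition(toy_labels)
for client, ix in enumerate(parts):
    # Per-client class histogram; a small alpha yields strongly skewed counts.
    print(client, np.bincount(toy_labels[ix], minlength=7))
```

Smaller α concentrates each class on fewer clients, while large α approaches an even split. Every sample is assigned to exactly one client, so no utterance is shared between clients.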
Authors and affiliations
MOHAMMED TAWFIK (corresponding author), Sana'a University
Razan Ali Obeidat, Ajloun National University
Saddam Kamel, Sana'a University
Njood Anwer Aljarrah, Ajloun National University
Haneen Hussein Shehadeh, Ajloun National University
Ahmad Dalalah, Ajloun National University

Posted: February 9th, 2026
Manuscript PDF: https://assets-eu.researchsquare.com/files/rs-8726974/v1_covered_dc336259-17d4-4642-90ab-02463f017b7e.pdf