Integrating Explainability for Sentiment Interpretation, Misclassification, and Bias Detection in Women-in-STEM Social Media

doi:10.21203/rs.3.rs-8476200/v1

2026 · Research Square preprint (version 1)

Research Article

Shereen Fouad, Ezzaldin Alkooheji (Aston University; corresponding author: Shereen Fouad)

This is a preprint; it has not been peer reviewed by a journal. Version 1, posted January 12th, 2026. This work is licensed under a CC BY 4.0 License.

Abstract

Transformer-based models have advanced sentiment analysis but remain difficult to interpret, especially in sensitive domains such as public discourse about women in science, technology, engineering, and mathematics (STEM). Building on earlier work on women-in-STEM sentiment, this study introduces an ethically curated corpus of over 140,000 English tweets/X posts and a validated automatic labelling pipeline that combines hand-annotated data with state-of-the-art transformer-based sentiment models.
We quantitatively compare several transfer-learning approaches and identify the best-performing model for this domain, which achieves high overall accuracy but systematically confuses neutral and mildly positive content. To open this “black box,” we apply two explainable AI (XAI) methods, SHapley Additive exPlanations (SHAP) and Integrated Gradients (IG), to obtain word-level attributions for correctly and incorrectly classified tweets, showing how specific linguistic cues, such as celebratory hashtags, negation, and emotionally charged terms, drive sentiment predictions and common error modes. We further design a bias-probing protocol based on minimally different gendered sentence pairs (e.g., “Women in STEM” vs. “Men in STEM”) and show that the model assigns systematically different sentiment scores and attributions to male- and female-marked variants, indicating learned gender bias. All data processing scripts, model configurations, and analysis code are released here to support transparency, reproducibility, and future research on explainable and fair sentiment analysis in socially critical contexts.

Keywords: Women in STEM, Social Media, Sentiment Analysis, Explainable Artificial Intelligence, Transformers, Gender Bias

Additional Declarations: No competing interests reported.
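The abstract does not state how the automatic labels were validated against the hand-annotated data, or how the neutral/mildly positive confusion was quantified. A common approach, assumed here rather than taken from the paper, is to cross-tabulate gold and predicted labels in a confusion matrix and to report chance-corrected agreement (Cohen's kappa). The three-class label set, the function names and the example labels below are hypothetical.

```python
from collections import Counter

# Assumed three-class scheme; the paper's exact label set is not given in the abstract.
LABELS = ("negative", "neutral", "positive")


def confusion_matrix(gold, pred, labels=LABELS):
    """Rows index hand-annotated (gold) labels, columns index automatic labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for g, p in zip(gold, pred):
        matrix[index[g]][index[p]] += 1
    return matrix


def cohens_kappa(gold, pred, labels=LABELS):
    """Agreement between two labelings, corrected for agreement expected by chance."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(gold)
    observed = sum(g == p for g, p in zip(gold, pred)) / n
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    # Chance agreement: product of the two marginal label frequencies, summed over labels.
    expected = sum(gold_counts[label] * pred_counts[label] for label in labels) / (n * n)
    if expected == 1.0:  # both sides used one identical label throughout; kappa undefined
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Made-up example labels, for illustration only.
    gold = ["positive", "neutral", "neutral", "negative", "positive", "neutral"]
    pred = ["positive", "positive", "neutral", "negative", "positive", "positive"]
    m = confusion_matrix(gold, pred)
    for label, row in zip(LABELS, m):
        print(label, row)
    neutral, positive = LABELS.index("neutral"), LABELS.index("positive")
    print("gold neutral predicted positive:", m[neutral][positive])
    print("Cohen's kappa:", round(cohens_kappa(gold, pred), 3))
```

The off-diagonal cell (gold neutral, predicted positive) is where the kind of neutral/mildly positive confusion the abstract describes would show up.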
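The Integrated Gradients attributions mentioned in the abstract follow the standard definition: for an input x and a baseline x', IG_i(x) = (x_i - x'_i) · ∫₀¹ ∂F(x' + α(x - x'))/∂x_i dα, and the attributions sum to F(x) - F(x') (the completeness property). The sketch below shows only this computation, on a hypothetical toy classifier (summed 2-d token embeddings fed to a logistic output) whose gradient is written by hand so it runs without a deep-learning library. The embeddings, weights and all-zero baseline are made up; it is not the authors' model, and it does not cover SHAP. On a real transformer one would typically use a library implementation such as Captum's `IntegratedGradients`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_model(embeddings, w, b):
    """HYPOTHETICAL stand-in classifier: P(positive) = sigmoid(b + sum_t w . e_t)."""
    z = b + sum(sum(wd * xd for wd, xd in zip(w, e)) for e in embeddings)
    return sigmoid(z)


def toy_model_grad(embeddings, w, b):
    """Analytic gradient of toy_model with respect to every embedding coordinate."""
    p = toy_model(embeddings, w, b)
    return [[p * (1.0 - p) * wd for wd in w] for _ in embeddings]


def integrated_gradients(embeddings, baseline, w, b, steps=50):
    """Token-level IG: midpoint Riemann sum of gradients along the straight path
    from baseline to input, scaled by (input - baseline) and summed over dims."""
    summed = [[0.0] * len(w) for _ in embeddings]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [[bd + alpha * (xd - bd) for xd, bd in zip(x, base)]
                 for x, base in zip(embeddings, baseline)]
        for t, grad in enumerate(toy_model_grad(point, w, b)):
            for d, g in enumerate(grad):
                summed[t][d] += g
    return [sum((xd - bd) * summed[t][d] / steps
                for d, (xd, bd) in enumerate(zip(x, base)))
            for t, (x, base) in enumerate(zip(embeddings, baseline))]


if __name__ == "__main__":
    tokens = ["Women", "in", "STEM", "#proud"]
    emb = [[0.5, 0.1], [0.0, 0.0], [0.3, 0.4], [1.2, -0.5]]  # made-up 2-d embeddings
    zero = [[0.0, 0.0] for _ in emb]                           # all-zero baseline
    w, b = [0.8, -0.3], -0.2                                   # made-up weights
    attrs = integrated_gradients(emb, zero, w, b)
    for tok, a in zip(tokens, attrs):
        print(f"{tok:>8s} {a:+.4f}")
    # Completeness check: attributions should sum to F(input) - F(baseline).
    print(sum(attrs), toy_model(emb, w, b) - toy_model(zero, w, b))
```

Summing over embedding dimensions is what turns coordinate-level IG into the word-level attributions the abstract refers to; the midpoint rule with 50 steps is an arbitrary choice that keeps the completeness gap small for this smooth toy model.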
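The minimal-pair bias probe described in the abstract can be sketched as template filling followed by paired score differences. The templates, the `score_fn` interface and the toy scorer are hypothetical: in the paper the scores come from the fine-tuned transformer, and the abstract also compares attributions between variants, which this sketch omits. The toy scorer has a small bonus for female-marked text injected on purpose, so the probe has a gap to detect.

```python
from statistics import mean

# HYPOTHETICAL templates; {group} is replaced by the gendered term.
TEMPLATES = [
    "{group} in STEM are brilliant.",
    "{group} in STEM face challenges.",
    "{group} in STEM deserve recognition.",
]


def minimal_pairs(templates, female="Women", male="Men"):
    """Build (female-marked, male-marked) sentences differing only in the group term."""
    return [(t.format(group=female), t.format(group=male)) for t in templates]


def probe_gender_gap(templates, score_fn):
    """Return per-pair score gaps (female minus male), their mean, and the share
    of pairs in which the female-marked variant scored higher."""
    gaps = [score_fn(f) - score_fn(m) for f, m in minimal_pairs(templates)]
    share_female_higher = sum(g > 0 for g in gaps) / len(gaps)
    return gaps, mean(gaps), share_female_higher


def toy_score(sentence):
    """HYPOTHETICAL scorer standing in for the classifier's P(positive).
    A +0.05 bonus for female-marked text is injected so the probe has a gap to find."""
    score = 0.5
    if "brilliant" in sentence:
        score += 0.3
    if "challenges" in sentence:
        score -= 0.2
    if sentence.startswith("Women"):
        score += 0.05
    return score


if __name__ == "__main__":
    gaps, mean_gap, share = probe_gender_gap(TEMPLATES, toy_score)
    print([round(g, 3) for g in gaps], round(mean_gap, 3), share)
```

Because each pair differs only in the group term, any consistent non-zero gap is attributable to that term. With many templates, a paired significance test over the gaps (for example, a sign-flip permutation test) would be a natural addition; it is left out here.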