Sentiment Detection in Low-Resource Language Gujarati: Evaluating Machine Translation Pipelines for Cross-Lingual Preservation

Neha Shah¹, Preeti Baser², Niraj Shah, Parag Sanghani

This is a preprint; it has not been peer reviewed by a journal.
DOI: https://doi.org/10.21203/rs.3.rs-7438637/v1
License: CC BY 4.0
Status: Posted (Version 1, the latest version)

Abstract

Cross-lingual applications in low-resource languages have benefited greatly from machine translation (MT), yet the preservation of sentiment polarity across translation remains underexplored. Traditional evaluation metrics such as BLEU and chrF capture surface-level lexical similarity, but the reliability of sentiment transfer remains unclear for applications involving social media, news, and reviews.
This paper introduces the first lexicon-anchored benchmark of 5,000 sentiment-labeled Gujarati news headlines, a resource for both sentiment analysis and MT evaluation, and uses it to evaluate sentiment preservation in a low-resource Indian language. We evaluate three pipelines: (i) direct multilingual modeling with XLM-R, (ii) Gujarati–Hindi translation followed by VADER sentiment analysis, and (iii) Gujarati–Hindi translation followed by a transformer-based sentiment classifier. The Gujarati–Hindi translation pipeline with the transformer classifier attains the highest sentiment preservation rate (35.25%), while direct multilingual modeling attains the lowest (27.35%). An error analysis further demonstrates the usefulness of hybrid evaluation frameworks by identifying instances in which a Gujarati sentiment lexicon restores polarity lost during translation. Our findings indicate that linguistically proximate pivot languages, such as Hindi for Gujarati, can improve cross-lingual sentiment fidelity, and they establish sentiment preservation as an additional evaluation criterion for MT in low-resource scenarios.

Keywords: Machine Translation, Sentiment Analysis, Low-Resource Languages, Gujarati, Cross-Lingual Evaluation, Natural Language Processing

Declarations: No competing interests reported.
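As a rough illustration of the evaluation described in the abstract, the sketch below assumes that the sentiment preservation rate is the percentage of headlines whose pipeline-predicted label matches the gold source label. It models the hybrid lexicon step as a hypothetical rule: when the post-translation classifier returns neutral but a Gujarati lexicon scores the source headline as clearly polar, the lexicon label is used instead. The functions `run_pipeline`, `preservation_rate`, `lexicon_polarity` and `hybrid_label`, the three-way label set, the 0.5 threshold and the toy lexicon are all assumptions for illustration, not the authors' implementation. Translation and classification are passed in as plain callables rather than tied to any specific XLM-R, MT or VADER API.

```python
from typing import Callable, List, Mapping, Optional, Sequence

# Hypothetical three-way label set; the paper's exact label scheme is assumed.
LABELS = ("negative", "neutral", "positive")


def run_pipeline(
    headlines: Sequence[str],
    classify: Callable[[str], str],
    translate: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Pipeline (i): classify Gujarati text directly (translate=None).
    Pipelines (ii)/(iii): translate Gujarati -> Hindi first, then classify."""
    texts = list(headlines) if translate is None else [translate(h) for h in headlines]
    return [classify(t) for t in texts]


def preservation_rate(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Percentage of items whose predicted label equals the gold label (assumed definition)."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and of equal length")
    matches = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * matches / len(gold)


def lexicon_polarity(
    tokens: Sequence[str], lexicon: Mapping[str, float], threshold: float = 0.5
) -> str:
    """Sum lexicon scores over source tokens and map the total to a label."""
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def hybrid_label(
    mt_label: str,
    source_tokens: Sequence[str],
    lexicon: Mapping[str, float],
    threshold: float = 0.5,
) -> str:
    """Hypothetical fallback: if translation collapsed the label to neutral but the
    source-side lexicon signals clear polarity, restore the lexicon polarity."""
    lex_label = lexicon_polarity(source_tokens, lexicon, threshold)
    if mt_label == "neutral" and lex_label != "neutral":
        return lex_label
    return mt_label


if __name__ == "__main__":
    # Toy scores only ("success" = +1, "accident" = -1); not a real Gujarati lexicon.
    lexicon = {"સફળતા": 1.0, "અકસ્માત": -1.0}
    gold = ["positive", "negative", "neutral", "negative"]
    source_tokens = [["સફળતા"], ["અકસ્માત"], ["સમાચાર"], ["અકસ્માત"]]
    mt_labels = ["positive", "neutral", "neutral", "positive"]  # stand-in classifier output
    print(preservation_rate(gold, mt_labels))  # 50.0
    restored = [hybrid_label(m, t, lexicon) for m, t in zip(mt_labels, source_tokens)]
    print(preservation_rate(gold, restored))  # 75.0
```

In this toy run the lexicon fallback recovers the second headline, whose negative polarity was flattened to neutral after translation, while the fourth headline (a polarity flip rather than a collapse) is left unchanged. Keeping translation and classification as injected callables lets the same scoring code compare the direct and pivot-based pipelines side by side.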

Affiliation: P. P. Savani University (all authors; corresponding author: Neha Shah). Posted on Research Square (ISSN 2693-5015, online) on October 29, 2025.
