Evaluating gender bias in Large Language Models in long-term care

Sam Rickman
London School of Economics and Political Science

Research Article

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-5166499/v3
This work is licensed under a CC BY 4.0 License.
Version 3 posted July 9th, 2025. A version of record was published in BMC Medical Informatics and Decision Making on August 11th, 2025: https://doi.org/10.1186/s12911-025-03118-0

Abstract

Background: Large language models (LLMs) are being used to reduce the administrative burden in long-term care by automatically generating and summarising case notes. However, LLMs can reproduce bias in their training data. This study evaluates gender bias in summaries of long-term care records generated with two state-of-the-art, open-source LLMs released in 2024: Meta's Llama 3 and Google Gemma.

Methods: Gender-swapped versions of long-term care records for 617 older people from a London local authority were created. Summaries of male and female versions were generated with Llama 3 and Gemma, as well as benchmark models from Google and Meta released in 2019: T5 and BART. Linguistic and inclusion bias was quantified through sentiment analysis, and frequency of words and themes.

Results: The benchmark models exhibited some variation in output on the basis of gender. Llama 3 showed no gender-based differences across any metrics. Gemma displayed the most significant gender-based differences. Male summaries focused more on physical and mental health issues. Language used for men was more direct, with women's needs downplayed more often than men's.

Conclusions: Care services are allocated on the basis of need. If women's health issues are underemphasised, this may lead to gender-based disparities in service receipt. LLMs may offer substantial benefits in easing administrative burden. However, the findings highlight the variation in state-of-the-art LLMs, and the need for evaluation of bias in LLMs. Bias across gender and other protected characteristics should be evaluated in LLMs used in long-term care. The methods in this paper provide a practical framework for such evaluations. The code is available on GitHub.

Keywords: LLMs, long-term care, gender, bias

Additional Declarations
No competing interests reported.

Supplementary Files
tables.zip
supplementaryinformation.zip
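The gender-swapping and word-frequency steps described in the Methods can be illustrated with a minimal sketch. This is not the paper's code (which the abstract says is available on GitHub); the word list `MALE_TO_FEMALE` and the helper names `swap_gender`, `word_counts` and `count_differences` are hypothetical and chosen only for illustration. The sketch swaps in one direction (male to female) because the reverse is ambiguous: "her" can correspond to either "him" or "his".

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny male-to-female mapping (an assumption for
# illustration; a real evaluation would need a far more complete list).
MALE_TO_FEMALE = {
    "he": "she",
    "him": "her",
    "his": "her",
    "himself": "herself",
    "mr": "mrs",
    "man": "woman",
}

# Match any mapped term as a whole word, ignoring case.
_PATTERN = re.compile(r"\b(" + "|".join(MALE_TO_FEMALE) + r")\b", re.IGNORECASE)


def swap_gender(text: str) -> str:
    """Replace male terms with female counterparts, keeping capitalisation."""
    def _replace(match: re.Match) -> str:
        word = match.group(0)
        swapped = MALE_TO_FEMALE[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    return _PATTERN.sub(_replace, text)


def word_counts(text: str) -> Counter:
    """Count lower-cased word tokens in a text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def count_differences(summary_a: str, summary_b: str) -> dict:
    """Return words whose counts differ between two summaries (a minus b)."""
    a, b = word_counts(summary_a), word_counts(summary_b)
    return {w: a[w] - b[w] for w in sorted(set(a) | set(b)) if a[w] != b[w]}


if __name__ == "__main__":
    record = "Mr Smith says he needs help with his meals."
    print(swap_gender(record))
    # Compare two made-up summaries of the male and female versions.
    print(count_differences("he is unable to cook", "she is able to cook"))
```

In a bias evaluation of this kind, each record and its gender-swapped twin would be summarised by the same model, and differences such as those returned by `count_differences` would be aggregated across all record pairs before any statistical testing.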