An Open-source Fine-tuned Large Language Model for Radiological Impression Generation: A Multi-reader Performance Study

Research Article

Adrian Serapio, Gunvant Chaudhari, Cody Savage, Yoo Jin Lee, Maya Vella, and 5 more

This is a preprint posted on Research Square; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4656707/v1
This work is licensed under a CC BY 4.0 License.

Status: Published. Journal publication published 27 Sep, 2024. The published version appears in BMC Medical Imaging.

Abstract

Background
The impression section integrates key findings of a radiology report but can be subjective and variable. A fine-tuned open-source Large Language Model (LLM) was evaluated in its ability to generate radiological report impressions across different imaging modalities and hospitals.
We sought to clinically validate an open-source fine-tuned LLM that automatically generates impressions to summarize radiology reports.

Methods
In this institutional review board-approved retrospective study, we fine-tuned an open-source LLM to generate the impression from the remainder of the radiology report. CT, US, and MRI radiology reports from Hospital 1 (n = 372716) and Hospital 2 (n = 60049), both under a single institution, were included in this study. The ROUGE score was used for automatic natural language evaluation, and a reader study with five thoracic radiologists was performed for a clinical evaluation of CT chest impressions with a subspecialist baseline. We also stratified the results of the reader performance study by diagnosis category and by original impression length to gauge case complexity.

Results
The large language model achieved ROUGE-L scores of 46.51, 44.2, and 50.96 on the Hospital 1 dataset for the CT, US, and MRI modalities, respectively. Upon external validation on the Hospital 2 independent test dataset, the model achieved ROUGE-L scores of 40.74, 37.89, and 24.61 for the same modalities. In the reader performance study, the model achieved overall mean scores of 3.56/4 for clinical accuracy, 3.92/4 for grammatical accuracy, and 3.37/4 for stylistic quality, with a mean edit time of 18.29 seconds and a mean edit distance of 12.32 words. The LLM achieved its highest clinical accuracy ratings for acute/emergent findings. In terms of impression length, the LLM performed best in clinical accuracy on shorter impressions.

Conclusions
We demonstrated that an open-source fine-tuned LLM can generate high-quality radiological impressions, as measured by clinical accuracy, grammatical accuracy, and stylistic quality, across multiple imaging modalities and hospitals.

Keywords: Natural Language Processing, Large Language Model, Open-source, Summarization, Impressions

Additional Declarations
No competing interests reported.
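The abstract reports two text-overlap measures: ROUGE-L and an edit distance counted in words. The following is a minimal sketch of how such measures can be computed, not the authors' evaluation code. It assumes whitespace tokenization, lowercasing, an F1-style ROUGE-L scaled to 0-100, and a word-level Levenshtein distance; the study's actual tokenization, stemming, and ROUGE variant are not stated in the abstract. The function names and example sentences are hypothetical.

```python
from typing import List


def _tokens(text: str) -> List[str]:
    # Assumption: lowercase + whitespace split; the study's tokenizer is unknown.
    return text.lower().split()


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 on a 0-100 scale (assumed variant, see lead-in)."""
    ref, cand = _tokens(reference), _tokens(candidate)
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 100.0 * 2 * precision * recall / (precision + recall)


def word_edit_distance(original: str, edited: str) -> int:
    """Word-level Levenshtein distance: insertions, deletions, substitutions."""
    a, b = _tokens(original), _tokens(edited)
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical impressions, invented for illustration only.
    radiologist = "no acute cardiopulmonary abnormality"
    model = "no acute abnormality"
    print(rouge_l_f1(radiologist, model))
    print(word_edit_distance(model, radiologist))
```

Under these assumptions, a higher ROUGE-L means the generated impression shares a longer in-order word sequence with the radiologist's impression, while a lower word edit distance means a reader needed fewer word changes to reach an acceptable impression.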
Version 1 editorial history
04 Sep, 2024: Editorial decision: Revision requested
03 Sep, 2024: Reviews received at journal
13 Aug, 2024: Reviewers agreed at journal
06 Aug, 2024: Reviews received at journal
26 Jul, 2024: Reviewers agreed at journal
16 Jul, 2024: Reviewers invited by journal
16 Jul, 2024: Editor invited by journal
09 Jul, 2024: Editor assigned by journal
09 Jul, 2024: Submission checks completed at journal
28 Jun, 2024: First submitted to journal
Author information
Adrian Serapio, Gunvant Chaudhari, Cody Savage, Yoo Jin Lee, Maya Vella, Shravan Sridhar, Jamie Schroeder, Jonathan Liu, Adam Yala, and Jae Ho Sohn (corresponding author). All authors are affiliated with the University of California, San Francisco.

Version 1 posted: August 1st, 2024
Version of record: BMC Medical Imaging, published September 27th, 2024, https://doi.org/10.1186/s12880-024-01435-w
License: CC BY 4.0, https://creativecommons.org/licenses/by/4.0/