Detection of Personal and Family History of Suicidal Thoughts and Behaviors using Deep Learning and Natural Language Processing: A Multi-Site Study

Prakash Adekkanattu, Al'ona Furmanchuk, Yonghui Wu, Aman Pathak, and 15 more

Preprint posted on Research Square. This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-4014472/v1
This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 28 Sep, 2024; read the published version in npj Digital Medicine.

Abstract

Objective: Personal and family history of suicidal thoughts and behaviors (PSH and FSH, respectively) are significant risk factors associated with future suicide events. These are often captured in narrative clinical notes in electronic health records (EHRs).
Collaboratively, Weill Cornell Medicine (WCM), Northwestern Medicine (NM), and the University of Florida (UF) developed and validated deep learning (DL)-based natural language processing (NLP) tools to detect PSH and FSH from such notes. The tools' performance was further benchmarked against a method relying exclusively on ICD-9/10 diagnosis codes.

Materials and Methods: We developed DL-based NLP tools using the pre-trained transformer models Bio_ClinicalBERT and GatorTron, and compared them with expert-informed, rule-based methods. The tools were initially developed and validated using manually annotated clinical notes at WCM. Their portability and performance were further evaluated using clinical notes at NM and UF.

Results: The DL tools outperformed the rule-based NLP tool in identifying PSH and FSH. For detecting PSH, the rule-based system obtained an F1-score of 0.75 ± 0.07, while the Bio_ClinicalBERT and GatorTron DL tools scored 0.83 ± 0.09 and 0.84 ± 0.07, respectively. For detecting FSH, the rule-based NLP tool's F1-score was 0.69 ± 0.11, compared to 0.89 ± 0.10 for Bio_ClinicalBERT and 0.92 ± 0.07 for GatorTron. In the gold standard corpora across the three sites, only 2.2% (WCM), 9.3% (NM), and 7.8% (UF) of patients were reported to have an ICD-9/10 diagnosis code for suicidal thoughts and behaviors prior to the report date of the clinical notes. The best-performing GatorTron DL tool identified 93.0% (WCM), 80.4% (NM), and 89.0% (UF) of patients with documented PSH, and 85.0% (WCM), 89.5% (NM), and 100% (UF) of patients with documented FSH in their notes.

Discussion: While PSH and FSH are significant risk factors for future suicide events, little effort has previously been made to identify individuals with such a history. To address this, we developed a transformer-based DL method and compared it with a conventional rule-based NLP approach. The varying effectiveness of the rule-based tool across sites suggests a need for improvement in its dictionary-based approach.
In contrast, the performance of the DL tools was higher and comparable across sites. Furthermore, the DL tools were fine-tuned using only a small number of annotated notes at each site, underscoring their greater adaptability to local documentation practices and lexical variations.

Conclusion: Variations in local documentation practices across health care systems pose challenges to rule-based NLP tools. In contrast, the developed DL tools can effectively extract PSH and FSH information from unstructured clinical notes. These tools will provide clinicians with crucial information for assessing and treating patients at elevated risk for suicide who have rarely been diagnosed.
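
To make the abstract's comparison concrete, here is a minimal sketch of a dictionary-style rule-based labeler and the F1-score used to evaluate it. Everything in it is an assumption made for illustration: the term lists, the function names `rule_based_label`, `f1_score`, and `mean_sd`, and the four synthetic notes are hypothetical and are not the study's dictionary, code, or data. The abstract also does not say what the "±" values represent; `mean_sd` assumes a standard deviation over several evaluation splits.

```python
import re
from statistics import mean, stdev
from typing import Sequence, Tuple

# Hypothetical term lists; the study's actual dictionary is not given in the abstract.
HISTORY_TERMS = re.compile(r"\b(suicide attempt|suicidal ideation)\b", re.I)
FAMILY_TERMS = re.compile(r"\b(mother|father|brother|sister|family history)\b", re.I)


def rule_based_label(note: str) -> str:
    """Label one note as 'PSH', 'FSH', or 'none' with plain keyword rules."""
    if not HISTORY_TERMS.search(note):
        return "none"
    return "FSH" if FAMILY_TERMS.search(note) else "PSH"


def f1_score(gold: Sequence[str], pred: Sequence[str], target: str) -> float:
    """F1 for one target label: harmonic mean of precision and recall."""
    tp = sum(g == target and p == target for g, p in zip(gold, pred))
    fp = sum(g != target and p == target for g, p in zip(gold, pred))
    fn = sum(g == target and p != target for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_sd(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-split scores (assumed meaning of '±')."""
    return mean(scores), stdev(scores)


# Synthetic notes written for this sketch, with hand-assigned gold labels.
notes = [
    "Patient reports a suicide attempt in 2015.",
    "Family history: mother died by suicide.",   # phrasing not in the term list
    "Denies suicidal ideation.",                 # negation the rules ignore
    "Brother had a suicide attempt last year.",
]
gold = ["PSH", "FSH", "none", "FSH"]
pred = [rule_based_label(n) for n in notes]

print("predictions:", pred)
print("PSH F1:", round(f1_score(gold, pred, "PSH"), 3))
print("FSH F1:", round(f1_score(gold, pred, "FSH"), 3))
```

On these synthetic notes the keyword rules miss a family-history note whose wording is outside the term list and flag a negated mention as personal history. This is the kind of sensitivity to local phrasing that the Discussion attributes to dictionary-based approaches, though the toy example says nothing about the size of the effect in the study.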
Subject areas: Health sciences/Diseases/Psychiatric disorders; Health sciences/Health care/Public health
Supplementary files: DetectionofSTBhistorySupplementaryMaterials.pdf
Additional declarations: (Not answered)
Published version (npj Digital Medicine, 28 Sep, 2024): https://doi.org/10.1038/s41746-024-01266-7

Version 1 (posted 11 Mar, 2024) editorial history at npj Digital Medicine:
Editorial decision: revise, 25 Apr, 2024
Review #2 received at journal, 23 Apr, 2024
Reviewer #2 agreed at journal, 03 Apr, 2024
Review #1 received at journal, 19 Mar, 2024
Reviewer #1 agreed at journal, 11 Mar, 2024
Reviewers invited by journal, 06 Mar, 2024
Editor assigned by journal, 05 Mar, 2024
Submission checks completed at journal, 05 Mar, 2024
First submitted to journal, 04 Mar, 2024

Authors and affiliations:
Prakash Adekkanattu (corresponding author), Weill Cornell Medicine, https://orcid.org/0000-0003-1125-1449
Al'ona Furmanchuk, Northwestern University Feinberg School of Medicine
Yonghui Wu, University of Florida, https://orcid.org/0000-0002-6780-6135
Aman Pathak, University of Florida College of Medicine
Braja Patra, Weill Cornell Medicine
Sarah Bost, University of Florida College of Medicine
Destinee Morrow, Lawrence Berkeley National Laboratory
Grace Wang, University of Florida College of Medicine
Yuyang Yang, Northwestern University Feinberg School of Medicine
Noah Forrest, Northwestern University Feinberg School of Medicine
Yuan Luo, Northwestern University, https://orcid.org/0000-0003-0195-7456
Theresa Walunas, Northwestern University Feinberg School of Medicine
Wei-Hsuan Jenny, University of Pittsburgh School of Medicine
Walid Gellad, University of Pittsburgh School of Medicine
Jiang Bian, University of Florida, https://orcid.org/0000-0002-2238-5429
Yuhua Bao, Weill Cornell Medicine
Mark Weiner, Weill Cornell Medicine, https://orcid.org/0000-0001-5586-9940
Dave Oslin, Corporal Michael J Crescenz Veterans Affairs Medical Center
Jyotishman Pathak, Cornell