Abstract
Introduction
Retrospective analysis of sleep health among pediatric patients can enable important care- and condition-related discoveries. Often, sleep health is only encoded in a patient’s structured data after a formal diagnosis. However, the patient’s unstructured clinical text often contains many detailed sleep health mentions prior to diagnosis. These mentions are too numerous to identify manually, so computer-assisted tools must be developed. We present a novel, low-resource sleep vocabulary that can be applied to automatically identify notes containing sleep health mentions.
Methods
Using a combination of existing sleep ontologies, interviews with clinicians, and examination of clinical note narratives, we developed a novel vocabulary of sleep health terms and phrases that covers technical terms, abbreviations, and colloquial keywords used to describe sleep health. We compared our vocabulary against a set of manually annotated clinical notes to determine its effectiveness at identifying notes with sleep health mentions.
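To illustrate how a term vocabulary can flag notes at the note level, the following is a minimal Python sketch. It assumes a simple case-insensitive, whole-word phrase match; the entries in `SLEEP_TERMS` are hypothetical placeholders chosen to mix technical terms, abbreviations, and colloquial phrases, not the published vocabulary, and the matching strategy is an assumption rather than the authors' method.

```python
import re

# Hypothetical example terms (technical, abbreviation, colloquial).
# These are placeholders, NOT the published sleep vocabulary.
SLEEP_TERMS = [
    "obstructive sleep apnea",
    "OSA",
    "insomnia",
    "snoring",
    "trouble sleeping",
    "wakes up at night",
]

def build_matcher(terms):
    """Compile one case-insensitive regex matching any term as a whole word or phrase."""
    # Try longer phrases first so multi-word terms win over shorter overlaps.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)

def note_has_sleep_mention(note_text, matcher):
    """Note-level decision: True if the note contains at least one vocabulary term."""
    return matcher.search(note_text) is not None

matcher = build_matcher(SLEEP_TERMS)
print(note_has_sleep_mention("Mom reports loud snoring most nights.", matcher))  # True
print(note_has_sleep_mention("Follow-up for asthma; lungs clear.", matcher))  # False
```

The word-boundary anchors keep short abbreviations such as "OSA" from firing inside unrelated words (e.g., "mucosa"), which matters more for abbreviations than for longer phrases.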
Results
Our vocabulary identified clinical notes with sleep health mentions with a precision of 0.838 and a recall of 0.869.
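For readers less familiar with these metrics, the sketch below shows how note-level precision and recall can be computed by comparing the set of notes flagged by a vocabulary with the set of notes annotators marked as containing a sleep health mention. The function name and the toy note IDs are hypothetical and do not come from the study's data.

```python
def note_level_precision_recall(predicted, gold):
    """Compute precision and recall over note IDs.

    predicted: set of note IDs flagged by the vocabulary.
    gold: set of note IDs annotated as containing a sleep health mention.
    """
    true_positives = len(predicted & gold)
    # Precision: fraction of flagged notes that truly contain a mention.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of annotated notes that the vocabulary flagged.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Toy data for illustration only (not the study's annotations).
flagged = {"n1", "n2", "n3", "n4"}
annotated = {"n1", "n2", "n3", "n5", "n6"}
p, r = note_level_precision_recall(flagged, annotated)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.750 recall=0.600
```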
Conclusion
Our vocabulary showed excellent performance for identifying sleep health mentions at the clinical note level. However, it could not accurately identify the specific text spans containing the mentions, which would likely require a higher-resource model. Thus, our low-resource vocabulary, which can be deployed in almost any compute environment, can serve as a first pass over clinical notes to determine which notes should be further processed by more advanced models or manual review to identify sleep health mentions.
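A first-pass filter of this kind can be sketched as a simple triage step that forwards only notes with at least one vocabulary hit. The pattern below is a small hypothetical stand-in for the full vocabulary, and `triage_notes` is an illustrative name, not part of the authors' tooling.

```python
import re

# Hypothetical stand-in for the full vocabulary.
PATTERN = re.compile(r"\b(?:insomnia|snoring|sleep apnea|OSA|nightmares?)\b", re.IGNORECASE)

def triage_notes(notes, pattern=PATTERN):
    """First-pass filter over {note_id: text}.

    Returns (ids_to_forward, ids_skipped). Only forwarded notes would go on to
    the more expensive second stage (an advanced model or manual review).
    """
    forward, skipped = [], []
    for note_id, text in notes.items():
        if pattern.search(text):
            forward.append(note_id)
        else:
            skipped.append(note_id)
    return forward, skipped

notes = {
    "a": "Parent notes nightmares twice a week.",
    "b": "Well-child visit, immunizations up to date.",
    "c": "Referred for polysomnography due to snoring.",
}
forward, skipped = triage_notes(notes)
print(forward, skipped)  # ['a', 'c'] ['b']
```

In such a design the second stage sees only forwarded notes, so its cost scales with the number of flagged notes rather than the whole corpus. Notes the filter misses are never reviewed, so the filter's recall bounds the recall of the overall pipeline.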
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
MD was supported by the National Heart, Lung, and Blood Institute (1K01HL169493-1; Principal Investigator: MD).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The IRB at Nationwide Children's Hospital reviewed this research and approved it (STUDY00004027: Determinants of the Sleep Health Care Disparities).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Data is not available.