SleepBert: An Intelligent Clinical Encyclopaedia for Sleep Disorders Using Large Language Models

doi:10.21203/rs.3.rs-6605863/v1

2025 · doi:10.21203/rs.3.rs-6605863/v1

Research Article · Research Square

Amala Ann KA, Dr Vaidhehi V

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-6605863/v1
This work is licensed under a CC BY 4.0 License.
Status: Posted (Version 1)

Abstract

Diagnosing sleep disorders is difficult owing to the complex nature of sleep microarchitecture and the heterogeneity of symptom presentation. Conventional analysis of Polysomnography (PSG), which involves interpreting EEG bandpower, sleep spindles, and K-complexes, is time-consuming, laborious, and subjective. This restricts the detection of infrequent co-occurrences of disorders and of their links to neuro-cognitive and genetic disorders.
To overcome these challenges, we present SleepBert, a hybrid Retrieval-Augmented Generation (RAG) model that combines structured PSG features with unstructured clinical narratives for holistic sleep disorder analysis. Built by fine-tuning ClinicalBERT on PSG data from the NCH (paediatric) and ISRUC datasets, SleepBert includes a PSG-specific knowledge retrieval layer that retrieves real-time evidence from medical databases such as PubMed. The model achieved 93.40% accuracy, outperforming ClinicalBERT (87.20%) and BERT (80.90%), with 90.1% PubMed retrieval accuracy and a response latency of 5.4 seconds. The system serves as an Encyclopaedia of sleep disorders, delivering accurate, evidence-based insights and decision support to clinicians and researchers. It supports the analysis of large numbers of PSGs, speeds up data-driven discovery, and gives access to rare neuro-cognitive and genetic markers. SleepBert is an extensible platform for advancing sleep disorder research and enhancing clinical decision-making through quick, accurate interpretation of complex PSG data.

Keywords: Computational Neuroscience, Medical Informatics, Artificial Intelligence and Machine Learning, Sleep Study, RAG, NCH, LLMs, Bert, Polysomnography

1. Introduction

Sleep is an important biological function that sustains cognitive processes, emotional well-being, and physical health. Sleep disturbances have been shown to contribute to numerous health issues, such as cardiovascular disease, cognitive disorders, and psychiatric illnesses. Sleep disorders affect millions of people across the world: Obstructive Sleep Apnea (OSA) occurs in about 936 million adults worldwide, and insomnia occurs in 10–30% of the world's population.
Furthermore, rare genetic and neuro-cognitive disorders tend to be characterized by abnormal sleep microarchitecture, which makes early and accurate diagnosis important for successful intervention. Polysomnography (PSG), the gold standard for sleep disorder diagnosis [1], is clinically valuable but remains a manual, time-consuming process that requires expert interpretation of EEG, EOG, EMG, and other physiological signals. This constraint hinders the identification of rare disorder co-occurrences and slows data-driven discovery in sleep research.

1.1. Vital Features for Sleep Study

Quantitative sleep analysis depends on specific PSG characteristics that represent brain activity and physiological changes during sleep. These characteristics are:

EEG Bandpower: Quantifies brain wave activity (Delta, Theta, Alpha, Beta, Gamma) to analyze sleep stages and disturbances.
Sleep Spindles: Correlated with memory consolidation and stability of NREM sleep.
K-Complexes: Represent sensory processing in slow-wave sleep.
Eye Movements (EOG): Identify REM sleep and differentiate between sleep phases.
Muscle Tone (EMG): Detects muscle atonia in REM and movement disorders.

Role of Large Language Models (LLMs) in Sleep Study

Recent developments in Large Language Models (LLMs) provide new opportunities for automating and augmenting medical diagnostics. LLMs like BERT and ClinicalBERT can handle complex clinical narratives and combine structured and unstructured medical information. In sleep research, LLMs can be used to analyze PSG features, detect patterns, and offer evidence-based conclusions. However, conventional LLMs are not effective on specialized medical questions and need regular updates to stay in sync with new research.

1.2. Our Approach: Objectives, Methodology, and Rationale

To overcome the shortcomings of manual PSG interpretation and restricted LLM performance, we introduce SleepBert, a hybrid RAG model that is fine-tuned for extensive sleep disorder analysis.
SleepBert is built by fine-tuning ClinicalBERT on PSG data from the NCH (paediatric) and ISRUC datasets. It integrates structured PSG features (e.g., EEG bandpower, sleep spindles, K-complexes) with unstructured clinical notes and retrieves real-time evidence from PubMed. This allows SleepBert to:

Provide Multimodal Integration and Specialized Query Response.
Improve Decision Support: Deliver evidence-informed insights to help clinicians.
Find Rare Co-Occurrences: Uncover unusual disorder pairs and emerging trends.
Scale Research: Facilitate large-scale investigation of neuro-cognitive and genetic markers associated with sleep disorders.

2. Literature review

The literature review covered LLM-related work published between 2022 and 2024.

2.1. Related work on LLMs associated with health study

Ghali et al. [1] tackle the challenge of providing factual information using Retrieval-Augmented Generation (RAG), a method that improves model responses by grounding them in factual knowledge. To address scalability issues, the study links user queries with language models such as BERT and Orca2 through a query optimization process. The research compares three scenarios: first, without RAG; second, with RAG but without extra help; and lastly, with RAG and augmented query support. Empirical findings, obtained from schizophrenia-related questions, show a significant improvement in the performance of the base language model when RAG is applied, especially when it is combined with prompt augmenters. BERT returns the best accuracy among the models tested; however, its computation time is the highest. Another study [2] describes a system that tunes a GPT-4-based LLM and couples it with a vector database using RAG to increase the personalization of care plans. The AI-generated diagnostic reports, tested and rated by clinical physicians, reached 90% accuracy and an 88% readability score on major clinical parameters.
Yu et al. [3] present Health-LLM, a system that combines large-scale feature extraction, accurate medical knowledge scoring, and machine learning methods to provide improved analysis of patient health reports. The system outperforms GPT-3.5, GPT-4, and fine-tuned LLaMA 2 by a wide margin in predicting future diseases. SouLLMate [4] is a responsive LLM-based system that incorporates large language model technologies, Chain, Retrieval-Augmented Generation (RAG), prompt engineering, and domain expertise. It provides capabilities such as Risk Detection, Proactive Guidance Dialogue, and Conversational Information Extraction through RAG-based personalized profile uploads. Its performance in mental health pre-screening was tested using the DAIC-WOZ database, which centers on psychological distress disorders like anxiety, depression, and PTSD. In zero-shot settings, SouLLMate reached 80% accuracy in clinical mental health evaluations, which speaks to its strength in detecting psychological risks.

The sleep health and lifestyle dataset used in [5] is taken from the Kaggle website. That work examines whether the language and reasoning ability of large language models (LLMs) can be used to detect sleep disorders automatically. LLMs were trained on data that includes sleep patterns, lifestyle habits, and associated health indicators, applying three new prompting strategies to drive classifier design, training, and evaluation. The results show that an SVM classifier, selected through decomposed prompting, obtained 91.9% accuracy (F1-score: 0.919), performing much better than conventional zero-shot and few-shot approaches. Another study [6] used large language models (LLMs) to extract basic sleep metrics from the polysomnography (PSG) notes of veterans in the Corporate Data Warehouse (CDW) national database.
The model's accuracy was tested on 464 human-annotated notes and proved as accurate as human extraction for sleep efficiency (SE) and total sleep time (TST). Interestingly, the LLM showed a 7.6% improvement over human annotation in extracting sleep onset latency (SOL), signaling higher accuracy in identifying certain sleep parameters. Khaokaew et al. [7] introduced ZzzGPT. Of note, it also improved by 7.6% in the extraction of sleep onset latency (SOL) compared to human annotation, reflecting improved specificity in identifying individual sleep parameters. Lastly, Sano et al. [9] investigate the application of large language models (LLMs) for predicting attention states, sleep stages, and sleep quality, along with producing tailored sleep improvement recommendations and adaptive guided imagery scripts from electroencephalogram (EEG) and physical activity data (e.g., waveforms, power spectrogram images, and numeric features).

Limitations of existing study

Most models are trained only on certain datasets (e.g., IMCS-21, DAIC-WOZ, CDW) that potentially lack diversity across populations. This restricts their use in wider, real-world clinical environments spanning various age groups, ethnicities, and health conditions.

The models are based on existing, static databases that do not necessarily reflect current physiological changes or longitudinal trends in patient health. This limits their capacity to adjust to changing patient conditions over time.

Although some research incorporates multimodal data (e.g., EEG waveforms, physical activity, textual notes), there is no common framework for integrating structured and unstructured data seamlessly. This may result in incomplete or biased predictions.

Most LLM-based systems target mental disorders (e.g., anxiety, PTSD) and lifestyle diseases, and no specialized models are optimized for sleep disorder detection and characterization.
This limitation prevents existing frameworks from recognizing and interpreting intricate sleep-related pathologies such as sleep apnea, insomnia, narcolepsy, and parasomnias.

3. Need for research

Precise diagnosis and individualized treatment of sleep disorders remain challenging because PSG data is complex and heterogeneous. Although current studies use large language models (LLMs) for broad medical applications, there is an urgent need to bridge the gap between structured PSG signals and unstructured clinical notes in order to analyze the entire range of sleep microarchitecture.

Limitations of Existing Approaches

Most recent LLM-based studies concentrate on mental illness and lifestyle disorders, omitting the holistic interpretation of PSG data obtained from overnight recordings. No current LLM is specific to sleep architecture or able to handle multi-modal PSG signals (EEG, EOG, EMG, ECG, and respiratory channels).

Challenges in PSG Data Interpretation

PSG signals are multi-modal and need sophisticated methods to extract meaningful patterns from heterogeneous sources, such as sleep stages, respiratory events, and neurophysiological markers.

3.1. Need for a Specialized PSG-Focused LLM-RAG System

Sleep disorders remain under-researched, especially in their relation to genetic and neurological disorders. Although PSG data offers a comprehensive view of sleep architecture, its capability to detect hidden neuro-cognitive deficits and genetic markers has not been well studied. Current studies concentrate largely on mental and lifestyle-linked sleep disturbances, with little research on uncommon sleep disorders and their biological foundations. In addition, multi-modal PSG signals, which record various physiological processes, tend to be investigated in silos, losing significant cross-signal interactions that could elucidate the connection between sleep dysfunction and neurological or genetic disorders.
The absence of a domain-specific system that can accommodate the complexity of PSG data and its relation to neuro-cognitive and genetic bases restricts the delivery of precise diagnoses and tailored treatments [10]. Meeting this need is instrumental in moving towards precision medicine in sleep research and in improving the understanding of how sleep microarchitecture mirrors systemic health conditions.

3.2. Problem Statement

Although large language models (LLMs) have shown promise in clinical reasoning and information extraction, most current models concentrate on lifestyle-dependent sleep habits or mental health use cases [15]. The present study attempts to fill these gaps by creating a hybrid PSG-centric LLM-RAG system that integrates:

Multi-Modal Integration: Applying the LLM to structured PSG features (e.g., EEG bands, EOG, sleep architecture) and unstructured clinical notes (e.g., sleep stage annotations, diagnosis details, and demographics). This facilitates holistic modelling of biological signals and clinical context.
PSG-Specific Knowledge Retrieval: Developing a specialized knowledge retrieval layer that augments RAG using PSG data and PubMed articles covering literature-based meta-analyses and other biomarkers. This ensures that the model can retrieve and respond based on clinical evidence.
Specialized Query Response: Enhancing query interpretation for sleep disorders using domain-specific prompting and PSG-guided retrieval, enabling more precise and clinically relevant outputs.

The proposed system will enable automated sleep parameter extraction, improved diagnostic understanding, and tailored intervention suggestions, filling the gap between state-of-the-art language modelling and PSG-driven sleep disorder analysis.

3.4. Expected Impact of this research

The system acts as an Encyclopaedia of sleep disorders, giving medical professionals such as doctors, sleep specialists, and researchers a single source of accurate, evidence-based information. It brings together knowledge from PSG data, clinical notes, and specialist literature to give instant access to information on uncommon genetic and neuro-cognitive markers associated with sleep abnormalities.

It is meant to complement, not replace, clinical expertise. It is a decision-support system that provides sound interpretations of intricate PSG data and proposes pertinent relationships with sleep disorders, helping clinicians decide while they retain clinical control.

The PSG-specific knowledge retrieval layer optimizes information retrieval from expert medical databases (e.g., PubMed). This ensures that the system provides clinically applicable and current evidence, enabling large-scale medical studies and the exploration of new patterns in sleep research.

Accelerating Sleep Disorder Research: The infrastructure offers a scalable research environment for studying rare sleep disorders and their overlap with neurological and genetic diseases. It enables large-scale analysis of PSG data sets, supporting data-driven findings and allowing researchers to discover new clinical correlations.

4. Methodology

The methodology section is divided into three major parts: Data, Modelling, and Results.

4.1. Data

We used a combination of the ISRUC-Sleep Dataset and the NCH Dataset for our study. ISRUC data were collected from human adults [14], comprising healthy subjects and subjects with sleep disorders who were under the influence of sleep medication.
The data set, which is designed to accommodate various research goals, contains three sets of data, including data from 100 subjects and data from one recording session for each of 10 healthy subjects; the latter is helpful for studies that compare healthy subjects with patients who have sleep disorders. The NCH Sleep DataBank consists of 3,984 sleep studies performed on 3,673 unique patients. Of these, 3,400 patients have one sleep study in the dataset, 238 have two studies, and 35 have more than two studies, with a maximum of 5 sleep studies for one patient. In terms of gender distribution, 2,068 patients were male and 1,604 were female, with one unknown.

4.1.1. Inclusion Criteria of participants

Subjects for this study were randomly selected from two publicly available datasets, NCH and ISRUC, with 30 subjects from each dataset (60 subjects in total), as shown in Table 1. The following inclusion criteria were used:

Availability of PSG Data: Subjects should have full overnight PSG recordings that include EEG, EOG, EMG, ECG, snore microphone, and pressure flow signals.
Age Range: NCH dataset: participants between the ages of 2 and 18 years (paediatric group); ISRUC dataset: adult participants aged 18 years and above. This is to prevent our model from being biased towards a particular gender or age group alone.
Sleep Disorder Diagnosis: Participants with clinically confirmed sleep disorders (e.g., insomnia, sleep apnea, narcolepsy, and other neuro-cognitive disorders) were included for in-depth disorder analysis.
Data Quality: High-quality recordings with low artifact contamination and correct clinical annotations (sleep stage scoring and diagnostic information) were chosen.
Age Diversity: Participants were recruited from various age groups to provide representative coverage of age-related sleep differences.
Demographic Diversity: Male and female participants were recruited to identify gender-based differences in sleep patterns and disorder presentation.

Table 1 Demographics of data

| Dataset | Age Interval (Years) | Male (n) | Female (n) | Total (n) |
|---|---|---|---|---|
| NCH | 2–6 | 7 | 5 | 12 |
| NCH | 7–12 | 6 | 6 | 12 |
| NCH | 13–18 | 3 | 3 | 6 |
| ISRUC | 18–30 | 5 | 4 | 9 |
| ISRUC | 31–50 | 6 | 6 | 12 |
| ISRUC | 51–70 | 4 | 5 | 9 |
| Total | | 31 | 29 | 60 |

4.1.2. Selection Criteria of input features

Table 2 presents an overview of the selected input feature points.

Table 2 Feature selection for the model

| Feature Category | Features | Description |
|---|---|---|
| PSG Signal Features | EEG (Electroencephalography) | Brain activity during sleep across different regions. |
| | Frontal EEG (F4-M1, F3-M2) | Monitors activity in the frontal cortex. |
| | Occipital EEG (O2-M1, O1-M2) | Tracks visual and posterior brain activity. |
| | EOG (Electrooculography) | Monitors eye movements, crucial for REM detection. |
| | Left Outer Canthus (E1-M2) | Captures left-eye movements. |
| | Right Outer Canthus (E2-M2) | Captures right-eye movements. |
| | EMG (Electromyography) | Measures muscle tone, distinguishing REM from non-REM. |
| | Chin EMG (EMG1, EMG2, EMG3) | Records muscle activity in the chin region. |
| | ECG (Electrocardiography) | Captures heart rate variability (HRV). |
| | ECG (ECG1-ECG2) | Measures cardiac activity. |
| | Respiratory Signals | Monitors breathing patterns for event detection. |
| | Snore Microphone (Snore) | Captures snoring patterns. |
| | Pressure Flow (Pflow) | Monitors airflow to detect apneas. |
| Clinical and Demographic Features | Sleep Stage Annotations | Epoch-level sleep stages (Wake, N1, N2, N3, REM). |
| | Diagnosis Details | Specific sleep disorders and neuro-genetic conditions. |
| | Demographic Data | Contextual patient data (age, gender, etc.). |
| | Age | Age range of participants. |
| | Gender | Male/Female count. |
| Derived and Statistical Features | Total Sleep Time (TST) | Total time spent asleep. |
| | Sleep Efficiency (SE) | Efficiency of sleep relative to time in bed. |
| | Sleep Onset Latency (SOL) | Time to transition from wake to sleep. |
| | REM Latency | Time from sleep onset to the first REM stage. |
| | Event Annotations | Detection of abnormal sleep events. |
| | Apneas & Hypopneas | Abnormal breathing events. |
| | Arousals | Sudden EEG bursts (> 16 Hz). |
| | Power Spectral Density (PSD) | Frequency-based EEG feature analysis. |
| Scientific Literature Features | PubMed Articles Related to Sleep Studies | Indexed research articles for knowledge retrieval. |
| | Sleep disorder frameworks | Classification standards (e.g., ICSD-3) [9] |
| | EEG biomarkers | Patterns linked to neuro-cognitive conditions. |
| | Genetic markers and sleep | Insights on genetic influences on sleep. |
| | Information Extraction Process | Dynamic retrieval for enhanced predictions. |

Data Preparation and Method of calculation of each feature

The PSG signal characteristics include multi-channel overnight physiological recordings such as EEG, EOG, EMG, ECG, and respiration signals (snore microphone and pressure flow). For dataset consistency, the signals are resampled to a single frequency using the MNE-Python library. The recordings are divided into 30-second epochs, a common technique in sleep studies, where n epochs constitute the entire recording of each participant. For each epoch, we calculate the mean value of each signal, so that one representative value is obtained per epoch. The resulting processed dataset for PSG signals comprises participant_id, epoch, and the calculated mean value for each PSG feature, as depicted in Fig. 1.

Derived features are calculated by statistical processing of the PSG signals and reflect sleep architecture and the detection of abnormal events [11]. Total Sleep Time (TST) is defined as the total of all non-wake epochs, and Sleep Efficiency (SE) is calculated as the TST divided by the total time in bed (TIB), multiplied by 100 to be expressed as a percentage. Sleep Onset Latency (SOL) is defined as the interval between the lights-off event and the onset of sleep (first appearance of the N1 stage), and REM Latency as the time between the onset of sleep and the first REM epoch.
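The per-epoch averaging and the derived metrics defined above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the stage labels ("W", "N1", "N2", "N3", "REM"), the use of an epoch index for lights-off, and the choice to let time in bed run to the end of the recording are all assumptions made for illustration.

```python
from statistics import mean

EPOCH_SEC = 30  # epoch length stated in the text


def epoch_means(signal, sfreq, epoch_sec=EPOCH_SEC):
    """Cut a 1-D signal into whole epochs and return one mean value per epoch."""
    n = int(sfreq * epoch_sec)  # samples per epoch
    return [mean(signal[i:i + n]) for i in range(0, len(signal) - n + 1, n)]


def sleep_metrics(stages, lights_off=0, epoch_sec=EPOCH_SEC):
    """Derive TST, SE, SOL and REM latency from one stage label per epoch.

    Assumptions (not from the paper): labels are "W", "N1", "N2", "N3", "REM";
    `lights_off` is an epoch index; time in bed runs to the end of the record.
    """
    in_bed = stages[lights_off:]
    minutes = epoch_sec / 60.0
    tib = len(in_bed) * minutes                          # time in bed
    tst = sum(1 for s in in_bed if s != "W") * minutes   # all non-wake epochs
    # Sleep onset follows the text: first appearance of the N1 stage.
    onset = in_bed.index("N1") if "N1" in in_bed else None
    first_rem = None
    if onset is not None:
        first_rem = next(
            (i for i in range(onset, len(in_bed)) if in_bed[i] == "REM"), None
        )
    return {
        "TST_min": tst,
        "SE_pct": 100.0 * tst / tib if tib else None,
        "SOL_min": onset * minutes if onset is not None else None,
        "REM_latency_min": (
            (first_rem - onset) * minutes if first_rem is not None else None
        ),
    }


# Hypothetical ten-epoch hypnogram, for illustration only.
hypnogram = ["W", "W", "N1", "N2", "N2", "N3", "REM", "W", "N2", "REM"]
metrics = sleep_metrics(hypnogram)
print(metrics)
```

For this toy hypnogram the sketch yields 3.5 minutes of sleep in 5 minutes in bed, so SE is 70%, SOL is 1 minute, and REM latency is 2 minutes.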
Atypical respiratory events such as apneas and hypopneas are identified by thresholding signal amplitude and event duration, and arousals are identified by finding short bursts of high-frequency EEG activity (> 16 Hz). To measure frequency-based dynamics quantitatively, Power Spectral Density (PSD) is calculated for every epoch and all EEG bands with Welch's method [10], yielding a frequency-domain measure of brain activity. Figure 2 illustrates the PSD of the first subject.

4.2. Experimental setup

The experiment was executed in Python 3.12, and the dependencies required by the model were installed accordingly. Training was done with two RTX 2080-Ti GPUs. The random seed was set to 42 for both training processes to ensure reproducibility.

4.3. Model Architecture and Training Process

Choice of ClinicalBERT as the Base LLM

We chose ClinicalBERT as our base language model because of its dedicated training on biomedical corpora, namely PubMed and MIMIC-III clinical notes, which makes it well suited to medical jargon and patient narratives. Moreover, ClinicalBERT's bidirectional Transformer-based architecture captures contextual relationships, which is essential when deciphering intricate sleep-stage annotations and the correlations between sleep patterns and neurological or genetic disorders.

4.3.1. Data Collection and Preprocessing

The first step is the accumulation of a specialized dataset targeting sleep disorders, neuro-cognitive disorders, and genetic markers. SleepBert's knowledge base comprises three main sources:

1. PSG Data: Structured polysomnography features (e.g., EEG, EOG, EMG signals) augmented with demographic data and sleep stage annotations.
2. Clinical Notes: Unstructured text including diagnostic reports, patient history, and descriptions of sleep architecture.
3. Medical Literature: Filtered scientific articles from PubMed with a focus on sleep-relevant genetic mutations, neurophysiological tendencies, and uncommon sleep disorders.

How is the External Medical Literature Prepared?

Using the NCBI Entrez API, we build advanced search queries that use Medical Subject Headings (MeSH) terms and Boolean operators to narrow the results to articles from the past decade. This search approach focuses on major themes including EEG patterns, genetic mutations, and unusual sleep disorders (as shown in Fig. 4), to provide broad coverage of the applicable biomedical studies. The retrieved articles go through preprocessing that involves text cleaning and Named Entity Recognition (NER) with ScispaCy, to identify vital information such as gene mentions, sleep conditions, and neurophysiological markers. Each full-text and abstract snippet is then converted by ClinicalBERT into dense vector embeddings, which encode the contextual meaning of the text. These embeddings are stored in a high-performance FAISS index, allowing for efficient similarity search. Upon receiving a user query, the system converts the query into an embedding and conducts a nearest-neighbor search over the indexed knowledge base to return the most relevant scientific contexts. This retrieved information is combined with the user query and the refined PSG data to generate a complete, evidence-based response. This external biomedical retrieval layer enriches SleepBert by coupling real-time scientific knowledge with its learned representations, producing up-to-date and contextually rich responses to intricate sleep-related queries. The data is normalized to eliminate noise (special characters, citations) and segmented into 512-token chunks to respect the input size constraints of the transformer.

4.3.2. SleepBert: ClinicalBERT Fine-Tuning

As mentioned before, the base model is ClinicalBERT, a transformer pre-trained on biomedical text.
It is fine-tuned on both the merged PSG data and clinical notes to identify domain-specific patterns. Fine-tuning improves SleepBert's capability to recognize sleep microarchitecture and relate it to sleep, neuro-cognitive, and genetic disorders. Because the dataset mixes structured and unstructured inputs, the fine-tuning takes a multi-task approach:

Masked Language Modeling (MLM): To enhance contextual comprehension of medical vocabulary and PSG-related terminology.
Sequence Classification: To link certain sleep stages and types of disorders with their neurophysiological patterns.

The result of this step is SleepBert, a domain-specific BERT variant that can encode complex sleep-related queries and provide medically correct responses.

4.3.3. Embedding Generation and Storage

SleepBert converts the aggregated texts into compact embeddings: 768-dimensional vectors that represent the semantic meaning of each chunk. This is done for both the medical literature and the PSG-clinical dataset. These embeddings are kept in a vector database to support quick and efficient similarity queries. Each vector is associated with its originating source (PSG observation, clinical note, or PubMed article) to maintain context accuracy.

4.3.4. Query Processing and Contextual Retrieval

When a query is submitted (e.g., "What are the genetic mutations linked with sleep apnea?"), the system converts it to a query embedding with SleepBert. The embedding is matched against the precomputed vectors in the database using cosine similarity. The top-k most similar contexts, covering PSG patterns, clinical annotations, and PubMed articles, are fetched for augmentation.

4.3.5. Augmentation and Context Fusion

The RAG system combines the user query with the retrieved contexts to generate an augmented prompt. This enriched input augments SleepBert's understanding with PSG-specific and clinical knowledge as well as external medical literature.
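The retrieval and fusion steps of Sections 4.3.3 to 4.3.5 can be sketched as follows. This is a minimal illustration under loud assumptions, not the authors' implementation: `embed` is a toy bag-of-words stand-in for the 768-dimensional SleepBert encoder, the in-memory list stands in for the vector database, the stored texts are placeholders rather than real records, and the prompt layout in `build_prompt` is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for the SleepBert encoder (assumption): token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, store, k=2):
    """Return the k stored items most similar to the query."""
    q = embed(query)
    return sorted(store, key=lambda item: cosine(q, item["vec"]), reverse=True)[:k]


def build_prompt(query, contexts):
    """Fuse retrieved contexts, tagged with their source, and the user query."""
    lines = ["Context:"]
    lines += [f"- [{c['source']}] {c['text']}" for c in contexts]
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)


# Placeholder snippets, one per source type named in Section 4.3.3.
texts = [
    ("PSG observation", "placeholder: reduced spindle density in stage N2"),
    ("clinical note", "placeholder: patient reports daytime sleepiness"),
    ("PubMed article", "placeholder: genetic variants and sleep apnea severity"),
]
store = [{"source": s, "text": t, "vec": embed(t)} for s, t in texts]

query = "Which genetic variants relate to sleep apnea?"
top = retrieve(query, store, k=1)
print(build_prompt(query, top))
```

Keeping the source label next to each stored vector is what later allows a response to cite where every retrieved context came from.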
For instance, the enhanced input may include:

User query: "What genetic mutations are linked to sleep apnea?"
Retrieved context:
"PER3 variants increase susceptibility to sleep apnea through circadian modulation."
"HLA-DQB1*06:02 allele is more prevalent in patients with severe obstructive sleep apnea."

4.3.6. Response Generation with SleepBert

SleepBert processes the augmented input to generate a comprehensive, context-aware response. The model synthesizes knowledge from PSG data and external literature, so that the output reflects current clinical understanding. For the genetic mutation query, SleepBert might generate:

"Genetic mutations linked to sleep apnea include PER3 polymorphisms, which affect circadian rhythms and sleep architecture. HLA-DQB1 variants, particularly HLA-DQB1*06:02, are associated with increased severity of obstructive sleep apnea. Furthermore, mutations in CLOCK and BMAL1 influence sleep fragmentation and circadian disruption."

4.3.7. Output Layer and Source Attribution

The final response is presented with source citations to maintain transparency and facilitate further inquiry. SleepBert automatically maps each retrieved context to its originating data source. Examples of how SleepBert responds to user queries are shown in Figs. 5, 6, and 7.

4.4. Evaluation Metrics

We compare SleepBert with the baseline models, ClinicalBERT and BERT, on our specific use case of examining PSG features and clinical questions. We employ several metrics to determine the accuracy and relevance of the models. One is the BLEU score (Bilingual Evaluation Understudy) [12], which measures the similarity between the generated text and reference outputs through n-gram matching and so reflects the quality of text generation. BLEU is a precision-based measure of n-gram overlap rather than a precision-and-recall measure.
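As a concrete reading of the BLEU score mentioned above, the standard definition multiplies a brevity penalty by the geometric mean of clipped n-gram precisions. The sketch below is an illustrative sentence-level version with a single reference, whitespace tokenization, and no smoothing; those simplifications and the function names are assumptions, and the evaluation reported in this paper may have used a library implementation instead.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precision += math.log(clipped / total) / max_n  # uniform weights
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision)


print(bleu("the cat sat on", "the cat sat on the mat"))
```

In the printed example every candidate n-gram appears in the reference, so the score is determined entirely by the brevity penalty, exp(1 - 6/4).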
In spite of criticism, BLEU remains in common use for evaluating textual fluency [11]. Essentially, BLEU calculates the geometric mean of the n-gram precisions of the generated text with respect to the reference. A brevity penalty term is also included to compensate for length differences between the generated and reference text.

In addition, we monitor the average input tokens (query tokens) and the average completion tokens (tokens in the response), both derived from response.usage, to evaluate the efficiency of the model's responses. The results were computed by comparing the responses generated by each model against a manually annotated clinical query dataset and its correct answers. The output of each model was assessed for correctness according to whether it accurately identified the relevant PSG abnormalities, clinical conditions, and PubMed references, as confirmed by domain experts. The results are shown in Table 3. The accuracy measure represents the proportion of correct responses over all queries evaluated. For response latency and other time metrics, we used response.usage offered by the Hugging Face API to monitor the processing time of every model [13]. Response time was measured from the instant the query was posted to the model to the instant the response was completely generated.

Table 3 Comparison of SleepBert performance with BERT and ClinicalBERT

| Metric | SleepBert | ClinicalBERT | BERT |
|---|---|---|---|
| Accuracy | 93.40% | 87.20% | 80.90% |
| BLEU Score | 0.81 | 0.75 | 0.68 |
| Average Input Tokens | 512 | 510 | 520 |
| Average Completion Tokens | 142 | 130 | 125 |
| Response Latency | 5.4 seconds | 5.8 seconds | 6.1 seconds |
| PubMed Retrieval Accuracy | 90.1% (relevant references) | 88.3% (relevant references) | 83.5% (relevant references) |
| Completion Efficiency | 94.3% (short, precise) | 92.7% (concise) | 89.4% (verbose) |

5. Conclusion and Discussion

Here, we present SleepBert, a domain-specific RAG-LLM (Retrieval-Augmented Generation with an LLM) framework that is specifically designed for end-to-end sleep disorder analysis. SleepBert combines PSG data, clinical text, and domain-specific medical literature (e.g., PubMed) to offer precise, evidence-based findings on sleep microarchitecture, neuro-cognitive disorders, and genetic disorders. By fine-tuning ClinicalBERT on multimodal sleep data and adding a PSG-specific knowledge retrieval layer, we improved performance over both baselines. SleepBert obtained 93.4% accuracy, surpassing both ClinicalBERT (87.2%) and BERT (80.9%), with 90.1% PubMed retrieval accuracy, supporting accurate and timely evidence retrieval.

The effect of SleepBert is multi-faceted. It serves as an Encyclopaedia of Sleep Disorders, equipping clinicians, researchers, and physicians with a single, centralized source for quick, trusted decision support.

However, our method has some limitations. The performance of the model may differ for unseen rare conditions, and retrieval precision depends on the availability and quality of external literature.

Declarations

Institution where the work was performed: This work was conducted at the Department of Data Science and Statistics, CHRIST (Deemed to be University), Bengaluru.
Author Approval Statement: All authors have seen and approved the final version of the manuscript.
Declarations for Each Author:
Financial Support: No funding was received for the conduct of this study.
Conflict of Interest: The authors declare that there are no conflicts of interest.
Clinical Trial Declaration: This manuscript does not report on a clinical trial.

Acknowledgments

NCH Sleep DataBank was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award Number R01EB025018. The National Sleep Research Resource was supported by the U.S.
National Institutes of Health, National Heart Lung and Blood Institute (R24 HL114473, 75N92019R002). The ISRUC-Sleep dataset contains data collected from all-night PSG recordings with a duration of around eight hours. Each recording was randomly selected from the PSG recordings acquired by the Sleep Medicine Centre of the Hospital of Coimbra University (CHUC) in the period 2009–2013.

Ethical Considerations

With the advent of more advanced models and improved performance enhancement methods, there remains room for further improvement. SleepBert is sufficiently capable to serve as an assistant to mental health professionals. Importantly, SleepBert does not engage in scenarios requiring serious diagnoses, thereby maintaining ethical boundaries and ensuring that it complements rather than replaces professional judgment.

Declaration of Conflicting Interests

In relation to the research, writing, and/or publication of this work, the author(s) declared no conflicts of interest.

Funding

The author(s) received no financial assistance for the research, writing, and/or publication of this work.

References

1. Ghali JPE, Shima K, Moriyama K, Mutoh A, Inuzuka N (2024) Enhancing Retrieval Processes for Language Generation with Augmented Queries to Provide Factual Information on Schizophrenia. Procedia Comput Sci 246:443–452
2. Hu HW, Lin YC, Chia CH, Chuang E, Ru YC (2024) Leveraging Large Language Models for Generating Personalized Care Recommendations in Dementia. In: 2024 IEEE International Workshop on Electromagnetics: Applications and Student Innovation Competition (iWEM), July 2024, pp 1–4. IEEE
3. Yu Q, Jin M, Shu D, Zhang C, Fan L, Hua W, Zhang Y (2024) Health-LLM: Personalized Retrieval-Augmented Disease Prediction System. arXiv preprint arXiv:2402.00746
4. Guo Q, Tang J, Sun W, Tang H, Shang Y, Wang W (2024) SouLLMate: An Application Enhancing Diverse Mental Health Support with Adaptive LLMs, Prompt Engineering, and RAG Techniques. arXiv preprint arXiv:2410.16322
5. Zhao Y (2025) Automatic Sleep Disorder Classification Using Large Language Model Prompting on Sleep Health and Lifestyle Data
6. Zhao Y (2025) Automatic Sleep Disorder Classification Using Large Language Model Prompting on Sleep Health and Lifestyle Data
7. Khaokaew Y, Ji K, Nguyen TH, Kegalle H, Alaofi M, Xue H, Salim FD (2023) ZzzGPT: an interactive GPT approach to enhance sleep quality. arXiv preprint arXiv:2310.16242
8. Sano A, Amores J, Czerwinski M (2024) Exploration of LLMs, EEG, and behavioral data to measure and support attention and sleep. arXiv preprint arXiv:2408.07822
9. Sano A, Amores J, Czerwinski M (2024) Exploration of LLMs, EEG, and behavioral data to measure and support attention and sleep. arXiv preprint arXiv:2408.07822
10. Kim J, Lee SY, Kim JH, Shin DH, Oh EH, Kim JA, Cho JW (2024) ChatGPT vs. sleep disorder specialist responses to common sleep queries: Ratings by experts and laypeople. Sleep Health 10(6):665–670
11. Ohayon MM, Roberts RE (2001) Comparability of sleep disorders diagnoses using DSM-IV and ICSD classifications with adolescents. Sleep 24(8):920–925
12. Parhi KK, Ayinala M (2013) Low-complexity Welch power spectral density computation. IEEE Trans Circuits Syst I Regul Pap 61(1):172–182
13. Mahapatra J, Garain U (2024) Impact of model size on fine-tuned LLM performance in data-to-text generation: A state-of-the-art investigation. arXiv preprint arXiv:2407.14088
14. Khalighi S, Sousa T, Santos JM, Nunes U (2016) ISRUC-Sleep: A comprehensive public dataset for sleep researchers. Comput Methods Programs Biomed 124:180–192
15. Kim J, Lee SY, Kim JH, Shin DH, Oh EH, Kim JA, Cho JW (2024) ChatGPT vs. sleep disorder specialist responses to common sleep queries: Ratings by experts and laypeople.
Sleep Health 10(6):665–670
16. Hari R, Salmelin R (1997) Human cortical oscillations: a neuromagnetic view through the skull. Trends Neurosci 20(1):44–49. https://doi.org/10.1016/S0166-2236(96)10065-5

Additional Declarations

The authors declare potential competing interests as follows: There is no conflict of interest for both the authors
Selection Criteria of input features\u003c/h2\u003e\n\u003cp\u003eTable\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e presents an overview of the selected input feature points.\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n\u003ctable id=\"Tab2\" border=\"1\"\u003e\u003ccaption\u003e\n\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\n\u003cdiv class=\"CaptionContent\"\u003e\n\u003cp\u003eFeature selection for the model\u003c/p\u003e\n\u003c/div\u003e\n\u003c/caption\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eFeature Category\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eFeatures\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eDescription\u003c/p\u003e\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd rowspan=\"13\" align=\"left\"\u003e\n\u003cp\u003ePSG Signal Features\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eEEG (Electroencephalography)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eBrain activity during sleep across different regions.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eFrontal EEG (F4-M1, F3-M2)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eMonitors activity in the frontal cortex.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eOccipital EEG (O2-M1, O1-M2)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eTracks visual and posterior brain activity.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eEOG (Electrooculography)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eMonitors eye movements, crucial for REM 
detection.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eLeft Outer Canthus (E1-M2)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eCaptures left-eye movements.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRight Outer Canthus (E2-M2)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eCaptures right-eye movements.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eEMG (Electromyography)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eMeasures muscle tone, distinguishing REM from non-REM.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eChin EMG (EMG1, EMG2, EMG3)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRecords muscle activity in the chin region.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eECG (Electrocardiography)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eCaptures heart rate variability (HRV).\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eECG (ECG1-ECG2)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eMeasures cardiac activity.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eRespiratory Signals\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eMonitors breathing patterns for event detection.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSnore Microphone (Snore)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eCaptures snoring 
patterns.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePressure Flow (Pflow)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eMonitors airflow to detect apneas.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd rowspan=\"5\" align=\"left\"\u003e\n\u003cp\u003eClinical and Demographic Features\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSleep Stage Annotations\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eEpoch-level sleep stages (Wake, N1, N2, N3, REM).\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eDiagnosis Details\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSpecific sleep disorders and neuro-genetic conditions.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eDemographic Data\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eContextual patient data (age, gender, etc.).\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eAge\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eAge range of participants.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eGender\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eMale/Female count.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd rowspan=\"9\" align=\"left\"\u003e\n\u003cp\u003eDerived and Statistical Features\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eTotal Sleep Time (TST)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eTotal time spent asleep.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003eSleep Efficiency (SE)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eEfficiency of sleep relative to time in bed.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSleep Onset Latency (SOL)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eTime to transition from wake to sleep.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eREM Latency\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eTime from sleep onset to the first REM stage.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eEvent Annotations\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eDetection of abnormal sleep events.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eApneas \u0026amp; Hypopneas\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eAbnormal breathing events.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eArousals\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSudden EEG bursts (\u0026gt;\u0026thinsp;16 Hz).\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePower Spectral Density (PSD)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eFrequency-based EEG feature analysis.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePubMed Articles Related to Sleep Studies\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eIndexed research articles for knowledge retrieval.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd 
rowspan=\"4\" align=\"left\"\u003e\n\u003cp\u003eScientific Literature Features\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eSleep disorder frameworks\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eClassification standards (e.g., ICSD-3) [\u003cspan class=\"CitationRef\"\u003e9\u003c/span\u003e]\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eEEG biomarkers\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePatterns linked to neuro-cognitive conditions.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eGenetic markers and sleep\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eInsights on genetic influences on sleep.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eInformation Extraction Process\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eDynamic retrieval for enhanced predictions.\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003eData Preparation and Method of calculation of each feature\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe PSG signal characteristics include multi-channel overnight physiological recordings such as EEG, EOG, EMG, ECG, and respiration signals (snore microphone and pressure flow). For dataset consistency, the signals are resampled to a single frequency using the MNE-Python library. The datasets are separated into 30-second epochs, a common technique in sleep studies, where n epochs constitute the entire recording of each participant. For each epoch, we calculate the mean value for each signal, so that one representative value is obtained per epoch. 
The resultant processed dataset for PSG signals comprises participant_id, epoch, and the calculated mean value for each PSG feature, as depicted in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e\n\u003cp\u003eDerived features are calculated by statistical processing of the PSG signals and capture sleep architecture and abnormal events [\u003cspan class=\"CitationRef\"\u003e11\u003c/span\u003e]. Total Sleep Time (TST) is defined as the total duration of all non-wake epochs, and Sleep Efficiency (SE) is calculated as the TST divided by the total time in bed (TIB), multiplied by 100 to be expressed as a percentage. Sleep Onset Latency (SOL) is defined as the interval between the lights-off event and the onset of sleep (first appearance of the N1 stage), and REM Latency as the time between the onset of sleep and the first REM epoch. Atypical respiratory events such as apneas and hypopneas are identified via amplitude-based thresholding combined with event duration, and arousals are identified by finding short bursts of high-frequency EEG activity (\u0026gt;\u0026thinsp;16 Hz). To quantify frequency-based dynamics, Power Spectral Density (PSD) is calculated for every epoch for all EEG bands with Welch's method [\u003cspan class=\"CitationRef\"\u003e10\u003c/span\u003e], yielding a frequency-domain measure of brain activity. Figure\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e illustrates the PSD of the first subject.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\n\u003ch2\u003e4.2. Experimental setup\u003c/h2\u003e\n\u003cp\u003eThe experiment was executed in Python 3.12, and the model's dependencies were installed accordingly. Training was done with two RTX 2080-Ti GPUs. 
The random seed was set to 42 for both training processes to achieve reproducibility.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\n\u003ch2\u003e4.3. Model Architecture and Training Process\u003c/h2\u003e\n\u003cp\u003e\u003cstrong\u003eChoice of ClinicalBERT as the Base LLM\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eWe chose ClinicalBERT as our base language model because it was trained on biomedical corpora, namely PubMed text and MIMIC-III clinical notes, which makes it well suited to medical jargon and patient narratives. Moreover, ClinicalBERT's bidirectional Transformer architecture captures contextual relationships, which is essential when interpreting intricate sleep-stage annotations and the correlations between sleep patterns and neurological or genetic disorders.\u003c/p\u003e\n\u003cp\u003e\u0026nbsp;\u003c/p\u003e\n\u003cdiv id=\"Sec16\" class=\"Section3\"\u003e\n\u003ch2\u003e4.3.1. Data Collection and Preprocessing\u003c/h2\u003e\n\u003cp\u003eThe first step is to assemble a specialized dataset targeting sleep disorders, neuro-cognitive disorders, and genetic markers. 
SleepBert's knowledge base comprises three main sources:\u003c/p\u003e\n1. PSG Data: Structured polysomnography features (e.g., EEG, EOG, EMG signals) augmented with demographic data and sleep stage annotations.\u003c/div\u003e\n\u003c/div\u003e\n\u003cp\u003e2. Clinical Notes: Unstructured text including diagnostic reports, patient history, and descriptions of sleep architecture.\u003c/p\u003e\n\u003cp\u003e3. Medical Literature: Filtered scientific articles from PubMed with a focus on sleep-relevant genetic mutations, neurophysiological tendencies, and uncommon sleep disorders.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eHow is the External Medical Literature prepared?\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eUsing the NCBI Entrez API, we build advanced search queries that combine Medical Subject Headings (MeSH) terms and Boolean operators to narrow down articles from the past decade. This search approach focuses on major themes including EEG patterns, genetic mutations, and unusual sleep disorders (as shown in Fig.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e4\u003c/span\u003e), to provide broad coverage of applicable biomedical studies. The retrieved articles go through preprocessing involving text cleaning and Named Entity Recognition (NER) with ScispaCy, to identify vital information such as gene mentions, sleep conditions, and neurophysiological markers. Each full-text and abstract snippet is then converted by ClinicalBERT into a dense vector embedding that encodes the contextual meaning of the text. These embeddings are stored in a FAISS index, allowing for efficient similarity search. Upon receiving a query from a user, the system converts the query into an embedding and conducts a nearest-neighbor search over the indexed knowledge base to return the most relevant scientific contexts. 
This retrieved information is combined with the user query and the processed PSG data to generate a complete, evidence-based response. This external biomedical retrieval layer enriches SleepBert by coupling retrieved scientific knowledge with its learned representations to generate up-to-date and contextually rich responses for intricate sleep-related queries.\u003c/p\u003e\n\u003cp\u003eThe data is then normalized to eliminate noise (special characters, citations) and segmented into 512-token chunks to fit the transformer's input-size limit.\u003c/p\u003e\n\u003cdiv id=\"Sec18\" class=\"Section2\"\u003e\n\u003ch2\u003e4.3.2. SleepBert: ClinicalBERT Fine-Tuning\u003c/h2\u003e\n\u003cp\u003eAs mentioned before, the base model is ClinicalBERT, a transformer pre-trained on biomedical text. It is fine-tuned on both the merged PSG data and clinical notes to identify domain-specific patterns. Fine-tuning improves SleepBert's capability to recognize sleep microarchitecture and relate it to sleep disorders and to neuro-cognitive and genetic disorders.\u003c/p\u003e\n\u003cp\u003eBecause the dataset mixes structured and unstructured inputs, the fine-tuning takes a multi-task approach:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003eMasked Language Modeling (MLM): To enhance contextual comprehension of medical vocabulary and PSG-related terminology.\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eSequence Classification: To link certain sleep stages and types of disorders with their neurophysiological patterns.\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eThe result of this step is SleepBert, a domain-specific BERT variant that can encode sophisticated sleep-related queries and provide medically correct responses.\u003c/p\u003e\n\u003cdiv id=\"Sec19\" class=\"Section3\"\u003e\n\u003ch2\u003e4.3.3. 
Embedding Generation and Storage\u003c/h2\u003e\n\u003cp\u003eSleepBert converts the aggregated texts into compact embeddings, 768-dimensional vectors that represent the semantic meaning of each chunk. This is done for both the medical literature and the PSG-clinical dataset. These embeddings are kept in a vector database to facilitate quick and efficient querying by similarity. Each vector is linked to its originating source (PSG observation, clinical note, or PubMed article) to maintain context accuracy.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec20\" class=\"Section3\"\u003e\n\u003ch2\u003e4.3.4. Query Processing and Contextual Retrieval\u003c/h2\u003e\n\u003cp\u003eWhen a query is submitted (e.g., \"What are the genetic mutations linked with sleep apnea?\"), the system converts it to a query embedding with SleepBert. The query embedding is matched against the precomputed vectors in the database using cosine similarity. The top-k most similar contexts, covering PSG patterns, clinical annotations, and PubMed articles, are retrieved for augmentation.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec21\" class=\"Section3\"\u003e\n\u003ch2\u003e4.3.5. Augmentation and Context Fusion\u003c/h2\u003e\n\u003cp\u003eThe RAG system uses the user query along with the extracted contexts to generate an augmented prompt. 
This improved input augments SleepBert's understanding with PSG-specific and clinical knowledge as well as external medical literature.\u003c/p\u003e\n\u003cp\u003eFor instance, the enhanced input may include:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003eUser query: \u003cem\u003e\"What genetic mutations are linked to sleep apnea?\"\u003c/em\u003e\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003eRetrieved context:\u003c/p\u003e\n\u003cul\u003e\n\u003cli\u003e\n\u003cp\u003e\"PER3 variants increase susceptibility to sleep apnea through circadian modulation.\"\u003c/p\u003e\n\u003c/li\u003e\n\u003cli\u003e\n\u003cp\u003e\"HLA-DQB1*06:02 allele is more prevalent in patients with severe obstructive sleep apnea.\"\u003c/p\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec22\" class=\"Section3\"\u003e\n\u003ch2\u003e4.3.6. Response Generation with SleepBert\u003c/h2\u003e\n\u003cp\u003eSleepBert processes the \u003cstrong\u003eaugmented input\u003c/strong\u003e to generate a comprehensive, context-aware response. The model synthesizes knowledge from PSG data and external literature, ensuring that the output reflects \u003cstrong\u003ecurrent clinical understanding\u003c/strong\u003e.\u003c/p\u003e\n\u003cp\u003eFor the genetic mutation query, SleepBert might generate:\u003c/p\u003e\n\u003cp\u003e\u003cem\u003e\"Genetic mutations linked to sleep apnea include\u003c/em\u003e \u003cstrong\u003ePER3 polymorphisms\u003c/strong\u003e\u003cem\u003e, which affect circadian rhythms and sleep architecture.\u003c/em\u003e \u003cstrong\u003eHLA-DQB1\u003c/strong\u003e \u003cem\u003evariants, particularly\u003c/em\u003e \u003cstrong\u003eHLA-DQB1*06:02\u003c/strong\u003e\u003cem\u003e, are associated with increased severity of obstructive sleep apnea. 
Furthermore, mutations in\u003c/em\u003e \u003cstrong\u003eCLOCK\u003c/strong\u003e \u003cem\u003eand\u003c/em\u003e \u003cstrong\u003eBMAL1\u003c/strong\u003e \u003cem\u003einfluence sleep fragmentation and circadian disruption.\"\u003c/em\u003e\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec23\" class=\"Section3\"\u003e\n\u003ch2\u003e4.3.7. Output Layer and Source Attribution\u003c/h2\u003e\n\u003cp\u003eThe final response is presented with \u003cstrong\u003esource citations\u003c/strong\u003e to maintain transparency and facilitate further inquiry. SleepBert automatically maps each retrieved context to its originating data source.\u003c/p\u003e\n\u003cp\u003eSome examples of how SleepBert responds to user queries are shown in Figs.\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e5\u003c/span\u003e, \u003cspan class=\"InternalRef\"\u003e6\u003c/span\u003e, and \u003cspan class=\"InternalRef\"\u003e7\u003c/span\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec24\" class=\"Section2\"\u003e\n\u003ch2\u003e4.4. Evaluation Metrics\u003c/h2\u003e\n\u003cp\u003eWe compare SleepBert with the baseline models, ClinicalBERT and BERT, on our specific use case of examining PSG features and answering clinical questions. We employ several metrics to determine the accuracy and relevance of the models, including the BLEU (Bilingual Evaluation Understudy) score [\u003cspan class=\"CitationRef\"\u003e12\u003c/span\u003e], which measures the similarity between the generated text and reference outputs and reflects the quality of text generation. BLEU estimates this similarity through n-gram matching. It is a precision-based measure of n-gram overlap and does not compute recall directly. Despite criticism, BLEU remains in common use for evaluating textual fluency [\u003cspan class=\"CitationRef\"\u003e11\u003c/span\u003e]. 
In essence, BLEU computes the geometric mean of the n-gram precisions of the generated text with respect to the reference. A brevity penalty term is also applied, which penalizes generated text that is shorter than the reference. In addition, we track the average input tokens (query tokens) and average completion tokens (tokens in the response), both derived from response.usage, to evaluate the effectiveness of responses from the model. The reported results were computed by comparing the generated responses from each model against a manually annotated clinical query dataset and its reference answers. The output of each model was assessed for correctness depending on whether it accurately identified the relevant PSG abnormalities, clinical conditions, and PubMed references as confirmed by domain experts, as depicted in Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e3\u003c/span\u003e. The accuracy measure represents the proportion of correct responses over all queries evaluated. For response latency and other time metrics, we utilized response.usage offered by the Hugging Face API to monitor the processing time of every model [\u003cspan class=\"CitationRef\"\u003e13\u003c/span\u003e]. 
Response time was calculated from the instant that the query was posted to the model to the instant the response was completely generated.\u003c/p\u003e\n\u003cdiv class=\"gridtable\"\u003e\n\u003cdiv class=\"colspec\" align=\"left\"\u003e\u0026nbsp;\u003c/div\u003e\n\u003ctable id=\"Tab3\" border=\"1\"\u003e\u003ccaption\u003e\n\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\n\u003cdiv class=\"CaptionContent\"\u003e\n\u003cp\u003eComparison of SleepBert performance with BERT and ClinicalBert\u003c/p\u003e\n\u003c/div\u003e\n\u003c/caption\u003e\n\u003cthead\u003e\n\u003ctr\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eMetric\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eSleepBert\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eClinicalBERT\u003c/p\u003e\n\u003c/th\u003e\n\u003cth align=\"left\"\u003e\n\u003cp\u003eBERT\u003c/p\u003e\n\u003c/th\u003e\n\u003c/tr\u003e\n\u003c/thead\u003e\n\u003ctbody\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eAccuracy\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e93.40%\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e87.20%\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e80.90%\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eBLEU Score\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e0.81\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e0.75\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e0.68\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eAverage Input Tokens\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e512\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd 
align=\"left\"\u003e\n\u003cp\u003e510\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e520\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eAverage Completion Tokens\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e142\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e130\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e125\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eResponse Latency\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e5.4 seconds\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e5.8 seconds\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e6.1 seconds\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003ePubMed Retrieval Accuracy\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e90.1% (relevant references)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e88.3% (relevant references)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e83.5% (relevant references)\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003ctr\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003eCompletion Efficiency\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e94.3% (short, precise)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e92.7% (concise)\u003c/p\u003e\n\u003c/td\u003e\n\u003ctd align=\"left\"\u003e\n\u003cp\u003e89.4% (verbose)\u003c/p\u003e\n\u003c/td\u003e\n\u003c/tr\u003e\n\u003c/tbody\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n\u003c/div\u003e"},{"header":"5. 
Conclusion and Discussion","content":"\u003cp\u003eHere, we present SleepBert, a domain-specific RAG-LLM (Retrieval-Augmented Generation with an LLM) framework designed for end-to-end sleep disorder analysis. SleepBert combines PSG data, clinical text, and domain-specific medical literature (e.g., PubMed) to offer precise, evidence-based findings on sleep microarchitecture, neuro-cognitive disorders, and genetic disorders. By fine-tuning ClinicalBERT on multimodal sleep data and adding a PSG-specific knowledge retrieval layer, we improved performance over both baselines. SleepBert obtained 93.4% accuracy, surpassing both ClinicalBERT (87.2%) and BERT (80.9%), with 90.1% PubMed retrieval accuracy, supporting accurate and timely evidence retrieval. The effect of SleepBert is multi-faceted. It serves as an Encyclopaedia of Sleep Disorders, equipping clinicians, researchers, and physicians with a single, centralized source for quick, trusted decision support. However, our method has some limitations. 
The performance of the model may vary for unseen rare conditions, and retrieval precision depends on the availability and quality of external literature.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eInstitution where the work was performed:\u003cbr\u003e\u003c/strong\u003eThis work was conducted at the Department of Data Science and Statistics, CHRIST (Deemed to be University), Bengaluru.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor Approval Statement:\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll authors have seen and approved the final version of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclarations for Each Author:\u003c/strong\u003e\u003c/p\u003e\n\u003cul\u003e\n \u003cli\u003eFinancial Support: No funding was received for the conduct of this study.\u003c/li\u003e\n \u003cli\u003eConflict of Interest: The authors declare that there are no conflicts of interest.\u003c/li\u003e\n \u003cli\u003eClinical Trial Declaration: This manuscript does not report on a clinical trial.\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgments\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe NCH Sleep DataBank was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award Number R01EB025018. The National Sleep Research Resource was supported by the U.S. National Institutes of Health, National Heart Lung and Blood Institute (R24 HL114473, 75N92019R002).\u003c/p\u003e\n\u003cp\u003eThe ISRUC-Sleep dataset contains data collected from all-night PSG recordings lasting around eight hours. 
Each recording was randomly selected from PSG recordings acquired by the Sleep Medicine Centre of the Hospital of Coimbra University (CHUC) in the period 2009\u0026ndash;2013.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthical Considerations\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAs more advanced models and performance enhancement methods emerge, there remains room for further improvement. SleepBert is sufficiently capable to serve as an assistant to mental health professionals. Importantly, SleepBert does not engage in scenarios requiring serious diagnoses, thereby maintaining ethical boundaries and ensuring that it complements rather than replaces professional judgment.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eDeclaration of Conflicting Interests\u0026nbsp;\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eIn relation to the research, writing, and/or publication of this work, the author(s) declared no conflicts of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe author(s) received no financial support for the research, writing, and/or publication of this work.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eGhali JPE, Shima K, Moriyama K, Mutoh A, Inuzuka N (2024) Enhancing Retrieval Processes for Language Generation with Augmented Queries to Provide Factual Information on Schizophrenia. Procedia Comput Sci 246:443\u0026ndash;452\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHu HW, Lin YC, Chia CH, Chuang E, Ru YC (2024, July) Leveraging Large Language Models for Generating Personalized Care Recommendations in Dementia. In \u003cem\u003e2024 IEEE International Workshop on Electromagnetics: Applications and Student Innovation Competition (iWEM)\u003c/em\u003e (pp. 1\u0026ndash;4).
IEEE\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eYu Q, Jin M, Shu D, Zhang C, Fan L, Hua W, Zhang Y (2024) Health-LLM: Personalized Retrieval-Augmented Disease Prediction System. \u003cem\u003earXiv preprint arXiv:2402.00746\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eGuo Q, Tang J, Sun W, Tang H, Shang Y, Wang W (2024) SouLLMate: An Application Enhancing Diverse Mental Health Support with Adaptive LLMs, Prompt Engineering, and RAG Techniques. \u003cem\u003earXiv preprint arXiv:2410.16322\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao Y (2025) Automatic Sleep Disorder Classification Using Large Language Model Prompting on Sleep Health and Lifestyle Data\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eZhao Y (2025) Automatic Sleep Disorder Classification Using Large Language Model Prompting on Sleep Health and Lifestyle Data\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKhaokaew Y, Ji K, Nguyen TH, Kegalle H, Alaofi M, Xue H, Salim FD (2023) ZzzGPT: an interactive GPT approach to enhance sleep quality. \u003cem\u003earXiv preprint arXiv:2310.16242\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSano A, Amores J, Czerwinski M (2024) Exploration of LLMs, EEG, and behavioral data to measure and support attention and sleep. \u003cem\u003earXiv preprint arXiv:2408.07822\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eSano A, Amores J, Czerwinski M (2024) Exploration of LLMs, EEG, and behavioral data to measure and support attention and sleep. arXiv preprint arXiv:2408.07822\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim J, Lee SY, Kim JH, Shin DH, Oh EH, Kim JA, Cho JW (2024) ChatGPT vs. sleep disorder specialist responses to common sleep queries: Ratings by experts and laypeople.
Sleep Health 10(6):665\u0026ndash;670\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eOhayon MM, Roberts RE (2001) Comparability of sleep disorders diagnoses using DSM-IV and ICSD classifications with adolescents. Sleep 24(8):920\u0026ndash;925\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eParhi KK, Ayinala M (2013) Low-complexity Welch power spectral density computation. IEEE Trans Circuits Syst I Regul Pap 61(1):172\u0026ndash;182\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eMahapatra J, Garain U (2024) Impact of model size on fine-tuned llm performance in data-to-text generation: A state-of-the-art investigation. \u003cem\u003earXiv preprint arXiv:2407.14088\u003c/em\u003e\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKhalighi S, Sousa T, Santos JM, Nunes U (2016) ISRUC-Sleep: A comprehensive public dataset for sleep researchers. Comput Methods Programs Biomed 124:180\u0026ndash;192\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eKim J, Lee SY, Kim JH, Shin DH, Oh EH, Kim JA, Cho JW (2024) ChatGPT vs. sleep disorder specialist responses to common sleep queries: Ratings by experts and laypeople. Sleep Health 10(6):665\u0026ndash;670\u003c/span\u003e\u003c/li\u003e \u003cli\u003e\u003cspan\u003eHari R, Salmelin R (1997) Human cortical oscillations: a neuromagnetic view through the skull. Trends Neurosci 20(1):44\u0026ndash;49.
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003e10.1016/S0166-2236(96)10065-5\u003c/span\u003e\u003cspan address=\"10.1016/S0166-2236(96)10065-5\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":true,"hideJournal":true,"highlight":"","institution":"Christ University","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true},"keywords":"Sleep Study, RAG, NCH, LLMs, Bert, Polysomnography","lastPublishedDoi":"10.21203/rs.3.rs-6605863/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-6605863/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003cp\u003eDiagnosis of sleep disorders is difficult owing to the nature of sleep microarchitecture and the heterogeneity of symptom presentation. Conventional analysis of Polysomnography (PSG)\u0026mdash;the interpretation of EEG bandpower, sleep spindles, and K-complexes\u0026mdash;is time-consuming, laborious, and subjective, restricting detection of infrequent co-occurrences of disorders and their link to neuro-cognitive and genetic disorders. 
To overcome these challenges, we present \u003cb\u003eSleepBert\u003c/b\u003e, a hybrid Retrieval-Augmented Generation (RAG) model that combines structured PSG features with unstructured clinical narratives for holistic sleep disorder analysis. Constructed by fine-tuning ClinicalBERT on PSG data from the NCH (paediatric dataset) and ISRUC datasets, SleepBert has a PSG-specific knowledge retrieval layer to retrieve real-time evidence from medical databases such as PubMed. The model delivered 93.40% accuracy, outdoing ClinicalBERT (87.20%) and BERT (80.90%), with 90.1% accuracy in retrieving PubMed and response latency of 5.4 seconds. This system serves as an Encyclopaedia of sleep disorders, delivering evidence-based, correct insights and support for decision making to clinicians and researchers. The system supports the analysis of a large number of PSGs, speeds up data-driven discoveries, and allows access to rare neuro-cognitive and genetic markers. SleepBert is an extensible platform for pushing the frontier of sleep disorder research and enhancing clinical decision-making through quick, accurate interpretations of sophisticated PSG data.\u003c/p\u003e","manuscriptTitle":"SleepBert: An Intelligent Clinical Encyclopaedia for Sleep Disorders Using Large Language Models","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-05-08 02:47:36","doi":"10.21203/rs.3.rs-6605863/v1","editorialEvents":[{"type":"communityComments","content":0}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"researchsquare","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":true,"externalIdentity":"","sideBox":"","snPcode":"","submissionUrl":"/submission","title":"Research 
Square","twitterHandle":"researchsquare","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"","reportingPortfolio":"","inReviewEnabled":false,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"3e6cfe2d-fa3d-43df-b982-d030e304675e","owner":[],"postedDate":"May 8th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"posted","subjectAreas":[{"id":48174750,"name":"Computational Neuroscience"},{"id":48174751,"name":"Medical Informatics"},{"id":48174752,"name":"Artificial Intelligence and Machine Learning"}],"tags":[],"updatedAt":"2025-05-08T02:47:36+00:00","versionOfRecord":[],"versionCreatedAt":"2025-05-08 02:47:36","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-6605863","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-6605863","identity":"rs-6605863","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
