Exploring the Generalizability and Explainability of LLMs in Detecting Suicidal Ideation: The Impact of Data Heterogeneity

Research Square

Rong Huang, Longdi Xian, Christopher Chi Wai Cheng, Jie Chen, and 8 more

This is a preprint; it has not been peer reviewed by a journal.
https://doi.org/10.21203/rs.3.rs-7657467/v1
This work is licensed under a CC BY 4.0 License.
Status: Under Review
Version 1

Abstract

Objectives
With the recent advancement of artificial intelligence (AI) and large language models (LLMs), text analysis is a promising tool for detecting suicidal ideation. However, the performance of such a detection system could be influenced by differences in language use arising from individuals’ alexithymic characteristics (difficulty expressing emotion, accompanied by a distinctive language pattern), resulting in subgroup disparity.
The current study aims to explore the capability of a detection system on a clinical sample with heterogeneous language use (i.e., systematic differences in language use as influenced by patient characteristics and the language context).

Methods
AI models (classifiers) were trained with 5-fold cross-validation using clinical transcripts of 299 individuals (n = 193 with major depressive disorder and 106 controls without psychiatric problems) to detect suicidal ideation. More specifically, the topic-general classifier was trained using full clinical transcripts, while the topic-specific classifiers (i.e., factorization models) were trained using specific sections of the clinical transcripts, focusing on either mood-related or suicide-specific topics. The performance of the classifiers was assessed in both groups (alexithymia and non-alexithymia) and in the whole sample. Mediation analyses were conducted to further investigate the role of language features in explaining the subgroup disparity.

Results
The topic-general classifier showed subgroup disparity between the alexithymia and non-alexithymia groups: the alexithymia group was associated with a decreased likelihood of true detection of suicidal ideation (OR = 0.31, p < .001), and unique language features, such as family-related words (p = .02), played a mediating/explanatory role. Furthermore, the topic-specific classifiers demonstrated superior performance (AUC = 0.96) compared to the topic-general classifier (AUC = 0.83), and the subgroup disparity was largely reduced.

Conclusion
Models trained on a heterogeneous clinical population may not be equitably effective in detecting suicidal ideation in patient groups with and without alexithymia. The development of a factorization model is pertinent to enhancing generalizability and equity, especially when patient characteristics are inaccessible or confidential for model training.
Meanwhile, clinicians should interpret model predictions with caution because patient characteristics might influence model performance.

Health sciences/Health care
Biological sciences/Psychology
Social science/Psychology

Introduction

Suicide remains one of the leading causes of death worldwide, accounting for 1 out of every 100 deaths (WHO, 2021). The alarming trend of its increase over the past few decades underscores the urgent need for early detection. Language expression, as a reflection of internal mental states, has shown promise in providing valuable insights into the detection of suicidal thoughts, plans, and behaviors (Homan et al., 2022). Studies using different data sources (ranging from social media to healthcare records) and outcome measures (ranging from self-report to clinician-rated) have found that increased suicide risk was associated with an increased use of words (Cheng et al., 2017; Kim et al., 2019), pronouns (Coppersmith et al., 2016; Zhang et al., 2021), and negative words (Lekkas et al., 2021; Tadesse et al., 2020; Vioulès et al., 2018), but a decreased use of social-related words (Nguyen et al., 2017; Zhang et al., 2021), with medium to large effect sizes (Kim et al., 2019; Zhang et al., 2021).

Recent advancements in artificial intelligence (AI) and large language models (LLMs) have enabled text analysis to better detect and predict suicidal ideation (Ji et al., 2021; Li et al., 2023). These techniques might provide clinicians with valuable insights into early detection, prevention, and remedial measures. However, when text is used as learning material, two sources of data heterogeneity might be introduced (as shown in Table 1), influencing model performance.

First, inter-individual data heterogeneity results from differences in language use between individuals.
For example, individuals with alexithymia, who have difficulties in recognizing, expressing, and describing their emotions, demonstrated a rather unique language pattern (Lam et al., under review; Welding & Samur, 2018). Research indicated that individuals with alexithymia tended to use fewer emotional words when recalling personal experiences, to avoid reliving the associated emotions (Camia et al., 2020). In addition, alexithymia was linked to less use of social-related words, reflecting poor interpersonal relationships resulting from deficits in emotional processing (Meganck et al., 2008). Other language features, such as word count and cognitive processing words, have also been linked to alexithymia characteristics (Welding & Samur, 2018). Such differences in language use during the clinical interview might confuse the AI model when it identifies suicide-related language features, resulting in performance differences across patient groups, which is referred to as subgroup disparity in AI models (Libin et al., 2024). As indicated by a recent large-scale study on AI fairness in healthcare, subgroup AUC differences could be as large as 0.41 (Libin et al., 2024).

Second, intra-individual data heterogeneity is introduced by differences in language use across contexts/topics within individuals. Text extracted from social media could vary significantly across topics, ranging from health to politics. Similarly, the clinical interview contained conversations focusing on different topics, such as mood symptoms and suicidal ideation. Just as language use varied across topics, so did the language indicators of suicidality (Li et al., 2023; Pennebaker & King, 1999). For example, the use of discrepancy words was a predictive indicator of suicide risk during the discussion of suicide-related topics, while its predictive power diminished when the conversation shifted to daily activities (Li et al., 2023).
Therefore, mixing the text content of different topics might weaken the significance of some language indicators, leading to poorer model performance. Indeed, a topic-specific model trained by Zhou et al. (2023) outperformed the topic-general model in a sentiment classification task. In the field of computer science, this process is called factorization, which involves decomposing complex data into simpler components to make the model more efficient (Lee & Seung, 2000). Hence, it is possible that by decomposing the intra-individual data heterogeneity (e.g., chunking the textual data to be more topic-specific), we can reduce the task complexity for the AI model and consequently reduce the subgroup disparity.

While some studies have investigated the relationship between language use and alexithymia or suicidal ideation (e.g., Camia et al., 2020; De Berardis et al., 2017; Huang et al., 2024), no study has explored the interplay among language use, suicidal ideation expression, and alexithymia, particularly with respect to their influence on AI detection model performance. Given the heterogeneity of language use in clinical settings, understanding the impact of patient characteristics on AI detection capabilities, and possible solutions, may enhance the generalizability and explainability of these detection systems, especially when patient information is inaccessible or confidential, making it impossible to pre-train or tailor the model using patient-specific knowledge.

Therefore, the current study asked two research questions. First, was AI equitably effective in detecting suicidal ideation between patient groups with and without alexithymia? If not, can such subgroup disparity be explained by language features? Second, will the decomposition of intra-individual data heterogeneity (i.e., factorization) help reduce the subgroup disparity?
Based on previous literature, we hypothesized that (1) there will be subgroup disparity in the detection of suicidal ideation between patient groups, which is mediated by language features; and (2) topic-specific classifiers (i.e., chunking the textual data into different topics) will improve the model performance, reducing the subgroup disparity.

Table 1. Definition and examples of terminologies used.
Data heterogeneity refers to a dataset composed of different data types, structures, formats, or sources.

| | Inter-individual Data Heterogeneity | Intra-individual Data Heterogeneity |
| --- | --- | --- |
| Definition | Between individuals, their data might vary because of factors such as age, gender, education, ethnicity, and personality. | Within individuals, their data might vary across factors such as time, contexts, and locations. |
| Example | The use of language varies depending on whether the individual is introverted or extroverted. | The use of language varies depending on whether it is at a party or in a meeting. |
| This study | In clinical interviews, individuals with varying patient characteristics (non-alexithymia and alexithymia) might show differences in language use due to their differing abilities in emotional processing. | During the clinical interview, the same individual might exhibit differences in language use depending on the conversation topics, such as mood-general or suicide-specific. |

Method

Participants
This cross-sectional study was conducted as part of an ongoing digital phenotyping research project that aimed to automate the detection of depressive features. The study used a case-control design (Chen et al., 2024, under review). For the case group, individuals diagnosed with major depressive disorder were recruited from outpatient clinics in a local university-affiliated hospital. The control group consisted of individuals recruited from the community.
The Structured Clinical Interview for DSM-5, Clinician Version (SCID-5-CV) was administered by a trained medical researcher to assess whether members of the control group had any DSM-5 diagnosis (Chen et al., 2024). The inclusion criteria for participants were native Cantonese-speaking Chinese adults aged 18 or older. Exclusion criteria included voice, speech, and language problems, a history of psychiatric disorders other than major depressive disorder, and an inability to provide written informed consent. Data were collected between 2020 and 2022 from 299 participants, including 194 cases and 105 controls (Mean age = 53.15 ± 11.68, female n = 171, 57%). Participants were compensated with a cash coupon for their participation. Ethical approval was obtained from the Joint Chinese University of Hong Kong-New Territories East Cluster Clinical Research Ethics Committee (Ref No: 2020.492). Participants could withdraw from the interview at any time. Clinical trial number: not applicable.

Measurements

HDRS
All participants underwent a semi-structured interview with the 17-item Hamilton Depression Rating Scale (HDRS) (Chan et al., 2022), conducted by a trained interviewer/psychiatrist (JC). Item H11 of the HDRS was used to assess suicide risk: “Since last week, have you had any thoughts that life is not worth living?” Suicide risk was rated in five progressive levels: (1) having no suicidal thoughts; (2) feeling life is not worth living; (3) having wishes to be dead or any thoughts of the possible death of self; (4) having suicidal ideation or gestures; and (5) having suicide attempts. The ratings were further validated by TMHL (with a kappa of 0.92). An H11 rating of (2) or above was used as the cut-off point to determine the presence of significant suicidal ideation (i.e., the gold standard).
The overall HDRS score was used to determine current depression, with a score of 8 or above as the cut-off point for the presence of current depression (Chan et al., 2022; Chen et al., 2024). The interview lasted approximately 15–30 minutes.

TAS
The Toronto Alexithymia Scale (TAS-20) was used to assess alexithymia. The TAS-20 is a 20-item self-report questionnaire measuring three dimensions: difficulty identifying feelings (DIF), difficulty describing feelings (DDF), and externally oriented thinking (EOT) (Zhu et al., 2007). Participants rate each item on a 5-point scale, ranging from strongly disagree (1) to strongly agree (5). The international cut-off values are as follows: 20–51 for non-alexithymia, 52–60 for possible alexithymia, and 61–100 for alexithymia (Camia et al., 2020; Welding & Samur, 2018). In the current study, the Cronbach's alpha for this scale was 0.90, and possible alexithymia and alexithymia were grouped together as the alexithymia group.

LIWC
Linguistic Inquiry and Word Count (LIWC), a text analysis application developed to analyse the emotional, cognitive, and structural components of a text (Boyd et al., 2022), was applied to extract and count language features of the 299 clinical transcripts. Examples of LIWC language features are “negative emotion words”, “tentative words”, and “family-related words” (see supplementary table 1 for details). The current study used the internal 2015 traditional Chinese version of the LIWC dictionary. Because the transcripts are in Chinese, where words are not separated by spaces as in English, word segmentation needed to be conducted before applying LIWC. Jieba, a Chinese word segmentation module, was used (Fu et al., 2024).

Large language model – BERT
The study used the Bidirectional Encoder Representations from Transformers (BERT) model (Malgaroli et al., 2023) for binary classification to detect the presence of suicidal ideation using Chinese (Cantonese) clinical transcripts of 299 participants.
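The cut-offs described in the Measurements section determine the binary labels and patient groups used in the rest of the study, and they can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the `Participant` record, the function names, and the dictionary keys are hypothetical, and only the thresholds (H11 rating of 2 or above, HDRS total of 8 or above, TAS-20 bands of 20–51, 52–60, and 61–100) come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the Measurements section.
H11_CUTOFF = 2    # H11 rating of (2) or above: significant suicidal ideation
HDRS_CUTOFF = 8   # HDRS total of 8 or above: current depression


@dataclass
class Participant:  # hypothetical record layout, for illustration only
    h11_rating: int           # 1 (no suicidal thoughts) to 5 (suicide attempts)
    hdrs_total: int           # total score on the 17-item HDRS
    tas_total: Optional[int]  # TAS-20 total, or None if not completed


def tas_band(tas_total: int) -> str:
    """Map a TAS-20 total to the international cut-off bands."""
    if not 20 <= tas_total <= 100:
        raise ValueError("TAS-20 totals range from 20 to 100")
    if tas_total <= 51:
        return "non-alexithymia"
    if tas_total <= 60:
        return "possible alexithymia"
    return "alexithymia"


def derive_labels(p: Participant) -> dict:
    """Return the binary labels; the alexithymia group merges the two upper bands."""
    if p.tas_total is None:
        group = None  # questionnaire not completed, so no group assignment
    else:
        group = int(tas_band(p.tas_total) != "non-alexithymia")
    return {
        "suicidal_ideation": int(p.h11_rating >= H11_CUTOFF),
        "current_depression": int(p.hdrs_total >= HDRS_CUTOFF),
        "alexithymia_group": group,
    }
```

For example, `derive_labels(Participant(h11_rating=2, hdrs_total=7, tas_total=55))` marks the participant as having suicidal ideation, no current depression, and membership in the alexithymia group.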
To ensure the replicability of results, a random seed was set. TensorFlow 2.15.0, Transformers 4.37.2, and NVIDIA CUDA 12.4 were used to train the BERT models. The BERT tokenizer was employed to tokenize the textual data, which was then converted into TensorFlow datasets. The model was optimized using the Adam optimization algorithm with a learning rate of 1e-5. Model training was performed over 30 epochs with a batch size of 8 to strike a balance between efficiency and convergence. A 10-fold cross-validation was used to ensure a more robust result. To ensure comparable representation, the folds were stratified based on the labels of suicidal ideation (i.e., with and without suicidal ideation), aiming to have a similar number of cases and controls in each fold. Ultimately, the classifiers predicted the suicidal ideation label for each of the 299 participants, and their performance was assessed in both groups (alexithymia and non-alexithymia) and in the whole sample.

Three BERT-based classifiers were trained separately, with different levels of intra-individual data heterogeneity. The levels were achieved by narrowing down the conversation topic of the clinical transcripts. At the high heterogeneity level, topic-general classifier 1 was trained on the full transcripts, containing responses to all HDRS items. At the moderate heterogeneity level, topic-specific classifier 2 was trained on the non-H11 transcripts, containing responses to all HDRS items except for H11 (i.e., mood-related topic). At the lowest heterogeneity level, topic-specific classifier 3 was trained on H11-only transcripts, containing responses to H11 only (i.e., suicide-specific topic). Figure 1 demonstrates the flow of the current study.

Statistical Analysis
All analyses were conducted using R (version 4.3.1; R Foundation for Statistical Computing). A p-value < .05 was considered statistically significant.
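The three transcript variants and the label-stratified folds described in the BERT section can be sketched as follows. This is a simplified standard-library illustration, not the authors' TensorFlow pipeline: it assumes each participant's transcript is stored as a mapping from HDRS item identifiers (such as "H11") to response text, and the function names, the round-robin fold assignment, and the seed value are all hypothetical. The number of folds is left as a parameter.

```python
import random
from collections import defaultdict


def build_variants(responses):
    """Build the inputs for the three classifiers from per-item responses.

    `responses` maps an HDRS item id ("H1" ... "H17") to the transcript text
    for that item (an assumed storage layout).
    """
    items = sorted(responses, key=lambda item: int(item[1:]))
    return {
        # Classifier 1: full transcript, all HDRS items (high heterogeneity)
        "topic_general": " ".join(responses[i] for i in items),
        # Classifier 2: every item except H11 (mood-related topics)
        "mood_related": " ".join(responses[i] for i in items if i != "H11"),
        # Classifier 3: H11 only (suicide-specific topic)
        "suicide_specific": responses.get("H11", ""),
    }


def stratified_folds(labels, n_splits, seed=0):
    """Assign each participant to a fold, balancing labels across folds.

    Participants are shuffled within each label and dealt round-robin, so the
    per-fold count of each label differs by at most one.
    """
    rng = random.Random(seed)  # fixed seed for replicable splits
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)
    fold_of = [None] * len(labels)
    for indices in by_label.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            fold_of[index] = position % n_splits
    return fold_of


if __name__ == "__main__":
    labels = [1] * 68 + [0] * 231  # 68 of 299 rated with suicidal ideation
    folds = stratified_folds(labels, n_splits=5, seed=42)
    for k in range(5):
        positives = sum(1 for f, y in zip(folds, labels) if f == k and y == 1)
        print(f"fold {k}: {folds.count(k)} participants, {positives} positive")
```

Each fold then serves once as the held-out set while the classifier is trained on the remaining folds, so every participant receives exactly one out-of-fold prediction.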
Descriptive statistics for categorical variables are presented as numbers and percentages. Receiver operating characteristic curve analysis was used to analyse the accuracy of the classification results. For each classifier, the performance was assessed by area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Classifiers were compared on AUC using “roc.test” from the pROC R package.

Mediation analyses were conducted to probe further into the subgroup disparity. Five performance metrics, namely True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN), and overall false detection (FP + FN), were computed for each individual. Each metric has two levels (0 and 1), with 1 indicating its presence. For example, if an individual with clinician-rated suicide risk (i.e., positive case) was classified as positive by the classifier, the TP of this individual will be 1. After computing the performance metrics, mediation analysis was conducted using the lavaan package in R to test the mediating role of different LIWC language features in the relationship between the patient group and individual performance metrics. As shown in Fig. 2, the patient group was the independent variable (IV) with two levels (0 and 1), with 1 indicating the presence of alexithymia, while the performance metric was a binary dependent variable (DV). Path a was the effect of the IV (i.e., patient group) on the mediator (i.e., language feature). Path b was the effect of the mediator on the DV (i.e., performance metric). Path c’ was the effect of the IV on the DV while controlling for the mediator, also referred to as the direct effect of the IV on the DV. The product of path a and path b was considered the indirect effect of the IV on the DV (i.e., ab). Path c represented the total effect of the IV on the DV, which was the combination of the direct and indirect effects (i.e., ab + c’).
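The per-individual performance indicators and the group-level metrics described above can be sketched as follows. This is an illustrative Python sketch rather than the authors' R code: the function names are hypothetical, and labels are assumed to be coded 1 for suicidal ideation present and 0 for absent.

```python
from typing import Dict, Iterable, Optional, Tuple


def individual_indicators(y_true: int, y_pred: int) -> Dict[str, int]:
    """Return the five 0/1 performance metrics for one participant."""
    tp = int(y_true == 1 and y_pred == 1)
    tn = int(y_true == 0 and y_pred == 0)
    fp = int(y_true == 0 and y_pred == 1)
    fn = int(y_true == 1 and y_pred == 0)
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn, "false_detection": fp + fn}


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Divide, returning None when the denominator is zero (metric undefined)."""
    return numerator / denominator if denominator else None


def group_summary(pairs: Iterable[Tuple[int, int]]) -> Dict[str, Optional[float]]:
    """Aggregate (y_true, y_pred) pairs into sensitivity, specificity, PPV, NPV."""
    totals = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for y_true, y_pred in pairs:
        indicators = individual_indicators(y_true, y_pred)
        for key in totals:
            totals[key] += indicators[key]
    return {
        "sensitivity": _ratio(totals["TP"], totals["TP"] + totals["FN"]),
        "specificity": _ratio(totals["TN"], totals["TN"] + totals["FP"]),
        "PPV": _ratio(totals["TP"], totals["TP"] + totals["FP"]),
        "NPV": _ratio(totals["TN"], totals["TN"] + totals["FN"]),
    }
```

In the mediation models, each 0/1 indicator returned by `individual_indicators` plays the role of the binary DV, so a group effect on TN, for instance, reflects a difference in how often true non-cases are correctly identified.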
All paths were adjusted for sex, age, and current depression. Bonferroni correction was applied for multiple comparisons. The p-value of the indirect effect was derived using 5000 bootstrap samples. Subgroup disparity was identified if a significant association was found between the patient group and any one of these five performance metrics. In other words, the presence of alexithymia influenced one’s likelihood of being correctly detected. For model fit, both the comparative fit index (CFI) and the Tucker–Lewis index (TLI) should be 0.95 or greater; the root mean squared error of approximation (RMSEA) should be 0.06 or lower, and the standardized root mean squared residual (SRMR) should be 0.08 or lower (Hu & Bentler, 1999).

Results

Table 2 shows the clinical data of the current sample. In total, 22.7% of participants were rated by the clinician as having significant suicidal ideation (68 out of 299) and 35.5% had current depression (106 out of 299). There was a significant effect of alexithymia on the rates of suicidal ideation and depression (both p < .001). The rates of suicidal ideation and depression were higher in the alexithymia group (36.8% and 50.9% respectively) than in the non-alexithymia group (4.6% and 15.7% respectively). No significant age or sex differences were found between the alexithymia and non-alexithymia groups (p = .17 and p = .98 respectively).

Table 2. Characteristics of participants.

| Group | Suicidal, n (%) | Depression, n (%) | Male, n (%) | Age, mean (SD) |
| --- | --- | --- | --- | --- |
| Non-Alexithymia (n = 108) | 5 (4.6) | 17 (15.7) | 44 (34.4) | 50.4 (12.1) |
| Alexithymia (n = 106) | 39 (36.8) | 54 (50.9) | 43 (40.6) | 52.56 (10.3) |
| Whole Sample (n = 299) | 68 (22.7) | 106 (35.5) | 128 (42.8) | 53.2 (11.7) |

Note. 85 individuals did not complete the TAS questionnaire.

Supplementary table 2 shows the language pattern of these two patient groups across topics, exhibiting both between- and within-individual differences in language use (i.e., inter- and intra-individual data heterogeneity).
Hypothesis 1

As shown in Table 3, subgroup disparity was observed in detecting suicidal ideation, with as much as a 0.11 difference in AUCs. The topic-general classifier 1 (i.e., using transcripts of all responses) performed sub-optimally in the alexithymia group.

Table 3. Performance of topic-general classifier in detecting suicidal ideation across patient groups.

| Classifier | Group | Sensitivity | Specificity | PPV | NPV | AUC |
| --- | --- | --- | --- | --- | --- | --- |
| Classifier 1 (topic-general) | Non-Alexithymia | 0.40 | 1.00 | 1.00 | 0.97 | 0.88*** |
| | Alexithymia | 0.56 | 0.85 | 0.69 | 0.77 | 0.77*** |
| | Whole Sample | 0.51 | 0.90 | 0.61 | 0.86 | 0.83*** |

Note. For AUC only: p < .05*, p < .01**, p < .001***. Optimal threshold was used to achieve balanced sensitivity, specificity, PPV and NPV.

Mediation analyses were conducted to test the mediating role of language features. Table 4 displays the mediation analysis results for mediators with significant indirect effects, excluding cases where the non-significant total effect was not due to opposing signs of the direct and indirect effects (i.e., total effect was cancelled out). All models have a good fit. Detailed explanations and examples of these mediators (i.e., LIWC language features/categories) can be found in supplementary table 1.

For the topic-general classifier 1, an overall negative relationship between the patient group and TN (βc = -1.18, p < .001) was observed, partially mediated by “family-related words” (βab = -0.41, p = .024; βc’ = -0.77, p = .023). On the other hand, an overall positive relationship between the patient group and FP (βc = 5.37, p < .001) was observed, partially mediated by “word count” (βab = 0.16, p < .001; βc’ = 5.22, p < .001), religion-related words (βab = -0.41, p = .007; βc’ = 5.78, p < .001), and tentative words (βab = -0.44, p = .007; βc’ = 5.81, p < .001).

Table 4. Mediation analysis of the relationship between patient group (IV) and individual performance metrics (DVs) of topic-general classifier as mediated by individual LIWC features (mediators).

| Classifier | Mediator | DV | Indirect effect (ab) | Direct effect (c’) | Total effect (c) |
| --- | --- | --- | --- | --- | --- |
| Classifier 1 (topic-general) | family-related words | TN | -0.41* | -0.77* | -1.18*** |
| | word count | FP | 0.16*** | 5.22*** | 5.37*** |
| | religion-related words | FP | -0.41** | 5.78*** | 5.37*** |
| | tentative words | FP | -0.44** | 5.81*** | 5.37*** |

Note. TN: true negative. FP: false positive. p < .05*, p < .01**, p < .001***.

Hypothesis 2

Based on the findings of hypothesis 1, where a subgroup disparity was found, two additional topic-specific classifiers (factorization models) were trained with decomposed intra-individual heterogeneity. As indicated by Fig. 3, both mood-related classifier 2 and suicide-specific classifier 3 demonstrated significantly improved performance for the alexithymia group and the whole sample (all p < .001), reducing the subgroup disparity drastically. As shown in Table 5, differences in AUCs decreased from 0.11 for topic-general classifier 1 to 0.05 and 0.01 for mood-related classifier 2 and suicide-specific classifier 3, respectively.

Note. p < .05*, p < .01**, p < .001***, as compared to topic-general classifier using roc.test.

Table 5. Performance of topic-specific classifiers in detecting suicidal ideation in both groups and whole sample.

| Classifier | Group | Sensitivity | Specificity | PPV | NPV | AUC |
| --- | --- | --- | --- | --- | --- | --- |
| Classifier 2 (mood-related) | Non-Alexithymia | 0.80 | 1.00 | 1.00 | 0.99 | 0.98*** |
| | Alexithymia | 0.85 | 0.91 | 0.85 | 0.91 | 0.93*** |
| | Whole Sample | 0.82 | 0.97 | 0.89 | 0.95 | 0.96*** |
| Classifier 3 (suicide-specific) | Non-Alexithymia | 0.80 | 0.96 | 0.50 | 0.99 | 0.94*** |
| | Alexithymia | 0.82 | 0.96 | 0.91 | 0.90 | 0.95*** |
| | Whole Sample | 0.84 | 0.95 | 0.83 | 0.95 | 0.96*** |

Note. For AUC only: p < .05*, p < .01**, p < .001***. Optimal threshold was used to achieve balanced sensitivity, specificity, PPV and NPV.
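The arithmetic linking the reported figures can be checked with a short Python sketch. All numbers below are copied from Tables 3 to 5; the variable and function names are hypothetical. Treating the total-effect coefficient for TN as a log-odds value is an assumption made here for illustration (it is consistent with the odds ratio reported in the abstract, but the text does not state the scale explicitly).

```python
import math

# AUCs per patient group, copied from Tables 3 and 5.
AUCS = {
    "classifier 1 (topic-general)": {"non_alexithymia": 0.88, "alexithymia": 0.77},
    "classifier 2 (mood-related)": {"non_alexithymia": 0.98, "alexithymia": 0.93},
    "classifier 3 (suicide-specific)": {"non_alexithymia": 0.94, "alexithymia": 0.95},
}

# (mediator, DV, indirect ab, direct c', total c), copied from Table 4.
MEDIATION_ROWS = [
    ("family-related words", "TN", -0.41, -0.77, -1.18),
    ("word count", "FP", 0.16, 5.22, 5.37),
    ("religion-related words", "FP", -0.41, 5.78, 5.37),
    ("tentative words", "FP", -0.44, 5.81, 5.37),
]


def auc_gap(group_aucs):
    """Absolute AUC difference between the two patient groups, to 2 decimals."""
    return round(abs(group_aucs["non_alexithymia"] - group_aucs["alexithymia"]), 2)


def total_matches(ab, c_prime, c, tolerance=0.015):
    """Check c = ab + c' up to rounding of the two-decimal table entries."""
    return abs((ab + c_prime) - c) <= tolerance


if __name__ == "__main__":
    for name, group_aucs in AUCS.items():
        print(f"{name}: AUC gap = {auc_gap(group_aucs):.2f}")
    for mediator, dv, ab, c_prime, c in MEDIATION_ROWS:
        print(f"{mediator} -> {dv}: ab + c' = {ab + c_prime:.2f}, reported c = {c:.2f}")
    # Assumption: the coefficient is on the log-odds scale, so OR = exp(beta).
    print(f"exp(-1.18) = {math.exp(-1.18):.2f}")
```

Under that assumption, exponentiating the total effect of -1.18 gives an odds ratio of about 0.31, and the group AUC gaps come out at 0.11, 0.05, and 0.01 for classifiers 1, 2, and 3.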
Discussion

This study examined the effect of inter- and intra-individual data heterogeneity on the performance of a BERT model in detecting suicidal ideation and explored whether topic-specific classifiers (i.e., factorization models) could address the subgroup disparity raised by the inter-individual data heterogeneity. Our mediation analysis of the topic-general classifier 1 showed that the presence of alexithymia impacted various performance metrics, including TN and FP, which was partially explained by language features. Thus, our first hypothesis, that subgroup disparity in an AI detection model of suicidal ideation is mediated by language features, was supported. More specifically, our results indicated that the alexithymia group tended to use more family-related words and tentative words, which were associated with a decreased likelihood of true detection and an increased likelihood of false detection of suicidal ideation, respectively.

Our findings were consistent with the literature, suggesting that individuals with alexithymia, due to their unique language use and difficulty in emotional expression, might confuse the AI detection model. For example, an increased use of social-related words, such as family and friends, in non-alexithymia individuals tended to imply better social integration, while in alexithymia individuals it might imply the opposite. According to Spitzer et al. (2005), alexithymia individuals tended to encounter more interpersonal problems and have a distinct interpersonal style, due to the deficit in emotional processing. Therefore, an increased use of social-related words (e.g., family) might not necessarily reflect better social integration in alexithymia individuals (Meganck et al., 2009). Similarly, Welding & Samur (2018) indicated that alexithymia individuals did not perceive emotional words to be more salient than neutral words, meaning that neutral words could also represent emotional salience in alexithymia individuals.
To address the subgroup disparity, we employed factorization by training two topic-specific classifiers (i.e., mood-related classifier 2 and suicide-specific classifier 3). A significant improvement in AUCs was found for both topic-specific classifiers for the alexithymia group and the whole sample, reducing the subgroup disparity drastically. This result was consistent with previous findings indicating that decomposing intra-individual heterogeneity could improve model performance (Zhou et al., 2023), and it highlights the presence of topic-specific indicators for suicidal ideation (Li et al., 2023). Thus, our second hypothesis, that decomposing intra-individual heterogeneity could reduce task difficulty and consequently reduce the subgroup disparity, was supported. Furthermore, the improvement in mood-related classifier 2 (i.e., no suicide-related questions or responses were used for model training) implied that decomposing intra-individual heterogeneity could already ease the task for the model without the need to elicit sensitive suicide-related memories. One possible explanation for the non-significant improvement in the non-alexithymia group could be the high intra-individual difference in language use across conversation topics among alexithymia individuals (compared to non-alexithymia individuals), suggesting that a factorization model could yield greater benefits for the alexithymia group. Indeed, studies have demonstrated that the communication deficit of alexithymia individuals might be domain-specific rather than domain-general (Wang et al., 2022).

Overall, the significance of such subgroup disparity in topic-general classifier 1 implied the need to enhance the generalizability and equity of AI detection models for suicidal ideation. In addition, clinicians should interpret the detection results of an AI model with attention to patient characteristics and contextual information, as these could impact the accuracy of the detection results.
However, as informed by the findings of this study, reducing the task difficulty by decomposing data complexity could be a promising solution. More importantly, the proposed solution required no patient information to train the model, thereby preserving confidentiality and covering situations where patient information is inaccessible (e.g., social media data).

Limitations & Future Directions

Some limitations of the study should be noted. The current study only explored language at a lexicon level (in terms of frequency) in the mediation analysis. As all mediators found in the current study only partially mediated the effect of alexithymia on performance metrics, other language features such as syntactic features (e.g., grammatical structure, word dependency) might also play an explanatory role in the observed subgroup disparity. Even at a lexicon level, not only the frequency of the lexicon matters but also its complexity and valence. Studies increasingly revealed that while alexithymia individuals demonstrated a comparable frequency of emotional word use, their use of emotional words was less nuanced than that of non-alexithymia individuals, indicating a qualitative rather than quantitative difference in their language use (Wotschack & Klann-Delius, 2013). Similarly, the valence of the same words can differ significantly; for example, social-related words can represent both social conflicts and social support or connections. Future studies may adopt a mixed-methods approach to investigate different language features.

Conclusion

The current study revealed a subgroup disparity between the alexithymia and non-alexithymia groups in AI detection models for suicidal ideation. This subgroup disparity could be partially explained by specific language features and largely reduced by training a factorization model with decomposed intra-individual data heterogeneity (i.e., topic-specific classifiers).
These findings provide valuable insights into the development of more equitable and generalizable AI models for suicidal ideation detection.

Declarations

Competing Interests
YKW received personal fees from Eisai Co. for lectures and travel support from Lundbeck HK Limited and Aculys Pharma, Japan. JWYC received a personal fee from Eisai Co., Ltd and travel support from Lundbeck HK Limited for an overseas conference. All other authors declare no financial or non-financial competing interests.

Funding
This work was supported by the Health and Medical Research Fund (09203066 and 21220821), General Research Fund (14106223), Innovation and Technology Support Programme (ITS/178/22), CUHK Direct Grant for Research (2022.073 and 2024.061), CUHK IdeaBooster Fund (IDBF24MED15), and CUHK Improvement on Competitiveness in Hiring New Faculties Funding Scheme (371). CL was supported by the Faculty Postdoctoral Fellowship Scheme of the Chinese University of Hong Kong (FPFS/23–24/024).

Author Contribution
Conceptualization, Y.K.W. and T.M.H.L.; Data curation, J.C. and K.Y.C.; Formal analysis, R.H.; Funding acquisition, Y.K.W. and T.M.H.L.; Writing – original draft, R.H.; Writing – review and editing, R.H., L.X., C.C.W.C., J.C., K.Y.C., C.L., J.W.Y.C., S.W.H.C., N.Y.C., B.H., Y.K.W. and T.M.H.L. All authors will be informed about each step of manuscript processing, including submission, revision, revision reminder, etc., via emails from our system or the assigned Assistant Editor. All authors have read and agreed to the published version of the manuscript.

Data Availability
The datasets generated and/or analysed during the current study and the underlying code are not publicly available due to [MASK FOR REVIEW] but are available from the corresponding author on reasonable request.

References

Boyd, R. L., Ashokkumar, A., Seraj, S., & Pennebaker, J. W. (2022). The Development and Psychometric Properties of LIWC-22.
https://doi.org/10.13140/RG.2.2.23890.43205
Camia, C., Desmedt, O., & Luminet, O. (2020). Exploring autobiographical memory specificity and narrative emotional processing in alexithymia. Narrative Inquiry, 30(1), 59–79. https://doi.org/10.1075/ni.18089.kob
Chan, J. W., Lam, S., Li, S. X., Chau, S. W., Chan, S., Chan, N., Zhang, J., & Wing, Y. (2022). Adjunctive bright light treatment with gradual advance in unipolar major depressive disorder with evening chronotype – A randomized controlled trial. Psychological Medicine, 52(8), 1448–1457. https://doi.org/10.1017/S0033291720003232
Chen, J., Chan, N. Y., Li, C.-T., Chan, J. W. Y., Liu, Y., Li, S. X., Chau, S. W. H., Leung, K. S., Heng, P.-A., Lee, T. M. C., Li, T. M. H., & Wing, Y.-K. (2024). Multimodal digital assessment of depression with actigraphy and app in Hong Kong Chinese. Translational Psychiatry, 14(1), 150. https://doi.org/10.1038/s41398-024-02873-4
Chen, J., Li, C.-T., Chan, N. Y., Chen, C. X., Chen, S., Chan, J. W. Y., Liu, Y., Li, S. X., Chau, S. W. H., Leung, K. S., Heng, P.-A., Lee, T. M. C., Li, T. M. H., & Wing, Y.-K. (under review). Capturing omega sign in the clinical assessment of depression by deep learning. Under review.
Cheng, Q., Li, T. M., Kwok, C.-L., Zhu, T., & Yip, P. S. (2017). Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study. Journal of Medical Internet Research, 19(7), e7276. https://doi.org/10.2196/jmir.7276
Coppersmith, G., Ngo, K., Leary, R., & Wood, A. (2016). Exploratory Analysis of Social Media Prior to a Suicide Attempt. In K. Hollingshead & L. Ungar (Eds.), Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology (pp. 106–117). Association for Computational Linguistics. https://doi.org/10.18653/v1/W16-0311
De Berardis, D., Fornaro, M., Orsolini, L., Valchera, A., Carano, A., Vellante, F., Perna, G., Serafini, G., Gonda, X., Pompili, M., Martinotti, G., & Di Giannantonio, M. (2017). Alexithymia and Suicide Risk in Psychiatric Disorders: A Mini-Review. Frontiers in Psychiatry, 8. https://doi.org/10.3389/fpsyt.2017.00148
Fu, Z., Hsu, Y. C., Chan, C. S., Lau, C. M., Liu, J., & Yip, P. S. F. (2024). Efficacy of ChatGPT in Cantonese Sentiment Analysis: Comparative Study. Journal of Medical Internet Research, 26(1), e51069. https://doi.org/10.2196/51069
Homan, S., Gabi, M., Klee, N., Bachmann, S., Moser, A.-M., Duri’, M., Michel, S., Bertram, A.-M., Maatz, A., Seiler, G., Stark, E., & Kleim, B. (2022). Linguistic features of suicidal thoughts and behaviors: A systematic review. Clinical Psychology Review, 95, 102161. https://doi.org/10.1016/j.cpr.2022.102161
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
Huang, R., Yi, S., Chen, J., Chan, K. Y., Chan, J. W. Y., Chan, N. Y., Li, S. X., Wing, Y. K., & Li, T. M. H. (2024). Exploring the Role of First-Person Singular Pronouns in Detecting Suicidal Ideation: A Machine Learning Analysis of Clinical Transcripts. Behavioral Sciences, 14(3), Article 3. https://doi.org/10.3390/bs14030225
Ji, S., Pan, S., Li, X., Cambria, E., Long, G., & Huang, Z. (2021). Suicidal Ideation Detection: A Review of Machine Learning Methods and Applications. IEEE Transactions on Computational Social Systems, 8(1), 214–226. https://doi.org/10.1109/TCSS.2020.3021467
Kim, K., Choi, S., Lee, J., & Sea, J. (2019). Differences in linguistic and psychological characteristics between suicide notes and diaries. The Journal of General Psychology, 146(4), 391–416. https://doi.org/10.1080/00221309.2019.1590304
Lam, C., Xian, L., Huang, R., Chen, J., Chan, K. Y., Chan, J. W. Y., Chau, S. W. H., Chan, N. Y., Li, S. X., Wing, Y.-K., & Li, T. M. H. (under review). Do patients or AI know better about depressive symptoms? A deep learning study of individuals with and without alexithymia. Under review.
Lee, D., & Seung, H. S. (2000). Algorithms for Non-negative Matrix Factorization. Advances in Neural Information Processing Systems, 13. https://proceedings.neurips.cc/paper_files/paper/2000/hash/f9d1152547c0bde01830b7e8bd60024c-Abstract.html
Lekkas, D., Klein, R. J., & Jacobson, N. C. (2021). Predicting acute suicidal ideation on Instagram using ensemble machine learning models. Internet Interventions, 25, 100424. https://doi.org/10.1016/j.invent.2021.100424
Li, T. M. H., Chen, J., Law, F. O. C., Li, C.-T., Chan, N. Y., Chan, J. W. Y., Chau, S. W. H., Liu, Y., Li, S. X., Zhang, J., Leung, K.-S., & Wing, Y.-K. (2023). Detection of Suicidal Ideation in Clinical Interviews for Depression Using Natural Language Processing and Machine Learning: Cross-Sectional Study. JMIR Medical Informatics, 11(1), e50221. https://doi.org/10.2196/50221
Libin, A., Treitler, J. T., Vasaitis, T., & Shao, Y. (2024). Evaluating and Reducing Subgroup Disparity in AI Models: An Analysis of Pediatric COVID-19 Test Outcomes. medRxiv: The Preprint Server for Health Sciences, 2024.09.18.24313889. https://doi.org/10.1101/2024.09.18.24313889
Malgaroli, M., Hull, T. D., Zech, J. M., & Althoff, T. (2023). Natural language processing for mental health interventions: A systematic review and research framework. Translational Psychiatry, 13(1), 309. https://doi.org/10.1038/s41398-023-02592-2
Meganck, R., Vanheule, S., & Desmet, M. (2008). Does the TAS-20 measure alexithymia? A study of natural language use. 28th European Conference on Psychosomatic Research, Abstracts. 28th European Conference on Psychosomatic Research (ECPR – 2012). http://hdl.handle.net/1854/LU-3036354
Meganck, R., Vanheule, S., Inslegers, R., & Desmet, M. (2009). Alexithymia and interpersonal problems: A study of natural language use. Personality and Individual Differences, 47(8), 990–995. https://doi.org/10.1016/j.paid.2009.08.005
Nguyen, T., O’Dea, B., Larsen, M., Phung, D., Venkatesh, S., & Christensen, H. (2017). Using linguistic and topic analysis to classify sub-groups of online depression communities. Multimedia Tools and Applications, 76(8), 10653–10676. https://doi.org/10.1007/s11042-015-3128-x
Pennebaker, J. W., & King, L. A. (1999). Linguistic styles: Language use as an individual difference. Journal of Personality and Social Psychology, 77(6), 1296–1312. https://doi.org/10.1037/0022-3514.77.6.1296
Spitzer, C., Siebel-Jürges, U., Barnow, S., Grabe, H. J., & Freyberger, H. J. (2005). Alexithymia and Interpersonal Problems. Psychotherapy and Psychosomatics, 74(4), 240–246. https://doi.org/10.1159/000085148
Tadesse, M. M., Lin, H., Xu, B., & Yang, L. (2020). Detection of Suicide Ideation in Social Media Forums Using Deep Learning. Algorithms, 13(1), Article 1. https://doi.org/10.3390/a13010007
Vioulès, M. J., Moulahi, B., Azé, J., & Bringay, S. (2018). Detection of suicide-related posts in Twitter data streams. IBM Journal of Research and Development, 62(1), 7:1–7:12. https://doi.org/10.1147/JRD.2017.2768678
Welding, C., & Samur, D. (2018). Language Processing in Alexithymia. In O. Luminet, R. M. Bagby, & G. J. Taylor (Eds.), Alexithymia: Advances in Research, Theory, and Clinical Practice (1st ed., pp. 90–104). Cambridge University Press. https://doi.org/10.1017/9781108241595.008
WHO. (2021). Suicide Worldwide in 2019: Global Health Estimates (1st ed.). World Health Organization.
Wotschack, C., & Klann-Delius, G. (2013). Alexithymia and the conceptualization of emotions: A study of language use and semantic knowledge. Journal of Research in Personality, 47(5), 514–523. https://doi.org/10.1016/j.jrp.2013.01.011
Zhang, T., Schoene, A. M., & Ananiadou, S. (2021). Automatic identification of suicide notes with a transformer-based deep learning model. Internet Interventions, 25, 100422. https://doi.org/10.1016/j.invent.2021.100422
Zhou, Y., Liao, L., Gao, Y., Wang, R., & Huang, H. (2023). TopicBERT: A Topic-Enhanced Neural Language Model Fine-Tuned for Sentiment Classification. IEEE Transactions on Neural Networks and Learning Systems, 34(1), 380–393. https://doi.org/10.1109/TNNLS.2021.3094987
Zhu, X., Yi, J., Yao, S., Ryder, A. G., Taylor, G. J., & Bagby, R. M. (2007). Cross-cultural validation of a Chinese translation of the 20-item Toronto Alexithymia Scale. Comprehensive Psychiatry, 48(5), 489–496. https://doi.org/10.1016/j.comppsych.2007.04.007

Supplementary Files: supplementarymaterial.docx

Editorial decision: Revision requested, 16 Dec 2025
Reviews received at journal: 10 Dec 2025
Reviews received at journal: 01 Dec 2025
Reviewers agreed at journal: 28 Nov 2025
Reviewers agreed at journal: 25 Nov 2025
Reviewers agreed at journal: 04 Nov 2025
Reviewers invited by journal: 30 Oct 2025
Editor assigned by journal: 21 Oct 2025
Submission checks completed at journal: 21 Oct 2025
First submitted to journal: 19 Sep 2025
As a division of Research Square Company, we’re committed to making research communication faster, fairer, and more useful. We do this by developing innovative software and high quality services for the global research community. Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7657467","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Article","associatedPublications":[],"authors":[{"id":542313782,"identity":"4b90b313-7dbd-445f-b74d-14c6feeed65f","order_by":0,"name":"Rong Huang","email":"","orcid":"","institution":"Chinese University of Hong Kong","correspondingAuthor":false,"prefix":"","firstName":"Rong","middleName":"","lastName":"Huang","suffix":""},{"id":542313784,"identity":"dedc5442-b756-4d9d-81d2-bc06a20bde5a","order_by":1,"name":"Longdi Xian","email":"","orcid":"","institution":"Chinese University of Hong Kong","correspondingAuthor":false,"prefix":"","firstName":"Longdi","middleName":"","lastName":"Xian","suffix":""},{"id":542313786,"identity":"009178b6-03e3-4302-83cf-bdf4e2cc8f79","order_by":2,"name":"Christopher Chi Wai Cheng","email":"","orcid":"","institution":"Chinese University of Hong Kong","correspondingAuthor":false,"prefix":"","firstName":"Christopher","middleName":"Chi Wai","lastName":"Cheng","suffix":""},{"id":542313787,"identity":"1da3e612-45b2-4791-a439-65d9de9ce5b8","order_by":3,"name":"Jie Chen","email":"","orcid":"","institution":"Chinese University of Hong 
Kong","correspondingAuthor":false,"prefix":"","firstName":"Jie","middleName":"","lastName":"Chen","suffix":""},{"id":542313789,"identity":"da6df393-0dde-46f9-af7f-0447bfe53b47","order_by":4,"name":"Kit Ying Chan","email":"","orcid":"","institution":"Chinese University of Hong Kong","correspondingAuthor":false,"prefix":"","firstName":"Kit","middleName":"Ying","lastName":"Chan","suffix":""},{"id":542313791,"identity":"790c329a-3ce0-40f5-be95-4137a2b0c4d0","order_by":5,"name":"Calvin Lam","email":"","orcid":"","institution":"Chinese University of Hong Kong","correspondingAuthor":false,"prefix":"","firstName":"Calvin","middleName":"","lastName":"Lam","suffix":""},{"id":542313792,"identity":"3d9a83fb-b22b-4ca1-af2d-01ab0bd7ef81","order_by":6,"name":"Joey W Y Chan","email":"","orcid":"","institution":"Chinese University of Hong Kong","correspondingAuthor":false,"prefix":"","firstName":"Joey","middleName":"W Y","lastName":"Chan","suffix":""},{"id":542313793,"identity":"d451abd3-815e-4cbc-96d5-f43f0af9728b","order_by":7,"name":"Steven W H Chau","email":"","orcid":"","institution":"Chinese University of Hong Kong","correspondingAuthor":false,"prefix":"","firstName":"Steven","middleName":"W H","lastName":"Chau","suffix":""},{"id":542313794,"identity":"c7e32695-1b7c-423b-b71f-8f0d9fd74ec6","order_by":8,"name":"Ngan Yin Chan","email":"","orcid":"","institution":"Chinese University of Hong Kong","correspondingAuthor":false,"prefix":"","firstName":"Ngan","middleName":"Yin","lastName":"Chan","suffix":""},{"id":542313795,"identity":"224f4a49-a9ee-40ae-bc22-569fab6d868f","order_by":9,"name":"Bei Huang","email":"","orcid":"","institution":"Chinese University of Hong Kong","correspondingAuthor":false,"prefix":"","firstName":"Bei","middleName":"","lastName":"Huang","suffix":""},{"id":542313798,"identity":"8b895f5d-410a-4ec3-b76c-3d04b7e0826f","order_by":10,"name":"Yun-Kwok Wing","email":"","orcid":"","institution":"Chinese University of Hong 
Kong","correspondingAuthor":false,"prefix":"","firstName":"Yun-Kwok","middleName":"","lastName":"Wing","suffix":""},{"id":542313800,"identity":"cfd17a60-33d3-4b02-aa1d-c75b5b8d211e","order_by":11,"name":"Tim M H Li","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAAAzUlEQVRIiWNgGAWjYHACNgaGAgk5BgYeZiAHiCXYiNFiIGFMshaGxAaitei2nz324IOBRXr/jNzDBgwV1okN0m0JeLWYnclLN5xhIJE740ZecgLDmfTEBpljB/BrOZBjJs0D1LJBIsf4AGPb4cQGifQG/FrOvwFrSTcAa/lHjJYbEFsSQFoSGBtAWtIIOOzGGzNJoF8MZ5x5Y2yQcCzduE0iLYGAw3LMJD5U1Mnzt+cYS3yosZbtl0gzwKsFFYCMJxyRo2AUjIJRMAoIAgBL9z9gzHTpZgAAAABJRU5ErkJggg==","orcid":"","institution":"Chinese University of Hong Kong","correspondingAuthor":true,"prefix":"","firstName":"Tim","middleName":"M H","lastName":"Li","suffix":""}],"badges":[],"createdAt":"2025-09-19 10:23:18","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7657467/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7657467/v1","draftVersion":[],"editorialEvents":[],"editorialNote":"","failedWorkflow":false,"files":[{"id":95691586,"identity":"7a29acf4-3f79-468a-b965-5036c49aaaa1","added_by":"auto","created_at":"2025-11-12 02:20:43","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":525614,"visible":true,"origin":"","legend":"","description":"","filename":"ExploringtheGeneralizabilityandExplainabilityofLLMsinDetectingSuicidalIdeationoffloadfinal.docx","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/48d80a9b0cde043e6e7403e8.docx"},{"id":95691589,"identity":"bc7f8620-b158-4e2a-a207-9280658b08a8","added_by":"auto","created_at":"2025-11-12 
02:20:43","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":13436,"visible":true,"origin":"","legend":"","description":"","filename":"ed379a3e1b944179bd40bd188b96bff5.json","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/deb400fba462388005823dae.json"},{"id":95800509,"identity":"6e60de2f-ecf3-42e5-bc9c-98155aa72fd5","added_by":"auto","created_at":"2025-11-13 08:22:46","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":20726,"visible":true,"origin":"","legend":"","description":"","filename":"supplementarymaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/f655caec5cf6c4f58f1d7b70.docx"},{"id":95691592,"identity":"6c13cf32-298b-47c0-8941-9e6400550313","added_by":"auto","created_at":"2025-11-12 02:20:43","extension":"xml","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":117478,"visible":true,"origin":"","legend":"","description":"","filename":"ed379a3e1b944179bd40bd188b96bff51enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/78bec170400f9f1c3e71f4cd.xml"},{"id":95691594,"identity":"eb1b0463-aedb-4c5f-9d43-8976d8d10ff6","added_by":"auto","created_at":"2025-11-12 02:20:43","extension":"jpeg","order_by":7,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":353562,"visible":true,"origin":"","legend":"","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/49f5338bf3e84fd3db3ddcba.jpeg"},{"id":95691596,"identity":"711653ec-5b8f-477f-ad72-6a178363e08f","added_by":"auto","created_at":"2025-11-12 
02:20:43","extension":"png","order_by":8,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":61352,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/1d478459a520e5acae387f23.png"},{"id":95691600,"identity":"d50bf4ef-be16-4141-aafc-871c660e0b6d","added_by":"auto","created_at":"2025-11-12 02:20:43","extension":"png","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":27505,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/39aa1984518d9ea77195dfcc.png"},{"id":95691595,"identity":"7f3fae60-6fdd-4e57-8bf1-3023b8ba0a3b","added_by":"auto","created_at":"2025-11-12 02:20:43","extension":"png","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":16024,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/bce91ae925578d9268fd73ee.png"},{"id":95691597,"identity":"0991544b-e748-4a0c-9f0e-b7c60784366e","added_by":"auto","created_at":"2025-11-12 02:20:43","extension":"png","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":62517,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/9968555ee038d7d59a6d413e.png"},{"id":95691598,"identity":"f07a4c9d-97e6-4162-92ff-0d55886d476a","added_by":"auto","created_at":"2025-11-12 
02:20:43","extension":"xml","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":114799,"visible":true,"origin":"","legend":"","description":"","filename":"ed379a3e1b944179bd40bd188b96bff51structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/a5241eecaf4dd8fb052f38f4.xml"},{"id":95691599,"identity":"dde892f4-e39f-438e-b89c-53afe541757b","added_by":"auto","created_at":"2025-11-12 02:20:43","extension":"html","order_by":13,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":126359,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/d0f53b26b46b1d6559a9fb1c.html"},{"id":95691590,"identity":"baa08151-2af8-4269-b240-046a57b26bda","added_by":"auto","created_at":"2025-11-12 02:20:43","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":222509,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eThe logic flow of the current study.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/b51bb377163170faf5ba09fc.png"},{"id":95798867,"identity":"c97d3601-2b55-4fa7-b1ad-cc7ef86cc60f","added_by":"auto","created_at":"2025-11-13 08:18:01","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":115733,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eMediationanalysis of the relationship between the patient group and performance metric, as mediated by language feature.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/628da600f4825966c9000385.png"},{"id":95691585,"identity":"71bdd395-310c-4a4a-8dee-7b3c6adec605","added_by":"auto","created_at":"2025-11-12 
02:20:43","extension":"jpg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":69878,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eROC-AUC curves of topic-specific classifier 2 \u0026amp; 3 as compared to topic-general classifier 1 in both groups and whole sample.\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cem\u003eNote. \u003c/em\u003ep\u0026lt;.05*, p\u0026lt;.01**, p\u0026lt;.001*** as compared to topic-general classifier using roc.test.\u003c/p\u003e","description":"","filename":"3.jpg","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/4052663f3f6b9112bca9b0ae.jpg"},{"id":95804854,"identity":"85d4a8ef-dcb8-4f89-85b2-7bdd31bca522","added_by":"auto","created_at":"2025-11-13 08:39:50","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1331119,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/e0d97d1a-0436-4c06-a44f-9f0518edb70a.pdf"},{"id":95799591,"identity":"be610321-26bb-410f-9d52-09e65981611a","added_by":"auto","created_at":"2025-11-13 08:20:19","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":20726,"visible":true,"origin":"","legend":"","description":"","filename":"supplementarymaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-7657467/v1/c2269401b2e53d447acd4148.docx"}],"financialInterests":"Competing interest reported. YKW received personal fees from Eisai Co. for lectures and travel support from Lundbeck HK Limited and Aculys Pharma, Japan. JWYC received personal fee from Eisai Co., Ltd and travel support from Lundbeck HK limited for overseas conference. 
All other authors declare no financial or non-financial competing interests.","formattedTitle":"Exploring the Generalizability and Explainability of LLMs in Detecting Suicidal Ideation: The Impact of Data Heterogeneity","fulltext":[{"header":"Introduction","content":"\u003cp\u003eSuicide remains one of the leading causes of death worldwide, accounting for 1 out of every 100 deaths (WHO, \u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e2021\u003c/span\u003e). The alarming trend of its increase over the past few decades underscores the urgent need for early detection. Language expression, as a reflection of internal mental states, has shown promise in providing valuable insights into the detection of suicidal thoughts, plan, and behavior (Homan et al., \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2022\u003c/span\u003e). Studies using different data sources (ranging from social media to healthcare records) and outcome measures (ranging from self-report to clinician-rated) have found that increased suicide risk was associated with an increased use of words (Cheng et al., \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Kim et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2019\u003c/span\u003e), pronouns (Coppersmith et al., \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2016\u003c/span\u003e; Zhang et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), negative words (Lekkas et al., \u003cspan citationid=\"CR17\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Tadesse et al., \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Vioul\u0026egrave;s et al., \u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e2018\u003c/span\u003e), but a decreased use of social-related words (Nguyen et al., \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Zhang et al., \u003cspan citationid=\"CR31\" 
class=\"CitationRef\"\u003e2021\u003c/span\u003e), with medium to large effect size (Kim et al., \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e2019\u003c/span\u003e; Zhang et al., \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e2021\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eRecent advancements in artificial intelligence (AI) and large language models (LLMs) have enabled text analysis to better detect and predict suicidal ideation (Ji et al., \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Li et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). These techniques might provide clinicians with valuable insights into early detection, prevention and remedial measures. However, as the text was inputted as learning material, two sources of data heterogeneities might have been introduced (as shown in Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e), influencing the model performance. First, the inter-individual data heterogeneity resulted from the difference in language use between individuals. For example, individuals with alexithymia, who have difficulties in recognizing, expressing, and describing their emotions, demonstrated a rather unique language pattern (Lam et al., under review; Welding \u0026amp; Samur, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Research indicated that alexithymia individuals tended to use fewer emotional words when recalling personal experiences to avoid reliving associated emotion (Camia et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). In addition, alexithymia was linked to a less use of social-related words, reflecting poor interpersonal relationships resulting from deficits in emotional processing (Meganck et al., \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2008\u003c/span\u003e). 
Other language features, such as word count and cognitive processing words, have also been linked to alexithymia characteristics (Welding \u0026amp; Samur, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). Such differences in language use as occurring in the clinical interview might confuse the AI model when identifying suicide-related language features, resulting in performance differences across patient groups, which is referred as subgroup disparity in AI models (Libin et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). As indicated by a recent large-scale study on AI fairness in healthcare, the subgroup AUC differences could be as large as 0.41 (Libin et al., \u003cspan citationid=\"CR19\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eSecond, the intra-individual data heterogeneity introduced by the difference in language use across different contexts/topics within individuals. Text extracted from social media could vary significantly across topics, ranging from health to politics. Similarly, the clinical interview also contained conversations focusing on different topics, such as mood symptoms and suicidal ideation. While the use of language would vary across different topics, so were the changes in language indicators of suicidality (Li et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Pennebaker \u0026amp; King, \u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e1999\u003c/span\u003e). For example, the use of discrepancy words was a predictive indicator of suicide risk during the discussion on suicide-related topics while its predictive power diminished when the conversation shifted to daily activities (Li et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). 
Therefore, mixing the text content of different topics might weaken the significance of some language indicators, leading to poorer model performance. Indeed, a topic-specific model trained by Zhou et al. (\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) outperformed the topic-general model in the sentiment classification task. In the field of computer science, this process is called factorization which involves decomposing complex data into simpler components to make model more efficient (Lee \u0026amp; Seung, \u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2000\u003c/span\u003e).\u003c/p\u003e\u003cp\u003eHence, it is possible that by decomposing the intra-individual data heterogeneity (e.g., chunking the textual data to be more topic-specific), we can offload the task complexity for the AI model and consequently reduce the subgroup disparity. While there were some studies investigating the relationship of the language use and alexithymia or suicidal ideation (e.g., Camia et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; De Berardis et al., \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2017\u003c/span\u003e; Huang et al., \u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2024\u003c/span\u003e), there was no study exploring the intricate interplay among language use, suicidal ideation expression, and alexithymia, particularly with respect to their influence on AI detection model performance. Given the heterogeneity of language use in clinical settings, understanding the impact of patient characteristics on AI detection capabilities and possible solutions may enhance the generalizability and explainability of these detection systems, especially when patient information is inaccessible or confidential, making it impossible to pre-train or tailor the model using patient-specific knowledge. Therefore, the current study asked two research questions. 
First, was AI equitably effective in detecting suicidal ideation between patient groups with and without alexithymia? If not, can such subgroup disparity be explained by language features? Second, will the decomposition of intra-individual data heterogeneity (i.e., factorization) help reduce the subgroup disparity? Based on previous literature, we hypothesized that (1) there will be subgroup disparity in suicidal detection between patient groups, which is mediated by language features; and (2) topic-specific classifiers (i.e., chunking the textual data into different topics) will improve the model performance, reducing the subgroup disparity.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eDefinition and examples of terminologies used.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"2\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colspan=\"2\" nameend=\"c2\" namest=\"c1\"\u003e\u003cp\u003eData Heterogeneity\u003c/p\u003e\u003cp\u003e\u003cem\u003erefers to a dataset composed of different data types, structures, formats, or sources\u003c/em\u003e\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eInter-individual Data Heterogeneity\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e\u003cb\u003eIntra-individual Data Heterogeneity\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eDefinition\u003c/b\u003e: Between individuals, data might vary because of factors such as age, gender, education, ethnicity, and personality.\u003c/p\u003e\u003cp\u003e\u003cb\u003eExample\u003c/b\u003e: The use of language varies depending on whether the individual is introverted or extroverted.\u003c/p\u003e\u003cp\u003e\u003cb\u003eThis study\u003c/b\u003e: In clinical interviews, individuals with varying patient characteristics (non-alexithymia and alexithymia) might show differences in language use due to their differing abilities in emotional processing.\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e\u003cb\u003eDefinition\u003c/b\u003e: Within individuals, data might vary across factors such as time, context, and location.\u003c/p\u003e\u003cp\u003e\u003cb\u003eExample\u003c/b\u003e: The use of language varies depending on whether the individual is at a party or in a meeting.\u003c/p\u003e\u003cp\u003e\u003cb\u003eThis study\u003c/b\u003e: During the clinical interview, the same individual might exhibit differences in language use depending on the conversation topics, such as mood-general or suicide-specific.\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e"},{"header":"Method","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003eParticipants\u003c/h2\u003e\u003cp\u003eThis cross-sectional study was conducted as part of an ongoing digital phenotyping research project that aimed to automate the detection of depressive features. The study used a case-control design (Chen et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2024\u003c/span\u003e, under review). For the case group, individuals diagnosed with major depressive disorder were recruited from outpatient clinics in a local university-affiliated hospital. 
The control group consisted of individuals recruited from the community. The Structured Clinical Interview for DSM-5\u0026mdash;Clinician Version (SCID-5-CV) was used by a trained medical researcher to assess whether the control group had any DSM-IV diagnosis (Chen et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Participants were eligible if they were native Cantonese-speaking Chinese adults aged 18 or older. Exclusion criteria included voice, speech, and language problems, a history of psychiatric disorders other than major depressive disorder, and an inability to provide written informed consent. Data were collected between 2020 and 2022 from 299 participants, including 194 cases and 105 controls (Mean\u003csub\u003eage\u003c/sub\u003e = 53.15\u0026thinsp;\u0026plusmn;\u0026thinsp;11.68, female n\u0026thinsp;=\u0026thinsp;171, 57%). Participants were compensated with a cash coupon for their participation. Ethical approval was obtained from the Joint Chinese University of Hong Kong-New Territories East Cluster Clinical Research Ethics Committee (Ref No: 2020.492). Participants could withdraw from the interview at any time. Clinical trial number: not applicable.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eMeasurements\u003c/h3\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\u003ch2\u003eHDRS\u003c/h2\u003e\u003cp\u003eAll participants underwent a semi-structured interview with the 17-item Hamilton Depression Rating Scale (HDRS) (Chan et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2022\u003c/span\u003e) conducted by a trained interviewer/psychiatrist (JC). 
Item H11 of the HDRS was used to assess suicide risk: \u0026ldquo;Since last week, have you had any thoughts that life is not worth living?\u0026rdquo; Suicide risk was rated in five progressive levels: (1) having no suicidal thoughts; (2) feeling life is not worth living; (3) having wishes to be dead or any thoughts of the possible death of self; (4) having suicidal ideation or gestures; and (5) having suicide attempts. The ratings were further validated by TMHL (with a kappa of 0.92). A rating of (2) or above on item H11 was used as the cut-off point to determine the presence of significant suicidal ideation (i.e., the gold standard). The overall HDRS score was used to determine current depression, with a score of 8 or above as the cut-off point (Chan et al., \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2022\u003c/span\u003e; Chen et al., \u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). The interview lasted approximately 15\u0026ndash;30 minutes.\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003eTAS\u003c/h3\u003e\n\u003cp\u003eThe Toronto Alexithymia Scale (TAS-20) was used to assess alexithymia. The TAS-20 is a 20-item self-report questionnaire measuring three dimensions: difficulty identifying feelings (DIF), difficulty describing feelings (DDF), and externally oriented thinking (EOT) (Zhu et al., \u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e2007\u003c/span\u003e). Participants rate each item on a 5-point scale, ranging from strongly disagree (1) to strongly agree (5). 
The international cut-off values are as follows: 20\u0026ndash;51 for non-alexithymia, 52\u0026ndash;60 for possible alexithymia and 61\u0026ndash;100 for alexithymia (Camia et al., \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Welding \u0026amp; Samur, \u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2018\u003c/span\u003e). In the current study, the Cronbach's alpha for this scale was 0.90, and possible alexithymia and alexithymia were grouped as the alexithymia group.\u003c/p\u003e\n\u003ch3\u003eLIWC\u003c/h3\u003e\n\u003cp\u003eLinguistic Inquiry and Word Count (LIWC), a text analysis application developed to analyse the emotional, cognitive, and structural components of a text (Boyd et al., \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), was applied to extract and count language features of the 299 clinical transcripts. Examples of LIWC language features are \u0026ldquo;negative emotion words\u0026rdquo;, \u0026ldquo;tentative words,\u0026rdquo; and \u0026ldquo;family-related words\u0026rdquo; (see supplementary table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003e1\u003c/span\u003e for details). The current study used the internal 2015 traditional Chinese version of the LIWC dictionary. Because the transcripts are in Chinese, in which words are not separated by spaces as they are in English, word segmentation had to be conducted before applying LIWC. 
Jieba, a Chinese word segmentation module, was used (Fu et al., \u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003eLarge language model \u0026ndash; BERT\u003c/h2\u003e\u003cp\u003eThe study used the Bidirectional Encoder Representations from Transformers (BERT) model (Malgaroli et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) for binary classification to detect the presence of suicidal ideation using Chinese (Cantonese) clinical transcripts of 299 participants. To ensure the replicability of results, a random seed was set. TensorFlow 2.15.0, Transformers 4.37.2 and NVIDIA CUDA 12.4 were used to train the BERT models. The BERT tokenizer was employed to tokenize the textual data, which was then converted into TensorFlow datasets. The model was optimized using the Adam optimization algorithm with a user-defined learning rate of 1e-5.\u003c/p\u003e\u003cp\u003eModel training was performed over 30 epochs with a batch size of 8 to strike a balance between efficiency and convergence. Ten-fold cross-validation was used to obtain more robust results. To ensure comparable representation, the folds were stratified based on the labels of suicidal ideation (i.e., with and without suicidal ideation), aiming to have a similar number of cases and controls in each fold. Ultimately, the classifiers predicted the suicidal ideation label for each of the 299 participants, and their performance was assessed in both groups (alexithymia and non-alexithymia) and in the whole sample.\u003c/p\u003e\u003cp\u003eThe current study trained three BERT-based classifiers separately, with different levels of intra-individual data heterogeneity. Levels of intra-individual data heterogeneity were achieved by narrowing down the conversation topic of the clinical transcripts. 
At the high heterogeneity level, topic-general classifier 1 was trained on the full transcripts, containing responses to all HDRS items, while at the moderate heterogeneity level, topic-specific classifier 2 was trained on the non-H11 transcripts, containing responses to all HDRS items except for H11 (i.e., mood-related topic). At the lowest heterogeneity level, topic-specific classifier 3 was trained on H11-only transcripts, containing responses to H11 only (i.e., suicide-specific topic). Figure\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e demonstrates the flow of the current study.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\u003ch2\u003eStatistical Analysis\u003c/h2\u003e\u003cp\u003eAll analyses were conducted using R (version 4.3.1; R Foundation for Statistical Computing). A p-value\u0026thinsp;\u0026lt;\u0026thinsp;.05 was considered statistically significant. Descriptive statistics for categorical variables are presented as numbers and percentages. Receiver operating characteristic curve analysis was used to analyse the accuracy of classification results. For each classifier, performance was assessed by area under the curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Classifiers were compared on AUC using the \u0026ldquo;roc.test\u0026rdquo; function in R.\u003c/p\u003e\u003cp\u003eMediation analyses were conducted to probe further into the subgroup disparity. Five performance metrics were computed for each individual: True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN), and overall false detection (FP\u0026thinsp;+\u0026thinsp;FN). Each metric had two levels (0 and 1), with 1 indicating its presence. 
For example, if an individual with clinician-rated suicide risk (i.e., positive case) was classified as positive by the classifier, the TP of this individual was 1. After computing the performance metrics, mediation analysis was conducted using the lavaan package in R to test the mediating role of different LIWC language features in the relationship between the patient group and individual performance metrics. As shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig2\" class=\"InternalRef\"\u003e2\u003c/span\u003e, the patient group was the independent variable (IV) with two levels (0 and 1), with 1 indicating the presence of alexithymia, while the performance metric was a binary dependent variable (DV). Path a was the effect of the IV (i.e., patient group) on the mediator (i.e., language feature). Path b was the effect of the mediator on the DV (i.e., performance metric). Path c\u0026rsquo; was the effect of the IV on the DV while controlling for the mediator, also referred to as the direct effect of the IV on the DV. The product of path a and path b was the indirect effect of the IV on the DV (i.e., ab). Path c represented the total effect of the IV on the DV, which was the sum of the direct and indirect effects (i.e., ab\u0026thinsp;+\u0026thinsp;c\u0026rsquo;). All paths were adjusted for sex, age, and current depression. Bonferroni correction was applied for multiple comparisons. The p-value of the indirect effect was derived using 5000 bootstrap samples. Subgroup disparity was identified if a significant association was found between the patient group and any one of these five performance metrics; in other words, if the presence of alexithymia influenced one\u0026rsquo;s likelihood of being correctly detected. 
Model fit was considered good when both the comparative fit index (CFI) and the Tucker\u0026ndash;Lewis index (TLI) were 0.95 or greater, the root mean squared error of approximation (RMSEA) was 0.06 or lower, and the standardized root mean squared residual (SRMR) was 0.08 or lower (Hu \u0026amp; Bentler, \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e1999\u003c/span\u003e).\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"Results","content":"\u003cp\u003eTable\u0026nbsp;\u003cspan refid=\"Tab2\" class=\"InternalRef\"\u003e2\u003c/span\u003e shows the clinical data of the current sample. In total, 22.7% of participants were rated by the clinician as having significant suicidal ideation (68 out of 299) and 35.5% had current depression (106 out of 299). There was a significant effect of alexithymia on the rates of suicidal ideation and depression (both p\u0026thinsp;\u0026lt;\u0026thinsp;.001). The rates of suicidal ideation and depression were higher in the alexithymia group (36.8% and 50.9% respectively) than in the non-alexithymia group (4.6% and 15.7% respectively). 
No significant age or sex differences were found between alexithymia and non-alexithymia groups (p\u0026thinsp;=\u0026thinsp;.17 and p\u0026thinsp;=\u0026thinsp;.98 respectively).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eCharacteristics of participants.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSuicidal (%)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eDepression (%)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eMale (%)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eAge\u003c/p\u003e\u003cp\u003eMean (SD)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eNon-Alexithymia (n\u0026thinsp;=\u0026thinsp;108)\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e5 (4.6)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e17 
(15.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e44 (34.4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e50.4 (12.1)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eAlexithymia (n\u0026thinsp;=\u0026thinsp;106)\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e39 (36.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e54 (50.9)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e43 (40.6)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e52.56 (10.3)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eWhole Sample (n\u0026thinsp;=\u0026thinsp;299)\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c2\"\u003e\u003cp\u003e68 (22.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e106 (35.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e128 (42.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e53.2 (11.7)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"5\"\u003e\u003cem\u003eNote.\u003c/em\u003e 85 individuals did not complete the TAS questionnaire.\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eSupplementary table 2 shows the language patterns of these two patient groups across topics, exhibiting both between- and within-individual differences in language use (i.e., inter- and intra-individual 
data heterogeneity).\u003c/p\u003e\u003cdiv id=\"Sec11\" class=\"Section2\"\u003e\u003ch2\u003eHypothesis 1\u003c/h2\u003e\u003cp\u003eAs shown in Table\u0026nbsp;\u003cspan refid=\"Tab3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, subgroup disparity was observed in detecting suicidal ideation, with a difference in AUC of as much as 0.11. The topic-general classifier 1 (i.e., using transcripts of all responses) performed sub-optimally in the alexithymia group.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003ePerformance of topic-general classifier in detecting suicidal ideation across patient groups\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"7\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eSensitivity\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eSpecificity\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c5\"\u003e\u003cp\u003ePPV\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eNPV\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e\u003cp\u003eAUC\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eNon-Alexithymia\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.40\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1.00\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e1.00\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.88***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eClassifier 1\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eAlexithymia\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.56\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.85\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.69\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.77\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.77***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e(topic-general)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eWhole Sample\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" 
char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.51\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.90\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.61\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.86\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.83***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"7\"\u003e\u003cem\u003eNote.\u003c/em\u003e For AUC only: p\u0026thinsp;\u0026lt;\u0026thinsp;.05*, p\u0026thinsp;\u0026lt;\u0026thinsp;.01**, p\u0026thinsp;\u0026lt;\u0026thinsp;.001***. An optimal threshold was used to achieve balanced sensitivity, specificity, PPV and NPV.\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003eMediation analyses were conducted to test the mediating role of language features. Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e4\u003c/span\u003e displays the mediation analysis results for mediators with significant indirect effects, excluding cases where the non-significant total effect was not due to opposing signs of the direct and indirect effects (i.e., the total effect was cancelled out). All models had a good fit. Detailed explanations and examples of these mediators (i.e., LIWC language features/categories) can be found in supplementary table \u003cspan refid=\"MOESM1\" class=\"InternalRef\"\u003e1\u003c/span\u003e.\u003c/p\u003e\u003cp\u003eFor the topic-general classifier 1, an overall negative relationship between the patient group and TN (βc = -1.18, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) was observed, which was partially mediated by \u0026ldquo;family-related words\u0026rdquo; (βab = -0.41, p\u0026thinsp;=\u0026thinsp;.024; βc\u0026rsquo; = -0.77, p\u0026thinsp;=\u0026thinsp;.023). 
On the other hand, an overall positive relationship between the patient group and FP (βc\u0026thinsp;=\u0026thinsp;5.37, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) was observed, as partially mediated by \u0026ldquo;word count\u0026rdquo; (βab\u0026thinsp;=\u0026thinsp;0.16, p\u0026thinsp;\u0026lt;\u0026thinsp;.001; βc\u0026rsquo; = 5.22, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), religion-related words (βab =-0.41, p\u0026thinsp;=\u0026thinsp;.007; βc\u0026rsquo; = 5.78, p\u0026thinsp;\u0026lt;\u0026thinsp;.001) and tentative words (βab =-0.44, p\u0026thinsp;=\u0026thinsp;.007; βc\u0026rsquo; = 5.81, p\u0026thinsp;\u0026lt;\u0026thinsp;.001).\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 4\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eMediation analysis of the relationship between patient group (IV) and individual performance metrics (DVs) of topic-general classifier as mediated by individual LIWC features (mediators).\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"6\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eMediator\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c3\"\u003e\u003cp\u003eDV\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eIndirect effect (ab)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003eDirect effect (c\u0026rsquo;)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eTotal\u003c/p\u003e\u003cp\u003eeffect (c)\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003efamily-related words\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eTN\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e-0.41*\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e-0.77*\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e-1.18***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eClassifier 1\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eword count\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eFP\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.16***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e5.22***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e5.37***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e(topic-general)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003ereligion-related words\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c3\"\u003e\u003cp\u003eFP\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e-0.41**\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e5.78***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e5.37***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003etentative words\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eFP\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e-0.44**\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e5.81***\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e5.37***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"6\"\u003e\u003cem\u003eNote.\u003c/em\u003e TN: true negative. FP: false positive. p\u0026thinsp;\u0026lt;\u0026thinsp;.05*, p\u0026thinsp;\u0026lt;\u0026thinsp;.01**, p\u0026thinsp;\u0026lt;\u0026thinsp;.001***\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\u003ch2\u003eHypothesis 2\u003c/h2\u003e\u003cp\u003eBased on the findings of hypothesis 1, where a subgroup disparity was found, two additional topic-specific classifiers (factorization models) were trained with decomposed intra-individual heterogeneity. 
As indicated by Fig.\u0026nbsp;\u003cspan refid=\"Fig3\" class=\"InternalRef\"\u003e3\u003c/span\u003e, both mood-related classifier 2 and suicide-specific classifier 3 demonstrated significantly improved performance for the alexithymia group and the whole sample (all p\u0026thinsp;\u0026lt;\u0026thinsp;.001), substantially reducing the subgroup disparity. As shown in Table\u0026nbsp;\u003cspan refid=\"Tab5\" class=\"InternalRef\"\u003e5\u003c/span\u003e, differences in AUCs decreased from 0.11 for topic-general classifier 1 to 0.05 and 0.01 for mood-related classifier 2 and suicide-specific classifier 3, respectively.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003e\u003cstrong\u003eNote\u003c/strong\u003e\u003cp\u003ep\u0026thinsp;\u0026lt;\u0026thinsp;.05*, p\u0026thinsp;\u0026lt;\u0026thinsp;.01**, p\u0026thinsp;\u0026lt;\u0026thinsp;.001*** as compared to the topic-general classifier using roc.test.\u003c/p\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab5\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 5\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003ePerformance of topic-specific classifiers in detecting suicidal ideation in both groups and the whole sample.\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"7\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" 
char=\".\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eSensitivity\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eSpecificity\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003ePPV\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eNPV\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e\u003cp\u003eAUC\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eClassifier 2\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eNon-Alexithymia\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.80\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e1.00\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e1.00\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.98***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e(mood-related)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eAlexithymia\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.85\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.91\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e\u003cp\u003e0.85\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.91\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.93***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eWhole Sample\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.82\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.97\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.89\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.95\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.96***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e\u003cb\u003eClassifier 3\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eNon-Alexithymia\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.80\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.96\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.50\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.99\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.94***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003e(suicide-specific)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eAlexithymia\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" 
char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.82\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.96\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.91\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.90\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.95***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eWhole Sample\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c3\"\u003e\u003cp\u003e0.84\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c4\"\u003e\u003cp\u003e0.95\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.83\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c6\"\u003e\u003cp\u003e0.95\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c7\"\u003e\u003cp\u003e0.96***\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"7\"\u003e\u003cem\u003eNote.\u003c/em\u003e For AUC only: p\u0026thinsp;\u0026lt;\u0026thinsp;.05*, p\u0026thinsp;\u0026lt;\u0026thinsp;.01**, p\u0026thinsp;\u0026lt;\u0026thinsp;.001***. 
The optimal threshold was chosen to balance sensitivity, specificity, PPV and NPV.\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003c/div\u003e"},{"header":"Discussion","content":"\u003cp\u003eThis study examined the effect of inter- and intra-individual data heterogeneity on the performance of a BERT model in detecting suicidal ideation and explored whether topic-specific classifiers (i.e., factorization models) could address the subgroup disparity arising from inter-individual data heterogeneity. Our mediation analysis of topic-general classifier 1 showed that the presence of alexithymia affected several performance metrics, including TN and FP, and that this effect was partially explained by language features. Thus, our first hypothesis, that the subgroup disparity in the AI detection model of suicidal ideation is mediated by language features, was supported. More specifically, our results indicated that the alexithymia group tended to use more family-related words and tentative words, which were associated with a decreased likelihood of true detection and an increased likelihood of false detection of suicidal ideation, respectively. Our findings were consistent with the literature, suggesting that individuals with alexithymia, due to their unique language use and difficulty in emotional expression, might confuse the AI detection model. For example, an increased use of social-related words, such as family and friends, tended to imply better social integration in non-alexithymia individuals, whereas in alexithymia individuals it might imply the opposite. According to Spitzer et al. (\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2005\u003c/span\u003e), alexithymia individuals tended to encounter more interpersonal problems and have a distinct interpersonal style, due to a deficit in emotional processing. 
Therefore, an increased use of social-related words (e.g., family) might not necessarily reflect better social integration in alexithymia individuals (Meganck et al., \u003cspan citationid=\"CR22\" class=\"CitationRef\"\u003e2009\u003c/span\u003e). Similarly, Welding \u0026amp; Samur (\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e2018\u003c/span\u003e) indicated that alexithymia individuals did not perceive emotional words to be more salient than neutral words, meaning that neutral words could also carry emotional salience for alexithymia individuals.\u003c/p\u003e\u003cp\u003eTo address the subgroup disparity, we employed factorization by training two topic-specific classifiers (i.e., mood-related classifier 2 and suicide-specific classifier 3). A significant improvement in AUC was found for both topic-specific classifiers in the alexithymia group and the whole sample, substantially reducing the subgroup disparity. This result was consistent with previous findings indicating that decomposing intra-individual heterogeneity could improve model performance (Zhou et al., \u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) and highlighted the presence of topic-specific indicators for suicidal ideation (Li et al., \u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Thus, our second hypothesis, that decomposing intra-individual heterogeneity could offload task difficulty and consequently reduce the subgroup disparity, was supported. Furthermore, the improvement in mood-related classifier 2 (which was trained without any suicide-related questions or responses) implied that decomposing intra-individual heterogeneity alone could reduce the task difficulty for the model, without the need to elicit sensitive suicide-related memories. 
One possible explanation for the non-significant improvement in the non-alexithymia group could be the high intra-individual difference in language use across conversation topics among alexithymia individuals (compared to non-alexithymia individuals), suggesting that a factorization model could yield greater benefits for the alexithymia group. Indeed, studies have demonstrated that the communication deficit of alexithymia individuals might be domain-specific rather than domain-general (Wang et al., 2022).\u003c/p\u003e\u003cp\u003eOverall, the significance of such subgroup disparity in topic-general classifier 1 implied the need to enhance the generalizability and equity of AI detection models for suicidal ideation. In addition, clinicians should interpret the detection results of AI models with attention to patient characteristics and contextual information, as these could affect the accuracy of the results. However, as informed by the findings of this study, offloading the task difficulty by decomposing data complexity could be a promising solution. More importantly, the proposed solution required no patient information to train the model, thereby preserving confidentiality and accommodating situations where patient information is inaccessible (e.g., social media data).\u003c/p\u003e\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\u003ch2\u003eLimitations \u0026amp; Future Directions\u003c/h2\u003e\u003cp\u003eSome limitations of the study should be noted. The mediation analysis explored language only at the lexical level (in terms of word frequency). As all mediators found in the current study only partially mediated the effect of alexithymia on performance metrics, other language features such as syntactic features (e.g., grammatical structure, word dependency) might also play an explanatory role in the observed subgroup disparity. 
Even at the lexical level, not only the frequency of words matters but also their complexity and valence. Studies have increasingly revealed that while alexithymia individuals demonstrated a comparable frequency of emotional word use, their use of emotional words was less nuanced than that of non-alexithymia individuals, indicating a qualitative rather than quantitative difference in their language use (Wotschack \u0026amp; Klann-Delius, \u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e2013\u003c/span\u003e). Similarly, the valence of the same words can differ substantially; for example, social-related words can represent either social conflict or social support and connection. Future studies may adopt a mixed-methods approach to investigate different language features.\u003c/p\u003e\u003c/div\u003e"},{"header":"Conclusion","content":"\u003cp\u003eThe current study revealed a subgroup disparity between alexithymia and non-alexithymia groups in AI detection models for suicidal ideation. This subgroup disparity could be partially explained by specific language features and largely reduced by training a factorization model with decomposed intra-individual data heterogeneity (i.e., topic-specific classifiers). These findings provide valuable insights into the development of more equitable and generalizable AI models for suicidal ideation detection.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003ch2\u003eCompeting Interests\u003c/h2\u003e\u003cp\u003eYKW received personal fees from Eisai Co. for lectures and travel support from Lundbeck HK Limited and Aculys Pharma, Japan. JWYC received a personal fee from Eisai Co., Ltd and travel support from Lundbeck HK Limited for an overseas conference. 
All other authors declare no financial or non-financial competing interests.\u003c/p\u003e\u003c/p\u003e\u003ch2\u003eFunding\u003c/h2\u003e\u003cp\u003eThis work was supported by the Health and Medical Research Fund (09203066 and 21220821), General Research Fund (14106223), Innovation and Technology Support Programme (ITS/178/22), CUHK Direct Grant for Research (2022.073 and 2024.061), CUHK IdeaBooster Fund (IDBF24MED15), and CUHK Improvement on Competitiveness in Hiring New Faculties Funding Scheme (371). CL was supported by the Faculty Postdoctoral Fellowship Scheme of the Chinese University of Hong Kong (FPFS/23\u0026ndash;24/024).\u003c/p\u003e\u003ch2\u003eAuthor Contribution\u003c/h2\u003e\u003cp\u003eConceptualization, Y.K.W. and T.M.H.L.; Data curation, J.C. and K.Y.C.; Formal analysis, R.H.; Funding acquisition, Y.K.W. and T.M.H.L.; Writing\u0026mdash;original draft, R.H.; Writing\u0026mdash;review and editing, R.H., L.X., C.C.W.C., J.C., K.Y.C., C.L., J.W.Y.C., S.W.H.C., N.Y.C., B.H., Y.K.W. and T.M.H.L. All authors have read and agreed to the published version of the manuscript.\u003c/p\u003e\u003ch2\u003eData Availability\u003c/h2\u003e\u003cp\u003eThe datasets generated and/or analysed during the current study and the underlying code are not publicly available due to [MASK FOR REVIEW] but are available from the corresponding author on reasonable request.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eBoyd, R. L., Ashokkumar, A., Seraj, S., \u0026amp; Pennebaker, J. W. (2022). \u003cem\u003eThe Development and Psychometric Properties of LIWC-22\u003c/em\u003e. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.13140/RG.2.2.23890.43205\u003c/span\u003e\u003cspan address=\"10.13140/RG.2.2.23890.43205\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCamia, C., Desmedt, O., \u0026amp; Luminet, O. (2020). Exploring autobiographical memory specificity and narrative emotional processing in alexithymia. \u003cem\u003eNarrative Inquiry\u003c/em\u003e, \u003cem\u003e30\u003c/em\u003e(1), 59\u0026ndash;79. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1075/ni.18089.kob\u003c/span\u003e\u003cspan address=\"10.1075/ni.18089.kob\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChan, J. W., Lam, S., Li, S. X., Chau, S. W., Chan, S., Chan, N., Zhang, J., \u0026amp; Wing, Y. (2022). Adjunctive bright light treatment with gradual advance in unipolar major depressive disorder with evening chronotype \u0026ndash; A randomized controlled trial. \u003cem\u003ePsychological Medicine\u003c/em\u003e, \u003cem\u003e52\u003c/em\u003e(8), 1448\u0026ndash;1457. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1017/S0033291720003232\u003c/span\u003e\u003cspan address=\"10.1017/S0033291720003232\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen, J., Chan, N. Y., Li, C.-T., Chan, J. W. Y., Liu, Y., Li, S. X., Chau, S. W. H., Leung, K. S., Heng, P.-A., Lee, T. M. C., Li, T. M. H., \u0026amp; Wing, Y.-K. (2024). Multimodal digital assessment of depression with actigraphy and app in Hong Kong Chinese. \u003cem\u003eTranslational Psychiatry\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(1), 150. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41398-024-02873-4\u003c/span\u003e\u003cspan address=\"10.1038/s41398-024-02873-4\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen, J., Li, C.-T., Chan, N. Y., Chen, C. X., Chen, S., Chan, J. W. Y., Liu, Y., Li, S. X., Chau, S. W. H., Leung, K. S., Heng, P.-A., Lee, T. M. C., Li, T. M. H., \u0026amp; Wing, Y.-K. (under review). Capturing omega sign in the clinical assessment of depression by deep learning. \u003cem\u003eUnder Review\u003c/em\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCheng, Q., Li, T. M., Kwok, C.-L., Zhu, T., \u0026amp; Yip, P. S. (2017). Assessing Suicide Risk and Emotional Distress in Chinese Social Media: A Text Mining and Machine Learning Study. \u003cem\u003eJournal of Medical Internet Research\u003c/em\u003e, \u003cem\u003e19\u003c/em\u003e(7), e7276. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/jmir.7276\u003c/span\u003e\u003cspan address=\"10.2196/jmir.7276\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCoppersmith, G., Ngo, K., Leary, R., \u0026amp; Wood, A. (2016). Exploratory Analysis of Social Media Prior to a Suicide Attempt. In K. Hollingshead \u0026amp; L. Ungar (Eds), \u003cem\u003eProceedings of the Third Workshop on Computational Linguistics and Clinical Psychology\u003c/em\u003e (pp. 106\u0026ndash;117). Association for Computational Linguistics. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.18653/v1/W16-0311\u003c/span\u003e\u003cspan address=\"10.18653/v1/W16-0311\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDe Berardis, D., Fornaro, M., Orsolini, L., Valchera, A., Carano, A., Vellante, F., Perna, G., Serafini, G., Gonda, X., Pompili, M., Martinotti, G., \u0026amp; Di Giannantonio, M. (2017). Alexithymia and Suicide Risk in Psychiatric Disorders: A Mini-Review. \u003cem\u003eFrontiers in Psychiatry\u003c/em\u003e, \u003cem\u003e8\u003c/em\u003e. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fpsyt.2017.00148\u003c/span\u003e\u003cspan address=\"10.3389/fpsyt.2017.00148\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFu, Z., Hsu, Y. C., Chan, C. S., Lau, C. M., Liu, J., \u0026amp; Yip, P. S. F. (2024). Efficacy of ChatGPT in Cantonese Sentiment Analysis: Comparative Study. \u003cem\u003eJournal of Medical Internet Research\u003c/em\u003e, \u003cem\u003e26\u003c/em\u003e(1), e51069. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/51069\u003c/span\u003e\u003cspan address=\"10.2196/51069\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHoman, S., Gabi, M., Klee, N., Bachmann, S., Moser, A.-M., Duri\u0026rsquo;, M., Michel, S., Bertram, A.-M., Maatz, A., Seiler, G., Stark, E., \u0026amp; Kleim, B. (2022). Linguistic features of suicidal thoughts and behaviors: A systematic review. \u003cem\u003eClinical Psychology Review\u003c/em\u003e, \u003cem\u003e95\u003c/em\u003e, 102161. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.cpr.2022.102161\u003c/span\u003e\u003cspan address=\"10.1016/j.cpr.2022.102161\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHu, L., \u0026amp; Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. \u003cem\u003eStructural Equation Modeling: A Multidisciplinary Journal\u003c/em\u003e, \u003cem\u003e6\u003c/em\u003e(1), 1\u0026ndash;55. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/10705519909540118\u003c/span\u003e\u003cspan address=\"10.1080/10705519909540118\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eHuang, R., Yi, S., Chen, J., Chan, K. Y., Chan, J. W. Y., Chan, N. Y., Li, S. X., Wing, Y. K., \u0026amp; Li, T. M. H. (2024). Exploring the Role of First-Person Singular Pronouns in Detecting Suicidal Ideation: A Machine Learning Analysis of Clinical Transcripts. \u003cem\u003eBehavioral Sciences\u003c/em\u003e, \u003cem\u003e14\u003c/em\u003e(3), Article 3. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/bs14030225\u003c/span\u003e\u003cspan address=\"10.3390/bs14030225\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJi, S., Pan, S., Li, X., Cambria, E., Long, G., \u0026amp; Huang, Z. (2021). Suicidal Ideation Detection: A Review of Machine Learning Methods and Applications. \u003cem\u003eIEEE Transactions on Computational Social Systems\u003c/em\u003e, \u003cem\u003e8\u003c/em\u003e(1), 214\u0026ndash;226. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/TCSS.2020.3021467\u003c/span\u003e\u003cspan address=\"10.1109/TCSS.2020.3021467\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKim, K., Choi, S., Lee, J., \u0026amp; Sea, J. (2019). Differences in linguistic and psychological characteristics between suicide notes and diaries. \u003cem\u003eThe Journal of General Psychology\u003c/em\u003e, \u003cem\u003e146\u003c/em\u003e(4), 391\u0026ndash;416. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1080/00221309.2019.1590304\u003c/span\u003e\u003cspan address=\"10.1080/00221309.2019.1590304\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLam, C., Xian, L., Huang, R., Chen, J., Chan, K. Y., Chan, J. W. Y., Chau, S. W. H., Chan, N. Y., Li, S. X., Wing, Y.-K., \u0026amp; Li, T. M. H. (under review). Do patients or AI know better about depressive symptoms? A deep learning study of individuals with and without alexithymia. \u003cem\u003eUnder Review\u003c/em\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLee, D., \u0026amp; Seung, H. S. (2000). Algorithms for Non-negative Matrix Factorization. \u003cem\u003eAdvances in Neural Information Processing Systems\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://proceedings.neurips.cc/paper_files/paper/2000/hash/f9d1152547c0bde01830b7e8bd60024c-Abstract.html\u003c/span\u003e\u003cspan address=\"https://proceedings.neurips.cc/paper_files/paper/2000/hash/f9d1152547c0bde01830b7e8bd60024c-Abstract.html\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLekkas, D., Klein, R. J., \u0026amp; Jacobson, N. C. (2021). Predicting acute suicidal ideation on Instagram using ensemble machine learning models. \u003cem\u003eInternet Interventions\u003c/em\u003e, \u003cem\u003e25\u003c/em\u003e, 100424. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.invent.2021.100424\u003c/span\u003e\u003cspan address=\"10.1016/j.invent.2021.100424\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLi, T. M. H., Chen, J., Law, F. O. C., Li, C.-T., Chan, N. Y., Chan, J. W. Y., Chau, S. W. H., Liu, Y., Li, S. X., Zhang, J., Leung, K.-S., \u0026amp; Wing, Y.-K. (2023). Detection of Suicidal Ideation in Clinical Interviews for Depression Using Natural Language Processing and Machine Learning: Cross-Sectional Study. \u003cem\u003eJMIR Medical Informatics\u003c/em\u003e, \u003cem\u003e11\u003c/em\u003e(1), e50221. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.2196/50221\u003c/span\u003e\u003cspan address=\"10.2196/50221\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLibin, A., Treitler, J. T., Vasaitis, T., \u0026amp; Shao, Y. (2024). Evaluating and Reducing Subgroup Disparity in AI Models: An Analysis of Pediatric COVID-19 Test Outcomes. 
\u003cem\u003emedRxiv: The Preprint Server for Health Sciences\u003c/em\u003e, 2024.09.18.24313889. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1101/2024.09.18.24313889\u003c/span\u003e\u003cspan address=\"10.1101/2024.09.18.24313889\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMalgaroli, M., Hull, T. D., Zech, J. M., \u0026amp; Althoff, T. (2023). Natural language processing for mental health interventions: A systematic review and research framework. \u003cem\u003eTranslational Psychiatry\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e(1), 309. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41398-023-02592-2\u003c/span\u003e\u003cspan address=\"10.1038/s41398-023-02592-2\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMeganck, R., Vanheule, S., \u0026amp; Desmet, M. (2008). Does the TAS-20 measure alexithymia? A study of natural language use. \u003cem\u003e28th European Conference on Psychosomatic Research, Abstracts\u003c/em\u003e. 28th European Conference on Psychosomatic Research (ECPR \u0026ndash;\u0026thinsp;2012). \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttp://hdl.handle.net/1854/LU-3036354\u003c/span\u003e\u003cspan address=\"http://hdl.handle.net/1854/LU-3036354\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMeganck, R., Vanheule, S., Inslegers, R., \u0026amp; Desmet, M. (2009). Alexithymia and interpersonal problems: A study of natural language use. \u003cem\u003ePersonality and Individual Differences\u003c/em\u003e, \u003cem\u003e47\u003c/em\u003e(8), 990\u0026ndash;995. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.paid.2009.08.005\u003c/span\u003e\u003cspan address=\"10.1016/j.paid.2009.08.005\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNguyen, T., O\u0026rsquo;Dea, B., Larsen, M., Phung, D., Venkatesh, S., \u0026amp; Christensen, H. (2017). Using linguistic and topic analysis to classify sub-groups of online depression communities. \u003cem\u003eMultimedia Tools and Applications\u003c/em\u003e, \u003cem\u003e76\u003c/em\u003e(8), 10653\u0026ndash;10676. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1007/s11042-015-3128-x\u003c/span\u003e\u003cspan address=\"10.1007/s11042-015-3128-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePennebaker, J. W., \u0026amp; King, L. A. (1999). Linguistic styles: Language use as an individual difference. \u003cem\u003eJournal of Personality and Social Psychology\u003c/em\u003e, \u003cem\u003e77\u003c/em\u003e(6), 1296\u0026ndash;1312. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1037/0022-3514.77.6.1296\u003c/span\u003e\u003cspan address=\"10.1037/0022-3514.77.6.1296\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSpitzer, C., Siebel-J\u0026uuml;rges, U., Barnow, S., Grabe, H. J., \u0026amp; Freyberger, H. J. (2005). Alexithymia and Interpersonal Problems. \u003cem\u003ePsychotherapy and Psychosomatics\u003c/em\u003e, \u003cem\u003e74\u003c/em\u003e(4), 240\u0026ndash;246. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1159/000085148\u003c/span\u003e\u003cspan address=\"10.1159/000085148\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTadesse, M. M., Lin, H., Xu, B., \u0026amp; Yang, L. (2020). Detection of Suicide Ideation in Social Media Forums Using Deep Learning. \u003cem\u003eAlgorithms\u003c/em\u003e, \u003cem\u003e13\u003c/em\u003e(1), Article 1. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/a13010007\u003c/span\u003e\u003cspan address=\"10.3390/a13010007\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVioul\u0026egrave;s, M. J., Moulahi, B., Az\u0026eacute;, J., \u0026amp; Bringay, S. (2018). Detection of suicide-related posts in Twitter data streams. \u003cem\u003eIBM Journal of Research and Development\u003c/em\u003e, \u003cem\u003e62\u003c/em\u003e(1), 7:1\u0026ndash;7:12. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1147/JRD.2017.2768678\u003c/span\u003e\u003cspan address=\"10.1147/JRD.2017.2768678\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWelding, C., \u0026amp; Samur, D. (2018). Language Processing in Alexithymia. In O. Luminet, R. M. Bagby, \u0026amp; G. J. Taylor (Eds), \u003cem\u003eAlexithymia: Advances in Research, Theory, and Clinical Practice\u003c/em\u003e (1st edn, pp. 90\u0026ndash;104). Cambridge University Press. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1017/9781108241595.008\u003c/span\u003e\u003cspan address=\"10.1017/9781108241595.008\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWHO. (2021). \u003cem\u003eSuicide Worldwide In 2019: Global Health Estimates\u003c/em\u003e (1st ed). World Health Organization.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWotschack, C., \u0026amp; Klann-Delius, G. (2013). Alexithymia and the conceptualization of emotions: A study of language use and semantic knowledge. \u003cem\u003eJournal of Research in Personality\u003c/em\u003e, \u003cem\u003e47\u003c/em\u003e(5), 514\u0026ndash;523. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.jrp.2013.01.011\u003c/span\u003e\u003cspan address=\"10.1016/j.jrp.2013.01.011\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhang, T., Schoene, A. M., \u0026amp; Ananiadou, S. (2021). Automatic identification of suicide notes with a transformer-based deep learning model. \u003cem\u003eInternet Interventions\u003c/em\u003e, \u003cem\u003e25\u003c/em\u003e, 100422. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.invent.2021.100422\u003c/span\u003e\u003cspan address=\"10.1016/j.invent.2021.100422\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhou, Y., Liao, L., Gao, Y., Wang, R., \u0026amp; Huang, H. (2023). TopicBERT: A Topic-Enhanced Neural Language Model Fine-Tuned for Sentiment Classification. \u003cem\u003eIEEE Transactions on Neural Networks and Learning Systems\u003c/em\u003e, \u003cem\u003e34\u003c/em\u003e(1), 380\u0026ndash;393. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/TNNLS.2021.3094987\u003c/span\u003e\u003cspan address=\"10.1109/TNNLS.2021.3094987\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eZhu, X., Yi, J., Yao, S., Ryder, A. G., Taylor, G. J., \u0026amp; Bagby, R. M. (2007). Cross-cultural validation of a Chinese translation of the 20-item Toronto Alexithymia Scale. \u003cem\u003eComprehensive Psychiatry\u003c/em\u003e, \u003cem\u003e48\u003c/em\u003e(5), 489\u0026ndash;496. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.comppsych.2007.04.007\u003c/span\u003e\u003cspan address=\"10.1016/j.comppsych.2007.04.007\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":false,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"npj-digital-medicine","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"npjdigitalmed","sideBox":"Learn more about [npj Digital Medicine](http://www.nature.com/npjdigitalmed/)","snPcode":"41746","submissionUrl":"https://submission.springernature.com/new-submission/41746/3","title":"npj Digital 
Medicine","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"NPJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"","lastPublishedDoi":"10.21203/rs.3.rs-7657467/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7657467/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eObjectives\u003c/h2\u003e\u003cp\u003eWith the recent advancement of artificial intelligence (AI) and large language models (LLMs), the use of text analysis to detect suicidal ideation can be a promising tool. However, the performance of such detection system could be influenced by the language use difference caused by individuals\u0026rsquo; alexithymic characteristics (difficulties in expressing emotion with unique language pattern), resulting in the subgroup disparity. The current study aims to explore the capability of a detection system on a clinical sample of heterogeneous language use (i.e., systematic difference in language use as influenced by patient characteristics and the language context).\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e\u003cp\u003eAI models (classifiers) were trained with 5-fold cross-validation using clinical transcripts of 299 individuals (n\u0026thinsp;=\u0026thinsp;193 with major depressive disorder and 106 controls without psychiatric problems) to detect suicidal ideation. More specifically, the topic-general classifier was trained using full clinical transcripts while the topic-specific classifiers (i.e., factorization models) were trained using specific sections of the clinical transcripts, focusing on either mood-related or suicide-specific topics. The performance of the classifiers was assessed in both groups (alexithymia and non-alexithymia) and whole sample. 
Mediation analyses were conducted to further investigate the role of language features in explaining the subgroup disparity.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e\u003cp\u003eThe topic-general classifier showed a subgroup disparity between the alexithymia and non-alexithymia groups: the alexithymia group was associated with a decreased likelihood of true detection of suicidal ideation (OR\u0026thinsp;=\u0026thinsp;0.31, p\u0026thinsp;\u0026lt;\u0026thinsp;.001), and unique language features, such as family-related words (p\u0026thinsp;=\u0026thinsp;.02), played a mediating/explanatory role. Furthermore, the topic-specific classifiers demonstrated superior performance (AUC\u0026thinsp;=\u0026thinsp;0.96) compared to the topic-general classifier (AUC\u0026thinsp;=\u0026thinsp;0.83), and the subgroup disparity was largely reduced.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e\u003cp\u003eModels trained on a heterogeneous clinical population may not be equitably effective in detecting suicidal ideation in patient groups with and without alexithymia. The development of a factorization model is pertinent to enhancing generalizability and equity, especially when patient characteristics are inaccessible or confidential for model training. 
Meanwhile, clinicians should interpret model predictions with caution due to the influence that patient characteristics might have on the model performance.\u003c/p\u003e","manuscriptTitle":"Exploring the Generalizability and Explainability of LLMs in Detecting Suicidal Ideation: The Impact of Data Heterogeneity","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-11-12 02:20:38","doi":"10.21203/rs.3.rs-7657467/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2025-12-16T17:54:55+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-12-10T08:00:27+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-12-01T17:28:48+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"307534701801499374807066511200052202906","date":"2025-11-28T16:48:11+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"128105454911391654272181943376215664910","date":"2025-11-25T13:39:27+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"183154248765278265152958668344822194954","date":"2025-11-05T01:48:51+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-10-31T00:22:37+00:00","index":"","fulltext":""},{"type":"editorAssigned","content":"","date":"2025-10-21T19:43:23+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-10-21T15:13:44+00:00","index":"","fulltext":""},{"type":"submitted","content":"npj Digital Medicine","date":"2025-09-19T10:14:18+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"npj-digital-medicine","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"npjdigitalmed","sideBox":"Learn more about [npj Digital 
Medicine](http://www.nature.com/npjdigitalmed/)","snPcode":"41746","submissionUrl":"https://submission.springernature.com/new-submission/41746/3","title":"npj Digital Medicine","twitterHandle":"","acdcEnabled":true,"dfaEnabled":true,"editorialSystem":"stoa","reportingPortfolio":"NPJ","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"6483b283-2762-40b9-a21e-6d210566da01","owner":[],"postedDate":"November 12th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"under-review","subjectAreas":[{"id":57687987,"name":"Health sciences/Health care"},{"id":57687988,"name":"Biological sciences/Psychology"},{"id":57687989,"name":"Social science/Psychology"}],"tags":[],"updatedAt":"2026-04-21T03:38:29+00:00","versionOfRecord":[],"versionCreatedAt":"2025-11-12 02:20:38","video":"","vorDoi":"","vorDoiUrl":"","workflowStages":[]},"version":"v1","identity":"rs-7657467","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7657467","identity":"rs-7657467","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
