Machine learning models for predicting work-related sickness absence due to mental disorders using national surveillance data in Brazil

doi:10.21203/rs.3.rs-8339860/v1


2025 · doi:10.21203/rs.3.rs-8339860/v1


Research Article

Beatriz Queiroz Reis, Letícia Martins Raposo

This is a preprint; it has not been peer reviewed by a journal.

https://doi.org/10.21203/rs.3.rs-8339860/v1

This work is licensed under a CC BY 4.0 License.

Status: Posted (Version 1)

Abstract

Purpose
To develop and compare supervised machine learning models to predict sickness absence among workers notified with work-related mental disorders in the Brazilian National Notifiable Diseases Information System (SINAN), and to identify the most influential predictors associated with this outcome.

Methods
A cross-sectional study was conducted using SINAN records from 2006–2024.
The analytical sample comprised 4,217 workers aged ≥ 18 years with ICD-10 mental or behavioral disorders (F00–F99, Z73.0). Three supervised algorithms (Decision Tree, Random Forest, and Extreme Gradient Boosting [XGBoost]) were trained using an 80/20 stratified split. Performance was evaluated using accuracy, sensitivity, specificity, precision, F1-score, and AUC-ROC, accompanied by 95% confidence intervals. Model interpretability and feature importance were assessed using SHAP values.

Results
The three models exhibited comparable performance, with overlapping 95% CIs. AUC-ROC values ranged from 0.697 (Decision Tree) to 0.745 (XGBoost), and accuracy ranged from 0.665 to 0.691. SHAP analyses identified structural and service-related variables, specifically the issuance of work accident reports, referral to psychosocial care centers, geographic region, psychotropic medication use, and employment status, as the primary drivers of prediction.

Conclusions
Supervised machine learning models demonstrated robust predictive capacity and represent promising tools for occupational health surveillance. Predictions within the SINAN context were driven primarily by structural and organizational factors rather than individual characteristics, underscoring the critical role of institutional and territorial determinants in work-related mental health outcomes.

Keywords: Occupational Health; Mental Disorders; Sick Leave; Machine Learning; Epidemiological Monitoring; Occupational Exposure

1. Introduction

Work-related mental disorders have become a growing public health concern, generating substantial societal and economic impacts through sickness absence, reduced productivity, and increased healthcare demands (Axén et al. 2020; Rugulies et al. 2023). Sickness absence serves as an indicator of psychological distress and shortcomings in primary prevention systems (Terluin et al. 2011; Halonen et al. 2020).
In this context, the early identification of workers at high risk is essential for strengthening occupational health surveillance and informing preventive interventions.

Machine learning (ML) techniques offer promising opportunities for predicting work-related mental health outcomes by identifying risk profiles and supporting evidence-based decision-making. Prior studies have demonstrated the effectiveness of supervised algorithms, particularly tree-based models, in selecting relevant predictors of mental health risk. For example, Katarya and Maan (2020) reported the strong performance of Decision Tree and Random Forest models for identifying factors such as family history, prior symptoms, and occupational context. Similarly, Reddy et al. (2018) applied a boosting model to IT professionals and achieved high accuracy in predicting stress, highlighting gender, family history, and psychological support as determinants of vulnerability.

Despite these methodological advances, few investigations have focused on the Brazilian context or utilized national epidemiological surveillance systems, such as the Brazilian National Notifiable Diseases Information System (SINAN). Most international studies rely on restricted occupational groups and do not capture the heterogeneity of employment conditions and work arrangements in Brazil. This gap limits the generalizability of the findings and underscores the need for predictive approaches based on large-scale national data.

The present study addresses this gap by applying three supervised ML models, Decision Tree, Random Forest, and Extreme Gradient Boosting (XGBoost), to predict sickness absence due to work-related mental disorders using secondary SINAN data. We hypothesized that tree-based algorithms would identify key demographic, occupational, and clinical predictors associated with work disability.
The rationale is that integrating predictive analytics into epidemiological surveillance may enhance the early detection of high-risk groups and support the development of more effective prevention strategies in occupational mental health.

1. Methods

1.1. Study design and setting

This observational, cross-sectional study utilized secondary data extracted from the SINAN. The dataset comprised notifications recorded between 2006 and 2024. The primary aim was to develop and evaluate ML models capable of predicting sickness absence among workers diagnosed with work-related mental and behavioral disorders.

1.2. Participants

The target population consisted of workers aged ≥ 18 years with diagnoses compatible with work-related mental disorders, identified through International Classification of Diseases, 10th Revision (ICD-10) codes F00–F99 or Z73.0. Records lacking information on the outcome variable or containing inconsistencies in essential predictors were excluded. The outcome was defined as a binary variable indicating the occurrence of sickness absence (“Yes” vs. “No”). The class distribution was relatively balanced, with 60% of cases being positive. Consequently, no resampling or balancing techniques were applied, as the objective was to evaluate model performance under the natural class distribution.

The initial dataset contained 8,435 records. After excluding records with missing outcome data (n = 584) and those with missing or inconsistent predictor information (n = 3,634), the final analytical sample comprised 4,217 observations (Fig. 1). All procedures, including preprocessing, model development, hyperparameter tuning, and visualization, were conducted in R version 4.5.1 using RStudio.

1.3. Variables and measurements

Predictor variables were selected based on epidemiological plausibility, availability in SINAN forms, and potential relevance to sickness absence.
The variables included:

Demographic: Age group (18–29; 30–39; 40–49; 50–59; ≥60); Biological sex (female; male); Race/ethnicity (White; Black/Mixed; Asian/Indigenous); Educational level (up to primary; secondary; higher education).
Occupational: Notification period (pre-pandemic; pandemic); Geographic region (North; Northeast; Central-West; Southeast; South); Employment status (formal employee; public servant; other); Outsourced employment (yes; no); Occupational group (based on CBO 2002: administrative services; agriculture/forestry/fishing; commerce; industrial goods and services; mid-level technicians; military/police/firefighters; public administration; repair and maintenance; science/arts).
Clinical/Behavioral: Alcohol use (yes; no); Tobacco use (yes; no); Illicit drug use (yes; no); Psychotropic medication use (yes; no); Other workers affected by the same event (yes; no); Referred to psychosocial care centers (yes; no); Work accident report (yes; no).
Diagnosis: Category (burnout syndrome; personality disorders; mood disorders; schizophrenic spectrum disorders; neurotic/stress-related disorders; unspecified mental disorder); Treatment regime (outpatient; hospital).

1.4. Data preprocessing

Preprocessing steps ensured analytical consistency. Irrelevant administrative fields and identifiers were removed. Categorical variables were recoded to reduce sparsity and improve interpretability; specifically, rare categories within educational level and employment status were merged into broader groups (“up to primary education” and “other”, respectively). ICD-10 diagnoses were aggregated into major mental disorder groups. The dataset was split into training (80%) and testing (20%) subsets using stratified sampling to maintain outcome distributions across sets.

1.5. Bias and confounding control

Selection bias was addressed by excluding records with missing essential information while preserving the representativeness of the SINAN database.
Information bias was mitigated by using standardized variables from a national surveillance system with established reporting protocols. Confounding was managed intrinsically through the use of ML models capable of capturing nonlinear relationships and interactions among predictors. Tree-based algorithms naturally account for multicollinearity and complex dependencies. Overfitting was minimized through stratified 10-fold cross-validation, hyperparameter tuning, and evaluation on an independent test set.

1.6. Sample size and power

The final analytical sample of 4,217 notifications exceeded the minimum requirements typically recommended for ML classification tasks with moderate dimensionality. Given the number of predictors and the outcome prevalence (60%), the study possessed adequate statistical power to support model training, tuning, and internal validation. Tree-based ensemble models, such as Random Forest and XGBoost, have demonstrated stable performance with sample sizes of this magnitude in epidemiological applications.

1.7. Modelling

Three supervised ML algorithms were implemented for binary classification: Decision Tree, Random Forest, and XGBoost.

The Decision Tree was implemented using the rpart method in the caret package (Therneau and Atkinson 2025). The Gini impurity index was used as the splitting criterion. The complexity parameter (cp), which controls pruning, was tuned via cross-validation. The Random Forest model was fitted using the randomForest engine in caret (Liaw and Wiener 2002). Hyperparameter optimization focused on mtry (predictors sampled at each split). The XGBoost algorithm was implemented using the xgbTree method in caret. Tuning involved adjusting the learning rate (eta), maximum tree depth, boosting iterations (nrounds), L1 (α) and L2 (λ) regularization parameters, and subsampling rate (Mienye and Jere 2024). Model training was parallelized using the doParallel package (Corporation and Weston 2022).
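To make the splitting criterion concrete, the following is a minimal sketch of how Gini impurity and the impurity decrease of a candidate split are computed. It is written in plain Python rather than the R/caret workflow used in the study, the function names are illustrative, and the child-node counts are hypothetical; only the 60/40 outcome prevalence comes from the text.

```python
def gini_impurity(counts):
    """Gini impurity of a node given its per-class record counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_decrease(parent, children):
    """Impurity decrease: parent impurity minus size-weighted child impurity."""
    n = sum(parent)
    weighted = sum(sum(child) / n * gini_impurity(child) for child in children)
    return gini_impurity(parent) - weighted


# Root node mirroring the reported outcome prevalence (60% "Yes", 40% "No").
root = [60, 40]

# HYPOTHETICAL binary split, for illustration only (not taken from the study).
left, right = [45, 15], [15, 25]

print(f"root impurity: {gini_impurity(root):.3f}")
print(f"impurity decrease: {gini_decrease(root, [left, right]):.4f}")
```

Under this criterion a tree prefers, at each node, the candidate split with the largest impurity decrease; the complexity parameter (cp) described above then governs how much improvement a split must bring to be retained.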
All models were trained using a randomized hyperparameter search (tuneLength = 1000), combined with stratified 10-fold cross-validation to ensure robust out-of-sample performance estimation. Model interpretability was examined using SHAP (SHapley Additive exPlanations) values computed with fastshap (Greenwell 2024) and visualized using shapviz (Mayer 2025).

Performance was evaluated using accuracy, sensitivity (recall), specificity, precision (positive predictive value), F1-score, and the area under the receiver operating characteristic curve (AUC-ROC). A fixed decision threshold of 0.5 was applied. The AUC-ROC was estimated using the pROC package (Robin et al. 2011). All metrics were accompanied by 95% confidence intervals, computed via stratified bootstrap resampling (2,000 iterations) using functions from caret, pROC, and boot (Canty and Ripley 2025).

1.8. Ethical Considerations

The study relied exclusively on fully anonymized, publicly accessible secondary data from SINAN. In accordance with Resolution No. 510/2016 of the Brazilian National Health Council, research utilizing public anonymized data is exempt from ethical review.

2. Results

2.1. Participants

The age distribution was concentrated in the 30–39 (34%) and 40–49 (31%) groups. Female workers comprised 66% of the sample. Regarding race/ethnicity, White participants represented 56%, followed by Black/Mixed race individuals (42%). The majority of participants resided in the Southeast region (45%). Educational attainment was balanced between secondary (44%) and higher education (43%). The notification period was evenly distributed (pandemic: 52%; pre-pandemic: 48%). Regarding occupational groups, commerce was the most frequent (24%), followed by administrative services (20%). Employment was predominantly formal (66%) and non-outsourced (93%).
Most patients received outpatient care (96%).

Clinical and behavioral variables showed high frequencies of negative responses for alcohol use (91% “no”), illicit drug use (86% “no”), and smoking (91% “no”). Conversely, notable proportions reported psychotropic medication use (53%) and referral to psychosocial care centers (68%), consistent with greater functional severity or specialized care needs. The outcome of interest, sickness absence, was observed in 60% of cases. Workplace context data indicated that 62% of reports involved other workers affected by the same event. Among aggregated ICD-10 diagnoses, neurotic, stress-related, and somatoform disorders predominated (65%). No statistically significant differences were detected between the training and test distributions across covariates, confirming the preservation of representativeness (Table 1).

Table 1 Descriptive characteristics and train–test comparison for all variables (n = 4,217).

Variable | Total (N = 4,217) | Test (N = 844) | Train (N = 3,373) | p-value
Age group | | | | 0.63
  18–29 | 859 (20%) | 186 (22%) | 673 (20%) |
  30–39 | 1419 (34%) | 287 (34%) | 1132 (34%) |
  40–49 | 1289 (31%) | 248 (29%) | 1041 (31%) |
  50–59 | 582 (14%) | 109 (13%) | 473 (14%) |
  60 or older | 68 (1.6%) | 14 (1.7%) | 54 (1.6%) |
Biological sex | | | | 0.81
  Female | 2798 (66%) | 557 (66%) | 2241 (66%) |
  Male | 1419 (34%) | 287 (34%) | 1132 (34%) |
Race/ethnicity | | | | 0.054
  White | 2364 (56%) | 449 (53%) | 1915 (57%) |
  Black/Brown | 1754 (42%) | 368 (44%) | 1386 (41%) |
  Asian/Indigenous | 99 (2.3%) | 27 (3.2%) | 72 (2.1%) |
Educational level | | | | 0.83
  Up to primary | 546 (13%) | 104 (12%) | 442 (13%) |
  Secondary | 1850 (44%) | 373 (44%) | 1477 (44%) |
  Higher education | 1821 (43%) | 367 (43%) | 1454 (43%) |
Notification period | | | | 0.78
  Pre-pandemic | 2045 (48%) | 413 (49%) | 1632 (48%) |
  Pandemic | 2172 (52%) | 431 (51%) | 1741 (52%) |
Geographic region | | | | 0.058
  North | 208 (4.9%) | 48 (5.7%) | 160 (4.7%) |
  Northeast | 1227 (29%) | 263 (31%) | 964 (29%) |
  Central-West | 253 (6.0%) | 62 (7.3%) | 191 (5.7%) |
  Southeast | 1889 (45%) | 355 (42%) | 1534 (45%) |
  South | 640 (15%) | 116 (14%) | 524 (16%) |
Employment status | | | | 0.68
  Formal employee | 2792 (66%) | 549 (65%) | 2243 (66%) |
  Public servant | 1283 (30%) | 264 (31%) | 1019 (30%) |
  Others | 142 (3.4%) | 31 (3.7%) | 111 (3.3%) |
Outsourced employment | | | | 0.74
  No | 3916 (93%) | 786 (93%) | 3130 (93%) |
  Yes | 301 (7.1%) | 58 (6.9%) | 243 (7.2%) |
Occupational group | | | | 0.23
  Administrative services | 832 (20%) | 155 (18%) | 677 (20%) |
  Agriculture/forestry/fishing | 29 (0.7%) | 12 (1.4%) | 17 (0.5%) |
  Commerce | 1033 (24%) | 204 (24%) | 829 (25%) |
  Industrial goods and services | 615 (15%) | 129 (15%) | 486 (14%) |
  Mid-level technicians | 558 (13%) | 110 (13%) | 448 (13%) |
  Military/police/firefighters | 16 (0.4%) | 2 (0.2%) | 14 (0.4%) |
  Public administration | 243 (5.8%) | 55 (6.5%) | 188 (5.6%) |
  Repair and maintenance | 91 (2.2%) | 17 (2.0%) | 74 (2.2%) |
  Science/arts | 800 (19%) | 160 (19%) | 640 (19%) |
Alcohol use | | | | 0.53
  No | 3826 (91%) | 761 (90%) | 3065 (91%) |
  Yes | 391 (9.3%) | 83 (9.8%) | 308 (9.1%) |
Tobacco use | | | | 0.72
  No | 3821 (91%) | 762 (90%) | 3059 (91%) |
  Yes | 396 (9.4%) | 82 (9.7%) | 314 (9.3%) |
Illicit drug use | | | | 0.56
  No | 3614 (86%) | 718 (85%) | 2896 (86%) |
  Yes | 603 (14%) | 126 (15%) | 477 (14%) |
Psychotropic medication use | | | | 0.95
  No | 1968 (47%) | 393 (47%) | 1575 (47%) |
  Yes | 2249 (53%) | 451 (53%) | 1798 (53%) |
Other workers affected by the same event | | | | 0.54
  No | 1585 (38%) | 325 (39%) | 1260 (37%) |
  Yes | 2632 (62%) | 519 (61%) | 2113 (63%) |
Referred to psychosocial care centers | | | | 0.44
  No | 1361 (32%) | 263 (31%) | 1098 (33%) |
  Yes | 2856 (68%) | 581 (69%) | 2275 (67%) |
Work accident report | | | | 0.69
  No | 2638 (63%) | 533 (63%) | 2105 (62%) |
  Yes | 1579 (37%) | 311 (37%) | 1268 (38%) |
Diagnosis category | | | | 0.12
  Burnout syndrome | 322 (7.6%) | 70 (8.3%) | 252 (7.5%) |
  Personality disorders | 43 (1.0%) | 8 (0.9%) | 35 (1.0%) |
  Mood disorders | 951 (23%) | 178 (21%) | 773 (23%) |
  Schizophrenic spectrum disorders | 21 (0.5%) | 7 (0.8%) | 14 (0.4%) |
  Neurotic/stress-related disorders | 2749 (65%) | 564 (67%) | 2185 (65%) |
  Unspecified mental disorder | 131 (3.1%) | 17 (2.0%) | 114 (3.4%) |
Treatment regime | | | | > 0.99
  Outpatient | 4062 (96%) | 813 (96%) | 3249 (96%) |
  Hospital | 155 (3.7%) | 31 (3.7%) | 124 (3.7%) |
Work-related sick leave | | | | 0.46
  No | 1666 (40%) | 324 (38%) | 1342 (40%) |
  Yes | 2551 (60%) | 520 (62%) | 2031 (60%) |

2.2. Model Optimization

Hyperparameters were tuned via cross-validation to maximize model generalization. The final Decision Tree utilized a complexity parameter (cp) of 0.0017. For the Random Forest, the optimal number of predictors sampled at each split (mtry) was 3. The XGBoost model was optimized with the following parameters: nrounds = 775, max_depth = 9, eta = 0.015, gamma = 3.8, colsample_bytree = 0.43, min_child_weight = 1, and subsample = 0.85.

2.3. Performance Evaluation

Predictive performance on the independent test set was comparable across algorithms, with overlapping 95% confidence intervals (CIs) for all metrics (Table 2). The AUC-ROC ranged from 0.697 (95% CI 0.661–0.733) for the Decision Tree to 0.745 (95% CI 0.712–0.778) for XGBoost. Overall accuracy varied narrowly (0.665–0.691).

Table 2 Test-set performance metrics (area under the receiver operating characteristic curve (AUC-ROC), accuracy, sensitivity, specificity, precision, F1-score) with 95% confidence intervals for Decision Tree, Random Forest, and Extreme Gradient Boosting (XGBoost) models

Metric | Decision Tree | Random Forest | XGBoost
AUC-ROC | 0.697 (0.661–0.733) | 0.733 (0.699–0.767) | 0.745 (0.712–0.778)
Accuracy | 0.665 (0.632–0.696) | 0.689 (0.657–0.721) | 0.691 (0.658–0.722)
Sensitivity | 0.786 (0.765–0.834) | 0.861 (0.833–0.890) | 0.834 (0.802–0.865)
Specificity | 0.469 (0.413–0.525) | 0.413 (0.361–0.469) | 0.460 (0.407–0.515)
Precision | 0.704 (0.682–0.727) | 0.702 (0.682–0.722) | 0.712 (0.690–0.734)
F1-score | 0.518 (0.471–0.564) | 0.505 (0.453–0.553) | 0.533 (0.482–0.578)

Sensitivity and specificity exhibited the expected trade-off. Random Forest achieved the highest sensitivity (0.861; 95% CI 0.833–0.890), paired with lower specificity (0.413; 95% CI 0.361–0.469). Precision was stable across models (0.702–0.712), and F1-scores ranged from 0.505 to 0.533. Detailed metrics are presented in Table 2.

2.4. Model Interpretability

The Decision Tree (Fig. 2) identified the work accident report as the primary splitting criterion.
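As a cross-check on the threshold-based metrics in Table 2 (Section 2.3), the sketch below rebuilds an approximate confusion matrix for the Decision Tree from its reported sensitivity and specificity and the test-set class counts in Table 1, then recomputes the remaining metrics. It is plain Python rather than the R/caret code used in the study; the function names are illustrative, and recovering integer counts by rounding is an assumption, so small discrepancies from the published values are expected.

```python
def confusion_from_rates(n_pos, n_neg, sensitivity, specificity):
    """Approximate (TP, FN, TN, FP) counts from reported rates by rounding."""
    tp = round(sensitivity * n_pos)
    tn = round(specificity * n_neg)
    return tp, n_pos - tp, tn, n_neg - tn


def classification_metrics(tp, fn, tn, fp):
    """Threshold-based metrics; the positive class is sickness absence = 'Yes'."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1_positive": 2 * tp / (2 * tp + fp + fn),  # F1 with 'Yes' as positive
        "f1_negative": 2 * tn / (2 * tn + fn + fp),  # F1 with 'No' as positive
    }


# Test-set class counts from Table 1: 520 'Yes' and 324 'No' (844 records).
# Sensitivity and specificity are the Decision Tree point estimates in Table 2.
counts = confusion_from_rates(n_pos=520, n_neg=324,
                              sensitivity=0.786, specificity=0.469)
decision_tree = classification_metrics(*counts)

for name, value in decision_tree.items():
    print(f"{name}: {value:.3f}")
```

With these reconstructed counts, accuracy (0.665) and precision (0.704) agree with Table 2, while the F1-score for the 'Yes' class comes out near 0.743. The tabulated F1-score of 0.518 instead matches the value computed with 'No' treated as the positive class, and a similar pattern appears to hold, to within rounding, for the Random Forest and XGBoost columns. This is an inference from the arithmetic, not something the text states.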
Individuals with a formal work accident report exhibited a consistently high prevalence of sickness absence (74%). Among those without a work accident report, geographic region served as the main discriminator; workers outside the Northeast, particularly in the South, were more likely to be classified as negative (no sickness absence). Within the Southern subgroup, protective factors included diagnoses of neurotic/stress-related disorders and psychotropic medication use. Conversely, risk factors for sickness absence in specific subgroups included illicit drug use, outsourced employment, and notifications occurring during the pandemic period.

Global variable importance based on mean absolute SHAP values highlighted the work accident report as the top predictor (> 0.065), followed by geographic region and referral to psychosocial care centers (> 0.040) (Fig. 3). The SHAP beeswarm plot (Fig. 4) visualizes the direction and dispersion of feature effects. High values for work accident reports were associated with strong positive SHAP values, indicating an increased probability of sickness absence. In contrast, variables such as geographic region and referral to psychosocial care displayed broader distributions, reflecting heterogeneous effects across categories. Demographics such as race/ethnicity showed SHAP values centered near zero, suggesting a minimal marginal contribution to the model’s predictions.

Individual predictions are illustrated via SHAP waterfall plots (Fig. 5). High-risk cases (predicted probability > 0.85) were primarily driven by work accident reports, residence in the Northeast/Southeast, and referral to psychosocial care. Conversely, low-risk cases were characterized by residence in the South, absence of work accidents, and specific diagnostic categories. These plots confirm that while the work accident report is a dominant driver, the interplay of clinical and occupational factors finely tunes the individual risk prediction.

3. Discussion

This study demonstrated that the three supervised machine learning models (Decision Tree, Random Forest, and XGBoost) exhibited similar predictive performance in identifying work-related sickness absence among workers with mental and behavioral disorders recorded in SINAN. Although XGBoost achieved a slightly higher AUC-ROC and accuracy, the overlapping confidence intervals indicate comparable discriminative capacity across algorithms. Crucially, the analysis revealed that a specific subset of structural and contextual predictors, particularly work accident reports, geographic region, and referral to psychosocial care centers, exerted greater influence on predicted probabilities than traditional individual-level characteristics.

Our results diverge from those of Katarya and Maan (2020), who found that individual characteristics (e.g., personal and family mental health history and, to a lesser extent, gender) were the primary drivers of prediction, while organizational factors contributed minimally. In the present study, structural and service-related variables were dominant. The predictive power of the work accident report aligns with Brazilian evidence indicating that work-related mental disorders are frequent causes of sickness absence. Furthermore, trends in formal accident reporting closely track increases in such conditions, particularly in sectors with high emotional demands (de Araújo et al. 2024; de Souza Mendonça et al. 2024). Similarly, referral to psychosocial care centers likely serves as a proxy for clinical severity and the need for specialized care, factors typically linked to prolonged or recurrent absences (Sanine et al. 2024).

The significant role of geographic region is consistent with documented territorial inequalities regarding the risk of mental disorder–related absence in Brazil.
This is particularly evident in the Northeast and Southeast, where work intensity, productive organization, urbanization levels, and unequal access to services shape morbidity patterns (Bastos et al. 2023; Melo et al. 2023; de Souza Mendonça et al. 2024). The predictive relevance of psychotropic medication use is also concordant with findings that workers on sick leave frequently utilize these drugs, and that both clinical severity and treatment adherence influence the probability and duration of absence (Leão et al. 2021; Helgesson et al. 2024).

Differences compared to Mitravinda et al. (2023) warrant specific attention, particularly regarding the impact of the COVID-19 pandemic. While that study, which focused on IT professionals, reported heightened psychological symptoms and risk exposures post-pandemic, consistent with literature on increased mental health–related absences (van der Plaat et al. 2021; Barros-Areal et al. 2022), our SINAN-based models associated the pandemic period with a lower predicted probability of absence relative to the pre-pandemic period. This apparent discrepancy likely reflects the distinct data-generating processes. Mitravinda et al. (2023) employed subjective and psychosocial measures, whereas our analysis relied on formal notifications, which depend on care-seeking behaviors, service capacity, and administrative reporting dynamics. During the pandemic, access restrictions, health service overload, and work reorganization may have reduced formal notifications, even if underlying psychological distress increased. Furthermore, heterogeneity by productive sector explains divergent patterns: IT workers were disproportionately exposed to remote work intensification, whereas SINAN covers a broad occupational spectrum, including sectors affected by activity suspension.
Our findings on employment status further support this interpretation; the lower predicted probability among public servants aligns with evidence that employment stability buffered financial anxiety during the crisis (Vieira et al. 2021), even as absenteeism remains a concern modulated by institutional factors (Fantazia et al. 2018; Bastos et al. 2023). Thus, subjective indicators may capture worsening distress, whereas administrative notifications reflect underreporting and access barriers during crises.

Study strengths include: (i) the use of a national surveillance system (SINAN) spanning 2006–2024; (ii) methodological rigor, including stratified train–test splits, 10-fold cross-validation, and bootstrap CIs; and (iii) model interpretability, leveraging tree-based methods and SHAP explanations to illuminate drivers of predictions relevant to surveillance policy.

However, limitations merit careful consideration. First, reliance on secondary administrative data introduces potential issues regarding underreporting, variable completeness, and regional heterogeneity in data quality. Second, because the natural class distribution was retained, specificity and other metrics sensitive to the negative class may have been affected, although this choice preserves real-world prevalence for public health applications. Third, key determinants of occupational mental health (such as prior clinical history, psychosocial demands, work intensity, tenure, and direct measures of working conditions) are unavailable in SINAN, constraining explanatory power and shifting importance toward service/administrative proxies. Fourth, the lack of external validation limits generalizability beyond the SINAN context. Finally, coverage is restricted to notified work-related conditions; unnotified cases and subpopulations with limited access to care may be underrepresented.
The findings suggest that within the Brazilian surveillance context, structural and service-related factors play a central role in predicting sickness absence due to work-related mental disorders. These results have several implications for policy and practice. Integrating ML-based risk scores with indicators such as work accident reports and referrals could enable earlier identification of high-risk workers, supporting timely stepped-care interventions. Pronounced geographic disparities indicate a need for region-specific preventive strategies and differential resource allocation. The relevance of employment status reinforces the importance of primary prevention efforts, including improvements in work design and psychosocial risk management. Strengthening SINAN by incorporating variables such as job demands, decision latitude, and working hours would enhance model calibration. For real-time surveillance, interpretable models accompanied by SHAP-based dashboards can balance accuracy with transparency, facilitating adoption by occupational health teams.

Future research should prioritize external validation in independent datasets, prospective evaluation of ML-guided interventions, and data linkage across social security systems, primary care, and workplace assessments to refine risk stratification and determine the broader policy impact of predictive approaches.

4. Conclusions

This study demonstrated the utility of three supervised machine learning algorithms (Decision Tree, Random Forest, and XGBoost) for predicting work-related sickness absence due to mental disorders using nationwide surveillance data from SINAN. The models exhibited comparable predictive performance, suggesting that model selection for operational deployment within occupational health systems should prioritize interpretability, computational efficiency, and ease of implementation.
Feature importance analyses revealed that predictions were primarily driven by structural, service-related, and organizational factors (specifically the issuance of work accident reports, referrals to psychosocial care centers, and geographic region) rather than individual-level demographic attributes. This indicates that, within the context of notified health events, sickness absence is more strongly determined by institutional and contextual characteristics. The finding that the pandemic period was associated with a lower predicted probability of sickness absence likely reflects administrative reporting dynamics, such as restricted access to services and shifts in care pathways, rather than a risk reduction. Consequently, these effects must be interpreted with caution, considering the broader labor market and health system disruptions during the crisis.

In conclusion, supervised machine learning represents a robust framework for enhancing occupational health surveillance. These models can support the early identification of workers at high risk of mental health–related absence, thereby informing more targeted preventive strategies and optimizing resource allocation.

Declarations

Acknowledgments
The authors thank the Federal University of the State of Rio de Janeiro (Universidade Federal do Estado do Rio de Janeiro – UNIRIO) for institutional support throughout the development of this study. This research forms part of the undergraduate thesis of Beatriz Queiroz Reis, supervised by Prof. Letícia Martins Raposo.

Author contributions
BQR contributed to conceptualization, data curation, formal analysis, investigation, methodology, project administration, resources, software development, validation, visualization, and writing of the original draft, as well as to the review and editing of the manuscript. LMR contributed to conceptualization, investigation, methodology, project administration, resources, supervision, validation, and writing – review and editing.
Both authors approved the final version of the manuscript.

Funding

No funding was received for this work.

Data availability

The data are available at https://github.com/leticiaraposo/ml-work-sickness-absence-brazil.

Conflict of interest

There are no known competing interests.

Ethics approval

This study used fully anonymized, publicly available secondary data from the Brazilian National Notifiable Diseases Information System (SINAN). Because the dataset contains no identifiable personal information and does not allow re-identification of individuals, ethical review was not required. In accordance with Resolution No. 510/2016 of the Brazilian National Health Council, studies based solely on public, anonymized data are exempt from review by a Research Ethics Committee.

Notes on AI use

Large Language Models (LLMs) were used to assist in English translation, academic editing, and structural formatting of the manuscript. All analytical decisions, results interpretation, and final content verification were performed solely by the authors.

References

Axén I, Björk Brämberg E, Vaez M, et al (2020) Interventions for common mental disorders in the occupational health service: a systematic review with a narrative synthesis. Int Arch Occup Environ Health 93:823–838. https://doi.org/10.1007/s00420-020-01535-4

Barros-Areal AF, Albuquerque CP, Silva NM, et al (2022) Impact of COVID-19 on the mental health of public university hospital workers in Brazil: A cohort-based analysis of 32,691 workers. PLOS ONE 17. https://doi.org/10.1371/journal.pone.0269318

Bastos MLA, Silva de Carvalho TG, Mattos Lacerda E, Monteiro Ferreira MJ (2023) Absenteeism Due to Mental Disorders in Agents Fighting Endemic Diseases in Ceará/Northeast Brazil. Journal of Occupational and Environmental Medicine 65. https://doi.org/10.1097/JOM.0000000000002881

Canty A, Ripley B (2025) boot: Bootstrap Functions

de Araújo GS, de Oliveira CR, Palmeira RGS, et al (2024) Epidemiological profile of work-related mental disorders in Paraíba from 2015–2023. In: V Seven International Multidisciplinary Congress, Anais SEV7N. SEV7N

de Souza Mendonça PB, Costa IB, de Góes e Silva Faustino da Costa VC, de Castro JL (2024) Sick leave due to occupational mental disorders in Brazil Northeastern states: an ecological study. Revista Brasileira de Medicina do Trabalho 22. https://doi.org/10.47626/1679-4435-2022-1007

Fantazia M, Bernardes J, Dias A (2018) Profile of illness among workers of a university campus in the state of São Paulo: analysis of illness-related absenteeism. In: Occupational and Environmental Medicine

Greenwell B (2024) fastshap: Fast Approximate Shapley Values

Halonen JI, Hiilamo A, Butterworth P, et al (2020) Psychological distress and sickness absence: Within- versus between-individual analysis. Journal of Affective Disorders 264:333–339. https://doi.org/10.1016/j.jad.2020.01.006

Helgesson M, Pettersson E, Lindsäter E, et al (2024) Trajectories of work disability among individuals with anxiety-, mood/affective-, or stress-related disorders in a primary healthcare setting. BMC Psychiatry 24. https://doi.org/10.1186/s12888-024-06068-5

Katarya R, Maan S (2020) Predicting Mental health disorders using Machine Learning for employees in technical and non-technical companies. In: 2020 IEEE International Conference on Advances and Developments in Electrical and Electronics Engineering (ICADEE). IEEE, pp 1–5

Leão FVG, Mesquita AR, de Oliveira Gotelipe LG, Menezes de Pádua C (2021) Use of psychotropic drugs among workers on leave due to mental disorders. Einstein (São Paulo) 19. https://doi.org/10.31744/einstein_journal/2021AO5506

Liaw A, Wiener M (2002) Classification and Regression by randomForest. R News 2:18–22

Mayer M (2025) shapviz: SHAP Visualizations

Melo BF, Santos KOB, Stock S, et al (2023) Mental disorders in judicial workers: analysis of sickness absence in a cohort study. Revista de Saúde Pública 57. https://doi.org/10.11606/s1518-8787.2023057004737

Microsoft Corporation, Weston S (2022) doParallel: Foreach Parallel Adaptor for the “parallel” Package

Mienye ID, Jere N (2024) A Survey of Decision Trees: Concepts, Algorithms, and Applications. IEEE Access 12:86716–86727. https://doi.org/10.1109/access.2024.3416838

Mitravinda KM, Nair DS, Srinivasa G (2023) Mental Health in Tech: Analysis of Workplace Risk Factors and Impact of COVID-19. SN Computer Science 4. https://doi.org/10.1007/s42979-022-01613-z

Reddy US, Thota AV, Dharun A (2018) Machine Learning Techniques for Stress Prediction in Working Employees. In: 2018 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC). IEEE, pp 1–4

Robin X, Turck N, Hainard A, et al (2011) pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12:77

Rugulies R, Aust B, Greiner BA, et al (2023) Work-related causes of mental health conditions and interventions for their improvement in workplaces. The Lancet 402:1368–1381. https://doi.org/10.1016/S0140-6736(23)00869-3

Sanine PR, da Silva Godoi LP, da Costa Rosa TE, et al (2024) Factors associated with the hospitalization of users referred from primary health care to follow-up in Psychosocial Care Centers in the city of São Paulo, Brazil. Ciência & Saúde Coletiva 29. https://doi.org/10.1590/1413-81232024292.19932022

Terluin B, Van Rhenen W, Anema JR, Taris TW (2011) Psychological symptoms and subsequent sickness absence. Int Arch Occup Environ Health 84:825–837. https://doi.org/10.1007/s00420-011-0637-4

Therneau T, Atkinson B (2025) rpart: Recursive Partitioning and Regression Trees

van der Plaat DA, Edge R, Coggon D, et al (2021) Impact of COVID-19 pandemic on sickness absence for mental ill health in National Health Service staff. BMJ Open 11. https://doi.org/10.1136/bmjopen-2021-054533

Vieira KM, Potrich ACG, Bressan AA, Klein L (2021) Loss of financial well-being in the COVID-19 pandemic: Does job stability make a difference? Journal of Behavioral and Experimental Finance 31. https://doi.org/10.1016/j.jbef.2021.100554
After excluding observations with missing outcome information (n = 584) and those with missing or inconsistent predictor data (n = 3,634), the final analytical sample consisted of 4,217 records.\u003c/p\u003e","description":"","filename":"Fig1.png","url":"https://assets-eu.researchsquare.com/files/rs-8339860/v1/23ee4efe1dfef8d1f46e646a.png"},{"id":98895616,"identity":"ce25181c-b4cb-4448-bfd1-1f226841ec82","added_by":"auto","created_at":"2025-12-23 17:24:26","extension":"png","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":1649696,"visible":true,"origin":"","legend":"\u003cp\u003eDecision Tree; nodes show splitting rules, class proportions, and terminal node sizes\u003c/p\u003e","description":"","filename":"Fig2.png","url":"https://assets-eu.researchsquare.com/files/rs-8339860/v1/d12604bacf4914e3e64e4209.png"},{"id":98895618,"identity":"e1ab5905-d60b-4493-95e5-a072a439a5af","added_by":"auto","created_at":"2025-12-23 17:24:26","extension":"png","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":1417307,"visible":true,"origin":"","legend":"\u003cp\u003eGlobal variable importance based on mean absolute SHAP values for the XGBoost model\u003c/p\u003e","description":"","filename":"Fig3.png","url":"https://assets-eu.researchsquare.com/files/rs-8339860/v1/6cfb9c8b6d234f46cd9f8c82.png"},{"id":99309930,"identity":"297b0ea2-a427-4b0f-9f41-5bd981eb3a4a","added_by":"auto","created_at":"2025-12-31 16:11:25","extension":"png","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":1522393,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSHAP beeswarm\u003c/strong\u003e summarizing variable effects on XGBoost 
predictions.\u003c/p\u003e","description":"","filename":"Fig4.png","url":"https://assets-eu.researchsquare.com/files/rs-8339860/v1/8a637b39a8b21402dd0813e2.png"},{"id":98895621,"identity":"a2e15b10-c4b7-4e71-9e9e-cfa8b1f91c98","added_by":"auto","created_at":"2025-12-23 17:24:26","extension":"png","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":3476335,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eSHAP waterfall\u003c/strong\u003e plots for selected individuals (baseline \u003cstrong\u003eE[f(x)] = 0.602\u003c/strong\u003e), with positive bars increasing and negative bars decreasing the predicted probability.\u003c/p\u003e","description":"","filename":"Fig5.png","url":"https://assets-eu.researchsquare.com/files/rs-8339860/v1/10e081db645f60adc558b464.png"}],"financialInterests":"No competing interests reported.","formattedTitle":"Machine learning models for predicting work-related sickness absence due to mental disorders using national surveillance data in Brazil","fulltext":[{"header":"1. Introduction","content":"\u003cp\u003eWork-related mental disorders have become a growing public health concern, generating substantial societal and economic impacts through sickness absence, reduced productivity, and increased healthcare demands (Ax\u0026eacute;n et al. \u003cspan citationid=\"CR1\" class=\"CitationRef\"\u003e2020\u003c/span\u003e; Rugulies et al. \u003cspan citationid=\"CR21\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Sickness absence serves as an indicator of psychological distress and shortcomings in primary prevention systems (Terluin et al. \u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e2011\u003c/span\u003e; Halonen et al. \u003cspan citationid=\"CR10\" class=\"CitationRef\"\u003e2020\u003c/span\u003e). 
In this context, the early identification of workers at high risk is essential for strengthening occupational health surveillance and informing preventive interventions.\u003c/p\u003e \u003cp\u003eMachine learning (ML) techniques offer promising opportunities for predicting work-related mental health outcomes by identifying risk profiles and supporting evidence-based decision-making. Prior studies have demonstrated the effectiveness of supervised algorithms, particularly tree-based models, in selecting relevant predictors of mental health risk. For example, Katarya and Maan (\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2020\u003c/span\u003e) reported the strong performance of Decision Tree and Random Forest models for identifying factors such as family history, prior symptoms, and occupational context. Similarly, Reddy et al. (2018) applied a boosting model to IT professionals and achieved a high accuracy in predicting stress, highlighting gender, family history, and psychological support as determinants of vulnerability.\u003c/p\u003e \u003cp\u003eDespite these methodological advances, few investigations have focused on the Brazilian context or utilized national epidemiological surveillance systems, such as the Brazilian National Notifiable Diseases Information System (SINAN). Most international studies rely on restricted occupational groups and do not capture the heterogeneity of employment conditions and work arrangements in Brazil. This gap limits the generalizability of the findings and underscores the need for predictive approaches based on large-scale national data.\u003c/p\u003e \u003cp\u003eThe present study addresses this gap by applying three supervised ML models, Decision Tree, Random Forest, and Extreme Gradient Boosting (XGBoost), to predict sickness absence due to work-related mental disorders using secondary SINAN data. 
We hypothesized that tree-based algorithms would identify key demographic, occupational, and clinical predictors associated with work disability. The rationale is that integrating predictive analytics into epidemiological surveillance may enhance the early detection of high-risk groups and support the development of more effective prevention strategies in occupational mental health.\u003c/p\u003e"},{"header":"1. Methods","content":"\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\n \u003ch2\u003e1.1. Study design and setting\u003c/h2\u003e\n \u003cp\u003eThis observational, cross-sectional study utilized secondary data extracted from the SINAN. The dataset comprised notifications recorded between 2006 and 2024. The primary aim was to develop and evaluate ML models capable of predicting sickness absence among workers diagnosed with work-related mental and behavioral disorders.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec4\" class=\"Section2\"\u003e\n \u003ch2\u003e1.2. Participants\u003c/h2\u003e\n \u003cp\u003eThe target population consisted of workers aged\u0026thinsp;\u0026ge;\u0026thinsp;18 years with diagnoses compatible with work-related mental disorders, identified through International Classification of Diseases, 10th Revision (ICD-10) codes F00\u0026ndash;F99 or Z73.0. Records lacking information on the outcome variable or containing inconsistencies in essential predictors were excluded.\u003c/p\u003e\n \u003cp\u003eThe outcome was defined as a binary variable indicating the occurrence of sickness absence (\u0026ldquo;Yes\u0026rdquo; vs. \u0026ldquo;No\u0026rdquo;). The class distribution was relatively balanced, with 60% of cases being positive. Consequently, no resampling or balancing techniques were applied, as the objective was to evaluate model performance under the natural class distribution.\u003c/p\u003e\n \u003cp\u003eThe initial dataset contained 8,435 records. 
After excluding records with missing outcome data (n\u0026thinsp;=\u0026thinsp;584) and those with missing or inconsistent predictor information (n\u0026thinsp;=\u0026thinsp;3,634), the final analytical sample comprised 4,217 observations (Fig. 1). All procedures, including preprocessing, model development, hyperparameter tuning, and visualization, were conducted in R version 4.5.1 using RStudio.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec5\" class=\"Section2\"\u003e\n \u003ch2\u003e1.3. Variables and measurements\u003c/h2\u003e\n \u003cp\u003ePredictor variables were selected based on epidemiological plausibility, availability in SINAN forms, and potential relevance to sickness absence. The variables included:\u003c/p\u003e\n \u003cul\u003e\n \u003cli\u003e\n \u003cp\u003eDemographic: Age group (18\u0026ndash;29; 30\u0026ndash;39; 40\u0026ndash;49; 50\u0026ndash;59; \u0026ge;60); Biological sex (female; male); Race/ethnicity (White; Black/Mixed; Asian/Indigenous); Educational level (up to primary; secondary; higher education).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eOccupational: Notification period (pre-pandemic; pandemic); Geographic region (North; Northeast; Center-West; Southeast; South); Employment status (formal employee; public servant; other); Outsourced employment (yes; no); Occupational group (based on CBO 2002: administrative services; agriculture/forestry/fishing; commerce; industrial goods and services; mid-level technicians; military/police/firefighters; public administration; repair and maintenance; science/arts).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eClinical/Behavioral: Alcohol use (yes; no); Tobacco use (yes; no); Illicit drug use (yes; no); Psychotropic medication use (yes; no); Other workers affected by the same event (yes; no); Referred to psychosocial care centers (yes; no); Work accident report (yes; no).\u003c/p\u003e\n \u003c/li\u003e\n \u003cli\u003e\n \u003cp\u003eDiagnosis: 
Category (burnout syndrome; personality disorders; mood disorders; schizophrenic spectrum disorders; neurotic/stress-related disorders; unspecified mental disorder); Treatment regime (outpatient; hospital).\u003c/p\u003e\n \u003c/li\u003e\n \u003c/ul\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec6\" class=\"Section2\"\u003e\n \u003ch2\u003e1.4. Data preprocessing\u003c/h2\u003e\n \u003cp\u003ePreprocessing steps ensured analytical consistency. Irrelevant administrative fields and identifiers were removed. Categorical variables were recoded to reduce sparsity and improve interpretability; specifically, rare categories within educational level and employment status were merged into broader groups (\u0026ldquo;up to primary education\u0026rdquo; and \u0026ldquo;other\u0026rdquo;, respectively). ICD-10 diagnoses were aggregated into major mental disorder groups. The dataset was split into training (80%) and testing (20%) subsets using stratified sampling to maintain outcome distributions across sets.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec7\" class=\"Section2\"\u003e\n \u003ch2\u003e1.5. Bias and confounding control\u003c/h2\u003e\n \u003cp\u003eSelection bias was addressed by excluding records with missing essential information while preserving the representativeness of the SINAN database. Information bias was mitigated by using standardized variables from a national surveillance system with established reporting protocols. Confounding was managed intrinsically through the use of ML models capable of capturing nonlinear relationships and interactions among predictors. Tree-based algorithms naturally account for multicollinearity and complex dependencies. Overfitting was minimized through stratified 10-fold cross-validation, hyperparameter tuning, and evaluation on an independent test set.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\n \u003ch2\u003e1.6. 
Sample size and power\u003c/h2\u003e\n \u003cp\u003ehe final analytical sample of 4,217 notifications exceeded the minimum requirements typically recommended for ML classification tasks with moderate dimensionality. Given the number of predictors and the outcome prevalence (60%), the study possessed adequate statistical power to support model training, tuning, and internal validation. Tree-based ensemble models, such as Random Forest and XGBoost, have demonstrated stable performance with sample sizes of this magnitude in epidemiological applications.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec9\" class=\"Section2\"\u003e\n \u003ch2\u003e1.7. Modelling\u003c/h2\u003e\n \u003cp\u003eThree supervised ML algorithms were implemented for binary classification: Decision Tree, Random Forest, and XGBoost.\u003c/p\u003e\n \u003cp\u003eDecision Tree was implemented using the \u003cem\u003erpart\u003c/em\u003e method in the \u003cem\u003ecaret\u003c/em\u003e package (Therneau and Atkinson \u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e). The Gini impurity index was used as the splitting criterion. The complexity parameter (cp), which controls pruning, was tuned via cross-validation.\u003c/p\u003e\n \u003cp\u003eThe Random Forest model was fitted using the \u003cem\u003erandomForest\u003c/em\u003e engine in \u003cem\u003ecaret\u003c/em\u003e (Liaw and Wiener \u003cspan class=\"CitationRef\"\u003e2002\u003c/span\u003e). Hyperparameter optimization focused on mtry (predictors sampled at each split).\u003c/p\u003e\n \u003cp\u003eThe XGBoost algorithm was implemented using the \u003cem\u003exgbTree\u003c/em\u003e method in \u003cem\u003ecaret\u003c/em\u003e. Tuning involved adjusting the learning rate (eta), maximum tree depth, boosting iterations (nrounds), L1 (\u0026alpha;) and L2 (\u0026lambda;) regularization parameters, and subsampling rate (Mienye and Jere \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e). 
Model training was parallelized using the \u003cem\u003edoParallel\u003c/em\u003e package (Corporation and Weston \u003cspan class=\"CitationRef\"\u003e2022\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003eAll models were trained using a randomized hyperparameter search (tuneLength\u0026thinsp;=\u0026thinsp;1000), combined with stratified 10-fold cross-validation to ensure robust out-of-sample performance estimation. Model interpretability was examined using SHAP (SHapley Additive exPlanations) values computed with \u003cem\u003efastshap\u003c/em\u003e (Greenwell \u003cspan class=\"CitationRef\"\u003e2024\u003c/span\u003e) and visualized using \u003cem\u003eshapviz\u003c/em\u003e (Mayer \u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e\n \u003cp\u003ePerformance was evaluated using accuracy, sensitivity (recall), specificity, precision (positive predictive value), F1-score, and the area under the receiver operating characteristic curve (AUC-ROC). A fixed decision threshold of 0.5 was applied. The AUC-ROC was estimated using the \u003cem\u003epROC\u003c/em\u003e package (Robin et al. \u003cspan class=\"CitationRef\"\u003e2011\u003c/span\u003e). All metrics were accompanied by 95% confidence intervals, computed via stratified bootstrap resampling (2,000 iterations) using functions from \u003cem\u003ecaret\u003c/em\u003e, \u003cem\u003epROC\u003c/em\u003e, and \u003cem\u003eboot\u003c/em\u003e (Canty and Ripley \u003cspan class=\"CitationRef\"\u003e2025\u003c/span\u003e).\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec10\" class=\"Section2\"\u003e\n \u003ch2\u003e1.8. Ethical Considerations\u003c/h2\u003e\n \u003cp\u003eThe study relied exclusively on fully anonymized, publicly accessible secondary data from SINAN. In accordance with Resolution No. 510/2016 of the Brazilian National Health Council, research utilizing public anonymized data is exempt from ethical review.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"2. 
Results","content":"\u003cdiv id=\"Sec12\" class=\"Section2\"\u003e\n \u003ch2\u003e2.1. Participants\u003c/h2\u003e\n \u003cp\u003eThe age distribution was concentrated in the 30\u0026ndash;39 (34%) and 40\u0026ndash;49 (31%) groups. Female workers comprised 66% of the sample. Regarding race/ethnicity, White participants represented 56%, followed by Black/Mixed race individuals (percentage not provided in original text, usually recommended to include if available, otherwise keep as is). The majority of participants resided in the Southeast region (45%). Educational attainment was balanced between secondary (44%) and higher education (43%).\u003c/p\u003e\n \u003cp\u003eThe notification period was evenly distributed (pandemic: 52%; pre-pandemic: 48%). Regarding occupational groups, commerce was the most frequent (24%), followed by administrative services (20%). Employment was predominantly formal (66%) and non-outsourced (93%). Most patients received outpatient care (96%).\\\u003c/p\u003e\n \u003cp\u003eClinical and behavioral variables showed high frequencies of negative responses for alcohol use (91% \u0026ldquo;no\u0026rdquo;), illicit drug use (86% \u0026ldquo;no\u0026rdquo;), and smoking (91% \u0026ldquo;no\u0026rdquo;). Conversely, notable proportions reported psychotropic medication use (53%) and referral to psychosocial care centers (68%), consistent with greater functional severity or specialized care needs. The outcome of interest, sickness absence, was observed in 60% of cases. Workplace context data indicated that 62% of reports involved other workers affected by the same event. Among aggregated ICD-10 diagnoses, neurotic, stress-related, and somatoform disorders predominated (65%). 
No statistically significant differences were detected between the training and test distributions across covariates, indicating that the split preserved representativeness (Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e1\u003c/span\u003e).\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\u0026nbsp;\u003ctable id=\"Tab1\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eDescriptive characteristics and train\u0026ndash;test comparison for all variables (n\u0026thinsp;=\u0026thinsp;4,217).\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"5\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eVariable\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTotal\u003c/p\u003e\n \u003cp\u003e(N\u0026thinsp;=\u0026thinsp;4,217)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTest\u003c/p\u003e\n \u003cp\u003e(N\u0026thinsp;=\u0026thinsp;844)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eTrain\u003c/p\u003e\n \u003cp\u003e(N\u0026thinsp;=\u0026thinsp;3,373)\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003ep-value\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eAge group\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.63\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e18\u0026ndash;29\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003e859 (20%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e186 (22%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e673 (20%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e30\u0026ndash;39\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1419 (34%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e287 (34%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1132 (34%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e40\u0026ndash;49\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1289 (31%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e248 (29%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1041 (31%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e50\u0026ndash;59\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e582 (14%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e109 (13%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e473 (14%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e60 or older\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e68 (1.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e14 (1.7%)\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e54 (1.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eBiological sex\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.81\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFemale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2798 (66%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e557 (66%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2241 (66%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMale\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1419 (34%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e287 (34%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1132 (34%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eRace/ethnicity\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.054\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003eWhite\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2364 (56%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e449 (53%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1915 (57%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBlack/Brown\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1754 (42%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e368 (44%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1386 (41%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAsian/Indigenous\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e99 (2.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e27 (3.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e72 (2.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eEducational level\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.83\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eUp to primary\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e546 (13%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e104 
(12%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e442 (13%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSecondary\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1850 (44%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e373 (44%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1477 (44%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHigher education\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1821 (43%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e367 (43%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1454 (43%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eNotification period\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.78\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePre-pandemic\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2045 (48%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e413 (49%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1632 (48%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n 
\u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePandemic\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2172 (52%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e431 (51%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1741 (52%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eGeographic region\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.058\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNorth\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e208 (4.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e48 (5.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e160 (4.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNortheast\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1227 (29%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e263 (31%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e964 (29%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCentral-West\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e253 (6.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003e62 (7.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e191 (5.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSoutheast\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1889 (45%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e355 (42%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1534 (45%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSouth\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e640 (15%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e116 (14%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e524 (16%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eEmployment status\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.68\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eFormal employee\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2792 (66%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e549 (65%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2243 (66%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePublic servant\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1283 (30%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e264 (31%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1019 (30%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eOthers\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e142 (3.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e31 (3.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e111 (3.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eOutsourced employment\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.74\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3916 (93%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e786 (93%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3130 (93%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e301 
(7.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e58 (6.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e243 (7.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eOccupational group\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.23\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAdministrative services\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e832 (20%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e155 (18%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e677 (20%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAgriculture/forestry/fishing\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e29 (0.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e12 (1.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e17 (0.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eCommerce\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1033 (24%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e204 (24%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e829 
(25%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eIndustrial goods and services\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e615 (15%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e129 (15%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e486 (14%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMid-level technicians\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e558 (13%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e110 (13%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e448 (13%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMilitary/police/firefighters\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e16 (0.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2 (0.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e14 (0.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePublic administration\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e243 (5.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e55 (6.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e188 (5.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003eRepair and maintenance\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e91 (2.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e17 (2.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e74 (2.2%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eScience/arts\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e800 (19%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e160 (19%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e640 (19%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eAlcohol use\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.53\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3826 (91%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e761 (90%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3065 (91%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e391 (9.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e83 
(9.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e308 (9.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eTobacco use\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.72\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3821 (91%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e762 (90%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3059 (91%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e396 (9.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e82 (9.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e314 (9.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eIllicit drug use\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.56\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n 
\u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3614 (86%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e718 (85%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2896 (86%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e603 (14%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e126 (15%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e477 (14%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003ePsychotropic medication use\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.95\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1968 (47%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e393 (47%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1575 (47%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2249 (53%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e451 (53%)\u003c/p\u003e\n \u003c/td\u003e\n 
\u003ctd align=\"left\"\u003e\n \u003cp\u003e1798 (53%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eOther workers affected by the same event\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.54\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1585 (38%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e325 (39%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1260 (37%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2632 (62%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e519 (61%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2113 (63%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eReferred to psychosocial care centers\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.44\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1361 (32%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e263 (31%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1098 (33%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2856 (68%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e581 (69%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2275 (67%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eWork accident report\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.69\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2638 (63%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e533 (63%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2105 (62%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1579 (37%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e311 (37%)\u003c/p\u003e\n 
\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1268 (38%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eDiagnosis category\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.12\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eBurnout syndrome\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e322 (7.6%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e70 (8.3%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e252 (7.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePersonality disorders\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e43 (1.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e8 (0.9%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e35 (1.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eMood disorders\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e951 (23%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e178 (21%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e773 (23%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n 
\u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSchizophrenic spectrum disorders\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e21 (0.5%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e7 (0.8%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e14 (0.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNeurotic/stress-related disorders\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2749 (65%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e564 (67%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2185 (65%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eUnspecified mental disorder\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e131 (3.1%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e17 (2.0%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e114 (3.4%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eTreatment regime\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e\u0026gt;\u0026thinsp;0.99\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eOutpatient\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd 
align=\"left\"\u003e\n \u003cp\u003e4062 (96%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e813 (96%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e3249 (96%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eHospital\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e155 (3.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e31 (3.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e124 (3.7%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e\u003cstrong\u003eWork-related sick leave\u003c/strong\u003e\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.46\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eNo\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1666 (40%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e324 (38%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e1342 (40%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eYes\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2551 (60%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e520 (62%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003e2031 
(60%)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"left\"\u003e\u0026nbsp;\u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec13\" class=\"Section2\"\u003e\n \u003ch2\u003e2.2. Model Optimization\u003c/h2\u003e\n \u003cp\u003eHyperparameters were tuned via cross-validation to maximize model generalization. The final Decision Tree utilized a complexity parameter (cp) of 0.0017. For the Random Forest, the optimal number of predictors sampled at each split (mtry) was 3. The XGBoost model was optimized with the following parameters: nrounds\u0026thinsp;=\u0026thinsp;775, max_depth\u0026thinsp;=\u0026thinsp;9, eta\u0026thinsp;=\u0026thinsp;0.015, gamma\u0026thinsp;=\u0026thinsp;3.8, colsample_bytree\u0026thinsp;=\u0026thinsp;0.43, min_child_weight\u0026thinsp;=\u0026thinsp;1, and subsample\u0026thinsp;=\u0026thinsp;0.85.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec14\" class=\"Section2\"\u003e\n \u003ch2\u003e2.3. Performance Evaluation\u003c/h2\u003e\n \u003cp\u003ePredictive performance on the independent test set was comparable across algorithms, with overlapping 95% confidence intervals (CIs) for all metrics (Table\u0026nbsp;\u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e). The AUC-ROC ranged from 0.697 (95% CI 0.661\u0026ndash;0.733) for the Decision Tree to 0.745 (95% CI 0.712\u0026ndash;0.778) for XGBoost. 
Overall accuracy varied narrowly (0.665\u0026ndash;0.691).\u003c/p\u003e\n \u003cdiv class=\"gridtable\"\u003e\u0026nbsp;\u003ctable id=\"Tab2\" border=\"1\"\u003e\n \u003ccaption language=\"En\"\u003e\n \u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\n \u003cdiv class=\"CaptionContent\"\u003e\n \u003cp\u003eTest-set performance metrics (area under the receiver operating characteristic curve (AUC-ROC), accuracy, sensitivity, specificity, precision, F1-score) with 95% confidence intervals for Decision Tree, Random Forest, and Extreme Gradient Boosting (XGBoost) models\u003c/p\u003e\n \u003c/div\u003e\n \u003c/caption\u003e\n \u003ccolgroup cols=\"4\"\u003e\u003c/colgroup\u003e\n \u003cthead\u003e\n \u003ctr\u003e\n \u003cth align=\"left\"\u003e\u0026nbsp;\u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eDecision Tree\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eRandom Forest\u003c/p\u003e\n \u003c/th\u003e\n \u003cth align=\"left\"\u003e\n \u003cp\u003eXGBoost\u003c/p\u003e\n \u003c/th\u003e\n \u003c/tr\u003e\n \u003c/thead\u003e\n \u003ctbody\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAUC-ROC\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.697 (0.661\u0026ndash;0.733)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.733 (0.699\u0026ndash;0.767)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.745 (0.712\u0026ndash;0.778)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eAccuracy\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.665 (0.632\u0026ndash;0.696)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.689 (0.657\u0026ndash;0.721)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.691 (0.658\u0026ndash;0.722)\u003c/p\u003e\n 
\u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSensitivity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.786 (0.765\u0026ndash;0.834)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.861 (0.833\u0026ndash;0.890)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.834 (0.802\u0026ndash;0.865)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eSpecificity\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.469 (0.413\u0026ndash;0.525)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.413 (0.361\u0026ndash;0.469)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.460 (0.407\u0026ndash;0.515)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003ePrecision\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.704 (0.682\u0026ndash;0.727)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.702 (0.682\u0026ndash;0.722)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.712 (0.690\u0026ndash;0.734)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003ctr\u003e\n \u003ctd align=\"left\"\u003e\n \u003cp\u003eF1-score\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.518 (0.471\u0026ndash;0.564)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.505 (0.453\u0026ndash;0.553)\u003c/p\u003e\n \u003c/td\u003e\n \u003ctd align=\"char\"\u003e\n \u003cp\u003e0.533 (0.482\u0026ndash;0.578)\u003c/p\u003e\n \u003c/td\u003e\n \u003c/tr\u003e\n \u003c/tbody\u003e\n \u003c/table\u003e\n \u003c/div\u003e\n \u003cp\u003eSensitivity and specificity exhibited the expected trade-off. 
Random Forest achieved the highest sensitivity (0.861; 95% CI 0.833\u0026ndash;0.890), paired with the lowest specificity (0.413; 95% CI 0.361\u0026ndash;0.469). Precision was stable across models (0.702\u0026ndash;0.712), and F1-scores ranged from 0.505 to 0.533. Detailed metrics are presented in Table \u003cspan class=\"InternalRef\"\u003e2\u003c/span\u003e.\u003c/p\u003e\n\u003c/div\u003e\n\u003cdiv id=\"Sec15\" class=\"Section2\"\u003e\n \u003ch2\u003e2.4. Model Interpretability\u003c/h2\u003e\n \u003cp\u003eThe Decision Tree (Fig. 2) identified the work accident report as the primary splitting criterion. Individuals with a formal work accident report exhibited a consistently high prevalence of sickness absence (74%). Among those without a work accident report, geographic region served as the main discriminator; workers outside the Northeast\u0026mdash;particularly in the South\u0026mdash;were more likely to be classified as negative (no sickness absence). Within the Southern subgroup, protective factors included diagnoses of neurotic/stress-related disorders and psychotropic medication use. Conversely, risk factors for sickness absence in specific subgroups included illicit drug use, outsourced employment, and notifications occurring during the pandemic period.\u003c/p\u003e\n \u003cp\u003eGlobal variable importance based on mean absolute SHAP values highlighted the work accident report as the top predictor (\u0026gt;\u0026thinsp;0.065), followed by geographic region and referral to psychosocial care centers (\u0026gt;\u0026thinsp;0.040) (Fig. 3).\u003c/p\u003e\n \u003cp\u003eThe SHAP beeswarm plot (Fig. 4) visualizes the direction and dispersion of feature effects. High values for work accident reports were associated with strong positive SHAP values, indicating an increased probability of sickness absence. 
In contrast, variables such as geographic region and referral to psychosocial care displayed broader distributions, reflecting heterogeneous effects across categories. Demographics such as race/ethnicity showed SHAP values centered near zero, suggesting a minimal marginal contribution to the model\u0026rsquo;s predictions.\u003c/p\u003e\n \u003cp\u003eIndividual predictions are illustrated via SHAP waterfall plots (Fig. 5). High-risk cases (predicted probability\u0026thinsp;\u0026gt;\u0026thinsp;0.85) were primarily driven by work accident reports, residence in the Northeast/Southeast, and referral to psychosocial care. Conversely, low-risk cases were characterized by residence in the South, absence of work accidents, and specific diagnostic categories. These plots confirm that while the work accident report is a dominant driver, the interplay of clinical and occupational factors finely tunes the individual risk prediction.\u003c/p\u003e\n\u003c/div\u003e"},{"header":"3. Discussion","content":"\u003cp\u003eThis study demonstrated that the three supervised machine learning models\u0026mdash;Decision Tree, Random Forest, and XGBoost\u0026mdash;exhibited similar predictive performance in identifying work-related sickness absence among workers with mental and behavioral disorders recorded in SINAN. Although XGBoost achieved a slightly higher AUC-ROC and accuracy, the overlapping confidence intervals indicate comparable discriminative capacity across algorithms. 
Crucially, the analysis revealed that a specific subset of structural and contextual predictors, particularly work accident reports, geographic region, and referral to psychosocial care centers, exerted greater influence on predicted probabilities than traditional individual-level characteristics.\u003c/p\u003e \u003cp\u003eOur results diverge from those of Katarya and Maan (\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e2020\u003c/span\u003e), who found that individual characteristics (e.g., personal and family mental health history and, to a lesser extent, gender) were the primary drivers of prediction, while organizational factors contributed minimally. In the present study, structural and service-related variables were dominant. The predictive power of the work accident report aligns with Brazilian evidence indicating that work-related mental disorders are frequent causes of sickness absence. Furthermore, trends in formal accident reporting closely track increases in such conditions, particularly in sectors with high emotional demands (de Ara\u0026uacute;jo et al. \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e2024\u003c/span\u003e; de Souza Mendon\u0026ccedil;a et al. \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). Similarly, referral to psychosocial care centers likely serves as a proxy for clinical severity and the need for specialized care, factors typically linked to prolonged or recurrent absences (Sanine et al. 2024).\u003c/p\u003e \u003cp\u003eThe significant role of geographic region is consistent with documented territorial inequalities regarding the risk of mental disorder\u0026ndash;related absence in Brazil. This is particularly evident in the Northeast and Southeast, where work intensity, productive organization, urbanization levels, and unequal access to services shape morbidity patterns (Bastos et al. \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; Melo et al. 
\u003cspan citationid=\"CR16\" class=\"CitationRef\"\u003e2023\u003c/span\u003e; de Souza Mendon\u0026ccedil;a et al. \u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e2024\u003c/span\u003e). The predictive relevance of psychotropic medication use is also concordant with findings that workers on sick leave frequently utilize these drugs, and that both clinical severity and treatment adherence influence the probability and duration of absence (Le\u0026atilde;o et al. \u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Helgesson et al. \u003cspan citationid=\"CR11\" class=\"CitationRef\"\u003e2024\u003c/span\u003e).\u003c/p\u003e \u003cp\u003eDifferences compared to Mitravinda et al. (\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) warrant specific attention, particularly regarding the impact of the COVID-19 pandemic. While that study\u0026mdash;focused on IT professionals\u0026mdash;reported heightened psychological symptoms and risk exposures post-pandemic, consistent with literature on increased mental health\u0026ndash;related absences (van der Plaat et al. \u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e2021\u003c/span\u003e; Barros-Areal et al. \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2022\u003c/span\u003e), our SINAN-based models associated the pandemic period with a lower predicted probability of absence relative to the pre-pandemic period.\u003c/p\u003e \u003cp\u003eThis apparent discrepancy likely reflects the distinct data-generating processes. Mitravinda et al. (\u003cspan citationid=\"CR18\" class=\"CitationRef\"\u003e2023\u003c/span\u003e) employed subjective and psychosocial measures, whereas our analysis relied on formal notifications, which depend on care-seeking behaviors, service capacity, and administrative reporting dynamics. 
During the pandemic, access restrictions, health service overload, and work reorganization may have reduced formal notifications, even if underlying psychological distress increased. Furthermore, heterogeneity by productive sector may explain the divergent patterns: IT workers were disproportionately exposed to remote work intensification, whereas SINAN covers a broad occupational spectrum, including sectors affected by activity suspension. Our findings on employment status further support this interpretation; the lower predicted probability among public servants aligns with evidence that employment stability buffered financial anxiety during the crisis (Vieira et al. \u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e2021\u003c/span\u003e), even as absenteeism remains a concern modulated by institutional factors (Fantazia et al. \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e2018\u003c/span\u003e; Bastos et al. \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e2023\u003c/span\u003e). Thus, subjective indicators may capture worsening distress, whereas administrative notifications reflect underreporting and access barriers during crises.\u003c/p\u003e \u003cp\u003eStudy strengths include: (i) the use of a national surveillance system (SINAN) spanning 2006\u0026ndash;2024; (ii) methodological rigor, including stratified train\u0026ndash;test splits, 10-fold cross-validation, and bootstrap CIs; and (iii) model interpretability, leveraging tree-based methods and SHAP explanations to illuminate drivers of predictions relevant to surveillance policy.\u003c/p\u003e \u003cp\u003eHowever, limitations merit careful consideration. First, reliance on secondary administrative data introduces potential issues regarding underreporting, variable completeness, and regional heterogeneity in data quality. 
Second, because the natural class distribution was retained, specificity and other metrics sensitive to the negative class may have been affected, although this choice preserves real-world prevalence for public health applications. Third, key determinants of occupational mental health\u0026mdash;such as prior clinical history, psychosocial demands, work intensity, tenure, and direct measures of working conditions\u0026mdash;are unavailable in SINAN, constraining explanatory power and shifting importance toward service/administrative proxies. Fourth, the lack of external validation limits generalizability beyond the SINAN context. Finally, coverage is restricted to notified work-related conditions; unnotified cases and subpopulations with limited access to care may be underrepresented.\u003c/p\u003e \u003cp\u003eThe findings suggest that within the Brazilian surveillance context, structural and service-related factors play a central role in predicting sickness absence due to work-related mental disorders. These results have several implications for policy and practice. Integrating ML-based risk scores with indicators such as work accident reports and referrals could enable earlier identification of high-risk workers, supporting timely stepped-care interventions. Pronounced geographic disparities indicate a need for region-specific preventive strategies and differential resource allocation. The relevance of employment status reinforces the importance of primary prevention efforts, including improvements in work design and psychosocial risk management. Strengthening SINAN by incorporating variables such as job demands, decision latitude, and working hours would enhance model calibration. 
For real-time surveillance, interpretable models accompanied by SHAP-based dashboards can balance accuracy with transparency, facilitating adoption by occupational health teams.\u003c/p\u003e \u003cp\u003eFuture research should prioritize external validation in independent datasets, prospective evaluation of ML-guided interventions, and data linkage across social security systems, primary care, and workplace assessments to refine risk stratification and determine the broader policy impact of predictive approaches.\u003c/p\u003e"},{"header":"4. Conclusions","content":"\u003cp\u003eThis study demonstrated the utility of three supervised machine learning algorithms\u0026mdash;Decision Tree, Random Forest, and XGBoost\u0026mdash;for predicting work-related sickness absence due to mental disorders using nationwide surveillance data from SINAN. The models exhibited comparable predictive performance, suggesting that model selection for operational deployment within occupational health systems should prioritize interpretability, computational efficiency, and ease of implementation.\u003c/p\u003e \u003cp\u003eFeature importance analyses revealed that predictions were primarily driven by structural, service-related, and organizational factors\u0026mdash;specifically the issuance of work accident reports, referrals to psychosocial care centers, and geographic region\u0026mdash;rather than individual-level demographic attributes. This indicates that, within the context of notified health events, sickness absence is more strongly determined by institutional and contextual characteristics.\u003c/p\u003e \u003cp\u003eThe finding that the pandemic period was associated with a lower predicted probability of sickness absence likely reflects administrative reporting dynamics, such as restricted access to services and shifts in care pathways, rather than a risk reduction. 
Consequently, these effects must be interpreted with caution, considering the broader labor market and health system disruptions during the crisis.\u003c/p\u003e \u003cp\u003eIn conclusion, supervised machine learning represents a robust framework for enhancing occupational health surveillance. These models can support the early identification of workers at high risk of mental health\u0026ndash;related absence, thereby informing more targeted preventive strategies and optimizing resource allocation.\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003e\u003cstrong\u003eAcknowledgments\u003c/strong\u003e The authors thank the Federal University of the State of Rio de Janeiro (Universidade Federal do Estado do Rio de Janeiro – UNIRIO) for institutional support throughout the development of this study. This research forms part of the undergraduate thesis of Beatriz Queiroz Reis, supervised by Prof. Letícia Martins Raposo.\u0026nbsp;\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthor contributions\u0026nbsp;\u003c/strong\u003eBQR contributed to conceptualization, data curation, formal analysis, investigation, methodology, project administration, resources, software development, validation, visualization, and writing of the original draft, as well as to the review and editing of the manuscript. LMR contributed to conceptualization, investigation, methodology, project administration, resources, supervision, validation, and writing – review and editing. 
Both authors approved the final version of the manuscript.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u0026nbsp;\u003c/strong\u003eNo funding was received for this work.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eData availability\u0026nbsp;\u003c/strong\u003eThe data are available on https://github.com/leticiaraposo/ml-work-sickness-absence-brazil.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConflict of interest\u0026nbsp;\u003c/strong\u003eThere are no known competing interests.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eEthics approval\u0026nbsp;\u003c/strong\u003eThis study used fully anonymized, publicly available secondary data from the Brazilian National Notifiable Diseases Information System (SINAN). Because the dataset contains no identifiable personal information and does not allow re-identification of individuals, ethical review was not required. In accordance with Resolution No. 510/2016 of the Brazilian National Research Ethics Commission, studies based solely on public, anonymized data are exempt from review by a Research Ethics Committee.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eNotes on AI use\u0026nbsp;\u003c/strong\u003eLarge Language Models (LLMs) were used to assist in English translation, academic editing, and structural formatting of the manuscript. All analytical decisions, results interpretation, and final content verification were performed solely by the authors.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\n\u003cli\u003eAx\u0026eacute;n I, Bj\u0026ouml;rk Br\u0026auml;mberg E, Vaez M, et al (2020) Interventions for common mental disorders in the occupational health service: a systematic review with a narrative synthesis. Int Arch Occup Environ Health 93:823\u0026ndash;838. 
https://doi.org/10.1007/s00420-020-01535-4\u003c/li\u003e\n\u003cli\u003eBarros-Areal AF, Albuquerque CP, Silva NM, et al (2022) Impact of COVID-19 on the mental health of public university hospital workers in Brazil: A cohort-based analysis of 32,691 workers. PLOS ONE 17:. https://doi.org/10.1371/journal.pone.0269318\u003c/li\u003e\n\u003cli\u003eBastos MLA, Silva de Carvalho TG, Mattos Lacerda E, Monteiro Ferreira MJ (2023) Absenteeism Due to Mental Disorders in Agents Fighting Endemic Diseases in Cear\u0026aacute;/Northeast Brazil. Journal of Occupational and Environmental Medicine 65:. https://doi.org/10.1097/JOM.0000000000002881\u003c/li\u003e\n\u003cli\u003eCanty A, Ripley B (2025) boot: Bootstrap Functions\u003c/li\u003e\n\u003cli\u003eCorporation M, Weston S (2022) doParallel: Foreach Parallel Adaptor for the \u0026ldquo;parallel\u0026rdquo; Package\u003c/li\u003e\n\u003cli\u003ede Ara\u0026uacute;jo GS, de Oliveira CR, Palmeira RGS, et al (2024) Epidemiological profile of work-related mental disorders in Para\u0026iacute;ba from 2015\u0026ndash;2023. In: V Seven International Multidisciplinary Congress, Anais SEV7N. SEV7N\u003c/li\u003e\n\u003cli\u003ede Souza Mendon\u0026ccedil;a PB, Costa IB, de G\u0026oacute;es e Silva Faustino da Costa VC, de Castro JL (2024) Sick leave due to occupational mental disorders in Brazil Northeastern states: an ecological study. Revista Brasileira de Medicina do Trabalho 22:. https://doi.org/10.47626/1679-4435-2022-1007\u003c/li\u003e\n\u003cli\u003eFantazia M, Bernardes J, Dias A (2018) Profile of illness among workers of a university campus in the state of S\u0026atilde;o Paulo: analysis of illness-related absenteeism. In: Occupational and Environmental Medicine\u003c/li\u003e\n\u003cli\u003eGreenwell B (2024) fastshap: Fast Approximate Shapley Values\u003c/li\u003e\n\u003cli\u003eHalonen JI, Hiilamo A, Butterworth P, et al (2020) Psychological distress and sickness absence: Within- versus between-individual analysis. 
Journal of Affective Disorders 264:333\u0026ndash;339. https://doi.org/10.1016/j.jad.2020.01.006\u003c/li\u003e\n\u003cli\u003eHelgesson M, Pettersson E, Linds\u0026auml;ter E, et al (2024) Trajectories of work disability among individuals with anxiety-, mood/affective-, or stress-related disorders in a primary healthcare setting. BMC Psychiatry 24:. https://doi.org/10.1186/s12888-024-06068-5\u003c/li\u003e\n\u003cli\u003eKatarya R, Maan S (2020) Predicting Mental health disorders using Machine Learning for employees in technical and non-technical companies. In: 2020 IEEE International Conference on Advances and Developments in Electrical and Electronics Engineering (ICADEE). IEEE, pp 1\u0026ndash;5\u003c/li\u003e\n\u003cli\u003eLe\u0026atilde;o FVG, Mesquita AR, de Oliveira Gotelipe LG, Menezes de P\u0026aacute;dua C (2021) Use of psychotropic drugs among workers on leave due to mental disorders. Einstein (S\u0026atilde;o Paulo) 19:. https://doi.org/10.31744/einstein_journal/2021AO5506\u003c/li\u003e\n\u003cli\u003eLiaw A, Wiener M (2002) Classification and Regression by randomForest. R News 2:18\u0026ndash;22\u003c/li\u003e\n\u003cli\u003eMayer M (2025) shapviz: SHAP Visualizations\u003c/li\u003e\n\u003cli\u003eMelo BF, Santos KOB, Stock S, et al (2023) Mental disorders in judicial workers: analysis of sickness absence in a cohort study. Revista de Sa\u0026uacute;de P\u0026uacute;blica 57:. https://doi.org/10.11606/s1518-8787.2023057004737\u003c/li\u003e\n\u003cli\u003eMienye ID, Jere N (2024) A Survey of Decision Trees: Concepts, Algorithms, and Applications. IEEE Access 12:86716\u0026ndash;86727. https://doi.org/10.1109/access.2024.3416838\u003c/li\u003e\n\u003cli\u003eMitravinda KM, Nair DS, Srinivasa G (2023) Mental Health in Tech: Analysis of Workplace Risk Factors and Impact of COVID-19. SN Computer Science 4:. 
https://doi.org/10.1007/s42979-022-01613-z\u003c/li\u003e\n\u003cli\u003eReddy US, Thota AV, Dharun A (2018) Machine Learning Techniques for Stress Prediction in Working Employees. In: 2018 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC). IEEE, pp 1\u0026ndash;4\u003c/li\u003e\n\u003cli\u003eRobin X, Turck N, Hainard A, et al (2011) pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12:77\u003c/li\u003e\n\u003cli\u003eRugulies R, Aust B, Greiner BA, et al (2023) Work-related causes of mental health conditions and interventions for their improvement in workplaces. The Lancet 402:1368\u0026ndash;1381. https://doi.org/10.1016/S0140-6736(23)00869-3\u003c/li\u003e\n\u003cli\u003eSanine PR, da Silva Godoi LP, da Costa Rosa TE, et al (2024) Factors associated with the hospitalization of users referred from primary health care to follow-up in Psychosocial Care Centers in the city of S\u0026atilde;o Paulo, Brazil. Ci\u0026ecirc;ncia \u0026amp; Sa\u0026uacute;de Coletiva 29:. https://doi.org/10.1590/1413-81232024292.19932022\u003c/li\u003e\n\u003cli\u003eTerluin B, Van Rhenen W, Anema JR, Taris TW (2011) Psychological symptoms and subsequent sickness absence. Int Arch Occup Environ Health 84:825\u0026ndash;837. https://doi.org/10.1007/s00420-011-0637-4\u003c/li\u003e\n\u003cli\u003eTherneau T, Atkinson B (2025) rpart: Recursive Partitioning and Regression Trees\u003c/li\u003e\n\u003cli\u003evan der Plaat DA, Edge R, Coggon D, et al (2021) Impact of COVID-19 pandemic on sickness absence for mental ill health in National Health Service staff. BMJ Open 11:. https://doi.org/10.1136/bmjopen-2021-054533\u003c/li\u003e\n\u003cli\u003eVieira KM, Potrich ACG, Bressan AA, Klein L (2021) Loss of financial well-being in the COVID-19 pandemic: Does job stability make a difference? Journal of Behavioral and Experimental Finance 31:. 
https://doi.org/10.1016/j.jbef.2021.100554

Keywords: Occupational Health, Mental Disorders, Sick Leave, Machine Learning, Epidemiological Monitoring, Occupational Exposure

Posted: December 23rd, 2025 (version 1)
