Machine learning models for predicting childhood anemia in Mozambique: analysis from national survey data

2025 · doi:10.21203/rs.3.rs-7918744/v1

Research Article
Ana Raquel Manuel Gotine, Audencio Victor, Sancho Pedro Xavier

This is a preprint; it has not been peer reviewed by a journal. This work is licensed under a CC BY 4.0 License.
Status: Published. Journal publication published 25 Apr, 2026 in BMC Pediatrics.

Abstract

Background
Childhood anemia remains a major public health concern in sub-Saharan Africa, with Mozambique among the most affected countries. Despite the growing use of machine learning (ML) to enhance disease prediction, there is a lack of national-level evidence on its application to childhood anemia in low-resource settings.
This study aimed to develop, compare, and interpret ML models to predict anemia among children under five years of age in Mozambique using nationally representative survey data.

Methods
Data were extracted from the 2022–2023 Mozambique Demographic and Health Survey (MDHS). Children under five were included, and anemia was defined as a hemoglobin concentration < 11.0 g/dL. Five ML models were developed and validated: Logistic Regression, Random Forest, XGBoost, LightGBM, and CatBoost. The predictive performance of each model was assessed using the AUC-ROC and the calibration slope, and SHAP analysis was used for interpretability.

Results
Among the 1,638 children analyzed, the prevalence of anemia was high (40.3%). XGBoost showed the best discrimination (AUC-ROC = 0.722), with a specificity of 93.2% and, at the optimized threshold of 0.40, a sensitivity of 57.6%. The SHAP analysis identified the child's age in months, lack of vitamin A supplementation, low household wealth, maternal education, number of children, and recent diarrhea as the strongest predictors.

Conclusion
Overall, the ensemble boosting models showed the highest discriminatory capacity, with XGBoost performing best, which highlights the potential of interpretable, low-cost predictive models to support early screening and targeted interventions for childhood anemia. Future work should explore regional retraining and recalibration, fairness evaluation, transfer learning, and external validation to enhance generalizability and field applicability.

Keywords: Childhood anaemia; Machine learning; Predictive modelling; Mozambique; Public health

Introduction
Anemia is one of the most prevalent hematological conditions and one of those with the greatest impact on global health [1, 2]. In 2021, the global prevalence of anemia across all ages was estimated at 24.3%, with children under five and populations in sub-Saharan Africa being the most affected [3].
Since 2019, the burden of years lived with disability due to anemia in children under five has remained persistently high [2, 3], reflecting slow progress towards the Sustainable Development Goals (SDGs) target of reducing anemia by 50% by 2030. According to the World Health Organization (WHO), sub-Saharan Africa remains one of the most affected regions, with around 103 million anemic children [4].

Anemia is defined as a reduction in hemoglobin concentration or in red blood cell function that compromises the blood's ability to transport oxygen efficiently [5, 6]. It manifests clinically with symptoms such as fatigue, weakness and dyspnea [7]. In children, the consequences are even more serious and include delayed cognitive and motor development, greater vulnerability to infections such as malaria, helminthiases and respiratory infections, and an increased risk of early mortality [5, 6, 8].

In Mozambique, childhood anemia remains a major public health challenge, with a prevalence of 77.7% among children under five and variation between provinces [9]. Cabo Delgado (86.2%) has the highest prevalence, while Maputo (70.2%) has the lowest [10–12]. Despite the value of these studies, little research has explored individual-level prediction of anemia risk in children from multiple sociodemographic, nutritional and clinical determinants, an approach that would be particularly relevant in settings with limited laboratory capacity.

In recent years, machine learning (ML) methods have become established as powerful tools for analyzing health data, because they can capture complex, non-linear relationships between variables and can improve predictive performance compared with traditional statistical methods [13, 14].
In various areas of epidemiology and public health, ML applications have shown promise for risk stratification, automated screening and efficient resource allocation, ranging from the prediction of weight gain and underweight to infectious diseases, cardiovascular diseases and cancers [15–20]. Studies applying ML to anemia have also shown promising results [21–23]. Recent studies have used ML techniques to predict childhood anemia in settings such as Nigeria, Ethiopia and Ghana, using data from national Demographic and Health Surveys (DHS) [21–24]. Despite these advances, significant gaps remain in low-income countries. In Mozambique, to date, no studies have been identified that use national samples to predict childhood anemia with these approaches. This study therefore seeks to fill this gap by developing and validating multiple ML models to predict anemia in children under five, using data from the 2022–2023 Demographic and Health Survey (DHS).

Methods
This study used secondary data from the 2022–2023 Mozambique Demographic and Health Survey (MDHS). The DHS is a nationally representative household survey that collects information on women aged 15–49 years and their children born in the five years preceding the survey. It covers all provinces of the country, except for eight districts in Cabo Delgado that were excluded for security reasons, and includes more than 16,000 households. The data used in this study were obtained from the Children's Recode (KR) file, which contains one record for each child born to an interviewed woman in the five years preceding the survey. This dataset enables the assessment of key maternal and child health indicators, including mortality, malnutrition, anemia, vaccination coverage, child development, and health service utilization.
From an initial pool of 9,289 observations, 1,638 records were retained for analysis after excluding those with missing or incomplete data, following a complete-case analysis approach to ensure data integrity and consistency across variables.

Outcome
The outcome variable was the presence or absence of anemia among children under five years of age, coded as 1 for anemic and 0 for non-anemic. According to World Health Organization (WHO) criteria, anemia in this age group is defined as a hemoglobin concentration below 11.0 g/dL [25].

Predictor variables
Predictors were selected based on their availability and accessibility and on previously published studies from Africa [12, 21–23]. They include the mother's education, number of children, sex of the household head, wealth index, maternal age, maternal hemoglobin, sex of the child, child's size at birth, recent diarrhea, fever or cough in the last two weeks, vitamin A supplementation, child's age in months, and weeks of pregnancy. All predictors are routinely collected, low-cost, and publicly available, which supports the feasibility and scalability of the proposed models in resource-limited settings.

Model Design
As shown in Fig. 1, several machine learning models were tested to evaluate whether alternative specifications improved predictive performance for the outcome variable, the presence or absence of anemia in children under five years of age. Each model was trained independently, without information sharing between algorithms, to ensure an unbiased performance comparison. Before fitting the models, the numerical variables were standardized using StandardScaler, which transforms each variable to have a mean of zero and a standard deviation of one. Non-binary categorical variables were converted into dummy variables using one-hot encoding (Table 1). Observations with missing data on key variables were excluded from the analysis.
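The complete-case filtering, outcome coding and predictor encoding described above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the column names (`age_months`, `wealth_index`, `sex_child`, `hb`) and the toy rows are hypothetical, and hemoglobin is assumed to be already expressed in g/dL.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

HB_CUTOFF_G_DL = 11.0  # WHO anemia cut-off for children aged 6-59 months


def complete_cases(df, numeric, categorical, hb_col):
    """Keep rows with no missing predictor or hemoglobin value; code the outcome."""
    cc = df.dropna(subset=list(numeric) + list(categorical) + [hb_col])
    # 1 = anemic (Hb < 11.0 g/dL), 0 = non-anemic
    y = (cc[hb_col] < HB_CUTOFF_G_DL).astype(int).to_numpy()
    return cc, y


def make_preprocessor(numeric, categorical):
    """Standardize numeric predictors and one-hot encode categorical ones."""
    return ColumnTransformer(
        [
            # rescale to mean 0, standard deviation 1
            ("num", StandardScaler(), list(numeric)),
            # drop="if_binary" keeps a single column for two-level variables
            # and creates one dummy per level for non-binary variables
            ("cat", OneHotEncoder(drop="if_binary"), list(categorical)),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )


# Toy example with hypothetical column names; the last row is incomplete.
df = pd.DataFrame({
    "age_months": [8, 14, 30, 45, 20],
    "wealth_index": ["Poorest", "Middle", "Richest", "Poorer", None],
    "sex_child": ["Female", "Male", "Male", "Female", "Male"],
    "hb": [10.2, 11.5, 12.0, 9.8, 10.9],
})
numeric, categorical = ["age_months"], ["wealth_index", "sex_child"]

cc, y = complete_cases(df, numeric, categorical, "hb")
pre = make_preprocessor(numeric, categorical)
X = pre.fit_transform(cc)
print(X.shape, y)  # 4 complete cases; 1 scaled + 4 wealth dummies + 1 sex column
```

Fitting the transformer on all complete cases, as in this toy run, is only for illustration; in a train/test workflow it would normally be fitted on the training split alone so that test-set information does not leak into the scaling parameters.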
Table 1. Standardization and creation of dummy variables
Feature type | Features | Processing
Numeric | Age, Mother age, Weeks pregnancy, Number of children | Standardization (StandardScaler)
Categorical | Sex child, Cough last two weeks, Diarrhea last two weeks, Fever last two weeks, Vitamin A supplement, Mother education, Sex of household head, Wealth index | One-hot encoding if not binary

Table 2. Child, maternal, and demographic characteristics according to anemia status
Variable | Total (n = 1,638), n (%) or mean (SD) | With anemia (n = 660) | Without anemia (n = 978) | p-value
Age (months), mean (SD) | 1,638 (100) | 9.11 (10.5) | 13.6 (9.1) | < 0.0001
Sex of child | | | | 0.433
  Female | 843 (51.5) | 348 (52.7) | 495 (50.6) |
  Male | 795 (48.5) | 312 (47.3) | 483 (49.4) |
Cough in last two weeks | | | | 0.2116
  Yes | 223 (13.6) | 86 (13.0) | 137 (14.0) |
  No | 1,408 (86.0) | 569 (86.2) | 839 (85.8) |
  Don't know | 7 (0.4) | 5 (0.8) | 2 (0.2) |
Diarrhea in last two weeks | | | | < 0.0001
  Yes | 221 (13.5) | 61 (9.2) | 160 (16.4) |
  No | 1,405 (85.8) | 590 (89.4) | 815 (83.3) |
  Don't know | 12 (0.7) | 9 (1.4) | 3 (0.3) |
Fever in last two weeks | | | | 0.0004
  Yes | 204 (12.5) | 60 (9.1) | 144 (14.7) |
  No | 1,423 (86.9) | 592 (89.7) | 831 (85.0) |
  Don't know | 11 (0.7) | 8 (1.2) | 3 (0.3) |
Vitamin A supplement | | | | < 0.0001
  Yes | 861 (52.6) | 286 (43.3) | 575 (58.8) |
  No | 759 (46.3) | 364 (55.2) | 395 (40.4) |
  Don't know | 18 (1.1) | 10 (1.5) | 8 (0.8) |
Mother's age (years), mean (SD) | 1,638 (100) | 26.95 (7.28) | 27.51 (7.40) | 0.130
Weeks of pregnancy, mean (SD) | 1,638 (100) | 35.96 (1.18) | 36.02 (1.02) | 0.283
Mother's education | | | | 0.015
  No education | 434 (26.5) | 161 (24.4) | 273 (27.9) |
  Primary | 710 (43.3) | 275 (41.7) | 435 (44.5) |
  Secondary | 450 (27.5) | 199 (30.2) | 251 (25.7) |
  Higher | 44 (2.7) | 25 (3.8) | 19 (1.9) |
Number of children, mean (SD) | 1,638 (100) | 1.85 (1.02) | 1.87 (0.92) | 0.604
Sex of household head | | | | 0.649
  Female | 478 (29.2) | 188 (28.5) | 290 (29.7) |
  Male | 1,160 (70.8) | 472 (71.5) | 688 (70.3) |
Wealth index | | | | < 0.0001
  Poorest | 352 (21.5) | 124 (18.8) | 228 (23.3) |
  Poorer | 259 (15.8) | 94 (14.2) | 165 (16.9) |
  Middle | 280 (17.1) | 108 (16.4) | 172 (17.6) |
  Richer | 377 (23.0) | 144 (21.8) | 233 (23.8) |
  Richest | 370 (22.6) | 190 (28.8) | 180 (18.4) |
Data are expressed as mean (standard deviation) or frequency (%). p-values were obtained from Student's t-test or the Mann–Whitney test for continuous variables with or without a normal distribution, respectively, and from Pearson's chi-square test for categorical variables.

Model selection
Five algorithms were tested: logistic regression, CatBoost [26], XGBoost [27], LightGBM [28] and Random Forest. CatBoost, XGBoost and LightGBM were fitted with their respective Python packages, and the remaining algorithms with the scikit-learn library [29]. The boosting algorithms were chosen because of their high performance on tabular data, as demonstrated by several studies [30, 31].

Selection of hyperparameters
Model training used 5-fold cross-validation to estimate overall performance and generalization capacity. To explore the hyperparameter space efficiently and identify the optimal configuration of each model, grid search was used (see Table 3 in the supplementary material) [32]. For this purpose, the data were divided into a training set (70%) and a test set (30%), on which the trained models were evaluated.

Table 3. Performance metrics of the prediction models after feature selection with Boruta
Model | AUC-ROC | Precision | Specificity | Recall | F1 | MCC
XGBoost | 0.722 | 0.811 | 0.932 | 0.429 | 0.561 | 0.432
CatBoost | 0.716 | 0.767 | 0.905 | 0.465 | 0.579 | 0.422
LightGBM | 0.711 | 0.918 | 0.802 | 0.490 | 0.608 | 0.465
Random Forest | 0.716 | 0.781 | 0.915 | 0.449 | 0.571 | 0.424
Logistic regression | 0.676 | 0.649 | 0.844 | 0.429 | 0.517 | 0.303

Best threshold (0.40) | Precision | Recall | F1 | MCC
XGBoost | 0.655 | 0.576 | 0.613 | 0.381
CatBoost | 0.644 | 0.586 | 0.614 | 0.375
LightGBM* | 0.802 | 0.626 | 0.608 | 0.464
Random Forest | 0.623 | 0.601 | 0.612 | 0.358
Logistic regression* | 0.456 | 0.823 | 0.587 | 0.180
*The best threshold for logistic regression was 0.30. AUC-ROC, area under the receiver operating characteristic curve; Recall, sensitivity; F1, F1-score; MCC, Matthews correlation coefficient.
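The 70/30 split and the 5-fold cross-validated grid search described under "Selection of hyperparameters" can be sketched with scikit-learn's `train_test_split` and `GridSearchCV`. This is an illustrative sketch, not the authors' pipeline: it uses synthetic data from `make_classification` in place of the MDHS matrix, a reduced subset of the Random Forest grid from the hyperparameter table (to keep the run short), AUC-ROC as the tuning score, and a stratified split. The paper does not state whether its split was stratified or which scoring function the grid search used, so those choices are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split

# Synthetic stand-in for the preprocessed predictors (about 40% positives).
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           weights=[0.6, 0.4], random_state=0)

# 70% training / 30% test; stratification keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

# Reduced subset of the Random Forest grid from the hyperparameter table.
param_grid = {
    "n_estimators": [100],
    "max_depth": [10, 15],
    "min_samples_leaf": [5, 15],
    "max_features": ["sqrt", "log2"],
}

# 5-fold cross-validated grid search, run on the training portion only.
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    scoring="roc_auc",
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
)
search.fit(X_train, y_train)

# With refit=True (the default), the best configuration is refitted on the
# whole training set; the held-out test set is used only for final evaluation.
best_model = search.best_estimator_
test_auc = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])
print(search.best_params_, round(test_auc, 3))
```

The same pattern applies to the boosting models by swapping in their estimators and the corresponding grids from the hyperparameter table.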
Table 3. Hyperparameter grids and best hyperparameters selected through grid search
Model | Hyperparameter grid | Best hyperparameters
XGBoost | 'n_estimators': (200, 500, 100), 'max_depth': (3, 5, 7, 9), 'learning_rate': (0.01, 0.05, 0.1), 'subsample': (0.7, 0.8, 0.9), 'colsample_bytree': (0.7, 0.8, 0.9), 'gamma': (0.0, 0.1, 0.2, 0.3) | 'colsample_bytree': 0.7, 'gamma': 0.3, 'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 200, 'subsample': 0.7
CatBoost | 'n_estimators': (200, 500, 100), 'max_depth': (3, 5, 7), 'learning_rate': (0.01, 0.05, 0.1), 'grow_policy': ('Depthwise', 'SymmetricTree') | 'grow_policy': 'Depthwise', 'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 200
LightGBM | 'n_estimators': (100, 200, 500), 'max_depth': (10, 15), 'learning_rate': (0.01, 0.05, 0.1), 'subsample': (0.6, 0.8, 1.0), 'colsample_bytree': (0.6, 0.8, 1.0), 'reg_alpha': (0, 3, 4) | 'colsample_bytree': 0.6, 'learning_rate': 0.1, 'max_depth': 10, 'n_estimators': 100, 'reg_alpha': 4, 'subsample': 0.6
Random Forest | 'n_estimators': (100, 500, 200), 'max_depth': (10, 15), 'min_samples_split': (0.005, 0.05, 5), 'min_samples_leaf': (5, 15, 25), 'max_features': ('sqrt', 'log2'), 'criterion': ('gini', 'entropy') | 'criterion': 'entropy', 'max_depth': 10, 'max_features': 'sqrt', 'min_samples_leaf': 5, 'min_samples_split': 0.05, 'n_estimators': 100
Logistic regression | 'penalty': ('l2', None), 'C': np.logspace(-3, 3, 10) | 'C': 0.1, 'penalty': 'l2'

Model validation and performance metrics
Model performance was evaluated on the test set using accuracy, precision, specificity, recall, F1-score, the Matthews correlation coefficient (MCC), and the AUC-ROC. The AUC-ROC, which measures a model's ability to discriminate between positive and negative cases, was the main metric used to compare the models. When two models showed similar discrimination, the calibration slope was used to select the better-calibrated model as the final model.
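The metrics and the calibration slope described above can be computed from a vector of predicted probabilities, as sketched below. Assumptions, not taken from the paper: specificity is derived from the confusion matrix (scikit-learn has no dedicated specificity function); the calibration slope is taken as the coefficient of the linear predictor log(p/(1-p)) in a logistic regression of the observed outcome, fitted here by Newton-Raphson in NumPy; and the simulated probabilities are purely illustrative. This is not the authors' code.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score, matthews_corrcoef,
                             precision_score, recall_score, roc_auc_score)


def threshold_metrics(y_true, proba, threshold=0.5):
    """Discrimination and classification metrics at a given decision threshold."""
    y_pred = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "auc_roc": roc_auc_score(y_true, proba),  # does not depend on the threshold
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "specificity": tn / (tn + fp),
        "recall": recall_score(y_true, y_pred),  # sensitivity
        "f1": f1_score(y_true, y_pred),
        "mcc": matthews_corrcoef(y_true, y_pred),
    }


def calibration_slope(y_true, proba, max_iter=50, tol=1e-10):
    """Slope b of the recalibration model logit P(y=1) = a + b * logit(p).

    b close to 1: the spread of predicted risks is appropriate;
    b < 1: predictions are too extreme; b > 1: predictions are too moderate.
    """
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(proba, dtype=float), 1e-12, 1 - 1e-12)
    design = np.column_stack([np.ones_like(p), np.log(p / (1 - p))])
    beta = np.zeros(2)
    for _ in range(max_iter):  # Newton-Raphson on the logistic log-likelihood
        mu = 1.0 / (1.0 + np.exp(-design @ beta))
        grad = design.T @ (y - mu)
        hess = design.T @ (design * (mu * (1 - mu))[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta[1]


# Simulated example: outcomes drawn from known risks.
rng = np.random.default_rng(0)
risk = rng.uniform(0.05, 0.95, size=20_000)
y_sim = rng.binomial(1, risk)
too_extreme = 1 / (1 + np.exp(-2 * np.log(risk / (1 - risk))))  # logit doubled

slope_ok = calibration_slope(y_sim, risk)               # expected close to 1
slope_extreme = calibration_slope(y_sim, too_extreme)   # expected close to 0.5
at_default = threshold_metrics(y_sim, risk, 0.5)
at_lower = threshold_metrics(y_sim, risk, 0.4)  # lower threshold: recall up, specificity down
```

Comparing `at_default` with `at_lower` mirrors the two blocks of Table 3: lowering the threshold to 0.40 can only raise recall and lower specificity, while the AUC-ROC is unchanged because it is computed from the probabilities themselves.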
The model with the best performance and good calibration was then analyzed with Shapley Additive Explanations (SHAP) (Chen; Lundberg; Lee, 2022). SHAP values quantify the average marginal contribution of each feature to a prediction, providing a theoretically consistent, model-agnostic explanation of how individual features influence the predicted probability of the positive class. All analyses adhered to the TRIPOD + AI guidelines for transparent reporting of machine learning-based prediction models [33].

Ethical approval
The researchers obtained approval to use the survey data from the USAID DHS Program after registering at https://www.dhsprogram.com/data/dataset_admin/login_main.cfm, and maintained the confidentiality and privacy of the data. Clinical trial number: not applicable.

Results
Among the 1,638 children included in the study, the mean age was significantly lower among those with anemia (9.1 ± 10.5 months) than among those without anemia (13.6 ± 9.1 months; p < 0.0001). The distribution of sex was similar between groups (p = 0.433). Recent episodes of diarrhea (p < 0.0001) and fever (p = 0.0004) were more frequently reported among non-anemic children. Vitamin A supplementation was significantly more common among children without anemia (58.8%) than among those with anemia (43.3%; p < 0.0001). Maternal age and gestational age did not differ significantly between groups. Maternal education showed a significant association (p = 0.015): secondary and higher education were proportionally more common among mothers of anemic children (30.2% and 3.8%) than among mothers of non-anemic children (25.7% and 1.9%). The household wealth index was also significantly associated with anemia status (p < 0.0001) (Table 2).

Table 3 and Figure 2 show the performance of the prediction models after applying the Boruta algorithm for feature selection. In all cases, both before and after Boruta, XGBoost remained the best model based on the AUC-ROC.
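According to the note under Table 2, the p-values for categorical variables quoted in the Results above come from Pearson's chi-square test. A dependency-free sketch of that test is shown below. It is an illustration rather than the authors' code: it applies no continuity correction, and its p-value uses closed forms of the chi-square survival function that are valid only for 1 or 2 degrees of freedom (a three-level variable such as cough, compared across the two anemia groups, has 2). Applied to the cough and fever rows of Table 2, it gives p-values that round to the reported 0.2116 and 0.0004.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, p-value). No continuity correction
    is applied. The p-value uses closed-form survival functions that exist
    only for 1 or 2 degrees of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2))  # P(chi2_1 > stat) = P(|Z| > sqrt(stat))
    elif df == 2:
        p = math.exp(-stat / 2)  # chi2 with 2 df is exponential with mean 2
    else:
        raise ValueError("closed-form p-value only implemented for df = 1 or 2")
    return stat, df, p


# Counts from Table 2. Rows: Yes / No / Don't know;
# columns: with anemia (n = 660) / without anemia (n = 978).
cough = [[86, 137], [569, 839], [5, 2]]
fever = [[60, 144], [592, 831], [8, 3]]

_, df_cough, p_cough = pearson_chi2(cough)
_, df_fever, p_fever = pearson_chi2(fever)
print(df_cough, round(p_cough, 4), round(p_fever, 4))
```

For tables with more degrees of freedom (for example, the five wealth quintiles), a general chi-square survival function such as the one in SciPy would be needed instead of the closed forms.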
After feature selection, XGBoost achieved an AUC-ROC of 0.722, higher than those observed for CatBoost (0.716), LightGBM (0.711), Random Forest (0.716), and logistic regression (0.676). Regarding the other metrics, XGBoost showed high specificity (0.932) but low sensitivity (0.429); at the optimized threshold of 0.40, its sensitivity increased to 0.576, with an F1-score of 0.613 and an MCC of 0.381. The precision-recall curves indicated that XGBoost and CatBoost achieved the largest area under the curve (Figure 3). Notably, XGBoost demonstrated superior calibration (Figure 4). The SHAP analysis identified the child's age as the most important predictor of anemia, followed by vitamin A supplementation and socioeconomic factors such as household wealth, as illustrated in Figure 5, Supplementary Figure 2 and Figure S4.

Discussion
Our findings show that the ML models, especially XGBoost, had the best overall performance, with an AUC-ROC of 0.722, followed by CatBoost, LightGBM and Random Forest, all outperforming traditional logistic regression in predicting anemia in children. These results support existing evidence that ensemble boosting methods are better at capturing non-linear relationships, interactions between multiple predictors and complex hierarchical effects, which are typical of epidemiological and demographic data [22, 23, 30, 31, 34]. Boosting models have also often shown better calibration, as was observed in this study. Poorly calibrated models can lead to incorrect decisions, resulting in over- or under-triage [35]. These findings reinforce the applicability of ML to individual prediction of anemia in settings with limited laboratory resources, such as Mozambique.

A study using data from Nigeria's 2018 DHS applied 16 ML algorithms, including Decision Tree, Random Forest, Gradient Boosting, Extra Trees and AdaBoost, and found that the Extra Trees Classifier performed best (AUC = 0.83) [24].
In Ghana, Siddiqa et al. [22] reported that ML models based on XGBoost and Random Forest achieved high accuracy (> 80%). Similarly, in Bangladesh, Khan et al. [18] compared XGBoost, LightGBM, CatBoost, Random Forest and logistic regression, reporting an AUC > 0.80, with the child's age, household wealth, maternal education and nutritional status as the most important predictors of anemia. In Ethiopia, Yimer et al. [21] and Kassaw et al. [23] applied multiple ML algorithms to the 2016 DHS data, and Random Forest showed the best performance (AUC = 0.818). On the other hand, another study found that, in certain Ethiopian subsamples, logistic regression outperformed more complex algorithms, with an AUC close to 0.69, suggesting that the relative performance of models depends on the distribution of variables, sample size and class balance [36]. This finding illustrates the so-called "No Free Lunch" theorem, according to which no algorithm is universally superior across all data contexts, and it reinforces the importance of comparing multiple models and tuning hyperparameters locally before operational use [37, 38].

In this study, after Boruta feature selection, the SHAP analysis identified the child's age as the most important predictor. This finding is consistent with other settings: the child's age was also the most important predictor in Ghana [22] and in Ethiopia [21, 23], particularly for children aged 6 to 23 months [36]. This pattern may be explained by the high vulnerability of younger children to anemia: rapid growth and increased nutritional needs in the first two years of life, together with greater susceptibility to infections and limited dietary diversity, are critical factors that influence hemoglobin levels in this age group [39].
The second most important predictor was vitamin A supplementation. Vitamin A plays an essential role in erythropoiesis, regulating the differentiation of hematopoietic stem cells and favoring the mobilization and use of iron in tissues [40]. It also has immunomodulatory and antioxidant effects, reducing susceptibility to infections that often compromise iron metabolism and hemoglobin synthesis [40]. Other important predictors were socioeconomic factors, such as the household wealth index, the number of children under five in the household and maternal schooling, as well as nutritional and clinical factors such as recent diarrhea. Studies using Boruta have identified the number of children in the household, distance from the health center, health insurance coverage, disposal of the youngest child's stools, type of cooking fuel and rotavirus vaccination as important predictors [21, 23].

One of the main strengths of this study is the use of simple, inexpensive and easily collected variables. Using accessible indicators such as the child's age, maternal schooling, household wealth, vitamin A supplementation and a history of recent diarrhea, the model achieved good predictive performance without relying on laboratory biomarkers or expensive tests. This supports its operational viability, especially in resource-limited settings where access to hemoglobin testing is restricted. In addition, the study combined a variable selection method (Boruta) with interpretation via SHAP, which enhances transparency and interpretability, fundamental aspects of the ethical and safe adoption of artificial intelligence in health.

From an applied point of view, the ML models developed in this study have the potential to support screening in primary care centers by flagging children at higher risk of anemia and guiding prioritized screening and supplementation.
However, before use in the field, the models must be retrained and recalibrated for specific subpopulations, taking into account regional and socioeconomic differences between provinces and between rural and urban areas, to minimize bias and systematic errors. It is also essential to evaluate algorithmic fairness, to ensure that the predictions do not underestimate the risk among the most vulnerable groups, such as children from poor families or from regions with less access to health services [33]. Conducting equity audits and incorporating fairness metrics are recommended steps before large-scale implementation. In addition, transfer learning is a promising strategy for adapting models trained in Mozambique to other sub-Saharan African countries with similar epidemiological contexts, saving time and computational resources and increasing the external validity of predictions.

The study has several limitations. The sample size was reduced by the exclusion of observations with missing data, which can affect the stability of the models. The DHS also lacks clinical and laboratory variables, such as ferritin, transferrin and C-reactive protein, which could improve discriminatory capacity. The cross-sectional design prevents causal inference and entails possible reverse temporality: because exposure and outcome are measured simultaneously, the temporal sequence between predictors and anemia status cannot be determined. Consequently, some associations identified by the models may reflect reverse temporal relationships rather than true causal effects. For example, vitamin A supplementation may appear as a protective factor; however, it is also plausible that children who were already anemic or at higher risk received supplementation as part of routine health interventions.
This ambiguity limits the ability to establish the direction of the observed associations and highlights the need for longitudinal studies to clarify causal pathways. Selection bias is also possible if the children with complete data differed systematically from those excluded. Finally, the models lack external validation, so the results need to be replicated in independent cohorts or in other Mozambican provinces.

Future research should prioritize pilot implementations in primary health services, with regional retraining and recalibration and continuous performance monitoring, as well as prospective validation in independent cohorts. We also recommend exploring integration with mobile nutritional surveillance platforms, incorporating geospatial and environmental data (such as malaria prevalence and sanitation quality), and conducting cost-effectiveness analyses that compare these models with traditional laboratory methods.

Conclusion
In this study, five models (XGBoost, CatBoost, LightGBM, Random Forest and Logistic Regression) were developed and evaluated, with XGBoost showing the best overall performance. The predictors with the greatest impact on anemia were the child's age, not receiving vitamin A, household wealth, the number of children in the household, recent diarrhea and maternal education. These findings suggest that interpretable ML models can be incorporated into nutritional surveillance and public policy planning, supporting early screening of vulnerable groups and the design of targeted, cost-effective interventions, and contributing to the achievement of global anemia reduction targets by 2030.

Declarations
The study did not use primary data; ethical approval is not applicable.

Informed Consent Statement
Not applicable.

Acknowledgements
The authors would like to acknowledge the DHS Program (www.dhsprogram.com) (access date: 10.09.2025) for providing access to all datasets.
Consent for publication
Not applicable.

Availability of data and materials
The data are available at: https://dhsprogram.com/data/dataset_admin/index.cfm

Competing interests
The authors have declared no conflicts of interest.

Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or nonprofit sectors.

Authors' contributions
All authors participated in all stages of the work, from the conception of the study idea to the analysis and interpretation, contributed to drafting and revising the manuscript, and agreed to be accountable for all aspects of the work. The research idea was conceptualized by AREMG. All authors reviewed and approved the final manuscript.

References
1. Hunt JM. Forging Effective Strategies to Combat Iron Deficiency. Reversing Productivity Losses from Iron Deficiency: The Economic Case. 2002;1(2):3.
2. Liu Y, Ren W, Wang S, Xiang M, Zhang S, Zhang F. Global burden of anemia and cause among children under five years 1990–2019: findings from the global burden of disease study 2019. Front Nutr. 2024;11. https://doi.org/10.3389/fnut.2024.1474664
3. Gardner WM, Razo C, McHugh TA, Hagins H, Vilchis-Tella VM, Hennessy C, et al. Prevalence, years lived with disability, and trends in anaemia burden by severity and cause, 1990–2021: findings from the Global Burden of Disease Study 2021. Lancet Haematol. 2023;10:e713–34. https://doi.org/10.1016/S2352-3026(23)00160-6
4. World Health Organization. Global Anaemia Estimates, 2021 edition. 2021.
5. Larson LM, Kubes JN, Ramírez-Luzuriaga MJ, Khishen S, Shankar H, Prado A. Effects of increased hemoglobin on child growth, development, and disease: a systematic review and meta-analysis. Ann N Y Acad Sci. 2019;1450:83–104. https://doi.org/10.1111/nyas.14105
6. Ferrari B, Peyvandi F. How I treat thrombotic thrombocytopenic purpura in pregnancy. Blood. 2020;136:2125–32. https://doi.org/10.1182/BLOOD.2019000962
7. Aapro M, Beguin Y, Bokemeyer C, Dicato M, Gascón P, Glaspy J, et al. Management of anaemia and iron deficiency in patients with cancer: ESMO Clinical Practice Guidelines. Ann Oncol. 2018;29:iv96–110. https://doi.org/10.1093/annonc/mdx758
8. McWilliams S, Singh I, Leung W, Stockler S, Ipsiroglu OS. Iron deficiency and common neurodevelopmental disorders—A scoping review. PLoS ONE. 2022;17. https://doi.org/10.1371/journal.pone.0273819
9. DHS. Inquérito Demográfico e de Saúde, 2022–23. 2023.
10. Tekeba B, Wassie M, Mekonen EG, Tamir TT, Aemro A. Spatial distribution and determinants of anemia among under-five children in Mozambique. Sci Rep. 2025;15. https://doi.org/10.1038/s41598-024-83899-y
11. Nazeem Muhajarine DA, Adeyinka, Mbate Matandalasse SC. Inequities in childhood anaemia in Mozambique: results from multilevel Bayesian analysis of 2018 National Malaria Indicator Survey. medRxiv. 2021;1:1–13.
12. Cane RM, Sheffel A, Salomão C, Sambo J, Matusse E, Ismail E, et al. Structural readiness of health facilities in Mozambique: how is Mozambique positioned to deliver nutrition-specific interventions to women and children? J Glob Health Rep. 2023;7. https://doi.org/10.29392/001c.89000
13. Mooney SJ, Pejaver V. Big Data in Public Health: Terminology, Machine Learning, and Privacy. Annu Rev Public Health. 2018;39:95–112. https://doi.org/10.1146/annurev-publhealth-040617-014208
14. Loring Z, Mehrotra S, Piccini JP. Machine learning in big data: Handle with care. Europace. 2019;21:1284–5. https://doi.org/10.1093/europace/euz130
15. Victor A, Geremias dos Santos H, Silva GFS, Barcellos Filho F, de Fátima Cobre A, Luzia LA, et al. Predictive modeling of gestational weight gain: a machine learning multiclass classification study. BMC Pregnancy Childbirth. 2024;24. https://doi.org/10.1186/s12884-024-06952-8
16. Victor A, Almeida F, Xavier SP, Rondó PHC. Predicting low birth weight risks in pregnant women in Brazil using machine learning algorithms: data from the Araraquara cohort study. BMC Pregnancy Childbirth. 2025;25. https://doi.org/10.1186/s12884-025-07351-3
17. Schmidt LJ, Rieger O, Neznansky M, Hackelöer M, Dröge LA, Henrich W, et al. A machine-learning–based algorithm improves prediction of preeclampsia-associated adverse outcomes. Am J Obstet Gynecol. 2022;227:77.e1–77.e30. https://doi.org/10.1016/j.ajog.2022.01.026
18. Khan JR, Chowdhury S, Islam H, Raheem E. Machine Learning Algorithms To Predict The Childhood Anemia In Bangladesh. J Data Sci. 2021;17:195–218. https://doi.org/10.6339/jds.201901_17(1).0009
19. Jeong YS, Jeon M, Park JH, Kim MC, Lee E, Park SY, et al. Machine-learning-based approach to differential diagnosis in tuberculous and viral meningitis. Infect Chemother. 2021;53. https://doi.org/10.3947/IC.2020.0104
20. Al Mudawi N, Alazeb A. A Model for Predicting Cervical Cancer Using Machine Learning Algorithms. Sensors. 2022;22. https://doi.org/10.3390/s22114132
21. Yimer A, Yesuf HA, Ahmed S, Zemariam AB, Mussa E, Sirage N, et al. Optimizing machine learning models for predicting anemia among under-five children in Ethiopia: insights from Ethiopian demographic and health survey data. BMC Pediatr. 2025;25. https://doi.org/10.1186/s12887-025-05659-9
22. Siddiqa M, Shah G, Butt MS, Kamal A, Opoku ST. Early Childhood Anemia in Ghana: Prevalence and Predictors Using Machine Learning Techniques. Children. 2025;12. https://doi.org/10.3390/children12070924
23. Kebede Kassaw A, Yimer A, Abey W, Molla TL, Zemariam AB. The application of machine learning approaches to determine the predictors of anemia among under five children in Ethiopia. Sci Rep. 2023;13. https://doi.org/10.1038/s41598-023-50128-x
24. Ja'afar IK, Uthman OA. Predicting Childhood Anaemia in Nigeria: A Machine Learning Approach to Uncover Key Risk Factors. Public Health Challenges. 2025. https://doi.org/10.1002/puh2.70135
World Health Organization. Haemoglobin concentrations for the diagnosis of anaemia and assessment of severity. 2011. https://iris.who.int/handle/10665/85839. Accessed 29 Jul 2025. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. Adv Neural Inf Process Syst. 2018. Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. Ke G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Adv Neural Inf Process Syst. 2017. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12:2825–30. Borisov V, Leemann T, Seßler K, Haug J, Pawelczyk M, Kasneci G. Deep Neural Networks and Tabular Data: A Survey. 2022. https://doi.org/10.1109/TNNLS.2022.3229161. Shwartz-Ziv R, Armon A. Tabular Data: Deep Learning is Not All You Need. 2021. Youness G, Uyen N, Phan T, Cohen Boulakia B, Cohen B, Bootbogs B. BootBOGS: Hands-on optimizing Grid Search in hyperparameter tuning of MLP. 2023. Collins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD + AI statement: Updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024. https://doi.org/10.1136/bmj-2023-078378. McElfresh D, Khandagale S, Valverde J, C VP, Feuer B, Hegde C, et al. When Do Neural Nets Outperform Boosted Trees on Tabular Data? 2024. Van Calster B, McLernon DJ, Van Smeden M, Wynants L, Steyerberg EW, Bossuyt P, et al. Calibration: The Achilles heel of predictive analytics. BMC Med. 2019;17. https://doi.org/10.1186/s12916-019-1466-7. Tesfaye SH, Seboka BT, Sisay D. Application of machine learning methods for predicting childhood anaemia: Analysis of Ethiopian Demographic Health Survey of 2016. PLoS ONE. 2024;19. https://doi.org/10.1371/journal.pone.0300172. Wolpert DH, Macready WG. No Free Lunch Theorems for Optimization.
1997. Gómez D, Rojas A. An empirical overview of the no free lunch theorem and its effect on real-world machine learning classification. Neural Comput. 2016;28:216–28. https://doi.org/10.1162/NECO_a_00793. Ntenda PAM, Motsa MPS, Ntenda JK, Mbewe RB, Tiruneh FN. Predictors of iron status among preschool-age children in Malawi: insights from a micronutrient survey. Int Health. 2025. https://doi.org/10.1093/inthealth/ihaf054. Semba RD, Bloem MW, Keller H. The anemia of vitamin A deficiency: epidemiology and pathogenesis. Eur J Clin Nutr. 2002;56:271–81. https://doi.org/10.1038/sj.ejcn.1601320.

Additional Declarations: No competing interests reported.

Supplementary Files: Supplementarymaterial.docx

Journal timeline (most recent first): revision requested (editorial decision), 08 Jan, 2026; reviews received at journal, 07 Jan, 2026; reviews received at journal, 06 Jan, 2026; reviewers agreed at journal, 30 Dec, 2025; reviewers agreed at journal, 29 Dec, 2025; reviews received at journal, 15 Dec, 2025; reviewers agreed at journal, 13 Dec, 2025; reviewers agreed at journal, 09 Dec, 2025; reviewers agreed at journal, 07 Dec, 2025; reviewers agreed at journal, 26 Nov, 2025; reviewers invited by journal, 17 Nov, 2025; editor invited by journal, 29 Oct, 2025; editor assigned by journal, 29 Oct, 2025; submission checks completed at journal, 29 Oct, 2025; first submitted to journal, 21 Oct, 2025.
Our growing team is made up of researchers and industry professionals working together to solve the most critical problems facing scientific publishing. Also discoverable on Platform About Our Team In Review Editorial Policies Advisory Board Help Center Resources Author Services Accessibility API Access RSS feed Manage Cookie Preferences © Research Square 2026 | ISSN 2693-5015 (online) Privacy Policy Terms of Service Do Not Sell My Personal Information {"props":{"pageProps":{"initialData":{"identity":"rs-7918744","acceptedTermsAndConditions":true,"allowDirectSubmit":false,"archivedVersions":[],"articleType":"Research Article","associatedPublications":[],"authors":[{"id":548563001,"identity":"1b95f631-c53a-4fda-99ba-468e4aca10c7","order_by":0,"name":"Ana Raquel Manuel Gotine","email":"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAZAAAAAyAQMAAABI0h/eAAAABlBMVEX///8AAABVwtN+AAAACXBIWXMAAA7EAAAOxAGVKw4bAAABB0lEQVRIiWNgGAWjYDADPghlIQciDzwgRgsbhJIwBmtJIEVLYgOIwqeFf3bzsw8/yg7LsYkdfvaYp0IifX7Y4YdAW+zkdBuwa5G4c8x4Zs+5w8Zs0mnmxjxnJHI33k4zAGpJNjY7gMOaGwnGDLxthxPbpBPMpHnbgFpmJ4C0HEjchkOL/I30z4x/2w7Xt0mnf5Pm/SeRbjg7/QNeLQY3coyZgbYksEnnAG1pkEiQl87Bb4vhjZxiZplz6YZt0jllknOOSRhukM4pOJBggNsvcjfSNzO+KbOW55dO3ybxpsZGXn52+uYPHyrs5HB6HwzYkJ0KVmmATzm6FvkGQqpHwSgYBaNgpAEA0yFbeEiD4GcAAAAASUVORK5CYII=","orcid":"","institution":"University of São Paulo (USP)","correspondingAuthor":true,"prefix":"","firstName":"Ana","middleName":"Raquel Manuel","lastName":"Gotine","suffix":""},{"id":548563002,"identity":"e889c9fe-d7af-4a59-bb2a-3bafb0c7d613","order_by":1,"name":"Audencio Victor","email":"","orcid":"","institution":"University of São Paulo (USP)","correspondingAuthor":false,"prefix":"","firstName":"Audencio","middleName":"","lastName":"Victor","suffix":""},{"id":548563003,"identity":"e353cca7-5cb5-4edd-8a45-14cbef060c75","order_by":2,"name":"Sancho Pedro Xavier","email":"","orcid":"","institution":"University of São Paulo 
(USP)","correspondingAuthor":false,"prefix":"","firstName":"Sancho","middleName":"Pedro","lastName":"Xavier","suffix":""}],"badges":[],"createdAt":"2025-10-22 08:25:13","currentVersionCode":1,"declarations":"","doi":"10.21203/rs.3.rs-7918744/v1","doiUrl":"https://doi.org/10.21203/rs.3.rs-7918744/v1","draftVersion":[],"editorialEvents":[{"content":"https://doi.org/10.1186/s12887-026-06925-0","type":"published","date":"2026-04-25T15:57:29+00:00"}],"editorialNote":"","failedWorkflow":false,"files":[{"id":96967888,"identity":"7b31e765-38c0-45a6-8af7-1f24f3773281","added_by":"auto","created_at":"2025-11-28 06:57:43","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":2623082,"visible":true,"origin":"","legend":"","description":"","filename":"Paperrevised.docx","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/564b35f67562256f886d3694.docx"},{"id":96967877,"identity":"5c910b41-00d5-4818-a447-0e8ef16dc30e","added_by":"auto","created_at":"2025-11-28 06:57:42","extension":"json","order_by":1,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":5659,"visible":true,"origin":"","legend":"","description":"","filename":"4c54797344c54bedbd22d2a81291867c.json","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/28895ac1a71fcfbc119c72db.json"},{"id":97136637,"identity":"419c00cb-6bc8-4490-a3fa-ee03f3e984c2","added_by":"auto","created_at":"2025-12-01 09:56:50","extension":"docx","order_by":2,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":719674,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarymaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/3ff3807723d08fbf90b82e33.docx"},{"id":96967890,"identity":"9602d185-4fa7-49d3-9e81-9acbabe7655c","added_by":"auto","created_at":"2025-11-28 
06:57:43","extension":"xml","order_by":3,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":112451,"visible":true,"origin":"","legend":"","description":"","filename":"4c54797344c54bedbd22d2a81291867c1enriched.xml","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/5752aec87d253b9f23bc7ec1.xml"},{"id":96967875,"identity":"d55d9696-61cc-4543-a526-48ec1559de43","added_by":"auto","created_at":"2025-11-28 06:57:42","extension":"png","order_by":9,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":21026,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/ebe55624a0c1c65dea7b3c1f.png"},{"id":96967873,"identity":"df45fa2b-c5aa-4883-9848-2e9214dc3fdb","added_by":"auto","created_at":"2025-11-28 06:57:42","extension":"png","order_by":10,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":44711,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage2.png","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/da1a7999e2ec249deb166b84.png"},{"id":96967883,"identity":"641e2eec-4a2c-4a9e-8ff1-16ec1a8a587b","added_by":"auto","created_at":"2025-11-28 06:57:42","extension":"png","order_by":11,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":48960,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage3.png","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/6847807874b986c4cc814103.png"},{"id":97137541,"identity":"66c2c043-df0d-4760-a43c-f0b14214e376","added_by":"auto","created_at":"2025-12-01 
09:57:53","extension":"png","order_by":12,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":53476,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage4.png","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/4b9452009556cb6280d10d00.png"},{"id":97136340,"identity":"67aeaffe-7a0f-4c30-9490-3b38e8ac36f6","added_by":"auto","created_at":"2025-12-01 09:56:25","extension":"png","order_by":13,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":30403,"visible":true,"origin":"","legend":"","description":"","filename":"Onlinefloatimage5.png","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/374c26c820f7cad1ca4df000.png"},{"id":96967886,"identity":"a1f97d96-cc05-43fd-829d-c1aac47d632b","added_by":"auto","created_at":"2025-11-28 06:57:43","extension":"xml","order_by":14,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":112421,"visible":true,"origin":"","legend":"","description":"","filename":"4c54797344c54bedbd22d2a81291867c1structuring.xml","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/352efd5c954560082fd108df.xml"},{"id":96967885,"identity":"6a9fa9cb-1756-44ab-bb7e-1f35c91f0755","added_by":"auto","created_at":"2025-11-28 06:57:43","extension":"html","order_by":15,"title":"","display":"","copyAsset":false,"role":"acdc-reference","size":121065,"visible":true,"origin":"","legend":"","description":"","filename":"earlyproof.html","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/e7e4b4794d41d890480ca18c.html"},{"id":96967878,"identity":"f9364fcf-c02a-44cf-94e8-50b2ef7b8eea","added_by":"auto","created_at":"2025-11-28 06:57:42","extension":"png","order_by":1,"title":"Figure 1","display":"","copyAsset":false,"role":"figure","size":52328,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAnalytical pipeline for predicting childhood anemia using ML (Mozambique, DHS 
2022–2023)\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage1.png","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/f1368531c390286c7261276b.png"},{"id":96967887,"identity":"8847382b-2a04-494c-b6a5-6e516ed9469b","added_by":"auto","created_at":"2025-11-28 06:57:43","extension":"jpeg","order_by":2,"title":"Figure 2","display":"","copyAsset":false,"role":"figure","size":201154,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eAUC -ROC curve of the logistic regression models, Random Forest, XGBoost, LightGBM, CatBoost.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage2.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/6a5f332507db3c3482e6590a.jpeg"},{"id":96967882,"identity":"a1c64f0a-62a7-4f8e-b590-b645dcf25d5d","added_by":"auto","created_at":"2025-11-28 06:57:42","extension":"jpeg","order_by":3,"title":"Figure 3","display":"","copyAsset":false,"role":"figure","size":170333,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003ePrecission-Recall of the logistic regression models, Random Forest, XGBoost, LightGBM, CatBoost.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage3.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/615ea841713bc156423df7be.jpeg"},{"id":96967880,"identity":"c6b54da4-3291-46b6-bfe8-cb6663cf839e","added_by":"auto","created_at":"2025-11-28 06:57:42","extension":"jpeg","order_by":4,"title":"Figure 4","display":"","copyAsset":false,"role":"figure","size":206911,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eCalibration curve of the logistic regression models, Random Forest, XGBoost, LightGBM, 
CatBoost.\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage4.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/5cd3ee5daa2f863f2fe3ac9e.jpeg"},{"id":96967879,"identity":"d0123d1a-1827-49b5-89d5-648a29f56f2d","added_by":"auto","created_at":"2025-11-28 06:57:42","extension":"jpeg","order_by":5,"title":"Figure 5","display":"","copyAsset":false,"role":"figure","size":137230,"visible":true,"origin":"","legend":"\u003cp\u003e\u003cstrong\u003eFeature Importance in predicting anemia in children under 5 years old (SHAP values)\u003c/strong\u003e\u003c/p\u003e","description":"","filename":"floatimage5.jpeg","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/7d995377b1b1d1b73e52ef22.jpeg"},{"id":107928299,"identity":"ead5fb82-4fc0-45c6-9d5a-474e9051ef81","added_by":"auto","created_at":"2026-04-27 16:09:39","extension":"pdf","order_by":0,"title":"","display":"","copyAsset":false,"role":"manuscript-pdf","size":1152090,"visible":true,"origin":"","legend":"","description":"","filename":"manuscript.pdf","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/f09c5296-b80f-4c85-8c18-894e88ee98f4.pdf"},{"id":96967876,"identity":"588600ea-ae19-466a-b4fe-74097fae5b0a","added_by":"auto","created_at":"2025-11-28 06:57:42","extension":"docx","order_by":0,"title":"","display":"","copyAsset":false,"role":"supplement","size":719674,"visible":true,"origin":"","legend":"","description":"","filename":"Supplementarymaterial.docx","url":"https://assets-eu.researchsquare.com/files/rs-7918744/v1/73bae5b5fc13a061a774ddd2.docx"}],"financialInterests":"No competing interests reported.","formattedTitle":"Machine learning models for predicting childhood anemia in Mozambique: analysis from national survey data","fulltext":[{"header":"Introduction","content":"\u003cp\u003eAnemia is one of the most prevalent hematological conditions with the greatest impact on global health[\u003cspan citationid=\"CR1\" 
class=\"CitationRef\"\u003e1\u003c/span\u003e, \u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e]. In 2021, the global prevalence of anemia in all ages was estimated at 24.3%, with children under five and populations in sub-Saharan Africa being the most affected[\u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e]. Since 2019, the burden of years lived with disability due to anemia in children under five has remained persistently high[\u003cspan citationid=\"CR2\" class=\"CitationRef\"\u003e2\u003c/span\u003e, \u003cspan citationid=\"CR3\" class=\"CitationRef\"\u003e3\u003c/span\u003e], reflecting the slow progress towards the Sustainable Development Goals (SDGs) target of reducing anemia by 50% by 2030. According to the World Health Organization (WHO), sub-Saharan Africa remains one of the most affected regions with around 103\u0026nbsp;million anemic children[\u003cspan citationid=\"CR4\" class=\"CitationRef\"\u003e4\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eAnemia is defined by a reduction in the concentration of hemoglobin or in the functionality of red blood cells, compromising the blood's ability to transport oxygen efficiently[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e]. It is clinically manifested by symptoms such as fatigue, weakness and dyspnea[\u003cspan citationid=\"CR7\" class=\"CitationRef\"\u003e7\u003c/span\u003e]. 
In children, the consequences are even more serious, including delayed cognitive and motor development, greater vulnerability to infections such as malaria, helminthiases and respiratory infections, and an increased risk of early mortality[\u003cspan citationid=\"CR5\" class=\"CitationRef\"\u003e5\u003c/span\u003e, \u003cspan citationid=\"CR6\" class=\"CitationRef\"\u003e6\u003c/span\u003e, \u003cspan citationid=\"CR8\" class=\"CitationRef\"\u003e8\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eIn Mozambique, childhood anemia remains a major public health challenge, with a high prevalence among children under five of 77.7% and variations between provinces[\u003cspan citationid=\"CR9\" class=\"CitationRef\"\u003e9\u003c/span\u003e]. Cabo Delgado (86.2%) has the highest prevalence, while Maputo (70.2%) has the lowest[\u003cspan additionalcitationids=\"CR11\" citationid=\"CR10\" class=\"CitationRef\"\u003e10\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e]. Despite the value of these studies, there is still little research exploring the individual prediction of anemia risk in children based on multiple sociodemographic, nutritional and clinical determinants, which would be particularly relevant in contexts with laboratory limitations.\u003c/p\u003e\u003cp\u003eIn recent years, machine learning (ML) methods have established themselves as powerful tools for analyzing health data, making it possible to capture complex, non-linear relationships between variables and improving the predictive capacity of models compared to traditional statistical methods[\u003cspan citationid=\"CR13\" class=\"CitationRef\"\u003e13\u003c/span\u003e, \u003cspan citationid=\"CR14\" class=\"CitationRef\"\u003e14\u003c/span\u003e]. 
In various areas of epidemiology and public health, applications have shown promise in risk stratification, automated screening and the efficient allocation of resources, from predicting weight gain, underweight, infectious diseases, cardiovascular diseases, cancers[\u003cspan additionalcitationids=\"CR16 CR17 CR18 CR19\" citationid=\"CR15\" class=\"CitationRef\"\u003e15\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR20\" class=\"CitationRef\"\u003e20\u003c/span\u003e]. In related contexts, studies using ML for anemia have shown promising results[\u003cspan additionalcitationids=\"CR22\" citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eRecent studies have explored the use of ML techniques in predicting childhood anemia in different contexts, such as Nigeria, Ethiopia and Ghana, using data from national surveys (DHS)[\u003cspan additionalcitationids=\"CR22 CR23\" citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR24\" class=\"CitationRef\"\u003e24\u003c/span\u003e].\u003c/p\u003e\u003cp\u003eDespite the advances observed in the use of ML in public health, there are still significant gaps in low-income countries. In Mozambique, to date, no studies have been identified that use national samples to predict childhood anemia using these approaches. Thus, this study seeks to fill this gap by developing and validating multiple ML models for the prediction of anemia in children under five, using data from the Demographic and Health Survey (DHS) of 2022/2023.\u003c/p\u003e"},{"header":"Methods","content":"\u003cp\u003eSecondary data from the Mozambique Demographic and Health Survey (MDHS) from the 2022\u0026ndash;2023. 
The DHS is a nationally representative household survey that collects information on women aged 15\u0026ndash;49 years and their children born in the two years preceding the survey. It covers all provinces of the country, except for eight districts in Cabo Delgado due to security constraints, and includes more than 16,000 households.\u003c/p\u003e\u003cp\u003eThe data used in this study was obtained from Children's Data (Children's Recode (KR)). This database contains a record for each child of the women interviewed, born in the five years prior to the survey. This dataset enables the assessment of key indicators related to maternal and child health, including mortality, malnutrition, anemia, vaccination coverage, child development, and health service utilization. From an initial pool of 9,289 observations, a total of 1,638 records were retained for analysis after excluding missing or incomplete data, following a complete-case analysis approach to ensure data integrity and consistency across variables.\u003c/p\u003e\u003cdiv id=\"Sec3\" class=\"Section2\"\u003e\u003ch2\u003eOutcome\u003c/h2\u003e\u003cp\u003eThe outcome variable was the presence or absence of anemia among children under five years of age, coded as 1 for anemic and 0 for non-anemic. According to World Health Organization (WHO, 2023) criteria, anemia in this age group is defined as a hemoglobin concentration below 11.0 g/dL[\u003cspan citationid=\"CR25\" class=\"CitationRef\"\u003e25\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\n\u003ch3\u003ePredictor variables\u003c/h3\u003e\n\u003cp\u003eThe selection of predictors was based on their availability and accessibility and on previously published studies Africa [\u003cspan citationid=\"CR12\" class=\"CitationRef\"\u003e12\u003c/span\u003e, \u003cspan additionalcitationids=\"CR22\" citationid=\"CR21\" class=\"CitationRef\"\u003e21\u003c/span\u003e\u0026ndash;\u003cspan citationid=\"CR23\" class=\"CitationRef\"\u003e23\u003c/span\u003e]. 
These include mother\u0026rsquo;s education, number of children, sex of the household head, wealth index, maternal age, maternal hemoglobin, sex of the child, child\u0026rsquo;s size at birth, recent diarrhea, fever or cough in the last two weeks, vitamin A supplementation, child\u0026rsquo;s age in months, and weeks of pregnancy. All predictors are routinely collected, low-cost, and publicly available, ensuring the feasibility and scalability of the proposed models in resource-limited settings.\u003c/p\u003e\n\u003ch3\u003eModel Design\u003c/h3\u003e\n\u003cp\u003eAs shown in Fig.\u0026nbsp;\u003cspan refid=\"Fig1\" class=\"InternalRef\"\u003e1\u003c/span\u003e, different machine learning models were tested to evaluate whether alternative specifications improved predictive performance for the outcome variable the presence or absence of anemia in children under five years of age. Each model was trained independently, without information sharing between algorithms, to ensure unbiased performance comparison.\u003c/p\u003e\u003cp\u003e\u003c/p\u003e\u003cp\u003eBefore fitting the models, the numerical variables were standardized using StandardScaler, a procedure that transforms the data distribution to have a mean equal to zero and a standard deviation equal to one. The non-binary categorical variables were converted into dummy variables using one-hot encoding (Table\u0026nbsp;\u003cspan refid=\"Tab1\" class=\"InternalRef\"\u003e1\u003c/span\u003e). 
Observations with missing data on key variables were excluded from the analysis.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab1\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 1\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eStandardization and creation of dummies\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFeature type\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eFeatures\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eProcessing\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNumeric\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eAge, Mother age, Weeks pregnancy, Number of children\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003ePadronization (StandartScaler)\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCategorical\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003eSex child, Cough last two weeks, Diarrhea last two weeks, Fever last two weeks, Vitamin A supplement, Mother education, Sex of household head, Wealth index,\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003eOne-hot encoding if not 
binary\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab2\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 2\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eChild, maternal, and demographic characteristics according to anemia status\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"5\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"char\" char=\".\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eVariable\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eFrequency n (%) and mean (SD) total n\u0026thinsp;=\u0026thinsp;1638\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eWith anemia\u003c/p\u003e\u003cp\u003e(660)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eWithout anemia\u003c/p\u003e\u003cp\u003e(978)\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c5\"\u003e\u003cp\u003ep-value\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eAge (month) mean (SD)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1638 (100)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e9.11 
(\u0026plusmn;\u0026thinsp;10.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e13.6 (\u0026plusmn;\u0026thinsp;9.1)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.0001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSex child\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFemale\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e843 (51.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e348 (52.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e495 (50.6)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.433\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eMale\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e795 (48.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e312 (47.3)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e483 (49.4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCough last two weeks\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.2116\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eYes\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e223 (13.6)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e86 (13.0)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e137 (14.0)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNo\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1408 (86.0)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e569 (86.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e839 (85.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDon\u0026rsquo;t known\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e7 (0.4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e5 (0.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e2 (0.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDiarrhea last two weeks\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.0001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eYes\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e221 (13.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e61 (9.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e160 (16.4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNo\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1405 (85.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e590 (89.4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e815 (83.3)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDon't known\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e12 (0.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e9 (1.4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e3 (0.3)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFever last two weeks\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e\u003cp\u003e0.0004\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eYes\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e204 (12.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e60 (9.1)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e144 (14.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNo\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1423 (86.9)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e592 (89.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e831 (85.0)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDon't know\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e11 (0.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e8 (1.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e3 (0.3)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eVitamin A supplementation\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"char\" char=\".\" 
colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.0001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eYes\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e861 (52.6)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e286 (43.3)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e575 (58.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNo\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e759 (46.3)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e364 (55.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e395 (40.4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eDon't know\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e18 (1.1)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e10 (1.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e8 (0.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eMaternal age (years), mean (SD)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1638 (100)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e26.95\u0026thinsp;\u0026plusmn;\u0026thinsp;7.28\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003e27.51\u0026thinsp;\u0026plusmn;\u0026thinsp;7.40\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.130\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eGestational age (weeks), mean (SD)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1638 (100)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e35.96\u0026thinsp;\u0026plusmn;\u0026thinsp;1.18\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e36.02\u0026thinsp;\u0026plusmn;\u0026thinsp;1.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.283\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eMaternal education\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.015\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNo education\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e434 (26.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e161 (24.4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e273 (27.9)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePrimary\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e710 (43.3)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c3\"\u003e\u003cp\u003e275 (41.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e435 (44.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSecondary\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e450 (27.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e199 (30.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e251 (25.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eHigher\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e44 (2.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e25 (3.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e19 (1.9)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eNumber of children mean (SD)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1638 (100)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e1.85\u0026thinsp;\u0026plusmn;\u0026thinsp;1.02\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e1.87\u0026thinsp;\u0026plusmn;\u0026thinsp;0.92\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.604\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eSex of household head\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e0.649\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eFemale\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e478 (29.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e188 (28.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e290 (29.7)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eMale\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e1160 (70.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e472 (71.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e688 (70.3)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eWealth index\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"char\" char=\".\" colname=\"c5\"\u003e\u003cp\u003e\u0026lt;\u0026thinsp;0.0001\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePoorest\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e352 (21.5)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e124 
(18.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e228 (23.3)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003ePoorer\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e259 (15.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e94 (14.2)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e165 (16.9)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eMiddle\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e280 (17.1)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e108 (16.4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e172 (17.6)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eRicher\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e377 (23.0)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e144 (21.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e233 (23.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eRichest\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e370 (22.6)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e190 (28.8)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003e180 (18.4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u0026nbsp;\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"5\"\u003eData are expressed as mean\u0026thinsp;\u0026plusmn;\u0026thinsp;standard deviation (SD) or n (%). \u003cem\u003ep\u003c/em\u003e-values were obtained from Student\u0026rsquo;s \u003cem\u003et\u003c/em\u003e-test or the Mann-Whitney test for continuous variables with or without a normal distribution, respectively, and from Pearson\u0026rsquo;s chi-square test for categorical variables.\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\n\u003ch3\u003eModel selection\u003c/h3\u003e\n\u003cp\u003eFive algorithms were tested: logistic regression, CatBoost [\u003cspan citationid=\"CR26\" class=\"CitationRef\"\u003e26\u003c/span\u003e], XGBoost [\u003cspan citationid=\"CR27\" class=\"CitationRef\"\u003e27\u003c/span\u003e], LightGBM [\u003cspan citationid=\"CR28\" class=\"CitationRef\"\u003e28\u003c/span\u003e] and Random Forest. CatBoost, XGBoost and LightGBM were implemented with their respective Python packages, and the remaining algorithms with the scikit-learn library [\u003cspan citationid=\"CR29\" class=\"CitationRef\"\u003e29\u003c/span\u003e]. The boosting algorithms were chosen because of the high performance demonstrated in several studies [\u003cspan citationid=\"CR30\" class=\"CitationRef\"\u003e30\u003c/span\u003e, \u003cspan citationid=\"CR31\" class=\"CitationRef\"\u003e31\u003c/span\u003e].\u003c/p\u003e\n\u003ch3\u003eSelection of hyperparameters\u003c/h3\u003e\n\u003cp\u003eModels were trained using 5-fold cross-validation to estimate overall performance and generalization capacity. 
To explore the hyperparameter space efficiently and identify the optimal model configuration, grid search was used [\u003cspan citationid=\"CR32\" class=\"CitationRef\"\u003e32\u003c/span\u003e] (see Table\u0026nbsp;\u003cspan refid=\"Tab4\" class=\"InternalRef\"\u003e3\u003c/span\u003e in the supplementary material). For this purpose, the data were split into a training set (70%) and a test set (30%), and the models trained on the training set were evaluated on the test set.\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab3\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003ePerformance Metrics of Prediction Models after Feature Selection with Boruta\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"7\"\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c4\" colnum=\"4\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c5\" colnum=\"5\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c6\" colnum=\"6\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c7\" colnum=\"7\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eModels\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eAUC-ROC\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003ePrecision\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c4\"\u003e\u003cp\u003eSpecificity\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" 
colname=\"c5\"\u003e\u003cp\u003eRecall\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c6\"\u003e\u003cp\u003eF1\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c7\"\u003e\u003cp\u003eMCC\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.722\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.811\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.932\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.429\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.561\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.432\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCatBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.716\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.767\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.905\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.465\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.579\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.422\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLightGBM\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.711\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.802\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" 
colname=\"c4\"\u003e\u003cp\u003e0.918\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.490\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.608\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.465\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eRandom Forest\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.716\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.781\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.915\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.449\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.571\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.424\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLogistic regression\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e0.676\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.649\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u003cp\u003e0.844\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.429\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.517\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.303\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colspan=\"7\" nameend=\"c7\" namest=\"c1\"\u003e\u003cp\u003e\u003cb\u003eBest threshold (0.40)\u003c/b\u003e\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" 
colname=\"c1\"\u003e\u003cp\u003eXgboost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.655\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.576\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.613\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.381\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCatboost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.644\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.586\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.614\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.375\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLightGBM\u003csup\u003e*\u003c/sup\u003e\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.802\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.626\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.608\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.464\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eRandom Forest\u003c/p\u003e\u003c/td\u003e\u003ctd 
align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.623\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.601\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.612\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.358\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLogistic regression*\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e0.456\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c4\"\u003e\u0026nbsp;\u003c/td\u003e\u003ctd align=\"left\" colname=\"c5\"\u003e\u003cp\u003e0.823\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c6\"\u003e\u003cp\u003e0.587\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c7\"\u003e\u003cp\u003e0.180\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003ctfoot\u003e\u003ctr\u003e\u003ctd colspan=\"7\"\u003e*The best threshold for logistic regression was 0.30. AUC-ROC, area under the receiver operating characteristic curve; Recall, sensitivity; F1, F1-score; MCC, Matthews correlation coefficient.\u003c/td\u003e\u003c/tr\u003e\u003c/tfoot\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cp\u003e\u003cdiv class=\"gridtable\"\u003e\u003ctable float=\"Yes\" id=\"Tab4\" border=\"1\"\u003e\u003ccaption language=\"En\"\u003e\u003cdiv class=\"CaptionNumber\"\u003eTable 3\u003c/div\u003e\u003cdiv class=\"CaptionContent\"\u003e\u003cp\u003eHyperparameter grids and the best hyperparameters selected through grid search\u003c/p\u003e\u003c/div\u003e\u003c/caption\u003e\u003ccolgroup cols=\"3\"\u003e\u003cdiv align=\"left\" class=\"colspec\" 
colname=\"c1\" colnum=\"1\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c2\" colnum=\"2\"\u003e\u003c/div\u003e\u003cdiv align=\"left\" class=\"colspec\" colname=\"c3\" colnum=\"3\"\u003e\u003c/div\u003e\u003cthead\u003e\u003ctr\u003e\u003cth align=\"left\" colname=\"c1\"\u003e\u003cp\u003eModels\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c2\"\u003e\u003cp\u003eHyperparameters\u003c/p\u003e\u003c/th\u003e\u003cth align=\"left\" colname=\"c3\"\u003e\u003cp\u003eBest hyperparameters\u003c/p\u003e\u003c/th\u003e\u003c/tr\u003e\u003c/thead\u003e\u003ctbody\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eXGBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e'n_estimators': (200, 500, 100),\u003c/p\u003e\u003cp\u003e'max_depth': (3, 5, 7, 9),\u003c/p\u003e\u003cp\u003e'learning_rate': (0.01, 0.05, 0.1),\u003c/p\u003e\u003cp\u003e'subsample': (0.7, 0.8, 0.9),\u003c/p\u003e\u003cp\u003e'colsample_bytree': (0.7, 0.8, 0.9),\u003c/p\u003e\u003cp\u003e'gamma': (0.0, 0.1, 0.2, 0.3)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e'colsample_bytree': 0.7, 'gamma': 0.3, 'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 200, 'subsample': 0.7\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eCatBoost\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e'n_estimators': (200, 500, 100),\u003c/p\u003e\u003cp\u003e'max_depth': (3, 5, 7),\u003c/p\u003e\u003cp\u003e'learning_rate': (0.01, 0.05, 0.1),\u003c/p\u003e\u003cp\u003e'grow_policy': ('Depthwise', 'SymmetricTree')\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e'grow_policy': 'Depthwise', 'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 200\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd 
align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLightGBM\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e'n_estimators': (100, 200, 500),\u003c/p\u003e\u003cp\u003e'max_depth': (10, 15),\u003c/p\u003e\u003cp\u003e'learning_rate': (0.01, 0.05, 0.1),\u003c/p\u003e\u003cp\u003e'subsample': (0.6, 0.8, 1.0),\u003c/p\u003e\u003cp\u003e'colsample_bytree': (0.6, 0.8, 1.0),\u003c/p\u003e\u003cp\u003e'reg_alpha': (0, 3, 4)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e'colsample_bytree': 0.6, 'learning_rate': 0.1, 'max_depth': 10, 'n_estimators': 100, 'reg_alpha': 4, 'subsample': 0.6\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eRandom Forest\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e'n_estimators': (100, 500, 200),\u003c/p\u003e\u003cp\u003e'max_depth': (10, 15),\u003c/p\u003e\u003cp\u003e'min_samples_split': (0.005, 0.05, 5),\u003c/p\u003e\u003cp\u003e'min_samples_leaf': [5, 15, 25],\u003c/p\u003e\u003cp\u003e'max_features': ['sqrt', 'log2'],\u003c/p\u003e\u003cp\u003e'criterion': ['gini', 'entropy']\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e'criterion': 'entropy', 'max_depth': 10, 'max_features': 'sqrt', 'min_samples_leaf': 5, 'min_samples_split': 0.05, 'n_estimators': 100\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003ctr\u003e\u003ctd align=\"left\" colname=\"c1\"\u003e\u003cp\u003eLogistic regression\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c2\"\u003e\u003cp\u003e'penalty': ['l2', None],\u003c/p\u003e\u003cp\u003e'C': np.logspace(-3, 3, 10)\u003c/p\u003e\u003c/td\u003e\u003ctd align=\"left\" colname=\"c3\"\u003e\u003cp\u003e'C': 0.1, 
'penalty': 'l2'\u003c/p\u003e\u003c/td\u003e\u003c/tr\u003e\u003c/tbody\u003e\u003c/colgroup\u003e\u003c/table\u003e\u003c/div\u003e\u003c/p\u003e\u003cdiv id=\"Sec8\" class=\"Section2\"\u003e\u003ch2\u003eModel validation and performance metrics\u003c/h2\u003e\u003cp\u003eModel performance was evaluated on the test set using accuracy, precision, specificity, recall, F1-score, Matthews correlation coefficient (MCC) and AUC-ROC. AUC-ROC, which measures a model\u0026rsquo;s ability to discriminate between positive and negative cases, was the main metric used to compare the models. When two models showed similar discrimination, the calibration slope was used to select the better-calibrated model as the best model. The model with the best performance and good calibration was then analyzed with Shapley Additive Explanations (SHAP) (Chen; Lundberg; Lee, 2022). SHAP values quantify the average marginal contribution of each feature to the prediction, providing a theoretically consistent, model-agnostic explanation of how individual features influence the predicted probability of a positive case.\u003c/p\u003e\u003cp\u003eAll analyses adhered to the TRIPOD+AI guidelines for transparent reporting of machine learning-based prediction models [\u003cspan citationid=\"CR33\" class=\"CitationRef\"\u003e33\u003c/span\u003e].\u003c/p\u003e\u003c/div\u003e\n\u003cp\u003e\u003cstrong\u003eEthical approval\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe researchers obtained approval to use the survey data from the USAID DHS Program after registering at https://www.dhsprogram.com/data/dataset_admin/login_main.cfm, and maintained the confidentiality and privacy of the data. 
Clinical trial number: not applicable.\u003c/p\u003e"},{"header":"Results","content":"\u003cp\u003eAmong the 1,638 children included in the study, the mean age was significantly lower among those with anemia (9.1 ± 10.5 months;\u0026nbsp;\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.0001). The distribution of sex was similar between groups (\u003cem\u003ep\u003c/em\u003e = 0.433). Recent episodes of diarrhea (\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.0001) and fever (\u003cem\u003ep\u003c/em\u003e = 0.0004) were more frequently reported among non-anemic children.\u003c/p\u003e\n\u003cp\u003eVitamin A supplementation was significantly more common among children without anemia (58.8%) than among those with anemia (43.3%; \u003cem\u003ep\u003c/em\u003e \u0026lt; 0.0001). Maternal age and gestational age did not differ significantly between groups. However, maternal education showed a significant association (\u003cem\u003ep\u003c/em\u003e = 0.015), with higher education levels being more common among mothers of non-anemic children. The household wealth index was also significantly associated with anemia status (\u003cem\u003ep\u003c/em\u003e \u0026lt; 0.0001) (\u003cstrong\u003eTable 2\u003c/strong\u003e).\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eTable 3\u003c/strong\u003e and \u003cstrong\u003eFigure 2\u003c/strong\u003e show the performance of the prediction models after applying the Boruta algorithm for feature selection. Both before and after Boruta, XGBoost remained the best model based on AUC-ROC. After feature selection, XGBoost achieved an AUC-ROC of 0.722, higher than those of CatBoost (0.716), LightGBM (0.711), Random Forest (0.716) and logistic regression (0.676). 
Regarding other metrics, XGBoost showed high specificity (0.932) before threshold adjustment; at the best threshold (0.40), its sensitivity was 0.576, with an F1-score of 0.613 and an MCC of 0.381.\u003c/p\u003e\n\u003cp\u003eThe precision-recall curves indicated that XGBoost and CatBoost achieved the highest area under the curve (\u003cstrong\u003eFigure 3\u003c/strong\u003e). Notably, XGBoost demonstrated superior calibration (\u003cstrong\u003eFigure 4\u003c/strong\u003e). The SHAP analysis identified the child’s age as the most important predictor of anemia, followed by vitamin A supplementation and socioeconomic factors such as household wealth, as illustrated in \u003cstrong\u003eFigure 5\u003c/strong\u003e, \u003cstrong\u003eSupplementary Figure 2\u003c/strong\u003e and \u003cstrong\u003eFigure S4\u003c/strong\u003e.\u003c/p\u003e"},{"header":"Discussion","content":"\u003cp\u003eOur findings show that ML models, especially XGBoost, had the best overall performance (AUC-ROC of 0.722), followed by CatBoost, LightGBM and Random Forest, all outperforming traditional logistic regression in predicting anemia in children. These results support existing evidence that ensemble boosting methods are better at capturing non-linear relationships, interactions between multiple predictors and complex hierarchical effects, which are typical characteristics of epidemiological and demographic data [22, 23, 30, 31, 34]. Boosting models have also frequently shown better calibration, as observed in this study. Poorly calibrated models can lead to incorrect decisions, resulting in over- or under-triage [35].\u003c/p\u003e\n\u003cp\u003eThese findings reinforce the applicability of ML for individual-level prediction of anemia in settings with limited laboratory resources, such as Mozambique. A study using data from the 2018 Nigeria DHS applied 16 ML algorithms, including Decision Tree, Random Forest, Gradient Boosting, Extra Trees and AdaBoost, and found that the Extra Trees classifier performed best (AUC = 0.83) [24]. 
In Ghana, Siddiqa et al. [22] reported that ML models based on XGBoost and Random Forest achieved high accuracy (\u0026gt;80%). Similarly, in Bangladesh, Khan et al. [18] compared the XGBoost, LightGBM, CatBoost, Random Forest and logistic regression algorithms, reporting an AUC \u0026gt; 0.80, with the child's age, household wealth, maternal education and nutritional status as the most important predictors of anemia.\u003c/p\u003e\n\u003cp\u003eIn Ethiopia, Yimer et al. [21] and Kassaw et al. [23] applied multiple ML algorithms to the 2016 DHS data, and Random Forest showed the best performance (AUC = 0.818).\u003c/p\u003e\n\u003cp\u003eIn contrast, another study found that, in certain Ethiopian subsamples, logistic regression outperformed more complex algorithms, with an AUC close to 0.69. This suggests that the relative performance of models depends on the distribution of variables, sample size and class balance [36]. It illustrates the so-called \"No Free Lunch Theorem\", according to which no algorithm is universally superior across all data contexts. This reinforces the importance of comparing multiple models and tuning hyperparameters locally before operational use [37, 38].\u003c/p\u003e\n\u003cp\u003eIn this study, after Boruta feature selection, the SHAP analysis identified the child's age as the most important predictor. This finding is consistent with studies from other settings. 
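Boruta compares each real feature's importance with that of "shadow" features. Shadow features are randomly permuted copies that carry no information about the outcome, so a feature is kept only if it beats the best shadow more often than chance. The pure-Python sketch below illustrates that idea under two simplifying assumptions. First, absolute Pearson correlation stands in for the random-forest importance used by Boruta. Second, the iterative removal of rejected features and the multiple-testing adjustment of the full algorithm are omitted. Feature names and data are hypothetical.

```python
import math
import random


def abs_corr(x, y):
    """Absolute Pearson correlation; a stand-in for random-forest importance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / math.sqrt(sxx * syy) if sxx and syy else 0.0


def upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_sketch(features, y, n_iter=50, alpha=0.01, seed=0):
    rng = random.Random(seed)
    hits = {name: 0 for name in features}
    for _ in range(n_iter):
        # Shadow features: shuffled copies, so any association with y is by chance.
        shadow_max = 0.0
        for values in features.values():
            shadow = list(values)
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, abs_corr(shadow, y))
        # A "hit" means the real feature beat the best shadow in this round.
        for name, values in features.items():
            if abs_corr(values, y) > shadow_max:
                hits[name] += 1
    decisions = {}
    for name, h in hits.items():
        if upper_tail(h, n_iter) < alpha:             # hits far above n_iter / 2
            decisions[name] = "confirmed"
        elif upper_tail(n_iter - h, n_iter) < alpha:  # hits far below n_iter / 2
            decisions[name] = "rejected"
        else:
            decisions[name] = "tentative"
    return hits, decisions


# Hypothetical data: one informative feature and one pure-noise feature.
data_rng = random.Random(42)
y = [data_rng.randint(0, 1) for _ in range(200)]
features = {
    "informative": [yi + data_rng.gauss(0, 1) for yi in y],
    "noise": [data_rng.gauss(0, 1) for _ in y],
}
hits, decisions = boruta_sketch(features, y)
print(hits, decisions)
```

Under the null, a feature beats the best shadow roughly half the time or less. The binomial tail test therefore confirms only features whose hit count is implausibly high for chance.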
The same pattern was reported in Ghana [22] and Ethiopia [21, 23], particularly among children aged 6 to 23 months [36]. This pattern may reflect the high vulnerability of younger children to anemia. Rapid growth and increased nutritional needs in the first two years of life, together with greater susceptibility to infections and limited dietary diversity, are critical factors influencing hemoglobin levels in this age group [39]. The second most important predictor was vitamin A supplementation. Vitamin A plays an essential role in erythropoiesis, regulating the differentiation of hematopoietic stem cells and promoting the mobilization and use of iron in tissues [40]. It also has immunomodulatory and antioxidant effects, reducing susceptibility to infections that often compromise iron metabolism and hemoglobin synthesis [40].\u003c/p\u003e\n\u003cp\u003eOther important predictors were socioeconomic factors, such as the household wealth index, the number of children under five in the household and maternal schooling, as well as nutritional and clinical factors such as recent diarrhea. Ethiopian studies using Boruta [21, 23] identified the number of children in the household, distance to the health facility, health insurance coverage, disposal of the youngest child's stools, type of cooking fuel and rotavirus vaccination as important predictors.\u003c/p\u003e\n\u003cp\u003eOne of the main strengths of this study is its use of simple, inexpensive and easily collected variables. The model used accessible indicators such as the child's age, maternal schooling, household wealth, vitamin A supplementation and a history of recent diarrhea. With these, it achieved acceptable discrimination (AUC-ROC = 0.722) without relying on laboratory biomarkers or expensive tests. This supports its operational feasibility, especially in resource-limited settings where access to hemoglobin testing is restricted. 
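SHAP attributes a single prediction to its inputs using Shapley values from cooperative game theory. With only a few features, these values can be computed exactly by enumerating every coalition of features, as in the minimal sketch below. The sketch assumes a single reference ("baseline") child whose values stand in for absent features. Practical SHAP implementations instead average over a background dataset or, for tree models, over the training distribution encoded in the trees. The toy log-odds model, its coefficients and the feature values are all hypothetical and are not the fitted XGBoost model.

```python
import itertools
import math


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x. Features outside a coalition take their
    baseline value (single-reference simplification)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, size):
                with_i = [x[j] if j in coalition or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in coalition else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_log_odds(v):
    """Hypothetical anemia log-odds: age in months, vitamin A (1 = received),
    wealth quintile (1-5), with an age × vitamin A interaction."""
    age, vit_a, wealth = v
    return 1.0 - 0.03 * age - 0.5 * vit_a - 0.2 * wealth + 0.01 * age * vit_a


child = [12, 0, 1]      # 12 months, no vitamin A, poorest quintile
reference = [30, 1, 3]  # hypothetical reference child
phi = shapley_values(toy_log_odds, child, reference)
print([round(v, 3) for v in phi])
# Efficiency property: contributions sum to f(child) - f(reference).
print(sum(phi), toy_log_odds(child) - toy_log_odds(reference))
```

A positive value means the feature pushes this child's predicted log-odds of anemia above the reference child's. Because the interaction's effect is shared between age and vitamin A, neither attribution equals its main-effect term alone. Exact enumeration grows as 2^n in the number of features, which is why practical tools rely on model-specific algorithms or approximations.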
In addition, the study used a feature selection method (Boruta) and SHAP-based interpretation, which enhance transparency and interpretability, fundamental aspects of the ethical and safe adoption of artificial intelligence in health.\u003c/p\u003e\n\u003cp\u003eFrom an applied point of view, the ML models developed in this study could support screening in primary care centers by flagging children at higher risk of anemia and guiding prioritized hemoglobin testing and supplementation. However, before field use, the models should be retrained and recalibrated for specific subpopulations. This should account for regional and socioeconomic differences between provinces and between rural and urban areas, to minimize bias and systematic error.\u003c/p\u003e\n\u003cp\u003eIt is also essential to evaluate algorithmic fairness, to ensure that predictions do not underestimate risk among the most vulnerable groups, such as children from poor families or from regions with less access to health services [33]. Equity audits and fairness metrics are recommended before large-scale implementation. In addition, transfer learning is a promising strategy for adapting models trained in Mozambique to other sub-Saharan African countries with similar epidemiological contexts. This approach could save time and computational resources and potentially improve the external validity of predictions.\u003c/p\u003e\n\u003cp\u003eThis study has several limitations. First, excluding observations with missing data reduced the sample size, which may affect model stability. Second, selection bias is possible if children with complete data differed systematically from those excluded. Third, the DHS lacks clinical and laboratory variables, such as ferritin, transferrin and C-reactive protein, that could improve discriminatory capacity. Fourth, the absence of external validation means the results must be replicated in independent cohorts or in other Mozambican provinces. Finally, the cross-sectional design prevents causal inference and is subject to reverse temporality. 
Because exposure and outcome are measured simultaneously, the temporal sequence between predictors and anemia status cannot be determined. Consequently, some associations identified by the models may reflect reverse temporal relationships rather than true causal effects. For example, the association between vitamin A supplementation and anemia may be distorted if children who were already anemic or at higher risk were preferentially targeted for supplementation as part of routine health interventions. This ambiguity limits the ability to establish the direction of the observed associations and highlights the need for longitudinal studies to clarify causal pathways.\u003c/p\u003e\n\u003cp\u003eFuture research should prioritize pilot implementations in primary health services, with regional retraining and recalibration and continuous performance monitoring, as well as prospective validation in independent cohorts. We also recommend exploring integration with mobile nutritional surveillance platforms and incorporating geospatial and environmental data (such as malaria prevalence and sanitation quality). Cost-effectiveness analyses comparing these models with traditional laboratory methods are also needed.\u003c/p\u003e"},{"header":"Conclusion","content":"\u003cp\u003eIn this study, five models (XGBoost, CatBoost, LightGBM, Random Forest and logistic regression) were developed and evaluated, with XGBoost showing the best overall performance. The predictors with the greatest impact on anemia were the child's age, lack of vitamin A supplementation, household wealth, the number of children in the household, recent diarrhea and maternal education. 
These findings suggest that interpretable ML models can be incorporated into nutritional surveillance and public policy planning, supporting early screening of vulnerable groups and the design of targeted and cost-effective interventions, contributing to the achievement of global anemia reduction targets by 2030.\u003c/p\u003e\n\u003cp\u003e\u003cbr\u003e\u003c/p\u003e"},{"header":"Declarations","content":"\u003cp\u003eThe study did not use primary data; ethical approval is not applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eInformed Consent Statement\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAcknowledgements\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors would like to acknowledge DHS (www.dhsprogram.com) (access date: 10.09.2025) for giving us access to all datasets.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eConsent for publication\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eNot applicable.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAvailability of data and materials\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe data is available at the following link:\u0026nbsp;\u003cstrong\u003ehttps://dhsprogram.com/data/dataset_admin/index.cfm\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eCompeting interests\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThe authors have declared no conflicts of interest.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eFunding\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eThis research did not receive any specific grant from funding agencies in the public, commercial, or nonprofit sectors.\u003c/p\u003e\n\u003cp\u003e\u003cstrong\u003eAuthors\u0026rsquo; contributions\u003c/strong\u003e\u003c/p\u003e\n\u003cp\u003eAll authors agreed to be accountable for all aspects of the work and participated in all stages from the conception of the study idea, analysis, and interpretation. 
They also contributed to the drafting and revision of the manuscript. The research idea was conceptualized by AREMG. Finally, all authors reviewed and approved the final manuscript.\u003c/p\u003e"},{"header":"References","content":"\u003col\u003e\u003cli\u003e\u003cspan\u003eHunt JM. Reversing productivity losses from iron deficiency: the economic case. J Nutr. 2002;132(4 Suppl):794S\u0026ndash;801S.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLiu Y, Ren W, Wang S, Xiang M, Zhang S, Zhang F. Global burden of anemia and cause among children under five years 1990\u0026ndash;2019: findings from the global burden of disease study 2019. Front Nutr. 2024;11. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3389/fnut.2024.1474664\u003c/span\u003e\u003cspan address=\"10.3389/fnut.2024.1474664\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eGardner WM, Razo C, McHugh TA, Hagins H, Vilchis-Tella VM, Hennessy C, et al. Prevalence, years lived with disability, and trends in anaemia burden by severity and cause, 1990\u0026ndash;2021: findings from the Global Burden of Disease Study 2021. Lancet Haematol. 2023;10:e713\u0026ndash;34. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/S2352-3026(23)00160-6\u003c/span\u003e\u003cspan address=\"10.1016/S2352-3026(23)00160-6\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWorld Health Organization. Global Anaemia Estimates. 2021 edition. 2021.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLarson LM, Kubes JN, Ram\u0026iacute;rez-Luzuriaga MJ, Khishen S, Shankar H, Prado A. Effects of increased hemoglobin on child growth, development, and disease: a systematic review and meta-analysis. 
Ann N Y Acad Sci. 2019;1450:83\u0026ndash;104. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1111/nyas.14105\u003c/span\u003e\u003cspan address=\"10.1111/nyas.14105\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eFerrari B, Peyvandi F. How I treat thrombotic thrombocytopenic purpura in pregnancy. Blood. 2020;136:2125\u0026ndash;32. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1182/BLOOD.2019000962\u003c/span\u003e\u003cspan address=\"10.1182/BLOOD.2019000962\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAapro M, Beguin Y, Bokemeyer C, Dicato M, Gasc\u0026oacute;n P, Glaspy J, et al. Management of anaemia and iron deficiency in patients with cancer: ESMO Clinical Practice Guidelines. Ann Oncol. 2018;29:iv96\u0026ndash;110. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/annonc/mdx758\u003c/span\u003e\u003cspan address=\"10.1093/annonc/mdx758\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMcWilliams S, Singh I, Leung W, Stockler S, Ipsiroglu OS. Iron deficiency and common neurodevelopmental disorders\u0026mdash;A scoping review. PLoS ONE. 2022;17. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0273819\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0273819\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eDHS. Inqu\u0026eacute;rito Demogr\u0026aacute;fico e de Sa\u0026uacute;de, 2022\u0026ndash;23. 
2023.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTekeba B, Wassie M, Mekonen EG, Tamir TT, Aemro A. Spatial distribution and determinants of anemia among under-five children in Mozambique. Sci Rep. 2025;15. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-024-83899-y\u003c/span\u003e\u003cspan address=\"10.1038/s41598-024-83899-y\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNazeem Muhajarine DA, Adeyinka, Mbate Matandalasse SC. Inequities in childhood anaemia in Mozambique: results from multilevel Bayesian analysis of 2018 National Malaria Indicator Survey. medRxiv. 2021;1:1\u0026ndash;13.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCane RM, Sheffel A, Salom\u0026atilde;o C, Sambo J, Matusse E, Ismail E, et al. Structural readiness of health facilities in Mozambique: how is Mozambique positioned to deliver nutrition-specific interventions to women and children? J Glob Health Rep. 2023;7. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.29392/001c.89000\u003c/span\u003e\u003cspan address=\"10.29392/001c.89000\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMooney SJ, Pejaver V. Big Data in Public Health: Terminology, Machine Learning, and Privacy. Annu Rev Public Health. 2018;39:95\u0026ndash;112. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1146/annurev-publhealth-040617-014208\u003c/span\u003e\u003cspan address=\"10.1146/annurev-publhealth-040617-014208\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eLoring Z, Mehrotra S, Piccini JP. Machine learning in big data: Handle with care. Europace. 
2019;21:1284\u0026ndash;5. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/europace/euz130\u003c/span\u003e\u003cspan address=\"10.1093/europace/euz130\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVictor A, Geremias dos Santos H, Silva GFS, Barcellos Filho F, de F\u0026aacute;tima Cobre A, Luzia LA, et al. Predictive modeling of gestational weight gain: a machine learning multiclass classification study. BMC Pregnancy Childbirth. 2024;24. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12884-024-06952-8\u003c/span\u003e\u003cspan address=\"10.1186/s12884-024-06952-8\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVictor A, Almeida F, Xavier SP, Rond\u0026oacute; PHC. Predicting low birth weight risks in pregnant women in Brazil using machine learning algorithms: data from the Araraquara cohort study. BMC Pregnancy Childbirth. 2025;25. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12884-025-07351-3\u003c/span\u003e\u003cspan address=\"10.1186/s12884-025-07351-3\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSchmidt LJ, Rieger O, Neznansky M, Hackel\u0026ouml;er M, Dr\u0026ouml;ge LA, Henrich W, et al. A machine-learning\u0026ndash;based algorithm improves prediction of preeclampsia-associated adverse outcomes. Am J Obstet Gynecol. 2022;227:77.e1\u0026ndash;77.e30. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1016/j.ajog.2022.01.026\u003c/span\u003e\u003cspan address=\"10.1016/j.ajog.2022.01.026\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.
\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKhan JR, Chowdhury S, Islam H, Raheem E. Machine Learning Algorithms To Predict The Childhood Anemia In Bangladesh. J Data Sci. 2019;17:195\u0026ndash;218. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.6339/jds.201901_17(1).0009\u003c/span\u003e\u003cspan address=\"10.6339/jds.201901_17(1).0009\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJeong YS, Jeon M, Park JH, Kim MC, Lee E, Park SY, et al. Machine-learning-based approach to differential diagnosis in tuberculous and viral meningitis. Infect Chemother. 2021;53. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3947/IC.2020.0104\u003c/span\u003e\u003cspan address=\"10.3947/IC.2020.0104\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eAl Mudawi N, Alazeb A. A Model for Predicting Cervical Cancer Using Machine Learning Algorithms. Sensors. 2022;22. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/s22114132\u003c/span\u003e\u003cspan address=\"10.3390/s22114132\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYimer A, Yesuf HA, Ahmed S, Zemariam AB, Mussa E, Sirage N, et al. Optimizing machine learning models for predicting anemia among under-five children in Ethiopia: insights from Ethiopian demographic and health survey data. BMC Pediatr. 2025;25. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12887-025-05659-9\u003c/span\u003e\u003cspan address=\"10.1186/s12887-025-05659-9\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSiddiqa M, Shah G, Butt MS, Kamal A, Opoku ST. Early Childhood Anemia in Ghana: Prevalence and Predictors Using Machine Learning Techniques. Children. 2025;12. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.3390/children12070924\u003c/span\u003e\u003cspan address=\"10.3390/children12070924\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKebede Kassaw A, Yimer A, Abey W, Molla TL, Zemariam AB. The application of machine learning approaches to determine the predictors of anemia among under five children in Ethiopia. Sci Rep. 2023;13. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1038/s41598-023-50128-x\u003c/span\u003e\u003cspan address=\"10.1038/s41598-023-50128-x\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eJa\u0026rsquo;afar IK, Uthman OA. Predicting Childhood Anaemia in Nigeria: A Machine Learning Approach to Uncover Key Risk Factors. Public Health Challenges. 2025;4. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1002/puh2.70135\u003c/span\u003e\u003cspan address=\"10.1002/puh2.70135\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWorld Health Organization. Haemoglobin concentrations for the diagnosis of anaemia and assessment of severity. 2011. 
\u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://iris.who.int/handle/10665/85839\u003c/span\u003e\u003cspan address=\"https://iris.who.int/handle/10665/85839\" targettype=\"URL\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. Accessed 29 Jul 2025.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eProkhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. Adv Neural Inf Process Syst. 2018;31.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eChen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. p. 785\u0026ndash;94.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eKe G, Meng Q, Finley T, Wang T, Chen W, Ma W, et al. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Adv Neural Inf Process Syst. 2017;30.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003ePedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2011;12:2825\u0026ndash;30.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eBorisov V, Leemann T, Se\u0026szlig;ler K, Haug J, Pawelczyk M, Kasneci G. Deep Neural Networks and Tabular Data: A Survey. 2022. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1109/TNNLS.2022.3229161\u003c/span\u003e\u003cspan address=\"10.1109/TNNLS.2022.3229161\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eShwartz-Ziv R, Armon A. Tabular Data: Deep Learning is Not All You Need. 
2021.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eYouness G, Uyen N, Phan T, Cohen Boulakia B, Cohen B, Bootbogs B. BootBOGS: Hands-on optimizing Grid Search in hyperparameter tuning of MLP. 2023.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eCollins GS, Moons KGM, Dhiman P, Riley RD, Beam AL, Van Calster B, et al. TRIPOD\u0026thinsp;+\u0026thinsp;AI statement: Updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ. 2024. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1136/bmj-2023-078378\u003c/span\u003e\u003cspan address=\"10.1136/bmj-2023-078378\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eMcElfresh D, Khandagale S, Valverde J, C VP, Feuer B, Hegde C et al. When Do Neural Nets Outperform Boosted Trees on Tabular Data? 2024.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eVan Calster B, McLernon DJ, Van Smeden M, Wynants L, Steyerberg EW, Bossuyt P, et al. Calibration: The Achilles heel of predictive analytics. BMC Med. 2019;17. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1186/s12916-019-1466-7\u003c/span\u003e\u003cspan address=\"10.1186/s12916-019-1466-7\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eTesfaye SH, Seboka BT, Sisay D. Application of machine learning methods for predicting childhood anaemia: Analysis of Ethiopian Demographic Health Survey of 2016. PLoS ONE. 2024;19. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1371/journal.pone.0300172\u003c/span\u003e\u003cspan address=\"10.1371/journal.pone.0300172\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e. 
\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eWolpert DH, Macready WG. No Free Lunch Theorems for Optimization. IEEE Trans Evol Comput. 1997;1:67\u0026ndash;82.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eG\u0026oacute;mez D, Rojas A. An empirical overview of the no free lunch theorem and its effect on real-world machine learning classification. Neural Comput. 2016;28:216\u0026ndash;28. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1162/NECO_a_00793\u003c/span\u003e\u003cspan address=\"10.1162/NECO_a_00793\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eNtenda PAM, Motsa MPS, Ntenda JK, Mbewe RB, Tiruneh FN. Predictors of iron status among preschool-age children in Malawi: insights from a micronutrient survey. Int Health. 2025. \u003cspan class=\"ExternalRef\"\u003e\u003cspan class=\"RefSource\"\u003ehttps://doi.org/10.1093/inthealth/ihaf054\u003c/span\u003e\u003cspan address=\"10.1093/inthealth/ihaf054\" targettype=\"DOI\" class=\"RefTarget\"\u003e\u003c/span\u003e\u003c/span\u003e.\u003c/span\u003e\u003c/li\u003e\u003cli\u003e\u003cspan\u003eSemba RD, Bloem MW, Keller H. The anemia of vitamin A deficiency: epidemiology and pathogenesis. Eur J Clin Nutr. 2002;56:271\u0026ndash;81. 
https://doi.org/10.1038/sj.ejcn.1601320.\u003c/span\u003e\u003c/li\u003e\u003c/ol\u003e"}],"fulltextSource":"","fullText":"","funders":[],"hasAdminPriorityOnWorkflow":false,"hasManuscriptDocX":true,"hasOptedInToPreprint":true,"hasPassedJournalQc":"","hasAnyPriority":false,"hideJournal":false,"highlight":"","institution":"","isAcceptedByJournal":true,"isAuthorSuppliedPdf":false,"isDeskRejected":"","isHiddenFromSearch":false,"isInQc":false,"isInWorkflow":false,"isPdf":false,"isPdfUpToDate":true,"isWithdrawnOrRetracted":false,"journal":{"display":true,"email":"[email protected]","identity":"bmc-pediatrics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bped","sideBox":"Learn more about [BMC Pediatrics](http://bmcpediatr.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bped/default.aspx","title":"BMC Pediatrics","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true},"keywords":"Childhood anaemia, Machine learning, Predictive modelling, Mozambique, Public health","lastPublishedDoi":"10.21203/rs.3.rs-7918744/v1","lastPublishedDoiUrl":"https://doi.org/10.21203/rs.3.rs-7918744/v1","license":{"name":"CC BY 4.0","url":"https://creativecommons.org/licenses/by/4.0/"},"manuscriptAbstract":"\u003ch2\u003eBackground\u003c/h2\u003e\u003cp\u003eChildhood anemia remains a major public health concern in sub-Saharan Africa, with Mozambique among the most affected countries. Despite the growing use of machine learning (ML) to enhance disease prediction, there is a lack of national-level evidence on its application to childhood anemia in low-resource settings. 
This study aimed to develop, compare, and interpret ML models to predict anemia among children under five years of age in Mozambique using nationally representative survey data.\u003c/p\u003e\u003ch2\u003eMethods\u003c/h2\u003e\u003cp\u003eData were extracted from the 2022\u0026ndash;2023 Mozambique Demographic and Health Survey (MDHS). Children under five were included, with anemia defined as hemoglobin\u0026thinsp;\u0026lt;\u0026thinsp;11.0 g/dL. Five ML models were developed and validated: Logistic Regression, Random Forest, XGBoost, LightGBM, and CatBoost. Each model was assessed for discrimination using AUC-ROC and for calibration using the calibration slope, and SHAP analysis was used for interpretability.\u003c/p\u003e\u003ch2\u003eResults\u003c/h2\u003e\u003cp\u003eAmong the 1,638 children analyzed, the prevalence of anemia was high (40.3%). XGBoost demonstrated the best discrimination (AUC-ROC\u0026thinsp;=\u0026thinsp;0.722), with a sensitivity of 57.6% and a specificity of 93.2%. The SHAP analysis identified the child\u0026rsquo;s age in months, lack of vitamin A supplementation, low household wealth, maternal education, number of children, and recent diarrhea as the strongest predictors.\u003c/p\u003e\u003ch2\u003eConclusion\u003c/h2\u003e\u003cp\u003eOverall, the ensemble boosting models showed the highest discriminatory capacity, with XGBoost performing best. This highlights the potential of interpretable, low-cost predictive models to support early screening and targeted interventions for childhood anemia. 
Future work should explore regional retraining and recalibration, fairness evaluation, transfer learning, and external validation to enhance generalizability and field applicability.\u003c/p\u003e","manuscriptTitle":"Machine learning models for predicting childhood anemia in Mozambique: analysis from national survey data","msid":"","msnumber":"","nonDraftVersions":[{"code":1,"date":"2025-11-28 06:57:36","doi":"10.21203/rs.3.rs-7918744/v1","editorialEvents":[{"type":"communityComments","content":0},{"type":"decision","content":"Revision requested","date":"2026-01-08T17:30:09+00:00","index":"","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-01-07T14:14:10+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2026-01-06T14:50:33+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"244827139987633391260801564518864568748","date":"2025-12-30T15:58:57+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"100267703925205506012340504692895324352","date":"2025-12-29T12:38:42+00:00","index":"hide","fulltext":""},{"type":"editorInvitedReview","content":"","date":"2025-12-16T03:19:23+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"307608923858588827394754988600111585303","date":"2025-12-13T11:59:05+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"284970129785333051419914827369252363707","date":"2025-12-09T08:02:57+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"64734851173522591935425636819004873271","date":"2025-12-07T19:56:24+00:00","index":"hide","fulltext":""},{"type":"reviewerAgreed","content":"308677161475402240225463244046297188130","date":"2025-11-26T10:36:16+00:00","index":"hide","fulltext":""},{"type":"reviewersInvited","content":"","date":"2025-11-18T00:29:08+00:00","index":"","fulltext":""},{"type":"editorInvited","content":"","date":"2025-10-29T08:47:57+00:00","index":"","fulltext":""},{"type":"edi
torAssigned","content":"","date":"2025-10-29T07:52:35+00:00","index":"","fulltext":""},{"type":"checksComplete","content":"","date":"2025-10-29T07:50:36+00:00","index":"","fulltext":""},{"type":"submitted","content":"BMC Pediatrics","date":"2025-10-22T02:49:08+00:00","index":"","fulltext":""}],"status":"published","journal":{"display":true,"email":"[email protected]","identity":"bmc-pediatrics","isNatureJournal":false,"hasQc":true,"allowDirectSubmit":false,"externalIdentity":"bped","sideBox":"Learn more about [BMC Pediatrics](http://bmcpediatr.biomedcentral.com/)","snPcode":"","submissionUrl":"https://www.editorialmanager.com/bped/default.aspx","title":"BMC Pediatrics","twitterHandle":"BMC_series","acdcEnabled":true,"dfaEnabled":false,"editorialSystem":"em","reportingPortfolio":"BMC Series","inReviewEnabled":true,"inReviewRevisionsEnabled":true}}],"origin":"","ownerIdentity":"dc439a48-a733-4af1-a73e-cd4c73903806","owner":[],"postedDate":"November 28th, 2025","published":true,"recentEditorialEvents":[],"rejectedJournal":[],"revision":"","amendment":"","status":"published-in-journal","subjectAreas":[],"tags":[],"updatedAt":"2026-04-27T16:07:30+00:00","versionOfRecord":{"articleIdentity":"rs-7918744","link":"https://doi.org/10.1186/s12887-026-06925-0","journal":{"identity":"bmc-pediatrics","isVorOnly":false,"title":"BMC Pediatrics"},"publishedOn":"2026-04-25 15:57:29","publishedOnDateReadable":"April 25th, 2026"},"versionCreatedAt":"2025-11-28 06:57:36","video":"","vorDoi":"10.1186/s12887-026-06925-0","vorDoiUrl":"https://doi.org/10.1186/s12887-026-06925-0","workflowStages":[]},"version":"v1","identity":"rs-7918744","journalConfig":"researchsquare"},"__N_SSP":true},"page":"/article/[identity]/[[...version]]","query":{"redirect":"/article/rs-7918744","identity":"rs-7918744","version":["v1"]},"buildId":"8U1c8b4HqxoKbykW_rLl7","isFallback":false,"isExperimentalCompile":false,"dynamicIds":[84888],"gssp":true,"scriptLoader":[]}
